Q&A: Historians and AI.
Since 2024, academic and professional historians have been asking me questions about Artificial Intelligence and how (or if) it should matter to their research practices.
Here is a short list of some questions that I managed to record. If you read this Q&A and still have questions about AI and History, great! You can submit a question to colourfulhistories(@)gmail.com and I'll research and answer it. Questions can remain anonymous and may be edited for clarity.
-
That's a great question. Being the first question in a long list, I'm going to give some definitions and context before giving a direct response. Skip what you know and enjoy learning what you don't.
Artificial Intelligence (AI) is ubiquitous now. But you know what? AI is not a new presence in our computer and communication systems; the messaging about its supposed value has just been hyped up significantly in the past two years.
While fountain pens and a good notebook can readily sit alongside a solid pair of boots for curious historians, digital technology can also be a major aspect of a present-day historian's output. If the latter is the case, it is likely that you're already using AI in your research. AI is part of digital humanities (DH), a critical branch of humanities research. Research and output associated with DH incorporate key insights from languages and literature, history, music, media and communications, alongside computer science and information studies. Combining these different approaches and technologies into new frameworks provides a range of output (such as avatars, interactive maps, inter-connected record-keeping, digitised records, chatbots, APIs) for public, professional, academic and amateur histories.
Recently, the digital humanities have widened to include critical engagement with research processes such as machine learning, data science, and AI. It's important to note that AI is a very broad term for a range of automated machine processes designed to identify patterns and make statistical inferences. AI models perform tasks or produce output (responses) that normally require human intelligence by applying machine-learning techniques to large collections of data and identifying patterns.
For example, if you are monolingual or have limited capacity to learn a range of languages, you've likely used machine translation on records to better understand another language in the archive or in an interview. Machine translation involves changing text from one language into another using a computer. There is a distinction between purpose-built translation tools (Google Translate, DeepL, Microsoft Translator) and general-purpose chatbots (ChatGPT, Claude, Google Gemini, Microsoft Copilot). A chatbot is a computer program that simulates human conversation (either written or spoken), allowing researchers to interact with digital devices.
Google has used AI to provide search responses to users for over a decade. The recent introduction of AI overviews in Google search responses is changing the way users receive results. Programmed like a chatbot, AI responses will often appear fluent to the reader, which increases the user's sense of trust in the answer. However, after trawling the internet, and taking into account other parameters set by the programmer, these AI answers trend towards being incomplete (at best) and hallucinated (at worst). Hallucination in the context of AI refers to the Large Language Model (LLM) generating partially or wholly false answers, often supported by fictitious citations. The supposed fluency of AI makes the responses deceptive because the inaccuracies, omissions and additions are not always obvious right away. These false inclusions increase the spread of mis- or disinformation.
After all of that, the answer is: you start using AI according to your requirements. If you are working with archival or library systems, maps, images, audio files, digitised material, or handwritten documents, there will be a paid, or open, version of AI on the market. You just have to find it.
-
I hear you. Paying subscription fees for numerous products to process data is expensive. And while there are paid versions (ChatGPT Plus, Claude Pro and Gemini Advanced), here we are, lucky enough to exist when companies are giving us access to this new technology for free! Why not use it?
Let's think about the future before we upload the past: what is at stake here? The "free" use of AI is like those "free" bathroom products in a fancy hotel. The cost is there; it's just embedded somewhere in the pricing where you can't easily see it. In this instance you're paying for the use of AI, like Gemini, Claude or Copilot, by uploading material for trend, pattern and research processing. When you upload this information to these platforms for image processing, translation, image creation, or map creation (just to name a few options), you're gifting corporations information about yourself or, worse, about others who don't know you're uploading their information, ideas, and creations. While there are services, like Transkribus, that can allow you to protect data, often those "free" products are gathering "data" (personal, business, community and social information) without clear structures concerning privacy, intellectual property, security and digital governance. Before you upload information to a chatbot, ask yourself:
- is it your information to share?
- are you happy for the company to use that information in any way they deem necessary?
- are you using someone else’s data in an ethical way?
- does paying for a license afford data protection, privacy and security?
If your answers are of the negative kind, stop and rethink how best to process this material. If you want to use AI, be sure to read the data policy before sharing that historic information.
-
Transkribus is a programme specifically designed to deal with the complexity of working with handwritten texts that have been digitised. AI is used innovatively in this platform to improve Handwritten Text Recognition (HTR: when a computer can receive and interpret handwritten input from images) and Optical Character Recognition (OCR: the process that converts an image of text into a machine-readable text format). If you upload a digitised image to Transkribus and select the correct language and period then, according to Transkribus, the processing time of historical documents (colonial records, diaries, letters, newspapers, memos) can decrease.
What makes Transkribus an excellent time saver is how it can be trained by users and tailored to their particular needs, thereby cutting back on processing time in the long run. Programmers, and some users, have trained this platform using Deep Learning and Neural Networks to perform pattern recognition (having the system understand that certain shapes of ink represent particular letters, which form particular words, and so on).
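For readers curious about what "pattern recognition" means in practice, here is a deliberately tiny sketch in Python. Real HTR systems like Transkribus use neural networks trained on millions of examples; this toy version, with invented 3x3 pixel "glyphs", only illustrates the underlying idea of matching an unseen shape of ink to the closest known pattern.

```python
# A toy sketch of the pattern-recognition idea behind OCR/HTR.
# Each known letter is stored as a tiny pixel grid; an unseen glyph is
# matched to the stored pattern it differs from in the fewest pixels.
# Real systems use trained neural networks, not pixel comparison.

KNOWN_GLYPHS = {
    "l": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 0)),
    "o": ((1, 1, 1),
          (1, 0, 1),
          (1, 1, 1)),
}

def recognise(glyph):
    """Return the known letter whose bitmap differs in the fewest pixels."""
    def mismatch(a, b):
        return sum(pa != pb
                   for row_a, row_b in zip(a, b)
                   for pa, pb in zip(row_a, row_b))
    return min(KNOWN_GLYPHS, key=lambda letter: mismatch(KNOWN_GLYPHS[letter], glyph))

# Faded ink: one pixel of the "o" is missing, but it is still closest to "o".
smudged_o = ((1, 1, 1),
             (1, 0, 1),
             (1, 1, 0))
print(recognise(smudged_o))  # -> o
```

The same logic, scaled up enormously and learned from data rather than hand-coded, is why clear, conforming handwriting is easier for these systems to process.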
Users need to acknowledge that Transkribus has bias. It is trained on a select range of already converted images. The "high resource" languages of French, German, Dutch and English are dominant on this platform. The AI uses what it "knows" from previous uploads to transcribe your material, thereby making it searchable, editable and easier to work with. The issue here is that your author (the person who created the original document) must have clear, conforming handwriting that the Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) that Transkribus runs on can recognise. If the AI cannot process ("read") the uploaded material, you do have the option to train it to do so, but this takes, you guessed it, time. And the lack of time is what got you here in the first place.
TLDR: If you're using colonial office records, or handwriting that is standardised, use Transkribus to save time. If you have an author with unique handwriting, or are studying "low resource" Pacific languages for a project, be sure to establish time in your planning to locate a range of data, consult with community, and train Transkribus in matters of "ground truth". Yes, truth is a contentious term for historians, but in this context it refers to information acquired by direct observation of the data. Ground-truth documents are human-verified transcriptions used as the gold standard for AI to learn from.
-
An avatar is an icon or figure representing a particular person in a video game, internet forum, or some other virtual arrangement. In the digital realm, an avatar acts as a bridge between the present and the past. While gamers know them as digital skins, in history they are embodied forms of archives. This electronic image can be moving or still. It tends to be a figure (person, animal or otherwise) that represents a real-world entity. Avatars have the potential to be manipulated by a computer user. If you play video games you're sitting there, reading this, acknowledging that avatars are indeed a virtual representation of self. If you are not a gamer, hang on to your notebook; avatars, with the correct research, programming and audience refinement, can be a fun means of sharing stories from the past.
Avatars have been used previously to communicate stories from the past to a viewer. With the foundational characteristic of representing a person, avatars are an appealing avenue by which to represent one particular person's experience of the past, be it an event or period. They can be static; think about the image of a person and having information delivered to you by a dialogue box. There are also avatars that provide movement, in body or simply in facial expressions. The latter are likely to be educational avatars. Unlike static images or simple animations, these avatars are designed to mimic human behaviours and expressions, making them appear more lifelike. An interactive avatar runs on two sources: a Large Language Model (LLM) knowledge base paired with system instructions.
Active and emotive avatars can be tailored to a particular curriculum goal, such as Indigenous history or ANZAC history. The creation of specific people, especially Indigenous people, raises important ethical questions. When creating avatars based on real historical people, consultation with community or descendants is essential. If you are using specific oral histories, diary entries, reports, or letters to create the avatar's answers about someone's lived experience, there need to be considerations about intellectual property, copyright, and collaborative insight. The merger of informed consent, or ethical engagement with historical records, with technical expertise is crucial to the creation of a good avatar.
The use of virtual avatars for education, to enhance a user's engagement with the story being told, can be a valuable process. A study concerning the use of non-realistic avatars in education programmes indicates they enhance curiosity, reduce social barriers, and foster playful learning atmospheres, while realistic avatars promote empathy, relatability, and deeper emotional investment. An important aspect of these storytelling successes is ensuring high video quality. This includes the expressiveness of the virtual avatar. If realistic expressions are employed by the avatar's programmers, then users are more likely to engage with the content.
The combination of virtual avatars and Artificial Intelligence (AI), especially chatbots, has changed how history is shared with particular cohorts, for instance school groups (primary, secondary and tertiary). A chatbot is a computer program designed to simulate conversation with human users, especially over the internet. You may have encountered a chatbot online while shopping, logging an enquiry with a company with which you have a service account (electricity, gas, insurance), or doing online training. There are history education and cultural education outreach avatars such as Charlie the Virtual Veteran with the Queensland State Library and King David with ACTS Education (when you book a demo, ask if you can trial their Charles Bean avatar). Rather than an avatar talking "at" the user, with the inclusion of AI the user asks questions of the avatar about their experiences in the past.
For example, if you have an avatar of Emily Caroline Creaghe the responses should be (I’m hoping more avatar creators hire historians for content creation) based on the primary sources (her diary) and secondary sources (various, based on the decisions of the associated researcher/historian). These documents provide the information parameters for the avatar. It’s important to acknowledge an avatar will only be as good as the research on which it is based.
A good avatar will also have firm guardrails. These directives will limit the capacity of the avatar; it will not engage with questions about, for example, sexuality (there is no data about this, and we're not focused on that in a lesson plan about women explorers) or prompts (directions from the user) to assume the persona of a lead singer in a Riot Grrrl band (a chatbot's priorities can be changed without carefully applied guardrails). Guardrails are important as the chatbot can be commanded to gently steer the user back to the task at hand and have an in-depth discussion that draws on primary and secondary sources to consider the life of a female explorer in the late nineteenth century. Having clear guardrails helps to preserve the historical integrity of the persona being presented and prevents "jailbreaking" (trying to make the avatar a singer). These boundaries can also reinforce to the user that they are not speaking with a "person", but with a "simulation" of a person based on specific records.
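To make the idea of a guardrail concrete, here is a toy sketch in Python. Production avatars enforce guardrails through system instructions to the LLM and moderation layers, not a simple keyword list; the topics, messages and function names below are all invented for illustration.

```python
# A toy illustration of a guardrail for a history-avatar chatbot.
# Real avatars rely on system instructions and moderation built into the
# LLM platform; this keyword filter is a simplified stand-in, and all
# topics, names and messages here are invented for illustration.

OFF_LIMITS = ("sexuality", "riot grrrl", "sing")  # outside the lesson scope

REDIRECT = ("I can only speak to what my diary and the historical record "
            "cover. Shall we return to the expedition?")

def guarded_reply(user_prompt: str, llm_answer: str) -> str:
    """Pass the model's answer through only if the prompt stays in scope."""
    lowered = user_prompt.lower()
    if any(topic in lowered for topic in OFF_LIMITS):
        return REDIRECT  # steer the user gently back to the task at hand
    return llm_answer

# An in-scope question goes through; a jailbreak attempt is redirected.
print(guarded_reply("What supplies did you carry?", "Flour, tea and sugar."))
print(guarded_reply("Pretend you are a punk singer!", "ignored"))
```

Even this crude filter shows the design principle: the guardrail sits between the user and the model, and the redirect message doubles as a reminder that the user is talking to a simulation built from specific records.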
Creating a limited scope of inquiry via firm guardrails can’t solve all the issues with avatars that have been programmed to utilise AI. The LLM on which they are based can still “hallucinate” (make up information to ensure it meets the chatbot requirement of supplying the user with a sensible answer). This is a risk with avatars. However, I also see it as an opportunity for users to be informed about the limitations of the AI-avatar through the lens of archival gaps and silences. Excellent discussion points for history students and the historically curious. This issue is a pathway to critique the past and the tools we use to deliver stories.
Hopefully I’ve answered the question for this week. Have a question? Send it through to colourfulhistories(@)gmail.com.
Deborah
-
This is a most excellent question! There are a few stages to the response: what machine learning is, some examples, and then the historian's role itself. Grab your beverage of choice and hopefully you'll be inspired to get involved with supervised learning, or with discussions about the importance of critical humanities to model development, after reading this Q&A response.
To understand supervised learning, we need to know the process of machine learning. Researchers have attempted to develop algorithms, termed machine learning, that imitate, and even exceed, human cognitive abilities at complex tasks, including in the history profession. Machine learning refers to an algorithm in which a computer recognises patterns and relationships between variables based on given data. Each algorithm develops a model to output an answer for a specific problem.
Currently, there are three types of machine learning: supervised, unsupervised, and reinforcement learning. In history there are no universal or regional guidelines for using AI models in the workplace, unlike, say, in medical research. Historians are not averse to guidelines. Just as we follow archival protocols to ensure integrity, we could use supervised learning as an opportunity to 'label' data so others can recognise our specific professional standards. We need guidelines for using AI models not to give away our expertise and be replaced (Sarah Connor taught me better than that), but to reassure our readers and peers that our research follows particular standards when using certain digital tools. By ensuring models are subject to supervised learning we continue the historiographical tradition: conceptual styles and frameworks remain recognisable to others.
Supervised learning is when the researcher modifies content by tagging information to create a "ground truth" document. While the concept of "truth" is contentious for historians, in this context ground truth borrows a computer-science definition: the document provides the AI program an ultimate truth to work from, a sense of which responses are correct and which are incorrect. The ground truth for computer scientists is, if we're using a handwritten diary, the accurate transcription of its entries. The historical truth is a debate for historians. From this ground truth, AI can run behind-the-scenes training sessions and learn how to make predictions, which are then applied to a digitised image of text, such as a diary.
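For the curious, the core idea of supervised learning, labelled examples teaching a model to make predictions, can be shown in a few lines of Python. The two-number "features" below are invented stand-ins for the stroke shapes an HTR system would actually extract from an image; real models learn from millions of such labelled pairs.

```python
# A minimal sketch of supervised learning: human-labelled "ground truth"
# pairs teach a model to predict labels for new, unseen inputs. The
# feature values and labels are invented for illustration.

# Ground truth: (features, correct label) pairs verified by a human.
GROUND_TRUTH = [
    ((0.9, 0.1), "a"),
    ((0.8, 0.2), "a"),
    ((0.1, 0.9), "r"),
    ((0.2, 0.8), "r"),
]

def predict(features):
    """Nearest-neighbour prediction: the label of the closest labelled example."""
    def distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    _, label = min(GROUND_TRUTH, key=lambda pair: distance(pair[0], features))
    return label

# A new, unlabelled glyph lands nearest the examples labelled "a".
print(predict((0.85, 0.15)))  # -> a
```

The quality of the prediction depends entirely on the quality of the labelled pairs, which is exactly why historian-verified ground truth matters.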
In the world of historians (welcome, 'tis a fine place to be), machine learning can be an option when confronting swathes of information about a particular topic, event, or time period. Machine learning offers an opportunity to negotiate, on a manageable scale, all those beautiful, complex, heartbeat-pounding piles, meters, stacks, boxes and tied bundles of manuscripts and documents, or the gigabytes and terabytes of digitised images from archives and libraries. At the moment, models relevant to the work of historians focus on palaeography (the study of old handwriting) to improve access to cursively handwritten documents, such as diaries, letters, and official documents from colonial offices.
Before we get to realising the (impossible?) dream of wrangling copious amounts of information (data) from the past in a timely manner, we need to create a specific model to process a particular collection. Why? Well, I could take a model such as Claude, Gemini, ChatGPT, or even the custom ChatGPT model Historians' friend, and give it a digitised image of handwriting from a diary produced in the 19th century with the prompt "translate, transcribe, contextualise", and it would! Wonderful. But it may not be a useable answer, as these are generative and probabilistic models that are designed to use pattern recognition to give an answer. Sure, because of the chatbot characteristics in the model, I will have been provided a very nice sounding or reading response. It will make sense, but, and this but is why I am so critical of such models, over time the general model will struggle with the nuances and variability of a person writing in a diary over the course of 12 months: uniform spelling, letter shaping and sentence structures are rarely conforming. I work with collections made by a range of people working in various institutions (missions, schools, libraries, maternal health care), which varies their writing presentation, format and style. This means we need a model that can cope with the unique elements of such records. That prompts the creation of a model using supervised learning. What we want is to process such records with something like Transkribus, an application that uses Handwritten Text Recognition (HTR), a specialised form of supervised computer vision, to process handwritten documents.
This supervised learning process requires historians to correct and tag documents, providing the model with clear parameters. For instance, to create a document that is suitable for training an AI model for Transkribus I actioned the following steps:
1. have a super model (an epic, pre-trained model that contains millions of words and "knowledge" of character images) process a digitised image of a diary
2. compare the diary page content with the AI output to correct errors in presentation, while maintaining original spelling mistakes and variations. For this pilot project, to produce "ground truth" with the Strathfieldsaye diaries, an average of 130 changes were made per page.
3. mark up or tag particular names, places, or dates of interest to demonstrate to the model what information is required for later data mining (an AI process that uncovers patterns and other valuable information from large data sets). The AI process produces content trying to replicate what it "sees" as accurately as possible. Where I see "rain", AI sees "Ram", and with faded ink, digitised material, and individual handwriting, it's not "wrong" but woefully inaccurate. This is why supervised learning is so important in the creation of usable digitised records.
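The bookkeeping behind "changes per page" in steps like these can be sketched in Python: align the AI transcription with the human-verified text word by word and count the differences. The diary lines below are invented examples, not quotations from any real collection.

```python
# A rough sketch of counting corrections per page: align the AI output
# with the human-verified transcription word by word and count the edits.
# The diary lines are invented, not quotations from an actual diary.
import difflib

def count_changes(ai_output: str, ground_truth: str) -> int:
    """Count word-level substitutions, insertions and deletions."""
    matcher = difflib.SequenceMatcher(None, ai_output.split(), ground_truth.split())
    changes = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes += max(i2 - i1, j2 - j1)
    return changes

ai_page = "Heavy Ram all day no work dene on the fences"
verified = "Heavy rain all day no work done on the fences"
print(count_changes(ai_page, verified))  # -> 2  (Ram->rain, dene->done)
```

Tracking a number like this per page is a simple way to measure whether a retrained model is actually improving.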
Once steps like the above were completed, the marked-up material was uploaded to the Transkribus platform to create a model suitable for this series of records. In a pilot project for the University of Melbourne using the Strathfieldsaye station diaries (1872-1875), the most successful outcomes came from correcting and tagging 50 pages of data, approximately 24,000 words, for model training and supervised machine learning.[i] The need to correct material almost halved, to 70 changes per page rather than the initial average of 130 changes.
It's interesting to note that, in these instances, the AI is not standardising spelling and writing conventions; its primary goal is to reproduce the author's writing style accurately. Correction and tagging are crucial to supervised learning and to exporting trustworthy data for collection discovery, data mining, and metadata creation.
Supervised learning of AI by historians serves dual purposes: training more accurate AI models can open up collections to a range of users, and archivists and historians can continue to safeguard historical records while addressing long-term historiographical concerns regarding the gaps and silences within archives.
By being involved in supervised learning, historians assist the many organisations and users of historical records. We can help librarians and archivists to address the massive data problem experienced by their organisations and institutions.[ii] If historians are excluded from the process of supervised learning, or reject involvement, AI-augmented output (responses) addressing historical content is more likely to contain content and contextualisation errors. We can ensure the standards and practices of our discipline are consistently revised, adapted and practised in both the analogue and digital worlds in which we live.
Hopefully I’ve answered the submitted question.
Do you have a question about history and the digital humanities? Send it through to colourfulhistories(@)gmail.com.
Deborah

[i] Disher, Harold Clive, Strathfieldsaye Estate Diary February 1872-1875 (February 1872-31 October 1875), [UMA-IT-000147344], University of Melbourne Archives, https://archives.library.unimelb.edu.au/nodes/view/637676, accessed 28 October 2025.
[ii] National Archives of Australia, 'Information management: Outsourcing digital storage', 2025, https://www.naa.gov.au/information-management/storing-and-preserving-information/storing-information/outsourcing-digital-storage; Indigo Holcombe-James, ''I'm fired up now!': digital cataloguing, community archives, and unintended opportunities for individual and archival digital inclusion', Archival Science, 22, 2022, 521–538, https://doi.org/10.1007/s10502-021-09380-1; P. Hider, 'At a Crossroads: Cataloguing Policy and Practice in Australian Libraries', Journal of the Australian Library and Information Association, 74(1), 2024, 53–72, https://doi.org/10.1080/24750158.2024.2403165.
-
Answer forthcoming

