Q&A: Historians and AI.

  • That’s a great question. As this is the first question in a long list, I’m going to give some definitions and context before giving a direct response. Skip what you know and enjoy learning what you don’t.

    Artificial Intelligence (AI) is ubiquitous now. But you know what? AI is not a new presence in our computer and communication systems; the messaging about its supposed value has just been hyped up significantly in the past two years.

    While fountain pens and a good notebook can readily sit alongside a solid pair of boots for curious historians, digital technology can also be a major aspect of a present-day historian’s output. If the latter is the case, it is likely that you’re already using AI in your research. AI is part of digital humanities (DH), a critical branch of humanities research. Research and output associated with DH incorporate key insights from languages and literature, history, music, media and communications, alongside computer science and information studies. Combining these different approaches and technologies into new frameworks provides a range of outputs (such as avatars, interactive maps, inter-connected record-keeping, digitised records, chatbots, APIs) for public, professional, academic and amateur histories.

    Recently, the digital humanities have widened to include critical engagement with research processes such as machine learning, data science, and AI. It’s important to note that AI is a very broad term for a range of automated machine processes that are designed to identify patterns and make statistical inferences. AI models perform tasks or produce output (responses) that would normally require human intelligence by applying machine learning techniques to large collections of data and identifying patterns.

    For example, if you are monolingual or have limited capacity to learn a range of languages, you’ve likely used machine translation on records to better understand another language in the archive or in an interview. Machine translation involves using a computer to convert text from one language into another. There is a distinction between purpose-built translation tools (Google Translate, DeepL, Microsoft Translator) and general-purpose chatbots (ChatGPT, Claude, Google Gemini, Microsoft Copilot). A chatbot is a computer program that simulates human conversation (either written or spoken), allowing researchers to interact with digital devices.

    Google has used AI to provide search responses to users for over a decade. The recent introduction of AI overviews in Google search responses is changing the way users receive results. Programmed like a chatbot, AI responses will often appear fluent to the reader, which increases the user’s sense of trust in the answer. However, after trawling the internet, and taking into account other parameters set by the programmer, these AI answers trend towards being incomplete (at best) and hallucinated (at worst). Hallucination, in the context of AI, refers to a Large Language Model (LLM) generating partially or wholly false answers, often supported by fictitious citations. The apparent fluency of AI makes the responses deceptive because the inaccuracies, omissions and additions are not always obvious right away. These false inclusions increase the spread of mis- or disinformation.

    After all of that, the answer is: you start using AI according to your requirements. If you are working with archival or library systems, maps, images, audio files, digitised material, or handwritten documents, there will be a paid or open version of AI on the market. You just have to find it.

  • I hear you. Paying subscription fees for numerous products to process data is expensive. And while there are paid versions (ChatGPT Plus, Claude Pro and Gemini Advanced), here we are, lucky enough to exist when companies are giving us access to this new technology for free! Why not use it?

    Let’s think about the future before we upload the past: what is at stake here? The “free” use of AI is like those “free” bathroom products in the fancy hotel. The cost is there; it’s just embedded somewhere in the bill where you can’t easily see it. In this instance you’re paying for the use of AI, like Gemini, Claude or Copilot, by uploading material for trend, pattern and research processing. When you upload this information to these platforms for image processing, translation, image creation or map creation (just to name a few options), you’re gifting corporations information about yourself or, worse, about others who don’t know you’re uploading their information, ideas, and creations. While there are services, like Transkribus, that allow you to protect data, those “free” products are often gathering “data” (personal, business, community and social information) without clear structures concerning privacy, intellectual property, security and digital governance. Before you upload information to a chatbot, ask yourself:

    - Is it your information to share?

    - Are you happy for the company to use that information in any way they deem necessary?

    - Are you using someone else’s data in an ethical way?

    - Does paying for a license afford data protection, privacy and security?

    If any of your answers is no, stop and rethink how best to process this material. If you want to use AI, be sure to read the data policy before sharing that historical information.

  • Transkribus is a programme specifically designed to deal with the complexity of working with handwritten texts that have been digitised. AI is used innovatively in this platform to improve Handwritten Text Recognition (HTR: a computer receiving and interpreting handwritten input from images) and Optical Character Recognition (OCR: the process of converting an image of text into a machine-readable text format). According to Transkribus, if you upload a digitised image and select the correct language and period, the processing time for historical documents (colonial records, diaries, letters, newspapers, memos) can decrease.

    What makes Transkribus an excellent time saver is that it can be trained by users and tailored to their particular needs, thereby cutting back on processing time in the long run. Programmers, and some users, have trained this platform using deep learning and neural networks to recognise patterns (having the system understand that certain shapes of ink represent particular letters, which form particular words, and so on).

    Users need to acknowledge that Transkribus has bias. It is trained on a select range of already converted images. The “high resource” languages of French, German, Dutch and English are dominant on this platform. The AI uses what it “knows” from previous uploads to transcribe your material, thereby turning it into searchable, editable text that is easier to work with. The issue here is that your author (the person who created the original document) must have clear, conforming handwriting that the Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) that Transkribus runs on can recognise. If the AI cannot process (“read”) the uploaded material, you do have the option to train it to do so, but this takes, you guessed it, time. And the lack of time is what got you here in the first place.

    TL;DR: If you’re using colonial office records, or handwriting that is standardised, use Transkribus to save time. If you have an author with unique handwriting, or are studying “low resource” Pacific languages for a project, be sure to build time into your planning to locate a range of data, consult with community, and train Transkribus in matters of “ground truth”. Yes, truth is a contentious term for historians, but in this context it refers to information about data acquired by direct observation. Ground-truth documents are human-verified transcriptions used as the gold standard for AI to learn from.
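For readers who like to see the mechanics, here is a minimal sketch of how a ground-truth transcription can be used to check a machine transcription. It computes a character error rate: the number of single-character edits needed to turn the machine output into the ground truth, divided by the length of the ground truth. This is a generic measure, not Transkribus’s own code; the function names and the sample line are invented for illustration and do not come from any real record.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, machine_output: str) -> float:
    """Edit distance divided by the length of the human-verified text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, machine_output) / len(ground_truth)


# Hypothetical sample: a human-verified line and a machine reading of it.
ground_truth = "Arrived at Levuka this morning"
machine_output = "Arrived at Lewuka this morming"

cer = character_error_rate(ground_truth, machine_output)
print(f"Character error rate: {cer:.1%}")
```

The lower the rate, the closer the machine reading is to the human-verified text. A few lines checked this way can give a rough sense of whether a model copes with your author’s handwriting before you commit a whole collection to it.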

  • Answer available 12 Jan., 2026.

  • Answer available 12 Jan., 2026.

  • Answer available 12 Jan., 2026.

Since 2024, academic and professional historians have been asking me questions about Artificial Intelligence and how (or whether) it should matter to their research practices.

Here is a short list of some questions that I managed to record. If you read this Q&A and still have questions about AI and history, great! You can submit a question to colourfulhistories(@)gmail.com, and I’ll research and answer it. Questions can remain anonymous and may be edited for clarity.
