Q&A: Historians and AI.
Since 2024, academic and professional historians have been asking me questions about Artificial Intelligence and how (if) it should matter to their research practices.
Here is a short list of some questions that I managed to record. If you read this Q&A and still have questions about AI and History, great! You can submit a question that I’ll research and answer to colourfulhistories(@)gmail.com. Questions can remain anonymous and may be edited for clarity.
-
That’s a great question. Being the first question in a long list, I’m going to give some definitions and context, before giving a direct response. Skip what you know and enjoy learning what you don’t already know.
Artificial Intelligence (AI) is ubiquitous now. But you know what? AI is not a new presence in our computer and communication systems; the messaging about its supposed value has just been hyped up significantly in the past two years.
While fountain pens and a good notebook can readily sit alongside a solid pair of boots for curious historians, digital technology can also be a major aspect of a present-day historian’s output. If the latter is the case, it is likely that you’re already using AI in your research. AI is part of digital humanities (DH), a critical branch of humanities research. Research and output associated with DH incorporate key insights from languages and literature, history, music, media and communications, alongside computer science and information studies. Combining these different approaches and technologies into new frameworks provides a range of output (such as avatars, interactive maps, inter-connected record-keeping, digitised records, chatbots, APIs) for public, professional, academic and amateur histories.
Recently, the digital humanities have widened to include critical engagement with research processes such as machine learning, data science, and AI. It’s important to note that AI is a very broad term for a range of automated machine processes that are designed to identify patterns and make statistical inferences. AI models perform tasks or produce output (responses) that normally require human intelligence by applying machine learning techniques to large collections of data and identifying patterns.
For example, if you are monolingual or have limited capacity to learn a range of languages, you’ve likely used machine translation on records to better understand another language in the archive or in an interview. Machine translation involves changing text from one language into another language using a computer. There is a distinction between purpose-built translation tools (Google Translate, DeepL, Microsoft Translator) and the general-purpose chatbots (ChatGPT, Claude, Google Gemini, Microsoft Copilot). A chatbot is a computer program that simulates human conversation (either written or spoken), allowing researchers to interact with digital devices.
Google has used AI to provide search responses to users for over a decade. The recent introduction of AI overviews in Google search responses is changing the way users receive results. Programmed like a chatbot, AI responses will often appear fluent to the reader, which increases the user’s sense of trust towards the answer. However, after trawling the internet, and taking into account other parameters set by the programmer, these AI answers trend towards being incomplete (at best) and hallucinated (at worst). Hallucination in the context of AI refers to the Large Language Model (LLM) generating partially or wholly false answers, often supported by fictitious citations. The supposed fluency of AI makes the responses deceptive because the inaccuracies and omissions/additions are not always obvious right away. These false inclusions increase the spread of mis- or disinformation.
After all of that, the answer is: you start using AI according to your requirements. If you are working with archival or library systems, maps, images, audio files, digitised material, or handwritten documents, there will be a paid, or open, version of AI on the market. You just have to find it.
-
I hear you. Paying subscription fees for numerous products to process data is expensive. And while there are paid versions (ChatGPT Plus, Claude Pro and Gemini Advanced), here we are, lucky enough to exist when companies are giving us access to this new technology for free! Why not use it?
Let’s think about the future before we upload the past: what is at stake here? The “free” use of AI is like those “free” bathroom products in the fancy hotel. The cost is there; it’s just embedded somewhere in the costings where you can’t easily see it. In this instance you’re paying for the use of AI, like Gemini, Claude or Copilot, by uploading material for trend, pattern and research processing. When you upload this information to these platforms for image processing, translation, image creation, map creation (just to name a few options) you’re gifting corporations information about yourself or, worse, about others who don’t know you’re uploading their information, ideas, and creations. While there are services, like Transkribus, that can allow you to protect data, often those “free” products are gathering “data” (personal, business, community and social information) without clear structures concerning privacy, intellectual property, security and digital governance. Before you upload information to a chatbot, ask yourself:
- is it your information to share?
- are you happy for the company to use that information in any way they deem necessary?
- are you using someone else’s data in an ethical way?
- does paying for a license afford data protection, privacy and security?
If your answers are of the negative kind, stop, and rethink how to best process this material. If you want to use AI, be sure to read the data policy before sharing that historic information.
-
Transkribus is a programme specifically designed to deal with the complexity of working with handwritten texts that have been digitised. AI is used innovatively in this platform to improve Handwritten Text Recognition (HTR: when a computer can receive and interpret handwritten input from images) and Optical Character Recognition (OCR: the process that converts an image of text into a machine-readable text format). If you upload a digitised image to Transkribus and select the correct language and period then, according to Transkribus, the processing time for historical documents (colonial records, diaries, letters, newspapers, memos) can decrease.
What makes Transkribus an excellent time saver is how it can be trained by users and tailored to their particular needs, thereby cutting back on processing time in the long run. Programmers, and some users, have trained this platform using Deep Learning and Neural Networks to recognise patterns (having the system understand that certain shapes of ink represent particular letters, which form particular words, and so on).
Users need to acknowledge that Transkribus has bias. It is trained on a select range of already converted images. The “high resource” languages of French, German, Dutch and English are dominant on this platform. The AI uses what it “knows” from previous uploads to transcribe your material, thereby making it searchable, editable and easier to work with. The issue here is that your author (the person who created the original document) must have clear, conforming handwriting that the Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) that Transkribus runs on can recognise. If the AI cannot process (“read”) the material you upload, you do have the option to train it to do so, but this takes, you guessed it, time. And the lack of time is what got you here in the first place.
TLDR: If you’re using colonial office records, or handwriting that is standardised, use Transkribus to save time. If you have an author with unique handwriting, or are studying “low resource” Pacific languages for a project, be sure to establish time in your planning to locate a range of data, consult with community, and train Transkribus in matters of “ground truth”. Yes, truth is a contentious term for historians, but in this context it refers to information acquired by direct observation about data. Ground-truth documents are human-verified transcriptions used as the gold standard for AI to learn from.
-
An avatar is an icon or figure representing a particular person in a video game, internet forum, or some other virtual arrangement. In the digital realm, an avatar acts as a bridge between the present and the past. While gamers know them as digital skins, in history, they are embodied forms of archives. This electronic image can be moving or still. Avatars tend to take the form of a figure (person, animal or otherwise) that represents a real-world entity, and they have the potential to be manipulated by a computer user. If you play video games you’re sitting there, reading this, acknowledging that avatars are, indeed, a virtual representation of self. If you are not a gamer, hang on to your notebook; avatars, with the correct research, programming and audience refinement, can be a fun means of sharing stories from the past.
Avatars have been used previously to communicate stories from the past to a viewer. With the foundational characteristic of representing a person, avatars are an appealing avenue by which to represent one particular person’s experience of the past, be it an event or period. They can be static; think about the image of a person and having information delivered to you by a dialogue box. There are also avatars that provide movement, in body or simply facial expressions. The latter are likely to be educational avatars. Unlike static images or simple animations, these avatars are designed to mimic human behaviours and expressions, making them appear more lifelike. An interactive avatar runs on two sources: a Large Language Model (LLM) knowledge base paired with system instructions.
Active and emotive avatars can be tailored to a particular curriculum goal, such as Indigenous history or ANZAC history. The creation of specific people, especially Indigenous people, raises important ethical questions. When creating avatars based on real historical people, consultation with community or descendants is essential. If you are using specific oral histories, diary entries, reports, or letters to create the avatar’s answers about someone’s lived experience, there need to be considerations about intellectual property, copyright, and collaborative insight. The merger between informed consent, or ethical engagement with historical records, and technical expertise is crucial to the creation of a good avatar.
The use of virtual avatars in education to enhance a user’s engagement with the story being told can be a valuable process. A study concerning the use of non-realistic avatars in education programmes indicates they enhance curiosity, reduce social barriers, and foster playful learning atmospheres, while realistic avatars promote empathy, relatability, and deeper emotional investment. An important aspect of these storytelling successes is ensuring high video quality. This includes the expressiveness of the virtual avatar. If there are realistic expressions employed by the avatar’s programmers, then users are more likely to engage with the content.
The combination of virtual avatars (VAs) and Artificial Intelligence (AI), especially chatbots, has changed how history is shared with particular cohorts, for instance school groups (primary, secondary and tertiary). A chatbot is a computer program designed to simulate conversation with human users, especially over the internet. You may have encountered a chatbot online while shopping, logging an enquiry with a company with which you have a service account (electricity, gas, insurance) or doing online training. There are history education and cultural education outreach avatars such as Charlie the Virtual Veteran (Queensland State Library) and King David with ACTS Education (when you book a demo, ask if you can trial their Charles Bean avatar). Rather than an avatar talking “at” the user, with the inclusion of AI the user asks questions of the avatar about their experiences in the past.
For example, if you have an avatar of Emily Caroline Creaghe, the responses should be (I’m hoping more avatar creators hire historians for content creation) based on the primary sources (her diary) and secondary sources (various, based on the decisions of the associated researcher/historian). These documents provide the information parameters for the avatar. It’s important to acknowledge an avatar will only be as good as the research on which it is based.
A good avatar will also have firm guardrails. These directives will limit the capacity of the avatar; it will not engage with questions about, for example, sexuality (there is no data about this, and we’re not focused on that in a lesson plan about women explorers) or prompts (direction from the user) to assume the persona of a lead singer in a Riot Grrrl band (a chatbot’s priorities can be changed if guardrails are not carefully applied). Guardrails are important as the chatbot can be instructed to gently steer the user back to the task at hand and have an in-depth discussion that draws on primary and secondary sources to consider the life of a female explorer in the late nineteenth century. Having clear guardrails helps to preserve the historical integrity of the persona being presented and prevents “jailbreaking” (trying to make the avatar a singer). These boundaries can also reinforce to the user that they are not speaking with a “person”, but a “simulation” of a person based on specific records.
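To make the idea of guardrails a little more concrete, here is a minimal sketch of how a list of sources and a set of out-of-scope topics might be assembled into written system instructions. It is an illustration only, not how any particular avatar product is built: the function name `build_system_instructions`, the wording of the instructions, and the topic lists are all hypothetical.

```python
def build_system_instructions(persona, sources, out_of_scope):
    """Assemble plain-text system instructions for a hypothetical avatar.

    Everything here is illustrative: real platforms have their own
    formats for system instructions and knowledge bases.
    """
    lines = [
        f"You are a simulation of {persona}, not the person themselves.",
        "Answer only from the sources listed below.",
        "If the sources are silent on a question, say so rather than guessing.",
        "Sources:",
    ]
    lines += [f"- {source}" for source in sources]
    lines.append("Do not engage with these topics; gently steer back to the lesson:")
    lines += [f"- {topic}" for topic in out_of_scope]
    lines.append("Refuse any request to adopt a different persona.")
    return "\n".join(lines)


instructions = build_system_instructions(
    persona="Emily Caroline Creaghe",
    sources=[
        "her diary (primary source)",
        "secondary sources chosen by the project historian",
    ],
    out_of_scope=["sexuality", "role-play as the singer of a band"],
)
print(instructions)
```

Written instructions like these narrow the scope of a conversation, but they are only text handed to the model; they do not, on their own, stop it from inventing an answer.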
Creating a limited scope of inquiry via firm guardrails can’t solve all the issues with avatars that have been programmed to utilise AI. The LLM on which they are based can still “hallucinate” (make up information to ensure it meets the chatbot requirement of supplying the user with a sensible answer). This is a risk with avatars. However, I also see it as an opportunity for users to be informed about the limitations of the AI-avatar through the lens of archival gaps and silences. Excellent discussion points for history students and the historically curious. This issue is a pathway to critique the past and the tools we use to deliver stories.
Hopefully I’ve answered the question for this week. Have a question? Send it through to colourfulhistories(@)gmail.com.
Deborah
-
This is a most excellent question! There are a few stages to the response: what machine learning is, some examples, and then the historian’s role itself. Grab your beverage of choice and hopefully you’ll be inspired to get involved with supervised learning, or discussions about the importance of critical humanities to model development, after reading this Q&A response.
To understand supervised learning, we need to know the process of machine learning. Researchers have attempted to develop algorithms (termed machine learning) that imitate, and even exceed, human cognitive ability to do complex tasks in the history profession. Machine learning refers to an algorithm in which a computer recognises patterns and relationships between variables based on given data. Each algorithm develops a model to output an answer for a specific problem.
Currently, there are three types of machine learning: supervised, unsupervised, and reinforcement learning. In history there are currently no universal or regional guidelines for using AI models in the workplace unlike, say, in medical research. Historians are not averse to guidelines. Just as we follow archival protocols to ensure integrity, we could use supervised learning as an opportunity to ‘label’ data so others can recognise our specific professional standards. We need guidelines for using AI models not to give away our expertise and be replaced (Sarah Connor taught me better than that), but to reassure our readers and peers that our research follows particular standards when using certain digital tools. By ensuring models are subject to supervised learning we continue the historiographical tradition and ensure our conceptual styles and frameworks are recognisable to others.
Supervised learning is when the researcher modifies content by tagging information to create a “ground truth” document. While the concept of “truth” is contentious for historians, in this context ground truth is an interdisciplinary borrowing of a computer science definition: the document provides the AI program with an ultimate truth to work from, giving it a sense of what is a correct and what is an incorrect response to provide as output. The ground truth for computer scientists is, if we’re using a handwritten diary, the accurate transcription of entries. The historical truth is a debate for historians. From this ground truth, AI can run behind-the-scenes training sessions and learn how to make predictions, which are then applied to a digitised image of text, such as a diary.
In the world of historians (welcome, tis a fine place to be), machine learning can be an option when confronting swathes of information about a particular topic, event, or time period. Machine learning offers an opportunity to negotiate, on a manageable scale, all those beautiful, complex, heartbeat-pounding piles, metres, stacks, boxes and tied bundles of manuscripts and documents, or gigabytes and terabytes of digitised images from archives and libraries. At the moment, models that are relevant to the work of historians focus on palaeography (the study of old handwriting) to improve access to cursively handwritten documents, such as diaries, letters, and official documents from colonial offices.
Before we get to realising the (impossible?) dream of wrangling copious amounts of information (data) from the past in a timely manner, we need to create a specific model to process a particular collection. Why? Well, I could take a model such as Claude, Gemini, ChatGPT, or even the ChatGPT model, Historians’ friend, and give it a digitised image of handwriting from a diary that was produced in the 19th century with the prompt “translate, transcribe, contextualise” and it would! Wonderful. But it may not be a usable answer, as these are generative and probabilistic models that are designed to use pattern recognition to give an answer. Sure, because of the chatbot characteristics in the model, I will have been provided with a very nice sounding or reading response. It will make sense, but (and this “but” is why I am so critical of such models) over time the general model will struggle with the nuances and variability of a person writing in a diary over the course of 12 months: spelling, letter shaping and sentence structures are rarely uniform. I work with collections made by a range of people working in various institutions (missions, schools, libraries, maternal health care), which varies their writing presentation format and style. This means we need a model that can cope with the unique elements of such records. That prompts the creation of a model using supervised learning. What we want is to process such records with something like Transkribus, an application that uses Handwritten Text Recognition (HTR), a specialised form of supervised computer vision, to process handwritten documents.
This supervised learning process requires historians to correct and tag documents, providing the model with clear parameters. For instance, to create a document that is suitable for training an AI model for Transkribus I actioned the following steps:
1. have a supermodel (an epic, pre-trained, model that contains millions of words and “knowledge” of character images) process a digitised image of a diary
2. compare the diary page content with the AI output to correct errors in presentation, while maintaining original spelling mistakes and variations. For this pilot project, to produce "ground truth" with the Strathfieldsaye diaries, an average of 130 changes were made per page.
3. mark-up or tag for particular names, places, or dates of interest to demonstrate to the model what information is required for later data mining (an AI process that uncovers patterns and other valuable information from large data sets).
The AI process produces content trying to replicate what it "sees" as accurately as possible. When I see "rain", AI sees "Ram", and with faded ink, digitised material, and individual handwriting, it's not "wrong" but woefully inaccurate. This is why supervised learning is so important in the creation of usable digitised records.
Once steps like the above were completed, the marked-up material was uploaded to the Transkribus platform to create a model that is suitable for this series of records. In a pilot project for the University of Melbourne using the Strathfieldsaye station diaries (1872-1875), the most successful outcomes came from correcting and tagging 50 pages of data, approximately 24,000 words, for model training and supervised machine learning.[i] The need to correct material almost halved, to 70 changes per page rather than the initial average of 130.
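For readers who like to see the arithmetic, here is a small sketch of one way to put numbers on "changes". It assumes that a change can be approximated by a single-character edit (an insertion, deletion or substitution), which is a common convention in text recognition but not necessarily how the corrections in the pilot were counted. The function names are my own, and the figures 130 and 70 are the per-page averages quoted above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def relative_reduction(before: float, after: float) -> float:
    """Share by which a figure has dropped, e.g. 0.5 means halved."""
    return (before - after) / before


# The model read "Ram" where the diary says "rain".
changes = edit_distance("Ram", "rain")

# Average corrections per page before and after training, as reported above.
reduction = relative_reduction(130, 70)

print(changes)             # 3
print(f"{reduction:.0%}")  # 46%
```

A 46 per cent drop is what "almost halved" looks like in practice, and a three-character correction for one short word shows how quickly an untrained page reaches 130 changes.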
It’s interesting to note, in these instances, the AI is not capturing spelling and writing conventions; its primary goal is to reproduce an author’s writing style accurately. Correction and tagging are crucial both to supervised learning and to exporting trustworthy data for collection discovery, data mining, and metadata creation.
Supervised learning of AI by historians serves dual purposes: training more accurate AI models can open up collections to a range of users, and archivists and historians can continue to safeguard historical records while taking the opportunity to address long-term historiographical concerns regarding the gaps and silences within archives.
By being involved in supervised learning, historians assist multiple organisations and users of historical records. We can help librarians and archivists to address the massive data problem experienced by their organisations and institutions.[ii] If historians are excluded from the process of supervised learning, or reject involvement, AI-augmented output (responses) that addresses historical content is more likely to contain content and contextualisation errors. We can ensure the standards and practices of our discipline are consistently revised, adapted and practised in both the analogue and digital worlds in which we live.
Hopefully I’ve answered the submitted question.
Do you have a question about history and the digital humanities? Send it through to colourfulhistories(@)gmail.com.
Deborah
[i] Disher, Harold Clive, Strathfieldsaye Estate Diary February 1872-1875 (February 1872-31 October 1875), [UMA-IT-000147344]. University of Melbourne Archives, https://archives.library.unimelb.edu.au/nodes/view/637676, accessed 28 October 2025.
[ii] National Archives Australia, ‘Information management: Outsourcing digital storage’, National Archives of Australia, 2025, https://www.naa.gov.au/information-management/storing-and-preserving-information/storing-information/outsourcing-digital-storage; Indigo Holcombe-James, ‘‘I’m fired up now!’: digital cataloguing, community archives, and unintended opportunities for individual and archival digital inclusion’, Archival Science, 22, 2022, 521–538, https://doi.org/10.1007/s10502-021-09380-1; P. Hider, ‘At a Crossroads: Cataloguing Policy and Practice in Australian Libraries’, Journal of the Australian Library and Information Association, 74(1), 2024, 53–72, https://doi.org/10.1080/24750158.2024.2403165.
-
The answer to this question begins with ‘why should I document the use of Artificial Intelligence in my research’? Historians love to show off their primary and secondary sources; that’s a core focus of any first-year introductory class. Show me the evidence! Really, because I want to evaluate it for myself.
Historians don’t seem so keen on showing methods or tools. That’s usually a line or two in an introduction, maybe a paper at a conference, and then it’s on to (trumpets please) THE ARGUMENT. Nice. I love a good argument (argument: that’s the collective noun for historians) and they’re quite important, but we can learn a lot from sharing our methods too.
The ubiquity of AI in our laptops, writing programs, recording equipment, and visual programs means we should be seeking ways to show authenticity and authority in our research. We need to distinguish our work and ideas from the output of AI, which is a researcher’s tool. I’m not saying AI should be used to generate content, like an introduction or an argument in a journal article, blog post, or grant proposal; there’s a lot of concern about plagiarism and AI. Yes, that’s important, but generating papers is not the only output option that AI has. I am acknowledging that AI can be used effectively to create a transcript of handwritten material, process an oral history transcript, create metadata for an archival collection, create a map for a piece about explorations, or create an image of written material from a diary. Places like Te Papa and the University of Melbourne Archives are testing AI to improve access and processing times in their collections. And while they aren’t implementing AI in their workflows yet, the potential is there for future application. When our GLAM institutions do decide to use AI, historians should be able to account for it and explore the impact (if there is any) on our research.
We need to hold ourselves and others associated with our discipline to certain standards. As always, before you start, check your systems’ security settings, institutional policies, and ensure you have the required permissions to work with that material on the AI platform you’ve chosen. While some in our digital worlds are partaking in data fabrication, intellectual property theft, plagiarism, and violating data sovereignty, that doesn’t mean historians have to. Part of our responsibility as educators and researchers using AI involves being clear and concise about the chosen AI platform, prompts, output, and our specialist (critical) response to that material. One way of demonstrating disciplinary authority and authenticity is to provide references (*sigh* I love a good footnote). While there is potential to keep doing historical research and writing without AI, a little bit of careful and critical use can be a great benefit. We can then pass this knowledge onto our students to improve their digital literacy; we can help others become critical and advocate against the proliferation of unregulated AI.
Citation standards are an important and underutilised means of regulating AI use. Carefully sidestepping the endnote/footnote debate to provide this example, the Chicago Manual of Style does have guidelines for how to reference AI usage in academic papers.
A footnote would look like this: 1. Text generated by ChatGPT, OpenAI, March 7, 2023, https://chat.openai.com/chat.
Note here, CMS is assuming that you have used ChatGPT to create content. In some journals, this could be perceived as ChatGPT standing in as “author” of the content. If you have denied yourself the fun times and gratification (hair-pulling) that come from writing, you’ll want to note that many publishers (journals and books) will not recognise AI as an author. It’s probably best to delete that paragraph/sentence and try again.
I have faith in you.
Let’s say you have used AI as part of your research method to create a transcript from handwritten materials. You can make mention in the methods section about what you have done and cite AI use in a couple of different ways:
Sample acknowledgement (in-text):
Super AI-5000 provided translation assistance for 34 archival documents from Spanish to English, representing roughly 40 per cent of the translation work. All AI translations were reviewed by the lead author, with particular attention to historical context, legal language, and culturally specific concepts. Approximately 25 per cent of AI translations required substantial revision to maintain accuracy.
Sample acknowledgement (Footnote):
Super AI-5000. OpenAI. Accessed 15 July–22 August 2024. Used for initial interview transcription: Spanish language materials (8 hours of audio). Human verification: Complete manual review and correction of all transcripts against original recordings.
To make these references comprehensive you should provide
- specific AI tool names and versions used (e.g., “ChatGPT-4, GPT-4o, Claude-3.5-Sonnet”) and date-range of use;
- specific tasks performed by the AI tool(s), such as translation assistance, transcription support, literature review support, editing assistance (limited to improvements to wording or formatting changes; excludes generative editorial work and autonomous content creation);
- the extent of AI involvement in these tasks. Please also include a discussion of how human oversight and verification were employed at all stages of AI tool use. Record keeping from the start of your project in the form of a research diary is really going to come in handy here.
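If your research diary is digital, one option is to record each use of AI in the same structured form every time, so the acknowledgement can be written straight from your notes. The sketch below is one possible layout, not a standard: the class name `AIUseRecord` and its fields are my own, chosen to mirror the three points above, and the entry reuses the placeholder tool from the sample footnote. The version and extent values are invented for illustration.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class AIUseRecord:
    """One research-diary entry describing a single use of an AI tool."""
    tool: str                # name of the AI tool
    version: str             # version or model name, if known
    date_range: str          # when the tool was used
    task: str                # what the tool was asked to do
    extent: str              # how much of the work it touched
    human_verification: str  # how the output was checked


record = AIUseRecord(
    tool="Super AI-5000",  # placeholder name from the sample footnote
    version="not stated",  # illustrative value
    date_range="15 July to 22 August 2024",
    task="initial interview transcription, Spanish language materials (8 hours of audio)",
    extent="first-pass transcripts only",  # illustrative value
    human_verification="complete manual review and correction against original recordings",
)

# Save or print the entry in a form that is easy to search later.
print(json.dumps(asdict(record), indent=2))
```

Keeping one entry per tool and task makes it simple to separate individual contributions later, should a co-authored manuscript require it.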
For multi-author manuscripts, where AI use varies among team members, individual contributions should be specified. Where the same tools and workflows were used collaboratively, a single consolidated statement is sufficient. A word of caution here: if you’re working on an interdisciplinary project, make sure you have a conversation about AI use before the hard work begins. In some disciplines, having AI summarise and write findings up is common. In others, it’s the demise of your career.
AI, generally speaking, has a lot of problems. By recognising the complexity of AI, and accounting with authority for how we are using it in professional and academic research, we have made an important step towards holding the broader research community to higher standards of transparency.
Want to talk about this post further or have a question about history practice, digital humanities and/or AI?
Email me at colourfulhistories@gmail.com. I’d enjoy the opportunity to continue the conversation.
-
Welcome back to another instalment of Historians and AI! This week’s submitted question was: what can AI do to synthesise written material produced over the course of a career? Can AI locate methods, and changes, used over time from a range of written materials?
Gosh, this was a great question to work through with a colleague. This question initially looks like it’s about summarising and synthesising material and purchasing a product, but really, it’s bigger than that. We’re being prompted to consider a range of topics: methodology, disciplinary norms, secondary-sources, security, privacy, open access, Gen-AI, agents, chatbots, and prompt creation. Let’s go through the steps of answering these questions and tempering our curiosity about AI.
1. Create a project abstract:
The first step is to create a project abstract about your research topic, argument and goal. What do you want to achieve by integrating AI into your workflow? Writing a statement or abstract will help to identify the platform, plug-in, or app you need to complete the task at hand. Is the task better done manually, as for small-scale quantitative data projects or small-scale transcription projects? Often large amounts of data are needed for Large Language Models to work well.
Remember: AI is like an archive, or oral histories, in that it is not neutral. It is a tool that can help in research; however, the chosen platform must be aligned with historical method and sub-disciplinary norms.
2. Question parameters and obligations
Taking a moment for the abstract writing task also helps to articulate possible ethics and AI capacity issues. Do you own all the copyrights to the material you are thinking of uploading for summation? Also, you may be seeking a program that summarises PDFs. One of the most-hyped summary options is TLDR. If you’re a historian, this platform isn’t perfect for you; TLDR was trained to summarise computer science papers, which have a different disciplinary language and writing conventions than history. If you are looking to summarise history articles, you need to locate a platform that was, ideally, trained on history articles. Historians rely on narrative argumentation, implicit causality, and fragmentary evidence when writing up their research. These are disciplinary characteristics that LLM summarisation will likely flatten.
3. Assess your materials
There are other questions you need to consider before uploading your material. These are as follows.
What kind of documents are you using: PDF, Word, video, audio, websites? Some platforms are trained to summarise various formats, others, just one.
Can you manipulate the source to fit the requirements of the platform? PDFs may need to be broken into small chunks before “feeding” them into AI. Some AI-driven platforms may fail to process even small amounts of information well enough for an undergraduate class discussion.
Does the platform run online or offline? Some AI is only designed to work online. Keep in mind there is closed-loop, offline AI available. If you’re working with sensitive data, this latter choice may be best.
Do you own the copyright on these documents?
Is the platform secure? Is there indication of encryption? Where are the servers held? The answers to these questions will inform how you address issues of governance, privacy and security. For example, the governance of AI in the EU is very different from that of the US.
How will your data be used after uploading? Yes, you have to read the terms and conditions to know this information. Yes, I am the type of geek that reads T&Cs.
How long will the platform keep your information, and who gets to decide when the content is destroyed after it has been processed?
Once you have determined the documents, security, privacy, copyright, and ethical issues for your project you need to locate a platform to process material.
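If a platform needs a long document broken into smaller pieces, as mentioned in the questions above, the splitting can be sketched in a few lines of Python. This is a minimal sketch, assuming you have already extracted plain text from your PDF by some other means. The function name chunk_text and the word limits are illustrative assumptions, not settings required by any particular platform; check the platform's own limits before choosing values.

```python
def chunk_text(text: str, max_words: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that an argument
    running across a boundary is not cut off without context.
    Both numbers are illustrative defaults, not platform requirements.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be at least 0 and smaller than max_words")

    words = text.split()
    chunks = []
    step = max_words - overlap  # how far the window moves each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last words are already covered
    return chunks


# Tiny demonstration with ten "words" so the behaviour is easy to see.
sample = " ".join(str(i) for i in range(10))
for piece in chunk_text(sample, max_words=4, overlap=1):
    print(piece)
```

Splitting on word counts is a blunt instrument: it can cut through a footnote or a quotation. For historical writing, splitting at section or paragraph breaks may preserve more of the argument, at the cost of uneven chunk sizes.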
4. Choosing and using a platform to process your chosen materials
AI assistants
You may prefer using something like Claude or Gemini to summarise material. In that instance, you’ll need to know how to create effective prompts. This is important, as taking the time to formulate custom prompts can improve the quality of the summaries given. Again, this is where your abstract is beneficial. You’ll need to articulate the following:
Context: State your research topic and objectives.
Role: Define the AI's persona (e.g., "Act as a research methodologist").
Methodology Focus: Specify if you need qualitative, quantitative, or mixed-methods guidance.
Output Format: Define the desired format (e.g., table, bullet points).
Once you have given these parameters, you’ll need to tell the model what you want it to do. Below are sample prompts:
If you want detailed extraction: "Read this [article/abstract] and identify the specific research design, data collection, and data analysis techniques used. Present the findings in a structured table".
If you want to make a methodological comparison: "Compare the methodology of [Paper A] and [Paper B] regarding their approach to [e.g., oral histories]. Highlight which study’s approach is more robust and why".
If you are looking to conduct limitation analysis: "Analyse the methods section of the attached paper. Identify potential methodological limitations. Identify the primary source base used and how the author addresses gaps in the archival record".
If you are trying to find the why in an article: "Based on this article, explain the rationale behind using a [e.g., qualitative analysis] approach instead of a quantitative survey".
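If you reuse the same structure across many readings, the four elements above (context, role, methodology focus, output format) and a task can be assembled by a small helper so that nothing is left out. This is a minimal sketch: the function build_prompt and the example values are hypothetical, and the resulting text is simply pasted into whichever assistant you use. It does not call any AI service.

```python
def build_prompt(context: str, role: str, methodology_focus: str,
                 output_format: str, task: str) -> str:
    """Assemble a prompt from the elements listed above, one per line."""
    fields = {
        "Role": role,
        "Context": context,
        "Methodology focus": methodology_focus,
        "Output format": output_format,
        "Task": task,
    }
    # Refuse to build a prompt with a blank element.
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError("Missing prompt element(s): " + ", ".join(missing))
    return "\n".join(f"{name}: {value.strip()}" for name, value in fields.items())


# Illustrative values only; replace them with details from your own abstract.
prompt = build_prompt(
    context="A study of oral histories of postwar migration",
    role="Act as a research methodologist",
    methodology_focus="Qualitative",
    output_format="A structured table",
    task="Identify the primary source base used and how the author "
         "addresses gaps in the archival record.",
)
print(prompt)
```

Keeping the elements separate makes it easy to change one of them, such as the output format, while holding the others constant, which helps when comparing how a model responds to different instructions.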
Agents
Perhaps you’re looking at your reading pile and thinking, no, Claude and co. aren’t for me. You may want to consider agents and their value for such work. An artificial intelligence (AI) agent refers to a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and utilizing available tools. AI agents can encompass a wide range of functionalities beyond natural language processing including decision-making, problem-solving, interacting with external environments and executing actions.
You could create an agent to process your material using strict parameters for research and desired output. Personally, I don’t recommend this: agents tend to lack precision in their work and often have poor security processes.
Historians should favour tightly constrained, inspectable workflows over autonomous agents. Historical argument is built on interpretation, contextual judgement, and an awareness of what sources cannot tell us. Agents are not yet well equipped to perform such tasks. For that reason, workflows where you remain in control of each step are the better choice.
Platforms
This brings us to some platforms that are designed to summarise a range of genres and formats.
FileReadyNow condenses your PDF files into concise summaries.
Genei.io enables you to find useful information sources for your content or upload your own webpages and PDFs. All such documents can be stored in projects and folders to ensure your work remains organised and neatly managed. Once you've uploaded or found the documents relevant to your topic area, you can use Genei's AI to extract key information from the articles instantly.
NotebookLM (Google) Upload PDFs, websites, YouTube videos, audio files, Google Docs, Google Slides and more, and NotebookLM will summarize them and make interesting connections between topics, all powered by the latest version of Gemini’s multimodal understanding capabilities.
Scholarcy.com Allows you to bulk import files in any format, to quickly build a collection of informative summaries and make sure you never lose another article. Identify and extract key facts and findings more easily with interactive highlights. Unlike TLDR, it has been trained to respect general academic writing. It may struggle with older styles of writing compared with an article written in the 21st century.
Write notes as you go and export them with your Flashcards to kick-start your lit review. The machine learning algorithms which Scholarcy is built upon were trained to work best on academic articles and individual book chapters. Scholarcy is not a writing tool and cannot write your literature review for you. It can however help you to read and synthesize your collection of research papers, and structure your thoughts in preparation for writing your lit review.
The platform can generate flashcards from videos. However, it was designed to work best on academic articles and book chapters, so depending on the available transcription, and content of the video, your results may vary.
LightPDF AI Summarizer is an online, AI-powered tool. Edit, convert, OCR, sign, annotate, chat with PDFs, and more. Accessible across desktop, mobile, and web platforms.
Key Features and Benefits
- Instant Summarization: Transforms lengthy documents into brief, actionable summaries, highlighting essential information.
- AI Chat with PDF: Enables users to ask specific questions about the document content, facilitating deeper understanding.
- Versatile Functionality: Beyond summarizing, it supports OCR (optical character recognition), editing, and converting PDFs to other formats.
All of these platforms carry with them the caution that the output tends to flatten historiographical debate, arguments and case-studies. If you want broad trends, use the AI. If you want that unique perspective and rich detail, you’ll have to plan time to do the task yourself.
Hopefully that walk-through will give you some ideas about whether to proceed with using AI for summarising and identifying large bodies of work.
You can keep the conversation going, or submit a new question for the series, by contacting me at colourfulhistories@gmail.com.

