Right now artificial intelligence systems are being trained on massive amounts of historical text.
Books. Newspapers. Legal documents. Government records. Academic papers. Websites. Everything that has been digitized and made available is potentially feeding into AI systems that will shape how future generations access information and understand the world.
That sounds like a good thing. And in many ways it is.
But there is a serious problem buried inside it.
The historical record that AI is learning from has all the same biases and gaps that the historical record has always had. It was produced mostly by educated people with access to publishing and record-keeping systems. It reflects the perspectives of the powerful more than the powerless. It contains the voices of wealthy people and institutions far more than the voices of poor people and ordinary workers.
When AI learns from that record it learns those biases too.
What This Looks Like in Practice
AI systems trained on historical text tend to associate certain kinds of language and certain kinds of people with certain outcomes. If the historical record mostly contains stories about wealthy white men making decisions and achieving things, the AI learns patterns that reflect that. It does not know that the record was incomplete. It just learns what the record says.
This has already produced documented problems: AI hiring tools that discriminated against certain demographic groups because the historical hiring data they were trained on reflected historical discrimination; AI image generators that produce stereotyped or unrepresentative images of certain groups because the images they were trained on were not representative; AI language models that handle some dialects and languages better than others because the text they were trained on was not equally distributed.
These are not just technical problems. They are historical problems built into technical systems.
What Needs to Happen
The historical record that AI learns from needs to be expanded. That means digitizing and including more documents, more voices, and more perspectives that have historically been underrepresented.
Oral histories need to be transcribed and made available. Documents from underrepresented communities need to be digitized and included. The perspectives of working people, poor people, women and minority communities need to be part of what AI systems learn from.
This is not just about fairness in the abstract. It is about building AI systems that actually understand the full range of human experience. A system that only knows part of the story will only be able to reason about part of the world.
What You Can Do Right Now
Every document, photograph, oral history recording, or personal account that you digitize and make publicly available is potential training data for future AI systems. Your contribution to the historical record is a contribution to what AI will learn.
Publish your own story on a blog or website. Upload documents to archive.org. Contribute photographs to public archives. All of these actions expand the record that AI will learn from.
The future of AI is being shaped right now by decisions about what gets preserved and digitized. Ordinary people can be part of making those decisions go in the right direction.
Robert Lee Beers III is a writer and digital preservation advocate based in North Charleston, South Carolina.