Part One: AI Today and How to Use AI in Genealogy Research - Legacy Tree (2024)

AI technology impacts every area of our lives today and is growing so quickly that it can be difficult to keep up, especially with AI in genealogy research, which still feels so new and unknown. We spent some time with Steve Little, an expert in the AI and genealogy space, co-host of The Family History AI Show podcast, and AI educator with NGS, to discuss the future of AI and genealogy.

In part one you will learn about the history of AI, how it has developed to this point, and how to effectively use AI.

Part two will cover use cases for genealogy and how to save time and research more accurately with AI tools.


LTG:What's your background with AI and how did it catch your attention?

Steve:

There's kind of a Day Zero, just about 18 months ago, when something huge happened. I think when we look back at world history, December of 2022 is going to be a date that sticks out. Eighteen months ago, when OpenAI released this product called ChatGPT, it got many people excited, especially if they had peculiar interests. And the three peculiar interests that I have had for more than 40 years are language, technology, and genealogy. When this new tool became available, it captured my attention immediately.

I grew up loving technology. As a teenager in the 1980s, I had one of the first personal computers, a Commodore 64. And I’ve always loved language.

I had an aunt who was into genealogy. She was a serious genealogist in the fifties, sixties and seventies, and I started doing data entry for her in the 1980s with a DOS version of Family Tree Maker. And just by osmosis I learned and came to love genealogy and the genealogical database technology.

When this tool, ChatGPT, became available 18 months ago, it immediately grabbed my attention and the attention of several hundred million other people within a month. OpenAI went from zero users to a hundred million users in one month and no company had ever done anything remotely that fast before. It got a lot of people's attention.

AI Explained In Basic Terms

LTG:What exactly is AI, or artificial intelligence?

Steve:

Let's start from the big idea and get a bit more specific because the phrase artificial intelligence is an umbrella term. There are about 12 different fields of study that would fit inside artificial intelligence. If you talk to a computer scientist, they've been talking about artificial intelligence for more than 50 years.

But in the past 18 months, ordinary people who are talking about artificial intelligence are referring to something very specific and new. Within the broader field of artificial intelligence, one aspect deals with language. This field is called natural language processing and involves teaching computers how to talk and listen, read and write. That’s been around for a long time. Over the past 50 years we’ve seen incremental improvements in how computers can just talk and listen.

But something happened 18 months ago: incremental improvement reached a tipping point, so it wasn't just an incremental improvement anymore; it was as if a light switch had been flipped and it went from darkness to light. And what happened was not just an incremental improvement in how this tool talks and listens, but a leap in its ability to be significantly more useful to everyday people.

This usefulness is not just due to the technology of language processing, but also due to the interface. For the past 18 months or so, the interface to talk to these new tools most closely resembles sending text messages back and forth. That's why they call it a chatbot. So, under the umbrella of artificial intelligence, you have natural language processing (“NLP”).

Beneath NLP you have large language models (“LLMs”). These are the newer tools like ChatGPT that have become much more useful. Now, LLMs weren't invented just 18 months ago either. But something significant happened about seven years ago, in 2017. Google – the same Google we all know and love or hate or both – invented something new.

They invented a new way for these natural language processing systems to talk and listen. They developed a new way to make these better, called the transformer. That's the T in GPT. The tool became much better at talking, listening, and processing language. Given a string of words, the transformer is very good at picking – generating – the next word to continue that string.

Companies like OpenAI took this transformer technology that Google released seven years ago and developed it, and then 18 months ago they released a commercial product called ChatGPT, such that we can chat with it. And behind the chatbot is the large language model, GPT-4.

We use the chatbot called ChatGPT, but the brain behind it is a large language model, GPT-4, and that's what's so very good at processing language. And now there are other big companies who are also releasing their own large language models.

  • OpenAI created ChatGPT, which most people are quite familiar with.
  • Google has one called Gemini that is very strong.
  • Anthropic, founded by former OpenAI folks, has one called Claude that is very good.
  • Meta (Facebook) has released Meta AI in its Facebook, Instagram, and WhatsApp platforms.

To answer the question: what is it that makes these AI tools work?

It's a computer algorithm that is exquisitely good at talking and listening. It's not alive, it's not really talking or listening. It's in many ways a computer program like every other you've ever used. But in some other ways, it's unlike any computer tool you've ever used before. It's significantly different.

It is so good at language that it fools people. Most users will have an experience where they're conversing with this machine, typing back and forth, and it responds in such a humanlike way that it takes your breath away for a moment. Even if you've been using it for a year and a half, it can still surprise you what these new emergent capabilities can produce.

How AI “Thinks”


LTG: Where does the LLM get its data to answer prompts?

Steve:

When you first use these machines, it almost feels like it might be doing research and looking up something for you, but that's not what it's doing. It is a useful oversimplification to say that it is just picking the next right word. For example, “Old McDonald had a [blank].” What's the next word?


“Farm.” Now, you are doing that from memory. You had somebody who loved you and read you nursery rhymes, and you have a memory of having heard that before. Well, computers have memories too, but not exactly like ours. When you hear that nursery rhyme beginning, you may have an image of a grandmother or parent come to mind, someone who read you those rhymes. That's not what's happening with these tools. They do not remember somebody reading them nursery rhymes. But neither are they looking up the answer.

Instead, what happened was, these LLMs have been “trained”: they have read just about everything that's ever been written and digitized over the last 6,000 years. Human beings have been writing things down for 6,000 years, and much of that has been digitized such that a computer can access it. And so, these language models have been shown all of that text from the past 6,000 years, and we say they have been “trained.” From all that text, an LLM can determine which words tend to come next to each other, so that when it hears the phrase “Old McDonald had a [blank],” it knows that near “Old McDonald” the word “farm” often comes up. It learns that “farm” and “Old McDonald” come together sometimes, and it assigns that relationship a number based on how frequently the words are found together. It's statistics and probability.

So that if you give it a phrase like “Old McDonald had a farm, and on this farm, he had a [blank],” what could come next? It could be a cow, a chicken, a sheep, or a goat. It knows it's likely one of those words, and it chooses using statistics and probability.
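To make that “next word by probability” idea concrete, here is a toy sketch in Python. The probabilities are made up purely for illustration; a real large language model learns billions of such statistical relationships from its training text rather than from a small hand-written table.

    import random

    # Made-up probabilities, purely for illustration. A real LLM learns these
    # statistical relationships from enormous amounts of training text.
    next_word_probs = {
        "Old McDonald had a": {"farm": 0.92, "cow": 0.03, "dog": 0.03, "plan": 0.02},
        "and on this farm he had a": {"cow": 0.35, "chicken": 0.30, "sheep": 0.20, "goat": 0.15},
    }

    def pick_next_word(prompt: str) -> str:
        """Choose the next word at random, weighted by how often it follows the prompt."""
        choices = next_word_probs[prompt]
        words = list(choices.keys())
        weights = list(choices.values())
        return random.choices(words, weights=weights, k=1)[0]

    print(pick_next_word("Old McDonald had a"))           # almost always "farm"
    print(pick_next_word("and on this farm he had a"))    # cow, chicken, sheep, or goat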

But it’s bigger than that, too. When you think about what a word is, these tools become much more powerful. Here is a demonstration: I can put an image inside your head right now. If I say the word “elephant”, whether you wanted to or not, you are now imagining an elephant. I transferred an idea from my mind into your mind. Words are ideas, concepts, meaning. This tool simulates the manipulation of ideas, concepts, and abstractions, similar to the way a spreadsheet processes numbers. With a spreadsheet you can put in a bunch of numbers, and it will add, subtract, multiply, and divide the numbers with great precision. This tool – the large language model – is good at manipulating words, which means it's simulating the manipulation of ideas and concepts, so that it can slice and dice words, concepts and ideas the way a spreadsheet uses addition, subtraction, multiplication, and division to manipulate numbers.

You could also ask the LLM, “What genre of literature is Old McDonald?” And then it would notice, “Oh, she used the word genre. She used the word literature,” and it would respond, “nursery rhyme.” It has appeared to generalize, reason, or think. As if it says, “Oh, okay, we're talking about something bigger than just Old McDonald. We're talking about genres of literature.” It has now zoomed out and it's “thinking” like a literature professor. But underneath – behind the wizard's curtain – it's just probabilities, the statistical relationships among and between words in a string.

When you ask it a question, it is not just a really good Google that searched for the answer. It's not doing anything remotely like that. It is just choosing the words that are likely to come next given all the words you've previously just spoken to it.

LTG:A term that is often heard in reference to AI is hallucination, meaning the information or responses coming from the tool are not based in reality. What does this mean?

Steve:

In a sense, it's all hallucinated. All the words are generated in its neural network, and the LLM's neural network is totally disconnected from reality. When we say it has a hallucination rate of 30%, or that it's accurate 70% of the time, or when we claim it is accurate 97% of the time, what we're claiming is that it's hallucinating only 3% of the time. But if it's actually hallucinating a hundred percent of the time, and 97% of its hallucinations correspond to our reality, then we say it's getting things right in the real world 97% of the time.

If you learn what the machines can and can't do, you can get the hallucination rate under 1%. There are best practices and if you follow these best practices – if you use these tools the way they're intended to be used within the capabilities that they have today – you can get it to correspond with reality 99% of the time.

The best practices are three:
1) Know your data
2) Know your model
3) Know its limits

These best practices require bringing data to the machine – knowing what the machine can do and only asking it to do what it can do. “Today's limits are today's limits,” as we say. New users want the chatbot to be a magic genie. But it takes people about 20 hours of using the tools to learn what they are actually good at doing.

How To Use AI in Genealogy Research


LTG:What should you be asking AI to get the most accurate answers in your genealogy research?

Steve:

First off, I encourage people to play. You learn best by playing. And you're not going to break these machines. Play and have fun. That's the best way to learn. But, when you're ready to do fact-based, reality-based, genealogical work, where evidence matters, where facts matter, then you want to pay attention to the limits of the tool and be very conscious and aware of your own expectations and what you're asking the LLM to do.

A year ago, we would have told genealogists not to use it for research at all, and last year that was good advice because it wasn't very good at research. Today, it is getting better, but it's still not trustworthy. I encourage beginners to ask themselves each time they use this tool, “Are you doing research? Are you asking this tool to tell you something you didn't already know?” And usually they say, “Well, of course, what else would I use it for?” But – today – they are still stepping onto thin ice.

There are about 20 million things you can do with AI tools other than research, but that doesn't occur to us. Google has warped our brains. Over the past 20 years we've learned that if we want to learn something we don't know, we go to Google and we type in a short phrase, and it gives us the answer we were looking for.

And we mistakenly think that's what this tool might be doing. We do the same thing as a Google search and the chatbot seems to respond. We ask it a question and it seems to give us an answer in perfect English. And so, we think maybe the answer is as perfect as its grammar, but it is not. It can get the grammar perfect without getting the reality correct. But there are ways to mitigate that: If you bring to the tool the information you want to work with – instead of asking it to show you something you don't already know – it will slice and dice language very well. You can give it information such as words, language, text, wills, probate files, chapters of a book, an article, and it can help you process that information in lots of different, useful, safe ways.

That's how you drive the hallucination rate below 1%. You say, “Let's just talk about this right here, this information I’m giving you [the chatbot] right now.” You're not asking it to go out and discover something new. You're bringing information to it. And when you do that, that's how it becomes useful genealogically to process information you've already got.
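For readers comfortable with a little code, here is a minimal sketch of that “bring your own information” pattern, assuming the OpenAI Python SDK and an API key; the file name, model name, and prompt wording are hypothetical placeholders to adapt to your own material, not a recommendation of any particular product.

    # A minimal sketch, assuming the OpenAI Python SDK (pip install openai) and an
    # API key set in the OPENAI_API_KEY environment variable. The file name, model
    # name, and prompt are placeholders -- substitute your own record and model.
    from openai import OpenAI

    client = OpenAI()

    # Bring your own data: paste the record you already have into the prompt.
    with open("1843_will_transcription.txt", encoding="utf-8") as f:
        will_text = f.read()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whichever model is available to you
        messages=[
            {
                "role": "system",
                "content": "You are a careful genealogy assistant. Answer only from the document provided.",
            },
            {
                "role": "user",
                "content": "Summarize this will in two paragraphs and list every person named in it:\n\n" + will_text,
            },
        ],
    )

    print(response.choices[0].message.content)

Because the model is working only from the text you supplied, its answers stay anchored to your document instead of to whatever it might generate on its own.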

And genealogists have boxes, folders, cabinets, shelves, closets, basements, and external hard drives full of information. You've been collecting data for as long as you've been doing genealogy.

So now you have a very smart intern or assistant to help you process that data and information that you've been collecting for as long as you've been doing family history.

Now there's somebody to help you make sense of that.

Now there's somebody to help you find the needle in the haystack.

Now there's somebody to help you condense 800 pages of information you need distilled into two pithy paragraphs. It'll do that in 20 seconds, and that's hugely powerful.

LTG:What were some of your initial ideas about how you could use AI for genealogy research?

Steve:

Over the past 18 months, I've spent about 700 hours trying to figure that out. I spend about 20 hours a week just trying things. Does this work and does this not work? And I fail 90% of the time, so I discover many, many things that it cannot do today. I've been stunned to see there were things we could not do a year ago or even six months or three months ago that we can do today. That's exciting, seeing how fast these tools are getting better and more useful. But just trying things, seeing what it could do and what it failed to do, and mapping that out, has been what I've spent a huge part of the last 18 months doing.

In part two we will discuss use cases for AI in genealogy and how you can save time and become more accurate by using these tools the right way.

If you'd like help with your genealogy research, contact us to ask your questions and get a free quote!
