What is ancestry?

If you’ve ever used a commercial genetic testing service like 23andMe, AncestryDNA, MyHeritage, FamilyTreeDNA…, you’re probably familiar with the concept of “genetic ancestry.” These companies send you a saliva kit and, after you mail it back, they provide you with a report that shows you what percentage of your DNA can be traced back to different populations around the world. On the surface, it seems like determining this information should be a straightforward process: just analyze someone’s genome and use some advanced statistics to generate numbers like “97% French and German” or “56% Germanic Europe, 6% Ireland” (which are actually from my own 23andMe and AncestryDNA reports, respectively). However, when you delve more deeply into the topic of “ancestry inference,” you quickly realize that it involves not just statistics and genetics, but also sociology and psychology.

What is “ancestry inference” really trying to accomplish?

If you ask people to describe their ancestry, you’ll likely get answers that fall into one of two categories: many people will use geographic labels like “German” or “Hawaiian,” while others will use ethnic labels like “African” or “Caucasian.” It seems reasonable to assume that when people take a “genetic ancestry test,” they want it to predict the geographic and/or ethnic labels of their ancestors. However, if you try to write an algorithm that does this, you’ll quickly encounter two major, largely insurmountable problems.

  • The first problem is that “ancestry” is highly subjective and means different things to different people. For some, ancestry is closely tied to cultural identity, while for others it is more closely linked to physical appearance. Any algorithm that infers ancestry therefore has to account for these subjective and often conflicting definitions.
  • The second problem is that the human genome is an incredibly complex and dynamic system, and the genetic markers used to infer ancestry are only a tiny fraction of the genetic information it contains. Any algorithm that infers ancestry therefore has to analyze and interpret a vast amount of genetic data accurately in order to produce reliable estimates.

Given these challenges, it’s no surprise that the field of ancestry inference is still a very active area of research, with new approaches and algorithms being developed all the time.

What time frame should be considered when determining ancestry?

A further question is how far back in time we should look when determining ancestry. Some people think of their recent ancestors, such as those from the past 100 years, while others think of ancestors from hundreds of years ago, such as those who immigrated to the United States. It is not clear what time frame people generally have in mind, and it may depend on an individual’s own ancestry. Because the answer depends on a choice of time frame that DNA itself does not make, ancestry cannot be objectively determined solely from DNA.
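The dependence on time frame can be made concrete with a toy example. The sketch below is purely illustrative: the pedigree, its labels, and the function name label_shares are invented here and do not represent any testing company’s method. It counts genealogical ancestors by label at a chosen depth (each of the 2^g ancestors at generation g counts equally), which is not the same thing as measuring inherited DNA.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical pedigree: generation depth -> labels of all ancestors at that depth.
# Depth 2 = grandparents (4 people), depth 3 = great-grandparents (8 people).
TOY_PEDIGREE = {
    2: ["United States"] * 4,
    3: ["United States"] * 4 + ["Germany"] * 3 + ["Ireland"],
}


def label_shares(pedigree, depth):
    """Return each label's share of the genealogical ancestors at one depth."""
    labels = pedigree[depth]
    expected = 2 ** depth  # a person has 2**depth ancestors at this depth
    if len(labels) != expected:
        raise ValueError(
            f"depth {depth} needs {expected} ancestors, got {len(labels)}"
        )
    counts = Counter(labels)
    return {label: Fraction(n, expected) for label, n in counts.items()}


# Print the label shares for every depth recorded in the toy pedigree.
for depth in sorted(TOY_PEDIGREE):
    print(depth, label_shares(TOY_PEDIGREE, depth))
```

In this made-up family, the same person is entirely “United States” when described by grandparents, but half “United States” and half European when described one generation further back. Neither answer is wrong; they simply answer the question at different time depths.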

Ancestry labels can be influenced by social and political factors

Changes in religion or the annexation of a territory, for example, can affect how individuals and their descendants think of their ancestry, because language and traditions are transmitted culturally. The creation of a shared ancestral identity has often been used to consolidate political power over different cultures. Such changes may not become visible in the genetic record for hundreds or thousands of years, unless they affect marriage and migration patterns.

Ideally, to fully understand one’s ancestry, it would be helpful to have a detailed list of ancestors from different time periods, each labeled with their geographic location and ethnic identity. However, genetic tests are not particularly reliable for obtaining this information. As a result, companies that offer genetic ancestry testing try to approximate the general geographic regions where an individual’s ancestors likely lived and, in some cases, their ethnic identities. This approximation refers to an indeterminate time in the past, likely a few hundred years ago. While this approach may not provide exact or objective results, it can still be meaningful and provide valuable insight into an individual’s ancestry. Many people have purchased these tests and have learned about previously unknown aspects of their family history, or have discovered hospital mix-ups and the genetic legacy of slavery in their own genomes.