Identifying the Genomic Basis of Biological Variation: The Perils of Biased Representation in Genomic Data Sets

By Diane P. Genereux

The Perils of Biased Representation in Genomic Data Sets

Whole-genome data from individuals of European ancestry predominate among publicly available data sets, while data from individuals whose ancestry traces to other regions are markedly underrepresented. As Latrice Landry and colleagues discuss in “Lack of Diversity in Genomic Databases Is a Barrier to Translating Precision Medicine Research into Practice,” this imbalance limits the power of existing approaches to discover relevant genetic changes that may occur at higher frequency in populations with historical roots outside Europe. It raises the specter of results that are less useful or, worse, misleading for individuals with some ancestries. The National Human Genome Research Institute, part of the National Institutes of Health, offers an overview of this issue on its website (Diversity in Genomic Research), including interviews with genomics researchers who provide a glimpse into the origins and implications of inequitable ancestry representation. A 2020 review in Nature Reviews Genetics (Amy McGuire et al., “The Road Ahead in Genetics and Genomics”) presented comments from twelve researchers on strategies for addressing these inequities. Additionally, in “Increasing Diversity in Genomics Requires Investment in Equitable Partnerships and Capacity Building,” Alicia Martin and colleagues identify collaborations between researchers in higher- and lower-resourced parts of the world as essential for improving equity in genomics research.