In 2001, the first draft of the human genome was published. It consists of an extremely long succession of the four letters, ATGC, that make up our genetic code. Think of this published genome as the reference picture on a puzzle box: Scientists worldwide can compare the genome of any person to the reference and thereby identify important differences.
However, eight percent of the human reference genome is still missing. These missing pieces lie in regions full of repeating letters, which were hard to decipher with the technology available in 2001. Yet these regions are involved in crucial cellular processes, and deciphering them could enhance our understanding of a multitude of genetic conditions.
In May 2021, two decades after the first draft of the human genome was published, an international group of scientists reported in a preprint that they had sequenced those missing pieces, creating an almost complete human reference genome.
To achieve this milestone, the scientists combined two new sequencing technologies that read longer stretches of DNA at a time. As in a puzzle, larger pieces are easier to put together, which allowed the researchers to determine the sequences of the highly repetitive regions that had been missing from the reference genome.
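To see why longer pieces help with repetitive regions, consider a toy illustration. The short Python sketch below is not the researchers' method; the made-up sequence, the function name `count_ambiguous_reads`, and the read lengths are all assumptions chosen for demonstration. It counts how many reads of a given length appear more than once in a small "genome" and therefore cannot be placed unambiguously.

```python
def count_ambiguous_reads(genome: str, read_length: int) -> int:
    """Count reads of the given length whose sequence occurs more than once."""
    # Take every possible read of this length, one per start position.
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]

    # Tally how often each read sequence appears in the genome.
    occurrences = {}
    for read in reads:
        occurrences[read] = occurrences.get(read, 0) + 1

    # A read seen more than once could belong to several positions.
    return sum(1 for read in reads if occurrences[read] > 1)


# Hypothetical toy genome: two unique flanks around a 20-letter repeat.
toy_genome = "ACGTTGCA" + "AT" * 10 + "GGCTCAGT"

short_ambiguous = count_ambiguous_reads(toy_genome, 4)   # shorter than the repeat
long_ambiguous = count_ambiguous_reads(toy_genome, 30)   # longer than the repeat

print("Ambiguous short reads:", short_ambiguous)
print("Ambiguous long reads:", long_ambiguous)
```

In this toy case, the short reads that fall inside the repeat look identical to one another, while every long read reaches into a unique flank and can be placed. Real genome assembly is far more involved, since it must also cope with sequencing errors and repeats that are much longer.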
They sequenced the genome of a cell line derived from cells that form when a sperm fertilizes an egg with no nucleus. Since these cells contain genetic material only from the father, the scientists did not need to distinguish between chromosomes from the two parents. However, the sperm from which the cell line was derived carried only an X chromosome. Consequently, the new reference genome lacks the sequence of the Y chromosome. The researchers are already in the process of sequencing this last missing puzzle piece as well.
If these results pass the thorough peer-review process, they will be a major step forward in completing the reference picture of our genetic puzzle.