Packed inside every cell in your body is a set of genetic instructions, 3.2 billion base pairs long. Deciphering these directions would be a monumental task but could offer unprecedented insight into the human body. In 1990, a consortium of 20 international research centers embarked on the world’s largest biological collaboration to accomplish this mission. The Human Genome Project proposed to sequence the entire human genome over 15 years with $3 billion of public funds.
Then, seven years before its scheduled completion, a private company called Celera announced that they could accomplish the same goal in just three years and at a fraction of the cost. The two camps discussed a joint venture, but talks quickly fell apart as disagreements arose over legal and ethical issues of genetic property. And so the race began. Though both teams used the same technology to sequence the entire human genome, it was their strategies that made all the difference.
Their paths diverged in the most critical of steps: the first one. In the Human Genome Project’s approach, the genome was first divided into smaller, more manageable chunks about 150,000 base pairs long that overlapped each other a little bit on both ends. Each of these DNA fragments was inserted into a bacterial artificial chromosome, where it was cloned and fingerprinted.
The fingerprints showed scientists where the fragments overlapped, without anyone needing to know the actual sequence. Using the overlapping bits as a guide, the researchers marked each fragment’s place in the genome to create a contiguous map, a process that took about six years. The cloned fragments were sequenced in labs around the world following one of the project’s two major principles: that collaboration on our shared heritage was open to all nations. In each case, the fragments were arbitrarily broken up into small, overlapping pieces about 1,000 base pairs long.
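The mapping step above, placing fragments by their fingerprints alone, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the consortium's actual software: it assumes a fingerprint is the set of piece sizes a clone yields when cut by a restriction enzyme, that two clones sharing enough sizes overlap, and that the clones form one unbranched chain. The clone names, sizes, and the `min_shared` threshold are all made up.

```python
from itertools import combinations

# Hypothetical fingerprints: each clone maps to the set of piece sizes
# (in base pairs) it yields when cut by a restriction enzyme.
fingerprints = {
    "clone_A": {1200, 3400, 560, 7800, 2100},
    "clone_B": {7800, 2100, 4300, 950, 6100},
    "clone_C": {950, 6100, 3300, 1500, 8800},
}

def shared_bands(a, b):
    """Count the piece sizes that two fingerprints have in common."""
    return len(fingerprints[a] & fingerprints[b])

def likely_overlaps(min_shared=2):
    """Return clone pairs sharing at least min_shared piece sizes."""
    return [
        (a, b)
        for a, b in combinations(sorted(fingerprints), 2)
        if shared_bands(a, b) >= min_shared
    ]

def order_clones(min_shared=2):
    """Chain clones into a linear order by walking overlaps from one end."""
    neighbors = {name: set() for name in fingerprints}
    for a, b in likely_overlaps(min_shared):
        neighbors[a].add(b)
        neighbors[b].add(a)
    # In a simple chain, an end clone overlaps exactly one other clone.
    ends = sorted(n for n, nb in neighbors.items() if len(nb) == 1)
    if not ends:
        raise ValueError("no chain end found; toy data assumed to be linear")
    order, seen = [ends[0]], {ends[0]}
    while True:
        unvisited = [n for n in sorted(neighbors[order[-1]]) if n not in seen]
        if not unvisited:
            break
        order.append(unvisited[0])
        seen.add(unvisited[0])
    return order

print(likely_overlaps())  # [('clone_A', 'clone_B'), ('clone_B', 'clone_C')]
print(order_clones())     # ['clone_A', 'clone_B', 'clone_C']
```

Note that no DNA letters appear anywhere in the sketch: the order comes purely from which clones share pieces, which is the point of building the map before sequencing.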
Then, using a technology called the Sanger method, each piece was sequenced letter by letter. This rigorous map-based approach, called hierarchical shotgun sequencing, minimized the risk of misassembly, a huge hazard of sequencing genomes with many repetitive portions, like the human genome. The consortium’s “better safe than sorry” approach contrasted starkly with Celera’s strategy, called whole genome shotgun sequencing. It hinged on skipping the mapping phase entirely, a faster, though foolhardy, approach according to some.
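Both strategies ultimately rebuild a longer sequence from short, overlapping pieces. Here is a minimal Python sketch of that idea, assuming a simple greedy rule: repeatedly merge the two pieces with the longest exact overlap. The 20-letter `genome`, the four `reads`, and the `min_len` threshold are toy values chosen for illustration. Real assemblers, including the ones used by the two teams, are far more sophisticated and must cope with sequencing errors.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)  # (overlap length, left index, right index)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Toy "genome" and four overlapping reads taken from it, in scrambled order.
genome = "ATGGCGTGCAATCCGTTAGC"
reads = ["CAATCCGT", "ATGGCGTG", "CCGTTAGC", "CGTGCAAT"]

print(greedy_assemble(reads))  # ['ATGGCGTGCAATCCGTTAGC']
```

Even in this tiny example, the read `CAATCCGT` ends with `CGT`, which is also how `CGTGCAAT` begins, so the two look like they overlap although they are not adjacent in that order. The greedy rule ignores this because longer overlaps win, but when a long stretch repeats throughout a genome, such false matches become hard to tell from real ones. That is the misassembly risk the map was meant to contain.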
The entire genome was directly chopped up into a giant heap of small, overlapping bits. Once these bits were sequenced via the Sanger method, Celera would take the formidable risk of reconstructing the genome using just the overlaps. But perhaps their decision wasn’t such a gamble because guess whose freshly completed map was available online for free?
The Human Genome Consortium’s, in accordance with the project’s second major principle, which held that all of the project’s data would be shared publicly within 24 hours of collection. So in 1998, scientists around the world were furiously sequencing lines of genetic code using the tried and true, yet laborious, Sanger method. Finally, after three exhausting years of continuous sequencing and assembling, the verdict was in. In February 2001, both groups simultaneously published working drafts of more than 90% of the human genome, several years ahead of the consortium’s schedule. The race ended in a tie.
The Human Genome Project’s practice of immediately sharing its data was an unusual one. It is more typical for scientists to closely guard their data until they are able to analyze it and publish their conclusions. By sharing openly instead, the Human Genome Project accelerated the pace of research and created an international collaboration on an unprecedented scale.
Since then, robust investment in both the public and private sectors has led to the identification of many disease-related genes and remarkable advances in sequencing technology. Today, a person’s genome can be sequenced in just a few days. However, reading the genome is only the first step. We’re a long way away from understanding what most of our genes do and how they are controlled. Those are some of the challenges for the next generation of ambitious research initiatives.