The advances in molecular biology that followed the discovery of the structure of DNA were formidable, and they underlie much of what we know about living things, their cells and their systems. DNA sequencing was described in the late 1970s and, since then, methods for determining the order of the four bases (the letters A, C, T and G) have evolved remarkably. In my postdoctoral years, in the early 1990s, sequencing was an almost artisanal process. It was a great joy to end the week with about 500 bases (500 letters A, C, T and G) sequenced.
Between 1999 and 2001, I was one of the coordinators of the FAPESP / Instituto Ludwig genome project in Brazil. Across the 10 research centers involved, the project sequenced about 3 million bases per week. In a worldwide effort, it took us about a decade to publish, in 2001, the complete sequence of the human genome, which has about 3 billion bases. Brazil was part of this effort and made relevant contributions. Today, thanks to technological advances and bioinformatics, it is possible to sequence a complete human genome in two days.
As a reminder, DNA is a double-stranded molecule built from four bases that pair in a complementary way (A always with T, and C always with G). Sequencing, whether by the earliest methods or with the most modern equipment, relies on a chemical reaction that synthesizes a complementary strand from an already known DNA sequence, which functions as a template.
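The pairing rule can be made concrete with a few lines of Python. This is only a minimal sketch: the function names are invented for illustration, and it relies on one fact not stated above, namely that the two strands of DNA run in opposite directions, so the complementary strand is read in reverse.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(template):
    """Return the strand synthesized against `template`, read in its own direction.

    Illustrative helper (hypothetical name). Each base is paired, and the
    result is reversed because the two strands are antiparallel.
    """
    return "".join(PAIRING[base] for base in reversed(template.upper()))


def template_from_new_strand(new_strand):
    """Recover the template from the newly synthesized strand.

    Complementarity is symmetric, so the same operation maps back.
    """
    return complementary_strand(new_strand)


if __name__ == "__main__":
    template = "ATGC"
    new_strand = complementary_strand(template)
    print(new_strand)                            # GCAT
    print(template_from_new_strand(new_strand))  # ATGC
```

Because the rule is symmetric, reading either strand is enough to know the other, which is why determining the sequence of the synthesized strand reveals the sequence of the template.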
In the most modern methods, each of the four bases is labeled with a reagent that, when incorporated during the synthesis of the new strand, emits a specific color, which is read by the equipment. This makes it possible to determine the exact sequence of bases in the new strand.
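The reading step can be pictured as a simple lookup from color to base, one color per synthesis cycle. The sketch below is illustrative only: the color assignments and the function name are assumptions made up for the example, and real instruments use their own chemistry and signal processing.

```python
# Hypothetical dye assignment, chosen only for illustration.
COLOR_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def call_bases(colors):
    """Translate the colors observed, one per synthesis cycle, into bases.

    An unrecognized color is reported as "N", the conventional symbol
    for a base that could not be determined.
    """
    return "".join(COLOR_TO_BASE.get(color, "N") for color in colors)


if __name__ == "__main__":
    observed = ["green", "blue", "blue", "red", "purple", "yellow"]
    print(call_bases(observed))  # ACCTNG
```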
RNA is a single-stranded molecule, also formed by four bases: A, U, C and G. With the aid of the enzyme reverse transcriptase (discovered in retroviruses, such as HIV), we can use RNA as a template to create a molecule of complementary DNA (cDNA), and this cDNA can be sequenced as described above. In this way, we can also determine the sequence of the RNA bases.
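In terms of letters, reverse transcription is again a pairing exercise, with U in the RNA pairing with A in the cDNA. The following is a minimal sketch with invented function names; it models only the first cDNA strand and, as with DNA, assumes the copy runs in the opposite direction to its template.

```python
# Pairing between an RNA template and the cDNA strand copied from it.
RNA_TO_CDNA = {"A": "T", "U": "A", "C": "G", "G": "C"}
# The inverse mapping, used to read the RNA sequence back from the cDNA.
CDNA_TO_RNA = {"T": "A", "A": "U", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the first-strand cDNA for `rna` (hypothetical helper)."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


def rna_from_cdna(cdna):
    """Infer the original RNA sequence from a sequenced cDNA strand."""
    return "".join(CDNA_TO_RNA[base] for base in reversed(cdna.upper()))


if __name__ == "__main__":
    rna = "AUGC"
    cdna = reverse_transcribe(rna)
    print(cdna)                 # GCAT
    print(rna_from_cdna(cdna))  # AUGC
```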
Today, sequencing is a commodity. The machines that perform it become obsolete very quickly, and the cost of the reagents is inversely proportional to the volume of DNA sequenced. The challenge is no longer to sequence, but to transform the “raw” data coming out of the machines into useful information that correctly represents the sequenced DNA.
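One small example of that transformation is discarding reads the machine itself considers unreliable. The sketch below assumes the raw data arrives as FASTQ text with Phred+33 quality characters, a widely used format that this article does not itself specify; the function names and the threshold of 20 are likewise assumptions chosen for illustration, not a recommended pipeline.

```python
def phred_scores(quality_string, offset=33):
    """Decode a quality string into Phred scores (Phred+33 encoding assumed)."""
    return [ord(char) - offset for char in quality_string]


def parse_fastq(text):
    """Yield (name, sequence, quality) from simple four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        yield header[1:], sequence, quality  # drop the leading "@"


def keep_high_quality(text, min_mean_quality=20.0):
    """Keep reads whose mean Phred score reaches the (assumed) threshold."""
    kept = []
    for name, sequence, quality in parse_fastq(text):
        scores = phred_scores(quality)
        if scores and sum(scores) / len(scores) >= min_mean_quality:
            kept.append((name, sequence))
    return kept


if __name__ == "__main__":
    raw = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
    print(keep_high_quality(raw))  # [('read1', 'ACGT')]
```

Here "I" decodes to a score of 40 and "!" to a score of 0, so only the first read survives. Real analyses go much further (trimming, alignment, assembly, error correction), but the principle is the same: the letters coming off the machine are evidence to be weighed, not yet the final sequence.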