The sequencing, or reading, of the four nucleotides A (adenine), T (thymine), C (cytosine) and G (guanine) in human DNA, completed under the Human Genome Project in 2003, is considered a landmark event in the development of genomics. The project's complexity stemmed from the fact that the haploid human genome (a single set of chromosomes rather than paired sets) contains about 3 billion nucleotide base pairs. The project started in 1990 and cost about $2.7 billion.
Sequencing has many benefits in medicine, as it helps us understand an individual's risk of a particular disease based on his or her genome. This enables preventive care as well as customized treatment after the onset of disease, increasing effectiveness and decreasing side effects. It also makes it possible to correlate a gene or group of genes with a human trait (eye color, facial features) or a disease (Alzheimer’s, Huntington’s). Additionally, sequencing the genomes of various organisms aids the study of evolutionary timelines, which can support or replace timelines derived from fossils.
These medical benefits can be realized only if the technology permits quick and cost-effective sequencing of any person's genome. Current techniques break the DNA into small pieces that are easier to read and then reassemble them by searching for common patterns (overlaps) at their ends. The reassembly step is highly computationally intensive and needs hundreds of gigabytes of memory. In addition, the entire genome is sequenced about 30 times over to reduce errors. Even so, it is now possible to sequence a genome in a few days at a cost of a few thousand dollars, and growth in IT is helping to improve sequencing techniques further. Storing one human genome takes a few gigabytes, so sequencing the estimated 7.5 billion people on earth would require storage on the order of exabytes (10^18 bytes).
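The fragment-and-reassemble idea described above can be illustrated with a toy sketch. It is not how production assemblers work: it assumes error-free reads, a single strand, and no repeated regions, and it uses a simple greedy strategy that merges whichever two pieces share the longest end overlap. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, chosen only for this illustration.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the two pieces with the largest end overlap.

    Stops when one piece remains or no pair overlaps by `min_len` bases.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        # Compare every ordered pair; this quadratic search is one reason
        # real assembly is so demanding at genome scale.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping fragments of a made-up 14-base sequence.
print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
# prints ['ATGCGTACGTTAGC']
```

Even in this toy version every piece is compared against every other piece in each round, which hints at why reassembling billions of bases from short fragments needs so much memory and computing time.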
Understanding the roles of genes involves finding patterns in the genomes of people who share similar traits or diseases. While some diseases, e.g. sickle cell disease and cystic fibrosis, can be linked to a single gene, others, e.g. heart disease, diabetes and obesity, do not have a single genetic cause. Researchers studying Alzheimer's disease at Mayo Clinic, Florida used the Blue Waters supercomputer to investigate its genetic causes. Additionally, at least some of the DNA that does not code for proteins, the so-called “junk DNA”, regulates the function of genes. Genes constitute less than 3% of DNA. Deciphering “junk DNA” also needs massive computing resources, as scientists subject tissues to various conditions and study which parts of the DNA code are activated in which cells and at what times.
Comparative genomics focuses on the genomic features of different organisms and uses them to find evolutionary relationships. Again, finding patterns and differences needs huge computational resources. A research team at the University of Texas used a supercomputer at the university to study snake evolution. One project, the Earth BioGenome Project (EBP), aims to sequence all living organisms on earth. It will start with one member of each family of eukaryotes (organisms whose cells have a membrane-bound nucleus), giving an initial set of about 9,000 species, and will eventually extend to 1.5 million species. Computing power and storage space are critical here, as genome sizes reach up to 150 Gbases, e.g. for Paris japonica.
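As a rough check on the storage figures quoted in this article, the arithmetic can be sketched as follows. The encodings are assumptions made only for illustration: one byte per base (plain text) or two bits per base (packed), with no quality scores, metadata, or repeated coverage, all of which would raise the real totals. The helper name `raw_genome_bytes` is hypothetical.

```python
def raw_genome_bytes(num_bases: int, bits_per_base: int = 8) -> int:
    """Bytes needed to store `num_bases` bases, rounded up to a whole byte.

    Assumption: sequence only, no quality scores, metadata, or compression.
    """
    return (num_bases * bits_per_base + 7) // 8


HUMAN_BASES = 3_000_000_000             # about 3 billion bases (haploid)
PARIS_JAPONICA_BASES = 150_000_000_000  # about 150 Gbases
WORLD_POPULATION = 7_500_000_000        # estimate used in the article

GB = 10**9
EB = 10**18

human_text = raw_genome_bytes(HUMAN_BASES)       # one byte per base
human_packed = raw_genome_bytes(HUMAN_BASES, 2)  # two bits per base
everyone = human_text * WORLD_POPULATION
paris_text = raw_genome_bytes(PARIS_JAPONICA_BASES)

print(f"Human genome, 1 byte/base: {human_text / GB} GB")
print(f"Human genome, 2 bits/base: {human_packed / GB} GB")
print(f"7.5 billion human genomes: {everyone / EB} EB")
print(f"Paris japonica, 1 byte/base: {paris_text / GB} GB")
```

Under these assumptions one human genome comes to about 3 GB and the world population to about 22.5 EB, consistent with the "few gigabytes" and "exabytes" orders of magnitude, while a single Paris japonica genome is roughly fifty times the size of a human one.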
The study of genomics draws on many areas of IT, e.g. data science, data compression, analytics, statistics, machine learning and security. Unlocking the full benefits of genomics will challenge existing technologies for data storage, memory and computing power. In turn, improvements in IT will improve genomics technologies by cutting down the most important factor: time.