Genome annotation is the process of identifying the locations of genes and all coding regions in a genome and determining what those genes do. It is needed because DNA sequencing produces sequences of unknown function. Over the last three decades, genome annotation has progressed from computational annotation of long protein-coding genes on single genomes, and experimental annotation of short regulatory elements on a small number of them, to population-level annotation of single nucleotides across thousands of individual genomes. This increased resolution and inclusivity of genome annotations (from genotypes to phenotypes) is yielding more precise insights into the biology of species, populations, and individuals. Genome annotation involves three major steps:

  • Identifying regions of the genome that do not encode proteins;
  • Identifying elements in the genome, a process known as gene prediction; and
  • Attaching biological information to these elements.
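
To make the gene prediction step concrete, here is a minimal Python sketch that scans the three forward reading frames of a DNA string for open reading frames (ORFs): a start codon `ATG` followed in frame by a stop codon. This is a deliberately naive illustration, not how production gene predictors work; real tools also consider the reverse strand, splicing, and statistical models. The function name `find_orfs` and the `min_codons` parameter are assumptions made for this example.

```python
# Naive ORF scan on the forward strand only (illustrative sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs of ORFs, 0-based and end-exclusive.

    An ORF here runs from an ATG to the first in-frame stop codon.
    min_codons counts every codon in the ORF, including start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # close this ORF and look for the next ATG
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```

Each reported span could then be passed to the third step, where biological information is attached to the predicted element.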

Annotation files store information about features of a genomic sequence. FASTA, GFF3, and GenBank are a few examples of file formats. Various file formats exist for representing sequence, structure, and pathway information about genes and proteins, and online databases allow users to select and download specific files.
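
As an illustration of what such a file holds, the sketch below reads GFF3 text using only the Python standard library. GFF3 records have nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), and the attributes column is a semicolon-separated list of `key=value` pairs. The names `Feature`, `parse_gff3`, and the sample records are invented for this example, and the sketch skips details such as URL-decoding attribute values, so treat it as a starting point rather than a complete parser.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One GFF3 record (hypothetical container for this sketch)."""
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line):
    """Parse a single non-comment GFF3 line into a Feature."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attrs = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attrs)


def parse_gff3(text):
    """Parse GFF3 text, skipping blank lines and '#' directive/comment lines."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        features.append(parse_gff3_line(line))
    return features


# Made-up records for demonstration only.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "example", "gene", "1000", "2000", ".", "+", ".",
               "ID=gene1;Name=demoGene"]),
    "\t".join(["chr1", "example", "mRNA", "1000", "2000", ".", "+", ".",
               "ID=mrna1;Parent=gene1"]),
])

if __name__ == "__main__":
    for feature in parse_gff3(EXAMPLE_GFF3):
        print(feature.type, feature.start, feature.end, feature.attributes)
```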

Genome annotation is of two types: structural annotation, which identifies genomic elements, and functional annotation, which assigns functions to those structural elements.

Gene annotation algorithms can predict the genes or proteins encoded by a given genomic sequence. These newly predicted genes or proteins can then be functionally annotated by comparing their sequences with experimentally confirmed sequences in databases.
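
The idea of annotating by comparison can be sketched with a toy similarity search. The Python example below scores a query protein against a small in-memory reference set using the Jaccard similarity of their 3-mers and transfers the function label of the best match above a threshold. In practice this comparison is done with sequence alignment tools such as BLAST against curated databases; the k-mer measure here is a simple stand-in, and the reference names, sequences, function labels, and `threshold` value are all made up for illustration.

```python
def kmers(seq, k=3):
    """Return the set of overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def annotate(query, reference, k=3, threshold=0.5):
    """Return (name, function, score) of the best reference hit, or None.

    reference maps a name to a (sequence, function) pair.
    """
    query_kmers = kmers(query, k)
    best = None
    for name, (seq, function) in reference.items():
        score = jaccard(query_kmers, kmers(seq, k))
        if score >= threshold and (best is None or score > best[2]):
            best = (name, function, score)
    return best


# Hypothetical reference set standing in for a curated database.
REFERENCE = {
    "refA": ("MKTAYIAKQR", "hypothetical kinase"),
    "refB": ("GGGGSSSSPPPP", "hypothetical structural protein"),
}

if __name__ == "__main__":
    print(annotate("MKTAYIAKQK", REFERENCE))
    print(annotate("WWWWWW", REFERENCE))
```

A query with no sufficiently similar reference returns `None`, which mirrors the common situation where a predicted gene stays unannotated until better evidence is available.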

Now that the genome sequences of over a thousand human individuals (The 100,000 Genomes Project, UK) and several model organisms are largely complete, genome annotation remains a key challenge for scientists investigating the human genome. Locating genes and other genetic regulatory elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism. Scientists are still at an early stage in defining this parts list and in understanding how all of the parts "fit together."