Genetic Algorithms: Examples & Purpose

Instructor: Joseph Said
This lesson explains what genetic algorithms are and gives examples of them. It draws on both molecular biology and computer science to introduce bioinformatics, a fairly new field that combines the two.

Genetic Algorithms

What is a genetic algorithm? In this lesson, a genetic algorithm is a computer program that's used to evaluate and analyze genetic sequences, whether DNA, RNA, or protein. (In computer science, the same term usually refers to an optimization method inspired by natural selection, which is a different topic.) From the early 1980s through the remainder of the 20th century, DNA sequencing technologies produced short sequences of DNA from samples, and analysis of those samples was manageable by simple computer programs. These older sequencers produced sequences of a few hundred to a few thousand nucleotides, but nothing huge.

The 21st century has brought major advances in molecular biology in terms of genetic sequencing technologies. In a single day, hundreds of millions of DNA reads, with each read containing approximately 300 base pairs, can be sequenced, using just one sequencer.
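To get a feel for those numbers, here's a quick back-of-the-envelope calculation in Python. The read count of 200 million is an assumed stand-in for "hundreds of millions," and the storage estimate assumes one byte per base with no quality scores or compression, so treat the result as a rough order of magnitude only.

```python
def daily_output(reads_per_day, bases_per_read=300):
    """Return the number of bases one sequencer produces per day."""
    return reads_per_day * bases_per_read

reads = 200_000_000          # assumed value for "hundreds of millions"
bases = daily_output(reads)  # total bases sequenced in one day
gigabytes = bases / 1e9      # assumes 1 byte per base, uncompressed

print(f"{bases:,} bases per day, roughly {gigabytes:.0f} GB as plain text")
```

Under these assumptions, a single sequencer produces about 60 billion bases in one day.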

As a result, genomic databases now house trillions of nucleotides, amounting to terabytes of data. To put that in perspective, a typical personal computer holds about 1 terabyte, so we're talking about a lot of data! Bioinformatics, the field that merges molecular biology and computer science, builds genetic algorithms for research and analysis purposes. With these algorithms, we can analyze all that data!

FASTA Sequences

DNA and protein sequences are stored in FASTA format. FASTA stands for 'FAST-ALL' since it encompasses both DNA and protein sequences. FASTA files can be small, containing fewer than 1,000 nucleotides or amino acids, or they can contain billions. In a FASTA file, each sequence is introduced by a header line that begins with the '>' symbol, followed by the title. Multiple sequences can be stored in the same text file, each starting with its own '>' header line. Below are two example sequences, the first with nucleotides and the second with amino acids; only the sequence data is shown.

  • Nucleotide FASTA: ATCCGCCTAGCTACGTTACCGGGGACCGTAGGTACCGACTAAATACG
  • Protein FASTA: MAHTYWRCSTPAGGHYTRLKPLKTYRCSAWEGRTYRASMPLKTLKKP

Imagine billions of these sequences! Computer programs are needed to read and analyze all of that data.
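As a rough illustration of how a program might read FASTA text, here's a minimal Python sketch. The function name `parse_fasta` and the record titles `example_nucleotide` and `example_protein` are made up for this example; the sequences are the two shown above. Real projects would normally rely on an established bioinformatics library instead.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping title -> sequence."""
    records = {}
    title = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            title = line[1:].strip()      # header line: text after '>'
            records[title] = []
        elif title is not None:
            records[title].append(line)   # sequence may span several lines
    return {t: "".join(parts) for t, parts in records.items()}

# Hypothetical two-record FASTA text; the titles are invented for illustration.
example = """>example_nucleotide
ATCCGCCTAGCTACGTTACCGG
GGACCGTAGGTACCGACTAAATACG
>example_protein
MAHTYWRCSTPAGGHYTRLKPLKTYRCSAWEGRTYRASMPLKTLKKP
"""

records = parse_fasta(example)
for title, seq in records.items():
    print(title, len(seq))
```

The sketch keeps every record in memory, which is fine for small files but not for billions of sequences; a program working at that scale would read and process one record at a time.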

BLAST Matrices

Basic Local Alignment Search Tool (BLAST) algorithms are among the most basic genetic algorithms. They take a DNA or protein query sequence in FASTA format and search a nucleotide, translated nucleotide, or protein database of FASTA sequences for similar sequences.

Why use BLAST? Imagine you want to find a specific gene (maybe 1,000 nucleotides) on a genome (billions of nucleotides). The BLAST algorithm can find the location or best match on the genome in seconds. It would take you years, maybe even a lifetime, to do the same thing without the BLAST algorithm.

BLAST algorithms use substitution matrices, which assign scores when two sequences are compared in a pairwise alignment. Each nucleotide in the query sequence is evaluated against the database's nucleotides. In the simple scoring scheme used in this lesson, gaps, which are positions where a nucleotide exists in either the query or the database sequence but not in the other, receive a penalty of -1, the same as mismatches (when nucleotides don't match). When nucleotides coincide, a +1 score is assigned to the match. The sequences with the highest scores, meaning the most matches and the fewest gaps and mismatches, are returned as the best search results.
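The scoring rules above can be sketched in a few lines of Python. This is not how BLAST itself is implemented (BLAST uses faster heuristics and configurable scores); it only illustrates the +1/-1 scheme from this lesson. The function names `score_alignment` and `best_ungapped_hit` are invented for the example, and the second function is a simplified search that doesn't allow gaps.

```python
MATCH, MISMATCH, GAP = 1, -1, -1  # scores described in this lesson

def score_alignment(query, subject):
    """Score two already-aligned strings of equal length; '-' marks a gap."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            score += GAP
        elif q == s:
            score += MATCH
        else:
            score += MISMATCH
    return score

def best_ungapped_hit(query, subject):
    """Slide the query along the subject (no gaps allowed).

    Returns (best_score, offset), or None if the query is longer
    than the subject.
    """
    best = None
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        s = score_alignment(query, window)
        if best is None or s > best[0]:
            best = (s, offset)
    return best

print(score_alignment("ACGT-A", "ACCTGA"))        # 4 matches, 1 mismatch, 1 gap
print(best_ungapped_hit("GGTA", "ATCCGGTACC"))    # best score and its position
```

In the first call, four matches, one mismatch, and one gap give a score of 2. In the second, the query matches the subject exactly starting at position 4 (counting from 0), so the best score is 4.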

Below is an example of a typical BLAST output. The percent identity is nearly 100%, meaning the query and search results were a good match.

BLAST output

ORFs and Annotation Algorithms

Sequenced genomes can be analyzed for genes using Open Reading Frame (ORF) scanners designed to read FASTA sequences. An ORF is a stretch of the genome that begins with a start codon and ends with a stop codon in the same reading frame, so it could encode a protein. An ORF scanner works primarily by identifying start codons (sets of three nucleotides, like ATG) and stop codons (TAA, TAG, or TGA) in all six reading frames of the genome. Remember that genes can be found on either the positive or negative strand of DNA, and each strand has three different reading frames.

When a sequence is identified that begins with a start codon and ends with an in-frame stop codon, the algorithm marks it as an ORF. Annotation algorithms combine ORF scanners and BLAST algorithms: they use BLAST to match each identified ORF against a database of known, already annotated genes and proteins, then mark the ORF as a probable gene of the same kind as its closest match, which suggests its function.
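The ORF-scanning step (not the BLAST lookup) might look something like the Python sketch below. It assumes ATG as the only start codon and TAA, TAG, and TGA as stop codons, and it reports only ORFs that have both. The function names and the `min_codons` parameter are invented for this example; real ORF scanners handle many more cases.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite (negative) strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=2):
    """Scan all six reading frames for ORFs.

    Returns a list of (strand, frame, start, end, orf_sequence) tuples.
    start/end are 0-based positions on the scanned strand, end exclusive.
    min_codons counts the start and stop codons too.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == START:
                    orf_start = i                      # open a new ORF
                elif orf_start is not None and codon in STOPS:
                    end = i + 3
                    if (end - orf_start) // 3 >= min_codons:
                        orfs.append((strand, frame, orf_start, end, s[orf_start:end]))
                    orf_start = None                   # close the ORF
    return orfs

for orf in find_orfs("CCATGAAATAGGG"):
    print(orf)
```

For this short test sequence, the only complete ORF is ATGAAATAG on the positive strand in the third reading frame (frame 2 when counting from 0).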

Why annotate the genome? Without annotation, we don't know what any of the genome does or where the genes are on the genome.
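To make the annotation idea concrete, here's a toy Python sketch that labels an ORF with the name of its closest match in a small database. Everything in it is an assumption for illustration: the gene names and sequences in `KNOWN_GENES` are invented, the `annotate` function and its `min_similarity` cutoff are hypothetical, and Python's general-purpose `difflib.SequenceMatcher` stands in for BLAST, which works quite differently.

```python
from difflib import SequenceMatcher

# Hypothetical database of already-annotated genes (invented sequences).
KNOWN_GENES = {
    "toy_gene_A": "ATGAAACCCGGGTAG",
    "toy_gene_B": "ATGTTTGGGCCCTGA",
}

def annotate(orf, database=KNOWN_GENES, min_similarity=0.8):
    """Return (best_matching_gene_name, similarity) for an ORF.

    The name is None when no database entry is similar enough.
    Similarity is SequenceMatcher's ratio, a stand-in for a BLAST score.
    """
    best_name, best_ratio = None, 0.0
    for name, gene in database.items():
        ratio = SequenceMatcher(None, orf, gene, autojunk=False).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_ratio >= min_similarity:
        return best_name, best_ratio
    return None, best_ratio

print(annotate("ATGAAACCCGGCTAG"))  # one nucleotide differs from toy_gene_A
print(annotate("TTTTTTTTTTTTTTT"))  # nothing similar in the database
```

The first ORF differs from `toy_gene_A` by a single nucleotide, so it's marked as a probable gene of that kind. The second has no close match and stays unannotated.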
