Nucleic Acids Overview
Every organism relies on nucleic acids to exist and survive. Nucleic acids are more familiarly known as DNA and RNA. DNA provides the blueprint for life, and RNA is best known for turning that blueprint into proteins that perform numerous functions. Because nucleic acids are crucial for life and are routinely used and studied in molecular biology, understanding their role is foundational.
Therefore, this article provides an overview of the nucleic acids DNA and RNA: what they are, what they do, how they relate to amino acids and proteins through the genetic code, and how you might use them in your molecular biology work. You will also find two reference tables, one with the nucleotide single-letter codes and one with the genetic code.
What are Nucleic Acids?
All living organisms have nucleic acids. They play the essential roles of passing down genetic information through generations and guiding cells in producing proteins, the large molecules that do much of the work in the cell. If the cell is a factory, you can think of proteins as the machines, DNA as the instructions to create all possible machines, and RNA as the copies of the instructions used to create the specific machines needed for each type of factory.
Nucleic acids account for almost 25% of a cell's dry weight, most of which is RNA. In the bacterium E. coli, for example, proteins make up ~55% of the dry weight, RNA ~20%, and DNA ~3%.
The individual components that make up DNA and RNA are called nucleotides. Each nucleotide consists of a nitrogenous base (one of five main bases), a sugar, and one to three phosphate groups. One difference between DNA and RNA is the sugar: RNA contains ribose, whereas DNA contains deoxyribose (i.e., ribose with a hydroxyl group removed) (Figure 1).
So, in the DNA and RNA “instructions,” the nucleotides are the letters that spell out how to build the protein “machines.” Normally, DNA uses four nucleotides: A, T, C, and G. In RNA, T is replaced by U. The letters represent the bases adenine, thymine, cytosine, guanine, and uracil (Table 1). The bases are classified as either purines (A and G, which have a bicyclic structure) or pyrimidines (C, T, and U, which have a monocyclic structure).
Figure 1. General structure of a DNA and RNA nucleotide. A. The DNA base can be adenine, cytosine, guanine, or thymine. B. The RNA base can be adenine, cytosine, guanine, or uracil.
Table 1. Nucleotides: Single-Letter Codes and Base Structures.
|Nucleotide Abbreviation||Base(s)||Base Structure|
|B||C, G, or T||—|
|D||A, G, or T||—|
|H||A, C, or T||—|
|K||G or T||—|
|M||A or C||—|
|N||A, T, C, or G||—|
|R||A or G||—|
|S||C or G||—|
|V||A, C, or G||—|
|W||A or T||—|
|Y||C or T||—|
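The ambiguity codes in Table 1 are often used to write degenerate sequences, such as primers that must match several variants. As a minimal sketch, assuming the sequence is DNA and uses only the codes listed above plus A, C, G, and T, the following Python snippet expands a degenerate sequence into every unambiguous sequence it represents. The names `IUPAC_DNA` and `expand_degenerate` are illustrative, not part of any library.

```python
from itertools import product

# Single-letter codes from Table 1, plus the four unambiguous DNA bases.
# The dictionary name is illustrative (an assumption of this sketch).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT",
    "M": "AC", "N": "ACGT", "R": "AG", "S": "CG",
    "V": "ACG", "W": "AT", "Y": "CT",
}


def expand_degenerate(seq):
    """Return every unambiguous DNA sequence matched by a degenerate one."""
    # One string of allowed bases per position in the input sequence.
    choices = [IUPAC_DNA[base] for base in seq.upper()]
    # The Cartesian product enumerates every combination of allowed bases.
    return ["".join(combo) for combo in product(*choices)]


print(expand_degenerate("GAY"))  # ['GAC', 'GAT']
```

The number of expanded sequences grows multiplicatively with each ambiguous position (for example, "NN" yields 16 sequences), so this approach is only practical for short sequences.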
DNA and RNA
Nucleotides form strands or oligonucleotides when the phosphate group of one nucleotide reacts with the sugar group on another nucleotide. This can occur chemically in labs (e.g., research labs or oligo synthesis companies) or enzymatically (e.g., DNA and RNA polymerases in organisms or used in vitro).
Strands of DNA with complementary sequences can form a double helix in which a purine base (A or G) on one strand pairs through hydrogen bonds with a pyrimidine base (T or C) on the other, forming A–T and G–C pairs. Many types of DNA are found in nature, and many of them are manipulated in labs for research, diagnostic, and therapeutic purposes. For example:
- Genomic DNA: Generally, the cells of an organism each contain a copy of the same genetic material in their nuclei; however, only a subset of the genetic material is expressed in each cell, which results in various cell types with specific functions in the organism.
- Mitochondrial DNA: Mitochondria contain multiple copies of circular DNA that has very little non-coding sequence.
- Plasmid: Circular, extrachromosomal DNA that encodes some proteins and can replicate independently in bacteria and protozoa.
- Viral genome: Some viral genomes are made of DNA and can be either linear or circular and either single-stranded or double-stranded.
- cDNA: Complementary DNA is synthesized from single-stranded RNA by reverse transcriptase, an enzyme that originates from retroviruses.
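The A–T and G–C pairing rule for double-stranded DNA can be illustrated with a short Python sketch that computes the sequence of the partner strand. It assumes the input contains only A, C, G, and T, and it relies on the fact that the two strands of a double helix run in opposite directions, so the complement is reversed to read 5' to 3'. The names `COMPLEMENT` and `reverse_complement` are illustrative.

```python
# Watson-Crick partners for DNA: A pairs with T, and G pairs with C.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary DNA strand, written 5' to 3'."""
    # Swap each base for its partner, then reverse because the two
    # strands of a double helix are antiparallel.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check when designing primers or probes against either strand.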
Most RNA molecules are single-stranded but can have secondary structure that is important for function. RNA can also be double-stranded. There are many types of RNA with various functions in nature as well as in research, diagnostics, and therapeutics. For example:
- mRNA: Messenger RNA is transcribed from DNA, and after post-transcriptional processing, is translated into protein by ribosomes.
- tRNA: Transfer RNA binds amino acids and interacts with mRNA and ribosomes during protein synthesis.
- rRNA: Ribosomal RNA is part of the ribosome that brings mRNA and tRNA together during protein synthesis.
- miRNA: MicroRNA are naturally occurring, short, non-coding RNA that regulate gene expression after transcription.
- siRNA: Short interfering RNA are naturally occurring, double-stranded, non-coding RNA that promote degradation of complementary mRNA to prevent translation in a process known as RNA interference.
- lncRNA: Long non-coding RNA are a heterogeneous class of RNA that are longer than 200 nucleotides and have diverse functions, including transcriptional and post-transcriptional regulation, chromatin remodeling, and RNA splicing.
- Viral genome: Some viral genomes are made of RNA and can be either linear or circular and either single-stranded or double-stranded.
- crRNA: CRISPR RNA is part of prokaryotic acquired-immunity defenses where it helps guide prokaryotic enzymes to recognize, bind, and destroy infecting bacteriophage sequences.
The Genetic Code
Nucleotides are read in a three-letter code, called the genetic code, that provides instructions for creating proteins. Each nucleotide triplet, or codon, specifies either an amino acid or a stop signal (Table 2). For example, the codon UUC codes for the amino acid phenylalanine (Phe). Long chains of amino acids form proteins.
Together, these concepts about DNA, RNA, and protein make up the central dogma of molecular biology: DNA is transcribed into RNA, which is translated into protein.
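The transcription step of the central dogma can be sketched in Python as a simple string operation. This is a simplified model that ignores promoters, terminators, and post-transcriptional processing, and it assumes the input contains only A, C, G, and T. The function names are illustrative. The mRNA has the same sequence as the DNA coding strand with U in place of T, which is equivalent to building the reverse complement of the template strand.

```python
# Pairing used when a DNA template strand is copied into RNA:
# A -> U, T -> A, G -> C, C -> G.
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_coding(coding_strand):
    """Return the mRNA for a DNA coding (sense) strand: T becomes U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand):
    """Return the mRNA (5' to 3') made from a DNA template strand (5' to 3')."""
    # Pair each template base with its RNA partner, then reverse so the
    # resulting mRNA reads 5' to 3'.
    return template_strand.upper().translate(DNA_TO_RNA)[::-1]


coding = "ATGTTCTAA"
template = "TTAGAACAT"  # reverse complement of the coding strand
print(transcribe_coding(coding))      # AUGUUCUAA
print(transcribe_template(template))  # AUGUUCUAA
```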
Table 2. The Genetic Code.
|First Nucleotide||Second Nucleotide: U||Second Nucleotide: C||Second Nucleotide: A||Second Nucleotide: G||Third Nucleotide|
|U||UUU (Phe)||UCU (Ser)||UAU (Tyr)||UGU (Cys)||U|
|U||UUC (Phe)||UCC (Ser)||UAC (Tyr)||UGC (Cys)||C|
|U||UUA (Leu)||UCA (Ser)||UAA (Stop)||UGA (Stop)||A|
|U||UUG (Leu)||UCG (Ser)||UAG (Stop)||UGG (Trp)||G|
|C||CUU (Leu)||CCU (Pro)||CAU (His)||CGU (Arg)||U|
|C||CUC (Leu)||CCC (Pro)||CAC (His)||CGC (Arg)||C|
|C||CUA (Leu)||CCA (Pro)||CAA (Gln)||CGA (Arg)||A|
|C||CUG (Leu)||CCG (Pro)||CAG (Gln)||CGG (Arg)||G|
|A||AUU (Ile)||ACU (Thr)||AAU (Asn)||AGU (Ser)||U|
|A||AUC (Ile)||ACC (Thr)||AAC (Asn)||AGC (Ser)||C|
|A||AUA (Ile)||ACA (Thr)||AAA (Lys)||AGA (Arg)||A|
|A||AUG (Met, Start)||ACG (Thr)||AAG (Lys)||AGG (Arg)||G|
|G||GUU (Val)||GCU (Ala)||GAU (Asp)||GGU (Gly)||U|
|G||GUC (Val)||GCC (Ala)||GAC (Asp)||GGC (Gly)||C|
|G||GUA (Val)||GCA (Ala)||GAA (Glu)||GGA (Gly)||A|
|G||GUG (Val)||GCG (Ala)||GAG (Glu)||GGG (Gly)||G|
* A = Adenine, C = Cytosine, G = Guanine, U = Uracil. Three-letter abbreviations of the amino acids (as well as start and stop codons) are indicated in parentheses.
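As a rough illustration of how Table 2 is used, the following Python sketch builds the codon table as a dictionary and translates an mRNA sequence from its first AUG to the first stop codon. It is a simplified model: it assumes the input contains only A, C, G, and U, treats the first AUG as the start codon, and ignores everything real ribosomes use to choose a start site. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Amino acids in Table 2 order: the first base changes slowest and the
# third base fastest, each running through U, C, A, G.
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

# Map each of the 64 codons to its amino acid (or "Stop").
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA from its first AUG to the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon found
    peptide = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


print(translate("GGAUGUUCUGGUAAGC"))  # ['Met', 'Phe', 'Trp']
```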
Other Functions of Nucleotides
In addition to being the building blocks of DNA and RNA, nucleotides have physiological functions, including as an energy source (e.g., adenosine triphosphate [ATP]), signaling molecules for cell growth, energy homeostasis, neuronal signaling, and muscle relaxation (e.g., cyclic adenosine monophosphate [cAMP], cyclic guanosine monophosphate [cGMP], cyclic cytidine monophosphate [cCMP]), and activators or regulators of enzymes (e.g., guanosine triphosphate [GTP] and G proteins).
Nucleic Acids in Molecular Biology
In molecular biology, nucleic acids are studied and used for various purposes. Here, we focus on some key methods for analyzing DNA and RNA.
Knowing the sequence of your DNA (from the genome) or RNA (from expressed genes) is key to manipulating or analyzing samples. Note that RNA is typically converted to cDNA before sequencing. Sanger sequencing is a DNA sequencing method used for relatively short sequences of fewer than 1000 base pairs, while next-generation sequencing technologies can be used for whole genomes and for large numbers of sequences from multiple samples.
Polymerase chain reaction (PCR) is routinely conducted to make copies of DNA segments, often genes, for many reasons. For example, PCR copies (also called amplicons) can be used to detect or measure gene expression, in diagnostic tests to detect nucleic acids from pathogens in patient samples, or in cloning to alter the genetic information in an organism. For gene expression studies or detection of RNA viruses in patient samples, an additional enzyme (i.e., reverse transcriptase) is used to convert the RNA to cDNA before PCR. Gel or capillary electrophoresis can be used to visualize the final PCR products (end-point PCR), or fluorescently labeled oligonucleotides can be used to detect PCR amplification in real time (quantitative PCR).
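To give a sense of how quickly PCR produces copies, the following Python sketch uses the common idealized model in which the number of target molecules multiplies by (1 + efficiency) in each cycle, so a perfectly efficient reaction doubles the target every cycle. This is an assumption for illustration only: real reactions plateau as reagents are depleted, and efficiency varies by assay. The function name and parameters are hypothetical.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Estimate amplicon count after PCR under a constant-efficiency model.

    efficiency is the fraction of templates copied per cycle:
    1.0 means perfect doubling, 0.0 means no amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the number of copies by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(10, 30))  # 10737418240.0, i.e., 10 * 2**30
```

Even with only 10 starting molecules, 30 ideal cycles yield roughly 10 billion copies, which is why PCR can detect very small amounts of nucleic acid.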
Restriction enzyme cleavage of DNA has been essential for manipulating DNA in molecular biology applications. These enzymes originate from bacteria and archaea, where they help defend against viral infection. In laboratory settings, scientists use them to cut DNA at or near specific recognition sequences (i.e., restriction sites). The resulting fragments can be ligated (e.g., to clone or reconstruct a gene sequence in a plasmid vector) or analyzed by gel or capillary electrophoresis (e.g., to distinguish alleles).
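Predicting the fragments from a restriction digest is a common planning step before cloning or electrophoresis. The Python sketch below uses EcoRI as an example, which recognizes GAATTC and cuts after the first G. It is a simplified model that assumes a linear molecule, considers only one strand, and ignores sticky-end overhangs, methylation, and incomplete digestion. The function name and parameters are illustrative.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut a linear DNA sequence at every occurrence of a restriction site.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        # Keep everything up to the cut point as one fragment.
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    # The remainder after the last cut is the final fragment.
    fragments.append(seq[start:])
    return fragments


print(digest("AAGAATTCGGGAATTCTT"))  # ['AAG', 'AATTCGGG', 'AATTCTT']
```

The fragment lengths (here 3, 8, and 7 bases) are what you would compare against the band sizes seen after electrophoresis.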
Working with Nucleic Acids in Your Research
Whenever you work with nucleic acids, you must use appropriate reagents intended for that purpose because enzymes called DNases and RNases can degrade your DNA and RNA, respectively. Extra care is needed when working with RNA because environmental RNases are relatively common and RNases are particularly hard to inactivate or destroy. Therefore, molecular biology equipment and supplies must be treated to inactivate or remove these enzymes. Choose molecular biology reagents that have been tested for these enzymes and, when possible, treated with chemicals such as diethyl pyrocarbonate (DEPC) to inactivate RNases.