DNA Barcode: Definition, Uses, and How DNA Barcoding Works

What Is a DNA Barcode?

A DNA barcode is a short, standardized region of the genome used to identify an organism to species. Strictly, the DNA barcode is the sequence or genomic region used as the identifier; DNA barcoding is the method of generating that sequence and comparing it with reference records. The concept borrows from the supermarket barcode: a short, machine-readable pattern that can be matched against a database. In DNA barcoding, the pattern is the order of the four nucleotide bases—adenine, guanine, cytosine, and thymine—across a few hundred bases of a chosen gene. By comparing the barcode sequence from an unknown specimen against a reference library of sequences from named species, researchers can identify the specimen without relying solely on morphology.

At a glance:

DNA barcode: short standardized DNA sequence used for identification
Main animal marker: COI
Plant markers: rbcL + matK, often with ITS or trnH–psbA
Fungal marker: ITS
Key database: Barcode of Life Data System (BOLD)
Related method: metabarcoding for mixed environmental samples

The approach was formally proposed by Paul Hebert and colleagues at the University of Guelph in 2003. Their key insight was that a segment of mitochondrial DNA—a 658-base-pair stretch at the 5′ end of the cytochrome c oxidase subunit I gene, known as COI or cox1—contains enough sequence variation to distinguish many, and often most, animal species in well-sampled groups, while usually showing lower variation within species than between species. This pattern, in which between-species differences substantially exceed within-species differences, is called the “barcode gap” and is the statistical foundation that makes species assignment from a single short sequence useful.
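The barcode gap can be made concrete with a small calculation. The sketch below is illustrative only: the 20-base sequences and species names are invented toy data, and the uncorrected p-distance (proportion of differing sites) is an assumed distance measure, not necessarily the one used in any particular barcoding study.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """Return (max within-species distance, min between-species distance, gap).

    records is a list of (species_name, aligned_sequence) pairs.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need at least one within- and one between-species pair")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Toy data: invented 20-base sequences standing in for 658-bp COI barcodes.
records = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGA"),
    ("species_B", "ACGTTCGTAGGTACCTACGA"),
    ("species_B", "ACGTTCGTAGGTACCTACGT"),
]

max_intra, min_inter, gap = barcode_gap(records)
print(f"max within-species: {max_intra:.2f}")
print(f"min between-species: {min_inter:.2f}")
print("barcode gap present" if gap > 0 else "no barcode gap")
```

A positive gap means the closest pair of different species is still farther apart than the most divergent pair within a species, which is the condition that makes single-sequence assignment workable.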

DNA barcoding sits at the intersection of molecular biology, taxonomy, and bioinformatics. It does not replace traditional morphological taxonomy but extends it: every reliable barcode record is anchored to a vouchered specimen identified by a qualified taxonomist, and the resulting sequence database becomes a tool that non-specialists can use to identify samples whose morphology is ambiguous, fragmentary, or absent.

Figure: DNA barcodes are short sequence-based identifiers used in species identification, environmental DNA analysis, nanopore sensing, and engineered cell-tracking systems. (Image: Nanowerk)

How DNA Barcoding Works

A barcoding workflow proceeds through five well-defined steps: specimen sampling, DNA extraction, amplification of the barcode region, DNA sequencing, and database comparison. A small piece of tissue—a leg from an insect, a leaf clipping, a fin punch from a fish—is sufficient. Total nucleic acid is extracted using standard kits, and the barcode region is then amplified by the polymerase chain reaction using universal primers that anneal to conserved flanking sequences.

For the animal COI barcode, the most widely used universal primers were designed by Folmer and colleagues in 1994. These primers bind to short, highly conserved regions on either side of the variable 658-base-pair stretch, allowing amplification across a broad range of phyla from insects to mammals. The amplified product is then sequenced, traditionally by the Sanger method but increasingly by next-generation sequencing platforms when many samples are processed in parallel.
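The way a primer pair brackets the barcode region can be imitated in software. The following is a minimal sketch of an "in silico PCR" that requires exact primer matches; the primer and template sequences are short invented strings, not the actual Folmer primers, and real primers tolerate mismatches that this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the amplicon (primers included), or None if a primer site is missing.

    The reverse primer anneals to the opposite strand, so on the template
    strand its binding site reads as the primer's reverse complement.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy primers and template, for illustration only.
forward = "ATGGCC"
reverse = "TTAGCG"
template = "TTTT" + "ATGGCC" + "ACGTACGTAC" + "CGCTAA" + "GGGG"

print(in_silico_pcr(template, forward, reverse))
```

The variable stretch between the two conserved primer sites is what gets sequenced and compared; the flanking template outside the primers is not amplified.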

The resulting sequence is submitted to a query algorithm that finds the closest matches in a reference library. The two main public repositories are the Barcode of Life Data System (BOLD), which specializes in barcode-format records linked to voucher specimens and photographs, and GenBank, the broader sequence database maintained by the U.S. National Center for Biotechnology Information. The strength of an identification is usually reported as percent sequence similarity to the closest reference records. BOLD also reports a Barcode Index Number (BIN), an automated cluster of similar sequences that approximates a species, which is useful when no named match is found.
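The database comparison step can be pictured as a nearest-match search. The sketch below is a simplified illustration, not the algorithm used by BOLD or GenBank: it assumes pre-aligned sequences of equal length, uses an invented two-species library, and applies a hypothetical 97% cutoff chosen only for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query: str, library: dict, threshold: float = 97.0):
    """Return (best_species_or_None, best_percent_identity).

    library maps species names to aligned reference sequences.
    threshold is an illustrative cutoff, not a community standard.
    """
    best_name, best_pct = max(
        ((name, percent_identity(query, seq)) for name, seq in library.items()),
        key=lambda item: item[1],
    )
    if best_pct < threshold:
        return None, best_pct  # closest record is too distant to name the specimen
    return best_name, best_pct


# Invented toy reference library.
library = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGTAGGTACCTACGA",
}

print(identify("ACGTACGTACGTACGTACGT", library))
print(identify("ACGTACGTACGTACGTACGA", library))
print(identify("ACGTACGTACGTACGTACGA", library, threshold=90.0))
```

The second query shows the failure mode: a closest record exists, but it falls below the cutoff, so no species name is returned.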

The power of the method depends entirely on the quality and coverage of the reference library. A barcode sequence from an unknown specimen can only be identified if a verified reference record exists for that species. This is why building comprehensive reference libraries has been the central effort of the international barcoding community for more than two decades.

Standard Barcode Markers Across Kingdoms

No single gene region works as a universal barcode across all of life. Different kingdoms required different solutions, reflecting variation in mitochondrial gene evolutionary rates, the presence of plastids in plants, and the unusual genome organization of fungi. After roughly a decade of comparative studies, the international community converged on three standard barcodes covering the principal eukaryotic groups.

Group	Standard barcode	Genome location	Adopted	Typical species discrimination
Animals	COI (cytochrome c oxidase subunit I), 658 bp	Mitochondrial	2003	High in many animal groups; lower in recent radiations or hybridizing taxa
Land plants	rbcL + matK (two-locus combination)	Plastid (chloroplast)	2009	~70–75%
Fungi	ITS (internal transcribed spacer)	Nuclear ribosomal DNA region	2012	Variable by group; highest of tested loci
Bacteria and archaea	16S rRNA gene amplicons / variable regions	Bacterial or archaeal chromosome	Predates formal barcoding	Genus level for most; species level for some

For animals, the dominance of COI reflects two convenient biological facts: animal mitochondrial DNA evolves rapidly enough to generate species-specific differences, and mitochondria are present in many copies per cell, easing amplification from small or degraded samples. For plants, mitochondrial genes evolve too slowly to discriminate species, so the Consortium for the Barcode of Life (CBOL) Plant Working Group evaluated seven candidate plastid regions and in 2009 recommended a two-locus combination of rbcL and matK. The trnH–psbA spacer and the nuclear ribosomal internal transcribed spacer (ITS) are commonly added as supplementary markers, particularly for groups where the core two-locus barcode lacks resolution.

For fungi, a multinational consortium led by Conrad Schoch evaluated six DNA regions and concluded in 2012 that the nuclear ribosomal internal transcribed spacer (ITS) offers the highest probability of successful identification across the broadest range of taxa. The fungal COI gene is unsuitable because it is difficult to amplify, often contains large introns, and is sometimes too conserved to discriminate species. For bacteria and archaea, 16S rRNA sequencing is analogous to barcoding but is more often described as marker-gene profiling or taxonomic amplicon sequencing. The small-subunit ribosomal RNA gene had been used as a phylogenetic marker for decades before DNA barcoding existed as a discipline, and it often resolves organisms to genus level but not always to species.
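For a pipeline that handles mixed taxa, the marker choices described above can be captured in a small lookup table. This is a sketch of one possible data structure; the dictionary layout and the function name `markers_for` are hypothetical, while the marker assignments themselves come from the text.

```python
# Standard and supplementary markers per group, as described in the article.
STANDARD_MARKERS = {
    "animals": {"core": ["COI"], "supplementary": []},
    "land plants": {"core": ["rbcL", "matK"], "supplementary": ["trnH–psbA", "ITS"]},
    "fungi": {"core": ["ITS"], "supplementary": []},
    "bacteria and archaea": {"core": ["16S rRNA"], "supplementary": []},
}


def markers_for(group: str, include_supplementary: bool = False) -> list:
    """Return the marker names to sequence for a given group."""
    try:
        entry = STANDARD_MARKERS[group.lower()]
    except KeyError:
        raise ValueError(f"no standard barcode recorded for group: {group!r}")
    markers = list(entry["core"])
    if include_supplementary:
        markers += entry["supplementary"]
    return markers


print(markers_for("Land plants"))
print(markers_for("Land plants", include_supplementary=True))
```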

The Barcode of Life Initiative and Global Reference Libraries

Constructing the reference libraries that make barcoding useful has been the work of a coordinated international effort. The Consortium for the Barcode of Life was founded in 2004 to develop community standards, and in 2008 it gave rise to the International Barcode of Life Consortium (iBOL), a research alliance now spanning more than 40 nations. iBOL’s first major program, BARCODE 500K, completed in 2015, assembled barcode records for roughly 500,000 species through an international investment often reported at about $125 million.

The current iBOL program, BIOSCAN, runs from 2019 through 2026 with a budget of approximately $180 million. Its goals are to extend the barcode reference library to two million species, codify species interactions at thousands of sites worldwide, and turn metabarcoding of genomic samples into a routine tool for global biodiversity monitoring. BIOSCAN will analyze more than ten million specimens collected from freshwater, marine, and terrestrial ecosystems, with a substantial fraction expected to belong to species not yet described.

The Barcode of Life Data System (BOLD), developed at the University of Guelph and launched in 2005, is the central informatics hub of this effort. BOLD assembles specimen records that link each barcode sequence to a vouchered specimen, photographs, geographic coordinates, and a taxonomic identification by a qualified expert. The system also maintains the Barcode Index Number (BIN) registry, in which an automated clustering algorithm groups similar sequences into operational taxonomic units that serve as proxies for species, an approach especially valuable for analyzing undescribed diversity. BOLD version 5, introduced in 2024, provides access to more than 20.5 million public records representing about 1.7 million species, with additional records awaiting validation and release.
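The idea of grouping similar sequences into species-like clusters can be illustrated with plain single-linkage clustering. This is a toy sketch and should not be read as BOLD's actual BIN algorithm; the distance threshold, the p-distance measure, and the sequences are all assumptions made for the example.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_sequences(seqs, max_dist):
    """Single-linkage clustering: join any two sequences within max_dist.

    Returns a sorted list of clusters, each a list of indices into seqs.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= max_dist:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Invented toy sequences; the 0.1 threshold is illustrative only.
seqs = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "ACGTTCGTAGGTACCTACGA",
    "ACGTTCGTAGGTACCTACGT",
]
print(cluster_sequences(seqs, max_dist=0.1))
```

Because clusters are built from sequence similarity alone, they can be assigned even when no taxonomist has named the organisms, which is why such units are useful for undescribed diversity.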

Environmental DNA and Metabarcoding

An extension of barcoding that has transformed biodiversity science is metabarcoding: the application of barcode amplification and sequencing to bulk samples containing many organisms or to environmental samples containing trace DNA. Soil, river water, ocean water, sediment, feces, gut contents, and even air all contain genetic material shed or released by the organisms living in or passing through them. This material is called environmental DNA (eDNA), and its analysis by metabarcoding can produce a species inventory of an entire community from a single sample.

In a typical aquatic eDNA workflow, water is filtered to capture cells and free DNA, the DNA is extracted, a short barcode region is amplified using primers chosen to capture the target group (for example, a fish-specific 12S rRNA region or a short COI fragment for invertebrates), and the resulting mixture is sequenced on a high-throughput platform. Bioinformatic pipelines then sort millions of reads into operational taxonomic units and assign each to a species when reference matches exist.
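The final bioinformatic step, sorting reads and assigning them to taxa, can be sketched in a few lines. This is a deliberately simplified illustration: it groups only identical reads (real pipelines cluster or denoise similar reads), assigns taxa by exact match to a reference, and uses a hypothetical `min_reads` cutoff to discard rare sequences that may be errors. The reads and taxon names are invented.

```python
from collections import Counter


def profile_community(reads, reference, min_reads=2):
    """Summarize a read set as taxon -> read count.

    reads: iterable of read sequences.
    reference: dict mapping reference sequence -> taxon name.
    min_reads: unique sequences seen fewer times than this are dropped.
    """
    unique_counts = Counter(reads)  # dereplicate identical reads
    profile = Counter()
    for seq, n in unique_counts.items():
        if n < min_reads:
            continue  # rare variant, treated here as a likely sequencing error
        taxon = reference.get(seq, "unassigned")  # no reference record, no name
        profile[taxon] += n
    return dict(profile)


# Invented toy reads and reference records.
reads = ["AAAC"] * 5 + ["GGGT"] * 3 + ["AAAT"] * 1 + ["CCCC"] * 2
reference = {"AAAC": "fish sp. 1", "GGGT": "fish sp. 2"}

print(profile_community(reads, reference))
```

The "unassigned" bucket is where incomplete reference libraries show up in practice: the organism was detected, but it cannot be named.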

eDNA metabarcoding is increasingly discussed as a tool for biodiversity monitoring under the Kunming–Montreal Global Biodiversity Framework, partly because it is non-invasive, scalable, and capable of detecting rare or cryptic species that conventional surveys miss. It is now used routinely to monitor freshwater fish communities, detect invasive species before they establish, screen ballast water for biosecurity risks, and reconstruct ancient biological communities from lake sediments and permafrost. Its main limitations are dependence on PCR amplification (which can bias detection toward groups well covered by the chosen primers), the difficulty of converting read counts into reliable abundance estimates, and incomplete reference databases that prevent identification of species without prior barcode records.

Applications in Food Authentication, Conservation, and Forensics

DNA barcoding moved from academic taxonomy into commercial and regulatory practice once mini-barcode protocols using short fragments of 100–300 base pairs were developed. These shorter sequences survive the DNA degradation caused by cooking, drying, smoking, and processing, allowing identification of ingredients in food products where morphology has been destroyed. Seafood mislabeling has been the most extensively studied application: a 2010–2012 investigation by the advocacy group Oceana, conducted with the Canadian Centre for DNA Barcoding at the University of Guelph, found that 33% of 1,215 seafood samples from U.S. retailers were mislabeled, with snapper and tuna showing the highest substitution rates. Subsequent studies in other regions have found that mislabeling varies strongly by product, supply chain, and sampling method, and DNA barcoding is now used by regulatory and research laboratories to verify species claims.

In conservation, barcoding supports the enforcement of wildlife trade regulations such as the Convention on International Trade in Endangered Species (CITES). Customs laboratories use barcode-based identification to detect protected species in confiscated shipments of bushmeat, traditional medicines, and timber, where the source organism has been processed beyond visual recognition. Conservation biologists also use barcoding to study diet through analysis of stomach contents and feces, to identify cryptic species complexes that morphological taxonomy has missed, and to monitor populations of rare or elusive animals through their eDNA traces in water.

Forensic and food-safety applications continue to expand. Barcoding is used to authenticate herbal supplements (where ingredient substitution and contamination are common), to verify the species composition of meat products, to track the geographic origin of agricultural commodities, and to identify arthropods of forensic interest, such as insects collected from remains or recovered as trace evidence. Plant barcoding contributes to pollen identification for studies of bee foraging and palynology, and to authentication of ingredients in traditional medicines where multiple plant species may share a common name.

Limitations and Open Challenges

DNA barcoding has clear limits. Species that have diverged recently or hybridize frequently may share identical or near-identical barcodes despite being morphologically and ecologically distinct, particularly in young radiations such as African cichlid fishes and many flowering plant groups. Mitochondrial introgression between species can produce misleading identifications when one species’ mitochondrial lineage spreads into another’s nuclear background. Numts—nuclear copies of mitochondrial sequences—can be co-amplified by COI primers and contaminate datasets if not detected and filtered.
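One common heuristic for spotting numts relies on the fact that COI is protein-coding, while nuclear pseudogene copies often accumulate stop codons or frameshifting indels. The sketch below illustrates that heuristic only; it is not described in the text above as the method any database uses. It assumes the caller knows the reading frame, treats TAA and TAG as stops (as in the vertebrate and invertebrate mitochondrial genetic codes), and uses invented sequences.

```python
# TAA and TAG are stop codons in the vertebrate and invertebrate
# mitochondrial codes; other codes may differ, so this set is an assumption.
STOP_CODONS = {"TAA", "TAG"}


def looks_like_numt(seq: str, frame: int = 0, expected_length=None) -> bool:
    """Flag a putative COI sequence that shows pseudogene-like features.

    frame: offset (0, 1, or 2) of the first complete codon.
    expected_length: if given, a length difference that is not a multiple
    of 3 is treated as a frameshifting indel.
    """
    if expected_length is not None and (len(seq) - expected_length) % 3 != 0:
        return True
    coding = seq[frame:]
    codons = (coding[i:i + 3] for i in range(0, len(coding) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


print(looks_like_numt("ATGGCCTTTGGA"))                      # open reading frame
print(looks_like_numt("ATGGCCTAAGGA"))                      # in-frame stop codon
print(looks_like_numt("ATGGCCTTTGG", expected_length=12))   # one-base deletion
```

A sequence that passes this check is not guaranteed to be a true mitochondrial copy; recently transferred numts can retain an intact reading frame.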

Reference library gaps remain a fundamental constraint. A query barcode can be assigned to a species only when a verified reference record for that species exists, and large portions of microbial, fungal, and invertebrate diversity are still unsampled. Database errors compound the problem: mislabeled specimens, contamination, and sequencing artifacts have all entered public repositories, and curation is ongoing. Methodologically, PCR amplification introduces biases that affect quantitative analyses, and short barcode sequences carry limited phylogenetic signal compared with whole-genome data.

Several technological developments are addressing these limits. Portable nanopore sequencers now allow field-based barcoding in remote locations, third-generation long-read sequencing can produce full mitochondrial genomes or chloroplast genomes from a single sample, and PCR-free approaches such as genome skimming bypass the biases of amplification entirely. Combining classical barcodes with these higher-resolution methods, sometimes called “ultra-barcoding” or “super-barcoding,” is gradually closing the gap between rapid identification and full phylogenomics.

Other Meaning: Synthetic DNA Barcodes in Cell Tracking

Outside species identification, “DNA barcode” has a second meaning. In cell and molecular biology, the term refers to short synthetic sequences inserted into cells, viruses, or molecules to track them through experiments. In cellular barcoding, a library of millions of unique synthetic sequences is delivered to a population of cells, typically by lentiviral integration, so that each cell and all its descendants carry a distinct genetic tag readable by sequencing. This makes it possible to follow individual cell lineages through development, cancer progression, immune responses, or drug treatment, an approach now combined with single-cell RNA sequencing for simultaneous lineage and transcriptomic analysis.
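Once each sequenced cell has been linked to its synthetic barcode, reconstructing clones reduces to grouping cells that share a tag. The sketch below shows that grouping step under simplifying assumptions: barcodes are read without error, each founder cell received exactly one barcode, and the cell IDs and barcode strings are invented.

```python
from collections import defaultdict


def group_by_lineage(cell_barcodes: dict) -> dict:
    """Group cell IDs by the synthetic barcode they carry.

    cell_barcodes maps cell_id -> barcode sequence. Cells sharing a barcode
    are assumed to descend from the same originally tagged cell.
    """
    lineages = defaultdict(list)
    for cell_id, barcode in cell_barcodes.items():
        lineages[barcode].append(cell_id)
    return {barcode: sorted(cells) for barcode, cells in lineages.items()}


# Invented toy data: five sequenced cells carrying three distinct barcodes.
cell_barcodes = {
    "cell_1": "ACGTTGCA",
    "cell_2": "ACGTTGCA",
    "cell_3": "GGATCCAA",
    "cell_4": "ACGTTGCA",
    "cell_5": "TTTTCCGG",
}

for barcode, cells in group_by_lineage(cell_barcodes).items():
    print(barcode, len(cells), cells)
```

The size of each group gives the clone size, which is the quantity typically compared across conditions such as before and after drug treatment.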

CRISPR-Cas9 has extended this approach by writing barcodes into the genome in situ: an engineered array of target sites accumulates Cas9-induced edits as cells divide, producing a record of lineage relationships that can be reconstructed by single-cell sequencing. Synthetic DNA barcodes also appear in high-throughput screens, in single-molecule sequencing protocols (as unique molecular identifiers that distinguish true variants from PCR duplicates), and in nucleic-acid-based data storage research. These uses share the core idea of taxonomic barcoding—a short, unique identifier carried in DNA—but operate at the scale of individual cells or molecules rather than species.
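The role of unique molecular identifiers (UMIs) can be shown with a small counting example. This is a minimal sketch that collapses reads sharing the same UMI and sequence into one original molecule; it assumes error-free UMIs, whereas production tools also merge UMIs that differ by sequencing errors. The reads are invented.

```python
from collections import Counter


def count_molecules(reads) -> Counter:
    """Count distinct original molecules per sequence.

    reads: iterable of (umi, sequence) pairs. Reads with the same UMI and
    sequence are treated as PCR duplicates of a single molecule.
    """
    unique_molecules = set(reads)  # collapse exact (umi, sequence) duplicates
    return Counter(sequence for _umi, sequence in unique_molecules)


# Invented toy reads: the first two are PCR duplicates of one molecule.
reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("CCG", "ACGT"),
    ("GGA", "ACTT"),
]

counts = count_molecules(reads)
print(counts["ACGT"], counts["ACTT"])
```

Counting UMIs rather than raw reads keeps a heavily amplified molecule from being mistaken for many independent observations.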

Frequently Asked Questions

What is the difference between DNA barcoding and DNA sequencing? DNA sequencing is the general process of determining the order of nucleotides in a DNA molecule, which can be applied to any length of DNA, from a few hundred bases to an entire genome. DNA barcoding is a specific use of sequencing that targets one or a few standardized short genome regions chosen because they often differ among species more than they vary within a species. Barcoding uses sequencing as a tool, but its purpose is species identification rather than full genomic characterization.

Why is the cytochrome c oxidase I (COI) gene used as the animal DNA barcode? COI is encoded by mitochondrial DNA, which is present in hundreds to thousands of copies per cell, making it easier to amplify from small or degraded samples than nuclear genes. It often evolves quickly enough for closely related animal species to accumulate distinguishing mutations, while its protein-coding constraints limit excessive within-species variation. Universal primers developed by Folmer and colleagues in 1994 amplify a 658-base-pair region across many animal phyla, giving COI broad taxonomic reach.

Can DNA barcoding identify species from processed food or environmental samples? Yes. Mini-barcode protocols use shorter fragments of around 100 to 300 base pairs that survive the DNA degradation caused by cooking, drying, smoking, and processing, allowing identification of fish in sushi, meat in sausages, or herbs in processed food. For environmental samples such as water, soil, or air, a related approach called metabarcoding amplifies and sequences barcode regions from all organisms present, producing a community profile from a single sample.

What is the Barcode of Life Data System (BOLD)? BOLD is the central online database and workbench for DNA barcoding, developed at the Centre for Biodiversity Genomics at the University of Guelph in Canada. It stores barcode sequences linked to voucher specimens, photographs, geographic data, and taxonomic identifications, and provides identification tools that match query sequences against the reference library. BOLD version 5 was introduced in 2024, and the BOLD data portal now reports more than 20.5 million public records representing about 1.7 million species, with additional records awaiting validation and release.

Does DNA barcoding work equally well for animals, plants, and fungi? No. A single mitochondrial barcode region works well for many animal groups, but plants required a different solution because plant mitochondrial genes evolve too slowly to discriminate species. The Consortium for the Barcode of Life adopted a two-locus plastid combination of rbcL and matK for land plants in 2009, often supplemented with the nuclear ITS region. For fungi, the nuclear ribosomal internal transcribed spacer (ITS) was formally adopted as the primary barcode in 2012 after a multinational comparison of candidate markers.