Cladistics

Cladistics, from the ancient Greek, klados, "branch", is the hierarchical classification of species based on phylogeny or evolutionary ancestry. The term phylogenetics is often used synonymously with cladistics. Cladistics is distinguished from other taxonomic systems because it focuses on the evolutionary relationships of species rather than on morphological similarities, which may be convergent, and because it places heavy emphasis on objective, quantitative analysis.

Cladistics originated in the work of the German entomologist Willi Hennig, who himself referred to it as phylogenetic systematics; the terms "cladistics" and "clade" were popularized by other researchers. Cladistics originated in biology but in recent years has found application in other disciplines, for example in textual criticism, where it has been used to determine the relationships among the surviving manuscripts of the Canterbury Tales.

Cladistics generates diagrams called cladograms that represent the evolutionary tree of life. DNA and RNA sequencing data are used in many important cladistic efforts. Computer programs are widely used in cladistics because the procedures for generating cladograms are highly complex.

Terminology

 * A clade is an ancestor and all of its descendants.
 * A monophyletic group is a clade.
 * A paraphyletic group is a monophyletic group that excludes some of the descendants (e.g. reptiles are sauropsids excluding birds). Most cladists discourage the use of paraphyletic groups.
 * A polyphyletic group is a group consisting of members from two non-overlapping monophyletic groups (e.g. flying animals). Most cladists discourage the use of polyphyletic groups.
 * An outgroup is an organism that is considered not to be part of the group in question, but is closely related to the group.
 * A characteristic that is present in both the outgroups and the ancestors is called a plesiomorphy (meaning "close form", also called an ancestral state).
 * A characteristic that occurs only in later descendants is called an apomorphy (meaning "separate form", also called a "derived" state) for that group. Note: The adjectives plesiomorphic and apomorphic are used instead of "primitive" and "advanced" to avoid placing value judgments on the evolution of the character states, since both may be advantageous in different circumstances. It is not uncommon to refer informally to a collective set of plesiomorphies as a ground plan for the clade or clades they refer to.
 * A species or clade is basal to another clade if it holds more plesiomorphic characters than that other clade. Usually a basal group is very species-poor as compared to a more derived group. It is not a requirement that a basal group be extant. For example, palaeodicots are basal to flowering plants.
 * A clade or species located within another clade is said to be nested within that clade.
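
The definitions of clade and monophyletic group above can be checked mechanically on a small tree. The following is a minimal sketch, not a standard implementation: the tree, its taxon names, and the function names are all hypothetical, and for simplicity a group is given as a set of leaf taxa.

```python
# Hypothetical toy tree for illustration: child -> parent (None marks the root).
PARENT = {
    "amniote": None,
    "mammal": "amniote",
    "sauropsid": "amniote",
    "lizard": "sauropsid",
    "archosaur": "sauropsid",
    "crocodile": "archosaur",
    "bird": "archosaur",
}


def lineage(node, parent=PARENT):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def leaves_under(node, parent=PARENT):
    """Return the set of leaves (childless nodes) at or below node."""
    children = [n for n, p in parent.items() if p == node]
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= leaves_under(child, parent)
    return found


def last_common_ancestor(group, parent=PARENT):
    """Return the most recent node that lies on every member's lineage."""
    members = list(group)
    shared = lineage(members[0], parent)  # ordered from leaf to root
    for member in members[1:]:
        on_path = set(lineage(member, parent))
        shared = [n for n in shared if n in on_path]
    return shared[0]


def is_monophyletic(group, parent=PARENT):
    """True if group is exactly the set of leaves below its last common ancestor."""
    ancestor = last_common_ancestor(group, parent)
    return leaves_under(ancestor, parent) == set(group)


print(is_monophyletic({"crocodile", "bird"}))            # True
print(is_monophyletic({"lizard", "crocodile", "bird"}))  # True
print(is_monophyletic({"lizard", "crocodile"}))          # False
```

In this toy tree the third group (lizards and crocodiles without birds) fails the test because it leaves out some descendants of its last common ancestor, which mirrors the reptile example above. The sketch only separates monophyletic groups from the rest; it does not distinguish paraphyletic from polyphyletic groups.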

Three definitions of clade
There are three major ways to define a clade for use in a cladistic taxonomy.


 * Node-based: the last common ancestor of A and B, and all descendants of that ancestor. Crown groups are a type of node-based clade.


 * Branch-based: the first ancestor of A which is not also an ancestor of Z, and all descendants of that ancestor. (This type of definition was originally called "stem-based", but this was changed to avoid confusion with the term "stem group".) Total groups are a type of branch-based clade.


 * Apomorphy-based: the first ancestor of A to possess derived trait M homologously (that is, synapomorphically) with that trait in A, and all descendants of that ancestor.
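
The three definitions can be contrasted on a small tree. This is a rough sketch under stated assumptions: the tree, the node names (including "stem_node" and "stem_fossil"), and the trait set are hypothetical, and homology is approximated crudely by requiring that the trait be present without interruption along the lineage of A.

```python
# Hypothetical toy tree: child -> parent (None marks the root).
PARENT = {
    "sauropsid": None,
    "lizard": "sauropsid",
    "stem_node": "sauropsid",
    "stem_fossil": "stem_node",
    "archosaur": "stem_node",
    "crocodile": "archosaur",
    "bird": "archosaur",
}


def lineage(node, parent=PARENT):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def descendants(node, parent=PARENT):
    """Return node together with every node below it."""
    found = {node}
    for child, p in parent.items():
        if p == node:
            found |= descendants(child, parent)
    return found


def earliest_on_lineage(a, keep, parent=PARENT):
    """Walk from a toward the root; return the last node reached before keep() first fails."""
    anchor = None
    for node in lineage(a, parent):
        if not keep(node):
            break
        anchor = node
    if anchor is None:
        raise ValueError("no qualifying ancestor on the lineage of %r" % a)
    return anchor


def node_based(a, b, parent=PARENT):
    """Last common ancestor of a and b, and all of its descendants."""
    on_b = set(lineage(b, parent))
    anchor = next(n for n in lineage(a, parent) if n in on_b)
    return descendants(anchor, parent)


def branch_based(a, z, parent=PARENT):
    """First ancestor of a that is not also an ancestor of z, and all of its descendants."""
    on_z = set(lineage(z, parent))
    anchor = earliest_on_lineage(a, lambda n: n not in on_z, parent)
    return descendants(anchor, parent)


def apomorphy_based(a, has_trait, parent=PARENT):
    """First ancestor of a with the trait (unbroken along a's lineage), and all of its descendants."""
    anchor = earliest_on_lineage(a, lambda n: n in has_trait, parent)
    return descendants(anchor, parent)


# Assumed set of nodes that possess the hypothetical derived trait M.
TRAIT_M = {"archosaur", "crocodile", "bird"}

print(sorted(node_based("bird", "crocodile")))
print(sorted(branch_based("bird", "lizard")))
print(sorted(apomorphy_based("bird", TRAIT_M)))
```

Here the node-based clade contains only the archosaur node and its descendants, while the branch-based clade also takes in the hypothetical stem fossil, which parallels the relationship between crown groups and total groups. In a tree this small the apomorphy-based result coincides with the node-based one; in general it depends on where the trait first appears.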

History of cladistics
Hennig's major book, even the 1979 version, does not contain the term cladistics in the index. He referred to his own approach as phylogenetic systematics, as implied by the book's title. A review paper by Dupuis observes that the term clade was introduced in 1958 by Julian Huxley, cladistic by Cain and Harrison in 1960, and cladist (for an adherent of Hennig's school) by Mayr in 1965.

From the time of Hennig's original formulation until the end of the 1980s, cladistics remained a minority approach to classification. However, in the 1990s it rapidly became the dominant method of classification in evolutionary biology. Cheap but increasingly powerful personal computers made it possible to process large quantities of data about organisms and their characteristics. At about the same time, the development of effective polymerase chain reaction techniques made it possible to apply cladistic methods of analysis to biochemical and molecular genetic features of organisms as well as to anatomical ones.

Cladistics as a successor to phenetics
For some decades in the mid to late 20th century, a commonly used methodology was phenetics ("numerical taxonomy"). Phenetics can be seen as a predecessor of some methods used in cladistics today (namely distance matrix methods like neighbor-joining), but it made no attempt to resolve phylogeny; it grouped organisms only by similarity.
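
As an illustration of what a similarity-only comparison looks like, the sketch below computes a simple pairwise distance matrix from a character table. The taxa, the characters, and the choice of distance (the fraction of mismatching characters) are assumptions made for the example, not the method of any particular phenetic study.

```python
from itertools import combinations

# Hypothetical character matrix: 1 = trait present, 0 = trait absent.
# Columns (assumed): fins, streamlined body, lives in water, produces milk.
CHARACTERS = {
    "shark":   [1, 1, 1, 0],
    "dolphin": [1, 1, 1, 1],
    "cow":     [0, 0, 0, 1],
}


def mismatch_distance(a, b):
    """Fraction of characters in which two taxa differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(characters):
    """Map each unordered pair of taxa (in sorted order) to its mismatch distance."""
    return {
        (s, t): mismatch_distance(characters[s], characters[t])
        for s, t in combinations(sorted(characters), 2)
    }


for pair, distance in distance_matrix(CHARACTERS).items():
    print(pair, distance)
```

With these made-up characters the dolphin comes out closest to the shark, because the comparison counts overall similarity and cannot tell convergent traits from inherited ones.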

Cladograms
The starting point of cladistic analysis is a group of species and molecular, morphological, or other data characterizing those species. The end result is a tree-like relationship diagram called a cladogram, or sometimes a dendrogram (Greek for "tree drawing"). The cladogram graphically represents a hypothetical evolutionary process. Cladograms are subject to revision as additional data become available.

Synonyms
The terms evolutionary tree and, sometimes, phylogenetic tree are often used synonymously with cladogram, but some authors treat phylogenetic tree as a broader term that includes trees generated with a nonevolutionary emphasis.

Subtrees are clades
In cladograms, all organisms lie at the leaves. The two taxa on either side of a split are called sister taxa or sister groups. Each subtree, whether it contains only two or a hundred thousand items, is called a clade.
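
The idea that every subtree is a clade can be made concrete with a small data structure. The following sketch assumes a cladogram stored as nested tuples, where a string is a leaf and a tuple is a fork; the example tree and the function names are hypothetical.

```python
# Hypothetical cladogram as nested tuples: a string is a leaf, a tuple is a fork.
TREE = ((("human", "chimpanzee"), "gorilla"), "orangutan")


def leaf_set(tree):
    """Return the set of leaves contained in a subtree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """List the leaf set of every subtree that has at least two leaves, root first."""
    if isinstance(tree, str):
        return []
    found = [leaf_set(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def sister_groups(tree):
    """List, for every fork, the leaf sets of the groups on either side of it."""
    if isinstance(tree, str):
        return []
    found = [tuple(leaf_set(child) for child in tree)]
    for child in tree:
        found.extend(sister_groups(child))
    return found


for clade in clades(TREE):
    print(sorted(clade))
```

For this four-leaf tree the sketch reports three clades: the whole tree, the group of human, chimpanzee and gorilla, and the pair of human and chimpanzee.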

2-way versus 3-way forks
Many cladists require that all forks in a cladogram be 2-way forks. Some cladograms include 3-way or 4-way forks when there are insufficient data to resolve the forking to a higher level of detail. See phylogenetic tree for more information about forking choices in trees.
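
Whether a cladogram uses only 2-way forks is easy to test programmatically. This is a minimal sketch that assumes a cladogram stored as nested tuples (a string is a leaf, a tuple is a fork); the example trees and function names are hypothetical.

```python
def fork_sizes(tree):
    """Return the number of branches at every fork, root first."""
    if isinstance(tree, str):
        return []
    sizes = [len(tree)]
    for child in tree:
        sizes.extend(fork_sizes(child))
    return sizes


def is_fully_resolved(tree):
    """True if every fork in the cladogram is a 2-way fork."""
    return all(size == 2 for size in fork_sizes(tree))


# Hypothetical examples: the second tree contains an unresolved 3-way fork.
RESOLVED = (("a", "b"), ("c", "d"))
UNRESOLVED = (("a", "b", "c"), "d")

print(fork_sizes(RESOLVED), is_fully_resolved(RESOLVED))      # [2, 2, 2] True
print(fork_sizes(UNRESOLVED), is_fully_resolved(UNRESOLVED))  # [2, 3] False
```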

Depth
If a cladogram represents N species, the number of levels (the "depth") in the cladogram is on the order of log2(N). For example, if there are 32 species of deer, a cladogram representing deer will be around 5 levels deep (because 2^5 = 32). A cladogram representing the complete tree of life, with about 10 million species, would be about 23 levels deep. This formula gives a lower limit: in most cases the actual depth will be a larger value because the various branches of the cladogram will not be uniformly deep. Conversely, the depth may be shallower if forks larger than 2-way forks are permitted.
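
The depth estimate can be reproduced with a few lines of code. The sketch below uses hypothetical function names; it computes the smallest whole number of levels for a given fork size, and, for comparison, the depth of a completely unbalanced 2-way tree in which one leaf splits off at each level.

```python
import math


def min_depth(n_leaves, fork=2):
    """Smallest whole number of levels whose forks can hold n_leaves leaves."""
    depth, capacity = 0, 1
    while capacity < n_leaves:
        capacity *= fork
        depth += 1
    return depth


def max_depth(n_leaves):
    """Depth of a fully unbalanced 2-way tree: one leaf splits off per level."""
    return max(n_leaves - 1, 0)


print(min_depth(32))                       # 5
print(round(math.log2(10_000_000), 2))     # 23.25
print(min_depth(10_000_000))               # 24
print(min_depth(32, fork=4))               # 3
print(max_depth(32))                       # 31
```

Because log2 of 10 million is about 23.25, a whole number of levels must round up to 24, which is consistent with the approximate figure given above. The last two lines show how larger forks make the tree shallower and unbalanced branching makes it deeper.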

Time scale
A cladogram tree has an implicit time axis, with time running forward from the base of the tree to the leaves of the tree. If the approximate date (for example, expressed as millions of years ago) of all the evolutionary forks were known, those dates could be captured in the cladogram. Thus, the time axis of the cladogram could be assigned a time scale (e.g. 1 cm = 1 million years), and the forks of the tree could be graphically located along the time axis. Such cladograms are called scaled cladograms. Many cladograms are not scaled along the time axis, for a variety of reasons:
 * Many cladograms are built from species characteristics that cannot be readily dated (e.g. morphological data in the absence of fossils or other dating information).
 * When the characteristic data are DNA/RNA sequences, it is feasible to use sequence differences to establish the relative ages of the forks, but converting those ages into actual years requires a significant approximation of the rate of change.
 * Even when the dating information is available, positioning the cladogram's forks along the time axis in proportion to their dates may cause the cladogram to become difficult to understand or hard to fit within a human-readable format.
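
The conversion from sequence differences to years, and from years to a position on a scaled cladogram, can be sketched as follows. The sketch assumes a constant rate of change in both lineages (a strict molecular clock); the function names and the example distance and rate are hypothetical and are not measured values.

```python
def divergence_time_years(distance, rate):
    """Estimate the age of a fork under an assumed constant rate of change.

    distance: substitutions per site separating two taxa.
    rate: substitutions per site per year in a single lineage (an assumption).
    Both lineages accumulate changes after the fork, hence the factor of 2.
    """
    return distance / (2.0 * rate)


def position_cm(age_years, years_per_cm=1_000_000):
    """Place a fork on the time axis using a scale such as 1 cm = 1 million years."""
    return age_years / years_per_cm


# Made-up inputs: 2% sequence divergence and a rate of 1e-9 per site per year.
age = divergence_time_years(0.02, 1e-9)
print(age)
print(position_cm(age))
```

With these made-up inputs the fork is dated to roughly 10 million years ago and drawn about 10 cm from the leaves. The estimate scales inversely with the assumed rate, so halving the rate doubles the age, which is why the rate approximation matters so much.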

Extinct species
Cladistics makes no distinction between extinct and nonextinct species, and it is appropriate to include extinct species in the group of organisms being analyzed. Cladograms that are based on DNA/RNA generally do not include extinct species because DNA/RNA samples from extinct species are rare. Cladograms based on morphology, especially morphological characteristics that are preserved in fossils, are more likely to include extinct species.