Glossary of terms

7.11. Glossary of terms

data

An alignment of sequences descended from a common ancestor.

estimate

The value of a parameter obtained from a population sample. Statisticians employ “hat” notation to identify a quantity as being an estimate of a parameter, e.g. \(\hat \pi_A\) is an estimate of the parameter \(\pi_A\).
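As a minimal sketch of the distinction, suppose \(\pi_A\) is the frequency of nucleotide A in a population of sequences. The estimate \(\hat \pi_A\) is then the proportion of A observed in a sample. The sequence below is made up for illustration.

```python
# A made-up sample sequence; the true population value pi_A is unknown.
sample = "ACGTAACGTA"

# The estimate (pi_A "hat") is the observed proportion of A in the sample.
pi_hat_A = sample.count("A") / len(sample)

print(f"estimate of pi_A: {pi_hat_A}")
```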

fit a model

Numerical optimisation of the parameters in a model. In the case of maximum likelihood, the parameter values selected are those that maximise the likelihood of the data under the model.
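The following sketch illustrates fitting by numerical optimisation for a very simple case: the distance between two aligned sequences under the Jukes-Cantor (JC69) model. The model choice, the counts, and the helper names (`jc69_log_likelihood`, `maximise`) are assumptions made for illustration; a golden-section search stands in for whatever optimiser a real package would use. Because JC69 has a closed-form solution, the numerical result can be checked against it.

```python
import math


def jc69_log_likelihood(d, n_diff, n_sites):
    """Log-likelihood (up to an additive constant) of a pairwise
    alignment under JC69, where d is the expected number of
    substitutions per site separating the two sequences."""
    # probability that a site differs between the two sequences
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def maximise(func, lower, upper, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lower, upper
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if func(x1) > func(x2):
            b = x2  # the maximum lies in [a, x2]
        else:
            a = x1  # the maximum lies in [x1, b]
    return (a + b) / 2


# made-up data: 10 differing sites in an alignment of 100 sites
n_diff, n_sites = 10, 100

# d is the only free parameter; the optimiser varies it to maximise lnL
d_hat = maximise(lambda d: jc69_log_likelihood(d, n_diff, n_sites), 1e-6, 3.0)

# closed-form JC69 solution, used here only as a check
analytic = -0.75 * math.log(1.0 - 4.0 * n_diff / (3.0 * n_sites))

print(f"numerical MLE: {d_hat:.6f}, analytic: {analytic:.6f}")
```

For models with more parameters there is generally no closed form, which is why numerical optimisation is needed.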

free parameter

A model parameter whose value is not fixed during numerical optimisation, hence the parameter is free to vary.

maximum likelihood

A method of parameter estimation that selects the parameter values under which the likelihood of the observed data is greatest.

maximum likelihood estimators (MLEs)

The values of the model parameters estimated by maximum likelihood, i.e. the values obtained after fitting.

parameter

A parameter is a term in a model. More generally, it refers to the true value of a quantity from a population.

rate matrix (Q)

A square matrix with off-diagonal elements ≥ 0 and row sums of 0. Each off-diagonal element represents the instantaneous rate of change from the row label state (e.g. a nucleotide) to the column label state (e.g. another nucleotide).

substitution matrix (P)

A square matrix of probabilities, with row sums of 1. Represents the probability of substituting the row label state with the column label state for a given time interval. Typically derived from Q by matrix exponentiation.
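The relationship between Q and P can be sketched as follows. The example assumes a JC69 rate matrix, scaled so that one unit of time corresponds to one expected substitution per site, and computes \(P(t) = e^{Qt}\) with a small pure-Python matrix exponential (`mat_exp` is a hypothetical helper written for this illustration; in practice a library routine such as `scipy.linalg.expm` would normally be used).

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_exp(m, squarings=10, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2**squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# JC69 rate matrix Q: off-diagonal rates are equal, each row sums to 0
Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

t = 0.5  # an arbitrary time interval chosen for illustration
P = mat_exp([[q * t for q in row] for row in Q])

# known closed form for the JC69 diagonal, used here only as a check
expected_diag = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)

for row in P:
    print([round(p, 4) for p in row], "row sum:", round(sum(row), 6))
```

Each row of P sums to 1, whereas each row of Q sums to 0.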

rate parameter

A component of Q, specifying the instantaneous rate of change between row and column states.

tree, phylogenetic tree

A graph displaying the relationships among sequences.

branch length

On a phylogenetic tree, the expected number of substitutions per site for a specific branch. This is a parameter estimated by maximum likelihood.

site

A column in a multiple sequence alignment.
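As a small illustration, the sketch below treats an alignment as a set of equal-length strings and extracts its sites as columns. The sequences and their names are made up.

```python
# a made-up alignment: three sequences of equal length
alignment = {"seq1": "ACGTA", "seq2": "ACGTT", "seq3": "ACATA"}

# each site is one column, i.e. the same position taken from every sequence
sites = list(zip(*alignment.values()))

# indices of sites at which the sequences are not all identical
variable = [i for i, column in enumerate(sites) if len(set(column)) > 1]

print(sites[2])   # the column at index 2
print(variable)   # indices of the variable sites
```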

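likelihood ratio (LR)

A test statistic for comparing nested models, computed as twice the difference in log-likelihood between the alternate and null models. When the null model is true, it is approximately distributed \(\chi^2\) with degrees of freedom equal to the difference in the number of free parameters between alternate and null models.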
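A minimal sketch of a likelihood ratio test follows, using the standard form \(LR = 2(\ln L_{alt} - \ln L_{null})\). The log-likelihoods and parameter counts are hypothetical numbers, not results from any real analysis, and `chi2_sf` is a helper written for this illustration (with SciPy available, `scipy.stats.chi2.sf` serves the same purpose).

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    # closed forms for df = 1 and df = 2 start the recurrence
    if df % 2:
        k, sf = 1, math.erfc(math.sqrt(half))
    else:
        k, sf = 2, math.exp(-half)
    # step from k to k + 2 degrees of freedom until df is reached
    while k < df:
        sf += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return sf


# hypothetical fitted models: log-likelihood and number of free parameters
lnL_null, nfp_null = -1234.5, 4
lnL_alt, nfp_alt = -1230.2, 5

LR = 2 * (lnL_alt - lnL_null)
df = nfp_alt - nfp_null
p_value = chi2_sf(LR, df)

print(f"LR={LR:.2f}, df={df}, p-value={p_value:.4f}")
```

A small p-value indicates that the additional free parameters of the alternate model improve the fit by more than would be expected by chance under the null.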

number of free parameters (nfp)

The count of parameters in the likelihood function that are “free”, i.e. that the numerical optimiser can modify to improve the fit to data.

degrees of freedom (df)

In a likelihood ratio test, the difference in the number of free parameters between the alternate and null models.