PhyloSpec

Core Component Library

PhyloSpec aims to start a conversation about standard model components, common assumptions, and best practices in phylogenetics. The intended result is the Core Component Library, a set of model components that are considered common and well established.

The following represents Draft 12.2025 of the core components. The prototype tools implement most of these core components.

Add your feedback in the GitHub Discussion of this draft!

Types

Boolean

Logical value

String

Text value

Real

Real-valued number

NonNegativeReal

Non-negative real number (>= 0)

Extends: Real

PositiveReal

Positive real number (> 0)

Extends: NonNegativeReal

Probability

Probability value [0,1]

Extends: NonNegativeReal

Rate

A rate (a positive real number)

Alias for: PositiveReal

Age

An age (a non-negative number)

Alias for: NonNegativeReal

Integer

Integer-valued number

NonNegativeInteger

Non-negative integer (>= 0)

Extends: Integer

PositiveInteger

Positive integer (> 0)

Extends: NonNegativeInteger

Count

Non-negative integer (>= 0)

Alias for: NonNegativeInteger

Map<K, V>

Generic map of key-value pairs

Vector<T>

Generic ordered collection of elements

Matrix<T>

Generic two-dimensional grid of values

SquareMatrix<T>

Square matrix with equal number of rows and columns

Extends: Matrix<T>

Simplex

Probability vector with elements that sum to 1.0

Extends: Vector<Probability>
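
As an illustration only, the Simplex constraint can be checked in a few lines of Python. The function name is_simplex and the tolerance are assumptions made for this sketch; the draft does not prescribe a numerical tolerance or say whether an empty vector is allowed.

```python
import math
from typing import Sequence


def is_simplex(values: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if every element lies in [0, 1] and the elements sum to 1.0."""
    if len(values) == 0:
        # Assumption: an empty vector is not treated as a simplex here.
        return False
    if any(v < 0.0 or v > 1.0 for v in values):
        # Each element must itself be a valid Probability.
        return False
    # Compare the sum to 1.0 with an absolute tolerance (hypothetical choice).
    return math.isclose(sum(values), 1.0, rel_tol=0.0, abs_tol=tol)
```

A check like this is useful for arguments such as baseFrequencies or weights, where floating-point rounding means the sum is rarely exactly 1.0.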

QMatrix

Rate matrix for substitution models

Extends: SquareMatrix<Real>

StochasticMatrix

Stochastic matrix - probability transition matrix

Extends: Matrix<Probability>

Nucleotide

DNA/RNA nucleotide alphabet

AminoAcid

Standard amino acid alphabet

Sequence<A>

Biological sequence with elements from alphabet A

Extends: Vector<A>

Tree

Phylogenetic tree structure

Distribution<T>

Abstract type representing a probability distribution

PopulationFunction

Population function mapping an age to the effective population size

Taxon

Taxonomic unit

Taxa

A set of taxa

Alias for: Vector<Taxon>

Alignment<T>

Multiple sequence alignment with extractable properties

Parser

Description of how to parse a value from a string

Generators

log

Logarithm of x (natural logarithm when no base is given)

Generated Type: Real
Arguments:
  • x: PositiveReal (required) - Input value
  • base: Integer (optional) - Base value

exp

Exponential function

Generated Type: PositiveReal
Arguments:
  • x: Real (required) - Input value

sqrt

Square root function

Generated Type: NonNegativeReal
Arguments:
  • x: NonNegativeReal (required) - Input value

linspace

Generate a vector of evenly spaced values over a specified interval

Generated Type: Vector<Real>
Arguments:
  • start: Real (required) - Starting value
  • end: Real (required) - Ending value
  • num: PositiveInteger (required) - Number of values to generate
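
The following Python sketch shows one plausible reading of linspace. It assumes that both start and end are included in the result and that num = 1 yields just the starting value; the draft does not state either point, so treat these as assumptions.

```python
def linspace(start: float, end: float, num: int) -> list[float]:
    """Return num evenly spaced values from start to end (both included)."""
    if num < 1:
        raise ValueError("num must be a positive integer")
    if num == 1:
        # Assumption: a single value collapses to the starting value.
        return [start]
    step = (end - start) / (num - 1)
    values = [start + i * step for i in range(num - 1)]
    # Append the end point directly so it is not affected by rounding.
    values.append(end)
    return values
```

Under these assumptions, linspace(0.0, 1.0, 5) produces five values spaced 0.25 apart.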

range

Generate a vector of consecutive integers

Generated Type: Vector<Integer>
Arguments:
  • start: Integer (required) - Starting value
  • end: Integer (required) - Ending value (inclusive)
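
Because the end value is inclusive, range differs from the half-open ranges found in many programming languages. A minimal Python sketch, using the hypothetical name inclusive_range and assuming that start > end gives an empty vector (the draft does not say):

```python
def inclusive_range(start: int, end: int) -> list[int]:
    """Return the consecutive integers start, start + 1, ..., end."""
    # Python's built-in range excludes its upper bound, so add 1 to include end.
    return list(range(start, end + 1))
```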

repeat<T>

Create a vector by repeating a value n times

Generated Type: Vector<T>
Arguments:
  • value: T (required) - Value to repeat
  • num: PositiveInteger (required) - Number of times to repeat

fromNexus

Load an alignment from a Nexus file

Generated Type: Alignment
Arguments:
  • file: String (required) - Path to Nexus file
  • age: Parser (optional) - How to parse the taxon ages from the taxon names
  • speciesName: Parser (optional) - How to parse the species names from the taxon names

fromNexus

Load an alignment from a Nexus file

Generated Type: Alignment
Arguments:
  • file: String (required) - Path to Nexus file
  • date: Parser (optional) - How to parse the absolute taxon dates from the taxon names
  • speciesName: Parser (optional) - How to parse the species names from the taxon names

fromFasta

Load an alignment from a FASTA file

Generated Type: Alignment
Arguments:
  • file: String (required) - Path to FASTA file
  • age: Parser (optional) - How to parse the taxon ages from the taxon names
  • speciesName: Parser (optional) - How to parse the species names from the taxon names

fromFasta

Load an alignment from a FASTA file

Generated Type: Alignment
Arguments:
  • file: String (required) - Path to FASTA file
  • date: Parser (optional) - How to parse the absolute taxon dates from the taxon names
  • speciesName: Parser (optional) - How to parse the species names from the taxon names

fromTree

Load a phylogenetic tree from a Newick or Nexus file

Generated Type: Tree
Arguments:
  • file: String (required) - Path to tree file

fromCSV

Load a CSV file

Generated Type: Vector<Map<String, String>>
Arguments:
  • file: String (required) - Path to CSV file
  • delimiter: String (optional) [default: ,] - Delimiter that separates the fields
  • headers: Vector<String> (optional) - Headers to use instead of the first row
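
The generated type Vector<Map<String, String>> corresponds to one map per data row, keyed by column header. The Python sketch below mimics this with the standard csv module. It assumes that, when headers is given, the first row of the file is treated as data rather than skipped; the draft leaves this open. The name from_csv is used for illustration only.

```python
import csv
from typing import Optional


def from_csv(file: str, delimiter: str = ",",
             headers: Optional[list[str]] = None) -> list[dict[str, str]]:
    """Read a delimited text file into a list of header -> value maps."""
    with open(file, newline="", encoding="utf-8") as handle:
        # With fieldnames=None, DictReader takes the headers from the first row.
        # Assumption: explicit headers mean the first row is ordinary data.
        reader = csv.DictReader(handle, fieldnames=headers, delimiter=delimiter)
        return [dict(row) for row in reader]
```

All values stay strings, matching the Map<String, String> element type; any conversion to numbers would be a separate step.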

traitsFromTaxa

Retrieves an alignment with a single trait from taxa names.

Generated Type: Alignment
Arguments:
  • taxa: Taxa (required) - The taxa to extract the traits from
  • trait: Parser (required) - How to parse the trait from the taxon names

env

Reads an environment variable.

Generated Type: String
Arguments:
  • variable: String (required) - Name of the environment variable

fromNewick

Parse a phylogenetic tree from a Newick string

Generated Type: Tree
Arguments:
  • newickString: String (required) - Newick format tree string

parse

Creates a parser to extract information out of a delimited string

Generated Type: Parser
Arguments:
  • delimiter: String (required) - The delimiter to separate parts of the input string
  • part: PositiveInteger (required) - Which part to take from the delimited input string (one-based)
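
To make the one-based part index concrete, here is a small Python sketch of a delimiter-based parser. The name parse_delimited and the representation of a Parser as a plain function are assumptions for illustration, not part of the draft.

```python
from typing import Callable


def parse_delimited(delimiter: str, part: int) -> Callable[[str], str]:
    """Build a parser that returns the given one-based part of a delimited string."""
    if part < 1:
        raise ValueError("part is one-based and must be at least 1")

    def parser(text: str) -> str:
        pieces = text.split(delimiter)
        if part > len(pieces):
            raise ValueError(f"{text!r} has only {len(pieces)} part(s)")
        # Convert the one-based part number to Python's zero-based index.
        return pieces[part - 1]

    return parser
```

For a taxon name such as "Homo_sapiens_1995", a parser built with delimiter "_" and part 3 would return the string "1995", which could then serve as the age or date argument of an alignment loader.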

parse

Creates a parser to extract information out of a string using regex

Generated Type: Parser
Arguments:
  • regex: String (required) - The regex pattern with a capturing group
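
A regex-based parser can be sketched in Python as follows. This assumes that the first capturing group is the extracted value and that the pattern may match anywhere in the input; the draft only says the pattern has "a capturing group". The name parse_regex is hypothetical.

```python
import re
from typing import Callable


def parse_regex(pattern: str) -> Callable[[str], str]:
    """Build a parser that returns the first capturing group of the pattern."""
    compiled = re.compile(pattern)
    if compiled.groups < 1:
        raise ValueError("the pattern needs at least one capturing group")

    def parser(text: str) -> str:
        # Assumption: the pattern may match anywhere in the string.
        match = compiled.search(text)
        if match is None:
            raise ValueError(f"{text!r} does not match {pattern!r}")
        return match.group(1)

    return parser
```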

taxa

Extract taxa from an alignment

Generated Type: Taxa
Arguments:
  • alignment: Alignment (required) - Input alignment

taxa

Extract taxa from a tree

Generated Type: Taxa
Arguments:
  • tree: Tree (required) - Input tree

taxon

Create a taxon from a name, an optional species name and an optional age

Generated Type: Taxon
Arguments:
  • name: String (required) - Taxon name
  • species: String (optional) - Species name
  • age: Age (optional) [default: 0] - The age of the sample of this taxon

subset

Extract a subset of sites from an alignment

Generated Type: Alignment
Arguments:
  • alignment: Alignment (required) - Input alignment
  • start: PositiveInteger (optional) - Starting site position
  • end: PositiveInteger (optional) - Ending site position (inclusive)
  • codonPosition: PositiveInteger (optional) - Codon position to extract (1, 2, or 3)
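
The interaction of start, end, and codonPosition is sketched below in Python. Several assumptions are made here that the draft does not state: site positions are one-based, the alignment is represented as a map from taxon name to sequence string, and the codon position is counted relative to the first site of the selected range.

```python
from typing import Optional


def subset(alignment: dict[str, str], start: Optional[int] = None,
           end: Optional[int] = None,
           codon_position: Optional[int] = None) -> dict[str, str]:
    """Select sites start..end (one-based, inclusive), optionally one codon position."""
    if codon_position is not None and codon_position not in (1, 2, 3):
        raise ValueError("codon_position must be 1, 2, or 3")
    first = 1 if start is None else start
    result = {}
    for name, sequence in alignment.items():
        last = len(sequence) if end is None else end
        # One-based inclusive [first, last] becomes the slice [first - 1:last].
        window = sequence[first - 1:last]
        if codon_position is not None:
            # Assumption: codons are counted from the first selected site.
            window = window[codon_position - 1::3]
        result[name] = window
    return result
```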

numBranches

Count the number of branches in a tree

Generated Type: PositiveInteger
Arguments:
  • tree: Tree (required) - Input tree

numTaxa

Count the number of taxa in an alignment

Generated Type: PositiveInteger
Arguments:
  • alignment: Alignment (required) - Input alignment

numTaxa

Count the number of taxa in a tree

Generated Type: PositiveInteger
Arguments:
  • tree: Tree (required) - Input tree

numSites

Count the number of sites in an alignment

Generated Type: PositiveInteger
Arguments:
  • alignment: Alignment (required) - Input alignment

numSites<A>

Count the number of sites in a sequence

Generated Type: PositiveInteger
Arguments:
  • sequence: Sequence<A> (required) - Input sequence

num<T>

Count the number of elements in a vector

Generated Type: NonNegativeInteger
Arguments:
  • vector: Vector<T> (required) - Input vector

rootAge

Get the age of the root node in a tree

Generated Type: Age
Arguments:
  • tree: Tree (required) - Input tree

age

Get the age of a taxon in a tree

Generated Type: Age
Arguments:
  • node: String (required) - Taxon name or species name
  • tree: Tree (required) - Input tree

age

Get the age of a taxon

Generated Type: Age
Arguments:
  • taxon: Taxon (required) - Input taxon

mrca

Get the age of the most recent common ancestor of a clade in the tree

Generated Type: Age
Arguments:
  • clade: Vector<String> (required) - Vector with the taxon or species names of the clade
  • tree: Tree (required) - Input tree
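
One way to compute this age is to intersect the root paths of all clade members and take the first shared node. The Python sketch below does this on a hypothetical tree representation (a parent map plus a map of node ages); the draft does not define how a Tree is stored, so both maps and the name mrca_age are assumptions.

```python
from typing import Optional


def mrca_age(clade: list[str], parent: dict[str, Optional[str]],
             age: dict[str, float]) -> float:
    """Return the age of the most recent common ancestor of the named tips."""
    if not clade:
        raise ValueError("the clade must contain at least one name")

    def path_to_root(node: str) -> list[str]:
        # Collect the node itself and all of its ancestors, tip first.
        path = []
        current: Optional[str] = node
        while current is not None:
            path.append(current)
            current = parent[current]
        return path

    first_path = path_to_root(clade[0])
    shared = set(first_path)
    for name in clade[1:]:
        shared &= set(path_to_root(name))
    # Walking up from the first tip, the first shared node is the MRCA.
    for node in first_path:
        if node in shared:
            return age[node]
    raise ValueError("the names do not belong to a single rooted tree")
```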

numRows<T>

Count the number of rows in a matrix

Generated Type: PositiveInteger
Arguments:
  • matrix: Matrix<T> (required) - Input matrix

numCols<T>

Count the number of columns in a matrix

Generated Type: PositiveInteger
Arguments:
  • matrix: Matrix<T> (required) - Input matrix

sum

Sum all elements in a vector of real numbers

Generated Type: Real
Arguments:
  • vector: Vector<Real> (required) - Input vector

sum

Sum all elements in a vector of integers

Generated Type: Integer
Arguments:
  • vector: Vector<Integer> (required) - Input vector

name

Get the name of a taxon

Generated Type: String
Arguments:
  • taxon: Taxon (required) - Input taxon

species

Get the species name of a taxon

Generated Type: String
Arguments:
  • taxon: Taxon (required) - Input taxon

IID<T>

Vector of independent and identically distributed random variables

Generated Type: Distribution<Vector<T>>
Arguments:
  • base: Distribution<T> (required) - Base distribution for each component
  • num: PositiveInteger (required) - Number of independent draws

Mixture<T>

Mixture of distributions with the same return type

Generated Type: Distribution<T>
Arguments:
  • components: Vector<Distribution<T>> (required) - Component distributions that all generate type T
  • weights: Simplex (required) - Mixture weights for each component
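
Sampling from a mixture is a two-step process: pick a component according to the weights, then draw from that component. A minimal Python sketch follows, in which each component distribution is represented by a hypothetical sampler function that takes a random number generator; this representation is an assumption for illustration.

```python
import random
from typing import Callable, Sequence

# Hypothetical stand-in for Distribution<Real>: a function that draws one value.
Sampler = Callable[[random.Random], float]


def sample_mixture(components: Sequence[Sampler], weights: Sequence[float],
                   rng: random.Random) -> float:
    """Draw one value from a mixture of component samplers."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    # Step 1: choose a component with probability equal to its weight.
    chosen = rng.choices(components, weights=weights, k=1)[0]
    # Step 2: draw from the chosen component.
    return chosen(rng)
```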

Truncated<T>

Truncated version of the given distribution on reals

Generated Type: Distribution<T>
Arguments:
  • base: Distribution<Real> (required) - The base distribution to truncate
  • lower: T (optional) [default: -Inf] - Lower bound
  • upper: T (optional) [default: +Inf] - Upper bound
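
For intuition, a truncated distribution can be sampled by rejection: draw from the base distribution and keep only values inside the bounds. The Python sketch below takes this approach. It is not necessarily how an inference engine would implement truncation, it assumes the bounds are inclusive, and the sampler-function representation and the max_tries guard are hypothetical.

```python
import random
from typing import Callable


def sample_truncated(base: Callable[[random.Random], float],
                     rng: random.Random,
                     lower: float = float("-inf"),
                     upper: float = float("inf"),
                     max_tries: int = 100_000) -> float:
    """Draw from base, rejecting values outside [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    for _ in range(max_tries):
        value = base(rng)
        # Assumption: both bounds are inclusive.
        if lower <= value <= upper:
            return value
    # Guard against bounds that carry (almost) no probability mass.
    raise RuntimeError("no draw fell inside the bounds")
```

Rejection sampling becomes slow when the bounds cover little of the base distribution's mass, which is why the sketch caps the number of attempts.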

Offset

Offset version of the given distribution on reals

Generated Type: Distribution<Real>
Arguments:
  • base: Distribution<Real> (required) - The base distribution to offset
  • offset: Real (required) - How much to offset the distribution

Normal

Normal (Gaussian) distribution

Generated Type: Distribution<Real>
Arguments:
  • mean: Real (required) - Mean of the distribution
  • sd: PositiveReal (required) - Standard deviation

LogNormal

Log-normal distribution for positive real values

Generated Type: Distribution<PositiveReal>
Arguments:
  • logMean: Real (required) - Mean of the distribution in log space
  • logSd: PositiveReal (required) - Standard deviation in log space

Gamma

Gamma distribution for positive real values

Generated Type: Distribution<PositiveReal>
Arguments:
  • shape: PositiveReal (required) - Shape parameter
  • rate: PositiveReal (required) - Rate parameter

DiscreteGamma

Discrete gamma-distributed variable with mean 1.0 (equal shape and rate)

Generated Type: Distribution<Real>
Arguments:
  • shape: PositiveReal (required) - Shape parameter
  • numCategories: PositiveInteger (required) - Number of discrete categories

Beta

Beta distribution for values in (0,1)

Generated Type: Distribution<Probability>
Arguments:
  • alpha: PositiveReal (required) - Alpha parameter
  • beta: PositiveReal (required) - Beta parameter

Exponential

Exponential distribution for rate parameters

Generated Type: Distribution<PositiveReal>
Arguments:
  • rate: Rate (required) - Rate parameter

Uniform

Uniform distribution for bounded values

Generated Type: Distribution<Real>
Arguments:
  • lower: Real (required) - Lower bound
  • upper: Real (required) - Upper bound

DiscreteUniform

Uniform distribution of integers for bounded values

Generated Type: Distribution<Integer>
Arguments:
  • lower: Integer (required) - Lower bound
  • upper: Integer (required) - Upper bound

Cauchy

Cauchy distribution

Generated Type: Distribution<Real>
Arguments:
  • location: Real (required) - Location parameter
  • scale: PositiveReal (required) - Scale parameter

Dirichlet

Dirichlet distribution for probability vectors

Generated Type: Distribution<Simplex>
Arguments:
  • concentration: Vector<Real> (required) - Concentration parameters

MultivariateNormal

Multivariate normal for correlated values

Generated Type: Distribution<Vector<Real>>
Arguments:
  • mean: Vector<Real> (required) - Mean vector
  • covariance: Matrix<Real> (required) - Covariance matrix

Bernoulli

Bernoulli distribution (either 0 or 1)

Generated Type: Distribution<NonNegativeInteger>
Arguments:
  • p: Real (required) - Probability of success

Categorical

Categorical distribution

Generated Type: Distribution<PositiveInteger>
Arguments:
  • probabilities: Simplex (required) - Success probabilities

Binomial

Binomial distribution (number of successes)

Generated Type: Distribution<NonNegativeInteger>
Arguments:
  • numTrials: NonNegativeInteger (required) - Number of trials
  • p: Probability (required) - Probability of success

Multinomial

Multinomial distribution

Generated Type: Distribution<NonNegativeInteger>
Arguments:
  • numTrials: NonNegativeInteger (required) - Number of trials
  • numEvents: PositiveInteger (required) - Number of possible events
  • probabilities: Simplex (required) - The probabilities of each event

Geometric

Geometric distribution (number of failures before the first success)

Generated Type: Distribution<NonNegativeInteger>
Arguments:
  • p: Probability (required) - Probability of success

Poisson

Poisson distribution

Generated Type: Distribution<NonNegativeReal>
Arguments:
  • rate: Rate (required) - Rate parameter

ExponentialMarkovChain

Generates a chain of auto-correlated random variables

Generated Type: Distribution<Vector<Real>>
Arguments:
  • initialMean: Real (required) - The mean of the exponential for the first value of the chain
  • numValues: Count (required) - The length of the returned chain

Yule

Yule pure-birth process for trees conditioned on the number of taxa

Generated Type: Distribution<Tree>
Arguments:
  • birthRate: Rate (required) [default: 1] - Birth rate parameter
  • taxa: Taxa (required) - Taxa for the tree

Yule

Yule pure-birth process for trees conditioned on the root age

Generated Type: Distribution<Tree>
Arguments:
  • birthRate: Rate (required) [default: 1] - Birth rate parameter
  • rootAge: Age (required) - The age of the root
  • taxa: Taxa (required) - Taxa for the tree

BirthDeath

Birth-death process for trees of extant taxa. Conditioned on the root age if given, otherwise conditioned on the number of extant taxa

Generated Type: Distribution<Tree>
Arguments:
  • birthRate: Rate (required) - Birth rate parameter
  • deathRate: Rate (required) - Death rate parameter
  • samplingProbability: Probability (optional) [default: 1] - The proportion of extant taxa sampled at the present
  • taxa: Taxa (required) - Taxa for the tree
  • rootAge: Age (optional) - The age of the root

BirthDeath

Birth-death process for trees of extant taxa. Conditioned on the root age if given, otherwise conditioned on the number of extant taxa

Generated Type: Distribution<Tree>
Arguments:
  • diversificationRate: Rate (required) - Diversification rate parameter
  • turnover: Rate (required) - Turnover parameter
  • samplingProbability: Probability (optional) [default: 1] - The proportion of extant taxa sampled at the present
  • rootAge: Age (optional) - The age of the root
  • taxa: Taxa (required) - Taxa for the tree

Coalescent

Coalescent process for population genetics

Generated Type: Distribution<Tree>
Arguments:
  • populationSize: PositiveReal (required) - Effective population size
  • taxa: Taxa (required) - Taxa for the tree

Coalescent

Coalescent process for population genetics

Generated Type: Distribution<Tree>
Arguments:
  • populationSize: PopulationFunction (required) - Population size function
  • taxa: Taxa (required) - Taxa for the tree

SkylineCoalescent

Skyline coalescent process with piecewise-constant population sizes

Generated Type: Distribution<Tree>
Arguments:
  • populationSizes: Vector<PositiveReal> (required) - Effective population sizes for each epoch
  • changeTimes: Vector<PositiveReal> (required) - Times at which population size changes occur (ages backward in time)
  • groupSizes: Vector<PositiveInteger> (optional) - Number of coalescent events for each epoch
  • taxa: Taxa (required) - Taxa for the tree

FossilizedBirthDeath

Fossilized birth-death process for trees with fossil taxa, conditioned on the number of taxa and root age

Generated Type: Distribution<Tree>
Arguments:
  • speciationRate: Rate (required) - Birth (speciation) rate
  • extinctionRate: Rate (required) - Death (extinction) rate
  • serialSamplingRate: Rate (required) - Fossil sampling rate through time
  • samplingProbability: Probability (optional) [default: 1] - Probability of sampling extant taxa at the present
  • rootAge: Age (optional) - The age of the root
  • taxa: Taxa (required) - Taxa for the tree (including fossils)

FossilizedBirthDeath

Fossilized birth-death process for trees with fossil taxa, conditioned on the number of taxa and root age

Generated Type: Distribution<Tree>
Arguments:
  • diversificationRate: Rate (required) - Diversification rate parameter
  • turnover: Rate (required) - Turnover rate parameter
  • serialSamplingRate: Rate (required) - Fossil sampling rate through time
  • samplingProbability: Probability (optional) [default: 1] - Probability of sampling extant taxa at the present
  • rootAge: Age (optional) - The age of the root
  • taxa: Taxa (required) - Taxa for the tree (including fossils)

PhyloCTMC

Phylogenetic continuous-time Markov chain

Generated Type: Distribution<Alignment>
Arguments:
  • tree: Tree (required) - Phylogenetic tree
  • qMatrix: QMatrix (required) - Q Matrix
  • siteRates: Vector<Real> (optional) - Rate heterogeneity across sites
  • branchRates: Vector<Real> (optional) - Rate heterogeneity across branches

PhyloCTMC

Phylogenetic continuous-time Markov chain

Generated Type: Distribution<Alignment>
Arguments:
  • tree: Tree (required) - Phylogenetic tree
  • siteQMatrices: Vector<QMatrix> (required) - Q Matrix per site
  • siteRates: Vector<Real> (optional) - Rate heterogeneity across sites
  • branchRates: Vector<Real> (optional) - Rate heterogeneity across branches

PhyloBM

Phylogenetic Brownian motion for the evolution of continuous traits

Generated Type: Distribution<Alignment<Real>>
Arguments:
  • tree: Tree (required) - Phylogenetic tree
  • branchRates: Vector<Rate> (required) - Rate heterogeneity across branches
  • siteRates: Vector<Rate> (required) - Rate heterogeneity across sites
  • rootValues: Vector<Real> (optional) - Trait values at the root per site

PhyloOU

Phylogenetic Ornstein-Uhlenbeck process for univariate traits with selection

Generated Type: Distribution<Alignment<Real>>
Arguments:
  • tree: Tree (required) - Phylogenetic tree
  • siteVariances: Vector<PositiveReal> (required) - The rate of random evolution per site (sigma^2)
  • selectionStrength: PositiveReal (required) [default: 1] - Selection strength (alpha)
  • siteOptima: Vector<Real> (required) [default: 0] - Optimal trait values per site
  • rootValues: Vector<Real> (optional) - Trait values at the root per site

StrictClock

Strict clock model for branch rates

Generated Type: Distribution<Vector<Real>>
Arguments:
  • clockRate: Rate (optional) [default: 1] - The base rate
  • tree: Tree (required) - Phylogenetic tree

RelaxedClock

Relaxed clock model for branch rates

Generated Type: Distribution<Vector<Real>>
Arguments:
  • base: Distribution<Rate> (required) - The distribution with mean 1.0 to draw branch-wise rates from
  • clockRate: Rate (required) - The mean clock rate
  • tree: Tree (required) - Phylogenetic tree

DiscreteGammaInv

Discrete gamma site rates with a proportion of invariant sites

Generated Type: Distribution<Vector<Real>>
Arguments:
  • shape: PositiveReal (required) - The shape parameter of the gamma distribution
  • numCategories: PositiveInteger (required) - Number of discrete categories
  • invariantProportion: Probability (optional) [default: 0] - Proportion of invariant sites
  • numSites: NonNegativeInteger (required) - Number of sites

jc69

Jukes-Cantor model (equal rates)

Generated Type: QMatrix

k80

Kimura 2-parameter model

Generated Type: QMatrix
Arguments:
  • kappa: PositiveReal (required) - Transition/transversion ratio

f81

Felsenstein 81 model

Generated Type: QMatrix
Arguments:
  • baseFrequencies: Simplex (required) - Nucleotide frequencies

hky

Hasegawa-Kishino-Yano model

Generated Type: QMatrix
Arguments:
  • kappa: PositiveReal (required) - Transition/transversion ratio
  • baseFrequencies: Simplex (required) - Nucleotide frequencies
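
As a worked illustration of how kappa and baseFrequencies determine the rate matrix, the Python sketch below builds an HKY matrix in which the rate from i to j is kappa * pi_j for transitions (A<->G, C<->T) and pi_j for transversions. Two choices here are assumptions that the draft does not state: the state order A, C, G, T, and the rescaling so that the expected substitution rate at stationarity is 1.

```python
def hky(kappa: float, base_frequencies: list[float]) -> list[list[float]]:
    """Build a 4x4 HKY rate matrix (assumed state order: A, C, G, T)."""
    order = "ACGT"
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(order):
        for j, b in enumerate(order):
            if i != j:
                # Transitions are scaled by kappa, transversions are not.
                factor = kappa if (a, b) in transitions else 1.0
                q[i][j] = factor * base_frequencies[j]
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i])
    # Assumption: rescale so that the expected rate -sum(pi_i * q_ii) equals 1.
    scale = -sum(base_frequencies[i] * q[i][i] for i in range(4))
    return [[value / scale for value in row] for row in q]
```

With kappa = 1 and equal base frequencies this reduces to the Jukes-Cantor matrix, with 1/3 off the diagonal and -1 on it under the normalization assumed here.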

gtr

General Time-Reversible model

Generated Type: QMatrix
Arguments:
  • rateAC: Rate (required) - Substitution rate for A to C
  • rateAG: Rate (required) - Substitution rate for A to G
  • rateAT: Rate (required) - Substitution rate for A to T
  • rateCG: Rate (required) - Substitution rate for C to G
  • rateCT: Rate (required) - Substitution rate for C to T
  • rateGT: Rate (required) - Substitution rate for G to T
  • baseFrequencies: Simplex (required) - Nucleotide frequencies

wag

Whelan and Goldman model

Generated Type: QMatrix
Arguments:
  • baseFrequencies: Simplex (optional) - Amino acid frequencies

jtt

Jones-Taylor-Thornton model

Generated Type: QMatrix
Arguments:
  • baseFrequencies: Simplex (optional) - Amino acid frequencies

lg

Le-Gascuel model

Generated Type: QMatrix
Arguments:
  • baseFrequencies: Simplex (optional) - Amino acid frequencies

gy94

The M0 (GY94) codon substitution model

Generated Type: QMatrix
Arguments:
  • kappa: PositiveReal (required) - The transition-transversion rate ratio
  • omega: PositiveReal (required) - The dN / dS rate ratio
  • baseFrequencies: Simplex (required) - Stationary codon frequencies

mk

The Lewis-Mk model for discrete traits

Generated Type: QMatrix
Arguments:
  • rate: PositiveReal (optional) [default: 1] - The expected rate of change

constantPopulationFunction

A population function of a constant population

Generated Type: PopulationFunction
Arguments:
  • populationSize: PositiveReal (required) - Effective population size

exponentialPopulationFunction

A population function of an exponentially growing population

Generated Type: PopulationFunction
Arguments:
  • populationSize: PositiveReal (required) - Present-day population size
  • growthRate: Real (required) - The exponent of the exponential growth
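
Since a PopulationFunction maps an age to an effective population size, exponential growth forward in time means the size shrinks with increasing age. The Python sketch below assumes the parameterization N(t) = N0 * exp(-r * t), with t the age, which is a common convention in coalescent software but is not spelled out in the draft. The function name is illustrative.

```python
import math
from typing import Callable


def exponential_population_function(population_size: float,
                                    growth_rate: float) -> Callable[[float], float]:
    """Return N(age) = population_size * exp(-growth_rate * age)."""
    if population_size <= 0.0:
        raise ValueError("population_size must be positive")

    def size_at(age: float) -> float:
        if age < 0.0:
            raise ValueError("age must be non-negative")
        # Assumed parameterization: ages run backward in time from the present.
        return population_size * math.exp(-growth_rate * age)

    return size_at
```

A growth rate of 0 recovers a constant population, and a negative growth rate describes a population that was larger in the past.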

logisticPopulationFunction

A population function of a logistically growing population

Generated Type: PopulationFunction
Arguments:
  • inflectionAge: Age (required) - The age of the inflection point t50
  • carryingCapacity: PositiveReal (required) - The carrying capacity K
  • growthRate: Real (required) - The logistic growth rate

compoundPopulationFunction

Combines several population functions in a piece-wise manner

Generated Type: PopulationFunction
Arguments:
  • functions: Vector<PopulationFunction> (required) - The population functions to combine
  • changeTimes: Vector<PositiveReal> (required) - The change times
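
A piecewise combination can be sketched in Python as a lookup of the interval that contains the queried age. The sketch assumes that n functions come with n - 1 increasing change times, that function i applies from change time i - 1 (inclusive) up to change time i (exclusive), and that each function receives the absolute age; none of these conventions is fixed by the draft.

```python
from bisect import bisect_right
from typing import Callable, Sequence

# Hypothetical stand-in for PopulationFunction: maps an age to a population size.
PopulationFunction = Callable[[float], float]


def compound_population_function(functions: Sequence[PopulationFunction],
                                 change_times: Sequence[float]) -> PopulationFunction:
    """Combine population functions piecewise along the age axis."""
    if len(functions) != len(change_times) + 1:
        raise ValueError("need exactly one more function than change times")
    times = list(change_times)
    if times != sorted(times):
        raise ValueError("change times must be in increasing order")

    def size_at(age: float) -> float:
        # bisect_right counts the change times that are <= age, which is the
        # index of the function responsible for this age.
        index = bisect_right(times, age)
        return functions[index](age)

    return size_at
```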