Overview of cogent3 apps

8.9. Overview of `cogent3` apps

cogent3.apps can be thought of as customised functions that have some special capabilities. To use an existing app, you first construct a configured instance, typically by creating the app with argument values tuned for your particular case. You then apply it to input data by calling the instance on that data, without specifying any further arguments.
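The configure-then-call pattern described above can be sketched in plain Python. This is only an analogy, not how cogent3 implements its apps: the class `DropShort` and its `min_length` argument are hypothetical names invented for this illustration.

```python
class DropShort:
    """Hypothetical stand-in for an app: configured once, then called on data."""

    def __init__(self, min_length: int):
        # the configuration is fixed when the instance is constructed
        self.min_length = min_length

    def __call__(self, seqs: dict[str, str]) -> dict[str, str]:
        # calling the instance needs only the data, no further options
        return {name: s for name, s in seqs.items() if len(s) >= self.min_length}


keep_long = DropShort(min_length=5)  # step 1: construct a configured instance
seqs = {"Human": "ATGCGG", "Mouse": "ATG"}
result = keep_long(seqs)  # step 2: apply it to the input data
print(result)
```

Separating configuration from application means the same configured instance can be reused across many inputs without repeating its settings.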

There are quite a few apps, and you can see an overview of what’s available as follows:

from cogent3 import available_apps

apps = available_apps()

apps.head()

package	name	composable	doc	input type	output type
cogent3	align_to_ref	True	Aligns sequences to a nominated reference in the unaligned collection.	SequenceCollection	Alignment, ArrayAlignment, SerialisableType
cogent3	ancestral_states	True	Computes ancestral state probabilities from a model result.	model_result	SerialisableType, tabular_result
cogent3	approx_jc69	True	Converts p-distances and returns pairwise JC69 distances	DistanceMatrix	DistanceMatrix
cogent3	approx_pdist	True	Calculates an approximation of the p-distance between sequences based on Jaccard distances (see Notes for details).	DistanceMatrix	DistanceMatrix
cogent3	bootstrap	True	Parametric bootstrap for a provided hypothesis.	Alignment, ArrayAlignment	SerialisableType, bootstrap_result

Top 5 rows from 46 rows x 6 columns

`available_apps()` returns a cogent3 table, so only the first few rows are displayed here using `head()`.

See the cogent3 apps documentation for more details.

8.10. Getting help on an app

To get information on a particular app, use the special app_help() function, passing the name of the app as a string.

from cogent3 import app_help

app_help("model")

Overview
--------
Define a substitution model + tree for maximum likelihood evaluation.

Options for making the app
--------------------------
model_app = get_app(
    'model',
    sm: str | cogent3.evolve.substitution_model._SubstitutionModel,
    tree=None,
    unique_trees=False,
    tree_func=None,
    name=None,
    optimise_motif_probs=False,
    sm_args=None,
    lf_args=None,
    time_het=None,
    param_rules=None,
    opt_args=None,
    lower=1e-06,
    upper=50,
    split_codons=False,
    show_progress=False,
    verbose=False,
)

Parameters
----------
sm
    substitution model (str or instance) if string must be available
    via get_model()
tree
    if None, assumes a star phylogeny (only valid for 3 taxa). Can be a
    newick formatted tree, a path to a file containing one, or a Tree
    instance
unique_trees
    whether to specify a unique tree per alignment. Only applies if
    number of sequences equals 3
tree_func: callable
    a callable that takes an alignment and returns a Tree instance.
    Overrides tree and unique_tree settings.
name
    the model name
optimise_motif_probs
    whether the motif probabilities are free parameters. If False,
    takes the average of frequencies from the alignment. Overrides
    the setting of a sub model instance, or any value provided in
    sm_args
sm_args
    arguments to be passed to the substitution model constructor
lf_args
    arguments to be passed to the likelihood function constructor
time_het
    Affects whether substitution model rate parameters are
    heterogeneous between branches on the tree. To define a maximally
    time-heterogeneous model, set the string value 'max', which
    makes all rate matrix exchangeability parameters unique for all
    edges. More restricted time-heterogeneity can be specified
    using a list of dicts corresponding to edge_sets, e.g.
    ``[dict(edges=['Human', 'Chimp'], is_independent=False, upper=10)]``.
    This value is passed to <likelihood function>.set_time_heterogeneity()
param_rules
    other parameter rules, passed to
    <likelihood function>.set_param_rule()
opt_args
    arguments for the numerical optimiser, e.g.
    dict(max_restarts=5, tolerance=1e-6, max_evaluations=1000,
    limit_action='ignore')
lower, upper
    bounds for all rate and length parameters. Ignored if a
    rule in ``param_rules`` or ``time_het`` has a value defined.
split_codons
    if True, incoming alignments are split into the 3 frames and each
    frame is fit separately
show_progress
    show progress bars during numerical optimisation
verbose
    prints intermediate states to screen during fitting

Returns
-------
Calling an instance with an alignment returns a model_result instance
with the optimised likelihood function. In the case of split_codons,
the result object has a separate entry for each codon position.

Examples
--------

Create a model and fit to a three-sequence alignment. For three
sequences, there is only one possible unrooted tree so we do not need
to provide one. (We're limiting the optimiser's workload by setting
``max_evaluations=10``, solely to ensure quick execution of the examples, not
because we recommend it!)

>>> from cogent3 import make_aligned_seqs, get_app
>>> aln = make_aligned_seqs(
...     {
...         "Human": "ATGCGGCTCGCGGAGGCCGCGCTCGCGGAG",
...         "Mouse": "ATGCCCGGCGCCAAGGCAGCGCTGGCGGAG",
...         "Opossum": "ATGCCAGTGAAAGTGGCGGCGGTGGCTGAG",
...     },
...     moltype="dna",
... )
>>> app = get_app(
...     "model", "F81", opt_args=dict(limit_action="ignore", max_evaluations=10)
... )
>>> result = app(aln)
>>> result
F81...

For the following, we will only show different model construction options
but don't apply them to data.

To apply a model to an alignment with more than three sequences
we need to provide a tree. We can provide the tree as a newick
string.

>>> tree = "(Mouse,(Human,Gorilla),Opossum)"
>>> aln2 = make_aligned_seqs(
...     {
...         "Human": "ATGCGGCTCGCGGAGGCCGCGCTCGCGGAG",
...         "Gorilla": "ATGCGGCGCGCGGAGGCCGCGCTCGCGGAG",
...         "Mouse": "ATGCCCGGCGCCAAGGCAGCGCTGGCGGAG",
...         "Opossum": "ATGCCAGTGAAAGTGGCGGCGGTGGCTGAG",
...     },
...     moltype="dna",
... )
>>> app_tr = get_app("model", "F81", tree=tree)

Or we could assign a function that estimates the tree for an alignment.

>>> dist_cal = get_app("fast_slow_dist", fast_calc="paralinear", moltype="dna")
>>> est_tree = get_app("quick_tree")
>>> tree_func = dist_cal + est_tree
>>> model = get_app("model", "F81", tree_func=tree_func)

We can specify a time-heterogeneous model (where substitution rate parameters
differ between branches). For details, see
https://cogent3.org/doc/app/evo-model-timehet

>>> app_thet = get_app(
...     "model",
...     "HKY85",
...     tree=tree,
...     time_het=[dict(tip_names=["Human", "Opossum"], outgroup_name="Mouse")],
... )

Specify the upper and lower bounds for certain branch length and rate
exchangeability parameter.

>>> app_alt_params = get_app(
...     "model",
...     "HKY85",
...     tree=tree,
...     param_rules=[
...         {"par_name": "length", "edge": "Human", "upper": 5, "lower": 1e-2},
...         {"par_name": "kappa", "upper": 20, "lower": 1e-6},
...     ],
... )

Specify the settings in the optimiser. By default, the Powell local optimiser is used.
The Powell algorithm can use restarts, configured using ``max_restarts``, to overcome
local maxima. With ``limit_action="ignore"`` defined, the optimiser will disregard
optimisation failures caused by exceeding ``max_evaluations``, rather than meeting the
``tolerance`` condition. (For more information see
https://cogent3.org/doc/cookbook/evo_modelling.html.)

>>> app_alt_opt = get_app(
...     "model",
...     "HKY85",
...     tree=tree,
...     opt_args=dict(
...         max_restarts=5,
...         tolerance=1e-8,
...         max_evaluations=1_000_000,
...         limit_action="ignore",
...     ),
... )

Specify settings in the likelihood function constructor.

>>> app_alt_lf = get_app(
...     "model", "HKY85", tree=tree, lf_args=dict(discrete_edges=["Opossum"])
... )

Splitting codons and fit models to each codon position class.

>>> app_sp_codon = get_app("model", "HKY85", tree=tree, split_codons=True)

A ``NotCompleted`` object (see https://cogent3.org/doc/app/not-completed.html)
is returned if ``tree`` (or ``tree_func``) is not provided and the number of seqs
exceeds 3.

>>> app_notree = get_app("model", "HKY85")
>>> result = app_notree(aln2)
>>> result.message
'to model more than 3, you must provide a tree'

A ``NotCompleted`` object is also returned if the model optimization is unsuccessful.
(Note that we have deliberately configured the optimiser to raise an exception if
it exits because it reached the maximum allowed evaluations.)

>>> app_limit_act = get_app(
...     "model", "GN", opt_args=dict(limit_action="raise", max_evaluations=10)
... )
>>> result = app_limit_act(aln)
>>> print(result.message)  # doctest: +NORMALIZE_WHITESPACE
Traceback ... FORCED EXIT from optimiser after 10 evaluations

Input type
----------
Alignment, ArrayAlignment

Output type
-----------
model_result, SerialisableType
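The help text above uses two behaviours that are easier to see in miniature: apps are joined with `+` (as in `dist_cal + est_tree`), and a failed step returns a `NotCompleted` object carrying a `message` instead of raising. The following is a rough plain-Python sketch of that idea, not cogent3's actual implementation. `Step`, `NotCompletedSketch`, `to_upper` and `gc_fraction` are hypothetical names, and making the failure marker falsy is a design choice of this sketch.

```python
class NotCompletedSketch:
    """Hypothetical falsy marker recording why a step did not finish."""

    def __init__(self, origin: str, message: str):
        self.origin = origin
        self.message = message

    def __bool__(self) -> bool:
        # lets callers write `if result:` to detect a failure
        return False


class Step:
    """Hypothetical composable step wrapping a plain function."""

    def __init__(self, func):
        self.func = func

    def __call__(self, data):
        # an earlier failure is passed through untouched
        if isinstance(data, NotCompletedSketch):
            return data
        try:
            return self.func(data)
        except Exception as err:
            # convert the exception into a failure marker
            return NotCompletedSketch(self.func.__name__, str(err))

    def __add__(self, other: "Step") -> "Step":
        # a + b builds a new step that runs a, then b
        def chained(data):
            return other(self(data))

        return Step(chained)


def to_upper(seq: str) -> str:
    return seq.upper()


def gc_fraction(seq: str) -> float:
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


pipeline = Step(to_upper) + Step(gc_fraction)
ok = pipeline("atgc")  # both steps succeed
failed = pipeline("")  # second step fails, a marker is returned
print(ok, bool(failed), failed.message)
```

Returning a marker rather than raising lets a batch run continue over many inputs while keeping a record of which ones failed and why.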
