8.9. Overview of cogent3 apps
cogent3.apps can be thought of as customised functions that have some special capabilities. To use an existing app, you construct a configured instance, typically by calling the app with argument values tuned for your particular case. You then apply it to input data by calling the instance with that data, without specifying any further arguments.
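The configure-then-call pattern described above can be sketched in plain Python. This is only an illustration of the idea, not how cogent3 implements its apps; the class `take_first` and its `length` argument are hypothetical.

```python
class take_first:
    """Hypothetical app-like callable: keeps the first `length` characters."""

    def __init__(self, length=3):
        # the configuration is fixed once, when the instance is created
        self.length = length

    def __call__(self, data):
        # applying the instance needs only the input data
        return data[: self.length]


first_five = take_first(length=5)  # construct a configured instance
result = first_five("ATGCGGCTCG")  # apply it to data by calling it
print(result)  # ATGCG
```

Separating configuration from application means the same configured instance can be reused on many inputs without repeating its settings.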
There are quite a few apps. You can see an overview of what’s available as follows:
from cogent3 import available_apps
apps = available_apps()
apps.head()
package | name | composable | doc | input type | output type |
---|---|---|---|---|---|
cogent3 | align_to_ref | True | Aligns sequences to a nominated reference in the unaligned collection. | SequenceCollection | Alignment, ArrayAlignment, SerialisableType |
cogent3 | ancestral_states | True | Computes ancestral state probabilities from a model result. | model_result | SerialisableType, tabular_result |
cogent3 | approx_jc69 | True | Converts p-distances and returns pairwise JC69 distances | DistanceMatrix | DistanceMatrix |
cogent3 | approx_pdist | True | Calculates an approximation of the p-distance between sequences based on Jaccard distances (see Notes for details). | DistanceMatrix | DistanceMatrix |
cogent3 | bootstrap | True | Parametric bootstrap for a provided hypothesis. | Alignment, ArrayAlignment | SerialisableType, bootstrap_result |
Top 5 rows from 46 rows x 6 columns
available_apps() returns a cogent3 table, so I’m just displaying the first few rows with head().
See the cogent3 apps documentation for more details.
8.10. Getting help on an app
To get information on a particular app, use the special app_help() function, passing the name of the app as a string.
from cogent3 import app_help
app_help("model")
Overview
--------
Define a substitution model + tree for maximum likelihood evaluation.
Options for making the app
--------------------------
model_app = get_app(
'model',
sm: str | cogent3.evolve.substitution_model._SubstitutionModel,
tree=None,
unique_trees=False,
tree_func=None,
name=None,
optimise_motif_probs=False,
sm_args=None,
lf_args=None,
time_het=None,
param_rules=None,
opt_args=None,
lower=1e-06,
upper=50,
split_codons=False,
show_progress=False,
verbose=False,
)
Parameters
----------
sm
substitution model (str or instance) if string must be available
via get_model()
tree
if None, assumes a star phylogeny (only valid for 3 taxa). Can be a
newick formatted tree, a path to a file containing one, or a Tree
instance
unique_trees
whether to specify a unique tree per alignment. Only applies if
number of sequences equals 3
tree_func: callable
a callable that takes an alignment and returns a Tree instance.
Overrides tree and unique_tree settings.
name
the model name
optimise_motif_probs
whether the motif probabilities are free parameters. If False,
takes the average of frequencies from the alignment. Overrides
the setting of a sub model instance, or any value provided in
sm_args
sm_args
arguments to be passed to the substitution model constructor
lf_args
arguments to be passed to the likelihood function constructor
time_het
Affects whether substitution model rate parameters are
heterogeneous between branches on the tree. To define a maximally
time-heterogeneous model, set the string value 'max', which
makes all rate matrix exchangeability parameters unique for all
edges. More restricted time-heterogeneity can be specified
using a list of dicts corresponding to edge_sets, e.g.
``[dict(edges=['Human', 'Chimp'], is_independent=False, upper=10)]``.
This value is passed to <likelihood function>.set_time_heterogeneity()
param_rules
other parameter rules, passed to
<likelihood function>.set_param_rule()
opt_args
arguments for the numerical optimiser, e.g.
dict(max_restarts=5, tolerance=1e-6, max_evaluations=1000,
limit_action='ignore')
lower, upper
bounds for all rate and length parameters. Ignored if a
rule in ``param_rules`` or ``time_het`` has a value defined.
split_codons
if True, incoming alignments are split into the 3 frames and each
frame is fit separately
show_progress
show progress bars during numerical optimisation
verbose
prints intermediate states to screen during fitting
Returns
-------
Calling an instance with an alignment returns a model_result instance
with the optimised likelihood function. In the case of split_codons,
the result object has a separate entry for each codon position.
Examples
--------
Create a model and fit to a three-sequence alignment. For three
sequences, there is only one possible unrooted tree so we do not need
to provide one. (We're limiting the optimiser's workload by setting
``max_evaluations=10``, solely to ensure quick execution of the examples, not
because we recommend it!)
>>> from cogent3 import make_aligned_seqs, get_app
>>> aln = make_aligned_seqs(
... {
... "Human": "ATGCGGCTCGCGGAGGCCGCGCTCGCGGAG",
... "Mouse": "ATGCCCGGCGCCAAGGCAGCGCTGGCGGAG",
... "Opossum": "ATGCCAGTGAAAGTGGCGGCGGTGGCTGAG",
... },
... moltype="dna",
... )
>>> app = get_app(
... "model", "F81", opt_args=dict(limit_action="ignore", max_evaluations=10)
... )
>>> result = app(aln)
>>> result
F81...
For the following, we will only show different model construction options
but don't apply them to data.
To apply a model to an alignment with more than three sequences
we need to provide a tree. We can provide the tree as a newick
string.
>>> tree = "(Mouse,(Human,Gorilla),Opossum)"
>>> aln2 = make_aligned_seqs(
... {
... "Human": "ATGCGGCTCGCGGAGGCCGCGCTCGCGGAG",
... "Gorilla": "ATGCGGCGCGCGGAGGCCGCGCTCGCGGAG",
... "Mouse": "ATGCCCGGCGCCAAGGCAGCGCTGGCGGAG",
... "Opossum": "ATGCCAGTGAAAGTGGCGGCGGTGGCTGAG",
... },
... moltype="dna",
... )
>>> app_tr = get_app("model", "F81", tree=tree)
Or we could assign a function that estimates the tree for an alignment.
>>> dist_cal = get_app("fast_slow_dist", fast_calc="paralinear", moltype="dna")
>>> est_tree = get_app("quick_tree")
>>> tree_func = dist_cal + est_tree
>>> model = get_app("model", "F81", tree_func=tree_func)
We can specify a time-heterogeneous model (where substitution rate parameters
differ between branches). For details, see
https://cogent3.org/doc/app/evo-model-timehet
>>> app_thet = get_app(
... "model",
... "HKY85",
... tree=tree,
... time_het=[dict(tip_names=["Human", "Opossum"], outgroup_name="Mouse")],
... )
Specify the upper and lower bounds for certain branch length and rate
exchangeability parameter.
>>> app_alt_params = get_app(
... "model",
... "HKY85",
... tree=tree,
... param_rules=[
... {"par_name": "length", "edge": "Human", "upper": 5, "lower": 1e-2},
... {"par_name": "kappa", "upper": 20, "lower": 1e-6},
... ],
... )
Specify the settings in the optimiser. By default, the Powell local optimiser is used.
The Powell algorithm can use restarts, configured using ``max_restarts``, to overcome
local maxima. With ``limit_action="ignore"`` defined, the optimiser will disregard
optimisation failures caused by exceeding ``max_evaluations``, rather than meeting the
``tolerance`` condition. (For more information see
https://cogent3.org/doc/cookbook/evo_modelling.html.)
>>> app_alt_opt = get_app(
... "model",
... "HKY85",
... tree=tree,
... opt_args=dict(
... max_restarts=5,
... tolerance=1e-8,
... max_evaluations=1_000_000,
... limit_action="ignore",
... ),
... )
Specify settings in the likelihood function constructor.
>>> app_alt_lf = get_app(
... "model", "HKY85", tree=tree, lf_args=dict(discrete_edges=["Opossum"])
... )
Splitting codons and fit models to each codon position class.
>>> app_sp_codon = get_app("model", "HKY85", tree=tree, split_codons=True)
A ``NotCompleted`` object (see https://cogent3.org/doc/app/not-completed.html)
is returned if ``tree`` (or ``tree_func``) is not provided and the number of seqs
exceeds 3.
>>> app_notree = get_app("model", "HKY85")
>>> result = app_notree(aln2)
>>> result.message
'to model more than 3, you must provide a tree'
A ``NotCompleted`` object is also returned if the model optimization is unsuccessful.
(Note that we have deliberately configured the optimiser to raise an exception if
it exits because it reached the maximum allowed evaluations.)
>>> app_limit_act = get_app(
... "model", "GN", opt_args=dict(limit_action="raise", max_evaluations=10)
... )
>>> result = app_limit_act(aln)
>>> print(result.message) # doctest: +NORMALIZE_WHITESPACE
Traceback ... FORCED EXIT from optimiser after 10 evaluations
Input type
----------
Alignment, ArrayAlignment
Output type
-----------
model_result, SerialisableType
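The help output above combines two apps with `+` (`dist_cal + est_tree`) and describes `NotCompleted` results. The sketch below mimics both ideas in plain Python, under the assumption that a failed step yields a falsy object which later steps pass along untouched. All names here (`Step`, `Failed`, `to_upper`, `count_gc`) are hypothetical and are not part of cogent3.

```python
class Failed:
    """Hypothetical stand-in for a NotCompleted-style result."""

    def __init__(self, origin, message):
        self.origin = origin  # name of the step that failed
        self.message = message  # why it failed

    def __bool__(self):
        # evaluates as False, so callers can test `if not result:`
        return False


class Step:
    """Hypothetical composable callable wrapping a plain function."""

    def __init__(self, func, name=None):
        self.func = func
        self.name = name or func.__name__

    def __call__(self, data):
        if isinstance(data, Failed):
            # an earlier step failed: pass the failure along unchanged
            return data
        try:
            return self.func(data)
        except Exception as err:
            # report the failure as a value instead of raising
            return Failed(self.name, str(err))

    def __add__(self, other):
        # `a + b` builds a new step that feeds the output of a into b
        def chained(data):
            return other(self(data))

        return Step(chained, name=f"{self.name} + {other.name}")


def to_upper(seq):
    return seq.upper()


def count_gc(seq):
    return seq.count("G") + seq.count("C")


pipeline = Step(to_upper) + Step(count_gc)
good = pipeline("atgcgg")  # both steps succeed
bad = pipeline(None)  # to_upper fails, count_gc is skipped

print(good)  # 4
if not bad:
    print(bad.origin)  # to_upper
```

Returning a failure object rather than raising lets a batch run continue over many inputs, with the failures collected and inspected afterwards.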