mofaflex.MOFAFLEX

class mofaflex.MOFAFLEX(**kwargs)

The MOFA-FLEX model.

This class is not meant to be instantiated by the user. Rather, it is created by instantiating a term.

Attributes table

`feature_names`	Feature names for each view.
`group_names`	Group names.
`likelihoods`	The likelihoods.
`n_features`	Number of features in each view.
`n_features_total`	Total number of features.
`n_groups`	Number of groups.
`n_samples`	Number of samples in each group.
`n_samples_total`	Total number of samples.
`n_terms`	Number of additive terms.
`n_views`	Number of views.
`sample_names`	Sample names for each group.
`terms`	The additive terms.
`training_loss`	Total loss (negative ELBO) for each training epoch.
`view_names`	View names.

Methods table

`fit`(data, *[, likelihoods, group_by, layer, ...])	Fit the model using the provided data.
`get_dispersion`([moment])	Get the dispersion vectors for each view.
`get_r2`([type, ordered, term])	Get the fraction of variance explained for each view and group.
`impute_data`(data[, missing_only])	Impute values in the training data using the trained factorization.
`load`(path[, map_location])	Load a saved MOFAFLEX model.

Attributes

property feature_names: Mapping[str, ndarray[tuple[int], str]] – Feature names for each view.

property group_names: ndarray[tuple[int], str] – Group names.

property likelihoods: Mapping[str, None] – The likelihoods.

property n_features: dict[str, int] – Number of features in each view.

property n_features_total: int – Total number of features.

property n_groups: int – Number of groups.

property n_samples: dict[str, int] – Number of samples in each group.

property n_samples_total: int – Total number of samples.

property n_terms: int – Number of additive terms.

property n_views: int – Number of views.

property sample_names: Mapping[str, ndarray[tuple[int], str]] – Sample names for each group.

property terms: Mapping[str, None] – The additive terms.

property training_loss: ndarray[tuple[int], floating] – Total loss (negative ELBO) for each training epoch.

property view_names: ndarray[tuple[int], str] – View names.
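The read-only attributes above are enough to summarise a fitted model. The following is a minimal sketch, not part of mofaflex: `summarize_model` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in so that the snippet runs without a trained model. It assumes that `training_loss` holds one entry per completed epoch, as described above.

```python
from types import SimpleNamespace


def summarize_model(model):
    """Collect basic size information from a fitted model-like object.

    Hypothetical helper; it only reads attributes documented above.
    """
    losses = [float(x) for x in model.training_loss]
    return {
        "views": list(model.view_names),
        "groups": list(model.group_names),
        "features_per_view": dict(model.n_features),
        "samples_per_group": dict(model.n_samples),
        "epochs_trained": len(losses),  # one loss value per epoch
        "final_loss": losses[-1] if losses else None,
    }


# Stand-in for a fitted model (assumption: a real MOFAFLEX instance
# exposes the same attribute names with compatible types).
demo = SimpleNamespace(
    view_names=["rna", "atac"],
    group_names=["batch1"],
    n_features={"rna": 3, "atac": 2},
    n_samples={"batch1": 5},
    training_loss=[10.0, 8.0, 7.5],
)
summary = summarize_model(demo)
print(summary["epochs_trained"], summary["final_loss"])
```

With a real model, the same function could be called on the fitted instance directly.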

Methods

fit(data, *, likelihoods=None, group_by=None, layer=None, use_obs='union', use_var='union', subset_var='highly_variable', plot_data_overview=True, remove_constant_features=True, device='cuda', batch_size=0, max_epochs=10000, lr=0.001, early_stopper_patience=100, save_path=None, seed=None, num_workers=0, pin_memory=False, n_particles=1, update_every=0.1)

Fit the model using the provided data.

Parameters:

data (MuData | Mapping[str, Mapping[str, AnnData]] | AnnData) –
Can be any of:
- MuData object
- AnnData object
- Nested dict with group names as keys, view names as subkeys and AnnData objects as values (incompatible with group_by)
likelihoods (Mapping[str, Literal['Bernoulli', 'NegativeBinomial', 'Normal'] | Likelihood] | Literal['Bernoulli', 'NegativeBinomial', 'Normal'] | Likelihood | None (default: None)) – Data likelihoods for each view (if dict) or for all views (if str or Likelihood). Inferred automatically if None.
group_by (str | Sequence[str] | None (default: None)) – Columns of .obs in MuData or AnnData objects to group data by. Ignored if the input data is not a MuData or AnnData object.
layer (Mapping[str, str | None] | Mapping[str, Mapping[str, str | None]] | str | None (default: None)) – Which layer to use. If None, the .X element will be used. If str, the same layer will be used for all groups and views. If a dict of strings, the keys must correspond to view names and the values to layers. If a nested dict, different layers can be used for each combination of group and view. The last format is only accepted if the data is a nested dictionary of AnnData objects.
use_obs (Literal['union', 'intersection'] (default: 'union')) – How to align observations across views. Ignored if the data is not a nested dict of AnnData objects.
use_var (Literal['union', 'intersection'] (default: 'union')) – How to align variables across groups. Ignored if the data is not a nested dict of AnnData objects.
subset_var (str | None (default: 'highly_variable')) – .var column with boolean values to select features.
plot_data_overview (bool (default: True)) – Whether to plot an overview of the data.
remove_constant_features (bool (default: True)) – Whether to remove constant features from the data.
device (str | device (default: 'cuda')) – Device to run training on.
batch_size (int (default: 0)) – Batch size.
max_epochs (int (default: 10000)) – Maximum number of training epochs.
lr (float (default: 0.001)) – Learning rate.
early_stopper_patience (int (default: 100)) – Number of steps without relevant improvement after which training is stopped.
save_path (Path | str | None (default: None)) – Path to save model.
seed (int | None (default: None)) – Seed for the pseudorandom number generator.
num_workers (int (default: 0)) – Number of data loader workers.
pin_memory (bool (default: False)) – Whether to use pinned memory in the data loader.
n_particles (int (default: 1)) – Number of particles for ELBO estimation.
update_every (float (default: 0.1)) – Minimum interval between progress bar updates in seconds. Set to a negative value to disable the progress bar.
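The layer parameter accepts four shapes (None, a string, a dict keyed by view, or a nested dict). The sketch below shows one way to expand them into a single per-group, per-view lookup. It is an illustration only and not the library's implementation: `resolve_layers` is a hypothetical helper, the nested dict is assumed to be keyed by group first and view second (mirroring the nested data format), and treating a missing key as None (use .X) is an assumption.

```python
from collections.abc import Mapping


def resolve_layers(layer, group_names, view_names):
    """Expand a `layer` argument into {group: {view: layer_name_or_None}}.

    Hypothetical helper. None stands for "use the .X element".
    """
    if layer is None or isinstance(layer, str):
        # Same choice for every group and view.
        return {g: {v: layer for v in view_names} for g in group_names}

    # A nested dict has mappings as values; a flat dict has str/None values.
    nested = any(isinstance(value, Mapping) for value in layer.values())
    if nested:
        # Assumption: outer keys are group names, inner keys are view names.
        return {
            g: {v: layer.get(g, {}).get(v) for v in view_names}
            for g in group_names
        }

    # Flat dict: keys are view names, shared across all groups.
    return {g: {v: layer.get(v) for v in view_names} for g in group_names}


same_everywhere = resolve_layers("counts", ["g1", "g2"], ["rna"])
per_view = resolve_layers({"rna": "counts", "atac": None}, ["g1"], ["rna", "atac"])
per_group_view = resolve_layers({"g1": {"rna": "counts"}}, ["g1", "g2"], ["rna"])
print(per_group_view)
```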

get_dispersion(moment='mean')

Get the dispersion vectors for each view.

Parameters:

moment (Literal['mean', 'std'] (default: 'mean')) – Which moment of the posterior distribution to return.

Return type:

dict[str, Series]
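As a usage sketch, both posterior moments can be requested and paired up per view. `dispersion_moments` is a hypothetical helper, and `_StubModel` is a stand-in that returns plain lists instead of pandas Series so that the snippet runs without a trained model. Only the documented `get_dispersion(moment=...)` call is used.

```python
class _StubModel:
    """Stand-in for a fitted model; values are made up for illustration."""

    def get_dispersion(self, moment="mean"):
        values = {
            "mean": {"rna": [0.5, 1.2]},
            "std": {"rna": [0.1, 0.3]},
        }
        return values[moment]


def dispersion_moments(model):
    """Return {view: {"mean": ..., "std": ...}} for every view."""
    means = model.get_dispersion(moment="mean")
    stds = model.get_dispersion(moment="std")
    return {view: {"mean": means[view], "std": stds[view]} for view in means}


moments = dispersion_moments(_StubModel())
print(moments["rna"])
```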

get_r2(type=None, ordered=False, term=None)

Get the fraction of variance explained for each view and group.

Parameters:

type (Literal['total', 'byterm', 'term'] | None (default: None)) –
How finely the fraction of explained variance should be broken down.
- total: Returns the total fraction of explained variance.
- byterm: Returns the fraction of explained variance for each additive term.
- term: Returns the fraction of explained variance for each component (e.g. factor) of the given term.
Defaults to term if the model has only one additive term, byterm otherwise.
ordered (bool (default: False)) – Whether to sort the returned dataframes by explained variance (highest to lowest, per group and view). Has no effect for type="total".
term (str | None (default: None)) – The name of the additive term for type="term". Can be None if the model has only one term.

Return type:

DataFrame
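The defaulting rule for type and the condition under which term may be omitted can be written out as follows. This is a sketch of the documented rules only: `resolve_r2_request` is a hypothetical helper, and raising ValueError for an invalid combination is an assumption about behaviour, not something the documentation states.

```python
def resolve_r2_request(n_terms, r2_type=None, term=None):
    """Apply the documented defaults for get_r2's `type` and `term`.

    Hypothetical helper; `r2_type` stands for the `type` parameter.
    """
    if r2_type is None:
        # Single-term models default to per-component output.
        r2_type = "term" if n_terms == 1 else "byterm"
    if r2_type not in ("total", "byterm", "term"):
        raise ValueError(f"unknown type: {r2_type!r}")
    if r2_type == "term" and term is None and n_terms != 1:
        # Assumption: with several terms, the term must be named explicitly.
        raise ValueError("term is required when the model has several terms")
    return r2_type, term


single = resolve_r2_request(n_terms=1)
several = resolve_r2_request(n_terms=3)
named = resolve_r2_request(n_terms=3, r2_type="term", term="factors")
print(single, several, named)
```

Here "factors" is a made-up term name used only for the example.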

impute_data(data, missing_only=False)

Impute values in the training data using the trained factorization.

The data will be transformed into a space compatible with model predictions. Usually that involves shifting and/or scaling, e.g. Gaussian data will be mean-centered and scaled to unit variance. This also implies that only dense matrices can be returned. Be aware that this can result in high memory consumption.
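To make the Gaussian case concrete, mean-centering and scaling to unit variance for a single feature can be sketched as below. This is not the library's code: `standardize` is a hypothetical helper, missing values are represented by None, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Mean-center and scale one feature to unit variance.

    Missing entries (None) are skipped when computing the statistics
    and are passed through unchanged.
    """
    present = [v for v in values if v is not None]
    mean = fmean(present)
    sd = pstdev(present, mu=mean)  # assumption: population std
    if sd == 0:
        raise ValueError("constant feature cannot be scaled to unit variance")
    return [None if v is None else (v - mean) / sd for v in values]


z = standardize([1.0, None, 2.0, 4.0])
observed = [v for v in z if v is not None]
print(observed)
```

Because every entry of the output is a shifted and scaled value, zeros in a sparse input generally become non-zero, which is why the result has to be dense.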

Parameters:

data (MuData | Mapping[str, Mapping[str, AnnData]] | AnnData) – The data the model was trained on.
missing_only (bool (default: False)) – Whether to impute only the missing values in the data.

Return type:

dict[str, dict[str, AnnData]]

Returns:

Nested dictionary of AnnData objects with either fully imputed data or with only the missing values filled in.
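A nested result of this shape can be walked with two loops. The sketch below uses a hypothetical helper, `iter_nested`, and plain strings in place of AnnData objects. The key order (group names outside, view names inside) is an assumption carried over from the nested input format accepted by fit.

```python
def iter_nested(nested):
    """Yield (outer_key, inner_key, value) for a two-level dictionary."""
    for outer_key, inner in nested.items():
        for inner_key, value in inner.items():
            yield outer_key, inner_key, value


# Strings stand in for the AnnData objects a real call would return.
imputed = {
    "group1": {"rna": "adata_g1_rna", "atac": "adata_g1_atac"},
    "group2": {"rna": "adata_g2_rna"},
}
entries = list(iter_nested(imputed))
for group, view, adata in entries:
    print(group, view, adata)
```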

classmethod load(path, map_location=None)

Load a saved MOFAFLEX model.

Parameters:

path (str | Path) – Path to the saved model file.
map_location (default: None) – Specify how to remap storage locations for PyTorch tensors. See the torch.load documentation for details.

Return type:

MOFAFLEX
