mofaflex.MOFAFLEX

class mofaflex.MOFAFLEX(**kwargs)

The MOFA-FLEX model.

This class is not meant to be instantiated by the user. Rather, it is created by instantiating a term.

Attributes table

  • feature_names – Feature names for each view.
  • group_names – Group names.
  • likelihoods – The likelihoods.
  • n_features – Number of features in each view.
  • n_features_total – Total number of features.
  • n_groups – Number of groups.
  • n_samples – Number of samples in each group.
  • n_samples_total – Total number of samples.
  • n_terms – Number of additive terms.
  • n_views – Number of views.
  • sample_names – Sample names for each group.
  • terms – The additive terms.
  • training_loss – Total loss (negative ELBO) for each training epoch.
  • view_names – View names.

Methods table

  • fit(data, *[, likelihoods, group_by, layer, ...]) – Fit the model using the provided data.
  • get_dispersion([moment]) – Get the dispersion vectors for each view.
  • get_r2([type, ordered, term]) – Get the fraction of explained variance for each view and group.
  • impute_data(data[, missing_only]) – Impute values in the training data using the trained factorization.
  • load(path[, map_location]) – Load a saved MOFAFLEX model.

Attributes

MOFAFLEX.feature_names

Feature names for each view.

MOFAFLEX.group_names

Group names.

MOFAFLEX.likelihoods

The likelihoods.

MOFAFLEX.n_features

Number of features in each view.

MOFAFLEX.n_features_total

Total number of features.

MOFAFLEX.n_groups

Number of groups.

MOFAFLEX.n_samples

Number of samples in each group.

MOFAFLEX.n_samples_total

Total number of samples.

MOFAFLEX.n_terms

Number of additive terms.

MOFAFLEX.n_views

Number of views.

MOFAFLEX.sample_names

Sample names for each group.

MOFAFLEX.terms

The additive terms.

MOFAFLEX.training_loss

Total loss (negative ELBO) for each training epoch.

MOFAFLEX.view_names

View names.

Methods

MOFAFLEX.fit(data, *, likelihoods=None, group_by=None, layer=None, use_obs='union', use_var='union', subset_var='highly_variable', plot_data_overview=True, remove_constant_features=True, device='cuda', batch_size=0, max_epochs=10000, lr=0.001, early_stopper_patience=100, save_path=None, seed=None, num_workers=0, pin_memory=False, n_particles=1)

Fit the model using the provided data.

Parameters:
  • data (MuData | Mapping[str, Mapping[str, AnnData]] | AnnData) –

    Can be any of:

    • MuData object

    • AnnData object

    • Nested dict with group names as keys, view names as subkeys and AnnData objects as values (incompatible with group_by)

  • likelihoods (Union[Mapping[str, Union[Literal['Bernoulli', 'NegativeBinomial', 'Normal'], Likelihood]], Literal['Bernoulli', 'NegativeBinomial', 'Normal'], Likelihood, None] (default: None)) – Data likelihoods for each view (if dict) or for all views (if str or Likelihood). Inferred automatically if None.

  • group_by (str | Sequence[str] | None (default: None)) – Columns of .obs in MuData or AnnData objects to group data by. Ignored if the input data is not a MuData or AnnData object.

  • layer (Mapping[str, str | None] | Mapping[str, Mapping[str, str | None]] | str | None (default: None)) – Which layer to use. If None, the .X element will be used. If str, the same layer will be used for all groups and views. If a dict of strings, the keys must correspond to view names and the values to layers. If a nested dict, different layers can be used for each combination of group and view. The last format is only accepted if the data is a nested dictionary of AnnData objects.

  • use_obs (Literal['union', 'intersection'] (default: 'union')) – How to align observations across views. Ignored if the data is not a nested dict of AnnData objects.

  • use_var (Literal['union', 'intersection'] (default: 'union')) – How to align variables across groups. Ignored if the data is not a nested dict of AnnData objects.

  • subset_var (str | None (default: 'highly_variable')) – .var column with boolean values to select features.

  • plot_data_overview (bool (default: True)) – Plot data overview.

  • remove_constant_features (bool (default: True)) – Remove constant features from the data.

  • device (str | device (default: 'cuda')) – Device to run training on.

  • batch_size (int (default: 0)) – Batch size.

  • max_epochs (int (default: 10000)) – Maximum number of training epochs.

  • lr (float (default: 0.001)) – Learning rate.

  • early_stopper_patience (int (default: 100)) – Number of steps without relevant improvement after which training is stopped.

  • save_path (Path | str | None (default: None)) – Path to save model.

  • seed (int | None (default: None)) – Seed for the pseudorandom number generator.

  • num_workers (int (default: 0)) – Number of data loader workers.

  • pin_memory (bool (default: False)) – Whether to use pinned memory in the data loader.

  • n_particles (int (default: 1)) – Number of particles for ELBO estimation.
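
The use_obs and use_var options choose between the union and the intersection of names. The following is a minimal sketch of that difference using plain lists of sample names. The helper align_names and the view names are hypothetical and only illustrate the concept; this is not the library's implementation.

```python
def align_names(names_per_view, how="union"):
    """Combine names from several views, keeping first-seen order.

    Hypothetical helper for illustration only.
    """
    if how not in ("union", "intersection"):
        raise ValueError("how must be 'union' or 'intersection'")
    if not names_per_view:
        return []

    # Collect every name once, in the order it is first encountered.
    ordered, seen = [], set()
    for names in names_per_view.values():
        for name in names:
            if name not in seen:
                seen.add(name)
                ordered.append(name)

    if how == "union":
        return ordered

    # Keep only the names that occur in every view.
    common = set.intersection(*(set(v) for v in names_per_view.values()))
    return [name for name in ordered if name in common]


views = {"rna": ["s1", "s2", "s3"], "protein": ["s2", "s3", "s4"]}
union_names = align_names(views, "union")                # ['s1', 's2', 's3', 's4']
intersection_names = align_names(views, "intersection")  # ['s2', 's3']
```

With 'union', a sample that is missing from one view is still kept, so that view has missing values for it. With 'intersection', only samples present in all views remain.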

MOFAFLEX.get_dispersion(moment='mean')

Get the dispersion vectors for each view.

Parameters:

moment (Literal['mean', 'std'] (default: 'mean')) – Which moment of the posterior distribution to return.

Return type:

dict[str, Series]

MOFAFLEX.get_r2(type=None, ordered=False, term=None)

Get the fraction of explained variance for each view and group.

Parameters:
  • type (Optional[Literal['total', 'byterm', 'term']] (default: None)) –

    How fine-grained the fraction of explained variance should be split up.

    • total: Returns the total fraction of explained variance.

    • byterm: Returns the fraction of explained variance for each additive term.

    • term: Returns the fraction of explained variance for each component (e.g. factor) of the given term.

    Defaults to term if the model has only one additive term, byterm otherwise.

  • ordered (bool (default: False)) – Whether to sort the returned dataframes by explained variance (highest to lowest, per group and view). Has no effect for type="total".

  • term (str | None (default: None)) – The name of the additive term for type="term". Can be None if the model has only one term.

Return type:

DataFrame
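
To illustrate what a total and a per-component fraction of explained variance look like, the sketch below uses one common definition, 1 - SS_res / SS_tot on centered data, for a toy factorization. This is an assumption: the exact formula used by get_r2 is not stated here, and the helpers reconstruct and r2 are hypothetical.

```python
def reconstruct(z, w, factors):
    """Predict data from the selected factors: y[i][j] = sum_k z[i][k] * w[k][j]."""
    n_features = len(w[0])
    return [[sum(row[k] * w[k][j] for k in factors) for j in range(n_features)]
            for row in z]


def r2(observed, predicted):
    """Fraction of explained variance, assuming `observed` is already centered."""
    ss_res = sum((o - p) ** 2
                 for row_o, row_p in zip(observed, predicted)
                 for o, p in zip(row_o, row_p))
    ss_tot = sum(o ** 2 for row in observed for o in row)
    return 1.0 - ss_res / ss_tot


# Toy example: 4 samples, 2 features, 2 factors, no noise.
z = [[1.0, 0.5], [-1.0, 0.5], [1.0, -0.5], [-1.0, -0.5]]  # factor values
w = [[2.0, 0.0], [0.0, 1.0]]                              # factor weights
y = reconstruct(z, w, [0, 1])                             # centered "observed" data

r2_total = r2(y, reconstruct(z, w, [0, 1]))                  # all factors together
r2_per_factor = {k: r2(y, reconstruct(z, w, [k])) for k in range(2)}

# Sorting from highest to lowest mirrors the idea behind ordered=True.
r2_ordered = sorted(r2_per_factor.items(), key=lambda kv: kv[1], reverse=True)
```

In this toy case factor 0 explains 16/17 of the variance and factor 1 explains 1/17. The two values add up to the total only because the factors are orthogonal here.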

MOFAFLEX.impute_data(data, missing_only=False)

Impute values in the training data using the trained factorization.

The data will be transformed into a space compatible with model predictions. Usually that involves shifting and/or scaling, e.g. Gaussian data will be mean-centered and scaled to unit variance. This also implies that only dense matrices can be returned. Be aware that this can result in high memory consumption.

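Parameters:
  • data – The training data.

  • missing_only (bool (default: False)) – Whether to fill in only the missing values instead of returning fully imputed data.
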
Return type:

dict[str, dict[str, AnnData]]

Returns:

Nested dictionary of AnnData objects with either fully imputed data or with only the missing values filled in.
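
The difference between the two return modes can be sketched with plain nested lists. The helper impute below is hypothetical and only illustrates the assumed meaning of missing_only. It does not perform the shifting and scaling described above, and it is not the library's implementation.

```python
import math


def impute(observed, predicted, missing_only=False):
    """Return imputed values for a matrix given model predictions.

    Hypothetical helper: missing entries in `observed` are encoded as NaN.
    """
    if not missing_only:
        # Fully imputed data: every entry comes from the model prediction.
        return [row[:] for row in predicted]
    # Keep observed entries and fill only the missing ones.
    return [[p if math.isnan(o) else o for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(observed, predicted)]


nan = math.nan
observed = [[1.0, nan], [nan, 4.0]]
predicted = [[0.9, 2.1], [2.9, 4.2]]

filled = impute(observed, predicted, missing_only=True)   # [[1.0, 2.1], [2.9, 4.0]]
full = impute(observed, predicted, missing_only=False)    # [[0.9, 2.1], [2.9, 4.2]]
```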

classmethod MOFAFLEX.load(path, map_location=None)

Load a saved MOFAFLEX model.

Parameters:
  • path (str | Path) – Path to the saved model file.

  • map_location (default: None) – Specify how to remap storage locations for PyTorch tensors. See the torch.load documentation for details.

Return type:

MOFAFLEX