Getting started with MOFA-FLEX

Getting started with MOFA-FLEX#

MOFA-FLEX is a framework for factor analysis of multimodal data, with a focus on single-cell omics, representing an observed data matrix \(\mat{Y}\) as a product of a low-rank factor matrix \(\mat{Z}\) and a low-rank weight matrix \(\mat{W}\): \(\mat{Y} \approx \mat{Z} \mat{W}\). It is essentially a synthesis of MOFA[AAB+20, AVA+18], MEFISTO[VBA+22], MuVI[QB23], nonnegative matrix factorization, and NSF[TE23]. It natively supports MuData and AnnData objects, integrating into the wider scverse ecosystem.

For this notebook, we will use the pbmc3k dataset by 10x Genomics, conveniently packaged in the mudatasets package. The dataset contains 10 000 single cells profiled with the 10x multiome assay, capturing gene expression (scRNAseq) and chromatin accessibility (scATACseq).

import mudata as md
import mudatasets as mds
import muon as mu
import mofaflex as mfl
import scanpy as sc
from plotnine import *
Importing the dtw module. When using in academic works please cite:
  T. Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package.
  J. Stat. Soft., doi:10.18637/jss.v031.i07.
theme_set(theme_bw())
md.set_options(display_style="html", display_html_expand=0);
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)
mdata = mds.load("pbmc3k_multiome")
■ File filtered_feature_bc_matrix.h5 from pbmc3k_multiome has been found at /home/kats/mudatasets/pbmc3k_multiome/filtered_feature_bc_matrix.h5
■ Checksum is validated (md5) for filtered_feature_bc_matrix.h5
■ Loading filtered_feature_bc_matrix.h5...
Added `interval` annotation for features from /home/kats/mudatasets/pbmc3k_multiome/filtered_feature_bc_matrix.h5
mdata.var_names_make_unique()
mdata
MuData object 2711 obs × 134920 var in 2 modalities
Metadata
.obs0 elements
No metadata
Embeddings & mappings
.obsm2 elements
rna bool numpy.ndarray
atac bool numpy.ndarray
Distances
.obsp0 elements
No distances
rna
2711 × 36601
AnnData object 2711 obs × 36601 var
Matrix
.X
float32    scipy.sparse._csr.csr_matrix
Layers
.layers0 elements
No layers
Metadata
.obs0 elements
No metadata
Embeddings
.obsm0 elements
No embeddings
Distances
.obsp0 elements
No distances
Miscellaneous
.uns0 elements
No miscellaneous
atac
2711 × 98319
AnnData object 2711 obs × 98319 var
Matrix
.X
float32    scipy.sparse._csr.csr_matrix
Layers
.layers0 elements
No layers
Metadata
.obs0 elements
No metadata
Embeddings
.obsm0 elements
No embeddings
Distances
.obsp0 elements
No distances
Miscellaneous
.uns0 elements
No miscellaneous

Preprocessing#

We first perform the basic preprocessing steps outlined in the muon tutorial to remove undetected genes and poor-quality cells.

rna = mdata["rna"]
rna.var["mt"] = rna.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(
    rna, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)
mu.pp.filter_var(rna, "n_cells_by_counts", lambda x: x >= 3)
mu.pp.filter_obs(rna, "n_genes_by_counts", lambda x: (x >= 200) & (x < 5000))

mu.pp.filter_obs(rna, "total_counts", lambda x: x < 15000)
mu.pp.filter_obs(rna, "pct_counts_mt", lambda x: x < 20)

atac = mdata.mod["atac"]
sc.pp.calculate_qc_metrics(atac, percent_top=None, log1p=False, inplace=True)
mu.pp.filter_var(atac, "n_cells_by_counts", lambda x: x >= 10)
mu.pp.filter_obs(atac, "n_genes_by_counts", lambda x: (x >= 2000) & (x <= 15000))
mu.pp.filter_obs(atac, "total_counts", lambda x: (x >= 4000) & (x <= 40000))

We further normalize the data and determine highly variable genes. MOFA-FLEX will automatically use only the highly variable genes for the analysis.

sc.pp.normalize_total(rna, target_sum=1e4)
sc.pp.log1p(rna)
sc.pp.highly_variable_genes(rna, min_mean=0.02, max_mean=4, min_disp=0.5)

mu.atac.pp.tfidf(atac, scale_factor=1e4)
sc.pp.normalize_per_cell(atac, counts_per_cell_after=1e4)
sc.pp.log1p(atac)
sc.pp.highly_variable_genes(atac, min_mean=0.05, max_mean=1.5, min_disp=0.5)

mdata.update()
mdata
MuData object 2695 obs × 119335 var in 2 modalities
Metadata
.obs7 elements
rna:n_genes_by_counts float64 2272.00,3254.00,1798.00,1145.00,1495.00,1342.00,25...
rna:total_counts float32 4747.00,7761.00,3666.00,2162.00,2910.00,2616.00,66...
rna:total_counts_mt float32 369.00,693.00,409.00,271.00,293.00,213.00,308.00,6...
rna:pct_counts_mt float32 7.77,8.93,11.16,12.53,10.07,8.14,4.62,12.36,10.79,...
atac:n_genes_by_counts float64 5415.00,9572.00,9459.00,6634.00,7107.00,7378.00,56...
atac:total_counts float32 12757.00,23124.00,22912.00,17106.00,18400.00,18723...
atac:n_counts float32 6129.60,10004.25,9936.28,6779.99,6849.25,7125.32,5...
Embeddings & mappings
.obsm2 elements
rna bool numpy.ndarray
atac bool numpy.ndarray
Distances
.obsp0 elements
No distances
rna
2636 × 21256
AnnData object 2636 obs × 21256 var
Matrix
.X
float32    scipy.sparse._csr.csr_matrix
Layers
.layers0 elements
No layers
Metadata
.obs4 elements
n_genes_by_counts int32 2272,3254,1798,1145,1495,1342,2589,2385,3009,3320,...
total_counts float32 4747.00,7761.00,3666.00,2162.00,2910.00,2616.00,66...
total_counts_mt float32 369.00,693.00,409.00,271.00,293.00,213.00,308.00,6...
pct_counts_mt float32 7.77,8.93,11.16,12.53,10.07,8.14,4.62,12.36,10.79,...
Embeddings
.obsm0 elements
No embeddings
Distances
.obsp0 elements
No distances
Miscellaneous
.uns2 elements
log1p dict 1 element base: None
hvg dict 1 element flavor: seurat
atac
2450 × 98079
AnnData object 2450 obs × 98079 var
Matrix
.X
float32    scipy.sparse._csr.csr_matrix
Layers
.layers0 elements
No layers
Metadata
.obs3 elements
n_genes_by_counts int32 5415,9572,9459,6634,7107,7378,5654,5212,13768,1387...
total_counts float32 12757.00,23124.00,22912.00,17106.00,18400.00,18723...
n_counts float32 6129.60,10004.25,9936.28,6779.99,6849.25,7125.32,5...
Embeddings
.obsm0 elements
No embeddings
Distances
.obsp0 elements
No distances
Miscellaneous
.uns2 elements
log1p dict 1 element base: None
hvg dict 1 element flavor: seurat

Fitting a model#

Creating a MOFAFLEX object automatically fits the model using the specified parameters. A basic understanding of the model is required to correctly set the parameters for a given dataset.

Since we have normalized the data in the previous section, we will use a Normal (Gaussian) likelihood. In general, a negative binomial likelihood is more appropriate for unnormalized count data. With default settings, MOFA-FLEX will plot the number of observations in each view and group. This can be turned off by customizing mofaflex.DataOptions. MOFA-FLEX will also automatically save the final model after training which can later be loaded with mofaflex.MOFAFLEX.load. The name of the saved file can be changed in mofaflex.TrainingOptions.

model = mfl.MOFAFLEX(
    mdata,
    mfl.ModelOptions(n_factors=15, likelihoods="Normal"),
    mfl.TrainingOptions(batch_size=1000, seed=42),
)
../_images/088814b33d554618679851ce0e27b039136fe62b27cc87f4f61a32e8b2d9c123.png

We can plot the loss curve to get an overview of the training process:

mfl.pl.training_curve(model)

Analysing the model#

The trained model can now be analysed to assess its quality and discover relevant biology. First, we plot the correlations between factors, which should be as low as possible: We want factors to be uncorrelated, as each factor should capture a different aspect of the data.

mfl.pl.factor_correlation(model)

This is not ideal, but good enough for a first look at the data. We can now plot the fraction of variance in the data explained by each factor to determine the most important factors. Note that for non-Normal likelihoods, these values are approximate.

mfl.pl.variance_explained(model)

We can plot the most important genes per factor to get an idea of what each factor represents. This plot aggregates over all views.

mfl.pl.top_weights(model, figsize=(10, 10))

It appears that the most important features are all from the RNA dataset. We can use a different plotting function to get the top weights individually for each view.

mfl.pl.weights(model, factors=(1, 2), figsize=(10, 10))

We can also plot factors agains each other. This may be useful to define clusters of cells with similar factor values.

mfl.pl.factors_scatter(model, 1, 2, alpha=0.5)

Of course, MOFA-FLEX does not and cannot provide all imaginable analysis functions. It thus provides methods to access the factor and weight values, such that they can be used for manual analysis.

weights = model.get_weights()
factors = model.get_factors()

weights["rna"]
ISG15 C1orf159 AL390719.3 TNFRSF18 TNFRSF4 PLCH2 MEGF6 AL365255.1 NPHP4 ACOT7 ... SPANXA2-OT1 AFF2 LINC00893 TMEM185A LINC00894 PDZD4 SLC10A3 MTCP1 BRCC3 VBP1
Factor 1 -0.189376 -0.073787 -0.049348 0.035380 0.047967 -0.091178 -0.025529 -0.001774 0.113424 -0.148712 ... 0.051964 -0.051119 0.001705 -0.005496 0.066848 -0.120905 0.086391 0.095509 0.274914 -0.091386
Factor 2 0.248409 0.102557 0.029819 0.152524 0.329639 -0.084845 0.073064 0.011662 0.051096 -0.057731 ... 0.021905 0.038603 -0.069393 -0.020068 0.007216 -0.039692 0.062904 -0.104711 -0.215993 -0.125060
Factor 3 0.384845 -0.016436 -0.012052 -0.008531 -0.096441 0.020407 -0.150675 -0.048524 0.107885 -0.001884 ... -0.018592 -0.018152 0.035284 -0.175562 -0.021295 -0.021576 -0.067565 0.042839 0.085181 -0.055791
Factor 4 -0.016846 0.079865 0.046001 -0.558960 -0.572362 -0.144281 0.079722 -0.001279 0.103057 -0.087016 ... -0.000845 0.106108 0.020730 0.040423 0.029478 -0.107213 -0.051710 -0.014692 -0.044695 -0.092186
Factor 5 0.044211 0.129531 0.048840 0.229818 0.254385 0.001333 -0.081784 -0.005494 0.128608 -0.127923 ... 0.029484 0.063783 -0.068990 -0.084985 -0.038371 0.116027 -0.022857 0.003913 0.005418 0.097961
Factor 6 -0.618853 0.012128 0.000998 0.137428 -0.043499 0.024533 0.022788 0.098064 -0.003297 0.041672 ... 0.003277 0.039947 -0.060559 -0.013331 -0.037571 0.125552 0.036886 0.033270 0.039675 -0.211510
Factor 7 -0.442458 -0.101479 0.023583 0.021900 -0.025400 0.025268 -0.010805 -0.070370 -0.009152 -0.072582 ... 0.032161 0.029726 0.015703 0.066717 -0.074298 -0.045682 0.017101 0.057714 -0.067829 0.175016
Factor 8 -0.055849 -0.068473 -0.042274 -0.050389 0.022151 -0.007174 0.024929 0.053787 -0.038904 0.020470 ... -0.038868 0.070869 -0.020878 0.079161 0.106335 -0.110068 -0.017513 0.049150 0.053820 -0.074065
Factor 9 0.847176 -0.144409 -0.082529 -0.125883 -0.021260 0.076999 0.086520 -0.044230 0.058836 0.120648 ... 0.062515 0.004620 0.024979 0.045098 0.093637 0.085579 0.036728 -0.073276 -0.049939 -0.045169
Factor 10 0.260658 -0.018242 -0.048973 0.095457 -0.118889 0.128337 0.097937 0.013919 -0.051902 -0.046666 ... -0.019701 0.003984 -0.087492 0.046393 0.042418 -0.085316 0.038125 0.041113 0.045989 0.145492
Factor 11 -0.339596 0.035699 0.036781 -0.028903 -0.294902 0.033796 0.027752 -0.074360 0.002477 0.172775 ... -0.018687 0.141971 0.082159 0.021479 0.253133 -0.051885 -0.067781 -0.044705 0.005597 0.129108
Factor 12 0.140051 -0.023272 0.053881 -0.398822 -0.443519 -0.104664 0.022436 0.003456 0.010753 -0.123234 ... 0.005473 0.047491 -0.030022 0.053311 -0.077040 0.153919 -0.029247 -0.129744 -0.092596 -0.098910
Factor 13 0.008263 -0.002270 0.098201 0.249155 0.543481 -0.062938 0.084722 -0.030612 0.052744 0.031904 ... -0.021037 0.073529 -0.029792 -0.118077 -0.062912 -0.022290 -0.148696 -0.148372 -0.096425 0.122402
Factor 14 1.380962 -0.087727 0.009225 0.235166 -0.039743 0.009549 -0.131573 0.023868 0.013604 -0.046723 ... -0.036650 -0.089463 0.005698 0.111561 -0.090331 0.049375 0.033765 -0.003689 -0.055969 0.053556
Factor 15 -1.152911 0.025368 0.005462 -0.023505 -0.274491 0.015588 -0.015119 -0.011525 0.011900 -0.101080 ... -0.039308 -0.086269 0.002518 0.082119 -0.133889 -0.046957 -0.097321 -0.010911 -0.036072 -0.400827

15 rows × 3718 columns

weights["atac"]
chr1:267561-268455 chr1:629484-630393 chr1:778284-779202 chr1:844149-845034 chr1:857873-858632 chr1:923392-924241 chr1:958865-959762 chr1:1012993-1013910 chr1:1040399-1041292 chr1:1068892-1069685 ... GL000205.2:67744-68642 GL000219.1:39937-40840 GL000219.1:44650-45512 GL000219.1:45733-46550 GL000219.1:90065-90960 GL000219.1:125017-125889 KI270721.1:2089-2980 KI270726.1:27153-28037 KI270726.1:41489-42329 KI270713.1:29578-30400
Factor 1 0.015588 -0.068973 -0.323210 -0.075846 -0.078673 -0.043594 -0.082744 -0.230949 -0.137500 -0.120305 ... 0.192618 0.039591 -0.136961 -0.108797 0.024091 -0.132526 -0.083251 -0.106187 0.003338 -0.140618
Factor 2 0.010297 0.171070 -0.272004 -0.182100 -0.033910 0.060713 -0.279495 -0.237344 -0.373046 -0.010249 ... 0.278677 -0.176570 -0.259601 -0.642847 -0.032023 -0.105531 -0.058108 -0.053535 -0.071863 0.093508
Factor 3 -0.028898 0.051319 0.071390 -0.293090 -0.084974 -0.074341 -0.122872 -0.012087 -0.311388 -0.116551 ... 0.213033 -0.107144 -0.029083 -0.057040 -0.043446 0.101682 0.070688 -0.050273 0.063659 -0.128200
Factor 4 0.067406 -0.064552 -0.144720 0.222972 0.198841 -0.117017 -0.030485 0.141264 0.137078 0.002803 ... -0.195972 -0.159258 0.034300 -0.262307 -0.124259 -0.005265 0.026266 0.027857 -0.010228 0.033526
Factor 5 0.134998 -0.118389 -0.048504 -0.033692 -0.094288 -0.105451 -0.134580 0.170727 -0.105009 -0.052774 ... -0.299938 -0.023304 -0.075331 0.213807 0.079724 -0.047202 0.067408 -0.105169 -0.095975 -0.008091
Factor 6 -0.094902 0.175581 0.199167 0.130938 0.128411 -0.070910 0.290823 -0.085090 0.139106 0.375201 ... -0.055348 0.008064 0.035890 0.025920 -0.017034 -0.139985 -0.021245 0.011567 0.146342 0.039091
Factor 7 -0.029667 0.232662 0.137632 -0.034105 0.026049 0.141460 -0.023896 0.103624 0.080493 0.355293 ... 0.093679 0.253191 0.091128 -0.129581 0.027496 0.098308 -0.037056 0.088649 0.149442 -0.027713
Factor 8 0.105609 0.116295 0.127468 0.013525 0.036117 0.050187 -0.022508 -0.186174 -0.188627 -0.149945 ... -0.042306 -0.212531 0.053144 -0.260661 -0.238006 0.011654 -0.002434 -0.004869 -0.002524 0.011428
Factor 9 -0.088714 -0.236125 -0.034857 -0.275669 -0.199103 -0.208306 -0.173522 -0.175142 -0.361846 -0.388860 ... -0.062702 0.381349 0.329235 0.350945 0.091241 0.032202 -0.146296 -0.041344 -0.130320 -0.230856
Factor 10 0.003886 0.182097 0.042181 0.142671 0.075899 0.056084 0.086588 0.131228 0.225935 0.243304 ... 0.153632 -0.270489 -0.095530 -0.105563 -0.187800 0.022895 -0.096935 -0.006983 0.029871 0.032132
Factor 11 -0.040994 -0.144084 -0.171954 -0.104147 0.027021 -0.026749 -0.303967 -0.186920 -0.241965 -0.203547 ... 0.040131 -0.203556 -0.029991 0.042943 -0.209069 0.003288 -0.004122 0.006341 -0.013852 -0.120541
Factor 12 0.057618 0.117719 -0.108942 -0.091153 -0.023501 -0.011017 -0.033363 -0.139303 -0.102051 0.049994 ... 0.412098 0.178078 0.142233 0.369375 0.073314 0.032400 0.120729 -0.055238 -0.183104 -0.080130
Factor 13 -0.156427 -0.054138 0.163906 0.073237 0.101645 0.183813 0.142219 0.094903 -0.288277 0.467657 ... 0.027979 0.187276 -0.064292 -0.197571 0.028351 -0.001363 0.196256 0.097051 -0.038267 -0.153199
Factor 14 -0.070237 -0.079855 -0.091271 -0.036744 -0.044658 0.030472 -0.194341 0.209160 -0.318201 -0.377731 ... -0.011044 -0.237870 -0.114371 -0.183502 -0.053019 -0.095012 -0.068152 -0.062755 -0.023839 0.033481
Factor 15 -0.034799 0.015444 0.088015 0.141185 -0.030220 0.149655 0.039116 -0.162965 -0.071959 -0.050679 ... -0.063380 -0.320987 -0.064431 -0.133580 -0.006968 -0.118564 0.050569 0.075011 0.030030 -0.033265

15 rows × 20985 columns

factors["group_1"]
Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6 Factor 7 Factor 8 Factor 9 Factor 10 Factor 11 Factor 12 Factor 13 Factor 14 Factor 15
AAACAGCCAAATATCC-1 -0.711329 -0.138095 -0.117859 -0.758095 0.488893 0.365223 -0.317043 -0.150380 0.091710 0.352163 0.347944 0.072361 -0.308070 0.369941 -0.224889
AAACAGCCAGGAACTG-1 0.200870 0.226601 0.103635 0.109878 -0.087861 -0.279875 -0.261920 0.095023 0.175019 -0.056246 0.125093 0.110894 0.007098 0.348968 -0.031934
AAACAGCCAGGCTTCG-1 0.386977 0.143410 0.149321 0.150196 0.409791 -0.082680 -0.210290 0.268729 0.126539 -0.305792 0.150471 0.006699 -0.195209 -0.029911 0.360625
AAACCAACACCTGCTC-1 0.076602 0.108646 0.310807 -0.504941 -0.660438 -0.048514 0.184567 0.005226 -0.126196 0.157478 0.201867 0.261932 0.003733 -0.281539 0.241265
AAACCAACAGATTCAT-1 -0.460528 -0.714788 -0.103196 -0.712085 0.181718 0.024262 -0.154573 0.017619 0.033718 0.131182 0.076497 0.016913 -0.226345 0.206774 0.051490
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TGTTGTGCATTGTGTG-1 -0.124523 0.017649 -0.219138 0.317680 -0.174649 0.291931 0.288152 0.005809 0.000549 0.103788 0.006043 -0.046411 0.079992 0.115161 0.123362
TTCCCACAGGCGAATA-1 0.180360 0.090379 0.226677 0.074547 0.022169 -0.193567 -0.180897 0.119575 0.107953 -0.082141 0.144209 0.016051 -0.054936 0.239553 0.037676
TTGCTTAGTCCTTTAA-1 0.156793 0.047492 0.187003 0.003541 0.134676 0.015608 -0.149397 0.134034 0.051219 -0.063679 0.030517 0.004928 -0.028842 0.089295 0.124001
TTGTCCGGTGCATTAG-1 0.192446 0.149734 0.154785 0.046855 0.088373 -0.134648 -0.230934 0.164872 0.063290 -0.153682 0.181172 0.039701 -0.090489 0.129575 0.078037
TTTGTTGGTAGGTTTG-1 0.174301 0.196018 0.306593 -0.232571 -0.284522 0.040452 0.124159 -0.008937 -0.192886 0.019016 0.023248 0.296303 -0.117336 -0.118165 -0.064181

2695 rows × 15 columns