Getting started with MOFA-FLEX#
MOFA-FLEX is a framework for factor analysis of multimodal data, with a focus on single-cell omics, representing an observed data matrix \(\mat{Y}\) as a product of a low-rank factor matrix \(\mat{Z}\) and a low-rank weight matrix \(\mat{W}\): \(\mat{Y} \approx \mat{Z} \mat{W}\). It is essentially a synthesis of MOFA[AAB+20, AVA+18], MEFISTO[VBA+22], MuVI[QB23], nonnegative matrix factorization, and NSF[TE23]. It natively supports MuData and AnnData objects, integrating into the wider scverse ecosystem.
For this notebook, we will use the pbmc3k dataset by 10x Genomics, conveniently packaged in the mudatasets package.
The dataset contains 10 000 single cells profiled with the 10x multiome assay, capturing gene expression (scRNAseq) and chromatin accessibility (scATACseq).
import mudata as md
import mudatasets as mds
import muon as mu
import mofaflex as mfl
import scanpy as sc
from plotnine import *
Importing the dtw module. When using in academic works please cite:
T. Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package.
J. Stat. Soft., doi:10.18637/jss.v031.i07.
theme_set(theme_bw())
md.set_options(display_style="html", display_html_expand=0);
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
mdata = mds.load("pbmc3k_multiome")
■ File filtered_feature_bc_matrix.h5 from pbmc3k_multiome has been found at /home/kats/mudatasets/pbmc3k_multiome/filtered_feature_bc_matrix.h5
■ Checksum is validated (md5) for filtered_feature_bc_matrix.h5
■ Loading filtered_feature_bc_matrix.h5...
Added `interval` annotation for features from /home/kats/mudatasets/pbmc3k_multiome/filtered_feature_bc_matrix.h5
mdata.var_names_make_unique()
mdata
Metadata.obs0 elements
No metadataEmbeddings & mappings.obsm2 elements
| rna | bool | numpy.ndarray | |
| atac | bool | numpy.ndarray |
Distances.obsp0 elements
No distancesrna2711 × 36601
AnnData object 2711 obs × 36601 varLayers.layers0 elements
No layersMetadata.obs0 elements
No metadataEmbeddings.obsm0 elements
No embeddingsDistances.obsp0 elements
No distancesMiscellaneous.uns0 elements
No miscellaneousatac2711 × 98319
AnnData object 2711 obs × 98319 varLayers.layers0 elements
No layersMetadata.obs0 elements
No metadataEmbeddings.obsm0 elements
No embeddingsDistances.obsp0 elements
No distancesMiscellaneous.uns0 elements
No miscellaneousPreprocessing#
We first perform the basic preprocessing steps outlined in the muon tutorial to remove undetected genes and poor-quality cells.
rna = mdata["rna"]
rna.var["mt"] = rna.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(
rna, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)
mu.pp.filter_var(rna, "n_cells_by_counts", lambda x: x >= 3)
mu.pp.filter_obs(rna, "n_genes_by_counts", lambda x: (x >= 200) & (x < 5000))
mu.pp.filter_obs(rna, "total_counts", lambda x: x < 15000)
mu.pp.filter_obs(rna, "pct_counts_mt", lambda x: x < 20)
atac = mdata.mod["atac"]
sc.pp.calculate_qc_metrics(atac, percent_top=None, log1p=False, inplace=True)
mu.pp.filter_var(atac, "n_cells_by_counts", lambda x: x >= 10)
mu.pp.filter_obs(atac, "n_genes_by_counts", lambda x: (x >= 2000) & (x <= 15000))
mu.pp.filter_obs(atac, "total_counts", lambda x: (x >= 4000) & (x <= 40000))
We further normalize the data and determine highly variable genes. MOFA-FLEX will automatically use only the highly variable genes for the analysis.
sc.pp.normalize_total(rna, target_sum=1e4)
sc.pp.log1p(rna)
sc.pp.highly_variable_genes(rna, min_mean=0.02, max_mean=4, min_disp=0.5)
mu.atac.pp.tfidf(atac, scale_factor=1e4)
sc.pp.normalize_per_cell(atac, counts_per_cell_after=1e4)
sc.pp.log1p(atac)
sc.pp.highly_variable_genes(atac, min_mean=0.05, max_mean=1.5, min_disp=0.5)
mdata.update()
mdata
Metadata.obs7 elements
| rna:n_genes_by_counts | float64 | 2272.00,3254.00,1798.00,1145.00,1495.00,1342.00,25... |
| rna:total_counts | float32 | 4747.00,7761.00,3666.00,2162.00,2910.00,2616.00,66... |
| rna:total_counts_mt | float32 | 369.00,693.00,409.00,271.00,293.00,213.00,308.00,6... |
| rna:pct_counts_mt | float32 | 7.77,8.93,11.16,12.53,10.07,8.14,4.62,12.36,10.79,... |
| atac:n_genes_by_counts | float64 | 5415.00,9572.00,9459.00,6634.00,7107.00,7378.00,56... |
| atac:total_counts | float32 | 12757.00,23124.00,22912.00,17106.00,18400.00,18723... |
| atac:n_counts | float32 | 6129.60,10004.25,9936.28,6779.99,6849.25,7125.32,5... |
Embeddings & mappings.obsm2 elements
| rna | bool | numpy.ndarray | |
| atac | bool | numpy.ndarray |
Distances.obsp0 elements
No distancesrna2636 × 21256
AnnData object 2636 obs × 21256 varLayers.layers0 elements
No layersMetadata.obs4 elements
| n_genes_by_counts | int32 | 2272,3254,1798,1145,1495,1342,2589,2385,3009,3320,... |
| total_counts | float32 | 4747.00,7761.00,3666.00,2162.00,2910.00,2616.00,66... |
| total_counts_mt | float32 | 369.00,693.00,409.00,271.00,293.00,213.00,308.00,6... |
| pct_counts_mt | float32 | 7.77,8.93,11.16,12.53,10.07,8.14,4.62,12.36,10.79,... |
Embeddings.obsm0 elements
No embeddingsDistances.obsp0 elements
No distancesMiscellaneous.uns2 elements
| log1p | dict | 1 element | base: None |
| hvg | dict | 1 element | flavor: seurat |
atac2450 × 98079
AnnData object 2450 obs × 98079 varLayers.layers0 elements
No layersMetadata.obs3 elements
| n_genes_by_counts | int32 | 5415,9572,9459,6634,7107,7378,5654,5212,13768,1387... |
| total_counts | float32 | 12757.00,23124.00,22912.00,17106.00,18400.00,18723... |
| n_counts | float32 | 6129.60,10004.25,9936.28,6779.99,6849.25,7125.32,5... |
Embeddings.obsm0 elements
No embeddingsDistances.obsp0 elements
No distancesMiscellaneous.uns2 elements
| log1p | dict | 1 element | base: None |
| hvg | dict | 1 element | flavor: seurat |
Fitting a model#
Creating a MOFAFLEX object automatically fits the model using the specified parameters. A basic understanding of the model is required to correctly set the parameters for a given dataset.
Since we have normalized the data in the previous section, we will use a Normal (Gaussian) likelihood.
In general, a negative binomial likelihood is more appropriate for unnormalized count data.
With default settings, MOFA-FLEX will plot the number of observations in each view and group.
This can be turned off by customizing mofaflex.DataOptions.
MOFA-FLEX will also automatically save the final model after training which can later be loaded with mofaflex.MOFAFLEX.load.
The name of the saved file can be changed in mofaflex.TrainingOptions.
model = mfl.MOFAFLEX(
mdata,
mfl.ModelOptions(n_factors=15, likelihoods="Normal"),
mfl.TrainingOptions(batch_size=1000, seed=42),
)
We can plot the loss curve to get an overview of the training process:
Analysing the model#
The trained model can now be analysed to assess its quality and discover relevant biology. First, we plot the correlations between factors, which should be as low as possible: We want factors to be uncorrelated, as each factor should capture a different aspect of the data.
This is not ideal, but good enough for a first look at the data. We can now plot the fraction of variance in the data explained by each factor to determine the most important factors. Note that for non-Normal likelihoods, these values are approximate.
We can plot the most important genes per factor to get an idea of what each factor represents. This plot aggregates over all views.
It appears that the most important features are all from the RNA dataset. We can use a different plotting function to get the top weights individually for each view.
We can also plot factors agains each other. This may be useful to define clusters of cells with similar factor values.
Of course, MOFA-FLEX does not and cannot provide all imaginable analysis functions. It thus provides methods to access the factor and weight values, such that they can be used for manual analysis.
weights = model.get_weights()
factors = model.get_factors()
weights["rna"]
| ISG15 | C1orf159 | AL390719.3 | TNFRSF18 | TNFRSF4 | PLCH2 | MEGF6 | AL365255.1 | NPHP4 | ACOT7 | ... | SPANXA2-OT1 | AFF2 | LINC00893 | TMEM185A | LINC00894 | PDZD4 | SLC10A3 | MTCP1 | BRCC3 | VBP1 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Factor 1 | -0.189376 | -0.073787 | -0.049348 | 0.035380 | 0.047967 | -0.091178 | -0.025529 | -0.001774 | 0.113424 | -0.148712 | ... | 0.051964 | -0.051119 | 0.001705 | -0.005496 | 0.066848 | -0.120905 | 0.086391 | 0.095509 | 0.274914 | -0.091386 |
| Factor 2 | 0.248409 | 0.102557 | 0.029819 | 0.152524 | 0.329639 | -0.084845 | 0.073064 | 0.011662 | 0.051096 | -0.057731 | ... | 0.021905 | 0.038603 | -0.069393 | -0.020068 | 0.007216 | -0.039692 | 0.062904 | -0.104711 | -0.215993 | -0.125060 |
| Factor 3 | 0.384845 | -0.016436 | -0.012052 | -0.008531 | -0.096441 | 0.020407 | -0.150675 | -0.048524 | 0.107885 | -0.001884 | ... | -0.018592 | -0.018152 | 0.035284 | -0.175562 | -0.021295 | -0.021576 | -0.067565 | 0.042839 | 0.085181 | -0.055791 |
| Factor 4 | -0.016846 | 0.079865 | 0.046001 | -0.558960 | -0.572362 | -0.144281 | 0.079722 | -0.001279 | 0.103057 | -0.087016 | ... | -0.000845 | 0.106108 | 0.020730 | 0.040423 | 0.029478 | -0.107213 | -0.051710 | -0.014692 | -0.044695 | -0.092186 |
| Factor 5 | 0.044211 | 0.129531 | 0.048840 | 0.229818 | 0.254385 | 0.001333 | -0.081784 | -0.005494 | 0.128608 | -0.127923 | ... | 0.029484 | 0.063783 | -0.068990 | -0.084985 | -0.038371 | 0.116027 | -0.022857 | 0.003913 | 0.005418 | 0.097961 |
| Factor 6 | -0.618853 | 0.012128 | 0.000998 | 0.137428 | -0.043499 | 0.024533 | 0.022788 | 0.098064 | -0.003297 | 0.041672 | ... | 0.003277 | 0.039947 | -0.060559 | -0.013331 | -0.037571 | 0.125552 | 0.036886 | 0.033270 | 0.039675 | -0.211510 |
| Factor 7 | -0.442458 | -0.101479 | 0.023583 | 0.021900 | -0.025400 | 0.025268 | -0.010805 | -0.070370 | -0.009152 | -0.072582 | ... | 0.032161 | 0.029726 | 0.015703 | 0.066717 | -0.074298 | -0.045682 | 0.017101 | 0.057714 | -0.067829 | 0.175016 |
| Factor 8 | -0.055849 | -0.068473 | -0.042274 | -0.050389 | 0.022151 | -0.007174 | 0.024929 | 0.053787 | -0.038904 | 0.020470 | ... | -0.038868 | 0.070869 | -0.020878 | 0.079161 | 0.106335 | -0.110068 | -0.017513 | 0.049150 | 0.053820 | -0.074065 |
| Factor 9 | 0.847176 | -0.144409 | -0.082529 | -0.125883 | -0.021260 | 0.076999 | 0.086520 | -0.044230 | 0.058836 | 0.120648 | ... | 0.062515 | 0.004620 | 0.024979 | 0.045098 | 0.093637 | 0.085579 | 0.036728 | -0.073276 | -0.049939 | -0.045169 |
| Factor 10 | 0.260658 | -0.018242 | -0.048973 | 0.095457 | -0.118889 | 0.128337 | 0.097937 | 0.013919 | -0.051902 | -0.046666 | ... | -0.019701 | 0.003984 | -0.087492 | 0.046393 | 0.042418 | -0.085316 | 0.038125 | 0.041113 | 0.045989 | 0.145492 |
| Factor 11 | -0.339596 | 0.035699 | 0.036781 | -0.028903 | -0.294902 | 0.033796 | 0.027752 | -0.074360 | 0.002477 | 0.172775 | ... | -0.018687 | 0.141971 | 0.082159 | 0.021479 | 0.253133 | -0.051885 | -0.067781 | -0.044705 | 0.005597 | 0.129108 |
| Factor 12 | 0.140051 | -0.023272 | 0.053881 | -0.398822 | -0.443519 | -0.104664 | 0.022436 | 0.003456 | 0.010753 | -0.123234 | ... | 0.005473 | 0.047491 | -0.030022 | 0.053311 | -0.077040 | 0.153919 | -0.029247 | -0.129744 | -0.092596 | -0.098910 |
| Factor 13 | 0.008263 | -0.002270 | 0.098201 | 0.249155 | 0.543481 | -0.062938 | 0.084722 | -0.030612 | 0.052744 | 0.031904 | ... | -0.021037 | 0.073529 | -0.029792 | -0.118077 | -0.062912 | -0.022290 | -0.148696 | -0.148372 | -0.096425 | 0.122402 |
| Factor 14 | 1.380962 | -0.087727 | 0.009225 | 0.235166 | -0.039743 | 0.009549 | -0.131573 | 0.023868 | 0.013604 | -0.046723 | ... | -0.036650 | -0.089463 | 0.005698 | 0.111561 | -0.090331 | 0.049375 | 0.033765 | -0.003689 | -0.055969 | 0.053556 |
| Factor 15 | -1.152911 | 0.025368 | 0.005462 | -0.023505 | -0.274491 | 0.015588 | -0.015119 | -0.011525 | 0.011900 | -0.101080 | ... | -0.039308 | -0.086269 | 0.002518 | 0.082119 | -0.133889 | -0.046957 | -0.097321 | -0.010911 | -0.036072 | -0.400827 |
15 rows × 3718 columns
weights["atac"]
| chr1:267561-268455 | chr1:629484-630393 | chr1:778284-779202 | chr1:844149-845034 | chr1:857873-858632 | chr1:923392-924241 | chr1:958865-959762 | chr1:1012993-1013910 | chr1:1040399-1041292 | chr1:1068892-1069685 | ... | GL000205.2:67744-68642 | GL000219.1:39937-40840 | GL000219.1:44650-45512 | GL000219.1:45733-46550 | GL000219.1:90065-90960 | GL000219.1:125017-125889 | KI270721.1:2089-2980 | KI270726.1:27153-28037 | KI270726.1:41489-42329 | KI270713.1:29578-30400 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Factor 1 | 0.015588 | -0.068973 | -0.323210 | -0.075846 | -0.078673 | -0.043594 | -0.082744 | -0.230949 | -0.137500 | -0.120305 | ... | 0.192618 | 0.039591 | -0.136961 | -0.108797 | 0.024091 | -0.132526 | -0.083251 | -0.106187 | 0.003338 | -0.140618 |
| Factor 2 | 0.010297 | 0.171070 | -0.272004 | -0.182100 | -0.033910 | 0.060713 | -0.279495 | -0.237344 | -0.373046 | -0.010249 | ... | 0.278677 | -0.176570 | -0.259601 | -0.642847 | -0.032023 | -0.105531 | -0.058108 | -0.053535 | -0.071863 | 0.093508 |
| Factor 3 | -0.028898 | 0.051319 | 0.071390 | -0.293090 | -0.084974 | -0.074341 | -0.122872 | -0.012087 | -0.311388 | -0.116551 | ... | 0.213033 | -0.107144 | -0.029083 | -0.057040 | -0.043446 | 0.101682 | 0.070688 | -0.050273 | 0.063659 | -0.128200 |
| Factor 4 | 0.067406 | -0.064552 | -0.144720 | 0.222972 | 0.198841 | -0.117017 | -0.030485 | 0.141264 | 0.137078 | 0.002803 | ... | -0.195972 | -0.159258 | 0.034300 | -0.262307 | -0.124259 | -0.005265 | 0.026266 | 0.027857 | -0.010228 | 0.033526 |
| Factor 5 | 0.134998 | -0.118389 | -0.048504 | -0.033692 | -0.094288 | -0.105451 | -0.134580 | 0.170727 | -0.105009 | -0.052774 | ... | -0.299938 | -0.023304 | -0.075331 | 0.213807 | 0.079724 | -0.047202 | 0.067408 | -0.105169 | -0.095975 | -0.008091 |
| Factor 6 | -0.094902 | 0.175581 | 0.199167 | 0.130938 | 0.128411 | -0.070910 | 0.290823 | -0.085090 | 0.139106 | 0.375201 | ... | -0.055348 | 0.008064 | 0.035890 | 0.025920 | -0.017034 | -0.139985 | -0.021245 | 0.011567 | 0.146342 | 0.039091 |
| Factor 7 | -0.029667 | 0.232662 | 0.137632 | -0.034105 | 0.026049 | 0.141460 | -0.023896 | 0.103624 | 0.080493 | 0.355293 | ... | 0.093679 | 0.253191 | 0.091128 | -0.129581 | 0.027496 | 0.098308 | -0.037056 | 0.088649 | 0.149442 | -0.027713 |
| Factor 8 | 0.105609 | 0.116295 | 0.127468 | 0.013525 | 0.036117 | 0.050187 | -0.022508 | -0.186174 | -0.188627 | -0.149945 | ... | -0.042306 | -0.212531 | 0.053144 | -0.260661 | -0.238006 | 0.011654 | -0.002434 | -0.004869 | -0.002524 | 0.011428 |
| Factor 9 | -0.088714 | -0.236125 | -0.034857 | -0.275669 | -0.199103 | -0.208306 | -0.173522 | -0.175142 | -0.361846 | -0.388860 | ... | -0.062702 | 0.381349 | 0.329235 | 0.350945 | 0.091241 | 0.032202 | -0.146296 | -0.041344 | -0.130320 | -0.230856 |
| Factor 10 | 0.003886 | 0.182097 | 0.042181 | 0.142671 | 0.075899 | 0.056084 | 0.086588 | 0.131228 | 0.225935 | 0.243304 | ... | 0.153632 | -0.270489 | -0.095530 | -0.105563 | -0.187800 | 0.022895 | -0.096935 | -0.006983 | 0.029871 | 0.032132 |
| Factor 11 | -0.040994 | -0.144084 | -0.171954 | -0.104147 | 0.027021 | -0.026749 | -0.303967 | -0.186920 | -0.241965 | -0.203547 | ... | 0.040131 | -0.203556 | -0.029991 | 0.042943 | -0.209069 | 0.003288 | -0.004122 | 0.006341 | -0.013852 | -0.120541 |
| Factor 12 | 0.057618 | 0.117719 | -0.108942 | -0.091153 | -0.023501 | -0.011017 | -0.033363 | -0.139303 | -0.102051 | 0.049994 | ... | 0.412098 | 0.178078 | 0.142233 | 0.369375 | 0.073314 | 0.032400 | 0.120729 | -0.055238 | -0.183104 | -0.080130 |
| Factor 13 | -0.156427 | -0.054138 | 0.163906 | 0.073237 | 0.101645 | 0.183813 | 0.142219 | 0.094903 | -0.288277 | 0.467657 | ... | 0.027979 | 0.187276 | -0.064292 | -0.197571 | 0.028351 | -0.001363 | 0.196256 | 0.097051 | -0.038267 | -0.153199 |
| Factor 14 | -0.070237 | -0.079855 | -0.091271 | -0.036744 | -0.044658 | 0.030472 | -0.194341 | 0.209160 | -0.318201 | -0.377731 | ... | -0.011044 | -0.237870 | -0.114371 | -0.183502 | -0.053019 | -0.095012 | -0.068152 | -0.062755 | -0.023839 | 0.033481 |
| Factor 15 | -0.034799 | 0.015444 | 0.088015 | 0.141185 | -0.030220 | 0.149655 | 0.039116 | -0.162965 | -0.071959 | -0.050679 | ... | -0.063380 | -0.320987 | -0.064431 | -0.133580 | -0.006968 | -0.118564 | 0.050569 | 0.075011 | 0.030030 | -0.033265 |
15 rows × 20985 columns
factors["group_1"]
| Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 | Factor 7 | Factor 8 | Factor 9 | Factor 10 | Factor 11 | Factor 12 | Factor 13 | Factor 14 | Factor 15 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAACAGCCAAATATCC-1 | -0.711329 | -0.138095 | -0.117859 | -0.758095 | 0.488893 | 0.365223 | -0.317043 | -0.150380 | 0.091710 | 0.352163 | 0.347944 | 0.072361 | -0.308070 | 0.369941 | -0.224889 |
| AAACAGCCAGGAACTG-1 | 0.200870 | 0.226601 | 0.103635 | 0.109878 | -0.087861 | -0.279875 | -0.261920 | 0.095023 | 0.175019 | -0.056246 | 0.125093 | 0.110894 | 0.007098 | 0.348968 | -0.031934 |
| AAACAGCCAGGCTTCG-1 | 0.386977 | 0.143410 | 0.149321 | 0.150196 | 0.409791 | -0.082680 | -0.210290 | 0.268729 | 0.126539 | -0.305792 | 0.150471 | 0.006699 | -0.195209 | -0.029911 | 0.360625 |
| AAACCAACACCTGCTC-1 | 0.076602 | 0.108646 | 0.310807 | -0.504941 | -0.660438 | -0.048514 | 0.184567 | 0.005226 | -0.126196 | 0.157478 | 0.201867 | 0.261932 | 0.003733 | -0.281539 | 0.241265 |
| AAACCAACAGATTCAT-1 | -0.460528 | -0.714788 | -0.103196 | -0.712085 | 0.181718 | 0.024262 | -0.154573 | 0.017619 | 0.033718 | 0.131182 | 0.076497 | 0.016913 | -0.226345 | 0.206774 | 0.051490 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TGTTGTGCATTGTGTG-1 | -0.124523 | 0.017649 | -0.219138 | 0.317680 | -0.174649 | 0.291931 | 0.288152 | 0.005809 | 0.000549 | 0.103788 | 0.006043 | -0.046411 | 0.079992 | 0.115161 | 0.123362 |
| TTCCCACAGGCGAATA-1 | 0.180360 | 0.090379 | 0.226677 | 0.074547 | 0.022169 | -0.193567 | -0.180897 | 0.119575 | 0.107953 | -0.082141 | 0.144209 | 0.016051 | -0.054936 | 0.239553 | 0.037676 |
| TTGCTTAGTCCTTTAA-1 | 0.156793 | 0.047492 | 0.187003 | 0.003541 | 0.134676 | 0.015608 | -0.149397 | 0.134034 | 0.051219 | -0.063679 | 0.030517 | 0.004928 | -0.028842 | 0.089295 | 0.124001 |
| TTGTCCGGTGCATTAG-1 | 0.192446 | 0.149734 | 0.154785 | 0.046855 | 0.088373 | -0.134648 | -0.230934 | 0.164872 | 0.063290 | -0.153682 | 0.181172 | 0.039701 | -0.090489 | 0.129575 | 0.078037 |
| TTTGTTGGTAGGTTTG-1 | 0.174301 | 0.196018 | 0.306593 | -0.232571 | -0.284522 | 0.040452 | 0.124159 | -0.008937 | -0.192886 | 0.019016 | 0.023248 | 0.296303 | -0.117336 | -0.118165 | -0.064181 |
2695 rows × 15 columns