mofaflex.FeatureSets

class mofaflex.FeatureSets(feature_sets, name='UNL', remove_empty=True)

Class for storing a collection of feature sets (see FeatureSet).

The collection supports set operations: intersection, union, and difference.

Parameters:
  • feature_sets (Collection[FeatureSet]) – The collection of feature sets.

  • name (str (default: 'UNL')) – The name of the feature set collection.

  • remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Attributes table

empty

Check if the feature set collection is empty.

feature_set_by_name

Return a dictionary of feature set names (key) to feature sets (value).

feature_sets

The collection of feature sets.

features

Return the union of all features in the feature sets.

median_size

Return the median size of the feature sets.

name

The name of the feature set collection.

Methods table

filter(features[, min_fraction, min_count, ...])

Filter feature sets.

find(partial_name)

Perform a simple search given a (partial) feature set name.

find_similar_pairs([observations, metric, ...])

Find similar pairs of feature sets.

from_dataframe(df[, name, name_col, ...])

Create a FeatureSets object from a DataFrame.

from_dict(d[, name, remove_empty])

Create a FeatureSets object from a dictionary.

from_gmt(path[, name, remove_empty])

Create a FeatureSets object from a GMT file.

keep(names)

Keep feature sets by name.

merge_pairs(pairs)

Merge pairs of feature sets.

merge_similar([observations, metric, ...])

Merge similar feature sets.

remove(names)

Remove feature sets by name.

similarity_to_feature_sets([other, metric, ...])

Compute similarity matrix between feature sets.

similarity_to_observations(observations)

Compute similarity matrix between feature sets using observations as a reference.

subset(features)

Subset feature sets by features.

to_dict()

Convert this feature set collection to a dictionary.

to_gmt(path)

Write this feature set collection to a GMT file.

to_mask([features, sort])

Convert feature sets to a mask.

trim([min_count, max_count])

Trim feature sets by min/max size.

Attributes

FeatureSets.empty

Check if the feature set collection is empty.

FeatureSets.feature_set_by_name

Return a dictionary of feature set names (key) to feature sets (value).

FeatureSets.feature_sets

The collection of feature sets.

FeatureSets.features

Return the union of all features in the feature sets.

FeatureSets.median_size

Return the median size of the feature sets.

FeatureSets.name

The name of the feature set collection.
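
The derived attributes above can be illustrated with plain Python sets. This is a minimal sketch, not mofaflex's implementation: the toy collection and all variable names are hypothetical, and a dictionary stands in for the FeatureSets object.

```python
from statistics import median

# Hypothetical toy collection: feature set name -> set of feature names.
feature_sets = {
    "pathway_a": {"g1", "g2", "g3"},
    "pathway_b": {"g3", "g4"},
    "pathway_c": {"g5"},
}

# Union of all features across the collection (cf. `features`).
all_features = set().union(*feature_sets.values())

# Median number of features per feature set (cf. `median_size`).
median_size = median(len(members) for members in feature_sets.values())

# A collection without any feature sets is empty (cf. `empty`).
is_empty = len(feature_sets) == 0
```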

Methods

FeatureSets.filter(features, min_fraction=0.0, min_count=5, max_count=300, keep=None, subset=True)

Filter feature sets.

Parameters:
  • features (Iterable[str]) – Features to filter.

  • min_fraction (float (default: 0.0)) – Minimum fraction of the feature set that must be present in features.

  • min_count (int (default: 5)) – Minimum size of the intersection set between a feature set and the set of features.

  • max_count (int | None (default: 300)) – Maximum size of the intersection set between a feature set and the set of features.

  • keep (Iterable[str] | None (default: None)) – Feature sets to keep regardless of the filter conditions.

  • subset (bool (default: True)) – Whether to subset the resulting feature sets based on features.

Return type:

FeatureSets

Returns:

Filtered feature sets.
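
The filter conditions described above can be sketched on plain dictionaries of sets. This is an illustrative reading of the parameters, not mofaflex's code. The function name filter_feature_sets is hypothetical, and the sketch assumes that all bounds are inclusive and that sets listed in keep are still subset when subset=True; the documentation does not state either detail.

```python
def filter_feature_sets(feature_sets, features, min_fraction=0.0, min_count=5,
                        max_count=300, keep=None, subset=True):
    """Return the feature sets that pass the overlap conditions (sketch)."""
    features = set(features)
    keep = set(keep or ())
    result = {}
    for name, members in feature_sets.items():
        overlap = members & features
        # Fraction of the feature set that is present in `features`.
        fraction = len(overlap) / len(members) if members else 0.0
        passes = (
            fraction >= min_fraction
            and len(overlap) >= min_count
            and (max_count is None or len(overlap) <= max_count)
        )
        # Names in `keep` bypass the conditions (assumed behaviour).
        if passes or name in keep:
            result[name] = overlap if subset else set(members)
    return result


# Hypothetical toy data.
sets = {
    "big": {"a", "b", "c", "d"},
    "small": {"a", "x"},
    "pinned": {"y", "z"},
}
measured = ["a", "b", "c", "y"]
filtered = filter_feature_sets(
    sets, measured, min_fraction=0.5, min_count=2, keep=["pinned"]
)
```

Here "big" passes both conditions, "small" fails min_count, and "pinned" survives only because it is listed in keep.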

FeatureSets.find(partial_name)

Perform a simple search given a (partial) feature set name.

Parameters:

partial_name (str) – Feature set (partial) name to search for.

Return type:

FeatureSets
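
A partial-name search of this kind can be sketched as a substring match over the feature set names. The sketch assumes a case-insensitive match, which the documentation does not specify, and the function name find_by_partial_name is hypothetical.

```python
def find_by_partial_name(feature_sets, partial_name):
    """Return the feature sets whose name contains `partial_name` (sketch)."""
    needle = partial_name.lower()
    return {
        name: members
        for name, members in feature_sets.items()
        if needle in name.lower()
    }


# Hypothetical toy data.
sets = {
    "HALLMARK_APOPTOSIS": {"g1", "g2"},
    "HALLMARK_HYPOXIA": {"g3"},
    "KEGG_APOPTOSIS": {"g1", "g4"},
}
hits = find_by_partial_name(sets, "apoptosis")
```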

FeatureSets.find_similar_pairs(observations=None, metric=None, similarity_threshold=0.8)

Find similar pairs of feature sets.

Parameters:
  • observations (DataFrame (default: None)) – DataFrame of observations. If provided, the similarity between feature sets is computed based on the correlation of the mean of the observations in each feature set.

  • metric (str | None (default: None)) – Similarity metric. Defaults to “jaccard” if observations is not provided.

  • similarity_threshold (float (default: 0.8)) – Similarity threshold to consider similar pairs.

Return type:

set[tuple[str, str, float]]

Returns:

Similar pairs of feature sets.
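
For the case without observations, the pair search can be sketched with the Jaccard index, |A ∩ B| / |A ∪ B|. This is a minimal sketch rather than the library's implementation: the function names are hypothetical, and the threshold is assumed to be inclusive.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(feature_sets, similarity_threshold=0.8):
    """Return (name, name, similarity) for pairs at or above the threshold."""
    pairs = set()
    for name_a, name_b in combinations(sorted(feature_sets), 2):
        sim = jaccard(feature_sets[name_a], feature_sets[name_b])
        if sim >= similarity_threshold:
            pairs.add((name_a, name_b, sim))
    return pairs


# Hypothetical toy data: s1 and s2 share 5 of 6 features.
sets = {
    "s1": {"a", "b", "c", "d", "e"},
    "s2": {"a", "b", "c", "d", "e", "f"},
    "s3": {"x", "y"},
}
pairs = similar_pairs(sets, similarity_threshold=0.8)
```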

classmethod FeatureSets.from_dataframe(df, name=None, name_col='name', features_col='features', desc_col=None, remove_empty=True)

Create a FeatureSets object from a DataFrame.

Parameters:
  • df (DataFrame) – DataFrame of feature sets.

  • name (str | None (default: None)) – Name of the collection.

  • name_col (str (default: 'name')) – Name of the column containing feature set names.

  • features_col (str (default: 'features')) – Name of the column containing feature set features.

  • desc_col (str | None (default: None)) – Name of the column containing feature set descriptions.

  • remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Return type:

FeatureSets

classmethod FeatureSets.from_dict(d, name=None, remove_empty=True)

Create a FeatureSets object from a dictionary.

Parameters:
  • d (dict[str, Iterable[str]]) – Dictionary of feature sets.

  • name (str | None (default: None)) – Name of the collection.

  • remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Return type:

FeatureSets

classmethod FeatureSets.from_gmt(path, name=None, remove_empty=True)

Create a FeatureSets object from a GMT file.

Parameters:
  • path (str | Path | TextIOBase) – Path to the GMT file.

  • name (str | None (default: None)) – Name of the collection. Defaults to the file name.

  • remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Return type:

FeatureSets
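
A GMT file stores one feature set per line as tab-separated fields: the set name, a description, then the member features. The following sketch parses that layout into a plain dictionary. The function name parse_gmt is hypothetical, and mofaflex's own parser may treat malformed lines differently.

```python
import io


def parse_gmt(handle, remove_empty=True):
    """Parse GMT lines (name, description, features...) into a dict of sets."""
    feature_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        name, _description, *members = fields
        members = {m for m in members if m}
        if members or not remove_empty:
            feature_sets[name] = members
    return feature_sets


# Hypothetical GMT content: SET_B has no member features.
gmt_text = "SET_A\tfirst set\tg1\tg2\nSET_B\tempty set\n"
parsed = parse_gmt(io.StringIO(gmt_text))
parsed_all = parse_gmt(io.StringIO(gmt_text), remove_empty=False)
```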

FeatureSets.keep(names)

Keep feature sets by name.

Parameters:

names (Iterable[str]) – Collection of feature set names.

FeatureSets.merge_pairs(pairs)

Merge pairs of feature sets.

Parameters:

pairs (Iterable[tuple[str, str]]) – Pairs of feature sets.

Return type:

FeatureSets

Returns:

Merged feature sets.
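
Merging a pair can be sketched as replacing both feature sets with their union. The sketch below makes two assumptions that the documentation does not confirm: the merged set is named by joining both names with "|", and a pair is skipped when one of its members has already been merged. The function name is hypothetical.

```python
def merge_feature_set_pairs(feature_sets, pairs, sep="|"):
    """Replace each named pair with the union of its two sets (sketch)."""
    merged = dict(feature_sets)  # shallow copy; the input dict is not modified
    for name_a, name_b in pairs:
        if name_a == name_b or name_a not in merged or name_b not in merged:
            continue  # nothing to merge, or one side was merged earlier
        combined = merged.pop(name_a) | merged.pop(name_b)
        merged[f"{name_a}{sep}{name_b}"] = combined  # assumed naming scheme
    return merged


# Hypothetical toy data.
sets = {"a": {"g1", "g2"}, "b": {"g2", "g3"}, "c": {"g9"}}
merged_sets = merge_feature_set_pairs(sets, [("a", "b")])
```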

FeatureSets.merge_similar(observations=None, metric=None, similarity_threshold=0.8, iteratively=True)

Merge similar feature sets.

Parameters:
  • observations (DataFrame (default: None)) – DataFrame of observations. If provided, the similarity between feature sets is computed based on the correlation of the mean of the observations in each feature set.

  • metric (str | None (default: None)) – Similarity metric. Defaults to “jaccard” if observations is not provided.

  • similarity_threshold (float (default: 0.8)) – Similarity threshold to consider similar pairs.

  • iteratively (bool (default: True)) – Whether to merge iteratively.

Return type:

FeatureSets

Returns:

Merged feature sets.

FeatureSets.remove(names)

Remove feature sets by name.

Parameters:

names (Iterable[str]) – Collection of feature set names.

FeatureSets.similarity_to_feature_sets(other=None, metric='jaccard', metric_kwargs=None)

Compute similarity matrix between feature sets.

Parameters:
  • other (FeatureSets (default: None)) – Other feature set collection.

  • metric (str (default: 'jaccard')) – Similarity metric.

  • metric_kwargs (dict | None (default: None)) – Further arguments passed to scipy.spatial.distance.cdist.

Return type:

DataFrame

Returns:

Similarity matrix, computed as 1 minus the distance matrix. This may lead to negative values for some distance metrics.
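
The "1 minus distance" convention, and why it can go negative, can be shown on binary membership vectors. This is a hand-written sketch in pure Python rather than a call to scipy, and all helper names are hypothetical. Jaccard distance is bounded by 1, so the similarity stays in [0, 1]. Euclidean distance is unbounded, so the similarity can drop below zero.

```python
from math import sqrt


def membership_vector(members, universe):
    """Encode a feature set as a 0/1 vector over an ordered feature list."""
    return [1 if feature in members else 0 for feature in universe]


def jaccard_distance(u, v):
    """1 - |intersection| / |union| for two binary vectors."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return 1 - both / either if either else 0.0


def euclidean_distance(u, v):
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


universe = ["g1", "g2", "g3", "g4"]
a = membership_vector({"g1", "g2"}, universe)
b = membership_vector({"g3", "g4"}, universe)

# Disjoint sets: Jaccard distance is 1, so the similarity is 0.
jaccard_similarity = 1 - jaccard_distance(a, b)
# Euclidean distance is sqrt(4) = 2, so the similarity is negative.
euclidean_similarity = 1 - euclidean_distance(a, b)
```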

FeatureSets.similarity_to_observations(observations)

Compute similarity matrix between feature sets using observations as a reference.

Parameters:

observations (DataFrame) – Dataframe of observations.

Return type:

DataFrame

Returns:

Similarity matrix as correlation matrix.

FeatureSets.subset(features)

Subset feature sets by features.

Parameters:

features (Iterable[str]) – Collection of features.

FeatureSets.to_dict()

Convert this feature set collection to a dictionary.

Return type:

dict[str, Iterable[str]]

Returns:

Dictionary of feature sets.

FeatureSets.to_gmt(path)

Write this feature set collection to a GMT file.

Parameters:

path (str | Path | TextIOBase) – Path to the output file.
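
Writing the GMT layout is the reverse of reading it: one line per feature set, with tab-separated name, description, and member features. The sketch below writes to any text stream. The function name write_gmt and its descriptions argument are hypothetical, and sorting the members is a choice made here for reproducible output.

```python
import io


def write_gmt(feature_sets, handle, descriptions=None):
    """Write feature sets as GMT lines: name, description, features..."""
    descriptions = descriptions or {}
    for name, members in feature_sets.items():
        fields = [name, descriptions.get(name, ""), *sorted(members)]
        handle.write("\t".join(fields) + "\n")


# Hypothetical toy data written to an in-memory text stream.
buffer = io.StringIO()
write_gmt({"SET_A": {"g2", "g1"}}, buffer, descriptions={"SET_A": "first set"})
gmt_output = buffer.getvalue()
```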

FeatureSets.to_mask(features=None, sort=True)

Convert feature sets to a mask.

Parameters:
  • features (Iterable[str] | None (default: None)) – Collection of features.

  • sort (bool (default: True)) – Sort feature sets alphabetically.

Return type:

DataFrame

Returns:

Binary mask of features.
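
A binary mask marks, for each feature set, which features belong to it. The sketch below builds one as nested lists instead of a DataFrame. It assumes that rows are feature sets and columns are features, and that features defaults to the sorted union of all members. The documentation states neither, and the function name is hypothetical.

```python
def feature_sets_to_mask(feature_sets, features=None, sort=True):
    """Return (row names, column names, boolean rows) for a membership mask."""
    names = sorted(feature_sets) if sort else list(feature_sets)
    if features is None:
        # Assumed default: all features that occur in any feature set.
        features = sorted(set().union(*feature_sets.values()))
    else:
        features = list(features)
    rows = [[f in feature_sets[name] for f in features] for name in names]
    return names, features, rows


# Hypothetical toy data.
sets = {"b": {"g2"}, "a": {"g1", "g2"}}
names, columns, mask = feature_sets_to_mask(sets)
```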

FeatureSets.trim(min_count=1, max_count=None)

Trim feature sets by min/max size.

Parameters:
  • min_count (int (default: 1)) – Minimum number of features.

  • max_count (int | None (default: None)) – Maximum number of features.
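
Trimming by size can be sketched as a filter on the number of features in each set. The sketch assumes that both bounds are inclusive and that max_count=None means no upper bound. The function name is hypothetical.

```python
def trim_feature_sets(feature_sets, min_count=1, max_count=None):
    """Keep feature sets whose size lies within [min_count, max_count]."""
    return {
        name: members
        for name, members in feature_sets.items()
        if len(members) >= min_count
        and (max_count is None or len(members) <= max_count)
    }


# Hypothetical toy data.
sets = {"tiny": {"g1"}, "mid": {"g1", "g2"}, "large": {"g1", "g2", "g3", "g4"}}
trimmed = trim_feature_sets(sets, min_count=2, max_count=3)
```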