mofaflex.FeatureSets#

class mofaflex.FeatureSets(feature_sets, name='UNL', remove_empty=True)#

Class for storing a collection of feature sets (see FeatureSet).

This class stores a collection of feature sets and provides set operations for intersection, union, and difference.

Parameters:

feature_sets (Collection[FeatureSet]) – The collection of feature sets.
name (str (default: 'UNL')) – The name of the feature set collection.
remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Attributes table#

`empty`	Check if the feature set collection is empty.
`feature_set_by_name`	Return a dictionary of feature set names (key) to feature sets (value).
`feature_sets`	The collection of feature sets.
`features`	Return the union of all features in the feature sets.
`median_size`	Return the median size of the feature sets.
`name`	The name of the feature set collection.

Methods table#

`filter`(features[, min_fraction, min_count, ...])	Filter feature sets.
`find`(partial_name)	Perform a simple search given a (partial) feature set name.
`find_similar_pairs`([observations, metric, ...])	Find similar pairs of feature sets.
`from_dataframe`(df[, name, name_col, ...])	Create a FeatureSets object from a DataFrame.
`from_dict`(d[, name, remove_empty])	Create a FeatureSets object from a dictionary.
`from_gmt`(path[, name, remove_empty])	Create a FeatureSets object from a GMT file.
`keep`(names)	Keep feature sets by name.
`merge_pairs`(pairs)	Merge pairs of feature sets.
`merge_similar`([observations, metric, ...])	Merge similar feature sets.
`remove`(names)	Remove feature sets by name.
`similarity_to_feature_sets`([other, metric, ...])	Compute similarity matrix between feature sets.
`similarity_to_observations`(observations)	Compute similarity matrix between feature sets using observations as a reference.
`subset`(features)	Subset feature sets by features.
`to_dict`()	Convert this feature set collection to a dictionary.
`to_gmt`(path)	Write this feature set collection to a GMT file.
`to_mask`([features, sort])	Convert feature sets to a mask.
`trim`([min_count, max_count])	Trim feature sets by min/max size.

Attributes#

FeatureSets.empty#: Check if the feature set collection is empty.

FeatureSets.feature_set_by_name#: Return a dictionary of feature set names (key) to feature sets (value).

FeatureSets.feature_sets#: The collection of feature sets.

FeatureSets.features#: Return the union of all features in the feature sets.

FeatureSets.median_size#: Return the median size of the feature sets.

FeatureSets.name#: The name of the feature set collection.

Methods#

FeatureSets.filter(features, min_fraction=0.0, min_count=5, max_count=300, keep=None, subset=True)#

Filter feature sets.

Parameters:

features (Iterable[str]) – Features to filter.
min_fraction (float (default: 0.0)) – Minimum fraction of the feature set that must be present in features.
min_count (int (default: 5)) – Minimum size of the intersection set between a feature set and the set of features.
max_count (int | None (default: 300)) – Maximum size of the intersection set between a feature set and the set of features.
keep (Iterable[str] | None (default: None)) – Feature sets to keep regardless of the filter conditions.
subset (bool (default: True)) – Whether to subset the resulting feature sets based on features.

Return type:

FeatureSets

Returns:

Filtered feature sets.
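The interaction of these parameters is easier to see on a small example. The following pure-Python sketch shows one plausible reading of them, using a plain dictionary of sets in place of a FeatureSets object. The helper `filter_sets` is hypothetical and is not mofaflex's implementation; in particular, whether sets listed in `keep` are also subset is an assumption.

```python
def filter_sets(feature_sets, features, min_fraction=0.0, min_count=5,
                max_count=300, keep=None, subset=True):
    """Hypothetical stand-in for FeatureSets.filter on a dict of sets."""
    features = set(features)
    keep = set(keep or ())
    result = {}
    for name, members in feature_sets.items():
        common = members & features
        fraction = len(common) / len(members) if members else 0.0
        passes = (
            fraction >= min_fraction
            and len(common) >= min_count
            and (max_count is None or len(common) <= max_count)
        )
        if passes or name in keep:
            # Assumption: with subset=True only features found in `features` remain.
            result[name] = common if subset else set(members)
    return result


sets = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g9"},
    "C": {"g7", "g8"},
}
measured = ["g1", "g2", "g3", "g5"]
filtered = filter_sets(sets, measured, min_fraction=0.5, min_count=2, keep=["B"])
# "A" passes both thresholds, "B" fails min_count but is kept by name, "C" is dropped.
```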

FeatureSets.find(partial_name)#

Perform a simple search given a (partial) feature set name.

Parameters:

partial_name (str) – Feature set (partial) name to search for.

Return type:

FeatureSets

FeatureSets.find_similar_pairs(observations=None, metric=None, similarity_threshold=0.8)#

Find similar pairs of feature sets.

Parameters:

observations (DataFrame (default: None)) – DataFrame of observations. If provided, the similarity between feature sets is based on correlation, computed from the mean of the observations in each feature set.
metric (str | None (default: None)) – Similarity metric. Defaults to “jaccard” if observations is not provided.
similarity_threshold (float (default: 0.8)) – Similarity threshold to consider similar pairs.

Return type:

set[tuple[str, str, float]]

Returns:

Similar pairs of feature sets.
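As an illustration of the default “jaccard” case, the sketch below computes pairwise Jaccard similarity on plain Python sets and returns tuples shaped like the documented return type. The helper names are hypothetical, and treating the threshold as inclusive is an assumption; this is not mofaflex's implementation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: intersection size over union size."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(feature_sets, similarity_threshold=0.8):
    """Hypothetical helper returning (name, name, similarity) tuples."""
    pairs = set()
    for n1, n2 in combinations(sorted(feature_sets), 2):
        sim = jaccard(feature_sets[n1], feature_sets[n2])
        # Assumption: a pair counts as similar when it reaches the threshold.
        if sim >= similarity_threshold:
            pairs.add((n1, n2, sim))
    return pairs


sets = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g1", "g2", "g3", "g4"},
    "C": {"g8", "g9"},
}
pairs = similar_pairs(sets, similarity_threshold=0.75)
# "A" and "B" share 4 of 5 distinct features, so their similarity is 0.8.
```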

classmethod FeatureSets.from_dataframe(df, name=None, name_col='name', features_col='features', desc_col=None, remove_empty=True)#

Create a FeatureSets object from a DataFrame.

Parameters:

df (DataFrame) – DataFrame of feature sets.
name (str | None (default: None)) – Name of the collection.
name_col (str (default: 'name')) – Name of the column containing feature set names.
features_col (str (default: 'features')) – Name of the column containing feature set features.
desc_col (str | None (default: None)) – Name of the column containing feature set descriptions.
remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Return type:

FeatureSets

classmethod FeatureSets.from_dict(d, name=None, remove_empty=True)#

Create a FeatureSets object from a dictionary.

Parameters:

d (dict[str, Iterable[str]]) – Dictionary of feature sets.
name (str | None (default: None)) – Name of the collection.
remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Return type:

FeatureSets

classmethod FeatureSets.from_gmt(path, name=None, remove_empty=True)#

Create a FeatureSets object from a GMT file.

Parameters:

path (str | Path | TextIOBase) – Path to the GMT file.
name (str | None (default: None)) – Name of the collection. Defaults to the file name.
remove_empty (bool (default: True)) – Whether to remove empty feature sets.

Return type:

FeatureSets
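For orientation, a GMT file is tab-delimited text with one feature set per line: the set name, a description, then the member features. The sketch below parses that layout from an in-memory text stream using only the standard library. `parse_gmt` is a hypothetical helper written for illustration, not the reader mofaflex uses.

```python
import io


def parse_gmt(handle):
    """Parse GMT text into {name: (description, set of features)}."""
    parsed = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, description, *members = fields
        parsed[name] = (description, {m for m in members if m})
    return parsed


gmt_text = (
    "PATHWAY_A\tfirst example set\tg1\tg2\tg3\n"
    "PATHWAY_B\tsecond example set\tg2\tg4\n"
)
parsed = parse_gmt(io.StringIO(gmt_text))
```

Because path is documented as accepting a TextIOBase, a text stream like the one above is presumably also a valid input to from_gmt.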

FeatureSets.keep(names)#

Keep feature sets by name.

Parameters:

names (Iterable[str]) – Collection of feature set names.

FeatureSets.merge_pairs(pairs)#

Merge pairs of feature sets.

Parameters:

pairs (Iterable[tuple[str, str]]) – Pairs of feature sets.

Return type:

FeatureSets

Returns:

Merged feature sets.

FeatureSets.merge_similar(observations=None, metric=None, similarity_threshold=0.8, iteratively=True)#

Merge similar feature sets.

Parameters:

observations (DataFrame (default: None)) – DataFrame of observations. If provided, the similarity between feature sets is based on correlation, computed from the mean of the observations in each feature set.
metric (str | None (default: None)) – Similarity metric. Defaults to “jaccard” if observations is not provided.
similarity_threshold (float (default: 0.8)) – Similarity threshold to consider similar pairs.
iteratively (bool (default: True)) – Whether to merge iteratively.

Return type:

FeatureSets

Returns:

Merged feature sets.
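Why iterating can matter: once two sets are merged, their union may become similar to a third set that neither of them resembled on its own. The sketch below illustrates this with Jaccard similarity on plain Python sets. The helpers, the one-merge-per-set rule within a round, and the joined name "A|B" are all assumptions for illustration and may differ from what mofaflex does.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: intersection size over union size."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_round(sets, threshold):
    """Merge each similar pair once; a set takes part in at most one merge."""
    result = dict(sets)
    changed = False
    for a, b in combinations(sorted(sets), 2):
        if a in result and b in result and jaccard(sets[a], sets[b]) >= threshold:
            # The joined name "A|B" is an assumption made for this sketch.
            result[f"{a}|{b}"] = result.pop(a) | result.pop(b)
            changed = True
    return result, changed


def merge_similar_sets(sets, threshold=0.8, iteratively=True):
    """Run one merge round, or repeat rounds until nothing changes."""
    merged, changed = merge_round(sets, threshold)
    while iteratively and changed:
        merged, changed = merge_round(merged, threshold)
    return merged


sets = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g2", "g3", "g4", "g5", "g6"},
    "C": {"g1", "g2", "g5", "g6", "g7"},
}
once = merge_similar_sets(sets, threshold=0.5, iteratively=False)
full = merge_similar_sets(sets, threshold=0.5, iteratively=True)
# A single round merges only "A" and "B"; a second round then absorbs "C".
```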

FeatureSets.remove(names)#

Remove feature sets by name.

Parameters:

names (Iterable[str]) – Collection of feature set names.

FeatureSets.similarity_to_feature_sets(other=None, metric='jaccard', metric_kwargs=None)#

Compute similarity matrix between feature sets.

Parameters:

other (FeatureSets (default: None)) – Other feature set collection.
metric (str (default: 'jaccard')) – Similarity metric.
metric_kwargs (dict | None (default: None)) – Further arguments passed to scipy.spatial.distance.cdist.

Return type:

DataFrame

Returns:

Similarity matrix, computed as 1 minus the distance matrix. This may yield negative values for some distance metrics.
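The note about negative values follows from the fact that not every distance is bounded by 1. The sketch below makes this concrete in pure Python, assuming each feature set is represented as a binary membership vector (an assumption about the internals, as are the helper names). Jaccard distance stays within [0, 1], while Euclidean distance does not.

```python
import math


def membership(members, universe):
    """Binary membership vector of a feature set over an ordered universe."""
    return [1 if f in members else 0 for f in universe]


def jaccard_distance(u, v):
    """Share of disagreeing positions among positions set in either vector."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1 - both / either if either else 0.0


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


universe = ["g1", "g2", "g3", "g4", "g5"]
u = membership({"g1", "g2", "g3"}, universe)
v = membership({"g3", "g4", "g5"}, universe)

jaccard_similarity = 1 - jaccard_distance(u, v)      # about 0.2, within [0, 1]
euclidean_similarity = 1 - euclidean_distance(u, v)  # 1 - 2 = -1
```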

FeatureSets.similarity_to_observations(observations)#

Compute similarity matrix between feature sets using observations as a reference.

Parameters:

observations (DataFrame) – Dataframe of observations.

Return type:

DataFrame

Returns:

Similarity matrix as correlation matrix.
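One plausible reading of an observation-based similarity is sketched below: each feature set is summarised per observation by the mean of its member features, and two feature sets are compared by the Pearson correlation of those summaries. This interpretation, the row-of-dicts data layout, and the helper names are assumptions for illustration; the actual computation in mofaflex may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def set_scores(observations, members):
    """Per-observation mean over the features that belong to one feature set."""
    return [sum(row[f] for f in members) / len(members) for row in observations]


# Three observations (rows) measured on three features (keys).
observations = [
    {"g1": 1.0, "g2": 2.0, "g3": 9.0},
    {"g1": 2.0, "g2": 4.0, "g3": 6.0},
    {"g1": 3.0, "g2": 6.0, "g3": 3.0},
]
score_a = set_scores(observations, ["g1", "g2"])  # [1.5, 3.0, 4.5]
score_b = set_scores(observations, ["g3"])        # [9.0, 6.0, 3.0]
similarity = pearson(score_a, score_b)            # perfectly anti-correlated
```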

FeatureSets.subset(features)#

Subset feature sets by features.

Parameters:

features (Iterable[str]) – Collection of features.

FeatureSets.to_dict()#

Convert this feature set collection to a dictionary.

Return type:

dict[str, Iterable[str]]

Returns:

Dictionary of feature sets.

FeatureSets.to_gmt(path)#

Write this feature set collection to a GMT file.

Parameters:

path (str | Path | TextIOBase) – Path to the output file.

FeatureSets.to_mask(features=None, sort=True)#

Convert feature sets to a mask.

Parameters:

features (Iterable[str] | None (default: None)) – Collection of features.
sort (bool (default: True)) – Sort feature sets alphabetically.

Return type:

DataFrame

Returns:

Binary mask of features.
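A binary mask records, for each feature set and feature, whether the feature is a member of the set. The sketch below builds such a mask as nested lists from a dictionary of sets. The orientation (rows as feature sets, columns as features), the column ordering, and the helper name `build_mask` are assumptions for illustration; the method itself returns a DataFrame.

```python
def build_mask(feature_sets, features=None, sort=True):
    """Hypothetical sketch: rows are feature sets, columns are features."""
    names = sorted(feature_sets) if sort else list(feature_sets)
    if features is None:
        # Assumption: default to the union of all features, in sorted order.
        features = sorted(set().union(*feature_sets.values()))
    else:
        features = list(features)
    mask = [[f in feature_sets[n] for f in features] for n in names]
    return names, features, mask


sets = {"B": {"g2", "g3"}, "A": {"g1", "g2"}}
names, columns, mask = build_mask(sets)
# names   -> ["A", "B"]
# columns -> ["g1", "g2", "g3"]
# mask    -> [[True, True, False], [False, True, True]]
```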

FeatureSets.trim(min_count=1, max_count=None)#

Trim feature sets by min/max size.

Parameters:

min_count (int (default: 1)) – Minimum number of features.
max_count (int | None (default: None)) – Maximum number of features.
