mofaflex.FeatureSets#
- class mofaflex.FeatureSets(feature_sets, name='UNL', remove_empty=True)#
Class for storing a collection of feature sets (see FeatureSet).
This class stores a collection of feature sets and provides set operations for intersection, union, and difference.
- Parameters:
feature_sets (Collection[FeatureSet]) – The collection of feature sets.
name (str, default: 'UNL') – The name of the feature set collection.
remove_empty (bool, default: True) – Whether to remove empty feature sets.
Attributes table#
empty | Check if the feature set collection is empty.
feature_set_by_name | Return a dictionary of feature set names (key) to feature sets (value).
feature_sets | The collection of feature sets.
features | Return the union of all features in the feature sets.
median_size | Return the median size of the feature sets.
name | The name of the feature set collection.
Methods table#
filter(features, ...) | Filter feature sets.
find(partial_name) | Perform a simple search given a (partial) feature set name.
find_similar_pairs(...) | Find similar pairs of feature sets.
from_dataframe(df, ...) | Create a FeatureSets object from a DataFrame.
from_dict(d, ...) | Create a FeatureSets object from a dictionary.
from_gmt(path, ...) | Create a FeatureSets object from a GMT file.
keep(names) | Keep feature sets by name.
merge_pairs(pairs) | Merge pairs of feature sets.
merge_similar(...) | Merge similar feature sets.
remove(names) | Remove feature sets by name.
similarity_to_feature_sets(...) | Compute similarity matrix between feature sets.
similarity_to_observations(observations) | Compute similarity matrix between feature sets using observations as a reference.
subset(features) | Subset feature sets by features.
to_dict() | Convert this feature set collection to a dictionary.
to_gmt(path) | Write this feature set collection to a GMT file.
to_mask(...) | Convert feature sets to a mask.
|
Trim feature sets by min/max size. |
Attributes#
- FeatureSets.empty#
Check if the feature set collection is empty.
- FeatureSets.feature_set_by_name#
Return a dictionary of feature set names (key) to feature sets (value).
- FeatureSets.feature_sets#
The collection of feature sets.
- FeatureSets.features#
Return the union of all features in the feature sets.
- FeatureSets.median_size#
Return the median size of the feature sets.
- FeatureSets.name#
The name of the feature set collection.
Methods#
- FeatureSets.filter(features, min_fraction=0.0, min_count=5, max_count=300, keep=None, subset=True)#
Filter feature sets.
- Parameters:
min_fraction (float, default: 0.0) – Minimum fraction of the feature set that must be present in features.
min_count (int, default: 5) – Minimum size of the intersection between a feature set and the set of features.
max_count (int | None, default: 300) – Maximum size of the intersection between a feature set and the set of features.
keep (Iterable[str] | None, default: None) – Feature sets to keep regardless of the filter conditions.
subset (bool, default: True) – Whether to subset the resulting feature sets based on features.
- Return type:
- Returns:
Filtered feature sets.
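The interplay of these thresholds can be sketched in plain Python. This is an illustration under stated assumptions, not mofaflex's implementation: feature sets are modelled as a dict mapping names to Python sets, filter_sets is a hypothetical helper, the fraction is assumed to be relative to the original set size, and all bounds are assumed to be inclusive.

```python
def filter_sets(feature_sets, features, min_fraction=0.0, min_count=5,
                max_count=300, keep=None, subset=True):
    """Hypothetical stand-in for the filtering rules described above."""
    features = set(features)
    keep = set(keep or ())
    result = {}
    for name, members in feature_sets.items():
        overlap = members & features
        # Assumption: the fraction is relative to the original set size.
        fraction = len(overlap) / len(members) if members else 0.0
        passes = (
            fraction >= min_fraction
            and len(overlap) >= min_count
            and (max_count is None or len(overlap) <= max_count)
        )
        if passes or name in keep:
            # With subset=True only the features present in `features` remain.
            result[name] = overlap if subset else set(members)
    return result


example_sets = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g9"},
    "C": {"g7", "g8"},
}
measured = ["g1", "g2", "g3", "g5"]
filtered = filter_sets(example_sets, measured, min_fraction=0.5,
                       min_count=2, keep=["C"])
print(filtered)
```

In this toy input, A passes every condition, B fails min_count, and C fails the conditions but is retained because it is listed in keep.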
- FeatureSets.find(partial_name)#
Perform a simple search given a (partial) feature set name.
- Parameters:
partial_name (str) – Feature set (partial) name to search for.
- Return type:
- FeatureSets.find_similar_pairs(observations=None, metric=None, similarity_threshold=0.8)#
Find similar pairs of feature sets.
- Parameters:
observations (DataFrame, default: None) – DataFrame of observations. If provided, the similarity between feature sets is computed based on the correlation of the similarity from the mean of the observations in the feature set.
metric (str | None, default: None) – Similarity metric; defaults to “jaccard” if observations is not provided.
similarity_threshold (float, default: 0.8) – Similarity threshold for considering a pair similar.
- Return type:
- Returns:
Similar pairs of feature sets.
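For the case without observations, the pair search can be sketched with a hand-written Jaccard similarity. The helpers jaccard and similar_pairs are hypothetical names used only for illustration, and treating the threshold as inclusive is an assumption rather than documented behaviour.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(feature_sets, similarity_threshold=0.8):
    """Return name pairs whose Jaccard similarity reaches the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(feature_sets.items(), 2):
        # Assumption: the threshold is inclusive.
        if jaccard(a, b) >= similarity_threshold:
            pairs.append((name_a, name_b))
    return pairs


pair_sets = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {"g1", "g2", "g3", "g4"},
    "C": {"g6", "g7"},
}
found_pairs = similar_pairs(pair_sets, similarity_threshold=0.8)
print(found_pairs)
```

Here A and B share four of five features in total, so their similarity is 0.8 and they form the only reported pair.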
- classmethod FeatureSets.from_dataframe(df, name=None, name_col='name', features_col='features', desc_col=None, remove_empty=True)#
Create a FeatureSets object from a DataFrame.
- Parameters:
df (DataFrame) – DataFrame of feature sets.
name_col (str, default: 'name') – Name of the column containing feature set names.
features_col (str, default: 'features') – Name of the column containing feature set features.
desc_col (str | None, default: None) – Name of the column containing feature set descriptions.
remove_empty (bool, default: True) – Whether to remove empty feature sets.
- Return type:
- classmethod FeatureSets.from_dict(d, name=None, remove_empty=True)#
Create a FeatureSets object from a dictionary.
- classmethod FeatureSets.from_gmt(path, name=None, remove_empty=True)#
Create a FeatureSets object from a GMT file.
- Parameters:
- Return type:
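As background, a GMT file stores one feature set per line as tab-separated fields: the set name, a description, and then the member features. The sketch below parses that layout with the standard library only. parse_gmt is a hypothetical helper rather than the library's parser, and its remove_empty flag merely mirrors the parameter of the same name above.

```python
import io


def parse_gmt(handle, remove_empty=True):
    """Parse GMT text into {name: (description, set_of_features)}."""
    parsed = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, description, *members = fields
        members = {m for m in members if m}
        if remove_empty and not members:
            continue  # drop sets without any features
        parsed[name] = (description, members)
    return parsed


gmt_text = "SET_A\tfirst set\tg1\tg2\tg3\nSET_B\tempty set\n"
parsed_sets = parse_gmt(io.StringIO(gmt_text))
print(parsed_sets)
```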
- FeatureSets.keep(names)#
Keep feature sets by name.
- FeatureSets.merge_pairs(pairs)#
Merge pairs of feature sets.
- Parameters:
- Return type:
- Returns:
Merged feature sets.
- FeatureSets.merge_similar(observations=None, metric=None, similarity_threshold=0.8, iteratively=True)#
Merge similar feature sets.
- Parameters:
observations (DataFrame, default: None) – DataFrame of observations. If provided, the similarity between feature sets is computed based on the correlation of the similarity from the mean of the observations in the feature set.
metric (str | None, default: None) – Similarity metric; defaults to “jaccard” if observations is not provided.
similarity_threshold (float, default: 0.8) – Similarity threshold for considering a pair similar.
iteratively (bool, default: True) – Whether to merge iteratively.
- Return type:
- Returns:
Merged feature sets.
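One way to picture iterative merging is a loop that repeats single merge passes until nothing changes. The sketch below rests on several assumptions that the documentation does not confirm: a merged set is the union of its members, merged names are joined with "|", each set is merged with at most one partner per pass, and similarity is plain Jaccard. All function names are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity between two Python sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_pass(sets, threshold):
    """One pass: merge each set with at most one similar partner."""
    used, result = set(), {}
    for a, b in combinations(sets, 2):
        if a in used or b in used:
            continue
        if jaccard_similarity(sets[a], sets[b]) >= threshold:
            result[f"{a}|{b}"] = sets[a] | sets[b]  # assumed naming scheme
            used.update((a, b))
    for name, members in sets.items():
        if name not in used:
            result[name] = members
    return result


def merge_similar_sets(sets, similarity_threshold=0.8, iteratively=True):
    """Repeat merge passes until stable, or stop after one pass."""
    current = dict(sets)
    while True:
        merged = merge_pass(current, similarity_threshold)
        if len(merged) == len(current) or not iteratively:
            return merged
        current = merged


toy_sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g1", "g2", "g3", "g4", "g5"},
    "D": {"g9"},
}
merged_once = merge_similar_sets(toy_sets, 0.5, iteratively=False)
merged_all = merge_similar_sets(toy_sets, 0.5)
print(sorted(merged_once), sorted(merged_all))
```

In this toy input a single pass merges only A and B, while the iterative variant goes on to absorb C, because the union of A and B is similar enough to C.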
- FeatureSets.remove(names)#
Remove feature sets by name.
- FeatureSets.similarity_to_feature_sets(other=None, metric='jaccard', metric_kwargs=None)#
Compute similarity matrix between feature sets.
- Parameters:
other (FeatureSets, default: None) – Other feature set collection.
metric (str, default: 'jaccard') – Similarity metric.
metric_kwargs (dict | None, default: None) – Further arguments to scipy.spatial.distance.cdist.
- Return type:
- Returns:
Similarity matrix, computed as 1 minus the distance matrix. This may yield negative values for distance metrics that can exceed 1.
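The remark about negative values can be illustrated with a small, dependency-free sketch. The parameter description indicates that the method relies on scipy.spatial.distance.cdist; the code below instead uses hand-written metrics on binary membership vectors, so it demonstrates the arithmetic only, and the helper names are hypothetical.

```python
import math


def membership(members, features):
    """Binary membership vector of one feature set over an ordered feature list."""
    return [1 if f in members else 0 for f in features]


def jaccard_distance(u, v):
    """Jaccard distance between two binary vectors (always within [0, 1])."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1 - both / either if either else 0.0


all_features = ["g1", "g2", "g3", "g4", "g5"]
u = membership({"g1", "g2"}, all_features)
v = membership({"g3", "g4", "g5"}, all_features)

# Jaccard distance is bounded by 1, so the similarity stays non-negative.
sim_jaccard = 1 - jaccard_distance(u, v)
# Euclidean distance is unbounded, so 1 - distance can drop below zero.
sim_euclidean = 1 - math.dist(u, v)
print(sim_jaccard, sim_euclidean)
```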
- FeatureSets.similarity_to_observations(observations)#
Compute similarity matrix between feature sets using observations as a reference.
- FeatureSets.subset(features)#
Subset feature sets by features.
- FeatureSets.to_dict()#
Convert this feature set collection to a dictionary.
- FeatureSets.to_gmt(path)#
Write this feature set collection to a GMT file.
- Parameters:
path (str | Path | TextIOBase) – Path to the output file.
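Since path may also be a text stream, the output layout can be sketched by writing to an in-memory buffer. write_gmt is a hypothetical helper that only shows the tab-separated GMT layout (name, description, features). It is not the library's writer, and sorting the features is a choice made here purely for reproducible output.

```python
import io


def write_gmt(feature_sets, handle):
    """Write {name: (description, features)} as tab-separated GMT lines."""
    for name, (description, members) in feature_sets.items():
        # Sorting is only for reproducible output; GMT does not require it.
        handle.write("\t".join([name, description, *sorted(members)]) + "\n")


buffer = io.StringIO()
write_gmt({"SET_A": ("first set", {"g2", "g1"})}, buffer)
gmt_output = buffer.getvalue()
print(gmt_output)
```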
- FeatureSets.to_mask(features=None, sort=True)#
Convert feature sets to a mask.
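A mask of this kind is commonly a boolean matrix with one row per feature set and one column per feature. The sketch below assumes that layout, assumes that features defaults to the union of all features, and assumes that sort orders both rows and columns alphabetically. None of this is confirmed by the documentation above, and build_mask is a hypothetical name.

```python
def build_mask(feature_sets, features=None, sort=True):
    """Return (row_names, column_names, boolean rows) for a membership mask."""
    if features is None:
        # Assumption: default columns are the union of all features.
        features = set().union(*feature_sets.values())
    columns = sorted(features) if sort else list(features)
    names = sorted(feature_sets) if sort else list(feature_sets)
    mask = [[f in feature_sets[n] for f in columns] for n in names]
    return names, columns, mask


mask_sets = {"B": {"g2", "g3"}, "A": {"g1", "g2"}}
row_names, column_names, mask = build_mask(mask_sets)
print(row_names, column_names, mask)
```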