name: pymaftools
description: Use when writing code that imports pymaftools, or when the user asks about MAF files, oncoplots, mutation analysis, multi-omics integration, copy number variation, gene expression analysis, lollipop plots, or any bioinformatics genomic analysis task.
pymaftools — Python MAF Analysis Toolkit
You are an expert in using the pymaftools package for genomic and multi-omics analysis. The source code is at ./pymaftools/pymaftools/. Always refer to the actual source when uncertain about API details.
Package Overview
pymaftools provides tools for loading, analyzing, and visualizing Mutation Annotation Format (MAF) files and multi-omics cancer genomics data.
Core Classes & Usage
MAF — Load & Filter Mutation Files
from pymaftools import MAF
maf = MAF.read_maf("path/to/file.maf", sample_id="SampleA")
# Filter to nonsynonymous mutations
filtered = maf.filter_maf(filter_type="nonsynonymous")
# Convert to PivotTable (gene x sample matrix)
pt = maf.to_pivot_table()
# Merge multiple MAFs
merged = MAF.merge_mafs([maf1, maf2])
# Protein info for lollipop plots
AA_length, mutations_data = maf.get_protein_info("EGFR")
domains_data, refseq_ID = MAF.get_domain_info("EGFR", AA_length)
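To make the "gene x sample matrix" idea concrete, here is a minimal, library-free sketch of the kind of pivot that to_pivot_table() is described as producing. It does not call pymaftools. The column names Hugo_Symbol, Tumor_Sample_Barcode, and Variant_Classification come from the standard MAF layout; the helper name pivot_mutations, the toy rows, and the "Multi_Hit" label for a gene hit by more than one variant class in the same sample are assumptions for illustration and may not match the library's output.

```python
from collections import defaultdict

# Toy MAF-like records using standard MAF column names.
maf_rows = [
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S1", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S2", "Variant_Classification": "Nonsense_Mutation"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "S1", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "S1", "Variant_Classification": "Frame_Shift_Del"},
]

def pivot_mutations(rows, empty=""):
    """Hypothetical helper: build {gene: {sample: classification}} from MAF-like rows."""
    calls = defaultdict(lambda: defaultdict(set))
    samples = set()
    for row in rows:
        gene = row["Hugo_Symbol"]
        sample = row["Tumor_Sample_Barcode"]
        calls[gene][sample].add(row["Variant_Classification"])
        samples.add(sample)

    matrix = {}
    for gene in sorted(calls):
        matrix[gene] = {}
        for sample in sorted(samples):
            kinds = calls[gene].get(sample, set())
            if not kinds:
                matrix[gene][sample] = empty          # no mutation in this sample
            elif len(kinds) == 1:
                matrix[gene][sample] = next(iter(kinds))
            else:
                matrix[gene][sample] = "Multi_Hit"    # assumed label, not confirmed by the source
    return matrix

matrix = pivot_mutations(maf_rows)
```

Every gene row ends up with one cell per sample, which is what lets the sample and feature metadata stay in step with the matrix.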
PivotTable — Core Analysis Data Structure
A gene/feature x sample matrix with synchronized metadata.
from pymaftools import PivotTable
pt = maf.to_pivot_table()
pt = pt.add_freq() # Add mutation frequency to feature_metadata
pt = pt.filter_by_freq(min_freq=0.05) # Keep genes mutated in >=5% samples
pt = pt.calculate_TMB() # Add TMB to sample_metadata
similarity = pt.compute_similarity(method="jaccard")
# Sorting
pt = pt.sort_features(by="freq")
pt = pt.sort_samples_by_mutations()
pt = pt.sort_samples_by_group(group_col="subtype", group_order=["A", "B"], top=10)
# Subsetting
pt_sub = pt.subset(features=["TP53", "KRAS"], samples=sample_list)
# Persistence
pt.to_sqlite("data.db")
pt = PivotTable.read_sqlite("data.db")
# Visualization (lazy-loaded accessor)
pt.plot.plot_pca_samples(group_col="subtype")
pt.plot.plot_boxplot_with_annot(group_col="subtype", value_col="TMB")
pt.plot.plot_heatmap()
Key attributes:
- pt.feature_metadata: DataFrame indexed by features (genes)
- pt.sample_metadata: DataFrame indexed by samples
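The frequency filter can be read as follows: a gene's frequency is the fraction of samples in which it is mutated, and a minimum-frequency filter keeps the genes at or above that fraction. The sketch below shows that logic in plain Python without pymaftools. The helper names and the toy matrix are hypothetical, and the exact definition used by add_freq() should be checked in the source.

```python
# Toy gene x sample matrix: True means the gene is mutated in that sample.
matrix = {
    "TP53": {"S1": True, "S2": True, "S3": False, "S4": True},
    "KRAS": {"S1": True, "S2": False, "S3": False, "S4": False},
    "EGFR": {"S1": False, "S2": False, "S3": False, "S4": False},
}

def mutation_freq(matrix):
    """Fraction of samples in which each gene is mutated (hypothetical helper)."""
    return {gene: sum(row.values()) / len(row) for gene, row in matrix.items()}

def filter_by_min_freq(matrix, freq, min_freq):
    """Keep genes whose frequency is at least min_freq."""
    return {gene: row for gene, row in matrix.items() if freq[gene] >= min_freq}

# Frequencies must exist before they can be used for filtering or sorting.
freq = mutation_freq(matrix)
kept = filter_by_min_freq(matrix, freq, min_freq=0.25)
ranked = sorted(kept, key=freq.get, reverse=True)
```

The order of the last three lines mirrors the usage rule for the real API: the frequency column has to be computed first, because both the filter and the sort read it.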
PivotTable — Advanced Filtering
# Filter by variance (keep top 25% most variable features)
pt = pt.filter_by_variance(quantile=0.75, method="var") # method: "var" or "mad"
pt = pt.filter_by_variance(threshold=0.5, method="mad") # absolute threshold
# Filter by statistical test with FDR correction
pt = pt.filter_by_statistical_test(
group_col="subtype",
method="kruskal", # "ttest", "mann_whitney", "kruskal", "anova"
alpha=0.05
)
# Results in feature_metadata: "p_value", "adjusted_p_value"
# Group frequencies
pt = pt.add_freq(
groups={"LUAD": pt.subset(samples=pt.sample_metadata.subtype == "LUAD"),
"LUSC": pt.subset(samples=pt.sample_metadata.subtype == "LUSC")}
)
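filter_by_statistical_test reports both a p_value and an adjusted_p_value per feature. The text above does not say which FDR procedure is applied; the sketch below assumes Benjamini-Hochberg, a common choice, and implements it in plain Python so the adjustment is easy to inspect. The function name bh_adjust and the toy p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy per-feature p-values (not real results).
p_values = {"TP53": 0.001, "KRAS": 0.02, "EGFR": 0.03, "BRAF": 0.8}
adjusted = dict(zip(p_values, bh_adjust(list(p_values.values()))))
significant = [gene for gene, q in adjusted.items() if q < 0.05]
```

Filtering on the adjusted value rather than the raw p-value is what keeps the expected share of false positives among the retained features near alpha.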
Cohort — Multi-Omics Container
from pymaftools import Cohort
cohort = Cohort(sample_IDs=sample_list)
cohort.add_table("mutations", mutation_pt)
cohort.add_table("cnv", cnv_table)
cohort.add_table("expression", expr_table)
cohort.add_sample_metadata(clinical_df)
# Access tables as attributes
cohort.mutations
cohort.cnv
# Persistence
cohort.to_sqlite("cohort.db")
cohort = Cohort.read_sqlite("cohort.db")
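A multi-omics container is only useful if every table refers to the same samples in the same order. The sketch below illustrates that alignment step with plain dictionaries. It is not the Cohort implementation: the helper align_to_samples is hypothetical, and whether Cohort fills, drops, or rejects samples missing from a table is not stated here and should be checked in the source.

```python
def align_to_samples(table, sample_ids, fill=None):
    """Reorder a {feature: {sample: value}} table to a fixed sample order.

    Samples absent from the table are filled with `fill` (an assumption).
    """
    return {
        feature: {s: row.get(s, fill) for s in sample_ids}
        for feature, row in table.items()
    }

sample_ids = ["S1", "S2", "S3"]

# Toy tables with different sample coverage and ordering.
mutations = {"TP53": {"S2": 1, "S1": 0}}
expression = {"TP53": {"S3": 7.2, "S1": 5.1, "S2": 6.4}}

cohort_tables = {
    "mutations": align_to_samples(mutations, sample_ids),
    "expression": align_to_samples(expression, sample_ids),
}
```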
Specialized Table Types
from pymaftools import (
CopyNumberVariationTable, ExpressionTable, SignatureTable,
CancerCellFractionTable, SmallVariationTable
)
# GISTIC results
cnv = CopyNumberVariationTable.read_gistic_arm_level("arm_level.txt")
cnv = CopyNumberVariationTable.read_gistic_gene_level("gene_level.txt")
# Expression data
expr = ExpressionTable(expression_df)
cluster_expr = expr.to_cluster_table()
# COSMIC signatures
sig = SignatureTable.read_signature("signature_file.txt")
# Cancer cell fraction (PyClone output)
ccf_table = CancerCellFractionTable.pyclone_to_sorted_table("pyclone_results.tsv")
# SmallVariationTable — PivotTable subclass for SNV/INDEL data
svt = SmallVariationTable(snv_data)
Pairwise Analysis
from pymaftools import SimilarityMatrix
sim = pt.compute_similarity(method="jaccard") # Returns SimilarityMatrix
sim.get_mean_group_similarity(group_series)
sim.calculate_group_similarity_pvalues(group_series, n_permutations=1000)
sim.plot_group_heatmap(group_series)
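For binary mutation calls, a Jaccard similarity compares two samples by the overlap of their mutated gene sets. The sketch below shows that measure and a mean within-group and between-group similarity in plain Python, without pymaftools. The helper names, the toy data, and the convention that two empty sets have similarity 0 are assumptions; the library's own handling may differ.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Mutated genes per sample, and a group label per sample (toy data).
mutated = {
    "S1": {"TP53", "KRAS"},
    "S2": {"TP53", "KRAS", "EGFR"},
    "S3": {"EGFR"},
    "S4": {"EGFR", "BRAF"},
}
groups = {"S1": "A", "S2": "A", "S3": "B", "S4": "B"}

def mean_group_similarity(mutated, groups):
    """Average pairwise Jaccard similarity for each unordered pair of groups."""
    totals = {}
    for s1, s2 in combinations(sorted(mutated), 2):
        key = tuple(sorted((groups[s1], groups[s2])))
        totals.setdefault(key, []).append(jaccard(mutated[s1], mutated[s2]))
    return {key: sum(vals) / len(vals) for key, vals in totals.items()}

similarity = mean_group_similarity(mutated, groups)
```

A permutation test on top of this would shuffle the group labels many times and compare the observed group means against the shuffled ones.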
Visualization
OncoPlot — Mutation Landscape
OncoPlot takes a PivotTable in the constructor and uses method chaining.
from pymaftools import OncoPlot, ColorManager
oncoplot = (OncoPlot(pt)
.set_config(figsize=(15, 10),
width_ratios=[20, 2, 2], # heatmap, freq bar, legend
categorical_columns=["subtype"]) # optional metadata columns
.mutation_heatmap() # categorical mutation heatmap
.plot_freq() # frequency bar chart
.plot_bar() # sample bar chart
.plot_categorical_metadata(cmap_dict=cmap_dict) # sample metadata rows; cmap_dict must be defined beforehand
.plot_all_legends()
.save("oncoplot.png", dpi=300)
)
# Numeric heatmap (e.g., CNV data)
oncoplot = (OncoPlot(cnv_table)
.set_config(figsize=(30, 10), width_ratios=[25, 1, 0, 3])
.numeric_heatmap(cmap="coolwarm", vmin=-2, vmax=2)
.plot_bar()
.plot_categorical_metadata(cmap_dict=cmap_dict)
.plot_all_legends()
.save("cnv_oncoplot.tiff", dpi=600)
)
LollipopPlot — Protein Mutations
from pymaftools import LollipopPlot
AA_length, mutations_data = maf.get_protein_info("TP53")
domains_data, refseq_ID = MAF.get_domain_info("TP53", AA_length)
plot = LollipopPlot(
protein_name="TP53",
protein_length=AA_length,
domains=domains_data,
mutations=mutations_data
)
plot.plot()
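A lollipop plot places one stem per amino-acid position, with the stem height given by the number of mutations at that position. The sketch below shows that aggregation step in plain Python. It assumes protein changes written in HGVSp short form (for example p.R175H, as in the MAF HGVSp_Short column); the structure actually returned by get_protein_info() may differ, and the helper name and the toy values are hypothetical.

```python
import re
from collections import Counter

# Matches the first residue position in strings such as "p.R175H" or "p.V157_R158del".
_POSITION = re.compile(r"^p\.[A-Za-z*]+(\d+)")

def count_by_position(protein_changes):
    """Count mutations per amino-acid position; unparseable entries are skipped."""
    counts = Counter()
    for change in protein_changes:
        match = _POSITION.match(change)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# Toy protein changes; "p.=" has no position and is ignored.
changes = ["p.R175H", "p.R175H", "p.R248Q", "p.R273C", "p.V157_R158del", "p.="]
counts = count_by_position(changes)
```

For a range such as a multi-residue deletion, this sketch anchors the stem at the first affected residue, which is one possible convention rather than a documented one.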
Color & Font Management
from pymaftools import ColorManager, FontManager
cm = ColorManager()
cm.get_cmap("nonsynonymous") # Mutation type colors
cm.get_cmap("cnv") # CNV colors
cm.register_cmap("custom", {"A": "red", "B": "blue"})
fm = FontManager()
fm.setup_matplotlib_fonts(family="Arial", size=10)
ModelPlot & MethodsPlot
from pymaftools import ModelPlot, MethodsPlot
# ModelPlot — model performance visualizations (inherits BasePlot)
model_plot = ModelPlot()
# MethodsPlot — 3D cohort demonstration plots
methods_plot = MethodsPlot()
Machine Learning
Stacking Model for Multi-Omics
from pymaftools import OmicsStackingModel
from pymaftools.model.modelUtils import (
evaluate_model, cross_validate_importance, get_importance,
to_importance_table, plot_top_feature_importance_heatmap,
run_rfecv_feature_selection, run_model_evaluation,
plot_metric_comparison_with_annotation
)
model = OmicsStackingModel()
model.fit(cohort, labels)
preds = model.predict(cohort)
proba = model.predict_proba(cohort)
importance = model.get_omics_feature_importance()
weights = model.get_omics_weights()
# Evaluation
metrics = evaluate_model(model, X_test, y_test) # Returns Accuracy, F1, AUC
results = cross_validate_importance(model, X, y, n_seeds=10)
# Feature selection
selected = run_rfecv_feature_selection(model, X, y)
# Importance visualization
plot_top_feature_importance_heatmap(importance_table, top_n=20)
Utilities
from pymaftools import read_GMT, fetch_msigdb_geneset, PCA_CCA
from pymaftools.utils.geneinfo import get_ncbi_gene_IDs, parse_gene_info
# Gene sets
gmt = read_GMT("pathways.gmt")
geneset = fetch_msigdb_geneset("HALLMARK_TP53_PATHWAY")
# Gene info from NCBI
gene_ids = get_ncbi_gene_IDs(["TP53", "KRAS", "BRCA1"])
# Dimensionality reduction
pca_cca = PCA_CCA()
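GMT files store one gene set per line as tab-separated fields: the set name, a description, then the member genes. The sketch below is a minimal parser for that layout, written without pymaftools so the format is visible. read_GMT() may return a different structure, and parse_gmt_text and the toy sets are hypothetical.

```python
def parse_gmt_text(text):
    """Parse GMT-formatted text into {set_name: [genes]}; malformed lines are skipped."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0]:
            continue  # needs a name, a description, and at least one gene
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets

# Toy GMT content with two sets and a trailing blank line.
gmt_text = (
    "SET_A\tdescription of set A\tTP53\tMDM2\tCDKN1A\n"
    "SET_B\tna\tKRAS\tBRAF\n"
    "\n"
)
gene_sets = parse_gmt_text(gmt_text)
```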
Important Notes
- PivotTable extends pandas DataFrame, so all pandas operations work
- Always call add_freq() before filter_by_freq() or sort_features(by="freq")
- OncoPlot takes PivotTable in constructor and uses method chaining; most methods return self
- filter_by_variance and filter_by_statistical_test add results to feature_metadata
- When unsure about a method's signature, read the source at ./pymaftools/pymaftools/
- Use .venv with uv for package management
Skill Details
- GitHub Stars: 6
- GitHub Forks: 2
- Created: Mar 2026
- Last Updated: 3 months ago