VarSome's ACMG Implementation
VarSome's ACMG implementation strives to follow the official published ACMG guidelines. Our scientific and clinical team has provided direction to ensure we are interpreting the rules correctly, and has also given guidance on the approach and thresholds to use for automatic classification.
VarSome's ACMG classification is provided for educational use only - as indeed are the ACMG guidelines themselves.
Transparency & Clarity
Rather than writing reams of documentation, each "rule" in VarSome provides a detailed explanation of why it was triggered or not. This makes the workings of the system clear and transparent, whilst also ensuring that the explanations are always fully consistent with the coded logic.
All thresholds used are explicitly visible in the explanation, and the annotation itself only uses the data available in the VarSome page & genome browser. This not only guarantees consistency, it also makes it possible for the user to verify the classifications by looking up the corresponding data in the rest of the VarSome variant & gene pages.
Saphetor's molecular database underpins the classification, allowing for extremely efficient look-up operations against more than 30 different databases. For the ACMG classification we use the following:
- Transcripts: RefSeq & Ensembl, from which we deduce coding & functional impacts.
- Frequencies: gnomAD exomes & genomes frequencies & coverage.
- Pathogenicity: UniProt, ClinVar and VarSome user classifications.
- Proteins: UniProt regions.
- Splice Site Prediction: scSNV via dbNSFP (limited to SNVs only).
- Conservation: GERP++.
- Genes: CGD for mode of inheritance & links to diseases, ExAC for probabilities of tolerance to loss-of-function.
- Computational Predictions: DANN, GERP, Cosmic, FATHMM, and other databases via dbNSFP (SNVs only): LRT, MetaLR, MetaSVM, MutationAssessor, MutationTaster, PROVEAN, FATHMM-MKL & SIFT.
More databases will be integrated as they become available.
We maintain a database of "Known Variants" that is used for rules that require statistical heuristics (hotspots, protein functional domains, gene spectra of variation etc.). This database will be used to deduce, for example, that "most synonymous variants in gene BRAF are benign", or that "missense variants in gene IDS are most likely pathogenic".
The database is constructed using all the variants in ClinVar, UniProt or that have been manually classified by VarSome users. We discard ClinVar entries that are "literature only" in order to improve the data quality (but we do not use review stars here).
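As a minimal sketch of the filtering step described above: the record layout (`KnownRecord`, its fields, and the `literature_only` flag) is hypothetical and only illustrates the idea of dropping ClinVar "literature only" entries while keeping everything else. It is not VarSome's actual schema.

```python
from typing import NamedTuple


class KnownRecord(NamedTuple):
    """Hypothetical record layout, for illustration only."""
    variant: str
    source: str              # "ClinVar", "UniProt" or "VarSome"
    classification: str
    literature_only: bool = False


def build_known_variants(records):
    """Keep every record except ClinVar entries flagged as literature only.

    Review stars are deliberately not consulted here, as stated in the text.
    """
    return [
        r for r in records
        if not (r.source == "ClinVar" and r.literature_only)
    ]
```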
Coding Impact: we assign a unique coding impact (exon deletion, splice junction loss, nonsense, frameshift, stop loss, start loss, missense, in-frame indel, synonymous, non-coding) to each variant. Where there are multiple transcripts with differing coding impacts, we pick the single "most serious" coding impact, so each variant is only counted once, thus avoiding any duplication.
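Picking the single "most serious" impact across transcripts could look roughly like the sketch below. It assumes that the order in which the impacts are listed above is also the severity ranking; the text does not state this explicitly, and the names `SEVERITY_ORDER` and `most_serious_impact` are made up for the example.

```python
# ASSUMPTION: the list order in the text doubles as the severity ranking,
# most serious first. The real ranking used by VarSome may differ.
SEVERITY_ORDER = [
    "exon deletion", "splice junction loss", "nonsense", "frameshift",
    "stop loss", "start loss", "missense", "in-frame indel",
    "synonymous", "non-coding",
]
RANK = {impact: rank for rank, impact in enumerate(SEVERITY_ORDER)}


def most_serious_impact(impacts_per_transcript):
    """Return the single most serious coding impact among all transcripts."""
    if not impacts_per_transcript:
        raise ValueError("at least one transcript impact is required")
    # The lowest rank is the most serious impact.
    return min(impacts_per_transcript, key=lambda impact: RANK[impact])
```

For instance, a variant that is synonymous on one transcript and missense on another would be counted once, as missense.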
Pathogenicity: when counting variants within a gene, protein domain or region, we group Pathogenic with Likely Pathogenic variants, and Benign with Likely Benign variants. Uncertain Significance variants are ignored.
This database of known variants is then used to compute statistics within a region or a gene as follows:
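The grouping can be illustrated with a short sketch. The input format (pairs of coding impact and classification label) and the names `GROUPS` and `count_by_impact` are assumptions made for the example.

```python
from collections import Counter

# Likely Pathogenic is merged with Pathogenic, Likely Benign with Benign.
GROUPS = {
    "Pathogenic": "pathogenic",
    "Likely Pathogenic": "pathogenic",
    "Benign": "benign",
    "Likely Benign": "benign",
}


def count_by_impact(variants):
    """Count known variants per (coding impact, group).

    `variants` is an iterable of (coding_impact, classification) pairs,
    a hypothetical input shape. Anything outside GROUPS, such as
    Uncertain Significance, is ignored.
    """
    counts = Counter()
    for impact, classification in variants:
        group = GROUPS.get(classification)
        if group is not None:
            counts[(impact, group)] += 1
    return counts
```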
- Gene Statistics: Many genes have a defined spectrum of pathogenic and benign variation. VarSome displays a table with the numbers of all known benign / pathogenic variants for a given gene, grouped by their coding impact. These are used in rules PP2, BP1 & PVS1.
- Protein Domains: if a variant falls within a known functional domain (per UniProt), we count the pathogenic and benign variants within that domain to decide whether rule PM1 applies. If there are at least 10 known variants, and two-thirds of them (66.7%) are pathogenic, rule PM1 is triggered.
- Hotspots: to determine whether a variant is in a mutational hotspot, we count all the known variants within 18 base-pairs (6 codons) of the variant, effectively scanning a region of 36 base-pairs centred on the variant. These counts are then used for PM1 as above.
- Rule BP3 also uses this database of known variants to verify that there are no known pathogenic variants within (or near) a repeat region.
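The PM1 domain and hotspot checks described in the list above can be sketched as follows. The constants come from the text (10 known variants, two-thirds pathogenic, 18 base-pairs either side). Several details are assumptions: that the two-thirds boundary is inclusive, that the 18 base-pair window is inclusive at both ends, and that known variants arrive as (position, group) pairs.

```python
MIN_KNOWN_VARIANTS = 10    # from the text
HOTSPOT_HALF_WINDOW = 18   # base-pairs either side, i.e. 6 codons


def pm1_triggered(pathogenic, benign):
    """True if enough known variants exist and two-thirds are pathogenic.

    ASSUMPTION: exactly two-thirds counts as meeting the threshold.
    Integer arithmetic avoids floating-point issues at the boundary.
    """
    total = pathogenic + benign
    if total < MIN_KNOWN_VARIANTS:
        return False
    return 3 * pathogenic >= 2 * total


def hotspot_counts(position, known_variants):
    """Count (pathogenic, benign) known variants within the hotspot window.

    `known_variants` is an iterable of (position, group) pairs where group
    is "pathogenic" or "benign"; this shape is hypothetical.
    """
    pathogenic = benign = 0
    for pos, group in known_variants:
        if abs(pos - position) <= HOTSPOT_HALF_WINDOW:  # ASSUMPTION: inclusive
            if group == "pathogenic":
                pathogenic += 1
            elif group == "benign":
                benign += 1
    return pathogenic, benign
```

The counts returned by `hotspot_counts` can be passed straight to `pm1_triggered`, mirroring the statement that hotspot counts are used for PM1 in the same way as domain counts.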
The ACMG guidelines were intended for human interpretation rather than machines, and whilst there are cases with strictly defined thresholds (variant population frequencies for example) many of the rules are really up to human judgement and experience.
Our ACMG implementation uses approximately 30 constants internally for these sorts of heuristics. The values for these thresholds have been established by looking at well-known variants, taking direction from our advisors and reputable labs, and reading relevant research papers.
We strive to display the reasoning used in the explanations provided for each rule, whether the rule was triggered or not:
- PM1 succeeded because: UniProt protein BRAF_HUMAN domain 'Protein kinase' has 117 pathogenic variants out of 125 classified variants = 93.6% (greater than 66.7%).
- BP4 failed because: The position is conserved (GERP++ rejected substitutions = 5.650 is greater than 4.000).
- BP1 failed because: Missense variant in gene BRAF that has 98 pathogenic variants of which 98 pathogenic missense variants = 100.0% which is more than maximum of 10.0%.
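One way to keep an explanation consistent with the logic is to build the message from the same numbers that drive the decision. The sketch below does this for the single check visible in the BP1 example above. The function name, its parameters and the wording of the success branch are invented for illustration, and the real rule may apply further conditions.

```python
def explain_bp1(gene, pathogenic_total, pathogenic_missense, max_fraction=0.10):
    """Return (triggered, message) for the missense-fraction check.

    The 10% maximum is taken from the example explanation in the text.
    """
    if pathogenic_total <= 0:
        raise ValueError("need at least one known pathogenic variant")
    fraction = pathogenic_missense / pathogenic_total
    triggered = fraction <= max_fraction
    verdict = "succeeded" if triggered else "failed"
    # "more than" follows the example; "within the" is hypothetical wording.
    comparison = "within the" if triggered else "more than"
    message = (
        f"BP1 {verdict} because: Missense variant in gene {gene} that has "
        f"{pathogenic_total} pathogenic variants of which {pathogenic_missense} "
        f"pathogenic missense variants = {fraction:.1%} which is {comparison} "
        f"maximum of {max_fraction:.1%}."
    )
    return triggered, message
```

Because the verdict and the percentages are derived from one computation, the displayed threshold cannot drift away from the value actually used.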
Currently there are limited configurable options: users can only instruct VarSome to filter by ClinVar stars, or disable databases such as UniProt or ClinVar.
In future, we may allow users to adjust the thresholds used internally and store common configurations to be used in their organization or laboratory group.
Feedback & Questions
We are always very keen to hear from the VarSome community. Please use the VarSome Feedback form if you have any further questions or suggestions to help us improve the system!