VarSome Germline Classification

(c) Copyright Saphetor SA. All rights reserved.

version: 13.1.2, dated: 15 Mar 2025 06:08:40 UTC

Introduction

“Standards and guidelines for the interpretation of sequence variants”, the seminal paper by Sue Richards et al., was published in 2015 (ACMG Guidelines). The VarSome germline variant classifier automatically generates a pathogenicity recommendation based on these guidelines and the vast range of machine-readable genomic data available.

The standards were very much written for interpretation by humans, not machines: they assume the clinician has a deep knowledge of the domain and of the relevant papers and conditions. Automating these standards is a matter of interpretation. We have opted to statistically quantify terms such as “hot-spot” or “well known”, resulting in many thresholds that are tuned via our calibration process.

Our guiding principle throughout has been to implement the best algorithms we could, following the advice from our clinical advisors, feedback from the VarSome user community, and using statistically justified thresholds. All the rules provide clear natural language explanations of why they were triggered and which evidence was used, or indeed, a full explanation of why the criteria were not met (this is currently only visible in VarSome).

We also strive to continuously improve our implementation, adjusting rules or thresholds, incorporating new data sources, and adding refinements as new publications and methodology changes are suggested. We have also implemented some of the general recommendations for using ACMG/AMP criteria published by the SVI (Sequence Variant Interpretation) ClinGen expert panel.

Clinical Evidence

Clinical Evidence is the foundation stone of our ACMG evaluation. We currently source this from:

The VarSome options allow the user to specify a minimum number of stars to filter ClinVar, so entries with fewer stars will be ignored, or similarly disable clinical classifications from UniProt.

Clinically Reported Variants

On a daily basis, we re-annotate all the variants from the sources listed above. This data is then used for all the rules that require clinical evidence, or statistics derived from it.

The current database version is 14-Mar-2025 (3.47M records).

For each variant we record its original “source” classification, allele frequency and coding impact. We also re-classify the variants using our implementation of the ACMG rules, with the clinical evidence rules (PS3, BS3, PP5 & BP6) disabled - this is useful in establishing how reliable the evidence might be. The strengths of rules such as PS1 and PM5 will be downgraded if a variant has been reported pathogenic but this is not confirmed by the independent ACMG re-classification.

This database is displayed in VarSome as a “lollipop graph” in the genome browser:

The graph can be filtered by coding impact, various types of null variants or pathogenicity sources.

Gene Statistics

This database is derived from the clinically reported variants and is also updated daily: it keeps track of how many variants are benign/pathogenic for each gene, along with their coding impacts and exon location - these are used in rules PP2 and BP1 for example.

The gene statistics are displayed in the VarSome “gene” page:

We derive a “benign cut-off frequency” from these variant classifications & their allele frequencies for use in rule BS1.

Mode Of Inheritance

A number of rules (PM2, BS2, BP1) depend on the mode of inheritance for a given gene.

We aggregate the information from different sources to assess the mode of inheritance of a gene, taking the level of confidence of each source into consideration:

  • OMIM®: source of inheritance data available for VarSome Premium, VarSome Clinical and VarSome API users. It is considered as definitive (strongest evidence) unless the relationship between the gene and the phenotype is provisional (labeled with a question mark at the beginning of the phenotype name).
  • CGD: source of inheritance data available for all users. This source is considered with the strongest confidence, together with OMIM®.
  • ClinGen Disease Validity, gene2Phenotype, GenCC and PanelApp: these are used as supplementary sources to CGD and OMIM®. We use the confidence level (e.g. Strong, Definitive, Supporting, Disputed Evidence), to assess the mode of inheritance. Lower confidence level entries will have lower strength in the final inheritance selection.
  • Domino: This is a high quality in-silico prediction tool, as detailed in PMC5630195. This source is only used for genes that do not have any other entry.

We compare Domino's 'Probability of Autosomal Dominant' to the following thresholds:

  • Dominant: if more than 0.5934,
  • Recessive: if less than 0.3422,
  • AD/AR: if in between 0.3422 and 0.5934.
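As an illustration only, the Domino thresholds above could be applied as in the following Python sketch. The function name and the returned labels are hypothetical, and the handling of a score exactly equal to a threshold is an assumption (it falls into AD/AR here).

```python
# Thresholds quoted in the text; names and labels are hypothetical.
DOMINANT_ABOVE = 0.5934
RECESSIVE_BELOW = 0.3422

def domino_inheritance(p_ad: float) -> str:
    """Map Domino's 'Probability of Autosomal Dominant' to an inheritance label."""
    if p_ad > DOMINANT_ABOVE:
        return "AD"
    if p_ad < RECESSIVE_BELOW:
        return "AR"
    # Assumption: values on a boundary are treated as AD/AR.
    return "AD/AR"
```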

'Only OMIM®' option in VarSome Clinical: if a VarSome Clinical user selects 'Only OMIM®' as source to be used for inheritance when launching an analysis, we will only use OMIM® to get the mode of inheritance of a gene.

Note that we do not currently use the input phenotype for genes that are AD or AR for different diseases.

Splice-Site Prediction

We use the scSNV database and MaxEntScan for splice-site prediction in rules BP7 and PP3. These scores are limited to single-nucleotide variants only. A variant will be predicted splicing if the 'ADA Boost Splicing' score is greater than 0.958.

Conservation

We use PhyloP100Way for conservation tests. It is available for nearly all positions in both genomes, and proves to be a useful indication of whether a variant may be benign or pathogenic.

Conservation is used:

  • To exclude highly-conserved variants from rules BP4 and BP7.
  • As a last-resort fallback in rules PP3 and BP4 if no other in-silico predictions are available.

Thresholds have been carefully calibrated to maximise accuracy whilst not over-calling (see in-silico predictions):

  • Supporting Benign: if the score is less than 3.58,
  • Supporting Pathogenic: if the score is greater than 7.52,
  • Moderate Pathogenic: if the score is greater than 9.88.
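The three conservation thresholds can be read as a simple lookup. The Python sketch below is a hedged illustration; the function name and the tuple return value are hypothetical and not part of VarSome.

```python
def phylop_evidence(score: float):
    """Return (direction, strength) for a PhyloP100Way score, or None if uncertain."""
    if score > 9.88:
        return ("pathogenic", "Moderate")
    if score > 7.52:
        return ("pathogenic", "Supporting")
    if score < 3.58:
        return ("benign", "Supporting")
    # Scores from 3.58 to 7.52 give no conservation evidence either way.
    return None
```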

Transcript Selection

All the ACMG rules are evaluated against a single transcript. Selecting this transcript is clearly of critical importance and can modify the outcome of the classification. Transcripts are prioritized according to the following criteria:

  1. Most severe coding impact, or within +/- 2 bases of the splicing site,
  2. MANE Select,
  3. Canonical,
  4. MANE Plus Clinical,
  5. Longest transcript.
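One way to picture this ordering is as a lexicographic sort key, sketched below in Python. The `Transcript` class and all its field names are hypothetical; in particular, `impact_rank` is an assumed integer that already encodes criterion 1 (higher means a more severe coding impact, or a position within 2 bases of the splice site).

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    name: str
    impact_rank: int          # hypothetical encoding of criterion 1
    mane_select: bool
    canonical: bool
    mane_plus_clinical: bool
    length: int

def pick_transcript(transcripts):
    """Pick the transcript that ranks highest on the five criteria, in order."""
    return max(
        transcripts,
        key=lambda t: (t.impact_rank, t.mane_select, t.canonical,
                       t.mane_plus_clinical, t.length),
    )
```

Tuples compare element by element, so a later criterion only matters when all earlier ones tie.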

NIH MANE identifies high quality transcripts that match in both RefSeq and Ensembl, and furthermore match the GRCh38 reference genome perfectly. The MANE Plus Clinical set includes additional transcripts for genes where MANE Select alone is not sufficient to report all clinical variants available in public resources.

The above criteria can be overridden by users as follows:

  • Selecting a different transcript in the VarSome UI.
  • Configuring transcripts to be used for specific genes in VarSome Clinical.

The Ensembl Transcript Support Level (TSL) is a method to highlight the well-supported and poorly-supported transcript models for users, based on the type and quality of the alignments used to annotate the transcript. We disqualify Ensembl transcripts that have a TSL with a value different from 1.

Note: some variants can be in multiple transcripts associated with multiple genes, although it is rare for a variant to be coding in multiple genes. The rules above will first determine the transcript to use, from which the gene is then derived.

Allele Frequency

VarSome currently uses GnomAD exomes & genomes to evaluate allele counts and frequencies. It uses both the frequency data and the coverage data reported for these two databases.

Frequencies will not be considered valid if:

  • the coverage is less than 20,
  • the Allele Number is less than 2000,
  • the GnomAD quality filter is suspect (i.e. not PASS).
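The validity conditions above amount to a three-part check. The following Python sketch is illustrative only; the function and parameter names are hypothetical.

```python
def frequency_is_valid(coverage: float, allele_number: int, gnomad_filter: str) -> bool:
    """A frequency is usable only if none of the three exclusion conditions applies."""
    return coverage >= 20 and allele_number >= 2000 and gnomad_filter == "PASS"
```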

Rules BA1 and BS1 will iterate through the various ethnicities to see whether the variant is common in a sub-population.

Rule Strengths

Each rule has a default strength recommended by ACMG; however, the guidelines also allow the clinician to change the strength of a rule based on the evidence they have at their disposal. We use this option in VarSome to boost or reduce the strength of rules based on the data from the annotation. The user is of course completely free to modify this manually in the VarSome UI if they disagree. Our own regression testing shows these 'variable strengths' are very useful in improving the overall accuracy of the automated classifier.

More detail is provided in the documentation for the individual rules, but here are some key examples:

  • PVS1: the strength is adjusted in line with the guidelines for NMD, or is reduced to 'Strong' for variants in the 3' UTR or close to the end of the protein.
  • clinical evidence rules PP5 and BP6: here we may significantly boost the default strength from 'Supporting' to 'Very Strong' if the evidence justifies it. We do this to ensure that 'Expert Panel' or 'Practice Guideline' variants from ClinVar and curated variants from LOVD are correctly highlighted and classified, or similarly to highlight publications linked by VarSome users.
  • PS1 and PM5: we reduce the strength to 'Moderate' or 'Supporting' respectively if the alternative amino-acid variant reported at the same position has not been independently confirmed as pathogenic using the ACMG rules (with clinical evidence disabled).
  • PM1: we boost the strength to 'Strong' if the variant is located in a dense hot-spot, or reduce it to 'Supporting' if, for example, there are a small number of benign variants reported in a protein domain.
  • PP3 and BP4: now follow the revised guidelines for in-silico evidence and may be triggered with moderate or strong strength, and exceptionally very strong if the variant is predicted splicing.

Modifying the rule strengths is a conscious decision we have made in order to ensure we provide the most accurate automated classification possible, but the user can very easily override the strengths provided, or even disable a rule completely using the VarSome UI.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome UI.

Germline Classifier Verdict

Rules are combined using the point system described in PMID:32720330.

Each rule triggered is assigned a number of points based on the strength of the evidence provided:

  • Supporting: 1 point
  • Moderate: 2 points
  • Strong: 4 points
  • Very Strong: 8 points

A total score is computed as the sum of the points from the pathogenic rules, minus the sum of the points from benign rules.

The total score is then compared to thresholds to assign the final verdict:

  • Pathogenic if greater than or equal to 10,
  • Likely Pathogenic if between 6 and 9 inclusive,
  • Uncertain Significance if between 0 and 5,
  • Likely Benign if between -6 and -1,
  • Benign if less than or equal to -7.
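The scoring and thresholds above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic, not the VarSome implementation; the function name and the list-of-strengths input are hypothetical.

```python
POINTS = {"Supporting": 1, "Moderate": 2, "Strong": 4, "Very Strong": 8}

def germline_verdict(pathogenic_strengths, benign_strengths):
    """Return (score, verdict) from the strengths of the triggered rules."""
    score = (sum(POINTS[s] for s in pathogenic_strengths)
             - sum(POINTS[s] for s in benign_strengths))
    if score >= 10:
        return score, "Pathogenic"
    if score >= 6:
        return score, "Likely Pathogenic"
    if score >= 0:
        return score, "Uncertain Significance"
    if score >= -6:
        return score, "Likely Benign"
    return score, "Benign"
```

For example, one Very Strong and one Moderate pathogenic rule give 8 + 2 = 10 points, which reaches the Pathogenic threshold.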

Calibration

Many of the rules implemented here rely on carefully calibrated thresholds. PM1 is a good example, where defining a “hot-spot” is clearly a fuzzy measure.

We carefully estimate these thresholds through statistical regression against a large population of reliably curated variants, using the methodology described in Evidence-based calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for clinical use of PP3/BP4 criteria. This allows us to consistently establish levels for supporting, moderate or possibly even strong evidence.

In order to measure the overall accuracy of the classifier, or assess the impact of any changes, we disable the clinical evidence rules (PS3, BS3, PP5 and BP6) in order to ensure that the classifier works well in the absence of variant-specific evidence, and thus can be extrapolated reliably beyond the test population. The calibrations are 'fair' in that they do not over-emphasise pathogenic vs benign or uncertain variants: we simply seek to maximise overall accuracy.

Saphetor reserves the right to adjust the implementation of the rules and the calibrated thresholds at any time. In practice this has allowed us to deliver continual improvements in the overall quality of our automated classification - but it also entails that results may change when re-annotating a variant several months later: methodologies, thresholds, and especially the clinical data used to calibrate them, may all have changed.

Although we use machine-learning techniques to adjust the thresholds used, we do not use neural-networks in the actual classification itself. We believe it is important to have fully transparent, justifiable and explainable rules, as opposed to inscrutable black-boxes. The 'AI' aspect is also well captured in the computational evidence, CADD being a prime example of how powerful such approaches can be.

In-Silico Predictions

In-silico prediction tools play an important part in the evaluation of a variant's pathogenicity in the ACMG rules. They are particularly helpful for missense variants, which can be a challenge to classify accurately.

VarSome now implements the ClinGen recommendations from Evidence-based calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for clinical use of PP3/BP4 criteria:

  • Only one engine at a time is used, depending on availability of data, in order: MitoTip & MitImpact, MetaRNN, CADD (Premium only), DANN (if CADD is not available).
  • The maximum strength allowed for rules PP3 & BP4 is Strong, even if there may be evidence for Very Strong, with the exception of variants that are predicted splicing (ie: similar to PVS1).
  • The strength is limited to Supporting, if there's Moderate evidence from rules PM1 or PM5.
  • Splice prediction (scSNV) is given priority over the other in-silico predictions.
  • Conservation is used for some low-sensitivity variant types, or if no other in-silico prediction is available.

Please refer to PP3 and BP4 for more specific detail.

Calibrated in-silico thresholds

We have calibrated thresholds based on the latest ClinVar data, following the methodology described in the publication above, using only missense variants, and ignoring:

  • Variants with allele frequency greater than 0.01,
  • Genes for which no pathogenic missense variants have been reported,
  • Entries with 0 stars (literature only, or no approved methodology),
  • Entries prior to 2018.

Engine | Calibration Variants (excluding VUS) | Accuracy | Sensitivity | Specificity | Strong benign | Moderate benign | Supporting benign | VUS | Supporting pathogenic | Moderate pathogenic | Strong pathogenic
AlphaMissense | 107 325 | 78.3% | 72.4% | 82.1% | <= 0.0853 | <= 0.166 | <= 0.316 | | >= 0.787 | >= 0.956 | >= 0.994
BLOSUM* | 63 334 | 8.0% | 0.0% | 13.6% | | <= -4 | <= -2 | | | |
BayesDel addAF | 75 323 | 89.2% | 90.5% | 87.7% | <= -0.238 | <= -0.0746 | <= -0.00775 | | >= 0.16 | >= 0.219 | >= 0.421
BayesDel noAF | 75 323 | 84.8% | 85.7% | 83.8% | <= -0.476 | <= -0.204 | <= -0.0404 | | >= 0.133 | >= 0.303 | >= 0.521
CADD | 120 127 | 84.3% | 78.4% | 87.4% | <= 16.1 | <= 22 | <= 23.2 | | >= 25.6 | >= 28.8 | >= 33
DANN | 74 945 | 25.7% | 8.3% | 45.8% | <= 0.478 | <= 0.915 | <= 0.974 | | >= 0.999 | |
DEOGEN2 | 56 225 | 70.4% | 62.4% | 76.6% | <= 0.0269 | <= 0.258 | <= 0.402 | | >= 0.795 | >= 0.875 | >= 0.969
EIGEN | 68 158 | 72.7% | 75.4% | 69.6% | <= -0.694 | <= -0.0295 | <= 0.189 | | >= 0.683 | >= 0.861 | >= 1.04
EIGEN-PC | 68 158 | 71.2% | 73.4% | 68.5% | <= -0.857 | <= -0.0618 | <= 0.224 | | >= 0.625 | >= 0.8 | >= 0.976
EVE | 35 908 | 66.7% | 73.3% | 58.2% | | <= 0.162 | <= 0.255 | | >= 0.603 | >= 0.723 | >= 0.905
FATHMM* | 58 962 | 46.4% | 36.7% | 53.7% | | <= -1.42 | <= 0.46 | | >= 4.4 | >= 5.33 |
FATHMM-MKL | 74 945 | 56.1% | 63.7% | 47.4% | <= 0.06 | <= 0.515 | <= 0.758 | | >= 0.969 | >= 0.994 |
FATHMM-XF | 55 094 | 63.5% | 58.8% | 67.0% | <= 0.105 | <= 0.303 | <= 0.545 | | >= 0.896 | >= 0.95 |
LIST-S2 | 59 444 | 53.1% | 43.3% | 60.4% | | <= 0.747 | <= 0.85 | | >= 0.972 | >= 0.989 |
LRT* | 54 453 | 59.9% | 60.0% | 59.8% | | <= -0.00687 | <= -0.000435 | | >= 0 | |
M-CAP | 50 457 | 70.1% | 71.1% | 69.0% | <= 0.00565 | <= 0.0366 | <= 0.0854 | | >= 0.29 | >= 0.619 |
MVP | 59 831 | 72.8% | 68.0% | 76.4% | <= 0.261 | <= 0.661 | <= 0.796 | | >= 0.943 | >= 0.967 |
MaxEntScan | 43 027 | 94.8% | 91.6% | 97.0% | | | | | >= 4.24 | >= 5.96 | >= 7.65
MetaLR | 59 382 | 65.6% | 60.6% | 69.5% | <= 0.0376 | <= 0.206 | <= 0.361 | | >= 0.832 | >= 0.91 | >= 0.985
MetaRNN | 60 582 | 91.0% | 89.0% | 92.4% | <= 0.108 | <= 0.267 | <= 0.43 | | >= 0.748 | >= 0.841 | >= 0.939
MetaSVM | 59 620 | 69.7% | 62.0% | 75.5% | | <= -0.677 | <= -0.286 | | >= 0.794 | >= 0.901 |
MitImpact | 1 325 | 75.6% | 51.4% | 86.8% | | | <= 0.51 | | >= 0.54 | >= 0.67 |
MitoTip | 731 | 82.2% | 77.4% | 88.7% | | | <= 9.44 | | >= 9.56 | >= 15.2 |
MutPred | 32 637 | 74.1% | 81.6% | 56.5% | | <= 0.267 | <= 0.403 | | >= 0.614 | >= 0.715 | >= 0.861
MutationAssessor | 53 180 | 66.6% | 59.3% | 72.4% | | <= 1.19 | <= 2.01 | | >= 2.91 | >= 3.5 |
MutationTaster | 75 324 | 17.2% | 0.0% | 36.8% | | <= 0.979 | <= 1 | | | |
PROVEAN* | 59 307 | 67.9% | 64.2% | 70.7% | | <= 1.73 | <= 2.29 | | >= 4.38 | >= 6.67 |
Polyphen2-HDIV | 54 708 | 60.2% | 58.7% | 61.4% | <= 0 | <= 0.163 | <= 0.768 | | >= 1 | |
Polyphen2-HVAR | 54 708 | 59.4% | 49.7% | 67.1% | <= 0.001 | <= 0.112 | <= 0.489 | | >= 0.997 | >= 1 |
PrimateAI | 57 863 | 54.1% | 48.0% | 58.9% | | <= 0.428 | <= 0.519 | | >= 0.789 | >= 0.895 |
REVEL | 59 620 | 82.6% | 78.9% | 85.4% | <= 0.133 | <= 0.351 | <= 0.471 | | >= 0.685 | >= 0.798 | >= 0.946
SIFT* | 58 365 | 65.8% | 68.1% | 64.0% | | <= -0.095 | <= -0.03 | | >= -0.001 | |
SIFT4G* | 58 370 | 62.0% | 59.9% | 63.6% | | <= -0.156 | <= -0.063 | | >= -0.002 | |
phastCons100way vertebrate | 75 805 | 25.7% | 0.0% | 54.8% | | <= 0.424 | <= 0.998 | | | |
phyloP | 63 334 | 61.8% | 54.2% | 67.1% | <= -1.04 | <= 1.08 | <= 3.58 | | >= 7.52 | >= 9.88 |
scSNV-ADA | 49 904 | 96.6% | 96.1% | 96.8% | | | | | >= 0.957813 | >= 0.999322 | >= 0.999925
scSNV-RF | 49 415 | 97.3% | 97.3% | 97.2% | | | | | >= 0.584 | >= 0.832 | >= 0.994

Thresholds calibrated on 13/Jun/2024.

* This engine uses high values for benign variants and low values for pathogenic. Multiply the raw score by -1 before using the thresholds in this table.

The accuracy, sensitivity and specificity measures displayed are calculated against the same ClinVar calibration data-set, using the benign-supporting and pathogenic-supporting thresholds we have computed. This may not reflect the thresholds or accuracy reported by the original authors of these in-silico predictors.
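As a hedged illustration of how such thresholds might be applied, including the sign inversion for starred engines, consider the Python sketch below. The REVEL values are copied from the table. For SIFT* only the outermost benign and pathogenic thresholds are used, and labelling them Supporting is an assumption. The dictionary layout and function name are hypothetical.

```python
# Thresholds are listed strongest first; "inverted" marks engines flagged with *.
THRESHOLDS = {
    "REVEL": {
        "inverted": False,
        "benign": [(0.133, "Strong"), (0.351, "Moderate"), (0.471, "Supporting")],
        "pathogenic": [(0.946, "Strong"), (0.798, "Moderate"), (0.685, "Supporting")],
    },
    "SIFT": {
        "inverted": True,  # raw score is multiplied by -1 before comparison
        "benign": [(-0.03, "Supporting")],       # assumed evidence level
        "pathogenic": [(-0.001, "Supporting")],  # assumed evidence level
    },
}

def classify(engine: str, raw_score: float):
    """Return (direction, strength); strength is None when the score is uncertain."""
    cfg = THRESHOLDS[engine]
    score = -raw_score if cfg["inverted"] else raw_score
    for limit, strength in cfg["benign"]:
        if score <= limit:
            return ("benign", strength)
    for limit, strength in cfg["pathogenic"]:
        if score >= limit:
            return ("pathogenic", strength)
    return ("uncertain", None)
```

With these values, a raw SIFT score of 0.5 becomes -0.5 after inversion and falls on the benign side, while a REVEL score of 0.8 meets the Moderate pathogenic threshold.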

Detailed Rule Implementations

PVS1

Null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multiexon deletion) in a gene where LOF is a known mechanism of disease. (Pathogenic, Very Strong)

The rule first establishes whether this is a null variant by checking its coding impact on the transcript:

  • nonsense variant
  • frameshift variant
  • exon deletion variant
  • intronic variant within ±2 bases of the transcript splice site
  • start loss variant.

We determine that LOF is a “Known Mechanism of Disease” from either:

  • The gene statistics: if at least 2 LOF variants in this gene have been reliably reported as pathogenic.
  • GnomAD gene constraints: if LOF Observed/Expected is less than 0.7555.

For nonsense or frameshift variants that cause NMD we use the default strength (Very Strong) if either of the following is true:

  • Either the exon or the truncated region has at least 2 pathogenic variants
  • The exon or the truncated region is known to affect functional domains reported by UniProt.

Otherwise we reduce the strength to Strong if the variant is frameshift or nonsense, causes NMD and either the exon or the truncated region has at least 1 variant.

For nonsense or frameshift variants that don't cause NMD we use the default strength (Very Strong) if one of the following is true:

  • Either the exon or the truncated region is known to affect functional domains reported by UniProt.
  • There is more than 1 LOF pathogenic variant in either the exon or the truncated region AND the variant removes at least 10% of the protein.

Otherwise we reduce the strength to Strong if either of the following is true:

  • There is at least 1 LOF pathogenic variant in either the exon or the truncated region AND the variant removes at least 10% of the protein.
  • There is more than 1 LOF pathogenic variant in either the exon or the truncated region AND the variant removes less than 10% of the protein.

We reduce the strength to Moderate if there is at least 1 LOF pathogenic variant in either the exon or the truncated region AND the variant removes less than 10% of the protein.
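The strength selection for nonsense or frameshift variants that do not cause NMD can be written as a small decision function. The Python sketch below only restates the conditions above; the function and parameter names are hypothetical, and returning None when no condition is met is an assumption, since the text does not cover that case.

```python
def pvs1_strength_without_nmd(affects_domain: bool,
                              lof_pathogenic_count: int,
                              fraction_removed: float):
    """Strength of PVS1 for a non-NMD nonsense/frameshift variant.

    lof_pathogenic_count: LOF pathogenic variants in the exon or truncated region.
    fraction_removed: share of the protein removed (0.10 means 10%).
    """
    large = fraction_removed >= 0.10
    if affects_domain or (lof_pathogenic_count > 1 and large):
        return "Very Strong"
    if lof_pathogenic_count > 1 or (lof_pathogenic_count >= 1 and large):
        return "Strong"
    if lof_pathogenic_count >= 1:
        return "Moderate"
    return None  # assumption: not specified in the text
```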

For all other variants we reduce the strength to Strong if the variant is located:

  • in the 3' UTR
  • in the last exon, and would remove less than 10 amino-acids from the protein.

Purely for information, a list of possible associated diseases is sourced from CGD and reported in the rule explanation.

Rule PVS1 disables rules PP3 and PM4 in order to avoid double-counting the same evidence.

Important: Rule PVS1 is disabled for oncogenes when in Somatic (AMP) annotation mode.

PS1

Same amino acid change as a previously established pathogenic variant regardless of nucleotide change. (Pathogenic, Strong)

This rule only applies to missense variants. It considers all possible equivalent amino acid missense variants (i.e. resulting in the same amino-acid). The rule will trigger if any pathogenic variants are identified in the clinically reported variants. We then check whether they are independently confirmed pathogenic using the ACMG rules, and if not will reduce the rule strength accordingly.

PS3

Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product. (Pathogenic, Strong)

BS3

Well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing. (Benign, Strong)

These two rules leverage the clinically reported variants, looking for papers that refer to in-vitro or functional studies. VarSome user contributions are particularly helpful as users are asked to manually confirm the studies referred to in the paper. For papers linked by ClinVar, UniProt, LOVD & MitoMap, we automatically scan the title & abstract and look for potential studies.

Ultimately the papers highlighted by this rule must be reviewed by an experienced clinician.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome UI.

PM1

Located in a mutational hot spot and/or critical and well-established functional domain (e.g., active site of an enzyme) without benign variation. (Pathogenic, Moderate)

This rule leverages the clinically reported variants to evaluate how many missense/in-frame pathogenic variants are found in the region of the variant being classified:

  • Hot-Spot: using a region of 25 base-pairs on either side of the variant, the rule checks that there are at least 4 pathogenic variants (only using missense and inframe-indel variants), then weighs them by distance to compute a “proximity score”. The rule triggers with strength supporting, moderate or strong depending on the proximity and density of pathogenic and benign variants located within the hot-spot.
  • Protein Domains: if the variant is within a functional domain reported by UniProt, the rule tallies all the clinically reported missense/in-frame variants within the domain. It checks that the domain contains at least 2 pathogenic variants, and then triggers with strength supporting or strong based on the number of pathogenic, uncertain & benign variants reported within the domain.

The thresholds used by rule PM1 have been established through a careful calibration process and may change over time as further clinical evidence becomes available, or we refine the methodology.

Note: benign variants with a frequency greater than 0.015 are excluded when counting the clinically reported variants within a given domain or hot-spot.

Rule PM1 is disabled for mitochondrial variants, in line with ClinGen Guidelines.

BP3

In-frame deletions/insertions in a repetitive region without a known function. (Benign, Supporting)

This rule is the benign counterpart to rule PM1:

  • it uses UniProt to ensure the variant isn't in a known functional domain
  • it checks whether the variant is in a repeat region,
  • it further checks whether the variant is in a region of low-conservation (maximum PhyloP100Way less than 3.58),
  • lastly it checks whether there are any known pathogenic variants in the region considered.

Rule BP3 will be disabled if rule PM1 triggered.

Rule BP3 is disabled for mitochondrial variants, in line with ClinGen Guidelines.

PM2

Absent from controls (or at extremely low frequency if recessive) in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium. (Pathogenic, Moderate)

We first establish the gene's mode of inheritance.

The rule will trigger if the variant is not found in GnomAD at a position with valid coverage, or:

  • For dominant genes (including X-Linked and AD/AR) we check that the allele count is less than 5.
  • For recessive genes (AR): the rule will trigger if the homozygous allele count is less than 2. Alternatively the rule also checks whether the allele frequency is less than 0.0001.
  • For mitochondrial variants, rule PM2 will trigger if the allele frequency is below 2e-05 per ClinGen Guidelines.
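The PM2 conditions above can be sketched as follows in Python. This is an illustration under stated assumptions: the function, its parameters and the inheritance labels ("AR", "MT", anything else treated as dominant) are hypothetical, and the coverage check is assumed to have been done by the caller.

```python
def pm2_triggers(inheritance: str, found_in_gnomad: bool,
                 allele_count: int = 0, hom_count: int = 0,
                 allele_freq: float = 0.0) -> bool:
    """Return True if PM2 would trigger under the thresholds quoted above."""
    if not found_in_gnomad:
        # Assumes coverage at this position has already been validated.
        return True
    if inheritance == "MT":
        return allele_freq < 2e-05
    if inheritance == "AR":
        return hom_count < 2 or allele_freq < 0.0001
    # Dominant genes, including X-Linked and AD/AR.
    return allele_count < 5
```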

In line with the SVI Recommendation for Absence/Rarity (PM2) - Version 1.0, rule PM2 always triggers with strength supporting. Rule PM2 may be phased out altogether in future.

PM4

Protein length changes as a result of in-frame deletions/insertions in a non-repeat region or stop-loss variants. (Pathogenic, Moderate)

This rule applies to in-frame indels or stop-loss variants that cause the length of the protein to change. The rule will not fire if the variant is in a repeat region, as reported by UniProt, or by checking for repetitive sequences in the reference genome.

In order to avoid double-counting the same evidence, rule PM4 will not be applied if rule PVS1 was triggered.

PM5

Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before. (Pathogenic, Moderate)

This rule is a weaker version of PS1. It similarly only applies to missense variants, but considers all possible amino acid missense variants in the same codon. The rule will trigger if any pathogenic variants are identified in the clinically reported variants. We then check whether they are independently confirmed pathogenic using the ACMG rules, and if not will reduce the rule strength accordingly. The rule also applies with supporting strength to non-coding mitochondrial variants if there exists a known pathogenic variant in the same position.

PP2

Missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease. (Pathogenic, Supporting)

In order to avoid double-counting the same evidence, rule PP2 will not be applied if rule PM1 was triggered.

BP1

Missense variant in a gene for which primarily truncating variants are known to cause disease. (Benign, Supporting)

PP2 and BP1

These two “variant spectrum” rules are very similar: they only apply to missense variants and leverage the gene statistics for the relevant gene:

  • PP2 checks that the ratio of pathogenic missense variants over all non-VUS missense variants is greater than 0.808
  • BP1 conversely checks that the ratio of benign missense variants over all non-VUS missense variants is greater than 0.569.
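The two ratio tests can be illustrated with a short Python sketch. It assumes that "non-VUS missense variants" means the pathogenic plus benign missense counts for the gene; the function name and return format are hypothetical.

```python
def variant_spectrum_rules(pathogenic_missense: int, benign_missense: int):
    """Return the list of variant-spectrum rules (PP2, BP1) met by a gene's counts."""
    total = pathogenic_missense + benign_missense  # assumed non-VUS total
    if total == 0:
        return []
    rules = []
    if pathogenic_missense / total > 0.808:
        rules.append("PP2")
    if benign_missense / total > 0.569:
        rules.append("BP1")
    return rules
```

Because the two thresholds sum to more than 1, both rules can never be met by the same gene.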

The calibration section explains how these thresholds are established. It is notable that rule PP2 only meets a supporting level of evidence in genes with very little benign missense variation.

These two rules are disabled for mitochondrial variants, in line with the corresponding ClinGen Guidelines.

PP3

Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.) (Pathogenic, Supporting)

BP4

Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc.) (Benign, Supporting)

PP3 and BP4

The two rules PP3 and BP4 use a common implementation, based on the updated methodology presented in the in-silico predictions section. As a result the rules may trigger with strengths of Supporting through Strong depending on the confidence level of the estimate.

The in-silico prediction data-sets used are static, many are sourced from dbNSFP which includes predictions for all non-synonymous coding single-nucleotide variants, but some (DANN & CADD for example) are available for a much wider range of non-coding SNVs, and are sourced directly from the provider.

Primary Prediction

The published guidance recommends only using a single in-silico predictor for missense variants. Based on our own calibration and accuracy tests, we have picked the following in-silico prediction tools (although many more are available in the VarSome UI). Only one of the following will be used, depending on whether the score is available for the variant considered:

Splice Prediction

We use scSNV ADA Boost and MaxEntScan for splice-site prediction. Rule PP3 will trigger if a variant is predicted splicing (score >= 0.958) and loss of function is a known cause of disease, overriding any benign primary prediction. The strength may be further increased if the position is conserved or the primary prediction was also pathogenic.

Conservation

If no other prediction score is available, we use phyloP as a simple fall-back, returning a pathogenic prediction if phyloP is greater than 7.52, or benign if the variant is non-truncating and phyloP is less than 3.58.

Alternatively, conservation may be used to adjust the strength of the primary prediction.

Adjustments for PP3

Rule PP3 is disabled if either rule PVS1 or rule PM4 is triggered, in order to avoid double-counting similar evidence.

In line with guidance from our clinical advisors, the strength of a pathogenic prediction is limited to Moderate if either rule PM1 or rule PM5 was triggered with strength Moderate.

The strength of rule PP3 is reduced to Supporting if there is moderate benign clinical evidence or if the variant is stopLoss.

Adjustments for BP4

Rule BP4 is disabled if rule PVS1 triggered.

The strength of rule BP4 is reduced to Supporting if there is moderate pathogenic clinical evidence or if the variant is stopLoss.

As a consequence of the calibration methodology, rule BP4 will trigger with strength Moderate for missense variants in genes which have no reported pathogenic missense variants, or for missense variants with a gnomAD frequency greater than 0.01. Both of these cases are excluded from the threshold calibration process.

Rule BP4 may trigger in conjunction with rule BP7, which allows many non-truncating synonymous variants to be classified Likely Benign. Rule BP4 explicitly checks for conservation itself rather than relying solely on the primary prediction.

PP5

Reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation. (Pathogenic, Supporting)

BP6

Reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation. (Benign, Supporting)

PP5 and BP6

Similarly to rules PS3 and BS3, these two rules leverage the clinically reported variants to report whether the variant has been clinically reported (see clinical evidence), but without any reference to in-vitro or functional studies.

The default strength for these rules is Supporting, per the ACMG Guidelines, however our implementation will use stronger rule strengths if borne out by the available evidence (see rule strengths). Whilst this may be considered not strictly in-line with the guidelines, it does allow us to ensure that critical clinical evidence is not missed. Users remain free to manually change the strength used when reviewing the classification.

In practice, we may boost the strength of the rule all the way up to 'Very Strong' if the evidence justifies it. We have deployed a system where each item of evidence is assigned a strength, and we then calculate a cumulative strength across all the evidence, which becomes the final strength of the rule. The points assigned to each strength are:

  • Very Strong: 8 points
  • Strong: 4 points
  • Moderate: 2 points
  • Supporting: 1 point

The above points are also the thresholds for the final strength assigned to the rule. For example, two Supporting and one Moderate item of evidence total 4 points, so the final strength is Strong. The strength assigned to each item of evidence depends on its source:

  • ClinVar
    • We use Very Strong if 'practice guideline' (4 stars), or 'reviewed by expert panel' (3 stars), or multiple reputable submitters agree on a classification (2 stars)
    • Strong if high number of consistent submissions from multiple sources including reputable source(s) (2 stars), or submission from a highly reputable source (1 star)
    • Moderate if low number of consistent submissions from multiple sources including reputable source(s) (2 stars), or single submission from reputable source, or conflicting but multiple reputable sources agree (1 star)
    • Supporting if low number of consistent submissions from multiple sources (2 stars), or single submitter or conflicting submissions but one being a reputable source (1 star)
  • VarSome curator team & VarSome user linked publications
    • Very Strong if multiple members of the curation team classify as P/B
    • Strong if a VarSome curator classifies as P/B
    • Moderate if the VarSome curator team classifies as LP/LB, or if a high number of VarSome users linked publications
    • Supporting if multiple VarSome users linked publications
  • LOVD: we use Strong if the entry is from a curator, Moderate if the entry is a full submission, and Supporting in all other cases.
  • MitoMap: we use Strong if the status is Confirmed and Supporting if the status is Reported but not Confirmed.
  • UniProt: we use Supporting in all cases.
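
The point scheme described above can be sketched as follows. This is a minimal illustration rather than VarSome's actual code: the function name `combined_strength` is hypothetical, and the sketch assumes that a total falling between two thresholds maps to the lower strength and that totals above 8 points stay at Very Strong.

```python
# Point values per evidence strength, as listed in the text.
POINTS = {"Supporting": 1, "Moderate": 2, "Strong": 4, "Very Strong": 8}


def combined_strength(evidence_strengths):
    """Sum the points of all evidence items and map the total to a strength.

    Assumption: the final strength is the highest one whose point value
    is less than or equal to the total (i.e. totals round down).
    Returns None when there is no evidence at all.
    """
    total = sum(POINTS[s] for s in evidence_strengths)
    final = None
    # Walk the thresholds in ascending order, keeping the last one reached.
    for name, threshold in sorted(POINTS.items(), key=lambda kv: kv[1]):
        if total >= threshold:
            final = name
    return final


# Example from the text: two Supporting (1 + 1) and one Moderate (2) = 4 points.
print(combined_strength(["Supporting", "Supporting", "Moderate"]))  # Strong
```

How each source's evidence is graded (ClinVar stars, curator status and so on) is left to the caller here; only the accumulation step is shown.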

Note: rules PS3 or BS3 may trigger too, but the same evidence will not be counted twice.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome UI.

BA1

Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium. (Benign, Stand Alone)

Rule BA1 is applied if the allele frequency is greater than the threshold 0.05. This is in strict concordance with the ACMG Guidelines and determines a variant to be stand-alone benign for Mendelian disease.

A lower frequency of 0.01 is used for mitochondrial variants.

The BA1 Exceptions have also been implemented, as recommended by ClinGen.

Note that rules BS1 and BS2 may trigger at much lower frequency thresholds.
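
As a rough illustration of the BA1 check described above, the sketch below compares an allele frequency against the two thresholds given in the text. The function and parameter names are hypothetical, the comparison for mitochondrial variants is assumed to be strictly greater-than like the nuclear one, and the BA1 exception list is reduced to a flag supplied by the caller.

```python
BA1_THRESHOLD = 0.05        # nuclear variants, per the ACMG Guidelines
BA1_MITO_THRESHOLD = 0.01   # lower threshold used for mitochondrial variants


def ba1_triggers(allele_frequency, is_mitochondrial=False, is_ba1_exception=False):
    """Return True if the allele frequency exceeds the applicable BA1 threshold.

    Assumption: variants on the ClinGen BA1 exception list never trigger BA1;
    the lookup itself is outside the scope of this sketch.
    """
    if is_ba1_exception:
        return False
    threshold = BA1_MITO_THRESHOLD if is_mitochondrial else BA1_THRESHOLD
    return allele_frequency > threshold
```

For example, a frequency of 0.02 would not trigger BA1 for a nuclear variant but would for a mitochondrial one under these assumptions.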

BS1

Allele frequency is greater than expected for disorder. (Benign, Strong)

Here we find the highest gnomAD allele frequency for the variant across the main population ethnicities and compare this to the benign cut-off frequency derived from the gene statistics. If there are too few known variants (fewer than 4), we use a much higher default threshold, 0.015, for rare diseases.

For mitochondrial variants, a single frequency of 0.005 is used per ClinGen guidelines.

In order to avoid double-counting, rule BS1 is not evaluated if either rule BA1 or rule PM2 was triggered first.

Rule BS1 will not trigger if there is strong pathogenic clinical evidence.
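
The BS1 logic described above could be sketched along these lines. All names are hypothetical, the gene-specific benign cut-off is taken as an input rather than derived, and the comparison is assumed to be strictly greater-than.

```python
DEFAULT_BS1_THRESHOLD = 0.015  # used when too few known variants exist
MITO_BS1_THRESHOLD = 0.005     # single frequency for mitochondrial variants
MIN_KNOWN_VARIANTS = 4


def bs1_triggers(population_frequencies, gene_cutoff, known_variant_count, *,
                 is_mitochondrial=False, ba1_triggered=False,
                 pm2_triggered=False, strong_pathogenic_evidence=False):
    """Sketch of BS1: highest population frequency versus a benign cut-off.

    population_frequencies: dict mapping population name -> allele frequency.
    gene_cutoff: benign cut-off derived from gene statistics (assumed given).
    """
    # BS1 is skipped to avoid double-counting, or if strong pathogenic
    # clinical evidence exists.
    if ba1_triggered or pm2_triggered or strong_pathogenic_evidence:
        return False
    if not population_frequencies:
        return False
    highest = max(population_frequencies.values())
    if is_mitochondrial:
        threshold = MITO_BS1_THRESHOLD
    elif known_variant_count < MIN_KNOWN_VARIANTS:
        threshold = DEFAULT_BS1_THRESHOLD
    else:
        threshold = gene_cutoff
    return highest > threshold
```

Passing the rule outcomes for BA1 and PM2 as flags keeps the sketch independent of how those rules are evaluated.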

BS2

Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age. (Benign, Strong)

We first determine the mode of inheritance of the gene, then compare the allele count (see allele frequency for quality checks) to the corresponding threshold:

  • recessive or X-linked genes: allele count greater than 2,
  • dominant genes: allele count greater than 5.

Rule BS2 is not evaluated if rule BA1 was triggered, to avoid double-counting the same evidence, and for performance we disable BS2 if rule PM2 triggered.
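
A minimal sketch of the BS2 comparison above, under stated assumptions: the function name and the inheritance labels are hypothetical, the caller supplies an allele count that has already passed the quality checks, and genes with an unknown mode of inheritance are assumed not to trigger the rule.

```python
# Allele-count thresholds per mode of inheritance, as listed in the text.
ALLELE_COUNT_THRESHOLDS = {"recessive": 2, "x-linked": 2, "dominant": 5}


def bs2_triggers(mode_of_inheritance, allele_count, *,
                 ba1_triggered=False, pm2_triggered=False):
    """Return True if the allele count exceeds the threshold for the gene."""
    # BS2 is skipped when BA1 triggered (double-counting) or PM2 triggered.
    if ba1_triggered or pm2_triggered:
        return False
    threshold = ALLELE_COUNT_THRESHOLDS.get(mode_of_inheritance.lower())
    if threshold is None:
        # Assumption: no known mode of inheritance means no BS2.
        return False
    return allele_count > threshold
```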

BP7

A synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence nor the creation of a new splice site AND the nucleotide is not highly conserved. (Benign, Supporting)

This rule applies to synonymous variants that are not deemed highly conserved using phyloP (see conservation).

For non-mitochondrial variants, splicing is checked as follows:

  • the variant is found more than 2 bases away from the next splice site,
  • it is not predicted to affect splicing by splice-site prediction.

Rule BP7 will be disabled if there is strong clinical evidence to the contrary (i.e. possibly a cryptic splice-site).
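
For non-mitochondrial variants, the BP7 conditions above might be combined as in the sketch below. The function and parameter names are hypothetical, and the conservation and splice-prediction results are assumed to have been computed elsewhere and passed in as booleans.

```python
MIN_SPLICE_DISTANCE = 2  # variant must be more than 2 bases from a splice site


def bp7_triggers_nuclear(is_synonymous, highly_conserved,
                         distance_to_splice_site, predicted_to_affect_splicing,
                         strong_contrary_evidence=False):
    """Sketch of BP7 for non-mitochondrial variants only."""
    # Only synonymous variants that are not highly conserved qualify.
    if not is_synonymous or highly_conserved:
        return False
    # Strong clinical evidence to the contrary disables the rule.
    if strong_contrary_evidence:
        return False
    # Splicing checks: far enough from the splice site, no predicted impact.
    if distance_to_splice_site <= MIN_SPLICE_DISTANCE:
        return False
    return not predicted_to_affect_splicing
```

Mitochondrial variants are deliberately left out of this sketch, since the splicing checks listed above apply to non-mitochondrial variants only.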

PS2

De novo (both maternity and paternity confirmed) in a patient with the disease and no family history. (Pathogenic, Strong)

PM6

Assumed de novo, but without confirmation of paternity and maternity. (Pathogenic, Moderate)

PP1

Cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease. (Pathogenic, Supporting)

BS4

Lack of segregation in affected members of a family. (Benign, Strong)

Unimplemented Rules

The following rules are not implemented or not currently available to VarSome users. In most cases this is because the data required to evaluate the rules is not in the public domain, or the rules require patient-specific information, sometimes on a per-variant basis. Should they have more evidence, users can manually toggle rules on or off in VarSome, or adjust the strength used, and the resulting classification will be re-evaluated immediately.

PS4

The prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls. (Pathogenic, Strong)

This rule has not been implemented.

PM3

For recessive disorders, detected in trans with a pathogenic variant (Pathogenic, Moderate)

This rule has not been implemented.

PP4

Patient’s phenotype or family history is highly specific for a disease with a single genetic etiology. (Pathogenic, Supporting)

This rule has not been implemented.

BP2

Observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder or observed in cis with a pathogenic variant in any inheritance pattern. (Benign, Supporting)

This rule has not been implemented.

BP5

Variant found in a case with an alternate molecular basis for disease. (Benign, Supporting)

This rule has not been implemented.

Implemented ClinGen SVI General Recommendations for Using ACMG/AMP guidelines

Implemented ClinGen Mitochondrial Variant Curation Expert Panel Specifications for using ACMG/AMP guidelines

We have implemented a number of the recommendations in Specifications of the ACMG/AMP standards and guidelines for mitochondrial DNA variant interpretation:

  • Specific frequency thresholds are used for rules BA1, BS1, and PM2.
  • Rule BP7 is applied to all synonymous mitochondrial variants.
  • Rules PP2 and BP1 are disabled.
  • Rule PVS1 does not attempt to evaluate NMD.
  • Two specific in-silico scores are used: MitImpact for missense variants, and MitoTip for non-coding variants.
  • Rule PM5 applies with supporting strength to non-coding mitochondrial variants if a known pathogenic variant exists in the same position.

Databases

The VarSome automated classification processes rely on vast quantities of accurate curated data from the following databases (in no particular order).

Important: depending on licensing agreements and in some cases the fees charged by source organisations, not all databases are visible to all users, and this may directly impact the completeness or quality of automated classifications.

Databases used by the germline variant classifier

  1. UniProt Variants, provided by UNIPROT, version 07-Feb-2025 (72.5k records)
  2. UniProt Regions, provided by UNIPROT, version 07-Feb-2025 (283k records)
  3. RefSeq, provided by NCBI, version 228
  4. phyloP100way, provided by CSH, version 13-Apr-2021 (3.14G records)
  5. PanelApp, provided by Genomics England, version 17-Feb-2025
  6. MitoTip, provided by CHOP, version 13-Dec-2022 (11.1k records)
  7. Mitomap, provided by CHOP, version 08-Dec-2023 (39.0k records)
  8. MitImpact, provided by IRCCS, version 13-Dec-2022 (48.2k records)
  9. MaxEntScan, provided by Burge Lab, version 5-Apr-2023
  10. LOVD, provided by LUMC, version 19-Feb-2025
  11. HPO, version 07-Feb-2025 (19.0k records)
  12. gnomAD Mitochondrial, provided by Broad, version 3.1 (18.2k records)
  13. gnomAD genomes coverage, provided by Broad, using version 2.1 (3.14G records) for hg19, and using version 3.0 (3.21G records) for hg38
  14. gnomAD genomes, provided by Broad, using version 2.1.1 (262M records) for hg19, and using version 4.1 (759M records) for hg38
  15. gnomAD gene constraints, provided by Broad, version 4.1 (18.6k records)
  16. gnomAD exomes coverage, provided by Broad, using version 2.1 (59.6M records) for hg19, and using version 4.0 (169M records) for hg38
  17. ClinVar, provided by NCBI, version 07-Feb-2025 (3.22M records)
  18. ClinGen Disease Validity, provided by NIH, version 07-Feb-2025 (2.50k records)
  19. CADD, provided by UW, version 1.7
  20. CGD, provided by NHGRI, version 03-Jul-2024 (4.74k records)
  21. DANN SNVs, provided by UCI, using version 2014 (9.41G records) for hg19, unavailable for hg38
  22. dbNSFP-c, provided by dbNSFP, version 4.9 (82.8M records)
  23. dbNSFP genes, provided by dbNSFP, version 4.9 (21.5k records)
  24. dbscSNV, provided by dbNSFP, version v1.1 (15.0M records)
  25. Domino, provided by UNIL, version 04-Sep-2019 (17.9k records)
  26. Ensembl, provided by EMBL, version 113
  27. GenCC, version 07-Feb-2025 (5.17k records)
  28. gene2phenotype, provided by EBI, version 04-Oct-2024 (2.89k records)
  29. gnomAD exomes, provided by Broad, using version 2.1.1 (17.2M records) for hg19, and using version 4.1 (184M records) for hg38
  30. Papers & classifications contributed by the VarSome community.

Other Databases

VarSome also annotates variants using the following databases, although these are not currently leveraged by the automated classifications:

  1. VCF attributes, provided by generic, version generic VCF file
  2. TP53 Somatic, provided by IARC, version release 20 (2.45k records)
  3. TP53 Germline, provided by IARC, version release 20 (436 records)
  4. DGV, provided by TCAG, version 30-Jun-2021 (792k records)
  5. Semantic Scholar, provided by Allen Institute
  6. Cancer Gene Census, provided by Sanger, version v101
  7. PubMed, provided by NCBI
  8. The Human Protein Atlas, provided by KAW, version 14-Mar-2024 (20.1k records)
  9. PMKB, provided by Weill Cornell Medicine, version 08-Nov-2024 (161 records)
  10. phastCons100way, provided by CSH, version 14-Apr-2021 (3.14G records)
  11. PharmGKB, version 07-Feb-2025
  12. OncoTree, provided by MSK, version 15-Jan-2024
  13. Mondo, provided by Monarch, version 07-Feb-2025
  14. Mastermind, provided by Genomenon, version 230612 (22.6M records)
  15. kaviar3, provided by ISB, version 4-Feb-2016 (83.3M records)
  16. ICGC somatic, provided by ICGC
  17. HGNC, provided by HUGO, version 13-Feb-2025
  18. GWAS Catalog, provided by EBI, version 07-Feb-2025 (789k records)
  19. GTEx, provided by NIH, version v8 (313k records)
  20. CPIC Genes-Drugs, provided by CPIC, version 07-Feb-2025
  21. Cosmic Licensed, provided by Sanger, version v101
  22. ClinVar CNVs, provided by NCBI, version 07-Feb-2025 (61.6k records)
  23. ClinGen Variants, provided by NIH, version 07-Feb-2025 (9.72k records)
  24. ClinGen Regions, provided by NIH, version 07-Feb-2025 (516 records)
  25. ClinGen CNVs, provided by NIH, version 07-Feb-2025 (156 records)
  26. ClinGen, provided by NIH, version 07-Feb-2025 (1.56k records)
  27. CKB, provided by JAX, version 23-Feb-2025
  28. CIViC, provided by WUSTL, version 08-Dec-2023 (849 records)
  29. AACT, provided by CTTI, version 07-Feb-2025
  30. AlphaMissense, provided by HL, version 03-Jul-2024 (69.1M records)
  31. Analysis-specific variant data, provided by generic, version any
  32. BAM Coverage, provided by generic
  33. Bravo, provided by UMICH, using version Freeze5 (25.5M records) for hg19, and using version Freeze8 (75.5M records) for hg38
  34. CancerHotspots, provided by MSK, version 10-Sep-2021 (2.25M records)
  35. cBioPortal, provided by MSK, version 06-Jun-2023 (19.5M records)
  36. DailyMed, provided by NIH, version 03-Sep-2021
  37. dbNSFP-p, provided by dbNSFP
  38. dbSNP, provided by NCBI, version build 156 (1.27G records)
  39. dbVar, provided by NCBI, version 03-Jul-2024 (3.06M records)
  40. DVD, provided by UOI, using version v9 (2.49M records) for hg19, unavailable for hg38
  41. DECIPHER, provided by Sanger, version 07-Feb-2025 (31.0k records)
  42. DoCM, provided by WUSTL, version 07-Jun-2022 (1.24k records)
  43. DGI, provided by WUSTL, version 04-Jun-2024
  44. EMA Approved Drugs, provided by EMA, version 03-Sep-2021
  45. EVE, provided by OATML, unavailable for hg19, and using version 07-Jun-2022 (4.73M records) for hg38
  46. ExacCNV, provided by Broad, using version 01-Jul-2021 (49.3k records) for hg19, and using version 20180227 (48.6k records) for hg38
  47. ExAC genes, provided by Broad, version 18-Sep-2018 (18.3k records)
  48. FDA Approved Drugs, provided by FDA, version 03-Sep-2021
  49. Pharmacogenomic Biomarkers, provided by FDA, version 19-Sep-2022
  50. FusionGDB, provided by UTexas, version 19-Nov-2021 (15.6k records)
  51. GDC, provided by NIH, version 08-Dec-2023 (2.17M records)
  52. GERP, using version 2010 (2.60G records) for hg19, unavailable for hg38
  53. GHR Genes, provided by NLM, version 05-Dec-2024 (1.50k records)
  54. gnomAD structural variants, provided by Broad, version 30-Jun-2021 (334k records)
(Version information is subject to change at any time; some databases may require a license and may not be displayed.)

dbNSFP Sources (non-synonymous coding SNVs)

Additional sources annotated using the dbNSFP database:

Functional predictions:

  • ALoFT
  • BayesDel
  • DEOGEN2
  • Eigen
  • Eigen-PC
  • FATHMM
  • FATHMM-XF
  • FATHMM-MKL
  • fitCons
  • LIST-S2
  • LRT
  • M-CAP
  • MetaLR
  • MetaRNN
  • MetaSVM
  • MPC
  • MutationAssessor
  • MutationTaster
  • MutPred
  • MVP
  • Polyphen-2
  • PrimateAI
  • PROVEAN
  • REVEL
  • SIFT
  • SIFT4G

Conservation scores:

  • bStatistic
  • phastCons100way Vertebrate
  • phastCons30way Mammalian
  • phastCons17way Primate
  • phyloP100way Vertebrate
  • phyloP30way Mammalian
  • phyloP17way Primate
  • SiPhy

Gene annotation sources:

  • BioCarta
  • Consensus
  • egenetics
  • Essential Genes
  • GDI
  • Gene Ontology
  • GHIS
  • GNF/Atlas
  • HIPred
  • KEGG
  • LoFTool
  • Mouse genes
  • P(HI) Score
  • P(rec) Score
  • RVIS
  • UniProt Genes
  • Zebrafish genes