HADA: automated annotation of next-generation sequencing data in hereditary angioedema studies

Name: *

Email address: *

Institution/Company: *

Country: *

Allow HADA to access your location?

Case description:

Input method:

Paste / Upload a VCF

Input variations:

Upload a VCF file:


Annotation databases

Notes:

The application will not store or share any uploaded or analysed data on the server.

HADA accepts input for annotation in any of the following forms:

A VCF file in v4.2 format
A list of variants written as in this example: chr1-31413-A-T
A list of Reference SNP cluster IDs (rs-numbers)

Multisample VCF files cannot be uploaded or analysed.
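A quick way to check whether a file is multisample before uploading is to count the sample columns in the VCF header line. The sketch below assumes the standard VCF v4.2 layout (eight fixed columns, an optional FORMAT column, then one column per sample); the function names are illustrative and are not part of HADA.

```python
def count_vcf_samples(vcf_text: str) -> int:
    """Return the number of sample columns declared in the #CHROM header line."""
    fixed_columns = 8  # CHROM POS ID REF ALT QUAL FILTER INFO
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            columns = line.split("\t")
            # The ninth column (FORMAT) is present only when genotypes follow,
            # so sample columns are whatever comes after it.
            return max(0, len(columns) - fixed_columns - 1)
    raise ValueError("no #CHROM header line found")


def is_multisample(vcf_text: str) -> bool:
    """True when the VCF declares more than one sample."""
    return count_vcf_samples(vcf_text) > 1


# Hypothetical headers used only for illustration.
single = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
)
multi = single.rstrip("\n") + "\tS2\n"
```

Under these assumptions, `is_multisample(single)` is false and `is_multisample(multi)` is true, so only the first file would be suitable for upload.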

Download data

Find predictions for any unrelated HAE variant in associated genes: Download related data

User guide

HADA is designed to annotate VCF files and plot pathogenicity results for the variation reported in Hereditary Angioedema (hereinafter HAE). It is built with the Shiny R package.

Input files

HADA accepts input for annotation in any of the following forms:

A VCF file in v4.2 format (https://samtools.github.io/hts-specs/VCFv4.2.pdf).
A list of variants in genomic coordinates, with the fields separated by hyphens as in this pattern: CHR-POSITION_START-REFERENCE_ALLELE-ALTERNATIVE_ALLELE
A Reference SNP cluster ID, also known as an rs-number. Several variants can be provided as a list of rs-numbers.

Examples of each input format are available on the input page by clicking “example_variants”, which also lets users preview HADA.

Note: When a VCF is selected as input, only variants related to Hereditary Angioedema are reported to users.
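The two text-based input formats can be told apart with simple pattern matching. The following is a minimal sketch, assuming hyphen-separated fields, an optional "chr" prefix and alleles made only of A, C, G and T; the names `Variant` and `parse_input_line` are hypothetical, and HADA's own parser may accept other spellings.

```python
import re
from typing import NamedTuple, Union


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# CHR-POSITION_START-REFERENCE_ALLELE-ALTERNATIVE_ALLELE, e.g. chr1-31413-A-T
_COORD = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT?)-([0-9]+)-([ACGT]+)-([ACGT]+)$",
    re.IGNORECASE,
)
# Reference SNP cluster ID: "rs" followed by digits
_RSID = re.compile(r"^rs[0-9]+$", re.IGNORECASE)


def parse_input_line(line: str) -> Union[Variant, str]:
    """Return a Variant for coordinate input, or the rs-number as a string."""
    text = line.strip()
    match = _COORD.match(text)
    if match:
        chrom, pos, ref, alt = match.groups()
        return Variant(chrom.upper(), int(pos), ref.upper(), alt.upper())
    if _RSID.match(text):
        return text.lower()
    raise ValueError(f"unrecognised variant format: {line!r}")
```

For instance, `parse_input_line("chr1-31413-A-T")` yields a `Variant` with chromosome "1" and position 31413, while a string such as "rs12345" (a made-up identifier) is returned unchanged as an rs-number.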

Variant annotation

HADA integrates information from pathogenicity predictors such as SIFT, POLYPHEN2, LRT, METASVM, etc. for the variation reported in Hereditary Angioedema. Causal variation identified through literature review and database exploration is registered in the HADA database, giving users a fast way to detect and analyse genetic variation related to HAE.

Variant characteristics:

CHR: Chromosome.
POS (hg19): Genomic coordinates of the variant according to GRCh37/hg19 build.
REF: Reference allele in GRCh37/hg19.
ALT: Alternative allele observed in the sample.
GENE: HUGO name of the gene that contains the variant.
EXON: Exon where the variant is located.
HGVS CODING: Description of the change according to HGVS nomenclature, the international standard for reporting and exchanging information about variants found in DNA, RNA and protein sequences.
AMINOACID CHANGE: Protein-level change in the notation defined by the Human Genome Variation Society (HGVS), based on the position of the variant in the protein.
FUNCTION: Gene function in human genome.
ACMG CLASSIFICATION: Pathogenicity classification predicted on-the-fly according to ACMG guidelines.
CLINVAR CLINICAL SIGNIFICANCE: Pathogenicity classification obtained from ClinVar.
INTERVAR CLASSIFICATION: Pathogenicity classification obtained from InterVar tool.
EFFECT: Coding effect of the indicated variant.
VARSOME LINK: Link to the VarSome tool, which provides a graphical representation of the variant location.
REFERENCE SNP CLUSTER ID: rs ID of each variant (when available), obtained from the avsnp150 database and from scientific literature review.
NOTES: HAEdb ID, pathogenicity classification and notes (HAE type, mutation effect in the protein, associated diseases, etc.).
CITE: Citations of the publications in which the mutation was reported, in chronological order.
DATABASE: Indication about where the mutation was detected and analyzed.
FUNC REF GENE: Indication about the nature of the variant.

Frequency of alternative allele annotation

ExAC: A database of allele frequencies calculated from 60,706 individuals sequenced in different genetic studies. This dataset serves as a reference set of allele frequencies for disease studies. ExAC provides frequencies for several human populations.

ExAC_ALL: Allele frequency in total ExAC samples.
ExAC_AFR: Allele frequency in African/African American ExAC samples.
ExAC_AMR: Allele frequency in Latino ExAC samples.
ExAC_EAS: Allele frequency in East Asian ExAC samples.
ExAC_FIN: Allele frequency in Finnish ExAC samples.
ExAC_NFE: Allele frequency in Non-Finnish European ExAC samples.
ExAC_SAS: Allele frequency in South Asian ExAC samples.
ExAC_OTH: Allele frequency in Other ExAC samples.

gnomAD: The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects and making summary data available for the wider scientific community. The data set provided on this website spans 125,748 exome sequences and 15,708 whole-genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies.

gnomAD_exome_ALL: Allele frequency in total gnomAD samples.
gnomAD_exome_AFR: Allele frequency in African/African American gnomAD samples.
gnomAD_exome_AMR: Allele frequency in Admixed American gnomAD samples.
gnomAD_exome_ASJ: Allele frequency in Ashkenazi Jewish gnomAD samples.
gnomAD_exome_EAS: Allele frequency in East Asian gnomAD samples.
gnomAD_exome_FIN: Allele frequency in Finnish gnomAD samples.
gnomAD_exome_NFE: Allele frequency in Non-Finnish European gnomAD samples.
gnomAD_exome_SAS: Allele frequency in South Asian gnomAD samples.
gnomAD_exome_OTH: Allele frequency in Other gnomAD samples.

Pathogenicity scores

SIFT: SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. It was built to predict the deleteriousness of missense variants.

SIFT_SCORE: Scores range from 0 to 1. The smaller the score the more likely the SNP has damaging effect.
SIFT_CONVERTED_RANKSCORE: SIFT scores were first converted to SIFTnew=1-SIFT, then ranked among all SIFTnew scores in dbNSFP. The rankscore is the ratio of the rank of the SIFTnew score over the total number of SIFTnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The rankscores range from 0.02654 to 0.87932.
SIFT_PRED: If the SIFT score is smaller than 0.05 (rankscore>0.55), the corresponding nonsynonymous variant is predicted as D(amaging); otherwise it is predicted as T(olerated).
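The SIFT conversion and prediction rule described above can be written out in a few lines. This is a sketch of the stated rule only (SIFTnew = 1 - SIFT, D when the score is below 0.05); the function names are illustrative, and the code does not reproduce how dbNSFP selects among multiple scores.

```python
def sift_new(sift_score: float) -> float:
    """Convert a SIFT score so that larger values mean more damaging."""
    if not 0.0 <= sift_score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    return 1.0 - sift_score


def sift_pred(sift_score: float) -> str:
    """Return 'D' (damaging) below the 0.05 cutoff, otherwise 'T' (tolerated)."""
    if not 0.0 <= sift_score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    return "D" if sift_score < 0.05 else "T"
```

A score of 0.01 is therefore reported as D, while a score of exactly 0.05 falls on the tolerated side of the cutoff.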

Polyphen2: PolyPhen-2 (Polymorphism Phenotyping v2) is a tool for possible impact prediction of amino acid substitutions on the structure and function of human protein using straightforward physical and comparative considerations.

POLYPHEN2_HDIV_SCORE: Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1.
Polyphen2_HDIV_rankscore: Polyphen2 HDIV scores were first ranked among all HDIV scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.02656 to 0.89917.
Polyphen2_HDIV_pred: Polyphen2 prediction based on HumDiv, D (probably damaging, HDIV score in [0.957,1] or rankscore in [0.52996,0.89917]), P (possibly damaging, HDIV score in [0.453,0.956] or rankscore in [0.34412,0.52842]) and B (benign, HDIV score in [0,0.452] or rankscore in [0.02656,0.34399]). Score cutoff for binary classification is 0.5 for HDIV score or 0.35411 for rankscore, i.e. the prediction is neutral if the HDIV score is smaller than 0.5 (rankscore is smaller than 0.35411), and deleterious if the HDIV score is larger than 0.5 (rankscore is larger than 0.35411).
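The three-way HDIV classification and the binary cutoff can be expressed directly from the score intervals listed above. The sketch below assumes scores reported to three decimals, so the intervals [0.957,1], [0.453,0.956] and [0,0.452] reduce to two thresholds; the function names are illustrative, and the treatment of a score of exactly 0.5 in the binary rule is an assumption, since the text does not specify it.

```python
def polyphen2_hdiv_pred(hdiv_score: float) -> str:
    """Return D (probably damaging), P (possibly damaging) or B (benign)."""
    if not 0.0 <= hdiv_score <= 1.0:
        raise ValueError("HDIV scores range from 0 to 1")
    if hdiv_score >= 0.957:
        return "D"
    if hdiv_score >= 0.453:
        return "P"
    return "B"


def polyphen2_hdiv_binary(hdiv_score: float) -> str:
    """Binary call with the 0.5 cutoff; exactly 0.5 is treated as neutral here."""
    return "deleterious" if hdiv_score > 0.5 else "neutral"
```

Note that the two rules can disagree near the middle of the range: a score of 0.47 is P (possibly damaging) in the three-way call but neutral in the binary one.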

MutationTaster2: Evaluates the pathogenic potential of DNA sequence alterations. It is designed to predict the functional consequences of not only amino acid substitutions but also intronic and synonymous alterations, short insertion and/or deletion (indel) mutations and variants spanning intron-exon borders.

MutationTaster_score: MutationTaster p-value score (MTori), ranges from 0 to 1.
MutationTaster_converted_rankscore: The scores were first converted: if the prediction is A or D MTnew=MTori; if the prediction is N or P, MTnew=1- MTori. Then MTnew scores were ranked among all MTnew scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MTnew scores in dbNSFP. The scores range from 0.09067 to 0.80722.
MutationTaster_pred: MutationTaster prediction, A (disease_causing_automatic), D (disease_causing), N (polymorphism) or P (polymorphism_automatic). The score cutoff between D and N is 0.5 for MTori and 0.31655 for the rankscore.
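The MutationTaster score conversion depends on the prediction code, which is easy to get backwards. A minimal sketch of the stated rule (MTnew = MTori for A or D, MTnew = 1 - MTori for N or P) follows; the function name is illustrative.

```python
def mutationtaster_new(mt_ori: float, pred: str) -> float:
    """Convert the MutationTaster p-value so larger values mean more damaging."""
    if not 0.0 <= mt_ori <= 1.0:
        raise ValueError("MutationTaster scores range from 0 to 1")
    if pred in ("A", "D"):  # disease_causing_automatic, disease_causing
        return mt_ori
    if pred in ("N", "P"):  # polymorphism, polymorphism_automatic
        return 1.0 - mt_ori
    raise ValueError(f"unknown MutationTaster prediction: {pred!r}")
```

A confident polymorphism call (for example N with MTori = 0.9) thus ends up with a low converted score of about 0.1, whereas the same p-value with a D call stays at 0.9.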

Combined Annotation Dependent Depletion (CADD): CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletions variants in the human genome. It integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations.

CADD_raw: CADD raw score for functional prediction of a SNP. The larger the score, the more likely the SNP has a damaging effect.
CADD_raw_rankscore: CADD raw scores were ranked among all CADD raw scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of CADD raw scores in dbNSFP.
CADD_phred: CADD phred-like score, a phred-like rank score based on whole-genome CADD raw scores. The larger the score, the more likely the SNP has a damaging effect. As a guide, scores above 30 are considered highly pathogenic, above 20 pathogenic, between 15 and 20 likely pathogenic, between 10 and 15 likely benign, and below 10 benign.
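The CADD phred interpretation bands given above can be sketched as a simple lookup. The text does not say which band the boundary values 10, 15, 20 and 30 belong to, so the choices below are assumptions, as is the function name.

```python
def cadd_phred_category(phred: float) -> str:
    """Map a CADD phred-like score to the interpretation bands listed in the guide."""
    if phred < 0:
        raise ValueError("CADD phred-like scores are non-negative")
    if phred > 30:
        return "highly pathogenic"
    if phred > 20:
        return "pathogenic"
    if phred >= 15:  # assumption: 15 and 20 fall in the 15-20 band
        return "likely pathogenic"
    if phred >= 10:  # assumption: 10 falls in the 10-15 band
        return "likely benign"
    return "benign"
```

These bands are a reading aid for the score column rather than a classification in their own right.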

Likelihood Ratio Test (LRT): The likelihood ratio test (LRT) is a statistical test of the goodness-of-fit between two models. A relatively more complex model is compared to a simpler model to see if it fits a particular dataset significantly better. If so, the additional parameters of the more complex model are often used in subsequent analyses. The LRT is only valid if used to compare hierarchically nested models.

LRT_score: The original LRT two-sided p-value (LRTori), ranging from 0 to 1.
LRT_converted_rankscore: LRTori scores were first converted as LRTnew=1-LRTori*0.5 if Omega<1, or LRTnew=LRTori*0.5 if Omega>=1. Then LRTnew scores were ranked among all LRTnew scores in dbNSFP. The rankscore is the ratio of the rank over the total number of the scores in dbNSFP. The scores range from 0.00166 to 0.85682.
LRT_pred: LRT prediction, D(eleterious), N(eutral) or U(nknown), which is not solely determined by the score.
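The LRT score conversion uses Omega to decide the direction of the transformation. A minimal sketch of the stated rule (LRTnew = 1 - LRTori*0.5 if Omega < 1, otherwise LRTnew = LRTori*0.5) follows; the function and parameter names are illustrative.

```python
def lrt_new(lrt_ori: float, omega: float) -> float:
    """Convert the two-sided LRT p-value so larger values mean more damaging."""
    if not 0.0 <= lrt_ori <= 1.0:
        raise ValueError("LRT p-values range from 0 to 1")
    if omega < 1:
        return 1.0 - lrt_ori * 0.5
    return lrt_ori * 0.5
```

With LRTori = 0.2, the converted score is 0.9 when Omega is 0.5 and 0.1 when Omega is 1.5, so every site with Omega < 1 ranks above every site with Omega >= 1.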

MetaSVM: Achieves the purpose of meta-analysis as jointly leveraging multiple omics data. Meta-SVM is a meta-analytic support vector machine (SVM) that can accommodate multiple omics data, making it possible to detect consensus genes associated with diseases across studies. The objective function of Meta-SVM applies the hinge loss and the sparse group lasso. It also facilitates identifying potential biomarkers and elucidating the disease process.

MetaSVM_score: Support vector machine (SVM) based ensemble prediction score, which incorporates 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 Genomes populations. A larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP.
MetaSVM_rankscore: MetaSVM scores were ranked among all MetaSVM scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MetaSVM scores in dbNSFP. The scores range from 0 to 1.
MetaSVM_pred: Prediction of SVM based ensemble prediction score, classifying them as T(olerated) or D(amaging). The MetaSVM_score cutoff between D and T is 0. The MetaSVM_rankscore cutoff between D and T is 0.83357.
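The MetaSVM call follows from the score cutoff of 0. The sketch below treats a score of exactly 0 as tolerated, which is an assumption because the text only gives the cutoff value; the function name is illustrative.

```python
def metasvm_pred(metasvm_score: float) -> str:
    """Return 'D' (damaging) above the 0 cutoff, otherwise 'T' (tolerated)."""
    # Assumption: a score of exactly 0 is reported as tolerated.
    return "D" if metasvm_score > 0 else "T"
```

For example, a score of 0.5 is reported as D and a score of -0.5 as T.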

PhastCons:

phastCons100way_vertebrate: phastCons conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site.
phastCons100way_vertebrate_rankscore: phastCons100way_vertebrate scores were ranked among all phastCons100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons100way_vertebrate scores in dbNSFP.
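All the rankscore columns in this guide share the same definition: the rank of a score divided by the total number of scores. The sketch below illustrates that ratio on a small list. It assumes ascending ranks and assigns tied scores the highest rank among the ties, which may differ from how dbNSFP handles ties; the function name is illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence


def rankscores(scores: Sequence[float]) -> List[float]:
    """Return rank / total for each score, with rank counted in ascending order."""
    ordered = sorted(scores)
    total = len(ordered)
    # bisect_right gives the number of scores less than or equal to s,
    # which is the 1-based ascending rank of s (ties share the highest rank).
    return [bisect_right(ordered, s) / total for s in scores]
```

Given three scores 0.9, 0.1 and 0.5, the rankscores are 3/3, 1/3 and 2/3 respectively, so the values depend only on the ordering and not on the original score scale.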

PON-P2: PON-P2 is a random forest predictor for pathogenicity-association of amino acid substitutions. PON-P2 makes predictions only for variations leading to amino acid substitutions.

POP-P2_Probability_Of_Pathogenicity: Probability estimated by PON-P2 algorithm. The higher the probability, the higher the damaging effect of the variation analysed.
POP-P2_Standard_Error: Standard error generated in the estimation of pathogenicity probability.
POP-P2_Prediction: Sorts amino acid substitutions into three categories: pathogenic, neutral or unknown tolerance. Variants with a probability of pathogenicity above 80% are classified as pathogenic, variants with a probability between 20% and 80% as unknown, and variants with a probability below 20% as neutral.
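The three PON-P2 classes can be derived from the probability of pathogenicity with two thresholds. This sketch expresses probabilities as fractions between 0 and 1 and places the boundary values 0.2 and 0.8 in the unknown class, which is an assumption; the function name is illustrative.

```python
def ponp2_prediction(probability: float) -> str:
    """Classify a probability of pathogenicity as pathogenic, unknown or neutral."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.8:
        return "pathogenic"
    if probability < 0.2:
        return "neutral"
    return "unknown"
```

A variant with probability 0.9 is therefore pathogenic, one with 0.5 is unknown, and one with 0.1 is neutral.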

Plot generation

Users can plot the results as histograms or pie charts of pathogenicity scores or ACMG classes. To do so, select the score in the gene plot tab. Users can also generate a combined representation of ACMG classes using the information provided by ClinVar, InterVar and VarSome, which makes it possible to compare differences in pathogenicity classification for a single variant and across all submitted variants.

Download results

After processing, HADA annotation results can be downloaded as a text file (.txt), comma-separated values (.csv), tab-separated values (.tsv) or a Variant Call Format file (.vcf) containing the pathogenicity information for each detected variant. If VCF is selected for download, users can choose between a short file describing only the HAE variants detected and a full file with all variant calls, annotated for HAE variants.

About HADA

How to cite HADA

Mendoza-Alvarez A, Muñoz-Barrera A, Rubio-Rodríguez LA, Marcelino-Rodríguez I, Corrales A, Iñigo-Campos A, Callero A, Perez-Rodriguez E, García-Robaina JC, González-Montelongo R, Lorenzo-Salazar JM, Flores C. Interactive web-based resource for annotation of genetic variants causing hereditary angioedema (HADA): Database, development, implementation, and validation. Journal of Medical Internet Research 2020, 22: e19040.

What is HADA?

HADA (Hereditary Angioedema Database Annotation) is a publicly available database that reports and annotates the variation described in the Hereditary Angioedema (hereafter HAE) scientific literature. By analysing and processing this genetic variation with several pathogenicity prediction tools, we have developed an up-to-date database of HAE variation. The database is built in the Shiny (R) environment to provide a graphical interface that is easy for clinicians to use.

With this tool, users can upload a Variant Call Format (VCF) file or a list of suspected causal variants for HAE and obtain the pathogenicity predictions of several algorithms such as SIFT, POLYPHEN2, MUTATIONTASTER, etc., as well as the pathogenicity classification established by the American College of Medical Genetics and Genomics (ACMG).

HADA is a project created by efforts of the Instituto Tecnológico y de Energías Renovables (ITER, Institute of Technology and Renewable Energies) and the Servicio Canario de la Salud (Canarian Health Service). Both entities are based in Tenerife, Canary Islands, Spain. This Shiny App runs on ITER’s Teide-HPC, which is the second most powerful computer in Spain at present.

Users can find instructions for HADA usage in the User Guide tab. If you are interested in command-line annotation, please see our HADA GitHub (https://github.com/genomicsITER/HADA). Changes and updates to both formats will be indicated in the corresponding changelogs.

Our current group members are:

This project has been created with the efforts of several researchers from different institutions.

Institutions:

Hospital Universitario Nuestra Señora de Candelaria (HUNSC)
Instituto Tecnológico y de Energías Renovables (ITER)
Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES)
Instituto de Tecnologías Biomédicas (ITB)
Universidad de La Laguna (ULL)

Members:

Alejandro Mendoza-Alvarez, PhD Student in Health Sciences (HUNSC, ULL)
Adrián Muñoz-Barrera, PhD Student in Computer Science (ULL, ITER)
Luis A. Rubio-Rodríguez, PhD Student in Computer Science (ULL, ITER)
Itahisa Marcelino-Rodriguez, PhD in Health Sciences (HUNSC)
Ariel Callero, MD in Allergology (HUNSC)
Jose M. Lorenzo-Salazar (ITER)
Carlos Flores, PhD in Biology (HUNSC, ITER, CIBERES, ITB, ULL)

Evolution and Use:

The total number of HADA submissions is displayed below, as well as the location of users (for statistical use only).

Number of submissions: 272

Terms and Privacy Policy:

Academic User Version

Provider: The provider is composed by:
- ITER, S.A.: Instituto Tecnológico y de Energías Renovables, (A38259115), Polígono Industrial de Granadilla, s/n. 38660 – Granadilla de Abona, Santa Cruz de Tenerife, Spain.
- FIISC: Fundación Canaria Instituto de Investigación de Canarias, (G76208396), Bco. de la Ballena s/n, Edf. Anexo al Hospital Univ. De Gran Canaria "Dr Negrín". 35019 - Las Palmas de Gran Canaria, Spain.
- ULL: Universidad de La Laguna, (Q3818001D), Calle Padre Herrera, s/n, 38200 San Cristóbal de La Laguna, Santa Cruz de Tenerife, Spain.
User: The person who has clicked through the acceptance of these terms and conditions.

AGREED TERMS

INTERPRETATION
1. In this agreement:
  - Consent means the right and license given to the User in clause 2.1;
  - System means the HADA client and server backend computer program, as well as any of its components, access to and the use of which is offered by the Provider to the User under this agreement via HADA usage with this address: https://hada.hpc.iter.es;
  - User’s data means data and other information that is generated by the System and provided to the User by the System as a result of receiving and processing the User’s data under this agreement, excluding any part of the System.
USER ACCESS CONSENT
1. The Provider hereby gives the User a personal, nonexclusive, non-transferable, free of charge right to access and use the System (without any right of sub-license) at HADA.
2. The Provider consent is subject to the other terms and conditions of this agreement.
3. The User undertakes to and agrees with the Provider that he/she will:
  1. not use the System for or on behalf of any third party or to provide a service;
  2. limit his/her use of the System to his/her own internal academic or other noncommercial research;
  3. use the System in accordance with the prevailing instructions and guidance for use given on HADA and faithfully comply with the Provider procedures for User identification, authentication and access;
  4. comply with all applicable laws and regulations with respect to his/her use of the System; and
  5. except to the extent expressly permitted under this agreement, not attempt to: reverse compile, disassemble, reverse engineer or copy, modify, duplicate, create derivative works from, frame, mirror, republish, download, display, transmit, or distribute all or any portion of the System or HADA in any form or media or by any means.
4. The User undertakes and agrees with the Provider not to access, store, send or receive to or from the System:
  1. any data or other material from which a human being may be directly or indirectly identified by any means; or
  2. any material that is unlawful, harmful, threatening, defamatory, obscene or offensive or which infringes the rights of any person.
5. The Provider reserves the right at any time and without liability or prior notice to the User to:
  1. revise, modify and replace the specification, functionality and performance of the System, including procedures governing access, security and operation;
  2. suspend or permanently discontinue availability of the System; and
  3. disable the User’s access to the System or to any part temporarily or permanently.
OWNERSHIP OF USER DATA AND ACKNOWLEDGMENT
1. The Provider makes no claim to any of the data that is input into the System by the User, who shall have sole responsibility for the legality, reliability, integrity, accuracy and quality of that data.
2. The User shall own the User’s data. The User shall assume sole responsibility for: (i) the User’s data; (ii) conclusions drawn from the User’s data; and (iii) for abstracting and backing up all User’s data.
3. The User must acknowledge in any publications or public statement that the User’s data was produced by HADA as described in: Mendoza-Alvarez, A., Muñoz-Barrera, A., Rubio-Rodríguez, L.A., Marcelino-Rodríguez, I., Corrales, A., Iñigo-Campos, A., Callero, A., García-Robaina, J.C., González-Montelongo, R., Lorenzo-Salazar, J.M., and Flores, C. (2020). HADA: an interactive web-based resource for the annotation of genetic variants causing hereditary angioedema. (Under revision).
4. The Provider will not store the User’s data indefinitely and will delete all User’s data that has been entered into the System once the User’s data has been processed, which will be notified by an email whose address will be provided by the User, as well as the deadline to download the results.
PROPRIETARY RIGHTS
1. The User shall not have any right, title or interest in or to the System or HADA contents or any results or other output from the System save as expressly given to the User in this agreement.
CONFIDENTIALITY
1. The User shall not send to the System any information that it wishes to keep confidential and the Provider does not accept any obligation to keep the User’s information confidential. However, the Provider will treat any identifying information the User sends to the System in accordance with Spanish privacy laws.
INDEMNITY
1. The User shall defend, indemnify and hold harmless the Provider against any claims, actions, proceedings, losses, damages, expenses and costs (including without limitation court costs and reasonable legal fees) arising out of or in connection with the User's possession or use of the User’s data or System, or any breach of this agreement by the User.
LIMITATION OF LIABILITY
1. The System is provided on an ‘as is’ basis and the User uses the System at his/her own risk. No representations, conditions, warranties or other terms of any kind are given in respect of the System or the User’s data, and all statutory warranties and conditions are excluded to the fullest extent permitted by law. Without affecting the generality of the previous sentences, the Provider gives no implied or express warranty and makes no representation that the System or any part of it:
  1. will enable specific results to be obtained; or
  2. that it meets a particular specification or is comprehensive within its field or that it is error free or will operate without interruption; or
  3. is suitable for any particular, or the User's specific, purposes; or
  4. will not cause any loss damage or injury; or
  5. that it is of satisfactory quality.
2. The user hereby confirms that he/she is aware that User's data processing by HADA is intended to show hereditary angioedema causal variants previously reported in scientific literature and which have been in-depth analyzed to limit the report of false positives. HADA has been exclusively designed for research purposes. Bugs in the System stated above may lead to predictions affected by any degree and rate of error. The User may under no circumstances use the System in any context which may affect or be involved in decisions regarding individual health.
3. The Provider liability under or in connection with this agreement howsoever arising, including in respect of any negligent act or omission relating to the System, User’s data, or under this Agreement, shall:
  1. be limited in aggregate to $0; and
  2. exclude any liability for indirect or consequential loss or damage and for any loss of profit, reputation, or business or opportunity, even if any of these types of loss or damage were foreseeable as at the date of this agreement.
4. User hereby irrevocably undertakes to the Provider not to make any claim against any employee, student, researcher or other individual engaged by the Provider, being a claim, which seeks to enforce against any of them any liability whatsoever in connection with this agreement or its subject matter.
CONTRACT STATUS
1. Each visit to HADA and accompanying use of the System by the User shall be deemed to be pursuant to a separate and discrete contract made under the prevailing version of this agreement. The User understands and acknowledges that his/her acceptance of these Terms and Privacy Policy constitutes a binding contract.
WAIVER
1. A waiver of any right under this agreement by the Provider shall only be effective if it is in writing and signed by an authorized representative of the Provider.
SEVERANCE
1. If any provision (or part of a provision) of this agreement is found by any court or administrative body of competent jurisdiction to be invalid, unenforceable or illegal, the other provisions shall remain in force.
ENTIRE AGREEMENT
1. This agreement, and any documents referred to in it, constitute the whole agreement between the parties and supersede any previous arrangement, understanding or agreement between them relating to the subject matter they cover. The User acknowledges and agrees that it has not relied on any statement that is not expressly contained in this agreement.
NO PARTNERSHIP OR AGENCY
1. The User agrees that:
  1. the rights, duties, obligations and liabilities of the User and the Provider will in every case, be several and not joint or joint and several;
  2. nothing contained in this agreement constitutes the User and the Provider as joint venture, agent, partner or trustee, or creates any agency, partnership, joint venture or trust for any purpose whatsoever; and
  3. the User does not have any authority or power to act for, or to create or assume any responsibility or obligation on behalf of, the Provider.
PRIVACY POLICY
1. HADA uses geoloc and leaflet R packages and any information captured concerning the use of this application is governed by the Provider Privacy Policy, available at:
  - For ITER: https://www.iter.es/politica-de-privacidad/
  - For FIISC: http://www.funcanis.org/index.php
  - For ULL: https://www.ull.es/informacion-sobre-web-institucional/
GOVERNING LAW AND JURISDICTION
1. This agreement and any disputes or claims arising out of or in connection with it or its subject matter or formation (including non-contractual disputes or claims) shall be governed by and construed in accordance with the Spanish law.
2. The parties irrevocably agree that the courts of Spain have exclusive jurisdiction to settle any dispute or claim that arises out of or in connection with this agreement or its subject matter or formation (including non‐contractual disputes or claims).

Contact section

Name: *

Email address: *

Text: *

Request for inclusion of causal genetic variants

Gene:

Chr: *

Start position (hg19): *

End position (hg19):

If you are reporting an SNV, enter the same value as the start position.

Reference allele: *

Alternate allele: *

Reference SNP cluster ID:

HGVS coding:

Cite for the study:

If the variant has been reported in a published scientific article.

Amino acid change:

ACMG classification: *

HAE type: *

Please indicate the type of angioedema for which the variant is reported.