The Human Genome Variation Society (HGVS) nomenclature, i.e. the recommendations for the description of sequence variants in DNA, RNA and protein sequences, is authorized by the Human Genome Organization (HUGO) and falls under the responsibility of the HGVS Variant Nomenclature Committee (HVNC).
A reference sequence is the sequence against which the variants found in an analyzed sequence are described. It must be generally accepted, and its file must be public and clearly described. The reference sequence used must contain the residue(s) affected by the described variant. The recommended reference is a genomic reference sequence based on a recent genome build; for humans, this is GRCh38/hg38. When variants are reported relative to a transcript, the MANE project (a collaboration between EMBL-EBI’s Ensembl project and NCBI’s RefSeq project) indicates the recommended reference sequence.
The recommended reference sequence types are as follows:
RefSeq: NC_, NT_, NW_, NG_, NM_, NR_ or NP_
- Chromosome – NC_
- Genomic contigs or scaffolds – NT_, NW_
- Gene/genomic region – NG_
- Coding transcript – NM_
- Non-coding transcript – NR_
- Protein – NP_
Ensembl: ENSG, ENST or ENSP (transcripts and proteins only if they are not identified by Ensembl as incomplete)
- Gene/genomic region – ENSG
- Coding transcript – ENST
- Non-coding transcript – ENST
- Protein – ENSP
LRG: LRG_#, LRG_#t#, LRG_#p#
Examples reported by HGVS:
- Gene/genomic region – LRG_199
- Coding transcript (or non-coding transcript) – LRG_199t1
- Protein – LRG_199p1
Please note: Locus Reference Genomic (LRG) sequences are still accepted, but no new ones are being generated. To improve the standardization of reporting, the RefSeq or Ensembl transcripts specified by the MANE project are preferred for all genes where available.
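The accession prefixes listed above can be turned into a simple lookup. The following is a minimal sketch for illustration only: the function `reference_type` and its lookup tables are hypothetical and not part of any HGVS tool, and it assumes human-style Ensembl identifiers (other species insert a species code, e.g. ENSMUST) and ignores version suffixes such as `.2`.

```python
import re

# Hypothetical lookup tables built from the prefix lists above.
REFSEQ_TYPES = {
    "NC_": "chromosome",
    "NT_": "genomic contig or scaffold",
    "NW_": "genomic contig or scaffold",
    "NG_": "gene/genomic region",
    "NM_": "coding transcript",
    "NR_": "non-coding transcript",
    "NP_": "protein",
}

# Assumption: human Ensembl stable IDs only (prefix followed by digits).
ENSEMBL_TYPES = {
    "ENSG": "gene/genomic region",
    "ENST": "transcript (coding or non-coding)",
    "ENSP": "protein",
}

# LRG identifiers: LRG_#, LRG_#t#, LRG_#p#
LRG_PATTERN = re.compile(r"LRG_\d+(?:(t)\d+|(p)\d+)?")


def reference_type(accession: str) -> str:
    """Return the reference sequence type implied by an accession's prefix.

    The version suffix (e.g. '.2') is dropped before matching.
    Raises ValueError for identifiers this sketch does not recognise.
    """
    base = accession.split(".")[0]
    for prefix, kind in REFSEQ_TYPES.items():
        if base.startswith(prefix):
            return f"RefSeq {kind}"
    for prefix, kind in ENSEMBL_TYPES.items():
        if re.fullmatch(prefix + r"\d+", base):
            return f"Ensembl {kind}"
    match = LRG_PATTERN.fullmatch(base)
    if match:
        if match.group(1):  # LRG_#t#
            return "LRG transcript (coding or non-coding)"
        if match.group(2):  # LRG_#p#
            return "LRG protein"
        return "LRG gene/genomic region"  # LRG_#
    raise ValueError(f"unrecognised reference sequence accession: {accession}")
```

For example, `reference_type("NM_004006.2")` returns `'RefSeq coding transcript'`. Note that an ENST prefix alone cannot tell a coding transcript from a non-coding one, so the sketch reports both possibilities rather than guessing.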
DNA, RNA or protein reference sequence
The description of a variant is preceded by a prefix (a lowercase letter followed by a period) that indicates the type of reference sequence used.
DNA
- g. = linear genomic reference sequence
- o. = circular genomic reference sequence
- m. = mitochondrial reference sequence
- c. = coding DNA reference sequence
- n. = non-coding DNA reference sequence
RNA
- r. = RNA reference sequence
Protein
- p. = protein reference sequence
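To show how the prefix is read in practice, here is a minimal sketch that splits a variant description into its reference sequence, prefix and change. It assumes the usual `reference:prefix.change` layout (e.g. `NM_004006.2:c.4375C>T`, used here purely as an illustration). The helper `split_variant` is hypothetical: it only checks the prefix against the list above and does not validate the variant syntax itself.

```python
# Prefixes from the list above, mapped to the reference sequence type they denote.
PREFIXES = {
    "g": "linear genomic reference sequence",
    "o": "circular genomic reference sequence",
    "m": "mitochondrial reference sequence",
    "c": "coding DNA reference sequence",
    "n": "non-coding DNA reference sequence",
    "r": "RNA reference sequence",
    "p": "protein reference sequence",
}


def split_variant(description: str) -> tuple[str, str, str, str]:
    """Split 'REFERENCE:x.CHANGE' into (reference, prefix, reference type, change).

    Hypothetical helper: only the prefix is checked, not the change syntax.
    """
    # Split at the first colon; version dots in the accession stay intact.
    reference, sep, rest = description.partition(":")
    if not sep:
        raise ValueError("expected 'reference:prefix.change'")
    # The prefix is the lowercase letter before the first period.
    prefix, dot, change = rest.partition(".")
    if not dot or prefix not in PREFIXES:
        raise ValueError(f"unknown reference sequence prefix in {description!r}")
    return reference, prefix, PREFIXES[prefix], change
```

`split_variant("NM_004006.2:c.4375C>T")` returns `('NM_004006.2', 'c', 'coding DNA reference sequence', '4375C>T')`. Splitting at the first colon rather than the first period keeps the accession's version number attached to the reference.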
IMPORTANT: This brief report covers only part of the current international consensus recommendations; please consult the guidelines for complete information. Note also the statement on the VarNomen website: “For additional questions, comments or examples of cases not yet covered, with a suggestion on how to describe them, you can contact VarNomen@HUGO-int.org” (see the sites listed in the bibliography).
Bibliography
- Den Dunnen et al. 2016, Hum. Mutat. 37:564-569.
- http://www.hgvs.org/content/guidelines
- http://varnomen.hgvs.org/bg-material/refseq/