Ensembl Variant Effect Predictor (VEP) | a variant annotation tool


https://asia.ensembl.org/info/docs/tools/vep/index.html

https://github.com/Ensembl/ensembl-vep

 

Give VEP a list of variant identifiers and it returns annotations for each one.

 

Annotation output:

#Uploaded_variation	Location	Allele	Consequence	IMPACT	SYMBOL	Gene	Feature_type	Feature	BIOTYPE	EXON	INTRON	HGVSc	HGVSp	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	DISTANCE	STRAND	FLAGS	SYMBOL_SOURCE	HGNC_ID	MANE	TSL	APPRIS	SIFT	PolyPhen	AF	CLIN_SIG	SOMATIC	PHENO	PUBMED	MOTIF_NAME	MOTIF_POS	HIGH_INF_POS	MOTIF_SCORE_CHANGE	TRANSCRIPTION_FACTORS
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000631796.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1258750482	3920	-1	-	HGNC	HGNC:33884	-	2	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000631994.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1258750482	4476	-1	-	HGNC	HGNC:33884	-	5	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000632089.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1258750482	3920	-1	-	HGNC	HGNC:33884	-	3	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000632496.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1258750482	3920	-1	-	HGNC	HGNC:33884	-	3	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	splice_region_variant,intron_variant,non_coding_transcript_variant	LOW	WASH5P	ENSG00000282458	Transcript	ENST00000632506.1	processed_transcript	-	2/2	-	-	-	-	-	-	-	rs1258750482	-	-1	-	HGNC	HGNC:33884	-	1	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000633703.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1258750482	1919	-1	-	HGNC	HGNC:33884	-	5	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000633719.1	retained_intron	-	-	-	-	-	-	-	-	-	rs1258750482	211	-1	-	HGNC	HGNC:33884	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000633742.1	transcribed_processed_pseudogene	-	-	-	-	-	-	-	-	-	rs1258750482	4418	-1	-	HGNC	HGNC:33884	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1258750482	19:61902-61902	A	downstream_gene_variant	MODIFIER	WASH5P	ENSG00000282458	Transcript	ENST00000634023.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1258750482	3149	-1	-	HGNC	HGNC:33884	-	5	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1156485833	19:107157-107157	C	upstream_gene_variant	MODIFIER	OR4F17	ENSG00000176695	Transcript	ENST00000318050.4	protein_coding	-	-	-	-	-	-	-	-	-	rs1156485833	3486	1	-	HGNC	HGNC:15381	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1156485833	19:107157-107157	C	splice_region_variant,5_prime_UTR_variant	LOW	OR4F17	ENSG00000176695	Transcript	ENST00000585993.3	protein_coding	1/3	-	-	-	54	-	-	-	-	rs1156485833	-	1	-	HGNC	HGNC:15381	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1156485833	19:107157-107157	C	downstream_gene_variant	MODIFIER	OR4G1P	ENSG00000267310	Transcript	ENST00000588632.2	transcribed_unprocessed_pseudogene	-	-	-	-	-	-	-	-	-	rs1156485833	1685	1	-	HGNC	HGNC:8302	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1156485833	19:107157-107157	C	missense_variant,splice_region_variant	MODERATE	OR4F17	ENSG00000176695	Transcript	ENST00000618231.3	protein_coding	1/2	-	-	-	54	9	3	K/N	aaG/aaC	rs1156485833	-	1	-	HGNC	HGNC:15381	-	-	P1	deleterious_low_confidence(0.03)	benign(0.062)	-	-	-	-	-	-	-	-	-	-
rs1156485833	19:107157-107157	C	downstream_gene_variant	MODIFIER	OR4G1P	ENSG00000267310	Transcript	ENST00000641173.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1156485833	1080	1	-	HGNC	HGNC:8302	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1156485833	19:107157-107157	C	splice_region_variant,non_coding_transcript_exon_variant	LOW	OR4F17	ENSG00000176695	Transcript	ENST00000641591.1	processed_transcript	1/4	-	-	-	54	-	-	-	-	rs1156485833	-	1	-	HGNC	HGNC:15381	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs1156485833	19:107157-107157	C	downstream_gene_variant	MODIFIER	OR4G1P	ENSG00000267310	Transcript	ENST00000641984.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs1156485833	1080	1	-	HGNC	HGNC:8302	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs867704559	19:110630-110630	TT	upstream_gene_variant	MODIFIER	OR4F17	ENSG00000176695	Transcript	ENST00000318050.4	protein_coding	-	-	-	-	-	-	-	-	-	rs867704559	13	1	-	HGNC	HGNC:15381	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs867704559	19:110630-110630	TT	5_prime_UTR_variant	MODIFIER	OR4F17	ENSG00000176695	Transcript	ENST00000585993.3	protein_coding	3/3	-	-	-	143	-	-	-	-	rs867704559	-	1	-	HGNC	HGNC:15381	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs867704559	19:110630-110630	TT	frameshift_variant	HIGH	OR4F17	ENSG00000176695	Transcript	ENST00000618231.3	protein_coding	2/2	-	-	-	60	15	5	T/TX	acT/acTT	rs867704559	-	1	-	HGNC	HGNC:15381	-	-	P1	-	-	-	-	-	-	-	-	-	-	-	-
rs867704559	19:110630-110630	TT	downstream_gene_variant	MODIFIER	OR4G1P	ENSG00000267310	Transcript	ENST00000641173.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs867704559	4553	1	-	HGNC	HGNC:8302	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs867704559	19:110630-110630	TT	non_coding_transcript_exon_variant	MODIFIER	OR4F17	ENSG00000176695	Transcript	ENST00000641591.1	processed_transcript	3/4	-	-	-	143	-	-	-	-	rs867704559	-	1	-	HGNC	HGNC:15381	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
rs867704559	19:110630-110630	TT	downstream_gene_variant	MODIFIER	OR4G1P	ENSG00000267310	Transcript	ENST00000641984.1	processed_transcript	-	-	-	-	-	-	-	-	-	rs867704559	4553	1	-	HGNC	HGNC:8302	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-

 

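Questions:

Why does a single SNP get so many annotation rows? Because VEP annotates per transcript, and the same position can have a different consequence in each transcript. In addition, when two genes lie close together, one variant can be annotated against both (in the output above, rs1156485833 is reported for both OR4F17 and OR4G1P). To get one call per variant, sort the rows by priority and drop the redundant ones.

Why do the annotations look redundant, for example both downstream and non_coding for the same variant? They are redundant by design: annotation works at several levels and can be coarse or fine. In the output above, rs867704559 is an upstream_gene_variant for ENST00000318050.4 but a non_coding_transcript_exon_variant for ENST00000641591.1, and a single row can carry several terms at once (splice_region_variant,intron_variant,non_coding_transcript_variant).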
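To see this per-transcript fan-out in a result file, a small awk sketch can count rows and distinct gene symbols per variant. It assumes the tab-delimited layout shown above (column 1 = #Uploaded_variation, column 6 = SYMBOL); the function name `count_annotations` is hypothetical.

```shell
# Count annotation rows and distinct gene symbols per variant in a VEP
# tab-delimited file (assumed columns: 1 = #Uploaded_variation, 6 = SYMBOL).
# Reads the named files, or stdin when no file is given.
count_annotations() {
  awk -F'\t' '
    /^#/ { next }                            # skip header and comment lines
    {
      if (!($1 in rows)) order[++n] = $1     # remember first-seen order
      rows[$1]++
      if (!(($1, $6) in seen)) {             # first time this gene shows up
        seen[$1, $6] = 1
        genes[$1]++
      }
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%d rows\t%d genes\n", order[i], rows[order[i]], genes[order[i]]
    }
  ' "$@"
}
```

Run on the table above saved to a file, this should report 9 rows and 1 gene for rs1258750482, 7 rows and 2 genes for rs1156485833, and 6 rows and 2 genes for rs867704559.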

 


 

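For a small number of variants, use the web server: https://asia.ensembl.org/Tools/VEP (it copes with up to a few hundred thousand variants and accepts rs IDs, which is very convenient).

For larger sets, use the local tool: https://github.com/Ensembl/ensembl-vep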

 

Install the Perl dependencies:

perl -MCPAN -e shell
# then, at the cpan> prompt:
install Archive::Zip
install DBI
install Module::Build

  

The Bio::DB::HTS module (https://github.com/Ensembl/Bio-DB-HTS) is hard to install; the build stops with "fatal error: zlib.h: No such file or directory".

 

Install the local cache:

# create the cache directory if it does not exist yet
mkdir -p "$HOME/.vep"
cd "$HOME/.vep"
curl -O ftp://ftp.ensembl.org/pub/release-102/variation/indexed_vep_cache/homo_sapiens_vep_102_GRCh38.tar.gz
tar xzf homo_sapiens_vep_102_GRCh38.tar.gz

  

  

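Problem: #include <zlib.h> zlib.h: No such file or directory (without a programming background, compile errors are a real headache).

Solution (see "Compilation error - missing zlib.h"): point the compiler and linker at a zlib installation. The paths below are from my machine; adjust them to wherever your zlib lives.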

export PATH=$PATH:/home/lizhixin/softwares/ensembl-vep/zlib-1.2.11
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/lizhixin/softwares/ensembl-vep/zlib-1.2.11/lib/
export LIBRARY_PATH=$LIBRARY_PATH:/home/lizhixin/softwares/ensembl-vep/zlib-1.2.11/lib/
export C_INCLUDE_PATH=/home/lizhixin/softwares/ensembl-vep/zlib-1.2.11/include/
export CPLUS_INCLUDE_PATH=/home/lizhixin/softwares/ensembl-vep/zlib-1.2.11/include/
export PKG_CONFIG_PATH=/home/lizhixin/softwares/ensembl-vep/zlib-1.2.11/lib/pkgconfig
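
Before rebuilding Bio::DB::HTS, it can help to check whether the compiler actually sees zlib.h with these variables set. This is a minimal sketch, assuming a C compiler is available as `cc` (or named in `$CC`); `has_header` is a hypothetical helper, not part of VEP.

```shell
# Return success if the C preprocessor can find the given header.
# Assumes a C compiler reachable as "cc" or through $CC.
has_header() {
  printf '#include <%s>\n' "$1" | "${CC:-cc}" -E -x c - >/dev/null 2>&1
}

if has_header zlib.h; then
  echo "zlib.h found"
else
  echo "zlib.h not found: check C_INCLUDE_PATH or install the zlib development package"
fi
```

If the check passes but the build still fails, the remaining problem is more likely on the linker side (LIBRARY_PATH, LD_LIBRARY_PATH) than in the include path.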

  

 

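Problem: MSG: ERROR: Cannot use ID format in offline mode (rs IDs cannot be used as input when running locally in offline mode).

So prepare the input in another format and test again; VCF is sure to work.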
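
A minimal sketch of that workaround: write the variant as a VCF record and annotate it from the local cache. The G>C alleles for rs1156485833 are read off the Codons column (aaG/aaC on strand 1) in the output above. The file names and the `write_example_vcf` helper are hypothetical, and the `vep` call assumes the release-102 GRCh38 cache has been unpacked under $HOME/.vep.

```shell
# Write a one-record VCF to the given path (hypothetical helper).
write_example_vcf() {
  {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    # rs1156485833 at 19:107157; G>C inferred from Codons aaG/aaC, strand 1
    printf '19\t107157\trs1156485833\tG\tC\t.\t.\t.\n'
  } > "$1"
}

write_example_vcf variants.vcf

# Annotate offline only when vep is on PATH; the cache is assumed to be
# in the default location ($HOME/.vep).
if command -v vep >/dev/null 2>&1; then
  vep --cache --offline --assembly GRCh38 \
      -i variants.vcf -o variants.vep.txt --tab --force_overwrite \
    || echo "vep run failed (is the cache installed?)" >&2
fi
```

Keeping the rs ID in the VCF ID column means it still appears in the #Uploaded_variation column of the output, so the results can be matched back to the original list.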

  

If the installation still fails, fall back to Docker: https://hub.docker.com/r/ensemblorg/ensembl-vep#install

 


 

 

Interpreting the results:

Consequences (all)

  • intron_variant: 44%
  • non_coding_transcript_variant: 16%
  • upstream_gene_variant: 12%
  • downstream_gene_variant: 11%
  • NMD_transcript_variant: 4%
  • regulatory_region_variant: 3%
  • intergenic_variant: 3%
  • non_coding_transcript_exon_variant: 2%
  • missense_variant: 1%
  • Others

Coding consequences

  • missense_variant: 73%
  • synonymous_variant: 26%
  • stop_gained: 1%
  • protein_altering_variant: 0%
  • frameshift_variant: 0%
  • stop_lost: 0%
  • coding_sequence_variant: 0%

 

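Notes:

  • non-coding includes intergenic + UTR + intron
  • exon includes CDS + UTR
  • upstream and downstream mean within 5 kb of the gene by default in VEP (adjustable with --distance), not 2 kb; the DISTANCE values in the output above go up to 4553
  • ncRNA variants are split the same way: exonic / splicing / intronic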

 

Priority (highest first):

  • 1 splicing/ncRNA splicing
  • 2 missense
  • 3 coding region/ncRNA exonic
  • 4 5'UTR/3'UTR
  • 5 Upstream/Downstream
  • 6 regulatory_region_variant
  • 7 intronic/non_coding_transcript_variant
  • 8 intergenic
  • 9 others
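
The priority list above can be sketched as an awk filter that keeps one row per variant. The mapping from VEP consequence terms to the nine ranks is an assumption made for illustration (the list mixes naming styles, so the term-to-rank table below is a guess at the intent), and `pick_by_priority` is a hypothetical name. Column 4 is assumed to be Consequence, as in the output above.

```shell
# Keep, for each variant, the row whose best consequence term has the
# lowest rank in the priority list. Ties keep the first row seen.
pick_by_priority() {
  awk -F'\t' '
    # Assumed mapping of VEP terms onto the nine priority levels.
    function rank(term) {
      if (term ~ /splice/) return 1
      if (term == "missense_variant") return 2
      if (term ~ /^(frameshift_variant|stop_gained|stop_lost|start_lost|synonymous_variant|inframe_insertion|inframe_deletion|protein_altering_variant|coding_sequence_variant|non_coding_transcript_exon_variant)$/) return 3
      if (term ~ /_prime_UTR_variant$/) return 4
      if (term ~ /^(upstream|downstream)_gene_variant$/) return 5
      if (term == "regulatory_region_variant") return 6
      if (term == "intron_variant" || term == "non_coding_transcript_variant") return 7
      if (term == "intergenic_variant") return 8
      return 9
    }
    /^#/ { print; next }                     # pass header lines through
    {
      n = split($4, terms, ",")              # a row may carry several terms
      r = 9
      for (i = 1; i <= n; i++) { t = rank(terms[i]); if (t < r) r = t }
      if (!($1 in best)) { order[++k] = $1; best[$1] = r; line[$1] = $0 }
      else if (r < best[$1]) { best[$1] = r; line[$1] = $0 }
    }
    END { for (j = 1; j <= k; j++) print line[order[j]] }
  ' "$@"
}
```

Because ties keep the first row, a variant with two splice_region rows (such as rs1156485833 above) ends up with whichever transcript comes first in the file; add a secondary key such as IMPACT if that matters. VEP also has its own selection flags such as --pick, which apply VEP's criteria rather than this list.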

 

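These consequence terms also come from an ontology (the Sequence Ontology).

Google search: vep regulatory_region_variant

Ensembl Variation - Calculated variant consequences: the annotation itself is derived from the function of the Ensembl transcripts.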

http://www.sequenceontology.org/miso/current_svn/term/SO:0001566

 

Critical association of ncRNA with introns

 

