Motif enrichment analysis for alternative splicing regulators | motif enrichment | FIMO | MEME


Related post: Transcription factor motif enrichment in TSS regions | motif enrichment | HOMER | FIMO | MEME

 

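This is a new area for me. I am currently interested in alternative splicing factors (ASFs) such as PTBP1, which have specific RNA-binding motifs, much like transcription factors (TFs).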

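Similarities:

  • Both describe sequence regions bound by a protein
  • Both have a specific sequence motif

Differences:

  • TF motifs mainly occur in promoters and enhancers and regulate gene transcription
  • ASF motifs mainly occur in the intronic regions of genes and regulate alternative splicing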

 

PTBP1 is used as the example here.

 

Source of inspiration: 2018 - Cancer Cell - PTBP1-Mediated Alternative Splicing Regulates the Inflammatory Secretome and the Pro-tumorigenic Effects of Senescent Cells

RNA-Binding Motif Analysis
FIMO (Grant et al., 2011) was used to scan the human gene sequences for the PTBP1 RNA-binding motifs inferred by (Ray et al., 2013). The thereby predicted occurrences were mapped to the analyzed splicing events. To generate the RNA-maps (Figures 7B and S7D), for each comparison alternative exons were divided into those with PSIs significantly increasing upon PTBP1 knockdown (putatively repressed), those with PSIs significantly decreasing upon PTBP1 knockdown (putatively enhanced), and those with PSIs not altered upon PTBP1 knockdown (putatively not regulated). Statistical significance for local motif enrichment is associated with Fisher’s exact tests for differences in motif occurrences between groups of exons within 31 bp moving windows.
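
The windowed Fisher's exact test described in the methods above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the paper's actual code: the function names, the choice to count exons with at least one motif hit per window, and the demo positions are all assumptions made for the example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def exons_with_hit(hit_lists, center, half):
    """Count exons having at least one motif hit within center +/- half."""
    return sum(
        any(center - half <= pos <= center + half for pos in hits)
        for hits in hit_lists
    )

def window_pvalues(group_hits, control_hits, centers, width=31):
    """Assumed scheme: compare a regulated group against unregulated exons per window."""
    half = width // 2
    pvalues = {}
    for center in centers:
        a = exons_with_hit(group_hits, center, half)
        c = exons_with_hit(control_hits, center, half)
        pvalues[center] = fisher_exact_two_sided(
            a, len(group_hits) - a, c, len(control_hits) - c
        )
    return pvalues

# Hypothetical motif positions relative to a splice site, one list per exon.
repressed = [[-10], [5], [100]]
unregulated = [[200], [300], [-50]]
demo_pvalues = window_pvalues(repressed, unregulated, centers=[0])
```

With real data, each inner list would hold the FIMO hit positions for one exon, expressed relative to the splice site of interest, and the p-values would normally be corrected for multiple testing across windows.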

 

Finding the RNA motif

Look up Ray et al., 2013, A compendium of RNA-binding motifs for decoding gene regulation

Following that lead, I found a database: CISBP-RNA Database: Catalog of Inferred Sequence Binding Preferences of RNA binding proteins

 

Step: export the hg38 gene sequences (including exons and introns)

http://www.genome.ucsc.edu/cgi-bin/hgTables

 

Predict motif occurrences with FIMO: https://meme-suite.org/meme/tools/fimo

 

For a short consensus motif, the web version generates the motif in MEME format; just download it.

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from unknown source):
A 0.250 C 0.250 G 0.250 T 0.250

MOTIF 1 HYTTTYT

letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
0.333333 0.333333 0.000000 0.333333
0.000000 0.500000 0.000000 0.500000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.500000 0.000000 0.500000
0.000000 0.000000 0.000000 1.000000
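
The matrix above follows directly from the IUPAC consensus HYTTTYT: each ambiguity code spreads the probability evenly over the bases it stands for. As a rough sketch of that conversion (the helper names `iupac_to_rows` and `to_meme` are made up for illustration and are not part of the MEME Suite), the same file content could be produced like this:

```python
# IUPAC nucleotide codes mapped to the DNA bases they represent (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_rows(consensus):
    """Return one [A, C, G, T] probability row per consensus position."""
    rows = []
    for symbol in consensus.upper():
        bases = IUPAC[symbol]
        rows.append([1.0 / len(bases) if b in bases else 0.0 for b in "ACGT"])
    return rows

def to_meme(name, rows):
    """Format the rows as a minimal MEME motif file with a uniform background."""
    lines = [
        "MEME version 4", "",
        "ALPHABET= ACGT", "",
        "strands: + -", "",
        "Background letter frequencies (from unknown source):",
        "A 0.250 C 0.250 G 0.250 T 0.250", "",
        f"MOTIF 1 {name}", "",
        f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 1 E= 0e+0",
    ]
    lines += [" ".join(f"{p:.6f}" for p in row) for row in rows]
    return "\n".join(lines) + "\n"

meme_text = to_meme("HYTTTYT", iupac_to_rows("HYTTTYT"))
print(meme_text)
```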

  

fimo --alpha 1 --max-strand -oc target PTBP1.motif.meme hg38_gene.fasta

  

A small DNA/RNA/protein conversion tool: http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm

 

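Note:

The motif and the sequences must use the same alphabet: T for DNA, U for RNA. Otherwise nothing will match.

If the motif is an RNA motif, a reverse-complement DNA motif needs to be made. The DNA motif used here, ARAAARD, is the reverse complement of HYTTTYT. FIMO scans both strands of DNA sequences by default, so the strand reported for each hit is relative to whichever orientation of the motif is supplied.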

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from unknown source):
A 0.250 C 0.250 G 0.250 T 0.250

MOTIF 1 ARAAARD

letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
1.000000 0.000000 0.000000 0.000000
0.500000 0.000000 0.500000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.500000 0.000000 0.500000 0.000000
0.333333 0.000000 0.333333 0.333333
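
The reverse-complement matrix above can be derived from the forward HYTTTYT matrix mechanically: reverse the order of the rows (positions), and within each row reverse the A, C, G, T columns so that A swaps with T and C swaps with G. A minimal sketch, with the function name chosen for illustration:

```python
# Forward matrix for HYTTTYT, columns in A, C, G, T order.
THIRD = 1.0 / 3.0
FORWARD = [
    [THIRD, THIRD, 0.0, THIRD],  # H = A, C or T
    [0.0, 0.5, 0.0, 0.5],        # Y = C or T
    [0.0, 0.0, 0.0, 1.0],        # T
    [0.0, 0.0, 0.0, 1.0],        # T
    [0.0, 0.0, 0.0, 1.0],        # T
    [0.0, 0.5, 0.0, 0.5],        # Y = C or T
    [0.0, 0.0, 0.0, 1.0],        # T
]

def reverse_complement_rows(rows):
    """Reverse position order and swap A<->T, C<->G in every row."""
    return [list(reversed(row)) for row in reversed(rows)]

REVCOMP = reverse_complement_rows(FORWARD)
for row in REVCOMP:
    print(" ".join(f"{p:.6f}" for p in row))
```

Applying the function twice returns the original matrix, which is a quick sanity check on any hand-edited motif file.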

 

fimo --alpha 1 --max-strand -oc target PTBP1.DNA.motif.meme hg38_gene.fasta --max-stored-scores 1000000 --thresh 1e-4

  

Lesson learned: test on a small dataset first next time, otherwise an overnight run can be wasted.
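
One way to build such a small test set is to keep only the first few records of the FASTA file and run FIMO on that first. The sketch below is an assumption about how this could be done, not part of the original workflow; the helper names and the output file name in the comment are hypothetical.

```python
def take_first_records(lines, n):
    """Yield the lines that belong to the first n FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                return
        if seen > 0:
            yield line

def write_subset(src_path, dst_path, n=100):
    """Copy the first n records of src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(take_first_records(src, n))

# In practice (hypothetical output name):
#   write_subset("hg38_gene.fasta", "hg38_gene.small.fasta", n=100)
# Small in-memory demonstration with made-up sequences:
demo = [">gene1\n", "ACGT\n", "TTTT\n", ">gene2\n", "GGCC\n", ">gene3\n", "AAAA\n"]
subset = list(take_first_records(demo, 2))
print("".join(subset), end="")
```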

 

--max-strand

If matches on both strands at a given position satisfy the output threshold, only report the match for the strand with the higher score. If the scores are tied, the matching strand is chosen at random.

 

Resource usage

--max-stored-scores 1000000 used 1.48 GB of memory and 1 CPU

--max-stored-scores 10000000: memory and CPU usage were not recorded

 

Latest commands:

fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --text --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output.tsv
fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --skip-matched-sequence --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output2.tsv

  

--skip-matched-sequence [much faster output: run time dropped from an hour and a half to 10 minutes]

Like the --text option, this limits output to tab-separated values (TSV) sent to standard out, but in addition, turns off output of the sequence of motif matches. This speeds up processing considerably.

  

--text [results go to standard output]

Limits output to TSV (tab-separated values) formatted results sent to standard output. The results are unsorted and no q-values are output, allowing very large files to be searched.
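
Because the TSV written to standard output is unsorted and can be very large, it is convenient to stream it rather than load it whole. The sketch below counts hits per sequence. It assumes the header uses the column names `sequence_name` and `p-value`, as in recent MEME Suite releases; older versions may label the columns differently, so check the first line of your own output. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def hits_per_sequence(handle, max_p=1e-4):
    """Count FIMO hits per sequence, keeping rows with p-value <= max_p."""
    # Skip blank lines and comment lines before handing rows to the CSV reader.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    counts = Counter()
    for record in reader:
        if float(record["p-value"]) <= max_p:
            counts[record["sequence_name"]] += 1
    return counts

# Made-up example rows in the assumed column layout.
SAMPLE = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\t"
    "p-value\tq-value\tmatched_sequence\n"
    "1\tARAAARD\tgeneA\t10\t16\t+\t9.5\t5e-05\t\t\n"
    "1\tARAAARD\tgeneA\t40\t46\t-\t9.1\t8e-05\t\t\n"
    "1\tARAAARD\tgeneB\t7\t13\t+\t9.5\t5e-05\t\t\n"
    "1\tARAAARD\tgeneB\t90\t96\t+\t2.0\t0.01\t\t\n"
)
sample_counts = hits_per_sequence(io.StringIO(SAMPLE))
print(dict(sample_counts))
```

For a real run, pass an open file handle for output.tsv instead of the `io.StringIO` sample.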

 

Reference:

~/project/scPipeline/motifEnrichment/ASF_motif/

 

