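CNV:
Humans are predominantly diploid. A region present in 3 or 4 copies has been amplified (a gain); a region present in only 1 copy carries a deletion (a loss).
CNV analysis therefore estimates copy number from the sequencing depth at each position. The chromosome is first divided into windows and the mean sequencing depth of each window is computed. If several consecutive windows all differ in depth between sample and control, the region is called a CNV. The criterion is the sample/control copy-number ratio on a log2 scale: a log2Ratio below -1 or above 0.6 is treated as a copy number variation. These cutoffs correspond to a ratio below 1/2 or above roughly 3/2 (log2(3/2) ≈ 0.585), that is, at least one copy lost or gained relative to the normal 2 copies.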
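The windowing and thresholding described above can be sketched in base R. This is only an illustration: the depth values are made up, and the names `sample_depth`, `control_depth` and `min_windows` are hypothetical. The notes only say "several consecutive windows", so the run length of 3 is an assumption.

```r
# Hypothetical mean depth per window for a sample and its matched control
# (invented numbers, for illustration only).
sample_depth  <- c(30, 31, 29, 14, 13, 12, 30, 46, 47, 48, 30)
control_depth <- rep(30, length(sample_depth))

# Sample/control ratio on a log2 scale, one value per window.
log2_ratio <- log2(sample_depth / control_depth)

# Classify each window with the cutoffs from the notes:
# log2Ratio < -1 is a loss, log2Ratio > 0.6 is a gain.
call <- ifelse(log2_ratio < -1, "loss",
               ifelse(log2_ratio > 0.6, "gain", "neutral"))

# Collapse consecutive windows that share a call, then keep only
# non-neutral runs that span at least min_windows windows.
min_windows <- 3  # assumed value; not specified in the notes
runs   <- rle(call)
last   <- cumsum(runs$lengths)
first  <- last - runs$lengths + 1
is_cnv <- runs$values != "neutral" & runs$lengths >= min_windows

segments <- data.frame(first_window = first,
                       last_window  = last,
                       call         = runs$values)[is_cnv, ]
print(segments)
```

Requiring a run of windows rather than a single window is what keeps one noisy window from being reported as a CNV.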
CNV: annotation
library(biomaRt)
# Connect to the Ensembl human gene dataset
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# Fetch the genes located in chr1:94312388-96000000
results <- getBM(attributes = c("hgnc_symbol", "chromosome_name",
                                "start_position", "end_position"),
                 filters = c("chromosome_name", "start", "end"),
                 values = list(1, 94312388, 96000000),
                 mart = mart)
dim(results) # 34 hits, only 12 with gene symbols
library(GenomicRanges)
filename <- "test.txt"
# Contents of test.txt (tab-separated segment file; first rows shown):
# Sample	Chromosome	Start	End	Num_Probes	Segment_Mean
# TCGA-BR-A4J9-10A-01D-A255-01	1	3218610	247813706	127587	-8e-04
# TCGA-BR-A4J9-10A-01D-A255-01	2	484222	16358510	9812	4e-04
# TCGA-BR-A4J9-10A-01D-A255-01	2	16358715	16359561	3	-2.0811
# TCGA-BR-A4J9-10A-01D-A255-01	2	16360852	149639289	67009	0.0085
# TCGA-BR-A4J9-10A-01D-A255-01	2	149641890	149644977	2	-2.552
tbl <- read.table(filename, sep="\t", as.is=TRUE, header=TRUE)
gr <- makeGRangesFromDataFrame(tbl)
gr.short <- subset(gr, width < 100)
length(gr) # 117 regions
length(gr.short) # just 2 regions
gr.short
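# Build "chromosome:start:end" strings, the format taken by the chromosomal_region filter
regions <- paste(as.character(seqnames(gr.short)), start(gr.short), end(gr.short), sep=":")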
regions
results <- getBM(attributes = c("hgnc_symbol", "chromosome_name",
"start_position","end_position"),
filters = c("chromosomal_region"),
values=regions,
mart=mart)