最近在分析甲基化數據,發現ChAMP包的輸入端是beta(甲基化數據)和pheno(表型數據),找不到校正協變量(比如年齡、性別、批次等)的輸入端。
如下代碼所示:
champ.DMP(beta = myNorm,
pheno = myLoad$pd$Sample_Group,
compare.group = NULL,
adjPVal = 0.05,
adjust.method = "BH",
arraytype = "450K")
查了一下ChAMP包的代碼,發現ChAMP包調用的是limma函數。
因此,這事就比較好解決了。
只需要在原代碼的基礎上修改model就行了。
下面介紹一下如何修改model。
1創建甲基化數據集和表型、協變量數據集
beta <- matrix(runif(60,min=0,max=1),10,6)
rownames(beta) <- c("cg18478105","cg09835024","cg14361672","cg01763666","cg12950382","cg02115394","cg25813447","cg07779434","cg13417420","cg12480843")
colnames(beta) <-paste("sample",1:6)
da=data.frame(pheno=c("1","1","1","2","2","2"),SEX=c("1","2","1","1","1","2"),AGE=c(54,23,58,43,68,36))
rownames(da) <- paste("sample",1:6)
甲基化數據如下所示:
注意:beta數據是隨機產生的,因此每個人生成的數據是不一樣的。
表型(pheno)、協變量(SEX、AGE)的數據如下:
2 修改model、校正協變量
在這里我們假定要校正的協變量有SEX和AGE。
我在tian yuan創作的champ.DMP函數的基礎上增加了cov1=da$cov1
和cov2=da$cov2
兩個輸入參數。
在champ.DMP函數里面修改了
design <- model.matrix( ~ 0+ factor(p)+cov1+cov2 )
colnames(design) <- c("control", "case","cov1","cov2")
contrast.matrix <- makeContrasts(contrasts=paste(colnames(design)[2:1],collapse="-"), levels = design)
最后修改完的函數如下所示:
champ.DMP1 <- function(beta = myNorm,
pheno = da$Sample_Group,
cov1=da$cov1,
cov2=da$cov2,
adjPVal = 0.05,
adjust.method = "BH",
compare.group = NULL,
arraytype = "EPIC")
{
message("[===========================]")
message("[<<<<< ChAMP.DMP START >>>>>]")
message("-----------------------------")
if(is.null(pheno) | length(unique(pheno))<=1)
{
stop("pheno parameter is invalid. Please check the input, pheno MUST contain at least two phenotypes.")
}else
{
message("<< Your pheno information contains following groups. >>")
sapply(unique(pheno),function(x) message("<",x,">:",sum(pheno==x)," samples."))
message("[The power of statistics analysis on groups contain very few samples may not strong.]")
}
if(is.null(compare.group))
{
message("You did not assign compare groups. The first two groups: <",unique(pheno)[1],"> and <",unique(pheno)[2],">, will be compared automatically.")
compare.group <- unique(pheno)[1:2]
}else if(sum(compare.group %in% unique(pheno))==2)
{
message("As you assigned, champ.DMP will compare ",compare.group[1]," and ",compare.group[2],".")
}else
{
message("Seems you did not assign correst compare groups. The first two groups: <",unique(pheno)[1],"> and <",unique(pheno)[2],">, will be compared automatically.")
compare.group <- unique(pheno)[1:2]
}
p <- pheno[which(pheno %in% compare.group)]
beta <- beta[,which(pheno %in% compare.group)]
design <- model.matrix( ~ 0+ factor(p)+cov1+cov2 )
colnames(design) <- c("control", "case","cov1","cov2")
contrast.matrix <- makeContrasts(contrasts=paste(colnames(design)[2:1],collapse="-"), levels = design)
message("\n<< Contrast Matrix >>")
print(contrast.matrix)
message("\n<< All beta, pheno and model are prepared successfully. >>")
fit <- lmFit(beta, design)
fit2 <- contrasts.fit(fit,contrast.matrix)
tryCatch(fit3 <- eBayes(fit2),
warning=function(w)
{
stop("limma failed, No sample variance.\n")
})
DMP <- topTable(fit3,coef=1,number=nrow(beta),adjust.method=adjust.method)
message("You have found ",sum(DMP$adj.P.Val <= adjPVal), " significant MVPs with a ",adjust.method," adjusted P-value below ", adjPVal,".")
message("\n<< Calculate DMP successfully. >>")
if(arraytype == "EPIC") data(probe.features.epic) else data(probe.features)
com.idx <- intersect(rownames(DMP),rownames(probe.features))
avg <- cbind(rowMeans(beta[com.idx,which(p==compare.group[1])]),rowMeans(beta[com.idx,which(p==compare.group[2])]))
avg <- cbind(avg,avg[,2]-avg[,1])
colnames(avg) <- c(paste(compare.group,"AVG",sep="_"),"deltaBeta")
DMP <- data.frame(DMP[com.idx,],avg,probe.features[com.idx,])
message("[<<<<<< ChAMP.DMP END >>>>>>]")
message("[===========================]")
message("[You may want to process DMP.GUI() or champ.GSEA() next.]\n")
return(DMP)
}
3、使用champ.DMP1分析甲基化數據
champ.DMP1 <- champ.DMP1(beta =beta,pheno = da$pheno, cov1=da$SEX,cov2=da$AGE, adjust.method = "BH", arraytype = "EPIC")
beta是第一步生成的甲基化數據,da數據框上的pheno、SEX、AGE也是在第一步生成的數據;
arraytype = "EPIC"表示甲基化數據類型為850K。如果是"450K"數據則將"EPIC"改為"450K";
4、修改技巧
上面舉得例子是校正兩個協變量,如果想校正三個協變量的話則做如下三個改動:
第一個改動:
champ.DMP1 <- function(beta = myNorm,
pheno = da$Sample_Group,
cov1=da$cov1,
cov2=da$cov2,
cov3=da$cov3,
adjPVal = 0.05,
adjust.method = "BH",
compare.group = NULL,
arraytype = "EPIC")
第二個改動:
design <- model.matrix( ~ 0+ factor(p)+cov1+cov2+cov3 )
colnames(design) <- c("control", "case","cov1","cov2","cov3")
contrast.matrix <- makeContrasts(contrasts=paste(colnames(design)[2:1],collapse="-"), levels = design)
第三個改動:
champ.DMP1 <- champ.DMP1(beta =beta,pheno = da$pheno, cov1=da$SEX,cov2=da$AGE, cov3=da$cov3,adjust.method = "BH", arraytype = "EPIC")
5、致謝
感謝原作者tian yuan(這個包很全能!);
感謝健明分享的甲基化分析入門練習:甲基化芯片的一般分析流程
建議各位剛入門甲基化的同學們可以看看健明在B站的視頻,講的很詳細。