在甲基化ChAMP包中加入協變量

本文轉載自查看原文 2020-05-19 22:24 726 甲基化

最近在分析甲基化數據，發現ChAMP包的輸入端是beta（甲基化數據）和pheno（表型數據），找不到校正協變量（比如年齡、性別、批次等）的輸入端。

如下代碼所示：

champ.DMP(beta = myNorm,
              pheno = myLoad$pd$Sample_Group,
              compare.group = NULL,
              adjPVal = 0.05,
              adjust.method = "BH",
              arraytype = "450K")

查了一下ChAMP包的代碼，發現ChAMP包調用的是limma函數。

因此，這事就比較好解決了。

只需要在原代碼的基礎上修改model就行了。

下面介紹一下如何修改model。

1創建甲基化數據集和表型、協變量數據集

beta <- matrix(runif(60,min=0,max=1),10,6)
rownames(beta) <- c("cg18478105","cg09835024","cg14361672","cg01763666","cg12950382","cg02115394","cg25813447","cg07779434","cg13417420","cg12480843")
colnames(beta) <-paste("sample",1:6)
da=data.frame(pheno=c("1","1","1","2","2","2"),SEX=c("1","2","1","1","1","2"),AGE=c(54,23,58,43,68,36))
rownames(da) <- paste("sample",1:6)

甲基化數據如下所示：

注意：beta數據是隨機產生的，因此每個人生成的數據是不一樣的。

表型（pheno）、協變量（SEX、AGE）的數據如下：

2 修改model、校正協變量

在這里我們假定要校正的協變量有SEX和AGE。

我在tian yuan創作的champ.DMP函數的基礎上增加了cov1=da$cov1和cov2=da$cov2兩個輸入參數。
在champ.DMP函數里面修改了

design <- model.matrix( ~ 0+ factor(p)+cov1+cov2 )
colnames(design) <- c("control", "case","cov1","cov2")
contrast.matrix <- makeContrasts(contrasts=paste(colnames(design)[2:1],collapse="-"), levels = design)

最后修改完的函數如下所示：

champ.DMP1 <- function(beta = myNorm,
                      pheno = da$Sample_Group,
                      cov1=da$cov1,
                      cov2=da$cov2,
                      adjPVal = 0.05,
                      adjust.method = "BH",
                      compare.group = NULL,
                      arraytype = "EPIC")
{
    message("[===========================]")
    message("[<<<<< ChAMP.DMP START >>>>>]")
    message("-----------------------------")

    if(is.null(pheno) | length(unique(pheno))<=1)
    {
        stop("pheno parameter is invalid. Please check the input, pheno MUST contain at least two phenotypes.")
    }else
    {
        message("<< Your pheno information contains following groups. >>")
        sapply(unique(pheno),function(x) message("<",x,">:",sum(pheno==x)," samples."))
        message("[The power of statistics analysis on groups contain very few samples may not strong.]")
    }
	
    if(is.null(compare.group))
    {
        message("You did not assign compare groups. The first two groups: <",unique(pheno)[1],"> and <",unique(pheno)[2],">, will be compared automatically.")
        compare.group <- unique(pheno)[1:2]
    }else if(sum(compare.group %in% unique(pheno))==2)
    {
        message("As you assigned, champ.DMP will compare ",compare.group[1]," and ",compare.group[2],".")
    }else
    {
        message("Seems you did not assign correst compare groups. The first two groups: <",unique(pheno)[1],"> and <",unique(pheno)[2],">, will be compared automatically.")
        compare.group <- unique(pheno)[1:2]
    }
    
    p <- pheno[which(pheno %in% compare.group)]
    beta <- beta[,which(pheno %in% compare.group)]
   design <- model.matrix( ~ 0+ factor(p)+cov1+cov2 )
	colnames(design) <- c("control", "case","cov1","cov2")
    contrast.matrix <- makeContrasts(contrasts=paste(colnames(design)[2:1],collapse="-"), levels = design)
    message("\n<< Contrast Matrix >>")
    print(contrast.matrix)

    message("\n<< All beta, pheno and model are prepared successfully. >>")
	
	fit <- lmFit(beta, design)
	fit2 <- contrasts.fit(fit,contrast.matrix)
	tryCatch(fit3 <- eBayes(fit2),
      warning=function(w) 
      {
      	stop("limma failed, No sample variance.\n")
      })
    DMP <- topTable(fit3,coef=1,number=nrow(beta),adjust.method=adjust.method)
    message("You have found ",sum(DMP$adj.P.Val <= adjPVal), " significant MVPs with a ",adjust.method," adjusted P-value below ", adjPVal,".")
    message("\n<< Calculate DMP successfully. >>")

    if(arraytype == "EPIC") data(probe.features.epic) else data(probe.features)
    com.idx <- intersect(rownames(DMP),rownames(probe.features))
    avg <-  cbind(rowMeans(beta[com.idx,which(p==compare.group[1])]),rowMeans(beta[com.idx,which(p==compare.group[2])]))
    avg <- cbind(avg,avg[,2]-avg[,1])
    colnames(avg) <- c(paste(compare.group,"AVG",sep="_"),"deltaBeta")
    DMP <- data.frame(DMP[com.idx,],avg,probe.features[com.idx,])

    message("[<<<<<< ChAMP.DMP END >>>>>>]")
    message("[===========================]")
    message("[You may want to process DMP.GUI() or champ.GSEA() next.]\n")
    return(DMP)
}

3、使用champ.DMP1分析甲基化數據

champ.DMP1 <- champ.DMP1(beta =beta,pheno = da$pheno, cov1=da$SEX,cov2=da$AGE, adjust.method = "BH", arraytype = "EPIC")

beta是第一步生成的甲基化數據，da數據框上的pheno、SEX、AGE也是在第一步生成的數據；
arraytype = "EPIC"表示甲基化數據類型為850K。如果是"450K"數據則將"EPIC"改為"450K"；

4、修改技巧

上面舉得例子是校正兩個協變量，如果想校正三個協變量的話則做如下三個改動：

第一個改動：

champ.DMP1 <- function(beta = myNorm,
                      pheno = da$Sample_Group,
                      cov1=da$cov1,
                      cov2=da$cov2,
                      cov3=da$cov3,
                      adjPVal = 0.05,
                      adjust.method = "BH",
                      compare.group = NULL,
                      arraytype = "EPIC")

第二個改動：

design <- model.matrix( ~ 0+ factor(p)+cov1+cov2+cov3 )
colnames(design) <- c("control", "case","cov1","cov2","cov3")
contrast.matrix <- makeContrasts(contrasts=paste(colnames(design)[2:1],collapse="-"), levels = design)

第三個改動：

champ.DMP1 <- champ.DMP1(beta =beta,pheno = da$pheno, cov1=da$SEX,cov2=da$AGE, cov3=da$cov3,adjust.method = "BH", arraytype = "EPIC")

5、致謝

感謝原作者tian yuan（這個包很全能！）;

感謝健明分享的甲基化分析入門練習：甲基化芯片的一般分析流程

建議各位剛入門甲基化的同學們可以看看健明在B站的視頻，講的很詳細。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 DNA甲基化 FEM：整合RANSEQ和DNA甲基化數據分析的R包甲基化分析中limma包中的logFC值是怎么來的表觀遺傳學——甲基化，組蛋白修飾【5】腫瘤DNA甲基化數據分析原理及流程一文讀懂：八大RNA m6A甲基化研究核心問題｜易基因綜述：如何研究植物基因組DNA羥甲基化（5hmC）？｜易基因 PHP字符串中的變量解析（+教你如何在PHP字符串中加入變量）在maven 2工程中加入iTextAsian支持(maven添加自定義jar包到本地倉庫) js中加入數據緩存