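1. For each database, the list records its name, the phenotypes covered, whether genotypes can be downloaded, and whether GWAS result files (P values, effect sizes, SNP loci) can be downloaded. The ones collected so far are listed below.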

Paper that references these databases: Genome-wide association study identifies 74 loci associated with educational attainment
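2. The Japanese Genotype-phenotype Archive (JGA): holds individual-level genotype and phenotype data. Access requires an application, and GWAS have already been run on the data. Database link: https://www.ddbj.nig.ac.jp/jga/index-e.html
3. ExAC: does not provide individual-level genotypes, but does provide VCF, CNV, coverage, and related files. Phenotypes are limited to those already published, such as type 2 diabetes.
Populations and sample counts covered by ExAC: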
| Population | Male Samples | Female Samples | Total |
| --- | --- | --- | --- |
| African/African American (AFR) | 1,888 | 3,315 | 5,203 |
| Latino (AMR) | 2,254 | 3,535 | 5,789 |
| East Asian (EAS) | 2,016 | 2,311 | 4,327 |
| Finnish (FIN) | 2,084 | 1,223 | 3,307 |
| Non-Finnish European (NFE) | 18,740 | 14,630 | 33,370 |
| South Asian (SAS) | 6,387 | 1,869 | 8,256 |
| Other (OTH) | 275 | 179 | 454 |
| Total | 33,644 | 27,062 | 60,706 |
Data available for download from ExAC:
| FTP Link | Description |
| --- | --- |
| Sites VCF | VCF of Variant Sites |
| CNV | CNV Counts and Intolerance Scores |
| Coverage | Per Base Coverage |
| Functional Gene Constraint | Functional Gene Constraint Scores for ExAC and Subsets |
| Manuscript Data | Variant Tables Used in Manuscript |
| Resources | Exome Calling and Purcell5k Intervals |
| Subsets | Non-TCGA VCF Subset |
Database link: http://exac.broadinstitute.org/downloads
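Since ExAC offers site-level variants rather than individual genotypes, the usual thing to pull out of a sites VCF is the per-allele frequency. The following is a minimal sketch, assuming the INFO column carries the standard `AC` (allele count) and `AN` (allele number) keys; the function names and the toy record are made up for illustration and are not taken from ExAC.

```python
import io


def parse_info(info_field):
    """Split a VCF INFO string such as 'AC=3;AN=100;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Flag entries (no '=') are stored as True.
        info[key] = value if sep else True
    return info


def iter_allele_frequencies(handle):
    """Yield (chrom, pos, ref, alt, AC/AN) for every ALT allele in a VCF text stream."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, _filter, info_field = fields[:8]
        info = parse_info(info_field)
        allele_number = int(info.get("AN", 0))
        allele_counts = [int(x) for x in str(info.get("AC", "")).split(",") if x.isdigit()]
        for alt, count in zip(alts.split(","), allele_counts):
            frequency = count / allele_number if allele_number else None
            yield chrom, int(pos), ref, alt, frequency


# Toy, hand-written record (hypothetical data, not an ExAC variant).
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG,T\t50\tPASS\tAC=5,1;AN=100;DB\n"
)
rows = list(iter_allele_frequencies(toy_vcf))
```

For a real bgzip- or gzip-compressed file, the same generator could be fed a handle from `gzip.open(path, "rt")`; a dedicated VCF library is probably the better choice for anything beyond a quick look.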
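4. Simons Genome Diversity Project (SGDP)
Provides 279 samples from populations in the Americas, Africa, East Asia, South Asia, Western Europe, and Oceania. Available data: VCF, phased genotypes, STR, and BAMs for Y chromosomes.
Link: http://reichdata.hms.harvard.edu/pub/datasets/sgdp/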
5. CHINESE MILLIONOME DATABASE
URL: https://db.cngb.org/cmdb/
The Chinese Millionome Database (CMDB) is a unique large-scale Chinese genomics database produced by BGI and hosted in the National GeneBank. The CMDB delivers periodical and useful variation information and scientific insights derived from the analysis of millions of Chinese sequencing data. The results aim to promote genetic research and precision medicine actions in China.
The information delivered includes the detected variants and their corresponding allele frequencies, annotations, frequency comparisons against global populations from existing databases, etc.
6. UK Biobank
UK Biobank GWAS summary data: https://ctg.cncr.nl/documents/p1651/ukb2_sumstats.tar.gz
This archive is very large, so think twice before downloading it.
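Because the archive is large, it can help to check what it contains without unpacking it to disk. The sketch below reads a gzip-compressed tar in streaming mode and reports each file's name and size; `list_archive_members` and the toy archive are hypothetical helpers for illustration, and nothing here is specific to the UK Biobank file.

```python
import io
import tarfile


def list_archive_members(fileobj):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz stream."""
    members = []
    # "r|gz" reads the stream sequentially, so nothing is extracted to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if member.isfile():
                members.append((member.name, member.size))
    return members


def build_toy_archive():
    """Build a tiny in-memory .tar.gz so the example runs without any download."""
    payload = b"SNP\tP\nrs1\t0.5\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="toy_sumstats.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


toy_members = list_archive_members(build_toy_archive())
```

Streaming mode still has to read through the whole archive to reach the last member, so this saves disk space rather than bandwidth. Any file-like object with a `read()` method should work as `fileobj`, including an already downloaded file opened in binary mode.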
7. Summary-statistics collection for insomnia, Alzheimer's disease, various psychiatric disorders, intelligence, and more
https://ctg.cncr.nl/software/summary_statistics
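Summary-statistics files like these are usually delimited text with one row per SNP, so a first pass is often to keep only the rows below a P-value cutoff. The following is a rough sketch under stated assumptions: the column names `SNP`, `BETA`, and `P`, the tab delimiter, and the conventional 5e-8 genome-wide threshold are all assumptions, and real files from this site may use different headers.

```python
import csv
import io

# Conventional genome-wide significance cutoff (an assumption, not from the source).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def significant_hits(handle, p_column="P", threshold=GENOME_WIDE_SIGNIFICANCE, delimiter="\t"):
    """Return the rows whose P value is below the threshold."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    hits = []
    for row in reader:
        try:
            p_value = float(row[p_column])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric P values such as "NA"
        if p_value < threshold:
            hits.append(row)
    return hits


# Toy table with hypothetical SNP identifiers and values.
toy_sumstats = io.StringIO(
    "SNP\tBETA\tP\n"
    "rs1\t0.12\t3e-9\n"
    "rs2\t-0.03\t0.2\n"
    "rs3\t0.08\tNA\n"
)
toy_hits = significant_hits(toy_sumstats)
```

A missing P column raises a `KeyError` instead of being skipped silently, which makes a mismatched header obvious; pass `p_column` and `delimiter` to match the file at hand.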
8. Japan's public database: National Bioscience Database Centre (NBDC) Human Database
https://humandbs.biosciencedbc.jp/
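9. CVDKP Datasets
Phenotypes: anthropometric traits, cardiovascular disease, electrocardiogram, atrial fibrillation, blood lipids, blood glucose, psychiatric disorders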
http://www.kp4cd.org/datasets/mi
10. CARDIoGRAMplusC4D Consortium
Phenotypes: coronary artery disease, cardiovascular disease
http://www.cardiogramplusc4d.org/data-downloads/
11. DIAGRAM Consortium
Phenotype: type 2 diabetes (T2D)
http://diagram-consortium.org/downloads.html
12. Repository for public GWAS data and code
https://data.mendeley.com/research-data/
13. Japanese GWAS summary data
