Projects

This is an overview of the GWASStudio projects available on-premises at Human Technopole, along with the relevant core metadata fields (see metadata.md#core-metadata-fields).

Note: All summary statistics were harmonized to the GWASLab format via harmonization_pipeline.

| Project name | category | project | study |
|---|---|---|---|
| deCODE | pQTL | DECODE | large scale plasma 2023 |
| FinnGen | GWAS | finngen | R12 |
| Genes & Health | GWAS | genesandhealth | v010_binary_traits_3digitICD10, v010_quantitative_traits_median_values |
| UKB | GWAS | opengwas | ukb-a, ukb-b, ukb-d |
| UKB-PPP | pQTL | ukb-ppp_consortium | ukb-ppp_African, ukb-ppp_American, ukb-ppp_Combined, ukb-ppp_East_Asian, ukb-ppp_European, ukb-ppp_Middle_Eastern, ukb-ppp_South_Asian |
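
The overview above maps each project to a category, a project identifier, and one or more study names. As a minimal sketch, the same information can be held in a plain Python dictionary and filtered by category. The dictionary layout and the function name `studies_by_category` are hypothetical and are not part of GWASStudio; only the values are taken from the overview.

```python
# Hypothetical in-memory copy of the project overview; not a GWASStudio API.
PROJECTS = {
    "deCODE": {
        "category": "pQTL",
        "project": "DECODE",
        "studies": ["large scale plasma 2023"],
    },
    "FinnGen": {
        "category": "GWAS",
        "project": "finngen",
        "studies": ["R12"],
    },
    "Genes & Health": {
        "category": "GWAS",
        "project": "genesandhealth",
        "studies": [
            "v010_binary_traits_3digitICD10",
            "v010_quantitative_traits_median_values",
        ],
    },
    "UKB": {
        "category": "GWAS",
        "project": "opengwas",
        "studies": ["ukb-a", "ukb-b", "ukb-d"],
    },
    "UKB-PPP": {
        "category": "pQTL",
        "project": "ukb-ppp_consortium",
        "studies": [
            "ukb-ppp_African",
            "ukb-ppp_American",
            "ukb-ppp_Combined",
            "ukb-ppp_East_Asian",
            "ukb-ppp_European",
            "ukb-ppp_Middle_Eastern",
            "ukb-ppp_South_Asian",
        ],
    },
}


def studies_by_category(category):
    """Return {project name: list of study names} for one category."""
    return {
        name: meta["studies"]
        for name, meta in PROJECTS.items()
        if meta["category"] == category
    }


if __name__ == "__main__":
    # List every pQTL project with its number of studies.
    for name, studies in studies_by_category("pQTL").items():
        print(name, len(studies))
```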

deCODE

The deCODE project is a large-scale population genetics initiative that integrates medical and genetic data from over 160,000 Icelandic participants. This includes genotype information, whole-genome sequencing, quantitative traits, binary disease phenotypes, and extensive genealogical records. Through global collaborations with medical institutions, deCODE has also incorporated data from ~500,000 individuals worldwide. A primary goal is to identify genetic risk factors for common diseases (e.g., heart attack, asthma, stroke, and cancer) and to advance personalized medicine.

GWASStudio includes 5,284 summary statistics from the large-scale plasma proteomics GWAS conducted by deCODE.

For more information on the deCODE Project, see the deCODE genetics Inc. article.

For details on the plasma proteomics GWAS, see the original article and the 2023 follow-up study that integrates results with data from UKB-PPP: UKB-deCODE comparative analysis.

Key Metadata

  • Number of Protein Analytes
    • 4,907 aptamers (measuring 4,719 proteins)
  • Protein Platform
    • SomaScan multiplex aptamer assay (version 4)
  • Number of Variants
    • ~27.2 million imputed variants
  • Imputation Panel
    • Whole-genome sequencing (WGS on ~60,000 Icelanders) used as population-specific reference panel for genotype imputation into the larger Icelandic cohort (~160,000 individuals)
  • Genome build
    • GRCh38
  • Sample Size and Ancestry
    • ~36,000 individuals of primarily Icelandic (European) ancestry
  • GWAS Method
    • linear mixed model (BOLT-LMM)

Note: For detailed GWAS methodology, see the original article.


FinnGen

FinnGen is a large-scale research initiative that integrates genomic data with national health registry data from about 500,000 Finnish biobank participants. The project aims to identify genetic risk factors for disease, understand disease progression and treatment response, and advance personalized medicine.

GWASStudio includes 2,469 summary statistics from the FinnGen GWAS R12 analysis.

For more information on the FinnGen Project and methods, see the FinnGen documentation and FinnGen Handbook.

To browse FinnGen traits (including trait definition, prevalence, and longitudinal comorbidities), see the Risteys portal.

Key Metadata

  • Number of Traits
    • 2,469
  • Trait Types
    • 2,466 binary traits
    • 3 quantitative traits (HEIGHT_IRN, WEIGHT_IRN, BMI_IRN)
  • Number of Variants
    • ~21 million imputed variants
  • Imputation Panel
  • Genome build
    • GRCh38
  • Sample Size and Ancestry
    • 500,348 individuals of primarily Finnish/European ancestry
  • GWAS Method
    • regenie v2.2.4 (v3.3 used for traits not converging under v2.2.4)

Note: For detailed GWAS methodology, see the FinnGen documentation.


Genes & Health

Genes & Health is a community-based genetic research initiative with over 100,000 participants, focusing on British Bangladeshi and British Pakistani communities. The project aims to reduce health disparities in these groups (e.g. in diabetes, cardiovascular diseases, and mental health disorders) by researching genetic risk factors, disease progression, and treatment response, and by supporting more inclusive precision medicine.

GWASStudio includes 762 binary-trait summary statistics and 107 quantitative-trait summary statistics from Genes & Health (2025 release).

For more information on the Genes & Health project, data access, and methods, see the Genes & Health "Data & Researchers" page and the project website.

Key Metadata

  • Number of Traits
    • v010_binary_traits_3digitICD10: 762 binary traits
    • v010_quantitative_traits_median_values: 107 quantitative traits (median values)
  • Trait Types
    • v010_binary_traits_3digitICD10: ICD10-coded binary traits (e.g. A01–Q99)
    • v010_quantitative_traits_median_values: quantitative (e.g. BMI, LDL-cholesterol)
  • Number of Variants
    • ~37.5 million imputed variants
  • Imputation Panel
    • TOPMed-r3
  • Genome build
    • GRCh38
  • Sample Size and Ancestry
    • ~100,000 participants of British Bangladeshi and British Pakistani (South Asian) ancestry
  • GWAS Method
    • regenie

Note: For quantitative traits, Genes & Health provides 321 quantitative‑trait summary statistics derived from minimum, median, and maximum observed values per individual. For consistency and to reduce variability caused by extreme measurements or outliers, median values were selected to represent each quantitative trait in the GWAS analyses included in GWASStudio.
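
The 321 quantitative-trait summary statistics correspond to three summaries (minimum, median, maximum) for each of the 107 traits, since 3 × 107 = 321. The sketch below illustrates why the median is less sensitive to extreme measurements than the minimum or maximum. The readings are made up for illustration, and the function name `summarize` is hypothetical; this is not the Genes & Health phenotype pipeline.

```python
from statistics import median


def summarize(measurements):
    """Return (minimum, median, maximum) of one individual's repeated measurements."""
    return min(measurements), median(measurements), max(measurements)


# Hypothetical repeated BMI readings for one individual.
# The value 72.0 mimics a recording error or an extreme measurement.
bmi_readings = [24.1, 24.6, 25.0, 24.8, 72.0]

lowest, middle, highest = summarize(bmi_readings)

# The maximum is driven entirely by the outlier, while the median stays
# close to the bulk of the readings.
print("min:", lowest)
print("median:", middle)
print("max:", highest)
```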

Note: For more information on the phenotypes and notation, see the dedicated page on the project website.


UK Biobank (UKB)

UK Biobank (UKB) is a large-scale biomedical database and research resource containing detailed health, lifestyle, environmental, and genetic data from approximately 500,000 UK participants aged 40–69 at recruitment (2006–2010). Participants have been followed longitudinally through linked electronic health records, imaging, biomarker assays, and repeat assessments.

A wide range of GWAS analyses have been conducted on UKB phenotypes by multiple research groups, often focusing on different trait types, QC pipelines, or analysis methods. GWASStudio includes results from three major sources, for a total of 3,945 UKB-based GWAS summary statistics:

  • ukb-a: Neale Lab Phase 1 Release (2017), early GWAS on core phenotypes (596 traits)
  • ukb-b: MRC-IEU OpenGWAS project (2021+), harmonized and quality-controlled GWAS across thousands of traits (2,514 traits)
  • ukb-d: Neale Lab Phase 2/3 Release (2018), expanded GWAS with improved imputation, additional phenotypes, and updated QC pipelines (835 traits)
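
As a quick consistency check, the three per-source counts listed above add up to the stated total of 3,945. The short sketch below (variable names are illustrative) recomputes the total and each source's share of it.

```python
# Number of summary statistics per UKB source, copied from the list above.
UKB_COUNTS = {"ukb-a": 596, "ukb-b": 2514, "ukb-d": 835}

total = sum(UKB_COUNTS.values())

# Share of each source as a percentage of the total, rounded to one decimal.
shares = {study: round(100 * n / total, 1) for study, n in UKB_COUNTS.items()}

print("total:", total)
for study, share in shares.items():
    print(f"{study}: {share}%")
```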

For more information on the UK Biobank project, see https://www.ukbiobank.ac.uk/ and UKB Data Showcase, which provides searchable metadata on all available phenotypes and samples.

Key Metadata

  • Number of Traits
    • ukb-a: 596
    • ukb-b: 2,514
    • ukb-d: 835
  • Trait Types
    • ukb-a and ukb-d: binary (disease case-control), continuous (physical and lab measures)
    • ukb-b: binary, continuous, categorical ordered (e.g. job satisfaction)
  • Number of Variants
    • All studies begin with ~17 million imputed variants
    • ukb-a: filtered to ~10.5 million
    • ukb-b: filtered to ~10 million
    • ukb-d: filtered to ~13.5 million
  • Imputation Panel
    • UK Biobank Imputation Release v3 (R3)
  • Genome build
  • Sample Size and Ancestry
    • ukb-a: ~337,000 unrelated White British individuals
    • ukb-b: ~460,000 individuals from the full UKB cohort (mostly European ancestry), including related individuals
    • ukb-d: ~361,000 unrelated White British individuals
  • GWAS Method
    • ukb-a and ukb-d: linear regression (Hail, linreg3)
    • ukb-b: linear mixed model (BOLT-LMM or SAIGE, depending on the trait)

Note: For detailed GWAS methodology, see:


UK Biobank Pharma Proteomics Project (UKB-PPP)

UK Biobank Pharma Proteomics Project (UKB-PPP) is a biopharmaceutical consortium that performed plasma proteomic profiling of 54,219 UK Biobank participants. Protein quantitative trait loci (pQTL) analyses were conducted across European individuals and five additional non-European ancestry groups.

GWASStudio includes a total of 20,576 UKB-PPP GWAS summary statistics.

For more information on the UKB-PPP project, see the original UKB-PPP article.

Key Metadata

  • Number of Protein Analytes
    • 2,941 protein analytes (2,923 unique proteins)
  • Protein Panel
  • Number of Variants
    • ~23.8 million imputed variants
  • Genome build
    • GRCh38
  • Sample Size and Ancestry
    • ukb-ppp_European: 34,557 European individuals
    • ukb-ppp_African: 931 African individuals
    • ukb-ppp_South_Asian: 920 Central/South Asian individuals
    • ukb-ppp_Middle_Eastern: 308 Middle Eastern individuals
    • ukb-ppp_East_Asian: 262 East Asian individuals
    • ukb-ppp_American: 97 Admixed American individuals (mostly Hispanic/Latino groups)
    • ukb-ppp_Combined: 52,363 individuals in total. The combined UKB-PPP cohort comprises the 34,557 European individuals listed above plus 17,806 individuals from the replication cohort, which includes the non-European groups. It is the result of QC filtering, starting from the 54,219 participants initially selected by the UKB-PPP consortium
  • GWAS Method
    • regenie v2.2.1
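
The sample sizes listed above can be cross-checked with simple arithmetic. The sketch below (variable names are illustrative) shows that the combined cohort equals the European group plus 17,806 individuals, that the five non-European groups together account for 2,518 individuals, and that 1,856 of the initially selected participants are not part of the combined cohort.

```python
# Sample sizes per ancestry group, copied from the list above.
GROUPS = {
    "European": 34557,
    "African": 931,
    "South_Asian": 920,
    "Middle_Eastern": 308,
    "East_Asian": 262,
    "American": 97,
}
COMBINED = 52363  # ukb-ppp_Combined
INITIAL = 54219   # participants initially selected by the consortium

# Total across the five non-European groups.
non_european = sum(n for group, n in GROUPS.items() if group != "European")

# Individuals in the combined cohort beyond the European group.
beyond_european = COMBINED - GROUPS["European"]

# Participants dropped between initial selection and the combined cohort.
removed = INITIAL - COMBINED

print("non-European groups:", non_european)
print("combined minus European:", beyond_european)
print("initial minus combined:", removed)
```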

Note: Ancestry groups in UKB-PPP are defined following the pan-UKBB framework, based on clustering individuals in principal component (PC) space using reference populations from the 1000 Genomes Project and the Human Genome Diversity Panel. For more information, see the pan-UKBB study design.
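
To illustrate the general idea of assigning ancestry groups in PC space, the sketch below labels a sample by its nearest reference-population centroid. This is a deliberately simplified stand-in and not the pan-UKBB procedure: the coordinates, the labels, the two-dimensional PC space, the distance threshold, and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical reference samples: (PC1, PC2) coordinates with a known label.
REFERENCE = [
    ((0.10, 0.02), "EUR"), ((0.12, 0.01), "EUR"),
    ((-0.30, 0.05), "AFR"), ((-0.28, 0.07), "AFR"),
    ((0.02, -0.25), "EAS"), ((0.03, -0.27), "EAS"),
]


def centroids(reference):
    """Average the PC coordinates of the reference samples for each label."""
    sums = {}
    for (pc1, pc2), label in reference:
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += pc1
        acc[1] += pc2
        acc[2] += 1
    return {label: (a[0] / a[2], a[1] / a[2]) for label, a in sums.items()}


def assign(sample, cents, max_dist=0.1):
    """Return the label of the nearest centroid, or None if it is too far away."""
    label, dist = min(
        ((lab, math.dist(sample, c)) for lab, c in cents.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_dist else None


if __name__ == "__main__":
    cents = centroids(REFERENCE)
    print(assign((0.11, 0.00), cents))   # close to the EUR reference samples
    print(assign((0.50, 0.50), cents))   # far from every centroid: unassigned
```

Leaving distant samples unassigned reflects the fact that not every individual falls cleanly into one reference cluster; the threshold of 0.1 is arbitrary here.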

Note: For detailed GWAS methodology, see the original UKB-PPP article.