Each project-study combination is associated with a metadata record that describes the key characteristics of the available summary-statistics file(s).
These records are used throughout the platform (e.g., for filtering, querying, and API responses) and therefore follow a stable schema. When new attributes become relevant, they should be added to this table and, if possible, given a concise, human-readable label.
| Field (canonical name) | Friendly label | Description | Possible values / format |
| --- | --- | --- | --- |
| category | Summary-statistic type | Type of summary statistics | GWAS, pQTL, eQTL, … |
| project | Project | Identifier of the project the data belongs to | opengwas, pqtl, genesandhealth, … |
| study | Study | Identifier of the specific study | ukb-a, ukb-b, ukb-d, … |
| data_id | Record ID | Unique identifier for the metadata entry | e.g. 89f31189b3 |
| build | Genome build | Reference genome build | GRCh37, GRCh38 |
| population | Broad ancestry category | Ancestry of the cohort | see the ancestry categories page |
| total_samples | Number of samples | Sample size of the cohort | integer |
| total_cases | Number of cases | Count of case subjects (if applicable) | integer |
| total_controls | Number of controls | Count of control subjects (if applicable) | integer |
| trait_desc | Trait description | Human-readable description of the phenotype or protein | e.g. “Pregnancy, childbirth and the puerperium”, “Alpha-1B-glycoprotein” |
| notes_sex | Sex of participants | Sex composition of the cohort | Males, Females, Combined |
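
As an illustration of how a record following this schema might be represented and checked in client code, here is a minimal Python sketch. The class name `MetadataRecord`, the choice of which fields are required, and the validation rules are assumptions made for this example and are not part of the platform itself. The `population` value and the sample size are placeholders, since the real values are defined elsewhere.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the table above.
ALLOWED_BUILDS = {"GRCh37", "GRCh38"}
ALLOWED_SEX = {"Males", "Females", "Combined"}


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical container for the core metadata fields (illustration only)."""

    category: str
    project: str
    study: str
    data_id: str
    build: str
    population: str
    total_samples: int
    trait_desc: str
    notes_sex: str
    # Case/control counts apply only to some traits, so they default to None.
    total_cases: Optional[int] = None
    total_controls: Optional[int] = None

    def __post_init__(self) -> None:
        if self.build not in ALLOWED_BUILDS:
            raise ValueError(f"unknown genome build: {self.build!r}")
        if self.notes_sex not in ALLOWED_SEX:
            raise ValueError(f"unknown sex category: {self.notes_sex!r}")
        for name in ("total_samples", "total_cases", "total_controls"):
            value = getattr(self, name)
            if value is not None and (not isinstance(value, int) or value < 0):
                raise ValueError(f"{name} must be a non-negative integer")


record = MetadataRecord(
    category="GWAS",
    project="opengwas",
    study="ukb-b",
    data_id="89f31189b3",
    build="GRCh37",
    population="European",  # placeholder; see the ancestry categories page
    total_samples=1000,     # placeholder value for illustration
    trait_desc="Pregnancy, childbirth and the puerperium",
    notes_sex="Combined",
)
```

Validating at construction time means a malformed record fails early, before it reaches any filtering or querying code.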
Additional optional fields (may appear in specific projects)
| Field | Friendly label | Description / when to use |
| --- | --- | --- |
| notes_maker | Data producer | Institution or consortium that generated the data |
| notes_maker_platform_technology | Assay technology | Technology used to generate the data (e.g., array, sequencing) |
| notes_maker_platform_description | Assay description | Provide extra detail beyond the technology name |
| notes_maker_platform_version | Assay version | Useful when multiple releases exist |
| notes_maker_platform_normalization | Normalization strategy | Clarifies how raw data were transformed |
| notes_software_description | Analysis software | Record the tool that generated the statistics |
| notes_source_id | Source ID | Original identifier of the upstream summary‑statistics file |
| trait_gene_ids | Gene identifiers | When multiple genes are linked to the trait |
| trait_icd10 | Clinical code | When the trait is a disease phenotype |
| trait_protein_ids | Protein identifiers | For proteomics‑related summary statistics |
| trait_seqid | Sequence identifier | SomaLogic-specific sequence identifier for the trait |
| trait_tissue | Measured tissue | eQTL/pQTL studies that are tissue‑specific |
| trait_unit | Measurement unit | For quantitative traits (e.g., blood pressure) |
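
Because optional fields appear only in some projects, code that consumes these records may want to separate core fields from optional ones and flag anything that is not documented here. The following Python sketch shows one way to do that. The function name `split_record` and the example record are hypothetical; only the field names come from the core and optional field lists above.

```python
from typing import Any, Dict, List, Tuple

# Field names copied from the core and optional field lists above.
CORE_FIELDS = frozenset({
    "category", "project", "study", "data_id", "build", "population",
    "total_samples", "total_cases", "total_controls", "trait_desc", "notes_sex",
})
OPTIONAL_FIELDS = frozenset({
    "notes_maker", "notes_maker_platform_technology",
    "notes_maker_platform_description", "notes_maker_platform_version",
    "notes_maker_platform_normalization", "notes_software_description",
    "notes_source_id", "trait_gene_ids", "trait_icd10", "trait_protein_ids",
    "trait_seqid", "trait_tissue", "trait_unit",
})


def split_record(
    record: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], List[str]]:
    """Split a raw record into core fields, optional fields, and unknown keys."""
    core = {k: v for k, v in record.items() if k in CORE_FIELDS}
    optional = {k: v for k, v in record.items() if k in OPTIONAL_FIELDS}
    unknown = sorted(k for k in record if k not in CORE_FIELDS | OPTIONAL_FIELDS)
    return core, optional, unknown


# Hypothetical pQTL record; "notes_makr" is a deliberate typo to show detection.
raw = {
    "category": "pQTL",
    "project": "pqtl",
    "trait_desc": "Alpha-1B-glycoprotein",
    "trait_seqid": "SEQID-PLACEHOLDER",  # placeholder, not a real identifier
    "notes_makr": "Example Consortium",
}
core, optional, unknown = split_record(raw)
print(unknown)  # ['notes_makr']
```

Reporting unknown keys rather than silently dropping them makes it easier to notice when a new attribute should be added to the documented schema.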