PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset


Meta-analysis

This page describes the basic meta-analysis functions in PLINK, with which two or more result files can be combined in a fixed-effects or random-effects meta-analysis.

Basic usage

The basic meta-analysis command is:
plink --meta-analysis study1.assoc study2.assoc study3.assoc

PLINK expects each file to be a plain-text, rectangular, whitespace-delimited file with a header row. PLINK searches the header row for the following columns:
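     SNP   SNP identifier
      OR   Odds ratio (or BETA, etc.)
      SE   Standard error of the log-scale effect, i.e. of log(OR) or BETA (or user-defined weight field)
       P   (Optional) p-value from test

     CHR   (Optional) Chromosome code
      BP   (Optional) Base-pair position
      A1   (Optional) Allele 1 code (the allele the effect refers to)
      A2   (Optional) Allele 2 code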
HINT The SE field is added to the output of the standard --assoc, --mh, --linear and --logistic tests (among others) when the --ci 0.95 option is specified.

For example, suppose we have two association files from independent studies, s1.assoc and s2.assoc, and the first few rows of s1.assoc are as follows:

   CHR        SNP         BP   A1      F_A      F_U   A2     CHISQ       P      OR       SE      L95     U95
    22   rs915677   14433758    A   0.1522   0.1842    G    0.1538   0.695  0.7949   0.5862    0.252   2.508
    22   rs140378   15251689    G  0.02083  0.04762    C    0.4988    0.48  0.4255    1.243  0.03719   4.869
    22   rs131564   15252977    C   0.1522   0.2619    G     1.625  0.2024  0.5058   0.5401   0.1755   1.458
    22  rs4010550   15274688    G   0.1364    0.275    A     2.495  0.1142  0.4163   0.5642   0.1377   1.258
    22  rs5747361   15365080    0        0        0    G        NA      NA      NA       NA       NA      NA
    22  rs2379981   15405346    G  0.02083        0    A    0.8848  0.3469      NA       NA       NA      NA
    ...
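As a rough illustration of the header-driven column lookup described above, the Python sketch below locates the required SNP, OR and SE columns (plus any optional ones) by name rather than by position, and keeps NA entries as raw strings for later validation. It also rebuilds the 95% confidence interval from OR and SE, which shows that the SE column is on the log-odds scale. The function names `read_results` and `ci95_from_or` are hypothetical, and this is not PLINK's own parser.

```python
import math

# The first rows of s1.assoc, copied from the example above.
S1_ASSOC = """\
   CHR        SNP         BP   A1      F_A      F_U   A2     CHISQ       P      OR       SE      L95     U95
    22   rs915677   14433758    A   0.1522   0.1842    G    0.1538   0.695  0.7949   0.5862    0.252   2.508
    22   rs140378   15251689    G  0.02083  0.04762    C    0.4988    0.48  0.4255    1.243  0.03719   4.869
    22  rs5747361   15365080    0        0        0    G        NA      NA      NA       NA       NA      NA
"""

REQUIRED = ("SNP", "OR", "SE")
OPTIONAL = ("P", "CHR", "BP", "A1", "A2")


def read_results(text):
    """Parse a whitespace-delimited result file, locating columns by header name."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    # Keep only the columns the meta-analysis cares about; others are ignored.
    index = {c: header.index(c) for c in REQUIRED + OPTIONAL if c in header}
    return [{c: fields[i] for c, i in index.items()} for fields in body]


def ci95_from_or(or_value, se):
    """Rebuild the 95% CI, treating SE as the standard error of log(OR)."""
    log_or = math.log(or_value)
    return math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)


rows = read_results(S1_ASSOC)
lo, hi = ci95_from_or(float(rows[0]["OR"]), float(rows[0]["SE"]))
print(rows[0]["SNP"], round(lo, 3), round(hi, 3))
```

For rs915677 this reproduces the L95/U95 values (0.252 and 2.508) in the table, which would not happen if SE were the standard error of the OR itself.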
The command
plink --meta-analysis s1.assoc s2.assoc

gives the following output
     Performing meta-analysis of 2 files
     Reading results from [ s1.assoc ]  with 2680 read
     Reading results from [ s2.assoc ]  with 2655 read
     2778 unique SNPs, 2557 in two or more files
     Rejected 1911 SNPs, writing details to [ plink.prob ]
     Writing meta-analysis results to [ plink.meta ]
SNPs do not need to be in the same order across files, and a SNP does not need to appear in every file. By default, meta-analysis results are reported for any SNP present in two or more files.
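Because files are matched on SNP identifier rather than on row position, collating results amounts to grouping records by SNP ID and keeping those seen in enough files. The sketch below is a minimal illustration of that idea, not PLINK's code; `collate` is a hypothetical name, the per-study records are placeholder strings, and the `report_all` flag mimics the report-all option described under Misc. options.

```python
from collections import defaultdict


def collate(studies, report_all=False):
    """Group per-study records by SNP ID; row order within files is irrelevant.

    studies: list of {snp_id: record} dicts, one per result file.
    Returns {snp_id: [(study_index, record), ...]}.
    """
    by_snp = defaultdict(list)
    for k, study in enumerate(studies):
        for snp, record in study.items():
            by_snp[snp].append((k, record))
    # Default: a SNP must occur in two or more files to be meta-analyzed.
    min_files = 1 if report_all else 2
    return {snp: recs for snp, recs in by_snp.items() if len(recs) >= min_files}


# Placeholder records; only the SNP IDs matter for the grouping.
s1 = {"rs915677": "s1 row", "rs140378": "s1 row"}
s2 = {"rs131564": "s2 row", "rs915677": "s2 row"}
print(sorted(collate([s1, s2])))  # ['rs915677']
```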

In this case, a number of SNPs are rejected from the meta-analysis. The reasons are written to the file
     plink.prob
which lists the SNP, the file and the problem code, as follows:
     BAD_CHR               Invalid chromosome code 
     BAD_BP                Invalid base-position code 
     BAD_ES                Invalid effect-size (e.g. OR) 
     BAD_SE                Invalid standard error 
     MISSING_A1            Missing allele 1 label
     MISSING_A2            Missing allele 2 label
     ALLELE_MISMATCH       Mismatching allele codes across files
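The problem codes suggest a per-row validation step before any pooling. The sketch below shows what such checks might look like; the exact rules PLINK applies are not given on this page, so the criteria here (human chromosome codes, integer positions, a positive OR unless effects are already on the log scale, a strictly positive SE, and '0' or 'NA' meaning a missing allele) are assumptions. `row_problems` and `VALID_CHR` are hypothetical names. ALLELE_MISMATCH needs a comparison across files and is not covered here.

```python
import math

# Assumption: default human chromosome codes are the valid ones.
VALID_CHR = {str(c) for c in range(1, 27)} | {"X", "Y", "XY", "MT"}


def _finite(text):
    """Parse a number, returning None for NA, garbage, inf or nan."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def row_problems(row, use_map=True, use_alleles=True, log_effects=False):
    """Return the (assumed) problem codes for one parsed row of column -> string."""
    problems = []
    if use_map:  # mirrors the no-map option when False
        if row.get("CHR") not in VALID_CHR:
            problems.append("BAD_CHR")
        if not row.get("BP", "").isdigit():
            problems.append("BAD_BP")
    es = _finite(row.get("OR"))
    # An OR must be positive; a log-scale effect or BETA may be any finite value.
    if es is None or (not log_effects and es <= 0):
        problems.append("BAD_ES")
    se = _finite(row.get("SE"))
    if se is None or se <= 0:
        problems.append("BAD_SE")
    if use_alleles:  # mirrors the no-allele option when False
        if row.get("A1", "0") in ("0", "NA"):
            problems.append("MISSING_A1")
        if row.get("A2", "0") in ("0", "NA"):
            problems.append("MISSING_A2")
    return problems


ok = {"CHR": "22", "BP": "14433758", "A1": "A", "A2": "G", "OR": "0.7949", "SE": "0.5862"}
no_a1 = {"CHR": "22", "BP": "15365080", "A1": "0", "A2": "G", "OR": "NA", "SE": "NA"}
print(row_problems(ok))     # []
print(row_problems(no_a1))  # ['BAD_ES', 'BAD_SE', 'MISSING_A1']
```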
The main output is in the file
     plink.meta
for example,
      CHR         BP         SNP  A1  A2   N        P     P(R)      OR   OR(R)       Q       I
       22   14433758    rs915677   A   G   2   0.2217   0.2217  0.5823  0.5823  0.4184    0.00
       22   15252977    rs131564   C   G   2   0.2608   0.2608  0.6665  0.6665  0.4924    0.00
       22   15274688   rs4010550   G   A   2    0.298   0.3545  0.6748  0.6673  0.2489   24.79
       22   15462210  rs11089263   A   C   2   0.3992   0.3992  1.3108  1.3108  0.3600    0.00
       22   15462259  rs11089264   A   G   2   0.4719   0.4719  1.2606  1.2606  0.4079    0.00
       22   15475051   rs2154615   T   C   2   0.5518   0.5518  1.2876  1.2876  0.7534    0.00
       22   15476541   rs5993628   A   G   2   0.8014   0.8014  1.0948  1.0948  0.3380    0.00
       22   15549842   rs2845362   C   G   2    0.865   0.9789  0.9399  0.9854  0.1307   56.23
which has the following fields:
     CHR       Chromosome code
     BP        Basepair position
     SNP       SNP identifier
     A1        First allele code
     A2        Second allele code
     N         Number of valid studies for this SNP
     P         Fixed-effects meta-analysis p-value
     P(R)      Random-effects meta-analysis p-value
     OR        Fixed-effects OR estimate
     OR(R)     Random-effects OR estimate
     Q         p-value for Cochran's Q statistic
     I         I^2 heterogeneity index (0-100)
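The output fields correspond to standard inverse-variance meta-analysis quantities. The sketch below computes them for one SNP from log-scale effects and their standard errors: fixed-effects weights w = 1/SE^2, Cochran's Q = sum of w(beta - beta_F)^2 on k - 1 degrees of freedom, I^2 = max(0, (Q - df)/Q) x 100, and DerSimonian-Laird random effects. This page does not name the random-effects estimator, so DerSimonian-Laird is an assumption, though it is consistent with the sample output, where OR equals OR(R) exactly when I is 0 (tau^2 is 0 whenever Q does not exceed df). The helpers `chi2_sf`, `normal_p` and `meta` are hypothetical and use closed-form tail probabilities so no SciPy is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = math.exp(-x / 2)
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: erfc part plus a finite series in sqrt(x).
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * math.exp(-x / 2) * root
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def normal_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def meta(betas, ses):
    """Fixed effects plus (assumed) DerSimonian-Laird random effects for one SNP.

    betas are log-scale effects (ln OR, or BETA for a quantitative trait).
    """
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    b_f = sum(wi * bi for wi, bi in zip(w, betas)) / sw
    se_f = math.sqrt(1 / sw)
    q = sum(wi * (bi - b_f) ** 2 for wi, bi in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    denom = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / denom) if df > 0 and denom > 0 else 0.0
    w_r = [1 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    b_r = sum(wi * bi for wi, bi in zip(w_r, betas)) / sum(w_r)
    se_r = math.sqrt(1 / sum(w_r))
    return {
        "N": len(betas),
        "P": normal_p(b_f / se_f),
        "P(R)": normal_p(b_r / se_r),
        "OR": math.exp(b_f),
        "OR(R)": math.exp(b_r),
        "Q": chi2_sf(q, df),  # like plink.meta, report the p-value of Q
        "I": i2,
    }
```

For a case-control study the betas would be `math.log(OR)`; with the logscale or qt options the reported effect would be used directly.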
The effect (OR, or BETA for a quantitative trait) is reported with respect to the A1 allele; for example, an OR greater than 1 implies that A1 increases risk relative to A2.
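Reporting every effect relative to one A1 allele means that a study which labels the same two alleles the other way round describes the inverse effect. The sketch below assumes such a study is reconciled by negating its log-scale effect (equivalently OR becomes 1/OR), while any other allele combination is treated as ALLELE_MISMATCH; this page does not say whether PLINK itself flips swapped alleles or rejects them, and strand flips (e.g. A/G versus T/C) are not handled. `align_to_reference` is a hypothetical name.

```python
import math


def align_to_reference(ref_a1, ref_a2, a1, a2, log_effect):
    """Express a study's log-scale effect relative to the reference A1 allele.

    Returns the (possibly sign-flipped) effect, or None for an allele mismatch.
    """
    if (a1, a2) == (ref_a1, ref_a2):
        return log_effect
    if (a1, a2) == (ref_a2, ref_a1):
        return -log_effect  # swapped labels: OR -> 1/OR
    return None  # would be reported as ALLELE_MISMATCH


# rs915677 has OR 0.7949 for A vs G in s1.assoc; a study coding G as A1
# would express the same association as 1/0.7949 for G vs A.
flipped = align_to_reference("G", "A", "A", "G", math.log(0.7949))
print(round(math.exp(flipped), 4))
```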

HINT If an input file is gzip-compressed and has a .gz extension, PLINK will automatically decompress it (provided PLINK was compiled with ZLIB support).
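The same extension-based convention is easy to mirror in scripts that pre- or post-process result files. This is a minimal sketch using Python's standard `gzip` module; `open_result_file` is a hypothetical helper and the temporary file exists only for the demonstration.

```python
import gzip
import os
import tempfile


def open_result_file(path):
    """Open a result file as text, decompressing transparently if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "s1.assoc.gz")
    with gzip.open(path, "wt") as out:
        out.write("SNP OR SE\nrs915677 0.7949 0.5862\n")
    with open_result_file(path) as handle:
        header = handle.readline().split()
print(header)  # ['SNP', 'OR', 'SE']
```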

Misc. options

Additional options can be specified after the list of result files. Because --meta-analysis takes a variable number of files as arguments, these options must be explicitly separated from the file list by a plus sign, as follows:
plink --meta-analysis s1.assoc s2.assoc + report-all

In this example, the report-all option means that SNPs found in only a single file are also reported. The full list of options is:
     study        Collate study-specific effect estimates in plink.meta (F0, F1, ...)
     no-map       Do not look for or use CHR/BP positions (e.g. if absent from files)
     no-allele    Do not look for or use A1/A2 allele codes (e.g. if absent from files)
     report-all   Report SNPs seen in only a single file
     logscale     Indicates that effects are already on the log scale (e.g. beta from logistic regression)
     qt           Indicates that effects are from linear regression (i.e. BETA, not OR; do not take the log)
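The plus-sign convention and the two scale options can be sketched as follows: everything before `+` is a result file, everything after is an option word, and logscale or qt means the effect column is used as-is instead of being log-transformed. This is an illustration of the described behavior, not PLINK's argument parser; `split_meta_args`, `to_log_effect` and the strict rejection of unknown words are assumptions.

```python
import math

KNOWN_OPTIONS = {"study", "no-map", "no-allele", "report-all", "logscale", "qt"}


def split_meta_args(args):
    """Split the arguments after --meta-analysis into (files, options) at '+'."""
    if "+" in args:
        cut = args.index("+")
        files, options = list(args[:cut]), set(args[cut + 1:])
    else:
        files, options = list(args), set()
    unknown = options - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return files, options


def to_log_effect(value, options):
    """Put an effect on the log scale unless logscale or qt says it already is."""
    if "logscale" in options or "qt" in options:
        return value
    return math.log(value)  # odds ratio -> log odds ratio


files, opts = split_meta_args(["s1.assoc", "s2.assoc", "+", "report-all"])
print(files, sorted(opts))  # ['s1.assoc', 's2.assoc'] ['report-all']
```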
Selecting subsets of SNPs: the --extract option, as well as --chr and similar options, can be used to read in and meta-analyze only certain subsets of SNPs.

HINT When meta-analyzing many large files (e.g. 10+ files of imputed results, each with over 2 million entries), it may be necessary to run the analysis one chromosome at a time with the --chr option, as otherwise all the result files might not fit in memory at once.
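The memory saving from restricting to one chromosome (or to an --extract list) comes from discarding non-matching rows while reading, so only the subset is ever held. The sketch below streams any iterable of lines, such as an open file, and yields only the matching rows; it is an analogy to what --chr and --extract achieve, not PLINK's implementation, and `rows_for_subset` is a hypothetical name. The example rows are taken from the s1.assoc table above.

```python
def rows_for_subset(lines, chrom=None, keep_snps=None):
    """Yield rows for one chromosome and/or a SNP set, reading lines lazily."""
    it = iter(lines)
    header = next(it).split()
    chr_col = header.index("CHR") if chrom is not None else None
    snp_col = header.index("SNP")
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if chr_col is not None and fields[chr_col] != str(chrom):
            continue  # like --chr: drop other chromosomes immediately
        if keep_snps is not None and fields[snp_col] not in keep_snps:
            continue  # like --extract: drop SNPs not in the list
        yield dict(zip(header, fields))


example = [
    "CHR SNP BP A1 A2 OR SE",
    "22 rs915677 14433758 A G 0.7949 0.5862",
    "22 rs131564 15252977 C G 0.5058 0.5401",
]
print([r["SNP"] for r in rows_for_subset(example, chrom=22)])  # ['rs915677', 'rs131564']
print([r["SNP"] for r in rows_for_subset(example, keep_snps={"rs131564"})])  # ['rs131564']
print(list(rows_for_subset(example, chrom=21)))  # []
```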
 

This document last modified Wednesday, 25-Jan-2017 11:39:27 EST