Order of major operations in PLINK
This section contains a rough flow-chart of some of the main
operations in PLINK. It is designed to indicate the order in which
certain operations are performed (e.g. whether SNPs are excluded
before or after merging files) and the points at which PLINK halts
operation (e.g. after certain commands), which means that certain
combinations of commands are not feasible.
Most of these steps are optional: a step occurs only if the
corresponding command has been issued on the command line.
- Parse command line, for commands and options
- Check version, unused options, warnings
- Define chromosome set (human, or --mouse, --rice, etc)
- Run ID-helper utility (--id-dict and --id-match), then QUIT
- Run SNP-annotation (--lookup and --lookup-gene), then QUIT
- Run compression/decompression utility (--compress and --decompress), then QUIT
- Read input, either:
  - Dummy dataset (--dummy), or
  - Simulated dataset (--simulate), or
  - Result files for meta-analysis (--meta-analysis), or
  - Result files for gene-based report (--gene-report), or
  - Result and annotation files (--annotate), or
  - Maps for CNVs (--cfile, --cnv-list), or
  - Binary fileset (--bfile), or
  - PED fileset (--file), or
  - LGEN fileset (--lfile), or
  - Transposed fileset (--tfile), or
  - Maps for generic variants (--gfile), or
  - Map and dosage files (--dosage)
- For commands not involving basic SNP or CNV data directly (e.g. --meta-analysis, --annotate,
--dosage, --gene-report, etc), call the corresponding function directly, then QUIT
- At this stage, the following filters are applied directly when loading
  (note: some other filters not listed here, e.g. --snps, --extract,
  --remove and --filter-males, are applied later):
  - --chr
  - --snp, --window
  - --from, --to
  - --from-kb, --to-kb, etc
- Check for duplicate individual or SNP names
- Merge one or more filesets (--merge, --bmerge,
--merge-list)
- Swap in alternate phenotype file (--pheno), or make a
new phenotype (--make-pheno)
- Remove individuals with missing phenotypes (--prune)
- Update SNP information (--update-map)
- Update FAM information (--update-ids, --update-sex, ...)
- Update allele information (--update-alleles)
- Flip strand (--flip)
- Recode alleles 1234/ACGT (--alleleACGT, --allele1234)
- Either, if (--exclude-before-extract), then
  - extract any SNPs (--extract)
  - then exclude any SNPs (--exclude)
- otherwise
  - exclude any SNPs (--exclude)
  - then extract any SNPs (--extract)
- Either, if (--keep-before-remove), then
  - keep any individuals (--keep)
  - then remove any individuals (--remove)
- otherwise
  - remove any individuals (--remove)
  - then keep any individuals (--keep)
- Filter SNPs based on attributes (--attrib)
- Filter individuals based on attributes (--attrib-indiv)
- Filter SNPs based on quality scores (--qual-scores)
- Filter genotypes based on quality scores (--qual-geno-scores)
- Random thinning of SNPs (--thin)
- Read --genome-lists
- Read list of obligatory missing genotypes (--oblig-missing)
- Filter based on a variable (--filter)
- Filter based on sex, phenotype, etc (--filter-males, --filter-cases, ...)
- Read covariate file (--covar)
- Read cluster file (--within)
- Zero-out specific genotypes (--zero-cluster)
- Process rare CNV data
  - Read CNV list, map to genomic positions
  - Filter on genes, sizes, types, etc (--cnv-intersect, --cnv-del, --cnv-kb, etc)
  - Write back any genes, regions intersected (--cnv-report-regions)
  - Filter CNVs based on frequency (--cnv-freq-exclude-above, etc)
  - Report basic count of CNVs in LOG file
  - Write a new CNV list, map file (--cnv-write, --cnv-make-map)
  - Calculate per-individual CNV summary statistics
  - Calculate per-position CNV summaries
  - Make summary displays (--cnv-track, --cnv-seglist)
  - Find overlapping CNVs as pools (--segment-group)
  - Perform association / genome-wide burden test (--mperm, --cnv-indiv-perm)
  - QUIT
- Process generic variant data (--gfile)
  - Read GVAR data (might be on top of existing, standard file)
  - Calculate frequency statistics for each allele, CNP state
  - Perform linear/logistic regression of phenotype on CNP states
  - QUIT
- Main SNP filters
- Count founders and nonfounders
- Calculate per-individual genotyping rate, remove individuals below threshold (--missing, --mind)
- Calculate allele frequencies, or read them from file (--read-freq)
- Determine per SNP missing genotype rate, after removing individuals, exclude below threshold (--geno)
- Determine minor (reference) allele
- List any heterozygous haploid genotypes found; by default these are set to missing
- List SNPs with no founder genotypes observed
- Write allele frequencies to file (--freq)
- Calculate HWE statistics per SNP (--hardy, --hwe); after --hardy, then QUIT
- Report genotyping rate per SNP and per individual as calculated above (--missing)
- Remove SNPs below the MAF filter (--maf)
- Re-report basic case/control counts to LOG
- Re-specify reference alleles (--reference-allele)
- Make family units, if needed; perform Mendel checks
(--mendel, --me, --tdt, etc)
- Reset pat and mat codes of non-founders if parents not present (--make-founders)
- Perform sex-check (--check-sex)
- Create pseudo case/control units from trio data (--tucc)
- Write permuted phenotype file (--make-perm-pheno), QUIT
- Write table of SNPs/set scoring (--set-table), QUIT
- Write covariate file (--write-covar), then QUIT
- Write cluster file (--write-cluster), then QUIT
- Write snplist file (--write-snplist), then QUIT
- Write binary fileset file (--make-bed), then QUIT
- Write other file formats for genotype data (--recode,
--recodeA, --list, --two-locus, etc), then QUIT
- Create and output a SET file given ranges (--make-set), then QUIT
- LD-based clumping of association results (--clump), then QUIT
- Generate lists of SNPs tagging other SNPs (--show-tags), then QUIT
- Generate haplotype blocks (--blocks), then QUIT
- Determine if conditioning SNPs used (--condition)
- Perform IBS, cluster analysis and MDS analysis
(--cluster, --mds-plot, --neighbour), then QUIT
- Test for differences in IBS between groups (--ibs-test), then QUIT
- Calculate genome-wide IBS and IBD (--genome), then QUIT
- Calculate F inbreeding statistic (--het)
- Calculate runs of homozygosity (--homozyg), then QUIT
- Perform LD-based pruning of SNPs (--indep, --indep-pairwise), then QUIT
- Perform LD-based scan for strand flips (--flipscan), then QUIT
- Calculate and display pairwise LD (--r2, --ld), then QUIT
- General haplotype estimation: association, phase reports, frequencies (--hap)
  - Phasing
  - Report haplotype frequencies
  - Report haplotype phases
  - Perform mis-hap test for non-random missingness
  - Proxy association and imputation
  - QUIT
- SNP-by-SNP epistasis tests (--epistasis), then QUIT
- Score per-individual risk profiles (--score), then QUIT
- Run R-plugin on dataset (--R), then QUIT
- For main association tests, loop over all phenotypes (--all-pheno)
  - Perform association test (--mh, --model,
    --assoc, --fisher, --linear,
    --logistic, --homog, --qfam,
    --tdt, --poo, --dfam, --gxe, etc)
  - Perform haplotype association test (--hap-assoc, --hap-tdt)
  - Perform conditional haplotype test (--chap), then QUIT
  - Perform --test-missing
  - If specified, repeat the above tests with permuted datasets
  - Go to next phenotype
- Perform PLINK segmental sharing test
- Definitely QUIT
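
The practical effect of the ordering in the main SNP filters step can be illustrated with a small sketch. The Python code below is not PLINK's implementation. It is a toy model that assumes genotypes are coded as minor-allele counts (0, 1, 2) with None for a missing call, and that all individuals are founders. The function names, the data and the thresholds are hypothetical and chosen only for illustration. It mirrors the order given in the list above: individuals are removed first (--mind), per-SNP missingness is then evaluated on the remaining individuals (--geno), and the MAF filter (--maf) is applied last.

```python
# Toy model of the filter order in the flow-chart: --mind, then --geno, then --maf.
# NOT PLINK code. Genotypes are minor-allele counts (0/1/2), None = missing;
# all function names, data and thresholds are illustrative assumptions.

def missing_rate(calls):
    """Fraction of missing (None) entries; an empty list counts as fully missing."""
    if not calls:
        return 1.0
    return sum(g is None for g in calls) / len(calls)


def apply_mind(geno, threshold):
    """Remove individuals whose missing-genotype rate exceeds the threshold."""
    return {iid: calls for iid, calls in geno.items()
            if missing_rate(calls) <= threshold}


def select_snps(geno, snps, keep):
    """Restrict each individual's calls, and the SNP list, to the indices in `keep`."""
    new_geno = {iid: [calls[j] for j in keep] for iid, calls in geno.items()}
    return new_geno, [snps[j] for j in keep]


def apply_geno(geno, snps, threshold):
    """Remove SNPs whose missing rate among the remaining individuals exceeds the threshold."""
    keep = [j for j in range(len(snps))
            if missing_rate([calls[j] for calls in geno.values()]) <= threshold]
    return select_snps(geno, snps, keep)


def apply_maf(geno, snps, threshold):
    """Remove SNPs whose minor allele frequency is below the threshold."""
    keep = []
    for j in range(len(snps)):
        observed = [calls[j] for calls in geno.values() if calls[j] is not None]
        if not observed:
            continue  # no genotypes observed for this SNP: drop it
        freq = sum(observed) / (2 * len(observed))
        if min(freq, 1 - freq) >= threshold:
            keep.append(j)
    return select_snps(geno, snps, keep)


def filter_in_flowchart_order(geno, snps, mind, max_missing, maf):
    """Individuals first, then per-SNP missingness, then MAF."""
    geno = apply_mind(geno, mind)
    geno, snps = apply_geno(geno, snps, max_missing)
    return apply_maf(geno, snps, maf)


# Hypothetical data: four individuals, three SNPs.
GENO = {
    "A": [0, 1, 0],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
    "D": [None, None, 1],
}
SNPS = ["rs1", "rs2", "rs3"]

kept_geno, kept_snps = filter_in_flowchart_order(
    GENO, SNPS, mind=0.5, max_missing=0.2, maf=0.1)
print(sorted(kept_geno), kept_snps)  # ['A', 'B', 'C'] ['rs1', 'rs2']
```

In this toy data, individual D is removed first, so rs1 and rs2 pass the missingness filter, while rs3 becomes monomorphic among the remaining individuals and fails the MAF filter. If the per-SNP missingness filter were applied before the per-individual one, rs1 and rs2 would be dropped instead and only rs3 would remain, which is why the order of these steps matters.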