PLINK: Whole genome data analysis toolset
Data management tools
PLINK provides a simple interface for recoding, reordering, merging,
flipping DNA-strand and extracting subsets of data.
Recode and reorder a sample
A basic but often useful feature is to output a new file in which a) the
PED file markers are reordered by physical position, b) excluded SNPs
(those with negative values in the MAP file) are removed from the new
PED file, and c) the SNPs are optionally recoded to a 1/2 coding (the
second command below, using the --recode12 option):
plink --file data --recode
or
plink --file data --recode12
Both commands create two new files:
plink-recode.ped
plink-recode.map
("plink" would be replaced by any specified --output {filename} )
Flip DNA strand for SNPs
This command will read the list of SNPs in the file list.txt
and flip the strand for these SNPs, then save a new PED/MAP file
(i.e. by using the --recode command):
plink --file data --flip list.txt --recode
The list.txt should just be a simple list of SNP IDs, one SNP per
line.
Merge two PED/MAP files
To merge two PED/MAP files:
plink --file data1 --merge data2.ped data2.map --recode --out merge
The --merge option must be followed by 2 arguments: the name
of the second PED file and the name of the second MAP file, i.e. the
files to be merged with the first data set. The
--recode option is necessary to output the newly merged file;
the --out option names the new files
merge-recode.ped and merge-recode.map.
The --merge option can also be used with binary PED files,
either as input or output, but not as the second file. For example,
plink --bfile data1 --merge data2.ped data2.map --make-bed --out merge
will create merge.bed, merge.fam and
merge.bim, as the --make-bed option was used instead
of the --recode option. Likewise, the data1.* files
point to a binary PED file set; data2.ped, however, must be a standard
plain-text PED file. In other words, you cannot merge two
binary files; all other combinations are okay.
The two PED files can overlap completely, partially, or not at
all, in terms of both markers and individuals. Genotypes that neither
file provides will be set to missing (i.e. if SNP_B is not measured in
the first file, but it is in the second, then any individuals in the
first file who are not also present in the second file will be set to
missing for SNP_B).
Any existing genotype data (i.e. in data1.ped)
will not be over-written by data in the second file
(data2.ped).
A warning will be given if the chromosome and/or physical position
differ between the two MAP files.
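Such discrepancies can also be looked for before merging. The following is a minimal sketch, not part of PLINK itself, assuming the standard four-column MAP layout (chromosome, SNP ID, genetic distance, physical position). The toy data and the output file name map_conflicts.txt are hypothetical.

```shell
# Toy MAP files (hypothetical data): chromosome, SNP ID, genetic distance, position
printf '1 snp005 0 1000\n1 snp008 0 2000\n' > data1.map
printf '1 snp008 0 2500\n2 snp101 0 3000\n' > data2.map

# First file: remember chromosome and position for each SNP ID.
# Second file: report SNPs present in both files whose chromosome
# or position disagrees, as "SNP chr:pos(file 1) chr:pos(file 2)".
awk 'NR == FNR { chr[$2] = $1; pos[$2] = $4; next }
     ($2 in chr) && (chr[$2] != $1 || pos[$2] != $4) {
         print $2, chr[$2] ":" pos[$2], $1 ":" $4
     }' data1.map data2.map > map_conflicts.txt

cat map_conflicts.txt
```

An empty map_conflicts.txt would suggest that the shared SNPs agree on chromosome and position in both MAP files.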
Extract a subset of SNPs
Sometimes, to exclude certain SNPs, it is more convenient to specify a
list of required SNPs and make a new file,
rather than having to set the 4th column of the MAP file to
negative/positive values for every SNP. This is achieved by using the
command
plink --file data --extract mysnps.txt
where the file is just a list of SNPs, one per line, e.g.
snp005
snp008
snp101
This command will create two new files:
plink-extract.ped
plink-extract.map
Again, "plink" would be replaced by any specified --output
{filename}.
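A SNP list of this kind can be generated from the MAP file rather than typed by hand. The following is a minimal sketch, assuming the standard four-column MAP layout (chromosome, SNP ID, genetic distance, physical position). The toy data and the choice of chromosome 1 are for illustration only.

```shell
# Toy MAP file (hypothetical data): chromosome, SNP ID, genetic distance, position
printf '1 snp005 0 1000\n1 snp008 0 2000\n2 snp101 0 3000\n' > data.map

# Write the SNP ID (column 2) of every marker on chromosome 1, one per line
awk '$1 == "1" { print $2 }' data.map > mysnps.txt

# The list could then be passed to PLINK as described above:
#   plink --file data --extract mysnps.txt
cat mysnps.txt
```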
Remove a subset of SNPs
To re-write the PED/MAP files, but with certain SNPs excluded, use the
option
plink --file data --exclude mysnps.txt
where the file mysnps.txt is, as for the --extract
command, just a list of SNPs, one per line.
This option creates the files
plink-extract.ped
plink-extract.map
Another way of removing SNPs is to make the physical position negative
in the MAP file and then use the --recode option.
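Editing the MAP file by hand becomes tedious for more than a few SNPs. The following is a minimal sketch of how the positions could be negated automatically, assuming the standard four-column MAP layout. The file names drop.txt and data_neg.map and the toy data are hypothetical.

```shell
# Toy MAP file and list of SNPs to drop (hypothetical data)
printf '1 snp005 0 1000\n1 snp008 0 2000\n2 snp101 0 3000\n' > data.map
printf 'snp008\n' > drop.txt

# First file: remember the SNP IDs to drop.
# Second file: make column 4 negative for those SNPs (positions that
# are already negative are left alone), then print every line.
awk 'NR == FNR { drop[$1] = 1; next }
     ($2 in drop) && $4 > 0 { $4 = -$4 }
     { print }' drop.txt data.map > data_neg.map

cat data_neg.map
```

The edited MAP file would then be used in place of data.map when running the --recode command.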
Extract a subset of individuals
To re-write the PED/MAP files, but with only certain individuals included,
use the option
plink --file data --keep mylist.txt
where the file mylist.txt is, as for the --remove
command, just a list of Family ID / Individual ID pairs, one set per
line, i.e. one person per line.
Remove a subset of individuals
To re-write the PED/MAP files, but with certain individuals excluded,
use the option
plink --file data --remove mylist.txt
where the file mylist.txt is, as for the --keep
command, just a list of Family ID / Individual ID pairs, one set per
line, i.e. one person per line.
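The list of individuals for --keep or --remove can be derived from the PED file, whose first two columns hold the Family ID and Individual ID. The following is a minimal sketch, assuming the sixth column holds the phenotype with 2 coding an affected individual. The toy data and the selection rule are for illustration only.

```shell
# Toy PED file (hypothetical data):
# Family ID, Individual ID, father, mother, sex, phenotype, genotypes
printf 'F1 1 0 0 1 2 A A\nF1 2 0 0 2 1 A G\nF2 1 0 0 1 2 G G\n' > data.ped

# Write the Family ID / Individual ID pair of each affected individual
# (phenotype 2, an assumption about the coding used), one person per line
awk '$6 == 2 { print $1, $2 }' data.ped > mylist.txt

# The same list works with either option:
#   plink --file data --keep mylist.txt
#   plink --file data --remove mylist.txt
cat mylist.txt
```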
Make a Binary PED file
To create a binary PED file use the following command, which will
create the three files necessary:
plink --file mydata --make-bed
which creates (by default)
plink.bed ( binary file, genotype information )
plink.fam ( first six columns of mydata.ped )
plink.bim ( extended MAP file: two extra cols = allele names)
You can specify a different output root file name (i.e. different to
"plink") by using the --output (or --out) option:
plink --file mydata --output mydata --make-bed
which will create
mydata.bed
mydata.fam
mydata.bim