PLINK: Whole genome data analysis toolset
Data management tools
PLINK provides a simple interface for recoding, reordering and merging data sets, flipping the DNA strand of SNPs, and extracting subsets of data.


Recode and reorder a sample

A basic but often useful feature is to write out a new fileset in which a) the PED file markers are reordered by physical position, b) SNPs flagged for exclusion (negative values in the MAP file) are dropped from the new PED file, and c) optionally, the SNPs are recoded to a 1/2 coding (by using --recode12 instead of --recode).

plink --file data --recode

or

plink --file data --recode12

Each command creates two new files:
     plink-recode.ped
     plink-recode.map
("plink" would be replaced by any specified --output {filename} )
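
To make points a) and b) concrete, here is a minimal Python sketch of the reordering and exclusion steps. It is only an illustration, not PLINK's actual implementation: the function name recode and the in-memory row layout are hypothetical, the tie-breaking by chromosome label is an assumption, and the 1/2 recoding of --recode12 is left out because the mapping rule is not described here.

```python
def recode(map_rows, ped_rows):
    """Reorder markers by physical position and drop negative-position SNPs.

    map_rows: list of (chrom, snp_id, genetic_distance, bp_position) tuples
    ped_rows: list of token lists; 6 leading fields, then 2 alleles per SNP
    """
    # Keep only markers whose physical position is not negative.
    keep = [i for i, row in enumerate(map_rows) if row[3] >= 0]
    # Assumption: order by chromosome label (plain string comparison),
    # then by base-pair position within the chromosome.
    keep.sort(key=lambda i: (map_rows[i][0], map_rows[i][3]))

    new_map = [map_rows[i] for i in keep]
    new_ped = []
    for tokens in ped_rows:
        out = list(tokens[:6])  # the six leading non-genotype fields
        for i in keep:
            out.extend(tokens[6 + 2 * i: 8 + 2 * i])  # both alleles of marker i
        new_ped.append(out)
    return new_map, new_ped


# Hypothetical toy data: snp2 has a negative position and is excluded.
map_rows = [("1", "snp3", 0.0, 3000), ("1", "snp1", 0.0, 1000), ("1", "snp2", 0.0, -2000)]
ped_rows = [["F1", "I1", "0", "0", "1", "2", "A", "A", "C", "T", "G", "G"]]
new_map, new_ped = recode(map_rows, ped_rows)
print([row[1] for row in new_map])
print(new_ped[0][6:])
```

The genotype columns are moved together with their markers, so the new PED file stays consistent with the new MAP file.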


Flip DNA strand for SNPs

This command will read the list of SNPs in the file list.txt and flip the strand for these SNPs, then save a new PED/MAP file (i.e. by using the --recode command):

plink --file data --flip list.txt --recode

The list.txt should just be a simple list of SNP IDs, one SNP per line.
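
As an illustration of what flipping means for one line of a PED file, the following Python sketch swaps each allele of the listed SNPs for its complement (A with T, C with G). The function name flip_strand and the token-list layout are hypothetical, and leaving unrecognised codes (such as a missing-genotype "0") untouched is an assumption of this sketch.

```python
# Complementary bases on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(snp_ids, ped_tokens, flip_list):
    """Return a copy of one PED line with the listed SNPs flipped.

    snp_ids:    SNP IDs in MAP-file order
    ped_tokens: 6 leading fields, then 2 alleles per SNP
    flip_list:  SNP IDs read from a file such as list.txt
    """
    flip = set(flip_list)
    out = list(ped_tokens[:6])
    for i, snp in enumerate(snp_ids):
        alleles = ped_tokens[6 + 2 * i: 8 + 2 * i]
        if snp in flip:
            # Assumption: codes outside A/C/G/T (e.g. "0") are left as they are.
            alleles = [COMPLEMENT.get(a, a) for a in alleles]
        out.extend(alleles)
    return out


# Hypothetical toy line: only snp2 is flipped.
line = ["F1", "I1", "0", "0", "1", "2", "A", "C", "G", "0"]
flipped = flip_strand(["snp1", "snp2"], line, ["snp2"])
print(flipped[6:])
```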


Merge two PED/MAP files

To merge two PED/MAP files:

plink --file data1 --merge data2.ped data2.map --recode --out merge

The --merge option must be followed by 2 arguments: the name of the PED file and the name of the MAP file to be merged in (here data2.ped and data2.map). The --recode option is necessary to output the newly merged fileset; with the --out option above, the new files are called merge-recode.ped and merge-recode.map.

The --merge option can also be used with binary PED files, either as input or output, but not as the second file: i.e.

plink --bfile data1 --merge data2.ped data2.map --make-bed --out merge

will create merge.bed, merge.fam and merge.bim, as the --make-bed option was used instead of the --recode option. Likewise, the data1.* files point to a binary PED fileset; data2.ped, however, must be a standard plain-text PED file. In other words, you cannot merge two binary filesets; all other combinations are okay.

The two PED files can overlap completely, partially, or not at all, in terms of both markers and individuals. Genotypes that have to be filled in are set to missing: for example, if SNP_B is not measured in the first file but is in the second, then any individuals in the first file who are not also present in the second file will be set to missing for SNP_B. Any existing genotype data (i.e. in data1.ped) will not be overwritten by data in the second file (data2.ped).

A warning will be given if the chromosome and/or physical position differ between the two MAP files.
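
The merge rules described above can be sketched in Python as follows. This is a simplified model rather than PLINK's code: the function name merge and the dictionary layout are hypothetical, "0" is taken as the missing-allele code, and treating any genotype already stored in the first file as "existing" (even a missing one) is an assumption of the sketch.

```python
MISSING = ("0", "0")  # assumed missing-genotype code


def merge(first, second):
    """Merge two genotype sets keyed by (family ID, individual ID).

    Each argument maps a person to a dict {snp_id: (allele1, allele2)}.
    """
    # Union of individuals, first file's people first.
    people = list(first) + [p for p in second if p not in first]
    # Union of markers, in order of first appearance.
    snps = []
    for data in (first, second):
        for genotypes in data.values():
            for snp in genotypes:
                if snp not in snps:
                    snps.append(snp)

    merged = {}
    for person in people:
        a = first.get(person, {})
        b = second.get(person, {})
        merged[person] = {}
        for snp in snps:
            if snp in a:
                # Data from the first file is never overwritten.
                merged[person][snp] = a[snp]
            else:
                # Otherwise take the second file's call, or set to missing.
                merged[person][snp] = b.get(snp, MISSING)
    return merged


# Hypothetical toy data mirroring the SNP_B example in the text.
first = {("F1", "I1"): {"SNP_A": ("A", "A")},
         ("F3", "I3"): {"SNP_A": ("A", "G")}}
second = {("F1", "I1"): {"SNP_A": ("G", "G"), "SNP_B": ("C", "T")},
          ("F2", "I2"): {"SNP_B": ("C", "C")}}
merged = merge(first, second)
for person, genotypes in merged.items():
    print(person, genotypes)
```

Here individual F3/I3 appears only in the first file, so SNP_B is set to missing for that person, while the SNP_A genotype of F1/I1 keeps its value from the first file.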


Extract a subset of SNPs

Sometimes, to exclude certain SNPs, it is more convenient to specify a list of required SNPs and make a new file, rather than having to set the 4th column of the MAP file to negative/positive values for every SNP. This is achieved by using the command

plink --file data --extract mysnps.txt

where the file is just a list of SNPs, one per line, e.g.
     snp005
     snp008
     snp101
This command will create two new files:
     plink-extract.ped
     plink-extract.map
Again, "plink" would be replaced by any specified --output {filename}.


Remove a subset of SNPs

To re-write the PED/MAP files, but with certain SNPs excluded, use the option

plink --file data --exclude mysnps.txt

where the file mysnps.txt is, as for the --extract command, just a list of SNPs, one per line.

This option creates the files
     plink-extract.ped
     plink-extract.map
Another way of removing SNPs is to make the physical position negative in the MAP file and then use the --recode option.
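
The effect of a SNP list, whether it is used to extract or to exclude, can be sketched in Python as below. The helper names read_snp_list and filter_snps, and the keep_listed flag, are hypothetical and only illustrate the selection logic; they are not part of PLINK.

```python
def read_snp_list(text):
    """Parse the contents of a SNP list file: one SNP ID per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def filter_snps(snp_ids, ped_rows, listed, keep_listed=True):
    """Keep the listed SNPs (extract) or everything else (exclude).

    snp_ids:  SNP IDs in MAP-file order
    ped_rows: list of token lists; 6 leading fields, then 2 alleles per SNP
    """
    keep = [i for i, snp in enumerate(snp_ids) if (snp in listed) == keep_listed]
    new_ids = [snp_ids[i] for i in keep]
    new_ped = []
    for tokens in ped_rows:
        out = list(tokens[:6])
        for i in keep:
            out.extend(tokens[6 + 2 * i: 8 + 2 * i])  # both alleles of marker i
        new_ped.append(out)
    return new_ids, new_ped


# Hypothetical toy data.
listed = read_snp_list("snp005\nsnp008\n")
snp_ids = ["snp005", "snp006", "snp008"]
ped_rows = [["F1", "I1", "0", "0", "1", "1", "A", "A", "C", "T", "G", "G"]]

extracted_ids, extracted_ped = filter_snps(snp_ids, ped_rows, listed, keep_listed=True)
excluded_ids, excluded_ped = filter_snps(snp_ids, ped_rows, listed, keep_listed=False)
print(extracted_ids, extracted_ped[0][6:])
print(excluded_ids, excluded_ped[0][6:])
```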


Extract a subset of individuals

To re-write the PED/MAP files, but with only certain individuals included, use the option

plink --file data --keep mylist.txt

where the file mylist.txt is, as for the --remove command, just a list of Family ID / Individual ID pairs, one set per line, i.e. one person per line.


Remove a subset of individuals

To re-write the PED/MAP files, but with certain individuals excluded, use the option

plink --file data --remove mylist.txt

where the file mylist.txt is, as for the --keep command, just a list of Family ID / Individual ID pairs, one set per line, i.e. one person per line.
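
Selecting individuals works on whole lines of the PED file. The Python sketch below shows the idea under simple assumptions: the helper names read_id_pairs and filter_individuals are hypothetical, the list file is taken to be whitespace-delimited, and any columns after the first two are ignored.

```python
def read_id_pairs(text):
    """Parse a list file: Family ID and Individual ID, one person per line."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def filter_individuals(ped_rows, pairs, keep_listed=True):
    """Keep the listed people (keep_listed=True) or drop them (False)."""
    return [row for row in ped_rows if ((row[0], row[1]) in pairs) == keep_listed]


# Hypothetical toy data: three people, two of them listed.
pairs = read_id_pairs("F1 I1\nF2 I1\n")
ped_rows = [["F1", "I1", "0", "0", "1", "2", "A", "A"],
            ["F1", "I2", "0", "0", "2", "1", "A", "G"],
            ["F2", "I1", "0", "0", "1", "1", "G", "G"]]

kept = filter_individuals(ped_rows, pairs, keep_listed=True)
remaining = filter_individuals(ped_rows, pairs, keep_listed=False)
print([(row[0], row[1]) for row in kept])
print([(row[0], row[1]) for row in remaining])
```

Matching on the pair rather than on the Individual ID alone matters because the same Individual ID (I1 here) can occur in more than one family.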


Make a Binary PED file

To create a binary PED file use the following command, which will create the three files necessary:

plink --file mydata --make-bed

which creates (by default)
     plink.bed      ( binary file, genotype information )
     plink.fam      ( first six columns of mydata.ped ) 
     plink.bim      ( extended MAP file: two extra cols = allele names)
You can specify a different output root file name (i.e. different to "plink") by using the --output (or --out) option:

plink --file mydata --output mydata --make-bed

which will create
     mydata.bed
     mydata.fam
     mydata.bim
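
To show how the two text files of a binary fileset relate to the original PED/MAP files, here is a rough Python sketch that derives .fam-style and .bim-style rows. It is not PLINK's procedure: the function name make_fam_and_bim is hypothetical, the binary .bed file is not modelled at all, "0" is assumed to be the missing-allele code, and the alphabetical allele order is an assumption (PLINK decides the real order of the two allele columns).

```python
def make_fam_and_bim(map_rows, ped_rows):
    """Build .fam-like and .bim-like rows from MAP and PED rows.

    map_rows: list of [chrom, snp_id, genetic_distance, bp_position]
    ped_rows: list of token lists; 6 leading fields, then 2 alleles per SNP
    """
    # .fam: the first six columns of each PED line.
    fam = [row[:6] for row in ped_rows]

    # .bim: the MAP columns plus two extra columns holding the allele names.
    bim = []
    for i, marker in enumerate(map_rows):
        observed = set()
        for row in ped_rows:
            observed.update(a for a in row[6 + 2 * i: 8 + 2 * i] if a != "0")
        # Assumption: list observed alleles alphabetically, pad with "0".
        alleles = sorted(observed)
        alleles += ["0"] * (2 - len(alleles))
        bim.append(list(marker) + alleles[:2])
    return fam, bim


# Hypothetical toy data: snp2 is missing for everyone.
map_rows = [["1", "snp1", "0", "1000"], ["1", "snp2", "0", "2000"]]
ped_rows = [["F1", "I1", "0", "0", "1", "2", "A", "G", "0", "0"],
            ["F1", "I2", "0", "0", "2", "1", "G", "G", "0", "0"]]
fam, bim = make_fam_and_bim(map_rows, ped_rows)
print(fam)
print(bim)
```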