PLINK: Whole genome data analysis toolset

How can I generate a multidimensional scaling plot to represent population substructure in my sample?

To get the multidimensional scaling plot, first use PLINK to generate the matrix file (plink.mdist in this example), then load it in the freely available statistical package R, for example:

	d <- read.table("plink.mdist")
	m <- as.matrix(d)
	mds <- cmdscale(as.dist(1-m))
	plot(mds)

To import labels, e.g. by phenotype:

	gawk ' { print $6 } ' mydata.ped > phenotypes
	gawk ' { print $1 } ' mydata.ped > ids

Then in R, continuing from the commands above:

	# colour each point by phenotype
	p <- read.table("phenotypes")[,1]
	plot(mds,col=p)
	# or label each point with its ID instead of drawing a symbol
	id <- read.table("ids")[,1]
	plot(mds,type="n")
	text(mds,as.character(id))
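
As a variant of the two gawk commands above, the following is a minimal sketch that writes all labels to one file in a single pass, so that IDs and phenotypes cannot drift out of row order. It assumes the standard PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype); the function name make_labels and the sample data are made up for illustration, and plain awk is used in place of gawk.

```shell
#!/bin/sh
# Hypothetical helper (not a PLINK command): write family ID, individual ID
# and phenotype (PED columns 1, 2 and 6) to one tab-separated row per person.
make_labels() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2, $6 }' "$1"
}

# Made-up two-sibling example in PED layout.
ped=$(mktemp)
printf '%s\n' \
    'F1 1 0 0 1 2 A A G G' \
    'F1 2 0 0 2 1 A C G T' > "$ped"

make_labels "$ped"
rm -f "$ped"
```

Keeping the individual ID alongside the family ID may be useful when several individuals share a family ID, as with sibling pairs. The resulting file can be read in R with read.table in the same way as the phenotypes file above.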

To save a plot as EPS:

	plot(mds)
	dev.copy2eps(file="myplot.eps")

To redirect to PDF:

	pdf(file="myplot.pdf")
	plot(mds)
	dev.off()

My sample consists of sibling pairs. Can I perform a sibling-based analysis?

You can use the --within-file or --within options. For example, to permute individuals only within family ID groups, use the following:

plink --file mydata --assoc --within --family

How can I check my file prior to using PLINK?

One approach is to use Unix tools. The main thing to check is that every row has the same number of items. For example,

gawk ' { print NF } ' myfile.ped | sort | uniq -c

should give just one line of output (showing the number of lines and the number of items on each line). More than one line of output indicates that the file is not properly formed.
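
When the count above shows more than one distinct row length, it can help to know which rows are affected. The following is a minimal sketch of one way to do that; the function name check_fields and the sample data are made up for illustration, the first row is assumed to have the correct number of items, and plain awk is used in place of gawk.

```shell
#!/bin/sh
# Hypothetical helper (not part of PLINK): list rows whose field count
# differs from that of the first row; exit status is 1 if any are found.
check_fields() {
    awk 'NR == 1 { expected = NF }
         NF != expected {
             printf "line %d: %d fields (expected %d)\n", NR, NF, expected
             bad = 1
         }
         END { exit bad }' "$1"
}

# Made-up example: the second row is missing one allele.
tmp=$(mktemp)
printf '%s\n' \
    'F1 1 0 0 1 2 A A G G' \
    'F2 1 0 0 2 1 A C G' \
    'F3 1 0 0 1 1 C C G T' > "$tmp"

check_fields "$tmp" || echo "file is not properly formed"
rm -f "$tmp"
```

Because the first row is taken as the reference, a malformed first row would cause every other row to be reported instead, so the summary count above remains the better first check.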

To check pedigree structure, you might use the program PEDSTATS or FAMTYPES.
