| 
1. Introduction 
2. Basic information 
3. Download and general notes 
4. Command reference table 
5. Basic usage/data formats 
6. Data management 
 
7. Summary stats
8. Inclusion thresholds
9. Population stratification
10. IBS/IBD estimation
11. Association
12. Family-based association
13. Permutation procedures
14. LD calculations
15. Multimarker tests
16. Conditional haplotype tests
17. Proxy association
18. Imputation (beta)
19. Dosage data
20. Meta-analysis
21. Annotation
22. LD-based results clumping
23. Gene-based report
24. Epistasis
25. Rare CNVs
26. Common CNPs
27. R-plugins
28. Annotation web-lookup
29. Simulation tools
30. Profile scoring
31. ID helper
32. Resources
33. Flow-chart
34. Miscellaneous
35. FAQ & Hints
36. gPLINK
 | 
 | 
 
Rare copy number variant (CNV) data
This page describes some basic file formats, convenience functions and
analysis options for rare copy number variant (CNV) data. Support
for common copy number polymorphisms (CNPs) is
described here.
Copy number variants are represented as segments. These
segments are essentially represented and analysed in a similar manner
to how PLINK handles runs of homozygosity (defined by a start and stop
site on a given chromosome). Allelic (i.e. basic SNP) information is
not considered here: PLINK skips the usual procedure of
reading in SNP genotype data.
Here we assume that some other software package such as
the Birdsuite
package has previously been used to make calls for either specific
copy-number variable genotypes or to identify particular genomic
regions in individuals that are deletions or duplications, based on
the raw data. That is, PLINK only offers functions for downstream
analysis of CNV data, not for identifying CNVs in the first place,
i.e. similar to the distinction between SNP genotype calling versus
the subsequent analysis of those calls.
In this section, we describe the basic 
format for rare CNV data; the steps involved in 
making a MAP file and
loading the data. We consider ways to filter the
CNV lists by 
type,
genomic location or
frequency. We describe options for relating CNVs 
to phenotype, either at the level of 
genome-wide burden or looking for 
specific associations. Finally, we detail the tools for 
producing reports of any 
genes intersected by CNVs and for displaying groups of 
overlapping CNVs.
Basic support for segmental CNV data
The basic command for reading a list of segmental CN variants is
  plink --cnv-list mydata.cnv
        --fam mydata.fam
        --map mydata.cnv.map
which can be abbreviated
plink --cfile mydata
(note that the map file must have the .cnv.map map extension). The CNV list file
mydata.cnv has the format
     FID     Family ID
     IID     Individual ID
     CHR     Chromosome
     BP1     Start position (base-pair)
     BP2     End position (base-pair)
     TYPE    Type of variant, e.g. 0,1 or 3,4 copies
     SCORE   Confidence score associated with variant 
     SITES   Number of probes in the variant
Having a header row is optional; if the first line starts with FID
it will be ignored.
Note The SCORE and SITES values
are not used in any direct way, except potentially as variates to filter
segments on, as described below. That is, the values of these do not
fundamentally impact the way analysis is performed by PLINK
itself (they might alter the meaning of the results of course, e.g. if
including low-confidence calls into the analysis!). In other words, if
whatever software was used to generate the CNV calls does not supply
some conceptually similar values, it is okay to simply put dummy codes
(e.g. all 0) in these two fields.
The first few lines of a small example file is shown here:
     FID    IID   CHR       BP1       BP2  TYPE   SCORE  SITE
     P1     P1      4  71338469  71459318     1      27     0
     P1     P1      5  31250352  32213542     1    34.2     0
     P1     P1      7  53205351  53481230     3    18.2     0
     P2     P2     11  86736484  87074601     1      22     0  
     P2     P2     14  47817280  47930190     4    55.1     0
     ...
The FAM file format is the first 6 fields of a PED file,
described here; this file lists the sex,
phenotype and founder status of each individual. The MAP file format
is described here, although the next
section how this can be automatically created using
the --cnv-make-map command.
Creating MAP files for CNV data
Prior to any analysis, a dummy MAP first needs to be created (this
step only needs to be performed once per CNV file). This
PLINK-generated MAP file has dummy entries that correspond to the
start and stop sites of all segments.  This facilitates subsequent
parsing and analysis of CNV data by PLINK. 
The --cnv-make-map command is used as follows:
   plink --cnv-list mydata.cnv --cnv-make-map 
which creates a file 
     plink.cnv.map
which will look just like a standard MAP file but with
dummy markers:
     1       p1-51593   0    51593
     1       p1-51598   0    51598
     1       p1-51666   0    51666
     1       p1-52282   0    52282
     1       p1-69061   0    69061
     ...
where the marker names start with the p prefix and contain
chromosome and base-position information. 
As an (unrealistic) example to illustrate how the mapping works,
consider the following, with 3 segments, spanning "positions" 1 to 8,
4 to 12 and 16 to 23.  In this case, 6 unqiue map positions would be 
created, the three start positions and the three stop positions. 
        Base                1111111111222222
        Position   1234567890123456789012345  
        Marker #   1..2...3...4...5......6..
                   |  |   |   |   |      |
                              |   |      |
        Segments   *------*       |      |
                      *-------*             
                                  *------*
The new MAP file would then be
     1 p1-1   0  1
     1 p1-4   0  4
     1 p1-8   0  8
     1 p1-12  0  12
     1 p1-16  0  16
     1 p1-23  0  23
Given such a MAP file, these three segments would then be perfectly
mapped to the corresponding markers (p1-1 to p1-8,
p1-4 to p1-12 and p1-16
to p1-23). The created MAP file is then specified in
subsequent segmental CNV analyses (using --cnv-list) with the
standard --map command (or --cfile command).
Loading CNV data files
Once a suitable MAP file has been created, i.e. with dummy markers
that correspond to the position of every start and stop site of all
segments, use the --cnv-list command again to load in the CNV
segment data. As mentioned above, in addition to the basic CNV file, a
MAP (previously generated) and FAM file (continaing ID and phenotype
information) also need to be specified. For example.
  plink --map plink.cnv.map  --fam mydata.fam  --cnv-list mydata.cnv
Alternatively, if the MAP, FAM and CNV list files all have the same
root, the command
  plink --cfile study1
is equivalent, i.e. it implies the following files exist
     study1.cnv
     study1.cnv.map
     study1.fam
By default either command will simply load in the CNV data and produce
a report in the LOG file, enumerating the number of CN states in the
total dataset and any filtering processes applied. For example,
     Reading segment list (CNVs) from [ cnv1.list ]
     714 of 2203 mapped as valid segments
     1872 mapped to a person, of which 714 passed filters
        CopyN  Count
        0      46
        1      339
        3      200
        4      129
     Writing segment summary to [ plink.cnv.indiv ]
This indicates that of 2203 total segments (i.e. should correspond to
number of lines in the cnv1.list file, allowing for any
header) 1872 are mapped to a person in the dataset. In other words,
some of the segments in cnv1.list are for individuals not in
cnv1.fam.  These are simply ignored; for example, these
individuals might have been filtered out of the study for other
reasons, e.g. QC based on standard SNP genotypes. Of these, 714 passed
the further set of filters, as described below. As described below,
segments can be filtered based on genomic location, frequency, size,
quality score/number of sites and type (duplication or deletion).
It will also be reported in the LOG file if some of the segments do
not map to a marker in the MAP file: if this is because you've
used --chr or similar commands to restrict the portion of the
data examined, you can safely ignore this line; otherwise, it might
mean that the appropriate MAP file wasn't created
(e.g. using --cnv-make-map) for that CNV file.
By default, PLINK will create a file that summarises per
individual events (after any filtering has been applied), in a file
named
     plink.cnv.indiv
which has the fields, one row per person, in the same order as the original FAM file:
     FID     Family ID
     IID     Individual ID
     PHE     Phenotype 
     NSEG    Number of segments that individual has
     KB      Total kilobase distance spanned by segments 
     KBAVG   Average segment size
PLINK will also create a file
     plink.cnv.summary
that represents a count of CNVs, in cases (AFF) and controls (UNAFF)
that overlap each map position.
Checking for overlapping CNV calls (within the same individual)
As a sanity check of a CNV file: to check whether segments are
overlapping for the same person (e.g. if a deletion and a duplication
event had been specified for the same person in the same region, or if
the same event is listed twice), use the option
 plink --cfile mydata --cnv-check-no-overlap
If there is overlap, this writes a warning to the LOG, with the number of 
implicated events:
     Within-individual CNV overlap detected, involving 2 CNVs
and creates a file
     plink.cnv.overlap
that lists these offending segments, with the format:
     FID   Family ID
     IID   Individual ID
     CHR   Chromosome code
     BP1   Segment start (bp)
     BP2   Segment end (bp) 
Filtering of CNV data based on CNV type
The segments read in can be filtered in a number of ways. First, one
can specify to read in only either deletions (TYPE is less
than 2) or duplications (TYPE is greater than 2), with the
options,
   --cnv-del
 
and
   --cnv-dup
 
Segments can also be filtered based on a minimum size (kb), score or
number of sites contributing with the following commands:
   --cnv-kb 50 
   --cnv-score 3
   --cnv-sites 5
 
The default minimum segment size is 20kb; none of the other filters
have a default setting that would exclude anything. Also,
corresponding maximum thresholds can be set:
   --cnv-max-kb 2000
   --cnv-max-score 10
   --cnv-max-sites 10
 
As mentioned above, the SCORE and SITES fields are not used for any other purpose in analysis, and
so if you do not have this information, can can safely enter dummy information (e.g. a value of 1 for every CNV).
The set of individuals for whom segment data are based on can be
modified with the standard --keep and --remove
options, to exclude people from the analysis.
Filtering of CNV data based on genomic location
  
It is possible to extract a specific set of segments that overlap with
one or more regions as specified in a file, e.g. that might contain
the genomic co-ordinates for genes or segmental duplications, etc. Use the command 
     --cnv-intersect regions.list
The file regions.list should be in the following format: one range per line, whitespace-separated:
     CHR     Chromosome code (1-22, X, Y, XY, MT, 0)
     BP1     Start of range, physical position in base units
     BP2     End of range, as above
     MISC    Any other fields after 3rd ignored
For example, if regions.list were
     2 30000000 35000000  REGION1
     2 60000000 62000000
     X 10000000 20000000  Linkage hotspot
then
plink --cfile mydata --cnv-intersect regions.list
would extract all segments in mydata.cnv that at least partially span these
three regions (5Mb and 2Mb on chromosome 2 and 10Mb on chromosome X), ignoring
the comments or gene names. A typical type of file used with --cnv-intersect 
will often be a list of genes (such as available in the resources page).
Alternatively, you can use
     --cnv-exclude regions.list
 
to filter out a specific set of segments, i.e. to remove any CNVs that overlap with
one or more regions specified in the file regions.list.
Assuming the region file has consistent, unique names in the fourth field, the command
     --cnv-subset mylist.txt
takes a list of region names and extracts just these from the
main --cnv-intersect, --cnv-exclude
(or --cnv-count, as described below) list. e.g. if mylist.txt contained
     REGION1
     REGION2
and region.list where
     2 30000000 35000000  REGION1
     2 60000000 62000000  GENE22
     X 10000000 20000000  LinkageHotspot
then only the first region (chromosome 3, 30Mb to 35Mb, labelled
REGION1) would be extracted, as REGION2 does not exist. The --cnv-subset
command requires that the regions.list file  has exactly four fields 
(i.e. always a unique region/gene name in the fourth field).
 Defining overlap for partially overlapping CNVs and regions
The basic intersection or exclusion commands will select all segments 
that are at least partially in the specified region. Alternatively, 
one can select only segments that have at least X percent 
of them in the specified region, for example
     --cnv-overlap 0.50
would only include (--cnv-intersect), or exclude (--cnv-exclude), 
events that have at least 50% of their length spanned by the region.  
There are two other variant forms of the overlap command, which change
the denominator in calculating the proportion overlap:
     --cnv-union-overlap 0.50 
which defines overlap as the ratio of the intersection and the union,
also
     --cnv-region-overlap 0.50
which defines overlap as the ratio of the intersection and the length
of the region (rather than the CNV).  For example, 
     ------|-----|---------------------   Region/gene
     ----------+++++++++++++++---------   CNV (duplication, +)
     ----------XXX---------------------   Intersection
     ----------XXXXXXXXXXXXXXX---------   Denominator for basic overlap
     ------XXXXXXXXXXXXXXXXXXX---------   Denominator for union overlap
     ------XXXXXXX---------------------   Denominator for region overlap
In this example, if we take each character to represent a standard length
     Default overlap = 3 / 15
     Union overlap = 3 / 19
     Region overlap = 3 / 7  
This next example illustrates how the overlap statistics can then
subsequently be used to include or exclude specific CNVs: if overlap 
threshold were set to 0.5, then only the first of therse two CNVs would
be selected by --cnv-intersect
     ------|----------|----------------     
         
     --------OOOOOOOOOOXXX-------------   Selected
     ------------OOOOOOXXXXXXXXXXXXXX--   Not selected
The default setting is equivalent to setting
--cnv-overlap 0 (i.e. more than 0% must
overlap).
Finally, the command
     --cnv-disrupt
will select only CNVs that start or stop within a region
specified in the region list (i.e. resulting in a partially deleted or
duplicated gene or region). The normal overlap commands cannot be used
in conjunction with the --cnv-disrupt defintion of whether or 
not a CNV overlaps a gene.
Filtering by chromosomal co-ordinates
In addition, the standard commands for filtering chromosomal positions
are still applicable, for example
     --chr 5
or
     --chr 2 --from-mb 20 --to-mb 25
Note that for a CNV to be included when using these filters, both the start and stop site must fall within 
the prespecified range (i.e. a CNV spanning from 19 to 24Mb on chromosome 2 would not be included in the above example).
Filtering of CNV data based on frequency
It is also possible to exclude based on the frequency of CNVs at a
particular position. There are two main approaches to this: by
assigning frequencies for regions and then applying the
same routines as for the range-intersection command described above,
or alternatively by assigning each CNV a single, specific count.
These commands, and the differences between them, are described more
fully on this page. As well as the two
basic approaches described above, one can specify different degrees of
overlap when calculating frequencies, which can alter the result of
frequency filtering.
The key commands and some examples are given here. To remove segments that map to regions
with more than 10 segments
     --cnv-freq-exclude-above 10
To remove any segments that only have at most 4 copies
     --cnv-freq-exclude-below 5
To remove any segments not in regions with exactly 5 copies
     --cnv-freq-exclude-exact 5
and correspondingly to include only segments in regions with exactly 5 copies
     --cnv-freq-include-exact 5
As with the earlier range intersection commands, the definition
of intersection can be soft, specified with
the --cnv-overlap option. In most cases here, one would
probably want to allow for a soft filtering,
e.g. with --cnv-overlap 0.5 for example.
For example, given the following segments, and counts below
        Segments   *------*       
                      *------------*             
                                  *------*
          Counts 001112222211111112211111100000
  Common regions      XXXXX       XX             
then --cnv-freq-exclude-above 1 would remove all three
segments if --cnv-overlap 0 (the default) were set.  This is
because each CNV has at least some part of it that intersects with a
region that contains more than 1 CNV. However,
if --cnv-overlap were instead set to 0.5, for example, then
only the top segment would be removed (as the other two segments have
more than 50% of their length outside of a region with more than 1
segment). If the overlap were set higher still, then in this example no
CNVs would be removed by the command --cnv-freq-exclude-above 1.
NOTE Because multiple CNVs at the same region will
not all exactly overlap, and may be spanned by distinct larger events,
or contain smaller events, in other individuals, then requesting that
you include only CNVs with exactly five copies for example
(--cnv-freq-include-exact 5) does not mean that at all
positions in the genome you will always see either 0 or 5
copies. Rather, the selection process works exactly as specified
above. Please see this page for further
details.
Alternative frequency filtering specification
The alternate approach is invoked with the command
     --cnv-freq-method2 0.5
where the value following it represents an overlap parameter (there is no
need to specify the --cnv-overlap command directly when using
--cnv-freq-method2). Based on this overlap, PLINK will assign a specific 
count to each CNV that represents the number of CNVs that overlap it (including itself) 
based on a union intersection overlap definition with the specified proportion parameter, 
between that CNV and all CNVs.
 
This approach is illustrated in the page, that gives more details
on the frequency filtering commands including a comparison to the region-based approach to filtering,
described above.
If the --cnv-freq-method2 command is used, then the other frequency filtering commands
will use the CNV-based counts to include of exclude CNVs, for example
 plink --cfile mydata 
       --cnv-freq-method2 0.5
       --cnv-freq-exclude-above 10
If --cnv-write (see below) is specified with --cnv-freq-method2, 
then the additional command
     --cnv-write-freq
will add a field FREQ to the plink.cnv file generated that shows the frequency for each CNV.
Also, the --cnv-seglist command (see below) can be modified with --cnv-write-freq
(to report the frequency as a number at the start and stop of each CNV instead of the usual codes).
Miscellaneous commands frequency filtering commands
To keep only segments that are
unique to either cases or to controls
     --cnv-unique
This can be used in conjunction with other frequency filter
commands. To drop individuals from the file who do not have at least
one segment after filtering, add the flag
     --cnv-drop-no-segment
This can make the plink.cnv.indiv summary files easy to
browse, for example.
Burden analysis of segmental CNV data
To perform a set of global test of CNV burden in cases versus
controls, add the 
     --cnv-indiv-perm
option as well as 
     --mperm 10000
for example (i.e. permutation is required). By default, this reports on four tests, 
which use these metrics to calculate burden in both cases and controls
     RATE    Number of segments
     PROP    Proportion of sample with one or more segment
     TOTKB   Total kb length spanned
     AVGKB   Average segment size
Tests are based (1-sided) on comparing these metrics in cases versus controls, 
evaluated by permutation.
If a list of regions is supplied in a file, e.g. gene.list and the command 
     --cnv-count gene.list
then an extra test is added
     GRATE   Number of regions/genes spanned by CNVs
     GPROP   Number of CNVs with at least one gene
     GRICH   Number of regions/genes per total CNV kb
These tests respect all the normal filtering commands, with the exception 
that --cnv-intersect and --cnv-exclude cannot be used if 
--cnv-count is also being used.
The mean metrics in cases and controls are reported in the file
     plink.cnv.grp.summary
when the --cnv-indiv-perm command is used. For example: this gives 
the number of events (N) in cases and controls, the rate per person, 
the proportion of cases/controls to have at least one event, the total distance
spanned per person and the average event size per person.
        TEST      GRP      AFF    UNAFF
           N      ALL      528      362
        RATE      ALL   0.1557   0.1138
        PROP      ALL   0.1309   0.1041
       TOTKB      ALL    290.8    265.4
       AVGKB      ALL    249.8    243.3
As usual, if the --within command is added and a cluster file
specified, then any permutations are performed within cluster. In this
case, the statistics displayed in the plink.cnv.grp.summary
file are also split out by the strata as well as presented in total
(as indicated by the GRP field).
Improved gene-set enrichment analysis for segmental CNV data
This test implements a geneset-enrichment method for CNV data
described in Raychaudhuri
et al, 2010. It is appropriate for either continuous or disease
traits and allows for the inclusion of multiple other covariates and
for empirical significance tests. 
The following examples illustrate basic usage. If the file
genes.dat contains the locations of all genes (i.e. as
available from here and the file
pathway.txt is a file of gene names forming the pathway to be
tested for enrichment and the CNV data are in the files mycnv.cnv,
mycnv.cnv.map and mycnv.fam as described above, then
one can ask whether a) genes are enriched for CNVs, b) a subset of
genes are enriched, relative to the whole genome, c) a subset of genes
are enriched, relative to all genes. The latter form of the enrichment
test might be desirable, for example, to determine whether any
enrichment is general to all genes, or specific to a subset of
genes. 
To test for enrichment of genic CNVs: 
  plink --cfile mycnv
        --cnv-count genes.dat
        --cnv-enrichment-test
Enrichment of pathway genes CNVs, relative to all CNVs 
  plink --cfile mycnv
        --cnv-count genes.dat
        --cnv-subset pathway.txt
        --cnv-enrichment-test
To test for enrichment of pathway genes CNVs, relative to all genic CNVs 
  plink --cfile mycnv
        --cnv-intersect genes.dat
        --cnv-write my-genic-cnv
  plink --cfile my-genic-cnv
        --cnv-count genes.dat
        --cnv-subset pathway.txt
        --cnv-enrichment-test
The usual modifiers (to define intersection differently, allow for a
certain kb border around each gene, filter on CNV size, type or
frequency, etc) are all available. Under all circumstances, 2-sided
asymptotic p-values are returned. Alternatively, permutation testing
can be applied and 1-sided empirical p-values are returned (positive
enrichment in cases, based on estimated regression coefficient), by adding the flag: 
  --mperm 10000
 
Association mapping with segmental CNV data
To perform a simple permutation-based test of association of segmental
CNV data for case/control phenotypes, add the option 
     --mperm 50000
to perform, for example, 50,000 null permutations to generate empirical p-values.  T
he results are saved in the file
     plink.cnv.summary.mperm
This is a standard empirical p-value file: EMP1 and EMP2 represent pointwise 
and genome-wide corrected p-values, respectively. Both tests are 1-sided by default.
You can consult the corresponding
     plink.cnv.summary
that is also generated for details of the association: this file has the fields
     CHR    Chromosome code
     SNP    SNP identifier (dummy SNP, see below)
     BP     Base-pair position
     AFF    Number of affected individuals with a segment at this position
     UNAFF  Number of unaffected individuals 
To instead perform a 2-sided test (i.e. allowing that events might be more common in controls) add 
the flag
     --cnv-test-2sided
To perform an analysis in which the total number of events within a
sliding window is compared between cases and controls (rather than the
number overlapping a single position) add the flag
     --cnv-test-window 50
where the parameter is the kb window either side of the test
position. As before, the association results are reported per marker,
but now the counts indicate the total number of segments that overlap
any of the 100kb window surrounding the test position (+/- 50kb),
rather than just the test position itself. Significance is evaluated
by permutation as before.
Association mapping with segmental CNV data: regional tests
To perform a test of association for CNVs in particular regions, use the command
 ./plink --cfile mydata  --cnv-intersect glist-hg18  --cnv-test-region --mperm 10000
where glist-hg18 contains a list of genes (as available from the resources page. The output is written to
     plink.cnv.regional.summary
which has the fields
     CHR       Chromosome code
     REGION    Name of region
     BP1       Start position of region
     BP2       End position of region
     AFF       Number of case CNVs spanning region
     UNAFF     Number of control CNVs spanning region
and the permutation results are written to
     plink.cnv.regional.summary.mperm
which has the fields
     CHR     Chromosome code
     REGION  Name of region
     STAT    Statistic
     EMP1    Empirical p-value, per region
     EMP2    Empirical p-value, corrected for all tests
For example, the line
 CHR           REGION          BP1          BP2      AFF    UNAFF
   1           TTLL10      1079148      1143176        2        3
   ...
 
implies 2 case CNVs (note, PLINK does not distinguish whether these
CNVs belong to the same individual or not) and 3 control CNVs span the
gene TTLL10. The standard commands for regions in CNV
analysis such as --cnv-border and --cnv-overlap 
can be used in this context.
Association mapping with segmental CNV data: quantitative traits
To test for association between rare CNVs and a quantitative trait,
use the same commands as for disease traits.  PLINK will automatically
detect that the phenotype is continuous.  For example, if the file
pheno.qt contains a quantitative trait, the command
 ./plink --cfile mydata --pheno qt.dat --mperm 10000
will generate a file
     plink.cnv.qt.summary
which contains the fields
     CHR    Chromosome code
     SNP    Dummy label for map position
     BP     Physical position (base-pairs)
     NCNV   Number of individiuals with a CNV here
     M1     QT mean in individuls with a CNV here
     M0     QT mean in individuals without a CNV here
and the file
     plink.cnv.qt.summary.mperm
that contains the empirical p-values, EMP1 and EMP2,
as for disease traits.  The only difference is that the quantitative
trait test is, by default, two-sided.  To perform a 1-sided CNV test,
add the command
     --cnv-test-1sided
NOTE Currently, genome-wide burden
(--cnv-indiv-perm), window-based (--cnv-test-window)
and region-based (--cnv-test-region) CNV association tests
are not available for quantitative traits. 
Writing new CNV lists
Given a set of filters applied, you can output as a new CNV file the
filtered subset, with the command
     --cnv-write
For example, to make a new file using only deletions over 200kb but not more 
than 1000kb, with a quality score of 10 or more, use the command
 plink --cfile cnv1 
       --cnv-del 
       --cnv-kb 200 
       --cnv-max-kb 1000 
       --cnv-score 10
       --cnv-write 
       --out hiqual-large-deletions 
which will generate two new files 
     hiqual-large-deletions.cnv
     hiqual-large-deletions.fam
To obtan a corresponding MAP file, so that you can subsequently use
     --cfile hiqual-large-deletions
give the command
 plink --cnv-list hiqual-large-deletions.cnv --cnv-make-map --out hiqual-large-deletions
(although note that this will overwrite the LOG file generated by the --cnv-write 
command).
Creating UCSC browser CNV tracks
As opposed to listing CNVs in PLINK format with --cnv-write,
the command --cnv-track will generate a UCSC-friendly BED
file (note: this is distinct from a PLINK binary PED file) that can be
uploaded to their browser for convenient viewing. 
 plink --cfile mydata --cnv-track --out mycnvs
which generates a file
     plink.cnv.bed
The filtering commands described above can be combined with this option.
By using the Manage custom tracks option on
the UCSC genome
browser, one can easily visualise the CNV data, along side other
genomic features. For example, the file (IID
and SCORE, SITES information is omitted for clarity)
    FID IID  CHR        BP1          BP2  TYPE    SCORE    SITES
    ... ...   22   20140420     20241877     1      ...      ...
    ... ...   22   20140420     20241877     1      ...      ...
    ... ...   22   20129453     20241877     1      ...      ...    
    ... ...   22   20140609     20241877     1      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20639721     20793965     1      ...      ...    
    ... ...   22   20639721     20765489     1      ...      ...    
    ... ...   22   20305076     20591362     3      ...      ...    
    ... ...   22   20646213     20756780     3      ...      ...    
    ... ...   22   20140420     20259122     1      ...      ...    
    ... ...   22   20639866     20787533     3      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20348901     20498220     3      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20639643     20793173     3      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20141114     20241877     1      ...      ...    
    ... ...   22   20140420     20254215     1      ...      ...    
    ... ...   22   20140420     20241877     1      ...      ...    
    ... ...   22   20129130     20241877     1      ...      ...    
is rendered
 
Note that the CNVs are split by deletion versus duplication (red versus blue) and case versus control (light versus dark).
Additionally, a poor-man's version of this plot can be obtained with the 
command
     --cnv-seglist
which produces a file
     plink.cnv.seglist
which, for the CNV list above, can be seen here. Deletions 
and duplications are represented by + and - symbols at the start of each CNV; case
and control status is represented as A and U.
Finally, it is also possible to report CNVs annotated by the regions or genes
they span (see --cnv-verbose-report-regions
described below.
Listing intersected genes and regions
With the --cnv-intersect (or --cnv-exclude) command, 
you can add the flag
     --cnv-report-regions
which will create a file
     plink.reg
listing only the regions that intersect (or do not intersect) with any of the 
CNVs (given the filtering and overlap commands that might also be specified). For example, 
to obtain a list of genes that are intersected by a rare case singleton deletions over 500kb (i.e. event seen only once)
 plink --cfile mydata 
       --filter-cases
       --cnv-freq-exclude-above 1
       --cnv-del
       --cnv-kb 500
       --cnv-report-regions
       --cnv-intersect glist-hg18
Alternatively, the command
     --cnv-verbose-report-regions
produces a verbose form of plink.reg, which does not just list the regions or genes intersected 
but lists the specific segmental CNVs also. This can be used in conjunction with, for example,
     --cnv-subset genes.txt
in order to produce reports on specific genes of interest. For example if genes.txt contained
     HES4
     ISG15
then
 plink --cfile mydata 
       --cnv-verbose-report-regions 
       --cnv-intersect glist-hg18 
       --cnv-border 20
       --cnv-subset genes.txt
would produce a file
     plink.reg
that, for each gene/region, contains the following fields
     FID        Family ID
     IID        Individual ID
     PHE        Phenotype
     CHR        Chromosome code
     BP1        Start position (base-pair)
     BP2        Stop position (base-pair)
     TYPE       DELetion or DUPlication
     KB         Kilobase length of CNV
     OLAP       Overlap (extent of CNV covered by gene)
     OLAP_U     Union overlap (ration of intersection to union)
     OLAP_R     Region overlap (extent of gene covered by CNV)
that might contain something like the following report
  RANGE (+/- 20kb )  [ 1 924206 925333 HES4 ]
       FID   IID   PHE  CHR        BP1      BP2  TYPE      KB     OLAP   OLAP_U   OLAP_R
      P001     1     2    1     789258  1232396   DUP   443.1  0.09281  0.09281        1
      P002     1     1    1     826576  1304312   DEL   477.7  0.08609  0.08609        1
      P003     1     2    1     864765  1913364   DUP    1049  0.03922  0.03922        1
      P004     1     1    1     890974  1258710   DUP   367.7   0.1118   0.1118        1
  RANGE (+/- 20kb )  [ 1 938709 939782 ISG15 ]
       FID   IID   PHE  CHR        BP1      BP2  TYPE      KB     OLAP   OLAP_U   OLAP_R
      P001     1     2    1     789258  1232396   DUP   443.1  0.09269  0.09269        1
      P002     1     1    1     826576  1304312   DEL   477.7  0.08598  0.08598        1
      P003     1     2    1     864765  1913364   DUP    1049  0.03917  0.03917        1
      P004     1     1    1     890974  1258710   DUP   367.7   0.1117   0.1117        1
That is, this is a list of any CNV that at least partially overlaps
these two genes.  The exact behavior can be modified with flags such
as --cnv-del, --cnv-kb, --cnv-disrupt, --cnv-overlap,
--filter-cases, etc.
Reporting sets of overlapping segmental CNVs
Finally, there are two option to group or report sets of segments that 
span a particular position. In the first case, use the option
     --segment-group
which takes all segments in a given region (whole genome unless
otherwise specified) and forms "pools" of overlapping
segments. Several pools of overlapping segments will be created; these
will be listed in order of decreasing size (number of segments); note
that the same segment can appear in multiple pools (e.g. if A overlaps
with C, and B overlaps with C, but A and B do not overlap). The pools
give information as described below.
The more restricted form of this command forms a single pool of all
segments that overlap a particular position, which takes a single
parameter of a marker name; typically these will be the dummy pos* 
markers created by the --cnv-make-map command.
     --segment-spanning pos119
In this case, for some made-up data, we see from
the plink.cnv.summary file that there are 8 cases and 6
controls with a segment spanning a particular
position, pos586
      CHR        SNP           BP      AFF    UNAFF
      ...        ...          ...      ...      ...
        1     pos586     16631570        8        6
      ...        ...          ...      ...      ...
In this case, there is unsurprisingly no association between segmental
CNVs and disease: for example, the corresponding position in
the plink.cnv.summary.mperm file shows an empirical p-value
of 0.35, but of p=1 if adjusted for multiple testing (EMP2)
      CHR     SNP         STAT         EMP1         EMP2
      ...     ...          ...          ...          ...
        1  pos586     0.419408     0.351324            1
      ...     ...          ...          ...          ...
Naturally, one would usually be more interested in following up significantly
associated regions of course...  Nonetheless, if so desired we can see which 
segments (given any of the filtering specified) are spanning this position, with 
--segment-spanning, which gives the following:
    POOL       FID       IID    PHE  CHR        BP1       BP2       KB  TYPE  SCORE
      S1   PT-2378   PT-2378      2   12   16631570  16751087  119.517   DEL  10.23
      S1   PT-268D   PT-268D      2   12   16631494  16732162  100.668   DEL    9.3
      S1   PT-2M8O   PT-2M8O      1   12   16631441  16751082  119.641   DEL  31.23
      S1   PT-2FZ9   PT-2FZ9      2   12   16631436  16751045  119.609   DEL   15.2
      S1   PT-287D   PT-287D      1   12   16616579  17183201  566.622   DUP  200.3
      S1   PT-2C91   PT-2C91      2   12   16616579  16751045  134.466   DEL   14.3
      S1   PT-28A8   PT-28A8      1   12   16616579  16751045  134.466   DEL    8.3
      S1   PT-2FPB   PT-2FPB      1   12   16616579  16714372   97.793   DEL   11.1
      S1   PT-28IG   PT-28IG      2   12   16616579  16708856   92.277   DEL   10.3
      S1   PT-2E5N   PT-2E5N      2   12   16614664  16715703  101.039   DEL   9.87
      S1   PT-2FVL   PT-2FVL      1   12   16614664  16751045  136.381   DEL  10.67
      S1   PT-2DYE   PT-2DYE      2   12   16614664  16715489  100.825   DEL  11.82
      S1   PT-264I   PT-264I      2   12   16614664  16751045  136.381   DEL   14.2
      S1   PT-25WZ   PT-25WZ      1   12   16591338  16715767  124.429   DEL   14.7
      S1       CON        14    8:6   12   16631570  16708856   77.286    NA     NA
      S1     UNION        14    8:6   12   16591338  17183201  591.863    NA     NA
For CNV data (in contrast to shared segments based on homozygosity or IBD sharing) the 
extra fields of TYPE (deletion or duplication) and SCORE (some metric
of quality/confidence of CNV call) are also presented.
Here we see the 14 segments listed, 8 cases and 6
controls. The CON and UNION lines at the end of the
pool give the consensus region (i.e. shared by all segments) and the
total distance spanned by all. The PHE field gives the
phenotype for each individual.
Note that the way in which the dummy markers are selected will effectively mean that every 
possibly unique position, in terms of counts of segments, is evaluated.  The actual base 
pair regions of any dummy marker is itself probably not of interest: given a sigificant (set of) 
SNPs, the strategy would be to select any one and generate the corresponding pool to see 
what and where the association maps to.
 | 
  |