Meta Information

An important component of genetic variation data is the meta information that accompanies variants. PLINK/SEQ adopts a representation of meta information similar to that of VCF files, explained here. PLINK/SEQ supports three classes of meta information for variant and genotype data:

  • Static attributes are population-wide properties of a variant. If a variant is included in multiple VCF files, static attributes should be consistent across all files. For example, whether or not a variant is included in dbSNP is static.
  • Sample-specific attributes only apply to a particular variant in a particular sample. For example, the number of distinct observed alternate alleles for a variant can vary across VCF files if multiple alternate alleles are observed in one sample, but only a single alternate allele in another sample.
  • Genotype meta-information applies to a single genotype in a single study. An example is the read depth of a particular variant call.

Meta-information type

Meta information can also have any of the following variable types: Integer, Float, String, Bool and Flag. For the first four types, meta information can be a single value or a vector of values (of either fixed or variable length). This broadly follows the VCF specification for tags.

The type of a tag can also be declared explicitly with the declare command. This can be useful if an imported VCF did not explicitly define the types of its constituent tags.
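
To illustrate how a declared type and length appear in a VCF header, here is a minimal Python sketch that reads one ##INFO or ##FORMAT line and reports the tag's type and whether it holds a single value, a fixed-length vector or a variable-length vector. The function name describe_tag and the shape labels are hypothetical, chosen for this example. The sketch follows the VCF header convention (ID, Number, Type) and is not PLINK/SEQ's own parsing code.

```python
import re


def describe_tag(header_line):
    """Return (tag id, declared type, shape) for a VCF ##INFO or ##FORMAT line.

    Hypothetical helper for illustration only; not part of PLINK/SEQ.
    """
    match = re.match(r"##(?:INFO|FORMAT)=<(.*)>$", header_line.strip())
    if match is None:
        raise ValueError("not an ##INFO or ##FORMAT header line")

    # Split key=value pairs, keeping quoted values (which may contain commas) whole.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|[^,]*)', match.group(1)))

    number = fields["Number"]
    if fields["Type"] == "Flag":
        shape = "flag"  # present or absent, carries no value
    elif number == "1":
        shape = "single value"
    elif number.isdigit():
        shape = "fixed-length vector"
    else:
        # "A", "R", "G" and "." all mean the length is not a fixed constant.
        shape = "variable-length vector"

    return fields["ID"], fields["Type"], shape


if __name__ == "__main__":
    print(describe_tag('##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">'))
    print(describe_tag('##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Likelihoods">'))
```

A tag whose header line is missing has no declared type to read in this way, which is the situation where an explicit declaration becomes useful.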

Sources of meta-information

A project's meta-information can come from three main sources:

  • Directly from a VCF (in the INFO or FORMAT fields, for example), imported into a variant database (VARDB)
  • Subsequently attached to the VARDB for existing variants by use of the attach-meta command
  • Attached on-the-fly for a specific command (without being permanently loaded into the VARDB) by use of the meta.file mask option

The METAMETA file

The INFO fields of a VCF file do not specify whether a field is static or sample-specific. In PLINK/SEQ, this can be specified in the METAMETA file, as in the following example:

VM     STATIC

Each line of a METAMETA file contains two tab-separated values. The first is the name of a meta field; the second is its class. If a meta field is omitted from the METAMETA file, or if no METAMETA file is given, its class defaults to sample-specific.
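
The lookup rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PLINK/SEQ's implementation: the function names load_metameta and is_static are hypothetical, and the only class keyword used is STATIC, taken from the example line. Any field not listed is treated as sample-specific.

```python
def load_metameta(lines):
    """Parse METAMETA lines into a {field name: class} dictionary.

    Hypothetical helper: assumes exactly two tab-separated values per line,
    as described in the text, and skips blank lines.
    """
    classes = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"expected two tab-separated values: {line!r}")
        name, field_class = parts
        classes[name.strip()] = field_class.strip()
    return classes


def is_static(name, classes):
    """Return True only if the field is listed as STATIC.

    Fields missing from the dictionary fall back to the default class,
    sample-specific, so they are reported as not static.
    """
    return classes.get(name) == "STATIC"


if __name__ == "__main__":
    classes = load_metameta(["VM\tSTATIC\n"])
    print(is_static("VM", classes))  # listed as STATIC
    print(is_static("DP", classes))  # not listed, so sample-specific
```

Passing an empty dictionary models the case where no METAMETA file is given: every field is then treated as sample-specific.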