Basic syntax and conventions for the PSEQ tool
PSEQ is a command-line interface to the PLINK/Seq library and will likely be the main entry point for most users. This page provides a linked index for all major commands and concepts. For an introduction to the overall architecture of PLINK/Seq and how PSEQ fits into this framework, please read this page first.
Pages describing specific PSEQ commands in detail
- Project specification : Create, configure and summarise projects.
- Data input : Input genotypic data and meta-information from VCFs and PLINK filesets.
- Annotating and grouping variants : Add auxiliary information to a project.
- Viewing data : Filter and view genotype and phenotype information, by variant or gene.
- Data output : Output genotypic data in a variety of formats.
- Summary statistics : Calculate various metrics per dataset, gene or individual.
- Association analysis : Genotype/phenotype association tests, for single variants and groups of variants.
- Locus database operations : Manage locus (gene) databases.
- Reference/sequence database operations : Manage reference variant and sequence databases.
Other relevant pages
- Tutorial : Illustrates use of PSEQ, with 1000 Genomes pilot 3 data.
- Masks : Syntax for specifying masks (filters).
- Variants and samples : Overview of how PLINK/Seq stores and processes genotypic information.
- Full reference table : List of all commands and options.
Basic usage
From your system's command-line shell, the PSEQ tool is invoked as follows:
pseq (project|VCF file|-|.) command {--options...}
Here, the first argument determines the input source for the command, as described below. The second argument is a command. All available PSEQ commands can be listed with the help command, also described below. Commands specify general activities, such as loading, summarising, viewing, analysing or transforming project data. Each PSEQ job takes exactly one command.
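The calling convention above can be sketched as a small shell wrapper. This is only an illustration: run_pseq and the PSEQ_DRY_RUN variable are hypothetical names introduced here and are not part of PSEQ. The wrapper checks that both an input source and a command were given, then either prints the assembled command line (dry run) or passes the arguments to pseq unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not part of PSEQ): assemble and run one PSEQ job.
# Usage: run_pseq INPUT_SOURCE COMMAND [--options...]
run_pseq() {
    # A PSEQ job needs at least an input source and one command.
    if [ "$#" -lt 2 ]; then
        echo "usage: run_pseq (project|VCF file|-|.) command [--options...]" >&2
        return 2
    fi
    if [ -n "${PSEQ_DRY_RUN:-}" ]; then
        # Dry run: print the command line instead of executing it.
        echo "pseq $*"
    else
        # Real run: pass every argument through unchanged.
        pseq "$@"
    fi
}

# Example: show the command that would be run, without invoking pseq.
PSEQ_DRY_RUN=1
run_pseq /path/to/project v-view --mask mac=1-10
```

With PSEQ_DRY_RUN unset, the same call would invoke pseq itself, so the wrapper can be used to check a command line before running it.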
In addition to a primary command, a number of auxiliary arguments can be specified. In some instances a particular primary command requires further arguments to be given: for example, the load-plink command requires a fileset to be specified via the --file argument. In general, PSEQ will complain with an error message if a necessary argument has been omitted for a given command: e.g.
pseq /path/to/project load-plink
will give the error message
pseq error : no --file specified
All auxiliary arguments start with a double hyphen (--argument), although some of the most common flags have an abbreviated syntax with a single hyphen: for example, --file and -f are equivalent.
Here are some of the more common auxiliary arguments:
Argument | Abbreviation | Description |
---|---|---|
--mask | -m | Specify a mask (filters for the project data, described here) |
--phenotype | -p | Specify a phenotype by its label (or, sometimes a filename and a label) |
--file | -f | Specify one or more file names, as required by some commands |
--vcf | | Specify one or more VCF files |
--group | -g | Specify one or more group names, e.g. in a LOCDB or REFDB |
--stats | | List of statistics to be calculated, e.g. by a v-stats command |
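The abbreviations in the table can be written down as a simple lookup. The following is a minimal sketch: expand_flag is a hypothetical helper, not a PSEQ feature, and it covers only the four abbreviations listed above. Any other argument is returned unchanged.

```shell
#!/bin/sh
# Hypothetical helper: map an abbreviated PSEQ flag to its long form,
# using only the abbreviations listed in the table above.
expand_flag() {
    case "$1" in
        -m) echo "--mask" ;;
        -p) echo "--phenotype" ;;
        -f) echo "--file" ;;
        -g) echo "--group" ;;
        *)  echo "$1" ;;   # anything else is passed through as-is
    esac
}

# Example: -f and --file name the same argument.
expand_flag -f
expand_flag --file
```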
Most arguments are used as simple flags (e.g. --vmeta) or take a white-space delimited list of one or more values (e.g. --group dbsnp). Certain arguments, most commonly --mask, instead take a list of keyword=value pairs that have their own syntax, e.g.
pseq /path/to/project v-view --mask loc=ccds,refseq mac=1-10
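To make the structure of the mask in this example explicit, the sketch below splits each white-space delimited item into its keyword and value. describe_mask is a hypothetical helper written for illustration only. It assumes that a comma separates multiple values for one keyword, as the loc=ccds,refseq item suggests; the full mask syntax is defined on the Masks page and may allow forms not handled here.

```shell
#!/bin/sh
# Hypothetical helper: print the keyword and value(s) of each mask item.
# Assumption: commas separate multiple values given to a single keyword.
describe_mask() {
    for pair in "$@"; do
        case "$pair" in
            *=*)
                key=${pair%%=*}      # text before the first '='
                values=${pair#*=}    # text after the first '='
                echo "$key: $(echo "$values" | tr ',' ' ')"
                ;;
            *)
                # Item without '=': report it rather than guess its meaning.
                echo "$pair: (no value)"
                ;;
        esac
    done
}

# Example: the mask used in the command above.
describe_mask loc=ccds,refseq mac=1-10
```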
Help command
A full list of commands and auxiliary arguments can be obtained with the help function:
pseq help
By itself, this will display a list of topics that can be further queried with the help command:
usage: pseq {project-file|VCF|-|.} {command} {--options}

Command groups
---------------------------------------------------------
input      Data input
output     Variant data output
project    Project functions
stats      Variant summary statistics
tests      Genotype-phenotype association tests
qc         Quality control metrics and tests
views      Viewing variant and other data
annot      Annotation functions
varop      Variant database operations
locop      Locus database operations
seqop      Sequence database operations
refop      Reference database operations
indop      Individual database operations
ibd        IBD analysis
net        Network-based analysis
misc       Misc.

Mask groups
---------------------------------------------------------
annotation        Masks based on LOCDB transcript annotation (under revision)
case-control      Individual masks based on a disease phenotype from INDDB
files             Include/exclude variants based on presence in one or more files
filters           Masks based on the FILTER and QUAL fields
frequency         Masks based on variant allele frequency
genotype          Per-genotype masks and behaviours
locus-groups      Interval-based masks involving LOCDB
locus-set-groups  Masks based on sets of loci from a LOCDB
misc-masks        Various other masks
phenotype         Individual masks based on phenotypes from INDDB
ref-variants      Masks based on REFDB variants
regions           Interval-based masks specified on the command line
samples           Include/exclude individuals/files
skip              Options to skip reading certain things (improves speed)
vmeta             Masks based on a variant's meta-information (INFO field)

pseq help all
pseq help {group}
pseq help {command}
A help command that specifies a group of commands or masks will list all the commands (or masks) in that group, e.g.:
pseq help views
views : Viewing variant and other data
---------------------------------------------------------
v-view      view variant data
rv-view     view rare alleles
mv-view     view multiple variants
mrv-view    view multiple rare variants
g-view      view variants grouped by gene
i-view      individuals in project/file
seg-view    individual segments
loc-view    show all loci in a LOCDB group
loc-stats   locus-based stats
ref-view    view a group from a REFDB
seq-view    view regions of sequence from SEQDB
counts      summary/count statistics
g-counts    genotype summary/count statistics
unique      view variants specific to individual groups
A help command that specifies a specific command or mask will list all the options available for that command:
pseq help v-view
v-view : view variant data
---------------------------------------------------------
--geno        { flag }  show genotypes
--gmeta       { flag }  show genotype meta-information
--hide-null   { flag }  .
--only-alt    { flag }  .
--only-minor  { flag }  .
--pheno                 .
--samples     { flag }  show each specific sample variant
--simple      { flag }  simple variant format, POS RET ALT
--verbose     { flag }  verbose output
--vmeta       { flag }  show variant meta-information
Note: the help function is still being updated, so in the current version not every command lists all of its possible options. Some topic areas (e.g. net and ibd) do not currently list any commands: these represent hidden commands that are under development and will appear in future releases.
Input mode : working with or without projects
All PSEQ commands have the form
pseq input-source command --options...
The input-source can be one of four possibilities. First, it can be a project file, as described here (in this case, the actual genetic data will be in the project's variant database, or in indexed compressed VCFs):
pseq myproj v-view
Alternatively, it may be a single VCF, either plain-text or compressed (with zlib/gzip or BGZF compression); the file name must end in .vcf or .vcf.gz:
pseq my.vcf.gz v-view
If the input-source is the - character, this indicates that a valid VCF will be streamed into standard input:
cat my.vcf | pseq - v-view
Finally, the . character is used to indicate that no VCF or project is attached. This is often appropriate for auxiliary commands that stand alone, i.e. are not attached to any one project, such as using loc-load to create a LOCDB:
pseq . loc-load --locdb /path/to/locdb --file my.gtf --group my_loci
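The four input modes described above can be summarised as a small decision rule on the first argument. The sketch below is illustrative only: classify_input_source is a hypothetical helper, not part of PSEQ, and it mirrors the rules stated in this section (a - means standard input, a . means no attached data, a name ending in .vcf or .vcf.gz is a single VCF, and anything else is treated as a project file).

```shell
#!/bin/sh
# Hypothetical helper: report how PSEQ would interpret an input-source
# argument, following the four cases described in this section.
classify_input_source() {
    case "$1" in
        -)               echo "stdin" ;;    # VCF streamed on standard input
        .)               echo "none" ;;     # no VCF or project attached
        *.vcf|*.vcf.gz)  echo "vcf" ;;      # single plain-text or compressed VCF
        *)               echo "project" ;;  # otherwise assumed to be a project file
    esac
}

# Examples, using the input sources shown above.
classify_input_source myproj
classify_input_source my.vcf.gz
classify_input_source -
classify_input_source .
```

Note that the rule is based on the file name alone, which is why a VCF passed directly must end in .vcf or .vcf.gz.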