Basic syntax and conventions for the PSEQ tool

PSEQ is a command-line interface to the PLINK/Seq library and will likely be the main entry point for most users. This page provides a linked index for all major commands and concepts. For an introduction to the overall architecture of PLINK/Seq and how PSEQ fits into this framework, please read this page first.

Sections on this page

  • Basic usage : Using the PSEQ command-line.
  • Help command : Obtaining a list of available commands, masks and options
  • Input modes : Using PSEQ with and without attached projects

    Pages describing specific PSEQ commands in detail

    Other relevant pages

    • Tutorial : Illustrates use of PSEQ, with 1000 Genomes pilot 3 data.
    • Masks : Syntax for specifying masks (filters).
    • Variants and samples : Overview of how PLINK/Seq stores and processes genotypic information.
    • Full reference table : List of all commands and options.

    Basic usage

    From your system's command-line shell, the PSEQ tool is invoked as follows:

    pseq (project|VCF file|-|.) command {--options...}

    Here, the first argument determines the input source for the command, as described below. The second argument is a command. All available PSEQ commands can be listed with the help command, also described below. Commands specify general activities, such as loading, summarising, viewing, analysing or transforming project data. Each PSEQ job takes exactly one command.

    In addition to a primary command, a number of auxiliary arguments can be specified. In some instances a particular primary command requires further arguments to be given: for example, the load-plink command requires a fileset to be specified via the --file argument. In general, PSEQ will complain with an error message if a necessary argument has been omitted for a given command: e.g.

    pseq /path/to/project load-plink

    will give the error message

    pseq error : no --file specified
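
A wrapper script can catch this case before calling PSEQ. The following is a minimal shell sketch, not part of PSEQ: the function name has_file_argument is hypothetical, and it only checks whether --file (or its abbreviation -f) appears among the arguments.

```shell
# Hypothetical helper (not part of PSEQ): succeed if --file or -f is
# present among the arguments that would be passed to pseq.
has_file_argument() {
  for arg in "$@"; do
    case "$arg" in
      --file|-f) return 0 ;;
    esac
  done
  return 1
}

# Example: warn before running load-plink without a fileset.
if ! has_file_argument load-plink; then
  echo "load-plink needs a fileset given via --file" >&2
fi
```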
    

    All auxiliary arguments start with a double hyphen (--argument), although some of the most common flags have an abbreviated syntax with a single hyphen: for example, --file and -f are equivalent.

    Here are some of the more common auxiliary arguments:

    Argument      Abbreviation   Description
    --mask        -m             Specify a mask (filters for the project data, described here)
    --phenotype   -p             Specify a phenotype by its label (or, sometimes, a filename and a label)
    --file        -f             Specify one or more file names, as required by some commands
    --vcf                        Specify one or more VCF files
    --group       -g             Specify one or more group names, e.g. in a LOCDB or REFDB
    --stats                      List of statistics to be calculated, e.g. by a v-stats command
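
The abbreviations listed above can be expanded mechanically, which may help when reading scripts written by others. The following is a minimal shell sketch, not part of PSEQ: the function name expand_pseq_flag is hypothetical, and it covers only the four abbreviations given in the table.

```shell
# Hypothetical helper (not part of PSEQ): map an abbreviated flag to its
# long form; anything else is passed through unchanged.
expand_pseq_flag() {
  case "$1" in
    -m) printf '%s\n' "--mask" ;;
    -p) printf '%s\n' "--phenotype" ;;
    -f) printf '%s\n' "--file" ;;
    -g) printf '%s\n' "--group" ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

expand_pseq_flag -f   # prints --file
```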

    Most arguments are used as simple flags (e.g. --vmeta) or take a whitespace-delimited list of one or more values (e.g. --group dbsnp). Certain arguments, most commonly --mask, instead take a list of keyword=value pairs with their own syntax, e.g.

    pseq /path/to/project v-view --mask loc=ccds,refseq mac=1-10
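
To make the structure of such an argument concrete, the sketch below splits the mask items from the example above into keyword and value parts. It is only an illustration in plain shell: the function name split_mask_pairs is hypothetical, it assumes every item has the form keyword=value, and it does not reproduce PSEQ's own mask parsing. The value text is kept exactly as given.

```shell
# Hypothetical helper (not part of PSEQ): print one "keyword<TAB>value"
# line for each keyword=value item given on the command line.
split_mask_pairs() {
  for pair in "$@"; do
    key=${pair%%=*}     # text before the first '='
    value=${pair#*=}    # text after the first '='
    printf '%s\t%s\n' "$key" "$value"
  done
}

# The two items from the --mask example above.
split_mask_pairs loc=ccds,refseq mac=1-10
```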

    Help command

    A full list of commands and auxiliary arguments can be obtained with the help function:

    pseq help

    By itself, this will display a list of topics that can be further queried with the help command:

     
    usage:pseq {project-file|VCF|-|.} {command} {--options}
    
    Command groups
    ---------------------------------------------------------
    input       Data input
    output      Variant data output
    project     Project functions
    stats       Variant summary statistics
    tests       Genotype-phenotype association tests
    qc          Quality control metrics and tests
    views       Viewing variant and other data
    annot       Annotation functions
    varop       Variant database operations
    locop       Locus database operations
    seqop       Sequence database operations
    refop       Reference database operations
    indop       Individual database operations
    ibd         IBD analysis
    net         Network-based analysis
    misc        Misc.
    
    Mask groups
    ---------------------------------------------------------
    annotation        Masks based on LOCDB transcript annotation (under revision)
    case-control      Individual masks based on a disease phenotype from INDDB
    files             Include/exclude variants based on presence in one or more files
    filters           Masks based on the FILTER and QUAL fields
    frequency         Masks based on variant allele frequency
    genotype          Per-genotype masks and behaviours
    locus-groups      Interval-based masks involving LOCDB
    locus-set-groups  Masks based on sets of loci from a LOCDB
    misc-masks        Various other masks
    phenotype         Individual masks based on phenotypes from INDDB
    ref-variants      Masks based on REFDB variants
    regions           Interval-based masks specified on the command line
    samples           Include/exclude individuals/files
    skip              Options to skip reading certain things (improves speed)
    vmeta             Masks based on a variant's meta-information (INFO field)
    
    pseq help all
    pseq help {group}
    pseq help {command}
    
    

    A help command that names a group (of commands or of masks) lists all the commands (or masks) in that group, e.g.:

    pseq help views
      views : Viewing variant and other data
      ---------------------------------------------------------
      v-view     view variant data
      rv-view    view rare alleles
      mv-view    view multiple variants
      mrv-view   view multiple rare variants
      g-view     view variants grouped by gene
      i-view     individuals in project/file
      seg-view   individual segments
      loc-view   show all loci in a LOCDB group
      loc-stats  locus-based stats
      ref-view   view a group from a REFDB
      seq-view   view regions of sequence from SEQDB
      counts     summary/count statistics
      g-counts   genotype summary/count statistics
      unique     view variants specific to individual groups
    

    A help command that names a specific command or mask lists all the options available for it:

    pseq help v-view
      v-view : view variant data
      ---------------------------------------------------------
      --geno { flag }       show genotypes
      --gmeta { flag }      show genotype meta-information
      --hide-null { flag }  .
      --only-alt { flag }   .
      --only-minor { flag } .
      --pheno               .
      --samples { flag }    show each specific sample variant
      --simple { flag }     simple variant format, POS RET ALT
      --verbose { flag }    verbose output
      --vmeta { flag }      show variant meta-information
    

    Note: the help function is still being updated, so not all commands have all of their options listed in the current version. Some topic areas (e.g. net and ibd) do not currently list any commands: these contain hidden commands that are under development and will appear in future releases.

    Input modes : working with or without projects

    All PSEQ commands have the form

    pseq input-source command --options...

    The input-source can be one of four possibilities. First, it can be a project file, as described here (in this case, the actual genetic data will be in the project's variant database, or in indexed compressed VCFs):

    pseq myproj v-view

    Alternatively, it may be a single VCF, either plain-text or compressed (with zlib/gzip or BGZF compression); the file name must end in .vcf or .vcf.gz:

    pseq my.vcf.gz v-view

    If the input-source is the - character, this indicates that a valid VCF will be streamed into standard input:

    cat my.vcf | pseq - v-view

    Finally, the . character is used to indicate that no VCF or project is attached. This is often appropriate when using auxiliary commands that stand alone (i.e. are not attached to any one project), e.g. loc-load to make a LOCDB:

    pseq . loc-load --locdb /path/to/locdb --file my.gtf --group my_loci
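
The four input modes described above can be summarised as a simple decision on the first argument. The following shell sketch is an illustration only, not PSEQ's own logic: the function name pseq_input_mode and the labels it prints are hypothetical, and it assumes that any argument that is not -, . or a name ending in .vcf or .vcf.gz is a project file.

```shell
# Hypothetical helper (not part of PSEQ): report which of the four input
# modes a given first argument corresponds to.
pseq_input_mode() {
  case "$1" in
    -)              printf '%s\n' "stdin" ;;    # VCF streamed on standard input
    .)              printf '%s\n' "none" ;;     # no VCF or project attached
    *.vcf|*.vcf.gz) printf '%s\n' "vcf" ;;      # single VCF file
    *)              printf '%s\n' "project" ;;  # assumed to be a project file
  esac
}

pseq_input_mode my.vcf.gz   # prints vcf
pseq_input_mode myproj      # prints project
```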