Principal Spectral Components (PSC) analysis

Fit principal spectral components (PSC) to sample-level power spectral and cross-channel connectivity metrics

Luna commands can produce a lot of output. For example, estimates of spectral power at 0.5 to 30 Hz in 0.25 Hz bins, for 60 EEG channels, will give 7,140 metrics (e.g. from PSD). Looking at cross-channel coherence for the same frequency range will give 210,630 more metrics per individual/sleep stage. This poses statistical and practical challenges if these measures are to be visualized or used in downstream statistical analyses.
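The counts quoted above follow directly from the number of frequency bins, channels and channel pairs. A short check of that arithmetic (the variable names here are purely illustrative):

```python
from math import comb

# Frequency bins from 0.5 to 30 Hz inclusive, in 0.25 Hz steps
f_lwr, f_upr, step = 0.5, 30.0, 0.25
n_bins = round((f_upr - f_lwr) / step) + 1

n_channels = 60
n_pairs = comb(n_channels, 2)      # unordered channel pairs

n_power = n_bins * n_channels      # one power value per bin per channel
n_coherence = n_bins * n_pairs     # one coherence value per bin per pair

print(n_bins, n_power, n_pairs, n_coherence)
# 119 7140 1770 210630
```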

There is a great deal of redundancy between many of these measures, however, meaning that the effective dimensionality will typically be an order of magnitude lower. This suggests the use of data reduction as an intermediate step, to represent these types of spectral metrics more efficiently.

Principal spectral components (PSC) is one method that provides a simple means of data reduction, essentially applying singular value decomposition (SVD) to the matrix of individuals/epochs (rows) by spectral measures (columns). The spectral measures will typically be power for different frequency bins and channels; alternatively, these data may also include cross-channel metrics such as coherence or the phase slope index.
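As a rough illustration of the decomposition described above, the following sketch applies SVD to a synthetic individuals-by-features matrix using NumPy. It is not Luna's implementation: the data, the dimensions and the three-factor structure are all assumptions chosen only to show how redundant features collapse onto a few components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data matrix: 50 individuals (rows) by
# 198 spectral features (columns), driven by 3 latent factors plus noise.
n, p, k = 50, 198, 3
latent = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p))
X = latent @ loadings + 0.1 * rng.normal(size=(n, p))

# Mean-center each column, then decompose: Xc = U diag(W) V^T
Xc = X - X.mean(axis=0)
U, W, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep the first nc components as the reduced representation
nc = 10
U_nc = U[:, :nc]

# Proportion of variance captured by each component
ve = W**2 / np.sum(W**2)
print(U_nc.shape)
print("variance explained by first 3 components:", ve[:3].sum())
```

Because the synthetic features are built from only three latent factors, almost all of the variance falls on the first three components, which is the kind of redundancy the method is meant to exploit.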

In general, the idea is to take the high-dimensional (but also highly redundant) data from commands such as PSD, MTM, COH or PSI and extract a much smaller number of components that explain most of the original variance. This has the potential to provide insights into the structure of individual differences across related sleep measures (although interpreting components can be challenging). More directly, it has the potential to provide a powerful set of independent measures for subsequent statistical analyses (or, in the context of the POPS model, sleep staging), as well as a means to handle multiple-testing problems.

Two commands provide support to 1) fit a PSC decomposition to existing spectral output data (either between individuals or within an individual) via --psc, and 2) project new data into a previously defined lower-dimensional space via PSC. Although the computation behind these commands is very standard (e.g. the same output would be obtained from any statistics package given the same input matrix), these commands are designed to work efficiently from a practical standpoint with Luna output and EDFs.

Command	Description
`--psc`	Estimate PSCs from samples of spectral/connectivity metrics
`PSC`	Project new samples into an existing PSC space

`--psc`

Estimate PSCs from samples of spectral/connectivity metrics

This command reads one or more files, looking for spectral measures stratified by channel and frequency, expecting long-format, i.e. as would be generated by

destrat out.db +PSD -r F CH > psd.txt

or

destrat out.db +COH -r E F CH1 CH2 > coh.txt

Specifically, each file must have a header row that lists the following stratifying variables:

an ID column
either a CH column or the pair of columns CH1 and CH2
a frequency column F
an epoch column E, if epoch-level data are expected (i.e. if the epoch flag has been added)

In addition to the above, the command also expects one or more variables to be present, which correspond to the v parameter, e.g. v=PSD in the case of output from the PSD command.

The command reads in the data from one or more long-format files, and constructs a matrix where rows are individuals (or epochs) and columns are values of the variable(s) listed (e.g. PSD) stratified by channel(s) and frequencies. It checks that the matrix is fully specified (i.e. all measures are defined for all individuals) and then performs one or more case-wise outlier removal sweeps, in which a row is dropped if it has a value more than X standard deviation units from the mean for one or more variables. The command then applies SVD and writes out the U and W matrices and, optionally, the V matrix.
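The outlier sweeps described above might look roughly like the following NumPy sketch. The exact rule Luna applies is not spelled out here, so several details are assumptions for illustration: the helper name `remove_outlier_rows`, the use of the sample SD, and recomputing the mean and SD from the retained rows on each sweep.

```python
import numpy as np

def remove_outlier_rows(X, thresholds):
    """Return a boolean mask of rows to keep.

    For each threshold (one sweep each), drop any row that has a value
    more than `th` SDs from the column mean, where the mean and SD are
    computed from the rows still retained (an assumption of this sketch).
    """
    keep = np.ones(X.shape[0], dtype=bool)
    for th in thresholds:
        sub = X[keep]
        mean = sub.mean(axis=0)
        sd = sub.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0          # guard against constant columns
        z = np.abs((X - mean) / sd)
        keep &= ~(z > th).any(axis=1)
    return keep

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))       # 60 individuals, 6 features
X[7, 2] = 25.0                     # plant one extreme value

keep = remove_outlier_rows(X, thresholds=[5, 5])
print(keep.sum(), "rows retained; row 7 kept?", bool(keep[7]))
```

Two sweeps are used here because removing an extreme row shrinks the SD, so a second pass can catch rows that the first one masked.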

For a toy example (i.e. obviously not real data; this is purely to illustrate the structure of the input data):

ID    CH   F   PSD
id01  C3   1   1.11
id01  C3   2   1.12
id01  C3   3   1.13
id01  F3   1   1.21
id01  F3   2   1.22
id01  F3   3   1.23
id02  C3   1   2.11
id02  C3   2   2.12
id02  C3   3   2.13
id02  F3   1   2.21
id02  F3   2   2.22
id02  F3   3   2.23

With the parameter v=PSD, the implied data matrix is two-by-six, as follows:

ID    PSD_C3_1 PSD_C3_2 PSD_C3_3 PSD_F3_1 PSD_F3_2 PSD_F3_3
id01  1.11     1.12     1.13     1.21     1.22     1.23
id02  2.11     2.12     2.13     2.21     2.22     2.23
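The long-to-wide step can be sketched in plain Python as below. This only mirrors the toy example (using channels C3 and F3 for both individuals); the label format `PSD_<CH>_<F>` follows the column names shown in the implied matrix, and the dictionary-based layout is an illustrative choice rather than how Luna stores the data internally.

```python
long_text = """ID CH F PSD
id01 C3 1 1.11
id01 C3 2 1.12
id01 C3 3 1.13
id01 F3 1 1.21
id01 F3 2 1.22
id01 F3 3 1.23
id02 C3 1 2.11
id02 C3 2 2.12
id02 C3 3 2.13
id02 F3 1 2.21
id02 F3 2 2.22
id02 F3 3 2.23
"""

rows = [line.split() for line in long_text.strip().splitlines()]
header, records = rows[0], rows[1:]

matrix = {}      # ID -> {feature label -> value}
features = []    # column order, in order of first appearance
for rec in records:
    r = dict(zip(header, rec))
    label = f"PSD_{r['CH']}_{r['F']}"
    if label not in features:
        features.append(label)
    matrix.setdefault(r["ID"], {})[label] = float(r["PSD"])

# The matrix must be fully specified: every individual has every feature
for pid, vals in matrix.items():
    missing = [f for f in features if f not in vals]
    if missing:
        raise ValueError(f"{pid} is missing {missing}")

print("ID", *features)
for pid, vals in matrix.items():
    print(pid, *(vals[f] for f in features))
```

Running this prints one row per individual and one column per channel/frequency combination, matching the two-by-six layout above.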

This command does not assume any EDFs for input, and so no sample list need be specified (i.e. this is why this command has the special form --psc rather than using Luna's normal command syntax). The only inputs are the results files from previous spectral analyses.

Note

Although this is named spectral components, and the command (as below) calls for the spectra to be input, this command is generic, in the sense that any measures can be input; these measures may (or may not) additionally be stratified by frequency, channel or channel pair. The label spectral is really a historical accident in Luna development, reflecting the first application of what is really a more generic command.

Parameters

Core parameters are:

Parameter	Example	Description
`spectra`	`psd.txt,coh.txt`	Original metrics (i.e. input)
`v`	`PSD`	Name of the variable(s) to extract
`nc`	15	Number of components to extract (default: 10)
`norm`		Standardize inputs
`th`	`5,5`	Outlier thresholds in SD units, one per sweep; individuals beyond a threshold are set to missing (case-wise deletion)

Optional parameters:

Parameter	Example	Description
`ch`	`C3,C4`	Only extract these channels
`inc-ids`	`id1,id2`	Only extract these individuals
`ex-ids`	`id3`	Exclude these individuals
`dB`	`PSD`	Take log of these variables
`abs`	`ICOH`	Take absolute value of these variables
`epoch`		Expect epoch-level input (and so key on `ID:E`)
`f-lwr`	0.5	Lower frequency bin
`f-upr`	20	Upper frequency bin

Output parameters:

Parameter	Example	Description
`proj`	`file.txt`	Save projection
`not-only-u`		Output V matrix
`v-matrix`	`file.txt`	Write component definitions to this file

Outputs

Individual-level output: (strata: PSC)

Variable	Description
`U`	Component scores (left singular vectors U)

Model-level output, per component: (strata: I)

Variable	Description
`VE`	Variance explained
`CVE`	Cumulative variance explained
`W`	W (diagonal) matrix element
`INC`	0/1 indicator for whether this component was selected (given `nc`)

Model-level output, per feature: (strata: J)

Variable	Description
`CH`	Channel
`CH1`	First channel (for features based on channel pairs)
`CH2`	Second channel (for features based on channel pairs)
`F`	Frequency

Model-level output, per component/feature: (strata: I x J)

Variable	Description
`V`	V matrix element

Example

Obtain power spectra from 50 individuals in a sample-list, for two channels:

luna s.lst 1 50 -o out.db -s 'MASK ifnot=NREM2 & RE & PSD sig=C3,C4 spectrum dB'

destrat out.db +PSD -r F CH > psd.txt

The file psd.txt contains 9900 rows (plus a header).

head psd.txt

ID          CH     F        PSD
id-0001     C3     0.5      -27.4700529572773
id-0001     C3     0.75     -33.0192745307719
id-0001     C3     1        -37.3284151965963
id-0001     C3     1.25     -40.4175405446029
id-0001     C3     1.5      -42.6210646368782
...

Note how we use echo to send the arguments to Luna via standard input for this special command:

echo "spectra=psd.txt v=PSD nc=10" | luna --psc -o psc.db

The console logs some key information:

  reading spectra from psd.txt
  converting input spectra to a matrix
  found 50 rows (individuals) and 198 columns (features)
  good, all expected observations found, no missing data
  after outlier removal, 50 individuals remaining
  mean-centering data matrix
  about to perform SVD...
  done... now writing output

The new components (left singular vectors) are in the U matrix, which is stratified by PSC (i.e. here the ten PSCs requested):

destrat psc.db +PSC -r PSC

We can see the variance explained by each component:

destrat psc.db +PSC -r I

For the i-th component, the output gives the variance explained (VE) and cumulative variance explained (CVE), as well as the singular value (W). The INC column indicates whether this component was selected for output.

ID   I    CVE      INC   VE       W
.    1    0.6866   1     0.6866   334.394566520496
.    2    0.8438   1     0.1572   160.027701798135
.    3    0.8885   1     0.0446   85.2727231551893
.    4    0.9269   1     0.0383   79.0675929112509
.    5    0.9462   1     0.0193   56.0980576766604
.    6    0.9629   1     0.0167   52.1652898703596
.    7    0.9730   1     0.0101   40.5734600579774
.    8    0.9787   1     0.0056   30.4009241212978
.    9    0.9821   1     0.0034   23.6838493013292
.    10   0.9850   1     0.0029   21.8003728646808
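The VE and W columns above appear to be linked in the usual way for an SVD, with variance explained proportional to the squared singular value. The check below uses only the numbers printed in the table; the assumption being tested is VE_i = W_i^2 / sum_j W_j^2, where the sum runs over all components (not just the ten shown), so the total is recovered from the first row.

```python
# Values copied from the table above
W = [334.394566520496, 160.027701798135, 85.2727231551893,
     79.0675929112509, 56.0980576766604, 52.1652898703596,
     40.5734600579774, 30.4009241212978, 23.6838493013292,
     21.8003728646808]
VE = [0.6866, 0.1572, 0.0446, 0.0383, 0.0193,
      0.0167, 0.0101, 0.0056, 0.0034, 0.0029]

# Recover the total sum of squared singular values from component 1,
# then predict VE for every component from its W alone.
total = W[0] ** 2 / VE[0]
predicted = [w ** 2 / total for w in W]

max_diff = max(abs(p - v) for p, v in zip(predicted, VE))
for i, (p, v) in enumerate(zip(predicted, VE), start=1):
    print(f"{i:2d}  table={v:.4f}  from W={p:.4f}")
print("largest discrepancy:", max_diff)
```

The predicted values agree with the table to within about one unit in the fourth decimal place, which is what one would expect given that VE is reported to four decimals.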

In addition, the J factors give some useful meta-information about each feature (i.e. each column in the original data):

destrat psc.db +PSC -r J  | head

ID   J             CH   F      VAR
.    C3~0.5~PSD    C3   0.5    PSD
.    C3~0.75~PSD   C3   0.75   PSD
.    C3~1~PSD      C3   1      PSD
.    C3~1.25~PSD   C3   1.25   PSD
.    C3~1.5~PSD    C3   1.5    PSD
.    C3~1.75~PSD   C3   1.75   PSD
.    C3~10~PSD     C3   10     PSD
.    C3~10.25~PSD  C3   10.25  PSD
.    C3~10.5~PSD   C3   10.5   PSD

Info

We intend to produce a vignette on some applications of PSC in the near future.

`PSC`

Project new samples into an existing PSC space

Parameters

Parameter	Example	Description
`proj`	`proj=p1.txt`	Projection file from prior `--psc proj` output
`cache`	`cache=c1`	Cache name (from prior `cache-metrics` performed this run)
`norm`		Standardize inputs given the mean/SD from the original (`--psc` sample) data

Output

Individual-level output: (strata: PSC)

Variable	Description
`U`	Component scores (left singular vectors U)

Example

Continuing from the example above, which was based on N2 power spectra from 50 individuals, we repeat the --psc command but now save the projection in the file p1.txt. The projection file holds the V and W matrices from the SVD, the mean/SD of the original features, and a description of those features (i.e. which channels, frequencies and metrics):

echo "spectra=psd.txt v=PSD nc=10 proj=p1.txt" | luna --psc -o psc.db

To project a new individual into this space, we need to generate the equivalent set of features and use Luna's cache mechanism to pass them from the PSD command to the PSC command. The cache supplies the relevant features X for this individual, which are scaled by V and W to give the corresponding U values (components) for this new individual:

luna s.lst 51 -o out.db -s ' MASK ifnot=NREM2 & RE
                 PSD sig=C3,C4 spectrum dB cache-metrics=c1
                 PSC proj=p1.txt cache=c1 '

Note that cache-metrics is set for PSD, and that the same cache (arbitrarily labelled c1 here) is passed to the PSC command.

The PSC command checks that all of the required features (i.e. PSD for the C3 and C4 channels for a given set of frequencies) are available in the cache; if they are not, it reports an error. It cannot check that other factors match, however (e.g. whether absolute or relative power, or raw versus log-scaled power, was used, or whether power is only from N2 sleep). For the PSCs to be interpretable in these new individuals, it is important to ensure that one is comparing like with like.
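The algebra behind the projection can be sketched with NumPy as follows. This is an illustration of the standard SVD identity U = Xc V W^-1 rather than Luna's own code: the synthetic data and the helper name `project` are assumptions, and the optional standardization by the original SDs (the norm option) is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "original sample": 50 individuals by 198 features
loadings = rng.normal(size=(20, 198))
X_train = rng.normal(size=(50, 20)) @ loadings

# Fit: mean-center and decompose, keeping nc components
mean = X_train.mean(axis=0)
U, W, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
nc = 10
V_nc, W_nc = Vt[:nc].T, W[:nc]     # what a projection file would need

def project(x, mean, V, W):
    """Map a feature vector into component space: u = (x - mean) V / W."""
    return (x - mean) @ V / W

# Sanity check: projecting an original individual recovers their U row
u0 = project(X_train[0], mean, V_nc, W_nc)
print(np.allclose(u0, U[0, :nc]))

# A new individual with the same 198 features, in the same order
x_new = rng.normal(size=20) @ loadings
u_new = project(x_new, mean, V_nc, W_nc)
print(u_new.shape)
```

The sanity check makes the like-with-like requirement concrete: the projection is only meaningful if the new feature vector has the same features, in the same order and on the same scale, as the columns used to fit the original decomposition.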