Principal Spectral Components (PSC) analysis

Fit principal spectral components (PSC) to sample-level power spectral and cross-channel connectivity metrics

Luna commands can produce a lot of output. For example, estimates of spectral power at 0.5 to 30 Hz in 0.25 Hz bins, for 60 EEG channels, will give 7,140 metrics (e.g. from PSD). Looking at cross-channel coherence for the same frequency range will give 210,630 more metrics per individual/sleep stage. This poses statistical and practical challenges if these measures are to be visualized or used in downstream statistical analyses.
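The counts quoted above follow directly from the number of frequency bins, channels and channel pairs. A quick arithmetic check in Python (illustrative only, not part of Luna):

```python
from math import comb

# Frequency bins from 0.5 to 30 Hz inclusive, in 0.25 Hz steps
n_bins = int((30 - 0.5) / 0.25) + 1   # 119 bins
n_channels = 60

n_power = n_bins * n_channels          # one power value per channel per bin
n_pairs = comb(n_channels, 2)          # unordered channel pairs
n_coherence = n_bins * n_pairs         # one coherence value per pair per bin

print(n_bins, n_power, n_pairs, n_coherence)
```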

There is a great deal of redundancy between many of these measures, however, meaning that the effective dimensionality will typically be an order of magnitude (or more) lower. This scenario suggests the use of data reduction as an intermediate step, to represent these types of spectral metrics more efficiently.

Principal spectral components (PSC) is one method that provides a simple means of data reduction, essentially applying singular value decomposition (SVD) to the matrix of individuals/epochs (rows) by spectral measures (columns). The spectral measures will typically be power for different frequency bins and channels; alternatively, these data may also include cross-channel metrics such as coherence or the phase slope index.
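As a minimal sketch of this idea (using NumPy on synthetic data rather than Luna itself; the matrix size and the three latent factors are assumptions chosen purely for illustration), the decomposition of a mean-centered observations-by-features matrix looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real spectra: 50 individuals x 198 features,
# built from 3 latent factors plus noise so that the columns are redundant.
n_obs, n_feat, n_latent = 50, 198, 3
latent = rng.normal(size=(n_obs, n_latent))
loadings = rng.normal(size=(n_latent, n_feat))
X = latent @ loadings + 0.1 * rng.normal(size=(n_obs, n_feat))

# Mean-center each column, then decompose: Xc = U diag(W) Vt
Xc = X - X.mean(axis=0)
U, W, Vt = np.linalg.svd(Xc, full_matrices=False)

# Proportion of variance explained by each component, and its running total
ve = W**2 / np.sum(W**2)
cve = np.cumsum(ve)

nc = 3
scores = U[:, :nc]   # component scores, one row per individual
print("cumulative variance explained by", nc, "components:",
      round(float(cve[nc - 1]), 3))
```

Because the synthetic columns share only three underlying factors, the first three components capture nearly all of the variance, which is the situation data reduction is meant to exploit.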

In general, the idea is to take the high-dimensional (but also highly redundant) data from commands such as PSD, MTM, COH or PSI and extract a much smaller number of components that explain most of the original variance. This has the potential to provide insights into the structure of individual differences across related sleep measures (although interpreting components can be challenging). More directly, it has the potential to provide a powerful set of independent measures for subsequent statistical analyses (or, in the context of the SUDS model, sleep staging), as well as a means to handle multiple-testing problems.

Two commands provide support to 1) fit a PSC decomposition to existing spectral output data (either between individuals or within an individual) via --psc, and 2) project new data into a previously defined lower-dimensional space via PSC. Although the computation behind these commands is very standard (e.g. the same output would be obtained via standard commands from any statistics package given the same input matrix), these commands are designed to work efficiently from a practical standpoint with Luna output and EDFs.

Command Description
--psc Estimate PSCs from samples of spectral/connectivity metrics
PSC Project new samples into an existing PSC space

--psc

Estimate PSCs from samples of spectral/connectivity metrics

This command reads one or more files, looking for spectral measures stratified by channel and frequency, expecting long-format, i.e. as would be generated by

destrat out.db +PSD -r F CH > psd.txt
or
destrat out.db +COH -r E F CH1 CH2 > coh.txt

Specifically, each file must have a header row that lists the following stratifying variables:

  • an ID column
  • either CH or the pair of CH1 and CH2
  • the frequency column F is also expected
  • if expecting epoch-level data (i.e. given the epoch flag has been added), then a column E must also be present

In addition to the above, the command also expects one or more variables to be present, which correspond to the v parameter, e.g. v=PSD in the case of output from the PSD command.
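The header requirements listed above can be expressed as a small check. This is a sketch of the stated rules only, not Luna's actual validation code; the function name check_header and its arguments are hypothetical:

```python
def check_header(columns, variables, epoch=False):
    """Return a list of problems with a long-format header (empty if none)."""
    cols = set(columns)
    problems = []
    if "ID" not in cols:
        problems.append("missing ID column")
    # Either a single channel column, or both columns of a channel pair
    if "CH" not in cols and not {"CH1", "CH2"} <= cols:
        problems.append("need CH, or both CH1 and CH2")
    if "F" not in cols:
        problems.append("missing F column")
    if epoch and "E" not in cols:
        problems.append("epoch-level input requires an E column")
    for v in variables:
        if v not in cols:
            problems.append(f"missing variable column {v}")
    return problems

# Header as produced for PSD output: no problems expected
print(check_header(["ID", "CH", "F", "PSD"], ["PSD"]))

# Coherence header without E, but epoch-level input requested
print(check_header(["ID", "CH1", "CH2", "F", "COH"], ["COH"], epoch=True))
```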

The command reads in the data from one or more long-format files and constructs a matrix where rows are individuals (or epochs) and columns are values of the listed variable(s) (e.g. PSD) stratified by channel(s) and frequencies. It checks that the matrix is fully specified (i.e. all measures are defined for all individuals) and then performs one or more case-wise outlier removal sweeps, dropping any row that has a value more than X standard deviation units from the mean for one or more variables. The command then applies SVD and writes out the U, W and (optionally) V matrices.
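The case-wise outlier sweeps can be sketched as follows. This is an illustration of the description above using NumPy, not Luna's implementation: the function name, the use of the sample standard deviation, and recomputing the mean and SD on the remaining rows at each sweep are all assumptions.

```python
import numpy as np

def remove_outlier_rows(X, thresholds):
    """Apply successive case-wise outlier sweeps; return (kept rows, keep mask)."""
    keep = np.ones(X.shape[0], dtype=bool)
    for th in thresholds:
        sub = X[keep]
        mean = sub.mean(axis=0)
        sd = sub.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                   # guard against constant columns
        z = np.abs((sub - mean) / sd)
        bad = (z > th).any(axis=1)          # drop a row if any feature is extreme
        idx = np.flatnonzero(keep)
        keep[idx[bad]] = False
    return X[keep], keep

# Toy matrix: 30 rows x 4 features, with one wildly outlying value in row 7
X = np.tile(np.arange(30, dtype=float)[:, None], (1, 4))
X[7, 2] = 500.0

# Two sweeps at 3 SD units (a lower threshold than th=5,5, as the toy sample is small)
X_clean, keep = remove_outlier_rows(X, thresholds=(3.0, 3.0))
print(X_clean.shape, np.flatnonzero(~keep))
```

Note that with very few rows a single extreme value cannot exceed a high threshold at all, which is why the toy example uses 3 rather than 5 SD units.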

For a toy example (i.e. obviously not real data - this is purely to illustrate the structure of the input data):

ID    CH   F   PSD
id01  C3   1   1.11
id01  C3   2   1.12
id01  C3   3   1.13
id01  F3   1   1.21
id01  F3   2   1.22
id01  F3   3   1.23
id02  C3   1   2.11
id02  C3   2   2.12
id02  C3   3   2.13
id02  F3   1   2.21
id02  F3   2   2.22
id02  F3   3   2.23

With the parameter v=PSD, the implied data matrix is two-by-six, as follows:

ID    PSD_C3_1 PSD_C3_2 PSD_C3_3 PSD_F3_1 PSD_F3_2 PSD_F3_3
id01  1.11     1.12     1.13     1.21     1.22     1.23
id02  2.11     2.12     2.13     2.21     2.22     2.23
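The long-to-wide reshaping implied here can be sketched in plain Python. This is only an illustration of the data layout, not how Luna builds the matrix internally; the helper name long_to_wide is hypothetical, and the column naming follows the PSD_C3_1 pattern used in this toy example (with the fourth row of each individual taken to be channel F3, frequency 1).

```python
from collections import defaultdict

long_text = """ID CH F PSD
id01 C3 1 1.11
id01 C3 2 1.12
id01 C3 3 1.13
id01 F3 1 1.21
id01 F3 2 1.22
id01 F3 3 1.23
id02 C3 1 2.11
id02 C3 2 2.12
id02 C3 3 2.13
id02 F3 1 2.21
id02 F3 2 2.22
id02 F3 3 2.23
"""

def long_to_wide(text, var):
    """Pivot long-format rows into one row of features per individual."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    wide = defaultdict(dict)
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        feature = f"{var}_{rec['CH']}_{rec['F']}"
        wide[rec["ID"]][feature] = float(rec[var])
    features = sorted({f for row in wide.values() for f in row})
    # The matrix must be fully specified: every individual has every feature
    for ident, row in wide.items():
        missing = [f for f in features if f not in row]
        if missing:
            raise ValueError(f"{ident} is missing {missing}")
    matrix = {ident: [row[f] for f in features] for ident, row in wide.items()}
    return features, matrix

features, matrix = long_to_wide(long_text, "PSD")
print(features)
for ident, values in matrix.items():
    print(ident, values)
```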

This command does not assume any EDFs for input, and so no sample list need be specified (i.e. this is why this command has the special form --psc rather than using Luna's normal command syntax). The only inputs are the results files from previous spectral analyses.

Note

Note that although this is named spectral components, and the command (as below) calls for the spectra to be input, this command is generic, in the sense that any measures can be input; these measures may (or may not) additionally be stratified by frequency, channel, channel pair. The label spectral is really a historical accident in Luna development, reflecting the first application of what is really a more generic command.

Parameters

Core parameters are:

Parameter   Example            Description
spectra     psd.txt,coh.txt    Original metrics (i.e. input)
v           PSD                Name of the variable(s) to extract
nc          15                 Number of components to extract (default: 10)
norm                           Standardize inputs
th          5,5                Set individuals to missing (case-wise deletion)

Optional parameters:

Parameter   Example    Description
ch          C3,C4      Only extract these channels
inc-ids     id1,id2    Only extract these individuals
ex-ids      id3        Exclude these individuals
dB          PSD        Take log of these variables
abs         ICOH       Take absolute value of these variables
epoch                  Expect epoch-level input (and so key on ID:E)
f-lwr       0.5        Lower frequency bin
f-upr       20         Upper frequency bin

Output parameters:

Parameter    Example     Description
proj         file.txt    Save projection
not-only-u               Output V matrix
v-matrix     file.txt    Write component definitions to this file

Outputs

Individual-level output: (strata: PSC)

Variable Description
U Component scores (left singular vectors U)

Model-level output, per component: (strata: I)

Variable Description
VE Variance explained
CVE Cumulative variance explained
W W (diagonal) matrix element
INC 0/1 indicator for whether this component was selected (given nc)

Model-level output, per feature: (strata: J)

Variable Description
CH Channel
CH1 First channel (for features based on channel pairs)
CH2 Second channel (for features based on channel pairs)
F Frequency
VAR Name of the input variable (e.g. PSD)

Model-level output, per component/feature: (strata: I x J)

Variable Description
V V matrix element

Example

Obtain power spectra from 50 individuals in a sample-list, for two channels:

luna s.lst 1 50 -o out.db -s 'MASK ifnot=NREM2 & RE & PSD sig=C3,C4 spectrum dB'
destrat out.db +PSD -r F CH > psd.txt

The file psd.txt contains 9900 rows (plus a header).

head psd.txt
ID          CH     F        PSD
id-0001     C3     0.5      -27.4700529572773
id-0001     C3     0.75     -33.0192745307719
id-0001     C3     1        -37.3284151965963
id-0001     C3     1.25     -40.4175405446029
id-0001     C3     1.5      -42.6210646368782
...

Because --psc does not take a sample list, we use echo to send its arguments to Luna via standard input:

echo "spectra=psd.txt v=PSD nc=10" | luna --psc -o psc.db

The console logs some key information:

  reading spectra from psd.txt
  converting input spectra to a matrix
  found 50 rows (individuals) and 198 columns (features)
  good, all expected observations found, no missing data
  after outlier removal, 50 individuals remaining
  mean-centering data matrix
  about to perform SVD...
  done... now writing output

The new components (left singular vectors) are in the U matrix, which is stratified by PSC (i.e. here the ten PSCs requested):

destrat psc.db +PSC -r PSC      

We can see the variance explained by each component:

destrat psc.db +PSC -r I

For the i'th component, this shows the variance explained (VE) and cumulative variance explained (CVE), as well as the singular value (W). The INC column indicates whether this component was selected for output.

ID   I    CVE      INC   VE       W
.    1    0.6866   1     0.6866   334.394566520496
.    2    0.8438   1     0.1572   160.027701798135
.    3    0.8885   1     0.0446   85.2727231551893
.    4    0.9269   1     0.0383   79.0675929112509
.    5    0.9462   1     0.0193   56.0980576766604
.    6    0.9629   1     0.0167   52.1652898703596
.    7    0.9730   1     0.0101   40.5734600579774
.    8    0.9787   1     0.0056   30.4009241212978
.    9    0.9821   1     0.0034   23.6838493013292
.    10   0.9850   1     0.0029   21.8003728646808
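The relationship between W and VE in this table can be checked by hand, under the standard SVD assumption that the variance explained by a component is proportional to its squared singular value. Since only the first ten singular values are listed, the sketch below infers the total sum of squares from the first component rather than summing all of them; small discrepancies in the last digit reflect the four-decimal reporting of VE.

```python
# Singular values (W) and variance explained (VE) copied from the table above
W = [334.394566520496, 160.027701798135, 85.2727231551893, 79.0675929112509,
     56.0980576766604, 52.1652898703596, 40.5734600579774, 30.4009241212978,
     23.6838493013292, 21.8003728646808]
VE = [0.6866, 0.1572, 0.0446, 0.0383, 0.0193, 0.0167, 0.0101, 0.0056, 0.0034, 0.0029]

# Assumption: VE_i = W_i^2 / total, so total can be recovered from component 1
total = W[0] ** 2 / VE[0]

for i, (w, ve) in enumerate(zip(W, VE), start=1):
    implied = w ** 2 / total
    print(f"{i:2d}  reported VE={ve:.4f}  implied VE={implied:.4f}")
```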

In addition, the J factors give some useful meta-information about each feature (i.e. each column in the original data matrix):

destrat psc.db +PSC -r J  | head
ID   J             CH   F      VAR
.    C3~0.5~PSD    C3   0.5    PSD
.    C3~0.75~PSD   C3   0.75   PSD
.    C3~1~PSD      C3   1      PSD
.    C3~1.25~PSD   C3   1.25   PSD
.    C3~1.5~PSD    C3   1.5    PSD
.    C3~1.75~PSD   C3   1.75   PSD
.    C3~10~PSD     C3   10     PSD
.    C3~10.25~PSD  C3   10.25  PSD
.    C3~10.5~PSD   C3   10.5   PSD
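Note that the J labels in this listing sort as text (so 10 follows 1.75). If numeric ordering by frequency is wanted downstream, the labels can be split on the ~ separator. A minimal sketch, assuming the three-part channel~frequency~variable form shown above; the label C3~2~PSD is not in the listing and is added here only as an assumed later row to show the re-ordering:

```python
labels = ["C3~0.5~PSD", "C3~0.75~PSD", "C3~1~PSD", "C3~1.25~PSD", "C3~1.5~PSD",
          "C3~1.75~PSD", "C3~10~PSD", "C3~10.25~PSD", "C3~10.5~PSD",
          "C3~2~PSD"]   # assumed extra label, for illustration only

def parse_feature(label):
    """Split a 'CH~F~VAR' label into (channel, frequency, variable)."""
    ch, f, var = label.split("~")
    return ch, float(f), var

# Order by channel, then by numeric frequency
ordered = sorted(labels, key=lambda s: parse_feature(s)[:2])
for label in ordered:
    print(label)
```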

Info

We intend to produce a vignette illustrating some applications of PSC in the near future.

PSC

Project new samples into an existing PSC space

Parameters

Parameter   Example        Description
proj        proj=p1.txt    Projection file from prior --psc proj output
cache       cache=c1       Cache name (from prior cache-metrics performed this run)
norm                       Standardize inputs given the mean/SD from the original (--psc sample) data

Output

Individual-level output: (strata: PSC)

Variable Description
U Component scores (left singular vectors U)

Example

Continuing from the example above, which was based on N2 power spectra from 50 individuals, we repeat the --psc command but also save the projection (essentially the V and W matrices from the SVD, along with the mean/SD of the original features and a description of what they are, i.e. which channels, frequencies and metrics) in the file p1.txt:

echo "spectra=psd.txt v=PSD nc=10 proj=p1.txt" | luna --psc -o psc.db

To project a new individual into this space, we need to generate the equivalent set of features, and use Luna's cache mechanism to pass them from the PSD command to the PSC command, i.e. supplying the relevant features X for this individual, which are then transformed using V and W to give the corresponding U values (components) for this new individual.

luna s.lst 51 -o out.db -s ' MASK ifnot=NREM2 & RE
                 PSD sig=C3,C4 spectrum dB cache-metrics=c1
                 PSC proj=p1.txt cache=c1 '

Note the use of cache-metrics for PSD, and that the same cache (arbitrarily labelled c1 here) is attached to the PSC command.

The PSC command checks that all of the required features (i.e. PSD for the C3 and C4 channels at a given set of frequencies) are available in the cache; if they are not, it reports an error. It cannot check that other factors are comparable, however (e.g. whether absolute or relative power, or raw versus log-scaled power, was used, or whether power comes only from N2 sleep). For the PSCs to be interpretable in these new individuals, it is important to ensure that one is comparing like with like.
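The projection arithmetic itself is standard linear algebra: if the centered training matrix factors as U diag(W) Vt, then a new feature vector x maps to (x - mean) V / W. The NumPy sketch below illustrates this on synthetic data. It is not Luna's code: the helper name project is hypothetical, and the sketch only mean-centers, omitting the SD scaling that applies when norm is used.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "training" sample: 50 individuals x 20 correlated features
mixing = rng.normal(size=(20, 20))
X = rng.normal(size=(50, 20)) @ mixing
mean = X.mean(axis=0)

# Fit: centered X = U diag(W) Vt; keep the first nc components
U, W_all, Vt = np.linalg.svd(X - mean, full_matrices=False)
nc = 10
V = Vt[:nc].T          # features x components
W = W_all[:nc]         # singular values of the retained components

def project(x, mean, V, W):
    """Map one feature vector into the component space: u = (x - mean) V / W."""
    return ((x - mean) @ V) / W

# Sanity check: projecting a training individual reproduces its row of U
u0 = project(X[0], mean, V, W)
print(np.allclose(u0, U[0, :nc]))

# A new individual measured on the same 20 features
x_new = rng.normal(size=20) @ mixing
u_new = project(x_new, mean, V, W)
print(u_new.shape)
```

The mean used for centering comes from the original sample, not from the new individual, which is why the projection file needs to store it alongside V and W.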