
Principal Spectral Components (PSC) analysis

Fit principal spectral components (PSC) to sample-level power spectral and cross-channel connectivity metrics

Spectral analysis of multi-channel EEG can generate very high-dimensional output — thousands of frequency-by-channel metrics per individual — but with substantial redundancy, so the effective dimensionality is typically much lower. PSC addresses this by applying singular value decomposition (SVD) to a matrix of individuals-by-spectral-metrics, extracting a compact set of components that captures most of the variance. The resulting components can be used for visualization, as features for downstream statistical models (such as POPS), or to reduce the multiple-testing burden. --psc fits a decomposition to a set of existing Luna spectral output files; PSC projects new data into a previously defined PSC space.

Command Description
--psc Estimate PSCs from samples of spectral/connectivity metrics
PSC Project new samples into an existing PSC space

--psc

Estimate PSCs from samples of spectral/connectivity metrics

This command reads one or more long-format files containing spectral measures stratified by channel and frequency, as would be generated by

destrat out.db +PSD -r F CH > psd.txt
or
destrat out.db +COH -r E F CH1 CH2 > coh.txt

Specifically, each file must have a header row that lists the following stratifying variables:

  • an ID column
  • either CH or the pair of CH1 and CH2
  • a frequency column F
  • an epoch column E, if epoch-level data are expected (i.e. the epoch option has been set)

In addition, each file must contain one or more variable columns, named by the v parameter, e.g. v=PSD in the case of output from the PSD command.
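
The header requirements above can be sketched as a small pre-flight check to run before calling --psc. This is an illustration only, not Luna code: the function name check_header and its messages are hypothetical, and F is treated as mandatory, following the list above.

```python
def check_header(header, variables, epoch=False):
    """Return a list of problems found in a long-format header row.

    Hypothetical helper (not part of Luna): it mirrors the documented
    requirements of ID, F, CH or CH1+CH2, E for epoch-level input,
    and one column per requested variable.
    """
    cols = set(header)
    problems = []
    if "ID" not in cols:
        problems.append("missing ID column")
    if "F" not in cols:
        problems.append("missing F column")
    # Either a single channel column or a complete channel pair is accepted
    if "CH" not in cols and not {"CH1", "CH2"} <= cols:
        problems.append("need CH, or both CH1 and CH2")
    if epoch and "E" not in cols:
        problems.append("epoch-level input needs an E column")
    for v in variables:
        if v not in cols:
            problems.append(f"missing variable column {v}")
    return problems


ok = check_header("ID CH F PSD".split(), ["PSD"])
bad = check_header("ID CH1 F COH".split(), ["COH"], epoch=True)
print(ok)   # []
print(bad)  # ['need CH, or both CH1 and CH2', 'epoch-level input needs an E column']
```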

The command reads the data from one or more long-format files and constructs a matrix in which rows are individuals (or epochs) and columns are values of the listed variable(s) (e.g. PSD) stratified by channel(s) and frequencies. It checks that the matrix is fully specified (i.e. all measures are defined for all individuals) and then performs one or more case-wise outlier removal sweeps, dropping any row with a value more than X standard deviation units from the mean for one or more variables. The command then applies SVD and writes out the U, W and (optionally) V matrices.
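
As a rough illustration of the case-wise outlier sweeps described above, the following sketch drops any row with at least one value beyond a threshold, once per sweep. It rests on assumptions that the text does not confirm: that the mean and SD are recomputed from the surviving rows at each sweep, and that the sample SD is used. The function name remove_outliers is hypothetical.

```python
from statistics import mean, stdev


def remove_outliers(matrix, thresholds):
    """matrix: dict mapping row ID -> list of feature values.
    thresholds: one SD threshold per sweep, e.g. [5, 5].

    Assumption (not confirmed for Luna): column means and sample SDs
    are recomputed from the surviving rows before each sweep.
    """
    kept = dict(matrix)
    for th in thresholds:
        if len(kept) < 2:
            break  # an SD cannot be computed from fewer than two rows
        ids = list(kept)
        flagged = set()
        for j in range(len(kept[ids[0]])):
            col = [kept[i][j] for i in ids]
            m, s = mean(col), stdev(col)
            if s == 0:
                continue  # constant column: nothing can be an outlier
            flagged.update(i for i in ids if abs(kept[i][j] - m) > th * s)
        # Case-wise deletion: a flagged row is dropped in its entirety
        kept = {i: row for i, row in kept.items() if i not in flagged}
    return kept


# Ten well-behaved rows plus one row with an extreme first feature
data = {f"id{k:02d}": [1.0 + 0.1 * k, 2.0 - 0.1 * k] for k in range(10)}
data["bad"] = [50.0, 2.0]
cleaned = remove_outliers(data, [2, 2])
print(sorted(set(data) - set(cleaned)))  # ['bad']
```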

For a toy example (i.e. obviously not real data - this is purely to illustrate the structure of the input data):

ID    CH   F   PSD
id01  C3   1   1.11
id01  C3   2   1.12
id01  C3   3   1.13
id01  F3   1   1.21
id01  F3   2   1.22
id01  F3   3   1.23
id02  C3   1   2.11
id02  C3   2   2.12
id02  C3   3   2.13
id02  F3   1   2.21
id02  F3   2   2.22
id02  F3   3   2.23
With the parameter v=PSD, the implied data matrix is two-by-six, as follows:
ID    PSD_C3_1 PSD_C3_2 PSD_C3_3 PSD_F3_1 PSD_F3_2 PSD_F3_3
id01  1.11     1.12     1.13     1.21     1.22     1.23
id02  2.11     2.12     2.13     2.21     2.22     2.23
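
The reshaping from long format to the implied matrix can be sketched in a few lines of Python. This illustrates the idea rather than Luna's own implementation: the function name long_to_wide is hypothetical, and the column labels follow the PSD_C3_1 pattern shown above. The embedded data are the toy values, with every second-channel row labelled F3 to match the column names of the implied matrix.

```python
LONG = """ID    CH   F   PSD
id01  C3   1   1.11
id01  C3   2   1.12
id01  C3   3   1.13
id01  F3   1   1.21
id01  F3   2   1.22
id01  F3   3   1.23
id02  C3   1   2.11
id02  C3   2   2.12
id02  C3   3   2.13
id02  F3   1   2.21
id02  F3   2   2.22
id02  F3   3   2.23
"""


def long_to_wide(text, variable):
    """Pivot long-format text into (column labels, {ID: row of values})."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, records = lines[0], lines[1:]
    idx = {name: k for k, name in enumerate(header)}
    wide, columns = {}, []
    for rec in records:
        label = f"{variable}_{rec[idx['CH']]}_{rec[idx['F']]}"
        if label not in columns:
            columns.append(label)  # keep first-seen column order
        wide.setdefault(rec[idx["ID"]], {})[label] = float(rec[idx[variable]])
    # The matrix must be fully specified: every feature for every individual
    for rid, vals in wide.items():
        missing = [c for c in columns if c not in vals]
        if missing:
            raise ValueError(f"{rid} is missing {missing}")
    return columns, {rid: [vals[c] for c in columns] for rid, vals in wide.items()}


columns, matrix = long_to_wide(LONG, "PSD")
print(columns)
print(matrix["id01"])  # [1.11, 1.12, 1.13, 1.21, 1.22, 1.23]
```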

This command does not take any EDFs as input, so no sample list need be specified; this is why it has the special form --psc rather than Luna's normal command syntax. The only inputs are the results files from previous spectral analyses.

Note

Note that although this is named spectral components, and the command (as below) calls for the spectra to be input, this command is generic, in the sense that any measures can be input; these measures may (or may not) additionally be stratified by frequency, channel, channel pair. The label spectral is really a historical accident in Luna development, reflecting the first application of what is really a more generic command.

Parameters

Core parameters are:

Parameter   Example           Description
spectra     psd.txt,coh.txt   Original metrics (i.e. input)
v           PSD               Name of the variable(s) to extract
nc          15                Number of components to extract (default: 10)
norm                          Standardize inputs
th          5,5               SD threshold(s) for outlier removal sweeps; flagged individuals are set to missing (case-wise deletion)

Optional parameters:

Parameter   Example   Description
ch          C3,C4     Only extract these channels
inc-ids     id1,id2   Only extract these individuals
ex-ids      id3       Exclude these individuals
dB          PSD       Take log of these variables
abs         ICOH      Take absolute value of these variables
epoch                 Expect epoch-level input (and so key on ID:E)
f-lwr       0.5       Lower frequency bin
f-upr       20        Upper frequency bin

Output parameters:

Parameter    Example    Description
proj         file.txt   Save projection
not-only-u              Output V matrix
v-matrix     file.txt   Write component definitions to this file

Outputs

Individual-level output: (strata: PSC)

Variable Description
U Component scores (left singular vectors U)

Model-level output, per component: (strata: I)

Variable Description
VE Variance explained
CVE Cumulative variance explained
W W (diagonal) matrix element
INC 0/1 indicator for whether this component was selected (given nc)

Model-level output, per feature: (strata: J)

Variable Description
CH Channel
CH1 First channel (for features based on channel pairs)
CH2 Second channel (for features based on channel pairs)
F Frequency
VAR Variable (metric) name, e.g. PSD

Model-level output, per component/feature: (strata: I x J)

Variable Description
V V matrix element

Example

Obtain power spectra from 50 individuals in a sample-list, for two channels:

luna s.lst 1 50 -o out.db -s 'MASK ifnot=NREM2 & RE & PSD sig=C3,C4 spectrum dB'
destrat out.db +PSD -r F CH > psd.txt

The file psd.txt contains 9900 rows (plus a header).

head psd.txt
ID          CH     F        PSD
id-0001     C3     0.5      -27.4700529572773
id-0001     C3     0.75     -33.0192745307719
id-0001     C3     1        -37.3284151965963
id-0001     C3     1.25     -40.4175405446029
id-0001     C3     1.5      -42.6210646368782
...

Note how we use echo to send the arguments to Luna via standard input for this special command:

echo "spectra=psd.txt v=PSD nc=10" | luna --psc -o psc.db

The console logs some key information:

  reading spectra from psd.txt
  converting input spectra to a matrix
  found 50 rows (individuals) and 198 columns (features)
  good, all expected observations found, no missing data
  after outlier removal, 50 individuals remaining
  mean-centering data matrix
  about to perform SVD...
  done... now writing output

The new components (left singular vectors) are in the U matrix, which is stratified by PSC (i.e. here the ten PSCs requested):

destrat psc.db +PSC -r PSC      

We can see the variance explained by each component:

destrat psc.db +PSC -r I

For the i'th component, the output gives the variance explained (VE), the cumulative variance explained (CVE) and the singular value (W). The INC column indicates whether this component was selected for output.

ID   I    CVE      INC   VE       W
.    1    0.6866   1     0.6866   334.394566520496
.    2    0.8438   1     0.1572   160.027701798135
.    3    0.8885   1     0.0446   85.2727231551893
.    4    0.9269   1     0.0383   79.0675929112509
.    5    0.9462   1     0.0193   56.0980576766604
.    6    0.9629   1     0.0167   52.1652898703596
.    7    0.9730   1     0.0101   40.5734600579774
.    8    0.9787   1     0.0056   30.4009241212978
.    9    0.9821   1     0.0034   23.6838493013292
.    10   0.9850   1     0.0029   21.8003728646808
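
The relationship between W and VE can be sketched as follows, assuming that VE is each squared singular value divided by the sum of squares over all singular values. This is the usual SVD convention, and the W and VE columns above are consistent with it, but the text does not state the formula Luna uses. The function name variance_explained and the toy values are illustrative only.

```python
def variance_explained(w):
    """Return (VE, CVE) lists for a complete list of singular values w."""
    total = sum(x * x for x in w)
    ve = [x * x / total for x in w]
    cve, running = [], 0.0
    for v in ve:
        running += v
        cve.append(running)
    return ve, cve


# Toy singular values (not from the example above)
ve, cve = variance_explained([3.0, 2.0, 1.0])
print([round(v, 4) for v in ve])   # [0.6429, 0.2857, 0.0714]
print([round(c, 4) for c in cve])  # [0.6429, 0.9286, 1.0]

# Consistency check against the first two rows of the table above:
# the ratio of VE values should match the squared ratio of W values.
w1, w2 = 334.394566520496, 160.027701798135
ratio_w = (w2 / w1) ** 2
ratio_ve = 0.1572 / 0.6866
print(round(ratio_w, 3), round(ratio_ve, 3))  # 0.229 0.229
```

Under this assumption the total must include every singular value, not only the nc retained components, which is why CVE reaches 0.9850 rather than 1 at the tenth component.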

In addition, the J factors give some useful meta-information about each feature (i.e. each column in the original data):

destrat psc.db +PSC -r J  | head
ID   J             CH   F      VAR
.    C3~0.5~PSD    C3   0.5    PSD
.    C3~0.75~PSD   C3   0.75   PSD
.    C3~1~PSD      C3   1      PSD
.    C3~1.25~PSD   C3   1.25   PSD
.    C3~1.5~PSD    C3   1.5    PSD
.    C3~1.75~PSD   C3   1.75   PSD
.    C3~10~PSD     C3   10     PSD
.    C3~10.25~PSD  C3   10.25  PSD
.    C3~10.5~PSD   C3   10.5   PSD

Info

We intend to produce a vignette illustrating some applications of PSC in the near future.

PSC

Project new samples into an existing PSC space

Parameters

Parameter   Example       Description
proj        proj=p1.txt   Projection file from prior --psc proj output
cache       cache=c1      Cache name (from prior cache-metrics performed this run)
norm                      Standardize inputs given the mean/SD from the original (--psc sample) data

Output

Individual-level output: (strata: PSC)

Variable Description
U Component scores (left singular vectors U)

Example

Continuing from the example above, which was based on N2 power spectra from 50 individuals, we repeat the --psc command but save the projection in the file p1.txt. The projection file holds the V and W matrices from the SVD, the mean/SD of the original features, and a description of those features (i.e. which channels, frequencies and metrics):

echo "spectra=psd.txt v=PSD nc=10 proj=p1.txt" | luna --psc -o psc.db

To project a new individual into this space, we need to generate the equivalent set of features and use Luna's cache mechanism to pass them from the PSD command to the PSC command. The cached features X for this individual are then scaled by V and W to give the corresponding U values (components).

luna s.lst 51 -o out.db -s ' MASK ifnot=NREM2 & RE
                 PSD sig=C3,C4 spectrum dB cache-metrics=c1
                 PSC proj=p1.txt cache=c1 '

Note that cache-metrics is set on the PSD command, and that the same cache (arbitrarily labelled c1 here) is passed to the PSC command.
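
The arithmetic behind the projection can be sketched from the standard SVD identity: if the centered training matrix is X = U W V', then U = X V W^-1, so a new feature vector x maps to scores (x - mean) V / W. The sketch below assumes this is what "scaled by V and W" means; the text does not spell out Luna's exact computation. The function name project and the toy V, W and mean values are hypothetical.

```python
import math


def project(x, mean, V, W, sd=None):
    """Project one feature vector into an existing component space.

    x, mean, sd: per-feature lists of length p (mean/sd from the training data)
    V: p x k matrix as a list of rows; W: list of k singular values.
    Assumed formula (standard SVD identity): score_c = sum_j z_j * V[j][c] / W[c],
    where z is x centered (and scaled by sd if given).
    """
    z = [xi - mi for xi, mi in zip(x, mean)]
    if sd is not None:
        z = [zi / si for zi, si in zip(z, sd)]
    return [sum(z[j] * V[j][c] for j in range(len(z))) / W[c]
            for c in range(len(W))]


# Toy space: two features, two orthonormal components
r = 1 / math.sqrt(2)
V = [[r, r],
     [r, -r]]
W = [2.0, 1.0]
mean = [10.0, 20.0]

# Build a feature vector whose true scores are (0.5, -0.5): x = mean + (u * W) V'
u_true = [0.5, -0.5]
x = [mean[j] + sum(u_true[c] * W[c] * V[j][c] for c in range(2)) for j in range(2)]

scores = project(x, mean, V, W)
print([round(s, 6) for s in scores])  # [0.5, -0.5]
```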

The PSC command checks that all of the required features (i.e. PSD for the C3 and C4 channels at a given set of frequencies) are available in the cache; if they are not, it reports an error. It cannot check that other factors match (e.g. absolute versus relative power, raw versus log-scaled power, or whether power comes only from N2 sleep). For the PSCs to be interpretable in new individuals, it is therefore important to ensure that like is compared with like.
