Principal Spectral Components (PSC) analysis
Fit principal spectral components (PSC) to sample-level power spectral and cross-channel connectivity metrics
Luna commands can produce a lot of output. For example, estimates of
spectral power at 0.5 to 30 Hz in 0.25 Hz bins, for 60 EEG channels,
will give 7,140 metrics (e.g. from `PSD`). Looking at cross-channel
coherence for the same frequency range will give 210,630 more metrics
per individual/sleep stage. This poses statistical and practical
challenges if these measures are to be visualized or used in
downstream statistical analyses.
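As a quick check of those counts, the arithmetic can be sketched as follows (a minimal illustration, assuming both the 0.5 Hz and 30 Hz bins are included and that coherence is computed once per unordered channel pair):

```python
# Count frequency bins from 0.5 to 30 Hz in 0.25 Hz steps (both ends included)
f_lwr, f_upr, step = 0.5, 30.0, 0.25
n_bins = int(round((f_upr - f_lwr) / step)) + 1

n_channels = 60
n_power = n_bins * n_channels                  # one power value per channel per bin

n_pairs = n_channels * (n_channels - 1) // 2   # unordered channel pairs
n_coherence = n_bins * n_pairs                 # one coherence value per pair per bin

print(n_bins, n_power, n_pairs, n_coherence)   # prints: 119 7140 1770 210630
```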
There is a great deal of redundancy between many of these measures, however, meaning that the effective dimensionality will typically be an order of magnitude lower. This scenario suggests the use of data reduction as an intermediate step, to represent these types of spectral metrics more efficiently.
Principal spectral components (PSC) is one method that provides a simple means of data reduction, essentially applying singular value decomposition (SVD) to the matrix of individuals/epochs (rows) by spectral measures (columns). The spectral measures will typically be power for different frequency bins and channels; alternatively, these data may also include cross-channel metrics such as coherence or the phase slope index.
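To make the idea concrete, here is a minimal NumPy sketch of SVD-based reduction on simulated data. It is an illustration only, not Luna's implementation: the matrix sizes, the three latent factors and the 95% cutoff are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 individuals x 200 features driven by only 3 latent factors
n_ind, n_feat, n_latent = 50, 200, 3
latent = rng.normal(size=(n_ind, n_latent))
loadings = rng.normal(size=(n_latent, n_feat))
X = latent @ loadings + 0.1 * rng.normal(size=(n_ind, n_feat))

# Mean-center each column, then decompose Xc = U diag(W) V^T
Xc = X - X.mean(axis=0)
U, W, Vt = np.linalg.svd(Xc, full_matrices=False)

# Proportion of total variance carried by each component
ve = W**2 / np.sum(W**2)

# Number of components needed to reach 95% cumulative variance (assumed cutoff)
n_needed = int(np.searchsorted(np.cumsum(ve), 0.95) + 1)
print(n_needed)
```

Because the 200 simulated features are highly redundant, a handful of components is enough to capture almost all of the variance, which is the situation described above for spectral metrics.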
In general, the idea is to take the high-dimensional (but also highly
redundant) data from commands such as `PSD`, `MTM`, `COH` or `PSI` and
extract a much smaller number of components that explain most of the
original variance. This has the potential to provide insights into
the structure of individual differences across related sleep measures
(although interpreting components can be challenging). More directly,
it has the potential to provide a powerful set of independent measures
for subsequent statistical analyses (or, in the context of the
`POPS` model, sleep staging), as well as a means to
handle multiple-testing problems.
Two commands provide support to 1) fit a PSC decomposition to existing
spectral output data (either between individuals or within an individual)
via `--psc`, and 2) project new data into a previously
defined lower-dimensional space via `PSC`. Although the computation behind
these commands is very standard (e.g. the same output would be
obtained via standard commands from any statistics package given the
same input matrix), these commands are designed to work efficiently
from a practical standpoint with Luna output and EDFs.
Command | Description |
---|---|
`--psc` | Estimate PSCs from samples of spectral/connectivity metrics |
`PSC` | Project new samples into an existing PSC space |
--psc
Estimate PSCs from samples of spectral/connectivity metrics
This command reads one or more files, looking for spectral measures stratified by channel and frequency. It expects long-format input, i.e. as would be generated by:
destrat out.db +PSD -r F CH > psd.txt
destrat out.db +COH -r E F CH1 CH2 > coh.txt
Specifically, each file must have a header row that lists the following stratifying variables:
- an `ID` column
- either `CH` or the pair of `CH1` and `CH2`
- the frequency column `F` is also expected
- if expecting epoch-level data (i.e. given the `epoch` flag has been added), then a column `E` must also be present
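Before running `--psc`, it can be convenient to confirm that a file's header meets these expectations. The helper below is a hypothetical sketch (`check_header` is not part of Luna) that simply mirrors the list above; it assumes whitespace- or tab-delimited column names.

```python
def check_header(header_line, epoch=False):
    """Return a list of problems found in a long-format header row (empty if none)."""
    cols = header_line.split()
    problems = []
    if "ID" not in cols:
        problems.append("missing ID")
    has_single = "CH" in cols
    has_pair = "CH1" in cols and "CH2" in cols
    if not (has_single or has_pair):
        problems.append("missing CH (or CH1 and CH2)")
    if "F" not in cols:
        problems.append("missing F")
    if epoch and "E" not in cols:
        problems.append("missing E (required for epoch-level input)")
    return problems

print(check_header("ID CH F PSD"))                   # prints: []
print(check_header("ID CH1 CH2 F COH", epoch=True))  # reports the missing E column
```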
In addition to the above, the command also expects one or more
variables to be present, which correspond to the `v` parameter,
e.g. `v=PSD` in the case of output from the `PSD` command.
The command reads in the data from one or more long-format files, and
constructs a matrix where rows are individuals (or epochs) and
columns are values of the variable(s) listed (e.g. `PSD`) stratified
by channel(s) and frequencies. It checks that the matrix is
fully-specified (i.e. all measures are defined for all individuals)
and then performs one or more case-wise outlier removal sweeps, based
on a row having values beyond X standard deviation units from the
mean for one or more variables. The command then applies SVD and writes
out the U, W and (optionally) V matrices.
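The sequence of steps just described (outlier sweeps, mean-centering, SVD) can be sketched in NumPy as below. This is an assumption-laden illustration rather than Luna's code: the function `remove_outliers`, the use of the sample SD, and the simulated matrix are all hypothetical, and Luna's exact outlier rule may differ in detail.

```python
import numpy as np

def remove_outliers(X, thresholds):
    """Flag rows to keep: one sweep per threshold, dropping any row with a value
    more than t SDs from its column mean (statistics use currently kept rows)."""
    keep = np.ones(X.shape[0], dtype=bool)
    for t in thresholds:
        sub = X[keep]
        mean = sub.mean(axis=0)
        sd = sub.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                  # guard against constant columns
        z = np.abs((X - mean) / sd)
        keep &= ~(z > t).any(axis=1)       # case-wise: drop the whole row
    return keep

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))               # hypothetical 40 individuals x 6 features
X[0, 2] = 50.0                             # plant one gross outlier in row 0

keep = remove_outliers(X, thresholds=[5, 5])   # two sweeps at 5 SD
Xc = X[keep] - X[keep].mean(axis=0)            # mean-center the retained rows
U, W, Vt = np.linalg.svd(Xc, full_matrices=False)
print(int(keep.sum()), U.shape, W.shape, Vt.shape)
```

Running more than one sweep matters because an extreme value inflates the SD on the first pass; once it is removed, the second pass recomputes the mean and SD on cleaner data.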
For a toy example (i.e. obviously not real data; this is purely to illustrate the structure of the input data):
ID CH F PSD
id01 C3 1 1.11
id01 C3 2 1.12
id01 C3 3 1.13
id01 F3 1 1.21
id01 F3 2 1.22
id01 F3 3 1.23
id02 C3 1 2.11
id02 C3 2 2.12
id02 C3 3 2.13
id02 F3 1 2.21
id02 F3 2 2.22
id02 F3 3 2.23
With `v=PSD`, the implied data matrix is two-by-six, as follows:
ID PSD_C3_1 PSD_C3_2 PSD_C3_3 PSD_F3_1 PSD_F3_2 PSD_F3_3
id01 1.11 1.12 1.13 1.21 1.22 1.23
id02 2.11 2.12 2.13 2.21 2.22 2.23
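The long-to-wide reshaping can be sketched in plain Python as follows. This is only an illustration of the structure, not how Luna does it internally; it uses the toy values with channels C3 and F3 for two individuals, and the `PSD_CH_F` column labels follow the layout shown above.

```python
long_text = """ID CH F PSD
id01 C3 1 1.11
id01 C3 2 1.12
id01 C3 3 1.13
id01 F3 1 1.21
id01 F3 2 1.22
id01 F3 3 1.23
id02 C3 1 2.11
id02 C3 2 2.12
id02 C3 3 2.13
id02 F3 1 2.21
id02 F3 2 2.22
id02 F3 3 2.23"""

lines = long_text.splitlines()
header = lines[0].split()
var = "PSD"                      # corresponds to v=PSD

wide = {}                        # ID -> {feature label -> value}
features = []                    # feature labels, in order of first appearance
for line in lines[1:]:
    rec = dict(zip(header, line.split()))
    label = f"{var}_{rec['CH']}_{rec['F']}"
    if label not in features:
        features.append(label)
    wide.setdefault(rec["ID"], {})[label] = float(rec[var])

# The matrix must be fully specified: every individual has every feature
assert all(set(row) == set(features) for row in wide.values())

matrix = [[wide[i][f] for f in features] for i in wide]
print(features)
for i, row in zip(wide, matrix):
    print(i, row)
```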
This command does not assume any EDFs for input, and so no sample list
need be specified (i.e. this is why this command has the special form
`--psc` rather than using Luna's normal command syntax). The only
inputs are the results files from previous spectral analyses.
Note
Note that although this is named spectral components, and
the command (as below) calls for the `spectra` to be input, this
command is generic, in the sense that any measures can be input;
these measures may (or may not) additionally be stratified by
frequency, channel, or channel pair. The label spectral is really
a historical accident in Luna development, reflecting the first
application of what is really a more generic command.
Parameters
Core parameters are:
Parameter | Example | Description |
---|---|---|
`spectra` | `psd.txt,coh.txt` | Original metrics (i.e. input) |
`v` | `PSD` | Name of the variable(s) to extract |
`nc` | `15` | Number of components to extract (default: 10) |
`norm` | | Standardize inputs |
`th` | `5,5` | Set outlying individuals to missing (case-wise deletion), with threshold(s) in SD units |
Optional parameters:
Parameter | Example | Description |
---|---|---|
`ch` | `C3,C4` | Only extract these channels |
`inc-ids` | `id1,id2` | Only extract these individuals |
`ex-ids` | `id3` | Exclude these individuals |
`dB` | `PSD` | Take log of these variables |
`abs` | `ICOH` | Take absolute value of these variables |
`epoch` | | Expect epoch-level input (and so key on `ID:E`) |
`f-lwr` | `0.5` | Lower frequency bin |
`f-upr` | `20` | Upper frequency bin |
Output parameters:
Parameter | Example | Description |
---|---|---|
`proj` | `file.txt` | Save projection |
`not-only-u` | | Output V matrix |
`v-matrix` | `file.txt` | Write component definitions to this file |
Outputs
Individual-level output (strata: `PSC`):
Variable | Description |
---|---|
`U` | Component scores (left singular vectors U) |
Model-level output, per component (strata: `I`):
Variable | Description |
---|---|
`VE` | Variance explained |
`CVE` | Cumulative variance explained |
`W` | W (diagonal) matrix element |
`INC` | 0/1 indicator for whether this component was selected (given `nc`) |
Model-level output, per feature (strata: `J`):
Variable | Description |
---|---|
`CH` | Channel |
`CH1` | First channel (for features based on channel pairs) |
`CH2` | Second channel (for features based on channel pairs) |
`F` | Frequency |
Model-level output, per component/feature (strata: `I` x `J`):
Variable | Description |
---|---|
`V` | V matrix element |
Example
Obtain power spectra from 50 individuals in a sample-list, for two channels:
luna s.lst 1 50 -o out.db -s 'MASK ifnot=NREM2 & RE & PSD sig=C3,C4 spectrum dB'
destrat out.db +PSD -r F CH > psd.txt
The file `psd.txt` contains 9900 rows (plus a header), i.e. 50 individuals by 2 channels by 99 frequency bins.
head psd.txt
ID CH F PSD
id-0001 C3 0.5 -27.4700529572773
id-0001 C3 0.75 -33.0192745307719
id-0001 C3 1 -37.3284151965963
id-0001 C3 1.25 -40.4175405446029
id-0001 C3 1.5 -42.6210646368782
...
Note how we use `echo` to send the arguments to Luna via standard
input for this special command:
echo "spectra=psd.txt v=PSD nc=10" | luna --psc -o psc.db
The console logs some key information:
reading spectra from psd.txt
converting input spectra to a matrix
found 50 rows (individuals) and 198 columns (features)
good, all expected observations found, no missing data
after outlier removal, 50 individuals remaining
mean-centering data matrix
about to perform SVD...
done... now writing output
The new components (left singular vectors) are in the U matrix, which is
stratified by `PSC` (i.e. here the ten PSCs requested):
destrat psc.db +PSC -r PSC
We can see the variance explained by each component:
destrat psc.db +PSC -r I
For the i-th component, this gives the variance explained (`VE`) and
cumulative variance explained (`CVE`), as well as the singular value (`W`).
The `INC` column indicates whether this component was selected for output.
ID I CVE INC VE W
. 1 0.6866 1 0.6866 334.394566520496
. 2 0.8438 1 0.1572 160.027701798135
. 3 0.8885 1 0.0446 85.2727231551893
. 4 0.9269 1 0.0383 79.0675929112509
. 5 0.9462 1 0.0193 56.0980576766604
. 6 0.9629 1 0.0167 52.1652898703596
. 7 0.9730 1 0.0101 40.5734600579774
. 8 0.9787 1 0.0056 30.4009241212978
. 9 0.9821 1 0.0034 23.6838493013292
. 10 0.9850 1 0.0029 21.8003728646808
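The `VE` and `W` columns are related in the usual way for an SVD of mean-centered data: the reported values are consistent with each component's variance explained being proportional to its squared singular value. The short check below uses only the first three rows copied from the output above; because the remaining singular values are not shown, it compares ratios rather than absolute values.

```python
# W and VE for the first three components, copied from the output above
W = [334.394566520496, 160.027701798135, 85.2727231551893]
VE = [0.6866, 0.1572, 0.0446]

# If VE_i is proportional to W_i^2, then VE_i / VE_1 should match (W_i / W_1)^2
w_ratios = [(w / W[0]) ** 2 for w in W]
ve_ratios = [ve / VE[0] for ve in VE]

for wr, vr in zip(w_ratios, ve_ratios):
    print(round(wr, 3), round(vr, 3))
```

The two columns agree to within the rounding of `VE` to four decimal places.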
In addition, the `J` factors give some useful meta-information about each feature (column in the original data):
destrat psc.db +PSC -r J | head
ID J CH F VAR
. C3~0.5~PSD C3 0.5 PSD
. C3~0.75~PSD C3 0.75 PSD
. C3~1~PSD C3 1 PSD
. C3~1.25~PSD C3 1.25 PSD
. C3~1.5~PSD C3 1.5 PSD
. C3~1.75~PSD C3 1.75 PSD
. C3~10~PSD C3 10 PSD
. C3~10.25~PSD C3 10.25 PSD
. C3~10.5~PSD C3 10.5 PSD
Info
We intend to produce a vignette on some applications of PSC in the near future.
PSC
Project new samples into an existing PSC space
Parameters
Parameter | Example | Description |
---|---|---|
`proj` | `proj=p1.txt` | Projection file from prior `--psc` `proj` output |
`cache` | `cache=c1` | Cache name (from prior `cache-metrics` performed this run) |
`norm` | | Standardize inputs given the mean/SD from the original (`--psc` sample) data |
Output
Individual-level output (strata: `PSC`):
Variable | Description |
---|---|
`U` | Component scores (left singular vectors U) |
Example
Continuing from the example above: based on N2 power spectra from 50 individuals,
we repeat the above command but also save the projection in the file `p1.txt`
(basically the V and W matrices from the SVD, along with the mean/SD of the original
features and a description of what they are, i.e. which channels, frequencies and metrics):
echo "spectra=psd.txt v=PSD nc=10 proj=p1.txt" | luna --psc -o psc.db
To project a new individual into this space, we need to generate the equivalent set of features,
and use Luna's cache mechanism to pass them from the `PSD` command to the `PSC` command, i.e. supplying
the relevant features X for this individual, which will be scaled by V and W to give the corresponding
U values (components) for this new individual.
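In matrix terms, if the original mean-centered data satisfy X = U W Vᵀ, then a new feature vector x can be mapped to component scores as (x − mean) V W⁻¹. The NumPy sketch below illustrates this under stated assumptions: the training matrix is simulated, `project` is a hypothetical helper, and Luna's own projection may differ in detail (for example, with `norm` the features would also be divided by the original SDs).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training data: 50 individuals x 8 features
X = rng.normal(size=(50, 8))
mean = X.mean(axis=0)
U, W, Vt = np.linalg.svd(X - mean, full_matrices=False)

nc = 3                     # number of components retained
V = Vt[:nc].T              # features x components
W_nc = W[:nc]              # matching singular values

def project(x_new):
    """Map feature vector(s) into the existing component space: (x - mean) V / W."""
    return (np.atleast_2d(x_new) - mean) @ V / W_nc

# Sanity check: projecting a training row reproduces its original U values
print(np.allclose(project(X[0]), U[0, :nc]))   # prints: True

# A new (simulated) individual gets scores on the same nc components
u_new = project(rng.normal(size=8))
print(u_new.shape)                             # prints: (1, 3)
```

The sanity check is the key property: projection is just the fitted decomposition applied to features that were not in the original matrix, which is why the new features must be defined in exactly the same way as the originals.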
luna s.lst 51 -o out.db -s ' MASK ifnot=NREM2 & RE
PSD sig=C3,C4 spectrum dB cache-metrics=c1
PSC proj=p1.txt cache=c1 '
Note the use of `cache-metrics` for `PSD`; the same cache (arbitrarily labelled `c1`
here) is attached to the `PSC` command.
The `PSC` command checks that all of the required features (i.e. `PSD` for the `C3`
and `C4` channels at a given set of frequencies) are available
in the cache; if they are not, it reports an error. However, `PSC` is not able to check
that other factors are similar (e.g. whether absolute or relative power, or raw versus
log-scaled power, was used, or whether power is only from N2 sleep). For the PSCs to be
interpretable in these new individuals, it is important to ensure that one is comparing like with like.