5.5. Dimension reduction

Many metrics derived from the analysis of the sleep EEG can be of relatively high dimension, with repeated values stratified by frequency, channel (or channel pair), sleep stage, etc. Here we consider a standard approach to dimension reduction -- principal components analysis (PCA) -- that is embedded within Luna.

Because of the typical context in which this is applied within Luna, the command that implements this is called PSC -- principal spectral components. That is, typically (although not necessarily) PCA (as implemented via singular value decomposition, SVD) is applied to the outputs of prior spectral analysis.

We'll start by looking at reducing sets of metrics derived from this N=20 dataset, considering a) multi-channel EEG power spectra and b) cross-channel connectivity metrics (PSI from the previous step).

For the PSD example only, we'll also consider the scenario where a comparable (and larger) external sample is used to generate the PC solution, which is then used to project the N=20 sample into that space. Here, we'll derive these using the remaining N=106 independent control subjects from the GRINS study.

Sample size and PCA

In this walkthrough, note that the number of observations (n) is typically very small relative to the number of variables, which, depending on the nature of the data and covariance structures, can impact the stability of extracted components. At the extreme, with only 20 observations, we cannot estimate more than 20 components from the data (nor would we want to, in these contexts). Typically, looking at only the first few components that explain the bulk of the variance will be sufficient. Alternatively, as we'll do in the second and third steps here, one can use component loadings calculated on comparable but larger samples, and project this smaller number of individuals into the pre-defined component space. (Note: the larger set of EDFs is not available as part of this walkthrough; we'll only use the results derived from those studies, applied to the N=20 set.)

PS*-what?

It just so happens that this step uses commands with superficially similar labels. To clarify:

  • PSD : power spectral density
  • PSI : phase slope index
  • PSC : principal spectral components

As noted above, here we'll be using PSC to reduce the outputs of prior PSD (single-channel power) and, second, PSI (cross-channel connectivity) runs.

PSD

We'll first run through the PSC steps for the power spectra metrics (from PSD) and then below turn to connectivity (from PSI).

Inputs

As previously described here, we'll generate N2 and REM power statistics for all channels, using just Welch (PSD) to speed things up:

luna c.lst  -o out/welch.db \
 -s ' FREEZE F1
      TAG stg/N2
      MASK ifnot=N2 & RE
      CHEP-MASK sig=${eeg} ep-th=3,3
      CHEP sig=${eeg} epochs & RE
      PSD sig=${eeg} dB spectrum min=0.5 max=20
      THAW F1
      TAG stg/R
      MASK ifnot=R & RE
      CHEP-MASK sig=${eeg} ep-th=3,3
      CHEP sig=${eeg} epochs & RE
      PSD sig=${eeg} dB spectrum min=0.5 max=20 '

We'll extract power metrics separately by stage:

destrat out/welch.db +PSD -r F CH stg/N2 > res/spectral.welch.n2.allchs
destrat out/welch.db +PSD -r F CH stg/R > res/spectral.welch.rem.allchs

We'll also extract band power metrics from the same PSD output (for later use when testing for association):

destrat out/welch.db +PSD -r B CH stg/N2 > res/spectral.welch.band.n2.allchs
destrat out/welch.db +PSD -r B CH stg/R > res/spectral.welch.band.rem.allchs

Our inputs for PSC (here considering N2 spectra only) are therefore in the form:

head res/spectral.welch.n2.allchs
ID      CH       F   stg        PSD
F01    Fp1     0.5    N2    24.1401
F01    Fp1    0.75    N2    21.2989
F01    Fp1       1    N2    20.0258
F01    Fp1    1.25    N2    18.6562
F01    Fp1     1.5    N2    17.2585
F01    Fp1    1.75    N2    16.0717
F01    Fp1       2    N2    14.7810
F01    Fp1    2.25    N2    13.9197
F01    Fp1     2.5    N2    13.5161
...

In total, this yields 4,503 metrics per individual (i.e. 57 channels * 79 frequency bins, 0.5 to 20 Hz in 0.25 Hz increments). We'll use PSC to reduce these to a small, nonredundant set of components.
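
As a quick sanity check of these dimensions, a minimal R sketch (reading the file generated above):

d <- read.table( "res/spectral.welch.n2.allchs" , header = TRUE )
table( d$ID )[1]          # 4503 rows (metrics) per individual
length( unique( d$CH ) )  # 57 channels
length( unique( d$F ) )   # 79 frequency bins (0.5 to 20 Hz in 0.25 Hz steps)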

N=20 PSC

The --psc command is a special command-line function that takes the outputs of other Luna commands -- or really any dataset that is structured by channel (CH), or channel pair (CH1 and CH2), and frequency (F). It expects inputs in long format, meaning that it will look for these labels in the header row (assuming tab-delimited text files as inputs).

Application of PSC to different input types

The values of these stratifying factors aren't actually used other than to define variables: i.e. you could feed hypnogram-based macro-architecture metrics into --psc. Although they don't have any intrinsic "channel" or "frequency" attributes, you can simply set those factors to arbitrary values, e.g. CH to - and F to 0.
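
For example, a minimal sketch of preparing such a file in R (the file and variable names here are hypothetical, for illustration only):

# hypothetical per-individual metrics, e.g. from HYPNO: ID, TST, SE, ...
h <- read.table( "res/hypno.metrics" , header = TRUE )
h$CH <- "-"  # arbitrary placeholder channel
h$F  <- 0    # arbitrary placeholder frequency
write.table( h , "res/hypno.psc-input" , sep = "\t" , row.names = FALSE , quote = FALSE )

This file could then be passed to --psc, e.g. with spectra=res/hypno.psc-input and the desired variables listed after v.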

The --psc command is a special one, in that (unlike most Luna commands) it does not require individual-level signals as inputs (i.e. EDF/annotations). Rather, the primary option for --psc is spectra, which takes one (or more) files (here res/spectral.welch.n2.allchs):

luna --psc -o out/psc-psd-n2.db \
     --options spectra=res/spectral.welch.n2.allchs nc=10 v=PSD f-lwr=0.75 norm

The other relevant options are:

  • nc : the number of principal components to extract

  • v : the name of the variable(s) in the input file(s) to extract, here just PSD (i.e. the input file may have many more columns that we don't want to include)

  • norm : standardize each variable to unit variance (as well as mean-centering) prior to SVD; this is equivalent to performing PCA on a correlation (rather than covariance) matrix

  • f-lwr : a lower bound on frequencies to read in (there is also an f-upr option); although we output power values from 0.5 Hz upwards in the step above, just to illustrate this option, here we'll reduce inputs to frequencies of 0.75 Hz or above

We also specify an output database (out/psc-psd-n2.db). On running the above, you should see the following in the console log:

  reading spectra from res/spectral.welch.n2.allchs
  restricting to 0.75 <= F
  converting input spectra to a matrix
  found 20 rows (observations) and 4446 columns (features)
  finished making regular data matrix on 20 observations
  after outlier removal, 20 observations remaining
  standardizing data matrix
  about to perform SVD...
  done... now writing output

Because we excluded the 0.5 Hz values in this particular example, we only have 4446 features per individual: this command constructs a matrix that is 20 rows (individuals) by 4446 columns (features/variables). There are various outlier/winsorization steps that can be put in place here, but we'll skip those for now. The primary activity is to perform the SVD and output results.
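
To make the underlying computation concrete, here is a minimal base-R sketch of the core steps (rebuilding the wide matrix from the long-format input; this is an approximation for illustration, not Luna's implementation itself):

d <- read.table( "res/spectral.welch.n2.allchs" , header = TRUE )
d <- d[ d$F >= 0.75 , ]                             # mimic f-lwr=0.75
d$J <- paste( d$CH , d$F , "PSD" , sep="~" )        # Luna-style variable IDs
X <- as.matrix( xtabs( PSD ~ ID + J , data = d ) )  # 20 x 4446 feature matrix
X <- scale( X )             # norm: mean-center and standardize each column
s <- svd( X )               # X = U D V'
ve <- s$d^2 / sum( s$d^2 )  # variance explained (cf. VE in the I table)
scores   <- s$u             # cf. U in the PSC table
loadings <- s$v             # cf. V in the I_J table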

In R, we'll take a look:

library(luna)
k <- ldb( "out/psc-psd-n2.db" )

(Ignore any message about reading data from 21 individuals: this is because the output includes sample-level metrics, which have a dummy (.) individual ID.)

There are four main tables generated by PSC:

lx(k) 
PSC : I J PSC I_J 

  • I : output per component, ordered by variance explained (VE) - i.e. how important is this component?

  • PSC : the scores on each component for each individual - i.e. how does individual X score on this component?

  • I x J : the loading of each variable on each component (the right singular vectors, i.e. the V matrix from SVD, or the PCA principal axes) - i.e. what does this component reflect?

  • J : some book-keeping information on what channels/frequencies, etc. each variable corresponds to (rather than duplicate this in the I x J output for every component)

Components

First, looking at I (a minor known issue: the ID column should be . for this table, but is currently set to the ID of the last person in the file; we omit it below):

k$PSC$I
  I       CVE INC           VE            W
  1 0.5411471   1 5.411471e-01 2.138057e+02
  2 0.6712794   1 1.301323e-01 1.048465e+02
  3 0.7733245   1 1.020451e-01 9.284479e+01
  4 0.8234218   1 5.009732e-02 6.505322e+01
  5 0.8589879   1 3.556614e-02 5.481254e+01
  6 0.8876806   1 2.869270e-02 4.923198e+01
  7 0.9109759   1 2.329523e-02 4.436036e+01
  8 0.9304785   1 1.950265e-02 4.058899e+01
  9 0.9444868   1 1.400827e-02 3.439964e+01
 10 0.9560990   1 1.161217e-02 3.131974e+01
 11 0.9661331   0 1.003420e-02 2.911407e+01
 12 0.9732768   0 7.143670e-03 2.456531e+01
 13 0.9794879   0 6.211121e-03 2.290586e+01
 14 0.9837834   0 4.295473e-03 1.904877e+01
 15 0.9879196   0 4.136194e-03 1.869227e+01
 16 0.9917914   0 3.871839e-03 1.808507e+01
 17 0.9949665   0 3.175043e-03 1.637707e+01
 18 0.9976846   0 2.718114e-03 1.515289e+01
 19 1.0000000   0 2.315397e-03 1.398538e+01
 20 1.0000000   0 1.465777e-31 1.112745e-13

Plotting the (cumulative) variance explained by each component:

d <- k$PSC$I

par(mfcol=c(1,2))

plot(d$VE, type="b", col=ifelse(d$INC, "black", "gray"),
     pch=20, xlab="Components", ylab="Variance explained" )

plot(d$CVE, type="b", col=ifelse(d$INC, "black", "gray"),
     pch=20, xlab="Components", ylab="Cumulative variance explained" )

img

So, these 10 components explain over 95% of the variance across the 4,000+ metrics (and only 4 components are needed to explain over 80%). Note that Luna will output the full set of components in this table (i.e. up to 20 here) but flag the requested components with INC set to 1. The other tables contain only the nc requested components, however.
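
For example, to find the smallest number of components needed to reach a given level of cumulative variance explained:

d <- k$PSC$I
min( which( d$CVE >= 0.95 ) )  # 10
min( which( d$CVE >= 0.80 ) )  # 4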

Scores

Turning next to the scores, which are defined per individual per component:

head( k$PSC$PSC ) 
  ID PSC           U
 F01   1 0.251936436
 F02   1 0.160775179
 F03   1 0.243140059
 F04   1 0.003124696
 F05   1 0.224909894
 F06   1 0.011261479
...

Each individual has 10 rows (PSC from 1 to nc, i.e. 10 here) with the score value encoded as U. Here we plot the first component, which explains 54.1% of the total variance, for the 20 individuals:

d <- k$PSC$PSC

plot( d$U[ d$PSC == 1 ] , pch = 20 ,
      col = rep( c("red", "blue" ) , each=10 ) ,
      ylab = "PC1" , xlab = "Individuals" )

abline( h = 0 )       

img
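
For downstream analyses it is often convenient to reshape these long-format scores into one row per individual. A minimal sketch (here pheno, a phenotype table with an ID column and an outcome dis, is hypothetical, for illustration only):

w <- reshape( k$PSC$PSC , idvar = "ID" , timevar = "PSC" , direction = "wide" )
head( w )  # ID, U.1, U.2, ... , U.10

# e.g. testing each (orthogonal) component against an outcome:
# m <- merge( w , pheno , by = "ID" )
# sapply( paste0( "U." , 1:10 ) ,
#         function(v) summary( lm( m$dis ~ m[,v] ) )$coef[2,4] )  # p-values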

Sign of components

Note that the scaling and sign/direction of components are arbitrary: i.e. running the same analysis with only very slightly different data could give essentially the same PC1 but with the opposite sign. Because of this, care must be taken when deciding whether two components from different analyses are "similar" or not.
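
For example, one simple heuristic is to align signs via the component loadings; a sketch with hypothetical objects (v1/v2 and u2 being loading and score vectors for the putatively matching component from two runs):

# if the loadings correlate negatively, flip the second solution's sign
if ( cor( v1 , v2 ) < 0 ) { v2 <- -v2 ; u2 <- -u2 }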

Loadings

But what does this component actually reflect in terms of multi-channel power spectra? We'll look at the loadings to answer this, in the I-by-J table, here called I_J:

head( k$PSC$I_J ) 
 I             J           V
 1  AF3~0.75~PSD 0.008801388
 1  AF3~1.25~PSD 0.014543511
 1   AF3~1.5~PSD 0.014888870
 1  AF3~1.75~PSD 0.015198765
 1 AF3~10.25~PSD 0.014694940
 1  AF3~10.5~PSD 0.014315266
...

Note that Luna has generated variable IDs (J) in the form CH~F~VAR (or, as below, CH1.CH2~F~VAR), where VAR represents the original base variable (i.e. in this case, PSD). To make these easier to work with, we can link the aforementioned book-keeping table, which expands this same information (for each variable/column in the data matrix) into a more analysis-friendly form:

head( k$PSC$J ) 
             J  CH     F VAR
  AF3~0.75~PSD AF3  0.75 PSD
  AF3~1.25~PSD AF3  1.25 PSD
   AF3~1.5~PSD AF3  1.50 PSD
  AF3~1.75~PSD AF3  1.75 PSD
 AF3~10.25~PSD AF3 10.25 PSD
  AF3~10.5~PSD AF3 10.50 PSD
...

We'll merge these two tables:

d <- merge( k$PSC$I_J , k$PSC$J , by="J" )

and extract the key columns (all rows have the same VAR value here):

d <- d[ , c("CH","F","I","V") ]

It is generally a good idea to ensure the right ordering after a merge() -- at least that rows are ordered by increasing F values for some of the plots below:

d <- d[ order( d$I , d$CH , d$F ) , ]
head( d ) 
  CH    F I           V
 AF3 0.75 1 0.008801388
 AF3 1.00 1 0.013593292
 AF3 1.25 1 0.014543511
 AF3 1.50 1 0.014888870
 AF3 1.75 1 0.015198765
 AF3 2.00 1 0.015333972
...

We now have 44,460 rows in d (i.e. the 4,446 variables for each of the 10 extracted components), where V gives the loading for that component/variable pair. We can plot these using the ltopo.xy() convenience function:

ylim = max( abs( range( d$V ) ) ) 
ylim = c( -ylim , ylim ) 
par(mar=c(0,0,0,0))

ltopo.xy(d$CH, x=d$F, y=d$V, f= d$I==1,
         yline=0, xline=13, ylim=ylim, ylab="V", mt="PSC1")

img

What does this tell us? It is perhaps easier to explain the above plot in reference to the same plots generated for the second and third components, which explain 13.0% and 10.2% of the variance respectively:

img

img

So, roughly speaking:

  • PSC1 reflects a general total power effect: the coefficients are uniformly positive and there is not a lot of variability between channels. (Indeed, we'd likely find there isn't much variability if we were to calculate this across different sleep/wake stages, not just N2.) We do see a systematic dip in the coefficients (towards the y=0 line) around the sigma band: this is effectively saying that in N2 sleep, in this population/sample, variability in spindle activity is also a significant source of person-to-person variability, but one that doesn't track with this total power effect - thus, that extra spindle-related variability is effectively removed from this component.

  • PSC2 reflects an inverse association between slower frequency activity (i.e. positive weights < 10 Hz, with a peak around 7 Hz, denoted by the vertical line) and faster activity (i.e. negative weights > 10 Hz)

  • PSC3 reflects a) an anteroposterior gradient of declining power, especially for slower frequencies (e.g. compare OZ and FPZ) as well as b) non-zero loadings for sigma/spindle activity, which also shows an anteroposterior gradient (e.g. compare AFZ and POZ, which load onto slower and faster sigma activity respectively, as indicated by the vertical lines at 12 and 14 Hz).

The fourth component -- that explains 5% of the variance -- shows a clear inverse relationship between slow/delta power and sigma power:

img

Some important points to make/reiterate:

  • the proportion-of-variance-explained metrics are somewhat artificial, as slower activity tends to span a smaller number of bins than faster activity (e.g. beta); also note that if we had run this without norm, then slower activity would have explained a lot more of the total variance, reflecting the 1/f slope of the EEG

  • the precise interpretation of these components, based on an N=20 sample, should not be taken too literally

  • if performed on different samples, these components would be different (either superficially or fundamentally)

  • by definition, PCA defines axes/components based on their orthogonality: as such, they provide an efficient, nonredundant description of variation, but we should not necessarily expect any type of one-to-one mapping onto the physiologically distinct processes that generate this variability

So, what might the value of PSC analysis be? Think of PSCs as data-driven complements to classical descriptors of variability in the sleep EEG, including band power but also spectral slope, intercept and other similar types of metrics. Whereas we've used the Welch PSD here, one could equivalently use the periodic component of the power spectra (from IRASA, as seen earlier) to generate alternative PSCs. Unlike these classical metrics, PSCs can in theory capture some of the more nuanced relationships between different frequency bands, as well as (when applied in this hd-EEG case) topographical effects. Further, this approach naturally extends to include other types of metrics where it might not be obvious how to summarize thousands of inter-related effects in a quantitative manner.

Overall, the primary utility of PSC analysis is more likely in the context of generating predictive biomarkers, visualizations, or certain types of data-driven normalizations:

  • internally, the POPS stager uses the same PSC approach, but applied over epochs rather than over individuals, to build predictors that are subsequently used in a machine-learning (gradient boosting) approach to predict stage labels.

  • you can think of PSC as an upfront approach to handling multiple testing; as we'll see below, when associating sleep metrics with outcomes, we can handle multiple testing between dependent measures empirically, to generate single-metric adjusted test statistics that appropriately handle the correlational structure (and so are not unduly conservative). It is also possible to adopt cluster-based approaches, as we'll see, that might increase power. Alternatively, rather than a "test-then-correct" approach, PSC provides a "reduce-then-test" approach, which might be more efficient under some conditions (i.e. the orthogonality simplifies multiple-test correction). Given a positive association, one can then investigate which constituent features drive that effect, analogous to post-hoc tests following an initial omnibus test.

  • one can imagine using PSCs to provide alternative approaches to normalizing EEG data: it is possible to reconstruct the original signal based on all the components, or to do so leaving out certain components. Typically, in data-reduction contexts it is the components that explain less variance that are dropped. However, one can imagine, for example, dropping the first "total power" component and reconstructing the power spectra (see the sketch after this list), as an alternative to normalization by relative power (which inherently introduces relationships between power at different frequencies, if based on the sum/total power).
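
Continuing the base-R sketch from above (using X and s from the SVD illustration), a minimal sketch of reconstructing the spectra while dropping the first ("total power") component (again, an illustration only, not Luna's implementation):

keep <- 2:length( s$d )
Xhat <- s$u[ , keep ] %*% diag( s$d[ keep ] ) %*% t( s$v[ , keep ] )
# undo the standardization to return to the original dB scale
Xhat <- sweep( Xhat , 2 , attr( X , "scaled:scale" ) , "*" )
Xhat <- sweep( Xhat , 2 , attr( X , "scaled:center" ) , "+" )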

N=106 solution

Although you can't run this as part of the walkthrough, we've generated PSD values and performed PSC analysis for the remaining 106 independent subjects in GRINS. Note that we add a proj option here, to make Luna save the solution, so that other (new) individuals can be scored on (projected into) this solution (don't run this):

# luna --psc -o out/n106-psc-psd-n2.db \
#      --options spectra=n106/psd.n2 v=PSD nc=10 proj=work/data/auxiliary/n106/psd.n2.proj

If we generate the same variance explained plots, we see a broadly similar pattern to the N=20 case:

img

We'll return to projection in later planned steps, but for now we'll give the Luna code one would use to take an existing projection (as shown above) and apply it to a new sample, i.e. to generate scores per individual for the same components. There is no need to run this particular command (the results won't be used subsequently), but we include it here for reference.

luna c.lst  -o out/psc-psd-n2-proj106.db \
  -s ' CACHE cache=c1 record=PSD,PSD,F,CH uppercase-keys
       MASK ifnot=N2 & RE preserve-cache
       CHEP-MASK ep-th=3,3
       CHEP epochs & RE preserve-cache
       MASK random=50 & RE preserve-cache
       PSD sig=${eeg} dB spectrum min=0.5 max=20 
       PSC proj=work/data/auxiliary/n106.psd.n2.proj cache=c1 norm '

Of note

  • this uses Luna's cache mechanism to store the PSD values (the uppercase-keys option is to handle potential differences in channel names due to case, i.e. when aligning with the stored solution in the PSC step)

  • we restrict to N2, do some cleaning and estimate Welch spectral power (here for 50 random epochs)

  • the key step is then the PSC command, where we supply a) the cache that contains the necessary values for the EDF being processed, and b) the prior solution (here derived from an independent sample); we also add norm as before

Note that the projection use of PSC doesn't fit a new PCA model; also, unlike --psc, it works on an attached EDF, which is why it features as a "standard" Luna command here, expecting a sample list of EDFs, rather than as a one-off special command. We'll return to further applications of PSC and projection in future steps.
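
In terms of the underlying linear algebra, projection amounts to applying the stored standardization and rotation to the new data. A minimal sketch with hypothetical object names (centers, scales and the loadings V would come from the stored reference solution; any additional scaling conventions are handled by Luna's stored solution itself):

project_psc <- function( Xnew , centers , scales , V ) {
  # Xnew: new-sample feature matrix, aligned to the reference CH~F~VAR variables
  sweep( sweep( Xnew , 2 , centers , "-" ) , 2 , scales , "/" ) %*% V
}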

PSI

What about the connectivity metrics? Whereas traditional power metrics can be summarized quite readily by topo-plots of classical band power, etc., it is less straightforward to summarize tens of thousands of highly inter-related connectivity metrics. Here we'll consider the application of the same PSC approach as above.

Inputs

If you've followed the previous connectivity step, then you should already have a file res/psi2.n2 containing N2 PSI metrics for all channel pairs, across nine frequency windows. That step yielded 14,364 metrics per individual (i.e. 57 * 56 / 2 channel pairs * 9 frequency bins). Again, the goal is to reduce this to an orders-of-magnitude smaller number whilst retaining most of the information on individual differences.

The N2 PSI values for pairs of channels are in res/psi2.n2 (skipping other columns of output, as we'll only focus on the PSI variable):

head res/psi2.n2 
ID      CH1     CH2     F                   PSI     
F01     AF3     AF4     3    -0.416104452343918 
F01     AF3     AFZ     3     -1.55691457101163  
F01     AF3      C1     3      3.27365730289692   
F01     AF3      C2     3      2.54138740466229   
F01     AF3      C3     3      1.84440293093488   
F01     AF3      C4     3      1.39577670245724   
F01     AF3      C5     3     0.198660734953541  
F01     AF3      C6     3     0.485771012741785  
F01     AF3     CP1     3      4.72866760705559   
...

N=20 PSC

Here we apply PSC to the N=20 dataset:

luna --psc -o out/psc-psi-n2.db \
     --options spectra=res/psi2.n2 nc=10 v=PSI 

The only differences from the application to the spectral power (PSD) metrics above are that:

  • here we do not add norm (unlike the spectral power measures, the PSI metrics have a more standardized distribution to start with, although adding norm shouldn't make much difference; see the sketch after this list);

  • also, as PSI has a different range of frequencies (nine 4 Hz windows centered from 3 to 19 Hz in this application) we'll remove the f-lwr option.
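
In base-R terms, the distinction between the two pre-processing modes is simply (where X is a numeric feature matrix):

Xc <- scale( X , center = TRUE , scale = FALSE )  # default: mean-centering only (this PSI run)
Xs <- scale( X , center = TRUE , scale = TRUE )   # with norm: correlation-matrix PCA (the PSD run above)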

We still request 10 components (which is obviously arbitrary). We should see similar output in the console log:

  reading spectra from res/psi2.n2
  converting input spectra to a matrix
  found 20 rows (observations) and 14364 columns (features)
  finished making regular data matrix on 20 observations
  after outlier removal, 20 observations remaining
  mean-centering data matrix
  about to perform SVD...
  done... now writing output

Components

Extracting the components in R:

library(luna)
k <- ldb( "out/psc-psi-n2.db" )

Looking at the variance explained per component:

d <- k$PSC$I
par(mfcol=c(1,2))
plot( d$VE , type="b" , col = ifelse( d$INC , "black", "gray" ) , pch = 20 , xlab = "Components" , ylab = "Variance explained" )
plot( d$CVE , type="b" , col = ifelse( d$INC , "black", "gray" ) , pch = 20 , xlab = "Components" , ylab = "Cumulative variance explained" )

img

In contrast to the PSD example above, the top K PSCs do not explain as much of the variance, largely because for the PSI there is no equivalent of the huge first "total power" effect we saw for absolute power. Still, there are six components that each account for over 5% of the total variance (and collectively, 65% of the total variance).

Scores

Looking at the scores on the first component for the N=20 individuals:

d <- k$PSC$PSC

plot( d$U[ d$PSC == 1 ] , pch = 20 ,
      col = rep( c("red", "blue" ) , each=10 ) ,
      ylab = "PC1" , xlab = "Individuals" )
abline( h = 0 )       

img

Loadings

And finally, we can investigate what each component reflects: i.e. what are the major (independent) sources of variation in patterns of NREM functional connectivity as indexed by the PSI?

Here, we'll leverage the same visualization functions we used to plot the group averages, etc. for these measures, but instead applied to the loadings (V) from PSC. We follow the same steps as above, but split the pair information into two channels (e.g. a CH value of FZ.CZ becomes CH1 and CH2 values of FZ and CZ respectively); further, we double-enter the pairwise metrics, as described in the previous section:

d <- merge( k$PSC$I_J , k$PSC$J , by="J" )
# split "FZ.CZ"-style pair labels into CH1/CH2
# ( [c(T,F)] and [c(F,T)] take alternating elements of the unlisted splits )
d$CH1 <- unlist( strsplit( d$CH , "." , fixed = T ) )[c(T,F)]
d$CH2 <- unlist( strsplit( d$CH , "." , fixed = T ) )[c(F,T)]
d <- d[ , c("CH1","CH2","F","I","V") ]
d <- d[ order( d$I , d$CH1, d$CH2 , d$F ) , ]
head(d)
 CH1 CH2  F I            V
 AF3 AF4  3 1 -0.003705147
 AF3 AF4  5 1 -0.004534031
 AF3 AF4  7 1 -0.001675343
 AF3 AF4  9 1  0.002476905
 AF3 AF4 11 1  0.002960878
 AF3 AF4 13 1 -0.000786632
 ...

Double-entering:

# note: the PSI is directional, so its sign flips when swapping CH1 and CH2
d <- rbind( d ,
            data.frame(CH1=d$CH2, CH2=d$CH1, F=d$F, I=d$I, V=-d$V ) )

This gives a data frame with 287,280 rows, as expected (10 components by 9 frequency windows by ( 57^2 - 57 ) double-entered cross-channel pairs).
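
As a quick check of that arithmetic:

nrow( d )               # 287280
10 * 9 * ( 57^2 - 57 )  # components x frequency windows x ordered pairs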


We'll use the same arconn() plotting function:

source("http://zzz.bwh.harvard.edu/dist/luna/arconn.R" )
source("arconn.R")
for (i in 1:2) { 
q = 0.025

x <- d[ d$I == i , ] 
th = quantile( x$V , 1 - q ) 
zr <- range( x$V )
cat(i,"\n\n")

par(mfcol=c(1,9) , mar = c(0,0,1,0) ) 
 for (f in seq(3,19,2) )
   arconn( x$CH1 , x$CH2 , x$V ,
           flt = x$F == f & x$V > th,
           zr = zr , 
           head=F, cex=1, directional = T , 
           title = paste( "PSC",i,": ",f-2,"-",f+2,"Hz",sep="") ) 
}

Looking at the first two components' loadings: whereas the first component is characterized predominantly by a flow of directed connectivity from frontal/central to parietal/occipital regions in the 9-13 Hz range:

img

the second component is characterized by theta connectivity patterns showing the opposite direction of flow, as well as qualitatively distinct delta connectivity patterns in parietal and occipital regions.

img

It is important to remember that these are statistical reductions of the connectivity data, which are potentially specific to this (small) sample. Nonetheless, given that the first two components explain almost 40% of the total variance, this can still be a useful descriptive tool, highlighting major trends across a complex dataset.

Combined inputs

Although we won't explore this further here, there is no restriction that the inputs to PSC come from the same analysis, or even the same class of analysis. For example:

  • we could combine power values from N2, N3 and REM sleep to create a set of composite PSCs that in principle could capture stage-specific similarities and differences (here, one would label the variables PSD_N2, PSD_N3 and PSD_R in the headers of three similarly formatted files, specify all three files as a comma-delimited list after spectra, and add v=PSD_N2,PSD_N3,PSD_R; each row would then contain concatenated spectra across the three stages, tracked by VAR in the output)

  • we could alternatively combine the PSD and PSI values in a single analysis (or include cross-spectral power, i.e. CSPEC from COH, along with the individual channel spectra); if combining two qualitatively different types of metric (here PSD and PSI), it would be important to add the norm option, so that arbitrary unit differences between metrics do not unduly influence the solution:

     luna --psc -o out/psc-combined-n2.db \
          --options spectra=res/spectral.welch.n2.allchs,res/psi2.n2 nc=10 v=PSD,PSI norm
    
    reading spectra from res/spectral.welch.n2.allchs
    reading spectra from res/psi2.n2
    converting input spectra to a matrix
    found 20 rows (observations) and 18867 columns (features)
    finished making regular data matrix on 20 observations
    after outlier removal, 20 observations remaining
    standardizing data matrix
    about to perform SVD...
    done... now writing output
    

Summary

In this section we've introduced PSC as a tool for dimension reduction of high-dimensional sleep metrics:

  • we applied it to multi-channel N2 spectral data, as well as pairwise sensor-space connectivity metrics (PSI)

  • the resulting PCA structures show that often only a handful of components are necessary to explain the bulk of the variance in these measures

  • we've considered approaches to visualizing component loadings, i.e. to get a sense of what a given component captures

  • we've also noted that solutions can be transferred between (comparable) independent samples

A fuller discussion of the potential benefits and caveats of this type of approach is beyond the scope of this walkthrough. We plan to add a future section in which we explore applications of PSC/PCA further.


In the next section we'll consider spindle/SO detection and quantification.