Analyzing human sleep EEG: a methodological primer with code implementation

Drs. Roy Cox and Juergen Fell recently published an excellent review/tutorial manuscript in Sleep Medicine Reviews, providing a useful overview of some common approaches -- and associated pitfalls -- for the analysis of sleep EEG data. Of note, the authors also provided open data and Matlab code, thereby enabling readers to reproduce and better understand the methods.

Cox R & Fell J (2020) Analyzing human sleep EEG: A methodological primer with code implementation, Sleep Medicine Reviews. https://doi.org/10.1016/j.smrv.2020.101353

In the vignette below (co-authored with Dr. Roy Cox), we'll use Luna to look at the data from this manuscript (labelled C&F), attempting to reproduce some of the key Figures. By comparison against the transparent Matlab code C&F provide, our intention here is to 1) illustrate using Luna in the context of high-density sleep EEG data, and 2) hopefully cast a little light onto the admittedly black box nature of various Luna commands. A strong theme of C&F is to point out the ways in which seemingly minor methodolgocial differences can have marked effects on the results of analyses: in this spirit, we step through some of these methods and try to make Luna's particular implementation clear, as well as validating our results against C&F's standard.

Scope of this vignette

We do not attempt to cover all aspects of C&F in this vignette (for example, the discussion of statistical hypothesis testing, or any in-depth consideration of the impact of reference choice). Furthermore, this vignette is not intended to be a guide to best practices when working with sleep data. Rather, here we have a more limited goal of simply making connections between quite different software implementations (arguably, each with their own strengths and weaknesses) that can hopefully achieve comparable results.

Data

C&F provided EEG recordings (each comprising 58 EEG channels) on three individuals, for both N2 and N3 sleep, using three referencing schemes: linked mastoids, the common average and surface Laplacian. The original 18 (i.e. 3 x 2 x 3) datasets are distributed by C&F in Matlab/EEGLAB format (available here). As Luna primarily works with EDFs, we've converted these data to EDF and posted them on the NSRR. To follow the tutorial below, you should download these EDFs, which are split into three archives according to the reference used. As almost all of this tutorial uses the linked mastoid versions of the data, you can start by only downloading the first one (linked mastoid reference):

Archive	Contents	Filename
Linked mastoid EDFs (primary)	6 EDFs (3 individuals for N2 & N3)	`cox_fell_edfs_mast.zip`
Common average EDFs	6 EDFs (3 individuals for N2 & N3)	`cox_fell_edfs_ave.zip`
Surface Laplacian EDFs	6 EDFs (3 individuals for N2 & N3)	`cox_fell_edfs_lap.zip` - not posted

After downloading the zip archive to your working directory, extracting the contents should generate a subfolder edfs containing six EDFs:

unzip cox_fell_edfs_mast.zip

Code

C&F provide Matlab code to reproduce all of the Figures in their manuscript; for convenience, C&F's Matlab code (version released 2020-07-03) is copied here (separate from the original data files). For brevity, in the examples below we only show some of the primary Matlab commands: in order to fully follow C&F's workflow, you'll need to run the entire scripts (and see the readMe.txt in the code folder). Each section below links to the original, full version of each script.

In particular, below we consider the following, starting simple and then building up to more sophisticated analyses, includng empirical, surrogate-based coupling and connectivity analyses:

Looking at raw signal data
Spectral analysis using the DFT, the Welch algorithm and topographical visualization
Examples of applying Hilbert and wavelet transformations
Phase-amplitude coupling (dPAC) analysis using surrogate time-series
Connectivity analysis with weighted Phase Lag Index (wPLI), using surrogate time-series

Each section focuses on reproducing Figures and results from C&F as closely as possible. We do not explain the commands or methods in detail -- the C&F manuscript does a good job of that, as well as pointing to the relevant core literature. Rather, our aim is to implement some of the methods described using Luna, with a focus on better understanding any differences in implementations.

In a future, follow-up vignette, we will go on to use these data to explore other Luna commands including areas not covered by C&F: e.g. the impact of filter choice, other connectivity metrics, multi-channel artifact detection and interpolation, and time-series clustering to reveal epoch-level dynamics and channel-level substructure. Stay tuned...

In the text below, Matlab code and Figures taken (with the authors' permission) directly from C&F are shown in purple boxes like this:

e.g. C&F Matlab code or Figure

%load data
EEG=pop_loadset([dataFolder subName filesep subName '_N3_mast.set']);

In contrast, Luna commands (entered via the terminal/command-line), any R statements, as well as any output, are listed as follows:

luna s.lst -o o.db -s HEADERS

Note that this vignette is structured for didactic purposes -- exposing some of the underlying methods implemented in Luna including FFT, filter-Hilbert and wavelet methods -- rather than for showing typical best practices in how to use Luna. That is, below we'll write signals to files on disk, and use fairly 'low level' commands such as CWT and HILBERT, which would not typically be done in practice. Rather, those types of commands are wrapped into higher-level commands (such as SPINDLES or CC, etc). That is, if this is your first exposure to Luna, you'll get a better sense of the general workflows by following the tutorials. Also note that version Luna v0.24 is required to be able to follow all of the examples below.

Setting up

To generate a Luna sample-list, use the --build command:

luna --build edfs/ > s.lst

The file s.lst is a simple text file, with six rows corresponding to the six EDFs. We can examine the contents of the EDFs via the DESC command, here applied just to the first EDF (row of s.lst):

luna s.lst 1 -s DESC

EDF filename      : ./edfs/pp1_N2_mast.edf
ID                : pp1_N2_mast
Clock time        : 13.11.05 - 14.42.05
Duration          : 01.31.00
# signals         : 59
Signals           : Fp2[400] Fp1[400] AF8[400] AF4[400] AFz[400] AF3[400]
                    AF7[400] F8[400] F6[400] F4[400] F2[400] Fz[400]
                    F1[400] F3[400] F5[400] F7[400] FC6[400] FC4[400]
                    FC2[400] FCz[400] FC1[400] FC3[400] FC5[400] T8[400]
                    C6[400] C4[400] C2[400] Cz[400] C1[400] C3[400]
                    C5[400] T7[400] TP8[400] CP6[400] CP4[400] CP2[400]
                    CPz[400] CP1[400] CP3[400] CP5[400] TP7[400] P8[400]
                    P6[400] P4[400] P2[400] Pz[400] P1[400] P3[400]
                    P5[400] P7[400] PO8[400] PO4[400] POz[400] PO3[400]
                    PO7[400] O2[400] Oz[400] O1[400] Status[400]

As made evident by the ID, this EDF contains data on the first individual (pp1), for N2 sleep using a linked mastoid referencing scheme. There are 58 EEG channels (plus a Status channel which we will ignore), each with a sampling rate of 400 Hz.

Hint

Instead of using a number to indicate a row of the sample list, you can also specify an individual by their ID:

luna s.lst pp1_N2_mast -s DESC

See the main documentation for more details.

Extracting raw signals

In Supplementary Figure A-1B, C&F take a 20-second epoch of data, plot the raw signal, and subsequently apply spectral analysis. They use linked mastoid N3 data from subject pp3, selecting a 20-second window (from 1660 to 1680 seconds from the start of the EDF) of the Cz channel. Here are script excerpts used to setup and load these data; C&F plot the signal as below:

C&F Supplementary Figure A-1 (B) : illustrative 20-seconds of raw sleep EEG (Cz)

Using Luna to achieve the same goal, we will first extract the raw signal to a text file (cz.txt) and then use R to plot it. Here we a) set 20-second EPOCHs, b) MASK all epochs that do not span the interval 1660-1680 seconds, and then c) write this signal to a file, using the MATRIX command:

luna s.lst pp3_N3_mast -s 'EPOCH dur=20 & 
                           MASK sec=1660-1680 & 
                           MATRIX sig=Cz minimal file=cz.txt'

Because the default epoch length is 30 seconds in Luna, we had to explicitly tell it to use 20-second instead. Then, from within R, for example, we can generate a simple plot:

plot( (1:8000/400), scan( "cz.txt" ),
      type = "l", ylab = "Cz", xlab = "Seconds", col="darkorange" )

Alternatively: using lunaR

To simplify matters in this vignette, we are not going to use the lunaR package: instead, we will swap between command-line luna (lunaC) and the R package for any visualization, as above. However, all the steps performed here could be performed using only lunaR if so desired, and in some instances this could be simpler/preferrable. For example: here we attach the correct EDF, extract a data-frame d that corresponds to 20-seconds of Cz, and then plot it:

library(luna)
sl <- lsl( "s.lst" )
lattach( sl , "pp3_N3_mast" )
d <- ldata.intervals( chs = "Cz" , i = list( c( 1660 , 1680 ) ) ) 
plot( d$SEC , d$Cz , type="l" , ylab="Cz" , xlab="Seconds" , col="darkorange" )

Spectral analyses

Following C&F, we next consider spectral analysis using the discrete Fourier transform (DFT) (implemented using the fast Fourier transform, FFT) first on a simulated signal, then on the above interval of real EEG data (20-seconds of Cz from N3). We'll then use the Welch method (rather than raw DFT) to estimate a better PSD spectrum.

Simulated data

Hint

If you're familiar with DFT/FFT basics skip this section, which purposefully duplicates the introductory example material in C&F. Skip ahead to where we use Luna to do DFT/FFT on real sleep signals

C&F generated a simple, simulated signal using this code, and then applied this code, invoking Matlab's fft() function to estimate the (normalized) amplitude spectrum of this signal: the key lines are copied below:

powerDefined.m excerpt : estimating the amplitude spectrum of simulated data

%fft function does not return actual frequencies, so compute
sineFreqs_fft=linspace(0,1,numSamplesSine/2+1)*NyLimit;

%perform fft
sineFft_twoSided=fft(sineWaves); %complex-valued, two-sided Fourier spectrum

%for single-sided spectrum, we take first half of results (positive
%frequencies), plus 1 for DC
sineFft_oneSided=sineFft_twoSided(1:numSamplesSine/2+1);

%non-normalized amplitude spectrum
sineAmp_oneSided=abs(sineFft_oneSided);

%normalized spectrum (by data length, and multiplied by two include
%negative frequencies)
sineAmpNorm_oneSided=2*sineAmp_oneSided/numSamplesSine;

They plot the simulated signal, which has sinusoidal oscillations at 1 and 13 Hz, along with the normalized amplitude (sineAmpNorm_oneSided):

C&F Supplementary Figure A-1 (A, F) : simulated signal and amplitude from DFT/FFT analysis

Here, we'll repeat these steps, but first using R to generate the simulated signal, writing it to a text file s1.txt:

sRate <- 400
sPeriod <- 1 / sRate
sineLength <- 5
sineTime <- seq(0 , by = sPeriod , length.out = sineLength * sRate )

# SO-like sine
sineFreq_1 <- 1
sineAmp_1 <- 50
sineWave_1 <- sineAmp_1 * sin( 2 * pi * sineFreq_1 * sineTime)

# spindle-like sine
sineFreq_2 <- 13
sineAmp_2 <- 10
sineWave_2 <- sineAmp_2 * sin( 2 * pi * sineFreq_2 * sineTime)

sineWaves <- sineWave_1 + sineWave_2
plot( sineTime , sineWaves , type="l" ) 
write( format( sineWaves , digits=20 ) , file="s1.txt" , ncolumns = 1 )

That is, s1.txt contains a signal 5 seconds in duration, sampled at 400 Hz (i.e. 2000 data points). It comprises a single signal that is a composite of a slow 1 Hz sinusoidal oscillation of large amplitude (50) and a faster 13 Hz oscillation of relatively smaller amplitude (10).

We can now perform the same DFT/FFT analysis using Luna. Although the DFT/FFT is core to many methods, in practice Luna uses Welch's algorithm (in PSD) or multi-tapers (in MTM), either of which are generally superior for sleep EEG data, as C&F note.

For the purpose of this vignette however, we shall uncover a hidden Luna option (actually an internal, debugging feature), which provides direct access to the FFT algorithm via the -d option, albeit applying it only to a single signal read from standard input. (The FFT command does the same for EDF signals.) From the section above, we've already written the signal to the file s1.txt, and so we can run the following, where the second parameter is the sample rate, 400 Hz:

luna -d fft 400 < s1.txt

N        F           RE            IM   UNNORM_AMP     NORM_AMP          PSD
0        0  -8.8543e-13             0   8.8543e-13  4.42715e-16  9.79982e-31
1      0.2 -1.37534e-12  -3.51501e-13  1.41955e-12  1.41955e-15  5.03781e-30
2      0.4 -9.66871e-13  -8.34902e-13  1.27746e-12  1.27746e-15  4.07975e-30
3      0.6 -1.27042e-12   1.66038e-12  2.09066e-12  2.09066e-15  1.09271e-29
4      0.8 -1.50445e-12   1.66702e-12  2.24552e-12  2.24552e-15  1.26059e-29
5        1 -1.23403e-11        -50000        50000           50         6250
6      1.2  1.90059e-12  -6.63079e-12  6.89779e-12  6.89779e-15  1.18949e-28
7      1.4  3.44644e-12  -3.94652e-12  5.23955e-12  5.23955e-15  6.86323e-29
...
63    12.6  2.03742e-12  -2.06926e-11  2.07926e-11  2.07926e-14  1.08083e-27
64    12.8 -8.86146e-14  -4.31716e-11  4.31717e-11  4.31717e-14  4.65949e-27
65      13  1.40656e-10        -10000        10000           10          250
66    13.2  7.86557e-13   4.58913e-11   4.5898e-11   4.5898e-14  5.26657e-27
67    13.4  1.78749e-12   2.01059e-11  2.01852e-11  2.01852e-14  1.01861e-27
...
997  199.4   1.1755e-12  -4.71233e-12  4.85673e-12  4.85673e-15  5.89696e-29
998  199.6 -9.80151e-13  -3.55298e-12  3.68569e-12  3.68569e-15  3.39608e-29
999  199.8  3.88777e-13   4.94795e-12   4.9632e-12   4.9632e-15  6.15834e-29
1000   200  2.47011e-12  -5.04871e-28  2.47011e-12  2.47011e-15  7.62679e-30

This outputs 1, 001 rows (plus a header), showing a single side of the symmetric DFT (i.e. ignoring negative frequencies which, as the input data are real-valued, are redundant).

Column F (sineFreqs_fft in the Matlab code) contains the frequencies, ranging from 0 (DC) to 200 (the Nyquist frequency, i.e. half the sampling rate). The difference between each frequency (i.e. the spectral resolution) is 0.2 Hz (i.e. which equals the sampling rate over the number of sample points in the DFT, 400/2000).

The columns RE and IM are the real and imaginary parts of the Fourier transform, which correspond to the complex-valued sineFft_twoSided in the Matlab code.

The column UNNORM_AMP is the unnormalized amplitude of the Fourier transform (sineAmp_oneSided in the Matlab code): this is the absolute value of the corresponding complex number.

The column NORM_AMP (sineAmpNorm_oneSided in the Matlab code) is the amplitude normalized by dividing by 2000 (the number of data points) and, for non-DC components, multiplying by a factor of 2 (to correctly scale after ignoring the negative frequency side of the spectrum).

Most values are very small (i.e. due to trivial numerical imprecisions) and are effectively 0, as would be expected for this simulated signal. Only for 1 Hz and 13 Hz frequency bins do we see substantial non-zero values. Note, the normalized amplitudes are 50 and 10 for the 1 Hz and 13 Hz frequency components respectively, which mirror the original values used in the simulation code (sineAmp_1 and sineAmp_2).

Finally, Luna also outputs the power spectral density (PSD), which has units of micro-volts squared per Hz. To obtain this we instead i) square the original, unnormalized amplitude, ii) double all non-DC points and then iii) normalize by the sampling frequency as well as the number of sample points 1/Fs.N (i.e. dividing by 400 * 2000 = 800,000) to get the PSD, which gives values of 6250 and 250 for 1 Hz and 13 Hz, respectively.

Contrasting with C&F, we see similar results. If the above output had been saved to the file out/sim.fft, we can make the same plot of the normalized amplitude spectrum:

d <- read.table( "out/sim.fft" , header = T ) 

plot( d$F[ d$F <= 30 ] , d$NORM_AMP[ d$F <= 30 ] , type="l" ,
      col="darkgreen" , xlab="Frequency (Hz)" , ylab="Normalized amplitude" )

FFT of real data

C&F next apply DFT to the 20-second interval of real data extracted above (i.e. Figure A-1), using the same basic code as above (see here), resulting in the following PSD spectrum (left panel):

C&F Supplementary Figure A-1 (G,H) : estimating PSD spectra with raw DFT/FFT and Welch's method

We can use the same -d fft option, but feeding in the cz.txt channel generated earlier:

luna -d fft 400 < cz.txt > out/fft.out

Plotting the spectrum, the results are identical to the Matlab code (nb. only showing amplitude and PSD, not power):

fft <- read.table( "out/fft.out" , header = T ) 

plot( fft$F ,log10( fft$PSD ) , type="l" , xlim=c(0,30) , ylim=c(-5,3) ,
      col="blue" , ylab="amplitude (green) / PSD (blue)" , xlab="Freq (Hz)" ) 

lines( fft$F ,log10( fft$NORM_AMP ) , type="l" ,
       xlim=c(0,30) , ylim=c(-5,3) , col="darkgreen" )

Welch algorithm

The plot of the Figure A-1 H (shown above) in C&F gives the PSD spectra estimated using Welch's method, cutting up the signal into small, overlapping chunks and averaging PSD over these chunks to provide a more reliable final result, which has lower spectral resolution, but reduced variance of the estimates. This is the code they use for the 20-second segment; and this for the entire Cz channel (all N3 for this individual).

Here we'll apply Luna's default PSD command (which implements Welch's method) on the same 20-second segment, working directly with the EDF and saving the output in a Luna-format database (with the -o option):

luna s.lst pp3_N3_mast -o out/welch.db \
      -s 'EPOCH dur=20 & 
          MASK sec=1660-1680 & RE & 
          PSD sig=Cz spectrum '

To view the results, we use the destrat utility that accompanies Luna (only showing the relevant rows here):

destrat out/welch.db

--------------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors           : levels        : variables 
----------------:-------------------:---------------:---------------------------
                :                   :               : 
  [PSD]         : F CH              : 99 level(s)   : PSD
                :                   :               : 
  [PSD]         : B CH              : 8 level(s)    : PSD RELPSD
                :                   :               : 
----------------:-------------------:---------------:---------------------------

That is, amongst other things, the output database contains a variable PSD stratified by both frequency F and channel CH, which corresponds to the spectrum for Cz. We can view this output as follows:

destrat out/welch.db +PSD -r F CH

ID             CH     F      PSD
pp3_N3_mast    Cz     0.5    407.176141081321
pp3_N3_mast    Cz     0.75   365.011219296792
pp3_N3_mast    Cz     1      312.329794883209
pp3_N3_mast    Cz     1.25   166.992808840289
pp3_N3_mast    Cz     1.5    64.7901538256349
pp3_N3_mast    Cz     1.75   68.330452921674
pp3_N3_mast    Cz     2      39.6088857305875
...

Below we plot these values (e.g. using R's plot() function as above, in red), but superimposed on the results obtained from running the C&F code (in gray) for this 20-second segment, plotting the log-scaled PSD:

That is, the gray line is the original Matlab output from pwelch(), whereas the red is the output from Luna's default PSD command. We can see pretty close alignment between the two sets of estimates, but not complete. These differences reflect differences in how the two methods are parameterized/fine-tuned: for didactic purposes, we'll step through these here, and see whether we're able to reproduce an identical version of the Matlab results.

First, we see that the Luna output only ranges from 0.5 Hz to 25 Hz; in contrast, the Matlab output gives the full spectrum, from 0 to 200 Hz (although we only plot up to 30 Hz). This difference in the range of the frequencies output is a trivial consequence of Luna's default, which by default truncate output at these values. These defaults can be changed easily, by adding the options min=0 max=30, i.e. to get all values between 0 and 30 Hz.

There is another difference in the frequencies, however: the Luna outputs frequencies in steps of 0.25 Hz, whereas the Matlab code is in steps of 0.1953 Hz (i.e. the spectral resolution):

ID            CH   F      PSD
pp3_N3_mast   Cz   0.5    407.176141081321
pp3_N3_mast   Cz   0.75   365.011219296792
pp3_N3_mast   Cz   1      312.329794883209
pp3_N3_mast   Cz   1.25   166.992808840289
pp3_N3_mast   Cz   1.5    64.7901538256349
pp3_N3_mast   Cz   1.75   68.330452921674
pp3_N3_mast   Cz   2      39.6088857305875
...

There are two reasons for this difference.

First, by default, Luna uses 4-second segments with 50% (i.e. 2-second) overlap, whereas the Matlab code used 5-second segments (also with 50%, i.e. 2.5-second overlap). Both are reasonable options and will typically give similar (albeit not identical) results; we can change the segment size (and overlap) with the segment-sec and segment-overlap options. The spectral resolution is given by the sampling rate (400 Hz) divided by the size of the FFT (i.e. of 1600 datapoints). For 4 seconds, this gives 400 / ( 4 * 400 ) = 0.25 Hz, as we observed in the Luna output.

In contrast, C&F used a 5-second segment size for each FFT, which following the logic above would yield a frequency resolution of 0.2 Hz, i.e. 400 / ( 5 * 400 ). However, the Matlab code gives a slightly different value, 0.1953 Hz. This reflects yet another small difference in implementations. By default, the pwelch() does not apply an FFT of size 5 * 400 = 2000 sample-points, but instead will use the next highest power of 2, namely 2048 (2^11). This is done purely for computational reasons, as (for highly technical reasons) the FFT algorithm is optimized for inputs that are a power of 2. Internally, the pwelch() command will pad the input, by adding zeros at the end of the signal (i.e. 48 zeros in this case). This will not fundamentally change the analysis, but it will change the spectral resolution and therefore the size of the frequency bins of the transform. In this instance, we have 400 / 2048 = 0.1953 Hz. For comparability, we can force Luna to also zero-pad up to the next power of 2, by adding the pow2 option to PSD.

So, let's try again, repeating the above command but with these additional options:

luna s.lst pp3_N3_mast  -o out/welch.db \
     -s 'EPOCH dur=20 &
         MASK sec=1660-1680 & RE &
         PSD sig=Cz spectrum segment-sec=5 segment-overlap=2.5 min=0 max=30 pow2 '

This is closer. We now see that the Luna output has the same spectral resolution as the Matlab code, of 0.1953 Hz.

       F        PSD
0.000000   2.711244
0.195312  16.079487
0.390625 215.816172
0.585938 422.248449
0.781250 313.669716
0.976562 301.519550
...

As we see above, however, the lines still do not overlap completely. This is due to yet another small difference in the behavior of these routines. By default, pwelch() applies a Hamming window to each (5-second) segment, to avoid edge artifacts in the FFT. Luna uses a 50%-Tukey window by default, however. Again, both are reasonable options, and likely to give comparable results under most circumstances. We can make Luna use a Hamming window instead, by adding the hamming option:

luna s.lst pp3_N3_mast  -o out/welch.db \
     -s 'EPOCH dur=20 &
         MASK sec=1660-1680 & RE &
         PSD sig=Cz spectrum segment-sec=5 segment-overlap=2.5  min=0 max=30 pow2 hamming '

Now we finally see identical results from both Luna and the Matlab script: (gray plotted with extra width so we can see otherwise perfectly overlapping lines):

In contrast, we could have altered the Matlab code to match Luna's default behavior. Rather than the original code:

Original Matlab pwelch() usage

fftWindowLength=5; 
fftWindowOverlap=0.5; 
[rawPSD,freq_welch] = ...
   pwelch(EEG_singleChan_30s.data',sRate*fftWindowLength,fftWindowOverlap*sRate*5,[],sRate);

we might use something like:

Revised Matlab pwelch() usage: 4-second segments, fixing FFT size to 2000, and using a Tukey window

fftWindowLength=4;
fftWindowOverlap=0.5;
[ rawPSD , freq_welch ] = ...
       pwelch( EEG_singleChan_30s.data', tukeywin( sRate * fftWindowLength , 0.5 ) , ...
               fftWindowOverlap * sRate * fftWindowLength, sRate * fftWindowLength , sRate);

Running this second version of the pwelch() gives identical results to the original Luna PSD command with default options.

So, we've seen how segment length and overlap, size of the FFT and windowing choices can impact the results of spectral analysis. Do these things matter? Most likely not, we'd expect both options to give broadly comparable results, as seen from the first plot. However, there is value in getting the programs to match identically, as it supports the correctness of the implementation of both, and in the spirit of C&F, it forces one to consider all the small, oft-overlooked details of these commonly used commands, which sometimes might have more series, deleterious impact on results.

FFT size and powers of 2

As regards the FFT size, for the implementation Luna uses, using a power of 2 versus not seems to have only a trivial impact on speed. Typical PSD/Welch analyses of sleep data are fast in any case, and so there is limited room for improvement. In this case, although it does not really matter, having "round" frequency bins (i.e. 0.25 Hz or 0.2 Hz steps) seems a bonus, and the fraction of a second extra for the command to complete is a small price to pay (i.e. given these analyses are being done off-line rather than transforming a signal in real-time).

PSD topoplots

C&F go on to apply PSD/Welch analysis to more signals, and consider the differences between absolute and relative power. They also show how to generate so-called topo-plots using EEGLAB, as shown here.

C&F Figure 1 (C, D) : topographical representation of absolute and relative band PSD

This analysis was performed on individual pp2 (N3 sleep, mastoid reference). We can generate PSD for all EEG channels as follows:

luna s.lst pp2_N3_mast -o out/all.db -s 'PSD sig=${eeg} spectrum'

Luna has a number of special variables (in format ${var}). The ${eeg} variable is automatically generated based on channel names (i.e. versus EOG, EMG, ECG, respiratory signals, and other PSG channels typically encountered). This has the effect of generating 58 PSD for this individual, all saved in the out/all.db database in this case. These can be extracted to a simple text file (e.g. all.txt) as follows:

destrat out/all.db +PSD -r F CH > out/all.txt

We can use R to plot the results. First, we'll load the PSD and generate a new column which is the relative PSD, i.e. the raw PSD divided by the total power (0.5 to 25 Hz in this case), and then log-scale both measures:

d <- read.table( "out/all.txt", header=T )
tot <- tapply( d$PSD , d$CH , sum ) 
tot <- data.frame( CH = names(tot) , TOT = tot ) 
d <- merge( d , tot , by="CH" )
d$REL <- log10( d$PSD / d$TOT )
d$PSD <- log10( d$PSD )

To plot the four spectra, as per the C&F plot above (left panel):

chs <- c("AFz","Cz","Pz","Oz")
cols <- c("blue","red","green","orange")
freqs <- c( 0.75 , 6 , 10.75 , 13.25 , 25 )
par(mfcol=c(1,2))

plot(d$F[d$CH==chs[1]],d$PSD[d$CH==chs[1]],
     type="n",xlab="Frequency(Hz)",ylab="Absolute PSD")
abline(v=freqs,col="gray")
for (i in 1:4) lines(d$F[d$CH==chs[i]],d$PSD[d$CH==chs[i]],col=cols[i],lwd=2)

plot(d$F[d$CH==chs[1]],d$REL[d$CH==chs[1]],
     type="n",xlab="Frequency(Hz)",ylab="Relative PSD")
abline(v=freqs,col="gray")
for (i in 1:4) lines(d$F[d$CH==chs[i]],d$REL[d$CH==chs[i]],col=cols[i],lwd=2)

This gives us the same spectra for these four selected channels (AFz, Cz, Pz and Oz) as C&F:

Next, we can generate some rough topoplots. Visualization has not been a strong focus of Luna to date, and so there is not a fully developed visualization routine included. It's very easy to roll your own, however, to at least get a rough visualization (even if the figures aren't as pretty as in C&F). First, we'll load some coordinates for the channels:

xy <- read.table( "http://zzz.bwh.harvard.edu/dist/luna/cf/xy.lay", header=T)

Next, we'll define a simple ftopo() function, which uses the akima library for interpolation:

library( akima )
ftopo <- function( ch , z , np = 40 , nc = 0 , main = "" , 
         cols=colorRampPalette(c("black","#220000","darkred", "red4" , "red",
                                  "orangered1","orange","yellow","white"))(100))
{
 ch <- toupper(ch)
 d <- data.frame(ch, z )
 names(d) <- c("CH", "Z")
 d <- merge( xy, d, by="CH" )
 pp <- seq(-.5, .5, length.out=np )
 ii <- interp( d$X , d$Y , d$Z , nx=npoints , ny=npoints , xo=pp, yo=pp )
 image(ii, breaks= quantile(z, seq(0,1,0.01) ), col=cols, axes=F, main=main )
 if (nc>0) contour(ii$x, ii$y, ii$z, nlevels=nc, drawlabels=F, add=T)
}

Then we can apply this function across the different selected frequency bins, following C&F:

par( mfcol=c(2,5) , mar=c(0,0,2,0) )
for (f in freqs) {
 flt <- d$F == f 
 ftopo( d$CH[flt] , d$PSD[flt] , main=paste(f, "Hz" ) )
 ftopo( d$CH[flt] , d$REL[flt] )
}

This effectively reproduces the results of C&F, albeit with messier visual rendering:

PSD specification

As we've noted , small issues of window choice, FFT size, segment size, etc, will slightly alter the results of the two pipelines, although these can be modified if so desired to be completely concordant, as described above.

Hilbert and wavelet transforms

In Figure 5, C&F illustrate how to (and how not to) extract amplitude and phase information from a signal, using either a Hilbert or wavelet transforms.

Hilbert transform

Using code listed here, C&F apply the Hilbert transform to extract the phase of a simple sine wave, oscillating at 1 Hz. In Figure 5A, they show how the phase of the signal can be extracted (noting some ways in which the results can be misleading or incorrect; those panels are removed from the plot below though):

C&F Figure 5 (A) : extracting phase with the Hilbert transform

To reproduce this analysis and Figure, we'll first use R to generate the simulated sine wave (saved as the file sine.txt):

sRate <- 400
sPeriod <- 1/sRate
sineFreq <- 1
sineAmp <- 5
sineLength <- 4
sineTime <- seq( 0 , sineLength-sPeriod , by = sPeriod )
sineWave <- sineAmp*sin(2 * pi * sineFreq * sineTime)
write( format( sineWave , digits=20 ) , file="sine.txt" , ncolumns = 1 )

We can then use Luna's HILBERT command, coupled with reading in the sine.txt file directly, to estimate the phase of the signal:

luna sine.txt --fs=400 epoch-len=4 \
     -s 'HILBERT phase &
         MATRIX file=out/ht.txt'

The HILBERT command effectively adds two new channels to the internal/in-memory 'EDF'; the MATRIX command dumps all signals out to a text file. The input signal is assigned a label S1 automatically (if read from a text file rather than an EDF); the new channels are labelled S1_hilbert_mag and S1_hilbert_phase by default. We can use R to plot these values:

d <- read.table( "out/ht.txt" , header=T )
par( mfcol=c(2,1) , mar=c(4,4,1,1) )
plot( d$T , d$S1 , type="l" , col="red" , lwd=2 , xlab="Time (seconds)" , ylab="Amplitude" )
abline( h = 0 , col = "gray" )
abline( v = c( 0.25 , 1 ) , col=c( "magenta" , "goldenrod" ) , lwd=3 )
plot( d$T , d$S1_hilbert_phase , type="l" , col="blue" , lwd=2 , xlab="Time (seconds)" , ylab="Phase angle (radians)" )
abline( h = 0 , col = "gray" )
abline( v = c( 0.25 , 1 ) , col=c( "magenta" , "goldenrod" ) , lwd=3 )

We see that we're able to replicate the main features of Figure 5A.

We also see that Luna uses the same convention as C&F adopt, whereby phase is implicitly defined with respect to a cosine, such that peaks correspond to 0 radians and positive zero-crossings to -0.5π/1.5π radians (or -90/270 ̊).

Filter-Hilbert

The previous signal was narrow-band (i.e. only occupying only a very small space of the frequency spectrum). For other, more broad-band signals, we can use the filter-Hilbert approach to first extract (via bandpass filtering) a particular segment of the frequency spectrum, and apply the Hilbert transform to that, i.e. to obtain frequency/band specific magnitude and phase estimates.

C&F used this code to apply a filter-Hilbert to yet another simulated signal, which contained 1 Hz and 13 Hz components, and where the amplitude of the spindle (13 Hz) component was modulated by the phase of the slow (1 Hz) component. They used the filter-Hilbert to show how the amplitude of the fast oscillation varied over time (i.e. with the phase of the slow oscillation):

C&F Figure 5 (B) : filter-Hilbert to extract slow oscillation and spindle amplitude

We'll do the same here, again using R to simulate the signal, and then Luna to apply the filter-Hilbert transform:

sRate <- 400
sPeriod <- 1/sRate
sineLength <- 20
sineTime= seq( 0 , sineLength-sPeriod ,sPeriod )
dataLength <- length(sineTime)

# SO-like sine
sineFreq_1 <- 1
sineAmp_1 <- 5
sineWave_1 <- sineAmp_1*sin(2*pi*sineFreq_1*sineTime)

# spindle-like sine
sineFreq_2 <- 13;

# calculate phase-based modulation of amplitude
ampMod <- sineWave_1 / max(sineWave_1)

# but no spindle activity when SO amplitude is negative
ampMod[ampMod<0] <- 0 

# modulate spindle amplitude depending on SO phase
sineWave_2 <- ampMod * sin(2*pi*sineFreq_2*sineTime)

# noise
noiseAmp <- 0.2
noise <- noiseAmp*runif( length(sineTime) )
compoundSignal <- sineWave_1 +sineWave_2 + noise

# select small window and plot
window <- sineTime>=1 & sineTime < 5 
png( file="img/simulated2.png" , height=300 , width=800 , res=100 ) 
plot( sineTime[window] , compoundSignal[window] , type="l" , xlim=c(1,5) ,
 xlab="Time (seconds)" , ylab="Amplitude (uV)" , col="red",lwd=2) 
dev.off()

# save to disk
write.table( compoundSignal , file="wave2.txt" , row.names=F, col.names=F)

This generates the expected simulated signal:

C&F then used two bandpass FIR (finite impulse response) filters, designed by the Matlab firls() command, to apply to the signal prior to the Hilbert transform. We'll touch on filter design as applied to sleep EEG analysis in the follow-up vignette, but for now we can take one of two approaches when using Luna:

directly read in filter coefficients designed by other code, e.g. the Matlab script, using file
have Luna design the filters, using a windowed-sinc FIR filter design approach, either with Kaiser windows, or other if fixing the filter order, with other windows including the Hamming.

We'll skip the details of how these are specified, how they differ, etc, for now: we can use Luna's FIR-DESIGN command to look at some of the properties of these filters, confirming that C&F's Matlab FIR filters (red line), and Luna's (black line) have broadly comparable properties. (Note: the blue line is from C&F when using a shorter filter, 1 second / order 400, rather than the longer (5 second/ order 2000) FIR they use for the actual manuscript):

We can then use the HILBERT command also specifying a filter (here with f and order, which by default uses a Hamming-window FIR of the specified order, with transition frequencies (-3dB) at, e.g. 0.35 and 2.25 Hz, for the slow component:

luna wave2.txt --fs=400 epoch-len=20 \
     -s 'HILBERT f=0.35,2.25 order=2000 phase & MATRIX file=out/fh.slow'

We can do likewise for the spindle component:

luna wave2.txt --fs=400 epoch-len=20 \
     -s 'HILBERT f=11.5,14.5 order=2000 phase & MATRIX file=out/fh.spindle'

Finally, we can output the bandpass-filtered spindle-frequency signal, as plotted by C&F in Figure 5B:

luna wave2.txt --fs=400 epoch-len=20 \
     -s 'FILTER bandpass=11.5,14.5 order=2000 & MATRIX file=out/filt.spindle'

We'll postpone visualization of these until the next section, where we look at the comparable results using wavelets instead of a filter-Hilbert.

Wavelets

Next, C&F use wavelets (with this code) to extract the amplitude and phase of different components of the simulated signal, here saved in the file wave2.txt:

C&F Figure 5 (C) : wavelets to extract spindle amplitude and phase

The key result is in Figure 5C panel i, which shows how for the correct phase estimates, one needs to wrap the wavelet (see C&F for details):

We can use the CWT command in Luna, which is parallel to HILBERT, in that it provides low-level access to complex Morlet wavelets.

Here we can specify a wavelet using the same notation as C&F, by giving a center frequency (fc) and FWHM in the time domain (here 0.25 seconds):

luna wave2.txt --fs=400 epoch-len=20 \
     -s 'CWT fc=13 fwhm=0.25 phase & MATRIX file=out/cwt.spindle'

Note: this specification follows this reparameterization of the wavelet, rather than specifying a center frequency and then the number of cycles (as Luna up until v0.24 did).

We can use the CWT-DESIGN command to visualize the properties of this wavelet:

echo "fs=400 fc=13 fwhm=0.25" | luna --cwt-design -o out/cwt.db

We can view the empirical FWHM in both the time and frequency domains:

destrat out/cwt.db +CWT-DESIGN -r PARAM

ID   PARAM         FWHM   FWHM_F   FWHM_LWR   FWHM_UPR
.    13_0.25_400   0.25   3.5      11.25      14.75

i.e. this wavelet spans from 11.15 to 14.75 Hz, given the time-domain FWHM and fc. We can plot the MAG variable contained in this output also, to see the frequency characterization of the wavelet (with the horizontal line pointing to the two FWHM values):

The CWT command generates two new signals (S1_cwt_mag and S1_cwt_phase) in the in-memory EDF, which are then dumped to a file (via MATRIX).

We can plot these along with the (filtered) signal, and the estimates from the filter-Hilbert method above, to see how the amplitude and phase estimates from each method compare.

However, the default implementation of the CWT uses a standard wavelet, rather than a wrapped wavelet, as C&F note. Although this has no impact on the typical usage of wavelets (e.g for spindle detection, which is based on the amplitude of these signals), it will have an impact on the direct interpretation of the phase values, again, as C&F note. You can use a wrapped wavelet instead by adding the option wrapped. This then produces output that corresponds closely with the filter-Hilbert approach, and matches the results obtain by C&F's Matlab code (when using wrapped wavelets, i.e. Figure 5Ci).

luna wave2.txt --fs=400 epoch-len=20 \
     -s 'CWT fc=13 fwhm=0.25 phase wrapped & MATRIX file=out/cwt.spindle'

Plotting the spindle-band filtered signal, with the magnitude from filter-Hilbert (left) and CWT (right), we see broadly comparable results. (Note that the CWT command actually outputs power, so we plot the square root here):

As C&F note, some of the quirks in the filter-Hilbert plot reflect the imprecisions due to filtering -- this is an issue we'll revisit in the follow-up vignette.

Phase-amplitude coupling

C&F go on to present a debiased metric of phase-amplitude coupling, using this code. Focussing on two channels (AFz and Oz), they estimate coupling between slow oscillation phase and spindle amplitude, using wavelets to estimate the phase and amplitude at two frequencies (0.81 and 10.1 Hz).

To calibrate the metrics, they use time-series shifting, i.e. breaking down the coupling between the phase and amplitude components to generate an empirical null distribution for each metric:

C&F Figure 6 (E,F) : slow oscillation-spindle coupling

Their example considers a single 30-second epoch epoch from the EDF pp2_N3_mast. We can recreate those results here, with the CC command.

luna s.lst pp2_N3_mast  -o out/pac1.db  \
     -s 'MASK epoch=1 & RE 
         CC sig=AFz,Oz nreps=1000 pac fc=0.8094 fwhm=3.5149 fc2=10.1493 fwhm2=0.5525'

The nreps option specifies the number of time-shifted surrogates to use (per epoch) in order to estiamte the normalized dPAC metric (dPAC_Z). The CC command (described in more detail here) uses (by default) wavelets to estimate the amplitude and phase.

Note that the CC command includes cross-channel options also, and so all output is stratified in terms of two frequencies and two channels, even if (as in this instance) CH1 and CH2 are the same (i.e. dPAC is an intra-channel, but cross-frequency analysis). Looking at the relevant lines of output:

destrat out/pac1.db +CC -r CH1 CH2 F1 F2

ID           CH1   CH2   F1       F2        CFC   XCH      dPAC     dPAC_Z
pp2_N3_mast  AFz   AFz   0.8094   10.1493    1     0    0.80337    2.04179
pp2_N3_mast  Oz    Oz    0.8094   10.1493    1     0    0.04838    -1.3982

These results match the values observed by C&F: namely, for AFz, the observed dPAC statistic was 0.80 but the normalized Z-score associated with this was ~2.00 (note: the exact Z scores will vary run to run as they are based on randomisation). In contrast, for Oz, the observed slow oscillation-spindle coupling statistic was 0.05, which an associated negative Z score of -1.4. These match the values in the plot above.

Next, C&F apply the same type of dPAC analysis to all channels, to look at the topography of coupling, and how it varies based on the normalization scheme (i.e. raw dPAC vesus normalized dPAC_Z).

They use the code here to create these plots:

C&F Figure 7 (A) : phase amplitude coupling (dPAC) and surrogate-based normalization

Note, in this analysis they use 20 epochs from the EDF pp3_N3_mast. They consider two spindle frequencies also (slow and fast) to be coupled with the slow (<1 Hz) activity. We'll recreate this here using Luna and the CC command (restricted to the first 20 epochs, as per C&F):

luna s.lst pp3_N3_mast -o out/pac2.db \
     -s 'MASK epoch=1-20 & RE &
         CC sig=${eeg} nreps=200 pac fc=0.8094 fwhm=3.5149 fc2=11.4481,14.5656 fwhm2=0.5059,0.4242 '

i.e. note how fc2 and fwhm2 take multiple comma-delimited values. The CC command considers all pairs of fc by fc2 frequencies in this CFC/PAC analyses (with wavelets specified by the corresponding FWHM values in fwhm and fwhm2). Extracting the output:

destrat out/pac2.db +CC -r CH1 CH2 F1 F2 > out/pac2.txt

we can use the same ftopo() function we wrote above to give some quick-and-dirty topolots, which broadly correspond to C&F's results. (As above, it is the plotting that introduces apparent differences in results: the actual dPAC values calculated are identical between Matlab and Luna scripts).

frqs <- unique( d$F2 )
par(mfcol=c(1,4) , mar=c(0,0,3,0))
ftopo( d$CH1[ d$F2 == frqs[1] ] , d$dPAC[ d$F2 == frqs[1] ] , main="Slow spindes (dPAC)" )
ftopo( d$CH1[ d$F2 == frqs[2] ] , d$dPAC[ d$F2 == frqs[2] ] , main="Fast spindes (dPAC)" )
ftopo( d$CH1[ d$F2 == frqs[1] ] , d$dPAC_Z[ d$F2 == frqs[1] ] , main="Slow spindes (dPAC Z)" )
ftopo( d$CH1[ d$F2 == frqs[2] ] , d$dPAC_Z[ d$F2 == frqs[2] ] , main="Fast spindes (dPAC Z)" )

Connectivity analyses

Finally, C&F apply a connectivity analysis, using the weighted Phase Lag Index (wPLI). Considering only the first 20 epochs for channels AFz and Oz from the EDF pp3_N3_mast, they assess connectivity as a function of frequency (35 frequencies, from 0.5 to 30 Hz, evenly distributed on a log-scale), as shown here.

Their Figure 7B shows spikes in wPLI at slow and spindle frequencies; after evaluating via time-shifting surrogate analysis however, only the spindle spike remains significant:

C&F Figure 7 (B) : weighted Phase Lag Index (wPLI) and surrogate-based normalization

The CC command, introduced above, performs cross-channel intra-frequency connectivity (i.e. wPLI) as well as intra-channel, cross-frequency coupling (i.e. dPAC), by adding the xch (cross-channel) option instead of pac.

luna s.lst pp3_N3_mast -o out/wpli.db \
     -s 'MASK epoch=1-20 & RE &
         CC sig=AFz,Oz xch nreps=200 fc-range=0.5000,30.0000 fwhm-range=5,0.25 num=35 '

destrat out/wpli.db +CC -r CH1 CH2 F1 F2 > out/wpli.txt

Closing comments

We've used Luna to reconstruct a number of Figures and results from C&F, and (after tweaking one or two options) have been able to achieve effectively exact correspondence between C&F and this alternate implementation.

As mentioned, in the coming weeks we'll post a follow-up vignette which continues to illustrate different Luna commands by applying them to these data.