Skip to content

A deeper dive

In this third section of the tutorial, we use Luna to calculate some typical metrics of interest for sleep studies, including:

  • how basic statistics vary across the night
  • measures of sleep macro-architecture
  • filtering and artifact detection in the EEG
  • spectral analysis using fast Fourier transforms
  • spindle detection using wavelets

Epoch-level summaries

Returning to the STATS command described above, here we use it with the epoch option, in order to generate per-epoch statistics. This section also illustrates working with the text output format.

luna s.lst 2 sig=EMG,ECG -s 'EPOCH & STATS epoch' > res.txt
+++ luna | v0.26.0, 11-Nov-2021 | starting 01-Dec-2021 11:08:09 +++
input(s): s.lst
output  : .
signals : ECG EMG
commands: c1    EPOCH   
        : c2    STATS   epoch=T

Processing: nsrr02 [ #2 ]
 duration: 09.57.30 | 35850 secs | clocktime 21.18.06 - 07.15.36

 signals: 2 (of 14) selected in a standard EDF file

  Arousal (x51) | Hypopnea (x101) | N1 (x11) | N2 (x399)
  N3 (x185) | Obstructive_Apnea (x5) | R (x120) | SpO2_artifact (x34)
  SpO2_desaturation (x66) | W (x480)

  ecg=ECG | emg=EMG | id=nsrr02
   options: sig=ECG,EMG
 set epochs, length 30 (step 30, offset 0), 1195 epochs
   options: epoch=T sig=ECG,EMG
 processing ECG ...
 processing EMG ...

...processed 1 EDFs, done.
...processed 1 command set(s),  all of which passed
+++ luna | finishing 01-Dec-2021 11:08:14                       +++

This command was applied only to the second EDF in the sample-list (i.e. the 2 argument), and only to the ECG and EMG signals. After defining 30-second epochs (the default from the EPOCH command), it generates basic statistics per-epoch (the epoch option for the STATS command), as well as entire-signal summaries. Note that 1195 epochs are generated, which corresponds to the stated duration of 9 hours, 57 and a half minutes (i.e. ( 9 * 60 + 57 ) * 2 + 1 = 1195).

The first few lines of res.txt are as follows:

nsrr02   EPOCH   .        .   NE       1195
nsrr02   EPOCH   .        .   DUR      30
nsrr02   EPOCH   .        .   INC      30
nsrr02   STATS   CH/ECG   1   MAX      1.25
nsrr02   STATS   CH/ECG   1   MIN     -0.651961
nsrr02   STATS   CH/ECG   1   MEAN     0.00506797
nsrr02   STATS   CH/ECG   1   SKEW     1.68822
nsrr02   STATS   CH/ECG   1   MEDIAN   0.0343137
nsrr02   STATS   CH/ECG   1   RMS      0.271051
nsrr02   STATS   CH/ECG   2   MAX      1.25
That is, the first three lines (for which the 4th column is empty, .) are from the EPOCH command. The subsequent lines are the per-epoch values from the STATS command: e.g. the mean is 0.00506797 for the first epoch.

This format is designed to be easily parsed and read by other software. For example, to use R to plot the per-epoch RMS values of the ECG, we might use the following (from within R):

d <- read.table( "res.txt" , header=F , sep="\t" , stringsAsFactors=F )


As you'll see in the next section of this tutorial, it is now possible to read lunout databases directly into R

After loading the data, it is convenient to relabel the 6 columns to have sensible names:

names(d) <- c( "ID" , "COMM" , "STRATA" , "T" , "VAR" , "VAL" ) 

Our goal is to generate a plot of per-epoch RMS for the ECG. We can extract these values by filtering rows where VAR equals "RMS", when T is not ".", and when STRATA is "CH/ECG":

d <- d[ d$STRATA == "CH/ECG" & d$T != "." & d$VAR == "RMS" , ] 

We can confirm that this produces the expected number of rows/epochs:

 [1] 1195    6

We need to ensure that the epoch and value fields are treated as numeric data points:

d$T <- as.numeric(d$T)
d$VAL <- as.numeric(d$VAL)

As a sanity-check, we can view the structure of the data frame:

'data.frame':   1195 obs. of  6 variables:
  $ ID    : chr  "nsrr02" "nsrr02" "nsrr02" "nsrr02" ...
  $ COMM  : chr  "STATS" "STATS" "STATS" "STATS" ...
  $ STRATA: chr  "CH/ECG" "CH/ECG" "CH/ECG" "CH/ECG" ...
  $ T     : num  1 2 3 4 5 6 7 8 9 10 ...
  $ VAR   : chr  "RMS" "RMS" "RMS" "RMS" ...
  $ VAL   : num  0.271 0.279 0.268 0.264 0.265 ... 

To plot these per-epochs values:

plot( d$T , d$VAL , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 )


Clearly there are a number of extreme points towards the end of the recording, presumably indicating bad data after the subject woke for the final time that night. We can confirm this by indicating the manually-annotated sleep stage for each point. Going back to the command line (type q() to get out of R, or even better, run the next couple of Luna commands in a separate window to keep the R session open), we can use Luna to extract sleep stages (using the STAGE command) in a format that can be read into R:

luna s.lst 2 -o ss.db -s STAGE
and extracting the output as a text file ss.txt:
destrat ss.db +STAGE -r E -v STAGE > ss.txt

We then load this sleep stage information into R as follows, which indicates that we have the expected number of epochs:

ss <- read.table( "ss.txt" , header=T )
[1] 1195    3

This is the breakdown of epochs by each sleep stage for this individual:

 N1  N2  N3   R   W 
 11 399 185 120 480 

If you previously closed the R session, you'll need to rerun the commands to create the d data-frame. Then, plotting epochs colored by sleep stage, we indeed confirm that the end aberrant epochs are all wake: (plot not shown).

plot( d$T , d$VAL  , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 , col = as.factor( ss$STAGE  ) ) 
cols <- sort(unique(ss$STAGE))
legend("topleft",legend=cols , fill = as.factor( cols ) ) 

Simply ignoring these final points (which roughly corresponds to looking only up to the 950th epoch in this example), we can generate a clearer plot of how the RMS of the ECG varies across the night in this individual, which tracks with the progression of sleep and wake in a semi-predictable manner:

inc <- 1:950
plot( d$T[inc], d$VAL[inc], ylab="RMS(ECG)", xlab="Epoch", pch=20, col=as.factor(ss$STAGE[inc]) ) 
legend("bottomleft",legend=cols , fill = as.factor( cols ) ) 



Given a set of epoch-based staging values, the HYPNO command produces a series of statistics that describe different aspects of sleep macro-architecture.

luna s.lst -o stage.db -s 'EPOCH & HYPNO epoch'

This generate a large number of variables, as described here. There are various different strata groups in the output from STAGE:

destrat stage.db
stage.db: 2 command(s), 3 individual(s), 66 variable(s), 107919 values
  command #1:   c1  Thu Aug 13 13:40:56 2020    EPOCH   sig=*
  command #2:   c2  Thu Aug 13 13:40:56 2020    HYPNO   sig=*
distinct strata group(s):
  commands    : factors           : levels        : variables 
  [EPOCH]     : .            : 1 level(s)    : DUR INC NE
              :              :               : 
  [HYPNO]     : .            : 1 level(s)    : CONF E0_START E1_LIGHTS_OFF E2_SLEEP_ONSET
              :              :               : E3_SLEEP_MIDPOINT E4_FINAL_WAKE
              :              :               : E5_LIGHTS_ON E6_STOP EINS FIXED_LIGHTS
              :              :               : FIXED_SLEEP FIXED_WAKE FWT HMS0_START
              :              :               : HMS1_LIGHTS_OFF HMS2_SLEEP_ONSET
              :              :               : HMS3_SLEEP_MIDPOINT HMS4_FINAL_WAKE
              :              :               : HMS5_LIGHTS_ON HMS6_STOP LOST LOT
              :              :               : LZW NREMC NREMC_MINS OTHR REM_LAT
              :              :               : REM_LAT2 SE SFI SINS SME SOL SOL_PER
              :              :               : SPT SPT_PER T0_START T1_LIGHTS_OFF
              :              :               : T2_SLEEP_ONSET T3_SLEEP_MIDPOINT
              :              :               : T4_FINAL_WAKE T5_LIGHTS_ON T6_STOP
              :              :               : TIB TI_RNR TI_S TI_S3 TRT TST TST_PER
              :              :               : TWT WASO
              :              :               : 
  [HYPNO]     : E            : (...)         : CLOCK_HOURS CLOCK_TIME CYCLE CYCLE_POS_ABS
              :              :               : CYCLE_POS_REL E_N1 E_N2 E_N3 E_REM
              :              :               : E_SLEEP E_WAKE E_WASO FLANKING_ALL
              :              :               : FLANKING_MIN MINS N2_WGT NEAREST_WAKE
              :              :               : OSTAGE PCT_E_N1 PCT_E_N2 PCT_E_N3
              :              :               : PCT_E_REM PCT_E_SLEEP PERIOD PERSISTENT_SLEEP
              :              :               : STAGE STAGE_N TOT_NR2R TOT_NR2W
              :              :               : TOT_R2NR TOT_R2W TOT_W2NR TOT_W2R
              :              :               : TR_NR2R TR_NR2W TR_R2NR TR_R2W
              :              :               : TR_W2NR TR_W2R WASO
              :              :               : 
  [HYPNO]     : C            : 6 level(s)    : NREMC_MINS NREMC_N NREMC_NREM_MINS
              :              :               : NREMC_OTHER_MINS NREMC_REM_MINS
              :              :               : NREMC_START
              :              :               : 
  [HYPNO]     : SS           : 13 level(s)   : BOUT_05 BOUT_10 BOUT_MD BOUT_MN
              :              :               : BOUT_MX BOUT_N MINS PCT
              :              :               : 
  [HYPNO]     : N            : 141 level(s)  : FIRST_EPOCH LAST_EPOCH MINS STAGE
              :              :               : START STOP
              :              :               : 
  [HYPNO]     : PRE POST     : 9 level(s)    : N P P_POST_COND_PRE P_PRE_COND_POST
              :              :               :

The first (.) HYPNO group is a set of variables with no stratifiers: that is, simply one value per individual/EDF, such as Total Sleep Time (TST).

Stage durations are stratifed by SS (sleep stage). To view the number of minutes of each sleep stage for the three individuals, NREMC, we can use:

destrat stage.db +HYPNO -c SS/N1,N2,N3,R -v MINS
nsrr01   54.5         261.5         8.5           119
nsrr02   5.5          199.5         92.5          60
nsrr03   26           187.5         10.5          0

To view cycle-level variables (C factor), we can use the following to see the duration (in minutes) and start (in epoch number):

destrat stage.db +HYPNO -r C -v NREMC_START NREMC_MINS
nsrr01   1    45          87
nsrr01   2    100.5       184
nsrr01   3    47          451
nsrr01   4    77.5        654
nsrr01   5    60          819
nsrr01   6    50          956
nsrr02   1    72.5        92
nsrr02   2    133         237
nsrr02   3    42          503
nsrr02   4    105.5       675
nsrr02   5    32.5        887
nsrr03   1    69          118
nsrr03   2    74          312
nsrr03   3    137.5       630

To illustrate extracting epoch-level information: here we consider the designated sleep stage (STAGE variable), along with the time (CLOCK_TIME) and a measure of how many nearby epochs had the same stage (FLANKING_MIN). The latter is the minimum number of contiguous epochs, either forwards or backwards in time, that are similar to the index epoch (truncated at the start and end of the recording). In other words, selecting epochs with 3 or more for this variable means that there are at least 3 epochs before and 3 epochs after with a similar stage. This can be used to select periods of sleep that are less likely to contain stage transitions, for example.

destrat stage.db +HYPNO -i nsrr01  -r E  -v STAGE FLANKING_MIN CLOCK_TIME
nsrr01   1      21:58:17     0             W 
nsrr01   2      21:58:47     1             W 
nsrr01   3      21:59:17     2             W 
nsrr01   4      21:59:47     3             W 
nsrr01   5      22:00:17     4             W 
nsrr01   6      22:00:47     5             W 
nsrr01   7      22:01:17     6             W 
nsrr01   8      22:01:47     7             W 
nsrr01   9      22:02:17     8             W 
nsrr01   10     22:02:47     9             W 
nsrr01   11     22:03:17     10            W 
 ... cont'd ...                     
nsrr01   183    23:29:17     0             N1
nsrr01   184    23:29:47     0             N2
nsrr01   185    23:30:17     0             W 
nsrr01   186    23:30:47     0             N1
nsrr01   187    23:31:17     0             N2
nsrr01   188    23:31:47     1             N2
nsrr01   189    23:32:17     2             N2
nsrr01   190    23:32:47     3             N2
nsrr01   191    23:33:17     4             N2
nsrr01   192    23:33:47     5             N2
nsrr01   193    23:34:17     6             N2
nsrr01   194    23:34:47     7             N2
nsrr01   195    23:35:17     8             N2
nsrr01   196    23:35:47     9             N2
nsrr01   197    23:36:17     10            N2
nsrr01   198    23:36:47     10            N2
nsrr01   199    23:37:17     9             N2
nsrr01   200    23:37:47     8             N2
nsrr01   201    23:38:17     7             N2
nsrr01   202    23:38:47     6             N2
nsrr01   203    23:39:17     5             N2
nsrr01   204    23:39:47     4             N2
nsrr01   205    23:40:17     3             N2
nsrr01   206    23:40:47     2             N2
nsrr01   207    23:41:17     1             N2
nsrr01   208    23:41:47     0             N2
nsrr01   209    23:42:17     0             W 
nsrr01   210    23:42:47     0             N2
 ... cont'd 

Epoch-level masks

To illustrate applying different epoch-level masks, consider the following (somewhat unrealistic) example: to select epochs that:

  • are in persistent sleep as defined above (i.e. at least 10 minutes of sleep prior)
  • occur between 11pm and 3am
  • are during stage 2 NREM sleep
  • do not contain any apnea or hypopnea events

For the first individual, we can use awk to extract the indicator (0/1 for N/Y) of persistent sleep: (we ignore the header and print the 3rd column)

destrat stage.db +HYPNO -i nsrr01 -r E -v PERSISTENT_SLEEP \
  | awk ' NR>1 { print "PS"$3 } ' > cmd/ps-nsrr01.eannot 

which generates an epoch-annotation file ps-nsrr01.eannot with the two labels PS0 and PS1 to indicate persistent sleep or not.

Windows users

For users who do not have awk immediately available on your system, the output file ps-nsrr01.eannot is already present in the cmd tutorial folder, so you do not need to run the above command. We'll see other ways in lunaR to work with epoch-level annotations.

The following Luna command script (cmd/third.txt) might then be used:

MASK none
MASK hms=23:00:00-03:00:00
EPOCH-ANNOT file=cmd/ps-^.eannot 
MASK mask-if=PS0
MASK mask-ifnot=N2
MASK mask-if=apnea:obstructive,hypopnea

This command file first epochs the dataset, in 30-second epochs by default. Next, we clear the mask (i.e. to include all epochs). We then use the hms flag to mask any epoch outside of the specified range. (Actually, this includes any epoch that has any overlap with the interval between 23:00:00 and 03:00:00, i.e. so it will also include a 30-second epoch that starts at 22:59:47, for example. If you require more precise timing, should should set epoch duration to be 1 second, not 30 seconds, for example, when running the MASK hms command.)


You must use 24 hour hh:mm:ss notation, so 11pm is 23:00:00.

We then read the .eannot file as an epoch-level annotation and mask those epochs to be excluded, i.e. those with PS0. Note that we've used a special character ^ in the script -- this is expanded into the current ID of the EDF (based on the first column of the sample-list), thereby providing a convenient way to use a single command script across multiple individuals. That is, here it looked for a file called cmd/ps-nsrr01.eannot.

We then mask any epoch that is not stage 2 NREM sleep (annotation N2). Finally, we mask any epochs that contain at least one apnea or hypopnea event. Because these tutorial files come from the NSRR, and so have associated annotation files, we know about these types of respiratory events, i.e. they were manually annotated. Luna has a set of automatic functions that recast NSRR annotations into a single, clean set: this page is where apnea:obstructive and hypopnea are defined.


Here we used the mask-ifnot rather than ifnot flag: the difference is that the former will not unmask N2 epochs that were previously masked. Remember:

  • To mask or set the mask to true means to exclude that epoch.
  • To unmask or set the mask to false mean to include that epoch.

See the page on masks for a detailed description of how the MASK syntax works in Luna.

Let's run this command, just for nsrr01, in order to study the output sent to the console that describes the masking process (i.e. we'll ignore the output in out.db):

luna s.lst id=nsrr01 nsrr-remap=T -o out.db < cmd/third.txt

Note that the script used the NSRR remapped annotation terms (i.e. apnea:obstructive) and so we must include the nsrr-remap=T flag to make that mapping. Alternatively, we could simply have written the script to reference Obstructive Apnea instead.

Looking at the output written to the console, we can track what happened at each MASK stage. First, all epochs are included (with the none mask option):

 reset all 1364 epochs to be included

Next, epochs between 11pm and 3am are included, all other epochs are masked. As a sanity check, from epoch 124 to 604 inclusive is 481 epochs, which is 4 hours and 30 seconds (i.e. this includes an extra epoch that started at 22:59:47 and so still overlaps the 11pm-3am interval):

 selecting epochs from 124 to 604; 883 newly masked, 0 unmasked, 0 unchanged
 total of 481 of 1364 retained

At this point there are 481 unmasked epochs, although because we have not yet issued any RESTRUCTURE command, the effective duration of the in-memory EDF is still 1364 epochs. As such, we can still attach the annotations from the file cmd/ps-nsrr01.eannot, i.e. which is expecting 1364 epochs:

 mapping 2 distinct epoch-annotations (1364 in total) from cmd/ps-nsrr01.eannot

We then apply a mask based on this annotation: 720 epochs have the flag set to be excluded (i.e. from the file); of those, only 209 must be within the 11pm to 3am window, as only that many epochs have their status changed (i.e. because we've set these MASK commands up to only ever mask, not to unmask, epochs), leaving us with 272 epochs:

  set masking mode to 'mask' (default)
  based on exc 720 epochs match; 209 newly masked, 0 unmasked, 1155 unchanged
  total of 272 of 1364 retained

Next, we further mask based on not being in N2 sleep, which drops us to 235 epochs:

 set masking mode to 'mask' (default)
 based on N2 523 epochs match; 37 newly masked, 0 unmasked, 1327 unchanged
 total of 235 of 1364 retained

Finally, we screen for matching any apnea or hypopnea event, which takes us down to 86 epochs:

 set masking mode to 'mask' (default)
  based on apnea:obstructive 60 epochs match; 7 newly masked, 0 unmasked, 1357 unchanged
  total of 228 of 1364 retained
  based on hypopnea 509 epochs match; 142 newly masked, 0 unmasked, 1222 unchanged
  total of 86 of 1364 retained

Manipulating EDFs

Here we give an overview of filtering, resampling, relabelling, re-referencing and re-writing signal data.

The following command script (cmd/fourth.txt) illustrates the above features:

FILTER bandpass=0.3,35 ripple=0.02 tw=1
WRITE edf-tag=v2 edf-dir=newedfs/ sample-list=new.lst

Using the following Luna command (note use of the parameter file cmd/vars.txt to define new labels EEG1 and EEG2):

luna s.lst @cmd/vars.txt sig=EEG1,EEG2 < cmd/fourth.txt
Processing: nsrr01 [ #1 ]
 duration: 11.22.00 | 40920 secs | clocktime 21.58.17 - 09.20.17

 signals: 2 (of 14) selected in a standard EDF file:
  EEG2 | EEG1

  Arousal (x194) | Hypopnea (x361) | N1 (x109) | N2 (x523)
  N3 (x17) | Obstructive_Apnea (x37) | R (x238) | SpO2_artifact (x59)
  SpO2_desaturation (x254) | W (x477)

  eeg=EEG2,EEG1 | id=nsrr01
 CMD #1: mV
   options: sig=EEG1,EEG2
  rescaled EEG1 to mV
  rescaled EEG2 to mV
   options: sig=EEG1,EEG2 sr=100
  resampling channel EEG1 from sample rate 125 to 100
  resampling channel EEG2 from sample rate 125 to 100
   options: bandpass=0.3,35 ripple=0.02 sig=EEG1,EEG2 tw=1
  filtering channel(s): EEG1 EEG2
   options: ref=EEG1 sig=EEG2
  referencing EEG2 with respect to EEG1
   options: edf-dir=newedfs/ edf-tag=v2 sample-list=new.lst sig=EEG1,EEG2
  appending newedfs/learn-nsrr01-v2.edf to sample-list new.lst (dropping any annotations)
  no epoch mask set, no restructuring needed
nsrr01  WRITE   .   .   NR1 40920
nsrr01  WRITE   .   .   NR2 40920
nsrr01  WRITE   .   .   DUR1    40920
nsrr01  WRITE   .   .   DUR2    40920
  data are not truly discontinuous
  writing as a standard EDF
  writing 2 channels
  saved new EDF, newedfs/learn-nsrr01-v2.edf


This selects only the two EEG signals, based on their aliases EEG1 and EEG2. the mV command rescales any signals in uV and V units to mV, changing the EDF headers appropriately too. The RESEAMPLE command takes an argument sr which is the new sampling rate (i.e. set to 100Hz here). The FILTER command designs and applies a FIR filter with transition frequencies at 0.3 and 35 Hz.

The REFERENCE command re-references the signal data -- in this example, for illustration of the syntax only, we reference EEG2 relative to EEG1 (i.e. subtract EEG1 values from EEG2). Signals must have the same sampling rate to be referenced. Multiple, comma-delimited signals can be given as the ref, in which case the average of those values is used as the reference; if multiple signals are given for the sig command, then each signal is re-referenced with respect to whatever ref was specified.

Finally, the WRITE command generates a new EDF, that will reflect these changes. The edf-tag value is appended to the original filename for the EDF, and it is written to a directory edf-dir (if this doesn't exist, it will be created). A new sample-list new.lst is also generated, with each newly generated EDF being appended to it. (Note, values are only appended to the sample-list, to facilitate running Luna in parallel, so if new.lst already existed and was not empty, you might want to delete that file before running the above.)

We can confirm that the new sample list is as expected:

cat new.lst
nsrr01     newedfs/learn-nsrr01-v2.edf
nsrr02     newedfs/learn-nsrr02-v2.edf
nsrr03     newedfs/learn-nsrr03-v2.edf

We can also inspect the new sets of EDFs, using the SUMMARY command on new.lst:

luna new.lst id=nsrr01 -s SUMMARY
EDF filename   : newedfs/learn-nsrr01-v2.edf
Patient ID     : 
Recording info : 
Start date     : 01.01.85
Start time     : 21:58:17

# signals      : 2
# records      : 40920
Duration       : 1

Signal 1 : [EEG2]
       # samples per record : 100
       transducer type      : 
       physical dimension   : mV
       min/max (phys)       : -0.3726/0.391668
       EDF min/max (phys)   : -0.3726/0.391668
       min/max (digital)    : -32768/32767
       EDF min/max (digital): -32768/32767
       pre-filtering        : 

Signal 2 : [EEG1]
       # samples per record : 100
       transducer type      : 
       physical dimension   : mV
       min/max (phys)       : -0.24235/0.237407
       EDF min/max (phys)   : -0.24235/0.237407
       min/max (digital)    : -32768/32767
       EDF min/max (digital): -32768/32767
       pre-filtering        : 

That is, in relation to the original SUMMARY for this individual, the sampling rate is different, the labels have been reset to their alias values, there are now only two signals, not 14, the units of mV and not uV, and the min/max range is different (reflecting both the change in scale, and the filtering and referencing).

Artifact detection

Here we detect likely artifact in EEG (and other) signals, using two approaches. The first, specific to sleep EEG, is as described by Buckelmueller et al (2006). The second is based on per-epoch statistical filtering of RMS and Hjorth parameters.

Running just for the first individual, the command file cmd/fifth.txt contains the following:

FILTER bandpass=0.3,35 ripple=0.02 tw=1

After filtering the signal to consider only sleep epochs, and filtering the signal to remove very slow and fast rhythms, we use the SIGSTATS command to generate Hjorth parameters (and also RMS) for a signal, both overall (i.e. all sleep epochs) and per-epoch (by virtue of the epoch option). We run Luna and save the results in artifact.db, only for a single EEG channel for now:

luna s.lst 1 sig=EEG -o artifact.db < cmd/fifth.txt

Viewing the contents of artifact.db:

destrat artifact.db
artifact.db: 5 command(s), 1 individual(s), 16 variable(s), 2677 values
  command #1:   c1  Thu Aug 13 13:45:19 2020    EPOCH   sig=EEG
  command #2:   c2  Thu Aug 13 13:45:19 2020    MASK    if=W sig=EEG
  command #3:   c3  Thu Aug 13 13:45:19 2020    RESTRUCTURE sig=EEG
  command #4:   c4  Thu Aug 13 13:45:19 2020    FILTER  bandpass=0.3,35 ripple=0.02 sig=EEG tw=1
  command #5:   c5  Thu Aug 13 13:45:20 2020    SIGSTATS    epoch=T sig=EEG
distinct strata group(s):
  commands      : factors           : levels        : variables 
  [EPOCH]       : .                 : 1 level(s)    : DUR INC NE
                :                   :               : 
  [MASK]        : EMASK             : 1 level(s)    : N_MASK_SET N_MASK_UNSET N_MATCHES
                :                   :               : N_RETAINED N_TOTAL N_UNCHANGED
                :                   :               :
  [RESTRUCTURE] : .                 : 1 level(s)    : DUR1 DUR2 NA NR1 NR2 NS
                :                   :               : 
  [SIGSTATS]    : CH                : 1 level(s)    : H1 H2 H3
                :                   :               : 
  [SIGSTATS]    : E CH              : (...)         : H1 H2 H3
                :                   :               : 

The SIGSTATS command produces Hjorth parameters (H1, H2 and H3) both overall (CH strata group) and per-epoch (E CH strata group). To see the overall values:

destrat artifact.db +SIGSTATS -r CH -p 2
ID      CH   H1     H2    H3 
nsrr01  EEG  78.38  0.35  0.83

To save all per-epoch values to a file res.txt:

destrat artifact.db +SIGSTATS -r CH E  > res.txt

One can load this file in R, for example, to view the per-epoch distribution (we do this below also).

As well as viewing these statistics, we can use them to flag likely artifacts, by considering values that are statistical outliers in the per-epoch distribution. If we consider a new command script, below (and cmd/sixth.txt in the tutorial folder), we add the mask option to the SIGSTATS command, and specify a set of thresholds: 3,3,3 means to iteratively remove epochs that are more than 3 standard deviations above or below the mean. We also add the ARTIFACTS command beforehand, which implements the Buckelmueller et al. filtering referenced above. Note that the SIGSTATS command will only operate on unmasked values (the prior ARTIFACTS command will have masked some).

FILTER bandpass=0.3,35 ripple=0.02 tw=1
CHEP-MASK ep-th=3,3,3 epoch
CHEP epochs

Running this new script, which ends by saving the current mask, we see the following:

luna s.lst 1 sig=EEG -o mask.db < cmd/sixth.txt
 ... cont'd ...
   options: mask=T sig=EEG
  masked 12 of 887 epochs, altering 12
   options: ep-th=3,3,3 epoch=T sig=EEG
  within-channel/between-epoch outlier detection, ep-th = 3,3,3
   iteration 1: removed 21 channel/epoch pairs this iteration (21 in total)
   iteration 2: removed 9 channel/epoch pairs this iteration (30 in total)
   iteration 3: removed 5 channel/epoch pairs this iteration (35 in total)
   options: epochs=T sig=EEG
  masking epochs with >0% masked channels: 35 epochs
  CHEP summary:
   35 of 887 channel/epoch pairs masked (4%)
   35 of 887 epochs with 1+ masked channel, 35 with all channels masked
   1 of 1 channels with 1+ masked epoch, 0 with all epochs masked
 ... cont'd ...

That is, the first approach to artifact detection removes 12 epochs; the second approach (which itself comprises three iterations) removes a total of 35 additional epochs. We can extract the mask (output by the DUMP-MASK command)

destrat mask.db +DUMP-MASK -r E -v EMASK > mask.txt

This has 888 rows: one header plus the 877 non-wake epochs (i.e. after a RESTRUCTURE command, this is the "total" number of epochs left). Loading this file, plus the previous output from SIGSTATS into R, we can plot the per-epoch distribution of the Hjorth parameters, colored by whether that epoch was excluded by artifact detection (run the following in R):

 d <- read.table("res.txt",header=T,stringsAsFactors=F)
 m <- read.table("mask.txt",header=T,stringsAsFactors=F)
 m <- merge( d , m , by=c("ID","E") ) 

 plot( m$E , m$H1 , pch = 20 , xlab="Epoch" , ylab="H1" , col = m$EMASK + 1) 
 plot( m$E , m$H2 , pch = 20 , xlab="Epoch" , ylab="H2" , col = m$EMASK + 1) 
 plot( m$E , m$H3 , pch = 20 , xlab="Epoch" , ylab="H3" , col = m$EMASK + 1) 


In practice, given the heterogeneous nature of sleep signals, naturally it may be preferable to perform stage-specific artifact detection, or only remove very aberrant signals based on global outlier status.

To examine one of the removed epochs more closely (we can use scope as below, also), here we illustrate using commands such as DUMP and MATRIX to obtain simple, plain-text signal data. This extracts the EEG signal only, for the first individual's epoch 220. (This was one of the flagged epochs, as you can see by examining the output of DUMP-MASK above.

luna s.lst 1 sig=EEG -s 'EPOCH & MASK epoch=220 & MATRIX file=eeg.txt minimal'

This is a plain-text file with one number (sample-point) per row. We expect 30 times 125 = 3750 rows, for a single 30-sec epoch of signal sampled at 125 Hz. In R:

s <- scan("eeg.txt")
Read 3750 items

We can then generate a plot of this signal, e.g. :

plot( seq(1/125,30,1/125), s, type="l", ylim=c(-125, 125 ) , 
      xlab="Time (sec)" , ylab="EEG" )



lunaR provides a far easier means to directly import raw signal data into R, that does not involve intermediate text file; we'll consider this explicitly in the next tutorial section.

Spectral and spindle analyses

To generate the EEG power density spectrum for each individual, say during stage 2 NREM sleep, we can use the PSD command. Specifically, if we had a command file as follows:

MASK ifnot=N2
FILTER bandpass=0.3,35 ripple=0.02 tw=1
CHEP-MASK ep-th=3,3,3
CHEP epoch
PSD spectrum

then the following command should epoch the signals, only include N2 epochs, filter the EEG and attempt to remove epochs that contain gross artifact, and then calculate the power spectra, both for classical bands (delta, alpha, sigma, etc) and also for each frequency bin (subject to the resolution determined by the sampling frequency and the size of the intervals analyzed. Internally, Luna uses of Welch algorithm and FFT to estimate the PSD.

luna s.lst sig=EEG -o out.db < cmd/seventh.txt


This script cmd/seventh.txt also contains two extra commands at the end to detect spindles, which we'll describe below.

The power spectra are values implicitly stratified both by channel (CH) and frequency (F), so we use the following command to extract the absolute power values:

destrat out.db +PSD -r F CH -v PSD > res.psd

In R, we can read the res.psd file as follows,

d <- read.table("res.psd",header=T,stringsAsFactors=F)

      ID  CH    F      PSD
1 nsrr01 EEG 0.50 26.09547
2 nsrr01 EEG 0.75 36.36182
3 nsrr01 EEG 1.00 39.85385
4 nsrr01 EEG 1.25 30.42863
5 nsrr01 EEG 1.50 22.48063
6 nsrr01 EEG 1.75 19.02125

We can create 3 plots, on a log-scale, of

ids <- unique(d$ID)
for (i in ids) plot( d$F[ d$ID == i ] , log( d$PSD[ d$ID == i ] ) , 
  type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=i) 


Especially for the second individual, we see the characteristic peak in the sigma band (11-15Hz), typically representing spindle activity during stage 2 NREM sleep.

As a side-note: naturally, without filtering and artifact detection, especially if including many waking epochs at the start and end of recordings, the power spectra can be very noisy, indicating many sources of (typically non-biological) artifact. For example, this command estimates the PSD without any prior filtering:

luna s.lst @cmd/vars.txt sig=EEG1,EEG2 -o psd-all.db \
   -s 'EPOCH & MASK if=W & PSD spectrum max=62'
destrat psd-all.db +PSD -r CH F -v PSD > res.psd-all

In R:

d <- read.table( "res.psd-all",header=T,stringsAsFactors=F)
ids <- unique(d$ID)
for (ch in c("EEG1","EEG2") )
 for (i in ids) 
   plot( d$F[ d$ID == i & d$CH == ch ] , log( d$PSD[ d$ID == i & d$CH == ch ] ) , 
            type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=paste(i,ch))


Especially for nsrr02, there is clearly a great deal of 50 and 60Hz line noise in the signals. Visual inspection of the signals, especially EEG2 reveals other artifacts (below). Depending on the type of analyses one wishes to perform, careful filtering and/or artifact detection/correction will be important.

Finally, as we noted above, the last two lines of cmd/seventh.txt additionally instructed Luna to detect spindles and then write them to a file (where the ^ indicates the individual ID is to be swapped in):

SPINDLES fc=11,15 annot=spindles
WRITE-ANNOTS file=spindles-^.txt annot=spindles no-specials

The fc options specify the targeted frequencies: slower spindles at 11 Hz and faster spindles at 15 Hz. The annot options request that an annotations be generated (for each individual) to represent the spindle calls -- these can be used as filters.

To extract the spindle density (count per minute, DENS) we use the following command:

destrat out.db +SPINDLES -r CH -c F -v DENS -p 2 
ID     CH   DENS.F_11  DENS.F_15
nsrr01 EEG  0.73       0.64
nsrr02 EEG  1.29       2.06
nsrr03 EEG  0.72       0.19

Because we added the annot option, Luna generated some annotations that represent the spindle calls and some associated per-spindle meta-data (e.g. amplitude amp). These are then saved in the final command, WRITE-ANNOTS to .annot files.

head spindles-nsrr01.txt 
class     instance channel  start     stop      meta
spindles  15       EEG      2706.560  2707.176  amp=11.067|dur=0.616|frq=13.798|isa=0.790
spindles  11       EEG      2834.640  2835.224  amp=19.170|dur=0.584|frq=10.274|isa=0.839
spindles  15       EEG      2868.120  2868.752  amp=12.517|dur=0.632|frq=13.449|isa=0.889
spindles  11       EEG      2868.832  2869.656  amp=15.997|dur=0.824|frq=10.315|isa=0.993
spindles  15       EEG      2870.896  2871.408  amp=13.363|dur=0.512|frq=13.671|isa=0.900
spindles  15       EEG      2905.008  2905.696  amp=16.057|dur=0.688|frq=13.081|isa=1.643
spindles  15       EEG      2949.696  2950.520  amp=19.656|dur=0.824|frq=14.563|isa=2.408
spindles  11       EEG      2969.200  2969.768  amp=15.151|dur=0.568|frq=9.6831|isa=0.823
spindles  11       EEG      2978.264  2979.192  amp=14.449|dur=0.928|frq=10.237|isa=0.924

There are a large number of other options and output variables for the SPINDLES and other commands, described in the Reference pages of this web site.

Back to top