A deeper dive

In this third section of the tutorial, we use Luna to calculate some typical metrics of interest for sleep studies, including:

how basic statistics vary across the night
measures of sleep macro-architecture
filtering and artifact detection in the EEG
spectral analysis using fast Fourier transforms
spindle detection using wavelets

Epoch-level summaries

Returning to the STATS command described above, here we use it with the epoch option, in order to generate per-epoch statistics. This section also illustrates working with the text output format.

luna s.lst 2 sig=EMG,ECG -s 'EPOCH & STATS epoch' > res.txt

===================================================================
+++ luna | v0.26.0, 11-Nov-2021 | starting 01-Dec-2021 11:08:09 +++
===================================================================
input(s): s.lst
output  : .
signals : ECG EMG
commands: c1    EPOCH   
        : c2    STATS   epoch=T

___________________________________________________________________
Processing: nsrr02 [ #2 ]
 duration: 09.57.30 | 35850 secs | clocktime 21.18.06 - 07.15.36

 signals: 2 (of 14) selected in a standard EDF file
  ECG | EMG

 annotations:
  Arousal (x51) | Hypopnea (x101) | N1 (x11) | N2 (x399)
  N3 (x185) | Obstructive_Apnea (x5) | R (x120) | SpO2_artifact (x34)
  SpO2_desaturation (x66) | W (x480)

 variables:
  ecg=ECG | emg=EMG | id=nsrr02
 ..................................................................
 CMD #1: EPOCH
   options: sig=ECG,EMG
 set epochs, length 30 (step 30, offset 0), 1195 epochs
 ..................................................................
 CMD #2: STATS
   options: epoch=T sig=ECG,EMG
 processing ECG ...
 processing EMG ...

___________________________________________________________________
...processed 1 EDFs, done.
...processed 1 command set(s),  all of which passed
-------------------------------------------------------------------
+++ luna | finishing 01-Dec-2021 11:08:14                       +++
===================================================================

This command was applied only to the second EDF in the sample-list (i.e. the 2 argument), and only to the ECG and EMG signals. After defining 30-second epochs (the default from the EPOCH command), it generates basic statistics per-epoch (the epoch option for the STATS command), as well as entire-signal summaries. Note that 1195 epochs are generated, which corresponds to the stated duration of 9 hours, 57 and a half minutes (i.e. ( 9 * 60 + 57 ) * 2 + 1 = 1195).

The first few lines of res.txt are as follows:

nsrr02   EPOCH   .        .   NE       1195
nsrr02   EPOCH   .        .   DUR      30
nsrr02   EPOCH   .        .   INC      30
nsrr02   STATS   CH/ECG   1   MAX      1.25
nsrr02   STATS   CH/ECG   1   MIN     -0.651961
nsrr02   STATS   CH/ECG   1   MEAN     0.00506797
nsrr02   STATS   CH/ECG   1   SKEW     1.68822
nsrr02   STATS   CH/ECG   1   MEDIAN   0.0343137
nsrr02   STATS   CH/ECG   1   RMS      0.271051
nsrr02   STATS   CH/ECG   2   MAX      1.25
...

That is, the first three lines (for which the 4th column is empty, .) are from the EPOCH command. The subsequent lines are the per-epoch values from the STATS command: e.g. the mean is 0.00506797 for the first epoch.

This format is designed to be easily parsed and read by other software. For example, to use R to plot the per-epoch RMS values of the ECG, we might use the following (from within R):

d <- read.table( "res.txt" , header=F , sep="\t" , stringsAsFactors=F )

Hint

As you'll see in the next section of this tutorial, it is now possible to read lunout databases directly into R

After loading the data, it is convenient to relabel the 6 columns to have sensible names:

names(d) <- c( "ID" , "COMM" , "STRATA" , "T" , "VAR" , "VAL" )

Our goal is to generate a plot of per-epoch RMS for the ECG. We can extract these values by filtering rows where VAR equals "RMS", when T is not ".", and when STRATA is "CH/ECG":

d <- d[ d$STRATA == "CH/ECG" & d$T != "." & d$VAR == "RMS" , ]

We can confirm that this produces the expected number of rows/epochs:

dim(d)

 [1] 1195    6

We need to ensure that the epoch and value fields are treated as numeric data points:

d$T <- as.numeric(d$T)
d$VAL <- as.numeric(d$VAL)

As a sanity-check, we can view the structure of the data frame:

str(d)

'data.frame':   1195 obs. of  6 variables:
  $ ID    : chr  "nsrr02" "nsrr02" "nsrr02" "nsrr02" ...
  $ COMM  : chr  "STATS" "STATS" "STATS" "STATS" ...
  $ STRATA: chr  "CH/ECG" "CH/ECG" "CH/ECG" "CH/ECG" ...
  $ T     : num  1 2 3 4 5 6 7 8 9 10 ...
  $ VAR   : chr  "RMS" "RMS" "RMS" "RMS" ...
  $ VAL   : num  0.271 0.279 0.268 0.264 0.265 ...

To plot these per-epochs values:

plot( d$T , d$VAL , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 )

Clearly there are a number of extreme points towards the end of the recording, presumably indicating bad data after the subject woke for the final time that night. We can confirm this by indicating the manually-annotated sleep stage for each point. Going back to the command line (type q() to get out of R, or even better, run the next couple of Luna commands in a separate window to keep the R session open), we can use Luna to extract sleep stages (using the STAGE command) in a format that can be read into R:

luna s.lst 2 -o ss.db -s STAGE

and extracting the output as a text file ss.txt:

destrat ss.db +STAGE -r E -v STAGE > ss.txt

We then load this sleep stage information into R as follows, which indicates that we have the expected number of epochs:

ss <- read.table( "ss.txt" , header=T )
dim(ss)

[1] 1195    3

This is the breakdown of epochs by each sleep stage for this individual:

table(ss$STAGE)

 N1  N2  N3   R   W 
 11 399 185 120 480

If you previously closed the R session, you'll need to rerun the commands to create the d data-frame. Then, plotting epochs colored by sleep stage, we indeed confirm that the end aberrant epochs are all wake: (plot not shown).

plot( d$T , d$VAL  , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 , col = as.factor( ss$STAGE  ) ) 
cols <- sort(unique(ss$STAGE))
legend("topleft",legend=cols , fill = as.factor( cols ) )

Simply ignoring these final points (which roughly corresponds to looking only up to the 950th epoch in this example), we can generate a clearer plot of how the RMS of the ECG varies across the night in this individual, which tracks with the progression of sleep and wake in a semi-predictable manner:

inc <- 1:950
plot( d$T[inc], d$VAL[inc], ylab="RMS(ECG)", xlab="Epoch", pch=20, col=as.factor(ss$STAGE[inc]) ) 
legend("bottomleft",legend=cols , fill = as.factor( cols ) )

Hypnograms

Given a set of epoch-based staging values, the HYPNO command produces a series of statistics that describe different aspects of sleep macro-architecture.

luna s.lst -o stage.db -s 'EPOCH & HYPNO epoch'

This generate a large number of variables, as described here. There are various different strata groups in the output from STAGE:

destrat stage.db

--------------------------------------------------------------------------------
stage.db: 2 command(s), 3 individual(s), 66 variable(s), 107919 values
--------------------------------------------------------------------------------
  command #1:   c1  Thu Aug 13 13:40:56 2020    EPOCH   sig=*
  command #2:   c2  Thu Aug 13 13:40:56 2020    HYPNO   sig=*
-------------------------------------------------------------------------
distinct strata group(s):
 commands  : factors   : levels      : variables 
-----------:-----------:-------------:---------------------------
 [EPOCH]   : .         : 1 level(s)  : DUR GENERIC INC NE OFFSET
           :           :             : 
 [HYPNO]   : .         : 1 level(s)  : CONF E0_START E1_LIGHTS_OFF E2_SLEEP_ONSET
           :           :             : E3_SLEEP_MIDPOINT E4_FINAL_WAKE
           :           :             : E5_LIGHTS_ON E6_STOP EINS FIXED_LIGHTS
           :           :             : FIXED_SLEEP FIXED_WAKE FWT HMS0_START
           :           :             : HMS1_LIGHTS_OFF HMS2_SLEEP_ONSET
           :           :             : HMS3_SLEEP_MIDPOINT HMS4_FINAL_WAKE
           :           :             : HMS5_LIGHTS_ON HMS6_STOP LOST LOT
           :           :             : LZW LZW3 NREMC NREMC_MINS OTHR
           :           :             : POST PRE REM_LAT REM_LAT2 SE SFI
           :           :             : SINS SME SOL SOL_PER SPT SPT_PER
           :           :             : T0_START T1_LIGHTS_OFF T2_SLEEP_ONSET
           :           :             : T3_SLEEP_MIDPOINT T4_FINAL_WAKE
           :           :             : T5_LIGHTS_ON T6_STOP TGT TIB TI_RNR
           :           :             : TI_S TI_S3 TRT TST TST_PER TWT
           :           :             : WASO
           :           :             : 
 [HYPNO]   : E         : (...)       : AFTER_GAP CLOCK_HOURS CLOCK_TIME
           :           :             : CYCLE CYCLE_POS_ABS CYCLE_POS_REL
           :           :             : E_N1 E_N2 E_N3 E_REM E_SLEEP E_WAKE
           :           :             : E_WASO FLANKING FLANKING_ALL MINS
           :           :             : N2_WGT NEAREST_WAKE OSTAGE PCT_E_N1
           :           :             : PCT_E_N2 PCT_E_N3 PCT_E_REM PCT_E_SLEEP
           :           :             : PERIOD PERSISTENT_SLEEP POST PRE
           :           :             : SPT STAGE STAGE_N START_SEC TOT_NR2R
           :           :             : TOT_NR2W TOT_R2NR TOT_R2W TOT_W2NR
           :           :             : TOT_W2R TR_NR2R TR_NR2W TR_R2NR
           :           :             : TR_R2W TR_W2NR TR_W2R WASO
           :           :             : 
 [HYPNO]   : C         : 6 level(s)  : NREMC_MINS NREMC_N NREMC_NREM_MINS
           :           :             : NREMC_OTHER_MINS NREMC_REM_MINS
           :           :             : NREMC_START
           :           :             : 
 [HYPNO]   : SS        : 11 level(s) : BOUT_05 BOUT_10 BOUT_MD BOUT_MN
           :           :             : BOUT_MX BOUT_N DENS MINS PCT TA
           :           :             : TS
           :           :             : 
 [HYPNO]   : N         : 141 level(s): FIRST_EPOCH LAST_EPOCH MINS STAGE
           :           :             : START STOP
           :           :             : 
 [HYPNO]   : PRE POST  : 9 level(s)  : N P P_POST_COND_PRE P_PRE_COND_POST
           :           :             :
           :           :             : 
-----------:-----------:-------------:---------------------------

The first (.) HYPNO group is a set of variables with no stratifiers: that is, simply one value per individual/EDF, such as Total Sleep Time (TST).

Stage durations are stratifed by SS (sleep stage). To view the number of minutes of each sleep stage for the three individuals, NREMC, we can use:

destrat stage.db +HYPNO -c SS/N1,N2,N3,R -v MINS

ID       MINS.SS_N1   MINS.SS_N2    MINS.SS_N3    MINS.SS_R
nsrr01   54.5         261.5         8.5           119
nsrr02   5.5          199.5         92.5          60
nsrr03   26           187.5         10.5          0

To view cycle-level variables (C factor), we can use the following to see the duration (in minutes) and start (in epoch number):

destrat stage.db +HYPNO -r C -v NREMC_START NREMC_MINS

ID       C    NREMC_MINS  NREMC_START
nsrr01   1    45          87
nsrr01   2    100.5       184
nsrr01   3    47          451
nsrr01   4    77.5        654
nsrr01   5    60          819
nsrr01   6    50          956
nsrr02   1    72.5        92
nsrr02   2    133         237
nsrr02   3    42          503
nsrr02   4    105.5       675
nsrr02   5    32.5        887
nsrr03   1    69          118
nsrr03   2    74          312
nsrr03   3    137.5       630

To illustrate extracting epoch-level information: here we consider the designated sleep stage (STAGE variable), along with the time (CLOCK_TIME) and a measure of how many nearby epochs had the same stage (FLANKING). By default, FLANKING treats all three NREM stages as similar for this particular analysis, unless the option flanking-collapse-nrem=F is passed to HYPNO. The latter is the minimum number of contiguous epochs, either forwards or backwards in time, that are similar to the index epoch (truncated at the start and end of the recording). In other words, selecting epochs with 3 or more for this variable means that there are at least 3 epochs before and 3 epochs after with a similar stage. This can be used to select periods of sleep that are less likely to contain stage transitions, for example.

destrat stage.db +HYPNO -i nsrr01  -r E  -v STAGE FLANKING CLOCK_TIME

ID       E      CLOCK_TIME   FLANKING  STAGE
nsrr01   1      21:58:17     0         W 
nsrr01   2      21:58:47     1         W 
nsrr01   3      21:59:17     2         W 
nsrr01   4      21:59:47     3         W 
nsrr01   5      22:00:17     4         W 
nsrr01   6      22:00:47     5         W 
nsrr01   7      22:01:17     6         W 
nsrr01   8      22:01:47     7         W 
nsrr01   9      22:02:17     8         W 
nsrr01   10     22:02:47     9         W 
nsrr01   11     22:03:17     10        W 
 ... cont'd ...                     
nsrr01  183     23:29:17     0         N1 
nsrr01  184     23:29:47     0         N2
nsrr01  185     23:30:17     0         W 
nsrr01  186     23:30:47     0         N1
nsrr01  187     23:31:17     1         N2
nsrr01  188     23:31:47     2         N2
nsrr01  189     23:32:17     3         N2
nsrr01  190     23:32:47     4         N2
nsrr01  191     23:33:17     5         N2
nsrr01  192     23:33:47     6         N2
nsrr01  193     23:34:17     7         N2
nsrr01  194     23:34:47     8         N2
nsrr01  195     23:35:17     9         N2
nsrr01  196     23:35:47     10        N2
nsrr01  197     23:36:17     11        N2
nsrr01  198     23:36:47     10        N2
nsrr01  199     23:37:17     9         N2
nsrr01  200     23:37:47     8         N2
nsrr01  201     23:38:17     7         N2
nsrr01  202     23:38:47     6         N2
nsrr01  203     23:39:17     5         N2
nsrr01  204     23:39:47     4         N2
nsrr01  205     23:40:17     3         N2
nsrr01  206     23:40:47     2         N2
nsrr01  207     23:41:17     1         N2
nsrr01  208     23:41:47     0         N2
nsrr01  209     23:42:17     0         W 
nsrr01  210     23:42:47     0         N2
 ... cont'd

Epoch-level masks

To illustrate applying different epoch-level masks, consider the following (somewhat unrealistic) example - to select epochs that:

occur between 11pm and 3am
are in persistent sleep as defined above (i.e. at least 10 minutes of sleep prior)
are during stage 2 NREM sleep
do not contain any apnea or hypopnea events

The following Luna command script (cmd/third.txt) shows one way of doing this, leveraging the ability of HYPNO to add annotations on-the-fly (to define persistent sleep):

% with `annot`, HYPNO will add annotations to the EDF starting h_
% and will include the epoch-level annotation h_persistent_sleep
HYPNO annot

% not needed, but start by ensuring mask is reset
MASK none

% first select epochs based on clock-time
MASK hms=23:00:00-03:00:00

% equivalently - at resolution of 1hr at least - could have
% used annotations added by `HYPNO annot` 
%MASK mask-ifnot=h_clock_23,h_clock_00,h_clock_01,h_clock_02

% next, include only if in persistent sleep
MASK mask-ifnot=h_persistent_sleep

% select only N2 epochs
MASK mask-ifnot=N2

% and reject epochs with any respiratory event
MASK mask-if=Obstructive_Apnea,Hypopnea

% finally, output the mask
DUMP-MASK

The above first runs HYPNO (which also epochs the data in 30-second epochs) but adds the flag annot which tells HYPNO to add a series of annotations to the recording, largely based upon the hynogram. By default, these start with the prefix h_ and include one that defines (at an epoch-level) persistent sleep, called h_persistent_sleep.

Next, although not strictly necessary, we clear the mask (i.e. to include all epochs). We then use the hms flag to mask any epoch outside of the specified range. (Actually, this includes any epoch that has any overlap with the interval between 23:00:00 and 03:00:00, i.e. so it will also include a 30-second epoch that starts at 22:59:47, for example. If you require more precise timing, should should set epoch duration to be 1 second, not 30 seconds, for example, when running the MASK hms command.)

Info

You must use 24 hour hh:mm:ss notation, so 11pm is 23:00:00.

As shown above, HYPNO annot also emits some annotations corresponding to clock-times, and we could have used that MASK statement instead to achieve the same effect.

It next masks any epochs that are not labelled as h_persistent_sleep. Next, it masks any epoch that is not stage 2 NREM sleep (annotation N2).

Note

Here we used the mask-ifnot rather than ifnot flag: the difference is that the former will not unmask N2 epochs that were previously masked. Remember:

To mask or set the mask to true means to exclude that epoch.
To unmask or set the mask to false mean to include that epoch.

See the page on masks for a detailed description of how the MASK syntax works in Luna.

Finally, it masks any epochs that contain at least one apnea or hypopnea event, using the labels Obstructive_Apnea and Hypopnea as they appear in the Luna console output (i.e. with an underscore replacing the space).

Let's run this command, just for nsrr01, in order to study the output sent to the console that describes the masking process (i.e. we'll ignore the output in out.db):

luna s.lst id=nsrr01 -o out.db < cmd/third.txt

Looking at the output written to the console, we can track what happened at each MASK stage. First, the hypnogram derived annotations are added from running HYPNO, with a note that these will have a prefix h_:

 CMD #1: HYPNO
   options: annot sig=*
  set epochs to default 30 seconds, 1364 epochs
  set 0 leading/trailing sleep epochs to '?' (given end-wake=120 and end-sleep=5)
  creating hypnogram-derived annotations, with prefix h

To see a full list of the added annotations, either run a ANNOTS command or see the documentation page for HYPNO. Next, all epochs are included (with the none mask option):

 CMD #2: MASK
  set masking mode to 'mask' (default)
  reset all 1364 epochs to be included

Next, epochs between 11pm and 3am are included, all other epochs are masked. As a sanity check, from epoch 124 to 604 inclusive is 481 epochs, which is 4 hours and 30 seconds (i.e. this includes an extra epoch that started at 22:59:47 and so still overlaps the 11pm-3am interval):

 CMD #3: MASK
  selecting epochs from 124 to 604; selecting from set of 481 epochs; 883 newly masked, 0 unmasked, 0 unchanged
  total of 481 of 1364 retained

At this point there are 481 unmasked epochs, although because we have not yet issued any RESTRUCTURE command, the effective duration of the in-memory EDF is still 1364 epochs. We next further mask epochs that are not in persistent sleep:

 CMD #4: MASK
  annots: h_persistent_sleep
  applied annotation mask for 1 annotation(s)
  644 epochs match; 209 newly masked, 0 unmasked, 1155 unchanged
  total of 272 of 1364 retained

Based on this, it implies that 209 epochs are within the 11pm to 3am window but not in persistent sleep, as they are newly masked. Only those epochs have their status changed (i.e. because we've set this MASK commands up to only ever mask, not to unmask, epochs), leaving us with 272 epochs:

Next, we further mask based on not being in N2 sleep, which drops us to 235 epochs:

 CMD #5: MASK
  annots: N2
  applied annotation mask for 1 annotation(s)
  523 epochs match; 37 newly masked, 0 unmasked, 1327 unchanged
  total of 235 of 1364 retained

Finally, we screen for matching any apnea or hypopnea event, which takes us down to 86 epochs:

 CMD #6: MASK
  annots: Hypopnea Obstructive_Apnea
  applied annotation mask for 2 annotation(s) (using or-matching across multiple annotations)
  558 epochs match; 149 newly masked, 0 unmasked, 1215 unchanged
  total of 86 of 1364 retained

The final command writes the mask status (masked or not) to the output. In practice, we would now likely perform a RE command to restructure the dataset, and then run subsequent commands or output a new EDF, etc.

Manipulating EDFs

Here we give an overview of filtering, resampling, relabelling, re-referencing and re-writing signal data.

The following command script (cmd/fourth.txt) illustrates the above features:

mV 
RESAMPLE sr=100
FILTER bandpass=0.3,35 ripple=0.02 tw=1
REFERENCE ref=EEG1 sig=EEG2 
WRITE edf-tag=v2 edf-dir=newedfs/ sample-list=new.lst

Using the following Luna command (note use of the parameter file cmd/vars.txt to define new labels EEG1 and EEG2):

luna s.lst @cmd/vars.txt sig=EEG1,EEG2 < cmd/fourth.txt

Processing: nsrr01 [ #1 ]
 duration: 11.22.00 | 40920 secs | clocktime 21.58.17 - 09.20.17

 signals: 2 (of 14) selected in a standard EDF file:
  EEG2 | EEG1

 annotations:
  Arousal (x194) | Hypopnea (x361) | N1 (x109) | N2 (x523)
  N3 (x17) | Obstructive_Apnea (x37) | R (x238) | SpO2_artifact (x59)
  SpO2_desaturation (x254) | W (x477)

 variables:
  eeg=EEG2,EEG1 | id=nsrr01
 ..................................................................
 CMD #1: mV
   options: sig=EEG1,EEG2
  rescaled EEG1 to mV
  rescaled EEG2 to mV
 ..................................................................
 CMD #2: RESAMPLE
   options: sig=EEG1,EEG2 sr=100
  resampling channel EEG1 from sample rate 125 to 100
  resampling channel EEG2 from sample rate 125 to 100
 ..................................................................
 CMD #3: FILTER
   options: bandpass=0.3,35 ripple=0.02 sig=EEG1,EEG2 tw=1
  filtering channel(s): EEG1 EEG2
 ..................................................................
 CMD #4: REFERENCE
   options: ref=EEG1 sig=EEG2
  referencing EEG2 with respect to EEG1
 ..................................................................
 CMD #5: WRITE
   options: edf-dir=newedfs/ edf-tag=v2 sample-list=new.lst sig=EEG1,EEG2
  appending newedfs/learn-nsrr01-v2.edf to sample-list new.lst (dropping any annotations)
  no epoch mask set, no restructuring needed
nsrr01  WRITE   .   .   NR1 40920
nsrr01  WRITE   .   .   NR2 40920
nsrr01  WRITE   .   .   DUR1    40920
nsrr01  WRITE   .   .   DUR2    40920
  data are not truly discontinuous
  writing as a standard EDF
  writing 2 channels
  saved new EDF, newedfs/learn-nsrr01-v2.edf

___________________________________________________________________

This selects only the two EEG signals, based on their aliases EEG1 and EEG2. the mV command rescales any signals in uV and V units to mV, changing the EDF headers appropriately too. The RESEAMPLE command takes an argument sr which is the new sampling rate (i.e. set to 100Hz here). The FILTER command designs and applies a FIR filter with transition frequencies at 0.3 and 35 Hz.

The REFERENCE command re-references the signal data -- in this example, for illustration of the syntax only, we reference EEG2 relative to EEG1 (i.e. subtract EEG1 values from EEG2). Signals must have the same sampling rate to be referenced. Multiple, comma-delimited signals can be given as the ref, in which case the average of those values is used as the reference; if multiple signals are given for the sig command, then each signal is re-referenced with respect to whatever ref was specified.

Finally, the WRITE command generates a new EDF, that will reflect these changes. The edf-tag value is appended to the original filename for the EDF, and it is written to a directory edf-dir (if this doesn't exist, it will be created). A new sample-list new.lst is also generated, with each newly generated EDF being appended to it. (Note, values are only appended to the sample-list, to facilitate running Luna in parallel, so if new.lst already existed and was not empty, you might want to delete that file before running the above.)

We can confirm that the new sample list is as expected:

cat new.lst

nsrr01     newedfs/learn-nsrr01-v2.edf
nsrr02     newedfs/learn-nsrr02-v2.edf
nsrr03     newedfs/learn-nsrr03-v2.edf

We can also inspect the new sets of EDFs, using the SUMMARY command on new.lst:

luna new.lst id=nsrr01 -s SUMMARY

EDF filename   : newedfs/learn-nsrr01-v2.edf
Patient ID     : 
Recording info : 
Start date     : 01.01.85
Start time     : 21:58:17

# signals      : 2
# records      : 40920
Duration       : 1

Signal 1 : [EEG2]
       # samples per record : 100
       transducer type      : 
       physical dimension   : mV
       min/max (phys)       : -0.3726/0.391668
       EDF min/max (phys)   : -0.3726/0.391668
       min/max (digital)    : -32768/32767
       EDF min/max (digital): -32768/32767
       pre-filtering        : 

Signal 2 : [EEG1]
       # samples per record : 100
       transducer type      : 
       physical dimension   : mV
       min/max (phys)       : -0.24235/0.237407
       EDF min/max (phys)   : -0.24235/0.237407
       min/max (digital)    : -32768/32767
       EDF min/max (digital): -32768/32767
       pre-filtering        :

That is, in relation to the original SUMMARY for this individual, the sampling rate is different, the labels have been reset to their alias values, there are now only two signals, not 14, the units of mV and not uV, and the min/max range is different (reflecting both the change in scale, and the filtering and referencing).

Artifact detection

Here we detect likely artifact in EEG (and other) signals, using two approaches. The first, specific to sleep EEG, is as described by Buckelmueller et al (2006). The second is based on per-epoch statistical filtering of RMS and Hjorth parameters.

Running just for the first individual, the command file cmd/fifth.txt contains the following:

EPOCH 
MASK if=W
RESTRUCTURE
FILTER bandpass=0.3,35 ripple=0.02 tw=1
SIGSTATS epoch

After filtering the signal to consider only sleep epochs, and filtering the signal to remove very slow and fast rhythms, we use the SIGSTATS command to generate Hjorth parameters (and also RMS) for a signal, both overall (i.e. all sleep epochs) and per-epoch (by virtue of the epoch option). We run Luna and save the results in artifact.db, only for a single EEG channel for now:

luna s.lst 1 sig=EEG -o artifact.db < cmd/fifth.txt

Viewing the contents of artifact.db:

destrat artifact.db

--------------------------------------------------------------------------------
artifact.db: 5 command(s), 1 individual(s), 16 variable(s), 2677 values
--------------------------------------------------------------------------------
  command #1:   c1  Thu Aug 13 13:45:19 2020    EPOCH   sig=EEG
  command #2:   c2  Thu Aug 13 13:45:19 2020    MASK    if=W sig=EEG
  command #3:   c3  Thu Aug 13 13:45:19 2020    RESTRUCTURE sig=EEG
  command #4:   c4  Thu Aug 13 13:45:19 2020    FILTER  bandpass=0.3,35 ripple=0.02 sig=EEG tw=1
  command #5:   c5  Thu Aug 13 13:45:20 2020    SIGSTATS    epoch=T sig=EEG
--------------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors           : levels        : variables 
----------------:-------------------:---------------:---------------------------
  [EPOCH]       : .                 : 1 level(s)    : DUR GENERIC INC NE OFFSET
                :                   :               : 
  [MASK]        : EMASK             : 1 level(s)    : MASK_MODE MATCH_LOGIC MATCH_TYPE
                :                   :               : N_MASK_SET N_MASK_UNSET N_MATCHES
                :                   :               : N_RETAINED N_TOTAL N_UNCHANGED
                :                   :               :
                :                   :               : 
  [RESTRUCTURE] : .                 : 1 level(s)    : DUR1 DUR2 NA NR1 NR2 NS
                :                   :               : 
  [SIGSTATS]    : CH                : 1 level(s)    : H1 H2 H3
                :                   :               : 
  [SIGSTATS]    : E CH              : (...)         : H1 H2 H3
                :                   :               :
----------------:-------------------:---------------:---------------------------

The SIGSTATS command produces Hjorth parameters (H1, H2 and H3) both overall (CH strata group) and per-epoch (E CH strata group). To see the overall values:

destrat artifact.db +SIGSTATS -r CH -p 2

ID      CH   H1     H2    H3 
nsrr01  EEG  78.38  0.35  0.83

To save all per-epoch values to a file res.txt:

destrat artifact.db +SIGSTATS -r CH E  > res.txt

One can load this file in R, for example, to view the per-epoch distribution (we do this below also).

As well as viewing these statistics, we can use them to flag likely artifacts, by considering values that are statistical outliers in the per-epoch distribution. If we consider a new command script, below (and cmd/sixth.txt in the tutorial folder), we add the mask option to the SIGSTATS command, and specify a set of thresholds: 3,3,3 means to iteratively remove epochs that are more than 3 standard deviations above or below the mean. We also add the ARTIFACTS command beforehand, which implements the Buckelmueller et al. filtering referenced above. Note that the SIGSTATS command will only operate on unmasked values (the prior ARTIFACTS command will have masked some).

EPOCH 
MASK if=W
RESTRUCTURE
FILTER bandpass=0.3,35 ripple=0.02 tw=1
ARTIFACTS mask
CHEP-MASK ep-th=3,3,3 epoch
CHEP epochs
DUMP-MASK

Running this new script, which ends by saving the current mask, we see the following:

luna s.lst 1 sig=EEG -o mask.db < cmd/sixth.txt

 ... cont'd ...
 CMD #5: ARTIFACTS
   options: mask=T sig=EEG
  masked 12 of 887 epochs, altering 12
 ..................................................................
 CMD #6: CHEP-MASK
   options: ep-th=3,3,3 epoch=T sig=EEG
  within-channel/between-epoch outlier detection, ep-th = 3,3,3
   iteration 1: removed 21 channel/epoch pairs this iteration (21 in total)
   iteration 2: removed 9 channel/epoch pairs this iteration (30 in total)
   iteration 3: removed 5 channel/epoch pairs this iteration (35 in total)
 ..................................................................
 CMD #7: CHEP
   options: epochs=T sig=EEG
  masking epochs with >0% masked channels: 35 epochs
  CHEP summary:
   35 of 887 channel/epoch pairs masked (4%)
   35 of 887 epochs with 1+ masked channel, 35 with all channels masked
   1 of 1 channels with 1+ masked epoch, 0 with all epochs masked
 ..................................................................
 ... cont'd ...

That is, the first approach to artifact detection removes 12 epochs; the second approach (which itself comprises three iterations) removes a total of 35 additional epochs. We can extract the mask (output by the DUMP-MASK command)

destrat mask.db +DUMP-MASK -r E -v EMASK > mask.txt

This has 888 rows: one header plus the 887 non-wake epochs (i.e. after a RESTRUCTURE command, this is the "total" number of epochs left). Loading this file, plus the previous output from SIGSTATS into R, we can plot the per-epoch distribution of the Hjorth parameters, colored by whether that epoch was excluded by artifact detection (run the following in R):

 d <- read.table("res.txt",header=T,stringsAsFactors=F)
 m <- read.table("mask.txt",header=T,stringsAsFactors=F)
 m <- merge( d , m , by=c("ID","E") ) 

 par(mfcol=c(3,1),mar=c(4,4,1,1)) 
 plot( m$E , m$H1 , pch = 20 , xlab="Epoch" , ylab="H1" , col = m$EMASK + 1) 
 plot( m$E , m$H2 , pch = 20 , xlab="Epoch" , ylab="H2" , col = m$EMASK + 1) 
 plot( m$E , m$H3 , pch = 20 , xlab="Epoch" , ylab="H3" , col = m$EMASK + 1)

In practice, given the heterogeneous nature of sleep signals, naturally it may be preferable to perform stage-specific artifact detection, or only remove very aberrant signals based on global outlier status.

To examine one of the removed epochs more closely (we can use scope as below, also), here we illustrate using commands such as DUMP and MATRIX to obtain simple, plain-text signal data. This extracts the EEG signal only, for the first individual's epoch 220. (This was one of the flagged epochs, as you can see by examining the output of DUMP-MASK above.

luna s.lst 1 sig=EEG -s 'EPOCH & MASK epoch=220 & MATRIX file=eeg.txt minimal'

This is a plain-text file with one number (sample-point) per row. We expect 30 times 125 = 3750 rows, for a single 30-sec epoch of signal sampled at 125 Hz. In R:

s <- scan("eeg.txt")

Read 3750 items

We can then generate a plot of this signal, e.g. :

plot( seq(1/125,30,1/125), s, type="l", ylim=c(-125, 125 ) , 
      xlab="Time (sec)" , ylab="EEG" )

Info

lunaR provides a far easier means to directly import raw signal data into R, that does not involve intermediate text file; we'll consider this explicitly in the next tutorial section.

Spectral and spindle analyses

To generate the EEG power density spectrum for each individual, say during stage 2 NREM sleep, we can use the PSD command. Specifically, if we had a command file as follows:

EPOCH 
MASK ifnot=N2
RESTRUCTURE
FILTER bandpass=0.3,35 ripple=0.02 tw=1
ARTIFACTS mask
CHEP-MASK ep-th=3,3,3
CHEP epoch
RESTRUCTURE
PSD spectrum

then the following command should epoch the signals, only include N2 epochs, filter the EEG and attempt to remove epochs that contain gross artifact, and then calculate the power spectra, both for classical bands (delta, alpha, sigma, etc) and also for each frequency bin (subject to the resolution determined by the sampling frequency and the size of the intervals analyzed. Internally, Luna uses of Welch algorithm and FFT to estimate the PSD.

luna s.lst sig=EEG -o out.db < cmd/seventh.txt

Note

This script cmd/seventh.txt also contains two extra commands at the end to detect spindles, which we'll describe below.

The power spectra are values implicitly stratified both by channel (CH) and frequency (F), so we use the following command to extract the absolute power values:

destrat out.db +PSD -r F CH -v PSD > res.psd

In R, we can read the res.psd file as follows,

d <- read.table("res.psd",header=T,stringsAsFactors=F)

head(d)

      ID  CH    F      PSD
1 nsrr01 EEG 0.50 26.09547
2 nsrr01 EEG 0.75 36.36182
3 nsrr01 EEG 1.00 39.85385
4 nsrr01 EEG 1.25 30.42863
5 nsrr01 EEG 1.50 22.48063
6 nsrr01 EEG 1.75 19.02125

We can create 3 plots, on a log-scale, of

par(mfcol=c(1,3))
ids <- unique(d$ID)
for (i in ids) plot( d$F[ d$ID == i ] , log( d$PSD[ d$ID == i ] ) , 
  type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=i)

Especially for the second individual, we see the characteristic peak in the sigma band (11-15Hz), typically representing spindle activity during stage 2 NREM sleep.

As a side-note: naturally, without filtering and artifact detection, especially if including many waking epochs at the start and end of recordings, the power spectra can be very noisy, indicating many sources of (typically non-biological) artifact. For example, this command estimates the PSD without any prior filtering:

luna s.lst @cmd/vars.txt sig=EEG1,EEG2 -o psd-all.db \
   -s 'EPOCH & MASK if=W & PSD spectrum max=62'

destrat psd-all.db +PSD -r CH F -v PSD > res.psd-all

In R:

d <- read.table( "res.psd-all",header=T,stringsAsFactors=F)
par(mfrow=c(2,3))
ids <- unique(d$ID)
for (ch in c("EEG1","EEG2") )
 for (i in ids) 
   plot( d$F[ d$ID == i & d$CH == ch ] , log( d$PSD[ d$ID == i & d$CH == ch ] ) , 
            type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=paste(i,ch))

Especially for nsrr02, there is clearly a great deal of 50 and 60Hz line noise in the signals. Visual inspection of the signals, especially EEG2 reveals other artifacts (below). Depending on the type of analyses one wishes to perform, careful filtering and/or artifact detection/correction will be important.

Finally, as we noted above, the last two lines of cmd/seventh.txt additionally instructed Luna to detect spindles and then write them to a file (where the ^ indicates the individual ID is to be swapped in):

SPINDLES fc=11,15 annot=spindles
WRITE-ANNOTS file=spindles-^.txt annot=spindles no-specials

The fc options specify the targeted frequencies: slower spindles at 11 Hz and faster spindles at 15 Hz. The annot options request that an annotations be generated (for each individual) to represent the spindle calls -- these can be used as filters.

To extract the spindle density (count per minute, DENS) we use the following command:

destrat out.db +SPINDLES -r CH -c F -v DENS -p 2

ID     CH   DENS.F_11  DENS.F_15
nsrr01 EEG  0.73       0.64
nsrr02 EEG  1.29       2.06
nsrr03 EEG  0.72       0.19

Because we added the annot option, Luna generated some annotations that represent the spindle calls and some associated per-spindle meta-data (e.g. amplitude amp). These are then saved in the final command, WRITE-ANNOTS to .annot files.

head spindles-nsrr01.txt

class    instance  channel    start       stop   meta
spindles   15      EEG     2706.560   2707.176   amp=11.0679|dur=0.616|frq=13.7987|isa=0.790763|rp_mid=0.736842
spindles   11      EEG     2834.640   2835.224   amp=19.1707|dur=0.584|frq=10.274|isa=0.839118|rp_mid=0.638889
spindles   15      EEG     2868.120   2868.752   amp=12.5175|dur=0.632|frq=13.4494|isa=0.889038|rp_mid=0.410256
spindles   11      EEG     2868.832   2869.656   amp=15.9973|dur=0.824|frq=10.3155|isa=0.993035|rp_mid=0.294118
spindles   15      EEG     2870.896   2871.408   amp=13.3631|dur=0.512|frq=13.6719|isa=0.900901|rp_mid=0.730159
spindles   15      EEG     2905.008   2905.696   amp=16.0575|dur=0.688|frq=13.0814|isa=1.64327|rp_mid=0.341176
spindles   15      EEG     2949.696   2950.520   amp=19.6564|dur=0.824|frq=14.5631|isa=2.4089|rp_mid=0.578431
spindles   11      EEG     2969.200   2969.768   amp=15.1515|dur=0.568|frq=9.6831|isa=0.823499|rp_mid=0.271429
spindles   11      EEG     2978.264   2979.192   amp=14.4497|dur=0.928|frq=10.2371|isa=0.924797|rp_mid=0.365217
...

There are a large number of other options and output variables for the SPINDLES and other commands, described in the Reference pages of this web site.