Skip to content

A deeper dive

In this third section of the tutorial, we use Luna to calculate some typical metrics of interest for sleep studies, including:

  • how basic statistics vary across the night
  • measures of sleep macro-architecture
  • filtering and artifact detection in the EEG
  • spectral analysis using fast Fourier transforms
  • spindle detection using wavelets

Epoch-level summaries

Returning to the STATS command described above, here we use it with the epoch option, in order to generate per-epoch statistics. This section also illustrates working with the text output format.

luna s.lst 2 sig=EMG,ECG -s "EPOCH & STATS epoch" > res.txt
===================================================================
+++ luna | v0.2, 17-Mar-2019  |  starting 18-Mar-2019 16:09:46  +++
===================================================================
input(s): s.lst
output  : .
signals : ECG EMG
commands: c1  EPOCH 
        : c2  STATS epoch=T

___________________________________________________________________
Processing: nsrr02 [ #2 ]
 duration 09:57:30 hrs, last time-point 09:57:30 hrs after start
  35850 records, each of 1 second(s)

 signals: 2 (of 14) selected in a standard EDF file:
  ECG | EMG

 annotations:
  [NREM1] 11 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [NREM2] 399 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [NREM3] 185 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [REM] 120 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [apnea_obstructive] 5 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [arousal] 51 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [artifact_SpO2] 34 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [desat] 66 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [hypopnea] 101 instance(s) (from edfs/learn-nsrr02-profusion.xml)
  [wake] 480 instance(s) (from edfs/learn-nsrr02-profusion.xml)
 ..................................................................
 CMD #1: EPOCH
 set epochs, length 30 (step 30), 1195 epochs
 ..................................................................
 CMD #2: STATS
 processing ECG ...
 processing EMG ...

___________________________________________________________________
...processed 1 EDFs, done.
...processed 1 command set(s),  all of which passed
-------------------------------------------------------------------
+++ luna | finishing 18-Mar-2019 16:09:48                       +++
===================================================================

This command was applied only to the second EDF in the sample-list (i.e. the 2 argument), and only to the ECG and EMG signals. After defining 30-second epochs (the default from the EPOCH command), it generates basic statistics per-epoch (the epoch option for the STATS command), as well as entire-signal summaries. Note that 1195 epochs are generated, which corresponds to the stated duration of 9 hours, 57 and a half minutes (i.e. ( 9 * 60 + 57 ) * 2 + 1 = 1195).

The first few lines of res.txt are as follows:

 nsrr02  STATS   CH/ECG  .       MIN     -1.2402
 nsrr02  STATS   CH/ECG  .       MAX     1.25
 nsrr02  STATS   CH/ECG  .       MEAN    0.00567068
 nsrr02  STATS   CH/ECG  .       SD      0.274438
 nsrr02  STATS   CH/ECG  .       RMS     0.274497
 nsrr02  STATS   CH/ECG  1       MIN     -0.651961
 nsrr02  STATS   CH/ECG  1       MAX     1.25
 nsrr02  STATS   CH/ECG  1       MEAN    0.00506797
 nsrr02  STATS   CH/ECG  1       SD      0.271021
 nsrr02  STATS   CH/ECG  1       RMS     0.271051
 nsrr02  STATS   CH/ECG  2       MIN     -0.720588
 nsrr02  STATS   CH/ECG  2       MAX     1.25
 nsrr02  STATS   CH/ECG  2       MEAN    0.00471634
 nsrr02  STATS   CH/ECG  2       SD      0.279001
 nsrr02  STATS   CH/ECG  2       RMS     0.279023

That is, the first five lines (for which the 4th column is empty, .) are the whole-signal averages for the ECG. The subsequent lines are the per-epoch values: e.g. the mean is 0.00506797 for the first epoch, 0.00471634 for the second, etc.

This format is designed to be easily parsed and read by other software. For example, to use R to plot the per-epoch RMS values of the ECG, we might use the following (from within R):

d <- read.table( "res.txt" , header=F , sep="\t" , stringsAsFactors=F )

Hint

As you'll see in the next section of this tutorial, it is now possible to read lout databases directly into R

After loading the data, it is convenient to relabel the 6 columns to have sensible names:

names(d) <- c( "ID" , "COMM" , "STRATA" , "T" , "VAR" , "VAL" ) 

Our goal is to generate a plot of per-epoch RMS for the ECG. We can extract these values by filtering rows where VAR equals "RMS", when T is not ".", and when STRATA is "CH/ECG":

d <- d[ d$STRATA == "CH/ECG" & d$T != "." & d$VAR == "RMS" , ] 

We can confirm that this produces the expected number of rows/epochs:

dim(d)
 [1] 1195    6

We need to ensure that the epoch and value fields are treated as numeric data points:

d$T <- as.numeric(d$T)
d$VAL <- as.numeric(d$VAL)

As a sanity-check, we can view the structure of the data frame:

str(d)
 'data.frame':  1364 obs. of  6 variables:
  $ ID    : chr  "nsrr01" "nsrr01" "nsrr01" "nsrr01" ...
  $ COMM  : chr  "STATS" "STATS" "STATS" "STATS" ...
  $ STRATA: chr  "CH/ECG" "CH/ECG" "CH/ECG" "CH/ECG" ...
  $ T     : num  1 2 3 4 5 6 7 8 9 10 ...
  $ VAR   : chr  "RMS" "RMS" "RMS" "RMS" ...
  $ VAL   : num  0.0708 0.0706 0.0748 0.0698 0.0669 ...

To plot these per-epochs values:

plot( d$T , d$VAL , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 )

img

Clearly there are a number of extreme points towards the end of the recording, presumably indicating bad data after the subject woke for the final time that night. We can confirm this by indicating the manually-annotated sleep stage for each point. Going back to the command line (type q() to get out of R, or even better, run the next couple of Luna commands in a separate window to keep the R session open), we can use Luna to extract sleep stages (using the STAGE command) in a format that can be read into R:

luna s.lst 2 -o ss.db -s STAGE
and extracting the output as a text file ss.txt:
destrat ss.db +STAGE -r E -v STAGE > ss.txt

We then load this sleep stage information into R as follows, which indicates that we have the expected number of epochs:

ss <- read.table( "ss.txt" , header=T )
dim(ss)
[1] 1195    3

This is the breakdown of epochs by each sleep stage for this individual:

table(ss$STAGE)
NREM1 NREM2 NREM3   REM  wake 
   11   399   185   120   480 

If you previously closed the R session, you'll need to rerun the commands to create the d data-frame. Then, plotting epochs colored by sleep stage, we indeed confirm that the end aberrant epochs are all wake: (plot not shown).

plot( d$T , d$VAL  , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 , col = ss$STAGE  )
cols <- sort(unique(ss$STAGE))
legend("topleft",legend=cols , fill = cols ) 

Simply ignoring these final points (which roughly corresponds to looking only up to the 950th epoch in this example), we can generate a clearer plot of how the RMS of the ECG varies across the night in this individual, which tracks with the progression of sleep and wake in a semi-predictable manner:

plot( d$T[ 1:950 ] , d$VAL[ 1:950 ]  , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 , col = ss$STAGE ) 
legend("bottomleft",legend=cols , fill = cols )

img

Hypnograms

Given a set of epoch-based staging values, the HYPNO command produces a series of statistics that describe different aspects of sleep macro-architecture.

luna s.lst -o stage.db -s "EPOCH & HYPNO"

This generate a large number of variables, viewed here with the dictionary option of destrat (-d):

destrat stage.db -d
stage.db LIGHTS_OUT STAGE   Lights out time [0,24)
stage.db SLEEP_ONSET    STAGE   Sleep onset time [0,24)
stage.db SLEEP_MIDPOINT STAGE   Sleep mid-point time [0,24)
stage.db FINAL_WAKE STAGE   Final wake time [0,24)
stage.db LIGHTS_ON  STAGE   Lights on time [0,24)
stage.db NREMC      STAGE   Number of NREM cycles
stage.db NREMC_MINS STAGE   Average NREM cycle duration (mins)
stage.db TIB        STAGE   Time in Bed (hours): LIGHTS_OUT --> LIGHTS_ON
stage.db TST        STAGE   Total Sleep Time (hours): SLEEP_ONSET --> FINAL_WAKE
stage.db TPST       STAGE   Total persistent Sleep Time (hours): PERSISTENT_SLEEP_ONSET --> FINAL_WAKE
stage.db TWT        STAGE   Total Wake Time (hours): all WAKE
stage.db WASO       STAGE   Wake After Sleep Onset (hours)
stage.db SLP_LAT    STAGE   Sleep latency
stage.db PER_SLP_LAT    STAGE   Persistent sleep latency
stage.db SLP_EFF    STAGE   Sleep efficiency: LIGHTS_OUT --> LIGHTS_ON
stage.db SLP_MAIN_EFF   STAGE   Sleep maintainence efficiency
stage.db SLP_EFF2   STAGE   Sleep efficiency: SLEEP_ONSET --> FINAL_WAKE
stage.db REM_LAT    STAGE   REM latency (from SLEEP_ONSET)
stage.db PCT_N1     STAGE   Proportion of sleep that is N1
stage.db PCT_N2     STAGE   Proportion of sleep that is N2
stage.db PCT_N3     STAGE   Proportion of sleep that is N3
stage.db PCT_N4     STAGE   Proportion of sleep that is N4
stage.db PCT_REM    STAGE   Proportion of sleep that is REM
stage.db MINS_N1    STAGE   Proportion of sleep that is N1
stage.db MINS_N2    STAGE   Proportion of sleep that is N2
stage.db MINS_N3    STAGE   Proportion of sleep that is N3
stage.db MINS_N4    STAGE   Proportion of sleep that is N4
stage.db MINS_REM   STAGE   Proportion of sleep that is REM
stage.db NREMC_START    STAGE   NREM cycle start epoch
stage.db NREMC_NREM_MINS    STAGE      NREM cycle NREM duration (mins)
stage.db NREMC_REM_MINS     STAGE      NREM cycle REM duration (mins)
stage.db NREMC_OTHER_MINS   STAGE      NREM cycle other duration (mins)
stage.db MINS           STAGE      Elapsed time since start of recording (minutes)
stage.db CLOCK_TIME     STAGE      Clock time (hh:mm:ss)
stage.db CLOCK_HOURS        STAGE      Clock time [0,24) hours
stage.db STAGE          STAGE      Sleep stage, string label
stage.db STAGE_N        STAGE      Sleep stage, numeric encoding
stage.db E_WAKE         STAGE      Elapsed wake (mins)
stage.db E_WASO         STAGE      Elapsed WASO (mins)
stage.db E_SLEEP        STAGE      Elapsed sleep (mins)
stage.db E_N1           STAGE      Elapsed N1 (mins)
stage.db E_N2           STAGE      Elapsed N2 (mins)
stage.db E_N3           STAGE      Elapsed N3 (mins)
stage.db E_REM          STAGE      Elapsed REM (mins)
stage.db PCT_E_SLEEP        STAGE      Elapsed sleep (percent of all sleep)
stage.db PCT_E_N1       STAGE      Elapsed N1 (percent of all N1)
stage.db PCT_E_N2       STAGE      Elapsed N2 (percent of all N2)
stage.db PCT_E_N3       STAGE      Elapsed N3 (percent of all N3)
stage.db PCT_E_REM      STAGE      Elapsed REM (percent of all REM)
stage.db PERSISTENT_SLEEP   STAGE      Persistent sleep yes/no? (1=Y)
stage.db CYCLE          STAGE      NREMC number
stage.db PERIOD         STAGE      NREMC period (NREM/REM)
stage.db CYCLE_POS_REL      STAGE      Position within NREMC, relative
stage.db CYCLE_POS_ABS      STAGE      Position within NREMC, absolute (mins)
stage.db FLANKING_SIM       STAGE      Number of similar epochs w.r.t. stage
stage.db NEAREST_WAKE       STAGE      Number of epochs until the nearest wake
stage.db NREM2REM       STAGE      If NREM epoch, number of NREM if next non-NREM is REM
stage.db NREM2REM_TOTAL     STAGE      If NREM epoch, total number of contiguous NREM if next non-NREM is REM
stage.db NREM2WAKE      STAGE      If NREM epoch, number of NREM if next non-NREM is WAKE
stage.db NREM2WAKE_TOTAL    STAGE      If NREM epoch, total number of contiguous NREM if next non-NREM is WAKE
stage.db N2_WGT         STAGE      Score for descending/ascending N2 epochs (-1 to +1)

There are three different strata groups in the output from STAGE:

destrat stage.db
--------------------------------------------------------------------------------
stage.db: 2 command(s), 3 individual(s), 64 variable(s), 107902 values
--------------------------------------------------------------------------------
  command #1:   c1  Mon Mar 18 16:20:23 2019    EPOCH   
  command #2:   c2  Mon Mar 18 16:20:23 2019    HYPNO   
--------------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors           : levels        : variables 
----------------:-------------------:---------------:---------------------------
  [EPOCH]       : .                 : 1 level(s)    : DUR INC NE
                :                   :               : 
  [HYPNO]       : .                 : 1 level(s)    : FINAL_WAKE LIGHTS_ON LIGHTS_OUT
                :                   :               : MINS_N1 MINS_N2 MINS_N3 MINS_N4
                :                   :               : MINS_REM NREMC NREMC_MINS PCT_N1
                :                   :               : PCT_N2 PCT_N3 PCT_N4 PCT_REM PER_SLP_LAT
                :                   :               : REM_LAT SLEEP_MIDPOINT SLEEP_ONSET
                :                   :               : SLP_EFF SLP_EFF2 SLP_LAT SLP_MAIN_EFF
                :                   :               : TIB TPST TST TWT WASO
                :                   :               : 
  [HYPNO]       : E                 : (...)         : CLOCK_HOURS CLOCK_TIME CYCLE CYCLE_POS_ABS
                :                   :               : CYCLE_POS_REL E_N1 E_N2 E_N3 E_REM
                :                   :               : E_SLEEP E_WAKE E_WASO FLANKING_SIM
                :                   :               : MINS N2_WGT NEAREST_WAKE NREM2REM
                :                   :               : NREM2REM_TOTAL NREM2WAKE NREM2WAKE_TOTAL
                :                   :               : PCT_E_N1 PCT_E_N2 PCT_E_N3 PCT_E_REM
                :                   :               : PCT_E_SLEEP PERIOD PERSISTENT_SLEEP
                :                   :               : STAGE STAGE_N WASO
                :                   :               : 
  [HYPNO]       : C                 : 6 level(s)    : NREMC_MINS NREMC_NREM_MINS NREMC_OTHER_MINS
                :                   :               : NREMC_REM_MINS NREMC_START
                :                   :               : 
----------------:-------------------:---------------:---------------------------

The first (.) HYPNO group is a set of variables with no stratifiers: that is, simply one value per individual/EDF. These include minutes of N2 sleep (MINS_N2). To view the number of minutes of each sleep stage for the three individuals, and also the number of NREM cycles NREMC, we can use:

destrat stage.db +HYPNO -v MINS_N1 MINS_N2 MINS_N3 MINS_REM NREMC
ID      MINS_N1 MINS_N2 MINS_N3 MINS_REM NREMC
nsrr01  54.5    261.5   8       119      6
nsrr02  5.5     199.5   92.5    60       5
nsrr03  26      187.5   10.5    0        3

To view cycle-level variables (C factor), we can use the following to see the duration (in minutes) and start (in epoch number):

destrat stage.db +HYPNO -r C -v NREMC_START NREMC_MINS
ID       C    NREMC_MINS  NREMC_START
nsrr01   1    45          87
nsrr01   2    100.5       184
nsrr01   3    47          451
nsrr01   4    77.5        654
nsrr01   5    60          819
nsrr01   6    50          956
nsrr02   1    72.5        92
nsrr02   2    133         237
nsrr02   3    42          503
nsrr02   4    105.5       675
nsrr02   5    32.5        887
nsrr03   1    69          118
nsrr03   2    74          312
nsrr03   3    137.5       630

To illustrate extracting epoch-level information: here we consider the designated sleep stage (STAGE variable), along with the time (CLOCK_TIME) and a measure of how many nearby epochs had the same stage (FLANKING_SIM). The latter is the minimum number of contiguous epochs, either forwards or backwards in time, that are similar to the index epoch (truncated at the start and end of the recording). In other words, selecting epochs with 3 or more for this variable means that there are at least 3 epochs before and 3 epochs after with a similar stage. This can be used to select periods of sleep that are less likely to contain stage transitions, for example.

destrat stage.db +HYPNO -i nsrr01  -r E  -v STAGE FLANKING_SIM CLOCK_TIME
ID       E      CLOCK_TIME   FLANKING_SIM  STAGE
nsrr01   1      21:58:17     0             Wake 
nsrr01   2      21:58:47     1             Wake 
nsrr01   3      21:59:17     2             Wake 
nsrr01   4      21:59:47     3             Wake 
nsrr01   5      22:00:17     4             Wake 
nsrr01   6      22:00:47     5             Wake 
nsrr01   7      22:01:17     6             Wake 
nsrr01   8      22:01:47     7             Wake 
nsrr01   9      22:02:17     8             Wake 
nsrr01   10     22:02:47     9             Wake 
nsrr01   11     22:03:17     10            Wake 
 ... cont'd ...                     
nsrr01   183    23:29:17     0             NREM1
nsrr01   184    23:29:47     0             NREM2
nsrr01   185    23:30:17     0             Wake 
nsrr01   186    23:30:47     0             NREM1
nsrr01   187    23:31:17     0             NREM2
nsrr01   188    23:31:47     1             NREM2
nsrr01   189    23:32:17     2             NREM2
nsrr01   190    23:32:47     3             NREM2
nsrr01   191    23:33:17     4             NREM2
nsrr01   192    23:33:47     5             NREM2
nsrr01   193    23:34:17     6             NREM2
nsrr01   194    23:34:47     7             NREM2
nsrr01   195    23:35:17     8             NREM2
nsrr01   196    23:35:47     9             NREM2
nsrr01   197    23:36:17     10            NREM2
nsrr01   198    23:36:47     10            NREM2
nsrr01   199    23:37:17     9             NREM2
nsrr01   200    23:37:47     8             NREM2
nsrr01   201    23:38:17     7             NREM2
nsrr01   202    23:38:47     6             NREM2
nsrr01   203    23:39:17     5             NREM2
nsrr01   204    23:39:47     4             NREM2
nsrr01   205    23:40:17     3             NREM2
nsrr01   206    23:40:47     2             NREM2
nsrr01   207    23:41:17     1             NREM2
nsrr01   208    23:41:47     0             NREM2
nsrr01   209    23:42:17     0             Wake 
nsrr01   210    23:42:47     0             NREM2
 ... cont'd 

Epoch-level masks

To illustrate applying different epoch-level masks, consider the following (somewhat unrealistic) example: to select epochs that:

  • are in persistent sleep as defined above (i.e. at least 10 minutes of sleep prior)
  • occur between 11pm and 3am
  • are during stage 2 NREM sleep
  • do not contain any apnea or hypopnea events

For the first individual, we can use awk to extract the indicator (0/1 for N/Y) of persistent sleep: (we ignore the header and print the 3rd column)

destrat stage.db +HYPNO -i nsrr01 -r E -v PERSISTENT_SLEEP \
  | awk ' NR>1 { print $3 } ' > aux/ps-nsrr01.eannot 

which generates an epoch-annotation file ps-nsrr01.eannot.

Windows users

For users who do not have awk immediately available on your system, the output file ps-nsrr01.eannot is already present in the aux tutorial folder, so you do not need to run the above command. We'll see other ways in lunaR to work with epoch-level annotations.

The following Luna command script (aux/third.txt) might then be used:

EPOCH
MASK none
MASK hms=23:00:00-03:00:00
EPOCH-ANNOT file=aux/ps-^.eannot recode=0=exc,1=inc
MASK mask-if=exc
MASK mask-ifnot=NREM2
MASK mask-if=apnea_obstructive,hypopnea
DUMP-MASK

This command file first epochs the dataset, in 30-second epochs by default. Next, we clear the mask (i.e. to include all epochs). We then use the hms flag to mask any epoch outside of the specified range. (Actually, this includes any epoch that has any overlap with the interval between 23:00:00 and 03:00:00, i.e. so it will also include a 30-second epoch that starts at 22:59:47, for example. If you require more precise timing, should should set epoch duration to be 1 second, not 30 seconds, for example, when running the MASK hms command.)

Info

You must use 24 hour hh:mm:ss notation, so 11pm is 23:00:00.

We then read the .eannot file as an epoch-level annotation, optionally recoding the 0 and 1 labels to have the slightly more meaningful labels of exc and inc (i.e. to exclude or include epochs) based on the absence or presence of persistent sleep. That is, we mask those excluded epochs. Note that we've used a special character ^ in the script -- this is expanded into the current ID of the EDF (based on the first column of the sample-list), thereby providing a convenient way to use a single command script across multiple individuals. That is, here it looked for a file called aux/ps-nsrr01.eannot.

We then mask any epoch that is not stage 2 NREM sleep (annotation NREM2). Finally, we mask any epochs that contain at least one apnea or hypopnea event. Because these tutorial files come from the NSRR, and so have associated annotation files, we know about these types of respiratory events, i.e. they were manually annotated. Luna has a set of automatic functions that recast NSRR annotations into a single, clean set: this page is where apnea_obstructive and hypopnea are defined.

Note

Here we used the mask-ifnot rather than ifnot flag: the difference is that the former will not unmask NREM2 epochs that were previously masked. Remember:

  • To mask or set the mask to true means to exclude that epoch.
  • To unmask or set the mask to false mean to include that epoch.

See the page on masks for a detailed description of how the MASK syntax works in Luna.

Let's run this command, just for nsrr01, in order to study the output sent to the console that describes the masking process (i.e. we'll ignore the output in out.db):

luna s.lst nsrr01 -o out.db < aux/third.txt

Looking at the output written to the console, we can track what happened at each MASK stage. First, all epochs are included (with the none mask option):

 CMD #2: MASK
 reset all 1364 epochs to be included

Next, epochs between 11pm and 3am are included, all other epochs are masked. As a sanity check, from epoch 124 to 604 inclusive is 481 epochs, which is 4 hours and 30 seconds (i.e. this includes an extra epoch that started at 22:59:47 and so still overlaps the 11pm-3am interval):

 CMD #3: MASK
 selecting epochs from 124 to 604; 883 newly masked, 0 unmasked, 0 unchanged
 total of 481 of 1364 retained

At this point there are 481 unmasked epochs, although because we have not yet issued any RESTRUCTURE command, the effective duration of the in-memory EDF is still 1364 epochs. As such, we can still attach the annotations from the file aux/ps-nsrr01.eannot, i.e. which is expecting 1364 epochs:

 CMD #4: EPOCH-ANNOT
  remapping from 0 to exc
  remapping from 1 to inc
 mapping 2 distinct epoch-annotations (1364 in total) from aux/ps-nsrr01.eannot

We then apply a mask based on this annotation: 980 epochs have the flag set to be excluded (i.e. from the file); of those, only 329 must be within the 11pm to 3am window, as only that many epochs have their status changed (i.e. because we've set these MASK commands up to only ever mask, not to unmask, epochs), leaving us with 152 epochs:

 CMD #5: MASK
 set masking mode to 'mask' (default)
 based on exc 980 epochs match; 329 newly masked, 0 unmasked, 1035 unchanged
 total of 152 of 1364 retained

Next, we further mask based on not being in NREM2 sleep, which drops us to 142 epochs:

 CMD #6: MASK
 set masking mode to 'mask' (default)
 based on NREM2 523 epochs match; 10 newly masked, 0 unmasked, 1354 unchanged
 total of 142 of 1364 retained

Finally, we screen for matching any apnea or hypopnea event, which takes us down to 55 epochs:

 CMD #7: MASK
 set masking mode to 'mask' (default)
 based on apnea_obstructive 60 epochs match; 3 newly masked, 0 unmasked, 1361 unchanged
 total of 139 of 1364 retained
 based on hypopnea 509 epochs match; 84 newly masked, 0 unmasked, 1280 unchanged
 total of 55 of 1364 retained

Manipulating EDFs

Here we give an overview of filtering, resampling, relabelling, re-referencing and re-writing signal data.

The following command script (aux/fourth.txt) illustrates the above features:

mV 
RESAMPLE sr=100
FILTER bandpass=0.3,35 ripple=0.02 tw=1
REFERENCE ref=EEG1 sig=EEG2 
WRITE edf-tag=v2 edf-dir=newedfs/ sample-list=new.lst

Using the following Luna command (note use of the parameter file aux/vars.txt to define new labels EEG1 and EEG2):

luna s.lst @aux/vars.txt sig=EEG1,EEG2 < aux/fourth.txt
Processing: nsrr01 [ #1 ]
 duration 11:22:00 hrs, last time-point 11:22:00 hrs after start
  40920 records, each of 1 second(s)

 signals: 2 (of 14) selected in a standard EDF file:
  EEG2 | EEG1

 annotations:
  [NREM1] 109 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [NREM2] 523 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [NREM3] 16 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [NREM4] 1 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [REM] 238 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [apnea_obstructive] 37 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [arousal] 194 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [artifact_SpO2] 59 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [desat] 254 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [hypopnea] 361 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [wake] 477 instance(s) (from edfs/learn-nsrr01-profusion.xml)
 ..................................................................
 CMD #1: mV
 rescaled EEG1 to mV
 rescaled EEG2 to mV
 ..................................................................
 CMD #2: RESAMPLE
 resampling channel EEG1 from sample rate 125 to 100
 resampling channel EEG2 from sample rate 125 to 100
 ..................................................................
 CMD #3: FILTER
 filtering channel EEG1, bandpass FIR order 183
 filtering channel EEG2, bandpass FIR order 183
 ..................................................................
 CMD #4: REFERENCE
 referencing EEG2 with respect to EEG1
 ..................................................................
 CMD #5: WRITE
 appending newedfs/learn-nsrr01-v2.edf to sample-list new.lst
nsrr01     WRITE               .  .       NR1   40920
nsrr01     WRITE               .  .       NR2   40920
nsrr01     WRITE               .  .       DUR1  40920
nsrr01     WRITE               .  .       DUR2  40920
 saved new EDF, newedfs/learn-nsrr01-v2.edf
___________________________________________________________________

This selects only the two EEG signals, based on their aliases EEG1 and EEG2. the mV command rescales any signals in uV and V units to mV, changing the EDF headers appropriately too. The RESEAMPLE command takes an argument sr which is the new sampling rate (i.e. set to 100Hz here). The FILTER command designs and applies a FIR filter with transition frequencies at 0.3 and 35 Hz.

The REFERENCE command re-references the signal data -- in this example, for illustration of the syntax only, we reference EEG2 relative to EEG1 (i.e. subtract EEG1 values from EEG2). Signals must have the same sampling rate to be referenced. Multiple, comma-delimited signals can be given as the ref, in which case the average of those values is used as the reference; if multiple signals are given for the sig command, then each signal is re-referenced with respect to whatever ref was specified.

Finally, the WRITE command generates a new EDF, that will reflect these changes. The edf-tag value is appended to the original filename for the EDF, and it is written to a directory edf-dir (if this doesn't exist, it will be created). A new sample-list new.lst is also generated, with each newly generated EDF being appended to it. (Note, values are only appended to the sample-list, to facilitate running Luna in parallel, so if new.lst already existed and was not empty, you might want to delete that file before running the above.)

We can confirm that the new sample list is as expected:

cat new.lst
nsrr01     newedfs/learn-nsrr01-v2.edf
nsrr02     newedfs/learn-nsrr02-v2.edf
nsrr03     newedfs/learn-nsrr03-v2.edf

We can also inspect the new sets of EDFs, using the SUMMARY command on new.lst:

luna new.lst nsrr01 -s SUMMARY
EDF filename   : newedfs/learn-nsrr01-v2.edf
Patient ID     : 
Recording info : 
Start date     : 01.01.85
Start time     : 21:58:17

# signals      : 2
# records      : 40920
Duration       : 1

Signal 1 : [EEG2]
       # samples per record : 100
       transducer type      : 
       physical dimension   : mV
       min/max (phys)       : -0.3726/0.391668
       EDF min/max (phys)   : -0.3726/0.391668
       min/max (digital)    : -32768/32767
       EDF min/max (digital): -32768/32767
       pre-filtering        : 

Signal 2 : [EEG1]
       # samples per record : 100
       transducer type      : 
       physical dimension   : mV
       min/max (phys)       : -0.24235/0.237407
       EDF min/max (phys)   : -0.24235/0.237407
       min/max (digital)    : -32768/32767
       EDF min/max (digital): -32768/32767
       pre-filtering        : 

That is, in relation to the original SUMMARY for this individual, the sampling rate is different, the labels have been reset to their alias values, there are now only two signals, not 14, the units of mV and not uV, and the min/max range is different (reflecting both the change in scale, and the filtering and referencing).

Artifact detection

Here we detect likely artifact in EEG (and other) signals, using two approaches. The first, specific to sleep EEG, is as described by Buckelmueller et al (2006). The second is based on per-epoch statistical filtering of RMS and Hjorth parameters.

Running just for the first individual, the command file aux/fifth.txt contains the following:

EPOCH 
MASK if=wake
RESTRUCTURE
FILTER bandpass=0.3,35 ripple=0.02 tw=1
SIGSTATS epoch

After filtering the signal to consider only sleep epochs, and filtering the signal to remove very slow and fast rhythms, we use the SIGSTATS command to generate Hjorth parameters (and also RMS) for a signal, both overall (i.e. all sleep epochs) and per-epoch (by virtue of the epoch option). We run Luna and save the results in artifact.db, only for a single EEG channel for now:

luna s.lst 1 sig=EEG -o artifact.db < aux/fifth.txt

Viewing the contents of artifact.db:

destrat artifact.db
--------------------------------------------------------------------------------
artifact.db: 5 command(s), 1 individual(s), 18 variable(s), 4453 values
--------------------------------------------------------------------------------
  command #1:   c1  Mon Mar 18 17:34:41 2019    EPOCH   
  command #2:   c2  Mon Mar 18 17:34:41 2019    MASK    
  command #3:   c3  Mon Mar 18 17:34:41 2019    RESTRUCTURE 
  command #4:   c4  Mon Mar 18 17:34:42 2019    FILTER      
  command #5:   c5  Mon Mar 18 17:34:43 2019    SIGSTATS    
--------------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors           : levels        : variables 
----------------:-------------------:---------------:---------------------------
  [EPOCH]       : .                 : 1 level(s)    : DUR INC NE
                :                   :               : 
  [RESTRUCTURE] : .                 : 1 level(s)    : DUR1 DUR2 NR1 NR2
                :                   :               : 
  [SIGSTATS]    : CH                : 1 level(s)    : CLIP H1 H2 H3 RMS
                :                   :               : 
  [MASK]        : EPOCH_MASK        : 1 level(s)    : N_MASK_SET N_MASK_UNSET N_MATCHES
                :                   :               : N_RETAINED N_TOTAL N_UNCHANGED
                :                   :               :
                :                   :               : 
  [SIGSTATS]    : E CH              : (...)         : CLIP H1 H2 H3 RMS
                :                   :               : 
----------------:-------------------:---------------:---------------------------

The SIGSTATS command produces Hjorth parameters (H1, H2 and H3) both overall (CH strata group) and per-epoch (E CH strata group). To see the overall values:

destrat artifact.db +SIGSTATS -r CH -p 2
ID      CH   CLIP   H1     H2    H3    RMS
nsrr01  EEG  0.00   78.38  0.35  0.83  8.54

To save all per-epoch values to a file res.txt:

destrat artifact.db +SIGSTATS -r CH E  > res.txt

One can load this file in R, for example, to view the per-epoch distribution (we do this below also).

As well as viewing these statistics, we can use them to flag likely artifacts, by considering values that are statistical outliers in the per-epoch distribution. If we consider a new command script, below (and aux/sixth.txt in the tutorial folder), we add the mask option to the SIGSTATS command, and specify a set of thresholds: 3,3,3 means to iteratively remove epochs that are more than 3 standard deviations above or below the mean. We also add the ARTIFACTS command beforehand, which implements the Buckelmueller et al. filtering referenced above. Note that the SIGSTATS command will only operate on unmasked values (the prior ARTIFACTS command will have masked some).

EPOCH 
MASK if=wake
RESTRUCTURE
FILTER bandpass=0.3,35 ripple=0.02 tw=1
ARTIFACTS mask
SIGSTATS mask threshold=3,3,3 epoch
DUMP-MASK

Running this new script, which ends by saving the current mask, we see the following:

luna s.lst 1 sig=EEG -o mask.db < aux/sixth.txt
 ... cont'd ...
 CMD #5: ARTIFACTS
 masked 12 of 887 epochs, altering 12
 ..................................................................
 CMD #6: SIGSTATS
 RMS/Hjorth filtering EEG, threshold +/-3 SDs: removed 21 of 875 epochs of iteration 1
 RMS/Hjorth filtering EEG, threshold +/-3 SDs: removed 9 of 854 epochs of iteration 2
 RMS/Hjorth filtering EEG, threshold +/-3 SDs: removed 5 of 845 epochs of iteration 3
 Overall, masked 35 of 875 epochs (RMS:6, CLP:0, ACT:20, MOB:12, CMP:3)
 ... cont'd ...

That is, the first approach to artifact detection removes 12 epochs; the second approach (which itself comprises three iterations) removes a total of 35 additional epochs. We can extract the mask (output by the DUMP-MASK command)

destrat mask.db +DUMP-MASK -r E -v EPOCH_MASK > mask.txt

This has 888 rows: one header plus the 877 non-wake epochs (i.e. after a RESTRUCTURE command, this is the "total" number of epochs left). Loading this file, plus the previous output from SIGSTATS into R, we can plot the per-epoch distribution of the Hjorth parameters, colored by whether that epoch was excluded by artifact detection (run the following in R):

 d <- read.table("res.txt",header=T,stringsAsFactors=F)
 m <- read.table("mask.txt",header=T,stringsAsFactors=F)
 m <- merge( d , m , by=c("ID","E") ) 

 par(mfcol=c(3,1),mar=c(4,4,1,1)) 
 plot( m$E , m$H1 , pch = 20 , xlab="Epoch" , ylab="H1" , col = m$EPOCH_MASK + 1) 
 plot( m$E , m$H2 , pch = 20 , xlab="Epoch" , ylab="H2" , col = m$EPOCH_MASK + 1) 
 plot( m$E , m$H3 , pch = 20 , xlab="Epoch" , ylab="H3" , col = m$EPOCH_MASK + 1) 

img

In practice, given the heterogeneous nature of sleep signals, naturally it may be preferable to perform stage-specific artifact detection, or only remove very aberrant signals based on global outlier status.

To examine one of the removed epochs more closely (we can use scope as below, also), here we illustrate using commands such as DUMP and MATRIX to obtain simple, plain-text signal data. This extracts the EEG signal only, for the first individual's epoch 220. (This was one of the flagged epochs, as you can see by examining the output of DUMP-MASK above.

luna s.lst 1 sig=EEG -s "EPOCH & MASK epoch=220 & MATRIX file=eeg.txt minimal"

This is a plain-text file with one number (sample-point) per row. We expect 30 times 125 = 3750 rows, for a single 30-sec epoch of signal sampled at 125 Hz. In R:

s <- scan("eeg.txt")
Read 3750 items

We can then generate a plot of this signal, e.g. :

plot( seq(1/125,30,1/125), s, type="l", ylim=c(-125, 125 ) , 
      xlab="Time (sec)" , ylab="EEG" )

img

Info

lunaR provides a far easier means to directly import raw signal data into R, that does not involve intermediate text file; we'll consider this explicitly in the next tutorial section.

Spectral and spindle analyses

To generate the EEG power density spectrum for each individual, say during stage 2 NREM sleep, we can use the PSD command. Specifically, if we had a command file as follows:

EPOCH 
MASK ifnot=NREM2
RESTRUCTURE
FILTER bandpass=0.3,35 ripple=0.02 tw=1
ARTIFACTS mask
SIGSTATS epoch mask threshold=3,3,3
RESTRUCTURE
PSD spectrum

then the following command should epoch the signals, only include NREM2 epochs, filter the EEG and attempt to remove epochs that contain gross artifact, and then calculate the power spectra, both for classical bands (delta, alpha, sigma, etc) and also for each frequency bin (subject to the resolution determined by the sampling frequency and the size of the intervals analyzed. Internally, Luna uses of Welch algorithm and FFT to estimate the PSD.

luna s.lst sig=EEG -o out.db < aux/seventh.txt

Note

This script aux/seventh.txt also contains two extra commands at the end to detect spindles, which we'll describe below.

The power spectra are values implicitly stratified both by channel (CH) and frequency (F), so we use the following command to extract the absolute power values:

destrat out.db +PSD -r F CH -v PSD > res.psd

In R, we can read the res.psd file as follows,

d <- read.table("res.psd",header=T,stringsAsFactors=F)

head(d)
       ID  CH   F        PSD
 1 nsrr01 EEG 0.0  0.9121815
 2 nsrr01 EEG 0.5 25.5634280 
 3 nsrr01 EEG 1.0 38.0639555
 4 nsrr01 EEG 1.5 26.5353447
 5 nsrr01 EEG 2.0 18.0301253 
 6 nsrr01 EEG 2.5 16.9778064

We can ignore very slow and faster frequencies for now:

d <- d[ d$F >= 0.5 & d$F <= 30 , ]

and create 3 plots, on a log-scale, of

par(mfcol=c(1,3))
ids <- unique(d$ID)
for (i in ids) plot( d$F[ d$ID == i ] , log( d$PSD[ d$ID == i ] ) , 
  type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=i) 

img

Especially for the second individual, we see the characteristic peak in the sigma band (11-15Hz), typically representing spindle activity during stage 2 NREM sleep.

As a side-note: naturally, without filtering and artifact detection, especially if including many waking epochs at the start and end of recordings, the power spectra can be very noisy, indicating many sources of (typically non-biological) artifact. For example, this command estimates the PSD without any prior filtering:

luna s.lst @vars.txt sig=EEG1,EEG2 -o psd-all.db \
   -s "EPOCH & MASK if=wake & PSD spectrum max=62"
destrat psd-all.db +PSD -r CH F -v PSD > res.psd-all

In R:

d <- read.table( "res.psd-all",header=T,stringsAsFactors=F)
par(mfrow=c(2,3))
ids <- unique(d$ID)
for (ch in c("EEG1","EEG2") )
 for (i in ids) 
   plot( d$F[ d$ID == i & d$CH == ch ] , log( d$PSD[ d$ID == i & d$CH == ch ] ) , 
            type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=paste(i,ch))

img

Especially for nsrr02, there is clearly a great deal of 50 and 60Hz line noise in the signals. Visual inspection of the signals, especially EEG2 reveals other artifacts (below). Depending on the type of analyses one wishes to perform, careful filtering and/or artifact detection/correction will be important.

Finally, as we noted above, the last line of aux/seventh.txt additionally instructed Luna to detect spindles:

 SPINDLES method=wavelet fc=11,15  ftr-dir=annots/

The fc options specify the targeted frequencies: slower spindles at 11 Hz and faster spindles at 15 Hz. The ftr-dir options request that an annotation file be generated (for each individual) to represent the spindle calls -- these can be used as filters, or with Scope, as we'll see below. To extract the spindle density (count per minute, DENS) we use the following command:

destrat out.db +SPINDLES -r CH -c F -v DENS -p 2 
ID       CH     DENS.F.11   DENS.F.15
nsrr01   EEG    0.80        0.66
nsrr02   EEG    1.44        2.23
nsrr03   EEG    0.84        0.24

Because we added the ftr-dir option, Luna generated some feature lists -- a type of annotation format -- that represent the spindle calls. These can be used as standard annotation files, by Luna and, as we'll see in the next section, by lunaR.

There are a large number of other options and output variables for the SPINDLES and other commands, described in the Reference pages of this web site.