After downloading and successfully installing Luna, the following tutorial provides a good starting point to familiarize oneself with its syntax, commands, features and general philosophy. The first section jumps straight in, to demonstrate a handful of commands quickly. The second section loops back over the same material, but aiming to give more context and detail. A final section extends the range of commands used to some more useful analyses of the sleep EEG.

This tutorial currently assumes a Mac OS X
or Linux operating system. Support for
Windows will be added in the future.

This tutorial, based on tutorials at the National Sleep Research Resource (http://sleepdata.org), involves looking at 3 polysomnograms. These data are recorded as EDFs (containing signal data, including EEG, ECG and EMG), along with their accompanying annotation files (including manual sleep staging, and recorded apneas, hypopnea, movements and other events). Here we will initially summarize the contents of the files, looking at EDF headers, calculating signal statistics and enumerating types of annotated events, following the NSRR tutorials. We will then explore other features of Luna, including masking, filtering and exporting signals, along with other types of statistical summary measure. We continue to show methods for working with sleep EEG, in particular automated artifact detection, spectral and spindle analysis. We also illustrate the scope tool to visualize data and annotations.

1.  Quick Start

1.1  The data

We'll follow the tutorials at the National Sleep Research Resource (http://sleepdata.org) page, which are based around three test EDFs. These data should be downloaded directly from the link below:

  ZIP archive (67Mb) containing 3 EDFs, XML annotation files and 'sample list':

After downloading, move the ZIP file to your working directory, and unzip the contents:

  unzip tutorial.zip 

This should create a folder edfs with six files (learn-nsrr01.edf, learn-nsrr01-profusion.xml, etc) and a sample-list file s.lst which defines a project based on these three files:

cat s.lst

  nsrr01	edfs/learn-nsrr01.edf	edfs/learn-nsrr01-profusion.xml
  nsrr02	edfs/learn-nsrr02.edf	edfs/learn-nsrr02-profusion.xml 
  nsrr03	edfs/learn-nsrr03.edf	edfs/learn-nsrr03-profusion.xml

1.2  Displaying the contents of an EDF file

Follow the instructions here to install Luna and place it in your command path, if you haven't already done so. Luna is invoked via the luna command typed at the prompt. To see if Luna is properly installed, and to test that the EDFs are correctly downloaded, run the following command, which applies the DESC command (to generate a simple description of each EDF) to each EDF specified in the s.lst project, writing some standard text to the console (as shown below) but saving the primary output to a text file res.txt:

luna s.lst -s DESC > res.txt

EDF source: s.lst
Output DB : .
 c1	DESC	

Processing EDF nsrr01 [ #1 ]
 total duration 11:22:00, with last time-point at 11:22:00
 40920 records, each of 1 second(s)
 14 (of 14) signals selected in a standard EDF file:
  SaO2 | PR | EEG(sec) | ECG | EMG | EOG(L) | EOG(R) | EEG

... (cont'd) ...

...processed 3 EDFs, done.
...processed 1 command(s),  all of which passed

As well as summarizing the EDF headers, Luna validates the structure of each file (e.g. whether it is the expected size given the specified signals) and will report errors. Viewing the contents of the res.txt file, for example using a command such as cat or less, or in any text editor, we see the description of each EDF header generated by the DESC command:

cat res.txt

EDF file          : edfs/learn-nsrr01.edf
ID                : nsrr01
Duration          : 11:22:00
Number of signals : 14

EDF file          : edfs/learn-nsrr02.edf
ID                : nsrr02
Duration          : 09:57:30
Number of signals : 14

EDF file          : edfs/learn-nsrr03.edf
ID                : nsrr03
Duration          : 11:22:00
Number of signals : 14

In other words, each EDF contains 14 signals, spanning approximately 10 or 11 hours of sleep. The IDs (nsrr01, nsrr02 and nsrr03) are as specified in the first column of the s.lst file. Although EDF headers contain a field corresponding to Patient ID, this is not used internally by Luna. (Note: the DESC command produces much the same information as is logged on the console when running any Luna command, just in a slightly different format.)

1.3  Signal summary statistics

To begin working with the signals contained within each EDF, here we use the STATS command to generate basic statistics (mean, min, max and standard deviation) per channel, following this NSRR tutorial.

luna s.lst -s STATS

nsrr01	STATS	CH/SaO2	.	MIN	0.10071
nsrr01	STATS	CH/SaO2	.	MAX	99.1196
nsrr01	STATS	CH/SaO2	.	MEAN	76.9242
nsrr01	STATS	CH/SaO2	.	SD	37.4744
nsrr01	STATS	CH/SaO2	.	RMS	85.5665
nsrr01	STATS	CH/PR	.	MIN	0.201419
nsrr01	STATS	CH/PR	.	MAX	200
nsrr01	STATS	CH/PR	.	MEAN	57.3485
nsrr01	STATS	CH/PR	.	SD	30.4955
nsrr01	STATS	CH/PR	.	RMS	64.9523

... (cont'd)

For example, for the first individual nsrr01, the SaO2 channel has a mean of 76.9242 and a standard deviation of 37.4744. Here the output is formatted in a standard way, described below, used by most other Luna commands, but it is quite verbose and not easily readable. In practice, Luna is designed to work with a companion tool, destrat, to record and present Luna output. Here we re-run the command saving results to a database res.db:

luna s.lst -o res.db -s STATS

and then use destrat to extract only the signal means for the ECG, EMG and SaO2 channels:

destrat res.db -x -c CH/ECG,EMG,SaO2 -p 3 -v MEAN

nsrr01	0.009	-6.856	76.924
nsrr02	0.006	-0.610	77.873
nsrr03	0.004	3.014	65.083

More complicated than it needs to be? Perhaps for this simple example, yes. However, the value of using destrat and its so-called stout (stratified output) databases becomes apparent when working with larger and more complex result sets. In practice, results may be stratified by one or more arbitrary factors (e.g. channel, sleep stage, frequency or power band, epoch, event or class of annotation) and reside across multiple output database files. We'll use destrat later in this tutorial to handle these situations, as described below.

Comparing these results to the NSRR tutorial, encouragingly we see similar estimates for these quantities.

1.4  Working with annotations

Parallel to this NSRR tutorial, here we use Luna to summarize the contents of an XML annotation file, which are structured as Compumedics Profusion files, as described here.

We can take a quick look at the types of events in an annotation file with the special --xml command, which displays the time in seconds and duration of each event/annotation, the type of annotation, if present, and then name:

luna --xml edfs/learn-nsrr01-profusion.xml

.       .       EpochLength     30
0 - 30  (30 secs)       SleepStage      Wake
30 - 60 (30 secs)       SleepStage      Wake
60 - 90 (30 secs)       SleepStage      Wake
90 - 120        (30 secs)       SleepStage      Wake
120 - 150       (30 secs)       SleepStage      Wake
150 - 180       (30 secs)       SleepStage      Wake
180 - 210       (30 secs)       SleepStage      Wake
210 - 240       (30 secs)       SleepStage      Wake
240 - 270       (30 secs)       SleepStage      Wake

... (cont'd)

2700 - 2730     (30 secs)       SleepStage      NREM2
2711.8 - 2718   (6.2 secs)      .       Arousal ()      
2720.1 - 2739.3 (19.2 secs)     .       Hypopnea        
2730 - 2760     (30 secs)       SleepStage      NREM2
2747.2 - 2751.5 (4.3 secs)      .       Arousal ()      
2750.1 - 2769.3 (19.2 secs)     .       SpO2 desaturation       
2752.6 - 2777.4 (24.8 secs)     .       Hypopnea        
2760 - 2790     (30 secs)       SleepStage      NREM2
2779 - 2784     (5 secs)        .       Arousal ()      
2782.6 - 2807.4 (24.8 secs)     .       SpO2 desaturation       
2790 - 2820     (30 secs)       SleepStage      NREM2

... (cont'd)

As shown below, the information in these XML files can be directly used to extract or exclude epochs, via the MASK command. (Alternatively, these types of annotations can be specified using simple text files instead of the XML used by Compumedics/NSRR.)

The ANNOTS command summarizes the number and duration (in seconds) of all annotations for an individual recording. We can use this, for example, to count the number of obstructive apneas. Whereas this can be done easily from the output of the --xml command, as shown above, an advantage of the ANNOTS command is that other masks can be applied: for example, counting on apneas that occur during REM sleep.

We send the output to a stout database called annot.db,

luna s.lst -o annot.db -s ANNOTS

and extract it using destrat, noting that the results are stratified by ANN, the annotation name:

destrat annot.db -x -r ANN

If we are only interested in the count of obstructive apnea events, this can be specified on the command line: (note, because the annotation name has a space character in it, you need to use quotes):

destrat annot.db -x -r ANN/"Obstructive Apnea" -v COUNT

ID     ANN      COUNT
nsrr01 Obstructive Apnea       37
nsrr02 Obstructive Apnea       5
nsrr03 Obstructive Apnea       163

To count only apneas that occur during REM sleep, we can epoch the signal (the EPOCH command) and add a mask (the MASK command) that uses the staging information present in the XML. Note how in this example we string together multiple commands, each separated by the & character, with the entire script following -s now in quotes:

luna s.lst -o annot2.db -s "EPOCH & MASK ifnot=REM & ANNOTS"

destrat annot2.db -x -r ANN/"Obstructive Apnea" -v COUNT

ID     ANN       COUNT
nsrr01 Obstructive Apnea       27
nsrr02 Obstructive Apnea       3

(Note that there is no output for the last individual, as they do not have any REM epochs.)

By default, the ANNOT command will include annotations that show any overlap with a REM epoch. To include only events that start during a REM epoch, add the start option as follows:

luna s.lst -o annot3.db -s "EPOCH & MASK ifnot=REM & ANNOTS start"

destrat annot3.db -x -r ANN/"Obstructive Apnea" -v COUNT

ID     ANN       COUNT
nsrr01 Obstructive Apnea       26
nsrr02 Obstructive Apnea       2

1.5  Putting it together: stage-specific signal summaries

We'll end this first section by combining STATS command with the annotation information in the XML, to generate stage-specific signal statistics. We give Luna a multi-part set of commands from a separate file rms.txt in this example, rather than on the command-line following the -s argument. The command file (Attach:rms.txt) introduces a number of new commands (each separated by a newline):

 EPOCH epoch=30
 TAG tag=${stage}/STAGE
 MASK ifnot=${stage} 

First, EPOCH specifies that non-overlapping, 30-second epochs should be applied to each EDF. We then set a TAG, which helps keep track of the output, as we'll see below. The value of the tag here takes a special form: level / factor where factor indicates sleep stage. Rather than being hard-coded in the file, the level is specified as a variable, in the form ${variable}. A specific value for ${stage} is given on the command line when Luna is invoked, as below. The next command masks (that is, excludes) any epoch which does not have an annotation that matches ${stage}. That is, if ${stage} is NREM2, then only NREM2 epochs are included in the analysis. After setting mask values for one or more epochs, the RESTRUCTURE command removes any masked epochs. Finally, the STATS command is invoked, but this time it will consider only epochs that match the specified sleep stage.

The rms.txt script is given as the input to Luna (i.e. using the standard input redirection operator, < ). We specify that the output should go to a database called rms.db, by virtue of the -o option. We set define the value of the stage variable (here as NREM1), which will be expected when processing the TAG and MASK commands. Finally, we set a special variable, signal, which instructs Luna to only consider the first EEG channel (labelled EEG, as indicated by the DESC command):

luna s.lst -o rms.db stage=NREM1 signal=EEG < rms.txt

We now run Luna three more times, for the remaining sleep stages. Note that all commands send their output to the same database (labelled here as rms.db), which will accumulate all output.

luna s.lst -o rms.db stage=NREM2 signal=EEG < rms.txt

luna s.lst -o rms.db stage=NREM3 signal=EEG < rms.txt

luna s.lst -o rms.db stage=REM signal=EEG < rms.txt

To view the contents of rms.db, we run destrat:

destrat rms.db

rms.db: 20 command(s), 3 individual(s), 15 variable(s), 175 values
  command #1:	c1	Wed Aug  2 11:35:28 2017	EPOCH	
  command #2:	c2	Wed Aug  2 11:35:28 2017	TAG	
  command #3:	c3	Wed Aug  2 11:35:28 2017	MASK	
  command #4:	c4	Wed Aug  2 11:35:28 2017	RESTRUCTURE	
 ... cont'd ...
distinct strata group(s):
  STAGE : 4 level(s) :  DUR1 DUR2 NR1 NR2
  CH STAGE : 4 level(s) :  MAX MEAN MIN RMS SD

which lists

  1. the number of commands, individuals and variables in the database,
  2. a list of the commands and their time-stamps, and
  3. the strata groups in this database.

The TAG command specified that a user-defined stratum, called STAGE, be added to the output. This was set to either NREM1, NREM2, etc, corresponding to the sleep stage values in the XML file. There are therefore 4 levels for the factor STAGE. We can view the RMS statistics, which are grouped by channel (CH) and sleep stage (STAGE), as follows:

destrat rms.db -x -r CH -c STAGE -v RMS -p 3

nsrr01	EEG	7.355	10.646	13.250	7.564
nsrr02	EEG	10.362	14.742	20.055	14.146
nsrr03	EEG	12.302	14.497	18.980	NA

The options for destrat are described below more fully. Briefly, the command selects the variable RMS (-v), optionally formats numeric output to 3 decimal places (-p), sets row strata to correspond to channels (-r) and column strata to correspond to stages (-c), and requests to extract the values (-x). We can also output the number of records in the EDF for each stage/individual, as this value is output after any RESTRUCTURE command: DUR2 is the duration in seconds after a given restructuring:

destrat rms.db -c STAGE -x -v DUR2

nsrr01	3270	15690	480	7140
nsrr02	330	11970	5550	3600
nsrr03	1560	11250	630	0

This explains the NA (not available) output for the RMS for the third individual's REM sleep: they did not have any REM sleep that night. If you examine the contents of the log output (i.e. sent straight to the console when running the command, you'll see this is noted here also:)

 set masking mode to 'force'
 no instances of annotation [REM] for nsrr03
 based on REM 0 epochs match;  newly masked 1364 epochs, unmasked 0 and left 0 unchanged
 total of 0 of 1364 retained for analysis

Looking at the results for the stage-specific RMS analysis, even without filtering and artifact detection of this signal, the results seem plausible: we generally see higher RMS, corresponding to more activity due to large amplitude slow waves during NREM3 sleep. We return to this theme when considering spectral analyses of the EEG, below.

2.  Take Two

This is a slightly more technical section, in which we flesh out a little more of the details surrounding some of the commands, formats and functions used above.

2.1  The 'sample-list' file: specifying the project

The sample-list file (s.lst in this tutorial) is a tab-delimited plain-text file, that defines a project, that is, a set of EDFs and annotation files. Sample-lists have one row per individual/EDF, with two or more tab-delimited columns: ID, EDF file path, and optionally, one or more annotation files or folders. In this example, the s.lst file specifies 3 EDFs and their corresponding annotation files (the XML files).

Use a text editor (e.g. TextEdit on Mac, if not emacs or similar), not a word processor, to generate or edit these files. Here the sample list has been automatically generated and you should not need to change it if running the tutorial from the same directory that contains this file. Be aware that spaces will be interpreted as part of the filename -- so make sure not to introduce any superfluous whitespace or special characters, as it will mean that Luna may not be able to locate your data.

A note on file naming conventions. For convenience, here the files are referenced by their relative paths, but which means that you need to run Luna from the folder containing this file. Alternatively, one can use absolute paths in a project file, but they will need to be changed if the data are moved, e.g. to a different machine. Also note that the file separator is assumed to be a / character, which should be the case for most operating systems.

2.2  Luna syntax: command-line arguments

Luna expects the following arguments:

  1. always as the first argument, EITHER a sample list (i.e. s.lst in this example), OR a single EDF file (that ends .edf or .EDF)
  2. one or more commands, either via standard input or specified after the -s argument. If the -s is specified, it must be the final option given. That is, all other arguments are interpreted as Luna commands rather than arguments.
  3. optionally, if the first argument is a sample list, then either one (i) or two (i and j) integers, to specify that only the i'th or the i'th to the j'th rows from the sample-list should be processed
  4. optionally, a series of variable assignments, (e.g. sr=100 or signal=EEG1)
  5. optionally, a parameter file prefixed with an at sign: for example, @param.txt. A parameter file lists variable definitions in a plain-text file, with one variable name/value pair per line, separated by a tab character. This is simply a convenience feature, and is equivalent to entering the same variable=value pairs on the command line.
  6. optionally, a stout output database following the -o argument; if it doesn't exist it will be created, otherwise output will be appended

There are a few exceptions: for example, some special commands (e.g. --xml as illustrated above) do not require a sample list to be specified, and all other options will be ignored.

2.3  Luna commands

Luna commands refer to the names of Luna functions (summarized here), i.e. operations performed on the EDFs. (Note: this is distinct from the arguments refers to the command line options that determine how Luna runs, described in the previous section.)

By default, Luna iterates through all individual EDFs in the sample-list, applying the requested command(s) as. All command names should be UPPERCASE. Multiple commands can be separated by the ampersand character (&). When commands are given via standard input, a newline character also separates commands. Options that follow commands are typically lowercase, and are either single keywords or variable=value pairs. When submitting multiple commands via the command line -s option, you will want to use quotes to stop the & being interpreted as a shell directive. For example, here we run two commands: EPOCH and STATS.

 luna s.lst -s "EPOCH epoch=20 & STATS epoch" 

Note that epoch is also an option for both EPOCH and STATS commands: in the first instance, it determines the length (in seconds) of each epoch; in the second case, it requests that per-epoch statistics are generated.

By default, all commands are assumed to come from standard input, unless the -s is given (as above). If the file command.txt contains only the word DESC, then all four invocations of Luna below are identical:

 cat command.txt | luna s.lst > results.txt 

 luna s.lst < command.txt > results.txt 

 echo "DESC" | luna s.lst > results.txt 

 luna s.lst -s DESC > results.txt 

If you are not familiar with pipes and redirection, the following tutorial (or any other web-search) will be helpful.

2.4  Output format

By default, all output goes to standard out, i.e. the console/terminal by default. When outputting plain text (instead of using a stout database, see below), most Luna commands generate output in a fixed format. (The initial DESC command is in fact an exception, as one of the few commands that generate a simple, 'human-readable' text file.) The standard format comprises 6 tab-delimited columns, with one row per value:

 nsrr01	STATS	CH/SaO2	.	MIN	0.10071
 nsrr01	STATS	CH/SaO2	.	MAX	99.1196
 nsrr01	STATS	CH/SaO2	.	MEAN	76.9242

The 6 columns represent

  1. Individual identifier/ID from the sample-list file
  2. The Luna command that is generating this output
  3. Any stratifying factors that logically organize the output (e.g. by channel, or by sleep stage and frequency range).
  4. Any temporal stratifying factors generated by Luna (e.g. epoch, time-point, event number)
  5. The variable name
  6. The value for that variable (for that individual, for that combination of stratifying factors)

If there are no stratifiers, columns 3 and 4 are set to empty(.). Here the strata are defined by a single factor (CH which indicates the channel). Strata for a single factor are represented as factor/level where level is the name of a particular channel (i.e. one of the 14 for these EDFs). The fifth and sixth columns then give the variable name and value for that person/strata combination. For example, the MEAN value for the SaO2 channel is 76.9242, as given on the third row of the output. This format is verbose but designed to be easily parsed by standard text processing tools: e.g. to see only the MEAN values, using the ubiquitous awk tool:

awk ' $5 == "MEAN" { print $1, $3, $6 } ' res.txt

nsrr01	CH/SaO2	76.9242
nsrr01	CH/PR	57.3485
nsrr01	CH/EEG(sec)	1.18558
nsrr01	CH/ECG	0.00939557
nsrr01	CH/EMG	-6.8557
nsrr01	CH/EOG(L)	0.472551
nsrr01	CH/EOG(R)	0.321447
nsrr01	CH/EEG	-0.301199
nsrr01	CH/AIRFLOW	0.0664418
... (cont'd)

2.5  Using destrat

If the -o argument is given to Luna, instead of the column-based text output format described above, all output is sent to a stout database, which is designed for handling stratified output. The tool destrat is designed to extract information from such databases, and de-stratify it, to produce simply, rectangular tables that can be read into other analysis programs or spreadsheets. For example, consider the following command:

luna s.lst signal=EEG,ECG,EMG -o stout.db -s "EPOCH & STATS epoch"

This generates a database, named stout.db as requested. The destrat program will report on the contents of this file:

destrat stout.db

stout.db: 2 command(s), 3 individual(s), 15 variable(s), 58980 values
  command #1:	c1	Wed Aug  2 15:22:44 2017	EPOCH	
  command #2:	c2	Wed Aug  2 15:22:44 2017	STATS	
distinct strata group(s):
  E CH : 3 level(s) :  MAX MEAN MIN RMS SD

That is, we see that 2 commands were performed, generating output for 3 individuals. Different Luna commands will produce different levels of stratified output, depending on how they are run. By default, the STATS command produces one set of values for each channel. This is reflected in the strata group labelled CH. The 3 levels of this factor are the 3 channels specified in the command (i.e. EEG, ECG and EMG). The five core variables (MIN, MAX, MEAN, SD and RMS ) are equivalent to the results generated at the start of this tutorial using the STATS command: that is, whole-signal statistics.

Adding the epoch optional to the Luna STATS command generates some additional output also:

  1. per-epoch values for these five measures (MIN, MAX, MEAN, SD and RMS) -- these values are stratified by both channel (CH) and epoch (E)
  2. a set of additional whole-signal summaries, based on combining the per-epoch level data and taking the median value -- like the original output, these values are only stratified by channel CH

Specifically, the new variables are

  • the digital and physical min/max values from the EDF header (DMIN, DMAX, PMIN and PMAX)
  • the observed physical min/max values from the signal data ((OMIN, OMAX)
  • the median of the per-epoch mean (MEDIAN.MEAN), SD (MEDIAN.SD) and RMS (MEDIAN.RMS)
  • the unit of measurement from the EDF header (UNIT)

(Note: the goal of the STAT command's epoch option is primarily to generate per-epoch level data, although using the median of per-epoch means will be more robust to outliers compared to the whole-signal mean, which is why both are given.)

The -d option for destrat lists the data-dictionary, which helps to define the variables within:

destrat stout.db -d

stout.db	MIN	STATS	Signal minimum
stout.db	MAX	STATS	Signal maximum
stout.db	MEAN	STATS	Signal mean
stout.db	SD	STATS	Signal SD
stout.db	RMS	STATS	Signal RMS
stout.db	DMIN	STATS	Digital minimum from EDF header
stout.db	DMAX	STATS	Digital maximum from EDF header
stout.db	UNIT	STATS	Physical unit from EDF header
stout.db	PMIN	STATS	Physical minimum from EDF header
stout.db	PMAX	STATS	Physical maximum from EDF header
stout.db	OMIN	STATS	Observed physical minimum
stout.db	OMAX	STATS	Observed physical maximum
stout.db	MEDIAN.MEAN	STATS	Median of per-epoch means
stout.db	MEDIAN.SD	STATS	Median of per-epoch SDs
stout.db	MEDIAN.RMS	STATS	Median of per-epoch RMSs

To extract information on a subset of variables, use the -v option (where multiple variable names can be comma-delimited). The -r option specifies that channels (CH) are listed as separate rows in the output. The -x instructs destrat to output the values (extract) rather than summarize the variables present at that strata.

destrat stout.db -r CH -v RMS -x

nsrr01	ECG	0.266660881869383
nsrr01	EEG	37.801350632511
nsrr01	EMG	13.9700738671353
nsrr02	ECG	0.274496551353303
nsrr02	EEG	41.0442336241345
nsrr02	EMG	19.7064640585027
nsrr03	ECG	0.300029835524692
nsrr03	EEG	54.2707996328254
nsrr03	EMG	18.8981013309648

Running the same command but swapping from rows to columns (i.e. -c instead of -r), we get the following:

destrat stout.db -c CH -v RMS -x

nsrr01	0.266660881869383	37.801350632511	13.9700738671353
nsrr02	0.274496551353303	41.0442336241345	19.7064640585027
nsrr03	0.300029835524692	54.2707996328254	18.8981013309648

By default, all decimal places are given for numeric data; this can be modified with the -p option:

destrat stout.db -c CH -v RMS -x -p 2

nsrr01	0.27	37.80	13.97
nsrr02	0.27	41.04	19.71
nsrr03	0.30	54.27	18.90

The -i option can be used to subset results to one or more individuals: in this case, nsrr02. Also, note here that we are requesting different values -- although we request the same variable (RMS), the strata are defined by channel (CH) and epoch (E). That is, these values are the per-epoch RMS values for each of the three channels. We can use both -r and -c to organize the output: e.g. arrange values for different channels as columns, but epochs as rows, as below. New variable names are created in the form {VAR}.{FAC}.{LVL} where {VAR} is the original variable name, {FAC} is the column-factor (i.e. CH), and {LVL} is the level for that factor (e.g. ECG). Multiple column-stratifiers can be combined in this way, e.g. -c FAC1 FAC2).

destrat stout.db -r E -c CH -v RMS -x -p 2 -i nsrr02

nsrr02	1	0.27	11.98	4.91
nsrr02	2	0.28	15.88	9.91
nsrr02	3	0.27	12.47	22.73
nsrr02	4	0.26	11.35	13.61
nsrr02	5	0.26	12.29	19.06
nsrr02	6	0.27	15.23	23.92
nsrr02	7	0.28	14.47	20.08
nsrr02	8	0.30	14.88	23.88
nsrr02	9	0.30	19.56	22.70

Alternatively, here we run the same command but with both epoch and channel as row-stratifiers, which may be more convenient for some types of downstream analysis.

destrat stout.db -r E CH -v RMS -x -p 2 -i nsrr02

nsrr02	ECG	1	0.27
nsrr02	ECG	2	0.28
nsrr02	ECG	3	0.27
nsrr02	ECG	4	0.26
nsrr02	ECG	5	0.26
nsrr02	ECG	6	0.27
nsrr02	ECG	7	0.28
nsrr02	ECG	8	0.30
nsrr02	ECG	9	0.30
 ... cont'd ...  
nsrr02  ECG     1193    1.08
nsrr02  ECG     1194    1.02
nsrr02  ECG     1195    0.13
nsrr02  EEG     1       11.98
nsrr02  EEG     2       15.88
nsrr02  EEG     3       12.47
  ... cont'd ...  

Finally, destrat can combine results across multiple databases, if these are listed on the command line. For example, if we were to run different channels and individuals separately, recorded as two databases:

The first two individuals for the EEG channel: luna s.lst 1 2 signal=EEG -o p1.db -s STATS

Then the second and third individuals for ECG and EMG:

luna s.lst 2 3 signal=ECG,EMG -o p2.db -s STATS

Finally, all individuals for SaO2:

luna s.lst signal=SaO2 -o p3.db -s STATS

Multiple databases can be specified on the destrat command line, and the output is compiled into a single file:

destrat p1.db p2.db p3.db -v MEAN RMS -x -r CH -p 3

nsrr01	EEG	-0.301	37.801
nsrr01	SaO2	76.924	85.567
nsrr02	EEG	-0.370	41.044
nsrr02	ECG	0.006	0.274
nsrr02	EMG	-0.610	19.706
nsrr02	SaO2	77.873	86.460
nsrr03	ECG	0.004	0.300
nsrr03	EMG	3.014	18.898
nsrr03	SaO2	65.083	77.731

Note: currently, if drawing from multiple databases, only row-based formatting can be requested (i.e. -r and not -c).

Note: if the same variable/individual/strata combination appears more than once (either within a single database, or across multiple databases), then the last encountered will be used. (We can add a flag/warning to indicate if this occurs in the future.)

2.6  Parameter files

As we have seen, Luna can accept variables in a command file. These can be defined on the command line (as variable=value pairs), but they can also be included in a parameter file, which defines these. For example,

 alias	EEG1|EEG
 alias	EEG2|EEG(sec)
 eeg	EEG1,EEG2
 myepoch   10
 arouse	"Arousal ()"
 hyp	Hypopnea
 oa	"Obstructive Apnea"
 desat	"SpO2 desaturation"

All parameter file rows must be exactly two tab-delimited columns. The first contains a variable name, the second contains the value. For example, ${hyp} is defined as a variable for the annotation Hypopnea. This can be useful to keep command scripts generic (i.e applicable to different studies, which may have different wording for a given annotation or signal), by supplying a project-specific parameter file.

Here alias is a reserved keyword that for signals can be used to specify alternate names for the same signal. It expects a pipe-delimited set of values (2 or more) in the second column, indicating that the first value (e.g. EEG1) is the preferred name, but that EEG means that same thing. If a signal called EEG is encountered, it is replaced to EEG1 internally, and in the output.

As a shortcut, the variable ${eeg} is defined here to indicate both EEG channels, comma-delimited and labelled here by their aliases: EEG1,EEG2.

So, if the file cmd1.txt is as follows:

 EPOCH epoch=${epoch}
 MASK ifnot=${oa} 
 STATS signal=${eeg}

then the following command will calculate statistics for both EEG channels, label them as "EEG1" and "EEG2", only for 10-second epochs that contain at least one "Obstructive apnea" event, only for the individual nsrr01:

luna s.lst nsrr01 @vars.txt -o res.db < cmd1.txt

EDF source: s.lst
Output DB : .
 c1	EPOCH	epoch=10
 c2	MASK	ifnot="Obstructive Apnea"
 c4	STATS	signal=EEG1,EEG2

Processing EDF nsrr01 [ #1 ]
 total duration 11:22:00, with last time-point at 11:22:00
 40920 records, each of 1 second(s)
 14 (of 14) signals selected in a standard EDF file:
  SaO2 | PR | EEG(sec) | ECG | EMG | EOG(L) | EOG(R) | EEG
 set epochs, length 10 (step 10), 4092 epochs
 set masking mode to 'force'
 based on Obstructive Apnea 121 epochs match;  newly masked 3971 epochs, unmasked 0 and left 121 unchanged
 total of 121 of 4092 retained for analysis
 restructuring as an EDF+ : keeping 1210 records of 40920, resetting mask
 retaining 121 epochs
 processing EEG1 ...
 processing EEG2 ...

...processed 1 EDFs, done.
...processed 1 command(s),  all of which passed

3.  A deeper dive

In this third section of the tutorial, we use Luna to calculate some typical metrics of interest for sleep studies, including:

  • how basic statistics vary across the night
  • measures of sleep macro-architecture
  • filtering and artifact detection in the EEG
  • spectral analysis using fast Fourier transforms
  • spindle detection using wavelets

3.1  Epoch-level summary statistics

Returning to the STATS command described above, here we use it with the epoch option, in order to generate per-epoch statistics. This section also illustrates working with the text output format.

luna s.lst 2 signal=EMG,ECG -s "EPOCH & STATS epoch" > res.txt

EDF source: s.lst
Output DB : .
signals: ECG
signals: EMG
 c1	EPOCH	
 c2	STATS	epoch=T

Processing EDF nsrr02 [ #2 ]
 total duration 09:57:30, with last time-point at 09:57:30
 35850 records, each of 1 second(s)
 2 (of 14) signals selected in a standard EDF file:
 set epochs, length 30 (step 30), 1195 epochs
 processing ECG ...
 processing EMG ...

...processed 1 EDFs, done.
...processed 1 command(s),  all of which passed

This command applies only to the second EDF in the sample-list (i.e. the 2 argument), and only to the ECG and EMG signals. After defining 30-seond epochs (the default from the EPOCH command), it generates basic statistics per-epoch (the eppch option to the STATS command), as well as entire-signal summaries. Note that 1195 epochs are generated, which corresponds to the stated duration of 9 hours, 57 and a half minutes (i.e. ( 9 * 60 + 57 ) *2 + 1 = 1195).

The first few lines of res.txt are as follows:

 nsrr02  STATS   CH/ECG  .       MIN     -1.2402
 nsrr02  STATS   CH/ECG  .       MAX     1.25
 nsrr02  STATS   CH/ECG  .       MEAN    0.00567068
 nsrr02  STATS   CH/ECG  .       SD      0.274438
 nsrr02  STATS   CH/ECG  .       RMS     0.274497
 nsrr02  STATS   CH/ECG  1       MIN     -0.651961
 nsrr02  STATS   CH/ECG  1       MAX     1.25
 nsrr02  STATS   CH/ECG  1       MEAN    0.00506797
 nsrr02  STATS   CH/ECG  1       SD      0.271021
 nsrr02  STATS   CH/ECG  1       RMS     0.271051
 nsrr02  STATS   CH/ECG  2       MIN     -0.720588
 nsrr02  STATS   CH/ECG  2       MAX     1.25
 nsrr02  STATS   CH/ECG  2       MEAN    0.00471634
 nsrr02  STATS   CH/ECG  2       SD      0.279001
 nsrr02  STATS   CH/ECG  2       RMS     0.279023

That is, the first five lines (for which the 4th column is empty, .) are the whole-signal averages for the ECG. The subsequent lines are the per-epoch values: e.g. the mean is 0.00506797 for the first epoch, 0.00471634 for the second, etc.

This format is designed to be easily parsed and read by other software. For example, to use R to plot the per-epoch RMS values of the ECG, we might use the following (from within R):

d <- read.table( "res.txt" , header=F , sep="\t" , stringsAsFactors=F )

After loading the data, it is convenient to relabel the 6 columns to have sensible names:

names(d) <- c( "ID" , "COMM" , "STRATA" , "T" , "VAR" , "VAL" )

Our goal is to generate a plot of per-epoch RMS for the ECG. We can extract these values by filtering rows where VAR equals "RMS", when T is not ".", and when STRATA is "CH/ECG":

d <- d[ d$STRATA == "CH/ECG" & d$T != "." & d$VAR == "RMS" , ]

We can confirm that this produces the expected number of rows/epochs:


 [1] 1195    6

We need to ensure that the epoch and value fields are treated as numeric data points:

d$T <- as.numeric(d$T)  
d$VAL <- as.numeric(d$VAL)   

As a sanity-check, we can view the structure of the data frame:


 'data.frame':	1364 obs. of  6 variables:
  $ ID    : chr  "nsrr01" "nsrr01" "nsrr01" "nsrr01" ...
  $ COMM  : chr  "STATS" "STATS" "STATS" "STATS" ...
  $ STRATA: chr  "CH/ECG" "CH/ECG" "CH/ECG" "CH/ECG" ...
  $ T     : num  1 2 3 4 5 6 7 8 9 10 ...
  $ VAR   : chr  "RMS" "RMS" "RMS" "RMS" ...
  $ VAL   : num  0.0708 0.0706 0.0748 0.0698 0.0669 ...

To plot these per-epochs values:

plot( d$T , d$VAL , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 )

Clearly there are a number of extreme points towards the end of the recording, presumably indicating bad data after the subject woke for the final time that night. We can confirm this by indicating the manually-annotated sleep stage for each point. Going back to the command line, we can use Luna to extract sleep stages in a format that can be read into R. Note -- this is not the preferred approach (see below), but in this particular instance, we might extract the per-epoch sleep staging as follows, directly from the XML using --xml:

luna --xml edfs/learn-nsrr02-profusion.xml | grep SleepStage | awk -F'\t' ' { print $4 } ' > ss.txt

We then load this sleep stage information into R as follows, which indicates that we have the expected number of epochs:

ss <- scan("ss.txt",what=as.character() )

Read 1195 items

This is the breakdown of epochs by each sleep stage for this individual.


   11   399   185   120   480 

Plotting epochs colored by sleep stage, we indeed confirm that the end aberrant epochs are all wake: (plot not shown).

plot( d$T , d$VAL  , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 , col = as.factor( ss )  )
cols <- sort(unique(as.factor(ss)))
legend("topleft",legend=cols , fill = cols )  

Simply ignoring these final points (which roughly corresponds to looking only up to the 950th epoch in this example), we can generate a clearer plot of how the RMS of the ECG varies across the night in this individual, which tracks with the progression of sleep and wake in a semi-predictable manner:

plot( d$T[ 1:950 ] , d$VAL[ 1:950 ]  , ylab = "RMS(ECG)" , xlab = "Epoch" , pch=20 , col = as.factor( ss )  ) 
legend("bottomleft",legend=cols , fill = cols )  

3.2  Hypnograms and sleep macro-architecture

Given a set of epoch-based staging values, the STAGE command produces a series of statistics that describes different aspects of sleep macro-architecture.

luna s.lst -o stage.db -s "EPOCH & STAGE"

This generate a large number of variables, viewed here with the dictionary option of destrat (-d):

destrat stage.db -d

stage.db	LIGHTS_OUT	STAGE	Lights out time [0,24)
stage.db	SLEEP_ONSET	STAGE	Sleep onset time [0,24)
stage.db	SLEEP_MIDPOINT	STAGE	Sleep mid-point time [0,24)
stage.db	FINAL_WAKE	STAGE	Final wake time [0,24)
stage.db	LIGHTS_ON	STAGE	Lights on time [0,24)
stage.db	NREMC	STAGE	Number of NREM cycles
stage.db	NREMC_MINS	STAGE	Average NREM cycle duration (mins)
stage.db	TIB	STAGE	Time in Bed (hours): LIGHTS_OUT --> LIGHTS_ON
stage.db	TST	STAGE	Total Sleep Time (hours): SLEEP_ONSET --> FINAL_WAKE
stage.db	TPST	STAGE	Total persistent Sleep Time (hours): PERSISTENT_SLEEP_ONSET --> FINAL_WAKE
stage.db	TWT	STAGE	Total Wake Time (hours): all WAKE
stage.db	WASO	STAGE	Wake After Sleep Onset (hours)
stage.db	SLP_LAT	STAGE	Sleep latency
stage.db	PER_SLP_LAT	STAGE	Persistent sleep latency
stage.db	SLP_EFF	STAGE	Sleep efficiency: LIGHTS_OUT --> LIGHTS_ON
stage.db	SLP_MAIN_EFF	STAGE	Sleep maintainence efficiency
stage.db	SLP_EFF2	STAGE	Sleep efficiency: SLEEP_ONSET --> FINAL_WAKE
stage.db	REM_LAT	STAGE	REM latency (from SLEEP_ONSET)
stage.db	PCT_N1	STAGE	Proportion of sleep that is N1
stage.db	PCT_N2	STAGE	Proportion of sleep that is N2
stage.db	PCT_N3	STAGE	Proportion of sleep that is N3
stage.db	PCT_N4	STAGE	Proportion of sleep that is N4
stage.db	PCT_REM	STAGE	Proportion of sleep that is REM
stage.db	MINS_N1	STAGE	Proportion of sleep that is N1
stage.db	MINS_N2	STAGE	Proportion of sleep that is N2
stage.db	MINS_N3	STAGE	Proportion of sleep that is N3
stage.db	MINS_N4	STAGE	Proportion of sleep that is N4
stage.db	MINS_REM	STAGE	Proportion of sleep that is REM
stage.db	NREMC_START	STAGE	NREM cycle start epoch
stage.db	NREMC_NREM_MINS	STAGE	NREM cycle NREM duration (mins)
stage.db	NREMC_REM_MINS	STAGE	NREM cycle REM duration (mins)
stage.db	NREMC_OTHER_MINS	STAGE	NREM cycle other duration (mins)
stage.db	MINS	STAGE	Elapsed time since start of recording (minutes)
stage.db	CLOCK_TIME	STAGE	Clock time (hh:mm:ss)
stage.db	CLOCK_HOURS	STAGE	Clock time [0,24) hours
stage.db	STAGE	STAGE	Sleep stage, string label
stage.db	STAGE_N	STAGE	Sleep stage, numeric encoding
stage.db	E_WAKE	STAGE	Elapsed wake (mins)
stage.db	E_WASO	STAGE	Elapsed WASO (mins)
stage.db	E_SLEEP	STAGE	Elapsed sleep (mins)
stage.db	E_N1	STAGE	Elapsed N1 (mins)
stage.db	E_N2	STAGE	Elapsed N2 (mins)
stage.db	E_N3	STAGE	Elapsed N3 (mins)
stage.db	E_REM	STAGE	Elapsed REM (mins)
stage.db	PCT_E_SLEEP	STAGE	Elapsed sleep (percent of all sleep)
stage.db	PCT_E_N1	STAGE	Elapsed N1 (percent of all N1)
stage.db	PCT_E_N2	STAGE	Elapsed N2 (percent of all N2)
stage.db	PCT_E_N3	STAGE	Elapsed N3 (percent of all N3)
stage.db	PCT_E_REM	STAGE	Elapsed REM (percent of all REM)
stage.db	PERSISTENT_SLEEP	STAGE	Persistent sleep yes/no? (1=Y)
stage.db	CYCLE	STAGE	NREMC number
stage.db	CYCLE_POS_REL	STAGE	Position within NREMC, relative
stage.db	CYCLE_POS_ABS	STAGE	Position within NREMC, absolute (mins)
stage.db	FLANKING_SIM	STAGE	Number of similar epochs w.r.t. stage
stage.db	NEAREST_WAKE	STAGE	Number of epochs until the nearest wake
stage.db	NREM2REM	STAGE	If NREM epoch, number of NREM if next non-NREM is REM
stage.db	NREM2REM_TOTAL	STAGE	If NREM epoch, total number of contiguous NREM if next non-NREM is REM
stage.db	NREM2WAKE	STAGE	If NREM epoch, number of NREM if next non-NREM is WAKE
stage.db	NREM2WAKE_TOTAL	STAGE	If NREM epoch, total number of contiguous NREM if next non-NREM is WAKE
stage.db	N2_WGT	STAGE	Score for descending/ascending N2 epochs (-1 to +1)

There are three different strata groups in the output from STAGE:

destrat stage.db

stage.db: 2 command(s), 3 individual(s), 61 variable(s), 107894 values
  command #1:	c1	Fri Aug  4 07:32:53 2017	EPOCH	
  command #2:	c2	Fri Aug  4 07:32:53 2017	STAGE	
distinct strata group(s):

The first (<default>) group is a set of variables with no stratifiers: that is, simply one value per individual/EDF. These include minutes of N2 sleep (MINS_N2). To view the number of minutes of each sleep stage for the three individuals, and also the number of NREM cycles NREMC, we can use:

destrat stage.db -x -v MINS_N1 MINS_N2 MINS_N3 MINS_REM NREMC

nsrr01	54.5	261.5	8	119	6
nsrr02	5.5	199.5	92.5	60	5
nsrr03	26	187.5	10.5	0	3

To view cycle-level variables (C factor), we can use the following to see the duration (in minutes) and start (in epoch number):

destrat stage.db -r C -x -v NREMC_START NREMC_MINS

nsrr01	1	45	87
nsrr01	2	100.5	184
nsrr01	3	47	451
nsrr01	4	77.5	654
nsrr01	5	60	819
nsrr01	6	50	956
nsrr02	1	72.5	92
nsrr02	2	133	237
nsrr02	3	42	503
nsrr02	4	105.5	675
nsrr02	5	32.5	887
nsrr03	1	69	118
nsrr03	2	74	312
nsrr03	3	137.5	630

To illustrate extracting epoch-level information: here we consider the designated sleep stage (STAGE variable), along with the time (CLOCK_TIME) and a measure of how many nearby epochs had the same stage (FLANKING_SIM). The latter is the minimum number of contiguous epochs, either forwards or backwards in time, that are similar to the index epoch (truncated at the start and end of the recording). In other words, selecting epochs with 3 or more for this variable means that there are at least 3 epochs before and 3 epochs after with a similar stage. This can be used to select periods of sleep that are less likely to contain state transitions, for example.

destrat stage.db -i nsrr01 -r E -x -v STAGE FLANKING_SIM CLOCK_TIME

nsrr01	1	21:58:17	0	Wake
nsrr01	2	21:58:47	1	Wake
nsrr01	3	21:59:17	2	Wake
nsrr01	4	21:59:47	3	Wake
nsrr01	5	22:00:17	4	Wake
nsrr01	6	22:00:47	5	Wake
nsrr01	7	22:01:17	6	Wake
nsrr01	8	22:01:47	7	Wake
nsrr01	9	22:02:17	8	Wake
nsrr01	10	22:02:47	9	Wake
nsrr01	11	22:03:17	10	Wake
 ... cont'd ... 
nsrr01	183	23:29:17	0	NREM1
nsrr01	184	23:29:47	0	NREM2
nsrr01	185	23:30:17	0	Wake
nsrr01	186	23:30:47	0	NREM1
nsrr01	187	23:31:17	0	NREM2
nsrr01	188	23:31:47	1	NREM2
nsrr01	189	23:32:17	2	NREM2
nsrr01	190	23:32:47	3	NREM2
nsrr01	191	23:33:17	4	NREM2
nsrr01	192	23:33:47	5	NREM2
nsrr01	193	23:34:17	6	NREM2
nsrr01	194	23:34:47	7	NREM2
nsrr01	195	23:35:17	8	NREM2
nsrr01	196	23:35:47	9	NREM2
nsrr01	197	23:36:17	10	NREM2
nsrr01	198	23:36:47	10	NREM2
nsrr01	199	23:37:17	9	NREM2
nsrr01	200	23:37:47	8	NREM2
nsrr01	201	23:38:17	7	NREM2
nsrr01	202	23:38:47	6	NREM2
nsrr01	203	23:39:17	5	NREM2
nsrr01	204	23:39:47	4	NREM2
nsrr01	205	23:40:17	3	NREM2
nsrr01	206	23:40:47	2	NREM2
nsrr01	207	23:41:17	1	NREM2
nsrr01	208	23:41:47	0	NREM2
nsrr01	209	23:42:17	0	Wake
nsrr01	210	23:42:47	0	NREM2
 ... cont'd 

3.3  Epoch-level masks

To illustrate applying different epoch-level masks, consider the following (somewhat unrealistic) example: to select epochs that:

  • are in persistent sleep as defined above (i.e. at least 10 minutes of sleep prior)
  • occur between 11pm and 3am
  • are during stage 2 NREM sleep
  • do not contain any apnea or hypopnea events

For the first individual, we can extract the indicator (0 or 1 for N or Y) of persistent sleep: (we ignore the header and print the 3rd column)

 destrat stage.db -i nsrr01  -r E -x -v PERSISTENT_SLEEP  | awk ' NR>1 { print $3 } ' > ps-nsrr01.txt 

which generates a file ps-nsrr01.txt. The following Luna command script, cmd2.txt might then be used:

 MASK none
 FILE-MASK include=ps-^.txt
 MASK hms=23:00:00-03:00:00
 MASK mask-ifnot=NREM2
 MASK mask-if=${oa},${hyp}

This epochs the signals, in 30-second epochs by default. Next, we clear the mask (i.e. include all epochs). We then read a mask from a file (expecting 0 and 1), where 1 means to include that epoch (i.e. do not set the mask) and 0 means do not include (i.e. set the mask). Note that we've used a special character ^ in the script -- this is expanded into the current ID of the EDF (based on the first column of the sample-list), thereby providing a convenient way to use a single command script across multiple individuals.

Next, we use the hms flag to mask any epoch outside of the specified range. Note: you must use 24 hour hh:mm:ss notation, so 11pm is 23:00:00. We then mask any epoch that is not stage 2 NREM sleep. Note: here we used the mask-ifnot rather than ifnot flag: the difference is that the former will not unmask NREM2 epochs that were previously masked. Remember:

  • To mask or set the mask to true means to exclude that epoch.
  • To unmask or set the mask to false mean to include that epoch.

In this case, the

if=AIf an epoch has annotation A, force the mask to be set to true, but otherwise to false
mask-if=AIf an epoch has annotation A, force the mask to be set to true, otherwise leave as is
unmask-if=AIf an epoch has annotation A, force the mask to be set to false, otherwise leave as is
if=A,BIf an epoch has annotation A or B, force the mask to be set to true, but otherwise to false
mask-if=A,BIf an epoch has annotation A or B, mask that epoch, otherwise leave as is
unmask-if=A,BIf an epoch has annotation A or B, unmask that epoch, otherwise leave as is
ifnot=AIf an epoch does not have annotation A, force the mask to be set to true, but otherwise to false
mask-ifnot=AIf an epoch does not have annotation A, mask that epoch, otherwise leave as is
unmask-ifnot=AIf an epoch does not have annotation A, unmask that epoch, otherwise leave as is

That is, the mask- modifier means that epochs are only ever masked (no epoch is unmasked ever). Likewise, the unmask- modifier means that epochs are only ever unmasked (no mask is ever set).

Note: currently, only if (and not ifnot) statements can be given multiple annotations in a comma-delimited list. This will be changed in future. In the meantime, the same effect can always be achieved by using different options and multiple MASK commands. For example, to include only NREM1 and NREM2 epochs in an analysis, instead of

 MASK ifnot=NREM1,NREM2 

(which is not legal), one could write

 MASK all
 MASK unmask-if=NREM1,NREM2 


 MASK none
 MASK unmask-if=NREM1
 MASK unmask-if=NREM2 

which gives the desired results (109 + 523 = 632 epochs retained for analysis)

 set masking mode to 'unmask'
 based on NREM1 109 epochs match;  newly masked 0 epochs, unmasked 109 and left 1255 unchanged
 total of 109 of 1364 retained for analysis
 based on NREM2 523 epochs match;  newly masked 0 epochs, unmasked 523 and left 841 unchanged
 total of 632 of 1364 retained for analysis

Remember, the following would not achieve the desired effects:

 MASK none
 MASK mask-ifnot=NREM1
 MASK mask-ifnot=NREM2 

This combination would

  1. start by including/unmasking all epochs
  2. then, exclude/mask all epochs other than NREM1 (including NREM2)
  3. then, exclude/mask all epochs other than NREM2 (including NREM1)

meaning that all epochs would be masked. Looking at the output will show this:

 reset all 1364 epochs to be included
 set masking mode to 'mask' (default)
 based on NREM1 109 epochs match;  newly masked 1255 epochs, unmasked 0 and left 109 unchanged
 total of 109 of 1364 retained for analysis
 set masking mode to 'mask' (default)
 based on NREM2 523 epochs match;  newly masked 109 epochs, unmasked 0 and left 1255 unchanged
 total of 0 of 1364 retained for analysis

Bottom line: do not combine multiple ifnot masks.

3.4  Manipulating EDFs: filtering, resampling, relabelling, re-referencing and re-writing signal data

The following command script (manip.txt) illustrates the above features:

 RESAMPLE sr=100
 FILTER lower=0.3 upper=35
 REFERENCE ref=EEG1 signal=EEG2 
 WRITE edf-tag=v2 edf-dir=newedfs/ sample-list=new.lst

Using the following Luna command (note use of the parameter file vars.txt to define new labels EEG1 and EEG2):

luna s.lst @vars.txt signal=EEG1,EEG2 < manip.txt

Processing EDF nsrr01 [ #1 ]
 total duration 11:22:00, with last time-point at 11:22:00
 40920 records, each of 1 second(s)
 2 (of 14) signals selected in a standard EDF file:
  EEG2 | EEG1
 CMD #1: mV
 rescaled EEG1 to mV
 rescaled EEG2 to mV
 resampling channel EEG1 from sample rate 125 to 100
 resampling channel EEG2 from sample rate 125 to 100
 filtering channel EEG1
 filtering channel EEG2
 referencing EEG2 with respect to EEG1
 appending newedfs/learn-nsrr01-v2.edf to sample-list new.lst
 saved new EDF, newedfs/learn-nsrr01-v2.edf

This selects only the two EEG signals, based on their aliases EEG1 and EEG2. the mV command rescales any signals in uV and V units to mV, changing the EDF headers appropriately too. The RESEAMPLE command takes an argument sr which is the new sampling rate (i.e. set to 100Hz here). The FILTER command designs and applies a FIR filter with transition frequencies at 0.3 and 35 Hz.

The REFERENCE command re-references the signal data -- in this example, for illustration of the syntax only, we reference EEG2 relative to EEG1 (i.e. subtract EEG1 values from EEG2). Signals must have the same sampling rate to be referenced. Multiple, comma-delimited signals can be given as the ref, in which case the average of those values is used as the reference; if multiple signals are given for the signal command, then each signal is re-referenced with respect to whatever ref was specified.

Finally, the WRITE command generates a new EDF, that will reflect these changes. The edf-tag value is appended to the original filename for the EDF, and it is written to a directory edf-dir (if this doesn't exist, it will be created). A new sample-list new.lst is also generated, with each newly generated EDF being appended to it. (Note, values are only appended to the sample-list, to facilitate running Luna in parallel, so if new.lst already existed and was not empty, you might want to delete that file before running the above.)

We can inspect the new sets of EDFs, using the SUMMARY command:

cat new.lst

nsrr01	newedfs/learn-nsrr01-v2.edf
nsrr02	newedfs/learn-nsrr02-v2.edf
nsrr03	newedfs/learn-nsrr03-v2.edf

luna new.lst nsrr01 -s SUMMARY

Processing EDF nsrr01 [ #1 ]
 total duration 11:22:00, with last time-point at 11:22:00
 40920 records, each of 1 second(s)
 2 (of 2) signals selected in a standard EDF file:
  EEG2 | EEG1
EDF filename   : newedfs/learn-nsrr01-v2.edf
Patient ID     : 
Recording info : 
Start date     : 01.01.85
Start time     : 21.58.17

# signals      : 2
# records      : 40920
Duration       : 1

Signal 1 : [EEG2]
	# samples per record : 100
	transducer type      : 
	physical dimension   : mV
	min/max (phys)       : -0.25117/0.269126
	EDF min/max (phys)   : -0.25117/0.269126
	min/max (digital)    : -32768/32767
	EDF min/max (digital): -32768/32767
	pre-filtering        : 

Signal 2 : [EEG1]
	# samples per record : 100
	transducer type      : 
	physical dimension   : mV
	min/max (phys)       : -0.30708/0.295021
	EDF min/max (phys)   : -0.30708/0.295021
	min/max (digital)    : -32768/32767
	EDF min/max (digital): -32768/32767
	pre-filtering        : 


That is, in relation to the original SUMMARY for this individual, the sampling rate is different, the labels have been reset to their alias values, there are now only two signals, not 14, the units of mV and not uV, and the min/max range is different (reflecting both the change in scale, and the filtering and referencing).

3.5  Artifact detection

Here we detect likely artifact in EEG (and other) signals, using two approaches. The first, specific to sleep EEG, is as described by Buckelmueller et al (2006). The second is based on per-epoch statistical filtering of RMS and Hjorth parameters.

Running just for the first individual, the command file artifact.txt contains the following:

 MASK if=Wake
 FILTER lower=0.3 upper=35

After filtering the signal to consider only sleep epochs, and filtering the signal to remove very slow and fast rhythms, we use the SIGSTATS command to generate Hjorth parameters (and also RMS) for a signal, both overall (i.e. all sleep epochs) and per-epoch (by virtue of the epoch option). We run Luna and save the results in artifact.db, only for a single EEG channel for now:

luna s.lst 1 signal=EEG -o artifact.db < artifact.txt

Viewing the contents of artifact.db:

destrat artifact.db

artifact.db: 5 command(s), 1 individual(s), 15 variable(s), 4450 values
  command #1:	c1	Fri Aug  4 12:12:31 2017	EPOCH	
  command #2:	c2	Fri Aug  4 12:12:31 2017	MASK	
  command #3:	c3	Fri Aug  4 12:12:31 2017	RESTRUCTURE	
  command #4:	c4	Fri Aug  4 12:12:31 2017	FILTER	
  command #5:	c5	Fri Aug  4 12:12:39 2017	SIGSTATS	
distinct strata group(s):
  CH : 1 level(s) :  CLIP H1 H2 H3 RMS
  E CH : 1 level(s) :  CLIP H1 H2 H3 RMS

The SIGSTATS command produces Hjorth parameters (H1, H2 and H3) both overall (CH strata group) and per-epoch (E CH strata group). To see the overall values:

destrat artifact.db -r CH -x -p 2

nsrr01	EEG	0.00	82.58	0.35	0.84	8.75

To save all per-epoch values to a file res.txt:

destrat artifact.db -r CH E -x > res.txt

One can load this file in R, for example, to view the per-epoch distribution (we do this below also).

As well as viewing these statistics, we can use them to flag likely artifacts, by considering values that are statistical outliers in the per-epoch distribution. If we consider a new command script, mask.txt, we add the mask option to the SIGSTATS command, and specify a set of thresholds: 3,3,3 means to iteratively remove epochs that are more than 3 standard deviations above or below the mean. We also add the ARTIFACTS command beforehand, which implements the Buckelmueller et al. filtering referenced above. Note that the SIGSTATS command will only operate on unmasked values (the prior ARTIFACTS command will have masked some).

 MASK if=Wake
 FILTER lower=0.3 upper=35
 SIGSTATS mask threshold=3,3,3

Running this new script (save the above as artifact2.txt, which ends by saving the current mask, we see the following:

luna s.lst 1 signal=EEG -o mask.db < artifact2.txt

 ... cont'd ...
 masked 18 of 887 epochs, altering 18
 RMS/Hjorth filtering EEG, threshold +/-3 SDs: removed 21 of 869 epochs of iteration 1
 RMS/Hjorth filtering EEG, threshold +/-3 SDs: removed 13 of 848 epochs of iteration 2
 RMS/Hjorth filtering EEG, threshold +/-3 SDs: removed 8 of 835 epochs of iteration 3
 Overall, masked 42 of 869 epochs (RMS:9, CLP:0, ACT:26, MOB:13, CMP:3)
 ... cont'd ...

That is, the first approach to artifact detection removes 18 epochs; the second approach (which itself comprises three iterations) removes a total of 42 additional epochs. We can extract the mask (output by the DUMP-MASK command)

destrat mask.db -r E -x -v MASK > mask.txt

This has 888 rows: one header plus the 877 non-wake epochs (i.e. after a RESTRUCTURE command, this is the "total" number of epochs left). Loading this file, plus the previous output from SIGSTATS into R, we can plot the per-epoch distribution of the Hjorth parameters, colored by whether that epoch was excluded by artifact detection:


 d <- read.table("res.txt",header=T,stringsAsFactors=F)
 m <- read.table("mask.txt",header=T,stringsAsFactors=F)
 m <- merge( d , m , by=c("ID","E") ) 

 plot( m$E , m$H1 , pch = 20 , xlab="Epoch" , ylab="H1" , col = m$MASK + 1) 
 plot( m$E , m$H2 , pch = 20 , xlab="Epoch" , ylab="H2" , col = m$MASK + 1) 
 plot( m$E , m$H3 , pch = 20 , xlab="Epoch" , ylab="H3" , col = m$MASK + 1) 

In practice, given the heterogeneous nature of sleep signals, naturally it may be preferable to perform stage-specific artifact detection, or only remove very aberrant signals based on global outlier status.

To examine one of the removed epochs more closely (we can use scope as below, also), here we illustrate using commands such as DUMP and MATRIX to obtain simple, plain-text signal data. This extracts the EEG signal only, for the first individual's epoch 220. (This was one of the flagged epochs, as you can see by examining the output of DUMP-MASK above.

luna s.lst 1 signal=EEG -s "EPOCH & MASK epoch=220 & DUMP minimal" > eeg.txt

This is a plain-text file with one number (sample-point) per row. We expect 30 times 125 = 3750 rows, for a single 30-sec epoch of signal sampled at 125 Hz. In R:

s <- scan("eeg.txt")

Read 3750 items

We can then generate a plot of this signal, e.g. :

plot( seq(1/125,30,1/125), s , type="l" , ylim=c(-125, 125 ) , xlab="Time (sec)" , ylab="EEG" )

In practice, it might be easier to extract all epochs in a format that can be read by R. The MATRIX command can be used here:

luna s.lst 1 signal=EEG -s "EPOCH & MATRIX" > matrix.txt

(Note: there are some very useful and simple R packages to handle EDF files, such as the edfReader R package. One reason you still might want to use Luna and MATRIX or similar commands for this purpose, is that you can output the signals after manipulating them, e.g. resampling, masking, filtering, re-referencing, etc.)

In R:

d <- read.table("matrix.txt",header=T,stringsAsFactors=F)


      ID E     T      EEG
 1 nsrr01 1 0.000 -2.45098
 2 nsrr01 1 0.008 -3.43137 
 3 nsrr01 1 0.016 -3.43137
 4 nsrr01 1 0.024 -3.43137
 5 nsrr01 1 0.032 -2.45098
 6 nsrr01 1 0.040 -5.39216

The signal data is in long format; we can convert to an sample by epoch matrix as follows:

m <- matrix( d$EEG , nrow = 30 * 125 , ncol = max( d$E ) )

i.e. so m has 3750 rows and 1354 columns (epochs). To plot the aberrant epoch 220, but in the context of other flanking epochs, we might run something as follows:

for (i in 218:222) plot(seq(1/125,30,1/125),m[,i], , type="l" , 
      ylim=c(-125,125) , main = paste("Epoch",i),xlab="Time(sec)",ylab="EEG")  

3.6  Spectral and spindle analyses

To generate the EEG power density spectrum for each individual, say during stage 2 NREM sleep, we can use the PSD command. Specifically, if the file psd.txt is as follows:

 MASK ifnot=NREM2
 FILTER lower=0.3 upper=35
 SIGSTATS epoch mask threshold=3,3,3
 PSD spectrum

then the following command should epoch the signals, only include NREM2 epochs, filter the EEG and attempt to remove epochs that contain gross artifact, and then calculate the power spectra, both for classical bands (delta, alpha, sigma, etc) and also for each frequency bin (subject to the resolution determined by the sampling frequency and the size of the intervals analyzed. Internally, Luna uses of Welch algorithm and FFT to estimate the PSD.

luna s.lst signal=EEG -o psd.db < psd.txt

The power spectra are values implicitly stratified both by channel (CH) and frequency (F), so we use the following command to extract the absolute power values:

destrat psd.db -x -r F CH -v PSD > res.psd

In R, we can read the res.psd file as follows,

d <- read.table("res.psd",header=T,stringsAsFactors=F)


       ID  CH   F        PSD
 1 nsrr01 EEG 0.0  0.9121815
 2 nsrr01 EEG 0.5 25.5634280 
 3 nsrr01 EEG 1.0 38.0639555
 4 nsrr01 EEG 1.5 26.5353447
 5 nsrr01 EEG 2.0 18.0301253 
 6 nsrr01 EEG 2.5 16.9778064

We can ignore very slow and faster frequencies for now:

d <- d[ d$F >= 0.5 & d$F <= 30 , ]

and create 3 plots, on a log-scale, of

ids <- unique(d$ID)
for (i in ids) plot( d$F[ d$ID == i ] , log( d$PSD[ d$ID == i ] ) , 
  type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=i)  

Especially for the second individual, we see the characteristic peak in the sigma band (11-15Hz), typically representing spindle activity during stage 2 NREM sleep.

As a side-note: naturally, without filtering and artifact detection, especially if including many waking epochs at the start and end of recordings, the power spectra can be very noisy, indicating many sources of (typically non-biological) artifact. For example, the

luna s.lst @vars.txt signal=EEG1,EEG2 -o psd-all.db -s "EPOCH & MASK if=Wake & PSD spectrum"

destrat psd-all.db -r CH F -v PSD -x > res.psd-all

In R:

d <- read.table( "res.psd-all",header=T,stringsAsFactors=F)
ids <- unique(d$ID)
for (ch in c("EEG1","EEG2") )
 for (i in ids) 
   plot( d$F[ d$ID == i & d$CH == ch ] , log( d$PSD[ d$ID == i & d$CH == ch ] ) , 
            type="l" , xlab="Hz" , ylab="log(uV^2/Hz)" , lwd=2,main=paste(i,ch))  

Especially for nsrr02, there is clearly a great deal of 50 and 60Hz line noise in the signals. Visual inspection of the signals, especially EEG2 reveals other artifacts (below). Depending on the type of analyses one wishes to perform, careful filtering and/or artifact detection/correction will be important.

Finally, we can detect spindles explicitly by adding the SPINDLES command to the end of the psd.txt file.

 SPINDLES method=wavelet fc=11  ftr-dir=annots/  ftr=11hz
 SPINDLES method=wavelet fc=15  ftr-dir=annots/  ftr=15hz

The two fc options specify the targeted frequencies: slower spindles at 11 Hz and faster spindles at 15 Hz. The ftr-dir options request that an annotation file be generated (for each individual) to represent the spindle calls -- these can be used as filters, or with Scope, as we'll see below. After adding the above lines to psd.txt, re-run luna with the extended command file. (Remove the old psd.db first.)

luna s.lst signal=EEG -o psd.db < psd.txt

Then, to extract the spindle density (count per minute, DENS) we use the following command:

destrat psd.db -r CH -c F -v DENS -x -p 2

nsrr01	EEG	0.79	0.66
nsrr02	EEG	1.39	2.22
nsrr03	EEG	0.83	0.20

The number of spindles in total is given by the N variable:

destrat psd.db -r CH -c F -v N -x

ID	CH	N.F.11	N.F.15
nsrr01	EEG	196	164
nsrr02	EEG	255	408
nsrr03	EEG	129	31

Because we added the ftr options, Luna generated some feature lists -- a type of annotation format -- that represent the spindle calls. These can be used as standard annotation files, by Luna and by Scope (as below). They are a special filename convention:


where <indiv_id> and <label> are used to match and name the type of event. If we count the number of rows per file, they match the spindle counts, as expected:

wc -l annots/*

     196 annots/id_nsrr01_feature_SPINDLES-spindles-EEG-11hz.ftr
     164 annots/id_nsrr01_feature_SPINDLES-spindles-EEG-15hz.ftr
     255 annots/id_nsrr02_feature_SPINDLES-spindles-EEG-11hz.ftr
     408 annots/id_nsrr02_feature_SPINDLES-spindles-EEG-15hz.ftr
     129 annots/id_nsrr03_feature_SPINDLES-spindles-EEG-11hz.ftr
      31 annots/id_nsrr03_feature_SPINDLES-spindles-EEG-15hz.ftr

There are a large number of other options and output variables for the SPINDLES and other commands, which will be described elsewhere.

3.7  Visualization with scope?

(documentation to be added)