
Quick Start

After downloading and installing Luna, work through this tutorial to become familiar with Luna. It has four sections, which should be followed in order. This first section jumps straight in, demonstrating a handful of commands quickly. The second section loops back over the same material, aiming to give more context and detail. The third section extends the range of commands towards some more genuinely useful analyses of the sleep EEG. The final section performs the same steps using lunaR instead of lunaC.

Data used in this tutorial

This tutorial, based on tutorials at the National Sleep Research Resource http://sleepdata.org, involves looking at three polysomnograms. Each individual has an EDF (containing signal data, including EEG, ECG and EMG) and also an annotation file, which includes information on manual sleep staging, recorded apneas, hypopneas, movements and other events.

Across these sections, we will:

  • summarize the contents of the EDFs and annotation files
  • calculate signal statistics
  • enumerate types of annotated events
  • manipulate signals by masking and exporting data
  • apply automated artifact detection for sleep EEG
  • apply spectral and spindle analyses for sleep EEG

Obtaining the data

As mentioned, we'll follow the tutorials at the National Sleep Research Resource, which are based on three EDFs.

If you are using the Docker version of Luna, you'll already have the tutorial EDFs pre-installed. If not, you'll need to grab them from the web, from the link below:

ZIP archive (67 MB) containing 3 EDFs, XML annotation files and a 'sample list'
http://zzz.bwh.harvard.edu/dist/luna/tutorial.zip

After downloading, move the ZIP file to your working directory and, using the terminal, unzip the contents:

unzip tutorial.zip 

This should create a folder named edfs with six files (learn-nsrr01.edf, learn-nsrr01-profusion.xml, etc.), a second folder named aux (which contains some text files we'll use later in this tutorial, provided purely to save you copying and pasting or typing things in), and a sample-list file named s.lst, which defines a project for these three individuals:

cat s.lst
nsrr01 edfs/learn-nsrr01.edf    edfs/learn-nsrr01-profusion.xml
nsrr02 edfs/learn-nsrr02.edf    edfs/learn-nsrr02-profusion.xml 
nsrr03 edfs/learn-nsrr03.edf    edfs/learn-nsrr03-profusion.xml
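Each row of the sample list names one individual: an ID in the first column, the EDF path in the second, and the annotation file after that. To make that structure concrete, here is a minimal Python sketch of reading such a file. It is not Luna's parser; the names `Sample` and `read_sample_list` are ours, and it assumes whitespace-separated columns (so paths must not contain spaces) and treats every column after the EDF as an annotation file.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One row of a sample list: an ID, an EDF path and any annotation files."""
    id: str
    edf: str
    annots: list[str] = field(default_factory=list)


def read_sample_list(text: str) -> list[Sample]:
    """Parse sample-list text into Sample records.

    Assumptions for this sketch: columns are separated by whitespace, blank
    lines are skipped, and every column after the EDF path is an annotation file.
    """
    samples = []
    for line in text.splitlines():
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        if len(cols) < 2:
            raise ValueError(f"expected at least ID and EDF columns: {line!r}")
        samples.append(Sample(id=cols[0], edf=cols[1], annots=cols[2:]))
    return samples


s_lst = """\
nsrr01\tedfs/learn-nsrr01.edf\tedfs/learn-nsrr01-profusion.xml
nsrr02\tedfs/learn-nsrr02.edf\tedfs/learn-nsrr02-profusion.xml
nsrr03\tedfs/learn-nsrr03.edf\tedfs/learn-nsrr03-profusion.xml
"""

for s in read_sample_list(s_lst):
    print(s.id, s.edf, s.annots)
```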

Note to Windows users

If you are using a Windows command prompt, you may substitute type for cat and more for less. However, some other command-line tools may not be available: for example, instead of unzip you may need to use the GUI. We do stress that, as a command-line program, lunaC is fundamentally better suited to macOS and Linux platforms. The experience with lunaR should be similar across platforms, however.
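If unzip is not available, one cross-platform alternative (our suggestion, not part of the Luna distribution) is Python's standard zipfile module. A minimal sketch, assuming Python 3 is installed; the function name `extract_tutorial` is ours:

```python
import zipfile


def extract_tutorial(zip_path: str = "tutorial.zip", dest: str = ".") -> list[str]:
    """Extract every member of a ZIP archive into dest; return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(dest)
    return names
```

Calling `extract_tutorial()` from the directory holding tutorial.zip should unpack edfs, aux and s.lst there. The standard library also exposes this from the command line as `python -m zipfile -e tutorial.zip .`, which avoids writing any code.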

Displaying EDF files

To test that Luna is properly installed, and to check that the EDFs downloaded correctly, run the following to apply the DESC command to each EDF specified in the s.lst project:

luna s.lst -s DESC > res.txt

This should write output to a file called res.txt, as well as some information (the log) to the console/terminal, as follows:

===================================================================
+++ luna | v0.2, 17-Mar-2019  |  starting 18-Mar-2019 15:49:28  +++
===================================================================
input(s): s.lst
output  : .
commands: c1    DESC    

___________________________________________________________________
Processing: nsrr01 [ #1 ]
 duration 11:22:00 hrs, last time-point 11:22:00 hrs after start
  40920 records, each of 1 second(s)

 signals: 14 (of 14) selected in a standard EDF file:
  SaO2 | PR | EEG(sec) | ECG | EMG | EOG(L) | EOG(R) | EEG
  AIRFLOW | THOR RES | ABDO RES | POSITION | LIGHT | OX STAT

 annotations:
  [NREM1] 109 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [NREM2] 523 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [NREM3] 16 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [NREM4] 1 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [REM] 238 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [apnea_obstructive] 37 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [arousal] 194 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [artifact_SpO2] 59 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [desat] 254 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [hypopnea] 361 instance(s) (from edfs/learn-nsrr01-profusion.xml)
  [wake] 477 instance(s) (from edfs/learn-nsrr01-profusion.xml)
 ..................................................................
 CMD #1: DESC

... (cont'd) ...

___________________________________________________________________
...processed 3 EDFs, done.
...processed 1 command set(s),  all of which passed
-------------------------------------------------------------------
+++ luna | finishing 18-Mar-2019 15:49:28                       +++
===================================================================

As well as summarizing headers, Luna validates the structure of each EDF, for example, checking whether it is the expected size. Viewing the text file res.txt (e.g. with a text editor, or a command such as cat or less), we see the description generated by the DESC command:

cat res.txt
EDF filename      : edfs/learn-nsrr01.edf
ID                : nsrr01
Clock time        : 21:58:17 - 09:20:17
Duration          : 11:22:00
# signals         : 14
Signals           : SaO2[1] PR[1] EEG(sec)[125] ECG[250] EMG[125] EOG(L)[50] 
                    EOG(R)[50] EEG[125] AIRFLOW[10] THOR RES[10] ABDO RES[10] 
                    POSITION[1] LIGHT[1] OX STAT[1]

EDF filename      : edfs/learn-nsrr02.edf
ID                : nsrr02
Clock time        : 21:18:06 - 07:15:36
Duration          : 09:57:30
# signals         : 14
Signals           : SaO2[1] PR[1] EEG(sec)[125] ECG[250] EMG[125] EOG(L)[50] 
                    EOG(R)[50] EEG[125] AIRFLOW[10] THOR RES[10] ABDO RES[10] 
                    POSITION[1] LIGHT[1] OX STAT[1]

EDF filename      : edfs/learn-nsrr03.edf
ID                : nsrr03
Clock time        : 20:15:00 - 07:37:00
Duration          : 11:22:00
# signals         : 14
Signals           : SaO2[1] PR[1] EEG(sec)[125] ECG[250] EMG[125] EOG(L)[50] 
                    EOG(R)[50] EEG[125] AIRFLOW[10] THOR RES[10] ABDO RES[10] 
                    POSITION[1] LIGHT[1] OX STAT[1]

In other words, each EDF contains 14 signals, and each recording spans approximately 10 to 11 hours. Much of the output of DESC mirrors what is logged to the console when running any Luna command, just in a slightly different format.
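The durations and the size check mentioned above both follow from the EDF header. The duration is the number of records times the seconds per record: for nsrr01, 40920 records × 1 s = 40920 s = 11:22:00, so a 21:58:17 start ends at 09:20:17, as DESC reports. A well-formed EDF should be exactly (header bytes) + (records) × (total samples per record) × 2 bytes, since samples are stored as 16-bit integers. The Python sketch below shows that arithmetic. It follows the EDF specification's layout: a fixed 256-byte header, then 256 bytes per signal stored field by field. It is not Luna's code, and the names `parse_edf_header`, `expected_size`, `clock_span` and `check_edf` are ours.

```python
import os
from datetime import datetime, timedelta


def parse_edf_header(raw: bytes) -> dict:
    """Parse the fixed and per-signal header fields used below (EDF layout)."""
    def field(start: int, width: int) -> str:
        return raw[start:start + width].decode("ascii").strip()

    ns = int(field(252, 4))                   # number of signals
    hdr = {
        "patient_id": field(8, 80),           # the EDF header Patient ID field
        "start_time": field(176, 8),          # "hh.mm.ss"
        "header_bytes": int(field(184, 8)),
        "n_records": int(field(236, 8)),
        "record_dur": float(field(244, 8)),   # seconds per data record
        "ns": ns,
    }
    hdr["labels"] = [field(256 + 16 * i, 16) for i in range(ns)]
    # Samples-per-record follows label (16), transducer (80), five 8-byte
    # fields (physical dim/min/max, digital min/max) and prefilter (80):
    # 216 bytes per signal before it.
    spr_offset = 256 + 216 * ns
    hdr["samples_per_record"] = [int(field(spr_offset + 8 * i, 8)) for i in range(ns)]
    return hdr


def expected_size(hdr: dict) -> int:
    """Header bytes plus 2 bytes per sample (16-bit integers)."""
    return hdr["header_bytes"] + hdr["n_records"] * sum(hdr["samples_per_record"]) * 2


def clock_span(hdr: dict) -> tuple[str, str]:
    """Start and stop clock times implied by start time, records and record length."""
    start = datetime.strptime(hdr["start_time"], "%H.%M.%S")
    stop = start + timedelta(seconds=hdr["n_records"] * hdr["record_dur"])
    return start.strftime("%H:%M:%S"), stop.strftime("%H:%M:%S")


def check_edf(path: str) -> None:
    """Print signal count, clock span and whether the file size is as expected."""
    with open(path, "rb") as f:
        raw = f.read(256)
        ns = int(raw[252:256].decode("ascii").strip())
        raw += f.read(256 * ns)
    hdr = parse_edf_header(raw)
    actual, expected = os.path.getsize(path), expected_size(hdr)
    status = "size OK" if actual == expected else f"size mismatch: {actual} vs {expected}"
    print(path, hdr["ns"], "signals", clock_span(hdr), status)
```

For example, `check_edf("edfs/learn-nsrr01.edf")` should report 14 signals spanning 21:58:17 to 09:20:17, assuming the header is laid out as above.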

Sample list IDs versus EDF header Patient IDs

Note that the IDs (nsrr01, nsrr02 and nsrr03) are those specified in the first column of the s.lst file. Although EDF headers contain a Patient ID field (which can be viewed with the SUMMARY command), Luna does not use it internally: it can be missing, or differ from the ID given in the first column of the sample list. (When running without a sample list, the ID is simply the filename.)

Signal summary statistics

Turning to the signals contained in each EDF, here we use the STATS command to generate basic statistics per channel (mean, median, minimum, maximum, standard deviation and root mean square, RMS), following this NSRR tutorial.

luna s.lst -s STATS
nsrr01  STATS  CH/SaO2  .  MAX     99.1196
nsrr01  STATS  CH/SaO2  .  MIN     0.10071
nsrr01  STATS  CH/SaO2  .  MEAN    76.9242
nsrr01  STATS  CH/SaO2  .  MEDIAN  95.1156
nsrr01  STATS  CH/SaO2  .  SD      37.4744
nsrr01  STATS  CH/SaO2  .  RMS     85.5665
nsrr01  STATS  CH/PR    .  MAX     200
nsrr01  STATS  CH/PR    .  MIN     0.201419
nsrr01  STATS  CH/PR    .  MEAN    57.3485
nsrr01  STATS  CH/PR    .  MEDIAN  67.1916
nsrr01  STATS  CH/PR    .  SD      30.4955
nsrr01  STATS  CH/PR    .  RMS     64.9523
... (cont'd)

For example, for the first individual nsrr01, the SaO2 channel has a mean of 76.9242 and a standard deviation of 37.4744. This output is formatted in a standardized manner, described below, but it is quite verbose and not easily readable. In practice, Luna is designed to work with a companion tool, destrat, which records and presents Luna output in a more structured way. For example, here we re-run the command, except now saving results to a database res.db via the -o flag:

luna s.lst -o res.db -s STATS

and then use destrat to extract, for example, only the signal means for the ECG, EMG and SaO2 channels (don't worry about the details of this command for now):

destrat res.db +STATS -c CH/ECG,EMG,SaO2 -p 3 -v MEAN | behead
                       ID   nsrr01              
              MEAN.CH.ECG   0.009               
              MEAN.CH.EMG   -6.856              
             MEAN.CH.SaO2   76.924              

                       ID   nsrr02              
              MEAN.CH.ECG   0.006               
              MEAN.CH.EMG   -0.610              
             MEAN.CH.SaO2   77.873              

                       ID   nsrr03              
              MEAN.CH.ECG   0.004               
              MEAN.CH.EMG   3.014               
             MEAN.CH.SaO2   65.083              

More complicated than it needs to be? For this simple example, certainly. However, the value of destrat and its stratified output (called lout) databases will become more apparent when working with larger and more complex result sets. That is, in real analyses results may be stratified by multiple factors (such as channel, sleep stage, frequency or power band, epoch, event or class of annotation) and reside across multiple output databases. Later in this tutorial, we'll use destrat to handle these situations.

In any case, these estimates are encouragingly similar to those reported in the NSRR tutorial.
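The STATS quantities are standard summaries, and they are related: if the SD divides by n (the population SD), then RMS² = mean² + SD². The reported nsrr01 values fit this closely. √(76.9242² + 37.4744²) ≈ 85.57 for SaO2 (reported RMS 85.5665), and √(57.3485² + 30.4955²) ≈ 64.95 for PR (reported 64.9523). The Python sketch below computes the same summaries. The function name `channel_stats` is ours, and using the population SD is our assumption about Luna's convention. With tens of thousands of samples per channel, the n versus n − 1 difference is negligible anyway.

```python
import math
import statistics


def channel_stats(x: list[float]) -> dict[str, float]:
    """Per-channel summaries of the kind STATS reports (population SD assumed)."""
    mean = statistics.fmean(x)
    return {
        "MAX": max(x),
        "MIN": min(x),
        "MEAN": mean,
        "MEDIAN": statistics.median(x),
        "SD": statistics.pstdev(x),                            # divides by n
        "RMS": math.sqrt(statistics.fmean(v * v for v in x)),
    }


# RMS^2 = MEAN^2 + SD^2 holds exactly (up to rounding) with the population SD
s = channel_stats([1.0, 2.0, 3.0, 4.0])
assert math.isclose(s["RMS"] ** 2, s["MEAN"] ** 2 + s["SD"] ** 2)

# Applied to the reported nsrr01 SaO2 mean and SD; compare with RMS 85.5665
print(round(math.hypot(76.9242, 37.4744), 3))
```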

Working with annotations

Parallel to this NSRR tutorial, here we use Luna to summarize the contents of an XML annotation file; these files are structured as Compumedics Profusion files (described here).

To take a quick look at the events in an annotation file, the special --xml command displays the time and duration (in seconds) of each event/annotation, along with the type of annotation (if present) and its name:

luna --xml edfs/learn-nsrr01-profusion.xml
.               .               EpochLength     30
0 - 30          (30 secs)       SleepStage      wake
30 - 60         (30 secs)       SleepStage      wake
60 - 90         (30 secs)       SleepStage      wake
90 - 120        (30 secs)       SleepStage      wake
120 - 150       (30 secs)       SleepStage      wake
150 - 180       (30 secs)       SleepStage      wake
180 - 210       (30 secs)       SleepStage      wake
210 - 240       (30 secs)       SleepStage      wake
240 - 270       (30 secs)       SleepStage      wake

... (cont'd)

2700 - 2730     (30 secs)       SleepStage      NREM2
2711.8 - 2718   (6.2 secs)      .               Arousal ()      
2720.1 - 2739.3 (19.2 secs)     .               Hypopnea        
2730 - 2760     (30 secs)       SleepStage      NREM2
2747.2 - 2751.5 (4.3 secs)      .               Arousal ()      
2750.1 - 2769.3 (19.2 secs)     .               SpO2 desaturation       
2752.6 - 2777.4 (24.8 secs)     .               Hypopnea        
2760 - 2790     (30 secs)       SleepStage      NREM2
2779 - 2784     (5 secs)        .               Arousal ()      
2782.6 - 2807.4 (24.8 secs)     .               SpO2 desaturation       
2790 - 2820     (30 secs)       SleepStage      NREM2

... (cont'd)

As shown below, the information in these XML files can be used directly to extract or exclude epochs, via the MASK command. Although the NSRR uses this XML format extensively for annotations, Luna accepts other, simpler formats too, as described here.
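Because the --xml listing has one event per line, simple per-annotation summaries can be computed straight from it. The Python sketch below counts events and totals their durations for each annotation name. It assumes the column layout shown above (start - stop, duration in parentheses, type, name), which we have not checked against Luna's exact output format. The names `EVENT` and `tally_events` are ours.

```python
import re
from collections import defaultdict

# Matches event lines such as "2711.8 - 2718   (6.2 secs)   .   Arousal ()"
EVENT = re.compile(r"^(\S+)\s+-\s+(\S+)\s+\((\S+) secs\)\s+(\S+)\s+(.*?)\s*$")


def tally_events(lines):
    """Count events and sum their durations (seconds) per annotation name."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # skip non-event lines such as the EpochLength header
        dur, name = m.group(3), m.group(5)
        counts[name] += 1
        durations[name] += float(dur)
    return dict(counts), dict(durations)
```

Passing the listing's lines (for example, saved with `luna --xml edfs/learn-nsrr01-profusion.xml > events.txt`) to `tally_events` should give one count and total duration per name, such as NREM2 or Hypopnea.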

The ANNOTS command summarizes the number and duration (in seconds) of the annotations in one or more annotation files associated with an EDF. As an example, here we use it to count the number of obstructive apneas for each individual. Although this could easily be done from the output of the --xml command, one advantage of ANNOTS is that other masks can be applied: for example, to count only apneas occurring during REM sleep. Running ANNOTS and sending output to annot.db:

luna s.lst -o annot.db -s ANNOTS

we can then view a summary of the output generated:

destrat annot.db
--------------------------------------------------------------------------------
annot.db: 1 command(s), 3 individual(s), 5 variable(s), 16227 values
--------------------------------------------------------------------------------
  command #1:   c1  Mon Mar 18 15:53:09 2019    ANNOTS  
--------------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors           : levels        : variables 
----------------:-------------------:---------------:---------------------------
  [ANNOTS]      : ANNOT             : 12 level(s)   : COUNT DUR
                :                   :               : 
  [ANNOTS]      : ANNOT INST        : 12 level(s)   : COUNT DUR
                :                   :               : 
  [ANNOTS]      : ANNOT INST T      : (...)         : START STOP VAL
                :                   :               : 
----------------:-------------------:---------------:---------------------------

See ANNOTS for a description of the output strata from this command. As an example, to get the count (COUNT) of obstructive apnea events (apnea_obstructive level of the ANNOT factor):

destrat annot.db +ANNOTS -r ANNOT/apnea_obstructive -v COUNT
ID      ANNOT                   COUNT
nsrr01  apnea_obstructive       37
nsrr02  apnea_obstructive       5
nsrr03  apnea_obstructive       163

To count only apneas that occur during REM sleep, we can epoch the dataset (using the EPOCH command) and then add a mask (the MASK command) based on the staging information in the XML:

luna s.lst -o annot.db -s "EPOCH & MASK ifnot=REM & ANNOTS"

Specifying multiple commands after -s

Note how in the example above, we string together multiple commands, each separated by the & character. We've also put the entire script following -s in quotes: this stops special characters such as & from being interpreted by the operating system's terminal/shell, which would otherwise treat & as a shell operator rather than passing it to Luna.

Using destrat to extract the COUNT variable once more:

destrat annot.db +ANNOTS -r ANNOT/apnea_obstructive -v COUNT

we obtain the number of events during REM:

ID      ANNOT                   COUNT
nsrr01  apnea_obstructive       27
nsrr02  apnea_obstructive       3

There is no output for the last individual, nsrr03, as this individual did not have any REM epochs.

By default, ANNOTS includes every annotation that overlaps a retained epoch at all (in this example, a REM epoch). To include only events that start during a REM epoch, add the start option:

luna s.lst -o annot.db -s "EPOCH & MASK ifnot=REM & ANNOTS start"

destrat annot.db +ANNOTS -r ANNOT/apnea_obstructive -v COUNT
ID      ANNOT                   COUNT
nsrr01  apnea_obstructive       26
nsrr02  apnea_obstructive       2

Putting it together

We'll end this first section by combining the STATS command with the annotations from the XML, to generate stage-specific summary statistics for the EEG channel. Here, instead of using -s, we'll give Luna a multi-part set of commands in a separate plain-text file (first.txt, in the tutorial's aux folder). The command file below introduces a number of new commands (text after the % character is a comment and is ignored by Luna):

% Set epoch duration
% (anything following '%' is a comment)

EPOCH len=30

% Assign a label ('tag') in the output,
% which will be set to the value of the 'stage' variable

TAG tag=STAGE/${stage}

% Restrict to epochs that match the ${stage} variable
% i.e. MASK them out if they do *not* match

MASK ifnot=${stage}

% Having set a mask above, now actually remove the masked epochs

RESTRUCTURE

% Produce basic statistics on this reduced dataset

STATS sig=EEG

First, EPOCH specifies that non-overlapping, 30-second epochs should be applied to each EDF. We then set a TAG, which helps keep track of the output, as we'll see below. The tag takes the form factor/level, where the factor (STAGE) indicates sleep stage. Rather than being hard-coded in the file, the level is specified as a variable, in the form ${variable}; a specific value for ${stage} is given on the command line when Luna is invoked, as below. The next command masks (that is, excludes) any epoch that does not have an annotation matching ${stage}. That is, if ${stage} is NREM2, then only NREM2 epochs are included in the analysis. After mask values have been set for one or more epochs, the RESTRUCTURE command removes the masked epochs. Finally, the STATS command is invoked, and now considers only epochs that match the specified sleep stage.
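Conceptually, each ${stage} placeholder is replaced by the value supplied on the command line before the commands run. Python's string.Template happens to use the same ${name} syntax, so it can illustrate the expansion. This is an analogy only, not how Luna processes scripts internally, and the names `FIRST_TXT` and `expand` are ours.

```python
from string import Template

# The commands from first.txt (comments omitted), with a ${stage} placeholder
FIRST_TXT = """\
EPOCH len=30
TAG tag=STAGE/${stage}
MASK ifnot=${stage}
RESTRUCTURE
STATS sig=EEG
"""


def expand(script: str, **variables: str) -> str:
    """Replace ${name} placeholders; raises KeyError if a variable is missing."""
    return Template(script).substitute(**variables)


print(expand(FIRST_TXT, stage="NREM2"))
```

With stage=NREM2, the TAG line becomes `TAG tag=STAGE/NREM2` and the MASK line becomes `MASK ifnot=NREM2`, so one script serves every stage.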

The first.txt script is given as the input to Luna (i.e. using the standard input redirection operator, <). We specify that the output should go to a database called out.db, via the -o option. We define the value of the stage variable (here as NREM1), which will be expected when processing the TAG and MASK commands. Finally, we set a special variable, sig, which instructs Luna to consider only the channel labeled EEG (as listed by the DESC command), and not, for example, EEG(sec):

luna s.lst -o out.db stage=NREM1 sig=EEG < aux/first.txt

We now run Luna three more times, for the remaining sleep stages. However, instead of using the -o flag (which always creates a new database), we'll use -a, which appends output to an existing database. In this way, the out.db database will accumulate all output.

luna s.lst -a out.db stage=NREM2 sig=EEG < aux/first.txt
luna s.lst -a out.db stage=NREM3 sig=EEG < aux/first.txt
luna s.lst -a out.db stage=REM sig=EEG  < aux/first.txt
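If you prefer to script these four runs, the same pattern can be driven from Python: -o for the first stage to create a fresh database, then -a to append. This sketch assumes luna is on your PATH and that you run it from the tutorial directory. It uses only the arguments shown above, and by default it just prints the commands. The names `build_runs` and `run_all` are ours.

```python
import subprocess


def build_runs(stages, db="out.db", script="aux/first.txt", sample_list="s.lst"):
    """Return (argv, stdin path) pairs: -o for the first stage, -a after that."""
    runs = []
    for i, stage in enumerate(stages):
        flag = "-o" if i == 0 else "-a"
        argv = ["luna", sample_list, flag, db, f"stage={stage}", "sig=EEG"]
        runs.append((argv, script))
    return runs


def run_all(runs, dry_run=True):
    """Print each command; if dry_run is False, also execute it."""
    for argv, script in runs:
        print(" ".join(argv), "<", script)
        if not dry_run:
            # Equivalent of the shell's "< aux/first.txt" redirection
            with open(script) as stdin:
                subprocess.run(argv, stdin=stdin, check=True)


runs = build_runs(["NREM1", "NREM2", "NREM3", "REM"])
run_all(runs)  # pass dry_run=False to actually invoke luna
```

Using check=True makes a failed run raise an exception instead of silently appending partial results to out.db.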

To view the contents of out.db, we run destrat:

destrat out.db
--------------------------------------------------------------------------------
out.db: 20 command(s), 3 individual(s), 19 variable(s), 222 values
--------------------------------------------------------------------------------
  command #1:   c1  Mon Mar 18 15:54:27 2019    EPOCH   
  command #2:   c2  Mon Mar 18 15:54:27 2019    TAG 
  command #3:   c3  Mon Mar 18 15:54:27 2019    MASK    
  command #4:   c4  Mon Mar 18 15:54:27 2019    RESTRUCTURE 

 ... cont'd ...

--------------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors           : levels        : variables 
----------------:-------------------:---------------:---------------------------
  [EPOCH]       : .                 : 1 level(s)    : DUR INC NE
                :                   :               : 
  [EPOCH]       : STAGE             : 4 level(s)    : DUR INC NE
                :                   :               : 
  [RESTRUCTURE] : STAGE             : 4 level(s)    : DUR1 DUR2 NR1 NR2
                :                   :               : 
  [STATS]       : CH STAGE          : 4 level(s)    : MAX MEAN MEDIAN MIN RMS SD
                :                   :               : 
  [MASK]        : STAGE EPOCH_MASK  : 4 level(s)    : N_MASK_SET N_MASK_UNSET N_MATCHES
                :                   :               : N_RETAINED N_TOTAL N_UNCHANGED
                :                   :               :
----------------:-------------------:---------------:---------------------------

which lists the number of commands, individuals and variables in the database, a list of the commands and their time-stamps, and the strata groups in this database.

The TAG command specified that a user-defined stratum, called STAGE, be added to the output. This was set to NREM1, NREM2, NREM3 or REM, corresponding to the sleep stage values in the XML file, so there are 4 levels for the factor STAGE. We can view the RMS statistics, which are grouped by channel (CH) and sleep stage (STAGE), as follows:

destrat out.db +STATS -r CH -c STAGE -v RMS -p 3
ID      CH   RMS.STAGE.NREM1  RMS.STAGE.NREM2  RMS.STAGE.NREM3  RMS.STAGE.REM
nsrr01  EEG  7.355            10.646           13.250           7.564
nsrr02  EEG  10.362           14.742           20.055           14.146
nsrr03  EEG  12.302           14.497           18.980           NA

The options for destrat are described more fully below. Briefly, here we select the variable RMS (-v) from the STATS command, format numeric output to 3 decimal places (-p), set row strata to correspond to channels (-r) and set column strata to correspond to stages (-c).
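As the raw STATS output earlier showed, results are stored in long format: one row per individual, stratum and variable. The -r and -c options reshape these into a wide table, with one column per level of the column factor and NA where a level has no value. This Python sketch mimics that reshaping for the RMS table above. The rows are typed in from that table, the function name `pivot` is ours, and it is not destrat's code.

```python
from collections import defaultdict

STAGES = ["NREM1", "NREM2", "NREM3", "REM"]

# Long format: (ID, CH, STAGE, RMS), one row per value; nsrr03 has no REM row
rows = [
    ("nsrr01", "EEG", "NREM1", 7.355), ("nsrr01", "EEG", "NREM2", 10.646),
    ("nsrr01", "EEG", "NREM3", 13.250), ("nsrr01", "EEG", "REM", 7.564),
    ("nsrr02", "EEG", "NREM1", 10.362), ("nsrr02", "EEG", "NREM2", 14.742),
    ("nsrr02", "EEG", "NREM3", 20.055), ("nsrr02", "EEG", "REM", 14.146),
    ("nsrr03", "EEG", "NREM1", 12.302), ("nsrr03", "EEG", "NREM2", 14.497),
    ("nsrr03", "EEG", "NREM3", 18.980),
]


def pivot(rows, col_levels, var="RMS", col_factor="STAGE"):
    """One output row per (ID, CH); one column per level of the column factor."""
    cells = defaultdict(dict)
    for id_, ch, level, value in rows:
        cells[(id_, ch)][level] = value
    table = [["ID", "CH"] + [f"{var}.{col_factor}.{lvl}" for lvl in col_levels]]
    for (id_, ch), by_level in cells.items():
        # Like -p 3: three decimal places; NA where the level has no value
        table.append([id_, ch] + [f"{by_level[lvl]:.3f}" if lvl in by_level else "NA"
                                  for lvl in col_levels])
    return table


for line in pivot(rows, STAGES):
    print("\t".join(line))
```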

To view small output files, the behead utility is often useful: pipe the output of destrat into behead as follows:

destrat out.db +STATS -r CH -c STAGE -v RMS -p 3 | behead
                       ID   nsrr01              
                       CH   EEG                 
          RMS.STAGE.NREM1   7.355               
          RMS.STAGE.NREM2   10.646              
          RMS.STAGE.NREM3   13.250              
            RMS.STAGE.REM   7.564               

                       ID   nsrr02              
                       CH   EEG                 
          RMS.STAGE.NREM1   10.362              
          RMS.STAGE.NREM2   14.742              
          RMS.STAGE.NREM3   20.055              
            RMS.STAGE.REM   14.146              

                       ID   nsrr03              
                       CH   EEG                 
          RMS.STAGE.NREM1   12.302              
          RMS.STAGE.NREM2   14.497              
          RMS.STAGE.NREM3   18.980              
            RMS.STAGE.REM   NA                  
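Judging from the output above, behead transposes each row of a tab-delimited table into a block of right-aligned name/value lines, one block per row. The Python approximation below reproduces that behavior. We have not checked behead's exact spacing or options, and the function name is simply borrowed for illustration.

```python
def behead(text: str, width: int = 26) -> str:
    """Turn each data row of a tab-delimited table into 'name   value' lines."""
    lines = [line.split("\t") for line in text.strip().splitlines()]
    header, data = lines[0], lines[1:]
    blocks = []
    for row in data:
        # Right-align column names so the values line up in one column
        blocks.append("\n".join(f"{name:>{width}}   {value}"
                                for name, value in zip(header, row)))
    return "\n\n".join(blocks)


table = "ID\tCH\tRMS.STAGE.NREM1\nnsrr01\tEEG\t7.355\nnsrr02\tEEG\t10.362"
print(behead(table))
```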

As another example of extracting output, we can get the amount of data retained for each stage and individual, as RESTRUCTURE reports this after each restructuring. DUR2 is the duration in seconds after restructuring; because each record here is 1 second long, this also equals the number of records:

destrat out.db +RESTRUCTURE -c STAGE -v DUR2 | behead
                       ID   nsrr01              
         DUR2.STAGE.NREM1   3270                
         DUR2.STAGE.NREM2   15690               
         DUR2.STAGE.NREM3   480                 
           DUR2.STAGE.REM   7140                

                       ID   nsrr02              
         DUR2.STAGE.NREM1   330                 
         DUR2.STAGE.NREM2   11970               
         DUR2.STAGE.NREM3   5550                
           DUR2.STAGE.REM   3600                

                       ID   nsrr03              
         DUR2.STAGE.NREM1   1560                
         DUR2.STAGE.NREM2   11250               
         DUR2.STAGE.NREM3   630                 
           DUR2.STAGE.REM   0                   

This explains the NA (not available) RMS for the third individual's REM sleep: they did not have any sleep scored as REM that night. If you examine the log output (i.e. the text sent straight to the console when running the command), you'll see this is noted there also:

 CMD #3: MASK
 set masking mode to 'force'
 based on REM 0 epochs match;  newly masked 1364 epochs, unmasked 0 and left 0 unchanged
 total of 0 of 1364 retained for analysis
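These numbers cross-check neatly against the earlier DESC log. With 30-second epochs, dividing DUR2 by 30 gives the number of retained epochs. For nsrr01 these equal the instance counts of each stage annotation listed at the start (109 NREM1, 523 NREM2, 16 NREM3 and 238 REM), consistent with each staging annotation covering one 30-second epoch. Likewise, the 1364 epochs masked for nsrr03 span 1364 × 30 = 40920 s, the full 11:22:00 recording. A small Python check of that arithmetic, using values copied from the outputs above:

```python
EPOCH_LEN = 30  # seconds, from EPOCH len=30

# DUR2 (seconds retained after RESTRUCTURE) for nsrr01, from the table above
dur2 = {"NREM1": 3270, "NREM2": 15690, "NREM3": 480, "REM": 7140}
# Instance counts per stage annotation for nsrr01, from the DESC log
annot_counts = {"NREM1": 109, "NREM2": 523, "NREM3": 16, "REM": 238}

for stage, seconds in dur2.items():
    n_epochs, remainder = divmod(seconds, EPOCH_LEN)
    ok = remainder == 0 and n_epochs == annot_counts[stage]
    print(f"{stage}: {seconds} s = {n_epochs} epochs",
          "(matches annotations)" if ok else "(differs from annotations)")

# nsrr03: all 1364 epochs were masked for REM, i.e. the whole 11:22:00 recording
print(1364 * EPOCH_LEN == 11 * 3600 + 22 * 60)
```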

Looking at the results of the stage-specific RMS analysis, even without filtering or artifact detection of this signal, the results seem plausible: we generally see higher RMS during NREM3 sleep, reflecting greater activity due to large-amplitude slow waves. We return to this theme when considering spectral analyses of the EEG, below.