Quick Start

After downloading and installing Luna, work through this tutorial to familiarize yourself with its basic use. There are four sections, which should be followed in order. This first section jumps straight in, quickly demonstrating a handful of commands. The second section loops back over the same material, giving more context and detail. The third section extends the range of commands to some more genuinely useful analyses of the sleep EEG. The final section performs the same steps using lunaR instead of lunaC.

Data used in this tutorial

This tutorial, based on tutorials at the National Sleep Research Resource, involves looking at three polysomnograms. Each individual has an EDF (containing signal data, including EEG, ECG and EMG) and an annotation file, which includes manual sleep staging as well as scored apneas, hypopneas, movements and other events.

Across these three tutorials, we will:

  • summarize the contents of the EDFs and annotation files
  • calculate signal statistics
  • enumerate types of annotated events
  • manipulate signals by masking and exporting data
  • apply automated artifact detection for sleep EEG
  • apply spectral and spindle analyses for sleep EEG

Obtaining the data

As mentioned, we'll follow the tutorials at the National Sleep Research Resource, which are based on three EDFs.

If you are using the Docker version of Luna, you'll already have the tutorial EDFs pre-installed. If not, you'll need to grab them from the web, from the link below:

ZIP archive (67Mb) containing 3 EDFs, XML annotation files and 'sample list'

After downloading, move the ZIP file to your working directory, and using the terminal, unzip the contents:


This should create a folder named edfs with six files (learn-nsrr01.edf, learn-nsrr01-profusion.xml, etc), a second folder named cmd (which contains some text files we'll use later in this tutorial, given here purely to save you copying and pasting or typing things in) and a sample-list file named s.lst which defines a project for these three individuals:

cat s.lst
nsrr01 edfs/learn-nsrr01.edf    edfs/learn-nsrr01-profusion.xml
nsrr02 edfs/learn-nsrr02.edf    edfs/learn-nsrr02-profusion.xml 
nsrr03 edfs/learn-nsrr03.edf    edfs/learn-nsrr03-profusion.xml
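
As a minimal sketch of how such a sample list can be read outside of Luna, the Python below splits each row into an ID, an EDF path and zero or more annotation paths. The function name parse_sample_list is hypothetical, and the sketch assumes whitespace-delimited fields with no spaces inside paths; it is not Luna's own parser.

```python
# The three rows of s.lst, as shown above (tab-delimited here by assumption).
SAMPLE_LIST = """\
nsrr01\tedfs/learn-nsrr01.edf\tedfs/learn-nsrr01-profusion.xml
nsrr02\tedfs/learn-nsrr02.edf\tedfs/learn-nsrr02-profusion.xml
nsrr03\tedfs/learn-nsrr03.edf\tedfs/learn-nsrr03-profusion.xml
"""


def parse_sample_list(text):
    """Return a list of (id, edf_path, [annotation_paths]) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()  # assumes no spaces inside IDs or paths
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"expected at least an ID and an EDF: {line!r}")
        rows.append((fields[0], fields[1], fields[2:]))
    return rows


samples = parse_sample_list(SAMPLE_LIST)
for sample_id, edf, annots in samples:
    print(sample_id, edf, annots)
```

A check like this can be handy before a long run, for example to confirm that every listed file exists.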

Note to Windows users

If you are using a Windows command prompt, you may substitute type for cat and more for less. Some other command-line functions may not be available however. For example, instead of unzip you may need to use the GUI, etc. We do stress that, as a command-line program, lunaC is fundamentally better suited to macOS and Linux platforms. The experience with lunaR should be similar across platforms however.

Using --build to generate new sample lists automatically

If the file s.lst didn't exist, you could use the --build command to generate it. Assuming you are in the folder that contains edfs/:

luna --build edfs > s2.lst
This will generate a sample list, with IDs based on file names (the actual EDFs do not contain IDs):
learn-nsrr01    edfs/learn-nsrr01.edf
learn-nsrr02    edfs/learn-nsrr02.edf
learn-nsrr03    edfs/learn-nsrr03.edf
To automatically match each EDF to the corresponding annotation file (whose name ends with -profusion.xml in place of .edf), we can run:
luna --build edfs -ext=-profusion.xml > s2.lst
Luna lets us know what it has found:
wrote 3 EDFs to the sample list
  3 of which had 1 linked annotation files
The new sample list:
learn-nsrr01   edfs/learn-nsrr01.edf    edfs/learn-nsrr01-profusion.xml
learn-nsrr02   edfs/learn-nsrr02.edf    edfs/learn-nsrr02-profusion.xml
learn-nsrr03   edfs/learn-nsrr03.edf    edfs/learn-nsrr03-profusion.xml
(note: the supplied sample list, s.lst, wasn't built with --build and has the shorter IDs, e.g. nsrr01)
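
The pairing logic can be illustrated with a short Python sketch. This is only a conceptual model of matching by filename, under the assumption that an annotation file shares the EDF's stem; build_sample_list is a hypothetical helper and does not reproduce how --build is actually implemented.

```python
def build_sample_list(filenames, annot_suffix="-profusion.xml"):
    """Pair each .edf with <stem><annot_suffix> if that file is present.

    Returns tab-delimited rows: ID, EDF path, then any matched annotation.
    """
    names = set(filenames)
    rows = []
    for name in sorted(names):
        if not name.lower().endswith(".edf"):
            continue
        stem = name[: -len(".edf")]
        sample_id = stem.rsplit("/", 1)[-1]  # ID taken from the file name
        row = [sample_id, name]
        if annot_suffix is not None and stem + annot_suffix in names:
            row.append(stem + annot_suffix)
        rows.append("\t".join(row))
    return rows


files = [
    "edfs/learn-nsrr01.edf", "edfs/learn-nsrr01-profusion.xml",
    "edfs/learn-nsrr02.edf", "edfs/learn-nsrr02-profusion.xml",
    "edfs/learn-nsrr03.edf", "edfs/learn-nsrr03-profusion.xml",
]
sample_rows = build_sample_list(files)
print("\n".join(sample_rows))
```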

Displaying EDF files

To test that Luna is properly installed, and that the EDFs downloaded correctly, run the following to apply the DESC command to each EDF specified in the s.lst project:

luna s.lst -s DESC > res.txt
This should write output to a file called res.txt, as well as some information (the log) to the console/terminal, as follows:

+++ luna | v0.26.0, 11-Nov-2021 | starting 01-Dec-2021 11:13:58  +++
input(s): s.lst
output  : .
commands: c1    DESC    

Processing: nsrr01 [ #1 ]
 duration: 11.22.00 | 40920 secs ( clocktime 21.58.17 - 09.20.16 )

 signals: 14 (of 14) selected in a standard EDF file:
  SaO2 | PR | EEG_sec_ | ECG | EMG | EOG_L_ | EOG_R_ | EEG

  N1 (x109) | N2 (x523) | N3 (x17) | R (x238)
  W (x477) | apnea/obstructive (x37) | arousal (x194) | artifact/SpO2 (x59)
  desat (x254) | hypopnea (x361)

  airflow=AIRFLOW | ecg=ECG | eeg=EEG_sec_,EEG | effort=THOR_RES,A...
  emg=EMG | eog=EOG_L_,EOG... | hr=PR | id=nsrr01 | light=LIGHT
  oxygen=SaO2,OX_STAT | position=POSITION

... (cont'd) ...

...processed 3 EDFs, done.
...processed 1 command set(s),  all of which passed
+++ luna | finishing 01-Dec-2021 11:13:59                       +++

As well as summarizing headers, Luna validates the structure of each EDF, for example, checking whether it is the expected size. Viewing the text file res.txt (e.g. with a text editor, or a command such as cat or less), we see the description generated by the DESC command:

cat res.txt
EDF filename      : edfs/learn-nsrr01.edf
ID                : nsrr01
Clock time        : 21:58:17 - 09:20:17
Duration          : 11:22:00
# signals         : 14
Signals           : SaO2[1] PR[1] EEG(sec)[125] ECG[250] EMG[125] EOG(L)[50] 
                    EOG(R)[50] EEG[125] AIRFLOW[10] THOR RES[10] ABDO RES[10] 
                    POSITION[1] LIGHT[1] OX STAT[1]

EDF filename      : edfs/learn-nsrr02.edf
ID                : nsrr02
Clock time        : 21:18:06 - 07:15:36
Duration          : 09:57:30
# signals         : 14
Signals           : SaO2[1] PR[1] EEG(sec)[125] ECG[250] EMG[125] EOG(L)[50] 
                    EOG(R)[50] EEG[125] AIRFLOW[10] THOR RES[10] ABDO RES[10] 
                    POSITION[1] LIGHT[1] OX STAT[1]

EDF filename      : edfs/learn-nsrr03.edf
ID                : nsrr03
Clock time        : 20:15:00 - 07:37:00
Duration          : 11:22:00
# signals         : 14
Signals           : SaO2[1] PR[1] EEG(sec)[125] ECG[250] EMG[125] EOG(L)[50] 
                    EOG(R)[50] EEG[125] AIRFLOW[10] THOR RES[10] ABDO RES[10] 
                    POSITION[1] LIGHT[1] OX STAT[1]

In other words, each EDF contains 14 signals, spanning approximately 10 or 11 hours of sleep. Much of the output of DESC mirrors what is logged to the console when running any Luna command, just in a slightly different format.
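
The durations reported by DESC follow directly from the clock times. As a small worked check, the Python sketch below recomputes them, assuming each recording lasts less than 24 hours and crosses midnight at most once; recording_seconds is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def recording_seconds(start, stop, fmt="%H:%M:%S"):
    """Seconds from start to stop, allowing one wrap past midnight."""
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(stop, fmt)
    if t1 <= t0:
        t1 += timedelta(days=1)  # stop time falls on the next day
    return int((t1 - t0).total_seconds())


# Clock times copied from the DESC output above.
clock_times = {
    "nsrr01": ("21:58:17", "09:20:17"),
    "nsrr02": ("21:18:06", "07:15:36"),
    "nsrr03": ("20:15:00", "07:37:00"),
}

durations = {k: recording_seconds(*v) for k, v in clock_times.items()}
for sample_id, secs in durations.items():
    print(sample_id, secs, timedelta(seconds=secs))
```

For nsrr01 this gives 40920 seconds, the figure shown in the console log.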

Sample list IDs versus EDF header Patient IDs

Note that the IDs (nsrr01, nsrr02 and nsrr03) are those specified in the first column of the s.lst file. Although EDF headers contain a Patient ID field (which can be viewed with the SUMMARY command), Luna generally ignores it: it can be missing, or differ from the ID specified in the first column of the sample list. (When running without a sample list, the ID is simply the filename.)

Label remapping and sanitization

The labels for channels and annotations that Luna reports above are actually not the original ones in the EDF/XML files. Rather, they (by default) go through a process of remapping and sanitization to make labels that are more consistent (for NSRR data, at least) and easier to work with.

Data harmonization in the NSRR

To get a sense of why harmonization is important for NSRR studies, see the first couple of sections of this post.

Specifically, Luna (by default) will:

  • change different flavors of NSRR annotation label that mean the same thing (e.g. Arousal (), Arousal|Arousal (STANDARD), etc) to a single term (arousal); this behavior can be turned off with the nsrr-remap=F flag

  • replace all spaces in channel and annotation names with an underscore (e.g. THOR RES to THOR_RES); this behavior can be turned off with the keep-spaces=T flag

  • in addition to the prior point, Luna will also sanitize special characters in labels that are likely to cause problems downstream, such as - or *, replacing these with underscores also; this behavior can be turned off with the sanitize=F flag

Running the same command but with these flags set (and also just running for the first individual in the sample list):

luna s.lst 1 nsrr-remap=F sanitize=F keep-spaces=T -s DESC 
The console now prints the "original" labels:
 signals: 14 (of 14) selected in a standard EDF file:
  SaO2 | PR | EEG(sec) | ECG | EMG | EOG(L) | EOG(R) | EEG

  Arousal () (x194) | Hypopnea (x361) | NREM1 (x109) | NREM2 (x523)
  NREM3 (x16) | NREM4 (x1) | Obstructive Apnea (x37) | REM (x238)
  SpO2 artifact (x59) | SpO2 desaturation (x254) | wake (x477)
i.e. versus Luna's default representation:
 signals: 14 (of 14) selected in a standard EDF file:
  SaO2 | PR | EEG_sec_ | ECG | EMG | EOG_L_ | EOG_R_ | EEG

  N1 (x109) | N2 (x523) | N3 (x17) | R (x238)
  W (x477) | apnea/obstructive (x37) | arousal (x194) | artifact/SpO2 (x59)
  desat (x254) | hypopnea (x361)

For example:

  • Obstructive Apnea becomes apnea/obstructive
  • NREM2 becomes N2
  • THOR RES becomes THOR_RES
  • EOG(L) becomes EOG_L_ (because sanitize is true by default)

Why do we make these changes? Because, similar to R or Matlab, Luna is fundamentally a command-line tool and so uses text input to specify operations. As such, spaces or special characters in labels create many unnecessary problems. For example, commands such as TRANS can perform general arithmetic operations on signals, so an expression such as C3-M2 would be ambiguous. See this FAQ for the rationale for making labels more machine-readable.
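
To make the channel-label rules concrete, here is a rough Python sketch that reproduces the examples above (THOR RES, EOG(L), EEG(sec)). The exact set of characters Luna treats as special is an assumption here (anything other than letters, digits and underscores), and sanitize_label is a hypothetical function; the annotation remapping (e.g. to apnea/obstructive) is a separate lookup that is not modeled.

```python
import re


def sanitize_label(label, keep_spaces=False, sanitize=True):
    """Approximate the default channel-label clean-up described above."""
    if not keep_spaces:
        label = label.replace(" ", "_")
    if sanitize:
        # Assumed rule: replace anything that is not a letter, digit,
        # underscore or (retained) space with an underscore.
        label = re.sub(r"[^A-Za-z0-9_ ]", "_", label)
    return label


for original in ["THOR RES", "EOG(L)", "EEG(sec)", "OX STAT"]:
    print(f"{original!r} -> {sanitize_label(original)!r}")
```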

Signal summary statistics

Turning to the signals contained in each EDF, here we use the STATS command to generate basic statistics (mean, median, min, max and standard deviation) per channel, following this NSRR tutorial.

luna s.lst -s STATS
nsrr01  STATS  CH/SaO2  .  MAX     99.1196
nsrr01  STATS  CH/SaO2  .  MIN     0.10071
nsrr01  STATS  CH/SaO2  .  MEAN    76.9242
nsrr01  STATS  CH/SaO2  .  MEDIAN  95.1156
nsrr01  STATS  CH/SaO2  .  SD      37.4744
nsrr01  STATS  CH/SaO2  .  RMS     85.5665
nsrr01  STATS  CH/PR    .  MAX     200
nsrr01  STATS  CH/PR    .  MIN     0.201419
nsrr01  STATS  CH/PR    .  MEAN    57.3485
nsrr01  STATS  CH/PR    .  MEDIAN  67.1916
nsrr01  STATS  CH/PR    .  SD      30.4955
nsrr01  STATS  CH/PR    .  RMS     64.9523
... (cont'd)

For example, for the first individual nsrr01, the SaO2 channel has a mean of 76.9242 and a standard deviation of 37.4744. This output is formatted in a standardized manner, described below, but it is quite verbose and not easily readable. In practice, Luna is designed to work with a companion tool, destrat, which records and presents Luna output in a more structured way. For example, here we re-run the command, except now saving results to a database res.db via the -o flag:

luna s.lst -o res.db -s STATS

and then use destrat to extract, for example, only the signal means for the ECG, EMG and SaO2 channels (don't worry about the details of this command for now):

destrat res.db +STATS -c CH/ECG,EMG,SaO2 -p 3 -v MEAN | behead
                       ID   nsrr01              
              MEAN.CH.ECG   0.009               
              MEAN.CH.EMG   -6.856              
             MEAN.CH.SaO2   76.924              

                       ID   nsrr02              
              MEAN.CH.ECG   0.006               
              MEAN.CH.EMG   -0.610              
             MEAN.CH.SaO2   77.873              

                       ID   nsrr03              
              MEAN.CH.ECG   0.004               
              MEAN.CH.EMG   3.014               
             MEAN.CH.SaO2   65.083              

More complicated than it needs to be? For this simple example, certainly. However, the value of destrat and its stratified output databases (called lout databases) will become more apparent with larger and more complex result sets. In real analyses, results may be stratified by multiple factors (such as channel, sleep stage, frequency or power band, epoch, event or class of annotation) and reside across multiple output databases. Later in this tutorial, we'll use destrat to handle these situations.

In any case, comparing these results to the NSRR tutorial, encouragingly we see similar estimates for these quantities.
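
As a further sanity check on the long-format STATS rows shown earlier in this section, the Python sketch below parses them and confirms that RMS is consistent with the mean and SD, using the identity RMS² = mean² + variance. The six-field layout is taken from the printed rows; parse_long is a hypothetical helper, and the small residual difference is expected if SD uses an n-1 denominator.

```python
import math

# Rows copied from the STATS output for nsrr01 above.
LONG_OUTPUT = """\
nsrr01  STATS  CH/SaO2  .  MAX     99.1196
nsrr01  STATS  CH/SaO2  .  MIN     0.10071
nsrr01  STATS  CH/SaO2  .  MEAN    76.9242
nsrr01  STATS  CH/SaO2  .  MEDIAN  95.1156
nsrr01  STATS  CH/SaO2  .  SD      37.4744
nsrr01  STATS  CH/SaO2  .  RMS     85.5665
nsrr01  STATS  CH/PR    .  MAX     200
nsrr01  STATS  CH/PR    .  MIN     0.201419
nsrr01  STATS  CH/PR    .  MEAN    57.3485
nsrr01  STATS  CH/PR    .  MEDIAN  67.1916
nsrr01  STATS  CH/PR    .  SD      30.4955
nsrr01  STATS  CH/PR    .  RMS     64.9523
"""


def parse_long(text):
    """Map (id, command, strata, variable) to a float value."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore anything that is not a six-field row
        sample_id, command, strata, _dot, variable, value = fields
        table[(sample_id, command, strata, variable)] = float(value)
    return table


stats = parse_long(LONG_OUTPUT)
implied = {}
for ch in ("CH/SaO2", "CH/PR"):
    mean = stats[("nsrr01", "STATS", ch, "MEAN")]
    sd = stats[("nsrr01", "STATS", ch, "SD")]
    implied[ch] = math.sqrt(mean ** 2 + sd ** 2)
    print(ch, stats[("nsrr01", "STATS", ch, "RMS")], round(implied[ch], 3))
```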

Working with annotations

Parallel to this NSRR tutorial, here we use Luna to summarize the contents of an XML annotation file, which is structured as a Compumedics Profusion file (described here).

To take a quick look at the events in an annotation file, the special --xml command displays the time and duration (in seconds) of each event/annotation, along with the type of annotation (if present) and its name:

luna --xml edfs/learn-nsrr01-profusion.xml
.               .               EpochLength     30
0 - 30          (30 secs)       SleepStage      wake
30 - 60         (30 secs)       SleepStage      wake
60 - 90         (30 secs)       SleepStage      wake
90 - 120        (30 secs)       SleepStage      wake
120 - 150       (30 secs)       SleepStage      wake
150 - 180       (30 secs)       SleepStage      wake
180 - 210       (30 secs)       SleepStage      wake
210 - 240       (30 secs)       SleepStage      wake
240 - 270       (30 secs)       SleepStage      wake

... (cont'd)

2700 - 2730     (30 secs)       SleepStage      NREM2
2711.8 - 2718   (6.2 secs)      .               Arousal ()      
2720.1 - 2739.3 (19.2 secs)     .               Hypopnea        
2730 - 2760     (30 secs)       SleepStage      NREM2
2747.2 - 2751.5 (4.3 secs)      .               Arousal ()      
2750.1 - 2769.3 (19.2 secs)     .               SpO2 desaturation       
2752.6 - 2777.4 (24.8 secs)     .               Hypopnea        
2760 - 2790     (30 secs)       SleepStage      NREM2
2779 - 2784     (5 secs)        .               Arousal ()      
2782.6 - 2807.4 (24.8 secs)     .               SpO2 desaturation       
2790 - 2820     (30 secs)       SleepStage      NREM2

... (cont'd)

As shown below, the information in these XML files can be used directly to extract or exclude epochs, via the MASK command. Although the NSRR uses this XML format extensively for annotations, Luna accepts other, simpler formats too, as described here.
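
If you want to post-process the --xml listing yourself, the following Python sketch parses a few of the rows shown above and tallies the non-stage events. The regular expression assumes the whitespace-separated layout printed here (start - stop, duration in parentheses, type, name); parse_events is a hypothetical helper rather than part of Luna.

```python
import re
from collections import Counter

# Rows copied from the --xml output above.
XML_LISTING = """\
2700 - 2730     (30 secs)       SleepStage      NREM2
2711.8 - 2718   (6.2 secs)      .               Arousal ()
2720.1 - 2739.3 (19.2 secs)     .               Hypopnea
2730 - 2760     (30 secs)       SleepStage      NREM2
2747.2 - 2751.5 (4.3 secs)      .               Arousal ()
2750.1 - 2769.3 (19.2 secs)     .               SpO2 desaturation
2752.6 - 2777.4 (24.8 secs)     .               Hypopnea
2760 - 2790     (30 secs)       SleepStage      NREM2
2779 - 2784     (5 secs)        .               Arousal ()
2782.6 - 2807.4 (24.8 secs)     .               SpO2 desaturation
2790 - 2820     (30 secs)       SleepStage      NREM2
"""

EVENT = re.compile(r"^(\S+) - (\S+)\s+\((\S+) secs\)\s+(\S+)\s+(.*?)\s*$")


def parse_events(text):
    """Return (start, stop, duration, type, name) tuples for matching rows."""
    events = []
    for line in text.splitlines():
        m = EVENT.match(line)
        if m is None:
            continue  # e.g. the EpochLength header row
        start, stop, dur = (float(m.group(i)) for i in (1, 2, 3))
        events.append((start, stop, dur, m.group(4), m.group(5)))
    return events


events = parse_events(XML_LISTING)
# Count events whose type column is '.', i.e. everything but sleep stages.
counts = Counter(e[4] for e in events if e[3] == ".")
print(counts)
```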

The ANNOTS command summarizes the number and duration (in seconds) of the annotations in one or more annotation files associated with an EDF. As an example, here we use it to count the number of obstructive apneas for each individual. Although this can easily be done from the output of the --xml command, one advantage of ANNOTS is that other masks can be applied: for example, to count only apneas occurring during REM sleep. Running ANNOTS and sending output to annot.db:

luna s.lst -o annot.db -s ANNOTS
we can then view a summary of the output generated:

destrat annot.db
annot.db: 1 command(s), 3 individual(s), 5 variable(s), 28345 values
  command #1:   c1  Thu Aug 13 13:23:28 2020    ANNOTS  sig=*
distinct strata group(s):
  commands      : factors           : levels        : variables 
  [ANNOTS]      : ANNOT             : 12 level(s)   : COUNT DUR
                :                   :               : 
  [ANNOTS]      : ANNOT INST        : 6081 level(s) : COUNT DUR
                :                   :               : 
  [ANNOTS]      : ANNOT INST T      : (...)         : START STOP VAL
                :                   :               : 

See ANNOTS for a description of the output strata from this command. As an example, to get the count (COUNT) of obstructive apnea events (apnea/obstructive level of the ANNOT factor):

destrat annot.db +ANNOTS -r ANNOT/apnea/obstructive -v COUNT
ID      ANNOT                   COUNT
nsrr01  apnea/obstructive       37
nsrr02  apnea/obstructive       5
nsrr03  apnea/obstructive       163

To count only apneas that occur during REM sleep, we can epoch the dataset (using the EPOCH command) and then add a mask (the MASK command) based on the staging information in the XML:

luna s.lst -o annot.db -s 'EPOCH & MASK ifnot=R & ANNOTS'

Specifying multiple commands after -s

Note how in the example above, we string together multiple commands, each separated by the & character. We've now put the entire script following -s in quotes; this stops special characters such as & being interpreted incorrectly by the operating system's terminal/shell.
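
When Luna command lines are assembled from a script, the same quoting can be produced programmatically. The Python sketch below uses the standard library's shlex module (shlex.join requires Python 3.8 or later) to build the command shown above for a POSIX shell; it only constructs the string and does not run Luna.

```python
import shlex

# The Luna script is a single argument, even though it contains spaces and '&'.
script = "EPOCH & MASK ifnot=R & ANNOTS"
argv = ["luna", "s.lst", "-o", "annot.db", "-s", script]

# shlex.join quotes only the arguments that need it.
command_line = shlex.join(argv)
print(command_line)

# Splitting the quoted command line recovers the original argument list.
round_trip = shlex.split(command_line)
```

Passing the argument list directly to a process launcher avoids shell quoting altogether; the quoted string form is mainly useful for logging or for writing shell scripts. These quoting rules apply to POSIX shells, not to the Windows command prompt.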

Using destrat to extract the COUNT variable once more:

destrat annot.db +ANNOTS -r ANNOT/apnea/obstructive -v COUNT

we obtain the number of events during REM:

ID      ANNOT                   COUNT
nsrr01  apnea/obstructive       27
nsrr02  apnea/obstructive       3

There is no output for the last individual, nsrr03, who did not have any REM epochs.

In this example, ANNOTS by default includes every annotation that overlaps a REM epoch at all. To include only events that start during a REM epoch, add the start option:

luna s.lst -o annot.db -s 'EPOCH & MASK ifnot=R & ANNOTS start'

destrat annot.db +ANNOTS -r ANNOT/apnea/obstructive -v COUNT
ID      ANNOT                   COUNT
nsrr01  apnea/obstructive       26
nsrr02  apnea/obstructive       2
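
The difference between the two inclusion rules can be sketched in a few lines of Python. The epoch and event times below are hypothetical, chosen only to show events that straddle an epoch boundary, and treating intervals as half-open [start, stop) is an assumption rather than a statement about Luna's internals.

```python
def overlaps(event, epoch):
    """True if two half-open [start, stop) intervals share any time."""
    return event[0] < epoch[1] and epoch[0] < event[1]


def starts_in(event, epoch):
    """True if the event's start time falls inside the epoch."""
    return epoch[0] <= event[0] < epoch[1]


# Hypothetical REM epochs and apnea events (seconds from recording start).
rem_epochs = [(3000.0, 3030.0), (3030.0, 3060.0)]
apneas = [
    (2990.0, 3005.0),  # starts before REM, runs into it
    (3040.0, 3055.0),  # entirely within REM
    (3055.0, 3070.0),  # starts in REM, ends after it
    (3060.0, 3080.0),  # starts exactly as REM ends
]

any_overlap = [e for e in apneas if any(overlaps(e, ep) for ep in rem_epochs)]
start_only = [e for e in apneas if any(starts_in(e, ep) for ep in rem_epochs)]
print(len(any_overlap), len(start_only))
```

Here three events overlap REM but only two start within it, which is the same kind of drop seen in the counts above.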

Putting it together

We'll end this first section by combining the STATS command with the annotations from the XML, to generate stage-specific summary statistics for the EEG channel. Here, instead of using -s, we'll give Luna a multi-part set of commands as a separate plain-text file (first.txt, in the cmd folder of the tutorial). The command file below introduces a number of new commands (text after the % character is a comment and ignored by Luna):

% Set epoch duration
% (anything following '%' is a comment)

EPOCH len=30

% Assign a label ('tag') in the output,
% which will be set to the value of the 'stage' variable

TAG tag=STAGE/${stage}

% Restrict to epochs that match the ${stage} variable
% i.e. MASK them out if they do *not* match

MASK ifnot=${stage}

% Having set a mask above, now actually remove the masked epochs

RESTRUCTURE

% Produce basic statistics on this reduced dataset

STATS

First, EPOCH specifies that non-overlapping, 30-second epochs should be applied to each EDF. We then set a TAG, which helps keep track of the output, as we'll see below. The value of the tag here takes a special form, factor/level, where the factor (STAGE) indicates sleep stage. Rather than being hard-coded in the file, the level is specified as a variable, in the form ${variable}. A specific value for ${stage} is given on the command line when Luna is invoked, as below. The next command masks (that is, excludes) any epoch which does not have an annotation that matches ${stage}. That is, if ${stage} is N2, then only N2 epochs are included in the analysis. After setting mask values for one or more epochs, the RESTRUCTURE command removes any masked epochs. Finally, the STATS command is invoked, but this time it will consider only epochs that match the specified sleep stage.
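
To illustrate the idea of variable substitution (not Luna's actual parser), the Python sketch below strips % comments and expands ${stage} using the standard library's string.Template, which happens to use the same ${name} syntax. The expand function is a hypothetical helper, and only a few lines of the command file are reproduced.

```python
from string import Template

# A few lines in the style of the command file above.
COMMAND_FILE = """\
% Set epoch duration
EPOCH len=30
TAG tag=STAGE/${stage}   % label output by stage
MASK ifnot=${stage}
"""


def expand(script, **variables):
    """Drop '%' comments and blank lines, then substitute ${name} variables."""
    lines = []
    for line in script.splitlines():
        code = line.split("%", 1)[0].strip()  # text after '%' is a comment
        if code:
            # substitute() raises KeyError if a variable was not supplied
            lines.append(Template(code).substitute(variables))
    return lines


expanded = expand(COMMAND_FILE, stage="N2")
print(expanded)
```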

The first.txt script is given as the input to Luna (i.e. using the standard input redirection operator, <). We specify that the output should go to a database called out.db, by virtue of the -o option. We define the value of the stage variable (here as N1), which will be expected when processing the TAG and MASK commands. Finally, we set a special variable, sig, which instructs Luna to only consider the first EEG channel (labeled EEG, as indicated by the DESC command):

luna s.lst -o out.db stage=N1 sig=EEG < cmd/first.txt

We now run Luna three more times, for the remaining sleep stages. However, instead of using the -o flag (which always creates a new database), we'll use -a, which appends output to an existing database. In this way, the out.db database will accumulate all output.

luna s.lst -a out.db stage=N2 sig=EEG < cmd/first.txt
luna s.lst -a out.db stage=N3 sig=EEG < cmd/first.txt
luna s.lst -a out.db stage=R sig=EEG  < cmd/first.txt
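
The four invocations differ only in the stage value and in whether the database is created (-o) or appended to (-a). As a small sketch, the Python below generates the same four command lines; it only builds the strings, so you would still paste them into a terminal or hand them to a process launcher of your choice.

```python
stages = ["N1", "N2", "N3", "R"]

commands = []
for i, stage in enumerate(stages):
    # First run creates out.db; later runs append to it.
    mode = "-o" if i == 0 else "-a"
    commands.append(
        f"luna s.lst {mode} out.db stage={stage} sig=EEG < cmd/first.txt"
    )

for command in commands:
    print(command)
```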

To view the contents of out.db, we run destrat:

destrat out.db
out.db: 20 command(s), 3 individual(s), 19 variable(s), 222 values
  command #1:   c1  Mon Mar 18 15:54:27 2019    EPOCH   
  command #2:   c2  Mon Mar 18 15:54:27 2019    TAG 
  command #3:   c3  Mon Mar 18 15:54:27 2019    MASK    
  command #4:   c4  Mon Mar 18 15:54:27 2019    RESTRUCTURE 

 ... cont'd ...

distinct strata group(s):
  commands      : factors           : levels        : variables 
  [EPOCH]       : .                 : 1 level(s)    : DUR INC NE
                :                   :               : 
  [MASK]        : STAGE EMASK       : 4 level(s)    : N_MASK_SET N_MASK_UNSET 
                :                   :               : N_MATCHES N_RETAINED 
                :                   :               : N_TOTAL  N_UNCHANGED
                :                   :               : 
  [RESTRUCTURE] : STAGE             : 4 level(s)    : DUR1 DUR2 NA NR1 NR2 NS
                :                   :               : 
  [STATS]       : CH STAGE          : 4 level(s)    : MAX MEAN MIN P01 P02 P05 
                :                   :               : P10 P20 P30 P40 P50 P60 
                :                   :               : P70 P80 P90 P95 P98 P99 
                :                   :               : RMS SD SKEW
                :                   :               : 

which lists the number of commands, individuals and variables in the database, a list of the commands and their time-stamps, and the strata groups in this database.

The TAG command specified that a user-defined stratum, called STAGE, be added to the output. This was set to either N1, N2, etc, corresponding to the sleep stage values in the XML file. There are therefore 4 levels for the factor STAGE. We can view the RMS statistics, which are grouped by channel (CH) and sleep stage (STAGE), as follows:

destrat out.db +STATS -r CH -c STAGE -v RMS -p 3
nsrr01 EEG 7.355           10.646          13.250          7.564
nsrr02 EEG 10.362          14.742          20.055          14.146
nsrr03 EEG 12.302          14.497          18.980          NA

The options for destrat are described more fully below. Briefly, here we select the variable RMS (-v) from the [STATS] command, format numeric output to 3 decimal places (-p), set row strata to correspond to channels (-r) and set column strata to correspond to stages (-c).

To view small output files, the behead utility is often useful: pipe the output of destrat into behead as follows:

destrat out.db +STATS -r CH -c STAGE -v RMS -p 3 | behead
                       ID   nsrr01              
                       CH   EEG                 
             RMS.STAGE_N1   7.355               
             RMS.STAGE_N2   10.646              
             RMS.STAGE_N3   13.250              
              RMS.STAGE_R   7.564               

                       ID   nsrr02              
                       CH   EEG                 
             RMS.STAGE_N1   10.362              
             RMS.STAGE_N2   14.742              
             RMS.STAGE_N3   20.055              
              RMS.STAGE_R   14.146              

                       ID   nsrr03              
                       CH   EEG                 
             RMS.STAGE_N1   12.302              
             RMS.STAGE_N2   14.497              
             RMS.STAGE_N3   18.980              
              RMS.STAGE_R   NA                  

As another example of extracting output, we can get the amount of data retained for each stage/individual, which is reported after any RESTRUCTURE command. Here, DUR2 is the duration in seconds after a given restructuring:

destrat out.db +RESTRUCTURE -c STAGE -v DUR2 | behead
                       ID   nsrr01              
            DUR2.STAGE_N1   3270                
            DUR2.STAGE_N2   15690               
            DUR2.STAGE_N3   480                 
             DUR2.STAGE_R   7140                

                       ID   nsrr02              
            DUR2.STAGE_N1   330                 
            DUR2.STAGE_N2   11970               
            DUR2.STAGE_N3   5550                
             DUR2.STAGE_R   3600                

                       ID   nsrr03              
            DUR2.STAGE_N1   1560                
            DUR2.STAGE_N2   11250               
            DUR2.STAGE_N3   630                 
             DUR2.STAGE_R   0                   

This explains the NA (not available) output for the RMS for the third individual's REM sleep: they did not have any sleep scored as REM that night. If you examine the contents of the log output (i.e. sent straight to the console when running the command), you'll see this is noted there also:

 set masking mode to 'force'
 based on REM 0 epochs match;  newly masked 1364 epochs, unmasked 0 and left 0 unchanged
 total of 0 of 1364 retained for analysis
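
Because DUR2 is reported in seconds and the epochs here are 30 seconds long, the values convert directly to epoch counts. The Python sketch below does this for the DUR2 table above; the dictionary layout is just one convenient way to hold the numbers.

```python
EPOCH_SECONDS = 30  # from 'EPOCH len=30' in the command file

# DUR2 values (seconds) copied from the destrat output above.
dur2 = {
    "nsrr01": {"N1": 3270, "N2": 15690, "N3": 480, "R": 7140},
    "nsrr02": {"N1": 330, "N2": 11970, "N3": 5550, "R": 3600},
    "nsrr03": {"N1": 1560, "N2": 11250, "N3": 630, "R": 0},
}

epochs = {}
for sample_id, by_stage in dur2.items():
    epochs[sample_id] = {}
    for stage, secs in by_stage.items():
        count, remainder = divmod(secs, EPOCH_SECONDS)
        if remainder:
            raise ValueError(f"{sample_id} {stage}: not a whole number of epochs")
        epochs[sample_id][stage] = count
        print(sample_id, stage, count, "epochs,", secs / 60, "minutes")

# The 11:22:00 recording for nsrr03 is 40920 seconds in total.
total_epochs_nsrr03 = 40920 // EPOCH_SECONDS
```

The total for nsrr03 comes to 1364 epochs, matching the figure in the log above.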

Looking at the results for the stage-specific RMS analysis, even without filtering and artifact detection of this signal, the results seem plausible: RMS is highest during N3 sleep in all three individuals, consistent with more activity due to large-amplitude slow waves. We return to this theme when considering spectral analyses of the EEG, below.
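
To check this kind of pattern programmatically, the Python sketch below parses the behead-style output shown earlier and reports the stage with the largest RMS for each individual, skipping NA values. The parsing assumes two whitespace-separated fields per line and blank lines between individuals; parse_behead and peak_rms_stage are hypothetical helper names.

```python
# Output copied from the 'destrat ... | behead' example above.
BEHEAD = """\
                       ID   nsrr01
                       CH   EEG
             RMS.STAGE_N1   7.355
             RMS.STAGE_N2   10.646
             RMS.STAGE_N3   13.250
              RMS.STAGE_R   7.564

                       ID   nsrr02
                       CH   EEG
             RMS.STAGE_N1   10.362
             RMS.STAGE_N2   14.742
             RMS.STAGE_N3   20.055
              RMS.STAGE_R   14.146

                       ID   nsrr03
                       CH   EEG
             RMS.STAGE_N1   12.302
             RMS.STAGE_N2   14.497
             RMS.STAGE_N3   18.980
              RMS.STAGE_R   NA
"""


def parse_behead(text):
    """Return one dict per blank-line-separated block of 'key value' rows."""
    records, current = [], {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            if current:
                records.append(current)
                current = {}
            continue
        current[fields[0]] = fields[1]
    if current:
        records.append(current)
    return records


def peak_rms_stage(record, prefix="RMS.STAGE_"):
    """Stage label with the largest non-missing RMS value."""
    values = {k[len(prefix):]: float(v) for k, v in record.items()
              if k.startswith(prefix) and v != "NA"}
    return max(values, key=values.get)


peaks = {r["ID"]: peak_rms_stage(r) for r in parse_behead(BEHEAD)}
print(peaks)
```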
