Take Two
This is a slightly more technical section, in which we flesh out a little more of the details surrounding some of the commands, formats and functions used above.
Sample-lists
The sample-list file (s.lst
in this tutorial) is a tab-delimited
plain-text file, that
defines a project, that is, a set of EDFs and annotation files.
Sample-lists have one row per individual/EDF, with two or more
tab-delimited columns: ID, EDF file path, and optionally, one or more
annotation files or folders. In this example, the s.lst
file
specifies 3 EDFs and their corresponding annotation files (the XML
files).
Use a text editor (e.g. TextEdit on Mac, if not emacs or similar), not a word processor, to generate or edit these files. Here the sample list has been automatically generated and you should not need to change it if running the tutorial from the same directory that contains this file. Be aware that spaces will be interpreted as part of the filename -- so make sure not to introduce any superfluous whitespace or special characters, as it will mean that Luna may not be able to locate your data.
A note on file naming conventions. For convenience, here the files are referenced by their relative paths, but which means that you need to run Luna from the folder containing this file. If you want to run Luna from another folder, you can set the search path.
Note
We've used the /
character to indicate folders here, which is the standard for Linux and Mac OS.
Luna syntax
Luna expects the following arguments:
-
always as the first argument, EITHER a sample list (i.e.
s.lst
in this example), OR a single EDF file (that ends.edf
or.EDF
) -
one or more commands, either via standard input or specified after the
-s
argument. If the-s
is specified, it must be the final option given. That is, all other arguments are interpreted as Luna commands rather than arguments. -
optionally, if the first argument is a sample list, then either one (i) or two (i and j) integers, to specify that only the i'th or the i'th to the j'th rows from the sample-list should be processed
-
optionally, a series of variable assignments, (e.g.
sr=100
orsig=EEG1
) -
optionally, a parameter file prefixed with an at sign: for example,
@param.txt
. A parameter file lists variable definitions in a plain-text file, with one variable name/value pair per line, separated by a tab character. This is simply a convenience feature, and is equivalent to entering the samevariable=value
pairs on the command line. -
optionally, a lunout output database following the
-o
( or-a
) argument. If it doesn't exist it will be created; with-o
it will overwrite any existing data, whereas-a
appends on any existing data.
There are a few exceptions: for example, some special commands
(e.g. --xml
as illustrated above) do not require a sample list to be
specified.
Luna commands
Luna commands refer to a set of functions (summarized here), which are primarily operations performed on the EDFs. Here we are using the term command in a specific sense, as distinct from the arguments added on the command line when running Luna, which were described in the previous section.
By default, Luna iterates through all individual EDFs in the
sample-list, applying the requested command(s). Commands can be
specified either via the command-line (-s
option) or from a separate
file, directed into the standard input of the Luna program (with the
<
shell operation).
All command names should be UPPERCASE. Multiple commands can be
separated by the ampersand character (&
). When commands are given via standard
input, a newline character also separates commands.
Options that follow commands are typically lowercase, and are either
single keywords or variable=value
pairs. When submitting multiple
commands via the command line -s
option, you will want to use quotes
to stop the &
being interpreted as a shell directive. For example,
here we run two commands: EPOCH
and STATS
.
luna s.lst -s 'EPOCH len=20 & STATS epoch' > res.txt
By default, all commands are assumed to come from standard input,
unless the -s
is given (as above). If the file command.txt
contains only the word DESC
, then all four invocations of Luna
below are identical (although the first one is preferred):
luna s.lst < command.txt > results.txt
cat command.txt | luna s.lst > results.txt
echo "DESC" | luna s.lst > results.txt
luna s.lst -s DESC > results.txt
If you are not familiar with pipes and redirection, the following tutorial (or any other web-search) will be helpful.
As a more advanced motivating example, it is also possible to prepend or append additional commands: for example, say command.txt
is expecting
signals with linked-mastoid referencing but the EDF does not contain these. Rather than edit
commands.txt
or use an intermediate file, you could perform this on-the-fly:
echo "REFERENCE sig=${eeg} ref=M1,M2" | cat - command.txt | luna s.lst > results.txt
cat - command.txt
will join first the new reference command (-
meaning standard input, which is piped into cat
from echo
)
and then the original file command.txt
, and pass both to luna
which will see it as a single stream of inputs.
Output format
By default, all output goes to standard out, i.e. the
console/terminal by default. When outputting plain text (instead of
using a lunout database, see below), most Luna commands generate
output in a fixed format. (The initial DESC
command is in fact an
exception, as one of the few commands that generate a simple,
"human-readable" text file.) The standard format comprises 6
tab-delimited columns, with one row per value:
nsrr01 STATS CH/SaO2 . MIN 0.10071
nsrr01 STATS CH/SaO2 . MAX 99.1196
nsrr01 STATS CH/SaO2 . MEAN 76.9242
- individual identifier/ID from the sample-list file
- the Luna command that is generating this output
- any stratifying factors that logically organize the output (e.g. by channel, or by sleep stage and frequency range).
- any temporal stratifying factors generated by Luna (e.g. epoch, time-point, event number)
- the variable name
- the value for that variable (for that individual, for that combination of stratifying factors)
If there are no stratifiers, columns 3 and 4 are set to empty(.
).
Here the strata are defined by a single factor (CH
which
indicates the channel). Strata for a single factor are represented as
factor/level
where level
is the name of a particular channel
(i.e. one of the 14 for these EDFs). The fifth and sixth columns then
give the variable name and value for that person/strata
combination. For example, the MEAN
value for the SaO2
channel
is 76.9242
, as given on the third row of the output. This format
is verbose but designed to be easily parsed by standard text
processing tools: e.g. to see only the MEAN
values, using the
ubiquitous awk
tool:
awk ' $5 == "MEAN" && $4 == "." { print $1, $3, $6 } ' res.txt
nsrr01 CH/SaO2 76.9242
nsrr01 CH/PR 57.3485
nsrr01 CH/EEG_sec_ 1.18558
nsrr01 CH/ECG 0.00939557
nsrr01 CH/EMG -6.8557
nsrr01 CH/EOG_L_ 0.472551
nsrr01 CH/EOG_R_ 0.321447
nsrr01 CH/EEG -0.301199
nsrr01 CH/AIRFLOW 0.0664418
... (cont'd)
Using destrat
If the -o
argument is given to Luna, instead of the column-based
text output format described above, all output is sent to a lunout
database, which is designed for handling stratified output. The
tool destrat
is designed to extract
information from such databases, and de-stratify it, to produce
simple, rectangular tables that can be read into other analysis
programs or spreadsheets. For example, consider the following
command:
luna s.lst sig=EEG,ECG,EMG -o out.db -s 'EPOCH & STATS epoch'
This generates a database, named out.db
as requested. The
destrat
program will report on the contents of this file:
destrat out.db
--------------------------------------------------------------------------------
out.db: 2 command(s), 3 individual(s), 31 variable(s), 247401 values
--------------------------------------------------------------------------------
command #1: c1 Wed Dec 1 10:51:05 2021 EPOCH sig=ECG,EEG,EMG
command #2: c2 Wed Dec 1 10:51:05 2021 STATS epoch=T sig=ECG,EEG,EMG
--------------------------------------------------------------------------------
distinct strata group(s):
commands : factors : levels : variables
-------------:------------:------------:---------------------------
[EPOCH] : . : 1 level(s) : DUR INC NE
: : :
[STATS] : CH : 3 level(s) : KURT MAX MEAN MEDIAN.KURT MEDIAN.MEAN
: : : MEDIAN.MEDIAN MEDIAN.RMS MEDIAN.SKEW
: : : MIN NE NE1 P01 P02 P05 P10 P20
: : : P30 P40 P50 P60 P70 P80 P90 P95
: : : P98 P99 RMS SD SKEW
: : :
[STATS] : E CH : (...) : KURT MAX MEAN MEDIAN MIN P01 P02
: : : P05 P10 P20 P30 P40 P50 P60 P70
: : : P80 P90 P95 P98 P99 RMS SKEW
: : :
-------------:------------:------------:---------------------------
That is, we see that 2 commands were performed, generating output for
3 individuals. Different Luna commands will produce different levels
of stratified output, depending on how they are run. By default, the
STATS
command produces one set of values for each channel. This is
reflected in the strata group labeled CH
. The 3 levels of this
factor are the 3 channels specified in the command (i.e. EEG, ECG
and EMG). The five core variables (MIN
, MAX
, MEAN
, SD
and
RMS
) are equivalent to the results generated at the start of this
tutorial using the STATS
command: that is, whole-signal statistics.
Running destrat
with the -x
option will give information about the
factors and levels for that strata, rather than extracting the tabular
output per se:
destrat out.db +STATS -r CH -x
Factors: 1
[CH] 3 levels
-> ECG, EEG, EMG
Individuals: 3
nsrr01 nsrr02 nsrr03
Commands: 1
STATS
Variables: 29
STATS/KURT STATS/MAX STATS/MEAN STATS/MEDIAN.KURT STATS/MEDIAN.MEAN
STATS/MEDIAN.MEDIAN STATS/MEDIAN.RMS STATS/MEDIAN.SKEW STATS/MIN STATS/NE
STATS/NE1 STATS/P01 STATS/P02 STATS/P05 STATS/P10 STATS/P20 STATS/P30
STATS/P40 STATS/P50 STATS/P60 STATS/P70 STATS/P80 STATS/P90 STATS/P95
STATS/P98 STATS/P99 STATS/RMS STATS/SD STATS/SKEW
Adding the epoch
option to the STATS
command generated
some additional output:
-
per-epoch values for these five measures (
MIN
,MAX
,MEAN
,SD
andRMS
) -- these values are stratified by both channel (CH
) and epoch (E
) -
a set of additional whole-signal summaries, based on combining the per-epoch level data and taking the median value -- like the original output, these values are only stratified by channel
CH
Specifically, the new variables are
- the digital and physical min/max values from the EDF header (
DMIN
,DMAX
,PMIN
andPMAX
) - the observed physical min/max values from the signal data (
OMIN
,OMAX
) - the median of the per-epoch mean (
MEDIAN.MEAN
), SD (MEDIAN.SD
) and RMS (MEDIAN.RMS
) - the unit of measurement from the EDF header (
UNIT
)
Hint
The goal of the STATS
command's epoch
option is primarily
to generate per-epoch level data, although using the median of
per-epoch means will be more robust to outliers compared to the
whole-signal mean, which is why both are given.
To extract information on a subset of variables, use the -v
option
(where multiple variable names can be comma-delimited). The [STATS]
argument tells destrat
which command's output we are interested in;
the -r
option specifies that channels (CH
) are listed as separate
rows in the output.
destrat out.db +STATS -r CH -v RMS
ID CH RMS
nsrr01 ECG 0.266660881869383
nsrr01 EEG 37.801350632511
nsrr01 EMG 13.9700738671353
nsrr02 ECG 0.274496551353303
nsrr02 EEG 41.0442336241345
nsrr02 EMG 19.7064640585027
nsrr03 ECG 0.300029835524692
nsrr03 EEG 54.2707996328254
nsrr03 EMG 18.8981013309648
Running the same command but swapping from rows to columns (i.e. -c
instead of -r
), we get the following:
destrat out.db +STATS -c CH -v RMS
ID RMS.CH.ECG RMS.CH.EEG RMS.CH.EMG
nsrr01 0.266660881869383 37.801350632511 13.9700738671353
nsrr02 0.274496551353303 41.0442336241345 19.7064640585027
nsrr03 0.300029835524692 54.2707996328254 18.8981013309648
By default, all decimal places are given for numeric data; this can be
modified with the -p
option:
destrat out.db +STATS -c CH -v RMS -p 2
ID RMS.CH.ECG RMS.CH.EEG RMS.CH.EMG
nsrr01 0.27 37.80 13.97
nsrr02 0.27 41.04 19.71
nsrr03 0.30 54.27 18.90
The -i
option can be used to subset results to one or more
individuals: in this case, nsrr02
. Also, note here that we are
requesting different values -- although we request the same variable
(RMS
), the strata are defined by channel (CH
) and epoch (E
).
That is, these values are the per-epoch RMS values for each of the
three channels. We can use both -r
and -c
to organize the output:
e.g. arrange values for different channels as columns, but epochs as
rows, as below. New variable names are created in the form
{VAR}.{FAC}.{LVL}
where {VAR}
is the original variable name,
{FAC}
is the column-factor (i.e. CH
), and {LVL}
is the level for
that factor (e.g. ECG
). Multiple column-stratifiers can be combined
in this way, e.g. -c FAC1 FAC2
).
destrat out.db +STATS -r E -c CH -v RMS -p 2 -i nsrr02
ID E RMS.CH.ECG RMS.CH.EEG RMS.CH.EMG
nsrr02 1 0.27 11.98 4.91
nsrr02 2 0.28 15.88 9.91
nsrr02 3 0.27 12.47 22.73
nsrr02 4 0.26 11.35 13.61
nsrr02 5 0.26 12.29 19.06
nsrr02 6 0.27 15.23 23.92
nsrr02 7 0.28 14.47 20.08
nsrr02 8 0.30 14.88 23.88
nsrr02 9 0.30 19.56 22.70
... (cont'd) ...
Alternatively, here we run the same command but with both epoch and channel as row-stratifiers, which may be more convenient for some types of downstream analysis.
destrat out.db +STATS -r E CH -v RMS -p 2 -i nsrr02
ID CH E RMS
nsrr02 ECG 1 0.27
nsrr02 ECG 2 0.28
nsrr02 ECG 3 0.27
nsrr02 ECG 4 0.26
nsrr02 ECG 5 0.26
nsrr02 ECG 6 0.27
nsrr02 ECG 7 0.28
... cont'd ...
nsrr02 ECG 1193 1.08
nsrr02 ECG 1194 1.02
nsrr02 ECG 1195 0.13
nsrr02 EEG 1 11.98
nsrr02 EEG 2 15.88
nsrr02 EEG 3 12.47
... cont'd ...
Finally, destrat
can combine results across multiple databases, if
these are listed on the command line. For example, if we were to run
different channels and individuals separately, recorded as two
databases:
The first two individuals for the EEG channel:
luna s.lst 1 2 sig=EEG -o p1.db -s STATS
Then the second and third individuals for ECG and EMG:
luna s.lst 2 3 sig=ECG,EMG -o p2.db -s STATS
Finally, all individuals for SaO2:
luna s.lst sig=SaO2 -o p3.db -s STATS
Multiple databases can be specified on the destrat
command line, and
the output is compiled into a single file:
destrat p1.db p2.db p3.db
attaching databases...
scanning 1 of 3: p1.db
--------------------------------------------------------------------------------
p1.db: 1 command(s), 2 individual(s), 6 variable(s), 12 values
--------------------------------------------------------------------------------
command #1: c1 Mon Mar 18 16:04:45 2019 STATS
--------------------------------------------------------------------------------
distinct strata group(s):
commands : factors : levels : variables
----------------:-------------------:---------------:---------------------------
[STATS] : CH : 1 level(s) : MAX MEAN MEDIAN MIN RMS SD
: : :
----------------:-------------------:---------------:---------------------------
scanning 2 of 3: p2.db
--------------------------------------------------------------------------------
p2.db: 1 command(s), 2 individual(s), 6 variable(s), 24 values
--------------------------------------------------------------------------------
command #1: c1 Mon Mar 18 16:04:49 2019 STATS
--------------------------------------------------------------------------------
distinct strata group(s):
commands : factors : levels : variables
----------------:-------------------:---------------:---------------------------
[STATS] : CH : 2 level(s) : MAX MEAN MEDIAN MIN RMS SD
: : :
----------------:-------------------:---------------:---------------------------
scanning 3 of 3: p3.db
--------------------------------------------------------------------------------
p3.db: 1 command(s), 3 individual(s), 6 variable(s), 18 values
--------------------------------------------------------------------------------
command #1: c1 Mon Mar 18 16:04:52 2019 STATS
--------------------------------------------------------------------------------
distinct strata group(s):
commands : factors : levels : variables
----------------:-------------------:---------------:---------------------------
[STATS] : CH : 1 level(s) : MAX MEAN MEDIAN MIN RMS SD
: : :
----------------:-------------------:---------------:---------------------------
destrat p1.db p2.db p3.db +STATS -v MEAN RMS -r CH -p 3
ID CH MEAN RMS
nsrr01 EEG -0.301 37.801
nsrr01 SaO2 76.924 85.567
nsrr02 EEG -0.370 41.044
nsrr02 ECG 0.006 0.274
nsrr02 EMG -0.610 19.706
nsrr02 SaO2 77.873 86.460
nsrr03 ECG 0.004 0.300
nsrr03 EMG 3.014 18.898
nsrr03 SaO2 65.083 77.731
Note
Currently, if drawing from multiple databases, only row-based formatting can be requested (i.e. -r
and not -c
).
Note
If the same command/variable/individual/strata combination appears more than once (either within a single database, or across multiple databases), then only the last encountered will be used in the output.
Parameter files
As we have seen, Luna can accept variables in a command file. These
can be defined on the command line (as variable=value
pairs), but
they can also be included in a parameter file, which defines these.
For example, consider this file (which is in cmd/vars.txt
in the tutorial folder):
alias EEG1|EEG
alias EEG2|EEG(sec)
alias OXSTAT|"OX STAT"
eeg EEG1,EEG2
myepoch 10
nrem NREM1,NREM2,NREM3
All parameter file rows must be exactly two tab-delimited columns. (That is, be careful if copying and pasting the text from above, as the web formatting puts spaces rather than tabs.)
The first contains a variable name, the second contains the value.
For example, ${myepoch}
is defined as a variable meaning 10
. The
${nrem}
variable is defined to list all NREM sleep annotations.
Parameter files can be useful for keeping command scripts generic (i.e
applicable to different studies, which may have different wording for
a given annotation or signal), by supplying a project-specific
parameter file.
Here alias
is a reserved keyword that for signals can be used to
specify alternate names for the same signal. It expects a
pipe-delimited set of values (2 or more) in the second column,
indicating that the first value (e.g. EEG1
) is the preferred name,
but that EEG
means that same thing. If a signal called EEG
is
encountered, it is replaced to EEG1
internally, and in the output.
Finally, as a shortcut, the variable ${eeg}
is defined here to
indicate both EEG channels, comma-delimited and labeled here by their
aliases: EEG1,EEG2
.
So, if this file cmd/second.txt
(also in the tutorial folder) is as follows:
EPOCH len=${myepoch}
MASK all
MASK unmask-if=${nrem}
RESTRUCTURE
STATS sig=${eeg}
then the following command will calculate statistics for both EEG
channels, label them as EEG1
and EEG2
, only for 10-second epochs
of either stage N1, N2 or N3 sleep, and only for the individual nsrr01
:
luna s.lst nsrr01 @cmd/vars.txt -o res.db < cmd/second.txt
===================================================================
+++ luna | v0.26.0, 11-Nov-2021 | starting 01-Dec-2021 10:57:52 +++
===================================================================
input(s): s.lst
output : res.db
commands: c1 EPOCH len=${myepoch}
: c2 MASK all
: c3 MASK unmask-if=${nrem}
: c4 RESTRUCTURE
: c5 STATS sig=${eeg}
___________________________________________________________________
Processing: nsrr01 [ #1 ]
duration: 11.22.00 | 40920 secs | clocktime 21.58.17 - 09.20.17
signals: 14 (of 14) selected in a standard EDF file
SaO2 | PR | EEG2 | ECG | EMG | EOG_L | EOG_R | EEG1
AIRFLOW | THOR_RES | ABDO_RES | POSITION | LIGHT | OXSTAT
annotations:
Arousal (x194) | Hypopnea (x361) | N1 (x109) | N2 (x523)
N3 (x17) | Obstructive_Apnea (x37) | R (x238) | SpO2_artifact (x59)
SpO2_desaturation (x254) | W (x477)
variables:
airflow=AIRFLOW | ecg=ECG | eeg=EEG2,EEG1 | effort=THOR_RES,A...
emg=EMG | eog=EOG_L,EOG_R | hr=PR | id=nsrr01 | light=LIGHT
oxygen=SaO2,OXSTAT | position=POSITION
..................................................................
CMD #1: EPOCH
options: len=10 sig=*
set epochs, length 10 (step 10, offset 0), 4092 epochs
..................................................................
CMD #2: MASK
options: all sig=*
reset all 4092 epochs to be masked
..................................................................
CMD #3: MASK
options: sig=* unmask-if=N1,N2,N3
set masking mode to 'unmask'
based on N1 327 epochs match; 0 newly masked, 327 unmasked, 3765 unchanged
total of 327 of 4092 retained
based on N2 1569 epochs match; 0 newly masked, 1569 unmasked, 2523 unchanged
total of 1896 of 4092 retained
based on N3 51 epochs match; 0 newly masked, 51 unmasked, 4041 unchanged
total of 1947 of 4092 retained
..................................................................
CMD #4: RESTRUCTURE
options: sig=*
restructuring as an EDF+: keeping 19470 records of 40920, resetting mask
retaining 1947 epochs
..................................................................
CMD #5: STATS
options: sig=EEG2,EEG1
processing EEG2 ...
processing EEG1 ...
___________________________________________________________________
...processed 1 EDFs, done.
...processed 1 command set(s), all of which passed
-------------------------------------------------------------------
+++ luna | finishing 01-Dec-2021 10:57:54 +++
===================================================================