
Working with output databases

Overview

The primary output of most lunaC commands is a specially-formatted database file, which can contain the results of one or more analyses for one or more individuals/EDFs. Although they can have any filename (in this documentation we typically call them out.db), we'll refer to these output databases generically as lout (Luna-output) files. This section describes how to use the destrat command-line tool, as well as the lunaR package, to extract information from lout files.

Why doesn't Luna just write to plain text files?

Although you don't have to use these databases (i.e. luna can write everything to standard out which can be redirected to a text file), in practice it is much easier to work with lout files. In large part, this is because Luna's output is often from multiple commands, and each command may have output stratified by a number of factors: channel, sleep stage, frequency, epoch or sleep cycle, pairs of channels, etc. Rather than generate dozens of text files for each of these differently-formatted commands, Luna stores everything in a single database, along with a set of tools for extracting the required information. Although the syntax and logic may appear a little opaque at first, there is a consistency across commands, meaning that once learned it will help with all aspects of Luna.

destrat

As described here, the -o (or -a) argument instructs Luna to write its output to a lout database file:

luna s.lst nsrr01 sig=ECG,EMG -o out.db -s HEADERS 

This example generates a lout file called out.db. This file (which is actually an SQLite database) cannot be viewed directly in the terminal, in a text editor, or in a spreadsheet. Rather, a lout file is an intermediate form, from which various text files (or R objects) can be extracted in a variety of formats, using the destrat program that comes with Luna (or, as described below, lunaR's ldb() function).
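Because a lout file is an SQLite database, it can in principle be opened with any SQLite client. The sketch below uses Python's standard sqlite3 module to list whatever tables a file contains. It makes no assumptions about Luna's internal schema (which is not described here), the function name list_tables is hypothetical, and destrat or ldb() remain the intended way to extract results.

```python
import sqlite3
from pathlib import Path

def list_tables(path):
    """Return the names of all tables in an SQLite file (e.g. a lout database)."""
    # Open read-only so that inspecting the file can never modify it
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

For example, list_tables("out.db") would return the table names as a list of strings.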

To view the contents of this file, run destrat without any other options:

destrat out.db 
----------------------------------------------------------------------------
out.db: 1 command(s), 1 individual(s), 11 variable(s), 17 values
----------------------------------------------------------------------------
  command #1:   c1  Thu Feb  7 10:19:51 2019    HEADERS 
----------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors       : levels        : variables 
----------------:---------------:---------------:---------------------------
  [HEADERS]     : .             : 1 level(s)    : NR NS REC.DUR TOT.DUR.HMS 
                :               :               : TOT.DUR.SEC
                :               :               : 
  [HEADERS]     : CH            : 2 level(s)    : DMAX DMIN PDIM PMAX PMIN 
                :               :               : SR
----------------:---------------:---------------:---------------------------

Here we see when the file was generated and some information about the number of individuals, commands and variables stored within. We also see two distinct strata groups:

  • a default (or sometimes referred to here as a baseline) group, meaning that there are no stratifying factors

  • a second group defined by the factor CH, which has two levels (i.e. corresponding to the two channels specified, ECG and EMG)

In each case, the variables defined for each strata group are listed on each row.

A strata group corresponds to a table, where each row of that table corresponds to one unique combination of levels for the factor(s) in that stratum. destrat will extract information from only one strata group at a time. Think of each strata group as a virtual table, defined by a particular set of factors: it does not make sense to mix the information about the general EDF with the information about individual channels, for example.

Running destrat with just a command label, which should either be in square brackets or preceded by a + character, will show the data from the baseline stratum for that command, if one exists:

destrat out.db [HEADERS]
which prints the tab-delimited output:
ID      NR      NS   REC.DUR   TOT.DUR.HMS   TOT.DUR.SEC
nsrr01  40920   2    1         11:22:00      40920
The output of the HEADERS command is described here.

Hint

Some shells interpret square brackets [ and ] as special characters. If this appears to be the case, either place quotes around the command, e.g.:

destrat out.db "[HEADERS]"      
or (even easier) use a single + character (at the start of the command name) to indicate it is a command:
destrat out.db +HEADERS

Naturally, one can save any output from destrat to a file using standard redirection operators (i.e. to create files that can be loaded into other analysis programs such as R). For example (here using the +command format, which we'll adopt as the default in this documentation):

destrat out.db +HEADERS > my-file.txt
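Since the redirected file is plain tab-delimited text with a single header row, it can be loaded by most tools. As a minimal sketch (in Python rather than R, using only the standard library), the following reads such output into a list of dictionaries. The helper name read_destrat is hypothetical, and the sample string simply stands in for the contents of my-file.txt shown above.

```python
import csv
import io

# Stand-in for the contents of my-file.txt (tab-delimited, one header row)
SAMPLE = (
    "ID\tNR\tNS\tREC.DUR\tTOT.DUR.HMS\tTOT.DUR.SEC\n"
    "nsrr01\t40920\t2\t1\t11:22:00\t40920\n"
)

def read_destrat(handle):
    """Read tab-delimited destrat output into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_destrat(io.StringIO(SAMPLE))
print(rows[0]["ID"], rows[0]["TOT.DUR.SEC"])
```

With a real file, the same function could be called as read_destrat(open("my-file.txt", newline="")). All values are returned as strings, so numeric columns need converting before use.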

Hint

All destrat command, variable, factor and level names are case-sensitive.

To extract information from the second strata group (which is defined by the factor CH), we need to explicitly list the factor(s) that define it, using either the -r or -c option. The choice of -r versus -c influences the layout of the output, in terms of whether factor levels are listed as additional rows or columns. This is probably easiest to show by example. In the first instance:

destrat out.db +HEADERS -r CH

will list each level of CH (i.e. each channel) as a separate row in the output:

ID        CH     DMAX   DMIN    PDIM  PMAX    PMIN     SR
nsrr01    ECG    127    -128    mV    1.25    -1.25    250
nsrr01    EMG    127    -128    uV    31.5    -31.5    125

Alternatively, the same information can be listed in a column-wise format, where each level of CH is a new column, with the -c option:

destrat out.db +HEADERS -c CH
ID      DMAX.CH.ECG DMAX.CH.EMG DMIN.CH.ECG DMIN.CH.EMG PDIM.CH.ECG PDIM.CH.EMG PMAX.CH.ECG PMAX.CH.EMG PMIN.CH.ECG PMIN.CH.EMG SR.CH.ECG SR.CH.EMG
nsrr01  127         127         -128        -128        mV          uV          1.25        31.5        -1.25       -31.5       250       125

Note how each individual variable, e.g. DMAX, is split into two variables in the output, either DMAX.CH.ECG or DMAX.CH.EMG. Depending on how you want to analyse the data, and the number of factors/levels, either -r or -c formatted output may be the more appropriate choice.
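When reading -c formatted output downstream, the combined column names have to be taken apart again. The sketch below assumes the VARIABLE.FACTOR.LEVEL pattern seen in the output above for a single column factor; the helper name split_wide_name is hypothetical. It splits on the known factor name rather than on every dot, because variable names can themselves contain dots (e.g. TOT.DUR.SEC).

```python
def split_wide_name(column, factor):
    """Split e.g. 'DMAX.CH.ECG' into ('DMAX', 'ECG') for factor 'CH'."""
    variable, sep, level = column.partition(f".{factor}.")
    if not sep:
        raise ValueError(f"{column!r} is not stratified by {factor!r}")
    return variable, level

# A subset of the -c CH header and values shown above
header = ["DMAX.CH.ECG", "DMAX.CH.EMG", "SR.CH.ECG", "SR.CH.EMG"]
values = ["127", "127", "250", "125"]

# Rebuild one record per channel from a single wide row
by_level = {}
for column, value in zip(header, values):
    variable, level = split_wide_name(column, "CH")
    by_level.setdefault(level, {})[variable] = value

print(by_level)
```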

Multiple factors

To further illustrate destrat with multiple factors, consider this example of power spectral density estimation for two channels (named EEG and EEG(sec) as per the NSRR tutorial data), performed for both the entire record as well as per-epoch:

luna s.lst nsrr01 sig="EEG,EEG(sec)" -o out.db -s "EPOCH & PSD epoch"

(note the use of quotes around the sig list, which prevents the shell from interpreting the parentheses as special characters)

destrat out.db
--------------------------------------------------------------------------------
out.db: 2 command(s), 1 individual(s), 6 variable(s), 51877 values
--------------------------------------------------------------------------------
  command #1:   c1  Thu Feb  7 10:33:45 2019    EPOCH   
  command #2:   c2  Thu Feb  7 10:33:45 2019    PSD 
--------------------------------------------------------------------------------
distinct strata group(s):
  commands      : factors           : levels        : variables 
----------------:-------------------:---------------:---------------------------
  [EPOCH]       : .                 : 1 level(s)    : DUR INC NE
                :                   :               : 
  [PSD]         : CH                : 2 level(s)    : NE
                :                   :               : 
  [PSD]         : B CH              : 20 level(s)   : PSD RELPSD
                :                   :               : 
  [PSD]         : E B CH            : (...)         : PSD RELPSD
                :                   :               : 

We now see four distinct strata groups. The EPOCH command produces some basic output in the baseline stratum (such as the number of epochs, NE). For the PSD command, we see three strata groups (none of which are the default baseline group) that are collectively defined by three factors:

Factor   Description
E        Epoch (due to the epoch option on the PSD command)
B        Spectral band
CH       Channel, because PSD always operates on a particular channel

Based on these three factors, the three distinct strata groups from PSD, each of which contains its own set of variables/data, are:

Strata group   Content
CH             Number of epochs (although this will be similar for each channel)
B x CH         Spectral band power for each channel for the entire signal
E x B x CH     As above, but output per-epoch (due to the epoch option of the PSD command)

In other words, out.db contains four virtual tables, and we can output any one of them by specifying the command name (as +command) together with the appropriate factors via the -r and/or -c options. When a stratum is defined by more than one factor (e.g. B and CH for the third group), it is possible to specify some factors as rows and some as columns. Here, both factors are requested with row-wise formatting:

destrat out.db +PSD -r CH B 
ID       B           CH        PSD                 RELPSD
nsrr01   SLOW        EEG       105.991683363628    0.0732300733474261
nsrr01   DELTA       EEG       198.692418792271    0.137277378186528
nsrr01   THETA       EEG       54.713385057902     0.0378016941869899
nsrr01   ALPHA       EEG       63.1553608045886    0.0436342886275004
nsrr01   SIGMA       EEG       678.134027746239    0.468525482521808
nsrr01   SLOW_SIGMA  EEG       563.260922552984    0.389159199696685
nsrr01   FAST_SIGMA  EEG       114.873105193254    0.0793662828251232
nsrr01   BETA        EEG       225.6867899095      0.155927895983292
nsrr01   GAMMA       EEG       45.8934730333316    0.0317079820769435
nsrr01   TOTAL       EEG       1447.37917796109    1
nsrr01   SLOW        EEG(sec)  173.30016852818     0.11036981788844
nsrr01   DELTA       EEG(sec)  368.806565177362    0.234882134162885
nsrr01   THETA       EEG(sec)  99.1852731160328    0.0631682047628917
nsrr01   ALPHA       EEG(sec)  113.195854898732    0.0720911352655018
nsrr01   SIGMA       EEG(sec)  427.048500439725    0.271974722375411
nsrr01   SLOW_SIGMA  EEG(sec)  346.660966460099    0.220778248874063
nsrr01   FAST_SIGMA  EEG(sec)  80.3875339796257    0.0511964735013477
nsrr01   BETA        EEG(sec)  239.891616774602    0.152779967158952
nsrr01   GAMMA       EEG(sec)  74.5169610826192    0.0474576770128835
nsrr01   TOTAL       EEG(sec)  1570.17717201771    1

Note

If you're looking at these power estimates, they may seem strange for sleep data (i.e. sigma higher than delta). Note that this command is looking over all epochs, including many artifactual wake/end-of-study epochs at the end of the recording. Examining the epoch-level estimates will make this clear, e.g. extracted with:

destrat out.db +PSD -r E B CH > out.txt
You'll see in the tutorial how to mask, filter and detect artifacts in EEG data using Luna.
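As a quick sanity check on the whole-record table above, RELPSD appears to be each band's PSD divided by the TOTAL power for that channel. This is inferred from the printed numbers, not from a definition given here, so treat it as an assumption. The sketch below checks the relationship for a few EEG bands, using values copied from that output.

```python
import math

# Whole-record band power for channel EEG, copied from the output above
psd = {
    "SLOW": 105.991683363628,
    "DELTA": 198.692418792271,
    "SIGMA": 678.134027746239,
    "TOTAL": 1447.37917796109,
}
# RELPSD as reported by destrat for the same rows
relpsd = {
    "SLOW": 0.0732300733474261,
    "DELTA": 0.137277378186528,
    "SIGMA": 0.468525482521808,
    "TOTAL": 1.0,
}

for band, power in psd.items():
    ratio = power / psd["TOTAL"]
    # Allow for rounding in the printed values
    assert math.isclose(ratio, relpsd[band], rel_tol=1e-6), band
    print(f"{band:6s} {ratio:.4f}")
```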

To instead specify that channels are listed as columns:

destrat out.db +PSD -r B -c CH 

ID       B          PSD.CH.EEG        PSD.CH.EEG(sec)   RELPSD.CH.EEG       RELPSD.CH.EEG(sec)
nsrr01   SLOW       105.991683363628  173.30016852818   0.0732300733474261  0.11036981788844
nsrr01   DELTA      198.692418792271  368.806565177362  0.137277378186528   0.234882134162885
nsrr01   THETA      54.713385057902   99.1852731160328  0.0378016941869899  0.0631682047628917
nsrr01   ALPHA      63.1553608045886  113.195854898732  0.0436342886275004  0.0720911352655018
nsrr01   SIGMA      678.134027746239  427.048500439725  0.468525482521808   0.271974722375411
nsrr01   SLOW_SIGMA 563.260922552984  346.660966460099  0.389159199696685   0.220778248874063
nsrr01   FAST_SIGMA 114.873105193254  80.3875339796257  0.0793662828251232  0.0511964735013477
nsrr01   BETA       225.6867899095    239.891616774602  0.155927895983292   0.152779967158952
nsrr01   GAMMA      45.8934730333316  74.5169610826192  0.0317079820769435  0.0474576770128835
nsrr01   TOTAL      1447.37917796109  1570.17717201771  1                   1

Hint

The order in which you specify the -r and -c options does not matter.

Aggregating output

If there are multiple individuals in a Luna project, these will be compiled and output together. The -i option, followed by a list of one or more individual IDs, can be used to restrict the output to only those individuals/EDFs.

destrat can also compile and integrate information across multiple databases, by listing multiple files on the command line, e.g.:

destrat out1.db out2.db +HEADERS -r CH > all-out.txt

or

destrat *.db +HEADERS -r CH > all-out.txt

The different databases may contain similar or different individuals; further, they may contain similar or different commands. One issue to remember is that if the same data-point is included in more than one file, only one value will be used (i.e. there is no mechanism for resolving potential discrepancies, etc). If an individual did not have data for that command/variable/level, destrat will output NA (the missing code used in R).
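Downstream code needs to treat the NA code as missing rather than as a number. A minimal sketch in Python follows, assuming tab-delimited destrat output. The individual id02 and its missing value are made up purely for illustration, and to_number is a hypothetical helper.

```python
import csv
import io

# Hypothetical combined output: individual id02 has no value for SR
# (id02 and its NA entry are invented for illustration only)
SAMPLE = (
    "ID\tCH\tSR\n"
    "nsrr01\tECG\t250\n"
    "id02\tECG\tNA\n"
)

def to_number(text):
    """Convert a destrat field to float, mapping the NA missing code to None."""
    return None if text == "NA" else float(text)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
rates = {row["ID"]: to_number(row["SR"]) for row in rows}
print(rates)
```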

Restriction on -c when combining multiple databases

One caveat is that the -c option cannot be used when multiple databases are specified on the command line. That is, you have to use -r instead. (It is always possible to restructure back to column format using other tools, e.g. dcast() in R.)
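For readers not working in R, the same restructuring can be sketched in a few lines of Python. This assumes row-format output with a single stratifying factor and mimics the VARIABLE.FACTOR.LEVEL naming that -c uses. The function name long_to_wide is hypothetical, and the sample rows reuse the SR values from the HEADERS example above.

```python
import csv
import io

# Row-format output for one variable (SR), stratified by CH
LONG = (
    "ID\tCH\tSR\n"
    "nsrr01\tECG\t250\n"
    "nsrr01\tEMG\t125\n"
)

def long_to_wide(rows, factor, id_col="ID"):
    """Pivot row-format records into one record per individual.

    New column names follow the VARIABLE.FACTOR.LEVEL pattern that -c produces.
    """
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        for name, value in row.items():
            if name not in (id_col, factor):
                record[f"{name}.{factor}.{row[factor]}"] = value
    return list(wide.values())

rows = list(csv.DictReader(io.StringIO(LONG), delimiter="\t"))
print(long_to_wide(rows, "CH"))
```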

Restricting output

The -v option can be used to select only certain variables (with spaces between variables, and noting that all names are case-sensitive):

destrat out.db +EPOCH -r E -v START STOP 

Also, you can restrict output to only certain levels of particular factors, by specifying -r or -c in the form factor/level or factor/level1,level2. For example, using the out.db generated above, we could extract only relative sigma and beta power:

destrat out.db +PSD -r B/SIGMA,BETA CH -v RELPSD -p 2
ID      B      CH        RELPSD
nsrr01  SIGMA  EEG       0.47
nsrr01  BETA   EEG       0.16
nsrr01  SIGMA  EEG(sec)  0.27
nsrr01  BETA   EEG(sec)  0.15

That is, this extracts only RELPSD (from -v) for only sigma and beta power (from B/SIGMA,BETA). Furthermore, it uses the -p 2 option to restrict numeric output to two decimal places.
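To make the effect of these restrictions concrete, the sketch below applies the equivalent selection to rows copied from the unrestricted PSD output shown earlier. It only illustrates the logic (keep certain levels of B, then format to two decimal places) and is not how destrat is implemented.

```python
# (B, CH, RELPSD) rows copied from the unrestricted +PSD -r CH B output
records = [
    ("DELTA", "EEG", 0.137277378186528),
    ("SIGMA", "EEG", 0.468525482521808),
    ("BETA", "EEG", 0.155927895983292),
    ("SIGMA", "EEG(sec)", 0.271974722375411),
    ("BETA", "EEG(sec)", 0.152779967158952),
]

keep_bands = {"SIGMA", "BETA"}  # plays the role of B/SIGMA,BETA

selected = [
    (band, ch, f"{value:.2f}")  # plays the role of -p 2
    for band, ch, value in records
    if band in keep_bands
]
for row in selected:
    print(*row, sep="\t")
```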

To obtain a list of the levels for a given stratum, run destrat with the -x option (which means no output) as follows (here, it doesn't matter whether -r or -c is used):

destrat out.db +PSD -r B CH  -x
Factors: 2
     [B] 10 levels
     -> ALPHA, BETA, DELTA, FAST_SIGMA, GAMMA, SIGMA, SLOW, SLOW_SIGMA, 
        THETA, TOTAL

     [CH] 2 levels
     -> EEG, EEG(sec)

Individuals: 1
     nsrr01

Commands: 1
     PSD

Variables: 2
     PSD/PSD PSD/RELPSD

Command summary

Option           Example      Description
+command         +ANNOTS      Select output from this command
[command]        [ANNOTS]     Equivalent to +ANNOTS
-r factor(s)     -r CH        Select strata group defined by CH and organize by rows
-c factor(s)     -c CH        Select strata group defined by CH and organize by columns
-x               -x           Display information about the database, rather than extracting data
-p integer       -p 2         Restrict numeric output to two decimal places
-i ID(s)         -i nsrr01    Restrict output to this individual(s)
-v variable(s)   -v DENS      Restrict output to only this variable(s)
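When destrat is called from a script, passing the arguments as a list (instead of a single shell string) sidesteps the quoting issues with square brackets and parentheses mentioned earlier. The following sketch builds such a list using only the options in the table above. The function names destrat_args and run_destrat are hypothetical, and it assumes that multiple factors, variables and IDs are given as separate space-delimited arguments, as in the examples in this section.

```python
import subprocess

def destrat_args(dbs, command, rows=(), cols=(), variables=(), ids=(), precision=None):
    """Build an argument list for destrat from the options in the table above."""
    args = ["destrat", *dbs, f"+{command}"]
    if rows:
        args += ["-r", *rows]
    if cols:
        args += ["-c", *cols]
    if variables:
        args += ["-v", *variables]
    if ids:
        args += ["-i", *ids]
    if precision is not None:
        args += ["-p", str(precision)]
    return args

def run_destrat(args):
    """Run destrat (which must be on the PATH) and return its tab-delimited output."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout

args = destrat_args(["out.db"], "PSD", rows=["B/SIGMA,BETA", "CH"],
                    variables=["RELPSD"], precision=2)
print(args)
```

Here the list corresponds to the command destrat out.db +PSD -r B/SIGMA,BETA CH -v RELPSD -p 2 used above; run_destrat(args) would execute it.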

behead

behead is a very simple text utility that is supplied with Luna and destrat, which can be used to make output more human-friendly. The input is a tab-delimited rectangular file (i.e. with the same number of columns on each row) and a header row (i.e. containing variable names), as produced by destrat.

For example, if this file is out.txt:

ID      CH        DMAX    DMIN     PDIM   PMAX   PMIN   SR
nsrr01  SaO2      32767   -32768   %      100    0      1
nsrr01  PR        32767   -32768   BPM    200    0      1
nsrr01  EEG(sec)  127    -128      uV     125    -125   125
nsrr01  ECG       127    -128      mV     1.25   -1.25  250
nsrr01  EMG       127    -128      uV     31.5   -31.5  125
nsrr01  EOG(L)    127    -128      uV     125    -125   50
nsrr01  EOG(R)    127    -128      uV     125    -125   50
nsrr01  EEG       127    -128      uV     125    -125   125
nsrr01  AIRFLOW   127    -128      NA     -1     1      10
nsrr01  THOR RES  127    -128      NA     -1     1      10
nsrr01  ABDO RES  127    -128      NA     -1     1      10
nsrr01  POSITION  3      0         NA     3      0      1
nsrr01  LIGHT     1      0         NA     1      0      1
nsrr01  OX STAT   3      0         NA     3      0      1
then
behead < out.txt 
produces output in which each row of the input is represented as several rows:
                       ID   nsrr01              
                       CH   SaO2                
                     DMAX   32767               
                     DMIN   -32768              
                     PDIM   %                   
                     PMAX   100                 
                     PMIN   0                   
                       SR   1                   

                       ID   nsrr01              
                       CH   PR                  
                     DMAX   32767               
                     DMIN   -32768              
                     PDIM   BPM                 
                     PMAX   200                 
                     PMIN   0                   
                       SR   1                   

... (etc) ...

In practice, you may want to pipe straight from destrat and combine behead with less or a similar pager:

destrat out.db +HEADERS -r CH | behead | less 
(press q to quit)
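The transformation behead performs is simple enough to sketch in a few lines. The Python below is not behead's actual source; it only approximates the default layout shown above (right-aligned variable names, one name/value pair per line, and a blank line between input rows), with the column width of 25 being a guess from that output.

```python
import io

def behead(handle, width=25):
    """Yield lines that show each tab-delimited row as name/value pairs."""
    header = handle.readline().rstrip("\n").split("\t")
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        for name, value in zip(header, fields):
            # Right-align the variable name, then show its value
            yield f"{name:>{width}}   {value}"
        yield ""  # blank line between input rows

# Two rows taken from the out.txt example above (subset of columns)
SAMPLE = "ID\tCH\tSR\nnsrr01\tSaO2\t1\nnsrr01\tPR\t1\n"
print("\n".join(behead(io.StringIO(SAMPLE))))
```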

Options

Add -t to behead to get tab-delimited output instead of the format above; add -n to get additional row/column numbering in the output; add -nt for both.

lunaR

lunaR's ldb() function can read lout files generated by lunaC directly into R. Although destrat is more flexible, if you are performing downstream analyses in R anyway, then ldb() removes the need to use destrat to create intermediate text files whose only purpose is to be read into R. See the documentation on ldb() for more information.

Scaling

There is effectively no formal limit on the size of a lout database (i.e. SQLite can in principle handle a database file up to 140TB). Naturally, however, very large databases take longer to process. When destrat first encounters a lout file, it generates an index that speeds up subsequent queries; the time it takes to generate the index depends on how large the database is. In general, size and performance issues will only arise if you are placing output for hundreds of individuals in the same output file, or if you have commands that generate a lot of output (e.g. full cross spectra for all pairs of channels in an hdEEG study, separately for every epoch). Therefore, follow the usual, common-sense principle of prototyping analyses on one or two individuals first, to see how things scale. It may often be easier (or necessary) to have different lout databases for different individuals and/or commands. (In certain extreme circumstances, you may want to forgo using a lout database entirely and extract the elements you need from the standard output stream.)