FAQ and trouble-shooting
Although not necessarily asked with respect to Luna, here are some frequently asked questions:
Luna is a C/C++ library focused on the analysis of large numbers of sleep studies encoded as EDFs. This is a free, open-source project. Currently, there is a command-line tool (lunaC) and an extension library for R (lunaR).
The current version is the beta-release v0.22 (31-March-2019).
Run luna -v to display the specific build date/time.
Luna was primarily developed by Shaun Purcell, with input from a number of colleagues:
- Susan Redline and her team developing the National Sleep Research Resource
- Dennis Dean for sharing his original SpectralTrainFig code-base
- Sara Mariani and Charmaine Demanuele for input on several EEG and ECG analysis components
Interested in contributing (either as a collaborator or in a paid position)? Please contact me.
Luna development is indirectly supported via a number of NIH grants: NHLBI R01HL146339 (PI Purcell), NHLBI R21HL145492 (PI Purcell), NIMH R03 MH108908 (PI Purcell), as well as NHLBI R35HL135818 (PI Redline) and NHLBI R24HL114473 (PI Redline).
This is a good question and deserves a longer answer... The primary aim of Luna was to provide a platform for 1) adopting some of the elegant methods and models that have emerged from animal and lab-based cognitive neuroscience studies over the past decade or so, and 2) for applying them in the context of large (albeit sometimes noisy) epidemiological studies with polysomnography.
As a relative newcomer to sleep research (my personal background is primarily in psychiatric genetics), the development of Luna has tracked with my (still steep) learning curve, in how to think about sleep signal data. Because of this, I adopted the tools I was most familiar with (namely C/C++ and R), rather than the ubiquitous "in-house Matlab script". In developing Luna though, I've been constantly reminded of how powerful Matlab and its associated toolboxes are for working with electrophysiological signal data. I can also appreciate that working with Luna's particular instantiations of specific methods may be unnecessarily restrictive for some.
So, why wouldn't I just use Matlab? There was, from my perspective,
still an unmet need for tools to work with sleep data in
thousands of individuals, such as
from the NSRR. In my (limited) experience of
seeing how others approached sleep data, it seemed clear that although
the substantive core of a particular analysis (e.g. power spectral
density estimation) could be efficiently and flexibly implemented in a
single Matlab command (i.e.
pwelch() or similar), a lot of the
scaffolding around these one or two central functions (i.e. most of
the "work" from a practical perspective) was more often than not a
tangle of brittle, error-prone and undocumented scripting. Although
not a perfect solution even for our own work, Luna represents a modest
step in the direction of building more robust and scalable analysis
I had originally conceived of Luna just as my own personal library of functions that would assist me in my sleep research. However, I decided to document and distribute this code for a number of reasons:
- to make the tool better: documenting and distributing code has intrinsic value, as this process tends to make the underlying tool better, even if it will only ever be used by yourself or a very small number of people.
- accessibility and transparency: the sleep field is unfortunately replete with black-box proprietary software and file formats, which can be limiting; making things open-source lets others see what you've done, and use it without restriction.
- community: others can build upon your work; in genetics, for example, I developed a tool, PLINK, which has been quite widely used. Since it was first developed (in 2007), however, there have been considerable advances in the scale of data, and in the types of analytic approaches taken. Being an open-source tool, others were able to very significantly augment and even rewrite it, to produce an order-of-magnitude more powerful tool, whilst at the same time maintaining the pipelines and community experience that had been built over more than a decade with PLINK.
For both larger and smaller projects, I'd strongly recommend the document/distribute model whenever practically possible.
Luna uses a number of excellent open-source components, in particular:
- SQLite embedded database
- Chapters and example code from Mike X Cohen's fabulously clear and practical book: Analyzing neural time series data
- Lees, J. M. and J. Park (1995) Multiple-taper spectral analysis: A stand-alone C-subroutine. Computers & Geosciences, 21, 199.
- Laurent Condat (2013) A Direct Algorithm for 1-D Total Variation Denoising. IEEE Signal Processing Letters, 20:11.
- Multi-scale entropy (MSE) algorithm by Madalena Costa et al. (Costa M., Goldberger A.L., Peng C.-K. Multiscale entropy analysis of biological signals. Phys Rev E 2005;71:021906.)
Windows line endings
MS Windows uses carriage return (CR) and line feed (LF) characters to
denote the end of a line, whereas UNIX-like systems (including Mac)
use LF alone. The file command on UNIX-like systems will indicate
whether a file has Windows (CRLF) line endings, e.g.:
foo.txt: ASCII text, with CRLF line terminators
bar.txt: ASCII text
Use a utility such as dos2unix to convert files with CRLF line
endings to UNIX (LF) line endings. Otherwise, use the tool tr,
available on most systems:
tr -d '\r' < infile.txt > outfile.txt
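As a minimal, self-contained sketch of the tr approach, the following creates a small file with CRLF line endings in a scratch directory, counts its CR characters, strips them, and counts again. The file names crlf.txt and lf.txt are placeholders chosen for illustration only.

```shell
# Work in a scratch directory so that no real file is overwritten
tmpdir=$(mktemp -d)

# Create a two-line file with Windows (CRLF) line endings
printf 'line1\r\nline2\r\n' > "$tmpdir/crlf.txt"

# Count carriage returns before conversion
cr_before=$(tr -cd '\r' < "$tmpdir/crlf.txt" | wc -c | tr -d ' ')

# Strip every CR, leaving UNIX (LF) line endings
tr -d '\r' < "$tmpdir/crlf.txt" > "$tmpdir/lf.txt"

# Count carriage returns after conversion
cr_after=$(tr -cd '\r' < "$tmpdir/lf.txt" | wc -c | tr -d ' ')

echo "CR characters before: $cr_before, after: $cr_after"

# Remove the scratch directory
rm -r "$tmpdir"
```

Note that tr -d '\r' removes every CR in the file, not only those at the end of a line, which is normally what is wanted for plain-text inputs.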
Spaces in channel names or annotations
If a channel name contains a space (e.g. REF X1), define an alias such as
alias REF|"REF X1"
to create a new label alias REF, which can be used, e.g. on the
command line, instead of "REF X1".
Similarly, if masking on an annotation with a space, you need to put
quotes around it. For example, the NSRR annotation for REM sleep has
spaces and special characters: REM sleep|5. Therefore, in a Luna
command, write:
MASK if="REM sleep|5"
If you are using the -s option to specify commands directly as
arguments to Luna, you will likely already be using quotes for the
entire command, so you need to escape those additional quotes, i.e.:
luna s.lst -o out.db -s "EPOCH & MASK if=\"REM sleep|5\""
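To see what the shell actually hands over after this kind of escaping, the sketch below uses a small helper, show_args, which is hypothetical and not part of Luna: it prints each argument it receives in brackets. It also shows an alternative that relies only on standard shell quoting, namely wrapping the whole command in single quotes so the inner double quotes need no backslashes.

```shell
# Hypothetical helper (not part of Luna): print each argument in brackets
show_args() { printf '[%s]\n' "$@"; }

# Inner double quotes escaped inside a double-quoted string
escaped=$(show_args -s "EPOCH & MASK if=\"REM sleep|5\"")

# Single quotes outside, so the inner double quotes need no escaping
single=$(show_args -s 'EPOCH & MASK if="REM sleep|5"')

echo "$escaped"
# prints:
# [-s]
# [EPOCH & MASK if="REM sleep|5"]
```

Both forms pass the same two arguments, so either could be used; single quotes are simpler when the command contains no single quotes of its own.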
Advice on channel names
Try to keep channel names to simple alphanumeric characters combined
with the underscore character to delimit terms. Although Luna will
accept spaces and characters such as
+ - * % ( ) ., etc, in channel
names, we advise against them if you wish to use
destrat and other
tools such as
R to process results downstream.
That is, for any output that is stratified by channel (
CH), you may
wish to create a dataset where each channel corresponds to a
column/variable in the output. If a variable name is, for example,
SIGMA, then using a command like
destrat out1.db -c CH > my-file.txt
may create variables with names such as
SIGMA.CH.EEG(2). When loaded into R, this may lead to variable
names that are harder to work with (i.e. these characters are swapped
to . or you need to quote variable/list names, etc). For example,
if you output with channels as row stratifiers:
destrat out1.db -r CH > my-file.txt
but subsequently use an R command such as
dcast (from the
data.table package) to generate a data frame where channels
correspond to columns, you'll end up with variable names such as
d$C3-M2 which can make life difficult (i.e. R would complain that
M2 doesn't exist, as the
- is interpreted as a minus, so you'd need
to write d$"C3-M2", or find other work-arounds, etc).
To avoid this, use aliases.
PS. For other reasons, it is also good advice to avoid special characters in IDs: just stick to alphanumeric characters and underscores.
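One way to draft such aliases is sketched below, assuming the alias form shown earlier (alias NEW|"OLD"). The helpers clean_label and make_aliases are hypothetical and not part of Luna, and the channel labels are examples only; check the generated lines against the Luna documentation before using them.

```shell
# Hypothetical helper (not part of Luna): replace every character that is
# not a letter, digit or underscore with an underscore
clean_label() { printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9_' '_'; }

# Print one alias line per label, skipping labels that are already clean
make_aliases() {
  for label in "$@"; do
    clean=$(clean_label "$label")
    if [ "$clean" != "$label" ]; then
      printf 'alias %s|"%s"\n' "$clean" "$label"
    fi
  done
}

aliases=$(make_aliases 'C3-M2' 'EEG(2)' 'REF X1' 'EMG')
echo "$aliases"
# prints:
# alias C3_M2|"C3-M2"
# alias EEG_2_|"EEG(2)"
# alias REF_X1|"REF X1"
```

This simple substitution can map two different labels to the same name (e.g. C3-M2 and C3 M2 both become C3_M2), so the output is best reviewed by hand.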
Variables and special characters when using -s
It may be necessary to use quotes, or escape special characters such
as $, if specifying Luna commands on the command line after -s
(rather than from standard input), to stop the shell from processing
those as shell variables.
Use quotes to avoid & and | being interpreted as special characters
by the shell, e.g.:
luna s.lst -s "EPOCH & STATS signal=EEG1|EEG & ANNOTS"
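The effect of $ inside different kinds of quotes can be checked without running Luna. The sketch below assumes a command that contains a variable written as ${ch} (a hypothetical name used only for illustration) and uses a hypothetical helper, show_args, to print what the shell would pass on.

```shell
# Hypothetical helper (not part of Luna): print each argument in brackets
show_args() { printf '[%s]\n' "$@"; }

# A shell variable that happens to share the name used in the command
ch=from_shell

# Double quotes protect & and |, but the shell still expands ${ch}
expanded=$(show_args "EPOCH & STATS signal=${ch}")

# Escaping the $ keeps the text literal inside double quotes
escaped=$(show_args "EPOCH & STATS signal=\${ch}")

# Single quotes keep everything literal
single=$(show_args 'EPOCH & STATS signal=${ch}')

printf '%s\n' "$expanded" "$escaped" "$single"
# prints:
# [EPOCH & STATS signal=from_shell]
# [EPOCH & STATS signal=${ch}]
# [EPOCH & STATS signal=${ch}]
```

Only the last two forms pass ${ch} through unchanged; in the first, the shell substitutes its own value before the program ever sees the command.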
EDF+ support for long integers and floats
The EDF+ spec allows for a logarithmic transformation, which can be helpful for representing floating-point data with a large dynamic range. This is not currently implemented in Luna.