FAQ and trouble-shooting

Although not necessarily asked with respect to Luna, here are some frequently asked questions:

What?

Luna is a C/C++ library focused on the analysis of large numbers of sleep studies encoded as EDFs. This is a free, open-source project. Currently, there is a command-line tool (lunaC) and an extension library for R (lunaR).

Which?

The current version is the beta-release v0.22 (31-March-2019). Use luna -v to display the specific build date/time.

Where?

Luna is developed at the Brigham & Women's Hospital and Harvard Medical School, Boston, MA, United States.

Who?

Luna was primarily developed by Shaun Purcell, with input from a number of colleagues:

  • Susan Redline and her team developing the National Sleep Research Resource
  • Dennis Dean for sharing his original SpectralTrainFig code-base
  • Sara Mariani and Charmaine Demanuele for input on several EEG and ECG analysis components

Interested in contributing (either as a colleague or in a job)? Please contact me.

How?

Luna development is indirectly supported via a number of NIH grants: NHLBI R01HL146339 (PI Purcell), NHLBI R21HL145492 (PI Purcell), NIMH R03 MH108908 (PI Purcell), as well as NHLBI R35HL135818 (PI Redline) and NHLBI R24HL114473 (PI Redline).

Why?

This is a good question and deserves a longer answer... The primary aim of Luna was to provide a platform for 1) adopting some of the elegant methods and models that have emerged from animal and lab-based cognitive neuroscience studies over the past decade or so, and 2) for applying them in the context of large (albeit sometimes noisy) epidemiological studies with polysomnography.

As a relative newcomer to sleep research (my personal background is primarily in psychiatric genetics), the development of Luna has tracked with my (still steep) learning curve, in how to think about sleep signal data. Because of this, I adopted the tools I was most familiar with (namely C/C++ and R), rather than the ubiquitous "in-house Matlab script". In developing Luna though, I've been constantly reminded of how powerful Matlab and its associated toolboxes are for working with electrophysiological signal data. I can also appreciate that working with Luna's particular instantiations of specific methods may be unnecessarily restrictive for some.

So, why wouldn't I just use Matlab? There was, from my perspective, still an unmet need for tools to work with sleep data in thousands of individuals, such as from the NSRR. In my (limited) experience of seeing how others approached sleep data, it seemed clear that although the substantive core of a particular analysis (e.g. power spectral density estimation) could be efficiently and flexibly implemented in a single Matlab command (i.e. pwelch() or similar), a lot of the scaffolding around these one or two central functions (i.e. most of the "work" from a practical perspective) was more often than not a tangle of brittle, error-prone and undocumented scripting. Although not a perfect solution even for our own work, Luna represents a modest step in the direction of building more robust and scalable analysis tools.

I had originally conceived of Luna just as my own personal library of functions that would assist me in my sleep research. However, I decided to document and distribute this code for a number of reasons:

  • to make the tool better: documenting and distributing code has intrinsic value, as this process tends to make the underlying tool better, even if it will only ever be used by yourself or a very small number of people.

  • accessibility and transparency: the sleep field is unfortunately replete with black box proprietary software and file formats which can be limiting; making things open-source lets others see what you've done, and use it without restriction.

  • community: others can build upon your work; in genetics, for example, I developed a tool PLINK, which has been quite widely-used. Since it was first developed (in 2007), however, there have been considerable advances in the scale of data, and in the types of analytic approaches taken. Being an open-source tool, others were able to very significantly augment and even rewrite it, to produce an order-of-magnitude more powerful tool, whilst at the same time maintaining the pipelines and community experience that had been built over more than a decade with PLINK.

For both larger and smaller projects, I'd strongly recommend the document/distribute model whenever practically possible.

Acknowledgments

Luna uses a number of excellent open-source components, in particular:

  • FFTW library

  • SQLite embedded database

  • R Project for Statistical Computing

  • Chapters and example code from Mike X Cohen's fabulously clear and practical book: Analyzing neural time series data

  • Lees, J. M. and J. Park (1995): Multiple-taper spectral analysis: A stand-alone C-subroutine. Computers & Geosciences, 21, 199.

  • Laurent Condat (2013) A Direct Algorithm for 1-D Total Variation Denoising. IEEE Signal Processing Letters, 20:11.

  • Multi-scale entropy (MSE) algorithm by Madalena Costa et al. (Costa M., Goldberger A.L., Peng C.-K. Multiscale entropy analysis of biological signals. Phys Rev E 2005;71:021906.)

Trouble-shooting

Windows line endings

MS Windows uses carriage return (CR) and line feed (LF) characters to denote the end of a line, whereas UNIX-like systems (including Mac) use LF alone. The file command on UNIX-like systems will indicate whether a text file has Windows-style (CRLF) line endings:

file *.txt
foo.txt:     ASCII text, with CRLF line terminators
bar.txt:     ASCII text

Use a utility such as dos2unix to convert these files to UNIX line endings. Alternatively, use the tr tool, available on most systems, to delete the CR characters:

tr -d '\r' < infile.txt > outfile.txt
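When several files are involved, it can help to check each one before converting it. The following is a minimal sketch built around the tr command above; strip_cr is a hypothetical helper name (not part of Luna), and it assumes a writable location for a temporary FILE.tmp next to the input file.

```shell
# strip_cr FILE: remove carriage returns from FILE in place, if any are present
# (hypothetical helper for illustration; not part of Luna)
strip_cr() {
  f="$1"
  cr=$(printf '\r')   # a literal carriage-return character
  # grep -q exits with status 0 if at least one line contains a CR
  if grep -q "$cr" "$f"; then
    # write the cleaned copy to a temporary file, then replace the original
    tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    echo "converted: $f"
  else
    echo "unchanged: $f"
  fi
}
```

Calling strip_cr on each file in turn (e.g. inside a for loop over *.txt) leaves files that already use LF line endings untouched.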

Spaces in channel names or annotations

Set a signal alias in a parameter file. If the original label is REF X1, for example, enter the line:

 alias     REF|"REF X1"

to create a new label alias REF which can be used, e.g. on the command line, instead of REF X1.

Similarly, if masking on an annotation with a space, you need to put quotes around it. For example, the NSRR annotation for REM sleep has spaces and special characters, REM sleep|5. Therefore, in a command file use:

MASK if="REM sleep|5" 

If you are using the -s option to specify commands directly as arguments to Luna, you will likely already be using quotes for the entire command, so you need to escape those additional quotes, i.e.

luna s.lst -o out.db -s "EPOCH & MASK if=\"REM sleep|5\""  

In general, this can get a bit messy. Therefore, 1) use command files for most things, not -s, 2) use aliases, and 3) use sensible channel labels and annotation names whenever possible.

Advice on channel names

Try to keep channel names to simple alphanumeric characters combined with the underscore character to delimit terms. Although Luna will accept spaces and characters such as + - * % ( ) ., etc, in channel names, we advise against them if you wish to use destrat and other tools such as R to process results downstream.

That is, for any output that is stratified by channel (CH), you may wish to create a dataset where each channel corresponds to a column/variable in the output. If a variable name is, for example, SIGMA, then using a command like

destrat out1.db -c CH > my-file.txt 

may create variables with names such as SIGMA.CH.C3-M2 or SIGMA.CH.EEG(2). When loaded into R, this may lead to variable names that are harder to work with (i.e. these characters are swapped to . or you need to quote variable/list names, etc). For example, if you output with channels as row stratifiers:

destrat out1.db -r CH > my-file.txt 

but subsequently use an R command such as dcast (from the reshape2 or data.table packages) to generate a data frame where channels correspond to columns, you'll end up with variable names such as d$C3-M2 which can make life difficult (i.e. R would complain that M2 doesn't exist, as the - is interpreted as a minus, so you'd need to write d$"C3-M2", or find other work-arounds, etc).

To avoid this, use aliases.
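If there are many channels to rename, the alias lines can be generated rather than typed by hand. The sketch below is one possible approach, not a Luna feature: make_aliases is a hypothetical helper, the cleaning rule (replace anything other than letters, digits and underscores with a single underscore) is an assumption, and the output simply mirrors the layout of the alias line shown earlier. Check the result for collisions (two labels that clean to the same name) before using it in a parameter file.

```shell
# make_aliases: read channel labels (one per line) on standard input and
# print one 'alias' line for each label that needs cleaning.
# Hypothetical helper for illustration; the replacement rule is an assumption.
make_aliases() {
  while IFS= read -r label; do
    # replace each run of characters other than letters, digits and underscore
    # with a single underscore, then trim leading/trailing underscores
    clean=$(printf '%s\n' "$label" |
      sed -e 's/[^A-Za-z0-9_][^A-Za-z0-9_]*/_/g' -e 's/^_*//' -e 's/_*$//')
    # only emit an alias when the label actually changed
    if [ "$clean" != "$label" ]; then
      printf 'alias     %s|"%s"\n' "$clean" "$label"
    fi
  done
}

# example labels taken from the text above; EMG is already clean and is skipped
printf '%s\n' 'C3-M2' 'EEG(2)' 'REF X1' 'EMG' | make_aliases
```

With these inputs, the sketch emits aliases C3_M2, EEG_2 and REF_X1 for the three labels containing special characters.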

PS. For other reasons, it is always good advice to avoid special characters in IDs too: just stick to alphanumeric characters and underscores.

Variables and special characters when using -s

It may be necessary to use quotes, or to escape special characters such as $, if specifying Luna commands on the command line after -s (instead of from standard input), to stop the shell from processing those as shell directives.

Use quotes to avoid & or | being interpreted as special characters by the shell, e.g.:

 luna s.lst -s "EPOCH & STATS signal=EEG1|EEG & ANNOTS" 
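To see what the shell actually hands to a program under each quoting style, a stand-in can be used in place of luna. This is only an illustrative sketch: show_args is a hypothetical function that prints its arguments, and sig is an ordinary shell variable defined purely for the demonstration.

```shell
# show_args: stand-in for a program such as luna; prints each argument it
# receives on its own line (hypothetical helper, for illustration only)
show_args() {
  for a in "$@"; do
    printf '[%s]\n' "$a"
  done
}

sig=EEG1   # ordinary shell variable, defined here only for the demonstration

# double quotes: & and | are protected, but $sig is expanded by the shell
show_args -s "EPOCH & STATS signal=$sig"

# single quotes: everything is passed through literally, including $sig
show_args -s 'EPOCH & STATS signal=$sig'

# double quotes with an escaped dollar sign: $sig is also passed literally
show_args -s "EPOCH & STATS signal=\$sig"
```

In the first call the program receives signal=EEG1; in the other two it receives the literal text signal=$sig. In all three cases the whole command string arrives as a single argument after -s.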

EDF+ support for long integers and floats

As noted here, the EDF+ spec allows for a logarithmic transformation which can be helpful to represent floating-point data with a large dynamic range. This is not currently implemented in Luna.