Demonstration data

This demonstration uses a subset of whole-night sleep hd-EEG data from the GRINS (Global Research Initiative on the Neurophysiology of Schizophrenia) project (PMID: 35578829, 39297495, 38858652).

Get the walk-through data from the NSRR

The data for this walk-through are available at the National Sleep Research Resource:

URL = https://sleepdata.org/datasets/luna-grins

Original data (`v1`)

For the purpose of this walkthrough, the "original" subset of the GRINS dataset is as follows:

20 high-density (57-channel plus mastoids A1 and A2) whole-night recordings, exported as an EDF file
10 males, 10 females, all healthy controls (versus patients with schizophrenia in the full GRINS study)
EEGs and mastoid channels have been down-sampled to 128 Hz from the original sampling rate of 500 Hz, and EOG/EMG channels have been dropped
all signals are potentials measured with respect to a common recording (reference) electrode
all recordings have manual (AASM) staging and annotated arousals
all dates, times and IDs have been swapped out (replaced with start times and dates of 22.00.00 and 1.1.85)
the 10 females have assigned IDs F01 to F10; the 10 males have assigned IDs M01 to M10
the tab-separated file luna-grins/auxiliary/master.txt contains the age (years) of each individual (27 - 45 years); the luna-grins/auxiliary/ folder also contains some other helper files used throughout the demonstration

This reduced subset of the original GRINS dataset is labelled v1 in this demonstration.

Manipulated data (`v2`)

From the original (v1) dataset we have created a second, manipulated version (v2), in which we have purposefully introduced variations in standards and conventions as well as several flavors of data corruption. In particular, changes in the v2 dataset include:

corrupted files (e.g. truncated, scrambled, misaligned or swapped signals or annotations)
variable file formats for annotations (and different EDF conventions including record size and EDF vs EDF+D)
changes in signal labels, units, sample rates, polarity and referencing
variable pre-filtering
duplicated or missing EEG channels
gaps in recordings

The v2 data are intended to mimic some of the typical issues encountered when working with real data. The first four steps in this demonstration (file QC, signal QC, staging and artifacts) follow the steps of detecting -- and potentially correcting -- some of the (known) issues with the v2 data, aiming to recapitulate the original v1 dataset as much as possible. The second stage (ten modules in the analysis step) is focused on working with a cleaned analysis-ready (i.e. preprocessed) dataset.

The demonstration is structured this way to showcase Luna's approaches to data QC and manipulation as well as the "final" analytic steps: in practice, the former can be just as important and involved as the latter.

Fixes versus redos

Whereas some things can be effectively fixed (e.g. interpolating a missing channel), other issues in the v2 dataset are more serious (e.g. completely corrupted/empty EDFs) and naturally can't be "fixed" analytically. In these scenarios, in order to make the full N=20 analysis dataset, we occasionally retrieve the original (v1) files: think of this as (in real life) corresponding to having to re-run the study for that person, or contacting the data-generating team, e.g. asking them to re-export/transfer a new EDF.

For reference, the sections below detail the specific manipulations introduced in terms of signals (EDFs) and staging (annotation files). They are not important to study in detail at this point, but can provide a useful reference when stepping through the demonstration; we link to these tables throughout the walkthrough when they are needed.

Signal manipulations

For reference when working through the demonstration, this table summarizes the manipulations introduced into the EDFs:

ID	Signal manipulations	Example channel labels
`F01`	N2 epochs only (EDF+D)	`Fp1`
`F02`	Different sample rate (150 Hz)	`Fp1`
`F03`	Different unit (mV)	`Fp1`
`F04`	Wrong unit (mV but says uV)	`Fp1`
`F05`	Filtered (1-20 Hz)	`Fp1`
`F06`	Corrupt EDF	`Fp1`
`F07`	Flipped EEGs	`Fp1`
`F08`	Corrupt EDF	`Fp1`
`F09`	Flipped EEGs	`Fp1`
`F10`	Gapped (EDF+D, staging aligned)	`Fp1`
`M01`	Flat midline EEGs	`Fp1`
`M02`	(none)	`Fp1`
`M03`	Duplicate channels (`CZ`->`C3`,`C4`,`F3`,`F4`,`P3`,`P4`)	`Fp1`
`M04`	Dropped `CZ`	`Fp1`
`M05`	(none)	`fp1` (lowercase)
`M06`	Atypical EDF record size (17s)	`FP1` (uppercase)
`M07`	(none)	`EEG-Fp1`
`M08`	(none)	`Fp1 REF` (spaced)
`M09`	Linked-mastoid referencing	`Fp1-(M1+M2)/2`
`M10`	Gapped (EDF+D, staging unaligned)	`A1` -> `M1`, `A2` -> `M2`

Annotation manipulations

Initially, all v1 annotations were in Luna's standard .annot format, with standard stage labels (N1,N2,N3,R & W). In addition, recordings also have manually annotated arousals (with the label Arousal).

As detailed below, for some individuals we manipulated the labels (e.g. SlpStg2 instead of N2), the file formats and/or the timing conventions. Further, in a few cases we introduced more severe changes, e.g. shifting the staging relative to the signal data, truncating the staging information, or completely scrambling it.

For reference when working through the demonstration, this table summarizes the manipulations introduced into the annotations:

ID	Format	Alternate stage labels	Introduced errors
`F01`	Standard `.annot`	.	.
`F02`	Standard `.annot`	.	Misaligned (+22 epochs)
`F03`	Reduced columns; `+dur` specification; variable durations	`S1`,`S2`,`S3`,`REM`,`Wake`	File swapped w/ `F04`
`F04`	Reduced columns; `+dur` specification; variable durations	`S1`,`S2`,`S3`,`REM`,`Wake`	File swapped w/ `F03` (& mislabeled `F04b.annot`)
`F05`	Reduced columns; `...` specification; variable durations	`S1`,`S2`,`S3`,`REM`,`Wake`	Misaligned (+0.223s)
`F06`	Reduced columns; `...` specification; variable durations	`S1`,`S2`,`S3`,`REM`,`Wake`	.
`F07`	`.annot` with clock-time (H:M:S)	`S1`,`S2`,`S3`,`SR`,`SW`	.
`F08`	`.annot` with clock-time (H:M:S)	`S1`,`S2`,`S3`,`SR`,`SW`	.
`F09`	`.annot` with date/clock-time (D/M/Y-H:M:S)	`1`,`2`,`3`,`5`,`0`	.
`F10`	`.annot` with date/clock-time (D/M/Y-H:M:S)	`1`,`2`,`3`,`5`,`0`	.
`M01`	Comma-delimited (.csv) clock-times	.	.
`M02`	Comma-delimited (.csv) clock-times	.	Truncated (<500 epochs)
`M03`	Tab-delimited alternate column order; `+dur` specification	.	.
`M04`	Tab-delimited alternate column order; `+dur` specification	.	.
`M05`	`.eannot` w/ staging only	`n1`,`n2`,`n3`,`r`,`w`	All epochs/rows scrambled
`M06`	`.eannot` w/ staging only	.	.
`M07`	`.eannot` w/ staging only	.	.
`M08`	`.eannot` w/ staging only	.	.
`M09`	Luna/NSRR XML format	`SlpStg1`, `SlpStg2`, ...	.
`M10`	Luna/NSRR XML format	`SlpStg1`, `SlpStg2`, ...	.
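
As a rough illustration of what harmonizing the alternate stage labels in the table above involves, here is a minimal Python sketch. It is not how Luna itself remaps labels: the `STAGE_ALIASES` table and the `harmonize` function are hypothetical names introduced here. The pairing of each alias with a standard label is read off the column order of the table (for example, assuming `5` and `SR` both mean `R`), and only the `SlpStg` labels that appear explicitly on this page are included.

```python
# Hypothetical alias table: standard label -> alternate spellings seen in the v2 files.
# Pairings are assumed from the order in which labels are listed in the table above.
STAGE_ALIASES = {
    "N1": ("S1", "1", "n1", "SlpStg1"),
    "N2": ("S2", "2", "n2", "SlpStg2"),
    "N3": ("S3", "3", "n3"),
    "R": ("REM", "SR", "5", "r"),
    "W": ("Wake", "SW", "0", "w", "SlpStgW"),
}

# Invert to a flat alias -> standard lookup, and let standard labels map to themselves.
LOOKUP = {alias: std for std, aliases in STAGE_ALIASES.items() for alias in aliases}
LOOKUP.update({std: std for std in STAGE_ALIASES})


def harmonize(label: str) -> str:
    """Return the standard stage label, leaving non-stage labels (e.g. Arousal) unchanged."""
    return LOOKUP.get(label, label)


print([harmonize(x) for x in ["SW", "S1", "Arousal", "5", "SlpStg2"]])
```

Labels that are not in the lookup pass through untouched, so arousal annotations survive the remapping.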

Example annotations

You can skim over this section, which is intended for reference.

Here we show a few lines that are representative of the formatting/labelling variations introduced. In all cases, the lines convey the same fundamental staging information, although some formats omit the arousal annotation:

Standard .annot (F01, F02)

W       .      .       2280.000        2310.000        .
N1      .      .       2310.000        2340.000        .
Arousal .      .       2333.500        2354.200        .
N1      .      .       2340.000        2370.000        .
N1      .      .       2370.000        2400.000        .
N1      .      .       2400.000        2430.000        .
N2      .      .       2430.000        2460.000        .

The standard form uses start/stop encoding for annotations (columns 4 & 5), with times measured in seconds past the EDF start time. The periods (.) in columns 2, 3 and 6 are unused fields, for the instance ID, associated channel(s), and meta-information.
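
To make the six-column layout concrete, the following Python sketch reads one such line into a small record. It is an illustration only, not Luna's parser: `Annot` and `parse_standard_annot_line` are hypothetical names, and splitting on any whitespace is a simplifying assumption that would break if a field contained spaces.

```python
from typing import NamedTuple


class Annot(NamedTuple):
    label: str
    start: float  # seconds past the EDF start time
    stop: float   # seconds past the EDF start time


def parse_standard_annot_line(line: str) -> Annot:
    """Parse one line with columns: class, instance ID, channel, start, stop, meta."""
    fields = line.split()  # assumption: no field contains embedded whitespace
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    label, _instance, _channel, start, stop, _meta = fields
    return Annot(label, float(start), float(stop))


arousal = parse_standard_annot_line("Arousal .      .       2333.500        2354.200        .")
print(arousal.label, arousal.start, arousal.stop)
```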

Reduced columns, altered labels and duration-encoding (+dur) (F03, F04)

Wake    2280.000    +30
S1      2310.000    +30
Arousal 2333.500    +20.7
S1      2340.000    +30
S1      2370.000    +30
S1      2400.000    +30
S2      2430.000    +30

It is permissible to drop columns 2, 3 and 6, in which case Luna expects an annotation class label, a start time (seconds or clock-time), and an end time. The end time can be encoded as an explicit stop time (elapsed seconds or clock-time), as a duration (e.g. +30 means 30 seconds from the start of that event) or, as in the next example, as an ellipsis (...) indicating that the annotation spans until the start of the next annotation.
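
The difference between an explicit stop time and a `+dur` duration can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Luna's implementation: `resolve_stop` and `parse_reduced_line` are hypothetical names, and only elapsed-second times are handled (clock-times and the ellipsis form are deliberately left out).

```python
def resolve_stop(start: float, stop_field: str) -> float:
    """Turn the third column into a stop time in elapsed seconds."""
    if stop_field.startswith("+"):
        # Duration encoding: "+30" means 30 seconds after the start
        return start + float(stop_field[1:])
    # Otherwise treat the field as an explicit stop time in elapsed seconds
    return float(stop_field)


def parse_reduced_line(line: str) -> tuple[str, float, float]:
    """Parse a three-column line: class label, start, stop-or-duration."""
    label, start_field, stop_field = line.split()
    start = float(start_field)
    return label, start, resolve_stop(start, stop_field)


print(parse_reduced_line("Arousal 2333.500    +20.7"))
print(parse_reduced_line("S1      2310.000    2340.000"))
```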

Reduced columns, altered labels and ellipsis-encoding (...) (F05, F06)

Wake    2280.000    ...
S1      2310.000    ...
S2      2430.000    ...

Ellipsis encoding means that the annotation stops at the start of the next annotation as ordered in that file. Also, here contiguous epochs of the same stage are implicitly a single annotation (i.e. in multiples of 30 seconds). This encoding also means that arousals aren't present in this file; they could be included in a separate annotation file; alternatively, they could be specified in this file prior to any ellipsis-encoded annotations, using start/stop or start/duration encoding.
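
The rule that each ellipsis-encoded annotation runs until the next one starts can be sketched as follows. This is an illustrative Python fragment, not Luna's code: `resolve_ellipsis` is a hypothetical name, and closing the final annotation at a caller-supplied `recording_end` is an assumption made here so that the last row has a defined stop time.

```python
def resolve_ellipsis(rows, recording_end):
    """Give each (label, start) row a stop time equal to the next row's start.

    rows must be in file order; the final row is closed at recording_end (assumption).
    """
    intervals = []
    for i, (label, start) in enumerate(rows):
        stop = rows[i + 1][1] if i + 1 < len(rows) else recording_end
        intervals.append((label, start, stop))
    return intervals


rows = [("Wake", 2280.0), ("S1", 2310.0), ("S2", 2430.0)]
for label, start, stop in resolve_ellipsis(rows, recording_end=2460.0):
    # The S1 interval covers 120 seconds, i.e. four contiguous 30-second epochs
    print(label, start, stop, (stop - start) / 30)
```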

Clock-times, altered labels (F07, F08)

SW      .   .   22:38:00    22:38:30     .
S1      .   .   22:38:30    22:39:00     .
Arousal .   .   22:38:53.5  22:39:14.2   .
S1      .   .   22:39:00    22:39:30     .
S1      .   .   22:39:30    22:40:00     .
S1      .   .   22:40:00    22:40:30     .
S2      .   .   22:40:30    22:41:00     .
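
Clock-times relate to the elapsed-second encoding through the EDF start time, which is 22:00:00 for every recording in this dataset. A minimal Python sketch of that conversion follows; `clock_to_elapsed` is a hypothetical helper, and it assumes that a clock-time earlier than the start time belongs to the following day and that recordings are shorter than 24 hours.

```python
def clock_to_elapsed(clock: str, edf_start: str = "22:00:00") -> float:
    """Convert an H:M:S clock-time to seconds elapsed since the EDF start time."""

    def to_seconds(hms: str) -> float:
        hours, minutes, seconds = hms.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

    elapsed = to_seconds(clock) - to_seconds(edf_start)
    if elapsed < 0:
        # Assumption: the clock has passed midnight, so add one day
        elapsed += 24 * 3600
    return elapsed


print(clock_to_elapsed("22:38:53.5"))  # start of the example arousal
print(clock_to_elapsed("01:30:00"))    # a time after midnight
```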

Date/clock-times, altered labels (F09, F10)

0       .   .   1/1/1985-22:38:00    1/1/1985-22:38:30     .
1       .   .   1/1/1985-22:38:30    1/1/1985-22:39:00     .
Arousal .   .   1/1/1985-22:38:53.5  1/1/1985-22:39:14.2   .
1       .   .   1/1/1985-22:39:00    1/1/1985-22:39:30     .
1       .   .   1/1/1985-22:39:30    1/1/1985-22:40:00     .
1       .   .   1/1/1985-22:40:00    1/1/1985-22:40:30     .
2       .   .   1/1/1985-22:40:30    1/1/1985-22:41:00     .
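
With an explicit date, the midnight ambiguity disappears and the elapsed time is a plain difference of two timestamps. The Python sketch below uses the standard `datetime` module; `parse_stamp` and `elapsed_seconds` are hypothetical helpers, and the default start of 1/1/1985-22:00:00 is taken from the anonymized start date and time described for this dataset.

```python
from datetime import datetime


def parse_stamp(stamp: str) -> datetime:
    """Parse D/M/Y-H:M:S, with or without fractional seconds."""
    for fmt in ("%d/%m/%Y-%H:%M:%S.%f", "%d/%m/%Y-%H:%M:%S"):
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date/clock-time: {stamp!r}")


def elapsed_seconds(stamp: str, edf_start: str = "1/1/1985-22:00:00") -> float:
    """Seconds between the EDF start and the given date/clock-time."""
    return (parse_stamp(stamp) - parse_stamp(edf_start)).total_seconds()


print(elapsed_seconds("1/1/1985-22:38:53.5"))
print(elapsed_seconds("2/1/1985-01:30:00"))  # 2 January, after midnight
```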

Comma-delimited, clock-times (M01, M02; nb: not a valid Luna format)

W,22:38:00,22:38:30
N1,22:38:30,22:39:00
Arousal,22:38:53.5,22:39:14.2
N1,22:39:00,22:39:30
N1,22:39:30,22:40:00
N1,22:40:00,22:40:30
N2,22:40:30,22:41:00

In the demonstration, these files need to be altered before Luna can process them (swap commas with tabs).
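
One way to swap commas for tabs, sketched in Python with the standard `csv` module; the demonstration itself may use a different tool. `csv_to_tab` is a hypothetical helper that works on text already read from the file.

```python
import csv
import io


def csv_to_tab(text: str) -> str:
    """Rewrite comma-delimited rows as tab-delimited rows, skipping blank lines."""
    rows = csv.reader(io.StringIO(text))
    return "\n".join("\t".join(row) for row in rows if row)


sample = "W,22:38:00,22:38:30\nN1,22:38:30,22:39:00\nArousal,22:38:53.5,22:39:14.2\n"
print(csv_to_tab(sample))
```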

Alternate ordering (M03, M04; nb: not a valid Luna format)

22:38:00    W        +30
22:38:30    N1       +30
22:38:53.5  Arousal  +20.7
22:39:00    N1       +30
22:39:30    N1       +30
22:40:00    N1       +30
22:40:30    N2       +30

In the demonstration, these files need to be altered before Luna can process them (swap column order).
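
The column swap amounts to moving the class label in front of the start time. A minimal Python sketch, assuming whitespace-separated fields with no embedded spaces; `reorder_line` is a hypothetical helper, and the demonstration itself may use a different tool.

```python
def reorder_line(line: str) -> str:
    """Turn 'start  label  stop' into tab-delimited 'label, start, stop'."""
    start, label, stop = line.split()
    return "\t".join((label, start, stop))


for line in ["22:38:00    W        +30", "22:38:53.5  Arousal  +20.7"]:
    print(reorder_line(line))
```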

Basic epoch-level annot (M05 - M08; nb. no arousal annotations)

W
N1
N1
N1
N1
N2

This simple .eannot (epoch-annotation) format lists one (stage) label per epoch (assumed to be 30 seconds unless otherwise defined by the user in Luna).
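
Because an .eannot file carries no times, each label's interval follows from its row number and the epoch length. The Python sketch below illustrates that arithmetic; `eannot_to_intervals` is a hypothetical helper, and it assumes the first row corresponds to the epoch starting at 0 seconds.

```python
def eannot_to_intervals(labels, epoch_sec=30.0):
    """Map the i-th label (0-based) to the interval [i * epoch_sec, (i + 1) * epoch_sec)."""
    return [
        (label, i * epoch_sec, (i + 1) * epoch_sec)
        for i, label in enumerate(labels)
    ]


# Hypothetical first three rows of a file
for label, start, stop in eannot_to_intervals(["W", "W", "N1"]):
    print(label, start, stop)
```

Under this assumption, the epoch starting at 2280 seconds in the earlier examples is row 77 of the file (2280 / 30 = 76 full epochs precede it).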

XML NSRR/Luna format (M09, M10)

<?xml version="1.0" encoding="UTF-8"?>
<Annotations>
<Instances>

...

<Instance class="SlpStgW">
 <Name>Wake</Name>
 <Start>2280.000</Start>
 <Duration>30</Duration>
</Instance>

<Instance class="SlpStg1">
 <Name>Stage N1</Name>
 <Start>2310.000</Start>
 <Duration>120</Duration>
</Instance>

<Instance class="Arousal">
 <Name>Arousal</Name>
 <Start>2333.500</Start>
 <Duration>20.7</Duration>
</Instance>

<Instance class="SlpStg2">
 <Name>Stage N2</Name>
 <Start>2430.000</Start>
 <Duration>30</Duration>
</Instance>

...

</Instances>
</Annotations>
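
To show how the start/duration pairs in this XML map back onto start/stop intervals, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It is an illustration, not Luna's reader: `read_instances` is a hypothetical helper, and the embedded sample repeats three of the instances shown above with the XML declaration and elided parts left out.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Annotations>
<Instances>
<Instance class="SlpStgW">
 <Name>Wake</Name>
 <Start>2280.000</Start>
 <Duration>30</Duration>
</Instance>
<Instance class="SlpStg1">
 <Name>Stage N1</Name>
 <Start>2310.000</Start>
 <Duration>120</Duration>
</Instance>
<Instance class="Arousal">
 <Name>Arousal</Name>
 <Start>2333.500</Start>
 <Duration>20.7</Duration>
</Instance>
</Instances>
</Annotations>"""


def read_instances(xml_text: str):
    """Return (class, start, stop) for every <Instance>, with stop = start + duration."""
    root = ET.fromstring(xml_text)
    intervals = []
    for inst in root.iter("Instance"):
        start = float(inst.findtext("Start"))
        duration = float(inst.findtext("Duration"))
        intervals.append((inst.get("class"), start, start + duration))
    return intervals


for row in read_instances(SAMPLE):
    print(row)
```

Note that the single `SlpStg1` instance of 120 seconds carries the same information as four consecutive 30-second N1 epochs in the other formats.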

Luna supports these different formats on the assumption that converting your original annotations into the closest of the formats above should usually not be too difficult (e.g. involving only line-by-line reformatting rather than more complex manipulation of the information content).

Preparing to embark

To obtain these data and satisfy any other prerequisites of this demonstration (e.g. installation of core software and other tools), move on to the next step: 0-Prep.