Demonstration data
This demonstration uses a subset of whole-night sleep hd-EEG data from the GRINS (Global Research Initiative on the Neurophysiology of Schizophrenia) project (PMID: 35578829, 39297495, 38858652)
Original data (v1
)
For the purpose of this walkthrough, the "original" subset of the GRINS dataset is as follows:
-
20 high-density (57-channel plus mastoids
A1
andA2
) whole-night recordings, exported as an EDF file -
10 males, 10 females, all healthy controls (versus patients with schizophrenia in the full GRINS study)
-
EEGs and mastoid channels have been down-sampled to 128 Hz from the original sampling rate of 500 Hz, and EOG/EMG channels have been dropped
-
all signals are potentials measured with respect to a common recording (reference) electrode
-
all recordings have manual (AASM) staging and annotated arousals
-
all dates, times and IDs have been swapped out (replaced with start times and dates of
22.00.00
and1.1.85
) -
the 10 females have assigned IDs
F01
toF10
; the 10 males have assigned IDsM01
toM10
-
the tab-separated file
luna-grins/auxiliary/master.txt
contains the age (years) of each individual (27 - 45 years); theluna-grins/auxiliary/
folder also contains some other helper files used throughout the demonstration
This reduced subset of the original GRINS dataset is labelled v1
in this demonstration.
Manipulated data (v2
)
From the original (v1
) dataset we have created a second
manipulated version (v2
), in which we have purposefuly introduced
variations in standards and conventions as well as several flavors of
data corruption. In particular, changes in the v2
dataset include:
-
corrupted files (e.g. truncated, scrambled, misaligned or swapped signals or annotations)
-
variable file formats for annotations (and different EDF conventions including record size and EDF vs EDF+D)
-
changes in signal labels, units, sample rates, polarity and referencing
-
variable pre-filtering
-
duplicated or missing EEGs channels
-
gaps in recordings
The v2
data are intended to mimic some of the typical issues
encountered when working with real data. The first four steps in this
demonstration (file QC, signal QC,
staging and artifacts) follow the steps
of detecting -- and potentially correcting -- some of the (known)
issues with the v2
data, aiming to recapitulate the original v1
dataset as much as possible. The second stage (ten modules in the
analysis step) is focused on working with a cleaned
analysis-ready (i.e. preprocessed) dataset.
The demonstration is structured this way to showcase Luna's approaches to data QC and manipulation as well as the "final" analytic steps: in practice, the former can be just as important and involved as the latter.
Fixes versus redo's
Whereas some things can be effectively
fixed (e.g. interpolating a missing channel), other issues in the
v2
dataset are more serious (e.g. completely corrupted/empty
EDFs) and naturally can't be "fixed" analytically. In these
scenarios, in order to make the full N=20 analysis dataset, we
occasionally retrieve the original (v1
) files: think of this as
(in real life) corresponding to having to re-run the study for that person, or
contacting the data-generating team, e.g. asking them to
re-export/transfer a new EDF.
For reference, the sections below detail the specific manipulations introduced in terms of signals (EDFs) and staging (annotation files). They are not important to study in detail at this point, but can provide a useful reference when stepping through the demonstration; we link to these tables throughout the walkthrough when they are needed.
Signal manipulations
For reference when working through the demonstration, this table summarizes the manipulations introduced into the EDFs:
ID | Signal manipulations | Example channel labels |
---|---|---|
F01 |
N2 epochs only (EDF+D) | Fp1 |
F02 |
Different sample rate (150 Hz) | Fp1 |
F03 |
Different unit (mV) | Fp1 |
F04 |
Wrong unit (mV but says uV) | Fp1 |
F05 |
Filtered (1-20 Hz) | Fp1 |
F06 |
Corrupt EDF | Fp1 |
F07 |
Flipped EEGs | Fp1 |
F08 |
Corrupt EDF | Fp1 |
F09 |
Flipped EEGs | Fp1 |
F10 |
Gapped (EDF+D, staging aligned) | Fp1 |
M01 |
Flat midline EEGs | Fp1 |
M02 |
(none) | Fp1 |
M03 |
Duplicate channels (CZ ->C3 ,C4 ,F3 ,F4 ,P3 ,P4 ) |
Fp1 |
M04 |
Dropped CZ |
Fp1 |
M05 |
(none) | fp1 (lowercase) |
M06 |
Atypical EDF record size (17s) | FP1 (uppercase) |
M07 |
(none) | EEG-Fp1 |
M08 |
(none) | Fp1 REF (spaced) |
M09 |
Linked-mastoid referencing | Fp1-(M1+M2)/2 |
M10 |
Gapped (EDF+D, staging unaligned) | A1 -> M1 , A2 -> M2 |
Annotation manipulations
Initially, all v1
annotations were in Luna's standard .annot
format, with standard stage
labels (N1
,N2
,N3
,R
& W
). In addition, recordings also have manually
annotated arousals (with the label Arousal
).
As detailed below, for some individuals, we manipulated either the
labels (e.g. SlpStg2
instead of N2
, etc), file formats and/or
timing conventions. Further, in a few cases we introduced more severe
changes, e.g. shifting the staging relative to the signal data,
truncating the staging information, or completely scrambling it.
For reference when working through the demonstration, this table summarizes the manipulations introduced into the annotations:
ID | Format | Alternate stage labels | Introduced errors |
---|---|---|---|
F01 |
Standard .annot |
. | . |
F02 |
Standard .annot |
. | Misaligned (+22 epochs) |
F03 |
Reduced columns; +dur specification; variable durations |
S1 ,S2 ,S3 ,REM ,Wake |
File swapped w/ F04 |
F04 |
Reduced columns; +dur specification; variable durations |
S1 ,S2 ,S3 ,REM ,Wake |
File swapped w/ F03 (& mislabeled F04b.annot ) |
F05 |
Reduced columns; ... specification; variable durations |
S1 ,S2 ,S3 ,REM ,Wake |
Misaligned (+0.223s) |
F06 |
Reduced columns; ... specification; variable durations |
S1 ,S2 ,S3 ,REM ,Wake |
. |
F07 |
.annot with clock-time (H:M:S) |
S1 ,S2 ,S3 ,SR ,SW |
. |
F08 |
.annot with clock-time (H:M:S) |
S1 ,S2 ,S3 ,SR ,SW |
. |
F09 |
.annot with date/clock-time (D/M/Y-H:M:S) |
1 ,2 ,3 ,5 ,0 |
. |
F10 |
.annot with date/clock-time (D/M/Y-H:M:S) |
1 ,2 ,3 ,5 ,0 |
. |
M01 |
Comma-delimited (.csv) clock-times | . | . |
M02 |
Comma-delimited (.csv) clock-times | . | Truncated (<500 epochs) |
M03 |
Tab-delimited alternate column order; +dur specification |
. | . |
M04 |
Tab-delimited alternate column order; +dur specification |
. | . |
M05 |
.eannot w/ staging only |
n1 ,n2 ,n3 ,r ,w |
All epochs/rows scrambled |
M06 |
.eannot w/ staging only |
. | . |
M07 |
.eannot w/ staging only |
. | . |
M08 |
.eannot w/ staging only |
. | . |
M09 |
Luna/NSRR XML format | SlpStg1 , SlpStg2 , ... |
. |
M10 |
Luna/NSRR XML format | SlpStg1 , SlpStg2 , ... |
. |
Example annotations
You can skim over this section, which is intended for reference.
Here we show a few lines that are representative of the formatting/labelling variations introduced: in all cases, the lines convey the same fundamental staging information (but are plus or minus the arousal annotation):
Standard .annot (F01
, F02
)
W . . 2280.000 2310.000 .
N1 . . 2310.000 2340.000 .
Arousal . . 2333.500 2354.200 .
N1 . . 2340.000 2370.000 .
N1 . . 2370.000 2400.000 .
N1 . . 2400.000 2430.000 .
N2 . . 2430.000 2460.000 .
The standard form uses start/stop encoding for annotations (columns 4
& 5), with times measured in seconds past the EDF start time. The
periods (.
) in columns 2, 3 and 6 are unused fields, for the
instance ID, associated channel(s), and meta-information.
Reduced columns, altered labels and duration-encoding (+dur
) (F03
, F04
)
Wake 2280.000 +30
S1 2310.000 +30
Arousal 2333.500 +20.7
S1 2340.000 +30
S1 2370.000 +30
S1 2400.000 +30
S2 2430.000 +30
It is permissible to drop columns 2, 3 and 6, in which case Luna
expects an annotation class label, a start time (seconds or
clock-time), and an end time (encoded either as an explicit stop time
(elapsed seconds or clock-time), or alternatively as a duration
(e.g. +30
means 30 seconds from the start of that event) or (as below) an
indication that the annotation spans until the start of the next
annotation (...
).
Reduced columns, altered labels and ellipsis-encoding (...
) (F05
, F06
)
Wake 2280.000 ...
S1 2310.000 ...
S2 2430.000 ...
Ellipsis encoding means that the annotation stops at the start of the next annotation as ordered in that file. Also, here contiguous epochs of the same stage are implicitly a single annotation (i.e. in multiples of 30 seconds). This encoding also means that arousals aren't present in this file; they could be included in a separate annotation file; alternatively, they could be specified in this file prior to any ellipsis-encoded annotations, using start/stop or start/duration encoding.
Clock-times, altered labels (F07
, F08
)
SW . . 22:38:00 22:38:30 .
S1 . . 22:38:30 22:39:00 .
Arousal . . 22:38:53.5 22:39:14.2 .
S1 . . 22:39:00 22:39:30 .
S1 . . 22:39:30 22:40:00 .
S1 . . 22:40:00 22:40:30 .
S2 . . 22:40:30 22:41:00 .
Date/clock-times, altered labels (F09
, F10
)
0 . . 1/1/1985-22:38:00 1/1/1985-22:38:30 .
1 . . 1/1/1985-22:38:30 1/1/1985-22:39:00 .
Arousal . . 1/1/1985-22:38:53.5 1/1/1985-22:39:14.2 .
1 . . 1/1/1985-22:39:00 1/1/1985-22:39:30 .
1 . . 1/1/1985-22:39:30 1/1/1985-22:40:00 .
1 . . 1/1/1985-22:40:00 1/1/1985-22:40:30 .
2 . . 1/1/1985-22:40:30 1/1/1985-22:41:00 .
Comma-delimited, clock-times (M01
, M02
; nb: not a valid Luna format)
W,22:38:00,22:38:30
N1,22:38:30,22:39:00
Arousal,22:38:53.5,22:39:14.2
N1,22:39:00,22:39:30
N1,22:39:30,22:40:00
N1,22:40:00,22:40:30
N2,22:40:30,22:41:00
Alternate ordering (M03
, M04
; nb. not a valid Luna format)
22:38:00 W +30
22:38:30 N1 +30
22:38:53.5 Arousal +20.7
22:39:00 N1 +30
22:39:30 N1 +30
22:40:00 N1 +30
22:40:30 N2 +30
Basic epoch-level annot (M05
- M08
; nb. no arousal annotations)
W
N1
N1
N1
N2
This simple .eannot
(epoch-annotation) format lists one (stage)
label per epoch (assumed to be 30 seconds unless otherwise defined by
the user in Luna).
XML NSRR/Luna format (M09
, M10
)
<?xml version="1.0" encoding="UTF-8"?>
<Annotations>
<Instances>
...
<Instance class="SlpStgW">
<Name>Wake</Name>
<Start>2280.000</Start>
<Duration>30</Duration>
</Instance>
<Instance class="SlpStg1">
<Name>Stage N1</Name>
<Start>2310.000</Start>
<Duration>120</Duration>
</Instance>
<Instance class="Arousal">
<Name>Arousal</Name>
<Start>2333.500</Start>
<Duration>20.7</Duration>
</Instance>
<Instance class="SlpStg2">
<Name>Stage N2</Name>
<Start>2430.000</Start>
<Duration>30</Duration>
</Instance>
...
</Instances>
</Annotations>
Luna supports these different formats on the assumption that getting your original annotation information into the closest of these types of formats above should usually not be too difficult (e.g. involving only line-by-line reformatting rather than more complex manipulations of the information content).
Preparing to embark
To obtain these data and satisfy any other prerequisites of this demonstration (e.g. installation of core software and other tools), move on to the next step: 0-Prep.