POPS

Prediction of sleep stages

POPS & Moonlight

As noted below, the interactive Moonlight viewer provides a simple point-and-click interface to POPS (in prediction mode only), which is appropriate for applying our pre-made single-EEG POPS model to small numbers of EDFs.

This page describes Luna's automated sleep stager (POPS). This vignette also gives some details on POPS and its companion command, SOAP. POPS is generic in the sense that it can be trained on multiple different types of signals. Currently we distribute only a single-EEG model, although more will be added in the near future.

| Command | Description |
|---------|-------------|
| Models | An overview of the current POPS models and expected signals |
| POPS (prediction mode) | Apply automated sleep stage prediction |
| EVAL-STAGES | Apply POPS evaluation metrics to external predictions |
| --eval-stages | As above, but without an attached EDF |
| POPS train (training mode, step 1) | Create level 1 feature matrices for training |
| --pops (training mode, step 2) | Combine trainers for level 2 features and train POPS models |

The easy route

The easiest route to using POPS (at least for small numbers of EDFs) is the point-and-click Moonlight tool. Opening the Hypnogram/POPS panel, we can select the M1 model, specify the single channel to use (EEG, which is C4-M1), and check the box indicating that the signal needs to be band-pass filtered prior to staging. Click Run POPS and 5-10 seconds later you'll have this output:

[ image: Moonlight POPS output ]

The epoch-level predictions are also available in a table in the second sub-panel:

[ image: epoch-level prediction table ]

(It is also possible to set lights-off/on times (approximately anyway) by selecting and dragging the mouse on the top hypnogram - see the main Moonlight tutorial).

However, if you wish to a) use different models, b) alter parameters, and/or c) stage more than a handful of studies in a reproducible manner, then it is highly advised (and ultimately, much easier) to use the standard command-line Luna tool, as described below.

Models

An initial single-EEG POPS model (s2) is hosted here:

URL = http://zzz.bwh.harvard.edu/dist/luna/pops.zip

Download this ZIP file, and extract it to give a pops/ folder.

The model was trained on ~3,500 individuals from the NSRR; although it is a single-EEG model, both C3-M2 and C4-M1 channels were used in training, i.e. on the assumption that they are effectively interchangeable in this context. The models were trained using the workflow described below. The next section shows how to use these models for prediction.

The core s2 (aka Moonlight's M1, see below) model requires the following:

  • a single central EEG, based on a contralateral mastoid reference, i.e. C3-M2 or C4-M1. In practice, similar EEG channels can be swapped in and should still perform similarly, e.g. F3-M2.

  • channels must be band-pass filtered 0.3 - 35 Hz

  • as well as this primary channel (given the placeholder label CEN below), the M1 model expects a parallel, standardized version (called ZEN below), which can be obtained via the ROBUST-NORM Luna command.

Putting this together, here are the Luna commands to prepare and run POPS with this model, starting from a single EEG channel, say C3_M2:

luna s.lst -s ' FILTER sig=C3_M2 bandpass=0.3,35 tw=0.5 ripple=0.02
                COPY sig=C3_M2 tag=NORM
                ROBUST-NORM sig=C3_M2_NORM epoch winsor=0.005 second-norm=T
                POPS alias=CEN,ZEN|C3_M2,C3_M2_NORM path=pops lib=s2 '

i.e. this performs all the steps of a) filtering, b) creating a copy of the signal, and c) normalizing that second copy; finally, the last command calls POPS, indicates via alias that the expected slots CEN and ZEN correspond to C3_M2 and C3_M2_NORM as created above, then specifies the folder containing the POPS models/files, and indicates which library to use (s2). This of course assumes that the pops/ folder has been downloaded and, in this instance, is in the current directory. POPS will rescale (to uV) and resample (to 128 Hz) the EEGs, if needed (as these properties are specified in the .ftr file).

This model can be used in two different ways, which correspond to the M1 and M2 labels in the Moonlight instantiation of POPS; both are based on the same underlying s2 model, however.

  • The M1 instantiation takes a single EEG channel from the test subject and makes predictions, as above.

  • The M2 instantiation differs at the prediction stage: rather than selecting, say, either C3-M2 or C4-M1 from the test subject, it uses the equiv POPS option to take both EEG channels, predict from each separately, and automatically select the most "confident" prediction for each epoch. The requirements are the same, except that two CEN/ZEN channel pairs are needed, as shown in the sketch below.
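
For example, a minimal sketch of an M2-style run, assuming the EDF contains both C3_M2 and C4_M1 (the channel names here are illustrative; the alias/equiv forms follow the parameter table below):

luna s.lst -s ' FILTER sig=C3_M2,C4_M1 bandpass=0.3,35 tw=0.5 ripple=0.02
                COPY sig=C3_M2,C4_M1 tag=NORM
                ROBUST-NORM sig=C3_M2_NORM,C4_M1_NORM epoch winsor=0.005 second-norm=T
                POPS alias=CEN,ZEN|C3_M2,C3_M2_NORM equiv=CEN,ZEN|C4_M1,C4_M1_NORM path=pops lib=s2 '

That is, the same pre-processing is applied to both EEGs; alias maps the first pair to the CEN/ZEN slots, and equiv supplies the second, equivalent pair.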

POPS files

When unzipping pops.zip (or creating new models from scratch), you'll see the following file types:

| File | Description |
|------|-------------|
| s2.ftr | Feature file, specifying which features to compute per epoch from the raw EEG |
| s2.mod | Model file, generated by LightGBM after training the model |
| s2.conf | Configuration file, to control aspects of LightGBM when fitting the model |
| s2.ranges | Ranges file, representing the distribution of the training data |
| s2.*.svd | SVD files, specified by s2.ftr (used to project new data into the SVD component space) |

Most users will only use POPS in prediction mode, which is far simpler. Given a pre-trained model (as can be downloaded above) the core POPS command is in its most basic form just:

POPS path=pops lib=s2

As noted above, prior steps will likely be needed to get the EEG channels into the required format, and to indicate which EDF channels correspond to the channel labels used in the .ftr file.

However, if you want to train your own models (e.g. based on other populations, etc), you can use POPS in training mode without too much effort. The basic workflow for training POPS models is two-step:

  POPS train   
    - inputs: signals (EDFs) plus a feature definition file (.ftr)
    - output: epoch-by-feature (level 1) matrices (binary files)

  --pops
    - inputs: concatenated binary feature matrices & LGBM config (.conf) file
    - output: a `.mod` file (LGBM model) plus several auxiliaries, as above

The basic workflow for prediction using a previously created POPS model is a single step:

   POPS
    - inputs: signals (EDFs), feature, model files (.ftr,  .mod) & auxiliaries 
    - output: posterior probabilities & most likely stages per epoch

The feature file (here s2.ftr) is a text file with a special syntax that specifies which epoch-level features to extract from the raw signals; the same file is used both to train models and to predict in new individuals. Feature files are thus the key link between the raw signal data and the epoch-level feature matrix that is the fundamental input for sleep staging in Luna; see below for a detailed overview.

The .mod file is generated by LightGBM (the engine that powers POPS) and need not be inspected manually. The .conf file is only needed when training models, not when using POPS to predict sleep stages. As in the examples above, if these files reside in the folder pops/, then path=pops lib=s2 tells POPS where to find them.

POPS (prediction)

Single observation stage accuracies and probabilities

Parameters

Primary options:

| Option | Example | Description |
|--------|---------|-------------|
| path | /path/to/pops | Folder where the POPS feature, model and other files reside |
| lib | s2 | Name of the library in the path folder (i.e. the root name for .ftr, .mod, etc.) |
| alias | CEN,ZEN\|C3_M2,C3_M2_NORM | Assign CEN (from the .ftr) to be C3_M2 (from the EDF), etc. |
| equiv | CEN,ZEN\|C4_M1,C4_M1_NORM | Use C4_M1 as a second equivalent alongside the original CEN; likewise C4_M1_NORM for ZEN |
| replace | CEN,C3 | Replace EDF channel CEN with C3 |
| lights-off | 23:00:00 | Lights-off time (ignore epochs before) |
| lights-on | 08:00:00 | Lights-on time (ignore epochs after) |
| SHAP | | Estimate SHAP information scores (takes longer to run) |
| SHAP-epoch | | Estimate epoch-level SHAP information scores (takes longer to run; verbose output) |

Output

The primary output of running POPS in prediction mode is a set of posterior probabilities for each epoch, along with the most likely stage.

If the dataset contained manual staging, Luna will also report a suite of accuracy measures and print confusion matrices to the console (i.e. on the assumption that the original staging represents a gold standard).

In addition, POPS adds a set of annotations, labelled either N1, N2, N3, R and W (if no original staging was present), or pN1, pN2, pN3, pR and pW if there was original staging. In practice, one may then want to add a command such as

WRITE-ANNOTS annot=pN1,pN2,pN3,pR,pW file=^-pops.annot hms

after running POPS to make predictions. If the annotations were output with p (POPS) prefixes, you can tell Luna to use them as the standard staging annotations via ss-prefix=p (or ss-pops=T). For example, if the POPS annotations for individual id1 were previously written to id1-pops.annot:

luna s.lst ss-prefix=p annot-file=id1-pops.annot -s HYPNO

Individual-level output (strata: none)

| Variable | Description |
|----------|-------------|
| ACC | Accuracy |
| ACC3 | 3-class accuracy (NR/R/W) |
| K | Kappa |
| K3 | 3-class kappa |
| MCC | Matthews correlation coefficient |
| MCC3 | 3-class MCC |
| F1 | F1 score |
| F13 | 3-class F1 score |
| F_WGT | Weighted F1 |
| RECALL | Recall |
| RECALL3 | 3-class recall |
| RECALL_WGT | Weighted recall |
| PREC | Precision |
| PREC3 | 3-class precision |
| PREC_WGT | Weighted precision |
| REM_LAT_OBS | Observed REM latency |
| REM_LAT_PRD | Predicted REM latency |
| SLP_LAT_OBS | Observed sleep latency |
| SLP_LAT_PRD | Predicted sleep latency |

Epoch-level outputs (stratum: E)

| Variable | Description |
|----------|-------------|
| PRIOR | Prior staging, if present |
| PRED | Predicted (most likely) stage |
| CONF | Confidence score (highest posterior) |
| FLAG | Flagged if an issue/outlier (0/1) |
| START | Start time of epoch |
| STOP | Stop time of epoch |
| PP_N1 | Posterior probability, N1 |
| PP_N2 | Posterior probability, N2 |
| PP_N3 | Posterior probability, N3 |
| PP_R | Posterior probability, REM |
| PP_W | Posterior probability, wake |
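
As an aside, one way to dump this epoch-level table as plain text (assuming POPS output was saved to out.db via -o, as in the example below) is via destrat:

destrat out.db +POPS -r E -v PRED CONF PP_N1 PP_N2 PP_N3 PP_R PP_W > pops-epochs.txt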

Stage-specific metrics (stratum: SS)

| Variable | Description |
|----------|-------------|
| ORIG | Stage duration, original staging |
| PRF | Stage duration, predicted, weighted by posteriors |
| PR1 | Stage duration, predicted, counting most-likely epochs |
| F1 | F1 score |
| PREC | Precision |
| RECALL | Recall |
| OBS | Number of observations |

Feature-level summaries (stratum: FTR)

| Variable | Description |
|----------|-------------|
| LABEL | Feature label |
| LABEL_ORIG | Original label |
| INC | Included? (0/1) |
| DROPPED | Dropped? (0/1) |
| FINAL | Column number in the final feature matrix, if present |
| LEVEL | Level 1 or level 2 feature? |
| BAD | Bad feature (outliers)? |
| BLOCK | Block label |
| PROP | Proportion of missing observations |
| ROOT | Label root name |

Epoch-level accuracy by transition class (stratum: ETYPE)

| Variable | Description |
|----------|-------------|
| ACC | Accuracy |
| N | Number of events |

See the SOAP documentation for a description of the various ETYPE levels.

Stage-specific epoch-level accuracy by transition class (strata: SS x ETYPE)

| Variable | Description |
|----------|-------------|
| ACC | Accuracy |
| N | Number of events |

SHAP information scores (strata: SS x FTR)

| Variable | Description |
|----------|-------------|
| SHAP | SHAP value for that stage/feature |

Epoch-level confusion matrix (strata: OBS x PRED)

| Variable | Description |
|----------|-------------|
| N | Number of epochs |
| P_COND_OBS | Probability of predicted stage, conditional on observed stage |
| P_COND_PRED | Probability of observed stage, conditional on predicted stage |

Example

Taking the second individual from the tutorial dataset:

luna s.lst 2 -o out.db \
            -s ' FILTER sig=EEG bandpass=0.3,35 tw=0.5 ripple=0.02
                 COPY sig=EEG tag=NORM
                 ROBUST-NORM sig=EEG_NORM epoch winsor=0.005 second-norm=T
                 POPS alias=CEN,ZEN|EEG,EEG_NORM path=pops lib=s2
                 WRITE-ANNOTS annot=pN1,pN2,pN3,pR,pW file=pops.annot hms '

This gives some verbose information to the console, describing the creation of the feature matrix:

  reading feature specification from pops/s2.ftr
   396 level-1 features, 109 level-2 features
   113 of 505 features selected in the final feature set
  read 65 valid feature mean/SD ranges from pops/s2.ranges
  set 0 leading/trailing sleep epochs to '?' (given end-wake=120 and end-sleep=5)
  expecting 396 level-1 features (for 1195 epochs) and 2 signals
  applying Welch with 4s segments (2s overlap), using median over segments
  resampling channel EEG from sample rate 125 to 128
  resampling channel EEG_NORM from sample rate 125 to 128
  pruning rows from 1195 to 1195 epochs
   - adding level-2 feature SVD: SPEC1 (n=98) --> SPEC1.SVD (n=6, cols:396-401) 
   - reading SVD W and V from pops/s2.spec1.svd
   - adding level-2 feature SVD: SPEC2 (n=98) --> SPEC2.SVD (n=6, cols:402-407) 
   - reading SVD W and V from pops/s2.spec2.svd
   - adding level-2 feature SVD: RSPEC1 (n=98) --> RSPEC1.SVD (n=4, cols:408-411) 
   - reading SVD W and V from pops/s2.rspec1.svd
   - adding level-2 feature SVD: RSPEC2 (n=98) --> RSPEC2.SVD (n=4, cols:412-415) 
   - reading SVD W and V from pops/s2.rspec2.svd
   - adding level-2 feature SMOOTH: SPEC1.SVD (n=6) --> SPEC1.SVD.SMOOTHED1 (n=6, cols:416-421) 
   - adding level-2 feature SMOOTH: SPEC2.SVD (n=6) --> SPEC2.SVD.SMOOTHED1 (n=6, cols:422-427) 
   - adding level-2 feature SMOOTH: MISC1 (n=4) --> MISC1.SMOOTHED1 (n=4, cols:428-431) 
   - adding level-2 feature SMOOTH: SPEC1.SVD (n=6) --> SPEC1.SVD.SMOOTHED2 (n=6, cols:432-437) 
   - adding level-2 feature SMOOTH: SPEC2.SVD (n=6) --> SPEC2.SVD.SMOOTHED2 (n=6, cols:438-443) 
   - adding level-2 feature SMOOTH: MISC1 (n=4) --> MISC1.SMOOTHED2 (n=4, cols:444-447) 
   - adding level-2 feature SMOOTH: SPEC1.SVD (n=6) --> SPEC1.SVD.SMOOTHED3 (n=6, cols:448-453) 
   - adding level-2 feature SMOOTH: SPEC2.SVD (n=6) --> SPEC2.SVD.SMOOTHED3 (n=6, cols:454-459) 
   - adding level-2 feature SMOOTH: MISC1 (n=4) --> MISC1.SMOOTHED3 (n=4, cols:460-463) 
   - adding level-2 feature SMOOTH: RSPEC1.SVD (n=4) --> RSPEC1.SVD.SMOOTHED1 (n=4, cols:464-467) 
   - adding level-2 feature SMOOTH: RSPEC2.SVD (n=4) --> RSPEC2.SVD.SMOOTHED1 (n=4, cols:468-471) 
   - adding level-2 feature SMOOTH: RSPEC1.SVD (n=4) --> RSPEC1.SVD.SMOOTHED2 (n=4, cols:472-475) 
   - adding level-2 feature SMOOTH: RSPEC2.SVD (n=4) --> RSPEC2.SVD.SMOOTHED2 (n=4, cols:476-479) 
   - adding level-2 feature SMOOTH: RSPEC1.SVD (n=4) --> RSPEC1.SVD.SMOOTHED3 (n=4, cols:480-483) 
   - adding level-2 feature SMOOTH: RSPEC2.SVD (n=4) --> RSPEC2.SVD.SMOOTHED3 (n=4, cols:484-487) 
   - adding level-2 feature NORM: MISC1 (n=4) --> ZMISC1 (n=4, cols:488-491) 
   - adding level-2 feature NORM: MISC1.SMOOTHED1 (n=4) --> ZMISC1.SMOOTHED1 (n=4, cols:492-495) 
   - adding level-2 feature NORM: MISC1.SMOOTHED2 (n=4) --> ZMISC1.SMOOTHED2 (n=4, cols:496-499) 
   - adding level-2 feature NORM: MISC1.SMOOTHED3 (n=4) --> ZMISC1.SMOOTHED3 (n=4, cols:500-503) 
   - adding level-2 feature TIME: --> TIME1 (n=1, cols:504-504) 
  feature matrix: 1195 rows (epochs) and 113 columns (features)
  set 1765 ( prop = 0.0130707) data points to missing
  read model from pops/s2.mod (1000 iterations)

After making the predictions, POPS creates the annotations and outputs the kappa (if observed staging data are present):

  adding POPS annotations (pN1, pN2, pN3, pR, pW)
  kappa = 0.821632; 3-class kappa = 0.871336 (n = 1195 epochs)

Note that these are the same kappas as output by Moonlight above.

This also prints the 5-class confusion matrix:

  Confusion matrix: 
    Pred:      W      R     N1     N2     N3    Tot
  Obs:   W   465      1     13      1      0    0.4  
         R     0    117      2      1      0    0.1 
        N1     5      1      4      1      0   0.01
        N2    14     53     11    312      9   0.33
        N3     1      0      0     37    147   0.15
       Tot: 0.41   0.14   0.03   0.29   0.13    1.0

In this instance, the kappa is quite high (0.82 for 5-class, 0.87 for 3-class). Naturally, depending on a) the depth/consistency of sleep, b) the quality of the signals and c) any other technical differences between the test data and the training data, the performance might not be as good.

For example, applying the same model to the first tutorial individual, the initial kappa is much lower (<0.4). However, this is in large part because of an extended period of artifact after the lights-on point of the recording. Setting the lights-on option to exclude those epochs increases the accuracy of prediction considerably (because of the normalization step involved in pre-processing), as sketched below. We'll be adding vignettes in the future to consider best practice for applying POPS and adding new models (e.g. including EOGs, EMGs, etc.).
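
For example, a sketch of such a run (the lights-on clock time here is purely illustrative, not taken from the tutorial data):

luna s.lst 1 -o out.db \
            -s ' FILTER sig=EEG bandpass=0.3,35 tw=0.5 ripple=0.02
                 COPY sig=EEG tag=NORM
                 ROBUST-NORM sig=EEG_NORM epoch winsor=0.005 second-norm=T
                 POPS alias=CEN,ZEN|EEG,EEG_NORM path=pops lib=s2 lights-on=07:30:00 '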

EVAL-STAGES

Evaluates an external set of stages against the internal set

Given an external file (in .eannot format) of predicted stages, this command will read them, compare them to the observed stages (i.e. from the original annotations), and generate the same table of statistics that POPS outputs.

Parameters
| Option | Example | Description |
|--------|---------|-------------|
| file | stage.txt | File with staging |

Output

The key outputs are the same as for POPS (kappas, accuracies and confusion matrices); see above.

Example

If we had extracted the annotations from the previous POPS predictions into the file stage.txt in .eannot format, EVAL-STAGES would reproduce the metrics from the original POPS run. One hedged sketch of building that file (assuming the epoch-level PRED variable holds plain stage labels, and that output was saved to out.db as above):
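
destrat out.db +POPS -r E -v PRED | awk ' NR>1 { print $3 } ' > stage.txt

Then the following command gives the same metrics as the original POPS command: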

luna s.lst 2 -o out.db -s EVAL-STAGES file=stage.txt
  kappa = 0.821632; 3-class kappa = 0.871336 (n = 1195 epochs)
  Confusion matrix: 
    Pred:      W       R     N1     N2     N3    Tot
  Obs:   W   465       1     13      1      0    0.4
         R     0     117      2      1      0    0.1
        N1     5       1      4      1      0   0.01
        N2    14      53     11    312      9   0.33
        N3     1       0      0     37    147   0.15
      Tot:   0.41   0.14   0.03   0.29   0.13   1.00

i.e. EVAL-STAGES just performs the final comparison steps of POPS, but rather than using the POPS model to generate the predictions, it swaps in an external set of predictions. This can be useful, for example, if you want to compare the performance of another stager on exactly the same set of metrics as POPS.

--eval-stages

Evaluates an external set of stages against another external set

This is similar to EVAL-STAGES, except there is no attached EDF/annotation set. It simply takes two .eannot-style files of stages of the same length (one row per epoch, as shown below) and generates agreement statistics.
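
For example, an .eannot stage file is simply one stage label per (30-second) epoch:

W
W
N1
N2
N2
R
...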

Parameters
| Option | Example | Description |
|--------|---------|-------------|
| file | stage.txt | File with staging |
| file2 | stage2.txt | Second file with staging |

Output

As above.

Example

luna --eval-stages --opt file=stages1.txt file2=stages2.txt

POPS (training)

Flexibly specify and train new POPS models

This is an advanced section that covers using POPS to build your own stager.

Feature files

Although understanding the details of POPS feature files is not necessary in order to use POPS to predict new stages, it is useful to review them briefly.

Here are the features understood by POPS:

Level 1 features:

| Feature | Arguments | Description |
|---------|-----------|-------------|
| SPEC | | Power (default 0.25 Hz bins, from 4-sec windows) |
| RSPEC | | Relative power |
| VSPEC | | Intra-epoch variance in power |
| BAND | | Band power |
| RBAND | | Relative band power |
| VBAND | | Intra-epoch variance in band power |
| COH | | Magnitude-squared coherence |
| SLOPE | | Spectral slope (30-45 Hz) |
| SKEW | | Skewness |
| KURTOSIS | | Kurtosis |
| HJORTH | | Hjorth parameters |
| FD | | Fractal dimension |
| PE | from, to | Permutation entropy (orders 3 to 7) |
| MEAN | | Epoch mean |
| OUTLIERS | th | Remove outlier epochs |
| COVAR | | Individual-level/demographic covariates (from vars) |

Level 2 features:

| Feature | Arguments | Description |
|---------|-----------|-------------|
| TIME | order | Time track |
| SMOOTH | block, half-window | Smoothing window |
| DENOISE | block, lambda (e.g. 0.5) | Total-variation denoiser |
| SVD | block, nc, file | PCA/SVD |
| NORM | block | Normalize |
| RESCALE | block | |
| CUMUL | block | Make epoch-level cumulative features |
| DERIV | block | Make epoch-level derivative features |

Here is the main s2.ftr file (with comments interleaved):

% --------------------------------------------------------------------------------
% Declare any channels used (required), sample rates
% --------------------------------------------------------------------------------

% trained channel label = generic 'CEN'
%  CEN central EEG, filtered
%  ZEN central EEG, normed

CH CEN 128 uV
CH ZEN 128 uV

It is possible to specify aliases for CEN and ZEN (i.e. these are effectively placeholder labels) in the .ftr file (e.g. CH CEN C3_M2 C4_M1 128 uV) but, as above, we can also use the alias option of POPS to do this for a given dataset.

Next, we specify the main level 1 features, i.e. the core features calculated independently for each individual:

% --------------------------------------------------------------------------------
% Level 1 features
%  block : feature {key=value key=value}
% --------------------------------------------------------------------------------

spec1: SPEC     CEN lwr=0.75 upr=25 
spec2: SPEC     ZEN lwr=0.75 upr=25 

rspec1: RSPEC    CEN lwr=0.75 upr=25 z-lwr=0.75 z-upr=25
rspec2: RSPEC    ZEN lwr=0.75 upr=25 z-lwr=0.75 z-upr=25

misc1: FD       ZEN 
misc1: PE       ZEN from=4 to=4
misc1: HJORTH   ZEN 

The full table of features is given above. Note that features are assigned to a block (e.g. spec1): an arbitrary label that can be used later in the feature definition file to refer to that set of features. For example, spec1 maps to 98 columns, from 0.75 Hz to 25 Hz in (by default) 0.25 Hz increments, i.e. (25 - 0.75)/0.25 + 1 = 98 bins.

We next specify that epochs will be removed if any feature contains an extreme outlier value (10 SD units):

% --------------------------------------------------------------------------------
% Epoch/row exclusions based on level-1 features
% --------------------------------------------------------------------------------

misc1: OUTLIERS th=10

Level 2 features are based on level 1 features for one or more individuals. These are calculated on-the-fly when training models; they may also involve data reduction methods (SVD) that depend on multiple individuals, as below:

% --------------------------------------------------------------------------------
% Level 2 features: 
%  to-block: feature block=from-block {key=value}
% --------------------------------------------------------------------------------

spec1.svd:  SVD block=spec1  nc=6 file=s2.spec1.svd
spec2.svd:  SVD block=spec2  nc=6 file=s2.spec2.svd

rspec1.svd:  SVD block=rspec1  nc=4 file=s2.rspec1.svd
rspec2.svd:  SVD block=rspec2  nc=4 file=s2.rspec2.svd

That is, the SVD command takes all the variables in the spec1 block, normalizes them within individual, fits a single SVD across all epochs and all individuals, and then extracts the top 6 components. In training mode, Luna saves the SVD to the file s2.spec1.svd; in prediction mode, Luna reads s2.spec1.svd and uses it to project the new data, deriving the 6 new variables (i.e. summaries of the original 98 spectral values, in this case).
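
In matrix terms, a sketch of the presumed mechanics (an assumption, inferred from the fact that only W and V are stored):

  X     = U W V^T       (SVD of the normalized training feature block)
  U_new = X_new V W^-1  (projection of new data onto the same components)

i.e. the rotation (V) and scaling (W) learned in training are re-applied at prediction time to obtain component scores for new epochs.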

Next, we apply some temporal smoothing:

% --------------------------------------------------------------------------------
% Temporal smoothing
% --------------------------------------------------------------------------------

spec1.svd.smoothed1:  SMOOTH block=spec1.svd  half-window=2
spec2.svd.smoothed1:  SMOOTH block=spec2.svd  half-window=2
misc1.smoothed1:      SMOOTH block=misc1      half-window=2

spec1.svd.smoothed2:  SMOOTH block=spec1.svd  half-window=10
spec2.svd.smoothed2:  SMOOTH block=spec2.svd  half-window=10
misc1.smoothed2:      SMOOTH block=misc1      half-window=10

spec1.svd.smoothed3:  SMOOTH block=spec1.svd  half-window=25
spec2.svd.smoothed3:  SMOOTH block=spec2.svd  half-window=25
misc1.smoothed3:      SMOOTH block=misc1      half-window=25

rspec1.svd.smoothed1:  SMOOTH block=rspec1.svd  half-window=2
rspec2.svd.smoothed1:  SMOOTH block=rspec2.svd  half-window=2

rspec1.svd.smoothed2:  SMOOTH block=rspec1.svd  half-window=10
rspec2.svd.smoothed2:  SMOOTH block=rspec2.svd  half-window=10

rspec1.svd.smoothed3:  SMOOTH block=rspec1.svd  half-window=25
rspec2.svd.smoothed3:  SMOOTH block=rspec2.svd  half-window=25

We next normalize some of the smoothed metrics:

% --------------------------------------------------------------------------------
% Normalize 
% --------------------------------------------------------------------------------

zmisc1:   NORM block=misc1
zmisc1.smoothed1:   NORM block=misc1.smoothed1
zmisc1.smoothed2:   NORM block=misc1.smoothed2
zmisc1.smoothed3:   NORM block=misc1.smoothed3

We add a time track (elapsed time from EDF start, scaled from -0.5 to +0.5):

% --------------------------------------------------------------------------------
% Time track
% --------------------------------------------------------------------------------

time1: TIME 
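
A plausible sketch of this scaling (an assumption; the exact form is not spelled out here): for epoch index e = 0, ..., E-1 of E total epochs,

  t_e = e / (E - 1) - 0.5

i.e. a linear ramp from -0.5 at the first epoch to +0.5 at the last.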

Finally, we select the subset of blocks to be used in the final model:

% --------------------------------------------------------------------------------
%
% Final feature selection (blocks as defined above)
%
% --------------------------------------------------------------------------------

SELECT spec1.svd  spec1.svd.smoothed1 spec1.svd.smoothed2 spec1.svd.smoothed3
SELECT spec2.svd  spec2.svd.smoothed1 spec2.svd.smoothed2 spec2.svd.smoothed3

SELECT rspec1.svd  rspec1.svd.smoothed1 rspec1.svd.smoothed2 rspec1.svd.smoothed3
SELECT rspec2.svd  rspec2.svd.smoothed1 rspec2.svd.smoothed2 rspec2.svd.smoothed3

SELECT  misc1      misc1.smoothed1     misc1.smoothed2     misc1.smoothed3
SELECT zmisc1     zmisc1.smoothed1    zmisc1.smoothed2    zmisc1.smoothed3

SELECT time1

Parameters

to be completed

Output

to be completed

Example

Here we generate a toy POPS model based on just 3 individuals (e.g. from the tutorial dataset). Of course, in practice, models should be trained on orders-of-magnitude larger datasets.

Assume that a) s.lst is a sample list pointing to these EDFs and annotations, and b) all studies have existing manual staging data. Given a feature file a.ftr, this first step generates level 1 features for each individual, in the folder data/:

mkdir data
luna s.lst -o out1.db  -s POPS train features=pops/a.ftr data=data/^

ls data
nsrr01   nsrr02   nsrr03

These are binary files (for compactness and speed of reading), i.e. you cannot edit or view them with typical text tools.

xxd  < data/nsrr01  | head
00000000: 066e 7372 7230 3154 0500 00c4 0000 0000  .nsrr01T........
00000010: 0000 0000 0000 008a cca4 e871 c82d 4030  ...........q.-@0
00000020: 4b74 2dec 1226 40ca 8c96 54ae 182e 40e4  Kt-..&@...T...@.
00000030: 2cca eadb 052a 4034 6c83 d8f1 7f22 40f3  ,....*@4l...."@.
00000040: 5450 3873 1f25 404f 0765 4568 d724 408f  TP8s.%@O.eEh.$@.
00000050: 1372 65e9 5d23 40cb 40d4 7899 b31e 40d2  .re.]#@.@.x...@.
00000060: 788f 9aa1 161d 4012 f8f9 da70 7eeb bfdf  x.....@....p~...
00000070: 5551 4b82 3617 4063 7558 d5fd 2e22 40a7  UQK.6.@cuX..."@.
00000080: d51f db9e b10f 401c f1a4 f8c0 3718 40dc  ......@.....7.@.
00000090: 2a9d 05f4 7e00 4098 f66a c532 48bb 3fea  *...~.@..j.2H.?.

The next step is to create a single training feature matrix, by concatenating the binary files for individuals who will be trainers:

cat data/nsrr01 data/nsrr03 > all.dat

We now a) generate the level 2 features, and b) fit the LightGBM model with the --pops command:

luna --pops -o out.db --options data=all.dat path=pops lib=a iterations=100

Please note that one would never use such a small training set in practice... i.e. the .conf file for the LightGBM training would certainly not be applicable, etc. Please consider these only as placeholder notes for now.

The above generates the pops/a.mod file (and some auxiliaries we can ignore for now). These can then be used to make predictions in new samples: for example, for the second individual, who was excluded from the training dataset:

luna s.lst 2 -o out2.db -s POPS path=pops lib=a
  kappa = 0.502582; 3-class kappa = 0.713572 (n = 1195 epochs)
  Confusion matrix: 
    Pred:        W      R   N1     N2   N3    Tot
  Obs:   W     476      1    0      3    0    0.4
         R      68     28    0     24    0    0.1
        N1       9      0    0      2    0   0.01
        N2      84      2    0    313    0   0.33
        N3       2      0    0    183    0   0.15
       Tot:   0.53   0.03    0   0.44    0   1.00

The kappa here is lower than before (0.50), although the 3-class kappa is not terrible (0.71). With such a small training set, and no attention to tuning parameters, this is still a garbage-in/garbage-out example, however; fully describing the training process is beyond the scope of this page.

Note that if you tried to predict the first individual (who was also in the training dataset), you'd see inflated, unrealistic kappa values:

  kappa = 0.877568; 3-class kappa = 0.912719 (n = 1364 epochs)
  Confusion matrix: 
      Pred:    W      R     N1     N2   N3   Tot
  Obs:  W    467      3      2      5   0   0.35
        R     12    215      0     11   0   0.17
       N1     25      7     55     22   0   0.08
       N2      7      2      0    514   0   0.38
       N3      0      0      0     17   0   0.01
     Tot:   0.37   0.17   0.04   0.42   0   1.00

Further points

This documentation will be updated in due time:

  • a held-out set of validation individuals can be specified alongside the primary training and test datasets

  • weights can be applied to labels and/or training individuals

  • covariate information (e.g. age/sex) can be added with the COVAR feature and using the vars special variable to attach individual-level variables

  • creating different models using the diverse set of features (and multi-channel extensions) described above; that is, as it stands, POPS provides the framework for developing efficient, robust stagers, and the current model (s2) is only the first step.
