Artifact detection
Commands to perform artifact detection/correction
Command | Description |
---|---|
CHEP-MASK |
CHannel/EPoch (CHEP) outlier detection |
ARTIFACTS |
Per-epoch Buckelmueller et al. (2006) artifact detection |
LINE-DENOISE |
Line denoising via spectrum interpolation |
SUPPRESS-ECG |
Correct cardiac artifact based on ECG |
ALTER |
Reference-channel regression-based artifact correction |
CHEP-MASK
Epoch-wise Hjorth parameters and other statistics
CHEP masks indicate whether particular CHannel/EPoch pairs are
unusual or likely artifacts. As explained in this
vignette, the CHEP-MASK
is designed to work
hand-in-hand with CHEP
and potentially (in the
context of hdEEG sleep data) INTERPOLATE
commands.
This command calculates and reports per-epoch Hjorth parameters and (optionally) other statistics: signal root mean square (RMS), indices of signal clipping (the proportion of points that equal the minimum or maximum for that epoch), absolute maximum absolute values and flatness (proportion of points of a similar value to the preceding value).
Luna will mask channel/epoch pairs (called cheps here) that are (statistical) outliers (i.e. potentially indicative of signal artifacts) for these measures. For Hjorth parameters, thresholds for outlier detection can be set iteratively: for example, first, to flag epochs more than 2 standard deviations (SD) from the mean, and then second, to flag more epochs that are more than 2 SDs from the mean based on the epochs that survived round 1, etc. This iterative heuristic can be useful if some epochs are outliers by many orders of magnitude, as those can effectively hide other epochs that are less pronounced but outliers nonetheless.
A few notes on using CHEP-MASK
to detect outlying epochs:
-
CHEP-MASK
respects previously set CHEP masks and does not include them in the outlier detection process, i.e. as they have already been removed. To alter this behavior useep-th0
,ch-th0
andchep-th0
options in place ofep-th
,ch-th
andchep-th
(described below). -
You can set a smaller epoch size such as
EPOCH len=4
for finer-grained artifact detection. -
As well as Hjorth parameters,
CHEP-MASK
can flag epochs based on the proportion of sample points in a given epoch that are a) above a certain absolute value (max
), that are clipped (clipped
) or flat (flat
).
Parameters
Core parameters:
Parameter | Example | Description |
---|---|---|
sig |
sig=C3,F3 |
Restrict analysis to these channels |
ep-th |
ep-th=3,3 |
SD-unit threshold(s) for Hjorth within-channel/between-epoch outlier detection |
ch-th |
ch-th=2 |
SD-unit threshold(s) for Hjorth between-channel/within-epoch outlier detection |
chep-th |
chep-th=5 |
SD-unit threshold(s) for Hjorth general outlier detection |
Additional options:
Parameter | Example | Description |
---|---|---|
clipped |
clipped=0.05 |
Flag epochs with this proportion of clipped sample points |
flat |
flat=0.05 |
Flag epochs with this proportion of flat sample points (optionally flat=0.05,1e-4 ) |
max |
max=200,0.05 |
Flag epochs with this proportion of points with absolute value above 200 |
Output
The CHEP-MASK
command only alters the CHEP mask and writes some
information to the console. No other output is generated. Use the
CHEP
command following CHEP-MASK
to produce
summary statistics on the number of epochs/channels masked, etc.
Example
See this vignette for an application of CHEP-MASK
to hdEEG data.
Here, we'll consider a simpler application in the context of a PSG recording with only two central EEG channels. Taking the first individual from the tutorial, here we calculate epoch-level metrics for the first EEG channel, and flag outliers that are greater than 2 standard deviations from the mean (performed twice, iteratively).
luna s.lst 1 sig=EEG -o out.db -s 'CHEP-MASK sig=EEG ep-th=2,2 '
CMD #1: CHEP-MASK
options: ep-th=2,2 sig=EEG
set epochs to default 30 seconds, 1364 epochs
within-channel/between-epoch outlier detection, ep-th = 2,2
iteration 1: removed 261 channel/epoch pairs this iteration (261 in total)
iteration 2: removed 115 channel/epoch pairs this iteration (376 in total)
By itself, CHEP-MASK
only alters the internal CHEP mask: to actually remove these
potentially aberrant epochs, we need to pair this commadn with CHEP
, which sets the
epoch-level mask based on the current CHEP mask:
luna s.lst 1 sig=EEG -o out.db -s ' CHEP-MASK sig=EEG ep-th=2,2
& CHEP epochs
& DUMP-MASK
& RE '
Here we see output from CHEP
which shows that we are now dropping those flagged epochs, by
setting the epoch-level mask:
CMD #2: CHEP
options: epochs=T sig=EEG
masking epochs with >0% masked channels: 376 epochs
CHEP summary:
376 of 1364 channel/epoch pairs masked (28%)
376 of 1364 epochs with 1+ masked channel, 376 with all channels masked
1 of 1 channels with 1+ masked epoch, 0 with all epochs masked
Finally, the RESTRUCTURE
(or RE
) takes the epoch-level mask
and actually drops those epochs permanently from the internal representation of the data - i.e. any subsequent
analysis would be based on this subset of 988 epochs, not the full set.
CMD #3: RE
options: sig=EEG
restructuring as an EDF+: keeping 29640 records of 40920, resetting mask
retaining 988 epochs
The intervening DUMP-MASK
command outputs the current
state of the epoch-level mask (as used below).
The reason for separating out the steps of flagging channel/epoch
pairs as outliers (CHEP-MASK
), to setting particular epochs as
outliers (for all channels, via CHEP
), to actually removing the
flagged epochs (with RE
) is that it provides more flexibility to
design artifact detection workflows that are suitable to different
types of data/different modes of artifact.
Finally, the see the underlying Hjorth statistics used in the outlier
detection, you would need to use the
SIGSTATS
command:
luna s.lst 1 sig=EEG -o hjorth.db -s 'SIGSTATS sig=EEG epoch'
We first extract epoch-level output (i.e. stratified by E
as well as
CH
) into a plain-text file stats.txt
:
destrat hjorth.db +SIGSTATS -r CH E > stats.txt
We can also pull the epochs actually masked, from the above run: destrat out.db +DUMP-MASK -r E > masked.txt
Both these files have
1364 rows (plus a header), corresponding to the number of epochs in
the original data. We can then use a package such as R to load and
plot these data. For example, from the R command line:
d <- read.table( "stats.txt", header = T )
m <- read.table( "masked.txt", header = T )
head(d)
ID CH E H1 H2 H3
1 nsrr01 EEG 1 98.45154 0.4807507 1.105732
2 nsrr01 EEG 2 234.34135 0.3164355 1.152268
3 nsrr01 EEG 3 159.33651 0.3511658 1.089591
4 nsrr01 EEG 4 181.39586 0.6554725 1.115427
Below, still using R, we plot four epoch-level metrics (H1
(log-scaled),
H2
and H3
), with the epochs set as outliers in purple:
f <- function(x) { ifelse(x,"purple","darkgray") }
par(mfcol=c(3,1), mar=c(3,4,1,2) )
plot(d$E,log(d$H1) ,pch=20,cex=0.8,col=f(m$EMASK),xlab="Epoch",ylab="H1")
plot(d$E,d$H2 ,pch=20,cex=0.8,col=f(m$EMASK),xlab="Epoch",ylab="H2")
plot(d$E,d$H3 ,pch=20,cex=0.8,col=f(m$EMASK),xlab="Epoch",ylab="H3")
As noted above, all epochs past epoch 1100 or so are wake, essentially containing nothing but noise (e.g. if recording continued after electrodes were removed from the scalp). As expected, these have been flagged as likely artifact.
ARTIFACTS
Applies an artifact detection algorithm for EEG data
This command applies the method described in Buckelmueller et al, to detect likely artifacts in an EEG channel. Briefly, the method takes 10 4-second windows for each 30-second epoch to estimates delta (0.6 to 4.6 Hz) and beta (40-60Hz) spectral power, using Welch's method. Based on a sliding window of 15 epochs (i.e. seven either side of each epoch), an epoch is flagged to be masked if either the delta power is more than 2.5 times the local average (from the sliding window) or the beta power is more than 2.0 times the local average.
Parameters
Parameter | Example | Description |
---|---|---|
sig |
sig=C3,F3 |
Restrict analysis to these channels |
verbose |
verbose |
Report epoch-level statistics |
no-mask |
no-mask |
Run, but do not set any masks |
Output
Channel-level output (strata: CH
):
Variable | Description |
---|---|
FLAGGED_EPOCHS |
Number of epochs failing Buckelmueller |
ALTERED_EPOCHS |
Number of epochs actually masked |
TOTAL_EPOCHS |
Number of epochs tested |
Epoch-level output (option: verbose
, strata: CH
x E
):
Variable | Description |
---|---|
DELTA |
Delta power |
DELTA_AVG |
Local average delta power |
DELTA_FAC |
Relative delta power factor |
BETA |
Beta power |
BETA_AVG |
Local average beta power |
BETA_FAC |
Relative beta power factor |
DELTA_MASK |
Masked based on delta power? |
BETA_MASK |
Masked based on beta power? |
MASK |
Whether the epoch is masked |
Example
Taking the same EEG channel from the first tutorial EDF as in the
SIGSTATS
example above:
luna s.lst 1 sig=EEG -o out.db -s "ARTIFACTS verbose"
We see this command flags 60 epochs as potentially containing artifact:
masked 60 of 1364 epochs, altering 60
destrat out.db +ARTIFACTS -r E CH > d.txt
We can then use R to show the results, for delta (left column) and beta (right column) separately, showing log-scaled power per epoch (top row), the local average (middle row) and the epoch/local relative factor (bottom row), where purple points indicate that epoch was flagged as an outlier (by either delta or beta criterion).
d <- read.table("d.txt",header=T)
par(mfcol=c(3,2), mar=c(4,4,2,2) )
plot( log(d$DELTA) , type="l" , ylab="Delta avg" )
plot( log(d$DELTA_AVG) , type="l" , ylab="Delta local")
plot( d$DELTA_FAC , pch=20 , col = f( d$MASK ) , ylab="Beta fac")
plot( log(d$BETA) , type="l" , ylab="Beta avg" )
plot( log(d$BETA_AVG) , type="l" , ylab="Beta local")
plot( d$BETA_FAC , pch=20 , col = f( d$MASK ) , ylab="Beta fac")
Comparing the results to the SIGSTATS
method described
above, which is applied to the same data, we see that this approach
does a good job of flagging individual epochs with unusual artifacts,
but does not flag the very extended stretches of gross artifact
towards the end of the recording (i.e. because the entire local window
is itself aberrant) . This latter type of artifact is, of course,
easier to spot by eye, and is flagged by SIGSTATS
. In practice,
running multiple artifact detection/correction routines is usually
warranted.
LINE-DENOISE
Attempt to attenuate electrical line noise at fixed frequencies, via spectrum interpolation
This command implements a spectrum interpolation approch to correcting certain types of line noise in EEG or other signals, following the approach described here.
Briefly, this approach works as follows:
-
obtain the amplitude spectrum via FFT
-
for fixed frequency windows, interpolate amplitude with mean of flanking regions
-
with phases unchanged, use Euler’s formula to obtain complex Fourier spectrum but with modified amplitudes
-
apply the inverse FFT to obtain a corrected time-domain signal
Parameters
Parameter | Example | Description |
---|---|---|
sig |
${eeg} |
Specify which signals to apply this method to |
f |
20,40,60 |
One or more target frequencies to set to missing and interpolate |
w |
1,0.5 |
Bandwidth of noise and neighbour intervals |
Output
No output, just modifies the internal signals.
Example
Here is one example recording with considerable line noise: the raw signal (left panel) for a single epoch, and the equivalent frequency spectrum up to 128 Hz (right panel). As well as line noise at 60 Hz, there are clear sub-harmonics at 4 Hz intervals, from which extend down into the regions typically considered of high physiological relevance for the sleep EEG, i.e. < 20 Hz.
Zooming into the <25 Hz range, here we see the mean PSD for the entire sample (in this case, hundreds of individuals, many of whom turn out to share the same signature of electrical line noise, as was evident when looking at the full sample-level spectra up to 128 Hz):
Although the impact on sample-level mean power in this frequency range is not extreme, in terms of individual differences, we see a clear and disturbing pattern, whereby the presence or absence of line-noise seems to drive clear spikes in the variance of power, and these seem to overwhelm physiological sources of between-individual variation. Naturally, this represents a rather large source of potential confounding.
We can apply the LINE-DENOISE
approach to this sample, as follows (showing here just targetted frequencies up to 24 Hz); (nb.
normally, one would add other commands, e.g. PSD
or WRITE
out new EDFs after running LINE-DENOISE
):
luna s.lst –s LINE-DENOISE sig=${eeg} f=4,8,12,16,20,24 w=1,0.5
Warning
In this particular case, we observed relatively clear fixed-frequency artifacts across many individuals: naturally, in different samples, different individuals (or different portions of the recording) may show different patterns, and so this simple, one-size-fits-all approach may not be optimal.
In this particular case, this simple approach to removing line noise works quite well on this one individual:
Perhaps more significantly, when looking at the whole sample (i.e. applying the above to all individuals in this sample), we see a flatter mean PSD, but also greatly reduced variance at these 4 Hz intervals, suggesting that we've done a reasonable job at removing individual differences in power due to individual differences in the extent of line noise (e.g. especially at 16 Hz). Note that the mean PSD does show a few 'notches' suggesting that it has over-corrected in some cases.
Warning
Correcting the EEG signal in this manner naturally entails changing the data in a complex way, which partially attenuates the suspected artifactual signal sources, but potentially also induces subtle (or not so subtle) new confounds. Some downstream analyses may be more or less sensitive to these issues, and so these corrections may or may not be necessary or desirable. All steps here must be performed with care (using different approaches, or compared to the raw data, and checking for convergence with respect to the final, substantive results, if restricting analyses to clean data is not an option).
SUPPRESS-ECG
Given an ECG channel, detect R peaks and subtract cardiac-related contamination from the EEG
This command applies a modified
Pan-Tompkins algorithm
to detect QRS in the ECG. Based on the detected R-peaks, it estimates
the average signature for other channels based on 2-second intervals
time-locked with each R-peak. This is subtracted from the EEG, as
described here. It
also estimates heart rate per-epoch, and flags values likely to
represent artifact. If needed, channels will be resampled to have
similar sampling rates (to be set to the value of the parameter sr
).
Parameters
Parameter | Example | Description |
---|---|---|
ecg |
ecg=ECG |
Specify which channel is the ECG |
sr |
sr=125 |
Set the sample rate that all channels should to resampled to |
no-suppress |
no-suppress |
Run the command but do not alter any EEG channels |
Output
Individual-level summaries (strata: none):
Variable | Description |
---|---|
BPM |
Mean heart rate (bpm) |
BPM_L95 |
Lower 95% confidence interval for mean HR |
BPM_U95 |
Upper 95% confidence interval for mean HR |
BPM_N_REMOVED |
Number of epochs flagged as having invalid HR estimates |
BPM_PCT_REMOVED |
Proportion of epochs flagged as having invalid HR estimates |
Epoch-level metrics (strata: E
):
Variable | Description |
---|---|
BPM |
Heart rate for this epoch (bpm) |
BPM_MASK |
Was this epoch invalid? |
Channel-level summaries (strata: CH
):
Variable | Description |
---|---|
ART_RMS |
Root mean square of correction signature |
Details of artifact signature (strata: CH
x SP
)
Variable | Description |
---|---|
ART |
For each sample point in a 2-second window, the estimated correction factor |
Example
Here we look at individual nsrr02
from the
tutorial data, to identify and remove possible
cardiac contamination in the EEG:
luna s.lst nsrr02 sig=EEG,ECG -o out.db < cmd.txt
The command file cmd.txt
restricts analysis to NREM2 sleep,
estimates EEG/ECG coherence (and
spectral power), then estimates and corrects for
cardiac contamination in the EEG, and finally, repeats the coherence
and spectral analyses:
MASK ifnot=NREM2
RE
TAG R/pre
COH
PSD spectrum
SUPPRESS-ECG ecg=ECG
TAG R/post
PSD spectrum
COH
Hint
Note how we've used the TAG
command to structure the output, by adding a factor R
to denote
whether the coherence and spectral analysis was performed pre
or post
the SUPPRESS-ECG
command.
We can extract the per-epoch estimates of heart rate (bpm) and plot as follows:
destrat out.db +SUPPRESS-ECG -r R E > hr.txt
Plotting the BPM
field of hr.txt
against epoch (E
), e.g. in R:
plot(d$E,d$BPM,pch=20,col="red",ylim=c(50,80),ylab="HR (bpm)",xlab="NREM2 epoch")
Second, we can extract the estimates of coherence between EEG and ECG as follows:
destrat out.db +COH -r F -c R -r CHS -v COH > d.txt
Plotting the pre
(black) and post
(blue) coherence values, for
frequencies up to 20 Hz, we see that a non-trivial association between
ECG and EEG channels has effectively been removed based on these
metrics:
plot( d$F , d$COH.R.pre , ylim=c(0,1) , type="b" , pch=20, ylab="Coherence" , xlab="Frequency (Hz)" )
points( d$F , d$COH.R.post , ylim=c(0,1) , type="b" , pch=20, col="blue" )
Note, this is only a rough-and-ready approach to working with ECG data, which is not a primary focus of Luna. Other artifacts may well remain (e.g. in higher frequencies) and this approach may not work well with other datasets. That is, as usual, your mileage may vary...
Hint
When comparing different individual or signals, the ART_RMS
metric can be taken as an approximate guide to the relative extent
of contamination. The actual signatures (i.e. the R-peak
sync'ed averaged EEG) can be viewed by looking at the ART
variable, which is a channel by sample-point (CH
x SP
) metric.
ALTER
Simple regression-based or EMD approaches to handle certain artifacts
This command runs in one or two modes:
-
using explicit signals in the EDF as correctors (
corr
), i.e. proxies of any artifact in the target (sig
) channel(s), and regressing out those components, following the approach outlined here. By default, the correction is performed in 5 second segments, as a sliding window with increment 2.5 seconds (i.e. 50% overlap). The final signal is based on weighted combinations of the overlapping estimates, to avoid edge artifacts in the reconstructed signal. -
alternatively, applying EMD to the target (
sig
) channel, and subtracting components that are highly correlated with any of the corrector channels; or, ifemd-corr
option is present, also performing EMD on each corrector channel and removing anysig
components that are highly correlated with one or more of the corrector (corr
) components.
By default, the regression-based approach uses segments of 5 seconds,
with 2.5 second overlap. The EMD-based approach uses non-overlapping
30-second segments. For now, we suggest using the regression-based
mode of ALTER
.
All sample rates must be similar for sig
and corr
channels.
Note
This command is explicitly called ALTER rather than CORRECT or some such... That is, depending on the the nature of the signals and the artifact, there is no guarantee that this approach will correctly remove the true, underlying sources of artifact.
Paramters
Option | Example | Description |
---|---|---|
sig |
C3 |
Signal to be altered |
corr |
LOC,ROC |
Signals to be used as references (i.e. proxies of artifact) |
segment-sec |
10 | Segment size (default: 5 seconds for regression mode, 30 seconds for EMD mode) |
For the EMD-based method, there are the following additional parameters:
Option | Example | Description |
---|---|---|
emd |
10 |
Use EMD components of sig as references (instead of corr channels) |
th |
0.9 |
EMD-channel correlation threshold (default: 0.9) |
emd-corr |
Perform EMD on corrector signals (corr ) also |
Output
No output is generated, other than the in-memory channels specified by sig
being altered.
Examples
We'll use a combination of real and simulated data to show how the
ALTER
command works. Note that, in practice, real artifacts
(complex phenomena not necessarily well-captured by any one channel
or component) are much more likely to be difficult to correct. That is, the performance
in this toy dataset is not likely to be indicative of how well it performs on real data, e.g.
with EMG, EOG or ECG artifact in the EEG.
We'll take 10 minutes of N2 data from a random study:
luna cfs.lst 1 -s ' MASK ifnot=N2
RE
REFERENCE sig=C3 ref=M2
SIGNALS keep=C3
MASK random=20 & RE
WRITE force-edf edf-dir=tmp '
We'll use the SIMUL
command to spike in an
artifactual signal (a simple sine wave) into the real EEG signal.
Below, we'll call the original signal S1
; the artifact is X1
; the
contaminated signal is S2 = S1 + X1
; the cleaned signal is S3
,
which is intended to have the component due to X1
removed. That is, we run S3
(which is S1
plus X1
)
through the ALTER
command, to see whether it removes this simple source of artifact:
luna tmp/file.edf -o out.db alias="S1|C3" \
-s ' SIMUL sig=X1 frq=25 psd=2 sr=128
TRANS sig=S2 expr=" S2 = S1 + X1 "
TRANS sig=S3 expr=" S3 = S2 "
ALTER sig=S3 corr=X1
COH spectrum max=50
PSD spectrum max=50 dB '
After running ALTER
, we run COH
and PSD
to generate power spectra and coherence statistics for all signals. Here are the resulting power spectra:
The final signal (S3
) has clearly had the major source of artifact removed (versus the contaminated S2
signal), and shows a spectrum that is effectively identical to the
original signal, S1
.
Looking at the coherence statistics from COH
, we see a
very strong source of contamination around 25 Hz, which is the
frequency of the simulated X1
artifact signal. This has effectively
been completely removed in the altered signal (cohernece between S3
and X1
). Note that some coherence values are not defined here, as
the power for the (artificial) X1
signal (a simple sine wave) is
effectively zero there, meaning that the coherence statistic is not
defined.)
Inasmuch as the corrector channels provide an accurate proxy for the contamination observed in the target signal, this simple regression-based approach should, in principle, perform well. Naturally, seeing how it performs with realistic channels (e.g. to remove EMG artifact from the EEG, etc) may vary from dataset to dataset. We'll add a future vignette to explore some of these issues in real data.