6.1 Canonical signal definitions
The information in this section outlines the use of Luna's
CANONICAL
command
to provide a more structured way to harmonize files. As described in
the channel harmonization page, we only had one
special exception (M09
) and so all files could be processed
relatively straightforwardly. However, it is not hard to imagine real
datasets with more combinations of different types of exceptions. In
these cases, rather than performing one-off commands, it can be easier
to adopt this approach.
Instead of manually performing the above steps (separating out M09
from the other studies), it can sometimes be easier to use Luna's
CANONICAL
command, which is
designed to help when harmonizing multiple sets of EDFs that have
different labels and conventions.
Specifying rules
We must first write a specification
file
that describes a set of rules for finding (and converting) signals to a
standardized (canonical) form. We won't be able to
describe the full syntax here, but we'll give a simple script that replicates the steps above.
The full file is in orig/aux/lm.sigs
. We'll also give a second example (to implement
contra-lateral mastoid referencing) in a second step below, to further showcase how CANONICAL
works
(although note: contra-lateral mastoid referenced data are not part of this walkthrough).
The basic form of a rule is as follows, e.g. for the C3 channel:
C3
req:
sig = C3,EEG-C3,"C3 REF"
ref = "A1,A2"
ref = "M1,M2"
unit = uV
unit = mV
set:
unit = uV
sr = 128
This finds a channel that is labelled either C3
, EEG-C3
or C3
REF
(note the space is quoted here) and then looks for both A1
and
A2
(or M1
and M2
) as reference channels. A special syntax for ref
of "a,b"
means use the average of channels a
and b
,
i.e. for linked mastoid referencing. Note that sig
and ref
can take a single list, or
you can list the entries on separate lines: i.e.
sig = C3
sig = EEG-C3
sig = "C3 REF"
has the same effect as sig = C3,EEG-C3,"C3 REF"
. Note that channel
labels are sanitized and case-insensitive when we perform the
matching for required signals.
Aliases and canonical rules
Note that channel aliases can perform some of the same logic as the CANONICAL
command, albeit in a more limited fashion. Although
you could combine aliases (i.e. from the cmaps
file) with these steps, it is probably clear to keep a single, self-contained canonical
rule specification file that has all the mappings instead.
The above rule also requires that all channels have uV or mV units. If we find
channels that match these criteria, we make a new (or update the
original) channel, with the name C3
as the first (rule label) shows.
We also resample to a desired sample rate of 128 Hz, if needed.
Rule templates
One could imagine typing out the above rules for all EEG channels, and
also allowing for different reference labels (e.g. EEG-M1
and
EEG-M2
), but that would quickly become tedious. Therefore the CANONICAL
command allows for variables and templates to aid this process.
-
variables (using the same variable syntax as used in Luna scripts) are assigned with the syntax
${a=C3}
and referenced using the syntax${a}
(i.e. which would substitute the textC3
in this case). You can use variables in variable assignments, but note that Luna script variables work on simple text substitution: e.g.will end up with${a=1} ${a=${a}+1}
${a}
equaling the string1+1
rather than the number 2. -
templates can be defined for a single type of channel, and then rules can be automatically generated for many actual channels (e.g. C3, C4, F3, etc...)
The text file lm.sigs
starts by defining the core EEG channels as variables (comment lines
start with %
):
% ------- core EEG labels and canonical targets
${left=Fp1,AF3,F7,F5,F3,F1,FT7,FC5,FC3,FC1,T7,C5,C3,C1,TP7,CP5,CP3,CP1,P7,P5,P3,P1,PO3,O1}
${right=Fp2 AF4 F2 F4 F6 F8 FC2 FC4 FC6 FT8 C2 C4 C6 T8 CP2 CP4 CP6 TP8 P2 P4 P6 P8 PO4 O2}
${midline=FPZ AFZ FZ FCZ CZ CPZ PZ POz OZ}
${eeg=${left},${midline},${right}}
Note that we've listed left, right and midline EEG channels separately, and
then combined them in a single list (${eeg}
). CANONICAL
can take lists of
channels that are either comma- or space-delimited, as shown above (i.e. for ${left}
and ${right}
, we
could be used either form equivalently). Although it
doesn't matter in this example, we split EEGs in this way it simplifies the
second contra-lateral mastoid example.
The next step is to define mastoid channel labels, as we want to extract/generate linked mastoid channels:
% ------- mastoid labels
${linked_mastoids="M1,M2"}
${linked_mastoids+="EEG-M1,EEG-M2"}
${linked_mastoids+="M1 REF,M2 REF"}
${linked_mastoids+="A1,A2"}
${linked_mastoids+="EEG-A1,EEG-A2"}
${linked_mastoids+="A1 REF,A2 REF"}
The +=
syntax means to append as a comma-delimited list, and so the ${linked_mastoids}
ends up being:
"M1,M2","EEG-M1,EEG-M2","M1 REF,M2 REF","A1,A2","EEG-A1,EEG-A2","A1 REF,A2 REF"
We could have just written this out and made one assignment, but sometimes spreading definitions over lines can make the files easier to read.
Next we define a rule template for the case where we have (unreferenced) EEG channels with both mastoid channels present separately in the EDF, i.e. given we know/assume that the channels haven't already ben re-referenced:
define: EEG_make_linked_mastoid
req:
sig = ^
sig = EEG-^
sig = "^ REF"
ref = ${linked_mastoids}
set:
unit = uV
sr = 128
This template is identical to the format of the rule above, except is
starts with the define:
keyword (indicating that we are defining a template) and
has an arbitrary label (here EEG_make_linked_mastoid
). Templates use the special ^
character, which will be substituted for the actual channel labels when the
template is applied. Note we can have normal variables in template
definitions too (e.g. ${linked_mastoids}
) rather than writing things
out here in full.
We also make a second template, to cover the case where (we know) the EEG are already referenced to linked mastoids:
define: EEG_is_linked_mastoid
req:
sig = ^-(M1+M2)/2
set:
unit = uV
sr = 128
i.e. here there is no separate ref
category, and we require that
channel labels must in the form ^-(M1+M2)/2
(where ^
will be C3,
C4, etc). This handles the M09
case. In practice, one would review
a dataset as we've done, and generate/extend the templates above to
match the data.
Applying templates
The final step is to apply these templates for the whole set of EEG channels, using the apply:
keyword
following by the template label:
apply: EEG_make_linked_mastoid ${eeg}
apply: EEG_is_linked_mastoid ${eeg}
${eeg}
variable (defined above) expands to the list of core EEG channels (we could have just typed them out here too -
we don't need to use variables here, but it can often make life easier.)
Note: when using CANONICAL
, the variable ${eeg}
is not automatically populated with likely EEG channel
labels based on the attached EDF header: here they must be specified
explicitly, unlike in actual Luna scripts.
The use of templates (define:
/ apply:
keywords) essentially just
preprocesses the file and expands out all templates to make a longer
rules file. To check things are working as expected, you can add the dump
option
to CANONICAL
and it will print the processed script (to standard output) and then stop. Doing this
just for the first individual:
luna s1.lst 1 -s CANONICAL file=orig/aux/lm.sigs dump > rules.txt
The console shows how the variables were set:
setting variable ${left} = Fp1,AF3,F7,F5,F3,F1,FT7,FC5,FC3,FC1,T7,C5,C3,C1,TP7,CP5,CP3,CP1,P7,P5,P3,P1,PO3,O1
setting variable ${right} = Fp2 AF4 F2 F4 F6 F8 FC2 FC4 FC6 FT8 C2 C4 C6 T8 CP2 CP4 CP6 TP8 P2 P4 P6 P8 PO4 O2
setting variable ${midline} = FPZ AFZ FZ FCZ CZ CPZ PZ POz OZ
setting variable ${eeg} = Fp1,AF3,F7,F5,F3,F1,FT7,FC5,FC3,FC1,T7,C5,C3,C1,TP7,CP5,CP3,CP1,P7,P5,P3,P1,PO3,O1,FPZ AFZ FZ FCZ CZ CPZ PZ POz OZ,Fp2 AF4 F2 F4 F6 F8 FC2 FC4 FC6 FT8 C2 C4 C6 T8 CP2 CP4 CP6 TP8 P2 P4 P6 P8 PO4 O2
setting variable ${linked_mastoids} = "M1,M2"
appending variable ${linked_mastoids} = "M1,M2","EEG-M1,EEG-M2"
appending variable ${linked_mastoids} = "M1,M2","EEG-M1,EEG-M2","M1 REF,M2 REF"
appending variable ${linked_mastoids} = "M1,M2","EEG-M1,EEG-M2","M1 REF,M2 REF","A1,A2"
appending variable ${linked_mastoids} = "M1,M2","EEG-M1,EEG-M2","M1 REF,M2 REF","A1,A2","EEG-A1,EEG-A2"
appending variable ${linked_mastoids} = "M1,M2","EEG-M1,EEG-M2","M1 REF,M2 REF","A1,A2","EEG-A1,EEG-A2","A1 REF,A2 REF"
defining template 'EEG_make_linked_mastoid' with target form '^'
defining template 'EEG_is_linked_mastoid' with target form '^'
expanding template 'EEG_make_linked_mastoid' for 57 channels
expanding template 'EEG_is_linked_mastoid' for 57 channels
read 114 rules from orig/aux/lm.sigs
in total, read 114 rules
running with 'dump' option, will not attempt to apply rules
rules.txt
contains a set of 114 (i.e. 2 times 57 channels) rules -- showing here just the first two -- along with the original
template/variable lines of the lm.sigs
commented out:
Fp1
req:
sig = Fp1
sig = EEG-Fp1
sig = "Fp1 REF"
ref = "M1,M2","EEG-M1,EEG-M2","M1 REF,M2 REF","A1,A2","EEG-A1,EEG-A2","A1 REF,A2 REF"
set:
unit = uV
sr = 128
AF3
req:
sig = AF3
sig = EEG-AF3
sig = "AF3 REF"
ref = "M1,M2","EEG-M1,EEG-M2","M1 REF,M2 REF","A1,A2","EEG-A1,EEG-A2","A1 REF,A2 REF"
set:
unit = uV
sr = 128
Running CANONICAL
If we omit the drop
argument, Luna will try to enforce these rules
and make new signals (note: we could have used CANONICAL file=rules.txt
if we generated rules.txt
as above,
and the results would be the same; normally, Luna doesn't need to save this intermediate processed file):
luna s1.lst 1 -o out.db -s CANONICAL file=orig/aux/lm.sigs
59 signals from EDF
+ generating canonical signal Fp1 from existing signal(s) FP1 / A1,A2
+ generating canonical signal AF3 from existing signal(s) AF3 / A1,A2
+ generating canonical signal F7 from existing signal(s) F7 / A1,A2
+ generating canonical signal F5 from existing signal(s) F5 / A1,A2
+ generating canonical signal F3 from existing signal(s) F3 / A1,A2
+ generating canonical signal F1 from existing signal(s) F1 / A1,A2
...
destrat out.db +CANONICAL | behead
ID F01
CS_NOT 0
CS_SET 57
UNUSED_CH 0
USED_CH 59
Typically, one combines CANONICAL
with WRITE
to generate the new
EDF; also, one usually wants to add drop-originals
so that only the
newly-generated canonical signals are left (i.e. in this instance,
this drops the mastoid channels):
luna s1.lst 1 -o out.db \
-s ' CANONICAL file=orig/aux/lm.sigs drop-originals & WRITE edf=new1 '
which makes a new file new1.edf
here.
Generating new EDFs
Putting this all together, to recapitulate the steps described
here to make all 20 EDFs, given we have lm.sigs
defining the rules. So as not to overwrite the prior EDFs, we'll add
these to a new folder (harm1b
) but the EDFs generayed there should
be effectively identical to those generated in harm1/
(we'll leave
checking this as an exercise for the reader: any minor differences are
because CANONICAL
also tidies up some of the other EDF header fields
(e.g. transducer, pre-filtering)).
mkdir work/harm1b
We then run to convert all 20 files:
luna s1.lst -o out.db \
-s ' CANONICAL file=orig/aux/lm.sigs drop-originals & WRITE edf-dir=work/harm1b/ '
You can remove the work/harm1b/
folder after confirming that the above command worked as expected.
Contra-latreral mastoid example
As a second (and more technical) example of CANONICAL
, the script
orig/aux/cm.sigs
contains rules to make new datasets with
contra-lateral mastoid references. We use this as to illustrate a few
other features of the CANONICAL
syntax set.
Here is the full script:
cat orig/aux/cm.sigs
% ------- core EEG labels and canonical targets
${left=Fp1,AF3,F7,F5,F3,F1,FT7,FC5,FC3,FC1,T7,C5,C3,C1,TP7,CP5,CP3,CP1,P7,P5,P3,P1,PO3,O1}
${right=Fp2,AF4,F2,F4,F6,F8,FC2,FC4,FC6,FT8,C2,C4,C6,T8,CP2,CP4,CP6,TP8,P2,P4,P6,P8,PO4,O2}
% ------- mastoid labels
${left_mastoid=M1,A1}
${left_mastoid+=[EEG-][${left_mastoid}],[${left_mastoid}][ REF]}
${right_mastoid=M2,A2}
${right_mastoid+=[EEG-][${right_mastoid}],[${right_mastoid}][ REF]}
% ------- example unit mappings (first term in list is the final/preferred label, e.g. 'uV')
${volts=V,volt}
${mvolts=mV,millivolt,milli-volt,mvolt,m-volt}
${uvolts=uV,microvolt,micro-volt,uvolt,u-volt}
% ------- preferred SR and units as variables
${pref_sr=128}
${pref_unit=uV}
% ------- template for contra-lateral mastoid channels
% the label after the template name gives the desired form
% of the target label, e.g. C3_M2 if ^ is C3 and ${cm_label} was set of M2
% definition uses $${x} variables, which are swapped in when applying the template,
% i.e. not when first reading the template definition (unlike ${x} would be)
define: EEG_make_cm_referenced ^_$${cm_label}
req:
sig = ^,EEG-^,"^ REF"
ref = $${ref}
unit = ${uvolts}
unit = ${mvolts}
unit = ${volts}
set:
unit = ${pref_unit}
sr = ${pref_sr}
% ------- apply rules for contra-lateral mastoid referencing
apply: EEG_make_cm_referenced ${left} ${ref=${right_mastoid}} ${cm_label=M2}
apply: EEG_make_cm_referenced ${right} ${ref=${left_mastoid}} ${cm_label=M1}
Extensions
Key extensions here:
-
Template variables: when defining a template, variables in the form
$${x}
are evaluated not at the time the template is read (as${x}
would be) but later when it is applied. This allows different values (other than^
) to be passed into the template, e.g.$${ref}
, which is either a set of labels that match the left or the right mastoids. -
Named templates: the
define:
keyword has a second label after the template name (EEG_make_cm_referenced
), i.e.^_$${cm_label}
, which specifies the name of the rule (and thus the name of the channel that will be created in the new EDF). If it is not specified, it is assumed to be^
by default - i.e. just the name of the channels in${eeg}
etc. Here we modify it to have an underscore and then eitherM1
orM2
specified via$${cm_label}
. Note that if${ref}
was simply (e.g.M1
) we should not need to set the${cm_label}
, we could just use that: (^_$${ref}
). However, here${ref}
is a list of matching terms (e.g.M1,EEG-M1,A1,...
) and so the new channel needs a simpler label. So, as we first apply only to the left EEGs with${cm_label=M2}
, then only to the right EEGs with${cm_label=M1}
we end up with the desired labels in the formC3_M2
,F3_M2
, etc matched withC4_M1
,F4_M1
, etc. -
We specified required units (as a variable, e.g.
${uvolts}
) which requires that the physical dimension field matches (case-insensitive) one of these values but will then map it to the first (uV
), thereby harmonizing unit labels (in our walkthrough, we do not have the issue of different labels for the same unit, so this step is not needed). We also specify the preferred output unit and sample rates as variables. -
Note that the variable assignments (e.g.
${cm_label=M2}
) do not evaluate to any text, i.e. after processing, the variables are set, butis subsequently passed as:apply: EEG_make_cm_referenced ${left} ${ref=${right_mastoid}} ${cm_label=M2}
However, the two variables used in the template (apply: EEG_make_cm_referenced ${left}
${ref}
and${cm_label}
) do actually need to be defined before the template can be expanded (Luna will complain otherwise); but they could have been set before theapply:
keyword, which would have the same effect as setting them on the same line: e.g.Note that we assign${ref=${right_mastoid}} ${cm_label=M2} apply: EEG_make_cm_referenced ${left}
${ref}
using another variable${right_mastoid}
whereas we assign${cm_label}
using plain text, thus the difference in syntax between these two cases. -
Finally, we use Luna's list expansion to write sets of labels more quickly:
[a][x,y,z]
is shorthand forax,ay,az
[a,b,c][-REF]
is shorthand fora-REF,b-REF,c-REF
[a,b,c][x,y,z]
is shorthand forax,ay,az,bx,by,bz,cx,cy,cz
Thus the terms:
are a way of writing that${left_mastoid=M1,A1} ${left_mastoid+=[EEG-][${left_mastoid}],[${left_mastoid}][ REF]}
${left_mastoids}
equalsasM1,A1,EEG-M1,EEG-A1,M1 REF,A1 REF
[EEG-][${left_mastoid}]
equalsEEG-M1,EEG-A1
and[${left_mastoid}][ REF]
equalsM1 REF,A1 REF
. Yes, obviously in this example it is more text (and more complicated) to use the expansion, but this is not necessarily the case if working with many more labels (e.g. the main set of hd-EEG channels)
Summary
Although this may all seem a little involved on first blush, the above syntax can be a convenient way to harmonize different datasets (e.g. especially PSG studies with many labels or conventions)...