Simulation
Simulation of new data
Command | Description |
---|---|
SIMUL |
Simple approach to simulate time-series given a power spectrum |
SIGGEN |
Generate/spike in artificial test signals |
SIMUL
Simulation of time-series data given a power spectrum
This command simulates time-series with specified spectral properties, using a simple approach that will not generate truly realistic EEG (or other) signals, e.g. in terms of phase-amplitude coupling or other distributional properties of the data. Rather, it simply provides a means to generate random stationary time-series with a known power spectrum, e.g. to be used for testing methods, or generating figures, etc.
This command can either generate a new signal (if the sig
label does
not already exist in the EDF) or it can modify an existing signal. In
the latter case, the command will either completely overwrite the old
signal, or if the add
option is specified, it will add
(i.e. numerical addition) the simulated signal onto the existing
signal. In both these cases, if SIMUL
modifies an existing signal,
the simulated signal must have the same sample rate (sr
) as the
original.
There are three primary modes for specifying the power spectrum:
-
to read a power spectrum from a file (e.g. the output of a previous Luna
PSD
command), usingfile
-
to specify a 1/f form, with the arguments
alpha
andintercept
-
to specify one or more peaks in the power spectrum of fixed amplitude, with
frq
andpsd
; these are either single points (i.e. generating an exact 10 Hz sine wave withfrq=10
), or ifw
is set to a positive, non-zero value, to have a Gaussian bump centered on 10 Hz (wherew
is the standard deviation). -
note that the latter two options can be combined: i.e. to simulate a 1/f that also has spectral peaks; further, by applying
SIMUL
repeatedly on the same signal using theadd
option, it is possible to build up more complex composite signals.
The SIMUL
command first scales the (positive) spectrum to m=n/2+1
points where n is the desired length of the time series (which is
fixed given the EDF duration). If power spectra are read from a file, they
are interpolated as necessary (using cubic spline interpolation) to be of length m.
A new complex variable Z is generated according to the amplitudes as fixed by the generating power spectrum, but with phases randomly (and independently) assigned for each frequency. An inverse FFT is then applied to Z to give the resulting time series.
By default, the resulting time series is stationary across its entire
duration, i.e. having been generated by a single power spectrum. The
pulses
option is one simple way to generate non-stationary time
series: it takes two arguments, the number and duration of pulses.
The exact same procedure is followed as above, except at the end,
these pulses are randomly allocated to parts of the generated time
signal (constrained so as not to be overlapping). All parts of the
signal that are not spanned by a pulse are set to zero. Note that
pulses are not initiated with respect to the phase of any
component of the simulated signal. If too many pulses are specified, or
if they are too long (i.e. and won't fit into the generated signal) then Luna
issues a warning and stops.
The time series is stored as a channel in the internal EDF, i.e. it
can be output with commands such as [WRITE
] or [MATRIX
] or fed
into subsequent commands such as PSD
.
Hint
It might often be convenient to use the empty EDF feature
of Luna when using SIMUL
, i.e. if you want to create one or more
signals from scratch. Here, you specify .
(period character) as
the sample-list/file-name along with --nr
and --rs
on the
command line, to give the number of records (nr
) and the EDF
record size (rs
) respectively. Luna will then create an EDF of
this duration (i.e. with headers speciying the length of the
recording) but with 0 signals, i.e. a collection of empty records.
The SIMUL
command will then create a new channel that will be of
the desired duration (and sample rate given by the sr
option).
Parameters
Primary parameters to specify the data and any outlier actions for the dependent variables:
Option | Example | Description |
---|---|---|
sig |
S1 |
Name for the new, single signal to be generated |
add |
Assumes that sig already exists in the EDF, add the new data to it |
|
frq |
1,15 |
Specify spectral peaks at 1 and 15 Hz |
psd |
10,10 |
Specify power values of 10 and 10 (arbitrary units) for, e.g. 1 and 15 Hz |
w |
2 |
Specify peak width (SD of normal distribution centered at frq values) |
alpha |
2 |
Specify spectra in terms of 1/f^a slope |
intercept |
1 |
Specify spectral intercept, required if alpha is used |
file |
psd.txt |
Read power spectrum from a file (assumes F and PSD columns) |
sr |
100 |
Specify a sample rate (for any new signal) |
pulses |
100,1 |
Specify 100 pulses of 1 second duration each |
verbose |
Output the specified power spectrum |
Outputs
Optional expected power spectra output (option: verbose
, strata: F
):
Variable | Description |
---|---|
F |
Frequency |
LF |
Log-scaled frequency |
P |
Expected power |
LP |
Log-scaled expected power |
Example
Here we generate a time series from scratch:
luna . -o out.db --nr=30 --rs=1 \
-s ' SIMUL alpha=2 intercept=1 frq=15 psd=10 w=1 sig=S1 sr=100 verbose
MATRIX minimal file=s1.txt
PSD sig=S1 spectrum max=50 '
This statement is intended to generate 30 seconds of data (i.e. nr
=
30 records, each of rs
= 1 second) with a sampling rate of sr
=
100 Hz. The use of the .
as the sample-list/EDF filename indicates
to Luna that this is an empty EDF (which requires that --nr
and
--rs
are given on the main Luna command line).
Here we use both the spectral slope and peak formulation: an alpha
value of 2, and a Gaussian bump at 15 Hz (with a SD of w
= 1 Hz).
As noted above, the simple simulation approach here will generate a
time-domain signal that this PSD in the frequency-domain, but
otherwise it does not meaningfully replicate other time-domain
features of signals.
The expected power spectrum is given as follows:
destrat out.db +SIMUL -r F
ID F LF LP P
. 0 NA -110.197 0.000
. 0.0333333 -3.401 6.802 900.000
. 0.0666667 -2.708 5.416 225.000
. 0.1 -2.303 4.605 100.000
. 0.133333 -2.015 4.030 56.250
. 0.166667 -1.792 3.584 36.000
. 0.2 -1.609 3.219 25.000
. 0.233333 -1.455 2.911 18.367
. 0.266667 -1.322 2.644 14.062
...
This spectrum has a maximum frequency of 50 Hz (Nyquist) given the 100 Hz sample rate; given the 30 seconds of data is generated, the positive power spectrum has m = 3000/2+1 = 1501 points (and so a frequency resolution between points of 50/1500 = 0.0033 Hz).
Plotting the log-log scale expected power spectra (LF
and LP
,
skipping the DC term), we obtain the following:
destrat out.db +SIMUL -r F > exp.txt
destrat out.db +PSD -r F CH > obs.txt
Plotting the expected log-log scaled frequency and power, we see the
specified slope b = -2 (negative alpha
), which is linear on a
log-log scale, as well as a peak at 15 Hz:
The MATRIX
command above output the simulated time series to the file s1.txt
: here is five seconds of the signal:
We also applied the PSD
command to the newly generated
signal S1
, to use Welch method to estimate the spectrum from the
data. Plotting the expected (gray line, same as above) against the
estimated values for this one epoch, we see a good agreement: (note,
there the lowest frequency estimated is 0.5 Hz, as the Welch method
uses, by default, 4 second sliding windows; thus the x-axis is shifted relative
to the plot above):
We can add the slope
option to PSD
to estimate the spectral slope using a simple linear regression on the log-log spectrum: i.e. the final PSD
command were instead written:
PSD spectrum max=50 slope=30,45
then we can output the SLOPE
as follows:
destrat out.db +PSD -r CH | behead
ID .
CH S1
NE 1
SPEC_SLOPE -2.0451
SPEC_SLOPE_N 61
In the above example we specified that the slope be estimated only over the region 30 to 45 Hz (as we and others have shown to be a robust choice, being relatively free from stronger oscillatoary activity and/or line noise (Kozhemiako et al, 2021).
If we had specified a broad range, that includes the (in this example, extremely strong) bump at 15 Hz, this naturally would have biased the slope estimate, which is now closer to 5 or 6.
PSD spectrum max=50 slope=10,45
ID .
CH S1
NE 1
SPEC_SLOPE -5.6164
SPEC_SLOPE_N 141
See here for more details on the PSD
and slope
commands/options.
As a second example, here we base the simulation on a real power spectrum, estimated from N2 sleep, e.g. here from one individual from the NSRR CFS cohort (a random 10 minutes of N2 sleep):
luna cfs.lst 1 -o out.db -s ' MASK ifnot=N2 & RE
MASK random=20 & RE
PSD spectrum sig=C3 min=0 max=50 '
Note how we use the PSD
options min
and max
along with spectrum
to extract a broader range than the
PSD
command typically gives. We can extract the power spectrum to a file s.txt
:
destrat out.db +PSD -r F CH > s.txt
To now simulate a time series with a similar power spectrum, we can use SIMUL
and file
: here, we simulate 6 minutes
of signal (i.e. 180 one-second records) and read in the spectrum from s.txt
:
luna . -o out.db --nr=180 --rs=1 -s ' SIMUL sr=400 file=s.txt sig=S1 verbose
PSD spectrum max=200 '
By default, Luna assumes the PSD
values are raw and not logged; if
Luna detects a negative value in PSD
it will assume they are
10log10(X) values (i.e. generated by PSD spectrum dB
) and will
convert them accordingly.
If we plot a) the original power spectra (with log(X) scaling), b) the interpolated expected spectra (orange, i.e. to match the given sample rate of 400 Hz), c) the estimated power spectra obtained via the Welch method to 180-seconds of randomly simulated data, as green, orange and navy points/lines respectively, we see they all line up more or less as expected:
Note that we used a different sample rate here (400 Hz),
which implies a different Nyquist frequency. The original spectrum
was output up to 50 Hz. Implicitly, all unspecified frequencies are
set to zero, when reading from a file: for a sample rate of 400 Hz for
the generated signal, this implies values from 50 up to 200 Hz. Indeed,
if we plot the full range of output from the previous SIMUL
command, we'll
see the spectrum extends up to 200 Hz, but with values of 0 for all frequencies
above 50 Hz (which are therefore not defined on the log scale, and so are NA
):
Finally, here we demonstrate the use of the pulses
option, as well as
adding different signals together. Below, we simulate 3 different
10-second segments. First, a constant 4 Hz sine wave:
luna . -o out.db --nr=10 --rs=1 \
-s ' EPOCH len=10
SIMUL sr=100 frq=4 psd=1 sig=S1
FFT
MATRIX minimal file=s1.txt '
Note that we set the epoch length to 10 seconds (smaller than the
default of 30 seconds), which ensures that subsequent commands operate
correctly on a segment of data smaller than 30 seconds. We also use
the FFT
command to output the power spectrum associated with the new
signal, which uses the basic DFT algorithm (rather than Welch or
multi-taper approaches to spectral estimation). Plotting the contents of s1.txt
(i.e.
the raw time-domain signal output by MATRIX
):
The second signal adds the pulses
option to make the output consist of 3 segments (each of 1.5 second duration)
within the overall 10-second window. As noted above, this is constrained to ensure that the segments do not overlap:
luna . -o out.db --nr=10 --rs=1
-s ' EPOCH len=10
SIMUL sr=100 frq=4 psd=1 sig=S1 pulses=3,1.5
FFT
MATRIX minimal file=s2.txt '
In this third example, we create two signals and add them together: a single 2-second segment of 4 Hz activity, and then superimpose 10 0.25-second segments of 20 Hz activity:
luna . -o out.db --nr=10 --rs=1 \
-s ' EPOCH len=10
SIMUL sr=100 frq=4 psd=1 sig=S1 pulses=1,2
SIMUL sr=100 frq=20 psd=1 sig=S1 add pulses=10,0.25
FFT
MATRIX minimal file=s3.txt '
Now, for each of these three signals, we can look at power spectra obtained from the FFT
command, extracting out
output as:
destrat out.db +FFT -r F CH > o.1
The first shows the simple 4 Hz sine wave:
The second shows an attenuated peak at 4 Hz, with the discontinuities at the start/stop of segments introducing other spectral components:
Finally, we see a similar picture for the third signal, but with spectral components around both 4 and 20 Hz, again with the 'ripples' in the frequency domain resulting from the onsets/offsets of the pulses:
In this example, we've added two signals together, where the first was
created from scratch. It is also possible to use add
to modify an
existing signal, if so desired.
Finally, also note that the frq
and psd
commands can accept
multiple frequency/power pairs, but the pulses
only acts on the
composite signal generated:
luna . -o out.db --nr=10 --rs=1 \
-s ' EPOCH len=10
SIMUL sr=100 frq=4,20 psd=1,1 sig=S1 pulses=1,2
FFT
MATRIX minimal file=s3b.txt '
SIGGEN
Generate, or add-in, artificial test signals
This command is largely redundant (given SIMUL
), but is decribed here for completeness.
This is a simple command to generate test signal data (on top of an existing EDF). Currently, it only generates sine wave signals, or pulses of a given duration.
Parameters
Parameter | Example | Description |
---|---|---|
sig |
sig=C3,C4 |
Signals to be modified/created (existing or new) |
sr |
200 | Set sample rate if not an existing channel |
add |
Add to an existing channel (versus overwrite it) | |
sine |
sine=10,20 |
Generate a sine wave with specified frequency (10 Hz), amplitude (20 units) and optionally phase |
impulse |
impulse=0.5,1,100 |
Set impulse of amplitude 1 for 100 samples, starting at 0.5 way through the recording; multiple values are possible impulse=T,A,D,T,A,D,... , i.e. time/amplitude/duration |
Output
No new output, this command just modifies the internal signal data.
Example
To generate a sine wave in the signal C3
(first clearing that signal):
luna s.lst -o out.db -s ' MASK ifnot=NREM2 & RE
SIGGEN sig=C3 clear sine=10,100
MTM sig=C3 '
Plotting the output of MTM
:
Also, see this vignette for an example of using SIGGEN
to generate a toy dataset.