⚠Warning: 'CreateInfoOnly' and 'InParforLoop' are reserved for parallel computing; you will never need to set them manually.

There are roughly four stages in the MagTIP forecasting process. Each stage is wrapped by one of four main functions, whose keyword options let you customize the parameters for model optimization and probability forecasting. These wrapper functions cover the routine training and forecasting process, as follows:
conv_geomagdata
The original geomagnetic data (those in ".csv" format, named like "2008010300.KM" or "20190307.LY") should be converted to a standard format before any calculation. conv_geomagdata(dir_originalfiles, dir_data) reads the original data in dir_originalfiles and saves them in the standard format at the directory dir_data.
Keyword Arguments:
'ContinueFromLast': Default is false. If true, the names of all old files in dir_originalfiles are compared against the new files in dir_data before conversion, so that files that have already been converted are not converted again. This additional procedure may take several hours depending on the size of the database; a more efficient way to avoid repeated processing is to specify 'FilterByDatetime' manually. See below.
'FilterByDatetime': Only convert the files in the assigned time range(s). It supports:
A datetime. For example, with 'FilterByDatetime', datetime(2010,10,10), only files with a time tag on or after 2010-Oct-10 will be converted.
An N by 2 datetime array. For example, with
'FilterByDatetime', [datetime(2009,12,10), datetime(2010,10,10);...
datetime(2013,12,10), datetime(2017,10,10)];
only files in the two time ranges [2009-Dec-10, 2010-Oct-10] and [2013-Dec-10, 2017-Oct-10] will be converted; all others are ignored.
'Test': Takes only the assigned ratio (0-1) of data (files) in the directory dir_originalfiles; the files are randomly chosen. Apply this option only when you want to test your code with reduced overall computing time. Default is 0 (when the assigned value is $\leq 0$ or $\geq 1$, the data list is not reduced).
NOTICE:
Files in the original format must be uniquely named, e.g. '20200101.HC' or '2020010109.HC'.
If files are not uniquely named, such as '..\HC\20200101.txt' and '..\CS\20200101.txt', the second one is regarded as a duplicate and moved out to an alternative folder (dir_alt).
Every file must have the code of the corresponding station as its extension; e.g. '20200101.HC' is correct, whereas '20200101.txt' is not valid and an error will occur.
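Putting these options together, a typical conversion call might look like the following sketch (the directory paths are placeholders):

```matlab
% Placeholder paths; adjust to your own database layout.
dir_originalfiles = 'd:\RawGeomag';
dir_data = 'd:\Data';
% Convert only the files with time tags on or after 2010-Oct-10:
conv_geomagdata(dir_originalfiles, dir_data, ...
    'FilterByDatetime', datetime(2010,10,10));
```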
conv_gemsdata
conv_gemsdata(dir_gems, saveto, dir_catalog)
reads original GEMS data (e.g., "20120207164500.dat") in dir_gems, then merges and converts it to the standard format for MagTIP.
Also see "read_gemsdata.m". Example:
dir_gems = 'g:\GEMSdat\';
% where the individual data files are located, for example:
% 'g:\GEMSdat\em10\REC\Y2012\M02\D07\2012_02_07_16_45_00.dat'
saveto = 'd:\Data';
% dir_catalog: the directory for obtaining the station information ("station_location.mat")
conv_gemsdata(dir_gems, saveto, dir_catalog);
Write your own preprocessing function, e.g. prp_fn(filepath), for files in the Standard Data Format. Put it (e.g. "prp_fn.m") under "/src/preprocess". Use your custom preprocessing function with the keyword argument ..., 'Preprocess', 'prp_fn' in statind(...).
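As a sketch, such a custom function could be structured as follows ('prp_fn', the 'GEMS0' type, and the detrending step are hypothetical illustrations; only the loading via no follows the toolbox convention):

```matlab
function M_prp = prp_fn(fpath)
% Hypothetical custom preprocessing function; save as "prp_fn.m"
% under "/src/preprocess" so that prpfunctions() can find it.
M = no(fpath);                   % load the standard-format file as a matrix
M_prp = M;                       % keep the datetime columns untouched
[~, colind_data] = fmt.colindex2data('GEMS0');      % data-column indices
M_prp(:, colind_data) = detrend(M(:, colind_data)); % e.g. remove linear trends
end
```

It can then be applied with statind(dir_data, dir_output, 'Preprocess', 'prp_fn').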
💡Hint:
- You may apply different preprocessing pipelines to different types of data by checking the content or file name.
- Use prpfunctions() to list all preprocessing functions that can be called.
Here is the list of preprocessing functions:
BP_35.m
The preprocessing function that filters the loaded timeseries with the 'BP_35' band-pass filter.
Example: import data from fpath as M, create a bandpass filter automatically for each data column in M, and output the filtered data as M_out, which has exactly the same column dimension as M (but with rows possibly reduced):
fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
M_out = BP_35(fpath);
Basically the same as above, but attempts to apply existing filters (e.g., 'Filter[BP_35]samp[1]stp[093].mat') in dir_filter if possible:
fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
filtpaths = datalist('Filter*.mat', dir_filter);
FOBJ = loadfilters(filtpaths); % load all filters as a struct FOBJ
M_out = BP_35(fpath, FOBJ);
See fmt for the frequency range of 'BP_35'. generalfilterprocess is the common core of the 'ULF_X' family; see also the doc therein.
BP_40.m
The preprocessing function that filters the loaded timeseries with the 'BP_40' band-pass filter.
Example: import data from fpath as M, create a bandpass filter automatically for each data column in M, and output the filtered data as M_out, which has exactly the same column dimension as M (but with rows possibly reduced):
fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
M_out = BP_40(fpath);
Basically the same as above, but attempts to apply existing filters (e.g., 'Filter[BP_40]samp[1]stp[093].mat') in dir_filter if possible:
fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
filtpaths = datalist('Filter*.mat', dir_filter);
FOBJ = loadfilters(filtpaths); % load all filters as a struct FOBJ
M_out = BP_40(fpath, FOBJ);
See fmt for the frequency range of 'BP_40'. generalfilterprocess is the common core of the 'ULF_X' family; see also the doc therein.
ULF_A.m
The preprocessing function that filters the loaded timeseries with the 'ULF_A' band-pass filter.
Example: import data from fpath as M, create a bandpass filter automatically for each data column in M, and output the filtered data as M_out, which has exactly the same column dimension as M (but with rows possibly reduced):
fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
M_out = ULF_A(fpath);
Basically the same as above, but attempts to apply existing filters (e.g., 'Filter[ULF_A]samp[1]stp[093].mat') in dir_filter if possible:
fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
filtpaths = datalist('Filter*.mat', dir_filter);
FOBJ = loadfilters(filtpaths); % load all filters as a struct FOBJ
M_out = ULF_A(fpath, FOBJ);
See fmt for the frequency range of 'ULF_A'. generalfilterprocess is the common core of the 'ULF_X' family; see also the doc therein.
ULF_B.m
See the doc of 'ULF_A'.
no.m
M_prp = no(fpath) does no preprocessing; it loads and returns the data of fpath in the form of a matrix.
Other functions may also depend on no. For example, prpfunctions depends on no to get the directory of all preprocessing functions.
generalfilterprocess
Given prp_tag, generalfilterprocess(prp_tag, fpath [, FOBJ]) loads data from fpath, filters only the data columns, and returns the data as M_prp. The frequency range is inquired through fmt.filterRange or FOBJ.(prp_tag).(fs_tag).
If FOBJ is provided, generalfilterprocess attempts to do the filtering with filters in FOBJ; if there is no adequate filter in FOBJ, it applies filterthedata, which generates a filter for the file fpath and issues a warning message.
See docs in:
loadfilters for information about FOBJ, fs_tag and prp_tag.
fmt for fmt.filterRange.
filterthedata for automatically creating a filter and doing the filtering.
fillNaN for interpolation (which is required before any filtering).
autobandpassfilter for quickly making and saving filters.
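For instance, the 'ULF_X'-family functions above boil down to a call like the following sketch (fpath and dir_filter are placeholders from the earlier examples):

```matlab
fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
filtpaths = datalist('Filter*.mat', dir_filter);
FOBJ = loadfilters(filtpaths);             % load all filters as a struct FOBJ
M_prp = generalfilterprocess('ULF_A', fpath, FOBJ);  % band-pass the data columns
```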
loadfilters
F = loadfilters(filtpaths) loads filters as a nested structure, such that opts = F.(prp_tag).(fs_tag) is compatible with signal.internal.filteringfcns.filterData in bandpass. Here, prp_tag should be in the list of prpfunctions() and fs_tag is the sampling frequency prefixed by "fs" (e.g., 'fs15' for 15 Hz).
The simplest way to create and save a filter is to apply bandpass to data with a certain set of parameters, set a debug point inside bandpass, and save opts from the debug scope. For canonical approaches, see Filter Design in Matlab.
The filters should be named like 'Filter[ULF_A]samp[15]....mat', for example.
The input filtpaths should be a struct having name for file names and fullpath for file paths.
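If you do not use datalist, the filtpaths struct can be assembled by hand; a minimal sketch (the filter file name is hypothetical):

```matlab
filtname = 'Filter[ULF_A]samp[15]stp[093].mat';   % hypothetical filter file
filtpaths = struct('name', filtname, ...
                   'fullpath', fullfile(dir_filter, filtname));
F = loadfilters(filtpaths);
% F.ULF_A.fs15 then holds the filter options for 15 Hz data.
```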
See also:
prpfunctions(): list all preprocessing functions that might support F.
signal.internal.filteringfcns.filterData (Signal Processing package of Matlab)
statind(__)
qualitycontrol
Quality control of geomagnetic data of CWB. FIXME: Rename this for geomagnetic only.
checkdatevec
checkdatevec(M, colind_dt, today_date) checks whether M(:,colind_dt) lies in the range of today (today_date).
edgetruncate
Replaces values of a column variable that are outliers AND belong to the first/last 10% of rows. That is, data at the two edges (with edge width 10% of the number of rows) are replaced by NaN if they are outliers. This function is intended to remove edge effects after filtering.
estimatesupposedpts
[fs, dt, supposedpts] = estimatesupposedpts(M, colind_dt, what_type, fpath) estimates the sampling frequency fs, the temporal spacing dt, and the supposed number of points per day supposedpts.
Note that fs is rounded to an integer, i.e., fs = round(1/dt) and supposedpts = fs * 86400.
statind
statind(dir_data,dir_output) calculates daily statistics (a statistical quantity as an index of the day) of the daily timeseries in dir_data. The output variables are stored in dir_output.
Example:
statind(dir_data,dir_output,'StatName','S','StatFunction',@skewness, 'Preprocess', {'ULF_A','ULF_B'});
Here, dir_data is the directory for the time series in the standard format, and dir_output is the dir_stat mentioned before.
Keyword Arguments:
'StatName'
The abbreviations for the names of the statistical indices. They can be arbitrarily defined but must have the same number of elements as 'StatFunction'.
Default is {'S','K'} (for Skewness and Kurtosis).
'StatFunction'
The function handles for calculating the statistical indices. It must have the same number of elements as 'StatName'.
Default is {@skewness,@kurtosis}, calling the skewness() and kurtosis() functions.
'Preprocess'
Apply filter(s) to the time series loaded from dir_data. Applying a filter generally increases the computing time considerably, so you may consider 'SavePreprocessedData'.
Default is {'no'}, which applies only minimal preprocessing.
Use prpfunctions() to list all available preprocessing functions.
If multiple preprocessing functions are applied, for example {'no', 'ULF_A'}, then two sets of results are produced: one from the unfiltered data and one from the ULF_A band-passed data.
See Data Preprocessing.
'SavePreprocessedData'
Save the preprocessed data to an alternative folder. Its directory is parallel to that of the unfiltered data.
'FilterByDatetime'
It should be a two-element datetime array. If applied, all files whose datetime tag is not in the range are ignored.
Default is [datetime(0001,1,1), datetime(2999,12,31)], resulting in no data selection by datetime tag.
'UpdateExisting'
If true, the array size in the existing statistical indices (files starting with "[StatisticIndex]") is extended, and the DateTime in the Information file in dir_output is updated to cover the datetime of the data in dir_data.
Default is false; a completely new session starts with the old Info renamed.
Currently this option is not supported by statind_parfor.
'SkipExist'
Set it true to skip the calculation if, in the current loop, a file of the same 'stn' and 'prp' already exists in dir_stat.
A typical scenario for this option is when your last calculation has not completed yet.
Default is false.
'LoadFilters', dir_filter
Load all filters in dir_filter once for performance. See loadfilters and generalfilterprocess.
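Combining several of the keyword options above, a fuller call might look like this sketch (paths are placeholders):

```matlab
statind(dir_data, dir_output, ...
    'StatName',         {'S','K'}, ...
    'StatFunction',     {@skewness, @kurtosis}, ...
    'Preprocess',       {'no','ULF_A'}, ...
    'FilterByDatetime', [datetime(2012,1,1), datetime(2020,12,31)], ...
    'SkipExist',        true, ...
    'LoadFilters',      dir_filter);
```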
statind_parfor
is the parallel computing version of statind; it takes the same input arguments and keyword options as statind.
Keyword Arguments:
'NumCore': Assign how many CPU workers to use in the parallel pool.
Default is 0, in which case the number of CPU cores for parallel computing is decided automatically.
statind_merge
statind_merge(dir_stat1,dir_stat2) merges the statistical indices of dir_stat2 into dir_stat1. Where there is overlap, contents in dir_stat1 have higher priority to be preserved than those in dir_stat2. Only files in dir_stat1 will be modified.
anomalyind
anomalyind(dir_stat,dir_output) calculates the anomaly index number (AIN) according to a list of $A_\text{thr}$; dir_output is where the output variables are stored.
Example:
anomalyind(dir_stat,dir_tsAIN);
Keyword Arguments:
'AthrList'
The list of $A_\text{thr}$ (the multiplier of the median value of the statistical indices within the moving time window).
Default is [1:10].
'MovingWindow'
The moving-window length for calculating the median value mentioned above.
Default is 1000 (days).
Input and output:
anomalyind takes the statistical index output of statind by loading it from dir_stat.
The calculated anomaly index is saved in dir_tsAIN.
molscore
molscore(dir_tsAIN,dir_catalog,dir_molchan) is responsible for a single station's TIP & Molchan score calculation.
Example:
dir_catalog = '\MagTIP-2021\spreadsheet';
dir_tsAIN = '\MagTIP-2021\output_var\tsAIN-J13-TZ2';
dir_molchan = '\MagTIP-2021\output_var\MolchanScore-J13-TZ2';
molscore(dir_tsAIN,dir_catalog,dir_molchan);
Keyword Arguments:
'TrainingPhase'
Assigns a (set of) training phase(s). It should be of type 'calendarDuration', 'duration', an N by 2 array of 'datetime', or an N by 2 cell array, where N is the number of training phases. If given an N by 2 array specifying N training phases, N sets of results are produced separately, with the output format '[MolScore]stn[%x]ID[%s]prp[%s]dt[%s-%s].mat'.
For example, a 4 by 2 datetime array reshape(datetime(2009:2016,2,1),[],2) specifies the start and end dates of 4 training phases, with the first column being the start datetime and the second column the end of each training phase.
For example, a 3 by 2 cell array
For example, a 3 by 2 cell array
{calyears(7),datetime(2012,11,11);...
calyears(7),datetime(2011,11,11);...
calyears(7),datetime(2010,11,11)};
specifies the end days of the training phases as 2010-Nov-11, 2011-Nov-11 and 2012-Nov-11, each with a 7-year-long training period (i.e. calyears(7)). If the duration is negative (e.g. -calyears(7)), the datetime in the second column becomes the first day of each training phase.
Default is calyears(7), which sets the end day of the training phase to the day before the last day of the statistical indices or anomaly index number, with a length of 7 years. That is, by default it "trains" and gives the best models according to the most recent 7 years of data.
'modparam'
Specifies the model parameters for grid search. It should be a cell array; all elements of the cell array are passed directly into modparam() as input arguments.
For example, {'Athr',[1:2:10],'Rc',[20, 30]}.
Default is {}.
'AdditionalCheck'
Applies some additional checks and tests. This option is for developers.
Default is false.
Input and output:
molscore takes anomaly indices from dir_tsAIN, and the earthquake catalog and station information from dir_catalog.
The output ranked models are saved in dir_molchan.
molscore_parfor
is the parallel computing version of molscore; it takes the same input arguments and keyword options as molscore.
molscore3
molscore3 calculates the joint-station TIP, the Molchan score between EQK and TIP, the hit rate, and the earthquake probability. The calculation is based on the optimized models given by molscore.
Example:
dir_catalog = '\MagTIP-2021\spreadsheet';
dir_tsAIN = '\MagTIP-2021\output_var\tsAIN-J13-TZ2';
dir_molchan = '\MagTIP-2021\output_var\MolchanScore-J13-TZ2';
dir_jointstation = '\MagTIP-2021\output_var\JointStation-J13-TZ2';
molscore3(dir_tsAIN,dir_molchan,dir_catalog,dir_jointstation)
Keyword Arguments:
'ForecastingPhase'
It can be an N by 2 datetime array with its size identical to the training phases (trnphase), specifying the time ranges of N forecasting phases.
It can be a 'calendarDuration' or 'duration', say T. In this case, if the last day of the training phase is trnphase(i, end), the forecasting phases start from trnphase(i, end) + 1 and end at trnphase(i, end) + T.
It can be 'auto'. In this case, forecasting phases are set to be as long as possible, i.e. until the last day where tsAIN is available.
Default is one calendar year: calyears(1).
Also see formatForecastingPhase().
'OverwriteFile'
Whether to overwrite existing output files.
Default is true.
'ModelSelect'
see bestmodel()
'ModelSelectOP'
see bestmodel()
'ChooseBest'
Defines the maximum number of best models to apply for each station.
Default is 10, meaning the models with the top 10 fitting degrees of each station are picked for calculating the predicted TIP(s).
'CombinationNumber'
Defines the total number of random combinations among the best models of each station for joint-station TIP calculation.
Default is 500: for every station, a random permutation of the ranking numbers (based on fitting degree) of the best models is performed, with each sequence of ranking numbers having 500 elements; the ith joint-station model-parameter combination is then formed from the best models of each station indexed by the ith element of each permutation.
Input and output:
molscore3 takes anomaly indices from dir_tsAIN, the earthquake catalog and station information from dir_catalog, and the ranked models from dir_molchan.
The output probability is saved in dir_jointstation.
molscore3_parfor
is the parallel computing version of molscore3; it takes the same input arguments and keyword options as molscore3.
For a new type of input data you have to add new properties in fmt and new cases in fmt's methods.
fmt
is a class that stores all constant names and variables, enabling you to modify or extend some constants without breaking other functionality.
MagTIP depends on fmt to load a variety of intermediate data, including the earthquake catalog, the station information table, and input data of the standard format. fmt saves constant names and variables as properties (Constant) so that they are available in global scope. fmt also has static methods for acquiring information such as the column indices indicating the time vectors and recorded values of the input data of different types.
properties (Constant):
InfoFileName_molscore = '[MolchanScore]Information.mat': name of the intermediate file in the training session (model optimization)
InfoFileName_anomalyind = '[tsAIN]Information.mat': name of the intermediate file in the session of calculating anomaly indices
InfoFileName_jointstation = '[JointStation]Information.mat': name of the intermediate file in the joint-station method session
InfoFileName_statind = '[StatisticIndex]Information.mat': name of the intermediate file in calculating statistical indices
catalogFileName = 'catalog.mat': file name of the earthquake catalog
stationLocationFileName = 'station_location.mat': name of the file of the station information table
tag_upperlowerthreshold = 'ULthr': tag for the upper and lower thresholds defined by Athr
datestrFormat = 'yyyymmdd': the format for converting datetime to string tags (dt)
datetimeInputFormat = 'yyyyMMdd': the format for converting date strings (with tag dt) to datetime
datetimeIncrement: the increment of datetime between two consecutive files of the same station and type
LatitudeLimit = [21.5 26]: latitude limits of the Taiwan area
LongitudeLimit = [118 122.5]: longitude limits of the Taiwan area
datatype_mag_1 = {'mFull', 'full'}: "type" tag for the full-field geomagnetic data (old geomagnetic stations from 2006; 'full' is for backward compatibility)
datatype_mag_3 = {'mTri', 'tri'}: "type" tag for the 3-component geomagnetic data (new geomagnetic stations from 2020; 'tri' is for backward compatibility)
datatype_GEMS_HJ: "type" tag for the data of Hong-Jia Chen 2018
datatype_GEMS: "type" tag for the primitive geoelectric data
methods (Static):
colindex2data(what_type): returns the column indices of the time and data parts of the loaded matrix. Example: [colind_time, colind_data] = colindex2data('GEMS0')
filterRange(filter_tag): returns two values indicating the lower and upper limits of the filter, in Hz. Example: fmt.filterRange('ULF_A')
datasampfreq(what_type): returns the expected sampling frequency of a specific file/data type. Example: fs = datasampfreq('GEMS0'). You can obtain the expected number of data points per day by supposedpts = fs*86400.
fmtfieldname = datatype_fieldname(what_type): returns the field name for fmt so that you can call fmt.(fmtfieldname).
[fname] = standarddataname(stnm, dtstr, type_j): returns the file name of the standard data format. For example, standarddataname('HUAL', '20120202', 'GEMS0') returns stn[HUAL]dt[20120202]type[GEMS0].mat.
Subfunctions are those a user normally does not use directly. However, if you are a developer, the following information will be very helpful.
convAIN
Calculates the total number of anomaly indices of each day. That is, if we have two statistical indices S and K, the total number of anomaly indices is less than or equal to 2. It is the 'update_tsAIN' of the previous version.
Example:
[sum_tsAIN, sum_validateIndex] = convAIN(tsAIN.Athr, tsAIN.validateIndex)
For each Athr we have one anomaly index (true or false) and one validation index (true or false) per day per variable, and the output is the number of "trues" per day. Usually Athr = 1:10, so the output is a 10 by 1 cell array, each cell containing an N by 1 double; N is the length of the DateTime series.
Input arguments:
tsAIN.Athr or tsAIN.validateIndex: a 1-by-10 structure array. See anomalyind for more information.
Output arguments:
sum_tsAIN or sum_validateIndex: a 10-by-1 cell array corresponding to $A_{thr} = 1:10$. In each cell there is an N-by-1 double array, being the sum of anomaly indices or the sum of valid indices over N days.
sumanomalyind
[DateTimeSAT,vlSAT] = sumanomalyind(DateTime_j,sum_tsAIN_k,Nthr_array,Tobs_i) calculates the sum of anomaly days (vlSAT) and its corresponding datetime series (DateTimeSAT). It is simply the moving-window sum of the sum_tsAIN_k timeseries, where the moving-window length is Tobs_i; here, i stands for the $i_{th}$ model, j for the $j_{th}$ station, and k for the $k_{th}$ threshold $A_{thr}$. Naturally, the output timeseries (DateTimeSAT) has Tobs_i - 1 fewer elements than the input timeseries (DateTime_j or DateTime_dti); i.e., length(sum_tsAIN_k) - length(vlSAT) = Tobs_i - 1.
Input arguments:
DateTime_j: the datetime array for sum_tsAIN.
sum_tsAIN_k: the sum of anomalous statistical indices of each day.
Nthr_array: the thresholds for sum_tsAIN_k calculated according to NthrRatio. Nthr is known as the threshold of the total daily number of anomalous statistical indices.
Tobs_i: the observation window of model i.
Output arguments:
DateTimeSAT: M-by-1 datetime array for the sum of anomaly days.
vlSAT: M-by-1 double array of the values of the sum of anomaly days (the anomaly days are summed in the moving window of Tobs).
See also:
convAIN for more details about sum_tsAIN_k.
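The core of sumanomalyind is a trailing moving-window sum; a minimal, self-contained sketch (with made-up data) of the length relation stated above:

```matlab
Tobs_i = 7;
sum_tsAIN_k = randi([0 2], 40, 1);          % made-up daily anomaly counts
vlSAT = movsum(sum_tsAIN_k, [Tobs_i-1 0]);  % trailing window of length Tobs_i
vlSAT = vlSAT(Tobs_i:end);                  % drop the Tobs_i-1 partial sums
% length(sum_tsAIN_k) - length(vlSAT) == Tobs_i - 1
```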
dt_TIP_true
[dt_TIPtrue,TIPtime,TIP] = dt_TIP_true(DateTimeSAT,vlSAT, Tthr,Tlead,Tpred) returns the datetimes where TIP is true.
Input arguments:
DateTimeSAT: M-by-1 datetime array for the sum of anomaly days.
vlSAT: M-by-1 double array of the values of the sum of anomaly days (the anomaly days are summed in the moving window of Tobs).
Tthr: the threshold related to the number of anomaly days in Tobs.
Tlead: the leading window right before the prediction window.
Tpred: the prediction window (where TIP is true or false according to the number of anomalies in the corresponding Tobs window).
Output arguments:
dt_TIPtrue: the datetimes where TIP is true.
TIPtime: an N-by-1 datetime array paired with the TIP array.
TIP: an N-by-1 logical array of the Time of Increased Probability.
The variable dt_TIPtrue may exceed the last day of DateTimeSAT, while TIPtime is truncated in molscore so as not to exceed DateTimeSAT(end), because in the training phase any datetime outside DateTimeSAT is assumed to be out of the time range of the available earthquake catalog.
A schematic illustration of the input/output variables:
sum_validateIndex_all_a_dti (2nd input argument in sumanomalyind)
1100111011000001010010000000001010000000 (for example)
DateTime_dti tttttttttttttttttttttttttttttttttttttttt
Tobs length |-----| (Tobs=7, for example)
Tlead length |-------| (Tlead=9, for example)
vlSAT 5445543222122233222221112122344332 (for example)
DateTimeSAT index 123456789.........................
DateTimeSAT |Tobs |--------------------------------|
TIPtime/TIP |Tobs || Tlead ||--------------------------------------|
↑ |Tobs || Tlead ||Tpred|
Tpred start ↑ last AIN data ↑ Tpred end ↑
dt_TIP_valid
dt_TIP_valid gives the loosely valid TIP times and the strictly invalid TIP times. There are two possibilities for a TIP false: either the anomaly index number (AIN) is below the threshold, or the AIN is NaN, since any comparison with NaN results in false. The latter we call an 'invalid' TIP time/day.
Example:
[TIPtime1,TIP_valid,dates_TIPvalid_st,dates_TIPinvalid_st] = ...
dt_TIP_valid(DateTimeSAT,sum_validateIndex_all_a_dti,Tobs,Tlead,Tpred)
This gives the datetimes where there is at least one data point in the Tobs window, and hence a TIP false is a valid false.
Input arguments:
DateTimeSAT: see sumanomalyind().
sum_validateIndex_all_a_dti: the sum_validateIndex in convAIN().
Tobs: the observation time window right before the leading window (Tlead).
Tlead: the leading window right before the prediction window.
Tpred: the prediction window (where TIP is true or false according to the number of anomalies in the corresponding Tobs window).
Output arguments:
TIPtime1: a 1-d datetime array for TIP_valid.
TIP_valid: a 1-d logical array indicating whether the TIP is valid.
dates_TIPvalid_st: the datetimes where the TIP has a meaningful true/false.
dates_TIPinvalid_st: the datetimes where the TIP has a meaningless false. (If TIP is true, the day is always valid.)
About the meaningful and meaningless 'false', see the comments in anomalyind().
Recall that the sum of AIN is the sum of anomaly indices calculated in the moving time window of length Tobs, and the dates DateTimeSAT(sum_AIN_valid) denote the end days of Tobs windows in which at least one anomaly index is meaningful. The following example, with Tlead = 9, Tobs = 7 and Tpred = 3, explicitly demonstrates how the TIP's valid times are calculated:
sum_validateIndex_all_a_dti
'1100111011000001010010000000001010000000' (for example)
DateTime_dti tttttttttttttttttttttttttttttttttttttttt|
DateTimeSAT tttttttttttttttttttttttttttttttttt|
sum_AIN_valid 1111111111111111111110001111111110|
dt_AIN_valid ttttttttttttttttttttt ttttttttt |
|Tobs ||<--a--->| |<--b----->|
TIPtime tttttttttttttttttttttttttttttttttttt
Tpred moving window |c| |c|
| the same length as DateTimeSAT |
TIPvalid_0 123456789........................
\\\\\\\
TIPvalid_1 3456789..........................
Therefore,
dt_TIPvalid = unique(TIPtime([1:3,2:4,3:5,….]))
% t denotes a datetime object
% a = Tlead+1 = idinc2predstart
% b = Tlead+Tpred = idinc2predend
% c = Tred
% dt_AIN_valid is DateTimeSAT(sum_AIN_valid), and sum_AIN_valid is a.k.a Tobs_TF.
% TIPvalid_0 (_1) is a.k.a. TIP_TF_0 (_1)
eqktip1
[EQK,TIP, TIPtime] = eqktip1(eqMR,eqDateTime, DateTimeSAT,vlSAT, Tthr_iMod,Tlead_iMod,Tpred_iMod, Mc_iMod,Rc_iMod) provides 1-dimensional logical arrays of target earthquakes (EQK) and the Time of Increased Probability (TIP), corresponding to the 1-d datetime array TIPtime.
Input arguments:
eqMR: an N by 2 array of all combinations of model magnitudes and radii of detection, with the first column being the magnitude and the second the radius of detection.
eqDateTime: an N by 1 cell array. Each cell contains the datetimes of the target earthquakes satisfying the corresponding row of eqMR; i.e., magnitude larger than eqMR(i,1) and within the radius eqMR(i,2).
DateTimeSAT, vlSAT: see sumanomalyind.
Tthr, Tlead, Tpred: see dt_TIP_true.
Mc_iMod: the current model magnitude.
Rc_iMod: the current model radius of detection.
Output arguments:
EQK: an M-by-1 logical array indicating whether there are target earthquakes on the day.
TIP: an M-by-1 logical array of the Time of Increased Probability.
TIPtime: an M-by-1 datetime array paired with the TIP and EQK arrays.
See also:
[~,eqMR,eqDateTime] = isearthquakeinrange(...) in molscore
[DateTimeSAT,vlSAT] = sumanomalyind(...) in molscore
jointstation
jointstation calculates hit rates, alarmed rates, and the spatial probability.
Example:
When it is applied for calculating the hit rates of the training phase:
HitRates_trn = jointstation(BestModels,BestModelNames,idPermute,...
sum_tsAINs,sum_validateInds,DateTime_dti,...
CWBcatalog,'CalculateHitRate',true);
When it is applied for calculating the forecasting probabilities:
[HitRates_frc, AlarmedRates_frc, SpatialProbability, xLon, yLat, ...
validStationTime, ProbTime, TIP3, TIPvalid3, EQKs]...
= jointstation(BestModels,BestModelNames,idPermute,...
sum_tsAINs,sum_validateInds,DateTime_dti,...
CWBcatalog,'CalculateHitRate',true,...
'CalculateAlarmedRate',true,'CalculateProbability',HitRates_trn,...
'StationLocation',StationLocation);
Keyword Arguments:
'CalculateHitRate': Whether to calculate the hit rates. Default is false.
'CalculateAlarmedRate': Whether to calculate the alarmed rates. Default is false. If 'CalculateProbability' is true, 'CalculateAlarmedRate' is automatically set to true, since alarmed rates are essential to the probability calculation.
'CalculateProbability': Whether to calculate the spatial probability. Default is false.
'StationLocation': A J by 3 table with column names 'format', 'Lon', 'Lat' and row names (i.e. StationLocation.Properties.RowNames) being the station codes (e.g. 'TW', 'CS', ...) of the total of J stations. If 'CalculateAlarmedRate' is true, 'StationLocation' must be specified; otherwise it is impossible to obtain the alarmed rates. Default is 0.
Input arguments:
BestModels: a J-by-1 cell array containing the best models of a total of J stations. In each cell there is a table of model parameters.
BestModelNames: a J-by-1 cell array containing the station names corresponding to BestModels (also a J-by-1 cell array).
idPermute: the randomly permuted indices of the best models. See bestmodel().
sum_tsAINs: a J-by-1 cell array with each cell corresponding to BestModelNames. See loadAIN().
sum_validateInds: a J-by-1 cell array with each cell corresponding to BestModelNames. See loadAIN().
DateTime_dti: a T by 1 datetime array for the time series in sum_tsAINs and sum_validateInds.
CWBcatalog: a table of the earthquake catalog. You may load CWBcatalog using only1field([dir_catalog filesep 'catalog.mat']).
Output arguments:
HitRates
, AlarmedRates
: M by 1 array, being the hit rates (alarmed rates) of each combination of best models. M is the amounts of total combinations (the totalM
in bestmodel()
). They are AlarmedRateForecasting
and HitRatesForecasting
of molscore3
's output.
SpatialProbability
: A S by T array of the temporal-spatial forecasting probabilities. It is the output variable Probability
of molscore3()
.
xLon
(yLat
): The longitudes (latitudes) of all spatial points as a S by 1 array. It is the output variable ProbabilityLon
(ProbabilityLat
) of molscore3()
.
validStationTime
: A T by J table of the ratio of valid models each day each station. This is for plotProbability()
, that the ratio will be demonstrated as gradient color in the marker of each station. If the ratio is zero, the marker on map will be hollow. It is the output variable validStationTime
of molscore3()
.
TIP3
: A S by T TIP array for the first model (out of the total M models). Since the first element of each cell in idPermute
is always 1, indicating the best model, TIP3
is essentially the TIP array calculated according to the best model parameter.
TIPvalid3
: A spatial-temporal logical array indicating whether a point (s,t) of TIP true or false is meaningful or not. This is also a S by T array.
ProbTime
: A 1 by T datetime array for TIP3
, TIPvalid3
and SpatialProbability
. This is the ProbabilityTime
of molscore3
's output.
EQKs
: The list of target earthquakes, with columns
time
, Lon
, Lat
, Depth
, Mag
: The time, longitude, latitude, depth and magnitude ($M_L$) of the target earthquake.
InStation
: Indicates which station the target earthquake belongs to.
EqkDist
: The distance between the station (as specified by InStation
) and hypocenter of the target earthquake. Also see simpleStationEQKdist3D
.
loadAIN
[sum_tsAINs,sum_validateInds,DateTime_dti,AthrList] = loadAIN(dir_tsAIN,BestModelNames,TimeRange)
loads tsAIN, calculates the daily summation of the anomaly index, and truncates the time series to TimeRange
.
If class(BestModelNames) == 'char'
, for example, 'CS', the outputs sum_tsAINs
and sum_validateInds
are 10 by 1 cell arrays directly, instead of being nested inside a J by 1 cell array with J = 1.
Input arguments:
dir_tsAIN
: the directory for the saved AIN data. See anomalyind()
.
BestModelNames
: a J-by-1 cell array containing station names corresponding to BestModels
(also J-by-1 cell array).
TimeRange
: a two-element datetime array.
Output arguments:
sum_tsAINs
: a J-by-1 cell array with each cell corresponding to BestModelNames
. In each cell there is a 10-by-1 cell corresponding to the list of $A_{thr}$ (AthrList = 1:10
), being the timeseries of the summation of anomaly index of each day.
sum_validateInds
: a J-by-1 cell array with each cell corresponding to BestModelNames
. In each cell there is a 10-by-1 cell corresponding to the list of $A_{thr}$ (AthrList = 1:10
), being the summation of the indicators for the valid statistical indices of each day.
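A minimal usage sketch (the time range and the station name 'CS' are illustrative; dir_tsAIN must already contain the output of anomalyind()):

```matlab
% Daily anomaly-index sums for a single station over one year:
TimeRange = [datetime(2012,1,1), datetime(2012,12,31)];
[sum_tsAINs, sum_validateInds, DateTime_dti, AthrList] = ...
    loadAIN(dir_tsAIN, 'CS', TimeRange);
% Since BestModelNames is given as a char array ('CS'), sum_tsAINs is a
% 10-by-1 cell array directly, one cell per Athr in AthrList (1:10).
```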
bestmodel
[BestModels,BestModelNames,idPermute,molList] = bestmodel(molList,BestN,totalM,StationLocation)
filters out the best N models based on fitting degree and performs a random permutation of M combinations of them. That is, bestmodel()
saves the best N (at most) models for each station in BestModels
, and gives an array of M elements that are randomly picked from the set of ranking numbers of the best N (at most) models.
Input arguments:
molList
: The list of paths to the files of ranked models, which are produced by molscore
. Get the molList
using datalist(...)
.
BestN
: The number of models with the highest Molchan scores to keep.
totalM
: The number of randomly permuted combinations of the best N models. That is, M is the number of joint-station models. Note that the first model combination is forced to be the combination of the best model of each station.
StationLocation
: A J by 3 table with the column names being 'format', 'Lon', 'Lat' and row names (i.e. StationLocation.Properties.RowNames
) being the station codes (e.g. 'TW', 'CS', ...) of the total J stations.
Output arguments:
BestModels
: A J-by-1 cell array containing the best models of the total J stations. In each cell there is a table of model parameters.
BestModelNames
: A J-by-1 cell array containing station names corresponding to BestModels
(also J-by-1 cell array).
idPermute
: The randomly permuted indices for the best models. It is a J by 1 cell array, where each cell contains an M by 1 double array of indices into the corresponding BestModels
. Note that the first element in the M by 1 double array is always 1
, which forces the first model combination in jointstation()
to be the combination of the best model of each station (SEE molscore3()
).
molList
: The list of paths to the files of ranked models, excluding the files whose Molchan scores are all NaN (hence whose model parameters cannot be ranked).
Keyword Arguments:
'NearbyThreshold': For some reason (e.g. no target earthquakes at all in the training phase), models may not be rankable (optimizable) for a station (say 'MS') because all Molchan scores are NaN. In this situation, bestmodel
will try to find a nearby station within 15 km (by default). If there is, for example, a station 'PT' within 15 km of 'MS', then the ranked models of 'PT' will be taken instead. That is, in the subsequent process in molscore3
, the best models of 'PT' and the AIN based on the geomagnetic data of 'MS' are applied for calculating the probabilities around the station of 'MS'.
'ModelSelect': You may filter the table of Molchan scores by using this option. For example, with 'ModelSelect', {'Tpred', 5}
, all rows whose Tpred
is not 5
will be discarded.
'ModelSelectOP'
: The operator function for model selection if 'ModelSelect'
has multiple pairs of input arguments.
For example, with 'ModelSelect', {'Tpred', 5, 'Tlead', 5}, 'ModelSelectOP', @or
, all rows with Tpred=5
or Tlead=5
are kept; the others are discarded.
Default is @or
.
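The following sketch shows a typical call (the datalist pattern here is illustrative; adapt it to your file naming):

```matlab
% Collect the ranked-model files produced by molscore, then keep at
% most the best 10 models per station and build 100 joint-station
% combinations, considering only models with Tpred = 5:
molList = datalist('[Molchan]*.mat', dir_molchan).fullpath;  % pattern is illustrative
[BestModels, BestModelNames, idPermute, molList] = ...
    bestmodel(molList, 10, 100, StationLocation, ...
              'ModelSelect', {'Tpred', 5});
```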
checkstation
[isvalid, msg] = checkstation(dir_catalog)
checks if station_location.mat/csv
exists in dir_catalog
, and converts station_location.csv
to station_location.mat
. After station_location.mat
is successfully created, the original station_location.csv
will be moved to the folder 'original_csv'.
If the station_location.csv
does not meet the required format, an error will occur. The station_location.csv
has to satisfy the following conditions:
The 'time' variable should be in the following format: 'yyyy/MM/dd
HH:mm'.
All variables except 'time' should belong to the class 'double'.
Basically, the following headers (column names),
{'code','format','Lon','Lat'} have to exist. In which,
'code' is the code for the station, whereas 'format' is the full name of the station. For example, 'MS' (code) corresponds to '馬仕' (format).
'Lon' and 'Lat' are longitude and latitude of the station respectively.
Aliases for the column names in the station_location.csv
are allowed; see the Aliases section below.
Aliases: For convenience, aliases of the column name of a variable in station_location.csv
will be automatically converted:
For longitude and latitude, either {'lon', 'longitude', 'Longitude'},
{'lat', 'latitude', 'Latitude'} will be converted to 'Lon' and 'Lat', respectively.
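For reference, a minimal station_location.csv satisfying the conditions above might look like the following sketch (the coordinates are illustrative, not the real station location):

```csv
code,format,Lon,Lat
MS,馬仕,121.00,24.60
```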
checkcatalog
[isvalid, msg] = checkcatalog(dir_catalog)
checks if catalog.mat/csv
exists in dir_catalog
, and converts catalog.csv
to catalog.mat
. After catalog.mat
is successfully created, the original catalog.csv
will be moved to the folder 'original_csv'.
If the catalog.csv
does not meet the required format, an error will occur. The catalog.csv
has to satisfy the following conditions:
The 'time' variable should be in the following format: 'yyyy/MM/dd
HH:mm'.
All variables except 'time' should belong to the class 'double'.
Basically, the following headers (column names),
{'time','Lon','Lat','Mag','Depth'}, have to exist.
Aliases for the column names in the catalog.csv
are allowed; see the Aliases section below.
Aliases: For convenience, aliases of the column name of a variable in catalog.csv
will be automatically converted:
For earthquake magnitudes, the header of either {'Magnitude', 'magnitude', 'ML'} will be automatically converted to 'Mag'.
For depths, the header of either {'depth', 'dep', 'Dep'} will be converted to 'Depth'.
For longitude and latitude, either {'lon', 'longitude', 'Longitude'},
{'lat', 'latitude', 'Latitude'} will be converted to 'Lon' and 'Lat', respectively.
For convenience, in catalog.csv
, if the event time is written in two separate columns 'date' and 'time', with format 'yyyy-mm-dd' (or 'yyyy/mm/dd') for 'date' and 'hh:MM:ss' (or 'hh:MM') for 'time', they will be merged into a single 'time' variable satisfying the format mentioned before.
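For reference, a minimal catalog.csv satisfying the conditions above might look like the following sketch (both rows are illustrative):

```csv
time,Lon,Lat,Mag,Depth
2013/06/02 13:43,120.97,23.86,6.3,14.5
2013/10/31 20:02,121.35,23.57,6.4,15.0
```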
modparam
In MagTIP, model parameters define the TIP and target earthquakes (EQK). [PredParam,varargout]=modparam()
generates the whole set of model parameters that are going to be used in the training phase by molscore
.
Keyword Arguments:
'Rc': Radius (range) of detection in kilometers. Default is [20:10:100].
'NthrRatio':
The ratio threshold, which is converted into the threshold number of anomalous statistic indices per day.
'NthrRatio' should be calculated automatically in molscore()
according to the maximum possible number of statistic indices among the anomaly index calculated by anomalyind()
.
Nthr
is the threshold (integer) for the number of anomalous statistic indices each day.
Default is [0.01, 0.99]
, which is for the situation where the maximum possible Nthr
is $\leq 2$ (i.e., to be [0, 1]). If you have the maximum possible Nthr
to be $\leq 3$ (i.e., Nthr = [0 1 2]
), 'NthrRatio' should be assigned to be [0.01, 0.49, 0.99]
for example.
'Ptthr': Ptthr
defines Tobs
through Ptthr = Tthr/Tobs
. Default is [0.1:0.1:0.5]
.
'Tobs': The length of observation moving window. Default is [5:5:100]
.
'Tpred': The length of prediction moving window.
'Tlead': The length of the leading window, the time gap accounting for the period of silence (no anomalies) right before big earthquakes.
Default is [0:5:100].
'Mc': The magnitude threshold. Default is 5
, which means only events of $M_L \geq 5$ will be considered as target earthquakes.
'Test':
If true or 1, then the PredParam is reduced to 1000 randomly picked models from the default combination.
If it is a positive integer N other than 1, PredParam is reduced to N randomly picked models.
Default is 0.
'Seed': the seed for the random selection when 'Test' is enabled.
The seed must be a nonnegative integer less than 2^32.
If the seed is less than zero, the random selection gives a different result every time.
Output:
PredParam
: The GEMSTIP/MagTIP models of total 8 free parameters: G = [Mag,Rad,NthrRatio,Athr,Tthr,Tobs,Tpred,Tlead]
.
Examples: - [PredParam, columnNames] = modparam('OutputFormat','double');
- [PredParam_table] = modparam('OutputFormat','table');
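As a further sketch, the keyword options above can be combined to narrow the parameter grid (the values here are illustrative, not recommendations):

```matlab
% A reduced parameter grid: fewer radii, a higher magnitude
% threshold, and a random subsample of 500 models for testing.
[PredParam, columnNames] = modparam( ...
    'Rc',   20:20:100, ...   % detection radii in km
    'Mc',   5.5, ...         % only M_L >= 5.5 as target earthquakes
    'Test', 500, ...         % keep 500 randomly picked models
    'Seed', 42, ...          % reproducible random selection
    'OutputFormat', 'double');
```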
None of the Tools is necessary for the MagTIP algorithm; they are provided, for example, to demonstrate the results in figures or to generate/select directories in a convenient way.
plot_dataoverview
plot_dataoverview(dir_stat, dir_catalog)
plots an overview of all data.
Keyword Arguments:
'SaveFigureTo': the output path
'BackgroundColor': Background color; default is white ([1,1,1]
)
'DatetimeTicks': The format of datetime ticks on the x-axis.
Default is {'yyyy-mm',6,'Months'}
, which means tick labels are in the format of yyyy-mm
and spaced every 6 months.
plotProbability
plotProbability(dir_jointstation,dir_catalog,dir_out)
plots two-dimensional probability forecasting maps; the results are saved as png files in dir_out
.
Keyword Arguments:
'LongitudeLimit': The longitude limits of the displayed map. Default is [119 122.5]
.
'LatitudeLimit': The latitude limits of the displayed map. Default is [21.5 26]
.
'TimeRange': Manually assign time ranges or specific dates, which must lie in the forecasting phase, for plotting the probability forecast. If not set, the probability forecasts of all available dates will be plotted. The assigned datetime array must be either:
An N by 2 datetime array specifying N ranges at one time.
An N by 1 datetime array specifying N days to be plotted.
'PlotEpicenter': Whether to plot the epicenter(s) of the target
earthquakes on the map. Use 'PlotEpicenter','all'
to also plot events not within Rc
.
'Rigorous': Whether to drop (ignore) the probability forecasts that are not in the range of the forecasting phases. Default is true. Generally the leading time window (Tlead
) differs among models, and the probability can be calculated as far as the Tlead
of at least one model covers. However, for dates outside the forecasting phase, only part of the models are applied, and thus the probability forecast is weaker.
'TargetPattern': the pattern for filtering the files in dir_jointstation
. Default is to look up all files that fit the pattern "*[JointStation]*.mat". For example, you can manually specify it as "*[ULFA]" to give plots based on ULFA only. The pattern should always begin with "*" and the last character can never be "*". For multiple filters at once (e.g., '*[ULF_B]*[Tpred-1]'
), join the patterns with "*"; the order is important.
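A usage sketch combining the options above (the directories and dates are illustrative):

```matlab
% Probability maps for the first quarter of 2020, based on ULF_B
% results only, with all target epicenters drawn:
plotProbability(dir_jointstation, dir_catalog, dir_out, ...
    'TimeRange',     [datetime(2020,1,1), datetime(2020,3,31)], ...
    'TargetPattern', '*[ULF_B]', ...
    'PlotEpicenter', 'all');
```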
plotFittingDegree
plotFittingDegree(dir_jointstation,dir_catalog,dir_png)
gives fitting degree analysis and molchan diagrams for each training-forecasting phase pair according to the results from molscore3
.
plotEQKTIP1
plotEQKTIP1(dir_tsAIN,dir_molchan,dir_catalog,dir_output)
plots one-dimensional TIP time series for each station with target earthquakes (EQK) scattered over the diagram; the results are saved as png files in dir_output
.
Keyword Arguments:
'ShowTrainingPhase': Whether to also show the EQK and TIP of the training phase on the plot. Default is false.
'scatter': Set it true
to plot an additional layer of TIP==1
as scattered circles. This makes visually inspecting whether target earthquakes lay in the area of TIP==1
much easier. Default is false
.
'Rank': Choose the model by fitting degree ranking for defining EQK and TIP. Default is 1
(plot the EQK and TIP that are defined by the rank 1 model).
'OnlyStations': Specify a subset of stations to be displayed on the plot; for example, with 'OnlyStations',{'MS','TW'}
only the results of these two stations will be displayed. By default, results of all stations are displayed.
'datetimeTickArguments': Manually define the datetime format of the X tick labels. For example, for 'datetimeTickArguments', {"yy' mmm.", 1,'months'}
the datetime 2012-Nov-12 will be printed as "12' Nov.". For more information, see datetime_ticks()
.
'TargetPattern': See the documentation in plotProbability()
.
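A usage sketch (the station codes are illustrative):

```matlab
% TIP time series of the rank-1 model for stations 'MS' and 'TW',
% with TIP==1 additionally drawn as scattered circles:
plotEQKTIP1(dir_tsAIN, dir_molchan, dir_catalog, dir_output, ...
    'OnlyStations', {'MS', 'TW'}, ...
    'scatter', true, ...
    'Rank', 1);
```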
[Molchan]Information.mat (constructMolInfo)
constructMolInfo(dir_molchan,InfoId)
constructs the '[MolchanScore]Information.mat' according to existing [Molchanscore]___.mat files if possible. Use this function only when instructed to do so by error messages.
calcFittingDegree
calcFittingDegree(jpathlist)
according to the given files (jpathlist
) provides the overall alarmed rate and missing rate, which allow the calculation of the overall fitting degree. Make sure to provide a correct input list of the [JointStation]
variables, for example, those that have the same ID and do not overlap in forecasting time interval for each group; otherwise the calculated fitting degree can be unreasonable.
Example:
dir_png = 'D:\GoogleDrive\0MyResearch\CWB_project\CWB2021\Figures';
jpathlist = datalist('[JointStation]ID[ou7ud]prp[ULF_B]*.mat', dir_jointstation).fullpath;
[AlarmedRate, MissingRate, xticklabel, EQKs, TIP3s, TIPv3s,TIPTimes,LatLons] = calcFittingDegree(jpathlist);
FittingDegrees = 1 - AlarmedRate - MissingRate;
plotEQKTIP3(dir_png, prp_i, xticklabel, EQKs, TIP3s, TIPv3s, TIPTimes, LatLons);
Input Arguments:
jpathlist
: a cell array of the full paths of [JointStation]
files that are produced by molscore3
. You can simply obtain the path list by jpathlist = datalist('[JointStation]ID[ou7ud]*prp[ULF_A]*slc[Tpred-10]*.mat',dir_jointstation).fullpath;
Keyword Arguments:
'GroupTag': The tag for grouping the files in jpathlist
.
'GroupNames': Results are calculated separately according to the assigned group names; alarmed rate, missing rate and so on are only calculated if the file name contains the assigned group names. For example, for ...,'GroupNames',{'Tpred-1', 'Tpred-5'},'GroupTag', 'slc',...
, only the files with their names containing tag 'slc[Tpred-1]' and 'slc[Tpred-5]' are selected, and the results of those being 'Tpred-1' and 'Tpred-5' are separately calculated. That is, the output xlabel
is {'Tpred-1', 'Tpred-5'}
and other output arguments (e.g. AlarmedRate
) are all cell array of the same dimension as xlabel
containing the results of the groups that are calculated separately.
Note that you cannot assign 'GroupNames' without assigning 'GroupTag', but assigning 'GroupTag' without 'GroupNames' is OK; in this case the group names will be automatically generated and sorted.
get_modelparam
)Use [var1, var2, ...] = get_modelparam(BestModels, 'name1', 'name2',...)
to get specific model parameters uniquely. BestModels
is the output of bestmodel()
.
Example:
[Tpreds, Tleads] = get_modelparam(BestModels, 'Tpred', 'Tlead')
Molchan_CB
[molt_cb,moln_cb] = Molchan_CB(N,alpha)
gives the confidence boundary in the Molchan diagram, where [molt_cb,moln_cb]
are the points defining the boundary on the alarmed-rate-against-missing-rate phase plane.
Input Arguments:
N
: total number of target events/earthquakes
alpha
: 1-alpha
is the confidence level. For example, alpha=0.05
means 95%
confidence level.
Output Arguments:
molt_cb
: values corresponding to the alarmed rate
moln_cb
: values corresponding to the missing rate
Hint:
Calculate the $D$ (fitting degree) values by Dcb = 1 - molt_cb - moln_cb
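For example, the hint above can be carried out as follows (the value of N is illustrative):

```matlab
% 95% confidence boundary for N = 20 target earthquakes:
[molt_cb, moln_cb] = Molchan_CB(20, 0.05);
% Fitting-degree values along the confidence boundary:
Dcb = 1 - molt_cb - moln_cb;
```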
mkdir_default
In fact, all input/output directories can be arbitrarily assigned; we also provide a tool mkdir_default
to automatically generate an organized directory structure.
mkdir_default
creates folders by default and returns their directories with default variable names. The default structure is:
variable default folder name
╔═ dir_main ═╤═══════════════════════════════════════╗
║ ├─dir_stat = 'StatisticIndex' ║
║ ├─dir_tsAIN = 'tsAIN' ║
║ ├─dir_molchan = 'MolchanScore' ║
║ └─dir_jointstation = 'JointStation' ║
╚════════════════════════════════════════════════════╝
Example: [dir_stat, dir_tsAIN, dir_molchan, dir_jointstation] = mkdir_default(fullfile(pwd,'output_var'))
dirselectassign
dirselectassign(var_names...)
prompts the user to select directories in dialog boxes, and assigns the selected paths to the workspace with default variable names. If a variable with the same name as the default name already exists in the workspace, its assignment will be skipped (i.e. its dialog box won't pop out). This is a tool for convenience. You can always assign directories explicitly to variables with any names you like.
Example:
Four windows will pop out one by one allowing you to assign directories to variables dir_stat
, dir_tsAIN
, dir_molchan
, dir_jointstation
:
dirselectassign('dir_stat','dir_tsAIN','dir_molchan','dir_jointstation');
(Read the printed message in the Command Window)
In total, 7 windows will pop out one by one, allowing you to assign directories to the variables with default names dir_stat
, dir_tsAIN
, dir_molchan
, dir_jointstation
, dir_data
, dir_catalog
, dir_toolbox
:
dirselectassign();
(Read the printed message in the Command Window)