Library

Warning

Please don’t assign undocumented name-value pair parameters to the functions. For example, 'CreateInfoOnly' and 'InParforLoop', which are preserved for parallel computing, and you will never need to set them manually unless you are a developer.

Main Functions

There are four major stages in the complete GEMS-MagTIP workflow, and each stage is wrapped by the main functions as below:

statind: The calculation of statistical indices.
anomalyind: The calculation of anomaly indices number according to statistical indices.
molscore: Training phase based on anomaly indices, which involves the calculation of single-station TIP, matching TIP by target earthquakes and calculate Molchan scores accordingly.
molscore3: Forecasting phase based on the model parameters of the highest Molchan scores, involving the calculation of joint-station TIP and probability based on the results of 2 and 3.

The four stages is wrapped by four functions with keyword arguments that we can customize some hyper parameters which are associated with model optimization and probability forecast.

Docstring

`statind`

statind(dir_data,dir_output) calculate daily statistics (a statistical quantity as an index of the day) of the daily timeseries in dir_data. The output variables are stored in dir_output.

Example:

statind(dir_data,dir_output,'StatName','S','StatFunction',@skewness, 'Preprocess', {'ULF_A','ULF_B'});

In which, dir_data is the directory for the time series in the standard format; dir_output is the dir_stat mentioned before.

Keyword Arguments:

'StatName'
- The abbreviations for the name of statistical indices. They can be arbitrarily defined but have to be the same number of elements as that of 'StatFunction'.
- Default is {'S','K'} (for Skewness and Kurtosis).
'StatFunction'
- The function handle for calculating statistical index.
- It has to be of the same number of elements as that of 'StatName'
- Default is {@skewness,@kurtosis} for calling the skewness() and kurtosis() functions.
'Preprocess'
- Apply filter(s) to time series loaded from dir_data.
- Default is {'no'}, which applies only minimal preprocessings.
- Use prpfunctions() to list all available preprocessing functions.
- If multiple preprocessing functions are applied, for example {'no', 'ULF_A'}, then two sets of result according to no-filter data and ULF_A band passed data are going to be produced.
'FilterByDatetime'
- It should be a two element datetime array.
- If applied, all files with date time tag not in the range will be ignored.
- Default is [datetime(0001,1,1), datetime(2999,12,31)], resulting in no data selection by date time tag.
'UpdateExisting'
- If true, the array size in the existing statistic indices (files starts with “[StatisticIndex]”) will be extended, and the DateTime in Information file in the dir_output will all be updated to cover the datetime of the data in dir_data. (Hint: use 'FilterByDatetime' to avoid re-calculate all data in the dir_data)
- Default is false; that a complete new session starts with the old Info renamed.
- Currently this option is not supported by statind_parfor.
- Noted that 'SkipExist' will be set to false if 'UpdateExisting' is true to avoid unexpected behavior (there must be something wrong that you update the array sizes of the existing files to adapt latest DateTime, but also skip the calculation of statistical indices for those files).
- Noted that 'UpdateExisting' not only extend the full date time list to the latest, but also to the earliest available data time obtained according to the data list of dir_data. Thus, you may like to apply 'FilterByDatetime' to prevent the update according to earlier pasts.
'SkipExist'
- Set it true to skip calculations if in the current loop a file in dir_stat of the
same 'stn' and 'prp' already exists.
- A typical scenario to use this option is when your last calculation is not completed yet.
- Default is false.
'LoadFilters', dir_filter
- Load all filters in dir_filter once for performance. See loadfilters, generalfilterprocess.

`statind_parfor`

statind_parfor is the parallel computing version of statind, it takes the same input arguments and keyword options as statind.

Noted that the option 'UpdateExisting', true is not supported in statind_parfor; besides this option, all documented statind keyword arguments are supported as well as the following additional Keyword Arguments:

'NumCore': Assign specific how many CPU workers to be used in the paralell pool.
Default is 0, that the number of CPU cores to be used for paralell computing is automatically decided.
'ReverseDatalist': Simply reverse the list of data. This might save time when you try to continue from previous unfinished calculation with the 'SkipExist' set true. The default is false.

`anomalyind`

anomalyind(dir_stat,dir_output) calculates anomaly index number (AIN) according to a list of \(A_\text{thr}\); dir_out is where the output variables stored.

Example:

anomalyind(dir_stat,dir_tsAIN);

Keyword Arguments:

'AthrList':
- the list of \(A_\text{thr}\) (the multiplier of the median value of the statistical indices within the moving time window)
- default is [1:10]
'MovingWindow'
- the moving-window length for calculating the median value mentioned above
- default is 1000 (days)

Input and output:

anomalyind takes the statistic index output from statind by loading them from dir_stat.
The calculated anomaly index is saved in dir_tsAIN.

`molscore`

molscore(dir_tsAIN,dir_catalog,dir_molchan) is responsible for single station’s TIP & Molchan score calculation.

Example:

dir_catalog = 'Workspace/GEMS-MagTIP-insider/spreadsheet';
dir_tsAIN = 'Workspace/var-output/tsAIN-J13-TZ2';
dir_molchan = 'Workspace/var-output/MolchanScore-J13-TZ2';
molscore(dir_tsAIN,dir_catalog,dir_molchan);

Keyword Arguments:

‘TrainingPhase’
- Assigns a (set of) training phase(s). It should be of type 'calendarDuration', 'duration', an N by 2 array of 'datetime', or an N by 2 cell array, where N is the number of the training phases. If given an N by 2 array specifying N training phases, then N sets of results will be produced separately, with output format being '[MolScore]stn[%x]ID[%s]prp[%s]dt[%s-%s].mat'.
- For example, a 4 by 2 datetime array reshape(datetime(2009:2016,2,1),[],2) specifies the start and end date of 4 training phases, with the first column being the datetime of the start and the second column being the end of each training phases.
- For example, a 3 by 2 cell array
```
{calyears(7),datetime(2012,11,11);...
 calyears(7),datetime(2011,11,11);...
 calyears(7),datetime(2010,11,11)};
```
  specifies the end day of the training phases as 2010-Nov-11, 2011-Nov-11 and 2012-Nov-11, all having a length of 7-year-long training period (i.e. calyears(7)). If the duration is negative (e.g. -calyears(7)), the datetime of the second column become the first day of each training phase.
- Default is calyears(7), which specifies the end day of training phase the day before the last day of statistical indices or anomaly index number, with a length of 7 year period. That is, in default it “trains” and gives the best models according to the most recent 7-year data.
‘modparam’
- Specify the model parameters for grid search. It should be a cell array; all elements in the cell array will be directly passed into modparam() as input arguments.
- For example, {'Athr',[1:2:10],'Rc',[20, 30]}.
- Default is {}.
‘AdditionalCheck’
- Apply some additional check and tests. This option is for developer.
- Default is false.
'SkipExist'
- Set it true to skip calculations if there is a file in dir_output having exactly the same name already (which means, having the same filter tag, station name and training phase).
- A typical scenario to use this option is when your last calculation was aborted manually or interrupted by an error event.
- Default is false.

Input and output:

molscore takes anomaly indices from dir_tsAIN and earthquake catalog and station information from dir_catalog.
The output ranked models are saved in dir_molchan.
In the "[MolchanScore]Information.mat", Info.DateTime is inherited from "[tsAIN]Information.mat" (whose DateTime is inherited from "[StatisticIndex]Information.mat")

`molscore_parfor`

molscore_parfor is the parallel computing version of molscore, it takes the same input arguments and supports all documented keyword arguments as molscore.

`molscore3`

molscore3 calculates the joint-station TIP, the Molchan score between EQK and TIP, the Hit rate, and the earthquake probability. The calculation is based on optimized model given by molscore.

Example:

dir_catalog = 'Workspace/GEMS-MagTIP-insider/spreadsheet';
dir_tsAIN = 'Workspace/var-output/tsAIN-J13-TZ2';
dir_molchan = 'Workspace/var-output/MolchanScore-J13-TZ2';
dir_jointstation = 'Workspace/var-output/JointStation-J13-TZ2';
molscore3(dir_tsAIN,dir_molchan,dir_catalog,dir_jointstation)

Keyword Arguments:

‘ForecastingPhase’
- Catalog will be filtered according to this parameter.
- It can be a N by 2 datetime array with its size identical to the training phases (trnphase), specifying the time ranges of N forecasting phases.
- It can be ‘calendarDuration’ or ‘duration’, saying T. In this case, saying the last day of the training phase is trnphase(i, end), the forecasting phases are set to start from trnphase(i, end) + 1, and end at trnphase(i, end) + T.
- ‘auto’. In this case, forecasting phases are set to be as long as possible. That is, until the last day where tsAIN is available.
- Default is one calendar year: calyears(1).
- Also see formatForecastingPhase().
- Noted that the actual ProbabilityTime in the output is extended according to the maximum of leading time (Tlead) and predicting time (TPred) model parameters, to maximize the extent that the model can predict. For details, please see refer to the source code of jointstation for how ProbTime is defined.
‘OverwriteFile’
- Whether to overwrite existing output files or not.
- Default is true.
‘ModelSelect’
- see bestmodel()
‘ModelSelectOP’
- see bestmodel()
‘ChooseBest’
- Define the number of maximum best models for each station to be applied.
- Default is 10, which means we pick the models of top 10 fitting degrees each station for calculating predicted TIP(s).
‘CombinationNumber’
- Define the total number of random combinations among the best models of each station for joint-station TIP calculation.
- Default is 500, which means for every station a random permutation of ranking numbers (based on the fitting degree) of the best models is performed with each sequence of ranking number having 500 elements, and the ith joint-station model parameter combination is thus from the best models of each station indexed by the ith element of each permutation.
‘WithExtra’
- If ‘WithExtra’ is true, the returned InfoTable will include a variable for Other data.
- Please refer the output arguments of jointstation.
- Default is false.

Input and output:

molscore3 takes anomaly indices from dir_tsAIN, earthquake catalog and station information from dir_catalog, and ranked model form dir_molchan.
The output probability is saved in dir_jointstation.

`molscore3_parfor`

molscore3_parfor is the parallel computing version of molscore3, it takes the same input arguments and supports all documented keyword arguments as molscore3.

Load raw data and convert it into a standard format

Data of standard format denotes the formatted matfile that should contain only 1 field where the field name can be arbitrarily chosen. Saying the field to be M, M has to be a matrix where the column indices to time and data are assigned by function fmt.colindex2data (see fmt) depending on the type of the data.

Warning

Noted that fmt.colindex2data returns a set of indices for indexing into the data of standard format, whereas columnind (see docstring below) returns a set of indices for indexing into the raw data.

The naming of standard format data is critical. Please refer standarddataname in the following docstring for the naming rule.

Docstring

`conv_gemsdata`

conv_gemsdata(dir_gems_raw, dir_data, dir_catalog) read original GEMS’s data (e.g., 2012_02_07_16_45_00.dat) in dir_gems_raw, merge and convert them to the standard format for GEMS-MagTIP.

The GEMS’s raw data is recorded at local time (UTC+8), and conv_gemsdata converts the time to UTC time.

Also see "read_gemsdata.m". Example:

dir_gems_raw = 'rawdata/GEMSdat';
% where the individual data locates in for example:
% 'rawdata/GEMSdat/em10/REC/Y2012/M02/D07/2012_02_07_16_45_00.dat'
dir_data = 'dataStandard/GE';
dir_catalog = 'spreadsheet'; % for obtaining the station information ("station_location.mat");
conv_gemsdata(dir_gems_raw, dir_data, dir_catalog);

Keyword Argument:

‘FilterByDatetime’: See conv_geomagdata.

`conv_geomagdata`

The original geomagnetic data (which are those in “.csv” format being something like “2008010300.KM” or “20190307.LY”) should be converted to a standard format before any calculation. conv_geomagdata(dir_mag_raw, dir_data) read original data in dir_mag_raw and save them in the standard format at the directory dir_data.

Keyword Argument:

‘ContinueFromLast’: Default is false. If true, it compares the names of all old files in the dir_mag_raw and the new files in dir_data before conversion to avoid files that are already being converted to be converted again. This additional procedure may take several hours depending on the size of database; a more efficient way for avoiding repeated processing is to manually specify ‘FilterByDatetime’. See below.
‘FilterByDatetime’: Only convert the files in the assigned time range(s). It supports:
- A datetime. For example, when 'FilterByDatetime', datetime(2010,10,10), only files with time tag being or after 2010-Oct-10 will be converted.
- N by 2 datetime array, for example,
```
'FilterByDatetime', [datetime(2009,12,10), datetime(2010,10,10);...
                     datetime(2013,12,10), datetime(2017,10,10)];
```
  , only the files in the two time ranges [2009-Dec-10, 2010-Oct-10] and [2013-Dec-10, 2017-Oct-10] will be converted; otherwise, ignored.
‘Test’: It take only the assigned ratio (0-1) of data (files) in the directory dir_mag_raw. The files are randomly chosen. This option should only be applied when you want to test your code for reducing the overall computing time. Default is 0 (when the assigned value for ‘Test’ is \(\leq 0\) or \(\geq 1\), there is no reduction in the data list).

NOTICE:

Files in the original format must be uniquely named as ‘20200101.HC’ or ‘2020010109.HC’ for example.
If not uniquely named, such as ‘../HC/20200101.txt’, ‘../CS/20200101.txt’, the second one will be regarded as a duplicated file and be moved out to an alternative folder (dir_alt)
every file should have its extension as the code of the corresponding station, e.g. ‘20200101.HC’ is correct; ‘20200101.txt’ is not valid and an error will occur.

`readgems`

A general function for reading original gems data. It dispatch the task to the specified function according to file extension.

`read_gemsdat`

read yyyy_mm_dd_HH_MM_SS.dat from GEMS developed by AnaSystem The following is GEMS.dat format: Header is the first 100 bytes. All parameters in header is in a unit of 2 bytes Hence, the header has 50 fields.

0-11 bytes is for date
0-1: year
2-3: month
4-5: day
6-7: hour
8-9: minute
10-11: second
12-13: sampling rate
14-15: Channel Options (15 has 4 CHs,255 has 4 CHs,7 has 3 CHs,3 has 2 CHs)
16-99 bytes are padded by zero, for future use.

From the 100th byte on, every 4 byte is used to restore data. The order is TimeStamp, Ch1, Ch2, Ch3, Ch4. If one channel is not used, skip it. e.g., only use chn1 and chn3, then recording order is TimeStamp, Ch1, Ch3, TimeStamp, Ch1, Ch3, and so on. In every cycle, TimeStamp is a 4-byte integer, time drift from the header time, unit is minisecond. channel’s data is 4-byte single-precision float. Note that the byte order of all data is “little endian.” The value of voltage is between -10 V and 10 V.

function [data, header] = read_gemsdat(absolute_FileName) input: absolute_FileName: string, absolute full file name output: data = [time,chn1,chn2,chn3,chn4], N x 5 header = [yyyy;mm;dd;HH;MM;SS.SS;fs;chnOPT;…] built-in func: fopen, fclose, fread, fseek self-defined func:

usage:

[data, header] = read_gemsdat('g:\GEMSdat\em10\REC\Y2012\M02\D07\2012_02_07_16_45_00.dat');

Origin

readanasystem.m
Author: Han-Lun Hsu

First modification

Matlab version: 2018b
Author: Hong-Jia Chen (redhouse6341@gmail.com)
Date: 2018/07/01

Second modification

Matlab version: 2021b
Author: Hong-Jia Chen (redhouse6341@gmail.com)
Date: 2021/10/11

The docstring is revised by Tsung-Hsi Wu in 2024.

`geoelectric_projection`

One field data of any direction can be determined by data of another two directions E(phi)=[-E1*sin(phi-theta2)+E2*sin(phi-theta1)]/sin(theta2-theta1) In this framework, make sure Common Electrode is negative for all channels That is, A is CH01+, B is CH02+, C is CH01- and CH02- function [E] = geoelectric_projection(phi, E1, theta1, E2, theta2) input: phi, theta1, theta2 = ‘degree’ from the north, positive for clockwise E1, E2 = field data of any two direction output: E = field data of the specific direction phi phi = 0 means north phi = 90 means east built-in func: sind numel user-defined func:

usage:

[E_EW] = geoelectric_projection(90, E_chn1, 15, E_chn2, 120); 
[E_NS] = geoelectric_projection(0, E_chn1, 15, E_chn2, 120); 
% where  East is positive;  North is positive

Reference

Takahashi et al. (2005): ULF Electromagnetic Environment at Southern Boso Peninsula : Signal Discrimination of the Geoelectromagnetic Data

Origin

Matlab version: 2018b
Author: Hong-Jia Chen (redhouse6341@gmail.com)
Date: 2020/09/02

The docstring is revised by Tsung-Hsi Wu in 2024.

`columnind`

Return column indices for time, data and others according to the type field on the file name of standard format GE/GM data.

`standarddataname`

[fname] = standarddataname(stnm, dtstr, type_j) returns the file name of the standard data format. For example, standarddataname('HUAL', '20120202', 'GEMS0') returns stn[HUAL]dt[20120202]type[GEMS0].mat.

`write_data`

Given the reference tag strcheckwith (that automatically updated inside this function) and the reference tag strj (defined outside the function), write_data creates a new file fname if they are different, and write M into the same matObj if they are the same.

Example In the following example, New matfile matObj is created in the first loop with matObj.M = M. In the subsequent loops, new data M will be appended to the same matObj.M until datestr is updated.

dtstr0 = '00000000';
matObj = 0;
    for j = 1:height(table_i)
        dtstr_j = table_i{j, 'datestr'}; 
        ...
        [matObj, dtstr0] = write_data(M, matObj, dtstr0, dtstr_j, dir_i, fname, dir_alt);
    end

Available Preprocessing Functions

In GEMS-MagTIP, signal processing is conducted via preprocessing functions, Preprocessing functions are special functions that are located in src/preprocess. The no function that simply import the data from a given file path is necessary. You can use prplist = prpfunctions() to list all preprocessing functions. Filtering functions (such as ULF_A,ULF_B in prplist) share the same core function generalfilterprocess that takes the output of loadfilters as their inputs.

What do preprocessing functions do?

import data (namely, M)
do some pre-process (e.g., band-pass filtering)
output the processed data as a matrix the same columns as M
columnind is applied in preprocessing functions to acquire the indices to time columns, data columns, and etc..

You can define your own preprocessing functions to work with `statind`, as the following points should be satisfied:

The first input argument should be the path fpath to the file of standard format.
Import data from fpath.
Make sure the column indices is compatible with columnind.
Put the .m file for this customized function under the directory src/preprocess (put in the same place as no.m).

How to use preprocess functions

Via the interface of statind: statind(..., 'Preprocess', 'ULF_A') (ULF_A can be substitute with any function name that prpfunctions() lists.
See Examples below:

Examples

There are four built-in preprocessing functions, BP_35, BP_40, ULF_A, ULF_B; these filters can be applied in the statind stage via the 'Preprocess' keyword argument.

Here is an example of the direct use (not via 'Preprocess' interface in the statind stage) of BP_35:

fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
M_out = BP_35(fpath);

In this example, data is imported from fpath as variable, then call generalfilterprocess which by default use bandpass that creates a band-pass filter every time a data is imported, and output the filtered data as M_out having exactly the same column dimension as M (but with rows might be reduced).

Here is another example, which apply existing filters (e.g., 'Filter[BP_35]samp[1]stp[093].mat') in dir_filter if possible:

fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
filtpaths = datalist('Filter*.mat', dir_filter);
FOBJ = loadfilters(filtpaths); % load all filters as a struct FOBJ
M_out = BP_35(fpath, FOBJ);

Docstring

`BP_35`

The preprocessing function that filter the loaded timeseries with 'BP_35' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 3162); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [10^(-3.5) 10^(-1.75)] (in unit Hz); please refer fmt.filterRange.

`BP_40`

The preprocessing function that filter the loaded timeseries with 'BP_40' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 10000); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [10^(-4) 10^(-1.75)] (in unit Hz); please refer fmt.filterRange.

`ULF_A`

The preprocessing function that filter the loaded timeseries with the 'ULF_A' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 1000); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [0.001 0.003] (in unit Hz); please refer fmt.filterRange.

`ULF_B`

The preprocessing function that filter the loaded timeseries with the 'ULF_B' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 1000); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [0.001 0.01] (in unit Hz); please refer fmt.filterRange.

`no`

M_prp = no(fpath) does no preprocessing, loads and returns the data of fpath in the form of matrix.

Other functions might also depends on no. For example, prpfunctions is dependent on no to get the directory of all functions for preprocessing.

Note

generalfilterprocess is the core shared among 'ULF_X' and BP_X family; please refer Filtering Functions.

Filtering Functions

Docstring

`autobandpassfilter`

Given a file of standard format, create and save bandpass filter accordingly. See also fmt; generalfilterprocess.

`butterfilt`

Butterworth filter.

`filterthedata`

M_dataf = filterthedata(M_input,prp_tag,...
    'SamplingFrequency',fs,...
    'RemoveOutliers',true);%

where M_input is an n by m array where n is the number of data points, m is the number variable. That is, each row of M_input is a data point; each column is a variable.

`generalfilterprocess`

Given prp_tag, generalfilterprocess(prp_tag, fpath, 'AvailableFilter', FOBJ) load data from fpath, filter only the data column and return the data as M_prp. The frequency range is inquiried through fmt.filterRange or FOBJ.(prp_tag).(fs_tag).

If FOBJ is provided, generalfilterprocess attempts to do filtering with filters in FOBJ; if there is no adequete filter in FOBJ, it applys filterthedata that generates filter for file fpath with warning message.

Option:

'EdgeTruncate', 1000: Truncate the head/tail by 1000 data points right before output.
'AbnormalLowValue', 1e-6: Replace data of absolute values less than 1e-6 to NaN.

generalfilterprocess involves the following processin; see the docsting therein:

loadfilters for information about FOBJ, fs_tag and prp_tag.
fmt for fmt.filterRange
filterthedata for automatically create a filter and do filtering
fillNaN for interpolation (which is required before any filtering)
autobandpassfilter for quickly making and saving filters.

`loadfilters`

F = loadfilters(filtpaths) load filters as a nested structure, that opts = F.(prp_tag).(fs_tag) is compatible to signal.internal.filteringfcns.filterData in bandpass. In which, prp_tag should be in the list of prpfunctions() and fs_tag is the sampling frequency prefixed by “fs” (e.g., ‘fs15’ for 15Hz).

The simplest way to create and save a filter is apply bandpass to data with a certain set of parameters, set debug point inside bandpass, and save opts in the debug scope. For canoical approaches, see Filter Design in Matlab.

The filters should named as ‘Filter[ULF_A]samp[15]….mat’ for example.

The input filtpaths should be a struct having name for file names and fullpath for file paths.

`try_generalfilterprocess`

A try-catch wrapper to avoid aborting execution due to error raised in generalfilterprocess.

Internal API for TIP Calculation

Docstring

`TIPArea`

Given the Longitude Lon, Latitude Lat and the radius Rc, [isinS,insideRatio] = TIPArea(Lon,Lat,Rc,xLon,yLat) returns a logical vector indicating whether each grid point specified by xLon and yLat is inside Rc centered at (Lon, Lat).

TIPArea used isInpolygon of okatsn/toolbox which simply splits circles by NaN before applying matlab’s inpolygon. In the official documentation of inpolygon in the section “Points Inside Multiply Connected Polygon”, if the polygon in the inner loop are defined in a counterclockwise direction while the outer one in a clockwise direction, inpolygon identifies the points inside the outer polygon but not inside the inner polygon. What the official documentation didn’t explicitly say is, if two (or more) polygons are all defined in the same direction, which is our case that all circles are defined in the counterclockwise direction, inpolygon identifies points in either polygon, which the expected behavior in our application. That is, when the TIP area of one station overlapped with the TIP area of another one station, isinS is true if a grid point is inside of either and (xLon(isinS), yLat(isinS)) are essentially the union of the TIP area of all stations.

Unit:

xLon (yLat): degree of longitude (latitude).
Rc: kilometer. Noted that Rc is convert to the unit of degree and be the mean of those based on the latitude and longitude of the center respectively. (Hint: when converting km to deg, it depends on latitude and the longitude. For example, at lower latitude, the distance of 1 deg interval in longitude is much larger than that of 1 deg interval at high latitude.)

Input:

xLon (yLat): N by 1 array for longitude (latitude)
Rc: M by 1 array of raddi
Lon, Lat: M by 1 array of the coordinate of centers each.

Output:

isinS: a N by 1 array, where xLon(isinS),yLat(isinS) specify all the points (clearly non-repeated) that are within the ranges of radii Rc inside the centers specified by Lon, Lat.

`TIPspatialtemporal`

Given the dates to calculate TIP area dates_TIPtrue, the range (Rc) around the center point (LatLon) in LatLonRc_TIPtrue, and the grid points of the entire map (xLon, yLat), [dt_TIP_true_uniq,TIP_true_area,isinTS] = TIPspatialtemporal(dates_TIPtrue,LatLonRc_TIPtrue, xLon, yLat) returns the unique and sorted vector of dates (dt_TIP_true_uniq) and the vector of spatial TIP area TIP_true_area for each date in dt_TIP_true_uniq.

The value of TIP_true_area is the ratio of spatial TIP area to total area. When in the context of calculating the total amount of alarmed area, sum(TIP_true_area) is the alarmed area ratio to the entire 2D space specified by xLon and yLat (i.e., which are all grid points in the region of [TwLat0, TwLat1],[TwLon0, TwLon1]). See TIPArea for more information.

isinTS is a S by T logical array specifying spatial-temporal points that are within the ranges of radii Rc inside the centers specified by Lon, Lat, where [Rc, Lon, Lat] = LatLonRc_TIPtrue is a M by 3 array where \(M \geq T\) must be satisfied.

Noted that dt_TIP_true_uniq are successive (e.g., 1, 2, 4) dates but not necessarily consecutive (e.g., 1, 2, 3).

`bestmodel`

[BestModels,BestModelNames,idPermute,molList] = bestmodel(molList,BestN,totalM,StationLocation) filters the best N models out based on fitting degree and perform random permutation of total M combination of them. That is, bestmodel() save the best N (at most) models for each station in BestModels, and gives an array of M elements that are randomly picked from the set of the ranking numbers of the best N (at most) models.

Input arguments:

molList: The list of paths to the files of ranked models, which are produced by molscore. Get the molList using datalist(...).
BestN: The number the models of highest molchan score.
totalM: The amount of randomly permuted combinations of the best N models. That is, M is the number of joint-station models. Noted that the first model combination is forced to be the combination of the best model of each station.
StationLocation: A J by 3 table with the column names being ‘format’, ‘Lon’, ‘Lat’ and row names (i.e. StationLocation.Properties.RowNames) the station codes (e.g. ‘TW’, ‘CS’,…) of the total J stations.

Output arguments:

BestModels: A J-by-1 cell array containing the best models of total J stations. In each cell there is a table model parameters.
BestModelNames: An J-by-1 cell array containing station names corresponding to BestModels (also J-by-1 cell array).
idPermute: The randomly permuted indices for the best models. It is a J by 1 cell array, where each cell contains a M by 1 double array being the indices for the Jth BestModels. Noted that the first element in the M by 1 double array is always 1, which force the first model combination in jointstation() to be the combination of the best model of each station (SEE molscore3()).
molList: The list of paths to the files of ranked models, where the files having all molchan scores to be NaN (hence model parameters cannot be ranked) are excluded.

Keyword Arguments:

‘NearbyThreshold’: For some reasons (e.g. no target earthquakes at all in the training phase) models cannot be ranked (optimized) for a station (saying ‘MS’) because all molchan scores are NaN. In this situation, bestmodel will try to find if there is a near-by station within 15 km (in default). If there is, for example, a station ‘PT’ nearby within 15km of ‘MS’, then the ranked model ‘PT’ will be taken instead. That is, in the following process in molscore3, the best models of ‘PT’ and the AIN based on the geomagnetic data of ‘MS’ are applied for calculating the probabilities around the station of ‘MS’.
‘ModelSelect’: You may filter the table of Molchan scores by using this option. For example, with 'ModelSelect', {'Tpred', 5}, all rows whose Tpred is not 5 will be discard.
‘ModelSelectOP’
- The operator function for model selecting if 'ModelSelect' have multiple pairs of input argument.
- For example, with 'ModelSelect', {'Tpred', 5, 'Tlead', 5}, 'ModelSelectOP', @or, all rows with Tpred=5 or Tlead=5 was kept; others, discarded.
- Default is @or.

`convAIN`

Calculate the total amount of anomaly indices of each day. That is, saying we have two statistical indices S and K, and the total amount of anomaly indices is less or equal than 2. It is the ‘update_tsAIN’ of the previous version.

Example:

[sum_tsAIN, sum_validateIndex] = convAIN(tsAIN.Athr, tsAIN.validateIndex)

For each Athr we have one anomaly index (true or false) and validation index (true or false) per day per variable, and the output is the number of “trues” per day. Usually Athr = 1:10, so the output will be a 10 by 1 cell in each there is a N by 1 double; N is the length of the DateTime series.

Input arguments:

tsAIN.Athr or tsAIN.validateIndex: A 1-by-10 structure array. See anomalyind for more information.

Output arguments:

sum_tsAIN or sum_validateIndex: A 10-by-1 cell array corresponding to \(A_{thr} = 1:10\). In each cell, there is a N-by-1 double array being the sum of anomaly indices or the sum of valid indices of N days.

`dt_TIP_true`

[dt_TIPtrue,TIPtime,TIP] = dt_TIP_true(DateTimeSAT,vlSAT, Tthr,Tlead,Tpred) return the datetime where TIP is true.

Input arguments:

DateTimeSAT: M-by-1 datetime array for sum of anomaly days
vlSAT: M-by-1 double array for the values of sum of anomaly days (the anomaly days is summed in the moving window of Tobs)
Tthr: the threshold related to the number of anomaly days in Tobs
Tlead: the leading window right before the prediction window
Tpred: the window of prediction (where TIP is true or false according to the number of anmalies in the corresponding Tobs window)

Output arguments:

dt_TIPtrue: the datetime where TIP is true
TIPtime: an N-by-1 datetime array paired with the TIP array
TIP: an N-by-1 logical array of Time of Increased Probability

The variable dt_TIPtrue may exceeds the last day of DateTimeSAT, while TIPtime is truncated to not exceed DateTimeSAT(end) in molscore, because in training phase datetime out of DateTimeSAT is assumed to be out of the time range of available earthquake catalog.

A schematic illustration of the input/output variables:

sum_validateIndex_all_a_dti (2nd input argument in sumanomalyind)
               1100111011000001010010000000001010000000 (for example) 
DateTime_dti   tttttttttttttttttttttttttttttttttttttttt
Tobs length    |-----|  (Tobs=7, for example) 
Tlead length          |-------| (Tlead=9, for example) 
vlSAT                5445543222122233222221112122344332 (for example) 
DateTimeSAT index    123456789.........................         
DateTimeSAT    |Tobs |--------------------------------|         
TIPtime/TIP    |Tobs || Tlead ||--------------------------------------|
                               ↑                |Tobs || Tlead ||Tpred| 
                   Tpred start ↑        last AIN data ↑     Tpred end ↑

`dt_TIP_valid`

dt_TIP_valid gives loosely valid TIP time and strictly invalid TIP time. There are two possibilities for a TIP false: one is that the anomaly index number (AIN) is below the threshold, and the other is that AIN is NaN since anything in comparison with a NaN results in a false. The later we called it an ‘invalid’ TIP time/day.

Example:

[TIPtime1,TIP_valid,dates_TIPvalid_st,dates_TIPinvalid_st] = ...
dt_TIP_valid(DateTimeSAT,sum_validateIndex_all_a_dti,Tobs,Tlead,Tpred)

This gives the datetime where there is at least one data in the Tobs and hence TIP false is a valid false.

Input arguments:

DateTimeSAT: see sumanomalyind()
sum_validateIndex_all_a_dti: the sum_validateIndex in convAIN()
Tobs: the observation time window right before the leading window (Tlead).
Tlead: the leading window right before the prediction window
Tpred: the window of prediction (where TIP is true or false according to the number of anmalies in the corresponding Tobs window)

Output arguments:

TIPtime1: a 1-d datetime array for TIP_valid
TIP_valid: a 1-d logical array indicating whether the TIP is valid or not.
dates_TIPvalid_st: Returns the datetimes of where the TIP have a meaningful true/false.
dates_TIPinvalid_st: Returns the datetimes of where the TIP have a meaningless false. (If TIP is true, then the day should always be valid)

About the meaningful and meaningless ‘false’, see the comments in anomalyind()

Recalling that sum of AIN is the sum of anomaly index calculated in the moving time window of length Tobs, and the dates DateTimeSAT(sum_AIN_valid) denotes the end day of Tobs where at least one of the anomaly index in the Tobs window is meaningful, we have the following example where Tlead = 9, Tobs = 7, and Tpred = 3, for explicitly demonstrates how the TIP’s valid times are calculated:

sum_validateIndex_all_a_dti
              '1100111011000001010010000000001010000000' (for example) 
DateTime_dti   tttttttttttttttttttttttttttttttttttttttt|
DateTimeSAT          tttttttttttttttttttttttttttttttttt|
sum_AIN_valid        1111111111111111111110001111111110|
dt_AIN_valid         ttttttttttttttttttttt   ttttttttt |
               |Tobs ||<--a--->|                       |<--b----->|
TIPtime                        tttttttttttttttttttttttttttttttttttt
Tpred moving window            |c|                              |c|
                               | the same length as DateTimeSAT |
TIPvalid_0                     123456789........................
                                \\\\\\\
TIPvalid_1                       3456789..........................

Therefore, 
dt_TIPvalid = unique(TIPtime([1:3,2:4,3:5,....]))

% t denotes a datetime object
% a = Tlead+1 = idinc2predstart
% b = Tlead+Tpred = idinc2predend
% c = Tred
% dt_AIN_valid is DateTimeSAT(sum_AIN_valid), and sum_AIN_valid is a.k.a Tobs_TF.
% TIPvalid_0 (_1) is a.k.a. TIP_TF_0 (_1)

`eqktip1`

[EQK,TIP, TIPtime] = eqktip1(eqMR,eqDateTime, DateTimeSAT,vlSAT, ...
                   Tthr_iMod,Tlead_iMod,Tpred_iMod, Mc_iMod,Rc_iMod)

provides 1 dimensional logical array of target earthquakes (EQK) and Time of Increased Probability (TIP), corresponding to the 1-d datetime array TIPtime.

Input arguments:

eqMR: a N by 2 array of all-combination of model magnitudes and radii of detection, with the first column being magnitude and second column the radius of detection.
eqDateTime: a N by 1 cell array. Each cell contains the datetime of target earthquakes satisfying the corresponding row of eqMR; i.e., being larger than eqMR(i,1) and within eqMR(i,2).
DateTimeSAT, vlSAT: see sumanomalyind.
Tthr, Tlead, Tpred: see dt_TIP_true.
Mc_iMod: the current model magnitude.
Rc_iMod: the current model radius of detection.

Output arguments:

EQK: the M-by-1 logical array for whether there is/are target earthquakes in the day.
TIP: an M-by-1 logical array of Time of Increased Probability
TIPtime: an M-by-1 datetime array paired with the TIP and EQK array

See also:

[~,eqMR,eqDateTime] = isearthquakeinrange(...) in molscore
[DateTimeSAT,vlSAT] = sumanomalyind(...) in molscore

`get_modelparam`

Use [var1, var2, ...] = get_modelparam(BestModels, 'name1', 'name2',...) to get specific model parameters uniquely. BestModels is the output of bestmodel().

Example:

Tpreds, Tleads = get_modelparam(BestModels, 'Tpred', 'Tlead')

`jointstation`

jointstation calculates hit rates, alarmed rates, and spatial probability.

Example:

When it is applied for calculating the hit rates of the training phase:

HitRates_trn = jointstation(BestModels,BestModelNames,idPermute,...
             sum_tsAINs,sum_validateInds,DateTime_dti,...
             CWBcatalog,'CalculateHitRate',true);

When it is applied for calculating the forecasting probabilities:

[HitRates_frc, AlarmedRates_frc, SpatialProbability, xLon, yLat, ...
    validStationTime, ProbTime, TIP3, TIPvalid3, EQKs, Other]...
    = jointstation(BestModels,BestModelNames,idPermute,...
         sum_tsAINs,sum_validateInds,DateTime_dti,...
         CWBcatalog,'CalculateHitRate',true,...
         'CalculateAlarmedRate',true,'CalculateProbability',HitRates_trn,...
         'StationLocation',StationLocation, 'WithExtra', true);

Keyword Arguments:

‘CalculateHitRate’: Whether to calculate the hit rates. Default is false.
‘CalculateAlarmedRate’: Whether to calculate the alarmed rates. Default is false.
- If ‘CalculateProbability’ is true, ‘CalculateAlarmedRate’ will be automatically set to be true since alarmed rates are essential to probability calculation.
‘CalculateProbability’: Whether to calculate the spatial probability. Default is false.
‘StationLocation’:
- A J by 3 table with the column names being ‘format’, ‘Lon’, ‘Lat’ and row names (i.e. StationLocation.Properties.RowNames) the station codes (e.g. ‘TW’, ‘CS’,…) of the total J stations.
- If ‘CalculateAlarmedRate’ is true, ‘StationLocation’ must be specified; otherwise it is impossible to obtain alarm rates.
- Default is 0.

Input arguments:

BestModels: A J-by-1 cell array containing the best models of total J stations. In each cell there is a table model parameters.
BestModelNames: An J-by-1 cell array containing station names corresponding to BestModels (also J-by-1 cell array).
idPermute: The randomly permuted indices for the best models. See bestmodel().
sum_tsAINs: A J-by-1 cell array with each cell corresponding to BestmodelNames. See loadAIN().
sum_validateInds: A J-by-1 cell array with each cell corresponding to BestmodelNames. See loadAIN().
DateTime_dti: A T by 1 datetime array for the time series in sum_tsAINs and the sum_validateInds.
CWBcatalog: A table of earthquake catalog. You may load CWBcatalog using only1field([dir_catalog filesep 'catalog.mat']).
WithExtra: Output an additional struct of supplementary data as the Other variable.

Output arguments:

HitRates, AlarmedRates: M by 1 array, being the hit rates (alarmed rates) of each combination of best models. M is the amounts of total combinations (the totalM in bestmodel()). They are AlarmedRateForecasting and HitRatesForecasting of molscore3’s output. AlarmedRates(i) can be a number or NaN (when 0/0).
SpatialProbability: A S by T array of the temporal-spatial forecasting probabilities. It is the output variable Probability of molscore3().
xLon (yLat): The longitudes (latitudes) of all spatial points as a S by 1 array. It is the output variable ProbabilityLon (ProbabilityLat) of molscore3().
validStationTime: A T by J table of the ratio of valid models each day each station. This is for plotProbability(), that the ratio will be demonstrated as gradient color in the marker of each station. If the ratio is zero, the marker on map will be hollow. It is the output variable validStationTime of molscore3().
TIP3: A S by T TIP array for the first model (out of the total M models). Since the first element of each cell in idPermute is always 1, indicating the best model, TIP3 is essentially the TIP array calculated according to the best model parameter.
TIPvalid3: A spatial-temporal logical array indicating whether a point (s,t) of TIP true or false is meaningful or not. This is also a S by T array.
ProbTime: A 1 by T datetime array for TIP3, TIPvalid3 and SpatialProbability. This is the ProbabilityTime of molscore3’s output.
EQKs: The list of target earthquake, with each column
- time, Lon, Lat, Depth, Mag: The time, longitude, latitude, depth and magnitude (\(M_L\)) of the target earthquake.
- InStation: Indicates the target earthquake is of which station.
- EqkDist: The distance between the station (as specified by InStation) and hypocenter of the target earthquake. Also see simpleStationEQKdist3D.
Other: The additional output when ‘WithExtra’ is true.
- Other.nEQK: The number of target earthquakes for each model in forecasting phase.
- Other.areaTIP: The total area (spatial + temporal, i.e., sum(TIP_valid_area)) of TIP for each model in forecasting phase. See TIPspatialtemporal and TIPArea.
- Other.nEQK .* HitRates is the number of hitted earthquakes for each model in this forecasting phase.
- Other.areaTIP .* AlarmedRates is the total amount of alarmed TIP area (i.e., sum(TIP_true_area)) for each model in this forecasting phase.
- Noted that it is nonsesnse to set ‘WithExtra’ to be true when ‘calculateHitRate’ and ‘calculateAlarmedRate’ are false.

`loadAIN`

[sum_tsAINs,sum_validateInds,DateTime_dti,AthrList] =
loadAIN(dir_tsAIN,BestModelNames,TimeRange)

load tsAIN, calculate the summation of anomaly index of each day, and truncate the time series to TimeRange.

If class(BestModelNames) == 'char', for example, ‘CS’, the output sum_tsAINs and sum_validateInds are the 10 by 1 cell array directly, instead of a M by 1 cell array inside the J by 1 cell array where J = 1.

Input arguments::

dir_tsAIN: the directory for the saved AIN data. See anomalyind().
BestModelNames: an J-by-1 cell array containing station names corresponding to BestModels (also J-by-1 cell array).
TimeRange: a two-element datetime array.

Output arguments::

sum_tsAINs: a J-by-1 cell array with each cell corresponding to BestmodelNames. In each cell there is a 10-by-1 cell corresponding to the list of \(A_{thr}\) (AthrList = 1:10), being the timeseries of the summation of anomaly index of each day.
sum_validateInds: a J-by-1 cell array with each cell corresponding to BestmodelNames. In each cell there is a 10-by-1 cell corresponding to the list of \(A_{thr}\) (AthrList = 1:10), being the summation of the indicators for the valid statistical indices of each day.

`modparam`

In GEMS-MagTIP, model parameters defines TIP and target earthquakes (EQK). [PredParam,varargout]=modparam() generate the whole set of model parameters that are going to be used in the training phase by molscore.

Keyword Arguments:

‘Rc’: Radius (range) of detection in unit kilometer. Default is [20:10:100])
‘NthrRatio’:
- The ratio threshold to be converted to the number of anomalous statistic index every day.
- ‘NthrRatio’ should be calculated automatically in molscore() according to the maximum possible number of statistic indices among the anomaly index calculated by anomalyind().
- Nthr is the threshold (integer) for the number of anomalous statistic indices each day.
- Default is [0.01, 0.99], which is for the situation where the maximum possible Nthr is \(\leq 2\) (i.e., to be [0, 1]). If you have the maximum possible Nthr to be \(\leq 3\) (i.e, Nthr = [0 1 2]), ‘NthrRatio’ should be assigned to be [0.01, 0.49, 0.99] for example.
‘Ptthr’: Ptthr defines Tobs through Ptthr = Tthr/Tobs. Default is [0.1:0.1:0.5].
‘Tobs’: The length of observation moving window. Default is [5:5:100];
‘Tpred’: The length of prediction moving window.
‘Tlead’: The length of leading window, the time gap that is for the period of silent (no-anomaly) right before big earthquakes.
- Default is [0:5:100].
‘Mc’: The magnitude threshold. Default is 5, which means only events of \(M_L \geq 5\) will be considered as target earthquakes.
‘Test’:
- If true or 1, then the PredParam is reduced to 1000 randomly picked models from the default combination.
- If it is a positive integer N other than 1, the predParam is reduced to N randomly picked models.
- Default is 0.
‘Seed’: the seed for the random selection while ‘Test’ is enable.
- Seed must be a nonnegative integer seed less than 2^32.
- If seed is less than zero, the random selection has its result different everytime.

Output:

PredParam: The GEMSTIP/MagTIP models of total 8 free parameters: G = [Mag,Rad,NthrRatio,Athr,Tthr,Tobs,Tpred,Tlead].

Examples: - [PredParam, columnNames] = modparam('OutputFormat','double'); - [PredParam_table] = modparam('OutputFormat','table');

`shift2TIPtime`

[dt_true_in_TIP, TIPtime] = shift2TIPtime(DateTimeSAT,Tobs_TF,Tlead,Tpred) gives the date time array in the domain of TIPtime, where the trues in TIPtime are defined in Tpred window according to the data in Tobs window. Tobs_TF:

trues or falses according to the data in Tobs moving window.
these true-falses should be exactly the same size of DateTimeSAT, where DateTimeSAT is the moving-window’s time tags (at the last day of the moving time window)
e.g., vlSAT>=Tthr_iMod
e.g., logical(MovingWindowSum(sumvalidateIndexalladti,Tobs))

(these examples are taken from see dtTIPtrue.m and dtTIPinvalid.m)

dt_true_in_TIP: the corresponding trues in the TIP time domain.

`sortModelBy`

Sort model names by a certain column value. Example: [sortedModelNames, I] = sortModelBy('Lat','descend',ModelNames,TheTable);

`sumanomalyind`

[DateTimeSAT,vlSAT] = sumanomalyind(DateTime_j,sum_tsAIN_k,Nthr_array,Tobs_i) calculate sum of anomaly day (vlSAT) and its corresponding datetime series (DateTimeSAT). It is simply the moving-window sum of the sum_tsAIN_k timeseries, where the moving-window length is Tobs_i; in which, i stands for \(i_{th}\) model, j for \(j_{th}\) station, and k for \(k_{th}\) threshold \(A_{thr}\). Of coursely, the number of elements of the output timeseires (DateTimeSAT) will be Tobs_i - 1 less than the input timeseries (DateTime_j or DateTime_dti); i.e., length(sum_tsAIN_k) - length(vlSAT) = Tobs_i - 1.

Input arguments:

DateTime_j: the datetime array for sum_tsAIN.
sum_tsAIN_k: the sum of anomaly statistical indices each day.
Nthr_array: The thresholds for sum_tsAIN_k calculated according to NthrRatio. Nthr is known as the threshold of the total daily amount of anomaly statistical indices.
Tobs_i: the observation window of model i.

Output arguments:

DateTimeSAT: M-by-1 datetime array for sum of anomaly days
vlSAT: M-by-1 double array for the values of sum of anomaly days (the anomaly days is summed in the moving window of Tobs)

See also:

convAIN for more details about sum_tsAIN_k.

`truncateTIPtoSAT`

Internal API associated with the truncation of TIP array to the datetime array for the sum of anomaly indices.

Format

Docstring

`fmt`

fmt is a class that

stores all constant names
stores all constant variables

that enables you to modify or extend some contents without breaking other functionalities.

GEMS-MagTIP depends on fmt to load a variety of intermediate data, including earthquake catalog, station information table and the input data of the standard format. fmt save constant names and variables as properties (Constant) that they can be available in global scope. fmt also have static methods for acquiring information such as the column indices indicating the time vectors and recorded values of the input data of different types.

properties (Constant):

InfoFileName_molscore = '[MolchanScore]Information.mat': name to the intermediate file in the session of training (model optimization)
InfoFileName_anomalyind = '[tsAIN]Information.mat': name to the intermediate file in the session of calculating anomaly indices
InfoFileName_jointstation = '[JointStation]Information.mat': name to the intermediate file in the joint-station method session
InfoFileName_statind = '[StatisticIndex]Information.mat': name to the intermediate file in calculating Statistic Index
catalogFileName = 'catalog.mat': file name to the earthquake catalog
stationLocationFileName = 'station_location.mat': name to the file of station information table
tag_upperlowerthreshold = 'ULthr': tag for upper- and -lower threshold defined by Athr
datestrFormat = 'yyyymmdd': the format for converting datetime to string tags (dt)
datetimeInputFormat = 'yyyyMMdd': the format for converting date strings (with tag dt) to datetime
datetimeIncrement: the increment of datetime between two consecutive files of the same station and type.
LatitudeLimit = [21.5 26]: latitude limits of the Taiwan area
LongitudeLimit = [118 122.5]: Longitude limits of the Taiwan area
datatype_mag_1 = {'mFull', 'full'}: "type" tag for the full field geomagnetic data (old geomagnetic stations from 2006; ‘full’ is for backward compatibility)
datatype_mag_3 = {'mTri', 'tri'}: "type" tag for the 3 component geomagnetic data (new geomagnetic stations from 2020; ‘tri’ is for backward compatibility)
datatype_GEMS_HJ: "type" tag for the data of Hong-Jia Chen 2018
datatype_GEMS: "type" tag for the primitive of Geoelectric data

methods (Static):

colindex2data(what_type): returns column indices to time and data part of the loaded matrix. Example: [colind_time, colind_data] = colindex2data('GEMS0')
filterRange(filter_tag): returns two values indicating the lower and upper limit of the filter, in the unit of Hz. Example: fmt.filterRange(ULF_A).
datasampfreq(what_type): returns the expected Sampling Frequency of a specific file/data type. Example: fs = datasampfreq('GEMS0'). You can obtain expected data points of the day by supposedpts = fs*86400.
fmtfieldname = datatype_fieldname(what_type): returns the field name for fmt that you can fmt.(fmtfieldname).

Plotting functions

Caution

`plotCB`

Plot confidence boundary Make sure xpt0 and Dcb_max are both N by 1 array.

`plotEQKTIP1`

plotEQKTIP1(dir_tsAIN,dir_molchan,dir_catalog,dir_output) plot one-dimensional TIP time series for each station with target earthquakes (EQK) scattered over the diagram; the results are saved as png files in dir_output.

Keyword Arguments:

‘ShowTrainingPhase’: Whether to show the EQK and TIP on the plot. Default is false.
‘scatter’: Set it true to add an additional layer of scattering points of the same color on the top of every TIP==1. This option allows better visual inspection on whether target earthquakes lay in the area of TIP==1 or not. Default is false.
‘Rank’: Choose the model by fitting degree ranking for defining EQK and TIP. Default is 1 (plot the EQK and TIP that are defined by the rank 1 model).
‘OnlyStations’: Specify only a subset of station to be displayed on the plot; for example, for 'OnlyStations',{'MS','TW'} only the results of these two stations will be displayed. In default, results of all stations will be displayed.
‘datetimeTickArguments’: Manually define the datetime format of the X tick labels. For example, for 'datetimeTickArguments', {"yy' mmm.", 1,'months'} the datetime 2012-Nov-12 will be printed as “12’ Nov.”. For more information, see datetime_ticks().
‘TargetPattern’: See the documentation in plotProbability().

`plotEQKTIP3`

Like plotEQKTIP1, but expand the 3D TIP array into a 2D diagram by squashing spatial coordinates (latitude and longitude) into one.

`plotFittingDegree`

plotFittingDegree(dir_jointstation,dir_catalog,dir_png) gives fitting degree analysis and molchan diagrams for each training-forecasting phase pair according to the results from molscore3.

`plotProbability`

plotProbability(dir_jointstation,dir_catalog,dir_out) plots two-dimensional probability forecasting maps; the results are saved as png files in dir_output.

Keyword Arguments:

‘LongitudeLimit’: The longitude limits of the displayed map. Default is [119 122.5].
‘LatitudeLimit’: The latitude limits of the displayed map. Default is [21.5 26].
‘TimeRange’: Manually assign time ranges or specific dates that must lay in the forecasting phase to plotting probability forecast. If not set, the probability forecast of every available dates will be plotted. The assigned datetime array must be either:
1. An N by 2 datetime array specifying N ranges at one time.
2. An N by 1 datetime array specifying N days to be plot.
‘PlotEpicenter’: Whether to plot the epicenter(s) of the target earthquakes on the map. Default is 0.
- plotEpicenter, 1: plot target events only (those in Rc).
- plotEpicenter,'all': plot all events even when they are not in Rc.
‘Rigorous’: Whether to drop (ignore) the probability forecasts that are not in the range of the forecasting phases. Default is true. Generally the leading time window (Tlead) is different among models, and the probability can be calculated as far as the Tlead of at least one model covered. However, for the dates out of the forecasting phase, only part of the models are applied thus the probability forecast is weaker.
‘TargetPattern’: the pattern for filtering the files in dir_jointstation. Default is to look up all files that fits the pattern “[JointStation].mat”. For example, you can manually specify it as ”[ULF*A]” to give plots of that have the tag ULFA in the file name only. The pattern should always begin with “” and the last cahracter can never be ””. For multiple filtering at once (e.g., '*[ULF_B]*[Tpred-1]'), join the pattern with “” and the order is important.

`plot_Rc`

Plot radius of detection around the station.

`plot_dataoverview`

plot_dataoverview(dir_stat, dir_catalog) plot an overview of all data. Keyword Arguments:

‘SaveFigureTo’: the output path
‘BackgroundColor’: Background color; default is white ([1,1,1])
‘DatetimeTicks’: The format of datetime ticks on the x-axis.
- Default is {'yyyy-mm',6,'Months'}, which means tick labels are in the format of yyyy-mm and interspaced every 6 months.

`plot_epicenter`

plot epicenter [scatter_handle, axes_handle] = plot_epicenter(CWBcatalog,[ax], kwargs...)

Example

SLoc = only1field(fullfile(dir_catalog, fmt.stationLocationFileName));
checkcatalog(dir_catalog)
CWBcatalog = only1field(fullfile(dir_catalog, fmt.catalogFileName));
CWBcatalog = eventFilter(CWBcatalog, 'Magnitude', 5, 'TimeRange', '20220601-20221002');

figure;
[sc1, ax1] = plot_station(SLoc,'PlotFunction','map', 'MarkerFaceColor', 'none', 'MarkerEdgeColor', '#D95319', 'StationCode', false); % plot all station
hold(ax1, 'on');
plot_epicenter(CWBcatalog, ax1, 'PlotFunction','map'); % plot epicenters
set(gcf, 'Position', [100 100 792 669]);

`plot_station`

plot map and station. See plot_epicenter.

Others

Docstring

`Molchan_CB`

[molt_cb,moln_cb] = Molchan_CB(N,alpha) gives the confidence boundary in molchan diagram, where [molt_cb,moln_cb] are the points defining the boundary on the alarmed-rate-against-missing-rate phase plane.

Input Arguments:

N: total number of target events/earthquakes
alpha: 1-alpha is the confidence level. For example, alpha=0.05 means 95% confidence level.

Output Arguments:

molt_cb: values corresponding to the alarmed rate
moln_cb: values corresponding to the missing rate

Hint:

Calculate the \(D\) (fitting degree) values by Dcb = 1 - molt_cb - moln_cb

`askquestion`

Example

askquestion('Hello, would you like to continue? [Y/N]', @(x) strcmp(x, 'Y'))
Hello, would you like to continue? [Y/N]Y

ans =

  logical

   1

`calcFittingDegree`

calcFittingDegree(jpathlist) according to the given files (jpathlist) provides the overall alarmed rate, missing rate that allows the calculation of the overall fitting degree. Make sure to provide correct input list of the [JointStation] variable , for example, those have the same ID and are not overlapped in forecasting time interval for each group; otherwise the calculated fitting degree can be unreasonable.

Example:

dir_png = 'MyResearch/Figures';
jpathlist = datalist('[JointStation]ID[ou7ud]prp[ULF_B]*.mat', dir_jointstation).fullpath;
[AlarmedRate, MissingRate, xticklabel, EQKs, TIP3s, TIPv3s,TIPTimes,LatLons] = calcFittingDegree(jpathlist);
FittingDegrees = 1 - AlarmedRate - MissingRate;
plotEQKTIP3(dir_png,prp_i, xlabels, EQKs, TIP3s, TIPv3s,TIPTimes, LatLons);

Input Arguments:

jpathlist: a cell array of the full paths of [JointStation] files that are produced by molscore3. You can simpliy obtain the path list by jpathlist = datalist('[JointStation]ID[ou7ud]*prp[ULF_A]*slc[Tpred-10]*.mat',dir_jointstation).fullpath;

Keyword Arguments:

‘GroupTag’: The tag for grouping the files in jpathlist.
‘GroupNames’: Results are calculated separately according to the assigned group names; alarmed rate, missing rate and so on are only calculated if the file name contains the assigned group names. For example, for ...,'GroupNames',{'Tpred-1', 'Tpred-5'},'GroupTag', 'slc',..., only the files with their names containing tag ‘slc[Tpred-1]’ and ‘slc[Tpred-5]’ are selected, and the results of those being ‘Tpred-1’ and ‘Tpred-5’ are separately calculated. That is, the output xlabel is {'Tpred-1', 'Tpred-5'} and other output arguments (e.g. AlarmedRate) are all cell array of the same dimension as xlabel containing the results of the groups that are calculated separately.
Noted that You cannot assign ‘GroupName’ without assigning ‘GroupTag’, but assigning ‘GroupTag’ without assigning ‘GroupName’ is OK, in this case the group names will automatically generated and sorted.

`checkcatalog`

[isvalid, msg] = checkcatalog(dir_catalog) check if catalog.mat/csv exist in dir_catalog, and convert the catalog.csv to catalog.mat. After successfully create catalog.mat, the original catalog.csv will be moved to the folder ‘original_csv’.

If the catalog.csv does not meet the required format, error will occur. The catalog.csv has to suffice the following condtions:

The ‘time’ variable should be in the following format: ’yyyy/MM/dd

HH:mm’.

Other variables except ‘time’ should belong the class of ‘double’.
Basicaly the following headers (column names),

{‘time’,‘Lon’,‘Lat’,‘Mag’,‘Depth’}, have to exist.

Aliases for the column names in the catalog.csv are allowed; see the Aliases section below.

Keyword arguments: 'OverwriteOriginal', false: The csv files in ‘original_csv’ will not be overwritten. Set 'OverwriteOriginal', true if you have version control.

Aliases: For convenience, aliases of the column name of an variable in catalog.csv will be automatically converted:

For earthquake magnitudes, the header of either {‘Magnitude’, ‘magnitude’, ‘ML’} will be automatically converted to ‘Mag’.
For depths, the header of either {‘depth’, ‘dep’, ‘Dep’} will be converted to ‘Depth’.
For longitude and latitude, either {‘lon’, ‘longitude’, ‘Longitude’},

{‘lat’, ‘latitude’, ‘Latitude’} will be converted to ‘Lon’ and ‘Lat’, respectively.

For convenience, in catalog.csv, if the event time is written in separated two columns ‘date’ and ‘time’, with format ‘yyyy-mm-dd’ (or ‘yyyy/mm/dd’) for ‘date’ and ‘hh:MM:ss’ (or ‘hh:MM’), they will be merged as a single ‘time’ variable sufficing the format mentioned before.

`checkstation`

[isvalid, msg] = checkstation(dir_catalog) check if station_location.mat/csv exist in dir_catalog, and convert the station_location.csv to station_location.mat. After successfully create station_location.mat, the original station_location.csv will be moved to the folder ‘original_csv’.

If the station_location.csv does not meet the required format, error will occur. The station_location.csv has to suffice the following condtions:

The ‘time’ variable should be in the following format: ’yyyy/MM/dd

HH:mm’.

Other variables except ‘time’ should belong the class of ‘double’.
Basicaly the following headers (column names),

{‘code’,‘format’,‘Lon’,‘Lat’} have to exist. In which,

‘code’ is the code for the station, whereas ‘format’ is the full name of the station. For example, ‘MS’ (code) corresponds to ‘馬仕’ (format).
‘Lon’ and ‘Lat’ are longitude and latitude of the station respectively.

Aliases for the column names in the station_location.csv are allowed; see the Aliases section below.

Keyword arguments: 'OverwriteOriginal', false: The csv files in ’original*csv’ will not be overwritten. Set 'OverwriteOriginal', true if you have version control. Aliases: For convenience, aliases of the column name of an variable in station*location.csv will be automatically converted:

For longitude and latitude, either {‘lon’, ‘longitude’, ‘Longitude’},

{‘lat’, ‘latitude’, ‘Latitude’} will be converted to ‘Lon’ and ‘Lat’, respectively.

`chkans`

Check the answer in the interaction command-line interface in startup0.m

`constructMolInfo`

constructMolInfo(dir_molchan,InfoId) construct the ‘[MolchanScore]Information.mat’ according to existing [Molchanscore]___.mat files if possible. Use this function only when you are instructed by error messages.

Example constructMolInfo('./var-output/MolchanScore-O04-Cfl_local', 'OT0qPb')

`derive_type`

derive_type derives station type from tag stn[...].

`dirselectassign`

dirselectassign(var_names...) prompts user to select directories in a dialog box, and assigned the selected path to workspace with default variable name. If a variable with the same name as the default name has already in the workspace, its assignment will be ignored (i.e. its dialog box won’t pop out). This is a tool for convenience. You can always assign directories explicitly to variable with any name you like.

Example:

Four windows will pop out one by one allowing you to assign directories to variables dir_stat, dir_tsAIN, dir_molchan, dir_jointstation:
```
dirselectassign('dir_stat','dir_tsAIN','dir_molchan','dir_jointstation');
```
(Read the printed message in the Command Window)
Total 7 windows will pop out one by one allowing you to assign directories to the variables with default names dir_stat, dir_tsAIN , dir_molchan, dir_jointstation, dir_data, dir_catalog, dir_toolbox:
```
dirselectassign();
```
(Read the printed message in the Command Window)

`fft_of_the_day`

Fourier transform of the one-day data.

`fitting_degree_summary`

fitting_degree_summary gives the object for fitting-degree plot.

`formatForecastingPhase`

[frcphase] = formatForecastingPhase(frcphase,trnphase) is the function for converting the argument that the keyword option ‘ForecastingPhase’ takes.

`genID`

Generate trial identifier.

`getRcs`

SRc = getRcs(dir_molchan,targetpattern) returns unique Rc as a struct with field names being the station code (stn[…]). Example

targetpattern = sprintf('[%s]stn*.mat', get_tags(fmt.InfoFileName_molscore, '', 'once'));
SRc = getRcs(dir_molchan,targetpattern)

`loadSheet`

loadSheet(filepath,whatisthis) is deprecated; use checkcatatlog and checkstation instead.

`mkdir_default`

mkdir_default creates/makes folders by default and return their directories with default variable names. The default structure is:

                   variable           default folder name
    ╔═ dir_main ═╤═══════════════════════════════════════╗
    ║            ├─dir_stat         =   'StatisticIndex' ║
    ║            ├─dir_tsAIN        =   'tsAIN'          ║
    ║            ├─dir_molchan      =   'MolchanScore'   ║
    ║            └─dir_jointstation =   'JointStation'   ║
    ╚════════════════════════════════════════════════════╝

Example:

[dir_stat, dir_tsAIN, dir_molchan, dir_jointstation] =
  mkdir_default(fullfile(pwd,'output_var'))

[dir_stat, dir_jointstation] =
  mkdir_default(fullfile(pwd,'var-output'), 'Directories', {'StatisticIndex'; 'JointStation'})

`modeldiscard`

Filter models negatively (modeldiscard)

`modelselect`

Filter models positively (modelselect)

`prpfunctions`

prp_list = prpfunctions() returns the list of existing preprocessing functions that you can apply (those located in the same directory (namely, dir_prp = 'src/preprocess') as the function no). Noted that those under the sub-directories of dir_prp won’t be listed.

`renamecols`

renamecols(alias, O) rename the column name that matche either in alias of the table O to the newname.

`tabledragandfill`

A small tool that fill the empty cells of a column by the most recent non-empty value (just like ‘click, drag, and fill’ in the excel).

Example: CWBcatalog = tabledragandfill(CWBcatalog, colname1, colname2, ...)

Main Functions

statind

statind_parfor

anomalyind

molscore

molscore_parfor

molscore3

molscore3_parfor

Load raw data and convert it into a standard format

conv_gemsdata

conv_geomagdata

readgems

read_gemsdat

geoelectric_projection

columnind

standarddataname

write_data

Available Preprocessing Functions

What do preprocessing functions do?

You can define your own preprocessing functions to work with statind, as the following points should be satisfied:

How to use preprocess functions

Examples

BP_35

BP_40

ULF_A

ULF_B

no

Filtering Functions

autobandpassfilter

butterfilt

filterthedata

generalfilterprocess

loadfilters

try_generalfilterprocess

Internal API for TIP Calculation

TIPArea

TIPspatialtemporal

bestmodel

convAIN

dt_TIP_true

dt_TIP_valid

eqktip1

get_modelparam

jointstation

loadAIN

modparam

shift2TIPtime

sortModelBy

sumanomalyind

truncateTIPtoSAT

Format

fmt

Plotting functions

plotCB

plotEQKTIP1

plotEQKTIP3

plotFittingDegree

plotProbability

plot_Rc

plot_dataoverview

plot_epicenter

plot_station

Others

Molchan_CB

askquestion

calcFittingDegree

checkcatalog

checkstation

chkans

constructMolInfo

derive_type

dirselectassign

fft_of_the_day

fitting_degree_summary

formatForecastingPhase

genID

getRcs

loadSheet

mkdir_default

modeldiscard

`statind`

`statind_parfor`

`anomalyind`

`molscore`

`molscore_parfor`

`molscore3`

`molscore3_parfor`

`conv_gemsdata`

`conv_geomagdata`

`readgems`

`read_gemsdat`

`geoelectric_projection`

`columnind`

`standarddataname`

`write_data`

You can define your own preprocessing functions to work with `statind`, as the following points should be satisfied:

`BP_35`

`BP_40`

`ULF_A`

`ULF_B`

`no`

`autobandpassfilter`

`butterfilt`

`filterthedata`

`generalfilterprocess`

`loadfilters`

`try_generalfilterprocess`

`TIPArea`

`TIPspatialtemporal`

`bestmodel`

`convAIN`

`dt_TIP_true`

`dt_TIP_valid`

`eqktip1`

`get_modelparam`

`jointstation`

`loadAIN`

`modparam`

`shift2TIPtime`

`sortModelBy`

`sumanomalyind`

`truncateTIPtoSAT`

`fmt`

`plotCB`

`plotEQKTIP1`

`plotEQKTIP3`

`plotFittingDegree`

`plotProbability`

`plot_Rc`

`plot_dataoverview`

`plot_epicenter`

`plot_station`

`Molchan_CB`

`askquestion`

`calcFittingDegree`

`checkcatalog`

`checkstation`

`chkans`

`constructMolInfo`

`derive_type`

`dirselectassign`

`fft_of_the_day`

`fitting_degree_summary`

`formatForecastingPhase`

`genID`

`getRcs`

`loadSheet`

`mkdir_default`

`modeldiscard`

`modelselect`

`prpfunctions`

`renamecols`

`tabledragandfill`