Library

Warning
  • 請勿將未記錄的名稱-值對參數分配給函數。例如,'CreateInfoOnly''InParforLoop',這些參數是為平行運算保留的,除非您是開發者,否則無需手動設置它們。

Main Functions

在完整的 GEMS-MagTIP 工作流程中,主要分為四個階段,每個階段對應以下主要函數:

  1. statind:統計指標的計算。
  2. anomalyind:基於統計指標計算異常指標數值。
  3. molscore:基於異常指標的訓練階段,包括單測站 TIP 的計算、匹配目標地震的 TIP,以及計算 Molchan 得分。
  4. molscore3:基於最高 Molchan 得分的模型參數進行預報階段,包括計算聯合測站 TIP 和基於第 2、3 步結果的機率。

這四個階段由四個函數封裝,並提供關鍵字參數,允許我們自訂一些與模型優化和機率預報相關的超參數。

Docstring

statind

statind(dir_data,dir_output) calculate daily statistics (a statistical quantity as an index of the day) of the daily timeseries in dir_data. The output variables are stored in dir_output.

Example:

statind(dir_data,dir_output,'StatName','S','StatFunction',@skewness, 'Preprocess', {'ULF_A','ULF_B'});

In which, dir_data is the directory for the time series in the standard format; dir_output is the dir_stat mentioned before.

Keyword Arguments:

  • 'StatName'

    • The abbreviations for the name of statistical indices. They can be arbitrarily defined but have to be the same number of elements as that of 'StatFunction'.
    • Default is {'S','K'} (for Skewness and Kurtosis).
  • 'StatFunction'

    • The function handle for calculating statistical index.
    • It has to be of the same number of elements as that of 'StatName'
    • Default is {@skewness,@kurtosis} for calling the skewness() and kurtosis() functions.
  • 'Preprocess'

    • Apply filter(s) to time series loaded from dir_data.
    • Default is {'no'}, which applies only minimal preprocessings.
    • Use prpfunctions() to list all available preprocessing functions.
    • If multiple preprocessing functions are applied, for example {'no', 'ULF_A'}, then two sets of result according to no-filter data and ULF_A band passed data are going to be produced.
  • 'FilterByDatetime'

    • It should be a two element datetime array.
    • If applied, all files with date time tag not in the range will be ignored.
    • Default is [datetime(0001,1,1), datetime(2999,12,31)], resulting in no data selection by date time tag.
  • 'UpdateExisting'

    • If true, the array size in the existing statistic indices (files starts with “[StatisticIndex]”) will be extended, and the DateTime in Information file in the dir_output will all be updated to cover the datetime of the data in dir_data. (Hint: use 'FilterByDatetime' to avoid re-calculate all data in the dir_data)
    • Default is false; that a complete new session starts with the old Info renamed.
    • Currently this option is not supported by statind_parfor.
    • Noted that 'SkipExist' will be set to false if 'UpdateExisting' is true to avoid unexpected behavior (there must be something wrong that you update the array sizes of the existing files to adapt latest DateTime, but also skip the calculation of statistical indices for those files).
    • Noted that 'UpdateExisting' not only extend the full date time list to the latest, but also to the earliest available data time obtained according to the data list of dir_data. Thus, you may like to apply 'FilterByDatetime' to prevent the update according to earlier pasts.
  • 'SkipExist'

    • Set it true to skip calculations if in the current loop a file in dir_stat of the

    same 'stn' and 'prp' already exists.

    • A typical scenario to use this option is when your last calculation is not completed yet.
    • Default is false.
  • 'LoadFilters', dir_filter

    • Load all filters in dir_filter once for performance. See loadfilters, generalfilterprocess.

statind_parfor

statind_parfor is the parallel computing version of statind, it takes the same input arguments and keyword options as statind.

Noted that the option 'UpdateExisting', true is not supported in statind_parfor; besides this option, all documented statind keyword arguments are supported as well as the following additional Keyword Arguments:

  • 'NumCore': Assign specific how many CPU workers to be used in the paralell pool.
  • Default is 0, that the number of CPU cores to be used for paralell computing is automatically decided.
  • 'ReverseDatalist': Simply reverse the list of data. This might save time when you try to continue from previous unfinished calculation with the 'SkipExist' set true. The default is false.

anomalyind

anomalyind(dir_stat,dir_output) calculates anomaly index number (AIN) according to a list of \(A_\text{thr}\); dir_out is where the output variables stored.

Example:

anomalyind(dir_stat,dir_tsAIN);

Keyword Arguments:

  • 'AthrList':

    • the list of \(A_\text{thr}\) (the multiplier of the median value of the statistical indices within the moving time window)
    • default is [1:10]
  • 'MovingWindow'

    • the moving-window length for calculating the median value mentioned above
    • default is 1000 (days)

Input and output:

  • anomalyind takes the statistic index output from statind by loading them from dir_stat.
  • The calculated anomaly index is saved in dir_tsAIN.

molscore

molscore(dir_tsAIN,dir_catalog,dir_molchan) is responsible for single station’s TIP & Molchan score calculation.

Example:

dir_catalog = 'Workspace/GEMS-MagTIP-insider/spreadsheet';
dir_tsAIN = 'Workspace/var-output/tsAIN-J13-TZ2';
dir_molchan = 'Workspace/var-output/MolchanScore-J13-TZ2';
molscore(dir_tsAIN,dir_catalog,dir_molchan);

Keyword Arguments:

  • ‘TrainingPhase’

    • Assigns a (set of) training phase(s). It should be of type 'calendarDuration', 'duration', an N by 2 array of 'datetime', or an N by 2 cell array, where N is the number of the training phases. If given an N by 2 array specifying N training phases, then N sets of results will be produced separately, with output format being '[MolScore]stn[%x]ID[%s]prp[%s]dt[%s-%s].mat'.

    • For example, a 4 by 2 datetime array reshape(datetime(2009:2016,2,1),[],2) specifies the start and end date of 4 training phases, with the first column being the datetime of the start and the second column being the end of each training phases.

    • For example, a 3 by 2 cell array

      {calyears(7),datetime(2012,11,11);...
       calyears(7),datetime(2011,11,11);...
       calyears(7),datetime(2010,11,11)};

      specifies the end day of the training phases as 2010-Nov-11, 2011-Nov-11 and 2012-Nov-11, all having a length of 7-year-long training period (i.e. calyears(7)). If the duration is negative (e.g. -calyears(7)), the datetime of the second column become the first day of each training phase.

    • Default is calyears(7), which specifies the end day of training phase the day before the last day of statistical indices or anomaly index number, with a length of 7 year period. That is, in default it “trains” and gives the best models according to the most recent 7-year data.

  • ‘modparam’

    • Specify the model parameters for grid search. It should be a cell array; all elements in the cell array will be directly passed into modparam() as input arguments.
    • For example, {'Athr',[1:2:10],'Rc',[20, 30]}.
    • Default is {}.
  • ‘AdditionalCheck’

    • Apply some additional check and tests. This option is for developer.
    • Default is false.
  • 'SkipExist'

    • Set it true to skip calculations if there is a file in dir_output having exactly the same name already (which means, having the same filter tag, station name and training phase).
    • A typical scenario to use this option is when your last calculation was aborted manually or interrupted by an error event.
    • Default is false.

Input and output:

  • molscore takes anomaly indices from dir_tsAIN and earthquake catalog and station information from dir_catalog.
  • The output ranked models are saved in dir_molchan.
  • In the "[MolchanScore]Information.mat", Info.DateTime is inherited from "[tsAIN]Information.mat" (whose DateTime is inherited from "[StatisticIndex]Information.mat")

molscore_parfor

molscore_parfor is the parallel computing version of molscore, it takes the same input arguments and supports all documented keyword arguments as molscore.

molscore3

molscore3 calculates the joint-station TIP, the Molchan score between EQK and TIP, the Hit rate, and the earthquake probability. The calculation is based on optimized model given by molscore.

Example:

dir_catalog = 'Workspace/GEMS-MagTIP-insider/spreadsheet';
dir_tsAIN = 'Workspace/var-output/tsAIN-J13-TZ2';
dir_molchan = 'Workspace/var-output/MolchanScore-J13-TZ2';
dir_jointstation = 'Workspace/var-output/JointStation-J13-TZ2';
molscore3(dir_tsAIN,dir_molchan,dir_catalog,dir_jointstation)

Keyword Arguments:

  • ‘ForecastingPhase’

    • Catalog will be filtered according to this parameter.
    • It can be a N by 2 datetime array with its size identical to the training phases (trnphase), specifying the time ranges of N forecasting phases.
    • It can be ‘calendarDuration’ or ‘duration’, saying T. In this case, saying the last day of the training phase is trnphase(i, end), the forecasting phases are set to start from trnphase(i, end) + 1, and end at trnphase(i, end) + T.
    • ‘auto’. In this case, forecasting phases are set to be as long as possible. That is, until the last day where tsAIN is available.
    • Default is one calendar year: calyears(1).
    • Also see formatForecastingPhase().
    • Noted that the actual ProbabilityTime in the output is extended according to the maximum of leading time (Tlead) and predicting time (TPred) model parameters, to maximize the extent that the model can predict. For details, please see refer to the source code of jointstation for how ProbTime is defined.
  • ‘OverwriteFile’

    • Whether to overwrite existing output files or not.
    • Default is true.
  • ‘ModelSelect’

    • see bestmodel()
  • ‘ModelSelectOP’

    • see bestmodel()
  • ‘ChooseBest’

    • Define the number of maximum best models for each station to be applied.
    • Default is 10, which means we pick the models of top 10 fitting degrees each station for calculating predicted TIP(s).
  • ‘CombinationNumber’

    • Define the total number of random combinations among the best models of each station for joint-station TIP calculation.
    • Default is 500, which means for every station a random permutation of ranking numbers (based on the fitting degree) of the best models is performed with each sequence of ranking number having 500 elements, and the ith joint-station model parameter combination is thus from the best models of each station indexed by the ith element of each permutation.
  • ‘WithExtra’

    • If ‘WithExtra’ is true, the returned InfoTable will include a variable for Other data.
    • Please refer the output arguments of jointstation.
    • Default is false.

Input and output:

  • molscore3 takes anomaly indices from dir_tsAIN, earthquake catalog and station information from dir_catalog, and ranked model form dir_molchan.
  • The output probability is saved in dir_jointstation.

molscore3_parfor

molscore3_parfor is the parallel computing version of molscore3, it takes the same input arguments and supports all documented keyword arguments as molscore3.

Load raw data and convert it into a standard format

標準格式資料表示應包含僅有一個欄位的格式化 matfile,該欄位名稱可任意選擇。假設欄位為 M,則 M 必須是一個矩陣,其時間和資料的欄位索引需依據資料類型由函數 fmt.colindex2data(參見 fmt)分配。

Warning

需注意,fmt.colindex2data 返回一組用於標準格式資料的索引,而 columnind(參見 下方的 docstring)則返回一組用於原始資料的索引。

標準格式資料的命名至關重要。 如需了解細節,請參考以下 docstring 中的 standarddataname 以了解命名規則。

Docstring

conv_gemsdata

conv_gemsdata(dir_gems_raw, dir_data, dir_catalog) read original GEMS’s data (e.g., 2012_02_07_16_45_00.dat) in dir_gems_raw, merge and convert them to the standard format for GEMS-MagTIP.

The GEMS’s raw data is recorded at local time (UTC+8), and conv_gemsdata converts the time to UTC time.

Also see "read_gemsdata.m". Example:

dir_gems_raw = 'rawdata/GEMSdat';
% where the individual data locates in for example:
% 'rawdata/GEMSdat/em10/REC/Y2012/M02/D07/2012_02_07_16_45_00.dat'
dir_data = 'dataStandard/GE';
dir_catalog = 'spreadsheet'; % for obtaining the station information ("station_location.mat");
conv_gemsdata(dir_gems_raw, dir_data, dir_catalog);

Keyword Argument:

  • ‘FilterByDatetime’: See conv_geomagdata.

conv_geomagdata

The original geomagnetic data (which are those in “.csv” format being something like “2008010300.KM” or “20190307.LY”) should be converted to a standard format before any calculation. conv_geomagdata(dir_mag_raw, dir_data) read original data in dir_mag_raw and save them in the standard format at the directory dir_data.

Keyword Argument:

  • ‘ContinueFromLast’: Default is false. If true, it compares the names of all old files in the dir_mag_raw and the new files in dir_data before conversion to avoid files that are already being converted to be converted again. This additional procedure may take several hours depending on the size of database; a more efficient way for avoiding repeated processing is to manually specify ‘FilterByDatetime’. See below.

  • ‘FilterByDatetime’: Only convert the files in the assigned time range(s). It supports:

    • A datetime. For example, when 'FilterByDatetime', datetime(2010,10,10), only files with time tag being or after 2010-Oct-10 will be converted.

    • N by 2 datetime array, for example,

      'FilterByDatetime', [datetime(2009,12,10), datetime(2010,10,10);...
                           datetime(2013,12,10), datetime(2017,10,10)];

      , only the files in the two time ranges [2009-Dec-10, 2010-Oct-10] and [2013-Dec-10, 2017-Oct-10] will be converted; otherwise, ignored.

  • ‘Test’: It take only the assigned ratio (0-1) of data (files) in the directory dir_mag_raw. The files are randomly chosen. This option should only be applied when you want to test your code for reducing the overall computing time. Default is 0 (when the assigned value for ‘Test’ is \(\leq 0\) or \(\geq 1\), there is no reduction in the data list).

NOTICE:

  • Files in the original format must be uniquely named as ‘20200101.HC’ or ‘2020010109.HC’ for example.
  • If not uniquely named, such as ‘../HC/20200101.txt’, ‘../CS/20200101.txt’, the second one will be regarded as a duplicated file and be moved out to an alternative folder (dir_alt)
  • every file should have its extension as the code of the corresponding station, e.g. ‘20200101.HC’ is correct; ‘20200101.txt’ is not valid and an error will occur.

readgems

A general function for reading original gems data. It dispatch the task to the specified function according to file extension.

read_gemsdat

read yyyy_mm_dd_HH_MM_SS.dat from GEMS developed by AnaSystem The following is GEMS.dat format: Header is the first 100 bytes. All parameters in header is in a unit of 2 bytes Hence, the header has 50 fields.

  • 0-11 bytes is for date
  • 0-1: year
  • 2-3: month
  • 4-5: day
  • 6-7: hour
  • 8-9: minute
  • 10-11: second
  • 12-13: sampling rate
  • 14-15: Channel Options (15 has 4 CHs,255 has 4 CHs,7 has 3 CHs,3 has 2 CHs)
  • 16-99 bytes are padded by zero, for future use.

From the 100th byte on, every 4 byte is used to restore data. The order is TimeStamp, Ch1, Ch2, Ch3, Ch4. If one channel is not used, skip it. e.g., only use chn1 and chn3, then recording order is TimeStamp, Ch1, Ch3, TimeStamp, Ch1, Ch3, and so on. In every cycle, TimeStamp is a 4-byte integer, time drift from the header time, unit is minisecond. channel’s data is 4-byte single-precision float. Note that the byte order of all data is “little endian.” The value of voltage is between -10 V and 10 V.

function [data, header] = read_gemsdat(absolute_FileName) input: absolute_FileName: string, absolute full file name output: data = [time,chn1,chn2,chn3,chn4], N x 5 header = [yyyy;mm;dd;HH;MM;SS.SS;fs;chnOPT;…] built-in func: fopen, fclose, fread, fseek self-defined func:

usage:

[data, header] = read_gemsdat('g:\GEMSdat\em10\REC\Y2012\M02\D07\2012_02_07_16_45_00.dat');

Origin

  • readanasystem.m
  • Author: Han-Lun Hsu

First modification

  • Matlab version: 2018b
  • Author: Hong-Jia Chen (redhouse6341@gmail.com)
  • Date: 2018/07/01

Second modification

  • Matlab version: 2021b
  • Author: Hong-Jia Chen (redhouse6341@gmail.com)
  • Date: 2021/10/11

The docstring is revised by Tsung-Hsi Wu in 2024.

geoelectric_projection

One field data of any direction can be determined by data of another two directions E(phi)=[-E1*sin(phi-theta2)+E2*sin(phi-theta1)]/sin(theta2-theta1) In this framework, make sure Common Electrode is negative for all channels That is, A is CH01+, B is CH02+, C is CH01- and CH02- function [E] = geoelectric_projection(phi, E1, theta1, E2, theta2) input: phi, theta1, theta2 = ‘degree’ from the north, positive for clockwise E1, E2 = field data of any two direction output: E = field data of the specific direction phi phi = 0 means north phi = 90 means east built-in func: sind numel user-defined func:

usage:

[E_EW] = geoelectric_projection(90, E_chn1, 15, E_chn2, 120); 
[E_NS] = geoelectric_projection(0, E_chn1, 15, E_chn2, 120); 
% where  East is positive;  North is positive

Reference

  • Takahashi et al. (2005): ULF Electromagnetic Environment at Southern Boso Peninsula : Signal Discrimination of the Geoelectromagnetic Data

Origin

  • Matlab version: 2018b
  • Author: Hong-Jia Chen (redhouse6341@gmail.com)
  • Date: 2020/09/02

The docstring is revised by Tsung-Hsi Wu in 2024.

columnind

Return column indices for time, data and others according to the type field on the file name of standard format GE/GM data.

standarddataname

[fname] = standarddataname(stnm, dtstr, type_j) returns the file name of the standard data format. For example, standarddataname('HUAL', '20120202', 'GEMS0') returns stn[HUAL]dt[20120202]type[GEMS0].mat.

write_data

Given the reference tag strcheckwith (that automatically updated inside this function) and the reference tag strj (defined outside the function), write_data creates a new file fname if they are different, and write M into the same matObj if they are the same.

Example In the following example, New matfile matObj is created in the first loop with matObj.M = M. In the subsequent loops, new data M will be appended to the same matObj.M until datestr is updated.

dtstr0 = '00000000';
matObj = 0;
    for j = 1:height(table_i)
        dtstr_j = table_i{j, 'datestr'}; 
        ...
        [matObj, dtstr0] = write_data(M, matObj, dtstr0, dtstr_j, dir_i, fname, dir_alt);
    end

Available Preprocessing Functions

在 GEMS-MagTIP 中,訊號處理通過預處理函數進行,這些預處理函數是位於 src/preprocess 的特殊函數。 no 函數的功能是從給定的檔案路徑匯入資料,這是必須的。 您可以使用 prplist = prpfunctions() 列出所有的預處理函數。 濾波函數(例如 ULF_AULF_B,在 prplist 中)共用核心函數 generalfilterprocess,該函數將 loadfilters 的輸出作為輸入。

預處理函數的功能是什麼?

  • 匯入資料(即 M
  • 執行一些預處理(例如帶通濾波)
  • 將處理後的資料輸出為與 M 具有相同列數的矩陣
  • 在預處理函數中使用 columnind 獲取時間欄位、數據欄位等的索引。

如何定義自訂的預處理函數與 statind 配合使用,需符合以下要求:

  • 第一個輸入參數應為標準格式檔案的路徑 fpath
  • fpath 匯入資料。
  • 確保欄位索引與 columnind 相容。
  • 將函數的 .m 檔放置於 src/preprocess 資料夾下 (與 no.m 同一個資料夾階層)。

如何使用

  1. 通過statind 階段的 'Preprocess' 接口:statind(..., 'Preprocess', 'ULF_A') (ULF_A 可代換成 prpfunctions() 所列的任意函數名)。
  2. 見以下範例。

範例

內建四個預處理函數,分別為 BP_35BP_40ULF_AULF_B;這些濾波器可透過 statind 階段中的 'Preprocess' 關鍵字參數進行應用。

以下範例展示直接使用(非通過 statind 階段的 'Preprocess' 接口)BP_35 的方法:

fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
M_out = BP_35(fpath);

在此範例中,資料從 fpath 匯入為變數,然後調用 generalfilterprocess,默認使用 bandpass,每次匯入資料時創建一個帶通濾波器,並將濾波後的資料輸出為與 M 具有完全相同欄位(直行)數的 M_out(但橫列數可能有所減少)。

另一個範例則是應用現有的濾波器(例如 'Filter[BP_35]samp[1]stp[093].mat')於 dir_filter 中(如果可能):

fpath = 'D:\GeoMag (main)\GeoMag_GO\KUOL\stn[KUOL]dt[20150714]type[GEMS].mat';
filtpaths = datalist('Filter*.mat', dir_filter);
FOBJ = loadfilters(filtpaths); % load all filters as a struct FOBJ
M_out = BP_35(fpath, FOBJ);
Docstring

BP_35

The preprocessing function that filter the loaded timeseries with 'BP_35' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 3162); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [10^(-3.5) 10^(-1.75)] (in unit Hz); please refer fmt.filterRange.

BP_40

The preprocessing function that filter the loaded timeseries with 'BP_40' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 10000); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [10^(-4) 10^(-1.75)] (in unit Hz); please refer fmt.filterRange.

ULF_A

The preprocessing function that filter the loaded timeseries with the 'ULF_A' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 1000); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [0.001 0.003] (in unit Hz); please refer fmt.filterRange.

ULF_B

The preprocessing function that filter the loaded timeseries with the 'ULF_B' band-pass filter. It applies down sampling to 1 Hz before band-pass filtering ('Downsampling1Hz', true), and applies edge truncations right before output ('EdgeTruncate', 1000); please refer generalfilterprocess for more details. The lower and upper bounds of frequency are [0.001 0.01] (in unit Hz); please refer fmt.filterRange.

no

M_prp = no(fpath) does no preprocessing, loads and returns the data of fpath in the form of matrix.

Other functions might also depends on no. For example, prpfunctions is dependent on no to get the directory of all functions for preprocessing.

Note

generalfilterprocess'ULF_X'BP_X 系列的核心共用函數;詳情請參考濾波函數

Filtering Functions

Docstring

autobandpassfilter

Given a file of standard format, create and save bandpass filter accordingly. See also fmt; generalfilterprocess.

butterfilt

Butterworth filter.

filterthedata

M_dataf = filterthedata(M_input,prp_tag,...
    'SamplingFrequency',fs,...
    'RemoveOutliers',true);%

where M_input is an n by m array where n is the number of data points, m is the number variable. That is, each row of M_input is a data point; each column is a variable.

generalfilterprocess

Given prp_tag, generalfilterprocess(prp_tag, fpath, 'AvailableFilter', FOBJ) load data from fpath, filter only the data column and return the data as M_prp. The frequency range is inquiried through fmt.filterRange or FOBJ.(prp_tag).(fs_tag).

If FOBJ is provided, generalfilterprocess attempts to do filtering with filters in FOBJ; if there is no adequete filter in FOBJ, it applys filterthedata that generates filter for file fpath with warning message.

Option:

  • 'EdgeTruncate', 1000: Truncate the head/tail by 1000 data points right before output.
  • 'AbnormalLowValue', 1e-6: Replace data of absolute values less than 1e-6 to NaN.

generalfilterprocess involves the following processin; see the docsting therein:

  • loadfilters for information about FOBJ, fs_tag and prp_tag.
  • fmt for fmt.filterRange
  • filterthedata for automatically create a filter and do filtering
  • fillNaN for interpolation (which is required before any filtering)
  • autobandpassfilter for quickly making and saving filters.

loadfilters

F = loadfilters(filtpaths) load filters as a nested structure, that opts = F.(prp_tag).(fs_tag) is compatible to signal.internal.filteringfcns.filterData in bandpass. In which, prp_tag should be in the list of prpfunctions() and fs_tag is the sampling frequency prefixed by “fs” (e.g., ‘fs15’ for 15Hz).

The simplest way to create and save a filter is apply bandpass to data with a certain set of parameters, set debug point inside bandpass, and save opts in the debug scope. For canoical approaches, see Filter Design in Matlab.

The filters should named as ‘Filter[ULF_A]samp[15]….mat’ for example.

The input filtpaths should be a struct having name for file names and fullpath for file paths.

See also:

  • prpfunctions(): list all preprocessing functions that might support F.
  • signal.internal.filteringfcns.filterData (Signal processing package of matlab)
  • statind(__)

try_generalfilterprocess

A try-catch wrapper to avoid aborting execution due to error raised in generalfilterprocess.

Internal API for TIP Calculation

Docstring

TIPArea

Given the Longitude Lon, Latitude Lat and the radius Rc, [isinS,insideRatio] = TIPArea(Lon,Lat,Rc,xLon,yLat) returns a logical vector indicating whether each grid point specified by xLon and yLat is inside Rc centered at (Lon, Lat).

TIPArea used isInpolygon of okatsn/toolbox which simply splits circles by NaN before applying matlab’s inpolygon. In the official documentation of inpolygon in the section “Points Inside Multiply Connected Polygon”, if the polygon in the inner loop are defined in a counterclockwise direction while the outer one in a clockwise direction, inpolygon identifies the points inside the outer polygon but not inside the inner polygon. What the official documentation didn’t explicitly say is, if two (or more) polygons are all defined in the same direction, which is our case that all circles are defined in the counterclockwise direction, inpolygon identifies points in either polygon, which the expected behavior in our application. That is, when the TIP area of one station overlapped with the TIP area of another one station, isinS is true if a grid point is inside of either and (xLon(isinS), yLat(isinS)) are essentially the union of the TIP area of all stations.

Unit:

  • xLon (yLat): degree of longitude (latitude).
  • Rc: kilometer. Noted that Rc is convert to the unit of degree and be the mean of those based on the latitude and longitude of the center respectively. (Hint: when converting km to deg, it depends on latitude and the longitude. For example, at lower latitude, the distance of 1 deg interval in longitude is much larger than that of 1 deg interval at high latitude.)

Input:

  • xLon (yLat): N by 1 array for longitude (latitude)
  • Rc: M by 1 array of raddi
  • Lon, Lat: M by 1 array of the coordinate of centers each.

Output:

  • isinS: a N by 1 array, where xLon(isinS),yLat(isinS) specify all the points (clearly non-repeated) that are within the ranges of radii Rc inside the centers specified by Lon, Lat.

TIPspatialtemporal

Given the dates to calculate TIP area dates_TIPtrue, the range (Rc) around the center point (LatLon) in LatLonRc_TIPtrue, and the grid points of the entire map (xLon, yLat), [dt_TIP_true_uniq,TIP_true_area,isinTS] = TIPspatialtemporal(dates_TIPtrue,LatLonRc_TIPtrue, xLon, yLat) returns the unique and sorted vector of dates (dt_TIP_true_uniq) and the vector of spatial TIP area TIP_true_area for each date in dt_TIP_true_uniq.

The value of TIP_true_area is the ratio of spatial TIP area to total area. When in the context of calculating the total amount of alarmed area, sum(TIP_true_area) is the alarmed area ratio to the entire 2D space specified by xLon and yLat (i.e., which are all grid points in the region of [TwLat0, TwLat1],[TwLon0, TwLon1]). See TIPArea for more information.

isinTS is a S by T logical array specifying spatial-temporal points that are within the ranges of radii Rc inside the centers specified by Lon, Lat, where [Rc, Lon, Lat] = LatLonRc_TIPtrue is a M by 3 array where \(M \geq T\) must be satisfied.

Noted that dt_TIP_true_uniq are successive (e.g., 1, 2, 4) dates but not necessarily consecutive (e.g., 1, 2, 3).

bestmodel

[BestModels,BestModelNames,idPermute,molList] = bestmodel(molList,BestN,totalM,StationLocation) filters the best N models out based on fitting degree and perform random permutation of total M combination of them. That is, bestmodel() save the best N (at most) models for each station in BestModels, and gives an array of M elements that are randomly picked from the set of the ranking numbers of the best N (at most) models.

Input arguments:

  • molList: The list of paths to the files of ranked models, which are produced by molscore. Get the molList using datalist(...).
  • BestN: The number the models of highest molchan score.
  • totalM: The amount of randomly permuted combinations of the best N models. That is, M is the number of joint-station models. Noted that the first model combination is forced to be the combination of the best model of each station.
  • StationLocation: A J by 3 table with the column names being ‘format’, ‘Lon’, ‘Lat’ and row names (i.e. StationLocation.Properties.RowNames) the station codes (e.g. ‘TW’, ‘CS’,…) of the total J stations.

Output arguments:

  • BestModels: A J-by-1 cell array containing the best models of total J stations. In each cell there is a table model parameters.
  • BestModelNames: An J-by-1 cell array containing station names corresponding to BestModels (also J-by-1 cell array).
  • idPermute: The randomly permuted indices for the best models. It is a J by 1 cell array, where each cell contains a M by 1 double array being the indices for the Jth BestModels. Noted that the first element in the M by 1 double array is always 1, which force the first model combination in jointstation() to be the combination of the best model of each station (SEE molscore3()).
  • molList: The list of paths to the files of ranked models, where the files having all molchan scores to be NaN (hence model parameters cannot be ranked) are excluded.

Keyword Arguments:

  • ‘NearbyThreshold’: For some reasons (e.g. no target earthquakes at all in the training phase) models cannot be ranked (optimized) for a station (saying ‘MS’) because all molchan scores are NaN. In this situation, bestmodel will try to find if there is a near-by station within 15 km (in default). If there is, for example, a station ‘PT’ nearby within 15km of ‘MS’, then the ranked model ‘PT’ will be taken instead. That is, in the following process in molscore3, the best models of ‘PT’ and the AIN based on the geomagnetic data of ‘MS’ are applied for calculating the probabilities around the station of ‘MS’.

  • ‘ModelSelect’: You may filter the table of Molchan scores by using this option. For example, with 'ModelSelect', {'Tpred', 5}, all rows whose Tpred is not 5 will be discard.

  • ‘ModelSelectOP’

    • The operator function for model selecting if 'ModelSelect' have multiple pairs of input argument.
    • For example, with 'ModelSelect', {'Tpred', 5, 'Tlead', 5}, 'ModelSelectOP', @or, all rows with Tpred=5 or Tlead=5 was kept; others, discarded.
    • Default is @or.

convAIN

Calculate the total amount of anomaly indices of each day. That is, saying we have two statistical indices S and K, and the total amount of anomaly indices is less or equal than 2. It is the ‘update_tsAIN’ of the previous version.

Example:

[sum_tsAIN, sum_validateIndex] = convAIN(tsAIN.Athr, tsAIN.validateIndex)

For each Athr we have one anomaly index (true or false) and validation index (true or false) per day per variable, and the output is the number of “trues” per day. Usually Athr = 1:10, so the output will be a 10 by 1 cell in each there is a N by 1 double; N is the length of the DateTime series.

Input arguments:

  • tsAIN.Athr or tsAIN.validateIndex: A 1-by-10 structure array. See anomalyind for more information.

Output arguments:

  • sum_tsAIN or sum_validateIndex: A 10-by-1 cell array corresponding to \(A_{thr} = 1:10\). In each cell, there is a N-by-1 double array being the sum of anomaly indices or the sum of valid indices of N days.

dt_TIP_true

[dt_TIPtrue,TIPtime,TIP] = dt_TIP_true(DateTimeSAT,vlSAT, Tthr,Tlead,Tpred) return the datetime where TIP is true.

Input arguments:

  • DateTimeSAT: M-by-1 datetime array for sum of anomaly days
  • vlSAT: M-by-1 double array for the values of sum of anomaly days (the anomaly days is summed in the moving window of Tobs)
  • Tthr: the threshold related to the number of anomaly days in Tobs
  • Tlead: the leading window right before the prediction window
  • Tpred: the window of prediction (where TIP is true or false according to the number of anmalies in the corresponding Tobs window)

Output arguments:

  • dt_TIPtrue: the datetime where TIP is true
  • TIPtime: an N-by-1 datetime array paired with the TIP array
  • TIP: an N-by-1 logical array of Time of Increased Probability

The variable dt_TIPtrue may exceeds the last day of DateTimeSAT, while TIPtime is truncated to not exceed DateTimeSAT(end) in molscore, because in training phase datetime out of DateTimeSAT is assumed to be out of the time range of available earthquake catalog.

A schematic illustration of the input/output variables:

sum_validateIndex_all_a_dti (2nd input argument in sumanomalyind)
               1100111011000001010010000000001010000000 (for example) 
DateTime_dti   tttttttttttttttttttttttttttttttttttttttt
Tobs length    |-----|  (Tobs=7, for example) 
Tlead length          |-------| (Tlead=9, for example) 
vlSAT                5445543222122233222221112122344332 (for example) 
DateTimeSAT index    123456789.........................         
DateTimeSAT    |Tobs |--------------------------------|         
TIPtime/TIP    |Tobs || Tlead ||--------------------------------------|
                               ↑                |Tobs || Tlead ||Tpred| 
                   Tpred start ↑        last AIN data ↑     Tpred end ↑    

dt_TIP_valid

dt_TIP_valid gives loosely valid TIP time and strictly invalid TIP time. There are two possibilities for a TIP false: one is that the anomaly index number (AIN) is below the threshold, and the other is that AIN is NaN since anything in comparison with a NaN results in a false. The later we called it an ‘invalid’ TIP time/day.

Example:

[TIPtime1,TIP_valid,dates_TIPvalid_st,dates_TIPinvalid_st] = ...
dt_TIP_valid(DateTimeSAT,sum_validateIndex_all_a_dti,Tobs,Tlead,Tpred)

This gives the datetime where there is at least one data in the Tobs and hence TIP false is a valid false.

Input arguments:

  • DateTimeSAT: see sumanomalyind()
  • sum_validateIndex_all_a_dti: the sum_validateIndex in convAIN()
  • Tobs: the observation time window right before the leading window (Tlead).
  • Tlead: the leading window right before the prediction window
  • Tpred: the window of prediction (where TIP is true or false according to the number of anmalies in the corresponding Tobs window)

Output arguments:

  • TIPtime1: a 1-d datetime array for TIP_valid
  • TIP_valid: a 1-d logical array indicating whether the TIP is valid or not.
  • dates_TIPvalid_st: Returns the datetimes of where the TIP have a meaningful true/false.
  • dates_TIPinvalid_st: Returns the datetimes of where the TIP have a meaningless false. (If TIP is true, then the day should always be valid)

About the meaningful and meaningless ‘false’, see the comments in anomalyind()

Recalling that sum of AIN is the sum of anomaly index calculated in the moving time window of length Tobs, and the dates DateTimeSAT(sum_AIN_valid) denotes the end day of Tobs where at least one of the anomaly index in the Tobs window is meaningful, we have the following example where Tlead = 9, Tobs = 7, and Tpred = 3, for explicitly demonstrates how the TIP’s valid times are calculated:

sum_validateIndex_all_a_dti
              '1100111011000001010010000000001010000000' (for example) 
DateTime_dti   tttttttttttttttttttttttttttttttttttttttt|
DateTimeSAT          tttttttttttttttttttttttttttttttttt|
sum_AIN_valid        1111111111111111111110001111111110|
dt_AIN_valid         ttttttttttttttttttttt   ttttttttt |
               |Tobs ||<--a--->|                       |<--b----->|
TIPtime                        tttttttttttttttttttttttttttttttttttt
Tpred moving window            |c|                              |c|
                               | the same length as DateTimeSAT |
TIPvalid_0                     123456789........................
                                \\\\\\\
TIPvalid_1                       3456789..........................

Therefore, 
dt_TIPvalid = unique(TIPtime([1:3,2:4,3:5,....]))

% t denotes a datetime object
% a = Tlead+1 = idinc2predstart
% b = Tlead+Tpred = idinc2predend
% c = Tred
% dt_AIN_valid is DateTimeSAT(sum_AIN_valid), and sum_AIN_valid is a.k.a Tobs_TF.
% TIPvalid_0 (_1) is a.k.a. TIP_TF_0 (_1)

eqktip1

[EQK,TIP, TIPtime] = eqktip1(eqMR,eqDateTime, DateTimeSAT,vlSAT, ...
                   Tthr_iMod,Tlead_iMod,Tpred_iMod, Mc_iMod,Rc_iMod)

provides 1 dimensional logical array of target earthquakes (EQK) and Time of Increased Probability (TIP), corresponding to the 1-d datetime array TIPtime.

Input arguments:

  • eqMR: a N by 2 array of all-combination of model magnitudes and radii of detection, with the first column being magnitude and second column the radius of detection.
  • eqDateTime: a N by 1 cell array. Each cell contains the datetime of target earthquakes satisfying the corresponding row of eqMR; i.e., being larger than eqMR(i,1) and within eqMR(i,2).
  • DateTimeSAT, vlSAT: see sumanomalyind.
  • Tthr, Tlead, Tpred: see dt_TIP_true.
  • Mc_iMod: the current model magnitude.
  • Rc_iMod: the current model radius of detection.

Output arguments:

  • EQK: the M-by-1 logical array for whether there is/are target earthquakes in the day.
  • TIP: an M-by-1 logical array of Time of Increased Probability
  • TIPtime: an M-by-1 datetime array paired with the TIP and EQK array

See also:

  • [~,eqMR,eqDateTime] = isearthquakeinrange(...) in molscore
  • [DateTimeSAT,vlSAT] = sumanomalyind(...) in molscore

get_modelparam

Use [var1, var2, ...] = get_modelparam(BestModels, 'name1', 'name2',...) to get specific model parameters uniquely. BestModels is the output of bestmodel().

Example:

  • Tpreds, Tleads = get_modelparam(BestModels, 'Tpred', 'Tlead')

jointstation

jointstation calculates hit rates, alarmed rates, and spatial probability.

Example:

When it is applied for calculating the hit rates of the training phase:

HitRates_trn = jointstation(BestModels,BestModelNames,idPermute,...
             sum_tsAINs,sum_validateInds,DateTime_dti,...
             CWBcatalog,'CalculateHitRate',true);

When it is applied for calculating the forecasting probabilities:

[HitRates_frc, AlarmedRates_frc, SpatialProbability, xLon, yLat, ...
    validStationTime, ProbTime, TIP3, TIPvalid3, EQKs, Other]...
    = jointstation(BestModels,BestModelNames,idPermute,...
         sum_tsAINs,sum_validateInds,DateTime_dti,...
         CWBcatalog,'CalculateHitRate',true,...
         'CalculateAlarmedRate',true,'CalculateProbability',HitRates_trn,...
         'StationLocation',StationLocation, 'WithExtra', true);

Keyword Arguments:

  • ‘CalculateHitRate’: Whether to calculate the hit rates. Default is false.

  • ‘CalculateAlarmedRate’: Whether to calculate the alarmed rates. Default is false.

    • If ‘CalculateProbability’ is true, ‘CalculateAlarmedRate’ will be automatically set to be true since alarmed rates are essential to probability calculation.
  • ‘CalculateProbability’: Whether to calculate the spatial probability. Default is false.

  • ‘StationLocation’:

    • A J by 3 table with the column names being ‘format’, ‘Lon’, ‘Lat’ and row names (i.e. StationLocation.Properties.RowNames) the station codes (e.g. ‘TW’, ‘CS’,…) of the total J stations.
    • If ‘CalculateAlarmedRate’ is true, ‘StationLocation’ must be specified; otherwise it is impossible to obtain alarm rates.
    • Default is 0.

Input arguments:

  • BestModels: A J-by-1 cell array containing the best models of total J stations. In each cell there is a table model parameters.
  • BestModelNames: An J-by-1 cell array containing station names corresponding to BestModels (also J-by-1 cell array).
  • idPermute: The randomly permuted indices for the best models. See bestmodel().
  • sum_tsAINs: A J-by-1 cell array with each cell corresponding to BestmodelNames. See loadAIN().
  • sum_validateInds: A J-by-1 cell array with each cell corresponding to BestmodelNames. See loadAIN().
  • DateTime_dti: A T by 1 datetime array for the time series in sum_tsAINs and the sum_validateInds.
  • CWBcatalog: A table of earthquake catalog. You may load CWBcatalog using only1field([dir_catalog filesep 'catalog.mat']).
  • WithExtra: Output an additional struct of supplementary data as the Other variable.

Output arguments:

  • HitRates, AlarmedRates: M by 1 array, being the hit rates (alarmed rates) of each combination of best models. M is the amounts of total combinations (the totalM in bestmodel()). They are AlarmedRateForecasting and HitRatesForecasting of molscore3’s output. AlarmedRates(i) can be a number or NaN (when 0/0).

  • SpatialProbability: A S by T array of the temporal-spatial forecasting probabilities. It is the output variable Probability of molscore3().

  • xLon (yLat): The longitudes (latitudes) of all spatial points as a S by 1 array. It is the output variable ProbabilityLon (ProbabilityLat) of molscore3().

  • validStationTime: A T by J table of the ratio of valid models each day each station. This is for plotProbability(), that the ratio will be demonstrated as gradient color in the marker of each station. If the ratio is zero, the marker on map will be hollow. It is the output variable validStationTime of molscore3().

  • TIP3: A S by T TIP array for the first model (out of the total M models). Since the first element of each cell in idPermute is always 1, indicating the best model, TIP3 is essentially the TIP array calculated according to the best model parameter.

  • TIPvalid3: A spatial-temporal logical array indicating whether a point (s,t) of TIP true or false is meaningful or not. This is also a S by T array.

  • ProbTime: A 1 by T datetime array for TIP3, TIPvalid3 and SpatialProbability. This is the ProbabilityTime of molscore3’s output.

  • EQKs: The list of target earthquake, with each column

    • time, Lon, Lat, Depth, Mag: The time, longitude, latitude, depth and magnitude (\(M_L\)) of the target earthquake.
    • InStation: Indicates the target earthquake is of which station.
    • EqkDist: The distance between the station (as specified by InStation) and hypocenter of the target earthquake. Also see simpleStationEQKdist3D.
  • Other: The additional output when ‘WithExtra’ is true.

    • Other.nEQK: The number of target earthquakes for each model in forecasting phase.
    • Other.areaTIP: The total area (spatial + temporal, i.e., sum(TIP_valid_area)) of TIP for each model in forecasting phase. See TIPspatialtemporal and TIPArea.
    • Other.nEQK .* HitRates is the number of hitted earthquakes for each model in this forecasting phase.
    • Other.areaTIP .* AlarmedRates is the total amount of alarmed TIP area (i.e., sum(TIP_true_area)) for each model in this forecasting phase.
    • Noted that it is nonsesnse to set ‘WithExtra’ to be true when ‘calculateHitRate’ and ‘calculateAlarmedRate’ are false.

loadAIN

[sum_tsAINs,sum_validateInds,DateTime_dti,AthrList] =
loadAIN(dir_tsAIN,BestModelNames,TimeRange)

load tsAIN, calculate the summation of anomaly index of each day, and truncate the time series to TimeRange.

If class(BestModelNames) == 'char', for example, ‘CS’, the output sum_tsAINs and sum_validateInds are the 10 by 1 cell array directly, instead of a M by 1 cell array inside the J by 1 cell array where J = 1.

Input arguments::

  • dir_tsAIN: the directory for the saved AIN data. See anomalyind().
  • BestModelNames: an J-by-1 cell array containing station names corresponding to BestModels (also J-by-1 cell array).
  • TimeRange: a two-element datetime array.

Output arguments::

  • sum_tsAINs: a J-by-1 cell array with each cell corresponding to BestmodelNames. In each cell there is a 10-by-1 cell corresponding to the list of \(A_{thr}\) (AthrList = 1:10), being the timeseries of the summation of anomaly index of each day.
  • sum_validateInds: a J-by-1 cell array with each cell corresponding to BestmodelNames. In each cell there is a 10-by-1 cell corresponding to the list of \(A_{thr}\) (AthrList = 1:10), being the summation of the indicators for the valid statistical indices of each day.

modparam

In GEMS-MagTIP, model parameters defines TIP and target earthquakes (EQK). [PredParam,varargout]=modparam() generate the whole set of model parameters that are going to be used in the training phase by molscore.

Keyword Arguments:

  • ‘Rc’: Radius (range) of detection in unit kilometer. Default is [20:10:100])

  • ‘NthrRatio’:

    • The ratio threshold to be converted to the number of anomalous statistic index every day.
    • ‘NthrRatio’ should be calculated automatically in molscore() according to the maximum possible number of statistic indices among the anomaly index calculated by anomalyind().
    • Nthr is the threshold (integer) for the number of anomalous statistic indices each day.
    • Default is [0.01, 0.99], which is for the situation where the maximum possible Nthr is \(\leq 2\) (i.e., to be [0, 1]). If you have the maximum possible Nthr to be \(\leq 3\) (i.e, Nthr = [0 1 2]), ‘NthrRatio’ should be assigned to be [0.01, 0.49, 0.99] for example.
  • ‘Ptthr’: Ptthr defines Tobs through Ptthr = Tthr/Tobs. Default is [0.1:0.1:0.5].

  • ‘Tobs’: The length of observation moving window. Default is [5:5:100];

  • ‘Tpred’: The length of prediction moving window.

  • ‘Tlead’: The length of leading window, the time gap that is for the period of silent (no-anomaly) right before big earthquakes.

    • Default is [0:5:100].
  • ‘Mc’: The magnitude threshold. Default is 5, which means only events of \(M_L \geq 5\) will be considered as target earthquakes.

  • ‘Test’:

    • If true or 1, then the PredParam is reduced to 1000 randomly picked models from the default combination.
    • If it is a positive integer N other than 1, the predParam is reduced to N randomly picked models.
    • Default is 0.
  • ‘Seed’: the seed for the random selection while ‘Test’ is enable.

    • Seed must be a nonnegative integer seed less than 2^32.
    • If seed is less than zero, the random selection has its result different everytime.

Output:

  • PredParam: The GEMSTIP/MagTIP models of total 8 free parameters: G = [Mag,Rad,NthrRatio,Athr,Tthr,Tobs,Tpred,Tlead].

Examples: - [PredParam, columnNames] = modparam('OutputFormat','double'); - [PredParam_table] = modparam('OutputFormat','table');

shift2TIPtime

[dt_true_in_TIP, TIPtime] = shift2TIPtime(DateTimeSAT,Tobs_TF,Tlead,Tpred) gives the date time array in the domain of TIPtime, where the trues in TIPtime are defined in Tpred window according to the data in Tobs window. Tobs_TF:

  • trues or falses according to the data in Tobs moving window.
  • these true-falses should be exactly the same size of DateTimeSAT, where DateTimeSAT is the moving-window’s time tags (at the last day of the moving time window)
  • e.g., vlSAT>=Tthr_iMod
  • e.g., logical(MovingWindowSum(sumvalidateIndexalladti,Tobs))

(these examples are taken from see dtTIPtrue.m and dtTIPinvalid.m)

dt_true_in_TIP: the corresponding trues in the TIP time domain.

sortModelBy

Sort model names by a certain column value. Example: [sortedModelNames, I] = sortModelBy('Lat','descend',ModelNames,TheTable);

sumanomalyind

[DateTimeSAT,vlSAT] = sumanomalyind(DateTime_j,sum_tsAIN_k,Nthr_array,Tobs_i) calculate sum of anomaly day (vlSAT) and its corresponding datetime series (DateTimeSAT). It is simply the moving-window sum of the sum_tsAIN_k timeseries, where the moving-window length is Tobs_i; in which, i stands for \(i_{th}\) model, j for \(j_{th}\) station, and k for \(k_{th}\) threshold \(A_{thr}\). Of coursely, the number of elements of the output timeseires (DateTimeSAT) will be Tobs_i - 1 less than the input timeseries (DateTime_j or DateTime_dti); i.e., length(sum_tsAIN_k) - length(vlSAT) = Tobs_i - 1.

Input arguments:

  • DateTime_j: the datetime array for sum_tsAIN.
  • sum_tsAIN_k: the sum of anomaly statistical indices each day.
  • Nthr_array: The thresholds for sum_tsAIN_k calculated according to NthrRatio. Nthr is known as the threshold of the total daily amount of anomaly statistical indices.
  • Tobs_i: the observation window of model i.

Output arguments:

  • DateTimeSAT: M-by-1 datetime array for sum of anomaly days
  • vlSAT: M-by-1 double array for the values of sum of anomaly days (the anomaly days is summed in the moving window of Tobs)

See also:

  • convAIN for more details about sum_tsAIN_k.

truncateTIPtoSAT

Internal API associated with the truncation of TIP array to the datetime array for the sum of anomaly indices.

Format

Docstring

fmt

fmt is a class that

  1. stores all constant names
  2. stores all constant variables

that enables you to modify or extend some contents without breaking other functionalities.

GEMS-MagTIP depends on fmt to load a variety of intermediate data, including earthquake catalog, station information table and the input data of the standard format. fmt save constant names and variables as properties (Constant) that they can be available in global scope. fmt also have static methods for acquiring information such as the column indices indicating the time vectors and recorded values of the input data of different types.

properties (Constant):

  • InfoFileName_molscore = '[MolchanScore]Information.mat': name to the intermediate file in the session of training (model optimization)

  • InfoFileName_anomalyind = '[tsAIN]Information.mat': name to the intermediate file in the session of calculating anomaly indices

  • InfoFileName_jointstation = '[JointStation]Information.mat': name to the intermediate file in the joint-station method session

  • InfoFileName_statind = '[StatisticIndex]Information.mat': name to the intermediate file in calculating Statistic Index

  • catalogFileName = 'catalog.mat': file name to the earthquake catalog

  • stationLocationFileName = 'station_location.mat': name to the file of station information table

  • tag_upperlowerthreshold = 'ULthr': tag for upper- and -lower threshold defined by Athr

  • datestrFormat = 'yyyymmdd': the format for converting datetime to string tags (dt)

  • datetimeInputFormat = 'yyyyMMdd': the format for converting date strings (with tag dt) to datetime

  • datetimeIncrement: the increment of datetime between two consecutive files of the same station and type.

  • LatitudeLimit = [21.5 26]: latitude limits of the Taiwan area

  • LongitudeLimit = [118 122.5]: Longitude limits of the Taiwan area

  • datatype_mag_1 = {'mFull', 'full'}: "type" tag for the full field geomagnetic data (old geomagnetic stations from 2006; ‘full’ is for backward compatibility)

  • datatype_mag_3 = {'mTri', 'tri'}: "type" tag for the 3 component geomagnetic data (new geomagnetic stations from 2020; ‘tri’ is for backward compatibility)

  • datatype_GEMS_HJ: "type" tag for the data of Hong-Jia Chen 2018

  • datatype_GEMS: "type" tag for the primitive of Geoelectric data

methods (Static):

  • colindex2data(what_type): returns column indices to time and data part of the loaded matrix. Example: [colind_time, colind_data] = colindex2data('GEMS0')
  • filterRange(filter_tag): returns two values indicating the lower and upper limit of the filter, in the unit of Hz. Example: fmt.filterRange(ULF_A).
  • datasampfreq(what_type): returns the expected Sampling Frequency of a specific file/data type. Example: fs = datasampfreq('GEMS0'). You can obtain expected data points of the day by supposedpts = fs*86400.
  • fmtfieldname = datatype_fieldname(what_type): returns the field name for fmt that you can fmt.(fmtfieldname).

Plotting functions

Caution

plotCB

Plot confidence boundary Make sure xpt0 and Dcb_max are both N by 1 array.

plotEQKTIP1

plotEQKTIP1(dir_tsAIN,dir_molchan,dir_catalog,dir_output) plot one-dimensional TIP time series for each station with target earthquakes (EQK) scattered over the diagram; the results are saved as png files in dir_output.

Keyword Arguments:

  • ‘ShowTrainingPhase’: Whether to show the EQK and TIP on the plot. Default is false.
  • ‘scatter’: Set it true to add an additional layer of scattering points of the same color on the top of every TIP==1. This option allows better visual inspection on whether target earthquakes lay in the area of TIP==1 or not. Default is false.
  • ‘Rank’: Choose the model by fitting degree ranking for defining EQK and TIP. Default is 1 (plot the EQK and TIP that are defined by the rank 1 model).
  • ‘OnlyStations’: Specify only a subset of station to be displayed on the plot; for example, for 'OnlyStations',{'MS','TW'} only the results of these two stations will be displayed. In default, results of all stations will be displayed.
  • ‘datetimeTickArguments’: Manually define the datetime format of the X tick labels. For example, for 'datetimeTickArguments', {"yy' mmm.", 1,'months'} the datetime 2012-Nov-12 will be printed as “12’ Nov.”. For more information, see datetime_ticks().
  • ‘TargetPattern’: See the documentation in plotProbability().

plotEQKTIP3

Like plotEQKTIP1, but expand the 3D TIP array into a 2D diagram by squashing spatial coordinates (latitude and longitude) into one.

plotFittingDegree

plotFittingDegree(dir_jointstation,dir_catalog,dir_png) gives fitting degree analysis and molchan diagrams for each training-forecasting phase pair according to the results from molscore3.

plotProbability

plotProbability(dir_jointstation,dir_catalog,dir_out) plots two-dimensional probability forecasting maps; the results are saved as png files in dir_output.

Keyword Arguments:

  • ‘LongitudeLimit’: The longitude limits of the displayed map. Default is [119 122.5].

  • ‘LatitudeLimit’: The latitude limits of the displayed map. Default is [21.5 26].

  • ‘TimeRange’: Manually assign time ranges or specific dates that must lay in the forecasting phase to plotting probability forecast. If not set, the probability forecast of every available dates will be plotted. The assigned datetime array must be either:

    1. An N by 2 datetime array specifying N ranges at one time.
    2. An N by 1 datetime array specifying N days to be plot.
  • ‘PlotEpicenter’: Whether to plot the epicenter(s) of the target earthquakes on the map. Default is 0.

    • plotEpicenter, 1: plot target events only (those in Rc).
    • plotEpicenter,'all': plot all events even when they are not in Rc.
  • ‘Rigorous’: Whether to drop (ignore) the probability forecasts that are not in the range of the forecasting phases. Default is true. Generally the leading time window (Tlead) is different among models, and the probability can be calculated as far as the Tlead of at least one model covered. However, for the dates out of the forecasting phase, only part of the models are applied thus the probability forecast is weaker.

  • ‘TargetPattern’: the pattern for filtering the files in dir_jointstation. Default is to look up all files that fits the pattern “[JointStation].mat”. For example, you can manually specify it as ”[ULF*A]” to give plots of that have the tag ULFA in the file name only. The pattern should always begin with “” and the last cahracter can never be ””. For multiple filtering at once (e.g., '*[ULF_B]*[Tpred-1]'), join the pattern with “” and the order is important.

plot_Rc

Plot radius of detection around the station.

plot_dataoverview

plot_dataoverview(dir_stat, dir_catalog) plot an overview of all data. Keyword Arguments:

  • ‘SaveFigureTo’: the output path

  • ‘BackgroundColor’: Background color; default is white ([1,1,1])

  • ‘DatetimeTicks’: The format of datetime ticks on the x-axis.

    • Default is {'yyyy-mm',6,'Months'}, which means tick labels are in the format of yyyy-mm and interspaced every 6 months.

plot_epicenter

plot epicenter [scatter_handle, axes_handle] = plot_epicenter(CWBcatalog,[ax], kwargs...)

Example

SLoc = only1field(fullfile(dir_catalog, fmt.stationLocationFileName));
checkcatalog(dir_catalog)
CWBcatalog = only1field(fullfile(dir_catalog, fmt.catalogFileName));
CWBcatalog = eventFilter(CWBcatalog, 'Magnitude', 5, 'TimeRange', '20220601-20221002');

figure;
[sc1, ax1] = plot_station(SLoc,'PlotFunction','map', 'MarkerFaceColor', 'none', 'MarkerEdgeColor', '#D95319', 'StationCode', false); % plot all station
hold(ax1, 'on');
plot_epicenter(CWBcatalog, ax1, 'PlotFunction','map'); % plot epicenters
set(gcf, 'Position', [100 100 792 669]);

plot_station

plot map and station. See plot_epicenter.

Others

Docstring

Molchan_CB

[molt_cb,moln_cb] = Molchan_CB(N,alpha) gives the confidence boundary in molchan diagram, where [molt_cb,moln_cb] are the points defining the boundary on the alarmed-rate-against-missing-rate phase plane.

Input Arguments:

  • N: total number of target events/earthquakes
  • alpha: 1-alpha is the confidence level. For example, alpha=0.05 means 95% confidence level.

Output Arguments:

  • molt_cb: values corresponding to the alarmed rate
  • moln_cb: values corresponding to the missing rate

Hint:

  • Calculate the \(D\) (fitting degree) values by Dcb = 1 - molt_cb - moln_cb

askquestion

Example

askquestion('Hello, would you like to continue? [Y/N]', @(x) strcmp(x, 'Y'))
Hello, would you like to continue? [Y/N]Y

ans =

  logical

   1

calcFittingDegree

calcFittingDegree(jpathlist) according to the given files (jpathlist) provides the overall alarmed rate, missing rate that allows the calculation of the overall fitting degree. Make sure to provide correct input list of the [JointStation] variable , for example, those have the same ID and are not overlapped in forecasting time interval for each group; otherwise the calculated fitting degree can be unreasonable.

Example:

dir_png = 'MyResearch/Figures';
jpathlist = datalist('[JointStation]ID[ou7ud]prp[ULF_B]*.mat', dir_jointstation).fullpath;
[AlarmedRate, MissingRate, xticklabel, EQKs, TIP3s, TIPv3s,TIPTimes,LatLons] = calcFittingDegree(jpathlist);
FittingDegrees = 1 - AlarmedRate - MissingRate;
plotEQKTIP3(dir_png,prp_i, xlabels, EQKs, TIP3s, TIPv3s,TIPTimes, LatLons);

Input Arguments:

  • jpathlist: a cell array of the full paths of [JointStation] files that are produced by molscore3. You can simpliy obtain the path list by jpathlist = datalist('[JointStation]ID[ou7ud]*prp[ULF_A]*slc[Tpred-10]*.mat',dir_jointstation).fullpath;

Keyword Arguments:

  • ‘GroupTag’: The tag for grouping the files in jpathlist.
  • ‘GroupNames’: Results are calculated separately according to the assigned group names; alarmed rate, missing rate and so on are only calculated if the file name contains the assigned group names. For example, for ...,'GroupNames',{'Tpred-1', 'Tpred-5'},'GroupTag', 'slc',..., only the files with their names containing tag ‘slc[Tpred-1]’ and ‘slc[Tpred-5]’ are selected, and the results of those being ‘Tpred-1’ and ‘Tpred-5’ are separately calculated. That is, the output xlabel is {'Tpred-1', 'Tpred-5'} and other output arguments (e.g. AlarmedRate) are all cell array of the same dimension as xlabel containing the results of the groups that are calculated separately.
  • Noted that You cannot assign ‘GroupName’ without assigning ‘GroupTag’, but assigning ‘GroupTag’ without assigning ‘GroupName’ is OK, in this case the group names will automatically generated and sorted.

checkcatalog

[isvalid, msg] = checkcatalog(dir_catalog) check if catalog.mat/csv exist in dir_catalog, and convert the catalog.csv to catalog.mat. After successfully create catalog.mat, the original catalog.csv will be moved to the folder ‘original_csv’.

If the catalog.csv does not meet the required format, error will occur. The catalog.csv has to suffice the following condtions:

  • The ‘time’ variable should be in the following format: ’yyyy/MM/dd

HH:mm’.

  • Other variables except ‘time’ should belong the class of ‘double’.
  • Basicaly the following headers (column names),

{‘time’,‘Lon’,‘Lat’,‘Mag’,‘Depth’}, have to exist.

Aliases for the column names in the catalog.csv are allowed; see the Aliases section below.

Keyword arguments: 'OverwriteOriginal', false: The csv files in ‘original_csv’ will not be overwritten. Set 'OverwriteOriginal', true if you have version control.

Aliases: For convenience, aliases of the column name of an variable in catalog.csv will be automatically converted:

  • For earthquake magnitudes, the header of either {‘Magnitude’, ‘magnitude’, ‘ML’} will be automatically converted to ‘Mag’.
  • For depths, the header of either {‘depth’, ‘dep’, ‘Dep’} will be converted to ‘Depth’.
  • For longitude and latitude, either {‘lon’, ‘longitude’, ‘Longitude’},

{‘lat’, ‘latitude’, ‘Latitude’} will be converted to ‘Lon’ and ‘Lat’, respectively.

For convenience, in catalog.csv, if the event time is written in separated two columns ‘date’ and ‘time’, with format ‘yyyy-mm-dd’ (or ‘yyyy/mm/dd’) for ‘date’ and ‘hh:MM:ss’ (or ‘hh:MM’), they will be merged as a single ‘time’ variable sufficing the format mentioned before.

checkstation

[isvalid, msg] = checkstation(dir_catalog) check if station_location.mat/csv exist in dir_catalog, and convert the station_location.csv to station_location.mat. After successfully create station_location.mat, the original station_location.csv will be moved to the folder ‘original_csv’.

If the station_location.csv does not meet the required format, error will occur. The station_location.csv has to suffice the following condtions:

  • The ‘time’ variable should be in the following format: ’yyyy/MM/dd

HH:mm’.

  • Other variables except ‘time’ should belong the class of ‘double’.
  • Basicaly the following headers (column names),

{‘code’,‘format’,‘Lon’,‘Lat’} have to exist. In which,

  • ‘code’ is the code for the station, whereas ‘format’ is the full name of the station. For example, ‘MS’ (code) corresponds to ‘馬仕’ (format).
  • ‘Lon’ and ‘Lat’ are longitude and latitude of the station respectively.

Aliases for the column names in the station_location.csv are allowed; see the Aliases section below.

Keyword arguments: 'OverwriteOriginal', false: The csv files in ’original*csv’ will not be overwritten. Set 'OverwriteOriginal', true if you have version control. Aliases: For convenience, aliases of the column name of an variable in station*location.csv will be automatically converted:

  • For longitude and latitude, either {‘lon’, ‘longitude’, ‘Longitude’},

{‘lat’, ‘latitude’, ‘Latitude’} will be converted to ‘Lon’ and ‘Lat’, respectively.

chkans

Check the answer in the interaction command-line interface in startup0.m

constructMolInfo

constructMolInfo(dir_molchan,InfoId) construct the ‘[MolchanScore]Information.mat’ according to existing [Molchanscore]___.mat files if possible. Use this function only when you are instructed by error messages.

Example constructMolInfo('./var-output/MolchanScore-O04-Cfl_local', 'OT0qPb')

derive_type

derive_type derives station type from tag stn[...].

dirselectassign

dirselectassign(var_names...) prompts user to select directories in a dialog box, and assigned the selected path to workspace with default variable name. If a variable with the same name as the default name has already in the workspace, its assignment will be ignored (i.e. its dialog box won’t pop out). This is a tool for convenience. You can always assign directories explicitly to variable with any name you like.

Example:

  • Four windows will pop out one by one allowing you to assign directories to variables dir_stat, dir_tsAIN, dir_molchan, dir_jointstation:

    dirselectassign('dir_stat','dir_tsAIN','dir_molchan','dir_jointstation');

    (Read the printed message in the Command Window)

  • Total 7 windows will pop out one by one allowing you to assign directories to the variables with default names dir_stat, dir_tsAIN , dir_molchan, dir_jointstation, dir_data, dir_catalog, dir_toolbox:

    dirselectassign();

    (Read the printed message in the Command Window)

fft_of_the_day

Fourier transform of the one-day data.

fitting_degree_summary

fitting_degree_summary gives the object for fitting-degree plot.

formatForecastingPhase

[frcphase] = formatForecastingPhase(frcphase,trnphase) is the function for converting the argument that the keyword option ‘ForecastingPhase’ takes.

genID

Generate trial identifier.

getRcs

SRc = getRcs(dir_molchan,targetpattern) returns unique Rc as a struct with field names being the station code (stn[…]). Example

targetpattern = sprintf('[%s]stn*.mat', get_tags(fmt.InfoFileName_molscore, '', 'once'));
SRc = getRcs(dir_molchan,targetpattern)

loadSheet

loadSheet(filepath,whatisthis) is deprecated; use checkcatatlog and checkstation instead.

mkdir_default

mkdir_default creates/makes folders by default and return their directories with default variable names. The default structure is:

                   variable           default folder name
    ╔═ dir_main ═╤═══════════════════════════════════════╗
    ║            ├─dir_stat         =   'StatisticIndex' ║
    ║            ├─dir_tsAIN        =   'tsAIN'          ║
    ║            ├─dir_molchan      =   'MolchanScore'   ║
    ║            └─dir_jointstation =   'JointStation'   ║
    ╚════════════════════════════════════════════════════╝

Example:

[dir_stat, dir_tsAIN, dir_molchan, dir_jointstation] =
  mkdir_default(fullfile(pwd,'output_var'))
[dir_stat, dir_jointstation] =
  mkdir_default(fullfile(pwd,'var-output'), 'Directories', {'StatisticIndex'; 'JointStation'})

modeldiscard

Filter models negatively (modeldiscard)

modelselect

Filter models positively (modelselect)

prpfunctions

prp_list = prpfunctions() returns the list of existing preprocessing functions that you can apply (those located in the same directory (namely, dir_prp = 'src/preprocess') as the function no). Noted that those under the sub-directories of dir_prp won’t be listed.

renamecols

renamecols(alias, O) rename the column name that matche either in alias of the table O to the newname.

tabledragandfill

A small tool that fill the empty cells of a column by the most recent non-empty value (just like ‘click, drag, and fill’ in the excel).

Example: CWBcatalog = tabledragandfill(CWBcatalog, colname1, colname2, ...)