Tutorial

Requirements

Environment

GEMS-MagTIP relies on the following toolboxes; type license('inuse') in matlab command line or ver to list the toolboxes currently available on your machine.

Toolbox in-use:

  • MATLAB Version 9.11 (R2021b)
  • Mapping Toolbox Version 5.2 (R2021b)
  • Parallel Computing Toolbox Version 7.5 (R2021b)
  • Signal Processing Toolbox Version 8.7 (R2021b)
  • Statistics and Machine Learning Toolbox Version 12.2 (R2021b)

Dependencies

GEMS-MagTIP depends on okatsn/toolbox and CGRG-lab/GEMS-MagTIP-insider.

You have to add these dependencies to path every time before running your scripts, as follows

CGRG-lab/GEMS-MagTIP-insider

Copy CGRG-lab/GEMS-MagTIP-insider to your local drive (e.g., GEMS-MagTIP-insider) and add the source code inside to path:

dir_src = 'GEMS-MagTIP-insider/src';
addpath(genpath(dir_src));

okatsn/toolbox

Copy okatsn/toolbox to the main directory (e.g., GEMS-MagTIP-insider/toolbox) and add the source code inside to path:

dir_toolbox = 'GEMS-MagTIP-insider/toolbox';
addpath(genpath(dir_toolbox));

Getting Started

The main functions of GEMS-MagTIP take directories that contains intermediate data (those .mat files) as input arguments, and output data in the assigned directory.

Here is the minimal example of the chain of main functions:

Prepare your data

You have to prepare/update the following data:

  • earthquake catalog
  • geo-electric data of standard format
  • geo-magnetic data of standard format

Earthquake catalog and station information

Please download or update catalog of events \(M_L \geq 5\) from GDMSN, and move it to spreadsheet/catalog.csv. The spreadsheet/station_location.csv specifies the location of every station, which is included in the copy of GEMS-MagTIP-insider.

Keypoints
  • The headers (column names) for catalog.csv should be ‘time’, ‘Lon’, ‘Lat’, ‘Depth’, and ‘Mag’.
  • The headers for station_location.csv should be being ‘code’, ‘format’, ‘Lon’, ‘Lat’ for station_location.csv.
  • The order of column names can be arbitrarily arranged, but the strings have to be exactly the same as above.
  • To update earthquake catalog, just put the updated table in csv to spreadsheet/catalog.csv with overwriting option and run checkcatalog(dir_catalog); if both catalog.mat and catalog.csv are in the directory dir_catalog, catalog.mat will be overwritten by the new one converted from catalog.csv.
  • To update spreadsheet/station_location.csv, the procedure/workflow is the same as the previous point.
Supported catalog format
GDMSN form
date time lat lon depth ML nstn dmin gap trms ERH ERZ fixed nph quality
Free form
time Lon Lat Depth Mag
2020/8/10 06:41 121.59 23.81 29.86 3.41
2020/8/10 06:29 120.57 22.18 43.54 3.02
2020/8/10 06:14 121.7 22.17 124.78 4.13

here is a partial view on an exemplary station_location.csv:

code0 code format type Lat Lon StartTime EndTime
missing ZB 知本 GM 22.7398 121.065 20201006 missing
missing XC 新城 GM 24.0383 121.609 20201006 missing
missing SM 日月潭 GM 23.881 120.908 20191008 missing
missing CN 暨南 GM 23.9576 120.928 20221229 missing
em3 KUOL 過嶺 GE 24.9629 121.142 2011-09-22 9999-12-31
em4 HUAL 華陵 GE 24.6745 121.368 2012-01-17 9999-12-31
em5 TOCH 頭城 GE 24.8435 121.805 2012-02-10 9999-12-31
em6 ENAN 南澳 GE 24.4758 121.785 2012-02-15 9999-12-31

Data of the standard format

Converting raw data into a standard format is crucial. The conversion involves verification of date-time, conduct projection for geo-electric data to NS-EW, format the file name for indexing, and etc..

Here is the minimal script for converting raw data into the standard format:

dir_gems_raw = 'data-raw/GEMSdat'; % raw GEMS data
dir_mag_raw = 'data-raw/MAG'; % raw MAG data
dir_data = 'data-standard'; % output directory
conv_gemsdata(dir_gems_raw, dir_data, dir_catalog); 
conv_geomagdata(dir_mag_raw, dir_data); 
Tip

Feel free to organize files and folders under dir_data, or convert the same raw data twice, because in GEMS-MagTIP data loading relies on the name of the standard format and the conversion functions will handle duplicated files.

See standarddataname and write_data

For more information please refer the section Load raw data and convert it into a standard format.

Setting Up Directory Paths

Assigning directories for input/output data or variables is necessary before running any of the main function.

For example:

% For windows, use backslash `\`; for unix systems, use slash `/` in the path to directories.
dir_catalog = 'GEMS-MagTIP-insider/spreadsheet'; 
        % directory of event catalog & station location
dir_data = 'standard-data'; 
        % directory of geomagnetic timeseries of "standard format"
dir_stat = 'var-output/StatisticIndex'; 
        % directory of statistic indices
dir_tsAIN = 'var-output/tsAIN'; 
        % directory for storing anomaly index number (AIN)
dir_molchan = 'var-output/Molchan'; 
        % directory for storing Molchan scores
dir_jointstation = 'var-output/JointStation'; 
        % directory for the time series of EQK, TIP and probability
Tip
  • You can use mkdir_default to automatically generate empty directories for main functions.
  • You can use dirselectassign to assign directories via file explorer interface.
Important
  • dir_catalog must contain catalog.csv or catalog.mat, and station_location.csv or station_location.mat.
  • dir_data is the directory that contains GE or GM data of the standard format; see conv_geomagdata and conv_gemsdata for the conversion of raw data to the standard format.

Executing Main Functions in the Correct Order

statind(dir_data,dir_stat); 

anomalyind(dir_stat,dir_tsAIN); 

molscore(dir_tsAIN,dir_catalog,dir_molchan); 

molscore3(dir_tsAIN,dir_molchan,dir_catalog,dir_jointstation); 
Tip

statind, molscore, and molscore3 have alternatives which run the same calculation in parallel. In most cases you can simply append _parfor (e.g., molscore3_parfor(...)) in the function name with no need to modify input arguments to do run in parallel. See statind_parfor, molscore_parfor, and molscore3_parfor.

Comprehensive Sample Script

Here is a sample script “demo/demo_script.m” for the whole process.

Tip

You can run startup0.m and follows the instruction in the command window to add dependencies and assign input/output directories as described above.

%% Convert Raw data to standard type
% Load original data and save them as matfiles of the standard format

conv_gemsdata(dir_gems_raw, dir_data, dir_catalog,'FilterByDatetime',datetime(2020,1,1)); % Convert raw GE data of timestamps after 2020-1-1 to standard format. 
% Assign 'FilterByDatetime' to save time when standard geomagnetic data out of the specified date-time range already exist. 
% Discard 'FilterByDatetime' to convert everything in the raw-data directory.
conv_geomagdata(dir_mag_raw, dir_data,'FilterByDatetime',datetime(2020,1,1)); % The same as above but for the conversion of GM data.

%% Calculate Statistic Index
statind_parfor(dir_data,dir_stat, ... % Load data in dir_data, save index in dir_stat
    'Preprocess',{'ULF_A','ULF_B','BP_40','BP_35'}, ... % with 4 kinds of filtering
    'SavePreprocessedData',false, ... % without saving filtered timeseries
    'StatName', {'S', 'K', 'FI', 'SE'},  ... % the variable name for 'StatFunction'.
    'StatFunction', {@skewness, @kurtosis, @fisherinformation, @shannonentropy}, ...
    'FilterByDatetime',[datetime(2011,1,1), datetime(2022,12,31)]); 
% Assign 'FilterByDatetime' to calculate statistical indices only between 2011-1-1 and 2022-12-31. 
% Noted that standard format GE/GM data have to be available in this range, otherwise you will get NaN if data is missing (or not converted) in `dir_data`.


%% Data overview
% An overview of data avaliability/deficiency according to the results in dir_stat 
plot_dataoverview(dir_stat, dir_catalog);

%% Calculate Anomaly Indices
anomalyind(dir_stat,dir_tsAIN);

%% Training
molscore_parfor(dir_tsAIN,dir_catalog,dir_molchan,... 
    'TrainingPhase', {calyears(3),datetime(2022,4,1);... % Use up to 3-years data before 2022-4-1 for model training.
                      calyears(5),datetime(2022,4,1);... % Use up to 5-years data before 2022-4-1 for model training.
                      calyears(7),datetime(2022,4,1);... % Use up to 7-years data before 2022-4-1 for model training.
                      calyears(9),datetime(2022,4,1)},...% Use up to 9-years data before 2022-4-1 for model training.
    'modparam',{'Test', 5000}); % Remember to disable 'Test' in a real run.

%% Forecast and test
molscore3_parfor(dir_tsAIN,dir_molchan,dir_catalog,dir_jointstation,...
    'OverwriteFile',true, ...
    'ForecastingPhase', repmat([datetime(2022,4,2), datetime(2022,9,27)], 4,1));
    % Manually assign forecasting phases. Typically the size of 'ForecastingPhase' should align with the size of 'TrainingPhase'.

Visualization of the results

Plot EQK-TIP Matching diagram

EQK-TIP of each station:

%% Plot EQK-TIP of each station

% the output directory for the images
dir_png = fullfile(dir_molchan,'png','EQKTIP'); % no need to mkdir

% Plot TIP of individual stations as heatmap, 
% with target earthquakes (EQK) scattered on the top.
plotEQKTIP1(dir_tsAIN,dir_molchan,dir_catalog,dir_png); %,...
    'ForecastingPhase',calyears(1),'ShowTrainingPhase',1,'Rank',1,...
    'ForceMagnitude',false, ...
    'scatter',1); 

% remove the white space around the image.
cropimg(dir_png,'SaveInplace',true);

An example of EQK-TIP plot.

In this figure, the match diagram of EQK and TIP defined by the model of first rank is demonstrated. The intervals of black color denotes the days where there is no data in \(T_\text{obs}\) at all (i.e. this model is invalid at these time) and hence TIP cannot be calculated.

Plot 2D EQK-TIP Matching diagram

EQK-TIP in a 2D temporal-spatial coordinate system:

%% EQK-TIP in a 2D temporal-spatial coordinate system

dir_png = fullfile(dir_jointstation,'png', 'EQKTIP3'); mkdir(dir_png);

% Find the data of ID 'AMn6ei' and filter tag 'ULF_A'

jid = 'AMn6ei';
filter_tag = 'ULF_A';

jpathlist = datalist(sprintf('[JointStation]ID[%s]prp[%s]*.mat', jid, filter_tag), dir_jointstation).fullpath;

% Retrieve essential data from '[JointStation]' files in the jpathlist:

[AlarmedRate, MissingRate,xlabels, EQKs, TIP3s, TIPv3s, TIPTimes, LatLons] = ...
    calcFittingDegree(jpathlist);

% Plot EQK-TIP on a 2D temporal-spatial coordinates

titletag = sprintf('EQK-TIP (trial ID: %s; filter: %s)', jid, filter_tag)
plotEQKTIP3(dir_png, titletag, xlabels, EQKs, TIP3s, TIPv3s,TIPTimes, LatLons);

EQK-TIP plot in temporal-spatial coordinates explained.

Plot fitting degree

%% Plot fitting degree

dir_png = fullfile(dir_jointstation,'fitting_degree'); mkdir(dir_png);

plotFittingDegree(dir_jointstation,dir_catalog,dir_png,...
    'ConfidenceLevel',0.95);

An example of fitting degree plot.

Plot probability map

%% Plot probability map

% the output directory for the images
dir_prob = fullfile(dir_jointstation,'png','prob'); 

% specific datetime to be plotted
dates2plot = [datetime(2020,12,1):caldays(30):datetime(2021,2,11)]';

% plot GEMS-MagTIP probability map 
% (export individual image for each date)
dir_prob2 = plotProbability(dir_jointstation,dir_catalog,dir_prob, ...
            'TimeRange',dates2plot,...
            'PlotEpicenter','all'); 
);

An example of probability plot. In this figure, the triangle denotes the location of station; circle(s) around each station denote the maximum and minimum range of detection (\(R_C\)) of the models that are responsible for calculating the TIPs; the filled color of the triangle denotes the ratio of valid models of the day; and hollow triangle denotes the station that cannot provide TIP for the day.