Statistical detection of the influence of solar activities to weak earthquakes
Pacific Journal of Mathematics for Industry volume 6, Article number: 6 (2014)
Abstract
In the literature, it has been hypothesized that the solar wind released by the Sun acts on the Earth as a trigger for earthquakes. This hypothesis rests on the observation that the frequency of earthquakes rises during periods of solar minimum. In recent years, various physical measurements of the solar wind, such as velocity and temperature, have become available. Using these data, we investigate the relation between solar activities and earthquakes. For this purpose, we constructed generalized autoregressive models with exogenous variables based on a Poisson or a negative binomial distribution, in which the response variable is the frequency of earthquakes with Richter magnitudes 4–4.9 (EQ4–4.9), and the explanatory variables are nine physical measurements of the solar wind and of the magnetospheres of the interplanetary magnetic field and the Earth. Model selection was conducted by forward stepwise selection based on the Bayesian information criterion. The numerical results showed that the exogenous solar wind variables are statistically significant for the frequency of EQ4–4.9.
Introduction
Solar wind is a flow of magnetized plasma released from the upper atmosphere of the Sun. If the incoming solar wind conditions are stationary, the Earth's magnetosphere is in a quiescent state. When coronal mass ejections occur on the Sun, the quantity of plasma increases and solar wind shocks are generated. When these shocks reach the Earth, they trigger disturbances in the magnetosphere known as geomagnetic storms. In this way, the Sun sends energy disturbances to the Earth and affects the magnetosphere. Besides geomagnetic storms, some researchers have hypothesized that some earthquakes could also be triggered by the solar wind. This hypothesis is based on the observation that the frequency of earthquakes rises during periods of solar minimum [5,6,9]. This research investigates the relation between earthquakes and solar wind activities by constructing statistical models.
Because the Earth is a dynamical system with complex stochastic properties, statistical approaches, including point processes [7,10] and spectral analysis [4], are often employed to analyze earthquakes. However, these approaches focus mainly on the earthquake data themselves, and few exogenous factors have been considered. In contrast, we treat the Sun and the Earth as a coupled dynamical system, in which the variables describing the solar activities and the magnetospheres of the interplanetary magnetic field (IMF) and the Earth are the exogenous variables, and the frequency of earthquakes is the response variable. Under this assumption, autoregressive models with exogenous variables (ARX) may fit the data well. Note that the noise terms of ARX models are usually assumed to be independent and normally distributed with a common variance. Such models will not work well for the frequency of earthquakes, because the Sun–Earth coupled system has nonstationary statistical attributes and the frequency is a discrete variable.
To tackle this problem, we introduce a generalized autoregressive model with exogenous variables (GARX) by combining the generalized additive model for location, scale and shape (GAMLSS) [1] with ARX models. GARX relates explanatory variables constructed from past observations to the location and scale parameters of the response distribution. It is therefore freed from the normality assumption and is more flexible than ARX. Because the response variable, i.e. the frequency, is discrete, GARX models based on the Poisson and negative binomial distributions are investigated. The Bayesian information criterion (BIC) [11] is applied to select proper model structures.
The rest of the paper is organized as follows: Section ‘Data description’ describes seven years of daily data on the earthquakes, the solar activities and the magnetospheres. Section ‘GARX models for earthquakes’ introduces GARX and its model selection. In Section ‘Analysis results’, the data are analyzed by GARX, and it is shown that the solar wind has a statistically significant effect on earthquakes with Richter magnitudes 4–4.9 (EQ4–4.9). Finally, conclusions are stated in Section ‘Conclusions’.
Data description
In this section, the time series data about the earthquakes, the solar activities, and the magnetospheres (01/01/2006–12/31/2012) are introduced.
2.1 Daily frequencies of earthquakes
The daily earthquake data are downloadable from the ANSS database of the Northern California Earthquake Data Center [2], which provides accurate and timely data. Table 1 lists the frequencies of the earthquakes whose Richter magnitudes are at least 3 (M≥3). Because earthquakes with M≥8 rarely occurred, we combined them into one column, i.e. EQ8–9.9.
Figure 1 plots the time series of the earthquakes by magnitude class. The data contain the M=7.2 earthquake (04/05/2010) that occurred in Estado de Baja California, Mexico, and the M=9.0 Tohoku earthquake (03/11/2011) that occurred in northeast Japan. Because large earthquakes always cause aftershocks, the frequencies of the earthquakes themselves are also used as exogenous variables in GARX.
2.2 Daily solar activities and magnetospheres
As listed in Table 2, nine exogenous variables describing the solar activities and the magnetospheres were used in this research. The daily data for these variables are downloadable from the OMNIWeb database supported by NASA [8], which provides magnetic field, plasma, and energetic particle data relevant to the heliosphere. Table 3 and Figure 2 show the measurements and the time series plots of the nine variables, respectively.
To model the earthquakes, the frequency of EQ4–4.9 is taken as the response variable. Two types of GARX models are then constructed: the first takes only the frequencies of the earthquakes other than EQ4–4.9 as the exogenous variables; the second includes the additional variables describing the solar activities and the magnetospheres. By comparing these models, we investigate the relation between the earthquakes and the solar activities. In what follows, GARX for the earthquakes is introduced.
GARX models for earthquakes
3.1 GARX models
Let $y_{t}$ and $\left \{u^{(1)}_{t},u^{(2)}_{t},\ldots, u^{(p)}_{t}\right \}$ denote the response variable and the $p$-dimensional exogenous variables at time $t \le n$, respectively. Moreover, assume that the response variable $y_{t}$ follows a probability density function $f(y_{t} \mid \mu_{t},\sigma_{t})$ specified by $\{\mu_{t},\sigma_{t}\}$. Here, $\mu_{t}$ and $\sigma_{t}$ are the location and scale parameters, respectively. Then a GARX model is formulated as follows:
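A form of Eq. (1) that agrees with the definitions of $g_{i}$, $\beta_{i}$ and $x_{it}$ given in this section, and with the Gaussian special case stated afterwards, is sketched here. It is a reconstruction under those assumptions, so the exact layout of the displayed equation may differ:

$$
g_{1}(\mu_{t}) = \beta_{10} + \beta^{T}_{1}x_{1t}, \qquad g_{2}(\sigma_{t}) = \beta_{20} + \beta^{T}_{2}x_{2t}. \tag{1}
$$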
Here, $x_{it}$ is the explanatory variable vector given by
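Assuming that each variable enters through all of its lags from 1 up to its maximum lag (which matches the later statement that time lags 1–5 use the values at $t-1,\ldots,t-5$), a reconstruction of Eq. (2) is:

$$
x_{it} = \left(y_{t-1},\ldots,y_{t-l_{iy}},\; u^{(1)}_{t-1},\ldots,u^{(1)}_{t-l_{i1}},\;\ldots,\; u^{(p)}_{t-1},\ldots,u^{(p)}_{t-l_{ip}}\right)^{T}, \tag{2}
$$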
with $l_{iy},l_{i1},\ldots,l_{ip}$ being the maximum time lags of each variable, where $g_{i}$ is a link function, and $\beta_{i}$ is a coefficient vector for $i=1,2$.
If the conditional distributions of $y_{t}$ given $x_{1t}$ and $x_{2t}$ are independent normal with $\mu _{t} = \beta _{10} + \beta ^{T}_{1}x_{1 t}$ and $\log(\sigma_{t})=\beta_{20}$, the model reduces to the ordinary Gaussian ARX. GARX thus captures the dynamical features not only of the location parameter but also of the scale parameter of a probability distribution. Moreover, GARX is no longer restricted to the normal distribution, and it can handle nonstationary attributes of the time series.
Let $l=\max\{l_{1y},l_{11},\ldots,l_{1p},l_{2y},l_{21},\ldots,l_{2p}\}$ be the maximum time lag, $B_{t}$ the set constructed from the observations of the response and the exogenous variables up to time $t$, $f(y_{0},\ldots,y_{l-1})$ the initial distribution, which is not specified here, and $\Theta=\{\beta_{10},\beta_{1},\beta_{20},\beta_{2}\}$ the set of model parameters. Then, the likelihood can be expressed as follows:
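Under the usual conditional factorization for an autoregressive model, and assuming the time index runs from $0$ to $n$, a reconstruction of the likelihood in Eq. (3) is:

$$
L(\Theta) = f(y_{0},\ldots,y_{l-1}) \prod_{t=l}^{n} f\left(y_{t} \mid B_{t-1}; \Theta\right), \tag{3}
$$

where $f(y_{t} \mid B_{t-1};\Theta)$ stands for $f(y_{t} \mid \mu_{t},\sigma_{t})$ with $\mu_{t}$ and $\sigma_{t}$ computed from the past observations in $B_{t-1}$.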
Consequently, the parameter set $\Theta$ can be estimated by the maximum likelihood method, i.e. $\hat {\Theta } = \arg \max L(\Theta)$.
3.2 GARX based on Poisson and negative binomial distributions
Note that the frequency of the earthquakes takes nonnegative integer values. For this reason, we first assume that $y_{t}$ follows a Poisson (PO) regression model whose mean $\mu_{t}$ is specified by the vector $x_{1t}$. The PO distribution is specified by the mean parameter only, and the mean of the PO regression is expressed by
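Assuming the log link that is conventional for Poisson regression (the link function is not named explicitly in the text), a reconstruction of Eq. (4) is:

$$
\log(\mu_{t}) = \beta_{10} + \beta^{T}_{1}x_{1t}. \tag{4}
$$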
We call this an auto PO regression model.
Figure 1 shows the daily EQ frequencies from Jan. 1, 2006 to Dec. 31, 2012. Several irregular peaks caused by giant earthquakes can be seen for EQ4–4.9. Indeed, the sample variance of EQ4–4.9, 409.05, is much larger than its sample mean, 30.36.
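As a rough illustration of how strong this overdispersion is, the short sketch below computes the variance-to-mean ratio from the two quoted sample moments. It also derives a moment-based value of the NB sigma parameter under the assumption that the NB variance is $\mu + \sigma\mu^{2}$ (the text does not state which NB parameterization was used), and it ignores the regression structure, so it is not an estimate from the fitted models.

```python
# Sample moments of EQ4-4.9 quoted in the text.
sample_mean = 30.36
sample_var = 409.05

# A Poisson variable has variance equal to its mean, so this ratio would be near 1.
dispersion_index = sample_var / sample_mean

# Assumed NB variance function: var = mu + sigma * mu**2 (an assumption, see lead-in).
# Solving for sigma gives a crude method-of-moments value.
sigma_moment = (sample_var - sample_mean) / sample_mean ** 2

print(f"variance / mean = {dispersion_index:.2f}")
print(f"moment-based sigma = {sigma_moment:.3f}")
```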
In general, the variance of the PO distribution is exactly the same as its mean, whereas the variance of the negative binomial (NB) distribution is always greater than its mean. Therefore, we expect the NB model to be superior to the PO regression for EQ4–4.9. Hence, $y_{t}$ is also fitted by the NB distribution with mean $\mu_{t}$ and sigma parameter $\sigma_{t}$. The corresponding GARX model can be written as follows:
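Assuming log links for both parameters (an assumption, chosen because both $\mu_{t}$ and $\sigma_{t}$ must be positive), a reconstruction of Eqs. (5) and (6) is:

$$
\log(\mu_{t}) = \beta_{10} + \beta^{T}_{1}x_{1t}, \tag{5}
$$

$$
\log(\sigma_{t}) = \beta_{20} + \beta^{T}_{2}x_{2t}. \tag{6}
$$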
For an application of PO regressions, readers are referred to [3].
3.3 Model selection and evaluation
For the auto PO regressions and the NB distribution based GARX models, the appropriate variables and time lags included in $x_{1t}$ and $x_{2t}$ must be selected. In this research, BIC is used for model selection. Because the GARX models for the frequency of the earthquakes have 14 exogenous variables, it is difficult to find the optimal model structure by exhaustive search. We therefore use a forward stepwise selection method based on BIC. For the NB distribution based GARX models, the forward stepwise method is first used to select proper time lags for the mean in Eq. (5). Then, with the mean structure fixed, the forward stepwise method is applied again to determine the time lags for the sigma parameter in Eq. (6).
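The greedy search described above can be sketched as follows. This is a minimal illustration, not the procedure actually run with the GAMLSS package: the callback `fit_loglik`, the term names and the log-likelihood gains are all hypothetical, and a real run would refit the PO or NB model for every candidate term.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Schwarz criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def forward_stepwise(candidates, fit_loglik, n_obs):
    """Greedily add the term that lowers BIC most; stop when none does.

    fit_loglik(terms) must return the maximized log-likelihood of the model
    with an intercept plus the listed terms (hypothetical callback).
    """
    selected = []
    best = bic(fit_loglik(selected), 1, n_obs)  # start from intercept only
    remaining = list(candidates)
    while remaining:
        trials = [
            (bic(fit_loglik(selected + [term]), len(selected) + 2, n_obs), term)
            for term in remaining
        ]
        trial_bic, term = min(trials)
        if trial_bic >= best:  # no remaining candidate improves BIC
            break
        selected.append(term)
        remaining.remove(term)
        best = trial_bic
    return selected, best

# Toy stand-in for a model fit: each term adds a fixed, made-up gain
# to the log-likelihood. These names and numbers are illustrative only.
TOY_GAINS = {"eq_lag1": 40.0, "sw_lag2": 6.0, "kp_lag1": 1.0}

def toy_loglik(terms):
    return -1000.0 + sum(TOY_GAINS[t] for t in terms)

selected, best_bic = forward_stepwise(list(TOY_GAINS), toy_loglik, n_obs=2543)
print(selected, round(best_bic, 2))
```

In the toy run, a term is kept only if it raises twice the log-likelihood by more than $\log n$, which is the penalty BIC charges for one extra parameter.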
To measure the fitting performance of statistical models, the coefficient of determination:
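Assuming the standard definition of the coefficient of determination, with the sums taken over the time points used in fitting, a reconstruction of Eq. (7) is:

$$
R^{2} = 1 - \frac{\sum_{t}\left(y_{t}-\hat{y}_{t}\right)^{2}}{\sum_{t}\left(y_{t}-\bar{y}\right)^{2}} \tag{7}
$$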
is applied. Here, $\hat {y}_{t}$ is the predicted value of $y_{t}$ obtained by the model, and $\bar {y}$ is the sample mean of $y_{t}$.
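For concreteness, the coefficient of determination can be computed from observed and predicted counts as in the sketch below. It assumes the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares, and the function name `r_squared` is ours.

```python
def r_squared(y, y_hat):
    """Coefficient of determination for observations y and predictions y_hat."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot

# Predicting the sample mean everywhere gives 0; a perfect fit gives 1.
print(r_squared([1, 2, 3], [2, 2, 2]), r_squared([1, 2, 3], [1, 2, 3]))
```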
Analysis results
An earthquake is a motion that releases strain accumulated in the underground bedrock through various causes. Solar activities (SA’s) may therefore trigger only weak earthquakes. We first applied auto PO regression models to EQ3–3.9, and the parameters were successfully estimated, whereas the parameters of the NB based GARX did not converge. This may be because the mean structure changed around 2009 (recall the time series of EQ3–3.9 in Figure 1). Furthermore, these models performed very poorly for large earthquakes, i.e. EQ5–5.9 and above. In this section, we therefore show only the estimation results with EQ4–4.9 as the response variable.
On the basis of the GARX models introduced in the previous section, we constructed one-step-ahead models for the frequency of EQ4–4.9. We set the maximum time lag for the model search to 14 days, and proper time lags equal to or smaller than 14 were selected, i.e. the observations of the past two weeks were considered to predict the frequency of EQ4–4.9 on the next day. The computations were conducted with the R package GAMLSS [1].
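How the lagged explanatory vectors are assembled can be sketched as below. The helper `lagged_rows` and its argument names are hypothetical and only illustrate the bookkeeping. The first rows are dropped because their lags are not available, which is why seven years of daily data (2557 days) leave 2557 − 14 usable observations when the maximum lag is 14.

```python
def lagged_rows(y, exog, y_lags, exog_lags):
    """Pair each response y[t] with its lagged explanatory vector.

    y         : list of daily counts
    exog      : dict mapping a variable name to its daily series
    y_lags    : largest lag of y to include (lags 1..y_lags are used)
    exog_lags : dict mapping a variable name to its largest lag
    """
    max_lag = max([y_lags] + list(exog_lags.values()))
    rows = []
    for t in range(max_lag, len(y)):
        x = [y[t - k] for k in range(1, y_lags + 1)]
        for name, lag in exog_lags.items():
            x += [exog[name][t - k] for k in range(1, lag + 1)]
        rows.append((x, y[t]))
    return rows

# Tiny example: two lags of y and one lag of a made-up exogenous series u.
rows = lagged_rows([1, 2, 3, 4, 5], {"u": [10, 20, 30, 40, 50]}, 2, {"u": 1})
print(rows)
```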
Models 1-1, 2-1 and 3-1 in Table 4 are the optimal models when EQ’s are used as the exogenous variables for the prediction of EQ4–4.9. PO($\mu_{t}$) denotes the auto PO regression. NB($\mu_{t},\sigma$) and NB($\mu_{t},\sigma_{t}$) denote the GARX models based on NB distributions with common and time-varying sigma parameters, respectively. The sigma parameter of Model 3-1 is estimated with its mean fixed to the mean of Model 2-1. The time lags of each variable are listed in Table 4. For example, time lags 1–5 mean that the variables at times $t-1, t-2, \ldots, t-5$ are used in the GARX models.
Models 1-2, 2-2 and 3-2 in Table 4 examine the additional effects of SA’s. Model 1-2 is derived by adding the optimal SA’s to the mean of Model 1-1. Similarly, the additional effects on Models 2-1 and 3-1 are evaluated by Models 2-2 and 3-2, respectively.
Table 4 indicates that: (a) PO($\mu_{t}$) has the highest $R^{2}$, but its BIC is larger than that of the other two models; (b) the structures of NB($\mu_{t},\sigma$) and NB($\mu_{t},\sigma_{t}$) are much simpler than that of PO($\mu_{t}$), and NB($\mu_{t},\sigma_{t}$) has the minimum BIC. These observations hold whether the exogenous variables are {EQ’s} or {EQ’s, SA’s}.
Here, we examine the effect of SA’s on EQ4–4.9. The log-likelihood ratio statistic for testing the additional effect from Model 1-1 to Model 1-2 is given by
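The symbolic form of this statistic can be reconstructed from the definition of BIC, $-2\log L(M) + k_{M}\log n$, for two nested models that differ by 42 parameters. The model labels $M_{1\text{-}1}$ and $M_{1\text{-}2}$ are our notation, and the numerical value of the statistic is not reproduced here:

$$
-2\log\frac{L(M_{1\text{-}1})}{L(M_{1\text{-}2})} = \mathrm{BIC}(M_{1\text{-}1}) - \mathrm{BIC}(M_{1\text{-}2}) + 42\log n,
$$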
where $L(M)$ denotes the maximum likelihood of model $M$, $n=2557-14$ is the sample size, and 42 is the number of additional SA variables in Model 1-2. Under the null hypothesis that SA’s have no effect on EQ4–4.9, the log-likelihood ratio statistic asymptotically follows a chi-square distribution with 42 degrees of freedom, because the two models are nested. The SA’s are highly significant. Similar comparisons of Models 2-1 vs 2-2 and Models 3-1 vs 3-2 show that the SA’s have an extremely significant effect on EQ4–4.9.
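The test can be carried out from two BIC values as sketched below. The BIC values are hypothetical placeholders, not the entries of Table 4, and the function names are ours. The p-value uses the closed form of the chi-square upper tail that holds for an even number of degrees of freedom, $e^{-x/2}\sum_{j<m}(x/2)^{j}/j!$ with $2m$ degrees of freedom, so no external library is needed.

```python
import math

def lr_from_bic(bic_small, bic_large, extra_params, n_obs):
    """Likelihood ratio statistic of two nested models recovered from their BICs."""
    return bic_small - bic_large + extra_params * math.log(n_obs)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total

n_obs = 2557 - 14
# Hypothetical BIC values for the smaller and the larger model.
stat = lr_from_bic(21000.0, 20900.0, extra_params=42, n_obs=n_obs)
p_value = chi2_sf_even_df(stat, df=42)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

Note that a larger model can have a lower BIC only if the likelihood ratio statistic exceeds $42\log n$, which is already far in the upper tail of a chi-square distribution with 42 degrees of freedom.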
Table 5 shows the optimal GARX models including the additional exogenous variables describing the solar activities and the magnetospheres. The results show that: (a) PO($\mu_{t}$) still has the highest $R^{2}$, and its BIC is improved compared with Model 1-2 in Table 4; (b) $R^{2}$ and BIC in Table 5 are all improved compared with the corresponding models in Table 4; (c) NB($\mu_{t},\sigma_{t}$) in this table has the minimum BIC among the models in both tables.
Comparing the coefficients of determination, we observe that the PO regressions fit high-frequency data better than the NB based regressions do. Comparing the BIC values, however, we conclude that the NB based regressions are superior to the PO regressions. We also see that the exogenous variables describing the solar activities and the magnetospheres improved each GARX model. The improvement in $R^{2}$ is not large, but these variables are still statistically significant for EQ4–4.9.
Conclusions
In this research, we investigated the relation between the solar activities and the earthquakes. We constructed the GARX models for the earthquakes 4≤M≤4.9, on the basis of the Poisson and the negative binomial distributions. The GARX models in the previous section show that:

1. The PO regressions always tried to fit the large values of the frequency in the data and consequently selected complex models, although they had relatively high coefficients of determination.

2. The negative binomial distribution based GARX models were simpler than the auto PO regression, and they also had smaller BIC values.

3. Comparing Tables 4 and 5, the GARX models with the exogenous variables describing the solar activities and the magnetospheres improved both the coefficient of determination and BIC. That is, these variables are statistically significant for EQ4–4.9.
We also tried to construct models for the earthquakes with M≥5; however, we found no evidence that the variables describing the solar activities and the magnetospheres improve the GARX models for these earthquakes.
The GARX models for the earthquakes are clearly far from being usable for prediction, especially for large earthquakes with extremely complex nonlinear dynamics. In addition, large earthquakes can cause a high frequency of aftershocks within a short period. For example, Figure 1 shows the cluster of aftershocks caused by the Tohoku earthquake in northeast Japan. As a result, the frequency of weak earthquakes cannot follow a single probability distribution such as the negative binomial distribution. For this reason, mixture distributions will be considered for GARX in the future.
In the past 20 years, a large amount of new geophysical and space data has become available, owing to developments in sensing and measurement technologies. Although earthquakes are not predictable for now, we can try to reveal the relations among the earthquakes, the Earth's environment and the solar activities statistically, on the basis of various models and data.
References
1. Ahmed, Z., Rigby, R.A., Stasinopoulos, D.M.: Generalized additive models for location, scale and shape (with discussion). Appl. Stat. 54, 507–554 (2005).
2. ANSS: http://earthquake.usgs.gov/monitoring/anss/. Accessed 15 February 2014.
3. Braga, A., Bond, B.: Policing crime and disorder hot spots: a randomized controlled trial. Criminology 46, 577–607 (2008).
4. Cuomo, V., Lapenna, V., Macchiato, M., Serio, C.: Autoregressive models as a tool to discriminate chaos from randomness in geoelectrical time series: an application to earthquake prediction. Ann. Geophys. 40, 385–400 (1997).
5. Huzaimy, J.M., Yumoto, K.: Possible correlation between solar activity and global seismicity. IEEE Int. Conf. Space Sci. Commun. (2011).
6. Odintsov, S.D., Ivanov-Kholodnyi, G.S., Georgieva, K.: Solar activity and global seismicity of the earth. Bull. Russ. Acad. Sci. Phys. 71, 593–595 (2007).
7. Ogata, Y., Zhuang, J.: Space-time ETAS models and an improved extension. Tectonophysics 413, 13–23 (2006).
8. OMNIWeb: http://omniweb.gsfc.nasa.gov. Accessed 15 February 2014.
9. Palumbo, A.: Gravitational and geomagnetic tidal source of earthquake triggering. Il Nuovo Cimento C 12, 685–693 (1989).
10. Schoenberg, F.P.: Multidimensional residual analysis of point process models for earthquake occurrences. J. Am. Statist. Ass. 98, 789–795 (2004).
11. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).
Acknowledgements
The authors would like to thank the reviewer for his/her valuable comments and advice to improve the paper. In addition, the authors would like to thank Professors K. Yumoto and T. Hada of Kyushu University, who called our attention to this issue and gave us helpful comments. The research was supported by the “Fundamental Research Funds for the Central Universities” of China, the joint research fund of ICSWSE, Kyushu University, and Grant-in-Aid for Scientific Research (B) #23300106.
Keywords
 Earthquake
 Solar wind
 Generalized time series model
 Model selection
 Solar activity