
Statistical detection of the influence of solar activities to weak earthquakes


In the literature, it has been hypothesized that the solar wind released by the Sun acts on the Earth as a trigger for earthquakes. This hypothesis is based on the observation that the frequency of earthquakes rises during periods of solar minimum. In recent years, various physical measurements of the solar wind, such as velocity and temperature, have become available. With these data, we investigate the relation between solar activities and earthquakes. For this purpose, we constructed generalized auto-regressive models with exogenous variables in which the response obeys a Poisson or a negative binomial distribution. The response variable is the frequency of earthquakes with Richter magnitudes 4-4.9 (EQ4-4.9), and the explanatory variables are nine physical measurements of the solar wind and of the magnetospheres of the interplanetary magnetic field and the Earth. Model selection was conducted by forward stepwise selection based on the Bayesian information criterion. The numerical results showed that the exogenous solar wind variables are statistically significant for the frequency of EQ4-4.9.


Solar wind is a flow of magnetized plasma released from the upper atmosphere of the Sun. If the incoming solar wind conditions are stationary, the Earth’s magnetosphere is in a quiescent state. When coronal mass flares occur on the Sun, the quantity of plasma increases and solar wind shocks are generated. When these shocks reach the Earth, they trigger disturbances in the magnetosphere known as geomagnetic storms. In this way, the Sun sends energy disturbances to the Earth and affects the magnetosphere. Besides geomagnetic storms, some researchers have hypothesized that some earthquakes could also be triggered by the solar wind. This hypothesis is based on observations that the frequency of earthquakes rises during periods of solar minimum [5,6,9]. This research investigates the relation between earthquakes and solar wind activities by constructing statistical models.

Because the Earth is a dynamical system with complex stochastic properties, statistical approaches, including point processes [7,10] and spectral analysis [4], are often employed to analyze earthquakes. However, these approaches mainly focus on the earthquake data themselves, and few exogenous factors are considered. In contrast, we couple the Sun and the Earth as a dynamical system, in which the variables describing the solar activities and the magnetospheres of the interplanetary magnetic field (IMF) and the Earth are the exogenous variables, and the frequency of earthquakes is the response variable. Under this assumption, auto-regressive models with exogenous variables (ARX) may fit the data well. Note that the noise terms of ARX models are conventionally assumed to follow independent normal distributions with a common variance. However, such models will not work well for modeling the frequency of earthquakes, because the Sun-Earth coupled system has non-stationary statistical attributes and the frequency is discrete.

To tackle this problem, we introduce a generalized auto-regressive model with exogenous variables (GARX) by combining the generalized additive model for location, scale and shape (GAMLSS) [1] with ARX models. GARX relates explanatory variables constructed from past observations to the location and scale parameters. GARX is therefore freed from the normality assumption and is more flexible than ARX. Because the response variable, i.e. the frequency, is discrete, GARX models based on the Poisson and negative binomial distributions are investigated. The Bayesian information criterion (BIC) [11] is applied to select proper model structures.

The rest of the paper is organized as follows. Section ‘Data description’ describes seven years of daily data on the earthquakes, the solar activities and the magnetospheres. Section ‘GARX models for earthquakes’ introduces GARX and its model selection. In Section ‘Analysis results’, the data are analyzed by GARX, and it is shown that the solar wind statistically affects earthquakes with Richter magnitudes 4-4.9 (EQ4-4.9). Finally, conclusions are stated in Section ‘Conclusions’.

Data description

In this section, the time series data about the earthquakes, the solar activities, and the magnetospheres (01/01/2006–12/31/2012) are introduced.

2.1 Daily frequencies of earthquakes

The daily earthquake data are downloadable from the ANNS database of the Northern California Earthquake Data Center [2], which provides accurate and timely data. Table 1 lists the frequencies of earthquakes with Richter magnitudes of 3 or larger (M≥3). Because earthquakes with M≥8 rarely occurred, we combined them into one column, EQ8-9.9.

Table 1 Frequency of earthquakes (n = 2,557)

Figure 1 plots the time series of the earthquakes by magnitude. The data include the M=7.2 earthquake (04/05/2010) that occurred in Estado de Baja California, Mexico, and the M=9.0 Touhoku earthquake (03/11/2011) that occurred in north-east Japan. Because large earthquakes always cause aftershocks, the frequencies of the earthquakes themselves are also taken as exogenous variables in GARX.

Figure 1 Time series of frequency of earthquakes (01/01/2006–12/31/2012)

2.2 Daily solar activities and magnetospheres

As listed in Table 2, nine exogenous variables describing the solar activities and the magnetospheres were used in this research. The daily data for these variables are downloadable from the OMNIWeb database supported by NASA [8], which provides magnetic field, plasma, and energetic particle data relevant to the heliosphere. Table 3 and Figure 2 show the measurements and the time series plots of the nine variables, respectively.

Figure 2 Time series of solar activities and magnetospheres (01/01/2006–12/31/2012)

Table 2 Exogenous variables and abbreviations
Table 3 Measurements of solar activities and magnetospheres

To model the earthquakes, the frequency of EQ4-4.9 is taken as the response variable. Two types of GARX models are then constructed: the first takes only the frequencies of the earthquakes other than EQ4-4.9 as exogenous variables; the second additionally includes the variables describing the solar activities and the magnetospheres. By comparing these models, we investigate the relation between the earthquakes and the solar activities. In what follows, GARX for the earthquakes is introduced.

GARX models for earthquakes

3.1 GARX models

Let \(y_{t}\) and \(\left \{u^{(1)}_{t},u^{(2)}_{t},\ldots, u^{(p)}_{t}\right \}\) denote the response variable and the \(p\)-dimensional exogenous variables at time \(t\), respectively. Moreover, assume that the response variable \(y_{t}\) follows a probability density function \(f(y_{t} \mid \mu_{t},\sigma_{t})\) specified by \(\{\mu_{t},\sigma_{t}\}\). Here, \(\mu_{t}\) and \(\sigma_{t}\) are the location and scale parameters, respectively. Then a GARX model is formulated as follows:

$$\begin{array}{@{}rcl@{}} g_{1}(\mu_{t}) &=& \beta_{10} + \beta^{T}_{1}x_{1 t} \end{array} $$
$$\begin{array}{@{}rcl@{}} g_{2}(\sigma_{t}) &=& \beta_{20} + \beta^{T}_{2}x_{2 t}. \end{array} $$

Here, \(x_{it}\) is the explanatory variable vector given by

$$x_{it} = \left(y_{t-1},\ldots, y_{t-l_{iy}},u_{t-1}^{(1)},\ldots, u_{t-l_{i1}}^{(1)},\ldots,u_{t-1}^{(p)},\ldots, u_{t-l_{ip}}^{(p)}\right)^{T} $$

with \(l_{iy},l_{i1},\ldots,l_{ip}\) being the maximum time lags of each variable; \(g_{i}\) is a link function, and \(\beta_{i}\) is a coefficient vector for \(i=1,2\).
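As an illustration of how such an explanatory vector can be assembled from past observations, the following is a minimal Python sketch. The function name `lagged_vector`, the variable name `speed`, and the toy numbers are hypothetical and are not taken from the paper's data or software.

```python
from typing import Dict, List, Sequence


def lagged_vector(y: Sequence[float],
                  u: Dict[str, Sequence[float]],
                  t: int,
                  lag_y: int,
                  lag_u: Dict[str, int]) -> List[float]:
    """Build (y[t-1..t-lag_y], u1[t-1..t-lag_1], ...) for one time index t."""
    max_lag = max([lag_y] + list(lag_u.values()))
    if t < max_lag:
        raise ValueError("t must be at least the maximum lag")
    # Lagged values of the response itself.
    x = [y[t - k] for k in range(1, lag_y + 1)]
    # Lagged values of each exogenous series, in the order given by lag_u.
    for name, lag in lag_u.items():
        x.extend(u[name][t - k] for k in range(1, lag + 1))
    return x


# Toy series (hypothetical numbers).
y = [3, 5, 4, 6, 2]
u = {"speed": [400, 420, 410, 450, 430]}
x4 = lagged_vector(y, u, t=4, lag_y=2, lag_u={"speed": 1})
print(x4)  # [6, 4, 450]
```

Each variable may carry its own maximum lag, which is why the lags are passed per series rather than as a single number.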

If the conditional distributions of \(y_{t}\) given \(x_{1t}\) and \(x_{2t}\) are independent normal with \(\mu _{t} = \beta _{10} + \beta ^{T}_{1}x_{1 t}\) and \(\log(\sigma_{t}) = \beta_{20}\), the model is the ordinary Gaussian ARX. GARX therefore captures the dynamical features not only of the location but also of the scale parameter of a probability distribution. Note that GARX is no longer limited to the normal distribution assumption, and it can handle the non-stationary attributes of the time series.

Let \(l = \max\{l_{1y},l_{11},\ldots,l_{1p},l_{2y},l_{21},\ldots,l_{2p}\}\) be the maximum time lag, \(B_{t}\) the set constructed from the observations of the response and the exogenous variables up to time \(t\), \(f(y_{0},\ldots,y_{l-1})\) the initial distribution, which is not specified here, and \(\Theta=\{\beta_{10},\beta_{1},\beta_{20},\beta_{2}\}\) the set of model parameters. Then the likelihood can be expressed as follows:

$$\begin{array}{@{}rcl@{}} L(\Theta) &=& f(y_{0}, y_{1}, \ldots, y_{n}|\ B_{n-1}, \Theta)\\ &=& f(y_{0}, y_{1}, \ldots, y_{l-1})\prod_{t = l }^{n} f(y_{t}|\ B_{t-1},\Theta). \end{array} $$

Consequently, the parameter set Θ can be estimated by using the maximum likelihood method, i.e. \(\hat {\Theta } = \arg \max L(\Theta)\).

3.2 GARX based on Poisson and negative binomial distributions

Note that the frequency of the earthquakes takes non-negative integer values. For this reason, we first assume that \(y_{t}\) obeys a Poisson (PO) regression model whose mean \(\mu_{t}\) is specified by the vector \(x_{1t}\). The PO distribution is specified by the mean parameter only, and the mean of the PO regression is expressed by

$$\begin{array}{@{}rcl@{}} \log(\mu_{t}) &=& \beta_{10} + \beta^{T}_{1}x_{1 t}. \end{array} $$

We call this an auto PO regression model.
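To make the log-link structure concrete, the following Python sketch evaluates the conditional Poisson log-likelihood of such a model for given coefficients, i.e. the product term of the likelihood with the unspecified initial distribution left out. The function name `poisson_loglik` and the toy inputs are hypothetical; this is not the fitting routine used in the paper.

```python
import math
from typing import Sequence


def poisson_loglik(y: Sequence[int],
                   X: Sequence[Sequence[float]],
                   beta0: float,
                   beta: Sequence[float]) -> float:
    """Sum of log Poisson probabilities with log(mu_t) = beta0 + beta^T x_t."""
    total = 0.0
    for y_t, x_t in zip(y, X):
        eta = beta0 + sum(b * x for b, x in zip(beta, x_t))  # linear predictor
        mu = math.exp(eta)                                   # inverse log link
        # log P(Y = y_t) = y_t * log(mu) - mu - log(y_t!)
        total += y_t * eta - mu - math.lgamma(y_t + 1)
    return total


# Toy check: with all coefficients zero, every mu_t equals 1.
counts = [0, 1, 2]
lagged_rows = [[1.0], [0.0], [1.0]]
print(poisson_loglik(counts, lagged_rows, beta0=0.0, beta=[0.0]))
```

Maximizing this quantity over the coefficients gives the maximum likelihood estimate for a fixed choice of lags.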

Figure 1 shows the daily EQ frequencies from Jan. 1, 2006 to Dec. 31, 2012. Several irregular peaks caused by giant earthquakes are visible in EQ4-4.9. In fact, the sample variance of EQ4-4.9, 409.05, is much larger than the sample mean, 30.36.

In general, the variance of the PO distribution is exactly equal to its mean, whereas the variance of the negative binomial (NB) distribution is always greater than its mean. Therefore, we expect the NB model to be superior to the PO regression for EQ4-4.9. Hence, \(y_{t}\) is also fitted by the NB distribution with mean \(\mu_{t}\) and sigma parameter \(\sigma_{t}\). The corresponding GARX model can be written as follows:

$$\begin{array}{@{}rcl@{}} \log(\mu_{t}) &=& \beta_{10} + \beta^{T}_{1}x_{1 t} \end{array} \tag{5} $$
$$\begin{array}{@{}rcl@{}} \log(\sigma_{t}) &=& \beta_{20} + \beta^{T}_{2}x_{2 t}. \end{array} \tag{6} $$

For an application of PO regressions, readers are referred to [3].
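The overdispersion of EQ4-4.9 can be quantified from the reported sample moments alone. The sketch below assumes the negative binomial parameterization in which the variance equals μ + σμ² (the type I family in GAMLSS); the paper does not state which NB family was used, so this choice, like the function names, is an assumption made for illustration.

```python
def dispersion_index(mean: float, variance: float) -> float:
    """Variance-to-mean ratio; it equals 1 for a Poisson distribution."""
    return variance / mean


def nb_sigma_moment(mean: float, variance: float) -> float:
    """Moment estimate of sigma, assuming Var(Y) = mu + sigma * mu**2."""
    return (variance - mean) / mean ** 2


# Sample moments of EQ4-4.9 reported in the text.
sample_mean, sample_var = 30.36, 409.05
print(round(dispersion_index(sample_mean, sample_var), 2))  # 13.47
print(round(nb_sigma_moment(sample_mean, sample_var), 3))   # 0.411
```

A variance-to-mean ratio far above 1 is the usual informal signal that a single Poisson distribution is too restrictive for the marginal counts.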

3.3 Model selection and evaluation

For the auto PO regressions and the NB distribution based GARX models, the appropriate variables and time lags included in \(x_{1t}\) and \(x_{2t}\) must be selected. In this research, BIC is used for model selection. Because the GARX models for the frequency of the earthquakes have 14 exogenous variables, it is difficult to find the optimal model structure by exhaustive search. We therefore use forward stepwise selection based on BIC. For the NB distribution based GARX models, the forward stepwise method is first used to select proper time lags for the mean in Eq. (5). Second, with the mean structure fixed, the forward stepwise method is applied again to determine the time lags for the sigma parameter in Eq. (6).
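The greedy search described above can be sketched generically in Python as follows. The criterion is passed in as a function, so the sketch does not depend on any particular fitting routine; `toy_bic` and the term names are hypothetical stand-ins, and the actual search in the paper may differ in details such as how lags are grouped.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def forward_stepwise(candidates: Iterable[str],
                     criterion: Callable[[FrozenSet[str]], float]
                     ) -> Tuple[FrozenSet[str], float]:
    """Add one term at a time while the criterion (e.g. BIC) keeps decreasing."""
    selected: FrozenSet[str] = frozenset()
    best = criterion(selected)
    remaining = set(candidates)
    while remaining:
        # Score every model obtained by adding a single remaining term.
        scored = [(criterion(selected | {c}), c) for c in sorted(remaining)]
        score, choice = min(scored)
        if score >= best:  # no single addition improves the criterion
            break
        selected, best = selected | {choice}, score
        remaining.remove(choice)
    return selected, best


def toy_bic(terms: FrozenSet[str]) -> float:
    """Hypothetical criterion: two useful terms, a penalty of 3 per term."""
    useful = {"y_lag1", "speed_lag1"}
    return 100.0 - 10.0 * len(terms & useful) + 3.0 * len(terms)


chosen, score = forward_stepwise(
    ["y_lag1", "y_lag2", "speed_lag1", "temp_lag1"], toy_bic)
print(sorted(chosen), score)  # ['speed_lag1', 'y_lag1'] 86.0
```

For the two-stage procedure, the same routine would be run once for the terms of the mean and then again, with the mean terms held fixed, for the terms of the sigma parameter.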

To measure the fitting performance of statistical models, the coefficient of determination:

$$ R^{2} = 1 - {\frac{\sum\limits_{t}(y_{t}-\hat{y}_{t})^{2}}{\sum\limits_{t}(y_{t}-\bar{y})^{2}}} $$

is applied. Here, \(\hat {y}_{t}\) is the predicted value of \(y_{t}\) obtained by the model, and \(\bar {y}\) is the sample mean of \(y_{t}\).
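A direct transcription of this formula into Python is given below as a minimal sketch; the function name and the toy numbers are hypothetical.

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - y_bar) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
r2 = r_squared(observed, predicted)
print(round(r2, 3))  # 0.9
```

Because the squared errors are not scaled by the variance, a few very large counts can dominate this measure.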

Analysis results

An earthquake is a motion that releases distortion accumulated in underground bedrock through various causes. Therefore, solar activities (SA’s) may trigger only weak earthquakes. We first applied auto PO regression models to EQ3-3.9; their parameters were successfully estimated, whereas the parameters of the NB based GARX did not converge. This may be because the mean structure changed around 2009 (recall the time series of EQ3-3.9 in Figure 1). Furthermore, these models performed very poorly for large earthquakes, EQ5-5.9 and above. In this section, we therefore show only the estimation results with EQ4-4.9 as the response variable.

On the basis of the GARX models introduced in the previous section, we constructed one-step-ahead models for the frequency of EQ4-4.9. We set the maximum time lag for the model search to 14 days, and proper time lags of at most 14 were selected; that is, observations from the past two weeks were considered when predicting the frequency of EQ4-4.9 on the next day. The computations were conducted using the R package GAMLSS [1].

Models 1-1, 2-1 and 3-1 in Table 4 are the optimal models when EQ’s are used as the exogenous variables for the prediction of EQ4-4.9. PO(\(\mu_{t}\)) denotes the auto PO regression. NB(\(\mu_{t},\sigma\)) and NB(\(\mu_{t},\sigma_{t}\)) denote the GARX models based on NB distributions with common and time-varying sigma parameters, respectively. The sigma parameter of Model 3-1 is estimated with its mean fixed to the mean of Model 2-1. The time lags of each variable are listed in Table 4. For example, time lags 1-5 mean that the values of the variable at times \(t-1, t-2, \ldots, t-5\) are used in the GARX models.

Table 4 The optimal hierarchic models for EQ4-4.9 with exogenous variables {EQ’s} or {EQ’s, SA’s}

Models 1-2, 2-2 and 3-2 in Table 4 examine the additional effects of SA’s. Model 1-2 is derived by adding the optimal SA’s to the mean of Model 1-1. Similarly, the additional effects on Models 2-1 and 3-1 are evaluated by Models 2-2 and 3-2, respectively.

Table 4 indicates that: (a) PO(\(\mu_{t}\)) has the highest \(R^{2}\), but its BIC is larger than those of the other two models; (b) the structures of NB(\(\mu_{t},\sigma\)) and NB(\(\mu_{t},\sigma_{t}\)) are much simpler than that of PO(\(\mu_{t}\)), and NB(\(\mu_{t},\sigma_{t}\)) has the minimum BIC. These observations hold whether the exogenous variables are {EQ’s} or {EQ’s, SA’s}.

Here, we examine the effect of SA’s on EQ4-4.9. The log likelihood ratio statistic for testing the additional effect from Model 1-1 to Model 1-2 is given by

$$\begin{array}{@{}rcl@{}} & & 2 \log \{ L(\text{Model 1-2}) / L(\text{Model 1-1}) \} \\ &=& \text{BIC(Model 1-1)} - \text{BIC(Model 1-2)} + \log(n)\times42 \\ &=& 23015.3 - 22625.4 + 329.3 = 719.2 \end{array} $$

where \(L(M)\) denotes the maximum likelihood of model \(M\), \(n=2557-14\) is the sample size, and 42 is the number of additional SA variables in Model 1-2. The identity follows from the definition of BIC as minus twice the maximized log likelihood plus \(\log(n)\) times the number of parameters. Under the null hypothesis that SA’s have no effect on EQ4-4.9, the log likelihood ratio statistic asymptotically follows a chi-square distribution with 42 degrees of freedom, because the two models are nested. Since 719.2 far exceeds any conventional critical value of this distribution, the SA’s are highly significant. Similar comparisons of Models 2-1 vs 2-2 and Models 3-1 vs 3-2 show that the SA’s have an extremely significant effect on EQ4-4.9.
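The arithmetic behind this statistic can be reproduced from the two reported BIC values. The Python sketch below uses only the numbers quoted in the text; the function name is hypothetical, and the comparison with the mean and standard deviation of the chi-square distribution is an informal gauge rather than a p-value.

```python
import math


def lr_statistic_from_bic(bic_small: float, bic_large: float,
                          n: int, extra_params: int) -> float:
    """2 log(L_large / L_small), recovered from BIC = -2 log L + k log n."""
    return bic_small - bic_large + extra_params * math.log(n)


n = 2557 - 14  # sample size after dropping the first 14 days
df = 42        # number of additional SA variables in Model 1-2
stat = lr_statistic_from_bic(23015.3, 22625.4, n, df)
print(round(stat, 1))  # 719.2

# A chi-square variable with df degrees of freedom has mean df and
# variance 2 * df, which gives a rough sense of how extreme stat is.
z = (stat - df) / math.sqrt(2 * df)
print(z > 70)  # True
```

The penalty term 42 log(n), about 329.3, is added back because BIC already charges the larger model for its extra parameters.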

Table 5 shows the optimal GARX models including additional exogenous variables about the solar activities and the magnetospheres. The results show that: (a) PO(\(\mu_{t}\)) still has the highest \(R^{2}\), and its BIC is improved compared with Model 1-2 in Table 4; (b) \(R^{2}\) and BIC in Table 5 are all improved compared with the corresponding models in Table 4; (c) NB(\(\mu_{t},\sigma_{t}\)) in this table has the minimum BIC among the models in both tables.

Table 5 The optimal GARX models for EQ4-4.9 in terms of earthquakes and solar activities

Comparing the coefficients of determination, we observe that the PO regressions fit high-frequency data better than the NB based regressions do. Comparing the BIC values, however, we conclude that the NB based regressions are superior to the PO regressions. We also see that the exogenous variables about the solar activities and the magnetospheres improved each GARX model. The improvement in \(R^{2}\) is not large, but these variables are still statistically significant for EQ4-4.9.


Conclusions

In this research, we investigated the relation between solar activities and earthquakes. We constructed GARX models for earthquakes with 4≤M≤4.9 on the basis of the Poisson and the negative binomial distributions. The GARX models in the previous section show that:

  1. The PO regressions tended to fit the large frequency values in the data and consequently selected complex models, although they had relatively high coefficients of determination.

  2. The negative binomial distribution based GARX models were simpler than the auto PO regressions and also had smaller BIC values.

  3. Comparing Tables 4 and 5, the GARX models with the exogenous variables about the solar activities and the magnetospheres improved both the coefficient of determination and BIC. That is, these variables are statistically significant for EQ4-4.9.

We also tried to construct models for earthquakes with M≥5; however, we found no evidence that the variables about the solar activities and the magnetospheres improve the GARX models.

The GARX models for the earthquakes are clearly far from achieving prediction, especially for large earthquakes with extremely complex nonlinear dynamics. In addition, large earthquakes can cause a high frequency of aftershocks within a short period. For example, Figure 1 shows the cluster of aftershocks caused by the Touhoku earthquake in north-east Japan. As a result, the frequency of weak earthquakes cannot follow a single probability distribution such as the negative binomial distribution. For this reason, mixture distributions will be considered for GARX in the future.

In the past 20 years, many novel geophysical and space data have become available, owing to developments in sensing and measurement technologies. Although earthquakes are not predictable for now, we can try to reveal the relations among the earthquakes, the Earth’s environment and the solar activities statistically, on the basis of various models and data.


References

  1. Ahmed, Z., Rigby, R.A., Stasinopoulos, D.M.: Generalized additive models for location, scale and shape (with discussion). Appl. Stat. 54, 507–554 (2005).

  2. ANNS: Accessed 15 February 2014.

  3. Braga, A., Bond, B.: Policing crime and disorder hot spots: a randomized controlled trial. Criminology. 46, 577–607 (2008).

  4. Cuomo, V., Lapenna, V., Macchiato, M., Serio, C.: Autoregressive models as a tool to discriminate chaos from randomness in geoelectrical time series: an application to earthquake prediction. Ann. Geophys. 40, 385–400 (1997).

  5. Huzaimy, J.M., Yumoto, K.: Possible correlation between solar activity and global seismicity. IEEE Int. Conf. Space Sci. Commun (2011).

  6. Odintsov, S.D., Ivanov-Kholodnyi, G.S., Georgieva, K.: Solar activity and global seismicity of the earth. Bull. Russ. Acad. Sci. Phys. 71, 593–595 (2007).

  7. Ogata, Y., Zhuang, J.: Space-time ETAS models and an improved extension. Tectonophysics. 413, 13–23 (2006).

  8. OMNIWeb: Accessed 15 February 2014.

  9. Palumbo, A.: Gravitational and geomagnetic tidal source of earthquake triggering. IL Nuovo Cimento C. 12, 685–693 (1989).

  10. Schoenberg, F.P.: Multidimensional residual analysis of point process models for earthquake occurrences. J. Am. Statist. Ass. 98, 789–795 (2004).

  11. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).


The authors would like to thank the reviewer for his/her valuable comments and advice to improve the paper. In addition, the authors would like to thank Professors K. Yumoto and T. Hada with Kyushu University. They called our attention to this issue and gave us helpful comments. The research was supported by the “Fundamental Research Funds for the Central Universities” of China, the joint research fund of ICSWSE, Kyushu University, and Grant-in-Aid for Scientific Research (B) #23300106.

Author information



Corresponding author

Correspondence to Ryuei Nishii.

Cite this article

Qin, P., Yamasaki, T. & Nishii, R. Statistical detection of the influence of solar activities to weak earthquakes. Pac. J. Math. Ind. 6, 6 (2014).


  • Earthquake
  • Solar wind
  • Generalized time series model
  • Model selection
  • Solar activity