MICEX INDEX FORECASTING: THE PREDICTIVE POWER OF NEURAL NETWORK MODELING AND SUPPORT VECTOR MACHINE

Agata M. Lozinskaia, ORCID ID: 0000-0001-8723-275X, Researcher ID: L-6971-2015, E-mail: AMPoroshina@gmail.com
National Research University Higher School of Economics, 27, Lebedeva st., Perm, 614070, Russia
Victor A. Zhemchuzhnikov, ORCID ID: 0000-0001-6513-3611, Researcher ID: D-2384-2017, E-mail: viktr5909@mail.ru
National Research University Higher School of Economics, 27, Lebedeva st., Perm, 614070, Russia


Introduction
The MICEX stock price index is the major indicator of Russian stock market behavior; it is calculated from the stocks of the 50 most liquid Russian companies. Investors are particularly interested in predicting the MICEX index in order to develop an optimal trading strategy, including futures trading. However, this task is complicated by the almost continuous stream of data. Modern computing systems and machine learning techniques can be employed to process such data streams. The key feature of machine learning techniques is their ability to learn existing relationships in the data and to adapt to new ones. Among machine learning techniques, neural networks and support vector machines have become widespread in financial time series modeling. Note that according to the Efficient Market Hypothesis (EMH), the asset price fully reflects all currently available public information. This implies that it is impossible to predict the behavior of a financial instrument better than the random walk model does, because all available information is already reflected in the asset price. However, the efficiency of financial markets continues to be studied because the EMH is difficult to test [1].
This paper examines the predictive power of machine learning methods, namely neural networks and support vector machines, in forecasting MICEX index returns. Since the Russian stock market is an emerging and therefore dynamically changing market, verifying and updating existing empirical results is an important topic in the modeling and analysis of financial markets. This work is one of the first attempts known in the academic literature to apply a support vector machine to predicting the Russian stock index.
Literature review
One of the earliest works on applying neural network modeling to predicting the behavior of financial instruments (IBM stock returns) belongs to H. White [2]. He demonstrates the high predictive power of neural networks in forecasting the behavior of financial instruments in his later works as well, for example [3]. The advantages of a neural network over classical time series models such as ARIMA include the ability to build time series forecasts automatically, the absence of subjectivity in choosing the best model, flexibility, nonlinearity, etc. [4].
At the same time, H. White [2] and the authors of recent studies, for example, R. Jammazi and S. Aloui [3], N.A. Valiotti and V.L. Abbakumov [4], and I. Kaastra et al. [5], point out that the practical application of neural networks involves two main problems: the selection of independent variables and the selection of the network architecture. The first problem is not specific to neural networks; it is an important step in developing any financial time series prediction model. Irrelevant variables can be "filtered out" automatically through the updating of weights during neural network training [6]. However, as R. Pakath et al. [7] showed, increasing the number of irrelevant variables can worsen the prediction. Expert knowledge is commonly used to find the optimal set of factors. In particular, factors that influence financial time series behavior are usually divided into two categories: technical and fundamental. Technical indicators are based on various transformations of the financial instrument values, such as lagging, moving averages, etc.; they are presented in Table 1. Fundamental variables describe the fundamental state of the economy, such as macroeconomic indicators.
In this paper, expert knowledge, in particular the set of factors that determine Russian stock market behavior, is drawn from the results of previous studies, which generally use fundamental indicators. For example, S.A. Anatolyev [8] demonstrated the growing dependence of the Russian stock market on external factors, mostly on the US stock market (for example, the MSCI index for the US and the rate on short-term Treasury bonds). The author applied regression analysis to the MSCI index for Russia (1995-2005), and his findings are consistent with the study by A.A. Peresetsky [9] explaining the daily returns of the MICEX index (2001-2011). Using ARIMA and GARCH models, A.A. Peresetsky found statistically significant effects of the one-day lagged MICEX index returns, the one-day lagged S&P500 index returns, the one-day lagged WTI oil price and the Japanese NIKKEI index returns. The results of [8] and [9] indicate that not only do certain indicators of the American and Japanese stock markets affect the Russian stock market, but the lagged MICEX index returns also affect its current dynamics. Additionally, using regression with a moving window, A.A. Peresetsky [9] found that the dependence of the Russian stock market on oil prices had weakened.
The second problem in the practical application of neural network modeling is the selection of the network architecture: basically, the type of connections between neurons, the number of hidden layers and the number of neurons in each hidden layer. The number of currently known connection types is finite and relatively small. Feed-forward neural networks, which can approximate any measurable function given a sufficient number of hidden neurons [17], are widely used for financial time series modeling. The number of hidden layers and neurons is mostly chosen by empirical rules. For example, a large number of hidden layers can lead to overfitting, as shown in [5]; for this reason, one or two hidden layers are sufficient in most cases. Furthermore, when a network with two hidden layers gives unsatisfactory results, the input data should be reanalyzed before an additional hidden layer is added.
An excessive number of hidden neurons may also cause overfitting [18], leaving the neural network unable to adequately predict any data other than the training sample.
Besides the problems mentioned above, researchers face numerous other issues when applying neural network modeling in practice: data preprocessing, the choice of the learning algorithm and its parameters, and the choice of a criterion for selecting the best model. These issues are discussed in the methodology section of this paper.
The support vector machine (SVM) [19] is an alternative to the neural network for predicting financial time series. SVM was initially used for classification; later, support vector regression (SVR), a modification of SVM for regression problems, was developed and has been applied to financial time series prediction. A detailed description of SVM is given in the methodology section of this paper. Earlier studies [10; 20] have shown that, compared with a neural network, SVM is unlikely to overfit and its learning process is guaranteed to converge to a global minimum. A risk of underfitting with SVM remains, but it is relatively small. It usually arises when a large number of degrees of freedom (as in the neural network [14]) or very small values of the learning algorithm parameters are used. SVM has been applied quite successfully to financial time series prediction, and empirical studies demonstrate its higher predictive power compared with neural networks [10; 14].
The current paper continues existing studies in financial time series forecasting and examines the predictive power of neural network modeling and support vector machines. The methodology is based on expert knowledge available in the modern literature about the factors that determine the dynamics of financial time series, as well as on theoretical and empirical rules for implementing the above-mentioned machine learning methods. The distinctive features of the study can be summed up as follows: we analyze a longer time period and a wide range of fundamental and technical indicators, and we analyze the sensitivity of the neural network results to the high instability of the Russian stock market previously documented in the literature [8; 9].
Data
The data used in this study are the daily MICEX stock price index for the period from January 15, 2002, to April 25, 2016, comprising 3559 observations. For this period, we collect data on fundamentals such as the S&P500 and DAX indices, the exchange rate of the Russian ruble to the US dollar and the Brent crude oil price. In addition, we use data from the websites of the Central Bank of Russia and the US Department of the Treasury on the dynamics of the 1-month Moscow Interbank Offered Rate and the 3-month Treasury Bill rate, respectively. We select technical and fundamental indicators because they are frequently used in prior research, they can explain the behavior of individual indicators of the Russian stock market, as previously found in the literature [8; 9], and they can be calculated for the current research data.
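Following the lag structure reported in [9], a design matrix that pairs each day's MICEX return with the previous day's returns of all series can be sketched as below. The use of simple returns, the requirement that all series be pre-aligned on common trading days, and the function names are assumptions for illustration; the authors' exact feature construction is not spelled out in the text.

```python
from typing import Dict, List, Tuple


def simple_returns(prices: List[float]) -> List[float]:
    """Daily simple returns r_t = P_t / P_{t-1} - 1 (one element shorter than `prices`)."""
    return [curr / prev - 1.0 for prev, curr in zip(prices[:-1], prices[1:])]


def lagged_design(
    prices: Dict[str, List[float]], target: str
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Pair the target return r_t with the one-day-lagged returns of every series.

    Assumes all price series are already aligned on the same trading days.
    Returns (feature names, feature rows X, target values y).
    """
    if len({len(p) for p in prices.values()}) != 1:
        raise ValueError("all series must be aligned and of equal length")
    rets = {name: simple_returns(p) for name, p in prices.items()}
    names = sorted(rets)  # deterministic column order
    n = len(rets[target])
    # Row t uses returns from day t-1 only, so no same-day information leaks in.
    X = [[rets[name][t - 1] for name in names] for t in range(1, n)]
    y = [rets[target][t] for t in range(1, n)]
    return names, X, y
```

The target's own lag is kept among the features, in line with the significant one-day lagged MICEX effect found in [9].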
The descriptive statistics of the selected variables are presented in Table 2. Technical indicators include the following:
1) Moving average of n-th order, where $C_{t-i}$ is the closing price at time $t-i$ and $n$ is the smoothing interval length (in days);
2) Exponential Moving Average (EMA), where $\alpha$ is the smoothing constant;
[...] where $C_t$ is the closing price at time $t$ and $C_{t-i}$ is the closing price at time $t-i$;
5) Fast stochastic, where $LL_n$ is the lowest price over $n$ days and $HH_n$ is the highest price over $n$ days;
6) Slow stochastic, calculated as the mean of the fast stochastic values over $n$ days.
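The indicators listed as items 1, 2, 5 and 6 can be computed with the common textbook definitions sketched below, assuming that `closes`, `lows` and `highs` are equal-length daily series. Seeding the EMA with the first close and expressing the stochastic oscillator in percent are assumptions, as is every function name.

```python
from typing import List, Optional, Sequence


def moving_average(closes: Sequence[float], n: int) -> List[Optional[float]]:
    """Moving average of order n: mean of C_{t-n+1}, ..., C_t (None until n closes exist)."""
    return [
        sum(closes[t - n + 1 : t + 1]) / n if t + 1 >= n else None
        for t in range(len(closes))
    ]


def ema(closes: Sequence[float], alpha: float) -> List[float]:
    """EMA_t = alpha * C_t + (1 - alpha) * EMA_{t-1}, seeded with the first close (assumption)."""
    out = [float(closes[0])]
    for c in closes[1:]:
        out.append(alpha * c + (1.0 - alpha) * out[-1])
    return out


def fast_stochastic(
    closes: Sequence[float], lows: Sequence[float], highs: Sequence[float], n: int
) -> List[Optional[float]]:
    """%K_t = 100 * (C_t - LL_n) / (HH_n - LL_n); None when the window is incomplete or flat."""
    out: List[Optional[float]] = []
    for t in range(len(closes)):
        if t + 1 < n:
            out.append(None)
            continue
        ll = min(lows[t - n + 1 : t + 1])   # lowest price over the last n days
        hh = max(highs[t - n + 1 : t + 1])  # highest price over the last n days
        out.append(100.0 * (closes[t] - ll) / (hh - ll) if hh > ll else None)
    return out


def slow_stochastic(fast: Sequence[Optional[float]], n: int) -> List[Optional[float]]:
    """Mean of the last n fast-stochastic values; None if any of them is undefined."""
    out: List[Optional[float]] = []
    for t in range(len(fast)):
        window = [v for v in fast[max(0, t - n + 1) : t + 1] if v is not None]
        out.append(sum(window) / n if len(window) == n else None)
    return out
```

When only closing prices are available, the same close series can be passed as `lows` and `highs`.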
The data are not cleaned of trends and outliers because neural networks are robust to them [21]. Instead, the data are normalized, as described in the Methodology section.

Methodology
Neural network modeling
To predict the MICEX index, a feedforward neural network trained with the backpropagation algorithm, also known as a MultiLayer Perceptron (MLP), is used. The main stages of designing a neural network for financial time series prediction are discussed, for example, in [16; 18]. The hyperbolic tangent is used as the activation function. We normalize the dependent and independent variables to the interval [-1; 1], which corresponds to the range of the activation function. The normalization is based on the following transformation [5]:
$$x_{\text{normalized}} = TF_{\min} + \left(TF_{\max} - TF_{\min}\right)\frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad (6)$$
where $x$ is the initial value of the variable, $x_{\text{normalized}}$ is its normalized value, $TF_{\min} = -1$ and $TF_{\max} = 1$ are the lower and upper limits of the target interval, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the initial variable.
To reduce the risk of overfitting, the data are divided into training (60%) and testing (20%) samples [20], which are used to find the optimal parameters of the learning algorithm and the network architecture. Additionally, a validation sample (20%) is used to evaluate the predictive performance of the neural network.
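Transformation (6) and the 60/20/20 division can be sketched as follows. Two assumptions are not stated in the text: the samples are taken in chronological order (training first, then testing, then validation), and the minimum and maximum in (6) are supplied by the caller, for example computed on the training sample only to avoid look-ahead; with that choice, later values may fall slightly outside [-1, 1]. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
TF_MIN, TF_MAX = -1.0, 1.0  # target interval matching the tanh activation range


def normalize(xs: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Transformation (6): map [x_min, x_max] linearly onto [TF_MIN, TF_MAX]."""
    scale = (TF_MAX - TF_MIN) / (x_max - x_min)
    return [TF_MIN + (x - x_min) * scale for x in xs]


def denormalize(zs: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Inverse of (6): bring network outputs back to the original scale."""
    scale = (x_max - x_min) / (TF_MAX - TF_MIN)
    return [x_min + (z - TF_MIN) * scale for z in zs]


def chronological_split(
    data: Sequence[T], train_pct: int = 60, test_pct: int = 20
) -> Tuple[Sequence[T], Sequence[T], Sequence[T]]:
    """Split into training, testing and validation samples without shuffling."""
    n = len(data)
    i = n * train_pct // 100               # end of training sample
    j = n * (train_pct + test_pct) // 100  # end of testing sample
    return data[:i], data[i:j], data[j:]
```

With integer arithmetic, 3559 observations split into 2135 training, 712 testing and 712 validation observations.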
We use grid search as an empirical way to find the optimal learning parameters of the neural network: the network is trained with different learning parameters, and the network with the minimum Mean Squared Error (MSE) of the forecast on the test sample is chosen. The number of learning epochs (training iterations) is fixed at 800 at all training stages. This study varies the following learning parameters: the number of hidden layers (1 or 2), the learning rate (LR, from 0.1 to 0.9 [5]) and the momentum (M, from 0 to 1 [22]).
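The search over learning parameters described above can be sketched as an exhaustive loop over the grid. The `evaluate` callback is hypothetical: it stands for training the network for the fixed 800 epochs with the given settings and returning the test-sample MSE. The step of 0.1 for both LR and M is an assumption, since the text gives only the ranges.

```python
import itertools
from typing import Callable, Dict, Tuple


def grid_search(
    evaluate: Callable[[int, float, float], float]
) -> Tuple[Dict[str, float], float]:
    """Return the (hidden layers, LR, M) setting with the lowest test-sample MSE."""
    layers = (1, 2)
    lrs = [round(0.1 * k, 1) for k in range(1, 10)]      # 0.1 ... 0.9
    momenta = [round(0.1 * k, 1) for k in range(0, 11)]  # 0.0 ... 1.0
    best_params: Dict[str, float] = {}
    best_mse = float("inf")
    for h, lr, m in itertools.product(layers, lrs, momenta):
        mse = evaluate(h, lr, m)  # hypothetical: train 800 epochs, score on test sample
        if mse < best_mse:        # strict '<' keeps the first of tied settings
            best_params = {"hidden_layers": h, "lr": lr, "momentum": m}
            best_mse = mse
    return best_params, best_mse
```

The grid has 2 × 9 × 11 = 198 settings, so each full search means 198 training runs.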
LR controls the step size of gradient descent on the surface of the target error function (the error surface). Too high a value can cause the algorithm to overshoot the minimum, and too low a value can slow down training. In this paper, an extended gradient descent method with an additional learning parameter, the momentum M, is used [22]. According to [23], this method allows the learning algorithm to converge to a minimum faster, with a lower risk of falling into a local minimum.
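A minimal sketch of gradient descent with momentum on a one-dimensional error surface illustrates the roles of LR and M. It uses the classical heavy-ball update, which we assume corresponds to the method of [22]; the quadratic error function in the test is purely illustrative.

```python
from typing import Callable


def gd_momentum(
    grad: Callable[[float], float], w0: float, lr: float, momentum: float, epochs: int
) -> float:
    """Gradient descent with classical momentum on a single weight w.

    v <- M * v - LR * grad(w);  w <- w + v.  With M = 0 this is plain gradient descent.
    """
    w, v = w0, 0.0
    for _ in range(epochs):
        v = momentum * v - lr * grad(w)  # velocity accumulates past gradients
        w += v
    return w
```

On the error surface $(w - 3)^2$ with a small learning rate, adding a momentum of 0.9 brings the weight much closer to the minimum after 100 steps than plain gradient descent does, which is the faster convergence described in [23].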
For neural network modeling, the Keras framework for Python is employed. The results on the training and testing samples indicate that the optimal learning parameters are LR = 0.09 and M = 0.8, and that the optimal network topology consists of one hidden layer with 18 neurons. The MSE on the test sample is 0.0002.
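The selected topology can be pictured with a minimal forward-pass sketch in plain Python: n inputs, one hidden layer of 18 tanh neurons and a single output. We assume a tanh output unit (consistent with the target being normalized to [-1, 1]), a uniform weight initialization and an arbitrary number of inputs; the class and function names are hypothetical, and training is omitted. In Keras, a comparable model could be assembled with `keras.Sequential`, two `keras.layers.Dense` layers using `activation="tanh"`, and `keras.optimizers.SGD(learning_rate=0.09, momentum=0.8)` with `loss="mse"`, although the authors' exact configuration is not given.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small uniform random weights and zero biases (initialization scheme is an assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_tanh(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer followed by the hyperbolic tangent activation."""
    weights, biases = layer
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


class MLP:
    """Feed-forward network: n_inputs -> n_hidden tanh neurons -> 1 tanh output."""

    def __init__(self, n_inputs: int, n_hidden: int = 18, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden = init_layer(n_inputs, n_hidden, rng)
        self.output = init_layer(n_hidden, 1, rng)

    def predict(self, x: Sequence[float]) -> float:
        """Return a forecast on the normalized [-1, 1] scale; invert (6) to get index units."""
        return dense_tanh(dense_tanh(x, self.hidden), self.output)[0]
```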
According to the literature [8; 9], the Russian stock market is highly volatile. We test whether training the neural network on financial time series data for a longer period makes it harder for the network to detect relationships in the data (because pre-existing relationships change or disappear and new ones appear) and thus reduces its predictive power. We train the network on historical data for the 15-year period (2002-2016), the 10-year period (2007-2016) and the 5-year period (2012-2016). The MSE tends to increase as the sample shrinks and equals 0.001, 0.015 and 0.052, respectively. For this reason, the historical data for the 15-year period (3559 observations) are used. These results do not mean that we find no support for the instability of the Russian stock market; rather, they indicate that a large dataset should be used to increase the prediction accuracy of the neural network. We cannot, however, test whether using a period longer than 15 years decreases predictive performance, as found in [24].
Support vector machine
SVM is a machine learning method that maps an input vector into a multidimensional space and estimates a linear regression in this space. Like a neural network, SVM relies on fewer data assumptions than classical regression models (for example, data normalization is sufficient in most cases) and is robust to nonstationarity in the analyzed financial time series. This method also includes training and testing stages to find an optimal SVM configuration. Typically, training reduces to a quadratic programming problem, as discussed in [20].
Key training parameters are chosen experimentally based on the results of training and testing on the same samples that are used for neural network modeling. Data normalization is performed using transformation (6). The main training parameters are the regularization parameter C, the width ε of the ε-insensitive zone, and the kernel type. A kernel is a function that allows the input vector to be mapped into an n-dimensional space, where n is the number of observations. We use the Gaussian kernel, which has a single width parameter $\sigma^2$ and is widely applied in the literature. SVM has fewer adjustable learning parameters than a neural network, so their optimal values can be found faster. Even if the kernel type is also optimized, the search for optimal learning parameters does not become much more complicated, because the number of kernel types is finite. The development of algorithms for finding the optimal learning parameters C and ε, as well as the kernel, remains an area that requires further investigation [20]. SVM is implemented with Scikit-learn, a machine learning framework for Python.
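The two ingredients named above, the Gaussian kernel and the ε-insensitive loss that SVR penalizes with weight C, can be sketched as follows. We assume the parameterization $K(x, z) = \exp(-\lVert x - z\rVert^2 / (2\sigma^2))$; Scikit-learn's `sklearn.svm.SVR` with `kernel="rbf"` instead takes `gamma` in $\exp(-\gamma\lVert x - z\rVert^2)$, so under this assumption $\gamma = 1/(2\sigma^2)$. The text does not state how $\sigma^2$ was passed to Scikit-learn, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], sigma2: float) -> float:
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma2)); one common parameterization (assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))


def eps_insensitive_loss(y: float, y_hat: float, eps: float) -> float:
    """Errors inside the eps-tube cost nothing; larger errors cost |y - y_hat| - eps."""
    return max(0.0, abs(y - y_hat) - eps)


def sklearn_gamma(sigma2: float) -> float:
    """Convert sigma^2 to Scikit-learn's RBF `gamma` under the parameterization above."""
    return 1.0 / (2.0 * sigma2)
```

SVR minimizes $\frac{1}{2}\lVert w\rVert^2 + C\sum_t L_\varepsilon(y_t, \hat{y}_t)$, so a larger C tolerates less loss outside the tube, while a larger ε widens the zone in which errors are ignored.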
To select the optimal learning parameters, grid search is applied: each training parameter is varied in the range from $10^{-3}$ to $10^{3}$. The results on the training and testing samples indicate that the optimal learning parameters are C = 2, ε = 0.01 and $\sigma^2$ = 0.001. The MSE on the test sample is 0.0006.
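A log-spaced grid over $[10^{-3}, 10^{3}]$ for the three SVR parameters can be sketched as below. The step of the authors' grid is not reported; the reported C = 2 is not a power of ten, so their grid was evidently finer than one point per decade, which is why the sketch exposes a hypothetical `per_decade` parameter. The `evaluate` callback is also hypothetical and stands for fitting SVR and returning the test-sample MSE.

```python
import itertools
from typing import Callable, List, Tuple


def log_grid(lo_exp: int = -3, hi_exp: int = 3, per_decade: int = 1) -> List[float]:
    """Values 10**lo_exp ... 10**hi_exp with `per_decade` points per factor of 10."""
    steps = (hi_exp - lo_exp) * per_decade
    return [10.0 ** (lo_exp + i / per_decade) for i in range(steps + 1)]


def best_svr_params(
    evaluate: Callable[[float, float, float], float], per_decade: int = 1
) -> Tuple[float, float, float]:
    """Return the (C, eps, sigma2) triple with the lowest score from `evaluate`."""
    grid = log_grid(per_decade=per_decade)
    return min(itertools.product(grid, grid, grid), key=lambda p: evaluate(*p))
```

With one point per decade the search covers 7³ = 343 combinations; four points per decade already gives 25³ = 15625, so a coarse-to-fine search is a common compromise.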
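Prediction performance
We compare the prediction performance of the neural network and the support vector machine on the validation sample using two groups of metrics. The first group consists of the Mean Squared Error (MSE) (7), the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE) (8) and the Mean Absolute Percentage Error (MAPE) (9):
$$MSE = \frac{1}{T_2}\sum_{t=1}^{T_2}\left(y_t - \hat{y}_t\right)^2, \qquad (7)$$
$$RMSE = \sqrt{MSE},$$
$$MAE = \frac{1}{T_2}\sum_{t=1}^{T_2}\left|y_t - \hat{y}_t\right|, \qquad (8)$$
$$MAPE = \frac{100\%}{T_2}\sum_{t=1}^{T_2}\left|\frac{y_t - \hat{y}_t}{y_t}\right|, \qquad (9)$$
where $y_t$ is the actual MICEX price at time $t$, $\hat{y}_t$ is the predicted MICEX price at time $t$, and $T_2$ is the length of the validation sample (the test-sample MSE used for model selection is computed in the same way over $T_1$, the length of the test sample). Because errors are squared before averaging, RMSE grows faster than MAE as the share of large absolute errors increases.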
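Formulas (7)-(9) translate directly into code. The sketch below assumes equal-length sequences of actual and predicted values and, for MAPE, no zero actual values; the function names are illustrative.

```python
import math
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Formula (7): mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error, on the same scale as the index."""
    return math.sqrt(mse(y, y_hat))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Formula (8): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Formula (9), in percent; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)
```

For actual values (100, 200) and forecasts (110, 190), the sketch gives MSE = 100, RMSE = MAE = 10 and MAPE = 7.5%.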
The second group of metrics consists of the coefficient of determination $R^2$ (10) and the counted coefficient of determination $cR^2$ (11). The aim of forecasting is not only to predict the MICEX index accurately on average, but also to predict the direction of price change correctly. The latter is important, for example, for an investor trading futures, because in this case the decision to buy or sell the asset is made in advance. In other words, if the methods predict that the asset price will rise tomorrow, the investor enters into a contract today to buy the asset in order to sell it tomorrow. If the price actually rises tomorrow, the investor profits, even if the increase is smaller than predicted; otherwise, the investor incurs a loss.
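The trading logic described above can be turned into a small profit-and-loss sketch. It approximates the futures position by the change in the index itself and ignores transaction costs, margin and the futures basis; the optional short leg is an assumption beyond the text, which describes only buying when a rise is predicted. The function name is hypothetical.

```python
from typing import List, Sequence


def direction_pnl(
    y_prev: Sequence[float],
    y: Sequence[float],
    y_hat: Sequence[float],
    allow_short: bool = False,
) -> List[float]:
    """Per-day profit of a one-unit position opened at y_prev[t] and closed at y[t]."""
    pnl = []
    for prev, actual, forecast in zip(y_prev, y, y_hat):
        if forecast > prev:
            position = 1.0    # rise predicted: buy today, sell tomorrow
        elif forecast < prev and allow_short:
            position = -1.0   # fall predicted: sell today, buy back tomorrow (assumption)
        else:
            position = 0.0    # no trade
        pnl.append(position * (actual - prev))
    return pnl
```

A correctly predicted rise earns the realized gain regardless of its size, and a wrongly predicted rise loses the realized fall, which is why direction accuracy, not only error size, drives the result.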
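In this context, machine learning methods used to predict financial time series for investment decisions should be assessed by their ability to predict the direction of price movement. For this purpose, we use the counted coefficient of determination $cR^2$:
$$cR^2 = \frac{TP + TN}{T_2} = \frac{1}{T_2}\sum_{t=1}^{T_2} d_t, \qquad (11)$$
$$d_t = \begin{cases} 1, & \text{if } \operatorname{sign}\left(\hat{y}_t - y_{t-1}\right) = \operatorname{sign}\left(y_t - y_{t-1}\right), \\ 0, & \text{otherwise,} \end{cases} \qquad (12)$$
where $TP$ is the number of true positive cases, i.e. correctly predicted increases of the MICEX index, $TN$ is the number of true negative cases, i.e. correctly predicted decreases of the MICEX index, $y_t$ and $\hat{y}_t$ are the actual and predicted MICEX prices at time $t$, $y_{t-1}$ is the actual value of the MICEX index at time $t-1$, and $T_2$ is the length of the validation sample. The numerator in (11) is the total number of correctly predicted directions of MICEX price change. The direction of change is the sign of the absolute change in the index price: if the forecasted direction coincides with the actual direction, the observation equals 1, and 0 otherwise (12). Therefore, $cR^2$ shows the share of correctly predicted MICEX price changes among all index changes. This indicator is similar to the economic forecasting error proposed in [16] and to the Hit Ratio and Directional Symmetry (DC) used in [25] and [14], respectively.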
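Formulas (11) and (12) can be sketched by passing the previous actual values explicitly, so that each forecast is compared with $y_{t-1}$. For $R^2$ (10), the sketch uses the usual $1 - SSE/SST$ form, which is assumed rather than taken from the paper; function names are illustrative.

```python
from typing import Sequence


def sign(x: float) -> int:
    """Return +1, 0 or -1."""
    return (x > 0) - (x < 0)


def counted_r2(y_prev: Sequence[float], y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Formulas (11)-(12): share of days with sign(y_hat_t - y_{t-1}) == sign(y_t - y_{t-1})."""
    hits = [1 if sign(f - p) == sign(a - p) else 0 for p, a, f in zip(y_prev, y, y_hat)]
    return sum(hits) / len(hits)


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination 1 - SSE / SST (assumed form of formula (10))."""
    mean = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

For actual values 100, 102, 101, 105 and forecasts 101, 103, 104 for the last three days, two of the three directions are correct, so $cR^2 = 2/3$.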

Empirical results
Fig. 1 and Fig. [...] To compare the prediction performance of the methods more precisely, we calculate several prediction metrics, presented in Table 3. First, the RMSE is close to the MAE, which means that there are few observations with a noticeable difference between predicted and actual values. In this study, we obtain a lower MAE than L. Cao and F. Tay [25], who forecasted the absolute value of the S&P500 index; in their research, the MAE on the testing sample ranges from 0.3496 to 0.6347 for a neural network and from 0.3403 to 0.3706 for SVM.
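The reasoning that RMSE close to MAE implies few large errors can be checked with a small worked example. The helper below is illustrative: the ratio RMSE/MAE always satisfies RMSE/MAE ≥ 1, equals 1 only when all absolute errors are equal, and is larger when the error is concentrated in a few observations.

```python
import math
from typing import Sequence


def rmse_to_mae(errors: Sequence[float]) -> float:
    """Ratio RMSE / MAE for a list of forecast errors y_t - y_hat_t."""
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse / mae
```

Four errors of size 1 give a ratio of 1.0, while the same total absolute error placed in a single observation, (0, 0, 0, 4), gives a ratio of 2.0.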
Second, $cR^2$ equals 0.51, which means that the methods correctly predict only about half of all MICEX price direction changes. This result is comparable to the values of similar direction-based metrics (Hit Ratio [25] and Directional Symmetry (DC) [14]) obtained in previous studies. For example, the Hit Ratio in forecasting the first difference of the KOSPI index in [25] ranged from 0.5198 to 0.5783 for SVM, and DC on the testing sample in [14] varied from 0.392 to 0.4975 for a neural network and from 0.4623 to 0.4772 for SVM. Additionally, [14; 25] point out that achieving high accuracy in predicting the direction of stock index changes is quite difficult.
For investment in MICEX futures, $cR^2$ = 0.51 means that the strategy generates a near-zero profit. However, this problem requires further investigation, for example by modifying the MSE loss function of the neural network so that weights are reduced when the direction of change is predicted incorrectly and increased otherwise. For this purpose, a new neural network architecture should be found.
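The proposal above is stated only loosely. One hypothetical way to make a training loss direction-aware is to weight each squared error by a penalty when the forecast direction is wrong; the function name, the multiplicative weighting and the default penalty are illustrative assumptions, not the authors' design.

```python
from typing import Sequence


def direction_weighted_mse(
    y_prev: Sequence[float],
    y: Sequence[float],
    y_hat: Sequence[float],
    penalty: float = 2.0,
) -> float:
    """MSE in which wrong-direction forecasts are weighted by `penalty` (hypothetical)."""
    total = 0.0
    for p, a, f in zip(y_prev, y, y_hat):
        # Wrong direction: forecast and actual move to opposite sides of y_{t-1}.
        wrong = (f - p) * (a - p) < 0
        total += (penalty if wrong else 1.0) * (a - f) ** 2
    return total / len(y)
```

With `penalty=1.0` the function reduces to the ordinary MSE (7); days without an actual or predicted change are not penalized.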
Third, SVM has higher prediction accuracy than the neural network. This finding is consistent with previous studies of the US stock market [10; 25] and the Korean stock market [14]. Despite the higher prediction accuracy of SVM, the differences in the prediction metrics are quite small, and the methods rely on different assumptions. SVM is robust to overfitting and always converges to the global minimum, whereas a neural network offers more variety in its design and modification, which allows it to be combined with other forecasting tools [26]. The choice of the preferred prediction method depends on many factors, including computational costs.

Conclusion
This paper forecasts the Russian MICEX stock price index using neural network modeling and a support vector machine. We use historical data on MICEX prices, as well as certain fundamental and technical indicators, for the period 2002-2016. The Keras and Scikit-learn frameworks for Python are employed to perform the computer experiments. Based on the analysis of several prediction metrics, we conclude that both methods demonstrate high predictive power. The support vector machine outperforms the neural network, but the difference in the prediction metrics is not substantial. This finding is consistent with previous studies of different international financial markets. Further research includes developing a methodology for filtering and transforming input data with additional machine learning algorithms, and developing guidelines for choosing a trading strategy based on the machine learning results.


Table 2. The definition of variables and descriptive statistics
Note: pct. is percentage point.
-the actual value of the MICEX index at time t-1.