04 / 22 / 2011 - 12:01 — administrator

**UDK 519.217.2**

**MARKOV CHAINS APPLICATION IN FORECASTING OF SOCIOECONOMIC PROCESSES**

**Shamil M. Ikhsanov, candidate of Sci. in Engineering, associate professor, senior research worker**

**Valentina V. Lopushanska, graduate student**

**Mykolayiv State Agrarian University, Ukraine**

*The article offers a variant of applying stochastic models based on Markov chains to the forecasting of socioeconomic processes. Data on the structure of land resources in Ukraine are considered as an example. An optimization problem is formulated for estimating the transition probability matrix, and the accuracy of the estimate as a function of the accumulation interval is obtained by the Monte Carlo method. A forecast of the land resources structure of Ukraine for 2010, with an estimate of its accuracy, is presented.*

*Keywords: Markov chain; transition probability matrix; land resources of Ukraine; stochastic model; forecasting the land resources structure.*

*В статье предложен вариант использования стохастических моделей на базе цепей Маркова для прогнозирования социально-экономических процессов. В качестве примера рассмотрены данные по структуре земельного фонда Украины. Сформулирована оптимизационная задача для оценки матрицы переходных вероятностей, и методом Монте–Карло получена точность оценки в зависимости от интервала накопления. Приведен прогноз структуры земельного фонда Украины на 2010 год с оценкой точности.*

*Ключевые слова: цепь Маркова, матрица вероятностей переходов, земельный фонд Украины, стохастическая модель, прогнозирование структуры земельного фонда.*

Many socioeconomic systems are characterized by the distribution of some basic elements among different categories, and these elements are redistributed among the categories over time.

For instance, the land area of any region is distributed among categories according to its purpose (agricultural land, forestry area, built-up area and others), while the total land area of the region, as a rule, remains unchanged. The population of a region, depending on the socioeconomic process under study, can likewise be grouped by education, speciality, political commitment, level of prosperity and the like. In contrast to the first example, however, the total population of the region is constantly changing. Nevertheless, introducing additional categories, such as newborns, the deceased and immigrants, makes such a system closed. One possible model of how such processes develop is a model based on Markov chains, which is described by the transition probability matrix (TPM).

The prediction of the future state of particular socioeconomic systems on the basis of a stochastic approach is considered in a number of publications on probabilistic modeling in finance, commodity markets and some other industries [1-7]. However, insufficient attention is given to the construction of transition probability matrices, and no estimates of the accuracy of forecasts based on such matrices are provided.

For forecasting, the stochastic approach has an advantage over predicting each category separately by fitting approximating functions. The latter does not ensure that the system is closed, i.e. that the total number of elements stays constant, which makes artificial corrections necessary [4], and it does not take the interdependence among categories into account. In the Markov model the closure of the system is built in from the start, and the TPM fully describes the interdependences among categories [7, 8].

For notational convenience we introduce some symbols and give the definition of a Markov chain. A stochastic object can be in one of $n$ states, and the probability of transition from one state to another depends only on the previous state of the object. The transition probabilities are given by the transition probability matrix $P = \{p_{ij}\}$, where $p_{ij}$ is the probability of the object's transition from state $i$ to state $j$.

We will consider the application of Markov chains using the example of predicting the land fund structure. As initial data we take the statistics on the dynamics of land fund changes in Ukraine provided by the Center for Land Reform Policy in Ukraine for a period of 15 years (1 January 1994 to 1 January 2009) [11]. The dynamics of the land fund by basic land types and functional use from 2006 to 2009 is given below (Table 1).

Table 1

Dynamics of the land fund by basic land types and functional use from 2006 to 2009 (area, thousand hectares)

| Land category | 2006 | 2007 | 2008 | 2009 |
|---|---|---|---|---|
| Agricultural lands | 41722,2 | 41675,9 | 41650,0 | 41625,8 |
| Forestry area | 10503,7 | 10539,9 | 10556,3 | 10570,1 |
| Built-up lands | 2467,5 | 2470,2 | 2476,6 | 2489,0 |
| Open waterlogged lands | 966,0 | 972,4 | 975,8 | 978,0 |
| Open lands without plant cover or with insignificant plant cover | 1040,5 | 1042,5 | 1038,2 | 1032,8 |
| Other lands | 1238,0 | 1235,2 | 1236,3 | 1236,6 |
| Water | 2416,9 | 2418,7 | 2421,6 | 2422,5 |
| Total | 60354,8 | 60354,8 | 60354,8 | 60354,8 |

The definition of a Markov chain in the form given above cannot be applied directly to the presented data, since it does not assume a numerical characteristic of the object's state. The problem is solved by treating a unit of area as the Markov object. As the unit of area we take 10 hectares, which provides the necessary accuracy in the calculations. Thus, the total number of objects under study for the land fund of Ukraine is 6035480. We assume that all objects are statistically independent; otherwise the model would have too many parameters.
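The unit conversion can be checked with a few lines of Python. This is only an illustrative sketch: the names `AREA_2006` and `to_units` are not from the article, and the figures are the 2006 column of Table 1.

```python
# Areas for 2006 from Table 1, in thousand hectares.
AREA_2006 = {
    "Agricultural lands": 41722.2,
    "Forestry area": 10503.7,
    "Built-up lands": 2467.5,
    "Open waterlogged lands": 966.0,
    "Open lands without plant cover": 1040.5,
    "Other lands": 1238.0,
    "Water": 2416.9,
}

HECTARES_PER_UNIT = 10  # one Markov object = 10 ha, as chosen in the text


def to_units(area_thousand_ha: float) -> int:
    """Convert an area in thousand hectares to a whole number of 10-ha units."""
    return round(area_thousand_ha * 1000 / HECTARES_PER_UNIT)


units_2006 = {name: to_units(area) for name, area in AREA_2006.items()}
total_units = sum(units_2006.values())
print(total_units)  # 6035480, the number of objects quoted in the text
```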

Having accepted a closed statistical model for the real phenomenon under study, one should certainly implement it. An implementation makes it possible to estimate, by the Monte Carlo method (i.e. by observing realizations of the model and calculating the required parameters), model features and characteristics that are difficult or impossible to derive theoretically. This is particularly important for systems similar to the one studied here, for which a large volume of statistical information is not available.
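What "observing a realization of the model" means can be illustrated with a small Python sketch. It is not the authors' C++ program: the three-state matrix `TOY_TPM`, the initial counts and the function name `simulate_step` are hypothetical and chosen only to keep the example short.

```python
import random


def simulate_step(counts, tpm, rng):
    """Move every object independently according to its row of the TPM."""
    n = len(counts)
    new_counts = [0] * n
    for i, count in enumerate(counts):
        # Draw the destination state of each of the `count` objects in state i.
        destinations = rng.choices(range(n), weights=tpm[i], k=count)
        for j in destinations:
            new_counts[j] += 1
    return new_counts


# Hypothetical 3-state example (not the land data).
TOY_TPM = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.00, 0.05, 0.95],
]
toy_counts = [5000, 3000, 2000]

rng = random.Random(42)  # fixed seed so that the run is repeatable
trajectory = [toy_counts]
for _ in range(3):
    trajectory.append(simulate_step(trajectory[-1], TOY_TPM, rng))

for step, counts in enumerate(trajectory):
    # The total number of objects is the same at every step (closed system).
    print(step, counts, sum(counts))
```

For a system of about six million objects a per-object loop like this would be slow in pure Python; drawing the counts for each source state from a multinomial distribution gives the same model at a much lower cost.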

In contrast to [4], we take the homogeneous Markov chain model, since even in this case the number of parameters to be estimated is considerable (for the given classification of land into 7 categories, the transition probability matrix consists of 49 elements). For this model it is easy to obtain the statement (see theorem 2.1 in [8] on the relation between the state probabilities of a homogeneous Markov chain at adjacent moments of time) that the product of the transposed matrix $P$ and the vector of the current land state $S(t) = (S_1(t), \dots, S_n(t))^{T}$, where $S_i(t)$ is the number of area units in state $i$ at time $t$, gives the expected state vector at the next moment of time:

$$ES(t+1) = P^{T} S(t). \tag{1}$$

This formula should be used when forecasting the land fund structure for the next year, as is also recommended in [4].
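The product of the transposed TPM and the current state vector, $ES(t+1) = P^{T} S(t)$, can be sketched in plain Python as follows. The three-state matrix and the state vector are hypothetical illustration values, and the function names `expected_next` and `forecast` are not from the article.

```python
def expected_next(state, tpm):
    """One step of formula (1): component j is sum_i p_ij * S_i."""
    n = len(state)
    return [sum(tpm[i][j] * state[i] for i in range(n)) for j in range(n)]


def forecast(state, tpm, steps):
    """Apply formula (1) repeatedly to look `steps` periods ahead."""
    for _ in range(steps):
        state = expected_next(state, tpm)
    return state


# Hypothetical 3-state example (not the land data).
toy_tpm = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.00, 0.05, 0.95],
]
toy_state = [5000.0, 3000.0, 2000.0]

one_step = expected_next(toy_state, toy_tpm)
three_steps = forecast(toy_state, toy_tpm, 3)
print(one_step, sum(one_step))
print(three_steps, sum(three_steps))
```

Because every row of the TPM sums to one, the total number of objects is preserved by each step, which is the closure property discussed above.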

The variance of the states is calculated by the following formula:

$$DS(t+1) = \tilde{P}^{T} S(t), \tag{2}$$

where the matrix $\tilde{P} = \{p_{ij}(1 - p_{ij})\}$.
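If the objects move independently, the number of objects passing from state $i$ to state $j$ in one step is binomial, so for a known current state the variance of the next count in state $j$ is $\sum_i S_i\, p_{ij}(1 - p_{ij})$. The sketch below computes this quantity under that assumption; it is an illustration with hypothetical two-state numbers and is not claimed to reproduce the authors' implementation.

```python
def state_variance(state, tpm):
    """Variance of next-step counts for independent objects.

    Component j is sum_i S_i * p_ij * (1 - p_ij).
    """
    n = len(state)
    return [
        sum(state[i] * tpm[i][j] * (1.0 - tpm[i][j]) for i in range(n))
        for j in range(n)
    ]


# Hypothetical 2-state example.
toy_state = [100, 50]
toy_tpm = [
    [0.9, 0.1],
    [0.2, 0.8],
]

variances = state_variance(toy_state, toy_tpm)
std_devs = [v ** 0.5 for v in variances]
print(variances)  # both components are close to 17
print(std_devs)
```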

The program implementing this model was written in C++.

The main problem in implementing the proposed model is the estimation of the transition probability matrix. The problem is solved decisively if additional information about the structure of transitions is available in the form of a matrix $M_{trans} = \{m_{ij}\}$, where $m_{ij}$ is the number of area units that passed from state $i$ to state $j$.

Such an approach is examined, for example, in [5] for forecasting the market shares of different brands. In this case it is obvious that the best estimate of the transition probability matrix is the frequency of transitions from state $i$ to state $j$:

$$\hat{p}_{ij} = \frac{m_{ij}}{\sum_{k=1}^{n} m_{ik}}. \tag{3}$$

The estimates obtained for each transition moment should be averaged over some time interval.
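The frequency estimate and its averaging over several transition moments can be sketched as follows. The transition counts are hypothetical (the article has no such matrix for the land data), and the function names `frequency_estimate` and `average_estimates` are illustrative.

```python
def frequency_estimate(m_trans):
    """Estimate the TPM as row-wise transition frequencies m_ij / sum_k m_ik."""
    estimate = []
    for row in m_trans:
        total = sum(row)
        if total == 0:
            raise ValueError("a state with no observed objects cannot be estimated")
        estimate.append([m / total for m in row])
    return estimate


def average_estimates(estimates):
    """Element-wise mean of several TPM estimates, one per transition moment."""
    k = len(estimates)
    n = len(estimates[0])
    return [
        [sum(e[i][j] for e in estimates) / k for j in range(n)]
        for i in range(n)
    ]


# Hypothetical transition counts for two consecutive periods.
counts_period_1 = [[90, 10], [20, 80]]
counts_period_2 = [[80, 20], [10, 90]]

p_hat = average_estimates(
    [frequency_estimate(counts_period_1), frequency_estimate(counts_period_2)]
)
print(p_hat)
```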

When the transition matrix $M_{trans}$ is not available, the TPM can still be estimated, though of course with less accuracy. One possible estimation algorithm consists in solving the following optimization problem:

$$\sum_{t} \sum_{i=1}^{n} \left( \frac{S_i(t) - ES_i(t)}{\bar{S}_i} \right)^{2} \to \min_{P}, \qquad p_{ij} \ge 0, \quad \sum_{j=1}^{n} p_{ij} = 1. \tag{4}$$

Here $S(t)$ is the vector of observed object states, $ES(t)$ is the vector of expected state values calculated by formula (1), and $\bar{S}_i$ is the observed average value of state $i$. The normalization brings the deviations of the observed values from the expected ones for different states to a single scale.
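The objective of this optimization problem can be sketched in Python using the data of Table 1. The sketch makes one assumption that the text does not spell out: the expected state for each year is computed by formula (1) from the observed state of the previous year. Only the objective is shown; the minimization itself is not reproduced, and the names `OBSERVED`, `expected_next` and `objective` are illustrative.

```python
YEARS = [2006, 2007, 2008, 2009]

# Table 1, thousand hectares, categories in the order of the table.
OBSERVED = {
    2006: [41722.2, 10503.7, 2467.5, 966.0, 1040.5, 1238.0, 2416.9],
    2007: [41675.9, 10539.9, 2470.2, 972.4, 1042.5, 1235.2, 2418.7],
    2008: [41650.0, 10556.3, 2476.6, 975.8, 1038.2, 1236.3, 2421.6],
    2009: [41625.8, 10570.1, 2489.0, 978.0, 1032.8, 1236.6, 2422.5],
}


def expected_next(state, tpm):
    """Formula (1): ES(t+1) = P^T S(t)."""
    n = len(state)
    return [sum(tpm[i][j] * state[i] for i in range(n)) for j in range(n)]


def objective(tpm, observed, years):
    """Sum of squared deviations, each normalized by the state's observed mean."""
    n = len(observed[years[0]])
    means = [sum(observed[y][i] for y in years) / len(years) for i in range(n)]
    total = 0.0
    for prev, cur in zip(years, years[1:]):
        # Assumption: the expectation starts from the observed previous year.
        expected = expected_next(observed[prev], tpm)
        for i in range(n):
            total += ((observed[cur][i] - expected[i]) / means[i]) ** 2
    return total


# The identity matrix ("nothing changes") is the starting point of the search.
identity = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(7)]
print(objective(identity, OBSERVED, YEARS))
```

A constrained minimizer would then vary the 49 entries of the matrix, keeping them non-negative and every row summing to one, to reduce this value.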

Solving optimization problem (4) in Microsoft Excel for the data of Table 1 over the maximal time interval (the squared deviations are calculated for 2007, 2008 and 2009, the observed values are averaged over 2006-2009, and the identity matrix is taken as the initial value of the TPM) gives the following TPM (Figure 1):

$$
P = \begin{pmatrix}
0{,}99922 & 0{,}00050 & 0{,}00014 & 0{,}00010 & 0{,}00000 & 0{,}00000 & 0{,}00005 \\
0{,}00005 & 0{,}99983 & 0{,}00012 & 0{,}00000 & 0{,}00000 & 0{,}00001 & 0{,}00000 \\
0{,}00000 & 0{,}00002 & 0{,}99998 & 0{,}00000 & 0{,}00000 & 0{,}00000 & 0{,}00000 \\
0{,}00006 & 0{,}00000 & 0{,}00005 & 0{,}99985 & 0{,}00000 & 0{,}00003 & 0{,}00000 \\
0{,}00000 & 0{,}00247 & 0{,}00000 & 0{,}00003 & 0{,}99749 & 0{,}00000 & 0{,}00002 \\
0{,}00000 & 0{,}00039 & 0{,}00000 & 0{,}00009 & 0{,}00001 & 0{,}99951 & 0{,}00001 \\
0{,}00001 & 0{,}00005 & 0{,}00001 & 0{,}00000 & 0{,}00000 & 0{,}00000 & 0{,}99993
\end{pmatrix}
$$

Fig. 1. Transition probability matrix obtained using the maximal time interval

This raises the question of how accurately the TPM is estimated by the indicated method, assuming that the statistical data are a sample from the proposed Markov model. The question is easily answered by the Monte Carlo method. As the initial land state we use the data for 2006, with subsequent changes generated in accordance with the obtained TPM. As the measure of divergence of the TPM estimate from the known TPM we take the standard deviation $\sigma_{err}$ (without normalizing by the number of elements):

$$\sigma_{err} = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} \left( \hat{p}_{ij} - p_{ij} \right)^{2}}. \tag{5}$$

Here $\hat{p}_{ij}$ are the estimated transition probabilities and $p_{ij}$ are the actual values of the transition probabilities.
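A divergence measure of this kind, the square root of the sum of squared element-wise differences with no division by the number of elements, can be sketched as follows. The two matrices are hypothetical, and the function name `sigma_err` simply mirrors the symbol used in the text.

```python
import math


def sigma_err(p_est, p_true):
    """Root of the summed squared differences between two TPMs (not normalized)."""
    return math.sqrt(
        sum(
            (est - true) ** 2
            for row_est, row_true in zip(p_est, p_true)
            for est, true in zip(row_est, row_true)
        )
    )


# Hypothetical example: the identity matrix as a crude estimate of a known TPM.
p_true = [[0.9, 0.1], [0.2, 0.8]]
p_est = [[1.0, 0.0], [0.0, 1.0]]

error = sigma_err(p_est, p_true)
print(error)
```

In a Monte Carlo study this value would be computed for each simulated data set and then summarized over the runs.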

In obtaining these estimates we found that normalizing by the observed average, as proposed in formula (4), gives larger errors than omitting the normalization; for some accumulation intervals the error increases more than fourfold. Figure 2 shows the dependence of the TPM estimation error on the number of years (with the normalization excluded from the objective function).

Fig. 2. Error in TPM estimation while solving the optimization task

The results show that the proposed method makes it possible to estimate the TPM with satisfactory accuracy. Moreover, the accuracy of the estimate depends only slightly on the number of years over which the estimation is carried out.

These results allow us to proceed to the main objective of this research: estimating the accuracy of forecasts of the land fund structure of Ukraine based on the proposed Markov model.

Undoubtedly, if a large volume of statistical information were available, for example homogeneous annual data on the land fund structure for 15-20 years, it would be sufficient to estimate the reliability of predictions by the proposed method. Nevertheless, a preliminary estimate of the prediction accuracy can be obtained from the data at hand.

To estimate the prediction accuracy, we compare the predicted land fund structure obtained by formula (1) with its actual value (Table 2). The calculation can be made for 2008 with the only possible variant of TPM estimation and for 2009 with two variants. The average absolute error is calculated by the following formula:

$$\bar{\Delta} = \frac{1}{n} \sum_{i=1}^{n} \left| S_i - ES_i \right|. \tag{6}$$

Table 2

Predicted values of land distribution by categories in 2008 and 2009 according to the data of different periods (predicted value and absolute deviation in thousand hectares)

| Land category | 2008 from 2006-2007: predicted value | Absolute deviation | Deviation, % | 2009 from 2006-2008: predicted value | Absolute deviation | Deviation, % | 2009 from 2007-2008: predicted value | Absolute deviation | Deviation, % |
|---|---|---|---|---|---|---|---|---|---|
| Agricultural lands | 41629,66 | 20,34 | 0,049% | 41613,94 | 11,86 | 0,028% | 41624,16 | 1,64 | 0,004% |
| Forestry area | 10576,05 | -19,75 | -0,187% | 10582,49 | -12,39 | -0,117% | 10572,55 | -2,45 | -0,023% |
| Built-up lands | 2472,89 | 3,71 | 0,150% | 2481,09 | 7,91 | 0,318% | 2482,85 | 6,15 | 0,247% |
| Open waterlogged lands | 978,80 | -3,00 | -0,307% | 980,65 | -2,65 | -0,271% | 979,04 | -1,04 | -0,106% |
| Open lands without plant cover or with insignificant plant cover | 1044,50 | -6,30 | -0,607% | 1037,27 | -4,47 | -0,433% | 1034,62 | -1,82 | -0,176% |
| Other lands | 1232,42 | 3,88 | 0,314% | 1235,45 | 1,15 | 0,093% | 1237,25 | -0,65 | -0,052% |
| Water | 2420,48 | 1,12 | 0,046% | 2423,89 | -1,39 | -0,057% | 2424,33 | -1,83 | -0,076% |
| Average value | | 8,30 | 0,096% | | 5,98 | 0,069% | | 2,23 | 0,026% |
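The error figures in the first block of Table 2 can be re-derived from the actual 2008 areas in Table 1 and the predicted values. The sketch below assumes that each deviation is the actual value minus the predicted one, that the percentage is taken relative to the actual area, and that the average percentage is the summed absolute deviation relative to the total area; these readings match the tabulated numbers but are not stated explicitly in the text. All names are illustrative.

```python
# Actual areas for 2008 (Table 1) and predictions from 2006-2007 data (Table 2),
# in thousand hectares, categories in the order of the tables.
ACTUAL_2008 = [41650.0, 10556.3, 2476.6, 975.8, 1038.2, 1236.3, 2421.6]
PREDICTED_2008 = [41629.66, 10576.05, 2472.89, 978.80, 1044.50, 1232.42, 2420.48]


def deviations(actual, predicted):
    """Signed deviation for each category: actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_absolute_error(actual, predicted):
    """Average of the absolute deviations over the categories."""
    devs = deviations(actual, predicted)
    return sum(abs(d) for d in devs) / len(devs)


def relative_deviations(actual, predicted):
    """Deviation of each category as a percentage of its actual area."""
    return [100.0 * (a - p) / a for a, p in zip(actual, predicted)]


mae = mean_absolute_error(ACTUAL_2008, PREDICTED_2008)
rel = relative_deviations(ACTUAL_2008, PREDICTED_2008)
# Summed absolute deviation as a percentage of the total land area.
avg_percent = 100.0 * sum(abs(d) for d in deviations(ACTUAL_2008, PREDICTED_2008)) / sum(ACTUAL_2008)

print(round(mae, 2), round(avg_percent, 3))
print([round(r, 3) for r in rel])
```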

The data obtained (Table 2) show that the land fund structure of Ukraine can be predicted with adequate accuracy using the Markov chain model. The relative prediction error for individual land categories is about 0,6% at most, and the average error is not more than 0,1%. To obtain a reliable prediction, data for no more than the three previous years should be used; it is possible that using the data of the two previous years is optimal, as the results of predicting the land structure for 2009 demonstrate.

To sum up, we present forecasts of the land fund structure of Ukraine for 2010 obtained with different intervals of TPM estimation (Table 3).

Table 3

Comparative predicted distribution of land by categories in 2010, obtained for three estimation periods (thousand hectares)

| Land category | 2006-2009 | 2007-2009 | 2008-2009 |
|---|---|---|---|
| Agricultural lands | 41594,0 | 41600,4 | 41601,1 |
| Forestry area | 10591,9 | 10584,4 | 10583,3 |
| Built-up lands | 2496,1 | 2497,8 | 2500,9 |
| Open waterlogged lands | 982,0 | 980,1 | 979,7 |
| Open lands without plant cover or with insignificant plant cover | 1030,3 | 1032,0 | 1030,4 |
| Other lands | 1236,1 | 1236,5 | 1236,4 |
| Water | 2424,4 | 2423,6 | 2422,9 |
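As a rough cross-check, the 2006-2009 column of Table 3 can be approximated by applying formula (1) to the 2009 areas of Table 1 with the matrix of Figure 1. The sketch assumes that the rows and columns of the matrix follow the category order of Table 1; because the published matrix is rounded to five decimals, the result agrees with the table only to within a few tenths of a thousand hectares. The names below are illustrative.

```python
# Figure 1: TPM estimated on the maximal interval (rows = from, columns = to).
FIG1_TPM = [
    [0.99922, 0.00050, 0.00014, 0.00010, 0.00000, 0.00000, 0.00005],
    [0.00005, 0.99983, 0.00012, 0.00000, 0.00000, 0.00001, 0.00000],
    [0.00000, 0.00002, 0.99998, 0.00000, 0.00000, 0.00000, 0.00000],
    [0.00006, 0.00000, 0.00005, 0.99985, 0.00000, 0.00003, 0.00000],
    [0.00000, 0.00247, 0.00000, 0.00003, 0.99749, 0.00000, 0.00002],
    [0.00000, 0.00039, 0.00000, 0.00009, 0.00001, 0.99951, 0.00001],
    [0.00001, 0.00005, 0.00001, 0.00000, 0.00000, 0.00000, 0.99993],
]

# Table 1, year 2009, thousand hectares.
STATE_2009 = [41625.8, 10570.1, 2489.0, 978.0, 1032.8, 1236.6, 2422.5]

# Table 3, column 2006-2009, thousand hectares (for comparison only).
TABLE3_2006_2009 = [41594.0, 10591.9, 2496.1, 982.0, 1030.3, 1236.1, 2424.4]


def expected_next(state, tpm):
    """Formula (1): ES(t+1) = P^T S(t)."""
    n = len(state)
    return [sum(tpm[i][j] * state[i] for i in range(n)) for j in range(n)]


forecast_2010 = expected_next(STATE_2009, FIG1_TPM)
for computed, published in zip(forecast_2010, TABLE3_2006_2009):
    print(round(computed, 1), published)
```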

Despite the insufficient amount of data (as already mentioned above), the authors venture to state that the most accurate prediction of the land fund structure of Ukraine for 2010 is given in the last column of Table 3.

Thus, the proposed approach to applying Markov chains, i.e. constructing the matrix of probabilities of the system's transitions from one state to another by solving a nonlinear programming problem, can be used for predicting and analyzing possible variants of the further development of socioeconomic systems: from the structure of the land fund of a state or region, or processes taking place in the market system, to forecasting the demographic situation or the situation on the labour market.

**References:**

1. Dent, Warren and Ballintine, Richard (1971). "A review of the estimation of transition probabilities in Markov chains." The Australian Journal of Agricultural Economics. Vol. 15, No. 2, p. 69-81.
2. Grimshaw, Scott D. and Alexander, William P. (2010). "Markov Chain Models for Delinquency: Transition Matrix Estimation and Forecasts." Applied Stochastic Models in Business and Industry. n/a. doi: 10.1002/asmb.827.
3. Goncharenko I. V., Lopushanskaya V. V. (2009). "Prognozirovaniye predpochteniy v vybore sfery professionalnoy deyatelnosti [The forecasting of preferences in profession choice]". "Sistemny analiz i prognozirovaniye ekonomiki". Minsk: Izd-vo BATU. 200-204.
4. Kopytko V.I. (2004). "Naukovo-praktychni rekomendatsiyi z vykoristannyam ekonomiko-matematychnykh metodiv v umovakh reformuvannya regionalnikh APK [Practical recommendations for mathematical methods appliance by APC-reforming conditions]". Kyiv: Navch.-koord. tsentr doradchykh sluzhb.
5. Kravchenko V.N. "Prognozirovaniye rynochnykh doley torgovykh marok [The Forecasting of Brands Marketshares]". Retrieved from: http://modeling.at.ua/publ/2-1-0-37.
6. Zhluktenko V.I., Begun A.V. (2005). "Stokhastychni modeli v ekonomitsi [Stochastic models in Economics]". Kyiv: KNEU.
7. Berezhnaya H.V., Berezhnoy V.E. (2006). "Matematicheskiye metody modelirovaniya ekonomicheskih sistem [Mathematical methods of economic system modeling]". Moscow: Finansy i statistika.
8. Labsker L. (2002). "Veroyatnostnoye modelirovaniye v finansovo-ekonomicheskoy oblasti [Stochastic modeling in economic and finances]". Moscow: Alpina Pablisher.
9. Raytsin V. Ya. (2005). "Modelirovaniye sotsialnykh protsessov [The Social processes Modeling]". Moscow: Ekzamen.
10. Romanovskiy I. V. (2003). "Diskretnyy analiz: Uchebnoye posobiye dlya studentov, spetsializiruyushchikhsya po prikladnoy matematike i informatike [Discrete analysis: Textbook for students specializing in applied mathematics and informatics]". St. Petersburg: Nevskiy Dialekt.
11. Center for Land Reform Policy in Ukraine (2010). Land resources of Ukraine. Retrieved from: http://www.myland.org.ua/index.php?id=2080