- About project
- Results and Awards
- Affiliate Programs
- International services
Medvedchuk Vira, postgraduate student
Alla Lazoriv , applicant
Ivano-Frankivsk National Technical University of Oil and Gas, Ukraine
Championship participant: the National Research Analytics Championship - "Ukraine";
the Open European-Asian Research Analytics Championship;
The most wasteful operations according to the analysis of the empirical models construction algorithm are operations of solving linear systems of algebraic equations and calculating of output of a system. These operations are performed repeatedly. Therefore, in order to reduce the computation time, parallelization of the algorithm for synthesis of optimal complexity models is proposed.
Keywords: model, gene, chromosome, adaptation criterion, system of equations, analysis of an algorithm.
To construct empirical models it is commonly to use the method of least squares. The inductive method of self-organization of models, proposed by Ivakhnenko and developed by his followers, is close to it. For the implementation of these methods, it is assumed that, exact values of physical quantities, which are variables of models, are given. It is obvious that this assumption idealizes conditions of observing operating object, because any physical quantity is measured with some error.
The problem of finding the object empirical model is considered in case of exact values of the output and inputs are unknown, because of a skewing of values of the output and inputs during their measurement performed with a certain error. If or is the input of the measuring device, the output is or . Namely:
where , are observational errors of and , is the number of object’s inputs.
It is now well known in probability and statistics that random variablesare independent random variables, which have the normal law of distribution.
Since are true values of measurement values and mathematical expectations of observational errors , then are mathematical expectations of random variables . During the period of object observation the set of values has been received, where is the number of experimental points. If are independent random variables and they obey the normal law of distribution, then their common law of distribution is normal too . So the formula for the density distribution of the fixed value can be represented as:
where is the inverse covariance matrix, are elements of the matrix.
With equations (1) and (2) . If random variables taken together are statistically independent, then the equation (3) can be represented as follow:
Now the law of distribution for all values can be found. According to previously made assumptions.
According to the equation (4):
According to the formula (5):
Therefore the estimation of demand parameters of the empirical model
is obtained according to the condition for minimum of the functional
During measurement of variables metrological characteristics of measuring instruments change very slowly. So it is assumed that during the time of observation
Considering above mentioned conditions functional (7) can be represented as:
where are normalized variables.
Functional (8) can be written as follow:
It is obtained according to the equation (2). If , then .
Considering the fact that , it can be written .
In most cases when , the result is obtained by minimization of the functional (9):
where is a vector which components are measured values ;
To construct an empirical model it is usually to use a polynomial dependency as follow:
where orders of arguments are chosen to be equal 0, 1, 2,… and satisfy the constraint ; is the maximum order of the polynomial (11).
The resulting matrix-vector equation(10) can be represented as a linear system of equations with , which solution can be found by using any computational method.
To build the empirical model (11) genetic algorithms  has to be applied in order to find not only model parameters but also to determine its structure. Genetic algorithms are an approach of artificial intelligence search algorithms based on evolutionary theory.
The sequence of order structures is created and it can be represented by zeros or ones on ith place depending on the parameterof the model (11)is equal to zero or not. This sequence of order elements is called a chromosome and its elements are genes. A set of chromosomes forms a population. The major role in genetic algorithms plays a fitness function, which is available to assign the fitness value for each individual. Based on it individuals with better fitness values are discovered. It means that a solution with the lowest fitness function value or the highest fitness function value is searched depending on a fitness function is minimized or maximized. The regularity criterion  is a criterion of selection and it is chosen as the fitness function in the problem of synthesis of optimal complexity empirical models.
So the problem of synthesis of optimal complexity empirical models is considered as the choice of a chromosome with the best fitness value from an initial population by using the evolutionary selection.
If the model (11) contains more than 200 parameters, it would take a long time to get the model solution (10). Therefore, in order to reduce the machine time of solving of the system of equations (10) the parallelization of Gaussian method was proposed.
Normal equation (10) can be written as:
Based on Gaussian elimination where the system (12) can be transformed into an upper diagonal matrix, the equation (14) is solved by using following formulas :
where , are ith and kth rows of the augmented matrix obtained after appending the column vector to the matrix , where с is the number of null genes in the chromosome.
The essence of this algorithm is as follows. The matrix is placed in the workspace of the first process (the process of a master) during the initialization stage (initial stage). The first step is sending equal layers of the matrix by the master to other processes (workers) as possible as each process within the first one gets submatrices , where is the total number of processes. After sending equal layers, the master normalizes the first row of his submatrix by using the equation (13) and sends the value to other workers. Workers receiving the row corresponding to formula (14), where , recount elements of their submatrices including the element . At the same time, the master starting from the second row calculates new values by using the formula (14). As the result of such conversion, submatrices are received by workers in which first columns are equal to zero. The matrix is received with first column elements to be equal to zero, except the first one , which is equal to one.
The process of submatrices elements modification is performed cyclically following to the scheme until the master completes transforming the matrix into an upper diagonal matrix. As a result of the first cycle the upper rectangular matrix with ones on the main diagonal is saved in the master’s workspace and submatrices are received in workers’ workspaces. The number of columns with all zero values is determined by the number of rows of the master’s matrix in those submatrices.
Workers send their submatrices to the master and they are to be combined at the beginning of the second cycle. The combined submatrix is divided into q parts by the master, which are sent to all workers including himself. Next calculations are performed according to the scheme, which are analogous to the first cycle.
The algorithm continues running until the size of the next sending layer is not exceed the number of processes. Then the remaining submatrix is transformed into the upper rectangular matrix with ones on the main diagonal.All matrices saved in the master’s workspace are combined at the final stage.
Based on the results of previously made algorithm analysis the waste of machine time of parallel algorithm implementation is defined as:
where is the total number of operations of division and subtraction performed by the master and ith worker at the calculation rth cycle; is the number of rows of the master’s submatrix in the rth cycle; is the number of operations of division and subtraction at the last calculation cycle; are wastes of machine time spent on performing operations .
When the master divides the matrix into q layers of equal size, the upper estimate of wastes of machine time of parallel algorithm implementation is obtained as:
In general, the value does not belong to the set of natural numbers. Therefore, it should be rounded to the nearest integer.
Values of the formula (15) are calculated as:
where is the number of rows of the master’s submatrix in the rth calculation cycle.
The following number of operations of division and subtraction performed by the master and each ith worker after finishing the rth calculation cycle is obtained like this:
The synthesis of optimal complexity empirical modelsbased on genetic algorithms requires the repeated solution of the system of linear equations. Therefore, the matrix is transformed into the upper triangular matrix and the most of the machine time is wasted on it. Thus, the parallel algorithm application of the transfer upper triangular matrix gives a significant time advantage compared to the serial algorithm.
1. Prokhorov, Yu. V. , & Rozanov, Yu. A. (1973). Probality Theoty. Moskow, M: Nauka.
2. Horbiychuk, M. I., Kohutyak, M. I., Vasulenko, O. B., Chupak, I. V. (2009). The method of the empirical model synthesis based on genetic algorithms. Journal of Oil and Gas Exploration and Development, 4 (33), 72-79.
3. Ivahnenko, A. H., Koppa, Yu. V., Stepashko, V. S. (Eds.). (1980). Handbook of standard simulation programs. Kyiv, K.: Tehnika.
4. Volkov, E. A. (1987). Numerical methods. Moskow, M: Nauka.