Parallel seasonal approach for electrical load forecasting.

July 6, 2017 | Author: Oussama Ahmia | Categories: Data Mining, Clustering and Classification Methods, Self-adaptive Software

Parallel seasonal approach for electrical load forecasting. Oussama Ahmia, Nadir Farah. LABGED Laboratory Université Badji Mokhtar Annaba, Département d’Informatique, Bp 12, El Hadjar, 23000, Algeria {ahmia, farah}@labged.net

Abstract. Electrical load forecasting is essential for electricity distribution companies, which must determine future power demand in the short, medium and long term. To keep predictions reliable, various parameters such as GDP and weather are taken into account. This paper addresses medium- and long-term forecasting of the Algerian electrical load. The forecast uses the information contained in past consumption, in a parallel approach where each season is forecast separately. Three models are implemented: multiple linear regression, an MLP (multilayer perceptron) artificial neural network, and SVR (support vector regression) with a grid search algorithm for hyperparameter optimization. All models are evaluated on real energy consumption records. The proposed approach can be useful in the elaboration of energy policies, since accurate predictions of energy consumption positively affect capital investment while preserving security of supply. In addition, it can be a precise tool for the Algerian mid- to long-term energy consumption prediction problem, which to date has not been addressed effectively. The results are very encouraging and were accepted by the local electricity company.

Keywords: Support vector machines · MLP neural network · linear regression · Comparative methods · Kernel · RBF · Pearson VII · electricity demand

1 Introduction

Electric energy is an important factor in the economic and social development of a country, and therefore in people's wealth and everyday life. Energy consumption forecasts are necessary for studies of energy supply strategy and industrial investment. In addition, an accurate prediction helps to reduce loss of load or overproduction [1]. What really matters for power companies is to determine the consumption peak [2] per day, per month, per season or per year. Given these peaks, a dynamic load profile on a monthly, weekly or yearly scale can be applied to obtain a consumption forecast for the period that the profile covers. Put another way, using only the peak and a profile, we can deduce the daily or hourly load values.

One of the components of a time series is seasonal variation, which corresponds to a periodic phenomenon [3]. Power consumption is divided into four seasons [4]. There are three voltage levels in power consumption: low voltage, used by households; medium voltage, used by small industries; and high voltage, used by large companies such as steel industries and soaking water industries. GDP (Gross Domestic Product) is an important parameter for forecasting the medium- and high-voltage load, because a country's economy relies in large part on power consumption. Since every transformation of resources into goods and services requires power, economic production clearly depends on power production [5].

Our work consists in finding a model that fits the consumption curve while accounting for the seasonal variation of consumption. To this end, we choose a parallel approach [6] and compare different models in order to establish which one gives the most accurate prediction. These models are linear regression, the MLP (multilayer perceptron) neural network with different architectures, and support vector regression (SVR) with different kernel functions (polynomial, radial basis function (RBF) and the Pearson VII universal kernel (PUK)).

The paper is organized as follows. The second section presents related work. The methods used are discussed briefly in the third section. The fourth section presents the methodology and preprocessing. The fifth section reports experiments and results. Finally, the conclusion is given in the last section.

Proceedings ITISE 2015. Granada 1-3, july, 2015

2 Related work

Different methods and approaches have been proposed in recent decades for forecasting long- and mid-term electric load demand. Many of them rely on time series analysis with statistical methods such as linear regression. Bianco [7] used multiple linear regression with GDP (Gross Domestic Product) and population as exogenous variables to forecast electricity consumption in Italy. The paper presents the different models used; the results were globally good, with an error rate varying between 0.11% and 2.4%. Nezzar [8] also used a linear regression approach for mid- to long-term Algerian electric load forecasting, using historical information and GDP. That paper has two main parts: the first deals with annual forecasting for the national grid, and the second with the use of load profiles to make dynamic predictions and obtain a global load matrix.

Renuka Achanta [9] applied support vector regression to a real-world dataset for the state of Alaska, published on the web by the United States Energy Information Administration (EIA). In that work, the performance of support vector machines (SVM) was compared with that of the MLP for various models. The results show that the SVM performs better than neural networks trained with the backpropagation algorithm, and the author concluded that, with proper parameter selection, support vector machines can replace some of the neural-network-based models for electric load forecasting.

On the other hand, there are intelligent methods such as artificial neural networks (ANN). Ekonomou [10] used a multilayer perceptron (MLP) to predict Greek long-term energy consumption. Several MLP architectures were tested and the one with the best generalizing ability was selected. The results of the selected ANN model were compared with linear regression and ε-support vector regression: they were much more accurate than those obtained by the linear regression model and similar to those obtained by the support vector machine model. Kandananond [11] also used an artificial neural network approach to forecast electricity demand in Thailand. Three methodologies were used in that paper: autoregressive integrated moving average (ARIMA), artificial neural network (ANN) and multiple linear regression (MLR). The objective was to compare the performance of these three approaches on historical data related to electricity demand (population, gross domestic product (GDP), stock index, revenue from exporting industrial products, and electricity consumption). The results, based on error measurement, showed that the ANN model outperforms the other approaches.

3 Models used

3.1 Multivariable Regression

Regression analysis is a statistical tool for investigating relationships between variables. Usually, we want to know the causal effect of one variable upon another, for example the effect of a GDP increase upon demand. To explore such issues, we assemble data on the underlying variables of interest and employ regression to estimate the quantitative effect of the causal variables upon the variable that they influence. Regression techniques have been central to the field of economic statistics ("econometrics"). Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to the observed data [12].

In this work, multiple linear regression is used to forecast the load. Every value of the independent variables is associated with a value of the dependent variable. Formally, the multiple linear regression model, given N observations, is:

$$Y(k) = \alpha_0 + \sum_{i=1}^{\lambda} a_i \chi_i(k) + \varepsilon_k, \quad k = 1, 2, \ldots, N \tag{1}$$

where $Y(k)$ is the estimated load, $\chi_i$ are the explanatory variables, $\alpha_0$ and $a_i$ are the regression coefficients, $\lambda$ is the number of explanatory variables, and $\varepsilon_k$ is the model deviation.

3.2 Artificial Neural Network

An artificial neural network (ANN) is a machine learning approach inspired by the way in which the brain performs a particular learning task. ANNs are modeled on the human brain and consist of a number of artificial neurons. Neurons in ANNs tend to have fewer connections than biological neurons. Each neuron receives a number of inputs, each with its own weight. An activation function is applied to the weighted inputs, which yields the activation level of the neuron. There are different classes of network architectures: single-layer feed-forward, multi-layer feed-forward and recurrent [13].

Multilayer perceptron (MLP): A multilayer perceptron is a special case of artificial neural network that consists of one or more layers of computation nodes arranged in a directed graph. The signal propagates through the network layer by layer, with each layer fully connected to the next one. The input layer consists of input nodes (sensory units). Each computation node is a neuron (or processing element) with a nonlinear activation function; the sigmoid function is widely used to generate the output activation of the computation nodes [14]. In general, MLPs are trained with the backpropagation algorithm, for both classification and regression tasks.

Fig. 1. Multilayer perceptron (MLP).
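The layer-by-layer propagation described above can be sketched in a few lines of Python. This is only an illustration of a forward pass, not the network trained in this paper: the layer sizes, the random initial weights, the helper names (`init_layer`, `forward`) and the choice to apply the sigmoid on the output node as well are all assumptions, and backpropagation training is not shown.

```python
import math
import random


def log_sigmoid(z):
    """Logistic (log-sigmoid) activation, mapping z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_inputs, n_neurons, rng):
    """Create one fully connected layer: a weight vector and a bias per neuron.

    The uniform(-1, 1) initialisation is an arbitrary choice for this sketch.
    """
    return [
        {
            "weights": [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)],
            "bias": rng.uniform(-1.0, 1.0),
        }
        for _ in range(n_neurons)
    ]


def forward(layers, inputs):
    """Propagate the inputs through every layer and return the last activations."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            log_sigmoid(
                sum(w * a for w, a in zip(neuron["weights"], activations))
                + neuron["bias"]
            )
            for neuron in layer
        ]
    return activations


# Illustrative network: 3 inputs, two hidden layers, 1 output node.
rng = random.Random(0)
network = [init_layer(3, 5, rng), init_layer(5, 4, rng), init_layer(4, 1, rng)]
prediction = forward(network, [0.2, 0.5, 0.8])  # inputs already scaled to [0, 1]
```

Because every node uses the sigmoid, the output lies strictly between 0 and 1, which is one reason the inputs and targets of such a network are usually rescaled to that range.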

3.3 Support Vector Machines

The support vector machine (SVM) is a statistical learning algorithm developed by Vladimir Vapnik and co-workers [15]. SVMs are based on the concept of decision planes that define decision boundaries. A decision plane separates sets of objects having different class memberships. The basic idea is to map the original data into a high-dimensional feature space through a nonlinear mapping function and to construct an optimal hyperplane in that new space. SVMs can be applied to both classification and regression. In classification, an optimal hyperplane is found that separates the data into two classes, whereas in regression a hyperplane is constructed that lies close to as many points as possible [16]. The key characteristics of SVMs are the use of kernels, the absence of local minima, the sparseness of the solution and the capacity control obtained by optimizing the margin.

Support Vector Regression (SVR): The SVR task is to find a function that can correctly predict new cases that the SVM has not been presented with before. This is achieved by training the SVM model on a sample set (the training set), a process that involves, as in classification, the sequential optimization of an error function; the resulting model depends on how this error function is defined [16].

A number of kernels can be used in support vector machine models, including linear, polynomial, radial basis function (RBF) and sigmoid kernels. Table 1 summarizes the kernels considered here.

Table 1. Kernels summary.

| Kernel | Function | Comment |
|---|---|---|
| Polynomial | $[1 + (X \cdot X_i)]^{p}$ | Power $p$ is specified by the user |
| RBF | $\exp\left(-\frac{1}{2\sigma^{2}} \lVert X - X_i \rVert^{2}\right)$ | The width $\sigma^{2}$ is specified by the user |
| PUK (Pearson VII) | $f(x) = \dfrac{H}{\left[1 + \left(\dfrac{2(x - x_0)\sqrt{2^{1/\omega} - 1}}{\sigma}\right)^{2}\right]^{\omega}}$ | See below |

In the Pearson VII function, $H$ is the peak height at the center $x_0$ of the peak, and $x$ represents the independent variable. The parameters $\sigma$ and $\omega$ control the half-width and the tailing factor of the peak.
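As a rough illustration of Table 1, the three kernels can be written as plain Python functions. The function and parameter names are hypothetical. For PUK, the sketch assumes the usual way of turning the scalar Pearson VII function into a kernel between two vectors: the Euclidean distance between the vectors takes the place of $x - x_0$ and the peak height $H$ is set to 1. The table itself only gives the scalar form, so that step is an assumption.

```python
import math


def polynomial_kernel(x, xi, p=2):
    """Polynomial kernel [1 + (x . xi)]^p with user-chosen power p."""
    dot = sum(a * b for a, b in zip(x, xi))
    return (1.0 + dot) ** p


def squared_distance(x, xi):
    """Squared Euclidean distance ||x - xi||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, xi))


def rbf_kernel(x, xi, sigma=1.0):
    """RBF kernel exp(-||x - xi||^2 / (2 sigma^2))."""
    return math.exp(-squared_distance(x, xi) / (2.0 * sigma ** 2))


def puk_kernel(x, xi, sigma=1.0, omega=1.0, height=1.0):
    """Pearson VII function applied to the distance between x and xi.

    Assumption: ||x - xi|| replaces (x - x0) and height (H) defaults to 1.
    """
    distance = math.sqrt(squared_distance(x, xi))
    scaled = 2.0 * distance * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return height / (1.0 + scaled ** 2) ** omega


# Two identical points give the maximum similarity for RBF and PUK.
same_rbf = rbf_kernel([1.0, 2.0], [1.0, 2.0])
same_puk = puk_kernel([1.0, 2.0], [1.0, 2.0])
```

With this form, two points at distance $\sigma/2$ give a PUK value of $H/2$ whatever the value of $\omega$, which is why $\sigma$ acts as the width of the peak while $\omega$ only changes the shape of its tails.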

4 Methodology

Power consumption in Algeria is divided into four seasons, and the electric load values within each season follow roughly the same evolution path. To improve forecast precision, we divided the dataset into four parts, each containing the monthly electric load values (peaks) of a different season, and built a model on the dataset of each season. The problem is thus divided into four sub-problems. In other words, we have a parallel approach, as shown in Figure 2, with four ANNs, four SVRs with RBF kernel, and so on.

The forecasting models are built using historical information. Different approaches are used to find the best monthly forecast model by choosing and testing different input information:

• The electric load values of the same month of the previous years (PMY), e.g. $Y(y, m) = f(Y(y-1, m), Y(y-2, m), \ldots)$
• The electric load values of the previous months (PM), e.g. $Y(y, m) = f(Y(y, m-1), Y(y, m-2), \ldots)$
• A combination of both, e.g. $Y(y, m) = f(Y(y-1, m), Y(y-2, m), \ldots, Y(y, m-1), Y(y, m-2), \ldots)$

where $y$ is the year and $m$ is the month.

Fig. 2. Diagram illustrating an example of prediction of monthly load for the year 2012.
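The seasonal split and the PMY/PM inputs can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not list which months belong to each season, so the grouping in `SEASONS` is a placeholder; the previous month of January is taken to be December of the previous year; and `peaks`, `build_features` and `seasonal_datasets` are hypothetical names. The load values in the usage lines are synthetic.

```python
from collections import defaultdict

# ASSUMPTION: placeholder season definition, not taken from the paper.
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
}


def season_of(month):
    """Return the season name that contains the given month (1-12)."""
    for name, months in SEASONS.items():
        if month in months:
            return name
    raise ValueError("month must be between 1 and 12")


def previous_month(year, month):
    """Step one month back, wrapping January to December of the previous year."""
    return (year, month - 1) if month > 1 else (year - 1, 12)


def build_features(peaks, year, month, n_pmy, n_pm):
    """Return [Y(y-1,m), ..., Y(y-n_pmy,m), Y(y,m-1), ..., Y(y,m-n_pm)].

    Returns None when any required past value is missing from `peaks`.
    """
    keys = [(year - k, month) for k in range(1, n_pmy + 1)]
    y, m = year, month
    for _ in range(n_pm):
        y, m = previous_month(y, m)
        keys.append((y, m))
    if any(key not in peaks for key in keys):
        return None
    return [peaks[key] for key in keys]


def seasonal_datasets(peaks, n_pmy, n_pm):
    """Build one list of (features, target) pairs per season."""
    datasets = defaultdict(list)
    for year, month in sorted(peaks):
        features = build_features(peaks, year, month, n_pmy, n_pm)
        if features is not None:
            datasets[season_of(month)].append((features, peaks[(year, month)]))
    return dict(datasets)


# Synthetic monthly peaks for 2000-2012, for demonstration only.
peaks = {
    (y, m): 1000.0 + 50.0 * (y - 2000) + 10.0 * m
    for y in range(2000, 2013)
    for m in range(1, 13)
}
datasets = seasonal_datasets(peaks, n_pmy=2, n_pm=1)  # the "2 PMY + PM" inputs
```

Each entry of `datasets` would then be used to train its own model, one per season, which is the parallel structure of Figure 2.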

Because there is a unidirectional causal relationship between economic growth (GDP growth) and energy consumption, with causality running from economic growth to energy consumption [17], we add GDP as an exogenous variable. Since future GDP values are unknown, we predict them with a power regression model (an autoregressive model), given by equation (2):

$$F(X_t) = 1.8056\, X_{t-1}^{0.9349} \tag{2}$$
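Equation (2) can be applied recursively to project GDP several years ahead, each projected value feeding the next step. The sketch below uses the coefficients of equation (2); the function names and the starting value are hypothetical, and the unit of GDP is whatever unit the model was fitted in, which is not stated here.

```python
def next_gdp(previous_gdp, a=1.8056, b=0.9349):
    """One step of the power model F(X_t) = a * X_{t-1} ** b from equation (2)."""
    return a * previous_gdp ** b


def project_gdp(last_known_gdp, n_years):
    """Iterate the model n_years times, starting from the last known value."""
    projections = []
    current = last_known_gdp
    for _ in range(n_years):
        current = next_gdp(current)
        projections.append(current)
    return projections


# Hypothetical starting value, for illustration only.
future_gdp = project_gdp(100.0, 3)
```

Note that errors accumulate in this kind of recursive projection, because every forecast beyond the first year is computed from another forecast rather than from an observed value.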

Our interest is focused on the monthly forecast, but GDP is annual data. To obtain the corresponding value for each month, we use the load profile [8], that is, the percentage that the monthly power value represents relative to the calculated annual peak (load factor). For example, if the profile value for a month is 80%, the monthly GDP (GDPm) is calculated by equation (3):

$$GDP_m = 0.8\,(GDP) \tag{3}$$

4.1 Data Selection and Preprocessing

This study uses Algerian national electricity consumption data from 2000 to 2012, provided by the national electricity company, in order to compare the forecasting performance of SVR models with that of ANN and linear regression models. The data cover the national electricity consumption for each hour of that period. Using annual, weekly and daily profiles, we calculated the electrical load values; the monthly peak was then deduced by taking the maximum load value in the month considered.

The data used by the ANN and the SVR (PUK, RBF) have to be normalized to a scale from 0 to 1. This normalization helps the network to converge and to produce meaningful results:

$$\mathrm{Normalized}(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} \tag{4}$$

where $E_{min}$ is the minimum expected value for variable $E$ and $E_{max}$ is the maximum expected value for variable $E$. If $E_{max}$ is equal to $E_{min}$, then $\mathrm{Normalized}(e_i)$ is set to 0.5.
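Equation (4), including the special case where the minimum and maximum coincide, can be sketched as below. The function names are hypothetical. When the bounds are not supplied, the sketch falls back to the minimum and maximum of the given values, which is an assumption; the text speaks of expected bounds, which could also be fixed in advance. The inverse mapping is added because forecasts made on the normalized scale have to be converted back to load units.

```python
def normalize(values, e_min=None, e_max=None):
    """Scale values to [0, 1] following equation (4).

    If e_min and e_max are not given, they are taken from the data (assumption).
    """
    e_min = min(values) if e_min is None else e_min
    e_max = max(values) if e_max is None else e_max
    if e_max == e_min:
        # Degenerate case described in the text: every value maps to 0.5.
        return [0.5 for _ in values]
    span = e_max - e_min
    return [(v - e_min) / span for v in values]


def denormalize(scaled, e_min, e_max):
    """Map values from [0, 1] back to the original scale."""
    return [s * (e_max - e_min) + e_min for s in scaled]


scaled_loads = normalize([2.0, 4.0, 6.0])
```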

4.2 Variables Considered

There is a relation between current and past electric load. For this reason, eight combinations of past information (monthly peak loads of preceding months) are used as input variables to predict the future peak load. To forecast a month, we build a separate model for each combination:

• (2 PMY + 2 PM): the same month of the two previous years and the electric load values of the two previous months (four variables).
• (2 PMY + PM): the same month of the two previous years and the electric load value of the previous month (three variables).
• (2 PMY): the same month of the two previous years (two variables).
• (3 PMY): the same month of the three previous years (three variables).
• (4 PMY + 2 PM): the same month of the four previous years and the electric load values of the two previous months (six variables).
• (4 PMY + 3 PM): the same month of the four previous years and the electric load values of the three previous months (seven variables).
• (4 PMY + PM): the same month of the four previous years and the electric load value of the previous month (five variables).
• (4 PMY): the same month of the four previous years (four variables).

4.3 Design of the proposed SVR and MLP ANN model

To find the neural network architecture with the best generalizing ability, several MLP models were developed and tested. These structures had from 1 to 5 hidden layers and from 2 to 10 neurons in each hidden layer. The model with the best generalizing ability had a compact structure with the following characteristics: 2 hidden layers, with 5 and 4 neurons respectively, trained with the backpropagation learning algorithm and using the logarithmic sigmoid transfer function.

Parameter optimization considerably increases the accuracy of the peak forecast [18-20]. For this reason, the SVR parameters were chosen by a grid search algorithm [21] in order to obtain better accuracy.
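Grid search itself is simple: every combination of candidate parameter values is evaluated and the combination with the lowest validation error is kept. The sketch below shows only that search loop. The parameter names `C` and `gamma`, the candidate values and the `toy_validation_error` function are all hypothetical; the toy function merely stands in for "train an SVR with these settings and return its validation MAPE", which is not implemented here.

```python
import itertools
import math


def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid and return (best_params, best_score).

    param_grid maps a parameter name to a list of candidate values.
    evaluate(**params) must return an error to be minimised.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(C, gamma):
    """Stand-in error surface with its minimum at C=10, gamma=0.01 (NOT a real SVR)."""
    return (math.log10(C) - 1.0) ** 2 + (math.log10(gamma) + 2.0) ** 2


# Candidate values on a logarithmic scale, a common choice for such grids.
grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_validation_error)
```

The cost grows with the product of the list lengths (16 evaluations here), so in practice the grid is kept coarse or refined around the best point found.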

5 Results and Discussion

First, we train the models on the full dataset without dividing it into four seasons. The performance of all the models is tested with the same sets of variables, to provide a baseline against which the advantage of the proposed method can be measured. Each model is evaluated using the mean absolute percentage error (MAPE). The forecasting results for each model are presented in Table 2. The minimal errors in each line are highlighted in bold.
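For reference, MAPE in its standard form is the mean of the absolute errors expressed as a percentage of the actual values. The paper does not spell out the formula, so the sketch below assumes that standard definition; the function name and the sample numbers are illustrative.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, which holds for peak loads.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Errors of 10% and 5% on two months average to 7.5%.
example_error = mape([100.0, 200.0], [110.0, 190.0])
```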

Table 2. MAPE of the five models before dividing the dataset into four seasons, for different sets of historical input variables.

| Variables | Linear regression | SVR (PUK kernel) | SVR (RBF kernel) | SVR (Poly kernel) | Neural network |
|---|---|---|---|---|---|
| 2 PMY + 2 PM | **2.96%** | 3.28% | 3.15% | 2.98% | 3.52% |
| 2 PMY + PM | **3.02%** | 3.17% | 3.23% | 3.20% | 3.74% |
| 2 PMY | 3.60% | 3.57% | **3.56%** | 3.64% | 3.90% |
| 3 PMY | 3.55% | 3.50% | **3.37%** | 3.42% | 3.86% |
| 4 PMY + 2 PM | **2.96%** | 3.02% | 3.04% | 3.01% | 3.75% |
| 4 PMY + 3 PM | **2.98%** | 3.12% | 3.11% | 3.11% | 3.93% |
| 4 PMY + PM | **3.03%** | 3.12% | 3.13% | 3.23% | 4.16% |
| 4 PMY | 3.15% | 3.13% | 3.10% | **3.01%** | 3.77% |

Then we train four models on the dataset divided into four seasons, where each model is specialized on a different season (parallel approach). The results are presented in Table 3.

Table 3. MAPE of the five models after dividing the dataset into four seasons, for different sets of historical input variables.

| Variables | Linear regression | SVR (PUK kernel) | SVR (RBF kernel) | SVR (Poly kernel) | Neural network |
|---|---|---|---|---|---|
| 2 PMY + 2 PM | 2.94% | 3.21% | 2.97% | 2.72% | 5.48% |
| 2 PMY + PM | 2.83% | 3.00% | 2.76% | 2.73% | 4.81% |
| 2 PMY | 3.63% | 4.33% | 3.58% | 3.48% | 3.71% |
| 3 PMY | 3.51% | 3.81% | 3.05% | 3.48% | 3.51% |
| 4 PMY + 2 PM | 2.85% | 2.77% | 2.77% | 2.67% | 4.81% |
| 4 PMY + 3 PM | 2.93% | 2.89% | 2.88% | 2.65% | 3.99% |
| 4 PMY + PM | 2.81% | 2.58% | 2.48% | 2.54% | 4.63% |
| 4 PMY | 2.97% | 2.88% | 2.66% | 2.79% | 4.02% |

Finally, we add GDP as an exogenous variable to the divided dataset. The results are shown in Table 4.

Table 4. MAPE of the five models after dividing the dataset into four seasons, for different sets of historical input variables together with the GDPm value.

| Variables | Linear regression | SVR (PUK kernel) | SVR (RBF kernel) | SVR (Poly kernel) | Neural network |
|---|---|---|---|---|---|
| 2 PMY + 2 PM | 2.22% | 2.10% | 2.14% | 2.08% | 3.77% |
| 2 PMY + PM | 2.21% | 2.37% | 2.17% | 2.30% | 4.04% |
| 2 PMY | 3.51% | 3.36% | 2.98% | 3.43% | 2.81% |
| 3 PMY | 3.45% | 3.10% | 2.62% | 3.16% | 3.08% |
| 4 PMY + 2 PM | 2.35% | 1.93% | 1.60% | 1.99% | 3.04% |
| 4 PMY + 3 PM | 2.53% | 1.91% | 1.59% | 2.20% | 2.68% |
| 4 PMY + PM | 2.38% | 1.92% | 1.78% | 1.97% | 2.73% |
| 4 PMY | 2.74% | 2.44% | 2.34% | 2.84% | 3.76% |

Tables 2 and 3 show that dividing the dataset into seasons increases forecast precision in most cases. For example, the error of support vector regression with the RBF kernel decreases from 3.13% to 2.48% for the model that uses the values of the same month of the four previous years and the value of the previous month. The exception is the neural network, which suffers from the insufficient training data caused by the division: the training set contains 120 instances before dividing, and each seasonal dataset contains only 30 instances afterwards. According to Tables 2 and 3, the error of the neural network increases from 3.52% to 5.48% for the model that uses the values of the same month of the two previous years and the values of the two previous months.

According to Table 4, taking the GDP value into account significantly increases the accuracy of the prediction, which shows that Algerian GDP is closely linked to the electricity load peak. Tables 2, 3 and 4 also show that SVRs with optimized hyperparameters generally give better results than the ANN and multiple regression.

6 Conclusion

This paper has presented a parallel approach that applies support vector regression, multilayer perceptron neural networks and multiple linear regression to electric load forecasting. Several combinations of past load values were used in order to find which ones have the best generalizing ability. The main idea of this work is to improve forecast accuracy by dividing the problem into several smaller problems; the results show that a parallel approach, in which a model is built for each season, improves the precision of the forecasts. The results also show that SVR performs better than the neural network and multiple linear regression. It was also observed that parameter selection using the grid search algorithm significantly increases the performance of the SVR model.

References

1. Feinberg, E. A., Genethliou, D.: Load forecasting. In: Applied mathematics for restructured electric power systems, pp. 269-285. Springer (2005)
2. Haida, T., Muto, S.: Regression Based Peak Load Forecasting using a Transformation Technique. Power Systems, IEEE Transactions on, vol. 9, no 4, p. 1788-1794. (1994)
3. Cleveland, W. P., Tiao, G. C.: Decomposition of Seasonal Time Series: A Model for the Census X-11 Program. Journal of the American Statistical Association, vol. 71, no 355, p. 581-587. (1976)
4. Perninge, M., Knazkins, V., Amelin, M., Söder, L.: Modeling the Electric Power Consumption in a Multi-area System. European Transactions on Electrical Power, vol. 21, no 1, p. 413-423. (2011)
5. Ozturk, I., Acaravci, A.: The Causal Relationship between Energy Consumption and GDP in Albania, Bulgaria, Hungary and Romania: Evidence from ARDL Bound Testing Approach. Appl. Energy, vol. 87, no 6, p. 1938-1943. (2010)
6. Laouafi, A., Mordjaoui, M., Dib, D.: One-Hour Ahead Electric Load Forecasting Using Neuro-fuzzy System in a Parallel Approach. In: Computational Intelligence Applications in Modeling and Control, pp. 95-121. Springer (2015)
7. Bianco, V., Manca, O., Nardini, S.: Electricity Consumption Forecasting in Italy using Linear Regression Models. Energy, vol. 34, no 9, p. 1413-1421. (2009)
8. Nezzar, M., Farah, N., Khadir, T.: Mid-Long Term Algerian Electric Load Forecasting using Regression Approach. In: Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), 2013 International Conference on. IEEE. p. 121-126. (2013)
9. Achanta, R.: Long Term Electric Load Forecasting using Neural Networks and Support Vector Machines. IJCST, vol. 3, no 1. (2012)
10. Ekonomou, L.: Greek Long-Term Energy Consumption Prediction using Artificial Neural Networks. Energy, vol. 35, no 2, p. 512-517. (2010)
11. Kandananond, K.: Forecasting Electricity Demand in Thailand with an Artificial Neural Network Approach. Energies, vol. 4, no 8, p. 1246-1257. (2011)
12. Sykes, A. O.: An introduction to regression analysis. Chicago Working Paper in Law and Economics. (1993)
13. Kubat, M.: Neural networks: a comprehensive foundation by Simon Haykin, Macmillan, 1994, ISBN 0-02-352781-7. (1999)
14. Riedmiller, M., Braun, H.: A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. In: Neural Networks, 1993, IEEE International Conference on. IEEE, p. 586-591. (1993)
15. Burges, C. J.: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, vol. 2, no 2, p. 121-167. (1998)
16. Schölkopf, B., Burges, C. J., Smola, A. J.: Using Support Vector Machine for Time Series Prediction, Advances in Kernel Methods, Eds. Cambridge, MA: MIT Press, pp. 242-253. (1999)
17. Abaidoo, R.: Economic growth and energy consumption in an emerging economy: augmented granger causality approach. Research in Business and Economics Journal, 01-15 (2011)
18. Cherkassky, V., Ma, Y.: Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, vol. 17, no 1, p. 113-126. (2004)
19. Diehl, C. P., Cauwenberghs, G.: SVM Incremental Learning, Adaptation and Optimization, vol. 4, p. 2685-2690. (2003)
20. Thornton, C., Hutter, F., Hoos, H. H., Leyton-Brown, K.: Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847-855. (2013)
21. Bi, J., Bennett, K., Embrechts, M., Breneman, C., Song, M.: Dimensionality Reduction Via Sparse Support Vector Machines. The Journal of Machine Learning Research, vol. 3, pp. 1229-1243. (2003)
