ENHANCED NEURO-FUZZY ARCHITECTURE FOR ELECTRICAL LOAD FORECASTING

Previous research on forecasting electrical load time series data did not produce satisfactory results. This paper elaborates an enhanced neuro-fuzzy architecture for the same application. The system uses the Gaussian membership function (GMF) in a Takagi-Sugeno fuzzy logic system. The Levenberg-Marquardt algorithm is used for training, adjusting the parameters to obtain a better forecasting system than in the previous research. The electrical load data was taken from East Java-Bali from September 2005 to August 2007. The architecture uses 4 inputs and 3 outputs with 5 GMFs. The system uses the following parameters: momentum=0.005, gamma=0.0005 and wildness factor=1.001. The MSE for short term forecasting for January to March 2007 is 0.0010, while the MSE for long term forecasting for June to August 2007 is 0.0011.


INTRODUCTION
Previous research [1]-[6] on forecasting electrical load time series data did not produce satisfactory results. The Gustafson-Kessel clustering algorithm gives good results, but they are still not satisfactory [1]. Forecasting with fuzzy C-means [2] gave an MSE that is not acceptable. For this reason, the same problem was evaluated with another method, i.e. enhanced Gustafson-Kessel using an evolutionary algorithm [3], but the result is still not acceptable. The other studies [4]-[6] also report results that are not good; their RMSE is 5.4%. In order to reach an RMSE under 5%, the research is continued with a different approach.
This paper elaborates the implementation of a neuro-fuzzy system for forecasting electrical load time series data. The proposed method uses the same data as in [1]-[6] so that the methods can be easily compared. As in those studies, the data is from East Java-Bali from September 2005 to August 2007. The neuro-fuzzy architecture is explored in order to find the optimum architecture. It uses the MIMO Takagi-Sugeno type and the Levenberg-Marquardt training algorithm to make the training efficient. The neural network part is a feed-forward neural network. The forecasting system covers both short term forecasting (STF) and long term forecasting (LTF). Data from September 2005 to December 2006 is used as training data, while the rest of the data is used as testing data. The goal of this research is to reach an RMSE for LTF under 5%.

PROPOSED METHOD
Neuro-Fuzzy Architecture
The neuro-fuzzy architecture combines the advantages of neural networks and fuzzy logic in order to get better performance than operating them separately. Here, the neuro-fuzzy system uses a feed-forward neural network trained with the Levenberg-Marquardt algorithm (LMA).
The fuzzy part uses the MIMO Takagi-Sugeno type with a differentiable membership function, i.e. the Gaussian membership function (GMF), which is used to form the fuzzy inference and the defuzzifier. The output itself is represented as a multilayer neural network, as shown in Figure 1. The Takagi-Sugeno system uses the architecture recommended by Palit and Popovic. Figure 2 shows the architecture.

Neuro-Fuzzy Training
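The neuro-fuzzy system is trained with the LMA. The LMA, developed by Kenneth Levenberg and Donald Marquardt, is used here to accelerate neuro-fuzzy training. If V(w) is a function of the parameter vector w, then with the Newton method the parameter update is defined as

$$\Delta w = -\left[\nabla^2 V(w)\right]^{-1} \nabla V(w)$$

where V(w) is taken from the sum of squared errors (SSE) formula,

$$V(w) = \frac{1}{2}\sum_{r=1}^{N} e_r^2(w)$$

The gradient and the Hessian are defined as

$$\nabla V(w) = J^T(w)\, e(w)$$

$$\nabla^2 V(w) = J^T(w)\, J(w) + \sum_{r=1}^{N} e_r(w)\, \nabla^2 e_r(w) \tag{4}$$

with the Jacobian matrix J(w), whose element in row r and column j is \partial e_r(w) / \partial w_j. With the Gauss-Newton method, the last term in (4) is taken as zero, so the parameter update becomes

$$\Delta w = -\left[J^T(w)\, J(w)\right]^{-1} J^T(w)\, e(w)$$

and the LMA modifies it with the parameter µ:

$$\Delta w = -\left[J^T(w)\, J(w) + \mu I\right]^{-1} J^T(w)\, e(w) \tag{7}$$

To improve the performance of LMA training, some researchers add momentum and a modified error index. The modified error index adds the term

$$V_m(w) = \frac{1}{2}\sum_{r=1}^{N} \left(e_r(w) - e_{avg}\right)^2$$

where e_avg is the average error, and the SSE formula after modification is

$$V_{new}(w) = V(w) + \gamma\, V_m(w)$$

where V(w) is the SSE without modification. The new gradient can therefore be defined with the Jacobian matrix as shown in equation (11),

$$\nabla V_{new}(w) = J^T(w)\left[e(w) + \gamma\left(e(w) - e_{avg}\right)\right] \tag{11}$$

where the learning rate (γ) must be smaller than 1 for LMA training.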
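The Levenberg-Marquardt update described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the model `y = a * exp(b * x)`, the data, the starting point and the divide/multiply-by-10 schedule for µ are all assumptions chosen for illustration, and the 2x2 linear system is solved by Cramer's rule to keep the example free of external libraries.

```python
import math

def residuals(w, xs, ys):
    """Error vector e(w) for the hypothetical toy model y = a * exp(b * x)."""
    a, b = w
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(w, xs):
    """Jacobian rows [de_r/da, de_r/db], one row per sample."""
    a, b = w
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]

def sse(e):
    """V(w) = 1/2 * sum of squared errors."""
    return 0.5 * sum(v * v for v in e)

def lm_step(w, xs, ys, mu):
    """One update dw = -(J^T J + mu*I)^-1 J^T e for a 2-parameter model."""
    e = residuals(w, xs, ys)
    J = jacobian(w, xs)
    # Entries of the symmetric matrix A = J^T J + mu*I
    a11 = sum(r[0] * r[0] for r in J) + mu
    a12 = sum(r[0] * r[1] for r in J)
    a22 = sum(r[1] * r[1] for r in J) + mu
    # Gradient g = J^T e
    g1 = sum(r[0] * v for r, v in zip(J, e))
    g2 = sum(r[1] * v for r, v in zip(J, e))
    det = a11 * a22 - a12 * a12
    d1 = -(a22 * g1 - a12 * g2) / det
    d2 = -(a11 * g2 - a12 * g1) / det
    return [w[0] + d1, w[1] + d2]

def train(w, xs, ys, mu=0.01, epochs=50):
    """Accept a step only if it lowers the SSE; adapt mu (assumed schedule)."""
    best = sse(residuals(w, xs, ys))
    for _ in range(epochs):
        candidate = lm_step(w, xs, ys, mu)
        candidate_sse = sse(residuals(candidate, xs, ys))
        if candidate_sse < best:
            w, best, mu = candidate, candidate_sse, mu / 10.0
        else:
            mu *= 10.0
    return w, best

# Synthetic data generated from a = 2.0, b = 0.3 (illustrative values only)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.3 * x) for x in xs]
w_fit, final_sse = train([1.0, 0.0], xs, ys)
print(w_fit, final_sse)
```

A small µ makes the step close to Gauss-Newton, while a large µ shrinks it toward a short gradient-descent step, which is why µ is raised after a rejected step and lowered after an accepted one.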
In LMA training, the most important step is the calculation of the Jacobian matrix for each layer. The calculation can be derived from the SSE for each adjustable parameter of the fuzzy logic system, i.e. the means and variances of the GMFs and the Takagi-Sugeno parameters.
Adding the modified error index does not always give a good result, so oscillation control is also necessary. Oscillation control requires two sets of adjustable parameters to be saved. If the next epoch in LMA training has a lower SSE, the next iteration proceeds with the new parameters. But when the SSE of the next epoch exceeds the product of the wildness factor (WF) and the current SSE, the current parameters are kept for the next iteration.
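The oscillation control rule can be sketched as a small decision function. This is an illustration under stated assumptions: the function name and argument layout are hypothetical, and a candidate whose SSE rises but stays within WF times the current SSE is assumed to be accepted, since the text only states the two outer cases.

```python
def control_oscillation(current, candidate, current_sse, candidate_sse, wf=1.001):
    """Choose which of the two saved parameter sets to use for the next iteration.

    The candidate is rejected only when its SSE exceeds wf * current_sse;
    accepting a rise that stays inside that band is an assumption of this sketch.
    """
    if candidate_sse > wf * current_sse:
        return current, current_sse
    return candidate, candidate_sse

# Illustrative numbers only: a rise of 0.05% is tolerated, a rise of 5% is not.
kept, kept_sse = control_oscillation([0.2, 0.4], [0.25, 0.35], 10.0, 10.005)
rejected, rejected_sse = control_oscillation([0.2, 0.4], [0.25, 0.35], 10.0, 10.5)
print(kept, kept_sse)
print(rejected, rejected_sse)
```

With WF close to 1 the training is almost strictly monotone, while a larger WF lets the SSE wander further before a step is rolled back.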

RESEARCH METHOD
The data from September 2005 to August 2007 is divided into training data (September 2005 to December 2006) and testing data (January to August 2007). The electrical load is sampled every 30 minutes, giving 48 data points per day. The training set therefore contains 23,376 data points for 487 days. As preparation, the data is normalized.
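The size of the training set can be checked with a few lines of date arithmetic. The sketch below assumes that the training period covers full months, from September 1, 2005 to December 31, 2006. The normalization method is not specified in the text, so min-max scaling to [0, 1] is used purely as an assumed example, and the load values are made up.

```python
from datetime import date

SAMPLES_PER_DAY = 48  # one reading every 30 minutes

def training_samples(start, end_inclusive):
    """Return (number of days, number of half-hourly samples) for a period."""
    days = (end_inclusive - start).days + 1
    return days, days * SAMPLES_PER_DAY

def min_max_normalize(values):
    """Scale values to [0, 1]; an assumed choice, not stated in the text."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

days, samples = training_samples(date(2005, 9, 1), date(2006, 12, 31))
print(days, samples)  # 487 23376

# Hypothetical load readings, for illustration only
print(min_max_normalize([9500.0, 10200.0, 11000.0, 9800.0]))
```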
Because the membership function is Gaussian, the inference rule and the weighted average defuzzifier are both defined in terms of the GMF. The system needs 2 parameters for each GMF, i.e. means (χ) and variances (τ), together with W_0 and W_i as the Takagi-Sugeno parameters. The starting values of these parameters are random.
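A forward pass through a Takagi-Sugeno system with Gaussian membership functions can be sketched as follows. Several details are assumptions of this sketch rather than the authors' definitions: the exact form of the GMF, the use of the product of memberships as the rule firing strength, and the array layout of the consequent parameters, which are named `bias` and `slope` here as stand-ins for the Takagi-Sugeno parameters.

```python
import math

def gmf(x, mean, spread):
    """Gaussian membership value; this exact form is an assumption."""
    return math.exp(-((x - mean) / spread) ** 2)

def ts_forward(x, means, spreads, bias, slope):
    """Weighted-average output of a Takagi-Sugeno system.

    means, spreads: [M][n]     one Gaussian per rule and input
    bias:           [M][k]     constant term of each rule consequent (assumed layout)
    slope:          [M][n][k]  linear terms of each rule consequent (assumed layout)
    """
    n_rules = len(means)
    n_outputs = len(bias[0])
    # Firing strength of each rule: product of its input memberships
    firing = []
    for l in range(n_rules):
        z = 1.0
        for i, xi in enumerate(x):
            z *= gmf(xi, means[l][i], spreads[l][i])
        firing.append(z)
    total = sum(firing)
    outputs = []
    for j in range(n_outputs):
        weighted = 0.0
        for l in range(n_rules):
            y_l = bias[l][j] + sum(slope[l][i][j] * xi for i, xi in enumerate(x))
            weighted += firing[l] * y_l
        outputs.append(weighted / total)
    return outputs

# Two rules, two inputs, one output; all values are illustrative
means = [[0.2, 0.2], [0.8, 0.8]]
spreads = [[0.5, 0.5], [0.5, 0.5]]
bias = [[0.1], [0.3]]
slope = [[[1.0], [2.0]], [[0.5], [0.5]]]
print(ts_forward([0.5, 0.5], means, spreads, bias, slope))
```

Because every operation in the forward pass is differentiable in the means, spreads and consequent parameters, the Jacobian needed by LMA training can be derived from it.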
The output of the neuro-fuzzy system is compared with the target, and the resulting error is used in training: following equation (7), it updates the parameters, i.e. the means, the variances, W_0 and W_i. The parameter µ is multiplied or divided by a constant according to the SSE value. µ is multiplied by 200 if the current SSE is greater than the previous one. When the current SSE is less than the previous one, µ is divided by 10; otherwise, it is multiplied by 10. If µ becomes too large or too small, it is reset to 10.
The modified error index is used to accelerate the training. For this purpose, e(w) in equation (7) is replaced by the modified error, and γ is set to 0.005.
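The substitution of the error vector can be sketched in a few lines. The form used below, e + γ(e − e_avg), is a common formulation of the modified error index and is an assumption here; the function name and sample values are hypothetical.

```python
def modified_error(errors, gamma):
    """Return e + gamma * (e - e_avg) for each error (assumed formulation)."""
    e_avg = sum(errors) / len(errors)
    return [e + gamma * (e - e_avg) for e in errors]

# Illustrative error vector; gamma = 0.5 is exaggerated to make the effect visible
print(modified_error([0.1, -0.1, 0.3], 0.5))
```

Errors that lie far from the average are amplified, so the training puts more weight on reducing the spread of the errors; with gamma = 0 the original error vector is returned unchanged.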
To find the parameter update of the neuro-fuzzy system, the Jacobian matrices and their transposes must be calculated. Formulas (25)-(28) are used to calculate them for W_0 and W_i, and separate formulas are used for the Jacobian matrices and their transposes with respect to the means and variances. To keep the oscillation from growing, it is necessary to control it within 1%.
After all parameters are updated and the SSE converges to a certain value, the forecasting process is run for both Short Term Forecasting (STF) and Long Term Forecasting (LTF). The input-output matrix, called XIO, for STF can be MISO (Multiple Input Single Output) or MIMO (Multiple Input Multiple Output). The XIO for STF, equation (33), can be built from daily or weekly data. It can also represent the data at the sampling interval.
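One common way to build such an input-output matrix is a sliding window over the series. The sketch below is an assumption about the layout, not a reproduction of equation (33): the function name and the `step` parameter are hypothetical, with `step=1` standing for consecutive 30-minute samples, `step=48` for the same time of day on consecutive days, and `step=336` for the same time on consecutive weeks.

```python
def build_xio(series, n_inputs, n_outputs, step=1):
    """Build (inputs, outputs) rows from a series with a sliding window.

    Each row takes n_inputs past values followed by n_outputs target values,
    all spaced `step` samples apart (assumed layout).
    """
    rows = []
    span = (n_inputs + n_outputs - 1) * step
    for t in range(len(series) - span):
        window = [series[t + i * step] for i in range(n_inputs + n_outputs)]
        rows.append((window[:n_inputs], window[n_inputs:]))
    return rows

series = [1, 2, 3, 4, 5, 6, 7, 8]
print(build_xio(series, n_inputs=4, n_outputs=1))  # MISO rows
print(build_xio(series, n_inputs=4, n_outputs=3))  # MIMO rows
```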
The XIO for LTF is MISO, because each result is used to forecast the next value. Equation (34) shows the XIO for LTF; it has 4 inputs and 1 output.
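Feeding each forecast back as an input can be sketched as a simple loop. This is an illustration only: `model` is a hypothetical stand-in for the trained 4-input, 1-output neuro-fuzzy system, and the toy model in the example exists just to make the data flow visible.

```python
def recursive_forecast(model, history, horizon, n_inputs=4):
    """Forecast `horizon` steps ahead, reusing each forecast as a new input."""
    if len(history) < n_inputs:
        raise ValueError("history must contain at least n_inputs values")
    window = list(history[-n_inputs:])
    forecasts = []
    for _ in range(horizon):
        y = model(window)
        forecasts.append(y)
        # Drop the oldest input and append the forecast just made
        window = window[1:] + [y]
    return forecasts

def toy_model(window):
    """Hypothetical stand-in model: last value plus one."""
    return window[-1] + 1

print(recursive_forecast(toy_model, [1, 2, 3, 4], horizon=3))  # [5, 6, 7]
```

After `n_inputs` steps the window contains only forecasts, so any error made early in the horizon is carried into every later step.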

RESULT AND DISCUSSION
The first experiment looks for the relationship between momentum (mo), modified error index (g) and wildness factor (WF). These experiments use 4 inputs and 3 outputs with the first 2000 data points and up to 100 iterations. The chosen values for g and WF are 0.0005 and 1.0010. Table 1 summarizes these experiments. It shows that the momentum can be neither large nor zero, so the recommended value for momentum is small but not zero.
All SSE plots have almost the same shape. Figure 3 shows two of the SSE plots from the experiments. They show that momentum can increase the speed of LMA training: the experiment with mo=0 reaches an SSE of around 10 at the 60th iteration, while the others get there at the 10th iteration. The experiments also show that a larger momentum gives poor convergence speed in LMA training, and the SSE is even larger than in the other experiments.
The next experiment looks for a suitable value for g, with the other parameters set to mo=0.1 and WF=1.001. These experiments show that g has a small effect on LMA training. The smaller the modified error index, the smaller the SSE. However, when g is very small, it has no significant further effect on the SSE (compare 0.0005 and 0.00001).
With g=0.0005 and mo=0.1, the next experiment looks for a suitable value for WF. Table 2 shows the results of these experiments. Table 3 shows that the bigger the WF, the larger the SSE. The interesting point, however, appears when the SSE plot is displayed. Figure 4 shows the effect of WF on the SSE.
Before the architecture is used for forecasting, it is necessary to check which input-output combination gives good results. The system uses the following parameters: mo=0.005, WF=1.005, GMF=5, gamma=0.0005. Table 4 summarizes the results of this experiment. According to Table 4, the combination of 4 inputs and 1 output gives the best MSE. Another interesting fact is that a system with 1 output is preferable, so the forecasting will use MISO.
The next experiment explores a suitable way of retrieving the data. These experiments represent the data every 30 minutes, every 24 hours and every week. This is combined with variations in the number of inputs, while the number of outputs is 1. Table 5 shows the results. The best data representation is data every 30 minutes; the result is far better than under the other conditions.

The number of epochs in training is also an interesting issue. Table 6 shows several maximum epochs with their SSE results. All experiments start from the same condition. According to Table 6, the suitable number of epochs is 500: the forecasting SSE decreases as the number of epochs is increased up to 500.
Now, the architecture is tested for STF with data from January to March 2007. Figure 5 shows the error plot for the whole data. The forecast cannot follow all of the real data; in particular, when the real data is zero (e.g. during a blackout), the error is very large. This is expected because a blackout is an abnormal condition. To see how the STF follows the real data, Figure 6 shows the first 48 data points for January to March 2007, i.e. on January 17, 2007. The STF follows the real data as long as there is no abnormality.
In LTF testing, each result is used as input for forecasting the next data point, which means the error propagates to the next phase. The LTF uses data from June to August 2007. The MSE is 1.1E-3. Figure 7 shows a zoomed view of the LTF on June 6, 2007. The parameters are input=4, output=1, GMF=5, mo=0.005, g=0.0005 and WF=1.001. The LTF result in Figure 7 is interesting because it can follow the real data. Of course, the further ahead the forecast goes, the worse the result becomes, because the error from the previous phase propagates to the current phase.
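The reported MSE values can be related to the RMSE target of 5% by taking the square root. The sketch below assumes that the MSE is computed on data normalized to the unit range, so that the RMSE multiplied by 100 can be read as a percentage; whether this matches the RMSE definition used in the earlier studies is an assumption.

```python
import math

def rmse_percent(mse):
    """RMSE as a percentage, assuming errors measured on a unit-range scale."""
    return 100.0 * math.sqrt(mse)

stf_rmse = rmse_percent(0.0010)  # reported STF MSE
ltf_rmse = rmse_percent(0.0011)  # reported LTF MSE
print(round(stf_rmse, 2), round(ltf_rmse, 2))  # 3.16 3.32
```

Under that assumption, both values fall below the 5% target.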

CONCLUSION
The momentum parameter cannot be set to a large value: the greater the value, the worse the result. The recommended value for momentum is less than 0.1; the MSE for momentum values under 0.1 is almost the same. The experiments show that the neuro-fuzzy system forecasts the electrical load for East Java-Bali well. The MSE of the STF for January to March 2007 is 0.0010, while that of the LTF for June to August 2007 is 0.0011. The wildness factor (WF) is interesting: the experiments recommend 5-10% to limit the oscillation.

Figure 4. The effect of WF on the SSE: (a) SSE during training up to 100 epochs for WF=1.0001, and (b) SSE during training up to 100 epochs for WF=1.6000
Figure 5. Error plot for STF (January to March 2007)
The means and variances have dimension M×n, where M is the number of GMFs and n is the number of inputs. The dimension of W_i is M×k, where k is the number of outputs, while that of W_0 is M×n×k. The output of the neuro-fuzzy system is calculated with the Takagi-Sugeno rule.

Table 2. Summary of experiments with mo=0.1 and WF=1.0010

Table 4. Combination of input-output and its MSE result

Table 6. The number of maximum epochs with SSE results
Enhanced Neuro-Fuzzy Architecture for Electrical Load Forecasting (Hany Ferdinando)