Time series prediction with a three-layer
backpropagation neural network:
Time series prediction with multilayer perceptrons


Applet source.

Backpropagation network class source.

Running the applet

This neural network was designed to predict time series. Sunspots (linked to solar storms) occur almost regularly, following a cycle of about 11 years. Predicting the number of sunspots is therefore a good exercise for our network.
1. First, load a file containing the data used to train the network (click the “load” button). The corresponding time series then appears in the prediction graph.

2. You also need to set up the network. All the parameters for setting up the network are on the left: the number of units in the input and hidden layers, the momentum and the learning rate.
The default number of units is 2 in the input layer, 3 in the hidden layer and 1 in the output layer (the output layer size is fixed).

The learning rate is the rate of change of the connection weights. The default value is 0.1.
The momentum is meant to attenuate a learning rate that is too high when the error delta is negative. The default value is 0.999.

3. Initialize the network (click the “init” button).

4. Then select the training set. Click the “select training set” button first, then drag the mouse across the prediction graph. The period you choose is highlighted in color.

5. Start the training (click the “train” button).
6. You can stop and restart the graphs (error graph and prediction graph) while the network is training. You can also choose whether the network should use the actual data for prediction or work from its own predictions.

7. After a while, the network has finished training (the “train” button is disabled). You can then pick a year for which the network has to guess the corresponding value of the time series, here the number of sunspots (click the “guess” button).

 

 

Backpropagation

In many situations we are faced with incomplete or noisy data, and it can be important to predict what is missing from the information available. If no good theory or algorithm is available for the prediction, backpropagation networks can still provide some answers.

A backpropagation network consists of at least three layers of units: an input layer, one or more hidden layers and an output layer. Input units are connected to units in the hidden layer, and hidden units are fully connected to units in the output layer. Backpropagation networks adapt their weights to acquire new knowledge.

Learning occurs during training. Each input pattern in the training set is applied to the input units and propagated forward. The pattern of activation arriving at the output layer is then compared with the correct output pattern to calculate an error signal. The error signal for each target output pattern is then backpropagated from the outputs to the inputs to adjust the weights in each layer of the network. The weights are initialized with random values so that the network can learn properly. After a backpropagation network has learned the correct classification for a set of inputs, it can be tested on a second set of inputs to see how well it classifies untrained patterns.
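The forward pass and error backpropagation described above can be sketched in Java, the language of the applet. This is a minimal illustration under stated assumptions, not the applet's BNP class. The following are all choices made for this example:

- the class name `TinyBackprop`;
- the logistic sigmoid;
- the bias stored as an extra weight;
- the initial weight range of [-0.5, 0.5).

```java
import java.util.Random;

/** Sketch of a three-layer backpropagation network with one sigmoid output unit (illustrative, not BNP). */
final class TinyBackprop {
    final int nIn, nHid;
    final double[][] wIH;   // input -> hidden weights; the last column of each row is the bias
    final double[] wHO;     // hidden -> output weights; the last entry is the bias
    final double[] hidden;  // hidden activations from the most recent forward pass

    TinyBackprop(int nIn, int nHid, long seed) {
        this.nIn = nIn;
        this.nHid = nHid;
        Random rnd = new Random(seed);
        wIH = new double[nHid][nIn + 1];
        wHO = new double[nHid + 1];
        // Random initial weights, so that hidden units start out different from each other.
        for (double[] row : wIH)
            for (int i = 0; i < row.length; i++) row[i] = rnd.nextDouble() - 0.5;
        for (int j = 0; j < wHO.length; j++) wHO[j] = rnd.nextDouble() - 0.5;
        hidden = new double[nHid];
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    /** Forward pass: input layer -> hidden layer -> single output unit. */
    double forward(double[] x) {
        for (int j = 0; j < nHid; j++) {
            double s = wIH[j][nIn];                       // bias term
            for (int i = 0; i < nIn; i++) s += wIH[j][i] * x[i];
            hidden[j] = sigmoid(s);
        }
        double s = wHO[nHid];                              // bias term
        for (int j = 0; j < nHid; j++) s += wHO[j] * hidden[j];
        return sigmoid(s);
    }

    /** One backpropagation step on a single pattern; returns the squared error before the update. */
    double train(double[] x, double target, double eta) {
        double out = forward(x);
        double err = target - out;
        // Output error signal; the sigmoid derivative is out * (1 - out).
        double deltaOut = err * out * (1 - out);
        // Backpropagate to the hidden units using the hidden -> output weights before they change.
        double[] deltaHid = new double[nHid];
        for (int j = 0; j < nHid; j++)
            deltaHid[j] = deltaOut * wHO[j] * hidden[j] * (1 - hidden[j]);
        // Adjust hidden -> output weights (including the bias).
        for (int j = 0; j < nHid; j++) wHO[j] += eta * deltaOut * hidden[j];
        wHO[nHid] += eta * deltaOut;
        // Adjust input -> hidden weights (including the biases).
        for (int j = 0; j < nHid; j++) {
            for (int i = 0; i < nIn; i++) wIH[j][i] += eta * deltaHid[j] * x[i];
            wIH[j][nIn] += eta * deltaHid[j];
        }
        return err * err;
    }
}
```

For example, `new TinyBackprop(2, 3, 42L)` matches the applet's default of 2 input and 3 hidden units. Calling `train(x, target, 0.1)` repeatedly over the patterns of a training set performs plain gradient descent without momentum.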
 

Training
First, each pattern I_p is presented to the network and propagated forward to the output. Second, a method called gradient descent is used to minimise the total error on the patterns in the training set. In gradient descent, weights are changed in proportion to the negative of the error derivative with respect to each weight.
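The text does not name the error function, so the formulas below assume the usual sum-of-squares error. Here t_p is the target and o_p the network output for pattern I_p, and η is the learning rate. Under that assumption, gradient descent changes each weight w_ij as follows:

```latex
E = \frac{1}{2} \sum_{p} \left( t_p - o_p \right)^2
\qquad
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}
```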
Local minima
When an extra hidden layer is added to solve more difficult problems, complex error surfaces with many minima can arise. Some minima are deeper than others, and the network may fail to find the global minimum and instead fall into a local minimum. A rule of thumb is that the more hidden units a network has, the less likely it is to encounter a local minimum during training.
Momentum
The concept of momentum is that previous changes in the weights should influence the current direction of movement in weight space. With momentum, once the weights start moving in a particular direction they tend to continue moving in that direction.
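A common way to write the momentum rule is Δw(t) = -η ∂E/∂w + α Δw(t-1), where α is the momentum coefficient. Each weight change carries over a fraction α of the previous one. The helper below is a hypothetical sketch of that rule; the name `MomentumUpdate` and its methods are not from the applet. The gradient is assumed to be computed elsewhere, for instance by backpropagation.

```java
/** Gradient-descent weight update with momentum (hypothetical helper, not part of the applet). */
final class MomentumUpdate {
    private final double eta;     // learning rate
    private final double alpha;   // momentum coefficient
    private final double[] prev;  // previous weight change, one entry per weight

    MomentumUpdate(int nWeights, double eta, double alpha) {
        this.eta = eta;
        this.alpha = alpha;
        this.prev = new double[nWeights];
    }

    /** Applies dw(t) = -eta * grad + alpha * dw(t-1) to every weight, in place. */
    void step(double[] w, double[] grad) {
        for (int i = 0; i < w.length; i++) {
            double dw = -eta * grad[i] + alpha * prev[i];
            w[i] += dw;
            prev[i] = dw;   // remembered for the next step
        }
    }
}
```

With a constant gradient g, the step size approaches -η·g/(1 - α). With the applet's default α = 0.999, that is about 1000 times the plain gradient step. This suggests pairing such a high momentum with a small learning rate.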

 
 

Time series prediction


When you work with time series prediction, you need several different sets of data: a training set, a validation set and a test set. The training set is used to train the network, and the validation set is used to monitor the training so as to avoid overtraining the net. Overtraining happens when the net learns the training set so closely that it starts to lose its ability to generalise, i.e., it picks up features of the training set that are not present in the validation set.
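One way to use the validation set against overtraining is early stopping. After each training epoch, measure the error on the validation set, and stop once it has not improved for a while. The sketch below assumes a chronological split (no shuffling, so future values never leak into training) and a patience counter. The class names, split fractions and patience value are illustrative and not taken from the applet.

```java
import java.util.Arrays;

/** Chronological train/validation/test split plus a simple early-stopping rule (illustrative sketch). */
final class SeriesSplit {
    /** Splits the series in time order; whatever is left after training and validation is the test set. */
    static double[][] split(double[] series, double trainFrac, double valFrac) {
        int nTrain = (int) (series.length * trainFrac);   // truncation keeps nTrain + nVal <= length
        int nVal = (int) (series.length * valFrac);
        return new double[][] {
            Arrays.copyOfRange(series, 0, nTrain),
            Arrays.copyOfRange(series, nTrain, nTrain + nVal),
            Arrays.copyOfRange(series, nTrain + nVal, series.length)
        };
    }

    /** Signals a stop after `patience` consecutive epochs without a new best validation error. */
    static final class EarlyStopping {
        private final int patience;
        private double best = Double.POSITIVE_INFINITY;
        private int badEpochs = 0;

        EarlyStopping(int patience) { this.patience = patience; }

        /** Call once per epoch with the current validation error; returns true when training should stop. */
        boolean update(double valError) {
            if (valError < best) {
                best = valError;
                badEpochs = 0;
            } else {
                badEpochs++;
            }
            return badEpochs >= patience;
        }

        double best() { return best; }
    }
}
```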
 

Training set:
A set of examples used for learning, that is to fit the parameters [weights] of the classifier.
Test set:
A set of examples used only to assess the performance [generalization] of a fully-specified classifier.


Since our goal is to find the network with the best performance on new data, the simplest way to compare different networks is to evaluate the error function on data that is independent of the training data. Various networks are trained by minimizing an appropriate error function defined with respect to a training data set. Their performance is then compared by evaluating the error function on an independent validation set, and the network with the smallest validation error is selected.
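Selecting the network with the smallest validation error takes only a few lines. In this hypothetical sketch, each trained candidate is represented as a function from an input window to a predicted value. Mean squared error stands in for “an appropriate error function”, which is an assumption.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Picks the candidate predictor with the lowest validation error (illustrative sketch). */
final class ModelSelection {
    /** Mean squared error of a predictor over (input window, target) pairs. */
    static double mse(ToDoubleFunction<double[]> predictor, double[][] inputs, double[] targets) {
        double sum = 0;
        for (int k = 0; k < targets.length; k++) {
            double e = targets[k] - predictor.applyAsDouble(inputs[k]);
            sum += e * e;
        }
        return sum / targets.length;
    }

    /** Returns the index of the candidate with the smallest validation MSE, or -1 if the list is empty. */
    static int selectBest(List<ToDoubleFunction<double[]>> candidates,
                          double[][] valInputs, double[] valTargets) {
        int best = -1;
        double bestErr = Double.POSITIVE_INFINITY;
        for (int c = 0; c < candidates.size(); c++) {
            double err = mse(candidates.get(c), valInputs, valTargets);
            if (err < bestErr) {
                bestErr = err;
                best = c;
            }
        }
        return best;
    }
}
```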

“One-step-ahead prediction”, also known as “single-step prediction”, is when all input units are given actual observed values of the time series. The error measure used is the average relative variance (ARV), which is independent of the dynamic range of the data and of the length of the series. Compared with the threshold autoregressive (TAR) model, the net gives comparable results. Significant differences appear for predictions more than one step into the future.
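The average relative variance compares the squared prediction error with the variance of the observed values over the evaluation set: ARV = Σ(x_t - x̂_t)² / Σ(x_t - x̄)². A value of 1 corresponds to always predicting the mean of the series, and 0 to perfect predictions. Below is a minimal Java sketch of this common form; the class name `Arv` is an assumption.

```java
/** Average relative variance: squared prediction error normalised by the variance of the observations. */
final class Arv {
    static double arv(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length)
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        double mean = 0;
        for (double a : actual) mean += a;
        mean /= actual.length;
        double errSum = 0, varSum = 0;
        for (int k = 0; k < actual.length; k++) {
            double e = actual[k] - predicted[k];   // prediction error
            double d = actual[k] - mean;           // deviation from the mean of the evaluation set
            errSum += e * e;
            varSum += d * d;
        }
        return errSum / varSum;   // undefined (NaN or infinity) for a constant series
    }
}
```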

There are other ways to predict more than one step into the future. One of them is “iterated single-step” prediction, where each prediction is fed back into the net and used as an input for the next prediction. The obvious problem with using predicted values as inputs is that the prediction error grows with time and with the number of predictions.
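Iterated single-step prediction can be sketched as a loop that slides the input window forward and appends each prediction as the newest input. The one-step predictor is passed in as a function, for instance a trained network's forward pass. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Iterated single-step forecasting (hypothetical helper, not part of the applet). */
final class IteratedForecast {
    /**
     * Predicts `steps` future values. After each step the window shifts left by one
     * and the new prediction becomes its most recent value.
     */
    static double[] forecast(ToDoubleFunction<double[]> oneStep, double[] lastObserved, int steps) {
        double[] window = Arrays.copyOf(lastObserved, lastObserved.length);
        double[] out = new double[steps];
        for (int s = 0; s < steps; s++) {
            double next = oneStep.applyAsDouble(window);
            out[s] = next;
            // Drop the oldest value and feed the prediction back in as the newest input.
            System.arraycopy(window, 1, window, 0, window.length - 1);
            window[window.length - 1] = next;
        }
        return out;
    }
}
```

Because every predicted value becomes an input to the next step, an error in an early prediction is carried into all later ones. This is the error growth described above.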

The other approach is “direct multi-step” prediction, where the net is trained to predict several steps ahead directly. For sunspot prediction, the results were much worse than with single-step prediction; however, this does not apply to all predictions. Most often, “direct multi-step” prediction is significantly better.
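For direct multi-step prediction, it is the training pairs that change, not the network: each input window is paired with the value `horizon` steps after its last element. Below is a hypothetical sketch of building those pairs; with `horizon = 1` it produces ordinary single-step pairs.

```java
import java.util.Arrays;

/** Builds training pairs for direct h-step-ahead prediction (illustrative sketch). */
final class DirectMultiStep {
    /** Number of complete (window, target) pairs available in the series. */
    static int count(double[] series, int window, int horizon) {
        return Math.max(series.length - window - horizon + 1, 0);
    }

    /** Input k covers series[k .. k + window - 1]. */
    static double[][] inputs(double[] series, int window, int horizon) {
        double[][] x = new double[count(series, window, horizon)][];
        for (int k = 0; k < x.length; k++) x[k] = Arrays.copyOfRange(series, k, k + window);
        return x;
    }

    /** Target k is the value `horizon` steps after the last element of input k. */
    static double[] targets(double[] series, int window, int horizon) {
        double[] y = new double[count(series, window, horizon)];
        for (int k = 0; k < y.length; k++) y[k] = series[k + window - 1 + horizon];
        return y;
    }
}
```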
 
 
 
 

Sample program

The program runs as a Java applet in an HTML file. It is divided into two files, each containing one class:

Class abouttime:
Handles the graphical representation of the applet, controls user interaction, and interfaces with the backpropagation network.
Class BNP:
Instantiates a three-layer backpropagation network with a single output unit.
Its constructor takes as arguments the number of neurons in the input layer, the number of neurons in the hidden layer, the sigmoid threshold and the elasticity.
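The text does not specify how the threshold and the elasticity enter the sigmoid. One common parameterization, assumed here and not confirmed by the BNP source, shifts the curve by the threshold and scales its slope by the elasticity: f(x) = 1 / (1 + exp(-elasticity · (x - threshold))). The class name `ShiftedSigmoid` is hypothetical.

```java
/** Hypothetical sigmoid with a threshold (horizontal shift) and an elasticity (slope) parameter. */
final class ShiftedSigmoid {
    final double threshold;
    final double elasticity;

    ShiftedSigmoid(double threshold, double elasticity) {
        this.threshold = threshold;
        this.elasticity = elasticity;
    }

    /** Returns 1 / (1 + exp(-elasticity * (x - threshold))), a value in (0, 1); equals 0.5 at the threshold. */
    double apply(double x) {
        return 1.0 / (1.0 + Math.exp(-elasticity * (x - threshold)));
    }

    /** Derivative used by backpropagation: elasticity * f(x) * (1 - f(x)). */
    double derivative(double x) {
        double f = apply(x);
        return elasticity * f * (1 - f);
    }
}
```

A larger elasticity makes the transition around the threshold steeper, so the unit behaves more like a step function.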


Setting up experiments with the sunspot time series
 

Learning rate:
As mentioned before, a low learning rate should be used when training the network. It will be interesting to experiment with learning rate values between 0.1 and 0.3.

Momentum:
As the training set is much larger than the number of neurons in the input layer, it is recommended to keep the momentum between 0.9 and 1.

Number of input neurons:
There is no mathematical rule for choosing the number of input neurons, so we assume that a value between 5 and 15 will be efficient while keeping the network reasonably fast to train.

Number of hidden neurons:
Likewise, there is no mathematical rule for choosing the number of hidden neurons, so we assume that a value between 5 and 10 will be efficient while keeping the network reasonably fast to train.

Training set:
The training set should be as representative of the time series as possible. Previous research in this field usually uses a large training set to make sure the network captures the overall shape of the series.

Number of iterations:
We assume that, up to a point, more iterations give better predictions, although too many can lead to overtraining. We therefore plan the predictions on a thousand iterations.
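The suggested ranges can be turned into a list of experiments to run. This sketch picks a few grid points inside each range; the specific values (for example 0.95 and 0.99 for the momentum) are assumptions. Each configuration would then be trained and compared on the validation set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Enumerates experiment settings from the ranges suggested above (grid points are assumptions). */
final class ExperimentGrid {
    /** One experiment configuration for a three-layer network with a single output unit. */
    static final class Config {
        final double learningRate;
        final double momentum;
        final int inputUnits;
        final int hiddenUnits;
        final int iterations;

        Config(double learningRate, double momentum, int inputUnits, int hiddenUnits, int iterations) {
            this.learningRate = learningRate;
            this.momentum = momentum;
            this.inputUnits = inputUnits;
            this.hiddenUnits = hiddenUnits;
            this.iterations = iterations;
        }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "eta=%.2f alpha=%.2f in=%d hidden=%d iters=%d",
                    learningRate, momentum, inputUnits, hiddenUnits, iterations);
        }
    }

    static List<Config> build() {
        double[] rates = {0.1, 0.2, 0.3};       // learning rate between 0.1 and 0.3
        double[] momenta = {0.9, 0.95, 0.99};    // momentum between 0.9 and 1
        int[] inputs = {5, 10, 15};              // input neurons between 5 and 15
        int[] hidden = {5, 10};                  // hidden neurons between 5 and 10
        List<Config> grid = new ArrayList<>();
        for (double r : rates)
            for (double m : momenta)
                for (int i : inputs)
                    for (int h : hidden)
                        grid.add(new Config(r, m, i, h, 1000));   // a thousand iterations each
        return grid;
    }
}
```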