Gerbyte : FANN Primer

You have navigated to this page as you have taken an interest in using Artificial Neural Networks (ANNs) to either:

  • make your life easier;
  • fix a problem; or
  • build Johnny 5 or a real working Terminator.

    There seems to have been an interest in AI popping up all over the place recently, so I have decided to write this page to get you started with ANNs should you wish to have a go yourself. I have helped a few of you in the past, and this page is basically what I have told them, but in written form and without the Oldham accent. :)

    But before you read on, let me mention a couple of things:

  • Artificial Intelligence is a MASSIVE subject that covers many areas. Artificial neural networks are just one of them.
  • I am not a professor/expert in ANNs. I have been interested in ANNs for a VERY long time now; I have studied them at different levels, with many very good people, and been in touch with other great minds on the subject. That said, I do not claim to know the ins and outs of everything regarding this, but if you need help I will give it if I can.
  • This page is a guide to starting with FANN. I've explained a few things here (maybe a bit too much in places), all to give an outline of how to create simple but effective ANNs. I would be typing forever if I covered absolutely everything. This isn't a reference guide, after all; just a simple guide covering all you need to know to get started. :)
  • "I have an idea to do X. Can this be done?" Get in touch if you wish to speak to me regarding ANNs, as I'm always open to helping people with their ideas. If I see pitfalls then I will say so; just don't be disheartened.
  • ANNs are fun! Don't be scared! Read on and open your mind to all the possibilities. :)



    ==What This Page Doesn't Discuss==

    Well, to name a few:

  • local minima problems
  • classification
  • plateaus
  • learning rates
  • function steepness
  • thresholds
  • activation functions
  • transfer functions
  • ALL types of ANN
  • ED209
  • linear, sigmoid etc.
  • etc. etc.
    ...and the list goes on and on. The main reason is that you don't really need to know about these to get started, but be aware of them if you wish to go further with this subject.


    ==What Is FANN?==

    FANN is a really lightweight, very easy-to-use library that can be used to create Artificial Neural Networks (ANNs). This is what I've been using for the past decade or so on a Linux machine. All examples on this page are for Linux.

    The FANN webpage (at time of writing) and its VERY good reference manual can be found at libfann.github.io, should you wish to visit and gaze at them lovingly.


    ==Installing FANN==

    Below is the method using svn on a Linux (Karky) machine (Git can be used too; GitHub serves the same repository either way). For any other operating system, refer to the FANN website above.

    svn export https://github.com/libfann/fann/trunk fann
    cd fann
    cmake .
    sudo make install

    Once installed, you can make the examples:

    cd examples
    make all

    NOTE: After installing fann, you will more than likely want to create your own projects in a completely different folder, so remember to use the -lfann flag when compiling your code. For example:

    gcc xor_train.c -o xor_train -lfann


    Then run the xor_train example:

    ./xor_train

    We should get a result similar to the following:

    Creating network.
    Training network.
    Max epochs     1000. Desired error: 0.0000000000.
    Epochs            1. Current error: 0.3927980661. Bit fail 4.
    Epochs           10. Current error: 0.2412641048. Bit fail 4.
    Epochs           20. Current error: 0.0420324467. Bit fail 4.
    Epochs           30. Current error: 0.0036144638. Bit fail 4.
    Epochs           40. Current error: 0.0006126538. Bit fail 4.
    Epochs           47. Current error: 0.0000449758. Bit fail 0.
    Testing network. 0.000025
    XOR test (-1.000000,-1.000000) -> -0.989304, should be -1.000000, difference=0.010696
    XOR test (-1.000000,1.000000) -> 0.993458, should be 1.000000, difference=0.006542
    XOR test (1.000000,-1.000000) -> 0.993399, should be 1.000000, difference=0.006601
    XOR test (1.000000,1.000000) -> -0.986030, should be -1.000000, difference=0.013970
    Saving network.
    Cleaning up.
    I say similar because the values in your run will be different to mine, due to the internal workings of an ANN (I'm not going to explain why just yet; below I'll give a brief outline of ANNs, in plain English as much as possible, so it all makes sense). If they ARE the same, I'd be VERY surprised! Try running it a second time. They're not the same, are they? Didn't think so! ;)


    ==Installation Problems==

    I have only ever come across one of these, which happened just now when I installed FANN onto my Karky laptop.

    ./xor_train: error while loading shared libraries: libfann.so.2: cannot open shared object file: No such file or directory
    This may show when trying to train. It is usually because the libraries have been installed to /usr/local/lib instead of /usr/lib. To get around this, we can either manually copy the fann files to /usr/lib, or be more elegant and create symbolic links as follows:

    sudo ln -s /usr/local/lib/*fann* /usr/lib/
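    Alternatively (and assuming /usr/local/lib is listed in /etc/ld.so.conf or its includes, which it is on most distros), you can simply refresh the dynamic linker cache after installing:

    sudo ldconfig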

    Any other problems then please refer to the FANN webpage mentioned above.



    ==Dissecting A Neural Network==

    Here I will outline (in a nutshell) what an ANN is, and describe all you need to know to get started to build a basic yet effective ANN.
    ANNs are a MASSIVE subject on which there are thousands of information pages and no doubt tutorials online, so here I will only talk about the basics. I will then dissect the above C program that was used to train the network, so that you gain an insight into the world of ANNs and see for yourself how easy they are to understand, and how easy they are to program. But first, the basics.

    What is a neuron?

    A set of neurons is what makes up an ANN. These are modelled on biological neurons and were originally built to (try to) function the same way. Basically, a neuron takes a bunch of numerical inputs via an input vector (the dendrites, biologically speaking), does something mathematical with these values (this bit would be the soma in the biological world) and then passes the new value to a single output (which represents the axon).
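    If that sounds a bit abstract, here is a minimal sketch in C of what a single neuron does. The function name and the tanh-style activation are my own illustration, not FANN internals:

    #include <math.h>

    /* A single artificial neuron: weighted sum of the inputs ("dendrites"),
       squashed by an activation function, producing one output ("axon"). */
    double neuron_output(const double *inputs, const double *weights,
                         int num_inputs, double bias)
    {
        double sum = bias;                  /* start with the bias term */
        for (int i = 0; i < num_inputs; i++)
            sum += inputs[i] * weights[i];  /* "soma": sum the weighted inputs */
        return tanh(sum);                   /* "axon": squashed output in -1..1 */
    }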

    What is a perceptron?

    In its simplest form, a perceptron is a single-layer ANN and is brilliant for problems where the results are binary, or 'linearly separable' (that is, data that can be separated on a graph using a straight line. Plot the four results of the AND gate on a graph, using -1 and 1 for the two inputs as the x and y coordinates. You'll see that the -1 and 1 results can be separated with a straight line. Now try it with XOR. The results for XOR are not linearly separable, so a perceptron cannot solve that problem).

    What is a threshold?

    A threshold is a value used to denote whether or not a neuron "fires". In the case of a perceptron, if the threshold is set to 0.7, the weighted sum reaching the perceptron would have to be 0.7 or more for it to fire and give a 1 result; otherwise it will not fire and will give a 0 result.
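    To tie those last two answers together, here is a hypothetical hard-threshold perceptron solving the AND gate. The weights and threshold are hand-picked, purely for illustration:

    #include <stdio.h>

    /* A perceptron with a hard threshold: it fires (returns 1) only if the
       weighted sum of its inputs reaches the threshold. */
    int perceptron_and(int a, int b)
    {
        double threshold = 1.5;
        double sum = 1.0 * a + 1.0 * b;   /* both weights are 1.0 */
        return sum >= threshold ? 1 : 0;  /* fires only when both inputs are 1 */
    }

    int main(void)
    {
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                printf("%d AND %d = %d\n", a, b, perceptron_and(a, b));
        return 0;
    }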

    What is a bias neuron?

    A bias neuron is basically an extra neuron that is attached to one (or more) layers and connects to all the neurons in the next layer, and it always fires a 1 (i.e. it is always activated). The aim of the bias is to help with learning, as it can shift the activation function result to the left or right.

    What is an activation function?

    This is the mathematical function that is run to decide the output value of each neuron. If plotted on a graph, these usually give a sigmoid shape (that is, an 'S' shape for all you thick people out there). In the case of a perceptron this would be a straight line, as mentioned above.
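    As an illustration (this is the general idea rather than FANN's exact formula), a symmetric sigmoid can be written like this, with a steepness parameter controlling how sharp the 'S' is:

    #include <math.h>

    /* A tanh-style symmetric sigmoid: output runs from -1 to 1. A higher
       steepness makes the S-curve sharper (closer to a step function). */
    double symmetric_sigmoid(double x, double steepness)
    {
        return tanh(steepness * x);
    }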

    What is a layer?

    A layer is a single neuron or a group of neurons that has inputs and outputs. NOTE: A single-layer network contains one input layer, one output layer and one layer of neuron(s). Although this could be seen as a three-layer model, it counts as one layer because there is one layer of neurons. All ANNs have an input and an output layer.

    What is a hidden layer?

    A hidden layer is an extra set of neurons that the previous layer of neurons connects to. These may be fully connected or connected in an arbitrary way, depending on how complicated you make your network. ;)

    What is a weight?

    A weight is attached to each input of each neuron. Weights are part of the magic of ANNs, as they are changed during training to bring the overall error as close to the desired error as possible. They change by increasing or decreasing slightly as data passes through the network, or through back propagation.

    What is an epoch?

    During training, the training data passes through the ANN, each 'record' tweaking the weights as it goes. When all records have passed through, the ANN is said to have trained for one epoch, so an epoch is one pass through the full set of training data. Typically, an ANN will take thousands of epochs to train on a single data set.

    What is over- and undertraining?

    Overtraining is when you train an ANN until its overall error is too small. You might think, "isn't this desirable?" Well, the answer is yes and no. If the problem is something like the XOR problem mentioned above, then it is fine: there are effectively only two inputs, each of which can take only two values, so overtraining wouldn't be a problem. Overtraining an ANN that uses patient data to predict heart problems, however, would pose a problem: the ANN would become too specific to the training data, so when tested with data it hasn't seen before it will give a wildly-out result. I.e. if a test record very similar to one training record were given to the trained ANN, the result would be fitted towards that one record rather than an ideal blend of several records, which could ultimately flip the outcome for that test record.

    Undertraining is when the ANN isn't trained enough. For example (this is hard to do, but) if you undertrained on the XOR problem above, it would be possible to get a positive result from inputs of 0.62 and 0.51. The results would be too vague, and this vagueness leads to effectively random results, which is also not ideal in the real world.

    The most successful ANNs are neither undertrained nor overtrained.

    What is training and test data?

    In order to successfully train an ANN we need to get a load of data (the more the better) and use it to train. This is data that HAS the desired outputs attached, and the ANN uses these results to work towards.

    Test data is data we use to test the ANN once it has been trained. This is data that we know the outputs of, but we do not feed the outputs in; we use them to verify the ANN's outputs once the test run has completed.

    NOTE: Some people may argue that you should have 3 sets: a training set, a verification set and a test set. Some implementations use the verification set not only to test the network but also to give the ANN some "last minute tweaks" in order to optimise it. On this page I will only deal with training and testing data.
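    Incidentally, if your data lives in a FANN training file, you can do this kind of split with FANN itself. A rough sketch, assuming a hypothetical all.data file and the fann_shuffle_train_data/fann_subset_train_data calls (FANN 2.2 or later):

    #include "fann.h"

    int main(void)
    {
        /* Load the full data set, shuffle it, then carve off 70% / 30%. */
        struct fann_train_data *all = fann_read_train_from_file("all.data");
        unsigned int total = fann_length_train_data(all);
        unsigned int split = (total * 70) / 100;

        fann_shuffle_train_data(all);   /* avoid any ordering bias */
        struct fann_train_data *train = fann_subset_train_data(all, 0, split);
        struct fann_train_data *test  = fann_subset_train_data(all, split, total - split);

        fann_save_train(train, "train.data");
        fann_save_train(test, "test.data");

        fann_destroy_train(all);
        fann_destroy_train(train);
        fann_destroy_train(test);
        return 0;
    }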

    I think these make up the basics. Let me know if there is anything I've missed.


    ==How Does An ANN Train?==

    Well, this depends on the type of network chosen. There are bloody loads of them, such as feed-forward networks, back-propagation networks, self-organising networks, radial basis function networks etc. etc. etc.
    As I'm not going to describe them all or how they work, I'm going to give a generic description of what happens, in a nutshell; depending on the ANN used, this description may be in a different order or even seen as wrong, but you get the idea. If you are interested in learning more then there is loads of info on the net. If you do find owt good on the net, though, that may be interesting or useful to others, then put a link at the bottom of this page, as ANNs are a great passion of mine! :)
    So, anyway, in a nutshell...

  • A desired total error is given to the network. This is the error that the ANN will work towards when training; when the ANN reaches (or dips below) this error, training will (or should) stop.
  • Initially, all the weights in the ANN are set to random values, usually small ones, for example between -0.1 and 0.1 (there are ways the ANN can decide this on setup, but I don't need to describe them for this page). They are NOT all set to 0, however (think about why that would be).
  • The maximum number of epochs is also given to the ANN, so that it knows how many times at most to pass over the training data.
  • The training data is applied to the network and the ANN begins its training:

  • The data moves through the network, slightly changing the weights of the inputs so that the total error gets closer to the desired error. This is not a random act where weights are altered by random amounts; there are various schemes for getting closer to the desired error, and the amounts the weights change by are not fixed either (again, I won't elaborate here).
  • At neuron level, each input value is multiplied by its corresponding weight (there is a weight attached to every input). When all these weighted inputs reach the neuron they are summed and, depending on the activation function, give a resulting value that is passed on to the next neuron/output layer.

  • The total error is summed and compared after every epoch to see how close it is to the desired error. If the total error is less than or equal to the desired error, training finishes.

  • And that's it! Simple eh?
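    If you fancy seeing the flavour of a weight update in code, here is the simplest possible case: a delta-rule update for a single linear neuron. Real networks use cleverer schemes (back propagation, RPROP and friends), but the "nudge the weights towards a smaller error" idea is the same:

    /* One training step for a single linear neuron (the delta rule).
       Purely illustrative - this is not how FANN's RPROP works internally. */
    void train_step(double *weights, double *bias,
                    const double *inputs, int num_inputs,
                    double target, double learning_rate)
    {
        double output = *bias;
        for (int i = 0; i < num_inputs; i++)
            output += weights[i] * inputs[i];

        double error = target - output;             /* how far off are we? */
        for (int i = 0; i < num_inputs; i++)
            weights[i] += learning_rate * error * inputs[i];
        *bias += learning_rate * error;             /* the bias gets nudged too */
    }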



    ==Testing The ANN==

    Once the ANN has been trained, it can be tested. The best way to do this is to use a set of data that you know the outcomes of, and push it through the trained ANN. More likely than not you will get good results - not necessarily perfect, but good enough to work with in the real world.

    If you are happy with the results then you have a working model.


    ==Dissecting The XOR Example==

    FANN comes with a load of examples; the easiest one (in my opinion) is the XOR example. The source code is as follows:
    #include <stdio.h>
    #include "fann.h"

    int FANN_API test_callback(struct fann *ann, struct fann_train_data *train,
                               unsigned int max_epochs, unsigned int epochs_between_reports,
                               float desired_error, unsigned int epochs)
    {
        printf("Epochs     %8d. MSE: %.5f. Desired-MSE: %.5f\n",
               epochs, fann_get_MSE(ann), desired_error);
        return 0;
    }

    int main()
    {
        fann_type *calc_out;
        const unsigned int num_input = 2;
        const unsigned int num_output = 1;
        const unsigned int num_layers = 3;
        const unsigned int num_neurons_hidden = 3;
        const float desired_error = (const float) 0;
        const unsigned int max_epochs = 1000;
        const unsigned int epochs_between_reports = 10;
        struct fann *ann;
        struct fann_train_data *data;

        unsigned int i = 0;
        unsigned int decimal_point;

        printf("Creating network.\n");
        ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output);

        data = fann_read_train_from_file("xor.data");

        fann_set_activation_steepness_hidden(ann, 1);
        fann_set_activation_steepness_output(ann, 1);

        fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
        fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);

        fann_set_train_stop_function(ann, FANN_STOPFUNC_BIT);
        fann_set_bit_fail_limit(ann, 0.01f);

        fann_set_training_algorithm(ann, FANN_TRAIN_RPROP);

        fann_init_weights(ann, data);

        printf("Training network.\n");
        fann_train_on_data(ann, data, max_epochs, epochs_between_reports, desired_error);

        printf("Testing network. %f\n", fann_test_data(ann, data));

        for(i = 0; i < fann_length_train_data(data); i++)
        {
            calc_out = fann_run(ann, data->input[i]);
            printf("XOR test (%f,%f) -> %f, should be %f, difference=%f\n",
                   data->input[i][0], data->input[i][1], calc_out[0], data->output[i][0],
                   fann_abs(calc_out[0] - data->output[i][0]));
        }

        printf("Saving network.\n");
        fann_save(ann, "xor_float.net");

        decimal_point = fann_save_to_fixed(ann, "xor_fixed.net");
        fann_save_train_to_fixed(data, "xor_fixed.data", decimal_point);

        printf("Cleaning up.\n");
        fann_destroy_train(data);
        fann_destroy(ann);

        return 0;
    }
    Rather than putting comments in the code, I will extract the relevant bits and explain these below.
    const unsigned int num_input = 2;
    const unsigned int num_output = 1;
    const unsigned int num_layers = 3;
    const unsigned int num_neurons_hidden = 3;
    const float desired_error = (const float) 0;
    const unsigned int max_epochs = 1000;
    const unsigned int epochs_between_reports = 10;

    These are the constants used in building the ANN. They are mostly self-explanatory. Here the desired error is set to 0, which is fine for the XOR example as it can actually be achieved; HOWEVER, in the real world, choose something more realistic. One thing to notice here is num_layers. The value used here MUST match the true number of layers in the network: if you wish to add/remove a layer, this value must be increased/decreased accordingly. This is important for the next step...


    ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output);

    This line creates a standard fully connected network. As mentioned in the previous step, the values must match. The first argument denotes how many layers there are, and it MUST be followed by that number of arguments, each specifying the number of neurons in one layer. So in simple terms, with the values above, the call fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output); could be hard-coded as fann_create_standard(3, 2, 3, 1); and it would work exactly the same. If you wanted to add another layer of 2 neurons AFTER the current hidden layer, you would declare something like const unsigned int num_neurons_hidden2 = 2; at the top of the file, change num_layers to 4 and then call ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_neurons_hidden2, num_output); to create the ANN. Or just use ann = fann_create_standard(4, 2, 3, 2, 1);. It's all quite simple really.
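    Spelled out as a block, that hypothetical four-layer version looks like this:

    /* 4 layers: 2 inputs, a hidden layer of 3, a second hidden layer of 2, 1 output.
       Remember num_layers must be changed to 4 at the top of the file. */
    const unsigned int num_neurons_hidden2 = 2;
    ann = fann_create_standard(num_layers, num_input, num_neurons_hidden,
                               num_neurons_hidden2, num_output);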


    data = fann_read_train_from_file("xor.data");

    Speaking of self-explanatory - this line reads in the data file used to train the ANN. See the FANN Training Data File Structure section below for a description of the data layout.


    fann_set_activation_steepness_hidden(ann, 1);
    fann_set_activation_steepness_output(ann, 1);

    These lines set the "steepness" of the activation function. The steepness denotes how fast the function goes from minimum to maximum, so a higher (steeper) value gives more aggressive training.


    fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
    fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);

    These lines set which type of activation function to use. In this case we are using FANN_SIGMOID_SYMMETRIC, as the XOR problem gives results that are not linearly separable. If we were to create something straightforward, like the AND gate described above, we could either leave this as it is or change it to something like FANN_LINEAR. There are bloody loads of these functions in FANN, which can be found in the reference manual.


    fann_set_train_stop_function(ann, FANN_STOPFUNC_BIT);
    fann_set_bit_fail_limit(ann, 0.01f);

    These two lines set the stop function used when training. The first line tells the ANN to use FANN_STOPFUNC_BIT, which stops based on the number of bits that fail rather than the mean square error (I'm gonna stop there before I get too technical). The second line is only used when the stop function is set to FANN_STOPFUNC_BIT: it denotes the maximum accepted difference between an output and its desired value before that output counts as a failed bit. Think decimal places.
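    In plain terms, a "bit fail" is simply an output that misses its desired value by more than the limit. My own illustration of the idea (not FANN's source code):

    #include <math.h>

    /* Count the outputs that miss their desired values by more than the limit. */
    unsigned int count_bit_fails(const double *outputs, const double *desired,
                                 int num_outputs, double limit)
    {
        unsigned int fails = 0;
        for (int i = 0; i < num_outputs; i++)
            if (fabs(outputs[i] - desired[i]) > limit)
                fails++;    /* this output "bit" has failed */
        return fails;
    }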


    fann_set_training_algorithm(ann, FANN_TRAIN_RPROP);

    This sets the training algorithm to use. I'm not going to discuss what FANN_TRAIN_RPROP is, as it is beyond the scope of this page. Basically, there are loads of training algorithms out there, and FANN has a lot of them coded and ready for use, selectable via this call. If you are interested in the different algorithms provided by FANN then check out this [http://libfann.github.io/fann/docs/files/fann_data-h.html#fann_train_enum link].


    fann_init_weights(ann, data);

    This line initialises the weights using an algorithm that looks at the training data to choose more sensible values than purely random ones. If you wanted purely random values instead, you would put the line fann_randomize_weights(ann, -0.1, 0.1); here instead, giving min/max values of -0.1 and 0.1 respectively (in some cases, a purely random initialisation can result in better training).


    fann_train_on_data(ann, data, max_epochs, epochs_between_reports, desired_error);

    This line is self explanatory.

    This is the end of the training of the network. The next bit of the program tests what results we get when running the trained network against the inputs used to train it (this is XOR - we can't really use any values other than floats close to -1 and 1).


    for(i = 0; i < fann_length_train_data(data); i++)
    {
        calc_out = fann_run(ann, data->input[i]);
        printf("XOR test (%f,%f) -> %f, should be %f, difference=%f\n",
               data->input[i][0], data->input[i][1], calc_out[0], data->output[i][0],
               fann_abs(calc_out[0] - data->output[i][0]));
    }

    This loop runs each record through the trained network by calling fann_run(ann, data->input[i]) and prints the network's output next to the desired output, along with the difference. In this example we are only using four records, so this results in just four lines of output.


    fann_save(ann, "xor_float.net");

    This line saves the trained ANN to a file in a floating point format.
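    The whole point of saving, of course, is that you can load the trained network later without retraining it. A minimal sketch, assuming the xor_float.net file produced above:

    #include <stdio.h>
    #include "fann.h"

    int main(void)
    {
        /* Load the previously trained network and run it on one input pair. */
        struct fann *ann = fann_create_from_file("xor_float.net");
        fann_type input[2] = { -1, 1 };
        fann_type *output = fann_run(ann, input);

        printf("XOR(%f, %f) = %f\n", input[0], input[1], output[0]);
        fann_destroy(ann);
        return 0;
    }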


    decimal_point = fann_save_to_fixed(ann, "xor_fixed.net");

    This line saves the trained ANN to a file in a fixed point format, returning the bit position of the fixed point.


    fann_save_train_to_fixed(data, "xor_fixed.data", decimal_point);

    This line saves the training data to a fixed point format.


    fann_destroy_train(data);
    fann_destroy(ann);

    These lines free up the memory used by the training data and the ANN before the program ends.


    There we go, it's all very simple.



    ==FANN Training Data File Structure==

    If we look at the training data for the XOR training example, we see the following:

    4 2 1
    -1 -1
    -1
    -1 1
    1
    1 -1
    1
    1 1
    -1
    So what is this and what does it mean? This can be explained best with a little help from an XOR truth table.

    Input1  Input2  Outcome
       0       0       0
       0       1       1
       1       0       1
       1       1       0

    The first line of the training data gives information about the data set: the number of records, then the number of inputs per record, then the number of outputs per record. So we can read the first line as "we have 4 records, each having 2 inputs and 1 output".
    The second line gives the input data for the first record, and the third line gives the desired outcome for that record.
    NOTE: The training data uses -1 instead of 0 for the inputs and outputs. This separates the two extremes more, as negative values are used during training (in this case).
    So we can read the second and third lines as "this record has one input of -1 and another input of -1, with a desired output of -1". This corresponds to the first line of the XOR truth table above. The same reading applies to the rest of the file.

    FANN will also accept data files where each record is kept on a single line. This is my personal preference, especially if your ANN has, say, 5 inputs and 5 outputs (you can scan halfway down a big file and immediately know which values are inputs and which are outputs, as well as doing a line count to check the actual number of records against the header). So the file above can be laid out as follows:

    4 2 1
    -1 -1 -1
    -1 1 1
    1 -1 1
    1 1 -1
    and this will still work. :)
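    Incidentally, you don't have to write these files by hand. If your FANN version has fann_create_train (2.2 and later, I believe), you can build the same XOR set in code and save it:

    #include "fann.h"

    int main(void)
    {
        /* 4 records, 2 inputs, 1 output - the XOR truth table in -1/1 form. */
        fann_type in[4][2] = { {-1, -1}, {-1, 1}, {1, -1}, {1, 1} };
        fann_type out[4] = { -1, 1, 1, -1 };

        struct fann_train_data *data = fann_create_train(4, 2, 1);
        for (unsigned int i = 0; i < 4; i++) {
            data->input[i][0] = in[i][0];
            data->input[i][1] = in[i][1];
            data->output[i][0] = out[i];
        }
        fann_save_train(data, "xor.data");
        fann_destroy_train(data);
        return 0;
    }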
    And that's all there is to creating and training a neural network. Simple! :)



    ==A Few Tips For Beginner's Success==

    People in the past have thought that creating an ANN is a daunting task. IT ISN'T! That said, an ANN is not the god of all decision making either, but it can give a bloody good and usually successful attempt at solving a problem.

    For example, say horse A beats horse B. If the same race was run again, under EXACTLY the same conditions, would horse A beat horse B? Most probably it would. But what if those horses ran under the same conditions a further 1000 times - would horse A win every single race? Almost certainly not! The same goes for an ANN: it cannot guarantee a perfect answer every time.

    Anyway, here are a few of my tips on building successful and effective ANNs.

  • Start small. I really cannot stress this enough; it's what I seem to have told EVERYBODY I have ever had an ANN discussion with! What I mean by this is start with a low number of inputs and increase them if you need to. This allows for quicker training as well as being easier on memory and processing time. You may not think that starting small can be effective, but give it a try and you will probably surprise yourself (self-driving car projects have been known to convert the images of the road they are "looking" at to greyscale and massively reduce the resolution). If you need to do extra work on the input data to allow for smaller inputs then do it; the extra work will pay off in the long run. Trust me on this one!
  • Get good data sets. If you have a good, clean set of data, with enough of it to cover the full range of what you want to produce a solution for, then your ANNs will train with better results. If you are dividing this data into training/testing data then 70% training data with 30% test data is usually sufficient (if you're also verifying then I would use 60% training, 20% verification and 20% test). A rough sketch of doing this split with FANN itself appears back in the training and test data section.
  • Normalise input data. Only if you need to, though. Good normalised inputs can produce great results, especially if the normalisation takes into account future changes to your problem's data ("Oh dear! That value I feed into my ANN has been changed and now goes up to 20 rather than the 10 I originally trained my ANN with! That means the scaling has buggered up! Bollocks to it!").
  • Set a sensible desired error. Depending on the problem, I will personally use between 0.005 and 0.05. Like I say, it depends on the problem. If you set it too low, not only will you end up with an overtrained ANN, but it will take forever to get there - and probably never will (combine that with a shit load of input neurons and loads of hidden layers and you're wasting your time).
  • Don't use too many hidden layers. They're good for separating the data ready for the output layer to give the desired outcome, but too many will add too much unnecessary... erm... everything.
  • Save often, and uniquely, for every run. If you have an ANN that takes AGES to train, then I suggest setting it to save its state every so often, such as every 5 epochs, to a unique filename (see the sketch just after this list). That way you can stop the training and come back later (if you have a machine that is always switched on then ignore this). Also, you may find that the learning plateaus close to your desired error; if you save often you still have that network, so you can try again (saving to a different name) and keep whichever network works best for you.
  • Be patient. Rome was not built in a day. Playing with ANNs involves a LOT of trial and error. Depending on the state of the ANN it can take AGES to train, and a lot of the time the ANN will either not train at all, or it will plateau or get stuck in a local minimum. If this happens, try playing around with the number of layers and the number of neurons in each.
  • Have fun! :) It can be hard work, but a successful ANN that solves a complex problem can be very rewarding!
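    For the "save often" tip above, FANN's training callback is the natural hook. A sketch (the snapshot filenames and the every-5-epochs interval are my own choices):

    #include <stdio.h>
    #include "fann.h"

    /* Called every epochs_between_reports epochs during fann_train_on_data.
       Saves a uniquely named snapshot so a long run can be resumed or compared. */
    int FANN_API save_callback(struct fann *ann, struct fann_train_data *train,
                               unsigned int max_epochs,
                               unsigned int epochs_between_reports,
                               float desired_error, unsigned int epochs)
    {
        char filename[64];
        snprintf(filename, sizeof(filename), "snapshot_%06u.net", epochs);
        fann_save(ann, filename);
        printf("Epoch %u: MSE %.6f, saved %s\n", epochs, fann_get_MSE(ann), filename);
        return 0;   /* return -1 here to stop training early */
    }

    /* ...then, before training: */
    /* fann_set_callback(ann, save_callback);                       */
    /* fann_train_on_data(ann, data, max_epochs, 5, desired_error); */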



    There you go folks - that's all from me.

    Have fun,

    gerbil
    ----

    Next week I'll be discussing the pros and cons of wearing fleeces with embroidered 'Alaskan Wolf Scenes' and such during winter, and how not to get totally pissed off within the first thirty minutes of an automated telephone menu system and queue.




    Did you learn anything from this page? Buy me a coffee!
