| | #1 |
| Registered User Join Date: Nov 2006 Location: Lurking about
Posts: 212
| Neural Networks - Calculating Error Gradients
I'm working on implementing a neural network, but I'm having trouble calculating the error gradients for both the output and hidden layers. I'm using the identity function as my activation function: f(x) = x. I'm pretty clueless when it comes to calculus, so I'm really struggling with it. I found this Web page, which has a good explanation: http://www.willamette.edu/~gorr/clas...9/linear2.html but I just can't seem to figure out how to implement it. I have an example of the gradient calculation for a network that uses a sigmoid activation function:
Code:
inline double trainer::getOutputErrorGradient( double desiredValue, double outputValue )
{
    // Error gradient for a sigmoid output node: the sigmoid's derivative,
    // outputValue * ( 1 - outputValue ), times the error ( desiredValue - outputValue ).
    return outputValue * ( 1 - outputValue ) * ( desiredValue - outputValue );
}
Any help would be greatly appreciated.
__________________ If you take something apart and put it back together enough times, you will eventually have enough parts left over to build a second one. |
| IdioticCreation is offline | |
| | #2 |
| Crazy Fool Join Date: Jan 2003 Location: Canada
Posts: 2,588
| [Disclaimer: it's been a long time since I studied NNs.] Anyway, a couple of points. First, a neural network with a linear activation function doesn't need any hidden layers. They can be collapsed, since the whole network is just a linear combination (you can compute the outputs directly from the inputs). Second, the backprop algorithm propagates errors using the derivative of the activation function. The example you have is likely the closed form for the sigmoid (the most common function used, from what I remember). In your case the derivative is constant (it is 1 for f(x) = x), since the function is linear. Why do you want to use f(x) = x? You should use the sigmoid.
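To make the point about the derivative concrete, here is a minimal sketch. It is not the original poster's trainer class. It assumes the usual backprop delta rule: the output gradient is f'(net) * (desired - output), and a hidden gradient is f'(net) times the weighted sum of the gradients in the next layer. The function names (`identityDerivative`, `sigmoidDerivative`, `outputErrorGradient`, `hiddenErrorGradient`) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Derivative of the identity activation f(x) = x: always 1.
inline double identityDerivative(double /*output*/) { return 1.0; }

// Derivative of the logistic sigmoid, written in terms of its output.
inline double sigmoidDerivative(double output) { return output * (1.0 - output); }

// Output-layer error gradient: f'(net) * (desired - output).
inline double outputErrorGradient(double desired, double output,
                                  double (*derivative)(double)) {
    return derivative(output) * (desired - output);
}

// Hidden-layer error gradient: f'(net) * sum over k of (weight_k * gradient_k),
// where k runs over the nodes this hidden node feeds into.
inline double hiddenErrorGradient(double output,
                                  const std::vector<double>& weightsToNext,
                                  const std::vector<double>& nextGradients,
                                  double (*derivative)(double)) {
    assert(weightsToNext.size() == nextGradients.size());
    double weightedSum = 0.0;
    for (std::size_t k = 0; k < weightsToNext.size(); ++k)
        weightedSum += weightsToNext[k] * nextGradients[k];
    return derivative(output) * weightedSum;
}
```

With `identityDerivative` the output gradient reduces to plain `desired - output`, which is all the identity case needs; passing `sigmoidDerivative` instead reproduces the sigmoid formula quoted in the first post.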
__________________ jeff.bagu.org - Terrain rendering and other random stuff |
| Perspective is offline | |
| | #3 |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,926
| Neural networks more or less require a sigmoid to be neural networks; otherwise it's not a neural network. That said, there are a lot of sigmoid functions. The one I use the most is f(x) = sin(atan(x)) because it is easy to implement in hardware for feedforward networks. It takes a little longer to train than some other sigmoid functions, but once trained it executes a lot faster. Another sigmoid that is fast for feedforward networks is f(x) = x / (abs(x) + 1.0). However, for calculating the error gradient, it's just the difference between the actual output and the expected output. Most feedback functions use some algorithm for assigning a lesser error to each input based on that input's influence on the output.
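For anyone who wants to try the two functions mentioned above, here is a rough sketch of both with their derivatives, written in terms of the node's output in the same style as the sigmoid example in the first post. The identities used are sin(atan(x)) = x / sqrt(1 + x^2) and, for y = f(x), f'(x) = (1 - y^2)^(3/2) for the first function and f'(x) = (1 - |y|)^2 for the second. The function names are illustrative only; the second function is often called softsign.

```cpp
#include <cassert>
#include <cmath>

// f(x) = sin(atan(x)), computed as x / sqrt(1 + x^2); range (-1, 1).
inline double sinAtan(double x) { return x / std::hypot(1.0, x); }

// Derivative of sin(atan(x)) expressed through the output y = f(x):
// f'(x) = (1 + x^2)^(-3/2) = (1 - y^2)^(3/2).
inline double sinAtanDerivative(double y) {
    assert(y >= -1.0 && y <= 1.0);
    const double t = 1.0 - y * y;
    return t * std::sqrt(t);
}

// f(x) = x / (|x| + 1); range (-1, 1).
inline double softsign(double x) { return x / (std::fabs(x) + 1.0); }

// Derivative of x / (|x| + 1) expressed through the output y = f(x):
// f'(x) = 1 / (1 + |x|)^2 = (1 - |y|)^2.
inline double softsignDerivative(double y) {
    assert(y >= -1.0 && y <= 1.0);
    const double t = 1.0 - std::fabs(y);
    return t * t;
}
```

In a backprop trainer these derivative functions would take the place of the `outputValue * ( 1 - outputValue )` factor used for the logistic sigmoid.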
__________________ He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet Last edited by abachler; 02-09-2009 at 03:22 PM. |
| abachler is offline | |
| | #4 |
| Registered User Join Date: Nov 2006 Location: Lurking about
Posts: 212
| Oh, interesting. I only chose a linear function because I was (and still am) under the impression that the sigmoid function can only give outputs between 0 and 1 (or maybe some other range if the function is altered). I wanted analog outputs, and I don't see how you can do that with a sigmoid function. It seems to me that with a sigmoid function, if the output is close to one, then the neuron fires; if it is close to zero, then it does not fire. What if your output was a decimal number? Would you have to have enough output nodes to get it in binary? Or is there some other way?
edit: Ohhh, I was thinking. Could I just say f(x) = sin(atan(x))*9? Then it would return output between -9 and 9. Will it work like that? I was also thinking maybe some kind of step function, but that's just a guess.
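A quick sketch of the scaled activation asked about in the edit above, assuming a generic `scale` parameter in place of the hard-coded 9 (the function names are hypothetical). The only thing that changes for training is that the derivative picks up the same scale factor.

```cpp
#include <cassert>
#include <cmath>

// scale * sin(atan(x)), computed as scale * x / sqrt(1 + x^2).
// The output lies strictly between -scale and +scale.
inline double scaledSinAtan(double x, double scale) {
    assert(scale > 0.0);
    return scale * x / std::hypot(1.0, x);
}

// Derivative with respect to x: scale / (1 + x^2)^(3/2).
inline double scaledSinAtanDerivative(double x, double scale) {
    assert(scale > 0.0);
    const double h = std::hypot(1.0, x);
    return scale / (h * h * h);
}
```

Calling `scaledSinAtan(x, 9.0)` gives the -9 to 9 range from the question; whether a network trains well on targets near the ends of that range is a separate matter that this sketch does not address.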
Last edited by IdioticCreation; 02-09-2009 at 04:01 PM. |
| IdioticCreation is offline | |
| | #5 |
| Crazy Fool Join Date: Jan 2003 Location: Canada
Posts: 2,588
| An NN does not output the answer as a computation. The idea is that each output node represents some answer, and the values are like a probability distribution over the answers. For example, if you are using an NN as a classifier, you'd have an output node for each output class. Let's say our inputs are a feature vector of a text document; the output nodes could represent SPORTS, POLITICS, and ENTERTAINMENT. The (normalized) output of running the NN on a particular document about a movie may be [0.2, 0.1, 0.7], which suggests that the document is most likely an ENTERTAINMENT document.
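As a small illustration of reading a classifier's outputs this way, here is a sketch that normalizes the raw output values so they sum to 1 and then picks the strongest node. It assumes the raw outputs are non-negative (for example, from a 0-to-1 sigmoid); `normalizeOutputs` and `mostLikelyClass` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Divide each raw output by the total so the values sum to 1.
// Assumes non-negative outputs with a positive sum.
inline std::vector<double> normalizeOutputs(const std::vector<double>& raw) {
    double total = 0.0;
    for (double v : raw) total += v;
    assert(total > 0.0);
    std::vector<double> normalized;
    normalized.reserve(raw.size());
    for (double v : raw) normalized.push_back(v / total);
    return normalized;
}

// Index of the largest output, i.e. the most likely class.
inline std::size_t mostLikelyClass(const std::vector<double>& outputs) {
    assert(!outputs.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i)
        if (outputs[i] > outputs[best]) best = i;
    return best;
}
```

If the output nodes are ordered SPORTS, POLITICS, ENTERTAINMENT, then `mostLikelyClass` on [0.2, 0.1, 0.7] returns index 2, the ENTERTAINMENT node.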
| Perspective is offline | |
| | #6 |
| Registered User Join Date: Nov 2006 Location: Lurking about
Posts: 212
| Wow, I can't believe I never realized that. I knew that was how they seemed to be used most of the time, but I thought they could perform computations as well. So in order to output an 8 digit number, I would need 10 nodes for each digit, with each node representing one of 0-9. Whichever node activates would be the digit for that position. That would mean 72 output nodes though. I was hoping I might train a neural network to solve a problem like this:
1 ? 2 = 21
45 ? 65 = 5465
98 ? 8 = 898
32 ? 43 = 3342
The question mark is just an operator, and basically the output is just a rearrangement of the inputs. A pattern. Is this at all possible with a neural network?
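The one-node-per-digit scheme described above could be sketched roughly as follows. This is only an illustration with made-up function names; it assumes a fixed number of decimal positions, most significant digit first, and a target of 1.0 for the active node and 0.0 for the rest. With 10 nodes per position the output layer has 10 times the number of digits, so an 8-digit number takes 80 nodes under this scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Encode `value` as `digits` groups of 10 target outputs,
// most significant digit first. Exactly one node per group is 1.0.
inline std::vector<double> encodeDigits(unsigned long value, std::size_t digits) {
    std::vector<double> targets(digits * 10, 0.0);
    for (std::size_t pos = digits; pos-- > 0;) {
        targets[pos * 10 + value % 10] = 1.0;
        value /= 10;
    }
    assert(value == 0);  // value must fit in `digits` decimal digits
    return targets;
}

// Decode network outputs by taking the strongest node in each group of 10.
inline unsigned long decodeDigits(const std::vector<double>& outputs) {
    assert(outputs.size() % 10 == 0);
    unsigned long value = 0;
    for (std::size_t pos = 0; pos < outputs.size() / 10; ++pos) {
        std::size_t best = 0;
        for (std::size_t d = 1; d < 10; ++d)
            if (outputs[pos * 10 + d] > outputs[pos * 10 + best]) best = d;
        value = value * 10 + best;
    }
    return value;
}
```

Decoding takes the largest value in each group rather than looking for an exact 1.0, since a trained network will only produce approximate outputs.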
| IdioticCreation is offline | |
| | #7 | ||
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,381
| Quote:
Quote:
__________________ "Congratulations on your purchase. To begin using your quantum computer, set the power switch to both off and on simultaneously." -- raftpeople@slashdot | ||
| brewbuck is offline | |
| | #8 | |
| Registered User Join Date: Nov 2006 Location: Lurking about
Posts: 212
| Quote:
At any rate, do you think I can still salvage my project? Or is it not something that can be solved with neural networks?
| IdioticCreation is offline | |
| | #9 | |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,381
| Quote:
You could also select an activation function which is not bounded. The function sigmoid(x) + x is still nonlinear, but not bounded.
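A minimal sketch of that unbounded activation and its derivative, assuming "sigmoid" here means the logistic function 1 / (1 + e^(-x)); the names `unboundedActivation` and `unboundedActivationDerivative` are made up for this example.

```cpp
#include <cmath>

// Logistic sigmoid: 1 / (1 + e^(-x)), range (0, 1).
inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// sigmoid(x) + x: still nonlinear, but grows without bound in both directions.
inline double unboundedActivation(double x) { return sigmoid(x) + x; }

// Derivative: sigmoid'(x) + 1 = s * (1 - s) + 1, where s = sigmoid(x).
// It is always greater than 1, so the gradient never vanishes.
inline double unboundedActivationDerivative(double x) {
    const double s = sigmoid(x);
    return s * (1.0 - s) + 1.0;
}
```

Note that this derivative is written in terms of the input x rather than the output, because the output alone does not give s directly.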
| brewbuck is offline | |
| | #10 | |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,381
| Quote:
If you were designing the weights in the network by hand, I'm sure you could come up with a way to make the network do what you are asking. The question is whether the backprop algorithm can relax the network into the right set of weights. Who knows without trying.
| brewbuck is offline | |
| | #11 |
| Registered User Join Date: Nov 2006 Location: Lurking about
Posts: 212
| Thanks brewbuck, I guess it's back to the drawing board for me. I read somewhere that the number of training examples needs to be about 60 times the number of weights in a network to avoid overfitting, so having 72 output nodes would make it very difficult. Thank you for the help everyone!
| IdioticCreation is offline | |
| | #12 | |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,381
| Quote:
Assign:
0.0 -> 0
0.1 -> 1
0.2 -> 2
...
0.9 -> 9
For both the input layer and output layer. The network should be able to zero in on that. You could even distribute the values "sigmoidally" to be friendly to your activation function.
EDIT: It's just my hunch from working with these in the past, that for this task you'll probably need multiple hidden layers. Don't ask me why, just a hunch.
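The digit-to-activation mapping above might look something like this in code. It is a sketch under the assumption that each digit gets its own input or output node and that decoding rounds to the nearest tenth; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Map a digit 0..9 to the activation level 0.0, 0.1, ..., 0.9.
inline double digitToActivation(int digit) {
    assert(digit >= 0 && digit <= 9);
    return digit / 10.0;
}

// Map a network output back to the nearest digit, clamped to 0..9
// so that slightly out-of-range outputs still decode to a valid digit.
inline int activationToDigit(double activation) {
    const long nearest = std::lround(activation * 10.0);
    if (nearest < 0) return 0;
    if (nearest > 9) return 9;
    return static_cast<int>(nearest);
}
```

This needs one node per digit position instead of ten, at the cost of asking the network to hit each target to within 0.05.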
| brewbuck is offline | |
| | #13 | |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,926
| Quote:
| abachler is offline | |
| | #14 | |
| Registered User Join Date: Nov 2006 Location: Lurking about
Posts: 212
| Quote:
I also read some stuff about people saying someone did a proof that shows no backprop network would ever need more than one hidden layer. I can't find the link, but I'll mess with the structure stuff and just do what works. Thanks for the info abachler, I don't understand some of that at the moment, but once I start adjusting training sets and the network structure I will come back and check it out more. You guys are great, thank you for all the help.
| IdioticCreation is offline | |
| | #15 |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,926
| That is incorrect; the number of hidden layers depends on the particulars of the output manifold. E.g., a smooth multivariate manifold can be approximated with no fewer than 3 layers of weights (i.e. 2 hidden layers). For example, this image would require such a network to classify whether a given point is black or white. A circle would require 4 layers. This is assuming you use arbitrary precision mathematics. In practice, a network that uses finite precision floating point, like doubles, may require more layers or nodes, or both.
Last edited by abachler; 02-10-2009 at 12:57 AM. |
| abachler is offline | |