# Thread: Neural networks - am I doing this properly?

1. ## Neural networks - am I doing this properly?

Hello,

I'm attempting to create my first neural network, but I only have high-school maths, so most of the formulas out there are pretty hard for me to understand. I was hoping someone could tell me if I'm doing this properly.

I'm creating a spam classifier. I have 7 inputs, and I assume I only need one output: if it outputs something close to 1 the email would be considered "spam", and something close to 0 would be considered "not spam".

Basically this is all I have:

Do I need more than this, other than calculating the error and adjusting the weights, for it to work?

As for calculating the error for the final weight, I would take the expected result of the network (say 1) and subtract the actual result (say 0.7) from it, which would give me 0.3.

What confuses me next is

1) How would I use this number to edit the weights in the previous layer?

and 2) How do I work out the desired value for weights other than the final one? For example weights in the middle of the network.

And I found this while looking around:

Code:
```cpp
// Error gradient at the output node: the sigmoid derivative,
// outputValue * (1 - outputValue), times the error (desired - actual).
inline double trainer::getOutputErrorGradient(double desiredValue, double outputValue)
{
    return outputValue * (1 - outputValue) * (desiredValue - outputValue);
}
```
What this would do is take my 0.7 output and 1 desired value and return 0.063. What would I do with that?

Any help is appreciated.

2. If you are using neural networks for spam filtering, you should take a look at this paper:

My wife and I were co-authors, so I can answer any questions you have. The reason I mention it is that we found neural networks are extremely good spam classifiers if you use the right kind of inputs. We had to combine our neural network with an info-theoretic clustering method before it started working well. (To be honest, implementing the clusterer was WAY more fun than the neural network; unfortunately, I gave that task to my wife and didn't get to do it myself.)

As far as the math, you're going to have a bit of trouble if you don't understand what a derivative is. But you ought to be able to find some algorithms already laid out, which you could implement without too much work even if you don't completely understand them.

Or if you just want to play with neural networks, go download SNNS (UNIX only), it's pretty powerful.

3. I am a bit concerned about my inputs. Would you say I've chosen the right kind of inputs here for this to work?

1) Words: For this, two wordlists were built using words from spam emails and from normal emails. 1 if the email contained more words from the spam wordlist than from the normal wordlist, otherwise 0.

2) Images: 1 if the message contained an image, 0 if not.

3) Colours: 1 if the message contained an unusual colour such as pink or red, otherwise 0.

4) Hyperlinks: 0.5 if the message contained one hyperlink, 1 if it contained more than one, otherwise 0.

5) Domain: For this, two lists were built using domains from spam emails and from normal emails. 1 if the email contained more domains from the spam domain list than from the normal domain list, otherwise 0.

6) Top Level Domain: For this, two lists were built using top level domains from spam emails and from normal emails. 1 if the email contained more top level domains from the spam list than from the normal list, otherwise 0.

7) Priority: 1 if the priority of the message was high, otherwise 0.
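As a sketch, the rules above could map onto the 7 inputs something like this (`encodeInputs` and its parameters are hypothetical names; it assumes the message has already been parsed into these flags and counts):

```cpp
#include <array>
#include <cassert>

// Hypothetical sketch: turn one message's parsed properties into the
// 7 network inputs described above.
std::array<double, 7> encodeInputs(bool moreSpamWords, bool hasImage,
                                   bool unusualColour, int numHyperlinks,
                                   bool moreSpamDomains, bool moreSpamTlds,
                                   bool highPriority)
{
    std::array<double, 7> in{};
    in[0] = moreSpamWords ? 1.0 : 0.0;   // 1) words
    in[1] = hasImage ? 1.0 : 0.0;        // 2) images
    in[2] = unusualColour ? 1.0 : 0.0;   // 3) colours
    // 4) hyperlinks: none -> 0, exactly one -> 0.5, more than one -> 1
    in[3] = numHyperlinks == 0 ? 0.0 : (numHyperlinks == 1 ? 0.5 : 1.0);
    in[4] = moreSpamDomains ? 1.0 : 0.0; // 5) domain
    in[5] = moreSpamTlds ? 1.0 : 0.0;    // 6) top level domain
    in[6] = highPriority ? 1.0 : 0.0;    // 7) priority
    return in;
}
```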

4. Actually, you need to multiply each input by its weight, sum the results from all the input/weight pairs, and then apply the sigmoid, not just add the inputs together. You are also NOT using a fully connected network; such a network requires far more weights than are shown for that number of inputs and intermediate nodes. I also suggest using 0.9 == spam and 0.1 == not spam when training.
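A minimal sketch of what that means for a single node (names are placeholders; this assumes the standard logistic sigmoid):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard logistic sigmoid: squashes any real number into (0, 1).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One node: multiply each input by its weight, sum the products,
// then apply the sigmoid. Not just adding the inputs together.
double neuronOutput(const std::vector<double>& inputs,
                    const std::vector<double>& weights)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        sum += inputs[i] * weights[i];
    return sigmoid(sum);
}
```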

5. Originally Posted by Bobcat
1) Words: For this, two wordlists were built using words from spam emails and from normal emails. 1 if the email contained more words from the spam wordlist than from the normal wordlist, otherwise 0.
The features you list are pretty good. Mapping all words into two buckets (spam/non-spam) is crude, but combined with other features it might perform okay. In our implementation, we used 10-15 buckets, with a clustering algorithm that assigned words to buckets.

In addition to what you're already checking, you may want to consider:

* the sender's address (is the sender known or unknown?)
* whether the message is in response to a valid thread
* the time of day the message was sent (bucketed appropriately)
* the time itself -- far into the future or past?

6. Thank you for the "0.9 == spam and 0.1 == not spam" suggestion; it would have been unlikely for the network to ever output an exact 1 or 0.

I'm curious, what categories did you use to separate the words you encountered into 10-15 different buckets? Verbs, adjectives, and nouns? Very common words, less common words, and uncommon words? Or something else?

As for back propagation, I've come across this and want to know if I'm interpreting it correctly: http://www.rgu.ac.uk/files/chapter3%20-%20bp.pdf

In the picture I gave above, let's say the white boxes are neurons labelled 1 to 11, top to bottom then left to right, and w8, w9, w10, w11, and w12 are connected to the first box and link to all the boxes in the second column.

Then to update weights:

NewWeight13 = OldWeight13 + (error x Neuron11Output)
where error = Neuron11Output * (1 - Neuron11Output) * (desiredOutput - Neuron11Output)

And

NewWeight2 = OldWeight2 + (error x Neuron1Output)
where error = Neuron1Output * (1 - Neuron1Output) * (Neuron6'sError * Weight8 + Neuron7'sError * Weight9 + Neuron8'sError * Weight10 + Neuron9'sError * Weight11 + Neuron10'sError * Weight12)

Have I interpreted this correctly?
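Written out as code, my reading of those two rules would look like this (a sketch with placeholder names; no learning rate, as in the formulas above, and the "input" in each weight update is the output of the neuron feeding into that weight):

```cpp
#include <cassert>
#include <cmath>

// Output-layer error: out * (1 - out) * (desired - out).
double outputError(double out, double desired)
{
    return out * (1.0 - out) * (desired - out);
}

// Hidden-layer error: out * (1 - out) * sum over the next layer of
// (that neuron's error * the weight connecting this neuron to it).
double hiddenError(double out, const double* nextErrors,
                   const double* nextWeights, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += nextErrors[i] * nextWeights[i];
    return out * (1.0 - out) * sum;
}

// Weight update: newW = oldW + (error * output of the source neuron).
double updateWeight(double oldW, double error, double sourceOutput)
{
    return oldW + error * sourceOutput;
}
```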

7. When selecting keywords, I would use a simple statistical analysis of the terms, words, and phrases used in spam versus non-spam. All words or phrases for which the threshold of differentiation is greater than some arbitrary setting, defined by you, could be used as an input to the network. For example, words like 'natural', 'male', and 'enhancement' will have a high correlation with spam, while words like 'the' will be pure noise. If you then train the network on all word inputs, you can correlate things like 'the all natural male enhancement', which should trigger a high output. You should also correlate them using separate statistical analyses for the subject and the body, and one using both.

The final network should not be trained until it is perfect; use early stopping. Specifically, stop as soon as the training set yields a firm differentiation between the classes. For each example, run the network in feed-forward mode and check the output. Then see if the outputs for spam all fall above or below the outputs for non-spam. Now retrain using the examples from each class that overlap, then recheck. Repeat until the network properly differentiates all your examples, and keep the threshold at which it makes this choice. Choosing an arbitrary threshold like 0.5 will tend to warp the hyperplane in ways that, while mathematically feasible, are difficult to attain in real-world hardware, which has finite precision. Trying to force the most optimal hyperplane can, and often does, cause the training to disregard solutions that are less bound to the parameters of perfection but are nonetheless correct in their differentiation.

Do not forget to add new examples to the training set and to retrain as often as possible. Usually the point at which a new example is added is the perfect time to retrain, as most people are more than happy to let their computer spend lots of time learning to ignore the most recent piece of spam to get through.

Here is some code which I used to build MARGO. This is just one of the feed-forward functions.
Code:
```cpp
void CNode::FastFeedForward(double* Input, double* Output)
{
    this->Temp[0] = 0.0;

    // this is the pure C/C++ implementation
    for (DWORD x = 0; x < this->NumInputs; x++) {
        this->Temp[0] += Input[x] * this->Weights[x];
    }

    // squash the weighted sum into (-1, 1)
    *Output = sin(atan(this->Temp[0]));
}
```
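For what it's worth, sin(atan(x)) simplifies to x / sqrt(1 + x*x), so that activation squashes the weighted sum into (-1, 1) without calling exp(). A quick sketch of the identity:

```cpp
#include <cassert>
#include <cmath>

// sin(atan(x)) == x / sqrt(1 + x*x): a sigmoid-shaped squashing
// function whose outputs lie in (-1, 1).
double squash(double x) { return std::sin(std::atan(x)); }
```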