Thread: why doesn't my XOR network work?

  1. #16
    Dr Dipshi++ mike_g's Avatar
    Join Date
    Oct 2006
    Location
    On me hyperplane
    Posts
    1,218
    I guess I did something wrong...? And yes, I included <math.h>
If you are compiling with gcc on linux you have to add the -lm switch when you use math.h. I don't know why, but for some reason you do.

    After that you still have a lot of things that will need changing before this will work. Perhaps the first step would be to use randomly generated weights between -1.0 and +1.0.

  2. #17
    Registered User yann's Avatar
    Join Date
    Sep 2009
    Location
    Zagreb, Croatia
    Posts
    186
So, instead of gcc name.c -o name it's gcc name.c -lm name?

    why do i need randomly generated weights? OK, i will add them, but tell me those other things i need...pretty please?
    Arduino rocks!

  3. #18
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by yann View Post
So, instead of gcc name.c -o name it's gcc name.c -lm name?
    No, linker flags at the end:

    gcc name.c -o name -lm

-lm just means "link the math library". Some headers only declare functions; the compiled code for them lives in a separate pre-compiled library object that has to be linked in.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #19
    Dr Dipshi++ mike_g's Avatar
    Join Date
    Oct 2006
    Location
    On me hyperplane
    Posts
    1,218
So, instead of gcc name.c -o name it's gcc name.c -lm name?
    use: gcc name.c -lm -o name
    why do i need randomly generated weights?
It's so you don't get a bias toward certain outcomes. It also means that each trained network behaves slightly differently.
    OK, i will add them, but tell me those other things i need...pretty please?
First off, learn how to use a loop. Then you could try producing a diagram of where your data is flowing through your network, with arrows and stuff. That would probably give you a better understanding of what you are doing. Then write some pseudocode listing the steps the program will take for your feed-forward and feedback phases. That's what I would do anyway.

  5. #20
    Registered User yann's Avatar
    Join Date
    Sep 2009
    Location
    Zagreb, Croatia
    Posts
    186
First off, learn how to use a loop. Then you could try producing a diagram of where your data is flowing through your network, with arrows and stuff. That would probably give you a better understanding of what you are doing.
    OK thank you...
    Arduino rocks!

  6. #21
    Registered User yann's Avatar
    Join Date
    Sep 2009
    Location
    Zagreb, Croatia
    Posts
    186
OK, this is my code. I still need to "account for the derivative of the activation when calculating error"
and I have no idea how to do that, so feel free to change my code any way you want.

    Code:
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <math.h>
    
    float error[100];
    int i;
    float weight[100];
    double percept[100]; /* was bool, which truncated every activation to 0/1 */
    int input[100];
    float tres[100];
    double sum[100];
    int successive_right = 0;
    int total_go_round = 1;
    const float learning_rate = 0.1f;
    const float learning_rate2 = 0.25f;
    const float learning_rate3 = 0.1f;
    
    
    double f(double x){
        return sin(atan(x));
    }
    
    /*double f(double x){
        if(x > 0.0) return 1.0;
        return 0.0;
    }*/
    
    /*double f(double x){
        return 1.0 / (1.0 + exp(0.0 - x));
    }*/
    
    
    bool target(int y, int z) {
        return (y && !z) || (z && !y);
    }
    
    
    void educate(void) {
        input[0] = 1; /* bias, always 1 */
        input[3] = 1;
        input[4] = 1;
        input[5] = 1;
        input[6] = 1;
        input[1] = rand() % 2;
        input[2] = rand() % 2;
        bool goal = target(input[1], input[2]);
        /* compute each layer's activations before the layer that reads them */
        sum[0] = weight[0]*input[0] + weight[1]*input[1] + weight[2]*input[2];
        sum[1] = weight[3]*input[3] + weight[4]*input[1] + weight[5]*input[2];
        percept[0] = f(sum[0]);
        percept[1] = f(sum[1]);
        sum[2] = weight[6]*input[4] + weight[7]*percept[0] + weight[8]*percept[1];
        sum[3] = weight[9]*input[5] + weight[10]*percept[0] + weight[11]*percept[1];
        percept[2] = f(sum[2]);
        percept[3] = f(sum[3]);
        sum[4] = weight[12]*input[6] + weight[13]*percept[2] + weight[14]*percept[3];
        percept[4] = f(sum[4]);
    
        if ((percept[4] > 0.5) == goal) { /* threshold; exact == never matched */
            successive_right++;
        }
        else {
            successive_right = 0;
            error[0] = goal - percept[4];
            error[1] = error[0]*weight[14];
            error[2] = error[0]*weight[13];
            error[3] = (error[1]*weight[1]) + (error[2]*weight[2]);
            error[4] = (error[1]*weight[4]) + (error[2]*weight[5]);
    
            weight[0]  += learning_rate*error[4]*input[0];   //input
            weight[1]  += learning_rate*error[4]*input[1];   //input
            weight[2]  += learning_rate*error[4]*input[2];   //input
            weight[3]  += learning_rate*error[4]*input[3];   //input
            weight[4]  += learning_rate*error[3]*percept[1]; //input
            weight[5]  += learning_rate*error[3]*percept[0]; //input
            weight[6]  += learning_rate2*error[3]*input[4];  //hidden
            weight[7]  += learning_rate2*error[2]*percept[0];//hidden
            weight[8]  += learning_rate2*error[2]*percept[1];//hidden
            weight[9]  += learning_rate2*error[2]*input[5];  //hidden
            weight[10] += learning_rate2*error[1]*percept[0];//hidden
            weight[11] += learning_rate2*error[1]*percept[1];//hidden
            weight[12] += learning_rate3*error[1]*input[6];  //output
            weight[13] += learning_rate3*error[0]*percept[4];//output
            weight[14] += learning_rate3*error[0]*percept[3];//output
        }
    }
    
    
    int main(void){
        tres[0] = 0;
        tres[1] = 0;
        tres[2] = 0;
        tres[3] = 0;
        tres[4] = 0;
        srand(time(NULL));
        for(i = 0; i < 15; i++){ /* was i < 14, which skipped weight[14] */
            weight[i] = 0.5;
        }
        total_go_round = 1;
        while(total_go_round <= 5000){
            educate();
            total_go_round++;
        }
        printf("inputs:\n");
        scanf("%d", &input[1]);
        scanf("%d", &input[2]);
        percept[0] = (weight[0]*input[0] + weight[1]*input[1] + weight[2]*input[2] > tres[0]);
        percept[1] = (weight[3]*input[3] + weight[4]*input[1] + weight[5]*input[2] > tres[1]);
        percept[2] = (weight[6]*input[4] + weight[7]*percept[0] + weight[8]*percept[1] > tres[2]);
        percept[3] = (weight[9]*input[5] + weight[10]*percept[0] + weight[11]*percept[1] > tres[3]);
        percept[4] = (weight[12]*input[6] + weight[13]*percept[2] + weight[14]*percept[3] > tres[4]);
        printf("%d\n", (int)percept[4]);
        return 0;
    }
    Arduino rocks!

  7. #22
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Quote Originally Posted by yann View Post
Huh, the first thing you posted is just like the linear threshold function; it still gets things wrong. In the second thing, sin(atan(x)), I get these error messages: And yes, I included <math.h>
Hmm, that's what I was going to say: include math.h, which should have sin and atan declared in it. If it doesn't, then your math.h is broken. You may also try <cmath>.

    you can also try just using tanh, which depending on the implementation might be faster, but less accurate.

    sin(atan()) doesn't produce 'better' results, it just executes faster, because sin and atan are both supported in hardware.

The derivative of a function is the function that describes the change in that function, or put another way, a function that outputs the instantaneous slope of that function at any given point. That also makes the sin(atan()) easier: the derivative of the arc tangent is 1 / (1 + x^2), the derivative of sin is cos, so the derivative of sin(atan(x)) would be
    Code:
    cos( 1.0 / (1.0 + x * x))
    or something, my calculus is a bit rusty.

The suggestion to draw out the neural network is a good one; being able to visualize what is happening helps a lot.
    Last edited by abachler; 09-28-2009 at 01:23 PM.

  8. #23
    Registered User yann's Avatar
    Join Date
    Sep 2009
    Location
    Zagreb, Croatia
    Posts
    186
    sooo...this:

    weight[14] += learning_rate3*error[0]*cos( 1.0 / (1.0 + x * x)) *percept[3];
    Right?
    Everything is fine with math now...
    Arduino rocks!

  9. #24
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Quote Originally Posted by yann View Post
    sooo...this:
    Right?
    Everything is fine with math now...
Ummm, who gave you that feedback equation? Most perceptrons just use
    Code:
    Weight += Error * Input * Alpha;
where alpha is the learning rate, typically 0.1; this implements Hebbian learning. You want to reduce alpha for larger numbers of inputs to stabilize the learning.
    Last edited by abachler; 09-28-2009 at 01:57 PM.

  10. #25
    Registered User yann's Avatar
    Join Date
    Sep 2009
    Location
    Zagreb, Croatia
    Posts
    186
Uhhh... this is complicated... I used to have input there, I don't know where I lost it... Now, what is the right "Weight += Error * Input * Alpha;"?
Where do I put the cos( 1.0 / (1.0 + x * x)) thing?
(Please write the whole equation, I mean please, I had a very busy day at school and I hurt my head (nothing serious...) and my head hurts and my dad is yelling at me because of something and I am totally confused... sorry...)

    this is my actual code:
weight[0] += learning_rate*error[4]*input[0]; (instead of input the perceptron is like the input, they are connected...)
    Arduino rocks!

  11. #26
    Malum in se abachler's Avatar
    Join Date
    Apr 2007
    Posts
    3,195
    Quote Originally Posted by yann View Post
Uhhh... this is complicated... I used to have input there, I don't know where I lost it... Now, what is the right "Weight += Error * Input * Alpha;"?
Where do I put the cos( 1.0 / (1.0 + x * x)) thing?
(Please write the whole equation, I mean please, I had a very busy day at school and I hurt my head (nothing serious...) and my head hurts and my dad is yelling at me because of something and I am totally confused... sorry...)

    this is my actual code:
weight[0] += learning_rate*error[4]*input[0]; (instead of input the perceptron is like the input, they are connected...)
You don't put the cos anywhere; it's not part of the learning rule. What I gave you is the entire basic learning rule; it's all you need.

    Code:
    void CNeuron::ApplyError(double* pInputs, double* Feedback, double Error){
        double Temp;
    
        this->Bias += Error * this->Bias * this->Alpha;
        for(unsigned long x = 0; x < this->InputCount; x++){
            Temp = pInputs[x] * Error * this->Alpha;
            this->UWeights[x] += Temp + this->UAdjustments[x];
            this->UAdjustments[x] = 0.0;
            Feedback[x] += Temp;
        }
        return;
    }
    Last edited by abachler; 09-28-2009 at 02:07 PM.

  12. #27
    Registered User
    Join Date
    Sep 2009
    Posts
    63
    Quote Originally Posted by yann View Post
    OK, this is my code, I still need to "account for the derivative of the activation when calculating error"
    and i have no idea of how to do that, so be free to change my code anyway you want.
    Well, to account for the derivative, you'd have to know calculus. Such a thing is independent of whether or not you can code, so we'll probably be willing to help you if you don't know.

I'd guess this is a case where you'd use partial derivatives. I think your activation function is a function of your various weights and input values. I'm trying to remember propagation of uncertainty, to see if that would apply here. Let's see here.....

    I think it would be |∂f/∂x_0| * error(x_0) + |∂f/∂x_1| * error(x_1) + ... + |∂f/∂x_n| * error(x_n).

    Hopefully someone who knows the math better than I do will step up

  13. #28
    Registered User
    Join Date
    Sep 2009
    Posts
    63
    Quote Originally Posted by abachler View Post
    That also makes the sin(atan()) easier, the derivative of the arc tangeant is 1 / (1 + x^2) the derivative of sin is cos, so the derivative of sin(atan(x)) would be
    Code:
    cos( 1.0 / (1.0 + x * x))
    or something, my calculus is a bit rusty.
    You're right about arctan(x), and right about sin(x), but wrong about sin(atan(x)).

-(x^2/(1 + x^2)^(3/2)) + 1/sqrt(1 + x^2) is the derivative. You have to use the product rule and the chain rule.
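For what it's worth, a quicker route gets the same answer. By the chain rule (using cos(arctan x) = 1/sqrt(1 + x^2), which you can read off a right triangle with legs 1 and x):

```latex
\frac{d}{dx}\sin(\arctan x)
  = \cos(\arctan x)\cdot\frac{1}{1+x^2}
  = \frac{1}{\sqrt{1+x^2}}\cdot\frac{1}{1+x^2}
  = (1+x^2)^{-3/2}
```

and the product-rule form reduces to the same thing:

```latex
-\frac{x^2}{(1+x^2)^{3/2}} + \frac{1}{\sqrt{1+x^2}}
  = \frac{-x^2 + (1+x^2)}{(1+x^2)^{3/2}}
  = (1+x^2)^{-3/2}
```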

  14. #29
    Registered User yann's Avatar
    Join Date
    Sep 2009
    Location
    Zagreb, Croatia
    Posts
    186
    void CNeuron::ApplyError(double* pInputs, double* Feedback, double Error){
        double Temp;
    
        this->Bias += Error * this->Bias * this->Alpha;
        for(unsigned long x = 0; x < this->InputCount; x++){
            Temp = pInputs[x] * Error * this->Alpha;
            this->UWeights[x] += Temp + this->UAdjustments[x];
            this->UAdjustments[x] = 0.0;
            Feedback[x] += Temp;
        }
        return;
    }


Whuh, I am so confused right now, I don't know the names of these variables, anything....
    Last edited by yann; 09-28-2009 at 02:15 PM.
    Arduino rocks!

  15. #30
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by mike_g View Post
    Its so you don't get bias to certain outcomes. It also means that each trained network behaves slightly differently.
    Hmm.

    In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
    "What are you doing?", asked Minsky.
    "I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied.
    "Why is the net wired randomly?", asked Minsky.
    "I do not want it to have any preconceptions of how to play", Sussman said.
    Minsky then shut his eyes.
    "Why do you close your eyes?" Sussman asked his teacher.
    "So that the room will be empty."
    At that moment, Sussman was enlightened.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}
