why doesn't my XOR network work?

**yann** · 09-27-2009

Hi, I made a network that I'm trying to teach the XOR function. It has 3 layers (input, output and one hidden) and 5 neurons. Here is the code:

Code:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int i;
float weight[100];
bool percept[100];
int input[100];
float tres[100];
int successive_right = 0;
int total_go_round = 1;
const float learning_rate = 0.1f;

int educate() {
  
        input[0] = 1; //bias, always 1
        input[3] = 1;
        input[4] = 1;
        input[5] = 1;
        input[6] = 1;
        printf("inputs:\n");
        scanf("%d", &input[1]);
        scanf("%d", &input[2]);
        int goal;
        printf("goal:\n");
        scanf("%d", &goal);
   	percept[0] =  (weight[0]*input[0]+ weight[1]*input[1]+weight[2]*input[2] > tres[0]);
	percept[1] =  (weight[3]*input[3]+weight[4]*input[1]+weight[5]*input[2] > tres[1]);
 	percept[2] =  (weight[6]*input[4]+weight[7]*percept[0]+weight[8]*percept[1] > tres[2]);
   	percept[3] =  (weight[9]*input[5]+weight[10]*percept[0]+weight[11]*percept[1] > tres[3]);
    	percept[4] =  (weight[12]*input[6]+weight[13]*percept[2]+weight[14]*percept[3] > tres[4]);
        if (percept[4] == goal) {
            successive_right++;
        } else {
            successive_right = 0;
            int sign = goal ? 1 : -1; //sign of (y-f(x))
            weight[0] += learning_rate*sign*input[0];
            weight[1] += learning_rate*sign*input[1];
            weight[2] += learning_rate*sign*input[2];
            weight[3] += learning_rate*sign*input[3];
            weight[4] += learning_rate*sign*input[1];
            weight[5] += learning_rate*sign*input[2];
            weight[6] += learning_rate*sign*input[4];
            weight[7] += learning_rate*sign*percept[0];
            weight[8] += learning_rate*sign*percept[1];
            weight[9] += learning_rate*sign*input[5];
            weight[10] += learning_rate*sign*percept[0];
            weight[11] += learning_rate*sign*percept[1];
            weight[12] += learning_rate*sign*input[6];
            weight[13] += learning_rate*sign*percept[2];
            weight[14] += learning_rate*sign*percept[3];
           for(i=0;i<15;i++) //all 15 weights, weight[0]..weight[14]
           {
           printf("%f ", weight[i]);
           }     
      }
    printf("\n");
    printf("output:\n");
    printf("%d\n", percept[4]);
    return 0;
}


int main(){
   tres[0] = 0.5;
   tres[1] = 0.5;
   tres[2] = 0.5;
   tres[3] = 0.5;
   tres[4] = 0.5;
   srand(time(NULL));
   total_go_round=1;
   while(total_go_round <= 1500){
      educate();
      total_go_round++;
}
return 0;
}

So I don't know why it doesn't work. It compiles without errors, but it can't learn XOR... maybe I'm not teaching it properly. Help? Please?

**MK27** · 09-27-2009

Well yann, part of the reason I tidied up that last piece of code was because I was hoping to communicate something to you: you will only be hampered in your efforts if you cannot use the C language well. I think you are overly excited about solving your problem, and so you fail to give enough attention to learning how to code. Unfortunately, you will have to learn to code before you learn to implement complex algorithms.

This seems to be a pretty "classic" illustration of the consequence, because it looks to me like you are trying to write a multi-layer perceptron network that can learn XOR when in fact you have not bothered to learn how to do a XOR yourself!

Code:

        printf("inputs:\n");
        scanf("%d", &input[1]);
        scanf("%d", &input[2]);
        int goal;
        printf("goal:\n");
        scanf("%d", &goal);

Now, I presume you are piping in a file of values here and not sitting there entering numbers 4500 times! But why do you even need to do that? Here is a version of the target() function that performs a XOR: *

Code:

bool target(int y, int z) {
	if ((y && !z) || (z && !y)) return 1;
	return 0;
}

Does that make sense? If you had understood the use of logical operators, a basic element of the language, this would have been easy for you to do. Instead, presumably, you have wasted a lot of time on some kind of work around. Please correct me if I am wrong about my assumption.
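The same truth table can also be written as a single comparison. This is a minimal sketch; `target_ne` is a made-up name used only here, not something from yann's program.

```c
#include <stdbool.h>

/* Hypothetical alternative to target(): true when exactly one input is non-zero. */
bool target_ne(int y, int z) {
	return !y != !z; /* !y and !z are each 0 or 1, so != acts as a logical XOR */
}
```

`target_ne(1, 0)` and `target_ne(0, 1)` are true; `target_ne(0, 0)` and `target_ne(1, 1)` are false.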

So here's another illustration of how to exploit the "power" of basic C syntax, which is basic and will not be hard for you to learn if you take the time to try:

Code:

int x, value;
for (i=0; i<15; i++) {
	/* x is the index of the input/percept that weight[i] multiplies */
	if ((i == 0) || (i == 7) || (i == 10)) x = 0;
	else if ((i == 1) || (i == 4) || (i == 8) || (i == 11)) x = 1;
	else if ((i == 2) || (i == 5) || (i == 13)) x = 2;
	else if ((i == 3) || (i == 14)) x = 3;
	else if (i == 6) x = 4;
	else if (i == 9) x = 5;
	else x = 6; /* i == 12 */
	/* percept is bool[], so read the value instead of aliasing it via an int pointer */
	if ((i == 9) || (i==12) || (i<7)) value = input[x];
	else value = percept[x];
	weight[i] += learning_rate*sign*value;
	printf("%f ", weight[i]);
}

[corrected]

This replaces the series of assignments beginning on line 38. Now, there is not a big difference, but notice

this is fewer lines
it is comparable efficiency-wise
it is more logically organized

The more you expand this list, the more organizing it this way will help.

Enough "not a big difference" type things will add up to a very big difference as the code you are working on gets longer and more complex. Readability and logic are important. Right now, you are relying on your own memory: you understand the logic of the code because you wrote it. But once you get up to a few hundred or thousand lines of code, you will not be able to do that as easily. So by writing in a concise, well-organized way, you will save yourself time and headaches.
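If the list does keep growing, one option is to move the index bookkeeping out of the if/else chain and into lookup tables. The sketch below is only an illustration: `from_input`, `source_idx` and `update_weights` are hypothetical names, and the table contents are read off the 15 hand-written assignments in the first post.

```c
#include <stdbool.h>

/* For weight[i]: does its multiplicand come from input[] (1) or percept[] (0),
   and at which index? Both tables restate the 15 original assignments. */
static const bool from_input[15] = {1,1,1, 1,1,1, 1,0,0, 1,0,0, 1,0,0};
static const int  source_idx[15] = {0,1,2, 3,1,2, 4,0,1, 5,0,1, 6,2,3};

/* Apply weight[i] += rate * sign * source to all 15 weights. */
void update_weights(float weight[], const int input[], const bool percept[],
                    float rate, int sign) {
	for (int i = 0; i < 15; i++) {
		int value = from_input[i] ? input[source_idx[i]]
		                          : percept[source_idx[i]];
		weight[i] += rate * sign * value;
	}
}
```

Adding a neuron then means adding entries to the two tables rather than another branch to the chain.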

* ps I did try that with the single perceptron, it truly cannot learn XOR.

**brewbuck** · 09-27-2009

Your network will never learn XOR. It is mathematically impossible. We've been through this already.

**yann** · 09-27-2009

Originally Posted by brewbuck

Your network will never learn XOR. It is mathematically impossible. We've been through this already.

Yes, I know. I built a new one that can, but it sometimes gets it wrong... it uses backpropagation...

Code:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

float error[100];
int i;
float weight[100];
bool percept[100];
int input[100];
float tres[100];
int successive_right = 0;
int total_go_round = 1;
const float learning_rate = 0.1f;
const float learning_rate2 = 0.25f;
const float learning_rate3 = 0.1f;

bool target(int y, int z) {
	if ((y && !z) || (z && !y)) return 1;
	return 0;
}


int educate() {
  
	input[0] = 1; //bias, always 1
        input[3] = 1;
        input[4] = 1;
        input[5] = 1;
        input[6] = 1;
        input[1] = rand() % 2;
        input[2] = rand() % 2;
        bool goal = target(input[1], input[2]);
   	percept[0] =  (weight[0]*input[0]+ weight[1]*input[1]+weight[2]*input[2] > tres[0]);
	percept[1] =  (weight[3]*input[3]+weight[4]*input[1]+weight[5]*input[2] > tres[1]);
 	percept[2] =  (weight[6]*input[4]+weight[7]*percept[0]+weight[8]*percept[1] > tres[2]);
   	percept[3] =  (weight[9]*input[5]+weight[10]*percept[0]+weight[11]*percept[1] > tres[3]);
    	percept[4] =  (weight[12]*input[6]+weight[13]*percept[2]+weight[14]*percept[3] > tres[4]);
        if (percept[4] == goal) {
        	successive_right++;
        }
        else {
            	successive_right = 0;
            	error[0] = goal - percept[4];
            	error[1] = (error[0]*weight[14]);
       	     	error[2] = (error[0]*weight[13]);
                error[3] = (error[1]*weight[1])+(error[2]*weight[2]);
            	error[4] = (error[1]*weight[4])+(error[2]*weight[5]);
                          
            	weight[0]  += learning_rate*error[4]*input[0];//input
            	weight[1]  += learning_rate*error[4]*input[1];//input
            	weight[2]  += learning_rate*error[4]*input[2];//input
            	weight[3]  += learning_rate*error[4]*input[3];//input
            	weight[4]  += learning_rate*error[3]*percept[1];//input
            	weight[5]  += learning_rate*error[3]*percept[0];//input
            	weight[6]  += learning_rate2*error[3]*input[4];//hiden
            	weight[7]  += learning_rate2*error[2]*percept[0];//hiden
            	weight[8]  += learning_rate2*error[2]*percept[1];//hiden
            	weight[9]  += learning_rate2*error[2]*input[5];//hiden
            	weight[10] += learning_rate2*error[1]*percept[0];//hiden
            	weight[11] += learning_rate2*error[1]*percept[1];//hiden
            	weight[12] += learning_rate3*error[1]*input[6];//output
            	weight[13] += learning_rate3*error[0]*percept[4];//output
            	weight[14] += learning_rate3*error[0]*percept[3];//output

}
	return 0;
}


int main(){
   	tres[0] = 0;
   	tres[1] = 0;
   	tres[2] = 0;
   	tres[3] = 0;
   	tres[4] = 0;
   	srand(time(NULL));
   	for(i=0;i<15;i++){ //weight[0]..weight[14]
   		weight[i]= 0.5;
   	}    
   	total_go_round=1;
   	while(total_go_round <= 5000){
        	educate();
      		total_go_round++;
	}
        printf("inputs:\n");
        scanf("%d", &input[1]);
        scanf("%d", &input[2]);
   	percept[0] =  (weight[0]*input[0]+ weight[1]*input[1]+weight[2]*input[2] > tres[0]);
	percept[1] =  (weight[3]*input[3]+weight[4]*input[1]+weight[5]*input[2] > tres[1]);
 	percept[2] =  (weight[6]*input[4]+weight[7]*percept[0]+weight[8]*percept[1] > tres[2]);
   	percept[3] =  (weight[9]*input[5]+weight[10]*percept[0]+weight[11]*percept[1] > tres[3]);
    	percept[4] =  (weight[12]*input[6]+weight[13]*percept[2]+weight[14]*percept[3] > tres[4]);
        printf("%d\n", percept[4]);   	
	return 0;
}

Now, this network will answer 0 for 1 and 1, and 1 for 0 and 1, but it will sometimes get it wrong.

Do you know why?

MK27, before I start following your advice, I think I should just finish what I started.

**MK27** · 09-27-2009

Originally Posted by yann

MK27, before I start following your advice, I think I should just finish what I started.

Sure, if you can. And it will probably take you longer that way. Honestly. I have to deal with this issue. Everyone does. You know what happens? You write some code and realize two months later how "stupid" it was, ie, that it would have been much easier if you had a better grasp of the essentials first. Meaning the code you wrote is now next to useless.

But no one wants to just sit and do "hello world" exercises out of a book all day. Try and strike a balance -- take some interest in the language. Most learning comes while you are trying to accomplish a goal. So slow down, and remember to learn as much as you can while you are at it, rather than being blinded by your need to "get it done". In the end, it will probably

take less time
teach you more
be done better

What you are doing now is as if someone gave you the world's most incredible calculator, but you don't understand most of the buttons. So to calculate 4^5, you go:

4x4x4x4x4

Yes, it will work, but there is a pow() button. If you are doing a long, complicated exercise with a lot of exponents, you are going to waste a huge amount of time. How 'bout 123^12314? I guess it will be a huge accomplishment just to get that finished! Here's a simple rule: if you think there is a function or a method that might help, write a short, separate program to explore that method and save it. Even if it turns out to be not useful to you now, it probably will be later. The same is true if you see a method used that you do not understand or have not used before. Experiment. If you want to program, you might as well try and enjoy programming...

Also, you will get more respect from people (who could give you help) if you demonstrate some level of proficiency with the syntax, which right now you are not doing. You're a smart person yann, try and behave like one.

**brewbuck** · 09-27-2009

Originally Posted by yann

Yes, I know. I built a new one that can, but it sometimes gets it wrong... it uses backpropagation...

It "sometimes gets it wrong" because you are trying to do the impossible. Look, try it on paper to convince yourself.

Use a piece of graph paper. At coordinates (0,0) and (1,1), draw two green dots. At coordinates (0,1) and (1,0) draw two red dots. Now, try to draw a straight line such that the green dots are both on one side of the line, and the red dots are both on the other side of the line.

It's not possible. That's why a linear machine can never learn the XOR function. In order to learn this function, the machine must learn a curve, not a straight line.
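The graph paper exercise can be repeated in code. This sketch tries every line w1*x + w2*y + b = 0 on a coarse grid of coefficients (the grid and the name `count_separating_lines` are assumptions for illustration) and counts how many reproduce a truth table. A grid search is not a proof, but it shows the pattern: AND has separating lines, XOR has none.

```c
#include <stdbool.h>

/* Count grid points (w1, w2, b), each in -2..2 in steps of 0.25, for which
   "w1*x + w2*y + b > 0" reproduces table[] on the four corner points.
   table[] is ordered (0,0), (0,1), (1,0), (1,1). */
int count_separating_lines(const bool table[4]) {
	int found = 0;
	for (int a = -8; a <= 8; a++)
		for (int c = -8; c <= 8; c++)
			for (int d = -8; d <= 8; d++) {
				double w1 = a * 0.25, w2 = c * 0.25, b = d * 0.25;
				bool ok = true;
				for (int k = 0; k < 4; k++) {
					int x = k >> 1, y = k & 1;
					if ((w1 * x + w2 * y + b > 0) != table[k]) ok = false;
				}
				if (ok) found++;
			}
	return found;
}
```

Called with the XOR table `{0, 1, 1, 0}` it returns 0; with the AND table `{0, 0, 0, 1}` it returns a positive count (w1 = 1, w2 = 1, b = -1.5 is one of the hits).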

**abachler** · 09-27-2009

You have to apply a sigmoid to the output before it is passed on to the next neuron, or it will never learn a non-linear function like XOR. It is, as we have all tried to explain to you, impossible. Not 'we don't know how to do it' impossible, mathematically impossible. It has been proven time and time again; it is the proof that nearly destroyed the entire field of neural network research.

This is why no one uses classical perceptrons any more. We use newer neuron models that have non-linear activation functions; this lets the neuron map a curved hyperplane, not just a linear one. Using multiple layers increases the number of folds the manifold can achieve. A simple problem like XOR needs at least 2 layers. It is the classical problem used to teach the fact that simple perceptrons cannot ever learn that function. No matter how many layers you have, a 5 billion layer perceptron without a non-linear activation will NEVER learn a problem that is not linearly separable. It has been proven that such a network, with any number of layers, can be reduced to a single layer network that yields the exact same outputs.
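The reduction is easy to see for units with no activation at all, which is the case assumed in this sketch. Two stacked linear layers are folded into one by multiplying the weights through; the function names and the 2-2-1 shape are illustrative assumptions, not yann's network.

```c
/* Two stacked layers with NO activation function (purely linear units). */
double two_layer_linear(double W1[2][2], const double b1[2],
                        const double w2[2], double b2, const double x[2]) {
	double h0 = W1[0][0] * x[0] + W1[0][1] * x[1] + b1[0];
	double h1 = W1[1][0] * x[0] + W1[1][1] * x[1] + b1[1];
	return w2[0] * h0 + w2[1] * h1 + b2;
}

/* Fold both layers into one: v = w2 * W1 and c = w2 . b1 + b2. */
void collapse(double W1[2][2], const double b1[2],
              const double w2[2], double b2, double v[2], double *c) {
	v[0] = w2[0] * W1[0][0] + w2[1] * W1[1][0];
	v[1] = w2[0] * W1[0][1] + w2[1] * W1[1][1];
	*c = w2[0] * b1[0] + w2[1] * b1[1] + b2;
}

/* The equivalent single layer. */
double one_layer_linear(const double v[2], double c, const double x[2]) {
	return v[0] * x[0] + v[1] * x[1] + c;
}
```

For any weights, `one_layer_linear` on the folded `v` and `c` returns the same value as `two_layer_linear`, so the extra layer adds nothing until a non-linear f(x) sits between the layers.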

**yann** · 09-28-2009

But this has 3 layers, which is enough for XOR, and it can do XOR, but it only gets it right 1 time in 3. I did it with backpropagation, but I think I connected something wrong...

What I did wrong, I think, is that I don't have that "nonlinear activation". How do I achieve that?

Could you post an example like this:
if weight1*input1+weight2*input2+weightb*bias > threshold then out=1...?

And thank you, people, for your effort...

**abachler** · 09-28-2009

3 layers or 20, it doesn't make a difference if you don't have a non-linear activation function.

There are a few, and it's fairly simple to apply them. After you perform the sum of products on the inputs and weights, you simply take the result and run it through f(x). f(x) could be any of the following:

Code:

double f(double x){
   if(x > 0.0) return 1.0;
   return 0.0;
   }

Code:

double f(double x){
   return sin(atan(x));
   }

Code:

double f(double x){
   return 1.0 / (1.0 + exp(-x)); /* exp() is declared in <math.h> */
   }

Personally I use the sin(atan()) method because it's faster than the logistic method, but the logistic method converges faster in my experience. So it basically comes down to when you want the network to be faster: during development, or in the field. Since my stuff actually goes out the door, I choose the field, and just spend extra time in development.

**yann** · 09-28-2009

OK, thank you, I will apply it to my code!

**mike_g** · 09-28-2009

The last function abachler posted is the logistic sigmoid, the activation usually paired with the "delta rule", which was what I learnt to use. Converging faster means that you need shorter training times, so that may be better for initial testing. I might have to have a play around with the sin(atan(x)) one myself to see if it produces better results when trained.

**yann** · 09-28-2009

Huh, the first thing you posted is just like the linear threshold function; it still gets things wrong. With the second one, sin(atan(x)), I get these error messages:

tmp/ccKzDcmq.o: In function `f':
mozak.c:(.text+0x19): undefined reference to `atan'
mozak.c:(.text+0x21): undefined reference to `sin'
collect2: ld returned 1 exit status

this is how I did it...

Code:

   	

double f(double x){
   return sin(atan(x));
   }
...
sum[0] =  (weight[0]*input[0]+weight[1]*input[1]+weight[2]*input[2]);
sum[1] =  (weight[3]*input[3]+weight[4]*input[1]+weight[5]*input[2]);
sum[2] =  (weight[6]*input[4]+weight[7]*percept[0]+weight[8]*percept[1]);
sum[3] =  (weight[9]*input[5]+weight[10]*percept[0]+weight[11]*percept[1]);
sum[4] =  (weight[12]*input[6]+weight[13]*percept[2]+weight[14]*percept[3]);
percept[0] = f(sum[0]);
percept[1] = f(sum[1]);
percept[2] = f(sum[2]);
percept[3] = f(sum[3]);
percept[4] = f(sum[4]);

I guess I did something wrong...? And yes, I included <math.h>

**brewbuck** · 09-28-2009

Just adding a nonlinear activation won't help anything. You need to adjust your learning/backprop function to account for the derivative of the activation when calculating error.

**yann** · 09-28-2009

Oh...and what should I actually do...?...(code examples are welcome...)...(: thanks.

**yann** · 09-28-2009

And how to eliminate those errors?
