# Thread: Curve Fitting problem (nonlinear regression)

1. ## Curve Fitting problem (nonlinear regression)

Hi everyone!

I currently have a nonlinear regression problem: I need to fit a curve to a single batch of raw data (given below).
The program has to be written in C++ and should output the coefficients of the fitted equation.

How can I also minimise the error in the curve fitting process? Could anyone also suggest an approximate equation for this fit?

Thanks a million!
Regression

Raw data:
X Y
0.25 2000
0.25 1780
0.2 1600
0.2 1520
0.15 820
0.15 800
0.15 940
0.2 1200
0.2 1100
0.2 1120
0.3 1830
0.4 2000
0.4 2500
0.5 2500
0.5 2800
0.6 3000
0.6 3500
0.7 3600
0.75 3600
0.8 3600
0.9 3600
0.85 3600
0.15 620
0.3 1180
0.35 1280
0.25 1060
0.2 840
0.4 1420
0.45 1500
0.5 1600
0.55 1700
0.6 1860
0.65 2000
0.3 1200
0.3 1280
0.3 1400
0.35 1360
0.35 1480
0.4 1520
0.4 1640
0.4 1380
0.45 1660
0.45 1750
0.5 1720
0.5 1820
0.55 1860
0.55 1960
0.2 800
0.2 820
0.25 920
0.25 940
0.3 1080
0.3 1100
0.35 1200
0.35 1160
0.4 1320
0.45 1420
0.45 1360
0.5 1520
0.55 1650
0.55 1700
0.55 1600
0.6 1800
0.6 1700
0.7 1800
0.7 1860
0.8 1950
0.8 2000
0.15 220
0.15 240
0.15 260
0.15 300
0.15 320
0.15 360
0.15 380
0.15 400
0.15 420
0.15 440
0.2 340
0.2 360
0.2 380
0.2 400
0.2 420
0.2 440
0.2 480
0.25 460
0.25 480
0.25 540
0.25 560
0.25 600
0.25 620
0.25 680
0.25 700
0.3 660
0.3 680
0.3 700
0.3 720
0.3 740
0.3 760
0.3 800
0.3 880
0.3 940
0.35 540
0.35 620
0.35 660
0.35 680
0.35 720
0.35 800
0.35 840
0.35 880
0.35 820
0.35 840
0.35 1040
0.4 600
0.4 620
0.4 640
0.4 680
0.4 800
0.4 820
0.4 840
0.4 860
0.4 900
0.4 920
0.4 1060
0.45 620
0.45 640
0.45 700
0.45 920
0.45 960
0.45 1120
0.5 540
0.5 620
0.5 700
0.5 960
0.5 1100
0.5 1140
0.5 1260
0.55 680
0.55 940
0.55 960
0.55 1000
0.55 1020
0.55 1040
0.55 1100
0.55 1200
0.55 1260
0.55 1280
0.6 800
0.6 840
0.6 860
0.6 880
0.6 940
0.6 960
0.6 1000
0.6 1320
0.7 1080
0.7 1220
0.7 1240
0.7 1260
0.7 1280
0.7 1420
0.7 1440
0.7 1460
0.8 1140
0.8 1160
0.8 1200
0.8 1220
0.8 1240
0.8 1320
0.8 1340
0.8 1360
0.8 1400
0.8 1420
0.8 1580
0.8 1600
0.9 1220
0.9 1340
0.9 1400
0.9 1440
0.9 1460
0.9 1480
0.9 1660
0.9 1720
0.9 1800
0.9 1820
0.9 1840
1 1480
1 1560
1 1580
1 1740
1 1760
1 1800
1 1860
1 1880
1 1960
0.15 100
0.15 80
0.2 100
0.2 100
0.25 150
0.3 150
0.3 200
0.4 250
0.4 280
0.4 200
0.5 300
0.5 350
0.55 350
0.6 400
0.6 450
0.6 500
0.7 550
0.7 600
0.7 500
0.7 400
0.75 550
0.75 500
0.75 600
0.8 650
0.8 700
0.85 750
0.85 820
0.9 800
0.9 900
0.9 700
0.9 650
0.2 150
0.15 60

2. So you take whatever nonlinear regression method you intend to use, code it in C++, and then run it.

3. Why don't you type "nonlinear least squares" into your favorite search?

4. There is no curve that fits that data. It is all over the place. Are you sure this is the correct data? There seem to be multiple series in it:

http://neuralnw.com/brewbuck/regression.png

5. Originally Posted by brewbuck
There is no curve that fits that data. It is all over the place. Are you sure this is the correct data? There seem to be multiple series in it:

http://neuralnw.com/brewbuck/regression.png
Hello, thank you all for the replies! =)

Hi brewbuck,

the data are correct. I also feel that there isn't any curve that fits those data.
Actually, I now intend to treat this as a "classification" problem. I intend to divide the data into 4 regions, so that when the user enters any X and Y combination, the system returns the region (1-4) for that combination.
data set 1-22 : region 1
data set 23-68 : region 2
data set 69-195 : region 3
data set 196-228(end) : region 4

However, I do not know what model/equation I should use for this classification problem. Please advise, thanks! I'm using a genetic algorithm, with the fitness function shown below:

int i = 0;
int rnd_down = 0;
double err = 0;

for (i = 0; i < m_rows; i++) {

    double x = m_xy[i][0]; // 1st column of the Excel sheet
    double y = m_xy[i][1]; // 2nd column
    double z = m_xy[i][2]; // 3rd column (I assign every X & Y combination to one of 4 regions, 1-4)
    double V = a1 + a2*x + a3*x*x + b1 + b2*y + b3*y*y; // What model/equation should I use here???

    rnd_down = (int)(V + 0.5); // round V to the nearest integer
    rnd_down = rnd_down % 4;   // keep the remainder modulo 4

    if (rnd_down != z) { // compare the predicted region with the 3rd column
        err += 1;
    }
}
fx = err; // fitness = number of misclassified rows
return fx;

6. You don't know what model/equation to use?? That must be part of the problem! If you don't have a curve to fit there is no problem. I mean if you're free to choose ANY curve you could choose the one which is piecewise constant and passes through all the points! Zero error then.

7. hi mustermeister!

Hmm, yep, the problem is that I can't find any suitable model/equation for this case. Any recommendations, perhaps?

8. Originally Posted by regression
hi mustermeister!

Hmm, yep, the problem is that I can't find any suitable model/equation for this case. Any recommendations, perhaps?
hi regression! Try linear regression. Fit a straight line to the data and see how well that works.
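For anyone following along, here is a minimal sketch of the straight-line fit suggested above, assuming the X and Y columns have already been read into two vectors. The names `Line`, `fitLine`, and `rSquared` are made up for this illustration. `rSquared` reports how much of the variance in Y the line explains, which is one way to judge "how well that works".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficients of the fitted line y = intercept + slope * x.
struct Line {
    double intercept;
    double slope;
};

// Ordinary least-squares fit of a straight line to the points (xs[i], ys[i]).
// Needs at least two points with distinct x values.
Line fitLine(const std::vector<double>& xs, const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && xs.size() >= 2);
    const double n = static_cast<double>(xs.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sx += xs[i];
        sy += ys[i];
        sxx += xs[i] * xs[i];
        sxy += xs[i] * ys[i];
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // all x values identical: slope is undefined
    const double slope = (n * sxy - sx * sy) / denom;
    return Line{(sy - slope * sx) / n, slope};
}

// Coefficient of determination: 1 is a perfect fit, values near 0 mean the
// line explains almost none of the spread in y.
double rSquared(const Line& line, const std::vector<double>& xs,
                const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && !xs.empty());
    double mean = 0.0;
    for (double y : ys) mean += y;
    mean /= static_cast<double>(ys.size());
    double ssRes = 0.0, ssTot = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double predicted = line.intercept + line.slope * xs[i];
        ssRes += (ys[i] - predicted) * (ys[i] - predicted);
        ssTot += (ys[i] - mean) * (ys[i] - mean);
    }
    assert(ssTot > 0.0);  // all y values identical: R^2 is undefined
    return 1.0 - ssRes / ssTot;
}
```

Calling `fitLine(xs, ys)` on the two data columns gives the intercept and slope, and `rSquared` on the result gives a single number to compare against other candidate models.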

9. Hi mustermeister, I tried that quite some time ago, but it doesn't seem to work, so I'm looking for other models.

10. how did linear regression 'not work'? no regression model will fit your data because the relative standard deviation about any particular data point is enormous. no model will give a reasonable pearson coefficient because there is simply too much noise in the input to achieve meaningful results.

also, the polynomial model you used above is linear, because it is linear in its coefficients. linear does not mean a straight line.

further, you can also curvilinearize the input data to further extend the reach of OLS into many nonlinear systems.
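As a rough illustration of that idea, the sketch below fits a power law y = a * x^b by taking logarithms of both columns, which turns it into the straight-line problem ln y = ln a + b ln x that OLS can solve. The power-law form is only an assumed example (nothing in this thread says it suits the posted data), and `PowerLaw` and `fitPowerLaw` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Parameters of the assumed model y = a * x^b.
struct PowerLaw {
    double a;
    double b;
};

// Fits y = a * x^b by ordinary least squares on the transformed data
// (u, v) = (ln x, ln y), where the model becomes v = ln a + b * u.
// All x and y values must be strictly positive.
PowerLaw fitPowerLaw(const std::vector<double>& xs,
                     const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && xs.size() >= 2);
    const double n = static_cast<double>(xs.size());
    double su = 0.0, sv = 0.0, suu = 0.0, suv = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        assert(xs[i] > 0.0 && ys[i] > 0.0);
        const double u = std::log(xs[i]);
        const double v = std::log(ys[i]);
        su += u;
        sv += v;
        suu += u * u;
        suv += u * v;
    }
    const double denom = n * suu - su * su;
    assert(denom != 0.0);  // all x values identical
    const double b = (n * suv - su * sv) / denom;
    const double lnA = (sv - b * su) / n;
    return PowerLaw{std::exp(lnA), b};
}
```

Note that this minimises the squared error in ln y rather than in y, so the result can differ from a true nonlinear least-squares fit of the same model.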

the wikipedia articles on these subjects are actually very very good. see the "Linear regression" article on Wikipedia.

if you can code up a simple linear algebra library, you can create generalized linear regression code from these articles. i did it last year.
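To give a feel for how little linear algebra is needed, here is a minimal sketch of a polynomial least-squares fit that builds the normal equations and solves them with Gaussian elimination. It is not the code mentioned in this post, only an illustration; `fitPolynomial` is a hypothetical name, and for high degrees or badly scaled data a QR-based solver would be numerically safer than the normal equations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Returns coefficients c[0..degree] minimising
//   sum_i (ys[i] - (c[0] + c[1]*x + ... + c[degree]*x^degree))^2
// by solving the normal equations (A^T A) c = A^T y.
std::vector<double> fitPolynomial(const std::vector<double>& xs,
                                  const std::vector<double>& ys,
                                  std::size_t degree) {
    assert(xs.size() == ys.size() && xs.size() > degree);
    const std::size_t m = degree + 1;
    // Augmented matrix [A^T A | A^T y], m rows by m + 1 columns.
    std::vector<std::vector<double>> a(m, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 0; i < xs.size(); ++i) {
        std::vector<double> p(2 * m - 1, 1.0);  // p[k] = xs[i]^k
        for (std::size_t k = 1; k < p.size(); ++k) p[k] = p[k - 1] * xs[i];
        for (std::size_t r = 0; r < m; ++r) {
            for (std::size_t c = 0; c < m; ++c) a[r][c] += p[r + c];
            a[r][m] += p[r] * ys[i];
        }
    }
    // Forward elimination with partial pivoting.
    for (std::size_t col = 0; col < m; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < m; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        assert(std::fabs(a[pivot][col]) > 1e-12);  // singular system
        std::swap(a[col], a[pivot]);
        for (std::size_t r = col + 1; r < m; ++r) {
            const double f = a[r][col] / a[col][col];
            for (std::size_t c = col; c <= m; ++c) a[r][c] -= f * a[col][c];
        }
    }
    // Back substitution.
    std::vector<double> coeff(m, 0.0);
    for (std::size_t r = m; r-- > 0;) {
        double s = a[r][m];
        for (std::size_t c = r + 1; c < m; ++c) s -= a[r][c] * coeff[c];
        coeff[r] = s / a[r][r];
    }
    return coeff;
}
```

With `degree` set to 1 this reduces to the straight-line fit, and with 2 it gives a quadratic, all through the same linear machinery.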

honestly, i'm afraid that no forum is going to help you here because it seems you may have some basic misconceptions about your assignment. you seem to be grasping at straws a bit and i would honestly recommend you visit with your TA or professor to make sure you are understanding the assignment properly before proceeding further.

11. Actually, I now intend to treat this as a "classification" problem. I intend to divide the data into 4 regions.
I tried using the k-means clustering algorithm, and this is what I got:

Attachment 9644

12. Sorry, that previous k-means clustering was only set to allow 2 clusters, which obviously doesn't seem right.

Here is the k-means clustering with 3, 4, and 5 clusters:

3 clusters:
Attachment 9645

4 clusters:
Attachment 9646

5 clusters:
Attachment 9647

13. Hi m37h0d and DavidP, thanks for the comment. =)

DavidP: Oh, actually, attached is the classification that I want (with the top green data as region 1, down to region 4 in purple). Right now I'm trying a Taylor series expression as the 'model', but I'm still trying to figure it out.

m37h0d: Yup, I'm short on time, so I haven't been able to build up my knowledge in this area. In my code, I'm actually trying to plug in a model (by trial and error) so that when the user inputs any X & Y value, the program automatically tells the user which region that pair falls in, as simple as that.

14. In my code, I'm actually trying to plug in a model (by trial and error) so that when the user inputs any X & Y value, the program automatically tells the user which region that pair falls in, as simple as that.
This is the difference between classification and clustering.

With classification, you already have regions set in stone. When you add a new data point to the data set, it gets classified into one of the pre-existing regions.
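One simple way to do this without finding an equation at all is a k-nearest-neighbour classifier: keep the labelled rows, and give a new point the region that is most common among its k closest rows. The sketch below is only an illustration under that assumption; `Sample` and `classifyKnn` are made-up names, and the `xScale` / `yScale` divisors are there because X (about 0.15 to 1) and Y (about 60 to 3600) in the posted data are on very different scales.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One labelled row: an (x, y) pair and its region, 1 to 4.
struct Sample {
    double x;
    double y;
    int region;
};

// Classifies (x, y) by majority vote among the k nearest labelled samples.
// Each axis is divided by its scale first so that neither dominates the
// distance. Ties go to the lowest region number.
int classifyKnn(const std::vector<Sample>& training, double x, double y,
                std::size_t k, double xScale, double yScale) {
    assert(!training.empty() && k >= 1 && xScale > 0.0 && yScale > 0.0);
    k = std::min(k, training.size());
    std::vector<std::pair<double, int>> dist;  // (squared distance, region)
    dist.reserve(training.size());
    for (const Sample& s : training) {
        const double dx = (s.x - x) / xScale;
        const double dy = (s.y - y) / yScale;
        dist.emplace_back(dx * dx + dy * dy, s.region);
    }
    // Only the k smallest distances need to be in order.
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());
    int votes[5] = {0, 0, 0, 0, 0};  // index 0 is unused
    for (std::size_t i = 0; i < k; ++i) {
        assert(dist[i].second >= 1 && dist[i].second <= 4);
        ++votes[dist[i].second];
    }
    int best = 1;
    for (int r = 2; r <= 4; ++r)
        if (votes[r] > votes[best]) best = r;
    return best;
}
```

The training vector would be filled from the 228 rows with the region labels given earlier in the thread. Choosing k and the two scale factors is a judgment call that would need checking against held-out rows.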

With clustering, you're given a data set, and your goal is to discover what regions may exist (but not be readily apparent) in the given data set.
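For comparison, a bare-bones sketch of k-means (Lloyd's algorithm), the clustering method used for the plots earlier in the thread. This is not the code that produced those attachments; `Point`, `nearestCentroid`, and `kMeans` are illustrative names, the caller supplies the starting centroids, and the points are assumed to be rescaled already so that X and Y are comparable.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Point {
    double x;
    double y;
};

// Index of the centroid closest to p (squared Euclidean distance).
std::size_t nearestCentroid(const Point& p,
                            const std::vector<Point>& centroids) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double dx = p.x - centroids[c].x;
        const double dy = p.y - centroids[c].y;
        const double d = dx * dx + dy * dy;
        if (d < bestDist) {
            bestDist = d;
            best = c;
        }
    }
    return best;
}

// Lloyd's algorithm. `centroids` holds the initial guesses and is updated in
// place. Returns the cluster index assigned to each point.
std::vector<std::size_t> kMeans(const std::vector<Point>& points,
                                std::vector<Point>& centroids,
                                int maxIterations) {
    assert(!points.empty() && !centroids.empty());
    std::vector<std::size_t> label(points.size(), 0);
    for (int it = 0; it < maxIterations; ++it) {
        // Assignment step: attach every point to its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < points.size(); ++i) {
            const std::size_t c = nearestCentroid(points[i], centroids);
            if (c != label[i]) {
                label[i] = c;
                changed = true;
            }
        }
        // Update step: move each centroid to the mean of its points.
        std::vector<Point> sum(centroids.size(), Point{0.0, 0.0});
        std::vector<std::size_t> count(centroids.size(), 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            sum[label[i]].x += points[i].x;
            sum[label[i]].y += points[i].y;
            ++count[label[i]];
        }
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (count[c] > 0) {  // an empty cluster keeps its old centroid
                const double n = static_cast<double>(count[c]);
                centroids[c] = Point{sum[c].x / n, sum[c].y / n};
            }
        }
        if (!changed && it > 0) break;  // assignments are stable
    }
    return label;
}
```

Unlike the classifier case, nothing here uses the region labels: the clusters come purely from where the points sit, which is why the results need not match regions chosen by hand.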

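A regression model can be used for classification: you fit it to the region labels and round or threshold its output, so the model acts as the classifying agent. Strictly speaking, though, regression predicts a continuous value, while classification predicts a discrete label.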

15. Yeah DavidP, you're right; that is why I specified this as a 'classification' problem earlier, in my 2nd post.

Well, so now the biggest headache is finding that particular regression model (which I'm having trouble with). =XX