Thread: Curve Fitting problem (nonlinear regression)

  1. #1
    Registered User
    Join Date
    Mar 2010
    Posts
    6

    Curve Fitting problem (nonlinear regression)

    Hi everyone!

    I have a nonlinear regression problem: I need to fit a curve to a single block of raw data (given below).
    The program has to be written in C++ and should output the coefficients of the fitted equation.

    How can I minimise the error of the curve fit? Could anyone also suggest an equation to use for this fit?

    Thanks a million!
    Regression

    Raw data:
    X Y
    0.25 2000
    0.25 1780
    0.2 1600
    0.2 1520
    0.15 820
    0.15 800
    0.15 940
    0.2 1200
    0.2 1100
    0.2 1120
    0.3 1830
    0.4 2000
    0.4 2500
    0.5 2500
    0.5 2800
    0.6 3000
    0.6 3500
    0.7 3600
    0.75 3600
    0.8 3600
    0.9 3600
    0.85 3600
    0.15 620
    0.3 1180
    0.35 1280
    0.25 1060
    0.2 840
    0.4 1420
    0.45 1500
    0.5 1600
    0.55 1700
    0.6 1860
    0.65 2000
    0.3 1200
    0.3 1280
    0.3 1400
    0.35 1360
    0.35 1480
    0.4 1520
    0.4 1640
    0.4 1380
    0.45 1660
    0.45 1750
    0.5 1720
    0.5 1820
    0.55 1860
    0.55 1960
    0.2 800
    0.2 820
    0.25 920
    0.25 940
    0.3 1080
    0.3 1100
    0.35 1200
    0.35 1160
    0.4 1320
    0.45 1420
    0.45 1360
    0.5 1520
    0.55 1650
    0.55 1700
    0.55 1600
    0.6 1800
    0.6 1700
    0.7 1800
    0.7 1860
    0.8 1950
    0.8 2000
    0.15 220
    0.15 240
    0.15 260
    0.15 300
    0.15 320
    0.15 360
    0.15 380
    0.15 400
    0.15 420
    0.15 440
    0.2 340
    0.2 360
    0.2 380
    0.2 400
    0.2 420
    0.2 440
    0.2 480
    0.25 460
    0.25 480
    0.25 540
    0.25 560
    0.25 600
    0.25 620
    0.25 680
    0.25 700
    0.3 660
    0.3 680
    0.3 700
    0.3 720
    0.3 740
    0.3 760
    0.3 800
    0.3 880
    0.3 940
    0.35 540
    0.35 620
    0.35 660
    0.35 680
    0.35 720
    0.35 800
    0.35 840
    0.35 880
    0.35 820
    0.35 840
    0.35 1040
    0.4 600
    0.4 620
    0.4 640
    0.4 680
    0.4 800
    0.4 820
    0.4 840
    0.4 860
    0.4 900
    0.4 920
    0.4 1060
    0.45 620
    0.45 640
    0.45 700
    0.45 920
    0.45 960
    0.45 1120
    0.5 540
    0.5 620
    0.5 700
    0.5 960
    0.5 1100
    0.5 1140
    0.5 1260
    0.55 680
    0.55 940
    0.55 960
    0.55 1000
    0.55 1020
    0.55 1040
    0.55 1100
    0.55 1200
    0.55 1260
    0.55 1280
    0.6 800
    0.6 840
    0.6 860
    0.6 880
    0.6 940
    0.6 960
    0.6 1000
    0.6 1320
    0.7 1080
    0.7 1220
    0.7 1240
    0.7 1260
    0.7 1280
    0.7 1420
    0.7 1440
    0.7 1460
    0.8 1140
    0.8 1160
    0.8 1200
    0.8 1220
    0.8 1240
    0.8 1320
    0.8 1340
    0.8 1360
    0.8 1400
    0.8 1420
    0.8 1580
    0.8 1600
    0.9 1220
    0.9 1340
    0.9 1400
    0.9 1440
    0.9 1460
    0.9 1480
    0.9 1660
    0.9 1720
    0.9 1800
    0.9 1820
    0.9 1840
    1 1480
    1 1560
    1 1580
    1 1740
    1 1760
    1 1800
    1 1860
    1 1880
    1 1960
    0.15 100
    0.15 80
    0.2 100
    0.2 100
    0.25 150
    0.3 150
    0.3 200
    0.4 250
    0.4 280
    0.4 200
    0.5 300
    0.5 350
    0.55 350
    0.6 400
    0.6 450
    0.6 500
    0.7 550
    0.7 600
    0.7 500
    0.7 400
    0.75 550
    0.75 500
    0.75 600
    0.8 650
    0.8 700
    0.85 750
    0.85 820
    0.9 800
    0.9 900
    0.9 700
    0.9 650
    0.2 150
    0.15 60

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    So you take whatever nonlinear regression method you intend to use, code it in C++, and then run it.

  3. #3
    Registered User NeonBlack's Avatar
    Join Date
    Nov 2007
    Posts
    431
    Why don't you type "nonlinear least squares" into your favorite search?
    I copied it from the last program in which I passed a parameter, which would have been pre-1989 I guess. - esbo

  4. #4
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    There is no curve that fits that data. It is all over the place. Are you sure this is the correct data? There seem to be multiple series in it:

    http://neuralnw.com/brewbuck/regression.png
    Last edited by brewbuck; 03-19-2010 at 09:25 PM.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  5. #5
    Registered User
    Join Date
    Mar 2010
    Posts
    6
    Quote Originally Posted by brewbuck View Post
    There is no curve that fits that data. It is all over the place. Are you sure this is the correct data? There seem to be multiple series in it:

    http://neuralnw.com/brewbuck/regression.png
    Hello, thank you all for the replies! =)

    Hi brewbuck,

    The data are correct. I also feel that there isn't any curve that fits those data.
    Actually, I now intend to treat this problem as a "classification" problem. I intend to divide the data into 4 regions, so that when the user enters any X and Y combination, the system gives the region (1-4) for that combination.
    data set 1-22 : region 1
    data set 23-68 : region 2
    data set 69-195 : region 3
    data set 196-228(end) : region 4

    However, I do not know what model/equation I should use for this classification problem. Please advise, thanks! I'm using a genetic algorithm; the fitness function is shown below:


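    int i = 0;
    int rnd_down = 0;
    double err = 0; // number of misclassified rows

    for (i = 0; i < m_rows; i++) {

        double x = m_xy[i][0]; // 1st column of the Excel sheet
        double y = m_xy[i][1]; // 2nd column
        double z = m_xy[i][2]; // 3rd column (I assign every X & Y combi to one of 4 regions, 1-4)
        // What model/equation should I use here???
        // (a3 now multiplies x*x to mirror b3*y*y; a1 and b1 are both constant terms, so one is redundant)
        double V = a1 + a2*x + a3*x*x + b1 + b2*y + b3*y*y;

        rnd_down = (int)(V + 0.5);                  // round to the nearest integer
        rnd_down = ((rnd_down - 1) % 4 + 4) % 4 + 1; // wrap into 1..4 so it can match z

        if (rnd_down != z) { // compare the predicted region with the 3rd column
            err += 1;
        }
    }
    fx = err; // fitness = total misclassifications, returned only after every row is checked
    return fx;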


  6. #6
    Registered User
    Join Date
    Mar 2010
    Posts
    15
    You don't know what model/equation to use?? That must be part of the problem! If you don't have a curve to fit there is no problem. I mean if you're free to choose ANY curve you could choose the one which is piecewise constant and passes through all the points! Zero error then.

  7. #7
    Registered User
    Join Date
    Mar 2010
    Posts
    6
    hi mustermeister!

    Hmm, yes, the problem is that I can't find any suitable model/equation for this case. Any to recommend, perhaps?

  8. #8
    Registered User
    Join Date
    Mar 2010
    Posts
    15
    Quote Originally Posted by regression View Post
    hi mustermeister!

    Hmm, yes, the problem is that I can't find any suitable model/equation for this case. Any to recommend, perhaps?
    hi regression! Try linear regression. Fit a straight line to the data and see how well that works.
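
    A minimal sketch of that straight-line fit, assuming the X and Y columns have already been loaded into two vectors. The names LineFit and fit_line are made up for illustration and do not come from anyone's code in this thread. It computes the ordinary least-squares slope and intercept in closed form, plus R squared so you can "see how well that works":

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Result of fitting y = intercept + slope * x by ordinary least squares.
// (Hypothetical type, introduced only for this sketch.)
struct LineFit {
    double intercept;
    double slope;
    double r_squared;  // fraction of the variance in y explained by the line
};

LineFit fit_line(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("need at least two (x, y) pairs");

    const double n = static_cast<double>(x.size());
    double sum_x = 0.0, sum_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += y[i];
    }
    const double mean_x = sum_x / n;
    const double mean_y = sum_y / n;

    // Centered sums of squares and cross products.
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    if (sxx == 0.0)
        throw std::invalid_argument("all x values are identical");

    LineFit fit;
    fit.slope = sxy / sxx;
    fit.intercept = mean_y - fit.slope * mean_x;
    // If y is constant the line reproduces it exactly, so report 1.
    fit.r_squared = (syy == 0.0) ? 1.0 : (sxy * sxy) / (sxx * syy);
    return fit;
}
```

    An R squared close to 1 means the line explains most of the variation in Y; a value near 0 means it explains almost none.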

  9. #9
    Registered User
    Join Date
    Mar 2010
    Posts
    6
    Hi mustermeister, I tried that quite some time ago but it doesn't seem to work, so I'm looking for other models.

  10. #10
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    how did linear regression 'not work'? no regression model will fit your data well because the spread of Y values at any given X is enormous. no model will give a reasonable Pearson correlation coefficient because there is simply too much noise in the input to achieve meaningful results.

    also, the polynomial model you used above is linear: it is linear in its coefficients, even though it is not a straight line in x and y. 'linear' does not mean a straight line.

    further, you can also transform the input data (for example by taking logs) to extend the reach of OLS into many nonlinear systems.
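
    to illustrate the transformation idea, here is a rough sketch that fits a power law y = a * x^b by running an ordinary straight-line fit on (ln x, ln y). the power-law form is only an example picked for this sketch, not a claim that it suits the data in this thread, and PowerFit / fit_power_law are hypothetical names:

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Coefficients of y = a * x^b. (Hypothetical type for this sketch.)
struct PowerFit {
    double a;
    double b;
};

// Taking logs turns y = a * x^b into ln y = ln a + b * ln x,
// which is a straight line in (ln x, ln y) and can be fitted with OLS.
PowerFit fit_power_law(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("need at least two (x, y) pairs");

    const double n = static_cast<double>(x.size());
    double su = 0.0, sv = 0.0, suu = 0.0, suv = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] <= 0.0 || y[i] <= 0.0)
            throw std::invalid_argument("log transform needs positive x and y");
        const double u = std::log(x[i]);
        const double v = std::log(y[i]);
        su += u;
        sv += v;
        suu += u * u;
        suv += u * v;
    }

    const double denom = n * suu - su * su;
    if (denom == 0.0)
        throw std::invalid_argument("all x values are identical");

    PowerFit fit;
    fit.b = (n * suv - su * sv) / denom;      // slope in log-log space
    fit.a = std::exp((sv - fit.b * su) / n);  // intercept mapped back from ln a
    return fit;
}
```

    note that this minimises squared error in log space, which is not the same as minimising it on the original Y scale.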

    the wikipedia articles on these subjects are actually very very good. Linear regression - Wikipedia, the free encyclopedia

    if you can code up a simple linear algebra library, you can create generalized linear regression code from these articles. i did it last year.
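
    a small sketch of what such code might look like for a polynomial model, assuming plain std::vector storage instead of a real linear algebra library. it builds the normal equations (A^T A) c = A^T y and solves them by Gaussian elimination with partial pivoting. fit_polynomial is a made-up name, and this is not the code i wrote last year:

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Least-squares fit of y = c[0] + c[1]*x + ... + c[degree]*x^degree.
// The model is linear in the coefficients c, so OLS applies directly.
std::vector<double> fit_polynomial(const std::vector<double>& x,
                                   const std::vector<double>& y,
                                   std::size_t degree) {
    const std::size_t m = degree + 1;  // number of coefficients
    if (x.size() != y.size() || x.size() < m)
        throw std::invalid_argument("not enough data points");

    // Augmented normal equations [A^T A | A^T y], one row per coefficient.
    std::vector<std::vector<double>> a(m, std::vector<double>(m + 1, 0.0));
    std::vector<double> basis(m);
    for (std::size_t k = 0; k < x.size(); ++k) {
        double power = 1.0;
        for (std::size_t j = 0; j < m; ++j) {
            basis[j] = power;  // x^j
            power *= x[k];
        }
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < m; ++j) a[i][j] += basis[i] * basis[j];
            a[i][m] += basis[i] * y[k];
        }
    }

    // Forward elimination with partial pivoting.
    for (std::size_t col = 0; col < m; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < m; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (std::fabs(a[pivot][col]) < 1e-12)
            throw std::runtime_error("normal equations are singular");
        std::swap(a[col], a[pivot]);
        for (std::size_t r = col + 1; r < m; ++r) {
            const double factor = a[r][col] / a[col][col];
            for (std::size_t j = col; j <= m; ++j) a[r][j] -= factor * a[col][j];
        }
    }

    // Back substitution.
    std::vector<double> c(m, 0.0);
    for (std::size_t i = m; i-- > 0;) {
        double s = a[i][m];
        for (std::size_t j = i + 1; j < m; ++j) s -= a[i][j] * c[j];
        c[i] = s / a[i][i];
    }
    return c;
}
```

    the normal equations become badly conditioned for high degrees, so a QR-based solver is usually preferred beyond small polynomials.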

    honestly, i'm afraid that no forum is going to help you here because it seems you may have some basic misconceptions about your assignment. you seem to be grasping at straws a bit and i would honestly recommend you visit with your TA or professor to make sure you are understanding the assignment properly before proceeding further.

  11. #11
    l'Anziano DavidP's Avatar
    Join Date
    Aug 2001
    Location
    Plano, Texas, United States
    Posts
    2,743
    Actually, I now intend to treat this problem as a "classification" problem. I intend to divide the data into 4 regions.
    I tried using the K-Means clustering algorithm, this is what I got:

    Attachment 9644
    My Website

    "Circular logic is good because it is."

  12. #12
    l'Anziano DavidP's Avatar
    Join Date
    Aug 2001
    Location
    Plano, Texas, United States
    Posts
    2,743
    Sorry, that previous k-means clustering was set to allow only 2 clusters, which obviously doesn't seem right.

    Here is the k means clustering at levels of 3, 4, and 5 clusters:

    3 clusters:
    Attachment 9645

    4 clusters:
    Attachment 9646

    5 clusters:
    Attachment 9647
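
    For anyone who wants to reproduce this, here is a rough sketch of k-means (Lloyd's algorithm) on 2-D points. It is not the program that produced the attachments above. Point, squared_distance and k_means are names invented for this sketch, and the initial centroids are simply taken at evenly spaced positions in the input, which is an assumption made to keep the example deterministic:

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

struct Point {
    double x;
    double y;
};

double squared_distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Returns the cluster index (0..k-1) of every point and writes the
// final centroids into `centroids`.
std::vector<int> k_means(const std::vector<Point>& pts, std::size_t k,
                         std::vector<Point>& centroids, int max_iter = 100) {
    if (k == 0 || pts.size() < k)
        throw std::invalid_argument("need at least k points and k > 0");

    // Assumed initialisation: evenly spaced input points.
    centroids.clear();
    for (std::size_t c = 0; c < k; ++c)
        centroids.push_back(pts[c * pts.size() / k]);

    std::vector<int> label(pts.size(), -1);
    for (int iter = 0; iter < max_iter; ++iter) {
        // Assignment step: attach each point to its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            int best = 0;
            double best_d = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const double d = squared_distance(pts[i], centroids[c]);
                if (d < best_d) {
                    best_d = d;
                    best = static_cast<int>(c);
                }
            }
            if (label[i] != best) {
                label[i] = best;
                changed = true;
            }
        }
        if (!changed) break;  // converged

        // Update step: move each centroid to the mean of its points.
        std::vector<Point> sum(k, Point{0.0, 0.0});
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            sum[label[i]].x += pts[i].x;
            sum[label[i]].y += pts[i].y;
            ++count[label[i]];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (count[c] > 0)  // an empty cluster keeps its old centroid
                centroids[c] = Point{sum[c].x / count[c], sum[c].y / count[c]};
    }
    return label;
}
```

    In this data set X runs from 0.15 to 1 while Y runs from 60 to 3600, so unscaled Euclidean distance is dominated by Y. Rescaling both columns to a similar range before clustering is probably worth trying.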

  13. #13
    Registered User
    Join Date
    Mar 2010
    Posts
    6
    Hi m37h0d and DavidP, thanks for the comment. =)

    DavidP: Actually, attached is the classification I wanted (with the top green data as region 1, down to region 4 in purple). Right now I'm trying a Taylor series expression as the 'model', but I'm still figuring it out.

    m37h0d: Yes, I'm short of time, so I haven't been able to build up my knowledge in this area. In my code, I'm trying out models by trial and error so that when the user inputs any X and Y value, the program automatically tells the user which region that point falls in. It's as simple as that.

  14. #14
    l'Anziano DavidP's Avatar
    Join Date
    Aug 2001
    Location
    Plano, Texas, United States
    Posts
    2,743
    In my code, I'm trying out models by trial and error so that when the user inputs any X and Y value, the program automatically tells the user which region that point falls in.
    This is the difference between classification and clustering.

    With classification, you already have regions set in stone. When you add a new data point to the data set, it gets classified into one of the pre-existing regions.

    With clustering, you're given a data set, and your goal is to discover what regions may exist (but not be readily apparent) in the given data set.

    A regression model can act as the classifying agent if you map its output onto the regions, although regression by itself predicts a continuous value rather than a class.
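
    To make the classification side concrete, here is a minimal sketch of one very simple classifier: nearest centroid. This is a technique swapped in purely for illustration, not something anyone in this thread has recommended as the right model. It assumes each training row already carries its region label, and Sample, Centroid, train_centroids and classify are hypothetical names:

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <stdexcept>
#include <vector>

// One labelled training row: X, Y and its pre-assigned region.
struct Sample {
    double x;
    double y;
    int region;
};

struct Centroid {
    double x;
    double y;
    int region;
};

// "Training" just averages the points of each pre-existing region.
std::vector<Centroid> train_centroids(const std::vector<Sample>& training) {
    struct Sum {
        double x = 0.0;
        double y = 0.0;
        std::size_t n = 0;
    };
    std::map<int, Sum> sums;
    for (const Sample& s : training) {
        Sum& acc = sums[s.region];
        acc.x += s.x;
        acc.y += s.y;
        ++acc.n;
    }
    std::vector<Centroid> out;
    for (const auto& entry : sums) {
        const Sum& acc = entry.second;
        out.push_back(Centroid{acc.x / acc.n, acc.y / acc.n, entry.first});
    }
    return out;
}

// A new point is assigned to the region whose centroid is closest.
int classify(const std::vector<Centroid>& centroids, double x, double y) {
    if (centroids.empty())
        throw std::invalid_argument("no centroids to classify against");
    int best_region = centroids.front().region;
    double best_d = std::numeric_limits<double>::max();
    for (const Centroid& c : centroids) {
        const double dx = x - c.x;
        const double dy = y - c.y;
        const double d = dx * dx + dy * dy;
        if (d < best_d) {
            best_d = d;
            best_region = c.region;
        }
    }
    return best_region;
}
```

    The regions are fixed by the labels in the training data, and a new point is only ever sorted into one of them. That is what separates this from the clustering runs above, where the regions themselves were discovered.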

  15. #15
    Registered User
    Join Date
    Mar 2010
    Posts
    6
    Yes DavidP, you're right. That is why I specified this as a 'classification' problem earlier, in my 2nd post.

    So now the biggest headache is finding that particular regression model (which I'm having trouble with). =XX
