Thread: Flower Classification Problem

  1. #1
    Registered User
    Join Date
    Aug 2019
    Location
    inside a singularity
    Posts
    308

    Flower Classification Problem

    The Flower Classification Problem is, or so I believe, the classic "Hello World" introduction to the world of Machine Learning and Artificial Intelligence.

    Basically, the problem statement is this: given 5 different flower types, each described by its colour and 4 other properties, build a prediction model that can predict the colour of a mystery flower from the 4 given properties (the 5th one, the colour, is what the model is supposed to predict).
    I found this problem quite difficult at first, as I'm just starting out and didn't understand how I should be using a neural network to solve it. So I simplified the problem for myself to just two flower types, each with three properties (Petal Length, Petal Width, Colour), and attempted to create a prediction model. It's working pretty well at predicting the colour of the mystery flower, but there's something disturbing me about the Cost function. I'll provide the code and explain my question.

    Here's my ML model:
    Code:
    /*
    Perceptron : Flower Classification Problem
    
    Neural Network Structure:
    ---------------------------------------------------
    
        O     Output Layer : Color (Blue or Red)
       / \
      O   O    Input Layer : (PetalLength , PetalWidth)
    ---------------------------------------------------
    */
    
    #include <iostream>
    #include <cmath>
    #include <random>
    
    enum Color
    {
        Undetermined = -1 ,
                Blue =  0 ,
                 Red =  1
    };
    
    struct Flower
    {
    public:
    
        float PetalLength;
        float PetalWidth;
        Color FlowerColor;
    
    public:
    
        Flower () = delete;
    
        Flower (float PL , float PW , Color C)
            : PetalLength (PL) ,
              PetalWidth  (PW) ,
              FlowerColor (C)
        { }
    
        friend std::ostream& operator << (std::ostream& stream , const Flower& flower);
    };
    
    std::ostream& operator << (std::ostream& stream , const Flower& flower)
    {
        stream << "[Petal Length]: " << flower.PetalLength << "\n [Petal Width]: " << flower.PetalWidth << "\n       [Color]: ";
    
        switch (flower.FlowerColor)
        {
        case Color::Undetermined: stream << "Undetermined"; break;
        case Color::Blue        : stream << "Blue";         break;
        case Color::Red         : stream << "Red";          break;
        }
    
        return stream;
    }
    
    /* Function Prototypes */
    double Sigmoid       (double X);
    double dSigmoidX_dx  (double X);
    double CostFunction  (double Prediction , int Expected);
    double random_double (void);
    
    int random_int (void);
    
    /* Main Thread */
    int main (void)
    {
        /* Training Set */
        Flower FlowersDataSet [] =
        {
            // Blue: Characteristic feature - Smaller in size
            { 2.0f , 1.0f , Color::Blue } ,
            { 3.0f , 1.0f , Color::Blue } ,
            { 2.0f , 0.5f , Color::Blue } ,
            { 1.0f , 1.0f , Color::Blue } ,
            // Red: Characteristic feature - Larger in size
            { 3.0f , 1.5f , Color::Red } ,
            { 4.0f , 1.5f , Color::Red } ,
            { 3.5f , 0.5f , Color::Red } ,
            { 5.5f , 1.0f , Color::Red }
        };
    
        /* Initial Weights and Bias is random */
        double Weight1 = random_double();
        double Weight2 = random_double();
        double Bias    = random_double();
        /* ----------------------------------------- */
    
        /* Other required variables */
        int RandomIndex;
    
        double Activation ;
        double Prediction ;
        double Cost       ;
        double dCost_dPred;
        double dPred_dActi;
        double dActi_dW1  ;
        double dActi_dW2  ;
        double dActi_dBias;
        double dCost_dActi;
        double dCost_dW1  ;
        double dCost_dW2  ;
        double dCost_dBias;
    
        double LearningRate = 0.1;
        /* ------------------------ */
    
        /* Start Training */
        std::cout << "Training in Progress..." << std::endl;
    
        /* Training Loop */
        for (int i = 0; i < 10000; i++)
        {
            RandomIndex = random_int();
    
            Activation = (FlowersDataSet[RandomIndex].PetalLength * Weight1) + (FlowersDataSet[RandomIndex].PetalWidth * Weight2) + Bias;
    
            Prediction = Sigmoid(Activation);
    
            Cost = CostFunction(Prediction , FlowersDataSet[RandomIndex].FlowerColor);
            std::cout << Cost << std::endl;
    
            dCost_dPred = 2 * (Prediction - FlowersDataSet[RandomIndex].FlowerColor);
            dPred_dActi = dSigmoidX_dx(Activation);
            dActi_dW1   = FlowersDataSet[RandomIndex].PetalLength;
            dActi_dW2   = FlowersDataSet[RandomIndex].PetalWidth;
            dActi_dBias = 1;
    
            dCost_dActi = dCost_dPred * dPred_dActi;
    
            dCost_dW1   = dCost_dActi * dActi_dW1;
            dCost_dW2   = dCost_dActi * dActi_dW2;
            dCost_dBias = dCost_dActi * dActi_dBias;
    
            Weight1 -= (LearningRate * dCost_dW1  );
            Weight2 -= (LearningRate * dCost_dW2  );
            Bias    -= (LearningRate * dCost_dBias);
        }
    
        std::cout << "Training Completed!" << std::endl << std::endl << Weight1 << " " << Weight2 << " " << Bias;
    
        Flower MysteryFlower(1 , 1.5 , Color::Undetermined);
        Activation = MysteryFlower.PetalLength * Weight1 + MysteryFlower.PetalWidth * Weight2 + Bias;
        Prediction = Sigmoid(Activation);
    
        if (Prediction > 0.5)
            MysteryFlower.FlowerColor = Color::Red;
        else
            MysteryFlower.FlowerColor = Color::Blue;
    
        std::cout << std::endl << std::endl << MysteryFlower;
    
        return 0;
    }
    
    double Sigmoid (double X)
    {
        return (1 / (1 + std::exp(-X)));
    }
    
    double dSigmoidX_dx (double X)
    {
        return (Sigmoid(X) * (1 - Sigmoid(X)));
    }
    
    double CostFunction (double Prediction , int Expected)
    {
        return (std::pow(Prediction - (double)Expected , 2));
    }
    
    double random_double (void)
    {
        static std::default_random_engine e;
        static std::uniform_real_distribution<> dis(0, 1);
        return dis(e);
    }
    
    int random_int (void)
    {
        // Index into FlowersDataSet: 8 samples, so valid indices are 0..7
        static std::default_random_engine e;
        static std::uniform_int_distribution<> dis(0, 7);
        return dis(e);
    }
    First, credit to c++ - Random float number generation - Stack Overflow for the random number generator.

    Second, my question. Have a look at the Cost values evaluated from the cost function:

    Code:
    0.00702527
    0.028994
    0.00011494
    3.07483e-007
    3.07481e-007
    0.0183549
    0.0177597
    0.00700561
    0.00691958
    0.000107458
    0.00683462
    0.000106515
    0.00039988
    3.9654e-007
    0.458472
    0.362696
    0.00298477
    0.00208283
    0.0877828
    6.75981e-005
    0.00141368
    2.02433e-006
    0.0036412
    0.00141755
    0.0687313
    7.49147e-005
    0.0042517
    0.00421865
    0.0100692
    0.0584421
    0.00084096
    0.079695
    0.000595151
    0.413079
    0.00884952
    0.0682537
    0.350009
    0.00229093
    0.00225457
    0.00681175
    0.264896
    9.67824e-006
    0.00197086
    0.198073
    0.0027157
    5.44921e-005
    5.44886e-005
    0.14506
    0.00147193
    0.00854737
    0.00347595
    2.27151e-006
    2.27145e-006
    0.00152141
    0.307664
    0.114488
    0.27559
    0.188302
    0.0932209
    0.112232
    0.0552676
    9.00705e-007
    0.0753365
    0.037417
    0.000457546
    0.442054
    0.00105253
    0.0904635
    0.0007183
    0.072252
    0.0143435
    0.420105
    0.00126335
    6.9766e-005
    0.00125219
    6.99779e-005
    0.00378638
    0.065188
    0.000935867
    0.0106762
    1.23841e-006
    0.00429752
    1.26348e-006
    1.26346e-006
    1.26344e-006
    0.000981218
    0.354948
    0.00285393
    0.00283864
    0.00221268
    0.0021787
    0.00689987
    0.26681
    0.193945
    0.0065364
    0.00645031
    5.42753e-005
    0.00263111
    5.40839e-005
    4.51614e-006
    0.00633342
    4.65456e-006
    0.102146
    0.0785372
    0.0987857
    0.0486906
    0.0682993
    4.87485e-007
    0.0567359
    0.0168735
    0.050095
    0.018354
    3.06251e-007
    0.000330341
    0.477846
    0.000760553
    8.18737e-005
    0.00479408
    0.0474058
    0.0053247
    0.00527377
    8.63794e-005
    0.000636328
    0.0414009
    0.00051712
    9.24693e-005
    0.42673
    0.00383604
    0.00120267
    1.61248e-006
    0.00382518
    0.330095
    0.150222
    6.33479e-005
    0.00154276
    0.0739423
    0.0595928
    7.73619e-005
    0.371281
    0.083115
    6.52392e-005
    0.0658824
    0.00423946
    0.0890867
    8.19332e-005
    0.0006847
    7.53096e-007
    7.53088e-007
    0.0427782
    0.00055308
    0.0368406
    0.0148277
    4.72599e-007
    0.0582564
    0.00679147
    9.96971e-005
    0.000380562
    9.97812e-005
    0.00671765
    0.0289541
    0.475973
    0.0462178
    0.00530084
    0.0686167
    0.0327887
    9.73331e-005
    0.0515055
    0.00031151
    2.66341e-007
    2.6634e-007
    0.0441569
    0.00814776
    0.000259398
    0.0195541
    0.000110522
    2.30931e-007
    2.30931e-007
    2.3093e-007
    0.018883
    0.000107695
    2.51411e-007
    2.5141e-007
    0.0429014
    0.0201657
    0.0194589
    0.496195
    8.21744e-005
    0.0409966
    5.22899e-007
    0.00570051
    0.0136075
    8.53853e-005
    0.0372898
    0.000467256
    0.0145405
    0.0337777
    0.451255
    7.00099e-005
    0.052781
    0.000746095
    7.98998e-007
    7.60235e-005
    0.0762551
    0.00564718
    0.00559017
    0.00553426
    0.414946
    0.321216
    0.103905
    0.00184598
    0.0018221
    0.0784298
    0.00367542
    0.00365047
    0.0637587
    1.11336e-006
    0.00420191
    0.000995856
    0.090963
    0.000678575
    0.0411686
    0.0357011
    4.1329e-007
    8.70425e-005
    8.70336e-005
    0.438844
    1.23631e-006
    0.00106607
    0.00404097
    1.24722e-006
    0.342892
    0.146433
    0.00818034
    0.00804849
    0.307051
    0.00229871
    0.110172
    2.92765e-006
    0.00288222
    0.13686
    6.14113e-005
    0.00368812
    1.56969e-006
    0.00366303
    0.00363824
    0.00845458
    0.0647591
    6.58112e-005
    6.58061e-005
    0.350301
    0.264483
    0.133239
    0.0985158
    0.128614
    1.27566e-006
    0.0983336
    0.0111236
    0.0798308
    0.00550363
    0.0657857
    0.448084
    0.0097384
    1.15332e-006
    0.0923929
    0.391377
    2.09064e-006
    0.00157073
    0.00765803
    2.13606e-006
    0.00754176
    0.00317685
    0.121487
    6.52685e-005
    0.00942837
    0.340445
    0.0061162
    0.00258561
    0.00257299
    0.149491
    1.93314e-006
    0.112158
    0.0100143
    0.0542413
    0.0771631
    5.33383e-007
    0.00545403
    8.25551e-005
    0.0130682
    0.0383342
    0.000477423
    0.00580472
    This is just part of the 10000 Cost values that get printed. What's disturbing me is that I want my cost value to decrease gradually, but quite a few times the cost rises to a value that's quite high compared to the average cost. Are these random rises in cost a general trend seen in training models, or is it just that my training model is flawed? If so, could any seasoned ML programmer point out what's wrong?

    Also, I had to bring my initial weights and bias into the range 0-1 to get a good working model; before that change, random_double was set to give a result between 0 and 100. No matter what I tried (increasing the number of iterations or playing with the learning rate), the cost was never precise: it just displayed 0 or 1. Why is that? The Sigmoid function gives a value very close to 1 but never exactly 1, and yet my cost displayed 1 in quite a few places. How do I increase the precision of the cost without any rounding off occurring?

    Thanks for your time and help!

    Here's what an example output looks like for the mystery flower:
    Code:
    [Petal Length]: 1
     [Petal Width]: 1.5
           [Color]: Blue
    Last edited by Zeus_; 11-01-2019 at 10:36 AM.

  2. #2
    Registered User Sir Galahad's Avatar
    Join Date
    Nov 2016
    Location
    The Round Table
    Posts
    277
    Quote Originally Posted by Zeus_ View Post
    This is some part of the 10000 times that the Cost is printed. What's disturbing me is that I want my cost value to decrease gradually but quite a few times the cost rises up to quite a high value in comparison to the average cost. Are these random rises in cost a general trend seen in training models? Or is it just that my training model is flawed?
    Not an ML expert. But a small dataset like that will probably not yield the best results. You need more samples.

    As to the fluctuations, my first thought was that that particular method of using the first derivative is prone to problems during the gradient descent stage. It guarantees a local minimum, but that may not be the same thing as what you actually need, the global minimum. That said, the more samples you throw at it, the less likely that is to happen.
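
    One thing that might make the trend easier to see: the posted loop prints the cost of one random sample per iteration, so it will always jump around somewhat. Averaging the cost over the whole dataset every so often shows the overall trend instead. A minimal standalone sketch of that idea (the data and weights below are made-up stand-ins, not values from your program):
    Code:
    #include <cmath>
    #include <iostream>
    #include <vector>
    
    struct Sample { double length, width; int color; };   // color: 0 = Blue, 1 = Red
    
    double sigmoid (double x) { return 1.0 / (1.0 + std::exp(-x)); }
    
    // Mean squared error over the whole dataset for the current weights.
    double AverageCost (const std::vector<Sample>& data , double w1 , double w2 , double b)
    {
        double sum = 0.0;
        for (const Sample& s : data)
        {
            double prediction = sigmoid(s.length * w1 + s.width * w2 + b);
            sum += (prediction - s.color) * (prediction - s.color);
        }
        return sum / data.size();
    }
    
    int main (void)
    {
        // Placeholder data and weights, purely for illustration.
        std::vector<Sample> data = { { 2.0 , 1.0 , 0 } , { 3.0 , 1.5 , 1 } };
        double w1 = 0.5 , w2 = 0.5 , b = 0.0;
    
        // In the real training loop you would call this every N iterations
        // instead of printing the per-sample cost on every iteration.
        std::cout << "Average cost: " << AverageCost(data , w1 , w2 , b) << std::endl;
    
        return 0;
    }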

    To really solve the gradient descent issue you have to apply both the first AND second derivative to your cost function.
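
    In one dimension, the kind of update that uses both derivatives looks roughly like this Newton-style step (a generic sketch on an arbitrary quadratic, not tied to your network):
    Code:
    #include <iostream>
    
    int main (void)
    {
        // Minimise f(x) = (x - 3)^2 using both derivatives:
        // f'(x) = 2 * (x - 3) , f''(x) = 2 , so the step is x -= f'(x) / f''(x).
        double x = 10.0;
    
        for (int i = 0; i < 5; i++)
        {
            double d1 = 2.0 * (x - 3.0);   // first derivative
            double d2 = 2.0;               // second derivative
            x -= d1 / d2;
            std::cout << "x = " << x << std::endl;
        }
    
        // For a quadratic this lands on the minimum (x = 3) in a single step.
        return 0;
    }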


    Quote Originally Posted by Zeus_ View Post
    I had to bring my initial weights and bias into a range of 0-1 to get a good working model, but before I changed that the random_double values were set to give a result between 0-100. No matter how much I tried (by increasing number of iterations or playing with the learning rate) the cost would never be precise. It was just displaying a 0 or 1. Why is that? The Sigmoid function pretty much gives a value very close to 1 but never exactly 1 and yet my cost displayed 1 at quite a few places. How do I increase the precision of cost without any rounding off occuring?
    The sigmoid function squashes everything into the range (0, 1). The user can scale the result after the fact anyway. In your case that means multiplying by 100.
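
    Part of the "0 or 1" display may also just be std::cout's default output precision of 6 significant digits: a prediction extremely close to 1 prints as exactly 1. A minimal standalone sketch showing the difference (not specific to your program):
    Code:
    #include <cmath>
    #include <iomanip>
    #include <iostream>
    
    int main (void)
    {
        double prediction = 1.0 / (1.0 + std::exp(-16.0));   // very close to, but never, 1
    
        std::cout << prediction << std::endl;                          // default precision: prints 1
        std::cout << std::setprecision(15) << prediction << std::endl; // now the gap from 1 is visible
    
        return 0;
    }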

    The rounding may be due to overtraining. Be careful with the number of iterations. Inserting an occasional random value into the neural net can help prevent overtraining too, at the cost of some temporary instabilities. (Maybe do it for only the first half of the training loop?)
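
    A rough sketch of that "occasional random value" idea, applied only in the first half of the loop (the noise scale and the 10% probability are arbitrary guesses, not tuned values):
    Code:
    #include <iostream>
    #include <random>
    
    int main (void)
    {
        std::default_random_engine eng;
        std::normal_distribution<double> noise(0.0 , 0.1);   // small perturbation
        std::bernoulli_distribution occasionally(0.1);       // roughly 10% of iterations
    
        const int Iterations = 10000;
        const double PetalLength = 2.0 , PetalWidth = 1.0;   // placeholder sample
        double checksum = 0.0;
    
        for (int i = 0; i < Iterations; i++)
        {
            double length = PetalLength;
            double width  = PetalWidth;
    
            // Perturb the inputs only during the first half of training.
            if (i < Iterations / 2 && occasionally(eng))
            {
                length += noise(eng);
                width  += noise(eng);
            }
    
            // ... the usual forward pass and weight update would go here ...
            checksum += length * width;   // stand-in so the loop does something
        }
    
        std::cout << "checksum: " << checksum << std::endl;
        return 0;
    }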

  3. #3
    Registered User
    Join Date
    Aug 2019
    Location
    inside a singularity
    Posts
    308
    > Not an ML expert. But a small dataset like that will probably not yield the best results. You need more samples.

    That's true, the more samples I provide for each type of flower, the better the network will perform. But right now, all my program is trying to do is compute the line that sits at a minimum average distance from all the points graphed (say, PetalLength on the X-axis and PetalWidth on the Y-axis). You'll get the 4 blue flowers clustered in one area and the 4 red ones clustered in another. Now, based on the minimum cost, there'd be a certain W1, W2 and B. Drawing a line between the blue and red points such that the average distance of each point from the line is at a minimum gives the desired minimum cost.

    Getting those random bumps in the cost means that the average distance of the points from the line the NN has determined keeps spiking up, which means that instead of gradually reaching a minimum, the NN behaves like a drunk man stumbling down a hill, falling his way to the bottom instead of making a gradual descent. So it led me to question whether this is a general trend seen in NNs dealing with real-life problems, or whether my NN is flawed in some way. Thanks for the response!
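
    For reference, the line the network ends up with can be read straight out of W1, W2 and B: the prediction crosses 0.5 exactly where W1*PetalLength + W2*PetalWidth + Bias = 0. A small standalone sketch of printing that boundary (the weight values below are made-up placeholders, not real trained values):
    Code:
    #include <iostream>
    
    int main (void)
    {
        // Placeholder "trained" values, purely for illustration.
        double Weight1 = 1.8 , Weight2 = 0.9 , Bias = -6.0;
    
        // Sigmoid(a) == 0.5 exactly when a == 0, so the boundary between Blue and Red is
        // Weight1 * PetalLength + Weight2 * PetalWidth + Bias == 0,
        // i.e. PetalWidth = -(Weight1 * PetalLength + Bias) / Weight2.
        for (double length = 1.0; length <= 5.5; length += 0.5)
        {
            double widthOnBoundary = -(Weight1 * length + Bias) / Weight2;
            std::cout << "PetalLength " << length
                      << " -> boundary PetalWidth " << widthOnBoundary << std::endl;
        }
    
        return 0;
    }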

    > As to the fluctuations, my first thought was that that particular method of using the first derivative is prone to problems during the gradient descent stage. It guarantees a local minimum, but that may not be the same thing as what you actually need, the global minimum. That said, the more samples you throw at it, the less likely that is to happen.

    Shouldn't the global minimum and local minimum of this NN be the same? As far as I can tell, there is only one minimum... But yes, what you said should be true for other, more complex NNs.
