1. ## Flower Classification Problem

The Flower Classification Problem is, I believe, the classic "Hello World" introduction to the world of Machine Learning and Artificial Intelligence.

Basically, the problem statement says: given 5 different flower types, each described by its colour plus 4 other properties, make a prediction model that can predict the colour of a mystery flower from the 4 given properties (the 5th one, the colour, is not given and the ML model is supposed to predict it).

I found this problem quite difficult at first, as I'm just starting out and didn't understand how I should use a neural network to solve it. So I simplified the problem for myself to just two flower types, each with three properties (Petal Length, Petal Width, Colour), and attempted to create a prediction model. It works pretty well at predicting the colour of the mystery flower, but there's something disturbing me about the cost function. I'll provide the code and explain my question.

Here's my ML model:
Code:
```/*
    Perceptron : Flower Classification Problem

    Neural Network Structure:
    ---------------------------------------------------

      O      Output Layer : Color (Blue or Red)
     / \
    O   O   Input Layer : (PetalLength , PetalWidth)
    ---------------------------------------------------
*/

#include <iostream>
#include <cmath>
#include <random>

enum Color
{
    Undetermined = -1 ,
    Blue         =  0 ,
    Red          =  1
};

struct Flower
{
public:

    float PetalLength;
    float PetalWidth;
    Color FlowerColor;

public:

    Flower () = delete;

    Flower (float PL , float PW , Color C)
        : PetalLength (PL) ,
          PetalWidth  (PW) ,
          FlowerColor (C)
    { }

    friend std::ostream& operator << (std::ostream& stream , const Flower& flower);
};

std::ostream& operator << (std::ostream& stream , const Flower& flower)
{
    stream << "[Petal Length]: " << flower.PetalLength << "\n [Petal Width]: " << flower.PetalWidth << "\n       [Color]: ";

    switch (flower.FlowerColor)
    {
        case Color::Undetermined: stream << "Undetermined"; break;
        case Color::Blue        : stream << "Blue";         break;
        case Color::Red         : stream << "Red";          break;
    }

    return stream;
}

/* Function Prototypes */
double Sigmoid       (double X);
double dSigmoidX_dx  (double X);
double CostFunction  (double Prediction , int Expected);
double random_double (void);

int random_int (void);

int main (void)
{
    /* Training Set */
    Flower FlowersDataSet [] =
    {
        // Blue: Characteristic feature - Smaller in size
        { 2.0f , 1.0f , Color::Blue } ,
        { 3.0f , 1.0f , Color::Blue } ,
        { 2.0f , 0.5f , Color::Blue } ,
        { 1.0f , 1.0f , Color::Blue } ,
        // Red: Characteristic feature - Larger in size
        { 3.0f , 1.5f , Color::Red } ,
        { 4.0f , 1.5f , Color::Red } ,
        { 3.5f , 0.5f , Color::Red } ,
        { 5.5f , 1.0f , Color::Red }
    };

    /* Initial Weights and Bias are random */
    double Weight1 = random_double();
    double Weight2 = random_double();
    double Bias    = random_double();
    /* ----------------------------------------- */

    /* Other required variables */
    int RandomIndex;

    double Activation ;
    double Prediction ;
    double Cost       ;
    double dCost_dPred;
    double dPred_dActi;
    double dActi_dW1  ;
    double dActi_dW2  ;
    double dActi_dBias;
    double dCost_dActi;
    double dCost_dW1  ;
    double dCost_dW2  ;
    double dCost_dBias;

    double LearningRate = 0.1;
    /* ------------------------ */

    /* Start Training */
    std::cout << "Training in Progress..." << std::endl;

    /* Training Loop */
    for (int i = 0; i < 10000; i++)
    {
        RandomIndex = random_int();

        // Weighted sum of the inputs, plus the bias.
        Activation = (FlowersDataSet[RandomIndex].PetalLength * Weight1) + (FlowersDataSet[RandomIndex].PetalWidth * Weight2) + Bias;

        Prediction = Sigmoid(Activation);

        Cost = CostFunction(Prediction , FlowersDataSet[RandomIndex].FlowerColor);
        std::cout << Cost << std::endl;

        // Chain rule: dCost/dW = dCost/dPred * dPred/dActi * dActi/dW
        dCost_dPred = 2 * (Prediction - FlowersDataSet[RandomIndex].FlowerColor);
        dPred_dActi = dSigmoidX_dx(Activation);
        dActi_dW1   = FlowersDataSet[RandomIndex].PetalLength;
        dActi_dW2   = FlowersDataSet[RandomIndex].PetalWidth;
        dActi_dBias = 1;

        dCost_dActi = dCost_dPred * dPred_dActi;

        dCost_dW1   = dCost_dActi * dActi_dW1;
        dCost_dW2   = dCost_dActi * dActi_dW2;
        dCost_dBias = dCost_dActi * dActi_dBias;

        // Gradient descent step.
        Weight1 -= (LearningRate * dCost_dW1  );
        Weight2 -= (LearningRate * dCost_dW2  );
        Bias    -= (LearningRate * dCost_dBias);
    }

    std::cout << "Training Completed!" << std::endl << std::endl << Weight1 << " " << Weight2 << " " << Bias;

    Flower MysteryFlower(1.0f , 1.5f , Color::Undetermined);
    Activation = (MysteryFlower.PetalLength * Weight1) + (MysteryFlower.PetalWidth * Weight2) + Bias;
    Prediction = Sigmoid(Activation);

    // Threshold the sigmoid output at 0.5 to pick a colour.
    if (Prediction > 0.5)
        MysteryFlower.FlowerColor = Color::Red;
    else
        MysteryFlower.FlowerColor = Color::Blue;

    std::cout << std::endl << std::endl << MysteryFlower;

    return 0;
}

double Sigmoid (double X)
{
    return (1 / (1 + std::exp(-X)));
}

double dSigmoidX_dx (double X)
{
    // Derivative of the sigmoid: s(x) * (1 - s(x)).
    double S = Sigmoid(X);
    return (S * (1 - S));
}

double CostFunction (double Prediction , int Expected)
{
    return (std::pow(Prediction - (double)Expected , 2));
}

double random_double (void)
{
    static std::default_random_engine e;
    static std::uniform_real_distribution<> dis(0, 1);
    return dis(e);
}

int random_int (void)
{
    // An integer distribution over the valid indices 0..7 avoids the
    // implicit double-to-int narrowing of uniform_real_distribution.
    static std::default_random_engine e;
    static std::uniform_int_distribution<> dis(0, 7);
    return dis(e);
}```
First, credit to the Stack Overflow question "Random float number generation" for the random number generator.

Second, my question. Have a look at the Cost values evaluated from the cost function:

Code:
```0.00702527
0.028994
0.00011494
3.07483e-007
3.07481e-007
0.0183549
0.0177597
0.00700561
0.00691958
0.000107458
0.00683462
0.000106515
0.00039988
3.9654e-007
0.458472
0.362696
0.00298477
0.00208283
0.0877828
6.75981e-005
0.00141368
2.02433e-006
0.0036412
0.00141755
0.0687313
7.49147e-005
0.0042517
0.00421865
0.0100692
0.0584421
0.00084096
0.079695
0.000595151
0.413079
0.00884952
0.0682537
0.350009
0.00229093
0.00225457
0.00681175
0.264896
9.67824e-006
0.00197086
0.198073
0.0027157
5.44921e-005
5.44886e-005
0.14506
0.00147193
0.00854737
0.00347595
2.27151e-006
2.27145e-006
0.00152141
0.307664
0.114488
0.27559
0.188302
0.0932209
0.112232
0.0552676
9.00705e-007
0.0753365
0.037417
0.000457546
0.442054
0.00105253
0.0904635
0.0007183
0.072252
0.0143435
0.420105
0.00126335
6.9766e-005
0.00125219
6.99779e-005
0.00378638
0.065188
0.000935867
0.0106762
1.23841e-006
0.00429752
1.26348e-006
1.26346e-006
1.26344e-006
0.000981218
0.354948
0.00285393
0.00283864
0.00221268
0.0021787
0.00689987
0.26681
0.193945
0.0065364
0.00645031
5.42753e-005
0.00263111
5.40839e-005
4.51614e-006
0.00633342
4.65456e-006
0.102146
0.0785372
0.0987857
0.0486906
0.0682993
4.87485e-007
0.0567359
0.0168735
0.050095
0.018354
3.06251e-007
0.000330341
0.477846
0.000760553
8.18737e-005
0.00479408
0.0474058
0.0053247
0.00527377
8.63794e-005
0.000636328
0.0414009
0.00051712
9.24693e-005
0.42673
0.00383604
0.00120267
1.61248e-006
0.00382518
0.330095
0.150222
6.33479e-005
0.00154276
0.0739423
0.0595928
7.73619e-005
0.371281
0.083115
6.52392e-005
0.0658824
0.00423946
0.0890867
8.19332e-005
0.0006847
7.53096e-007
7.53088e-007
0.0427782
0.00055308
0.0368406
0.0148277
4.72599e-007
0.0582564
0.00679147
9.96971e-005
0.000380562
9.97812e-005
0.00671765
0.0289541
0.475973
0.0462178
0.00530084
0.0686167
0.0327887
9.73331e-005
0.0515055
0.00031151
2.66341e-007
2.6634e-007
0.0441569
0.00814776
0.000259398
0.0195541
0.000110522
2.30931e-007
2.30931e-007
2.3093e-007
0.018883
0.000107695
2.51411e-007
2.5141e-007
0.0429014
0.0201657
0.0194589
0.496195
8.21744e-005
0.0409966
5.22899e-007
0.00570051
0.0136075
8.53853e-005
0.0372898
0.000467256
0.0145405
0.0337777
0.451255
7.00099e-005
0.052781
0.000746095
7.98998e-007
7.60235e-005
0.0762551
0.00564718
0.00559017
0.00553426
0.414946
0.321216
0.103905
0.00184598
0.0018221
0.0784298
0.00367542
0.00365047
0.0637587
1.11336e-006
0.00420191
0.000995856
0.090963
0.000678575
0.0411686
0.0357011
4.1329e-007
8.70425e-005
8.70336e-005
0.438844
1.23631e-006
0.00106607
0.00404097
1.24722e-006
0.342892
0.146433
0.00818034
0.00804849
0.307051
0.00229871
0.110172
2.92765e-006
0.00288222
0.13686
6.14113e-005
0.00368812
1.56969e-006
0.00366303
0.00363824
0.00845458
0.0647591
6.58112e-005
6.58061e-005
0.350301
0.264483
0.133239
0.0985158
0.128614
1.27566e-006
0.0983336
0.0111236
0.0798308
0.00550363
0.0657857
0.448084
0.0097384
1.15332e-006
0.0923929
0.391377
2.09064e-006
0.00157073
0.00765803
2.13606e-006
0.00754176
0.00317685
0.121487
6.52685e-005
0.00942837
0.340445
0.0061162
0.00258561
0.00257299
0.149491
1.93314e-006
0.112158
0.0100143
0.0542413
0.0771631
5.33383e-007
0.00545403
8.25551e-005
0.0130682
0.0383342
0.000477423
0.00580472```
This is some part of the 10000 times that the Cost is printed. What's disturbing me is that I want my cost value to decrease gradually, but quite a few times the cost rises to quite a high value compared to the average cost. Are these random rises in cost a general trend seen in training models? Or is it just that my training model is flawed? If so, could any seasoned ML programmer point out what's wrong?

Also, I had to bring my initial weights and bias into the range 0-1 to get a good working model; before I changed that, the random_double values were set to give a result between 0 and 100. No matter how much I tried (by increasing the number of iterations or playing with the learning rate), the cost would never be precise. It was just displaying a 0 or 1. Why is that? The Sigmoid function gives a value very close to 1 but never exactly 1, and yet my cost displayed 1 in quite a few places. How do I increase the precision of the cost without any rounding off occurring?

Thanks for your time and help!

Here's what an example output looks like for the mystery flower:
Code:
```[Petal Length]: 1
[Petal Width]: 1.5
[Color]: Blue```

2. Originally Posted by Zeus_
> This is some part of the 10000 times that the Cost is printed. What's disturbing me is that I want my cost value to decrease gradually, but quite a few times the cost rises to quite a high value compared to the average cost. Are these random rises in cost a general trend seen in training models? Or is it just that my training model is flawed?
Not an ML expert. But a small dataset like that will probably not yield the best results. You need more samples.

As to the fluctuations, my first thought was that this particular method of using the first derivative is prone to problems during the gradient descent stage. It guarantees a local minimum, but that may not be the same thing as what you actually need, the global minimum. That said, the more samples you throw at it, the less likely that tends to happen.

To really solve the gradient descent issue you have to apply both the first AND second derivative to your cost function.

Originally Posted by Zeus_
> I had to bring my initial weights and bias into a range of 0-1 to get a good working model, but before I changed that the random_double values were set to give a result between 0-100. No matter how much I tried (by increasing number of iterations or playing with the learning rate) the cost would never be precise. It was just displaying a 0 or 1. Why is that? The Sigmoid function pretty much gives a value very close to 1 but never exactly 1 and yet my cost displayed 1 at quite a few places. How do I increase the precision of cost without any rounding off occurring?
The sigmoid function squashes everything into the open range (0, 1). The user can scale the result after the fact anyway; in your case that means multiplying by 100.

The rounding may be due to overtraining. Be careful with the number of iterations. Inserting an occasional random value into the neural net can help prevent overtraining too, at the cost of some temporary instabilities. (Maybe do it for only the first half of the training loop?)

3. > Not an ML expert. But a small dataset like that will probably not yield the best results. You need more samples.

That's true: the more data sets I provide for each type of flower, the better the network will perform. But right now, all my program is trying to do is compute the line that is at a minimum distance from all the graphed points (say, PetalLength on the X-axis and PetalWidth on the Y-axis). You'll get the 4 blue flowers denser in one area and the 4 red ones denser in another. Based on the minimum cost, there will be a certain W1, W2 and B. Drawing a line between the blue and red points such that the average distance of the points from the line is minimal corresponds to the desired minimum cost. Getting those random bumps in the cost means that this average is spiking up, so instead of a smooth, gradual descent to the minimum, the NN behaves like a drunk man stumbling down a hill, repeatedly falling, before reaching the bottom. That led me to question whether this is a general trend seen in NNs dealing with real-life problems, or whether my NN is flawed in some way. Thanks for the response!

> As to the fluctuations my first thought was that that particular method of using the first derivative is prone to problems during the gradient descent stage. It guarantees a local minimum but that may not be the same thing as what you actually need, the global minimum. That said the more samples you throw at it the less likely that tends to happen.

Shouldn't the global minimum and the local minimum of this NN be the same? As far as I can tell, there is only one minimum. But yes, what you said should be true for other, more complex NNs.