| | #1 |
| Registered User Join Date: Jun 2007
Posts: 24
| Support vector machines I ran svm-train on a data set with N features and recorded the cross-validation accuracy. I then ran svm-train again on a subset of that data set with K (K < N) features and recorded the cross-validation accuracy again. The accuracies are the same in both cases. In fact, for any subset derived from the original N-feature set, the accuracy comes out the same, irrespective of the number of features. Shouldn't the accuracy improve as the number of features is reduced? Why the anomaly? |
| megastar is offline | |
| | #2 | |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,381
| Quote:
Are the accuracies EXACTLY the same in both cases? I would suspect a bug in the code... EDIT: Either that, or your validation set is small enough that the accuracies are lining up by random chance. Last edited by brewbuck; 07-09-2007 at 01:57 PM. | |
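To put a rough number on the "random chance" possibility: with n validation samples, accuracy can only take n + 1 distinct values, so small sets make exact ties fairly likely. The sketch below is plain Python and not part of svm-train; the helper names (`possible_accuracies`, `binomial_pmf`, `tie_probability`) are made up for illustration. It assumes the two runs are independent and that every sample is classified correctly with the same probability, which is a simplification. Two runs on overlapping features of the same data are correlated, so real ties are probably more common than this estimate suggests.

```python
from fractions import Fraction
from math import exp, lgamma, log


def possible_accuracies(n_samples):
    """Every accuracy value that can occur on n_samples validation examples."""
    return [Fraction(k, n_samples) for k in range(n_samples + 1)]


def binomial_pmf(n, k, p):
    """P(exactly k of n samples correct), computed in log space (0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def tie_probability(n_samples, p_correct):
    """Chance that two runs get the same number of samples right.

    ASSUMPTION: the runs are independent and each sample is classified
    correctly with probability p_correct in both runs.
    """
    pmf = [binomial_pmf(n_samples, k, p_correct) for k in range(n_samples + 1)]
    return sum(q * q for q in pmf)


if __name__ == "__main__":
    # 0.8 is an arbitrary illustrative accuracy, not a figure from this thread.
    for n in (20, 200, 2000):
        print(n, len(possible_accuracies(n)), round(tie_probability(n, 0.8), 4))
```

The tie probability shrinks as the validation set grows, so identical accuracies across many different subsets of a large set would point more toward a bug than toward coincidence.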
| brewbuck is offline | |
| | #3 |
| Registered User Join Date: Jun 2007
Posts: 24
| Yup! I got that point later. Anyway, noise can be reduced in many ways. As to your second statement: Code: Are the accuracies EXACTLY the same in both cases? I would suspect a bug in the code... EDIT: Either that, or your validation set is small enough that the accuracies are lining up by random chance. Why do you think exactly the same cross-validation accuracies are not possible for different subsets? In fact, I think this tendency will be more prevalent in larger subsets, which contain a lot of noise, than in smaller ones, so that a different "large subset" will eventually give the same cross-validation accuracy. I also think it is necessary to look into what exactly affects the classifier's accuracy. One factor is the number of informative features. Is there anything else? |
| megastar is offline | |
| | #4 | ||
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,381
| Quote:
Quote:
| ||
| brewbuck is offline | |
| | #5 |
| Registered User Join Date: Jun 2007
Posts: 24
| I guess it wasn't very clear, sorry about that. Consider the following scenario: there are, let us say, 15000 features for each sample. The subsets we can generate have 4000-8000 features, e.g. subset1: 4500, subset2: 7900, and so on. Now suppose the number of truly informative features, i.e. those the classifier actually needs to classify a sample, is very small compared to the total number of features, maybe something like 10-15. Then the generated subsets will contain lots of irrelevant features, i.e. noisy data. So even when the subset sizes differ, there should not be any improvement in accuracy, because the reduction in noise is still insignificant relative to the number of informative features. That is why, I guess, my accuracies have been the same. To improve the classifier's accuracy there has to be a significant reduction in noise while the requisite features are retained. With smaller subsets of a few hundred features, the informative ones may or may not be retained, which results in different accuracies. This was one line of reasoning, but I still suspect it may not be the only one. Would you like to suggest anything? |
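One way to sanity-check the retention argument above is to count how many informative features a subset is expected to keep. The sketch below assumes the subsets are drawn uniformly at random, which the post does not actually say; if the subsets are chosen some other way, these numbers do not apply. The function names are hypothetical, and the figures (15000 features, 10 informative, subset sizes 4500 and 7900) are the ones from the scenario described.

```python
from math import comb


def expected_informative_kept(n_total, n_informative, subset_size):
    """Mean number of informative features in a uniformly random subset."""
    return n_informative * subset_size / n_total


def prob_all_informative_kept(n_total, n_informative, subset_size):
    """Probability that a uniformly random subset contains every informative feature.

    ASSUMPTION: the subset is drawn uniformly at random without replacement
    (hypergeometric model); the thread does not state how subsets are chosen.
    """
    if subset_size < n_informative:
        return 0.0
    kept_ways = comb(n_total - n_informative, subset_size - n_informative)
    return kept_ways / comb(n_total, subset_size)


if __name__ == "__main__":
    for size in (4500, 7900):
        print(
            size,
            expected_informative_kept(15000, 10, size),
            prob_all_informative_kept(15000, 10, size),
        )
```

Under that random-selection assumption, even the larger subsets would be expected to drop a sizeable share of the informative features, so it may be worth checking which features each subset actually keeps before attributing the identical accuracies to noise alone.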
| megastar is offline | |