November 19, 2012

Predict by removing unnecessary variables

Some variables do not carry enough information to contribute meaningfully to the result. We can say that these variables disturb the model.

I have chosen two methods to remove unnecessary variables.

-   Low correlation (abs(correlation) < 0.1): list 1: householdsize, racePctAsian, agePct12t21, agePct16t24, agePct65up, pctUrban, pctWRetire, indianPerCap, PctEmplManu, PctEmplProfServ, PctWorkMomYoungKids, PersPerOccupHous, PctVacMore6Mos, MedOwnCostPctInc, MedOwnCostPctIncNoMtg, PctBornSameState, PctSameCity85, PctSameState85
-   Variables opposite to our targeted values in the two-by-two matrix from the Kohonen clustering research: list 2: racePctWhite, pctUrban, pctWWage, pctWInvInc, pctWRetire, PctEmploy, PctFam2Par, PctKids2Par, PctYoungKids2Par, PctTeen2Par, PctWorkMom, PctSpeakEnglOnly, PersPerOccupHous, PersPerOwnOccHous, PctHousOccup, PctHousOwnOcc, MedYrHousBuilt, PctBornSameState, PctSameHouse85, PctSameCity85, PctSameState85
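The first criterion can be sketched in a few lines of Python. This is only a minimal illustration, assuming the correlation is the Pearson correlation between each variable and the target; the helper names `pearson` and `low_correlation_columns` and the toy data are hypothetical and are not taken from the original experiment.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat it as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def low_correlation_columns(columns, target, threshold=0.1):
    """Names of the columns whose |correlation| with the target is below the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) < threshold]

# Toy data, made up for illustration (not values from the dataset).
target = [0.1, 0.4, 0.6, 0.9]
columns = {
    "strong": [1.0, 2.0, 3.0, 4.0],   # rises with the target
    "weak": [1.0, -1.0, -1.0, 1.0],   # no linear relation to the target
}
to_remove = low_correlation_columns(columns, target)
```

With this toy data only the `weak` column falls under the 0.1 threshold, so it is the only candidate for removal.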



Results with the variables of each list removed



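| Model             | List | RMSE    | MAE      | MSE      | ARV     |
|-------------------|------|---------|----------|----------|---------|
| Linear regression | 1    | 0.13825 | 0.10049  | 0.019112 | 0.52324 |
| Linear regression | 2    | 0.134   | 0.097173 | 0.017957 | 0.49161 |
| PLS regression    | 1    | 0.13671 | 0.098719 | 0.018691 | 0.5117  |
| PLS regression    | 2    | 0.13291 | 0.09554  | 0.017665 | 0.48362 |
| SVM Polynomial    | 1    | 0.13048 | 0.091957 | 0.017025 | 0.46611 |
| SVM Polynomial    | 2    | 0.12925 | 0.089951 | 0.016705 | 0.45733 |
| Neural network    | 1    | 0.13788 | 0.09764  | 0.019011 | 0.52048 |
| Neural network    | 2    | 0.13351 | 0.095503 | 0.017824 | 0.48797 |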
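For reference, the four error measures reported above can be computed as in the sketch below. It assumes that ARV (average relative variance) is the MSE divided by the variance of the observed target. That assumption fits the figures: RMSE is the square root of MSE in every row, and MSE/ARV stays at about 0.0365 throughout. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MSE and ARV for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    # Assumed definition: ARV = MSE / variance of the observed values,
    # i.e. the error relative to always predicting the mean.
    variance = sum((t - mean_true) ** 2 for t in y_true) / n
    arv = mse / variance
    return {"RMSE": rmse, "MAE": mae, "MSE": mse, "ARV": arv}

# Made-up values for illustration only.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.5, 1.0, 2.0, 2.5]
metrics = regression_metrics(y_true, y_pred)
```

Under this definition an ARV of 1 means the model does no better than predicting the mean, so lower values are better for all four columns.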

