November 19, 2012

French study

Predict by mixing strategy

My latest prediction results are:


| Method | Data set | RMSE | MAE | MSE | ARV |
|---|---|---|---|---|---|
| Likelihood, Gaussian mixture | | 0.1241 | 0.087554 | 0.015401 | 0.42164 |
| | Full data set | 0.13475 | 0.098797 | 0.018157 | 0.49709 |
| | Full data set cut in 2 classes | 0.13686 | 0.097323 | 0.01873 | 0.51276 |
| | Full data set cut in 3 classes | 0.13521 | 0.094228 | 0.018282 | 0.50052 |
| | Removed variables | 0.13406 | 0.097274 | 0.017972 | 0.49202 |
| | Removed communities | 0.12757 | 0.092739 | 0.016275 | 0.44557 |
| | Mixed with 2 classes | 0.1241 | 0.087554 | 0.015401 | 0.42164 |
| Linear regression | | 0.12437 | 0.087327 | 0.015467 | 0.42344 |
| | Full data set | 0.13499 | 0.099144 | 0.018222 | 0.49888 |
| | Full data set cut in 2 classes | 0.13763 | 0.099092 | 0.018942 | 0.51857 |
| | Full data set cut in 3 classes | 0.13501 | 0.096144 | 0.018227 | 0.49899 |
| | Removed variables | 0.134 | 0.097173 | 0.017957 | 0.49161 |
| | Removed communities | 0.12747 | 0.092553 | 0.016248 | 0.44483 |
| | Mixed with 2 classes | 0.12437 | 0.087327 | 0.015467 | 0.42344 |
| PLS regression 1st | | 0.12438 | 0.08572 | 0.015472 | 0.42357 |
| | Full data set | 0.13347 | 0.09774 | 0.017815 | 0.48772 |
| | Full data set cut in 2 classes | 0.13245 | 0.094019 | 0.017542 | 0.48025 |
| | Full data set cut in 3 classes | 0.13047 | 0.091678 | 0.017021 | 0.466 |
| | Removed variables | 0.13291 | 0.09554 | 0.017665 | 0.48362 |
| | Removed communities | 0.12764 | 0.091114 | 0.016292 | 0.44602 |
| | Mixed with 2 classes | 0.12438 | 0.08572 | 0.015472 | 0.42357 |
| PLS regression advanced | | 0.1207 | 0.085773 | 0.01457 | 0.39888 |
| | Full data set | 0.12743 | 0.093526 | 0.016238 | 0.44455 |
| | Full data set cut in 2 classes | 0.12396 | 0.089755 | 0.015366 | 0.42067 |
| | Full data set cut in 3 classes | 0.12021 | 0.087285 | 0.014451 | 0.39562 |
| | Removed variables | 0.12829 | 0.094293 | 0.016458 | 0.45057 |
| | Removed communities | 0.12429 | 0.088444 | 0.015448 | 0.4229 |
| | Mixed with 2 classes | 0.1207 | 0.085773 | 0.01457 | 0.39888 |
| SVM polynomial | | 0.12175 | 0.08589 | 0.014822 | 0.40579 |
| | Full data set | 0.12985 | 0.092377 | 0.01686 | 0.46268 |
| | Full data set cut in 2 classes | 0.12911 | 0.088887 | 0.01667 | 0.45637 |
| | Full data set cut in 3 classes | 0.13302 | 0.092129 | 0.017695 | 0.48444 |
| | Removed variables | 0.12925 | 0.089951 | 0.016705 | 0.45733 |
| | Removed communities | 0.12797 | 0.090735 | 0.017175 | 0.47019 |
| | Mixed with 2 classes | 0.12175 | 0.08589 | 0.014822 | 0.40579 |
| Neural network | | 0.11787 | 0.086258 | 0.013893 | 0.40909 |
| | Full data set | 0.11787 | 0.086258 | 0.013893 | 0.40909 |
| | Full data set cut in 2 classes | 0.13692 | 0.10066 | 0.018747 | 0.51323 |
| | Full data set cut in 3 classes | 0.13393 | 0.094034 | 0.017938 | 0.4911 |
| | Removed variables | 0.13351 | 0.095503 | 0.017824 | 0.48797 |
| | Removed communities | 0.13552 | 0.094944 | 0.018367 | 0.50283 |
| | Mixed with 2 classes | 0.13283 | 0.097711 | 0.017645 | 0.48306 |
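
For reference, the four error columns can be computed as in the sketch below. This is not the code used to produce the results above. The function name `regression_metrics` is hypothetical, and the reading of ARV as "average relative variance" (MSE divided by the variance of the observed target) is an assumption. That assumption is at least consistent with the results: in most rows the ratio MSE / ARV is close to 0.0365.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MSE and ARV for one set of predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(r * r for r in residuals) / n      # mean squared error
    mae = sum(abs(r) for r in residuals) / n     # mean absolute error
    rmse = math.sqrt(mse)                        # root mean squared error

    # Assumption: ARV = MSE / population variance of the observed target.
    mean_y = sum(y_true) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n
    arv = mse / var_y

    return {"RMSE": rmse, "MAE": mae, "MSE": mse, "ARV": arv}
```

Under this definition, an ARV of 1 corresponds to always predicting the mean of the target, so lower values are better.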

Predict by removing extra communities

To improve the estimation, we can remove some unnecessary observations. Outliers can be detected with a statistical method.
In my case I use Cook's distance: communities whose distance exceeds a threshold are removed.
Why use a distance? In our previous analysis we found that our data are centered.
My conclusion is that a distance makes it possible to isolate the extreme values.
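
The filtering step described above could be sketched as follows. This is a minimal illustration and not the code used for this study. It assumes an ordinary least-squares fit with an intercept, the function names `cooks_distance` and `remove_influential` are hypothetical, and the default threshold of 4 / n is a common rule of thumb chosen here only because the threshold actually used is not stated.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of each observation for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    n, p = A.shape

    H = A @ np.linalg.pinv(A)                  # hat (projection) matrix
    leverage = np.diag(H)
    residuals = y - H @ y
    s2 = residuals @ residuals / (n - p)       # residual variance estimate

    return residuals ** 2 / (p * s2) * leverage / (1.0 - leverage) ** 2


def remove_influential(X, y, threshold=None):
    """Drop the observations whose Cook's distance exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = cooks_distance(X, y)
    if threshold is None:
        threshold = 4.0 / len(y)               # assumption: rule of thumb
    keep = d <= threshold
    return X[keep], y[keep], keep
```

The models would then be refitted on the retained communities only, and the error measures recomputed.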

| Method | RMSE | MAE | MSE | ARV |
|---|---|---|---|---|
| Linear regression | 0.12747 | 0.092553 | 0.016248 | 0.44483 |
| PLS regression | 0.12764 | 0.091114 | 0.016292 | 0.44602 |
| SVM polynomial | 0.12797 | 0.090735 | 0.017175 | 0.47019 |
| Neural network | 0.13552 | 0.094944 | 0.018367 | 0.50283 |