Predict Violent Crimes on the USA. Modeling, classification, clustering, DBSCAN, neural networks, SVM: all of these methods are used to build a good model of violent crime in the USA.
November 19, 2012
Predict by mixing strategy
My latest prediction analysis gives the following results:
| Method | RMSE | MAE | MSE | ARV |
|---|---|---|---|---|
| Likelihood, Gaussian mixture | 0.1241 | 0.087554 | 0.015401 | 0.42164 |
| Full data set | 0.13475 | 0.098797 | 0.018157 | 0.49709 |
| Full data set cut in 2 classes | 0.13686 | 0.097323 | 0.01873 | 0.51276 |
| Full data set cut in 3 classes | 0.13521 | 0.094228 | 0.018282 | 0.50052 |
| Removed Variables | 0.13406 | 0.097274 | 0.017972 | 0.49202 |
| Removed Communities | 0.12757 | 0.092739 | 0.016275 | 0.44557 |
| Mixed with 2 classes | 0.1241 | 0.087554 | 0.015401 | 0.42164 |
| Linear regression | 0.12437 | 0.087327 | 0.015467 | 0.42344 |
| Full data set | 0.13499 | 0.099144 | 0.018222 | 0.49888 |
| Full data set cut in 2 classes | 0.13763 | 0.099092 | 0.018942 | 0.51857 |
| Full data set cut in 3 classes | 0.13501 | 0.096144 | 0.018227 | 0.49899 |
| Removed Variables | 0.134 | 0.097173 | 0.017957 | 0.49161 |
| Removed Communities | 0.12747 | 0.092553 | 0.016248 | 0.44483 |
| Mixed with 2 classes | 0.12437 | 0.087327 | 0.015467 | 0.42344 |
| PLS regression 1st | 0.12438 | 0.08572 | 0.015472 | 0.42357 |
| Full data set | 0.13347 | 0.09774 | 0.017815 | 0.48772 |
| Full data set cut in 2 classes | 0.13245 | 0.094019 | 0.017542 | 0.48025 |
| Full data set cut in 3 classes | 0.13047 | 0.091678 | 0.017021 | 0.466 |
| Removed Variables | 0.13291 | 0.09554 | 0.017665 | 0.48362 |
| Removed Communities | 0.12764 | 0.091114 | 0.016292 | 0.44602 |
| Mixed with 2 classes | 0.12438 | 0.08572 | 0.015472 | 0.42357 |
| PLS regression advanced | 0.1207 | 0.085773 | 0.01457 | 0.39888 |
| Full data set | 0.12743 | 0.093526 | 0.016238 | 0.44455 |
| Full data set cut in 2 classes | 0.12396 | 0.089755 | 0.015366 | 0.42067 |
| Full data set cut in 3 classes | 0.12021 | 0.087285 | 0.014451 | 0.39562 |
| Removed Variables | 0.12829 | 0.094293 | 0.016458 | 0.45057 |
| Removed Communities | 0.12429 | 0.088444 | 0.015448 | 0.4229 |
| Mixed with 2 classes | 0.1207 | 0.085773 | 0.01457 | 0.39888 |
| SVM Polynomial | 0.12175 | 0.08589 | 0.014822 | 0.40579 |
| Full data set | 0.12985 | 0.092377 | 0.01686 | 0.46268 |
| Full data set cut in 2 classes | 0.12911 | 0.088887 | 0.01667 | 0.45637 |
| Full data set cut in 3 classes | 0.13302 | 0.092129 | 0.017695 | 0.48444 |
| Removed Variables | 0.12925 | 0.089951 | 0.016705 | 0.45733 |
| Removed Communities | 0.12797 | 0.090735 | 0.017175 | 0.47019 |
| Mixed with 2 classes | 0.12175 | 0.08589 | 0.014822 | 0.40579 |
| Neural network | 0.11787 | 0.086258 | 0.013893 | 0.40909 |
| Full data set | 0.11787 | 0.086258 | 0.013893 | 0.40909 |
| Full data set cut in 2 classes | 0.13692 | 0.10066 | 0.018747 | 0.51323 |
| Full data set cut in 3 classes | 0.13393 | 0.094034 | 0.017938 | 0.4911 |
| Removed Variables | 0.13351 | 0.095503 | 0.017824 | 0.48797 |
| Removed Communities | 0.13552 | 0.094944 | 0.018367 | 0.50283 |
| Mixed with 2 classes | 0.13283 | 0.097711 | 0.017645 | 0.48306 |
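For readers who want to reproduce the four error columns, here is a minimal sketch of how they could be computed from observed and predicted values. It assumes that ARV stands for average relative variance, that is, the sum of squared errors divided by the total sum of squares around the mean of the observed target; the post does not define it. The function name `regression_metrics` is made up for this illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MSE and ARV for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and its square root.
    sse = sum(e * e for e in errors)
    mse = sse / n
    rmse = math.sqrt(mse)

    # Mean absolute error.
    mae = sum(abs(e) for e in errors) / n

    # ARV (assumed definition): squared error relative to the spread of
    # the observed values around their mean. 1.0 means "no better than
    # always predicting the mean".
    mean_true = sum(y_true) / n
    total_ss = sum((t - mean_true) ** 2 for t in y_true)
    arv = sse / total_ss

    return {"RMSE": rmse, "MAE": mae, "MSE": mse, "ARV": arv}


if __name__ == "__main__":
    observed = [0.0, 1.0, 2.0, 3.0]
    predicted = [0.0, 1.0, 2.0, 5.0]
    print(regression_metrics(observed, predicted))
```

Under this definition RMSE is always the square root of MSE, and ARV is MSE divided by the variance of the target, so the columns are not independent of each other.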
Labels: Data prediction
Location: United States
Predict by removing outlying communities
To improve the estimate, we can remove observations that are not needed. Outliers can be removed with a statistical method.
In my case I use Cook's distance: communities whose distance exceeds a threshold are removed.
Why use a distance? Because our previous analysis showed that the data are centered.
My conclusion is that the distance makes it possible to pick out the extreme values.
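The filtering step described above can be sketched as follows. This is only an illustration with NumPy, not the code used for the results in this post: the function names `cooks_distance` and `remove_influential` are hypothetical, the model is an ordinary least-squares fit with an intercept, and the default threshold of 4/n is a common rule of thumb rather than the threshold used here, which the post does not state.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of every observation in an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]

    # Least-squares fit and residuals.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Leverages: diagonal of the hat matrix H = Q Q^T (thin QR of A).
    Q, _ = np.linalg.qr(A)
    leverage = np.sum(Q ** 2, axis=1)

    # Residual variance estimate.
    s2 = resid @ resid / (n - p)

    return (resid ** 2 / (p * s2)) * leverage / (1.0 - leverage) ** 2


def remove_influential(X, y, threshold=None):
    """Drop observations whose Cook's distance exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = cooks_distance(X, y)
    if threshold is None:
        threshold = 4.0 / len(y)  # assumption: rule-of-thumb cut-off
    keep = d <= threshold
    return X[keep], y[keep], d


if __name__ == "__main__":
    x = np.arange(10.0)
    y = 2.0 * x + 1.0 + 0.1 * (-1.0) ** np.arange(10)
    y[9] += 50.0  # one artificial extreme community
    X_kept, y_kept, d = remove_influential(x.reshape(-1, 1), y)
    print("largest distance at index", int(np.argmax(d)))
    print("kept", len(y_kept), "of", len(y), "observations")
```

Cook's distance combines the size of the residual with the leverage of the observation, so it flags points that are both far from the bulk of the data and badly fitted. The models are then refitted on the remaining communities.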
| Method | RMSE | MAE | MSE | ARV |
|---|---|---|---|---|
| Linear regression | 0.12747 | 0.092553 | 0.016248 | 0.44483 |
| PLS regression | 0.12764 | 0.091114 | 0.016292 | 0.44602 |
| SVM Polynomial | 0.12797 | 0.090735 | 0.017175 | 0.47019 |
| Neural network | 0.13552 | 0.094944 | 0.018367 | 0.50283 |
Labels: Data prediction
Location: United States