Analysis
Evaluation Criteria:
- Accuracy: Ratio of correctly predicted observations to the total observations. ((TP + TN) / (TP + FP + TN + FN))
- Precision: Ratio of correctly predicted positive observations to the total predicted positive observations. (TP / (TP + FP))
- Recall (Sensitivity): Ratio of correctly predicted positive observations to all observations in the actual positive class. (TP / (TP + FN))
- F1 Score: Harmonic mean of Precision and Recall. (2 * (Recall * Precision) / (Recall + Precision))
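The four criteria above can be computed directly from the confusion-matrix counts. The snippet below is a minimal sketch of that arithmetic; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration, not taken from our experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the four evaluation criteria from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (recall * precision) / (recall + precision)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, used only to illustrate the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that this plain version divides by zero when a denominator is empty, for example when no positive predictions are made at all.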
Support Vector Machines:
Specs: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
We had issues with SVM because of mispredicted labels: the model predicted only one class. As a result, some values are under-represented when we generate our evaluation criteria, which gives us a lower performance than expected.
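To see why a classifier that predicts a single class distorts the criteria, consider the sketch below. It uses made-up labels (not our data) and hypothetical helper names, and it assumes the convention of reporting 0.0 when a ratio has a zero denominator, which is also what scikit-learn does by default (with a warning).

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics_from_labels(y_true, y_pred, positive=1):
    """Compute the four criteria from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    return {
        "accuracy": safe_ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_ratio(2 * precision * recall, precision + recall),
    }


# Made-up labels: 3 positives and 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# A model that only ever predicts the negative class.
only_negative = metrics_from_labels(y_true, [0] * len(y_true))
# A model that only ever predicts the positive class.
only_positive = metrics_from_labels(y_true, [1] * len(y_true))

print("only negative:", only_negative)
print("only positive:", only_positive)
```

When only the negative class is predicted, there are no true or false positives, so precision, recall and F1 all collapse to zero even though accuracy still looks reasonable on an imbalanced set.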
Random Forest:
Specs: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
Extra Trees:
Specs: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html#sklearn.ensemble.ExtraTreesClassifier
Ada Boost:
Specs: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html#sklearn.ensemble.AdaBoostClassifier
Gradient Boosting:
Specs: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier
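As a rough illustration of how the scikit-learn estimators linked above could be scored on the four criteria, here is a minimal sketch. It is not our experiment: the synthetic dataset from `make_classification`, the 3-fold cross-validation, the default hyperparameters and the fixed `random_state` are all assumptions made for the example, and it evaluates the plain estimators only, without any PU adaptation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Synthetic binary dataset, a stand-in for the real data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The estimator classes from the spec links, with default hyperparameters.
estimators = {
    "SVC": SVC(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "ExtraTreesClassifier": ExtraTreesClassifier(random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=0),
}

scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, estimator in estimators.items():
    # cross_validate returns one array of per-fold scores per metric.
    scores = cross_validate(estimator, X, y, cv=3, scoring=scoring)
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, row in results.items():
    print(name, {metric: round(value, 3) for metric, value in row.items()})
```

Averaging over folds gives one number per criterion and estimator, which makes the models easier to compare side by side.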
Conclusion:
From the charts above, we can see that a PU-adapted estimator improves performance only under some conditions. We think this is because some estimators do not handle our specific type of dataset as well as they handle others. For example, SVMs and Decision Trees are generally susceptible to overfitting (the model learns the noise in the training data so well that its performance on new data suffers), and those flaws were amplified when we adapted these estimators into PU estimators, which resulted in worse performance.