Fusion Model Classification Algorithm for Imbalanced Data
Abstract
With imbalanced data, samples near the decision boundary, which carry more discriminative information, may be ignored during training, yielding a suboptimal model. In practice, an ensemble model can achieve more precise and robust classification results than a single classifier. This paper therefore presents a fusion model classification algorithm that combines classification and regression trees, support vector machines, gradient boosting machines, and random forests. In addition, the bagging ensemble learning method and a KNN voting strategy were adopted to reduce the variance of the proposed method. The method was evaluated on air quality data from Shijiazhuang City, the capital of China’s Hebei Province, using accuracy, p-value, and recall rate as evaluation measures. The experimental results showed that the proposed method performs better on imbalanced data than any single base model.
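As a rough illustration of the fusion idea and of why recall is reported alongside accuracy, the following minimal sketch fuses the label predictions of several base models by simple plurality voting and scores the result. It is not the paper's implementation: the function names (majority_vote, accuracy, recall), the toy labels, and the hard-coded base-model predictions are all hypothetical, and plain plurality voting stands in for the KNN voting strategy, which the abstract does not specify.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-model label lists into one label list by plurality vote.

    `predictions` is a list of equal-length label lists, one per base model.
    Ties resolve to the label seen first among the votes for that sample.
    """
    fused = []
    for votes in zip(*predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class samples that were predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels: 8 majority-class samples (0), 2 minority-class (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical hard-coded outputs standing in for four trained base models.
base_predictions = {
    "cart": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    "svm":  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # always predicts the majority class
    "gbm":  [0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
    "rf":   [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
}

fused = majority_vote(list(base_predictions.values()))

for name, pred in list(base_predictions.items()) + [("fused", fused)]:
    print(name, "accuracy:", accuracy(y_true, pred), "recall:", recall(y_true, pred))
```

In this toy setup, the model that always predicts the majority class still reaches a high accuracy while missing every minority-class sample, which is the kind of failure that recall exposes on imbalanced data. The numbers are illustrative only and say nothing about the reported experiments.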