OVERCOMING THE ORDINAL IMBALANCED DATA PROBLEM BY COMBINING DATA PROCESSING AND STACKED GENERALIZATIONS

Blog Article

Ordinal imbalanced datasets are pervasive in real-world applications but remain challenging to analyse, as they require specific methods that account for both the ordering information and the imbalanced classes. Failure to account for both characteristics can substantially impair a model's predictive performance. However, existing methods tend to focus on either ordinality or imbalance rather than addressing both simultaneously. The few approaches that do account for both are not always easy for non-advanced analysts to implement, and simpler approaches are needed to facilitate appropriate data processing.

Here, we developed a general approach using some of the most popular machine learning algorithms to ensure appropriate processing of ordinal imbalanced datasets and to optimize the predictions of all classes. After transforming the multi-class ordinal problem into a well-known binary problem, we implemented several different resampling methods in a decision-tree classifier. We then used a stacked generalization algorithm to combine the classifiers and improve model predictive performance. To test our approach, we used two ordinal imbalanced datasets, one on student performance and one on wine quality.
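To make the first two steps concrete, here is a minimal sketch in plain Python. It rests on assumptions the article does not confirm: the binary transformation is taken to be the common threshold decomposition (one "is the label above class k?" target per cut point), and random oversampling stands in for the unspecified resampling methods. The function names `ordinal_to_binary` and `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def ordinal_to_binary(labels, ordered_classes):
    """Build K-1 binary targets of the form 'is the label above class k?'.

    Assumption: threshold decomposition; the article does not say which
    ordinal-to-binary transformation was used.
    """
    rank = {c: i for i, c in enumerate(ordered_classes)}
    return {
        ordered_classes[k]: [1 if rank[y] > k else 0 for y in labels]
        for k in range(len(ordered_classes) - 1)
    }


def random_oversample(rows, targets, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Assumption: random oversampling is only one possible resampling method.
    """
    rng = random.Random(seed)
    counts = Counter(targets)
    if len(counts) < 2:
        # Only one class present, nothing to balance.
        return list(rows), list(targets)
    minority = min(counts, key=counts.get)
    n_missing = max(counts.values()) - counts[minority]
    minority_idx = [i for i, t in enumerate(targets) if t == minority]
    extra = [rng.choice(minority_idx) for _ in range(n_missing)]
    keep = list(range(len(rows))) + extra
    return [rows[i] for i in keep], [targets[i] for i in keep]


# Hypothetical toy data: one feature per row, three ordered grades.
features = [[0.1], [0.4], [0.5], [0.45], [0.55], [0.9]]
grades = ["low", "medium", "medium", "medium", "medium", "high"]

binary_targets = ordinal_to_binary(grades, ["low", "medium", "high"])
X_res, y_res = random_oversample(features, binary_targets["medium"])
```

Each balanced binary problem could then be given to a decision-tree classifier, and the resulting classifiers combined by a stacking meta-learner. Those steps are left out here because they depend on the library used.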

Individual resampling techniques tended to improve the accuracy of minority classes while simultaneously increasing the number of false positives in those classes. This resulted in a decrease, sometimes substantial, in the accuracy of other classes. The stacking model offered a good compromise between improving the accuracy of minority classes and mitigating the reduced accuracy in other classes. Our approach provided useful insights into which modelling strategies should be favoured for production implementations involving these common datasets, depending on the end-user's interests.
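The trade-off described here can be measured per class. The sketch below assumes that "accuracy of a class" means the fraction of that class's samples predicted correctly, and that a false positive for a class is a sample of another class predicted as it. The article does not state these definitions. The labels and predictions are invented toy values chosen to show the pattern, not results from the study, and `per_class_report` is a hypothetical helper.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return per-class accuracy (share of the class predicted correctly)
    and the number of false positives attributed to each class."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    false_pos = Counter(p for t, p in zip(y_true, y_pred) if t != p)
    return {
        c: {
            "accuracy": hits[c] / support[c] if support[c] else None,
            "false_positives": false_pos[c],
        }
        for c in classes
    }


# Invented toy labels: class 5 is the majority, class 8 the minority.
y_true = [5, 5, 5, 5, 5, 5, 5, 5, 8, 8]
baseline_pred = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]   # ignores the minority class
resampled_pred = [5, 5, 5, 5, 5, 8, 8, 8, 8, 8]  # over-predicts the minority class

baseline = per_class_report(y_true, baseline_pred)
resampled = per_class_report(y_true, resampled_pred)
```

In this toy case the minority class goes from 0.0 to 1.0 accuracy, but picks up three false positives, and the majority class drops from 1.0 to 0.625.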
