Evaluating optimizable machine learning models for anemia type prediction from complete blood count data

Authors:
Ladislav Végh,
Ondrej Takac,
Krisztina Czakoova,
Daniel Dancsa,
Melinda Nagy

Original Article

This paper compares optimizable machine learning classification models for predicting eight types of anemia from complete blood count (CBC) data. We used a publicly available Kaggle dataset containing 1281 observations, 14 predictors, and the diagnosis as a categorical target variable with nine categories (eight types of anemia and a healthy category). First, we examined the dataset, inspected the histograms of selected predictors, and compared the predictor values of observations without anemia with those of observations in which any type of anemia was diagnosed. Next, we used MATLAB R2024a to train and test nine optimizable machine learning classification models: Ensemble, Tree, SVM, Efficient Linear, Neural Network, Kernel, KNN, Naïve Bayes, and Discriminant. The hyperparameters of all nine models were tuned with Bayesian optimization. We used 90% of the observations for training and 10% for testing; during training, 10-fold cross-validation was used to guard against overfitting. The best accuracy was reached by the Ensemble classification model using the bag ensemble method (validation accuracy: 99.22%, test accuracy: 100%). Finally, we inspected this best model in more detail by calculating the permutation feature importance, which measures the contribution of each predictor to the final model. The results showed 6–7 important predictors, the most important being the hemoglobin level.
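The permutation feature importance mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB workflow: the toy data, the 12.0 threshold, the `toy_model` stand-in classifier, and the function names are all assumptions made for illustration. The idea is to shuffle one predictor column at a time and record how much the model's accuracy drops; a large drop suggests the model relies on that predictor.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows where predict(row) matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data (hypothetical, NOT the Kaggle dataset): column 0 mimics a
# hemoglobin-like value that determines the label, column 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(8.0, 17.0), data_rng.uniform(0.0, 1.0)] for _ in range(200)]
y = ["anemia" if row[0] < 12.0 else "healthy" for row in X]

def toy_model(row):
    # Stand-in for a trained classifier; it looks only at column 0.
    return "anemia" if row[0] < 12.0 else "healthy"

importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this sketch the noise column receives an importance of zero because the stand-in model never reads it, while shuffling the hemoglobin-like column causes a clear drop in accuracy. With a real trained model, the same procedure would typically be run on held-out data.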

