Evaluating optimizable machine learning models for anemia type prediction from complete blood count data

Authors:

  • Ladislav Vegh
  • Ondrej Takac
  • Krisztina Czakoova
  • Daniel Dancsa
  • Melinda Nagy

Original Article

This paper compares optimizable machine learning classification models for predicting eight types of anemia from complete blood count (CBC) data. We used a publicly available Kaggle dataset containing 1281 observations, 14 predictors, and the diagnosis as a categorical target variable with nine categories (eight types of anemia and a healthy category). First, we examined the dataset, inspected the histograms of selected predictors, and compared the predictor values of observations without anemia to those of observations in which any anemia was diagnosed. Next, we used MATLAB R2024a to train and test nine optimizable machine learning classification models: Ensemble, Tree, SVM, Efficient Linear, Neural Network, Kernel, KNN, Naïve Bayes, and Discriminant. The hyperparameters of all models were tuned with Bayesian optimization. We used 90% of the observations for training and 10% for testing; during training, 10-fold cross-validation was used to guard against overfitting. The best accuracy was reached by the Ensemble classification model using the bag ensemble method (validation accuracy: 99.22%, test accuracy: 100%). Finally, we inspected this best model in more detail by calculating the permutation feature importance to determine the contribution of each predictor to the final model. The results showed 6–7 important predictors, the most important being the amount of hemoglobin.
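The hold-out split and the permutation feature importance mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB workflow: the data are synthetic, the "model" is a fixed threshold rule standing in for a trained classifier, and the names `holdout_split`, `permutation_importance`, `make_toy_data`, and `toy_predict`, as well as the threshold value, are hypothetical and chosen only for illustration.

```python
import random


def holdout_split(n_rows, test_fraction=0.10, seed=0):
    """Shuffle row indices and return (train_idx, test_idx)."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_test = max(1, round(n_rows * test_fraction))
    return idx[n_test:], idx[:n_test]


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return (baseline accuracy, mean accuracy drop per shuffled column)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j; every other column stays untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def make_toy_data(n_rows=200, seed=1):
    """Synthetic data (assumption): column 0 decides the label, column 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        signal = rng.uniform(8.0, 16.0)  # informative column
        noise = rng.uniform(0.0, 1.0)    # uninformative column
        X.append([signal, noise])
        y.append("low" if signal < 12.0 else "normal")  # arbitrary threshold
    return X, y


def toy_predict(row):
    """Stand-in for a trained classifier; it uses only column 0."""
    return "low" if row[0] < 12.0 else "normal"


if __name__ == "__main__":
    X, y = make_toy_data()
    train_idx, test_idx = holdout_split(len(X), test_fraction=0.10)
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    baseline, importances = permutation_importance(toy_predict, X_test, y_test)
    print("test accuracy:", baseline)
    print("importance per column:", importances)
```

Shuffling a column that the model relies on breaks the link between that column and the label, so accuracy drops; shuffling a column the model ignores leaves accuracy unchanged. In this sketch the importance is computed on the held-out rows, which is one common choice; the abstract does not state which data the authors used for this step.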

More Articles by Ladislav Végh

Models of data structures in educational visualizations for supporting teaching and learning algorithms and computer programming

Teaching and learning computer programming is challenging for many undergraduate first-year computer science students. During introductory programming courses, novice programmers n...

Using interactive web-based animations to help students to find the optimal algorithms of river crossing puzzles

Acquiring algorithmic thinking is a long process with several steps. The most basic level of algorithmic thinking is when students recognize the algorithms and various problems ...

Simulations of solving a single-player memory card game with several implementations of a human-like thinking computer algorithm

The memory card game is a game that probably everyone played in childhood. The game consists of n pairs of playing cards, where the two cards of each pair are identical. At the beginning...

Comparing machine learning classification models on a loan approval prediction dataset

In the last decade, we have observed the usage of artificial intelligence algorithms and machine learning models in industry, education, healthcare, entertainment, and several othe...