BIAS-VARIANCE TRADE-OFF FOR MACHINE LEARNING ALGORITHMS


Supervised machine learning algorithms use past knowledge to predict future events based on new data. In this process, the model acquires knowledge of the past from selected labeled examples. Labeled data is therefore essential in supervised machine learning, and the more varied the data is, the more effective the models are.
Different metrics are used to evaluate a model, and the appropriate choice depends on the task itself. In binary classification tasks, typical metrics are accuracy, precision, recall, and the F1-score. The following loss functions are used to evaluate the performance of a regression model: ME, MAE, MSE, and RMSE.
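As an illustration only (not part of the experiments in this work), the metrics listed above can be computed directly with NumPy; the arrays y_true_cls, y_pred_cls, y_true_reg, and y_pred_reg below are hypothetical toy data.

import numpy as np

# --- Binary classification metrics: accuracy, precision, recall, F1 ---
y_true_cls = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # hypothetical ground-truth labels
y_pred_cls = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # hypothetical predicted labels

tp = np.sum((y_pred_cls == 1) & (y_true_cls == 1))   # true positives
fp = np.sum((y_pred_cls == 1) & (y_true_cls == 0))   # false positives
fn = np.sum((y_pred_cls == 0) & (y_true_cls == 1))   # false negatives

accuracy  = np.mean(y_pred_cls == y_true_cls)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

# --- Regression loss functions: ME, MAE, MSE, RMSE ---
y_true_reg = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical targets
y_pred_reg = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

errors = y_true_reg - y_pred_reg
me   = np.mean(errors)              # mean (signed) error
mae  = np.mean(np.abs(errors))      # mean absolute error
mse  = np.mean(errors ** 2)         # mean squared error
rmse = np.sqrt(mse)                 # root mean squared error

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
print(f"ME={me:.3f} MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}")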
In order to train the model better on the existing data while keeping it flexible, some noise, the so-called bias, is deliberately introduced into the model, and the data are processed under it. A model trained with too much or too little of this flexibility can underfit or overfit the training data, and in either case it cannot make good predictions on the test data. On the one hand, the model must learn the training data well; on the other, it must generalize this knowledge to new data. Achieving both perfectly is a difficult task, so it is important to balance bias and variance and choose a compromise. The bias-variance trade-off is one of the central challenges in supervised machine learning models.
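A minimal sketch of this trade-off, under the assumption that model flexibility is controlled by polynomial degree (a standard textbook illustration, not the method proposed in this work): low-degree fits underfit (high bias), high-degree fits overfit (high variance), and the test error is smallest at an intermediate degree.

import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data: a sinusoidal signal corrupted by Gaussian noise
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(30)
x_test,  y_test  = make_data(200)

for degree in (1, 3, 12):
    # Fit a polynomial of the given degree to the training data
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse  = np.mean((np.polyval(coeffs, x_test)  - y_test)  ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")

In this sketch the degree-1 fit shows high training and test error (underfitting), while the degree-12 fit drives the training error down but lets the test error grow again (overfitting); the intermediate degree gives the best compromise.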
In this work we present one possible approach to adjusting bias and variance so as to obtain an acceptable predictive model.