Cross-validation
We did not build any models in the previous chapter. The reason for that is simple.
Before creating any kind of machine learning model, we must know what cross-validation is and how to choose the best cross-validation scheme for our
dataset.
So, what is cross-validation, and why should we care about it?
We can find multiple definitions as to what cross-validation is. Mine is a one-liner:
cross-validation is a step in the process of building a machine learning model which
helps us ensure that our models fit the data accurately and also ensures that we do
not overfit. But this leads to another term: overfitting.
To explain overfitting, I think it’s best if we look at a dataset. There is a red wine-quality dataset² which is quite famous. This dataset has 11 different attributes that
decide the quality of red wine.
These attributes include:
• fixed acidity
• volatile acidity
• citric acid
• residual sugar
• chlorides
• free sulfur dioxide
• total sulfur dioxide
• density
• pH
• sulphates
• alcohol
Based on these different attributes, we are required to predict the quality of red wine,
which is a value between 0 and 10.
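
To make the setup concrete, here is a minimal sketch of how the 11 attributes and the quality score could be separated into inputs and a target. It uses only the Python standard library. The column names simply reuse the attribute names listed above, the target column name "quality" is an assumption, and both helper functions are hypothetical names introduced for illustration. The delimiter is left as a parameter because it depends on the copy of the file you have.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# The 11 attributes listed above, reused as column names (an assumption
# about how the file is laid out).
FEATURES: List[str] = [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol",
]

# Assumed name of the column holding the quality score.
TARGET = "quality"


def split_features_target(
    rows: Iterable[Dict[str, str]],
) -> Tuple[List[List[float]], List[int]]:
    """Split rows into a feature matrix X and a target list y."""
    X: List[List[float]] = []
    y: List[int] = []
    for row in rows:
        # Convert the 11 attribute values of this sample to floats.
        X.append([float(row[name]) for name in FEATURES])
        # The quality score is expected to lie between 0 and 10.
        quality = int(float(row[TARGET]))
        if not 0 <= quality <= 10:
            raise ValueError(f"quality out of range: {quality}")
        y.append(quality)
    return X, y


def load_wine_csv(
    path: str, delimiter: str = ","
) -> Tuple[List[List[float]], List[int]]:
    """Read a CSV file with a header row and split it into X and y."""
    with open(path, newline="") as f:
        return split_features_target(csv.DictReader(f, delimiter=delimiter))
```

Calling load_wine_csv with the path to your copy of the file would return the features and the quality scores as two separate lists, which is the form most modelling code expects.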