25 September 2023

Today’s class covered a resampling method: cross validation.

Cross validation is a technique for estimating how well a model will perform on unseen data. The data is divided into folds; a model is built on all but one of the folds and then validated on the remaining fold, and this is repeated so that each fold serves once as the validation set. This method is useful when there is not enough data to set aside separate sets for training and testing the model. Resampling at random and validating on different subsets of the data gives a more reliable estimate of the model’s accuracy, which in turn helps in choosing the better model. The simplest version just divides the data at random into two equal groups, training on one (training set) and testing on the other (validation set). However, that estimate depends heavily on which observations land in each group, so it is not as effective as k-fold cross validation, where there are ‘k’ folds, each fold is used once as the validation set, and the k error estimates are averaged.
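To make the procedure concrete for myself, here is a minimal sketch in plain Python. The function names (`k_fold_indices`, `cross_validate`), the toy data, and the "model" (it just predicts the mean of the training values) are my own assumptions for illustration, not something from class; a real analysis would swap in an actual regression model.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # The first n % k folds get one extra index so every index is used once.
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

def cross_validate(y, k, seed=0):
    """Return the validation mean squared error for each of the k folds.

    The 'model' here is a placeholder: it predicts the training mean.
    """
    folds = k_fold_indices(len(y), k, seed)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        prediction = sum(y[j] for j in train_idx) / len(train_idx)
        # Validate on the held-out fold i.
        mse = sum((y[j] - prediction) ** 2 for j in val_idx) / len(val_idx)
        errors.append(mse)
    return errors

# Hypothetical toy responses, only to exercise the functions.
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fold_errors = cross_validate(y, k=3)
cv_estimate = sum(fold_errors) / len(fold_errors)  # average over the k folds
print(fold_errors, cv_estimate)
```

The averaged value `cv_estimate` is the cross-validation estimate of test error; comparing it across candidate models is how the technique helps with model selection.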

I intend to use this technique when modeling the CDC diabetes data. There are 2918 instances in the data, so a 10-fold cross validation would have around 292 instances in each fold (2918 / 10 ≈ 292).
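As a quick check on the fold arithmetic, the sketch below works out the exact fold sizes for 2918 instances and 10 folds. The variable names are my own, and it assumes the extra instances are spread one per fold over the first few folds, which is one common convention.

```python
n_instances = 2918
k = 10

# 2918 does not divide evenly by 10, so some folds get one extra instance.
base, extra = divmod(n_instances, k)
fold_sizes = [base + 1] * extra + [base] * (k - extra)

print(fold_sizes)       # [292, 292, 292, 292, 292, 292, 292, 292, 291, 291]
print(sum(fold_sizes))  # 2918
```

So each round would train on 2626 or 2627 instances and validate on the remaining 292 or 291.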
