This is a brief lecture about some of the training control options that you have when training models with the caret package. For this lecture, we're going to be using the spam example again, just to illustrate how these ideas work.

So we load the caret package, the kernlab package, and then we attach the spam dataset. Then we use the createDataPartition function to create a set of indices corresponding to the training set, and we set about 75% of the data to be in the training set. Then we define training and testing sets using those indices.

So usually what you would do when you fit a model is basically just use the train function, where you set all of the defaults to be whatever the train function chooses for you, other than maybe the method that you're going to use to fit and which dataset you're going to be using. You can go a little bit further than this, though, because you can use a large set of options for training. One, you can use this preProcess parameter to set a bunch of preprocessing options; we'll talk about that in a future lecture. You can also set weights — in other words, you can upweight or downweight certain observations. These are particularly useful if you have a very unbalanced training set, where you have a lot more examples of one type than another. You can set the metric: by default for factor variables, in other words for categorical variables, the default metric is accuracy, which it's trying to maximize. For continuous variables it's the root mean squared error, like we talked about in a previous lecture. You can also set a large number of other control parameters using this trControl parameter, to which you have to pass a call to this particular function, trainControl, which we'll talk about in a couple of slides.

The metric options are built into the train function. For continuous outcomes, RSquared is a measure of linear agreement between the variables that you're predicting and the variables that you predict with — this is the RSquared that you get from a regression model, if you remember that from the inference class. Linear agreement is very useful if you're using linear regression and things like that, but it may not be so useful if you're doing more non-linear things like random forests and so forth. For categorical outcomes, accuracy is just the number that you get correct, and that's the default measure for categorical outcomes. You can also tell it to use Kappa, which is a measure of concordance. It's a more in-depth, more complicated measure that's frequently used in machine learning competitions; I've linked here to a definition of that measure.

The trainControl function allows you to be much more precise about the way that you train models. You can tell it which method to use for resampling the data — whether it's bootstrapping or cross-validation — which we'll talk about in a minute. You can tell it the number of times to do bootstrapping or cross-validation, and you could also tell it how many times to repeat that whole process, if you want to be careful about repeated cross-validation. You can tell it the size of the training set with this p parameter, and then you can tell it a bunch of other parameters that depend on the specific problems you're working on. So for example, for time course data, initialWindow tells you the size of the training dataset — the number of time points that will be in the training data — and horizon is the number of time points that you'll be predicting. You can also have it return the actual predictions themselves from each of the iterations when it's building the model, and you can have it use a different summary function than the default if you'd like. And then you can set preprocessing options as well as things like prediction bounds, and you can set the seeds for all the different resampling layers. That's particularly useful if you're going to be parallelizing your computations across multiple cores.
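The data-splitting and default model-fitting steps this lecture describes can be sketched as follows. The choice of "glm" as the method is an illustrative assumption — the lecture only says that you supply a method and a dataset:

```r
# Load caret, kernlab, and attach the spam dataset
library(caret)
library(kernlab)
data(spam)

# createDataPartition returns indices for a ~75% training split
inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE)

# Define training and testing sets using those indices
training <- spam[inTrain, ]
testing  <- spam[-inTrain, ]

# The usual default call to train: just the data and the method,
# with everything else left at train's defaults ("glm" is illustrative)
modelFit <- train(type ~ ., data = training, method = "glm")
```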
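A hedged sketch of the extra train arguments the lecture mentions — preProcess, the metric, and trControl. The specific values chosen here (centering/scaling, Kappa, 10-fold cross-validation) are illustrative assumptions, not prescriptions from the lecture:

```r
library(caret)

# A trainControl call is passed to train via the trControl argument
ctrl <- trainControl(method = "cv", number = 10)

# train also accepts a weights vector to upweight or downweight
# observations, e.g. for unbalanced training sets
modelFit <- train(type ~ ., data = training,
                  method     = "glm",
                  preProcess = c("center", "scale"),  # preprocessing options
                  metric     = "Kappa",               # instead of the default Accuracy
                  trControl  = ctrl)
```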
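The trainControl parameters discussed in this lecture can be seen together in one call. The values below are illustrative assumptions; the parameter names are caret's own:

```r
library(caret)

ctrl <- trainControl(
  method          = "repeatedcv",    # resampling method: "boot", "cv", "repeatedcv", ...
  number          = 10,              # number of folds or resampling iterations
  repeats         = 3,               # how many times to repeat the whole process
  savePredictions = TRUE,            # return the predictions from each iteration
  summaryFunction = defaultSummary,  # can be swapped for a custom summary function
  predictionBounds = rep(FALSE, 2),  # or c(lower, upper) to bound predictions
  seeds           = NA,              # seeds for the different resampling layers;
                                     # setting these matters when parallelizing
  allowParallel   = TRUE             # allow computation across multiple cores
)

# p sets the training-set size for leave-group-out resampling
lgoCtrl <- trainControl(method = "LGOCV", p = 0.75)

# For time course data, initialWindow is the number of time points in the
# training data and horizon is the number of time points to predict
tsCtrl <- trainControl(method = "timeslice", initialWindow = 20, horizon = 5)
```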