The pSVM method, with a Gaussian RBF kernel, was used to predict future MVCs from the 84 independent variables shown in Table 1, including the better and worse VAs, the 52 TD values, the six sectorial average TD values, driving attitude, past history of MVCs, HT status, DM status, use of anti-HT drugs, smoking habits, alcohol intake, years with a driving license, distance driven per week, BMI, and use of sleep aids/sedatives. This pSVM model is denoted pred_penSVM_all. We also fitted a standard SVM model (without variable selection), denoted pred_SVM.
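As a purely illustrative sketch (not the authors' code), the standard-SVM counterpart pred_SVM could be fitted as follows with scikit-learn; the feature matrix, outcome vector, and preprocessing are placeholders and assumptions, and the penalized SVM with built-in variable selection used for pred_penSVM_all is not shown because it is not part of scikit-learn.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(185, 84))        # placeholder for the 84 predictors in Table 1
y = np.zeros(185, dtype=int)
y[:28] = 1                            # placeholder outcome: 28 MVC+, 157 MVC- patients
rng.shuffle(y)

# Standardize the predictors, then fit an SVM with a Gaussian RBF kernel
# (the analogue of pred_SVM; no variable selection is performed here).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```

Standardizing predictors before an RBF kernel is a common default; the preprocessing actually used in the study is not specified here.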
As the dataset is unbalanced with respect to the measured outcome (only 28 patients reported experiencing an MVC [the “MVC+” group], while 157 patients did not [the “MVC−” group]), we applied the synthetic minority over-sampling technique (SMOTE) algorithm [25] to generate an “artificial” dataset with a balanced outcome. In addition, class weights were applied: 157/185 for the MVC+ group and 28/185 for the MVC− group. Future MVCs were predicted using the pSVM model and leave-one-out cross validation, in which the data from a single patient are used as validation data and the data from all remaining patients are used as training data (N = 184/185). This procedure is repeated until each patient in the original sample has served once as validation data (i.e., 185 times); in other words, for each individual, only the data from all other subjects enter the prediction model. Finally, the relationship between the pSVM-predicted outcome and the self-reported actual incidence of MVCs was analyzed using logistic regression.
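The resampling and validation scheme just described can be sketched as below. This is not the authors' code: the plain RBF-kernel SVC stands in for the penalized SVM, the data are placeholders, and imbalanced-learn and scikit-learn are assumed. It shows SMOTE applied to each training fold, the class weights quoted above, and a leave-one-out loop that yields one out-of-sample prediction per patient.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(185, 84))                # placeholder predictors
y = np.zeros(185, dtype=int)
y[:28] = 1                                    # 28 MVC+ and 157 MVC- patients, as in the text
rng.shuffle(y)

class_weight = {1: 157 / 185, 0: 28 / 185}    # class weights quoted above
preds = np.empty_like(y)

for train_idx, test_idx in LeaveOneOut().split(X):
    # SMOTE is applied to the training fold only, so the held-out patient
    # never contributes to the synthetic minority samples.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X[train_idx], y[train_idx])
    clf = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", class_weight=class_weight))
    clf.fit(X_res, y_res)
    preds[test_idx] = clf.predict(X[test_idx])
```

The vector preds then holds one out-of-sample prediction per patient, which can be related to the observed outcomes by logistic regression as described above.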
For comparison, we also generated a pSVM using only a subset of 63 variables (see Table 1), namely the 52 TD values, the six sectorial average TD values, mTD, the VAs, age, and sex. This model is denoted pred_penSVM_basic.
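For the final association step mentioned above (relating the out-of-sample predictions to the self-reported incidence of MVCs), a minimal logistic-regression sketch with statsmodels is shown below; the 18/10/40/117 cross-tabulation is invented purely so the example runs and only preserves the 28/157 marginals given in the text.

```python
import numpy as np
import statsmodels.api as sm

# Invented 2x2 split (18/10/40/117), chosen only so the example is runnable;
# it preserves the study's marginals of 28 MVC+ and 157 MVC- patients.
y_obs  = np.array([1] * 18 + [1] * 10 + [0] * 40 + [0] * 117)   # observed MVC status
y_pred = np.array([1] * 18 + [0] * 10 + [1] * 40 + [0] * 117)   # LOOCV predictions

res = sm.Logit(y_obs, sm.add_constant(y_pred)).fit(disp=0)
print(res.summary())   # exp(coef) of y_pred gives the odds ratio for a predicted MVC
```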