Figure 2 shows the CART model. The predictions of the CART model (MAE = 0.60 and RMSE = 0.75) were more accurate than those of the linear regression model (MAE = 0.76 and RMSE = 1.01). Correlation with the test data was R = 0.73 for the CART model and only R = 0.47 for the linear regression model.
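For readers who wish to reproduce these accuracy measures, the following is a minimal sketch of how MAE, RMSE, and the Pearson correlation coefficient R can be computed from measured and predicted values. The function names and the example numbers are hypothetical illustrations and are not taken from the study data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    # Hypothetical axial lengths in mm, for illustration only.
    measured = [21.0, 22.4, 20.8, 23.1, 21.9]
    predicted = [21.5, 22.1, 20.7, 22.9, 21.5]
    print("MAE: ", round(mae(measured, predicted), 3))
    print("RMSE:", round(rmse(measured, predicted), 3))
    print("R:   ", round(pearson_r(measured, predicted), 3))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does, which is why the two measures are usually reported together.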
In a CART model, each node in the tree represents a decision based on a feature (or predictor variable) and a threshold value, reflecting how the data should be partitioned based on the selected features. Here, all three input variables (SEQ, age, and sex) were selected and formed critical forks in the model. The first split was at SEQ = +3.94 D. For children with SEQ ≥ +3.94 D, the next split was at SEQ = +7.06 D and, if SEQ was ≥ +7.06 D, the model predicted AL = 20.3 mm. If SEQ was < +7.06 D, the model split on age, predicting AL = 21.5 mm for children ≥6.83 years; for children <6.83 years, the model used sex as the final split, predicting AL = 20.7 mm for female children and AL = 21.2 mm for male children. On the other limb of the tree, for children with SEQ < +3.94 D, the next split was sex. Girls <6 years of age were predicted to have AL = 21.5 mm and girls ≥6 years AL = 22.1 mm. For boys with SEQ < +3.31 D, the prediction was AL = 21.8 mm, and for boys with SEQ ≥ +3.31 D, AL = 22.9 mm.
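The decision path described above can be written as a short chain of conditionals. The sketch below simply transcribes the splits and leaf values as reported in the text; it is not the fitted model itself. The function name, the argument names, and the "F"/"M" coding of sex are hypothetical, and the two leaves reported without units are assumed to be in mm.

```python
def predict_al_mm(seq_d, age_years, sex):
    """Return the predicted axial length (mm) by following the reported splits.

    seq_d: spherical equivalent refraction in diopters.
    age_years: age in years.
    sex: "F" or "M" (hypothetical coding chosen for this sketch).
    """
    if sex not in ("F", "M"):
        raise ValueError("sex must be 'F' or 'M'")

    if seq_d >= 3.94:
        # High-hyperopia limb of the tree.
        if seq_d >= 7.06:
            return 20.3
        if age_years >= 6.83:
            return 21.5
        # Younger than 6.83 years: sex is the final split.
        return 20.7 if sex == "F" else 21.2

    # SEQ < +3.94 D: the next split is sex.
    if sex == "F":
        return 21.5 if age_years < 6 else 22.1

    # Boys: split at SEQ = +3.31 D, with leaf values as stated in the text.
    return 21.8 if seq_d < 3.31 else 22.9


if __name__ == "__main__":
    # Hypothetical children, for illustration only.
    print(predict_al_mm(seq_d=8.0, age_years=5.0, sex="F"))
    print(predict_al_mm(seq_d=5.0, age_years=5.0, sex="M"))
    print(predict_al_mm(seq_d=2.0, age_years=7.0, sex="F"))
```

Each call returns one of the eight leaf values, so the model output is piecewise constant rather than a smooth function of SEQ or age.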
To compare with the CART model, we also fitted a linear regression model.
Figures 3A and 3B show Bland-Altman plots of prediction versus data from the CART and the linear regression models.
Table 3 summarizes statistics derived from the Bland-Altman analysis. Compared with the CART model, the linear regression model had less bias but wider limits of agreement (LOA); although its mean bias was low, the linear regression model overestimated AL when AL was shorter and underestimated it when AL was longer.
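As an illustration of the quantities summarized above, the following minimal sketch computes the mean bias and the 95% limits of agreement in the conventional Bland-Altman form (bias ± 1.96 × SD of the differences). The function name, the sign convention (prediction minus measurement), and the example values are assumptions made for this sketch and are not taken from the study.

```python
import statistics


def bland_altman(measured, predicted):
    """Return (bias, lower LOA, upper LOA) for paired values.

    Differences are taken as predicted minus measured (an assumed convention),
    so a positive bias means the model overestimates on average.
    """
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical axial lengths in mm, for illustration only.
    measured = [20.5, 21.2, 22.0, 22.8, 23.5]
    predicted = [21.1, 21.5, 22.0, 22.4, 22.9]
    bias, lower, upper = bland_altman(measured, predicted)
    print(f"bias = {bias:.2f} mm, LOA = [{lower:.2f}, {upper:.2f}] mm")
```

In the hypothetical data the differences move from positive at short AL to negative at long AL, so the mean bias is near zero even though individual errors are not. This is the same pattern of proportional error that a low mean bias alone would hide.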