
2018/01/10_Irantzu Barrio, Universidad del País Vasco (UPV/EHU)

January 10th 2018

Facultad de Ciencias Económicas y Empresariales | Aula Seminario 7

Selection of optimal cut points to categorise continuous predictors: beyond the univariate logistic regression model

2018/01/10 - 11:00 h | Irantzu Barrio, UPV/EHU


In the medical field, prediction models are gaining importance as a support for decision-making, since better knowledge of potential predictors helps the decision-making process. An important consideration in the development of prediction models is the selection of the predictors (clinical variables) to be used in the model. From a statistical perspective, categorising continuous variables is not advisable, since it may entail a loss of information and power. Yet in clinical research and, more specifically, in the development of prediction models for use in clinical practice, both clinicians and health managers call for the categorisation of continuous parameters. However, although categorisation is common practice in clinical research, there are no unified criteria for the selection of the cut points.

Previous work on the categorisation of continuous variables has, in almost all cases, aimed at dichotomising the predictor variable. In this work, we focus on the categorisation of continuous variables to be used in the development of prediction models, considering that the use of more than two categories may be preferable. This serves to reduce the loss of information and enables the relationship between the covariate and the response variable to be retained. Our goal is to propose a methodology to categorise continuous predictor variables in regression-based prediction models, focussing mainly on the logistic and Cox regression models, which are those most widely used in the medical field for modelling dichotomous and time-to-event outcomes, respectively.

For a dichotomous response variable Y, our proposal consists of categorising the continuous covariate X in such a way that the maximal area under the receiver operating characteristic curve (AUC) is obtained (Barrio et al., 2017a). The proposal can be extended to a multivariate logistic regression model with or without interactions. For time-to-event outcomes, on the other hand, we considered categorising the continuous predictor variable X in a Cox proportional hazards model. To measure the discriminative ability of the model, we considered the concordance probability index, and two different estimators were studied: the c-index and the concordance probability estimator (CPE) (Barrio et al., 2017b).
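The idea of choosing cut points so that the categorised predictor yields the largest AUC can be illustrated with a small sketch. This is not the CatPredi implementation or the search algorithm of the cited papers: it is an exhaustive grid search swapped in for illustration, and the function names (`auc`, `categorised_auc`, `best_cut_points`) and the toy data are hypothetical. It relies on the fact that a univariate logistic model with a single categorical predictor ranks observations by the observed event rate of their category, so those rates can serve as scores when computing the AUC.

```python
from itertools import combinations


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def categorise(x, cuts):
    """Index of the interval containing x, given sorted cut points."""
    return sum(x > c for c in cuts)


def categorised_auc(xs, ys, cuts):
    """AUC obtained when X is replaced by its categorised version.

    Each observation is scored with the event rate of its category,
    which gives the same ranking as a logistic model fitted on the
    categorical predictor alone.
    """
    cats = [categorise(x, cuts) for x in xs]
    totals, events = {}, {}
    for c, y in zip(cats, ys):
        totals[c] = totals.get(c, 0) + 1
        events[c] = events.get(c, 0) + y
    scores = [events[c] / totals[c] for c in cats]
    return auc(scores, ys)


def best_cut_points(xs, ys, k, candidates):
    """Exhaustive search (illustrative only) over all sets of k cut points."""
    best = max(
        combinations(sorted(candidates), k),
        key=lambda cuts: categorised_auc(xs, ys, cuts),
    )
    return best, categorised_auc(xs, ys, best)


# Toy data (hypothetical): risk is high at both ends of the X range.
xs = list(range(1, 13))
ys = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
candidates = [i + 0.5 for i in range(1, 12)]  # midpoints between observed values

print(best_cut_points(xs, ys, 2, candidates))
print(best_cut_points(xs, ys, 1, candidates))
```

On this toy data, two cut points (three categories) separate the outcomes perfectly, whereas no single cut point can. This is the kind of non-monotone relationship that dichotomisation fails to retain. An exhaustive search grows combinatorially with the number of cut points and candidates, so it is practical only for small grids.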

In this talk I will present the methodology we have developed to categorise continuous variables in prediction models, together with an empirical validation by means of simulations and an application to a real data set of patients with chronic obstructive pulmonary disease. Finally, I will show the R package CatPredi, which implements these methods and provides the user with the optimal cut points and the categorised variable to be used in practice.


I. Barrio, I. Arostegui, M.X. Rodríguez-Álvarez, J.M. Quintana (2017a). A new approach to categorising continuous variables in prediction models: Proposal and validation. Statistical Methods in Medical Research, 26(6), 2586-2602.

I. Barrio, M.X. Rodríguez-Álvarez, L. Meira-Machado, C. Esteban, I. Arostegui (2017b). Comparison of two discrimination indexes in the categorisation of continuous predictors in time-to-event studies. SORT, 41, 73-92.