Not just data: A method for improving prediction with knowledge

Yet B., Perkins Z., Fenton N., Tai N., Marsh W.

JOURNAL OF BIOMEDICAL INFORMATICS, vol.48, pp.28-37, 2014 (SCI-Expanded)


Many medical conditions are only indirectly observed through symptoms and tests. Developing predictive models for such conditions is challenging since they can be thought of as 'latent' variables: they are not present in the data and are often confused with their measurements. As a result, building a model that fits the data well is not the same as making a prediction that is useful for decision makers. In this paper, we present a methodology for developing Bayesian network (BN) models that predict and reason with latent variables, using a combination of expert knowledge and available data. The method is illustrated by a case study on the prediction of acute traumatic coagulopathy (ATC), a disorder of blood clotting that significantly increases the risk of death following traumatic injuries. There are several measurements for ATC, and previous models have predicted one of these measurements instead of the state of ATC itself. Our case study illustrates the advantages of models that distinguish between an underlying latent condition and its measurements, and of a continuing dialogue between the modeller and the domain experts as the model is developed using knowledge as well as data. (C) 2013 Elsevier Inc. All rights reserved.
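
The distinction between a latent condition and its measurements can be made concrete with a very small sketch. The code below is not the model from the paper: it assumes a single binary latent condition with two hypothetical tests (`test_a`, `test_b`) that are conditionally independent given the condition, and every probability in it is an invented, illustrative value. It only shows how a posterior over the latent state, rather than over any one measurement, might be computed by enumeration.

```python
# Minimal sketch: one binary latent condition, two noisy measurements.
# All names and numbers are hypothetical; none are taken from the paper.

PRIOR = 0.1  # assumed P(condition present)

# Assumed P(test positive | condition present) for each hypothetical test
P_POS_GIVEN_PRESENT = {"test_a": 0.8, "test_b": 0.7}
# Assumed P(test positive | condition absent) for each hypothetical test
P_POS_GIVEN_ABSENT = {"test_a": 0.1, "test_b": 0.2}


def posterior(evidence):
    """Return P(condition present | evidence).

    `evidence` maps a test name to True (positive) or False (negative).
    Tests that are missing from `evidence` are simply left unobserved.
    """
    joint_present = PRIOR
    joint_absent = 1.0 - PRIOR
    for test, positive in evidence.items():
        p_present = P_POS_GIVEN_PRESENT[test]
        p_absent = P_POS_GIVEN_ABSENT[test]
        # Multiply in the likelihood of the observed result under each state.
        joint_present *= p_present if positive else 1.0 - p_present
        joint_absent *= p_absent if positive else 1.0 - p_absent
    # Normalise over the two states of the latent variable.
    return joint_present / (joint_present + joint_absent)


if __name__ == "__main__":
    print(posterior({}))                                  # prior only
    print(posterior({"test_a": True}))                    # one positive test
    print(posterior({"test_a": True, "test_b": True}))    # two positive tests
    print(posterior({"test_a": True, "test_b": False}))   # conflicting tests
```

Because the target is the latent state, the same function handles missing or conflicting measurements; a model trained to predict one particular measurement would have no equivalent way to combine them.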