Spatio-temporal and class-imbalanced data analytics in healthcare


Data analytics promises to deliver value to industries by providing historical insights that can drive future decisions. However, depending on the characteristics of the data, machine learning models do not always generalize well; for example, the well-studied problem of overfitting can arise from small sample sizes. We worked on two problems in the healthcare industry where the high dimensionality of the state space, coupled with the scarcity of training samples, makes it difficult to apply general methods to these specific problems. In the first problem, we used temporal flu activity data from multiple locations in Ontario and showed that, depending on the surveillance variable under study, either a spatial or a temporal model can exploit the limited spatio-temporal data more efficiently than the other, and consequently predict that surveillance variable more accurately. In the second problem, we used a clinical dataset of radiotherapy treatment plans, each labelled by clinicians as acceptable or unacceptable, and developed an automated quality assurance system. Because of limitations in recording unacceptable plans, the labels are severely class-imbalanced, which requires special learning treatment. We investigated two classification approaches, class-specific learning for binary classification and one-class learning for anomaly detection, both of which build on an adaptive resonance learning scheme that can adapt to long-term trends in labelling behaviour.
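To make the one-class idea more concrete, here is a minimal sketch using Fuzzy ART, one standard member of the adaptive resonance theory family. The abstract does not say which ART variant, features, or parameters were used, so the class name `FuzzyART`, the parameters `rho` (vigilance), `alpha` (choice) and `beta` (learning rate), and the anomaly rule are illustrative assumptions only. In this sketch the model is trained only on acceptable plans (the majority class), with features assumed to be pre-scaled to [0, 1]. A new plan whose complement-coded feature vector fails the vigilance test against every learned category is flagged as a potential anomaly. Because categories are updated incrementally, the same model can keep learning as newly labelled plans arrive, which is one way an ART model can follow long-term drift in labelling behaviour.

```python
import numpy as np


def complement_code(x):
    """Map features assumed to lie in [0, 1] to [x, 1 - x] (Fuzzy ART preprocessing)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])


class FuzzyART:
    """Hypothetical one-class Fuzzy ART sketch; not the thesis's exact scheme."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: higher values give tighter categories
        self.alpha = alpha  # choice parameter (small positive constant)
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.w = []         # one weight vector per learned category

    def _match(self, I, w):
        # Fraction of the input preserved by the fuzzy AND (element-wise min).
        return np.minimum(I, w).sum() / I.sum()

    def _resonant(self, I):
        """Index of the best-scoring category that passes vigilance, or None."""
        if not self.w:
            return None
        choice = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.w]
        # Search categories in order of decreasing choice value.
        for j in np.argsort(choice)[::-1]:
            if self._match(I, self.w[j]) >= self.rho:
                return int(j)
        return None

    def learn(self, x):
        """Incrementally learn one (acceptable) sample; return its category index."""
        I = complement_code(x)
        j = self._resonant(I)
        if j is None:
            # No existing category resonates: create a new one.
            self.w.append(I.copy())
            return len(self.w) - 1
        self.w[j] = self.beta * np.minimum(I, self.w[j]) + (1 - self.beta) * self.w[j]
        return j

    def is_anomaly(self, x):
        """Flag a sample that resonates with no learned category (no learning here)."""
        return self._resonant(complement_code(x)) is None
```

For example, after learning from two similar acceptable plans, `[0.2, 0.2]` and `[0.25, 0.2]`, the model holds a single category; a nearby plan such as `[0.22, 0.21]` passes the vigilance test, while a distant one such as `[0.9, 0.9]` is flagged. Raising `rho` makes the detector stricter, at the cost of more categories and more false alarms.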