Motivation

Data science applied to healthcare can have a substantial impact on patient outcomes. Analytics performed on historical healthcare data can identify high-risk patients, offer personalized outcome predictions to aid clinical decision-making during therapy enrollment and long-term follow-up, help select better treatments and customize their parameters for the best patient outcome, and more. Personalizing medicine by demographics, comorbidities, severity, and a multitude of other factors is finally achievable with advances in data science methodologies and the digitalization of healthcare data. Previously siloed structured and unstructured healthcare data, including medical records, doctor notes, medical images, omics data, device data, and patient-reported outcomes (pain scores, side effects, etc.), are slowly becoming more available.

Machine learning model training begins by splitting the data into a training set and a test set. The former is used to train the model and tune its hyperparameters using cross-validation. The latter is used to evaluate the optimized model on unseen data, which reveals whether the model has overfit the training data.
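The split-then-tune workflow described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the pipeline used in the projects below: the dataset is synthetic (generated with `make_classification`), and the SVM estimator, parameter grid, and 75/25 split are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a tabular healthcare dataset (assumption, not real data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Hold out a test set that is never touched during training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune hyperparameters with 5-fold cross-validation on the training set only.
# Scaling lives inside the pipeline so each fold is scaled independently.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# A single evaluation on the held-out data estimates generalization.
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(test_accuracy, 3))
```

A cross-validated score that is much higher than the held-out score is a sign that the tuned model has overfit the training data.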

Machine learning model training and hyperparameter tuning

Common machine learning models, such as support vector machines, decision trees, random forests, Naive Bayes, and neural networks, can be applied to the available healthcare data depending on the desired prediction outcome.

Supervised machine learning models

Solution

Patient toxicity outcome prediction after radiation therapy

In one application, the risk of a patient experiencing toxicity after radiation therapy (RT) for cancer can be predicted from the radiation dose delivered to the organs surrounding the tumor. The 3D dose distribution to each organ is summarized by a dose-volume histogram (DVH) calculated from the DICOM RT Dose and RT Structure Set objects acquired during therapy. The open source machine learning library scikit-learn was used to train an SVM model on the DVH data, with recursive feature elimination and cross-validation, producing a model that predicts generic toxicity with 79% accuracy. This model can be used to fine-tune the radiation therapy plan, as an additional objective in the planning calculus, to reduce the risk of toxicity prior to treatment, and also to preemptively prescribe medication that offsets toxicity symptoms after treatment.
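As a rough sketch of how recursive feature elimination with cross-validation wraps an SVM in scikit-learn, the example below uses `RFECV` with a linear-kernel `SVC`. Everything about the data is an assumption made for illustration: the feature matrix is random noise standing in for per-organ DVH metrics, the toxicity label is synthetic, and the patient and feature counts are arbitrary. It does not reproduce the 79% model described above.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical data: 200 patients x 12 DVH-derived metrics (random stand-ins).
n_patients, n_features = 200, 12
X = rng.normal(size=(n_patients, n_features))

# Synthetic toxicity label that depends only on the first three features.
signal = X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2]
y = (signal + rng.normal(scale=0.5, size=n_patients) > 0).astype(int)

# RFECV repeatedly drops the feature with the smallest linear-SVM weight and
# uses cross-validation to choose how many features to keep.
# The features here are already on a common scale; real DVH metrics would
# normally need standardizing first.
selector = RFECV(
    estimator=SVC(kernel="linear"),  # linear kernel exposes coef_ for ranking
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("selected feature indices:", np.flatnonzero(selector.support_))
print("elimination ranking (1 = kept):", selector.ranking_)
```

A linear kernel is used because `RFECV` ranks features by the estimator's `coef_` or `feature_importances_`, which a non-linear SVM kernel does not provide.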

Use of patient toxicity outcome predictions to guide radiation therapy planning

Patient pain relief outcome prediction after neuroablation

In another application, the success of long-term pain relief can be predicted immediately after neuroablation treatment for pain management. Typically, the success of neuroablation is judged by long-term pain relief measured through patient-reported outcomes over a monitoring period lasting months to years. Instead, treatment success can be predicted right after neuroablation by a weighted ensemble of random forests, boosted trees, K-nearest neighbors, and neural networks. The open source AutoML library AutoGluon was used to train this ensemble model, with recursive feature elimination and cross-validation, to predict treatment success for long-term pain relief. Based on the predictions, clinicians can decide right after the procedure whether the treatment was a success or additional treatment is required, without waiting months for the patient's response.
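To illustrate the idea of a weighted ensemble over these four model families, the sketch below uses scikit-learn's `VotingClassifier` as a stand-in. It is not AutoGluon and does not reproduce the model described above: AutoGluon learns its ensemble weights automatically, whereas the weights here are hand-picked, and the data is synthetic. The hyperparameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for post-procedure features and a success/failure label.
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One model per family named in the text; KNN and the neural network are
# scale-sensitive, so they get their own scaler.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gbt", GradientBoostingClassifier(random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
    (
        "mlp",
        make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
        ),
    ),
]

# Soft voting averages predicted probabilities using the given weights.
# These weights are hand-picked for illustration only.
ensemble = VotingClassifier(
    estimators=base_models, voting="soft", weights=[2, 2, 1, 1]
)
ensemble.fit(X_train, y_train)

ensemble_accuracy = ensemble.score(X_test, y_test)
proba = ensemble.predict_proba(X_test)
for name, model in ensemble.named_estimators_.items():
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
print("weighted ensemble test accuracy:", round(ensemble_accuracy, 3))
```

Comparing each base model's score with the ensemble's score shows whether the weighting adds anything over the best single model.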

Identifying important features with RFE

Performance of various ensemble models