
Diagnosis and Medication Anomaly Detection in EHR Data via GNNs

Background

A graph neural network (GNN) is a neural network that operates on graph-structured data. Graphs are composed of nodes (vertices) connected by edges, and the aspect of GNNs most relevant to my project is their ability to learn embeddings (low-dimensional vector representations) of the nodes in a graph dataset. GNNs leverage the graph structure by computing each node's embedding at each layer as a function of its neighbors' embeddings. Electronic health records (EHRs), in turn, record for each patient visit what the patient was diagnosed with, which medications they were prescribed, and their lab test results (e.g., hematology, chemistry, and microbiology results).
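To make the neighbor-aggregation idea concrete, here is a minimal sketch of a single message-passing layer in PyTorch. The class name, the mean aggregation, and the dense adjacency matrix are illustrative choices for exposition, not our actual model.

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """Illustrative message-passing layer: each node's new embedding is a
    learned transform of the mean of its neighbors' current embeddings."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_embeddings, adjacency):
        # node_embeddings: (num_nodes, in_dim); adjacency: (num_nodes, num_nodes) 0/1 matrix
        degrees = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = (adjacency @ node_embeddings) / degrees  # average neighbor embedding
        return torch.relu(self.linear(neighbor_mean))

# Example: 5 nodes with 16-dimensional features
layer = SimpleGNNLayer(16, 32)
x = torch.randn(5, 16)
adj = (torch.rand(5, 5) > 0.5).float()
out = layer(x, adj)  # (5, 32) updated node embeddings
```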

My initial research goal was to learn better embeddings for diagnoses and medications in EHRs, but my PI, mentor, and I collectively concluded that my work would have a greater impact if I slightly shifted my focus to using those embeddings to perform anomaly detection in EHRs. Specifically, my goal became finding diagnosis and medication anomalies in EHRs using GNNs. Successfully performing such anomaly detection would allow our model to function as a quick sanity check for doctors when seeing patients and as a tool for cleaning up errors in existing EHRs, increasing medical data quality across the board. To that end, the main hypothesis we tested was that taking a graph-based approach to EHRs would allow us to outperform other state-of-the-art methods at diagnosis and medication anomaly detection in EHRs.

Results and Future Directions

Throughout the summer, I expanded upon our existing basic GNN model to improve its performance. The graph on which this basic model runs is generated from the MIMIC-III EHR dataset as follows. Each patient visit, diagnosis, and medication is a node. Each lab test result is also a node; for numerical lab test results, the range of possible outcomes is segmented into "buckets" of similar values, and each bucket is a distinct node in the graph. An edge is placed between a given diagnosis/medication/lab-test-result node and a given patient visit node if and only if that diagnosis, medication, or lab test result was observed in that patient visit. With this graph, our training pipeline has two stages: we first train the GNN to predict the diagnoses and medications associated with a patient visit given that visit's lab test results and the patient's visit history, and we then use the GNN's likelihood rating for each edge between a patient visit and a diagnosis/medication, classifying the bottom x% least likely edges as anomalies (where x is a hyperparameter known as the contamination factor of the data; we used x = 0.5 for this project).
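To illustrate that final classification step, here is a minimal NumPy sketch, with hypothetical names, of how edges could be flagged once the trained GNN has produced a likelihood score for every visit-diagnosis/medication edge; it simply marks the lowest-scoring contamination fraction of edges as anomalies.

```python
import numpy as np

def flag_anomalies(edge_scores, contamination=0.005):
    """Flag the `contamination` fraction of edges with the lowest GNN
    likelihood scores as anomalies (x = 0.5% corresponds to 0.005)."""
    edge_scores = np.asarray(edge_scores)
    num_anomalies = max(1, int(round(contamination * len(edge_scores))))
    anomalous_idx = np.argsort(edge_scores)[:num_anomalies]  # least-likely edges
    mask = np.zeros(len(edge_scores), dtype=bool)
    mask[anomalous_idx] = True
    return mask

# Example: scores for 10,000 hypothetical visit-diagnosis/medication edges
scores = np.random.rand(10_000)
anomaly_mask = flag_anomalies(scores)  # 50 edges flagged as anomalous
```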

Within this context, this past summer I first experimented with several different methods of incorporating time information into our model. To that end, I built two time bias models that incorporate a patient's previous visit information into that patient's current visit embedding via a biased dot-product attention mechanism. In these models, I used dot-product attention to calculate how much of each previous visit's embedding should be factored into the current visit's embedding (formally, a scalar between 0 and 1 known as an attention value). Both models compute the lengths of the time intervals between a patient's current visit and each of their previous visits, place each previous visit into a bucket based on its time interval length (just like the lab test results above), and then, for each previous visit, add the corresponding bucket's learnable bias parameter to the attention mechanism. In one model, that bias parameter is a scalar added to the attention values; in the other, it is a vector added to the previous visit embeddings. I found that the scalar bias model performed better than our previous models on diagnosis anomaly detection, improving our standard attention model's F1 score of 0.6722 to 0.6873. However, both models surprisingly underperformed on medications relative to our basic GNN model.

In addition to these models, I built two time encoding models, which also add bias terms to the same attention mechanism. The first calculates a unique vector for each distinct time interval and adds that vector to the corresponding previous visit embedding in the attention mechanism; the second passes that vector through a linear neural network layer to generate a scalar value and adds that scalar to the corresponding attention value. Both models underperformed on diagnoses, but the second slightly improved our best F1 score on medications from 0.7698 to 0.7753.
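As a rough illustration of the scalar time-bias variant, here is a PyTorch sketch of dot-product attention over a patient's previous visits with a learnable per-bucket scalar bias added to the attention logits; the class and argument names are hypothetical, and the real model's formulation differs in its details.

```python
import torch
import torch.nn as nn

class ScalarTimeBiasAttention(nn.Module):
    """Illustrative dot-product attention over a patient's previous visits, with a
    learnable scalar bias per time-interval bucket added to the attention logits."""
    def __init__(self, dim, num_buckets):
        super().__init__()
        self.bucket_bias = nn.Parameter(torch.zeros(num_buckets))  # one scalar per time bucket
        self.scale = dim ** 0.5

    def forward(self, current_visit, past_visits, time_buckets):
        # current_visit: (dim,); past_visits: (num_past, dim)
        # time_buckets: (num_past,) long tensor of bucket indices for each previous visit
        logits = past_visits @ current_visit / self.scale   # dot-product scores
        logits = logits + self.bucket_bias[time_buckets]    # add per-bucket scalar bias
        attn = torch.softmax(logits, dim=0)                 # attention values in [0, 1]
        return attn @ past_visits                           # history summary for the current visit
```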

I then noticed that, on medications, our models using past information performed either worse than or about the same as our basic model without past information, so I analyzed the patient visits for which our models were detecting anomalies. I found that the models using past information performed well on patient visits with previous visit information but poorly on visits by patients without a history (i.e., a patient's first visit). Therefore, I added gated attention to our models to allow them to learn to retain more present-visit information. This change increased the second time encoding model's F1 score on medications to 0.7863.
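Here is a minimal sketch of one common way to implement such a gate, blending the attended-history embedding with the current visit embedding via a learned sigmoid gate; this is an illustrative formulation under my own naming, not necessarily the exact gating used in our models.

```python
import torch
import torch.nn as nn

class GatedHistoryFusion(nn.Module):
    """Illustrative gate: learn how much of the attended-history embedding versus
    the current visit embedding to keep, so first visits with no history can
    fall back on present-visit information."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, current_visit, history):
        # current_visit, history: (dim,) embeddings
        g = torch.sigmoid(self.gate(torch.cat([current_visit, history], dim=-1)))
        return g * history + (1 - g) * current_visit  # gated blend of past and present
```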

Finally, I noticed that when a given model exhibited improved performance on one of the diagnosis/medication tasks, it would usually exhibit poor performance on the other. My preliminary experiments indicated that learning both tasks simultaneously generally improves performance relative to learning either task exclusively, so one future direction is to explore disentangled learning, a technique that would allow for slightly greater (but not complete) separation between the two tasks. Another future direction is to further explore using the hierarchical nature of diagnosis and medication classification systems to improve our models' learned embeddings; my preliminary experiments indicated a decrease in performance when adding this information to our models, but it could prove useful if incorporated differently.

Reflections

Overall, over the course of this summer, I learned how to design, build, train, and evaluate AI models; I also learned how to effectively present my research through several presentations I delivered to my PI and to our Yale School of Public Health collaborators. Moreover, I gained broader exposure to CS research and learned how to build upon other people's work by reading many CS papers throughout the summer and discussing their relevance to our project with my mentor. I'm excited to share our research with the world by publishing our work on anomaly detection in the future!

I’d like to extend my thanks to many people for my wonderful summer research experience. First, I’d like to express my gratitude to my PI, Prof. Rex Ying, and my mentor, Tinglin Huang, for their invaluable mentorship and guidance. Moreover, thank you to Prof. Andrew Taylor and Vimig Socrates—our School of Public Health collaborators—for talking through model design ideas with me. Finally, I’m incredibly grateful to David Yu, to the sponsors and organizers of the Andy Keidel Fund, and to Yale University for making my amazing summer research experience possible—I’m looking forward to applying the skills I’ve learned this summer to further pursuing CS research!