Common to the two projects with sequential medical data is the study of an underlying graph which connects the items with each other in a manner unknown to us. In Chapter 6, we seek to incorporate these graph relations into a vector representation of each item to obtain a “dense” vector representation with the graph relation embedded in it. Similarly, in Chapter 7, we study whether the graphs for two groups are the same by comparing edges, vertices and frequencies. Finally, by weighting the graph edges with their frequency (an estimate of the transition probability), we predict the next item as the one with the maximal frequency.
4.2.1 Embedding of sequential semantic meaning
The embedding of items into vector representations is studied in [2, 3], which introduce the neural network called Skip-Gram. Their application is specific to embedding words into vectors with semantic meaning, but their methodology may be applied to the general concept of ordered sequences as in Section 5.1.
In Chapter 6, we apply the Skip-Gram algorithm to items in an electronic health record. Our dataset originates from Silkeborg Regionshospital, Denmark, and consists of 169 electronic health records for 169 patients. The patient cohort was chosen from a pool of patients labeled “Suspicion of Serious Illness” (SSI) (Danish: Mistanke om Alvorlig Sygdom), who are known to be difficult to diagnose. The label SSI is given by the patient's general practitioner, who provides a referral to the diagnostic unit (Danish: Diagnostisk Center) at Silkeborg Regionshospital. Overall, the patients represent a complex diagnostic problem, as they are often multi-sick, i.e. suffering from several concurrent ailments, and have specifically been referred by the general practitioner for a more precise diagnostic elucidation. The dataset consists of 178 thousand electronic health record entries, which is far fewer than in related studies but was what was available in a Danish setting, and it may serve as a proof of concept.
The underlying hypothesis is that items affect the occurrence of other items and thus carry a semantic meaning in the context of other items. The Skip-Gram algorithm aims to induce the sequential semantic meaning of electronic health records into embeddings of events/treatments based on their sequential order. The t-SNE visualization of the embeddings in Figure 6.4 reveals intriguing relations between events and shows that the embedding holds some semantic meaning which is directly interpretable. It is possible to identify groups of related events, and according to consultation with a medical doctor, these groups represent standard treatment packages ordered in the clinical workflow; the algorithm discovers these relations without prior knowledge of them. This is a promising result for automatic detection and incorporation of sequential semantic meaning.
To quantify the quality of the Skip-Gram vector representations, we evaluate the ability of the clustering algorithm k-means to rediscover selected annotated groups from Figure 6.4, using the vector representations as inputs. We compare the performance of the Skip-Gram representation to a Markov Chain benchmark on three different classification tasks. The Markov Chain slightly outperforms the Skip-Gram vector representation on two of the tasks, which may be explained by its more local behavior and the dataset characteristics. However, as input to a more complex model (a Recurrent Neural Network; RNN), we show that the Skip-Gram representations outperform the Markov Chain in next-event prediction. To what extent this is due to the Skip-Gram representations or the RNN is difficult to determine, but it does show promising results for the use of vector representations from Skip-Gram as inputs to other statistical models.
4.2.2 Analyzing medication logs for sepsis patients
Sepsis is a life-threatening condition which occurs primarily in hospital settings and has a high mortality rate. Although not completely understood, it has been closely associated with poor hygiene standards, weak immune systems and implanted foreign bodies (e.g. for fixation of broken bones).
In Chapter 7, we analyze the 24-hour time window following an alert (or registration) of potential sepsis. We have two datasets on sepsis, both of which originate from Stanford Health Care. The first dataset concerns the testing of a new automatic rule-based sepsis alert system, which during a trial period registered alerts and, for half of the registered alerts, forwarded an alert to a doctor's pager. For the second dataset, we have general sepsis alert registrations but no group variable, and we study the 24-hour time window following the registration.
The initial goal of the project was to search for differences in the medications given to each group in the first dataset. However, the analysis showed no major differences in the frequency of common states or in the possible order of the medications (through the transitions in a Markov Chain), and visualizing both graphs revealed no major differences either. Hence, we concluded that the alert system did not appear to result in treatment-altering behavior. Following this conclusion, we merged the first dataset with the second, much larger dataset.
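A comparison of this kind can be sketched in a few lines. The following is a minimal illustration, not the procedure used in Chapter 7: it assumes each group is a list of medication sequences, builds the directed graph of observed transitions per group, and reports vertices and edges unique to either group together with the difference in relative edge frequency on shared edges. The function names and the toy data are hypothetical.

```python
from collections import Counter

def transition_graph(sequences):
    """Return (vertices, edge_counts) for the directed graph of observed transitions."""
    vertices = set()
    edges = Counter()
    for seq in sequences:
        vertices.update(seq)
        edges.update(zip(seq, seq[1:]))  # consecutive pairs (a, b) are edges a -> b
    return vertices, edges

def compare_graphs(seqs_a, seqs_b):
    """Summarise differences in vertices, edges and relative edge frequencies."""
    va, ea = transition_graph(seqs_a)
    vb, eb = transition_graph(seqs_b)
    na = sum(ea.values()) or 1  # guard against division by zero for empty groups
    nb = sum(eb.values()) or 1
    shared = set(ea) & set(eb)
    return {
        "vertices_only_in_a": va - vb,
        "vertices_only_in_b": vb - va,
        "edges_only_in_a": set(ea) - set(eb),
        "edges_only_in_b": set(eb) - set(ea),
        "relative_frequency_difference": {e: ea[e] / na - eb[e] / nb for e in shared},
    }

# Hypothetical toy groups (not thesis data); letters stand in for medications.
alert_group = [["a", "b", "c"], ["a", "b"]]
control_group = [["a", "b", "c"], ["a", "c"]]
report = compare_graphs(alert_group, control_group)
```

In practice one would also want a statistical test for the frequency differences; the sketch only produces the descriptive quantities.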
Our main objective for the merged dataset is to predict the next medication using a Markov Chain. A Markov Chain may appear to be a simple tool for modeling a complex decision process, but the sequences are on average only 3 entries long, and hence more advanced methods were not considered suitable. We report our prediction accuracy on a test set, and the result appears reasonable given the moderately large set of medications and the simple estimation procedure of a Markov Chain.
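To make the prediction rule concrete, the sketch below estimates a first-order Markov Chain from transition counts and predicts the most frequent successor of the current medication. It is an illustration under assumptions: the first-order structure, the function names, and the toy sequences are ours and are not taken from the thesis data or code.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count first-order transitions current -> next over all sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts

def transition_probabilities(counts):
    """Normalise each row of counts to relative frequencies."""
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {nxt: c / total for nxt, c in row.items()}
    return probs

def predict_next(counts, state):
    """Return the most frequent successor of `state`, or None if it has none."""
    row = counts.get(state)
    if not row:
        return None
    return row.most_common(1)[0][0]

# Hypothetical toy medication sequences (not from the thesis data).
train = [
    ["saline", "antibiotic_a", "vasopressor"],
    ["saline", "antibiotic_a"],
    ["saline", "antibiotic_b", "vasopressor"],
    ["antibiotic_a", "vasopressor"],
]
counts = fit_transitions(train)
probs = transition_probabilities(counts)
prediction = predict_next(counts, "saline")
```

With sequences of only a few entries, most of the information sits in these pairwise counts, which is consistent with the choice of a simple model here.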
4.2.3 Conclusion, experiences and thoughts
In the future, we believe that studies of sequential medical data need to incorporate both categorical variables and quantitative measurements to obtain enough signal that a proper, reliable and precise prediction can be made. Such a prediction could, for example, take the form of a prioritized list of recommended medications. It would require integration of many categorical variables with quantitative measurements, but this appears necessary to move beyond the idea and proof-of-concept stage to an actual clinical decision support system. Alternatively, much larger datasets (for instance, all national registrations of sepsis) would be another avenue for obtaining more signal. Most probably, however, this approach would not change the fact that sepsis medication sequences typically consist of 2-7 entries. Hence, this does not appear to be the way forward. A deeper and more comprehensive analysis of the graph generated by the medications is another avenue, and one which would benefit significantly from more data: the idea of studying sub-graphs for patient sub-groups in collaboration with medical doctors may be fruitful for further hypothesis generation on the causes and factors of sepsis.
References
[1] Laurens van der Maaten and Geoffrey Hinton. “Visualizing data using t-SNE”. Journal of Machine Learning Research (2008), 2579–2605.
[2] Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean. “Efficient estimation of word representations in vector space” (2013). arXiv:1301.3781.
[3] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado and Jeff Dean. “Distributed representations of words and phrases and their compositionality”. Advances in Neural Information Processing Systems. 2013, 3111–3119.
Chapter 5
Technical introduction to embeddings
This chapter contains supplementary technical material for Chapters 4, 6, and 7. Apart from the first two sections, which are interconnected, any section in this chapter can be read almost independently and serves as a quick lookup for the reader. The aim is to make the thesis as self-contained as possible.
In Section 5.1 we formally define the data structures that we study, and in Section 5.2 we explain how the data structures are encoded as mathematical vector representations and how we define the context of an event.
5.1 Preliminaries
In this section, embeddings are introduced, both in terms of the presumptions we make on observed data and the corresponding mathematical details, inspired by [5]. We use this general setup in both Chapters 6 and 7. We observe an unordered collection of sequences, $S$, e.g.
$$S = [s_1, s_2, \ldots, s_p],$$
for some $p \in \mathbb{N}$. Each sequence $s_j$ is an ordered set of itemsets (typically ordered by time or sequential order) and is denoted by
$$s_j = (X_1^j, X_2^j, \ldots, X_{n_j}^j),$$
for some $n_j \in \mathbb{N}$, which denotes the number of entries in the sequence. The length of a sequence $s = (X_1, \ldots, X_{n_s})$ is defined by $l(s) = |X_1| + \ldots + |X_{n_s}|$, where $|\cdot|$ denotes the number of elements. An itemset $X_k^j$ is an ordered set of items, e.g.
$$X_k^j = \{i_1, i_2, \ldots, i_{m_{j,k}}\}$$
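for some $m_{j,k} \in \mathbb{N}$, which denotes the number of items in the set. The ordering of the items is fixed prior to analysis and not relevant to us. An item $i$ belongs to a sequence $s = (X_1, X_2, \ldots, X_{n_s})$, written $i \in s$, if it occurs in at least one of its itemsets:
$$i \in s \iff \exists k \in \{1, 2, \ldots, n_s\} : \{i\} \subseteq X_k.$$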
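The definitions above map directly onto simple data structures. As a minimal sketch, assuming items are strings and that the fixed ordering inside an itemset can be ignored (so a `frozenset` suffices), a sequence is a tuple of itemsets and the collection is a list of sequences. The names `make_sequence`, `length` and `belongs_to` are hypothetical helpers introduced for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]             # X_k: a set of items (internal order ignored)
ItemSequence = Tuple[Itemset, ...]   # s = (X_1, ..., X_n): order of itemsets matters

def make_sequence(*itemsets: Iterable[str]) -> ItemSequence:
    """Build a sequence (X_1, ..., X_n) from iterables of items."""
    return tuple(frozenset(x) for x in itemsets)

def length(s: ItemSequence) -> int:
    """l(s) = |X_1| + ... + |X_n|, the total number of items in the sequence."""
    return sum(len(x) for x in s)

def belongs_to(item: str, s: ItemSequence) -> bool:
    """i in s  iff  there is a k such that {i} is a subset of X_k."""
    return any(item in x for x in s)

# Hypothetical toy collection S = [s_1, s_2].
s1 = make_sequence({"a", "b"}, {"c"}, {"a"})
s2 = make_sequence({"d"})
S: List[ItemSequence] = [s1, s2]
```

Note that `length(s1)` counts the item `a` twice, since it appears in two itemsets, which matches the definition of $l(s)$ as a sum of itemset sizes.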
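A sequence $s_a = (Y_1, Y_2, \ldots, Y_{n_a})$ is contained in another sequence $s_b = (X_1, X_2, \ldots, X_{n_b})$ if there is a strictly increasing sequence of integers $(m_j)_{1 \le j \le n_a}$ with $1 \le m_1 < \ldots < m_{n_a} \le n_b$ such that
$$Y_1 \subseteq X_{m_1},\ Y_2 \subseteq X_{m_2},\ \ldots,\ Y_{n_a} \subseteq X_{m_{n_a}}. \tag{5.1}$$
In the affirmative case, $s_a$ is called a subsequence of $s_b$ and we denote this by $s_a \sqsubseteq s_b$. We define the set of all unique items in the collection $S$ by
$$I := \{\, i \mid \exists s_j \in S \text{ such that there exists } k : i \in X_k^j \,\}. \tag{5.2}$$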
The itemsets and the order of itemsets define the sequence, and hence the sequences
$$s_a := (\{a\}, \{b\}, \{c\}), \quad s_b := (\{a\}, \{c\}, \{b\})$$
are not equal, provided the items are letters of the alphabet.
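The containment relation (5.1) and the item set (5.2) can be checked mechanically. The sketch below assumes sequences are tuples of `frozenset` itemsets; it tests containment with a greedy left-to-right scan, which suffices because matching each itemset at the earliest possible position never rules out a later match. The function names are hypothetical.

```python
def is_contained(s_a, s_b):
    """True if s_a is a subsequence of s_b in the sense of (5.1)."""
    k = 0  # index of the next itemset of s_a that still needs a match
    for X in s_b:
        if k < len(s_a) and s_a[k] <= X:  # `<=` on frozensets is the subset test
            k += 1
    return k == len(s_a)

def unique_items(collection):
    """The set I of all items occurring in any itemset of any sequence, as in (5.2)."""
    return {i for s in collection for X in s for i in X}

# The two example sequences from the text.
s_a = (frozenset({"a"}), frozenset({"b"}), frozenset({"c"}))
s_b = (frozenset({"a"}), frozenset({"c"}), frozenset({"b"}))

# A hypothetical shorter sequence for the containment check.
s_c = (frozenset({"a"}), frozenset({"b"}))
```

Here `s_a != s_b` even though both contain the same items, `s_c` is contained in both, and neither `s_a` nor `s_b` is contained in the other.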
Examples of this structure could be a corpus of documents (sequences) with items being words, or a database of electronic health records with record entry names being items. In these applications, the itemsets $X$ only contain a single item. The underlying presumption is that the sequential structure defines the purpose and meaning of each itemset; in natural language processing this is called the Distributional Hypothesis [8]. Exactly how we utilize the sequential structure to interpret the Distributional Hypothesis to produce quality embeddings is a modeling question. Word2vec, introduced in [14] and computationally enhanced in [15], contains two models, Skip-Gram and CBOW, grounded in the Distributional Hypothesis. Another model is a Markov Chain.
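As an illustration of how the sequential structure can be turned into training signal, the sketch below generates (center, context) pairs of the kind a Skip-Gram model is typically trained on. It assumes single-item itemsets and a symmetric window of fixed size; the window definition, the function name and the toy record are assumptions for illustration, and the context actually used in this thesis is the one defined in Section 5.2.

```python
def skipgram_pairs(sequence, window=2):
    """Return (center, context) pairs for every item and its neighbours within `window`."""
    pairs = []
    for pos, center in enumerate(sequence):
        lo = max(0, pos - window)
        hi = min(len(sequence), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:  # an item is not its own context
                pairs.append((center, sequence[ctx_pos]))
    return pairs

# Hypothetical toy health record with one item per entry (not thesis data).
record = ["blood_test", "ct_scan", "consultation", "biopsy"]
pairs = skipgram_pairs(record, window=1)
```

Items that repeatedly share contexts across records end up with similar pairs, which is the mechanism by which the Distributional Hypothesis enters the embedding.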
The framework above is used in Sequential Pattern Mining (SPM). In our application of SPM, we analyse treatment packages, but due to practical data collection issues we only consider itemsets with a single item. For a good introduction to the field of SPM, we refer to [5]. An example of a general sequential database is grocery shopping: each customer corresponds to a sequence of purchases, each purchase (or basket) consists of groceries, and each grocery is an item in the above terminology. A central point of analysis in this field is the relation and co-occurrence of items, which can be used for recommendation of additional sales items.
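The grocery example can be made concrete with a simple co-occurrence count. The sketch below assumes each customer is a list of baskets and each basket a set of item names, and counts how often two items appear in the same basket. The function name and the toy data are hypothetical, and this is only one of several ways co-occurrence could be defined.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(collection):
    """Count unordered item pairs that appear together in the same itemset (basket)."""
    counts = Counter()
    for sequence in collection:
        for basket in sequence:
            # Sorting gives each unordered pair one canonical key.
            for pair in combinations(sorted(basket), 2):
                counts[pair] += 1
    return counts

# Hypothetical toy database: two customers, each a sequence of baskets.
customers = [
    [{"bread", "butter"}, {"milk"}],
    [{"bread", "butter", "milk"}],
]
pair_counts = cooccurrence_counts(customers)
```

Pairs with high counts are natural candidates for recommendation, whereas with single-item itemsets, as in our application, the relevant co-occurrence is between neighbouring entries in the sequence rather than within a basket.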