An entirely incomplete list of the projects I am currently working on.
Evidence-based medicine (EBM) aims to inform patient care with the totality of available relevant evidence. But the volume of published biomedical data now renders this goal infeasible in practice.
For example, systematic reviews are the cornerstone of EBM and are critical to modern healthcare, informing everything from national health policy to bedside decision-making. But conducting systematic reviews is extremely laborious (and hence expensive): producing a single review requires thousands of person-hours. Moreover, the exponential expansion of the biomedical literature base has imposed an unprecedented burden on reviewers, thus multiplying costs. Researchers can no longer keep up with the primary literature, and this hinders the practice of evidence-based care.
This project concerns optimizing the practice of EBM via novel machine learning and natural language processing methods, with the aim of facilitating evidence-based care in an era of information overload.
We aim to develop resources and novel computational methods to advance automated irony detection (i.e., identification of the ironic voice in online content). This is a challenging task because the meaning of natural language is not captured by words and syntax alone. Rather, utterances (tweets, sentences in forum posts, etc.) are embedded within a specific context. The ironic voice is an important example of this phenomenon: to appreciate a speaker’s intended meaning, it is crucial to first infer whether they are being ironic or sincere.
Existing automated approaches to irony detection leverage statistical natural language processing (NLP) and machine learning (ML) methods. These models tend to be relatively ‘shallow’ in that they operate only over simple, unstructured representations of data. In our view, verbal irony detection is unusual in that such representations are likely inadequate for it. Context is necessary to discern ironic intent: we aim to demonstrate this empirically and to build models that exploit contextual cues.
Physician-patient communication is a key aspect of healthcare, but it is poorly understood. How can physicians communicate with their patients to effect better health outcomes? To answer this question, we first need to better understand and quantify how physicians currently communicate with patients. To this end, transcripts of physician-patient visits annotated with clinically relevant codes that capture important topics and speech acts can provide vital insights into interactions by revealing communication patterns. For example, using codes that indicate the topic being discussed, we can quantify the fraction of time spent discussing, e.g., biomedical issues (as opposed to, say, time spent socializing). These kinds of analyses allow us to measure patient-centeredness in outpatient care.
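As a rough illustration of the topic-time analysis described above, the sketch below computes the fraction of a visit spent on each topic code. It assumes each annotated utterance is available as a (topic code, duration in seconds) pair; the function name `topic_time_fractions`, the code labels, and the durations are all hypothetical and are not drawn from any real coding scheme or transcript.

```python
from collections import defaultdict


def topic_time_fractions(utterances):
    """Return the fraction of total visit time spent on each topic code.

    `utterances` is an iterable of (topic_code, duration_seconds) pairs.
    This input format is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for topic, seconds in utterances:
        totals[topic] += seconds

    visit_total = sum(totals.values())
    if visit_total == 0:
        # No timed utterances: there is nothing to apportion.
        return {}
    return {topic: seconds / visit_total for topic, seconds in totals.items()}


# Hypothetical annotated visit; codes and durations are made up.
visit = [
    ("biomedical", 90.0),
    ("socializing", 30.0),
    ("biomedical", 60.0),
    ("logistics", 20.0),
]

fractions = topic_time_fractions(visit)
for topic, share in sorted(fractions.items()):
    print(f"{topic}: {share:.0%}")
```

If per-utterance timings are not available, word counts could stand in for durations as a rough proxy, although that would measure share of talk rather than share of time.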
But labeling transcripts with such codes is tedious and expensive, precluding large-scale analyses. Moreover, existing analyses of such codes are inadequate: we need better models to perform sophisticated analyses. These might, for example, highlight subtle but important communicative patterns. This project aims to address both problems: reducing the cost of annotation and developing models that support richer analyses.
The amount of unstructured health-related information online has exploded (e.g., consumer reviews of physicians; health-related Tweets; media coverage of health stories and associated comments).
This broad project aims to design new models that process and make sense of such information to better understand the health needs and wants of individuals.