This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 1750978: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1750978, "CAREER: Structured Scientific Evidence Extraction: Models and Corpora".
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Project duration (expected): 7/1/2018 -- 6/30/2023.
Project PI: Byron Wallace.
Students: Jay DeYoung (PhD student); Eric Lehman (undergraduate).
Project goals: We aim to design natural language processing (NLP) models that can "read" scientific articles and extract the findings they report. As an important, illustrative motivating domain, we specifically seek to train models that can consume reports of clinical trials and extract actionable, structured evidence from them. Such models, if successful, would allow domain experts (physicians, in this case) to harness the entirety of the published evidence base to inform treatment decisions, which is not currently possible because published evidence is predominantly unstructured.
Research challenges: Making sense of the findings reported in clinical trials requires two things: core technical innovations to realize models that can jointly extract entities and infer the relationships between them over lengthy technical articles, and new corpora with which to train and evaluate such models. We aim to meet both requirements in this project.
Selected project outputs
We describe a first version of the dataset we are collecting for this task, along with initial models and results, in our NAACL 2019 paper, "Inferring Which Medical Treatments Work from Reports of Clinical Trials", and on the accompanying website: http://evidence-inference.ebm-nlp.com.
A blog post describing the task and our initial models is available here: http://evidence-inference.ebm-nlp.com/blog/
Source code for our initial models is available on GitHub: https://github.com/jayded/evidence-inference.
PI Wallace spoke about the overarching goal of automating biomedical evidence synthesis on the NLP Highlights podcast: https://soundcloud.com/nlp-highlights/86-nlp-for-evidence-based-medicine-with-byron-wallace.
This project seeks to make published (unstructured) evidence more useful by designing models that can automatically infer reported findings. Meeting this aim would have substantial implications for the practice of evidence-based medicine, and more generally for scientific disciplines in which evidence (specifically, results of trials) is described in free-text reports.
Concerning educational and research opportunities, as part of this project Wallace designed and taught the first iteration of the Practical Neural Networks course, which included the option of using the aforementioned corpus for the final project. This project has also provided key opportunities for research. In particular, undergraduate Eric Lehman, supported in part by an REU supplement for this project, worked on this research as a six-month "co-op" (a core component of Northeastern undergraduate education). Lehman was first author on our NAACL publication, which he will present at the conference in June 2019. Additionally, PhD student Jay DeYoung joined this project in fall 2018 and co-authored the aforementioned NAACL paper. This has afforded him an important opportunity to gain familiarity with key technical problems in NLP that are directly motivated by this project, and he will continue working on these over the summer and into next year. DeYoung also served as a teaching assistant (TA) for the Practical Neural Networks course that is part of this project's broader impacts, thereby gaining teaching experience.
This page last updated: 5/6/2019; please contact Byron Wallace with any questions.