byron wallace: corpora

Evidence Inference

We're excited to share the first release of the Evidence Inference website, part of an NSF CAREER project. The idea here is to infer the reported findings concerning a given treatment, comparator, and outcome of interest. More details are available on the corpus website: http://evidence-inference.ebm-nlp.com/

EBM-NLP

The EBM-NLP dataset comprises about 5,000 abstracts of articles describing clinical trials. The text is annotated to indicate descriptions of patients/populations, interventions, and outcomes. This corpus is described at length in our ACL 2018 paper, "A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature". The corpus website is here: http://www.ccs.neu.edu/home/bennye/EBM-NLP/

Dr Reviews

This is a corpus of online reviews of physicians from RateMDs.com. We used this in our JAMIA 2014 paper. The data is available here: http://www.byronwallace.com/static/ratemd-dr-ratings.csv.zip

Verbal irony (~sarcasm) on reddit

This is a dataset built in 2014; it comprises reddit comments annotated as being sarcastic (or not). One version is available here: https://github.com/bwallace/ACL-2014-irony. There is also a version on Kaggle: https://www.kaggle.com/rohandoshi/sarcasm-classification