We're excited to share the first release of the EBM-NLP corpus! This dataset is described at length in our ACL 2018 paper, "A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature". The corpus website is here: http://www.ccs.neu.edu/home/bennye/EBM-NLP/

Dr Reviews

This is a corpus of online reviews of physicians from RateMDs.com, which we used in our JAMIA 2014 paper. The data is available here: http://www.byronwallace.com/static/ratemd-dr-ratings.csv.zip

Verbal irony (~sarcasm) on Reddit

This dataset, built in 2014, comprises Reddit comments annotated as sarcastic or not. One version is available here: https://github.com/bwallace/ACL-2014-irony and another is on Kaggle: https://www.kaggle.com/rohandoshi/sarcasm-classification