Collecting datasets for figurative speech and affective computing is an important but difficult task, often done by employing crowdworkers^. In this talk, I will present two new methods for automatically capturing high-quality labeled data for sarcasm detection and emotion classification. I will first introduce reactive supervision, a novel data collection method that captures sarcasm data by using the dynamics of online conversations to overcome the limitations of existing data collection techniques. Our method is the first to automatically collect both intended sarcasm and perceived sarcasm data, and I will discuss why we need both types. I will also describe a large and unique dataset of tweets with sarcasm perspective labels and new contextual features that we created and shared publicly. Our second proposed method automatically collects emotion data, specifically induced emotions. The distinction between induced and perceived emotions has long been recognized in other fields, but is mostly missing from the NLP literature. Induced emotion data is of great importance, yet most existing datasets contain only perceived emotion labels. Our method automatically collects texts along with their induced emotions and sentiment by exploiting the online use of reaction GIFs. This novel approach captures complex affective states. I will describe the method and a first-of-its-kind affective dataset of tweets. Our data collection methods are generalizable, and the datasets open new research opportunities in emotion detection and affective computing.
^Time permitting, I will mention our work on the ethical implications of NLP crowdsourcing.
Boaz Shmueli is a PhD student at Academia Sinica in Taipei, interested in language resources for affective computing. Before that, he was co-director of the Rigsum Research Lab in Thimphu, Bhutan, and a software engineer at the IBM labs in Pittsburgh and Haifa.