Chatbots such as GPT-4 and ChatGPT are currently serving millions of users. Despite their widespread use, there remains a lack of public datasets that show how people actually use these tools in practice. In this talk, I will introduce (InThe)WildChat, a corpus of 570K user-ChatGPT conversations comprising over 1.5 million interaction turns. I will show that, compared to other popular user-chatbot interaction datasets, WildChat offers the most diverse user prompts and the richest variety of potentially toxic use cases. Finally, I will demonstrate the potential utility of this dataset for fine-tuning state-of-the-art instruction-following models.
Wenting Zhao is a Ph.D. candidate in Computer Science at Cornell University. Her research focuses on improving the reasoning capabilities of large language models by exploiting explicit problem structures.