Although language models have been shown to have a remarkable grasp of the structure of language (e.g., when generating fluent sentences or learning the correspondence between words), considerable doubt has been cast on their ability to truly understand meaning, since their text-only training gives them no access to the grounded meaning of words in the world. In this talk, we discuss how we could take a language model trained only on text and teach it to be grounded by showing it, in a few-shot way, what certain words ground to in a text world. We focus on grounded concepts that can be represented in text, such as colours, spatial locations, state changes and quantities (e.g., a text world that shows a model what the words north or south might mean), and fine-tune GPT-style language models on these text worlds to teach them a concept. We then evaluate how well language models extrapolate this learned meaning to other word forms that they had previously learned in a text-only fashion, allowing us to analyse their ability to use the structure of language learned during pretraining to generalise to new grounded concepts.
Roma is a PhD student at Brown University advised by Ellie Pavlick.