Understanding semi-structured tabular data, which is ubiquitous in the real world, requires understanding both the meaning of text fragments and the implicit connections between them. We believe such data can be used to investigate how humans and machines reason about semi-structured information. First, we present the InfoTabS dataset, which consists of human-written textual hypotheses based on premises that are tables collected from Wikipedia’s infoboxes. Our research demonstrates that the semi-structured, multi-domain, and heterogeneous nature of the premises prompts complex, multi-faceted reasoning, posing a challenge for traditional modeling techniques. Second, we analyze these challenges in depth and develop simple, effective preprocessing strategies to overcome them. Third, we demonstrate through rigorous probing that existing models, despite accurate natural language inference (NLI) predictions, do not reason with the provided tabular facts. To address this, we propose a two-stage approach of evidence extraction followed by tabular inference, which improves model reasoning and interpretability. We also investigate efficient methods for enhancing tabular inference datasets with semi-automatic data augmentation and pattern-based pre-training. Lastly, to ensure that tabular reasoning models work in more than one language, we introduce XInfoTabS, a multilingual tabular inference task, together with a cost-effective pipeline for translating tables. In the near future, we plan to test tabular reasoning models on temporal changes, especially for dynamic tables where information changes over time.
Vivek is a Postdoctoral Researcher at the University of Pennsylvania, where he works on natural language processing (NLP) with Prof. Dan Roth in the Cognitive Computation Group. He holds a Ph.D. in Computer Science from the University of Utah, with a focus on semi-structured data. With over five years of experience in AI, NLP, and ML, he has collaborated with organizations such as Microsoft Research, Bloomberg, Facebook AI, IBM Research, and Verisk. Vivek has received numerous awards, including the Bloomberg Data Science Fellowship, the Microsoft Research Fellowship, the Ericsson Innovation Award, the Best Paper Award at DeeLIO 2022, and the Outstanding Paper Award at NLP4ConvAI 2022. He also served as the inaugural coordinator of the Utah Data Science Club and of IIT Kanpur’s Special Interest Group in Machine Learning (SIGML).