Designing an application for classifying Time Use Survey texts
The EMEGAE initiative addresses gender inequality in the Spanish Higher Education and Research System (SESIE). It focuses on the analysis of the distribution of tasks between women and men, as well as on the valuation of these tasks, considering whether they are adequately recognized, made visible, or remunerated within academic and research institutions.
In this work, we present an application developed to support the collection, processing, and automatic categorization of data provided by survey participants, together with the Natural Language Processing (NLP) methods on which the solution is based. These methods initially relied on few-shot learning with Large Language Models (LLMs); we subsequently explored more scalable and interpretable approaches, such as vector search based on a domain-adapted embedding model.
Keywords: Text classification, Time Use Survey, NLP, LLM
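
To make the idea of classification by vector search concrete, the following is a minimal sketch and not the application's implementation. It assigns a free-text survey answer the category of its most similar labeled example under cosine similarity. Everything in it is an assumption made for illustration: the example texts, the category names, and the `embed`, `cosine`, and `classify` helpers are hypothetical, and `embed` is a toy bag-of-words stand-in for the domain-adapted embedding model.

```python
import math
from collections import Counter

# Hypothetical labeled examples; texts and category names are illustrative only.
LABELED = [
    ("preparing lecture slides for class", "teaching"),
    ("grading student exams", "teaching"),
    ("writing a journal article", "research"),
    ("running laboratory experiments", "research"),
    ("attending department committee meeting", "management"),
]


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The "index" is simply the list of embedded labeled examples.
INDEX = [(embed(text), label) for text, label in LABELED]


def classify(text):
    """Return (label, score) of the nearest labeled example by cosine similarity."""
    query = embed(text)
    score, label = max(
        ((cosine(query, vec), lab) for vec, lab in INDEX),
        key=lambda pair: pair[0],
    )
    return label, score


if __name__ == "__main__":
    print(classify("grading exams"))
```

Because each prediction points back to a specific labeled example and a similarity score, the result can be inspected directly, and new categories can be added by extending the labeled set rather than by retraining. A score of 0.0 means no example shares any token with the query, so a real system would likely apply a minimum-similarity threshold before accepting a label.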