R. Santos Ríos, A. Pérez Bote, J. Paz Ruza, J. Vilares Ferro

This article presents the latest developments in automatic coding within the CIDMEFEO project (Data Science and Engineering for the Improvement of Official Statistics). Our goal is to develop a deep learning automatic coder for multi-class classification. The work focuses on the Spanish National Classification of Economic Activities (CNAE), which organizes companies' economic activities into a four-level hierarchy comprising 664 final categories. CNAE was chosen as a case study because of both its relevance and its complexity. Our proposal combines deep learning techniques with LLMs, which are used to generate synthetic training data from the category descriptions. We then employ BERT-type models for classification, including multi-class and hierarchical approaches. Our results show improvements in classification performance of up to 15% when the generated data are used, while also increasing efficiency (by reducing computational costs compared with using LLMs) and security (by reducing the use of real data).
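The abstract does not say how the hierarchical approach combines the levels of CNAE. One common option is to score every final category by the joint probability of its whole path through the hierarchy. The following is a minimal sketch of that idea, not the authors' method. It assumes plain 4-digit class codes (division = first 2 digits, group = first 3), omits the lettered section level because that requires a lookup table, and takes per-level probabilities from hypothetical classifier heads. The function names `ancestors` and `hierarchical_decode` and all numbers are illustrative.

```python
import math
from typing import Dict, List, Tuple


def ancestors(code: str) -> Tuple[str, str, str]:
    """Split a 4-digit class code into (division, group, class).

    Assumption: codes are plain digit strings such as "6201"; the section
    level (a letter) needs a lookup table and is left out of this sketch.
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit code, got {code!r}")
    return code[:2], code[:3], code


def hierarchical_decode(
    leaves: List[str],
    level_probs: List[Dict[str, float]],
) -> Tuple[str, float]:
    """Return the leaf whose full path has the highest joint log-probability.

    level_probs holds one dict per level (division, group, class) mapping a
    label to the probability a hypothetical per-level classifier assigned to
    it. Summing log-probabilities along each path keeps the predicted
    division, group and class mutually consistent.
    """
    best_code, best_score = "", -math.inf
    for code in leaves:
        score = 0.0
        for label, probs in zip(ancestors(code), level_probs):
            # Labels missing from a head get a small floor instead of log(0).
            score += math.log(max(probs.get(label, 0.0), 1e-12))
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


if __name__ == "__main__":
    # Made-up scores for illustration only, not results from the paper.
    leaves = ["4711", "4719", "6201", "6202"]
    level_probs = [
        {"47": 0.30, "62": 0.70},                                  # division head
        {"471": 0.30, "620": 0.70},                                # group head
        {"4711": 0.55, "4719": 0.05, "6201": 0.25, "6202": 0.15},  # class head
    ]
    flat = max(level_probs[2], key=level_probs[2].get)
    best, _ = hierarchical_decode(leaves, level_probs)
    print("flat argmax:", flat)   # 4711
    print("hierarchical:", best)  # 6201
```

With these invented scores, a flat arg-max over the class head alone picks 4711, whereas path-level decoding picks 6201 because its division and group are much more probable. Whether this helps in practice depends on how well calibrated the per-level heads are.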

Keywords: Deep learning, machine learning, text classification, transformers, CNAE

Scheduled

SI: Proposals from Data Science and Engineering for Specific Problems in Official Statistics
September 4, 2026, 3:30 PM
Room 21


Other papers in the same session

Seasonal Adjustment in the Presence of Structural Breaks in Official Statistics

C. Amoroso, S. J. Koopman, C. García-Martos, G. Aneiros, J. A. Vilar Fernández, M. Francisco-Fernández, M. Oviedo

Estimation of Tourism Expenditure Reallocation Matrices Using Overnight-Stay and Card-Payment Data

D. Frade-Amil, M. Oviedo de la Fuente, S. Naya, J. Tarrío-Saavedra, L. Carpente, M. Francisco-Fernández

Autoencoders for Data Imputation in Spatio-Temporal Series: An Application to Forecasting the Industrial Turnover Index

A. Juncal, O. Fontenla Romero, B. Guijarro Berdiñas, E. Hernández Pereira, B. Acereda Serrano, S. Barragán Andres, E. Rosa Perez, J. M. Martin Moral
