Application of deep learning techniques to automatic classification: CNAE as a case study
R. Santos Ríos, A. Pérez Bote, J. Paz Ruza, J. Vilares Ferro
This article presents the latest developments in automatic coding within the CIDMEFEO project (Data Science and Engineering for the Improvement of Official Statistics). Our goal is to develop a deep-learning-based automatic coder for multiclass classification. The work focuses on the Spanish National Classification of Economic Activities (CNAE), which organizes companies' economic activities into a four-level hierarchy comprising 664 final categories. CNAE was chosen as a case study because of both its relevance and its complexity. Our proposal combines deep learning techniques with LLMs, which are used to generate synthetic training data from the category descriptions. We then employ BERT-type models for classification, following both multiclass and hierarchical approaches. Our results show that using the generated data improves classification performance by up to 15%, while also increasing efficiency (by reducing computational costs compared with using LLMs) and security (by reducing the use of real data).
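The abstract mentions hierarchical classification without giving details. As one illustration, and not necessarily the authors' method, a hierarchical decoder can sum a flat classifier's class-level probabilities up the code tree and then pick the most probable node level by level, which guarantees a consistent path through the hierarchy. The sketch below assumes CNAE codes written as four digits without dots (division = first 2 digits, group = first 3, class = all 4). It omits the section (letter) level because that level needs a division-to-section lookup table, and it uses made-up probabilities and illustrative codes in place of real BERT outputs. The helper names `aggregate` and `top_down_decode` are hypothetical.

```python
from collections import defaultdict


def level_prefixes(code: str) -> list[str]:
    """Division (2-digit), group (3-digit) and class (4-digit) prefixes of a code.

    Assumption: codes are written as four digits without dots; the section
    (letter) level is not modelled here.
    """
    return [code[:2], code[:3], code[:4]]


def aggregate(leaf_probs: dict[str, float]) -> dict[str, float]:
    """Sum class-level probabilities into every ancestor node of the tree."""
    totals: dict[str, float] = defaultdict(float)
    for code, p in leaf_probs.items():
        for prefix in level_prefixes(code):
            totals[prefix] += p
    return dict(totals)


def top_down_decode(leaf_probs: dict[str, float]) -> list[str]:
    """Choose the most probable division, then group, then class under it."""
    totals = aggregate(leaf_probs)
    path: list[str] = []
    parent = ""
    for length in (2, 3, 4):
        # Only consider children of the node chosen at the previous level.
        candidates = {
            c: p for c, p in totals.items()
            if len(c) == length and c.startswith(parent)
        }
        parent = max(candidates, key=candidates.get)
        path.append(parent)
    return path


if __name__ == "__main__":
    # Hypothetical softmax output of a flat classifier over class codes.
    probs = {"4791": 0.35, "6201": 0.30, "6202": 0.20, "6209": 0.15}
    print(max(probs, key=probs.get))  # flat argmax: 4791
    print(top_down_decode(probs))     # ['62', '620', '6201']
```

In this toy case the flat argmax picks `4791`, whereas the top-down decoder picks `6201` because division `62` accumulates more probability mass (0.65 versus 0.35). Whether this kind of aggregation helps on real CNAE data is an empirical question the sketch does not answer.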
Keywords: deep learning, machine learning, text classification, transformers, CNAE
Scheduled
SI: Proposals from data science and engineering for specific problems in Official Statistics
4 September 2026, 15:30
Room 21
Other papers in the same session
F. Hermo García, Á. Gómez García, C. Dafonte Vázquez
C. Amoroso, S. J. Koopman, C. García-Martos, G. Aneiros, J. A. Vilar Fernández, M. Francisco-Fernández, M. Oviedo
D. Frade-Amil, M. Oviedo de la Fuente, S. Naya, J. Tarrío-Saavedra, L. Carpente, M. Francisco-Fernández
A. Juncal, O. Fontenla Romero, B. Guijarro Berdiñas, E. Hernández Pereira, B. Acereda Serrano, S. Barragán Andres, E. Rosa Perez, J. M. Martin Moral