Distance-based dimensionality reduction for big data
Dimensionality reduction (DR) involves projecting a high-dimensional dataset into a lower-dimensional space. Many DR techniques have been proposed, and most of them are based on the inter-individual distance matrix. However, using the full distance matrix becomes impractical when the number of individuals is very large, because computation time and memory requirements grow at least quadratically with that number. Although there are algorithms that extend classical multidimensional scaling (MDS) to big data, many rely on properties specific to classical MDS and therefore cannot be applied to other DR methods. One exception is the divide-and-conquer algorithm, which we adapt in this work for use with any generic distance-based DR method. We implemented a generalized Python framework for distance-based DR methods that uses the divide-and-conquer strategy to reduce time and memory complexities. We tested our framework with non-classical MDS, local MDS, Isomap, and t-SNE.
Keywords: Divide-and-conquer; Procrustes transformations; non-classical MDS; local MDS; Isomap; t-SNE
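To make the divide-and-conquer idea concrete, the following is a minimal sketch under stated assumptions, not the framework described in this work. It assumes the data are split into random partitions, that each later partition is embedded together with a few anchor points taken from the first partition, and that a rigid Procrustes transformation (rotation or reflection plus translation) aligns each partial embedding with the first one. The names `classical_mds`, `procrustes_rigid`, `divide_conquer_dr`, `part_size`, and `n_anchor` are hypothetical, and classical MDS is used only as a simple stand-in for whichever distance-based DR method is plugged in.

```python
import numpy as np

def classical_mds(X, k):
    # Stand-in distance-based DR method (assumption): classical MDS
    # on the squared Euclidean distance matrix of the rows of X.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]         # k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def procrustes_rigid(A, B):
    # Orthogonal matrix R and translation t minimizing ||A @ R + t - B||_F.
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - mu_a).T @ (B - mu_b))
    R = U @ Vt
    return R, mu_b - mu_a @ R

def divide_conquer_dr(X, dr_method, k, part_size, n_anchor, seed=0):
    # Hypothetical driver: only part_size + n_anchor points are ever
    # embedded at once, so no full n-by-n distance matrix is formed.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    parts = [perm[i:i + part_size] for i in range(0, n, part_size)]
    Y = np.empty((n, k))
    first = parts[0]
    Y[first] = dr_method(X[first], k)        # reference configuration
    anchor = first[:n_anchor]                # points shared by every sub-problem
    for part in parts[1:]:
        idx = np.concatenate([anchor, part])
        Z = dr_method(X[idx], k)             # embed anchors + partition together
        R, t = procrustes_rigid(Z[:n_anchor], Y[anchor])
        Y[part] = Z[n_anchor:] @ R + t       # map into the reference frame
    return Y
```

A call such as `divide_conquer_dr(X, classical_mds, k=2, part_size=50, n_anchor=10)` would embed a dataset `X` in two dimensions. In this sketch `n_anchor` must not exceed `part_size`, and any callable with the signature `dr_method(data, k)` could replace `classical_mds`.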