Description: Rank Selection in Non-negative Matrix Factorization: systematic comparison and a new MAD metric

This title appears in the Scientific Report: 2019

Personal Name(s):	Muzzarelli, Laura (Corresponding author)
	Weis, Susanne / Eickhoff, Simon / Patil, Kaustubh
Contributing Institute:	Gehirn & Verhalten (Brain & Behaviour); INM-7
Imprint:	2019
Physical Description:	7
Conference:	2019 International Joint Conference on Neural Networks, Budapest (Hungary), 2019-07-14 - 2019-07-19
Document Type:	Contribution to a conference proceedings
Research Program:	Human Brain Project Specific Grant Agreement 2; Supercomputing and Modelling for the Human Brain; Theory, modelling and simulation
Link:	OpenAccess
	Publication portal JuSER

Please use the identifier: http://hdl.handle.net/2128/21854 in citations.

Abstract—Non-Negative Matrix Factorization (NMF) is a powerful dimensionality reduction and factorization method that provides a part-based representation of the data. In the absence of a priori knowledge about the latent dimensionality of the data, a rank must be selected for the reduced representation. Several rank selection methods have been proposed, but there is no consensus on when each method is suitable. In this work, we propose a new metric for rank selection based on imputation cross-validation, and we systematically compare it against six other metrics while assessing the effects of data properties. Using synthetic datasets with different properties, our work provides critical evidence that most methods fail to identify the true rank. We show that the properties of the data heavily impact the performance of the different methods. Imputation-based metrics, including our new MADimput, provided the best accuracy irrespective of the data type, but no solution worked perfectly in all circumstances. One should therefore carefully assess the characteristics of one's dataset in order to identify the most suitable metric for rank selection.

Keywords— non-negative matrix factorization, rank selection, cross-validation.
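The abstract refers to rank selection by imputation cross-validation: hide a subset of matrix entries, fit NMF on the remaining ones, and score each candidate rank by how well it reconstructs the hidden entries. The sketch below illustrates only that general idea under stated assumptions. It is not the authors' implementation and does not reproduce the MADimput metric. It assumes a masked (weighted) NMF fitted with Lee and Seung style multiplicative updates, a random 10% hold-out, and mean squared error on the held-out entries as the score. The function names `masked_nmf`, `imputation_cv_error` and `select_rank` are hypothetical.

```python
import numpy as np


def masked_nmf(X, mask, rank, n_iter=500, eps=1e-10, seed=0):
    """Weighted NMF, X ~ W @ H, fitted with multiplicative updates.

    Only entries where mask == 1 contribute to the squared-error loss, so
    masked-out entries are treated as missing (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    Xm = mask * X  # observed entries only; held-out ones are zeroed
    for _ in range(n_iter):
        WH = W @ H
        H *= (W.T @ Xm) / (W.T @ (mask * WH) + eps)
        WH = W @ H
        W *= (Xm @ H.T) / ((mask * WH) @ H.T + eps)
    return W, H


def imputation_cv_error(X, rank, holdout=0.1, n_iter=500, seed=0):
    """Mean squared error on randomly held-out entries (assumed score, not MADimput)."""
    rng = np.random.default_rng(seed)
    # 1 = observed (used for fitting), 0 = held out (used for scoring)
    mask = (rng.random(X.shape) >= holdout).astype(float)
    W, H = masked_nmf(X, mask, rank, n_iter=n_iter, seed=seed + 1)
    held = mask == 0
    return float(np.mean((X[held] - (W @ H)[held]) ** 2))


def select_rank(X, ranks, **kwargs):
    """Return the candidate rank with the lowest held-out error, plus all errors."""
    errors = {k: imputation_cv_error(X, k, **kwargs) for k in ranks}
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic non-negative data with a known latent rank of 3
    X = rng.random((60, 3)) @ rng.random((3, 40))
    best, errors = select_rank(X, range(1, 7), n_iter=1000)
    print("selected rank:", best)
    for k, e in errors.items():
        print(f"rank {k}: held-out MSE = {e:.2e}")
```

Because the same seed is passed for every candidate rank, all ranks are scored on the same held-out entries, which keeps the comparison between them fair. As the abstract cautions, the lowest held-out error is not guaranteed to recover the true rank, so a single random mask should be treated as indicative only.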