Artificial Intelligence (AI) in supermarkets is advancing rapidly thanks to recent progress in deep learning. An important effort in the retail sector is the development of AI solutions for smart stores, mainly to improve product recognition. In this paper, we present a new framework that addresses multi-view image classification through multiple clustering. The proposed framework combines a pre-trained Vision Transformer with Bayesian non-parametric multiple clustering. We propose a Markov chain Monte Carlo (MCMC) inference approach that learns the column-partition and the row-partitions. This method infers multiple clustering solutions and automatically determines the number of clusters. Our method yields promising results on a multi-view image dataset. These results highlight both the power of pre-trained Vision Transformers combined with the multiple clustering algorithm and the usefulness of Bayesian non-parametric modeling, which performs model selection automatically.
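To make the column-partition and row-partitions concrete, the sketch below draws them from a nested Chinese Restaurant Process (CRP), a standard Bayesian non-parametric prior over partitions. This rests on several assumptions. The prior is not specified here and is chosen only for illustration. Columns are taken to be feature dimensions (e.g. entries of a Vision Transformer embedding) grouped into views, and rows are images clustered separately within each view. The sketch only samples from the prior and does not implement the MCMC inference. The function names and the concentration parameters `alpha_view` and `alpha_row` are hypothetical.

```python
import numpy as np


def crp_partition(n_items, alpha, rng):
    """Sample a partition of n_items from a Chinese Restaurant Process.

    Returns one integer label per item. Labels are contiguous (0..K-1), and the
    number of clusters K is random rather than fixed in advance.
    """
    labels = np.empty(n_items, dtype=int)
    counts = []  # counts[k] = number of items already assigned to cluster k
    for i in range(n_items):
        # Existing cluster k has probability counts[k] / (i + alpha);
        # a new cluster has probability alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels[i] = k
    return labels


def sample_multiple_clustering_prior(n_rows, n_cols, alpha_view, alpha_row, seed=0):
    """Hypothetical prior draw: one column-partition plus one row-partition per view."""
    rng = np.random.default_rng(seed)
    # Column-partition: assign each feature column to a view.
    column_partition = crp_partition(n_cols, alpha_view, rng)
    n_views = column_partition.max() + 1
    # Row-partitions: within each view, cluster the rows (images) independently.
    row_partitions = [crp_partition(n_rows, alpha_row, rng) for _ in range(n_views)]
    return column_partition, row_partitions


if __name__ == "__main__":
    cols, rows = sample_multiple_clustering_prior(n_rows=100, n_cols=32,
                                                  alpha_view=1.0, alpha_row=1.0)
    print("number of views:", cols.max() + 1)
    for v, r in enumerate(rows):
        print(f"view {v}: {r.max() + 1} clusters")
```

Because the CRP leaves the number of groups random, neither the number of views nor the number of clusters per view has to be set by hand. An MCMC sampler would instead resample these labels conditioned on the data, which is how this kind of model can select the number of clusters automatically.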