Deep multi-view clustering exploits data observed from multiple perspectives to partition samples into their respective categories. However, existing methods are often ineffective during the feature fusion phase, particularly at isolating the features most useful for clustering. To address this problem, this paper proposes a multi-view clustering method based on multi-scale attention and a loss penalty mechanism (MALPMVC). MALPMVC first uses an autoencoder to extract latent feature representations, then applies multi-scale attention to emphasize informative feature channels and spatial regions. A loss penalty mechanism subsequently focuses the model on hard-to-classify samples, improving its ability to learn discriminative features from them. Finally, the fused features are fed into the clustering module, which partitions the samples into clusters. Extensive experiments show that MALPMVC outperforms 10 competitive clustering approaches, including CoMVC, MFLVC, and GCFAggMVC. Moreover, as the number of views increases, the model effectively counteracts the adverse influence of mutually conflicting views. In particular, on the Caltech-4V and Caltech-5V datasets it exceeds GCFAggMVC in clustering accuracy by 12.36% and 9.21%, respectively.
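The abstract does not give the exact form of the loss penalty mechanism; a focal-loss-style weighting is one common way to make a model concentrate on hard-to-classify samples, and the sketch below illustrates that idea only. The function name `penalized_loss` and the focusing parameter `gamma` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def penalized_loss(probs, gamma=2.0, eps=1e-12):
    """Focal-loss-style penalty sketch (an assumption, not the paper's
    exact formulation): the cross-entropy of each sample is scaled by
    (1 - p)^gamma, so confident (easy) samples are down-weighted and
    uncertain (hard) samples dominate the gradient."""
    probs = np.clip(probs, eps, 1.0)
    ce = -np.log(probs)                 # per-sample cross-entropy
    weight = (1.0 - probs) ** gamma     # near 0 when the model is confident
    return weight * ce

# An easy sample (predicted probability 0.95 for the true cluster)
# versus a hard one (probability 0.40): the penalty keeps the hard
# sample's loss large while nearly zeroing out the easy sample's.
easy = penalized_loss(np.array([0.95]))[0]
hard = penalized_loss(np.array([0.40]))[0]
```

Raising `gamma` sharpens this effect, pushing the training signal further toward the hard samples; `gamma = 0` recovers plain cross-entropy.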