Sentiment analysis of video comment text has important application value in modern social media and opinion management. By conducting sentiment analysis on video comments, we can better understand the emotional tendencies of users, optimise content recommendation, and manage public opinion effectively, which is of great practical significance for pushing video content to users. Current sentiment analysis methods for video comment text suffer from problems such as ambiguous understanding, complex construction, and low accuracy. To address these problems, this paper proposes a sentiment analysis method based on the M-S multimodal sentiment model. First, it briefly describes the existing methods of video comment text sentiment analysis and their advantages and disadvantages; then it studies the key steps of multimodal sentiment analysis and proposes a multimodal sentiment model built on the M-S multimodal sentiment model; finally, simulation experiments on experimental data of Communist Youth League video comment text verify the efficiency of the method. The results show that the proposed model improves the accuracy and real-time performance of prediction, and addresses two problems of existing multimodal sentiment analysis methods for video comment text: the time complexity of the model is too high for practical application, and the interrelationships and mutual influences among the multimodal information are not considered.