This study presents a systematic framework that uses large language models (LLMs) to identify and evaluate destination attractors based on travellers' intentions and emotions, as inferred from user-generated images. The framework (1) directs the model to identify attractors related to themes, environments, and activities, and (2) assesses travellers' interest in each attractor by measuring its importance, focal distance, character portrayal, and associated emotions, drawing on a multimodal social semiotic perspective. To optimise model performance and produce quantifiable outcomes, the study applies dedicated strategies at four stages: prompt engineering, model validation, fine-tuning, and empirical testing. The case study demonstrates that the fine-tuned model is effective both in identifying attractors and in inferring travellers' subjective intentions and emotions. This suggests that the framework could be applied to similar image-based studies that seek to combine identification and evaluation in a single analysis. Compared with conventional computational methods, LLMs support data-driven decision-making, strengthen analytical capabilities, improve operational efficiency, and help optimise destination management.