In today's technological and multimodal reality, discussing language teaching from the perspective of multimodality, specifically with regard to reading and critical visual literacy, has become both pertinent and unavoidable. Multimodal resources, especially images, are widely used across social spheres, which raises the question of how schools and teacher education courses have addressed this theme in the classroom. This paper investigates how pre-service English language teachers approach image-based texts during the supervised teaching practicum. It is an interpretative and descriptive study that follows a qualitative approach to data analysis. The data analyzed consist of interviews with UERN students who had completed the first phase of the supervised practicum. The data show that the trainees understand the image only as a tool that facilitates language acquisition, demonstrating a limited view of reading and critical visual literacy. This points to a lack of knowledge about multimodal literacy among the trainees, which suggests the need for this undergraduate course in Letters/English to review its concepts and approaches to the treatment of multimodal texts in the classroom, since images should not function as a translation of the verbal text or as a pretext for exploring grammar, but as texts with their own ideas and meanings.