Effective representation learning models are critical for knowledge computation and the practical application of knowledge graphs. However, most existing knowledge graph representation learning models focus primarily on structured triple-based entities, neglecting or underutilizing additional multimodal information such as entity types, images, and texts. To address this issue, we propose a novel framework, Multi-Modal Knowledge Graph Representation Learning (M2KGRL), which integrates multimodal features derived from structured triples, images, and textual data to enhance knowledge graph representations. M2KGRL leverages three adapted technologies (i.e., VGG16, BERT, and SimplE) to extract diverse features from these modalities. Additionally, it employs a specially designed autoencoder for feature fusion and a similarity-based scoring function to guide the representation learning process. The proposed framework is evaluated through extensive experiments on two widely used datasets (FB15K and WN18) against ten representative baseline methods (e.g., ComplEx, TransAE). Experimental results demonstrate that M2KGRL achieves superior performance in most scenarios; for instance, it outperforms TransAE with a 1.8% improvement in Hit@10, showcasing its ability to predict links more accurately by incorporating visual and textual information. These findings highlight the potential of M2KGRL in advancing multimodal knowledge graph representation learning.
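
To make the fusion idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how structural, visual, and textual entity features could be concatenated and passed through an autoencoder to obtain a fused embedding plus a reconstruction signal. The class name, layer sizes, and feature dimensions (SimplE-style 200-d structural features, VGG16 fc-layer 4096-d image features, BERT 768-d text features) are assumptions for demonstration only.

```python
# Illustrative sketch, not the authors' code: fusing structural, visual, and
# textual entity features with a simple autoencoder, in the spirit of M2KGRL.
# All dimensions and layer sizes below are assumed for demonstration.
import torch
import torch.nn as nn

class MultimodalFusionAutoencoder(nn.Module):
    def __init__(self, struct_dim=200, img_dim=4096, txt_dim=768, latent_dim=200):
        super().__init__()
        in_dim = struct_dim + img_dim + txt_dim
        # Encoder compresses the concatenated multimodal features into a
        # single fused entity embedding.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, latent_dim),
        )
        # Decoder reconstructs the original features; the reconstruction loss
        # encourages the fused embedding to retain modality-specific information.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, in_dim),
        )

    def forward(self, struct_feat, img_feat, txt_feat):
        x = torch.cat([struct_feat, img_feat, txt_feat], dim=-1)
        z = self.encoder(x)       # fused multimodal entity embedding
        recon = self.decoder(x.new_tensor(z) if False else z)  # reconstruction
        return z, recon

# Toy usage with random features for a single entity.
struct_feat = torch.randn(1, 200)   # e.g., SimplE-style structural embedding
img_feat = torch.randn(1, 4096)     # e.g., VGG16 fully-connected features
txt_feat = torch.randn(1, 768)      # e.g., BERT [CLS] text features
model = MultimodalFusionAutoencoder()
embedding, reconstruction = model(struct_feat, img_feat, txt_feat)
recon_loss = nn.functional.mse_loss(
    reconstruction, torch.cat([struct_feat, img_feat, txt_feat], dim=-1)
)
```

In such a sketch, the fused embedding `z` would be scored against candidate triples (e.g., with a similarity-based scoring function as the abstract describes), while the reconstruction loss regularizes the fusion step.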