Unimodal biometric techniques suffer from several limitations, such as non-universality, noisy data, unacceptable error rates, and vulnerability to spoof attacks. To address these problems, multimodal biometrics combine multiple modalities, such as iris, retina, fingerprint, and face, to provide more secure authentication. Considerable research is currently directed at improving the security of such sensitive data. This paper proposes multimodal secure biometrics (MSB) using a deep learning (DL) framework. Retina and fingerprint datasets are used for evaluation, with the goal of protecting the multimodal biometric template. The proposed work consists of three modules: the pre-processing module, the fusion feature-extraction module, and the diagonal hash compression (DHC) module. First, the fingerprint and retinal images are pre-processed to remove noise. Deep features are then extracted from the fingerprint and retina images using an attention-based EfficientNet-B7 model. The network weights are further optimized with sparrow search optimization (SSO), and the features are fused by DHC. The experiments are implemented in Python. Performance metrics are evaluated on the two benchmark datasets and compared with existing research works. The proposed model achieves an accuracy of 99.94% and a minimized error rate of 0.12. The results demonstrate that the proposed technique helps protect the confidentiality of the original users' data.
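To make the pipeline concrete, the following is a minimal sketch of the fusion feature-extraction stage in Python with TensorFlow/Keras, under stated assumptions rather than the authors' released implementation: the input resolution, the squeeze-and-excitation style channel-attention block, the shared backbone across modalities, and plain concatenation standing in for the paper's DHC fusion are all illustrative choices, and the SSO tuning of the network weights is not shown.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import EfficientNetB7

NUM_CHANNELS = 2560  # channels in EfficientNet-B7's final feature map


def channel_attention(feature_map, name):
    """Squeeze-and-excitation style attention over the backbone channels
    (an assumed stand-in for the paper's attention mechanism)."""
    squeeze = layers.GlobalAveragePooling2D(name=f"{name}_squeeze")(feature_map)
    excite = layers.Dense(NUM_CHANNELS // 16, activation="relu",
                          name=f"{name}_reduce")(squeeze)
    excite = layers.Dense(NUM_CHANNELS, activation="sigmoid",
                          name=f"{name}_excite")(excite)
    scaled = layers.Multiply(name=f"{name}_rescale")(
        [feature_map, layers.Reshape((1, 1, NUM_CHANNELS))(excite)])
    return layers.GlobalAveragePooling2D(name=f"{name}_features")(scaled)


# One EfficientNet-B7 backbone shared by both modalities for brevity;
# separate per-modality backbones would also be a reasonable design.
backbone = EfficientNetB7(include_top=False, weights="imagenet")

finger_in = layers.Input(shape=(224, 224, 3), name="fingerprint")  # assumed size
retina_in = layers.Input(shape=(224, 224, 3), name="retina")

finger_feat = channel_attention(backbone(finger_in), "fingerprint")
retina_feat = channel_attention(backbone(retina_in), "retina")

# Feature-level fusion: plain concatenation here; the paper instead
# compresses and fuses the feature vectors with diagonal hash compression.
fused = layers.Concatenate(name="fused_template")([finger_feat, retina_feat])

fusion_model = Model([finger_in, retina_in], fused, name="msb_fusion_sketch")
fusion_model.summary()
```

In this sketch the fused vector would serve as the protected template candidate; in the proposed work that role is filled by the DHC output, with SSO applied to tune the network weights before fusion.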