Hand gestures play an important role in communication among people with hearing and speech impairments, and hand gesture recognition (HGR) is a core component of human–computer interaction (HCI). Most reported HGR techniques perform poorly when the background is complex. Moreover, according to the literature, most existing HGR methods are evaluated on only a few selected gestures with inter-class similarity rather than on all gesture classes. This paper proposes a two-phase deep learning-based HGR system that mitigates the complex background issue and considers all gesture classes. In the first phase, the Inception V3 architecture is modified to reduce its computational resource requirements; the resulting network is named mIV3Net (modified Inception V3 network). In the second phase, mIV3Net is fine-tuned to give more attention to prominent features. As a result, more abstract representations are used for gesture recognition, which makes the proposed system more discriminative. The efficacy and generalization of the proposed two-phase HGR system are validated experimentally on five publicly available standard datasets: MUGD, ISL, ArSL, NUS-I, and NUS-II. The proposed system achieves accuracies of 97.14%, 99.3%, 97.4%, 99%, and 99.8% on these datasets, respectively, improving on state-of-the-art HGR systems by 12.58%, 2.54%, 2.73%, 0.56%, and 2.02%, respectively.