Convolutional neural networks (CNNs) have revolutionized motor decoding from electroencephalographic (EEG) signals, demonstrating their ability to outperform traditional machine learning, especially in Brain-Computer Interface (BCI) applications. Motor decoding improves further when other recording modalities (e.g., electromyography, EMG) are processed together with EEG signals. However, multi-modal algorithms for decoding hand movements have mainly been applied to simple movements (e.g., wrist flexion/extension), while their adoption for decoding complex movements (e.g., different grip types) remains under-investigated. In this study, we recorded EEG and EMG signals from 12 participants while they performed a delayed reach-to-grasping task towards one of four possible objects (a handle, a pin, a card, and a ball), and we addressed multi-modal EEG+EMG decoding with a dual-branch CNN in which each branch was based on EEGNet. The performance of the multi-modal approach was compared to mono-modal baselines based on EEG or EMG alone. The multi-modal EEG+EMG pipeline outperformed the EEG-based pipeline during movement initiation and the EMG-based pipeline during motor preparation. Finally, the multi-modal approach accurately discriminated between grip types throughout most of the task, especially from movement initiation onward. Our results further validate multi-modal decoding for potential future BCI applications aimed at achieving a more natural user experience.
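To make the architecture described above concrete, the following is a minimal PyTorch sketch of a dual-branch, EEGNet-style classifier in which EEG and EMG are processed by separate branches and the resulting features are fused for grip-type classification. The channel counts, kernel lengths, pooling factors, and concatenation-based fusion are illustrative assumptions for this sketch, not the exact configuration used in the study.

```python
# Minimal sketch of a dual-branch EEGNet-style classifier (assumed hyperparameters).
import torch
import torch.nn as nn


class EEGNetBranch(nn.Module):
    """One EEGNet-style branch: temporal conv -> depthwise spatial conv -> separable conv."""

    def __init__(self, n_channels, F1=8, D=2, F2=16, kernel_length=64):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(1, F1, (1, kernel_length), padding=(0, kernel_length // 2), bias=False),
            nn.BatchNorm2d(F1),
            # Depthwise convolution across recording channels (spatial filtering)
            nn.Conv2d(F1, F1 * D, (n_channels, 1), groups=F1, bias=False),
            nn.BatchNorm2d(F1 * D),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
            nn.Dropout(0.25),
        )
        self.block2 = nn.Sequential(
            # Separable convolution: depthwise temporal conv followed by pointwise conv
            nn.Conv2d(F1 * D, F1 * D, (1, 16), padding=(0, 8), groups=F1 * D, bias=False),
            nn.Conv2d(F1 * D, F2, (1, 1), bias=False),
            nn.BatchNorm2d(F2),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
            nn.Dropout(0.25),
        )

    def forward(self, x):
        # x: (batch, 1, channels, time samples) -> flattened feature vector
        return self.block2(self.block1(x)).flatten(start_dim=1)


class DualBranchCNN(nn.Module):
    """EEG and EMG pass through separate branches; features are concatenated and classified."""

    def __init__(self, eeg_channels=32, emg_channels=8, samples=500, n_classes=4):
        super().__init__()
        self.eeg_branch = EEGNetBranch(eeg_channels)
        self.emg_branch = EEGNetBranch(emg_channels)
        # Infer the fused feature size with a dummy forward pass
        with torch.no_grad():
            n_feat = (
                self.eeg_branch(torch.zeros(1, 1, eeg_channels, samples)).shape[1]
                + self.emg_branch(torch.zeros(1, 1, emg_channels, samples)).shape[1]
            )
        self.classifier = nn.Linear(n_feat, n_classes)

    def forward(self, eeg, emg):
        fused = torch.cat([self.eeg_branch(eeg), self.emg_branch(emg)], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = DualBranchCNN()
    eeg = torch.randn(2, 1, 32, 500)  # (batch, 1, EEG channels, time samples)
    emg = torch.randn(2, 1, 8, 500)   # (batch, 1, EMG channels, time samples)
    print(model(eeg, emg).shape)      # torch.Size([2, 4]) -> logits over the four grip types
```

In this sketch, fusion happens late (feature concatenation before a single linear classifier); other fusion points or weighting schemes could equally be used, and the mono-modal baselines correspond to training a single branch with its own classification head.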