3D object detection is a crucial task in computer vision and autonomous systems, with wide applications in robotics, autonomous driving, and augmented reality. With the advancement of input devices, researchers have proposed using multimodal information to improve detection accuracy. However, effectively integrating 2D and 3D features to harness their complementary nature remains a challenge. In this paper, we observe that the complementary nature of geometric and visual texture information can effectively strengthen feature fusion, which plays a key role in detection. To this end, we propose a Cross-Dimensional Attention Fusion-based indoor 3D object detection method (CDAF3D). This method dynamically learns geometric information together with the corresponding 2D image texture details through a cross-dimensional attention mechanism, enabling the model to capture and integrate spatial and textural information effectively. Additionally, because intersecting entities with different labels are physically unrealistic in 3D object detection, we further propose a Preventive 3D Intersect Loss (P3DIL), which improves detection accuracy by penalizing intersections between objects of different labels. We evaluate the proposed CDAF3D on the SUN RGB-D and ScanNet v2 datasets. Our method achieves 78.2 mAP@0.25 and 66.5 mAP@0.50 on ScanNet v2, and 70.3 mAP@0.25 and 54.1 mAP@0.50 on SUN RGB-D, outperforming all multi-sensor-based methods at 3D IoU thresholds of 0.25 and 0.5.