An eye, as a sensory organ, not only captures visual information but also provides critical cues about human attention and intentions. It is also possible to infer focus, interest, and even truthfulness from where one's gaze is directed. Gaze also helps in influencing social interactions and human behavior. Our gaze can reveal our engagement, attraction, and subconscious reactions, contributing to nonverbal communication and shaping interpersonal dynamics. Eye gaze direction estimation is a crucial task in many real-world applications, such as driver drowsiness detection, human-computer interaction, and assistive technologies. Traditional single-stage CNN classifiers may not be so useful in such applications due to less representation of data for particular classes. Data augmentation can help up-to some extent, but we need multiple novel ideas to be employed to overcome these limitations. In this work, we proposed a hierarchical CNN architecture for eye gaze direction classification, that integrates predictions of three distinct stages to create an ensemble learning based classifier. The proposed model is trained on a dataset of eye images with ground-truth class labels for each gaze direction. Experimental results demonstrate that our proposed method achieved improved classification results on the eye gaze dataset, indicating its potential for real-world applications.