This study aimed to evaluate the reliability and discriminant validity of a newly developed in-water agility test through a progression in complexity. Thirty-eight young male water polo players were divided into elite (n = 20; 16.2 ± 0.5 years) and non-elite (n = 18; 15.8 ± 0.6 years) groups. Players performed the 20 m Freestyle Sprint Swimming Test (20 m FSST), Change of Direction Speed (CODS), and the Functional Agility Test without (FAT) and with shooting (FATS). The cognitive deficit (CD) and technical deficit (TD) were calculated by subtracting the CODS time from the FAT time and the FAT time from the FATS time, respectively. Excellent reliability was found for the 20 m FSST, CODS, and FAT (ICC from 0.954 to 0.982) and good reliability for the FATS (ICC = 0.838). Elite players outperformed non-elite players in the 20 m FSST (g = 1.25), CODS (g = 1.08), and FAT (g = 1.14). There were no differences in FATS (g = 0.29), CD (g = 0.14), TD (g = 0.50), or shooting efficiency (g = 0.04). Increasing the protocol's complexity resulted in similar performance between groups, eliminating the differences in physicality. Although the protocol was designed to mimic in-game performance, the results suggest discriminant validity only for physical attributes among young players, indicating that the testing protocol needs further improvement to better meet the cognitive and technical demands.
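
The two deficit scores described above are plain differences between test times, and a minimal sketch may make the direction of each subtraction explicit. The function name `deficits`, its parameter names, and the example times are hypothetical and chosen only for illustration; they are not values or code from the study.

```python
def deficits(cods_s: float, fat_s: float, fats_s: float) -> tuple[float, float]:
    """Return (cognitive deficit, technical deficit) in seconds.

    CD = FAT time - CODS time   (time added by the reactive component)
    TD = FATS time - FAT time   (time added by the shooting component)
    """
    cognitive_deficit = fat_s - cods_s
    technical_deficit = fats_s - fat_s
    return cognitive_deficit, technical_deficit


# Hypothetical times in seconds, for illustration only (not study data).
cd, td = deficits(cods_s=10.0, fat_s=11.5, fats_s=13.0)
print(f"CD = {cd:.2f} s, TD = {td:.2f} s")
```

With this sign convention, a larger deficit means that the added task component cost the player more time.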