This article aims to assess the effect of embodied interaction on attention while solving spatio-visual navigation problems. It presents a method that links an operator's physical interaction, feedback, and attention. Attention is inferred through Bayesian Attentional Networks (BANs), structures that describe the cause-effect relationship between attention and physical action. A utility function is then used to determine the best combination of interaction and feedback modalities. Experiments involving five physical interaction modalities (vision-based gesture interaction, glove-based gesture interaction, speech, feet, and body stance) and two feedback modalities (visual and sound) are described. The main findings are: (i) physical expressions have an effect on the quality of the solutions to spatial navigation problems; and (ii) the combination of foot gestures with visual feedback provides the best task performance.