The development of Large Vision-Language Models (LVLMs) has been hindered by hallucinations. Existing detection methods often struggle to accurately infer the relationships between objects and their attributes, and frequently overlook the challenges posed by semantic duality. In this work, we develop METER, a novel multimodal hallucination detection method that employs a mixture of experts via tool-supported reasoning and ensembling. Specifically, our model reasons over decomposed steps derived from chain-of-thought prompts, which eliminates the need for additional manual templates and identifies hallucinated attributes step by step. We also use topics discovered from image-text pairs to disambiguate text, mitigating semantic duality. Furthermore, we investigate how incorporating external tools affects hallucination detection, and we analyze the efficacy of different tool-ensembling strategies in mitigating hallucinations. Additionally, we alleviate hallucinations by incorporating METER's explanations into the prompt. Extensive experiments demonstrate the effectiveness of our model. Our code is available at https://github.com/lambdarw/METER.
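To make the tool-ensembling idea concrete, the following is a minimal Python sketch of majority voting over expert tool verdicts. All names here (`ToolVerdict`, `ensemble_detect`, `detector_expert`) are illustrative assumptions for exposition, not METER's actual implementation; in practice each expert would wrap a real grounded tool such as an object detector or OCR module.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ToolVerdict:
    tool: str           # name of the expert tool that produced this verdict
    hallucinated: bool  # the tool's judgment on the claim
    rationale: str      # short explanation, reusable later in a mitigation prompt

def detector_expert(claim: str, image_path: str) -> ToolVerdict:
    # Hypothetical stand-in for a grounded tool (e.g., an open-vocabulary
    # detector); here it simply pretends the image contains a dog.
    found = "dog" in claim
    return ToolVerdict(
        tool="detector",
        hallucinated=not found,
        rationale="claimed object matches detections" if found
                  else "claimed object is absent from detections",
    )

def ensemble_detect(claim: str, image_path: str,
                    experts: List[Callable[[str, str], ToolVerdict]]) -> ToolVerdict:
    """Aggregate expert tool verdicts by simple majority vote (illustrative only)."""
    verdicts = [expert(claim, image_path) for expert in experts]
    votes = sum(v.hallucinated for v in verdicts)
    hallucinated = votes > len(verdicts) / 2
    rationale = "; ".join(f"{v.tool}: {v.rationale}" for v in verdicts)
    return ToolVerdict("ensemble", hallucinated, rationale)

verdict = ensemble_detect("A dog on a sofa.", "img.jpg", [detector_expert])
print(verdict.hallucinated, verdict.rationale)
```

The aggregated `rationale` string also hints at how a detector's explanation could be fed back into a prompt for hallucination mitigation, as described above.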