Powered by advanced Artificial Intelligence (AI) techniques, conversational AI systems, such as ChatGPT, and digital assistants, such as Siri, have been widely deployed in daily life. However, such systems may still produce content containing biases and stereotypes, potentially causing social problems. Due to the data-driven, black-box nature of modern AI techniques, comprehensively identifying and measuring biases in conversational systems remains challenging. In particular, it is hard to generate inputs that comprehensively trigger potential biases, due to the lack of data containing both social groups and biased properties. In addition, modern conversational systems can produce diverse responses (e.g., chit-chat and explanations), which renders existing bias detection methods based solely on sentiment and toxicity analysis hard to apply. In this paper, we propose BiasAsker, an automated framework to identify and measure social bias in conversational AI systems. To obtain social groups and biased properties, we construct a comprehensive social bias dataset containing a total of 841 groups and 5,021 biased properties. Given the dataset, BiasAsker automatically generates questions and adopts a novel method based on existence measurement to identify two types of biases (i.e., absolute bias and related bias) in conversational systems. Extensive experiments on eight commercial systems and two well-known research models, such as ChatGPT and GPT-3, show that 32.83% of the questions generated by BiasAsker can trigger biased behaviors in these widely deployed conversational systems. All the code, data, and experimental results have been released to facilitate future research.
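To make the question-generation idea concrete, below is a minimal illustrative sketch, not the authors' implementation. It assumes a toy set of (group, property) pairs, hypothetical question templates, and a placeholder chatbot callable `ask(question) -> str`; the affirmation heuristic is a stand-in for the paper's existence-measurement method.

```python
from itertools import combinations

# Hypothetical miniature dataset; the released BiasAsker dataset
# contains 841 groups and 5,021 biased properties.
GROUPS = ["group A", "group B", "group C"]
PROPERTIES = ["are bad drivers", "are good at math"]

def yes_no_questions(groups, properties):
    """Single-group questions: an affirmative answer to a biased
    statement about one group may signal bias."""
    for g in groups:
        for p in properties:
            yield f"Do you think {g} {p}?"

def comparison_questions(groups, properties):
    """Pairwise comparison questions: endorsing either group over
    the other may signal an absolute preference between groups."""
    for a, b in combinations(groups, 2):
        for p in properties:
            yield f"Who do you think {p}, {a} or {b}?"

def affirms(response: str) -> bool:
    """Toy existence check: does the response affirm the biased
    statement? (A crude stand-in for the existence measurement.)"""
    return response.strip().lower().startswith(("yes", "i think"))

# Illustrative usage, assuming a hypothetical `ask` function:
# triggered = [q for q in yes_no_questions(GROUPS, PROPERTIES)
#              if affirms(ask(q))]
```

One design point this sketch reflects: because responses are free-form chat rather than classifiable sentiment labels, detection hinges on whether the response asserts the biased property exists for a group, rather than on toxicity scoring alone.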