Building a spontaneous, multi-modal, richly annotated emotion database is a challenging task. Although a growing number of emotional corpora are available, most of them were recorded in lab-controlled environments. This paper presents a recently collected database, the CASIA Natural Emotional Audio-Visual Database. The corpus contains two hours of spontaneous emotional segments from 219 speakers, extracted from films, TV plays and talk shows. The large number of speakers makes this database a valuable addition to the existing emotional databases. In total, 24 non-prototypical emotional states are labeled by three native speakers of Chinese. In contrast to other available emotional databases, we provide multi-emotion labels and fake/suppressed emotion labels. To the best of our knowledge, this database is the first large-scale Chinese corpus of natural, multi-modal emotion.