Extreme Learning Machine (ELM), a learning framework for single hidden layer feedforward neural networks (SLFNs), has attracted extensive attention and in-depth research across many domains. It has been widely applied to tasks such as action recognition, emotion recognition, and fault diagnosis. ELM was originally proposed for "generalized" single hidden layer feedforward neural networks to overcome the challenges faced by the back-propagation (BP) learning algorithm and its variants. Recent studies show that ELM can be extended to "generalized" multilayer feedforward neural networks, in which a hidden node may itself be a subnetwork of nodes or a combination of other hidden nodes. ELM provides an efficient, unified learning framework for regression, classification, feature learning, and clustering. ELM learning theory shows that when the hidden-node parameters are generated independently of the training samples, a feedforward neural network with any nonlinear, continuous activation function can approximate any continuous target function, or any complex decision boundary in classification tasks. In ELM, the input weights and hidden biases connecting the input layer to the hidden layer can be generated randomly from any continuous probability distribution, independently of the training samples. The output weight matrix between the hidden layer and the output layer is then obtained as the minimum-norm least-squares solution of a squared-loss minimization problem, computed via the Moore-Penrose generalized inverse. The only parameter that needs to be tuned is the number of hidden-layer nodes. Theoretical studies have shown that ELM retains the universal approximation and classification capabilities of SLFNs even when working with randomly generated hidden nodes.
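The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the authors' code: the function names (`elm_train`, `elm_predict`), the uniform initialization range, and the choice of a sigmoid activation are assumptions for the sketch; only the overall scheme (random hidden-layer parameters, output weights via the Moore-Penrose pseudoinverse) follows the abstract.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Basic ELM training: random hidden layer, analytic output weights.

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    The initialization range and sigmoid activation are illustrative choices.
    """
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    # Input weights and hidden biases are drawn randomly from a continuous
    # distribution, independently of the training samples.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden-layer output matrix H with a nonlinear, continuous activation.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Output weights: minimum-norm least-squares solution of H @ beta = T,
    # computed via the Moore-Penrose generalized inverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass: hidden-layer mapping followed by the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

For example, fitting one period of a sine function with 20 hidden nodes drives the training error close to zero in a single pseudoinverse computation; the only hyperparameter to vary is `n_hidden`.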
Unlike traditional gradient-based neural network learning algorithms, which are sensitive to parameter settings and prone to becoming trapped in local optima, ELM offers faster learning, minimal human intervention, and easy implementation. In short, ELM has become one of the most popular research directions in artificial intelligence in recent years and has received widespread attention from researchers at home and abroad. To make it more suitable and efficient for specific applications, ELM theories and algorithms have been investigated extensively over the past decades. Recently, random neurons have gradually been adopted in deep learning, and ELM provides a theoretical basis for their use. This paper aims to provide a comprehensive review of existing research on ELM. We first introduce the historical background and development of ELM. We then describe the principle and algorithm of ELM in detail, followed by an introduction to its feature map and feature space. After an overview of ELM theory, we discuss and analyze state-of-the-art algorithms and the typical variants of ELM, including models, solution approaches, and related problems. On this basis, the core ideas, advantages, and disadvantages of each algorithm are summarized. Furthermore, the latest applications of ELM are reviewed. Finally, several controversies, open issues, and challenges in ELM are pointed out, together with future research directions and trends. © 2019, Science Press. All rights reserved.