Cloud brokers play an important role in allocating resources in the cloud computing market: they mediate between cloud users and service providers by buying limited capacity from the providers and subleasing it to the users for profit. However, user demands are typically stochastic and the resource capacity bought from cloud providers is limited. Therefore, to maximize profit, the broker needs an effective resource allocation algorithm to decide whether or not to satisfy the demands of arriving users, i.e., to allocate resources only to valuable users. In this paper, we propose a resource allocation algorithm named Q-DP, based on reinforcement learning and dynamic programming, for the broker to maximize its profit. First, we treat the demands of all users arriving at each stage as a bundle and model the broker's process of allocating resources to arriving users as a Markov Decision Process. We then use the Q-learning algorithm to determine how many resources to allocate to the bundle of users arriving at the current stage. Next, we use dynamic programming to decide which cloud users obtain the resources. Finally, we evaluate our resource allocation algorithm against other typical resource allocation algorithms on an artificial dataset and a realistic dataset, and show that our algorithm outperforms the others, especially when the broker has extremely limited resources.
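The two decision steps above (Q-learning chooses a stage quantity; dynamic programming selects users within that quantity) can be sketched as follows. This is a minimal illustration under assumed definitions, not the paper's actual formulation: the state encoding, Q-table shape, and per-user `(demand, payment)` pairs are all hypothetical, and the selection step is modeled as a standard 0/1 knapsack.

```python
# Illustrative sketch only: names and structures (q_table, state, users as
# (demand, payment) pairs) are assumptions, not the paper's formulation.
import random

def choose_quantity(q_table, state, capacity, epsilon=0.1):
    """Epsilon-greedy Q-learning action: how many units to offer this stage.

    q_table maps a state to a list of Q-values, one per quantity 0..capacity.
    """
    if random.random() < epsilon:
        return random.randint(0, capacity)  # explore
    row = q_table.get(state, [0.0] * (capacity + 1))
    return max(range(capacity + 1), key=lambda a: row[a])  # exploit

def allocate_bundle(users, quantity):
    """Decide which users in the stage's bundle obtain resources.

    Modeled as a 0/1 knapsack: pick the subset of users whose total demand
    fits within `quantity` and whose total payment is maximal.
    Each user is a (demand, payment) pair; returns (best payment, indices).
    """
    best = [0.0] * (quantity + 1)            # best[c] = max payment with capacity c
    chosen = [[] for _ in range(quantity + 1)]
    for i, (demand, payment) in enumerate(users):
        # iterate capacities downward so each user is taken at most once
        for c in range(quantity, demand - 1, -1):
            if best[c - demand] + payment > best[c]:
                best[c] = best[c - demand] + payment
                chosen[c] = chosen[c - demand] + [i]
    return best[quantity], chosen[quantity]
```

For example, with users `[(3, 5.0), (2, 3.0), (4, 6.0)]` and a chosen quantity of 5, `allocate_bundle` serves the first two users for a total payment of 8.0 rather than the single higher-paying third user.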