Because cooling costs are so high, improvements in data center cooling efficiency have been actively pursued for years. In addition to cooling efficiency, the reliability of the cooling system is essential for guaranteed uptime. In traditional data center cooling system designs with N+1 or higher redundancy, all the computer room air conditioning (CRAC) units are either constantly online or cycled according to a predefined schedule. Both configurations, however, have drawbacks. When all CRAC units are online all the time, the data center is usually over-provisioned, and hence cooling efficiency is low. On the other hand, although cooling efficiency can be improved by cycling CRAC units and turning off the backups, it is difficult to schedule the cycling such that sufficient cooling provisioning is guaranteed and gross over-provisioning is avoided. In this paper, we aim to maintain data center cooling redundancy while achieving high cooling efficiency. Using model-based thermal zone mapping, we first partition the data center to achieve the desired level of cooling influence redundancy. We then design a distributed controller for each CRAC unit to regulate the thermal status within its zone of influence. The distributed controllers coordinate with each other to achieve the desired data center thermal status using the least cooling power. When CRAC units or their associated controllers fail, racks in the affected thermal zones are still within the control "radius" of other decentralized cooling controllers through predefined thermal zone overlap, and hence their thermal status is properly managed by the active CRAC units and controllers. Using this failure-resistant data center cooling control approach, both cooling efficiency and robustness are achieved simultaneously.
Greater flexibility in cooling system maintenance is also expected, since the distributed control system can automatically adapt to the new cooling facility configuration that results from maintenance.
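The role of thermal zone overlap in tolerating failures can be illustrated with a minimal sketch. It is not the model-based zone mapping or the controller described here; it only assumes that a zone map is already available as a set of racks per CRAC unit, and checks how many active units still influence each rack after some units fail. The unit names, rack names, and the functions `coverage` and `survives` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical zone map (an assumption for illustration):
# CRAC unit -> racks inside its zone of influence.
ZONES: Dict[str, Set[str]] = {
    "crac1": {"r1", "r2", "r3"},
    "crac2": {"r2", "r3", "r4"},
    "crac3": {"r1", "r4"},
}


def coverage(zones: Dict[str, Set[str]],
             failed: Iterable[str] = ()) -> Dict[str, int]:
    """Count, for each rack, the active CRAC units whose zone contains it."""
    down: FrozenSet[str] = frozenset(failed)
    # Start every known rack at zero so uncovered racks stay visible.
    counts = {rack: 0 for racks in zones.values() for rack in racks}
    for unit, racks in zones.items():
        if unit in down:
            continue  # a failed unit (or controller) influences no rack
        for rack in racks:
            counts[rack] += 1
    return counts


def survives(zones: Dict[str, Set[str]], failed: Iterable[str]) -> bool:
    """True if every rack is still inside at least one active zone."""
    return all(n >= 1 for n in coverage(zones, failed).values())
```

In this toy map every rack lies in two zones, so `survives(ZONES, {"crac2"})` holds for any single failed unit, while failing `crac1` and `crac2` together leaves racks `r2` and `r3` outside every active zone.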