Caching can significantly improve network performance and mitigate congestion. However, characterizing the optimal tradeoff between routing cost and cache deployment cost remains an open problem. In this paper, for a network with arbitrary topology and congestion-dependent nonlinear cost functions, we jointly determine the cache deployment, content placement, and hop-by-hop routing strategies so that the sum of routing cost and cache deployment cost is minimized. We tackle this mixed-integer nonlinear problem by starting with a fixed-routing setting and then generalizing to a dynamic-routing setting. For the fixed-routing setting, we present a Gradient-combining Frank-Wolfe algorithm with a (1/2, 1)-approximation guarantee. For the general dynamic-routing setting, we derive a set of KKT conditions and devise a distributed, adaptive online algorithm based on these conditions. We demonstrate via extensive simulations that our algorithms significantly outperform a number of baselines.
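To make the objective concrete, the following minimal sketch evaluates the sum of routing cost and cache deployment cost for a toy fixed-routing instance. It is an illustration under stated assumptions, not the formulation used in the paper: the M/M/1-style link cost F/(C - F), the linear per-unit cache cost, and all names (`total_cost`, `requests`, `cache`, `link_capacity`, `cache_unit_cost`) are hypothetical choices made here.

```python
from collections import defaultdict
import math


def total_cost(requests, cache, link_capacity, cache_unit_cost):
    """Return routing cost plus cache deployment cost for a fixed-routing toy model.

    requests: list of (item, rate, path), where path is a list of nodes ending
              at a node assumed to always hold the item.
    cache:    dict node -> dict item -> cached fraction in [0, 1].
    link_capacity: dict (u, v) -> capacity of the hop from u to v.
    cache_unit_cost: assumed linear price per unit of deployed cache.
    """
    flow = defaultdict(float)
    for item, rate, path in requests:
        miss = rate
        for u, v in zip(path, path[1:]):
            # Only the fraction not served by the cache at u is forwarded on.
            miss *= 1.0 - cache.get(u, {}).get(item, 0.0)
            flow[(u, v)] += miss

    routing = 0.0
    for link, f in flow.items():
        c = link_capacity[link]
        if f >= c:
            return math.inf  # assumed queueing cost blows up at capacity
        routing += f / (c - f)  # assumed congestion-dependent (convex) cost

    deployment = cache_unit_cost * sum(
        frac for items in cache.values() for frac in items.values()
    )
    return routing + deployment


requests = [("a", 1.0, ["n1", "n2", "s"])]
capacity = {("n1", "n2"): 2.0, ("n2", "s"): 2.0}
no_cache = total_cost(requests, {}, capacity, 0.5)
with_cache = total_cost(requests, {"n2": {"a": 1.0}}, capacity, 0.5)
```

In this toy instance, caching item `a` at `n2` removes the flow on the second hop at the price of one unit of cache, which is the kind of tradeoff the joint optimization is meant to balance.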