High-quality tags play a critical role in applications involving online multimedia search, such as social image annotation, sharing, and browsing. However, real-world user-generated tags are often imprecise and incomplete descriptions of image content, which severely degrades the performance of current search systems. To improve the descriptive power of social tags, a fundamental issue is tag relevance learning, which concerns how to effectively interpret the relevance of a tag with respect to the content of an image. In this paper, we investigate the problem from the new perspective of learning to rank, and develop a novel approach in which tag relevance learning directly optimizes the ranking performance of tag-based image search. Specifically, a supervision step is introduced into the neighbor voting scheme, in which the relevance of a tag is estimated by accumulating votes from visual neighbors. By explicitly modeling the neighbor weights and tag correlations, our approach avoids the risk of making heuristic assumptions. Moreover, it does not suffer from the scalability problem, since a single generic model is learned that can be applied to all tags. Extensive experiments on two benchmark datasets, in comparison with state-of-the-art methods, demonstrate the promise of our approach.
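As an illustration of the neighbor voting idea mentioned above, the following is a minimal sketch, not the method proposed in this paper. It assumes images are represented by feature vectors compared with Euclidean distance, and that tags are given as a binary image-by-tag matrix. The function name `neighbor_vote_relevance` and the `weights` argument are hypothetical: here the neighbor weights are simply passed in, whereas the paper learns them in a supervised ranking step and additionally models tag correlations, which this sketch omits.

```python
import numpy as np

def neighbor_vote_relevance(query_feat, feats, tag_matrix, k=3, weights=None):
    """Score every tag for one image by accumulating votes from its k visual neighbors.

    feats:      (n_images, n_dims) visual features (assumed representation)
    tag_matrix: (n_images, n_tags) binary matrix, 1 if the image carries the tag
    weights:    optional per-neighbor weights, nearest neighbor first (hypothetical;
                uniform weights reduce to plain vote counting)
    """
    # Euclidean distance from the query image to every image in the collection
    dists = np.linalg.norm(feats - query_feat, axis=1)
    # Indices of the k nearest visual neighbors, nearest first
    nn = np.argsort(dists, kind="stable")[:k]
    if weights is None:
        weights = np.ones(len(nn))
    # Each neighbor votes for the tags it carries; votes are weighted and summed
    return np.asarray(weights, dtype=float) @ tag_matrix[nn]

# Toy collection: three images near the origin, one far away
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0]])
# Columns are two tags, e.g. "dog" and "beach"
tag_matrix = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
query = np.array([0.0, 0.05])

uniform_scores = neighbor_vote_relevance(query, feats, tag_matrix, k=3)
weighted_scores = neighbor_vote_relevance(
    query, feats, tag_matrix, k=3, weights=[0.5, 0.3, 0.2]
)
```

With uniform weights each neighbor contributes one vote per tag it carries; non-uniform weights let closer or more reliable neighbors count for more, which is the quantity a supervised step could learn instead of fixing by hand.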