Landmark image classification has attracted increasing research attention because of its importance in real applications, ranging from travel guide recommendation to 3-D modelling and visualization of geolocations. Although a large amount of effort has been invested, the problem remains unsolved by both academia and industry. One of the key reasons is the large intra-class variance that stems from the diverse visual appearance of landmark images. Unlike most existing methods, which are based on scalable image search, we approach the problem from a new perspective and model landmark classification as multi-modal categorization, which offers low storage overhead and high classification efficiency. Toward this goal, we propose a novel feature representation, the hierarchical multi-modal exemplar (HMME) feature, to characterize landmark images. To compute HMME, training images are first partitioned into regions with hierarchical grids to generate candidate images and regions. Then, at the exemplar selection stage, hierarchical discriminative exemplars in multiple modalities are discovered automatically via iterative boosting and latent region label mining. Finally, HMME is generated via region-based locality-constrained linear coding (RLLC), which encodes the semantics of the discovered exemplars into HMME. In addition, dimension reduction is applied to remove redundant information by projecting the raw HMME into a lower-dimensional space. The final HMME is both discriminative and linearly separable. Experiments on real-world landmark datasets demonstrate the superior performance of the proposed approach over several state-of-the-art techniques.
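
To make two of the steps named above more concrete, the following is a minimal Python sketch of a hierarchical grid partition and of locality-constrained linear coding against a set of exemplar descriptors. It is an illustration under stated assumptions, not the authors' implementation: the function names `hierarchical_grid_regions` and `llc_encode`, the 2^l x 2^l grid layout, and the parameters `levels`, `k`, and `reg` are all hypothetical, and the coding step is the standard approximated LLC solution rather than the region-based variant (RLLC), whose details are not given here. Exemplar selection by boosting and the dimension reduction step are omitted.

```python
import numpy as np

def hierarchical_grid_regions(height, width, levels=3):
    """Return (level, top, left, bottom, right) boxes.

    Assumption: level l splits the image into 2^l x 2^l equal cells,
    so level 0 is the whole image.
    """
    regions = []
    for level in range(levels):
        cells = 2 ** level
        ys = np.linspace(0, height, cells + 1).astype(int)
        xs = np.linspace(0, width, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                regions.append((level, int(ys[i]), int(xs[j]),
                                int(ys[i + 1]), int(xs[j + 1])))
    return regions

def llc_encode(x, exemplars, k=5, reg=1e-4):
    """Approximated locality-constrained linear coding of descriptor x.

    x:         (d,) region descriptor
    exemplars: (m, d) exemplar descriptors acting as the codebook
    Returns an (m,) code that is nonzero only on the k nearest
    exemplars and whose entries sum to 1.
    """
    x = np.asarray(x, dtype=float)
    exemplars = np.asarray(exemplars, dtype=float)

    # Locality: keep only the k exemplars closest to x.
    dists = np.sum((exemplars - x) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]

    # Solve the small constrained least-squares problem on those neighbours.
    shifted = exemplars[nearest] - x                  # (k, d)
    cov = shifted @ shifted.T                         # (k, k) local covariance
    cov += reg * max(np.trace(cov), 1e-12) * np.eye(len(nearest))
    weights = np.linalg.solve(cov, np.ones(len(nearest)))
    weights /= weights.sum()                          # enforce sum-to-one

    code = np.zeros(exemplars.shape[0])
    code[nearest] = weights
    return code
```

In such a sketch, each region returned by `hierarchical_grid_regions` would be described by some feature extractor (not shown) and passed to `llc_encode`; the per-region codes could then be pooled into a single image-level vector before dimension reduction.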