TY - GEN
T1 - Learning compact visual representation with canonical views for robust mobile landmark search
AU - Zhu, Lei
AU - Shen, Jialie
AU - Liu, Xiaobai
AU - Xie, Liang
AU - Nie, Liqiang
PY - 2016/7
Y1 - 2016/7
N2 - Mobile Landmark Search (MLS) recently receives increasing attention. However, it still remains unsolved due to two important issues. One is high bandwidth consumption of query transmission, and the other is the huge visual variations of query images. This paper proposes a Canonical View based Compact Visual Representation (2CVR) to handle these problems via novel three-stage learning. First, a submodular function is designed to measure visual representativeness and redundancy of a view set. With it, canonical views, which capture key visual appearances of landmark with limited redundancy, are efficiently discovered with an iterative mining strategy. Second, multimodal sparse coding is applied to transform multiple visual features into an intermediate representation which can robustly characterize visual contents of varied landmark images with only fixed canonical views. Finally, compact binary codes are learned on intermediate representation within a tailored binary embedding model which preserves visual relations of images measured with canonical views and removes noises. With 2CVR, robust visual query processing, low-cost of query transmission, and fast search process are simultaneously supported. Experiments demonstrate the superior performance of 2CVR over several state-of-the-art methods.
AB - Mobile Landmark Search (MLS) recently receives increasing attention. However, it still remains unsolved due to two important issues. One is high bandwidth consumption of query transmission, and the other is the huge visual variations of query images. This paper proposes a Canonical View based Compact Visual Representation (2CVR) to handle these problems via novel three-stage learning. First, a submodular function is designed to measure visual representativeness and redundancy of a view set. With it, canonical views, which capture key visual appearances of landmark with limited redundancy, are efficiently discovered with an iterative mining strategy. Second, multimodal sparse coding is applied to transform multiple visual features into an intermediate representation which can robustly characterize visual contents of varied landmark images with only fixed canonical views. Finally, compact binary codes are learned on intermediate representation within a tailored binary embedding model which preserves visual relations of images measured with canonical views and removes noises. With 2CVR, robust visual query processing, low-cost of query transmission, and fast search process are simultaneously supported. Experiments demonstrate the superior performance of 2CVR over several state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=85006124886&partnerID=8YFLogxK
UR - https://www.ijcai.org/proceedings/2016/
M3 - Conference Paper published in Proceedings
AN - SCOPUS:85006124886
VL - 2016-January
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 3959
EP - 3965
BT - IJCAI International Joint Conference on Artificial Intelligence
PB - International Joint Conferences on Artificial Intelligence
T2 - 25th International Joint Conference on Artificial Intelligence, IJCAI 2016
Y2 - 9 July 2016 through 15 July 2016
ER -