TY - JOUR
T1 - Query-document-dependent fusion
T2 - A case study of multimodal music retrieval
AU - Li, Zhonghua
AU - Zhang, Bingjun
AU - Yu, Yi
AU - Shen, Jialie
AU - Wang, Ye
PY - 2013/12/2
Y1 - 2013/12/2
N2 - In recent years, multimodal fusion has emerged as a promising technology for effective multimedia retrieval. Developing the optimal fusion strategy for different modalities (e.g., content, metadata) has been the subject of intensive research. Given a query, existing methods derive a unified fusion strategy for all documents with the underlying assumption that the relative significance of a modality remains the same across all documents. However, this assumption is often invalid. We thus propose a general multimodal fusion framework, query-document-dependent fusion (QDDF), which derives the optimal fusion strategy for each query-document pair via intelligent content analysis of both queries and documents. By investigating multimodal fusion strategies adaptive to both queries and documents, we demonstrate that existing multimodal fusion approaches are special cases of QDDF and propose two QDDF approaches to derive fusion strategies. The dual-phase QDDF explicitly derives and fuses query- and document-dependent weights, while the regression-based QDDF determines the fusion weight for a query-document pair via a regression model derived from training data. To evaluate the proposed approaches, comprehensive experiments have been conducted using a multimedia data set with around 17K full songs and over 236K social queries. Results indicate that the regression-based QDDF is superior in handling single-dimension queries. In comparison, the dual-phase QDDF outperforms existing approaches for most query types. We found that document-dependent weights are instrumental in enhancing multimedia fusion performance. In addition, efficiency analysis demonstrates the scalability of QDDF over large data sets.
KW - Information retrieval
KW - Multimodal
KW - Query-document-dependent fusion
UR - http://www.scopus.com/inward/record.url?scp=84888309967&partnerID=8YFLogxK
U2 - 10.1109/TMM.2013.2280437
DO - 10.1109/TMM.2013.2280437
M3 - Article
AN - SCOPUS:84888309967
VL - 15
SP - 1830
EP - 1842
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
SN - 1520-9210
IS - 8
M1 - 6588600
ER -
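
Illustrative note (not part of the record): the abstract's two QDDF variants can be pictured with the minimal Python sketch below. It assumes two modalities (audio content and metadata/text) and toy weight functions (query_weight, document_weight, and a linear-regression surrogate); these names and heuristics are hypothetical and do not reproduce the paper's actual feature extraction, weight derivation, or trained model.

    # Hypothetical sketch of query-document-dependent fusion (QDDF).
    # Two modalities are fused with a weight that depends on both the
    # query and the document, rather than on the query alone.
    from dataclasses import dataclass

    @dataclass
    class Scores:
        content: float   # relevance score from the content (audio) modality
        metadata: float  # relevance score from the metadata (text) modality

    def query_weight(query: str) -> float:
        """Toy query-dependent weight: lean on metadata for longer text queries."""
        return 0.3 if len(query.split()) > 3 else 0.7

    def document_weight(metadata_quality: float) -> float:
        """Toy document-dependent weight: trust metadata more when it is rich."""
        return 1.0 - metadata_quality

    def dual_phase_qddf(query: str, metadata_quality: float, s: Scores) -> float:
        """Dual-phase QDDF sketch: combine a query-dependent and a
        document-dependent weight into one per-pair mixing weight."""
        w = 0.5 * (query_weight(query) + document_weight(metadata_quality))
        return w * s.content + (1.0 - w) * s.metadata

    def regression_qddf(pair_features: list[float], coef: list[float], s: Scores) -> float:
        """Regression-based QDDF sketch: a pre-trained linear model maps
        query-document pair features to the fusion weight."""
        w = sum(c * f for c, f in zip(coef, pair_features))
        w = min(max(w, 0.0), 1.0)  # clamp to a valid mixing weight
        return w * s.content + (1.0 - w) * s.metadata

    if __name__ == "__main__":
        s = Scores(content=0.62, metadata=0.48)
        print(dual_phase_qddf("upbeat jazz piano", 0.8, s))
        print(regression_qddf([1.0, 0.4, 0.8], [0.2, 0.5, 0.3], s))

Under this reading, query-dependent or query-independent fusion schemes fall out as special cases in which the per-pair weight ignores the document, the query, or both.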