Abstract
Digital twins are the innovation backbone of smart manufacturing because they deliver virtual representations of the real world. Scene graph generation, which aims to construct virtual representations of visual scenes, is a digital twin task that not only models objects but also infers their relationships. Existing works usually learn coarse global context when predicting relationships, so excessive redundant information is taken into account. In this paper, we first classify objects into subgroups according to their degree of correlation with several hub objects. We then propose a deep-learning-based multi-hub driven attention network (MHDANet) that drives information to pass within the subgroups and forces objects to attend more to related objects. Consequently, MHDANet learns compact relation-aware features of visual scenes and predicts accurate and diverse relationships. Experimental results on real-world datasets show that MHDANet achieves strong performance on scene graph generation and, in particular, alleviates the imbalance among predicted relationship categories.
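The grouping-then-attention idea in the abstract can be illustrated with a toy sketch. This is not the MHDANet architecture, which is a learned deep network whose details the abstract does not give. The hub choice (`hub_indices`), cosine similarity as the correlation measure, the hard assignment of each object to a single hub, and plain dot-product attention are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here as an assumed stand-in for 'correlation'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_subgroups(features, hub_indices):
    """Assign each object to the hub it is most similar to (hypothetical rule)."""
    groups = []
    for f in features:
        scores = [cosine(f, features[h]) for h in hub_indices]
        groups.append(scores.index(max(scores)))
    return groups

def grouped_attention(features, groups):
    """Softmax dot-product attention restricted to objects in the same subgroup."""
    out = []
    for i, q in enumerate(features):
        # Only objects sharing the query's subgroup may be attended to.
        members = [j for j, g in enumerate(groups) if g == groups[i]]
        logits = [sum(a * b for a, b in zip(q, features[j])) for j in members]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        out.append([
            sum(w / total * features[j][d] for w, j in zip(weights, members))
            for d in range(len(q))
        ])
    return out

# Toy 2-D object features; objects 0 and 2 are treated as hubs (assumption).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
hub_indices = [0, 2]
groups = assign_subgroups(features, hub_indices)
attended = grouped_attention(features, groups)
```

In this toy setting each object's updated feature is a weighted average of the features in its own subgroup only, which mirrors the abstract's point that restricting message passing keeps unrelated objects out of the context.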
Original language | English |
---|---|
Pages (from-to) | 1435-1444 |
Number of pages | 10 |
Journal | IEEE Transactions on Industrial Informatics |
Volume | 18 |
Issue number | 2 |
Publication status | Published - Feb 2022 |
Bibliographical note
Publisher Copyright: IEEE