Abstract:
Zero-shot 3D model classification is a burgeoning topic in 3D vision that aims to correctly classify 3D models of classes unseen during training. Existing zero-shot methods for 3D model classification focus on global rather than local information, impose mandatory constraints, and ignore the cross-domain differences between semantic and visual features, which results in low performance. To address these problems, this paper proposes a discriminative feature-guided zero-shot 3D model classification network. Firstly, the local discriminative features, i.e., the real-visual features of the multi-view 3D models, are adaptively captured by the proposed visual feature extraction module. Secondly, the semantic representations of the class labels are introduced in the form of word vectors, and their pseudo-visual features are generated by a conditional generative adversarial network. Finally, fine-grained cross-domain alignment of the semantic and visual features is achieved by a novel semantic-content joint loss, which consists of a semantic discriminative loss and a content-aware loss between the real-visual and pseudo-visual features and aligns them from semantics to contents. The proposed algorithm achieves a Top-1 accuracy of 60.9% on the ZS3D dataset, exceeding the current best method by 2.3%, and achieves accuracies of 31.9%, 9.9% and 16.6% on the three sub-datasets of Ali, respectively. These results verify the effectiveness and universality of the proposed algorithm.
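
To make the structure of the semantic-content joint loss concrete, the following is a minimal sketch, not the formulation used in this paper. It assumes that the content-aware term is a mean squared error between a real-visual and a pseudo-visual feature vector, that the semantic discriminative term is a softmax cross-entropy over dot-product scores against hypothetical per-class centers, and that the two terms are combined with a hypothetical weight. All function names, the `class_centers` argument, and the `weight` parameter are illustrative assumptions.

```python
import math


def content_aware_loss(real, pseudo):
    """Assumed content term: mean squared error between the real-visual
    and pseudo-visual feature vectors (equal-length lists of floats)."""
    return sum((r - p) ** 2 for r, p in zip(real, pseudo)) / len(real)


def semantic_discriminative_loss(pseudo, class_centers, label):
    """Assumed semantic term: softmax cross-entropy over dot-product
    scores between a pseudo-visual feature and hypothetical class centers."""
    scores = [sum(p * c for p, c in zip(pseudo, center))
              for center in class_centers]
    # Numerically stable log-sum-exp over the class scores.
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - scores[label]


def joint_loss(real, pseudo, class_centers, label, weight=1.0):
    """Hypothetical combination: semantic term plus weighted content term."""
    semantic = semantic_discriminative_loss(pseudo, class_centers, label)
    content = content_aware_loss(real, pseudo)
    return semantic + weight * content


if __name__ == "__main__":
    centers = [[1.0, 0.0], [0.0, 1.0]]   # toy class centers (assumed)
    real_feature = [1.0, 0.0]            # toy real-visual feature
    good_pseudo = [0.9, 0.1]             # close to the real feature
    bad_pseudo = [0.1, 0.9]              # far from the real feature
    print(joint_loss(real_feature, good_pseudo, centers, label=0))
    print(joint_loss(real_feature, bad_pseudo, centers, label=0))
```

Under these assumptions, a pseudo-visual feature that lies close to the real-visual feature of its class receives a smaller joint loss than one that lies closer to another class, which is the behaviour that the alignment of real-visual and pseudo-visual features is intended to encourage.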