Association Journal of CSIAM
Supervised by Ministry of Education of PRC
Sponsored by Xi'an Jiaotong University
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics, 2024, Vol. 41, Issue (1): 164-174. doi: 10.3969/j.issn.1005-3085.2024.01.010


Multi-scale Transformer for View-based 3D Shape Analysis

WEI Xin,  SUN Jian   

  1. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  • Received: 2023-10-22 Accepted: 2023-11-24 Online: 2024-02-15 Published: 2024-04-15
  • Supported by:
    The National Natural Science Foundation of China (12125104).

Abstract:

View-based 3D shape analysis is a crucial research domain within 3D computer vision. These techniques aim to recognize and retrieve 3D objects by aggregating features extracted from 2D images of the same object taken from different viewpoints. However, effectively exploring the relationships between viewpoints, and using those relationships to aggregate features from multiple viewpoints, remain fundamental challenges in 3D shape analysis. Inspired by the recent success of Transformer networks in modeling relationships, a novel multi-scale Transformer architecture, the Multi-View Multi-Scale Transformer (MVMST), is presented for 3D shape analysis. MVMST efficiently learns relationships between different views and integrates features from multi-view images into a global descriptor. Whereas previous approaches use a Transformer with a global receptive field to model the relationships between multi-view features, MVMST uses multi-scale learning: a multi-scale Transformer models these relationships at different scales. In addition, a multi-scale fusion module merges the features processed by the multi-scale Transformer to obtain a more efficient multi-scale representation. A view pooling module then fuses these multi-scale representations from different views into a global descriptor of the 3D shape. Experiments on synthetic and real-world 3D object classification datasets demonstrate that the proposed method shows promising performance in 3D object classification tasks.
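
The abstract does not specify how the scales, the fusion module, or the view pooling are implemented, so the following is only a minimal illustrative sketch of the general idea, not the authors' MVMST. It assumes that views lie on a circular camera ring, that a "scale" is the size of the neighbourhood each view may attend to, that fusion is a plain average over scales, and that view pooling is an element-wise maximum. The function names and the default window sizes are hypothetical, and the attention uses raw features with no learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def local_attention(feats, window):
    """Self-attention restricted to a neighbourhood of views.

    Assumption: views are ordered on a circular ring, and view i attends
    only to views at ring distance <= window. A window of at least half
    the number of views gives a global receptive field. Queries, keys and
    values are the raw features (no learned weights in this sketch).
    """
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        idx = [j for j in range(n) if min((i - j) % n, (j - i) % n) <= window]
        weights = softmax([dot(feats[i], feats[j]) / math.sqrt(d) for j in idx])
        out.append([sum(w * feats[j][k] for w, j in zip(weights, idx))
                    for k in range(d)])
    return out


def multi_scale_descriptor(feats, windows=(1, 2, 6)):
    """Attend at several scales, fuse per view, then pool over views.

    Assumptions: fusion is an average over scales (a stand-in for a learned
    fusion module) and view pooling is an element-wise max across views.
    """
    n, d = len(feats), len(feats[0])
    per_scale = [local_attention(feats, w) for w in windows]
    fused = [[sum(scale[i][k] for scale in per_scale) / len(per_scale)
              for k in range(d)] for i in range(n)]
    return [max(fused[i][k] for i in range(n)) for k in range(d)]


# Toy usage: four views with 2-dimensional features.
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
descriptor = multi_scale_descriptor(views, windows=(1, 2))
print(descriptor)
```

In this sketch the smallest window captures relationships between adjacent views only, while the largest reproduces the global receptive field that the abstract attributes to earlier Transformer-based approaches.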

Key words: 3D shape recognition, Transformer, multi-scale learning
