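Official journal of the China Society for Industrial and Applied Mathematics
Supervised by: Ministry of Education of the People's Republic of China
Sponsored by: Xi'an Jiaotong University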
ISSN 1005-3085  CN 61-1269/O1

Chinese Journal of Engineering Mathematics, 2024, Vol. 41, Issue (1): 164-174. doi: 10.3969/j.issn.1005-3085.2024.01.010


基于多尺度Transformer的多视图三维形状分析方法

卫  鑫,  孙 剑   

  1. 西安交通大学数学与统计学院,西安 710049
  • 收稿日期:2023-10-22 接受日期:2023-11-24 出版日期:2024-02-15 发布日期:2024-04-15
  • 基金资助:
    国家自然科学基金(12125104).

Multi-scale Transformer for View-based 3D Shape Analysis

WEI Xin,  SUN Jian   

  1. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049
  • Received:2023-10-22 Accepted:2023-11-24 Online:2024-02-15 Published:2024-04-15
  • Supported by:
    The National Natural Science Foundation of China (12125104).

摘要: 基于多视图的三维形状分析方法是三维计算机视觉领域中的重要研究分支,通过整合三维形状在多个视角下的二维图像的特征来完成三维形状的识别、检索等任务。然而,如何有效地探索不同视角之间的关联性,并运用这些关联性来聚合多视图图像的特征仍然是三维形状分析中一个亟待解决的核心问题。受到最近兴起的Transformer网络在关系建模问题上成功应用的启发,研究工作引入了一种创新的多尺度Transformer架构,提出了基于多尺度Transformer的多视图三维形状分析方法(Multi-View Multi-Scale Transformer, MVMST)。此方法能够有效地学习不同视角之间的关联性,将多视图图像的特征聚合为一个具有强大表达能力的整体描述符。与以往方法使用感受野为全局的Transformer建模多视图特征的关系不同,该方法受到多尺度学习方法的启发,使用多尺度的Transformer来建模不同尺度下的多视图图像特征之间的关系,并设计了一个多尺度融合模块将多个尺度下经过Transformer处理的特征进行融合,得到一个相比单一尺度更加有效的多尺度表示。多个视图的多尺度表示最终经过视角池化模块融合成三维形状的一个整体描述符。研究工作在多个合成和真实扫描三维形状分类数据集上进行了实验,结果表明所提出的方法在三维形状分类任务上表现出令人满意的性能。

关键词: 三维形状分析, Transformer, 多尺度方法

Abstract:

View-based 3D shape analysis is an important research area in 3D computer vision. These methods recognise and retrieve 3D shapes by aggregating features extracted from 2D images of the same shape taken from multiple viewpoints. However, how to effectively explore the relationships between different viewpoints, and how to use these relationships to aggregate multi-view image features, remains a core open problem in 3D shape analysis. Inspired by the recent success of Transformer networks in relation modeling, this work introduces a novel multi-scale Transformer architecture and proposes the Multi-View Multi-Scale Transformer (MVMST) for 3D shape analysis. MVMST effectively learns the relationships between different views and aggregates the features of multi-view images into a highly expressive global descriptor. Whereas previous approaches use a Transformer with a global receptive field to model the relationships between multi-view features, MVMST draws on multi-scale learning: a multi-scale Transformer models the relationships between multi-view image features at different scales, and a multi-scale fusion module merges the features processed by the Transformer at each scale into a multi-scale representation that is more effective than a single-scale one. A view pooling module finally fuses the multi-scale representations of the individual views into a global descriptor of the 3D shape. Experiments on several synthetic and real-scanned 3D shape classification datasets show that the proposed method achieves promising performance on 3D shape classification.
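
The pipeline outlined in the abstract (attention over view features at several scales, fusion across scales, then view pooling) can be illustrated with a minimal pure-Python sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration only: a "scale" is taken to be a window of consecutive views, attention uses identity query/key/value projections with no learned weights, fusion is a plain average across scales, and view pooling is an element-wise maximum. The function names are hypothetical and this is not the authors' MVMST implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: identity Q/K/V projections (no learned parameters).
    """
    dim = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in feats]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, feats)) for j in range(dim)])
    return out


def windowed_attention(views, window):
    """Apply self-attention separately inside each window of consecutive views.

    Assumption: a window equal to the number of views gives a global
    receptive field; smaller windows give local ones.
    """
    out = []
    for start in range(0, len(views), window):
        out.extend(self_attention(views[start:start + window]))
    return out


def multi_scale_descriptor(views, windows):
    """Attend at several window sizes, fuse across scales, pool over views."""
    per_scale = [windowed_attention(views, w) for w in windows]
    n_views, dim = len(views), len(views[0])
    # Assumed fusion: average the per-view features across scales.
    fused = [
        [sum(scale[i][j] for scale in per_scale) / len(per_scale) for j in range(dim)]
        for i in range(n_views)
    ]
    # Assumed view pooling: element-wise maximum over views.
    return [max(f[j] for f in fused) for j in range(dim)]


if __name__ == "__main__":
    # Four hypothetical view features of dimension 3.
    views = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5], [0.5, 0.5, 0.0], [2.0, 1.0, 1.0]]
    print(multi_scale_descriptor(views, windows=[2, 4]))
```

In this sketch, a window of 1 reduces attention to the identity, so the descriptor collapses to plain max pooling over the raw view features; larger windows let each view's feature mix with more of its neighbours before pooling.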

Key words: 3D shape recognition, Transformer, multi-scale learning
