Multi-scale Transformer for View-based 3D Shape Analysis

Chinese Journal of Engineering Mathematics, 2024, Vol. 41, Issue (1): 164-174. doi: 10.3969/j.issn.1005-3085.2024.01.010

Multi-scale Transformer for View-based 3D Shape Analysis

WEI Xin, SUN Jian

School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049

Received: 2023-10-22  Accepted: 2023-11-24  Online: 2024-02-15  Published: 2024-04-15
Supported by: The National Natural Science Foundation of China (12125104).

Abstract:

View-based 3D shape analysis is an important research branch of 3D computer vision. These methods recognize and retrieve 3D shapes by aggregating features extracted from 2D images of the same shape rendered from multiple viewpoints. However, how to effectively explore the relationships between viewpoints, and how to use those relationships to aggregate multi-view image features, remains a core open problem in 3D shape analysis. Inspired by the recent success of Transformer networks in relationship modeling, this work introduces a novel multi-scale Transformer architecture and proposes the Multi-View Multi-Scale Transformer (MVMST) for 3D shape analysis. MVMST effectively learns the relationships between different views and aggregates multi-view image features into a global descriptor with strong representational power. Whereas previous approaches model the relationships among multi-view features with a Transformer whose receptive field is global, MVMST draws on multi-scale learning: a multi-scale Transformer models the relationships among multi-view image features at different scales, and a multi-scale fusion module merges the Transformer-processed features from these scales into a multi-scale representation that is more effective than a single-scale one. A view pooling module then fuses the multi-scale representations of all views into a global descriptor of the 3D shape. Experiments on several synthetic and real-scanned 3D shape classification datasets show that the proposed method achieves promising performance on 3D shape classification.
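
The pipeline described in the abstract (attention over view features at several scales, fusion across scales, then view pooling) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not specify how scales are defined. The sketch assumes that a "scale" is a circular window of neighboring views on a camera ring, that the query/key/value projections are identities, that fusion is a plain average with a residual connection, and that view pooling is an element-wise max. The function names `windowed_self_attention` and `multi_scale_descriptor` and the window sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(feats, window):
    """Single-head self-attention over view features of shape (V, D).

    Each view attends only to views whose circular distance on the camera
    ring is at most window // 2. A large enough window gives ordinary
    global attention. Q/K/V projections are identities (an assumption
    made to keep the sketch free of learned weights).
    """
    num_views, dim = feats.shape
    scores = feats @ feats.T / np.sqrt(dim)            # (V, V)
    idx = np.arange(num_views)
    diff = np.abs(idx[:, None] - idx[None, :])
    circ = np.minimum(diff, num_views - diff)          # circular view distance
    scores = np.where(circ <= window // 2, scores, -np.inf)
    return softmax(scores, axis=-1) @ feats            # (V, D)

def multi_scale_descriptor(feats, windows=(3, 7, 12)):
    """Attend at several scales, fuse the scales, then pool over views."""
    per_scale = [windowed_self_attention(feats, w) for w in windows]
    # Hypothetical fusion: average over scales plus a residual connection.
    fused = feats + np.mean(per_scale, axis=0)         # (V, D)
    # Hypothetical view pooling: element-wise max over the view axis.
    return fused.max(axis=0)                           # (D,)

# Usage: 12 views with 64-dimensional features drawn at random.
rng = np.random.default_rng(0)
view_feats = rng.standard_normal((12, 64))
descriptor = multi_scale_descriptor(view_feats)
print(descriptor.shape)  # (64,)
```

With 12 views, a window of 12 covers every view, so that scale reduces to the global receptive field used by earlier Transformer-based approaches, while the smaller windows restrict attention to nearby viewpoints. In a trained model the projections and the fusion step would be learned rather than fixed.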

Key words: 3D shape recognition, Transformer, multi-scale learning

CLC number: TP183

WEI Xin, SUN Jian. Multi-scale Transformer for View-based 3D Shape Analysis[J]. Chinese Journal of Engineering Mathematics, 2024, 41(1): 164-174.
