ZHU Jiaqun, WANG Dongyang, GU Yuwan, et al. Salient object detection based on multi-scale fusion of CNNs features and Transformer features[J]. Journal of Changzhou University (Natural Science Edition), 2023, 35(06): 35-44. [doi:10.3969/j.issn.2095-0411.2023.06.005]

Salient object detection based on multi-scale fusion of CNNs features and Transformer features

Journal of Changzhou University (Natural Science Edition) [ISSN: 2095-0411 / CN: 32-1822/N]

Volume:
Vol. 35
Issue:
2023, No. 06
Pages:
35-44
Column:
Computer and Information Engineering: Special Topic on Deep Learning and Object Detection
Publication Date:
2023-11-28

Article Info

Title:
Salient object detection based on multi-scale fusion of CNNs features and Transformer features
Article ID:
2095-0411(2023)06-0035-10
Author(s):
ZHU Jiaqun WANG Dongyang GU Yuwan XU Shoukun
(School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China)
Keywords:
neural networks Transformer multi-scale analysis salient object detection
CLC Number:
TP 391.4
DOI:
10.3969/j.issn.2095-0411.2023.06.005
Document Code:
A
Abstract:
This paper proposes a multi-scale structure that integrates deep neural network and Transformer features, aiming to address the performance degradation of salient object detection networks when objects of different sizes appear in the same scene. When dealing with objects of different scales, the performance of existing methods often fluctuates due to the conflict between sampling depth and receptive field size. To tackle this challenge, three different sampling rates are adopted to sample the feature maps, and a Transformer module is used to learn global context information. This approach effectively fuses the characteristics of Convolutional Neural Networks (CNNs) and Transformers, yielding a novel salient object detection strategy for multi-scale objects. Experimental results on three public datasets, UHRSD-TE, DUT-OMRON, and DUTS-TE, demonstrate that the method performs well on salient object detection for objects of different sizes within the same scene.
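The abstract above describes sampling the feature maps at three different rates and passing each scale through a Transformer module for global context before fusing the results. As a rough, framework-free sketch of that idea (NumPy only; the average pooling, toy single-head self-attention, and nearest-neighbour upsampling below are illustrative stand-ins, not the paper's actual modules or parameters):

```python
import numpy as np

def avg_pool(x, r):
    # x: (H, W, C); average-pool with kernel and stride r (H, W divisible by r)
    H, W, C = x.shape
    return x.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))

def upsample(x, r):
    # nearest-neighbour upsampling by factor r, restoring the original grid
    return np.repeat(np.repeat(x, r, axis=0), r, axis=1)

def global_context(x):
    # toy single-head self-attention over flattened tokens;
    # stands in for the Transformer global-context module
    H, W, C = x.shape
    t = x.reshape(H * W, C)                        # tokens
    scores = t @ t.T / np.sqrt(C)                  # pairwise similarity
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)              # row-wise softmax
    return (a @ t).reshape(H, W, C)

def multi_scale_fuse(x, rates=(1, 2, 4)):
    # sample at each rate, add global context, restore resolution, sum
    out = np.zeros_like(x)
    for r in rates:
        out += upsample(global_context(avg_pool(x, r)), r)
    return out

feat = np.random.rand(8, 8, 16)   # a hypothetical CNN feature map
fused = multi_scale_fuse(feat)
print(fused.shape)                 # (8, 8, 16)
```

The three sampling rates trade off sampling depth against effective receptive field: coarser rates give each token a wider view of the scene, while the rate-1 branch preserves fine detail, and summation fuses the two kinds of evidence at full resolution.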

References:

[1] LIU N A, ZHAO W B, ZHANG D W, et al. Light field saliency detection with dual local graph learning and reciprocative guidance[C]//2021 IEEE/CVF International Conference on Computer Vision(ICCV). Montreal: IEEE, 2021: 4712-4721.
[2] ZHANG M, LI J J, WEI J, et al. Memory-oriented decoder for light field salient object detection[C]//33rd Conference on Neural Information Processing Systems(NeurIPS 2019). Vancouver: [s.n.], 2019.
[3] ZHOU Z Q, WANG Z, LU H C, et al. Multi-type self-attention guided degraded saliency detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 13082-13089.
[4] JI G P, FU K R, WU Z, et al. Full-duplex strategy for video object segmentation[C]//2021 IEEE/CVF International Conference on Computer Vision(ICCV). Canada: IEEE, 2021: 4922-4933.
[5] ZHANG M A, LIU J E, WANG Y F, et al. Dynamic context-sensitive filtering network for video salient object detection[C]//2021 IEEE/CVF International Conference on Computer Vision(ICCV). Canada: IEEE, 2021: 1553-1563.
[6] CHEN S H, TAN X L, WANG B, et al. Reverse attention for salient object detection[C]//European Conference on Computer Vision. Cham: Springer, 2018: 236-252.
[7] JI W, LI J J, YU S A, et al. Calibrated RGB-D salient object detection[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Nashville: IEEE, 2021: 9471-9481.
[8] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Las Vegas: IEEE, 2016: 770-778.
[9] LIU N A, ZHANG N, WAN K Y, et al. Visual saliency transformer[C]//2021 IEEE/CVF International Conference on Computer Vision(ICCV). Montreal: IEEE, 2021: 4722-4732.
[10] LIU Z, LIN Y T, CAO Y E, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision(ICCV). Montreal: IEEE, 2021: 10012-10022.
[11] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Honolulu: IEEE, 2017: 2117-2125.
[12] WEI J, WANG S H, WU Z, et al. Label decoupling framework for salient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.: s.n.], 2020: 13025-13034.
[13] XIA C Q, LI J, CHEN X W, et al. What is and what is not a salient object? Learning salient object detector by ensembling linear exemplar regressors[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Honolulu: IEEE, 2017: 4399-4407.
[14] XU B W, LIANG H R, LIANG R H, et al. Locate globally, segment locally: a progressive architecture with knowledge review network for salient object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(4): 3004-3012.
[15] ZHAO J X, LIU J J, FAN D P, et al. EGNet: edge guidance network for salient object detection[C]//2019 IEEE/CVF International Conference on Computer Vision(ICCV). Seoul: IEEE, 2019: 8778-8787.
[16] WU Z, SU L, HUANG Q M. Cascaded partial decoder for fast and accurate salient object detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Long Beach: IEEE, 2019: 3907-3916.
[17] WU Z, SU L, HUANG Q M. Stacked cross refinement network for edge-aware salient object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. [S.l.]: IEEE, 2019: 7264-7273.
[18] ZHAO J W, ZHAO Y F, LI J, et al. Is depth really necessary for salient object detection? [C]//Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1745-1754.
[19] ZHAO Z R, XIA C Q, XIE C X, et al. Complementary trilateral decoder for fast and accurate salient object detection[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4967-4975.
[20] WEI J, WANG S H, HUANG Q M. F3Net: fusion, feedback and focus for salient object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12321-12328.
[21] ZHOU H J, XIE X H, LAI J H, et al. Interactive two-stream decoder for accurate and fast saliency detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Seattle: IEEE, 2020: 9138-9147.
[22] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
[23] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Honolulu: IEEE, 2017: 6230-6239.
[24] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. 2015: arXiv: 1511.07122. https://arxiv.org/abs/1511.07122.
[25] XIE C X, XIA C Q, MA M C, et al. Pyramid grafting network for one-stage high resolution saliency detection[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New Orleans: IEEE, 2022: 11707-11716.
[26] YANG C, ZHANG L H, LU H C, et al. Saliency detection via graph-based manifold ranking[C]//2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 3166-3173.
[27] WANG L J, LU H C, WANG Y F, et al. Learning to detect salient objects with image-level supervision[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Honolulu: IEEE, 2017: 3796-3805.
[28] ZENG Y, ZHANG P P, LIN Z, et al. Towards high-resolution salient object detection[C]//2019 IEEE/CVF International Conference on Computer Vision(ICCV). Seoul: IEEE, 2019.
[29] BORJI A, CHENG M M, JIANG H Z, et al. Salient object detection: a benchmark[J]. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society, 2015, 24(12): 5706-5722.
[30] FAN D P, CHENG M M, LIU Y, et al. Structure-measure: a new way to evaluate foreground maps[C]//2017 IEEE International Conference on Computer Vision(ICCV). Venice: IEEE, 2017: 4558-4567.
[31] FAN D P, GONG C, CAO Y, et al. Enhanced-alignment measure for binary foreground map evaluation[EB/OL]. 2018: arXiv: 1805.10421. https://arxiv.org/abs/1805.10421.
[32] TANG L, LI B, DING S H, et al. Disentangled high quality salient object detection[C]//2021 IEEE/CVF International Conference on Computer Vision(ICCV). Montreal: IEEE, 2021: 3560-3570.
[33] QIN X B, ZHANG Z C, HUANG C Y, et al. BASNet: boundary-aware salient object detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Long Beach: IEEE, 2019: 7471-7481.
[34] CHEN Z Y, XU Q Q, CONG R M, et al. Global context-aware progressive aggregation network for salient object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 10599-10606.
[35] MA M C, XIA C Q, LI J A. Pyramidal feature shrinking for salient object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(3): 2311-2318.

Similar Literature/References:

[1] WANG Zheng-hong, JIANG Jian-ming, ZHANG Xiao-ming. Nonlinear Compensation of Methane Sensor Based on Neural Network[J]. Journal of Changzhou University(Natural Science Edition), 2005, (01): 37.
[2] NI Zhong-wen, JIANG Xing-fang, WANG Ye-hui, et al. Neural Network Model Used in Image Reconstruction from a Few Projections and Numeric Simulation[J]. Journal of Changzhou University(Natural Science Edition), 2005, (02): 1.
[3] ZHOU Xu-yang, ZHA Li-quan. Computing Vibrational Modal Parameters by Neural Network[J]. Journal of Changzhou University(Natural Science Edition), 2002, (04): 50.
[4] LYU Jidong, WANG Yijie, XIA Zhengwang, et al. Research on Natural Scene Apple Recognition Based on Improved Mask R-CNN[J]. Journal of Changzhou University(Natural Science Edition), 2022, 34(01): 68. [doi:10.3969/j.issn.2095-0411.2022.01.008]

Memo:
Received: 2023-07-06.
Funding: Supported by the National Natural Science Foundation of China (61906021).
Author biography: ZHU Jiaqun (1981—), female, born in Lianyungang, Jiangsu; master's degree, lecturer. Corresponding author: XU Shoukun (1972—), E-mail: shoukxu@126.com