WANG Hongyuan, QI Pengyu, TANG Ying, et al. Pedestrian detection algorithm based on YOLOv4[J]. Journal of Changzhou University (Natural Science Edition), 2024, 36(05): 52-60. [doi:10.3969/j.issn.2095-0411.2024.05.006]

Pedestrian Detection Algorithm Based on YOLOv4

常州大学学报(自然科学版)[ISSN:2095-0411/CN:32-1822/N]

Volume:
Vol. 36
Issue:
2024, No. 05
Pages:
52-60
Column:
Computer and Information Engineering
Publication date:
2024-09-28

Article Info

Title:
Pedestrian detection algorithm based on YOLOv4
Article ID:
2095-0411(2024)05-0052-09
Authors:
WANG Hongyuan, QI Pengyu, TANG Ying, ZHANG Ji, ZHU Fan, XU Zhichen
School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
Keywords:
pedestrian detection; single-stage object detection; repulsion loss; occluded pedestrians
CLC number:
TP 391.41
DOI:
10.3969/j.issn.2095-0411.2024.05.006
Document code:
A
Abstract:
In real-world scenes the YOLOv4 model struggles with occluded pedestrians. This paper improves YOLOv4 for pedestrian detection while preserving its real-time performance. To strengthen detection of occluded pedestrians, the K-means++ clustering algorithm is used to redesign the prior (anchor) boxes to fit typical pedestrian target sizes, and a repulsion loss term is introduced that maximises the distance between a proposal and its nearest non-matched ground-truth box while minimising the overlap between proposals assigned to different targets. Experiments on the challenging CrowdHuman and Caltech datasets verify the effectiveness of the method, and applying the model to video pedestrian detection in real scenes further confirms the improvements.
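The anchor redesign described in the abstract can be sketched as follows. This is an illustrative numpy implementation, not the authors' code: it clusters the (width, height) pairs of pedestrian boxes using 1 − IoU as the distance, as is standard for YOLO anchor design, and uses a deterministic farthest-point variant of k-means++ seeding; all function names are assumptions.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, treating boxes as aligned at a common corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (centroids[:, 0] * centroids[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster box sizes into k anchors with distance d = 1 - IoU."""
    rng = np.random.default_rng(seed)
    centroids = boxes[[rng.integers(len(boxes))]]          # first seed: a random box
    while len(centroids) < k:                              # farthest-point seeding
        d_min = (1.0 - iou_wh(boxes, centroids)).min(axis=1)
        centroids = np.vstack([centroids, boxes[d_min.argmax()]])
    for _ in range(iters):
        assign = (1.0 - iou_wh(boxes, centroids)).argmin(axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]  # sort by area
```

In practice the clustering would be run over all ground-truth pedestrian boxes of the training set (e.g. with k = 9 for YOLOv4's three detection scales), and the resulting tall, narrow anchors replace the default COCO-derived ones.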
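The repulsion term can likewise be sketched. This numpy fragment covers only the RepGT component of repulsion loss (pushing each prediction away from its most-overlapping non-matched ground truth via a smooth-ln penalty on intersection-over-ground-truth); the (x1, y1, x2, y2) box format, the value σ = 0.5, and the function names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def iou(a, b):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(br - tl, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def iog(preds, gts):
    """Intersection over ground-truth area: how much of each GT a prediction covers."""
    tl = np.maximum(preds[:, None, :2], gts[None, :, :2])
    br = np.minimum(preds[:, None, 2:], gts[None, :, 2:])
    wh = np.clip(br - tl, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    return inter / area_g[None, :]

def smooth_ln(x, sigma=0.5):
    """Smoothed -ln(1 - x) penalty: logarithmic below sigma, linear above."""
    return np.where(x <= sigma,
                    -np.log(np.clip(1.0 - x, 1e-10, None)),
                    (x - sigma) / (1.0 - sigma) - np.log(1.0 - sigma))

def rep_gt_loss(preds, gts, matched, sigma=0.5):
    """RepGT: penalise overlap with each prediction's nearest *non-matched* GT."""
    overlaps = iou(preds, gts)
    overlaps[np.arange(len(preds)), matched] = -1.0  # mask out the assigned GT
    rep_idx = overlaps.argmax(axis=1)                # nearest non-target GT
    g = iog(preds, gts)[np.arange(len(preds)), rep_idx]
    return smooth_ln(g, sigma).mean()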


Memo

Received: 2024-03-15.
Funding: National Natural Science Foundation of China (61976028, 61572085, 61806026, 61502058); Natural Science Foundation of Jiangsu Province (BK20180956).
About the author: WANG Hongyuan (born 1960), male, from Changshu, Jiangsu; Ph.D., professor. E-mail: hywang@cczu.edu.cn