Abstract: A lightweight pedestrian detection algorithm based on feature fusion is proposed to address the low detection accuracy caused by dense pedestrian targets, small target scales, and varying background illumination around the target. The algorithm builds on the basic framework of TinyYOLOv4. First, a new backbone feature extraction network (CSPDarknet53-S) is built by adding a new feature extraction module (REM) to the original backbone network, which enhances the network's ability to extract pedestrian features. Second, the feature fusion structure is improved: after high-level and low-level feature maps are extracted from the backbone network, a feature fusion module (RM block) is inserted between the backbone network and the feature fusion network to enlarge the receptive field, and shallow feature information is then introduced to retain more small-target features, forming a new feature fusion network (IFFM). Finally, the fused feature maps are processed by the YOLO Head to produce the output. Experimental results show that the proposed algorithm achieves higher detection accuracy and better detection results on pedestrian datasets (the person data of PASCAL VOC2007 and VOC2012).
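To illustrate why introducing shallow feature information can help retain small-target features, the following minimal sketch compares how many feature-map cells a small pedestrian spans at different downsampling strides. The 416-pixel input size, the stride values (8, 16, 32), the level names, and the 24-pixel pedestrian width are illustrative assumptions, not values taken from this paper, and the helper functions are hypothetical.

```python
# Illustrative sketch only: input size, strides, and object size are
# assumptions chosen for the example, not settings reported in the paper.

INPUT_SIZE = 416  # assumed square input resolution in pixels

# Assumed downsampling strides for three feature levels.
STRIDES = {"shallow": 8, "middle": 16, "deep": 32}


def grid_size(input_size: int, stride: int) -> int:
    """Side length of the feature map produced at a given stride."""
    return input_size // stride


def cells_covered(object_px: int, stride: int) -> float:
    """Approximate number of feature-map cells an object spans on one axis."""
    return object_px / stride


def summarize(object_px: int) -> dict:
    """Map each feature level to (grid side length, cells spanned by the object)."""
    return {
        name: (grid_size(INPUT_SIZE, stride), cells_covered(object_px, stride))
        for name, stride in STRIDES.items()
    }


if __name__ == "__main__":
    # A hypothetical pedestrian that is 24 pixels wide in the input image.
    for name, (grid, cells) in summarize(24).items():
        print(f"{name}: {grid}x{grid} grid, object spans {cells:.2f} cells")
```

Under these assumptions, the 24-pixel pedestrian spans less than one cell on the deepest map but three cells on the shallow map, which is the intuition behind feeding shallower features into the fusion network.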