Abstract: With the development of drone technology, recognizing ground vehicles in drone imagery has become important for both disaster rescue and relief and traffic management. In practice, however, factors such as flight altitude mean that targets in the image are generally small and their features indistinct, which makes them difficult to detect with existing algorithms. This paper therefore proposes a multi-target image detection method based on multi-scale fusion. Using Faster R-CNN as the base framework, the method fuses feature information from different levels and combines it with context information to detect small targets in unmanned aerial vehicle (UAV) images. The method is evaluated on ground vehicle detection using the VisDrone dataset. Experiments show that it detects ground vehicle targets in drone images well: its accuracy reaches 88%, an increase of 3.8% over the other algorithms compared.
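The idea of fusing feature information from different levels can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, so the code below assumes a top-down, additive scheme in the style of a feature pyramid: each coarser map is upsampled by nearest-neighbour interpolation and added to the next finer map. The function names (`upsample2x`, `fuse`, `fuse_pyramid`) and the single-channel, list-of-rows representation are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(fine, coarse):
    """Add an upsampled coarse map to a fine map (assumed fusion operator)."""
    up = upsample2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly 2x the size of coarse map")
    return [[f + u for f, u in zip(f_row, u_row)]
            for f_row, u_row in zip(fine, up)]


def fuse_pyramid(levels):
    """Top-down fusion over maps ordered from finest to coarsest.

    Returns fused maps in the same order; the coarsest map is unchanged.
    """
    fused = [levels[-1]]
    for fine in reversed(levels[:-1]):
        fused.insert(0, fuse(fine, fused[0]))
    return fused


if __name__ == "__main__":
    levels = [
        [[1, 1, 1, 1] for _ in range(4)],  # finest level, 4x4
        [[1, 2], [3, 4]],                  # middle level, 2x2
        [[10]],                            # coarsest level, 1x1
    ]
    for fmap in fuse_pyramid(levels):
        print(fmap)
```

In this sketch the finest fused map carries information from every coarser level, which is the property that makes multi-scale fusion attractive for small targets: fine spatial detail is kept while coarser, more semantic context is added. A practical detector would apply the same idea to multi-channel convolutional feature maps, typically with learned projection layers before the addition.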