Gaze Estimation Method Based on Coordinate Attention and Spiking Neural Network

doi:10.3969/j.issn.1000-1158.2024.07.07


Abstract: Conventional cameras are prone to motion blur and have low temporal resolution when capturing eye movements. To address these problems, an event camera is used for near-eye capture to construct the Spiking-Eye dataset, and a spiking neural network model with coordinate attention, CA-SpikingRepVGG, is proposed. The model reads encoded event data, extracts features with a backbone network equipped with coordinate attention, and feeds the features to a detection head for detection. Experimental results show that CA-SpikingRepVGG achieves an average detection precision RP of 70.8%. Compared with SpikingVGG-16, RP improves by 15.9% and recall Rr improves by 14.2%. Compared with SpikingDensenet, the model needs only one-third of the training time while improving RP by 1.8% and Rr by 0.9%. These results indicate that the model detects and tracks the eye more reliably during eye movement and is well suited to gaze estimation tasks.

Key words: machine vision; object detection; spiking neural network; gaze estimation; coordinate attention; recall; event camera
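
The abstract reports results as precision (RP) and recall (Rr) of a detector. As a rough illustration of how such figures are commonly computed for bounding-box detection, the sketch below matches predicted boxes to ground-truth boxes by intersection over union (IoU) and counts true positives. The IoU threshold of 0.5, the greedy one-to-one matching, and the names `iou` and `precision_recall` are assumptions made for this example; the abstract does not state the matching criterion the authors used.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], thr: float = 0.5
) -> Tuple[float, float]:
    """Greedy one-to-one matching; thr is a hypothetical IoU threshold."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            value = iou(p, t)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Toy data: two of three predictions overlap a ground-truth eye box.
preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
truths = [(1, 1, 11, 11), (20, 20, 30, 30)]
p, r = precision_recall(preds, truths)
print(f"precision={p:.3f} recall={r:.3f}")
```

Under these assumptions, precision falls when the detector produces boxes that match no eye region, while recall falls when eye regions go undetected, which is why the abstract reports both.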

Received: 2023-09-04    Published: 2024-07-04

PACS: TB96

Funding: Natural Science Foundation of Liaoning Province (2022-MS-276)

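About the author: WANG Hongxia (born 1977), from Shenyang, Liaoning, is a professor at Shenyang Ligong University. Her research focuses mainly on artificial intelligence and Internet of Things technology. Email: sunny58258@sina.com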

Cite this article:

WANG Hongxia, ZHAO Zhiguo. Gaze Estimation Method Based on Coordinate Attention and Spiking Neural Network[J]. Acta Metrologica Sinica, 2024, 45(7): 982-988.

Link to this article:

http://jlxb.china-csm.org:81/Jwk_jlxb/CN/10.3969/j.issn.1000-1158.2024.07.07 or http://jlxb.china-csm.org:81/Jwk_jlxb/CN/Y2024/V45/I7/982
