Abstract: To address the motion blur and low temporal resolution that traditional cameras suffer from when capturing eye movements, an event camera is used for close-range capture and a spiking-eye dataset is constructed. A spiking neural network model with coordinate attention, referred to as CA-SpikingRepVGG, is proposed. The model reads encoded event data, extracts features with an attention-based backbone network, and performs detection with a detection head. Experimental results show that CA-SpikingRepVGG achieves a mean average precision RP of 70.8%. Compared with SpikingVGG-16, the model improves RP by 15.9% and Rr by 14.2%. With only one-third of the training time required by SpikingDensenet, it improves RP by 1.8% and Rr by 0.9%. These results indicate that the proposed model detects and tracks the eye more reliably during eye movement and effectively accomplishes gaze estimation tasks.
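The abstract states that the model reads encoded event data but does not say how the events are encoded. As a minimal sketch only, the following assumes one common scheme: events are split into equal time bins, and each bin becomes a two-channel frame that counts negative and positive polarity events per pixel. The function name `encode_events`, the event layout `(x, y, t, polarity)`, and all parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, t, polarity), polarity 1 = brighter, 0 = darker.
Event = Tuple[int, int, float, int]


def encode_events(events: List[Event], width: int, height: int,
                  t_start: float, t_end: float, num_bins: int):
    """Accumulate events into num_bins frames indexed [bin][polarity][y][x].

    Hypothetical encoding for illustration; the paper's scheme may differ.
    """
    if t_end <= t_start or num_bins <= 0:
        raise ValueError("need t_end > t_start and num_bins > 0")
    # One zero-initialised count grid per time bin and polarity channel.
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(num_bins)]
    bin_len = (t_end - t_start) / num_bins
    for x, y, t, p in events:
        if not (t_start <= t < t_end and 0 <= x < width and 0 <= y < height):
            continue  # drop events outside the time window or sensor area
        b = min(int((t - t_start) / bin_len), num_bins - 1)
        frames[b][1 if p else 0][y][x] += 1
    return frames


if __name__ == "__main__":
    demo = [(0, 0, 0.1, 1), (1, 0, 0.2, 0), (0, 0, 0.6, 1), (3, 2, 0.9, 1)]
    frames = encode_events(demo, width=4, height=3,
                           t_start=0.0, t_end=1.0, num_bins=2)
    print(len(frames), "time bins of", len(frames[0]), "polarity channels")
```

Under this assumption, each time bin would be presented to the spiking backbone as one simulation step, so the number of bins trades temporal resolution against compute.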