Neural network-based encoding in free-viewing fMRI with gaze-aware models
Abstract
Representations learned by convolutional neural networks (CNNs) bear a remarkable resemblance to information processing in the primate visual system. This correspondence, however, has been established largely on large neuroimaging datasets collected under diverse, naturalistic visual stimulation but with participants instructed to maintain central fixation. This viewing condition diverges substantially from ecologically valid visual behaviour, suppresses activity in regions involved in active vision, and imposes a substantial cognitive load. We present a modification of the encoding-model framework that adapts it to naturalistic-vision datasets acquired under fully natural, fixation-free viewing conditions by incorporating eye-tracking data. Our gaze-aware encoding models were trained on the StudyForrest dataset, which features task-free naturalistic movie viewing. By combining eye-tracking data with the visual content of the movie frames, we generate subject-wise, gaze-contingent stimulus feature time series, constructed by sampling only the spatially and temporally relevant elements of the CNN feature map for each fixation. Our results demonstrate that gaze-aware encoding models match the performance of conventional encoding models with 112x fewer model parameters, and that they are especially beneficial for participants with more dynamic eye-movement patterns. This approach therefore opens the door to ecologically valid models built in more naturalistic settings, such as playing games or navigating virtual environments.
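The core operation described above, sampling only the fixated portion of a CNN feature map for each frame, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the sampling window or interpolation scheme, so the example assumes nearest-neighbour mapping from gaze pixel coordinates to the feature-map grid, and the function and parameter names are hypothetical.

```python
import numpy as np

def sample_fixation_features(feature_map, fixation_xy, image_size):
    """Extract the CNN feature vector at one fixation's spatial location.

    feature_map : (C, H, W) activation map for one movie frame.
    fixation_xy : (x, y) gaze position in stimulus pixel coordinates.
    image_size  : (width, height) of the stimulus frame in pixels.
    """
    C, H, W = feature_map.shape
    x, y = fixation_xy
    img_w, img_h = image_size
    # Map gaze pixel coordinates onto the coarser feature-map grid
    # (nearest-neighbour; clamp to stay inside the map).
    col = min(int(x / img_w * W), W - 1)
    row = min(int(y / img_h * H), H - 1)
    return feature_map[:, row, col]  # (C,) gaze-specific feature vector

def gaze_feature_time_series(feature_maps, fixations, image_size):
    """Stack per-fixation feature vectors into a (T, C) time series."""
    return np.stack([
        sample_fixation_features(fm, fx, image_size)
        for fm, fx in zip(feature_maps, fixations)
    ])
```

Because each time point keeps only a single C-dimensional vector rather than the full C x H x W map, the downstream encoding model needs far fewer regression weights, which is consistent with the parameter reduction the abstract reports.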
Source: arXiv:2603.11663v1 - http://arxiv.org/abs/2603.11663v1 (PDF: https://arxiv.org/pdf/2603.11663v1)