Vectorized instructive signals in cortical dendrites

TL;DR

This study demonstrates that cortical dendrites implement vectorized instructive signals for credit assignment in the brain, using a neurofeedback BCI task in mice. Dendritic activity encodes task-related information like reward and error, and disrupting these signals impairs learning.

Key Takeaways

• Vectorized instructive signals in cortical dendrites solve the credit assignment problem by tailoring teaching signals to individual neurons.
• Dendritic activity contains task-related information (e.g., reward, error) not fully present in somatic activity, reflecting neuron-specific contributions.
• Optogenetic perturbation of dendritic signals disrupts learning, confirming their functional role in neural adaptation.
• The study uses a neurofeedback BCI task to specify reward functions, enabling direct testing of subcellular learning mechanisms.

Abstract

Vectorization of teaching signals is a key element of almost all modern machine learning algorithms, including backpropagation, target propagation and reinforcement learning. Vectorization allows a scalable and computationally efficient solution to the credit assignment problem by tailoring instructive signals to individual neurons. Recent theoretical models have suggested that neural circuits could implement single-phase vectorized learning at the cellular level by processing feedforward and feedback information streams in separate dendritic compartments^{1,2,3,4,5}. This presents a compelling, but untested, hypothesis for how cortical circuits could solve credit assignment in the brain. Here we used a neurofeedback brain–computer interface task with an experimenter-defined reward function to test for vectorized instructive signals in dendrites. We trained mice to modulate the activity of two spatially intermingled populations (four or five neurons each) of layer 5 pyramidal neurons in the retrosplenial cortex to rotate a visual grating towards a target orientation while we recorded GCaMP activity from somas and corresponding distal apical dendrites. We observed that the relative magnitudes of somatic and dendritic signals could be predicted using the activity of the surrounding network and contained information about task-related variables that could serve as instructive signals, including reward and error. The signs of these putative teaching signals depended on the causal role of individual neurons in the task and predicted changes in overall activity over the course of learning. Furthermore, targeted optogenetic perturbation of these signals disrupted learning. These results demonstrate a vectorized instructive signal in the brain, implemented via semi-independent computation in cortical dendrites, unveiling a potential mechanism for solving credit assignment in the brain.

Main

Learning is the product of changes in the strength of synaptic connections between neurons^{6,7,8,9,10,11,12,13}. Synaptic modifications can have difficult-to-predict effects on network output, particularly in complex hierarchical networks such as the brain. The challenge of determining how individual synapses should be altered to improve task performance is known as the credit assignment problem^{14,15,16,17,18}. Whereas this problem is effectively solved in artificial neural networks (ANNs) by the backpropagation-of-error algorithm^{19}, how credit assignment is solved in the brain remains unknown^{14,15}.

Recent theoretical work has proposed several models by which biological circuits could solve credit assignment, including target learning and backpropagation-like algorithms^{1,2,3,4,5,20,21}. Central to both artificial and biologically inspired solutions to credit assignment is the vectorization of instructive signals, as opposed to the broadcasting of a single scalar teaching signal^{14}. In addition to vectorization, effective learning requires instructive signals to be separable from feedforward inputs to prevent interference^{15}. In ANNs, this is achieved via temporal separation, which has long been thought to be biologically implausible. One hypothesis is that in cortex, credit-related information is spatially, rather than temporally, segregated in the apical dendrites of pyramidal neurons^{15}. This aligns with anatomical and circuit evidence that feedforward inputs are received perisomatically and feedback inputs are received in the distal dendrites^{22,23,24,25,26,27,28,29,30,31}. However, direct evidence regarding the subcellular mechanisms of credit assignment is lacking.
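
The contrast between a broadcast scalar signal and a vectorized one can be made concrete with a toy linear readout. The sketch below is illustrative only and is not a model from this study: the readout weights, the learning rate and the rule of nudging each unit's activity directly are assumptions chosen for simplicity.

```python
# Toy comparison of scalar versus vectorized teaching signals.
# All values (weights, learning rate, target) are hypothetical.

def readout(activity, weights):
    """Network output as a weighted sum of unit activities."""
    return sum(w * a for w, a in zip(weights, activity))

def scalar_signal(error, n_units):
    """Broadcast the same error value to every unit."""
    return [error] * n_units

def vectorized_signal(error, weights):
    """Scale the error by each unit's causal weight on the output,
    that is, the gradient of the output with respect to that unit."""
    return [error * w for w in weights]

def apply_signal(activity, signal, lr=0.1):
    """Nudge each unit's activity along its own teaching signal."""
    return [a + lr * s for a, s in zip(activity, signal)]

weights = [0.5, 0.5, -0.5, -0.5]  # two units raise the output, two lower it
activity = [1.0, 1.0, 1.0, 1.0]
target = 1.0

error = target - readout(activity, weights)

after_scalar = apply_signal(activity, scalar_signal(error, len(activity)))
after_vector = apply_signal(activity, vectorized_signal(error, weights))

error_scalar = target - readout(after_scalar, weights)
error_vector = target - readout(after_vector, weights)

print(f"initial error:           {error:.2f}")
print(f"after scalar update:     {error_scalar:.2f}")
print(f"after vectorized update: {error_vector:.2f}")
```

In this toy case the scalar broadcast raises all four units equally, so their opposing contributions cancel and the error does not change. The vectorized signal gives the two negatively weighted units a teaching signal of the opposite sign, and the error shrinks.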

Vectorized teaching signals at the dendritic level should meet four experimentally testable conditions. First, dendritic activity should contain information that is not present in somatic activity alone (although somas could theoretically transmit gradients using qualitatively different spiking patterns^{2,4,32}, the cable properties of dendrites predict some level of independence between somatic and dendritic activity). Second, dendritic activity should encode information about task performance that could serve as instructive signals, such as reward and error representations. Third, dendritic activity should reflect the contribution of that neuron to task performance (that is, the reward function). Fourth, disrupting vectorized instructive dendritic signals should impair learning.
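
As an illustration of the first condition, one simple way to ask whether dendritic activity carries information beyond somatic activity is to regress the dendritic trace on the somatic trace and inspect what is left over. The sketch below is an assumption about how such a check could look, using ordinary least squares on made-up traces; it is not the analysis used in this study, and the function names are hypothetical.

```python
# Hypothetical check: what part of a dendritic trace is not
# linearly explained by the matching somatic trace?

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def dendritic_residual(soma, dendrite):
    """Dendritic signal minus its best linear prediction from the soma."""
    slope, intercept = fit_line(soma, dendrite)
    return [d - (slope * s + intercept) for s, d in zip(soma, dendrite)]

# Made-up traces: the dendrite mostly tracks the soma (factor of 2),
# with one extra dendritic event at the third time point.
soma = [0.0, 1.0, 2.0, 3.0]
dendrite = [0.0, 2.0, 5.0, 6.0]

residual = dendritic_residual(soma, dendrite)
largest = max(range(len(residual)), key=lambda i: residual[i])
print("residual:", [round(r, 2) for r in residual])
print("largest residual at time point", largest)
```

A residual that is reliably non-zero and varies with task variables would be consistent with the first two conditions; a residual of zero would mean the dendritic trace is a rescaled copy of the somatic one.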

Specifying a reward function using a BCI task

Evaluating credit assignment in biological neural networks has thus far proved impossible^{14,15}. Teaching signals can only be defined relative to a reward function that maps neural activity to task performance. It is unclear whether such functions are explicitly represented in the brain. Even if they are, experimenters are blind to their specific formulation in terms of neural activity^{15}. Neurofeedback brain–computer interface (BCI) tasks present a potential solution to this problem by directly coupling neural activity to task performance, thereby allowing the experimenter to specify the reward function to be optimized^{14,20,21}. Previous studies have shown that mice are able to learn BCI tasks using a variety of feedback stimuli and brain areas and that learning induces changes in the activity of the neurons controlling the BCI, including in the hippocampus and various sensory and motor cortices^{33,34,35,36,37,38,39}. Here we leveraged a visually guided neurofeedback BCI task in cortical pyramidal neurons to test subcellular mechanisms for error- and reward-related signalling (Fig. 1a–c and Supplementary Figs. 1 and 2). We trained head-fixed mice under a two-photon microscope to control the activity of two spatially intermingled sets of GCaMP7f-labelled layer 5 pyramidal neurons in the retrosplenial cortex (RSC), designated P+ and P− (selection criteria in Extended Data Figs. 1 and 4b and Methods). The difference in mean somatic GCaMP activity of P+ versus P− neurons was coupled to rotation of a visual grating relative to a rewarded target angle^{33,34,35,36,38,39} (Fig. 1d–f and Supplementary Data Fig. 1). We selected RSC owing to the optical accessibility of layer 5 and previous demonstration of independent dendritic events in this area^{40}. We recorded GCaMP activity at 15 Hz in the proximal trunk dendrite as a proxy for somatic activity; this allowed imaging of many neurons while reducing signal contamination owing to the more precise spatial footprint and faster signal kinetics of the apical trunk^{41,42,43}. We measured task performance with two metrics: accuracy, which represented the fraction of rewarded trials; and speed, which represented the number of rewards obtained per minute. Mice (n = 6) learned the task by both metrics (Fig. 1g and Extended Data Figs. 2 and 3).
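
The coupling between neural activity and feedback described above can be sketched as a small decoder plus the two performance metrics. Only the difference of population means, the fraction of rewarded trials and the rewards per minute come from the text; the linear mapping, the gain, the starting offset, the clipping range and the function names are hypothetical placeholders.

```python
from statistics import mean

def control_signal(p_plus, p_minus):
    """Difference in mean activity between the P+ and P- populations."""
    return mean(p_plus) - mean(p_minus)

def angle_from_target(control, start_offset=90.0, gain=45.0, max_offset=180.0):
    """Hypothetical linear decoder: a larger control signal rotates the
    grating closer to the target (offset 0 degrees means target reached)."""
    offset = start_offset - gain * control
    return min(max_offset, max(0.0, offset))

def accuracy(rewarded_trials):
    """Fraction of trials that ended in reward."""
    return sum(rewarded_trials) / len(rewarded_trials)

def rewards_per_minute(n_rewards, session_seconds):
    """Speed metric: rewards obtained per minute of task time."""
    return n_rewards / (session_seconds / 60.0)

# Made-up activity for one imaging frame (arbitrary units).
p_plus = [1.0, 2.0, 3.0, 2.0]
p_minus = [0.5, 0.5, 1.0, 1.0, 0.5]

control = control_signal(p_plus, p_minus)
print(f"control signal: {control:.2f}")
print(f"offset from target: {angle_from_target(control):.1f} degrees")
print(f"accuracy: {accuracy([True, False, True, True]):.2f}")
print(f"speed: {rewards_per_minute(3, 120.0):.2f} rewards/min")
```

Because the decoder is fixed by the experimenter, the contribution of each neuron to the outcome is known exactly: in this sketch it is positive for P+ neurons, negative for P− neurons and zero for all others.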

**Fig. 1: Mice learn a neurofeedback BCI task through the differential regulation of P+ and P− neurons.**

We compared activity levels of P+ and P− populations, as well as the population of surrounding neurons that were not directly involved in the rotation of the stimulus (termed P₀), across days of task performance. We imaged the same neurons longitudinally throughout all experiments. We found that learning was accompanied by differential regulation of the activity of P+ and P− neurons over days (Fig. 1h,i), with P+ neurons maintaining their activity levels while P− neurons were downregulated. Whereas, on average, changes in activity in P₀ neurons resembled changes in P+ neurons (Fig. 1i), selecting the subpopulation of P₀ neurons whose activity levels on day 1 matched those of P+ and P− neurons revealed that changes in activity in P₀ neurons fell in between those of P+ and P− neurons (Extended Data Fig. 4). As the most active neurons on day 1 were also those that were most strongly downregulated (Extended Data Fig. 4c), our results are consistent with a model of learnin