This title appears in the Scientific Report: 2013

Functional role of opponent, dopamine modulated D1/D2 plasticity in prediction error-driven reinforcement learning in the basal ganglia

Personal Name(s):	Jitsev, Jenia (Corresponding author)
	Abraham, Nobi / Tittgemeyer, Marc / Morrison, Abigail
Contributing Institute:	Computational and Systems Neuroscience (INM-6); Computational and Systems Neuroscience (IAS-6)
Imprint:	2013
Conference:	Computational Psychiatry 2013, Miami (USA), 2013-10-22 - 2013-10-23
Document Type:	Poster
Research Program:	W2/W3 Professorinnen Programm der Helmholtzgemeinschaft; Supercomputing and Modelling for the Human Brain; Helmholtz Alliance on Systems Biology; Signaling pathways, cell and tumor biology

In this work, we introduce a spiking actor-critic network model of learning from both reward and punishment in the basal ganglia. Both the dorsal (actor) and the ventral (critic) striatum are assumed to contain populations of D1 and D2 medium spiny neurons (MSNs). In the ventral striatum, this allows positive and negative expected outcomes to be represented separately by the respective D1 and D2 MSN populations, which we hypothesize to reside in the shell of the nucleus accumbens. The positive and negative outcome expectations are fed to dopamine (DA) neurons in the ventral tegmental area (VTA), which compute the total prediction error and signal it by DA release. Based on recent experimental work [1], the DA level is assumed to modulate the plasticity of D1 and D2 synapses in opposite directions: a high DA level induces LTP at D1 synapses and LTD at D2 synapses, while a low DA level induces the reverse. Crucially, this form of opponent plasticity implements a temporal-difference (TD)-like update of both positive and negative outcome expectations and performs the appropriate adaptation of action preferences.

We implemented the network in the NEST simulator [2] using leaky integrate-and-fire spiking neurons and designed a battery of experiments in various grid-world tasks. Across these tasks, the network can learn both to approach delayed rewards and to consistently avoid punishments, a combination that posed severe difficulties for the previous model without D1/D2 segregation [3]. The model thus highlights the functional role of D1/D2 MSN segregation within the striatum in implementing appropriate TD-like learning from both reward and punishment, and it explains the necessity of the opposing directions of DA-dependent plasticity found at synapses converging on the distinct striatal MSN types.
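
The opponent plasticity rule and the TD-like update described above can be illustrated with a minimal sketch. It is a hypothetical rate-based, tabular abstraction in Python, not the spiking NEST implementation used in this work. The critic stores a positive expectation V+(s) in "D1" weights and a negative expectation V-(s) in "D2" weights. The prediction error is assumed to take the standard TD form delta = r + gamma*(V+(s') - V-(s')) - (V+(s) - V-(s)). The same opponent rule (D1 weight += alpha*delta, D2 weight -= alpha*delta, both clipped at zero) is applied to actor weights whose difference D1 - D2 serves as the action preference. The 1x7 corridor task (reward at the right end, punishment at the left end), the softmax action selection, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical rate-based, tabular sketch; NOT the spiking NEST model.
# Critic: "D1" weights hold positive expectations, "D2" weights negative ones.
# Actor: D1/D2 weights per (state, action); action preference = D1 - D2.

N_STATES = 7                  # 1 x 7 corridor; states 0 and 6 are terminal
START = 3
REWARD, PUNISH = 1.0, -1.0    # assumed task: right end rewarded, left end punished
ACTIONS = (-1, +1)            # move left, move right


def opponent_update(w_d1, w_d2, delta, lr):
    """High DA (delta > 0): LTP on D1, LTD on D2. Low DA (delta < 0): the reverse.
    Weights are clipped at zero, like non-negative synaptic efficacies."""
    return max(0.0, w_d1 + lr * delta), max(0.0, w_d2 - lr * delta)


def softmax(prefs, beta):
    z = beta * (prefs - prefs.max())   # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def run(episodes=500, gamma=0.9, lr_critic=0.1, lr_actor=0.1, beta=3.0, seed=0):
    rng = np.random.default_rng(seed)
    v_d1 = np.zeros(N_STATES)                  # positive expectation V+(s)
    v_d2 = np.zeros(N_STATES)                  # negative expectation V-(s)
    a_d1 = np.zeros((N_STATES, len(ACTIONS)))  # actor "Go"-like weights
    a_d2 = np.zeros((N_STATES, len(ACTIONS)))  # actor "NoGo"-like weights

    def value(s):
        # Terminal states carry no future expectation.
        if s in (0, N_STATES - 1):
            return 0.0
        return v_d1[s] - v_d2[s]

    for _ in range(episodes):
        s = START
        for _ in range(100):                   # step cap per episode
            probs = softmax(a_d1[s] - a_d2[s], beta)
            a = rng.choice(len(ACTIONS), p=probs)
            s_next = s + ACTIONS[a]
            r = REWARD if s_next == N_STATES - 1 else PUNISH if s_next == 0 else 0.0
            # Total prediction error, assumed to be signalled by the DA level.
            delta = r + gamma * value(s_next) - value(s)
            # The same opponent rule updates critic and actor weights.
            v_d1[s], v_d2[s] = opponent_update(v_d1[s], v_d2[s], delta, lr_critic)
            a_d1[s, a], a_d2[s, a] = opponent_update(a_d1[s, a], a_d2[s, a], delta, lr_actor)
            if s_next in (0, N_STATES - 1):
                break
            s = s_next
    # Net expectations V+ - V- and net action preferences D1 - D2.
    return v_d1 - v_d2, a_d1 - a_d2


values, prefs = run()
print(np.round(values, 2))
print(prefs.argmax(axis=1)[1:-1])   # greedy action index per non-terminal state
```

With these settings, the greedy policy (largest D1 - D2 preference) should move right (action index 1) in every non-terminal state. Because the weights are clipped at zero, a single update changes the net expectation V+ - V- by between alpha*|delta| and 2*alpha*|delta|, depending on whether one of the two weights already sits at zero.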
The approach can be further extended to study how abnormal D1/D2 plasticity may lead to a reorganization of the basal ganglia network towards pathological, dysfunctional states, such as those observed in Parkinson's disease under progressive dopamine depletion.

[1] Shen, W., Flajolet, M., Greengard, P. and Surmeier, D. J. Dichotomous dopaminergic control of striatal synaptic plasticity. Science, 2008, 321, 848-851.
[2] Gewaltig, M.-O. and Diesmann, M. NEST. Scholarpedia, 2007, 2(4), 1430.
[3] Potjans, W., Diesmann, M. and Morrison, A. An imperfect dopaminergic error signal can drive temporal-difference learning. PLoS Comput. Biol., 2011, 7.