Emergence of associative learning in a neuromorphic inference network
Abstract
Objective.
In the theoretical framework of predictive coding and active inference, the brain can be viewed as instantiating a rich generative model of the world that predicts incoming sensory data while continuously updating its parameters via minimization of prediction errors.
While this theory has been successfully applied to cognitive processes—by modelling the activity of functional neural networks at a mesoscopic scale—the validity of the approach when modelling neurons as an ensemble of inferring agents, in a biologically plausible architecture, remained to be explored.
Approach.
We modelled a simplified cerebellar circuit with individual neurons acting as Bayesian agents to simulate the classical delayed eyeblink conditioning protocol. Neurons and synapses adjusted their activity to minimize their prediction error, which was used as the network cost function. This cerebellar network was then implemented in hardware by replicating digital neuronal elements via a low-power microcontroller.
Main results.
Persistent changes of synaptic strength—that mirrored neurophysiological observations—emerged via local (neurocentric) prediction error minimization, leading to the expression of associative learning. The same paradigm was effectively emulated in low-power hardware showing remarkably efficient performance compared to conventional neuromorphic architectures.
Significance.
These findings show that: (a) an ensemble of free energy minimizing neurons—organized in a biologically plausible architecture—can recapitulate functional self-organization observed in nature, such as associative plasticity, and (b) a neuromorphic network of inference units can learn unsupervised tasks without embedding predefined learning rules in the circuit, thus providing a potential avenue to a novel form of brain-inspired artificial intelligence.
1. Introduction
Clearly, implementing the FEP in a specific neuromorphic architecture, like that anticipated for the cerebellum, raises a series of questions. Does the FEP effectively apply to single neurons? What is the role of network connectivity? Will plasticity driven by neurocentric active inference emerge at synaptic sites that mirror biology?
We used the connectivity architecture of the cerebellum to test two hypotheses: (a) the emergent behaviour of the ensemble would recapitulate associative learning of the sort seen empirically and (b) any changes to the connectivity architecture (optimised by natural Bayesian model selection) would render free energy minimisation and performance suboptimal. We tested these hypotheses using in silico simulations and numerical experiments.
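To make the notion of a neuron as an inferring agent concrete, the following minimal sketch (our own illustration in Python, not the implementation used in the simulations) shows a single unit that updates its belief about the cause of its input by gradient descent on a precision-weighted squared prediction error, the quantity standing in for its variational free energy:

class InferringNeuron:
    """Toy Bayesian agent: one belief, one sensory input (illustrative only)."""

    def __init__(self, prior_precision=1.0, sensory_precision=4.0, lr=0.1):
        self.mu = 0.0                      # current belief about the input cause
        self.pi_prior = prior_precision    # confidence in the prior (centred on 0)
        self.pi_sense = sensory_precision  # confidence in the incoming signal
        self.lr = lr                       # gradient-descent step size

    def free_energy(self, x):
        # Gaussian (Laplace-style) free energy: precision-weighted squared errors.
        return 0.5 * (self.pi_sense * (x - self.mu) ** 2 + self.pi_prior * self.mu ** 2)

    def step(self, x):
        # One inference step: move the belief down the free-energy gradient.
        grad = -self.pi_sense * (x - self.mu) + self.pi_prior * self.mu
        self.mu -= self.lr * grad
        return self.free_energy(x)

neuron = InferringNeuron()
for _ in range(20):
    f = neuron.step(x=1.0)                 # constant input; the belief settles near 0.8

In the full model, many such units are coupled according to the cerebellar connectivity architecture, and synapses perform an analogous descent on their own prediction errors.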
2.6. Hardware implementation
The circuit architecture was realized by connecting 105 identical digital neurons, each implemented by means of a commercial, off-the-shelf, low-cost, and low-power microcontroller, the STM32L475.
This device is equipped with an ARM® Cortex®-M4 core with digital signal processing and a floating-point unit that can run at up to 80 MHz, as well as several integrated peripherals such as analog-to-digital and digital-to-analog converters, fast communication buses, controllers, operational amplifiers, comparators, timers, and cryptography units. However, only the core was powered and used, with all peripherals disabled to optimize the energy consumption profile. All digital neurons were set to run at a core frequency of 16 MHz, were supplied by a 3 V DC power source, and received an externally generated low-frequency (i.e. 80 Hz) clock signal to synchronize their activity, with each clock cycle lasting 12.5 ms and corresponding to an individual time-step.
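For context (our own arithmetic, not a figure reported here), the synchronization clock determines the time-step duration and hence the computational budget available to each digital neuron per step:

T = \frac{1}{80\ \text{Hz}} = 12.5\ \text{ms},
\qquad
16\ \text{MHz} \times 12.5\ \text{ms} = 2 \times 10^{5}\ \text{core cycles per neuron per time-step}.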
3.2.
Our modelled system is unique in terms of (a) the number of repetitions required for learning [27–29, 36], (b) the complexity of the neuronal network, and (c) the absence of explicit learning rules.
The emergence of associative learning—through long-term plasticity in conventional neural networks—typically takes tens of cycles of stimulus presentation. In our case, the system starts to evince stable learning after only five repetitions. In short, a network of inferring neurons that minimize a single variational free energy (VFE) cost function exhibits functional properties and synaptic plasticity consistent with known experimental data, in the absence of bespoke learning rules.
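One standard way to see how such plasticity can fall out of prediction-error minimization (a sketch in our own notation, assuming Gaussian densities; the paper's generative model may differ in its details) is to write a unit's free energy as a precision-weighted squared prediction error and differentiate it with respect to a synaptic weight:

F \approx \tfrac{1}{2}\,\pi\,\varepsilon^{2} + \text{prior terms},
\qquad
\varepsilon = x - \sum_{j} w_{j}\,\mu_{j},
\qquad
\Delta w_{j} \propto -\frac{\partial F}{\partial w_{j}} = \pi\,\varepsilon\,\mu_{j},

so that a weight change proportional to the product of presynaptic activity and postsynaptic prediction error (reminiscent of error-driven Hebbian plasticity) emerges from the same gradient descent that drives inference, without a separate learning rule.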
Associative learning was impaired when connections were either homogeneously or randomly distributed across the network (rather than being organized hierarchically as in biology), suggesting that learning is not a general property exhibited by any set of interconnected neurons but rather that it depends sensitively on network architecture (see supplementary materials). Notably, the selective deletion of either synaptic connections or groups of neurons impaired performance on the associative learning task and changed the time course of free energy minimization (see figures SI-7 and SI-8).
4. Discussion
The inference network introduced here shows properties that may have implications for computational technologies.
First, learning was more than an order of magnitude faster than in classical artificial neural networks, which require hundreds of cycles for learning to occur [52, 53].
Secondly, the network was from one [54] to four [36] orders of magnitude smaller than other biologically inspired networks (e.g. [22–24, 34]), which typically involve thousands of units implementing specific learning rules at each synaptic stage.
Thirdly, learning was implemented using conventional off-the-shelf hardware components, demonstrating low power consumption and a high capacity for dynamic reconfiguration.
These unique features could result in unparalleled advantages for electronic implementation and artificial intelligence applications. For example, since the inference network's capabilities emerge entirely from its connectivity and from the Bayesian inference engine within each neuron (rather than from hardwired learning rules), the connections between neurons could simply be changed through programmable switch matrices, like those adopted in FPGA devices. Moreover, inference networks use the same Bayesian inference engine and identical hardware for every class of neuron.
Thus, the ease of implementation and reprogramming would make inference networks attractive in terms of maintenance costs and smart reuse of resources.
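As an illustration of this reconfigurability (a sketch with hypothetical names, not the paper's hardware or software), the behaviour of a set of identical units can be redefined simply by rewriting a connectivity matrix, analogous to reprogramming a switch matrix:

import numpy as np

n = 8
connectivity = np.zeros((n, n), dtype=bool)   # entry [i, j]: unit j projects to unit i
connectivity[4:, :4] = True                   # e.g. a simple two-layer wiring
weights = np.full((n, n), 0.1)                # identical units; only the wiring differs
activity = np.zeros(n)

def step(inputs):
    # One synchronous update: every unit runs the same rule over its own afferents.
    global activity
    activity = np.tanh((weights * connectivity) @ activity + inputs)
    return activity

# "Reprogramming" the network is just rewriting the switch matrix, not the units:
connectivity[:] = False
connectivity[:4, 4:] = True                   # reverse the direction of projection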
A notable feature of this approach is that, since every element in the network tries to minimize its own free energy, the joint free energy, namely the free energy of the ensemble, is also minimized. The joint free energy is an expression of the mutual predictability of the system, and we expect that a system in which everything is mutually predictable would also manifest a quantifiable thermodynamic efficiency. In this setting, it would be interesting to investigate how the time course of free energy minimization correlates with declining power consumption.
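A minimal way to express this (in our notation, under the simplifying assumption that each unit's free energy depends only on signals locally available to that unit) is:

F_{\text{joint}} = \sum_{n=1}^{N} F_{n},
\qquad
\Delta F_{\text{joint}} = \sum_{n=1}^{N} \Delta F_{n} \le 0 \quad \text{whenever each unit's update satisfies } \Delta F_{n} \le 0,

so that local, neurocentric minimization is, under this decomposition, also a minimization of the free energy of the ensemble.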