© 2022 (CC BY) Gerd Wagner, Brandenburg University of Technology, Germany.
A simulation of a signaling reinforcement learning process, in which two agents learn to communicate with each other via signals.
You can inspect the model's OESjs code on the OES GitHub repo.
In this simulation of a signaling reinforcement learning (RL) process, two agents learn to communicate with each other via signals. One of the two agents (the blind "jumper") cannot see the size of a barrier over which he has to jump, while the other agent (the "speaker") can see the size of the barrier and tries to communicate the required jump length to the jumper. However, the two agents do not speak a common language, so both of them first have to learn which signal communicates which jump length. Both agents can perceive the success or failure of a jump (a jump fails if it is too short or too long) and then update their signaling function or their signal interpretation function accordingly.
The model defines two agent types, Speaker and Jumper, one object type, Barrier, and four event types: the periodic time event type StartOver, the perception event type PerceiveBarrier, the message event type SendJumpLengthSignal, and the action type Jump. The simulation of learning by trial and error is based on repeated rounds of event sequences of the form StartOver→PerceiveBarrier→SendJumpLengthSignal→Jump. Each function to be learned is represented as a probability matrix: a row index, representing the current (information) state type, is mapped to a column index, representing an action option, by choosing the column with the maximal cell value in that row.
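For illustration, the following sketch (plain JavaScript, not the model's actual OESjs code) shows such a probability matrix together with the corresponding choice function. The names probMatrix and chooseOption are made up for this example, and the matrix dimensions assume three barrier lengths and the three symbols A, B, C:

```javascript
// Sketch of a learning function represented as a probability matrix.
// Rows represent (information) state types, e.g., perceived barrier
// lengths; columns represent action options, e.g., the symbols A, B, C.
const probMatrix = [
  [0.34, 0.33, 0.33],  // barrier length 1
  [0.33, 0.34, 0.33],  // barrier length 2
  [0.33, 0.33, 0.34]   // barrier length 3
];
// Map a state (row index) to an action option (column index) by
// choosing the column with the maximal cell value in that row.
function chooseOption(matrix, stateIndex) {
  const row = matrix[stateIndex];
  let best = 0;
  for (let col = 1; col < row.length; col++) {
    if (row[col] > row[best]) best = col;
  }
  return best;
}
```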
After perceiving the current length of the barrier in a PerceiveBarrier event, the speaker tries to communicate this information to the blind jumper using a symbol from his symbol set {A, B, C} chosen with his learning function/matrix.
Then, to decide on the length of the next Jump, the jumper maps the received symbol to a possible jump length (1-4) using his learning function/matrix, and then jumps. Subsequently, both the speaker and the jumper update their learning functions/matrices: when the jump was a success, the speaker increases the probability of his signaling choice and the jumper increases the probability of his signal interpretation choice, while they decrease these probabilities when the jump was a failure.
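A minimal sketch of such an update (again plain JavaScript, not the actual OESjs code): the function name updateMatrix, the increment delta, and the row re-normalization are assumptions of this example, one possible way to realize the increase/decrease of probabilities:

```javascript
// Sketch of the reinforcement update applied by both agents after a
// jump: reinforce the chosen option on success, weaken it on failure.
function updateMatrix(matrix, stateIndex, chosenCol, success, delta = 0.1) {
  const row = matrix[stateIndex];
  row[chosenCol] = Math.max(0, row[chosenCol] + (success ? delta : -delta));
  // Re-normalize so that the row remains a probability distribution.
  const sum = row.reduce((a, b) => a + b, 0);
  if (sum > 0) {
    for (let col = 0; col < row.length; col++) row[col] /= sum;
  }
}
```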
Finally, a StartOver event occurs, resetting the jumper's position and modifying the length of the barrier.
The simulated learning process goes on until the two learning functions/matrices become stable, which means that the two agents have found a common language that allows them to communicate the barrier length.
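Putting the sketches together, one learning round and a simple stopping criterion could look as follows. The success condition (jump length equals barrier length) and the stability criterion (a streak of consecutive successful jumps) are assumptions of this example, not taken from the model; chooseOption and updateMatrix are reused from the sketches above:

```javascript
// One round of StartOver → PerceiveBarrier → SendJumpLengthSignal → Jump.
function runRound(speakerMatrix, jumperMatrix) {
  const barrierLength = 1 + Math.floor(3 * Math.random());    // StartOver
  const state = barrierLength - 1;                            // PerceiveBarrier
  const symbol = chooseOption(speakerMatrix, state);          // SendJumpLengthSignal
  const jumpLength = chooseOption(jumperMatrix, symbol) + 1;  // Jump
  const success = jumpLength === barrierLength;  // assumed success condition
  updateMatrix(speakerMatrix, state, symbol, success);
  updateMatrix(jumperMatrix, symbol, jumpLength - 1, success);
  return success;
}
// Initially uniform matrices: 3 barrier lengths × 3 symbols for the
// speaker, 3 symbols × 4 jump lengths for the jumper.
const speakerMatrix = Array.from({length: 3}, () => [1/3, 1/3, 1/3]);
const jumperMatrix = Array.from({length: 3}, () => [1/4, 1/4, 1/4, 1/4]);
// Repeat rounds until the matrices appear stable, approximated here
// by 20 consecutive successful jumps (with a safety cap on rounds).
let streak = 0;
for (let round = 0; round < 100000 && streak < 20; round++) {
  streak = runRound(speakerMatrix, jumperMatrix) ? streak + 1 : 0;
}
```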
Remarkably, the Blind Jumper model by Peter Fleissner and Gregor Fleissner is a minimal model for teaching and illustrating multi-agent reinforcement learning.
Based on: Fleissner, P., and Fleissner, G. 1998. Beyond the Chinese Room: The Blind Jumper - Self-Organised Semantics and Pragmatics on the Computer (in German). In W. Hofkirchner (Ed.), Information und Selbstorganisation - Annäherungen an eine vereinheitlichte Theorie der Information, pp. 325-340. StudienVerlag, Innsbruck/Wien.
Wilensky, U. 2016. NetLogo Signaling Game model. http://ccl.northwestern.edu/netlogo/models/SignalingGame. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
Skyrms, B. 2006. Signals. Presidential Address, Philosophy of Science Association. http://www.socsci.uci.edu/~bskyrms/bio/other/Presidential%20Address.pdf, accessed 2022-01-05.