14.3.2. Case Study: A Signaling Game

A simulation of a signaling reinforcement learning process, in which two agents learn to communicate with each other via signals.

To get a quick impression, you can run this model from the Sim4edu website or inspect its OESjs code.

In this simulation of a signaling reinforcement learning (RL) process, two agents learn to communicate with each other via signals. One of the two agents (the blind "jumper") cannot see the size of a barrier over which he has to jump, while the other agent (the "speaker") can see the size of the barrier and tries to communicate the required jump length to the jumper. However, the two agents do not share a common language, so both first have to learn which signal communicates which jump length. Both can perceive the success or failure of a jump (a jump fails if it is too short or too long) and then update their signaling function (the speaker) or signal interpretation function (the jumper) accordingly.

Based on: Peter Fleissner and Gregor Fleissner, "Beyond the Chinese Room: The Blind Jumper - Self-Organised Semantics and Pragmatics on the Computer" (in German). In W. Hofkirchner (Ed.), Information und Selbstorganisation - Annäherungen an eine vereinheitlichte Theorie der Information, StudienVerlag, Innsbruck/Wien, 1998, pp. 325-340.

Conceptual Model
Conceptual Information Model

The potentially relevant object types are:

  1. barriers with a certain length,
  2. speakers (that try to signal the jump length),
  3. jumpers (that try to interpret the jump length signal).

Potentially relevant types of events are:

  1. start over (periodic time events),
  2. perceive barrier (perception events),
  3. send jump length signal (out-message events),
  4. receive jump length signal (in-message events),
  5. jump (action events).

Object, event and action types, together with their participation associations, can be visually described in a conceptual information model in the form of a conceptual Object Event (OE) class diagram.

[Figure: conceptual information model describing object, event and activity types]

Conceptual Process Model

[Figure: a conceptual process model in the form of a BPMN diagram]

Simulation Design

The model defines two agent types, Speaker and Jumper, one object type, Barrier, and four event types: the periodic time event type StartOver, the perception event type PerceiveBarrier, the message event type SendJumpLengthSignal and the action event type Jump. The simulation of learning by trial and error is based on repeated rounds of event sequences of the form StartOver → PerceiveBarrier → SendJumpLengthSignal → Jump. The function to be learned is expressed as a probability matrix: a row index, representing the current (information) state type, is mapped to a column index, representing an action option, by choosing the column with the maximal cell value in that row.
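The row-to-column mapping described above can be illustrated with a minimal sketch in plain JavaScript. The function name chooseColumn, the tie-breaking rule (lowest column index wins) and the matrix values are assumptions made for illustration; they are not taken from the OESjs model code.

```javascript
// Hypothetical helper (not part of the OESjs model code):
// returns the index of the column with the maximal cell value in the given row.
function chooseColumn(matrix, rowIndex) {
  const row = matrix[rowIndex];
  let best = 0;
  for (let j = 1; j < row.length; j++) {
    // Strict comparison: in case of a tie, the lowest column index is kept
    if (row[j] > row[best]) best = j;
  }
  return best;
}

// Example speaker matrix with assumed values:
// 4 rows (barrier lengths 1-4) x 3 columns (symbols A, B, C)
const exampleMatrix = [
  [0.2, 0.5, 0.3],
  [0.6, 0.1, 0.3],
  [0.1, 0.1, 0.8],
  [0.4, 0.4, 0.2]
];

const symbols = ["A", "B", "C"];
// For barrier length 1 (row 0), the largest cell is in column 1
console.log(symbols[chooseColumn(exampleMatrix, 0)]);
```

With these assumed values, the first row selects the symbol in the second column, and the last row shows the tie case, where the first of the two equal cells is chosen.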

After perceiving the current length of the barrier in a PerceiveBarrier event, the speaker tries to communicate this information to the blind jumper using a symbol from his symbol set {A, B, C} chosen with his learning function/matrix.

Then, to decide on the length of the next Jump, the jumper maps the received symbol to a possible jump length (1-4) using his learning function/matrix and then jumps. Subsequently, both the speaker and the jumper update their learning functions/matrices: when the jump was a success, they increase the probability of their choice (the signaling choice for the speaker, the signal interpretation choice for the jumper); when the jump was a failure, they decrease it.
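One possible way to realize such an update is sketched below in plain JavaScript. The text does not specify the update formula, so the additive step, the lower bound on a cell value and the row renormalization are all assumptions, as is the function name updateChoice.

```javascript
// Hypothetical update rule; step size, floor and renormalization are assumptions,
// not the formula used in the original model.
function updateChoice(matrix, rowIndex, chosenCol, success, step = 0.1) {
  const row = matrix[rowIndex];
  const floor = 0.01; // lower bound that keeps the cell value positive
  // Increase the chosen cell after a success, decrease it after a failure
  row[chosenCol] = Math.max(floor, row[chosenCol] + (success ? step : -step));
  // Renormalize so that the row sums to 1 again
  const total = row.reduce((sum, p) => sum + p, 0);
  for (let j = 0; j < row.length; j++) row[j] /= total;
  return row;
}

// Jumper matrix: one row per symbol (A, B, C), one column per jump length (1-4)
const jumperMatrix = [
  [0.25, 0.25, 0.25, 0.25],
  [0.25, 0.25, 0.25, 0.25],
  [0.25, 0.25, 0.25, 0.25]
];
updateChoice(jumperMatrix, 0, 2, true);  // symbol A interpreted as length 3: success
updateChoice(jumperMatrix, 1, 0, false); // symbol B interpreted as length 1: failure
```

The same function could be applied to the speaker's matrix, with the perceived barrier length selecting the row and the sent symbol selecting the column.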

Finally, a StartOver event occurs, resetting the jumper's position and modifying the length of the barrier.
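The round described above can be sketched as a single plain JavaScript function. This is not the OESjs implementation: the names playRound, state and jumperPosition are hypothetical, the matrix updates are left out for brevity, and a jump is assumed to succeed exactly when the jump length equals the barrier length.

```javascript
// All names are hypothetical; plain JavaScript, not the OESjs API.
const SYMBOLS = ["A", "B", "C"];
const JUMP_LENGTHS = [1, 2, 3, 4];

// Index of the largest value in a row (ties keep the lowest index)
function argmax(row) {
  return row.reduce((best, v, j) => (v > row[best] ? j : best), 0);
}

// One round: PerceiveBarrier, SendJumpLengthSignal, Jump, then StartOver
function playRound(state, random = Math.random) {
  // PerceiveBarrier: the speaker sees the current barrier length (1-4)
  const lengthIndex = state.barrierLength - 1;
  // SendJumpLengthSignal: the speaker picks a symbol with his matrix
  const symbolIndex = argmax(state.speakerMatrix[lengthIndex]);
  // Jump: the jumper maps the received symbol to a jump length
  const jumpLength = JUMP_LENGTHS[argmax(state.jumperMatrix[symbolIndex])];
  // Assumption: success means the jump is neither too short nor too long
  const success = jumpLength === state.barrierLength;
  // StartOver: reset the jumper's position and draw a new barrier length
  state.jumperPosition = 0;
  state.barrierLength = 1 + Math.floor(random() * JUMP_LENGTHS.length);
  return { symbol: SYMBOLS[symbolIndex], jumpLength, success };
}
```

Passing the random number source as a parameter is a design choice of this sketch; it makes a round reproducible when a fixed or seeded function is supplied.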

The simulated learning process goes on until the two learning functions/matrices become stable. At that point, the two agents have found a common language that allows communicating the barrier length.
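The text does not define when a matrix counts as stable. One plausible reading, sketched below in plain JavaScript, is that the maximal-value column of every row has not changed for a given number of consecutive checks. The names policyOf and makeStabilityMonitor and the patience parameter are assumptions introduced for this illustration.

```javascript
// Hypothetical stability criterion: the argmax choices of both matrices
// stay the same for `patience` consecutive checks.
function policyOf(matrix) {
  // For each row, the index of the first maximal cell value
  return matrix.map(row => row.indexOf(Math.max(...row)));
}

function makeStabilityMonitor(patience) {
  let lastPolicy = null;
  let unchangedChecks = 0;
  return function isStable(speakerMatrix, jumperMatrix) {
    const current = JSON.stringify([policyOf(speakerMatrix), policyOf(jumperMatrix)]);
    // Count consecutive checks without a change; restart the count on any change
    unchangedChecks = current === lastPolicy ? unchangedChecks + 1 : 0;
    lastPolicy = current;
    return unchangedChecks >= patience;
  };
}
```

A simulation loop could call the returned function once per round and stop as soon as it reports true, ideally combined with an upper limit on the number of rounds in case the choices keep changing.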

Remarkably, the Blind Jumper by Peter Fleissner and Gregor Fleissner is a minimal model for teaching/learning/illustrating multi-agent reinforcement learning.

Information Design Model

T.B.D.

An information design model, in the form of an OE class diagram as shown below, is derived from a conceptual information model by abstracting away from items that are not design-relevant and possibly adding certain computational details.

Process Design Model

T.B.D.

A process design model, in the form of a DPMN process diagram as shown below, is derived from a conceptual process model by abstracting away from items that are not design-relevant and possibly adding certain computational details.

A DPMN process design model essentially defines the admissible sequences of events and activities together with their dependencies and effects on objects, while its underlying OE class design model defines the types of objects, events and activities, together with the participation of objects in events and activities, including the resource roles of activities, as well as resource multiplicity constraints, parallel participation constraints, alternative resources, and task priorities.

It is an option, though, to enrich a DPMN process design model by displaying more computational details, especially the recurrence of exogenous events, the duration of activities and the most important resource management features defined in the underlying OE class design model, such as resource roles (in particular, performer roles can be displayed in the form of Lanes) and resource multiplicity constraints. The following model shows an enriched version of :

Such an enriched DPMN process design model includes all computational details needed for an implementation without a separate explicit OE class design model. In fact, such a process model implicitly defines a corresponding class model. For instance, the enriched DPMN model of implicitly defines the OE class model of above.

Combined with its underlying OE class design model, a DPMN process design model provides a computationally complete specification of a simulation model.