AAMAS

Inverse Reinforcement Learning in Swarm Systems

A. Šošić, W. R. KhudaBukhsh, A. M. Zoubir and H. Koeppl,
Inverse Reinforcement Learning in Swarm Systems,
International Conference on Autonomous Agents and Multiagent Systems, 2017
Best Paper Award Finalist

Expert Demonstrations
Example demonstration of the Vicsek system, comprising 200 agent trajectories, used as input to the learning algorithm. The agent positions and orientations are initialised randomly.
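For readers unfamiliar with the Vicsek model, the following is a minimal sketch of how such demonstration data could be generated. It follows the textbook update rule (each agent moves at constant speed and adopts the mean heading of its neighbours, perturbed by noise). The function names `vicsek_step` and `simulate`, the periodic box, and all parameter values other than the 200 agents are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def vicsek_step(pos, theta, rng, box=10.0, radius=1.0, speed=0.1, noise=0.1):
    """One update of a textbook Vicsek model (all parameter values are assumed)."""
    # Pairwise displacements, wrapped for an assumed periodic square box.
    delta = pos[:, None, :] - pos[None, :, :]
    delta -= box * np.round(delta / box)
    # Neighbourhood matrix; every agent counts as its own neighbour.
    neighbours = ((delta ** 2).sum(axis=-1) <= radius ** 2).astype(float)
    # Mean neighbour heading, computed from summed unit vectors.
    mean_theta = np.arctan2(neighbours @ np.sin(theta), neighbours @ np.cos(theta))
    # Uniform angular noise in [-noise/2, noise/2].
    new_theta = mean_theta + rng.uniform(-noise / 2, noise / 2, size=theta.shape)
    # Constant-speed move along the new heading, wrapped back into the box.
    step = speed * np.stack([np.cos(new_theta), np.sin(new_theta)], axis=1)
    return (pos + step) % box, new_theta

def simulate(n_agents=200, n_steps=100, seed=0, box=10.0, **params):
    """Random initial positions and orientations, then n_steps Vicsek updates."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, box, size=(n_agents, 2))
    theta = rng.uniform(-np.pi, np.pi, size=n_agents)
    positions, headings = [pos], [theta]
    for _ in range(n_steps):
        pos, theta = vicsek_step(pos, theta, rng, box=box, **params)
        positions.append(pos)
        headings.append(theta)
    # Shapes: (n_steps + 1, n_agents, 2) and (n_steps + 1, n_agents).
    return np.stack(positions), np.stack(headings)
```

Calling `simulate()` returns one position array and one heading array per run; each agent's slice along the second axis is one of the 200 agent trajectories.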
Learning
Visualisation of the proposed heterogeneous Q-learning scheme. Red agents are exploring; blue agents perform greedy actions. The system learns a policy for the final reward estimate returned by the IRL procedure. Note that the direction of travel differs from that of the demonstrated behavior because the observation model considers only relative orientations.
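The split into exploring and greedy agents can be illustrated with a small tabular sketch. This is one plausible reading of the caption, not the paper's implementation: it assumes a single Q-table shared by all agents, integer-coded observations (for example a discretised relative orientation), uniformly random actions for the exploring agents, and a standard Q-learning update applied to every agent's transition. The names `select_actions` and `q_update` and the default step size and discount are hypothetical.

```python
import numpy as np

def select_actions(q_table, observations, explorer_mask, rng):
    """Exploring agents act uniformly at random; all others act greedily."""
    greedy = q_table[observations].argmax(axis=1)
    random_actions = rng.integers(q_table.shape[1], size=observations.shape[0])
    return np.where(explorer_mask, random_actions, greedy)

def q_update(q_table, obs, actions, rewards, next_obs, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on the shared table (in place)."""
    for o, a, r, o2 in zip(obs, actions, rewards, next_obs):
        target = r + gamma * q_table[o2].max()
        q_table[o, a] += alpha * (target - q_table[o, a])
    return q_table
```

In the video's colour coding, `explorer_mask` would be `True` for the red agents and `False` for the blue ones; the rewards passed to `q_update` would come from the reward estimate returned by the IRL procedure.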
Learned Policy
The learned policy executed from a random initialisation of the system state.