Inverse Reinforcement Learning in Swarm Systems


  • A. Šošić, W. R. KhudaBukhsh, A. M. Zoubir and H. Koeppl,
    Inverse Reinforcement Learning in Swarm Systems,
    International Conference on Autonomous Agents and Multiagent Systems, 2017


Vicsek dynamics
  • Expert Demonstrations
    Example demonstration of the Vicsek system (200 agent trajectories) used as input to the learning algorithm. Agent positions and orientations are initialised randomly.
  • Learning
    Visualisation of the proposed heterogeneous Q-learning scheme. Red agents explore; blue agents act greedily. The system learns a policy for the final reward estimate returned by the IRL procedure. Note that the direction of travel differs from that of the demonstrated behaviour because the observation model considers only relative orientations.
  • Learned Policy
    The learned policy executed from a random initialisation of the system state.
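
For readers unfamiliar with the expert system, the following is a minimal sketch of the standard Vicsek model, not the authors' implementation. Each agent moves at constant speed and, at every step, adopts the mean heading of all agents within a fixed radius, perturbed by noise. The box size, interaction radius, speed, noise level and periodic boundaries are illustrative assumptions; this page states only the number of agents (200) and the random initialisation. The function names `vicsek_step`, `order_parameter` and `simulate` are hypothetical.

```python
import math
import random


def vicsek_step(pos, theta, rng, box=10.0, radius=1.0, speed=0.1, noise=0.1):
    """One synchronous update of a standard Vicsek model in a periodic box.

    All default parameter values are assumptions chosen for illustration.
    """
    n = len(pos)
    new_theta = []
    for i in range(n):
        sx = sy = 0.0
        for j in range(n):
            # Shortest displacement between agents under periodic boundaries.
            dx = (pos[j][0] - pos[i][0] + box / 2) % box - box / 2
            dy = (pos[j][1] - pos[i][1] + box / 2) % box - box / 2
            if dx * dx + dy * dy <= radius * radius:
                sx += math.cos(theta[j])
                sy += math.sin(theta[j])
        # Mean heading of the neighbourhood (agent i included) plus uniform noise.
        mean_heading = math.atan2(sy, sx)
        new_theta.append(mean_heading + rng.uniform(-noise / 2, noise / 2))
    # Move every agent one step along its new heading, wrapping at the box edges.
    new_pos = [
        ((x + speed * math.cos(t)) % box, (y + speed * math.sin(t)) % box)
        for (x, y), t in zip(pos, new_theta)
    ]
    return new_pos, new_theta


def order_parameter(theta):
    """Length of the mean heading vector: near 0 is disordered, 1 is fully aligned."""
    sx = sum(math.cos(t) for t in theta)
    sy = sum(math.sin(t) for t in theta)
    return math.hypot(sx, sy) / len(theta)


def simulate(n_agents=200, n_steps=50, seed=0, box=10.0, **kwargs):
    """Run the model from random positions and orientations."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, box), rng.uniform(0, box)) for _ in range(n_agents)]
    theta = [rng.uniform(-math.pi, math.pi) for _ in range(n_agents)]
    for _ in range(n_steps):
        pos, theta = vicsek_step(pos, theta, rng, box=box, **kwargs)
    return pos, theta


if __name__ == "__main__":
    _, headings = simulate()
    print("order parameter:", order_parameter(headings))
```

The order parameter gives a quick check on whether the swarm has aligned. Because it depends only on how well the headings agree and not on their absolute direction, it is unaffected by the difference in travel direction noted under Learning.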