lifelong learning

RL with policy advice. Azar et al., ECML 2013.

  - Reduction from RL to bandit problem.

Regret bounds: sum of differences between actual policy and optimal policy.
Regret scales with the number of tasks \sqrt(M), rather than the state and action space.
Brunskill and Li, UAI 2013. Reduce from RL to (active) classification problem.
https://cs.stanford.edu/people/ebrun
Provably speeding multitask RL. Guo and Brunskill, AAAI 2015. K tasks sampled from M tasks. Evaluation goal: provably improve performance. Approach: quickly cluster, then share.
Killian et al., NIPS 2017. Bayesian NNs for modeling MDP dynamics.
Smooth latent policy space for crossdomain transfer. Anmar et al., IJCAI 2015. Limited theoretical results (some nice convergence results).
Model-agnostic meta-learning. Finn et al., ICML 2017.

RL with policy advice. Azar et al., ECML 2013.

  - Reduction from RL to bandit problem.

Regret bounds: sum of differences between actual policy and optimal policy.
Regret scales with the number of tasks \sqrt(M), rather than the state and action space.
Brunskill and Li, UAI 2013. Reduce from RL to (active) classification problem.
https://cs.stanford.edu/people/ebrun
Provably speeding multitask RL. Guo and Brunskill, AAAI 2015. K tasks sampled from M tasks. Evaluation goal: provably improve performance. Approach: quickly cluster, then share.
Killian et al., NIPS 2017. Bayesian NNs for modeling MDP dynamics.
Smooth latent policy space for crossdomain transfer. Anmar et al., IJCAI 2015. Limited theoretical results (some nice convergence results).
Model-agnostic meta-learning. Finn et al., ICML 2017.

Related Skills