RL with policy advice. Azar et al., ECML 2013.
- Reduction from RL to bandit problem.
Regret bounds: regret is the sum, over time, of the difference in value between the optimal policy and the policy actually followed.
Regret scales as \sqrt{M} in the number of tasks M, rather than with the size of the state and action spaces.
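To make the reduction and the regret definition concrete, here is a minimal sketch that treats each candidate policy as a bandit arm and selects one per episode with UCB1. It is an illustration under stated assumptions, not the algorithm from Azar et al.: the function name `ucb_over_policies`, the Bernoulli episode returns, and the example mean returns are all hypothetical.

```python
import math
import random


def ucb_over_policies(mean_returns, n_episodes, seed=0):
    """Pick one candidate policy per episode with UCB1.

    mean_returns: assumed expected episode return of each policy, in [0, 1]
                  (hypothetical stand-in for actually running the policy in an MDP).
    Returns (pull counts per policy, cumulative pseudo-regret vs. the best policy).
    """
    rng = random.Random(seed)
    n_policies = len(mean_returns)
    counts = [0] * n_policies      # episodes played with each policy
    totals = [0.0] * n_policies    # summed observed returns per policy
    best = max(mean_returns)
    regret = 0.0

    for t in range(1, n_episodes + 1):
        if t <= n_policies:
            arm = t - 1  # play every policy once before using the UCB index
        else:
            arm = max(
                range(n_policies),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated episode return: Bernoulli with the policy's mean (assumption).
        ret = 1.0 if rng.random() < mean_returns[arm] else 0.0
        counts[arm] += 1
        totals[arm] += ret
        # Regret accumulates the gap between the best policy and the one played.
        regret += best - mean_returns[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb_over_policies([0.2, 0.5, 0.8], n_episodes=5000)
    print(counts, regret)
```

The point of the sketch is that the learner's cost depends on how many candidate policies it must tell apart, not on how many states or actions the underlying MDP has.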
Brunskill and Li, UAI 2013. Reduction from RL to an (active) classification problem.
Provably speeding multitask RL. Guo and Brunskill, AAAI 2015. K tasks sampled from M tasks. Evaluation goal: provably improve performance. Approach: quickly cluster, then share.
Killian et al., NIPS 2017. Bayesian NNs for modeling MDP dynamics.
Smooth latent policy space for cross-domain transfer. Ammar et al., IJCAI 2015. Limited theoretical results (some nice convergence results).
Model-agnostic meta-learning. Finn et al., ICML 2017.
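As a rough illustration of the MAML idea (learn an initialization that performs well after one gradient step on a new task), the sketch below uses a deliberately tiny setting: a scalar model y = w*x and tasks that differ only in their true slope. The task family, the function names, and the step sizes are assumptions made for this example; it is not the setup from Finn et al.

```python
def loss(w, slope, xs):
    """Mean squared error of the model y = w*x against targets slope*x."""
    return sum(((w - slope) * x) ** 2 for x in xs) / len(xs)


def loss_grad(w, slope, xs):
    """Derivative of loss() with respect to w."""
    return sum(2.0 * (w - slope) * x * x for x in xs) / len(xs)


def maml_step(w, task_slopes, xs, inner_lr, outer_lr):
    """One meta-update: adapt per task, then differentiate through the adaptation."""
    curvature = sum(2.0 * x * x for x in xs) / len(xs)  # second derivative of loss
    meta_grad = 0.0
    for slope in task_slopes:
        # Inner loop: one gradient step on this task from the shared initialization.
        w_adapted = w - inner_lr * loss_grad(w, slope, xs)
        # Outer gradient via the chain rule; d(w_adapted)/dw = 1 - inner_lr * curvature.
        meta_grad += loss_grad(w_adapted, slope, xs) * (1.0 - inner_lr * curvature)
    return w - outer_lr * meta_grad / len(task_slopes)


def train(task_slopes, xs, steps=200, inner_lr=0.1, outer_lr=0.1, w0=0.0):
    """Run repeated meta-updates and return the learned initialization."""
    w = w0
    for _ in range(steps):
        w = maml_step(w, task_slopes, xs, inner_lr, outer_lr)
    return w


if __name__ == "__main__":
    xs = [-1.0, 0.5, 1.0]
    w_init = train([1.0, 3.0], xs)
    print(w_init)
```

In this one-parameter case the second-order term is available in closed form, which is why the chain-rule factor can be written out by hand; with a neural network it would come from automatic differentiation instead.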