Introduction
Last lecture:
- Model-free prediction
- Estimate the value function of an unknown MDP
This lecture:
- Model-free control
- Optimise the value function of an unknown MDP
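To make "model-free control" concrete, here is a minimal sketch of one such method, tabular Q-learning with an epsilon-greedy behaviour policy, on an invented toy chain MDP (the environment, constants, and episode count are all assumptions for illustration, not taken from the lecture):

```python
import random

random.seed(0)

# Hypothetical toy MDP: states 0..3 on a chain; reaching state 3 ends the
# episode with reward 1. Actions: 0 = left, 1 = right. This is a sketch of
# tabular Q-learning, one model-free control method among several.
N_STATES, ACTIONS = 4, (0, 1)
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1

def step(s, a):
    """Deterministic transitions; terminal at the right end of the chain."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy over the current Q estimates
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy action in s2
        target = r + (0.0 if done else GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)]
print(greedy)  # greedy policy per state; it should move right toward the reward
```

Note that the agent never sees the transition function `step` directly; it only samples from it, which is exactly what "model-free" means.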
The ordinary days we live through may, in fact, be a succession of miracles.
Last lecture, David taught us how to solve a known MDP, i.e. planning by dynamic programming. In this lecture, we learn how to estimate the value function of an unknown MDP, i.e. model-free prediction. In the next lecture, we will optimise the value function of an unknown MDP.
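As a taste of model-free prediction, the following sketch evaluates a fixed policy by every-visit Monte Carlo on an invented three-state random walk (the environment, discount factor, and episode count are my own assumptions, not the lecture's example):

```python
import random

random.seed(1)

# Hypothetical toy example: a fair-coin random walk on states 0..2, where
# state 2 is terminal and entering it yields reward 1. We estimate the value
# function of the random policy by every-visit Monte Carlo, i.e. by averaging
# sampled returns -- no knowledge of the transition probabilities is used.
GAMMA = 0.9

def episode():
    """Sample a (state, reward) trajectory under the random policy."""
    s, traj = 0, []
    while s != 2:
        s2 = max(0, s - 1) if random.random() < 0.5 else s + 1
        traj.append((s, 1.0 if s2 == 2 else 0.0))
        s = s2
    return traj

returns = {0: [], 1: []}
for _ in range(2000):
    G = 0.0
    for s, r in reversed(episode()):   # accumulate the return backwards
        G = r + GAMMA * G
        returns[s].append(G)           # every-visit Monte Carlo

V = {s: sum(g) / len(g) for s, g in returns.items()}
print(V)  # state 1, being closer to the reward, should have the higher value
```

The estimate converges to the true value function as the number of sampled episodes grows, by the law of large numbers.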
Markov decision processes formally describe an environment for reinforcement learning in which the environment is fully observable, meaning that the current state completely characterises the process.
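Because a known MDP exposes its transition probabilities and rewards explicitly, its value function can be computed by dynamic programming without any sampling. A minimal sketch, with a two-state MDP whose numbers are invented purely for illustration:

```python
# A fully observable, known MDP given as explicit dynamics:
# P[s][a] is a list of (probability, next_state, reward) triples.
# Both the MDP and its numbers are made up for this example.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9

# Iterative policy evaluation (dynamic programming) for the policy that
# always chooses "stay": repeatedly apply the Bellman expectation backup.
V = {0: 0.0, 1: 0.0}
for _ in range(200):
    V = {s: sum(p * (r + GAMMA * v2) for p, s2, r in P[s]["stay"]
                for v2 in [V[s2]])
         for s in P}

print(V)  # "stay" in state 1 earns reward 2 forever: V[1] -> 2 / (1 - 0.9) = 20
```

This is planning: the backup sums over `P` directly. Model-free methods must instead estimate the same quantities from sampled experience.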
RL, especially DRL (Deep Reinforcement Learning), has been a very active research area in recent years. One of the most famous RL systems is AlphaGo, which beat Lee Sedol, one of the world's best Go players, in 2016, and this year (2017) won three games against Ke Jie, the world's No. 1 ranked player. Beyond Go, AI has defeated the best human players in many games, which illustrates the power of combining Deep Learning with Reinforcement Learning. However, although AI now plays some games better than humans, it takes far more time, data, and energy to train, so it can hardly be called truly intelligent. There remain numerous unexplored and unsolved problems in RL research, which is also why we want to learn RL.
This is the first note of David Silver's RL course.
Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.
Speech recognition (SR) is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies enabling the recognition and translation of spoken language into text by computers.