Update chapter2.md
yyysjz1997 committed Jul 15, 2022
1 parent 72cdb71 commit 9707af5
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/chapter2/chapter2.md
@@ -347,7 +347,7 @@ $$

Here we also introduce the `Q-function`, also known as the `action-value function`. **The Q-function is the expected return obtained by taking a given action in a given state**, as shown in Eq. (4):
$$
-q^{\pi}(s, a)=\mathbb{E}_{\pi}\left[G_{t} \mid s_{t}=s, A_{t}=a\right] \tag{4}
+q^{\pi}(s, a)=\mathbb{E}_{\pi}\left[G_{t} \mid s_{t}=s, a_{t}=a\right] \tag{4}
$$
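The expectation here is also taken over the policy function, so obtaining the value of a state requires summing over the policy function.
**Summing the Q-function over actions yields the value function**, as shown in Eq. (5):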
@@ -380,7 +380,7 @@ v^{\pi}(s)=E_{\pi}\left[R_{t+1}+\gamma v^{\pi}\left(s_{t+1}\right) \mid s_{t}=s\
$$
The Q-function can be decomposed in a similar way, which gives the Bellman expectation equation for the Q-function, as shown in Eq. (7):
$$
-q^{\pi}(s, a)=E_{\pi}\left[R_{t+1}+\gamma q^{\pi}\left(s_{t+1}, A_{t+1}\right) \mid s_{t}=s, A_{t}=a\right] \tag{7}
+q^{\pi}(s, a)=E_{\pi}\left[R_{t+1}+\gamma q^{\pi}\left(s_{t+1}, a_{t+1}\right) \mid s_{t}=s, a_{t}=a\right] \tag{7}
$$
**The Bellman expectation equation defines the relationship between the current state and future states.**
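
To make the relationship between the value function, the Q-function, and the Bellman expectation equation concrete, here is a minimal numerical sketch. It is not taken from the chapter: it assumes a small tabular MDP with a known transition model, and the arrays `P`, `R`, `pi`, the discount `gamma`, and the helper `evaluate_policy` are hypothetical names with made-up numbers. Under those assumptions it solves for $v^{\pi}$ exactly, derives $q^{\pi}$, and checks both that the policy-weighted sum of $q^{\pi}$ over actions recovers $v^{\pi}$ and that $q^{\pi}$ satisfies Eq. (7).

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; all numbers are made up for illustration.
gamma = 0.9
# P[s, a, t] = probability of moving to state t after taking action a in state s.
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
# pi[s, a] = probability that the policy chooses action a in state s.
pi = np.array([
    [0.5, 0.5],
    [0.2, 0.8],
])


def evaluate_policy(P, R, pi, gamma):
    """Solve the Bellman expectation equation for v^pi, then derive q^pi."""
    n_states = P.shape[0]
    # Average rewards and transitions over the policy's action choices.
    r_pi = np.sum(pi * R, axis=1)            # shape (S,)
    P_pi = np.einsum("sa,sat->st", pi, P)    # shape (S, S)
    # v = r_pi + gamma * P_pi @ v  is the linear system (I - gamma * P_pi) v = r_pi.
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) * v(t)
    q = R + gamma * (P @ v)
    return v, q


v, q = evaluate_policy(P, R, pi, gamma)

# Summing q over actions, weighted by the policy, recovers v.
v_from_q = np.sum(pi * q, axis=1)

# Right-hand side of Eq. (7): reward plus discounted expected next-step q.
q_rhs = R + gamma * (P @ v_from_q)

print("v^pi:", v)
print("q^pi:", q)
print("v matches policy-weighted q:", np.allclose(v, v_from_q))
print("q satisfies Eq. (7):", np.allclose(q, q_rhs))
```

Solving the linear system directly is only practical for small state spaces; it is used here because it gives exact values against which the two identities can be checked.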
