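[RL/Value-based] SARSA

On-policy: the update follows the current policy, bootstrapping from the action $a'$ that the policy actually selects in the next state $s'$:

$$Q(s,a)\leftarrow Q(s,a)+\alpha[r+\gamma Q(s',a')-Q(s,a)]$$

Here $\alpha$ is the learning rate and $\gamma$ is the discount factor.

Idea: because the update stays within the policy being followed, learning tends to be more conservative and more stable.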
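The update rule above can be sketched in a few lines of tabular Python. This is a minimal illustration, not a full training loop. The function names `epsilon_greedy` and `sarsa_update`, the list-of-lists Q-table, and the epsilon-greedy behaviour policy are all assumptions made for the example; the note itself only specifies the update formula.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng):
    """Hypothetical behaviour policy: random action with probability epsilon, else greedy."""
    n_actions = len(Q[s])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """Apply Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)] in place.

    Returns the temporal-difference error (the bracketed term).
    """
    td_error = r + gamma * Q[s_next][a_next] - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


# Toy Q-table (assumed for illustration): 3 states, 2 actions.
rng = random.Random(0)
Q = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]

# One observed transition (s, a, r, s').
s, a, r, s_next = 0, 1, 1.0, 1

# On-policy: a' is chosen by the same policy that is being followed.
# epsilon=0.0 makes the choice deterministic for this example.
a_next = epsilon_greedy(Q, s_next, epsilon=0.0, rng=rng)

td_error = sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9)
```

In a full episode loop, `a_next` would be carried forward as the action actually executed in `s_next`. That is the sense in which the update stays on-policy.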