A machine learning training method in which an agent acting in an uncertain environment is rewarded for desired behaviors and penalized for undesired ones, allowing it to learn through trial and error.
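The reward-and-penalty loop described above can be illustrated with a minimal sketch. The definition does not name an algorithm, so tabular Q-learning is used here as one common choice. The five-cell corridor, the 20% slip probability, the reward values, and the names `train_agent` and `greedy_policy` are all assumptions made for illustration.

```python
import random


def train_agent(episodes=2000, seed=0):
    """Learn action values by trial and error in a toy corridor (hypothetical setup)."""
    rng = random.Random(seed)
    n_states = 5           # cells 0..4; cell 4 is the goal, cell 0 is a pit
    moves = (-1, +1)       # action 0 steps left, action 1 steps right
    q = [[0.0, 0.0] for _ in range(n_states)]
    alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

    for _ in range(episodes):
        state = 2  # every episode starts in the middle cell
        while state not in (0, n_states - 1):
            # Trial: explore at random sometimes (and on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Uncertain environment: 20% of the time the move is reversed.
            step = moves[action] if rng.random() >= 0.2 else -moves[action]
            next_state = state + step

            # Reward the desired outcome, penalize the undesired one.
            if next_state == n_states - 1:
                reward = 1.0
            elif next_state == 0:
                reward = -1.0
            else:
                reward = 0.0

            # Error correction: nudge the estimate toward the observed return.
            terminal = next_state in (0, n_states - 1)
            target = reward if terminal else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the preferred action for each non-terminal cell."""
    return {s: ("right" if q[s][1] > q[s][0] else "left") for s in range(1, len(q) - 1)}
```

Calling `greedy_policy(train_agent())` returns the action the agent has come to prefer in each non-terminal cell. Because reaching the goal is rewarded and falling into the pit is penalized, the learned preference should point toward the goal.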