Reinforcement Learning in Machine Learning
After the supervised and unsupervised learning approaches covered in the previous articles, we arrive at the last sub-division: Reinforcement Learning (RL). This is a type of ML in which a model learns to navigate a given environment by making decisions and observing the outcomes of those decisions. The model receives positive feedback for a good decision and negative feedback for a bad one. Unlike supervised learning, there are no labels to guide the agent; it learns autonomously from feedback alone.
RL is an excellent strategy for problems where decisions must be made in sequence and for applications with a long-term goal, such as playing chess or controlling a robot.
In reinforcement learning, a model’s purpose is to maximize the cumulative reward it collects, i.e., to obtain as much positive feedback as possible over time.
There are two major types of reinforcement learning:
- Positive Reinforcement Learning
- Negative Reinforcement Learning
1. Positive Reinforcement Learning:
Here, we increase the positive feedback given for desirable behaviour. This strengthens the agent’s behaviour and increases the probability of good decision making. The downside is that too much reinforcement can lead to an overload of states, which may weaken the results.
2. Negative Reinforcement Learning:
This is the opposite of positive reinforcement learning. Here, behaviour is strengthened by avoiding or removing a negative condition: negative feedback decreases the probability of bad decisions, which relatively increases the probability of good ones. It can be more effective than positive reinforcement in some cases, but it only provides enough motivation to meet the minimum required behaviour.
Working of reinforcement learning:
1. The model explores the environment and familiarises itself with the problem.
2. It decides how to interact with its environment. If the decision is a good one, it receives positive feedback; otherwise, it receives negative feedback.
3. Based on this feedback, the agent adjusts the probability of making the same decision again in similar situations.
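To make this loop concrete, here is a minimal sketch in Python. The LineWorld environment, its action names, and its reward values are invented purely for illustration; the agent simply acts at random, so the sketch shows only the interaction cycle, not learning.

```python
import random

class LineWorld:
    """Toy environment: the agent starts at position 0 and tries to reach position 3."""
    def reset(self):
        self.pos = 0
        return self.pos                              # initial state

    def step(self, action):
        self.pos += 1 if action == "right" else -1
        reward = 1.0 if self.pos == 3 else -0.1      # positive feedback at the goal
        done = self.pos in (3, -3)                   # episode ends at either edge
        return self.pos, reward, done                # next state, reward, episode end

env = LineWorld()
for episode in range(5):
    state = env.reset()                              # 1. start exploring the environment
    done = False
    while not done:
        action = random.choice(["left", "right"])    # 2. the agent makes a decision
        state, reward, done = env.step(action)       # ...and the environment reacts
        # 3. a real agent would use `reward` here to adjust its future decisions
    print(f"episode {episode} finished at position {state}")
```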
Approaches to reinforcement learning:
There are three major approaches to reinforcement learning:
- Value-based
- Policy-based
- Model-based
1. Value-based approach:
In this approach, the model learns an optimal value function, i.e., the maximum long-term value obtainable from each state under any policy, and then chooses the actions that lead to the highest-valued states.
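As a concrete illustration of the value-based idea, the sketch below runs value iteration on a tiny, made-up Markov decision process; the transition table, rewards, and discount factor are invented solely for illustration.

```python
# Value iteration on a toy 3-state MDP (states 0-2, actions "a"/"b").
# transitions[state][action] = (next_state, reward); all numbers are invented.
transitions = {
    0: {"a": (1, 0.0), "b": (0, 0.0)},
    1: {"a": (2, 1.0), "b": (0, 0.0)},
    2: {"a": (2, 0.0), "b": (2, 0.0)},   # state 2 is absorbing
}
gamma = 0.9                               # discount factor
V = {s: 0.0 for s in transitions}         # initial value estimates

for _ in range(100):                      # sweep until the values converge
    for s, actions in transitions.items():
        # V(s) = max over actions of [immediate reward + discounted value of next state]
        V[s] = max(r + gamma * V[s2] for (s2, r) in actions.values())

print(V)   # the optimal value of each state; acting greedily on V gives the policy
```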
2. Policy-based approach:
In the policy-based approach, the model tries to find the optimal policy directly, without using a value function. The policy is adjusted so that the action taken at each step maximizes the expected reward. There are two types:
- Deterministic: at any given state, the policy always produces the same action.
- Stochastic: the policy defines a probability distribution over actions, and the action actually taken is sampled from it.
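The difference between the two policy types can be shown in a few lines of Python; the action names and preference scores below are made-up values used only to illustrate the idea.

```python
import math
import random

# Made-up preference scores for two actions in some state.
preferences = {"left": 0.2, "right": 1.5}

# Deterministic policy: always pick the highest-scoring action.
deterministic_action = max(preferences, key=preferences.get)

# Stochastic policy: turn the scores into probabilities (softmax) and sample one action.
total = sum(math.exp(v) for v in preferences.values())
probabilities = {a: math.exp(v) / total for a, v in preferences.items()}
stochastic_action = random.choices(list(probabilities), weights=list(probabilities.values()))[0]

print(deterministic_action, stochastic_action)
```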
3. Model-based approach:
In this approach, a virtual model of the environment is created, and the agent learns by exploring it. Because such models are specific to the environment they represent, there is no single general-purpose solution.
Terminology in reinforcement learning:
1. Agent:
An entity that perceives, understands, and interacts with its environment.
2. Environment:
The surroundings that the agent operates in and interacts with. Reinforcement learning generally assumes a stochastic environment, which means its responses to the agent contain an element of randomness.
3. Action:
Any decision the agent makes about how to interact with its environment.
4. State:
The situation returned by the environment after each action, i.e., how the environment reacts to the agent.
5. Reward:
The feedback that the environment returns to the agent after every action or decision. It helps the agent assess how well it is doing.
6. Policy:
The strategy that the agent uses to decide which action to take next.
7. Value:
The expected long-term return of an agent’s decisions from a given state, as opposed to the immediate reward.
Difference between Reinforcement learning and Supervised learning:
Supervised learning is a training approach in which a knowledgeable supervisor curates a dataset and feeds it to the training algorithm. The supervisor is in charge of gathering this data, which consists of a collection of samples such as photos, text fragments, or audio recordings, each with a label that categorises the sample. A supervised learning algorithm’s primary goal is to extrapolate and generalise, i.e., to make predictions for cases not included in the training dataset.
RL is a distinct machine learning approach: it needs neither a supervisor nor a pre-labeled dataset, because the agent generates its own training signal by interacting with the environment.
Difference between Reinforcement learning and Unsupervised learning:
Because RL does not need supervision, it is important to distinguish it from unsupervised learning, another branch of machine learning. The training data in unsupervised learning is not labeled, and the goal is to discover the underlying pattern in the data. Once the model knows this pattern, it can cluster similar instances or estimate the distribution that produced them. Discovering such a pattern does not, by itself, address the RL challenge of maximising the payoff at the end of a path. An understanding of hidden patterns in the agent’s observations can, however, accelerate the learning process.
The trade-off between exploration and exploitation is an issue specific to RL algorithms: the agent must balance trying new actions to discover better rewards against repeating actions it already knows to pay off.
Reinforcement learning algorithms:
1. SARSA (State-Action-Reward-State-Action):
SARSA is an on-policy control technique: while learning under a specified policy, it picks the action for each state from that same policy. Its purpose is to compute Q(s, a) for the current policy and all state-action pairs, and it selects the subsequent action and reward using the same policy that chose the initial action. It gets its name from the quintuple (s, a, r, s’, a’), where s and a are the current state and action, r is the reward received, and s’ and a’ are the next state and the action chosen in it.
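A minimal sketch of the SARSA update in Python is given below. The action names, the epsilon-greedy helper, and the hyperparameter values are assumptions made for illustration rather than part of any particular library.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1        # assumed learning rate, discount, exploration rate
Q = defaultdict(float)                        # Q[(state, action)] table, initialised to 0
actions = ["left", "right"]                   # illustrative action set

def epsilon_greedy(state):
    """Pick a random action with probability epsilon, otherwise the current best one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy update: the target uses the action the policy actually chose next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

Because the target uses the next action the policy actually selects, SARSA is on-policy; Q-learning, described further below, instead uses the best possible next action.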
2. DQN (Deep Q Neural Network):
DQN, as the term implies, is a Q-learning algorithm that employs a neural network. Defining and updating a Q-table over a large state space is a difficult and complicated undertaking.
A DQN algorithm can be used to address such a problem: rather than maintaining a Q-table, a neural network estimates the Q-value of every action in every state.
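The sketch below illustrates the core idea using PyTorch: a small network maps a state to one Q-value per action, and the loss pulls the prediction towards the one-step Bellman target. The layer sizes, dimensions, and hyperparameters are arbitrary illustrative choices, and practical ingredients such as replay buffers and target networks are omitted.

```python
import torch
import torch.nn as nn

n_state_features, n_actions = 4, 2            # illustrative dimensions

# The network replaces the Q-table: input a state, output one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(n_state_features, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_loss(state, action, reward, next_state, done):
    """One-step TD loss: prediction vs. Bellman target r + gamma * max_a' Q(s', a')."""
    q_pred = q_net(state)[action]
    with torch.no_grad():
        q_next = q_net(next_state).max()
        target = reward + gamma * q_next * (1 - done)
    return (q_pred - target) ** 2

# One gradient step on a single made-up transition.
loss = dqn_loss(torch.rand(n_state_features), 1, 1.0, torch.rand(n_state_features), 0)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```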
3. Q-Learning:
Q-learning, which is built on the Bellman equation, is a common model-free reinforcement learning technique.
The basic goal of Q-learning is to learn a policy that tells the agent which action to take in each state to maximise the payoff. It is an off-policy RL method that aims to determine the best action to take in the current state.
In Q-learning, the agent’s purpose is to maximise the value of Q. Each Q-value is updated using the Bellman equation: Q(s, a) ← Q(s, a) + α [r + γ max Q(s’, a’) − Q(s, a)], where α is the learning rate, γ is the discount factor, r is the reward, and the max is taken over the possible next actions a’.
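Below is a minimal tabular Q-learning sketch that applies this update in a loop. The toy corridor environment and the hyperparameter values are invented purely for illustration.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1          # assumed hyperparameters
Q = defaultdict(float)                          # Q[(state, action)], initialised to 0
ACTIONS = [-1, +1]                              # move left or right

def step(state, action):
    """Toy corridor: states 0..4, where state 4 is the goal (illustrative only)."""
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else -0.1
    return next_state, reward, next_state == 4

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise exploit the current Q-values.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Off-policy Bellman update: the target uses the best next action, not the action taken.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# The greedy policy after training should prefer moving right (+1) in every state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)})
```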
Applications of reinforcement learning:
1. Marketing:
An RL model can be trained to evaluate emerging products and new consumers, and to decide what to offer them in order to maximize profit.
2. Robotic applications:
Robots can perform in new roles while retaining their learned knowledge.
3. Game playing:
RL can figure out the optimal approach to a game.
4. Self-driving cars:
RL helps the car figure out the best way to get from point A to point B.
Features of reinforcement learning:
1. The agent is not explicitly told about the environment or which actions it must take.
2. It is centered on the trial-and-error method.
3. The agent performs the next task and modifies its state based on the input from the prior interaction.
4. The agent’s reward may be delayed rather than immediate.
5. The environment is stochastic, and the agent must explore it in order to maximize its rewards.
Advantages of reinforcement learning:
1. Concentrates on the problem as a whole
RL does not break the problem down into subproblems; instead, it strives to maximise the long-term payoff. It has a clear purpose, knows the objective, and can forgo short-term gains in exchange for long-term advantages.
2. No need for an additional data gathering procedure.
In RL, the training data is generated by the learning agent’s own interaction with the environment, rather than being a separate dataset that must be supplied to the algorithm.
3. Works in a fast-paced, unpredictable setting.
RL algorithms are fundamentally adaptable, meaning they are designed to respond to changes in the environment.
Challenges in reinforcement learning
1. Experience
An RL agent requires experience, and RL approaches produce their training data by interacting with the environment. As a result, the speed at which data can be gathered is constrained by the dynamics of the environment.
2. Delayed gratification
The learning agent can trade off immediate rewards for long-term gains. While this fundamental idea is what makes RL valuable, it also makes it harder for the agent to identify the best policy.
3. Inability to interpret
Once an RL agent has learnt a policy and is deployed in the environment, it acts on that learned knowledge. The reasons behind its actions may not be clear to an outside observer, which leads to confusion and ambiguity.
Future of reinforcement learning
Tremendous progress has been made in the field of deep reinforcement learning in recent decades. Deep reinforcement learning models the value function (value-based), the agent’s policy (policy-based), or both (actor-critic). Before deep neural networks became mainstream, complex features had to be hand-designed to train an RL algorithm. As a result, learning capacity was limited, restricting the scope of RL to simple settings.
With deep learning, models with millions of trainable weights can be built, sparing the user from laborious feature engineering. Relevant features are extracted automatically during training, allowing the agent to learn good policies in complicated environments.
Traditionally, RL is applied to one task at a time: each task is learned by a separate RL agent, and these agents do not share information. As a result, learning complicated activities, such as driving a car, becomes slow and inefficient. Problems that share an information source, have a linked underlying structure, or are interdependent can benefit greatly from letting multiple agents collaborate. By training several agents concurrently, they can share the same representation of the system, so that advances in one agent’s efficiency can be exploited by the others.
Summary
This article has covered reinforcement learning, a branch of machine learning that studies how AI algorithms should act in a given environment to obtain the best possible outcome. Along with supervised and unsupervised learning, it is one of the three main machine learning techniques.