# Q-Learning Introduction, Usage, Disadvantages Explained with Example

#### What is Q-learning Algorithm?

Q-learning is an off-policy algorithm: it can learn from actions taken outside its current policy. The agent in Q-learning does not learn everything in a single step; it learns incrementally, step by step.

The agent takes its first action and learns something from it, but not everything. As further steps are taken it keeps learning, so by the end of all the steps it is fully trained, and at that point it also becomes very fast.

Before taking an action, the Q-learning algorithm estimates the reward it can expect from that action, because it learns these estimates from the rewards it received previously.

This reinforcement learning model has a few main components: Agent, State, Action, Environment, and Reward.

- Environment is the world in which the agent operates and learns what it has to do.
- Reward is the feedback given to the agent by the environment for selecting a particular action.
- State is the agent's current situation in that environment.
- Action is the set of moves the agent can perform.

#### Example

There are also many classic algorithms for finding the shortest path between two points, one of the most popular being **Dijkstra's algorithm**. Dijkstra's algorithm can be thought of as a smart planner: unlike Q-learning, it does not evaluate every possible path. Q-learning does its calculations for all the steps, whether or not it is actually going to take them, whereas Dijkstra's algorithm mainly does the calculation for the path it is most likely to take, rather than calculating for all of them.

There is also the **Bellman-Ford algorithm**. It too computes shortest paths, like Dijkstra's algorithm and Q-learning, but with some differences.

**How is the Bellman-Ford algorithm different from Dijkstra's algorithm?**

The Bellman-Ford algorithm also works with negative edges, i.e. if an edge has a negative weight it still computes the shortest path correctly (as long as there is no negative cycle). Dijkstra's algorithm, in contrast, does not handle negative edge weights correctly.

#### Time Complexity

The Bellman-Ford algorithm is more time-consuming than Dijkstra's algorithm: Bellman-Ford takes O(V·E), whereas Dijkstra's algorithm with a binary heap runs in O((V + E) log V). Q-learning behaves differently: at first it takes time, because it calculates for every possible path, short or not. But after calculating, it stores those values and uses them to predict values for other paths. So it is slow at first, but once trained, choosing an action is just a table lookup, which can be faster than re-running the Bellman-Ford or Dijkstra's algorithm from scratch.

It works well to implement the Q-learning algorithm, its concepts, and its code in the C programming language, as C is a fast language and handles the calculations of the Q-learning algorithm very easily.

#### Usage of Reinforcement Learning

Reinforcement learning (RL) can be used to solve a large number of complex problems. It can also correct its own errors, i.e. mistakes that occur during training or at runtime.

It is used in machine learning mainly when there is only one way to collect the information needed to perform the task, and that way is to interact with the environment itself.

#### Disadvantages of Reinforcement Learning

But reinforcement learning also has some disadvantages. When the paths contain too many states, the computation can become overloaded, which diminishes the results.

#### Deep Learning vs Reinforcement Learning

Two popular kinds of learning are deep learning and reinforcement learning. In deep learning, the machine learns from a data set. (A data set is a collection of data that stores a lot of information about each item, so that the machine can learn as much as possible from it and then apply that learning to new data.)

In reinforcement learning, by contrast, the learning process does not come from data sets. It happens gradually, from every action the agent takes: the agent learns mainly from the feedback, i.e. the outcome of the previous step or state, and uses that feedback to maximize its rewards. And **Q-learning algorithms are also based on reinforcement learning**.

This is why the Q-learning algorithm is an **off-policy, model-free reinforcement learning algorithm**.