In this article, we will learn how to plot a Bar Chart in Python using the Matplotlib library. In our earlier articles, we learnt how to plot a Histogram and line plots. So this is going to be a follow-up to those. But this time with Bar Charts!

So are you ready to learn about it? Great! Then let us start right away!

Plot Bar Chart In Python Using Matplotlib – The Basics

So to begin with, we will start with the basics. Alright? Because if we get our basics right, everything else will become very clear. I hope you agree with me! So here is the first basic thing to know about:

What Is A Bar Chart?

A bar chart is a chart that uses rectangular bars whose length is proportional to the value each bar represents. These bars can be either vertical or horizontal, so a bar chart can be drawn either way. But the main thing is that each bar’s length (its height for vertical bars, its width for horizontal ones) represents its value!
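The sketch below makes the vertical/horizontal distinction concrete by drawing the same made-up values both ways. It uses the Axes methods bar() for vertical bars and barh() for horizontal ones. The labels and values are illustrative only. The "Agg" backend line is there only so the script can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open plot windows
import matplotlib.pyplot as plt

# Hypothetical data: three categories and one value per category
labels = ["A", "B", "C"]
values = [18, 6, 24]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))

# Vertical bars: each value sets the height of its bar
vertical = ax_v.bar(labels, values)

# Horizontal bars: the same values now set the width (length along x) of each bar
horizontal = ax_h.barh(labels, values)

print([bar.get_height() for bar in vertical])   # heights of the vertical bars
print([bar.get_width() for bar in horizontal])  # widths of the horizontal bars
```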

So it is quite simple then. Right? But how are we going to plot a bar chart using Python? Any guesses? Yep, using our good old Matplotlib library!

So what does the code for a simple bar chart using Matplotlib look like? Let us take a look at it next!

Example Plot Of A Bar Chart In Python Using Matplotlib

So here is our example code that shows us how we can do this:

import matplotlib.pyplot as plt
plt.bar([10, 12, 15], [18, 6, 24])
plt.show()

Okay. It looks pretty straightforward. So what is going on here?

As you can see in the first line, we are importing the pyplot module from the Matplotlib library under its standard alias, plt. This module is like the Swiss army knife of the library, because it provides all the plotting functions we need.

So in this case, since we need to plot a bar chart, we will call the bar( ) function! Alright? So that is what we did in line 2.

plt.bar([10, 12, 15], [18, 6, 24])

But what are those two lists we are passing here? They are the x and y values of our bars. The first list, [10, 12, 15], gives the positions of the bars along the x-axis. In Matplotlib 2.0 and later, each bar is centered on its x value by default. Older versions placed the left edge of the bar there instead, and you can still get that behaviour by passing align='edge'. The second list, [18, 6, 24], gives us the heights of the bars!

And finally in line 3, we call the plt.show( ) function that will display our bar chart!
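The default alignment differs between Matplotlib versions, so it helps to check where the bars actually land. This sketch assumes Matplotlib 2.0 or newer. It draws the same three bars and reads back each bar’s left edge with get_x(). The second chart, with align="edge", is there only for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

x = [10, 12, 15]       # x positions of the bars
heights = [18, 6, 24]  # bar heights

plt.figure()  # start from a fresh, empty figure
# Default: align="center" with width=0.8, so each bar is centred on its x value
centered = plt.bar(x, heights)
print([bar.get_x() for bar in centered])  # left edges sit 0.4 to the left of each x

plt.figure()
# align="edge" puts the left edge of each bar exactly at its x value
edged = plt.bar(x, heights, align="edge")
print([bar.get_x() for bar in edged])

# In a normal script, plt.show() would now display both figures
```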

So what does our final Bar Chart look like? Well, take a look at it for yourself!

So there you go! That is how we plot a bar chart in Python using Matplotlib. It is quite easy to plot. Right? But if you still have any doubts, do let me know in the comments below. I will be more than happy to help! 🙂

So in this article, we will take a look at how we can plot a histogram in Python using the Matplotlib library. Now this is something quite different from the basics of line plotting we have seen so far, so I know you will need some time to get through it. What I will do is go through it in an easy to understand way. Alright?

So relax, and grab a cup of coffee if you want to, as we now look into plotting a histogram in Python using Matplotlib!

Plot Histogram In Python Using Matplotlib – The Basics

To get started, let us first learn a bit about what a histogram plot is. Then we will look at other questions, like where it is used and how to draw it using Matplotlib. Okay? Great! So here we go!

What is a Histogram?

A histogram is a way to display the frequency distribution of numerical data, that is, how often values fall within different ranges. So what does it look like? In simple words, it is drawn using bars.

Oh wait a second! So does that mean that it is a kind of bar graph? Yeah you are right. Kind of!

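So what happens is, the data that you want to show in a histogram is grouped together. But it is not grouped randomly. Instead, the range of the data is split into consecutive intervals, and each value is placed in the interval it falls into, so values close to each other end up in the same group. The height of each bar then shows how many values landed in its group. Alright? Does that make sense? So when you plot, you will be plotting these grouped data on the chart. Okay?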

Now there is one other thing. In Matplotlib, these groups of data are called bins.

What are Histogram bins?

So a histogram bin is nothing but an interval of values, together with all the data points that fall inside it. That is all it is. So there is nothing really confusing about it!
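NumPy’s histogram() function does the same counting that a histogram plot displays, which makes the idea of a bin easy to see in numbers. The sketch below gives it explicit bin edges. The data values and edges are made up purely for illustration.

```python
import numpy as np

# Hypothetical data values
data = [1, 2, 2, 4, 5, 7, 8, 9, 9]

# Three bins defined by four edges: [0, 3), [3, 6) and [6, 9]
# (each bin is half-open except the last, which also includes its right edge, 9)
counts, edges = np.histogram(data, bins=[0, 3, 6, 9])

print(counts)  # how many values fell into each bin
print(edges)   # the bin boundaries that were used
```

Each count is the height a histogram plot would draw for that bin.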

Alright. So now that we know what a histogram bin in Matplotlib is, it is time for us to see an example of it. So how do we go about creating a histogram plot in Python? Here is an example of it.

Plot Histogram In Python Using Matplotlib – Example

So we all know that to plot something, we need data. Right? So how do we get this data? Since histograms are usually drawn from a lot of data, typing it in by hand is impractical. So what do we do then? We take help of a library, of course!

And what better library than NumPy to get a set of random numbers? So that is what we will do. We will use NumPy to generate a bunch of random numbers.

But how many random numbers shall we use? 10, 50 or 100? Naah! We can surely go more than that. Right? So how about using 1000 random numbers? 😉

So here is the piece of code we will use to generate 1000 random numbers using Numpy!

import numpy as np
y = np.random.randn(1000)

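That is it! That is all the code we need to create 1000 random numbers using NumPy! The randn() function draws these numbers from the standard normal distribution (mean 0, standard deviation 1), so most of them lie close to 0. So easy, right?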
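randn() returns different numbers on every run, so your histogram will look slightly different from the one shown here. If you want repeatable results, a seeded generator helps. The sketch below swaps in NumPy’s newer Generator API (np.random.default_rng(), available since NumPy 1.17). The seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator produces the same "random" numbers on every run
rng = np.random.default_rng(seed=42)
y = rng.standard_normal(1000)  # 1000 samples from the standard normal distribution

print(y.shape)
print(y.mean(), y.std())  # close to 0 and 1, but not exactly
```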

So now that we have our data ready, let us see how we can plot it as a Histogram using Python’s Matplotlib.

So the code to plot a histogram using Matplotlib looks like this:

import matplotlib.pyplot as plt
import numpy as np
y = np.random.randn(1000)
plt.hist(y)
plt.show()

That’s it! We just import the pyplot module and call its hist( ) function with our data. The Matplotlib library does the rest and plots the histogram for us!

This is very easy, right? And that is the beauty of the Matplotlib library. Its modules and functions are so well designed that you can create beautiful histogram plots in Python easily!

So then what does the final output plot of the histogram look like? Well, see it for yourself!

Matplotlib Histogram Bins

Woah! What happened here? We gave it 1000 input data points right? What happened to all of it then? Well let me explain. Here is what Matplotlib has done.

It has taken our 1000 input values and grouped them into 10 bins. And then it created the above histogram!

So why 10 bins? Why not 12 or 15 or any other number? Now that is a valid question for you to ask. So let me tell you why the number 10.

It is because 10 is the default number of bins Matplotlib creates, no matter how many input values you give it. (This default comes from the hist.bins entry in Matplotlib’s rcParams settings.) Okay? Does that make sense?
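You can check this default yourself, because plt.hist() returns the bar heights, the bin edges and the drawn bar patches. The sketch below assumes default Matplotlib settings, since the bin count can be changed in a matplotlibrc file.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

y = np.random.randn(1000)

plt.figure()  # start from a fresh, empty figure
# hist() returns the count in each bin, the bin edges and the bar artists it drew
counts, edges, patches = plt.hist(y)

print(len(counts))   # number of bins
print(len(edges))    # always one more edge than there are bins
print(counts.sum())  # every one of the 1000 values is counted exactly once
```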

So in simple terms, Matplotlib took our 1000 values, grouped nearby values together into 10 bins, and then created the above histogram plot!

So that is all there is to it! But what if we want to have more than 10 bins? Well, we will come to that soon, but not now. Because it is going to need its own article that I will write next!

In this tutorial, we will learn how to use Matplotlib to add a legend to an existing plot. We can use Matplotlib to visualize data in different forms such as bar plots, charts, lines etc. However, when a plot shows more than one data series, it is hard to interpret until a legend is added to it. So, we first need to learn what a legend is and why it is useful in a Matplotlib plot. Finally, we will learn how to write a simple Python program to add one.

What is a legend in Matplotlib plot?

A legend in a Matplotlib plot is a small info box which helps us understand what each element of the plot represents.

For example, let us take a look at an existing plot from our previous tutorial. It looks like this:

We can see that it is a two-dimensional plot. Its axis labels also tell us what data is used to plot it. However, there is one thing that is still not clear. We are seeing three lines drawn in the plot above. But what does each of these lines represent? That is the question that a legend will answer.

Each of the above three lines, for example, could represent the acceleration of a metro train. So the orange line in the plot could represent the acceleration of a train on an orange metro line, the green line that of a train on a green metro line, and so on.

So to represent this information on the plot, we need to make use of legends.

Now that we understand what a legend in a plot is, let us learn how to add one to the above plot.

How to add legend to an existing Matplotlib plot

So let us go back to the Python code from our previous tutorial. It looked like this:

import matplotlib.pyplot as plt
x = range(1, 10)
plt.plot(x, [xi*1 for xi in x])
plt.plot(x, [xi*2 for xi in x])
plt.plot(x, [xi*3 for xi in x])
plt.grid()
plt.axis([0, 20, 0 , 40])
plt.xlabel('This is the X axis label')
plt.ylabel('This is the Y axis label')
plt.title('Dummy Plot')
plt.show()

So we had added axis labels and a title to the plot in our previous tutorials. It is now time to add a legend to the plot above. From the code, we can see that the three lines were generated by multiplying their x-axis values by 1, 2 and 3 respectively. In other words, we got the blue line by multiplying the x-axis values by 1, the orange line by multiplying them by 2 and, finally, the green line by multiplying them by 3.

So our Matplotlib plot should have a legend that shows Blue=1x, Orange=2x and Green=3x. Do you agree?

Now, to draw a legend in the output of a Matplotlib plot, we will make use of a function called legend(). If we go through the Matplotlib legend function’s documentation, we can see that we can add a legend to a plot by simply passing the legend texts as a list argument to this function. So with this in mind, if we add a line of code like this:

plt.legend(['Blue=1x', 'Orange=2x', 'Green=3x'])

then we should be getting our desired output.

So, our final code to add legend to Matplotlib plot will look something like this:

import matplotlib.pyplot as plt
x = range(1, 10)
plt.plot(x, [xi*1 for xi in x])
plt.plot(x, [xi*2 for xi in x])
plt.plot(x, [xi*3 for xi in x])
plt.legend(['Blue=1x', 'Orange=2x', 'Green=3x'])
plt.grid()
plt.axis([0, 20, 0 , 40])
plt.xlabel('This is the X axis label')
plt.ylabel('This is the Y axis label')
plt.title('Dummy Plot')
plt.show()

Notice that we added our legend() function call right after plotting the three lines. The legend texts in legend() are also passed in the same order in which the lines are plotted. That is, we plot the lines in the order blue, orange and green, so our legend function’s argument lists the texts for blue, orange and green in that same order.
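Keeping the legend list in the same order as the plot calls is easy to get wrong as a plot grows. An alternative sketch, which is a different approach from the one above, attaches each text to its line with the label= keyword argument of plot(). A bare legend() call then collects those labels automatically.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

x = range(1, 10)

plt.figure()  # start from a fresh, empty figure
# Each legend text travels with its own line, so order mistakes are impossible
plt.plot(x, [xi * 1 for xi in x], label="Blue=1x")
plt.plot(x, [xi * 2 for xi in x], label="Orange=2x")
plt.plot(x, [xi * 3 for xi in x], label="Green=3x")

# With no arguments, legend() uses the labels set in the plot() calls
legend = plt.legend()
print([text.get_text() for text in legend.get_texts()])
```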

With this, our final output plot looks like this:

Conclusion

Using Matplotlib to add a legend to an existing plot is not difficult. It is as simple as calling the Matplotlib library’s legend() function. However, we have to ensure that we pass the legend texts in the correct order. So that is all for this tutorial. If you still have any questions about this, do let us know in the comments below. So until next time, ciao! 🙂

While working with Matplotlib, we can change the axes size of its output plots, that is, the range of values each axis covers. Matplotlib provides specific functions to modify these axis limits, so we can write Python programs that change them.

In our previous tutorial, we created a simple Matplotlib plot of multiple lines along with gridlines. However, in that plot the size of each of the two axes was determined automatically. Since our x values ran from 1 to 9 and our y values went up to 27, Matplotlib picked axis ranges just large enough to fit that data.

However, we can actually change this. We can use Matplotlib to change axes size by making use of its appropriate features.

Understanding How Matplotlib Changes Axes Size

So let us go back to our previous plot, which looked like this:

The code we used to generate the above chart looked like this:

import matplotlib.pyplot as plt
x = range(1, 10)
plt.plot(x, [xi*1 for xi in x])
plt.plot(x, [xi*2 for xi in x])
plt.plot(x, [xi*3 for xi in x])
plt.grid()
plt.show()

As mentioned earlier, we can see from the above code that the x-axis values range from 1 to 9, because range(1, 10) stops before 10. On the other hand, the y-axis values are determined by the 3 lines we plotted on the graph. Their values were calculated by multiplying the values of x by 3 different factors: 1, 2 and 3.

So by analyzing this, we can see that the highest y value comes from line number three. This line is drawn using the code:

plt.plot(x, [xi*3 for xi in x])

So, we can see that y reaches its highest value when we multiply the highest value of x by 3. Since range(1, 10) excludes 10, the highest value of x is 9. So the highest value that y can achieve is:

yi = xi*3 where xi=9 (because it is less than 10)
yi = 9 * 3
yi = 27

Hence, the highest value of y is 27. Our graph also confirms this: the y values of our 3rd line never go beyond 27.

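So with this knowledge, Matplotlib chooses the axis limits automatically so that all of the data fits. The x-axis has to cover x values up to 9, while the y-axis is stretched to accommodate the highest y value of 27 from our 3rd line. Depending on the Matplotlib version and its autoscaling settings, the limits either add a small margin around the data or are rounded out to nice tick values such as 10 and 30.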
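The reasoning above can be double-checked in code. This sketch recomputes the largest y value and then reads the automatically chosen limits back with plt.xlim() and plt.ylim(). It only checks that the limits enclose the data, because the exact padding varies between Matplotlib versions and settings.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

x = range(1, 10)
lines = [[xi * k for xi in x] for k in (1, 2, 3)]  # y values of the three lines

plt.figure()  # start from a fresh, empty figure
for ys in lines:
    plt.plot(x, ys)

# The largest y value comes from the third line at x = 9
y_max = max(max(ys) for ys in lines)
print(y_max)  # 27

# Called without arguments, xlim() and ylim() return the autoscaled limits
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
print((xmin, xmax), (ymin, ymax))
```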

Programming Matplotlib To Change Axes Size

So now that we understand how Matplotlib calculates the axes values automatically, we will now learn how we can change this. In order for us to achieve this, we will use yet another function of Matplotlib.

Matplotlib Axis Function

In order to control the size of our plot axes, Matplotlib provides us with another function called the axis function. The signature of this function looks like this:

matplotlib.pyplot.axis(*args, **kwargs)
where args = [xmin, xmax, ymin, ymax]

From the above signature, we can see that we can set the minimum and maximum values of the x and y axes using xmin, xmax, ymin and ymax. Keep in mind that these four values are passed together as a single list (or tuple), in that order.

However, that is not all. There is one more interesting feature of axis(): if we call it without passing any arguments, it returns the current values of xmin, xmax, ymin and ymax!

So axis() acts as both a getter and a setter.

One more thing to keep in mind while using axis() is that we need to call it before calling our plt.show().
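Here is a minimal sketch of the two roles of axis(). For brevity it uses only the third line from our plot. Calling axis() with no arguments reads the limits, and calling it with a list sets them. When axis() sets new limits it also returns them, and the sketch prints that result too.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

x = range(1, 10)
plt.figure()  # start from a fresh, empty figure
plt.plot(x, [xi * 3 for xi in x])

before = plt.axis()               # getter: returns (xmin, xmax, ymin, ymax)
after = plt.axis([0, 20, 0, 40])  # setter: all four limits in one list
print(before)
print(after)
print(plt.axis())                 # reading again reflects the new limits

# In a normal script, plt.show() would come after these calls
```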

Now enough of the theory behind this function. Let us understand it better by exploring it with our example plot.

Code For Matplotlib Change Axes Size

So we will now modify our code to include axis() function call as follows:

import matplotlib.pyplot as plt
x = range(1, 10)
plt.plot(x, [xi*1 for xi in x])
plt.plot(x, [xi*2 for xi in x])
plt.plot(x, [xi*3 for xi in x])
plt.grid()
print(plt.axis())
plt.show()

When we run this program, we get the current limits of the axes of our plot as a tuple (xmin, xmax, ymin, ymax):

So the above code returned the current limits of our plot. Let us now modify this code further so that it changes the limits of our plot axes instead. To do this, let us modify our code like this:

import matplotlib.pyplot as plt
x = range(1, 10)
plt.plot(x, [xi*1 for xi in x])
plt.plot(x, [xi*2 for xi in x])
plt.plot(x, [xi*3 for xi in x])
plt.grid()
plt.axis([0, 20, 0 , 40])
plt.show()

By passing the list [0, 20, 0, 40] to the axis function, we have increased the size of both of our axes. Both axes now start at 0 (xmin=0, ymin=0), the x-axis is extended to 20 (xmax=20) and the y-axis is extended to 40 (ymax=40).

When we now run this program again, we will finally get this Matplotlib output plot:

From the above plot, we can clearly see that the x-axis is increased up to 20 while the y-axis of the plot is increased to 40. So this is how we can use the axis() function provided by Matplotlib to change the axes size of our output graph plot.
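If only one axis needs changing, pyplot also offers xlim() and ylim(). Each one sets the limits of a single axis, or returns them when called without arguments. The sketch below is an alternative to the axis() call above, not a replacement for it, and reuses only the third line of our plot.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

x = range(1, 10)
plt.figure()  # start from a fresh, empty figure
plt.plot(x, [xi * 3 for xi in x])

# Change only the x-axis range; the y-axis keeps its automatic limits for now
plt.xlim(0, 20)
# Now change only the y-axis range
plt.ylim(0, 40)

print(plt.xlim(), plt.ylim())
```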

I hope this tutorial was helpful to you. If you still have any questions about it, do let me know in the comments below. So until next time, ciao! 🙂

In this article, we will take a look at range vs arange in Python. Learning the difference between these functions will help us understand when to use either of them in our programs. Python’s built-in range function and NumPy’s arange function each have their own set of advantages and disadvantages. This article will help us learn about them in detail.

To better understand the difference between range and arange in Python, let us first understand what each of these functions does.

range vs arange in Python: Understanding range function

The range function in Python lets us generate a sequence of integer values within a certain range. It also lets us generate these values with a specific step. Its signature is:

range([start], stop[, step])

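So, in the above representation of the range function, stop is required while start and step (shown in square brackets) are optional. We get a sequence of integers that begins at start (default 0) and ends before stop, because the stop value itself is never included. Each value is obtained by adding step (default 1) to the previous one.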

range function example 1

So that was the theory behind the range function. Now, let us understand it better by practicing with it. Fire up your Python IDLE interpreter and enter this code:

l = range(1, 10, 2)

When you hit Enter on your keyboard, you don’t see anything, right? That is because the resulting sequence is stored in the variable “l”. In Python 3, range() returns a range object rather than a list, but it can be indexed just like a list.

To see the values stored in it, let us print the individual values. So, by indexing each of the items, we get the following values printed out.

So we see a sequence of numbers, cool! But when you observe closely, you will realize that this function has generated the numbers in a particular pattern. The first number it generated is our optional start parameter, which we had set to 1. It also honors the stop value: the last value is 9, which is less than our stop value of 10. If you try to index any value beyond this point, you will only get an error:

>>> l[5]
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
l[5]
IndexError: range object index out of range

So, this confirms that the last value we get will always be less than the stop value (for a positive step).

But the most important observation we need to make here is the step size between these values. Each value is exactly 2 more than the previous one. This is the same step size we had defined in our call to the range function (the last parameter)!
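Converting the range object to a list shows all of example 1’s values at once and confirms the start, stop and step rules described above. The extra countdown line with a negative step was added for illustration and is not part of the original example.

```python
l = range(1, 10, 2)

# Convert the lazy range object to a list to see every value at once
print(list(l))  # [1, 3, 5, 7, 9]
print(len(l))   # 5 values, so the valid indices are 0 to 4
print(l[-1])    # 9, the last value, which is below the stop value of 10

# With a negative step the sequence counts down and stops just above stop
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]
```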

range function example 2

Does this mean that every time we call the range function, we need to pass all 3 parameters? Not really. Let us take a look at the signature of the range function again:

range([start], stop[, step])

The parameters start and step (mentioned within the square brackets) are optional. This means that we can also call the range function without them, like this:

>>> list(range(4))
[0, 1, 2, 3]

In this case, we have only the stop value of 4. As a result, our sequence of integers starts at the default value of 0, and the step also defaults to 1. So we get the integers from 0 to 3, with a step of 1.

Alright then, I hope everything is clear up to this point. If that is what Python’s range function does, what does the arange function do?

range vs arange in Python: Understanding arange function

Unlike range, arange is not a built-in Python function. Instead, it is a function we can find in the NumPy module. So, in order to use the arange function, you will need to install the NumPy package first (for example with pip install numpy)!

The signature of the Python Numpy’s arange function is as shown below:

numpy.arange([start, ]stop, [step, ]dtype=None)

Wait a second! Doesn’t this signature look a lot like our range function? Yes, you are right! NumPy’s arange function works much like range. It also has optional start and step parameters and a required stop parameter, plus an optional dtype parameter that sets the data type of the result.

But then what is the difference between the two?

range vs arange in Python – What is the difference?

Where the arange function differs from Python’s range function is in the type of values it can generate.

The built-in range function can only generate integer values, which we can access by indexing, just like list elements. On the other hand, the arange function returns its values in a NumPy array, and its start, stop and step can also be floating-point numbers. We can observe the array part in the following code:

>>> import numpy as np
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> a[0]
0

We are clearly seeing here that the resulting values are stored in a Numpy array. Each of the individual values can hence also be accessed by simply indexing the array!
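A fractional step shows the difference in value types most clearly. This sketch assumes NumPy is installed. It asks both functions for the values from 0 up to (but not including) 1 in steps of 0.25.

```python
import numpy as np

# arange accepts a fractional step and returns a NumPy array of floats
a = np.arange(0, 1, 0.25)
print(a)        # four values: 0.0, 0.25, 0.5 and 0.75
print(a.dtype)  # float64

# The built-in range only accepts integers, so the same request fails
range_error = None
try:
    range(0, 1, 0.25)
except TypeError as err:
    range_error = err
    print("range:", err)
```

For non-integer steps, NumPy’s documentation recommends numpy.linspace instead, because floating-point rounding can make the number of values produced by arange hard to predict.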

range vs arange in Python – Advantages & Disadvantages

This raises the next question: when should we use Python’s built-in range function, and when NumPy’s arange function? To understand this, let us list the advantages and disadvantages of each of these functions:

Advantages of range function in Python

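range is built into Python, so it needs no extra package. In Python 2 it returns a list of integers. In Python 3 it returns a special “range object” that produces its values on demand, much like a generator, so it needs very little memory however long the sequence is.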

Disadvantages of range function in Python

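range can only produce integers.

For numerical work, processing its values one by one in a Python loop is considerably slower than applying vectorized operations to a NumPy array.

Once its values are materialized into a list (for example with list(range(n))), they occupy more memory than an equivalent NumPy array, because each list element is a separate Python integer object.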

Advantages of arange function in Python

Numpy’s arange function returns a Numpy array

Its performance is generally much better for numerical work, because the resulting array supports fast vectorized operations.

When dealing with large datasets, a NumPy array needs much less memory than a list of the same integers.
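The memory and speed points above are easiest to see when the values are actually used. This sketch first shows that a range object itself stays tiny, then looks at the packed buffer behind an arange array. Finally it sums the doubled values both with a Python-level loop and with a single vectorized NumPy expression. Exact byte counts depend on the platform and on the Python and NumPy versions, and the sketch does not time anything.

```python
import sys
import numpy as np

n = 1_000_000

# A Python 3 range object only stores start, stop and step, so its size stays tiny
range_size = sys.getsizeof(range(n))
print(range_size)

# An arange array stores all n values in one packed buffer of fixed-size integers
as_array = np.arange(n)
print(as_array.nbytes)  # n * as_array.itemsize bytes

# Python-level loop over range vs. one vectorized expression on the array
total_loop = sum(i * 2 for i in range(n))
total_vec = int((as_array * 2).sum(dtype=np.int64))  # int64 avoids overflow on 32-bit defaults
print(total_loop == total_vec)
```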

So this is the fundamental difference between range vs arange in Python. We can understand them even better by using them more in our everyday programming.

I hope this gave you some amount of clarity on the subject. If you still have any more doubts, do let me know about it in the comments below. I will be more than happy to help you out.

Having said that, take a look at this article. It gives you a simple explanation of the “Difference between expressions and statements in Python”. I have spent a considerable amount of time trying to understand these topics. Since there are not many articles available that explain them clearly, I started this blog to capture these topics. I hope you found them useful! If so, do share them with your friends so that they can help them as well.

With this, I will conclude this article right here. See you again in my next article, until then, ciao!

In this article, we are going to learn how to extract data from a website using Python. Extracting data from a website is called “Web scraping” or “Data scraping”. We can write programs in languages such as Python to perform web scraping automatically.

In order to understand how to write a web scraper using Python, we first need to understand the basic structure of a website. We have already written an article about it here on our website. Take a quick look at it once before proceeding here to get a sense of it.

The way to scrape a webpage is to find specific HTML elements and extract their contents. So, to write a website scraper, you need to have a good understanding of HTML elements and their syntax.

Assuming you have a good understanding of these prerequisites, we will now proceed to learn how to extract data from a website using Python.

How To Fetch A Web Page Using Python

The first step in writing a web scraper using Python is to fetch the web page from the web server to our local computer. We can achieve this with urllib, a readily available package that is part of Python’s standard library.

Since urllib ships with Python itself, we do not need to install it with pip. We can simply import it and start using it to fetch the web page whose data we want to scrape.

For the sake of this tutorial, we are going to extract data from a Wikipedia web page about comets, found here:

This Wikipedia article contains a variety of HTML elements such as text, images, tables, headings etc. We can extract each of these elements separately using Python.

How To Fetch A Web Page Using Urllib Python package.

Let us now fetch this web page using Python library urllib by issuing the following command:

This code imports the request module of the urllib package into our Python program. We use this module to send an HTTP GET request to the Wikipedia server for the web page, passing the URL of the page as the parameter of the request.

As a result, the Wikipedia server responds with the HTML content of this web page. urllib hands this back to us as a response object, which is stored in our Python program’s “content” variable.

The content variable gives us access to everything the Wikipedia server sent back for this page. To get the actual HTML, we call the response object’s read() method in the next line of the program. Note that read() returns the complete HTML of the page as bytes. It does not filter anything out, so directives meant for the web browser, such as <meta> tags, are still included. Separating the human-readable content from the rest is the job of the HTML parser we will use in the next section.

read_content = content.read()

The above line of Python code gives us the raw HTML of the whole page, stored in the read_content variable.
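Putting the fetching steps together, here is a sketch built on urllib.request.urlopen() from the standard library. fetch_html() is a hypothetical helper, not part of urllib, and it returns the raw HTML as text. UTF-8 is assumed for decoding, although a real page declares its own encoding. The "data:" URL in the usage lines only stands in for a real web address so the snippet can run offline.

```python
from urllib.request import urlopen


def fetch_html(url, encoding="utf-8"):
    """Hypothetical helper: download a page and return its HTML as a string."""
    # For http(s) URLs, urlopen() sends a GET request and returns a response object
    with urlopen(url) as response:
        raw = response.read()  # the complete body as bytes, <meta> tags included
    return raw.decode(encoding)  # assumes the page uses this encoding


# Offline stand-in for a real page URL such as a Wikipedia article
html = fetch_html("data:text/html,<html><head><meta charset='utf-8'></head>"
                  "<body><p>Hello</p></body></html>")
print(html)
```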

At this point in our program we have all of the page’s HTML, including the elements we are interested in. It is now time to extract individual data elements of the web page.

How To Extract Data From Individual HTML Elements Of The Web Page

In order to extract individual HTML elements from our read_content variable, we need to make use of another Python library called Beautifulsoup. Beautifulsoup is a Python package that can understand HTML syntax and elements. Using this library, we will be able to extract out the exact HTML element we are interested in.

We can install the Beautifulsoup package on our local development system by issuing the command below (the package is then imported under the name bs4):

pip install beautifulsoup4

Once the Beautifulsoup Python package is installed, we can start using it to extract HTML elements from our web content. I hope you remember that we had earlier stored our web content in the Python variable “read_content“. We are now going to pass this variable, along with the parser name ‘html.parser’, to Beautifulsoup to extract HTML elements as shown below:

from bs4 import BeautifulSoup
soup = BeautifulSoup(read_content,'html.parser')

From this point onwards, our “soup” Python variable holds all the HTML elements of the webpage. So we can start accessing each of these HTML elements by using BeautifulSoup’s find() and find_all() methods.
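The difference between find() and find_all() is easiest to see on a small sketch. It parses a made-up HTML snippet in place of the real Wikipedia page so that it runs offline, and it assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document standing in for read_content
read_content = b"""
<html><body>
  <h2>Etymology</h2>
  <p>First paragraph.</p>
  <h2>Physical characteristics</h2>
  <p>Second paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(read_content, "html.parser")

first_p = soup.find("p")    # find() returns the first matching element, or None
all_p = soup.find_all("p")  # find_all() returns every match, in document order

print(first_p.text)
print(len(all_p))
```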

How To Extract All The Paragraphs Of A Web Page

For example, if we want to extract the first paragraph of the Wikipedia comet article, we can start by collecting all the paragraphs using the code:

pAll = soup.find_all('p')

The above code extracts all the paragraphs present in the article and assigns them to the variable pAll. Since pAll behaves like a list of paragraphs, each individual paragraph can be accessed through indexing. So in order to access the first paragraph, we issue the command:

pAll[0].text

The output we obtain is:

'\n'

So the first paragraph only contained a newline. What if we try the next index?

pAll[1].text
'\n'

We again get a newline! Now what about the third index?

pAll[2].text
"A comet is an icy, small Solar System body that..."

And now we get the text of the first paragraph of the article! If we continue indexing, we get access to each of the remaining HTML <p> elements of the article, in order. Keep in mind that the exact index of the first real paragraph depends on the page’s markup at the time you fetch it. In a similar way, we can extract other HTML elements too, as shown in the next section.
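Because the index of the first real paragraph can change with the page’s markup, hard-coding pAll[2] may break. One alternative is to skip any paragraph whose stripped text is empty. The sketch below shows this on a made-up HTML string and assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup

# Made-up markup: two whitespace-only paragraphs, then real content
html = "<p>\n</p><p>\n</p><p>A comet is an icy, small Solar System body.</p><p>Second.</p>"
soup = BeautifulSoup(html, "html.parser")

# Keep only <p> elements that contain something besides whitespace
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p") if p.get_text(strip=True)]

print(paragraphs[0])  # the first non-empty paragraph
```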

How To Extract All The H2 Elements Of A Web Page

Extracting H2 elements of a web page can also be achieved in a similar way as how we did for the paragraphs earlier. By simply issuing the following command:

h2All = soup.find_all('h2')

we can filter and store all H2 elements into our h2All variable.

So with this we can now access each h2 element by indexing the h2All variable:
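Indexing h2All gives whole elements, tags and all. The .text attribute, or get_text(), returns only the heading text. The sketch below uses made-up headings rather than the real article, and one of them wraps its text in a nested <span>, as headings on real pages may. It assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup

# Made-up headings; the second one nests its text inside a <span>
html = "<h2>Contents</h2><h2><span>Etymology</span></h2><h2>Physical characteristics</h2>"
soup = BeautifulSoup(html, "html.parser")

h2All = soup.find_all("h2")
print(h2All[0])       # the whole first <h2> element, tags included
print(h2All[1].text)  # only the text content; the nested <span> tags are dropped

headings = [h2.get_text(strip=True) for h2 in h2All]
print(headings)
```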

So there you have it. This is how we extract data from a website using Python, by making use of two important libraries: urllib and Beautifulsoup.

We first pull the web page content from the web server using urllib, and then we use Beautifulsoup on that content. Beautifulsoup provides us with many useful tools (such as the find_all() method and the text attribute) to extract individual HTML elements of the web page. By making use of them, we can address individual elements of the web page.

So far we have seen how we can extract paragraphs and h2 elements from our web page. But we do not have to stop there. We can extract any type of HTML element using a similar approach, be it images, links, tables etc. If you want to verify this, check out this other article where we have taken a similar approach to extract table elements from another Wikipedia article.