Categories
DATA MINING DATA SCIENCE DATA VISUALIZATION PROGRAMMING PYTHON TUTORIALS

Matplotlib Add Text Inside A Plot Using Python

In this article, we will learn how to add text inside a plot using Matplotlib. We will also look at how we can position this text inside the plot.

In our earlier articles, we saw how to add titles and axis labels to a plot. This article is a follow-up to those, except that here we add text inside the plot itself rather than labels and titles outside it.

So how do we go about doing this? How can we add text inside a plot? If we can, can it be positioned wherever we want within the plot? These are some of the questions we will be answering next!

So let us start with the first question:

How To Add Text Inside A Matplotlib Plot Using Python

So to answer this, let us draw a simple plot using the following code:

import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 20, 1)
y = np.arange(0, 2, 0.1)
plt.plot(x, y)
plt.show()

This will give us a simple plot with a straight line that looks like this:

A simple line plot

Okay. This is great! But now how do we go about adding a piece of text inside this plot? Let us say we want a piece of text that reads “This is cool!” written above the blue line, somewhere in the middle. Can we do that in Matplotlib?

The text( ) function

Of course you can! Matplotlib provides a function for just that. It is called text( ) and it is part of Matplotlib's pyplot module, which we imported as plt.

So what does the signature of this text( ) function look like? Here it is:

plt.text(x, y, s, fontdict=None, **kwargs)

In the above signature, we see that text( ) takes three main parameters: x, y and s. The parameters x and y give the coordinates where the text is to be written, expressed by default in the same units as the data on the axes. The s parameter is the string we want to write. The optional fontdict and keyword arguments control how the text is styled.

So how does this all work out for us when we want to say “This is cool!” above the blue line? Well, first we need to set the x and y coordinates.

From the plot above, we can choose a value that is above the line and somewhere in the middle. Right? So what would be that value? Well how about it being 5 along the x-axis and 1 along the y-axis? In which case, we can call the text( ) function as follows:

plt.text(5, 1, 'This is cool!')

We need to call this function before plt.show( ). Here we place it just before plt.plot( ), although the order of these two calls does not matter. So our final code will then become:

import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 20, 1)
y = np.arange(0, 2, 0.1)
plt.text(5, 1, 'This is cool!')
plt.plot(x, y)
plt.show()

So what will the plot look like for this code? Take a look at it for yourself!

Matplotlib Add Text Inside A Plot

Now that is what we wanted right?!
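The introduction promised a look at positioning, so here is a minimal sketch of two ways to place text. It is an illustration rather than part of the original example: it assumes the object-oriented interface (plt.subplots) and the transform keyword argument. By default, x and y are data coordinates. Passing transform=ax.transAxes makes them fractions of the axes area instead, where (0, 0) is the bottom-left corner and (1, 1) the top-right. The label strings and variable names are made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 20, 1)
y = x * 0.1  # essentially the same straight line as in the example above

fig, ax = plt.subplots()
ax.plot(x, y)

# Data coordinates (the default): the text starts at x=5, y=1 on the axes.
data_label = ax.text(5, 1, 'This is cool!')

# Axes coordinates: (0.5, 0.9) means horizontally centered and 90% of the
# way up the plot area, whatever the data range happens to be.
axes_label = ax.text(0.5, 0.9, 'Always near the top',
                     transform=ax.transAxes, ha='center', va='center')

plt.show()
```

Axes coordinates are handy when the text should stay in the same corner of the plot even if the data changes.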

So there you have it. This is how we can add text inside a plot in Matplotlib using Python. Hope it was pretty easy to follow. But if you have any questions, do let me know in the comments below. I will be more than happy to help!

So until next time, have a nice day! 🙂


Matplotlib Plot Multiple Lines On Same Graph Using Python

In this tutorial, we will learn how to use the Python library Matplotlib to plot multiple lines on the same graph. Matplotlib provides all the functions required to plot multiple lines on the same chart, so the task is pretty straightforward.

In our earlier article, we saw how we could use Matplotlib to plot a simple line to connect between points. However in that article, we had used Matplotlib to plot only a single line on our chart. But the truth is, in real world applications we would often want to use Matplotlib to plot multiple lines on the same graph. This tutorial will explain how to achieve this.

But before we start, let us first create the dataset required for our tutorial.

Creating dataset for Matplotlib to plot multiple lines on same graph

Just like we did in our previous tutorial, we will simply generate our sample dataset using Python’s range function.

Now, if you are unfamiliar with Python’s built-in range function, take a look at this tutorial we wrote about it earlier.

So the code to generate our base dataset with Python’s range function looks like this:

import matplotlib.pyplot as plt
x = range(1, 10)

With this, we now have our sample dataset, the integers from 1 to 9, saved in the Python variable x. So it's time to start using these values to plot our chart.

Using Matplotlib to plot multiple lines on same graph

Wait a second! Using the above code we got only one set of data. How are we going to use it to plot multiple lines on the same graph?

Well, the answer is that we can convert this single dataset into 3 different datasets by simply multiplying it with different values. So for example to create three different datasets we can do something like:

Pseudocode:
===========
Dataset1 = 1 * x
Dataset2 = 2 * x
Dataset3 = 3 * x

Cheeky huh?! 😉

So with our required datasets in place, we can start writing our code to draw different lines out of it. Here we go!

plt.plot(x, [xi*1 for xi in x])
plt.plot(x, [xi*2 for xi in x])
plt.plot(x, [xi*3 for xi in x])

These three lines of code are all that is required to draw 3 different lines on our graph using Matplotlib. Not sure what they are doing? Then let me explain the first line of code. This should make the rest of the code clear as well.

Let us take a look at the first line of code again:

plt.plot(x, [xi*1 for xi in x])

What we are doing here is calling Matplotlib’s plot function. The job of this plot function is to draw or plot our data on Matplotlib’s canvas!

But if you are not familiar with Matplotlib’s canvas, you should first read this article on Introduction to Matplotlib.

Understanding Matplotlib plot function’s parameters

From this first line of code, you can also notice that we are passing two parameters to our plot function. The first parameter we pass is simply the value of x. This forms the x-axis values for our plot. On the other hand, the second parameter forms the y-axis values of our plot. But what is exactly happening here? Why is the second parameter looking so complicated?

[xi*1 for xi in x]

What we are doing over here is that we are looping over each value of x and multiplying it by 1. So this is the final value that we use for the y-axis of the plot.

Now if you have been able to keep up with me so far, then you will know what the last two lines of code do as well, right? They do the same thing as our first line of code. The only difference is that they multiply the values of x by different coefficients to get the y-axis values.
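To make the list comprehensions concrete, the short sketch below prints the three lists that end up as y-axis values. The variable names dataset1, dataset2 and dataset3 are borrowed from the pseudocode above and are used only for illustration.

```python
x = range(1, 10)

# Each comprehension walks over x and multiplies every value by a coefficient.
dataset1 = [xi * 1 for xi in x]
dataset2 = [xi * 2 for xi in x]
dataset3 = [xi * 3 for xi in x]

print(dataset1)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(dataset2)  # [2, 4, 6, 8, 10, 12, 14, 16, 18]
print(dataset3)  # [3, 6, 9, 12, 15, 18, 21, 24, 27]
```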

Display the final output graph

So far, our code has generated the sample dataset and drawn three lines from it on the same graph. But if you have followed along and typed in the code, you will have noticed that no image is displayed yet. Why so? The reason is that we have drawn the image on Matplotlib's canvas, but we haven’t displayed it yet. In order to display our final output image, we still need to call one more function:

plt.show()

Matplotlib’s show function is the one responsible for displaying the output on our screen. As seen here, we call it without any parameters. When running the code as a script, calling it is a must if you want to display the graph on screen.

So with just these 6 lines of code, we have been able to make Matplotlib plot multiple lines on same graph.

Here is the final output image drawn using the above piece of code.

Matplotlib plot multiple lines on same graph

One more interesting thing to note here is that Matplotlib has plotted these multiple lines on the same graph using different colors. This is a built-in feature of Matplotlib. If it needs to plot more than one line on the same graph, it automatically picks the next color from its default color cycle for each new line. In this way, we are able to tell apart the different datasets represented on a single chart. Isn’t it cool & beautiful? 😉
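Different colors are easier to read with a legend. The sketch below is an optional extension, not part of the original six lines: it assumes the label keyword argument of plot together with plt.legend, and it prints the color Matplotlib assigned to each line. The label strings are made up for this example.

```python
import matplotlib.pyplot as plt

x = range(1, 10)

lines = []
for factor in (1, 2, 3):
    # plot() returns a list of Line2D objects; each call here draws one line.
    (line,) = plt.plot(x, [xi * factor for xi in x], label=f'y = {factor}x')
    lines.append(line)

plt.legend()  # builds the legend from the label of each line

# Each line received a different color from the default color cycle.
colors = [line.get_color() for line in lines]
print(colors)

plt.show()
```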

Conclusion

So this is how we can make Matplotlib plot multiple lines on the same graph. By using Python’s Matplotlib and writing just 6 lines of code, we can get this result.

Here is the final summary of all the pieces of code put together in a single file:

import matplotlib.pyplot as plt
x = range(1, 10)
plt.plot(x, [xi*1 for xi in x])
plt.plot(x, [xi*2 for xi in x])
plt.plot(x, [xi*3 for xi in x])
plt.show()

Matplotlib is an easy to use Python visualization library that can be used to plot our datasets. We can use many different types of datasets and Matplotlib will still be able to handle them. By familiarizing ourselves with this Python library, we are adding a new tool to our arsenal.

Hope this tutorial was easy enough for you to understand. If you have any queries on Matplotlib or Python in general, do not forget to comment below. I will try my best to provide you with all the helpful answers I can give. With this, I will conclude this tutorial on Matplotlib. Until next time, ciao!


Using Matplotlib To Draw Line Between Points

In this tutorial, let us learn how to use Matplotlib to draw a line between points. Matplotlib is a Python library that we can use to draw lines, charts and other plots. It takes in datasets as its input and converts them into plots and graphs. Therefore, it can help us visualize and interpret our datasets in a much better way.

To use Matplotlib to draw a line between points, first ensure that Matplotlib is installed on your computer. Once that is confirmed, let us create a set of data points that we want to plot.

Creating dataset for Matplotlib to draw line between points

One of the simplest ways for us to create our dataset is by calling Python’s built-in range function.

Check this tutorial to learn more about Python’s built-in range function.

So, let us now start writing our Python plotting program:

import matplotlib.pyplot as plt

In this line of code, we are simply importing the pyplot submodule of the Matplotlib library under the name plt. From now on we can refer to it simply as plt.

Next, let us generate our desired dataset using Python’s range function.

x = range(5)

As can be seen here, we are asking the range function to provide us with a sequence of integers from 0 to 4. That is because range starts at 0 by default and stops just before the upper limit, which is 5 here, so 5 itself is not included. The step size also defaults to 1. As a result, our dataset will contain these values:

x = [0, 1, 2, 3, 4]
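Strictly speaking, range returns a lazy range object rather than a list. A quick way to check what it contains is to convert it with list, as in this small sketch:

```python
x = range(5)

print(list(x))               # [0, 1, 2, 3, 4]
print(list(range(0, 5, 1)))  # [0, 1, 2, 3, 4], with start and step spelled out
print(5 in x)                # False, because the upper limit is excluded
print(list(range(1, 10)))    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```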

So now that we have our dataset, it's time to plot these values using Matplotlib.

Using Matplotlib to draw line between points

Since we have already imported Matplotlib’s pyplot submodule, we can right away start using it to plot our line. Pyplot provides us with a very handy helper function called plot for this.

The general syntax of our plot function looks like this:

plot([x], y, [fmt], *, data=None, **kwargs)

As can be seen above, plot takes optional x-axis values, while the y-axis values are required for the plot function to work. It also accepts additional parameters, such as an optional format string [fmt] and data. You can refer to the official documentation for this function to learn more about how to use it.
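As a small illustration of the optional [fmt] parameter, the sketch below passes the format string 'ro--', which asks for a red line with circle markers and a dashed line style. The y values are made up for this example and are not part of the tutorial's dataset.

```python
import matplotlib.pyplot as plt

y = [0, 1, 2, 3, 4]

# 'ro--' is a format string: r = red, o = circle markers, -- = dashed line.
(line,) = plt.plot(y, 'ro--')

print(line.get_color(), line.get_marker(), line.get_linestyle())

plt.show()
```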

However, for our case, we will simply use our dataset as the y-axis values. Since the x-axis values are optional, we can leave them out. When we do so, Matplotlib fills them in automatically, starting at 0 and incrementing by 1 for each data point. Hence, our code for plotting will simply look like this:

plt.plot([xi for xi in x])

What we are doing here is simply passing the values of our dataset x as the plot function's y-axis values. The list comprehension just copies x into a list, so plt.plot(x) would draw the same line.
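To see the automatically generated x-axis values, we can inspect the line object that plot returns. This is a minimal sketch with made-up y values, chosen so that they differ visibly from the generated x values:

```python
import matplotlib.pyplot as plt

y_values = [10, 20, 30, 40]

(line,) = plt.plot(y_values)  # no x values passed

# Matplotlib filled in one x value per data point, counting up from 0.
print(list(line.get_xdata()))
print(list(line.get_ydata()))

plt.show()
```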

However, we are still not done. The code written so far draws our line connecting the data points, but to display it on screen we need to call another function called “show”. So, we still need to add this final line of code to our program:

plt.show()

With this, we should be able to see a plot drawn by Matplotlib that is drawing a line between our data points. It looks something like this:

Final result image of using Matplotlib to draw line between points
Line between points drawn using Matplotlib

Conclusion

Combining all the above pieces of code in a single place gives us our final code:

import matplotlib.pyplot as plt
x = range(5)
plt.plot([xi for xi in x])
plt.show()

So this is it! With just these four lines of code, we are able to make use of Matplotlib to draw a line between points. I hope this tutorial was pretty straightforward. If you have any more queries or simply want to say hi to me, please leave a comment below! Until next time, ciao!

Categories
DATA MINING DATA SCIENCE HTML JAVASCRIPT PROGRAMMING PYTHON STATIC WEBSITES TUTORIALS WEB DEVELOPMENT WEB SCRAPING

How To Extract Data From A Website Using Python

In this article, we are going to learn how to extract data from a website using Python. Extracting data from a website is called “web scraping” or “data scraping”. We can write programs in languages such as Python to perform web scraping automatically.

In order to understand how to write a web scraper using Python, we first need to understand the basic structure of a website. We have already written an article about it here on our website. Take a quick look at it once before proceeding here to get a sense of it.

The way to scrape a webpage is to find specific HTML elements and extract their contents. So, to write a website scraper, you need to have a good understanding of HTML elements and their syntax.

Assuming you have a good understanding of these prerequisites, we will now proceed to learn how to extract data from a website using Python.

Python logo on extracting data from a web page using Python
Python Web Scraper Development

How To Fetch A Web Page Using Python

The first step in writing a web scraper using Python is to fetch the web page from the web server to our local computer. One can achieve this by making use of a readily available Python package called urllib.

The urllib package is part of Python's standard library, so there is nothing to install with pip. It is available as soon as Python itself is installed.

With urllib at hand, we can start using it to fetch the web page to scrape its data.

For the sake of this tutorial, we are going to extract data from a web page from Wikipedia on comet found here:

https://en.wikipedia.org/wiki/Comet

This Wikipedia article contains a variety of HTML elements such as text, images, tables, headings etc. We can extract each of these elements separately using Python.

How To Fetch A Web Page Using The Urllib Python Package

Let us now fetch this web page using the Python library urllib by running the following code:

import urllib.request
content = urllib.request.urlopen('https://en.wikipedia.org/wiki/Comet')

read_content = content.read()

The first line:

import urllib.request

imports the request module of the urllib package into our Python program. We will make use of its urlopen function to send an HTTP GET request to the Wikipedia server, asking it for the web page. The URL of this web page is passed as the parameter to urlopen.

content = urllib.request.urlopen('https://en.wikipedia.org/wiki/Comet')

As a result, the Wikipedia server responds with the web page. The urlopen function returns a response object, and it is this object that is stored in our program’s “content” variable.

The response object wraps the server's reply: the status code, the HTTP headers and the body, which here is the HTML of the page. To get at the HTML itself, we call the read() method of the response object in the next line of the program.

read_content = content.read()

The above line of Python code gives us the entire HTML document as a bytes object. This includes everything the server sent, such as the <head> section with its <meta> tags, and not just the human readable content.

At this point in our program we have downloaded the raw HTML of the page. It is now time to extract individual data elements from it.
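The fetch step can be made a little more defensive. The sketch below is one possible variant, not the code used in this article: it sends a User-Agent header (some servers reject requests that do not identify themselves), closes the connection with a with block, and decodes the bytes returned by read() into a string. The helper names build_request and fetch_html and the User-Agent value 'tutorial-scraper/0.1' are hypothetical, chosen only for illustration.

```python
import urllib.request


def build_request(url):
    """Build a GET request that identifies the script via a User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': 'tutorial-scraper/0.1'})


def fetch_html(url, timeout=10):
    """Download a page and return its HTML decoded to a str."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # read() returns bytes; use the charset from the response headers,
        # falling back to UTF-8 when the server does not declare one.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)


# Example call (needs network access):
# html = fetch_html('https://en.wikipedia.org/wiki/Comet')
```

Beautiful Soup accepts either bytes or a string, so the decoding step is optional for the rest of this tutorial.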

How To Extract Data From Individual HTML Elements Of The Web Page

In order to extract individual HTML elements from our read_content variable, we need to make use of another Python library called Beautiful Soup. Beautiful Soup is a Python package that can parse HTML syntax and elements. Using this library, we will be able to extract the exact HTML element we are interested in.

We can install the Beautiful Soup package, which is published on PyPI as beautifulsoup4 and imported as bs4, into our local development system by issuing the command:

pip install beautifulsoup4

Once the Beautiful Soup package is installed, we can start using it to extract HTML elements from our web content. Remember that we earlier stored our web content in the Python variable “read_content”. We now pass this variable, along with the parser name 'html.parser', to BeautifulSoup as shown below:

from bs4 import BeautifulSoup
soup = BeautifulSoup(read_content,'html.parser')

From this point onwards, our “soup” Python variable holds the parsed HTML of the webpage. So we can start accessing individual HTML elements by using its find and find_all methods.
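The difference between find and find_all is easiest to see on a tiny page. The sketch below uses a made-up HTML string instead of the Wikipedia article so that it runs without a download. The names sample_html, first_p and all_p are chosen only for this example.

```python
from bs4 import BeautifulSoup

# A small stand-in page (made up for this example) so no download is needed.
sample_html = '<html><body><h1>Comet</h1><p>First.</p><p>Second.</p></body></html>'
soup = BeautifulSoup(sample_html, 'html.parser')

first_p = soup.find('p')    # the first matching element, or None if there is none
all_p = soup.find_all('p')  # a list-like collection of every match

print(first_p.text)         # First.
print(len(all_p))           # 2
print(soup.find('table'))   # None
```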

How To Extract All The Paragraphs Of A Web Page

For example, if we want to extract the first paragraph of the Wikipedia comet article, we can start with the code:

pAll = soup.find_all('p')

The above code extracts all the paragraphs present in the article and assigns them to the variable pAll. Now pAll contains a list of all paragraphs, so each individual paragraph can be accessed through indexing. So in order to access the first paragraph, we issue the command:

pAll[0].text

The output we obtain is:

'\n'

So the first paragraph only contained a new line. What if we try the next index?

pAll[1].text
'\n'

We again get a newline! Now what about the third index?

pAll[2].text
"A comet is an icy, small Solar System body that..."

And now we get the text of the first paragraph of the article! If we continue further with indexing, we can see that we continue to get access to every other HTML <p> element of the article. In a similar way, we can extract other HTML elements too as shown in the next section.
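Rather than probing indexes by hand until some text shows up, we can filter out the empty paragraphs. The sketch below is one way to do it, run against a made-up HTML snippet that imitates the two empty paragraphs seen above. It assumes the strip=True option of get_text, which trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (made up for this example): two empty paragraphs, then text.
sample_html = """
<div>
  <p>
</p>
  <p>
</p>
  <p>A comet is an icy, small Solar System body.</p>
  <p>Second paragraph of the stand-in page.</p>
</div>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

# strip=True removes surrounding whitespace, so empty paragraphs become ''.
texts = [p.get_text(strip=True) for p in soup.find_all('p')]
non_empty = [text for text in texts if text]

print(len(texts))    # 4
print(non_empty[0])  # A comet is an icy, small Solar System body.
```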

How To Extract All The H2 Elements Of A Web Page

Extracting the H2 elements of a web page can be achieved in the same way as we did for the paragraphs earlier. By simply issuing the following command:

h2All = soup.find_all('h2')

we can filter and store all H2 elements into our h2All variable.

So with this we can now access each of the h2 elements by indexing the h2All variable:

>>> h2All[0].text
'Contents'
>>> h2All[2].text
'Physical characteristics[edit]'
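Notice the “[edit]” at the end of the second heading. It comes from the edit link that Wikipedia places next to section headings. A minimal sketch for trimming it is shown below. The clean_heading helper is hypothetical, and the HTML is a made-up stand-in in which only the first and last headings mirror the output above.

```python
from bs4 import BeautifulSoup

# Stand-in headings (made up for this example) that mimic the output above.
sample_html = ('<h2>Contents</h2>'
               '<h2>Some section[edit]</h2>'
               '<h2>Physical characteristics[edit]</h2>')
soup = BeautifulSoup(sample_html, 'html.parser')


def clean_heading(text):
    """Remove a trailing '[edit]' marker and surrounding whitespace."""
    suffix = '[edit]'
    text = text.strip()
    if text.endswith(suffix):
        text = text[:-len(suffix)].rstrip()
    return text


headings = [clean_heading(h2.text) for h2 in soup.find_all('h2')]
print(headings)  # ['Contents', 'Some section', 'Physical characteristics']
```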

Conclusion

So there you have it. This is how we extract data from a website using Python, by making use of two important libraries: urllib and Beautiful Soup.

We first pull the web page content from the web server using urllib and then we run Beautiful Soup over the content. Beautiful Soup then provides us with many useful functions and attributes (find_all, text etc.) to extract individual HTML elements of the web page. By making use of these, we can address individual elements of the web page.

So far we have seen how we could extract paragraphs and h2 elements from our web page. But we do not have to stop there. We can extract any type of HTML element using a similar approach, be it images, links, tables etc. If you want to verify this, check out this other article where we have taken a similar approach to extract table elements from another Wikipedia article.

How to scrape HTML tables using Python
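For links and images, the interesting data usually sits in attributes rather than in the text. The sketch below shows one way to read them, again on a made-up HTML snippet. It assumes the href=True filter of find_all, which keeps only the elements that have an href attribute, and the get method of an element for reading attributes.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (made up for this example).
sample_html = """
<p>See <a href="/wiki/Solar_System">Solar System</a> and <a>a link without href</a>.</p>
<img src="/images/comet.jpg" alt="A comet">
"""
soup = BeautifulSoup(sample_html, 'html.parser')

# href=True keeps only the <a> elements that actually carry an href attribute.
links = [a['href'] for a in soup.find_all('a', href=True)]

# get() returns None instead of raising an error when an attribute is missing.
images = [(img.get('src'), img.get('alt')) for img in soup.find_all('img')]

print(links)   # ['/wiki/Solar_System']
print(images)  # [('/images/comet.jpg', 'A comet')]
```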