Categories
DATA MINING DATA SCIENCE DATA VISUALIZATION PROGRAMMING PYTHON TUTORIALS

Introduction To Matplotlib – Data Visualization In Python

This article will give you an introduction to Matplotlib Data Visualization In Python. Matplotlib is a data visualization library package written specifically for us to be used with Python. So Matplotlib is usually the preferred Python package to visualize data while working on Machine Learning & Data Science. It helps us in visualizing the data by representing them with the help of plots and charts.

Brief Introduction To Matplotlib – Data Visualization In Python

We humans are all highly responsive to images than text messages. Images helps us in better visualizing and understanding a situation over interpreting any raw data. So we always wanted a way to represent data through images. If you look at our history, we have always tried to accomplish this in many ways. While I cant go back in time to explain each and every approach, I can quicky give you an historical introduction to Matplotlib. This should help you understand how this package came to be. Why it matters a lot in Python data visualization.

Historical Introduction To Matplotlib – Data Visualization

In the early days of computer data analysis, data scientists often relied on tools like gnuplot and MATLAB to visualize data. However, the problem here was that they had to do it in two stages. First use programming languages like C or Python scripts to process the data. Then plot the resulting data output using gnuplot or MATLAB.

It was a very cumbersome process to say the least. It also resulted in erroneous calculation at times due to lengthy process. As a result, the scientists were in dire need of a simpler solution to this. This is when Matplotlib – Data Visualization package in Python was born.

Matplotlib Official Logo
Matplotlib Official Logo

This helped scientists to both process the data using Python scripts and also visualize the resulting output using Matplotlib package. Now since the Matplotlib was developed along the lines of MATLAB, it is supporting all the functionalities of MATLAB. Because of this reason, it got embraced by the data scientists over MATLAB pretty quickly.

Now you may be wondering why you should be using Matplotlib over MATLAB. For you to understand and appreciate Matplotlib package, you need to understand some of the benefits it brings over a tool like MATLAB. Discussing the advantages and disadvantages of this library in an article that gives an introduction to Matplotlib is appropirate I believe. This will help you in making appropriate decision while chosing this Python library package.

Advantages Of Matplotlib

  • Matplotlib Is Open Source – One of the primary advantage of Matplotlib is that it is an open source package. Because of this, you can use it in whatever way you want. You don’t need to pay any money for this tool to anyone. You can also use it for both academic and commerical purposes.
  • Written In Python – Yet another benefit of Matplotlib is that it is all written in Python. As you use Python programming language to do data processing, plotting its result in Python again makes it so much more easier.
  • Customizable & Extensible – As its written in Python, you will also be able to customize the package (if required) to suit your requirements. In addition to this, you can always extend its functionalities and contribute it back to the open source community. Since Python also has other useful packages, you can also make use of those packages’ functionalities to extend Matplotlib.
  • Portable & Cross Platform – Since its written in Python it is easily portable to any system which can run Python. It also works smoothly on Windows, Linux & Mac OS.
  • Easy to learn – Because the Python language is much easier to learn, any packages written using this language becomes so much more easier. You will not find Matplotlib any different either when it comes to this.

Matplotlib Output formats

When we plot any data using Matplotlib, we can get the resulting output plots in two different ways.

The first method is to get the resulting output plots in a new window. This is useful if you want an interactive data output. In this case, your output will continue to display for as long as your program is running.

The second method of output plots you can obtain is by saving them permanently on your computers. In this method, your resulting output will be saved in a file in standard image formats such as PNG, JPG, PDFs etc. This will be very useful to you when you want to share your results with others. You can also make use of this method when you want to generate a lot of output charts or plots programmatically. But in this case, you have one disadvantage. You will not be able to interact with the resulting output like the way you could in the first method. But as I mentioned earlier, if you just want to analyze the results in bulk at a later time, this is still one of the best method to make use of.

So you might be wondering now, what other different formats can you use to save the resulting output. Let me help you right there! So this is the list of file formats supported by Matplotlib that you can use to save your resulting plots:

EPS, JPG, PDF, PNG, PS, SVG

So now you understand the different formats that we can store our outputs in. Next, let me introduce you to another important feature of Matplotlib. It is called Backends and this terminology is something that you should be aware of when working with Matplotlib.

Introduction To Matplotlib: What Are Backends?

Backends In Matplotlib is a feature using which you can either visualize the output live or store it to analyze them in the future.

As I mentioned in the previous section, we can store the resulting plots in either files or view it live in a new interactive window. Backends simply represent this factor. So from the previous paragraphs we can already realize that there are two types of backends in Matplotlib:

  • Hardcopy backend – This is the type of backend where we save the images in a file
  • User Interface backend – In this type of backend, we display the resulting output in an interactive output window.

In order to provide us with these two type of backends, Matplotlib makes use of two sub modules called the renderer and the canvas. Let us try to learn more about them to get a better understanding of Matplotlib backends.

What is a renderer in Matplotlib?

A renderer is a module used by Matplotlib to draw its output plot or the graph. So it is this module that does the actual drawing of our Matplotlib’s output. The standard renderer used by Matplotlib to render its output is called the Anti-Grain Geometry(AGG) renderer.

Now are you wondering what this rendere do? Want to know what is so special about this renderer? This AGG renderer is a high performance renderer. It helps us in getting generating a publication level quality output. It also helps us in obtaining our output with sub pixel accuracy and antialiasing. Is this all sounding too alien of a terminology for you? Then simply know that the AGG renderer helps in getting a high production quality graphs and plots that we can adore about!

Now that we understand about the renderer used by Matplotlib, let us turn our focus towards the second module – canvas

What Is A Canvas In Matplotlib?

After getting an understanding about renderer, its time for us to learn about the other module – canvas of Matplotlib.

We mentioned that the renderer is responsible for the drawing. But do you know where exactly is this drawing being done at? That is where the canvas module comes into picture. Canvas is the area where the renderer will draw our out plots and graphs. Ok, so who provides this canvas then? Great question!

Canvas in Matplotlib is usually provided by the GUI libraries. So this can be GTK if you are using a Linux machine. Or it can be WX if you are on a Mac OS. On the other hand if you are using Windows machines, these could be coming in from Windows GUI libraries. But these are not all. It could also be coming in from platform agnostic interfaces like QT if you are developing your visualization tools in QT.

So basically we get the canvas from the GUI library we intend to use in our Visualization app.

But do you really need to worry about it? Do you really need to know about canvases and renderers when using Matplotlib? Well, for the most cases, not really. You can simply use Matplotlib’s functions and get away with generating your visualizations. You dont really need to know the underlying aspects of how Matplotlib works to use it. But if you want to be a good visual data scientist, knowing how Matplotlib works under the hood will help you in mastering it. It will also help you in truly appreciating its versatility.

Conclusion

So that is it. This should give you a good introduction to Matplotlib – data visualization and why you should use it. In the next set of articles, we will learn how to install the library package. We will learn how to make use of it to draw some interesting plots and graphs. So see you there! Until then, have a great learning experience!

If you are interested in using Matplotlib to add text to any image, here is a quick tutorial link describing how you can do this.

2 replies on “Introduction To Matplotlib – Data Visualization In Python”

Leave a Reply

Your email address will not be published. Required fields are marked *