In this article, we will learn how to draw a scatter plot in Python using Matplotlib. But before we do that, we will look at what a scatter plot is and which options Matplotlib gives us for customizing it. Sounds good? Great! Then let us start!
What Is A Scatter Plot?
A scatter plot is a type of plot that displays values from two sets of data. We take two data sets of the same length and pair them up element by element. Each pair then becomes one point on the chart: the first value gives its position along the X axis and the second its position along the Y axis. It is very important to remember that both data sets have to be of the same length!
There is one more thing to note here. A scatter plot draws only points and no lines, so the points are not connected to each other. They are simply scattered across the chart. Hence the name scatter plot!
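To make the pairing idea concrete, here is a minimal sketch. The small hand-written lists and the variable names (pairs, mismatch_error) are made up for illustration and are not part of the article's examples. It shows how the two data sets pair up, and what Matplotlib does when their lengths differ:

```python
import matplotlib.pyplot as plt

# Two small, hand-written data sets of the same length (illustrative values)
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]

# Each (x[i], y[i]) pair becomes one point on the chart
pairs = list(zip(x, y))
print(pairs)

points = plt.scatter(x, y)

# Data sets of different lengths are rejected with a ValueError
try:
    plt.scatter([1, 2, 3], [10, 20])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

print(mismatch_error)
```

Add a plt.show( ) call at the end if you want to see the four points on screen.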
What Is The Use Of A Scatter Plot?
We can use a scatter plot to see whether there is any correlation between two data sets. When the two sets are related, the points form a visible pattern, such as a line, a curve or a cluster, instead of being spread evenly across the chart. This can be a very valuable insight for us, especially when the relationship between the two data sets is non-linear and hard to capture in a single number. In short, a scatter plot helps us find relationships between data points.
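As a rough illustration of this, the sketch below compares a data set that depends on x with one that does not. The seed, the sample size of 200, the noise level and all variable names are arbitrary assumptions chosen for the example:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed so the result is repeatable

x = rng.normal(size=200)
noise = rng.normal(scale=0.5, size=200)

y_related = 2 * x + noise            # depends on x, so the points follow a line
y_unrelated = rng.normal(size=200)   # drawn independently of x

# The Pearson correlation coefficient puts a number on what the plots show
r_related = np.corrcoef(x, y_related)[0, 1]
r_unrelated = np.corrcoef(x, y_unrelated)[0, 1]

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 4))
left.scatter(x, y_related)
left.set_title(f"related (r = {r_related:.2f})")
right.scatter(x, y_unrelated)
right.set_title(f"unrelated (r = {r_unrelated:.2f})")
```

In the left chart the points should line up along a diagonal, while the right chart should look like a shapeless cloud. Call plt.show( ) afterwards to display both.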
Now that we know what a scatter plot is used for, let us see how to draw one. To do so, we need two sets of data. So where do we get them?
How To Plot A Scatter Plot In Python
Here is what we will do. We will use the randn( ) function from NumPy's random module, which returns samples drawn from the standard normal distribution. So check this code:
    import matplotlib.pyplot as plt
    import numpy as np
    x = np.random.randn(100)
    y = np.random.randn(100)
    plt.scatter(x, y)
    plt.show()
This code generates a scatter plot from two sets of random data. But what is going on here? Let us go through the code line by line:
In lines 1 and 2, we import the Matplotlib and NumPy libraries.
But what is going on in lines 3 and 4? Well, as I said earlier, we need two sets of data for our scatter plot. So we use NumPy's randn( ) to generate 100 random values for each set and assign them to the variables x and y.
Next, in line 5, we call the plt module's scatter( ) function and pass x and y to it. This is the function that draws our scatter plot from the x and y data!
Finally, we call the usual plt.show( ) function to display the resulting scatter plot. Here is what it looks like:
As you can see, 100 points are plotted, with the values from x along the X axis and the values from y along the Y axis. Because both data sets are random and independent of each other, the points form a shapeless cloud that is densest around (0, 0). If the two data sets were related, we would see a pattern here instead!
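Since randn( ) returns different numbers on every run, your chart will not match anyone else's exactly. The sketch below is one way to get the same chart every time and to label it. The seed value 42 and the label texts are arbitrary choices made for this example:

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)  # arbitrary seed (an assumption) so every run draws the same points

x = np.random.randn(100)
y = np.random.randn(100)

# scatter( ) returns a collection object that holds the plotted points
points = plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("100 random points")

# One row per point, with the x and y coordinates as columns
offsets = points.get_offsets()
print(offsets.shape)  # (100, 2)
```

As before, plt.show( ) displays the chart. The object returned by scatter( ) is handy later on, for example when adding a colorbar.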
How To Change The Size, Color And Type Of A Marker
The above chart is all well and good, but is there a way to control the size, color and type of the markers in it? Well, lucky for us, Matplotlib does give us an option for this. We need to set the s, c and marker parameters of the plt.scatter( ) function! Here s sets the marker size, c sets the marker color and marker sets the marker shape. Both s and c accept either a single value for all points or one value per point.
Here is a simple example that changes the size, color and shape of our plot markers:
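    import matplotlib.pyplot as plt
    import numpy as np

    x = np.random.randn(100)
    y = np.random.randn(100)
    # Marker sizes must not be negative, so we use rand( ), which returns values in [0, 1)
    size = 100 * np.random.rand(100)
    colors = np.random.rand(100)  # one number per point, mapped to a color
    marker = "^"                  # draw triangles instead of circles

    plt.scatter(x, y, s=size, c=colors, marker=marker)
    plt.show()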
And here is what our final scatter plot looks like:
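Random sizes and colors make for a lively chart, but s and c are most useful when they carry information. As a sketch of that idea, the example below sizes and colors each point by its distance from the origin. The distance measure, the scaling constants, the seed and the choice of the "viridis" colormap are all assumptions made for this illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed so the result is repeatable

x = rng.normal(size=100)
y = rng.normal(size=100)

# Use each point's distance from the origin as a third value to display
distance = np.sqrt(x**2 + y**2)
size = 20 + 60 * distance  # always positive, so it is a valid marker size

plt.figure()
sc = plt.scatter(x, y, s=size, c=distance, cmap="viridis", marker="o", alpha=0.6)

# The colorbar explains which color stands for which distance
plt.colorbar(sc, label="distance from origin")
```

Points far from the center should come out larger and in a different color than points near it. The alpha value makes the markers partly transparent so that overlapping points stay visible. Call plt.show( ) to display the chart.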
So there you have it! This is how we draw a scatter plot in Python. I hope this was easy enough for you to follow. But if you have any questions, do let me know in the comments below. I will be more than happy to help!
See you next time. Take care! 🙂