Plotting Data Visualisation on the Map of India using GeoPandas in Python
For data scientists, data visualization is an important step in surfacing useful insights. Bar charts, line graphs, and scatter plots are useful, but maps can also help us understand our data better. In this blog, I will share my experience of plotting a map of India using GeoPandas.
What is GeoPandas?
GeoPandas is an open-source project that makes working with geospatial data in Python easier. In my opinion, GeoPandas is one of the most satisfying Python packages to use because it produces a tangible, visible output that is directly linked to the real world. Here we will explore how to create a geo map and visualize data over it, using shapefiles (.shp) and some other Python libraries.
I strongly encourage you to look at the official documentation, to see all the cool things GeoPandas is capable of. Hopefully, you find this tutorial helpful and exciting! All of the relevant data and notebooks can be found on my GitHub page here.
Let’s begin…
Step 1 : Installing GeoPandas and Shapely
These instructions assume that you have Anaconda Navigator installed on your machine and use either Jupyter Notebook or the Spyder IDE for Python development. We will need to install the GeoPandas and Shapely libraries in order to plot a map, and these libraries do not come with the Anaconda download.
There are two ways to install GeoPandas. The recommended way, as per the documentation (available at http://geopandas.org), is through the conda-forge channel with the following command from your terminal:
conda install -c conda-forge geopandas
or alternatively
pip install geopandas
Be sure to add an ! before the command if you are pip installing from a Jupyter notebook or the Spyder IDE.
If the above does not work out for you then check out the video below.
Shapely can be installed with the command pip install shapely. Documentation for Shapely can be found at https://pypi.org/project/Shapely/
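Before moving on, it can help to confirm that both packages are visible to the Python interpreter you are actually using. The snippet below is a minimal sketch that relies only on the standard library; check_packages is a hypothetical helper name, not part of GeoPandas or Shapely.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report whether the two libraries needed for this tutorial are importable
    for name, found in check_packages(["geopandas", "shapely"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If either package is reported as missing, the install command was probably run in a different environment from the one your notebook or IDE uses.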
Step 2 : Importing the libraries
The next step is to import your libraries. We will need the following, all of which can be installed from the command line with “pip install <package_name>” (note that the shapefile module is provided by the pyshp package).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import shapefile as shp
from shapely.geometry import Point
sns.set_style('whitegrid')
Step 3 : Download the mapping data
To download the shapefile click here. Download the file and unzip it. Keep all of the files in the same folder. We’ll be using the shapefile (.shp) to map, but all files need to remain in the folder in order for it to work properly.
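Since a shapefile is really a set of files that share a base name, a quick check for the companion files can save a confusing error later. The sketch below assumes the three mandatory parts (.shp, .shx, .dbf) should sit side by side; missing_shapefile_parts is a hypothetical helper, and the path in the usage line is only an example.

```python
from pathlib import Path

# The three mandatory components of a shapefile
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not base.with_suffix(ext).exists()]


if __name__ == "__main__":
    # Example path only; point this at wherever you unzipped the download
    print(missing_shapefile_parts("Maps_with_python/india-polygon.shp"))
```

An empty list means all three files were found in the folder.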
Step 4 : Load the data
Load the data into a GeoDataFrame as shown below. Note that your file path may be different.
fp = r'Maps_with_python\india-polygon.shp'
map_df = gpd.read_file(fp)
map_df_copy = gpd.read_file(fp)
map_df.head()
Step 5 : Plotting the Shapefiles
One nice thing about loading the shapefile into a GeoDataFrame is that we’ve already got enough info to make a basic plot. Try executing map_df.plot()
Step 6 : Adding better data insights into the map
Now we will plot a map of only landslides that have happened within India. Click here to find the dataset and data processing part.
df = pd.read_csv('globallandslides.csv')
pd.set_option('display.max_columns', None)
# Keep only the events recorded in India
df = df[df.country_name == "India"].copy()
df["Year"] = pd.to_datetime(df["event_date"]).dt.year
# Keep only the events categorised as landslides
ls_df = df[df.landslide_category == "landslide"].copy()
# Replace state names that contain diacritics with plain spellings
ls_df["admin_division_name"] = ls_df["admin_division_name"].replace({
    "Nāgāland": "Nagaland",
    "Meghālaya": "Meghalaya",
    "Tamil Nādu": "Tamil Nadu",
    "Karnātaka": "Karnataka",
    "Gujarāt": "Gujarat",
    "Arunāchal Pradesh": "Arunachal Pradesh",
})
# Count the landslides per state
state_df = ls_df["admin_division_name"].value_counts()
state_df = state_df.to_frame()
state_df.reset_index(level=0, inplace=True)
state_df.columns = ['State', 'Count']
# Manual corrections specific to this dataset (row labels depend on the value_counts ordering)
state_df.at[15, "Count"] = 69
state_df.at[0, "State"] = "Jammu and Kashmir"
state_df.at[20, "State"] = "Delhi"
state_df = state_df.drop(7)
Now you have the number of landslides that have happened in each state over the years. We will now plot this data on our previous India map as a choropleth map.
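As an aside, the state-name replacements above were written out one by one. A more general option, sketched here with only the standard library, is to strip the diacritics programmatically. strip_diacritics is a hypothetical helper, and this approach assumes the only difference between the two spellings is the accent marks.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same state names that were replaced by hand in the cleaning step
names = ["Nāgāland", "Meghālaya", "Tamil Nādu", "Karnātaka", "Gujarāt", "Arunāchal Pradesh"]
cleaned = [strip_diacritics(name) for name in names]
print(cleaned)
```

On the real data, the same function could be applied to the whole column with ls_df["admin_division_name"].map(strip_diacritics).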
Step 7 : Merge the data
We can now merge the state data above, which contains the landslide counts, with the map_df GeoDataFrame. We will join on the state-name column of each data frame: st_nm in map_df and State in state_df.
#Merging the data
merged = map_df.set_index('st_nm').join(state_df.set_index('State'))
merged['Count'] = merged['Count'].replace(np.nan, 0)
merged.head()
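A join like this matches on exact spelling, so a state whose name differs between the two tables silently ends up with a count of 0. The sketch below uses two small made-up tables (the names and counts are illustrative, not taken from the landslide dataset) to show one way of spotting such mismatches before filling the gaps.

```python
import pandas as pd

# Toy stand-ins for map_df and state_df; values are made up for illustration
toy_map = pd.DataFrame({"st_nm": ["Kerala", "Goa", "Sikkim"]})
toy_counts = pd.DataFrame({"State": ["Kerala", "Sikkim", "Sikim"], "Count": [12, 7, 1]})

# Same join pattern as in the tutorial: map table on the left, counts on the right
joined = toy_map.set_index("st_nm").join(toy_counts.set_index("State"))

# States on the map that received no count (NaN after the join)
no_data = joined.index[joined["Count"].isna()].tolist()

# Names in the counts table that never matched a state on the map
unmatched = sorted(set(toy_counts["State"]) - set(toy_map["st_nm"]))

print("No data:", no_data)
print("Unmatched names:", unmatched)
```

Running the same two checks on map_df['st_nm'] and state_df['State'] would show which state names still need cleaning.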
Step 8 : Plotting the data on the Shapefile
#Create figure and axes for Matplotlib and set the title
fig, ax = plt.subplots(1, figsize=(10, 10))
ax.axis('off')
ax.set_title('Number of landslides in India state-wise', fontdict={'fontsize': '20', 'fontweight' : '10'})
# Plot the figure
merged.plot(column='Count', cmap='YlOrRd', linewidth=0.8, ax=ax, edgecolor='0', legend=True, legend_kwds={'label': "Number of landslides"})
We can also plot the latitudes and longitudes of the landslides on the map, along with the names of the states, as shown below. For this, you will need to find the shapefile data for Indian states and their latitudes and longitudes. You can find them here. You also have to plot the data points on the map. The process is very similar to the one explained above but needs further data pre-processing. You can check out the entire implementation in my ipynb file on GitHub.
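As a rough idea of the extra pre-processing involved, each latitude/longitude pair has to be turned into a Shapely Point before it can be drawn. The sketch below uses hypothetical coordinates rather than rows from the landslide dataset, and is not the implementation from the notebook.

```python
from shapely.geometry import Point

# Hypothetical (latitude, longitude) pairs, for illustration only
coords = [(28.61, 77.21), (19.08, 72.88)]

# Shapely points take x (longitude) first, then y (latitude)
points = [Point(lon, lat) for lat, lon in coords]

for point in points:
    print(point.x, point.y)
```

A list like this could then be passed as the geometry argument of gpd.GeoDataFrame and drawn with .plot(ax=ax) on the same axes as the choropleth.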
Happy Plotting!!