How to Produce 8 Meaningful EDA Visualizations in Under an Hour Using World Bank GDP Data

Exploratory Data Analysis (EDA) is one of the fastest ways to understand economic data.

With just Pandas, Matplotlib, and Seaborn, you can produce high-quality insights from World Bank GDP datasets in under an hour.

In this tutorial, we will use World Bank GDP data to create 8 meaningful visualizations commonly used in:

  • economics,

  • business intelligence,

  • public policy,

  • and data science.

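You can copy and paste all the code. It is written for Google Colab: the upload step uses google.colab.files, which only works inside a Colab notebook.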

Upload and Load the Dataset

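We will assume you downloaded the GDP data as a CSV file from the World Bank Data Catalog. These files begin with four lines of metadata before the column headers, which is why the code below skips them.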

import pandas as pd
from google.colab import files

# Upload CSV file
uploaded = files.upload()

# Get uploaded file name
file_name = list(uploaded.keys())[0]

# Read the CSV file, skipping the initial metadata rows
df = pd.read_csv(file_name, skiprows=4)

# Preview the data
print(df.head())
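If you are working outside Google Colab, or want to tidy the frame right after loading, a small helper can stand in for the upload step. This is a minimal sketch, assuming the standard World Bank CSV layout: four metadata lines, then a header with Country Name, Country Code, Indicator Name, Indicator Code and one column per year. These files typically end every row with a trailing comma, which pandas reads as an extra empty column whose name starts with 'Unnamed'; the helper drops it. Both function names are hypothetical.

```python
import pandas as pd


def load_world_bank_csv(path_or_buffer):
    """Read a World Bank indicator CSV (hypothetical helper).

    Assumes the file starts with four metadata lines before the header row.
    """
    df = pd.read_csv(path_or_buffer, skiprows=4)
    # Drop the empty column created by the trailing comma on each row
    df = df.loc[:, ~df.columns.str.startswith('Unnamed')]
    return df


def year_columns(df):
    """Return the column names that look like years, such as '1960' or '2022'."""
    return [col for col in df.columns if col.isdigit()]
```

For example, df = load_world_bank_csv('gdp.csv'), where 'gdp.csv' is a placeholder for your own file path.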



Import Visualization Libraries

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")



1. Histogram — Understand GDP Distribution

Histograms reveal how GDP values are distributed globally.

plt.figure(figsize=(8,5))

sns.histplot(df['2022'], bins=30)  # GDP values for the year 2022

plt.title('Distribution of GDP (2022)')
plt.xlabel('GDP (USD)')
plt.show()



This quickly reveals:

  • skewness,

  • outliers,

  • and economic concentration.

Cross-country GDP data is usually heavily right-skewed: a few very large economies sit far to the right of the many small ones.
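Because of that skew, most countries end up squeezed into the first one or two bins. One common remedy, not used in the original code, is to plot the base-10 logarithm of GDP so that each step on the axis marks a tenfold difference. A minimal sketch, assuming df is the frame loaded above and that missing or non-positive values can simply be dropped; the helper names are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def log10_gdp(values):
    """Return log10 of the positive, non-missing values in a Series."""
    values = pd.to_numeric(values, errors='coerce').dropna()
    return np.log10(values[values > 0])


def plot_log_histogram(df, year='2022', bins=30):
    """Histogram of log10(GDP) for one year column; returns the Axes."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(log10_gdp(df[year]), bins=bins)
    ax.set_title(f'Distribution of log10 GDP ({year})')
    ax.set_xlabel('log10 of GDP (USD)')  # 9 = 1 billion, 12 = 1 trillion
    ax.set_ylabel('Count')
    return ax
```

Call plot_log_histogram(df) and then plt.show(). If you prefer to stay in Seaborn, sns.histplot(df['2022'], bins=30, log_scale=True) gives a similar chart with a logarithmic x-axis.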


2. Bar Chart — Top 10 Largest Economies

Bar charts are excellent for rankings.

top10 = df.sort_values('2022', ascending=False).head(10)

plt.figure(figsize=(10,6))

sns.barplot(
    data=top10,
    x='2022',
    y='Country Name'
)

plt.title('Top 10 Economies by GDP (2022)')
plt.xlabel('GDP (USD)')
plt.ylabel('Country')

plt.show()






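This helps identify dominant economies instantly. Note that the World Bank file also includes aggregate rows such as "World" and income or regional groups, and these will top the ranking unless you filter them out.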
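World Bank downloads usually come with a companion country-metadata file (its name starts with Metadata_Country_) whose Region column is empty for aggregate rows. A minimal sketch of keeping only individual economies, assuming you have loaded that file into a frame called meta with Country Code and Region columns; the helper names drop_aggregates and top_n are hypothetical.

```python
import pandas as pd


def drop_aggregates(df, meta):
    """Keep only rows whose Country Code has a non-empty Region in the metadata.

    Assumes `meta` is the World Bank country metadata table, where aggregates
    (World, income groups, regions) have no Region value.
    """
    countries = meta.loc[meta['Region'].notna(), 'Country Code']
    return df[df['Country Code'].isin(countries)]


def top_n(df, year='2022', n=10):
    """Return the n largest economies for a given year column."""
    return df.sort_values(year, ascending=False).head(n)
```

After loading the metadata file with pd.read_csv (it usually has no metadata lines to skip), top10 = top_n(drop_aggregates(df, meta)) gives a ranking you can pass to the same sns.barplot call.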

3. Line Chart — GDP Growth Over Time

Line charts are essential for time-series analysis.

# Year columns are named after the year, e.g. '1960'; keep those from 1960 to 2025
year_cols = [col for col in df.columns if col.isdigit() and 1960 <= int(col) <= 2025]

# Melt the DataFrame to have years as a column
df_melted = df.melt(id_vars=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'],
                    value_vars=year_cols,
                    var_name='Year',
                    value_name='GDP')

# Convert 'Year' column to numeric
df_melted['Year'] = pd.to_numeric(df_melted['Year'])

# Keep only Kenya's rows
kenya = df_melted[df_melted['Country Name'] == 'Kenya']

plt.figure(figsize=(10,5))

sns.lineplot(
    data=kenya,
    x='Year', # Use the new 'Year' column
    y='GDP'   # Use the new 'GDP' column
)

plt.title('Kenya GDP Over Time')
plt.xlabel('Year')
plt.ylabel('GDP (current US$)')

plt.show()



Perfect for:

  • economic growth analysis,

  • forecasting,

  • and trend detection.
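The chart above plots GDP levels rather than growth rates. If you want the annual growth rate itself, you can compute the year-on-year percentage change from the melted frame. A sketch, assuming df_melted from the code above; if the values are in current US dollars, as the axis label suggests, this is nominal dollar growth that mixes real growth with inflation and exchange-rate movements. The function name annual_growth is hypothetical.

```python
import pandas as pd


def annual_growth(df_melted, country):
    """Year-on-year % change in GDP for one country, indexed by year."""
    series = (
        df_melted[df_melted['Country Name'] == country]
        .sort_values('Year')
        .set_index('Year')['GDP']
    )
    # Compare each year with the previous one; the first year has no prior value (NaN)
    return (series / series.shift(1) - 1) * 100
```

For example, annual_growth(df_melted, 'Kenya').plot() draws the growth series as a line, and adding plt.axhline(0) makes years of contraction easy to spot.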


4. Box Plot — Detect Economic Outliers

Box plots expose extreme GDP values.

plt.figure(figsize=(8,5))

sns.boxplot(x=df['2022'])

plt.title('GDP Outliers (2022)')
plt.xlabel('GDP (USD)')

plt.show()



You will immediately see:

  • ultra-large economies,

  • global inequality,

  • and distribution spread.
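By default, Seaborn's box plot treats points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles as outliers and draws them individually. To see which rows those points belong to, you can apply the same 1.5 IQR rule directly. A sketch, assuming df from above; the helper name iqr_outliers is hypothetical.

```python
import pandas as pd


def iqr_outliers(df, year='2022', whis=1.5):
    """Return rows whose value lies more than whis * IQR beyond the quartiles."""
    values = df[year]
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    # NaN comparisons are False, so missing values are never flagged
    mask = (values < lower) | (values > upper)
    return df.loc[mask, ['Country Name', year]].sort_values(year, ascending=False)
```

print(iqr_outliers(df)) lists the outlying rows. Expect aggregate rows such as World near the top unless you have filtered them out.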


5. Scatter Plot: GDP in 2021 vs 2022

Scatter plots show relationships between variables.

plt.figure(figsize=(8,6))

sns.scatterplot(
    data=df,
    x='2021',
    y='2022'
)

plt.title('GDP: 2021 vs 2022')
plt.xlabel('GDP (USD), 2021')
plt.ylabel('GDP (USD), 2022')

plt.show()



The scatter plot comparing GDP in 2021 and 2022 reveals:

  1. Economic Consistency: Most countries keep similar GDP levels from one year to the next, so the points form a tight diagonal band.
  2. Growth or Decline: Countries above the diagonal (where the 2022 value equals the 2021 value) saw GDP rise from 2021 to 2022, while those below saw it fall. If the values are in current US dollars, these moves also reflect inflation and exchange rates, not only real growth.
  3. Significant Shifts: Points far from the diagonal highlight countries whose economies changed substantially, through rapid growth or a sharp downturn, within that single year.
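The plotting code in this section does not draw the diagonal these observations refer to. A sketch that adds a y = x reference line and counts how many countries sit above and below it, assuming df from above. The log scales are an optional choice to spread out the many small economies (pass log=False to match the original chart), and the function name is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt


def scatter_with_diagonal(df, x='2021', y='2022', label='GDP (USD)', log=True):
    """Scatter one year against another with a y = x reference line.

    Returns the Axes and the number of rows above and below the line.
    """
    pair = df[[x, y]].dropna()
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(pair[x], pair[y], s=12)

    # y = x across the observed range: points above it grew, points below it shrank
    lo = min(pair[x].min(), pair[y].min())
    hi = max(pair[x].max(), pair[y].max())
    ax.plot([lo, hi], [lo, hi], color='grey', linestyle='--')

    if log:
        ax.set_xscale('log')
        ax.set_yscale('log')
    ax.set_xlabel(f'{label}, {x}')
    ax.set_ylabel(f'{label}, {y}')

    above = int((pair[y] > pair[x]).sum())
    below = int((pair[y] < pair[x]).sum())
    return ax, above, below
```

ax, above, below = scatter_with_diagonal(df) followed by plt.show() draws the chart, while above and below count the countries whose GDP rose or fell.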


6. Heatmap — Correlation Analysis

Correlation heatmaps help identify relationships across metrics.

selected_years = ['2000', '2010', '2020', '2021', '2022']
corr = df[selected_years].corr()

plt.figure(figsize=(8,6))

sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")

plt.title('GDP Correlations Across Selected Years')
plt.show()


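Correlating a single indicator with itself across years is not an ideal use of this technique, since a country's GDP in one year is almost always strongly correlated with its GDP in nearby years, but it illustrates how a correlation heatmap works.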

Useful for:

  • feature selection,

  • model preparation,

  • and economic research.
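DataFrame.corr() computes Pearson correlation by default, which is sensitive to the few very large values in skewed GDP data. Passing method='spearman' correlates ranks instead, so it asks whether countries keep their ordering between years. A sketch comparing the two, assuming df and selected_years from above; the helper name is hypothetical.

```python
import pandas as pd


def compare_correlations(df, columns):
    """Return Pearson and Spearman correlation matrices for the given columns."""
    data = df[columns]
    pearson = data.corr(method='pearson')    # linear, driven by magnitudes
    spearman = data.corr(method='spearman')  # rank-based, robust to extreme values
    return pearson, spearman
```

Either matrix can go into the same sns.heatmap call used above. A large gap between the two can indicate that a few extreme values are driving the Pearson result.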


7. Pair Plot — Multi-Variable Exploration

Pair plots automate multiple visual comparisons.

selected_years_for_pairplot = ['2000', '2010', '2020', '2021', '2022']
sns.pairplot(
    df[selected_years_for_pairplot].dropna()
)

plt.suptitle('Pair Plot of GDP Across Selected Years', y=1.02)
plt.show()

This creates:

  • scatter plots,

  • histograms,

  • and variable relationships automatically.



Extremely useful during rapid EDA.
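Two details are worth checking before relying on the pair plot. The .dropna() call keeps only rows with values in every selected year, so countries with gaps silently disappear, and the raw values are so skewed that most points pile up in one corner. A sketch that reports how many rows are dropped and log-transforms the rest, assuming df from above; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def pairplot_frame(df, columns):
    """Complete-case, log10-transformed frame for a pair plot.

    Returns the transformed frame and the number of rows that were dropped.
    """
    subset = df[columns]
    complete = subset.dropna()
    complete = complete[(complete > 0).all(axis=1)]  # log10 needs positive values
    dropped = len(subset) - len(complete)
    return np.log10(complete), dropped
```

For example, log_frame, dropped = pairplot_frame(df, selected_years_for_pairplot), then sns.pairplot(log_frame). Printing dropped shows how many rows were excluded.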


8. Violin Plot — Distribution by Region

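Violin plots combine a box plot's summary statistics with a density curve that shows the full shape of the distribution. The code below plots all rows together: the main World Bank GDP file has no region column, so a regional breakdown needs the country metadata that usually comes with the download.

plt.figure(figsize=(10,6))

sns.violinplot(
    data=df,
    y='2022'
)

plt.title('GDP Distribution (2022)')
plt.ylabel('GDP (USD)')

plt.show()



This reveals:

  • inequality between countries,

  • spread,

  • and concentration patterns.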
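To compare distributions across regions, each country needs a region label. World Bank downloads usually include a country-metadata file (its name starts with Metadata_Country_) whose Region column is empty for aggregates. Assuming you have loaded it as meta with Country Code and Region columns, a sketch like the following attaches the region and draws one violin per region on a log scale; both function names are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def add_region(df, meta):
    """Attach Region from the metadata; aggregates (no Region) are dropped."""
    merged = df.merge(meta[['Country Code', 'Region']], on='Country Code', how='inner')
    return merged[merged['Region'].notna()]


def violin_by_region(df, meta, year='2022'):
    """One violin of log10(GDP) per region; returns the Axes."""
    data = add_region(df, meta)
    data = data[data[year] > 0]  # also drops missing values
    regions = sorted(data['Region'].unique())
    groups = [np.log10(data.loc[data['Region'] == r, year]) for r in regions]

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.violinplot(groups, showmedians=True)
    ax.set_xticks(range(1, len(regions) + 1))
    ax.set_xticklabels(regions, rotation=30, ha='right')
    ax.set_ylabel(f'log10 of GDP (USD), {year}')
    ax.set_title(f'GDP Distribution by Region ({year})')
    return ax
```

Load the metadata with pd.read_csv, then call violin_by_region(df, meta) and plt.show(). The log scale is a design choice so that small and large economies fit on one axis.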


Why These 8 Visualizations Matter

Together, these charts help answer critical questions:

Visualization    Main Purpose
Histogram        Distribution analysis
Bar Chart        Rankings
Line Chart       Trends over time
Box Plot         Outlier detection
Scatter Plot     Relationships
Heatmap          Correlation analysis
Pair Plot        Multi-variable EDA
Violin Plot      Distribution comparison


These are foundational visualizations used in:

  • data science,

  • machine learning,

  • BI dashboards,

  • and economic analytics.


You do not need advanced tools to produce meaningful EDA quickly. With:

  • Pandas,

  • Matplotlib,

  • and Seaborn,

you can generate professional-grade economic insights in less than an hour.

World Bank GDP data is especially powerful because it contains:

  • long-term trends,

  • global comparisons,

  • economic inequality patterns,

  • and strong statistical relationships.

Mastering these 8 visualizations gives you a strong foundation for deeper analytics, forecasting, and machine learning workflows.



