How to Predict GDP Growth from Socioeconomic Indicators Using World Bank Data

Economic growth forecasting is one of the most valuable applications of machine learning in economics, finance, and public policy.

Governments, investors, development organizations, and businesses all rely on GDP growth forecasts to make strategic decisions.

In this tutorial, you will learn how to build a machine learning model that predicts GDP growth using socioeconomic indicators from the World Bank Open Data Platform.

We will use indicators such as:

  • Inflation

  • Unemployment

  • Population growth

  • Exports

  • Education enrollment

  • Foreign direct investment (FDI)

  • Internet penetration

The target variable will be annual GDP growth, expressed as a percentage.

The World Bank provides over 16,000 indicators across hundreds of countries through its data platform and API. (World Bank Data Help Desk)


Why GDP Growth Prediction Matters

GDP growth measures how fast an economy expands or contracts over time.

The World Bank defines GDP growth as the annual percentage growth rate of GDP at market prices based on constant local currency. (World Bank Open Data)

Economists use GDP growth forecasts to:

  • Evaluate economic stability

  • Predict recessions

  • Analyze investment opportunities

  • Compare country performance

  • Study the impact of policies

Machine learning helps identify nonlinear relationships between socioeconomic variables and economic performance.


Step 1: Install Required Libraries

pip install pandas numpy scikit-learn matplotlib seaborn wbdata

We will use:

  • pandas for data manipulation

  • scikit-learn for machine learning

  • wbdata to pull World Bank indicators



Step 2: Import Libraries

import pandas as pd
import numpy as np
import wbdata
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score



Step 3: Select World Bank Indicators

We will predict GDP growth using the socioeconomic indicators below. FDI, mentioned in the introduction, is not part of the code in this tutorial.

Indicator            World Bank Code
GDP Growth           NY.GDP.MKTP.KD.ZG
Inflation            FP.CPI.TOTL.ZG
Unemployment         SL.UEM.TOTL.ZS
Population Growth    SP.POP.GROW
Internet Users       IT.NET.USER.ZS
Exports (% GDP)      NE.EXP.GNFS.ZS
School Enrollment    SE.SEC.ENRR

The GDP growth indicator is officially available through the World Bank database. (World Bank Open Data)


Step 4: Download Data from the World Bank

indicators = {
    'NY.GDP.MKTP.KD.ZG': 'gdp_growth',
    'FP.CPI.TOTL.ZG': 'inflation',
    'SL.UEM.TOTL.ZS': 'unemployment',
    'SP.POP.GROW': 'population_growth',
    'IT.NET.USER.ZS': 'internet_users',
    'NE.EXP.GNFS.ZS': 'exports',
    'SE.SEC.ENRR': 'school_enrollment'
}

data = wbdata.get_dataframe(indicators)

df = data.reset_index()

print(df.head())


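This creates a dataset with one row per country and year and one column per indicator. Because no country filter is passed, the result also includes World Bank aggregates such as regions and income groups, which you may want to filter out before modeling.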
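Before cleaning, it often helps to turn the year labels into numbers so you can filter and sort by time. The sketch below uses a small made-up frame shaped like the one above, and it assumes the date column holds year labels as strings such as '2021'. Check df.dtypes to confirm this for your wbdata version. The name sample and the 2000 cutoff are illustrative choices, not part of the original tutorial.

```python
import pandas as pd

# Hypothetical miniature of the frame after reset_index(); values are made up.
sample = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Ghana", "Ghana"],
    "date": ["2021", "1995", "2021", "2005"],
    "gdp_growth": [7.6, 4.4, 5.1, 5.9],
    "inflation": [6.1, 1.6, 10.0, 15.1],
})

# Convert the year labels (assumed to be strings) to integers.
sample["year"] = sample["date"].astype(int)

# Keep years from 2000 onward and order rows by country, then year.
recent = (
    sample[sample["year"] >= 2000]
    .sort_values(["country", "year"])
    .reset_index(drop=True)
)

print(recent)
```

If you add a numeric year column like this, remember that it will pass through select_dtypes in Step 6. Drop it there unless you want the year itself as a feature.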


Step 5: Clean the Dataset

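World Bank datasets often contain missing values. The simplest fix is to drop every row that has a gap:

df = df.dropna()

Alternatively, use imputation instead of dropping rows. Drop only the rows that lack the target, then fill the remaining feature gaps with column means:

df = df.dropna(subset=['gdp_growth'])
df = df.fillna(df.mean(numeric_only=True))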

The World Bank regularly updates its datasets and methodologies, so check coverage and clean the data again each time you download it. (Data Topics)
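To choose between dropping and imputing, it helps to see how much is missing first. The following is a minimal sketch on made-up data, not the tutorial's actual download. It reports the share of missing values per column, then fills feature gaps with each country's own median instead of a global mean. The frame panel and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up panel with gaps, shaped like the frame from Step 4.
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": ["2019", "2020", "2021", "2019", "2020", "2021"],
    "gdp_growth": [2.0, -3.0, 4.0, 5.0, np.nan, 6.0],
    "inflation": [1.0, np.nan, 3.0, 10.0, 12.0, np.nan],
})

# Share of missing values per column (0.0 means complete).
missing_share = panel.isna().mean()
print(missing_share)

# Rows without a target value cannot be used for supervised training.
panel = panel.dropna(subset=["gdp_growth"]).copy()

# Fill feature gaps with the median of the same country's remaining rows.
panel["inflation"] = panel.groupby("country")["inflation"].transform(
    lambda s: s.fillna(s.median())
)
print(panel)
```

Per-country filling keeps a high-inflation economy from being pulled toward the global average. A country with no observed values at all for a feature would still be left with gaps.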


Step 6: Define Features and Target

Our target variable is GDP growth.

X = df.drop(columns=['gdp_growth'])

# Remove non-numeric columns
X = X.select_dtypes(include=np.number)

y = df['gdp_growth']



Step 7: Split the Data

We separate training and testing data.

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)


A typical split is:

  • 80% training

  • 20% testing
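A random split mixes years, so the model can be tested on 2015 after training on 2016. If you want the test set to mimic real forecasting, one alternative is to split by time. The sketch below assumes a numeric year column and uses made-up values. The cutoff of 2018 and the names train, test, and cutoff are hypothetical.

```python
import pandas as pd

# Made-up panel: two countries, six years each.
years = list(range(2015, 2021))
panel = pd.DataFrame({
    "country": ["A"] * 6 + ["B"] * 6,
    "year": years * 2,
    "inflation": [2.0, 2.5, 3.0, 2.8, 2.2, 4.0, 9.0, 8.5, 7.0, 7.5, 8.0, 11.0],
    "gdp_growth": [3.0, 3.2, 2.9, 3.1, 2.7, -4.0, 5.0, 5.5, 6.0, 5.8, 5.2, -1.0],
})

cutoff = 2018  # hypothetical: train on years up to and including 2018

# Every training row comes from an earlier year than every test row.
train = panel[panel["year"] <= cutoff]
test = panel[panel["year"] > cutoff]

X_train_t, y_train_t = train[["inflation"]], train["gdp_growth"]
X_test_t, y_test_t = test[["inflation"]], test["gdp_growth"]

print(len(train), "training rows,", len(test), "test rows")
```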


Step 8: Train the Model

We will use a Random Forest Regressor.

model = RandomForestRegressor(
    n_estimators=200,
    random_state=42
)

model.fit(X_train, y_train)



Random Forest is a reasonable starting point because it can capture nonlinear relationships and interactions between indicators without feature scaling.


Step 9: Make Predictions

predictions = model.predict(X_test)


Step 10: Evaluate the Model

We use:

  • MAE (Mean Absolute Error)

  • R² Score

mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("MAE:", mae)
print("R²:", r2)


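MAE is the average absolute gap between predicted and actual GDP growth, in percentage points.

R² measures the share of variance in GDP growth that the model explains. A score closer to 1 indicates stronger predictive performance, while a score at or below 0 means the model does no better than always predicting the mean.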
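An error figure is easier to judge next to a naive baseline. The sketch below is an illustration on synthetic data, not World Bank values. It compares a Random Forest with scikit-learn's DummyRegressor, which always predicts the training mean, and also reports RMSE. The names X_demo, y_demo, and scores are assumptions made for this example.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a nonlinear signal; not World Bank values.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(400, 3))
noise = rng.normal(scale=0.3, size=400)
y_demo = 2.0 + 3.0 * np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2 + noise

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "baseline": DummyRegressor(strategy="mean"),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, est in models.items():
    pred = est.fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "mae": mean_absolute_error(y_te, pred),
        # RMSE penalizes large misses more heavily than MAE does.
        "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
    }
    print(f"{name}: MAE={scores[name]['mae']:.2f}, RMSE={scores[name]['rmse']:.2f}")
```

If the model's MAE on your real data is not clearly below the baseline's, the features are adding little predictive value.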


Step 11: Analyze Feature Importance

Feature importance shows which socioeconomic indicators the model relies on most when predicting GDP growth. It reflects predictive association, not causation.

importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': model.feature_importances_
})

importance = importance.sort_values(
    by='Importance',
    ascending=False
)

print(importance)


You may discover that:

  • Inflation strongly impacts growth

  • Internet penetration correlates with productivity

  • Exports influence developing economies

  • Education levels improve long-term GDP performance
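The built-in feature_importances_ values are computed on the training data. As a cross-check, you could use permutation importance, which shuffles one feature at a time on held-out data and measures how much the score drops. This is a different technique from the one used above. The sketch runs on synthetic data where the true signal strengths are known by construction, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic features with known influence on the target.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "strong_signal": rng.normal(size=300),
    "weak_signal": rng.normal(size=300),
    "pure_noise": rng.normal(size=300),
})
target = (
    3.0 * demo["strong_signal"]
    + 1.0 * demo["weak_signal"]
    + rng.normal(scale=0.2, size=300)
)

X_tr, X_te, y_tr, y_te = train_test_split(
    demo, target, test_size=0.25, random_state=0
)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each column of the test set 10 times and average the score drop.
result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
perm = pd.Series(result.importances_mean, index=demo.columns)
perm = perm.sort_values(ascending=False)

print(perm)
```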


Example Insights

A model trained on World Bank data might reveal:

Feature              Importance
Inflation            0.29
Exports              0.22
Internet Users       0.18
Population Growth    0.14
Unemployment         0.10
School Enrollment    0.07

These relationships vary by country and time period.


Common Challenges

1. Missing Data

Many countries have incomplete records.

2. Multicollinearity

Some indicators are highly correlated.
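One quick way to spot correlated indicators is a pairwise correlation matrix. The sketch below uses synthetic data in which two columns are built to move together, so it shows the mechanics rather than a real finding. The 0.8 threshold is a hypothetical choice.

```python
import numpy as np
import pandas as pd

# Synthetic features: the first two share a common component by design.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
features = pd.DataFrame({
    "internet_users": base + rng.normal(scale=0.1, size=200),
    "school_enrollment": base + rng.normal(scale=0.1, size=200),
    "inflation": rng.normal(size=200),
})

corr = features.corr().abs()

# Keep only the upper triangle so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.8  # hypothetical cutoff for "highly correlated"
pairs = upper.stack()
# The comparison also discards any masked (NaN) entries.
high = pairs[pairs > threshold]

print(high)
```

Correlated features do not necessarily hurt a Random Forest's accuracy, but they can split importance between them and make the ranking in Step 11 harder to read.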

3. Time Dependency

GDP growth depends heavily on historical trends.
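A simple way to give the model access to history is to add last year's value as a feature. The sketch below assumes a numeric year column and uses made-up values. The column name gdp_growth_lag1 is hypothetical.

```python
import pandas as pd

# Made-up panel: two countries, three years each.
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "gdp_growth": [2.0, -3.0, 4.0, 5.0, 1.0, 6.0],
})

# Sort so that shift(1) means "previous year" within each country.
panel = panel.sort_values(["country", "year"])
panel["gdp_growth_lag1"] = panel.groupby("country")["gdp_growth"].shift(1)

# The first year of each country has no previous value, so drop it.
lagged = panel.dropna(subset=["gdp_growth_lag1"])

print(lagged)
```

Grouping by country before shifting prevents one country's last year from leaking into the next country's first row.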

4. Economic Shocks

Pandemics, wars, and inflation crises can reduce prediction accuracy.

Research shows that deep learning and recursive forecasting approaches are increasingly being used for long-term GDP forecasting. (arXiv)



Predicting GDP growth from socioeconomic indicators combines:

  • Economics

  • Data engineering

  • Machine learning

  • Time-series analysis


The World Bank dataset is one of the best sources for global economic modeling because it provides standardized indicators across decades and countries.

As you advance, you can improve your model using:

  • XGBoost

  • LightGBM

  • LSTMs

  • Panel data modeling

  • Time-series forecasting

  • Feature lagging

  • Country clustering

GDP forecasting is not just an academic exercise. It powers investment strategies, national policy planning, risk analysis, and global development forecasting.

