How to Standardise Inconsistent Text Values (e.g. 'Nairobi' vs 'nairobi')

April 27, 2026

In real-world datasets, text columns are rarely clean. You’ll often see variations like "Kenya", "kenya ", "KENYA", or "Ken-ya".

Standardising these values is essential for reliable grouping, analysis, and reporting.

Below is a minimal, practical workflow using a CSV file in Python (Pandas).

1. Load the Data

import pandas as pd

df = pd.read_csv("survey_data.csv")

df.head()

2. Inspect the inconsistent column

df['country'].unique()

This helps you identify variations like:

extra spaces
inconsistent casing
spelling differences

3. Standardise text values

Step 1: Lowercase everything

Step 2: Strip spaces

Step 3: Replace known inconsistencies

df['country_clean'] = (
    df['country']
    .str.lower()
    .str.strip()
    .replace({
        'ken-ya': 'kenya',
        'k e n y a': 'kenya'
    })
)

4. Optional: Capitalise properly

If you want clean reporting format:

df['country_clean'] = df['country_clean'].str.title()

5. Validate results

df[['country', 'country_clean']].drop_duplicates()

Key idea

Standardisation is not just cleaning—it is making categories comparable so your analysis doesn’t split the same entity into multiple fake groups.

Advance Your Career With 16 Python Projects in Data & ML — All for $288.

Search This Blog

Practical Python for Data Engineering, Data Analysis & Machine Learning