How to Use Pandas pipe() to Chain Cleaning Steps Cleanly

April 30, 2026

Data cleaning code can quickly become messy when every transformation reassigns the same DataFrame variable or is nested inside another function call.

DataFrame.pipe() helps you build cleaner, more reusable, and more readable workflows.

Instead of nesting functions or rewriting DataFrames repeatedly, you can chain transformations step by step.

Why Use `pipe()`?

pipe() lets you pass a DataFrame through custom functions in sequence: df.pipe(func) calls func(df) and returns its result, so the next step can be chained onto it.

Benefits:

Cleaner code
Easier debugging
Reusable cleaning functions
Better readability for pipelines
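
As a minimal sketch of what pipe() does, the snippet below compares a direct function call with the piped form. The add_suffix_to_names helper and the tiny DataFrame are made up for illustration and are not part of the survey dataset used later.

```python
import pandas as pd

def add_suffix_to_names(df, suffix):
    # Hypothetical helper: returns a new DataFrame with a suffix on each name
    return df.assign(name=df["name"] + suffix)

df = pd.DataFrame({"name": ["ana", "bo"]})

# Calling the function directly...
direct = add_suffix_to_names(df, "_x")

# ...gives the same result as piping: extra arguments are forwarded after df
piped = df.pipe(add_suffix_to_names, "_x")

print(piped.equals(direct))
```

Any positional or keyword arguments given to pipe() after the function are passed along to that function, which is what makes parameterized cleaning steps possible.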

Sample Dataset

The examples below assume a CSV file named survey_data.csv that contains at least a country column and an age column.

import pandas as pd

df = pd.read_csv("survey_data.csv")

print(df.head())

Step 1: Create Cleaning Functions

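Each function takes a DataFrame and returns a cleaned DataFrame. Using assign() instead of assigning to a column in place returns a new DataFrame, so the caller's original data is left unmodified.

def remove_duplicates(df):
    # Drop rows that are exact duplicates across all columns
    return df.drop_duplicates()

def standardize_country(df):
    # Trim surrounding whitespace and title-case the country names
    return df.assign(country=df["country"].str.strip().str.title())

def fill_missing_age(df):
    # Replace missing ages with the median of the age column
    return df.assign(age=df["age"].fillna(df["age"].median()))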

Step 2: Chain Them With `pipe()`

df_clean = (
    df
    .pipe(remove_duplicates)
    .pipe(standardize_country)
    .pipe(fill_missing_age)
)

print(df_clean.head())

This creates a clean, readable transformation pipeline.
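
To try the pipeline without a CSV file, here is a self-contained sketch that runs the same three steps on a small in-memory DataFrame. The sample rows are invented for illustration, and the functions are written with assign() so the input is not modified.

```python
import pandas as pd

# Made-up rows standing in for survey_data.csv
raw = pd.DataFrame({
    "country": [" india", "BRAZIL", "BRAZIL", "kenya "],
    "age": [30.0, 40.0, 40.0, None],
})

def remove_duplicates(df):
    return df.drop_duplicates()

def standardize_country(df):
    return df.assign(country=df["country"].str.strip().str.title())

def fill_missing_age(df):
    return df.assign(age=df["age"].fillna(df["age"].median()))

cleaned = (
    raw
    .pipe(remove_duplicates)
    .pipe(standardize_country)
    .pipe(fill_missing_age)
)

print(cleaned)
```

In this sample, the repeated BRAZIL row is dropped, the country names are trimmed and title-cased, and the missing age is filled with the median of the remaining ages, while raw keeps its original values.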

Without `pipe()`

df = remove_duplicates(df)
df = standardize_country(df)
df = fill_missing_age(df)

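This works, but every step overwrites df, so the raw data is no longer available under that name, and long runs of reassignments become harder to manage in larger projects.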
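
One place the chained form can help is debugging. The sketch below drops a pass-through step between stages to report the row count. The log_shape function and its label argument are hypothetical names chosen for this example, and the data is invented.

```python
import pandas as pd

def log_shape(df, label):
    # Hypothetical pass-through step: print the shape, return df unchanged
    print(f"{label}: {df.shape[0]} rows, {df.shape[1]} columns")
    return df

def remove_duplicates(df):
    return df.drop_duplicates()

df = pd.DataFrame({
    "country": ["India", "India", "Kenya"],
    "age": [30, 30, 25],
})

result = (
    df
    .pipe(log_shape, "before")
    .pipe(remove_duplicates)
    .pipe(log_shape, "after")
)
```

Because log_shape returns its input untouched, it can be added or removed without affecting the rest of the chain.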

Best Practices

Keep each function focused on one task
Use descriptive function names
Return the DataFrame from every function
Combine pipe() with method chaining for cleaner workflows
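
The practices above can be sketched together as follows: small, descriptively named functions that take a column name as a parameter, each returning a DataFrame, mixed with built-in methods in one chain. The function names, the income column, and the sample values are assumptions made for illustration.

```python
import pandas as pd

def title_case(df, column):
    # Hypothetical generalization of standardize_country
    return df.assign(**{column: df[column].str.strip().str.title()})

def fill_missing_with_median(df, column):
    # Hypothetical generalization of fill_missing_age
    return df.assign(**{column: df[column].fillna(df[column].median())})

df = pd.DataFrame({
    "country": [" india", "kenya ", "BRAZIL"],
    "age": [30.0, None, 40.0],
    "income": [None, 100.0, 300.0],
})

cleaned = (
    df
    .pipe(title_case, column="country")
    .pipe(fill_missing_with_median, column="age")
    .pipe(fill_missing_with_median, column="income")
    .sort_values("country")          # built-in methods chain alongside pipe()
    .reset_index(drop=True)
)

print(cleaned)
```

Passing the column as a keyword argument lets one function serve several columns, which keeps the number of near-identical helpers down.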

Final Thoughts

pipe() is one of the best ways to structure data cleaning workflows in pandas.
It makes your code modular, reproducible, and easier for teams to understand.
