How to Use Pandas pipe() to Chain Cleaning Steps Cleanly
Data cleaning code can quickly become messy when every transformation is a separate reassignment or a nested function call.
The DataFrame.pipe() method in pandas helps you build cleaner, reusable, and more readable workflows.
Instead of nesting functions or rewriting DataFrames repeatedly, you can chain transformations step by step.
Why Use pipe()?
pipe() lets you pass a DataFrame through custom functions in sequence: df.pipe(func, *args, **kwargs) calls func(df, *args, **kwargs) and returns the result.
Benefits:
Cleaner code
Easier debugging
Reusable cleaning functions
Better readability for pipelines
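To make the mechanics concrete, here is a minimal sketch showing that piping a function is the same as calling it directly. The helper add_constant and the column names are hypothetical, chosen only for illustration.

```python
import pandas as pd


def add_constant(df, column, value):
    """Hypothetical helper: return a copy of df with `column` set to `value`."""
    return df.assign(**{column: value})


df = pd.DataFrame({"age": [25, 31]})

# Extra positional and keyword arguments are forwarded to the function.
piped = df.pipe(add_constant, "source", value="survey")
direct = add_constant(df, "source", value="survey")

print(piped.equals(direct))  # True: both calls produce the same DataFrame
```

Because pipe() returns whatever the function returns, each step can feed straight into the next one.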
Sample Dataset
import pandas as pd
df = pd.read_csv("survey_data.csv")
print(df.head())
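The survey_data.csv file is not included here, so a small in-memory stand-in may be useful for following along. The country and age columns match the ones used by the cleaning functions in this article; the respondent_id column and every value are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for survey_data.csv: one duplicated row,
# untidy country strings, and one missing age.
df = pd.DataFrame(
    {
        "respondent_id": [1, 2, 2, 3, 4],
        "country": ["  india", "Canada ", "Canada ", "germany", " brazil "],
        "age": [29.0, 41.0, 41.0, None, 35.0],
    }
)
print(df.head())
```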
Step 1: Create Cleaning Functions
Each function takes a DataFrame and returns a cleaned DataFrame.
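def remove_duplicates(df):
    # Drop fully duplicated rows; drop_duplicates() returns a new DataFrame
    return df.drop_duplicates()

def standardize_country(df):
    # Trim whitespace and title-case country names without modifying the input
    return df.assign(country=df["country"].str.strip().str.title())

def fill_missing_age(df):
    # Replace missing ages with the median age without modifying the input
    return df.assign(age=df["age"].fillna(df["age"].median()))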
Step 2: Chain Them With pipe()
df_clean = (
    df
    .pipe(remove_duplicates)
    .pipe(standardize_country)
    .pipe(fill_missing_age)
)
print(df_clean.head())
This creates a clean, readable transformation pipeline.
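One way the "easier debugging" benefit can play out is a pass-through step that reports the DataFrame's shape between stages. The sketch below assumes a hypothetical log_shape helper and a tiny made-up dataset; it is one possible approach, not the only one.

```python
import pandas as pd


def log_shape(df, label):
    """Hypothetical helper: print the shape, then return df unchanged."""
    print(f"{label}: {df.shape[0]} rows, {df.shape[1]} columns")
    return df


def remove_duplicates(df):
    return df.drop_duplicates()


# Made-up data with one duplicated row.
df = pd.DataFrame({"country": ["India", "India", "Brazil"], "age": [29, 29, 35]})

df_clean = (
    df
    .pipe(log_shape, "before")
    .pipe(remove_duplicates)
    .pipe(log_shape, "after")
)
```

Since log_shape returns its input untouched, it can be dropped in or removed at any point without changing the result.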
Without pipe()
df = remove_duplicates(df)
df = standardize_country(df)
df = fill_missing_age(df)
This works, but becomes harder to manage in larger projects.
Best Practices
Keep each function focused on one task
Use descriptive function names
Return the DataFrame from every function
Combine pipe() with method chaining for cleaner workflows
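As a sketch of these practices together, the function below does one task, has a descriptive name, takes its settings as arguments forwarded by pipe(), and returns a new DataFrame instead of editing its input. The name fill_missing, its strategy parameter, and the income column are assumptions made for this example.

```python
import pandas as pd


def fill_missing(df, column, strategy="median"):
    """Hypothetical generalization of fill_missing_age for any numeric column."""
    if strategy == "median":
        fill_value = df[column].median()
    elif strategy == "mean":
        fill_value = df[column].mean()
    else:
        raise ValueError(f"Unsupported strategy: {strategy!r}")
    # assign() returns a new DataFrame, so the caller's data is left as-is.
    return df.assign(**{column: df[column].fillna(fill_value)})


# Made-up data with one missing value per column.
raw = pd.DataFrame({"age": [20.0, None, 40.0], "income": [None, 50.0, 70.0]})

clean = (
    raw
    .pipe(fill_missing, "age")
    .pipe(fill_missing, "income", strategy="mean")
)
print(clean)
```

Leaving raw untouched means the pipeline can be rerun from the same starting point, which helps when a step needs to be adjusted later.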
Final Thoughts
pipe() is one of the best ways to structure data cleaning workflows in pandas.
It makes your code modular, reproducible, and easier for teams to understand.