How to use df.head(), df.info(), and df.describe() to explore any dataset

Learn how to quickly explore any dataset in Python using df.head(), df.info(), and df.describe() for a fast overview of data structure, types, and summary statistics.


When you first load a dataset in Python with pandas, it’s crucial to understand its structure and contents. Three core commands give you a rapid overview:

1. df.head()

  • Returns the first 5 rows by default; pass an integer, such as df.head(10), to view a different number of rows.

  • Helps check if the data loaded correctly and inspect sample values.

import pandas as pd

df = pd.read_csv("your_dataset.csv")
print(df.head())
print(df.head(10))  # first 10 rows
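To try df.head() without a CSV file on disk, the following is a minimal sketch that builds a small DataFrame by hand. The name df_sample, the column names, and all values are made up for illustration and do not come from a real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration
df_sample = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dan", "Eve", "Finn", "Gwen"],
    "age": [28, 33, 35, 41, 25, 52, 30],
    "city": ["Nairobi", "Mombasa", "Kisumu", "Nakuru", "Eldoret", "Thika", "Nyeri"],
})

first_five = df_sample.head()   # default: first 5 rows
first_two = df_sample.head(2)   # pass n to change how many rows are returned

print(first_five)
print(first_two)
```

Because head() returns a new DataFrame rather than printing anything itself, the result can be stored in a variable, printed, or passed on to other code.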


2. df.info()

  • Displays a concise summary of the DataFrame.

  • Key details: the number of rows and columns, column names, non-null counts, data types, and memory usage.

df.info()


Output includes:

  • Total rows and columns

  • Column names

  • Data type of each column (int64, float64, object, etc.)

  • Number of non-null entries (useful for spotting missing data)
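As a rough illustration of how the non-null counts reveal missing data, the sketch below runs df.info() on a small hand-built DataFrame. The name df_sample and its contents are assumptions made up for this example. Since df.info() prints its report and returns None, the sketch uses the buf parameter to capture the text.

```python
import io

import pandas as pd

# Hypothetical sample data with two deliberately missing values
df_sample = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dan"],
    "age": [28.0, None, 35.0, 41.0],
    "city": ["Nairobi", "Mombasa", None, "Kisumu"],
})

# df.info() writes to stdout by default; buf= redirects the report to a buffer
buffer = io.StringIO()
df_sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# Missing values per column = total rows minus the non-null count
missing_per_column = len(df_sample) - df_sample.count()
print(missing_per_column)
```

Any column whose non-null count is lower than the total number of entries contains missing values; here that applies to age and city.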

3. df.describe()

  • Provides summary statistics for numerical columns by default.

  • Includes count, mean, std (standard deviation), min, max, and quartiles (25%, 50%, 75%).

print(df.describe())



Optional: Pass include='all' to get statistics for all columns, including categorical ones (for which pandas reports count, unique, top, and freq):

print(df.describe(include='all'))
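To make the difference between the two calls concrete, here is a minimal sketch on a small hand-built DataFrame. The name df_sample and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one numerical column and one text column
df_sample = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "city": ["Nairobi", "Nairobi", "Mombasa", "Kisumu"],
})

# Default: only numerical columns are summarized
numeric_summary = df_sample.describe()
print(numeric_summary)

# include="all": text columns are added, with unique, top, and freq rows
full_summary = df_sample.describe(include="all")
print(full_summary)
```

In the combined table, statistics that do not apply to a column show up as NaN: the city column has no mean, and the age column has no top value.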



Key Takeaways:

  • df.head() → preview sample rows

  • df.info() → understand structure, types, missing values

  • df.describe() → get numerical summaries quickly

These three commands give a fast, reliable first look at any dataset before deeper analysis.


