How to Use Cross-Validation Instead of a Single Train/Test Split

When building machine learning models, one of the first things you learn is to split your data into training and testing sets.

The training set teaches the model patterns in the data, while the testing set evaluates how well the model performs on unseen observations.

Although this approach is simple and widely used, it has an important limitation: your results depend heavily on how the data was split.

A model that achieves 90% accuracy on one train/test split might achieve only 85% accuracy on another.

This is where cross-validation becomes valuable.

Cross-validation provides a more reliable estimate of model performance by evaluating the model multiple times on different subsets of the data.

The Problem with a Single Train/Test Split

Suppose you have a dataset containing 569 observations.

A typical workflow looks like this:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

The model is trained once and tested once.

The issue is that the selected test set may not represent the entire dataset.

If the test set happens to be unusually easy or difficult, your evaluation metrics can be misleading.

As a result, decisions based on a single split may not reflect real-world performance.


What Is Cross-Validation?

Cross-validation repeatedly splits the data into training and testing portions.


The most common technique is K-Fold Cross-Validation.


In K-Fold Cross-Validation:

  1. The dataset is divided into K roughly equal parts (often called folds).

  2. One part is used for testing.

  3. The remaining parts are used for training.

  4. The process repeats K times, so every part serves as the test set exactly once.

  5. The evaluation scores are averaged.
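The five steps above can be sketched as an explicit loop. This is a minimal illustration, assuming scikit-learn's KFold splitter with shuffling and an arbitrary random_state=42; in practice cross_val_score runs an equivalent loop for you, and the last lines compute it for comparison.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(max_depth=4, random_state=42)

# Step 1: divide the row indices into K = 5 roughly equal folds
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Steps 2-3: hold out one fold for testing, train a fresh copy on the rest
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    fold_scores.append(fold_model.score(X[test_idx], y[test_idx]))
    # Step 4: the loop continues until every fold has been the test fold once

# Step 5: average the per-fold accuracy scores
mean_score = float(np.mean(fold_scores))
print([round(s, 3) for s in fold_scores], round(mean_score, 3))

# For comparison: cross_val_score performs the same splitting and fitting internally
reference_scores = cross_val_score(model, X, y, cv=kf)
print(np.allclose(fold_scores, reference_scores))
```

Cloning the estimator in each round matters: it guarantees that every fold starts from an untrained model instead of reusing the one fitted in the previous round.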


For example, in 5-Fold Cross-Validation:

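  • Round 1: test on Part 1, train on Parts 2-5

  • Round 2: test on Part 2, train on Parts 1 and 3-5

  • Round 3: test on Part 3, train on Parts 1-2 and 4-5

  • Round 4: test on Part 4, train on Parts 1-3 and 5

  • Round 5: test on Part 5, train on Parts 1-4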


The final score is the average of all five evaluations.

This provides a more stable estimate of model performance than any single split.


Why Cross-Validation Is Better

Cross-validation offers several advantages:

1. Reduced Randomness

Results are not dependent on a single train/test split.

2. Better Use of Data

Every observation is used for testing exactly once and for training in the other K-1 rounds.

This is particularly useful when datasets are small.
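A quick way to check this claim is to count how often each row lands in a training or test fold. The sketch below assumes 569 rows (the size of the dataset used later) and 5 shuffled folds; the placeholder array X_dummy is hypothetical and only supplies the row count.

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 569  # same number of rows as the Breast Cancer dataset
X_dummy = np.zeros((n_rows, 1))  # hypothetical placeholder: KFold only needs the row count

test_counts = np.zeros(n_rows, dtype=int)
train_counts = np.zeros(n_rows, dtype=int)
fold_sizes = []

for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X_dummy):
    fold_sizes.append(len(test_idx))
    test_counts[test_idx] += 1    # how many times each row was tested
    train_counts[train_idx] += 1  # how many times each row was used for training

print(fold_sizes)  # [114, 114, 114, 114, 113]
print(set(test_counts.tolist()), set(train_counts.tolist()))  # {1} {4}
```

Because 569 is not divisible by 5, the folds are only roughly equal: four hold 114 rows and one holds 113.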

3. More Reliable Metrics

Instead of one accuracy score, you obtain multiple scores and their average.

This gives a better understanding of how the model will perform on unseen data.


Practical Example Using the Breast Cancer Dataset

The Breast Cancer Wisconsin (Diagnostic) dataset ships with scikit-learn, which makes it a convenient choice for demonstrating cross-validation.

Load the Dataset

from sklearn.datasets import load_breast_cancer
import pandas as pd

data = load_breast_cancer()

X = pd.DataFrame(
    data.data,
    columns=data.feature_names
)

y = data.target
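Before cross-validating, it helps to confirm the dataset's size and class balance. This sketch repeats the loading step so it runs on its own; the class counts matter later, because the two classes are not evenly represented (about 37% malignant).

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

print(X.shape)  # (569, 30)

# Count observations per class (label 0 = malignant, 1 = benign)
class_counts = {str(name): int(count) for name, count in zip(data.target_names, np.bincount(y))}
print(class_counts)  # {'malignant': 212, 'benign': 357}
```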

Evaluate Using a Single Train/Test Split

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

model = DecisionTreeClassifier(
    max_depth=4,
    random_state=42
)

model.fit(X_train, y_train)

predictions = model.predict(X_test)

accuracy = accuracy_score(
    y_test,
    predictions
)

print(f"Accuracy: {accuracy:.2f}")



Example output:

Accuracy: 0.94

This looks good, but it is based on only one split.
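One way to see how much a single split can move is to repeat it with different random_state values while keeping the model fixed. The seeds 0-9 below are arbitrary, and the exact accuracies depend on your scikit-learn version, so no specific output is shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

split_scores = []
for seed in range(10):
    # Only the split changes between iterations; the model settings stay the same
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(max_depth=4, random_state=42)
    model.fit(X_train, y_train)
    split_scores.append(accuracy_score(y_test, model.predict(X_test)))

print(f"Lowest: {min(split_scores):.3f}, highest: {max(split_scores):.3f}")
```

The gap between the lowest and highest value is the uncertainty a single split hides.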



Evaluate Using 5-Fold Cross-Validation

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=4,
    random_state=42
)

scores = cross_val_score(
    model,
    X,
    y,
    cv=5,
    scoring='accuracy'
)

print(scores.round(2))



Example output:

[0.92 0.95 0.94 0.96 0.93]

Now calculate the average score.

print(f"Mean Accuracy: {scores.mean():.2f}")



Example output:

Mean Accuracy: 0.94

Rather than relying on a single evaluation, we now have five separate evaluations (not strictly independent, since their training sets overlap) and an overall average.



Understanding the Results

The individual fold scores tell us how consistently the model performs.

print(f"Mean: {scores.mean():.2f}")
print(f"Standard Deviation: {scores.std():.3f}")

Example output:

Mean: 0.94
Standard Deviation: 0.014



A small standard deviation indicates stable performance across different subsets of data.

A large standard deviation may indicate that the model is sensitive to the training data and may not generalize well.
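As a worked check of the arithmetic, the example fold scores above can be reproduced with NumPy. Note that NumPy's std() defaults to the population formula (ddof=0); the sample standard deviation (ddof=1) is slightly larger, which is worth knowing when you report results.

```python
import numpy as np

# The example fold scores from the 5-fold run above
scores = np.array([0.92, 0.95, 0.94, 0.96, 0.93])

mean_acc = scores.mean()
pop_std = scores.std()            # divides by n (ddof=0), NumPy's default
sample_std = scores.std(ddof=1)   # divides by n - 1

print(round(mean_acc, 3))    # 0.94
print(round(pop_std, 3))     # 0.014
print(round(sample_std, 3))  # 0.016
```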



Using Stratified Cross-Validation

Classification datasets often have imbalanced classes.

For example:

  • 90% class A

  • 10% class B

Standard K-Fold splitting may create folds with very different class distributions.

Stratified K-Fold solves this problem by preserving class proportions in every fold.
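The effect is easy to see on a small synthetic example matching the 90%/10% split above. The labels y_demo are made up for illustration and deliberately sorted by class, which is the worst case for unshuffled K-Fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 rows of class A (0), then 10 rows of class B (1)
y_demo = np.array([0] * 90 + [1] * 10)
X_demo = np.zeros((len(y_demo), 1))  # features are irrelevant for splitting


def minority_share(splitter):
    """Return the fraction of class B rows in each test fold."""
    return [round(float(y_demo[test_idx].mean()), 2)
            for _, test_idx in splitter.split(X_demo, y_demo)]


kfold_shares = minority_share(KFold(n_splits=5))
stratified_shares = minority_share(StratifiedKFold(n_splits=5))

print(kfold_shares)       # [0.0, 0.0, 0.0, 0.0, 0.5]
print(stratified_shares)  # [0.1, 0.1, 0.1, 0.1, 0.1]
```

Plain K-Fold leaves four test folds with no class B rows at all, while StratifiedKFold keeps the 10% share in every fold.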


from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

cv = StratifiedKFold(
    n_splits=5,
    shuffle=True,
    random_state=42
)

scores = cross_val_score(
    model,
    X,
    y,
    cv=cv,
    scoring='accuracy'
)

print(scores.mean())


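For classification problems, Stratified K-Fold is usually the preferred option. When you pass an integer such as cv=5 together with a classifier, cross_val_score already uses stratified folds, but without shuffling; creating a StratifiedKFold object explicitly, as above, lets you shuffle the rows and fix the random_state.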
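It is worth knowing what scikit-learn does when you pass an integer. cross_val_score resolves cv through the helper check_cv, which returns a StratifiedKFold (without shuffling) for classifiers with binary or multiclass targets and a plain KFold otherwise. The sketch below calls check_cv directly on a made-up label array, y_binary, to show this.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y_binary = np.array([0, 1] * 10)  # hypothetical binary labels

cv_for_classifier = check_cv(5, y_binary, classifier=True)
cv_for_regressor = check_cv(5, y_binary, classifier=False)

print(type(cv_for_classifier).__name__, cv_for_classifier.shuffle)  # StratifiedKFold False
print(type(cv_for_regressor).__name__)  # KFold
```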


Cross-Validation for Model Comparison

Cross-validation is especially useful when comparing multiple algorithms.

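from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Fix random_state on the tree so repeated runs give the same scores
models = {
    "Logistic Regression":
        LogisticRegression(max_iter=5000),

    "Decision Tree":
        DecisionTreeClassifier(max_depth=4, random_state=42)
}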

for name, model in models.items():

    scores = cross_val_score(
        model,
        X,
        y,
        cv=5,
        scoring='accuracy'
    )

    print(
        name,
        scores.mean()
    )


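Because an integer cv with a classifier always produces the same unshuffled stratified folds, every model is trained and tested on identical splits, which keeps the comparison fair.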
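A slightly more careful version of the comparison reuses one splitter object for every model and puts feature scaling inside a pipeline, so the scaler is fit only on each training fold and never sees the test fold. Scaling is an addition here, not part of the original example; it is a common choice for logistic regression, while the decision tree does not need it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# One splitter, reused for every model; an integer random_state gives identical folds each call
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    # The scaler is refit inside each training fold, so no test-fold information leaks in
    "Logistic Regression (scaled)": make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```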


Best Practices

When using cross-validation:

  • Use 5-fold or 10-fold cross-validation for most projects.

  • Use Stratified K-Fold for classification problems.

  • Compare models using average scores rather than a single test score.

  • Examine both the mean and standard deviation.

  • Combine cross-validation with hyperparameter tuning.
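For the last point, GridSearchCV runs cross-validation for every candidate hyperparameter value and keeps the best one. The grid of max_depth values below is an arbitrary illustration. Keep in mind that best_score_ is slightly optimistic, because the same folds were used to pick the winner; a held-out test set or nested cross-validation gives a less biased final estimate.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Every max_depth value is scored with the same 5 stratified folds
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 3, 4, 5, 6]},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```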



Conclusion

A single train/test split provides only one view of model performance. Depending on how the data is divided, the evaluation may be overly optimistic or overly pessimistic.

Cross-validation reduces this problem by repeatedly training and testing the model on different subsets of the data.

The result is a more reliable estimate of real-world performance and greater confidence in your machine learning models.


For most practical machine learning projects, cross-validation should be considered the default evaluation technique rather than an optional extra step.


