Posts

Showing posts from March, 2026

What machine learning actually is — a plain-language guide

"Machine learning" is one of those phrases that feels like it belongs to someone else: to researchers in lab coats or Silicon Valley engineers with PhDs. But the core idea is something you already understand intuitively. You've been doing it your whole life.

What a Program Usually Does

Traditional software works like a recipe. A programmer writes down every step explicitly. "If the user clicks this button, do that." "If the number is greater than ten, show this message." Every rule is hand-coded by a human who thought of it in advance. This works brilliantly for things that follow clear, stable rules: payroll calculations, sorting a list of names, booking a flight.

Traditional programming: Data + Rules → Answers. You write the rules. The computer applies them.

But some problems don't work this way. How would you write a rule for recognising a cat in a photo? Try it. You'd start with "four legs, pointy ears, fur..." and immediatel...
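A minimal sketch of the "Data + Rules → Answers" pattern described above, in Python. The function name `message_for` and the wording of its messages are made up for illustration; only the "greater than ten" rule comes from the text.

```python
def message_for(number):
    """Apply a hand-written rule: the programmer chose the threshold in advance."""
    # Rule written by a human, not learned from data.
    if number > 10:
        return "That number is greater than ten."
    return "That number is ten or less."


# Data goes in, the fixed rule is applied, an answer comes out.
for value in [3, 10, 42]:
    print(value, "->", message_for(value))
```

Every behaviour of this program was decided by whoever wrote the `if` statement, which is exactly what becomes impractical for a task like recognising a cat.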

How to create your first line chart with World Bank Kenya GDP data

Learn how to pull Kenya GDP data directly from the World Bank API and create your first line chart in Python using pandas and matplotlib. The workflow is clean, fast, and reproducible.

1. Install Required Libraries

    !pip install pandas matplotlib wbdata

2. Import Dependencies

    import pandas as pd
    import matplotlib.pyplot as plt
    import wbdata
    import datetime

3. Define the Data You Need

The World Bank identifies each data series by an indicator code. For GDP (current US$), the indicator is NY.GDP.MKTP.CD. Set the country to Kenya (KEN) and define a time range:

    indicator = {'NY.GDP.MKTP.CD': 'gdp'}
    data_date = (datetime.datetime(2000, 1, 1), datetime.datetime(2023, 1, 1))

4. Fetch Data from World Bank

    df = wbdata.get_dataframe(indicator, country='KEN', data_date=data_date)

If this call raises a TypeError about data_date, your wbdata version likely expects the keyword date instead (the parameter was renamed in wbdata 1.0).

5. Clean and Prepare Data

    df = df.reset_index()
    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values('date')
    print(df)

Explanation

df = df.reset_index() → Converts the index into a normal column and resets ro...
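To see what the clean-and-prepare step does without calling the World Bank API, here is a small offline sketch. The DataFrame below is a stand-in with placeholder numbers (not real GDP figures), and it assumes the fetched data arrives indexed by year strings named `date`, newest first.

```python
import pandas as pd

# Placeholder values only (NOT real GDP figures), indexed by year strings
# in descending order; this shape is an assumption for illustration.
raw = pd.DataFrame(
    {"gdp": [3.0, 2.0, 1.0]},
    index=pd.Index(["2002", "2001", "2000"], name="date"),
)

df = raw.reset_index()                    # 'date' becomes an ordinary column
df["date"] = pd.to_datetime(df["date"])   # year strings -> datetime values
df = df.sort_values("date")               # oldest year first, ready for plotting

print(df)
```

Sorting by date matters for a line chart because matplotlib connects points in row order.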

How to install Python libraries in Google Colab with pip

Learn how to install Python libraries in Google Colab using pip with simple, step-by-step commands for fast setup and dependency management.

Google Colab comes with many libraries pre-installed, but you will often need to install additional ones. Use !pip directly inside a notebook cell.

1. Basic Installation

Run this in a cell:

    !pip install package_name

Example:

    !pip install pandas

2. Installing Multiple Libraries

    !pip install numpy pandas matplotlib

3. Installing a Specific Version

    !pip install pandas==2.2.2

Use this when:
- you need compatibility
- your code depends on a fixed version

4. Upgrading a Library

    !pip install --upgrade pandas

5. Installing Quietly (Cleaner Output)

    !pip install package_name --quiet

6. Install from a Requirements File

A requirements file (commonly named requirements.txt) is a text file in Python projects that lists all external dependencies, packages, and their specific versions required to run an application. It enables consistent environments ...
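After an install or upgrade, it can help to confirm which version actually ended up in the environment. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` and the list of packages checked are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Print what is currently installed for a few example packages.
for name in ["pandas", "numpy", "matplotlib"]:
    print(name, installed_version(name))
```

A result of None suggests the package still needs a `!pip install`.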

How to use df.head(), df.info(), and df.describe() to explore any dataset

Learn how to quickly explore any dataset in Python using df.head(), df.info(), and df.describe() for a fast overview of data structure, types, and summary statistics.

When you first load a dataset in Python with pandas, it’s crucial to understand its structure and contents. Three core commands give you a rapid overview:

1. df.head()

Shows the first 5 rows by default (you can pass a number to view more). Helps check if the data loaded correctly and inspect sample values.

    import pandas as pd
    df = pd.read_csv("your_dataset.csv")
    print(df.head())
    print(df.head(10))  # first 10 rows

2. df.info()

Displays a concise summary of the DataFrame. Key details: number of rows, columns, column names, non-null counts, and data types.

    df.info()

Output includes:
- Total rows and columns
- Column names
- Data type of each column (int64, float64, object, etc.)
- Number of non-null entries (useful for spotting missing data)

3. df.describe()

Provides summary statistics for numerical columns. I...
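The three commands can be tried without a CSV file. The sketch below builds a tiny DataFrame in memory; the column names and values are made up for illustration, with one missing entry so the non-null counts differ.

```python
import pandas as pd

# Made-up sample data so the example runs without a CSV file.
df = pd.DataFrame(
    {
        "city": ["Nairobi", "Mombasa", "Kisumu", None],
        "score": [10.0, 20.0, 30.0, 40.0],
    }
)

print(df.head(2))     # first 2 rows
df.info()             # prints column names, non-null counts, dtypes
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns
```

In the `df.info()` output, the `city` column reports one fewer non-null entry than `score`, which is how missing data shows up.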

How to Load a CSV from a URL Directly into Pandas

Load a CSV directly from a URL into pandas using read_csv(), with essential options for parsing, authentication, and large files.

1. Install Dependencies

    !pip install pandas

2. Import Pandas

    import pandas as pd

3. Load CSV from URL

    url = "https://example.com/data.csv"
    df = pd.read_csv(url)

4. Verify Data Loaded

    print(df.head())
    print(df.shape)
    print(df.columns)

5. How to Handle Authentication (Basic Example)

    import requests
    from io import StringIO

    url = "https://example.com/protected.csv"
    headers = {"Authorization": "Bearer YOUR_TOKEN"}
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # stop early on 401/403 instead of parsing an error page
    df = pd.read_csv(StringIO(response.text))

6. How to Handle Compressed Files

    df = pd.read_csv(url, compression='zip')

Other options: 'gzip', 'bz2', 'xz'

7. Some Common Errors

Error: HTTP Error 403
Cause: Access denied
Fix: Add headers or authentication

Error: ParserError
Cause: Wrong delimiter
Fix: pd.read_csv(url, sep=',')...
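The delimiter problem can be reproduced offline. This sketch feeds made-up semicolon-separated text through `StringIO`, the same wrapper used in the authentication example; note that a wrong delimiter does not always raise ParserError and can instead quietly produce a single wide column.

```python
from io import StringIO

import pandas as pd

# Made-up CSV text that uses ';' instead of ',' between fields.
csv_text = "name;value\na;1\nb;2\n"

# With the default sep=',' each row is read as one single column.
wrong = pd.read_csv(StringIO(csv_text))

# Passing the actual delimiter splits the fields correctly.
right = pd.read_csv(StringIO(csv_text), sep=";")

print(wrong.shape)
print(right.shape)
```

Comparing `df.shape` and `df.columns` against what you expect is a quick way to catch this case.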

How to set up Google Colab for your first data science project

This guide shows you exactly how to set up a working data science environment using Google Colab.

What is Google Colab?

Google Colab is a cloud-based Python notebook environment. It allows you to:
- Write and run Python code
- Use pre-installed data science libraries
- Access free CPU/GPU resources
- Save work automatically to Google Drive

You don't need to install anything on your computer.

Step 1: Open Google Colab

1. Go to: https://colab.research.google.com
2. Click "New Notebook"

You now have a working Python environment to start writing scripts in.

Step 2: Understand the Interface

A Colab notebook has cells. There are two types of cells:
- Code cells → run Python
- Text cells → write notes (Markdown)

Run a cell with Shift + Enter.

Step 3: Verify Your Environment

Run this in a code cell (press the Play button):

Check installed libraries on Colab:

Step 4: Install Missing Libraries (if needed)

Colab already includes most libraries, but if something is missing: Rul...
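One possible way to verify the environment, sketched with the standard library only. This is not necessarily the check the original post used; the list of libraries in `wanted` is an assumption and should be adjusted to your project.

```python
import importlib.util
import sys

# Libraries to look for; this list is an assumption, adjust it to your project.
wanted = ["pandas", "numpy", "matplotlib"]

print("Python", sys.version.split()[0])

# find_spec returns None when a top-level module cannot be found.
missing = [name for name in wanted if importlib.util.find_spec(name) is None]
print("Missing:", missing if missing else "none")
```

Anything listed as missing can then be installed from a code cell with `!pip install`.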