Computer Vision | Object Detection using Python

An introduction to building object detection models with YOLO

Oct 22, 2024

Object detection is a foundational task in Computer Vision. It powers systems ranging from self-driving cars that detect pedestrians and other vehicles to smart security cameras that flag unusual activity. Unlike image classification, which assigns a single label to an entire image, object detection identifies multiple objects and locates each one with a bounding box, which supports safety and decision-making across many industries.
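To make the difference concrete, here is a minimal sketch of the two kinds of output. The `Detection` dataclass, its `area` method, and all values below are hypothetical and made up for illustration; they are not part of Ultralytics. A classifier returns one label for the whole image, while a detector returns a list of labeled boxes, here in (x1, y1, x2, y2) pixel-corner format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical output type, for illustration only (not a library class).
@dataclass
class Detection:
    class_name: str                  # what the object is
    confidence: float                # model score in [0, 1]
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2): top-left and bottom-right corners

    def area(self) -> int:
        """Box area in pixels (0 for degenerate boxes)."""
        x1, y1, x2, y2 = self.box
        return max(0, x2 - x1) * max(0, y2 - y1)

# Classification: a single label for the whole image (made-up value).
classification_output: str = "street"

# Detection: any number of objects, each with its own location (made-up values).
detection_output: List[Detection] = [
    Detection("car", 0.91, (40, 120, 220, 260)),
    Detection("person", 0.78, (300, 90, 350, 240)),
]

for det in detection_output:
    print(det.class_name, det.confidence, det.box, det.area())
```

The corner format mirrors the `xyxy` coordinates that the code later in this article reads from each detected box.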

Advances in Deep Learning, particularly Convolutional Neural Networks (CNNs), have significantly improved both the accuracy and the speed of object detection.

In this article, we’ll perform basic object detection in Python using a pretrained YOLOv8 model from the Ultralytics library.

Why YOLO?

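YOLO (You Only Look Once) is a family of fast and accurate models well suited to real-time object detection: as the name suggests, it predicts all bounding boxes and class labels in a single forward pass of the network. Other detectors exist, such as the two-stage Faster R-CNN, and they are typically implemented in deep learning frameworks like PyTorch or TensorFlow (Ultralytics YOLO itself runs on PyTorch). YOLO is especially favored for real-world, time-sensitive applications like autonomous driving and video surveillance, thanks to its efficiency and reliable accuracy.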

Object Detection

We’ll perform object detection on an example image saved as `test.jpg`; you can replace it with any other image.

We’ll use OpenCV, Ultralytics YOLO, and Matplotlib to define two functions: detect_objects, which reads the image and runs the model, and show_results, which draws the detections and displays them.

import cv2
import numpy as np
from ultralytics import YOLO
import matplotlib.pyplot as plt

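def detect_objects(image_path, model_conf=0.01):
    """
    Detect objects in an image using YOLOv8.
    
    Args:
        image_path: Path to the input image
        model_conf: Confidence cut-off applied by YOLO itself. Kept low so
            that show_results decides which detections to display.
    
    Returns:
        Detected boxes, the class-id-to-name mapping, an RGB copy of the
        image for drawing, and an array of colors.
    """
    # Load the pretrained YOLOv8 nano model (downloaded on first use)
    model = YOLO('yolov8n.pt')
    
    # Read image (OpenCV returns a BGR array, or None if the file can't be read)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    
    # Perform detection. Ultralytics expects NumPy images in BGR order, so pass
    # the array as loaded. Its default cut-off is conf=0.25; override it so
    # lower thresholds in show_results can take effect.
    results = model(image, conf=model_conf, verbose=False)[0]
    
    # Create an RGB copy of the image for drawing and display with Matplotlib
    annotated_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Generate random colors for classes
    np.random.seed(42)  # For consistent colors
    colors = np.random.randint(0, 255, size=(100, 3), dtype=np.uint8)
    
    # Detected boxes with their coordinates, confidence scores, and class ids
    boxes = results.boxes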

    return boxes, results.names, annotated_image, colors

def show_results(image_path, confidence_threshold):
    """
    Show original image and detection results side by side.

    Args:
        image_path: Path to the input image
        confidence_threshold: Minimum confidence score (0 to 1) for a detection to be drawn
    """
    # Read original image and convert OpenCV's BGR order to RGB for Matplotlib
    original_image = cv2.imread(image_path)
    if original_image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)
    
    # Get detection results
    boxes, class_names, annotated_image, colors = detect_objects(image_path)
    
    # Process each detected object and apply confidence threshold filtering
    class_labels = {}
    for box in boxes:
        # Get box coordinates (top-left and bottom-right corners, in pixels)
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        
        # Get confidence score
        confidence = float(box.conf[0])
        
        # Only show detections at or above the confidence threshold
        if confidence >= confidence_threshold:
            # Get class id and name
            class_id = int(box.cls[0])
            class_name = class_names[class_id]
            
            # Get color for this class (as plain Python ints for OpenCV)
            color = colors[class_id % len(colors)].tolist()
            
            # Draw bounding box
            cv2.rectangle(annotated_image, (x1, y1), (x2, y2), color, 2)
            
            # Store class name and color for legend
            class_labels[class_name] = color

    # Create figure
    plt.figure(figsize=(15, 7))
    
    # Show original image
    plt.subplot(1, 2, 1)
    plt.title('Original Image')
    plt.imshow(original_image)
    plt.axis('off')
    
    # Show detection results
    plt.subplot(1, 2, 2)
    plt.title('Detected Objects')
    plt.imshow(annotated_image)
    plt.axis('off')

    # Create legend
    legend_handles = []
    for class_name, color in class_labels.items():
        normalized_color = np.array(color) / 255.0  # Normalize the color
        legend_handles.append(plt.Line2D([0], [0], marker='o', color='w', label=class_name,
                                           markerfacecolor=normalized_color, markersize=10))

    plt.legend(handles=legend_handles, loc='upper right', title='Classes')

    plt.tight_layout()
    plt.show()

# Example usage:
show_results('test.jpg', confidence_threshold=0.2)

With a confidence threshold of 0.2, the model identifies cars, a person, and traffic lights in the example image.

Breaking the Code

Now, let’s break down the code to understand how we did it.

Import libraries

import cv2
import numpy as np
from ultralytics import YOLO
import matplotlib.pyplot as plt

We begin by importing the libraries needed to run a pretrained detection model and visualize its output:

cv2: OpenCV, used for image processing tasks like reading images and drawing on them.
numpy (np): Used for numerical operations, including generating random colors.
YOLO: The Ultralytics class that loads a pretrained YOLO model and runs inference.
matplotlib.pyplot (plt): A library for plotting images and visualizations.

Defining the `detect_objects` function

Now, we build a function for object detection:

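def detect_objects(image_path, model_conf=0.01):
    """
    Detect objects in an image using YOLOv8.
    
    Args:
        image_path: Path to the input image
        model_conf: Confidence cut-off applied by YOLO itself. Kept low so
            that show_results decides which detections to display.
    
    Returns:
        Detected boxes, the class-id-to-name mapping, an RGB copy of the
        image for drawing, and an array of colors.
    """
    # Load the pretrained YOLOv8 nano model (downloaded on first use)
    model = YOLO('yolov8n.pt')
    
    # Read image (OpenCV returns a BGR array, or None if the file can't be read)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    
    # Perform detection. Ultralytics expects NumPy images in BGR order, so pass
    # the array as loaded. Its default cut-off is conf=0.25; override it so
    # lower thresholds in show_results can take effect.
    results = model(image, conf=model_conf, verbose=False)[0]
    
    # Create an RGB copy of the image for drawing and display with Matplotlib
    annotated_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Generate random colors for classes
    np.random.seed(42)  # For consistent colors
    colors = np.random.randint(0, 255, size=(100, 3), dtype=np.uint8)
    
    # Detected boxes with their coordinates, confidence scores, and class ids
    boxes = results.boxes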

    return boxes, results.names, annotated_image, colors

The detect_objects function takes an image path as input and returns the detected boxes, the class-name mapping, a copy of the image for drawing, and a color palette. The function:

Loads the YOLO model
Reads the input image and converts it to RGB format
Performs object detection using the YOLO model
Creates a copy of the image for drawing bounding boxes
Generates random colors for classes
Returns the detected objects (boxes), class names, annotated image, and colors
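Color order is the subtle part of this step. OpenCV’s cv2.imread returns an H×W×3 NumPy array with channels in BGR order, while Matplotlib’s imshow expects RGB; Ultralytics likewise assumes BGR for NumPy image inputs, so the array handed to the model should be the one OpenCV loaded. For a three-channel image, converting with cv2.COLOR_BGR2RGB amounts to reversing the channel axis. The NumPy-only sketch below uses a made-up 2×2 image so that OpenCV is not needed; `bgr_to_rgb` is a hypothetical helper name.

```python
import numpy as np

def bgr_to_rgb(image_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: reverse the last (channel) axis, B, G, R -> R, G, B.

    np.ascontiguousarray copies the reversed view into a regular C-ordered
    array; OpenCV drawing functions may reject the negative-stride view
    that slicing alone produces.
    """
    return np.ascontiguousarray(image_bgr[..., ::-1])

# A made-up 2x2 image in BGR order: one pure-blue pixel, the rest black.
image_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
image_bgr[0, 0] = [255, 0, 0]  # B=255, G=0, R=0 -> blue in OpenCV's convention

image_rgb = bgr_to_rgb(image_bgr)
print(image_rgb[0, 0].tolist())  # R=0, G=0, B=255 -> still blue, now in RGB order
```

In the article’s code, cv2.cvtColor performs this conversion and returns a new array, so the original BGR image stays untouched.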

Defining the `show_results` function

Next, we build a function to show the results:

def show_results(image_path, confidence_threshold):
    """
    Show original image and detection results side by side.

    Args:
        image_path: Path to the input image
        confidence_threshold: Minimum confidence score (0 to 1) for a detection to be drawn
    """
    # Read original image and convert OpenCV's BGR order to RGB for Matplotlib
    original_image = cv2.imread(image_path)
    if original_image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)
    
    # Get detection results
    boxes, class_names, annotated_image, colors = detect_objects(image_path)
    
    # Process each detected object and apply confidence threshold filtering
    class_labels = {}
    for box in boxes:
        # Get box coordinates (top-left and bottom-right corners, in pixels)
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        
        # Get confidence score
        confidence = float(box.conf[0])
        
        # Only show detections at or above the confidence threshold
        if confidence >= confidence_threshold:
            # Get class id and name
            class_id = int(box.cls[0])
            class_name = class_names[class_id]
            
            # Get color for this class (as plain Python ints for OpenCV)
            color = colors[class_id % len(colors)].tolist()
            
            # Draw bounding box
            cv2.rectangle(annotated_image, (x1, y1), (x2, y2), color, 2)
            
            # Store class name and color for legend
            class_labels[class_name] = color

    # Create figure
    plt.figure(figsize=(15, 7))
    
    # Show original image
    plt.subplot(1, 2, 1)
    plt.title('Original Image')
    plt.imshow(original_image)
    plt.axis('off')
    
    # Show detection results
    plt.subplot(1, 2, 2)
    plt.title('Detected Objects')
    plt.imshow(annotated_image)
    plt.axis('off')

    # Create legend
    legend_handles = []
    for class_name, color in class_labels.items():
        normalized_color = np.array(color) / 255.0  # Normalize the color
        legend_handles.append(plt.Line2D([0], [0], marker='o', color='w', label=class_name,
                                           markerfacecolor=normalized_color, markersize=10))

    plt.legend(handles=legend_handles, loc='upper right', title='Classes')

    plt.tight_layout()
    plt.show()

The show_results function takes an image path and confidence threshold as input and displays the original image and detection results side by side. This function:

Reads the original image and converts it to RGB format
Gets the detection results from the detect_objects function
Processes each detected object and applies confidence threshold filtering
Draws bounding boxes on the annotated image
Creates a legend for the class names and colors
Displays the original image and detection results side by side using Matplotlib
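The color bookkeeping in show_results can be checked on its own. The sketch below, assuming the same seeded 100-color palette recipe as detect_objects, shows that seeding makes the palette reproducible, that `class_id % len(colors)` always yields a valid palette index, and that dividing by 255 maps 0 to 255 channel values into the 0 to 1 range Matplotlib expects for markerfacecolor. The helper names `make_palette` and `legend_color` are hypothetical.

```python
import numpy as np

def make_palette(n_colors: int = 100, seed: int = 42) -> np.ndarray:
    """Same recipe as detect_objects: seeded random RGB colors (values 0..254)."""
    np.random.seed(seed)
    return np.random.randint(0, 255, size=(n_colors, 3), dtype=np.uint8)

def legend_color(colors: np.ndarray, class_id: int):
    """Hypothetical helper: return (drawing color, Matplotlib color) for a class id."""
    color = colors[class_id % len(colors)].tolist()  # plain ints, as passed to cv2.rectangle
    normalized = np.array(color) / 255.0             # floats in [0, 1] for Matplotlib
    return color, normalized

palette_a = make_palette()
palette_b = make_palette()
print(np.array_equal(palette_a, palette_b))  # same seed -> same colors on every run

color_2, norm_2 = legend_color(palette_a, 2)
color_102, _ = legend_color(palette_a, 102)  # 102 % 100 == 2, so it reuses class 2's color
print(color_2 == color_102)
print(0.0 <= norm_2.min() and norm_2.max() <= 1.0)
```

The pretrained yolov8n.pt model predicts the 80 COCO classes, so with 100 colors the modulo never actually wraps; it only matters for models with more classes. Note also that np.random.seed resets NumPy’s global generator, which affects any other code using np.random; np.random.default_rng(42) would give an independent generator instead.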

Example Usage

Finally, we call show_results with the example image path and a confidence threshold of 0.2, so that only detections meeting that threshold are drawn.

# Example usage:
show_results('test.jpg', confidence_threshold=0.2)

A note on confidence scores

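The confidence_threshold argument of show_results sets the minimum confidence score a detection needs in order to be drawn. Keep in mind that Ultralytics also applies its own confidence cut-off during prediction (0.25 by default, adjustable through the conf argument of the model call), so a confidence_threshold below that cut-off only has an effect if the model is called with a lower conf.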

What is a confidence score?

In object detection, the confidence score is a measure of how confident the model is that a detected object is present in the image. The confidence score is usually a value between 0 and 1, where:

0 means the model is not confident at all that the object is present
1 means the model is extremely confident that the object is present

How does confidence_threshold work?

When you set a confidence_threshold value, you're telling show_results to draw only detections whose confidence score meets that threshold. Detections scoring below it are ignored.

For example, with confidence_threshold=0.5, only detections with a confidence score of 0.5 or higher are drawn; anything below 0.5 is skipped.
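The filtering rule can be tried without running a model. In the sketch below, the detections are made-up (class_id, confidence) pairs, and `names` is a small excerpt of the COCO id-to-name mapping used by the pretrained model (the full mapping is available at runtime as results.names); `filter_detections` is a hypothetical helper that keeps scores at or above the threshold. The second part simulates Ultralytics’ default 0.25 cut-off being applied first, to show why a lower user threshold cannot recover detections the model already discarded.

```python
from typing import Dict, List, Tuple

# Excerpt of the COCO class-id -> name mapping used by pretrained YOLOv8 models.
names: Dict[int, str] = {0: "person", 2: "car", 9: "traffic light"}

# Made-up raw detections: (class_id, confidence).
raw_detections: List[Tuple[int, float]] = [
    (2, 0.91), (2, 0.48), (0, 0.50), (9, 0.22), (9, 0.07),
]

def filter_detections(detections, names, threshold):
    """Hypothetical helper: keep detections scoring at or above the threshold."""
    return [(names[cid], conf) for cid, conf in detections if conf >= threshold]

print(filter_detections(raw_detections, names, 0.5))
# [('car', 0.91), ('person', 0.5)]

# Simulate the model's own default cut-off (0.25) running first: a user
# threshold of 0.2 can no longer bring back the 0.22 traffic light.
pre_filtered = [(cid, conf) for cid, conf in raw_detections if conf >= 0.25]
print(filter_detections(pre_filtered, names, 0.2))
# [('car', 0.91), ('car', 0.48), ('person', 0.5)]
```

Raising the threshold generally trades recall for precision: fewer false positives, but more missed objects. With a strict `>` comparison, the person scoring exactly 0.50 would also be dropped at a threshold of 0.5.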


Written by Diego Lopez Yse
