Computer Vision | Object Detection using Python

An introduction to building object detection models with YOLO

Diego Lopez Yse
6 min read · Oct 22, 2024

Object detection is a foundational task in Computer Vision, powering systems from self-driving cars that detect pedestrians and other vehicles to smart security cameras that identify unusual activities. Unlike image classification, which only labels an entire image, object detection allows machines to identify multiple objects and pinpoint their exact locations, enhancing safety and decision-making across various industries.

The evolution of Deep Learning, particularly Convolutional Neural Networks (CNNs), has significantly improved the accuracy and efficiency of object detection, making it practical for real-time use.

In this article, we’ll perform basic object detection in Python using a pretrained YOLO model from the Ultralytics library.

Why YOLO?

YOLO (You Only Look Once) is a family of fast, accurate models well suited to real-time object detection. While you could build a detector from scratch with a general-purpose framework like TensorFlow or PyTorch, YOLO is especially favored for real-world, time-sensitive applications like autonomous driving and video surveillance, thanks to its efficiency and reliable accuracy.

Object Detection

We’ll use the following image to perform object detection, which you can replace with any other one:

We’ll use OpenCV and the Ultralytics YOLO library to define two functions: one that reads the image and runs detection, and one that displays the detected objects.

import cv2
import numpy as np
from ultralytics import YOLO
import matplotlib.pyplot as plt

def detect_objects(image_path):
    """
    Detect objects in an image using YOLOv8.

    Args:
        image_path: Path to the input image

    Returns:
        Detected boxes, the class-name mapping, an RGB copy of the
        image for drawing, and a color palette.
    """
    # Load the pretrained YOLOv8 nano model
    model = YOLO('yolov8n.pt')

    # Read image (OpenCV loads it in BGR channel order)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Perform detection; Ultralytics expects NumPy arrays in BGR order,
    # so pass the image exactly as OpenCV read it
    results = model(image)[0]

    # Create an RGB copy of the image for drawing and Matplotlib display
    annotated_image = image_rgb.copy()

    # Generate random colors for classes
    np.random.seed(42)  # For consistent colors
    colors = np.random.randint(0, 255, size=(100, 3), dtype=np.uint8)

    # Detected bounding boxes
    boxes = results.boxes

    return boxes, results.names, annotated_image, colors

def show_results(image_path, confidence_threshold):
    """
    Show original image and detection results side by side.

    Args:
        image_path: Path to the input image
        confidence_threshold: Minimum confidence score for detections
    """
    # Read original image
    original_image = cv2.imread(image_path)
    original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # Get detection results
    boxes, class_names, annotated_image, colors = detect_objects(image_path)

    # Process each detected object and apply confidence threshold filtering
    class_labels = {}
    for box in boxes:
        # Get box coordinates
        x1, y1, x2, y2 = map(int, box.xyxy[0])

        # Get confidence score
        confidence = float(box.conf[0])

        # Only show detections above confidence threshold
        if confidence > confidence_threshold:
            # Get class id and name
            class_id = int(box.cls[0])
            class_name = class_names[class_id]

            # Get color for this class
            color = colors[class_id % len(colors)].tolist()

            # Draw bounding box
            cv2.rectangle(annotated_image, (x1, y1), (x2, y2), color, 2)

            # Store class name and color for legend
            class_labels[class_name] = color

    # Create figure
    plt.figure(figsize=(15, 7))

    # Show original image
    plt.subplot(1, 2, 1)
    plt.title('Original Image')
    plt.imshow(original_image)
    plt.axis('off')

    # Show detection results
    plt.subplot(1, 2, 2)
    plt.title('Detected Objects')
    plt.imshow(annotated_image)
    plt.axis('off')

    # Create legend
    legend_handles = []
    for class_name, color in class_labels.items():
        normalized_color = np.array(color) / 255.0  # Normalize the color
        legend_handles.append(plt.Line2D([0], [0], marker='o', color='w', label=class_name,
                                         markerfacecolor=normalized_color, markersize=10))

    plt.legend(handles=legend_handles, loc='upper right', title='Classes')

    plt.tight_layout()
    plt.show()

# Example usage:
show_results('test.jpg', confidence_threshold=0.2)

With a confidence threshold of 0.2, our model can automatically identify cars, a person, and traffic lights.
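
To check a result like this without reading it off the figure, the class names of the kept detections can be tallied. The sketch below is a minimal illustration on hand-written labels: the list detected_labels and the helper summarize_labels are hypothetical, standing in for class names you would collect inside the detection loop.

```python
from collections import Counter

# Hypothetical class names of detections that passed the threshold
# (illustrative values, not the actual output for the article's image)
detected_labels = ["car", "car", "person", "traffic light", "car", "traffic light"]


def summarize_labels(labels):
    """Count how many detections were kept for each class name."""
    return Counter(labels)


summary = summarize_labels(detected_labels)
for class_name, count in summary.most_common():
    print(f"{class_name}: {count}")
```

In the article's code, the same tally could be built by appending class_name to a list wherever a bounding box is drawn.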

Breaking the Code

Now, let’s break down the code to understand how we did it.

Import libraries

import cv2
import numpy as np
from ultralytics import YOLO
import matplotlib.pyplot as plt

We begin by importing the libraries our object detection code relies on:

  • cv2: OpenCV, used for image processing tasks like reading and drawing on images.
  • numpy (np): Used for numerical operations, including generating random colors.
  • YOLO: The model class from the ultralytics library, used to load a pretrained state-of-the-art object detection model and run it on images.
  • matplotlib.pyplot (plt): A library for plotting images and visualizations.

Defining the detect_objects function

Now, we build a function for object detection:

def detect_objects(image_path):
    """
    Detect objects in an image using YOLOv8.

    Args:
        image_path: Path to the input image

    Returns:
        Detected boxes, the class-name mapping, an RGB copy of the
        image for drawing, and a color palette.
    """
    # Load the pretrained YOLOv8 nano model
    model = YOLO('yolov8n.pt')

    # Read image (OpenCV loads it in BGR channel order)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Perform detection; Ultralytics expects NumPy arrays in BGR order,
    # so pass the image exactly as OpenCV read it
    results = model(image)[0]

    # Create an RGB copy of the image for drawing and Matplotlib display
    annotated_image = image_rgb.copy()

    # Generate random colors for classes
    np.random.seed(42)  # For consistent colors
    colors = np.random.randint(0, 255, size=(100, 3), dtype=np.uint8)

    # Detected bounding boxes
    boxes = results.boxes

    return boxes, results.names, annotated_image, colors

The detect_objects function takes an image path as input and returns the detected objects and class labels. The function:

  • Loads the YOLO model
  • Reads the input image with OpenCV (which loads it in BGR order) and creates an RGB version for display
  • Performs object detection using the YOLO model
  • Creates a copy of the image for drawing bounding boxes
  • Generates random colors for classes
  • Returns the detected objects (boxes), class names, annotated image, and colors
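
OpenCV stores color channels in blue-green-red (BGR) order, while Matplotlib expects red-green-blue (RGB), which is why the image is converted before it is displayed. The following sketch illustrates the idea on a tiny hand-made array using plain NumPy slicing. The helper bgr_to_rgb is hypothetical and written only for this illustration; for 3-channel images it should give the same result as cv2.cvtColor with cv2.COLOR_BGR2RGB.

```python
import numpy as np

# A 1x2 image in BGR order: one pure-blue pixel and one pure-red pixel
image_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)


def bgr_to_rgb(image):
    """Reverse the channel axis, turning BGR into RGB (and vice versa)."""
    return image[..., ::-1].copy()


image_rgb = bgr_to_rgb(image_bgr)
print(image_rgb[0, 0])  # the blue pixel, now in RGB order
print(image_rgb[0, 1])  # the red pixel, now in RGB order
```

Skipping this conversion is a common reason for images showing up with swapped red and blue tones in Matplotlib.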

Defining the show_results function

Next, we build a function to show the results:

def show_results(image_path, confidence_threshold):
    """
    Show original image and detection results side by side.

    Args:
        image_path: Path to the input image
        confidence_threshold: Minimum confidence score for detections
    """
    # Read original image
    original_image = cv2.imread(image_path)
    original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # Get detection results
    boxes, class_names, annotated_image, colors = detect_objects(image_path)

    # Process each detected object and apply confidence threshold filtering
    class_labels = {}
    for box in boxes:
        # Get box coordinates
        x1, y1, x2, y2 = map(int, box.xyxy[0])

        # Get confidence score
        confidence = float(box.conf[0])

        # Only show detections above confidence threshold
        if confidence > confidence_threshold:
            # Get class id and name
            class_id = int(box.cls[0])
            class_name = class_names[class_id]

            # Get color for this class
            color = colors[class_id % len(colors)].tolist()

            # Draw bounding box
            cv2.rectangle(annotated_image, (x1, y1), (x2, y2), color, 2)

            # Store class name and color for legend
            class_labels[class_name] = color

    # Create figure
    plt.figure(figsize=(15, 7))

    # Show original image
    plt.subplot(1, 2, 1)
    plt.title('Original Image')
    plt.imshow(original_image)
    plt.axis('off')

    # Show detection results
    plt.subplot(1, 2, 2)
    plt.title('Detected Objects')
    plt.imshow(annotated_image)
    plt.axis('off')

    # Create legend
    legend_handles = []
    for class_name, color in class_labels.items():
        normalized_color = np.array(color) / 255.0  # Normalize the color
        legend_handles.append(plt.Line2D([0], [0], marker='o', color='w', label=class_name,
                                         markerfacecolor=normalized_color, markersize=10))

    plt.legend(handles=legend_handles, loc='upper right', title='Classes')

    plt.tight_layout()
    plt.show()

The show_results function takes an image path and confidence threshold as input and displays the original image and detection results side by side. This function:

  • Reads the original image and converts it to RGB format
  • Gets the detection results from the detect_objects function
  • Processes each detected object and applies confidence threshold filtering
  • Draws bounding boxes on the annotated image
  • Creates a legend for the class names and colors
  • Displays the original image and detection results side by side using Matplotlib
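
The filtering and legend steps can be tried without a model or an image. The sketch below is a simplified illustration under stated assumptions: the Detection dataclass, the three-color PALETTE, and the helpers select_detections and normalize_color are all hypothetical stand-ins for the Ultralytics box objects and the random palette used in the article, and the detection values are made up.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical stand-in for one detected box."""
    class_id: int
    class_name: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) pixel coordinates


# Small hypothetical palette of RGB colors (the article generates 100 random ones)
PALETTE = [(230, 25, 75), (60, 180, 75), (0, 130, 200)]


def select_detections(detections, confidence_threshold):
    """Keep detections above the threshold and map each class name to a color."""
    kept = []
    class_colors = {}
    for det in detections:
        if det.confidence > confidence_threshold:
            kept.append(det)
            # The same class id always maps to the same palette entry
            class_colors[det.class_name] = PALETTE[det.class_id % len(PALETTE)]
    return kept, class_colors


def normalize_color(color):
    """Scale 0-255 RGB values to the 0-1 range Matplotlib expects."""
    return tuple(channel / 255.0 for channel in color)


# Made-up detections for illustration only
detections = [
    Detection(2, "car", 0.91, (34, 120, 310, 280)),
    Detection(0, "person", 0.15, (400, 90, 450, 230)),
    Detection(9, "traffic light", 0.48, (210, 10, 230, 60)),
]
kept, class_colors = select_detections(detections, confidence_threshold=0.2)
print([det.class_name for det in kept])
print({name: normalize_color(color) for name, color in class_colors.items()})
```

Because the color is looked up by class id, every box of the same class gets the same color, which is what makes a single legend entry per class sufficient.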

Example Usage

Finally, we call the show_results function with an example image path and a confidence threshold of 0.2, so that only detections scoring above 0.2 are drawn.

# Example usage:
show_results('test.jpg', confidence_threshold=0.2)

A note on confidence scores

The confidence_threshold argument in the show_results function is a parameter that controls the minimum confidence score required for an object detection to be considered valid.

What is a confidence score?

In object detection, the confidence score is a measure of how confident the model is that a detected object is present in the image. The confidence score is usually a value between 0 and 1, where:

  • 0 means the model is not confident at all that the object is present
  • 1 means the model is extremely confident that the object is present

How does confidence_threshold work?

When you set a confidence_threshold value, show_results only draws detections whose confidence score is above that threshold. Detections at or below the threshold are ignored.

For example, if you set confidence_threshold=0.5, only detections with a confidence score above 0.5 are drawn. Because the code uses a strict comparison (confidence > confidence_threshold), a detection scoring exactly 0.5 is ignored. Note that Ultralytics also applies its own confidence cutoff when the model is called (0.25 by default), so a confidence_threshold below that value has no additional effect unless a lower conf value is passed to the model call.
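
To get a feel for how the threshold changes the result, the sketch below counts how many detections survive at different settings. The scores are made-up values and count_kept is a hypothetical helper; it uses the same strict > comparison as show_results.

```python
# Hypothetical confidence scores for five detections (illustrative values)
scores = [0.92, 0.50, 0.35, 0.21, 0.08]


def count_kept(scores, confidence_threshold):
    """Count scores strictly above the threshold, as show_results does."""
    return sum(1 for score in scores if score > confidence_threshold)


for threshold in (0.2, 0.5, 0.9):
    print(f"threshold={threshold}: {count_kept(scores, threshold)} detections kept")
```

A lower threshold keeps more boxes at the cost of more false positives, while a higher one keeps only the detections the model is most sure about.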

