Computer Vision | Analyzing Satellite Images using Python
A comprehensive guide to calculating and visualizing vegetation indices
Key Takeaways
- Learn how to process multispectral satellite imagery using Python
- Calculate and visualize the Normalized Difference Vegetation Index (NDVI)
- Implement statistical analysis for vegetation monitoring
Introduction
Human vision perceives only a small portion of the electromagnetic spectrum, the “visible light,” limiting us to a narrow view of our environment. However, modern remote sensing extends far beyond visible light by using multispectral imagery, which captures a range of wavelengths invisible to our eyes.
Sensors on satellites or drones measure reflected energy across various sections, or “bands,” of the spectrum, allowing each band to reveal unique details. Since different materials (such as vegetation and water) reflect energy uniquely, multi and hyperspectral imagery enables the identification of various environmental features.
Understanding NDVI
The Normalized Difference Vegetation Index (NDVI) is a metric that measures vegetation health and density using sensor data. It’s particularly valuable in:
- Agricultural monitoring and crop health assessment
- Forest cover change detection
- Urban green space management
- Drought monitoring
- Climate change impact studies
How is NDVI calculated?
NDVI is calculated from spectrometric data in the red and near-infrared bands, usually from remote sensors like satellites. It uses a simple formula to identify the presence of vegetation in a given image, calculated as a ratio between the red (R) and near-infrared (NIR) values:
NDVI always ranges from -1 to +1. But there isn’t a distinct boundary for each type of land cover. For example, when you have negative values, it’s likely water. On the other hand, if you have an NDVI value close to +1, there’s a high possibility that it’s dense green leaves. But when NDVI is close to zero, there are likely no green leaves and it could even be an urbanized area.
Calculate NDVI using Python
Now, we’ll walk through each step to compute and visualize NDVI.
Import libraries
# Import libraries
import numpy as np
import rasterio
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
First, we import the following libraries:
numpy
for handling numerical calculations,rasterio
to read, write, and process raster data (geospatial images),matplotlib.pyplot
for data visualization, andLinearSegmentedColormap
to define a custom color map for displaying NDVI values.
Load and prepare data
# Load and prepare data
b4 = rasterio.open('LC09_L1TP_227082_20231015_20231015_02_T1_B4.TIF').read(1)
b5 = rasterio.open('LC09_L1TP_227082_20231015_20231015_02_T1_B5.TIF').read(1)
red = b4.astype('float64')
nir = b5.astype('float64')
Next, we load the red and near-infrared bands (band 4 and band 5) from a Landsat 9 image. astype('float64')
converts the pixel values to floats for accurate NDVI calculation.
Define the NDVI calculation function
# Define the NDVI calculation function
def calculate_ndvi(nir_band, red_band):
"""Calculate NDVI from NIR and Red bands"""
ndvi = (nir_band - red_band) / (nir_band + red_band + 1e-10)
ndvi = np.clip(ndvi, -1, 1)
return ndvi
# Calculate NDVI
ndvi_result = calculate_ndvi(nir, red)
This function calculates NDVI by applying the NDVI formula. np.clip(ndvi, -1, 1)
ensures the NDVI values stay within the expected range of -1 to 1, handling any floating-point errors.
Save NDVI as GeoTIFF
# Save NDVI as GeoTIFF
with rasterio.open('LC09_L1TP_227082_20231015_20231015_02_T1_B4.TIF') as src:
profile = src.profile
profile.update(
dtype=rasterio.float32,
count=1,
compress='lzw',
nodata=None
)
with rasterio.open('ndvi_result.tif', 'w', **profile) as dst:
dst.write(ndvi_result.astype(rasterio.float32), 1)
Then, we save the calculated NDVI array as a GeoTIFF file, setting it up with the same metadata as the original file, and updating the data type to float32
for accuracy.
Create and display NDVI map
# Create and display NDVI map
plt.figure(figsize=(12, 10))
colors = ['red', 'yellow', 'lightgreen', 'darkgreen']
ndvi_cmap = LinearSegmentedColormap.from_list("custom", colors, N=256)
im = plt.imshow(ndvi_result, cmap=ndvi_cmap, vmin=-1, vmax=1)
plt.colorbar(im, label='NDVI Value')
plt.title('NDVI Map', pad=20)
plt.axis('off')
plt.show()
This way, we create an NDVI color map with a gradient from red (low NDVI) to dark green (high NDVI), displaying a colored NDVI map.
Calculate basic statistics
# Calculate basic statistics
valid_ndvi = ndvi_result[~np.isnan(ndvi_result)]
stats = {
'Mean': np.mean(valid_ndvi),
'Median': np.median(valid_ndvi),
'Std Dev': np.std(valid_ndvi),
'Min': np.min(valid_ndvi),
'Max': np.max(valid_ndvi)
}
for stat, value in stats.items():
print(f"{stat}: {value:.3f}")
We can also calculate key statistics for the NDVI values, excluding any invalid values (NaNs) to get a clear summary of vegetation health.
Create and display a histogram
# Create and display a histogram
plt.figure(figsize=(12, 6))
plt.hist(valid_ndvi.flatten(), bins=100, color='darkgreen', alpha=0.7)
plt.title('NDVI Distribution')
plt.xlabel('NDVI Value')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)
# Add statistics to plot
stats_text = '\n'.join([f'{k}: {v:.3f}' for k, v in stats.items()])
plt.text(0.95, 0.95, stats_text,
transform=plt.gca().transAxes,
verticalalignment='top',
horizontalalignment='right',
bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
plt.show()
We can plot a histogram of the NDVI values to visually analyze its distribution, and display key statistics within the plot.
Create visualization and statistics for areas above threshold
# Create visualization and statistics for areas above threshold
def analyze_ndvi_threshold(ndvi_data, threshold):
"""
Plot NDVI areas above threshold and display statistics
Parameters:
ndvi_data: numpy array of NDVI values
threshold: NDVI value above which to show areas
"""
plt.figure(figsize=(12, 10))
# Create mask for values above threshold
high_ndvi_mask = ndvi_data > threshold
# Create masked array where values below threshold are transparent
masked_ndvi = np.ma.masked_where(~high_ndvi_mask, ndvi_data)
# Create colormap for high NDVI values
colors_high = ['lightgreen', 'green', 'darkgreen']
high_ndvi_cmap = LinearSegmentedColormap.from_list("custom_high", colors_high, N=256)
# Plot with white background
plt.imshow(np.zeros_like(ndvi_data), cmap='Greys', alpha=0.1)
plt.imshow(masked_ndvi, cmap=high_ndvi_cmap, vmin=threshold, vmax=1)
# Add colorbar
cbar = plt.colorbar(label='NDVI Value')
cbar.set_ticks(np.linspace(threshold, 1, 5))
# Calculate statistics for areas above threshold
high_ndvi_values = ndvi_data[high_ndvi_mask]
total_pixels = ndvi_data.size
pixels_above = high_ndvi_values.size
percentage_above = (pixels_above / total_pixels) * 100
stats = {
'Total pixels': total_pixels,
'Pixels above threshold': pixels_above,
'Percentage above threshold': f"{percentage_above:.1f}%",
'Mean NDVI': f"{np.mean(high_ndvi_values):.3f}",
'Median NDVI': f"{np.median(high_ndvi_values):.3f}",
'Max NDVI': f"{np.max(high_ndvi_values):.3f}",
'Min NDVI': f"{np.min(high_ndvi_values):.3f}",
'Std Dev': f"{np.std(high_ndvi_values):.3f}"
}
# Add title
plt.title(f'Areas with NDVI > {threshold}\n'
f'({percentage_above:.1f}% of total area)',
pad=20)
plt.axis('off')
plt.show()
# Print statistics
print(f"\nStatistics for NDVI > {threshold}:")
print("-" * 40)
for key, value in stats.items():
print(f"{key + ':':25} {value}")
return high_ndvi_mask, stats
# Cell 2: Create visualizations and get statistics for different thresholds
thresholds = [0.1, 0.3, 0.5]
# Dictionary to store statistics for all thresholds
all_stats = {}
for threshold in thresholds:
mask, stats = analyze_ndvi_threshold(ndvi_result, threshold)
all_stats[threshold] = stats
# Cell 3: Optional - Create comparative summary table
import pandas as pd
# Convert statistics to DataFrame for easy comparison
df_stats = pd.DataFrame(all_stats).round(3)
print("\nComparative Summary of All Thresholds:")
print("-" * 60)
print(df_stats)
Finally, we can build a function for analyzing areas of NDVI above specified thresholds (0.1, 0.3, and 0.5) and generate individual and comparative statistics for those areas.
Conclusion
NDVI analysis using Python provides a powerful tool for monitoring vegetation health and density through satellite imagery. The workflow presented here serves as a foundation for more sophisticated environmental monitoring applications.
For those looking to build upon this foundation, consider exploring other vegetation indices like the Enhanced Vegetation Index (EVI), Soil Adjusted Vegetation Index (SAVI), or Normalized Difference Water Index (NDWI). Additionally, you can implement time series analysis to track changes over seasons or integrate this analysis into automated monitoring systems.