
Python OpenCV - Super resolution with deep learning

Last Updated : 21 Nov, 2025

Super-Resolution (SR) is the process of converting a low-resolution (LR) image into a high-resolution (HR) version by reconstructing or hallucinating fine details that are not clearly present in the original. Modern SR techniques use deep neural networks to reconstruct high-resolution images from low-resolution inputs, enhancing details and sharpness effectively.

Need for Deep Learning-Based Super Resolution

1. Limitations of Traditional (Non-DL) SR

  • Produces blurry edges and soft textures.
  • Amplifies noise and compression artifacts.
  • Fails to reconstruct intricate patterns (skin texture, fabric, handwriting).
  • Cannot learn or infer missing details.

2. Why Deep Learning Solves This

  • Learns statistical relationships between high-resolution and low-resolution images.
  • Recovers texture, fine edges, sharp contours and lost structural detail.
  • Handles noise, JPEG artifacts and motion blur far better.
  • Generates photo-realistic outputs using GAN-based models such as ESRGAN.

Deep Learning Super-Resolution Methods

1. Interpolation

[Figure 1: Interpolation of the image by 2x]

Interpolation resamples pixels from one grid to another, which is mainly used to alter the resolution of an image. A low-resolution (LR) image is interpolated onto a grid 2x or 4x its original size. There are several common interpolation methods (compared in the OpenCV sketch after this list):

  • Nearest Neighbour Interpolation: Each new pixel simply copies the value of its nearest pixel in the original grid. It is the fastest method but produces blocky results.
  • Bilinear Interpolation: It interpolates over a 2x2 neighbourhood of known pixels, first along one axis and then along the other. It is slower than Nearest Neighbour interpolation but gives smoother results.
  • Bicubic Interpolation: This carries out cubic interpolation over a 4x4 neighbourhood, taking both axes into account. It is the slowest of the three but typically produces the sharpest results.
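
As a minimal illustration of these three methods, the snippet below upscales the same image 2x with each OpenCV interpolation flag; the filename input.png is just a placeholder for any test image.

Python
import cv2

# Load any test image (placeholder filename).
img = cv2.imread("input.png")

# Upscale 2x with each interpolation method.
nearest = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
bicubic = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# Save the results for visual comparison.
cv2.imwrite("nearest_2x.png", nearest)
cv2.imwrite("bilinear_2x.png", bilinear)
cv2.imwrite("bicubic_2x.png", bicubic)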

2. Pre-Upsampling Super Resolution

[Figure 2: Pre-Upsampling Super Resolution]

In pre-upsampling super resolution, the input is first upscaled to the target size, generally using bicubic interpolation, and the enlarged image is then refined by convolution filtering. The pipeline consists of:

  • Patch extraction to extract local features.
  • Convolution blocks for non-linear mapping.
  • Reconstruction layers to form HR image.

As we can see from the example above, the low-resolution (LR) image first undergoes patch extraction, the process of extracting dense local features by convolving over the image. The convolution filters in the model then perform the non-linear mapping between the LR and HR feature spaces. Finally, the convolved patches are reconstructed, resulting in the high-resolution (HR) image.

Some common techniques used for pre-upsampling SR are:

1. SRCNN (Super-Resolution CNN)

  • One of the earliest CNN-based SR models.
  • Performs mapping from a bicubic-upscaled image to the HR output (a minimal sketch follows this list).

2. VDSR (Very Deep Super-Resolution)

  • Uses a very deep CNN.
  • Improves accuracy via residual learning.
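
As a rough sketch of the SRCNN idea, the Keras model below follows the paper's 9-1-5 layer design with 64 and 32 filters; it is an untrained skeleton operating on 3-channel images, not the published model or weights.

Python
import tensorflow as tf
from tensorflow.keras import layers

def build_srcnn():
    # Input is the bicubic-upscaled LR image, so input and output sizes match.
    inp = layers.Input(shape=(None, None, 3))
    x = layers.Conv2D(64, 9, padding="same", activation="relu")(inp)  # patch extraction
    x = layers.Conv2D(32, 1, padding="same", activation="relu")(x)    # non-linear mapping
    out = layers.Conv2D(3, 5, padding="same")(x)                      # reconstruction
    return tf.keras.Model(inp, out)

model = build_srcnn()
model.compile(optimizer="adam", loss="mse")  # trained with a pixel-wise L2 loss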

3. Post Upsampling Super Resolution

[Figure 3: Post-Upsampling Super Resolution]

Pre-upsampling forces the network to extract patches and convolve in high-resolution space, which is computationally expensive and can blur features that are crucial for further processing. Post-upsampling super resolution instead extracts features in low-resolution space and learns the upsampling itself at the end of the network.

  • Significantly reduces computation by replacing the predefined upsampling with end-to-end learnable layers.
  • The LR images are fed to the CNN as-is, without any initial increase in resolution.
  • Gives more control over the learned upsampling.
  • The CNN operates in LR space, so computation is much faster.
  • Removes the need for an initial interpolation step.

Some popular techniques used in post-upsampling SR are:

1. FSRCNN (Fast Super-Resolution Convolutional Neural Network)

  • Faster and lighter than SRCNN.
  • Learns upsampling using deconvolution layers.

2. ESPCN (Efficient Sub-Pixel Convolutional Neural Network)

  • Uses PixelShuffle (sub-pixel convolution) for high-quality upsampling; see the sketch after this list.
  • Real-time SR for video.
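
The heart of ESPCN is the sub-pixel (PixelShuffle) layer, which rearranges an H x W x (C·r²) feature map into an rH x rW x C image. A minimal sketch using TensorFlow's depth_to_space, which performs the same rearrangement:

Python
import tensorflow as tf

r = 2  # upscale factor
# A dummy feature map: batch 1, 8x8 spatial, 3*r*r channels.
features = tf.random.normal([1, 8, 8, 3 * r * r])

# Sub-pixel shuffle: (1, 8, 8, 12) -> (1, 16, 16, 3).
hr = tf.nn.depth_to_space(features, block_size=r)
print(hr.shape)  # (1, 16, 16, 3)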

Learning Techniques 

Super Resolution (SR) models make use of loss functions both to optimize the network and to measure its reconstruction error. Different loss functions trade off pixel accuracy against perceptual quality, and SR models often combine several of them; minimal TensorFlow versions follow the list below.

Some of the commonly employed Loss Functions are:

  • Pixel-Wise Loss: It measures the difference between each pixel of the real and generated high-resolution (HR) images. L1 loss takes the mean absolute difference, while L2 loss takes the mean squared difference between corresponding pixel values.
  • Content Loss: This is the Euclidean distance between high-level features of the generated image and of the target HR image, where the features are extracted with a pretrained network such as VGG or ResNet.
  • Adversarial Loss: This loss function is used to train the generator and discriminator models against each other. It is also called the GAN loss.
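
The sketch below gives minimal TensorFlow versions of these losses; feature_extractor stands for a pretrained network such as a truncated VGG and is an assumption here, not a library function.

Python
import tensorflow as tf

def l1_loss(y_true, y_pred):
    # Pixel-wise L1: mean absolute difference between corresponding pixels.
    return tf.reduce_mean(tf.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # Pixel-wise L2: mean squared difference between corresponding pixels.
    return tf.reduce_mean(tf.square(y_true - y_pred))

def content_loss(y_true, y_pred, feature_extractor):
    # Euclidean distance between high-level features (e.g. from a pretrained VGG).
    return tf.reduce_mean(tf.square(feature_extractor(y_true) - feature_extractor(y_pred)))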

Residual Networks

Residual Neural Networks (ResNets) are a form of artificial neural network built from skip-connected residual blocks. The ResNet design is widely used in super-resolution models, most notably through the SRResNet architecture.

EDSR (Enhanced Deep Residual Networks for Single Image Super-Resolution)

EDSR is an improved version of SRResNet designed for single-scale super-resolution tasks. It removes Batch Normalization (BN) layers to enhance performance, reduce memory usage and improve training efficiency. The model relies on residual blocks for effective feature learning.

  • No Batch Normalization layers, improving accuracy.
  • BN removal reduces memory usage by about 40%.
  • Leads to faster and more stable training.
  • Uses residual blocks for efficient feature extraction (a minimal block sketch follows the figure below).
[Figure 4: Comparison of SRResNet & EDSR]
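
A minimal sketch of an EDSR-style residual block: two convolutions with no batch normalization, plus the residual-scaling factor (0.1 in the paper) that stabilizes training of large models.

Python
import tensorflow as tf
from tensorflow.keras import layers

def edsr_res_block(x, filters=64, scaling=0.1):
    # Two 3x3 convolutions with a ReLU in between -- and no BatchNorm layers.
    res = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    res = layers.Conv2D(filters, 3, padding="same")(res)
    # Scale the residual before adding the identity skip connection.
    res = layers.Lambda(lambda t: t * scaling)(res)
    return layers.Add()([x, res])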

MDSR (Multi-scale Deep Super-Resolution system)

MDSR is an extension of EDSR that reconstructs high-resolution images at multiple scales in a single model. It has scale-specific input and output modules that produce the corresponding outputs at 2x, 3x and 4x resolution.

  • Larger kernels are used in the scale-specific pre-processing layers, which keeps the network simple while still attaining a high receptive field.
  • The residual blocks that follow the scale-specific pre-processing modules are shared across all resolutions.
  • Including the upsampling modules, MDSR reaches about 5 times the depth of a single-scale EDSR.
  • It gives results comparable to a combination of scale-specific EDSR models while using far fewer parameters (a toy sketch of the layout follows this list).
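
A toy sketch of the MDSR layout under these assumptions: a shared residual body with one sub-pixel upsampling head per scale, where the block count and filter sizes are illustrative rather than the paper's.

Python
import tensorflow as tf
from tensorflow.keras import layers

def res_block(x, filters=64):
    # Shared residual block (no BatchNorm, as in EDSR/MDSR).
    res = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    res = layers.Conv2D(filters, 3, padding="same")(res)
    return layers.Add()([x, res])

def upsample_head(x, scale):
    # Scale-specific head: expand channels, then sub-pixel shuffle to the target size.
    x = layers.Conv2D(64 * scale * scale, 3, padding="same")(x)
    x = layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return layers.Conv2D(3, 3, padding="same")(x)

inp = layers.Input(shape=(None, None, 3))
body = layers.Conv2D(64, 3, padding="same")(inp)
for _ in range(4):              # shared residual blocks (toy depth)
    body = res_block(body)
# One output per scale (2x, 3x, 4x), all sharing the same body.
outputs = [upsample_head(body, s) for s in (2, 3, 4)]
model = tf.keras.Model(inp, outputs)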

Other Network Designs

Apart from Residual Networks, these are some other Network Designs that can be used in designing SR models:

  • Recursive Network
  • Dense Connection Network
  • Group Convolution Network
  • Local Multi-path Network

However, Residual Networks are generally preferred because of the ready availability and stability of residual blocks.

Generative Models (GAN)

Generative adversarial networks (GANs) optimize for perceptual quality, producing images that are pleasant to the human eye, because humans do not distinguish images by per-pixel differences. Instead of optimizing only the pixel difference between the expected and output HR images, these networks also train a discriminator to judge how realistic the output looks. Some commonly used GAN architectures are SRGAN and ESRGAN.

1. SRGAN

SRGAN follows the standard GAN framework, consisting of a Generator and a Discriminator, and is primarily used for 4× upscaling tasks.

1. Uses a perceptual loss function, combining:

  • Adversarial loss: encourages the generator to produce realistic images that fool the discriminator.
  • Content loss: maintains similarity to the original high-resolution image.

2. The discriminator is trained to distinguish between generated (SR) and real HR images.

3. This approach pushes the model to generate more natural and visually appealing images (a minimal loss sketch follows the architecture figures below).

[Figure: Generator Network (SRGAN architecture)]
[Figure: Discriminator Network (SRGAN architecture)]
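
A minimal sketch of the generator's perceptual loss under these assumptions: vgg_features is a pretrained feature extractor and disc is the discriminator, both placeholders here; the 1e-3 weight on the adversarial term follows the SRGAN paper.

Python
import tensorflow as tf

def srgan_generator_loss(hr, sr, vgg_features, disc):
    # Content loss: MSE between VGG feature maps of real and generated HR images.
    content = tf.reduce_mean(tf.square(vgg_features(hr) - vgg_features(sr)))
    # Adversarial loss: reward SR images that the discriminator labels as real.
    fake_logits = disc(sr)
    adversarial = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.ones_like(fake_logits), logits=fake_logits))
    # Perceptual loss = content term + weighted adversarial term.
    return content + 1e-3 * adversarial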

2. ESRGAN (Enhanced SRGAN)

ESRGAN builds upon SRGAN with architectural and perceptual improvements, leading to more detailed and sharper images.

  • Introduces Residual-in-Residual Dense Blocks (RRDB) for deeper feature learning.
  • Removes batch normalization layers to reduce artifacts and improve generalization.
  • Uses Relativistic Discriminator, which compares real and fake images relatively rather than absolutely.
  • Enhances perceptual loss by using features extracted before activation layers in the network.

Overall, ESRGAN delivers sharper textures and more realistic visuals compared to SRGAN, setting a benchmark for deep learning–based super-resolution methods.
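
A minimal sketch of the relativistic average discriminator loss mentioned above: rather than asking whether an image is real in absolute terms, it asks whether a real image looks more realistic than the average generated one. Here real_logits and fake_logits stand for raw discriminator outputs and are assumptions.

Python
import tensorflow as tf

def relativistic_d_loss(real_logits, fake_logits):
    # Compare each real image against the average fake, and vice versa.
    real_vs_fake = real_logits - tf.reduce_mean(fake_logits)
    fake_vs_real = fake_logits - tf.reduce_mean(real_logits)
    loss_real = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.ones_like(real_vs_fake), logits=real_vs_fake)
    loss_fake = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.zeros_like(fake_vs_real), logits=fake_vs_real)
    return tf.reduce_mean(loss_real) + tf.reduce_mean(loss_fake)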

Super Resolution Model using Deep Learning

The code given below demonstrates the conversion of a low-resolution (LR) image to a high-resolution (HR) image using the pre-trained ESRGAN Super-Resolution (SR) model.

Step 1: Configuration + helpers

  • Sets the script configuration variables (input/output names, model URL, options).
  • USE_SAMPLE_IMAGE: if True, the code downloads SAMPLE_IMAGE_URL and uses it as INPUT_FILENAME.
  • PAD_TO_MULTIPLE_OF_4: ESRGAN needs H and W divisible by 4; we can choose to crop (default) or pad to satisfy that.
  • SIMULATE_LOWRES and SIMULATE_SCALE: create a pixelated LR input for testing (optional).
  • Imports required libraries (PIL, numpy, cv2, matplotlib).
  • pil_to_np_uint8(img): converts a PIL image to a clean HxWx3 uint8 NumPy array and strips alpha channel if present.
  • crop_or_pad_to_multiple_of_4(img_np, pad=False): enforces dimensions divisible by 4 by cropping (or padding with reflection if pad=True).
Python
import urllib.request
import io
import os
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, ImageFilter
# Configuration: file names, model URL and behaviour flags.
INPUT_FILENAME = "input.png"
OUTPUT_SR = "SR_output.png"
OUTPUT_BIC = "SR_bicubic.png"
OUTPUT_BIC_SHARP = "SR_bicubic_sharp.png"
MODEL_URL = "https://tfhub.dev/captain-pool/esrgan-tf2/1"
UPSCALE_FALLBACK = 4
PAD_TO_MULTIPLE_OF_4 = False   # crop (False) or reflect-pad (True) to a multiple of 4
SIMULATE_LOWRES = False        # set True to create a pixelated test input
SIMULATE_SCALE = 0.2
SHOW_PREVIEW = True
USE_SAMPLE_IMAGE = False       # if True, download SAMPLE_IMAGE_URL instead of uploading
SAMPLE_IMAGE_URL = ""          # set to the URL of a sample image before enabling the flag above

def download_image(url, path):
    # Fetch a remote image and save it locally.
    urllib.request.urlretrieve(url, path)
    print("Downloaded:", url, "->", path)

def pil_to_np_uint8(img):
    # Convert a PIL image to a clean HxWx3 uint8 array.
    arr = np.array(img)
    if arr.dtype != np.uint8:
        arr = arr.astype(np.uint8)
    if arr.ndim == 2:
        arr = np.stack([arr] * 3, axis=-1)  # grayscale -> 3 channels
    if arr.shape[2] == 4:
        arr = arr[..., :3]                  # RGBA -> RGB (strip alpha)
    return arr

def crop_or_pad_to_multiple_of_4(img_np, pad=False):
    # ESRGAN expects H and W divisible by 4: crop by default, reflect-pad if pad=True.
    h, w = img_np.shape[:2]
    h4, w4 = h - (h % 4), w - (w % 4)
    if h4 == h and w4 == w:
        return img_np
    if pad:
        pad_h = (4 - (h % 4)) % 4
        pad_w = (4 - (w % 4)) % 4
        return cv2.copyMakeBorder(img_np, 0, pad_h, 0, pad_w, cv2.BORDER_REFLECT)
    else:
        return img_np[:h4, :w4]

Step 2: Load Image

  • Uses Colab’s files.upload() to let you upload an image from your computer (PNG/JPG).
  • If USE_SAMPLE_IMAGE is True or the INPUT_FILENAME file is missing, it downloads the sample image.
  • After upload/download it opens the image with PIL, converts to RGB and prints its size.
  • Shows a small preview (256×256) so you can confirm the image loaded correctly.
Python
from google.colab import files
from IPython.display import display  # makes display() explicit outside Colab's defaults
if USE_SAMPLE_IMAGE:
    download_image(SAMPLE_IMAGE_URL, INPUT_FILENAME)
else:
    print("Please upload an image file (PNG/JPG). After uploading, set INPUT_FILENAME to the uploaded filename if different.")
    uploaded = files.upload()
    if uploaded:
        fname = list(uploaded.keys())[0]
        print("Uploaded:", fname)
        INPUT_FILENAME = fname
from PIL import Image
img = Image.open(INPUT_FILENAME).convert("RGB")
print("Loaded:", INPUT_FILENAME, "size:", img.size)
display(img.resize((256, 256)))

Output:

[Screenshot: 256x256 preview of the loaded input image]

Step 3: Load ESRGAN model and Run Inference

  • Imports TensorFlow + TF-Hub and prints TF version so you can confirm the runtime has TF installed.
  • GPU handling: checks for GPUs and enables memory growth so TensorFlow won’t pre-allocate the entire GPU. This reduces OOM crashes.
  • Loads the ESRGAN TF-Hub model from MODEL_URL; the first run downloads the model and may take time depending on your internet speed.
  • Runs inference: sr_tensor = model(inp), then squeezes the batch dimension, clamps values into [0, 255] and converts to uint8.
  • Saves SR output as OUTPUT_SR and optionally displays side-by-side preview (original vs ESRGAN result).
Python
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image
print("TensorFlow version:", tf.__version__)
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for g in gpus:
            tf.config.experimental.set_memory_growth(g, True)
        print("GPU(s) present. Memory growth enabled.")
    except Exception as e:
        print("Couldn't enable memory growth:", e)
else:
    print("No GPU detected, running on CPU.")
print("Loading ESRGAN model from TF-Hub (this will download the model once)...")
model = hub.load(MODEL_URL)
print("Model loaded.")
pil = Image.open(INPUT_FILENAME).convert("RGB")
if SIMULATE_LOWRES:
    w, h = pil.size
    small = pil.resize((max(16, int(w * SIMULATE_SCALE)),
                       max(16, int(h * SIMULATE_SCALE))), resample=Image.BILINEAR)
    pil = small.resize((w, h), resample=Image.NEAREST)
    print("Simulated low-res input created.")

lr_np = pil_to_np_uint8(pil)
lr_np = crop_or_pad_to_multiple_of_4(lr_np, pad=PAD_TO_MULTIPLE_OF_4)
inp = tf.expand_dims(tf.cast(lr_np, tf.float32), axis=0)
print("Running model inference (this may take a while the first time)...")
sr_tensor = model(inp)
sr = tf.squeeze(sr_tensor)
sr = tf.clip_by_value(sr, 0.0, 255.0)
sr_uint8 = tf.cast(sr, tf.uint8).numpy()
sr_pil = Image.fromarray(sr_uint8)
sr_pil.save(OUTPUT_SR)
print("Saved ESRGAN output to:", OUTPUT_SR
if SHOW_PREVIEW:
    import matplotlib.pyplot as plt
    fig, axs = plt.subplots(1, 2, figsize=(12, 6))
    axs[0].imshow(Image.open(INPUT_FILENAME).convert("RGB"))
    axs[0].set_title("Input")
    axs[0].axis("off")
    axs[1].imshow(sr_pil)
    axs[1].set_title("ESRGAN Output")
    axs[1].axis("off")
    plt.show()

Output:

[Screenshot: side-by-side comparison of the input and the ESRGAN output]

We can see that the model improved the quality of the image, and with further fine-tuning we could make the results even better.

