
BACHELOR OF TECHNOLOGY

IN
ELECTRONICS AND COMMUNICATION ENGINEERING

Submitted By

G. NANDINI REDDY 20911A0470


K. MEGHA REDDY 20911A0483
A. MAHENDAR REDDY 20911A0493
N. CHAITANYA 20911A0499
Under the Esteemed Guidance of,

Dr. C. Rajakumar

Professor

Department of Electronics and Communication


Engineering,

VIDYA JYOTHI INSTITUTE OF TECHNOLOGY


(An Autonomous Institute)
(Accredited by NAAC & NBA, Approved by A.I.C.T.E., New Delhi, Permanently Affiliated to JNTU, Hyderabad)
(Aziz Nagar, C.B.Post, Hyderabad -500075)

2020-2024

(Accredited by NAAC, Approved by AICTE New Delhi & Permanently Affiliated to JNTUH)
Aziz Nagar Gate, C.B. Post, Hyderabad-500 075

Department of Electronics and Communication Engineering


(Accredited by NBA)

CERTIFICATE
This is to certify that the project report titled “MULTI FOCUS IMAGE FUSION” is
being submitted by G. NANDINI REDDY (20911A0470), K. MEGHA REDDY
(20911A0483), A. MAHENDAR REDDY (20911A0493), N. CHAITANYA
(20911A0499) of IV B.Tech I Semester of Electronics & Communication
Engineering, and is a record of bonafide work carried out by them. The results
embodied in this report have not been submitted to any other university for the
award of any degree.

INTERNAL GUIDE HEAD OF THE DEPARTMENT


Dr. C. Rajakumar Dr. M. Rajendra Prasad
Professor Professor
Department of ECE Department of ECE

EXTERNAL EXAMINER

DECLARATION

This is to certify that the work reported in the present project entitled MULTI
FOCUS IMAGE FUSION is a record of work done by us in the Department of
Electronics and Communication Engineering, Vidya Jyothi Institute of Technology,
Jawaharlal Nehru Technological University, Hyderabad. The reports are based on the
project work done entirely by us and not copied from any other source.

PROJECT ASSOCIATES

Ms. GOURARAM NANDINI REDDY 20911A0470


Ms. KOTHUR MEGHA REDDY 20911A0483
Mr. ARDA MAHENDAR REDDY 20911A0493
Mr. NIMMALURI CHAITANYA 20911A0499

ACKNOWLEDGEMENT

We would like to express our sincere gratitude to Dr. C. RAJAKUMAR,


Project guide who has guided and supported us through every stage in the project.

We are really grateful to Dr. S. TULASI PRASAD, M.E., Ph.D., Project
Coordinator and Professor, ECE Department, for his timely and much needed
valuable guidance throughout our study.

We are really grateful to Dr. M. RAJENDRA PRASAD, M.E., Ph.D.,
Professor & HOD, ECE Department, Vidya Jyothi Institute of Technology, for his
timely and much needed valuable guidance throughout our study.

We express our heartfelt gratitude to Dr. A. PADMAJA, M.Tech., Ph.D., Dean
(Accreditation & Ranking), Vidya Jyothi Institute of Technology, for giving us
spontaneous encouragement in completing the project.

We thank Dr. E. SAIBABA REDDY, M.E., Ph.D., Principal, Vidya Jyothi


Institute of Technology for encouraging us in the completion of our project.

It is our privilege to express our gratitude and indebtedness to Dr. PALLA
RAJESHWAR REDDY, M.Sc., Ph.D., Secretary & Correspondent of Vidya Jyothi
Institute of Technology, for his moral support.

We express our heartfelt thanks to the staff of the Electronics and Communication
Engineering Department, Vidya Jyothi Institute of Technology, for helping us carry
out our project successfully.

PROJECT ASSOCIATES

G. NANDINI REDDY 20911A0470


K. MEGHA REDDY 20911A0483
A. MAHENDAR REDDY 20911A0493
N. CHAITANYA 20911A0499

ABSTRACT

Multi Focus Image Fusion (MFIF) is a data fusion technique in which images are the
primary research objects. It covers methods that integrate multiple images of the same
scene acquired from multi-sensor image data, as well as multiple images of the same
scene captured at different settings by a single image sensor. Image fusion is used to
enhance the visual interpretation of images in diverse applications: it combines the
essential features of two or more images into a single image without introducing
artefacts. Multi-focus image fusion, in particular, is the process by which two or more
images are combined into a single image that retains the important features of each of
the original images. It has been surveyed that MFIF can automatically achieve high
image quality.

TABLE OF CONTENTS

CHAPTER TITLE PAGE NO


TITLE PAGE i
CERTIFICATION ii
DECLARATION iii
ACKNOWLEDGEMENT iv
ABSTRACT v
1 INTRODUCTION 1
1.1 Introduction 1
1.2 Objectives 4
2 LITERATURE SURVEY 5
2.1 Literature survey table 6
3 OVERVIEW OF MULTI FOCUS IMAGE FUSION 7
3.1 Domains under multi focus image fusion 7
3.1.1 Spatial domain
3.1.2 Transform domain
3.2 Drawbacks of existing methods 9
3.3 Proposed method 10
3.3.1 Convolution neural networks 10
4 ARCHITECTURE OF CNN 13
5 SOURCE CODE 16
6 RESULT 23
7 REFERENCES 26
8 CONCLUSION 27

TABLE NO    TABLE                           PAGE NO
2.1         Literature survey table         6

FIGURE NO   FIGURE                          PAGE NO
1           Stages of CNN                   12
2           Architecture of CNN model       13
3           Convolutional layering          14
4           Types of pooling layers         14
5           Fully connected layers          15
6           Input 1 (children_1.tif)        24
7           Input 2 (children_2.tif)        24
8           Process of fusion               25
9           Output (fused image)            25

CHAPTER-1

INTRODUCTION

1.1 INTRODUCTION

Imaging cameras, particularly those with long focal lengths, usually have only a
finite depth of field. In an image captured by those cameras, only those objects within
the depth of field of the camera are focused, while other objects are blurred. To obtain
an image that is in focus everywhere, i.e., an image with every object in focus, usually
we need to fuse the images taken from the same view point under different focal
settings.

The aim of image fusion is to integrate complementary and redundant information


from multiple images to create a composite that contains a ‘better’ description of the
scene than any of the individual source images. Image fusion plays important roles in
many different fields such as remote sensing, biomedical imaging, computer vision and
defence systems. Multi-focus image fusion is a key research field of image fusion.

Image fusion is the process of combining related data from two or more images into a
single image. Typically, different images of the same scene from a visual sensor system
are fused to create a single composite image. Fusion removes redundant data from the
input images and highlights the useful information and essential features in the fused
image without introducing inconsistencies. In many conditions a single image cannot
keep every object in the scene in focus, so the multi-focus image fusion method is used:
several images of the scene, each focused on different objects and possibly captured
with different sensors, are fused to create a resulting image in which all objects in the
scene are in focus. It has been surveyed that MFIF can automatically achieve high
image quality.

Digital cameras, such as digital single-lens reflex cameras, capture images through
photographic lenses that have a limited depth of focus. As a result, certain parts of the
image are clear and focused while others are unfocused: the subject under a particular
focal setting appears sharp while the rest seems blurred. To bring all parts of a scene
into focus, one would need a picture with a considerable depth of focus. Therefore, the
image fusion approach is adopted to integrate the input images into a single result with
better visibility and quality and with the entire scene in focus.

Multi focus image fusion (MFIF) is a renowned approach that captures all-in-
focused images taken from the same scene having different focal settings. It is also
valuable for other fields like multi-modal and visible infrared image fusion as multi-
focus images are highly used. Presently, numerous MFIF methods exist that can be
categorized as spatial domain and transform domain.

However, researchers mostly prefer transform domain methods because they offer an
intuitive strategy for solving the fusion task. It is a three-step approach:

1. Convert an image to the transform domain.

2. Then, fusion rules are utilized for the integration of the decomposed coefficients to
generate composite coefficients.

3. Lastly, convert these coefficients to a spatial domain for producing a final fused image.
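
As a rough illustration of these three steps only, the sketch below performs a single-level wavelet decomposition, applies simple fusion rules, and inverts the transform; it assumes MATLAB's Wavelet and Image Processing Toolboxes, grayscale inputs of equal size, and placeholder file names, and is not the method proposed in this report.

% Minimal three-step transform-domain fusion sketch (illustrative assumptions only).
A = im2double(imread('near_focus.png'));   % placeholder file names
B = im2double(imread('far_focus.png'));

% Step 1: convert both images to the transform (wavelet) domain
[cA1,cH1,cV1,cD1] = dwt2(A,'db2');
[cA2,cH2,cV2,cD2] = dwt2(B,'db2');

% Step 2: fusion rules - average the approximation band, keep the detail
% coefficient with the larger magnitude (assumed to come from the focused image)
cA = (cA1 + cA2)/2;
cH = cH1; m = abs(cH2) > abs(cH1); cH(m) = cH2(m);
cV = cV1; m = abs(cV2) > abs(cV1); cV(m) = cV2(m);
cD = cD1; m = abs(cD2) > abs(cD1); cD(m) = cD2(m);

% Step 3: convert the composite coefficients back to the spatial domain
F = idwt2(cA,cH,cV,cD,'db2');
figure; imshow(F); title('Wavelet-domain fusion (sketch)');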

Some instances of transform domains include pyramid-based approaches such as
Laplacian and morphological pyramids. Further transform domain approaches include
the wavelet transform, gradient and sparse representation methods, the discrete cosine
transform, and the neutrosophic set and stationary wavelet transform (NSWT). The
critical issue of these methods is their total dependency on hand-designed
transformations for the localization of reliable features.

Multi-focus image fusion is a technique used in image processing to enhance the


overall quality and information content of an image by combining multiple images with
different focus levels. In many real-world scenarios, capturing an object or scene in a
single image with all elements in perfect focus is challenging, if not impossible. This
problem is particularly common in macro photography, microscopy, and other
applications where precise focus is crucial.

The main goal of multi-focus image fusion is to create a single composite image
that incorporates the focused regions from each input image, resulting in an image that
is sharper and clearer than any individual input. This process helps to overcome the
limitations of a single image and provides a more comprehensive representation of the
scene.

The process of multi-focus image fusion involves several key steps, each
contributing to the creation of a final image that represents the focused regions from
each input image. The first step is the acquisition of multiple images of the same scene
or object, each captured with a different focus setting. This can be achieved by
adjusting the camera’s focus manually or by using techniques like focus stacking, which
involves capturing a series of images with different focus points and combining them
later.

Once the images are acquired, they typically undergo preprocessing to improve their
quality and ensure proper alignment. Preprocessing steps may include noise reduction,
contrast adjustment, and image registration to align the images correctly, as
misalignment can adversely impact the fusion process.

Feature extraction is a crucial step in multi-focus image fusion, where relevant
features such as edges, textures, and other details are identified in each input image.
This information is then used in the decision-making process, where algorithms
determine which pixels or regions from each input image should be included in the
final fused image. This decision-making step plays a vital role in ensuring that the fused
image retains the most important information from each input.

One notable advancement in multi-focus image fusion involves the integration of


convolutional neural networks (CNNs). CNNs are a class of deep learning models
designed for processing visual data, and their ability to automatically learn hierarchical
representations makes them well-suited for image fusion tasks. In the context of multi-
focus image fusion, CNNs are trained to recognize and extract relevant features from
images, optimizing the fusion process and improving the overall quality of the
composite image.

The fusion process itself involves combining the selected information from each
input image to generate a single composite image. Various mathematical operations,
such as weighted averaging or transforms, can be employed for this purpose. Post-
processing steps may follow, including sharpening, colour correction, or additional
adjustments to further enhance the quality of the fused image.
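
As a very small illustration of this combination-plus-post-processing step, the sketch below uses a per-pixel weighted average followed by optional sharpening; the uniform weight map and the file names are placeholders (a real system would derive the weights from a focus measure), and the Image Processing Toolbox is assumed.

% Weighted-average fusion with optional post-processing (illustrative only).
A = im2double(imread('near_focus.png'));   % placeholder file names, grayscale assumed
B = im2double(imread('far_focus.png'));
w = 0.5*ones(size(A));                     % uniform per-pixel weights, shown for illustration
F = w.*A + (1 - w).*B;                     % weighted averaging of the two inputs
F = imsharpen(F);                          % post-processing step: mild sharpening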

1.2 OBJECTIVES

The purpose of image fusion is to combine information from multiple images of
the same scene into a single image that ideally contains all the important features from
each of the original images. The resulting fused image will thus be more suitable for
human and machine perception and for further image processing tasks. The specific
objectives of this project are:

 To investigate and analyse existing methods for multi-focus image fusion and focus
region detection.
 To develop a guided filter-based approach for preserving details and reducing
artifacts in the fused image.
 To implement a focus region detection technique for accurately identifying the
focused regions in the input images.
 To generate a weight map that combines the focus regions and the detail layers to
guide the fusion process (a rough sketch of this idea follows the list).
 To evaluate the proposed method using objective metrics and comparative analysis
with existing fusion techniques.
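
The sketch below only approximates the guided-filter and weight-map objectives above: it uses a crude Laplacian-based focus measure, the Image Processing Toolbox function imguidedfilter, and placeholder file names, so it should be read as an illustrative outline rather than the proposed pipeline.

% Focus-region detection + guided-filter-refined weight map (rough sketch).
A = im2double(imread('near_focus.png'));        % placeholder file names, grayscale assumed
B = im2double(imread('far_focus.png'));
lap  = fspecial('laplacian');
fmA  = imfilter(A, lap, 'replicate');           % crude focus measure: Laplacian response
fmB  = imfilter(B, lap, 'replicate');
W    = double(abs(fmA) > abs(fmB));             % binary weight map (1 where A looks sharper)
W    = imguidedfilter(W, A);                    % edge-aware refinement with A as the guide
F    = W.*A + (1 - W).*B;                       % detail-preserving weighted fusion
figure; imshow(F); title('Guided-filter weight-map fusion (sketch)');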

CHAPTER-2

LITERATURE SURVEY

Liu et al. were the first authors to introduce CNNs into the domain of MFIF, and their
method produced good results. The critical drawback was the use of only reference-based
metrics to validate the approach, and the results were not fully satisfactory. After that, Du et
al. introduced the deep support value CNN architecture. They employed the structural risk
minimization loss function to train the CNN, and max-pooling layers were replaced by
convolutional layers to avoid information loss. Finally, Lai et al. presented the multi-scale
visual attention deep CNN (MADCNN) approach, developing two different units to achieve
MFIF.

Amin-Naji et al. designed an ensemble CNN (ECNN) using three CNNs. The training
used three datasets to obtain the initial segmented decision maps. Although the obtained
fused images required no post-processing, the model evaluation uses only reference metrics,
significantly limiting its usefulness. Wang et al. integrated deep convolutional networks and
patch-based reconstruction approaches to achieve MFIF. The drawback was its inability to
fuse areas of the images with overlapped background and foreground. Moreover, a
post-processing strategy had to be applied to refine the decision maps.

Gai et al. designed a two-stage CNN-based MFIF method. First, a DenseNet generated a
fused image containing blurring artifacts. Then, edge-deblurring generative adversarial
networks (EDGAN) were employed to remove the blocking effect. Although the model
attained better results than representative approaches, its complexity made the fusion process
cumbersome to carry out.

Finally, Zhang et al. presented the CNN-based transform domain image fusion framework
known as IFCNN. The developed model fused different image types such as multi-focus,
visible-infrared, multi-modal and multi-exposure images. However, the comparative
performance analysis with existing methods indicated that the performance of their model
needs to be improved to obtain a better quality image.

2.1 LITERATURE SURVEY TABLE

The multi-focus image fusion technique can effectively solve the problem of optical
lens depth of field, making two or more partially focused images fuse into a fully focused
image.

S.NO  TITLE                                                                     AUTHOR                  YEAR

1     Combination of images with diverse focuses using the spatial frequency    S. Li, J. Kwok          2001

2     Transform domain image fusion-based CNN known as IFCNN                    Zhang et al.            2020

3     ImageNet classification with deep convolution neural networks             G. Hinton               2012

4     Integrated recognition, localization and detection using convolution      P. Sermanet, D. Eigen   2014
      networks

CHAPTER-3

OVERVIEW OF MULTI FOCUS IMAGE FUSION

Multi-focus image fusion is a multiple image compression technique using input


images with different focus depths to make one output image that preserves all information.

In recent years, image fusion has been used in many applications such as remote
sensing, surveillance, medical diagnosis, and photography applications. Two major
applications of image fusion in photography are fusion of multi-focus images and multi-
exposure images.

The main idea of image fusion is to gather the important and essential information from
the input images into one single image which ideally holds all of the information of the input
images. The research history of image fusion spans over 30 years and many scientific
papers. Image fusion generally has two aspects: image fusion methods and objective
evaluation metrics.

In visual sensor networks (VSN), the sensors are cameras which record images and video
sequences. In many applications of VSN, a single camera cannot give a perfect illustration
including all details of the scene. This is because of the limited depth of focus of the optical
lens of a camera. Therefore, only the objects located at the focal distance of the camera are
focused and clear, while the other parts of the image are blurred.

VSN captures images with different depths of focus using several cameras. Due to the
large amount of data generated by cameras compared to other sensors such as pressure and
temperature sensors and some limitations of bandwidth, energy consumption and processing
time, it is essential to process the local input images to decrease the amount of transmitted
data.

Much research on multi-focus image fusion has been done in recent years and can be
classified into two categories: transform and spatial domains. Commonly used transforms for
image fusion are Discrete cosine transform (DCT) and Multi-Scale Transform (MST).
Recently, Deep learning (DL) has been thriving in several image processing and computer
vision applications.

3.1 DOMAINS UNDER MULTI FOCUS IMAGE FUSION

Multi focus image fusion (MFIF) is a renowned approach that captures all-in-focused
images taken from the same scene having different focal settings. It is also valuable for other
fields like multi-modal and visible infrared image fusion as multi-focus images are highly
used. Presently, numerous MFIF methods exist that can be categorized as spatial domain and
transform domain. However, researchers mostly prefer to transform domain methods due to
their intuitive strategy to solve the task of fusion.

Some instances of transform domains include pyramid-based approaches such as


Laplacian and morphological pyramids. More transform domain approaches include wavelet
transform, gradient, sparse representation, discrete cosine transform and neutrosophic set and
stationary wavelet transform (NSWT). The critical issues of these methods are their total
dependency on human-based transformations for the localization of reliable features.
Moreover, these methods used hand-crafted features to represent images, which ultimately degraded the
image quality.

There are some existing methods of Multi Focus Image Fusion. They are:

 Spatial domain
 Pixel based fusion
 Feature based fusion
 Decision based fusion
 Transform domain
 Wavelet based fusion
 Curvelet based fusion
 DCT based fusion

3.1.1 Spatial domain

The spatial domain, in a broader sense, is a term used in various fields of science and
engineering, including image processing, signal processing, and physics. It refers to the
domain or realm in which data is represented in its original form, typically in a two- or three-
dimensional space.

Types of spatial domain:

 Pixel-based fusion: This operates directly on the fundamental representation of an

image, where each element is a pixel. A pixel is the smallest unit of an image, and it
is typically represented as a square or rectangular area in a regular grid, with each
pixel having a specific location and associated pixel values that determine its colour
or intensity. A minimal block-level sketch of this idea is given after this list.

 Feature-based fusion: This refers to a representation of data in which the focus is on

specific features or characteristics extracted from the original data. It is often used
in image processing and computer vision to describe data that has undergone
feature extraction, where the primary interest lies in relevant attributes or patterns
rather than the raw pixel values.

 Decision-based fusion: This refers to the process of combining multiple pieces of

information or decisions, often derived from different sources or sensors, to make a
final decision or inference in the spatial domain. It can be employed in various
applications, including image analysis, object recognition, and pattern
classification.
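
The block-level sketch referenced above is given here; it keeps whichever block has the higher local variance, a simple spatial-domain rule. The 16x16 block size, the variance criterion, and the file names are illustrative assumptions, and grayscale inputs of equal size are assumed.

% Block-based spatial-domain fusion sketch (illustrative assumptions only).
A = im2double(imread('near_focus.png'));         % placeholder file names, grayscale assumed
B = im2double(imread('far_focus.png'));
F = zeros(size(A));
bs = 16;                                         % block size
for r = 1:bs:size(A,1)
    for c = 1:bs:size(A,2)
        rr = r:min(r+bs-1, size(A,1));
        cc = c:min(c+bs-1, size(A,2));
        blkA = A(rr,cc); blkB = B(rr,cc);
        if var(blkA(:)) >= var(blkB(:))          % local variance as a crude focus measure
            F(rr,cc) = blkA;                     % keep the block judged to be sharper
        else
            F(rr,cc) = blkB;
        end
    end
end
figure; imshow(F); title('Block-based spatial-domain fusion (sketch)');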

3.1.2 Transform domain

The transform domain is a concept used in various fields, including signal processing,
image processing, and data analysis. It refers to a space or representation in which the
original data is transformed or converted into a different domain, typically to extract or
emphasize specific information, simplify analysis, or enable various processing techniques.
The transformation often involves mathematical operations such as the Fourier transform,
the wavelet transform, or other domain-specific transformations.

 Wavelet based fusion: Wavelet-based fusion schemes are extensions of the high-pass
filter method, which makes use of the idea that spatial detail is contained in high
frequencies. In wavelet-based fusion schemes, detail information is extracted from
the panchromatic (PAN) image using wavelet transforms and injected into the
multispectral (MS) image.

 Curvelet based fusion: The curvelet transform is a powerful tool that can capture
details along curved edges in images. It is useful for feature extraction and pattern
recognition. The curvelet transform is also efficient in image denoising, i.e., the
removal of noise signals from an image.

 DCT based fusion: DCT stands for Discrete Cosine Transform, a fast Fourier-related
transform which maps real signals to corresponding values in the frequency domain;
it underlies the widely used JPEG lossy compression algorithm. In DCT based
fusion, the fusion rule is applied to the DCT coefficients of the source images; a
block-wise sketch is given after this list.
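
The block-wise DCT sketch referenced above is shown below; it assumes the Image Processing Toolbox (dct2/idct2), grayscale inputs whose dimensions are multiples of 8, placeholder file names, and an AC-energy selection rule that is only one illustrative choice among many.

% Block-wise DCT fusion sketch: keep the 8x8 block with the larger AC energy.
A = im2double(imread('near_focus.png'));   % placeholders; size a multiple of 8 assumed
B = im2double(imread('far_focus.png'));
F = zeros(size(A));
for r = 1:8:size(A,1)
    for c = 1:8:size(A,2)
        Da = dct2(A(r:r+7, c:c+7));        % transform both blocks
        Db = dct2(B(r:r+7, c:c+7));
        acA = sum(Da(:).^2) - Da(1,1)^2;   % energy of all coefficients except the DC term
        acB = sum(Db(:).^2) - Db(1,1)^2;
        if acA >= acB                      % higher AC energy used as a sharpness proxy
            F(r:r+7, c:c+7) = idct2(Da);
        else
            F(r:r+7, c:c+7) = idct2(Db);
        end
    end
end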

3.2 DRAWBACKS OF EXISTING METHODS

To address the above problems, researchers have also put enormous effort into spatial
domain methods, in which images are not converted to other representations. These methods
are block- or region-based and pixel-based approaches. Region-based approaches such as
boosted random walk (BRW) and content adaptive blurring (CAB) partition the input images
into regions or blocks based on some rules and identify the focused areas using cues such as
gradients, variance and spatial details.

Then, these focused regions are combined to create an enhanced single focused image,
but this process can introduce blurriness. In pixel-based approaches such as guided filtering,
by contrast, the composite image is produced by averaging the pixel coefficients. Although
this procedure is simple and less time-intensive, the final image suffers from ghosting
artifacts. Prior studies developed numerous methods by manually designing all fusion rules,
but such hand-crafted feature extraction is time-consuming, requires more human effort and
is less efficient than automatically learned methods. Fortunately, deep learning techniques
have recently been thriving in numerous computer vision applications, including focus and
defocus blur detection, owing to their powerful self-learning capability, which avoids the
design of hand-crafted features and is therefore less prone to errors.

3.3 PROPOSED METHOD

MFIF aims to produce an all-in-focus image by fusing multiple partially focused images
of the same scene. Normally, MFIF is solved by combining focused regions with some fusion
rules. The key task in MFIF is thus the identification of focused and defocused areas, which is
normally formulated as a classification problem. Various focus measurements (FM) were
designed to classify whether a pixel is focused or defocused. For example, Zhai et al. used
the energy of Laplacian to detect the focus level of source images. Tang et al. proposed a
pixel-wise convolutional neural network (p-CNN), a learned FM that can recognize the
focused and defocused pixels.
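
For comparison with such learned focus measures, a hand-crafted energy-of-Laplacian rule like the one mentioned above can be sketched as follows; the 9x9 averaging window, the file names, and the simple per-pixel selection are illustrative assumptions and are not part of the CNN-based method proposed in this report.

% Energy-of-Laplacian focus measure and a simple per-pixel decision map (sketch).
A = im2double(imread('near_focus.png'));             % placeholder file names, grayscale assumed
B = im2double(imread('far_focus.png'));
lap  = fspecial('laplacian');
eolA = imfilter(imfilter(A, lap).^2, ones(9)/81);    % local energy of Laplacian of A
eolB = imfilter(imfilter(B, lap).^2, ones(9)/81);    % local energy of Laplacian of B
D    = double(eolA > eolB);                          % 1 where A is judged to be in focus
F    = A.*D + B.*(1 - D);                            % pixel-wise selection of focused content
figure; imshow(F); title('Energy-of-Laplacian fusion (sketch)');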

3.3.1 Convolution neural network (CNN):

A convolutional neural network (CNN) is a type of deep learning neural network

specially designed for processing structured grid data such as images, video, and other 2D
data. CNNs are particularly effective in tasks like image recognition, object detection, and
image classification. They have become a fundamental technology in computer vision and
have various applications in other fields as well.

In this work, a CNN model is adopted to extract image features for fusion, which improves
the accuracy of the fused picture. A CNN is a kind of network architecture for deep learning
algorithms and is specifically used for image recognition and image processing. It was first used
to recognize digits and letters, and is now used for general image processing and object
detection. To identify patterns within an image, a CNN leverages principles from linear algebra
such as matrix multiplication. CNNs can also classify audio and signal data.

Convolutional Neural Network, also known as CNN or Convnet, is a class of neural


networks that specializes in processing data that has a grid-like topology, such as an image. A
digital image is a binary representation of visual data. It contains a series of pixels arranged in
a grid-like fashion that contains pixel values to denote how bright and what colour each pixel
should be.

The human brain processes a huge amount of information the second we see an image.
Each neuron works in its own receptive field and is connected to other neurons in a way that
they cover the entire visual field. Just as each neuron responds to stimuli only in the restricted
region of the visual field called the receptive field in the biological vision system, each
neuron in a CNN processes data only in its receptive field as well. The layers are arranged in
such a way that they detect simpler patterns first (lines, curves, etc.) and more complex patterns
(faces, objects, etc.) further along. By using a CNN, one can give computers the ability to see.

CNNs have shown remarkable performance in various computer vision tasks, and their
architecture has also been adapted for other domains, including natural language processing
(NLP) and speech recognition. They are a fundamental building block in modern deep
learning and artificial intelligence applications. Figure 1 below shows the stages of a CNN:
how the defocused input images are fused to form a fully focused image.

A Convolutional Neural Network (CNN) is a specialized type of artificial neural network


designed for processing and analysing grid-like data, particularly images and videos. CNNs

have revolutionized computer vision and image processing tasks. Their fundamental
architecture consists of convolutional layers, pooling layers, and fully connected
layers.

In CNNs, convolutional layers apply learnable filters to input data, enabling the network
to automatically detect features like edges, textures, and more complex patterns. Pooling layers
reduce the spatial dimensions of feature maps while preserving important information. These
layers are crucial for handling varying scale and position of features.

One notable feature of CNNs is parameter sharing, where the same set of weights and
biases is used for different parts of the input data. This allows CNNs to recognize patterns
throughout the entire input, making them effective for image analysis.

Activation functions like ReLU introduce non-linearity, enabling CNNs to capture


complex relationships. CNNs can be deep, with many layers that automatically learn
hierarchical features, contributing to their success. CNNs are widely used in image
classification, object detection, image segmentation, and various computer vision tasks.
They’ve also found applications in other domains, such as natural language processing and
speech recognition, making them a foundational technology in modern deep learning and AI.

CNNs are a type of multi-layer neural network meant to discern visual patterns from
pixel images. In a CNN, ‘convolution’ refers to the mathematical operation: a type of
linear operation in which two functions are multiplied to create a third function that
expresses how the shape of one function is changed by the other. In simple terms, two images
that are represented in the form of two matrices are combined to provide an output that is
used to extract information from the image. A CNN is similar to other neural networks, but
because it uses a sequence of convolutional layers, it adds a layer of complexity to the
architecture. A CNN cannot function without convolutional layers.

CHAPTER-4

ARCHITECTURE OF CNN

Fig 2. Architecture of CNN model

Convolutional neural network (CNN) is a network architecture for deep learning which learns
directly from data. CNNs are particularly useful for finding patterns in images to recognize
objects. They can also be quite effective for classifying non-image data such as audio, time
series, and signal data. The above figure defines the architecture of CNN model.

The key components of CNN include:

1. Convolutional Layers: These layers apply a convolution operation to the input data, which
involves sliding a small filter (also known as a kernel) over the input to extract features.
The outputs of these operations are called feature maps, and they capture different features
of the input data. The convolutional layer is the core building block of a CNN, and it is
where the majority of computation occurs. It requires a few components: the input data, a
filter, and a feature map. The calculation of the convolution is shown below.

Fig.3 convolutional layering

2. Pooling Layers: Pooling layers downsample the feature maps, reducing their spatial
dimensions. Common pooling methods include max pooling, average pooling, and global
pooling. This helps reduce the number of parameters in the network and makes it more
computationally efficient. The two most common types are maximum pooling and average
pooling, as shown in the figure below.

Fig.4 Types of pooling layers

3. Fully connected layers: These layers are traditional artificial neural network layers where
each neuron is connected to every neuron in the previous and subsequent layers as shown
in below figure. They are often used in the final layers of the network for classification or
regression tasks.

Fig.5 Fully connected layers

4. Activation functions: Activation functions introduce non-linearity into the network,

enabling it to model complex relationships in the data. Common activation functions used
in CNNs include ReLU (Rectified Linear Unit), sigmoid, and tanh.
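
To make these components concrete, the short sketch below runs one convolutional filter, a bias-plus-ReLU activation, and 2x2 max pooling over a sample image; the random kernel and the 0.1 bias are placeholders rather than trained weights, and blockproc from the Image Processing Toolbox stands in for a pooling layer.

% One convolutional layer, ReLU activation and 2x2 max pooling (illustrative).
X = im2double(imread('cameraman.tif'));                       % sample image shipped with MATLAB
K = randn(3,3);                                               % one 3x3 filter (random, untrained)
featureMap = conv2(X, rot90(K,2), 'same');                    % convolutional layer: one feature map
featureMap = max(featureMap + 0.1, 0);                        % add a bias term, then apply ReLU
pooled = blockproc(featureMap, [2 2], @(b) max(b.data(:)));   % 2x2 max pooling of the feature map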

CNNs are characterized by their ability to automatically learn hierarchical representations

of features from the input data. The initial layers typically capture low-level features like
edges and textures, while deeper layers capture higher-level features like object parts and
object shapes. This hierarchical feature extraction allows the network to capture the
structural relationships in the data.

CNNs have revolutionized the field of computer vision and have been used in various
applications, such as image classification (e.g., identifying objects in images), object
detection (e.g., finding and localizing objects within images), facial recognition, medical
image analysis, and more. Their effectiveness in handling grid-like data has also led to
adaptations in other domains, like natural language processing and speech recognition.

CHAPTER-5

SOURCE CODE

% CNN_Fusion.m
function F = CNN_Fusion(A, B, model_name)
% A, B: the two source images; model_name: .mat file holding the trained CNN weights

%% source images preparation

img1 = double(A)/255;

img2 = double(B)/255;

if size(img1,3)>1

img1_gray=rgb2gray(img1);

img2_gray=rgb2gray(img2);

else

img1_gray=img1;

img2_gray=img2;

end

[hei, wid] = size(img1_gray);

%% load the CNN model

disp('load the CNN model......');

load(model_name);

[conv1_patchsize2,conv1_filters] = size(weights_b1_1); %9*64

conv1_patchsize = sqrt(conv1_patchsize2);

[conv2_channels,conv2_patchsize2,conv2_filters] = size(weights_b1_2); %64*9*128

conv2_patchsize = sqrt(conv2_patchsize2);

[conv3_channels,conv3_patchsize2,conv3_filters] = size(weights_b1_3); %128*9*256

conv3_patchsize = sqrt(conv3_patchsize2);
[conv4_channels,conv4_patchsize2,conv4_filters] = size(weights_feature); %512*64*256

conv4_patchsize = sqrt(conv4_patchsize2);

[conv5_channels,conv5_patchsize2,conv5_filters] = size(weights_output); %256*1*2

conv5_patchsize = sqrt(conv5_patchsize2);

%% Go forward through the network

disp('Go forward through the network......');

%%%%%%%%%%%%%%%%%%%% conv1 %%%%%%%%%%%%%%%%%%%%

disp('---conv1-- ')

weights_conv1 = reshape(weights_b1_1, conv1_patchsize, conv1_patchsize, conv1_filters);

conv1_data1 = zeros(hei, wid, conv1_filters,'single');

conv1_data2 = zeros(hei, wid, conv1_filters,'single');

for i = 1 : conv1_filters

conv1_data1(:,:,i) = conv2(img1_gray, rot90(weights_conv1(:,:,i),2), 'same');

conv1_data1(:,:,i) = max(conv1_data1(:,:,i) + biases_b1_1(i), 0);

conv1_data2(:,:,i) = conv2(img2_gray, rot90(weights_conv1(:,:,i),2), 'same');

conv1_data2(:,:,i) = max(conv1_data2(:,:,i) + biases_b1_1(i), 0);

end

%%%%%%%%%%%%%%%%%%%% conv2 %%%%%%%%%%%%%%%%%%%%

disp('---conv2---')

conv2_data1 = zeros(hei, wid, conv2_filters,'single');
conv2_data2 = zeros(hei, wid, conv2_filters,'single');

for i = 1 : conv2_filters

for j = 1 : conv2_channels

conv2_subfilter = rot90(reshape(weights_b1_2(j,:,i), conv2_patchsize, conv2_patchsize),2);

conv2_data1(:,:,i) = conv2_data1(:,:,i) + conv2(conv1_data1(:,:,j), conv2_subfilter, 'same');

conv2_data2(:,:,i) = conv2_data2(:,:,i) + conv2(conv1_data2(:,:,j), conv2_subfilter, 'same');

end

conv2_data1(:,:,i) = max(conv2_data1(:,:,i) + biases_b1_2(i), 0);

conv2_data2(:,:,i) = max(conv2_data2(:,:,i) + biases_b1_2(i), 0);

end

clear conv1_data1 conv1_data2

%%%%%%%%%%%%%%%%%%%% max-pooling1 %%%%%%%%%%%%%%%%%%%%

disp('---maxpool1---')

conv2_data1_pooling=zeros(ceil(hei/2), ceil(wid/2), conv2_filters,'single');

conv2_data2_pooling=zeros(ceil(hei/2), ceil(wid/2), conv2_filters,'single');

for i = 1 : conv2_filters

conv2_data1_pooling(:,:,i) = maxpooling_s2(conv2_data1(:,:,i));

conv2_data2_pooling(:,:,i) = maxpooling_s2(conv2_data2(:,:,i));

end

clear conv2_data1 conv2_data2

%%%%%%%%%%%%%%%%%%%% conv3 %%%%%%%%%%%%%%%%%%%%

disp('---conv3---')

conv3_data1 = zeros(ceil(hei/2), ceil(wid/2), conv3_filters,'single');
conv3_data2 = zeros(ceil(hei/2), ceil(wid/2), conv3_filters,'single');

for i = 1 : conv3_filters

for j = 1 : conv3_channels

conv3_subfilter = rot90(reshape(weights_b1_3(j,:,i), conv3_patchsize, conv3_patchsize),2);

conv3_data1(:,:,i) = conv3_data1(:,:,i) + conv2(conv2_data1_pooling(:,:,j), conv3_subfilter, 'same');

conv3_data2(:,:,i) = conv3_data2(:,:,i) + conv2(conv2_data2_pooling(:,:,j), conv3_subfilter, 'same');

end

conv3_data1(:,:,i) = max(conv3_data1(:,:,i) + biases_b1_3(i), 0);

conv3_data2(:,:,i) = max(conv3_data2(:,:,i) + biases_b1_3(i), 0);

end

clear conv2_data1_pooling conv2_data2_pooling

conv3_data=cat(3,conv3_data1,conv3_data2); %concatenation

clear conv3_data1 conv3_data2

%%%%%%%%%%%%%%%%%%%% conv4 (fc1 in Fig.2 of the paper) %%%%%%%%%%%%%%%%%%%%

disp('---conv4 (fc1 in Fig.2 of the paper)---')

conv4_data=zeros(ceil(hei/2)-conv4_patchsize+1,ceil(wid/2)-conv4_patchsize+1,conv4_filters,'single');

for i = 1 : conv4_filters

for j = 1 : conv4_channels

conv4_subfilter = rot90((reshape(weights_feature(j,:,i), conv4_patchsize, conv4_patchsize)),2);

conv4_data(:,:,i) = conv4_data(:,:,i) + conv2(conv3_data(:,:,j), conv4_subfilter, 'valid');

end

end

clear conv3_data

%%%%%%%%%%%%%%%%%%%% conv5 (fc2 in Fig.2 of the paper) %%%%%%%%%%%%%%%%%%%%

disp('---conv5 (fc2 in Fig.2 of the paper) ---')

conv5_data=zeros(ceil(hei/2)-conv4_patchsize+1,ceil(wid/2)-conv4_patchsize+1,conv5_filters,'single');

for i = 1 : conv5_filters

for j = 1 : conv5_channels

conv5_subfilter = rot90(reshape(weights_output(j,:,i), conv5_patchsize, conv5_patchsize),2);

conv5_data(:,:,i) = conv5_data(:,:,i) + conv2(conv4_data(:,:,j), conv5_subfilter);

end

end

clear conv4_data

%%%%%%%%%%%%%%%%%%%% softmax %%%%%%%%%%%%%%%%%%%%

disp('---softmax---')

output_data=zeros(ceil(hei/2)-conv4_patchsize+1,ceil(wid/2)-conv4_patchsize+1,conv5_filters,'single');

output_data(:,:,1)=exp(conv5_data(:,:,1))./(exp(conv5_data(:,:,1))+exp(conv5_data(:,:,2)));

output_data(:,:,2)=1-output_data(:,:,1);

outMap=output_data(:,:,2);

clear conv5_data

%% focus map generation

disp('focus map generation......');

sumMap=zeros(hei,wid);

cntMap=zeros(hei,wid);

patch_size=16; %the size of patches used for training the network

stride=2; %determined by the number of max-pooling layers in the network

y_bound=hei-patch_size+1;

x_bound=wid-patch_size+1;

[h,w]=size(outMap);

for j=1:h

jj=(j-1)*stride+1;

if jj<=y_bound

temp_size_y=patch_size;

else

temp_size_y=hei-jj+1;

end

for i=1:w

ii=(i-1)*stride+1;

if ii<=x_bound

temp_size_x=patch_size;

else

temp_size_x=wid-ii+1;

end

sumMap(jj:jj+temp_size_y-1,ii:ii+temp_size_x-1)=sumMap(jj:jj+temp_size_y-1,ii:ii+temp_size_x-1)+outMap(j,i);

cntMap(jj:jj+temp_size_y-1,ii:ii+temp_size_x-1)=cntMap(jj:jj+temp_size_y-1,ii:ii+temp_size_x-1)+1;

end

end

focusMap=sumMap./cntMap;

%figure;imshow(uint8(focusMap*255));title('Focus map');

%% initial segmentation

disp('initial segmentation......');

decisionMap=zeros(hei,wid);

decisionMap(focusMap>0.5)=1;

decisionMap(focusMap<=0.5)=0;

%figure;imshow(uint8(decisionMap*255));title('Binary segmented map');

%% consistency verification

disp('consistency verification......');

%small region removal

ratio=0.01; %it could be manually adjusted according to the characteristics of the source images

area=ceil(ratio*hei*wid);

tempMap1=bwareaopen(decisionMap,area);

tempMap2=1-tempMap1;

tempMap3=bwareaopen(tempMap2,area);

decisionMap=1-tempMap3;

%figure;imshow(uint8(decisionMap*255));title('Initial decision map');

%guided image filtering

imgf_gray=img1_gray.*decisionMap+img2_gray.*(1-decisionMap);

decisionMap = guidedfilter(imgf_gray,decisionMap,8,0.1);
%figure;imshow(uint8(decisionMap*255));title('Final decision map');

if size(img1,3)>1

decisionMap=repmat(decisionMap,[1 1 3]);

end

%% fusion

disp('fusion......');

imgf=img1.*decisionMap+img2.*(1-decisionMap);

F=uint8(imgf*255);

end

% maxpooling_s2.m

function P=maxpooling_s2(A)

[h,w]=size(A);

if mod(h,2)==1

A=[A;zeros(1,w)];

h=h+1;

end

if mod(w,2)==1

A=[A,zeros(h,1)];

w=w+1;

end

A1=A(1:2:h-1,1:2:w-1);

A2=A(1:2:h-1,2:2:w);

A3=A(2:2:h,1:2:w-1);

A4=A(2:2:h,2:2:w);

P1=max(A1,A2);

P2=max(A3,A4);

P=max(P1,P2);
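
Assuming the two listings above are saved as CNN_Fusion.m and maxpooling_s2.m, that a guidedfilter.m implementation is available on the MATLAB path, and that the trained model file name below is only a placeholder, the fusion shown in Chapter 6 can be run roughly as follows.

% demo.m - minimal usage sketch (model file name is a placeholder)
A = imread('children_1.tif');            % near-focused source image (Fig. 6)
B = imread('children_2.tif');            % far-focused source image (Fig. 7)
F = CNN_Fusion(A, B, 'cnnmodel.mat');    % hypothetical name of the trained model file
figure; imshow(F); title('Fused image');
imwrite(F, 'children_fused.tif');        % save the fused result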
CHAPTER-6

RESULTS

In this project we have been able to obtain a fully focused image from multiple
defocused images with the help of convolutional neural networks.

This document mainly presents a new multi-focus image fusion method based on a deep
convolutional neural network. The main novelty of our method is learning a CNN model to
achieve a direct mapping between source images and the focus map. Based on this idea, the
activity level measurement and fusion rule can be jointly generated by learning the CNN
model, which can overcome the difficulty faced by the existing fusion methods. The main
contribution of this paper could be summarized into the following four points:

1) We introduce CNNs into the field of image fusion. The feasibility and superiority of CNNs
used for image fusion are discussed. To the best of our knowledge, this is the first time that
CNNs have been employed for an image fusion task.

2) We propose a multi-focus image fusion method based on a CNN model. Experimental


results demonstrate the proposed method can achieve state-of-the-art results in terms of visual
quality and objective assessment.

3) We exhibit the potential of the learned CNN model for other types of image fusion tasks.

4) We put forward some suggestions on the future study of CNN-based image fusion. We
believe CNNs are capable of opening a new research approach in the field of image fusion.

Fig.6 input1(children_1.tif)
In the figure 6, the near field is focused and the far field is defocused.

Fig.7 input2(children_2.tif)

In the figure 7, the near field is defocused and the far field is focused.

Fig.8 Process of Fusion

In figure 8, figure 6 is combined with figure 7 to form the fused image.

This is the process of fusion.

Fig.9 Output (Fused image)

In figure 9, the fused image is produced and displayed on the screen.

CHAPTER-7

REFERENCES

1. S. Bhat, D. Koundal, "Multi-focus Image Fusion Techniques: A Survey," Artif. Intell. Rev., 2021.

2. K. B. Chitkara, B. Sharma, D. Koundal, "Comparative Analysis of Image Fusion Methods," in Proc. of the 6th International Conference on Computing for Sustainable Global Development (INDIACom), 2019.

3. H. Kaur, D. Koundal, V. Kadyan, "Image Fusion Techniques: A Survey," Arch. Comput. Methods Eng., 2021.

4. Zhang et al., "Transform Domain Image Fusion-based CNN Known as IFCNN," 2020.

5. P. Sermanet, D. Eigen, "Integrated Recognition, Localization and Detection Using Convolutional Networks," 2014.

6. S. Li, J. Kwok, "Combination of Images with Diverse Focuses Using the Spatial Frequency," 2001.

7. G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," 2012.

CHAPTER-8

CONCLUSION

In conclusion, multi-focus image fusion stands as a pivotal technique in the realm of


image processing, offering innovative solutions to the inherent challenges associated with
capturing fully focused images in diverse scenarios. The fusion of information from multiple
images with varying focus levels addresses limitations posed by depth-of-field constraints in
traditional imaging systems. As technology advances, the integration of sophisticated
methodologies, such as convolutional neural networks (CNNs), further elevates the
capabilities of multi-focus image fusion.

The primary aim of multi-focus image fusion is to enhance the overall quality and
clarity of images. By intelligently combining focused regions from different images, this
technique generates composite images that are sharper, clearer, and more detailed than any
individual input. The objectives extend beyond mere visual aesthetics, encompassing crucial
aspects such as information retention, improved image interpretation, and adaptability to a
myriad of applications.

One of the key strengths of multi-focus image fusion lies in its adaptability to diverse
fields. From medical imaging to industrial inspection, surveillance, and computer vision, the
technique finds applications in various domains where precision and clarity are paramount.
The fusion process contributes to enhanced image analysis, object recognition, and scene
understanding, making it an invaluable tool for professionals and researchers alike.

