Viva Documentation
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted By
Dr. C. Rajakumar
Professor
2020-2024
(Accredited by NAAC, Approved by AICTE New Delhi & Permanently Affiliated to JNTUH)
Aziz Nagar Gate, C.B. Post, Hyderabad-500 075
CERTIFICATE
This is to certify that the project report titled “MULTI FOCUS IMAGE FUSION” is
being submitted by G. NANDINI REDDY (20911A0470), K. MEGHA REDDY
(20911A0483), A. MAHENDAR REDDY (20911A0493), N. CHAITANYA
(20911A499) of IV B. Tech I Semester of Electronics & Communication
Engineering, and is a record of bonafide work carried out by them. The results embodied in
this report have not been submitted to any other University for the award of any
degree.
EXTERNAL EXAMINER
DECLARATION
This is to certify that the work reported in the present project entitled MULTI
FOCUS IMAGE FUSION is a record of work done by us in the Department of
Electronics and Communication Engineering, Vidya Jyothi Institute of Technology,
Jawaharlal Nehru Technological University, Hyderabad. The report is based on
project work done entirely by us and has not been copied from any other source.
PROJECT ASSOCIATES
ACKNOWLEDGEMENT
PROJECT ASSOCIATES
ABSTRACT
Multi Focus Image Fusion (MFIF) is a data fusion technique in which images are the
primary objects of study. It covers methods that integrate multiple images of the same
scene acquired from multi-sensor imaging systems, as well as multiple images of the same
scene captured at different focus settings by a single image sensor. Image fusion is used
to enhance the visual interpretation of images in diverse applications: it combines the
essential features of two or more images into a single image without introducing artefacts.
Multi Focus Image Fusion is therefore the process by which two or more images are
combined into a single image that retains the important features of each original image.
Surveys of the field indicate that MFIF can automatically achieve high image quality.
TABLE OF CONTENTS

LIST OF TABLES
Table 2.1  Literature survey table

LIST OF FIGURES
Fig. 1  Stages of CNN
Fig. 2  Architecture of CNN model
Fig. 3  Convolutional layering
Fig. 4  Types of pooling layers
Fig. 5  Fully connected layers
Fig. 6  Input 1 (children_1.tif)
Fig. 7  Input 2 (children_2.tif)
Fig. 8  Process of fusion
Fig. 9  Output (fused image)
CHAPTER-1
INTRODUCTION
1.1 INTRODUCTION
Imaging cameras, particularly those with long focal lengths, usually have only a
finite depth of field. In an image captured by those cameras, only those objects within
the depth of field of the camera are focused, while other objects are blurred. To obtain
an image that is in focus everywhere, i.e., an image with every object in focus, usually
we need to fuse the images taken from the same viewpoint under different focal
settings.
Image fusion is the method of combining relevant data from two or more pictures into
a single picture. Typically, different pictures of the same scene acquired from a visual
sensor system are fused to create a single fused picture. The process extracts the relevant
data from the input pictures and highlights the useful information and essential features in
the fused picture without introducing inconsistencies. In many situations a single picture
cannot keep every object in the scene in focus, so multi-focus image fusion is used: several
pictures of the scene are captured with the focus on different objects, possibly with
different sensors, and these pictures are fused to create a resulting picture in which all
objects in the scene are in focus. Surveys of the field indicate that MFIF can automatically
achieve high image quality.
Digital cameras, such as digital single-lens reflex cameras, capture images through
photographic lenses. Because lenses have a limited depth of focus, certain parts of the
image are clear and focused while others remain unfocused: the subject at a particular
focal setting appears sharp while the rest seems blurred. To bring all parts of an image
into focus one would need a very large depth of focus. Therefore, the image fusion
approach is adopted to integrate the input images into a single, fully focused result with
better visibility and quality.
Multi focus image fusion (MFIF) is a well-known approach for producing an all-in-focus
image from several images of the same scene taken with different focal settings. It is also
valuable for related fields such as multi-modal and visible-infrared image fusion, where
multi-focus images are widely used. Presently, numerous MFIF methods exist that can be
categorized as spatial domain and transform domain. Transform domain methods generally
proceed in three steps:
1. First, the source images are decomposed into coefficients in a transform domain.
2. Then, fusion rules are utilized for the integration of the decomposed coefficients to
generate composite coefficients.
3. Lastly, these composite coefficients are converted back to the spatial domain to produce
the final fused image.
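To make these steps concrete, the following MATLAB sketch performs a single-level wavelet
fusion using dwt2/idwt2 from the Wavelet Toolbox, with an average rule for the approximation
coefficients and a choose-max rule for the detail coefficients; the wavelet name, fusion rules
and file names are illustrative assumptions rather than the method adopted later in this report.
% Minimal transform-domain fusion sketch (assumes the Wavelet Toolbox is available).
img1 = double(imread('children_1.tif'))/255;          % assumed example inputs
img2 = double(imread('children_2.tif'))/255;
if size(img1,3) > 1, img1 = rgb2gray(img1); end
if size(img2,3) > 1, img2 = rgb2gray(img2); end
% Step 1: decompose each source image into transform-domain coefficients.
[cA1,cH1,cV1,cD1] = dwt2(img1,'db2');
[cA2,cH2,cV2,cD2] = dwt2(img2,'db2');
% Step 2: fusion rules - average the approximations, keep the larger-magnitude details.
cA = (cA1 + cA2)/2;
cH = cH1.*(abs(cH1)>=abs(cH2)) + cH2.*(abs(cH1)<abs(cH2));
cV = cV1.*(abs(cV1)>=abs(cV2)) + cV2.*(abs(cV1)<abs(cV2));
cD = cD1.*(abs(cD1)>=abs(cD2)) + cD2.*(abs(cD1)<abs(cD2));
% Step 3: convert the composite coefficients back to the spatial domain.
fused = idwt2(cA,cH,cV,cD,'db2');
figure; imshow(fused); title('Transform-domain fusion (sketch)');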
The main goal of multi-focus image fusion is to create a single composite image
that incorporates the focused regions from each input image, resulting in an image that is
sharper and clearer than any individual input. This process helps to overcome the
limitations of a single image and provides a more comprehensive representation of the
scene.
The process of multi-focus image fusion involves several key steps, each
contributing to the creation of a final image that represents the focused regions from
each input image. The first step is the acquisition of multiple images of the same scene
or object, each captured with a different focus setting. This can be achieved by
adjusting the camera’s focus manually or using techniques like focus stacking, which
involves capturing a series of images with different focus points and combining them
later.
Once the images are acquired, they typically undergo preprocessing to improve their
quality and ensure proper alignment. Preprocessing steps may include noise reduction,
contrast adjustment, and image registration to align the images correctly, as
misalignment can adversely impact the fusion process.
The fusion process itself involves combining the selected information from each
input image to generate a single composite image. Various mathematical operations,
such as weighted averaging or transforms, can be employed for this purpose. Post-
processing steps may follow, including sharpening, colour correction, or additional
adjustments to further enhance the quality of the fused image.
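As a small illustration of the weighted averaging mentioned above, the sketch below derives
per-pixel weights from local sharpness (local standard deviation computed with stdfilt from the
Image Processing Toolbox); the window size, weighting scheme and the optional sharpening step
are assumptions made only for demonstration.
% Weighted-averaging fusion sketch; img1 and img2 are the registered grayscale
% sources in [0,1] (e.g., loaded as in the previous sketch).
s1 = stdfilt(img1, ones(7));           % local sharpness of source 1 (7x7 window)
s2 = stdfilt(img2, ones(7));           % local sharpness of source 2
w  = s1 ./ (s1 + s2 + eps);            % per-pixel weight favouring the sharper source
fused = w .* img1 + (1 - w) .* img2;   % weighted average of the two sources
fused = imsharpen(fused);              % example post-processing step (sharpening)
figure; imshow(fused); title('Weighted-averaging fusion (sketch)');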
1.2 OBJECTIVES
CHAPTER-2
LITERATURE SURVEY
Liu et al. were the first to introduce CNNs in the domain of MFIF and obtained good results.
The critical drawback of their work was that only reference-based metrics were used to validate
the approach, and those results were not fully satisfactory. After that, Du et al. introduced a
deep support value CNN architecture. They employed a structural risk minimization loss function
to train the CNN, and the max-pooling layers were replaced by convolutional layers to avoid
information loss. Finally, Lai et al. presented the multi-scale visual attention deep CNN
(MADCNN) approach, developing two different units to achieve MFIF.
Amin-Naji et al. designed an ensemble CNN (ECNN) using three CNNs, trained on three
datasets to obtain the initial segmented decision maps. Although the resulting fused images
required no post-processing, the model was evaluated only with reference metrics, which
significantly limits its usefulness. Wang et al. integrated deep convolutional networks with
patch-based reconstruction approaches to achieve MFIF. The drawback was its inability to fuse
image areas where background and foreground overlap; moreover, a post-processing strategy had
to be applied to refine the decision maps.
Gai et al. designed a complex MFIF method based on two stages of CNNs. First, a DenseNet
generated a fused image containing blurring artifacts. Then, edge-deblurring generative
adversarial networks (EDGAN) were employed to remove the blocking effect. The model attained
better results than the representative approaches, but its complexity made the fusion process
cumbersome to carry out.
Finally, Zhang et al. presented a transform domain image fusion CNN known as IFCNN. The
developed model could fuse different kinds of images, such as multi-focus, visible-infrared,
multi-modal and multi-exposure images. However, the comparative performance analysis with
existing methods indicated that the performance of their model still needs to be improved to
obtain better quality images.
2.1 LITERATURE SURVEY TABLE
The multi-focus image fusion technique can effectively overcome the limited depth of field
of an optical lens by fusing two or more partially focused images into a fully focused
image.
CHAPTER-3
In recent years, image fusion has been used in many applications such as remote
sensing, surveillance, medical diagnosis, and photography applications. Two major
applications of image fusion in photography are fusion of multi-focus images and multi-
exposure images.
The main idea of image fusion is to gather the important and essential information from the
input images into one single image that ideally contains all of the information of the input
images. The research history of image fusion spans more than 30 years and many scientific
papers. Image fusion generally has two aspects: image fusion methods and objective
evaluation metrics.
In visual sensor networks (VSN), the sensors are cameras that record images and video
sequences. In many VSN applications, a single camera cannot give a perfect illustration of
the scene including all of its details because of the limited depth of focus of the camera's
optical lens. Only the objects located at the focal distance of the camera are focused and
clear, and other parts of the image are blurred.
VSN captures images with different depths of focus using several cameras. Due to the
large amount of data generated by cameras compared to other sensors such as pressure and
temperature sensors and some limitations of bandwidth, energy consumption and processing
time, it is essential to process the local input images to decrease the amount of transmitted
data.
Much research on multi-focus image fusion has been done in recent years and can be
classified into two categories: transform and spatial domains. Commonly used transforms for
image fusion are the Discrete Cosine Transform (DCT) and the Multi-Scale Transform (MST).
Recently, deep learning (DL) has been thriving in several image processing and computer
vision applications.
3.1 DOMAINS UNDER MULTI FOCUS IMAGE FUSION
Multi focus image fusion (MFIF) is a well-known approach for producing an all-in-focus image
from several images of the same scene taken with different focal settings. It is also valuable
for related fields such as multi-modal and visible-infrared image fusion, where multi-focus
images are widely used. Presently, numerous MFIF methods exist that can be categorized as
spatial domain and transform domain. However, researchers mostly prefer transform domain
methods because of their intuitive strategy for solving the fusion task.
There are some existing methods of Multi Focus Image Fusion. They are:
Spatial domain:
  - Pixel based fusion
  - Feature based fusion
  - Decision based fusion
Transform domain:
  - Wavelet based fusion
  - Curvelet based fusion
  - DCT based fusion
The spatial domain, in a broader sense, is a term used in various fields of science and
engineering, including image processing, signal processing, and physics. It refers to the
domain or realm in which data is represented in its original form, typically in a two- or
three-dimensional space.
Types of spatial domain fusion: Pixel based fusion combines the registered source images
pixel by pixel, for example by averaging their coefficients or selecting the sharper pixel.
Feature based fusion first partitions the images into regions or blocks and selects the
better-focused region from each source using measures such as gradients, variance and spatial
detail. Decision based fusion builds a decision map that labels each location as focused in
one source or the other and composes the fused image from that map.
The transform domain is a concept used in various fields, including signal processing,
image processing, and data analysis. It refers to a space or representation in which the
original data is transformed or converted into a different domain, typically to extract or
emphasize specific information, simplify analysis, or enable various processing techniques.
The transformation often involves mathematical operations such as the Fourier transform, the
wavelet transform, or other domain-specific transformations.
Wavelet based fusion: Wavelet-based fusion schemes are extensions of the high-pass
filter method, which makes use of the idea that spatial detail is contained in high
frequencies. In the wavelet-based fusion schemes, detail information is extracted from
the PAN image using wavelet transforms and injected into the MS image.
Curvelet based fusion: Curvelet transform is a powerful tool that can capture details
along the curvature in images. It is useful when it comes to feature extraction and
pattern recognition. Curvelet transform is also efficient in image denoising, which is the
removal of noise signals in an image.
DCT based fusion: One of the widely used lossy compression algorithms is the JPEG
compression algorithm. JPEG works on the DCT, which is the transform discussed in this
project. DCT stands for Discrete Cosine Transform; it is a fast-computing Fourier-related
transform that maps real signals to corresponding values in the frequency domain.
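To illustrate the idea, the hedged sketch below fuses two images block-wise in the DCT domain
by keeping, for each 8x8 block, the block whose AC coefficients carry more energy, a common
variance-style focus criterion; the block size and the assumption that the image dimensions
are multiples of 8 are made only for simplicity.
% Block-wise DCT fusion sketch (dct2/idct2 are in the Image Processing Toolbox);
% img1 and img2 are registered grayscale sources in [0,1] with dimensions divisible by 8.
B = 8;                                              % block size (assumption)
fused = zeros(size(img1));
for r = 1:B:size(img1,1)
    for c = 1:B:size(img1,2)
        b1 = dct2(img1(r:r+B-1, c:c+B-1));          % DCT of the corresponding blocks
        b2 = dct2(img2(r:r+B-1, c:c+B-1));
        e1 = sum(b1(:).^2) - b1(1,1)^2;             % AC energy as a focus measure
        e2 = sum(b2(:).^2) - b2(1,1)^2;
        if e1 >= e2                                  % keep the sharper block
            fused(r:r+B-1, c:c+B-1) = idct2(b1);
        else
            fused(r:r+B-1, c:c+B-1) = idct2(b2);
        end
    end
end
figure; imshow(fused); title('DCT-domain block fusion (sketch)');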
Researchers have also put enormous effort into spatial domain methods, in which the images
are not converted to another representation. These methods comprise block- or region-based
and pixel-based approaches. Region-based approaches such as boosted random walk (BRW) and
content adaptive blurring (CAB) partition the input images into regions or blocks based on
some rules and identify the focused areas using cues such as gradients, variance and spatial
detail.
Then, these focused regions are combined to create an enhanced single focused image; this
step can introduce blurriness. In pixel-based approaches, such as guided-filter methods, the
composite images are instead produced by averaging the pixel coefficients. Although that
procedure is simple and less time-intensive, the final image can suffer from ghosting
artifacts. Earlier studies developed numerous methods by manually designing all of the fusion
rules, but such hand-crafted feature extraction is time-consuming, requires more human
effort, and is less efficient than automatically learned methods. Fortunately, deep learning
techniques have recently been thriving in numerous computer vision applications, including
de-focus blur detection, owing to their powerful capability for self-learning and for
avoiding the design of hand-crafted features, which is prone to errors.
3.3 PROPOSED METHOD:
MFIF aims to produce an all-in-focus image by fusing multiple partially focused images
of the same scene. Normally, MFIF is solved by combining the focused regions under some
fusion rules. The key task in MFIF is thus the identification of focused and defocused
areas, which is normally formulated as a classification problem. Various focus measurements
(FM) have been designed to classify whether a pixel is focused or defocused. For example,
Zhai et al. used the energy of the Laplacian to detect the focus level of source images.
Tang et al. proposed a pixel-wise convolutional neural network (p-CNN), a learned FM that
can recognize focused and defocused pixels.
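As a brief illustration of such a focus measurement, the lines below compute a local
energy-of-Laplacian map for each source and select, per pixel, the source with the higher
focus response; the kernel, window size and per-pixel selection rule are illustrative
assumptions, not the learned measure proposed in this project.
% Energy-of-Laplacian focus measure sketch; img1, img2 are registered grayscale sources in [0,1].
lap  = fspecial('laplacian', 0.2);                                % Laplacian kernel
eol1 = conv2(conv2(img1, lap, 'same').^2, ones(9)/81, 'same');    % local Laplacian energy, 9x9 window
eol2 = conv2(conv2(img2, lap, 'same').^2, ones(9)/81, 'same');
decision = eol1 >= eol2;                                          % 1 where source 1 is sharper
fused = img1 .* decision + img2 .* (~decision);                   % naive per-pixel selection
figure; imshow(fused); title('Energy-of-Laplacian fusion (sketch)');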
In this project, a CNN model is adopted to extract the fusion features, which improves the
accuracy of the fused picture. A CNN is a kind of network architecture for deep learning
algorithms and is specifically used for image recognition and image processing. It was first
used to recognize digits and letters, and it is now used for image processing and object
detection in general. To identify patterns within an image, a CNN leverages principles from
linear algebra such as matrix multiplication. CNNs can also classify audio and signal data.
The human brain processes a huge amount of information the second we see an image.
Each neuron works in its own receptive field and is connected to other neurons in a way that
together they cover the entire visual field. Just as each biological neuron responds to
stimuli only in a restricted region of the visual field called its receptive field, each
neuron in a CNN processes data only within its receptive field as well. The layers are
arranged in such a way that they detect simpler patterns first (lines, curves, etc.) and more
complex patterns (faces, objects, etc.) further along. By using a CNN, one can enable sight
for computers.
CNNs have shown remarkable performance in various computer vision tasks, and their
architecture has also been adapted for other domains, including natural language processing
(NLP) and speech recognition. They are a fundamental building block in modern deep
learning and artificial intelligence applications. Figure 1 below illustrates the stages of
a CNN and how the defocused input images are fused to form a fully focused image.
CNNs have revolutionized computer vision and image processing tasks. Their fundamental
architecture consists of convolutional layers, pooling layers, and fully connected
layers.
In CNNs, convolutional layers apply learnable filters to input data, enabling the network
to automatically detect features like edges, textures, and more complex patterns. Pooling layers
reduce the spatial dimensions of feature maps while preserving important information. These
layers are crucial for handling variations in the scale and position of features.
One notable feature of CNNs is parameter sharing, where the same set of weights and
biases is used for different parts of the input data. This allows CNNs to recognize patterns
throughout the entire input, making them effective for image analysis.
CNNs are a type of multi-layer neural network that is meant to discern visual patterns from
pixel images. In a CNN, ‘convolution’ refers to the mathematical operation: a type of linear
operation in which two functions are combined to create a third function that expresses how
the shape of one is modified by the other. In simple terms, an image and a filter, each
represented as a matrix, are combined to produce an output that is used to extract
information from the image. A CNN is similar to other neural networks, but because it uses a
sequence of convolutional layers, it adds a layer of complexity to the model. A CNN cannot
function without its convolutional layers.
CHAPTER-4
ARCHITECTURE OF CNN
Convolutional neural network (CNN) is a network architecture for deep learning which learns
directly from data. CNNs are particularly useful for finding patterns in images to recognize
objects. They can also be quite effective for classifying non-image data such as audio, time
series, and signal data. Figure 2 shows the architecture of the CNN model.
1. Convolutional layers: These layers apply a convolution operation to the input data,
sliding a small filter (also known as a kernel) over the input to extract features.
The output of these operations is a set of feature maps, each capturing a different feature
of the input data. The convolutional layer is the core building block of a CNN, and it is
where the majority of the computation occurs. It requires a few components: the input data,
a filter, and a feature map. The calculation of the convolution is illustrated in Fig. 3
and in the sketch that follows this list.
Fig.3 Convolutional layering
2. Pooling layers: Pooling layers downsample the feature maps, reducing their spatial
dimensions. Common pooling methods include max pooling, average pooling, and global
pooling. This reduces the number of parameters in the network and makes it more
computationally efficient. The two most common types are maximum pooling and average
pooling; the process is shown in Fig. 4 and illustrated in the sketch after this list.
3. Fully connected layers: These layers are traditional artificial neural network layers in
which each neuron is connected to every neuron in the previous and subsequent layers, as
shown in Fig. 5 below. They are often used in the final layers of the network for
classification or regression tasks.
Fig.5 Fully connected layers
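To make the three layer types above concrete, here is a toy MATLAB sketch that applies one
convolution, a ReLU activation and 2x2 max pooling to a small input; the input values and
kernel are arbitrary assumptions chosen only for illustration.
% One convolution + ReLU + 2x2 max pooling on a toy input (illustrative values only).
X = magic(6)/36;                               % 6x6 "input image"
K = [0 -1 0; -1 4 -1; 0 -1 0];                 % example 3x3 kernel
featureMap = conv2(X, rot90(K,2), 'valid');    % convolutional layer (kernel applied as correlation)
featureMap = max(featureMap, 0);               % ReLU activation
% 2x2 max pooling with stride 2, the same operation performed by maxpooling_s2 in Chapter 5
pooled = max(max(featureMap(1:2:end-1,1:2:end-1), featureMap(1:2:end-1,2:2:end)), ...
             max(featureMap(2:2:end,1:2:end-1),   featureMap(2:2:end,2:2:end)));
disp(pooled)                                   % 2x2 pooled feature map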
CNNs have revolutionized the field of computer vision and have been used in various
applications, such as image classification (e.g., identifying objects in images), object
detection (e.g., finding and localizing objects within images), facial recognition, medical
image analysis, and more. Their effectiveness in handling grid-like data has also led to
adaptations in other domains, like natural language processing and speech recognition.
CHAPTER-5
SOURCE CODE
% CNN_Fusion.m
% (The enclosing function header is implied by the inputs A, B, model_name and the trailing
%  'end'; it is reconstructed here as an assumption.)
function F = CNN_Fusion(A, B, model_name)
% A, B        : the two partially focused source images
% model_name  : MAT-file containing the trained CNN weights and biases
% normalize the source images to [0,1] and derive grayscale versions for the network
img1 = double(A)/255;
img2 = double(B)/255;
if size(img1,3)>1
    img1_gray = rgb2gray(img1);
    img2_gray = rgb2gray(img2);
else
    img1_gray = img1;
    img2_gray = img2;
end
% ...
% load the trained network parameters and derive the filter sizes of each layer
load(model_name);
conv1_patchsize = sqrt(conv1_patchsize2);
conv2_patchsize = sqrt(conv2_patchsize2);
conv3_patchsize = sqrt(conv3_patchsize2);
[conv4_channels,conv4_patchsize2,conv4_filters] = size(weights_feature); %512*64*256
conv4_patchsize = sqrt(conv4_patchsize2);
conv5_patchsize = sqrt(conv5_patchsize2);
% ...
disp('---conv1-- ')
for i = 1 : conv1_filters
    % (the body of the conv1 loop is elided in the original listing)
end
%%%%%%%%%%%%%%%%%%%% conv2 %%%%%%%%%%%%%%%%%%%%
disp('---conv2-- ')
for i = 1 : conv2_filters               % outer loop reconstructed from the surviving fragments
    for j = 1 : conv2_channels
        conv2_subfilter = rot90(reshape(weights_b1_2(j,:,i), conv2_patchsize, conv2_patchsize),2);
        conv2_data1(:,:,i) = conv2_data1(:,:,i) + conv2(conv1_data1(:,:,j), conv2_subfilter, 'same');
    end
    conv2_data1(:,:,i) = max(conv2_data1(:,:,i) + biases_b1_2(i), 0);  % add bias, ReLU (reconstructed)
end
%%%%%%%%%%%%%%%%%%%% max-pooling1 %%%%%%%%%%%%%%%%%%%%
disp('---maxpool1---')
for i = 1 : conv2_filters
    % 2x2 max pooling (stride 2) of the conv2 feature maps of both source images
    conv2_data1_pooling(:,:,i) = maxpooling_s2(conv2_data1(:,:,i));
    conv2_data2_pooling(:,:,i) = maxpooling_s2(conv2_data2(:,:,i));
end
disp('---conv3---')
% hei, wid: height and width of the source images (defined in an elided part of the listing)
for i = 1 : conv3_filters               % loop headers reconstructed from the surviving fragments
    for j = 1 : conv3_channels
        conv3_subfilter = rot90(reshape(weights_b1_3(j,:,i), conv3_patchsize, conv3_patchsize),2);
        % (accumulation of the conv3 responses is elided in the original listing)
    end
end
conv3_data = cat(3,conv3_data1,conv3_data2);   % concatenate the feature maps of both sources
%%%%%%%%%%%%%%%%%%%% conv4 %%%%%%%%%%%%%%%%%%%%
conv4_data = zeros(ceil(hei/2)-conv4_patchsize+1, ceil(wid/2)-conv4_patchsize+1, conv4_filters, 'single');
for i = 1 : conv4_filters
    for j = 1 : conv4_channels
        % (convolution of conv3_data with the conv4 filters is elided in the original listing)
    end
end
clear conv3_data
%%%%%%%%%%%%%%%%%%%% conv5 (fc2 in Fig.2 of the paper) %%%%%%%%%%%%%%%%%%%%
conv5_data = zeros(ceil(hei/2)-conv4_patchsize+1, ceil(wid/2)-conv4_patchsize+1, conv5_filters, 'single');
for i = 1 : conv5_filters
    for j = 1 : conv5_channels
        % (convolution of conv4_data with the conv5 filters is elided in the original listing)
    end
end
clear conv4_data
disp('---softmax---')
% two-class softmax over the conv5 outputs; the second channel is used as the focus score map
output_data = zeros(ceil(hei/2)-conv4_patchsize+1, ceil(wid/2)-conv4_patchsize+1, conv5_filters, 'single');
output_data(:,:,1) = exp(conv5_data(:,:,1))./(exp(conv5_data(:,:,1))+exp(conv5_data(:,:,2)));
output_data(:,:,2) = 1 - output_data(:,:,1);
outMap = output_data(:,:,2);
clear conv5_data
% ...
% accumulate the patch-level scores of outMap into pixel-level maps by sliding a patch window
sumMap = zeros(hei,wid);
cntMap = zeros(hei,wid);
patch_size = 16;   % the size of patches used for training the network
y_bound = hei - patch_size + 1;
x_bound = wid - patch_size + 1;
[h,w] = size(outMap);
for j = 1:h
    jj = (j-1)*stride + 1;      % stride is assumed to be defined in an elided part of the listing
    if jj <= y_bound
        temp_size_y = patch_size;
    else
        temp_size_y = hei - jj + 1;
    end
    for i = 1:w
        ii = (i-1)*stride + 1;
        if ii <= x_bound
            temp_size_x = patch_size;
        else
            temp_size_x = wid - ii + 1;
        end
        % add this patch's score to every pixel it covers and count the overlaps
        sumMap(jj:jj+temp_size_y-1, ii:ii+temp_size_x-1) = sumMap(jj:jj+temp_size_y-1, ii:ii+temp_size_x-1) + outMap(j,i);
        cntMap(jj:jj+temp_size_y-1, ii:ii+temp_size_x-1) = cntMap(jj:jj+temp_size_y-1, ii:ii+temp_size_x-1) + 1;
    end
end
focusMap = sumMap./cntMap;   % average the overlapping patch scores to obtain the focus map
%figure;imshow(uint8(focusMap*255));title('Focus map');
%% initial segmentation
disp('initial segmentation...')
% threshold the focus map into a binary decision map
decisionMap = zeros(hei,wid);
decisionMap(focusMap>0.5) = 1;
decisionMap(focusMap<=0.5) = 0;
%% consistency verification
disp('consistency verification...')
% remove small isolated regions from the decision map and from its complement
area = ceil(ratio*hei*wid);        % ratio is assumed to be defined in an elided part of the listing
tempMap1 = bwareaopen(decisionMap,area);
tempMap2 = 1 - tempMap1;
tempMap3 = bwareaopen(tempMap2,area);
decisionMap = 1 - tempMap3;
% refine the decision map with a guided filter (guidedfilter is a helper not included in this listing)
imgf_gray = img1_gray.*decisionMap + img2_gray.*(1-decisionMap);
decisionMap = guidedfilter(imgf_gray,decisionMap,8,0.1);
%figure;imshow(uint8(decisionMap*255));title('Final decision map');
if size(img1,3)>1
    decisionMap = repmat(decisionMap,[1 1 3]);   % replicate the decision map across the colour channels
end
%% fusion
disp('fusion......')
imgf = img1.*decisionMap + img2.*(1-decisionMap);   % blend the sources according to the decision map
F = uint8(imgf*255);
end
% maxpooling_s2.m  (listed as script.m in the original; the file name must match the
% function name so that CNN_Fusion.m can call it)
function P = maxpooling_s2(A)
% 2x2 max pooling with stride 2; odd-sized inputs are first zero-padded by one row/column
[h,w] = size(A);
if mod(h,2)==1
    A = [A; zeros(1,w)];
    h = h + 1;
end
if mod(w,2)==1
    A = [A, zeros(h,1)];
    w = w + 1;
end
% the four samples of each 2x2 block, taken with stride 2
A1 = A(1:2:h-1, 1:2:w-1);
A2 = A(1:2:h-1, 2:2:w);
A3 = A(2:2:h,   1:2:w-1);
A4 = A(2:2:h,   2:2:w);
P1 = max(A1,A2);
P2 = max(A3,A4);
P  = max(P1,P2);    % element-wise maximum over each 2x2 block
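A short usage sketch for the two listings above is given below; the model file name is an
assumption made for illustration (the report does not state it) and should be replaced with
the actual trained network file.
% Hypothetical driver script for CNN_Fusion.m (file names assumed for illustration).
A = imread('children_1.tif');         % near-focused source (Fig. 6)
B = imread('children_2.tif');         % far-focused source (Fig. 7)
model_name = 'cnnmodel.mat';          % assumed name of the trained CNN weight file
F = CNN_Fusion(A, B, model_name);     % produce the all-in-focus result (Fig. 9)
figure; imshow(F); title('Fused image');
imwrite(F, 'fused_children.tif');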
CHAPTER-6
RESULTS
In this project we have been able to extract a fully focused image from multiple
defocused images with the help of convolutional neural networks.
This document mainly presents a multi-focus image fusion method based on a deep
convolutional neural network. The main novelty of the method is learning a CNN model that
achieves a direct mapping between the source images and the focus map. Based on this idea,
the activity level measurement and the fusion rule can be jointly generated by learning the
CNN model, which overcomes the difficulty faced by existing fusion methods. The main
contributions could be summarized in the following points:
1) We introduce CNNs into the field of image fusion. The feasibility and superiority of CNNs
for image fusion are discussed. To the best of our knowledge, this is the first time that
CNNs are employed for an image fusion task.
3) We exhibit the potential of the learned CNN model for other types of image fusion tasks.
4) We put forward some suggestions on the future study of CNN-based image fusion. We
believe CNNs are capable of opening a new research approach in the field of image fusion.
Fig.6 input1(children_1.tif)
In the figure 6, the near field is focused and the far field is defocused.
Fig.7 input2(children_2.tif)
In the figure 7, the near field is defocused and the far field is focused.
Fig.8 Process of Fusion
CHAPTER-7
REFERENCES
Rev. 7 (2021).
6. S. Li, J. Kwok, Combination of images with diverse focuses using the spatial
frequency (2001).
7. G. Hinton, ImageNet classification with deep convolutional neural networks (2012).
CHAPTER-8
CONCLUSION
The primary aim of multi-focus image fusion is to enhance the overall quality and
clarity of images. By intelligently combining focused regions from different images, this
technique generates composite images that are sharper, clearer, and more detailed than any
individual input. The objectives extend beyond mere visual aesthetics, encompassing crucial
aspects such as information retention, improved image interpretation, and adaptability to a
myriad of applications.
One of the key strengths of multi-focus image fusion lies in its adaptability to diverse
fields. From medical imaging to industrial inspection, surveillance, and computer vision, the
technique finds applications in various domains where precision and clarity are paramount.
The fusion process contributes to enhanced image analysis, object recognition, and scene
understanding, making it an invaluable tool for professionals and researchers alike.