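Attention Mechanisms in MATLAB: Application and Analysis Resource (CSDN download)

3 files: exe (1), pdf (1), m (1)

Tags: attention mechanism, matlab

Rating: 2 stars. Points required: 47. Views: 142. Uploaded: 2013-03-15 09:52:03. Comments: 3. Size: 1.54 MB (RAR).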

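In computer vision and image processing, an "attention mechanism" is a model of the human visual system that lets a computer automatically find and focus on the important parts of an image. Implementing one in MATLAB gives a useful tool for image analysis, object detection, image understanding, and visual information processing. The sections below discuss the topic in turn.

I. Principles and applications of attention mechanisms

The idea of attention comes from psychology, where it describes how people quickly locate key elements in a large amount of visual information. In computer vision it is usually realized by computing image saliency, that is, by determining which regions are visually the most conspicuous or the most unusual. Saliency detection finds the parts of an image that stand out from the rest, and it is commonly used in object detection, image summarization, and visual navigation.

II. The role of MATLAB in implementing attention mechanisms

MATLAB is a numerical computing and visualization environment with image processing and machine learning toolboxes, which makes it convenient for researchers and developers to implement and debug attention algorithms. Judging by its name, `Saliency_CVPR2009.m` is probably a MATLAB script for a saliency detection method published at CVPR (the Conference on Computer Vision and Pattern Recognition) in 2009.

III. Saliency detection algorithms

1. `SalientRegionDetectorCVPR09.exe` is probably a saliency detection program based on a CVPR 2009 paper. It presumably applies specific feature extraction, fusion, and post-processing steps to produce a saliency map. Similar functionality can be written in MATLAB as functions or scripts, for example by fusing cues such as color, texture, edges, and local contrast.

2. `Saliency_CVPR2009.m` is probably the MATLAB source for the same algorithm, covering the whole path from input image to saliency map. Reading the script shows how a saliency model can be built in MATLAB: feature extraction, weighting, saliency scoring, and generation of the final saliency map.

IV. Related research and further study

`1708.pdf` is the accompanying paper; from the preview below it appears to be "Frequency-tuned Salient Region Detection" by Achanta et al. More recent directions, such as deep learning for saliency detection or combinations with reinforcement learning and generative adversarial networks, are not covered by this archive and require separate reading.

In summary, MATLAB is well suited to implementing and studying attention mechanisms. Working through the provided files shows how a saliency detection model can be built in MATLAB and then applied to a range of vision tasks. Following the current literature helps keep that knowledge up to date.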

44659451Saliency.rar (3 files)

1708.pdf 1.75MB

Saliency_CVPR2009.m 2KB

SalientRegionDetectorCVPR09.exe 316KB

Frequency-tuned Salient Region Detection

Radhakrishna Achanta†, Sheila Hemami‡, Francisco Estrada†, and Sabine Süsstrunk†

†School of Computer and Communication Sciences (IC)

Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Switzerland.

[radhakrishna.achanta,francisco.estrada,sabine.susstrunk]@epfl.ch

‡School of Electrical and Computer Engineering

Cornell University, Ithaca, NY 14853, U.S.A.

[email protected]

Abstract

Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. In this paper, we introduce a method for salient region detection that outputs full resolution saliency maps with well-defined boundaries of salient objects. These boundaries are preserved by retaining substantially more frequency content from the original image than other existing techniques. Our method exploits features of color and luminance, is simple to implement, and is computationally efficient. We compare our algorithm to five state-of-the-art salient region detection methods with a frequency domain analysis, ground truth, and a salient object segmentation application. Our method outperforms the five algorithms both on the ground-truth evaluation and on the segmentation task by achieving both higher precision and better recall.

1. Introduction

Visual saliency is the perceptual quality that makes an object, person, or pixel stand out relative to its neighbors and thus capture our attention. Visual attention results both from fast, pre-attentive, bottom-up visual saliency of the retinal input, as well as from slower, top-down memory and volition based processing that is task-dependent [24].

The focus of this paper is the automatic detection of visually salient regions in images, which is useful in applications such as adaptive content delivery [22], adaptive region-of-interest based image compression [4], image segmentation [18, 9], object recognition [26], and content aware image resizing [2]. Our algorithm finds low-level, pre-attentive, bottom-up saliency. It is inspired by the biological concept of center-surround contrast, but is not based on any biological model.

Figure 1. Original images and their saliency maps using our algorithm.

Current methods of saliency detection generate regions that have low resolution, poorly defined borders, or are expensive to compute. Additionally, some methods produce higher saliency values at object edges instead of generating maps that uniformly cover the whole object, which results from failing to exploit all the spatial frequency content of the original image. We analyze the spatial frequencies in the original image that are retained by five state-of-the-art techniques, and visually illustrate that these techniques primarily operate using extremely low-frequency content in the image. We introduce a frequency-tuned approach to estimate center-surround contrast using color and luminance features that offers three advantages over existing methods: uniformly highlighted salient regions with well-defined boundaries, full resolution, and computational efficiency. The saliency map generated can be more effectively used in many applications, and here we present results for object segmentation. We provide an objective comparison of the accuracy of the saliency maps against five state-of-the-art methods using a ground truth of 1000 images. Our method outperforms all of these methods in terms of precision and recall.

2. General approaches to determining saliency

The term saliency was used by Tsotsos et al. [27] and Olshausen et al. [25] in their work on visual attention, and by Itti et al. [16] in their work on rapid scene analysis. Saliency has also been referred to as visual attention [27, 22], unpredictability, rarity, or surprise [17, 14]. Saliency estimation methods can broadly be classified as biologically based, purely computational, or a combination. In general, all methods employ a low-level approach by determining contrast of image regions relative to their surroundings, using one or more features of intensity, color, and orientation.

Itti et al. [16] base their method on the biologically plausible architecture proposed by Koch and Ullman [19]. They determine center-surround contrast using a Difference of Gaussians (DoG) approach. Frintrop et al. [7] present a method inspired by Itti’s method, but they compute center-surround differences with square filters and use integral images to speed up the calculations.
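The integral-image idea mentioned above can be sketched in a few lines. The following is only an illustration, not the implementation of Frintrop et al.: the function names, the window radii, and the clipping of windows at the image border are all assumptions, and the image is a plain 2-D list of intensities.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_mean(ii, y, x, radius):
    """Mean over a square window of the given radius, clipped to the image."""
    h, w = len(ii) - 1, len(ii[0]) - 1
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    # Four table lookups give the window sum regardless of window size.
    total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
    return total / ((y1 - y0) * (x1 - x0))


def center_surround(img, center_radius=1, surround_radius=3):
    """Absolute difference between a small and a large box mean per pixel."""
    ii = integral_image(img)
    return [[abs(box_mean(ii, y, x, center_radius)
                 - box_mean(ii, y, x, surround_radius))
             for x in range(len(img[0]))]
            for y in range(len(img))]
```

Because each box mean costs four lookups, the cost per pixel does not grow with the filter size, which is the speed-up the text refers to.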

Other methods are purely computational [22, 13, 12, 1] and are not based on biological vision principles. Ma and Zhang [22] and Achanta et al. [1] estimate saliency using center-surround feature distances. Hu et al. [13] estimate saliency by applying heuristic measures on initial saliency measures obtained by histogram thresholding of feature maps. Gao and Vasconcelos [8] maximize the mutual information between the feature distributions of center and surround regions in an image, while Hou and Zhang [12] rely on frequency domain processing.

The third category of methods are those that incorporate ideas that are partly based on biological models and partly on computational ones. For instance, Harel et al. [10] create feature maps using Itti’s method but perform their normalization using a graph based approach. Other methods use a computational approach like maximization of information [3] that represents a biologically plausible model of saliency detection.

Some algorithms detect saliency over multiple scales [16, 1], while others operate on a single scale [22, 13]. Also, individual feature maps are created separately and then combined to obtain the final saliency map [15, 22, 13, 7], or a feature combined saliency map is directly obtained [22, 1].

2.1. Limitations of saliency maps

The saliency maps generated by most methods have low resolution [16, 22, 10, 7, 12]. Itti’s method produces saliency maps that are just 1/256th of the original image size in pixels, while Hou and Zhang [12] output maps of size 64 × 64 pixels for any input image size. An exception is the algorithm presented by Achanta et al. [1] that outputs saliency maps of the same size as the input image. This is accomplished by changing the filter size to achieve a change in scale rather than the original image size.

Depending on the salient region detector, some maps additionally have ill-defined object boundaries [16, 10, 7], limiting their usefulness in certain applications. This arises from severe downsizing of the input image, which reduces the range of spatial frequencies in the original image considered in the creation of the saliency maps. Other methods highlight the salient object boundaries, but fail to uniformly map the entire salient region [22, 12] or better highlight smaller salient regions than larger ones [1]. These shortcomings result from the limited range of spatial frequencies retained from the original image in computing the final saliency map as well as the specific algorithmic properties.

3. Frequency Domain Analysis of Saliency Detectors

We examine the information content used in the creation of the saliency maps of five state-of-the-art methods from a frequency domain perspective. The five saliency detectors are Itti et al. [16], Ma and Zhang [22], Harel et al. [10], Hou and Zhang [12], and Achanta et al. [1], hereafter referred to as IT, MZ, GB, SR, and AC, respectively. We refer to our proposed method as IG. The choice of these algorithms is motivated by the following reasons: citation in literature (the classic approach of IT is widely cited), recency (GB, SR, and AC are recent), and variety (IT is biologically motivated, MZ is purely computational, GB is a hybrid approach, SR estimates saliency in the frequency domain, and AC outputs full-resolution maps).

3.1. Spatial frequency content of saliency maps

To analyze the properties of the five saliency algorithms, we examine the spatial frequency content from the original image that is retained in computing the final saliency map. It will be shown in Sec. 4.3 that the range of spatial frequencies retained by our proposed algorithm is more appropriate than the algorithms used for comparison. For simplicity, the following analysis is given in one dimension and extensions to two dimensions are clarified when necessary.

In method IT, a Gaussian pyramid of 9 levels (level 0 is the original image) is built with successive Gaussian blurring and downsampling by 2 in each dimension. In the case of the luminance image, this results in a successive reduction of the spatial frequencies retained from the input image. Each smoothing operation approximately halves the normalized frequency spectrum of the image. At the end of 8 such smoothing operations, the frequencies retained from the spectrum of the original image at level 8 range within [0, π/256]. The technique computes differences of Gaussian-smoothed images from this pyramid, resizing them to the size of level 4, which results in using frequency content from the original image in the range [π/256, π/16]. In this frequency range the DC (mean) component is removed along with approximately 99% ($(1 - \frac{1}{16^2}) \times 100$) of the high frequencies for a 2-D image. As such, the net information retained from the original image contains very few details and represents a very blurry version of the original image (see the band-pass filtered image of Fig. 2(b)).

(a) Original (b) IT [16] (c) MZ [22] (d) GB [10] (e) SR [12] (f) AC [1] (g) IG

Figure 2. Original image filtered with band-pass filters with cut-off frequencies given in Table 3.1. (b)-(g) illustrate the spatial frequency information retained in the computation of each of the saliency maps.
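The arithmetic behind the IT frequency band and the roughly 99% figure can be checked with a short sketch. It assumes, as the text does, that each pyramid level halves the 1-D cut-off, and it models the 2-D spectrum as the square of the 1-D fraction; the function names are illustrative only.

```python
import math


def pyramid_cutoff(level):
    """Approximate upper cut-off (radians per sample) after `level` halvings."""
    return math.pi / (2 ** level)


def discarded_high_freq_percent(cutoff_divisor, dims=2):
    """Percentage of the spectrum above pi / cutoff_divisor in `dims` dimensions.

    Assumes the retained fraction per axis is 1 / cutoff_divisor.
    """
    return (1.0 - (1.0 / cutoff_divisor) ** dims) * 100.0


# IT: differences taken between level 8 and level 4 of the pyramid.
it_band = (pyramid_cutoff(8), pyramid_cutoff(4))   # (pi/256, pi/16)
it_discarded = discarded_high_freq_percent(16)     # upper cut-off pi/16

print(it_band)
print(it_discarded)
```

The same helper applied to an upper cut-off of π/8 reproduces the approximately 98% quoted for method GB.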

In method MZ, a low-resolution image is created by averaging blocks of pixels and then downsampling the filtered image such that each block is represented by a single pixel having its average value. The averaging operation performs low-pass filtering. While the authors do not provide a block size for this operation, we obtained good results with a block size of 10 × 10 pixels, and as such the frequencies retained from the original image are in the range [0, π/10].
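The block-averaging step described above can be sketched as follows. This is a minimal illustration on a 2-D list of intensities, not the MZ implementation: the function name is hypothetical, and requiring the image size to be a multiple of the block size is an assumption, since the text does not say how partial blocks are handled.

```python
def block_average_downsample(img, block):
    """Replace each block x block tile by its mean (low-pass plus downsample).

    Assumes height and width are multiples of `block`.
    """
    h, w = len(img), len(img[0])
    if h % block or w % block:
        raise ValueError("image dimensions must be multiples of the block size")
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            total = sum(img[y][x]
                        for y in range(by, by + block)
                        for x in range(bx, bx + block))
            row.append(total / (block * block))
        out.append(row)
    return out
```

With `block=10`, a 320 × 320 image becomes 32 × 32, which matches the S/100 resolution listed for MZ in Table 1.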

In method GB, the initial steps for creating feature maps are similar to IT, with the difference that fewer levels of the pyramid are used to find center-surround differences. The spatial frequencies retained are within the range [π/128, π/8]. Approximately 98% ($(1 - \frac{1}{8^2}) \times 100$) of the high frequencies are discarded for a 2D image. As illustrated in Fig. 2(d), there is slightly more high frequency content than in 2(b).

In method SR, the input image is resized to 64 × 64 pixels (via low-pass filtering and downsampling) based on the argument that the spatial resolution of pre-attentive vision is very limited. The resulting frequency content of the resized image therefore varies according to the original size of the image. For example, with input images of size 320 × 320 pixels (which is the approximate average dimension of the images of our test database), the retained frequencies are limited to the range [0, π/5]. As seen in Fig. 2(e), higher frequencies are smoothed out.

In method AC, a difference-of-means filter is used to estimate center-surround contrast. The lowest frequencies retained depend on the size of the largest surround filter (which is half of the image’s smaller dimension) and the highest frequencies depend on the size of the smallest center filter (which is one pixel). As such, method AC effectively retains the entire range of frequencies (0, π] with a notch at DC. All the high frequencies from the original image are retained in the saliency map but not all low frequencies (see Fig. 2(f)).

Method   Freq. range       Res.      Complexity
IT       [π/256, π/16]     S/256     O(k_IT N)
MZ       [0, π/10]         S/100     O(k_MZ N)
GB       [π/128, π/8]      S/64      O(k_GB […] K)
SR       [0, π/5]          64 × 64   O(k_SR N)
AC       (0, π]            S         O(k_AC N)
IG       (0, π/2.75]       S         O(k_IG N)

Table 1. A comparison of 1-D frequency ranges, saliency map resolution, and computational efficiency. S is the input image size in pixels. Although the complexity of all methods except GB is proportional to N, the operations per pixel in these methods vary ([…] < k[…]). GB has an overall complexity of O(k_GB […] K), depending on the number of iterations K.

3.2. Other properties of methods MZ, SR, and AC

In MZ, the saliency value at each pixel position (x, y) is given by:

$$S(x, y) = \sum_{(m,n) \in N} d[p(x, y), q(m, n)] \qquad (1)$$

where N is a small neighborhood of a pixel (in the resized image obtained by 10 × 10 box filtering and downsampling) at position (x, y) and d is a Euclidean distance between Luv pixel vectors p and q. In our experiments, we choose N to be a 3 × 3 neighborhood. The method is fast but has the drawback that the saliency values at either side of an edge of a salient object are high, i.e., the saliency maps show the salient object to be bigger than it is, which gets more pronounced if block sizes are bigger than 10 × 10. In addition, for large salient objects, the salient regions are not likely to be uniformly highlighted (see Fig. 3(c)).
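Equation (1) with a 3 × 3 neighborhood can be written out directly. The sketch below is an illustration only: the function name is hypothetical, pixels are given as tuples (Luv in the text, but any vector works here), and skipping neighbors that fall outside the image is an assumption because the text does not describe border handling.

```python
import math


def mz_saliency(img):
    """Eq. (1): sum of Euclidean distances from each pixel to its 3x3 neighbors.

    `img` is a 2-D list of equal-length color tuples. Neighbors outside the
    image are skipped (an assumption).
    """
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            p = img[y][x]
            for m in range(max(0, y - 1), min(h, y + 2)):
                for n in range(max(0, x - 1), min(w, x + 2)):
                    # The pixel itself contributes a distance of zero.
                    sal[y][x] += math.dist(p, img[m][n])
    return sal
```

On an image with a single differing pixel, every neighbor of that pixel also receives a non-zero value, which is the "object appears bigger than it is" effect described above.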

In SR, the spectral residual R is found by subtracting a smoothed version of the FFT (Fast Fourier Transform) log-magnitude spectrum from the original log-magnitude spectrum. The saliency map is the inverse transform of the spectral residual. The FFT is smoothed using a separable 3 × 3 mean filter. Examining this operation in one dimension, this is equivalent to forming the residue R(k) as:

$$R(k) = \ln|X(k)| - g * \ln|X(k)| \qquad (2)$$

with $g$ the corresponding 1-D three-tap mean filter, $g = [\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}]$, and $*$ denoting convolution.
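The one-dimensional residue R(k) described here can be sketched with the standard library alone. This is not the SR implementation: a naive DFT stands in for the FFT, the mean filter is applied circularly, and a small `eps` is added inside the logarithm to avoid log(0); all three choices, and the function names, are assumptions made for the illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def spectral_residual_1d(signal, taps=3, eps=1e-12):
    """R(k) = ln|X(k)| minus the mean-filtered ln|X(k)|.

    The filter has `taps` equal weights and wraps around circularly.
    """
    log_mag = [math.log(abs(c) + eps) for c in dft(signal)]
    n, half = len(log_mag), taps // 2
    smoothed = [sum(log_mag[(k + j) % n] for j in range(-half, half + 1)) / taps
                for k in range(n)]
    return [lm - sm for lm, sm in zip(log_mag, smoothed)]
```

A unit impulse has a flat magnitude spectrum, so its residue is zero everywhere; only departures from the locally averaged log spectrum survive the subtraction.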


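yanzili2796

2022-04-08

The program has errors.

ZhaoyueZhang

2023-02-23

#The title does not match the content.

hdianyingwangpan

2022-01-28

Fake, don't trust it. What you download is useless.

hanjie289

2014-12-22

The resource is quite useful and a great help to a beginner like me.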

xue201220122013

Followers: 1

Tags: attention mechanism, matlab
