A Graph-Based Approach for Visualizing and Exploring a Multimedia Search Result Space

Information Sciences 370–371 (2016) 303–322

Abstract

Nowadays, together with the increasing spread of the content available online, users’ information needs have become more complex. To fulfill them, users strongly rely on Web search engines, but traditional ways of presenting search results are often unsatisfactory. In fact, Web pages carry information in multiple media formats, such as text, audio, image, and video objects. Vertical search engines and medium-specific search services do not provide users with an integrated view of search results. Furthermore, multiple media objects are in most cases highly semantically interlinked, but the connections between them are not sufficiently exploited to support a further exploration of the retrieved objects. To address these issues, in this paper we propose a graph-based approach aimed at providing users with the possibility to dynamically visualize and explore a search result space built over a repository of multimedia documents containing interconnected multiple media objects. To do this, we represent the search result space via a graph-based data model in which both the retrieved multimedia documents and the connected relevant media objects are considered. Media objects are connected among them via different kinds of similarity relationships, which depend on the low-level features and metadata taken into consideration to access the media objects. The approach and the connected visualization and exploration interface have been implemented and tested on a publicly available dataset, and they have been evaluated by means of a usability test.

Keywords: Search result visualization; Search result exploration; Multimedia Information Retrieval; Aggregated search; Search result space; Search engines
1. Introduction
In recent years, a rapid and huge increase in the production of digital data has been observed; for example, Bounie
and Gille [4] reported a growth of 233% of the information available on the World Wide Web from 2003 to 2008. This
information is in most cases carried by different media types, such as text, audio, image, and video objects [1], which in the literature are referred to as multiple media objects [25]. Multimedia documents can in this context be seen as containers of media objects [21], and a Web page is probably the best-known example of a multimedia document.
Nowadays, Web search engines offer users the possibility to search both for Web pages (traditional search) and for
various media types (vertical search) separately. As a consequence, the results produced by the evaluation of a query are
presented to users in separate ranked lists, one for Web pages (multimedia documents) and one for each distinct media type.
Recently, aggregated search approaches have been proposed to integrate the search results from one or multiple information
sources into a single Web page [5,14]. The rationale behind aggregated search is to present to users integrated information
nuggets [14], which are either parts/portions of documents or media objects. In this context, the integration of multiple
media contents can be provided in two main ways. A first solution consists in the simple aggregation of search results from
vertical search engines. A second approach assembles and presents information nuggets based on the possible relationships
among them.
Taking inspiration from this second approach, we consider that multiple media objects within multimedia documents
could be better integrated in the presentation of search results by taking into account their low-level features and/or the
metadata associated with them, and by generating similarity relationships among media objects by exploiting this informa-
tion. In this way, it would be possible to visualize and explore the search result space by means of a relational (graph-based)
approach.
Following this line of research, in this paper we present a novel way of presenting search results that allows (i) the integrated visualization of both the retrieved multimedia documents and the relevant media objects they contain, and (ii) the exploration of the search result space based on the similarity relationships among media objects. The proposed
approach relies on a graph-based data model that has been defined to represent a search result space, where nodes are
associated with both multimedia documents and media objects, and the considered relationships are: (i) the part-of re-
lationship, connecting media objects to the multimedia document they belong to, and (ii) different similarity relationships,
connecting multiple media objects among them. The rationale behind the definition of distinct similarity relationships is that
each media object can be accessed through different modalities (i.e., textual, acoustic, and visual) depending on the
considered low-level features or on the available metadata extracted from or associated with it. An image object, for exam-
ple, could be accessed through a textual modality via the caption associated with it, or a visual modality by considering its
color histogram. Similarly, an audio object could be accessed through a textual modality considering some user comments
associated with it, or a provided textual transcription of the speech, and through an acoustic modality by extracting its
audio histogram. This way, considering the possible access modalities connected to different media objects, we are able to
establish (at most) three kinds of similarity relationships among media objects, based on suitable similarity measures that
act on the considered low-level features or metadata.
A preliminary introduction of some of the ideas presented in this paper has been described in [22]. With respect to that
preliminary work, in this paper:
The paper is organized as follows. In Section 2, we discuss the background and the motivations of our research. In
Sections 3 and 4, we describe the graph-based data model and its instantiation on a publicly available dataset. In Section 5,
we illustrate the GUI and we elaborate a visualization and exploration example. In Section 6, we provide an evaluation of
the approach by means of a usability test. Finally, in Section 7, we draw the conclusions and we discuss future research
directions.
2. Background and motivations

Most Web search companies provide search engines that allow users to access distinct media repositories, following the
so-called vertical search paradigm [27]. In this scenario, users select a single media type in which they are interested, they
formulate a query, and in most cases the vertical search engine produces a simple ranked list of the retrieved results.
On the Web, there is also a variety of media-specific search engines, like YouTube,1 Flickr,2 and FindSounds,3 that allow users to retrieve video, image, and audio objects respectively by means of keyword-based queries. Only more recently have general-purpose search engines started to focus on the integration of the search results produced by a same query over distinct
media types. For example, both Google4 and Bing5 provide content-based search for text and image objects; moreover,
Google provides a small integrated view to present together highlights of textual, image and other media objects in the
results Web page. The research area focusing on the above issue is aggregated search [15]. At the basis of aggregated search there are two main aspects: how to select and represent the objects constituting the search result space, and how to present search results to users in an integrated way. Since the approach proposed in this paper is connected to both aspects,
in Section 2.1 we provide a description of the two main classes of approaches to aggregated search, and in Section 2.2 we
1 https://www.youtube.com/.
2 https://www.flickr.com/.
3 http://www.findsounds.com/.
4 https://www.google.com/.
5 https://www.bing.com/.
illustrate the state of the art in the development of aggregated search approaches. Finally, in Section 2.3 we present the
novelties with respect to these approaches, and the motivations behind our work.
2.1. Aggregated search approaches

According to [14], aggregated search is the process of combining different information to achieve better focus and bet-
ter organization of search results. In particular, its aim is to assemble useful information from one or multiple sources
through one interface. In this context, the aggregation of multiple media search results is usually provided by approaches
like cross vertical Aggregated Search (cvAS) (usually provided in the Web search context) and Relational Aggregated Search
(RAS) [14] (useful to structure search results). The first approach is inspired by traditional federated search and meta-search
approaches. It makes it possible to include multiple media contents (e.g., images, videos, news, etc.) within traditional search results
via the use of vertical search engines. The second approach takes into account the possible relationships among information
nuggets, and it assembles and presents them based on these relationships.
The cvAS approach does not focus primarily on the explicit assembly of the search results, which consist of a group of
unrelated information nuggets coming from different sources. Interrelations among media contents remain unknown. It has
been demonstrated [13,28] that cvAS increases the diversity of relevant results, since results coming from different sources
can be complementary. This is particularly useful in the presence of ambiguous queries [13]. By using the cvAS approach it
is possible to present search results by placing content of the same type into different panels, using an unblended approach.
It is also possible to follow a blended approach, merging search results from different sources in the same panel.
Alternatively, the RAS approach provides more structure and organization to the search results, because it exploits the
interrelationships among information nuggets. Complex data structures/models like graphs, trees, and sequences have emerged in recent years to organize and assemble information contents in relational approaches [16]. In the RAS approach, blended integration can be better exploited with respect to cvAS for presenting search results. Researchers have recently developed some engines to support the blended integration of multimedia information. The majority of them rely on the RAS
approach. Section 2.2 provides an overview of the different solutions proposed.
2.2. Related work

One of the first researchers to address the role of browsing, navigation, and visualization of search results was
Marti Hearst [12]. She has worked on the concept of faceted navigation6 [11], but her research did not primarily focus on
the issue of aggregated search. In recent years, only some tools have been proposed both to visualize in an integrated way
and to explore a multimedia search result space. FaericWorld [23] provides combined search, browsing and visualization of
multimedia documents containing audio-visual contents through a graph-based data model. The navigation is possible by
following links that directly connect multimedia documents by exploiting thematic, temporal and reference relationships.
Visual Islands [34] provides text-based retrieval of a broadcast news corpus, where multimedia documents contain news
with related images and textual descriptions. Images are retrieved through keyword-based queries, and they are clustered
in the form of image islands, i.e., thumbnails of similar images in adjacent locations. Media Finder [24] provides image and
video search over multiple social networks and related events. Media Finder utilizes textual information in the establish-
ment of links among multimedia documents. At the basis of the interface, a common document model/schema is employed
to align the results retrieved from different social network sites. The I-Search Project [2] provides an adaptive presenta-
tion of search results containing multiple media objects. This project provides a hierarchical-conceptual organization of the
search results through clustering. Although I-Search employs tree-like structures to cluster the search results, visualization
is only provided by presenting thumbnails of multiple media objects. When clicking on thumbnails, details regarding media
are provided, and it is also possible to both download and preview media sources. Also FacetScape [26] is an engine for
visualizing and exploring the search result space containing multiple media objects. Its interface creates facets of the search
results; multiple media objects belonging to the same facet lie in the same region on the screen, instead of being linearly
listed as items belonging to a facet like in traditional search engines.
2.3. Motivations
As pointed out in Section 2.1, the RAS approach provides users with an effective integrated view of search results, since
it exploits the interrelationships among information nuggets. The aggregated search engines presented in Section 2.2, which
are mostly based on the RAS approach, nevertheless present several weaknesses. First of all, the majority of them have
been developed to visualize only specific media types or a subset of them. Furthermore, they usually provide the exploration
of search results through a document-to-document (intended as multimedia documents) navigation, following semantic as-
sociations only based on textual and/or meta-information. The blended integration of the search results containing media
objects (associated with different multimedia documents) has been less discussed in the literature, in particular in the con-
text of aggregated search. Related to this latter issue, the existing approaches consider multimedia documents and media
6 Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted classification system, allowing users to explore a collection of information by applying multiple filters [30].
objects as the same information retrieval granules; they do not allow users to exploit either the document granularity level
or the media object granularity level. Moreover, existing approaches neither consider the multimodal nature of media ob-
jects, nor exploit the fact that connections among them can be established by exploiting the multiple access modalities (like
textual, acoustic, and visual) characterizing them.
For all the above reasons, we propose a graph-based approach and the related Graphical User Interface for presenting
search results constituted by different media types. In particular, our approach makes it possible (i) to visualize in an in-
tegrated way both the retrieved multimedia documents and the relevant media objects they contain, and (ii) to explore
a search result space following similarity relationships among media objects. The graph-based data model representing a
search result space will be discussed in Section 3. This model constitutes the core of the visualization and exploration in-
terface that has been implemented; this GUI will be described in Section 5.
3. The graph-based data model
In our approach, a search result space is composed of both the retrieved multimedia documents and the media objects they
contain. In fact, a multimedia document can be composed of different (and multiple) media types, i.e., text objects, audio
objects, image objects, and video objects. Media objects are connected to multimedia documents via the part-of relationship,
while media objects can be connected among them via different similarity relationships. Both the part-of and the similarity
relationships can be used to explore the search result space through the retrieved multimedia documents and media objects.
In the proposed data model, three kinds of similarity relationships are defined: textual similarity, acoustic similarity, and visual similarity. This is because there are three possible access modalities to media objects, i.e., textual, acoustic, and visual. A text object can be only accessed by means of the textual modality. An audio object may be accessed by multiple
modalities; in addition to the acoustic modality, it can also be accessed via the textual modality. An audio object may be
characterized, in fact, either by the transcription of the audio recording (if the audio contains or has an associated spoken
content), or by some textual annotations provided by the listeners. Similarly, an image object may be accessed by both the
textual modality and the visual modality. Finally, a video object (the most complex) may be accessed through the textual
modality, the acoustic modality, and the visual modality. Depending on the modality through which two media objects are accessed, a distinct similarity relationship can be computed between them, as will be illustrated in detail in Section 4.1. An informal, graphical
representation of the proposed model is depicted in Fig. 1.
Formally, the data model is represented as an acyclic, labeled graph G = ⟨V, E⟩, where both multimedia documents and the multiple media objects they contain are represented as nodes in V = {n1, n2, ..., nm}. Moreover, both the relationships among media objects and those between media objects and multimedia documents are represented by edges in E = {e1, e2, ..., en}.
In particular, the set V is composed of two subsets of nodes: the set NMD , representing multimedia documents in the
retrieved result set, and the set NMO , representing media objects contained in the retrieved multimedia documents. Thus,
V = NMD ∪ NMO. Due to the fact that media objects can be of four different types, NMO is composed of four distinct subsets: Nt, Na, Ni, and Nv. They respectively contain nodes representing text, audio, image, and video objects. Formally, V = NMD ∪ Nt ∪ Na ∪ Ni ∪ Nv, and |V| = |NMD| + |Nt| + |Na| + |Ni| + |Nv| = m.
In the same way, the set E is composed of two subsets of edges: the set LDM, containing edges connecting multimedia documents to media objects, and the set LMM, containing edges connecting media objects among them. Thus, E = LDM ∪ LMM. Each edge in LDM is labeled as part-of-link,7 since this edge represents the part-of relationship between a multimedia document and a media object, as illustrated at the beginning of this section.
7 For the sake of simplicity, in the rest of the paper we will refer directly to part-of-links when considering edges connecting multimedia documents and media objects.
Each edge belonging to LMM represents one of the three possible similarity relationships between two media objects. For
this reason, this set is further decomposed into three subsets: Lt , La , and Lv , where each set contains edges representing
respectively: textual relationships, acoustic relationships, and visual relationships. Formally, E = LDM ∪ Lt ∪ La ∪ Lv, and |E| = |LDM| + |Lt| + |La| + |Lv| = n.
Each edge belonging to Lt is labeled as t-link, each edge belonging to La as a-link, and each edge belonging to Lv as
v-link. A t-link connects a pair of media objects through their textual modality. Text, audio, image, and video objects can
be potentially interlinked with each other through t-links. In the same way, an a-link and a v-link connect a pair of media
objects through their acoustic and visual modalities respectively. Audio and video objects can be interlinked with each other
through a-links while image and video objects through v-links.
In particular, let us denote by deg(ni ) the number of edges that are incident to a given node ni . The following properties
hold:
• ∀ni ∈ NMD , deg(ni ) ≥ 1, because each multimedia document contains at least one media object.
• ∀ni ∈ Nt , deg(ni ) = p + k. Each ni can be connected to p ≥ 0 text, audio, image, and video objects through only t-links
and to k ≥ 1 multimedia documents via part-of-links.
• ∀ni ∈ Na , deg(ni ) = q + k. Each ni can be connected to q ≥ 0 text, audio, image, and video objects through t-links and
a-links, and to k ≥ 1 multimedia documents via part-of-links.
• ∀ni ∈ Ni , deg(ni ) = r + k. Each ni can be connected to r ≥ 0 text, audio, image, and video objects through t-links and
v-links, and to k ≥ 1 multimedia documents via part-of-links.
• ∀ni ∈ Nv , deg(ni ) = s + k. Each ni can be connected to s ≥ 0 text, audio, image, and video objects through t-links, a-links,
and v-links, and k ≥ 1 multimedia documents via part-of-links.
An edge ei (labeled either as t-link, a-link or v-link) is generated between a pair of media objects, if and only if their
similarity/distance, computed over their different access modalities (textual, visual, acoustic), is greater/lower than a given
threshold (a different threshold is established for each modality). Both the choices concerning the similarity measures and the process of selecting suitable thresholds will be addressed in detail in the next section.
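To make the structure of the model concrete, the following minimal sketch shows one possible in-memory representation of the typed nodes and links described above. The sketch is written in Python purely for illustration (the prototype described in Section 4 relies on C# and MatLab routines), and every class and function name is illustrative rather than part of the original implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class NodeType(Enum):
    MULTIMEDIA_DOCUMENT = auto()  # nodes in N_MD
    TEXT = auto()                 # nodes in N_t
    AUDIO = auto()                # nodes in N_a
    IMAGE = auto()                # nodes in N_i
    VIDEO = auto()                # nodes in N_v

class LinkType(Enum):
    PART_OF = auto()  # edges in L_DM
    T_LINK = auto()   # textual similarity, edges in L_t
    A_LINK = auto()   # acoustic similarity, edges in L_a
    V_LINK = auto()   # visual similarity, edges in L_v

@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: NodeType

@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    link_type: LinkType

@dataclass
class ResultSpaceGraph:
    """Acyclic, labeled graph G = <V, E> representing a search result space."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, source: Node, target: Node, link_type: LinkType) -> None:
        self.nodes.update({source, target})
        self.edges.append(Edge(source, target, link_type))

    def degree(self, node: Node) -> int:
        # deg(n_i): number of edges incident to the given node
        return sum(1 for e in self.edges if node in (e.source, e.target))

# A multimedia document and one of the media objects it contains
doc = Node("CO:Tornado", NodeType.MULTIMEDIA_DOCUMENT)
vid = Node("CO:Tornado/video:Panavia Tornado", NodeType.VIDEO)
graph = ResultSpaceGraph()
graph.add_edge(doc, vid, LinkType.PART_OF)  # part-of-link in L_DM
```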
4. Instantiation of the data model

In this section we illustrate an instantiation of our model on a publicly available dataset. It is important to point out that this does not affect in any way the applicability of the proposed approach to alternative datasets and scenarios.
We make use of the multimedia document collection generated by the I-Search Project,8 the aim of which is to provide
a unified framework for multi-modal content indexing, sharing, search and retrieval. We use, in particular, the Rich Unified
Content Description (RUCoD) format, and the dataset containing 10,305 content objects (COs) stored as XML documents, both
provided by the project.9 A CO, according to RUCoD, is a multimedia document composed of one or more media objects; a CO can then be seen as a container that encapsulates media objects related to a same concept. Each CO may contain free
text in the form of textual descriptions or user tags/annotations, the URI of the media preview, the URI of the actual media
source, and the low-level descriptors associated with multiple media objects that are connected to the content object itself.
Based on our graph-based data model, each content object is represented by a node ni ∈ NMD , and each media object
(text, audio, image and video object) contained in a content object is represented by a node belonging to the appropriate
set (i.e., either Nt , or Na , or Ni , or Nv ). Within the search result space, each media object is connected to at least one content
object via a part-of-link in LDM ; it can be further connected to other media objects via t-links in Lt , a-links in La , and v-
links in Lv by exploiting the textual, acoustic, and visual access modalities characterizing the retrieved media objects. With
respect to each access modality, we have selected both the specific low-level features/metadata associated with it, and the
appropriate similarity measure with the related thresholds to establish the edges in the graph. Technical details concerning
this process are provided in the following.
In the considered dataset all media objects are accessible through their textual modality by means of textual features
and textual annotations associated with them. In order to compute a textual similarity between media objects, we consider the simple Jaccard index J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of terms (keywords) associated with the two media objects. As in the considered dataset text descriptors are very short, we chose this simple measure to avoid the sparsity problems connected to a vector-based representation.
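As an illustration, this index can be computed directly over two keyword sets, as in the following Python sketch (the term sets shown here are hypothetical and are not taken from the dataset).

```python
def jaccard_index(a: set, b: set) -> float:
    """Jaccard index J(A, B) = |A intersection B| / |A union B| between two keyword sets."""
    if not a and not b:
        return 0.0  # convention for two empty term sets
    return len(a & b) / len(a | b)

# Hypothetical textual descriptors of two media objects
keywords_video = {"fighter", "aircraft", "tornado", "jet"}
keywords_image = {"fighter", "aircraft", "tomcat"}
print(jaccard_index(keywords_video, keywords_image))  # 2 shared terms out of 5 -> 0.4
```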
When considering the acoustic modality, we access audio and video objects through the low-level features in their audio
tracks. To this purpose, we have decided to consider the Mel-Frequency Cepstral Coefficients (MFCCs). MFCCs can be computed for variable-sized audio clips, which is also the case for our dataset. To compare them, we have selected the simple Eu-
clidean distance. We have extracted MFCCs and calculated the Euclidean distance between them via MatLab routines. We
8 http://iti.gr/iti/projects/I-SEARCH.html.
9 Source XML documents are available at: http://vcl.iti.gr/multimodal-search-and-retrieval/. Please note that we used the multimodal dataset 5.
have chosen this distance because it is well discussed in the literature along with MFCCs to compute acoustic distances in
content-based audio retrieval [19,31,35].
We have decided to access image and video objects through their visual modality by considering, for video objects, their key-frames. From images and from video key-frames we have extracted their Color and Edge Directivity Descriptors (CEDDs) (a choice compatible with the MPEG-7 standard). CEDDs consider both color and edge details in matching, unlike most of the other existing complex descriptors [8,9]. To extract and compare CEDDs, we have again used MatLab routines, and we have selected the Euclidean distance due to its simplicity and since it is widely used in well-known image and video retrieval applications [2,32,33].
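Both the acoustic and the visual comparisons then reduce to a Euclidean distance between feature signatures. The sketch below illustrates only this comparison step, under the assumption that the MFCC or CEDD descriptors have already been extracted and summarized into fixed-length numeric vectors (in our prototype the extraction is performed by MatLab routines; the Python code and the short vectors shown here are purely illustrative).

```python
import math

def euclidean_distance(sig_a, sig_b) -> float:
    """Euclidean distance between two fixed-length feature signatures
    (e.g., summarized MFCCs for audio tracks, or CEDDs for images and key-frames)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length to be compared")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)))

# Hypothetical (and deliberately short) CEDD-like signatures of two images
img_1 = [0.0, 0.3, 0.1, 0.6]
img_2 = [0.1, 0.2, 0.1, 0.5]
print(euclidean_distance(img_1, img_2))  # ~0.173: the lower the value, the more similar
```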
To determine the edges that will compose the graph-based representation of the search result space, appropriate thresh-
olds are applied to the similarity values obtained between homogeneous features (corresponding to the specific access
modalities) for each considered couple of media objects. This way, only strongly connected media objects are selected and
visualized in the search result space. Addressing the threshold selection issue by exploiting machine learning techniques or
statistical methods is out of the scope of this research. We adopt a simpler approach described in the following. To build
t-links, we simply consider as threshold α the average of the Jaccard indexes computed for every node with respect to all
other nodes accessible via the textual modality in the search result space. If the textual similarity value between two media
objects is higher than α , a t-link is established between them. Concerning a-links and v-links, we take as thresholds β and γ
the average of the Euclidean distances computed for every node with respect to all other nodes accessible through acoustic
and visual modalities in the search result space. If the acoustic and the visual similarity values (interpreted as distances)
between two media objects are lower than β and γ respectively, an a-link and a v-link are established between them.
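Under the simple strategy described above, the link-building step can be summarized as in the following Python sketch, which uses the average pairwise score as the threshold and keeps only the pairs that pass it. The pairwise_scores helper is hypothetical, and the sketch is an illustrative rendering of the procedure, not the original C#/MatLab code.

```python
from itertools import combinations

def pairwise_scores(objects, score_fn):
    """Hypothetical helper: score every unordered pair of media objects
    accessible through a given modality (a similarity or a distance value)."""
    return {(a, b): score_fn(a, b) for a, b in combinations(objects, 2)}

def build_links(scores, higher_is_better):
    """Keep a pair if its score is above (similarity, e.g. Jaccard for t-links) or
    below (distance, e.g. Euclidean for a-links and v-links) the average score."""
    if not scores:
        return []
    threshold = sum(scores.values()) / len(scores)  # alpha, beta or gamma
    if higher_is_better:
        return [pair for pair, s in scores.items() if s > threshold]
    return [pair for pair, s in scores.items() if s < threshold]

# t-links: Jaccard similarity, keep pairs scoring above the average (threshold alpha)
# a-links and v-links: Euclidean distance, keep pairs scoring below the average (beta, gamma)
```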
The process of construction of the graph can be tedious and computationally expensive. Complexity is connected both
to the previously described aspects (i.e., the extraction of the features/metadata, the computation of the similarity measures
among media objects, the selection of suitable thresholds), and to the data structures that are chosen to represent the graph
itself.
To tackle the first three issues, we have preliminarily extracted the features and metadata associated with media objects by means of C# and MatLab routines, and stored them in CSV files as signatures, i.e., numeric representations of the MFCCs and CEDDs associated with the media objects.
Concerning the way the graph has been implemented, we adopted an adjacency list representation. Each list represents
a set of neighbors of a reference node. In this way, every node in the graph is listed, and its neighbor nodes are stored in the corresponding list. This representation provides faster access to directly connected nodes compared to the alternative adjacency matrix representation, and avoids sparsity problems. In particular, we provided distinct adjacency
lists to represent t-links, a-links, and v-links.
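For illustration, the distinct per-link-type adjacency lists can be pictured as follows (a Python sketch assuming media objects are identified by string ids; the actual structures are implemented in C#).

```python
from collections import defaultdict

# One adjacency structure per link type, as described above
t_links = defaultdict(set)
a_links = defaultdict(set)
v_links = defaultdict(set)

def add_link(adjacency, obj_a, obj_b):
    """Similarity links are undirected, so both directions are stored."""
    adjacency[obj_a].add(obj_b)
    adjacency[obj_b].add(obj_a)

add_link(v_links, "video:Panavia Tornado", "image:F-14 Tomcat")
add_link(v_links, "video:Panavia Tornado", "video:Shinden")

# Fast access to the nodes directly connected to a given node via v-links
print(v_links["video:Panavia Tornado"])  # e.g. {'image:F-14 Tomcat', 'video:Shinden'}
```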
5. The visualization and exploration interface

In this section we present the Graphical User Interface that has been developed for visualizing and exploring a search result space in a graph-based way. Before describing it in detail, we make a few technical premises.
First of all, in order to illustrate our approach and the operation of the interface, we have developed a simple search
engine based on the classical keyword-based approach for retrieving multimedia information. Since search and indexing
issues are out of the scope of this paper, the search engine is based on classical indexing techniques, and on the Vector
Space Model. An inverted index for both multimedia documents (content objects) and media objects has been created by
using C# and the Lucene.Net library.10 The postings in the index include both COs and media objects as fields in which the
textual query terms are searched. In fact, media objects are connected to content objects via part-of-links, and are searched
and retrieved through the textual modality associated with them according to keyword-based search. However, the interface
can be easily generalized to a search context where content-based queries related to media other than text can also be specified.
Secondly, it is worth underlining that, based on the data model described in Section 3, the search result space can be
explored by following t-links, a-links and v-links. The way these edges are generated has been illustrated in Section 4. Here,
since we have introduced the description of the proposed inverted index structure, we point out that media objects in the
postings further contain the pointers to acoustic and visual descriptors, in order to allow the dynamic computation of the
similarity/distance values between media objects.
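The following sketch gives an idea of the kind of posting structure we refer to. It is an illustrative Python rendering only: the actual index is built with C# and the Lucene.Net library, and the field and file names shown here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Posting:
    """Illustrative posting for a query term: it points both to the content objects
    (COs) and to the media objects in which the term occurs, together with pointers
    to the acoustic/visual descriptor signatures of those media objects."""
    term: str
    content_object_ids: list = field(default_factory=list)
    media_object_ids: list = field(default_factory=list)
    descriptor_pointers: dict = field(default_factory=dict)  # media object id -> signature file

inverted_index = {
    "tornado": Posting(
        term="tornado",
        content_object_ids=["CO:Tornado"],
        media_object_ids=["video:Panavia Tornado"],
        descriptor_pointers={"video:Panavia Tornado": "signatures/panavia_tornado.csv"},
    )
}
```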
10 http://lucenenet.apache.org/.
Fig. 2. The visualization and exploration interface with its five panels: the Query Formulation Panel (a), the Result List Panel (b), the Media View Panel (c), the Related Media Panel (d), a preview of the selected video object (e), and the Graph-based Related Media Panel (f).
The GUI is composed of five panels: (i) the Query Formulation Panel (QFP), (ii) the Result List Panel (RLP), (iii) the Media View Panel (MVP), (iv) the Related Media Panel (RMP), and (v) the Graph-based Related Media Panel (GRMP). The interface
and its panels are shown in Fig. 2. A detailed description of each of the panels is provided in the following.
Query Formulation Panel. The QFP (Fig. 2(a)) allows a user to specify Boolean keyword-based queries in three distinct fields:
(i) the OR field, for ‘oring’ query terms; (ii) the AND field, for ‘anding’ query terms; (iii) the NOT field, for excluding query
terms (in the case of multiple query terms in the NOT field, they are considered connected by AND). A user can also specify
both the media types s/he is interested in retrieving, and the modalities (textual, acoustic, and visual) through which s/he intends to explore the search result space.
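Since query processing is not the focus of this paper, the sketch below only shows one plausible reading of how the three fields can be combined into a single Boolean filter over the textual information associated with COs and media objects (an illustrative Python predicate; in the prototype the query is evaluated by the Lucene.Net-based engine described above).

```python
def matches(doc_terms: set, or_terms: set, and_terms: set, not_terms: set) -> bool:
    """At least one OR term (when any is given), every AND term,
    and none of the NOT terms (NOT terms are connected by AND)."""
    if or_terms and not (doc_terms & or_terms):
        return False
    if not and_terms <= doc_terms:
        return False
    if doc_terms & not_terms:
        return False
    return True

# Hypothetical textual terms attached to a media object
doc_terms = {"birdsong", "nature", "recording"}
print(matches(doc_terms, {"birdsong", "bird"}, set(), {"song"}))  # True
```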
Result List Panel. The RLP (Fig. 2(b)) presents the initial set of the (ranked11 ) retrieved multimedia documents (content
objects). In the result list, a user can see the title and the keywords associated with the retrieved documents. Each document
in the Result List Panel can be selected to view, in the Media View Panel (described below), the media objects it contains.
In this way a user can analyze which media objects exist within the retrieved multimedia documents.
Media View Panel. The MVP (Fig. 2(c)) presents as thumbnails all the media objects contained in the multimedia document
selected from the ranked list. In particular, it highlights those media objects in which the keywords have effectively been
found. In this panel, for each media object (depending on its type), a user can view keywords, descriptions, media previews
and links associated with it. By clicking the appropriate button (i.e., ‘listen’, ‘view’, or ‘watch’) on the thumbnail, the user
can directly open audio, image, or video objects in separate windows (Fig. 2(d)). To see the media objects (belonging to the
retrieved result set) that are semantically connected to a highlighted media object, a user can click on the ‘navigate’ button
that only appears on highlighted thumbnails. By clicking on the ‘navigate’ button, the Related Media Panel opens.
11 We only considered topical relevance based on the Vector Space Model.
Fig. 3. A keyword-based query in the QFP (a), the selected content object in the RLP (b), and the contained media objects in the MVP (c).
Related Media Panel. For the media object selected12 in the MVP, the thumbnails of the media objects connected to it via t-
links, a-links, and/or v-links (depending on the user choice in the QFP) are presented in the RMP (Fig. 2(e)). Each thumbnail
in the RMP presents the ‘listen’, ‘view’, or ‘watch’ buttons (depending on the media type), and the ‘navigate’ button (for
further exploring the media object). In this panel, forward and backward navigation buttons are also provided, if the number
of media objects exceeds the space of a single page. Furthermore, by clicking on the ‘visualize’ button on the bottom of the
panel, the Graph-based Related Media Panel opens.
Graph-based Related Media Panel. In the GRMP both the selected media object (either in the MVP or in the RMP) and its
semantically related media objects are represented, as interconnected nodes of a graph (Fig. 2(f)). The central node is the
selected media object, while its adjacent nodes presented at two levels deep are its related media objects. The details of the
central node are provided as textual descriptions, media previews, and a direct link to Web sources, illustrated in a glass
window. The glass window is made visible by simply double clicking on the panel.
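The graph shown in the GRMP can be thought of as the neighborhood of the selected media object up to two levels deep, restricted to the link types chosen in the QFP. A minimal sketch of how such a view could be derived from per-link-type adjacency lists like those of Section 4 is given below (illustrative Python; the actual panel belongs to the C# interface).

```python
def graph_view(adjacency, center, max_depth=2):
    """Breadth-first visit returning, for each reachable media object, its depth:
    0 for the central (selected) node, 1 and 2 for its related media objects."""
    depths = {center: 0}
    frontier = [center]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, set()):
                if neighbor not in depths:
                    depths[neighbor] = depth
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return depths

# e.g. graph_view(v_links, "video:Panavia Tornado") -> the nodes to draw and their level
```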
In the previous section we have described the main components and features of the proposed interface. In the following
we provide a step-by-step example of how a user can interact with the interface.
Searching for multimedia information. Fig. 3(a) illustrates the case of a user introducing in the Query Formulation Panel some
query terms (i.e., Propellor in the OR field, Fighter Aircraft in the AND field, and Airplane in the NOT field).
The user also specifies that s/he intends to retrieve COs containing all kinds of media objects (i.e., text, image, audio,
video) that are relevant to the specified query terms. The user also selects visual links from the Query Formulation
Panel, which means that s/he chooses to visualize and explore media objects connected only via v-links.
Retrieving content objects and media objects. In the Result List Panel, the user visualizes all the retrieved content objects,
and selects the Tornado one, as illustrated in Fig. 3(b). In the Media View Panel, the two media objects contained in the
selected CO are visualized, but only the video object Panavia Tornado is highlighted (Fig. 3(c)). This means that the
query terms have been found in the textual information associated with the video object, but not in the textual information
associated with the image object.
Exploring media objects. The user can explore any of the highlighted media objects in the Media View Panel, to view the
media objects semantically related to it in the Related Media Panel. As illustrated in Fig. 4, a user selects the video ob-
ject Panavia Tornado from the Media View Panel. In the Related Media Panel, two image objects (F-14 Tomcat and
12 The media object for which the ‘navigate’ button has been clicked.
Fig. 4. The QFP, RLP, MVP and RMP, and the media object in the Related Media Panel semantically connected with the selected media object in the Media
View Panel.
Thunderbolt) and three video objects (Courtesy Fighter Bomber, F-14 Tomcat and Shinden), which are con-
nected via v-links with the selected video object Panavia Tornado, are presented to the user.
At this point, either from the Media View Panel or the Related Media Panel, a user can further explore the search result
space by selecting other media objects.
In the specific example, a user selects the video object F-14 Tomcat in the Related Media Panel (Fig. 5(a)). When a
user clicks the ‘navigate’ button on a media object in the RMP, the Result List Panel, the Media View Panel and the Re-
lated Media Panel itself are updated accordingly (Fig. 5(b)). The MVP now shows the selected video object F-14 Tomcat
and the image object F-14 Tomcat. These two media objects belong to the same CO F-14, which is automatically high-
lighted in the Result List Panel. Simultaneously, the media objects related (connected via v-links) to the selected video object
F-14 Tomcat are presented in the Related Media Panel. Specifically, we have two video objects (Shinden and Panavia
Tornado) and three image objects (Thunderbolt, Panavia Tornado and F-14 Tomcat).
As previously outlined, the proposed interface allows users to select, during the query formulation phase, any combination of link types (i.e., textual links, acoustic links, and/or visual links) to visualize and navigate. Fig. 6 illustrates
the case of a user who has chosen to select all the possible link types. The different background colors associated with media
objects in the Related Media Panel indicate different kinds of links connecting them to the selected media object in the
Media View Panel. Specifically, the video objects Panavia Tornado and F-14 Tomcat, and the image object Fokker
Dr. I are connected to the selected Panavia Tornado video object in the MVP via v-links; the video object Panavia
Tornado and the audio object Thunderbolt via a-links; the video object Shinden, the image object Corsair, and the
video object Panavia Tornado via t-links.
Exploring media objects in a graph-based way. The Graph-based Related Media Panel presents on a dynamic graph the se-
lected media object (either in the MVP or in the RMP) and its semantically related media objects. Fig. 7 illustrates the 1:1
correspondences between the nodes in the Graph-based Related Media Panel and the media objects in the Related Media
Panel and in the Media View Panel (the same ones illustrated in Fig. 4).
In particular, Fig. 7(a) illustrates the video object Panavia Tornado selected from the MVP, Fig. 7(b) represents its
semantically connected media objects in the RMP, and Fig. 7(c) presents the corresponding graph, where the central node is
highlighted by means of a bold label and represents the video object in the MVP. The Graph-based Related Media Panel also
provides, in a semitransparent glass window, the textual information, media preview, and link to the actual media source
selected by the user.
It is worth noticing that each node in the graph can be further selected and explored. As illustrated in Fig. 8, a user
selects the video object F-14 Tomcat from the Graph-based Related Media Panel (see Fig. 8(a)). The corresponding media
objects in the Related Media Panel and in the Media View Panel are updated accordingly (see Fig. 8(b) and Fig. 8(c)). On
the graph, when clicking on the video object F-14 Tomcat, the corresponding node is moved to the central position, it is
highlighted, and its adjacent media objects are consequently arranged. The Panavia Tornado video object, for example, which was initially at the center of the graph, moves to the first depth level. The semitransparent glass window is updated
accordingly, and provides the details of the selected F-14 Tomcat video object.
Fig. 9 illustrates the correspondences between the media objects in the Related Media Panel (the same ones illustrated
in Fig. 6) and the nodes in the Graph-based Related Media Panel, when a user selects multiple link types for exploration.
Fig. 5. The video object F-14 Tomcat selected from the RMP (a), and the updated CO and media objects in the RLP, MVP and RMP (b).
Fig. 6. Distinct background colors are associated with media objects in the GRMP when a user selects multiple link types in the QFP.
Fig. 7. The video object Panavia Tornado selected by a user from the MVP (a), its adjacent media objects in the RMP (b), and the same media objects
represented on a graph in the GRMP (c).
Fig. 8. The video object F-14 Tomcat selected from the GRMP and its adjacent media objects (a), the corresponding media objects in the updated RMP
(b), and the video object updated accordingly in the MVP (c).
Different colors characterizing edges on the graph represent different link types: textual, acoustic, and visual links are rep-
resented by blue, green and red colors respectively.
6. Evaluation

The evaluation of the proposed approach has been performed by means of a usability test. Among the several metrics that can be collected during the course of testing, we have assessed: (i) successful task completion, (ii) time-on-task, (iii) subjective measures of usability, and (iv) likes, dislikes and recommendations. In this scenario, (i) represents the users’ ability to complete
predefined tasks connected to both query formulation and to the exploration of multimedia information over a search result
Fig. 9. The media objects in the RMP and the corresponding nodes in the GRMP, when a user specifies multiple link types for exploration.
space; (ii) is the amount of time it takes a participant to complete the task; (iii) are self-reported participant ratings for
different usability aspects (e.g., satisfaction, ease of use, ease of finding information, etc.) of the system under evaluation,
where participants rate each aspect on a 5–9-point Likert scale [18]; (iv) are the participants’ free comments about what
they liked/disliked most about the system, and recommendations for improving it. These comments are particularly useful
to have an idea about the usefulness of our approach with respect to traditional solutions to the presentation of search
results.
In order to select a significant number of users, we have referred to the works of Tullis and Stetson [29] and Nielsen
and Landauer [20]. As one would expect, from these studies it emerges that the accuracy of the analysis increases as the
sample size gets larger. In particular, the results indicate that sample sizes of at least 12/14 participants are needed to
get reasonably reliable results. Since in a more recent recommendation13 Nielsen suggests to test at least 20 users for
quantitative studies, we have conducted a survey with 25 participants, recruiting users with different characteristics, and
respecting the age and the gender balance. Specifically, participants were 13 males and 12 females, aged 31 years on average and ranging from 20 to 62 years. 5 of them were academics from both science faculties and humanities, 12 were students (6 undergraduate students, 4 master students and 2 Ph.D. students) from science faculties, 4 were students (2 master students and 2 Ph.D. students) from humanities, and the remaining 4 were people not related to the academic environment. For
13 https://www.nngroup.com/articles/quantitative-studies-how-many-users/.
better analyzing the obtained results, users were selected in a way to obtain three groups: (i) 5 experts in Information
Retrieval, (ii) 5 frequent computer users (having some hardware knowledge, and good knowledge of software and Web
applications), and (iii) 15 average computer users (accustomed to using popular software and Web applications). The choice
of selecting 15 average computer users is due to the fact that experts in IR and frequent computer users already constitute
10 more or less advanced users.
The task completion phase was conducted in a calm environment, on a laptop with a 2 GHz Intel Core i7 processor, 16GB
of RAM, and using a USB mouse. The display resolution was 1920 × 900 pixels. The tests were conducted as follows. After
a brief introduction to the structure of the evaluation and to the basic functionalities of the interface, users were requested
to complete the following tasks (all the tasks have been designed so as to be performed and exactly reproducible on the provided dataset):
1. Search for text and image objects connected to the query “snake AND boa”, and select the second result in the Result List
Panel;
2. Search for image and video objects connected to the query “clownfish OR piranha”, select the fourth result from the Result
List Panel, select from the Media View Panel the image object connected to it, and open its preview (by clicking on the
‘view’ button);
3. Search for audio and video objects connected to the query “birdsong OR bird song NOT song”, select acoustic links, select
the third result from the Result List Panel, select from the Media View Panel the first highlighted audio object connected
to it, open its preview (by clicking on the ‘listen’ button), visualize the connected media objects in the Related Media
Panel (by clicking on the ‘navigate’ button), select the first video object in the RMP, and open its preview (by clicking on
the ‘watch’ button);
4. Search for text and video objects connected to the query “dragonfly NOT cessna”, select textual and visual links, select
the first result from the Result List Panel, select from the Media View Panel the first highlighted video object connected
to it, visualize the connected media objects in the Related Media Panel (by clicking on the ‘navigate’ button), select the
only image object in the Related Media Panel which is connected via a visual link, open its preview (by clicking on
the ‘view’ button), open the Graph-based Related Media Panel (by clicking on the ‘visualize’ button), identify the central
node Video:Dragonfly, and click on its adjacent node Video:Fly, in order to see its details in the semitransparent
glass window.
As it emerges from the proposed tasks, tasks 1–2 are more focused on ‘classical’ search activities, while tasks 3–4 are
more focused on the exploration of search results.
To evaluate the successful task completion with respect to time-on-task, we measured the Average Completion Time
(ACT) for each task. For the first task, the ACT was 12 s (ACT[1] = 12sec), for the second task 32 s (ACT[2] = 32sec), for the
third task 55 s (ACT[3] = 55sec), and for the fourth task 75 s (ACT[4] = 75sec). We considered the first task accomplished if executed in a time less than or equal to a threshold α, calculated as ACT[1] plus a tolerance T of 30 s, i.e., α = ACT[1] + T[1] =
42sec. We did the same for the other three tasks, considering the tolerances T[2] = 30sec, T[3] = 60sec, and T[4] = 120sec,
and consequently three thresholds β = 62sec, γ = 115sec and δ = 195sec. Table 1 summarizes the successful task completion
with respect to time-on-task.
In the table, for each user (belonging to a specific group) and for each task, the symbol ‘•’ indicates that the user com-
pleted the specified task in the considered useful time. We indicate, with the symbol ‘+’, the fact that the user completed
the task passing the considered useful time. We also show that a user requested clarifications or received assistance for the
completion of the task, by employing the symbol ‘∗ ’. Finally, the symbol ‘–’ indicates that for a particular user the specified
task was not completed in a 300sec time limit.
Table 1
Successful task completion.

Group               Part.   Task 1   Task 2   Task 3   Task 4
1. Experts in IR    IR1     •        •        •        +
                    IR2     •        •        •        •
                    IR3     •        •        •        •
                    IR4     •        •        •        •
                    IR5     •        •        •        ∗
2. Frequent Users   FU1     •        •        •        •
                    FU2     •        •        •        •
                    FU3     •        •        •        ∗
                    FU4     •        •        +        •
                    FU5     •        •        •        •
Fig. 10. A comparison of questionnaires for measuring interactive systems usability. Image from [29]. Reprinted with permission.
Formulation Panel; users AU13 and AU15 did not complete respectively task 3 and task 4 due to some general difficulties in
performing the tasks.
After performing the required tasks, in order to evaluate subjective measures of usability, users were required to rate
different usability aspects connected to the proposed approach and interface. A variety of questionnaires have been used and
reported in the literature for assessing the perceived usability of interactive systems [29]. Among the most used there are the
Questionnaire for User Interface Satisfaction (QUIS) [10], the Computer System Usability Questionnaire (CSUQ) [17], and the
System Usability Scale (SUS) [6,7] questionnaire. The QUIS focuses on specific aspects of the human-computer interaction,
while the CSUQ and the SUS questionnaire refer to the usability of an interactive system in general. In Fig. 10 the results
of the study by Tullis and Stetson [29] about the effectiveness of standard usability questionnaires are illustrated. As it
emerges from the figure, CSUQ and SUS have a higher accuracy with an increasing sample size with respect to the other
questionnaires.
6.2.1. QUIS
The Questionnaire for User Interface Satisfaction (QUIS) [10] is a popular tool specifically designed to assess users’ subjec-
tive satisfaction with specific aspects of the human-computer interaction. The current QUIS 7.0 version of the questionnaire
has 21 main questions listed along with the 6 overall satisfaction questions. Each of the questions has a rating scale ascend-
ing from 1 on the left to 9 on the right and it is anchored at both endpoints with adjectives (e.g., inconsistent/consistent).
These adjectives are always positioned so that the scale goes from negative on the left to positive on the right. In addition,
each question can be answered by ‘not applicable’ (N/A). As it emerges from Fig. 11, questions 7–10 assess aspects con-
nected more to the appearance of the system interface. Questions 11–16 evaluate the quality of the terminology and the
information associated with the system. With questions 17–22 it is possible to measure how easy it is for users to learn how
to use the system. Finally, questions 23–27 assess participants’ satisfaction with respect to general system capabilities.
Table 2 summarizes the QUIS overall usability scores assigned by the different participants, and the four factor scores for
the sub-scales screen, terminology and system information, learning, and system capabilities. The overall assessment of the
users about the usability of the system is positive (74.65%). With respect to this overall evaluation, the results show that the
proposed interface has been particularly appreciated for its appearance and the connected ‘interactive’ aspect (77.19%).
6.2.2. CSUQ
The Computer System Usability Questionnaire (CSUQ) [17] measures the overall user satisfaction in using an interactive system, not necessarily a Web interface. It contains 19 questions on a 7-point scale, where 1 represents ‘strongly
disagree’ and 7 ‘strongly agree’. An extra ‘not applicable’ (N/A) point is provided outside the scale. It is characterized by three
internal sub-scales: system usefulness, information quality, and interface quality. As it emerges from Fig. 12, the first eight
questions assess system usefulness. Questions 9–15 assess the participants’ satisfaction with the quality of the information
associated with the system (e.g., error messages, information clarity, etc.). Questions 16–18 provide a rating of the interface quality.
Table 3 summarizes the CSUQ overall usability scores assigned by the different participants, and the three factor scores
for the sub-scales system usefulness, information quality, and interface quality. The overall assessment of the users about the
usability of the system is good (73.65%). With respect to this overall evaluation, the results obtained by this questionnaire
show that the proposed approach has been particularly appreciated for its ‘usefulness’ aspect (75.50%).
6.2.3. SUS
The System Usability Scale (SUS) [6,7] questionnaire provides a “quick and dirty”, reliable tool for measuring usability.
It can be used to evaluate a wide variety of products and services, including hardware, software, mobile devices, Web sites and
applications. It consists of a 10-item questionnaire with five response options for respondents: the participants have to rate
Table 2
QUIS single participant’s evaluation scores.
Part. No. Overall Screen Term. and Syst. Inf. Learning Syst. Cap. Average Score value
each question on a 5-point scale, where 5 is ‘strongly agree’ and 1 is ‘strongly disagree’. As it emerges from Fig. 13, questions 1, 3, 5, 7, and 9 evaluate ‘positive’ aspects of the system under examination, while questions 2, 4, 6, 8, and 10 evaluate
‘negative’ aspects. Concerning the first group of questions, the higher the rating, the better the evaluation. On the contrary,
for the second group of questions, the lower the rating, the better the evaluation.
Table 4 summarizes the SUS usability score values assigned by the different participants. Please note that SUS yields a
single number representing a composite measure of the overall usability of the system under analysis. For this reason, scores
for individual items are not meaningful on their own. To calculate the SUS score, the score contributions from each item are
first summed. Each item’s score contribution will range from 0 to 4. For items 1, 3, 5, 7, and 9, the score contribution is the
scale position minus 1. For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position. To obtain the overall value
in the SUS range [0–100], the sum of the scores has to be multiplied by 2.5.
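For clarity, the scoring procedure just described can be written down as a short function (a Python sketch; raw_responses holds the ten 1–5 scale positions, in item order).

```python
def sus_score(raw_responses):
    """Compute the SUS score from the ten raw 1-5 scale positions."""
    assert len(raw_responses) == 10
    contributions = []
    for item, response in enumerate(raw_responses, start=1):
        if item % 2 == 1:              # items 1, 3, 5, 7, 9: scale position minus 1
            contributions.append(response - 1)
        else:                          # items 2, 4, 6, 8, 10: 5 minus the scale position
            contributions.append(5 - response)
    return sum(contributions) * 2.5    # map the 0-40 sum onto the 0-100 SUS range

# e.g. a participant whose ten score contributions sum to 34 obtains 34 * 2.5 = 85.0
```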
As it emerges from Table 4, the final average SUS score is 75.80. Fig. 14 provides the correspondence between the SUS
scores and other scales proposed in [3] (from which the figure has been reproduced). Based on this comparison, according to the SUS questionnaire the proposed interface has a ‘good’ and ‘acceptable’ level of usability.
Table 3
CSUQ single participant’s evaluation scores.
Part. No. Overall Syst. usefulness Inf. quality Int. quality Average Score value
Fig. 14. A comparison between the SUS scores and other grade rankings from [3]. Reprinted with permission.
Each of the three proposed questionnaires envisages that users can provide free comments on the positive and negative
aspects of the interactive system under test. From participants’ replies, some interesting comments and suggestions for improving the interface usability have emerged. In general, nearly all users (almost all members belonging to the
Table 4
SUS single participant’s evaluation scores. Each row reports the participant number, the score contributions for items 1–10, their sum, and the resulting SUS score.
1 3 4 3 4 2 4 3 4 3 4 34 85
2 1 3 2 3 2 3 2 3 2 2 23 57.5
3 3 3 2 4 3 4 2 3 2 4 30 75
4 2 4 3 3 2 3 3 3 3 3 29 72.5
5 2 3 2 4 2 3 2 4 2 3 27 67.5
6 3 3 3 4 3 4 3 3 2 4 32 80
7 2 3 2 3 2 3 3 3 2 3 26 65
8 1 3 2 3 2 3 2 3 2 3 24 60
9 3 3 4 4 2 4 3 4 3 4 34 85
10 2 3 2 3 2 3 3 3 3 3 27 67.5
11 1 3 1 3 2 2 2 2 2 2 20 50
12 3 3 2 4 2 4 3 3 3 3 30 75
13 2 3 3 4 2 3 2 3 2 3 27 67.5
14 3 4 3 3 2 3 3 4 3 4 32 80
15 3 3 3 4 3 4 4 4 4 4 36 90
16 4 4 3 3 3 4 3 4 3 4 35 87.5
17 3 3 2 4 3 3 2 3 2 4 29 72.5
18 3 4 3 3 3 4 3 4 3 4 34 85
19 3 4 3 4 3 3 4 4 3 4 35 87.5
20 3 4 3 4 3 4 4 4 4 4 37 92.5
21 3 3 2 3 2 3 3 3 2 4 28 70
22 3 4 2 4 3 4 2 3 3 3 31 77.5
23 3 4 3 3 3 4 4 4 4 3 35 87.5
24 3 3 4 4 3 3 3 3 4 4 34 85
25 2 4 2 4 3 3 3 3 2 3 29 72.5
Average score 75.80
first two groups and the majority of the third group) have found this way of exploring search results very useful, since it overcomes the need to formulate different queries for different media types. In addition to this, the possibility to start
from a given query and to freely explore the search result space of connected results has been strongly appreciated, since
this presentation minimizes the need for new queries to refine search results.
Despite this, a few deficiencies also emerged. From a purely aesthetic point of view, some users pointed out that improve-
ments are necessary in the general appearance of the interface. A limited number of users claimed that sometimes the
response of the system was slow. Information Retrieval experts have stressed the importance of better identifying how rel-
evance could be highlighted in presenting media objects both in the Related Media Panel and in the Graph-based Related
Media Panel. Frequent computer users have suggested in particular some ‘usability’ improvements, for example removing
the necessity for users to explicitly press buttons to open the RMP, or to have all the results on a single scrolling page in
the Related Media Panel. Having no familiarity with complex queries, average computer users have complained about a lack
of clarity in the mechanism developed to perform search, but this particular aspect was not the focus in the interface devel-
opment. Finally, a non negligible number of users (regardless of the group they belonged to) pointed out that the number
of results in the graph and the graph depth are sometimes too high, leading to a possible lack of clarity that affects an easy
navigability of search results. In the future, these useful comments and suggestions provided by the users will be addressed
to improve the whole approach, as discussed in the next section.
7. Conclusions and future work
In Multimedia Information Retrieval, an effective solution to the problem of defining an integrated presentation of search
results constituted by multiple media objects remains an open issue, which requires further investigation. In this paper, a
graph-based approach for visualizing and exploring a multimedia search result space has been described. The proposed approach
exploits a graph-based data model that associates multiple media objects with multimedia documents via the part-of relationship,
and that connects media objects via semantic similarity relationships built across the access modalities characterizing them.
Each media object can, in fact, be characterized by textual, acoustic, and visual features, as well as by their combinations
(depending on the media type). An image object can have textual information associated with it (e.g., its caption, the name of
the file, etc.). In the same way, a video object can be accessed via both a visual and an acoustic modality, by considering the
visual and acoustic information associated with it. By evaluating the similarity between the low-level features characterizing
media objects along their different access modalities, we have formalized three different kinds of connections between them, i.e.,
t-links, a-links, and v-links, respectively. On top of this model, we have developed a full-fledged Graphical User Interface that
provides a blended integration of the retrieved media objects. Furthermore, the interface allows users to explore search results
both via a window-based navigation and via interactive graph-based visualizations.
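Purely as an illustration of the data model summarized above, the following sketch shows one possible in-memory encoding of documents, media objects, and typed similarity links; the class names, the cosine similarity measure, and the per-modality thresholds are assumptions made for this example and are not taken from the paper.

# Hedged sketch of the graph-based data model: multimedia documents contain media
# objects (part-of), and two media objects are connected by a t-link, a-link, or
# v-link when their similarity along the corresponding access modality (textual,
# acoustic, visual) reaches a threshold. All names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class MediaObject:
    obj_id: str
    media_type: str                       # 'text', 'image', 'audio' or 'video'
    features: Dict[str, List[float]]      # access modality -> low-level feature vector

@dataclass
class MultimediaDocument:
    doc_id: str
    media_objects: List[MediaObject] = field(default_factory=list)  # part-of relationship

LINK_TYPES = {'textual': 't-link', 'acoustic': 'a-link', 'visual': 'v-link'}

def cosine(u: List[float], v: List[float]) -> float:
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return num / den if den else 0.0

def build_links(objects: List[MediaObject],
                thresholds: Dict[str, float]) -> List[Tuple[str, str, str, float]]:
    """Connect pairs of media objects that share an access modality."""
    links = []
    for i, x in enumerate(objects):
        for y in objects[i + 1:]:
            for modality, link_type in LINK_TYPES.items():
                if modality in x.features and modality in y.features:
                    sim = cosine(x.features[modality], y.features[modality])
                    if sim >= thresholds[modality]:
                        links.append((x.obj_id, y.obj_id, link_type, sim))
    return links

Under this encoding, for instance, an image object carrying both visual features and a textual caption could be connected to a text object via a t-link and to another image object via a v-link, depending on the chosen thresholds.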
In the future, our primary objective will be to study and improve the proposed approach from different viewpoints.
Concerning the theoretical aspects connected to the data model, other data structures (such as trees and their variations) will be
investigated. This could serve the purpose of providing different ways of visualizing and exploring a search result space. We
are also interested in implementing solutions for reducing the size of the graph presented in the Graph-based Related
Media Panel (GRMP), to address the users’ comments connected to this aspect illustrated in Section 6.3. To present a lower
number of results in the graph, it would be possible, for example, to select only the top-k documents (COs) and their con-
nected media objects. This solution, however, does not allow navigation within all search results: a user can only navigate across
the top-k ranked multimedia documents, and thus cannot reach the lower-ranked search results via the graph-based navi-
gation. Alternatively, the graph depth could be reduced by adjusting the threshold values used to establish t-links, a-links, and
v-links, since stricter similarity thresholds produce fewer links and hence a smaller and shallower graph.
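As a sketch of the top-k pruning strategy just mentioned, and building on the illustrative data model sketched earlier, the following function keeps only the media objects of the top-k ranked documents and the links between them; again, the names and structures are assumptions made for this example, not the system’s actual code.

# Hedged sketch of the top-k pruning idea: retain only the top-k ranked documents,
# their media objects, and the typed links whose two endpoints are both retained.
# Raising the per-modality thresholds passed to build_links() would further reduce
# the number of links and, consequently, the depth of the displayed graph.
def prune_to_top_k(ranked_docs, links, k):
    kept_objects = {obj.obj_id
                    for doc in ranked_docs[:k]
                    for obj in doc.media_objects}
    kept_links = [(src, dst, link_type, sim)
                  for (src, dst, link_type, sim) in links
                  if src in kept_objects and dst in kept_objects]
    return kept_objects, kept_links

Calling prune_to_top_k with a small k would yield the reduced graph discussed above, at the cost of hiding lower-ranked results from the graph-based navigation.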
Regarding the Graphical User Interface, we plan to improve its general appearance so as to provide users with a simpler
search interface, and to implement additional visualization and exploration options. In particular, we aim to highlight the
relevance of search results in the different panels, and to improve the way media objects are represented and organized
in the GRMP, while preserving and enhancing the ease of use of the GUI.
References
[1] A. Axenopoulos, S. Manolopoulou, P. Daras, Multimodal search and retrieval using manifold learning and query formulation, in: Proceedings of the
16th International Conference on 3D Web Technology, ACM, 2011, pp. 51–56.
[2] A. Axenopoulos, P. Daras, S. Malassiotis, V. Croce, M. Lazzaro, J. Etzold, P. Grimm, A. Massari, A. Camurri, T. Steiner, et al., I-SEARCH: a unified framework
for multimodal search and retrieval, in: The Future Internet, Springer, 2012, pp. 130–141.
[3] A. Bangor, P. Kortum, J. Miller, Determining what individual SUS scores mean: adding an adjective rating scale, J. Usability Stud. 4 (3) (2009) 114–123.
[4] D. Bounie, L. Gille, Info capacity| international production and dissemination of information: results, methodological issues and statistical perspectives,
Int. J. Commun. 6 (2012) 21.
[5] M. Bron, J. Van Gorp, F. Nack, L.B. Baltussen, M. de Rijke, Aggregated search interface preferences in multi-session search tasks, in: Proceedings of the
36th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2013, pp. 123–132.
[6] J. Brooke, SUS - a quick and dirty usability scale, Usability Eval. Ind. 189 (194) (1996) 4–7.
[7] J. Brooke, SUS: a retrospective, J. Usability Stud. 8 (2) (2013) 29–40.
[8] S.A. Chatzichristofis, Y.S. Boutalis, CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval, in: Computer
Vision Systems, Springer, 2008, pp. 312–322.
[9] S.A. Chatzichristofis, K. Zagoris, Y.S. Boutalis, N. Papamarkos, Accurate image retrieval based on compact composite descriptors and relevance feedback
information, Int. J. Pattern Recognit. Artif. Intell. 24 (02) (2010) 207–244.
[10] J.P. Chin, V.A. Diehl, K.L. Norman, Development of an instrument measuring user satisfaction of the human-computer interface, in: Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems, in: CHI ’88, ACM, New York, NY, USA, 1988, pp. 213–218, doi:10.1145/57167.57203.
[11] M. Hearst, UIs for faceted navigation: recent advances and remaining open problems, in: HCIR 2008: Proceedings of the Second Workshop on Human-Computer Interaction and Information Retrieval, 2008, pp. 13–17.
[12] M. Hearst, Search User Interfaces, Cambridge University Press, 2009.
[13] A. Kopliku, F. Damak, K. Pinel-Sauvagnat, M. Boughanem, Interest and evaluation of aggregated search, in: Proceedings of the 2011 IEEE/WIC/ACM
International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01, in: WI-IAT ’11, IEEE Computer Society, Washington, DC,
USA, 2011, pp. 154–161, doi:10.1109/WI-IAT.2011.99.
[14] A. Kopliku, K. Pinel-Sauvagnat, M. Boughanem, Aggregated search: a new information retrieval paradigm, ACM Comput. Surv. (CSUR) 46 (3) (2014) 41.
[15] M. Lalmas, Aggregated search, in: Advanced Topics in Information Retrieval, Springer, 2011, pp. 109–123.
[16] T.-H. Le, H. Elghazel, M.-S. Hacid, A relational-based approach for aggregated search in graph databases, in: Database Systems for Advanced Applica-
tions, Springer, 2012, pp. 33–47.
[17] J.R. Lewis, IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use, Int. J. Hum.-Comput. Interact. 7 (1)
(1995) 57–78, doi:10.1080/10447319509526110.
[18] R. Likert, A technique for the measurement of attitudes., Arch. Psychol. 22 (140) (1932) 1–55.
[19] T.C. Nagavi, S. Anusha, P. Monisha, S. Poornima, Content based audio retrieval with MFCC feature extraction, clustering and sort-merge techniques, in:
Computing, Communications and Networking Technologies (ICCCNT), 2013 Fourth International Conference on, IEEE, 2013, pp. 1–6.
[20] J. Nielsen, T.K. Landauer, A mathematical model of the finding of usability problems, in: Proceedings of the INTERACT ’93 and CHI ’93 Conference on
Human Factors in Computing Systems, in: CHI ’93, ACM, New York, NY, USA, 1993, pp. 206–213, doi:10.1145/169059.169166.
[21] D. Rafailidis, S. Manolopoulou, P. Daras, A unified framework for multimodal retrieval, Pattern Recognit. 46 (12) (2013) 3358–3370.
[22] U. Rashid, M. Viviani, G. Pasi, M.A. Bhatti, The browsing issue in multimodal information retrieval: a navigation tool over a multiple media search
result space, in: T. Andreasen, H. Christiansen, J. Kacprzyk, H. Larsen, G. Pasi, O. Pivert, G. De Tré, M.A. Vila, A. Yazici, S. Zadrożny (Eds.), Flexible
Query Answering Systems 2015, Advances in Intelligent Systems and Computing, 400, Springer International Publishing, 2016, pp. 271–282, doi:10.1007/978-3-319-26154-6_21.
[23] M. Rigamonti, D. Lalanne, R. Ingold, FaericWorld: browsing multimedia events through static documents and links, in: Human-Computer Interaction–INTERACT 2007, Springer, 2007, pp. 102–115.
[24] G. Rizzo, T. Steiner, R. Troncy, R. Verborgh, J.L. Redondo García, R. Van de Walle, What fresh media are you looking for?: retrieving media items from
multiple social networks, in: Proceedings of the 2012 International Workshop on Socially-Aware Multimedia, ACM, 2012, pp. 15–20.
[25] M.L. Sapino, K.S. Candan, Multimedia information systems, in: Encyclopedia of Multimedia, 2nd Ed., 2008, pp. 554–562, doi:10.1007/978-0-387-78414-4_47.
[26] C. Seifert, J. Jurgovsky, M. Granitzer, FacetScape: a visualization for exploring the search space, in: Information Visualisation (IV), 2014 18th Interna-
tional Conference on, IEEE, 2014, pp. 94–101.
[27] M. Slaney, Precision-recall is wrong for multimedia, MultiMedia IEEE 18 (3) (2011) 4–7, doi:10.1109/MMUL.2011.50.
[28] S. Sushmita, H. Joho, M. Lalmas, A task-based evaluation of an aggregated search interface, in: J. Karlgren, J. Tarhio, H. Hyyrö (Eds.), String Processing and
Information Retrieval, Lecture Notes in Computer Science, 5721, Springer Berlin Heidelberg, 2009, pp. 322–333, doi:10.1007/978-3-642-03784-9_32.
[29] T.S. Tullis, J.N. Stetson, A comparison of questionnaires for assessing website usability, in: Usability Professional Association Conference, 2004, pp. 1–12.
[30] D. Tunkelang, Faceted search, Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan & Claypool Publishers, 2009, doi:10.2200/S00190ED1V01Y200904ICR005.
[31] Y. Vaizman, B. McFee, G. Lanckriet, Codebook-based audio feature representation for music information retrieval, Audio Speech Lang. Process. IEEE/ACM
Trans. 22 (10) (2014) 1483–1493.
[32] S. Vrochidis, S. Papadopoulos, A. Moumtzidou, P. Sidiropoulos, E. Pianta, I. Kompatsiaris, Towards content-based patent image retrieval: a framework
perspective, World Patent Inf. 32 (2) (2010) 94–106.
[33] Z. Wang, M.D. Hoffman, P.R. Cook, K. Li, VFerret: content-based similarity search tool for continuous archived video, in: Proceedings of the 3rd ACM
Workshop on Continuous Archival and Retrieval of Personal Experiences, ACM, 2006, pp. 19–26.
[34] E. Zavesky, S.-F. Chang, C.-C. Yang, Visual islands: intuitive browsing of visual search results, in: Proceedings of the 2008 International Conference on
Content-Based Image and Video Retrieval, ACM, 2008, pp. 617–626.
[35] T. Zhang, J. Wu, D. Wang, T. Li, Audio retrieval based on perceptual similarity, in: Collaborative Computing: Networking, Applications and Worksharing
(CollaborateCom), 2014 International Conference on, IEEE, 2014, pp. 342–348.