Information Systems, Technology and Management INFS3608 Data & Information Management Semester 2, 2008 Assignment Cover Page
Name of Assignment
Tutorial Details
Tutors Name Greg Stephens Tutorial Day Wednesday Tutorial Time 4 PM
I certify that this assignment is my own work in which my sources are acknowledged and which I submit for the first time.
Introduction

Digital images, video, audio, animation, and graphics, together with text, are the typical forms of multimedia data. The acquisition, generation, storage, and processing of multimedia data in computers, and its transmission over networks, have grown tremendously in recent years. Several factors could explain this growth. One important factor is the ubiquity of the personal computer and its increasing computational power. Technological advancement has also produced high-resolution devices that can capture and display multimedia data (digital cameras, scanners, HD monitors, etc.). Demand for multimedia data keeps rising because images, voices, and movies are engaging, friendly, and easy to comprehend.

The huge amount of data in multimedia applications warrants the use of databases, since databases can provide consistency, integrity, security, and availability of the data. From the user's perspective, databases offer functionality for easy manipulation, query, and retrieval of relevant information from huge collections of stored data. Multimedia databases have to cope with the increasing volume of multimedia data used in various software applications, including journalism, art, entertainment, and digital libraries.

Some inherent qualities of multimedia data have influenced the design and development of multimedia databases. A multimedia database requires more capabilities than a traditional DBMS. These capabilities include unified frameworks for storing, processing, retrieving, and presenting a variety of media data types in various formats. At the same time, it has to adhere to numerical constraints that are not found in a typical database.

This report addresses some major challenges in designing a multimedia database. It discusses the general requirements of a multimedia database, query optimization, and performance evaluation as a way to measure the effectiveness of the database.

Design Issues inside Multimedia Database

The conceptual, logical, and physical design of multimedia databases has not been fully addressed and remains an active research area (Elmasri & Navathe, 2003). According to Subramanya (2000), there are several different types of multimedia data that a multimedia database should be able to handle. The broad classification is as follows:

Media Data: the actual data, such as images and videos, that is captured, digitized, processed, and controlled.
Media Format Data: information related to the media data after it goes through the acquisition, processing, and encoding phases. Examples include the sampling rate, frame rate, and encoding format.
Media Keyword Data: keyword descriptions related to the media data. For an audio recording, for instance, this might include the date, time, and place of recording and the person who is recorded. This is also referred to as content-descriptive data.
Media Feature Data: features extracted from the media data. Examples include the distribution of colors, the kinds of textures, and the different shapes present in an image. This is also referred to as content-dependent data.

The last three types can be referred to as metadata because they describe aspects of the media data.
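The four categories can be pictured as one record per stored item. The following Python is only a minimal sketch of such a record; the class names, field names, and sample values are hypothetical and are not taken from Subramanya (2000).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MediaFormatData:
    """Format information produced by acquisition, processing and encoding."""
    encoding: str
    sampling_rate_hz: Optional[int] = None
    frame_rate_fps: Optional[float] = None


@dataclass
class MediaObject:
    """One stored item: the raw media plus its three kinds of metadata."""
    media_data: bytes                 # the actual image, audio or video
    format_data: MediaFormatData      # media format data
    # content-descriptive data (date, place, speaker, ...)
    keyword_data: Dict[str, str] = field(default_factory=dict)
    # content-dependent data (color distribution, texture, shape, ...)
    feature_data: Dict[str, List[float]] = field(default_factory=dict)

    def metadata(self) -> dict:
        """Return everything except the raw media itself."""
        return {
            "format": self.format_data,
            "keywords": self.keyword_data,
            "features": self.feature_data,
        }


# Hypothetical audio recording; the bytes are a placeholder, not real audio.
recording = MediaObject(
    media_data=b"\x00\x01\x02\x03",
    format_data=MediaFormatData(encoding="PCM", sampling_rate_hz=44100),
    keyword_data={"place": "Sydney", "speaker": "unknown"},
    feature_data={"energy": [0.12, 0.40, 0.33]},
)
```

Keeping the raw media apart from the three metadata fields mirrors the classification: only the metadata needs to be consulted when searching or presenting results.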
The figure above shows how metadata is generated from the actual data. The media keyword data and media feature data are used as indices in the search process, while the media format data is used in presenting the retrieval results.

The design of a multimedia database is highly influenced by the inherent nature of multimedia data. Kalipsiz (2000) described some characteristics of media data: it is unstructured, temporal (which affects storage, manipulation, and presentation), huge in size, and complex in terms of representation and interpretation. To handle such data, the database should fulfill some important requirements, as outlined by Subramanya (2000):

Manage different types of input, output, and storage devices. Input data could come from a variety of sources such as scanners, MIDI devices, or video cameras. Typical output devices include HD monitors and speakers.
Handle a variety of data compression and storage formats. Even within a single application, the data may be encoded in a number of formats. For instance, MRI images of the brain require lossless coding or lossy coding with very stringent quality, while the requirements for X-ray images might be less stringent.
Integrate different data models. Some data are best handled by different types of databases; for example, video documents are best handled by an object-oriented database, while textual data are best handled using the relational model. All of these models should co-exist within the database.
Offer a variety of user-friendly query systems suited to different kinds of media. For users, easy-to-use queries and effective retrieval of information are highly desirable. Different forms could be used to query the same item. For example, a portion of interest in a video could be queried using either a few sample video frames or a clip of the corresponding audio track.
Handle different kinds of indices. The subjective nature of multimedia data makes the exact search and keyword-based indices of traditional databases ineffective. Multimedia databases need content-based queries and similarity search, e.g. searching for a person using face features in a collection of facial images.
Develop measures of data similarity that correspond well with perceptual similarity.
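To make the idea of content-based similarity search concrete, here is a minimal Python sketch that ranks stored images by the distance between their color histograms. The coarse two-bins-per-channel histogram, the Euclidean distance, and the tiny hand-made "images" are all assumptions chosen for illustration; a real system would use richer features and a distance that tracks perceptual similarity more closely.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 2) -> List[float]:
    """Normalized RGB histogram with `bins` buckets per channel."""
    counts = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        counts[idx] += 1.0
    total = len(pixels) or 1  # avoid division by zero for an empty image
    return [c / total for c in counts]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query: Sequence[float],
                 database: Dict[str, List[float]],
                 top_k: int = 1) -> List[Tuple[str, float]]:
    """Return the top_k (name, distance) pairs, closest first."""
    scored = [(name, euclidean(query, feat)) for name, feat in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical feature database: one histogram per stored image.
database = {
    "sunset": color_histogram([(250, 40, 30)] * 8 + [(200, 120, 20)] * 2),
    "forest": color_histogram([(20, 200, 30)] * 10),
    "ocean": color_histogram([(10, 30, 220)] * 10),
}

# Query by example: a mostly red image.
query = color_histogram([(240, 10, 10)] * 9 + [(10, 10, 10)])
best_name, best_distance = most_similar(query, database)[0]
```

The query is answered by ranking rather than by exact match, which is the key difference from keyword indices in a traditional database.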
There appears to be a large jump in the technical requirements for handling a multimedia database. Given this gap, it might take a while for multimedia databases to reach a mature state. In a similar vein, Apers & Kersten (1999) mention that fulfilling all the requirements of a good multimedia database with current technology is not feasible. Multimedia database design can therefore only continue to evolve with time, technological advancement, and continuous research.

Queries and Retrieval in Multimedia Database

Recent work in information retrieval has focused heavily on developing effective techniques for indexing and content-based retrieval (Apers & Kersten, 1999). Bohm et al. (2001) share this opinion and state that, in contrast with conventional indexing schemes, content-based retrieval requires the search for similar objects as a basic functionality of the database system. Achieving an efficient indexing scheme is the major challenge in developing content-based retrieval applications. A new indexing structure is needed because traditional indexing schemes cannot cope with the requirements of a multimedia database, such as dynamic indexing (Kiranyaz & Gabbouj, 2007). Some developments in query systems for multimedia databases are described below.

Hierarchical Cellular Tree

Query optimization depends on the index used inside the database. Kiranyaz & Gabbouj (2007) proposed the Hierarchical Cellular Tree (HCT) in order to develop an efficient indexing scheme based on the Metric Access Method. It has a hierarchical structure made up of one or more levels, with each level capable of holding one or more cells. Items are partitioned depending on their relative distances and stored within cells based on their degree of similarity. Each cell further contains a tree structure, a minimum spanning tree (MST), whose nodes refer to the database objects.

In addition, HCT is a self-organized tree that is implemented via a genetic programming method. This means that its operations are not externally controlled; instead they are carried out according to internal rules within a certain level. The outcome of an operation may unexpectedly trigger further operations on other levels. HCT indexing can index a multimedia database using any set of available features, given a similarity measure and a fusion mechanism. Incremental construction of the HCT body and the fitness check form the basis of HCT indexing. The indexing body algorithm takes an indexing genre G and the database D as its inputs.
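Each HCT cell organizes its items in a minimum spanning tree. As a minimal sketch of that one ingredient only (it is not the HCT construction or fitness-check algorithm of Kiranyaz & Gabbouj), the following Python builds an MST over a handful of feature vectors with Prim's algorithm. The Euclidean distance and the sample points are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Edge = Tuple[int, int, float]  # (index of item a, index of item b, distance)


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(items: Sequence[Vector],
                          distance: Callable[[Vector, Vector], float]) -> List[Edge]:
    """Prim's algorithm over the complete graph of pairwise distances."""
    if not items:
        return []
    # best[j] = (cheapest known distance from the tree to item j, tree item i)
    best = {j: (distance(items[0], items[j]), 0) for j in range(1, len(items))}
    edges: List[Edge] = []
    while best:
        # Attach the item that is currently closest to the tree.
        j = min(best, key=lambda node: best[node][0])
        weight, i = best.pop(j)
        edges.append((i, j, weight))
        # The new tree member may offer a shorter link to the remaining items.
        for other in best:
            d = distance(items[j], items[other])
            if d < best[other][0]:
                best[other] = (d, j)
    return edges


# Hypothetical cell with four one-dimensional feature vectors.
cell_items = [(0.0,), (1.0,), (5.0,), (6.0,)]
mst_edges = minimum_spanning_tree(cell_items, euclidean)
```

Because the MST links each item to its most similar neighbours, the long edge between the two clusters in this sample stands out, which is the kind of information a cell can use to judge how compact it is.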
The fitness check is done to prevent corruption that might occur due to uncontrollable factors in the formation of the HCT body. HCT is mainly used with progressive sub-queries in order to retrieve the most relevant items as early as possible. It is also useful for efficient navigation among database items, because users are guided by nucleus items at each level. The experiments suggest that HCT works well with an increasing number of data items and can provide better item relevancy due to its strong discrimination features.

New Index Structure of RMD Tree

RNNMDS (Reverse Nearest Neighbor Multimedia Data Set) is one type of query that has recently received more attention. Given a multimedia data set A and a query point b, an RNNMD query finds all the points in A that have b as their nearest neighbor.
Mukherjee et al. (2004) based the data model relationship of their system on the figure above. Using this relationship, the system can generate direct and indirect queries and retrieve records in the given heterogeneous domain. The proposed RMD tree follows the structure of the standard R-tree, which is similar to the B-tree but is used for indexing multi-dimensional information. The difference is that the new structure stores extra information in each node about the nearest neighbor of the points, which can improve the search algorithm.
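The benefit of storing nearest-neighbor information can be seen even without a tree. The following Python is a brute-force sketch, not the RMD-tree algorithm of Mukherjee et al.: it precomputes for every point the distance to its nearest neighbor and then answers a reverse nearest neighbor query with one distance comparison per point. The function names and sample points are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_neighbor_distances(points: Sequence[Point]) -> List[float]:
    """For each point, the distance to its nearest other point in the set.

    Assumes the set holds at least two points.
    """
    return [
        min(math.dist(p, other) for j, other in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def reverse_nearest_neighbors(points: Sequence[Point],
                              dnn: Sequence[float],
                              query: Point) -> List[int]:
    """Indices of the points that would have `query` as a nearest neighbor.

    A point qualifies when the query is at least as close to it as its
    current nearest neighbor, so only the stored distance is needed.
    """
    return [i for i, p in enumerate(points) if math.dist(p, query) <= dnn[i]]


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
dnn = nearest_neighbor_distances(points)            # computed once, then stored
near_right_cluster = reverse_nearest_neighbors(points, dnn, (10.4, 0.0))
far_from_everything = reverse_nearest_neighbors(points, dnn, (5.0, 0.0))
```

A tree index improves on this sketch by using the stored distances to discard whole subtrees instead of testing every point.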
Figure 3. R-Tree Structure

Assume that S is a set of multimedia data points in a d-dimensional space. D(p, q) refers to the distance between two points p and q. If T is a subset of S, D(p, T) represents the minimum distance between p and any point in T. C(p, r) is a circle with center p and radius r. The objective of the search algorithm is to find the nearest neighbor multimedia data set NNMDs(q), defined as the subset of S containing the points whose distance to the query point q is minimal.
The reverse nearest neighbor multimedia data set RNNMDs(q), for a given set S of points in some dimensional space and a query point q, is defined as the subset of S containing the points that have q as their nearest neighbor.
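In the notation above, the two sets are commonly written as follows. This is the standard formulation from the nearest-neighbor literature; it is assumed, not quoted, to match the one used by Mukherjee et al. (2004).

```latex
\begin{align*}
\mathrm{NNMDs}(q)  &= \{\, r \in S \mid \forall p \in S :\; D(q, r) \le D(q, p) \,\} \\
\mathrm{RNNMDs}(q) &= \{\, r \in S \mid \forall p \in S \setminus \{r\} :\; D(r, q) \le D(r, p) \,\}
\end{align*}
```

The first set looks outward from q; the second asks, for each point r, whether q is at least as close to r as every other point of S.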
D(p, NNMDs(p)) is the distance between p and its nearest neighbors in S. In the proposed RMD-tree structure, a leaf node contains entries of the form (ptid, dnn), where 'ptid' refers to a multimedia data point in the multimedia data set and 'dnn' is the distance between that point and its nearest neighbor in the data set. A non-leaf node has an array of branches of the form (Ptr, Rect, max_dnn). 'Ptr' is the address of a child node in the tree. If 'Ptr' points to a leaf node, 'Rect' is the minimum bounding rectangle of all points in the leaf node. If 'Ptr' points to a non-leaf node, 'Rect' is the minimum bounding rectangle of all rectangles that are entries in the child node. max_dnn = max {dnns(p)}, where p ranges over the points in the subtree rooted at this node. The test results suggest that the structure has a significant impact in the dynamic case and overall performs better than the standard tree in terms of efficiency and cost.

The two systems discussed above are only examples of the many systems proposed for query optimization and information retrieval in multimedia databases. Many systems incorporate available technology into their newly proposed design. Most of them also pay attention to the similarity of objects (similarity queries), although each handles it in a different way, since multimedia data is more ambiguous in nature. However, there is no best practice for query and information retrieval in multimedia databases for the time being. None of the developed systems is perfect; each usually has its strength in a particular area, depending on factors such as the database environment and the type of data to be dealt with.

Performance Evaluation

Most research on multimedia databases has focused on data structures and query optimization, which are necessary for a multimedia database. However, less work has been done on performance evaluation from an analytical and simulation point of view.

Average Retrieval Time

Average retrieval time has an impact on the number of on-line optical disk drives to be installed, and it is therefore very important as a preliminary evaluation of a proposed multimedia database. Felician (1990) developed a formula for this measurement under several assumptions: only the average on-line access time (Ton) and off-line access time (Toff) are taken into account in evaluating the positioning time; segment transfer time is disregarded; and disks are kept on-line until another mount demand is issued. Under these assumptions, he derives the average access time t(j, k, Ton, Toff) to the segments of a document that is distributed on j on-line disks and (k - j) off-line disks.
It is then necessary to take a weighted average over the probability distribution of j, which may vary from max(0, k - N + D) to min(k, D), where P(j, k) is the probability that exactly j of the disks are on-line and the other (k - j) disks are off-line.
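Written out, the weighted average described above takes the following form. The symbol for the averaged time and the closed form for P(j, k) are assumptions: the bounds on j are exactly the support of a hypergeometric distribution, which is what results if N is read as the total number of disks, D as the number of on-line drives, and the k disks holding the document as a uniformly random subset of the N disks. Felician (1990) should be consulted for the exact expressions.

```latex
\begin{align*}
\bar{t}(k) &= \sum_{j=\max(0,\,k-N+D)}^{\min(k,\,D)} P(j,k)\; t(j,k,T_{on},T_{off}) \\
P(j,k) &= \frac{\binom{D}{j}\binom{N-D}{k-j}}{\binom{N}{k}} \qquad \text{(assumed hypergeometric form)}
\end{align*}
```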
In order to design a multimedia database with a good response time, the rule of thumb is to minimize the value of k rather than to let the number of on-line drives grow.
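The effect of k can be explored numerically. The Python below is a rough sketch under stated assumptions, not Felician's model: the positioning time for a document is taken to be j * Ton + (k - j) * Toff, and the number of on-line disks j is assumed to follow a hypergeometric distribution over N disks of which D are mounted. The function and parameter names, and the sample values, are hypothetical. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def p_online(j: int, k: int, n_disks: int, n_drives: int) -> float:
    """Assumed probability that exactly j of the document's k disks are on-line."""
    return comb(n_drives, j) * comb(n_disks - n_drives, k - j) / comb(n_disks, k)


def access_time(j: int, k: int, t_on: float, t_off: float) -> float:
    """Assumed positioning time: j on-line accesses plus (k - j) off-line ones."""
    return j * t_on + (k - j) * t_off


def expected_access_time(k: int, n_disks: int, n_drives: int,
                         t_on: float, t_off: float) -> float:
    """Weighted average of the access time over the distribution of j."""
    lowest = max(0, k - n_disks + n_drives)
    highest = min(k, n_drives)
    return sum(
        p_online(j, k, n_disks, n_drives) * access_time(j, k, t_on, t_off)
        for j in range(lowest, highest + 1)
    )


# Hypothetical setting: 10 disks, 2 drives, off-line access 100 times slower.
times_by_k = {
    k: expected_access_time(k, n_disks=10, n_drives=2, t_on=1.0, t_off=100.0)
    for k in (1, 2, 3)
}
```

Under these assumptions the expected time grows in proportion to k, which is consistent with the rule of thumb of keeping a document on as few disks as possible.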
Figure 4. Average access time to the image segments of a document distributed on k disks, as a function of the number of disks. Three curves are plotted for k = 1, 2, 3, for D = 2, 4, 8 drives respectively. The on-line and off-line access times are assumed to differ by a factor of 100.
Several external parameters, such as the initial number of documents in the database, the annual growth rate of the documents, the average number of images stored on a single disk, and the percentage of images deleted, affect the best choice for I and kmax (Felician, 1990). Continued experiments in this area would give database developers and practitioners a better idea of the performance requirements to consider when designing a multimedia database.
CONCLUSION

This paper has looked into some of the issues behind multimedia databases. Multimedia databases have become increasingly important with the advancement of technology and the growth of multimedia data. They are a relatively new technology in the database field and are still the subject of continuous research. The major challenges in developing a multimedia database are often related to design and query optimization, since it has to deal with the various kinds of data structures inherent in multimedia data. The design issues mainly concern the effective handling and presentation of different data structures, while the query and retrieval issues mainly concern indexing structures for efficient query and information retrieval. Many indexing and query systems have been proposed for multimedia databases, yet the most effective system has not been found and the area is still under extensive research. The performance evaluation of multimedia databases is an area that is often neglected. It should receive more attention, because it could act as an important tool for measuring the effectiveness of multimedia database development.
REFERENCES

Apers, P. & Kersten, M. 2000, Content Based Retrieval in Multimedia Databases Based on Feature Models, LNCS, Springer Berlin.
Bohm et al. 2001, Searching in High Dimensional Space: Index Structures for Improving the Performance of Multimedia Databases, ACM Computing Surveys, vol. 33 no. 3 pp. 322-373, accessed on 14th October 2008.
Elmasri & Navathe 2003, Fundamental of Database System, 4th Edition, Addison Wesley.
Felician, L. 1990, Simulative and Analytical Studies on Performance in Large Multimedia Database, Information System, vol. 15 no. 4 pp. 417-427, Pergamon Press, accessed on 15th October 2008.
Kalipsiz, O. 2000, Multimedia Databases, IEEE Xplore, accessed on 14th October 2008.
Kiranyaz, S. & Gabbouj, M. 2007, Hierarchical Cellular Tree: An Efficient Indexing Scheme for Content Based Retrieval on Multimedia Databases, IEEE Transaction on Multimedia, vol. 9 no. 1, accessed on 14th October 2008.
Mukherjee et al. 2004, Indexing and Searching in Multimedia Database, Indian Institute of Technology, IEEE Xplore, accessed on 14th October 2008.
Subramanya, S.R. 2000, Multimedia Databases Issues and Challenge, IEEE Xplore, December/January Issue, accessed on 14th October 2008.