Computer Graphics Systems Overview
UNIT I
INTRODUCTION
LEARNING OBJECTIVES
To understand different primitives such as lines and circles, and their construction using
different methods.
To learn different clipping algorithms and their importance in visualization.
Raster Algorithms
The goal of these raster algorithms is to convert graphical primitives into a set of
pixels as efficiently as possible. This process of converting from primitives to pixels is
known as scan conversion. These algorithms typically use incremental methods in order to
minimise the number of calculations (particularly multiplications and divisions) that are
performed during each iteration. These techniques use integer arithmetic rather than
floating-point arithmetic.
A beam of electrons emitted by an electron gun, passes through focusing and deflection
systems that direct the beam toward specified positions on the phosphor-coated screen.
The phosphor then emits a small spot of light at each position contacted by the electron
beam. Because the light emitted by the phosphor fades very rapidly, some method is
needed for maintaining the screen picture. One way to keep the phosphor glowing is to
redraw the picture repeatedly by quickly directing the electron beam back over the same
points. This type of display is called a refresh CRT.
The maximum number of points that can be displayed without overlap on a CRT is
referred to as the resolution. Typical resolution on high-quality systems is 1280 by 1024,
with higher resolutions available on many systems. High resolution systems are often referred
to as high-definition systems. Another property of video monitors is aspect ratio. This
number gives the ratio of vertical points to horizontal points necessary to produce equal-
length lines in both directions on the screen. An aspect ratio of 3/4 means that a vertical line
plotted with three points has the same length as a horizontal line plotted with four points.
The most common type of graphics monitor employing a CRT is the raster-scan
display, based on television technology. In a raster-scan system, the electron beam is
swept across the screen, one row at a time from top to bottom. As the electron beam
moves across each row, the beam intensity is turned on and off to create a pattern of
illuminated spots. Picture definition is stored in a memory area called the refresh buffer or
frame buffer. This memory area holds the set of intensity values for all the screen points.
Stored intensity values are then retrieved from the refresh buffer and “painted” on the
screen one row (scan line) at a time. Each screen point is referred to as a pixel or pel
(shortened form of picture element). The capability of a raster-scan system to store intensity
information for each screen point makes it well suited for the realistic display of scenes
containing subtle shading and color patterns. Home television sets and printers are examples
of other systems using raster-scan methods.
A system with 24 bits per pixel and a screen resolution of 1024 by 1024 requires 3
megabytes of storage for the frame buffer. On a black-and-white system with one bit per
pixel, the frame buffer is commonly called a bitmap. For systems with multiple bits per
pixel, the frame buffer is often referred to as a pixmap.
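As a quick check of the figure quoted above, the frame-buffer requirement follows directly from the resolution and the colour depth. The tiny helper below is purely illustrative (it is not part of any graphics package) and simply performs that calculation.

#include <stdio.h>

/* Frame-buffer size in bytes = width * height * (bits per pixel) / 8. */
long frameBufferBytes (long width, long height, long bitsPerPixel)
{
    return width * height * bitsPerPixel / 8;
}

int main (void)
{
    /* 1024 x 1024 at 24 bits per pixel: 3,145,728 bytes = 3 megabytes. */
    printf ("%ld bytes\n", frameBufferBytes (1024, 1024, 24));
    return 0;
}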
Color CRTs in graphics systems are designed as RGB monitors. These monitors use
shadow-mask methods and take the intensity level for each electron gun (red, green, and
blue) directly from the computer system without any intermediate processing. High-quality
raster-graphics systems have 24 bits per pixel in the frame buffer, allowing 256 voltage
settings for each electron gun and nearly 17 million color choices for each pixel. An RGB
color system with 24 bits of storage per pixel is generally referred to as a full-color system
or a true-color system.
Random scan monitors draw a picture one line at a time and for this reason are also
referred to as vector displays. The component lines of a picture can be drawn and refreshed
by a random-scan system in any specified order. A pen plotter operates in a similar way
and is an example of a random-scan, hard-copy device. Refresh rate on a random-scan
system depends on the number of lines to be displayed. Picture definition is now stored as
a set of line drawing commands in an area of memory referred to as the refresh display file.
Sometimes the refresh display file is called the display list, display program, or simply the
refresh buffer. To display a specified picture, the system cycles through the set of commands
in the display file, drawing each component line in turn. After all line drawing commands
have been processed, the system cycles back to the first line command in the list.
Random-scan displays are designed to draw all the component lines of a picture 30
to 60 times each second. Random-scan systems are designed for line drawing applications
and cannot display realistic shaded scenes. Since picture definition is stored as a set of line
drawing instructions and not as a set of intensity values for all screen points, vector displays
generally have higher resolution than raster systems. Also, vector displays produce smooth
line drawings because the CRT beam directly follows the line path. A raster system, in
contrast, produces jagged lines that are plotted as discrete point sets.
Flat-panel displays include small TV monitors, calculators, pocket video games, laptop
computers, and so on. We can separate flat-panel displays into two categories: emissive displays
and non-emissive displays. The emissive displays (or emitters) are devices that convert
electrical energy into light. Plasma panels, thin-film electroluminescent displays, and Light-
emitting diodes are examples of emissive displays. Non-emissive displays (or non-emitters)
use optical effects to convert sunlight or light from some other source into graphics patterns.
The most important example of a non-emissive flat-panel display is a liquid-crystal device.
Plasma panels, also called gas-discharge displays, are constructed by filling the
region between two glass plates with a mixture of gases that usually includes neon. Another
type of emissive device is the light-emitting diode (LED). A matrix of diodes is arranged
to form the pixel positions in the display, and picture definition is stored in a refresh buffer.
Liquid Crystal displays are used in small systems, such as calculators and portable,
laptop computers. These non-emissive devices produce a picture by passing polarized
light from the surroundings or from an internal light source through a liquid-crystal material
that can be aligned to either block or transmit the light. Rows of horizontal transparent
conductors are built into one glass plate, and columns of vertical conductors are put into
the other plate. The intersection of two conductors defines a pixel position. Polarized light
passing through the material is twisted so that it will pass through the opposite polarizer.
The light is then reflected back to the viewer. To turn off the pixel, we apply a voltage to the
two intersecting conductors to align the molecules so that the light is not twisted. This type
of flat-panel device is referred to as a passive-matrix LCD. Picture definitions are stored
in a refresh buffer, and the screen is refreshed at the rate of 60 frames per second, as in the
emissive devices. Another method for constructing LCD’s is to place a transistor at each
pixel location, using thin-film transistor technology. The transistors are used to control the
voltage at pixel locations and to prevent charge from gradually leaking out of the liquid-
crystal cells. These devices are called active-matrix displays.
Interactive raster graphics systems typically employ several processing units. In addition
to the central processing unit, or CPU, a special-purpose processor, called the video
controller or display controller, is used to control the operation of the display device. Here,
the frame buffer can be anywhere in the system memory, and the video controller accesses
the frame buffer to refresh the screen.
Raster scan systems also use a separate processor, sometimes referred to as a graphics
controller or a display coprocessor. The purpose of the display processor is to free the
CPU from the graphics chores. In addition to the system memory, a separate display
processor memory area can also be provided. A major task of the display processor is
digitizing a picture definition given in an application program into a set of pixel-intensity
values for storage in the frame buffer. This digitization process is called scan conversion.
Graphics commands specifying straight lines and other geometric objects are scan converted.
An application program is input and stored in the system memory along with a graphics
package. Graphics commands in the application program are translated by the graphics
package into a display file stored in the system memory. This display file is then accessed
by the display processor to refresh the screen. The display processor cycles through each
command in the display file program once during every refresh cycle. Sometimes the display
processor in a random-scan system is referred to as a display processing unit or a graphics
controller.
Mouse: A mouse is a small hand-held box used to position the screen cursor.
Trackball and Spaceball: As the name implies, a trackball is a ball that can be rotated
with the fingers or palm of the hand to produce screen-cursor movement. While a trackball
is a two-dimensional positioning device, a spaceball provides six degrees of freedom.
Joysticks : A joystick consists of a small, vertical lever (called the stick) mounted on a
base that is used to steer the screen cursor around.
Data Glove: A data glove can be used to grasp a “virtual” object. The glove is
constructed with a series of sensors that detect hand and finger motions.
Image Scanners: Drawings, graphs, color and black-and-white photos, or text can be
stored for computer processing with an image scanner by passing an optical scanning
mechanism over the information to be stored. The gradations of gray scale or color are
then recorded and stored in an array.
Touch Panels: As the name implies, touch panels allow displayed objects or screen
positions to be selected with the touch of a finger. A typical application of touch panels is
for the selection of processing options that are represented with graphical icons.
Light Pens: These pencil-shaped devices are used to select screen positions by detecting
the light coming from points on the CRT screen.
Voice Systems: Speech recognizers are used in some graphics workstations as input
devices to accept voice commands
The quality of the pictures obtained from a device depends on dot size and the number
of dots per inch, or Lines per inch, that can be displayed. Printers produce output by either
impact or non-impact methods. Impact printers press formed character faces against an
inked ribbon onto the paper. A line printer is an example of an impact device, with the
typefaces mounted on bands, chains, drums, or wheels. Non-impact printers and plotters
use laser techniques, ink-jet sprays, xerographic processes (as used in photocopying
machines), electrostatic methods, and electro thermal methods to get images onto Paper.
In a laser device, a laser beam creates a charge distribution on a rotating drum coated with
a photoelectric material, such as selenium. Toner is applied to the drum and then transferred
to paper. Ink-jet methods produce output by squirting ink in horizontal rows across a roll
of paper wrapped on a drum. The electrically charged ink stream is deflected by an electric
field to produce dot-matrix patterns.
Coordinate Systems
Any picture definition must be converted to Cartesian coordinates before they can be
input to the graphics package. Several different Cartesian reference frames are used to
construct and display a scene. We can construct the shape of individual objects, such as
trees or furniture, in a scene within separate coordinate reference frames called modeling
coordinates, or sometimes local coordinates or master coordinates. Once individual object
shapes have been specified, we can place the o b F s into appropriate positions within the
scene using a reference frame called world coordinates. Finally, the world-coordinate
description of the scene is transferred to one or more output-device reference frames for
display. These display coordinate systems are referred to as device coordinates or screen
coordinates in the case of a video monitor.
Graphics Functions
Software standards
Graphical Kernel System (GKS): This system was adopted as the first graphics software
standard by the International Standards Organization.
PHIGS: The second software standard to be developed and approved by the standards
organizations was PHIGS (Programmer’s Hierarchical Interactive Graphics standard),
which is an extension of GKS. Increased capabilities for object modeling, color specification,
surface rendering, and picture manipulation are provided.
Although PHIGS presents a specification for basic graphics functions, it does not
provide a standard methodology for a graphics interface to output devices. Nor does it
specify methods for storing and transmitting pictures. Separate standards have been
developed for these areas. Standardization for device interface methods is given in the
Computer Graphics Interface (CGI) system. And the Computer Graphics Metafile (CGM)
system specifies standards for archiving and transporting pictures.
Choice devices: Graphics packages use menus to select programming options, parameter
values and object shapes to be used in constructing a picture. A choice device is defined as
one that enters a selection from a list of alternatives. Commonly used choice devices are a
set of buttons, a cursor positioning device, such as a mouse, trackball, or keyboard cursor
keys, and a touch panel.
For screen selection of listed menu options, we can use cursor-control devices. When
a coordinate position (x, y) is selected, it is compared to the coordinate extents of each
listed menu item. A menu item with vertical and horizontal boundaries at the coordinate
values xmin, xmax, ymin and ymax is selected if the input coordinates (x, y) satisfy the
inequalities xmin ≤ x ≤ xmax and ymin ≤ y ≤ ymax.
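A minimal sketch of this hit test is given below; the MenuItem record and the function name are illustrative assumptions, not part of any particular graphics package.

typedef struct { double xmin, xmax, ymin, ymax; } MenuItem;

/* Returns 1 if the selected position (x, y) lies inside the item's extents. */
int menuItemHit (MenuItem item, double x, double y)
{
    return x >= item.xmin && x <= item.xmax &&
           y >= item.ymin && y <= item.ymax;
}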
Pick devices: These are used to select parts of a scene that are to be transformed or
edited in some way. Typical devices used for object selection are the same as those for
menu selection: the cursor-positioning devices. With a mouse or joystick, we can position
the cursor over the primitives in a displayed structure and press the selection button. The
position of the cursor is then recorded, and several levels of search may be necessary to
locate the particular object that is to be selected. First the cursor position is compared to
the coordinate extents of the various structures in the scene. If the bounding rectangle of a
structure contains the cursor coordinates, the picked structure has been identified. But if
two or more structure areas contain the cursor coordinates, further checks are necessary.
Graphical input functions can be set up to allow users to specify the following options:
Which physical devices are to provide input within a particular logical classification
(for example, a tablet used as a stroke device)
How the graphics program and devices are to interact (input mode).
When the data are to be input and which device is to be used at that time to deliver
a particular input type to the specified data variables.
Input modes
Functions to provide input can be structured to operate in various input modes which
specify how the program and input devices interact. Input could be initiated by the program,
or the program and input devices both could be operating simultaneously, or data input
could be initiated by the devices. These three input modes are referred to as request
mode, sample mode, and event mode.
In request mode, the application program initiates data entry. Input values are requested
and processing is suspended until the required values are received.
In sample mode, the application program and input devices operate independently.
Input devices now may be operating at the same time that the program is processing other
data. New input values from the input devices are stored, replacing previously input data
values. When the program requires new data, it samples the current values from the input
devices.
In event mode, the input devices initiate data input to the application program. The
program and the input devices again operate concurrently, but now the input devices deliver
data to an input queue. All input data are saved. When the program requires new data, it
goes to the data queue.
There are two popular methods, namely DDA and Bresenham’s methods through
which line can be scan converted. Let us discuss about Digital Differential Algorithm.
Normally two end points are given as input to the algorithm. From the given points the
slope is found. The slope can be positive or negative and is, in general, a floating-point
value, whereas the system can display only discrete, integer-addressable pixel positions.
The ideal line is the one with exact x and y coordinates as real values. The goal is to
identify the pixels that lie nearest to the ideal line imposed on the 2D raster grid; these
pixels form the scan-converted line. The scan-converted line should have constant and even
brightness and thickness, independent of orientation and length, and should be drawn as
rapidly as possible. The scenario is as pictured below, where the shaded squares indicate
the candidate displayable pixels and the dark line is the ideal line. The question is: how do
we choose which of these pixels should be selected for a proper display of the line?
Figure: Which of these pixels should be chosen?
DDA Algorithm
yi = m·xi + B
⇒ yi+1 = m·xi+1 + B
⇒ yi+1 = m·(xi + Δx) + B
⇒ yi+1 = yi + m·Δx
For unit steps in x (Δx = 1, |m| ≤ 1):  yi+1 = yi + m
For unit steps in y (Δy = 1, |m| > 1):  xi+1 = xi + 1/m
Since program variables have limited precision, the cumulative build-up of error can
cause problems for long lines (a disadvantage of the DDA approach).
#include <math.h>

void ddaLine (int x0, int y0, int x1, int y1)
{
    double dy = y1 - y0;
    double dx = x1 - x0;
    double m  = dy / dx;                    /* slope (assumes dx != 0)              */
    int x, y;

    if (fabs (m) <= 1.0)                    /* gentle slope: step along x           */
    {
        double yf = y0;                     /* assumes x0 < x1                      */
        for (x = x0; x <= x1; x++)
        {
            writePixel (x, (int)(yf + 0.5));  /* round to the nearest scan line     */
            yf = yf + m;                      /* incremental update: y(i+1)=y(i)+m  */
        }
    }
    else                                    /* steep slope: step along y            */
    {
        double xf = x0;                     /* assumes y0 < y1                      */
        for (y = y0; y <= y1; y++)
        {
            writePixel ((int)(xf + 0.5), y);
            xf = xf + 1.0 / m;                /* x(i+1) = x(i) + 1/m                */
        }
    }
}
However, the routine still does not deal with the special case of vertical lines, where
dx = 0 and the slope is infinite; these must be handled separately to avoid a division by zero.
The black dot is the current pixel position on the line. Since the slope is assumed to be
less than one, the x coordinate is simply incremented for the next pixel, and the choice for
the next pixel's y coordinate is between the pixels NE and E. Let Q be the intersection of
the line with the grid line x = xp + 1. The obvious choice is to pick the pixel closest to Q.
Another way to ask the same question is: is Q above or below M (the midpoint between E
and NE)? The error in the choice is always ≤ 1/2. To determine on which side of M the
point Q lies, we use the implicit form of the equation of the line:
F(x, y) = ax + by + c = 0
For a line from (x0, y0) to (x1, y1), with dx = x1 − x0 and dy = y1 − y0 and slope-intercept
form y = (dy/dx)x + B, we can take a = dy, b = −dx and c = B·dx. (The b in the
slope-intercept form and the b in the implicit form are unrelated.) The sign of
d = F(xp + 1, yp + 1/2), the function evaluated at the midpoint M, determines whether M
lies above or below the line, and therefore whether E or NE is closer to Q.
Once we have chosen either E or NE, we can calculate the value of d for the next grid
line x = xp + 2.
If E is chosen:
dnew = F(xp + 2, yp + 1/2) = a(xp + 2) + b(yp + 1/2) + c
but
dold = a(xp + 1) + b(yp + 1/2) + c
so dnew = dold + a.
If NE is chosen:
dnew = F(xp + 2, yp + 3/2) = a(xp + 2) + b(yp + 3/2) + c = dold + a + b
The starting value of the decision variable is
F(x0 + 1, y0 + 1/2) = a(x0 + 1) + b(y0 + 1/2) + c = ax0 + by0 + c + a + b/2
                    = F(x0, y0) + a + b/2 = a + b/2
since (x0, y0) lies on the line, so F(x0, y0) = 0.
For compatibility with the circle and ellipse algorithms we will increment x and y after
updating the decision variable.
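Putting the derivation together, the sketch below shows a first-octant midpoint line routine. It is a hypothetical helper in the same style as the DDA routine above (it reuses the assumed writePixel call), restricted to 0 ≤ slope ≤ 1 and x0 < x1, and it keeps d scaled by 2 so that only integer arithmetic is needed.

void midpointLine (int x0, int y0, int x1, int y1)
{
    int dx = x1 - x0;
    int dy = y1 - y0;
    int d  = 2 * dy - dx;          /* initial decision value = 2*(a + b/2)        */
    int incrE  = 2 * dy;           /* increment when E is chosen                  */
    int incrNE = 2 * (dy - dx);    /* increment when NE is chosen                 */
    int x = x0, y = y0;

    writePixel (x, y);
    while (x < x1)
    {
        if (d <= 0)
            d += incrE;            /* choose E: y stays on the same scan line     */
        else
        {
            d += incrNE;           /* choose NE: step up one scan line            */
            y++;
        }
        x++;
        writePixel (x, y);
    }
}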
Extending the code to cover the other seven octants can be done by looking at the
relationships between the octants.
When scanning a line from P0 to P1 and the decision variable is 0, we choose the E
pixel. When scanning the same line from P1 to P0 and the decision variable is 0, picking the
W pixel would be a mistake: it would produce different pixels when travelling in different
directions, so the SW pixel must be chosen instead.
Variable intensity: a line with slope 0 will have the same number of pixels as a line with
slope 1, even though the second line is √2 times longer.
x² + y² = r²
y = √(r² − x²)
We can increment x from 0 to r and solve for y. This has two major problems:
– it is inefficient: each step requires a multiplication and a square root;
– as the slope becomes large, gaps appear between the plotted points.
We focus on one octant of the circle and then map the results to the other seven octants.
Similar to the midpoint line algorithm, we choose between two candidate “next” pixels
by using a midpoint criterion.
Let F(x, y) = x² + y² − r²
– F is zero for points on the circle,
– F is positive for points outside the circle,
– F is negative for points inside the circle.
To remove the fractions from the calculation, substitute n = d − 1/4. The initial value is
then n = 1 − r, and since n will be incremented by integer values (ΔE, ΔSE) we can change
the test to n < 0.
We can improve the algorithm further by realising that ΔE and ΔSE are linear functions
of the current point. By applying the differencing approach again we can update them
incrementally.
If we choose E, the evaluation point moves to (xp + 1, yp):
ΔEold = ΔE(xp, yp) = 2xp + 3
ΔEnew = ΔE(xp + 1, yp) = 2(xp + 1) + 3
ΔEnew − ΔEold = 2
ΔSEold = ΔSE(xp, yp) = 2xp − 2yp + 5
ΔSEnew = ΔSE(xp + 1, yp) = 2(xp + 1) − 2yp + 5
ΔSEnew − ΔSEold = 2
Similarly, if we choose SE, the evaluation point moves to (xp + 1, yp − 1):
ΔEold = ΔE(xp, yp) = 2xp + 3
ΔEnew = ΔE(xp + 1, yp − 1) = 2(xp + 1) + 3
ΔEnew − ΔEold = 2
ΔSEold = ΔSE(xp, yp) = 2xp − 2yp + 5
ΔSEnew = ΔSE(xp + 1, yp − 1) = 2(xp + 1) − 2(yp − 1) + 5
ΔSEnew − ΔSEold = 4
The midpoint circle algorithm with second-order differences is therefore as given below.
Pseudo code for the midpoint circle algorithm with second-order differences:
void midpointcircle (int radius)
{
    int x = 0;
    int y = radius;
    int d = 1 - radius;               /* decision variable (fractions removed)  */
    int deltae  = 3;                  /* second-order difference for E moves    */
    int deltase = (-2 * radius) + 5;  /* second-order difference for SE moves   */

    writepixel (x, y);
    while (y > x)
    {
        if (d < 0)
        {   /* choose E */
            d = d + deltae;
            deltae  = deltae + 2;
            deltase = deltase + 2;
        }
        else
        {   /* choose SE */
            d = d + deltase;
            deltae  = deltae + 2;
            deltase = deltase + 4;
            y--;
        }
        x++;
        writepixel (x, y);
    }
}
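The routine above generates the points of one octant only. A full circle is usually obtained by plotting, for each generated (x, y), all eight symmetric points about the circle's centre. The helper below is a hedged sketch of that idea; the centre coordinates xc and yc are assumed extra parameters that do not appear in the routine above.

/* Plot the eight symmetric points of (x, y) about the centre (xc, yc). */
void circlePoints (int xc, int yc, int x, int y)
{
    writepixel (xc + x, yc + y);  writepixel (xc - x, yc + y);
    writepixel (xc + x, yc - y);  writepixel (xc - x, yc - y);
    writepixel (xc + y, yc + x);  writepixel (xc - y, yc + x);
    writepixel (xc + y, yc - x);  writepixel (xc - y, yc - x);
}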
1.5 LINE CLIPPING
Cohen-Sutherland 2D Clipping Method
The Cohen-Sutherland line-clipping algorithm clips lines to rectangular clipping regions.
The clipping region is defined by four clipping planes.
The 2D plane is divided into regions, each with its own outcode.
The endpoints of each line to be clipped are assigned an outcode depending on their
location.
#include <stdbool.h>

typedef unsigned char byte;

/* Outcode bits: one per clipping plane */
enum { top = 8, bottom = 4, right = 2, left = 1 };

byte computeoutcode (double x, double y,
                     double xmin, double xmax, double ymin, double ymax);

void cohenSutherlandClip (double x0, double y0, double x1, double y1,
                          double xmin, double xmax, double ymin, double ymax)
{
    bool accept = false, done = false;
    byte outcode0, outcode1, outcodeout;
    double x, y;

    outcode0 = computeoutcode (x0, y0, xmin, xmax, ymin, ymax);
    outcode1 = computeoutcode (x1, y1, xmin, xmax, ymin, ymax);
    do
    {
        if (!(outcode0 | outcode1))
        {
            accept = true; done = true;   /* trivial accept */
        }
        else if (outcode0 & outcode1)
        {
            done = true;                  /* trivial reject */
        }
        else
        {
            /* Calculate intersection with a clipping plane using
               y = y0 + slope*(x-x0) and x = x0 + (1/slope)*(y-y0) */
            outcodeout = outcode0 ? outcode0 : outcode1;
            if (outcodeout & top)
            {
                x = x0 + (x1-x0)*(ymax-y0)/(y1-y0);
                y = ymax;
            }
            else if (outcodeout & bottom)
            {
                x = x0 + (x1-x0)*(ymin-y0)/(y1-y0);
                y = ymin;
            }
            else if (outcodeout & right)
            {
                y = y0 + (y1-y0)*(xmax-x0)/(x1-x0);
                x = xmax;
            }
            else /* outcodeout & left */
            {
                y = y0 + (y1-y0)*(xmin-x0)/(x1-x0);
                x = xmin;
            }
            /* Replace the outside endpoint with the intersection point
               and recompute its outcode */
            if (outcodeout == outcode0)
            {
                x0 = x; y0 = y;
                outcode0 = computeoutcode (x0, y0, xmin, xmax, ymin, ymax);
            }
            else
            {
                x1 = x; y1 = y;
                outcode1 = computeoutcode (x1, y1, xmin, xmax, ymin, ymax);
            }
        }
    } while (done == false);

    if (accept)
    {
        drawline (x0, y0, x1, y1);
    }
}

byte computeoutcode (double x, double y,
                     double xmin, double xmax, double ymin, double ymax)
{
    byte outcode = 0;
    if (y > ymax)
        outcode = outcode | top;
    else if (y < ymin)
        outcode = outcode | bottom;
    if (x > xmax)
        outcode = outcode | right;
    else if (x < xmin)
        outcode = outcode | left;
    return outcode;
}
All-or-none text clipping: since STRING 1 lies only partially inside the clip window,
STRING 1 is completely excluded (clipped); only STRING 2 is displayed after clipping.
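A minimal sketch of the all-or-none test is given below. It assumes the string's bounding box has already been computed; the function and parameter names are illustrative only. The string is kept only when its entire extent lies inside the clip window.

/* Returns 1 if the string's bounding box lies completely inside the clip window. */
int stringVisible (double sxmin, double symin, double sxmax, double symax,
                   double wxmin, double wymin, double wxmax, double wymax)
{
    return sxmin >= wxmin && sxmax <= wxmax &&
           symin >= wymin && symax <= wymax;
}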
QUESTIONS
UNIT II
2D TRANSFORMATIONS
LEARNING OBJECTIVES
2.1 2D TRANSFORMATIONS
There are five types of transformations: translation, scaling and rotation, together with
the special transformations reflection and shear.
– Translation: repositioning an object by adding a translation vector T = (dx, dy) to each
of its coordinate points:

P = (x, y),   P' = (x', y'),   T = (dx, dy),   P' = P + T

– Scaling: changing the size of the object. This can be uniform, where both dimensions are
resized by the same factor, or non-uniform:

x' = sx · x,   y' = sy · y
sx = scaling coefficient in the x direction
sy = scaling coefficient in the y direction

or, in matrix form,

| x' |   | sx  0 | | x |
| y' | = | 0  sy | | y |

– Rotation:
Points are rotated around the origin by an angle θ. For complete objects such as
squares, rectangles or other polygons, each vertex is processed in turn and lines are then
drawn between the rotated vertices to complete the shape.
In matrix form, rotation by an angle θ about the origin maps the point (x, y) to:

x' = x·cos θ − y·sin θ
y' = x·sin θ + y·cos θ
The w = 1 plane is our 2D space, and the intersection of the line through a homogeneous
point with this plane gives us the corresponding 2D point.
In a homogeneous coordinate system, translation, scaling and rotation are all matrix
multiplications.
Homogeneous transformations
TRANSLATION

| x' |   | 1  0  dx | | x |
| y' | = | 0  1  dy | | y |
| 1  |   | 0  0  1  | | 1 |

SCALING

| x' |   | sx  0  0 | | x |
| y' | = | 0  sy  0 | | y |
| 1  |   | 0   0  1 | | 1 |

ROTATION

| x' |   | cos θ  −sin θ  0 | | x |
| y' | = | sin θ   cos θ  0 | | y |
| 1  |   |   0       0    1 | | 1 |
Suppose we wish to reduce the square in the following image to half its size and rotate
it by 45° about the point P. We need to remember that scaling and rotation take place with
respect to the origin, so we:
- translate the square so that the point about which the rotation is to occur is at the origin,
- scale,
- rotate,
- translate the origin back to the position P (a sketch of the composite matrix follows this list).
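The composite just described can be expressed as M = T(P)·R(45°)·S(1/2)·T(−P), applied right to left. The sketch below builds this matrix with 3×3 homogeneous matrices; the Mat3 type and all helper names are illustrative assumptions, not part of any particular package.

#include <math.h>

typedef double Mat3[3][3];

/* c = a * b for 3x3 homogeneous matrices. */
void matMul (Mat3 c, Mat3 a, Mat3 b)
{
    int i, j, k;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
        {
            c[i][j] = 0.0;
            for (k = 0; k < 3; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

/* Build M = T(P) * R(45) * S(1/2) * T(-P): translate P to the origin,
   scale by one half, rotate by 45 degrees, translate back to P. */
void buildComposite (Mat3 m, double px, double py)
{
    double a = 45.0 * 3.14159265358979 / 180.0;
    Mat3 t1 = {{1, 0, -px}, {0, 1, -py}, {0, 0, 1}};                  /* T(-P)  */
    Mat3 s  = {{0.5, 0, 0}, {0, 0.5, 0}, {0, 0, 1}};                  /* S(1/2) */
    Mat3 r  = {{cos(a), -sin(a), 0}, {sin(a), cos(a), 0}, {0, 0, 1}}; /* R(45)  */
    Mat3 t2 = {{1, 0, px}, {0, 1, py}, {0, 0, 1}};                    /* T(P)   */
    Mat3 rs, rst;

    matMul (rs,  r,  s);      /* R * S                 */
    matMul (rst, rs, t1);     /* R * S * T(-P)         */
    matMul (m,   t2, rst);    /* T(P) * R * S * T(-P)  */
}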
ANOTHER EXAMPLE
– This is just an approximation, and as the errors accumulate the image will become
unrecognizable.
– A better approximation is
x' = x − y·sin θ
y' = x'·sin θ + y = (x − y·sin θ)·sin θ + y = x·sin θ + y(1 − sin² θ)
The corresponding 2x2 matrix has a determinant of 1 and hence preserves areas.
2.1.5 Reflection
It is all about the generation of mirror effects with reference to the x axis, the y axis, or both.
The general form of the reflection matrix is as follows, where RX and RY are the
corresponding reflection coefficients:

| RX  0  0 |
| 0  RY  0 |
| 0   0  1 |

Reflection about the x axis (RX = 1, RY = −1):

| 1  0  0 |
| 0 -1  0 |
| 0  0  1 |

Reflection about the y axis (RX = −1, RY = 1):

| -1  0  0 |
|  0  1  0 |
|  0  0  1 |

The matrix for reflection about both the x and y axes is:

| -1  0  0 |
|  0 -1  0 |
|  0  0  1 |
Shear transformations create a 'slide over' effect on the object being operated on.
Imagine books stacked one above the other: if a force is applied in one direction, it causes
a distorted, slanted effect on the stack. There are x-direction and y-direction shears; the
corresponding transformation matrices are:

x-direction shear:

| 1  Shx  0 |
| 0   1   0 |
| 0   0   1 |

y-direction shear:

|  1   0  0 |
| Shy  1  0 |
|  0   0  1 |
Request Mode: The application program initiates data entry. Input values are
requested and processing is suspended until the required values are received; the program
waits until the data are delivered (only one input device will be operating).
Sample mode: The application program and input devices operate independently.
Input devices may be operating at the same time that the program is processing other
data. New input values are stored, replacing previous values. When the program requires
new data it samples the current values.
Event Mode: Program and the input devices again operate concurrently, but now
the input devices deliver data to an input queue. All input data are saved. When the program
requires new data, it goes to the data queue.
V = (V1, V2, V3, V4, V5)
E1 = (V1, V2, P2, ∅)
E2 = (V2, V3, P3, ∅)
E6 = (V5, V1, P1, ∅)
E7 = (V1, V6, P1, P2)
E8 = (V6, V2, P2, P3)
P1 = (E7, E5, E6),   P2 = (E1, E8, E7)
P3 = (E8, E2, E3, E4, E5)
When working with polygon meshes, it is important to ensure that the mesh
representation is consistent.
– All polygons are closed.
– Each edge is used at least once and not more than twice.
– Each vertex is referenced by at least two edges.
Plane equations
– If a polygon is defined by more than 3 vertices, the polygon may be non-planar.
– If the polygon is defined by 3 vertices it is planar and defined by:
ax + by + cz + d = 0
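For a triangle, the plane coefficients can be computed directly: the normal (a, b, c) is the cross product of two edge vectors, and d follows from any vertex. The sketch below is illustrative; the Vec3 type and the function name are assumptions.

typedef struct { double x, y, z; } Vec3;

/* Plane through p1, p2, p3: (a,b,c) = (p2-p1) x (p3-p1), d = -(a*x1 + b*y1 + c*z1). */
void planeFromTriangle (Vec3 p1, Vec3 p2, Vec3 p3,
                        double *a, double *b, double *c, double *d)
{
    double ux = p2.x - p1.x, uy = p2.y - p1.y, uz = p2.z - p1.z;
    double vx = p3.x - p1.x, vy = p3.y - p1.y, vz = p3.z - p1.z;

    *a = uy * vz - uz * vy;
    *b = uz * vx - ux * vz;
    *c = ux * vy - uy * vx;
    *d = -(*a * p1.x + *b * p1.y + *c * p1.z);
}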
A spline curve is defined by a set of control points. These control points are either:
– interpolating control points: the curve passes through these control points.
– approximating control points: these control point exert an influence of the curve.
The parametric equation for a line is:
x = x0(1 − t) + x1·t,   0 ≤ t ≤ 1
y = y0(1 − t) + y1·t,   0 ≤ t ≤ 1
or equivalently
x = x0 + (x1 − x0)·t
y = y0 + (y1 − y0)·t
which has the general linear form
x = a0 + a1·t
y = b0 + b1·t
A cubic parametric curve has the form
x = a0 + a1·t + a2·t² + a3·t³,   0 ≤ t ≤ 1
y = b0 + b1·t + b2·t² + b3·t³,   0 ≤ t ≤ 1
At the endpoints (t = 0 and t = 1):
x0 = a0,  y0 = b0
x1 = a0 + a1 + a2 + a3
y1 = b0 + b1 + b2 + b3
Curves that join have geometric continuity, G0. Curves whose tangent vectors have
equal magnitude and direction at the join have parametric continuity, C1.
A 4 point bezier curve is defined by 4 control points; p0, p1, p2, p3.
– p0 and p3 are interpolating control points through which the curve must pass.
– p1 and p2 are approximating control points that control the slope of the curve at
p0 and p3 respectively.
the slope of p0p1 defines the tangent gradient at p0 and the slope of p2p3 defines
the tangent gradient at p3.
In general, an n-point Bezier curve is given by

x(t) = Σ (r = 0 to n−1)  Br · xr · (1 − t)^(n−1−r) · t^r
y(t) = Σ (r = 0 to n−1)  Br · yr · (1 − t)^(n−1−r) · t^r

where the Br are the binomial (Bernstein) blending coefficients.
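For the 4-point case described above, this reduces to the familiar cubic form with blending functions (1−t)³, 3(1−t)²t, 3(1−t)t² and t³. The evaluator below is a sketch under that assumption; the Point2 type and function name are illustrative.

typedef struct { double x, y; } Point2;

/* Evaluate the cubic Bezier curve defined by p0..p3 at parameter t in [0,1]. */
Point2 bezierCubic (Point2 p0, Point2 p1, Point2 p2, Point2 p3, double t)
{
    double u  = 1.0 - t;
    double b0 = u * u * u;
    double b1 = 3.0 * u * u * t;
    double b2 = 3.0 * u * t * t;
    double b3 = t * t * t;
    Point2 p;

    p.x = b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x;
    p.y = b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y;
    return p;
}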
Example for the window can be a square/rectangle dotted lines present in most of the
camera’s viewing glass. There can be any number of view ports present or opened in the
viewing device. View ports are justified for the sake of different viewing devices like crt,
lcd panels etc which widely vary their dimensions in any practical scenario.
A point at position (xw, yw) in the window is mapped into position (xv, yv) in the
associated viewport. To maintain the same relative placement in the viewport as in the
window, we require that

(xv − xvmin) / (xvmax − xvmin) = (xw − xwmin) / (xwmax − xwmin)
(yv − yvmin) / (yvmax − yvmin) = (yw − ywmin) / (ywmax − ywmin)

where xvmin and xvmax are the minimum and maximum viewport coordinates and xwmin
and xwmax are the corresponding window coordinates (similarly for y). Solving gives

xv = xvmin + (xw − xwmin)·sx,   sx = (xvmax − xvmin) / (xwmax − xwmin)
yv = yvmin + (yw − ywmin)·sy,   sy = (yvmax − yvmin) / (ywmax − ywmin)
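A small sketch of this mapping as a routine (the parameter names are illustrative):

/* Map a window point (xw, yw) to the viewport point (*xv, *yv). */
void windowToViewport (double xw, double yw,
                       double xwmin, double xwmax, double ywmin, double ywmax,
                       double xvmin, double xvmax, double yvmin, double yvmax,
                       double *xv, double *yv)
{
    double sx = (xvmax - xvmin) / (xwmax - xwmin);
    double sy = (yvmax - yvmin) / (ywmax - ywmin);

    *xv = xvmin + (xw - xwmin) * sx;
    *yv = yvmin + (yw - ywmin) * sy;
}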
QUESTIONS
1. Perform a 2D scaling (sx = 5, sy = 5) of the point (10, 10), with respect to a reference
axis located about 30 degrees from the x axis.
2. What is composite transformation matrix? State the need.
3. State the examples for different types of input modes?
4. Define an efficient polygon representation for a cylinder.
5. Write a routine to display any specified conic in the xy plane using a rational Bezier
spline representation.
UNIT III
3D TRANSFORMATIONS
LEARNING OBJECTIVES
A view port is all about “where the captured image is to be displayed”.
Example for the window can be a square/rectangle dotted lines present in most of the
camera’s viewing glass. There can be any number of view ports present or opened in the
viewing device. View ports are justified for the sake of different viewing devices like crt,
lcd panels etc which widely vary their dimensions in any practical scenario.
To view the world we transform the world to place the viewer at the origin,
looking down an axis. Now the world is specified in the viewer coordinate
system (vcs).
This 3D world from the viewer's point of view is projected on to and clipped to
a viewing window. Now we have a 2D scene in the normalised device
coordinate system.
The last step is to rasterise the 2D image in the device coordinate system (dcs).
Viewing devices :
Viewing devices fall into two categories.
Calligraphic (vector graphic) devices:
– draw line segments and polygons directly;
– examples: plotters, laser light projection systems.
Raster devices:
– an image is represented as a regular grid of pixels (picture elements);
– examples: crt, lcd and plasma monitors.
Translation

| x' |   | 1  0  0  tx | | x |
| y' | = | 0  1  0  ty | | y |
| z' |   | 0  0  1  tz | | z |
| 1  |   | 0  0  0  1  | | 1 |
Scaling

| x' |   | sx  0   0   0 | | x |
| y' | = | 0   sy  0   0 | | y |
| z' |   | 0   0   sz  0 | | z |
| 1  |   | 0   0   0   1 | | 1 |
Rotation about the x axis

| x' |   | 1    0       0     0 | | x |
| y' | = | 0  cos θ  −sin θ   0 | | y |
| z' |   | 0  sin θ   cos θ   0 | | z |
| 1  |   | 0    0       0     1 | | 1 |
Multiplying a rotation matrix R = [rij] by the unit vectors along the coordinate axes picks
out its columns:

R · (1, 0, 0, 0)ᵀ = (r11, r21, r31, 0)ᵀ
R · (0, 1, 0, 0)ᵀ = (r12, r22, r32, 0)ᵀ
R · (0, 0, 1, 0)ᵀ = (r13, r23, r33, 0)ᵀ
3D rotations
The first, second and third columns of the upper-left 3x3 submatrix are the rotated
x-axis, rotated y-axis and rotated z-axis respectively.
R = | Rx  Ry  Rz  0 |
    | 0   0   0   1 |

where Rx, Ry and Rz are the column vectors of the rotated x, y and z axes. Given a
direction of flight dof, such a matrix can be built from the normalised columns

Rx = (y × dof) / |y × dof|,   Ry = dof × (y × dof) / |dof × (y × dof)|,   Rz = dof / |dof|
This approach has a limitation: if the dof is collinear with the y axis, then (y × dof) is
zero and the construction fails.
This general approach can be used when instantiating models (specified in a model
coordinate system) in our world coordinate system.
3.2 PROJECTIONS
Conceptually the 3D viewing process is:
The projection is defined by projection rays (projectors) that come from a centre
of projection, pass through each point of the object, and intersect a projection plane.
– If the centre of projection is at infinity we have a parallel projection.
– If the centre of projection is at a finite point we have a perspective projection.
Figure 3.5 Perspective and parallel projection scenario with projection lines
Since the direction of projection is a vector, and a vector is the difference between
two points, it can be calculated as (x, y, z, 1)ᵀ − (x', y', z', 1)ᵀ = (a, b, c, 0)ᵀ.
The above diagram shows an effect of perspective projection, namely perspective
foreshortening: the size of an object's projection is inversely proportional to its distance
from the centre of projection.
Specifying a 3D view
The projection plane, also called the view plane, is defined by a view reference
point (vrp) and the view plane normal (vpn).
The 3D viewing reference coordinate (vrc) system is formed from the vpn and the
view up vector (vup): the vpn defines the n-axis, the projection of the vup onto the
view plane defines the v-axis, and the u-axis completes the right-handed coordinate
system.
The viewing window in the view plane is defined by a centre of window (cw)
and minimum and maximum u and v values.
For a parallel projection the prp and the direction of projection (dop) define the
view. The dop is a vector from the prp to the cw.
The projectors from the prp through the minimum and maximum window coordinate
define a semi-infinite pyramid view volume (perspective projection) or semi-infinite
parallelepiped view volume (parallel projection).
We usually define a front and back clipping plane.
This is then mapped to normalised projection coordinates (npc) with the prp at
the origin and the back clipping plane mapped to 1.
            | 1  0  0  −vrpx |
T(−VRP)  =  | 0  1  0  −vrpy |
            | 0  0  1  −vrpz |
            | 0  0  0    1   |
The rotation step makes the vpn the z-axis:

Rz = vpn / |vpn|

– the u-axis, which is perpendicular to vup and to the z-axis, becomes the rotated x-axis:

Rx = (vup × Rz) / |vup × Rz|

and the rotated y-axis is Ry = Rz × Rx.
Step 3 is to shear along the z-axis to align the dop with the z-axis while maintaining the
vpn:

                                  | 1  0  shxpar  0 |
SHpar = SHxy(shxpar, shypar)  =   | 0  1  shypar  0 |
                                  | 0  0    1     0 |
                                  | 0  0    0     1 |

dop = cw − prp
The final step scales into the canonical view volume:

Sper = S( 2·prpn / ((umax − umin)(prpn − B)),
          2·prpn / ((vmax − vmin)(prpn − B)),
          1 / (prpn − B) )

where B is the position of the back clipping plane.
Computing intersections
This task is at the heart of any ray tracing algorithm.
where does the eye ray intersect an object?
The eye ray is defined by the cop (x0,y0,z0) and the centre of the pixel on the
window in the view plane (x1,y1,z1).
Using a parametric representation we have:
x = x0 + t(x1 − x0),   y = y0 + t(y1 − y0),   z = z0 + t(z1 − z0)
Let us define Δx = x1 − x0, and Δy and Δz similarly. Then
x = x0 + t·Δx,   y = y0 + t·Δy,   z = z0 + t·Δz
t ranges from 0 to 1 between the cop and the view plane; values of t greater than 1
correspond to points on the far side of the view plane.
Calculating the intersection of an eye ray with a sphere is relatively easy, which
explains why most ray casting examples are full of spheres!
The equation of a sphere centred on (a, b, c) with radius r is:
(x − a)² + (y − b)² + (z − c)² = r²
Expanding:
x² − 2ax + a² + y² − 2by + b² + z² − 2cz + c² = r²
Substituting x = x0 + t·Δx, etc.:
(x0 + t·Δx)² − 2a(x0 + t·Δx) + a² + (y0 + t·Δy)² − 2b(y0 + t·Δy) + b² + (z0 + t·Δz)² − 2c(z0 + t·Δz) + c² = r²
Multiplying out and collecting like terms in t gives:
(Δx² + Δy² + Δz²)t² + 2t(Δx(x0 − a) + Δy(y0 − b) + Δz(z0 − c)) + (x0 − a)² + (y0 − b)² + (z0 − c)² − r² = 0
This gives a quadratic in t that can be solved using the quadratic formula.
– If there are no real roots, the eye ray does not intersect the sphere.
– If there is one real root, the eye ray grazes the sphere at a single point.
– If there are two real roots, the eye ray intersects the sphere; the smaller t value
corresponds to the closer intersection point.
The surface normal of a sphere centred on (a, b, c) with radius r at the point of
intersection (x, y, z) is:
n = ((x − a)/r, (y − b)/r, (z − c)/r)
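The routine below is a sketch of this ray–sphere test under the assumptions above; the type and parameter names are illustrative. It returns the smaller positive root when the ray hits the sphere.

#include <math.h>

/* Ray: origin (x0,y0,z0), direction (dx,dy,dz).  Sphere: centre (a,b,c), radius r.
   Returns 1 and stores the nearest positive parameter value in *tHit on a hit. */
int raySphere (double x0, double y0, double z0,
               double dx, double dy, double dz,
               double a, double b, double c, double r, double *tHit)
{
    double A = dx*dx + dy*dy + dz*dz;
    double B = 2.0 * (dx*(x0-a) + dy*(y0-b) + dz*(z0-c));
    double C = (x0-a)*(x0-a) + (y0-b)*(y0-b) + (z0-c)*(z0-c) - r*r;
    double disc = B*B - 4.0*A*C;
    double t;

    if (disc < 0.0)
        return 0;                        /* no real roots: the ray misses the sphere */

    t = (-B - sqrt (disc)) / (2.0 * A);  /* smaller root = closest intersection      */
    if (t < 0.0)
        t = (-B + sqrt (disc)) / (2.0 * A);
    if (t < 0.0)
        return 0;                        /* sphere lies behind the ray origin        */

    *tHit = t;
    return 1;
}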
Refraction
– Snell's law uses indices of refraction η.
rt_colour rt_trace (rt_ray ray, int depth)
{
    rt_colour colour;
    rt_ray rray, tray; /* reflected & transmitted ray */
    rt_ray sray; /* shadow ray */
    rt_colour rcolour, tcolour;

    colour = ambient colour;
    for (each light)
    {
        sray = ray to light from point;
        if (dot product of normal and direction to light is positive)
        {
            compute how much light is blocked by opaque and
            transparent surfaces, and use to scale diffuse and
            specular terms before adding to colour;
        }
    }
    if (depth < maxdepth)
    {
        if (object is reflective)
        {
            rray = ray in reflection direction from point;
            rcolour = rt_trace (rray, depth+1);
            scale rcolour by specular coefficient and add to colour;
        }
        if (object is transparent)
        {
            tray = ray in refraction direction from point;
            if (total internal reflection does not exist)
            {
                tcolour = rt_trace (tray, depth+1);
                scale tcolour by transmission coefficient and add to colour;
            }
        }
    }
    return colour;
}
Comments on algorithm
Opaque objects totally block light whereas transparent objects scale the light’s
contribution.
Ray tracing suffers from problems caused by limited numerical precision.
When calculating secondary rays from an intersection point, t is reset to zero. If the
secondary ray immediately re-intersects the object it started from, it does so at a small
non-zero t value; this typically happens with shadow rays, resulting in incorrect "self
shadows". It is dealt with by treating |t| < ε as equivalent to zero.
Efficiency
Efficiency is even more important than before. With m light sources and a tree depth of n
there are 2ⁿ − 1 reflection and refraction rays in the ray tree, and m(2ⁿ − 1) shadow rays!
Since rays can come from any direction, view-volume clipping and back-surface culling
cannot be used.
– Pruning the size of the ray tree can have significant results.
za = z1-(z1-z2)(y1-ys)/(y1-y2)
zb = z1-(z1-z3)(y1-ys)/(y1-y3)
zp = zb-(zb-za)(xb-xp)/(xb-xa)
If the area of interest (aoi) is one of the previous four cases, apply the rule; otherwise
subdivide the aoi.
– We can stop when the aoi is equivalent to a pixel.
– We can go to sub-pixel resolution and average sub-pixels to provide antialiasing.
Natural objects, such as trees and clouds, can be modeled using fractals, shape
descriptors and particle systems. Special data structures may be required for data
representation in the case of multivariate data.
Some display devices are bilevel; that is, they produce just two intensity levels.
To increase the range of perceived intensity levels we can make use of the spatial
integration that our visual system performs:
– when viewing a small area from a distance we perceive the overall intensity of
the area.
Halftoning (also known as cluster-dot ordered dither) exploits this phenomenon.
The following 2x2 pixel area can produce 5 different intensity levels.
A dither matrix can express a set of dither patterns with a growth sequence. To
display an intensity i we turn on all pixels whose value is less than i. For the previous
2x2 patterns the dither matrix is:
Dither patterns should avoid isolated pixels, as some devices (laser printers and
printing presses) have difficulty reproducing isolated pixels consistently.
Error diffusion can be used to improve the image quality. In the Floyd-Steinberg
algorithm the difference between the exact intensity and the displayed value is
spread across the pixels to the right of and below the current pixel: 7/16 of the error
to the east pixel, 3/16 to the south-west pixel, 5/16 to the south pixel and 1/16 to the
south-east pixel.
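A sketch of Floyd-Steinberg error diffusion for a grey-scale image stored row by row is given below. The in-memory layout (one int per pixel, values 0..255) and the 128 threshold are assumptions made for this illustration.

/* In-place Floyd-Steinberg dithering of a width x height grey image (values 0..255),
   stored one int per pixel so that diffused error can go below 0 or above 255. */
void floydSteinberg (int *img, int width, int height)
{
    int x, y;
    for (y = 0; y < height; y++)
        for (x = 0; x < width; x++)
        {
            int old = img[y*width + x];
            int q   = (old < 128) ? 0 : 255;   /* quantise to black or white */
            int err = old - q;                 /* error to diffuse           */
            img[y*width + x] = q;

            /* 7/16 east, 3/16 south-west, 5/16 south, 1/16 south-east */
            if (x + 1 < width)                   img[y*width + x + 1]     += err * 7 / 16;
            if (y + 1 < height && x > 0)         img[(y+1)*width + x - 1] += err * 3 / 16;
            if (y + 1 < height)                  img[(y+1)*width + x]     += err * 5 / 16;
            if (y + 1 < height && x + 1 < width) img[(y+1)*width + x + 1] += err * 1 / 16;
        }
}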
Chromatic light
Colour perception involves:
– hue: the colour we see.
– saturation: how far the colour is from a grey of equal intensity.
Red is highly saturated whereas pink is relatively unsaturated. unsaturated colours
include more white light than saturated colours.
– lightness: perceived intensity of a reflecting object.
– brightness: perceived intensity of a self-luminous object.
Colourimetry
Colourimetry is the branch of physics that quantifies colours.
- the dominant wavelength is the wavelength of the light from
an object. it corresponds to hue.
- the excitation purity measures the proportion of pure light of the
dominant wavelength and of white light needed to define the colour. it corresponds
to saturation.
- Luminance is the amount of light intensity. it corresponds to lightness for reflecting
objects and brightness for self-luminous objects.
Tristimulus theory
The tristimulus theory is based on the hypothesis that the eye has 3 types of cones
(colour sensors). these cones’ peak sensitivity corresponds to red, green and
blue light.
CIE chromaticity diagram
The CIE (Commission Internationale de l'Eclairage) defined colours using three primaries
X, Y and Z to replace red, green and blue. These primaries require no negative values
in the matching process.
The point C defines standard white light (approximating sunlight); it is very near
x = y = z = 1/3.
100% spectrally pure colours lie on the curved boundary of the diagram.
The chromaticity diagram factors out luminance, so not all colours are represented:
– brown (orange-red at low luminance) is not in the diagram;
– there are an infinite number of such planes in the (x, y, z) space.
Complementary colours are those that mix to produce white light.
Colour gamuts
A typical use of the CIE chromaticity diagram is to define colour gamuts (or colour
ranges).
This is a typical CRT gamut: by mixing R, G and B we can match any colour inside the
triangle.
K = min (C, M, Y)
C' = C − K
M' = M − K
Y' = Y − K
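A sketch of the conversion from RGB through CMY to CMYK using these relations is given below; it assumes colour components in the range 0..1, and the function name is illustrative.

/* Convert an RGB colour (components in 0..1) to CMYK using undercolour removal. */
void rgbToCmyk (double r, double g, double b,
                double *c, double *m, double *y, double *k)
{
    double cc = 1.0 - r;          /* CMY are the complements of RGB */
    double mm = 1.0 - g;
    double yy = 1.0 - b;

    *k = cc;                      /* K = min(C, M, Y) */
    if (mm < *k) *k = mm;
    if (yy < *k) *k = yy;

    *c = cc - *k;                 /* C' = C - K, and similarly for M and Y */
    *m = mm - *k;
    *y = yy - *k;
}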
3.8 ANIMATION
ANIMATION TECHNIQUES
Animation techniques are incredibly varied and difficult to categorize, and they are often
related or combined. The following is a brief overview of common types of animation.
Again, this list is by no means comprehensive.
TRADITIONAL ANIMATION
Also called cel animation, the frames of a traditionally animated movie are hand-
drawn. The drawings are traced or copied onto transparent acetate sheets called cels,
which are then placed over a painted background and photographed one by one on a
rostrum camera. Nowadays, the use of cels (and cameras) is mostly obsolete, since the
drawings are scanned into computers, and digitally transferred directly to 35 mm film. The
“look” of traditional cel animation is still preserved, and the character animator’s work has
remained essentially the same over the past 70 years. Because of the digital influence over
modern cel animation, it is also known as tradigital animation. Examples: The Lion King,
Spirited Away.
FULL ANIMATION
The most common style in animation, known for its realistic and often very detailed
art.
Examples: all disney feature length animated films, the secret of nimh, the iron giant
LIMITED ANIMATION
A cheaper process of making animated cartoons that does not follow a “realistic”
approach.
The characters are usually cartoon, and the animators have a lot of artistic freedom as
rubber hose animations don’t have to follow the laws of physics and anatomy in the same
degree as the other main styles in animation.
A technique where animators trace live action movement, frame by frame, for use in
animated films.
Stop-motion animation is any type of animation which requires the animator to physically
alter the scene, shoot a frame, alter the scene again and shoot another frame, and so on, to
create the animation. There are many different types of stop-motion animation; some
notable examples are listed below.
CLAY ANIMATION
Examples: the animated sequences of Monty Python's Flying Circus (often referred to
as Dada animation, named after the Dada art movement); Tale of Tales; early episodes of
South Park.
Silhouette animation
A type of cutout animation where the viewer only sees black silhouettes.
Example: the adventures of prince achmed, the world’s oldest surviving animated
feature film, from 1926.
Graphic animation
Model animation
In this form of animation, model animated characters interact with, and are a part of,
the live-action world.
Examples: the films of ray harryhausen (jason and the argonauts) and willis o’brien
(king kong)
Go motion
Object animation
Pixilation
examples: neighbors
PUPPET ANIMATION
Puppet animation typically involves puppet figures interacting with each other in a
constructed environment, in contrast to the real-world interaction in model animation (above).
The puppets generally have an armature inside of them to keep them still and steady as
well as constraining them to move at particular joints.
Examples: the nightmare before christmas, robot chicken, the tale of the fox
Computer animation:
Like stop motion, computer animation encompasses a variety of techniques, the unifying
idea being that the animation is created digitally on a computer.
2D animation
Figures are created and/or edited on the computer using 2D bitmap graphics, or created
and edited using 2D vector graphics. This includes automated computerized versions of
traditional animation techniques such as tweening, morphing, onion skinning and
interpolated rotoscoping.
Examples: a scanner darkly, jib jab, analog computer animation, flash animation,
powerpoint animation
3D ANIMATION
Figures are created in the computer using polygons. To allow these meshes to move,
they are given a digital armature (skeleton). This process is called rigging. Various other
techniques can be applied, such as mathematical functions (gravity), simulated fur or hair,
effects such as fire and water and the use of motion capture to name but a few.
Resolution in computer graphics refers either to the number of pixels per inch (or other
unit of measure, such as the centimeter) on a monitor or printer, or to the total number of
pixels on a monitor. Resolution is usually measured in pixels per inch or dots per inch (dpi).
KEY FRAMES
When someone creates a 3D animation on a computer, they usually don’t specify the
exact position of any given object on every single frame. They create key frames. Key
frames are important frames during which an object changes its size, direction, shape or
other properties. The computer then figures out all the in between frames and saves an
extreme amount of time for the animator.
Modeling
Morphing is a very cool looking transition. It is also one of the most complicated
ones. A morph looks as if two images melt into each other with a very fluid motion. In
technical terms what happens is, two images are distorted and a fade occurs between
them. This is pretty complicated to understand, but believe me, it looks very cool. (you
might have seen morphing in the gillette© and shell© gasoline commercials).
Warping is the same as morphing, except only one image is distorted and no fade
occurs.
Onion Skinning
Onion skinning is a term that commonly refers to a graphic process in which an image
or an animation is composed of a number of different layers. For example, if you have ever
used Adobe Photoshop 3.0 or higher, you are probably familiar with the layers window.
That is exactly what onion skinning is all about. Imagine it as a series of totally transparent
pieces of plastic with different drawings on them; when they are all stacked on top of one
another, a composite is formed. This is widely used in traditional animation, where the
background is a separate layer and each character is a separate layer. This way, only the
layers that change have to be redrawn or repositioned for a new frame. Onion skinning is
also found in computer software where different effects can be placed on different layers
and later composited into a final image or animation.
Rendering
Rendering is the process a computer uses to create an image from a data file. Most
3D graphics programs are not capable of drawing the whole scene on the fly with all the
colors, textures, lights, and shading. Instead, the user manipulates a mesh, which is a rough
representation of the objects. When the user is satisfied with the mesh, he then renders the
image.
Tweening
The process of generation of the intermediate frames given the key frames is called
tweening.
Scene description
This includes the positioning of objects and light sources, defining the photometric
parameters, and setting the camera parameters (position, orientation, and lens
characteristics).
Action description
This involves the layout of motion paths for objects and camera.
Morphing
Transformation of object shapes from one form to another form is called as morphing,
which is a shortened form of metamorphosis. Morphing methods can be applied to any
motion or transition involving a change in shape.
Given two key frames for an object transformation, we first adjust the object
specification in one of the frames so that the number of polygon edges (or the number of
vertices) is the same for the two frames. This preprocessing step is illustrated below: a
straight line segment in key frame k is transformed into two line segments in key
frame k+1. Since key frame k+1 has an extra vertex, we add a vertex between vertices 1
and 2 in key frame k to equalize the number of vertices in the two key frames.
We can state general preprocessing rules for equalizing key frames in terms of either
the number of edges or the number of vertices to be added to a key frame. Suppose we
equalize the edge count, and let Lk and Lk+1 denote the number of line segments in the two
consecutive frames. We then define
Lmax = max (Lk, Lk+1)
Lmin = min (Lk, Lk+1)
Ne = Lmax mod Lmin
Ns = int (Lmax / Lmin)
Then the preprocessing is accomplished by:
1. dividing Ne edges of key frame min into Ns + 1 sections;
2. dividing the remaining lines of key frame min into Ns sections.
As an example, if Lk = 15 and Lk+1 = 11, we would divide 4 lines of key frame k+1 into
two sections each. The remaining lines of key frame k+1 are left intact.
If we equalize the vertex count, we can use parameters Vk and Vk+1 to denote the
number of vertices in the two consecutive frames. In this case we define
Vmax = max (Vk, Vk+1),  Vmin = min (Vk, Vk+1)
Nls = (Vmax − 1) mod (Vmin − 1)
Np = int ((Vmax − 1) / (Vmin − 1))
and the preprocessing adds Np points to Nls line sections of key frame min and Np − 1
points to the remaining line sections of key frame min. For the triangle-to-quadrilateral
example, Vk = 3 and Vk+1 = 4. Both Nls and Np are 1, so we would add one point to one
edge of key frame k. No points would be added to the remaining lines of key frame k.
Simulating Accelerations
For constant speed (zero acceleration) we use equal-interval time spacing: the time
interval between the key frames is divided into n + 1 subintervals, yielding an in-between
spacing of
Δt = (t2 − t1) / (n + 1)
so the time for the jth in-between is
tBj = t1 + j·Δt,   j = 1, 2, …, n
Nonzero accelerations are used to produce realistic displays; they are modeled by
spacing the in-betweens along the animation path using spline or trigonometric functions.
To model increasing speed (positive acceleration), we want the time spacing between
frames to increase so that greater changes in position occur as the object moves faster.
We can obtain an increasing interval size with the function
1 − cos θ,   0 < θ < π/2
For n in-betweens, the time for the jth in-between is then obtained by evaluating this
function at the equally spaced angles θj = jπ / (2(n + 1)), j = 1, 2, …, n.
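The helper below sketches this ease-in (accelerating) spacing; it assumes the 1 − cos factor is applied over the whole key-frame interval t2 − t1, and the function and parameter names are illustrative only.

#include <math.h>

/* Fill times[0..n-1] with n in-between times between key frames at t1 and t2,
   using the accelerating 1 - cos spacing over the whole interval. */
void easeInTimes (double t1, double t2, int n, double times[])
{
    int j;
    for (j = 1; j <= n; j++)
        times[j-1] = t1 + (t2 - t1) *
                     (1.0 - cos (j * 3.14159265358979 / (2.0 * (n + 1))));
}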
MOTION SPECIFICATION
The most straight forward method for defining a motion sequence is direct specification
of the motion parameters. Here we explicitly give the rotation angles and the translation
vectors. Then the geometric transformation matrices are applied to transform coordinate
positions. Alternatively, we could use an approximating equation to specify certain kinds of
motions. We can approximate the path of a bouncing ball, for instance, with a damped,
rectified sine curve
y(x) = A |sin(ωx + θ0)| e^(−kx)
where A is the initial amplitude, ω is the angular frequency, θ0 is the phase angle, and
k is the damping constant.
Figure 3.38 Approximating the motion of a bouncing ball with a damped sine
function
At the opposite extreme, we can specify the motions that are to take place in general
terms that abstractly describe the actions. These systems are referred to as goal-directed
systems because they determine the specific motion parameters given the goals of the
animation. For example, we could specify that we want to “walk” or to “run” to a particular
destination, and so on. The input directives are then interpreted in terms of component
motions that accomplish the selected task. Human motions for instance, can be defined as
a hierarchical structure of sub motions of the torso, limbs, and so forth.
Kinematics specification of a motion can also be given by simply describing the motion
path. This is often done using spline curves.
An alternate approach to this is to use inverse kinematics. Here, we specify the initial
and final positions of the objects at specified times and the motion parameters are computed
by the system.
The Future
The long-term goal is to create software with which the animator can generate a movie
sequence showing a photorealistic human character, undergoing physically plausible motion,
together with clothes, photorealistic hair, a complicated natural background, and possibly
interacting with other simulated human characters. This should be done in such a way that
the viewer is no longer able to tell whether a particular movie sequence is computer-generated
or created using real actors in front of movie cameras. Achieving such a goal would mean
that conventional flesh-and-bone human actors are no longer necessary for this kind of
movie creation, and computer
animation would become the standard way of making every kind of a movie, not just
animated movies. However, living actors will be needed for voice-over acting and motion
capture body movements. Complete human realism is not likely to happen very soon,
however such concepts obviously bear certain philosophical implications for the future of
the film industry.
Then there are the animation studios that are not interested in photorealistic CGI features
or, to be more precise, want some alternatives to choose from and may prefer one style
over another, depending on the movie.
For the moment it looks like three-dimensional computer animation can be divided
into two main directions: photorealistic and non-photorealistic rendering. Photorealistic
computer animation can itself be divided into two subcategories: real photorealism (where
performance capture is used in the creation of the virtual human characters) and stylized
photorealism. Real photorealism is what Final Fantasy tried to achieve and will in the future
most likely give us live-action fantasy features such as The Dark Crystal without having to
use advanced puppetry and animatronics, while Antz is an example of stylized photorealism
(in the future, stylized photorealism will be able to replace traditional stop-motion animation
such as Corpse Bride). Neither of them is perfected yet, but the progress continues. The
non-photorealistic/cartoonish direction is more of an extension and improvement of
traditional animation, an attempt to make the animation look like a three-dimensional version
of a cartoon, still using and perfecting the main principles of animation articulated by the
Nine Old Men, such as squash and stretch. While a single frame from a photorealistic
computer-animated feature will look like a photo if done right, a single frame from a
cartoonish computer-animated feature will look like a painting (not to be confused with cel
shading, which produces an even simpler look).
QUESTIONS
1. Prove that three successive 3D matrix transformations are commutative for scaling
and for rotation.
2. Scale a cubical object two cm on each of its sides with scaling parameters
(3, 3, 3) for the x, y and z axes respectively.
3. Devise a procedure for rotating an object based on an OCTREE structure.
4. Write a procedure for performing a two-point perspective projection of a cubical
object.
5. Extend the Liang-Barsky clipping algorithm to clip 3D objects against a specified
regular parallelepiped.
6. Write a program to display the visible surfaces of a convex polyhedron using the
BSP tree method.
7. Devise an algorithm for viewing a single sphere using the ray casting method.
8. Discuss in detail about wire frame models and its applications.
9. Discuss the types of animation and tools required for performing the same.
10. Compare different color models and the specific applications of each.
UNIT IV
OVERVIEW OF MULTIMEDIA
LEARNING OBJECTIVES
A multimedia computer system is a computer system that can create, import, integrate,
store, retrieve, edit, and delete two or more types of media materials in digital form, such
as audio, image, full-motion video, and text information. Multimedia computer systems
also may have the ability to analyze media materials (e.g., counting the number of occurrences
of a word in a text file).
4.2 PERIPHERALS
Input devices
Output devices
Storage units
Erasable Optical: Ex., 3.5 Inch
Jukebox: Disk, Tape
Magnetic: Diskette, Disk
Read-Only Optical: CD-ROM
Components
Higher-resolution screens with at least 640x480 video. SVGA at 640x480 provides 256
colors through a CLUT; 15-bit color handles 32,768 colors and 24-bit color handles
16,777,216 colors.
Fast graphics
24<->8 bit color conversion dithering (e.g., 24->8)
e.g., Heckbert’s median cut algorithm
N.B. If only have 4 bits, it is better to switch to gray scale.
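For illustration, here is a minimal sketch of the median-cut idea mentioned above (written in Python; the function name and structure are my own, and a production quantizer such as Heckbert's also adds dithering and faster data structures). It repeatedly splits the box of colours along its widest channel at the median and uses the average colour of each box as a palette entry:

    # Minimal median-cut colour quantization sketch (illustrative only).
    # 'pixels' is a list of (R, G, B) tuples with components in 0..255.
    def median_cut(pixels, n_colours=256):
        boxes = [list(pixels)]
        while len(boxes) < n_colours:
            # pick the box whose widest RGB channel spans the largest range
            spans = [max(max(p[c] for p in b) - min(p[c] for p in b)
                         for c in range(3)) if b else -1 for b in boxes]
            i = spans.index(max(spans))
            box = boxes[i]
            if len(box) < 2:
                break                      # nothing left worth splitting
            # split the chosen box at the median of its widest channel
            channel = max(range(3), key=lambda c:
                          max(p[c] for p in box) - min(p[c] for p in box))
            box.sort(key=lambda p: p[channel])
            mid = len(box) // 2
            boxes[i:i + 1] = [box[:mid], box[mid:]]
        # each palette entry is the average colour of one box
        return [tuple(sum(p[c] for p in b) // len(b) for c in range(3))
                for b in boxes]

Each original pixel is then mapped to its nearest palette entry, optionally with dithering to hide banding.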
Sound boards
Sampling rates to 44.1 kHz for 16 bit samples
Sound Blaster is one of the original “standards” (in MPC).
MIDI is cheaper with an FM synthesizer that generates sounds algorithmically.
Wavetable synthesizers cost more, but sound more realistic, and use stored
waveforms from actual instruments. DSPs are desirable because of their flexibility. 48
kHz audio is slightly better, but it is hard to tell the difference.
Computer-based
Game/Home Systems
Flops
CD-I, CDTV, VIS. By mid-1993, only about 25K players had shipped in the US. In contrast,
1992 Sega CD-ROM add-on sales were 200K!
CD-I (Philips)
Introduced in 1986, shipped in 1992, MPEG in 1994; a 68000 target machine with
numerous constraints
Real-time OS
Slow evolution of authoring environments
CDTV(Commodore)
Out to market before CD-I
Built upon successful Amiga 500
Real-time OS, rich multimedia support
Investigated fractal compression
Problems: computer too slow,
CD-ROM drive too slow
The video adapter card is an expansion card which usually sits on the Motherboard.
It acts as an interface between the processor of the computer and the monitor.
Early Adapters
1) Mono Display Adapter
2) Color Graphics Adapter
VGA was a standard introduced by IBM in 1987 which was capable of displaying
text and graphics in 16 colors.
Technically the VGA standard was replaced by XGA from IBM in 1990.
The video adapter card is connected to the monitor with the video adapter cable.
About plasma
Plasma is an energetic gas state of matter often referred to as the fourth state of
matter.
Merits
1) Color reproduction and contrast is good.
2) Large screen size of 70’’ is possible.
Demerits
1) The PDP is quite fragile and may create shipping problems.
2) Detail in dark scenes is not as good as CRTs.
Multimedia data is often very large. A single image may take up several megabytes,
while a single 90 minute video might occupy several gigabytes of disk space, even when it
is highly compressed. Hence, it is imperative that media data be stored in secondary and
tertiary storage devices such as disks, CD-ROMs, and tapes. Unlike traditional database
applications, the retrieval of dynamic multimedia data such as audio and video requires
that it be continuous - the segments of video must be retrieved and presented to the user in
a manner that has no jitter or delay.
One of the most successful disk-based storage methods has been the RAID (redundant
array of inexpensive disks) paradigm. RAID provides several architectures by which many
relatively inexpensive disks can be used together to provide scalable performance. A “block”
is used to denote the smallest chunk of data that will be read or written. Data may be
divided up into several contiguous blocks. For example a video being shown at 30 frames
per second may be broken up into blocks composed of 1800 frames (1 minute). Some of
the well known RAID architectures are:
Raid 0 Architecture
This is the simplest form of RAID architecture. In this architecture there are a set of n
disks, labeled 0,1, ..., (n - 1), that are accessed through a single disk controller.
Disk Striping
Disk striping stores data across multiple physical storage devices. In a stripe set,
multiple partitions of equal size on separate devices are combined into a single logical
device. A k-stripe is a set of k drives, for some integer k <= n that divides n. Intuitively,
once n and k are fixed, we can, in effect, logically (not necessarily physically) split the set
of disk drives into n/k clusters, consisting of k drives each.
When storing a set b0, b1, ..., br-1 of contiguous blocks in terms of a k-striped
layout, we store block b0 on disk 0, block b1 on disk 1, block b2 on disk 2, and so on. In
general, we store block bi on disk (i mod k). Furthermore, a stripe could start at disk j rather
than at disk 0, in which case block bi would be stored on disk ((i + j) mod k).
The picture provides a simple layout of two movies. The blocks of the first movie are
denoted by b0, b1, b2,b3,b4. These are striped with k = 3 starting at disk 0. Thus, block
b0 is placed on disk 0, block b1 is placed on disk 1, and block b2 is placed on disk 2.
Block b3 is now placed back on disk 0, and block b4 is placed on disk 1.
The second movie has six blocks, denoted by c0, ...,c5, and these are striped with k
= 4 and starting at disk 1. (i.e., j = 1). Thus, block c0 is placed on disk 1, block c1 is
placed on disk 2, and so on.
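As a quick check of the layout just described, the following Python sketch (the helper name is my own) reproduces the block-to-disk assignments of the two movies above:

    # Disk index holding block b_i under k-striping that starts at disk j.
    def disk_for_block(i, k, j=0):
        return (i + j) % k

    # first movie: five blocks b0..b4, k = 3, starting at disk 0
    print([disk_for_block(i, k=3) for i in range(5)])        # [0, 1, 2, 0, 1]
    # second movie: six blocks c0..c5, k = 4, starting at disk 1
    print([disk_for_block(i, k=4, j=1) for i in range(6)])   # [1, 2, 3, 0, 1, 2]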
When we stripe a movie across k disks, it is as though the transfer rate of the disk has
increased k-fold. The reason for this is that data can be read, in parallel, from each of the
disks. For example, the controller can directly read blocks b0,b1 and b2 in parallel from
the three disks containing those blocks. Thus, the larger k is, the better the transfer rate of
the disk. However, we should not increase k arbitrarily because the actual transfer rate is
limited by the buffer size as well as the output bandwidth of the channel/bus to which the
disk array is connected. Furthermore, in practice k-fold increases are almost never obtained
due to various performance attenuations.
Raid - 1 Architecture
The RAID-1 architecture basically uses only half the available disks. In other words, if
there are N disks available altogether, then n = N/2 disks are utilized. For each disk, there is
a mirror disk. Striping is done across the n disks as before. RAID-1 is predicated on the
assumption that there is a very low probability that a disk and its mirror will fail simultaneously.
When we wish to read from a disk, we read from the disk (if it is active) or we read from
its mirror. The obvious disadvantage of the RAID-1 architecture is that only 50% storage
utilization is achieved - this is the price paid for the desired reliability.
RAID -5 Architecture
The RAID-5 architecture shown in figure 4.3 is perhaps the best suited for database
applications.
It reflects a simple but elegant trade-off between efficient utilization of available storage
and excellent reliability. In the RAID-5 architecture, each cluster of k disks has one disk
reserved as a parity disk, which enables data to be recreated should one of the drives
in the parity set fail.
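To make the parity idea concrete, here is a small Python sketch (the block contents are made up, and a real RAID-5 array rotates the parity block across the drives rather than dedicating a single disk to it): the parity block is the byte-wise XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the surviving blocks and the parity.

    from functools import reduce

    def parity(blocks):
        # byte-wise XOR of equal-length blocks
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"   # three data blocks in one stripe
    p = parity([d1, d2, d3])                 # stored on the parity disk

    # the drive holding d2 fails: rebuild it from the survivors plus the parity
    rebuilt = parity([d1, d3, p])
    assert rebuilt == d2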
For example, consider the situation shown in figure 4.4. Here, our movie-
on-demand (MOD) server is providing access to three disks. For the sake of simplicity,
these disks contain only one movie, composed of 300 blocks. Disk 1 contains blocks 1-
150; disk 2 contains blocks 150-250; and disk 3 contains blocks 200-300.
At time t, suppose user u1 is watching blocks at the rate of 2 blocks per time unit, and
suppose he is watching block 140. His current request is being served by disk 1. Suppose:
1) The user continues watching the movie in “ordinary” mode. That is, data (u1, t) =
(m, 140, 2, 1). In this case, the blocks to be retrieved are blocks 140,141, which
can only be served by disk 1. The mod server therefore ships a request to disk 1’s
server, requesting these blocks.
2) The user pauses: in this case, data (u1, t) = (m, 140, 1, 0). In this case, the user is
shown no blocks, and the block that was previously on the screen stays on.
3) The user fast-forwards at 6 blocks per time unit: in this case, data (u1, t) = (m, 140,
2, 6). That is, the blocks to be retrieved are blocks 146 and 152. Block 146
exists only on disk 1, and block 152 exists only on disk 2. Hence, to satisfy this
request by the user, the MOD server must dispatch two requests, one each to the
servers controlling disks 1 and 2.
4) The user rewinds at 6 blocks per time unit: in this case, data (u1, t) = (m, 140, 2,
-6). That is, the blocks to be retrieved are blocks 134 and 128, both of which can
be retrieved from disk 1 only. (A small sketch of this block-to-disk dispatch appears below.)
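The sketch promised above is given here in Python (the helper names are my own; the disk block ranges are those quoted for figure 4.4, and the block-offset convention follows the fast-forward and rewind cases, which the normal-viewing case enumerates slightly differently):

    # Which disks hold which blocks, per the figure 4.4 description.
    DISK_RANGES = {1: range(1, 151), 2: range(150, 251), 3: range(200, 301)}

    def disks_for_block(block):
        # all disks whose stored range contains this block
        return [d for d, blocks in DISK_RANGES.items() if block in blocks]

    def dispatch(current, count, speed):
        # blocks the user needs this time unit, and the disks that can serve each
        wanted = [current + speed * i for i in range(1, count + 1)]
        return {b: disks_for_block(b) for b in wanted}

    # user u1 fast-forwarding from block 140 at 6 blocks per time unit
    print(dispatch(140, count=2, speed=6))    # {146: [1], 152: [2]}
    # user u1 rewinding at 6 blocks per time unit
    print(dispatch(140, count=2, speed=-6))   # {134: [1], 128: [1]}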
Suppose users u1 and u2 are both accessing movies at the same time, at the rate of 2
blocks per time unit, and users u1 and u2 are watching blocks 140 and 199, respectively.
Suppose now that user u1 fast-forwards at the rate of 5 blocks per time unit, while user u2
continues normal viewing.
1) At time t, user u1 wants blocks 140 and 145, while user u2 wants blocks 199 and
200. User u1’s request should be satisfied by disk 1, while user u2’s request is
satisfied by disk 2.
2) At time t + 1, user u1, continuing the same transaction as above, wants blocks 150
and 155, while user u2 wants blocks 201 and 202. Here are some constraints
that need to be taken into account when attempting to satisfy this:
Either of the following assignments can be used to handle the requests of the two
users:
The user u1’s transaction data (u1, t+1) = (m, 150, 2, 5) is split into two sub
transactions, denoted by the 4-tuples (m, 150, 1, 5) and (m, 155, 1, 5). The first sub
transaction is served by disk 1, the second by disk 2. In the same way, user u2’s
transaction data (u2, t+1) = (m, 201, 2, 1) is split into two sub transactions,
denoted by the 4-tuples (m, 201, 1, 1) and (m, 202, 1, 1). The first sub transaction
is served by disk 2, the second by disk 3. (Notice that the second sub transactions
of both users’ requests could be satisfied by disk 2, but in the first user’s case, only
disk 2 can satisfy it, while in the second user’s case, either disk 2 or disk 3 can
satisfy it.)
An alternative possibility is that we split user u1’s transaction into two as above
and have disks 1 and 2 satisfy the sub transactions; however, instead of splitting
user u2’s request, we switch his entire transaction to disk 3.
Looking at the above example, two new operations have been introduced: splitting,
which causes a user’s transaction to be split into two or more pieces (called twins),
and switching, which causes a user’s transaction (or its descendant sub transactions)
to be switched from the server that was originally handling the request to another
server.
The basic tool set for building multimedia projects contains one or more authoring systems
and various editing applications for text, images, sounds, and motion video.
Word processors such as Microsoft Word and WordPerfect are powerful applications
that include spell checkers, table formatters, thesauruses, and prebuilt templates for letters,
resumes, purchase orders, and other common documents. In many word processors, you
can actually embed multimedia elements such as sounds, images, and video.
OCR software: With Optical Character Recognition software, a flat-bed scanner, and
your computer, you can save many hours of rekeying printed words, and get the job done
faster and more accurately than a roomful of typists. OCR software turns bitmapped
characters into electronically recognizable ASCII text. A scanner is typically used to create
the bitmap. Then the software breaks the bitmap into chunks according to whether it
contains text or graphics, by examining the texture and density of areas of the bitmap and
by detecting edges. The text areas of the image are then converted to ASCII characters
using probability and expert system algorithms. Most OCR applications claim about 99%
accuracy when reading 8 to 36 point printed characters at 300 dpi and can reach processing
speeds of about 150 characters per second.
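As a hedged illustration of this pipeline, a few lines of Python using the third-party pytesseract wrapper around the Tesseract engine (the package choice and the file name "page.png" are my own assumptions, not something named in the text):

    from PIL import Image          # pip install pillow
    import pytesseract             # pip install pytesseract; needs Tesseract installed

    bitmap = Image.open("page.png")              # scanned page as a bitmap
    text = pytesseract.image_to_string(bitmap)   # bitmapped characters -> recognizable text
    print(text)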
With 3-D modeling software, objects rendered in perspective appear more realistic.
Powerful modeling packages such as AutoDesk’s Discreet, Strata Vision’s 3D, Specular
LogoMotion and Infini-D, Alias’ Wavefront, Avid’s SoftImage, and Caligari’s trueSpace
are also bundled with assortments of prerendered 3-D clip art objects such as people,
furniture, buildings, cars, airplanes, trees, and plants. Important for multimedia developers,
many 3-D modeling applications also include export features enabling you to save a moving
view or journey through your scene as a QuickTime or AVI animation file. Each rendered
3-D image takes from a few seconds to a few hours to complete, depending upon the
complexity of the drawing and the number of drawn objects included in it.
Image-editing applications are specialized and powerful tools for enhancing and
retouching existing bitmapped images. These applications also provide many of the features
and tools of painting and drawing programs and can be used to create images from scratch
as well as images digitized from scanners, video frame-grabbers, digital cameras, clip art
files, or original artwork files created with a painting or drawing package.
Plug-Ins: Image-editing programs usually support powerful plug-ins available from third-
party developers that allow you to warp, twist, shadow, cut, diffuse, and otherwise “filter”
your images for special visual effects.
Multiple Tracks: Being able to edit and combine multiple tracks and then merge the
tracks and export them in a final mix to a single audio file is important.
Trimming: Removing “dead air” or blank space from the front of a recording and any
unnecessary extra time off the end is the first sound editing task. Trimming even a few
seconds might make a big difference in file size.
Splicing and Assembly: Using the same tools mentioned for trimming, you will probably
want to remove the extraneous noises that inevitably creep into a recording.
Volume adjustments: If you are trying to assemble ten different recordings into a single
sound track, there is little chance that all the segments will have the same volume. To
provide a consistent volume level, select all the data, in the file, and raise or lower the
overall volume by a certain amount. Best is to use a sound editor to normalize the assembled
audio file to a particular level.
Format Conversion: In some cases your digital audio software might read a format
different from that read by your presentation or authoring program. Most sound editing
software will save files in your choice of many formats, most of which can be read and
imported by multimedia systems.
Resampling or Downsampling: If you have recorded and edited your sounds at 16-bit
sampling rates but are using lower rates and resolutions in your project, you must resample
or downsample the file. Your software will examine the existing digital recording, and work
through it to reduce the number of samples. This process may save considerable disk
space.
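A naive sketch of this step, assuming 16-bit signed integer samples held in a Python list (a real resampler would also low-pass filter before decimating, to avoid aliasing):

    def downsample(samples_16bit, factor=2):
        # keep every 'factor'-th sample (e.g. 44.1 kHz -> 22.05 kHz when factor is 2)
        decimated = samples_16bit[::factor]
        # reduce 16-bit signed samples (-32768..32767) to 8-bit unsigned (0..255)
        return [(s >> 8) + 128 for s in decimated]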
Fade-Ins and Fade-outs: Most programs offer enveloping capability, useful for long
sections that you wish to fade in or fade out gradually. This enveloping helps to smooth out
the very beginning and the very end of a sound file.
Equalization: Some programs offer digital equalization capabilities that allow you to modify
a recording’s frequency content so that it sounds brighter or darker.
Time Stretching: Advanced programs let you alter the length of a sound file without
changing its pitch. But most time-stretching algorithms severely degrade the audio quality
of the file if the length is altered more than a few percent in either direction.
Digital Signal Processing: Some programs allow you to process the signal with
reverberation, multitap delay, chorus, flange, and other special effects using digital signal
processing routines.
Animations and digital video movies are sequences of bitmapped graphic scenes,
rapidly played back. But animations can also be made within the authoring system by
rapidly changing the location of objects, or sprites, to generate an appearance of motion.
Most authoring tools adopt either a frame or object oriented approach to animation, but
rarely both.
Movie making tools, typically take advantage of QuickTime for Mac and Windows
and Microsoft Video for Windows (AVI) and let you create, edit, and present digitized
motion video segments, usually in a small window in your project.
Video formats and systems for storing and playing digitized video to and from disk
files are available with QuickTime and AVI. Both systems depend on special algorithms
that control the amount of information per video frame that is sent to the screen, as well as
the rate at which new frames are displayed. Both provide a methodology for interleaving,
or blending, audio data with video and other data so that sound remains synchronized with
the video. And both technologies allow data to stream from disk into memory in a buffered
and organized manner. DVD is a hardware format defining a very dense, two-layered disc
that uses laser light and, in the case of recordable discs, heat to store and read digital
information. The digital information or software on a DVD is typically multiplexed audio,
image, text, and video data optimized for motion picture display using MPEG encoding.
QuickTime can deliver 3D animation, real-time special effects, virtual reality, and streaming
video and audio. Its role as a powerful cross-platform integrator of multimedia objects and
formats makes it a tool upon which multimedia developers depend.
QuickTime building blocks
Three elements make up QuickTime
When delivering QuickTime projects on the World Wide Web, you can embed
powerful commands into your HTML documents that control and fine-tune the display of
your QuickTime file;
AUTOPLAY starts a movie playing automatically
BGCOLOR sets a background color for the movie display
CACHE indicates whether the movie should be cached
CONTROLLER specifies whether to display the QuickTime movie controller bar
Audio Video Interleaved (AVI) is a Microsoft developed format for playing full-
motion interleaved video and audio sequences in windows, without specialized hardware.
Video data is interleaved with audio data within the file that contains the motion sequence,
so the audio portion of the movie remains synchronized to the video portion.
The AVI file format is not an extensible, ‘open’ environment and lacks features needed
for serious video editing. To improve this situation, a group of interested
companies recently created the OpenDML file format to make AVI more useful for the
professional market.
Image compression algorithms are critical to the delivery of motion video and audio
on both the MAC and PC platforms. The three basic concepts are
Compression ratio: The compression ratio represents the size of the original image
divided by the size of the compressed image that is, how much the data is actually
compressed. Some compression schemes yield ratios that are dependent on the image
content: a busy image of a field of multicolored objects may yield a very small compression
ratio, and an image of blue ocean and sky may yield a very high compression ratio. Video
compression typically manages only the part of an image that changes from image to image.
Compression is either lossy or lossless. Lossy schemes ignore picture information the
viewer may not miss, but that means the picture information is in fact lost – even after
decompression. And as more and more information is removed during compression, image
quality decreases. Lossless schemes preserve the original data precisely. The compression
ratio typically affects picture quality because, usually, the higher the compression ratio, the
lower the quality of the decompressed image.
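For illustration only (the byte counts below are invented), the ratio is simply the original size divided by the compressed size:

    original_bytes, compressed_bytes = 921_600, 46_080   # e.g. one 640x480, 24-bit frame
    ratio = original_bytes / compressed_bytes
    print(f"compression ratio = {ratio:.0f}:1")           # prints 20:1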
While developing projects, a faster compression time is preferred; on the other hand,
users will appreciate a fast decompression time to increase display performance.
For compressing video frames, the MPEG format used for DVD employs three types
of encoding: I-frames (Intra), P-Frames (predicted), and B-Frames (Bidirectional
Predicted). Sequences of these frame types are compiled into a GOP, and all the GOPs
are stitched into a stream of images. The result is an MPEG video file.
These are conventional systems modified to offer improved vertical and horizontal
resolutions.
2. CCIR Recommendations
The international body for television standards CCIR defined a standard for digitization
of video signals known as CCIR-601 Recommendations.
It is a new standard for digital video for improving picture quality compared to the
standard NTSC or PAL formats.
It is a protocol that transfers reservations and keeps state at the intermediate nodes.
4.7.1 Text
Text looks like the easiest medium to create and the least expensive to transmit.
First, effective use of text requires good writing, striving for conciseness and accuracy.
Advertising wordsmiths sell product lines with a logo or tag lines with just a few
words
Similarly, multimedia developers are also presenting text in a media-rich context,
weaving words with sounds, images, and animations
Design labels for multimedia title screens, menus and buttons using words with the
precise and powerful meanings
Which feedback is more powerful: “That answer was correct.” or “Terrific!”
When is “Terrific” more appropriate or effective?
Why is “quit” more powerful than “close”? Why does UM use “out” instead?
Why is the title of a piece especially important?
It should clearly communicate the content.
It should get the user interested in exploring the content.
Let’s discuss some of your proposed project titles.
4.7.2 Images
1. Visible images:
The images exist for some duration in complete bitmap form, which includes
every pixel captured by the input device.
Example
Drawings,
Documents
Photographs
Paintings
2. Non-Visible Images
Non-visible images are those that are not stored as images but are displayed as
images.
Example
Temperature gauge
Pressure gauge
3. Abstract Images
They are really not images that ever existed as real-world objects or representations.
Example
Discrete
Continuous
Stored audio and video contain compressed information. This can consist of music,
speech, voice commands, telephone conversations and so on..
An audio object needs to store information about the sound clip, such as the length of
the sound clip, its compression algorithm, playback characteristics and any sound
annotations associated with the original clip that must be played at the same time as overlays.
Digital Audio
Windows uses the WAV format; Apple uses AIFF
Advantages:
Reliable playback (“what you hear is what you get”)
Required for speech playback
Sound effects in \windows\media or the web
Are any of you planning to use simple digitized sound effects in your projects?
How so? Where are you getting your sound effects?
Images are often received from a camera or scanner as a sequence of pixels for each
of the primary colors. Those are converted to a digital bit stream. Each pixel is represented
in a particular color space and encoding scheme. The figure 4.6 shows the RGB to digital
bit stream coding scenario.
Sound, animation, and video can add a great deal of appeal and sensuality to a piece
but there are always tradeoffs. David Ludwig of Interactive Learning Designs says,
“Let the content drive the selection of media for each chunk of information to be
presented. Use traditional text and graphics when appropriate; add animation when ‘still
life’ won’t get your message across; add audio when further explanation is required; resort
to video only when all other methods pale by comparison.”
Every nth fraction of a second, a sample of analog sound is taken and stored in
binary form
Sampling rate: Indicates how often the sound sample is taken
Sampling size: Denotes how much information is stored for each sample
The more often you sample and more data per sample, the higher your quality
and resolution
The value of each sample is rounded off to the nearest integer (this is called quantization)
An 8-bit sample size provides 256 levels to describe the dynamic range of amplitude
A 16-bit sample size provides over 65 thousand levels for the dynamic range,
but significantly increases space requirements
If the amplitude is greater than the intervals available, clipping at the top and bottom of the waves occurs:
it produces background hissing noise or other distortions (I can’t hear them too
well!)
The three most common frequencies are 44.1 kHz (kilohertz, CD quality), 22.05 kHz and
11 kHz
Formula for determining the size (in bytes) of a digital recording (a sketch is given below)
Note that most but not all PCs have 16-bit sound cards (not all have sound cards!)
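The text does not spell the formula out, so the following is stated as the standard calculation rather than a quotation: size in bytes = sampling rate x duration in seconds x (bits per sample / 8) x number of channels. A tiny Python check for one minute of CD-quality stereo:

    def recording_size_bytes(sample_rate_hz, seconds, bits_per_sample, channels):
        return int(sample_rate_hz * seconds * (bits_per_sample / 8) * channels)

    # one minute of 44.1 kHz, 16-bit, stereo audio
    print(recording_size_bytes(44_100, 60, 16, 2))   # 10,584,000 bytes, roughly 10 MB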
A good sound editor (such as Cool Edit Pro) lets you manipulate your sound files in
useful ways
Trimming: removing blank space from front or back of a sound file
Splicing: removing extraneous sounds or inserting sounds from other sources
Amplification (volume adjustments): making different sounds play at consistent
level
Why might amplification in a sound editor not be such a great idea?
What did you think of the reference librarian’s voice in the CIMEL prototype?
Re-sampling or down-sampling, e.g., from 16-bit to 8-bit
Digital signal processing effects: reverberation, flange, fades and other effects
Format conversion: the Windows standard is WAV, the Mac is SND or AIF, Unix is AU (mu-
law)
Cool Edit 2000 also supports MP3
A little bug in Cool Edit: when saving from WAV to MP3, it sometimes truncates the start
of the file; you can avoid this bug by using copy and paste from a WAV file
Animation adds motion to a piece, perhaps to draw attention to what you want the user
to notice. It can be as simple as a transition effect, such as a fade, dissolve or zoom, or as
elaborate and expensive as a full cartoon-like cel animation or even 3D animation.
An object seen by the human eye remains mapped on the retina for a brief time after viewing.
This makes it possible for a series of images that change slightly and rapidly to blend, giving the
illusion of movement.
Acceptable multimedia animation can make do with fewer frames per second
Classical cartoon animation makes a different cel (celluloid sheet) for each frame. Cel
animation artwork begins with key frames for each action, such as a character about to
take a step, pitching its body weight forward. Tweening an action requires calculating the
number of frames between key frames.
Computer animation can imitate the classical technique, with key frames, tweening and
layers.
E.g., Macromedia Director and Flash both support these concepts, letting the computer
automate the tweening process where possible
E.g., rather than reproduce an entire cel for each frame, individual objects (called
sprites in Director) move across a background image
Authorware motions give this effect; Director animations provide finer control of
sprites. A morphing effect can be achieved by dissolving from one image to another,
e.g., from one face to another; many specialized morphing software products are
available
Animations introduced more file formats: DIR for Director movies, FLI and FLC for Autodesk
Animator and Animator Pro, MAX for 3D Studio MAX, GIF89a for animations in GIF (most popular on the
web). Director movies may be compressed for the web (yet another format, DCR).
Ever since the first silent movie flickered to life, people have been fascinated with
“motion pictures”. However, video places the highest performance demand on processor
speed and memory. Playing a full-screen video uncompressed could require 1.8 GB per
minute! Special hardware and software enhancements are needed for better performance,
both on the production and playback side. On the production side, hardware/software
can get pretty expensive!
In S-VHS video, colour and luminance are kept on two separate tracks. Sony
Betacam SP features 3 channels for video (red, blue, luminance) and 4 for audio, with higher
resolution; it is considered “the” choice of the broadcast industry and for archiving.
Shooting and editing video: Never underestimate the value of a steady shooting platform.
Shaky camera work is a sure sign of amateur home movies! Use a tripod or even a towel
on a table. Provide plenty of lighting, ideally bright sunlight.
Chroma key or blue screen is a useful special effect (available in Premiere): e.g., To
show Captain Picard on the moon, shoot Picard in front of a screen or wall painted blue,
then shoot another video of a background moonscape, then mix the two together, deleting
the blue chroma key in the Picard shot. This is a popular technique used with 3-D modelling and graphics
software.
Windows AVI (Audio Video Interleaved) and Apple QuickTime MOV are popular formats
for movies. QuickTime 3 was released in 1998, with many new features.
Note: QuickTime movies produced on the Mac must be “flattened” for playback under
Windows (interleaving video and audio together)
in the video! Streaming technologies have made video somewhat more practical on the
web; Real Networks claims that 100,000 hours per week of live audio/video are streamed.
The character “Woody” in Toy Story, for example, uses 700 avars, with 100 avars in
his face alone. Successive sets of avars control all movement of the character from frame
to frame. Once the stick model is moving in the desired way, the avars are incorporated
into a full wire frame model or a model built of polygons. Finally surfaces are added,
requiring a lengthy process of rendering to produce the final scene.
There are several ways of generating the avar values to obtain realistic motion. Motion
tracking uses lights or markers on a real person acting out the part, tracked by a video
camera. Or the avars may be set manually using a joystick or other form of input control.
Toy Story uses no motion tracking, probably because only manual control by a skilled
animator can produce effects not easily acted out by a real person.
Computer animation can be created with a computer and animation software. Some
examples of animation software are: Amorphium, Art of Illusion, Ray Dream Studio, Bryce,
Maya, Blender, trueSpace, LightWave, 3D Studio Max, Softimage XSI, Alice, and Adobe Flash
(2D). There are many more. Prices will vary greatly depending on the target market. Some
impressive animation can be achieved even with basic programs; however, the rendering
can take a lot of time on an ordinary home computer. Because of this, video game animators
tend to use low resolution, low polygon count renders, such that the graphics can be
rendered in real time on a home computer. Photorealistic animation would be impractical
in this context.
Professional graphics workstations are able to render much faster, due to the more technologically advanced hardware
that they contain.
Pixar’s RenderMan is rendering software which is widely used as the movie animation
industry standard, in competition with Mental Ray. It can be bought at the official Pixar
website for about $5,000 to $8,000. It will work on Linux, Mac OS X, and Microsoft Windows
based graphics workstations along with an animation program such as Maya or Softimage
XSI. Professionals also use digital movie cameras, motion capture or performance capture,
blue screens, film editing software, props, and other tools for movie animation.
When the computer fails to wait for the v-sync, a condition called sprite breakup or
image breakup is perceptible. This is highly undesirable and should always be avoided
when possible to maintain the illusion of movement.
Storyboard layout
The storyboard is an outline of the action. It defines the motion sequence as a basic
set of events that are to take place. Depending on the type of the animation to be produced,
the storyboard could consist of a set of rough sketches or it could be a list of basic ideas
for the motion.
Object definitions
An object definition is given for each participant in the action. Objects can be defined
in terms of basic shapes, such as polygons or splines. In addition, the associated movements
for each object are specified along with the shape.
Key-frame specification
A key frame is a detailed drawing of the scene at a certain time in the animation sequence.
Within each key frame, each object is positioned according to the time for that frame.
Some key frames are chosen at extreme positions in action; others are spaced so that the
time interval between key frames is not too great. More key frames are specified for
intricate motions than for simple, slowly varying motions.
In-betweens are the intermediate frames between the key frames. The number of in-
betweens needed is determined by the media to be used to display the animation. Typically,
time intervals for the motion are set up so that there are from three to five in-betweens for
each pair of key frames.
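A minimal sketch of tweening between two key frames (linear interpolation of a 2-D position; real systems also interpolate rotation, scaling and easing curves, and the function name here is my own):

    def tween(start, end, n_inbetweens):
        # linearly interpolate n in-between positions between two key-frame positions
        frames = []
        for i in range(1, n_inbetweens + 1):
            t = i / (n_inbetweens + 1)
            x = start[0] + t * (end[0] - start[0])
            y = start[1] + t * (end[1] - start[1])
            frames.append((x, y))
        return frames

    # three in-betweens between key frames at (0, 0) and (40, 20)
    print(tween((0, 0), (40, 20), 3))   # [(10.0, 5.0), (20.0, 10.0), (30.0, 15.0)]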
Raster animations
One fact to note is that all of these classes of application have some common
characteristics:
They are shared applications used by a large number of users.
The users share data objects as and when they need them.
Some processes are carried out in a sequential manner; that is, data processed by
one user or group of users goes to the next in a sequence.
Functionality
The functionality of the system is the primary determinant of the technologies required
and how they will be used. Each technology utilized in a multimedia application supports a
range of functionality options.
Authoring systems for multimedia application are designed with the following two
primary target users in mind
1. Professionals.
2. Average business users.
Display resolution
i. Level of standardization on display resolution
ii. Display protocol standardization.
iii. Corporate norms for service degradations.
iv. Corporate norms for network traffic degradation as they relate to resolution issues.
Dedicated authoring systems are the simplest type of system, designed for a single
user and generally for single streams. Although a dedicated authoring system is very
simple, designing an authoring system capable of combining even two object streams can
be quite complex. A structured design approach is very useful in isolating the visual and
procedural design components.
In a timeline-based authoring system, objects are placed along a timeline. The timeline
can be drawn on the screen in a window, in a graphic manner, or it can be created using a script in
a manner similar to a project plan.
Multimedia authoring tools provide the important framework needed for organizing
and editing the elements of your multimedia project, including graphics, sounds, animations,
and video clips. Authoring tools are used for designing interactivity and the user interface,
for presenting your project on screen, and for assembling diverse multimedia elements into
a single, cohesive product.
Authoring software provides an integrated environment for binding together the content
and functions of your project, and typically includes everything you need to create, edit
and import specific types of data: assemble raw data into a playback sequence: and provide
a structured method or language for responding to user input. With multimedia authoring
software, we can make
Video productions
Animations
Games
Interactive web sites
Demo disks and guided tours
Presentations
Kiosk applications
Interactive Training
Simulations, prototypes, and technical visualizations
Icon- or object-based tools
In these authoring systems, multimedia elements and interaction elements are organized
as objects in a structural framework or process. Icon- or object-based, event-driven tools
simplify the organization of your project and typically display flow diagrams of activities
along branching paths. In complicated navigational structures, this charting is particularly
useful during development.
Time-based tools
In these authoring systems, elements and events are organized along a timeline, with
resolutions as high as 1/30 second. Time-based tools are best to use when you have a
message with a beginning and an end. Sequentially organized graphic frames are played
back at a speed that you can set. Other elements are triggered at a given time or location
in the sequence of events. The more powerful time-based tools let you program jumps to
any location in a sequence, thereby adding navigation and interactive control.
The elements of multimedia – images, animations, text, digital audio and MIDI music,
and video clips – need to be created, edited and converted to standard file formats and the
specialized software provides these capabilities. Also, editing tools for these elements,
particularly text and still images, are often included in your authoring system. The more
editors your authoring system has, the fewer specialized tools you may need. In many
cases, however, the editors that come with an authoring system will offer only a subset of
the substantial features found in dedicated tools.
The organization, design, and production process for multimedia involves storyboarding
and flowcharting. Some authoring tools provide a visual flowcharting system or overview
facility for illustrating your project’s structure at a macro level. Storyboards or navigation
diagrams, too, can help organize a project. Because designing the interactivity and navigation
flow of your project often requires a great deal of planning and programming effort, your
storyboard should describe not just the graphics of each screen, but the interactive elements
as well.
Multimedia authoring systems offer one or more of the following approaches, which
are explained below.
Visual programming with cues, icons, and objects
Programming with a scripting language
Programming with traditional languages, such as Basic or C
Document development tools
Visual programming with icons or objects is perhaps the simplest and easiest authoring
process. To play a sound or to put a picture into your project, just drag the element’s icon
into the playlist. Visual authoring tools such as Authorware and Icon Author are particularly
useful for slide shows and presentations.
Authoring tools that offer a very high-level language or interpreted scripting environment
for navigation control and for enabling user input, such as Macromedia Director and
Macromedia Flash, are more powerful. With scripts you can perform computational
tasks, sense and respond to user input, create characters, icons, and motion animations, launch
other applications, and control external multimedia devices.
Interactivity empowers the end users of your project by letting them control the content
and flow of information. Authoring tools should provide one or more levels of interactivity:
Simple branching
Conditional branching
Nested IF-THENs, subroutines, event tracking and message passing among objects
and elements
Complex multimedia projects require exact synchronization of events. One will need
to use the authoring tools own scripting language or custom programming facility to specify
timing and sequence on systems with different processors.
As you build your multimedia project, you will be continually assembling elements
and testing to see how the assembly looks and performs. Your authoring system should let
you build a segment or part of your project and then quickly test it as if the user were
actually using it.
Delivering your project may require building a run-time version of the project using
the multimedia authoring software. A run-time version or standalone allows your project to
play back without requiring the full authoring software and all its tools and editors. Often,
the run-time version does not allow users to access or change the content, structure, and
programming of the project.
Choose the target platform. You might sometimes need to buy the software that suits
a particular target development and delivery platform.
Card and page based authoring systems provide a simple and easily understood
metaphor for organizing multimedia elements. Because graphic images typically form the
backbone of a project, both as navigation menus, and as content, many developers first
arrange their images into a logical sequences or groupings similar to the chapters and
pages of a book, or cards in a catalog. Navigation routines become then simply directives
to go to a page or card that contains appropriate images, and text, and associated sounds,
animations, and video clips.
Page based authoring systems contain media objects: the objects are the buttons, text
fields, graphic objects, backgrounds, pages or cards, and even the project itself. The
characteristics of objects are defined by properties. Each object may contain programming
script, usually a property of that object that is activated when an event related to that
object occurs. Events cause messages to pass along the hierarchy of objects in your project.
As the message travels it looks for handlers in the script of each object: if it finds a matching
handler, the authoring system then executes the task specified by that handler.
Most page based authoring systems provide a facility for linking objects to pages or
cards. Examples are HyperCard and ToolBook. The scripting languages associated with them
respectively are HyperTalk and OpenScript.
To go to the next card or page when a button is clicked, you would place a message
handler into the script of that button. Below is an example script in HyperTalk:
on mouseUp
  go to next card
end mouseUp
Here is an example in OpenScript:
to handle buttonUp
  go to next page
end buttonUp
Card- and page-based systems typically provide two separate layers on each card: a
background layer that can be shared among many cards, and a foreground layer that
is specific to a single card.
Example: 1) Director
Macromedia’s Director is a powerful and complex multimedia authoring tool with a broad set
of features to create multimedia presentations, animations, and interactive multimedia
applications. In Director you assemble and sequence the elements of your project using a
Cast and a Score.
Cast: The cast is a multimedia database containing still images, sound files, text,
palettes, QuickDraw shapes, programming scripts, QuickTime movies, Flash movies, and
even other Director files. You can import a wide range of data types and multimedia element
formats directly into this Cast, and also create multimedia elements from scratch using
Director’s own tools and editors.
Score: Cast members are tied together using the Score facility. Score is a sequence
for displaying, animating, and playing Cast members, and it is made up of frames that
contain Cast members, tempo, a palette, timing, and sound information. Each frame is
played back on a Stage at a rate specified in the tempo channel. The Score provides
elaborate and complex visual effects and transitions, adjustments of color palettes, and
tempo control. You can synchronize animations with sound effects by highlighting a range
of frames and selecting the appropriate
sound from your Cast.
Using Lingo scripts, you can chain together separate Director Documents and call
other files as subroutines. You can also import elements into your cast using pointers to a
file.
Example: 2) Flash
QUESTIONS
1. Define sampling and encoding. Also discuss the various standards available for
video data.
2. What is a file system interface? State the need.
3. State the functionalities of a mixer.
4. What is a speech synthesizer?
5. Propose a suitable RAID architecture for a live video conferencing system.
6. Compare various real time protocols with other relevant protocols.
7. Discuss in detail about various data compression standards and issues.
8. Discuss the open issues and challenges in multimedia authoring system.
UNIT V
Multimedia Applications
Public Access
1. Tourist Information System.
2. Public Information System for Railways.
Publishing Industry
A publication can be classified according to the market it caters to and aims at, like
the family, school children, professional persons or academics.
Edutainment
1. Many Educational games are available in the market.
2. Microsoft has produced games such as Sierra, Knowledge Adventure etc.,
Business Communication
1. Employee – related communication
2. Product promotions
3. Customer information
4. Reports for innovation.
5. Multimedia Information Database
There are many tools for collaborative computing, such as e-mail, bulletin boards,
screen sharing tools, text-based conferencing systems, telephone conference systems,
conference rooms and video conference systems.
Collaborative dimensions
Time: With respect to time, there are two modes of cooperative work: asynchronous and
synchronous. Asynchronous cooperative work specifies processing activities that do not
happen at the same time; the synchronous cooperative work happens at the same time.
User Scale: The user scale parameter specifies whether a single user collaborates with
another user or a group of more than two users collaborate together.
- A group may be static or dynamic during its lifetime. A group is static if its participating
members are pre-determined and membership does not change during the activity. A group
is dynamic if the number of group members varies during the collaborative activity, i.e.
group members can join or leave the activity at any time.
Other partition parameters may include locality, and collaboration awareness. Locality
partition means that collaboration can occur either in the same place or among users located
in different places through tele-collaboration. Collaboration awareness divides group
communication systems into collaboration-transparent and collaboration-aware systems.
A collaboration-transparent system is an existing application extended for collaboration. A
collaboration-aware system is a dedicated software application for CSCW.
In a centralized architecture, a single copy of the shared application runs at one site.
All participants’ input to the application is forwarded to this central site and the
application’s output (shared object) is then distributed to all sites. Advantage: easy
maintenance because there is only one copy of the application that updates the shared
object. Disadvantage: high network traffic because the output of the application needs
to be distributed every time.
Replicated Architecture
In a replicated architecture, a copy of the shared application runs locally at each site.
Input events to each application are distributed to all sites and each copy of the
shared application is executed locally at each site. Advantages: low network traffic
because only input events are distributed among the sites, and low response times,
since all participants get their output from local copies of the application. Disadvantage:
requirement of same execution environment for the application at each site and difficulty
in maintaining consistency.
[Figure. Centralized architecture: a single shared application whose output is distributed over the network to a shared window at each site, with all input forwarded back to it.]
[Figure. Replicated architecture: a copy of the shared application runs at each site; only input events are distributed over the network between the shared windows.]
Conferencing
This provides the establishment of a conference. First the initiator (e.g., chairman)
starts a conference by selecting an initial group of invited conference members. The
knowledge of the conference state is inquired from a central directory server, which implies
that the client has registered his/her location.
Second, each invited client responds to the invitation so that the initiator is informed
of who will participate in the conference. After this step, a negotiation of conference policies
and an admission of resources is performed among the conference participants. During the
negotiation, the shared conference state is distributed using a reliable messaging service to
all participants. Advantage: this static control guarantees consistency of the conference
state and works well for small conferences. Disadvantage: when a new participant wants to
join, explicit exchange of the conference state must be performed among all participants,
which causes large delays. In case of link failure, it is more difficult to re-establish the
conference state.
Each site distributes its own participation status to other conference participants, but
there is no global notation of a group membership, and no guarantees that all users will
have the same view of the state space. Hence, this loose control is implemented through
retransmitting the state periodically for eventual consistency. The periodical retransmission
is done using an unreliable messaging service. The loose control works well for large
conferences.
Advantages: inherent fault tolerance and scaling properties. Disadvantage is that the
conference participants may not have the same view of the state space.
SESSION MANAGEMENT
Session management is an important part of the multimedia communication architecture.
It is the core part which separates the control, needed during the transport, from the actual
transport.
[Figure. Session control architecture: session control built on a conference membership protocol, floor control, configuration control and media control; media agents for the whiteboard (shared workspace), video, audio and sensory data communicate over reliable and real-time transport protocols.]
Session Manager
1. Local functionalities:
a) Membership control management, such as participant authentication or
presentation of coordinated user interfaces.
b) Control management for shared workspace, such as floor control.
c) Media control management such as inter communication among media agents
or synchronization
d) Configuration management such as exchange of interrelated QoS.
e) Conference control management, such as an establishment, modification
and a closing of a conference.
2. Remote functionality:
The session manager communicates with other session managers to exchange
session state information which may include the floor information, configuration
information, etc.
Media Agents
Media agents are responsible for decisions specific to each type of media. Each
agent performs its own control mechanism over the particular medium.
Shared Workspace Agent
The shared workspace agent transmits shared objects among the shared applications.
Control
Each session is described through its session state. This state information is either
private or shared among all the session participants. Session management includes two
steps to process the session state: (1) establishment and (2) modification of the session.
Floor Control
Within shared workspaces, the floor control is employed to provide access to the
shared workspace. It is often used to maintain data consistency. It uses floor passing
mechanism which means that at any time, only one participant has the floor. The floor is
handed off to another participant when requested.
With real-time audio, there is no notion of data consistency, instead, the floor control
is typically used in more formal settings to promote turn-taking. Floor control for real-time
video is frequently used to control bandwidth usage.
It is also used to implement floor policies. A floor policy describes how participants
request the floor and how the floor is assigned and released.
Conference Control
Media Control
Configuration Control
requirements. This control may embed services, such as the negotiation and
renegotiation of media quality.
Membership Control
The transport layer needs to provide the following functions for multimedia transmission:
timing information, semi-reliability, multicasting, a NAK-based recovery mechanism and rate
control.
Internet Transport Protocols
Transmission control Protocol (TCP)
TCP provides a reliable, serial communication path, or virtual circuit, between
processes exchanging a full-duplex stream of bytes.
To achieve reliable, sequenced delivery of a stream of bytes, it makes use of
timeouts and positive acknowledgements.
Flow control in TCP makes use of sliding window technique.
Further, TCP is not suitable for real-time video and audio transmission because its
retransmission mechanism may cause a violation of deadlines, which disrupts the
continuity of the continuous media streams.
Internet protocol
a) Type of Service
TOS specifies the type of service desired for a datagram, in terms of precedence, delay, throughput and reliability.
For example, multimedia conferencing would need a service class which supports low
delay, high throughput and intermediate reliability. Precedence handling would support
real-time network traffic.
The LAN technology brought the concept of convenient broadcasting to all end-
points on the LAN. LAN also introduces the concept of multicasting. This capability allows
an application to send a single message to the network and have it delivered to multiple
recipients. This service is attractive in a variety of distributed applications, including multi-
site audio/video conferencing and distributed database maintenance. Using class D
addresses, multicast-addressed packets are routed to all targets that are part of the multicast
group.
The worldwide Internet has been providing an IP multicast routing service for some
time now, through an Internet segment called the MBone (Multicast Backbone). The MBone
is a collection of UNIX workstations running a routing daemon called “mrouted”, which is
an implementation of the Distance Vector Multicast Routing Protocol. Using MBone,
conference sessions and other Internet Technical meetings can be multicast and the remote
users can listen to the technical talks and ask the speaker questions.
The router learns the binding of the IP address to the 48-bit LAN address through the Address
Resolution Protocol (ARP). This protocol allows a router to broadcast a query containing
an IP address and receive back the associated LAN address. A related protocol, Reverse
ARP, can be used to ask which IP address is bound to a given LAN address.
For Multimedia, the best performance would be achieved if a fixed path (static route)
could be allocated. The problem with this extension is that the IP protocol would lose the
ability to bypass link-layer failures, which is a fundamental property of the Internet
architecture and should be retained for integrated services. Further, in the case of a static
route, if no resource reservation were performed along the fixed path, the flexibility of
changing a route on a per-packet basis would be lost, which would decrease the performance
of the best-effort service.
Internet Group Management Protocol (IGMP)
Internet Group Management protocol (IGMP) is a protocol for managing Internet
multicasting groups. It is used by conferencing applications to join and leave particular
multicast group. The basic service permits a source to send datagrams to all members of a
multicast group. There are no guarantees of the delivery to any or all targets in the group.
Multicast routers send queries to refresh their knowledge of memberships present on
a particular network. If no reports are received for a particular group after some number
of queries, the routers assume the group has no local members, and that they need not
forward remotely originated multicasts for that group onto the local network.
Otherwise, hosts respond to a query by generating reports (host membership reports).
Queries are normally sent infrequently, so as to keep the IGMP overhead on host and
routers very low. However, when a multicast router starts up, it may issue several queries
to quickly build up its knowledge of local membership.
When a host joins a new group, it should immediately transmit a report for that group,
rather than waiting for a query, in case it is the first member of the group.
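As an informal sketch of what joining a group looks like from an application (standard Python socket idiom; the group address and port below are made-up examples), the membership setsockopt call asks the host's IP stack to join, which in turn causes it to emit an IGMP membership report:

    import socket
    import struct

    GROUP = "239.1.2.3"    # hypothetical class D (multicast) group address
    PORT = 5004            # hypothetical port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # join the multicast group on the default interface
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    data, sender = sock.recvfrom(2048)   # receive datagrams addressed to the group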
In a multimedia scenario, IGMP must loosely cooperate with an appropriate resource
Management protocol, such as RSVP, to provide a resource reservation for a member
who wants to join a group during a conference.
Multimedia data typically means digital images, audio, video, animation and graphics
together with text data. The acquisition, generation, storage and processing of multimedia
data in computers and transmission over networks have grown tremendously in the recent
past.
Contents of MMDB
Media data
This is the actual data representing images, audio and video that are captured, digitized,
processed, compressed and stored.
Media keyword data
This contains the keyword descriptions, usually relating to the generation of the media
data. For example, for a video, this might include the date, time, and place of recording,
the person who recorded it, the scene that is recorded, etc. This is also called content-
descriptive data.
Media feature data
This contains the features derived from the media data. A feature characterizes the
media contents. For example, this could contain information about the distribution of colors,
the kinds of textures and the different shapes present in an image. This is also referred to as
content-dependent data.
The last three types are called Meta data as they describe several different aspects of
the media data. The media keyword data and media feature data are used as indices for
searching purpose. The media format data is used to present the retrieved information.
Types of Databases
Documents are edited by a large number of users. The database may be very active and can
include a large number of video objects. A large number of users in the group may read most documents
during the tracking stage.
Mail database
The addressees and creator of a mail message may access the mail for a short duration
before it falls into disuse and is ultimately relegated to archival status. The routing program
delivers the message to the recipient’s mail file.
Information repositories
Document may be large and may have multiple linked sound clips and videos.
Database management system
Many inherent characteristics of multimedia data have direct and indirect impacts on the design of MMDBs. These include the huge size of MMDBs, their temporal nature, richness of content, complexity of representation and subjective interpretation. The major challenges in designing multimedia databases arise from several requirements they need to satisfy, such as the following:
1. Manage different types of input, output and storage devices. Data input can come from a variety of devices such as scanners and digital cameras for images, microphones and MIDI devices for audio, and video cameras for video. Typical output devices are high-resolution monitors for images and video, and speakers for audio.
2. Handle a variety of data compression and storage formats. The data encoding has a variety of formats even within a single application. For instance, in medical applications, MRI images of the brain require lossless coding or a very stringent-quality lossy coding technique, while X-ray images of bones can tolerate less stringent coding. Also, radiological image data, ECG data, other patient data, etc. have widely varying formats.
3. Support different computing platforms and operating systems. Different users operate computers and devices suited to their needs and tastes, but they need the same kind of user-level view of the database.
4. Integrate different data models. Some data, such as numeric and textual data, are best handled using a relational database model, while others, such as video documents, are better handled using an object-oriented database model. These two models should therefore coexist in MMDBs.
5. Provide a variety of user-friendly query systems suited to different kinds of media. From a user's point of view, easy-to-use queries and fast, accurate retrieval of information are highly desirable. The query for the same item can take different forms. For example, a portion of interest in a video can be queried by using either
1) a few sample video frames as an example,
2) a clip of the corresponding audio track, or
3) a textual description using keywords.
6. Handle different kinds of indices. The inexact and subjective nature of multimedia data has rendered the keyword-based indices and exact and range searches used in traditional databases ineffective. For example, the retrieval of records of persons based on social security number is precisely defined, but the retrieval of records of persons having certain facial features from a database of facial images requires content-based queries and similarity-based retrievals. This requires indices that are content dependent, in addition to keyword indices.
7. Develop measures of data similarity that correspond well with perceptual similarity. Measures of similarity for different media types need to be quantified to correspond well with the perceptual similarity of objects of those data types, and these measures need to be incorporated into the search process (a sketch of one such measure follows this list).
8. Provide a transparent view of geographically distributed data. MMDBs are likely to be distributed in nature: the media data may reside in many different storage units, possibly spread out geographically. This is partly due to the changing nature of computation and computing resources from centralized to networked and distributed.
9. Adhere to real-time constraints for the transmission of media data. Video and audio are inherently temporal in nature. For example, the frames of a video need to be presented at a rate of at least 30 frames/sec for the eye to perceive continuity in the video.
10. Synchronize different media types while presenting them to the user. It is likely that the different media types corresponding to a single multimedia object are stored in different formats, on different devices, and have different rates of transfer. They therefore need to be periodically synchronized for presentation.
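As a sketch of requirement 7 above, one widely used content-dependent measure for images is normalized colour-histogram intersection, which tracks the perceptual similarity of colour content reasonably well. The bin layout and example histograms below are assumptions for illustration only.

# Minimal sketch of a perceptually motivated similarity measure for images:
# normalized colour-histogram intersection.
from typing import Sequence

def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Return a similarity score in [0, 1] for two normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two 4-bin colour histograms (already normalized to sum to 1).
query_image  = [0.40, 0.30, 0.20, 0.10]
stored_image = [0.35, 0.25, 0.25, 0.15]

score = histogram_intersection(query_image, stored_image)   # 0.90 here

A content-based index would rank stored images by such a score rather than by an exact keyword match, which is what makes similarity-based retrieval possible.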
The recent growth in using multimedia data in applications has been phenomenal.
Multimedia databases are essential for efficient management and effective use of huge
amounts of data. The diversity of applications using multimedia data, the rapidly changing
technology, and the inherent complexities in the semantic representation, interpretation and
comparison for similarity pose many challenges. MMDBs are still in their infancy. Today's MMDBs are closely bound to narrow application areas. The experiences acquired from
developing and using novel multimedia applications will help advance the multimedia database
technology.
The first MMDBMSs relied mainly on the operating system for storing and querying files. These were ad-hoc systems that served mostly as repositories. The mid-90s saw a first wave of commercial, implemented-from-scratch, full-fledged MMDBMSs. Some of them were MediaDB (now MediaWay), JASMINE, and ITASCA, the commercial successor of ORION. They were all able to handle diverse kinds of data and provided mechanisms for querying, retrieving, inserting, and updating data. Most of these products disappeared from the market after some years of existence, and only some of them continued and adapted themselves successfully to hardware and software advances as well as to changes in applications. For instance, MediaWay provided early, very specific support for a wide variety of media types. Specifically, different media file formats, varying from images and video to PowerPoint documents, could be managed, segmented, linked and searched.
The most advanced solutions are marketed with Oracle 10g, IBM DB2 and IBM Informix. They propose a similar approach to extending the basic system. As a sample, we consider the IBM DB2 Universal Database Extenders, which extend the ORDBMS management to images, video, audio, and spatial objects. All these data types are modeled, accessed, and manipulated in a common framework. Features of the multimedia extenders include importing and exporting multimedia objects and their attributes in and out of a database, controlling access to non-traditional types of data with the same level of protection as traditional data, and browsing or playing objects retrieved from the database. For instance, the DB2 Image Extender defines the distinct data type DB2IMAGE with associated user-defined functions for storing and manipulating image files.
On top of MIRROR runs the ACOI system, a platform for indexing and retrieval of video and image data. The system provides a plug-in architecture to subsequently index multimedia objects using various feature extraction algorithms. ACOI relies on the COBRA (Content-Based Retrieval) video data model (low-level descriptors only). COBRA introduces a feature grammar to describe the low-level persistent metadata and the dependencies between the extraction mechanisms.
The MARS project includes the conception of a multimedia data model, for content indexing and retrieval, and for database management. The presented multimedia data model influenced the development of the MPEG-7 standard. MARS is itself a from-the-scratch development.
The MPEG-7 Multimedia Data Cartridge (MDC) is a system extension of the Oracle
9i DBMS providing a multimedia query language, access to media, processing and
optimization of queries, and indexing capacities relying on a multimedia database schema
derived from MPEG-7. The MDC builds on three main concepts. First, the Multimedia Data Model is the database schema derived from MPEG-7 descriptions. It is realized with the help of the extensible type system of the cartridge environment, i.e., descriptors in the MPEG-7 schema are mapped to object types and tables. Second, the Multimedia Indexing Framework (MIF) provides an extensible indexing environment for multimedia retrieval. The indexing framework is integrated into the query language and enables efficient multimedia retrieval. Finally, a set of internal and external libraries allows access to the media and communication with the MDC (query, insert, update, etc.). The
Multimedia Schema of MDC relies on the one hand on the structural and semantic parts of
the MPEG-7 standard (high-level descriptions). On the other hand, object types for the
MPEG-7 low-level descriptors, like color, shape, texture are provided and linked to the
high-level descriptions. This enables one to retrieve multimedia data not only by low-level
features, but also by semantics in combination with low-level characteristics.
The Multimedia Indexing Framework (MIF) offers advanced indexing services to the MMDBMS. It is generic in the sense that new index types may be added without changing the interface definitions. The MIF is divided into three modules. Each module, especially the Gist Service and the Oracle Enhancement, may be used on its own and may be distributed over the network. The Gist Service is the main part and is realized in the external address space. It offers services for index management and relies on the Generalized Search Trees (GiST) framework. The original GiST framework was extended to cope with several index trees at a time and to be linked to the Oracle DBMS. Several index trees can be employed in the system, in the category of balanced trees (e.g., the X-tree and the SR-tree), as well as other access methods not relying on balanced trees (VA-files). This service is split into two components: the Gist Communicator and the Gist Holder. The Gist Communicator is a COM object (Component Object Model) used for inter-process communication between the database (passing through the Gist Wrapper shared library) and the implemented access methods. Thus, the Gist Communicator supplies the necessary functionality (e.g., creating, inserting, deleting) for accessing the index structures. The Gist Holder manages all currently running index trees and the accesses to them. Each index tree is identified through a global and unique ID, which is forwarded to the accessing process. The integration of the MIF into the MDC is done via the index extension mechanisms of Oracle 9i. For each new index, a new Oracle index type has to be defined, but the interface to the Gist remains unchanged.
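To make the role of the Gist Service more tangible, the sketch below shows the kind of extension interface a GiST-style framework expects from a new index type. The four driver methods follow the published GiST design (consistent, union, penalty, pick-split); the Python rendering itself is an assumption for illustration, not the MDC or Oracle implementation.

# Sketch of the extension interface a GiST-style framework expects from a new
# index type; the tree machinery (search, insert, split, DBMS linkage) stays generic.
from abc import ABC, abstractmethod
from typing import Any, List, Tuple

class GistExtension(ABC):
    """Callbacks a new access method supplies to the generic tree framework."""

    @abstractmethod
    def consistent(self, entry_key: Any, query: Any) -> bool:
        """May the subtree under entry_key contain matches for the query?"""

    @abstractmethod
    def union(self, keys: List[Any]) -> Any:
        """Return one key that covers all given keys (e.g. a bounding region)."""

    @abstractmethod
    def penalty(self, existing_key: Any, new_key: Any) -> float:
        """Cost of inserting new_key under existing_key; guides tree descent."""

    @abstractmethod
    def pick_split(self, keys: List[Any]) -> Tuple[List[Any], List[Any]]:
        """Split an overflowing node's keys into two groups."""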
5.3.1 Synchronization
Content relations
Content relations define a dependency of media objects on some data. An example of a content relation is the dependency between a filled spreadsheet and a graphic that represents the data listed in the spreadsheet.
Spatial relations
The spatial relations, usually known as layout relationships, define the space used for the presentation of a media object on an output device at a certain point of time in a multimedia presentation.
Temporal relations
These relations define the temporal dependencies between media objects. They are of interest whenever time-dependent media objects exist.
Another example is an uncompressed video object that is divided into scenes and frames. The frames can be partitioned into areas of 16x16 pixels, and each pixel consists of luminance and chrominance values. All these units are candidates for LDUs.
In addition, LDUs can be classified into closed and open LDUs. Closed LDUs have a predictable duration. Examples are LDUs that are part of stored media objects of continuous media, like audio and video, or stored media objects with a fixed duration. The duration of open LDUs is not predictable before the execution of the presentation. Open LDUs typically represent input from a live source, for example a camera or a microphone, or media objects that include a user interaction.
For digital video, the frames are often selected as LDUs. For example, for a video with 30 pictures per second, each LDU is a closed LDU with a duration of 1/30 s. The figure below shows the video LDUs.
In the case of the basic physical unit being too small to handle, LDUs are often selected that block the samples into units of a fixed duration. A typical example is an audio stream whose physical unit duration is too small; therefore, LDUs are formed comprising 512 samples. Assuming one sample is coded with one byte, each block then contains 512 bytes.
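The small calculation below works out the LDU sizes and durations for the two examples above. The 8 kHz audio sample rate is an assumed value for illustration; the text itself only fixes the block size and the video frame rate.

# LDU sizes and durations for the examples above.
SAMPLES_PER_LDU = 512
BYTES_PER_SAMPLE = 1
SAMPLE_RATE_HZ = 8000            # assumption; not specified in the text

audio_ldu_bytes = SAMPLES_PER_LDU * BYTES_PER_SAMPLE          # 512 bytes
audio_ldu_duration_s = SAMPLES_PER_LDU / SAMPLE_RATE_HZ       # 0.064 s = 64 ms per audio LDU

VIDEO_FRAME_RATE = 30
video_ldu_duration_s = 1 / VIDEO_FRAME_RATE                   # ~0.033 s per video frame LDU

print(audio_ldu_bytes, audio_ldu_duration_s, video_ldu_duration_s)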
In computer-generated media objects, the duration of LDUs may be selected by the user. An example of these user-defined LDU durations is the frames of an animation sequence. For the presentation of a two-second animation sequence, 30 to 60 pictures may be generated depending on the required quality. Thus the LDU duration depends on the selected picture rate.
Open LDUs of unpredictable duration occur when the LDU has no inherent duration. An example of an open LDU is a user interaction, where the duration of the interaction is not known in advance.
Figure: LDUs of a timer.
Figure: Synchronization examples.
A further requirement of distributed multimedia applications is the need for a rich set
of real-time synchronization mechanisms for continuous media transmission. Such real-
time synchronization can be divided into two categories: intra-media synchronization and
inter-media synchronization.
Figure 5.2 shows the multicast communication scenario, where the sender sends full-color video at 24-30 frames per second. The video signal goes to receiver A without filtering, i.e., as a full-color signal. Receivers B and C receive the signal after one level of filtering, which may reduce the number of frames. Receiver D receives the signal after two levels of filtering, i.e., with fewer frames as well as possibly reduced color or a monochrome image.
The following factors influence lip synchronization, and thus quality, when only the visuals (video) are considered:
Content
Resolution and quality: the difficulty of human perception in distinguishing any lip-synchronization skew at a higher resolution, and the capability of multimedia software and hardware devices to refresh motion video data every 40 ms.
The following factors influence synchronization when audio is taken into account:
Content
Background noise or music
Language and articulation
Two audio tracks can be tightly or loosely coupled. For tightly coupled audio tracks
a skew of a magnitude of 20 ms is allowable. For loosely coupled audio a skew of 500 ms
is affordable.
The combination of audio with images has its initial application in slide shows; here a skew of 1 ms is affordable.
The synchronized presentation of audio with some text is usually known as 'audio annotation'. For this type of media synchronization, the affordable skew can be 240 ms.
The synchronization of video and text, or video and image, occurs in two distinct fashions. In the overlay mode, the text is often an additional description of the displayed moving image sequence; for such situations a skew of 240 ms is affordable. In the second mode, no overlay occurs and skew is less important; for such applications a skew of 500 ms is affordable.
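The skew tolerances quoted above can be collected into a small table that a presentation scheduler might consult; the sketch below does exactly that. The numeric values are the ones given in the text, while the table layout and the checking function are illustrative assumptions.

# Skew tolerances from the text, in milliseconds, keyed by media combination.
SKEW_TOLERANCE_MS = {
    ("audio", "audio", "tightly coupled"): 20,
    ("audio", "audio", "loosely coupled"): 500,
    ("audio", "image", "slide show"): 1,
    ("audio", "text", "audio annotation"): 240,
    ("video", "text", "overlay"): 240,
    ("video", "text", "no overlay"): 500,
}

def within_tolerance(media_pair, measured_skew_ms: float) -> bool:
    """True if the measured skew is acceptable for the given combination."""
    return abs(measured_skew_ms) <= SKEW_TOLERANCE_MS[media_pair]

print(within_tolerance(("video", "text", "overlay"), 180))            # True
print(within_tolerance(("audio", "audio", "tightly coupled"), 35))    # False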
The system shown in Figure 5.3 uses a distributed set of computers to control six independent video feeds for three wall screens and three monitors built into the navigation table. Simple multimedia systems are but extended computers that enable the capture, storage and playback of several media types. Modern systems for multimedia development, and some for playback, have extensive multimedia capabilities, with many components in the multimedia subsystem, often more powerful than those in the host system.
5.5 APPLICATIONS
Virtual reality is a technology that provides one with sensations and the control of perspective so that one experiences the illusion of being in the presence of an object, within a situation, or surrounded by a place. It relies on audio and video technologies augmented by a computer interface that reads the movement of the participant's body. It is used in
1. Perambulation (walking through a building)
2. Synthetic experience (performing surgery, operating a plant control room)
3. Realization (foreign currency, inventory of items)
The preceding types of multimedia computer systems enrich the computing environment
with a wider variety of visual and auditory data. Virtual reality systems transform the
computing environment by immersing the user in a simulated world, which also can include
movement and tactile control. When this is accomplished, the user enters a “virtual reality.”
Virtual reality systems will permit users to interact with computer systems in a manner that
more closely mimics how humans naturally operate in the real world.
5.5.2.1 Virtual Reality
The term Virtual Reality (VR) promises far more than our technology can currently
deliver. It has been variously used to describe user interfaces ranging from synthesized physical environments presented on Head-Mounted Displays (HMDs), to ordinary graphics
displayed on conventional CRTs, to text-based multi-user games.
The first VR systems appeared before computers were used for this purpose. Early flight simulators, for example, created virtual environments without the aid of computers; they used movies or created live video by shooting model boards with TV cameras.
Currently, the hardware platform of virtual environments consists of color stereo HMDs,
haptic displays, spatial sound, data gloves and 3D graphics. The software architecture for
virtual environments has been developed to support a single hardware platform or a small
number of tightly coupled platforms. As a result, systems were originally modeled after
traditional interactive programs. The first virtual environment applications were simple event-
loop-based programs. There are several problems with this approach because the following
requirements need to be satisfied:
1. VR displays should respond to changes in tracked objects, especially the user’s
head, at least ten times per second for the virtual environment to be convincing.
2. VR systems should not have tightly coupled distributed processes, because this approach does not scale towards new hardware and system software solutions. A solution is to structure the system as a large set of asynchronous, event-driven processes. Each process is independent of the others, and communication is performed via a well-defined message protocol (see the sketch after this list).
3. VR systems should scale up gracefully. Solutions for this requirement can be
achieved using adaptive algorithms, dynamic environments and adaptive protocols.
4. VR systems should have immersive, fully synthesized, photo-realistic graphical displays. The solution for this requirement is still far away because current technology does not yet provide such displays. Partial solutions to this requirement might be a graphical display with (a) full rendering of scenes and (b) rendering of images from the viewpoint of a given user.
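As a sketch of requirement 2 above, the snippet below structures a toy VR system as independent, asynchronous, event-driven tasks that communicate only through messages. asyncio tasks and a queue stand in for real processes and a wire protocol, and the 10 Hz display rate reflects requirement 1; all names and rates are illustrative assumptions.

# Two independent, event-driven components: a tracker publishing pose events
# and a display redrawing at >= 10 Hz from the latest event it has received.
import asyncio

async def head_tracker(out_queue: asyncio.Queue) -> None:
    """Publish (simulated) head-pose events; knows nothing about the display."""
    pose = 0.0
    for _ in range(20):
        pose += 0.1
        await out_queue.put({"type": "head_pose", "value": pose})
        await asyncio.sleep(0.02)            # tracker runs at ~50 Hz

async def display(in_queue: asyncio.Queue) -> None:
    """Redraw at least 10 times per second using the newest pose received."""
    latest = 0.0
    for _ in range(5):
        while not in_queue.empty():          # drain pending events, keep newest
            latest = (await in_queue.get())["value"]
        print(f"render frame with head pose {latest:.2f}")
        await asyncio.sleep(0.1)             # 10 Hz update, per requirement 1

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(head_tracker(queue), display(queue))

asyncio.run(main())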
which angle it is displayed, MR Toolkit distributes the VR system over multiple processes. Another implementation approach is taken by toolkits such as dVS, VR-DECK or DIVE.
5.5.3 Mobile Messaging
Convergence between current network technologies, the Internet and mobile telephony, is thus taking place, but the Internet's IP routing was designed to work with conventional static nodes, not mobile nodes. Efforts are therefore being made in wireless and Internet forums to enhance IP routing to support mobility, and many proposals have been made in this direction.
Mobile IP is a key proposal from the Internet Engineering Task Force (IETF) that specifies protocol enhancements to enable transparent routing of IP data packets to mobile nodes in the Internet. This section consolidates and summarizes Mobile IP concepts from the base RFC, as well as from numerous related RFCs.
E-mail and cell phones each provide obvious and significant advantages over traditional
land-line telephone communication, and this accounts for their extraordinary success. Among
other things, e-mail provides the ability to conduct interpersonal communications on an
asynchronous basis, and also provides the qualitative advantages of the written, over the
spoken, word. And cell phones provide the self-evident advantages of mobile, real-time
voice communications. Individually, the major advantage of each of these two technologies
is not shared by the other. The correct combination of the two technologies, however, can
provide the major advantages of both. We refer to this combined technology as mobile
messaging. Formally, we define mobile messaging as the ability to send and receive e-mail
messages from a mobile, hand-held or hand-carried device. This capability is also sometimes
referred to as wireless messaging, or mobile e-mail.
5.5.4 Mobile Telephone Function Access
During the 1990s, mobile telephones and the networks they were connected to began to offer a far broader range of functions than simply making and receiving calls. Users could divert calls, set up message boxes, change ringing options, and so on. To support this range of functionality, most mobile phones used, and still use, a hierarchical menu-based approach. That is, a user views a series of options on the small screen of the phone and selects one of them. They are then presented with a series of sub-options (an example display from a mobile phone is shown in Figure 1). This navigation continues until the user finds the desired function or information (or gives up). Figure 1 shows a mobile phone display with a sub-menu and navigation: the third sub-option of menu two is displayed, the next level in the tree can be reached by pressing Select, and the previous level by pressing Back.
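The tiny model below captures the menu navigation just described: selecting a label descends one level of the tree and Back returns to the previous level. The menu labels themselves are invented for illustration.

# Toy model of hierarchical, menu-based phone navigation.
MENU = {
    "Messages": {"Write message": {}, "Inbox": {}, "Message settings": {}},
    "Call divert": {"Divert all calls": {}, "Divert when busy": {}},
    "Ring options": {"Ring volume": {}, "Ring tone": {}},
}

def navigate(menu: dict, keypresses: list[str]) -> list[str]:
    """Follow select/back keypresses and return the path to the current node."""
    path: list[str] = []
    node = menu
    for key in keypresses:
        if key == "back":
            if path:
                path.pop()
            node = menu
            for label in path:               # re-descend to the parent node
                node = node[label]
        else:                                # treat the key as the selected label
            node = node[key]
            path.append(key)
    return path

print(navigate(MENU, ["Messages", "Inbox", "back", "Write message"]))
# ['Messages', 'Write message']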
A mobile host may be assigned a new network address (IP address) while changing sub-nets. But if a mobile host changes its network address, all established transport layer connections (TCP) are broken.
EXAMPLE
Imagine a commuter downloading music while travelling by train. This user is using a laptop attached to a mobile handset. The mobile handset could be connected to the Internet using data services provided by GSM or CDMA networks. When the user registers for data services, i.e. the user initiates a data call, he/she will be assigned a unique IP address. Once connected, the user starts an FTP session to download music from the Internet. This FTP session is based on a transport layer connection that depends on the connection invariant. But as the train moves, the mobile station moves to another cell; the point of attachment for data services, and therefore the sub-net, may change. If the mobile station is now assigned a new IP address, all the transport layer connections will break down. The FTP session will therefore be aborted.
This is the problem that Mobile IP seeks to solve.
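To see why the FTP session breaks, recall that a TCP connection is identified by the invariant four-tuple of source IP address, source port, destination IP address and destination port. The sketch below illustrates this with made-up addresses; the closing comment notes how Mobile IP sidesteps the problem by keeping the home address constant.

# A TCP connection is identified by (source IP, source port, destination IP,
# destination port). Addresses and ports here are made-up examples.
ftp_connection = ("10.5.7.23", 40212, "198.51.100.9", 21)   # laptop <-> FTP server

def connection_still_valid(conn, current_ip: str) -> bool:
    """The connection survives only if the host still owns the source IP."""
    source_ip = conn[0]
    return source_ip == current_ip

print(connection_still_valid(ftp_connection, "10.5.7.23"))   # True: address unchanged
print(connection_still_valid(ftp_connection, "10.9.1.88"))   # False: new IP after handover,
                                                             # so the transport connection breaks
# Mobile IP keeps the home address constant and tunnels packets to the mobile
# node's current care-of address, so the invariant is preserved across moves.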
5.5.5 Teleconferencing
Teleconferencing systems allow the user to achieve most of the efficiency and productivity of traditional meetings, with one main difference: the user can stay at his or her desk, as can the remote conference participants. A multimedia conferencing system enables people to work together across geographically distant locations without the need to meet at one site. They communicate with each other in multi-party or face-to-face mode using motion video, audio and textual information in each direction. The audio and video quality depends heavily on the platform. Therefore, a big factor in the success of a teleconferencing system is achieving high media quality on any platform, and interconnectivity among various platform vendors. A possible set-up is shown in the figure below.
Figure: A possible teleconferencing set-up, with monitors at each site connected via a network.
Video conferencing is used either in an office environment, where the video is displayed on a PC or workstation screen, or in a conference room, where the video is displayed on a video wall. For the office environment, desktop video conferencing systems have been developed. For a conference room environment, large TV screens in conference rooms are used for meetings of groups located at different geographical places.
5.5.6 Interactive Video
Interactive video research covers various problems in the areas of interactive TV and Video-On-Demand. Interactive TV research concentrates on cable television, whereas Video-On-Demand research is computer-oriented. Since both areas are merging, in the future we will see the results of both as interactive video services.
5.5.6.1 Interactive TV
Interactive TV means that the TV viewer can become more active than is the case today. There are several types of interactivity. For instance, a user might select one out of several camera angles for a televised sport, or ask for supplementary information about the teams or players. Another example could be an educational program where one out of several educational tracks could be selected and/or extra tutorials could be requested.
5.5.6.2 Video-On-Demand
A Video-On-Demand (VOD) system comprises video servers, a distribution network, and set-top units for receiving, demodulating, decoding and converting video for television playback.
VOD services need retrieval tele-services. Furthermore, video services are asymmetrically switched services in which the customer chooses among a wide selection of video material and receives, on demand, a real-time response. The service is asymmetric in the sense that the downstream (to the customer) channel has a much higher bandwidth than the upstream channel.
The best-known application of VOD is the video library, which uses interactive VOD. Interactive VOD allows a user to gain access to a movie (i.e., a digitized video sequence stored on a storage medium such as a hard disk) via a point-to-point connection. This connection allows the user individual and instantaneous control of the storage medium in terms of start, fast-forward, pause and rewind actions.
There are two types of interactive VOD service:
Interactive VOD with instantaneous access, whereby the user can instantly retrieve and individually control program information from a library, with instant control response. The service is provided as follows: the customer selects a movie out of a large set of movies; the transmission starts within a few seconds; the user can stop and continue the transmission instantaneously; and the user gets uninterrupted response. To provide the service, a video load buffer must be created at the start of the program so that responses to the different functions can be performed immediately (a sketch of this buffer behaviour follows the two service types).
Figure: VOD delivery architecture with video servers, a switch, a video gateway, video dial tone and head end.
Interactive VOD with delayed access, whereby users retrieve and individually control program information from a library, but there is a waiting time depending on the available bandwidth resources of the network and/or the popularity of the requested program. In this case, the user needs to wait a few minutes before the movie starts. Here, the video load buffer is only created when the pause function is performed, not at the start of the program. Therefore, this service consumes fewer video buffer resources.
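A minimal sketch of the contrasting video load buffer behaviour of the two service types is given below; the buffer size and segment granularity are illustrative assumptions, not values from the text.

# Instantaneous access prefills the load buffer at program start; delayed
# access creates it only when the user pauses, so it uses fewer buffer resources.
from collections import deque

class VodSession:
    def __init__(self, instantaneous: bool, prefill_seconds: int = 5) -> None:
        self.instantaneous = instantaneous
        self.buffer: deque = deque()          # holds buffered video segments
        if instantaneous:
            self._fill(prefill_seconds)       # immediate control response possible

    def _fill(self, seconds: int) -> None:
        for t in range(seconds):
            self.buffer.append(f"segment-{t}")

    def pause(self) -> None:
        if not self.instantaneous and not self.buffer:
            self._fill(5)                     # buffer created only on pause

instant = VodSession(instantaneous=True)
delayed = VodSession(instantaneous=False)
print(len(instant.buffer), len(delayed.buffer))   # 5 0
delayed.pause()
print(len(delayed.buffer))                        # 5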
QUESTIONS
1. Discuss the processor configuration required for a multimedia live presentation system.
2. What are the functions of a multimedia operating system?
3. Compare the first three generations of multimedia database systems.
4. Discuss the QoS architecture for a distributed multimedia system.
5. Define jitter and orchestration.
6. Write short notes on the following:
a. Virtual reality  b. Mobile messaging  c. Video-on-demand