
Computer Systems Overview


CSI 104

Chapter 1: Computer organization

COMPUTER

The input/output subsystem

CPU

The memory → every byte (memory cell) where data is stored is assigned a number, and that number is called its address.

CPU

The CPU controls the operation of the computer according to the instructions it executes.

Control unit: controls the operation of the CPU.

PC (program counter): holds the address of the next instruction.

IR (instruction register): holds the machine instruction that is currently being executed.

Register: to execute instructions the CPU needs data; data held in registers can be accessed at the highest speed.

⇒ When the instruction in the IR has been executed, the instruction at the address held in the PC is fetched next.

ALU → arithmetic logic unit; performs the calculations.
MAIN MEMORY ( RAM)

8 bits = 1 byte, 16 bits = 2 bytes, 32 bits = 4 bytes

Storing large data takes many bits → it needs a lot of hard-disk space.

A memory cell that stores data is called a word; a word can be 8, 16, 32, 64, ... bits long.

Main memory can be pictured as follows: each row of the blue rectangle shows the states of individual bits (a bit is either 0 or 1). In the figure, 8 adjacent bits form one word. Each byte consists of 8 bits and is assigned a number, called its address (the address of the memory cell).

Memory types:
1. RAM (random access memory)
2. ROM (read-only memory)

RAM:
1. Static RAM (SRAM)
2. Dynamic RAM (DRAM)

ROM:
1. PROM → programmable
2. EPROM → erasable
3. EEPROM → electrically erasable

SRAM is better than DRAM in speed: its access time is much shorter.

Example of ROM: a CD.
MEMORY HIERARCHY

Cache memory is usually placed between the CPU and main memory.

Cache is a small amount of memory that is part of the CPU, so it is physically closer to the CPU than RAM is. The more cache there is, the more data can be stored close to the CPU.
Cache memory is beneficial because:

Cache memory holds frequently used instructions and data that the processor may require next, and it is faster to access than RAM, since it is on the same chip as the processor.

This reduces the need for frequent slower memory retrievals from main memory,
which may otherwise keep the CPU waiting.

The more cache the CPU has, the less time the computer spends accessing slower
main memory and as a result programs may run faster.

Key notes from chapters 1-3
TURING MACHINE
The idea of a universal computational device was first described by Alan Turing.

All computation could be performed by a special kind of machine, now called a Turing machine.

A program is a set of instructions that tells the computer what to do with the data

A universal Turing machine, a machine that can do any computation if the appropriate program is provided, was the first description of a modern computer.

VON NEUMANN

The control unit fetches one instruction from memory, decodes it, then executes it.

The second generation saw the appearance of two high-level languages, FORTRAN and COBOL, which made programming easier.

The ALU performs logic operations (OR, NOT, XOR, AND), shift operations (logical shift and arithmetic shift), and arithmetic operations (calculation).

Main memory (noted earlier)


The I/O subsystem of a computer is a collection of devices that communicate with the outside world and store programs and data even when the power is off.

Non-storage devices ⇒ communicate with the outside world but cannot store information (keyboard, printer, ...).
Storage devices ⇒ contents are nonvolatile → not erased when the power is turned off (magnetic, optical).
HDD + CD-ROM: magnetic or optical

In a computer, the input/output subsystem accepts data and programs and sends processing results to output devices.

Storage devices → magnetic or optical

A 17th-century computing machine that could perform addition and subtraction was the Pascaline.

The first machine to use the idea of storage and programming was the Jacquard loom.
High-level languages separated the programming task from computer operation tasks.
Software engineering is the design and writing of a program in structured form.

SCSI controller to connect I/O devices
FireWire controller to connect I/O devices

USB controller to connect I/O devices


Addressing of IO devices : isolated I/O and memory mapped I/O

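CISC, RISC, Pipelining

Parallel processing

SISD: single instruction stream, single data stream (1 → 1)

SIMD: single instruction stream, multiple data streams (1 → 3)
MISD: multiple instruction streams, single data stream (3 → 1)

MIMD: multiple instruction streams, multiple data streams (3 → 3)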

Chapter 2:
A number system defines how a number can be represented using distinct symbols.

A number can be represented differently in different systems.
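
As a small illustration of one number written in several systems, the sketch below converts a non-negative integer to a digit string by repeated division. The function name `to_base` and the limit of base 16 are choices made for this example.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base (2 to 16)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)   # the remainder is the next digit, right to left
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


# The same quantity written in binary, octal, and hexadecimal.
binary, octal, hexadecimal = to_base(156, 2), to_base(156, 8), to_base(156, 16)
```

Python's built-in `int(text, base)` performs the reverse conversion, which gives a quick way to check a result.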

Chapter 3

Data ⇒ numbers, text, audio, images, and video.
“Multimedia” → information that contains numbers, text, images, audio, and video.
ALL DATA TYPES ARE TRANSFORMED INTO A UNIFORM REPRESENTATION.
A bit (binary digit) is the symbol 0 or 1 ⇒ the smallest unit of data that can be stored in a computer.

⇒ Bit pattern: a sequence, or string, of bits.


I, Storing numbers
1, How to store the sign of the number
2, How to show the decimal point
For 2 → computers use two different representations: fixed-point and floating-point
Fixed-point ⇒ stores a number as an integer
Floating-point ⇒ stores a number as a real

SIGN-AND-MAGNITUDE REPRESENTATION ⇒ the leftmost bit holds the sign and the remaining bits hold the magnitude (absolute value).

The sign bit is 0 for positive and 1 for negative.
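
A minimal sketch of sign-and-magnitude storage, assuming an 8-bit pattern by default. The function names are hypothetical; the leftmost bit is the sign (0 positive, 1 negative) and the remaining bits hold the absolute value.

```python
def to_sign_magnitude(value, bits=8):
    """Encode an integer as a sign-and-magnitude bit pattern of the given width."""
    limit = 2 ** (bits - 1) - 1          # largest magnitude that fits in bits - 1 bits
    if abs(value) > limit:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")


def from_sign_magnitude(pattern):
    """Decode a sign-and-magnitude bit pattern back to an integer."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude


plus_28 = to_sign_magnitude(28)
minus_28 = to_sign_magnitude(-28)
```

Only the leftmost bit differs between the two patterns, which is the defining property of this representation.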

A floating point representation of a number is made up of 3 parts: a sign, a shifter and a
fixed point number

Storing text, media, image, video


1, TEXT:
A section of text → is a sequence of symbols ⇒ represent each symbol with a bit
pattern.

ASCII using 7 bits for each symbol


Extended ASCII: Using 8 bits for each symbol
Unicode: 32 bits.

2, AUDIO:

Audio is an example of analog data


Step 1: Sampling → we cannot record all the values of an audio signal over an interval ⇒ record only some of them.

Step 2: Quantization ⇒ a process that rounds the value of a sample to the closest integer value.

Step 3: Encoding → if the bit depth (number of bits per sample) is B and the number of samples per second is S ⇒ the bit rate is B × S (bits per second).
Standard for sound recording ⇒ MP3
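
The quantization and bit-rate steps can be sketched in a few lines. The sample values and the figures of 16 bits per sample at 44,100 samples per second are illustrative assumptions, not values taken from these notes.

```python
def quantize(samples):
    """Round each sampled value to the closest integer (the quantization step)."""
    return [round(s) for s in samples]


def bit_rate(bits_per_sample, samples_per_second):
    """Bit rate in bits per second: B * S."""
    return bits_per_sample * samples_per_second


levels = quantize([17.2, 25.8, -3.4])
rate = bit_rate(16, 44_100)   # assumed example values for B and S
```

With these assumed values, one second of single-channel audio needs 705,600 bits before any compression.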
3, IMAGES
2 techniques: raster graphics(bitmap graphics) and vector graphics
Bitmap graphics → used when we need to store an analog image such as a photograph (large file size; rescaling is troublesome; the image looks ragged when it is enlarged).
A photograph consists of analog data, similar to audio information. The difference is that the color of the data varies in space instead of in time.
The samples are called pixels (picture elements).
Resolution → we need to decide how many pixels to record for each square or linear inch ⇒ the scanning rate in image processing is called resolution.

Color depth ⇒ the number of bits used to represent a pixel
True color ⇒ uses 24 bits to encode a pixel (JPEG)

Indexed color ⇒ uses only a portion of these colors (GIF)


Vector graphic → does not store the bit patterns for each pixel
An image is decomposed into a combination of geometrical shapes
4, VIDEO
Video is a representation of images(called frames) over time ⇒ representation of
information that changes in space and in time

Operation on data:
NOT(unary), AND(binary), OR(binary), XOR(binary).
SHIFT OPERATION: move the bits in a pattern, change position of the bits

Simple shift:

Circular shift:

Arithmetic shift operation:

Arithmetic right shift is used to divide an integer by two, while arithmetic left shift is used
to multiply an integer by two (discussed later).
If the new sign bit is the same as the previous one, the operation is successful,
otherwise an overflow or underflow has occurred and the result is not valid.
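
A sketch of the two arithmetic shifts on bit patterns held as strings, assuming the patterns are two's complement integers. The helper names are hypothetical. The left shift also reports whether the sign bit was preserved, which is the validity check described above.

```python
def to_int(pattern):
    """Interpret a bit string as a two's complement integer."""
    value = int(pattern, 2)
    return value - (1 << len(pattern)) if pattern[0] == "1" else value


def arithmetic_right_shift(pattern):
    """Drop the rightmost bit and copy the sign bit into the vacated position."""
    return pattern[0] + pattern[:-1]


def arithmetic_left_shift(pattern):
    """Drop the leftmost bit, append 0, and report whether the sign bit was kept."""
    shifted = pattern[1:] + "0"
    return shifted, shifted[0] == pattern[0]


halved = arithmetic_right_shift("10011001")            # -103 becomes -52
doubled, valid = arithmetic_left_shift("11011001")     # -39 becomes -78, sign kept
overflowed, ok = arithmetic_left_shift("01111111")     # 127 cannot be doubled in 8 bits
```

Note that the right shift rounds toward negative infinity, so an odd negative value such as -103 becomes -52 rather than -51.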

ARITHMETIC OPERATION: +,-

Chapter 4
A network is the interconnection of a set of devices capable of communication.
A LAN is usually privately owned and connects some hosts in a single office, building, or campus. A LAN interconnects hosts; each host in a LAN has an identifier, an address, that uniquely defines the host in the LAN.
A WAN is also an interconnection of devices capable of communication, but it can connect many LANs. A WAN interconnects connecting devices such as switches, routers, or modems.

An internet is two or more networks that can communicate with each other. The Internet is composed of thousands of interconnected networks.

The Internet is made up of several backbones, provider networks, and customer networks.
Provider networks use the services of the backbones for a fee.
Backbones and provider networks are also called Internet service providers (ISPs).

TCP/IP - Transmission Control Protocol / Internet Protocol


What is a protocol? A protocol is a set of established rules whose main job is to format, transmit, and receive data. These rules let networked devices (from servers and routers to endpoints) communicate clearly with one another, regardless of differences in their underlying infrastructure, design, or standards.
We need a protocol at each layer; this is called protocol layering.

LAYERS IN NETWORKING

Layer 5: Application layer
Provides applications with standardized data exchange: data communication between two different machines through various network services.

Includes the data-exchange protocols, among them those that support file transfer: HTTP, FTP, Post Office Protocol 3 (POP3), Simple Mail Transfer Protocol (SMTP), and Simple Network Management Protocol (SNMP).

The data at this layer is the actual application data.

Layer 4: Transport layer

Responsible for maintaining end-to-end communication across the network. TCP handles communication between hosts and provides flow control, multiplexing, and reliability.

Provides services to the application layer and receives services from the network layer.

The transport layer provides process-to-process communication.

For communication we must identify the host by its IP address; to identify the processes we need second identifiers called port numbers. In TCP/IP, port numbers are integers between 0 and 65,535 (16 bits).

This layer has two core protocols, TCP and UDP. TCP guarantees reliable delivery of packets, while UDP gives faster transmission.

Layer 3: Network layer
THE NETWORK LAYER ACCEPTS A PACKET FROM THE TRANSPORT LAYER, ENCAPSULATES THE PACKET IN A DATAGRAM, AND DELIVERS IT TO THE DATA-LINK LAYER

⇒ host to host
The main protocol is called the Internet Protocol (IP). Two versions are in use today: IPv4 (32-bit addresses) and IPv6 (128-bit addresses).
There are three ways to show an IP address: base 2, base 16, and base 256 (dotted-decimal notation).
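
To make the three notations for a 32-bit IPv4 address concrete, the sketch below starts from the dotted-decimal (base 256) form and derives the base 2 and base 16 forms. The function name and the sample address are assumptions chosen for illustration.

```python
def ipv4_notations(address):
    """Return (binary, hexadecimal) forms of a dotted-decimal IPv4 address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    binary = " ".join(format(o, "08b") for o in octets)        # base 2, 8 bits per byte
    hexa = "0x" + "".join(format(o, "02X") for o in octets)    # base 16, 2 digits per byte
    return binary, hexa


binary, hexa = ipv4_notations("17.234.34.14")
```

Each of the four decimal numbers is one byte of the address, so it maps to exactly 8 binary digits or 2 hexadecimal digits.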

Layer 2: Data-link layer
Communication in the data-link layer is node to node.

Wireless Ethernet, or WiFi, is a wireless LAN with two kinds of service: the basic service set (BSS) and the extended service set (ESS).
CABLE SERVICE provides access to TV programs.
Wireless WAN: WiMAX (Worldwide Interoperability for Microwave Access) is the wireless version of DSL.

Layer 1: Physical layer

Transfers the bits received from the data-link layer and converts them to electromagnetic signals for transmission.

Also called the link layer in models that merge it with the data-link layer: it consists of protocols that operate only on a single link, the network component that connects nodes or hosts. This layer is responsible for transferring data between two devices on the same network.

Data-transfer protocols: Ethernet (for LANs) and ARP

Overview:
The application layer provides services to the user. Communication is provided using a logical connection.

To use the Internet we need two application programs that interact with each other: one running on one computer and the other running on another.
Two paradigms → the client-server paradigm and the peer-to-peer paradigm. The standard client-server paradigm is still used in HTTP, SSH, FTP, email, WWW, ...

DNS: generic domains (define registered hosts), country domains, and the inverse domain

TCP identifies the applications and creates the communication channels, manages the data being transferred, reassembles it in the correct order, and delivers it accurately to the destination address.

IP provides the addressing.
Application → provides services to the user (client-server and peer-to-peer; used by HTTP, WWW, FTP, SSH, email (SMTP)); DNS in the Internet
The network layer provides services to → the transport layer (process-to-process communication), which provides services to → the application layer
The network layer is responsible for the host-to-host delivery of messages.

The network layer accepts a packet from the transport layer, encapsulates the packet in a datagram, and delivers it to the data-link layer.

At the destination host the datagram is decapsulated, and the packet is extracted and delivered to the corresponding transport layer.
The data-link layer (node to node) is the territory of the networks that, when connected, make up the Internet. These networks receive services from and provide services to the network layer.
The physical layer transfers the bits received from the data-link layer and converts them to electromagnetic signals for transmission.

Chapter 5: Operating System
Software is the collection of programs that allows the hardware to do its job.
Computer software is divided into two broad categories: the operating system and application programs.
Application programs use the computer hardware to solve users' problems.
The operating system, on the other hand, controls the access to hardware by users.
→ An operating system is an interface between the hardware of the computer and the user.
→ An operating system is a program that facilitates the execution of other programs.
→ It acts as a general manager supervising the activity of each component in the computer system.
Two major design goals → efficient use of hardware
→ ease of use of resources

Turning on a computer → the bootstrap program runs first

Evolution of operating systems

Batch systems ⇒ controlled mainframe computers; used punched cards for input, line printers for output, and tape drives for secondary storage.
Time-sharing systems (hold several jobs in memory at a time)
Personal systems (single-user operating systems such as DOS)

Parallel systems (many CPUs on the same machine ⇒ more speed and efficiency)
Distributed systems (resources shared between computers; see Chapter 4)
Real-time systems (must do a task within a specific time constraint)

Components of an OS

A modern operating system has at least four duties: memory manager, process manager, device manager, and file manager.

User interface (or shell): a program that accepts requests from users and interprets them for the rest of the operating system. Ex: in UNIX it is called a shell; in Windows it is a GUI.
Memory allocation must be managed to prevent applications from running out of memory. There are two broad categories of memory management: monoprogramming and multiprogramming.
In monoprogramming, most of the memory capacity is dedicated to a single program; only a small part is needed to hold the operating system. In this configuration, the whole program is in memory for execution. When the program finishes running, the program area is occupied by another program (the program must fit the size of memory).
In multiprogramming, more than one program is in memory at the same time, and they are executed concurrently, with the CPU switching rapidly between the programs.

Nonswapping ⇒ the program remains in memory for the duration of execution

1, Partitioning:
Memory is divided into variable-length sections. Each section, or partition, holds one program.
The CPU switches between programs. It starts with one program, executing some instructions until it either encounters an input/output operation or the time allocated for that program has expired → it moves to the next program → after all have been served ⇒ it moves back to the first.
⇒ Each program is entirely in memory and occupies contiguous locations.
2, Paging:
Memory is divided into equally sized sections called frames. Programs are also divided into equally sized sections called pages (frame size = page size).

A page is loaded into a frame in memory. Ex: a program that has 3 pages → occupies 3 frames.

Swapping ⇒ during execution, the program can be swapped between memory and disk one or more times.
1, Demand paging:
The program is divided into pages, but the pages can be loaded into memory one by
one, executed, and replaced by another page. In other words, memory can hold pages
from multiple programs at the same time. In addition, consecutive pages from the same
program do not have to be loaded into the same frame—a page can be loaded into any
free frame
2, Demand segmentation:
The program is divided into segments that match the programmer’s view. These are
loaded into memory, executed, and replaced by another module from the same or a
different program

Process manager
A program is a non-active set of instructions stored on disk.
A program becomes a job from the moment it is selected for execution until it has
finished running. It may be located on disk waiting to be loaded to memory, or it may be
loaded into memory and waiting for execution by the CPU. It may be on disk or in
memory waiting for an input/output event, or it may be in memory while being executed
by the CPU
A process is a program in execution. It is a program that has started but has not
finished.
Every process is a job, but not every job is a process. A process may be executing or it
may be waiting for CPU time.
State diagrams

A program becomes a job when selected by the operating system and brought to the hold state. When there is memory space available to load the program totally or partially, the job moves to the ready state. It now becomes a process. It remains in memory and in this state until the CPU can execute it, moving to the running state at this time. When in the running state, one of three things can happen:
❑ The process executes until it needs I/O resources ⇒ waiting state until I/O is complete.
❑ The process exhausts its allocated time slot ⇒ ready state
❑ The process terminates ⇒ terminated state
Queuing (waiting lists)
⇒ A state diagram shows one job or process moving from one state to another; in practice many jobs and processes compete for resources, so they wait in queues.
Process synchronization is the whole idea behind process management: different processes must be synchronized with different resources. When the operating system puts no resource restrictions on processes, deadlock can occur; when it puts too many, starvation can occur.

Deadlock occurs if the operating system allows a process to start running without first
checking to see if the required resources are ready, and allows a process to hold a
resource as long as it wants.
Starvation can happen when the operating system puts too many resource restrictions
on a process.

Device manager
The device manager, or input/output manager, is responsible for access to input/output devices.

File manager
- Controls access to files
- Supervises the creation, deletion, and modification of files
- Controls the naming of files
- Supervises the storage of files
- Is responsible for archiving and backups

Chapter 6: Algorithms
Input/Output processing:

An algorithm needs refinement to be acceptable to the programming community.
Two problems:
1, The action in the first step is different from those in the other steps.
2, The wording is not the same in steps 2 to 5.
Generalization of the algorithm.
Three basic constructs:
⇒ A structured program or algorithm uses sequence, decision (if-else), and repetition (loop) ⇒ this makes an algorithm easy to understand, debug, or change.
Sequence:

Decision(selection construct):

Repetition:

Algorithm Representation:
UML - Unified Modeling Language - is a pictorial representation of an algorithm
UML is a general purpose modelling language. The main aim of UML is to define a
standard way to visualize the way a system has been designed
UML is not a programming language, it is rather a visual language. We use UML
diagrams to portray the behavior and structure of a system.
Pseudocode:

An English-language-like representation of an algorithm ⇒ uses words to describe the actions of the program.
Basic Algorithms:

1. Summation ⇒ add many integers using a loop

2. Smallest and largest ⇒ use a loop and a condition

3. Sorting ⇒ the process by which data is arranged according to its values
Bubble/Shell sort

Insertion sort

Selection sort (learned in PRF)

Quick sort

Merge sort

Bubble sort (learned in PRF)

The smallest element is bubbled up from the unsorted sublist and moved to the sorted sublist.
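
A minimal sketch of bubble sort in the form described above, where each pass bubbles the smallest remaining element toward the front. The name `wall` for the boundary between the sorted and unsorted sublists is a label chosen for this example, and the function returns a sorted copy rather than sorting in place.

```python
def bubble_sort(items):
    """Return a sorted copy; each pass bubbles the smallest unsorted element forward."""
    data = list(items)
    n = len(data)
    for wall in range(n - 1):              # data[:wall] is the sorted sublist
        for j in range(n - 1, wall, -1):   # walk from the end toward the wall
            if data[j] < data[j - 1]:      # swap neighbours that are out of order
                data[j], data[j - 1] = data[j - 1], data[j]
    return data


result = bubble_sort([23, 78, 45, 8, 32, 56])
```

After the first pass the smallest value sits at index 0, after the second pass the two smallest are in place, and so on until the unsorted sublist is empty.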
Searching:
⇒ Finding the location of a target among a list of objects
2 kinds: Sequential(linear search), binary search
Sequential can be used to locate an item in any list while binary search requires the list
first to be sorted
Linear search: ⇒ finding an element within a list
Sequentially checks each element of the list until a match is found or the whole list has been searched.
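
Linear search can be sketched as a single loop. Returning -1 when the target is absent is a convention assumed for this example.

```python
def linear_search(items, target):
    """Return the index of the first match, or -1 if the target is not in the list."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


position = linear_search([4, 21, 36, 14, 62, 91, 8, 22], 62)
```

The list does not need to be sorted, but in the worst case every element is inspected.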

Binary search (also called half-interval search, logarithmic search, or binary chop) is a search algorithm that finds the position of a target value within a sorted array.

Chapter 7: Programming
To write a program for a computer → use a computer language

A computer language is a set of predefined words that are combined into a program according to predefined rules (syntax).

Machine languages: made of streams of 0s and 1s


Assembly languages: also known as symbolic languages; they replace binary code for instructions and addresses with symbols or mnemonics.
High-level languages: → the desire to improve programmer efficiency and to change the focus from the computer to the problem being solved led to the development of high-level languages.
Translation
A program written in a high-level language ⇒ needs to be translated into machine language.
The program in a high-level language is called the source program.
The translated program in machine language is called the object program.

2 methods used for translation: compilation and interpretation


Translation process:

Lexical analyzer → reads the source code, symbol by symbol, and creates a list of tokens in the source language

Syntax analyzer → parses a set of tokens to find instructions

Semantic analyzer → checks the sentences created by the syntax analyzer to be sure that they contain no ambiguity

Code generator → each instruction is converted to a set of machine-language instructions

Programming paradigms:
A paradigm is a way in which a computer language looks at the problem to be solved.

Procedural paradigm:

COBOL (COmmon Business-Oriented Language)

In the procedural paradigm (or imperative paradigm) we can think of a program as an active agent that manipulates passive objects.

OOP:
A program in Java can be either an application or an applet.
Another interesting feature of Java is support for multithreading.

The object-oriented paradigm deals with active objects, like the many we deal with in daily life.

Functional paradigm:
A program is considered a mathematical function; a function is a black box that maps a list of inputs to a list of outputs.

Declarative paradigm:
Uses the principle of logical reasoning to answer queries. It is based on formal logic defined by Greek mathematicians.

Common concepts
OOP uses the procedural paradigm when creating methods.
Identifiers:

The names of objects. Identifiers allow us to name objects in the program.

Data types:
A data type defines a set of values and a set of operations that can be applied to those values. The set of values for each type is known as the domain of the type.

Variables:
Names for memory locations.
Literals:
A literal is a predetermined value used in a program.
Constants:
The use of literals is not considered good programming practice unless we are sure that the value of the literal will not change with time; otherwise a named constant is used.
Inputs and outputs:
Read or write data; in C, input uses scanf and output uses printf.

Chapter 8: Software engineering


The software development life cycle (SDLC), also called the application development life cycle, is a process of planning, creating, testing, and deploying an information system.
The development process in the software life cycle involves four phases: analysis, design, implementation, and testing.
6 stages:

SDLC models → specify the various stages of the process and the order in which they are carried out.
Waterfall model:

The incremental model:

In the incremental model, software is developed in a series of steps → the developers first complete a simplified version of the whole system.

2. ANALYSIS PHASE

⇒ Shows what the software will do without specifying how it will be done.
⇒ Can use two separate approaches, depending on whether the implementation phase is done using a procedural programming language or an object-oriented language.

Procedure-oriented analysis (also called structured analysis or classical analysis) is the analysis process used if the implementation phase uses a procedural language. Ex: flow diagram, state diagram, ...

Object-oriented analysis is used if the implementation uses an object-oriented language (use-case diagram, class diagram, state chart, ...).

3. DESIGN PHASE

Defines how the system will accomplish what was defined in the analysis phase.
⇒ All components of the system are defined.

Procedure-oriented design → we have both procedures and data to design → the whole system is divided into a set of procedures or modules (structure chart).

Coupling is a measure of how tightly two modules are bound to each other ⇒ must be minimized

Cohesion is a measure of how closely the modules in a system are related ⇒ must be maximized

Object-oriented design ⇒ the design phase continues by elaborating the details of classes (each made of a set of variables (attributes) and a set of methods (functions)).

4. IMPLEMENTATION PHASE

⇒ Programmers write code for the modules in procedure-oriented design, or write the program units that implement classes in object-oriented design.

Choice of language

Software quality

I, Operability
II, Maintainability: keeping the system up to date and running correctly
III, Transferability: the ability to move data or a system from one platform to another and to reuse code

5. TESTING PHASE

⇒ Find errors
Two types: white-box and black-box
i, Glass-box testing (white-box testing)
⇒ Based on knowing the internal structure of the software → checks whether all components of the software do what they are designed to do.
Glass-box testing assumes that the tester knows everything about the software.
ii, Black-box testing
Testing without knowing what is inside: the tester only supplies inputs and checks that the program produces the expected outputs.
iii, Exhaustive testing

The best black-box test method is to test the software for all possible values in the input domain.

iv, Random testing

A subset of values in the input domain is selected for testing. It is very important that the subset be chosen in such a way that the values are distributed over the input domain.
v, Boundary-value testing
Errors often happen when boundary values are encountered.

Chapter 9: Data structures


Array:
A sequenced collection of elements of the same data type.

Loops

Multi-dimensional array

Memory layout

Operations on array

Some common operations on arrays as structures are searching, insertion into array,
deletion, retrieval, traversal (an operation that is applied to all elements of the array),
string.

Records
record ~ struct ?
A record is a collection of related elements, possibly of different types, having a single
name. Each element in a record is called a field.

A field is the smallest element of named data that has meaning. A field has a type, and
exists in memory. Fields can be assigned values, which in turn can be accessed for
selection or manipulation. A field differs from a variable primarily in that it is part of
record.
Two kinds of identifier in a record: the name of the record, and the name of each individual field inside the record.
An array defines a combination of elements.
A record defines the identifiable parts of an element.
Array of records:

An array can be thought of as a special case of an array of records in which each element is a record with only a single field.

Linked list

A collection of data in which each element contains the location of the next element. Each element contains two parts: data and link ⇒ the link is used to chain the data together and contains a pointer that identifies the next element in the list.
In addition, a pointer variable identifies the first element in the list. The name of the list is the same as the name of this pointer variable.

The link in the last element contains a null pointer, indicating the end of the list. We
define an empty linked list to be only a null pointer.

In an array, the linking tool is the index.

In a linked list, the linking tool is the link that points to the next element: the pointer, or address, of the next element.
A linked list also has a name but the nodes have no specific name.
The name of a node is related to the name of the pointer that
points to the node. If the pointer that points to a node is called p, for example, we call
the node *p. Since the node is a record, we can access the fields inside the node using
the name of the node. For example, the data part and the link part of a node pointed by
a pointer p can be called (*p).data and (*p).link.

Operations on linked lists

→ The search algorithm for a linked list can only be sequential.

Nodes have no names ⇒ use two pointers, pre (previous) and cur (current).
At the beginning of the search, the pre pointer is null and the cur pointer points to the first node. The search algorithm moves the two pointers together towards the end of the list.

→ Inserting a node
Before insertion into a linked list, we first apply the search algorithm.

⇒ If the flag returned is false → allow the insertion. Else → abort the insertion algorithm.
❑ Inserting into an empty list.
⇒ The new item is inserted as the first element.
❑ Insertion at the beginning of the list.
⇒ If the flag is false and the value of the pre pointer is NULL.
❑ Insertion at the end of the list.
⇒ If the flag is false and the value of the cur pointer is NULL.
❑ Insertion in the middle of the list.
⇒ If the flag is false and none of the returned pointers is NULL.
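
The search with `pre` and `cur` and the four insertion cases can be sketched as follows. This sketch assumes the list is kept in ascending order without duplicates, which the notes do not state explicitly; the class and function names are hypothetical, and Python's `None` plays the role of the null pointer.

```python
class Node:
    """One element of the list: a data part and a link to the next node."""

    def __init__(self, data, link=None):
        self.data = data
        self.link = link


def search(head, target):
    """Walk pre and cur together; return (pre, cur, flag), flag True if target exists."""
    pre, cur = None, head
    while cur is not None and cur.data < target:
        pre, cur = cur, cur.link
    flag = cur is not None and cur.data == target
    return pre, cur, flag


def insert(head, value):
    """Insert value into the sorted list and return the (possibly new) head."""
    pre, cur, flag = search(head, value)
    if flag:
        return head              # value already present: abort the insertion
    node = Node(value, cur)      # the new node points at cur (None at the end)
    if pre is None:
        return node              # empty list, or insertion at the beginning
    pre.link = node              # insertion in the middle or at the end
    return head


def to_list(head):
    """Collect the data parts by following the links until the null pointer."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.link
    return out


head = None
for value in [30, 10, 20, 10, 40]:
    head = insert(head, value)
```

The second 10 is rejected because the search flag is true, and 40 is appended at the end because `cur` is null when the search stops.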
4. STACK, QUEUE, TREE, GRAPH
To process data, we need to define the data type and the operation to be performed on
the data ⇒ Idea behind an abstract data type(ADT).
An abstract data type is a data type packaged with the operations that are meaningful
for the data type

STACKS-LIFO

Stack is a restricted linear list in which all additions and deletions are made at
one end, the top

Operations on stacks: stack, push, pop, empty.


1, Stack operation⇒ The stack operation creates an empty stack.
2, Push operation ⇒ Inserts an item at the top of the stack
3, Pop operation ⇒ Deletes the item at the top of the stack
4, Empty operation ⇒ Check the status of the stack
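
A minimal sketch of the stack ADT with its four operations, using a Python list as the underlying storage (an implementation choice for this example; a linked list would work equally well). Creating a `Stack` object corresponds to the stack operation.

```python
class Stack:
    """LIFO list: all insertions and deletions happen at the top."""

    def __init__(self):
        self._items = []            # the end of the list is the top of the stack

    def push(self, item):
        self._items.append(item)    # insert at the top

    def pop(self):
        if self.empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()    # delete and return the top item

    def empty(self):
        return not self._items      # True when the stack holds no items


s = Stack()
for item in (20, 78, 45):
    s.push(item)
top = s.pop()
```

The last item pushed is the first one popped, which is what LIFO means.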

QUEUES-FIFO

A queue is a linear list in which data can only be inserted at one end, called the rear,
and deleted from the other end, called the front.
Operations on Queue: queue, enqueue, dequeue, empty.

Queue implementation:

At the ADT level, we use the queue and its four operations. A queue ADT can be implemented using either an array or a linked list.
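
A minimal sketch of the queue ADT, here built on `collections.deque` from the Python standard library rather than on a hand-written array or linked list. Creating a `Queue` object corresponds to the queue operation.

```python
from collections import deque


class Queue:
    """FIFO list: items are inserted at the rear and deleted from the front."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)        # insert at the rear

    def dequeue(self):
        if self.empty():
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()    # delete and return the front item

    def empty(self):
        return len(self._items) == 0


q = Queue()
for item in (20, 78, 45):
    q.enqueue(item)
first = q.dequeue()
```

The first item enqueued is the first one dequeued, the opposite order from a stack.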

TREE

A tree consists of a finite set of elements, called nodes(or vertices), and a finite set of
directed lines, called arcs, that connect pairs of the nodes

Binary tree ⇒ no node has more than two subtrees

Operation on binary trees:

Binary tree traversals : A binary tree traversal requires that each node of the tree be
processed once and only once in a predetermined sequence. The two general
approaches to the traversal sequence are depth-first and breadth-first traversal.

1 - Depth-first traversal

2 - Breadth-first-traversal
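
Both traversal approaches can be sketched on a small binary tree. Preorder is used here as one example of a depth-first order; the class name and the sample tree are assumptions made for illustration.

```python
from collections import deque


class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Depth-first: visit the node, then its left subtree, then its right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def breadth_first(root):
    """Breadth-first: visit the nodes level by level, left to right."""
    order = []
    pending = deque([root] if root is not None else [])
    while pending:
        node = pending.popleft()
        order.append(node.value)
        for child in (node.left, node.right):
            if child is not None:
                pending.append(child)   # children wait behind the current level
    return order


# Sample tree: A has children B and E; B has children C and D; E has right child F.
tree = TreeNode("A",
                TreeNode("B", TreeNode("C"), TreeNode("D")),
                TreeNode("E", None, TreeNode("F")))
depth_order = preorder(tree)
level_order = breadth_first(tree)
```

Both functions process every node exactly once; only the order of the visits differs.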

Binary tree applications:
→ Huffman coding: uses binary trees to generate a variable-length binary code from a
string of symbols.
→ Expression trees: an arithmetic expression can be represented in three different
formats: infix, postfix, and prefix. In infix notation, the operator comes between the
two operands. In postfix notation, the operator comes after its two operands, and in
prefix notation it comes before the two operands.
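
As an illustration of the three notations, the sketch below stores the expression A * (B + C) in an expression tree and prints it in each format. The `Node` class and the function names are hypothetical; the infix form is written fully parenthesized so that no precedence rules are needed.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value      # an operator for internal nodes, an operand for leaves
        self.left = left
        self.right = right


def is_leaf(node):
    return node.left is None and node.right is None


def infix(node):
    # operator between its two operands
    if is_leaf(node):
        return node.value
    return f"({infix(node.left)} {node.value} {infix(node.right)})"


def prefix(node):
    # operator before its two operands
    if is_leaf(node):
        return node.value
    return f"{node.value} {prefix(node.left)} {prefix(node.right)}"


def postfix(node):
    # operator after its two operands
    if is_leaf(node):
        return node.value
    return f"{postfix(node.left)} {postfix(node.right)} {node.value}"


# Expression tree for A * (B + C)
expr = Node("*", Node("A"), Node("+", Node("B"), Node("C")))
```

The three functions are the inorder, preorder, and postorder traversals of the same tree.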

GRAPH:
A graph is an ADT made of a set of nodes, called vertices, and a set of lines connecting
the vertices, called edges or arcs.
A tree defines a structure in which a node can have only one parent,
while each node in a graph can have one or more parents.
Graphs are of two kinds: directed and undirected.

Directed graph: each edge has a direction.

Undirected graph: the edges have no direction.

Application: a transportation network can be modeled as a weighted graph, in which the
weight on each edge represents the distance between the two cities connected by that edge.
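
A weighted, undirected graph of cities could be stored as an adjacency dictionary, as in the sketch below. The city names, the distances, and the helper names `add_road` and `route_length` are all made up for the example.

```python
def add_road(graph, a, b, km):
    # Undirected edge: record the weight in both directions.
    graph.setdefault(a, {})[b] = km
    graph.setdefault(b, {})[a] = km


def route_length(graph, route):
    # Sum the edge weights along consecutive pairs of cities in the route.
    return sum(graph[u][v] for u, v in zip(route, route[1:]))


graph = {}
add_road(graph, "A", "B", 120)   # hypothetical distances in km
add_road(graph, "B", "C", 80)
add_road(graph, "A", "C", 250)

via_b = route_length(graph, ["A", "B", "C"])
direct = route_length(graph, ["A", "C"])
```

For a directed graph, `add_road` would record the weight in one direction only.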

Chap 10: File structure


Files are stored on auxiliary or secondary storage devices. The two most common
forms of secondary storage are disk and tape. Files in secondary storage can be both
read from and written to.
Files can also exist in forms that the computer can write to but not read. For example,
the display of information on the system monitor is a form of file, as is data sent to a
printer. In a general sense, the keyboard is also a file, although it cannot store data.

Files involved in updating sequential files:

When the file is to be updated, the master file is retrieved from offline storage and
becomes the old master.

❑ New master file. The new permanent data file or, as it is commonly known, the new
master file, contains the most current data.
❑ Old master file. The old master file is the permanent file that should be updated.
Even after updating, the old master file is normally kept for reference.
❑ Transaction file. The third file is the transaction file. This contains the changes to be
applied to the master file.

A key is one or more fields that uniquely identify the data in the file.

The error report file contains a listing of all errors discovered during the update process
and is presented to the user for corrective action.

Process file updates:

1. If the transaction file key is less than the master file key and the transaction is an
add (A), add the transaction to the new master.

2. If the transaction file key is equal to the master file key, either:
a. Change the contents of the master file data if the transaction is a change (C).
b. Remove the data from the master file if the transaction is a deletion (D).

3. If the transaction file key is greater than the master file key, write the old master file
record to the new master file.

4. Several cases may create an error and be reported in the error file:
a. If the transaction defines adding a record that already exists in the old master file
(same key values).
b. If the transaction defines deleting or changing a record that does not exist in the
old master file.
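
The update rules above can be sketched in Python with lists standing in for the files. This is a simplified illustration under stated assumptions: both inputs are already sorted by key, there is at most one transaction per key, the transaction codes are only A, C, and D, and the function name `update_master` and the record layout are hypothetical.

```python
def update_master(old_master, transactions):
    """old_master: list of (key, data) sorted by key.
    transactions: list of (key, code, data) sorted by key, code in {"A", "C", "D"}.
    Returns (new_master, errors)."""
    new_master, errors = [], []
    m = 0  # position in the old master file
    for key, code, data in transactions:
        # Rule 3: transaction key is greater than the master key,
        # so the old master record is written to the new master unchanged.
        while m < len(old_master) and old_master[m][0] < key:
            new_master.append(old_master[m])
            m += 1
        if m < len(old_master) and old_master[m][0] == key:
            # Rule 2: the keys are equal.
            if code == "C":
                new_master.append((key, data))   # change the contents
                m += 1
            elif code == "D":
                m += 1                           # delete: do not copy the record
            else:
                errors.append((key, code, "record already exists"))   # rule 4a
        else:
            # Rule 1: transaction key is less than the master key
            # (or the old master file is exhausted).
            if code == "A":
                new_master.append((key, data))
            else:
                errors.append((key, code, "record does not exist"))   # rule 4b
    # Copy whatever is left of the old master file.
    new_master.extend(old_master[m:])
    return new_master, errors


old_master = [(10, "Ann"), (20, "Bob"), (30, "Cy"), (40, "Di")]
transactions = [
    (15, "A", "Eve"),     # add a new record
    (20, "C", "Bobby"),   # change an existing record
    (30, "D", None),      # delete an existing record
    (40, "A", "Dup"),     # error: key 40 already exists
    (50, "D", None),      # error: key 50 does not exist
]
new_master, errors = update_master(old_master, transactions)
```

The old master list is never modified, which mirrors the note that the old master file is kept for reference after the update.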

INDEXED FILES:

An indexed file is made of a data file, which is a sequential file, and an index.

The index itself is a very small file with only two fields: the key of the sequential file
and the address of the corresponding record on the disk. The index is sorted based
on the key values of the data file.

To access a record in a file randomly ⇒ we need to know the address of the record ⇒ an
indexed file can relate the key to the record address.
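
The idea of an index can be sketched in Python as below. A list stands in for the data file, list positions stand in for disk addresses, and the record contents and the function name `fetch` are assumptions for the example.

```python
import bisect

# Data file: the position in the list plays the role of the record address.
data_file = [
    {"key": 17, "name": "Ann"},   # address 0
    {"key": 42, "name": "Bob"},   # address 1
    {"key": 58, "name": "Cy"},    # address 2
]

# Index: (key, address) pairs sorted by key.
index = sorted((record["key"], address) for address, record in enumerate(data_file))
index_keys = [key for key, _ in index]


def fetch(key):
    # Binary search in the sorted index, then one direct access to the data file.
    pos = bisect.bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        return None   # no record with this key
    _, address = index[pos]
    return data_file[address]


found = fetch(42)
missing = fetch(99)
```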

HASHED FILES:
In an indexed file, the index maps the key to the address. A hashed file uses a
mathematical function to accomplish this mapping (an algorithm that leads directly to the requested record).

Hashing methods:
1. Direct hashing

The key is the data file address itself, without any algorithmic manipulation.
The file must therefore contain a record for every possible key.

Although the situations in which this method can be used are limited, it guarantees
that there are no synonyms or collisions.

2. Modulo division hashing (also called division remainder hashing)

Divides the key by the file size and uses the remainder plus 1 for the address.
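
A short sketch of the two hashing methods in Python. The function names and the sample key and file size are chosen only for illustration.

```python
def direct_hash(key):
    # Direct hashing: the key is the address itself.
    return key


def modulo_hash(key, file_size):
    # Modulo division: remainder of key / file_size, plus 1,
    # which gives addresses in the range 1..file_size.
    return key % file_size + 1


# 121267 = 395 * 307 + 2, so the remainder is 2 and the address is 3.
address = modulo_hash(121267, 307)

# Every key lands inside the file, whatever its value.
in_range = all(1 <= modulo_hash(k, 307) <= 307 for k in range(1000))
```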

COLLISION RESOLUTION:

We call the set of keys that hash to the same address in our list synonyms.

A collision is the event that occurs when a hashing algorithm produces an
address for an insertion key but that address is already occupied. The address
produced by the hashing algorithm is known as the home address. The part of
the file that contains all the home addresses is known as the prime area.

Collision resolution is independent of the hashing algorithm ⇒ any hashing method can be
used with any collision resolution method.

1. Open addressing

Resolves collisions in the prime area. A simple strategy for data that
cannot be stored in the home address is to store it in the next address (home
address + 1).

2. Linked list

In this method, the first record is stored in the home address, but contains a pointer
to the second record.
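
Both resolution methods can be sketched in Python as follows. The assumptions are: modulo division hashing gives the home address, addresses run from 1 to the file size (slot 0 is left unused), the search for a free slot wraps around to address 1, and a Python list stands in for the chain of pointers in the linked list method. All function names are hypothetical.

```python
def home_address(key, size):
    return key % size + 1


def insert_open_addressing(table, key):
    """table has size + 1 slots; slot 0 is unused so addresses run 1..size."""
    size = len(table) - 1
    address = home_address(key, size)
    for _ in range(size):
        if table[address] is None:
            table[address] = key
            return address
        address = address % size + 1   # try the next address, wrapping to 1
    raise OverflowError("the file is full")


def insert_linked(prime_area, key):
    """Each home address holds a chain: the first record, then its synonyms."""
    size = len(prime_area) - 1
    address = home_address(key, size)
    prime_area[address].append(key)
    return address


# 15, 22 and 8 are synonyms: each leaves remainder 1 when divided by 7.
table = [None] * 8
placed = [insert_open_addressing(table, k) for k in (15, 22, 8)]

prime_area = [[] for _ in range(8)]
homes = [insert_linked(prime_area, k) for k in (15, 22, 8)]
```

With open addressing the synonyms spill into neighbouring addresses of the prime area; with the linked list method they all stay attached to the same home address.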

Text vs Binary:

Text file: a file of characters. It cannot contain other data structures in their
internal format. To store other structures ⇒ they must be converted to their character equivalent formats.
Binary file: a collection of data stored in the internal format of the computer ⇒
the data can be an integer (including other data types represented as unsigned integers,
such as image, audio, or video), a floating-point number, or any other structured data (except a file).
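
The difference can be seen by storing the same integer both ways. In this sketch, in-memory byte strings stand in for the file contents, and the 4-byte little-endian layout (`"<i"`) is just one possible internal format chosen for the example.

```python
import struct

number = 1234567

# Text form: the integer is converted to its character equivalent, one byte per digit.
text_bytes = str(number).encode("ascii")

# Binary form: the integer is stored in an internal format (here a 4-byte integer).
binary_bytes = struct.pack("<i", number)

# Reading back: text must be converted from characters, binary is unpacked directly.
from_text = int(text_bytes.decode("ascii"))
from_binary = struct.unpack("<i", binary_bytes)[0]
```

The text form can be read by a person in any editor; the binary form is only meaningful to a program that knows the format.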

DIRECTORIES
⇒ Provided by most operating systems for organizing files.
⇒ Perform the same function as a folder in a filing cabinet.
⇒ Represented as a special type of file that holds information about other files.
⇒ Not only serve as a kind of index that tells the operating system where files are
located on an auxiliary storage device, but can also contain other information about the
files they contain, such as who has access to each file, or the date when a file was
created, accessed, or modified.

Chap 11: Databases
A database is a collection of related, logically coherent data used by the application
programs in an organization.

Advantages:

-Less redundancy
-Inconsistency avoidance
-Efficiency
-Data integrity
-Confidentiality
DBMS (database management system) ⇒ defines, creates, and maintains a
database.
⇒ allows controlled access to the data in the database.
A DBMS is a combination of five components:

Hardware: the physical computer system that allows users to access the data
Software: the actual program that allows users to access, maintain, and update the data
Data: stored physically on the storage devices
Users: people who control and manage the databases and perform different types
of operations on them
Procedures: general rules and instructions that help to design the database and
use the DBMS

Database architecture:
Three-level architecture for a DBMS: internal, conceptual, and external.
Internal level ⇒ determines where the data is actually stored on the storage devices
⇒ deals with low-level access methods and how bytes are transferred to and from
the storage device.
Conceptual level ⇒ defines the logical view of the data ⇒ the data model and the main
functions of the DBMS (such as queries) are defined at this level.
External level ⇒ interacts directly with the user ⇒ changes the data coming from the conceptual
level to a format and view that are familiar to the users.
INTERNAL LEVEL:
has an internal schema → describes the physical storage structure of the database.
→ how the data is stored in the database
→ the physical implementation of the DB to achieve optimal run-time
performance and storage space utilization: storage space allocation for
data and indexes, record description for storage, record placement, data
compression, encryption.
THE CONCEPTUAL LEVEL:
has a conceptual schema → describes the structure of the whole database for a
community of users.
→ what data is stored in the database
→ the logical structure of the entire database as seen by the DBA
→ the relationships among the data
→ a view of the data requirements of the organization
THE EXTERNAL OR VIEW LEVEL:
has a number of external schemas or user views ⇒ each describes the part of the
database that a particular user group is interested in and hides the rest of the
database from that user group.
→ consists of a number of different external views of the DB
→ the user’s view of DB
→ describe part of DB for users
→ provides a powerful and flexible security mechanism by hiding parts of the DB
from certain users
→ permits users to access data in a way that is customized to their needs.

Database models:
⇒ defines the logical design of data
3 models:
hierarchical model
network model
relational model

hierarchical model ⇒ data is organized as an inverted tree

network model ⇒ the entities are organized in a graph, in which some entities can
be accessed through several paths.

relational model ⇒ data is organized in 2-dimensional tables called relations ⇒
tables are related to each other.

The relational database model

1. Relation ⇒ an RDBMS organizes the data so that its external view is a set of
relations or tables. This does not mean that the data are stored as tables.

A relation in an RDBMS has the following features:

Name

Attributes (each column in a relation) ⇒ the column headings in the table;
each attribute gives meaning to the data stored under it.

Tuples (each row in a relation) ⇒ a tuple defines a collection of attribute values.
The total number of rows in a relation is called the cardinality of the relation.

Operations on relations ⇒ each operation is described as defined in the database query
language SQL (Structured Query Language), which is used on relational databases.

2. Insert ⇒ a unary operation ⇒ inserts a new tuple into the relation.

3. Delete ⇒ a unary operation ⇒ deletes a tuple defined by a criterion from the relation.

4. Update ⇒ a unary operation ⇒ changes the value of some attributes of a tuple.

5. Select ⇒ a unary operation ⇒ creates a new relation from the tuples of the original relation that satisfy a criterion.

6. Join ⇒ a binary operation ⇒ combines two relations on common attributes.

7. Union ⇒ a binary operation ⇒ takes two relations with the same set of attributes and creates a new relation containing the tuples that are in the first relation, the second, or both.
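
The operations above can be tried out with Python's built-in `sqlite3` module. The sketch below uses an in-memory database; the table names, column names, and rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Three hypothetical relations.
cur.execute("CREATE TABLE courses (course_no TEXT, name TEXT, unit INTEGER)")
cur.execute("CREATE TABLE old_courses (course_no TEXT, name TEXT, unit INTEGER)")
cur.execute("CREATE TABLE taught_by (course_no TEXT, professor TEXT)")
cur.executemany("INSERT INTO courses VALUES (?, ?, ?)", [
    ("CIS15", "Intro to C", 5),
    ("CIS17", "Intro to Java", 5),
    ("CIS19", "UNIX", 4),
])
cur.executemany("INSERT INTO old_courses VALUES (?, ?, ?)", [
    ("CIS10", "Basics", 3),
    ("CIS15", "Intro to C", 5),
])
cur.executemany("INSERT INTO taught_by VALUES (?, ?)", [
    ("CIS15", "Lee"),
    ("CIS52", "Lu"),
])

# Insert: add a new tuple.
cur.execute("INSERT INTO courses VALUES ('CIS52', 'TCP/IP', 6)")
# Delete: remove the tuples that match a criterion.
cur.execute("DELETE FROM courses WHERE course_no = 'CIS19'")
# Update: change the value of an attribute of a tuple.
cur.execute("UPDATE courses SET unit = 6 WHERE course_no = 'CIS17'")

# Select: keep only the tuples that satisfy a criterion.
six_units = cur.execute(
    "SELECT course_no FROM courses WHERE unit = 6 ORDER BY course_no"
).fetchall()

# Join: combine two relations on their common attribute.
joined = cur.execute(
    "SELECT courses.course_no, name, professor "
    "FROM courses JOIN taught_by ON courses.course_no = taught_by.course_no "
    "ORDER BY courses.course_no"
).fetchall()

# Union: tuples that are in either relation, without duplicates.
union = cur.execute(
    "SELECT course_no, name, unit FROM courses "
    "UNION SELECT course_no, name, unit FROM old_courses "
    "ORDER BY course_no"
).fetchall()

con.close()
```

Note that the union keeps only one copy of the CIS15 tuple, which appears in both relations.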

Database design:
The design of a DB ⇒ a lengthy and involved task that can only be done through a
step-by-step process.
The first step normally involves a lot of interviewing of potential users of the DB.
The second step is to build an entity-relationship model (ERM) that defines the entities for
which some information must be maintained.

ERM ⇒ the designer creates an entity-relationship (E-R) diagram to show the entities for which
information needs to be stored and the relationships between those entities.

From E-R diagrams to relations ⇒ for each entity set in the E-R diagram ⇒
create a relation (table) with n columns, one for each of the n attributes
defined for that set.

For each relationship set in the E-R diagram ⇒ create a relation (table) ⇒ this
table has one column for the key of each entity set involved in the relationship and one
column for each attribute of the relationship.

Normalization ⇒ the process by which a given set of relations is transformed into a new set of relations
with a more solid structure. It is needed to allow any relation in the database to be
represented, to allow a language like SQL to use powerful retrieval operations
composed of atomic operations,…

The normalization process defines a set of hierarchical normal forms (NFs).

First normal form (1NF) ⇒ every row-column intersection in a relation holds a single (atomic) value.

Second normal form (2NF) ⇒ in each relation we need to have a key on which all
other attributes depend.

Chapter 12: Security and ethical issues

Information now needs to be:
1, hidden from unauthorized access (confidentiality)
2, protected from unauthorized change (integrity)
3, available to an authorized entity when it is needed (availability)
1, 2, 3 are the security goals.

Confidentiality ⇒ an organization needs to guard against malicious actions that
endanger the confidentiality of its information.

Integrity ⇒ changes need to be made only by authorized entities and through
authorized mechanisms.

Availability ⇒ the information needs to be available to authorized entities.
But 1, 2, 3 can be threatened by security attacks.

4, Services and techniques ⇒ ITU-T defines some security services to achieve the security
goals and prevent attacks ⇒ each of the services is designed to prevent one or more
attacks while maintaining the security goals.
Technique 1: Cryptography ⇒ secret writing with secret keys. Cryptography means concealing the
contents of a message by enciphering.

Technique 2: Steganography ⇒ covered writing. Steganography means concealing the
message itself by covering it with something else.

Confidentiality:
1. Symmetric-key ciphers ⇒ use the same key for both encryption and decryption;
the key can be used for bidirectional communication.

2. Asymmetric-key ciphers ⇒ the secret is personal (unshared) ⇒ each person creates and keeps
his or her own secret.

Symmetric-key ciphers are often used for long messages, while asymmetric-key ciphers are used for short messages.

3. RSA cryptosystem ⇒ one of the common public-key algorithms ⇒ RSA uses two
exponents, e and d, where e is public and d is private.
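
A toy RSA example in Python shows how the two exponents work together. The primes 61 and 53 and the exponent 17 are chosen only to keep the numbers readable; real RSA keys use very large primes and padding, so this sketch is for understanding, not for actual security. It needs Python 3.8 or later for the modular inverse computed with `pow`.

```python
# Key generation with deliberately tiny, hypothetical primes.
p, q = 61, 53
n = p * q                    # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, must have no common factor with phi
d = pow(e, -1, phi)          # private exponent: the inverse of e modulo phi


def encrypt(message, e, n):
    # Anyone can do this with the public key (e, n).
    return pow(message, e, n)


def decrypt(cipher, d, n):
    # Only the owner of the private exponent d can undo it.
    return pow(cipher, d, n)


message = 65                 # a short message encoded as a number smaller than n
cipher = encrypt(message, e, n)
recovered = decrypt(cipher, d, n)
```

The message must be a number smaller than n, which is one reason asymmetric-key ciphers suit short messages.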

Ethical principles:

Moral rules → we should avoid doing anything if it is against universal morality.

Utilization → an act is ethical if it brings about a good result.
Social contract → an act is ethical if a majority of people in society agree with it

Privacy

Non-disclosure agreement (NDA), also known as a confidentiality, non-use, or trade secret
agreement ⇒ a legally binding contract between parties that
requires them to keep certain information confidential.

Hacker

Types of hacking:

1. Hacking for financial gain

2. Corporate espionage ⇒ the commercial application of hacking, malware, phishing,
and other unsavory spying techniques to obtain privileged insider information from a
business competitor.

3. State-sponsored hacking ⇒ the potential rewards from security hacking can be so
great that even governments take part.
