Batch Processing
Batch processing is not transaction processing. Batch processing involves processing several
transactions at the same time, and the results of each transaction are not immediately
available when the transaction is being entered;[1] there is a time delay.
There are a number of differences between real-time and batch processing. These are
outlined below:
Real-time processing requires the master file to be available more often for updating and
reference than batch processing. The database is not accessible all of the time for batch
processing.
Real-time processing has fewer errors than batch processing, as transaction data is validated
and entered immediately. With batch processing, the data is organised and stored before the
master file is updated. Errors can occur during these steps.
Infrequent errors may occur in real-time processing; however, they are often tolerated. It is
not practical to shut down the system for infrequent errors.
More computer operators are required in real-time processing, as the operations are not
centralised. It is more difficult to maintain a real-time processing system than a batch
processing system.
Features
Rapid response
Fast performance with a rapid response time is critical. Businesses cannot afford to have
customers waiting for a TPS to respond; the turnaround time from the input of the transaction
to the production of the output must be a few seconds or less.
Reliability
Many organizations rely heavily on their TPS; a breakdown will disrupt operations or even
stop the business. For a TPS to be effective its failure rate must be very low. If a TPS does
fail, then quick and accurate recovery must be possible. This makes well-designed backup
and recovery procedures essential.
Inflexibility
A TPS processes every transaction in the same way regardless of the user, the customer or the
time of day. If a TPS were flexible, there would be too many opportunities for non-standard
operations. For example, a commercial airline needs to accept airline reservations
consistently from a range of travel agents; accepting different transaction data from
different travel agents would be a problem.
Atomicity
A transaction’s changes to the state are atomic: either all happen or none happen. These
changes include database changes, messages, and actions on transducers.[2]
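As an illustration of all-or-nothing behaviour, the following is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the transfer function are hypothetical and are not taken from any particular TPS. The second update breaks a CHECK constraint when funds are insufficient, so the first update is rolled back with it.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as a single transaction (hypothetical helper)."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if the block raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # enough funds: both updates commit
failed = transfer(conn, "alice", "bob", 500)  # not enough funds: neither update survives
```

The credit is deliberately applied before the debit so that a failed transfer has a partial change to undo; after the rollback the balances are the same as before the attempt.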
Consistency
A transaction is a correct transformation of the state. The actions taken as a group do not
violate any of the integrity constraints associated with the state. This requires that the
transaction be a correct program![2]
Isolation
Even though transactions execute concurrently, it appears to each transaction T, that others
executed either before T or after T, but not both.[2]
Durability
Once a transaction completes successfully (commits), its changes to the state survive failures.[2]
Concurrency
Ensures that two users cannot change the same data at the same time. That is, one user cannot
change a piece of data before another user has finished with it. For example, if an airline
ticket agent starts to reserve the last seat on a flight, then another agent cannot tell another
passenger that a seat is available.
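The airline example can be sketched with a lock that lets only one agent at a time check and update the seat count. This is a simplified in-memory illustration; the Flight class and run_agents helper are hypothetical, and a real TPS would rely on the locking provided by its database rather than on threads in one process.

```python
import threading

class Flight:
    """A flight whose remaining seat count is guarded by a lock (hypothetical)."""

    def __init__(self, seats):
        self._seats = seats
        self._lock = threading.Lock()

    def reserve(self):
        """Try to reserve one seat; return True on success, False if sold out."""
        # The check and the update happen while holding the lock, so two
        # agents can never both see the same last seat as available.
        with self._lock:
            if self._seats == 0:
                return False
            self._seats -= 1
            return True

def run_agents(flight, n_agents):
    """Let `n_agents` ticket agents try to reserve a seat at the same time."""
    results = []

    def agent():
        results.append(flight.reserve())

    threads = [threading.Thread(target=agent) for _ in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Five agents compete for the last seat on the flight.
results = run_agents(Flight(seats=1), n_agents=5)
```

However the threads are scheduled, exactly one of the five attempts succeeds, because the seat count is only read and changed inside the locked region.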
The storage and retrieval of data must be accurate, as the data are used many times throughout
the day. A database is an organized collection of data that stores the accounting and
operational records of the organization. Because these data are sensitive, databases usually
restrict which users can view certain data. Databases are designed using hierarchical, network
or relational structures; each structure is effective in its own way.
A relational structure.
A hierarchical structure.
A network structure.
The following features are included in real-time transaction processing systems:
• Good data placement: The database should be designed around the access patterns of
many simultaneous users.
• Short transactions: Short transactions enable quick processing. This reduces
contention between concurrent users and keeps the system moving.
• Real-time backup: Backups should be scheduled during periods of low activity to
prevent the server from lagging.
• High normalization: This reduces redundant information, which increases speed and
improves concurrency; it also improves backups.
• Archiving of historical data: Rarely used data are moved into other databases
or backup tables. This keeps tables small and also improves backup times.
• Good hardware configuration: Hardware must be able to handle many users and
provide quick response times.
In a TPS, there are 5 different types of files. The TPS uses the files to store and organize its
transaction data:
A data warehouse is a database that collects information from different sources. Data
gathered from real-time transactions can be analysed efficiently once they are stored in a
data warehouse. It provides data that are consolidated, subject-oriented, historical and
read-only.
Since business organizations have become very dependent on TPSs, a breakdown in their
TPS may halt the business's regular routines and thus stop its operation for a period of
time. To prevent data loss and minimize disruption when a TPS breaks down, a well-designed
backup and recovery procedure is put into use. The recovery process can rebuild the system
when it goes down.
A TPS may fail for many reasons, including system failure, human error, hardware failure,
incorrect or invalid data, computer viruses, software application errors, and natural or
man-made disasters. As it is not possible to prevent all TPS failures, a TPS must be able to
cope with them: it must detect and correct errors when they occur. When the system fails, a
TPS recovers the database; this recovery involves the backup, the journal, the checkpoint and
the recovery manager.
Depending on how the system failed, one of two recovery procedures can be used.
Generally, the procedure involves restoring data from a backup device and then running the
transaction processing again. The two types of recovery are backward recovery and forward
recovery:
• Backward recovery: used to undo unwanted changes to the database. It reverses the
changes made by transactions that have been aborted. It involves the logic of
reprocessing each transaction, which is very time-consuming.
• Forward recovery: starts with a backup copy of the database. The transactions recorded
in the transaction journal between the time the backup was made and the present time are
then reprocessed. It is much faster and more accurate.
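The two procedures can be contrasted in a small sketch. It assumes, purely for illustration, that the database is a dictionary and that each journal entry records the key changed, its value before and after the change, and whether the transaction committed; none of these names or formats come from a real TPS.

```python
def forward_recover(backup, journal):
    """Rebuild the database from a backup copy by replaying committed journal entries."""
    db = dict(backup)  # start from the backup, not from the damaged database
    for entry in journal:
        if entry["committed"]:
            db[entry["key"]] = entry["after"]
    return db

def backward_recover(current, journal):
    """Undo the changes of aborted transactions, newest first, using before-values."""
    db = dict(current)
    for entry in reversed(journal):
        if not entry["committed"]:
            db[entry["key"]] = entry["before"]
    return db

# State saved in the last backup (hypothetical data).
backup = {"A": 100, "B": 50}

# Journal written since the backup: transaction 1 committed, transaction 2 was aborted.
journal = [
    {"txn": 1, "key": "A", "before": 100, "after": 80, "committed": True},
    {"txn": 2, "key": "B", "before": 50, "after": 999, "committed": False},
]

# State of the database when the failure occurred, including the unwanted change to B.
at_failure = {"A": 80, "B": 999}

rebuilt = forward_recover(backup, journal)
rolled_back = backward_recover(at_failure, journal)
```

In this example both routes arrive at the same state: the committed change to A is kept and the aborted change to B is gone.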
There are two main types of backup procedure: grandfather-father-son and partial backups.
Grandfather-father-son
This procedure keeps at least three generations of backup master files: the most recent
backup is the son, the one before it is the father and the oldest is the grandfather. It is
commonly used for a batch transaction processing system with magnetic tape. If the system
fails during a batch run, the master file is recreated from the son backup and the batch is
restarted. However, if the son backup fails, is corrupted or is destroyed, then the next
generation up (the father) is required. Likewise, if that fails, then the generation above it
(the grandfather) is required. The older the generation, the more out of date its data may be.
Organizations can have up to twenty generations of backup.
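The rotation and fallback described above might be sketched as follows. The rotate and restore functions, and the representation of a backup as a dictionary whose data is None when the backup is unusable, are assumptions made for this example only.

```python
GENERATION_NAMES = ["son", "father", "grandfather"]

def rotate(backups, new_backup, keep=3):
    """Return the list of backups, newest first, after adding `new_backup`.

    The newest backup becomes the son and any generation beyond `keep` is dropped.
    """
    return ([new_backup] + backups)[:keep]

def restore(backups, is_usable):
    """Return (generation index, backup) for the newest backup that is usable."""
    for age, backup in enumerate(backups):
        if is_usable(backup):
            return age, backup
    raise RuntimeError("no usable backup generation")

# Take a backup on each of four days; only the three most recent are kept.
backups = []
for day in (1, 2, 3, 4):
    backups = rotate(backups, {"day": day, "data": {"balance": day * 10}})

# Simulate a corrupted son backup, which forces a fall back to the father.
backups[0]["data"] = None
age, recovered = restore(backups, lambda b: b["data"] is not None)
generation = GENERATION_NAMES[age]
```

Here the restore falls back to the father, whose data date from day 3 rather than day 4, which illustrates why an older generation may be more out of date.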
Partial backups
A partial backup occurs when only parts of the master file are backed up. The master file is
usually backed up to magnetic tape at regular intervals, which could be daily, weekly or
monthly. Transactions completed since the last backup are stored separately in files called
journals, or journal files. If the system fails, the master file can be recreated from the
backup tape and the journal files.
Batch processing is used when transactions are recorded on paper (such as bills and invoices)
or stored on magnetic tape. Transactions are collected and then updated as a batch when it is
convenient or economical to process them. Historically, this was the most common method, as
the information technology did not exist to allow real-time processing.
• Collecting and storing the transaction data in a transaction file: this involves
sorting the data into sequential order.
• Processing the data by updating the master file: this can be difficult, as it may
involve data additions, updates and deletions that must happen in a certain
order. If an error occurs, the entire batch fails.
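The two steps above can be sketched in a few lines. The run_batch function and the record layout (an operation, a key and a value) are hypothetical; the sketch works on an in-memory copy of the master file so that a single bad transaction discards the whole batch.

```python
def run_batch(master, transactions):
    """Apply a batch of transactions and return the updated master file.

    Raises ValueError on the first invalid transaction; `master` itself is
    never modified, so a failed batch leaves it exactly as it was.
    """
    # Step 1: sort the transaction file into sequential (key) order. The sort
    # is stable, so transactions on the same key keep their original order.
    ordered = sorted(transactions, key=lambda t: t["key"])

    # Step 2: update a working copy of the master file.
    working = dict(master)
    for t in ordered:
        op, key = t["op"], t["key"]
        if op == "add" and key not in working:
            working[key] = t["value"]
        elif op == "update" and key in working:
            working[key] = t["value"]
        elif op == "delete" and key in working:
            del working[key]
        else:
            raise ValueError(f"invalid transaction: {t}")
    return working

master = {101: "Ann", 103: "Cy"}

good_batch = [
    {"op": "update", "key": 103, "value": "Cyd"},
    {"op": "add", "key": 102, "value": "Bo"},
    {"op": "delete", "key": 101},
]
updated = run_batch(master, good_batch)

# The second transaction refers to a record that does not exist.
bad_batch = [
    {"op": "add", "key": 104, "value": "Di"},
    {"op": "delete", "key": 999},
]
try:
    run_batch(master, bad_batch)
    batch_failed = False
except ValueError:
    batch_failed = True
```

Returning a new dictionary instead of editing the master in place is one simple way to make the batch all-or-nothing; a tape-based system would achieve the same effect by writing a new master tape and keeping the old one.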
Updating in batch requires sequential access: since magnetic tape is used, this is the only
way to access the data. A batch starts at the beginning of the tape and reads the data in the
order in which they were stored; it is very time-consuming to locate specific transactions.
The information technology used includes a secondary storage medium which can store large
quantities of data inexpensively (thus the common choice of a magnetic tape). The software
used to collect data does not have to be online - it doesn't even need a user interface.
A real-time update involves sending the transaction data to an online database that holds the
master file. The person providing the information is usually able to help with error
correction and receives confirmation that the transaction has completed.
Updating in real time uses direct access to data, in which data are accessed without
reading through previous data items. The storage device stores each data item in a particular
location determined by a mathematical procedure. To retrieve the item, the same procedure is
calculated again to find its approximate location. If the data are not found at this location,
successive locations are searched until they are found.
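The scheme described here resembles hashing with linear probing. The following minimal sketch assumes integer record keys and uses the remainder after division by the number of slots as the mathematical procedure; the DirectAccessFile class is hypothetical and stands in for a disk file with a fixed number of record slots.

```python
class DirectAccessFile:
    """A fixed number of record slots addressed by a hash of the record key."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots

    def _home(self, key):
        # The "mathematical procedure": map the key to a slot number.
        return key % len(self.slots)

    def put(self, key, record):
        """Store a record and return the slot number it ended up in."""
        start = self._home(key)
        for i in range(len(self.slots)):
            pos = (start + i) % len(self.slots)
            # Use the first slot that is empty or already holds this key.
            if self.slots[pos] is None or self.slots[pos][0] == key:
                self.slots[pos] = (key, record)
                return pos
        raise RuntimeError("file is full")

    def get(self, key):
        """Fetch a record by key, probing successive slots from its home slot."""
        start = self._home(key)
        for i in range(len(self.slots)):
            pos = (start + i) % len(self.slots)
            if self.slots[pos] is None:
                break  # an empty slot means the key was never stored
            if self.slots[pos][0] == key:
                return self.slots[pos][1]
        raise KeyError(key)

f = DirectAccessFile(n_slots=7)
first = f.put(10, "record for key 10")   # 10 % 7 == 3, so slot 3 is used
second = f.put(17, "record for key 17")  # 17 % 7 == 3 is taken, so slot 4 is used
```

Keys 10 and 17 share the same calculated location, so the second record lands in the next slot; a lookup for key 17 starts at slot 3 and finds the record one step later, without reading any other part of the file.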
The information technology used could be a secondary storage medium that can store large
amounts of data and provide quick access (thus the common choice of a magnetic disk). It
requires a user-friendly interface as it's important for rapid response time.
Reservation Systems
Reservation systems are used for any type of business where a service or a product is set
aside for a customer to use at a future time.