Monitor and Support Data Conversion
LEARNING GUIDE # 09
Module Title: Monitor and Support Data Conversion
SYMBOLS
These symbols are located at the left margin of the module. They illustrate the actions to be
taken or the resources to be used at a particular stage in the module.
LO (Learning Outcome)
Self-Check
Answer Key
Resources
Reading
Assessment
Activity
Remember/Tips
Use Computer
Learning Outcome
Data conversions may be as simple as the conversion of a text file from one character encoding
system to another, or more complex, such as the conversion of office file formats or the
conversion of image and audio file formats.
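A conversion of that first kind can be scripted in a few lines. The sketch below is only an illustration of the idea, not a procedure required by this module; the file names and the Latin-1 to UTF-8 encoding pair are assumptions.

    # Minimal sketch: convert a text file from one character encoding to another.
    # File names and encodings (Latin-1 to UTF-8) are assumed for illustration.
    def convert_encoding(src_path, dst_path, src_enc="latin-1", dst_enc="utf-8"):
        with open(src_path, "r", encoding=src_enc) as src:
            text = src.read()                 # decode using the source encoding
        with open(dst_path, "w", encoding=dst_enc) as dst:
            dst.write(text)                   # re-encode using the target encoding

    convert_encoding("report_latin1.txt", "report_utf8.txt")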
There are many ways in which data is converted within the computer environment. This may be
seamless, as in the case of upgrading to a newer version of a computer program. Alternatively,
the conversion may require processing by a special conversion program, or it may involve a
complex process of going through intermediary stages, or complex "exporting" and "importing"
procedures, which may include converting to and from a tab-delimited or comma-separated text
file. In some cases, a program may recognize several data file formats at the data input stage and
also be capable of storing the output data in a number of different formats. Such a program may
be used to convert a file format. If the source format or target format is not recognized, then at
times a third program may be available which permits the conversion to an intermediate format,
which can then be reformatted using the first program. There are many possible scenarios.
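As one small illustration of the "exporting" and "importing" route mentioned above, the sketch below rewrites a tab-delimited export as a comma-separated file that another program could import, using Python's standard csv module. The file names are hypothetical.

    import csv

    # Minimal sketch: re-write a tab-delimited export as a comma-separated file.
    # File names are hypothetical.
    with open("export_tab.txt", newline="", encoding="utf-8") as src, \
         open("import_ready.csv", "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")   # read tab-delimited rows
        writer = csv.writer(dst)                   # default dialect writes commas
        for row in reader:
            writer.writerow(row)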
In computing, data integrity refers to maintaining and assuring the accuracy and consistency of
data over its entire life-cycle, and is an important feature of a database or RDBMS. Data
warehousing and business intelligence in general demand the accuracy, validity and correctness
of data despite hardware failures, software bugs or human error. Data that has integrity is
identically maintained during any operation, such as transfer, storage or retrieval.
All characteristics of the data, including business rules, rules for how pieces of data relate, dates,
definitions and lineage, must be correct for its data integrity to be complete. When functions
operate on the data, the functions must ensure integrity. Examples include transforming the data,
storing history and storing metadata.
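A common, simple way to confirm that data has been identically maintained across a transfer or copy is to compare checksums of the source and the target. The sketch below is a general-purpose illustration rather than a procedure prescribed by this module; the file paths are assumptions.

    import hashlib

    def sha256_of(path, chunk_size=1048576):
        # Return the SHA-256 digest of a file, read in 1 MiB chunks.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Hypothetical paths: verify that a copy preserved the data bit for bit.
    if sha256_of("source/customers.dat") == sha256_of("target/customers.dat"):
        print("Integrity check passed: files are identical.")
    else:
        print("Integrity check FAILED: investigate the transfer.")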
Databases
Data integrity contains guidelines for data retention, specifying or guaranteeing the length of
time data can be retained in a particular database. It specifies what can be done with data values
when their validity or usefulness expires. In order to achieve data integrity, these rules are
consistently and routinely applied to all data entering the system, and any relaxation of
enforcement could cause errors in the data. Implementing checks on the data as close as possible
to the source of input (such as human data entry) causes less erroneous data to enter the system.
Strict enforcement of data integrity rules results in lower error rates, and in time saved
troubleshooting and tracing erroneous data and the errors it causes in algorithms.
Data integrity also includes rules defining the relations a piece of data can have to other pieces
of data, such as a Customer record being allowed to link to purchased Products, but not to
unrelated data such as Corporate Assets. Data integrity often includes checks and correction for
invalid data, based on a fixed schema or a predefined set of rules; an example is textual data
entered where a date-time value is required. Rules for data derivation are also applicable,
specifying how a data value is derived based on algorithm, contributors and conditions. They
also specify the conditions on how the data value could be re-derived.
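A simple example of such a check is rejecting text that cannot be parsed as a date-time before it ever reaches the database. The sketch below is illustrative only; the expected format string is an assumption.

    from datetime import datetime

    def validate_datetime(value, fmt="%Y-%m-%d %H:%M:%S"):
        # Return a datetime if the text matches the assumed format, else reject it.
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            raise ValueError(f"Invalid date-time value rejected at input: {value!r}")

    validate_datetime("2012-06-30 14:05:00")   # accepted
    # validate_datetime("next Tuesday")        # would be rejected before storage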
If a database supports these features, it is the responsibility of the database to ensure data
integrity as well as the consistency model for the data storage and retrieval. If a database does
not support these features, it is the responsibility of the applications to ensure data integrity
while the database supports the consistency model for the data storage and retrieval.
As of 2012, since all modern databases support these features (see Comparison of relational
database management systems), it has become the de facto responsibility of the database to
ensure data integrity. Outdated and legacy systems that use file systems (text, spreadsheets,
ISAM, flat files, etc.) for their consistency model lack any kind of data-integrity model. This
requires organizations to invest a large amount of time, money, and personnel in building data-
integrity systems on a per-application basis that effectively just duplicate the existing data
integrity systems found in modern databases. Many companies, and indeed many database
systems themselves, offer products and services to migrate out-dated and legacy systems to
modern databases to provide these data-integrity features. This offers organizations substantial
savings in time, money, and resources because they do not have to develop per-application data-
integrity systems that must be re-factored each time business requirements change.
Accuracy
The fundamental issue with respect to data is accuracy. Accuracy is the closeness of results of
observations to the true values, or to values accepted as being true. This implies that observations
of most spatial phenomena are usually only considered to be estimates of the true value. The
difference between observed and true (or accepted as being true) values indicates the accuracy of
the observations.
Types of accuracy
Basically two types of accuracy exist. These are positional and attribute accuracy.
1. Positional accuracy is the expected deviance in the geographic location of an object from its
   true ground position. This is what we commonly think of when the term accuracy is
   discussed. There are two components to positional accuracy: relative and absolute accuracy.
   Absolute accuracy concerns the accuracy of data elements with respect to a coordinate
   scheme, e.g. UTM. Relative accuracy concerns the positioning of map features relative to one
   another.
Often relative accuracy is of greater concern than absolute accuracy. For example, most GIS
users can live with the fact that their survey township coordinates do not coincide exactly with
the survey fabric; however, the absence of one or two parcels from a tax map can have
immediate and costly consequences.
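Positional accuracy is often summarized with a single figure, for example the root-mean-square error (RMSE) between observed and true (or accepted-as-true) coordinates. The sketch below shows that calculation for a few made-up control points; the numbers are purely illustrative.

    import math

    # (observed_x, observed_y, true_x, true_y) for hypothetical control points
    points = [
        (1000.2, 2000.1, 1000.0, 2000.0),
        (1500.0, 2499.7, 1500.3, 2500.0),
        (1800.4, 2100.0, 1800.0, 2100.2),
    ]

    # Root-mean-square positional error: a common single-figure accuracy measure
    rmse = math.sqrt(sum((ox - tx) ** 2 + (oy - ty) ** 2
                         for ox, oy, tx, ty in points) / len(points))
    print(f"Positional RMSE: {rmse:.3f} map units")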
Data accuracy is getting the exact data. Data integrity is making sure that this data is received
completely and correctly.
A natural play for integrators is to add new storage frames to support data growth while also
consolidating and eliminating older, more expensive storage subsystems. It lowers customer
operating costs and at the same time increases reseller revenue. However, customers often shy
away from consolidation because they fear large, complex and disruptive data migrations.
Successful storage integrators can address this concern by offering data migration services based
on solid methodology and great tools.
Here's my unscientific take on the best data migration tool in each of five categories. To be
considered, the tool must be optimized for one-time data relocation from one storage device to
another, with an emphasis on heterogeneous replication. Special emphasis is given to data
migration tools that don't have to become a permanent part of the infrastructure.
The first tool is rsync. This open source tool has been around for a long time and distinguishes
itself by being very simple, yet powerful, and totally host- and storage-agnostic. rsync is very
flexible and can be adapted to almost every data migration need, but it shines especially brightly
with largely static unstructured content.
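For readers who have not used it, a typical rsync-based migration run might look like the sketch below, driven from Python so that it can be logged and repeated. The source and destination paths are hypothetical, and a --dry-run pass is shown first so nothing is copied until the transfer plan has been reviewed.

    import subprocess

    SRC = "/mnt/old_array/projects/"      # hypothetical source mount
    DST = "/mnt/new_array/projects/"      # hypothetical destination mount

    # First pass: --dry-run lists what would be transferred without copying anything.
    subprocess.run(["rsync", "-av", "--dry-run", SRC, DST], check=True)

    # Real pass: -a preserves permissions, times and ownership; -v lists each file.
    subprocess.run(["rsync", "-av", SRC, DST], check=True)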
With large structured files like databases, block-level migration tools make the most sense. I'm
going to cop out here and not name a specific tool but instead a spectrum of tools: Host-based
volume managers are often overlooked as a data migration tool, yet they provide a powerful way
to non-disruptively migrate data from one storage array to another. Most operating systems
already have a capable volume manager that is heterogeneous and already installed.
Sometimes the data migration simply can't be done on the host. This is especially true when a lot
of hosts access the same data, as happens with NAS arrays. The winner in this category is EMC's
Rainfinity. This NAS virtualization appliance can be inserted into the data path between the
servers and the storage array, orchestrate non-disruptive migrations and then slip quietly back out
of the data path.
Storage area networks (SANs) were once just a place to route servers to disk. These days they
have become much more sophisticated, and intelligent fabric services are not only possible, they
are commonplace. Brocade's Data Migration Manager (DMM) is a SAN-based heterogeneous
data migration tool that leverages the company's AP7600 intelligent SAN device. Migrating
logical unit numbers (LUNs) from one storage array to the next is possible with several SAN-
based tools, but this one is different because it moves the data online and doesn't have to take
control of the LUNs.
It is nearly impossible for an array-based data migration tool to be heterogeneous -- unless the
array itself is heterogeneous. Hitachi Data Systems' (HDS) Universal Replicator can migrate data
that is both internal and external to HDS Universal Storage Platform (USP) arrays. This type of
replication works great if the customer already has, or is moving toward, an HDS USP array and
the hosts cannot support the workload required to move the data.
There you have it, five storage replication tools in five separate categories. Solution providers
who can wrap data migration services around a few of these tools will bring more value to their
customers and more revenue to themselves.
Transferring data
Data transfer is the physical transfer of data (a digital bit stream) over a point-to-point or
point-to-multipoint communication channel.
Importing data is the process of retrieving data from sources external to Microsoft® SQL
Server™ (for example, an ASCII text file) and inserting it into SQL Server tables. Exporting data
is the process of extracting data from an instance of SQL Server into some user-specified format
(for example, copying the contents of a SQL Server table to a Microsoft Access database).
Importing data from an external data source into an instance of SQL Server is likely to be the
first step you perform after setting up your database. After data has been imported into your SQL
Server database, you can start to work with the database.
Importing data into an instance of SQL Server can be a one-time occurrence (for example,
migrating data from another database system to an instance of SQL Server). After the initial
migration is complete, the SQL Server database is used directly for all data-related tasks, rather
than the original system. No further data imports are required.
Importing data can also be an ongoing task. For example, a new SQL Server database is created
for executive reporting purposes, but the data resides in legacy systems updated from a large
number of business applications. In this case, you can copy new or updated data from the legacy
system to an instance of SQL Server on a daily or weekly basis.
Usually, exporting data is a less frequent occurrence. SQL Server provides tools and features that
allow applications, such as Access or Microsoft Excel, to connect and manipulate data directly,
rather than having to copy all the data from an instance of SQL Server to the tool before
manipulating it. However, data may need to be exported from an instance of SQL Server
regularly. In this case, the data can be exported to a text file and then read by the application.
Alternatively, you can copy data on an ad hoc basis. For example, you can extract data from an
instance of SQL Server into an Excel spreadsheet running on a portable computer and take the
computer on a business trip.
SQL Server provides tools for importing and exporting data to and from many data sources,
including ASCII text files, ODBC data sources (such as Oracle databases), OLE DB data sources
(such as other instances of SQL Server), and Excel spreadsheets.
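As a concrete illustration of the export path, the sketch below copies a table out of SQL Server into a comma-separated text file using the third-party pyodbc driver for Python. The connection string, table name and output file are assumptions, and this is only one of several ways the same result could be achieved.

    import csv
    import pyodbc  # third-party ODBC driver for Python (assumed to be installed)

    # Hypothetical connection details and table name.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=MYSERVER;"
        "DATABASE=Sales;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM dbo.Customers")

    with open("customers_export.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow([col[0] for col in cursor.description])  # header row
        for row in cursor:
            writer.writerow(row)

    conn.close()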
Additionally, SQL Server replication allows data to be distributed across an enterprise, copying
data between locations and synchronizing changes automatically between different copies of
data.
Data Transformation Services, or DTS, is a set of objects and utilities that allow the automation
of extract, transform and load (ETL) operations to or from a database. The objects are DTS
packages and their components, and the utilities are called DTS tools. DTS was included with
earlier versions of Microsoft SQL Server, and was almost always used with SQL Server
databases, although it could be used independently with other databases.
DTS allows data to be transformed and loaded from heterogeneous sources using OLE DB,
ODBC, or text-only files, into any supported database. DTS can also allow automation of data
import or transformation on a scheduled basis, and can perform additional functions such as
FTPing files and executing external programs. In addition, DTS provides an alternative method
of version control and backup for packages when used in conjunction with a version control
system, such as Microsoft Visual SourceSafe.
Data Transformation Services (DTS) is a set of tools that lets you quickly and easily move and
manipulate data. If you do any work with current versions of SQL Server, you have probably
used the DTS wizard to import or export data from SQL Server into other data sources.
Today's IT environment is very diverse. Most companies store their data in multiple relational
database management systems (RDBMS). The most popular RDBMS on the market are
Microsoft SQL Server, Oracle, Sybase, and DB2. Many organizations also store some of their
data in non-relational formats such as mainframes, spreadsheets, and email systems. Smaller
databases are commonly built and maintained using one of the desktop RDBMS, such as
Microsoft Access. Despite the fact that data is disseminated among multiple data stores, the
organization still has to operate as a single entity. Therefore, there needs to be a way to relate and
often interchange data among various data stores.
The need for exchanging data among multiple systems has been around for a long time. Prior to
DTS' debut in SQL Server 7.0, the only tool for importing and exporting data to and from SQL
Server was the Bulk Copy Program (BCP). This command line utility is relatively
straightforward (although somewhat cryptic) to use, and offers fair performance. However, the
capabilities of BCP are quite limited: you can either export data from SQL Server into a text file
or load data from a text file into SQL Server.
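For reference, a BCP run in each direction typically looks like the commands below, wrapped here in Python purely so the two directions can sit side by side. The server, database and table names are hypothetical.

    import subprocess

    # Export a table to a character-mode text file (out), then load a text file
    # into a table (in). -c = character data, -T = Windows authentication,
    # -S = server name. All names are hypothetical.
    subprocess.run(["bcp", "Sales.dbo.Customers", "out", "customers.txt",
                    "-c", "-T", "-S", "MYSERVER"], check=True)
    subprocess.run(["bcp", "Sales.dbo.Customers_Staging", "in", "customers.txt",
                    "-c", "-T", "-S", "MYSERVER"], check=True)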
Another predecessor of DTS was the SQL Server object transfer utility, which let you transfer
database objects and data between two SQL Servers.
It's easy to guess that neither BCP nor the object transfer utility could sufficiently serve data
exchange needs. Many companies spent top dollar to create their own custom tools for
transferring and transforming data among various sources.
With DTS, you can import data from any data source for which you have an OLE DB or
ODBC provider. Whether your company stores its data in relational databases such as
Oracle or in a non-relational format such as email stores or Lotus Notes, DTS can handle
importing such data. While moving data, you can also manipulate it and store it in the
desired format.
DTS capabilities are not limited to data transfer. DTS also provides an excellent way to
automate some of the administrative tasks. For instance, you can import the data from an
external data source to a staging table, call a stored procedure to give the imported data
the particular shape you need, and then kick off an Analysis Services cube processing
task—all from the same DTS package. If necessary, you can also send an email
notification to the responsible employee in case of a package failure.
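Outside of DTS itself, the same staging-table pattern can be sketched in a short script: load the raw extract into a staging table, call a stored procedure to shape the data, and report any failure. Everything below (the connection string, table, stored procedure and file name) is a hypothetical illustration, not the DTS package the paragraph describes.

    import csv
    import pyodbc  # assumed to be installed; all names below are hypothetical

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=MYSERVER;"
        "DATABASE=Reporting;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    try:
        # 1. Load the external extract into a staging table.
        with open("daily_extract.csv", newline="", encoding="utf-8") as src:
            for name, amount in csv.reader(src):
                cursor.execute(
                    "INSERT INTO dbo.Staging_Sales (CustomerName, Amount) "
                    "VALUES (?, ?)", name, amount)
        # 2. Call a stored procedure that reshapes the staged rows.
        cursor.execute("EXEC dbo.usp_LoadSalesFromStaging")
        conn.commit()
    except Exception as exc:
        conn.rollback()
        print(f"Load failed, notify the responsible employee: {exc}")
    finally:
        conn.close()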
DTS probably wouldn't be as popular if it did not have a very nice, intuitive user interface.
You can create a package from the DTS Designer, using the DTS wizard, or through
code. The DTS Designer can be accessed by expanding the Data Transformation
Services folder in Enterprise Manager, right-clicking on Local Packages, and selecting
New Package. The DTS wizard can be accessed in several different ways; the easiest is
selecting Import and Export Data from the SQL Server menu. The wizard lets you answer
a few simple questions and gets you well on the way to developing your packages. The
DTS Designer lets you pick from a list of tasks and then customize each task for your
needs.
Perhaps one of the best things about DTS is that it is extensible. SQL Server 7.0 only
provided eight built-in tasks. SQL Server 2000 provides 19 built-in tasks—which, in
most cases, will be more than sufficient. However, each of these tasks can be customized
through the DTS Object Model. In addition, you can develop your own custom tasks, and
register them with SQL Server.
SQL Server also provides a way to secure your DTS packages. You can set a user
password and an owner password to each package. The users can only execute the
package, whereas the owner can make changes to the package.
Last but not least, DTS comes free when you purchase any edition of SQL Server 2000
(developer, desktop, standard, or enterprise editions are available). In fact, you don't even
have to have SQL Server installed on your computer to run DTS—you can use DTS to
transfer data among non-SQL Server data sources just as well.
The Data Conversion Plan and Verification Document contains a line entry for each table that
the upgrade is:
Updating
Inserting rows into
Deleting rows from.
Each table listed on this document must be investigated and verified.
The purpose of the conversion plan is to reflect the status of the verification of each table. It
should also serve as an overall view of the verification work that needs to be done after each test
move.
Determine if the table will be used in the new version of PeopleSoft, using the
following options. Each module team will need to determine the appropriate level
of validation for its tables.
Review the table ‘before’ and ‘after’ count reports. These reports are located in
the PS Upgrade Document repository for Conversion. A separate ‘before’ and
‘after’ table count document will exist for each test move. If the counts are zero,
or very small, it’s possible the table is not used.
Review the Conversion.Script.Extract.txt report also located in the PS Upgrade
Document repository for Conversion. This document contains the INSERT,
UPDATE and DELETE statements from the conversion scripts. The report is
sorted alphabetically by table name. This report should be viewed using TextPad.
Investigate the purpose of the table and determine how it is used within your
module’s overall functionality.
If the table is a temporary table that is used by PS during conversion, but is not the final
destination for data, update the conversion plan by placing an ‘NA’ in the status column
for each test move. Temporary tables do not need to be verified after each test move nor
during the production move.
If the table will be used in the new version of PeopleSoft, the validity of the data in
the table must be verified. The following are options for doing this validation.
Each module team will need to determine the appropriate level of validation for its
tables.
Review the ‘before’ and ‘after’ table count reports stored in the PS Upgrade
Document repository for Conversion. Investigate and explain differences (a scripted
comparison is sketched at the end of this list).
Review the Conversion.Script.Extract.txt report, also located in the PS Upgrade
Document repository for Conversion. This document contains the INSERT,
UPDATE, and DELETE statements from the conversion scripts. The report is
sorted alphabetically by table name. This report should be reviewed using
TextPad.
Using SQLPlus, visually compare a subset of the table’s data in the current and
new version of PeopleSoft. Explain differences.
Verify the table values by bringing up pages that use the table, running reports
that use the table, entering data against the table, etc.
Document your analysis and results in your module’s subdirectory of the PS
Upgrade Document repository. Store any queries or SQL used to verify data in
these subdirectories.
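To support the ‘before’ and ‘after’ count review noted above, a row-count comparison between the current and the upgraded databases can be scripted along the lines of the sketch below. The DSN names and table list are placeholders; each module team would substitute its own connection details and tables.

    import pyodbc  # assumed to be available; connection details are placeholders

    OLD = pyodbc.connect("DSN=PS_CURRENT;Trusted_Connection=yes;")
    NEW = pyodbc.connect("DSN=PS_UPGRADED;Trusted_Connection=yes;")

    tables = ["PS_EXAMPLE_TBL1", "PS_EXAMPLE_TBL2"]   # placeholder table names

    for table in tables:
        old_count = OLD.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        new_count = NEW.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        flag = "" if old_count == new_count else "  (investigate and explain)"
        print(f"{table}: before={old_count}  after={new_count}{flag}")

    OLD.close()
    NEW.close()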
The effort involved in verifying the data will be considerably less in subsequent test moves than
that required in the initial verification. These subsequent checks will just make sure that
nothing ‘got broken’ during the test move. Remember that the final move to
Production will be done within a limited time span. Therefore, each module must
come up with efficient methods for verifying its data during the final conversion.
It’s especially important to ensure UMICH-written scripts work properly with each test
move.
Status:
For each table on the conversion plan, place a ‘C’ (Complete) in the status column for
each table that has been verified. Once the analysis on a table has begun, place an ‘S’
(Started) in the status column. A Status column exists for each test move and for the final
move to Production. Many module teams will not begin data verification until after Test
Move 2. In these cases, the status column for Test Move 1 should be left blank.
Outstanding Problems:
Place comments related to any problems found in the Outstanding Problems column.
All data needs to be verified and all UMICH scripts need to be written/tested for
inclusion in Test Move 3.
The Data Conversion Plan shall describe the preparation for, delivery of, and confirmation of the
successful conversion and all associated processes and interfaces.
4. CONTENT REQUIREMENT
The following describes the minimum required content of the deliverable. Any changes to
content must be approved by the state in advance.
Cover/title page.
Table of contents.
An introduction that includes the document’s purpose, suggested audience, and listing of key
terms.
An overview of the activities and services that the Contractor will provide, the assumptions
on which the Plan is based, and the roles and responsibilities for individuals and
organizations involved in the conversion effort.
Data Conversion Objectives: This section shall describe the Objectives to be addressed in the
data conversion from both paper documents and electronic data.
Data Conversion Strategy: Describe the conversion effort. Any conventions needed to
understand the overall conversion method shall be presented or referenced. Graphic
illustrations of interrelationships are required.
o Major Systems Involved. Identify the source systems, electronic and hardcopy, that
are involved. Identify the goals and issues for each source system.
o Locations Involved. Identify the locations involved, and the part each location plays in
the conversion effort.
o Conversion Method. Describe any automated method of conversion that requires
minimal intervention from State staff and how hardcopy records will be converted,
validated, and loaded into the new system. If part or all of the conversion method
depends upon system states or modes, this dependency shall be indicated. Any
conventions needed to understand the overall conversion method shall be presented or
referenced.
o Conversion Security. Describe what security measures will be enforced regarding
data sensitivity issues.
o Conversion Control. Describe the means to centrally control the conversion of a
selected group (such as conversion of a single organization versus all organizations at
once) to one or more functions at a time, or at various times.
o Conversion Reporting. Describe the mechanism for identifying and reporting
conversion errors.
The Contractor shall refer to the OSI Style Guide for format and preparation guidelines.
Self-Check 1
Answer the questions on the following questionnaire; provide the answer sheet to your
trainer.
Check your answers by looking at the feedback sheets; ask for the assistance of the
trainer whenever necessary.
Questions (Satisfactory Response: YES / NO)
The trainee should answer the following questions. Write “True” if the statement is correct and
“False” if it is not.
1. Data conversion is the conversion of computer data from one
format to another.
2. Data conversions may be as simple as the conversion of a text file
from one character encoding system to another, or more
complex, such as the conversion of office file formats.
3. Data integrity refers to maintaining and assuring the accuracy
and consistency of data over its entire life-cycle, and is an
important feature of a database or RDBMS system.
4. Accuracy is not the closeness of results of observations to the
true values or values accepted as being true.
5. Attribute accuracy is not equally as important as positional
accuracy.
6. Data Transferring: is the logical transfer of data (a digital bit
stream) over a point-to-point or point-to-multipoint
communication channel.
7. Importing data is the process of retrieving data from sources
external to Microsoft® SQL Server™ (for example, an ASCII
text file) and inserting it into SQL Server tables; exporting data
is the process of extracting data from an instance of SQL Server
into some user-specified format.
Feedback to Trainee:
Answer Key
1. True
2. True
3. True
4. False
5. False
6. False
7. True
Performance Criteria
Assessment Criteria (Satisfactory Response: YES / NO)
The trainee will be assessed through the following criteria:
Answered all the interview questions clearly
Performed all activities accordingly
Followed all instructions in the activities
Feedback to Trainee: