Assignment 1 - Narrative
INVENTORY DATA FILE –
MDS_Exercise1_FileA.txt is a flat data file that contains inventory data in space-delimited format.
The file does not have column headings (attributes), and a formatting issue causes some values to
overlap into neighboring columns. I added the following column headings to identify the file's
data:
1. ID
2. VIN
3. YEAR
4. MAKE
5. MODEL
6. VERSION
7. DRIVE_TYPE
8. COLOR
9. BODY_STYLE
10. FUEL
12. MSRP
The file contains duplicated data in row 2 (for example, “S 2.0L 4WD 4WD”) and has some
space-delimiter issues as well. The following data is shared between Inventory and Sales:
1. VIN
2. Model
3. Year
4. Color
5. Fuel
6. MSRP
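The duplicated value in row 2 can be flagged with a short script before the data is loaded. The sketch below is a minimal Python illustration, assuming that a duplicate means the same token appearing twice in a row; it uses only the fragment quoted above, and the function name repeated_tokens is hypothetical. Calling split() with no argument treats any run of spaces as a single delimiter, which also sidesteps irregular spacing.

```python
def repeated_tokens(line: str) -> list[tuple[int, str]]:
    """Return (position, token) for each token that repeats the token right before it."""
    # split() with no argument collapses any run of whitespace into one delimiter
    tokens = line.split()
    return [(i, tok) for i, tok in enumerate(tokens) if i > 0 and tok == tokens[i - 1]]


# Only the fragment quoted in this narrative is used; the rest of row 2 is not reproduced.
fragment = "S 2.0L 4WD 4WD"
print(repeated_tokens(fragment))  # [(3, '4WD')]
```

The result is meant as a flag for manual review, since a repeated value is not always an error.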
Question - How did you decide to represent the data in the way that you did?
Answer - I analyzed the Inventory and Sales data, identified the values and column headings they
have in common, and, based on those observations, added column headings to the Inventory data.
The Inventory data was not organized, but by comparing the values in each column with the other
columns, I identified each field and organized the data into a table.
Question - What were the hardest decisions you had to make in this design process?
Answer – I included all of the data as-is after identifying the column headings and organizing it,
so I did not have to make any hard decisions about adding extra columns or leaving columns out.
Question - How does your schema design support data independence?
Answer – The schema is designed in the relational model, so columns can be added, deleted, or
updated without affecting the existing data in the table. Because queries refer to tables and
columns by name rather than to how the data is physically stored, the current schema supports
both logical and physical data independence.
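The claim about adding columns can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module with a hypothetical three-column subset of the Inventory table and placeholder values (VIN001, ModelA, 25000); warranty_years is an invented column. A query written before the change returns the same rows after ALTER TABLE adds a column, which is the kind of logical data independence described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of the Inventory columns with placeholder values.
conn.execute("CREATE TABLE inventory (vin TEXT PRIMARY KEY, model TEXT, msrp REAL)")
conn.execute("INSERT INTO inventory VALUES ('VIN001', 'ModelA', 25000)")

# An existing "application" query that names only the columns it needs.
QUERY = "SELECT vin, model, msrp FROM inventory"
before = conn.execute(QUERY).fetchall()

# Schema change: add a new (hypothetical) column; existing rows get NULL for it.
conn.execute("ALTER TABLE inventory ADD COLUMN warranty_years INTEGER")
after = conn.execute(QUERY).fetchall()

print(before == after)  # True: the existing query is unaffected
```

This only demonstrates additive changes; a query that names a column which is later dropped would still have to be rewritten.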
Question - How may your schema design support the overarching goals of data curation (revisit
objectives and activities of Week 1)?
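Answer – The schema is designed in the relational model, and this relational schema supports the
overarching goals of data curation: the tables are organized to reduce redundancy and their
attributes are described in the accompanying Excel sheets, so the data is easier to keep accurate,
understand, and reuse.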
Question - Which curation activities could enhance or sustain the database for future discovery
and use for new purposes? What additional activities would you recommend?
Answer – Data curation activities such as documentation, data authentication, archiving, and
management will enhance and sustain the database for future discovery and for use in new
purposes.
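One concrete way to carry out the data authentication and archiving activities is a fixity check: record a checksum for each source file when it is archived, and recompute it later to confirm the file has not changed. The sketch below is a minimal illustration using Python's standard hashlib module; the manifest format and the function names sha256_of and verify are assumptions, and it runs against a temporary placeholder file rather than the real MDS_Exercise1 files.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest: dict[str, str]) -> dict[str, bool]:
    """Compare each file's current digest with the digest recorded at archive time."""
    return {name: sha256_of(Path(name)) == expected for name, expected in manifest.items()}


# Demo on a throwaway file (placeholder content, not the assignment data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("placeholder content\n")
    manifest = {str(sample): sha256_of(sample)}  # recorded when the file is archived
    first_check = verify(manifest)               # unchanged file -> True
    sample.write_text("edited content\n")
    second_check = verify(manifest)              # modified file -> False

print(first_check, second_check)
```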
NOTE - The Inventory table schema, attribute descriptions, and data are stored in the Excel sheet
Inventory_Data, and the ER diagram is in the ERDiagramAndSchema sheet.
SALES DATA FILE –
MDS_Exercise1_FileB.csv is a flat file that contains sales data in comma-separated (“,”) format. The
file contains column headings (attributes), and its data formatting appears organized. The Sales
file contains data shared with the Inventory and Customer_relation data files:
The following data is shared with Inventory:
1. VIN
2. Model
3. Year
4. Color
5. Engine
6. MSRP
The following data is shared with Customer_relation:
1. LastName
2. FirstName
3. MI
4. Address
5. City
6. State
7. Country
Question - How did you decide to represent the data in the way that you did?
Answer - I analyzed the Sales data, compared it with the Inventory and customer_relation data, and
identified the shared values and column headings in order to remove redundancy and normalize the
data. Based on these observations, I kept the data that is unique to sales and moved the other,
common data to their respective tables, Inventory and customer_relation.
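The comparison of column headings described above can be partly automated. The following Python sketch, assuming headings should match regardless of case and underscores (so LAST_NAME matches LastName), finds the Sales headings that also appear in Inventory and in customer_relation. The heading lists are taken from this narrative (for Sales, only its shared columns are listed), and the function names are hypothetical.

```python
def normalize(name: str) -> str:
    """Compare headings case-insensitively and ignore underscores (LAST_NAME == LastName)."""
    return name.replace("_", "").lower()


def shared_columns(left: list[str], right: list[str]) -> list[str]:
    """Return headings from `left` whose normalized form also appears in `right`."""
    right_keys = {normalize(col) for col in right}
    return [col for col in left if normalize(col) in right_keys]


# Headings as listed in this narrative (Sales shows only its shared columns).
INVENTORY = ["ID", "VIN", "YEAR", "MAKE", "MODEL", "VERSION", "DRIVE_TYPE",
             "COLOR", "BODY_STYLE", "FUEL", "MSRP"]
CUSTOMER_RELATION = ["FIRST_NAME", "LAST_NAME", "MI", "PROFESSION", "ADDRESS",
                     "CITY", "STATE", "ZIP", "COUNTRY", "FINANCING"]
SALES = ["VIN", "Model", "Year", "Color", "Engine", "MSRP", "LastName",
         "FirstName", "MI", "Address", "City", "State", "Country"]

print(shared_columns(SALES, INVENTORY))
print(shared_columns(SALES, CUSTOMER_RELATION))
```

Name matching finds five of the six columns that Sales shares with Inventory. Engine has no matching heading in the Inventory list above, so it still has to be matched by inspecting the data values, as the answer above describes.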
Question - Did you leave out any information? If so, why?
Answer – Yes. To normalize the data and avoid redundancy, I excluded the following fields from the
Sales table: Model, Year, Color, and Engine, because these fields are already in Inventory; and
LastName, FirstName, MI, Address, City, State, and Country, because these fields are already in
customer_relation.
Question - Why did you choose certain things as attributes? As keys?
Answer – I added a new field, “CUST_ID”, as a foreign key to establish a relationship with the
customer_relation table's primary key, “CUST_ID”. I also used VIN as a foreign key to establish a
relationship with the Inventory table's primary key, VIN.
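The two foreign keys can be expressed directly in SQL. The sketch below uses Python's sqlite3 module with hypothetical, trimmed-down tables (only the key columns plus one or two others); sale_id, the helper try_insert_sale, and the sample values are invented for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, after which a sale that points to a non-existent customer or vehicle is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE customer_relation (cust_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE inventory (vin TEXT PRIMARY KEY, model TEXT);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,  -- hypothetical key for the sales table
    cust_id TEXT NOT NULL REFERENCES customer_relation(cust_id),
    vin     TEXT NOT NULL REFERENCES inventory(vin)
);
INSERT INTO customer_relation VALUES ('C001', 'Jane', 'Doe');
INSERT INTO inventory VALUES ('VIN001', 'ModelA');
""")


def try_insert_sale(cust_id: str, vin: str) -> bool:
    """Insert a sale; return False if a foreign key check rejects it."""
    try:
        conn.execute("INSERT INTO sales (cust_id, vin) VALUES (?, ?)", (cust_id, vin))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert_sale("C001", "VIN001"))  # True: both referenced rows exist
print(try_insert_sale("C999", "VIN001"))  # False: no customer C999
print(try_insert_sale("C001", "VIN999"))  # False: no vehicle VIN999
```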
Question - What were the hardest decisions you had to make in this design process?
Answer – I added a new field, “CUST_ID”, to establish the relationship with the customer_relation
table instead of using a composite key (FirstName + LastName). Different customers can share the
same first and last name, which would cause problems with a name-based key.
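The problem with a name-based key is easy to demonstrate. In this Python/sqlite3 sketch, the table names by_name and by_id and the two sample customers are hypothetical: two different customers named John Smith can both be stored when CUST_ID is the key, but the second one is rejected when (first_name, last_name) is the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option avoided in this design: the name pair itself is the key.
conn.execute("CREATE TABLE by_name (first_name TEXT, last_name TEXT, "
             "PRIMARY KEY (first_name, last_name))")
# Option used in this design: a surrogate CUST_ID is the key.
conn.execute("CREATE TABLE by_id (cust_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT)")

# Two different (hypothetical) customers who happen to share a name.
customers = [("C001", "John", "Smith"), ("C002", "John", "Smith")]

for cust_id, first, last in customers:
    conn.execute("INSERT INTO by_id VALUES (?, ?, ?)", (cust_id, first, last))

name_key_errors = 0
for _, first, last in customers:
    try:
        conn.execute("INSERT INTO by_name VALUES (?, ?)", (first, last))
    except sqlite3.IntegrityError:
        name_key_errors += 1  # the second "John Smith" collides with the first

print(conn.execute("SELECT COUNT(*) FROM by_id").fetchone()[0])  # 2
print(name_key_errors)  # 1
```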
Question - How does your schema design support data independence?
Answer – The schema is designed in the relational model, so columns can be added, deleted, or
updated without affecting the existing data in the table. Because queries refer to tables and
columns by name rather than to how the data is physically stored, the current schema supports
both logical and physical data independence.
Question - How may your schema design support the overarching goals of data curation (revisit
objectives and activities of Week 1)?
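Answer – The schema is designed in the relational model, and this relational schema supports the
overarching goals of data curation: the tables are organized to reduce redundancy and their
attributes are described in the accompanying Excel sheets, so the data is easier to keep accurate,
understand, and reuse.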
Question - Which curation activities could enhance or sustain the database for future discovery
and use for new purposes? What additional activities would you recommend?
Answer – Data curation activities such as documentation, data authentication, archiving, and
management will enhance and sustain the database for future discovery and for use in new
purposes.
NOTE - The Sales table schema, attribute descriptions, and data are stored in the Excel sheet
Sales_Data, and the ER diagram is in the ERDiagramAndSchema sheet.
CUSTOMER_RELATION DATA FILE –
MDS_Exercise1_FileC.docx is a Microsoft Word document that contains customer_relation data in
plain-text format. The file does not contain column headings to identify the data; it uses line
breaks and spaces as delimiters to keep the data organized. After identifying the data, I added
the following column headings:
1. FIRST_NAME
2. LAST_NAME
3. MI
4. PROFESSION
5. ADDRESS
6. CITY
7. STATE
8. ZIP
9. COUNTRY
10. FINANCING
In the customer_relation data file, the following data is shared with the Sales data:
1. LastName
2. FirstName
3. MI
4. Address
5. City
6. State
7. Country
Question - How did you decide to represent the data in the way that you did?
Answer - I analyzed the customer_relation data, compared it with the Sales data, and identified the
shared values and column headings in order to remove redundancy and normalize the data. Based on
these observations, I selected the data that belongs only in customer_relation.
Question - Did you leave out any information? If so, why?
Answer – No. I included all of the data after organizing it, and added an additional field, CUST_ID,
to hold a unique customer ID used to establish relationships with other tables.
Question - Why did you choose certain things as attributes? As keys?
Answer – I added a new field, “CUST_ID”, as the primary key to establish the relationship with the
Sales table. I chose CUST_ID because it will hold unique alphanumeric keys.
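The narrative does not fix a format for the alphanumeric CUST_ID values. Purely as an illustration, the sketch below assumes a prefix plus a zero-padded counter (CUST0001, CUST0002, ...); the prefix, width, and function name cust_id_generator are hypothetical. Uniqueness holds only as long as a single generator issues every ID, so the PRIMARY KEY constraint on CUST_ID remains the real safeguard.

```python
from itertools import count


def cust_id_generator(prefix: str = "CUST", width: int = 4):
    """Yield IDs such as CUST0001, CUST0002, ... (format is a hypothetical choice)."""
    for n in count(1):
        # Past 10**width - 1 the number simply gets more digits, so IDs stay unique.
        yield f"{prefix}{n:0{width}d}"


gen = cust_id_generator()
ids = [next(gen) for _ in range(3)]
print(ids)  # ['CUST0001', 'CUST0002', 'CUST0003']
```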
Question - What were the hardest decisions you had to make in this design process?
Answer – I added a new field, “CUST_ID”, to establish the relationship with the Sales table instead
of using a composite key (FirstName + LastName). Different customers can share the same first and
last name, which would cause problems with a name-based key.
Question - How does your schema design support data independence?
Answer – The schema is designed in the relational model, so columns can be added, deleted, or
updated without affecting the existing data in the table. Because queries refer to tables and
columns by name rather than to how the data is physically stored, the current schema supports
both logical and physical data independence.
Question - How may your schema design support the overarching goals of data curation (revisit
objectives and activities of Week 1)?
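Answer – The schema is designed in the relational model, and this relational schema supports the
overarching goals of data curation: the tables are organized to reduce redundancy and their
attributes are described in the accompanying Excel sheets, so the data is easier to keep accurate,
understand, and reuse.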
Question - Which curation activities could enhance or sustain the database for future discovery
and use for new purposes? What additional activities would you recommend?
Answer – Data curation activities such as documentation, data authentication, archiving, and
management will enhance and sustain the database for future discovery and for use in new
purposes.
NOTE – The Customer_Relation table schema, attribute descriptions, and data are stored in the Excel
sheet Customer_Relation_Data, and the ER diagram is in the ERDiagramAndSchema sheet.
Discount Details: I added a new DISCOUNT table to store discount details. Because each discount is
stored in one place, this minimizes errors when discount details are updated or new discount
parameters are added.
The primary key is DISCOUNT_ID, which will be a foreign key in the Sales table. The associated
attributes, DISCOUNT_DESCRIPTION and DISCOUNT_AMOUNT, hold the information about the kind of
discount.
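The benefit of a separate DISCOUNT table can be seen by updating one discount and joining it back to the sales that use it. In this Python/sqlite3 sketch the discount column names are lowercase forms of the DISCOUNT_ID, description, and amount attributes described above; the sales columns, the discount ID D1, its description, and the amounts are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE discount (
    discount_id          TEXT PRIMARY KEY,
    discount_description TEXT,
    discount_amount      REAL
);
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,  -- hypothetical
    vin         TEXT,
    discount_id TEXT REFERENCES discount(discount_id)
);
INSERT INTO discount VALUES ('D1', 'Seasonal promotion', 500);
INSERT INTO sales (vin, discount_id) VALUES ('VIN001', 'D1'), ('VIN002', 'D1');
""")

# Change the discount once; every sale that references D1 sees the new amount.
conn.execute("UPDATE discount SET discount_amount = 750 WHERE discount_id = 'D1'")

rows = conn.execute("""
    SELECT s.vin, d.discount_amount
    FROM sales AS s JOIN discount AS d ON s.discount_id = d.discount_id
    ORDER BY s.vin
""").fetchall()
print(rows)  # [('VIN001', 750.0), ('VIN002', 750.0)]
```

If the amount were copied into each sales row instead, the same change would need one UPDATE per sale, and a missed row would leave inconsistent data.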
NOTE - The Discount table schema, attribute descriptions, and data are stored in the Excel sheet
Discount_Data, and the ER diagram is in the ERDiagramAndSchema sheet.
Cons –