CP7019
MANAGING BIG DATA
L T P C
3 0 0 3
OBJECTIVES:
Understand big data for business intelligence
Learn business case studies for big data analytics
Understand NoSQL big data management
Perform map-reduce analytics using Hadoop and related tools
UNIT I UNDERSTANDING BIG DATA 9
What is big data - why big data - convergence of key trends - unstructured data - industry
examples of big data - web analytics - big data and marketing - fraud and big data - risk and
big data - credit risk management - big data and algorithmic trading - big data and healthcare -
big data in medicine - advertising and big data - big data technologies - introduction to Hadoop -
open source technologies - cloud and big data - mobile business intelligence - crowd
sourcing analytics - inter and trans firewall analytics
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL - aggregate data models - aggregates - key-value and document data
models - relationships - graph databases - schemaless databases - materialized views -
distribution models - sharding - master-slave replication - peer-peer replication - sharding and
replication - consistency - relaxing consistency - version stamps - map-reduce - partitioning
and combining - composing map-reduce calculations
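As an illustration of the map-reduce, partitioning, and combining topics listed above, the following is a minimal single-process sketch of a word count. It does not use Hadoop; the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `word_count`) and the character-code partitioning rule are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def combine(pairs):
    """Combine: pre-aggregate one mapper's output before it is shuffled."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def partition(pairs, num_reducers):
    """Partition: route every key to exactly one reducer bucket."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Sum of character codes: a deterministic stand-in for a hash function.
        index = sum(map(ord, key)) % num_reducers
        buckets[index][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reduce: sum the partial counts collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}


def word_count(documents, num_reducers=2):
    """Run map, combine, partition, and reduce over a list of documents."""
    combined = [combine(map_phase(doc)) for doc in documents]
    buckets = partition(chain.from_iterable(combined), num_reducers)
    result = {}
    for bucket in buckets:
        result.update(reduce_phase(bucket))
    return result


counts = word_count(["big data big analytics", "big data tools"])
```

Because each key lands in exactly one bucket, the reducers never overlap and their outputs can simply be merged.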
UNIT III BASICS OF HADOOP 9
Data format - analyzing data with Hadoop - scaling out - Hadoop streaming - Hadoop pipes -
design of Hadoop distributed file system (HDFS) - HDFS concepts - Java interface - data flow -
Hadoop I/O - data integrity - compression - serialization - Avro - file-based data structures
UNIT IV MAPREDUCE APPLICATIONS
MapReduce workflows - unit tests with MRUnit - test data and local tests - anatomy of
MapReduce job run - classic MapReduce - YARN - failures in classic MapReduce and YARN -
job scheduling - shuffle and sort - task execution - MapReduce types - input formats - output
formats
UNIT V HADOOP RELATED TOOLS 9
HBase - data model and implementations - HBase clients - HBase examples.
Cassandra data model - Cassandra examples - Cassandra clients - Hadoop integration.
Pig - Grunt - Pig data model - Pig Latin - developing and testing Pig Latin scripts.
Hive - data types and file formats - HiveQL data definition - HiveQL data manipulation - HiveQL
queries.
TOTAL: 45 PERIODS
OUTCOMES:
Upon completion of the course, the students will be able to
Describe big data and use cases from selected business domains
Explain NoSQL big data management
Install, configure, and run Hadoop and HDFS
Perform map-reduce analytics using Hadoop
Use Hadoop related tools such as HBase, Cassandra, Pig, and Hive for big data analytics
REFERENCES:
Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business
Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence", Addison-Wesley Professional, 2012.
Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly, 2012.
Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly, 2012.
Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
Alan Gates, "Programming Pig", O'Reilly, 2011.
CP7025
DATA MINING TECHNIQUES
L T P C
3 0 0 3
UNIT I INTRODUCTION TO DATA MINING 9
Introduction to Data Mining - Data Mining Tasks - Components of Data Mining Algorithms - Data
Mining supporting Techniques - Major Issues in Data Mining - Measurement and Data - Data
Preprocessing - Data sets
UNIT II OVERVIEW OF DATA MINING ALGORITHMS
Overview of Data Mining Algorithms - Models and Patterns - Introduction - The Reductionist
viewpoint on Data Mining Algorithms - Score function for Data Mining Algorithms - Introduction -
Fundamentals of Modeling - Model Structures for Prediction - Models for Probability
Distributions and Density Functions - The Curse of Dimensionality - Models for Structured Data -
Scoring Patterns - Predictive versus Descriptive score functions - Scoring Models with Different
Complexities - Evaluation of Models and Patterns - Robust Methods.
UNIT III CLASSIFICATIONS
Classifications - Basic Concepts - Decision Tree induction - Bayes Classification Methods - Rule
Based Classification - Model Evaluation and Selection - Techniques to Improve Classification
Accuracy - Classification: Advanced concepts - Bayesian Belief Networks - Classification by Back
Propagation - Support Vector Machine - Classification using frequent patterns.
UNIT IV CLUSTER ANALYSIS 9
Cluster Analysis: Basic concepts and Methods - Cluster Analysis - Partitioning methods -
Hierarchical methods - Density Based Methods - Grid Based Methods - Evaluation of Clustering -
Advanced Cluster Analysis: Probabilistic model based clustering - Clustering High
Dimensional Data - Clustering Graph and Network Data - Clustering with Constraints.
UNIT V ASSOCIATION RULE MINING AND VISUALIZATION
Association Rule Mining - Introduction - Large Item sets - Basic Algorithms - Parallel and
Distributed Algorithms - Comparing Approaches - Incremental Rules - Advanced Association
Rule Techniques - Measuring the Quality of Rules - Visualization of Multidimensional Data -
Diagrams for Multidimensional visualization - Visual Data Mining - Data Mining Applications -
Case Study: WEKA.
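To make the "Measuring the Quality of Rules" topic concrete, the sketch below computes the two most common rule-quality measures, support and confidence, over a small hand-made set of transactions. The function names and the sample baskets are assumptions for illustration; they are not taken from WEKA or from the reference texts.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies, so report zero rather than divide by zero
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical market-basket data used only for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here the rule bread -> milk holds in two of the four baskets, and in two of the three baskets that contain bread.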
TOTAL: 45 PERIODS
REFERENCES:
Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, Third Edition
(The Morgan Kaufmann Series in Data Management Systems), 2012.
David J. Hand, Heikki Mannila, and Padhraic Smyth, Principles of Data Mining (Adaptive
Computation and Machine Learning), 2005.
Margaret H. Dunham, Data Mining: Introductory and Advanced Topics, 2003.
Soman, K. P., Diwakar Shyam, and Ajay V., Insight Into Data Mining: Theory and Practice, PHI,
2009.
CP7029
INFORMATION STORAGE MANAGEMENT
L T P C
3 0 0 3
UNIT I INTRODUCTION TO STORAGE TECHNOLOGY 9
Review data creation and the amount of data being created and understand the value of data to
a business, challenges in data storage and data management, Solutions available for data
storage, Core elements of a data center infrastructure, role of each element in supporting
business activities
UNIT II STORAGE SYSTEMS ARCHITECTURE
Hardware and software components of the host environment, Key protocols and concepts used
by each component, Physical and logical components of a connectivity environment, Major
physical components of a disk drive and their function, Logical constructs of a physical disk,
access characteristics, and performance implications, Concept of RAID and its components,
Different RAID levels and their suitability for different application environments: RAID 0, RAID 1,
RAID 3, RAID 4, RAID 5, RAID 0+1, RAID 1+0, RAID 6, Compare and contrast integrated and
modular storage systems, High-level architecture and working of an intelligent storage system
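The parity idea behind RAID 3, RAID 4, and RAID 5 can be sketched in a few lines: the parity block is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the survivors with the parity. This is a simplified illustration with made-up four-byte blocks; `xor_blocks` is a hypothetical helper, and real arrays also handle striping, parity rotation, and block sizes that this sketch ignores.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks, returned as a new block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks as they might sit on three data disks (example values).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block that would be written to the parity location.
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding it from the rest plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The same property explains why a single-parity array tolerates exactly one failed disk: with two blocks missing, the XOR no longer determines either of them.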
UNIT III INTRODUCTION TO NETWORKED STORAGE
Evolution of networked storage, Architecture, components, and topologies of FC-SAN, NAS, and
IP-SAN, Benefits of the different networked storage options, Understand the need for long-term
archiving solutions and describe how CAS fulfills the need, Understand the appropriateness of
the different networked storage options for different application environments
UNIT IV INFORMATION AVAILABILITY, MONITORING & MANAGING DATACENTER 9
List reasons for planned/unplanned outages and the impact of downtime, Differentiate between
business continuity (BC) and disaster recovery (DR), RTO and RPO,
Identify single points of failure in a storage infrastructure and list solutions to mitigate these
failures, Architecture of backup/recovery and the different backup/recovery topologies,
Replication technologies and their role in ensuring information availability and business continuity,
Remote replication technologies and their role in providing disaster recovery and business
continuity capabilities. Identify key areas to monitor in a data center, Industry standards for data
center monitoring and management, Key metrics to monitor for different components in a storage
infrastructure, Key management tasks in a data center
UNIT V SECURING STORAGE AND STORAGE VIRTUALIZATION 9
Information security, Critical security attributes for information systems, Storage security
domains, List and analyze the common threats in each domain, Virtualization technologies,
block-level and file-level virtualization technologies and processes
TOTAL: 45 PERIODS
REFERENCE BOOKS:
EMC Corporation, Information Storage and Management, Wiley, India.
Robert Spalding, Storage Networks: The Complete Reference, Tata McGraw Hill, Osborne,
2003.
Marc Farley, Building Storage Networks, Tata McGraw Hill, Osborne, 2001.
Additional resource material on [Link]/resource-library/[Link]
CP7301
SOFTWARE PROCESS AND PROJECT MANAGEMENT
L T P C
3 1 0 4
OBJECTIVES:
To understand overall SDLC and adopt suitable processes
To elicit, analyze, prioritize, and manage both functional and quality requirements
To estimate efforts required, plan, and track the plans
To understand and apply configuration and quality management techniques
To evaluate, manage, and design processes
(A mini-project can be chosen by the instructor and used as a context for the tutorials)
UNIT I DEVELOPMENT LIFE CYCLE PROCESSES 9
Overview of software development life cycle - introduction to processes - Personal Software
Process (PSP) - Team Software Process (TSP) - unified processes - agile processes -
choosing the right process
Tutorial: Software development using PSP
UNIT II REQUIREMENTS MANAGEMENT 9
Functional requirements and quality attributes - elicitation techniques - Quality Attribute
Workshops (QAW) - analysis, prioritization, and trade-off - Architecture Centric Development
Method (ACDM) - requirements documentation and specification - change management -
traceability of requirements
Tutorial: Conduct QAW, elicit, analyze, prioritize, and document requirements using ACDM
UNIT III ESTIMATION, PLANNING, AND TRACKING 9
Identifying and prioritizing risks - risk mitigation plans - estimation techniques - use case points -
function points - COCOMO II - top-down estimation - bottom-up estimation - work breakdown
structure - macro and micro plans - planning poker - Wideband Delphi - documenting the plan -
tracking the plan - earned value method (EVM)
Tutorial: Estimation, planning, and tracking exercises
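For the earned value method (EVM) named in this unit, the standard indicators can be computed directly from three inputs: planned value (PV), earned value (EV), and actual cost (AC). The sketch below uses the usual textbook formulas; the function name `evm_metrics` and the sample figures are assumptions for illustration, and the estimate at completion uses the simple BAC / CPI form, which assumes current cost efficiency continues.

```python
def evm_metrics(bac, planned_fraction, earned_fraction, actual_cost):
    """Return the basic earned-value indicators for one status date.

    bac              -- budget at completion
    planned_fraction -- share of the work scheduled to be done by now (0..1)
    earned_fraction  -- share of the work actually completed by now (0..1)
    actual_cost      -- cost incurred so far (must be positive)
    """
    pv = bac * planned_fraction   # planned value
    ev = bac * earned_fraction    # earned value
    cpi = ev / actual_cost        # cost performance index
    spi = ev / pv                 # schedule performance index
    return {
        "PV": pv,
        "EV": ev,
        "CV": ev - actual_cost,   # cost variance (negative = over budget)
        "SV": ev - pv,            # schedule variance (negative = behind)
        "CPI": cpi,
        "SPI": spi,
        "EAC": bac / cpi,         # estimate at completion
    }


# Hypothetical project: budget 100, half the work planned, 40% done, 50 spent.
status = evm_metrics(bac=100, planned_fraction=0.5, earned_fraction=0.4, actual_cost=50)
```

A CPI or SPI below 1 signals that the project is over budget or behind schedule at the status date.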
UNIT IV CONFIGURATION AND QUALITY MANAGEMENT 9
Identifying artifacts to be configured - naming conventions and version control - configuration
control - quality assurance techniques - peer reviews - Fagan inspection - unit, integration,
system, and acceptance testing - test data and test cases - bug tracking - causal analysis
Tutorial: version control exercises, development of test cases, causal analysis of defects
UNIT V SOFTWARE PROCESS DEFINITION AND MANAGEMENT 9
Process elements - process architecture - relationship between elements - process modeling -
process definition techniques - ETVX (entry-task-validation-exit) - process baselining - process
assessment and improvement - CMMI - Six Sigma
Tutorial: process measurement exercises, process definition using ETVX
TOTAL: 45 + 15 = 60 PERIODS
OUTCOMES:
Upon completion of the course, the students will be able to
Explain software development life cycle
Adopt a suitable process for software development
Elicit functional and quality requirements
Analyze, prioritize, and manage requirements
Perform trade-off among conflicting requirements
Identify and prioritize risks and create mitigation plans
Estimate the efforts required for software development
Perform planning and tracking activities
Control the artifacts during software development
Perform various tests to ensure quality
Define new processes based on the needs
Adopt best practices for process improvement
REFERENCES:
Pankaj Jalote, Software Project Management in Practice, Pearson, 2002.
Chris F. Kemerer, Software Project Management Readings and Cases, McGraw Hill, 1997.
Watts S. Humphrey, PSP: A self-improvement process for software engineers, Addison-Wesley,
2005.
Watts S. Humphrey, Introduction to the Team Software Process, Addison-Wesley, 2000.
Orit Hazzan and Yael Dubinsky, Agile software engineering, Springer, 2008.
James R. Persse, Process Improvement Essentials, O'Reilly, 2006.
Roger S. Pressman, Software Engineering: A Practitioner's Approach, Seventh Edition,
McGraw Hill, 2010.