UNDERSTANDING DATA - Data are individual facts, statistics, or items of information, often numeric. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. Data are sometimes said to be transformed into information when they are viewed in context or in post-analysis. However, in academic treatments of the subject, data are simply units of information. Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations). Data are measured, collected, reported, and analyzed, and used to create data visualizations such as graphs, tables, or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing. Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature).

DATA PREPARATION - Data preparation is the process of cleaning and transforming raw data prior to processing and analysis. It is an important step prior to processing and often involves reformatting data, making corrections to data, and combining data sets to enrich the data. Data preparation is often a lengthy undertaking for data professionals or business users, but it is essential as a prerequisite to put data in context in order to turn it into insights and eliminate bias resulting from poor data quality. For example, the data preparation process usually includes standardizing data formats, enriching source data, and/or removing outliers.

DATA PREPARATION STEPS - Gather data: The data preparation process begins with finding the right data. This can come from an existing data catalog or can be added ad-hoc. - Discover and assess data: After collecting the data, it is important to discover each dataset. This step is about getting to know the data and understanding what has to be done before the data becomes useful in a particular context. Discovery is a big task, but Talend's data preparation platform offers visualization tools which help users profile and browse their data. - Cleanse and validate data: Cleaning up the data is traditionally the most time-consuming part of the data preparation process, but it is crucial for removing faulty data and filling in gaps. Important tasks here include: o Removing extraneous data and outliers. o Filling in missing values. o Conforming data to a standardized pattern. o Masking private or sensitive data entries. Once data has been cleansed, it must be validated by testing for errors in the data preparation process up to this point. Oftentimes, an error in the system will become apparent during this step and will need to be resolved before moving forward. - Transform and enrich data: Transforming data is the process of updating the format or value entries in order to reach a well-defined outcome, or to make the data more easily understood by a wider audience. Enriching data refers to adding and connecting data with other related information to provide deeper insights. - Store data: Once prepared, the data can be stored or channeled into a third-party application—such as a business intelligence tool—clearing the way for processing and analysis to take place.
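Taken together, these steps are straightforward to express with a general-purpose data library. Below is a minimal pandas sketch of the flow; the file name sales.csv and the amount and order_date columns are hypothetical placeholders, not part of the notes above.

```python
import pandas as pd

# Gather: read a (hypothetical) raw CSV into a DataFrame.
df = pd.read_csv("sales.csv")

# Discover and assess: profile the dataset before touching it.
print(df.shape)
print(df.dtypes)
print(df.describe(include="all"))

# Cleanse and validate: drop duplicates, fill gaps, trim outliers.
df = df.drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())
df = df[df["amount"].between(0, df["amount"].quantile(0.99))]

# Transform and enrich: standardize formats and derive new fields.
df["order_date"] = pd.to_datetime(df["order_date"])
df["order_month"] = df["order_date"].dt.to_period("M")

# Store: write the prepared data out for analysis or BI tools.
df.to_csv("sales_prepared.csv", index=False)
```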
TYPES OF DATA - Qualitative Types - Qualitative or Categorical Data describes the object under consideration using a finite set of discrete classes. It means that this type of data can't be counted or measured easily using numbers and is therefore divided into categories. The gender of a person (male, female, or others) is a good example of this data type. These are usually extracted from audio, images, or text medium. Another example can be of a smartphone brand that provides information about the current rating, the color of the phone, the category of the phone, and so on. All this information can be categorized as Qualitative data. There are two subcategories under this: - Nominal: These are the set of values that don't possess a natural ordering. Let's understand this with some examples. The color of a smartphone can be considered as a nominal data type as we can't compare one color with others. It is not possible to state that 'Red' is greater than 'Blue'. - Ordinal: These types of values have a natural ordering while maintaining their class of values. If we consider the sizes of a clothing brand then we can easily sort them according to their name tag in the order of small < medium < large. Quantitative Types - This data type tries to quantify things and it does so by considering numerical values that make it countable in nature. The price of a smartphone, the discount offered, the number of ratings on a product, the frequency of a smartphone's processor, or the RAM of that particular phone, all these things fall under the category of Quantitative data types. The key thing is that there can be an infinite number of values a feature can take. For instance, the price of a smartphone can vary from x amount to any value and it can be further broken down based on fractional values. The two subcategories which describe them clearly are: - Discrete: The numerical values which are integers or whole numbers are placed under this category. The number of speakers in the phone, cameras, cores in the processor, the number of SIMs supported, all these are some of the examples of the discrete data type. - Continuous: The fractional numbers are considered as continuous values. These can take the form of the operating frequency of the processors, the Android version of the phone, Wi-Fi frequency, temperature of the cores, and so on.
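The nominal/ordinal and discrete/continuous distinctions can be made explicit in code. A small illustrative pandas example (the phone data is invented):

```python
import pandas as pd

phones = pd.DataFrame({
    "color": ["Red", "Blue", "Black"],       # nominal: no natural order
    "size": ["small", "large", "medium"],    # ordinal: small < medium < large
    "cameras": [2, 3, 4],                    # discrete: whole numbers
    "price": [199.99, 649.50, 329.00],       # continuous: fractional values
})

# Nominal category: comparisons like 'Red' > 'Blue' are undefined.
phones["color"] = pd.Categorical(phones["color"])

# Ordinal category: encodes the natural ordering explicitly.
phones["size"] = pd.Categorical(
    phones["size"], categories=["small", "medium", "large"], ordered=True
)
print(phones["size"].min())  # -> 'small', valid only because ordered=True
```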
INFORMATION - The term 'Information' is difficult to define precisely, although its properties and effects are observed in all walks of life. The usage of information has given it different names. The dictionary meanings of the term are 'knowledge', 'intelligence', 'a fact', 'data', 'a message', 'a signal' which is transmitted by the act or process of communication. Information is a fact, thought or data conveyed or described through various mediums, like written, oral, visual and audio communications. It is knowledge shared or obtained through study, instruction, investigation or news, and you share it through the act of communicating, whether verbally, nonverbally, visually, or through the written word. Information has different names, including intelligence, message, data, signal or fact. Knowing what type of information you need or how to share it can help you save time, stay organized and establish best practices for divulging information.

CHARACTERISTICS OF INFORMATION - Subjectivity: The value and usefulness of information are highly subjective, because what is information for one person may not be for another. - Relevance: Information is good only if it is relevant - that is, pertinent and meaningful to the decision maker. - Timeliness: Information must be delivered at the right time and the right place to the right person. - Accuracy: Information must be free of errors, because erroneous information can result in poor decisions and erode the confidence of users. - Correct information format: Information must be in the right format to be useful to the decision maker. - Completeness: Information is said to be complete if the decision maker can satisfactorily solve the problem at hand using that information. - Accessibility: Information is useless if it is not readily accessible to decision makers, in the desired format, when it is needed.

INFORMATION SYSTEMS IN MODERN DAY BUSINESS - In today's continuously changing and fast-moving world, where customers' requirements and preferences are always evolving, the only businesses that can hope to remain competitive and continue to function at the performance levels that can match their customers' expectations are those that are going to embrace innovation. In the recent past, business success has been pegged on the quality of information technology that the business has employed and the capability to correctly use such information. The importance of information systems (IS) has increased dramatically, and most businesses have been prompted to introduce them to keep their competitive edge. Today, nobody can envisage a business without an effective information system. Introduction of an information system to a business can bring numerous benefits and assist in the way the business handles its external and internal processes that it encounters daily, as well as decision making for the future. Some of the benefits or importance of an information system include: - New Products and Services: Any company looking to improve and secure the future has to establish a broader perspective with the use of a well-designed and coordinated information system. The IS makes it easier to analyze independent processes such as information to produce valuable products or services and organized work activities. - Information Storage: Every organization needs records of its activities to find the cause of problems and proper solutions. Information systems come in handy when it comes to storing operational data, communication records, documents, and revision histories. Manual data storage will cost the company lots of time, especially when it comes to searching for specific data. - Easier Decision Making: Without an information system, a company can take a lot of time and energy in the decision-making process. However, with the use of IS, it's easier to deliver all the necessary information and model the results, and this can help you make better decisions. - Behavioral Change: Employers and employees can communicate rapidly and more effectively with an information system. While emails are quick and effective, the use of information systems is more efficient since documents are stored in folders that can be shared and accessed by employees.

IMPORTANCE OF INFORMATION PROCESSING IN MANAGEMENT - Information processing is crucial for effective management. In today's information age, information is a valuable resource that can give businesses a competitive advantage. Managers need to process information effectively to make informed decisions, allocate resources efficiently, and plan for the future. Here are some reasons why information processing is essential in management: - Decision-making: Managers need to make decisions quickly and accurately. Processing information helps them to identify problems, analyze data, and choose the best course of action. Without effective information processing, managers may make decisions based on incomplete or inaccurate information, which can lead to poor outcomes. - Resource allocation: Managers need to allocate resources such as staff, time, and money effectively. Information processing helps them to identify where resources are needed most, track resource usage, and adjust resource allocation as needed. This can help businesses to optimize their use of resources and improve their bottom line. - Planning: Managers need to plan for the future and anticipate changes in the market. Effective information processing helps them to analyze trends, forecast future demand, and develop strategies to capitalize on opportunities or mitigate risks. - Communication: Managers need to communicate with employees, stakeholders, and customers effectively. Information processing helps them to gather, organize, and present information in a clear and concise manner. This can help to ensure that everyone has the same understanding of the situation and can work together towards a common goal.

STRUCTURED VS. UNSTRUCTURED DATA - Data is the lifeblood of business, and it comes in a huge variety of formats — everything from strictly formed relational databases to your last post on Facebook. All of that data, in all different formats, can be sorted into one of two categories: structured and unstructured data. Structured vs. unstructured data can be understood by considering the who, what, when, where, and how of the data: - Who will be using the data? - What type of data are you collecting? - When does the data need to be prepared, before storage or when used? - Where will the data be stored? - How will the data be stored? - These five questions highlight the fundamentals of both structured and unstructured data, and allow general users to understand how the two differ. They will also help users understand nuances like semi-structured data, and guide us as we navigate the future of data in the cloud. Structured Data - Structured data is data that has been predefined and formatted to a set structure before being placed in data storage, which is often referred to as schema-on-write. The best example of structured data is the relational database: the data has been formatted into precisely defined fields, such as credit card numbers or addresses, in order to be easily queried with SQL. Unstructured Data - Unstructured data is data stored in its native format and not processed until it is used, which is known as schema-on-read. It comes in a myriad of file formats, including email, social media posts, presentations, chats, IoT sensor data, and satellite imagery.
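A short Python sketch makes the schema-on-write vs. schema-on-read contrast concrete. It uses only the standard library; the payments table and the social media post are made-up examples.

```python
import sqlite3, json

# Schema-on-write (structured): the table shape is fixed before any data lands.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (card_no TEXT, amount REAL, city TEXT)")
con.execute("INSERT INTO payments VALUES (?, ?, ?)", ("4111-...", 25.0, "Manila"))
rows = con.execute("SELECT city, SUM(amount) FROM payments GROUP BY city").fetchall()

# Schema-on-read (unstructured/semi-structured): store the raw post as-is,
# and impose structure only at the moment it is used.
raw_post = '{"user": "ana", "text": "Loving the new phone!", "likes": 42}'
post = json.loads(raw_post)  # structure applied here, at read time
print(rows, post["likes"])
```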
DATA CLEANING - Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct. There is no one absolute way to prescribe the exact steps in the data cleaning process because the processes will vary from dataset to dataset. Data cleaning is the process that removes data that does not belong in your dataset. Data transformation is the process of converting data from one format or structure into another. Transformation processes can also be referred to as data wrangling, or data munging: transforming and mapping data from one "raw" data form into another format for warehousing and analyzing.

BENEFITS OF DATA CLEANING - Removal of errors when multiple sources of data are at play. Fewer errors make for happier clients and less-frustrated employees. Ability to map the different functions and what your data is intended to do. Monitoring errors and better reporting to see where errors are coming from, making it easier to fix incorrect or corrupt data for future applications. Using tools for data cleaning will make for more efficient business practices and quicker decision-making.
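To illustrate the transformation side, here is a brief pandas sketch; the raw values are invented, and the operations shown (standardizing dates, stripping currency symbols, reshaping wide data into tidy rows) are common examples of wrangling, not a prescribed recipe.

```python
import pandas as pd

# Hypothetical raw export with messy formatting.
raw = pd.DataFrame({
    "order_date": ["2023-01-05 ", " 2023-02-05", "2023-03-09"],
    "price": ["$1,299.00", "$649.50", "$99.99"],
})

# Standardize: trim stray whitespace, then parse into real datetimes.
raw["order_date"] = pd.to_datetime(raw["order_date"].str.strip())

# Standardize: strip currency symbols so the column becomes numeric.
raw["price"] = raw["price"].str.replace(r"[$,]", "", regex=True).astype(float)

# Wrangle: reshape wide monthly columns into tidy rows ("melting").
wide = pd.DataFrame({"store": ["A", "B"], "jan": [10, 7], "feb": [12, 9]})
tidy = wide.melt(id_vars="store", var_name="month", value_name="units")
print(raw, tidy, sep="\n")
```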
ONLINE DATA STORAGE - Online data storage is a virtual storage approach that allows users to use the Internet to store recorded data in a remote network. This data storage method may be either a cloud service component or used with other options not requiring on-site data backup. Online data storage uses Internet channels to store information on remote servers kept secure by service providers. It can cost a company a lot of money to store data on-site, and every day someone loses their entire family photo album. Online data storage is a virtual storage model that lets users and businesses upload their data across Internet channels to a remote data network. Data is stored in the cloud, or stored on servers that are not owned by the person using them. Other users can also access the same infrastructure. It's like a cloud in the sky: no single individual owns it, yet we can each see and access it. Unlike a USB drive, external hard drive, or flash drive, users do not need to carry around a physical device to store their data. They just have to remember a password and trust the security of the service provider. Online storage is a viable option for data backup: it provides both security and convenience to the end user. Small businesses and individuals may not have the network bandwidth or the resources to maintain a strong on-site storage and retrieval system. Further, online storage may alleviate the need to have physical backups of the data: the storage solution provider may already do this at their data centers. Advantages - Data storage saving / Worldwide accessibility / Data safety / Security / Easy sharing / Data recovery / Automatic backup.
Disadvantages - Improper handling can cause trouble / A trustworthy provider must be chosen to avoid any hazard / An Internet connection is required.

RELEVANCE OF ONLINE DATA PROCESSING - Ease of making reports: Since data is already processed, it can be obtained and used directly. These processed facts and figures can be arranged appropriately such that they help executives make quick analyses. Pre-defined reports help professionals in making reports speedily. - Accuracy and speed: Digitization helps to process information quickly. Thousands of files can be processed in a minute, storing the required information from each. During business data processing, the system itself checks for and takes care of invalid data or errors. Such processes thus help companies ensure high accuracy in information management. - Cost reduction: The cost of digitized processing is much less than that of managing and maintaining paper documents. It decreases expenditure on stationery such as photocopies and mailing by using digital information and email systems. Companies can thus save millions of dollars every year by improving their data management systems. - Easy storage: Online data processing helps to increase the storage space for adding, managing and modifying information. By eliminating unnecessary paperwork, it minimizes clutter and also improves search efficiency by eliminating the need to go through data manually.
CLOUD COMPUTING - Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user. Large clouds often have functions distributed over multiple locations, each location being a data center. Cloud computing relies on sharing of resources to achieve coherence and economies of scale, typically using a "pay-as-you-go" model which can help in reducing capital expenses but may also lead to unexpected operating expenses for unaware users. Simply put, cloud computing is the delivery of computing services—including servers, storage, databases, networking, software, analytics, and intelligence—over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale. You typically pay only for the cloud services you use, helping lower your operating costs, run your infrastructure more efficiently and scale as your business needs change.

BENEFITS OF CLOUD COMPUTING - 1. Cost: Cloud computing eliminates the capital expense of buying hardware and software and setting up and running on-site data centers—the racks of servers, the round-the-clock electricity for power and cooling, the IT experts for managing the infrastructure. It adds up fast. 2. Speed: Most cloud computing services are provided self-service and on demand, so even vast amounts of computing resources can be provisioned in minutes, typically with just a few mouse clicks, giving businesses a lot of flexibility and taking the pressure off capacity planning. 3. Global scale: The benefits of cloud computing services include the ability to scale elastically. In cloud speak, that means delivering the right amount of IT resources—for example, more or less computing power, storage, bandwidth—right when it is needed and from the right geographic location. 4. Productivity: On-site datacenters typically require a lot of "racking and stacking"—hardware setup, software patching, and other time-consuming IT management chores. Cloud computing removes the need for many of these tasks, so IT teams can spend time on achieving more important business goals. 5. Performance: The biggest cloud computing services run on a worldwide network of secure datacenters, which are regularly upgraded to the latest generation of fast and efficient computing hardware. This offers several benefits over a single corporate datacenter, including reduced network latency for applications and greater economies of scale. 6. Reliability: Cloud computing makes data backup, disaster recovery and business continuity easier and less expensive because data can be mirrored at multiple redundant sites on the cloud provider's network. 7. Security: Many cloud providers offer a broad set of policies, technologies and controls that strengthen your security posture overall, helping protect your data, apps and infrastructure from potential threats.
TYPES OF CLOUD COMPUTING - Public cloud: Public clouds are owned and operated by third-party cloud service providers, which deliver their computing resources like servers and storage over the Internet. Microsoft Azure is an example of a public cloud. With a public cloud, all hardware, software and other supporting infrastructure is owned and managed by the cloud provider. You access these services and manage your account using a web browser. - Private cloud: A private cloud refers to cloud computing resources used exclusively by a single business or organisation. A private cloud can be physically located in the company's on-site datacenter. Some companies also pay third-party service providers to host their private cloud. A private cloud is one in which the services and infrastructure are maintained on a private network. - Hybrid cloud: Hybrid clouds combine public and private clouds, bound together by technology that allows data and applications to be shared between them. By allowing data and applications to move between private and public clouds, a hybrid cloud gives your business greater flexibility, more deployment options and helps optimise your existing infrastructure, security and compliance.

USES OF CLOUD COMPUTING - Create cloud-native applications: Quickly build, deploy and scale applications—web, mobile and API. Take advantage of cloud-native technologies and approaches, such as containers, Kubernetes, microservices architecture, API-driven communication and DevOps. - Test and build applications: Reduce application development cost and time by using cloud infrastructures that can easily be scaled up or down. - Store, back up and recover data: Protect your data more cost-efficiently—and at massive scale—by transferring your data over the Internet to an offsite cloud storage system that is accessible from any location and any device. - Analyze data: Unify your data across teams, divisions and locations in the cloud. Then use cloud services, such as machine learning and artificial intelligence, to uncover insights for more informed decisions. - Stream audio and video: Connect with your audience anywhere, anytime, on any device with high-definition video and audio with global distribution. - Embed intelligence: Use intelligent models to help engage customers and provide valuable insights from the data captured. - Deliver software on demand: Also known as software as a service (SaaS), on-demand software lets you offer the latest software versions and updates to customers—anytime they need, anywhere they are.

What advantages do organizations have in adopting cloud storage and cloud computing? - Scalability: Cloud services allow organizations to easily scale their storage and computing resources up or down as needed. This means that they can quickly respond to changes in demand without having to invest in costly hardware or infrastructure. - Cost savings: Using cloud services can be more cost-effective than maintaining and upgrading in-house systems. Organizations can avoid the costs of hardware, software, and maintenance, and only pay for the resources they actually use. - Flexibility: Cloud services can be accessed from anywhere with an internet connection, making it easier for employees to work remotely or collaborate with partners and customers in different locations. - Security: Cloud providers often have extensive security measures in place to protect against data breaches, which can be more effective than what individual organizations can implement on their own. - Improved reliability: Cloud providers typically have redundant systems and backup mechanisms in place, which can provide greater reliability and ensure that data and services are always available. - Faster innovation: Cloud providers often offer the latest technologies and software, allowing organizations to innovate more quickly and stay ahead of the competition.

Cloud based services offered by Amazon - Amazon Web Services offers a wide range of global cloud-based products for different business purposes. The products include storage, databases, analytics, networking, mobile, development tools and enterprise applications, with a pay-as-you-go pricing model. - Compute Services: EC2 (Elastic Compute Cloud), LightSail, Elastic Beanstalk, EKS (Elastic Container Service for Kubernetes), AWS Lambda. - Migration: DMS (Database Migration Service), SMS (Server Migration Service), Snowball. - Storage: Amazon Glacier, Amazon Elastic Block Store (EBS), AWS Storage Gateway. - Security Services: IAM (Identity and Access Management), Inspector, Certificate Manager, WAF (Web Application Firewall), Cloud Directory, KMS (Key Management Service), Organizations, Shield, Macie, GuardDuty. - Database Services: Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon RedShift. - Analytics: Athena, CloudSearch, ElasticSearch, Kinesis, QuickSight, EMR (Elastic MapReduce), Data Pipeline. - Management Services: CloudWatch, CloudFormation, CloudTrail, OpsWorks, Config, Service Catalog, AWS Auto Scaling, Systems Manager, Managed Services. - Internet of Things: IoT Core, IoT Device Management, IoT Analytics, Amazon FreeRTOS. - Application Services: Step Functions, SWF (Simple Workflow Service), SNS (Simple Notification Service), SQS (Simple Queue Service), Elastic Transcoder, Deployment and Management, AWS CloudTrail, Amazon CloudWatch, AWS CloudHSM. - Developer Tools: CodeStar, CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Cloud9. - Mobile Services: Mobile Hub, Cognito, Device Farm, AWS AppSync.

Cloud based services offered by Google - Compute: App Engine, Compute Engine, Google Cloud VMware Engine (GCVE). - Storage: Cloud Storage, Persistent Disk, Filestore, Cloud Storage for Firebase. - Databases: Cloud Bigtable, Datastore, Firestore, Memorystore, Cloud Spanner, Cloud SQL. - Networking: Cloud CDN, Cloud DNS, Cloud IDS (Cloud Intrusion Detection System), Cloud Interconnect, Cloud Load Balancing, Cloud NAT (Network Address Translation), Cloud Router, Cloud VPN, Google Cloud Armor, Google Cloud Armor Managed Protection Plus, Network Connectivity Center, Network Intelligence Center, Network Service Tiers, Service Directory, Traffic Director, Virtual Private Cloud. - Operations: Cloud Debugger, Cloud Logging, Cloud Monitoring, Cloud Profiler, Cloud Trace. - Developer Tools: Artifact Registry, Container Registry, Cloud Build, Cloud Source Repositories, Firebase Test Lab, Google Cloud Deploy. - Data Analytics: BigQuery, Cloud Composer, Cloud Data Fusion, Cloud Life Sciences (formerly Google Genomics), Data Catalog, Dataplex, Dataflow, Datalab, Dataproc, Dataproc Metastore, Datastream, Pub/Sub. - User Protection Services: reCAPTCHA Enterprise, Web Risk API. - Serverless Computing: Cloud Run, Cloud Functions, Cloud Functions for Firebase, Cloud Scheduler, Cloud Tasks, Eventarc, Workflows.

Cloud based services offered by IBM - Compute Infrastructure — includes its bare metal servers (single-tenant servers that are highly customizable), virtual servers, GPU computing, POWER servers (based on IBM's POWER architecture) and server software. - Compute Services — includes OpenWhisk serverless computing, containers and Cloud Foundry runtimes. - Storage — includes object, block and file storage, as well as server-backup capabilities. - Network — includes load balancing, Direct Link private secure connections, network appliances, content delivery network and domain services. - Mobile — includes IBM's Swift tools for creating iOS apps, its MobileFirst Starter package for getting a mobile app up and running, and its Mobile Foundation app back-end services. - Watson — includes IBM's artificial intelligence and machine learning services, which it calls "cognitive computing," such as Discovery search and content analytics, Conversation natural language services and speech-to-text. - Data and analytics — includes data services, analytics services, big data hosting, Cloudera hosting, MongoDB hosting and Riak hosting. - Internet of Things — includes IBM's IoT platform and its IoT starter packages. - Security — includes tools for securing cloud environments, such as a firewall, hardware security modules (physical devices with key management capabilities), Intel Trusted Execution Technology, security software and SSL certificates. - DevOps — includes the Eclipse IDE, continuous delivery tools and availability monitoring. - Application services — includes Blockchain, Message Hub and business rules, among others.
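As a small illustration of how such services are consumed programmatically, the sketch below uses boto3, the AWS SDK for Python, to store and retrieve a file in S3. It assumes AWS credentials are already configured and that the bucket name shown (a made-up name) exists.

```python
import boto3  # AWS SDK for Python

# Store and retrieve a file in S3 (bucket name is hypothetical).
s3 = boto3.client("s3")
s3.upload_file("report.csv", "example-notes-bucket", "reports/report.csv")
s3.download_file("example-notes-bucket", "reports/report.csv", "report_copy.csv")

# List what the bucket currently holds.
for obj in s3.list_objects_v2(Bucket="example-notes-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```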
E-COMMERCE APPLICATIONS - Retail and Wholesale: Ecommerce has numerous applications in this sector. E-retailing is basically a B2C, and in some cases, a B2B sale of goods and services through online stores designed using virtual shopping carts and electronic catalogs. A subset of retail ecommerce is m-commerce, or mobile commerce, wherein a consumer purchases goods and services using their mobile device through the mobile-optimized site of the retailer. - Online Marketing: This refers to the gathering of data about consumer behaviors, preferences, needs, buying patterns and so on. It helps marketing activities like fixing prices, negotiating, enhancing product features, and building strong customer relationships, as this data can be leveraged to provide customers a tailored and enhanced purchase experience. - Finance: Banks and other financial institutions are using e-commerce to a significant extent. Customers can check account balances, transfer money to other accounts held by them or others, pay bills through internet banking, pay insurance premiums, and so on. - Manufacturing: Supply chain operations also use ecommerce; usually, a few companies form a group and create an electronic exchange to facilitate the purchase and sale of goods, the exchange of market information, back-office information like inventory control, and so on. - Online Booking: This is something almost every one of us has done at some time: booking hotels, holidays, airline tickets, travel insurance, etc. These bookings and reservations are made possible through an internet booking engine or IBE. - Online Publishing: This refers to the digital publication of books, magazines, catalogues, and developing digital libraries. - Digital Advertising: Online advertising uses the internet to deliver promotional material to consumers; it involves a publisher and an advertiser. The advertiser provides the ads, and the publisher integrates ads into online content. - Auctions: Online auctions bring together numerous people from various geographical locations and enable trading of items at negotiated prices, implemented with e-commerce technologies. They enable more people to participate in auctions.
BIG DATA - Big Data is an umbrella term for a collection of datasets so large and complex that it becomes difficult to process them using traditional data management tools. There has been increasing democratization of the process of content creation and sharing over the Internet using social media applications. The combination of cloud-based storage, social media applications, and mobile access devices is helping crystallize the big data phenomenon. The leading management consulting firm McKinsey & Co. created a flutter when it published a report in 2011 showing a huge impact of such big data on business and other organizations. They also reported that there will be millions of new jobs in the next decade related to the use of big data in many industries. Big data can be used to discover new insights from a 360-degree view of a situation that can allow for a completely new perspective on situations, new models of reality, and potentially new types of solutions. It can help spot business trends and opportunities. For example, Google is able to predict the spread of a disease by tracking the use of search terms related to the symptoms of the disease over the globe in real time. Big data can help determine the quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions. Big data is enabling evidence-based medicine, and many other innovations. Data has become the new natural resource. Organizations have a choice in how to engage with this exponentially growing volume, variety and velocity of data. They can choose to be buried under the avalanche or they can choose to use it for competitive advantage. Challenges in big data include the entire range of operations from capture, curation, storage, search, and sharing to analysis and visualization. Big data is more valuable when analyzed as a whole: more and more information is derivable from analysis of a single large set of related data, as compared to separate smaller sets. However, special tools and skills are needed to manage such extremely large datasets.

CHARACTERISTICS OF BIG DATA - Variety: There are many types of data, including structured and unstructured data. Structured data consists of numeric and text fields. Unstructured data includes images, video, audio, and many other types. There are also many sources of data. The traditional sources of structured data include data from ERP systems and other operational systems. Sources for unstructured data include social media, web, RFID, machine data, and others. Unstructured data comes in a variety of sizes and resolutions, and is subject to different kinds of analysis. For example, video files can be tagged with labels and they can be played, but video data is typically not computed, and the same holds for audio data. Graphic data can be analyzed for network distances. Facebook texts and tweets can be analyzed for sentiments but cannot be directly compared. - Velocity: The Internet greatly increases the speed of movement of data; from e-mails to social media to video files, data can move quickly. Cloud-based storage makes sharing instantaneous and easily accessible from anywhere. Social media applications enable people to share their data with each other instantly. Mobile access to these applications also speeds up the generation of and access to data. - Volume: Websites have become great sources and repositories for many kinds of data. User clickstreams are recorded and stored for future use. Social media applications such as Facebook, Twitter, Pinterest, and other applications have enabled users to become prosumers (producers and consumers) of data. There is an increase in the number of data shares and also in the size of each data element. High-definition videos can increase the total shared data. There are autonomous data streams of video, audio, text, and so on coming from social media sites, websites, RFID applications, and so on.

IMPORTANCE OR RELEVANCE OF BIG DATA - 1. Cost Savings: Big Data tools like Apache Hadoop, Spark, etc. bring cost-saving benefits to businesses when they have to store large amounts of data. These tools help organizations in identifying more effective ways of doing business. 2. Time-Saving: Real-time in-memory analytics helps companies to collect data from various sources. Tools like Hadoop help them to analyze data immediately, thus helping in making quick decisions based on the learnings. 3. Understand the market conditions: Big Data analysis helps businesses to get a better understanding of market situations. For example, analysis of customer purchasing behavior helps companies to identify the products sold most and produce those products accordingly. This helps companies to get ahead of their competitors. 4. Social Media Listening: Companies can perform sentiment analysis using Big Data tools. These enable them to get feedback about their company, that is, who is saying what about the company. 5. Boost Customer Acquisition and Retention: Customers are a vital asset on which any business depends. No single business can achieve its success without building a robust customer base. But even with a solid customer base, companies can't ignore the competition in the market. 6. Solve Advertisers' Problems and Offer Marketing Insights: Big data analytics shapes all business operations. It enables companies to fulfill customer expectations. Big data analytics helps in changing the company's product line. It ensures powerful marketing campaigns. 7. Driver of Innovations and Product Development: Big data makes companies capable of innovating and redeveloping their products.
BUSINESS IMPLICATIONS OF BIG DATA - Any industry that produces information-based products is most likely to be disrupted. Thus, the newspaper industry has taken a hit from digital distribution channels, as well as from published-on-web-only blogs. Entertainment has also been impacted by digital distribution and piracy, as well as by user-generated and uploaded content on the internet. The education industry is being disrupted by massive open online courses (MOOCs) and user-uploaded content. Healthcare delivery is impacted by electronic health records and digital medicine. The retail industry has been highly disrupted by e-commerce companies. Fashion companies are impacted by quick feedback on their designs on social media. The banking industry has been impacted by cost-effective online self-serve banking applications, and this will impact employment levels in the industry. There is a rapid change in business models enabled by big data technologies. Steve Jobs, the ex-CEO of Apple, conceded that his company's products and business models would be disrupted. He preferred his older products to be cannibalized by his own new products rather than by those of the competition. Every other business too will likely be disrupted. The key issue for business is how to harness big data to generate growth opportunities and to leapfrog the competition. Organizations need to learn how to organize their businesses so that they do not get buried in the high volume, velocity, and variety of data, but instead use it smartly and proactively to obtain a quick but decisive advantage over their competition. Organizations need to figure out how to use big data as a strategic asset in real time, to identify opportunities, thwart threats, build new capabilities, and enhance operational efficiencies.

TECHNOLOGY IMPLICATIONS OF BIG DATA - The growth of data is made possible in part by the advancement of storage technology. The attached graph shows the growth of disk drive average capacities. The cost of storage is falling, the size of storage devices is getting smaller, and the speed of access is going up. Flash drives have become cheaper. Random access memory used to be expensive but is now so inexpensive that entire databases can be loaded and processed quickly, instead of swapping sections of them into and out of high-speed memory. New data management and processing technologies have emerged. IT professionals must integrate big data assets with existing structured content and must increase their business requirement identification skills. Big data is going democratic. Business functions will be protective of their data and will begin initiatives around exploiting it. IT support teams need to find ways to support end-user-deployed big data solutions. Enterprise data warehouses will need to include big data in some form. The IT platform needs to be strengthened to help enable a 'digital business strategy' around digital assets and capabilities.

APPLICATIONS OF BIG DATA - 1. Tracking Customer Spending Habits and Shopping Behavior: In big retail stores (like Amazon, Walmart, Big Bazar, etc.) the management team has to keep data on customers' spending habits (which products they spend on, which brands they prefer, how frequently they spend), shopping behavior, and most-liked products (so that they can keep those products in the store). Based on which products are being searched for or sold most, the production/collection rate of those products is fixed. 2. Recommendation: By tracking customer spending habits and shopping behavior, big retail stores provide recommendations to the customer. E-commerce sites like Amazon, Walmart, and Flipkart do product recommendation: they track what products a customer is searching for and, based on that data, recommend that type of product to that customer. 3. Smart Traffic System: Data about the traffic conditions of different roads is collected through cameras kept beside the road and at the entry and exit points of the city, and from GPS devices placed in vehicles (Ola, Uber cabs, etc.). All such data are analyzed, and jam-free or less congested, less time-taking routes are recommended. In such a way a smart traffic system can be built in the city by big data analysis. A further benefit is that fuel consumption can be reduced. 4. Secure Air Traffic System: At various places on an aircraft (like the propellers), sensors are present. These sensors capture data like the speed of the flight, moisture, temperature, and other environmental conditions. Based on such data analysis, environmental parameters within the flight are set up and varied. By analyzing a flight's machine-generated data, it can also be estimated how long the machine can operate flawlessly and when it should be replaced or repaired. 5. Auto-Driving Car: Big data analysis helps drive a car without human intervention. At various spots on the car, cameras and sensors are placed that gather data like the size of surrounding cars, obstacles, the distance from them, etc. 6. Virtual Personal Assistant Tools: Big data analysis helps virtual personal assistant tools (like Siri on Apple devices, Cortana on Windows, Google Assistant on Android) to provide answers to the various questions asked by users. These tools track the location of the user, their local time, the season, other data related to the question asked, etc. 7. IoT: Manufacturing companies install IoT sensors in machines to collect operational data. By analyzing such data, it can be predicted how long a machine will work without any problem and when it will require repair, so that the company can take action before the machine faces serious issues or goes totally down. Thus, the cost of replacing the whole machine can be saved. 8. Education Sector: Organizations conducting online educational courses utilize big data to find candidates interested in a course. If someone searches for a YouTube tutorial video on a subject, then online or offline course providers on that subject send online ads to that person about their course. 9. Energy Sector: Smart electric meters read consumed power every 15 minutes and send this data to the server, where it is analyzed, and it can be estimated at what time of day the power load is lowest throughout the city. 10. Media and Entertainment Sector: Media and entertainment service providers like Netflix, Amazon Prime, and Spotify analyze data collected from their users. Data like what types of video or music users watch or listen to most and how long users spend on the site are collected and analyzed to set the next business strategy.
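Application 1 above amounts to simple aggregation once the transactions are in a table. A toy pandas sketch (all values invented):

```python
import pandas as pd

# Hypothetical transaction log from a retail store.
tx = pd.DataFrame({
    "customer": ["ana", "ana", "ben", "ben", "ben"],
    "brand":    ["X", "X", "Y", "X", "Y"],
    "amount":   [120.0, 80.0, 45.0, 60.0, 150.0],
})

# Spending habit: how much each customer spends, and on which brand.
per_brand = tx.groupby(["customer", "brand"])["amount"].sum()

# Most-liked brand per customer drives stocking and recommendations.
favorite = per_brand.groupby(level="customer").idxmax()
print(per_brand, favorite, sep="\n")
```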
DATA SEARCH ALGORITHMS IN SEARCH ENGINES - A search engine algorithm is a complex algorithm used by search engines such as Google, Yahoo, and Bing to determine a web page's significance. According to Netcraft, an Internet research company, there are over 150,000,000 active websites on the Internet. Without search engines, there would be no way to determine which of these sites are worthy of viewers' time, and which sites are simply spam. Search engines collect significant data, which allows them to almost instantly determine whether a site is spam or relevant. Relevant sites receive high rankings in search engines, and spam or irrelevant sites can receive exceptionally low rankings. Each search engine uses a search engine algorithm, and no two search engines use exactly the same formula to determine a page's ranking. However, there are several things that all search engines will look for when crawling a web page. A few data search algorithms: - Linear Search: In linear search, also known as sequential search, the algorithm sequentially checks each element in the list until the desired element is found. - Binary Search: Binary search is an efficient algorithm for finding an item in a sorted list. It works by repeatedly dividing the list in half and eliminating the half that cannot contain the item. - Hashing: Hashing is a technique that maps data of arbitrary size to a fixed-size value. This can be useful for indexing and searching data quickly. - Depth-First Search (DFS): DFS is an algorithm for traversing or searching a tree or graph data structure. It starts at the root node and explores as far as possible along each branch before backtracking. - Breadth-First Search (BFS): BFS is another algorithm for traversing or searching a tree or graph data structure. It starts at the root node and explores all the neighbor nodes at the current depth before moving on to the next depth level. - A* Search: A* search is an informed search algorithm that uses heuristics to guide the search. It is often used in pathfinding and other optimization problems.
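The first two algorithms are short enough to show in full. A plain-Python sketch of linear and binary search:

```python
def linear_search(items, target):
    """Sequentially check each element until the target is found."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """Repeatedly halve a sorted list, discarding the half that
    cannot contain the target."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

ranks = [3, 9, 14, 27, 31, 44]
print(linear_search(ranks, 27), binary_search(ranks, 27))  # 3 3
```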
MACHINE LEARNING - Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks. A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers; but not all machine learning is statistical learning. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning. Some implementations of machine learning use data and neural networks in a way that mimics the working of a biological brain. In its application across business problems, machine learning is also referred to as predictive analytics. Machine learning programs can perform tasks without being explicitly programmed to do so. It involves computers learning from data provided so that they carry out certain tasks. For simple tasks assigned to computers, it is possible to program algorithms telling the machine how to execute all steps required to solve the problem at hand; on the computer's part, no learning is needed. For more advanced tasks, it can be challenging for a human to manually create the needed algorithms. In practice, it can turn out to be more effective to help the machine develop its own algorithm, rather than having human programmers specify every needed step.

Machine intelligence applications in business and data analytics - Predictive analytics: Machine learning algorithms can analyze large volumes of data to identify patterns and make predictions about future trends or events. This helps businesses make informed decisions, such as predicting customer behavior, identifying market trends, or forecasting sales. - Fraud detection: Machine learning algorithms can analyze transactions, behavior patterns, and other data to detect potential fraud. This helps businesses prevent losses and protect themselves against financial crimes. - Customer service: Chatbots and virtual assistants powered by natural language processing and machine learning can provide automated customer support. This helps businesses improve customer satisfaction and reduce costs associated with customer service. - Supply chain optimization: Machine learning can be used to optimize supply chain operations, such as predicting demand, improving inventory management, and optimizing logistics. This helps businesses reduce costs and improve efficiency. - Marketing automation: Machine learning can be used to personalize marketing campaigns based on customer behavior and preferences. This helps businesses improve the effectiveness of their marketing efforts and increase customer engagement. - Sentiment analysis: Machine learning can analyze social media and other sources of customer feedback to identify trends and sentiment. This helps businesses improve their products and services based on customer feedback.
TYPES OF LEARNING ALGORITHMS - Supervised learning: Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs. The data is known as training data, and consists of a set of training examples. Each training example has one or more inputs and the desired output, also known as a supervisory signal. In the mathematical model, each training example is represented by an array or vector, sometimes called a feature vector, and the training data is represented by a matrix. Through iterative optimization of an objective function, supervised learning algorithms learn a function that can be used to predict the output associated with new inputs. - Unsupervised learning: These algorithms take a set of data that contains only inputs, and find structure in the data, like grouping or clustering of data points. The algorithms therefore learn from test data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning algorithms identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data. - Semi-supervised learning: This falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Some of the training examples are missing training labels, yet many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy.

- A support vector machine (SVM) can also perform non-linear classification by taking advantage of the kernel trick to implicitly map input variables to a higher-dimensional space. - A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. - A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.
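The supervised/unsupervised split is easy to see with scikit-learn. In this hedged sketch, the dataset is synthetic, logistic regression stands in for any supervised learner, and k-means for any unsupervised one:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Labeled training data: inputs X with desired outputs y.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Supervised learning: fit on (input, output) pairs, then predict
# outputs for new inputs.
clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:3]))

# Unsupervised learning: only inputs; the algorithm finds structure
# (here, grouping the points into two clusters).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])
```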
RECOMMENDER SYSTEMS - A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. Recommender systems are used in a variety of areas, with commonly recognised examples taking the form of playlist generators for video and music services, product recommenders for online stores, content recommenders for social media platforms, and open web content recommenders. These systems can operate using a single input, like music, or multiple inputs within and across platforms like news, books, and search queries. There are also popular recommender systems for specific topics like restaurants and online dating. Recommender systems have also been developed to explore research articles and experts, collaborators, and financial services. Recommender systems usually make use of either or both collaborative filtering and content-based filtering (also known as the personality-based approach), as well as other systems such as knowledge-based systems. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties.
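A toy user-based collaborative filtering sketch with NumPy: the ratings are invented, similarity is plain cosine similarity, and the prediction rule is a simple similarity-weighted average (real systems are considerably more sophisticated):

```python
import numpy as np

# Rows = users, columns = items; entries are ratings (0 = unrated).
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

# User-based collaborative filtering: cosine similarity between users.
unit = R / np.linalg.norm(R, axis=1, keepdims=True)
sim = unit @ unit.T

# Predict user 0's scores as a similarity-weighted average of all users.
scores = sim[0] @ R / sim[0].sum()

# Recommend the unrated item with the highest predicted score.
unrated = np.where(R[0] == 0)[0]
print(unrated[np.argmax(scores[unrated])])  # best item for user 0
```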
TYPES OF LEARNING ALGORITHMS - Supervised learning: Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs. The data is known as training data, and consists of a set of training examples. Each training example has one or more inputs and the desired output, also known as a supervisory signal. In the mathematical model, each training example is represented by an array or vector, sometimes called a feature vector, and the training data is represented by a matrix. Through iterative optimization of an objective function, supervised learning algorithms learn a function that can be used to predict the output associated with new inputs. - Unsupervised learning: Unsupervised learning algorithms take a set of data that contains only inputs, and find structure in the data, like grouping or clustering of data points. The algorithms therefore learn from test data that has not been labeled, classified, or categorized. Instead of responding to feedback, unsupervised learning algorithms identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data. - Semi-supervised learning: This falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Some of the training examples are missing training labels, yet many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy. - Reinforcement learning: This is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In machine learning, the environment is typically represented as a Markov decision process (MDP). The contrast between the first two settings is illustrated in the sketch after this list.
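A minimal sketch of the supervised/unsupervised contrast (not from the source; it assumes scikit-learn is available, and the heights, weights, and labels are invented toy data):

from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Toy inputs: [height_cm, weight_kg] for six individuals (invented).
X = [[150, 50], [160, 58], [155, 53], [180, 80], [175, 76], [185, 85]]

# Supervised: each example carries a desired output (the supervisory
# signal); here 0 and 1 are two arbitrary class labels.
y = [0, 0, 0, 1, 1, 1]
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[170, 70]]))   # predicted label for a new input

# Unsupervised: only inputs are given; the algorithm finds structure
# on its own by clustering the points into two groups.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                 # cluster assignment for each point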
- Training models: Typically, machine learning models require a high quantity of reliable data in order for the models to make accurate predictions. When training a machine learning model, machine learning engineers need to target and collect a large and representative sample of data. Data from the training set can be as varied as a corpus of text, a collection of images, sensor data, or data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model. Models trained on biased or unevaluated data can produce skewed or undesired predictions, and such biased models may lead to detrimental outcomes, compounding negative impacts on society or on the business's objectives.

MACHINE LEARNING MODELS - Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs (see the sketch after this subsection). The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds; the weight increases or decreases the strength of the signal at a connection. - Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.
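The "non-linear function of the sum of its inputs" can be shown in a few lines. This is a sketch of a single artificial neuron, not from the source; the input values, weights, and choice of sigmoid activation are invented for illustration (NumPy assumed):

import numpy as np

def sigmoid(z):
    # A common non-linear activation function.
    return 1.0 / (1.0 + np.exp(-z))

# One artificial neuron: three input signals arriving over weighted edges.
inputs  = np.array([0.5, 0.8, 0.2])
weights = np.array([0.4, -0.6, 0.9])  # edge weights, adjusted as learning proceeds
bias    = 0.1

# Output = non-linear function of the weighted sum of the inputs.
print(sigmoid(np.dot(inputs, weights) + bias))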
- Support-vector machines (SVMs), also known as support-vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. - Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization methods to mitigate overfitting and bias, as in ridge regression (both are illustrated in the sketch after this list). When dealing with non-linear problems, go-to models include polynomial regression (used, for example, for trendline fitting in Microsoft Excel), logistic regression (often used in statistical classification), and kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to a higher-dimensional space. - A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms: given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. - A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.
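Here is a minimal sketch of ordinary least squares and its ridge-regularized variant (not from the source; the spend and revenue numbers are invented, the intercept is penalized only for brevity, and NumPy is assumed):

import numpy as np

# Invented data: advertising spend (x) versus revenue (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize ||y - Xb||^2.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge regression: add a penalty lam * ||b||^2 to mitigate overfitting,
# solved in closed form as (X'X + lam*I)^(-1) X'y.
lam = 0.1
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print("OLS coefficients:  ", b_ols)    # [intercept, slope]
print("Ridge coefficients:", b_ridge)  # shrunk slightly toward zero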
AREAS OF APPLICATION - Artificial Intelligence (AI) and Machine Learning (ML): AI and ML can be applied in numerous areas such as finance, healthcare, transportation, agriculture, e-commerce, customer service, cybersecurity, and education. For instance, AI can help diagnose diseases, detect fraud, and personalize content. - Internet of Things (IoT): IoT can be applied in areas such as smart homes, smart cities, healthcare, logistics, and agriculture. For example, IoT can help track inventory, monitor patient health, and optimize energy consumption. - Blockchain: Blockchain can be applied in areas such as finance, supply chain management, voting systems, and identity verification. For example, blockchain can help improve supply chain transparency and traceability, enhance security in financial transactions, and verify identity without the need for intermediaries. - Virtual and Augmented Reality (VR/AR): VR and AR can be applied in areas such as entertainment, education, healthcare, and marketing. For example, VR can help simulate training environments, while AR can provide interactive product demonstrations. - Robotics: Robotics can be applied in areas such as manufacturing, logistics, healthcare, and entertainment. For instance, robotics can help automate repetitive tasks, provide assistance to people with disabilities, and perform surgeries.
RECOMMENDER SYSTEMS - A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. Recommender systems are used in a variety of areas, with commonly recognised examples taking the form of playlist generators for video and music services, product recommenders for online stores, content recommenders for social media platforms, and open web content recommenders. These systems can operate using a single input, like music, or multiple inputs within and across platforms like news, books, and search queries. There are also popular recommender systems for specific topics like restaurants and online dating. Recommender systems have also been developed to explore research articles and experts, collaborators, and financial services. Recommender systems usually make use of either or both collaborative filtering and content-based filtering (also known as the personality-based approach), as well as other systems such as knowledge-based systems. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users; this model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties.
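A minimal sketch of user-based collaborative filtering (not from the source; the rating matrix is invented, 0 stands for "unrated", and NumPy is assumed). It predicts a missing rating by weighting other users' ratings by their similarity to the target user:

import numpy as np

# Invented user-item rating matrix (rows: users, columns: items).
R = np.array([
    [5, 4, 0, 1],   # user 0 has not rated item 2
    [4, 5, 1, 0],
    [1, 0, 5, 4],
])

def cosine(u, v):
    # Similarity between two users' rating vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target, item = 0, 2
others = [u for u in range(len(R)) if u != target]
sims = np.array([cosine(R[target], R[u]) for u in others])
ratings = np.array([R[u, item] for u in others])

# Similarity-weighted average of the other users' ratings for the item.
predicted = np.dot(sims, ratings) / sims.sum()
print(f"Predicted rating of user {target} for item {item}: {predicted:.2f}")

Content-based filtering would instead compare the item's own tagged characteristics against those of items the user already liked.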
CUSTOMER ANALYTICS - Customer analytics is a process by which data from customer behavior is used to help make key business decisions via market segmentation and predictive analytics. This information is used by businesses for direct marketing, site selection, and customer relationship management. Marketing provides services in order to satisfy customers. With that in mind, the productive system is considered from its beginning at the production level to the end of the cycle at the consumer, and customer analytics plays an important role in the prediction of customer behavior.

OPERATIONAL ANALYTICS - Operational analytics is a type of analytics that informs day-to-day decisions with the goal of improving the efficiency and effectiveness of an organization's operations.

FRAUD ANALYTICS - Fraud analytics is the use of big data analysis techniques to prevent online financial fraud. It can help financial organizations predict future fraudulent behavior and apply fast detection and mitigation of fraudulent activity in real time. The challenge of financial fraud - Banks and other financial institutions have a responsibility to their customers to secure their data and finances against fraud or outright theft. This has become a complex task due, at least in part, to customers being able to access their accounts via multiple channels: a mobile banking app, an online banking portal, calling into the call center, or even visiting the bank in person. A teller can verify a customer's identity with reasonable confidence. But how do you verify that the person logging into a bank account online is actually that person and not a fraudster logging in with stolen credentials? The number of stolen credentials available to fraudsters is staggering: there are over 15 billion stolen credentials for sale on the dark web, and cybercriminals can purchase them for as little as an average of $15.43 for consumer credentials, up to more than an average of $3,139 for credentials to an organization's key systems. Fraud analytics is key to financial fraud risk management - The bad news is that online fraud is constantly evolving; as banks put remediation measures in place, new threats appear, and traditional, static rules-based fraud prevention systems can't keep pace. The good news is that there is a wealth of data available to financial organizations that can be used to predict and detect financial fraud and adapt to new threats. Collecting a username and password at login is no longer sufficient to guard against fraudulent activity. When someone accesses, or attempts to access, an account, there is other data that can be used to determine whether this is a legitimate customer and whether the requested transaction is legitimate (the sketch after this section illustrates the idea).
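As an illustration of scoring a login with data beyond the password, here is a deliberately simplified rule-based sketch (not from the source; every field name, weight, and threshold is invented, and real fraud-analytics systems typically learn such signals from data rather than hard-coding them):

def login_risk_score(login, profile):
    # Each signal that deviates from the customer's usual behavior
    # adds to the fraud-risk score.
    score = 0
    if login["country"] != profile["usual_country"]:
        score += 40                       # unfamiliar geography
    if login["device_id"] not in profile["known_devices"]:
        score += 30                       # never-seen device
    if not 6 <= login["hour"] <= 22:
        score += 10                       # unusual time of day
    if login["failed_attempts"] >= 3:
        score += 20                       # repeated password failures
    return score

profile = {"usual_country": "PH", "known_devices": {"dev-1", "dev-2"}}
login = {"country": "XX", "device_id": "dev-9", "hour": 3, "failed_attempts": 4}

score = login_risk_score(login, profile)
print(score, "-> require step-up authentication" if score >= 60 else "-> allow")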
DATA GATHERING (OR DATA COLLECTION) - Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. Data collection is a research component in all study fields, including the physical and social sciences, humanities, and business. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed. Data collection and validation consist of four steps when they involve taking a census and seven steps when they involve sampling. A formal data collection process is necessary as it ensures that the data gathered are both defined and accurate; this way, subsequent decisions based on arguments embodied in the findings are made using valid data. The process provides both a baseline from which to measure and, in certain cases, an indication of what to improve. There are 5 common data collection methods: 1. Closed-ended surveys and quizzes, 2. Open-ended surveys and questionnaires, 3. 1-on-1 interviews, 4. Focus groups, and 5. Direct observation.

DATA STORAGE AND KNOWLEDGE MANAGEMENT - Data storage and knowledge management are two closely related areas that are essential for organizations to manage and leverage data effectively. Data storage involves the physical storage and retrieval of data in a way that is secure, reliable, and efficient. There are various types of data storage technologies available, including on-premises storage, cloud storage, and hybrid storage; organizations need to choose the storage technology that meets their needs in terms of capacity, accessibility, security, and cost. Knowledge management involves the creation, storage, and retrieval of knowledge and information that is relevant to an organization. This includes tacit knowledge, such as the expertise and experience of employees, as well as explicit knowledge, such as documents, procedures, and policies. Knowledge management systems use technologies such as databases, search engines, and collaboration tools to manage and share knowledge effectively within an organization.

BUSINESS INTELLIGENCE - Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics. BI technologies can handle large amounts of structured and sometimes unstructured data to help identify, develop, and otherwise create new strategic business opportunities; they aim to allow for the easy interpretation of these big data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability. Business intelligence can be used by enterprises to support a wide range of business decisions, ranging from operational to strategic: basic operating decisions include product positioning or pricing, while strategic business decisions involve priorities, goals, and directions at the broadest level. Elements of business intelligence - Multidimensional aggregation and allocation (see the sketch after this section). Denormalization, tagging, and standardization. Real-time reporting with analytical alerting. A method of interfacing with unstructured data sources. Group consolidation, budgeting, and rolling forecasts. Statistical inference and probabilistic simulation. Key performance indicator optimization. Version control and process management. Open item management. Importance - Identify ways to increase profit. Analyze customer behavior. Compare data with competitors. Track performance. Optimize operations. Predict success. Spot market trends. Discover issues or problems.
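To show what "multidimensional aggregation" looks like in practice, here is a small sketch (not from the source; it assumes the pandas library, and the regions, quarters, and revenue figures are invented):

import pandas as pd

# Invented transaction records.
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [100, 120, 80, 90, 140],
})

# Roll revenue up along two dimensions (region x quarter).
cube = sales.pivot_table(index="region", columns="quarter",
                         values="revenue", aggfunc="sum")
print(cube)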
DATA ANALYSIS - Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively. Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA): EDA focuses on discovering new features in the data, while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis. Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination.

INTRODUCTION TO R PROGRAMMING - R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners and statisticians for data analysis and developing statistical software, and users have created packages to augment the functions of the R language. The official R software environment is an open-source free software environment within the GNU package, available under the GNU General Public License. It is written primarily in C, Fortran, and R itself (partially self-hosting), and precompiled executables are provided for various operating systems. R has a command-line interface, and multiple third-party graphical user interfaces are also available, such as RStudio, an integrated development environment, and Jupyter, a notebook interface. Features - Statistics: R and its libraries implement various statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, spatial and time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and its community is noted for contributing packages. Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time, and advanced users can write C, C++, Java, .NET, or Python code to manipulate R objects directly. R is highly extensible through the use of packages for specific functions and specific applications. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages, and extending it is facilitated by its lexical scoping rules. - Programming: R is an interpreted language; users can access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses enter, the computer replies with 4. Like languages such as APL and MATLAB, R supports matrix arithmetic. R's data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database), and lists; arrays are stored in column-major order. R's extensible object system includes objects for (among others) regression models, time series, and geospatial coordinates. R has no scalar data type: a scalar is represented as a length-one vector. Many features of R derive from Scheme. R uses S-expressions to represent both data and code. Functions are first-class objects and can be manipulated in the same way as data objects, facilitating metaprogramming that allows multiple dispatch. Variables in R are lexically scoped and dynamically typed. Function arguments are passed by value and are lazy—that is to say, they are only evaluated when they are used, not when the function is called.

INTRODUCTION TO PYTHON - Python is an interpreted, high-level, general-purpose programming language. Its design philosophy emphasizes code readability with its use of significant indentation, and its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented, and functional programming, and it is often described as a "batteries included" language due to its comprehensive standard library. Object-oriented programming and structured programming are fully supported, and many of its features support functional programming and aspect-oriented programming (including by metaprogramming and metaobjects, i.e. magic methods). Many other paradigms are supported via extensions, including design by contract and logic programming. Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It also features dynamic name resolution (late binding), which binds method and variable names during program execution.
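A short sketch of the dynamic typing, multi-paradigm, and late-binding points above (illustrative only; the variable names and values are invented):

# Dynamic typing: a name can be rebound to objects of different types,
# and types are checked at run time rather than declared up front.
x = 4          # x refers to an int
x = "four"     # now x refers to a str

# Functional style alongside procedural code: functions are values.
squares = list(map(lambda n: n * n, range(5)))
print(squares)                 # [0, 1, 4, 9, 16]

# Late binding: the attribute is looked up by name during execution.
print(getattr(x, "upper")())   # "FOUR"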
INTRODUCTION TO SPSS - SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009; current versions (post-2015) carry the brand name IBM SPSS Statistics. SPSS is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners, and others. The original SPSS manual (Nie, Bent & Hull, 1970) has been described as one of "sociology's most influential books" for allowing ordinary researchers to do their own statistical analysis. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the datafile) are features of the base software. The many features of SPSS Statistics are accessible via pull-down menus or can be programmed with a proprietary 4GL command syntax language. Command syntax programming has the benefits of reproducible output, simplifying repetitive tasks, and handling complex data manipulations and analyses; additionally, some complex applications can only be programmed in syntax and are not accessible through the menu structure. The pull-down menu interface also generates command syntax: this can be displayed in the output, although the default settings have to be changed to make the syntax visible to the user, and it can also be pasted into a syntax file using the "paste" button present in each menu. Programs can be run interactively or unattended, using the supplied Production Job Facility.

INTRODUCTION TO AMOS - IBM SPSS Amos is a powerful structural equation modeling (SEM) software package that helps support your research and theories by extending standard multivariate analysis methods, including regression, factor analysis, correlation, and analysis of variance. It lets you build attitudinal and behavioral models that reflect complex relationships more accurately than standard multivariate statistics techniques, using either an intuitive graphical or a programmatic user interface. Amos is included in the Premium edition of SPSS Statistics (except in the Campus Edition, where it is sold separately); you can also buy Amos as part of the Base, Standard, and Professional editions of SPSS Statistics, or separately as a standalone application, for Windows only. AMOS stands for Analysis of Moment Structures. It is an added SPSS module used especially for structural equation modeling, path analysis, and confirmatory factor analysis, and it is also known as analysis of covariance or causal modeling software. AMOS is a visual program for SEM: models can be drawn graphically using simple drawing tools, and AMOS quickly performs the computations for SEM and displays the results.

INTRODUCTION TO MS EXCEL - Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android, and iOS. It features calculation and computation capabilities, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications (VBA), and it forms part of the Microsoft Office suite of software. Microsoft Excel has the basic features of all spreadsheets, using a grid of cells arranged in numbered rows and letter-named columns to organize data manipulations like arithmetic operations. It has a battery of supplied functions to answer statistical, engineering, and financial needs. In addition, it can display data as line graphs, histograms, and charts, and with a very limited three-dimensional graphical display. It allows sectioning of data to view its dependencies on various factors for different perspectives (using pivot tables and the scenario manager); a PivotTable is a tool for data analysis that simplifies large data sets via PivotTable fields. Its programming aspect, Visual Basic for Applications, allows the user to employ a wide variety of numerical methods, for example for solving differential equations of mathematical physics, and then report the results back to the spreadsheet. Excel also has a variety of interactive features allowing user interfaces that can completely hide the spreadsheet from the user, so that the spreadsheet presents itself as a so-called application or decision support system (DSS) via a custom-designed user interface, for example a stock analyzer, or in general as a design tool that asks the user questions and provides answers and reports. In a more elaborate realization, an Excel application can automatically poll external databases and measuring instruments using an update schedule, analyze the results, make a Word report or PowerPoint slide show, and e-mail these presentations on a regular basis to a list of participants. Excel was not designed to be used as a database.

Key data analysis techniques used in creating data sets for business - 1. Regression analysis: Regression analysis is used to estimate the relationship between a set of variables. When conducting any type of regression analysis, you're looking to see if there's a correlation between a dependent variable (the variable or outcome you want to measure or predict) and any number of independent variables (factors which may have an impact on the dependent variable). The aim of regression analysis is to estimate how one or more variables might impact the dependent variable, in order to identify trends and patterns. This is especially useful for making predictions and forecasting future trends. 2. Monte Carlo simulation: When making decisions or taking certain actions, there are a range of different possible outcomes. If you take the bus, you might get stuck in traffic; if you walk, you might get caught in the rain or bump into your chatty neighbor, potentially delaying your journey. In everyday life, we tend to briefly weigh up the pros and cons before deciding which action to take; however, when the stakes are high, it's essential to calculate, as thoroughly and accurately as possible, all the potential risks and rewards. Monte Carlo simulation does this computationally: it runs a model of the decision many times with randomly sampled inputs and reports the distribution of outcomes (a minimal sketch follows this list). 3. Factor analysis: Factor analysis is a technique used to reduce a large number of variables to a smaller number of factors. It works on the basis that multiple separate, observable variables correlate with each other because they are all associated with an underlying construct. This is useful not only because it condenses large datasets into smaller, more manageable samples, but also because it helps to uncover hidden patterns: it allows you to explore concepts that cannot be easily measured or observed, such as wealth, happiness, fitness, or, for a more business-relevant example, customer loyalty and satisfaction. 4. Cohort analysis: Cohort analysis is defined on Wikipedia as follows: "Cohort analysis is a subset of behavioral analytics that takes the data from a given dataset and rather than looking at all users as one unit, it breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined time-span." 5. Cluster analysis: Cluster analysis is an exploratory technique that seeks to identify structures within a dataset. The goal of cluster analysis is to sort different data points into groups (or clusters) that are internally homogeneous and externally heterogeneous; that is, data points within a cluster are similar to each other and dissimilar to data points in another cluster. Clustering is used to gain insight into how data is distributed in a given dataset, or as a preprocessing step for other algorithms. 6. Time series analysis: Time series analysis is a statistical technique used to identify trends and cycles over time. Time series data is a sequence of data points which measure the same variable at different points in time (for example, weekly sales figures or monthly email sign-ups). By looking at time-related trends, analysts are able to forecast how the variable of interest may fluctuate in the future. 7. Sentiment analysis: When you think of data, your mind probably automatically goes to numbers and spreadsheets. Many companies overlook the value of qualitative data, but in reality there are untold insights to be gained from what people (especially customers) write and say about you. So how do you go about analyzing textual data? One highly useful qualitative technique is sentiment analysis, a technique which belongs to the broader category of text analysis—the (usually automated) process of sorting and understanding textual data.
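A minimal Monte Carlo sketch (not from the source; the demand and cost distributions, the price, and the trial count are all invented for illustration). It estimates the distribution of profit for a decision with two uncertain inputs, using only the standard library:

import random

def simulate_profit():
    # One random trial of the uncertain inputs.
    demand = random.gauss(1000, 150)       # uncertain unit demand
    unit_cost = random.uniform(4.0, 9.0)   # uncertain cost per unit
    price = 8.0
    return demand * (price - unit_cost)

# Run many trials and summarize the resulting outcome distribution.
trials = [simulate_profit() for _ in range(100_000)]
mean = sum(trials) / len(trials)
loss_prob = sum(t < 0 for t in trials) / len(trials)
print(f"Expected profit: {mean:,.0f}")
print(f"Probability of a loss: {loss_prob:.1%}")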
TYPES OF ANALYSIS - 1. Descriptive analysis - What happened: The descriptive analysis method is the starting point of any analytic process, and it aims to answer the question of what happened. It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your business (a minimal sketch follows this list). Performing descriptive analysis is essential, as it allows us to present our data in a meaningful way. This analysis on its own will not allow you to predict future outcomes or tell you why something happened, but it will leave your data organized and ready for further analysis. 2. Diagnostic analysis - Why it happened: One of the most powerful types of data analysis, diagnostic analytics empowers analysts and business executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened, as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge. Designed to provide direct and actionable answers to specific questions, this is one of the most important methods in research, among its other key organizational functions such as retail analytics. 3. Predictive analysis - What will happen: The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analyses, in addition to machine learning (ML) and artificial intelligence (AI). In this way, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data. 4. Exploratory analysis - How to explore data relationships: As its name suggests, the main aim of exploratory analysis is to explore. Prior to it, there is still no notion of the relationship between the data and the variables; once the data is investigated, exploratory analysis enables you to find connections and generate hypotheses and solutions for specific problems. A typical area of application for exploratory analysis is data mining. 5. Prescriptive analysis - What should be done: Another of the most effective types of data analysis methods in research, prescriptive techniques cross over from predictive analysis in the way that they use forecasts of what could happen, but go a step further by recommending the best course of action to take.
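Descriptive analysis in miniature, as a closing sketch (not from the source; the daily sales figures are invented, and only the standard library is used):

import statistics as st

# Ordering and summarizing raw data to describe "what happened".
daily_sales = [12, 15, 11, 20, 18, 14, 30, 16, 13, 17]

print("n      :", len(daily_sales))
print("mean   :", st.mean(daily_sales))
print("median :", st.median(daily_sales))
print("stdev  :", round(st.stdev(daily_sales), 2))
print("range  :", (min(daily_sales), max(daily_sales)))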