0% found this document useful (0 votes)

8 views18 pages

Privacy and

Uploaded by

phdyugandhar077

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views18 pages

Privacy and

Uploaded by

phdyugandhar077

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 18

UNIT – I

Fundamentals of defining privacy and developing efficient algorithms for enforcing privacy, challenges in developing privacy preserving
algorithms in real-world applications, privacy issues, privacy models.

suggest Fundamentals of defining privacy and developing efficient algorithms for enforcing privacy: Defining privacy involves establishing
clear boundaries for data access and use, while developing efficient algorithms for enforcing privacy requires techniques like differential privacy
and anonymization, balancing utility with protection.

1. Defining Privacy:

Input Privacy: Protecting the confidentiality of the data used as input in a computation, ensuring that no party learns more than their prescribed
output.

Output Privacy: Guaranteeing that the published results of a computation do not contain identifiable input data beyond what is allowable by
the input parties.

Policy Enforcement: Implementing mechanisms to ensure that data is processed and used according to defined privacy policies.
Formal Definitions: Robust and mathematically rigorous definitions of privacy are crucial to understand the trade-off between statistical utility
and privacy.
Threat Model: Understanding the potential adversaries and their capabilities is essential for designing effective privacy-preserving algorithms.
2. Developing Efficient Algorithms for Enforcing Privacy:
Differential Privacy: A statistical technique that adds noise to data or algorithm outputs to ensure that an individual's data does not reveal
whether or not it was used in a dataset.
Epsilon Parameter: A key parameter in differential privacy that quantifies the level of privacy protection, with smaller values indicating
stronger privacy guarantees but potentially lower accuracy.
Noise Mechanisms: Techniques like Laplace noise or Gaussian noise can be used to add perturbation to data, ensuring that individual data
points are not identifiable.
 Anonymization:
Removing or modifying identifiable attributes from data, such as names, addresses, or social security numbers.

K-Anonymity: A technique that ensures that each individual's data is indistinguishable from at least k-1 other individuals in the dataset.

L-Diversity: A technique that ensures that each group of k individuals has a similar distribution of sensitive attributes.
 Privacy-Preserving Techniques:

Cryptographic Methods: Using encryption and other cryptographic techniques to protect data during storage and transmission.

Secure Multi-Party Computation: Enabling computation on data without revealing the data itself to any party.
 Data Perturbation: Adding noise or modifying data in a way that preserves its utility while protecting privacy.
 Computational Complexity:
Balancing the computational cost of privacy-preserving algorithms with the level of privacy they provide.
 Data Utility:
Finding a balance between the level of privacy provided and the usefulness of the data for analysis and decision-making.

Defining privacy requires understanding the balance between data utility and individual protection, while efficient privacy-enforcing algorithms,
like differential privacy, aim to protect sensitive information while still allowing meaningful analysis.
Fundamentals of Defining Privacy:

 Data Utility vs. Privacy: core challenge is finding the right balance between allowing data analysis and protecting individual
privacy.

  Threat Model: Understanding the potential risks and adversaries that could compromise privacy is crucial for designing effective
privacy-preserving techniques.
  Privacy Goals:
Defining specific privacy goals, such as input privacy (protecting the source data) or output privacy (protecting the results), is essential.
  Mathematical Rigor:
Formalizing privacy definitions, like differential privacy, allows for rigorous analysis and guarantees of privacy protection.
Developing Efficient Algorithms for Enforcing Privacy: Differential privacy (DP) is a mathematically rigorous framework for releasing
statistical information about datasets while protecting the privacy of individual data subjects. It enables a data holder to share aggregate patterns
of the group while limiting information that is leaked about specific individuals.[1][2] This is done by injecting carefully calibrated noise into
statistical computations such that the utility of the statistic is preserved while provably limiting what can be inferred about any individual in the
dataset.

 Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a
statistical database which limits the disclosure of private information of records in the database. For example, differentially private
algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring
confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even
to internal analysts.
 Roughly, an algorithm is differentially private if an observer seeing its output cannot tell whether a particular individual's information
was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may
be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms
provably resist such attacks.[3]

A robust approach that adds noise to datasets to ensure that no individual's data can be uniquely identified from the results of an
analysis.

 Privacy-Preserving Techniques:
Privacy-preserving techniques include homomorphic encryption, differential privacy, secure multiparty computation, and federated learning.
These techniques help protect personal information and ensure that individuals can benefit from data sharing without having their privacy
compromised.
Privacy-preserving techniques

Homomorphic encryption: Encrypts sensitive data so that it can be processed without being decrypted

  Differential privacy: Adds noise to data points to prevent the identification of individuals in a dataset
  Secure multiparty computation: Allows multiple parties to collaborate without revealing individual inputs
  Federated learning: Trains models on decentralized data without centralization
  K-anonymity: An algorithm that ensures that multiple records are indistinguishable when identifying a person's data
Other privacy-preserving strategies Using encrypted communication channels and data storage, Obtaining informed consent from participants,
Limiting access to sensitive information, and Adhering to relevant legal and ethical guidelines.
Data privacy is important for protecting personal information, establishing trust, and complying with regulations.

You can also watch this video to learn more about privacy-preserving machine learning:

Methods like k-anonymity, l-diversity, and randomization can be used to

anonymize data and reduce the risk of re-identification.
  Secure Multi-Party Computation (MPC):
Allows multiple parties to collaborate on a computation without revealing
their private inputs.

Secure multi-party computation (SMPC) is a cryptographic technique that

allows multiple parties to work together on a task while keeping their data
private. It is a collection of protocols that can be used to perform
computations on private inputs.
How it works

 SMPC distributes a computation across multiple parties.

 Each party has their own privacy restrictions.
 The parties can't see each other's data.
 The parties decide who can see the results of the computation.

Benefits :SMPC allows parties to perform joint computations while keeping their data secure.

 It allows parties to keep control over who receives the results of the computation.
 It guarantees that computations have been performed correctly.

Applications :SMPC is used in healthcare to securely share data and conduct collaborative research.
  SMPC is used by financial institutions to secure digital assets.
History :Chinese computer scientist Andrew Yao introduced SMPC in the 1980s.

 The first practical application of SMPC at scale was in a sugar beet

auction in Denmark in 2008.

  Homomorphic Encryption:
Enables computations on encrypted data, allowing for data analysis without decryption.
Homomorphic encryption is a type of encryption that lets you perform calculations on encrypted data without decrypting it first. It's a form of
cryptography that helps keep data confidential while still allowing computations to be performed on it.
How it works

 The client encrypts a query and sends it to a server.

 The server performs operations on the encrypted query and returns an encrypted response.
 The client decrypts the response.

Benefits of homomorphic encryption

 It can help companies share sensitive data with third parties without exposing the data.

  It can help preserve customer privacy in industries like healthcare, financial services, and IT.
  It can help voters check if their vote was counted correctly without revealing how they voted.
Types of homomorphic encryption

 Partially homomorphic: Only allows one operation, like addition or multiplication.

 Somewhat homomorphic: Allows more than one operation, but in fixed combinations.
 Fully homomorphic: Allows arbitrary operations to be applied to encrypted data in any combination.

The word "homomorphic" comes from Greek words meaning "same structure"

  Input and Output Privacy:

Ensuring that both the input data and the output results are protected from unauthorized
access or disclosure.

Input privacy protects the secrecy of data entering a system, while output privacy
ensures that the results produced by the system don't reveal too much about the
original input data.
Input Privacy:

 Definition:

Input privacy focuses on safeguarding the data that is fed into a

system or model, preventing unauthorized parties from accessing
or learning about the raw data.

  Importance:
It's crucial in scenarios where sensitive information is involved, like medical records, financial data, or personal communications.
  Techniques:
Techniques like encryption, data anonymization, and secure multi-party computation can help ensure input privacy.
Output Privacy:

 Definition:

Output privacy concerns the protection of the results generated by a system or model, ensuring that the output doesn't inadvertently
reveal too much about the original input data.

  Importance:
This is particularly relevant in machine learning, where models can be trained on sensitive data, and the predictions or insights derived from that
data need to be protected.
  Techniques:
Techniques like differential privacy, k-anonymity, and secure enclaves can help ensure output privacy.
Relationship between Input and Output Privacy:

 Interdependence:

Input privacy is a prerequisite for achieving output privacy. If the input data is not protected, then even with output privacy techniques,
it may still be possible to infer information about the input data from the output.

  Complementary:
Both input and output privacy are essential for building truly private systems. By addressing both aspects, we can ensure that data is protected
throughout its lifecycle.
Privacy Parameters:
Selecting appropriate privacy parameters, such as the epsilon parameter in differential privacy, is crucial for achieving a balance between
privacy and utility.

Privacy parameters refer to the settings and controls that allow users to manage and customize how their personal information is handled and
accessed by applications, websites, and services. These settings enable users to determine what data is shared, with whom, and for what
purposes.
Here's a more detailed explanation:

What they are: Privacy parameters are the mechanisms through which users can control their privacy online and offline. They allow you to
make choices about how your data is collected, stored, used, and shared.

 Examples: App Permissions: Allowing or denying apps access to your microphone, camera, location, contacts, and other sensitive data.
  Location Settings: Choosing which apps can access your location and the level of accuracy (e.g., approximate vs. precise).
  Data Sharing Options: Controlling who can see your posts, contacts, and other information on social media platforms.
  Cookie Settings: Managing how websites track your browsing activity and use cookies.
  Advertising Preferences: Opting out of personalized ads and controlling how your browsing data is used for advertising purposes.
  Data Deletion: Setting time limits for how long activity data is kept in your account and automatically deleting it.
 Why they matter:
Privacy parameters are crucial for protecting your personal information and maintaining control over your digital footprint. They help you:

Reduce the risk of data breaches and misuse: By limiting the data you share and controlling who can access it.

  Protect your privacy: By choosing what information is visible to others and what is kept private.
  Stay informed about how your data is being used: By reviewing privacy policies and understanding the data collection practices of the
services you use.
  Exercise your rights: By having the ability to access, correct, and delete your personal data.
 Where to find them:
Privacy parameters are typically found in the settings menu of:

 Your phone or device: Android and iOS operating systems offer a range of privacy settings and controls.

  Web browsers: Chrome, Firefox, Safari, and other browsers have privacy settings for cookies, tracking, and more.
  Social media platforms: Facebook, Instagram, Twitter, and other platforms allow you to manage your privacy settings and control who
can see your information.
  Applications and services: Most apps and services have privacy settings that allow you to control how they use your data.
 Key Concepts:

Data Minimization: Only collecting and storing the data that is necessary for a specific purpose.

  Purpose Limitation: Using data only for the purposes for which it was collected.
  Transparency: Being open and honest about how data is collected and used.
  Accountability: Taking responsibility for the protection of personal data

 Epsilon:
A critical parameter in differential privacy that quantifies the level of privacy protection. A smaller epsilon value indicates stronger privacy
guarantees but can also mean lower accuracy of the statistical results.
Epsilon has multiple meanings: it's the fifth letter of the Greek alphabet (ε), a symbol used to represent a very small quantity in mathematics, and
also the name of a global data, technology, and services company focused on marketing and advertising.
Here's a more detailed explanation of the different contexts of "Epsilon":
1. Greek Letter and Mathematics:

 Greek Alphabet: Epsilon (ε) is the fifth letter of the Greek alphabet, corresponding to the vowel sound "e".
  Mathematics: In mathematics, epsilon (ε) is often used to represent a very small or arbitrarily small quantity, used to indicate that
a given quantity is small or close to zero.

 Number.EPSILON: In JavaScript, Number.EPSILON represents the difference between 1 and the next greater number representable in the
Number format.
2. Epsilon as a Company:

 Epsilon (Company):

Epsilon is a global data, technology, and services company that powers the marketing and advertising ecosystem.

  Publicis Groupe:
Epsilon is positioned at the center of Publicis Groupe, a global advertising and marketing technology company.
  Marketing and Advertising:
Epsilon helps brands harness their first-party data to activate campaigns across channels and devices, with an emphasis on proven outcomes and
respecting consumer privacy.
  Epsilon's Technology:
Epsilon's technology connects advertisers with consumers to drive performance, utilizing a people-based identity graph to reach real people, not
just cookies or devices.
  Epsilon India:
Epsilon India is a new-age digital marketing company that infuses data insights with creativity for tailored solutions, connecting people with
brands.
  Epsilon Advanced Materials:
Epsilon Advanced Materials is a company that operates a cutting-edge manufacturing facility in Vijayanagar, Karnataka, focused on graphite
anode materials for EV batteries.
  Epsilon India (Gifted Math Education):
There is also Epsilon India, which focuses on gifted math education for children.
  Epsilon Programmer's Editor:
Lugaru Software Ltd. makes the Epsilon Programmer's Editor, an advanced EMACS-style programmer's text editor for multiple operating
systems.

implementing privacy-preserving machine learning techniques involves several challenges.

These challenges stem from the need to balance privacy with the utility and effectiveness of the machine learning models. Here are some of the
key challenges:
1. Technical Complexity
**1.1. Algorithm Design:
Challenge: Developing privacy-preserving algorithms that maintain the utility of the machine learning models while protecting sensitive data is
technically complex. Techniques such as differential privacy, federated learning, and homomorphic encryption require specialized knowledge
and expertise.
Impact: The complexity can make it difficult to design, implement, and optimize algorithms that are both effective and privacy-preserving.
**1.2. Performance Overhead:
Challenge: Privacy-preserving techniques often introduce computational overhead. For example, differential privacy may require adding noise to
the data, and federated learning requires aggregating model updates from multiple sources.
Impact: The increased computational and communication costs can affect the performance and scalability of the machine learning models.
2. Data Utility vs. Privacy Trade-Off
**2.1. Loss of Accuracy:
Challenge: Privacy-preserving techniques can sometimes degrade the accuracy of machine learning models. For instance, adding noise in
differential privacy can lead to less precise results.
Impact: Finding the right balance between preserving privacy and maintaining data utility is a critical challenge. Overly stringent privacy
measures can reduce the effectiveness of the models.
2.2. Data Quality:
Challenge: Techniques like data anonymization or perturbation may lead to loss of data fidelity, affecting the quality of the input data used for
training.
Impact: Poor quality data can result in suboptimal model performance and reduce the reliability of the outcomes.
3. Integration and Deployment
**3.1. Integration with Existing Systems:
Challenge: Implementing privacy-preserving techniques requires integration with existing data processing and machine learning pipelines. This
can be challenging if existing systems are not designed with privacy in mind.
Impact: Integration issues can lead to increased development time, costs, and potential disruptions in existing workflows.
**3.2. Scalability:
Challenge: Some privacy-preserving techniques may not scale well with large datasets or high-dimensional data. For example, federated learning
requires coordinating between multiple parties, which can be challenging at scale.
Impact: Scalability issues can limit the applicability of privacy-preserving techniques in large-scale or real-time applications.
4. Regulatory and Compliance Issues
**4.1. Legal Requirements:
Challenge: Navigating and complying with data protection regulations (e.g., GDPR, CCPA) while implementing privacy-preserving techniques
can be complex. Regulations often have specific requirements for data handling and user consent.
Impact: Ensuring compliance adds an additional layer of complexity and may require ongoing legal and regulatory reviews.
**4.2. Documentation and Auditing:
Challenge: Demonstrating compliance and maintaining documentation for privacy-preserving practices can be burdensome.
Impact: Organizations need to invest in resources for proper documentation and auditing to ensure they meet regulatory requirements.
5. User Trust and Perception
**5.1. Building Trust:
Challenge: Users must trust that privacy-preserving techniques are effectively protecting their data. This requires clear communication and
transparency about how data is handled and protected.
Impact: Lack of trust can hinder the adoption of privacy-preserving technologies. Educating users and stakeholders about the benefits and
limitations of these techniques is essential.
**5.2. Perceived Privacy Risks:
Challenge: Users may still have concerns about privacy even when privacy-preserving techniques are implemented. For example, there may be
skepticism about the effectiveness of these techniques.
Impact: Addressing these concerns requires ongoing engagement and reassurance to ensure users feel confident about the privacy measures in
place.
6. Interoperability and Standardization
**6.1. Lack of Standards:
Challenge: There is often a lack of standardized methods and protocols for implementing privacy-preserving techniques. This can lead to
variations in implementation and effectiveness.
Impact: The absence of standards can create challenges in ensuring consistent and reliable privacy protection across different systems and
platforms.
**6.2. Interoperability Issues:
Challenge: Ensuring that privacy-preserving techniques work seamlessly across different systems and data sources can be difficult, especially in
heterogeneous environments.
Impact: Interoperability issues can complicate the deployment and integration of privacy-preserving solutions.
Summary
Implementing privacy-preserving machine learning techniques involves addressing technical complexity, managing trade-offs between data
utility and privacy, overcoming integration and scalability challenges, complying with regulatory requirements, building user trust, and
navigating issues related to standardization and interoperability. Addressing these challenges requires a multi-faceted approach involving
technical expertise, regulatory knowledge, and effective communication strategies.

privacy issues :
Cyberbullying,Privacy setting loopholes,Data misuse,False information,Tracking,Data mining,Identity theft,Location settings,Malware and
viruses,Third-party apps

Common social media privacy issues:With the large amount of data on user social media accounts, scammers can find enough information to
spy on users, steal identities and attempt scams. Data protection issues and loopholes in privacy controls can put user information at risk when
using social media. Other social media privacy issues include the following.1. Data mining for identity theft:Scammers do not need a great
deal of information to steal someone's identity. They can start with publicly available information on social media to help target victims. For
example, scammers can gather usernames, addresses, email addresses and phone numbers to target users with phishing scams.Even with an
email address or phone number, a scammer can find more information, such as leaked passwords, Social Security numbers and credit card
numbers.2. Privacy setting loopholes:Social media accounts may not be as private as users think. For example, if a user shared something with
a friend and they reposted it, the friend's friends can also see the information. The original user's reposted information is now in front of a
completely different audience.Even closed groups may not be completely private because postings can be searchable, including any comments.3.
Location settings:Location app settings may still track user whereabouts. Even if someone turns off their location settings, there are other ways
to target a device's location. The use of public Wi-Fi, cellphone towers and websites can also track user locations. Always check that the GPS
location services are turned off, and browse through a VPN to avoid being tracked.User location paired with personal information can provide
accurate information to a user profile. Bad actors can also use this data to physically find users or digitally learn more about their habits.4.
Harassment and cyberbullying:Social media can be used for cyberbullying. Bad actors don't need to get into someone's account to send
threatening messages or cause emotional distress. For example, children with social media accounts face backlash from classmates with
inappropriate comments.Doxxing -- a form of cyberbullying -- involves bad actors purposely sharing personal information about a person to
cause harm, such as a person's address or phone number. They encourage others to harass this person.5. False information:People can spread
disinformation on social media quickly. Trolls also look to provoke other users into heated debates by manipulating emotions.Most social media
platforms have content moderation guidelines, but it may take time for posts to be flagged. Double-check information before sending or
believing something on social media.6. Malware and viruses:Social media platforms can be used to deliver malware, which can slow down a
computer, attack users with ads and steal sensitive data. Cybercriminals take over the social media account and distribute malware to both the
affected account and all the user's friends and contacts.7. Third-party apps:Third-party apps are external apps that integrate with social media
platforms to offer additional features and services such as tools, games and quizzes. However, when you connect to these apps from your
account, you grant permission to access certain data such as photos, posts, friend lists and messages. These apps can misuse your data and collect
additional information for unintended purposes -- such as selling your information to data brokers or targeted advertising. You are also open to
security vulnerabilities if these apps have weaker security controls.Privacy models are frameworks used to understand, protect, and manage
privacy in various contexts, encompassing legal, technological, and social aspects. Key models include Differential Privacy, K-Anonymity, and
Privacy by Design, each with distinct approaches to data protection. Here's a more detailed explanation of some common privacy models:
Differential Privacy: This model focuses on protecting sensitive data by adding noise to datasets, ensuring that an individual's data does not
significantly influence the output of a computation or analysis. It guarantees that the probability of any possible output of the anonymization
process does not change "by much" if data of an individual is added to or removed from input data.
  K-Anonymity:
This model aims to protect individuals by ensuring that each record in a dataset is indistinguishable from at least k-1 other records, making it
difficult to identify an individual based on their attributes.
  L-Diversity:
This model builds upon k-anonymity by adding a further constraint: even with background knowledge, an individual's data should not be
uniquely identifiable.
  Privacy by Design:
This approach emphasizes integrating privacy considerations into the design and development of systems, products, and services from the outset,
rather than as an afterthought. It prioritizes proactive, preventative measures and user-centric design.
  Other Privacy Models and Considerations:

 Data Minimization: This principle advocates for collecting and processing only the necessary data, limiting the scope of potential
privacy breaches.

  Transparency and User Control: Users should be informed about how their data is being collected, used, and protected, and have
control over their data.
  Security Measures: Implementing robust security measures to protect data from unauthorized access and breaches is crucial.

  Ethical Considerations: Privacy models should also consider ethical implications and strive to balance privacy with other legitimate

interests.

UNIT _2
Anonymization operations, information metrics, Anonymization methods for the transaction data, trajectory data, social networks data, and
textual data, Collaborative Anonymization.
Anonymization operations :What Is Data Anonymization

Data anonymization is the process of protecting private or sensitive

information by erasing or encrypting identifiers that connect an
individual to stored data. For example, you can run Personally
Identifiable Information (PII) such as names, social security numbers,
and addresses through a data anonymization process that retains the data
but keeps the source anonymous.However, even when you clear data of
identifiers, attackers can use de-anonymization methods to retrace the
data anonymization process. Since data usually passes through multiple
sources—some available to the public—de-anonymization techniques
can cross-reference the sources and reveal personal information.The General Data Protection Regulation (GDPR) outlines a specific set of rules
that protect user data and create transparency. While the GDPR is strict, it permits companies to collect anonymized data without consent, use it
for any purpose, and store it for an indefinite time—as long as companies remove all identifiers from the data.

Data Anonymization Techniques

 Data masking—hiding data with altered values. You can create a mirror version of a database and apply modification techniques such
as character shuffling, encryption, and word or character substitution. For example, you can replace a value character with a symbol
such as “*” or “x”. Data masking makes reverse engineering or detection impossible.
 Pseudonymization—a data management and de-identification method that replaces private identifiers with fake identifiers or
pseudonyms, for example replacing the identifier “John Smith” with “Mark Spencer”. Pseudonymization preserves statistical accuracy
and data integrity, allowing the modified data to be used for training, development, testing, and analytics while protecting data privacy.
 Generalization—deliberately removes some of the data to make it less identifiable. Data can be modified into a set of ranges or a
broad area with appropriate boundaries. You can remove the house number in an address, but make sure you don’t remove the road
name. The purpose is to eliminate some of the identifiers while retaining a measure of data accuracy.
 Data swapping—also known as shuffling and permutation, a technique used to rearrange the dataset attribute values so they don’t
correspond with the original records. Swapping attributes (columns) that contain identifiers values such as date of birth, for example,
may have more impact on anonymization than membership type values.
 Data perturbation—modifies the original dataset slightly by applying techniques that round numbers and add random noise. The
range of values needs to be in proportion to the perturbation. A small base may lead to weak anonymization while a large base can
reduce the utility of the dataset. For example, you can use a base of 5 for rounding values like age or house number because it’s
proportional to the original value. You can multiply a house number by 15 and the value may retain its credence. However, using
higher bases like 15 can make the age values seem fake.
 Synthetic data—algorithmically manufactured information that has no connection to real events. Synthetic data is used to create
artificial datasets instead of altering the original dataset or using it as is and risking privacy and security. The process involves creating
statistical models based on patterns found in the original dataset. You can use standard deviations, medians, linear regression or other
statistical techniques to generate the synthetic data.
 Disadvantages of Data Anonymization

The GDPR stipulates that websites must obtain consent from users to collect personal information such as IP addresses, device ID, and cookies.
Collecting anonymous data and deleting identifiers from the database limit your ability to derive value and insight from your data. For example,
anonymized data cannot be used for marketing efforts, or to personalize the user experience.

Information metrics :Information metrics are quantifiable measures used to

assess and track the performance or production of a system or process, providing
insights and data for decision-making and improvement.
Here's a more detailed explanation:

 Definition: Metrics are pieces of collected data that help measure against
a stated goal. They are data that are measured against a specific objective
or target.

  Purpose:
Metrics are used to: Track performance over time.
  Compare different systems or processes.
  Identify areas for improvement.   Make data-driven decisions.
 Examples: Business Performance Metrics: Productivity, profit margin, customer satisfaction, and market share.
  Product Metrics: Activation rate, time to activate, churn rate, and customer lifetime value.
  Social Media Metrics: Engagement rate, reach, and click-through rate.
  Financial Metrics: Revenue, expenses, profit, and debt.
  Information Security Metrics: Number of security incidents, time to detect and resolve incidents, and data breach frequency.
 Types of Metrics:

 Key Performance Indicators (KPIs): Specific metrics that are critical to achieving organizational goals.
  Derived Metrics: Metrics calculated from other metrics.
  Acquisition Metrics: Metrics that measure the effectiveness of acquiring new customers or users.
  Activation Metrics: Metrics that measure the extent to which users engage with a product or service after acquiring them.
  Retention Metrics: Metrics that measure the extent to which users continue to use a product or service.
 Importance:
Metrics are essential for: Understanding the current state of a system or process.
  Identifying areas where improvements can be made.
  Making data-driven decisions.
  Communicating performance to stakeholders

Anonymization methods for the transaction data

To anonymize transaction data, you can employ methods like hashing, masking, pseudonymization, generalization, and tokenization, each
offering varying levels of privacy and data utility.
Here's a breakdown of these techniques:
Hashing: Replaces sensitive data with a unique, one-way hash value, making it impossible to reverse to the original data.
  Masking:
Obscures or alters the values in the original data set by replacing them with artificial data that appears genuine but has no real connection to the
original.
  Pseudonymization:
Replaces Personally Identifiable Information (PII) with pseudonyms or codes, allowing for a separate mapping between original and
pseudonymized data, which enables restoring the original information if necessary.
  Generalization:
Reduces the detail of the original data by grouping or aggregating data into broader categories, making it harder to identify individuals.
  Tokenization:
Replaces sensitive data with a non-sensitive token, which is a random value that has no connection to the original data, allowing businesses to
more freely analyze, profile, and share tokenized data

Trajectory data is a series of points that show the path of a moving object over time.
It can be used to study the behavior of objects, such as vehicles or satellites.
How is trajectory data created?

 Sensors capture the position of an object to create raw trajectory data

 This data is then structured to make it easier to use

Where is trajectory data used?

 Transportation

Researchers use trajectory data to study traffic patterns and develop

solutions for transportation issues

 Urban planning :Researchers use trajectory data to study urban dynamics

and develop solutions for urban planning issues
 Environmental protection :Researchers use trajectory data to study environmental issues and develop solutions for environmental
protection issues

How is trajectory data analyzed? Geoprocessing tools can be used to analyze trajectory data
  Trajectory profile charts can be used to visualize and analyze trajectory data
  Deep learning can be used to analyze trajectory data
Examples of trajectory data:

 The path of a thrown ball or rock

 The orbital path of a satellite
 The path of a vehicle moving on a road
Social network data refers to information about individuals and their relationships within a community, used for various applications like
criminology, terrorism analysis, and public health, often involving sensitive data. Here's a more detailed breakdown of what social network data
entails:
What it is: Information about individuals and their connections:

Social network data encompasses information about people and the relationships they have with each other, forming a network or
structure.

  Relational data:
It focuses on the relationships (ties, connections) between individuals, rather than solely on individual characteristics or behaviors.
  Examples:
This includes information like friendships, collaborations, communication patterns, shared interests, and other interactions between people or
entities.
  Data sources:
Social network data can be collected from various sources, including social media platforms (Facebook, Twitter, etc.), surveys, interviews, and
other data collection methods.
  Metadata:
Social data includes metadata such as user location, language, biographical data, and shared links.
Applications of Social Network Data:

 Criminology and Terrorism Analysis:

Analyzing social networks can help identify criminal networks, terrorist cells, and understand their structures and communication
patterns.

 Public Health:
Understanding the spread of diseases or information through social networks can help in developing effective interventions.
  Marketing and Advertising:
Social network data can be used to target specific demographics and interests, improving the effectiveness of marketing campaigns.
  Social Science Research:
Social network analysis helps researchers understand social dynamics, group behavior, and the spread of ideas and information.
  Security Applications:
Social network analysis is used in intelligence, counter-intelligence, and law enforcement activities to map covert organizations.
  Identifying Key Individuals:
Social network analysis can help identify influential individuals within a network, who can play a crucial role in information diffusion or
decision-making

Textual data is information that is written or spoken and stored in a text format. It can include
emails, social media posts, blog posts, and more.
Examples of textual data
emails, social media posts, blog posts, online forum comments, customer reviews, support tickets,
surveys, articles, reports, and essays.
Uses of textual data :Language and linguistic research: Used to study lexis, syntax, morphology,
semantics, and more
  Artificial intelligence: Used as a data test bed for program development
  Natural language processing: Used for taggers, parsers, and spell checking word lists
  Business: Used to extract insights from customer feedback, email tickets, and chatbot
conversations
Challenges of textual data

 Text data can be challenging to analyze because it can come in different forms, such as short text, long text, semi-structured text, and
multilingual text

  The meaning of words can depend on context, so it's important to consider the context when analyzing text data
Text analytics
Text analytics is the process of analyzing unstructured text data to discover patterns, trends, and insights
Collaborative anonymization is a data privacy technique where multiple
parties jointly anonymize data to ensure privacy while still facilitating data
sharing and analysis across organizations.

Here's a more detailed explanation:

 Definition:

Collaborative anonymization involves a group of data providers

working together to remove or modify identifying information from
a dataset, ensuring that the data cannot be linked back to individuals or specific entities.

  Why it's used:

This technique is particularly useful in situations where organizations need to share data for research, analysis, or other collaborative purposes,
but privacy concerns prevent direct sharing of raw data.
  How it works:

 Joint Anonymization: Instead of each organization anonymizing their data independently, they work together to anonymize the data
in a way that preserves privacy across the entire dataset.

  Privacy-Preserving Techniques: Various methods, such as k-anonymity, l-diversity, and t-closeness, can be used to ensure that the
anonymized data remains private.
  Data Sharing: Once anonymized, the data can be shared with other organizations or researchers without the risk of exposing sensitive
information.
 Benefits:

 Enhanced Privacy: Collaborative anonymization can provide a higher level of privacy than individual anonymization, as it considers
the potential for re-identification across multiple datasets.

  Facilitates Collaboration: It allows organizations to share data for research and analysis without compromising privacy, fostering
collaboration and knowledge sharing.
  Improved Data Usability: Anonymized data can be used for various purposes, such as statistical analysis, trend identification, and model
development, while protecting individual privacy.
 Challenges:

 Complexity: Collaborative anonymization can be complex to implement, requiring coordination and trust between multiple parties.

  Data Integrity: Finding the right balance between privacy and data usability can be challenging.
  Re-identification Risks: Even with anonymization techniques, there's always a risk that data can be re-identified, especially if combined
with other datasets.
 Examples:

 Medical Research: Sharing anonymized patient data between hospitals or research institutions to study diseases or develop
treatments.

  Financial Analysis: Analyzing anonymized customer data to identify trends or patterns in financial behavior without revealing individual
identities.
  Public Health: Sharing anonymized data on disease outbreaks between different health agencies to improve response efforts
UNIT-3

Access control of outsourced data, Use of Fragmentation and Encryption to Protect Data Privacy, Security and Privacy in OLAP systems.

Access control of outsourced data :Access control for outsourced data involves managing who can access and what actions they can perform on
data stored outside of your organization's control, requiring robust security measures like encryption and secure access policies.
Here's a more detailed explanation:
Why is Access Control Important for Outsourced Data?

 Data Security: Outsourcing data to third-party providers, such as cloud storage, introduces new security risks. Access control
mechanisms are crucial to prevent unauthorized access, data breaches, and potential misuse of sensitive information.
 Compliance: Many regulations and industry standards require organizations to implement robust access control measures to protect
sensitive data, such as Personally Identifiable Information (PII) or financial data.
 Data Integrity: Access control helps ensure the integrity of outsourced data by preventing unauthorized modifications or deletions.

Key Aspects of Access Control for Outsourced Data:

 Encryption: Encrypting data before outsourcing it is a fundamental security practice. Encryption ensures that even if the data is
accessed by unauthorized individuals, it remains unreadable without the decryption key.
 Secure Access Policies: Defining clear access policies that specify who can access what data and what actions they can perform is
essential. These policies should be based on the principle of least privilege, granting users only the necessary access to perform their
tasks.
 Authentication and Authorization: Implementing strong authentication mechanisms, such as multi-factor authentication, is crucial to
verify the identity of users attempting to access the data. Authorization mechanisms ensure that users are only granted access to the
resources they are authorized to access.
 Access Revocation: Having a mechanism to revoke access to data when a user's employment or authorization changes is essential.
This ensures that sensitive data remains protected even when users leave the organization or their roles change.
 Auditing and Monitoring: Regularly auditing access logs and monitoring access patterns can help identify potential security
incidents and ensure that access control policies are being enforced effectively.
 Data Location and Storage: Consider the location and storage options of the outsourced data. Choose providers with strong security
practices and ensure that the data is stored in a secure environment.
 Data Encryption at Rest and in Transit: Ensure that data is encrypted both when it is stored (at rest) and when it is being transferred
(in transit) to and from the outsourcing provider.
 Regular Security Assessments: Regularly assess the security posture of your outsourced data and the security practices of the
outsourcing provider to identify potential vulnerabilities and ensure that security measures are effective.
Use of Fragmentation and Encryption to Protect Data Privacy:

Combining data fragmentation and encryption offers a robust approach to enhance data privacy by making data unintelligible and breaking
sensitive associations, ensuring confidentiality and security.
Here's a breakdown of how fragmentation and encryption work together to protect data privacy:

 Encryption:
o Encryption transforms data into an unreadable format (ciphertext) that can only be deciphered with a secret key.

  This prevents unauthorized access and ensures that sensitive information remains confidential, even if the data is intercepted.
  Encryption is a crucial measure for protecting data during storage and transmission.
 Fragmentation:

 Fragmentation involves splitting data into smaller, independent fragments, making it difficult to reconstruct the original data or infer
sensitive associations between different pieces of information.

  This approach operates at the attribute level, offering a different level of granularity compared to traditional database encryption.
  By separating sensitive attributes, fragmentation enhances privacy by preventing unauthorized parties from accessing or combining
sensitive information.
 Combined Approach:

 The combination of fragmentation and encryption provides a layered security approach.

  Encryption ensures that individual data fragments are protected from unauthorized access, while fragmentation prevents the reconstruction
of sensitive associations between them.
  This approach helps to protect data privacy by making it difficult for unauthorized parties to access or reconstruct sensitive information.

Security and Privacy in OLAP systems:-

In Online Analytical Processing (OLAP) systems, security and privacy are crucial, requiring measures like data encryption, access control, and
potentially, privacy-preserving techniques to protect sensitive information from unauthorized access or inference.

Here's a more detailed breakdown of security and privacy considerations in OLAP systems:

1. Security Measures:
o Data Encryption:Protect data stored in OLAP cubes or databases by making it unreadable without the proper decryption
key.
o Encryption can be applied at various levels, including the file system, database, or cube.
o Access Control:Implement role-based permissions to limit who can view or modify data.
o Ensure that only authorized personnel can access sensitive information.
o Authentication and Authorization:Verify the identity of users attempting to access the OLAP system.
o Authorize users based on their roles and permissions.
o Auditing:Track user activity within the OLAP system to detect and prevent security breaches.
o Physical Security:Protect the physical infrastructure where the OLAP system is hosted.
o Network Security:Secure the network infrastructure used to access the OLAP system.
2. Privacy Considerations:
o Aggregation and Derivation:Be aware that aggregated and derived data can still contain sensitive information, even if it
appears innocuous.
o Traditional security mechanisms might not be sufficient to protect against inferences from aggregated data.
o Privacy-Preserving Techniques:Consider using techniques that allow for analysis of data while preserving privacy.
o Data Perturbation: Randomly modify data to obscure sensitive information while still allowing for meaningful analysis.
o Differential Privacy: Add noise to data to ensure that individual data points cannot be identified.
o Homomorphic Encryption: Perform computations on encrypted data without decrypting it.
o Data Minimization:Only collect and store the data that is necessary for the intended purpose.
o Data Anonymization:Remove or obscure identifying information from data.
o Transparency:Be transparent about how data is collected, used, and protected.
o User Consent:Obtain informed consent from users before collecting or using their data.
3. OLAP Security Challenges:
o Inference:Malicious users might try to infer sensitive information from aggregated data.
o Data Breaches:Unsecured OLAP systems are vulnerable to data breaches.
o Compliance:Ensure compliance with relevant data privacy regulations, such as GDPR.
4. Best Practices:
o Regular Security Audits:Conduct regular security audits to identify and address vulnerabilities.
o Stay Up-to-Date:Keep your OLAP software and security tools up-to-date with the latest security patches.
o User Training:Train users on security best practices.
o Incident Response Plan:Develop a plan to respond to security incidents.
UNIT_iv

Extended Data publishing Scenarios, Anonymization for Data Mining, publishing social science data
Extended Data publishing Scenarios:
Extended data publishing scenarios involve publishing data in ways that go beyond simple data release, often focusing on privacy preservation
and utility for specific data mining tasks, including multiple views, sequential releases, and incremental updates.
Here's a more detailed breakdown of extended data publishing scenarios:
1. Privacy-Preserving Data Publishing (PPDP) Fundamentals:

 Purpose: To enable the publication of useful information while protecting data privacy.
Techniques: PPDP utilizes various techniques like data anonymization, generalization, and perturbation to achieve this.
  Challenges: Balancing privacy and utility, ensuring efficiency and scalability, and addressing privacy threats in complex data scenarios.
2. Specific Scenarios: Multiple Views Publishing: Concept: Publishing the same dataset through different views (e.g., different selections or
projections). Challenge: Ensuring that privacy is maintained across multiple views, as combining these views might reveal information not
apparent in any single view.  Anonymizing Sequential Releases with New Attributes: Concept: Handling the situation where data is
released incrementally, with new attributes added over time. Challenge: Maintaining privacy as new information is added, potentially revealing
previously hidden information. Anonymizing Incrementally Updated Data Records: Concept: Dealing with datasets that are updated over
time, requiring techniques to ensure that privacy is maintained throughout the updates.   Challenge: Ensuring that updates do not
compromise the privacy of previously published data.  Collaborative Anonymization: Concept: Addressing scenarios where data is shared
or published across different parties, requiring collaborative anonymization techniques.   Challenge: Ensuring that privacy is maintained
when data is shared or combined across different datasets.  Anonymizing Complex Data: Concept: Extending anonymization techniques to
complex data types like transaction data, trajectory data, social network data, and textual data.   Challenge: Adapting anonymization
methods to the specific characteristics and privacy threats associated with these data types.  Interactive Query Model: Concept: Allowing
data recipients to interact with the data publisher by submitting queries and receiving responses, enabling data mining tasks.   Challenge:
Ensuring that the data publisher can protect privacy while allowing interactive queries.  Non-Interactive Query Model: Concept: Publishing
a pre-processed dataset that can be queried without direct interaction with the data publisher.   Challenge: Balancing privacy and utility in
the pre-processing stage. 3. Key Considerations in Extended Data Publishing: Privacy Models: Understanding different privacy models (e.g., k-
anonymity, l-diversity) and their strengths and weaknesses. Attack Models: Understanding potential privacy attacks and designing
anonymization techniques that can mitigate these attacks. Utility Metrics: Evaluating the utility of the published data after anonymization,
ensuring that it remains useful for intended data mining tasks. Efficiency and Scalability: Designing efficient and scalable anonymization
algorithms that can handle large datasets.

 Data Quality: Ensuring that the anonymization process does not significantly distort the data, potentially impacting the accuracy of
data mining results.

Anonymization for Data Mining:

Data anonymization is the process of obscuring

or removing personally identifiable information
(PII) from a dataset in order to protect the
privacy of the people associated with that data.
The anonymization of data makes it impossible
to recognize individuals from their data, while
keeping the information functional for software
testing, data analysis, or other legitimate
purposes.
Data anonymization transforms PII and
sensitive data in such a way that it can’t easily
be linked to aspecific individual. In other words,
it reduces the risk of re-identification, in order to comply with data privacy laws and heighten security.
The anonymization process typically involves data masking of PII data, such as names, addresses, telephone numbers, passport details, or Social
Security Numbers. Towards this end, values are replaced or removed, by using cryptographic techniques, or adding random noise, in order to
protect the data.
Anonymized data can’t guarantee complete anonymity, with the threat of re-identification, particularly when the anonymized data is combined
with publicly available sources. Therefore, data teams must carefully consider the risks and limitations of their data anonymization tools and
processes when working with personal or sensitive data.
The role data anonymization plays in protecting personal privacy
Data anonymization play a critical role in protecting personal privacy by preventing the exposure, and exploitation, of people’s sensitive
information.
With the ever-increasing amounts of data being collected and stored, the risk that personal information could be accessed and misused – without
someone’s knowledge or consent – is greater than ever.
When personal information is violated, not only is it a breach of security for the organization, but, more importantly, a breach of trust for the
customer or consumer. Such attacks can lead to wide-ranging privacy violations, including breach of contract, discrimination, and identity theft.
By hiding or deleting the PII from datasets, data anonymization severely limits the ability of unauthorized users to access, or use, personal
information. In addition to preventing privacy breaches, and protecting the rights of the individual, data anonymization enables organizations to
comply with data privacy regulations – like APPI, CPRA, DCIA, GDPR, HIPAA, PDP, SOX, and more – which require companies to take
preventative measures to protect an individual's confidential data.

Just as important, even after data is anonymized, it can still be used for analysis purposes, business insights, decision-making, and research –
without ever revealing anyone’s personal information.
Types of data anonymization
There are 6 basic types of data anonymization, including:
1. Data masking
Data masking software replaces sensitive data, such as credit card numbers, driver’s license numbers, and Social Security Numbers, with either
meaningless characters, digits, or symbols – or seemingly realistic, but fictitious, masked data. Masking test data makes it available for
development or testing purposes, without compromising the privacy of the original information.
Data masking can be applied to a specific field, or to entire datasets, using a variety of techniques such as character substitution, data shuffling,
and truncation. Data can be masked on demand or according to a schedule. The data masking suite includes data tokenization, which irreversibly
substitutes personal data with random placeholders, and synthetic data generation, when the amount of production data is insufficient.
2. Pseudonymization
Pseudonymization anonymizes data by replacing any identifying information with a pseudonymous identifier, or pseudonym. Personal
information that is commonly replaced includes names, addresses, and Social Security Numbers.
Pseudonymized data reduces the risk of PII exposure or misuse, while still allowing the dataset to be used for legitimate purposes. In
the pseudonymization vs anonymization equation, the former is reversible (unlike data tokenization solutions), and is often used in combination
with other privacy-enhancing technologies, such as data masking vs encryption.
3. Data aggregation
Data aggregation, which combines data collected from many different sources into a single view, is used to gain insights for enhanced decision-
making, or analysis of trends and patterns. Data can be aggregated at different levels of granularity, from simple summaries to complex
calculations, and can be done on categorical data, numerical data, and text data.
Aggregated data can be presented in various forms, and used for a variety of purposes, including analysis, reporting, and visualization. It can
also be done on data that has been pseudonymized, or masked, to further protect individual privacy.
4. Random data generation
Random data generation, which randomly shuffles data in order to obscure sensitive information, can be applied to an entire dataset, or to
specific fields or columns in a database.
Often used together with data masking tools or data tokenization tools, random data generation is ideal for clinical trials, to ensure that the
subjects are not only randomly chosen, but also randomly assigned to different treatment groups. By combining different types of data
anonymization, bias is reduced, while the validity of the results is increased.
5. Data generalization
Data generalization, which replaces specific data values with more generalized values, is used to conceal PII, such as addresses or ages, from
unauthorized parties. It substitutes categories, ranges, or geographic areas for specific values.
For example, a specific address, like 1705 Fifth Avenue, can be generalized to downtown, midtown or uptown. Similarly, the age 55 can be
generalized to an age group called 50-60, or middle-aged adults.
6. Data swapping
Data swapping replaces real data values with fictitious, but similar, ones. For instance, a real name, like Don Johnson, can be swapped with a
fictitious one, like Robbie Simons. Or a real address, like 186 South Street, can be swapped with a fictitious one, like 15 Parkside Lane. Data
swapping is similar to the random data generator, but rather than shuffling the data, it replaces the original values with new, fictitious ones.

publishing social science data:

Publishing social science data involves making your research findings, including datasets, accessible to other researchers and the public, often
through data archives or publications, while adhering to ethical guidelines and ensuring data quality and reproducibility.
Here's a more detailed explanation:
Why Publish Social Science Data?

 Promotes Transparency and Accountability:

Sharing data allows others to verify your findings and identify potential errors or biases.
  Facilitates Replication and Meta-Analysis:
Other researchers can use your data to replicate your study or conduct meta-analyses, contributing to a broader understanding of the topic.
  Stimulates Further Research:
Published datasets can serve as a foundation for new research questions and analyses.
  Enhances Collaboration:
Sharing data can lead to collaborations and knowledge sharing among researchers.
Methods for Publishing Social Science Data

 Data Archives:

Many institutions and organizations maintain data archives specifically for social science data, such as ICPSR (Inter-university
Consortium for Political and Social Research).

  Journal Articles:
You can publish your research findings, including a description of your data and methods, in academic journals.
  Data Repositories:
Platforms like Harvard Dataverse and the Qualitative Data Repository (QDR) are designed for storing and sharing qualitative and multi-method
research data.
  Open Data Platforms:
Platforms like the Data Resource Center for Child & Adolescent Health and the Cultural Policy and the Arts National Data Archive (CPANDA)
provide access to specific types of social science data.
  Supplementary Materials:
Journals may allow you to include data as supplementary materials to your articles.
Ethical Considerations and Best Practices

 Data Privacy and Confidentiality: Ensure that you protect the privacy of your participants and comply with relevant regulations.

  Data Quality and Accuracy: Ensure that your data is accurate, reliable, and properly documented.
  Data Documentation: Provide clear and comprehensive documentation about your data, including variable definitions, data collection
methods, and any limitations.
  Data Access and Use: Specify how others can access and use your data, including any restrictions or permissions required.
  Data Citation: Encourage others to cite your data when they use it in their research.
  Data Sharing Policies: Be aware of any data sharing policies or requirements of your funding agency or institution.
UNIT -V

Continuous user activity monitoring (like in search logs, location traces, energy monitoring), social networks,

recommendation engines and targeted advertising.

Continuous user activity monitoring:

Continuous user activity monitoring (UAM), also known as user access monitoring, involves tracking and analyzing user actions across devices,
networks, and websites to identify potential security risks, ensure compliance, and improve overall security posture.
Here's a more detailed explanation:

 What it is: UAM is a comprehensive system that logs and tracks user actions, including computer activity, screenshots, keystrokes,
and application usage.
  Why it's important: Security: UAM helps identify and prevent insider threats, whether intentional or unintentional, and can help detect
and mitigate security breaches.
  Compliance: It ensures compliance with company policies, data privacy regulations, and other relevant standards.
  Productivity: Monitoring user activity can help identify areas for improvement in productivity and resource utilization.
  Fraud Detection: UAM can be used to detect and prevent fraudulent activities by tracking user behavior and identifying unusual
patterns.
 What it monitors: User actions: Includes accessing files, sending emails, visiting websites, using applications, and other activities within
the organization's systems.
  Data access: Tracks which users are accessing which files, and when, and how much data they transfer.
  System activity: Monitors network usage, software interactions, and other system-level events.
 Benefits: Enhanced Security: Protects sensitive data and prevents unauthorized access.
  Improved Compliance: Ensures adherence to company policies and regulations.
  Better Risk Management: Helps identify and mitigate potential risks, including insider threats.
  Evidence for Investigations: Provides a record of user activity that can be used in investigations and legal proceedings.
n Penumur, Andhra Pradesh, on April 04, 2025, you can use "logs, metrics, and traces" as the three pillars of observability to monitor and
analyze systems, applications, and infrastructure, including search logs, location traces, and energy monitoring data.
Here's a breakdown of how each pillar contributes to observability:

 Logs: Logs record detailed information about events, including errors, warnings, and other exceptional situations, providing a
chronological record of system activity.
 Metrics: Metrics capture quantifiable measurements of system health and performance, such as response times, CPU usage, and
memory consumption, allowing for the identification of performance bottlenecks and resource utilization.
 Traces: Traces track the flow of requests across multiple services in a distributed system, enabling the identification of performance
bottlenecks, errors, and latency issues.
Examples of how these pillars can be used:

 Search Logs: Analyzing search logs can help identify popular search terms, user behavior patterns, and potential issues with the
search functionality.
 Location Traces: Location traces can be used to track the movement of objects or individuals, identify patterns of activity, and
optimize resource allocation.
 Energy Monitoring: Energy monitoring data can be used to track energy consumption patterns, identify inefficiencies, and optimize
energy usage.
By combining these three pillars, you can gain a comprehensive understanding of your systems, applications, and infrastructure, enabling you to
identify issues, optimize performance, and improve user experience

Social networks :
Social networks are online platforms and apps that facilitate connection, communication, and relationship building among users and
organizations, encompassing a wide range of activities from sharing information to forming communities.
Here's a more detailed look at social networks:

 Definition: Social networks are websites and

applications designed to enable users to
connect, communicate, share information,
and build relationships.

  Purpose: They serve as a means for individuals

and organizations to connect with friends, family, colleagues, and others with shared interests or goals.
  Examples: Popular social networks include Facebook, YouTube, Instagram, WhatsApp, TikTok, WeChat, Facebook Messenger,
Telegram, and Snapchat.
  Functionality: These platforms often feature functionalities like: Creating profiles and connecting with others
  Sharing content (text, images, videos, etc.)
  Engaging in discussions and conversations
  Participating in groups or communities
  Discovering new information and content
 Impact:
Social networks have a significant impact on how people communicate, share information, and form relationships in the digital age.

Recommendation engines and targeted advertising:

. Recommendation engines, which use machine learning to suggest relevant content or products, are increasingly used in targeted advertising to
personalize and improve ad campaigns, leading to higher engagement and conversions.
Here's a more detailed explanation:
What are Recommendation Engines?

 Recommendation engines, also known as recommender systems, are AI systems that suggest items or content to users based on their
interests and behavior.

  They leverage machine learning algorithms to analyze user data (like browsing history, past purchases, and interactions) and predict what
a user might find useful or relevant.
  Examples include Netflix suggesting movies, Amazon recommending products, and Google suggesting search queries.
  They help users discover content, products, or services they might not have found on their own.
How Recommendation Engines are Used in Targeted Advertising

 Personalized Ad Targeting: Recommendation engines can analyze user data to understand their preferences and interests, allowing
advertisers to target specific audiences with relevant ads.
   Improved Ad Relevance: By showing users ads that align with their interests, recommendation engines increase the chances of
users engaging with and clicking on those ads.
   Enhanced User Experience: Personalizing ads based on user preferences can lead to a more positive and engaging user
experience, which can encourage repeat visits and purchases.

  Increased Conversions: Targeted ads that resonate with users are more likely to lead to conversions, such as purchases, sign-ups, or
clicks.
  Examples: E-commerce: Recommending products similar to those a user has viewed or purchased in the past.
  Streaming services: Suggesting movies or shows based on a user's viewing history.
  Social media: Showing users ads for products or services that align with their interests and activities.
 Benefits for Advertisers:

 Higher ROI: Targeted ads can lead to a better return on investment (ROI) for advertisers.

  More Effective Campaigns: Recommendation engines help advertisers create more effective and relevant campaigns.
  Improved User Engagement: Personalizing ads can lead to higher user engagement and brand loyalty.

Gaussian Differential Privacy
No ratings yet
Gaussian Differential Privacy
86 pages
Dokumen - Pub Guide To Data Privacy Models Technologies Solutions 9783031128363 9783031128370
No ratings yet
Dokumen - Pub Guide To Data Privacy Models Technologies Solutions 9783031128363 9783031128370
323 pages
Programmingdp
No ratings yet
Programmingdp
124 pages
Privacy Book
No ratings yet
Privacy Book
281 pages
Data Science and Ethical Issues
No ratings yet
Data Science and Ethical Issues
42 pages
Introduction To Privacy Preserving Data Publishing Concepts and Techniques Chapman Hall CRC Data Mining and Knowledge Discovery Series
No ratings yet
Introduction To Privacy Preserving Data Publishing Concepts and Techniques Chapman Hall CRC Data Mining and Knowledge Discovery Series
355 pages
Network Coding and Signcryption for Cloud Data Integrity
From Everand
Network Coding and Signcryption for Cloud Data Integrity
Noah Joan
No ratings yet
SPEML SS2023-Lecture Anonymisation
No ratings yet
SPEML SS2023-Lecture Anonymisation
101 pages
Don't Look at The Data! How Differential Privacy Reconfigures The Practices of Data Science
No ratings yet
Don't Look at The Data! How Differential Privacy Reconfigures The Practices of Data Science
19 pages
Differential Privacy
No ratings yet
Differential Privacy
40 pages
The Algorithmic Foundations of Differential Privacy
No ratings yet
The Algorithmic Foundations of Differential Privacy
281 pages
14 Module Six Privacy
No ratings yet
14 Module Six Privacy
45 pages
Data 102 Fall 2023 Lecture 24 - Privacy in Machine Learning
No ratings yet
Data 102 Fall 2023 Lecture 24 - Privacy in Machine Learning
46 pages
Database Anonymization
No ratings yet
Database Anonymization
138 pages
Fascination: Honeypots and Cybercrime
From Everand
Fascination: Honeypots and Cybercrime
Armin Snyder
No ratings yet
Privacy Models Differential Privacy I
No ratings yet
Privacy Models Differential Privacy I
27 pages
Distributed DP in Mixnets
No ratings yet
Distributed DP in Mixnets
38 pages
A Survey On Differential Privacy For Unstructured Data Content
No ratings yet
A Survey On Differential Privacy For Unstructured Data Content
28 pages
Correlated-Output Differential Privacy and Applications To Dark Pools
No ratings yet
Correlated-Output Differential Privacy and Applications To Dark Pools
26 pages
Survey On Privacy-Preserving Techniques For Data Publishing
No ratings yet
Survey On Privacy-Preserving Techniques For Data Publishing
35 pages
Data Science Ethics - Lecture 5 - Privacy in Data Preprocessing and Modeling
No ratings yet
Data Science Ethics - Lecture 5 - Privacy in Data Preprocessing and Modeling
23 pages
A Survey On Differentially Private Machine Learning Review Article
No ratings yet
A Survey On Differentially Private Machine Learning Review Article
16 pages
Research Proposal
No ratings yet
Research Proposal
17 pages
w9 Differential Privacy
No ratings yet
w9 Differential Privacy
30 pages
02 Synopsis
No ratings yet
02 Synopsis
16 pages
Research Paper 3
No ratings yet
Research Paper 3
20 pages
Differential Privacy For Deep and Federated Learning A Survey
No ratings yet
Differential Privacy For Deep and Federated Learning A Survey
22 pages
Microsoft Word Basiccivilandmechanicalengineering Basic Mechanical Engineering Notes
No ratings yet
Microsoft Word Basiccivilandmechanicalengineering Basic Mechanical Engineering Notes
76 pages
1differentially Private Federated Learning With An Adaptive Noise Mechanism
No ratings yet
1differentially Private Federated Learning With An Adaptive Noise Mechanism
14 pages
Data Science Ethics - Lecture 5 - Privacy in Data Preprocessing and Modeling
No ratings yet
Data Science Ethics - Lecture 5 - Privacy in Data Preprocessing and Modeling
23 pages
Differential Privacy
No ratings yet
Differential Privacy
22 pages
Privacy Enhancing Technologies
No ratings yet
Privacy Enhancing Technologies
18 pages
Wifs 2024
No ratings yet
Wifs 2024
6 pages
Waye Lucas
No ratings yet
Waye Lucas
8 pages
Preserving and Randomizing Data Responses in Web Application Using Differential Privacy
100% (1)
Preserving and Randomizing Data Responses in Web Application Using Differential Privacy
9 pages
May2024 (Volume45Issue01) IJCSMS Ankush 6 14
No ratings yet
May2024 (Volume45Issue01) IJCSMS Ankush 6 14
9 pages
Differential Privacy
No ratings yet
Differential Privacy
56 pages
WDS Unit 5 Notes
No ratings yet
WDS Unit 5 Notes
20 pages
A Statistical Framework For Differential Privacy
No ratings yet
A Statistical Framework For Differential Privacy
16 pages
DSP CT 3
No ratings yet
DSP CT 3
10 pages
Privacy-Preserving Data Mining: Methods, Metrics, and Applications
No ratings yet
Privacy-Preserving Data Mining: Methods, Metrics, and Applications
21 pages
CYBER SECURITY HANDBOOK Part-2: Lock, Stock, and Cyber: A Comprehensive Security Handbook
From Everand
CYBER SECURITY HANDBOOK Part-2: Lock, Stock, and Cyber: A Comprehensive Security Handbook
Poonam Devi
No ratings yet
Private Linear Programming
No ratings yet
Private Linear Programming
8 pages
Big Data Analysis and Perturbation Using Data Mining Algorithm
No ratings yet
Big Data Analysis and Perturbation Using Data Mining Algorithm
10 pages
Data Privacy Preservation Using Differential Privacy and Re-Identification Attacks
No ratings yet
Data Privacy Preservation Using Differential Privacy and Re-Identification Attacks
6 pages
Assess Impact of Differential Privacy On Model Performance
No ratings yet
Assess Impact of Differential Privacy On Model Performance
6 pages
Privacy-Preserving Machine Learning: Techniques To Ensure The Privacy of Sensitive Data While Using It For Machine Learning Tasks
No ratings yet
Privacy-Preserving Machine Learning: Techniques To Ensure The Privacy of Sensitive Data While Using It For Machine Learning Tasks
8 pages
Research Paper 2
No ratings yet
Research Paper 2
9 pages
2012 Ijact
No ratings yet
2012 Ijact
7 pages
Privacy Preserving Data Mining
No ratings yet
Privacy Preserving Data Mining
10 pages
Data Science Notes A
No ratings yet
Data Science Notes A
4 pages
Air Conditioner
No ratings yet
Air Conditioner
40 pages
FOD New Paper
No ratings yet
FOD New Paper
5 pages
CEH: Certified Ethical Hacker v11 : Exam Cram Notes - First Edition - 2021
From Everand
CEH: Certified Ethical Hacker v11 : Exam Cram Notes - First Edition - 2021
IP Specialist
No ratings yet
Big Data Project Privacy and FL DP
No ratings yet
Big Data Project Privacy and FL DP
3 pages
CSE3gn20 - Summer23 Assignment 1 - SRJ
No ratings yet
CSE3gn20 - Summer23 Assignment 1 - SRJ
8 pages
Paper 16
No ratings yet
Paper 16
4 pages
Decentralizing Privacy: Using Blockchain To Protect Personal Data
No ratings yet
Decentralizing Privacy: Using Blockchain To Protect Personal Data
5 pages
Privacy-Preserving Data Analysis - A Survey
No ratings yet
Privacy-Preserving Data Analysis - A Survey
3 pages
Fod New Paper 1
No ratings yet
Fod New Paper 1
4 pages
Paper
No ratings yet
Paper
1 page
Survey On Incentive Compatible Privacy Preserving Data Analysis Technique
No ratings yet
Survey On Incentive Compatible Privacy Preserving Data Analysis Technique
5 pages
Are Not Being Transferred To SAP ERP. What Should You Check To Solve The Problem?
50% (2)
Are Not Being Transferred To SAP ERP. What Should You Check To Solve The Problem?
30 pages
CS619 Final Project Viva Notes
100% (1)
CS619 Final Project Viva Notes
25 pages
An Overview On Privacy Preserving Data Mining Methodologies
No ratings yet
An Overview On Privacy Preserving Data Mining Methodologies
5 pages
AI-900 Killexam
0% (1)
AI-900 Killexam
4 pages
E Commerce PDF
No ratings yet
E Commerce PDF
254 pages
Designed Mcqs With and (CSE320)
No ratings yet
Designed Mcqs With and (CSE320)
26 pages
Free It Material
100% (1)
Free It Material
15 pages
National Finals Flipkart Wired 4.0
No ratings yet
National Finals Flipkart Wired 4.0
28 pages
SDLC
No ratings yet
SDLC
25 pages
Information Systems in Organizations 1st Edition Patricia Wallace Test Bank 1
100% (84)
Information Systems in Organizations 1st Edition Patricia Wallace Test Bank 1
33 pages
Pemetaan Digital (Digital Mapping) : Dodi Sukmayadi Wiradisastra
No ratings yet
Pemetaan Digital (Digital Mapping) : Dodi Sukmayadi Wiradisastra
18 pages
Answer Key (WBP)
No ratings yet
Answer Key (WBP)
14 pages
Openshift Container Platform 4.4: Architecture
No ratings yet
Openshift Container Platform 4.4: Architecture
54 pages
Vacancy Details 10-12-2024.Xlsx Scope Guide List 2025
No ratings yet
Vacancy Details 10-12-2024.Xlsx Scope Guide List 2025
13 pages
Chapter 2: Basic Computer Operations: 1) Computer Accepts Data or Instructions by Way of Input, (INPUT)
No ratings yet
Chapter 2: Basic Computer Operations: 1) Computer Accepts Data or Instructions by Way of Input, (INPUT)
9 pages
Proposal Business For IT Sectors
No ratings yet
Proposal Business For IT Sectors
16 pages
Sample Model Exam
No ratings yet
Sample Model Exam
10 pages
Displaying The SAP Directories
No ratings yet
Displaying The SAP Directories
8 pages
Strengthening Cybersecurity in Nigerian Libraries: Challenges, Mitigation Strategies, and Future Trends (WWW - Kiu.ac - Ug)
100% (1)
Strengthening Cybersecurity in Nigerian Libraries: Challenges, Mitigation Strategies, and Future Trends (WWW - Kiu.ac - Ug)
5 pages
2021 Backend Roadmap - Light
100% (1)
2021 Backend Roadmap - Light
3 pages
Upendrasap MM
No ratings yet
Upendrasap MM
3 pages
DBA Stackexchange Com Questions 12360 What Is The Optimal Wa
No ratings yet
DBA Stackexchange Com Questions 12360 What Is The Optimal Wa
7 pages
Landmark License Application Manager (LAM) Installation Summary
No ratings yet
Landmark License Application Manager (LAM) Installation Summary
4 pages
Creating A Connection SAP
No ratings yet
Creating A Connection SAP
5 pages
1571217737553resume Vignesh
No ratings yet
1571217737553resume Vignesh
4 pages
جامع الدرر البهية لأنساب القرشيين في البلاد الشامية PDF
No ratings yet
جامع الدرر البهية لأنساب القرشيين في البلاد الشامية PDF
1 page
Azure OpenAI Infographic
No ratings yet
Azure OpenAI Infographic
1 page
Abhinavreddy Resume Upadated1 0 2-1
No ratings yet
Abhinavreddy Resume Upadated1 0 2-1
1 page
Best Practices On SQL Server Security
No ratings yet
Best Practices On SQL Server Security
2 pages
Create A New Project - Jira Software Cloud - Atlassian Support
No ratings yet
Create A New Project - Jira Software Cloud - Atlassian Support
2 pages
PremiumpaymentReceipt 70473265
No ratings yet
PremiumpaymentReceipt 70473265
1 page
PCI Data Security Standard (PCI DSS) : Case Study
No ratings yet
PCI Data Security Standard (PCI DSS) : Case Study
1 page

Privacy and

Uploaded by

Privacy and

Uploaded by

UNIT – I

Methods like k-anonymity, l-diversity, and randomization can be used to

Secure multi-party computation (SMPC) is a cryptographic technique that

 SMPC distributes a computation across multiple parties.

 The first practical application of SMPC at scale was in a sugar beet

 The client encrypts a query and sends it to a server.

Benefits of homomorphic encryption

 Partially homomorphic: Only allows one operation, like addition or multiplication.

  Input and Output Privacy:

Input privacy focuses on safeguarding the data that is fed into a

implementing privacy-preserving machine learning techniques involves several challenges.

Data anonymization is the process of protecting private or sensitive

Data Anonymization Techniques

Information metrics :Information metrics are quantifiable measures used to

Anonymization methods for the transaction data

 Sensors capture the position of an object to create raw trajectory data

Where is trajectory data used?

Researchers use trajectory data to study traffic patterns and develop

 Urban planning :Researchers use trajectory data to study urban dynamics

 The path of a thrown ball or rock

 Criminology and Terrorism Analysis:

Here's a more detailed explanation:

Collaborative anonymization involves a group of data providers

  Why it's used:

Key Aspects of Access Control for Outsourced Data:

 The combination of fragmentation and encryption provides a layered security approach.

Security and Privacy in OLAP systems:-

Anonymization for Data Mining:

Data anonymization is the process of obscuring

publishing social science data:

 Promotes Transparency and Accountability:

recommendation engines and targeted advertising.

Continuous user activity monitoring:

 Definition: Social networks are websites and

  Purpose: They serve as a means for individuals

Recommendation engines and targeted advertising:

You might also like