Ethical And Professional Issues In Cybersecurity And Cloud Computing (Privacy)

Summary

This document covers ethical and professional issues in cloud computing and cybersecurity, focusing on the concept of privacy. It discusses different types of privacy concerns, real-world privacy breaches, and anonymization techniques. The document is a set of lecture slides.

Full Transcript


1305202 ETHICAL AND PROFESSIONAL ISSUES IN CYBERSECURITY AND CLOUD COMPUTING
Module Four – Part 1: Privacy
First Term 2022/2023
Faculty of Information Technology, Applied Science Private University

Outline
 Privacy-Aware Computing
 Data Anonymization
 Differences between Security and Privacy
 Privacy Protection
 Threats to Privacy

Learning Objectives
After finishing this module, you will be able to:
 Define privacy-aware computing.
 Identify data anonymization.
 Define privacy protection.
 Define threats to privacy.

Introduction to Privacy-Aware Computing

Parties concerned with privacy:
 Individual privacy: customer data; public data (census data, voting records); health records; locations; online activities; ...
 Organization privacy: business secrets; data whose sharing is prevented by legal issues; ...

Cases of privacy-aware computing

1. Public use of private data:
 Data mining enables knowledge discovery on large populations, but people are unwilling to release personal information due to privacy concerns.
 The Centers for Disease Control want to identify disease outbreaks by pooling multiple datasets that contain patient information.
 Insurance companies hold data on disease incidents, patient backgrounds, and so on. Personal medical records would help them maximize profits, but customers will not be happy with that.

2. Industry collaborations / trade groups:
 An industry trade group may want to identify best practices to help its members, but some practices are trade secrets.
 How do we provide "commodity" results to all (e.g., manufacturing with chemical supplies from supplier X has high failure rates) while still preserving secrets (e.g., manufacturing process Y gives low failure rates)?
 Such secrets include products and their chemical compositions, as with Pepsi and others.
3. Web search:
 Search engine companies keep cookies and search history, which can be used to derive personal information.
 The AOL dataset: in 2006, AOL (America Online) released a portion of its search logs, intending for researchers to analyze them and improve search algorithms. Unfortunately, the dataset contained personally identifiable information, and AOL faced criticism for insufficient anonymization. The dataset was withdrawn, and the incident raised awareness of the importance of privacy when releasing such data.

4. Social networking:
 When you use social networks, you leave a trace of personal data and interactions.
 Companies can use that data for ad targeting; there is a risk of privacy breaches and personal data abuse.
 The Facebook and Cambridge Analytica scandal: Facebook exposed data on up to 87 million users to a researcher who worked at Cambridge Analytica, which worked for the Trump campaign.

5. Mobile computing:
 When you allow Google Latitude to trace your locations, you lose location privacy.

6. Cloud computing:
 Users have to outsource data to the cloud.
 The data can be sensitive (personal information, customer records, patient info, ...).

7. Collaborative computing: a hybrid of centralized (client/server) and decentralized (P2P) computing.
 Collaborative data mining: share the model, but not individual records.

Note: Collaborative computing, also known as collaborative software or groupware, refers to technology that enables people to work together on a common task or project, often in real time or asynchronously. It involves the use of computer systems and software to facilitate communication, coordination, and cooperation among individuals or groups.
The goal of collaborative computing is to enhance productivity, efficiency, and creativity by promoting teamwork and shared decision-making.

Major research areas
 Microdata publishing: anonymizing data for statistical analysis and modeling; privacy-preserving data mining.
 Data outsourcing: cloud computing.
 Databases: statistical databases; private information retrieval.

Major technical challenges
 Techniques of privacy preservation.
 Privacy evaluation.
 The tradeoff between privacy and data utility.

"Privacy preservation" (or "data privacy") is the field comprising the technologies and methods used to protect individuals' privacy and secure data during processing or transmission. Privacy preservation plays a crucial role in ensuring data security and respecting individuals' privacy, especially in an era of ever-increasing technology usage and data exchange.

Some branches of privacy preservation:
1. Encryption: used to secure data and make it unreadable to unauthorized parties.
2. Private information retrieval: techniques that enable individuals to query data without revealing their identity or the actual query.
3. Database privacy: implementing privacy protection measures within databases, such as statistical databases.
4. Data anonymization: hiding or removing specific information that could be used to identify individuals, thereby protecting their identities.
5. Data perturbation: generalizing data and introducing perturbations that make it challenging to identify actual individuals in the dataset.
6. Data pseudonymization: a privacy-enhancing technique that protects sensitive information by replacing or encrypting personally identifiable information (PII) with artificial identifiers, or pseudonyms.

Privacy Preservation (1. Encryption)
 Homomorphic encryption is a cryptographic technique that allows computations, including complex mathematical operations, to be performed on encrypted data without requiring decryption. It addresses the scenario "I don't trust the server": the server computes on ciphertexts it cannot read.

Privacy Preservation (2. Private Information Retrieval)
 Private information retrieval (PIR) is a cryptographic technique whose goal is to enable users to access data privately, even when interacting with a third-party database owner.
 PIR allows users to interact with databases while keeping their specific queries private: the cryptographic techniques involved ensure that the database owner cannot determine the content of the user's query, even though the requested information is successfully retrieved.

Privacy Preservation (3. Database Privacy)
 Database privacy refers to the protection of sensitive and personally identifiable information (PII) stored within databases from unauthorized access, use, disclosure, and manipulation.
 It includes various strategies, technologies, and policies aimed at safeguarding the privacy and confidentiality of data stored in databases.
 This is particularly important because databases often contain a wealth of information about individuals, organizations, or entities, and unauthorized access to this information can lead to privacy breaches and potential harm.

What Is Data Anonymization?
 Data anonymization is the process of protecting private or sensitive information by erasing or encrypting the identifiers that connect an individual to stored data: it retains the data but keeps the source anonymous.
 For example, you can remove or encrypt personally identifiable information (PII) such as names, national numbers, tax numbers, social security numbers, and addresses.
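As a minimal illustration of this idea, the following sketch removes the identifier fields from a record. The field names and the drop-only strategy are illustrative assumptions, not something the slides prescribe:

```python
# Minimal sketch of data anonymization: strip identifier fields from a
# record. The PII field names below are illustrative assumptions.

PII_FIELDS = {"name", "national_number", "tax_number", "ssn", "address"}

def anonymize(record: dict) -> dict:
    """Return a copy of the record with PII fields removed."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}

patient = {
    "name": "John Smith",
    "ssn": "123-45-6789",
    "diagnosis": "flu",
    "age": 34,
}
print(anonymize(patient))  # -> {'diagnosis': 'flu', 'age': 34}
```

A real pipeline would also handle quasi-identifiers (such as birth date plus postcode), which is why the cross-referencing attacks described next remain a concern.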
 Attackers can use de-anonymization methods to retrace the anonymization process. Since data usually passes through multiple sources, some of them available to the public, de-anonymization techniques can cross-reference those sources and reveal personal information.

Data Anonymization Techniques
1) Data masking
2) Pseudonymization
3) Generalization
4) Data swapping
5) Data perturbation
6) Synthetic data

Data Anonymization Techniques (1. Data Masking)
Data masking hides data behind altered values:
1) You can create a mirror version of a database and apply modification techniques such as character shuffling, encryption, and word or character substitution to produce a sanitized (clean) version of the original dataset.
2) The sanitized or masked data retains the essential characteristics and relationships of the original data but does not expose sensitive information.
3) Data masking is commonly employed when databases or datasets must be shared for non-production purposes, such as testing software applications or conducting analytical research, without compromising the confidentiality of sensitive information.
4) The reverse process, often referred to as "unmasking" or "de-identification," may be possible through a secure and controlled mechanism.
5) With some techniques, data masking makes reverse engineering or detection impossible; for example, you can replace value characters with a symbol such as "*" or "x".

Why Is Data Masking Important?
Here are several reasons data masking is essential for many organizations:
1. It addresses several critical threats, such as insecure interfaces with third-party systems.
2. It reduces the data risks associated with cloud outsourcing.
3. It makes data useless to an attacker while maintaining many of its inherent functional properties.
4. It allows sharing data with authorized users, such as testers and developers, without exposing sensitive details.
5. It can be used for data sanitization*: normal file deletion still leaves traces of data on storage media, while sanitization replaces the old values with masked ones.

* Data sanitization is the process of permanently and irreversibly removing or destroying the data stored on a memory device to make it unrecoverable. A sanitized device has no usable data, and even advanced forensic tools cannot recover it.

Data Anonymization Techniques (2. Pseudonymization)
Pseudonymization is a data management and de-identification method that replaces private identifiers with fake identifiers, or pseudonyms*.
 It is a reversible process: it de-identifies data but allows re-identification later if necessary.
 It is a well-known data management technique highly recommended by the General Data Protection Regulation (GDPR) as one of its data protection methods.

* A pseudonym is an identifier associated with an individual. It can be a number, letter, special character, or any combination of those, tied to specific personal data or a specific individual, and it therefore makes data safer to use in a business environment.

Is pseudonymized data still personal data according to the GDPR? Yes. Since the process is reversible and, with the proper key, you can identify the individual, a pseudonym is still considered personal data under the GDPR.

Example: Consider sending Excel sheets containing sensitive data via e-mail. Although the sender and receiver of the e-mails are authorized to access that information, your IT support also has access to those e-mails.
Now imagine those sheets contained upper-management bonuses or information about company salaries. When the data is pseudonymized, there is far less chance of exposing personal data, since pseudonymization makes each record unidentifiable while keeping it suitable for data processing and analysis. For example, replace the identifier "John Smith" with "Mark Spencer". Pseudonymization preserves statistical accuracy and data integrity, allowing the modified data to be used for training, development, testing, and analytics while protecting data privacy.

Anonymization vs. Pseudonymization
 With pseudonymization, if you are authorized to access the information, you hold the key that enables you to re-identify the data.
 Anonymization, by contrast, irreversibly alters data so that an individual is no longer identifiable, directly or indirectly.
 Both methods are highly recommended. The choice depends on many factors: the use case, the degree of risk, how data is processed within your company, and so on. The best method for you is determined by the purpose of processing, the type of data you process, and the data-breach risk it poses.
 Compared to anonymization, pseudonymization is the more sophisticated option, since it leaves you a key to "unlock" the data. The data is then not directly identifying, yet it is not anonymized either, so it does not lose its original value.

Recommendations for Pseudonymization
 The recommendation to anonymize personal data in non-production* environments and to use pseudonymization in production** environments arises from the need to balance the usability of data for various purposes against the protection of individuals' privacy and sensitive information.
 Datasets with anonymized personal information are still perfectly good for development, statistics, and analytics.
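A reversible pseudonym mapping of the kind described above can be sketched as follows. The key-table design and the "PSEUDO-" token format are illustrative assumptions; in production, the key table would live in a separately secured store:

```python
import secrets

# Minimal sketch of reversible pseudonymization: real identifiers are
# replaced by random tokens, and a separately stored key table allows
# authorized re-identification. The design here is an assumption for
# illustration, not a prescribed GDPR mechanism.

key_table = {}  # pseudonym -> original value; must be stored securely

def pseudonymize(value: str) -> str:
    """Replace a real identifier with a random pseudonym."""
    token = "PSEUDO-" + secrets.token_hex(4)
    key_table[token] = value
    return token

def re_identify(token: str) -> str:
    """Reverse the mapping; only holders of the key table can do this."""
    return key_table[token]

alias = pseudonymize("John Smith")
assert re_identify(alias) == "John Smith"
```

Deleting an entry from the key table is what turns a pseudonymized record into an anonymized one, which matches the "forgotten" behaviour described for production systems.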
 When designing data protection for live production systems, it is recommended to use pseudonymization. That way, only authorized users have access to data subjects' personal data. Once the lawful basis for processing a data subject's personal data no longer exists, the system deletes the pseudonym, leaving the data subject anonymized (forgotten).

* Anonymization in non-production environments (e.g., development, testing, training).
** Pseudonymization in production environments (e.g., delivering products or services to end users, customers, or clients).

Data Anonymization Techniques (3. Generalization)
Generalization is a data anonymization technique that replaces specific values in a dataset with more general or broader values, making it more difficult to identify specific individuals from the data.
 For example, you can remove the house number from an address while keeping the road name. The purpose is to eliminate some of the identifiers while retaining a measure of data accuracy.
 (The slides illustrate this with examples such as generalizing specific job titles into the broader category "STEM professional": Science, Technology, Engineering, Mathematics.)

Data Anonymization Techniques (4. Data Swapping)
Data swapping, also known as shuffling or permutation, rearranges the dataset's attribute values so that they no longer correspond to the original records.
 Swapping attributes (columns) that contain identifying values, such as date of birth, may have more impact on anonymization than swapping membership-type values*.

* Suppose we have a database for a sports club containing information about its members, including an attribute called "Membership Type" that indicates each person's subscription: "Silver Membership", "Gold Membership", "Regular Membership", and so on. Swapping this attribute has a lesser impact on individuals' privacy than swapping more sensitive attributes such as birth dates or full names.

Data Anonymization Techniques (5. Data Perturbation)
Data perturbation modifies the original dataset slightly by applying techniques that round numbers and add random noise.
 The perturbation must be kept in proportion to the range of the values: a noise base that is too small leads to weak anonymization, while one that is too large can reduce the utility of the dataset.

Data Anonymization Techniques (6. Synthetic Data)
Synthetic data is fake data that mimics real data. It is used to create artificial datasets instead of altering the original dataset, or using it as-is and risking privacy and security.
 What is synthetic data under the GDPR? Synthetic data is artificial data generated from original data by a model trained to reproduce the characteristics and structure of that original data. Synthetic data and original data should therefore deliver very similar results when subjected to the same statistical analysis.
 The process involves building statistical models based on patterns found in the original dataset; you can use standard deviations, medians, linear regression, or other statistical techniques to generate the synthetic data.
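The rounding-plus-noise perturbation described above can be sketched as follows. The noise base of ±2 and the fixed seed are illustrative assumptions; in practice the base must be balanced against the range of the data:

```python
import random

# Minimal sketch of data perturbation: round each value, then add
# bounded random noise. The noise base (+/- 2) is an illustrative
# assumption; too small gives weak anonymization, too large hurts
# the utility of the dataset.

def perturb(values, noise=2, seed=0):
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    return [round(v) + rng.randint(-noise, noise) for v in values]

ages = [34.6, 41.2, 29.9, 57.4]
print(perturb(ages))  # each output is within +/- 2 of the rounded input
```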
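The statistical approach to synthetic data described above can be sketched by fitting a mean and standard deviation to an original numeric column and sampling new values from a normal distribution. This is a deliberately simple model chosen for illustration; real synthetic-data generators are far more elaborate:

```python
import random
import statistics

# Minimal sketch of synthetic data generation: fit simple statistics
# (mean, standard deviation) to an original numeric column, then sample
# new values from a normal distribution with those parameters. The
# normal-distribution model is an assumption made for illustration.

def synthesize(column, n, seed=0):
    mu = statistics.mean(column)
    sigma = statistics.stdev(column)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

salaries = [52000, 61000, 58000, 49500, 70000]
fake = synthesize(salaries, n=5)
# The synthetic column should exhibit statistics similar to the
# original, while containing no real individual's salary.
```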
