Digital Society - Types of Data PDF

Document Details

Uploaded by TimelyCosmos

Kampala International School Uganda

2022

Bomfim et al

Tags

data types information technology data analysis digital society

Summary

This document discusses various types of data, including financial data, medical data, meteorological data, geographical data, and scientific data. It also explores data analysis techniques and provides real-world examples of their applications. The text is part of a wider Digital Society course and includes activities for research and communication.

Full Transcript

3.1B Types of data

Information technology (IT) systems operate in a range of contexts, for example cultural, economic, environmental and health, and are used to create, collect and store different types of data. A financial system may store quantitative data about a company’s sales, with numbers that can be computed, while a marketing department may conduct a survey to collect qualitative data about customer feedback, which is more descriptive. Now we will briefly introduce the different types of data that you will study in more detail under the different contexts.

Financial data

Financial data consists of information that is related to the finances of a business, such as cash flow statements, balance sheets, and profit and loss accounts. Specialized software is often used for financial data management to analyse, report and provide data visualization tools. Good financial data management also ensures that businesses meet existing regulations and legal requirements.

Links: This section links to Chapter 4.2 Economic.

Medical data

Medical data is collected, analysed and stored during the ongoing care of a patient. For example, hospitals keep electronic health records that are updated after each visit. Patients’ details may be entered on disease registries, which keep details of data for medical conditions such as Alzheimer’s, cancer, diabetes and asthma. Additionally, patients may register for clinical trials to take part in the testing of new treatments.

Links: This section links to Chapter 4.4 Health.

ATL ACTIVITY: Research

Complete the following research activity and document the process used. Conduct research to find out about the different types of data that your Ministry of Health collects. Find details about the statistics that it publishes. Write a step-by-step guide on the research process used.

Meteorological data

Instruments are used to collect data about the weather and climate.
Basic instruments include thermometers, rain gauges, barometers and anemometers. More sophisticated technologies include Doppler radar, which can detect precipitation, rotation of thunderstorm clouds, wind strength and direction, and tornado debris; radiosondes, which are launched into the air using weather balloons and collect data about the upper atmosphere; and weather satellites, which monitor the Earth from space and capture images that are then analysed.

3.1 Data and data analysis 59

ATL ACTIVITY: Research

Use effective research skills to download historical weather data. Ask the humanities or science teachers if your school has a weather station, or research weather data that is available for download (this data may be local or national weather, or weather from another country). Save the data in a suitable format (you will use this data later on in this chapter).

Meteorological data is used, for example, in weather forecasting and to predict extreme weather conditions, as well as in climate modelling.

Links: This section links to Chapter 4.3 Environmental.

Geographical data

Location data, also known as geospatial data, refers to data related to the positioning of an object in a geographic space. It is usually collected using global positioning system (GPS) technologies, which may be used by a phone to provide location services or provide data for mapping applications. Location data has a wide range of uses.

REAL-WORLD EXAMPLE: Accessing location data without authorization: Australian Federal Police (AFP)

According to Australian Computer Society’s Information Age, in 2021 the Australian Federal Police (AFP) were being investigated for accessing location data without gaining the correct authorization. The investigation covered a period of five years from 2015 to 2020 in which there were 1700 instances of police accessing location data, with compliance for only 100 of these.
https://ia.acs.org.au/article/2021/afp-misused-metadata-powers.html

Links: This section links to Chapter 4.3 Environmental.

ATL ACTIVITY: Communication

Explain in your own words how GPS technologies work. Conduct research into how GPS technologies work and their uses. Create a simple diagram to explain how GPS technologies provide location data. Describe what format the location data is presented in. Describe two real-life examples of when GPS data is used. Use your diagram to explain your understanding of GPS technologies to a friend.

60 Content

Scientific data

Scientific data refers to the research carried out by scientists that has been published in peer-reviewed journals. To support a hypothesis, a scientist must collect data either through an experiment or by observation. To automate the collection of data in an experiment a scientist may use sensors. Sensors are small devices used to measure a specific physical property and send the measurement as a signal to a computer. Usually the signal is an analogue (continuous) signal that needs to be converted to a digital signal before it can be understood by the computer. This is done using an analogue-to-digital converter (ADC). Examples of sensors include temperature, light, pressure, moisture, chemical and gas sensors.

ATL ACTIVITY: Thinking

Revisit an experiment that was conducted in one of your Group 4 subjects. Revisit your notes and data collected from the experiment, or re-run the experiment. Use this information to help answer these questions:
• What was the purpose of the experiment?
• Describe the tools used to collect the data, for example, sensors.
• What type of data was collected and what units were measured?
• What conclusions were drawn from the data?

Links: This section links to Chapter 3.6, 3.7 – IoT and 4.5 Human knowledge.

REAL-WORLD EXAMPLE: Citizen scientists

During 2020–21, there was a marked increase in bird watching, which generated an increase in data. Many people were working from home during this time due to the COVID-19 pandemic, and large numbers joined projects to collect and share data about birds in the form of pictures, sound recordings and observations. One such citizen-science project, Project Safe Flight, asked users to record birds injured by flying into windows, while eBird allowed citizens to update sightings of the different species of birds. In many cases, the number of people registered to these projects doubled, and so did the amount of data uploaded. From this data, scientists could see changes in bird behaviour, although it was not clear whether this could be attributed to the increase in observations, or whether the birds were actually changing their behaviour.

www.wired.com/story/pandemic-bird-watching-created-a-data-boom-and-a-conundrum

Metadata

In addition to storing data, IT systems also store data about the data they are storing, which is known as metadata. Metadata is a set of data that describes and gives information about other data. For example, a document may store details such as the author, the size of the file and the date it was created.

◆ Metadata: A set of data that describes and gives information about other data.

Links: This section links to Chapter 3.5 Media.

ATL ACTIVITY: Research

Use research skills to investigate different examples of metadata. For each of the following, write up the metadata found:
• a website
• a document
• an image
• a video.
Consider why this data might be useful.

3.1C Uses of data

◆ Data mining: The process of finding patterns and correlations, as well as anomalies, within large sets of data.

With storage costs declining and advances in storage technologies and artificial intelligence, organizations are becoming more able to identify trends and patterns in their data, which they can use to inform their decision-making. What once would have taken individuals months to compute can now be done at speed and with greater accuracy. Data mining is the term used to describe the
process of finding patterns and correlations, as well as anomalies, within large sets of data.

REAL-WORLD EXAMPLE: Data analysis in employment

Data is collected widely by both people and communities. In employment, for example, artificial intelligence can be used to analyse data generated by detailed questionnaires to identify which employees would be suitable for new job opportunities. In the health industry, data analysis can be used to determine staffing levels. Too many staff can lead to overspending on labour costs, while understaffing can create a stressful working environment and lower the quality of medical care. Data can be used to solve this issue.

In addition to analysing data within one set of data, data can be gathered from multiple sources of data in order to create new connections, determine new relationships and discover new information. Data matching is when two different sets of data are compared with the aim of finding data about the same entity. For example, data matching can be used to compare the prices of the same product on different platforms, or used in fraud detection when identifying suspicious transactions. Another example is in the medical field, where medical researchers have been able to find connections between environmental factors and diseases, such as exposure to the sun and skin cancer.

◆ Data matching: The process of comparing two different sets of data with the aim of finding data about the same entity.

ATL ACTIVITY: Communication

Domo is an organization that aims to bring people, data and systems into one place for a digitally connected business. The following infographics are from the resource centre on their website. Study the two infographics Data Never Sleeps 7.0 and Data Never Sleeps 8.0. List the data shown in each of the infographics. Describe the similarities and differences that you notice. From your knowledge of the apps, websites and terms in the infographics, suggest possible reasons for any changes.
[Figures: Data Never Sleeps 7.0; Data Never Sleeps 8.0]

3.1D Data life cycle

The data life cycle has five stages. Organizations and data scientists use it to manage the flow of data, which can improve efficiency as well as help with adherence to data governance regulations.

[Figure: Stages of the data life cycle: Data creation, Storage, Usage, Preservation, Destruction]

Stage 1: Data creation

The first stage of the data life cycle is the creation of data. New data may be created through manual data entry by a member of the organization, through the completion of an online form, or collected automatically through the use of sensors. As we discussed earlier, this data may be in many different formats.

Stage 2: Storage

Once the data has been created, it needs to be stored and protected with the appropriate level of security and access configured. Organizations will set what data can be accessed by whom, as well as the different levels of access rights, so that users can either read, modify or have full control of the data.

Stage 3: Usage

Data is collected and stored for many reasons. At this stage of the data life cycle, the data can be viewed in its raw format, be processed so that it can be presented in a more visually appealing manner, or have specific information extracted from it. Once processed, the data can be analysed or shared with others. IT systems may be required to use data that has been previously collected by another organization or for a different purpose, or third parties may be given access to the data.

Stage 4: Preservation

Following the analysis of data, it is important that this data is preserved by the organization. One reason is to ensure that the data is maintained to support current analysis and decision-making. It also allows data to be reused in the future.

Stage 5: Destruction

Although organizations may wish to keep this data forever, as the volume of data grows so does the cost of storage.
Compliance with data protection regulations may also mean that data must be destroyed once the agreed retention period is over.

ATL ACTIVITY: Thinking

Demonstrate a personal relevance to this activity by analysing a file in the recycle bin. Select one file from your computer’s recycle bin. Analyse the file using the data life cycle by answering the questions in this table.

Name of file:

Stage | Question to be addressed | Answers
1 Data creation | How was the file created? Was it manual or automatic? What file type? |
2 Storage | Where was the file being stored (location on the computer)? How was the file being kept secure? Who had access to the file? Was the file being shared? What was the purpose of the file? |
3 Usage | How did you use the file? Was the file used in its raw format or processed to create a different format? How long did you intend to use the file? |
4 Preservation | What was the intended future use of the file? When did you add the file to the recycle bin? |
5 Destruction | |

3.1E Ways to collect and organize data

At the data creation stage there are two main categories of data: primary data and secondary data. Primary data is original data collected for the first time for a specific purpose. This may be an interview as part of your extended essay, or it may be data collected by cameras for facial recognition. Secondary data is data that has already been collected by someone else for a different purpose. For your extended essay this may be in the form of a website or online news article, or may include a set of training data for a facial-recognition system.

◆ Primary data: Original data collected for the first time for a specific purpose.
◆ Secondary data: Data that has already been collected by someone else for a different purpose.

Once data has been collected, it is important that organizations or users are able to store this data.
Databases are often used to store large volumes of data in one place. Data is organized and structured using tables, which makes finding information quick and easy. A table consists of columns (field names) and rows (records). Databases organize data about entities. For example, an entity could be a book, movie, house or country.

Links: This section links to Chapter 1.6 Conducting secondary research and primary research and Section 9 Digital society extended essay.

When designing a database, one must think about what attributes need to be stored (what specific data) about the entity. For example, a database about students (entity) may store data such as their name, date of birth, telephone number and address (attributes). The fields that store these attributes are predefined by size and data type. The most common types are integers, floating point numbers, characters, strings, Boolean values and dates. For example, name and address would be strings, while date of birth would be a date.

A database that has more than one table is called a relational database, with tables linked by their primary key and corresponding foreign key. A field is assigned to be a primary key when it contains unique values. It is important for records in relational databases to have a unique identifier.

◆ Relational database: A database that has more than one table.

[Figure: Example of an entity–relationship diagram (ERD) for a relational database]

It is important during database design to reduce data entry errors and promote integrity, so that the data being input is valid, accurate and consistent. Two methods to improve the accuracy of data in a database are validation and verification.

Validation in database design means that only valid (suitable) data can be entered. This can be done in various ways, such as setting the field length, assigning data types, using input masks, configuring range checks and designing lookup tables. Incorporating these into the database ensures that errors are minimized at the time of data entry. Should unsuitable data be entered, users will receive an error message.

◆ Validation: In databases, this means that only valid (suitable) data can be entered.

On the other hand, verification checks that the data entered is the actual data that you want, or that the data entered matches the original source of data. Two common methods of data verification are double entry (for example, being asked to enter a password twice when registering a username for a new website) and having a second person check the data visually.

◆ Verification: In databases, these are checks that the data entered is the actual data that you want, or that the data entered matches the original source of data.

Multiple users can access databases at any one time, and it is easy to add and modify data. Databases can be sorted so that information can be:
• presented in an organized manner
• searched in order to find specific information
• analysed to find trends or patterns.

In databases, searches are sometimes called queries. A query can be designed and saved, then executed whenever the user needs it. Queries are often presented in the form of a report, which can be designed to make the information extracted more visually appealing to the recipient.

As part of the process of organizing and structuring data, data needs to be classified into categories. Categorization may be done by defining fields in a database or through data tagging. Classification makes information easier to access and search, and it also supports security, for example by classifying documents as confidential.
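The database ideas covered in this section (tables, primary and foreign keys, validation and queries) can be tried out in a few lines of Python. The sketch below is only an illustration. It uses Python's built-in sqlite3 module, and the table names (student, loan), the field names, the sample records and the 7 to 13 range for year_group are all invented for the example rather than taken from the text.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Entity 1: student. student_id is the primary key (unique identifier).
# The CHECK rules are validation: a field length and a range check.
conn.execute("""
    CREATE TABLE student (
        student_id    INTEGER PRIMARY KEY,
        name          TEXT NOT NULL CHECK (length(name) <= 50),
        date_of_birth TEXT NOT NULL,
        year_group    INTEGER CHECK (year_group BETWEEN 7 AND 13)
    )""")

# Entity 2: loan. Its student_id field is a foreign key linking to student.
conn.execute("""
    CREATE TABLE loan (
        loan_id    INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        book_title TEXT NOT NULL
    )""")

conn.execute("INSERT INTO student VALUES (1, 'Amina', '2006-04-12', 12)")
conn.execute("INSERT INTO loan VALUES (1, 1, 'Digital Society')")


def try_insert(sql: str) -> bool:
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        # Unsuitable data is rejected with an error instead of being stored.
        return False


# Fails the range check: year_group 42 is outside 7 to 13.
bad_range = try_insert("INSERT INTO student VALUES (2, 'Ben', '2007-01-30', 42)")
# Fails the foreign key: there is no student with student_id 99.
bad_link = try_insert("INSERT INTO loan VALUES (2, 99, 'Unknown borrower')")

# A query that joins the two tables through the key fields.
rows = conn.execute("""
    SELECT student.name, loan.book_title
    FROM loan JOIN student ON loan.student_id = student.student_id
    ORDER BY student.name""").fetchall()

print(bad_range, bad_link, rows)
```

Note that these rules are validation only. The database cannot tell whether 'Amina' is the right name for student 1; confirming that is verification, which still needs double entry or a visual check.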
Some standard categories of information include:
• public information, for example an organization’s name, address and telephone number
• confidential information, for example bank details
• sensitive information, for example biometric data
• personal information, for example ethnic origin or political opinions.

When classifying data it is important to determine the relative risk associated with each set of data. Public data, which is easy to recover, is low risk, whereas sensitive personal information or data that is necessary for an organization to function will be high risk.

Links: This content links to Section 4.7A Social components of identity.

ATL ACTIVITY: Thinking

Analyse the database behind a social media website. Select one of your social media accounts. Study the chosen account and make notes of the data that was entered to set up the profile. Next, study the sort of data that would be entered and automatically created every time a post is made. Describe the different formats of data that are used, for example images, date/time, text, integer, video, sound. Select a suitable drawing program to create an entity–relationship diagram (ERD) for your social media account. Use the example ERD below.

[Figure: ERD template for social media, with two entities. Personal Profile: PersonalID (Autonumber), PostID (Number), ProfilePic (Image), Bio (Text). Post: PostID (Autonumber), content (text).]

Complete the following table to classify the data in your social media account.

Type of information | Summary of data found in the social media account
Public |
Personal |
Sensitive |
Confidential |

3.1F Ways of representing data

Data collected can be presented in different ways to make it both easier to understand and more interesting to read. Numerical data, such as financial, meteorological, scientific and statistical data, are often presented in a visual manner in the form of charts and tables.
The type of chart will often depend on the type of data being represented. For example, rainfall is often presented as a bar chart, whereas temperature is presented as a line graph. These may also be combined with text to create a report. Data visualization is the process by which large sets of data are converted into charts, graphs or other visual presentations.

◆ Data visualization: The process of converting large sets of data into charts, graphs or other visual presentations.

ATL ACTIVITY: Communication

Earlier in this chapter, you downloaded a set of weather data. Use your spreadsheet skills to present this data. First, identify which spreadsheet software to use. Import the downloaded weather data into the software. Use formatting tools to present the data in an easier-to-read format: consider the use of fonts, colours, borders and shading. Use simple functions, such as average, minimum and maximum, to make calculations on the data. Use chart tools to create suitable charts for the weather data. Tools may include selecting the chart type, formatting the horizontal and vertical axes, labelling the axes, adding titles and a legend. If your spreadsheet skills need refreshing, find suitable online tutorials to assist you in each of these activities.

Infographics are an alternative way to provide an easy-to-understand overview of a topic. They can contain images, charts and text.

ATL ACTIVITY: Research

Throughout the course, there will be numerous occasions when you will be required to present your findings in an easy-to-understand manner. Research and try out at least two online infographic creators. Research the most recommended online infographic creators. Select two and try them out. Create a table and compare their features. Make a decision on which one to use for this course. Write a short justification of your choice.
[Figure: Infographic design template]

3.1G Data security

It is of paramount importance that data is secure both at the time of storage and in transmission. This may be when the data is collected or shared between systems or organizations. One method to ensure that data is kept secure is encryption.

Encryption

Encryption is the process of converting readable data into unreadable characters to prevent unauthorized access. Encryption is based on cryptography, where an algorithm transforms information into unreadable ciphertext. For the intended person or computer to be able to make sense of this encoded data, they must use a key to decrypt it back to its original form, called plaintext.

◆ Encryption: The process of converting readable data into unreadable characters to prevent unauthorized access.

There are two types of encryption:
• symmetric key
• public key.

Symmetric key encryption is where the key to encode and decode the data is the same. Both computers need to know the key to be able to communicate or share data. The advanced encryption standard (AES) uses 128-bit or 256-bit keys, which are currently considered sufficient to prevent a brute force attack (trying every possible combination to find the right key). For example, a 256-bit key can have 2^256 possible combinations. This type of encryption is commonly used in wireless security, security of archived data and security of databases.

Public key (asymmetric) encryption uses two different keys to encode and decode the data. The private key is known only to the computer that owns the key pair, while the public key is given out by that computer and shared with any computer that wishes to communicate with it. When sending data, the public key of the destination computer is used to encode it. During transmission, this data cannot be understood without the private key. Once the data is received by the destination computer, its private key is used to decode the data.
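The symmetric key idea, and the reason key length matters against a brute force attack, can be seen with a toy example. The sketch below is an illustration only and is not how AES works: it uses a simple letter-shifting (Caesar) cipher, in which the size of the shift acts as the shared key, and then compares its tiny number of possible keys with the key space of 128-bit and 256-bit keys. The function names and the sample message are invented for the example.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, key: int) -> str:
    """Shift each letter by `key` places; other characters are left unchanged."""
    shifted = []
    for ch in text.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            shifted.append(ch)
    return "".join(shifted)


def keyspace(bits: int) -> int:
    """Number of different keys that fit in the given number of bits."""
    return 2 ** bits


# Symmetric key: the same key (3) is used to encode and to decode.
secret = caesar("MEET AT NOON", 3)
plain = caesar(secret, -3)
print(secret, "->", plain)

# Brute force: with only 25 useful keys, every one can be tried in an instant.
guesses = [caesar(secret, -k) for k in range(1, 26)]
print("MEET AT NOON" in guesses)

# By contrast, the number of keys an attacker would have to try for long keys:
print(keyspace(128))
print(keyspace(256))
```

Each extra bit doubles the number of possible keys, which is why moving from a handful of keys to 128 or 256 bits turns a trivial search into one that is currently considered impractical.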
[Figure: Public key encryption]

Public-key encryption is found in the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) internet security protocols. The ‘http’ in the address line is replaced with ‘https’ to show that data is being transmitted securely over the internet, especially when confidential and sensitive data is collected. Public-key encryption is commonly used in digital signatures, time stamping of electronic documents, electronic transfers of money, email, WhatsApp, Instagram, and SIM card authentication.

◆ Secure Sockets Layer (SSL): A protocol developed for sending information securely over the Internet by using an encrypted link between a web server and a browser.
◆ Transport Layer Security (TLS): An improved version of SSL; a protocol that provides security between client and server applications communicating over the Internet.

ATL ACTIVITY: Social

Work in a small group to try out Caesar’s cipher (Julius Caesar used a substitution technique, shifting three letters up, so C became F, D became G, and so on). Teach a friend how to use Caesar’s cipher. Try coding and decoding messages with each other.

Data masking

Encryption is essential for the trusted delivery of sensitive information; however, cyber threats still exist and the implementation of more stringent data protection legislation means that organizations must ensure that sensitive data is kept private. One method of doing this is called data masking. Data masking is the process of replacing confidential data with functional fictitious data, ultimately anonymizing the data.

◆ Data masking: The process of replacing confidential data with functional fictitious data, ultimately anonymizing the data.

As we all know, data is a valuable commodity and, once collected, it can be stored, used and shared. However, organizations face privacy problems should they do this without user consent.
By anonymizing data an organization can protect the privacy of its customers while still using the data for application testing or business analytics, and/or sharing the data with third parties. Classifying data at stage 2 of the data life cycle makes this process much easier.

Data erasure

At the final stage of the data life cycle, data needs to be destroyed. Data erasure can be carried out either physically or by a software-based method.

◆ Data erasure: The destruction of data at the end of the data life cycle.

Two physical methods are the use of degaussers, which use powerful electromagnetic fields to remove data (often used for magnetic media), and shredders, which break storage media down into tiny particles (an effective way of destroying solid-state storage devices and smartphones). Data-erasing software, on the other hand, permanently removes the original data on a storage device by overwriting it with zeros and ones.

Data erasure must not be confused with the term data deletion. As a computer user, you can delete files on your computer or cloud storage and send them to the recycle bin, or you can even reformat your storage device. However, with the right tools, deleted data can be recovered and is therefore not secure, especially if you are disposing of hardware.

◆ Data deletion: The sending of the file to the recycle bin, which removes the file icon and pathway of its location.

EXAM PRACTICE QUESTIONS
Paper 1 (core)
1 Outline two methods of data erasure. [4 marks]
2 Explain why an organization may wish to use data masking to anonymize data. [6 marks]

ATL ACTIVITY: Communication

Orally present to your peers a summary of a data breach. Using effective online searching skills, investigate a recent article about a data breach due to the improper disposal of computer hardware. Summarize your findings. Create prompt cards and practise presenting your findings using them. Orally present your findings to a group of friends.
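Data masking, introduced earlier in this section, can be illustrated with a short Python sketch. It is a simplified example under stated assumptions, not a production tool: the customer record, the field names and the small pool of stand-in names are all invented here. The point it shows is that masked data stays functional (same fields, same format) while the confidential values are replaced with fictitious ones.

```python
import random

# Hypothetical pool of stand-in names; a real tool would use a much larger one.
FAKE_NAMES = ["Alex Morgan", "Sam Rivera", "Jo Patel", "Chris Okoye"]


def mask_record(record: dict, rng: random.Random) -> dict:
    """Return a copy of the record with its confidential fields replaced."""
    masked = dict(record)
    masked["name"] = rng.choice(FAKE_NAMES)
    # Swap every digit for a random one but keep the separators, so the
    # masked value still has the same format as a real card number.
    masked["card_number"] = "".join(
        str(rng.randint(0, 9)) if ch.isdigit() else ch
        for ch in record["card_number"]
    )
    return masked


# An invented customer record used only for this illustration.
customer = {
    "name": "Grace Nakato",
    "card_number": "4929-1234-5678-9012",
    "amount_spent": 182.50,
}

masked = mask_record(customer, random.Random(7))
print(masked)
```

The non-confidential field (amount_spent) is left untouched, so the masked records could still be used for testing an application or for business analytics. Replacing values at random like this is only a starting point; whether the result is truly anonymous depends on what other data it could be matched with.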
REAL-WORLD EXAMPLE: Data breaches from lack of data erasure

In 2010, some photocopiers that were used to copy sensitive medical information were sent to be resold without wiping the hard drives. Three hundred pages of individual medical records containing drug prescriptions and blood test results were still on the hard drive of the copiers. The US Department of Health and Human Services settled out of court with the original owner of the copiers for the violation of the Health Insurance Portability and Accountability Act (HIPAA) for US$1.2 million.

In 2015, a computer at Loyola University that contained names, social security numbers and financial information for 5800 students was disposed of before the hard drive was wiped.

https://njbmagazine.com/njb-news-now/the-challenge-of-recycling-office-electronics

Blockchain

With increasing numbers of reports of hacking in the news, many individuals, organizations and governments are looking for alternative systems that are more secure and transparent. Blockchain uses a shared ledger in the process of recording transactions, allowing the trading and tracking of anything of value, such as copyrights, property and loyalty card points. To participate in blockchain, users need to be part of the blockchain network, which will give them access to the distributed ledger.

◆ Blockchain: A digital ledger of transactions that is duplicated and distributed across a network of computers.

When a transaction occurs, it is recorded as a block of data. The blocks form a chain of data as the ownership of the asset changes hands, with details such as the time and sequence of the transaction being recorded. Each additional block strengthens the verification of the previous block, which makes it very difficult to tamper with the transaction.
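The way each block strengthens the verification of the one before it can be sketched in Python. This is a minimal illustration of a hash chain only, assuming a single copy of the ledger; it leaves out the network, the distribution of the ledger and the agreement between participants that a real blockchain needs. The asset, the names and the timestamps are invented, and the helper names (block_hash, add_block, is_valid) are hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Fingerprint of a block: the SHA-256 hash of its contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_block(chain: list, transaction: str, timestamp: str) -> None:
    """Record a transaction as a new block that stores the previous block's hash."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "index": len(chain),
        "timestamp": timestamp,
        "transaction": transaction,
        "previous_hash": previous,
    })


def is_valid(chain: list) -> bool:
    """Check that every block still matches the hash stored in the next block."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


ledger = []
add_block(ledger, "Asset 17 registered to Alice", "2022-03-01T09:00")
add_block(ledger, "Asset 17 transferred to Bob", "2022-03-02T14:30")
add_block(ledger, "Asset 17 transferred to Carol", "2022-03-05T11:15")
print(is_valid(ledger))  # the chain as originally recorded

# Tampering with an earlier transaction changes that block's hash, so it no
# longer matches the value stored in the block that follows it.
ledger[1]["transaction"] = "Asset 17 transferred to Mallory"
print(is_valid(ledger))
```

In this sketch the newest block has nothing after it to protect it, which is one reason why, in a real blockchain, the ledger is also duplicated across many computers that can compare their copies.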
There are many real-life examples of uses of blockchain; four are given here:
• Microsoft’s Authenticator app for digital identity
• the health care industry is using blockchain technology for patient data
• blockchain technology can provide a single unchangeable vote per person in digital voting
• the US Government is using blockchain to track weapon and gun ownership.

EXAM PRACTICE QUESTIONS
Paper 1 (core)

IBM collaborated with Raw Seafoods in the USA to digitize the supply chain in 2019. Data would be uploaded to the IBM Food Trust platform at each stage of the supply chain. This included data on the time and location when the seafood was caught, when the boat docked at the port, when and where the seafood was packed, details about the shipping and delivery to supermarkets and restaurants. This included images and video. Blockchain technologies were used by the platform to reduce the level of fraud and increase confidence in the quality and freshness of the seafood.
1 a Identify two types of data recorded on the IBM Food Trust platform. [2 marks]
  b Outline two benefits of using the IBM Food Trust platform for seafood customers such as restaurants. [4 marks]
2 Explain step-by-step how blockchain technology can be used to reduce the level of fraud. [4 marks]

Inquiry

In Section 1.4 you were introduced to the Inquiry Process. In this inquiry we will focus on only two of the stages. Use the guiding questions for each stage to help you complete the activity.

3.1G Data security (content) and 4.1B Home, leisure and tourism (context)

Research one use of blockchain in the home, leisure or tourism industry. Make sure that you have enough sources to be able to:

FOCUS
Starting point | Identify a real life example of blockchain. Identify which context this applies to from Section 4.1.
Determine inquiry focus | Suggest suitable concepts that could be applied. Draft three suitable inquiry questions.
3.1H Characteristics and uses of big data and data analytics

The term big data has been around since the 1990s and was made popular by John R Mashey, who worked for Silicon Graphics at the time. Big data is the term used to describe large volumes of data, which may be structured or unstructured. Big data can be characterized by the 4Vs: volume, velocity, variety and veracity.

◆ Big data: Term used to describe large volumes of data, which may be structured or unstructured.

[Figure: Characteristics of big data]

1 Volume – big data consists of very large volumes of data that are created every day from a wide range of sources, whether it is a human interaction with social media or the collection of data on an internet of things (IoT) network.
2 Velocity – the speed at which data is being generated, collected and analysed.
3 Variety – data consists of a wide variety of data types and formats, such as social media posts, videos, photos and PDF files.
4 Veracity – refers to the accuracy and quality of the data being collected.

Uses of big data

Big data analytics is when large and varied data sets are processed to identify trends and patterns. This may be used to analyse past behaviour in order to improve customer service, streamline operations or identify new revenue streams.

REAL-WORLD EXAMPLE: Big data in banking and finance

◆ Real-time: Happening now or live.

Big data is allowing banks to see customer behaviour patterns and market trends. American Express is using big data to get to know its customers, using predictive models to analyse customer transactions. It is also being used to monitor the efficiency of internal processes to optimize performance and reduce costs. JP Morgan has used historical data from billions of transactions to automate trading. A third use of big data has been to improve cybersecurity and detect fraudulent transactions. Citibank has developed a real-time machine learning and predictive
modelling system that uses data analysis to detect potentially fraudulent transactions.
https://algorithmxlab.com/blog/big-data

‘Big data is at the foundation of all of the megatrends that are happening today, from social to mobile to the cloud to gaming.’
Chris Lynch

REAL-WORLD EXAMPLE
Big data in the sports industry

Bundesliga, Germany’s professional association football league, introduced Match Facts in 2021 to give match insights to its viewers. During a match, 24 cameras are positioned on the field to collect and stream data during the 90-minute game. This data is then converted into metadata and used with past data to provide insights for the fans, such as which player is being most closely defended or the likelihood of a goal being scored.
https://searchbusinessanalytics.techtarget.com/feature/Bundesliga-delivering-insight-to-fans-via-AWS

Inquiry

In this inquiry we will focus on only two of the stages. After your initial research, use the guiding questions for each stage to help you complete the activity.

3.1H Characteristics and uses of big data and data analytics (content) and 4.4 Health (contexts)

Research how big data is being used in medical diagnostics or medical care.

Determine inquiry focus
• Narrow the topic to a specific inquiry focus by using the digital society diagram and considering the 3Cs.
• Find related real-world example(s).

Explore
• Find and evaluate sources.
• Conduct research, collect data and record sources.
• Check that there are adequate sources to address each aspect of the inquiry and the inquiry focus.

From your research:
• narrow down your focus to a particular application of big data
• describe the use of big data for this real-life example
• select three articles that you found to be useful and correctly list them in a bibliography.
Activity: HL Extended Inquiry

Once you have been guided through the extended inquiry process in Section 6 and learned the prescribed area of inquiry in Section 5.3A, make the connection with this topic and complete the extended inquiry.

3.1 Data and data analysis (content) and 5.3A Climate change and action

• Research and evaluate one intervention for climate change that uses big data.
• Research and evaluate this intervention using the HL extended inquiry framework.
• Make a recommendation for steps for future action.
• Present your work in the form of a written report.

3.1I Data dilemmas

Alongside the ownership of data comes a huge responsibility to ensure that data dilemmas are addressed at every stage of the data life cycle.

Stage 1: Collection of data

Organizations must consider whether the data was collected ethically and in compliance with data protection regulations. For example, there should be no excessive collection of data and consent should be obtained. Careful consideration should also be given to what data is collected, so as to avoid biased data sets that may ultimately skew outcomes in machine learning.

REAL-WORLD EXAMPLE
Bias in facial recognition

In 2019 the National Institute of Standards and Technology (NIST) published a report analysing the performance of facial-recognition algorithms. Many of these algorithms were less reliable in identifying the faces of black or East Asian people, with American Indian faces being the most frequently misidentified. The main factor was the non-diverse set of training images used.
https://jolt.law.harvard.edu/digest/why-racial-bias-is-prevalent-in-facial-recognition-technology

Stage 2: Storage of data

Where data is stored, who owns it, who is responsible for it, who has control of it and who has access to it all impact data privacy. Organizations must comply with the local data protection regulations of the country that the data is stored in. Failure to do so can cost an organization a huge amount in legal fees and compensation should a data breach occur.

◆ Data privacy: The ability for individuals to control their personal information.
◆ Data reliability: Refers to data that is complete and accurate.
◆ Data integrity: Refers to the trustworthiness of the data and whether it has been compromised.

Security of data and levels of access can impact the reliability and integrity of the data. Unauthorized changes could render the data invalid and useless.

The problem with unreliable data is that it is often used to make decisions and can lead to faulty predictions and inaccurate forecasting. It is therefore important to identify the common problems that lead to unreliable data.

1 Biased data: We looked at data bias earlier in this chapter. Bias could be due to using biased data sets or to bias by humans when selecting the data.
2 Viruses and malware: Stored data can be vulnerable to these external threats. Data can be changed, and therefore lose its integrity, or be corrupted and ultimately lost.
3 Reliability and validity of sources: Data can be generated from a number of online sources; if these sources have not been evaluated, this can lead to unreliable data being used by the IT systems.
4 Outdated data: Many IT systems collect and store data that is changing; if data is not updated it becomes unreliable. Consider the telephone numbers of parents at school, for example: if a parent does not inform the school of a change in number, this data cannot be relied on to contact parents.
5 Human error and lack of precision: Any form of manual data entry is prone to human error. Automating data entry is crucial for reducing these types of errors. It is also easy for users to accidentally delete files, move them or even forget the name of a file and where it was saved. Effective file management procedures are essential to reduce these types of errors.
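Two of the problems listed above, outdated data and manual entry errors, lend themselves to simple automated checks. The sketch below is only an illustration: the record layout, the field names (`name`, `phone`, `last_confirmed`), the one-year threshold and the sample contacts are all assumptions made for this example, not part of any real school system.

```python
from datetime import date


def find_unreliable_contacts(records, today, max_age_days=365):
    """Return (name, reason) pairs for records that fail two basic checks.

    Assumed record layout (hypothetical): a dict with the keys
    'name', 'phone' and 'last_confirmed' (a datetime.date).
    """
    problems = []
    for record in records:
        # Check 1: a likely typing error (anything other than digits,
        # spaces or a leading '+').
        digits = record["phone"].replace(" ", "").lstrip("+")
        if not digits.isdigit():
            problems.append((record["name"], "invalid phone number"))
        # Check 2: the number has not been confirmed recently.
        elif (today - record["last_confirmed"]).days > max_age_days:
            problems.append((record["name"], "outdated"))
    return problems


# Made-up sample data for illustration only.
contacts = [
    {"name": "Parent A", "phone": "+256 700 000001", "last_confirmed": date(2024, 1, 10)},
    {"name": "Parent B", "phone": "0700 00O002", "last_confirmed": date(2024, 2, 1)},  # letter O typed instead of zero
    {"name": "Parent C", "phone": "0700 000003", "last_confirmed": date(2021, 9, 1)},
]

flagged = find_unreliable_contacts(contacts, today=date(2024, 6, 1))
```

Checks like these do not make data reliable by themselves; they only point a person towards the records that most need attention.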
REAL-WORLD EXAMPLE
Reliability and validity of COVID-19 data

In June 2020, the Guardian reported on a study that was published online about the effect of the anti-parasite drug Ivermectin on COVID-19 patients. The data in the study was obtained from the Surgisphere website using the QuartzClinical database, which claimed to be monitoring real-time data from 1200 international hospitals. However, as doctors around the world started using this data, they soon became concerned about the number of anomalies they found. This resulted in prestigious medical journals reviewing studies that were based on this unreliable data and the World Health Organization stopping their research into the potential COVID-19 treatment.
www.theguardian.com/world/2020/jun/04/unreliable-data-doubt-snowballed-covid-19-drug-research-surgisphere-coronavirus-hydroxychloroquine

Top tips
There are many terms and keywords when discussing data. Make sure you know the difference between reliability and integrity, data matching and data mining, and validation and verification.

EXAM PRACTICE QUESTIONS
Paper 1 (core)
1 Distinguish between data reliability and data integrity. [4 marks]

Stage 3: Use of data

The use of data should be ethical and comply with local data protection regulations. For example, data should only be used for the intended purpose and should not be shared without the user’s consent. When investigating the uses of data, one must also question who the data is shared with and for what purpose, as well as whether data has been anonymized before sharing with third parties.

◆ Cyberbullying: Bullying carried out online, for example, on social media.

Individuals may choose to be anonymous for legitimate reasons, such as seeking personal advice or advice on embarrassing health conditions. However, too often, anonymity conceals the identity of criminals, terrorists or computer hackers from law enforcement agencies. It may also be used in cyberbullying or to conduct internet searches without being traced.

Stages 4 and 5: Archiving and storage of data

Again, organizations must comply with local data protection regulations when it comes to the retention and security of archived data.

Deeper thinking
Privacy

◆ GDPR (General Data Protection Regulation): Legislation designed to harmonize data privacy laws across the EU.

Throughout the Digital Society course, there will be many times when the impact of a digital technology creates a breach of privacy. So, what does privacy actually mean? How can digital technology cause a privacy breach? How are citizens being protected by legislation?

Privacy is the ability of individuals and groups to determine for themselves when, how and to what extent information about themselves is shared with others. There are three key aspects of privacy:
• Freedom from intrusion – an individual has the right to be left alone; for example, when at home, you have the right to not answer the door if someone calls, and you don’t have the right to walk into someone else’s house uninvited.
• Control of information about oneself – controlling information about yourself is a very important aspect of privacy. You are the one who decides what information is shared and where.
• Freedom from surveillance – if you have privacy, it means that you are not being watched.

What does it mean to have a breach of privacy?

Possible causes of data breaches include:
• the unauthorized use of data by insiders – this could include the IT staff that maintain the data or the systems storing the data
• an accidental leak of data due to negligence or carelessness, which could result in access by hackers or third parties
• a series of errors that results in the exposure of information about an individual
• intentional use of data, such as by marketing or for surveillance, that citizens may not approve of.

Data protection legislation

To protect individuals and groups of people from privacy breaches, different countries have their own set of data protection regulations and laws. In May 2018, the General Data Protection Regulation (GDPR) was introduced in Europe with the aim of harmonizing data privacy laws across Europe and providing greater rights and protection for European citizens. The GDPR was designed to be more stringent and up to date than previous laws, and many other countries are following suit and adapting their own laws accordingly.

There are seven key principles at the heart of GDPR:
• Lawfulness, fairness and transparency – outlines how the data being collected, used and stored may be treated.
• Purpose limitation – data collected should only be used for its original intended purpose.
• Data minimization – organizations should not collect more personal information than necessary.
• Accuracy – includes the responsibility of keeping data up to date and having processes to correct data.
• Storage limitation – covers the duration that data is kept; it should not be kept longer than necessary.
• Integrity and confidentiality – encourages organizations to adopt best practices for securing data.
• Accountability – a new principle to ensure that organizations can prove that they are working on compliance with the other principles.

The aims of data protection regulations are to provide individuals with more control over their data and to provide them with rights. For example, an individual has the right to be informed, the right to access their data and the right to have it rectified. They also have the right to be forgotten.

ATL ACTIVITY
Thinking

Hold a discussion with a group of friends on the following questions:
• If someone says ‘you are invading my privacy’, what does this mean?
• What sorts of information do you think you should keep private?
• If you have been watching the news lately, how is social media being used to invade users’ privacy?
• Should you expect privacy if you use social media?
• If someone says ‘it was an anonymous post’, what does that mean?
• How has digital technology helped people be anonymous on the internet?
• Distinguish between privacy and anonymity.

ATL ACTIVITY
Research

Research and apply the data protection regulations.
• Research the data protection regulations in one country of your choice.
• Apply these data protection principles/regulations to each stage of the data life cycle.
• Listen to the pizza order here www.aclu.org/ordering-pizza or here www.youtube.com/watch?v=RNJl9EEcsoE. Describe the breach of privacy in this scenario, including which data protection principles have been breached.
• Investigate one recent example of a data breach. Describe the cause of the data breach and the cost to the organization.

Inquiry

3.1 Data and data analysis (content) and 2.4 Power (concepts)

Starting point
• Select a real-life example/news item with a digital society topic of interest.
• Does this source involve digital systems?
• Does this digital system(s) clearly cause impacts and have implications for people and communities?
• Is this a topic that provides opportunities for both secondary and primary research?

Determine inquiry focus
• Narrow the topic to a specific inquiry focus by using the digital society diagram and considering the 3Cs.
• Find related real-world example(s).

Explore
• Find and evaluate sources.
• Conduct research, collect data and record sources.
• Check that there are adequate sources to address each aspect of the inquiry and the inquiry focus.

• Look back at the topics in this chapter. Select one that you would like to investigate further.
• Next, look at Chapter 2.4 Power and review the different prescribed areas.
• Conduct some initial research to help select a focus area.
• Can you link the inquiry to a personal experience?
• Find one or two examples of secondary research that will be the inspiration for this inquiry.
• Write a short summary to describe the topic of this inquiry, any link to a personal experience and the source that inspired you.

Creativity, activity, service (CAS)
Promote screen time awareness among the school community

Complete a short formal survey about screen time among a cross section of students in your school. Find out:
• what type of devices they use
• what activities students use their screens for
• how many hours they spend on-screen on school-related activities
• how many hours they spend on-screen on non-school-related activities.

Use a ready-made template online to create an infographic of your findings. Include suitable charts, images and text to describe student screen time habits. Share this with the school community.

TOK
Knowledge and technology

In TOK lessons you may have discussed what is meant by the term ‘knowledge’. How is the knowledge we have talked about in this chapter similar to your discussion in TOK? How is it different? Is there one ‘right’ definition of knowledge? One can also ask: what is the difference between data, information and knowledge?

IT systems with artificial intelligence may be better at analysing patterns compared to humans but they rely on a vast amount of data. Many people are unaware that their digital footprint is allowing personal data to be collected, raising questions such as:
• What data is being collected? What methods are being used to get this data?
• How has digital technology impacted how we filter data and information?
• Does this data give a complete picture of what it is really like to be human?

It also raises ethical questions about information systems, how much data they should have about an individual, and how they are using the data.

Reflection

Now that you have read this chapter, reflect on these questions:
• How is data different from information, and what role does technology have in creating wisdom?
• Could you explain the different stages of the data life cycle?
• There are so many different types of data. Could you match up the types of data with the different contexts in the next section?
• Do you have the skills to present data in different ways? How might this be useful when working on the digital society internal assessment?
• Security is an important aspect of the data life cycle. What other chapters in this section might security be important for?
• What is the relationship between big data and artificial intelligence? How are these being used to improve the quality of life?
• Do the benefits of collecting, analysing and sharing data outweigh the ethical concerns that it raises?
• What is the relationship between data and power?
• How does your learning about data and knowledge in this chapter relate to your understanding of knowledge in TOK?

Extended essay (EE)
The data life cycle, big data and analytics, or the dilemmas of data may give you some initial ideas for an extended essay topic.

Learner profile
Thinker
How has studying data made you think differently about your own personal data and how it is used?

3.2 Algorithms and code

UNDERSTANDINGS
By the end of this chapter, you should understand:
• algorithms are defined sequential steps or instructions to solve a specific problem or perform a task
• the effectiveness of an algorithm is often evaluated according to its efficiency
• the use of algorithms poses significant opportunities and dilemmas in digital society.

Algorithms have been around for thousands of years, but it’s likely that you are hearing this term more frequently than ever before. Essentially, an algorithm is a step-by-step set of procedures used to solve a problem or perform a specific activity. The success of all computer systems is dependent on algorithms and how they are programmed.

◆ Algorithm: A procedure or formula for solving a problem that is based on a sequence of steps.
Whether we are conscious of it or not, algorithms are a ubiquitous part of today’s society. For example, they are responsible for what we see on our social media feeds, and what movies are recommended on Netflix. The use of algorithms presents significant opportunities, but also poses new dilemmas for society.

3.2A Characteristics of an algorithm

Algorithms define a set of instructions that will be carried out in a specific order to obtain an intended output. Consequently, algorithms should have the following characteristics:
• Unambiguous: Algorithms should be clear and concise; the inputs and outputs should be clear and all steps of the procedure explicit.
• Finite: Algorithms must have a finite number of steps that end once they have been completed. The algorithm must stop eventually with either the expected output or a response that indicates that there is no solution.
• Well defined: Each step of the procedure should be well defined, specifying exactly which steps are to be taken and in what order. Details of each step must be explicit, including how to handle any errors.
• Inputs: The input is the data that will be transformed by the procedure. An algorithm may have zero or more inputs.
• Outputs: The output is the data that has been transformed by the process; it should match the desired output. An algorithm should have one or more well-defined outputs.
• Feasible: For an algorithm to be effective, the procedure must be possible with the available resources and not contain any redundant steps.
• Independent: The algorithm should have step-by-step instructions and be independent of any programming language.

ATL ACTIVITY
Social

Work in a small group to play this game. The purpose of the game is for one person to give instructions to their partner in sufficient detail to recreate a diagram.
Each pair will need:
• copies of a printed diagram (this can be of anything as long as there is sufficient detail for person A to describe it)
• paper and pens
• chairs.

Setting up the game:
• Place the chairs back-to-back in a row.
• Get into pairs; each pair should sit down back-to-back.
• On one side (side A) give all the contestants a copy of the diagram (side B must not see it).
• On the other side (side B), each contestant should have paper and a pen.

Playing the game:
• Contestants on side A must give instructions to their partners (side B) on how to recreate the diagram.
• A time limit can be set to make this game harder.
• The winner of the game will be the team that has recreated the diagram most similar to the original diagram.

After playing the game, reflect on the characteristics of an algorithm. Which characteristics did the winning team display?

3.2B Components of an algorithm

The first computer algorithm was written in the 1840s by Ada Lovelace. It calculated a sequence of numbers called the Bernoulli numbers and was written for the ‘Analytical Engine’. Since the introduction of computers, algorithms have become more and more complex and sophisticated. Despite the complexity of the more recent algorithms, many of the same components are commonly found. These include:
• Instructions: An algorithm consists of a series of sub-algorithms, each performing a small activity. Each set of steps for a small activity is called an instruction. One example would be digit addition.
• Variables: You may have come across variables in mathematical problem-solving or science experiments. They have the same function here in an algorithm, which is to temporarily store values while the steps of the algorithm are being executed. As the algorithm is being processed line by line, the variable will change value, hence its name. For example, an algorithm used to calculate profit will have the variable named ‘profit’ to store this data.
• Conditionals: One of the steps in an algorithm could be to make a decision or choice. An example of this is when an algorithm is required to determine whether a profit has been made. This could be written as:
  if Sales > Costs, then print ‘We are profitable’
• Loops: Algorithms would be very limited if they could only run a sequence of steps once, which is why many algorithms contain loops. Loops allow a set of instructions to repeat while a certain condition is met. For example, an algorithm may repeat until there are no more customers.

[Figure: Ada Lovelace]

3.2C Ways of representing algorithms

Algorithms are created independently of a programming language, which allows computer programmers to develop code in their preferred programming language. However, there are three main ways to represent an algorithm.

Natural language is a popular choice and may often be considered the first step of designing a computer program. Using everyday language allows developers to work with non-coders to write down the steps that the algorithm needs to follow, with the advantage that everyone involved is able to understand the process. An example of a natural language algorithm would be a cooking recipe. To make a cake, you would follow the detailed steps of the recipe, from ‘get the ingredients from the cupboard’ through to ‘take the cake out of the oven’.

◆ Flowchart: A visual representation of an algorithm showing an overview from start to end.

Although natural language is easy for everyone to understand, it has a tendency to be ambiguous and lack clarity. Consequently, an alternative method used to represent an algorithm is a flowchart. Flowcharts use a standard set of symbols to represent the different components, and arrows are used to show the direction of the steps. For example, each rectangle represents an action, while diamonds represent a condition or a loop.
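As a rough illustration of how the variable, conditional and loop components described above look once written in a programming language, here is a minimal Python sketch. The function names (`report_profit`, `serve_customers`) and the sample values are made up for this example; the text itself only gives the one-line profit condition.

```python
def report_profit(sales: float, costs: float) -> str:
    """Use a variable and a conditional to decide which message to return."""
    profit = sales - costs      # variable: stores a value while the steps run
    if sales > costs:           # conditional: the decision point
        return f"We are profitable: {profit:.2f}"
    return f"We are not profitable: {profit:.2f}"


def serve_customers(queue: list[str]) -> list[str]:
    """Loop: repeat the same steps until there are no more customers."""
    served = []
    while queue:                  # the condition is checked before every repetition
        customer = queue.pop(0)   # take the next customer from the front of the queue
        served.append(customer)
    return served


message = report_profit(500, 300)
served_today = serve_customers(["Ana", "Ben", "Chen"])
```

In a flowchart, the `if` line would be drawn as a diamond with two outgoing arrows, and the `while` line as a diamond whose ‘yes’ arrow leads back to an earlier step.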
Flowcharts help programmers visualize the steps of the algorithm and force them to think about sequence and selection. This makes them a useful planning tool.

[Figure: Flowchart to create a cake. Steps shown: START, preheat the oven, mix the ingredients, pour into cake tin, put into oven, bake, test (is it ready?) with ‘Not ready’ and ‘Ready’ branches, remove from oven, let it cool down, eat, STOP]

Once an algorithm has been planned, it is time to start writing the code so that the program can be tested and implemented. There have been many programming languages created throughout history, with over one-third of these developed in a country that has English as the primary language. Despite the diversity of languages, many of the keywords, such as ‘if’ for conditions and ‘while’ for loops, are in English. It is not just the language used in the code that is important, however, but also the language of the community of programmers. According to the TIOBE Index 2022, Python is the most popular programming language, followed by C, Java and C++.
https://www.tiobe.com/tiobe-index/

EXAM PRACTICE QUESTIONS
Paper 1 (core)
1 a Outline two characteristics of an algorithm. [4 marks]
  b Outline two common components found in an algorithm. [4 marks]
2 Explain why programmers may prefer to plan out the algorithm using a flowchart compared to natural language. [6 marks]

3.2D Uses of algorithms

Whether they are processed by mathematicians, scientists or computer scientists, algorithms often perform the same common tasks. Take, for example, the bubble sort algorithm learned by computer science students. This is one of the most basic sorting algorithms, which runs in a loop and swaps adjacent elements until they are in the correct order. Alternatively, sorting may be found as a built-in function in a spreadsheet or database. Algorithms may also be used for searching (which may be referred to as a ‘query’ in a database), filtering (with the selection of cells based on certain criteria) and counting.
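The four tasks just mentioned (sorting, searching, filtering and counting) can be sketched in a few lines of Python. The bubble sort below follows the description in the text: it loops over the list and swaps adjacent elements until they are in order. The temperature readings are invented sample data, and the early exit when no swap occurs is a common optimization added here, not something the text describes.

```python
def bubble_sort(values):
    """Return a sorted copy of values using bubble sort."""
    items = list(values)                    # work on a copy; leave the input unchanged
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:     # adjacent pair is in the wrong order
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:                     # no swaps means the list is already sorted
            break
    return items


# Invented sample data: temperature readings in degrees Celsius.
readings = [21.5, 18.0, 25.2, 18.0, 30.1]

sorted_readings = bubble_sort(readings)           # sorting
position = readings.index(25.2)                   # searching: where does 25.2 first appear?
warm_readings = [r for r in readings if r > 20]   # filtering: keep values above 20
count_of_18 = readings.count(18.0)                # counting: how often does 18.0 appear?
```

In everyday Python the built-in `sorted()` function would normally be used instead of a hand-written bubble sort, much as a spreadsheet user relies on the built-in sort feature.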
For example, a scientist may use a spreadsheet to analyse the results of an experiment. To do this they may sort the results to find the highest or lowest values, filter the spreadsheet for a specific variable to narrow down their results, or count how many times a given value appears in their results. Alternatively, the scientist may use an open-access research database to search for results of similar laboratory investigations.

REAL-WORLD EXAMPLE
Search algorithms: PageRank

Google’s search algorithm PageRank is one of the most frequently used search algorithms to find the most relevant web pages for given search criteria. There are many sub-algorithms in this search algorithm that look at factors such as the words used in the query, expertise of sources, quality of content, location and usability of web pages.

ATL ACTIVITY
Thinking

In the last chapter, there was an activity to download weather data. Use this data to practise using functions in a spreadsheet program.
• Search for online tutorials on how to sort, filter and count in a spreadsheet.
• Using the weather spreadsheet from before:
  • select a column to sort the data in order
  • try using the filter features to narrow down the data
  • decide on a certain criterion and use the ‘count’ function.

◆ Prioritization algorithm: A sorting algorithm used to prioritize tasks.
◆ Association rule: Uncovers how items are associated with each other and reveals relationships between items in large databases.

An effective algorithm is one that makes an activity more efficient and solves the initial problem. So, naturally, businesses are looking to algorithms to help them be more competitive. One such algorithm is the prioritization algorithm, which is a sorting algorithm used to prioritize customer orders, prioritize help desk requests or even decide which region to prioritize sales in. The first step in the algorithm is to count the frequency of requests from a customer, department or area. These are then sorted and classified into high, medium and low frequency, and finally the customer, support request or region is ranked.

[Figure: Prioritization algorithms]

A second algorithm to improve efficiency is the association rule. Used in machine learning, association rules are applied in market basket analysis and medical diagnosis. Simply put, an association rule uncovers how items are associated with each other and reveals relationships between items in large databases. For example, analysing items in shopping baskets can determine how likely one item is to be bought with another. This information can then be used to determine product placement within a store, which will save customers time and remind them of things that they might be interested in buying.

[Figure: Analysing items in shopping baskets can determine how likely one item is to be bought with another]

Whether algorithms are in basic computer programs written by computer science students or by programmers from the top technology companies, the increase in the amount of data generated has steered companies towards artificial intelligence algorithms to help them make sense of the data.

ATL ACTIVITY
Research

Search for open-access databases using a search engine.
• Conduct a simple search to identify the most popular free, open-access databases.
• Search these databases to see if you can find out more information about prioritization and association algorithms.
• Read and make notes.

Links
We will learn more about machine learning and neural networks in Chapter 3.6 Artificial intelligence.

REAL-WORLD EXAMPLE
Machine learning algorithms and facial recognition

According to American Scientist, machine learning algorithms are being used to link physical appearance with other traits, and many of these algorithms reportedly make false claims.
In one example, an algorithm was used to determine personality traits of job candidates based on their facial expressions. In another, machine learning was used to determine if a person was cheating in an online examination based on how their face changed as they answered the questions. One notorious misuse of facial recognition was an algorithm that claimed to identify a criminal based on the shape of their face with an accuracy of 89.5%!
www.americanscientist.org/article/the-dark-past-of-algorithms-that-associate-appearance-and-criminality

Inquiry

In this inquiry we will focus on only one of the stages of the inquiry process. If you need to refresh yourself on the inquiry process, revisit Section 1.4.

3.2 Algorithms and code (content) and 2.2 Expression (concepts)

Artificial intelligence models are being used to capture human expressions to make predictions. What impact is this having on people?

Determine inquiry focus
• Formulate an inquiry question, find real-world example(s) and connect them to the 3Cs.
• Is your question concise, thought-provoking and worth considering from different perspectives?
• Does your question support discoveries that move beyond recall, description and summary?
• Are the course concepts, content and contexts that you have identified connected to your inquiry question?

• Use your research skills to narrow down this focus question to a particular context.
• Find one real-life example to support your choice of context.
• Rewrite your focus question to include the context and summarize the real-life example.

3.2E Algorithmic dilemmas

The goals of algorithms include making people’s lives easier, helping organizations operate with greater efficiency, and helping governments make better decisions. Algorithms are all around us, aiding us with online searching and shopping.
Small businesses can gain new insights into trends when making sales forecasts without having to hire experts, which ultimately allows them to provide a better service for customers and employ the right staff. Governments can analyse health data to improve hospital services and use artificial intelligence in the courtroom. However, algorithms created with good intentions sometimes have negative consequences, albeit unintentional ones.

Algorithms replacing human judgements

Algorithms can be better decision-makers than humans: they don’t get tired, and their decisions can be applied consistently and with precision as they are not emotional. On the other hand, even logical algorithms can give inappropriate results.

REAL-WORLD EXAMPLE
Algorithms used by the police and courts

In 2020 the Harvard Gazette reported on the use of algorithms by the US government. There were approximately 2.2 million adults in prison in the US in 2016. With increased pressure to reduce these numbers, police departments used predictive algorithms to help decide where to locate police personnel on the streets. At the same time, courtrooms were using criminal risk assessment algorithms to determine the length of prison sentences. These algorithms created a score for each defendant based on their profile and likelihood of reoffending. However, criminals from low-income and minority communities were at risk of having higher scores because historically they came from areas with a disproportionately higher number of court cases. This led to less-favourable sentencing. In a different case, one court sentenced a man to 18 months in jail because the algorithm placed a greater weighting on his age. If he had been older, he would have had a much shorter sentence.
https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role

As we can see from the example, there are real concerns about how much we should rely on algorithms for making judgements. The main problem is that many algorithms make judgements based on the bias that already exists in society. Since machine learning algorithms learn from data sets built on historical data, it is not surprising that these algorithms adopt the same level of bias that was present in previous human judgements.

ATL ACTIVITY
Thinking

Formulate a reasoned argument to support your opinion about the use of algorithms to replace human judgements.
• Conduct wider research to find two more examples of how algorithms are being used.
• Use this research to write a paragraph about algorithms replacing human judgements. The paragraph should include your opinion, which should be supported by clearly written arguments with real-life examples.

Algorithmic bias

There are two main reasons why artificial intelligence systems have built-in bias:
1 Human algorithm developers unknowingly introduce bias into their models.
2 The training data set includes biased data or is incomplete, so it is not a true representation of the population.

REAL-WORLD EXAMPLE
Bias in algorithms

A study published in 2020 found that the algorithm used to assess a patient’s kidney function used race as one of the factors. Of the 57,000 medical records reviewed in Massachusetts, one-third of the black patients would have had their disease classified as more severe if the formula used for white patients had been applied.
www.wired.com/story/new-formula-help-black-patients-access-kidney-care

Black box algorithms and the lack of transparency

Many of today’s algorithms are considered to be black box algorithms.
In artificial intelligence, this is when insights are produced but it is not clear how the algorithm reached its conclusion from the data input. People are generally aware that algorithms are influencing the world they live in, yet don’t know what they are or how they work. Since artificial intelligence algorithms can learn from experience and improve over time, it is very difficult to know how they are making decisions. As artificial intelligence becomes increasingly sophisticated with developments in deep learning, programmers have less control over how artificial intelligence learning evolves. This creates issues around who is responsible when the algorithm does not perform as expected. Who is to blame? Is it the programmer or the data scientist?

◆ Black box algorithm: An algorithm that provides insight without clarity on how the conclusions were reached from the data input.

◆ Transparency in algorithms: The ability to understand and be able to explain the inner workings of the algorithm.

In highly regulated industries such as health care and financial services, transparency in algorithms is important. People in these industries are held accountable for the decisions being made by artificial intelligence systems. This can be problematic, however, because:
• it is often difficult to explain how the algorithm reached its conclusion
• it is not always possible to know how the training data was selected
• the evolving nature of machine learning makes it difficult to keep up.

REAL-WORLD EXAMPLE
Black box algorithms: Object detection systems in autonomous vehicles

‘Predictive inequity in object detection’, an academic paper written by a group of researchers at the Georgia Institute of Technology, highlighted that deep learning computer vision models found it difficult to detect people with dark skin. If used in autonomous vehicles, such a model would be less likely to detect a pedestrian crossing the road if they had darker skin. The black box nature of the algorithms made it much harder for developers to go back and correct them.

REAL-WORLD EXAMPLE
Black box algorithms: Deep Patient software

In 2015, a research group trained their Deep Patient software to discover patterns hidden in hospital data to predict if a patient was likely to develop medical problems. It was particularly good at predicting psychiatric disorders, which were often difficult for human doctors to foresee. However, because of its black box nature and lack of transparency, doctors were very resistant to using it.

Top tips
The dilemmas found in this chapter are closely linked to many of the dilemmas in artificial intelligence, robotics and autonomous technologies.

Inquiry

In this inquiry we will focus on only two of the stages. After your initial research, use the guiding questions for each stage to help you complete the activity.

3.2E Algorithmic dilemmas (content) and 2.6 Systems (concepts)

Inquiry focus: How has the use of black box algorithms resulted in unintended consequences for a digital society?

Explore: Explore and collect information from relevant sources
• Do these other sources provide claims and perspectives that will be useful in the inquiry?
• Have you gathered a range of content from secondary and primary research and investigations?
• Can you provide clear justification for three main sources for their usefulness in the inquiry?

Use your research skills to find three sources to further your understanding of this question. Your final choice of sources must be able to help you gain a deeper understanding of this topic, provide a balance of claims and perspectives, and be embedded in the content (algorithms) and concept (systems). Write a short report to justify your chosen sources and their usefulness in this inquiry.
Your report should include:
• a discussion on the origin and purpose of each source, including any potential bias or limitations of using the source
• a discussion of the main ideas being presented in each source and what features of the source were used to support the claim being made
• a discussion on how the sources corroborate or contradict each other, and how this has helped you gain a deeper understanding of this question
• a bibliography entry for each source at the end of the report.

Analyse: Analyse impacts and implications for relevant people and communities
• Is your inquiry question supported by additional questions to consider for analysis and evaluation?
• Does your analysis focus on the impacts and implications for people and communities?
• Is your analysis effective, sustained and well-supported by evidence?

How has the use of black box algorithms in one real-life scenario resulted in unintended consequences for a digital society? Analyse the inquiry focus using the perspective of systems by answering these questions:
• Describe the components of the digital system for the selected real-world scenario that uses a black box algorithm.
• Explain how the different components of the digital system interact with each other and the people and communities using it. Use a diagram to support your explanation.
• In what ways have black box algorithms changed the human/built/natural systems that are related to the real-life example? Are the changes evolutionary, adaptive, transformational or radical?
• Evaluate the intended and unintended consequences of the specific use of this black box algorithm.
• How has the use of black box algorithms made it possible for developers and users to fully understand the connections between the components of the digital system?
• In your opinion, are digital systems and devices becoming too big and complex to understand?

Activity: HL Extended Inquiry

Once you have studied Section 5.1, complete this inquiry activity.
3.2E Algorithmic dilemmas (content) and 5.1 Global well-being

Inquiry focus: How is algorithmic bias contributing to global inequality?

Research and describe the global challenge: Use effective research skills to identify one example where algorithmic bias is contributing to global inequality. Describe the challenge in detail.

Research and evaluate one intervention for this challenge: Research and evaluate this intervention using the HL extended inquiry framework. Make a recommendation for steps for future action.

Present your work in the form of a written report.

Activity: HL Extended Inquiry

Once you have studied Section 5.2, complete this inquiry activity.

3.2E Algorithmic dilemmas (content) and 5.2 Governance and human rights

Challenge: An algorithm was developed to predict which patients would need extra medical care. The algorithm favoured white patients over black patients because its developers had used historic data on patient health care spending and made faulty assumptions based on the correlation between income and race.

Inquiry focus: To what extent is algorithmic bias reinforcing racial and ethnic discrimination?

Research and describe the global challenge: Use effective research skills to find out more about the global challenge of algorithmic bias and racial/ethnic discrimination. Describe the challenge in detail.

Research and evaluate one intervention for this challenge: Research and evaluate this intervention using the HL extended inquiry framework. Make a recommendation for steps for future action.

Present your work in the form of a written report.

Creativity, activity, service (CAS)

User guide for CAS

As a service for the CAS coordinator, use the knowledge gained about algorithms in this unit to develop a step-by-step guide for IB students on how to document CAS at school. Use your knowledge about the characteristics of an algorithm and ways of representing algorithms to analyse the system in place at your school.
Discuss the choice of digital media to use with the CAS coordinator. Select suitable software to create the guide and distribute it to students.

TOK

Knowledge and technology

Digital technology is changing the way we know things. Is it helping or hindering our cognition? One may start to question the ethical limits that should be put in place during the process of acquiring knowledge, especially the use of algorithms by IT systems. Can the use of algorithms to predict behavioural traits ever be free from human bias?

Extended essay (EE)

The uses of algorithms and the dilemmas that they raise may generate some initial ideas for an extended essay topic.

Learner profile

Inquirers
Inquirers conduct wider research into different claims and perspectives to check for possible

Reflection

Now that you have read this chapter, reflect on these questions:
• Could you explain the characteristics and different components of an algorithm?
• Could you compare the different ways algorithms can be represented? When is each of these ways used in real life?
• Could you explain the different types of algorithms and when they are used in a range of contexts?
• What is the relationship between data (3.1), algorithms (3.2) and artificial intelligence (3.6)?
• What are the main causes of bias in an algorithm, and what solutions are t
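
As a small illustration of the second cause of algorithmic bias described in this chapter (training data that is biased or incomplete), the Python sketch below shows how a very simple scoring rule can inherit bias from historical records. It is only a teaching sketch under stated assumptions, not the method used by any real risk assessment tool: the neighbourhood labels "A" and "B", the record counts, and the function names `train_risk_model` and `risk_score` are all invented for this example.

```python
from collections import defaultdict

# Hypothetical historical records: (neighbourhood, recorded_as_reoffending).
# Assumption for this sketch: people in both areas behave the same way, but
# area "B" was policed more heavily, so more of its cases were recorded.
historical_records = (
    [("A", True)] * 10 + [("A", False)] * 90
    + [("B", True)] * 30 + [("B", False)] * 70
)


def train_risk_model(records):
    """Learn a score per neighbourhood: the share of past records marked True."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for area, reoffended in records:
        totals[area] += 1
        if reoffended:
            positives[area] += 1
    return {area: positives[area] / totals[area] for area in totals}


def risk_score(model, area):
    """Return the learned score for a defendant from the given neighbourhood."""
    return model[area]


model = train_risk_model(historical_records)

# Two defendants with identical personal histories get different scores
# purely because of where they live.
print("Score for area A:", risk_score(model, "A"))
print("Score for area B:", risk_score(model, "B"))
```

The scoring rule contains no instruction to treat the two areas differently. The difference comes entirely from the records it was given, which is why checking how training data was collected matters as much as checking the code itself.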
