Applied Computing VCE Units 1 & 2 Chapter 1 PDF
Uploaded by AdventuresomeAlliteration
Kilbreda College
Summary
This document is a chapter from an applied computing textbook that covers data analysis. It discusses qualitative and quantitative data types, methods of gathering and analysing data, ethical issues, and the principles behind presenting findings. The keywords for this chapter are "applied computing," "data analysis," "qualitative data," and "quantitative data".
# Applied Computing VCE Units 1 & 2 - Chapter 1: Data Analysis

## Introduction

VCE Unit 1 of Applied Computing looks at how software tools such as databases and spreadsheets can be used to create visualizations of data. Programming languages are also studied. Throughout the unit, students will apply the stages of the problem-solving methodology. They will analyze data and develop data visualizations, as well as design and develop software solutions. Different types of data are acquired and manipulated in database and spreadsheet software.

There are two outcomes to be completed in Unit 1:

- **Area of Study 1 - Data Analysis:** Students will gather, organize, analyze and present appropriate data, and develop data visualizations based on given solution requirements and designs. They will validate their data and apply formats and conventions to data visualizations.
- **Area of Study 2 - Programming:** Students will design and develop solutions in a programming language based on program requirements provided by their teacher. Validation, debugging and testing are part of the development process.

## Chapter 1: Data Analysis

### Key Knowledge

- Types and purposes of qualitative and quantitative data.
- Characteristics of data types relevant to selected software tools.
- Factors affecting the quality of data and information.
- Characteristics of data and information.
- Techniques for applying the Australian Privacy Principles (APPs) in the Privacy Act 1988 relating to the use, management, and communication of data and information.
- Ethical issues arising from the use, management, and communication of data and information.
- Referencing primary and secondary data and information.

### For the Student

This chapter focuses on identifying and collecting data to present findings as data visualizations. Students will:

- Interpret teacher-provided solution requirements and designs.
- Collect and manipulate data.
- Analyze patterns and relationships.
- Develop data visualizations to present findings.

### Understanding Research

- Consumers of research read research, while producers of research conduct research.
- **Theories** are general statements that describe something, provide an explanation, and can be used to predict future events.
- **Research questions** assist researchers in focusing their investigation.
- **Hypotheses** are based on probabilities about what will happen.
- **Qualitative data** is about qualities and attributes.
- **Quantitative data** is measurable and specific.

### Advantages and Disadvantages of Quantitative and Qualitative Data

#### Quantitative Data

**Advantages:**

- Participants are more willing to participate in quantitative studies.
- Surveys can capture a large sample size, providing statistical validity.
- Findings can be communicated easily.

**Disadvantages:**

- Data can be superficial.
- It may be difficult to analyze a lot of data.
- Time and budget constraints may limit the depth of the data.

#### Qualitative Data

**Advantages:**

- Provides in-depth descriptions of participants.
- Allows for follow-up questions.
- Can be used to understand complex issues.

**Disadvantages:**

- Time-consuming to process and analyze.
- Can be difficult to make generalizations.
- Interpretations tend to be subjective.
### Table 1.1: Comparison of Qualitative and Quantitative Data Aspects

| Aspect | Qualitative Data | Quantitative Data |
|---|---|---|
| Analysis techniques | Thematic analysis, content analysis | Statistical analysis, computational modeling |
| Insights | Provides in-depth understanding of user experiences and motivations | Provides broad and generalizable insights |
| Flexibility | Very flexible, easy to explore new ideas and areas | Limited flexibility, predefined variables must be determined |
| Advantages | Can gain rich, detailed data; helps to provide contextual understanding; very useful for exploring complex issues | Easy to analyze and visualize; allows for generalizations to be made; can handle large data sets easily and efficiently |
| Disadvantages | Very time-consuming to process and analyze the data collected; tends to result in subjective interpretations; can be difficult to make generalizations from findings | Can miss contextual elements and nuances; requires large sample sizes for accuracy; can be costly to collect the required amount of data |
| Used for | Understanding user behaviour and improving user experience | Measuring system performance, analyzing usage patterns |

### Data and Information

- **Data** is raw, unorganized facts, symbols, or numbers. It may include images and sounds.
- **Information** is created when data is organized and presented in a meaningful and useful way.
- Information is used to inform, entertain, or persuade an audience.

### Primary and Secondary Data and Information

- **Primary sources:** Data that provides a first-hand account of a person, object, event, or phenomenon. Examples include interviews, surveys, observations, and personal histories.
- **Secondary sources:** Data and information that has already been processed, interpreted, or analyzed by other people. Examples include textbooks, websites, magazines, newspapers, and TV programs.
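The difference between data and information can be illustrated with a short Python sketch. The survey responses here are invented for illustration, and `summarise` is a hypothetical helper rather than anything from the chapter. The list of raw ratings is data; the summary that a reader can act on is information.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw data: one rating per participant (1 = very low, 5 = very high).
raw_responses = [4, 5, 3, 4, 2, 5, 4]


def summarise(responses):
    """Organize raw ratings (data) into a small summary (information)."""
    return {
        "count": len(responses),
        "average": round(mean(responses), 2),
        "most_common": Counter(responses).most_common(1)[0][0],
    }


summary = summarise(raw_responses)
print(summary)
```

The same raw list could support many different summaries; which one counts as useful information depends on the research question being asked.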
### Techniques and Methods

- **Surveys** are a quick way of gathering large amounts of data. They can be closed or open-ended.
- **Interviews** are used to elicit people's opinions and beliefs. They can be structured, semi-structured, or unstructured.
- **Data collection methods:** Techniques used to capture data, including surveys, interviews, observation, and sensor data.

### Purposes of Quantitative and Qualitative Data

- **Quantitative data** is often used to:
  - Describe.
  - Predict.
  - Improve processes.
  - Cleanse data.
- **Qualitative data** is often used to:
  - Provide rich descriptions of a sample group.
  - Conduct in-depth studies of participants.

### Census Data

- The Australian Bureau of Statistics (ABS) conducts a census every five years, collecting information on the population, housing, and other demographics.
- This data is valuable for understanding population characteristics and trends, as well as for informing policy decisions and urban planning.

### Data Generated by Artificial Intelligence

- AI systems generate both quantitative and qualitative data:
  - **Quantitative Data**: Sensor data, predictive analytics, and user interaction metrics.
  - **Qualitative Data**: Natural language processing, computer vision, and audio analysis.
- AI integration can provide a more comprehensive and nuanced analysis of both qualitative and quantitative data.

### Referencing Primary Sources

- When referencing information from primary sources, ensure that detailed information is recorded, including:
  - **Interviews**:
    - Interviewee name.
    - Date of interview.
    - Place of interview.
    - Interviewee's qualifications.
    - Contact information for the interviewee.
    - How the interview was conducted.
  - **Surveys**:
    - Respondent name.
    - Date of survey completion.
    - Survey title.
    - Organization that conducted the survey.
    - How the survey was conducted.
  - **Observations**:
    - Observer name.
    - Date and time of observation.
    - Location of observation.
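One way to make sure none of the interview details listed above is forgotten is to store them in a fixed structure. The sketch below is a minimal illustration in Python: the `InterviewRecord` class, its field names, and the sample values are all hypothetical, and the text produced by `citation` is an ad hoc format, not APA style.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InterviewRecord:
    """Details to record when referencing an interview (hypothetical structure)."""
    interviewee_name: str
    interview_date: date
    place: str
    qualifications: str
    contact: str
    method: str  # how the interview was conducted, e.g. "in person"

    def citation(self) -> str:
        # Ad hoc one-line summary for notes; this is not a formal referencing style.
        return (
            f"{self.interviewee_name} ({self.qualifications}), interviewed "
            f"{self.method} at {self.place} on {self.interview_date.isoformat()}."
        )


# Invented example values.
record = InterviewRecord(
    interviewee_name="A. Example",
    interview_date=date(2024, 3, 5),
    place="the school library",
    qualifications="traffic engineer",
    contact="a.example@example.com",
    method="in person",
)
print(record.citation())
```

Because every field is required, creating a record with a missing detail fails immediately, which acts as a simple completeness check.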
### Examples of Referencing

- There are many ways to cite sources.
- **Footnotes:** Notes listed at the bottom of the page.
- **APA Style:** Citations within the text are provided, along with a reference list at the end of the work. The APA style is commonly used in academic writing.

### Interpretation of Information for Communication and Decision-Making

- Interpreting data and information is necessary to make informed decisions.
- Ethical principles should be considered, including justice, respect, and beneficence.
- Common methods for communicating findings include:
  - Journals.
  - Books.
  - Conference presentations.
  - Data visualizations.

### Quality of Data and Information

- Reliable data must be relevant, accurate, free from bias, and usable.

### Accuracy

- Accuracy refers to data being correct.
- **Transcription errors** can occur when data is being entered.
- **Verification** of the data is important to ensure accuracy.

### Freedom from Bias

- Bias can result in unreliable data.
- **Bias** is a prejudice or unreasoned judgement.
- **Types of bias** include:
  - Vested interest: When someone has a personal reason for promoting a certain outcome.
  - Timing: When data is collected during an unusual event.
  - Small sample size: When a sample size is too small to represent the whole population.
  - Bias through sorting: When the order of data is manipulated.
  - Bias in graphic representations: When visual representations are not sized proportionally.

### Integrity

- **Data integrity** refers to data quality.
- Data integrity is needed to ensure that information produced is useful.
- Errors can reduce data integrity.
- **Types of errors** include:
  - Incorrect data: Wrong or outdated data.
  - Data that is not relevant: Data that doesn't relate to the research question.
  - Data that is not accessible: Data that is difficult or impossible to access.

### Relevance

- **Relevant data** directly supports the objectives of the analysis or decision-making process.
- **Irrelevant data** can distract or mislead.

### Accessibility

- **Accessible data** is available to the right people at the right time.
- Factors that affect accessibility include:
  - Data storage solutions.
  - User access controls.
  - Availability of data retrieval tools.

### Clarity

- **Clear data** is well defined.
- **Clear data** is easy to analyze and use effectively.
- **Lack of clarity** can lead to errors and misunderstandings in data interpretation.

### Context

- **Context** includes the background information and circumstances surrounding the data, such as its source, conditions of collection, and relevant metadata.
- **Context** helps in understanding the significance and limitations of the data, ensuring accurate analysis and interpretation.

### Privacy

- **Privacy** is a balancing act between protecting individuals' rights and those of researchers.
- **Privacy laws** attempt to prevent inappropriate intrusion into the lives of individuals.
- **De-identification** removes personal identifiers such as names and birthdates from data.
- **De-identification techniques** are important to protect individual privacy.

### Security to Protect Personal Information

- Stored data must be protected to preserve confidentiality and meet legal requirements.
- **Steps to protect data:**
  - **Physical security controls**
  - **Encryption**
  - **Backups**

### Encryption

- **Encryption** translates data into a code that can only be read by authorized users.
- **Decryption** is the process of translating encrypted data back into plain text.
- **Encryption** is necessary to protect data confidentiality.

### Backups

- **Backups** are important to protect against data loss due to human error, computer crashes, and software faults.
- **Types of backups:**
  - **Full backup:** Backs up all files.
  - **Differential backup:** Backs up files that have changed since the last full backup.
  - **Incremental backup:** Backs up files that have changed since the most recent backup, whether that was a full or an incremental backup.
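The three backup types differ only in which files they select. The Python sketch below illustrates that selection rule under stated assumptions: `files_to_back_up` is a hypothetical helper, the file names and timestamps are invented, and an incremental backup is taken to mean files changed since the most recent backup of any kind. It only chooses file names and does not copy anything.

```python
from datetime import datetime


def files_to_back_up(files, kind, last_full, last_backup):
    """Return the names of the files a backup of the given kind would copy.

    files       -- dict mapping file name to its last-modified time
    kind        -- "full", "differential" or "incremental"
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if kind == "full":
        return sorted(files)  # a full backup copies every file
    if kind == "differential":
        cutoff = last_full  # everything changed since the last full backup
    elif kind == "incremental":
        cutoff = last_backup  # only what changed since the latest backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, modified in files.items() if modified > cutoff)


# Invented example: a full backup on 3 March, an incremental backup on 5 March.
files = {
    "notes.docx": datetime(2024, 3, 1, 9, 0),
    "survey.xlsx": datetime(2024, 3, 4, 14, 0),
    "chart.png": datetime(2024, 3, 6, 10, 0),
}
last_full = datetime(2024, 3, 3, 18, 0)
last_backup = datetime(2024, 3, 5, 18, 0)

full = files_to_back_up(files, "full", last_full, last_backup)
differential = files_to_back_up(files, "differential", last_full, last_backup)
incremental = files_to_back_up(files, "incremental", last_full, last_backup)
print(full, differential, incremental)
```

In this example the differential backup copies two files and the incremental backup copies one, which shows the trade-off: incremental backups are smaller, but restoring requires the full backup plus every incremental backup made since.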
### Location of Backup Files

- Backup files should be stored in a safe location that is protected from theft, damage, and disasters.
- Cloud storage can provide off-site backup solutions.

### Usernames and Passwords

- **Usernames** are uniquely assigned to users and should be easy to remember.
- **Passwords** should:
  - Be at least eight characters long.
  - Include non-alphabetical characters.
  - Not be easily guessed.
  - Be changed every month.

### Firewall

- **Firewalls** filter network traffic to protect computer networks from unauthorized access and security breaches.
- They can be used to:
  - Block access from specific computers.
  - Block access to certain domain names.
  - Block access to certain protocols.

### Malware Protection

- **Malware** is malicious software.
- **Types of malware:**
  - Spyware and adware.
  - Trojan horses.
  - Worms and viruses.
- **Anti-malware programs** are important to protect computers from malicious activity.
- **Firewalls** are also useful in blocking malware from sending personal information over the internet.

### Ethical Issues

- **Ethical issues** arise from the impact of digital systems on people's rights.
- **Considerations for designing, controlling and using digital systems:**
  - The real and potential negative effects.
  - Legal obligations.
  - Ethical considerations.
  - Providing clear information to users.

### Ethics

- **Ethics** are accepted moral standards that guide behavior.
- Ethics are distinct from laws.
- When ethical principles conflict, it can create moral dilemmas.

### Transparency

- **Transparency** is essential for managing data ethically.
- **Transparency** includes:
  - Specifying the original purpose of data collection.
  - Describing how data is stored and accessed.
  - Identifying ownership and copyright.

### Use of Inaccurate or Incomplete Data

- **Inaccurate or incomplete data** can lead to biased or unfair outcomes, especially in decision-making processes that rely heavily on data analysis.
- **Examples:** hiring practices, loan approvals, and legal systems.

### Privacy and Confidentiality

- **Privacy** is essential when managing and communicating data, especially when handling sensitive information.
- **Incomplete data** can lead to breaches of privacy if it is not appropriately de-identified or if context is lost.

### Harm and Misinformation

- **Inaccurate data** can lead to harmful decisions and spread misinformation.
- **Examples:** misdiagnosis in healthcare and ineffective government policies.

### Ethical Responsibility

- Organizations and individuals responsible for managing and communicating data have a duty to ensure the data's accuracy and completeness.

### Ownership and Control of Data

- **Ownership and control of data** are central to ethical issues in data management.
- Data subjects have the right to know how their data is used, and others should seek their permission to access, use, and benefit from that data.

### Informed Consent

- **Informed consent** is essential before collecting data about individuals or groups.
- Participants must be informed of the purpose of the research and any risks involved.
- Participation in the research must be voluntary.
- **Consent forms** are a common way to document informed consent.

### Misuse of Personal Data and Information

- **Misuse of personal data and information** raises ethical issues, including:
  - Breaching ethical standards.
  - Compromising informed consent.
  - Fostering discrimination and profiling.
  - Eroding trust.

### Repurposing and Sharing of Data by Artificial Intelligence Systems

- AI systems present ethical issues when data is used beyond its original intent or shared without explicit consent.
- **Ethical considerations:**
  - Privacy violations.
  - Misuse.
  - Unauthorised access.
- **Transparency, robust consent processes, and stringent data protection measures** are essential to address these ethical concerns.

### Handling Ethical Issues

- **Six steps**:

1. Identify the problem.
2. Identify the stakeholders.
3. Identify possible alternatives.
4. Identify ethical standards.
5. Evaluate options.
6. Make a decision.

### Test Your Knowledge

**Data and Information**

1. What can occur if information is produced from incorrect or incomplete data?
2. Why is it important to ensure that data is accurate?
3. What are the properties of usable data?
4. What is a common cause of inaccurate data?
5. How can the accuracy of a primary source be determined?
6. With an example, explain the importance of timeliness in ensuring the quality and usability of data.
7. What influences the introduction of bias into data?
8. What is the difference between quantitative and qualitative data? Provide two examples.
9. What strategies could be used when gathering quantitative data?
10. Provide an example of referencing based on the APA style.
11. Why is it important to obtain permission when collecting data?
12. What is the purpose of consent forms?
13. Why is encryption important in data security?
14. How do usernames and passwords protect data?
15. What makes a strong password? Provide an example of a very strong password.
16. Describe a strategy for backing up data.
17. What is the difference between a full backup and an incremental backup?
18. Why is it important to secure data when conducting research?
19. How can the use of AI-GPT assist with administering surveys or interviews?
20. What are some implications of using AI-GPT to summarise surveys and interviews?

**Interactions and Impacts**

21. How do the Australian Privacy Principles affect the individual?
22. Under which legislation do the Australian Privacy Principles fall?
23. Why is it important to de-identify personal data?
24. What is an ethical dilemma in the context of data collection for research purposes?

### Apply Your Knowledge

**Street Traffic**

1. Milorad wants to do some research to support his theory about the traffic on his street.
   - Clearly state the topic Milorad will investigate.
   - What type of data will Milorad need to collect to assist in his investigation?
   - Identify an appropriate data-gathering technique Milorad could use.
   - Justify the selected data-gathering technique.
   - How will Milorad keep the data safe?
   - Does Milorad need to get permission to conduct his research and, if so, from whom?
   - What tools will Milorad use to interpret the results?
   - What types of relationships and patterns is Milorad looking for?
   - How will Milorad present the data to the local council?

**Internet Usage**

2. Use the UN Data website and the 'Datamarts' to find the **World Telecommunication/ICT Indicators database.** Select to view the data on the percentage of individuals using the internet.
   - Filter the data to show only the values for 10 countries for 2010-14.
   - Copy this data into a spreadsheet program.
   - Create a column chart and a scatter diagram to depict these statistics.
   - Discuss which graphic representation best conveys the data and why.

## Essential Terms

- **Accurate:** Data that is correct in all details and free from errors.
- **Backup:** The process of copying files from an information system to some type of storage device to guard against possible data loss.
- **Bias:** Prejudicial or unreasoned judgment.
- **Boolean data type:** A logical data type; Boolean data can hold one of only two possible values, usually true or false.
- **Character data type:** A data type representing a single letter, number, symbol, punctuation mark, or space.
- **Closed (or closed-ended) questions:** Queries that are restricted in the range of options provided so that only specific answers are elicited.
- **Data:** Unprocessed, unorganized and distinct facts or ideas; in addition to text and numbers, data also includes sounds, images, and video.
- **Data integrity:** The quality of the data.
- **Data types:** Different forms that variables and data may take, that determine the data that a variable may contain, and how the data or variable may be manipulated.
- **Decrypt:** To translate encrypted data back into ordinary text that can be read by anyone.
- **De-identify:** To remove information from data so an individual cannot be identified.
- **Differential backup:** Used in conjunction with a full backup; only files that have been altered since the last full backup are copied; restoration requires the full backup to be restored first, followed by files from the differential backup.
- **Dilemma:** When people must choose between two (or more) equally desirable (or undesirable) options; for example, between allowing the sale of violent video games to preserve freedom of expression and banning their sale in order to protect children from possible harm.
- **Encryption:** The process of encoding or changing data so that an unauthorized user who reads the data would not be able to understand it.
- **Ethics:** Accepted moral standards that guide behaviour; these standards may be common across a particular society or specific to a single organization, and they apply to questionable activities over and above any legal requirements; ethics often provide us with a set of guidelines for appropriate behaviour.
- **Exponent:** The power of 10 by which the mantissa in a floating-point number must be multiplied to regain the original number.
- **Firewall:** Hardware and software that restrict access to data and information on a network.
- **Floating-point number:** A number with a fractional or decimal component.
- **Footnote:** A reference that is listed at the bottom of the page on which the citation is made.
- **Full backup:** Copying all chosen files to a backup device; it can be slow to perform, but is the easiest and quickest form of backup from which to restore data.
- **Hypothesis:** A testable statement that is based on probabilities about what will happen according to the applied theory.
- **Incremental backup:** Similar to a differential backup in that it works in conjunction with a full backup, but it only backs up files that have been altered since the most recent backup (full or incremental) and it uses more than two backup media; it is the most complicated strategy from which to restore files.
- **Information:** Processed, organized and value-added data, which can be paper-based (hard copy) or digital (soft copy).
- **Informed consent:** A necessity for all participants before agreeing to take part in research; participants must be informed of what the research involves, the time commitment expected and any possible risks that may arise.
- **Integer:** A number without a fractional or decimal component.
- **Interview:** A conversation, usually between two people, in which questions are asked and answers are given.
- **Malware:** Short for 'malicious software'; programs designed to infiltrate a device or network and cause damage or disruption, or gain access, without the user's knowledge or consent; includes viruses, worms, Trojan horses, adware, spyware, logic bombs and keyloggers.
- **Mantissa:** All of the digits in a floating-point number, with a decimal point after the first digit that is not zero.
- **Open-ended questions:** Queries that allow people to answer in the manner they wish.
- **Participant information statement:** A document that provides participants with information about the research in an unbiased way, and also provides the scope to answer questions that the participant may have.
- **Password:** A secret term used to identify the user.
- **Plagiarism:** Passing off someone else's work as your own.
- **Precision:** Being exact.
- **Primary sources:** Original, unprocessed data and resources; that is, information that has not been processed, analyzed, or interpreted in any way, such as interviews; it is usually gathered from stakeholders.
- **Pseudonym:** A fictitious name that is given to a person, or that is chosen by a person, to hide or protect their identity.
- **Qualitative data:** Collected data that is based on subjective data-collection techniques such as interviews, focus groups, video footage, and observation.
- **Quantitative data:** Collected data that is measurable and specific; quantitative data gathering is based on verifying a theory through the use of statistics and data that is largely numerical.
- **Referencing:** Citations in a document that assist readers to locate the source of an original idea or quote in a piece of work, and assist students to avoid plagiarism.
- **Relevant:** Appropriate to the discussion.
- **Reliable:** Able to be trusted.
- **Research question:** A question that enables a researcher to narrow the focus of the topic of the investigation.
- **Secondary sources:** Sources of information that has been processed, interpreted, or analyzed in some way by other people, such as textbooks, websites, magazines, newspapers, and TV programs.
- **Stakeholder:** An individual or group that has an interest in, or is affected by, the decisions and actions of an organisation.
- **Survey:** Usually a set of questions that ask for a response from a list of alternatives, such as A, B, C, D, or from a range, such as 1 to 5 or very low to very high; surveys can easily be given to many people, and are easily processed and analyzed using computer-based methods because the answers can be recorded as numbers.
- **Theory:** A general statement that describes something, provides an explanation of why something happens, and can be applied to predict what will happen in the future.
- **Unencrypted data:** Data that is not protected by encryption and can thus be read by anyone; also known as 'plain text'.
- **Username:** The name given to the user on a computer or computer network.
- **Variable:** In programming, a key word, phrase, or symbol that represents a value that may change.
- **Vested interest:** Arises when an individual, group, or organization has a strong personal interest because there is an advantage to be gained.

## Important Facts

1. *Artificial intelligence (AI)* has developed through a new type of neural network called a large language model (LLM). A generative pre-trained transformer (GPT) samples many hundreds of millions of examples and establishes patterns and trends using billions of parameters. *ChatGPT* is a chatbot interface that attempts to provide human-like responses to input prompts. Responses can differ even when the input prompts are identical. Any copyright is held by the person inputting the prompt.
2. Data must be relevant to produce usable information. Data needs to be processed while it is current because decision-making should not be based on outdated data.
3. Data that is entered into a computer must be accurate. Transcription is often a cause of error. Transcription errors occur when the person entering the data misreads the information through, for example, a lapse in concentration or being interrupted, or presses the wrong key.
4. Interviews are usually done one-to-one, but can sometimes be done in groups, and can take a substantial amount of time. A major feature of an interview is the opportunity for in-depth follow-up and clarification questions that cannot be done with surveys, which are often answered in private. Interviews are very useful for eliciting the feelings, attitudes, and opinions of people that are too complex to easily record in a survey.
5. Bias can infiltrate data if the respondent to a survey or interview has a vested interest in the outcome of the research, if the timing of the data gathering is inappropriate, or if the chosen sample size is too small.
6. Timing of events needs consideration when collecting data as it can cause skewed results, which can lead to inaccurate or misleading conclusions.
7. Sample size must relate to the purpose of the data collection. Generally, a larger sample size leads to greater precision.
8. There is a plethora of unchecked information on the internet, and some of the views presented may not be widely accepted or proven. Sources cited should be reliable.
9. The American Psychological Association (APA) created a style guide to assist with academic writing such as essays, books and other publications. The APA style is widely used and is the reference style that students are expected to use.
10. Privacy is a fine balance between the interests of researchers and those of the participants. Privacy laws attempt to stop inappropriate intrusion into the lives of individuals. Often, however, the problem is not the collection of data, but how the data is used or misused by people entrusted with it.
11. Researchers must ensure that data and materials generated and collected as part of their research, regardless of the format, are stored securely in a durable and accessible form. Stored data can be protected with both physical and software-based controls, such as backing up of data and shredding of confidential documents.
12. Cloud-computing companies provide off-site storage, processing and computer resources to individuals and organisations. These companies are typically third parties, and they store data in a remote database in real time.
13. To verify users' rights to access a network, security features are required. A system of establishing usernames (or user IDs) and passwords allows for the identification and authentication of each user.
14. The Privacy Act 1988 (Cwlth) was amended by the Privacy Amendment (Enhancing Privacy Protection) Act 2012. This came into effect in 2014. As part of this Act, the Australian Privacy Principles (APPs) replaced the National Privacy Principles and the Information Privacy Principles so that Australia now has one set of privacy principles. The APPs apply to Australian Government agencies and to many private-sector organisations.
15. APP 2, 'Anonymity and pseudonymity', offers individuals dealing with organisations the option of not identifying themselves or of using a pseudonym in relation to a particular matter.
16. APP 6, 'Use or disclosure of personal information', states that personal information should be used or disclosed only for the primary purpose for which it was collected. Information generally cannot be used for a secondary purpose without consent from the individuals concerned.
17. APP 11, 'Security of personal information', states that reasonable steps need to be taken to protect personal information an organisation holds from misuse, interference and loss, and from unauthorised access, modification or disclosure. The organisation also needs to destroy or de-identify personal information in certain circumstances.
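The pseudonymity and de-identification ideas in APP 2 and APP 11 can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `de_identify` helper, the field names, and the records are invented. Dropping names and birthdates is only a first step, because other fields can still identify a person when combined.

```python
def de_identify(records, identifiers=("name", "birthdate")):
    """Return copies of the records with direct identifiers replaced by a pseudonym."""
    pseudonyms = {}
    cleaned = []
    for record in records:
        person = record["name"]
        # Give each person one consistent pseudonym so their rows stay linked.
        if person not in pseudonyms:
            pseudonyms[person] = f"Participant {len(pseudonyms) + 1}"
        safe = {field: value for field, value in record.items()
                if field not in identifiers}
        safe["pseudonym"] = pseudonyms[person]
        cleaned.append(safe)
    return cleaned


# Invented survey records containing personal identifiers.
records = [
    {"name": "Alex", "birthdate": "2008-04-12", "cars_counted": 42},
    {"name": "Sam", "birthdate": "2007-11-30", "cars_counted": 37},
    {"name": "Alex", "birthdate": "2008-04-12", "cars_counted": 51},
]

cleaned = de_identify(records)
print(cleaned)
```

The original `records` list is left unchanged, so in practice it would need to be stored securely or destroyed once the de-identified copy has been made.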