Research Methods for Computer Science
Demelo M. Lao
These lecture notes on research methods for computer science cover applied statistics. Topics include quantitative data analysis, hypothesis testing, statistical tests, and examples using R software. The example problems deal with unauthorized computer account use and CD writer battery life.
Research Methods for Computer Science (CMSC 106)
Applied Statistics – Part IIB
DEMELO M. LAO
Department of Computer Science

Topical Outline – 1 (F2F)
- RECAP
- Quantitative Data Analysis
- Inferential Statistics by Hypothesis Testing
- Hypothesis Testing 5-step Guide
- By Examples

Topical Outline – 2 (Asynchronous Remote Learning)
Supplementary video lecture materials on the following:
- Quantitative Analysis Basics
- Quantitative Analysis – Working with Survey Data
- Review of t-test/z-test, Chi-square test, p-value, and more …
Assignment 3 – Continue Answering the Research Question by Hypothesis Testing

RECAP (Part IIA)
Quantitative data analysis
- A systematic approach in research which collects numerical data and transforms it into actionable insights/information
- Concerned with finding empirical evidence that either supports or contradicts the research idea/hypothesis
Hypothesis testing
- A statistical method that uses sample data to evaluate the researcher's hypothesis (claims/conjectures) about the target population

Hypothesis Testing-1
A 5-Step Guide (Todd Daniel, 2021)
- Step 1 – Choose Right Statistical Test
- Step 2 – Establish Null (H0/Ho) and Alternative Hypotheses (H1/Ha)
- Step 3 – Select Criteria for Significance
- Step 4 – Calculate Statistical Test
- Step 5 – Interpret and Write Up Findings

Hypothesis Testing-1a
Which Statistical Test to Use?

Hypothesis Testing-1b
Which Statistical Test to Use?

Hypothesis Testing-2
How to Create Null & Alternative Hypotheses
Questions to ask yourself:
- Is the research question one-tailed (directional) or two-tailed (nondirectional)?
- What is the mean of the population? Refer to the research question.
- In what direction does the DV (dependent variable) change? Increase or decrease?
  - If the DV increases, use ">" in the alternative hypothesis
  - If the DV decreases, use "<" in the alternative hypothesis

[…]
- Computed test statistic > critical value from table → statistically significant!
- Does the confidence interval around the mean difference include 0 (or, the hypothesized difference)? Yes → statistically NOT significant!
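The two significance criteria above (test statistic versus critical value, and whether the confidence interval contains the hypothesized value) can be sketched as small Python helpers. This is only an illustrative sketch: the function names `exceeds_critical` and `interval_excludes` and the `tail` labels are assumptions made here, not part of the lecture material.

```python
def exceeds_critical(test_stat, critical_value, tail="two"):
    """Critical-value criterion; True means statistically significant.

    critical_value is the positive table value for the chosen alpha and tail.
    """
    if tail == "two":        # nondirectional: an extreme value on either side counts
        return abs(test_stat) > critical_value
    if tail == "greater":    # directional, alternative hypothesis uses ">"
        return test_stat > critical_value
    if tail == "less":       # directional, alternative hypothesis uses "<"
        return test_stat < -critical_value
    raise ValueError("tail must be 'two', 'greater', or 'less'")


def interval_excludes(ci_low, ci_high, hypothesized=0.0):
    """Confidence-interval criterion; True means statistically significant.

    If the hypothesized value (0 for a mean difference) lies inside the
    interval, the result is NOT statistically significant.
    """
    return not (ci_low <= hypothesized <= ci_high)
```

Keeping the tail as an explicit argument mirrors the first question on the slide: whether the research question is directional decides which side of the distribution counts as the rejection region.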
Type I & Type II Errors
Four possible situations:
- Two of the four cases result in a correct decision: either accept a true hypothesis or reject a false hypothesis
- The other two situations are sampling errors:
  - Type I (rejecting a true null hypothesis) → α (by convention, used to set the significance level)
  - Type II (failing to reject a false null hypothesis) → β

Hypothesis Testing-4
Calculate Statistical Test
- When working by hand, use the test formula to calculate the test statistic
  - The computed test statistic is compared to the critical value from the table
- When using software (spreadsheet add-ins, SPSS, R, Python, etc.), the software calculates the test statistic, probability (p) value, and confidence interval
  - The p-value is compared to the set α

Hypothesis Testing-5
Interpret and Write Up Findings
Each of the following indicates statistical significance:
- P-value is less than α (e.g., p-value < 0.05)
- Calculated statistic is in the critical region (rejection region): the computed test statistic exceeds the critical value
- Confidence interval around the mean difference does not include 0 (i.e., both limits have the same sign, rather than one negative and the other positive)
When any of these holds, reject the null hypothesis!
- The null hypothesis says there is no difference between means
  o If you reject the idea that there is no difference, you are inferring that there is a difference!

Example 1 (One-sample Case)
Unauthorized Use of Computer Account
A long-time authorized user of the account averages 0.2 seconds between keystrokes. When someone accesses the account with the correct username and password (which may have been stolen or cracked), the time between keystrokes, the time a key is held down, and the frequency of various keywords are measured and compared with those of the account owner. If there are noticeable differences, an intruder is detected. One day, the following times between keystrokes were recorded when someone typed the correct username and password: 0.46, 0.38, 0.31, 0.24, 0.20, 0.31, 0.34, 0.42, 0.09, 0.18, 0.46, 0.21 seconds. At the 1% level of significance, is this evidence of an unauthorized attempt?
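As a by-hand check of the data above, the one-sample t statistic can be computed with Python's standard library. This is a sketch under stated assumptions: it uses the one-sample t-test and two-tailed alternative that the solution adopts, and the critical value 3.106 is taken from a standard t table (df = 11, two-tailed α = 0.01) rather than from the lecture notes. The variable names are illustrative.

```python
import math
import statistics

# Times between keystrokes (seconds) recorded in Example 1.
times = [0.46, 0.38, 0.31, 0.24, 0.20, 0.31, 0.34, 0.42, 0.09, 0.18, 0.46, 0.21]
mu0 = 0.2  # account owner's usual mean time between keystrokes

n = len(times)
mean = statistics.mean(times)
sd = statistics.stdev(times)      # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)            # standard error of the mean
t_stat = (mean - mu0) / se        # one-sample t statistic

# Assumption: critical value read from a standard t table,
# df = n - 1 = 11, two-tailed, alpha = 0.01.
T_CRITICAL = 3.106

significant = abs(t_stat) > T_CRITICAL
print(f"n = {n}, mean = {mean:.3f}, sd = {sd:.4f}, se = {se:.4f}")
print(f"t = {t_stat:.3f}, critical value = {T_CRITICAL}, significant at 1%: {significant}")
```

With these data the sample mean is 0.30 and t is about 2.93, which is below 3.106, so this by-hand check does not reject the null hypothesis at the 1% level.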
Continue … Example 1 Solution
Step 1 – Choose right statistical test
- Variable of interest: time between keystrokes (in seconds) → ratio scale of measurement
- Number of groups/samples = 1 (set)
- Size of group/sample (n) = 12
- Hypothesized mean = 0.2 seconds (i.e., on average)
- Statistical test (based on guide) → One-sample t-Test (since n < 30; otherwise use z-Test instead!)
Step 2 – State Null & Alternative Hypotheses
- H0: μ = 0.2
- HA: μ ≠ 0.2 (since it only implies that the observed mean time between keystrokes is different from usual)

Continue … Example 1 Solution
Step 3 – Criteria for Statistical Significance
- P-value < set α = 0.01, or
- By confidence interval (CI), 0.2 not inside CI
Step 4 – Calculate test statistic
By R software:
#preparing path for working directory
path […]

[…] > Group2 (battery life in G1 lasts longer than in G2)

Continue … Example 2 Solution
Step 3 – Criteria for Statistical Significance
- P-value < set α = 0.05, or
- By confidence interval (CI), 0 not inside CI
Step 4 – Calculate test statistic (via R software script)
#preparing path for working directory
path
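Returning to Example 1, the confidence-interval criterion from its Step 3 (is 0.2 inside the CI?) can also be checked by hand. The sketch below is written in Python rather than R and is not the lecture's script. It assumes a two-sided 99% interval to match α = 0.01, and it takes the multiplier 3.106 from a standard t table (df = 11), not from the notes.

```python
import math
import statistics

# Times between keystrokes (seconds) recorded in Example 1.
times = [0.46, 0.38, 0.31, 0.24, 0.20, 0.31, 0.34, 0.42, 0.09, 0.18, 0.46, 0.21]
mu0 = 0.2  # hypothesized mean under H0

# Assumption: t table value for df = 11 and a two-sided 99% interval.
T_CRITICAL = 3.106

n = len(times)
mean = statistics.mean(times)
margin = T_CRITICAL * statistics.stdev(times) / math.sqrt(n)  # half-width of the CI
ci_low, ci_high = mean - margin, mean + margin

# Criterion from Step 3: significant only if mu0 falls outside the interval.
mu0_inside = ci_low <= mu0 <= ci_high
print(f"99% CI for the mean: ({ci_low:.4f}, {ci_high:.4f})")
print(f"0.2 inside CI: {mu0_inside}")
```

The interval comes out at roughly (0.194, 0.406). It contains 0.2, so by this criterion the result is not statistically significant at the 1% level.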