Kin 2032 Final Research Methods PDF
Document Details
Uploaded by Deleted User
The University of Western Ontario
Summary
The document is a past paper for Kin 2032, Research Methods at The University of Western Ontario, focusing on the goals of science, basic vs. applied research, and a model of scientific research.
Full Transcript
lOMoARcPSD|47328710 Kin 2032 Final Research Methods (The University of Western Ontario) Scan to open on Studocu Studocu is not sponsored or endorsed by any college or university Downloaded by Claire Seed ([email protected])

Lecture 1
Introduction to Research

Goals of Science
Describe
○ Achieved through careful observation.
Predict
○ Achieved after sufficient observation of behaviors or events that are systematically related to one another.
Explain
○ Achieved by determining the causes of behaviors or events.

Basic vs Applied Research
Basic Research
○ Conducted to achieve a more detailed and accurate understanding of a behavior or event, without trying to address any practical problem.
Applied Research
○ Conducted to address a practical problem.

A Model of Scientific Research
Non-linear (cyclic)
Literature drives future research questions

Finding a Research Topic
Good research requires a good foundation in the research question.
The process of developing a research question can be stressful and difficult.
Inspiration for good research questions can come from a variety of sources:
○ clinical experience
○ theory
○ “unanswered questions” in professional literature

Reviewing the Research Literature
Previous research is a common source of inspiration for research questions.
It is important to conduct a literature review early in the research process.
○ Research literature is all the published research in that field.
Reviewing the research literature requires you to find, read and summarize the published research.
Reviewing the research literature can also assist you in other ways:
○ It can tell you if a research question has already been answered.
○ It can help you evaluate the interestingness of a research question.
○ It can give you ideas for how to conduct your own study.
○ It can tell you how your study fits into the research literature.

Professional Journals
Professional journals are periodicals that publish original research articles.
Most journals require double-blind peer review.
○ This means that multiple external reviewers evaluate the manuscript without knowing the authors' identities (and vice versa). They provide constructive, critical feedback along with a recommendation to revise, publish or reject the article.
Professional journals publish a variety of article types, but the two main categories are empirical research reports and review articles.
○ Empirical research reports introduce a research question, provide the background, describe the methods and results, and report the conclusions.
○ Review articles summarize previously published research.
Professional journals typically publish in one of the following three ways:
○ Closed Access/Traditional: the reader pays for a subscription to the journal and authors publish for free.
○ Open Access with Peer Review: the authors pay a fee to publish in the journal, articles go through a peer-review process, and readers can access the journal for free.
  Reviewers and authors are not financially compensated.
○ Predatory Publishers/“Pay to Play”: the authors pay a fee to publish and there is no peer-review process.
  No peer review is bad: people can publish whatever they want.

Scholarly Books
Scholarly books are written by researchers and practitioners for use by other researchers and practitioners.
Monographs are written by a small group and present a topic in a similar fashion to a research article.
Edited volumes have an editor who recruits many authors to write separate chapters on different aspects of the same topic.
Generally, scholarly books undergo a peer review process similar to that of professional journals.
Literature Search
You can use a variety of tools and resources to conduct a literature search:
○ Journal websites
○ Electronic databases: index multiple journals and allow you to search across them
  Some are paid, some are free
○ Reference sections in articles

Electronic Databases
Electronic databases are typically the best approach for literature searching.
Common databases:
○ PubMed https://pubmed.ncbi.nlm.nih.gov/
  Full PDFs
○ Google Scholar https://scholar.google.com
  Links to university sites
  Not reproducible by other researchers, because search results are arbitrary
○ CINAHL https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=http://search.ebscohost.com/login.aspx?authtype=ip,uid&custid=s3694324&profile=cinahl&defaultdb=cin20
○ Scopus https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi?url=https://www.scopus.com

What to search for
Focus on sources that help you accomplish four basic things:
○ refine your research question
○ identify appropriate research methods
○ place your research in the context of previous research
○ write an effective research report
Limit your search to “recent” articles (what counts as recent depends on your question).
Use Boolean logic to refine your search:
○ Quotes: search for an exact phrase. Example: “family physician”
○ Parentheses: combine modifiers to create a more complex search. Example: Canada AND (physician OR doctor)
○ AND: require both search terms. Example: physician AND doctor
○ OR: broaden your search with multiple terms. Example: “family physician” OR “family doctor”
○ NOT: exclude a specific term.
  Example: physician NOT doctor

Western Research Guide & Mendeley
Western Research Guide for Health Studies
○ https://guides.lib.uwo.ca/healthstudies
Mendeley Reference Manager
○ https://www.Mendeley.com

Good Research Questions
Asking Good Questions
Good research questions can be generated through a variety of means, including clinical experience, theory or unanswered questions from professional literature.
Good research questions should have one or more of the following objectives:
○ Evaluation of measurement tools
○ Descriptive (characterizing clinical disorders)
○ Exploratory (investigating relationships)
○ Comparative (cause-effect relationships)

Who?
Who are we studying?
○ What is the target population?
○ What is my sampling frame?
○ What characteristics should my sample include?
○ How should I draw my sample?
Issues surrounding sampling are crucial to the development of a good question.
○ Not everyone has a landline, so calling numbers from a phone book leads to incorrect conclusions (it reaches mostly older people).
○ If participants are given a reward for completing a poll, they may skip through it just to get the reward.
We can't correct for things we don't know.

Why?
Why is this question important?
Why has this question not been answered previously?
Why does this effect occur?
○ How does it relate to other effects?
○ Is there a unifying theory that I might investigate?

What?
What are we going to measure?
What are the important aspects of this phenomenon?
What variables are we considering?
○ What is the independent variable? How many levels does the I.V. have (if appropriate)?
○ What is the dependent variable?
○ Can we identify predictors and/or criteria?

How?
How will we measure this variable?
What is our operational definition?
Will we use questionnaires, observations, physical measurements, or some other method?
○ What will we use as rating scales?
○ How will we score our measure(s)?
Can we come up with low-inference measures?
The Importance of Context
It is important to know what researchers have found in the past:
○ Literature searches (see text)
○ Conference presentations
○ Opinions of colleagues
What will our hypothesis be, and how does it fit with existing evidence?

Evaluating Research Questions
Researchers often generate more research questions than they can answer. To use their time efficiently, they should evaluate research questions against two criteria:
○ Interestingness
○ Feasibility

Interestingness
Three factors contribute to the interestingness of a research question:
○ The answer is in doubt
○ The answer fills a gap in the research literature
○ The answer has important practical implications
The answer is in doubt
○ There must be a reasonable chance that the answer to the question will be something that is not already known.
○ If you can think of reasons to expect at least two different answers, then the question may be interesting.
The answer fills a gap in the research literature
○ If the question has not already been answered by scientific research and would make sense to people who are familiar with the research literature, then the question may be interesting.
The answer has important practical implications
○ If the answer to a research question has practical implications, then it is likely an interesting research question.

Feasibility
To successfully answer a research question, the feasibility of answering it must be considered. Many factors affect feasibility:
○ Time
○ Money
○ Equipment
○ Materials
○ Technical knowledge
○ Access to participants

Developing a Hypothesis
Theory
A theory is a coherent explanation or interpretation of one or more phenomena.
Theories can take a variety of forms. All theories, however, go beyond the phenomena they explain by including variables, structures, processes, functions, or organizing principles that have not been observed directly.
A theory may be untested, or it may be extensively tested, supported and accepted as an accurate description of the phenomena.

Hypothesis
A hypothesis is a specific prediction about a new phenomenon that should be observed if a particular theory is accurate.
Hypotheses are often specific predictions about what will happen in a particular study.
Hypotheses are developed by considering existing evidence and using reasoning to infer what will happen in the specific context of interest.
Hypotheses are often, but not always, derived from theories.
○ A hypothesis is often a prediction based on a theory. However, some hypotheses are a-theoretical, and a theory is developed only after a set of observations has been made.

Theories and Hypotheses
Theories and hypotheses always have an if-then relationship: if the theory is accurate, then the hypothesis should hold.
A hypothesis can be derived from a theory in multiple ways:
○ A research question can be generated, and then relevant theories that may imply an answer to the question can be explored.
○ A component of the theory that has not been directly observed can become the focus of a hypothesis.

Theory Testing: Hypothetico-Deductive Method
The primary way theories are used in research is through the hypothetico-deductive method.
Researchers begin with a set of phenomena and either construct a theory to explain or interpret them or choose an existing theory to work with.
They then make a prediction (or hypothesis) about some new phenomenon that should be observed if the theory is correct.
Next, they conduct an empirical study to test the hypothesis.
Finally, they reevaluate the theory in light of the new results and revise it if necessary.
This process is not necessarily linear. It should be a cycle, because researchers can derive a new hypothesis from the revised theory, conduct a new empirical study to test that hypothesis, and so on.

Characteristics of a Good Hypothesis
Testable and Falsifiable
○ The hypothesis must be capable of evaluation using the methods of science, and it must be possible to gather evidence that will disconfirm the hypothesis if it is false.
Logical
○ Hypotheses should be informed by previous theories or observations and by logical reasoning.
Positive
○ The hypothesis should make a positive statement about the existence of a relationship or effect, rather than a statement that a relationship or effect does not exist.

Lecture 2
Designing a Research Study

Variables and Operational Definitions
To generate a hypothesis, the variables in question need to be identified, and an operational definition needs to be established for each variable.
The operational definition is important because it permits accurate measurement of the variable.

Variables
Variable
○ A quantity or quality that varies across people or situations.
Quantitative variable
○ A quantity that is typically measured by assigning a number to each individual.
○ Example: height, weight (numerical)
Categorical variable
○ A quality that is typically measured by assigning a category label to each individual.
○ Example: academic major, nationality, occupation

Operational Definition
Operational definition
○ A definition of the variable in terms of precisely how it is to be measured.
Most variables cannot be directly observed or measured. An operational definition takes an abstract construct that cannot be directly observed or measured and transforms it into something that can be.
Most variables can be operationally defined in many ways, which is why it is important to define them at the outset.
Sampling
Researchers must identify the population of interest and draw a smaller subset of the population, called the sample, to study.
Population
○ ALL individuals of interest
○ Example: all individuals in the world who have contracted COVID-19
Sample
○ A small subset of the population used for a research study
○ Example: 100 individuals who have contracted COVID-19

Sampling Methods
Researchers use a variety of methods to obtain a sample. The goal is to reduce bias and obtain a sample that is truly representative of the population.
Simple random sampling
○ Every member of the population has an equal chance of being selected for the sample.
Systematic sampling
○ The list of participants is "counted off"; that is, every nth participant is taken.
Convenience sampling
○ Individuals who happen to be nearby and willing to participate.
Cluster sampling
○ The population is divided into groups called clusters or blocks. Clusters/blocks are then randomly selected, and everyone within each selected cluster is included.
Stratified sampling
○ The population is divided into groups (strata) based on a specific characteristic. A sample is then taken from each stratum using random, systematic, or convenience sampling.

Experimental Research
Used to test causal relationships between variables in order to explain a phenomenon.
One or more independent variables are manipulated while extraneous variables are controlled. The dependent variable is then measured to determine how the manipulation affected the participants.

Types of Experimental Variables
Independent variable
○ The variable being manipulated.
Dependent variable
○ The variable being measured.
Extraneous variables
○ Any variable other than the independent and dependent variables.
Confounds
○ A specific type of extraneous variable that systematically varies along with the variables under investigation and therefore provides an alternative explanation for the results.
○ Confounding variables need to be identified so that they can be controlled for.

Non-Experimental Research
Used to describe characteristics of people and relationships between variables, and to use those relationships to make predictions.
Variables are not manipulated. Instead, they are measured and/or observed as they naturally occur.
Non-experimental does not mean nonscientific. Non-experimental research can be used to describe and predict, but it cannot support causal conclusions the way experimental research can.

Laboratory Research
A laboratory study is conducted in a controlled laboratory environment.
Laboratory experiments have high internal validity (the degree to which we can confidently infer a causal relationship between variables) because the independent variable can be manipulated while all other extraneous variables are controlled.
Because the setting is so tightly controlled, laboratory experiments tend to have low external validity (the degree to which we can generalize the findings to other circumstances or settings).

Field Research
Field studies are conducted in a real-world environment.
In a natural environment, not all extraneous variables can be controlled. As a result, field research tends to have lower internal validity but higher external validity.

Analyzing the Data
Descriptive Statistics
Used to describe or summarize a set of data, but cannot be used to form causal conclusions.
Examples include percentages, measures of central tendency, measures of dispersion and correlation coefficients.
Measures of Central Tendency
○ Mean: the average of a distribution of scores
○ Mode: the most frequently occurring score in a distribution
○ Median: the midpoint of a distribution of scores, largely unaffected by outliers
Measures of Dispersion
○ Range: the distance between the highest and lowest scores in a distribution
○ Standard deviation: the average distance of scores from the mean
○ Variance: the standard deviation squared
Correlation Coefficient
○ Used to describe the strength and direction of the relationship between two variables.
○ Values range from -1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship).
○ A value of 0 means there is no (linear) relationship between the two variables.

Inferential Statistics
Used to draw conclusions about a population based on data from a sample.
Researchers use inferential statistics to determine whether their effects are statistically significant.
A statistically significant effect is one that is unlikely to be due to random chance and therefore likely represents a real effect in the population.
Two types of mistakes can be made with inferential statistics:
○ Type I error (false positive)
○ Type II error (false negative)

Drawing Conclusions and Reporting the Results
Drawing Conclusions
The results of a single study cannot establish with certainty that a theory is true. Instead, they can support, refute or modify the theory.
If the results are statistically significant and consistent with both the hypothesis and the theory used to generate it, researchers can conclude that the theory is supported.
If the results are not consistent with the hypothesis, and the hypothesis is disconfirmed in a systematic empirical study, then the theory has been weakened.
Confirming a hypothesis can strengthen a theory, but it can never prove the theory, because a disconfirming case can never be ruled out.
Disconfirming a hypothesis, on the other hand, could disprove the theory it was derived from. However, it could also mean that some unstated but relatively minor assumption of the theory was not met.
Statistics are probabilistic in nature, and all research studies have flaws. For these reasons there is no such thing as scientific proof, only scientific evidence.
“Absence of evidence is not evidence of absence”

Lecture 3
Summarizing Data

Levels of Measurement
○ nominal
○ ordinal
○ interval
○ ratio

Nominal Data
categorical data with no implicit ordering
○ e.g., sex, animals
cannot be added, subtracted, multiplied or divided
can be summarized using the mode only
○ For example: 25 animals (10 dogs, 15 cats)
○ If dogs are category 1 and cats are category 2, is the average 1.6? No
○ Are there more cats than dogs? Yes

Ordinal Data
categorical data with implicit (or explicit) ordering
○ e.g., positions in a race
unequal distance between points
cannot be added, subtracted, multiplied or divided
can be summarized with the median or mode

Interval Data
continuous (equal distance between points)
○ e.g., temperature in Celsius
no meaningful zero
can be added or subtracted
cannot be multiplied or divided
can be summarized with the mean, median, or mode

Ratio Data
continuous (equal distance between points)
meaningful zero
○ e.g., temperature in Kelvin
can be added, subtracted, multiplied and divided
can be summarized with the mean, median, or mode
○ Bank account (when negative, on a different scale, overlapping at 0)

In practical terms…
imagine that it was 10 °C yesterday, and 20 °C today
○ is it twice as hot today?
○ no: 10 °C = 283 K and 20 °C = 293 K, so on the ratio (Kelvin) scale today is only about 3.5% warmer

Ordinal Data versus Interval Data
you might be able to treat ordinal data as interval if:
○ you are aggregating multiple items
○ the underlying construct is continuous
○ the measurement instrument is reliable

But…think before you collect
it is still common to see continuous variables (e.g., “age”) collected as categorical variables
this produces a number of problems:
○ categories are often arbitrary
○ results in a significant loss of information
○ presents fewer analytic choices, both descriptive and inferential
think about your analysis BEFORE you collect!!!

Central Tendency
Mean
○ the arithmetic average of the data
Median
○ the point that divides the data in half
○ 50th percentile
Mode
○ the most frequently occurring value

Mean
total all the results and divide by the number of units, or “n”, of the sample
Median
the exact middle score in a data set
list all scores in numerical order, and then locate the score in the center of the sample
○ resistant to outliers (better for exams)
Mode
the most repeated score in the set of results
in the example data set, 15 is the most repeated score and is labeled the mode
if you have a “tie” for most repeated score, you will have more than one mode (bimodal or multimodal)

Normality and Central Tendency
if the distribution is normal (i.e., bell-shaped), the mean, median and mode are all equal

Dispersion
range
○ good for an intuitive description of the minimum and maximum values in a data set
standard deviation
○ a more accurate/detailed description of dispersion that takes “outliers” into account
coefficient of variation
○ a useful way of comparing standard deviations across populations with different means or units

Range
the range is the difference between the highest and lowest scores within a variable

Standard Deviation
a value that shows the relation that individual scores have to the mean of the sample
if scores are said to be standardized to a normal curve, then there are several statistical techniques that can be used to analyze the data set

Standard Deviation (Conceptual Formula)
SD is calculated as the square root of the sum of the squared deviations from the mean divided by the number of scores:
$\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$
we represent the population value with the Greek character ‘sigma’ (σ), and the sample value with the letter ‘s’
we generally compute standard deviation with the following computational formula:
$s = \sqrt{\dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$

Computational Example of Standard Deviation
Different equations give slightly different results, but the difference is small enough to ignore

Coefficient of Variation
the standard deviation of a measure depends on its scale (i.e., the magnitude of the values within the data)
if you need to compare the dispersion of two different scales (or two different samples), it is worthwhile to make the comparison scaleless
the coefficient of variation facilitates this type of comparison:
$CV = \dfrac{s}{\bar{X}} \times 100\%$

Computational Example of Coefficient of Variation
Different equations come to the same result
the CVs can be compared directly, without concern for scale

Distributional Shape
Normal Distribution
sometimes called a “bell curve”
upper and lower halves are perfectly symmetrical
the most common normal distribution is the standard normal distribution (the distribution of standard scores)
○ mean, median, and mode of 0 (perfect bell curve)
○ standard deviation of 1
Only the case in standard normal distributions!
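The two standard deviation formulas and the coefficient of variation above can be checked with a short Python sketch. The function names are hypothetical, and the computational version assumes the common n − 1 (sample) form. The example data are also made up: the same three heights expressed in centimetres and in metres. These should give the same CV even though their SDs differ by a factor of 100.

```python
import math


def sd_conceptual(data, sample=True):
    """Square root of the summed squared deviations from the mean,
    divided by n - 1 for a sample (s) or by N for a population (sigma)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    return math.sqrt(ss / (n - 1 if sample else n))


def sd_computational(data):
    """Raw-score form of the sample SD: sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


def coefficient_of_variation(data):
    """CV = s / mean: a unitless measure of dispersion (multiply by 100 for %)."""
    return sd_conceptual(data) / (sum(data) / len(data))


heights_cm = [160, 170, 180]    # hypothetical heights
heights_m = [1.60, 1.70, 1.80]  # the same heights in metres

print(sd_conceptual(heights_cm), sd_computational(heights_cm))  # same value, two formulas
print(sd_conceptual(heights_cm, sample=False))  # dividing by N gives a slightly smaller SD
print(coefficient_of_variation(heights_cm), coefficient_of_variation(heights_m))
```

The computational form avoids computing each deviation separately, which was convenient by hand. The conceptual form is the one that shows what the SD actually measures.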
The Empirical Rule
For a normal distribution:
68% of the data falls within 1 SD of the mean
95% of the data falls within 2 SD of the mean
99.7% of the data falls within 3 SD of the mean
○ P-values are not important for clinical value (only statistical)

Skewness
a measure of the asymmetry of the distribution
○ the extent to which one “tail” is longer than the other
positive skew: right tail longer
negative skew: left tail longer
generally…
○ negative skew: median > mean
○ positive skew: mean > median

Kurtosis
skewness measures the tails of the distribution, and kurtosis measures the peaks
○ kurtotic distributions have non-normal “peaks”
platykurtotic (“flat”; highly negative kurtosis)
leptokurtotic (“pointed”; highly positive kurtosis)
mesokurtotic (“no” kurtosis; ‘normal’ distribution)

Outliers
outliers are values that fall substantially outside the range of most other values in the data
○ they skew distributions and disrupt inferential statistics
identifying outliers
○ recall that the empirical rule states that 99.7% of the data will fall within 3 SD of the mean
  values that are >3 SD away from the mean may be outliers
○ graphical methods (e.g., box plot)

Graphical Summaries
Graphical Summaries of Data
○ bar graphs and histograms
○ line graphs
○ box plots

Bar Graph vs. Histogram
a histogram compares multiple measurements of the same variable
○ e.g., describing the age range in a sample
a bar graph compares multiple variables
○ e.g., the relative frequency of test usage within a group of practitioners

Stem and Leaf
stem-and-leaf plot (Tukey)
basically constructed as a vertical histogram
also shows raw data, and gives a rough idea of dispersion

Line Graphs
often used to convey temporal information
○ the “line” in a line graph suggests that there is a continuity of the variable between points on the curve
should not be used for discrete variables!
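The empirical rule, the mean-versus-median rule of thumb for skew, and the >3 SD outlier screen described in this lecture can be checked numerically. This is a minimal sketch using Python's `statistics` module. The data (twenty values near 50 plus one value of 95) are invented for illustration, and `flag_3sd` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, median, stdev

# Empirical rule: share of a normal distribution within k SDs of the mean.
std_normal = NormalDist(mu=0, sigma=1)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")  # about 68%, 95%, 99.7%

# Hypothetical sample: twenty values near 50 plus one unusually high value.
base = [45, 47, 48, 49, 50, 50, 51, 52, 53, 55]
data = base * 2 + [95]

# Rule of thumb: mean > median suggests positive (right) skew.
skew_direction = "positive" if mean(data) > median(data) else "negative or none"


def flag_3sd(values):
    """Return values more than 3 sample SDs from the mean (possible outliers)."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs(x - m) / s > 3]


print(skew_direction, flag_3sd(data))
```

With very small samples, a single extreme value inflates the SD so much that it may never exceed 3 SD. This is one reason the notes also list graphical methods such as the box plot.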
Box Plots
○ 75% of the data falls below QU (the upper quartile, Q3)
○ 25% of the data falls below QL (the lower quartile, Q1)
○ IQR = QU − QL

Identifying Outliers
“outliers”: > 1.5 × IQR away from Q1 or Q3
○ upper inner fence: Q3 + 1.5 × IQR
○ lower inner fence: Q1 − 1.5 × IQR
“extreme outliers”: > 3.0 × IQR away from Q1 or Q3
○ upper outer fence: Q3 + 3.0 × IQR
○ lower outer fence: Q1 − 3.0 × IQR

Lecture 4
Standard Scores
it is often useful to describe data points on different measures in terms of a common scale
○ e.g., if you have two different measures of depression and want to compare an individual’s scores on the two measures
the easiest way to compare scores on a common scale is to use standard scores
○ most commonly, a z score or a T score

The z-score
a z-score is a standard measure of the distance between a single point in the data (e.g., an individual’s score) and the overall mean for that variable:
$z = \dfrac{X - \mu}{\sigma}$
the z-distribution:
○ ranges from negative infinity to positive infinity
  for any observed (finite) value of the variable, z is also finite
○ has a mean of 0
○ has a standard deviation of 1

Example
You are working with a group of patients with Parkinson’s disease and want to collect baseline data on their postural stability during the performance of everyday tasks. You use two different balance assessments. Mr. Smith scores 5.6 on the first scale and 4.9 on the second scale. Standardization information for these measures indicates that the population means for the two measures are 5.3 and 5.0, respectively, and the population standard deviations are 0.8 and 0.2, respectively. Is Mr. Smith’s performance markedly different on these two measures?

Standard Score on Scale #1
we are interested in standardizing the distance between Mr. Smith’s score and the variable mean
○ z-scores may be thought of as “standard deviation units”
$z_1 = \dfrac{5.6 - 5.3}{0.8} = 0.375$

Standard Score on Scale #2
we are interested in standardizing the distance between Mr. Smith’s score and the variable mean
○ z-scores may be thought of as “standard deviation units”
$z_2 = \dfrac{4.9 - 5.0}{0.2} = -0.5$

The T score
a T score is another way of standardizing scores
○ by convention, it is a standard distribution with a mean of 50 and an SD of 10: $T = 10z + 50$
○ it is a standard score (like z), but without negative values
so…
○ Mr. Smith’s T score on Scale #1 was: 10(0.375) + 50 = 53.75
○ Mr. Smith’s T score on Scale #2 was: 10(−0.5) + 50 = 45.00
Sidebar (scale 1):
○ this approach can be used to rescale any variable…
○ e.g., IQ = 15z + 100

Percentiles
we can also express z-scores as percentiles
○ a percentile refers to the proportion scoring less than a particular value
○ e.g., 75% of the population scores below the 75th percentile
percentiles are obtained from a z-table

Mr. Smith had a z-score of 0.375 on the first scale
the area between z = 0 and z = 0.375 is about 0.1462 (between the table values 0.1443 for z = 0.37 and 0.1480 for z = 0.38)
the area below z = 0.375 is 0.5 + 0.1462 = 0.6462
the area above z = 0.375 is 0.5 − 0.1462 = 0.3538
Mr. Smith is, therefore, at the 65th percentile on the first scale

Mr. Smith had a z-score of −0.5 on the second scale
the area between z = 0 and z = −0.5 is 0.1915
the area below z = −0.5 is 0.5 − 0.1915 = 0.3085
the area above z = −0.5 is 0.5 + 0.1915 = 0.6915
Mr. Smith is, therefore, at the 31st percentile on the second scale

[Figure: z-table]

Hypothesis Testing
[Figures: population distribution, sample distribution, the sampling distribution]

Hypothesis Statements
research hypothesis (“alternative hypothesis”)
○ the researcher’s true expectation of the results
null hypothesis (“status quo”)
○ a comparison statement for the research hypothesis
○ typically involves the assumption that nothing has happened, no relationship exists, or no change has occurred
○ can also be used to specify a particular threshold for an event
  this is particularly important for demonstrating meaningful results (rather than simply significant results)

Rejecting the Null Hypothesis
the proposed effect is demonstrated when the null hypothesis is “rejected”
○ if you “reject” the null hypothesis, you may conclude that the alternative hypothesis (your research hypothesis) is likely to be correct

Decision Matrix
Type I error is an incorrect rejection of H0
Type II error is an incorrect failure to reject H0
Power is the probability of correctly rejecting H0

Null Hypothesis Significance Testing
“Do children watch more television today than they did in 1998?”

Directional Hypothesis
In a directional hypothesis, there is a specific result that one wants to test: the change must occur in the predicted “direction” relative to the mean.
Directional Hypothesis (upper-tailed test)
“Do children watch more television today than they did in 1998?”
Directional Hypothesis (lower-tailed test)
“Do children watch less television today than they did in 1998?”
Non-directional Hypothesis (two-tailed test)
In a non-directional hypothesis, one proposes that change might occur in either direction. This is a two-tailed test.
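Python's `statistics.NormalDist` can stand in for a printed z-table. The sketch below reproduces Mr. Smith's z-scores, T scores and percentiles, along with the critical values for upper-, lower- and two-tailed tests. The helper names are hypothetical and α = .05 is assumed for the critical values. The z = 3.08 used for the p-value comes from the television example in these notes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, SD 1


def z_score(x, mu, sigma):
    """Distance of x from the mean in standard deviation units."""
    return (x - mu) / sigma


def t_score(z):
    """T score convention: mean 50, SD 10."""
    return 10 * z + 50


# Mr. Smith's scores, population means and SDs from the example.
for label, x, mu, sigma in [("Scale #1", 5.6, 5.3, 0.8), ("Scale #2", 4.9, 5.0, 0.2)]:
    z = z_score(x, mu, sigma)
    percentile = Z.cdf(z) * 100  # area below z, as a percentage
    print(f"{label}: z = {z:.3f}, T = {t_score(z):.2f}, percentile = {percentile:.0f}")

# Critical z values for alpha = .05 (assumed).
alpha = 0.05
upper_tailed = Z.inv_cdf(1 - alpha)    # about 1.645
lower_tailed = Z.inv_cdf(alpha)        # about -1.645
two_tailed = Z.inv_cdf(1 - alpha / 2)  # about 1.96 (reject if |z| exceeds it)

# One-tailed (upper) p-value for an observed z of 3.08.
p_upper = 1 - Z.cdf(3.08)
print(upper_tailed, lower_tailed, two_tailed, p_upper)
```

Here `cdf` gives the area below a z value (the "0.5 + table area" step above), and `inv_cdf` runs the table lookup in reverse to find a critical value.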
Statistical Inference
statistical decision-making requires that you determine how likely it is that a result could have occurred by chance
Establishing Rejection Regions
Where did the 0.475 come from?!
○ when α = .05 is split equally between two tails, each tail holds 0.025 of the area, so the area between the mean and each critical value is 0.50 − 0.025 = 0.475; the z-table gives z = 1.96 for that area
Conclusion
because our sample mean fell within the rejection region, we:
○ reject the null hypothesis
○ conclude that the alternative hypothesis is likely to be correct
thus, we conclude that children are watching a significantly greater amount of television than they watched in 1998
Calculating z
z = (x̄ − μ) / (σ / √n), where σ / √n is the standard error of the mean
thus, we would report that our sample suggests that children watch significantly more television than they did in 1998, z = 3.08, p < 0.05

Lecture 5 Significance Testing
Significance testing
Based on the statistical properties of sample data, we can estimate how likely the observed differences or relationships would be if no such effect existed in the target population.
We are assuming that the sample data are representative and that the data meet the assumptions of the inferential test.
History of significance testing
Developed by Ronald Fisher (1920s-1930s)
To determine which agricultural methods yielded greater output
○ Were variations in output between two plots attributable to chance or not?
Agricultural research designs couldn't be fully experimental because natural variations such as weather and soil quality couldn't be fully controlled.
○ Therefore, it was necessary to determine whether variations in the DV were due to the IV(s) or to chance.
Criticisms of significance testing
The null hypothesis is rarely true.
Significance testing (ST) provides:
○ a binary decision (yes or no) and
○ the direction of the effect
But mostly we are interested in the size of the effect
○ i.e., how much of an effect?
Statistical vs. practical significance
Statistical significance simply means that the observed effect (relationship or difference) is unlikely to be due to sampling error.
Statistical significance can be evident for very small (trivial) effects if N and/or critical alpha are large enough.
Practical significance
○ Is the difference large enough to be of value in a real-world sense?
Is an effect worth being concerned about?
Is the effect noticeable or worthwhile?
e.g., a 5% increase in well-being probably starts to have practical value
A&P
Statistical significance α
Level of significance aka Alpha aka α
○ This is the cut-off probability, set in advance, for deciding that results are unlikely to be due to chance alone (commonly .05).
○ The smaller this value is, the more "unusual" the results must be before H0 is rejected, indicating, for example, that the sample likely comes from a different population than the one it is being compared to.
If the p-value falls below the significance level, we say that the results of the test are statistically significant.
If p > α then FAIL TO REJECT the null hypothesis.
If p < α then REJECT the null hypothesis.
Alpha also represents your chance of making a Type I Error.
○ The chance that you reject the null hypothesis when in reality it is true (you should have failed to reject it).
○ The sample data indicate that there is a difference when in reality there is not.
○ False positive.
Power (1 − β)
Power refers to your study's ability to find a difference if there is one, i.e., the probability of correctly rejecting a false H0.
The greater the power, the more meaningful your results are.
Beta (β) = 1 − Power.
Beta represents the chance of making a Type II Error.
○ You incorrectly fail to reject the null.
○ The data indicate that there is not a significant difference when in reality there is.
○ Your study failed to capture a real effect.
○ False negative.
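The rejection-region logic, the z calculation and the p versus α decision rule can be sketched together as a one-sample z-test. This is a minimal illustration, assuming the population mean and standard deviation are known; the function name `one_sample_z_test` and the input numbers (weekly hours of television) are hypothetical and are not the lecture's data. It computes z = (x̄ − μ) / (σ / √n), converts z into a p-value for the chosen tail, and rejects H0 when p < α.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05, tail="two"):
    """z-test of a sample mean against a known population mean and SD."""
    standard_error = pop_sd / sqrt(n)  # SD of the sampling distribution of the mean
    z = (sample_mean - pop_mean) / standard_error
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    if tail == "upper":
        p = 1 - phi(z)  # area beyond z in the right-hand tail
    elif tail == "lower":
        p = phi(z)  # area beyond z in the left-hand tail
    elif tail == "two":
        p = 2 * (1 - phi(abs(z)))  # area beyond |z| in both tails
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision


# Hypothetical numbers: H0 mean of 20 h/week, SD 5, sample of 36 with mean 22
z, p, decision = one_sample_z_test(22, 20, 5, 36, tail="upper")
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")
```

With these made-up inputs the result would be reported in the same form as the lecture example (z, then p relative to α). Switching `tail` to `"two"` doubles the p-value for the same z, which is why the choice between a directional and a non-directional hypothesis should be made before the data are examined.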
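Power, β, sample size, α and effect size can also be related numerically. As a minimal sketch, assuming an upper-tailed one-sample z-test with known σ: under HA the z statistic is centred at d·√n instead of 0, where d is the standardized effect (a one-sample analogue of Cohen's d), so power = 1 − Φ(z_crit − d·√n) and β = 1 − power. The function `z_test_power` and the parameter values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """A priori power of an upper-tailed one-sample z-test.

    effect_size: standardized difference d = (true mean - H0 mean) / sigma
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # edge of the rejection region
    shift = effect_size * sqrt(n)  # centre of the z distribution under HA
    # Power = probability that z lands beyond z_crit when HA is true
    return 1 - std_normal.cdf(z_crit - shift)


# Baseline, then increase N, alpha, and effect size one at a time
for d, n, alpha in [(0.5, 20, 0.05), (0.5, 40, 0.05), (0.5, 20, 0.10), (0.8, 20, 0.05)]:
    power = z_test_power(d, n, alpha)
    beta = 1 - power  # Type II error rate
    print(f"d={d}, N={n}, alpha={alpha}: power={power:.2f}, beta={beta:.2f}")
```

With d = 0 the power equals α, because the only rejections left are Type I errors. Raising N, α, or d each raises power, which is why an a priori power calculation needs an estimated N, a critical α and an expected or minimum effect size.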
Desirable power > .80
Typical power ~ .60
Power becomes higher when any of these increase:
○ Sample size (N)
○ Critical alpha (α)
○ Effect size (∆)
Calculate expected power before conducting a study (a priori), based on:
○ Estimated N
○ Critical α
○ Expected or minimum effect size (e.g., from related research)
Effect Size
A measure of the strength (or size) of a relationship or effect.
An inferential test may be statistically significant (i.e., the result is unlikely to have occurred by chance), but this doesn't indicate how large the effect is (the effect might be trivial).
On the other hand, there may be nonsignificant but notable effects (esp. in low-powered tests).
Unlike significance testing, effect sizes are not influenced by N.
A common effect size statistic is Cohen's d
○ Equal to the mean difference divided by the standard deviation (typically the pooled standard deviation of the two groups)
When using Cohen's d, effect sizes are conventionally interpreted as follows:
○ d = 0.2, small effect
○ d = 0.5, medium effect
○ d = 0.8, large effect
p-Value
The probability of observing a sample statistic at least as extreme as the one actually observed (in the direction of HA), given that H0 is true.
The p-value is often described as the probability of a Type I error, but that probability is α, which is fixed before the data are collected. A Type I error is rejecting a correct (true) null hypothesis.
Small p-value:
○ Such an event is highly unlikely if H0 is true
○ Casts doubt upon the validity of H0
○ A small enough p-value gives us reason to reject H0 and supports HA
Because it is calculated by assuming H0 is true, the p-value cannot tell us exactly how likely we are to be making a Type I error when we reject H0.
For p-values, smaller is better (stronger evidence against H0, in support of the alternative hypothesis).
In plain English
The p-value is how often we would expect to see a result at least as extreme as ours by chance alone, if the null hypothesis were true.
P-Value Examples
a p-value of .01 means that, if H0 were true, a result at least this extreme would occur 1% of the time.
a p-value of .04 means that, if H0 were true, a result at least this extreme would occur 4% of the time.
a p-value of .10 means that, if H0 were true, a result at least this extreme would occur 10% of the time.
P-Value Size
How small does the p-value have to be to infer that HA is true?
P-value between 0 and 0.01 implies overwhelming evidence
P-value between 0.01 and 0.05 implies strong evidence
P-value between 0.05 and 0.10 implies weak evidence
P-value greater than 0.10 means no evidence in favor of HA
α and p
Think of α as the largest p-value at which you are still willing to reject H0.
1. Set the α level before you calculate the p-value
2. Calculate the p-value
3. Compare the p-value to the α level
- If p > α then FAIL TO REJECT the null hypothesis.
- If p < α then REJECT the null hypothesis.
Reporting p-values
p-values can be reported in one of two ways:
○ Actual p-values P=0.04
○ Statement of inequality P