Research Methods Summary (Chapters 1-12)

Student UvA
RESEARCH METHODS: The essential knowledge base
William J. Trochim, James P. Donnelly and Kanika Arora, 2nd edition, 2016. Chapters 1-12
Pre-master Business Administration, Universiteit van Amsterdam
This document is protected by copyright; distributing it is a punishable offence.
Downloaded from Knoowy - Upload your own documents and earn money

Chapter 1 – Foundations of research methods

1.1 The research enterprise

Research: a type of systematic investigation that is empirical in nature and is designed to contribute to public knowledge.
Research enterprise: the macro-level effort to accumulate knowledge across multiple empirical, systematic, public research projects.

How is knowledge from research different from knowledge derived from experience, experts, intuition, or trial and error? We try to connect all our ideas with observations and pit our ideas of reality against reality itself. Within the social sciences we can know nothing for sure. There is no proof for a theory: proof is a mathematical concept, and you cannot show that a certain individual would always behave in a certain way. Instead, we try to identify trends. We constantly face the problem of how to apply the results of a small study more widely, because we can never study the entire population. Researchers are often not explicit about the population. Findings cannot simply be generalized; for example, results from a study of employees in the Netherlands would not necessarily hold for employees in another country. All research is limited, and we can know nothing for sure.
Research is:
- Purposeful;
- Systematic;
- An investigation;
- Empirical;
- A public effort.

Nature of business and management research:
- Transdisciplinary
- Double hurdle (theoretical and practical impact)
- Science-practice gap / translational research
- Evidence-based management

Translational research: the systematic effort to move research from initial discovery to practice and ultimately to impacts on our lives (from bench to behavior, brain to vein).
Research-practice continuum: the process of moving from an initial research idea or discovery to practice, and the potential for the idea to influence our lives or world.
- Basic research: research that is designed to generate discoveries and to understand how the discoveries work.
- Applied research: research where a discovery is tested under increasingly controlled conditions in real-world contexts.
  Difference between basic and applied research: basic (fundamental) research is conducted to satisfy curiosity; applied research focuses on finding answers to real-life problems in organizations.
- Implementation and dissemination research: research that assesses how well an innovation or discovery can be distributed in and carried out in a broad range of contexts that extend beyond the original controlled studies.
- Impact research: assesses the broader effects of a discovery or innovation on society.
- Policy research: designed to investigate existing policies or develop and test new ones.

It is expected that systems for synthesizing large numbers of research studies will become the normative way that research about new discoveries moves from the basic-applied stage to implementation and dissemination in broader contexts.
Research synthesis: a systematic study of multiple prior research projects that address the same research question or topic, summarizing the results in a manner that can be used by practitioners.
Meta-analysis: uses statistical methods to combine the results of similar studies quantitatively in order to allow general conclusions to be made.
Systematic review: focuses on a specific question or issue and uses specific preplanned methods to identify, select, assess, and summarize the findings of multiple research studies. Both yield stronger conclusions than a single study.

Even meta-analyses and systematic reviews are sometimes not sufficient by themselves to be used by practitioners as guides for how they might change what they implement. Guidelines help address this problem.
Guideline: the result of a systematic process that leads to a specific set of research-based recommendations for practice, usually including some estimate of how strong the evidence is for each recommendation. See the figure above, which includes guidelines.
Evidence-based practice (EBP): a movement designed to encourage or require practitioners to employ practices that are based on research evidence as reflected in research syntheses or practice guidelines. It aims at better integration of research and practice, and appeared first in medicine (1997).

As society has become more aware of the dominance of research, we have come to think differently about the research enterprise: as an evolutionary system. This view is based on evolutionary epistemology: a branch of philosophy that holds that ideas evolve through the process of natural selection.

1.2 Conceptualizing research

One of the most difficult aspects is how to develop the idea for a research project. Sources of ideas:
- Practical problems in the field;
- Literature in your specific field.
  Requests for proposals (RFPs): a document issued by a government agency or other organization that typically describes the problem that needs addressing, the contexts in which it operates, the approach the agency would like you to take to investigate the problem, and the amount the agency would be willing to pay for such research.
- Researchers thinking up a research topic on their own.
These ideas are influenced by the researcher's background, culture, etc.

One of the most important early steps in a research project is conducting the literature review: a systematic compilation and written summary of all of the literature published in scientific journals that is related to a research topic of interest. A literature review is typically included in the introduction section of a research write-up.
- Concentrate your efforts on the research literature; determine what the most credible research journals are and start there.
  Peer review: a system for reviewing potential research publications in which authors submit potential articles to a journal editor, who solicits several reviewers who agree to give a critical review of the paper. The paper is sent to these reviewers with no identification of the author so that there will be no personal bias.
- Do the review early in the research process.
- Begin to think about whether the study is feasible at all. Considerations involve trade-offs between rigor and practicality.
- Think about how long the research will take.
- Think about ethical constraints.
- Determine whether you can acquire the cooperation needed to take the project to its successful conclusion.
- Determine the degree to which costs will be manageable.

1.3 The language of research

Terms that help describe key aspects of contemporary social research:
Theoretical: pertaining to theory. Social research is theoretical, meaning that much of it is concerned with developing, exploring, or testing the theories or ideas that social researchers have about how the world operates.
Empirical: based on direct observations and measurements of reality.
Probabilistic: based on probabilities.
Causal: pertaining to a cause-effect relationship or hypothesis. Something is causal if it leads to an outcome or makes an outcome happen.
Causal relationship: a cause-effect relationship, for example when you evaluate whether your treatment or program causes an outcome to occur.
Causality is the 'holy grail' of research: it concerns relationships between variables.
Remember: a variable in a data set that takes only one value is meaningless; a variable must vary. A variable has attributes: the specific values that it can take on. For gender these are female/male.

Types of studies
1. Descriptive studies: document what is going on or what exists.
2. Relational studies: investigate the connection between two or more variables.
3. Causal studies: investigate a causal relationship between two variables.

Time in research
Cross-sectional study: takes place at a single point in time.
Longitudinal study: takes place over time.
Repeated measures: two or more waves of measurement over time.
Time series: many waves of measurement over time.

Types and patterns of relationships
2 types of relationships:
- Correlational relationship: two things perform in a synchronized manner. This does not tell whether one causes the other.
- Causal relationship: a synchronized relationship between two variables, just like a correlational relationship, but one variable causes the other to occur.
  o Third variable/missing variable problem: an unobserved variable that accounts for a correlation between two variables.

Patterns of relationships
- No relationship at all.
- Positive relationship: high values on one variable are associated with high values on another variable, and low values with low values on the other variable.
- Negative relationship: high values on one variable are associated with low values on another variable.
- Curvilinear relationship: for example, medicine dosage. As dosage rises, severity of illness goes down, but at some point the patient begins to experience negative side effects and the severity of illness increases again.

Hypotheses
Hypothesis: a specific statement of prediction.
Alternative hypothesis: a specific statement of prediction stating what you expect will happen in your study. Also called the research hypothesis.
Null hypothesis: a specific statement that predicts there will be no effect of the program or treatment you are studying.
One-tailed hypothesis: a hypothesis that specifies a direction, for example when your hypothesis predicts that your program will increase the outcome.
Two-tailed hypothesis: a hypothesis that does not specify a direction, for example when your hypothesis is that your program will have an effect on an outcome, without specifying whether it will be positive or negative.

The logic of hypothesis testing is based on two basic principles (Marriott, 1990):
- Two mutually exclusive hypothesis statements that, together, exhaust all possible outcomes need to be developed.
- The hypotheses must be tested so that one is necessarily accepted and the other rejected.
Hypothetico-deductive model: a model in which two mutually exclusive hypotheses that together exhaust all possible outcomes are tested, such that if one hypothesis is accepted, the second must be rejected.

Variables
Variable: any entity that can take on different values.
Quantitative: a numerical representation of some object.
Qualitative: data in which variables are not in numerical form, but in the form of text or photographs.
Attribute: a specific value of a variable. Sex has two attributes: female and male. Agreement can have five (on a Likert scale).
Each variable's attributes should be:
- Exhaustive: property of a variable that occurs when you include all possible answerable responses.
- Mutually exclusive: ensures that a respondent is not able to assign two attributes simultaneously.

The independent variable is the presumed cause of something: the variable that you manipulate. The dependent variable is affected by the independent variable, so it is the effect. In a graph, the dependent variable is on the y-axis and the independent variable on the x-axis. The independent variable causes the dependent variable, although it might be that a third variable is involved.
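
The third-variable possibility noted above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variables x, y and z are hypothetical, the data are randomly generated rather than taken from any study, and partial correlation is used here as one simple way of "holding z fixed"; the summarized book does not prescribe this method. In the simulated data, x and y both depend on z and never on each other, yet they correlate strongly until z is controlled for.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation of x and y after controlling for z (first-order partial correlation)."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the illustration is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]    # hypothetical unobserved third variable
x = [zi + rng.gauss(0, 0.5) for zi in z]   # x depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]   # y depends on z only, never on x

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(x, y, z)
print(f"correlation(x, y)             = {r_xy:.2f}")
print(f"correlation(x, y) given z     = {r_xy_given_z:.2f}")
```

With these assumed noise levels the raw correlation should come out strongly positive while the partial correlation should be close to zero, which is the pattern the third-variable problem predicts: a correlation between x and y alone is not evidence that one causes the other.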
Unit of analysis: the entity that you are analyzing in your analysis: individuals, groups, geographical units (town, state), social interactions (relations, divorces), or artifacts (books, photos).
Hierarchical modeling: a statistical model that allows for the inclusion of data at different levels, where the unit of analysis at some levels is nested within the unit of analysis at others.

Inductive vs. deductive reasoning
Inductive reasoning is associated with qualitative research, deductive reasoning with quantitative research.
Inductive: bottom-up reasoning that begins with specific observations and measures and ends up as a general conclusion or theory.
Observation → Pattern → Tentative hypothesis → Theory
Deductive: top-down reasoning that works from the more general to the more specific.
Theory → Hypothesis → Observation → Confirmation

1.4 The structure of research

Hourglass model: the model is useful because it can be used as a review guide. Look at the recommendations for future research on a specific topic; these could be useful for a thesis. The research proposal covers the upper part of the model.
Research question: the central issue being addressed in the study, typically phrased in the language of theory.
Operationalization: the act of translating a construct into its manifestation: translating the idea of your treatment or program into the actual program, or translating the idea of what you want to measure into the real measure. The actual program is described as the operationalized program.

Major components in a causal study:
- The research problem
- The research question
- The program (cause)
- The units
- The outcomes (effect)
- The design

The importance of theory: good theory includes a plausible, coherent explanation for why a certain cause-effect relationship should be expected, that is, why we would observe it.
Model of Binning & Barrett (1989): relationship between predictor and outcome (empirical realm).

1.5 The validity of research

Validity: the best available approximation of the truth of a given proposition, inference or conclusion.
A construct is an idea, for example intelligence. We can theorize about how it relates to other constructs. Constructs are not directly observable, so we define measures to observe them.
Cause construct: the abstract idea or theory of what the cause is in a cause-effect relationship you are investigating.
Effect construct: the abstract idea or theory of what the outcome is in a cause-effect relationship you are investigating.

Types of validity:
- Conclusion validity: the degree to which conclusions you reach about relationships in your data are reasonable.
- Internal validity: the approximate truth of inferences regarding cause-effect or causal relationships.
- Construct validity: the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations are based.
- External validity: the degree to which the conclusions in your study would hold for other persons in other places and at other times.
Threats to validity: reasons your conclusion or inference might be wrong.

Chapter 2 – Ethics

2.1 Foundations of ethics in research

Conflict of interest: exists in research when a researcher's primary interest in the integrity of a study is compromised by a secondary interest such as personal gain (for example financial profit).

2.2 Historical cases of unethical research

- Nazi experimentation in WWII
  o Nuremberg code: developed following the trial of Nazi doctors after World War II. It contains 10 principles to guide research involving human subjects and is extremely important as a reference point for all regulations related to the protection of human subjects.
- Stanley Milgram's obedience studies: studied the conflict between obedience toward authority and one's personal conscience.
- The Thalidomide tragedy: involved the occurrence of very serious birth defects in children of pregnant women who had been given Thalidomide as a sedative. The drug's side effects should have been known and available to doctors and patients, but were not until much harm had been done.
- Kefauver-Harris Amendments: after the Thalidomide tragedy, these amendments to the Food, Drug and Cosmetic Act were passed to ensure greater drug safety. For the first time, drug manufacturers were legally required to present evidence on the safety and effectiveness of their products to the FDA before marketing them.
- Declaration of Helsinki: the World Medical Association adopted this declaration in 1964 in order to provide a set of principles to guide the practice of medical research.
- The Tuskegee Syphilis Study: a 40-year observational study of the impact of untreated syphilis on men. The participants were low-income African American men who were led to believe they were receiving treatment when in fact they were not. It was a major stimulus for the Belmont Report (below).

2.3 Evolution of a modern system of research ethics

Until the 1970s no comprehensive guidelines existed in the US on the question of what constitutes ethical behavior in research involving human subjects. The first serious attempt at developing a broad ethical framework was triggered by the revelation of the Tuskegee Syphilis Study in 1972.
- National Research Act: passed by the US Congress in 1974. It created a national commission to develop guidelines for human subjects research and to oversee and regulate the use of human experimentation in medicine.
The Belmont Report: includes basic standards that should underlie the conduct of any biomedical and behavioral research involving human participants. It emphasizes universal principles that are unbounded by time or technology.
Three core principles:
- Respect for persons: people are to be treated as independent and autonomous individuals.
  o Vulnerable populations: those who may not be fully in control of their decision making.
  o Assent: a child has affirmatively agreed to participate.
  o Informed consent: informing study participants about procedures and risks.
  o Voluntary participation
- Beneficence: the expected impact on a person's well-being that may result from participation in research.
- Justice: participation in research should be based on fairness and not on circumstances that give researchers access to or control of a population based on status.

Related guidelines on human subject participation:
- Privacy
  o Anonymity
  o Confidentiality
  o De-identification
- Deception and debriefing
  o Deception: the use of false or misleading information
  o Debriefing: the process of providing participants with full information
- Right to service: the right to the best available services

The National Research Act mandated the establishment of Institutional Review Boards (IRBs): panels of people who review research proposals with respect to ethical implications and decide whether additional actions need to be taken to assure the safety and rights of participants. The proposal for research is submitted in the form of a protocol that includes specific information about the study background and goals as well as details on all aspects of care of the participants.

Ethics in clinical research
Phase I study: a research study designed to test a new drug or treatment in a small group of people for the first time to evaluate its safety, determine a safe dosage range, and identify potential side effects.
Phase II study: a research study designed to test a drug or treatment in a larger group of people than in a Phase I study, to see if it is effective and to further evaluate its safety.
Phase III study: a research study designed to test a drug or treatment in large groups of people using a highly controlled research design, in order to confirm the intervention's effectiveness, monitor side effects, compare it to commonly used treatments, and collect information that will allow the drug or treatment to be used safely.

Ethics in research with animals
People for the Ethical Treatment of Animals (PETA) has taken a strong stand against the use and abuse of animals in research. Legal regulation of the treatment of animals is relatively limited and largely left to the oversight of IRBs and the research ethics codes of professional organizations such as the American Psychological Association (APA). The APA has published guidelines for the care of animals that include oversight by the local IACUC (Institutional Animal Care and Use Committee).

2.4 Ethics in the production and publication of scholarly work

A serious ethical compromise arises when researchers are dishonest in the way they conduct research or report their results. Examples are the cases of William Fals-Stewart and Jan Hendrik Schön.
Research misconduct: fabrication (making up data), falsification (manipulating data) or plagiarism (the use of another person's work without proper credit) in proposing, performing or reviewing research, or in reporting research results.
The meaning of significance can be corrupted by piecemeal and duplicate publication: an ethical issue in the dissemination of research referring to the possibility that, in order to maximize personal reward from publication, a researcher would essentially repackage the results of a single study in multiple articles.
Fairness in publication credit: authorship credit should be established on the basis of the quality and quantity of one's contributions to a study, rather than on status, power or any other factor.
Additional information from lecture (not in book)

Research Ethics Brainstorm
Ethics refers to the appropriateness of your behavior in relation to the rights of those who become the subjects of your work, or are affected by it.
Research ethics relates to questions about how we do the following in a moral and responsible way:
- Formulate and clarify our research topic
- Design our research and gain access (example: the AIDS topic, where people did not have access to the medicines)
- Collect data: be aware of the interests of the participants
- Process and store our data: anonymity (example: patient records)
- Analyze data: it is important to be transparent. Some researchers look at their data and, if it does not fit their hypotheses, use the data set for other research. This is highly unethical, but it happens.
- Write up our research findings: plagiarism, being transparent
Ethics pertains to the entire research process. Marketing is unethical as a discipline.

Two dominant philosophical standpoints:
- Deontological view: argues that the ends served by research can never justify the use of research which is unethical.
- Teleological view: argues that the ends served by your research may justify the means (weighing costs and benefits).

Milgram: with enough authority, people can be forced to do something. This shows how the Nazis could have been so powerful. It is a powerful lesson, but the study itself was not ethical because participants were hurt in the research. However, this knowledge might prevent the next such event.
Despite the fact that the data are tainted, current researchers might see value in them. The Nazi data are reliable and useful in current research. It is difficult to decide whether to use the data or not: 'We cannot reverse the history'. These are difficult topics, and ethics involves personal opinions. If I had died for this data, what would I want?
I would want people to use it, to build life vests or wetsuits that can save people's lives.

General ethical issues:
- Non-maleficence: avoid embarrassment, stress, discomfort, pain or harm to participants.
- Privacy of possible and actual participants: people need to be informed prior to the research, and they need to have the possibility to withdraw from the research at any time.
- Voluntary nature of participation and the right to withdraw partially or completely from the process.
- Consent (obtaining permission) and possible deception of participants: some participants have a disability or do not read everything, so you never know whether the respondent was really informed. People should be informed well enough to make a good decision.
- Maintenance of the confidentiality of the data provided.
- Behavior and objectivity of you as a researcher: not changing the data, etc.

Example of a consent statement: 'I have read the foregoing information, or it has been read to me. I have had the opportunity to ask questions about it and any questions that I have asked have been answered to my satisfaction. I consent voluntarily to participate as a participant in this research'.

The Pygmalion effect: research in which teachers were told that certain children were expected to become smarter. After one year, these randomly chosen children had actually become smarter than the other children. Is this ethical? Because researchers expect something, they probably treat children, animals, etc. differently. As a result, researchers try to distance themselves from the research. Compare the placebo effect.
Double-blind experiments: nobody is aware who is receiving the placebo or the real treatment until after the research.
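
The double-blind idea can be sketched in code. This is a minimal illustration under assumptions, not a procedure taken from the lecture or the book: the participant IDs, the `KIT-` code format and the function name `blind_allocation` are all hypothetical. Participants are randomly split into two arms, and the arm is hidden behind a neutral kit code; the mapping from code to arm is kept apart until the study ends.

```python
import random


def blind_allocation(participant_ids, seed=None):
    """Randomly split participants into two arms and hide the arm behind a kit code.

    Returns (blinded, sealed_key):
      blinded    maps participant -> kit code (all that staff and participants see)
      sealed_key maps kit code -> arm (held by an independent party until the study ends)
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # random order decides who lands in which arm
    half = len(ids) // 2
    arms = ["treatment"] * half + ["placebo"] * (len(ids) - half)
    # Kit codes are shuffled separately so the code itself reveals nothing about the arm.
    codes = [f"KIT-{i:03d}" for i in range(1, len(ids) + 1)]
    rng.shuffle(codes)
    blinded = {}
    sealed_key = {}
    for pid, arm, code in zip(ids, arms, codes):
        blinded[pid] = code
        sealed_key[code] = arm
    return blinded, sealed_key


participants = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical participant IDs
blinded, sealed_key = blind_allocation(participants, seed=7)

# During the study only `blinded` is used; the arm is looked up afterwards.
for pid in participants[:3]:
    print(pid, "receives", blinded[pid])
```

Keeping the two dictionaries separate is the point of the design: anyone working with `blinded` alone cannot tell treatment from placebo, which is what keeps expectations from influencing how participants are treated.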
Covert studies and the use of deception
- Reactivity can be circumvented by utilizing a covert study.
- Covert studies should always be followed by debriefing.
- Habituation (participants getting used to the researcher) may alleviate the perceived need to deceive.

Guidance on ethics
- Codes of ethics (look at these codes!)
  o American Psychological Association's Ethical Principles of Psychologists and Code of Conduct
  o British Psychological Society's Ethical Principles for Conducting Research with Human Participants
- Research Ethics Committees, a.k.a. Institutional Review Boards
  o More common in US-based universities
- Data protection legislation

Chapter 3 – Qualitative Approaches to research

3.1 Foundations of qualitative research

3.2 The context for qualitative research

Qualitative: a descriptive, nonnumerical characteristic of some subject.
Qualitative measures: data not recorded in numerical form.
A qualitative approach is suited to situations where you want to:
- Generate new constructs, theories or hypotheses
- Develop detailed stories to describe a phenomenon
- Achieve a deeper understanding of issues
  o Peripheral vision
- Improve the quality of quantitative measures
  o Focus groups, interviews, input from experts and other approaches

3.3 Qualitative traditions

Qualitative tradition: a general way of thinking about conducting qualitative research. There are 4 traditions:
1. Ethnography: the study of a culture using qualitative field research. It relies mostly on participant observation: a method of qualitative observation where the researcher becomes a participant in the culture or context being observed.
2. Phenomenology: a philosophical perspective as well as an approach to qualitative methodology that focuses on people's subjective experiences and interpretations of the world. Meaning units: in qualitative data analysis, a small segment of a transcript or other text that captures a concept that the analyst considers to be important.
3.
Field research: a research method in which the researcher goes into the field to observe the phenomenon in its natural state. Participatory action research (PAR) goes the furthest in varying the researcher-participant-data relationship by turning researchers into participants, participants into researchers and data into action.
4. Grounded theory: a theory rooted in observation about phenomena of interest, and also a method for achieving such a theory. Developed by Glaser and Strauss in the 1960s. It is a complex process in which the development of a theory and the collection of data related to that theory build on each other.

Tradition | Focus | Data collection | Analysis | Researcher role
Ethnography | Culture | Mostly participant observation | Identification of themes | Active participant
Field research | In situ observation | Participant/direct observation | Identification of themes | Observer
Phenomenology | Subjective experiences/interpretations | Interviews | Discovery of in vivo meaning units | Observer
Grounded theory | Core theoretical concepts | Interviews | Iteratively establishing links between constructs | Observer

3.4 Qualitative methods

- Participant observation: the most common and most demanding method. The researcher becomes a participant in the culture or context being observed and needs to become accepted as a natural part of the culture to ensure that the observations are of the natural phenomenon.
- Direct observation: the process of observing a phenomenon to gather information about it.
  It differs from participant observation in that:
  o The researcher does not typically become a participant in the context;
  o The direct observer tries to be unobtrusive so as not to bias the research;
  o The perspective is more detached: the researcher is watching instead of taking part;
  o It is more structured: the researcher is observing instead of becoming immersed;
  o It does not take as long as participant observation.
- Unstructured interviewing: an interviewing method that uses no predetermined interview protocol or survey and where the interview questions emerge and evolve as the interview proceeds.
  o Like a natural conversation
  o More difficult to analyze, especially when synthesizing across respondents
- Case study: an intensive study of a specific individual, event, organization or specific context. Often used in business, law and policy analysis. Usually a combination of methods is used, and there are also quantitative approaches for studying cases. Sources include observations, clinical notes, medical history and life history.
- Focus groups: popular in marketing and other social research; they enable researchers to obtain detailed information about attitudes, opinions and preferences of selected groups of participants.
  o What will the specific focus be?
  o Who will participate?
  o How will you record the observations?
  o How will you analyze the data?
  Ethics are important as well, especially if the topic is a sensitive one.

Unobtrusive measures: methods of collecting data that do not interfere in the lives of the respondents. They reduce the biases that result from the intrusion of the researcher or measurement instrument, but a researcher may also miss out on relevant information as a result of not engaging with the study audience. Unobtrusive measures are used in both qualitative and quantitative research, for example in secondary analysis of data.
2 approaches in qualitative research:
1. Indirect measures: an unobtrusive measure that occurs naturally in a research context. The researcher is able to collect data without the respondent being aware of it.
2.
Content analysis: the systematic analysis of text documents. It can be quantitative, qualitative or both. Typically, the major purpose of content analysis is to identify patterns in text.
- Thematic analysis of text: identification of themes or major ideas.
- Indexing: a variety of automated methods for rapidly indexing text documents exists, such as Key Words in Context analysis (computer analysis of text data). An exception dictionary excludes all nonessential words like "is", "and", and "of" in a content analysis study.
Steps or phases in content analysis:
  o Sampling from the population of potential texts to select the ones that will be used
  o Unitizing: dividing a continuous text into smaller units that can be analyzed
  o Coding: categorizing qualitative data
Limitations of content analysis:
  o It is limited to the types of information available in text form
  o You have to be careful with sampling to avoid bias
  o You have to be careful about interpreting the results of automated content analyses

3.5 Qualitative data

Qualitative data can be recorded in numerous ways: stenography, audio recording, video recording, written notes, written papers, image-based data, Twitter analysis.
Qualitative and quantitative data are easily contrasted, but:
- All qualitative data can be coded quantitatively
- All quantitative data are based on qualitative judgement
Mixed methods research: uses a combination of qualitative and quantitative methods.

3.6 Assessing qualitative research

Credibility: the degree to which the results of qualitative research are believable from the perspective of the participant; to what extent do they make sense?
Transferability: the degree to which the results can be generalized or transferred to other contexts or settings.
Dependability: the need for the researcher to account for the ever-changing context within which research occurs.
The researcher is responsible for describing the changes that occur in the setting and how these changes might affect the conclusions that are reached. Reliability emphasizes the researcher's responsibility to develop measures that would yield consistent results.
Confirmability: degree to which the results could be confirmed or corroborated by others.

Traditional criteria for judging quantitative research | Alternative criteria for judging qualitative research
Internal validity | Credibility
External validity | Transferability
Reliability | Dependability
Objectivity | Confirmability

Data audit: the process of systematically assessing the quality of data in a qualitative study.
Secondary analysis: quantitative analysis of existing data that is done either to verify or extend previously accomplished analyses or to explore new research questions. Using existing data can be an efficient alternative to collecting original primary data and can extend the validity, quality and scope of what can be accomplished in a single study. See figure 3.5.

Chapter 4 – Sampling

4.1 Foundations of sampling
Feasibility will determine the construction or refinement of your research question and objectives, and may sometimes lead to a clash.
Sampling: the process of selecting units (participants) from the population of interest. By studying the sample you can generalize the results to the population from which the units were chosen. Descriptive and exploratory studies rely on a small number of cases that can be studied intensively; other studies aim for results that are very general and represent populations (larger samples).

4.2 Sampling terminology
Groups involved in a sampling process:
- Population: the group you want to generalize to and the group you sample from in a study
- Theoretical population: group which, ideally, you would like to sample from and generalize to. Contrasted with accessible population.
- Accessible population: group that reflects the theoretical population of interest and that you can get access to when sampling. Get a list of the members.
- The sampling frame: list from which you draw your sample. When there is no list, you draw your sample based upon an explicit rule.
- Sample: the actual units you select to participate in your study.
See figure 4.3.
Sampling is a difficult multistep process and you can go wrong in many places; there is a possibility of introducing systematic error or bias. Bias: a systematic error, the result of any factor that leads to an incorrect estimate. When bias exists, the values that are measured do not accurately reflect the true value.

4.3 External validity
Sometimes we try to reach conclusions beyond the sample in our study.
Generalizing: the process of making an inference that the results observed in a sample would hold in the population of interest. If the inference is valid, the results have generalizability. This is possible when a study has good external validity: the degree to which the conclusions in your study would hold for other persons in other places and at other times. The best chance of a representative sample is if it is randomly selected so that every member has an equal chance of participating.
Approaches to external validity in sampling: the sampling model and the proximal similarity model.
Sampling model: a model for generalizing in which you identify your population, draw a fair sample, conduct your research, and finally generalize your results from the sample to the population.
Difficulties:
- You might not know what part of the population you will want to generalize to
- You may not be able to draw a fair or representative sample easily
- Impossible to sample across all times that you might like to generalize to
Proximal similarity model: a model for generalizing from your study to other contexts based upon the degree to which the other context is similar to your study context.
Gradient of similarity: the dimensions along which your study context can be related to other potential contexts to which you might wish to generalize. Contexts that are closer to yours along the gradient of similarity of place, time, people and so on can be generalized to with more confidence than ones that are further away. See figure 4.5.
The sampling model is more traditional and widely accepted, but when it comes to external validity you should think in terms of proximal similarity and use the sampling model to the extent it is practical.

4.4 Sampling methods
2 categories: probability and nonprobability sampling.
- Probability samples are selected so that every member of the population has a known probability of being included in the sample (typical of larger surveys).
- Nonprobability samples are selected on the basis of researcher judgement (typical of smaller descriptive studies, interviews etc.).
Difference: probability sampling utilizes some form of random selection, nonprobability sampling does not.
With probability sampling you can estimate confidence intervals, which indicate the precision of an estimate of a statistic. A confidence interval provides the lower and upper limits of the statistical estimate at a specific probability level. For example, the 95% confidence interval for an estimate of a mean or average is the range of values within which there is a 95% chance that the true mean falls.
Regression line (see slide lecture): the sample can differ from the population.

4.5 Nonprobability sampling
2 broad types: accidental or purposive. Most methods are purposive in nature because the sampling problem is approached with a specific plan in mind.
- Accidental, haphazard or convenience sampling: for example on-the-street interviews. There is no evidence that such samples are representative of the populations you are interested in generalizing to. Convenience sampling is therefore often used in pilot or feasibility studies, where it allows the researcher to obtain basic data.
- Purposive sampling: sampling with a purpose related to the kind of participant you are looking for. Useful in situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern. All of the following can be considered subcategories of purposive sampling methods.
- Modal instance sampling: sampling the most frequent or typical case. For example typical voters in a typical town.
- Expert sampling: sample of people with known or demonstrable experience and expertise in some area. For example a panel of experts. Their specific expertise provides evidence for validity. Also used when there is insufficient data on a particular topic.
- Quota sampling: any sampling method where you sample until you achieve a specific number of sampled units for each subgroup of a population.
o Proportional quota sampling: sample until you achieve a specific number of sampled units for each subgroup of a population, where the proportions in each group are the same as in the population. Represents the major characteristics of the population.
o Nonproportional quota sampling: less restrictive; you specify the minimum number of sampled units you want in each category. Similar to stratified random sampling.
An issue is the potential overrepresentation of those individuals who are more convenient to reach, which leads to biased results.
- Heterogeneity sampling: sampling for diversity or variety. The aim is to include all opinions or views, so a broad and diverse range of participants is necessary.
- Snowball sampling: sampling method in which you sample participants based upon referral from prior participants. Used, for example, in research on homeless people or sex workers, where participants recommend others. Gives broad coverage of a population, because respondents are reached through social networks. But there is no random selection, so the sample is more likely to be biased.
- Respondent-driven sampling: a nonprobability sampling method that combines chain-referral or snowball sampling with a statistical weighting system that helps compensate for the fact that the sample was not drawn randomly.
Nonprobability methods can be a good fit for the kind of research question, especially in the early phases of research in an area where little is known. Sometimes these methods are the only available options for practical reasons. The major limitation is external validity.

4.6 Probability sampling: theory
Probability sampling: a method of sampling that utilizes some form of random selection.
The units that you sample (people) supply you with one or more responses. A response is a specific measurement value that a sampling unit supplies. When you summarize the numerical responses, you use a statistic: a value that is estimated from data, such as the mean, median or mode. If you measure the entire population and calculate a value like a mean or average, this is not a statistic but a population parameter.
The sampling distribution: theoretical distribution of an infinite number of samples of the population of interest in your study. Infinite is not a number we can reach, but you need to realize that your sample is just one of a potentially infinite number of samples that you could have taken. While the statistic from your sample is probably near the center of the sampling distribution, you could have gotten one of the extreme samples. If you take the average of the sampling distribution (the average of the averages), you would be closer to the true population average, the parameter of interest.
Standard deviation: indicator of the variability of a set of scores in a sample around the mean of that sample, i.e. the spread of the scores around the average in a single sample.
Standard error: the spread of the averages around the average of averages in a sampling distribution. Associated with sampling, the error in measurement is called sampling error.
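The difference between the standard deviation (spread within one sample) and the standard error (spread of sample averages across many samples) can be made concrete with a small simulation. The sketch below is only an illustration under assumed values: the population of 10,000 scores with mean 50 and standard deviation 10, the sample sizes 25 and 400, and the function name `simulate_sample_means` are all made up for this example and do not come from the book.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one.

    The returned list approximates the sampling distribution of the mean.
    """
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population (assumed values, for illustration only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

population_mean = statistics.mean(population)    # the parameter
population_sd = statistics.pstdev(population)    # spread of individual scores

# Approximate two sampling distributions: small samples and large samples.
means_small = simulate_sample_means(population, sample_size=25, num_samples=2000)
means_large = simulate_sample_means(population, sample_size=400, num_samples=2000)

# The standard error is the spread of the sample averages.
se_small = statistics.stdev(means_small)
se_large = statistics.stdev(means_large)

# The average of the averages should sit close to the population parameter.
mean_of_means = statistics.mean(means_small)

print(f"population mean: {population_mean:.2f}, population sd: {population_sd:.2f}")
print(f"average of sample averages (n=25): {mean_of_means:.2f}")
print(f"standard error with n=25:  {se_small:.2f}")
print(f"standard error with n=400: {se_large:.2f}")
```

In a run of this sketch the standard error should come out well below the population standard deviation, and smaller for the larger samples, which matches the idea that a bigger sample brings the sample average closer to the population parameter.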
To calculate the sampling error you base the calculation on the standard deviation of your sample: the greater the sample's standard deviation, the greater the standard error. And the greater the sample size, the smaller the standard error, because the greater the sample size, the closer your sample is to the actual population. When you measure the entire population, the mean is the parameter.
Normal curve: a common type of distribution where the values of a variable have a bell-shaped histogram or frequency distribution. Approximately 68% of cases occur within one standard deviation of the mean, 95% of the cases within 2 standard deviations, and 99% within 3 standard deviations. Also known as the bell curve. The intervals of 68, 95 and 99% are confidence intervals. See figure 4.14.

4.7 Probability sampling: procedures
Initial definitions:
- $N$ is the number of cases in the sampling frame
- $n$ is the number of cases in the sample
- ${}_N C_n$ is the number of combinations (subsets) of $n$ from $N$
- $f = n / N$ is the sampling fraction
Simple random sampling: a method of sampling that involves drawing a sample from a population so that every possible sample has an equal probability of being selected.
- Objective: to select $n$ units out of $N$ such that each of the ${}_N C_n$ possible samples has an equal chance of being selected.
- Procedure: use a table of random numbers, a computer random-number generator, or a mechanical device to select the sample.
Stratified random sampling: method of sampling that involves dividing your population into homogeneous subgroups and then taking a simple random sample in each subgroup.
Systematic random sampling: method where you determine randomly where you want to start selecting in the sampling frame and then follow a rule to select every kth element in the sampling frame list.
1. Number the units in the population from 1 to $N$
2. Decide on the $n$ (sample size) that you want or need
3. Calculate $k = N / n$, the interval size
4. Randomly select an integer between 1 and $k$
5. Take every kth unit
Cluster random sampling/area random sampling: involves dividing the population into groups called clusters, randomly selecting clusters, and then sampling each element in the selected clusters. Useful when sampling a population that is spread across a wide area geographically.
1. Divide the population into clusters
2. Randomly sample clusters
3. Measure all units within the sampled clusters
Multistage sampling: combining several sampling techniques to create a more efficient or effective sample than the use of any one sampling type can achieve on its own.

4.8 Threats to external validity
Threats to external validity: any factors that can lead you to make an incorrect generalization from the results of your study to other persons, places, times, or settings.
Improving external validity:
- Random selection
- Keep respondent dropout rates low
- Second approach: show critics that they are wrong by replication, that is, by repeating the study in a different place, time or setting.
See table 4.1.

Chapter 5 – Introduction to measurement

5.1 Levels of measurement
Level of measurement: the relationship between numerical values on a measure. There are different types of levels of measurement that determine how you can treat the measure when analyzing it.
- Nominal level: measuring a variable by assigning a number arbitrarily in order to name it numerically so that it might be distinguished from other objects. Example: the jersey numbers in most sports.
- Ordinal level: measuring a variable using rankings. Example: class rank.
- Interval level: the distance between numbers is interpretable/meaningful. Example: temperature in degrees Celsius.
- Ratio level: the distance between numbers is interpretable and there is an absolute zero. Example: weight.
Knowing the level of measurement:
- Helps you decide how to interpret the data
- Helps you decide what statistical analysis is appropriate

5.2 Quality of measurement
1. Reliability
2. Validity

Reliability
The consistency or stability of an observation/measure. Does the observation provide the same results each time?
- True score theory: a theory that maintains that an observed score is the sum of two components, true ability and random error. $X = T + e = T + e_r + e_s$ (observed score = true ability + error, where the error splits into random error $e_r$ and systematic error $e_s$).
- Random error (individual level) – noise. A component or part of the value of a measure that varies entirely by chance.
- Systematic error (at sample level) – bias. A component of an observed score that consistently affects the responses in the distribution. It affects the average: bias.
Important:
- Simple, powerful model for measurement. Reminder that most measurement will have an error component.
- Foundation of reliability theory: a measure without random error is perfectly reliable, a measure that has no true score has zero reliability.
- Can be used in computer simulations as the basis for generating observed scores with certain known properties.
Other models: Item Response Theory, Rasch model, Generalizability theory.
Reducing measurement error:
1. Pilot test instruments to get feedback from your respondents
2. Train the people who collect the data
3. Double-check the data
4. Use statistical procedures to adjust for measurement error
5. Use multiple independent measures of the same systematic construct: triangulate.
Theory of reliability
- Reliable means a measurement is repeatable and consistent.
- Reliability is a ratio:
o Proportion of truth in your observation
o Determined using a group of individuals, not a single observation
o (Co)variance and standard deviation
o The higher the correlation, the more reliable the measure
o Reliability can only be estimated
As a simple ratio: $\dfrac{\text{true level on the measure}}{\text{the entire observed measure (error included)}}$
In terms of variances: $\dfrac{\text{the variance of the true score}}{\text{the variance of the entire observed measure}}$ or $\dfrac{\mathrm{var}(T)}{\mathrm{var}(X)}$
To estimate reliability: $\dfrac{\mathrm{cov}(X_1, X_2)}{\mathrm{sd}(X_1) \cdot \mathrm{sd}(X_2)}$
Reliability ratio expressed in terms of variances: $\dfrac{\mathrm{var}(T)}{\mathrm{var}(T) + \mathrm{var}(e)}$
No error in measurement = perfect reliability = estimate is 1.0: $\dfrac{\mathrm{var}(T)}{\mathrm{var}(T)} = 1.0$
Only error in measurement = no reliability = estimate is 0: $\dfrac{0}{\mathrm{var}(e)} = 0$
Types of reliability (see figures in the book):
- Inter-rater or inter-observer reliability: degree of agreement or correlation between the ratings or codings of 2 independent raters or observers.
o Cohen's Kappa: more robust than percent agreement, adjusts for the probability that some agreement is due to random chance
Best approach when the measure is an observation done by one person, or if you are interested in using a team of raters.
- Test-retest reliability: correlation between scores on the same test or measure at two successive time points. Feasible in experiments that use a no-treatment control group.
- Parallel-forms reliability: correlation between two versions of the same test or measure that were constructed in the same way, usually by randomly selecting items from a common test question pool. Use in situations where you intend to use 2 forms as alternate measures of the same thing.
- Internal consistency reliability: assesses the degree to which items on the same multi-item instrument are interrelated. Consistent findings.
o Average interitem correlation: average of the correlations of all pairs of items
o Average item-total correlation: average of individual item-total correlations
o Split-half reliability: correlation between the total scores of 2 randomly selected halves of the same multi-item test or measure
o Cronbach's Alpha (α): analogous to the average of all possible split-half correlations. Perfect internal consistency would mean Cronbach's Alpha = 1. This never happens.

Validity
The extent to which your measure or instrument actually measures what it is theoretically supposed to measure. Crucial for any kind of meaningful measurement. It deals with accuracy or precision of measurement.
Example: your weight on a scale could be the same every day, but there could be an error in the scale itself.
Theory: what you think; observation: what you do/see. Construct validity is an assessment of how well your actual programs or measures reflect your ideas or theories. See figure 5.26.
Any time you translate a concept/construct into a functioning and operating reality, you need to be concerned about how well the translation reflects what you intended.

Construct validity
Operationalization: the act of translating a construct into its manifestation, for example translating the idea of your treatment or program into the actual program. The result is also referred to as an operationalization.
- Translation validity: how well you translated the idea of your measure into its operationalization
o Face validity: checks that the operationalization seems a good translation
o Content validity: check of the operationalization against the relevant content domain
- Criterion-related validity: based on the relationship to another independent measure as predicted by the theory of how the measures should behave.
o Predictive validity: based on the idea that your measure is able to predict what it theoretically should be able to predict
o Concurrent validity: the operationalization's ability to distinguish between groups that it should be able to distinguish between
o Convergent validity: degree to which the operationalization is similar to others to which it should be similar
o Discriminant validity: degree to which the operationalization is not similar to others that it should not be similar to
See figures in the book.
Threats to construct validity: any factor that causes you to make an incorrect conclusion about whether your operationalized variables reflect the constructs well.
- Deficiency versus contamination
- Inadequate preoperational explication of constructs: you did not define what you meant by the construct before trying to translate it into a measure/program.
o Think through concepts better;
o Use structured methods to articulate concepts;
o Get experts to critique operationalizations.
- Mono-operation bias: when you rely on a single implementation of your independent variable, cause, program or treatment in your study.
- Mono-method bias: when you use only a single method of measurement.
- Interaction of different treatments
- Interaction of testing and treatment
- Restricted generalizability across constructs
- Confounding constructs and levels of constructs
Social threats:
- Hypothesis guessing: participants in a study guess the purpose and adjust their responses based on that.
- Evaluation apprehension: people are anxious about being evaluated, which could make them perform poorly.
- Researcher expectancies

5.3 Integrating reliability and validity
Separate ideas but intimately interconnected. See figure 5.30.

Chapter 6 – Scales, tests and indexes

6.1 Foundations of scales, tests, and indexes
The three most common approaches for quantitatively measuring a construct.
6.2 Scales and scaling
Scaling: involves the construction of an instrument that associates qualitative constructs with quantitative metric units.
Unidimensional scaling types:
- Thurstone or Equal-Appearing Interval scaling
- Likert or summative scaling
- Guttman or cumulative scaling
Types of scales:
- Response scale
o Sequential-numerical response format, such as a 1-to-5 rating
o Dichotomous response: 2 possible options (yes/no)
o Interval response scale: measured on interval level, size is meaningful
Purposes of scaling:
- Test a hypothesis
- Discover if a construct is unidimensional or multidimensional
- As part of exploratory research
- To represent a construct as a single score
Semantic differential: scaling method in which an object is assessed by the respondent on a set of bipolar adjective pairs.
Unidimensional or multidimensional?
Unidimensional scales are easier to use and understand.
o Used when what you are measuring is a unidimensional reality
o If the concept is not unidimensional, this scale will not measure the concept well, and you need to choose a multidimensional approach
Unidimensional types
- Thurstone scaling (equal-appearing intervals)
o Develop the focus of the scaling project
o Generate potential scale items (statements); 80-100 separate items
o Rate the favorability of the scale items toward the attitude
o Compute scale score values for each item: median, interquartile range
o Select the final scale items
o Administer the scale (getting respondents to agree or disagree)
o Average the scale scores of the items agreed with
- Likert scaling
o Define the focus
o Generate the items on a 1-5 or 1-7 agree/disagree response scale
o Have a group of judges rate the items on favorability to the concept
o Select the items by computing the item-total correlations of the items
o Administer the scale: forced-choice response scale; reversal
- Guttman scaling
o Define the focus
o Develop the items (80-100)
o Have a group of judges rate the items (Y/N)
o Develop the cumulative scale
o Administer the scale

6.3 Tests
Test: a measure designed to measure a respondent's knowledge, skill or performance. The history of testing goes back to 2200 BCE. The key for a good test is reliability and validity. You can fake a scale, you cannot fake a test.
Standardized tests: method of test construction that uses statistics and a large sample of previously taken tests to 'standardize' the measurement. Includes statistics such as mean, median, mode, percentiles, variances and standard deviation, and correlations with other, related tests.
Tests may be unfair. Anything has error; any decision taken on any test has error. Many factors affect standardized test performance.
- Consequential validity
- Many decision makers have begun to either do away with standardized test scores or to combine single test scores with other measures of ability.
How to find a good test or measure (thesis):
- Test publishers
- Primary research literature
- Test reviews in academic journals
- Buros Center for Testing at the University of Nebraska-Lincoln
- APA
- http://inn.theorizeit.org/
- Academy of Management research division measure chest

6.4 Indexes
Index: a quantitative score that measures a construct of interest by applying a formula or a set of rules that combines relevant (usually pre-existing) data. Example: grade point average.

Chapter 7 – Survey research

7.1 Foundations of survey research
Survey: a measurement tool used to gather information from people by asking questions about one or more topics. Collected from a sample, usually to generalize to the population.
2 ways:
- Questionnaire
- Interview

7.2 Types of survey research
Questionnaires
- Mail survey: paper-and-pencil survey sent through the mail. Inexpensive to administer, but low response rates and no detailed written responses.
- Group-administered questionnaire: survey that is administered to respondents in a group setting.
For example all students in a classroom. High response rate, respondents can ask for clarification, and the group is easy to assemble.
- Household drop-off survey: paper-and-pencil survey that is administered by dropping it off at the respondent's household and picking it up later. Direct personal contact while allowing time and privacy. Higher response rate, but less economical (high costs).
- Point-of-experience survey: delivered at or immediately after the experience that the respondent is being asked about. For example a customer satisfaction survey. Risk of socially desirable answers when the people who treated the respondents are present.
- Electronic survey: administered via a computer program, distributed via email/website. Elimination of costs, timely responses and international respondents.
o Email survey: surveys are pushed directly to respondents and are easier to create, but limited in interaction.
o Web surveys
o Dual-media survey: distributed in 2 ways, as an attachment or directly on the web.
Interviews
- Personal interview: one-to-one interview. Interview guide with a script and follow-up prompts. Yields opinions and impressions, but is time-consuming and resource intensive.
- Group interview: administered to respondents in a group setting. Structured interview.
- Focus group: qualitative measurement method where input on one or more focus topics is collected from participants in a small-group setting where the discussion is structured and guided by a facilitator.
- Telephone interview: personal interview conducted over the telephone. Rapid information, personal contact, follow-up questions. But many people are not reachable, and it is more expensive.

7.3 Selecting the survey method
Depends on the target population, the kind of information that is being sought, and the availability of resources, including budget and time.
- Population issues: if population units can be identified, if the population is literate, language issues, cooperation and geographic restrictions.
- Sampling issues: data availability, respondent, who and where, all members of population, response rates and incentives for participation.
- Question issues: question types, filter questions, question sequence, lengthy questions and long response scales.
- Content issues: if the respondent knows about the issue, if the respondent needs to consult records.
- Bias issues: social desirability, interviewer distortion and subversion, false respondents.
- Administrative issues: costs, facilities, time and personnel.

7.4 Survey design
Reflect on the issues before you start. Keep in mind the purpose of the survey: what do you need to know?
Three primary issues in writing a question:
- Determining the question content, scope and purpose;
- Choosing the response format that you use for collecting information from the respondent;
- Figuring out how to word the question to get at the issue of interest.
Types of questions
Structured and unstructured (open-ended).
Structured questioning:
- Dichotomous response format: allows the respondent to choose between only two possible responses (yes/no, male/female)
- Questions based on level of measurement
o Nominal response format: response format with a number beside each choice where the number has no meaning except as a placeholder for that response (occupational class).
o Ordinal response format: respondents are asked to rank the possible answers in order of preference.
o Interval level response format: using numbers spaced at equal intervals where the size of the interval between potential response values is meaningful. 1-5 scale.
o Likert-type response scale: responses are gathered using numbers spaced at equal intervals (1=strongly agree etc.)
- Filter or contingency questions: used to determine whether the respondents are qualified or experienced enough to answer a subsequent question. Keep in mind:
o Avoid having more than three levels for any question (if yes, …);
o Use graphics to jump (arrow and box);
o Jump to a new page if possible.
Question content
- Is the question necessary and useful?
- Are several questions needed?
o Double-barreled question: asks about 2 issues but only allows the respondent a single answer. If so, split the question into 2.
- Do respondents have the needed information?
- Does the question need to be more specific?
- Is the question sufficiently general?
- Is the question biased or loaded?
- Will the respondent answer truthfully?
o Response brackets: offer groups of answers, such as an age of 30-40 years or an income range.
Response format
The format you use to collect the answer from the respondent.
- Structured response format: specific format for the respondent to choose their answer.
o Fill-in-the-blank (Name: ______)
o Check the answer: X next to the response. More than one response possible. Multi-option or multiple-response variable: the respondent can pick multiple options from a list.
o Circle-the-answer: single-option variable
- Unstructured response format: not predetermined; allows the respondent/interviewer to determine how to respond. Open-ended question.
Question wording
- Can the question be misunderstood?
- What assumptions does the question make?
- Is the time frame specified?
- How personal is the wording?
- Is the wording too direct?
- Other wording issues
Question placement
- The answer may be influenced by prior questions;
- The question may come too early or too late to arouse interest;
- The question may not receive sufficient attention because of the questions around it.
Opening questions are important: they determine the tone for the survey. Therefore they should be easy to answer.
Sensitive questions: difficult or uncomfortable subjects. Before asking these you have to develop trust. Easier warm-up questions will help, as will mentioning the purpose of the research in advance.
- Start with easy, nonthreatening questions;
- Put more difficult, threatening questions near the end;
- Never start a mail survey with an open-ended question;
- For historical demographics, follow chronological order;
- Ask about one topic at a time;
- When switching topics, use a transition;
- Reduce response set;
- For filter or contingency questions, make a flowchart.
The golden rule:
- Thank the respondent at the beginning
- Keep your survey as short as possible
- Be sensitive to the needs of the respondent
- Be alert for any sign that the respondent is uncomfortable
- Thank the respondent at the end
- Assure the respondent that you will send a copy of the final results, and do so.

7.5 Interviews
Among the most challenging and rewarding forms of measurement: personal sensitivity and adaptability are required, as well as the ability to stay within the bounds of the protocol.
Role of the interviewer
- Locate and enlist cooperation of respondents;
- Motivate respondents to do a good job;
- Clarify any confusion/concerns;
- Observe quality of responses;
- Conduct a good interview.
Training the interviewers
- Describe the entire study;
- State who is the sponsor of the research;
- Teach enough about survey research;
- Explain the sampling logic and process;
- Explain interviewer bias;
- Walk through the interview;
- Explain respondent selection procedures:
o Reading maps
o Identifying households
o Identify respondents
o Rehearse the interview
o Explain supervision
o Explain scheduling
Interviewer's kit
A 3-ring notebook, maps, sufficient copies of the survey instrument, official identification, a cover letter from the principal investigator/sponsor, and a phone number the respondent can call.
Conducting the interview
- Opening remarks
  o Gaining entry
  o Doorstep technique
  o Introduction
  o Explaining the study
- Asking the questions
  o Use the questionnaire carefully, but informally
  o Ask questions exactly as written
  o Follow the order given
  o Ask every question
  o Don't finish sentences

Obtaining adequate responses: the probe
- The silent probe: pause and wait
- Overt encouragement: uh-huh, okay
- Elaboration: would you like to elaborate on that?
- Ask for clarification
- Repetition

Recording the response
- Record responses immediately and include all probes.

Concluding the interview
- Thank the respondent
- Tell them when you expect to send results
- Don't be brusque or hasty
- After leaving, write down notes about how the interview went

Chapter 8 – Introduction to design

8.1 Foundations of design

8.2 Research design and causality
Evidence for a causal relationship:
1. Temporal precedence: criterion for establishing a causal relationship that holds that the cause must occur before the effect. Time order.
2. Covariation of the cause and effect: if X then Y, if not X then not Y.
3. No plausible alternative explanations: correlation does not mean causation; it is possible that some other variable or factor is causing the outcome. Incorporate a control group: a group that is compared or contrasted with the group that receives the program or intervention of interest.

Internal validity
Key question: whether observed changes can be attributed to your program or intervention (the cause) and not to other possible causes (alternative explanations for the outcome). It is possible to have internal validity and not have construct or external validity. Causal validity.

Threats to internal validity: factors that lead you to draw an incorrect conclusion about what causes the outcome. There are single-group threats, multiple-group threats and social threats to internal validity.
Single-group threats: occur in a study that uses only a single program or treatment group and no comparison or control group.
- Posttest-only single-group design: intervention and posttest, where measurement of outcomes is only done within a single group of program recipients.
- Pretest-posttest: research design that uses measurement both before and after an intervention or program.
- Single-group design: involves only a single group in measuring outcomes.

Threats:
- History threat: occurs when some historical event affects your study outcome.
- Maturation threat: occurs as a result of natural maturation between pre- and post-measurement. Getting older.
- Testing threat: occurs when taking the pretest affects how participants do on the posttest.
- Instrumentation threat: arises when the instruments (or observers) used on the pretest and posttest differ. Different supervisor.
- Mortality threat: occurs because a significant number of participants drop out.
- Regression threat/regression artifact/regression to the mean: causes a group's average performance on one measure to regress toward, or appear closer to, the mean of that measure more than anticipated or predicted. Occurs whenever you have a nonrandom sample from a population and two measures that are imperfectly correlated. It will bias your estimate of the group's posttest performance and can lead to incorrect causal inferences.

Multiple-group threats
Selection threat/bias: the groups were not comparable before the study. Any factor that leads to pretest differences between groups.
- Selection-history threat: results from any other event that occurs between pretest and posttest that the groups experience differently.
- Selection-maturation threat: arises from any differential rates of normal growth between pretest and posttest for the groups.
- Selection-testing threat: occurs when a differential effect of taking the pretest exists between groups on the posttest.
- Selection-instrumentation threat: results from differential changes in the test used for each group from pretest to posttest.
- Selection-mortality threat: arises when there is differential non-random dropout between pretest and posttest.
- Selection-regression threat: occurs when there are different rates of regression to the mean in the two groups between pretest and posttest.

Quasi-experimental design: designs that have several of the key features of randomized experimental designs, such as pre-post measurement and treatment-control group comparisons, but lack random assignment to group.

Social threats: social research is conducted in real-world human contexts where people react not only to what affects them, but also to what is happening to others around them.
- Diffusion or imitation of treatment: the comparison group learns about the program, either directly or indirectly, from program group participants.
- Compensatory rivalry: one group knows which program the other group is getting and, because of that, develops a competitive attitude toward the other group. Often the comparison group knows that the program group is receiving a desirable program.
- Resentful demoralization: the comparison group knows what the program group is getting and, instead of developing rivalry, its members become discouraged or angry and give up.
- Compensatory equalization of treatment: the control group is given a program or treatment designed to make up for or compensate for the treatment the program group gets. This diminishes the researcher's ability to detect the program effect.

Threats can be minimized by constructing multiple groups that are unaware of each other, or by training administrators in the importance of preserving group membership and not instituting equalizing programs. But researchers will never be able to eliminate the possibility that human interactions are making it more difficult to assess cause-effect relationships.
Other ways to rule out threats to internal validity
Good research designs minimize the plausible alternative explanations for the hypothesized cause-effect relationship. Methods other than the research design:
- By argument: make an argument that the threat is not a reasonable one, a priori or a posteriori.
- By measurement or observation: demonstrate that the threat does not occur at all, or occurs so minimally that it is not a strong alternative explanation for the cause-effect relationship.
- By analysis: for example, study the plausibility of an attrition or mortality threat by conducting a two-way factorial experimental design.
- Preventive action.
These 4 methods are not mutually exclusive. They may be used in addition to the research design itself to strengthen the case for internal validity.

8.3 Developing a research design
The design is the structure of the research; it tells you how all the elements in a project fit together. Every research design should include (figure 8.7):
- Time: some time has elapsed between the occurrence of the cause and the consequent effect.
- Treatments or programs: X.
- Observations or measures: O.
- Groups or individuals: the individuals who participate in the various conditions.

8.4 Types of designs
- Is there random assignment to groups? (Not the same as random selection.) If yes → true experiment.
- If not: are there either multiple groups or multiple waves of measurement? If yes → quasi-experiment; if no → non-experiment.

Examples:
- Posttest-only randomized experiment: groups are randomly assigned and receive only a posttest.
- Pre-post nonequivalent groups quasi-experiment: structured like a randomized experiment, but lacking random assignment to group.
- Posttest-only nonexperimental design: only a posttest is given. Referred to as nonexperimental because no control group exists.
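The two classification questions in 8.4 can be sketched as a small decision function. This is only an illustration of the decision rule in these notes; the function and parameter names are made up for the example.

```python
def classify_design(random_assignment: bool,
                    multiple_groups: bool,
                    multiple_waves: bool) -> str:
    """Classify a research design using the two questions from section 8.4.

    All names here are hypothetical; they are not taken from the book.
    """
    # Question 1: is random assignment to groups used?
    if random_assignment:
        return "true experiment"
    # Question 2: is there more than one group OR more than one wave of measurement?
    if multiple_groups or multiple_waves:
        return "quasi-experiment"
    return "non-experiment"


# The three example designs listed above:
print(classify_design(True, True, False))    # posttest-only randomized experiment
print(classify_design(False, True, True))    # pre-post nonequivalent groups design
print(classify_design(False, False, False))  # posttest-only nonexperimental design
```

Note that random selection of the sample plays no role in this classification; only random assignment does.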
Expanding on basic designs
- Expanding across time
- Expanding across programs
- Expanding across observations
- Expanding across groups

Chapter 9 – Experimental design

9.2 Introduction: the origins of experimental design
1919: Fisher's breakthrough in experimental design, based on randomization.
Random assignment: process of assigning your sample into two or more subgroups by chance. Procedures for random assignment can vary from flipping a coin to using a table of random numbers.

Experimental designs are considered the strongest of all designs in terms of internal validity. If X, then Y, and if not X, then not Y. It is important to have a program/treatment group (receives the program of interest) and a comparison group (compared or contrasted with the program group). The 2 groups are probabilistically equivalent: on average they would be expected to perform the same, within the range of chance. This does not mean they will obtain the exact same average score.
Notation: R for random assignment, X for treatment, O for measures.

Threats to internal validity
Advantages of an experimental design:
- 2 groups, where one serves the function of a control group;
- Random assignment ensures the 2 groups are probabilistically equivalent to each other.
Threats:
- Selection-mortality threat: differential rates of dropout in the 2 groups, which could result if the treatment is noxious or negative, or if the control group condition is difficult to tolerate.
- Social threats: it is likely that research participants (often students) are aware of each other and of the conditions to which they are assigned.

Two-group posttest-only randomized experiment: research design in which two randomly assigned groups participate. A pretest is not required because this design uses random assignment. You measure the groups on one or more measures (O's) and compare them by testing for the differences between means using a t-test or one-way Analysis of Variance (ANOVA).
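A minimal sketch of the two-group posttest-only randomized experiment, using only the Python standard library. It assumes a simple half-and-half split and uses Welch's version of the t statistic (which does not assume equal variances); the book only says "t-test", so that choice, the function names and the scores are assumptions for illustration.

```python
import random
import statistics
from math import sqrt


def randomly_assign(units, seed=None):
    """Randomly split the sample into a treatment and a control group (the R in the notation)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance alone decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_t(treatment_scores, control_scores):
    """t statistic for the difference between two group means on the posttest (O)."""
    m1 = statistics.mean(treatment_scores)
    m2 = statistics.mean(control_scores)
    v1 = statistics.variance(treatment_scores)   # sample variance (n - 1)
    v2 = statistics.variance(control_scores)
    n1, n2 = len(treatment_scores), len(control_scores)
    return (m1 - m2) / sqrt(v1 / n1 + v2 / n2)


# Hypothetical participants and made-up posttest scores.
treatment, control = randomly_assign(["p1", "p2", "p3", "p4", "p5", "p6"], seed=1)
print(treatment, control)
print(welch_t([5, 6, 7], [1, 2, 3]))
```

The larger the t statistic in absolute value, the harder it is to attribute the difference between the group means to chance; a statistical package would also report the corresponding p value.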
Difference between random selection and random assignment: random selection is how you draw the sample of people for your study from a population; random assignment is how you assign the sample that you draw to different groups or treatments in your study. A single study can use both, one, or neither.

9.3 Classifying experimental designs
What you see or observe in a research study can be divided into two components: the signal and the noise. When a series contains much variability (the extent to which the values measured for a variable differ), or noise, it is difficult to detect the signal, such as a downward slope (figure 9.7). The signal is often related to the key variable of interest (the construct you are trying to measure or the effect of the program). The noise consists of all random factors in the environment that make it harder to see the signal (lighting, local distractions, personal feelings). You want the signal to be high relative to the noise.

The signal-to-noise metaphor helps us classify experimental designs into:
- Signal enhancers:
  o Factorial designs: the focus is on the setup of the program, its components and its major dimensions. They enable you to determine whether the program has an effect, whether different subcomponents are effective, and whether there are interactions in the effects caused by the subcomponents. Medical research.
- Noise reducers:
  o Covariance designs: inclusion of one or more variables that account for some of the variability in the outcome measure or dependent variable.
  o Blocking designs: grouping of units into one or more classifications called blocks that account for some of the variability in the outcome measure or dependent variable.

9.4 Signal enhancing designs: Factorial designs
Efficient because they enable you to examine which features or combinations of features of the program lead to a causal effect. Several independent variables, one dependent variable. Factors are crossed. Main and combined/interaction effects.
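As a sketch of what main and interaction effects mean numerically, the function below computes them from the four cell means of a 2x2 design. The coding of the levels as 0 and 1, the function name and the example means are assumptions for illustration, not values from the book.

```python
def two_by_two_effects(cell_means):
    """Main effects and interaction from the cell means of a 2x2 factorial design.

    cell_means maps (level_of_A, level_of_B) to the group mean,
    with both levels coded 0 or 1 (a hypothetical coding).
    """
    a0b0 = cell_means[(0, 0)]
    a0b1 = cell_means[(0, 1)]
    a1b0 = cell_means[(1, 0)]
    a1b1 = cell_means[(1, 1)]

    # Main effect of A: average outcome at A=1 minus average outcome at A=0.
    main_a = ((a1b0 + a1b1) - (a0b0 + a0b1)) / 2
    # Main effect of B: average outcome at B=1 minus average outcome at B=0.
    main_b = ((a0b1 + a1b1) - (a0b0 + a1b0)) / 2
    # Interaction: does the effect of B differ between the levels of A?
    # Zero means the lines in a plot of group means are parallel.
    interaction = (a1b1 - a1b0) - (a0b1 - a0b0)
    return main_a, main_b, interaction


# Only factor A matters: a main effect of A, no interaction.
print(two_by_two_effects({(0, 0): 5, (0, 1): 5, (1, 0): 7, (1, 1): 7}))
# B only helps when A is at level 1: an interaction effect.
print(two_by_two_effects({(0, 0): 5, (0, 1): 5, (1, 0): 5, (1, 1): 7}))
```

Whether a nonzero difference is statistically significant still has to be tested, for example with a two-way ANOVA in a statistical package.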
Independent variable on the x-axis, dependent variable on the y-axis. Movie UvA.

The basic 2x2 factorial design (see example 9.9)
How many numbers there are indicates the number of factors: 2x2 means 2 factors, 2x2x2 means 3 factors. Each number itself stands for the number of levels of that factor. The product gives the number of groups: 2x2 requires 4 groups, 2x3 requires 6 groups.
Factor: major independent variable.
Level: subdivision of a factor (in the book's example the factors are time in instruction and program setting, each with two levels).
Null case: situation in which the treatment has no effect.
Main effect: outcome that shows consistent differences between all levels of a factor. Example: only men/women and migraine attacks, or only age and migraine attacks.
Interaction effect: occurs when differences on one factor depend on which level you are on for another factor. Conditional or moderation effect. Moderation hypotheses. Example: men/women, age and migraine attacks together.

Three ways to determine whether an interaction exists:
- The statistical table will report on all main effects and interactions;
- You cannot talk about the effect of one factor without mentioning the other factor;
- In the graphs of group means: whenever the lines are not parallel there is an interaction.
See figures from p. 239 !!
Gender affects something (already given), and this is even more so when a higher executive level comes in = moderation hypothesis.

Benefits of factorial designs:
- Great flexibility for exploring or enhancing the signal in studies;
- Factorial designs are efficient (they combine studies into one);
- Factorial designs are the only effective way to examine interaction effects.
Limitations:
- Involve more planning;
- Require more participants;
- Can be problematic when a large number of groups are present.
See the 2x3 example in the book.

Incomplete factorial design
In much research you are not interested in a fully crossed factorial design: a design that includes the pairing of every combination of factor levels. Some of the combinations may not make sense. You then decide to implement an incomplete factorial design: some cells or combinations in a fully crossed factorial design are intentionally left empty.
This allows, for example, for a control or placebo group that receives no treatment.

9.5 Noise-reducing designs: randomized block designs
Randomized block design: the sample is grouped into relatively homogeneous subgroups, or blocks, within which your experiment is replicated. Reduces noise/variance in the data. → Equivalent to stratified random sampling. The variability within each block is less than the variability of the entire sample.
Note:
- It may not be apparent that you are blocking; blocking doesn't necessarily affect anything that you do with research participants. It is an analysis strategy.
- You will only benefit from a blocking design if you are correct that the blocks are more homogeneous than the entire sample is.

9.6 Noise-reducing designs: covariance designs
Analysis of Covariance (ANCOVA) design: analysis that estimates the difference between groups on the posttest after adjusting for differences on the pretest. Figure 9.25.
Covariate: a variable that is correlated with the dependent variable. It can be included in the analysis to provide a more precise estimate of the relationship between an independent and a dependent variable. Covariates are the variables you adjust for in your study.

9.7 Hybrid designs: switching-replications experimental designs
Switching-replications design: 2-group design in 2 phases defined by 3 waves of measurement. Implementation of the treatment is repeated in both phases. In the repetition, the 2 groups switch roles: the original control group in phase 1 becomes the treatment group in phase 2, whereas the original treatment group acts as the control. By the end all participants have received the treatment.
Strength: causal effect testing.

9.8 Limitations of experimental design
- Groups may have differential dropout, thereby confounding the program with the outcome. → This diminishes the value of random assignment, because the initially probabilistically equivalent groups become less equivalent due to differential dropout.
- Challenged on ethical grounds: you have to deny the program to some people who might be equally deserving of it as others.
- Resistance from staff members in the study who would like some of their favorite people to get the program.
- The design may call for assigning a certain group of people to a harmful program, which may not be possible to implement on ethical grounds.
Randomized experiments may only be appropriate in 25-30% of the social research studies that attempt to assess causal relationships, depending on the nature of the intervention and of the context involved.

Chapter 10 – Quasi-experimental design

10.1 Foundations of quasi-experimental design
Quasi = sort of. A quasi-experimental design looks like an experimental design but lacks the key ingredient: random assignment. Developed by Donald T. Campbell. More frequently implemented because it is often either impractical or unethical to randomly assign participants.

10.2 The nonequivalent-groups design
Nonequivalent-groups design (NEGD): pre-post 2-group quasi-experimental design structured like a pretest-posttest randomized experiment, but lacking random assignment to group. → Extremely intuitive in structure and relatively easy to implement in practice. (Lack of random assignment = nonequivalent.)

10.3 The regression-discontinuity design
Regression-discontinuity design (RD): pretest-posttest program-comparison-group quasi-experimental design in which a cutoff criterion on the preprogram measure is the method of assignment to group. You pretest, then identify those people most in need of the intervention. Not random at all. Convincing.
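The assignment rule of the regression-discontinuity design can be sketched as follows. The sketch assumes that a low pretest score signals the greatest need for the program; the function name, the participant labels and the cutoff value are hypothetical.

```python
def assign_by_cutoff(pretest_scores, cutoff, treat_below=True):
    """Assign units to program or comparison group by a cutoff on the pretest.

    Assumption for this sketch: with treat_below=True, everyone scoring
    below the cutoff is considered most in need and gets the program.
    """
    groups = {}
    for unit, score in pretest_scores.items():
        in_program = score < cutoff if treat_below else score >= cutoff
        groups[unit] = "program" if in_program else "comparison"
    return groups


# Made-up pretest scores; cutoff of 50 chosen only for the example.
scores = {"anna": 42, "bram": 55, "chris": 49, "dewi": 71}
print(assign_by_cutoff(scores, cutoff=50))
```

Unlike random assignment, the rule is fully deterministic: knowing someone's pretest score tells you exactly which group they are in.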
The pretest should be associated with the posttest.

10.4 Other quasi-experimental designs
Proxy-pretest design: post-only design in which, after the fact, a pretest measure is constructed from preexisting data. This is usually done to make up for the fact that the research did not include a true pretest.
Separate pre-post samples design: design in which the people who receive the pretest are not the same as the people who take the posttest.
Double-pretest design: includes two waves of measurement prior to the program.
Nonequivalent dependent variables (NEDV) design: single-group pre-post quasi-experimental design with two outcome measures, where only one measure is theoretically predicted to be affected by the treatment and the other is not.
Pattern-matching NEDV design: single-group pre-post quasi-experimental design with multiple outcome measures, where there is a theoretically specified pattern of expected effects across the measures.
Regression point displacement (RPD) design: pre-post quasi-experimental research design where the treatment is given to only one unit in the sample, with all remaining units acting as controls. Useful in studying the effect of community-level interventions, where outcome data are routinely collected at the community level.

Lecture (see outcomes in the book)
Reaching cause-and-effect conclusions with the NEGD
Look at alternative explanations just by looking at the graphs!!
Outcome 1: there is already a difference at the start. If something happens between pretest and posttest it can affect the outcome. History threat.
Outcome 3: history threat, selection-regression threat.
Outcome 5: not regression, because regression goes toward the mean, not through it.
See the list of threats.

When the researchers made use of non-random sampling, think about what kind of effect this could have. For example, take a study of discrimination on the work floor.
When a company agrees to participate in this study, it may already know that nothing is wrong in the company, so the outcome of the study would be positive for them.

Chapter 11 – Introduction to data analysis

11.1 Foundations of data analysis
Data analysis steps:
1. Data preparation: logging the data in, making a codebook, entering data into the computer, checking data for accuracy, transforming data, developing and documenting the database.
2. Descriptive statistics: meaningful summaries about the sample, graphical analysis, describing what the data show.
3. Inferential statistical analysis: tests the research hypotheses.

11.2 Conclusion validity
Conclusion validity: the extent to which conclusions or inferences regarding relationships between the major variables in your research are warranted. Also relevant in qualitative research. Relevant whenever you are looking at relationships, including cause-and-effect relationships.
- Related to internal validity, but also independent of it;
- Internal validity is concerned with causal relationships;
- Conclusion validity applies to all relationships, not just causal ones;
- So conclusion validity is needed before we can establish internal validity.

Threats to conclusion validity → any factors that can lead you to reach an incorrect conclusion about a relationship in your observations. 2 kinds of errors:
- Type I error: erroneous interpretation of statistics in which you falsely conclude that there is a significant relationship between variables when in fact there is none. You reject the null hypothesis when it is true (falsely accepting the research hypothesis). Finding a person guilty of a crime that they did not actually commit.
- Type II error: failure to reject a null hypothesis when it is actually false. You conclude that there is no relationship when in fact there is. Finding a person not guilty of a crime that they did actually commit.
Threats that lead to a Type I error:
- Level of statistical significance: used to determine the probability of whether a finding is real or a chance event. Alpha level: the p value selected as the significance level. Alpha is the probability of a Type I error, that is, of concluding that there is an effect when there is not. The .05 level of significance corresponds to a 5% probability of a Type I error. Selected by the researcher before a test is conducted. R.A. Fisher.
- Fishing and error rate problem: occurs as a result of conducting multiple analyses and treating each one as independent, that is, when fishing for specific results by analyzing the data repeatedly under slightly differing conditions or assumptions.

Threats that lead to a Type II error:
- Small effect size. Effect size: a standardized estimate of the strength of a relationship or effect. In planning a study, you need to have an idea of the minimum effect you want to detect in order to identify an appropriate sample size. In meta-analysis, effect sizes are used to obtain an overall estimate of outcomes across all relevant studies.
- Sources of noise:
  o Low reliability of measures: measures that have low reliability have more noise, and the greater noise makes it difficult to detect a relationship.
  o Poor reliability of treatment implementation
  o Random irrelevancies in the setting: traffic etc.
  o Random heterogeneity of respondents: a very diverse group of respondents
- Source of weak signal: for example a low-strength intervention

Every analysis is based on assumptions, on the procedures you use to conduct the analysis, and on the match between these two. A further source of incorrect conclusions about relationships in quantitative research is the threat of violated assumptions of statistical tests: a threat to conclusion validity that arises when key assumptions required by a specific statistical analysis are not met in the data. In qualitative research similar problems can occur.
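The fishing and error rate problem described above can be made concrete with a short calculation. The sketch below assumes that the tests are independent and all use the same alpha; the Bonferroni adjustment is a common remedy added here for illustration and is not taken from these notes. Function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    # Chance of no false positive in one test is (1 - alpha);
    # across n independent tests it is (1 - alpha) ** n.
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha that keeps the overall error rate at or below alpha (Bonferroni)."""
    return alpha / n_tests


for n in (1, 5, 20):
    print(n, round(familywise_error_rate(0.05, n), 3), bonferroni_alpha(0.05, n))
```

With 20 analyses at the .05 level, the chance of at least one spurious "significant" result is roughly 64%, which is why treating each analysis as if it were the only one inflates the Type I error rate.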
Improving conclusion validity → Minimizing the threats that increase the likelihood of making Type I and Type II errors helps improve the overall conclusion validity of your research.
To strengthen conclusion validity, improve statistical power: the probability that you will conclude there is a relationship when in fact there is one. It should be at least 0.80.
To improve statistical power and thus conclusion validity:
- Increase the sample size;
- Increase the level of significance (raise the alpha level). This represents a trade-off, however: increasing the level of significance makes a Type I error more likely;
- Increase the effect size: raise the signal or decrease the noise (increase reliability).

11.3 Data preparation
- Logging the data
  o Use a computer program to log the data as it comes in
    - MS Excel, Access
    - SPSS, SAS, Minitab, Datadesk
  o Retain and archive original data records
    - Most researchers keep original data for 5-7 years
    - IRB requires data to be stored securely and anonymously
- Checking the data for accuracy
  o Are the responses legible/readable?
  o Are all important questions answered?
  o Are the responses complete?
  o Is all relevant contextual information included (date, time, place, researcher)?
  Ensuring that the data-collection process does not contribute inaccuracies helps ensure the overall quality of subsequent analysis.
- Developing a database structure: the system you use to store the data so that it can be accessed in subsequent data analyses.
  o Database programs: more complex to learn and operate, but offer greater flexibility in manipulating the data.
  o Statistical programs
  o Codebook: describes each variable in the data and indicates where and how it can be accessed.
    (name, description, format, instrument, date collected, respondent/group, location, notes)
- Entering the data into the computer
  o Double entry: automated method for checking data-entry accuracy in which you enter the data once and then enter them a second time, with the software automatically stopping each time a discrepancy is detected until the data enterer resolves the discrepancy. This procedure assures extremely high rates of data-entry accuracy, although it takes twice as long.
  o Alternatively, enter the data once and set up a procedure for checking the data for accuracy.
- Data transformations: transform the original data into variables that are more usable.
  o Missing values
  o Item reversals: reverse ratings to get them in the same direction as the others
  o Scale and subscale totals
  o Categories
  o Variable transformations: expressing variables in logarithm or square-root form.

11.4 Descriptive statistics → Statistics used to describe the basic features of the data in a study. They help you summarize large amounts of data in a sensible way. However, every time you try to describe a large set of observations with a single indicator, you run the risk of distorting the original data or losing important detail.
A variable has three major characteristics that are typically described:
- The distribution: the manner in which a variable takes different values in your data.
  o Frequency distribution: a summary of the frequency of individual values or ranges of values for a variable.
  o Percentages
- The central tendency: estimate of the center of a distribution of values.
  o Mean: description of the central tendency in which you add all values and div
