Study Design Part 1 PDF
Summary
This document is lecture notes on study design, focusing on different types of epidemiological studies, including descriptive and analytical approaches. It covers experimental and observational studies, case reports, case series, cross-sectional studies, cohort studies, and case-control studies.
Full Transcript
Hello everyone. Welcome to the online version of the class. So today we're gonna talk about study designs. For this class, our learning objectives would be, first, for us to define the basic differences between observational and experimental epidemiology, and then to identify an epidemiological study design by its description. So if I give you a description of an epi study, you would know exactly what type of study it is. Also you will be able to list the main characteristics, advantages and disadvantages of cross-sectional, case control and cohort studies. You'll also be able to describe sample designs that are used in epidemiological research, and you'll be able to calculate and interpret an odds ratio and relative risk. Remember we've gone through odds ratio and relative risk in a separate class where we worked on the calculation. Remember your two-by-two tables, these contingency tables. So what differs is that now we're going to apply those measures of association, which are the odds ratio and relative risk. Actually we will implement them into the actual study designs that they belong to. For example, odds ratio, as you know, we get that information from where? From case control studies. And relative risk, where do we get that information? We get it from cohort studies. So these are just some YouTube resources that you can go through; some supplemental material. They're only five minutes each or even less, so I advise you to look into them. They could give you more details, in a summarized way, regarding study designs and also experimental study designs, which we will go through in a different, separate lecture. So this graph just shows you epidemiological studies in general. It sort of works like a map for you. So you can look at epi studies at the top and, as you know, we have descriptive epidemiology and analytic epidemiology.
So for descriptive you have your case report, your case series, cross-sectional and then some other type of studies, but these are the common ones. Case report, case series and cross-sectional. Cross-sectional is also known as cross sectional surveys, because it's basically a survey. And then you have your analytic part or type which can be either experimental or observational. Experimental is either a clinical trial or randomized clinical trial which is the most common type. And then you have community trials. It's a less common type of experimental study, but I just wanted you to know about it. And then you have observational analytic studies. So like your cohort and case control studies. Those are the two main analytic observational studies. Now cross-sectional studies are not an analytic as you can see on the left side, it's descriptive. But can also sometimes be classified under observational. So it's a descriptive observational. Cross-sectional studies are descriptive and observational. However cohort and case control are analytic observational. Experimental studies are the randomized clinical trials which is one type of analytic studies. This is another kind of flow chart just to know what experimental means and how to classify these studies. So you have to ask yourself a question first. Does the investigator or did the investigator assign an exposure himself or herself. If the answer is yes, if the investigator assigned an exposure. This means that it's an experimental study. What does it mean to assign an exposure? Meaning, I will recruit a big group of people, I will divide them into two groups randomly and then I will assign a certain treatment or some sort of exposure to one group and not have it with the other group. So this is an experiment. I'm experimenting with these people. Hence the name, experimental studies. So if we randomize these people into one of these groups, (exposure or non exposure) then it's a randomized controlled trial. 
But if you don't have that randomization factor, then it's not a randomized controlled trial. It's as simple as that. So when you go back to that question, "Did the investigator assign the exposure?", if the answer is no, then this is basically an observational study. So you just observe; we don't experiment with people. But then in observational studies, if you want to know if it's analytic or descriptive, you want to see if there is a comparison group. So is there a comparison group where we compare people to each other? If the answer is yes, then that's an analytic study. And then to know what type of study it is, you need to look at the direction as well. For example, in cohort studies, you start first with the exposure and then you end up with an outcome. And then for case control studies, you start with an outcome. So you start at the very end with the disease or an outcome, and then you go back and ask about history, or ask questions about the exposures. If there is no comparison group, most likely you're dealing with a cross-sectional study. So that's a descriptive type of study, because a cross-sectional study covers only a specific point of time. As you remember, we mentioned cross-sectional before. It's like taking a snapshot of the population, or a screenshot of the population, at a certain point of time. So you can only assess exposure and outcome at the very same time. You cannot look at how the exposure affects the outcome over time. No, you look at exposure and outcome at one very specific time point. So that's a cross-sectional study. So this graph shows you the hierarchy of scientific evidence. We use this as an illustration to show what kind of studies give the weakest evidence and what kind of studies generate the strongest evidence. So as you can see, at the bottom you have case report, case series, animal trials, and cross-sectional.
And then as you go up: case control, cohort, randomized controlled trials and systematic reviews. So cross-sectional will give you good evidence, but it's not as strong as case control or cohort. And the best would be randomized controlled trials. Actually, randomized controlled trials are known as the gold standard. So, observation versus experimental approaches. As we learned from that graph that was explaining experimental, there is a type of manipulation you're introducing - interventions to your population. And there's also an element of randomization. Right. However, for observational you observe. You observe people and you can do some comparisons, if you're dealing with analytic studies like case control or cohort. You can compare exposures between different groups, like your cases and controls. And it can be descriptive only as in cross-sectional surveys, where you're assessing outcomes and exposure at the same time point. And then you have your experimental studies, so its your experimental design. And the most common type is randomized controlled trials. So we're going to focus the first part of this lecture on cross-sectional studies. So as you can see, I put the picture of the camera there to remind you of cross-sectional. Always remember the camera because it gives you one snapshot or a screenshot of the population at a certain time point. So you can not get incidence data from cross-sectional studies. What kind of data can you get? Exactly, prevalence data. So obviously because of that, cross-sectional type of designs can also be called prevalence studies because you get prevalence data from them. And the exposure and disease measured are obtained at the individual level. It's a single period or point of observation. Exposure and disease histories are collected simultaneously at the same time. So whenever you think of a cross-sectional study, think of any survey. Basically just a questionnaire. 
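The classification questions from the flow chart described earlier (did the investigator assign the exposure, was it randomized, is there a comparison group, which direction does the study run) can be sketched as a small function. This is only an illustration of the lecture's decision path, not a standard tool: the function name, the argument names, the direction strings and the returned labels are all assumptions made for this sketch.

```python
from typing import Optional


def classify_study(assigned_exposure: bool,
                   randomized: bool = False,
                   comparison_group: bool = False,
                   direction: Optional[str] = None) -> str:
    """Walk the flow chart's questions in order and return a study-type label.

    All names and labels here are hypothetical, chosen for illustration.
    """
    # Q1: did the investigator assign the exposure? Yes -> experimental.
    if assigned_exposure:
        # Q2: were people randomly allocated to the groups?
        if randomized:
            return "randomized controlled trial"
        return "non-randomized trial"

    # Investigator only observes -> observational study.
    # Q3: is there a comparison group? No -> descriptive.
    if not comparison_group:
        return "descriptive (most likely cross-sectional)"

    # Analytic observational study: look at the direction.
    if direction == "exposure_to_outcome":
        return "cohort"
    if direction == "outcome_to_exposure":
        return "case-control"
    return "analytic observational (direction not specified)"


# Start with lung cancer patients and controls, then ask about past smoking.
print(classify_study(assigned_exposure=False,
                     comparison_group=True,
                     direction="outcome_to_exposure"))
```

The questions are asked in the same order as on the flow chart, so the first "yes" or "no" that settles the design ends the function early.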
A questionnaire is a number of questions that you provide to a specific group of people. And you would be interested in a certain exposure and a certain disease. That's what the cross-sectional study is. It's a survey. So you go out into the community: a school, college, neighborhood, health centers or whatever. And you can either personally administer the questionnaire or you can do it via the phone. You can do it online, by email or by mail as well. So you collect your information on a number of behaviors. Say, for example, habits, substance use, lifestyle. All of these could be your potential exposures. At the same time you'll be collecting information on their physical or mental health, some type of outcome data, and whether they have certain chronic diseases or a specific disease that you're interested in studying. So you don't have a comparison group per se in cross-sectional studies. Cross-sectional studies do not have groups. But when you collect your questionnaire, when you collect your data, you can divide your data yourself: some people have the exposure, some other people do not have the exposure, and then you can see how these characteristics differ between these groups. But basically cross-sectional studies do not have comparison groups by design. So what are some examples of cross-sectional studies? So you have a survey of smokeless tobacco use among high school students. This is an actual study. You can follow that link to learn more about that study. Also, for example, prevalence surveys or cross-sectional studies of the number of Lasik surgeries performed. So say you're interested to know how many Lasik surgeries, for example, were performed in Arlington. What are the characteristics of people who get Lasik surgeries? Are they more female? More male? What are their ages, their socioeconomic status, etc.? And then another example would be prevalence surveys or cross-sectional studies.
So prevalence survey is another name for cross-sectional studies. So remember that. That might come in a quiz. So prevalence surveys of cigarette smoking among Cambodian Americans in Long Beach California. So you're interested in that specific population only and you send them that survey and you ask about smoking habits. So what are the uses of cross-sectional studies? Well similar to descriptive epidemiology if you remember. Because a cross-sectional study is a type of descriptive study. So you can use it to generate hypotheses. You can use it also for intervention planning. So for example, if you're interested to know the rates of diabetes in your community. And after, you do the study, the cross-sectional study. You're surprised that you have really high rates of diabetes among your community compared to other communities, and other cities in the U.S.. So now you go ahead and allocate more money, more resources, more interventions, more studies, among this specific location and this specific population. So cross-sectional studies are very helpful in intervention planning. Also you can use it to estimate the magnitude and the distribution of a specific health problem. The same example of diabetes. Right. And if you take multiple snapshots, multiple pictures - screenshots. So this means you do multiple cross-sectional studies among the same community, year after year after year. You can use that data to examine trends in diseases or risk factors and how that variant changed over time. Nonetheless cross-sectional studies have some limitations. So it has limited uses for inferring causes of disease. Because if you remember, cross-sectional studies, they're really somewhere in the bottom here, in terms of evidence. So does not really help you to establish causation. And also because you cannot determine temporality. Right. So if you remember, one of the important criteria we use in that causation class. Remember that in-class activity that you did. 
One main criterion is temporality, meaning factor A, the exposure, comes before factor B, which is the disease. So exposure first and then the disease. So you smoke cigarettes and then you get lung cancer. It's not vice versa. It's not the opposite. So this kind of time relationship, or temporality, you cannot assess in cross-sectional studies. And remember why? Because it's like taking a screenshot of one specific time point. You're not following people over time. Also cross-sectional studies do not provide incidence data. It's the same thing. You cannot really identify new cases, because you need to follow people over time to find new cases of disease, which is incidence. So you cannot get incidence data. You can only have information about pre-existing conditions or characteristics, which is prevalence data. So you get prevalence, not incidence data. All right. Also you cannot use it to study a low prevalence disease because, for example, if you pick a specific population - let's say we're doing a cross-sectional study at the University of Texas at Arlington, UTA here, and you're looking at the prevalence of a disease that is rare. Of course you're not going to find it, because your population is very limited. And by design, because you're not following people over time, it's also hard for you to find these diseases with low prevalence. So cross-sectional studies reviewed. This is just a small review, a YouTube link. It's supplemental material if you're interested to know more. Just start at "one minute", where it hits the points regarding cross-sectional. So there you go. We have a pop quiz, similar to the class. So here we have four choices. Which of the following health outcomes could be studied using a cross-sectional study design? You can choose all that apply. It can be one, it can be two, it can be all of them. So the first statement, A, is the prevalence of diabetes among adults in the United States in 2014. B.
This prevalence of diabetes among all patients seen at a particular health clinic on one day in 2014. C. is number of new cases, new cases of diabetes diagnosed among those who are at risk in the United States in 2014. And finally, D. The number of people in a population with diabetes who are obese and the number of people in the population with diabetes who are not obese in the US in 2014. So I'm going to give you a few seconds to think about it. Think which one of those apply or you can use a cross-sectional design to answer or to find this kind of data. Think about it for a few seconds while I make sure our presentation is still being recorded. Yes we're still being recorded. Great. OK. OK. So what's the answer? Well if you said "A", "B" and "D", then you're correct. But why is "C" not an answer? Why? Well let's go back and see. So because "C" is the number of new cases, remember new cases, so new cases meaning what? Incidence data. And incidence data - you don't get from cross-sectional studies. You get it from cohort studies. What you get from cross-sectional is prevalence data. So that's why "C" was excluded there. All right. Now we're going to shift gears and we're going to talk about case control studies. We have talked about case control several times, during our past or previous lectures, but we're going to talk more about it now - in more details. So in case control studies, you will have two groups. One group has a disease of interest and you call them "cases" obviously. And then you have a comparable group that is free from the disease, or "controls". The case control study identifies possible causes of the disease by finding out how the two groups differ in respect to the exposure or whatever factor you're interested in. We call it, usually you call it the "exposure". So this graph should be helpful for you to understand what type of study design this is. So you have two groups, you have two groups to compare. So that's analytic by nature. 
Analytic study. OK. And then you have your first group, which are the "cases", those who have the disease. In this example, those who are cancer patients. And then you have your comparison group, which are your "controls". So people who are not sick. I mean they don't have the same disease. So they are "controls". They don't have cancer. OK. So then you interview them or you take information from them. You take history and ask about multiple exposures. How were they behaving before they got the cancer? You're asking about this. For example, if they have cancer, let's say lung cancer, because we always use that lung cancer example, we ask them about smoking, alcohol, their diet, exposure to radiation, whatever cancer risk factors. And then for example let's say we have five people. Three of them are smokers. They reported they were heavily smoking. Then you do the same with your "controls". You ask them the same questions and none of them, for example, were smokers. So then you compare their answers and you draw your conclusion. And when you draw your conclusion from case control studies, what kind of information do you have? Odds ratios. That's what you get. You get odds ratios. So you get your odds ratio and then you can approximately estimate the risk. For example, you could say - just an example - if your odds ratio is 5, you can say: OK, those who are smokers were five times as likely to have lung cancer as those who are not smokers. So you can tell from the name of this study design, which is case control, that you have two groups, cases and controls. So how do you go about selecting your cases? You need to think about this conceptually and also operationally. Conceptually meaning you have to define your cases. What kind of cases do you want to include in your study? So if you're thinking about lung cancer, you need to think, first of all, what is the diagnosis? What type of lung cancer? What stage?
So you have to be very specific, conceptually, in defining your cases. And also operationally you need to think of where you're going to get those cases. Are you going to get them from the hospitals? Are you gonna go straight to hospitals and look for people who have lung cancer? Are you gonna go to clinics for people who have lung cancer and are following up? Are you going to use some health registries? Because if you remember from the data quality class, we talked about where you can get data and the quality of this data. And one of the data resources out there is a disease registry. For example, a cancer registry or tumor registry. Are you gonna go to this registry to get the data? And are you going to contact these people who have cancer to recruit them into your study? So you've got to think about where you're going to get your cases. Also very important, and I think even more important than selecting the cases, is selecting the controls. Selecting the cases might be a little easier if you define your case very clearly. But selecting the controls is the most important, because you need to make sure that your controls match your cases. They match - I mean you try. They obviously don't completely match. They're not twins. But you have to do your best to make sure that they are as close as possible, as similar as possible to your cases in everything except one thing. Except the disease. Obviously they don't have the disease, but they are similar in age, same gender, they live in the same environment, et cetera. So if the controls are equal to the cases in all respects other than the disease and the hypothesized risk factor, we are in a stronger position to ascribe differences in disease status to the exposure of interest.
So this basically means if you match your controls very well with the cases, and then a certain exposure seems to be more common among your cases compared to your controls, then you can say with confidence that that exposure seems to be a risk factor for getting the disease. So controls must represent the base population. It's the same point I was talking about. They have to match on everything except for the disease. And where do you get them from? Usually an easy way of doing it is to get them from the same geographical area, the same neighborhood. Another way of doing it, if you're getting your cases from a hospital, for example if you're getting those lung cancer patients: just go to a different department, not the oncology department where you get your lung cancer patients. Go to the outpatient clinic and recruit people for controls: those who don't have lung cancer but maybe have some other minor things. Maybe they're there for a cold or just for a checkup, or maybe they're following up for some minor procedures. Because they come to the same hospital, they're probably from the same neighborhood; they share some characteristics with your cases. Sometimes we also contact relatives or friends of cases, because most likely they're from a similar socioeconomic status, or they have some similar habits. So how do you get your information about their previous exposure history? Basically you collect that information using questionnaires. Either face to face, where you ask them questions and they answer them, or maybe over the telephone, or the participants in the study can self-administer the questionnaire. You can obtain info on many exposures with relative ease and flexibility in case control studies, but this questionnaire must be carefully crafted to make sure you include all the information that you need for your study.
So the first way of getting information about previous or past exposures is by asking questions, doing questionnaires where you design your own questions. You know what you need to know, so you include that in your questions. A second way, which is less ideal, is looking at preexisting records. So if you are using those disease registries, or maybe medical records, to look up previous exposures, you are limited only to the data that is recorded, obviously. So it's not the best; it might be incomplete, but hey, it's another way of getting information about previous exposures. A third type, and this is really uncommon: you can use biomarkers in blood or saliva or any biological fluid, because these biomarkers can be a proxy to tell you if someone was exposed to a certain factor. They might have a biomarker for that factor that stays in their blood and their biological system for a long time. But this is very uncommon because most of these biomarkers are short-lived. So if you smoke, you would have cotinine in your blood for a few days. Maybe you'd have cotinine in your hair for two or three months. Cotinine is one of the metabolites of nicotine, so you can have that in your system for some time. But if someone quit five or six years ago, they would probably have no biomarkers in their blood. So that's another way of getting that information, but it's really uncommon. So what are the characteristics of case control studies? Well, it's a single point of observation, but this is not like cross-sectional studies, because cross-sectional is a single time point, like a snapshot. This is different. It's also a single point of observation, which means you're starting with the outcome. You're starting at the back end of the story.
After people have developed the disease, you come to someone who already has lung cancer, and then you go back in time by asking them about the history of their exposures, and then you do the same with the controls. But it's one point of observation, because you meet with them at that point and you get that information from their memory or from the medical records. All right. Let's get that away. OK. The unit of observation and unit of analysis are the individual, and exposure is determined retrospectively. So you collect that information going back in time, meaning you're asking your cases and controls about their previous history. So you're gonna go back to the exposure, because usually exposure comes first and then outcome. But because it's case control, we start with the outcome and we go back and try to investigate that exposure. Case-control does not directly provide incidence data. It gives you a proxy, because it's a single point of observation: you're not following people starting with exposure and ending at outcome, as you do in cohort studies. That's where you get your incidence data, so you can see new cases of disease as they develop. So newly diagnosed outcomes. You can find out about that by doing cohort studies, which provide you with incidence data, but you cannot do that with case control. You get odds ratios. An odds ratio is an approximation of risk. It compares the odds between your cases and your controls. Remember the two-by-two, and how we can create the odds ratio. You have gone through this; we actually had one specific class where we did this. We also did an in-class activity, if you remember. These are two-by-two tables. You're gonna keep seeing these over and over again in quizzes, in the final exam, in homework, and in epidemiology and public health in general. So this is very important. You have your formula there: "A D" divided by "B C". So on the left you have exposure status, yes and no.
And those are the rows. And then for the columns you have the disease status, which is "Yes" for cases and "No" for controls. You do the cross product, and that would be your formula for the odds ratio, just there at the bottom. So, case-control studies. We're going to do a sample calculation. You have done this before, but just to refresh your memory. This is a real study that was done in Mexico City. They studied the association between chili pepper consumption and gastric cancer, the risk of getting gastric cancer, which is just cancer of the stomach. It was a population-based case control study. Let's look at the numbers. So they have their two-by-two there. Those who consumed chili pepper and had gastric cancer were 204, and those who consumed chili pepper but did not have gastric cancer were 552. And then you have those who did not consume any chili pepper and still had gastric cancer, only nine. And then those who did not consume any chili peppers and did not develop gastric cancer. So we have all our numbers, we have the cross products, and then we calculate our odds ratio, which is 5.95. What does that mean? Well, before we go through the interpretation of that specific odds ratio, I've got to remind you how to interpret an odds ratio. This also applies to relative risk, but now we're talking about the odds ratio. So if the odds ratio equals one, this means the odds of disease are the same for those who are exposed and those who are not exposed. This means that there's no relation. If your odds ratio is one, the odds of getting the disease are the same whether or not you had the exposure. The exposure did not matter. However, if your odds ratio is greater than one, this means that exposure increases the odds of being a case, of getting the disease.
And if your odds ratio is less than one, that means exposure reduces the odds of getting the disease. So if your odds ratio is less than one, your exposure is actually protective against the disease. So the interpretation for that study is that those who consumed chili peppers had 5.95 times the odds of being cases, of getting the disease, which is gastric cancer, compared with those who did not consume chili peppers. So the odds ratio provides a good approximation of risk, but it's not an absolute risk. Remember, this is not an absolute risk. It's an approximation of risk. This is because you're not getting incidence data from case control, but it gives you an approximation of risk; it gives you the odds of getting a specific disease. That's what the odds ratio tells you. It gives you good information especially if your controls are representative of the target population, if your cases are representative of all cases, and if the frequency of the disease is small in your overall population. And this is when we use case control studies, right? We use them for rare diseases, because, for example, for that lung cancer example, you cannot follow people who are smoking and people who are not smoking for 10 or 15 years and wait for lung cancer to develop. That would be expensive and very hard, difficult to do. Because lung cancer is a rare disease, for that kind of disease, which is rare, you do case control studies. So just some more examples. This is a case control study that was conducted in California to investigate the relationship between colon cancer and diet. So they looked at two different exposures: high fat diet and coffee consumption. All cases were confirmed colon cancer cases in California in 2011. And the controls were a sample of California residents without colon cancer. So those are good controls.
They got them from the same area, from the same geographical region. So they had an odds ratio of four for the study of high fat diet. What does that mean? It means that individuals who consumed a high fat diet had four times the odds of colon cancer compared with those individuals who did not consume a high fat diet. And then for the coffee study they had an odds ratio of 0.6. What does that mean? It means that the odds of getting colon cancer among coffee drinkers were only 0.6 times the odds among individuals who did not consume coffee. Thus coffee consumption seems to be protective against colon cancer. If you remember, in that class on odds ratio and relative risk where we did the calculation, I taught you how to interpret this. When you have a protective odds ratio or relative risk, meaning your odds ratio is less than one, there's another way to express the relationship. So how can we express this in a different way? We can say that those who consume coffee are 40 percent less likely to get colon cancer than those who did not consume coffee. So why do we say 40 percent? You take your odds ratio, 0.6, you subtract that from one, you get 0.4, and you multiply that by one hundred to get a percentage. So that's 40 percent. So you can say that the odds of colon cancer among coffee drinkers are 0.6 times the odds among individuals who do not consume coffee. Or you can say that those who consumed coffee were 40 percent less likely to get colon cancer compared with those who did not consume coffee. And for more information on how to interpret odds ratios and relative risk, refer to that specific lecture, where we went into this in further detail.
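The odds ratio arithmetic from the chili pepper and coffee examples can be reproduced in a few lines. This is a minimal sketch with hypothetical function names. Two values are assumptions rather than facts from the recording: the first cell is read as 204, and the fourth cell (no chili pepper, no gastric cancer) is never stated, so 145 is back-calculated here as the whole number that reproduces the reported odds ratio of 5.95.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-product odds ratio (a*d) / (b*c) for a two-by-two table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


def interpret(or_value: float) -> str:
    """Plain-language reading of an odds ratio, as described in the lecture."""
    if or_value > 1:
        return "exposure increases the odds of disease"
    if or_value < 1:
        return "exposure appears protective"
    return "no association"


def percent_lower_odds(or_value: float) -> float:
    """Express a protective odds ratio as a percentage: (1 - OR) * 100."""
    return (1 - or_value) * 100


# Chili pepper example: a = 204 (assumed reading), b = 552, c = 9.
# d = 145 is an ASSUMPTION, back-calculated from the reported OR of 5.95.
chili_or = odds_ratio(204, 552, 9, 145)
print(round(chili_or, 2), "-", interpret(chili_or))

# Coffee example: OR = 0.6 corresponds to 40 percent lower odds.
print(round(percent_lower_odds(0.6)), "percent -", interpret(0.6))
```

Strictly speaking the percentage describes the odds, not the risk; the two are close only when the disease is rare, which is the situation the lecture recommends case control studies for.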
So what are the advantages of case control studies? They tend to use smaller sample sizes than surveys or prospective studies. They're quicker and cheaper than cohort studies, as we discussed. So rather than waiting for ten or fifteen years to find your outcome, you start with your outcome and you go back in time and assess exposure. So it saves you time and money. It's useful for studies of rare diseases, rare outcomes like lung cancer and other cancers. It's also efficient for studies of diseases with a long latent period, so diseases that take time to develop. And another important point is that case control studies allow us to assess multiple exposures. Remember, that study looked at high fat diet and coffee in the same exact study. So you can look at multiple exposures with case control, which is an advantage. But what about the disadvantages of case control? Well, they are subject to selection bias, because remember you have to select your cases and select your controls. So if you don't do a good job you will end up with selection bias, meaning your cases are not defined very well. You're picking people with different diagnoses. Some of them have lung cancer, some of them have other subtypes or other cancers, so you have some selection bias. Or maybe your controls do not match your cases. Another point is the increased possibility of bias because of the retrospective nature of the data. Remember, we said with case control you start with the outcome and go back in time, meaning you're asking your cases and controls about their history of exposure. So sometimes they will have recall bias; they will not remember. And that's a problem, that's a disadvantage of case control, when they don't remember. Also it's difficult to assess the temporal relationship between exposure and disease. So this is similar to cross-sectional studies, because it's a single point of observation.
You start with the outcome, the disease, and you ask them about the history of their exposure. But you're not really, like in cohort studies, following people starting with exposure and waiting for the disease or the outcome to develop, which is that temporal relationship. You cannot have that in case control. You can only have that in cohort studies or randomized controlled trials, but not in case control and not in cross-sectional studies either. Also, case control studies are not suitable for assessing rare exposures; they're very suitable for assessing rare outcomes, but not rare exposures. Because remember, if you're asking someone about something in the past, and it's a rare exposure in the past, they probably will not remember. They would remember something that is a common exposure, like smoking or drinking or eating meat or eating a plant-based diet. But if you ask them about very specific details that happened years ago, they will not remember such exposures. Also, case control studies generally do not allow calculation of incidence, which is absolute risk. You only get an approximation of risk. You can only calculate the odds ratio, which is an approximation of risk. So if you have any questions about this lecture, feel free to post on the discussion board that will be created specifically for this week of lectures. You can post your questions there and I or my T.A. will respond to you. Also, beyond that, if you have more questions, feel free to email me or email my.