Formulating and Testing Hypotheses in Epidemiological Research: A Comprehensive Study
by Reza Hashemi
1. Introduction to Epidemiological Research

The goal of epidemiological research is to generate information used in the maintenance, improvement, and restoration of the health of populations: diverse groups of individuals oriented to and functioning in their respective home, work, and transfer environments. Epidemiological research is embedded within an action framework. This framework consists of the processes through which historically constituted societies evaluate program and policy outcomes, develop and evaluate interventions and strategies to modify the factors thought capable of optimizing, maintaining, or restoring individual and collective health, and generate the knowledge base that help-seekers and caregivers need to maintain, improve, and rebuild population health.

An important characteristic of this research is that the observed health-related state is, in general, the outcome of an individual and societal accumulation of exposures to stressing events in multiple domains: biophysical, socio-economic, and behavioral. Each domain generates its own list of factors, which operate at various levels of an interconnected hierarchy embracing the effects of the physical, social, political, and cultural environments. Lastly, these domains comprise points of interaction among the members of the populations being studied. These interactions are directed to specific functions of interest to the protagonists: society's caregivers and their associated theory, culture, power, and value criteria.

1.1. Definition and Scope

Epidemiology is a scientific discipline that studies the determinants, occurrence, distribution, and control of health and disease in a defined human population.
Epidemiologists draw on a methodological framework developed with other scientific disciplines such as mathematics, biostatistics, and informatics. On that basis, they provide policymakers and social authorities with convincing data on the distribution of morbidity and mortality, priority preventive measures, health care organization, and the cost-effectiveness of interventions against diseases. Epidemiologic consultation in planning, conducting, evaluating, and interpreting experimental and observational studies brings several benefits. It keeps research results relevant to important public health problems, limits methodological biases, supports the extrapolation of findings to defined human populations, and upholds the ethical principle of minimizing risk to subjects. Epidemiological research aims to formulate and test hypotheses.

1.2. Importance and Applications

I present a structured overview and evaluation of the main research questions, designs, and statistical methods in epidemiological research. Over the past five decades, epidemiological research has developed into a multidisciplinary field with many applications. The main activities in epidemiological research that appeared in the literature up to 1991 are classified according to applications and scientific disciplines. Applications include the assessment of disease prevention measures, the description of health problems, and the identification of causal factors, and they relate to current public health problems and challenges. The scientific disciplines whose theories and perspectives have been studied in the domain of epidemiology are also presented: management science and operations research, accounting, economics, psychology, sociology, political science, mathematics and statistics, computer and information science, and the biological and general sciences.

Epidemiology is one of the foundational disciplines in public health.
For many years, epidemiologists have sought answers to important but increasingly challenging public health questions. Examples include whether dietary cholesterol causes heart disease, whether individuals should wear bicycle helmets, and whether exposure to polluted air results in cancer and other diseases. The contributions of epidemiology matter because the discipline addresses questions about outcomes at the individual and group levels and then at the level of society. It also pays as much attention to the social and even political answers to those questions as to the natural consequences of diseases or disabilities.

2. The Hypothesis Formulation Process

Epidemiological research is a scientific process that requires a formal framework, known as a hypothesis, within which it is developed and tested. A hypothesis is a scientific position that clearly outlines the logical background of a proposed relationship and the predictions or consequences that follow from it. The principal goal of hypothesis testing is to use the data of a study to gain insight into the probable effect of an exposure or the probable performance of an intervention. To do so, the researcher states a null hypothesis, usually that there is no association or no effect, and asks whether the empirical evidence is strong enough to reject it. Failing to reject the null hypothesis does not prove that it is true. This judgment is valid only if the study design and analysis are methodologically sound and statistically appropriate. To achieve this objective, a careful distinction must be made between natural, built-in hypotheses and the hypotheses formulated for a given study. Because these natural hypotheses arise automatically, students can find it difficult to formulate formal, written hypotheses before beginning their research.
Because such hypotheses arise naturally, the hypothesis takes shape during a critical assessment of the area under study. It presupposes a level of preparation and understanding sufficient to identify the hypothesis in the areas where we are going to conduct our research. Strict proof of this empirical basis is therefore impossible: external estimates can be biased, information about the presumed cause can lack principal elements, and in-depth case estimates can be variable. Making the hypothesis explicit before starting research has further benefits. Other researchers with different interests and backgrounds, or co-investigators who share similar interests, can take part in it and meaningfully influence its evolution. An explicit hypothesis also justifies the costs of the investigation by increasing the chances of finding answers.

2.1. Identifying Research Questions

How will you know what you want to do about your concerns, or how you might explore your ideas, if you don't know what they are? Powerful research starts with a clear purpose, a statement of the question, and a description of why the research needs to be done. What is your concern? What are you curious about? What makes you wonder, ponder, consider, or begin to question? What might you do to explore or begin to answer your question or concern? To focus and refine your ideas about a specific topic, it can help to list the things you could do with it: explore, accept, reject, defend, test, explain, deny, celebrate, compare, integrate, develop, write about, or imagine. You will always have a choice between a yes/no question of fact and an open argument. That choice is embedded in much of what you write, think, and speak about, and in much of what you do and demonstrate in your research. After all, what did you want to write about in your proposal? What did you think about before you wrote? What other interesting topics or questions arose as you wrote?
What you want to write about, and how you intend to write about it, affects whom you take your audience to be and what language you will use. In other words, there is no single correct answer as you choose your topic. How you choose to write, and what you write about, consider, and discuss, reflects what you think, what you know, how you know it, and how you understand your audience and context. In turn, it shapes the form you give to your topic, ideas, data, or observations. What other questions occur to you as you narrow the question you will explore? What did you, or might you, learn again or argue about as you conclude? Was there something you were wrong or right about? What did you hope to address as you explored, so that you could ultimately test the questions associated with your research?

2.2. Developing a Testable Hypothesis

All scientific research endeavors begin at the hypothesis stage. A hypothesis is any conjecture advanced for the sake of argument or to display a set of relationships. The development of a clear, specific, testable hypothesis is the first and most critical step in the scientific process. The term "testable hypothesis" distinguishes this concept from the more general, everyday meaning of a hypothesis: a hypothesis must be testable to remain part of scientific work. An implied hypothesis, by contrast, is one that the researcher implicitly lives with or assumes to be true. Several steps are required to establish that the method is adequate to examine the research question(s). A research hypothesis, as commonly formulated and stated, is a specific, testable proposition about a phenomenon or group of phenomena. Such phenomena can be either states of nature or the occurrence of an event. A research hypothesis therefore has two important features. First, it must be a simple statement.
A research hypothesis must be concise, easy to comprehend, and related to the empirical world. Second, a research hypothesis must be testable: the results of testing it should be observable, and the test must be repeatable. For the novice scholar, it can be especially useful to write an entire paper using the implied hypotheses as a guide and then test at the end whether the hypotheses hold. The established hypotheses serve as an outline and guide for more detailed research, which may become the subject of the final project. At a minimum, researchers should clearly specify the population in which the hypothesis is tested and the direction of the hypothesized relationship.

2.3. Types of Hypotheses in Epidemiological Research

Hypotheses in epidemiological research may be divided into explicative and index hypotheses. An explicative hypothesis generally assesses the association between an ill-defined risk factor and a health outcome. An index hypothesis typically assesses the association between two well-defined variables and a health outcome. A number of other types of hypotheses also exist in epidemiological research.

An important question, specifically for explicative research, is which null hypothesis we wish to test. Negative control hypotheses are an increasingly common feature of the epidemiological literature. Many other types of hypotheses exist, often specific to certain forms of research, and the purpose of this paper is to describe them broadly, with relevant examples. The principal type of hypothesis in epidemiological research tests for an association between an exposure or explanatory variable and a disease outcome. Although it is not the only kind, the explicative hypothesis dominates the research literature. Often some kind of quasi-variable is artificially created, using statistical techniques or case definitions, simply to generate a hypothesized link.
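The principal explicative hypothesis described above concerns an association between an exposure and a disease outcome. Its null and alternative forms can be written in terms of the relative risk. This is a generic sketch that assumes a cohort design. I1 and I0 are symbols introduced here for illustration and denote the incidence of disease among exposed and unexposed subjects. The one-sided alternative expresses a hypothesized harmful direction. The two-sided form applies when no direction is specified.

```latex
\mathrm{RR} = \frac{I_1}{I_0}, \qquad
H_0:\ \mathrm{RR} = 1
\quad\text{versus}\quad
H_1:\ \mathrm{RR} > 1
\qquad \bigl(\text{two-sided: } H_1:\ \mathrm{RR} \neq 1\bigr)
```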
Many problems arise in testing explicative hypotheses, so it helps to understand the principles on which such hypotheses can be formulated. The true causes of diseases, however, may be too expensive, time-consuming, or unethical to study directly. In such cases an index is used instead. Investigations of respiratory disease, for example, have used smoking as an index for the real hypothetical causal factors (injury, irritant, or susceptibility factors) and tested the association between smoking and the disease(s).

3. Theoretical Frameworks in Epidemiological Research

Epidemiological research investigates the relationships between health-related outcomes and their associated factors. The behavior of the observed population is considered to be related to a variety of social structures. A solid grounding in the relevant theories of social structure is crucial at several points: formulating research questions and hypotheses, selecting study participants, and collecting and interpreting findings. Research questions and hypotheses should rest on such theories. These theories should explain the characteristics of the determinants of health-related outcomes and the different levels at which those determinants operate. The statistical models and the operational definitions of the determinants should rest on assumptions that can be handled and interpreted in light of that theoretical background.

The three broad categories of theoretical perspective currently employed in the field are ontological, phenomenological, and epistemological. Ontological theories characterize the nature of the relationships between population social structures and the distribution of health outcomes in a particular environment. For example, they define what determines population health and how social determinants cause these health effects.
The dominant ontological perspectives emphasize the contributions of the macro-social structure, using the concept of societal stratification. Phenomenological theories describe individual and group experiences, concepts, or phenomena potentially associated with the distribution of health outcomes, independent of a specific population social structure or set of relationships. Phenomena such as race, class, and gender influence the distribution of health outcomes and how those outcomes are recognized, but these associations are not assumed. Unlike ontological theories, phenomenological perspectives commonly characterize individual variation through lived experiences, beliefs, values, knowledge, and perceptions. Instead of addressing social structures, epistemological theories classify particular ethical, generational, or personal and interpersonal experiences within specific phenomena. These subjective categorizations distinguish the different ways in which societal standing may introduce or mediate relationships between components of the social structure.

Different types of theoretical perspective can be employed in a single study. For instance, research may assess how particular social structures introduce stigmas that violate human rights or contribute to harmful experiences. Epidemiological research using large quantitative surveys has suggested that disability experiences and serious mental illness might influence life experiences, psychological distress, physical health difficulties, substance abuse, and suicidal ideation. It is also worth discussing mixed models for psychological concepts, combined with other quantitative or qualitative measurement scales. Such models can assess external evidence on criminal victimization and harmful physical health outcomes and so support the appropriateness of the models.

3.1. Overview of Theoretical Foundations

The stages of investigation and the sequence of inferences are as follows:

1. theory;
2. an analysis plan;
3. collection of data according to this plan;
4. testing of the theory by calibrating the investigator's model with the data;
5. drawing of inferences.

Each stage presupposes the one before it, and together they define the essence and logic of empirical investigation. This is the scientific recipe for empirical investigation. It seems strange that other recipes exist which offer practical guidance but, by tradition, do not prescribe this series of detailed steps. A number of authors stress the crucial role of theory. Theory tells the investigator what to look for. More importantly, it guides the selection of the statistical tools likely to help find it. Without theoretical grounds, research may produce a variety of valid, statistically significant results that support, rather than refute, diametrically opposed hypotheses. Yet competing hypotheses are essential for statistical testing. It therefore becomes indispensable to devise models that explicitly represent the alternative hypotheses of interest and to shape the research strategy accordingly.

3.2. Application of Theoretical Models in Hypothesis Testing

In this study, various models were used to analyze how strongly different housing-related factors were associated with certain mental health outcomes. The model used to test specific hypotheses at a given stage depends on the objective of the research and the related theoretical background. GMM is applied when the main focus is on differences in the structural relationships between the variables of interest across particular groups of cases. Maximum likelihood (ML) estimation is used to derive consistent estimates of the model coefficients and to obtain proper standard errors and test statistics.
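Under ML estimation, multiple-group structural models are commonly compared with a chi-square difference (likelihood-ratio) test between nested models. The sketch below illustrates that test using only the standard library. It is not the procedure or software used in the study. The helper names and the fit statistics in the usage example are hypothetical, and the constrained model (for example, paths set equal across groups) is assumed to be nested within the freely estimated one.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Computes the regularized upper incomplete gamma function Q(df/2, x/2)
    with the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference test for two nested models fitted by ML."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit statistics: equal-path model vs. freely estimated model.
delta, ddf, p = chi2_difference_test(54.2, 20, 45.1, 17)
print(f"delta chi2 = {delta:.2f}, delta df = {ddf}, p = {p:.3f}")
```

A small p-value suggests that the equality constraints worsen fit, meaning the structural relationships differ across groups. Like the overall chi-square test, the difference test assumes multivariate normality and is sensitive to sample size.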
The covariance-based methods used for structural model testing are consistent and efficient under the assumption of multivariate normality. When the model is correctly specified, the covariance-based test statistic asymptotically follows a chi-square distribution. Under ML estimation, this chi-square test assesses the overall goodness of fit of the model, and the chi-square difference test compares nested models. The chi-square statistics indicated that the estimated path model adequately approximates the covariance matrix, and the chi-square test was used to test the structure of the model. Results have also shown, however, that the covariance-based chi-square statistic can depart from its degrees of freedom. The test can then become unreliable because of problems such as non-normality, sample size, or model misspecification. The covariance-based method used to test the structural model also provided two further sets of results: the unstandardized and standardized path coefficient estimates in this study's multiple-group analysis, and the variances of the unobserved latent variables.

4. Research Design and Methodology

4.1. The Stages in Scientific Research

Broadly speaking, there are five stages in scientific research:

(a) Observation, description, and recording.
(b) Formulation of hypotheses. This stage is at the core of any research. It involves logical speculation and the construction of an idea that can be tested and verified, though it need not eventually be.
(c) Design of methods, instruments of data collection, experimentation, verification, reconsideration, study, and further statement of hypotheses. This involves seeking evidence and recording measurements.
(d) Analysis, explanation, prediction, and verification or falsification of the hypothesis. This is achieved through theory testing and quantification.
(e) Drawing of conclusions about consistency, extent of generality, and predictions, so that future events may help to verify or falsify the same or other hypotheses.
Insights into more complex aspects come from further research and testing.

4.2. Conceptualization

This part of the research process involves the operational definition of concepts. A concept is a mental image, or a mental organization of ideas around similar activities or phenomena, that helps us think about them and gives us a terminology to talk about them. Conceptualization builds the intellectual scaffolding that supports logical reasoning and helps us articulate ideas more effectively. It also serves as the basis for devising indicators that measure, qualitatively or quantitatively, the characteristics of the conceptualized idea. The initial specification of relationships among variables that conceptualization uncovers may be simple or even unique. The relationships so established often pertain to specified conditions depicted in the conceptual framework.

4.1. Types of Epidemiological Studies

Epidemiology is the study of the distribution and determinants of health and disease in human populations. Ideally, epidemiological studies would be conducted as randomized controlled trials. In such a trial, the investigator randomizes subjects to an exposure, assignments are kept hidden from subjects, and the outcomes of all subjects are followed over time. However, randomized controlled trials are often not feasible for nutritional, environmental, and lifestyle risk factors for chronic diseases. Researchers have therefore developed several types of observational studies, each of which addresses a different type of research question. Observational studies often use secondary data that measure exposure, disease outcome, and/or potential confounding variables. The subjects in these data choose their own exposure levels or are assigned to them by some non-random process.
Unlike in randomized controlled trials, in observational studies the subjects choose their exposure level or it is determined non-randomly. Observational studies do not require subjects to remain in their assigned exposure throughout the study, and they can be relatively inexpensive compared with randomized controlled trials. Subjects can also often be observed more frequently and over a longer period than in randomized controlled trials.

4.2. Selection of Study Population

Many epidemiological studies use readily available data because prospective cohort or nested case-control studies may be too costly. The cases and controls are then identified, and information on exposure status is collected. The available data could be disease surveillance data, which may be linked to other databases such as population censuses and cancer and other disease reporting systems. Alternatively, researchers may use data from a survey conducted specifically for the risk factor under investigation, for example among healthy volunteers. Another option is to select a study population for which exposure information is already available, for example data from photography centers, given that photo taking is linked to melanoma. This approach has several advantages. The study population may include people in diverse social circumstances who would normally not agree to take part in an epidemiological study. It may also provide information on age, gender, marital status, reproductive history, and so on that disease registries and surveillance data sources lack. After enrollment, surveillance or medical records can be used to collect missing risk factor data. One advantage of using readily available population-based information is that the results generalize better to the general population. However, because subjects enter and leave the study population over time, the typical biases of case-control and cohort studies are present.
These biases need to be analyzed, as presented in subsequent chapters.

4.3. Data Collection Methods

Qualitative research collects data through open-ended responses in both questionnaires and interviews. For example, it may ask whether any other life events contributed to an individual feeling stressed, or ask patients to describe their personal experience of spinal surgery. An open-ended question or a semistructured interview schedule is often the method of choice in exploratory studies. A semistructured interview or a highly open-ended questionnaire maximizes recall and lets us tailor the questions to cover the relevant domains. The drawback is greater variation in the responses, which makes analysis harder. In large-scale studies, it is important to train interviewers consistently and to cross-check the data. We should also be wary of questionnaires that were developed in another population or setting, have not been piloted with the population of interest, or simply do not seem to fit the population.

Not all epidemiological data collection has to be prospective; many clinical studies have begun with secondary data analysis. Secondary data can save you the investment of developing and piloting scales. Many institutions and specialties also develop specialized data recording forms, which can save time and effort as well. Secondary data can also be used in cross-sectional studies. In such studies, the investigator may choose recognized scales or assessment procedures that have not been validated in their population.

5. Statistical Analysis Techniques

In the allied disciplines of public health, the most commonly used statistical techniques are usually applied without reference to a specific research question or objective.
This section begins with the simpler situation in which a single set of data representing some aspect of the health of individuals or groups is available. The disease in question could be a specific condition, a group of conditions, or a group of people. The variable (or exposure) in question could be an attribute, an environmental factor, or a specific situation. In this situation, the association between a specific exposure and a specific disease is described by the relative risk. The relative risk is the ratio of the incidence of the disease among exposed individuals or groups to the incidence among non-exposed individuals or groups. Prevalence is another common measure of frequency used in many studies of diseases. It is the proportion of individuals in a population who have the disease at a given time. The prevalence ratio is the ratio of the prevalence of the disease among exposed individuals or groups to the prevalence among non-exposed individuals or groups. It is useful in cross-sectional studies, where disease occurrence is measured at a single point in time. The epidemiologist need not be concerned with the actual method used to obtain such a frequency measure, as long as the same measure is applied to both the exposed and the non-exposed groups.

5.1. Descriptive Statistics

The aim of descriptive statistics is to summarize the variables used in the study. The main descriptive measures are measures of central tendency (mean, median, and mode) and measures of dispersion (variance, standard deviation, range, interquartile range, and percentile values). Together with graphical presentations that describe the data adequately, these statistics are necessary preliminary tools of data presentation. Describing more than one variable requires statistics that capture the relationship between variables.
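Python's standard `statistics` module can compute the descriptive measures listed above. The sketch below is minimal; the `describe` helper and the blood-pressure values are hypothetical and chosen only to show the calls. `statistics.quantiles` uses the "exclusive" method by default. Other software may define quartiles differently, so the interquartile range can differ slightly between packages.

```python
import statistics


def describe(values):
    """Return common descriptive measures for a list of numeric values."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "sd": statistics.stdev(values),           # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Hypothetical systolic blood pressure readings (mmHg), not study data.
sbp = [118, 122, 125, 130, 130, 135, 142, 150]
summary = describe(sbp)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For other percentile values, `statistics.quantiles(values, n=100)` returns the 1st through 99th percentiles.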
Cross-tabulations and their associated descriptive statistics help describe this relationship. A cross-tabulation can always be built for a pair of attributes: one is potentially the outcome (dependent) variable, and the other the explanatory (predictor, independent) variable. Graphical presentations such as tally diagrams are often used to describe the link between variables. Summary (condensed) measures of association should also be recorded and described. Examples are the contingency coefficient C for nominal variables and, for ordinal (ranked) variables, Somers' D, Kendall's tau-b, and tau-c. These measures are easy to compute and can also be presented graphically.

5.2. Inferential Statistics

Two types of error are possible in statistical hypothesis testing. A Type I error means rejecting a null hypothesis that is true; its probability is α. A Type II error means failing to reject a null hypothesis that is false; its probability is β, and the power of the test is 1 − β. The sampling distribution of a statistic is its distribution over all possible samples obtained under a specific sampling plan. It can be difficult to collect data that are free of systematic or random errors, and incorrect technical methods and instrumentation can also introduce systematic errors. The statistical analysis should follow from the statistical objectives and the characteristics of the collected data. Count and binary data are among the data most often collected in biomedical and public health research, and the chi-square and Fisher exact tests can be used to draw inferences from them. The scatter plot and linear regression analysis are useful for summarizing the relationship between two quantitative variables.
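The 2x2-table measures in this section can be computed together in a short sketch using only the standard library. These measures are the relative risk or prevalence ratio, the chi-square test, the contingency coefficient C, and the Fisher exact test. The function names and cell counts are hypothetical. The chi-square statistic is the uncorrected Pearson version. The two-sided Fisher p-value follows the common rule of summing the probabilities of all tables no more probable than the observed one. Other packages may apply a continuity correction or define the two-sided p-value differently.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins.

    Sums the hypergeometric probabilities of every table whose probability
    is no larger than that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = math.comb(n, col1)
    probs = {k: math.comb(row1, k) * math.comb(row2, col1 - k) / denom
             for k in range(max(0, col1 - row2), min(row1, col1) + 1)}
    p_obs = probs[a]
    # Small relative tolerance so that tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def analyze_2x2(a, b, c, d):
    """Summarize a 2x2 table laid out as

                    disease   no disease
        exposed        a          b
        unexposed      c          d

    Assumes non-zero margins and at least one unexposed case (c > 0).
    """
    n = a + b + c + d
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    # Relative risk in a cohort; the same formula gives the prevalence ratio
    # when the counts come from a cross-sectional study.
    ratio = risk_exposed / risk_unexposed
    # Pearson chi-square without continuity correction (1 degree of freedom).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    contingency_c = math.sqrt(chi2 / (chi2 + n))
    return {
        "ratio": ratio,
        "chi2": chi2,
        "p_chi2": p_chi2,
        "C": contingency_c,
        "p_fisher": fisher_exact_two_sided(a, b, c, d),
    }


# Hypothetical counts: 30 of 100 exposed and 15 of 100 unexposed subjects have the disease.
result = analyze_2x2(30, 70, 15, 85)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The Fisher test is generally preferred when expected cell counts are small, where the chi-square approximation becomes unreliable.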
The methods most often used to summarize quantitative outcomes are graphical displays, the t-test, one-way analysis of variance, analysis of covariance, and a variety of other regression-based models. How to model the relationship between the outcome and the independent variables needs careful consideration.

5.3. Regression Analysis

To be a meaningful tool for testing environmental hypotheses or making inferences, the model must account for the dependence structure of the errors. Today the most familiar family of methods for doing this is generalized least squares (GLS). Classical simple linear regression assumes that the errors are independent and have constant variance. With repeated measures, or subjects from strata with different underlying variabilities, we must allow the errors to come from structurally different models, and these models will have new unknown parameters. Whether to use linear or non-linear models does not depend on the structure of the errors. The choice of model and error structure expresses the hypothesis about the reality being evaluated, based on expert knowledge, common sense, and applied statistical tests. Once the correct model has been built, hypothesis testing can follow a two-stage approach. The first stage fits a correlation model to estimate the unknown error parameters. The second stage corrects the coefficient estimates and the hypothesis tests, allowing for the fact that the model was tailored heuristically to the data. Combining hypothesis testing with exploratory analysis under this dependent-error model has two benefits. It avoids wasting resources on dependencies that only seem to be present in the errors. It also makes better use of individuals from clearly defined strata by including known, meaningful correlation structures.
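The two-stage idea can be sketched for the simplest case the text mentions: independent errors whose variance differs between strata. Weighted least squares (WLS) is the special case of GLS with a diagonal error covariance matrix. This sketch therefore does not model correlation between repeated measures; a full GLS fit with a correlated error structure would normally use a dedicated statistical package. The function names and data are hypothetical.

```python
from collections import defaultdict


def wls_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x with weights w_i = 1 / Var(e_i)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def two_stage_wls(x, y, stratum, floor=1e-12):
    """Feasible WLS: estimate one residual variance per stratum from an OLS fit,
    then refit with weights 1 / (estimated stratum variance)."""
    b0, b1 = wls_line(x, y, [1.0] * len(x))  # stage 1: ordinary least squares
    sq_resid = defaultdict(list)
    for xi, yi, s in zip(x, y, stratum):
        sq_resid[s].append((yi - (b0 + b1 * xi)) ** 2)
    # Simple mean-of-squares variance estimate; the floor guards against zero variance.
    var = {s: max(sum(r) / len(r), floor) for s, r in sq_resid.items()}
    return wls_line(x, y, [1.0 / var[s] for s in stratum])  # stage 2: reweighted fit


# Hypothetical data: the last point comes from a much noisier stratum (variance 100x).
x, y = [0, 1, 2, 3], [1, 2, 3, 10]
print("OLS:", wls_line(x, y, [1, 1, 1, 1]))     # slope pulled up to 2.8 by the noisy point
print("WLS:", wls_line(x, y, [1, 1, 1, 0.01]))  # noisy point down-weighted, slope near 1
```

Down-weighting the high-variance stratum keeps one imprecise observation from dominating the slope. `two_stage_wls` estimates those weights from the data instead of assuming them.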
Accounting for dependent errors is clearly important in study planning. A clinical study of an agent's efficacy aims to determine the real relation between the agent and the tested entity, and without that relation no generalization can be drawn. Accommodating existing dependencies in the errors benefits research in several ways. It makes the measured data as informative as possible about these dependencies and helps extract information for hypothesis testing. It also reduces bias and so improves the estimates derived from study data. Finally, it makes the construction and interpretation of substantive models, valid tests, and confidence interval statements more sound.

6. Case Study 1: Women and Coronary Heart Disease

Introduction

Few diseases spark such intense political, economic, and scientific debate as coronary heart disease (CHD). In 1991, 33.7% of the adult female population was found to have an elevated level of the risk factors used to estimate CHD risk. This high percentage of women at high risk of CHD was quite consistent across racial, ethnic, age, and social groups. The patterns showed the pivotal role of the Oregon Risk Score, high-density lipoprotein cholesterol (HDL-C), systolic blood pressure (SBP), diabetes mellitus (DM), and cigarette smoking in determining women's 10-year probability of a fatal or non-fatal CHD event. DM acted almost as an independent risk factor: its effect on women's probabilities remained nearly constant after adjustment for the Oregon Risk Score. The other risk factors acted synergistically, with effects that grew progressively larger as the Oregon Risk Score increased.

Materials and Methods

Case Study 1 illustrates how to apply the epidemiological procedures presented here to investigate the associations of selected risk factors with an adverse health condition, or with a positive biomarker predictive of an adverse health condition.
This is the same configuration that was used in the analysis of the relationships between the components of fasting and non-fasting lipid and lipoprotein profiles and adverse cardiovascular outcomes in the female population. In brief, case occurrence and the inclusion and exclusion criteria for subjects have already been described. To identify potential risk factors, mortality in that population was characterized using cardiovascular disease codes and cause-of-death groups. Then, using the recommended epidemic phases and the surveillance, infodemiologic, and disease-dynamic triads, the significance of the health disorder and its risk factors for that population was estimated in a three-bank, multilingual etiological study of that open community of women, described in subsequent sections.
6.1. Background and Significance
In prospective cohort studies with time to event as the primary outcome, deciding on the sample size requires a precise assumption about how large the effect of the exposure on this outcome will be. This effect is usually expressed as a relative risk, which is then used to calculate the sample size. The incidence of the outcome in the unexposed group can be seen as an approximation of the risk of developing the disease in the general population. When the data are binomial or Poisson distributed, the confidence interval of this risk can be calculated and used to derive the required sample size. However, when these distributional assumptions are violated, for example because of a small number of events, overdispersion, or substantial dropout in a clinical trial, the calculated sample size may no longer reflect the true uncertainty: it may rest on overly optimistic or overly pessimistic assumptions and turn out too small or too large.
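A rough sketch of how the assumed effect drives the calculation, using the normal approximation for two independent binomial proportions with equal group sizes, a two-sided test, and no continuity correction. It therefore does not address overdispersion, few events, or dropout. The planning values and function names are hypothetical. The second function applies the same logic to a standardized mean difference, such as the conservative value of 0.3 that some trialists assume.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_relative_risk(p0, rr, alpha=0.05, power=0.80):
    """Subjects per group to detect relative risk rr, given risk p0 in the unexposed.

    Normal-approximation formula for comparing two independent proportions
    (two-sided test, equal group sizes, no continuity correction).
    """
    p1 = rr * p0
    if not 0 < p1 < 1:
        raise ValueError("rr * p0 must be a valid probability")
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


def n_per_group_standardized(d, alpha=0.05, power=0.80):
    """Subjects per group for a standardized mean difference d (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 / d ** 2)


# Hypothetical planning values: 10% risk in the unexposed, RR = 2
print(n_per_group_relative_risk(0.10, 2.0))                          # 199
# Conservative standardized effect of 0.3 versus a more optimistic 0.5
print(n_per_group_standardized(0.3), n_per_group_standardized(0.5))  # 175 63
```

Because n scales with 1/d², moving from d = 0.5 to d = 0.3 nearly triples the required group size, which is why the effect-size assumption deserves as much scrutiny as the distributional one.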
To overcome these potential problems, many trial researchers assume a conservative effect size, for example the minimum clinically relevant difference or a standardized effect size of 0.3. In doing so, however, the specific assumption underlying the sample size is set aside, and as a consequence the reported outcomes may not reflect reality.
6.2. Research Questions and Hypotheses
Research questions define the goals of a study and guide the procedures, methods, and tests adopted to attain those goals. They give structure to studies and lead to the formulation of hypotheses. In recent years, considerable attention has been given to formulating specific, appropriate research questions that address the implicit or explicit issues to be solved, because the quality of research depends heavily on the quality of that formulation. Developing good research questions involves several steps: clarifying and refining the general area of study; gathering information on what is valid and where the gaps lie; precisely defining the confusion or ambiguity in the current position; developing a conceptual model of the problem; identifying the context in which the study is to be conducted; and, finally, engaging fully in the relevant scientific and non-scientific issues and debates. During these investigations, not only the fundamental principles of the field but also the results of relevant contemporary and earlier studies are explored, identified, and outlined, so that what is already known can be used to confirm the uniqueness and originality of the proposed work. An appropriate scientific method for critically resolving the confusion is then chosen, and research questions that satisfy the requirements of theoretical relevance, empirical feasibility, and conceptual clarity are formulated. Once limitations and constraints have been properly addressed, the research questions are finalized.
Only questions about whether there is a relationship, association, or difference in the patterns of observed phenomena can be given meaningful scientific treatment. In response to the formulated research questions, scientifically based hypotheses that permit empirical testing are developed.
6.3. Study Design and Methodology
Critical Review
A critical review addresses all aspects of a study, such as its design, conceptual framework, sampling frame, strength of association, accuracy of numerical estimates and statistical calculations, reliability and significance of scores, and interpretations. What criteria distinguish a publication from the point of view of its design? In particular, the quality of analytical studies is judged against a number of valid criteria, each section of the report having its own criterion that must be coordinated with the others. The most widely used criteria were developed for observational studies, and we take them as our guide. The sections of information to be covered are:
Introduction. The study design should be sufficiently described and appropriate.
Objective and hypothesis of the study. Information on the design of the study for which the report was written is also necessary.
Materials, subjects, or methods. How was the study conducted? This section describes the design of the sample or experiment and provides the information other investigators need to assess its appropriateness independently.
Statistical analysis. The analysis should be designed with the probability structure of the practical problem in mind, not that of a set of ideal experimental conditions.
Results.
Discussion.
Dialogue.
Meta-analysis.
Deficiencies in reporting.
Conclusion. This section should address whether the evidence was sufficient to answer the study question.
6.4. Data Analysis and Interpretation
Hypothesis testing enables us to make generalizations about the degree of difference or relationship observed in sample data. If we find a difference, can we reasonably expect it to hold in general? For example, a case-control study of diet and colorectal cancer in Hawaii showed an increase in risk with consumption of specific items such as luncheon meat, frankfurters, and hot dogs. The general question is whether these food items are etiologically relevant. Does a sample of cases and controls give us empirical data with enough confidence to draw a general conclusion about the populations from which the cases and controls were taken? In other words, will we find in the population at large the differences in food consumption observed in the case-control study?
7. Case Study 2: Nutrition Status of Independent Older Adults
Independent older adults are at risk of undernutrition, so a proper understanding of the contributing factors is of substantial importance. The present case study aims to evaluate the nutritional status of a sample of independently living older adults and the direction of the association of several factors with nutritional deficits. Nutritional status is evaluated indirectly, by the degree to which variables symptomatic of malnutrition cluster together. This approach is intended to maximize comparability between the outcomes of the current study and those of similar prior research. Four clusters with different nutritional indicators are distinguished in the sample, and several factors predictive of undernutrition are corroborated, including impaired cognition, mobility restrictions, and acute injury. Additionally, current alcohol intake is associated with a lower likelihood of being undernourished.
Deterioration of health status is a progressive and irreversible aspect of the aging process, and it is further accelerated among institutionalized older adults. In contrast, independently living older people, although they often live with chronic complaints such as mobility restrictions or impaired cognition, manage to maintain fulfilling lives. The nutritional status and nutrition-related problems of these older adults form a complex issue, spanning those who depend on social support and care to those who are essentially self-reliant. Because malnutrition is closely related to lower functional status and declining health in geriatric populations, efforts should pursue several avenues to preserve independence while preventing undernutrition. The number of independently living older adults is growing at every level of autonomy, and the nutritional implications vary, as do the needs of the target groups.
7.1. Background and Rationale
In epidemiological research, there is little information on what happens to primary hypotheses after they are first developed. Most studies of hypotheses in epidemiological databases are based only on final, rather than primary, hypotheses, or only on hypotheses framed from a biomedical or substantive point of view. This approach fails to capture the complexity and variety of the process of scientific discovery. Although the distinction between substantive and statistical hypotheses has been made for decades, there has been no taxonomic framework that maps the entire space of hypotheses used in testing epidemiological models. This text fills that gap by proposing a comprehensive typology of hypotheses, systematically constructed for inclusion in decision trees that show how hypotheses vary with the level and purpose of the study, the type of question defined in the research framework, and the nature of the epidemiological model.
The primary goal of the first phase of the hypothesis process map is to develop an intuitive classification model that encompasses the entire range of research questions found in the literature. A second goal is to elucidate the attitudes, beliefs, and working protocols of the investigators who produced this literature. The process begins by gathering a large number of published studies relevant to the biomedical subject. This stage is fairly easy because studies are constantly being produced, and a large supplementary database may accumulate rapidly, depending on the topic. The next step is to extract from these studies as many brief statements of intent as possible. The relevant information is then represented through a coding process, using a frame oriented to the scientific and methodological aspects of the biomedical subject. Finally, a time-consuming refinement phase produces the flow chart: hypotheses are grouped into macro-classes that gradually acquire rich semantics and coherent logical patterns. After the database has been read, coded, and cleaned, the process map is fairly well developed.
7.2. Research Objectives and Hypotheses
A descriptive research study is normally conducted to formulate certain general assumptions from which many hypotheses emerge. Examining them can yield narrower knowledge and an explicit hypothesis for verifying a conclusion. However, many questions remain about the nature of the conclusions that can be reached through epidemiological inquiry.
A research hypothesis should normally satisfy the following conditions: it must concern a relation between two or more variables; express the association between the research variables under study; describe properties of the subjects studied that go beyond previous statements in the reviewed literature; propose specific characteristics of a given population from the point of view of the study's objective; and lead clearly to a method for testing it. In an epidemiological study, both research objectives and hypotheses can be considered shallow or deep from a theoretical and practical point of view, as shown in the following table. Simple hypotheses are confined to a single discipline and are superficial in theory or practice, whereas deep ones are interdisciplinary or transdisciplinary and can range from something as simple as a dilemma or a naive experiment to something as extensive as a methodology. In empirical research, however, simple hypotheses are normally treated as mere investigation, while deeper hypotheses guide the initial approach.
7.3. Study Population and Sampling Strategy
A sample of people is drawn from the source population. The criteria used to define who is, and who is not, a member of the study population are known as the sampling frame. If the source population is clearly defined and enumerated, the sampling frame can be the same as the source. For example, the source population might be all men aged 35-60 years who live in a particular area, and the sampling frame might be the electoral register or the list of men in this age group drawn from GP registers. If, however, the source population is a general category such as all people living in England, the sampling frame may cover only part of the source population, such as those notified to the Trent Cancer Registry, and this choice can lead to selection bias.
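To make the coverage problem concrete, the sketch below simulates a hypothetical source population in which one subgroup is absent from the sampling frame and also has a higher prevalence. All numbers are invented for illustration. A simple random sample from the complete frame recovers the true prevalence, while a sample from the partial frame does not, however large the sample.

```python
import random

# Hypothetical source population of 10,000 people (illustrative numbers only):
# group A (6,000 people) appears in the sampling frame, prevalence 10%;
# group B (4,000 people) is missing from the frame, prevalence 30%.
population = (
    [{"in_frame": True, "diseased": i < 600} for i in range(6000)]
    + [{"in_frame": False, "diseased": i < 1200} for i in range(4000)]
)


def prevalence(people):
    """Proportion of people with the disease."""
    return sum(p["diseased"] for p in people) / len(people)


def sample_prevalence(frame, n, rng):
    """Estimate prevalence from a simple random sample of size n drawn from the frame."""
    return prevalence(rng.sample(frame, n))


rng = random.Random(2024)
complete_frame = population                               # frame equals source population
partial_frame = [p for p in population if p["in_frame"]]  # frame covers only group A

true_prev = prevalence(population)  # 0.18
est_complete = sample_prevalence(complete_frame, 1000, rng)
est_partial = sample_prevalence(partial_frame, 1000, rng)
print(f"true {true_prev:.3f}  complete frame {est_complete:.3f}  partial frame {est_partial:.3f}")
```

The partial-frame estimate converges on the prevalence of group A, not of the source population, so increasing the sample size only makes the wrong answer more precise.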
To ensure external validity, the sampling frame must represent the appropriate members of the source population so that the results can be generalized to it.
7.4. Results and Implications
This study was a retrospective cohort study based on a national insurance-funded database, which ascertained the surgical cohort's exposure to medications for at least one year. We found that perioperative antihypertensive diuretic medications, with or without preoperative antihypertensive medications, showed an independent association with 1-year mortality and with long-term readmission within five years after hip fracture. There could be several pathways by which perioperative diuretics harm long-term postoperative outcomes. It remains uncertain, however, whether the association is fully mediated by a decrease in blood pressure or by disturbances of fluid and electrolyte balance during the recovery period after trauma. We discussed the pros and cons of routinely withdrawing diuretics perioperatively in this cohort and also proposed individualized intervention strategies based on personal risk factors and baseline status, which may provide new insights for future clinical guidelines on managing diuretics and other blood-pressure-lowering medications in patients with hip fracture. Large prospective cohort studies, or even randomized controlled trials, are needed to confirm and validate our findings on perioperative antihypertensive and diuretic medications; such studies would be challenging but extremely valuable, given the short hospital stays, advanced patient age, multiple comorbidities, and complex acute pathophysiological disturbances of the perioperative period.
8. Ethical Considerations in Epidemiological Research
While ethical considerations are not unique to epidemiological research, the particular nature of exposure and outcome measurement and the collaborative nature of epidemiologic studies call for an explicit focus on these concerns within the field if the highest ethical standards are to be achieved. In the pursuit of knowledge, researchers' moral duties extend not only to protecting the confidentiality of data but also to making the fullest use of societal research resources, protecting research subjects, and maintaining a culture of scientific integrity. Ethical conflicts in scientific research are not uncommon; they may arise within the research community or from the expectations of society at large. A key concern in epidemiology is the obligation of researchers to make the fullest use of the data generated in the course of their research. While extending study results beyond their original scope enhances the utility of research data, researchers are also bound to give careful consideration to the interests of the research subjects involved. Epidemiological data generation involves human subjects. In addition, the growing emphasis on data sharing and the recognition of data-driven science as a public good continue to support releasing information for public health benefit.
8.1. Informed Consent and Participant Rights
Many researchers consider ethical approval important because trustworthy results do not compensate for potential harm to humans. The best-established principle of research ethics is the necessity of informed consent, because it protects a person's freedom of medical decision-making. Its moral basis is respect for the bodies, views, and beliefs of others.
With respect to informed consent, every human being has the natural right to decide on the interventions performed on them. This gives the patient the fundamental right to refuse, or to choose freely among, the interventions proposed for their body. The moral meaning of this principle of freedom is that no one may decide to perform an intervention on the body of a person who is mentally and intellectually capable of deciding for themselves; the principle of respect for persons leaves the final decision with them. Informed consent is necessary to respect the right to self-determination. In any health intervention that requires information from a patient or the patient's decision, the doctor is ethically obliged to explain in advance, correctly, clearly, completely, and understandably, the alternative treatment possibilities, the possible consequences and risks of the planned intervention, the possibility of refusal, any alternative methods available and their important features, and the possibility of abandoning the intervention without penalty to the patient. Emergency interventions may be performed without consent when they are immediately necessary and vital, for example when delay would be dangerous, when the necessary information cannot be provided, when no one responsible is available to give consent, or in similar situations.
8.2. Confidentiality and Data Security
In the previous section, we discussed how epidemiological research raises numerous ethical issues, and we outlined the institutional requirements, guidelines, and considerations surrounding them. In this section, we consider some concrete steps epidemiologists can take to prevent breaches of confidentiality and to keep individual research subjects' information secure.
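One widely used concrete step, offered here as an illustration rather than as a method this text prescribes, is small-cell suppression before tables are released: counts below a threshold are withheld because rare combinations of characteristics can identify individuals. The threshold of 5, the table layout, and the function name are assumptions; agencies set their own rules, and primary suppression alone can be undone from published margins, so complementary suppression is usually also needed.

```python
SUPPRESSED = None  # marker for a withheld cell


def suppress_small_cells(table, threshold=5):
    """Return a copy of a frequency table with counts from 1 to threshold-1 suppressed.

    `table` maps a category label (e.g. a tuple of area and age group) to a count.
    Zero counts are kept here; some disclosure policies suppress them too.
    """
    return {key: (SUPPRESSED if 0 < count < threshold else count)
            for key, count in table.items()}


# Hypothetical case counts by area and age group
counts = {("Area 1", "<65"): 42, ("Area 1", "65+"): 3,
          ("Area 2", "<65"): 17, ("Area 2", "65+"): 0}
print(suppress_small_cells(counts))
```

If the row total for Area 1 (45) were also published, the suppressed cell could be recovered by subtraction, which is why a second, complementary cell is typically suppressed as well.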
Data inspections and disclosures of datasets by investigators are expected to increase, so it is crucial to proactively educate researchers, especially those early in their careers, about inferential disclosure risk and, where possible, about practical methods that balance data utility against confidentiality across databases. Statistical hypothesis tests and confidence intervals, including their length, can then be used to check how such protective adjustments affect the empirical findings. In response to security breaches in research involving personal information, a common approach is a data security policy that specifies procedures for enforcing all standards for safeguarding individually identifiable data from unwarranted access, use, or disclosure. Dataset owners often allow security inspections; however, no penalty is attached to illegal or unethical disclosures, which limits the incentive to enforce security standards. The emerging data industry is protected by legal rights and by cryptographic principles, which block access by unauthorized users, while authorized users remain bound by legal limitations.
9. Challenges and Limitations in Hypothesis Testing
The validity of the results of a hypothesis test can be weakened by problems in the data or in the study's categorizations, by the techniques or assumptions of the testing procedure, or by the research design and the specific approach to hypothesis testing. These issues, however, are better described as potential limitations of such tests than as obstacles to conducting them.
Larger sample sizes (especially in multilevel or correlated-data settings), more refined knowledge of the distributions, improved parameter estimation, less restrictive tests, alternative test metrics, ever more powerful testing techniques, and more flexible and nuanced definitions of research questions, exposure relationships, and inferential criteria all offer possible workarounds. The purpose of this paper is not to describe such approaches but to highlight specific challenges, obstacles, or questions that may confront researchers when testing taken-for-granted hypotheses. It is simple enough, for example, to test a correlation, an association, or a difference, given one's choice of the nature, direction, and strength of the relationships assumed to exist in a dataset under a null hypothesis. However, every such choice begs important questions about the uncertainty or unobservability of phenomena in the real world, not merely about the presence or absence of statistical relationships in a given dataset. Nevertheless, many limitations in hypothesis testing can be traced back to under-theorization, to a lack of agreement on and appreciation of the preliminary and inferential aspects of hypothesis testing, and to disciplinary or interdisciplinary constraints on how behavioral researchers generate pre-study research questions and testable general hypotheses.
9.1. Selection Bias
Because many design, implementation, and analysis features can produce or otherwise affect bias in epidemiological studies, epidemiologists pay a great deal of attention to the issue. For example, the design of a study may strictly determine how study subjects are selected; some studies are designed to control subject selection, for example clinical trials and case-control studies, among others.
Many are not, for example observational studies carried out with data from any kind of surveillance system. The specific limitations concern not only the research team or a reduced number of measurements, which can pose ethical problems, but also the findings themselves, because the validity of the measuring instruments and their scores is intrinsically determined by the selection mechanism, the key issue in any convenience selection of study subjects. Because people constantly devise ways of doing what they should not do, this chapter examines many forms of both intrinsic and extrinsic selection problems, which are inherent to essentially all stages of research and create the conditions for the main form of study bias. Information and methodological biases, on the other hand, are studied in greater depth in subsequent chapters. Finally, we review the avenues one should and can explore to solve the problem, and the pitfalls and traps to avoid in the attempt.
9.2. Confounding Factors
A variable may be a confounder of the relation of interest. Confounding variables cause differences in the observed effects of the risk factor under study. Classical examples of confounders are age, sex, and social class. These characteristics can also act as "effect modifiers", a distinct concept, and in epidemiology two different definitions are sometimes given for such modifying factors. More important is the fact that the effect of the risk factor on disease incidence is measured as a difference in the mean risk of developing the disease; this measure is designated the risk index and denoted by r. Concerning the initial definition of risk, an important observation is that one may not proceed to estimate relative risks unless the baseline risk function, that is, the risk of disease in the absence of the risk factor, behaves as a function of the effect modifier.
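Age, listed above as a classical confounder, can be handled by stratification. The sketch below uses hypothetical case-control counts in which exposure has no effect within either age stratum, yet the crude odds ratio pooled over ages suggests a substantial association, because older subjects are both more often exposed and more often cases. The Mantel-Haenszel summary odds ratio, a standard stratified estimator, recovers the null within-stratum effect. The counts and function names are illustrative assumptions.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = exposed cases/controls; c, d = unexposed cases/controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata given as (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical case-control data stratified by age group
young = (10, 90, 40, 360)    # exposure is rare among younger subjects
older = (120, 180, 40, 60)   # exposure and disease are both more common among older subjects
crude = tuple(x + y for x, y in zip(young, older))

print("stratum ORs:", odds_ratio(*young), odds_ratio(*older))  # 1.0 1.0
print("crude OR:", round(odds_ratio(*crude), 2))               # 2.53
print("MH OR:", mantel_haenszel_or([young, older]))            # 1.0
```

The gap between the crude and the stratified estimate is the signature of confounding here; had the stratum-specific odds ratios differed from each other, that would instead point to effect modification, and a single summary ratio would be less appropriate.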
A confounder is usually associated with both the risk factor under study and the outcome, and in this way causes under- or overestimation of the true magnitude of their relationship. Confounding is usually a theoretical possibility that should be considered when potential confounders are identified from biological reasoning or supporting evidence. Epidemiological methods can study and control confounding statistically, but the most important safeguard is a good study design that allows the important confounders to be measured adequately.
9.3. Interpretation of Causality
It is tempting to interpret criteria that strengthen a causal inference, such as consistency, temporal relationship, and dose-response relationship, as prerequisites for causality. Doing so turns the logic upside down, implying that the criteria must be fulfilled before anything can be said about causality. After all, even probabilistic data have to be of a certain magnitude to be convincing and not attributable to bias; they have to be consistent and lead to sensible inferences when tested in different ways. Still, from the standpoint of epidemiological research, it is of great importance to account for the potential existence of bias. The apparent absence of an effect may indicate that bias has concealed a real one. Clearly, biases from different sources acting together may further increase the risk of incorrect conclusions. High precision may make one biased estimator preferable to another, even though both are generally biased and their estimates do not necessarily differ in magnitude or direction. In applied research, however, precision is not all that counts. If bias acts in different directions on two estimators with a common true value, comparing the two may reveal much more about the potential effects of bias than evaluating each estimator on its own.
10. Conclusion and Future Directions
In conclusion, we have presented the main steps to be followed in hypothesis testing and discussed the more distinctive methodological aspects of observational epidemiological research. The formulation of hypotheses, apparently very simple, required greater attention, since determining hypotheses correctly is an important step. Evaluating the completeness and quality of the formulated questions matters greatly across the various fields of health knowledge. Formulating good hypotheses is not a straightforward task; it needs to be approached slowly and grounded in a theoretical structure. In epidemiology, the research hypothesis is posed as an association between risk factors and disease outcomes, and its main feature is the presence of two variables. Today, the role of ordered and mechanistic hypothesis formulation cannot be ignored. Variability, and the characteristics of studies with low plausibility, stem from the fact that confounding variables and effect modifiers are present in usual clinical practice. Epidemiological research is used to determine etiological associations, or the lack of association, between specific exposures and diseases or adverse health conditions. There is a fundamental relationship between these associations and the way hypotheses are formulated, especially in light of the sequelae and the scientific knowledge available on the pathophysiological processes involved. Subsequent procedures, definitions, data collection, and data analysis are also closely tied to the formulation of research hypotheses. Some ethical aspects were also addressed. Finally, we argued that the methodology of hypothesis testing is the logical expression of these ideas.
However, much work remains for the medical-scientific community to reach the ideal level of practice in epidemiology. Some indications for future studies are suggested: studies should be systematic, free of too many preconceptions, and preferably guided by previous investigations of the phenomenon of interest.
10.1. Summary of Key Findings
In this large review of the developmental epidemiological literature, we identified 319 articles that relied exclusively on multivariable hypothesis-testing models despite analyzing one or more longitudinal data sets or using a wide variety of covariates. These analyses were conducted with an early cohort of the study population or with one or more later cohorts of the same population. We found these articles in all the major journals of both pediatrics and public health, particularly in the broad topic area of maternal and child health, as well as in the more specific topics of obesity, child development, academic achievement, addiction, and mental illness. The use of certain multivariable models increased markedly over time, even after controlling for potential confounders such as journal and author research experience. These data suggest that multivariable models for hypothesis testing have become more popular despite widely accepted limitations on their use in the developmental epidemiology literature, such as the view that some of them are 'exploratory' rather than 'confirmatory' data analysis techniques. We discuss the potential implications of these findings for both the research enterprise and policy decisions, as well as potential remedies for these limitations.
10.2. Implications for Public Health Policy
The primary criterion for decision-making in public health policy should be the strength and extent of the scientific evidence from epidemiological studies, experimental infection, and toxicological data, rather than clinical experience. Formulating and testing hypotheses in epidemiological research satisfies this basic condition. At best, it produces not just descriptive epidemiological data but relationships that reveal possible causes of disease and can suggest appropriate preventive measures. When the cause of a disease is not entirely known and part of the variance in risk is due to unknown, unobserved factors, policy should aim to make exposure-disease relationships more transparent. That helps reduce the scope for debate about causality and shifts the focus to primary prevention. In the past, various studies suggested possible causes for particular diseases; coronary heart disease, for example, was considered infectious, related to vitamin deficiency, or lipid-related. The revision or rejection of hypotheses, particularly those about diet, preceded advice to communities on primary prevention in the form of reducing tobacco exposure, developing lower-fat diets, and adopting healthier lifestyles. Such recommendations can hardly be stated too strongly for future health benefits, especially in countries with unfavorable economic situations, but they should be supported by scientific evidence. In public health, epidemiology and prevention should be used to guide the formulation of hypotheses about causal relations, the investigation of those relations, policies to prevent exposure, and the assessment of policy effectiveness.
10.3. Future Research Directions
The decision to conclude this book chapter with research questions and hypotheses rather than a summary reflects the need to develop further some of the relationships highlighted as crucial in the chapter. The research questions represent common questions compiled from a review of the literature. Some students in the course on which this chapter is based said they would have liked more examples of formulated hypotheses from a wide range of epidemiological domains, listed at the end of the chapter. The list of future research questions therefore also serves as a guide to potential student research topics on hypotheses. In summary, having a research hypothesis is an important first step in a scientific investigation. A good research hypothesis is a clear and testable statement of the relationship between the outcome variable and the independent variables. Formulating it requires that the scientific questions relating to the study already be well understood, so preliminary or exploratory research is essential before developing the hypothesis. A particular strength of hypotheses is that they lead to the development of study protocols, including the design of new studies or experiments and the number of observations any study requires. Additionally, hypotheses provide a direct path to evaluating and quantifying the implications of the results through decision rules that support rational management options.
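As a minimal sketch of how a clearly stated hypothesis becomes a decision rule, assume the research hypothesis "the risk of the outcome differs between exposed and unexposed groups" with the null hypothesis of equal risks. The code applies a two-sided pooled two-proportion z-test and rejects H0 when p < 0.05. The counts, the significance level, and the function names are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(cases_exp, n_exp, cases_unexp, n_unexp):
    """Two-sided z-test of H0: equal risks in exposed and unexposed groups (pooled SE)."""
    p1, p0 = cases_exp / n_exp, cases_unexp / n_unexp
    pooled = (cases_exp + cases_unexp) / (n_exp + n_unexp)
    se = sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_unexp))
    z = (p1 - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def decide(p_value, alpha=0.05):
    """Pre-specified decision rule: reject H0 when p < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical cohort: 40 of 200 exposed vs 20 of 200 unexposed developed the outcome
z, p = two_proportion_z_test(40, 200, 20, 200)
print(f"z = {z:.2f}, p = {p:.4f}: {decide(p)}")
```

Fixing alpha, and through a power calculation the number of observations, before the data are seen is what makes this a pre-specified decision rule rather than a post hoc judgment.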