The Foundations of Scientific Thinking Notes PDF

Summary

This document contains notes on the foundations of scientific thinking, covering topics such as epistemology, influences of empiricism, induction vs. deduction, and aspects of current scientific thinking. It is intended as a study guide for an undergraduate level science course.

Full Transcript

The Foundations of Scientific Thinking Notes Page 1 of 47

Contents

The Development of Modern Science
    Epistemology
    Influence of Empiricism on Scientific Inquiry
    Induction vs Deduction
    Parsimony/Occam’s Razor
    Falsifiability
    Significance of Confirmation Bias
    Cultural Contribution Knowledge
    Paradigm Shift
Influences on Current Scientific Thinking
    Ethics, Morality, & the Law
    Current Influences on Scientific Thinking
    Influence of Ethical Frameworks on Scientific Research
    Use of Research Data

The Development of Modern Science

Epistemology

o Epistemology is defined as: ‘a branch of philosophy that investigates the origin, nature, methods, and limits of human knowledge’.
o Scientific epistemology explores the nature of scientific knowledge. It consists of three aspects:

1. The Qualities of Scientific Knowledge
    ▪ Science attempts to explain natural phenomena.
    ▪ Scientific knowledge is represented as laws and theories.
        ▪ Laws: describe patterns and relationships in scientific information.
        ▪ Theories: provide explanations of natural phenomena.
    ▪ Scientific knowledge is tentative as it requires revision.
    ▪ Science is part of the social and cultural traditions of many human societies.
    ▪ Scientific ideas are affected by social and historical setting.

2. The Limitations of Scientific Knowledge
    ▪ Science does not make moral judgements (e.g. should euthanasia be permitted?).
    ▪ Science does not make aesthetic judgements (e.g. is Mozart’s music more beautiful than Bach’s?).
    ▪ Science does not prescribe how to use scientific knowledge (e.g. should genetic engineering be used to develop disease-resistant crops?).
    ▪ Science does not explore supernatural or paranormal phenomena (e.g. religious ideas and ghosts).

3. How Scientific Knowledge is Generated
    ▪ The development of scientific knowledge relies on observations, experimental evidence, rational arguments and scepticism.
    ▪ Scientific knowledge advances through slow and incremental steps (evolutionary progression), as well as giant leaps of understanding (revolutionary progression).
    ▪ Observations are theory dependent, which influences how scientists obtain and interpret evidence.
    ▪ There is no universal step-by-step scientific method. Scientific knowledge is acquired through a variety of different methods. The two main lines of reasoning that influence modern science are inductive (generalising) and deductive (deriving) processes.

o Science distinguishes itself from other ways of knowing and from other bodies of knowledge through the use of empirical standards, logical arguments, and scepticism, as scientists strive for certainty of their proposed explanations.
o Alternative Ways of Knowing (way of knowing: explanation, followed by example questions):
    ▪ Emotion: feeling, as opposed to reasoning. Examples: Can/should we control our emotions? Are emotions the enemy of, or necessary for, good reasoning?
    ▪ Faith/Belief: trust or confidence. Examples: Can theistic beliefs be considered knowledge because they are produced by a special cognitive faculty or “divine sense”? Does faith meet a psychological need?
    ▪ Imagination: forming new ideas, or images or concepts of external objects not present to the senses. Examples: What is the role of imagination in producing knowledge about a real world? Can imagination reveal truths that reality hides?
    ▪ Intuition: a form of knowledge that appears in consciousness without obvious deliberation. Examples: Are there certain things that you have to know before being able to learn anything at all? Should you trust your intuition?
    ▪ Language: a system of communication used by a particular country or community. Examples: How does language shape knowledge? Is the importance of language cultural?
    ▪ Memory: the faculty by which the brain encodes, stores, and retrieves information. Examples: Can we know things which are beyond our personal present experience? Can our beliefs contaminate our memory?
    ▪ Reason: a basis or cause, as for some belief, action, fact, event. Examples: What is the difference between reason and logic? How reliable is inductive reasoning?
    ▪ Sense Perception: understanding gained through the use of one of the senses such as sight, taste, touch or hearing. Examples: How can we know if our senses are reliable? What is the role of expectation or theory in sense perception?

o Navigation
❖ Early travellers relied on their senses (sense perception) to observe landforms, wind speed and direction, tides and measures of distance to navigate (observational knowledge). Celestial navigation using the positions of stars, constellations and the sun also served as a navigational aid. In those times, travel was restricted to short distances, or to coastal areas.
❖ With advances in measuring techniques (and geometry), accurate maps were created. Such calculations indicated that the Earth was a sphere. The altitude of the North Star provided latitudinal information. These are examples of knowledge constructed through memory, language (communication through oral stories, written accounts and maps) and reasoning.
❖ Later, navigational instruments extended the powers of sense perception. The compass was an important tool to orientate travellers to magnetic north (it works at night as well). Other instruments, such as the astrolabe, sextant, chronometer and chip log, were designed to identify locations in 3-dimensional space. The information from these instruments was used to produce highly refined maps (ways of knowing: reasoning, imagination, intuition, language).
❖ Modern navigation uses radar, gyroscopic compasses and GPS to provide positional and kinematic (e.g. speed and acceleration) information.
❖ Polynesians used natural navigation aids such as the stars, ocean currents, and wind patterns. They used non-physical devices such as songs and stories for memorising the properties of stars, islands, and navigational routes.

Influence of Empiricism on Scientific Inquiry

o Science is derived from philosophy. The term ‘philosophy’ means the love of wisdom. One branch of philosophy focuses on developing explanations of the natural world. This branch was called ‘natural philosophy’.
o Around the 15th century, natural philosophers began to redefine how knowledge of the natural world should be constructed. Natural philosophy was the beginning of science. In the 19th century, the British philosopher William Whewell coined the term ‘scientist’ to describe those who undertook the type of inquiries pursued by the natural philosophers. Eventually, science became distinct from other branches of inquiry (such as philosophy, religion, etc).
o As an example of the common roots of science and philosophy, the highest research degree awarded by universities around the world is the Doctor of Philosophy, even in science. After a person receives a PhD degree, they are allowed to use the title “doctor” (medical science is the exception to this rule).
o Empiricism is a branch of philosophy that emphasises ‘prior experience’. Empiricists say that we can only construct knowledge after collecting information through our senses. Sensory information extends to information collected using instruments.
o Therefore, observations are important for knowledge construction. The information collected through observation eventually becomes evidence and explanations for natural phenomena. Over time, the evidence and explanations become knowledge.
o Empiricism was crucial for the separation of natural philosophy from the other branches of philosophy. It came to define modern science. Most scientific knowledge is empirical. Empiricism demands that all scientific information be based on evidence and tested through observation or experimentation.

Induction vs Deduction

o Induction is the process of generalisation. After collecting information about specific events, generalisations are drawn. They describe the broad applications of the conclusions. In science, inductive reasoning allows explanations of related phenomena to be constructed.
o Deduction is the process of deriving specific knowledge from broad ideas. Therefore, deductive reasoning is often used to make predictions.
▪ The top panel illustrates inductive thinking. When a leaf is examined under a microscope, it is seen to be composed of cells. Examining the leaves of many plants leads to the same conclusion. Therefore, through inductive reasoning, we may conclude that all plants are composed of cells. In doing so, the definite conclusion of each observation is used to synthesise a generalisation: the Cell Theory.
Theories are big ideas in science: broad explanations of natural phenomena.
▪ The lower panel illustrates deductive thinking. Here, we start with a big idea, the Cell Theory, which states that all plants are composed of cells. Suppose you have come across a new and unknown type of plant. Based on the Cell Theory, you predict that the unknown plant is composed of cells. This prediction is called a hypothesis. You then conduct an experiment, where you observe that the plant is indeed composed of cells. In this case, we have moved from a general idea (the Cell Theory) to a specific conclusion (that the new, unknown plant is composed of cells).
▪ Many of the big ideas or theories in science are the products of inductive thinking. This example shows Charles Darwin’s inductive thinking on populations of organisms.
    1. Darwin makes several discrete observations about how individuals in a population are adapted to their environments (for example, the beaks of different populations of finches show different shapes).
    2. Based on the five observations shown on the slide, Darwin makes two inferences.
    3. After many such inferences, Darwin develops a new big idea, which generalises how populations change over time. This is known as the Theory of Evolution by Natural Selection.
o Scientific laws describe the relationships between the variables of a system. They are usually expressed in the form of mathematical equations. This slide shows equations in the Chemistry and Physics datasheets.
o Scientific laws are examples of inductive reasoning.
▪ The discovery of the electron by J.J. Thomson is an example of deductive thinking. While studying the nature of cathode rays, Thomson was exploring the basis of the Atomic Theory.
▪ In the 19th and early 20th centuries, it was thought that atoms were electrically neutral and indivisible constituents of matter.
However, through careful experimentation and data analyses, Thomson discovered that atoms were composed of subatomic particles.
▪ One type of subatomic particle was negatively charged and was called the electron. As a result of his discoveries, the Atomic Theory was modified.

Parsimony/Occam’s Razor

o William of Occam was an English friar who lived in the 13th/14th centuries. Although he did not invent the phrase, he used the phrase “Plurality must never be posited without necessity” frequently in his writings. Many thinkers before Occam, including the Greek philosophers Aristotle and Ptolemy, made statements similar to this. A modern-day statement of Occam’s razor is “Other things being equal, simpler explanations are generally better than more complex ones”.
o Science works with competing ideas. That means that when scientists are trying to develop explanations of some phenomenon, they devise alternative hypotheses. Sometimes, after testing those hypotheses, there may be more than one plausible hypothesis for a phenomenon. In those situations, using Occam’s razor may be useful. Occam’s razor says that if the competing hypotheses are equivalent, then the simpler hypothesis is the best explanation for the phenomenon.
o There are many historical examples of the use of Occam’s razor. Before the 16th century, the geocentric model of the solar system (Earth at the centre of the solar system) was dominant. Then, this was replaced with the heliocentric model (Sun at the centre). The geocentric model required a number of complicated features (e.g. epicycles) to explain some unusual phenomena (such as the retrograde motion of Venus). The heliocentric model did not require such features and is thus a simpler model.
o Scientists do not use Occam’s razor exclusively when accepting ideas in science.
o The most important factor is evidence.
o Other considerations:
❖ Are some ideas more testable than others?
❖ Are some ideas better at producing broader explanations?
❖ Are some ideas a better fit with existing ideas?
❖ Are some ideas better at generating new areas for investigation?

Falsifiability

o Falsifiability is a method of developing scientific knowledge. It is a type of deductive reasoning. It claims that all scientific ideas should be falsifiable through testing (for example, through experimentation). If an idea cannot be falsified, then it cannot be scientific. For example, creation science and intelligent design are not considered to be scientific because their ideas cannot be tested.
o While not everyone agrees with the principle of falsification, it has had an impact on two aspects of science:
❖ Differentiating scientific ideas from non-scientific ideas
❖ Providing a method to test and verify scientific ideas.
o Falsification has given rise to one method of testing and verifying scientific ideas. This is known as hypothesis testing.
o Hypotheses are tentative explanations of a narrow set of related phenomena. For example, consider the hypothesis “particulate pollution in the atmosphere increases the incidence of asthma”. This hypothesis proposes an explanation for the increased incidence of asthma. It is a tentative explanation that is based on observations, but needs to be verified. In other words, the hypothesis needs to be tested.
o To test the hypothesis, a controlled experiment must be conducted and the data generated in that experiment analysed.
o Two important features of hypotheses are that:
❖ Hypotheses cannot be proven to be true; they can only be falsified (this is because of the falsification principle).
❖ Hypotheses can only be rejected (if they are NOT supported by evidence) or not rejected (if the evidence supports the hypothesis).
o These are important features of hypotheses to bear in mind. The goal of hypothesis testing is to reject what is false (not supported by evidence).
o Often, hypothesis testing also involves the statistical analysis of experimental data.
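The "reject or not reject" logic described above can be illustrated with a small permutation test. This is only a sketch under stated assumptions: the asthma figures are hypothetical numbers invented for illustration (they are not data from these notes or from any study), the function names `permutation_test` and `decide` are made up for this example, and the use of a "no difference" null hypothesis with a 0.05 threshold is a common statistical convention rather than something the notes specify. It also compares two sets of districts rather than running the controlled experiment the notes call for.

```python
import random

# Hypothetical, illustrative values only: asthma cases per 1,000 residents
# in districts with high and low particulate pollution.
HIGH_POLLUTION = [62, 58, 71, 66, 69, 74, 60, 65]
LOW_POLLUTION = [48, 52, 45, 55, 50, 47, 53, 49]


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for mean(group_a) > mean(group_b).

    Null hypothesis (an assumption of this sketch): the group labels make
    no difference, so shuffling them should often produce a gap between
    the means at least as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        shuffled_gap = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if shuffled_gap >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Return a decision in the 'reject / not reject' wording of the notes."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


if __name__ == "__main__":
    p = permutation_test(HIGH_POLLUTION, LOW_POLLUTION)
    print(f"estimated p-value: {p:.4f} -> {decide(p)}")
```

Note that the two possible outcomes mirror the notes: the test can reject the "no difference" idea or fail to reject it, but it never proves the pollution hypothesis to be true.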
It is therefore not simply the data collected in an investigation that is used to verify hypotheses, but also the quality of those data.

Significance of Confirmation Bias

o Observations are an important element of scientific inquiry. Inferences can be influenced by:
❖ Confirmation Bias
❖ Theory-Laden Observation
o Confirmation Bias: the tendency to search for or interpret information in a way that confirms one’s preconceptions.
o Theory-Dependent Observations: how previous experiences, beliefs and assumptions affect the inferences drawn from observations.
o No matter whether we use inductive or deductive reasoning, observations are important. The quality of observations is crucial for initiating inquiries and investigations. Consider Marshall and Warren’s study on the microbial cause of gastric ulcers. The observation that Helicobacter pylori is frequently associated with gastric ulcers was crucial, as it led to the discovery that the bacterium causes the disease.
o Another important aspect of observations is the analysis of data. Identifying patterns and trends in experimental data is critical for finding evidence that may support the hypotheses.
o Theory-dependent observations are observations that depend on theories. This means that prior knowledge of scientific theories may influence the inferences that we draw from observations. This extends to the ways that we analyse and interpret observations. Optical illusions are often used to illustrate theory-dependent observations. In the picture shown in the slide, the image on the left shows two lines, one vertical and the other horizontal.
o On initial inspection, the vertical line appears to be longer than the horizontal line. Yet, when measured, both lines are of the same length. Although this example is simplistic, it illustrates how the initial interpretations we make of our observations may be misleading and require further inquiry.
Therefore, how we interpret observations is dependent on prior experience.
o Theory-dependent observations play an important role in the way ‘experts’ interpret information. For example, an X-ray image may not be informative to the untrained person, but to an experienced radiologist, the same image may be very informative. The radiologist may be able to pick up certain conditions or pathologies because of his/her past learning and experiences.
o Theory-laden observations may also be responsible for professional intuition, where the expert practitioner may be able to arrive at certain conclusions without conducting much analysis.
o Theory-laden observations can lead people to derive different conclusions from the same set of observations. As shown on this table, in many scientific fields, scientists have come to different conclusions while studying the same phenomena. In some instances, those conclusions have been wrong (e.g. Aristotle, Ptolemy). In other instances, the different conclusions describe different aspects of the same phenomena (Newton and Einstein).
o Confirmation bias is not a good thing in science. As the name suggests, it is a form of bias. That bias improperly confirms a researcher’s belief about the outcome of an inquiry. There are many reasons why confirmation bias may occur in a scientific investigation. Some of the reasons are listed on this slide.
o Poor experimental design or data is a major cause of confirmation bias. Sometimes, preliminary studies are interpreted as confirmatory studies. For example, in a recent article, it was claimed that bald men are more likely to be afflicted with Covid-19 disease. This was only an observation in a couple of hospitals, and not the result of a well-designed investigation. Confirmation bias may also occur when correlation is confused with cause-and-effect.
o Here is an example of confirmation bias in the scientific literature.
It is generally assumed by biologists that ants are more aggressive to ants from neighbouring nests than to those from their own nest. A research team in Melbourne decided to examine the scientific papers on the nesting behaviour of ants. They looked at 79 publications, and noticed that only 29% of those were designed as blinded studies. Blinded studies are controlled experiments. In other words, 71% of the published studies did not use a proper experimental design. The researchers also noted that in the studies that were not controlled experiments, the assumed pattern of ant behaviour was reported. However, in the controlled experiments, the reverse was observed. This occurred because in the uncontrolled studies, the researchers did not check for aggressive behaviour within each nest. They simply assumed that ants were less aggressive towards nest mates, compared to ants from other nests. Thus, the poor experimental design of those studies resulted in a confirmation bias.

Cultural Contribution Knowledge

o Many of those knowledge systems have influenced the development of scientific knowledge.
o Knowledge construction is closely linked with cultural constructs. This means that knowledge construction depends on the languages used in a society, the cultural practices and other factors. In the preceding slides, we looked at how scientific knowledge is constructed. We examined the central role of empiricism, reasoning tools (such as induction and deduction), Occam’s razor, falsification, confirmation bias and paradigm shifts in developing scientific knowledge. Most of the scientific research and knowledge construction that happens around the world is largely the product of European schools of thought.
o However, all cultures in the world have systems for constructing knowledge. In every culture, knowledge is constructed and communicated in ways that are specific to those cultures.
For example, the knowledge systems in indigenous societies are called Traditional Knowledge. Many governments are now tapping into traditional knowledge systems, as those systems have developed different, but relevant, explanations of natural phenomena.
o As with science, cultural observational knowledge is based on developing inferences from observations. However, there is little or no experimentation such as that seen in science. Over the years, cultural observational knowledge has made significant contributions to scientific advancement.
o One example of cultural observational knowledge that is common to many societies is astronomy. People around the world realised that many natural phenomena can be attributed to astronomical events. For example, changes in the seasons, weather and tides were associated with the positions of the sun and the moon in the sky. Agriculture was dependent on seasonal information. The patterns of stars (constellations) in the sky could provide positional and directional information for travel. Therefore, many cultures developed systems for measuring and analysing astronomical data. As shown in this slide, observatories have been identified in ancient Mexican (Mayan), Egyptian, Indian and Chinese societies. Much of this information has been used to construct knowledge of the Earth (for travel and trade) and astronomical phenomena.
o The indigenous cultures in Australia are ancient and have existed in this continent for more than 60,000 years. There were more than 400 Aboriginal nations in Australia. There were different languages and cultural practices in those societies. They constructed knowledge of natural phenomena and transmitted that knowledge mainly in the oral tradition. For example, Aboriginal societies studied the night sky and developed mythical tales of constellations and other astronomical phenomena.
o The emu in the sky describes the region of the Milky Way that is adjacent to the Southern Cross, and forms part of the Dreaming narrative about creation. Other stories were built around the Pleiades system and the Orion constellation. In addition to mythologies, the night sky also provided information about seasonal changes, and guideposts for celestial navigation. Time, calendars and information about seasons were developed using astronomical knowledge. Some other uses of astronomical knowledge in Aboriginal societies are indicated in this slide.
o Aboriginal societies also developed extensive knowledge about local Australian ecosystems. This knowledge is referred to as Traditional Ecological Knowledge. That knowledge is currently used in Australian states and territories for managing ecosystems and landcare. Their understanding of the role of bushfires in the functioning of local ecosystems is proving to be critical for modern fire management systems. Another area of traditional knowledge that has received scrutiny is bush medicine. Traditional knowledge is being used to identify new substances from native plants that have medicinal and therapeutic value, including antibiotics, antimicrobials and antiviral products. Thus, contemporary society benefits from traditional knowledge as it becomes integrated with scientific knowledge.
o The use of traditional knowledge for the development of medicinal, therapeutic or health products has implications for commercialisation practices and intellectual property.
o All civilisations developed knowledge of natural phenomena. As shown in this slide, the cultural observational knowledge of many civilisations influenced the development of modern science. The Islamic cultures of the Middle Ages amalgamated and advanced the knowledge systems of those civilisations and formed the basis of scientific development in Renaissance Europe.
o Greece: parallax measurements and geometry; geocentric and heliocentric models of the solar system
o Egypt: curvature of the Earth (Aristarchus), calendar, brewing, agriculture
o India: metallurgy, surgery, medicine, mathematics, astronomy
o China: metallurgy, printing, explosives, paper, irrigation, acupuncture
o Islamic: medicine, physics, chemistry, biology, astronomy
o Here is an example of cultural observations that enhanced scientific understanding of natural phenomena. During the Middle Ages, the Islamic world was a centre of learning. As a result of military conquests and trading relations, Islamic cultures in the Middle East and North Africa had access to knowledge and data from many parts of the world. Islamic scholars translated the works of the ancient Greeks, Romans and Egyptians. They assembled information about scientific discoveries from far-off places, such as India and China. Universities in the Middle East were highly regarded centres of learning.
o Another example of the contribution of cultural observational knowledge to science is the discovery of the anti-malarial compound, Artemisinin. A Chinese literary work, dating back to ~300 A.D., suggested that preparations of the herb Artemisia annua (sweet wormwood) may provide protection against malaria. Youyou Tu decided to extract the active ingredient from this plant so as to develop the substance as a therapeutic. After many years of sustained effort, Tu isolated the extract and showed that it was effective against both types of malarial parasites. Tu then proceeded to determine the chemical structure of the active ingredient, which was called Artemisinin (after the scientific name of the herb). Artemisinin is now part of the established anti-malarial therapy used around the world to treat the condition. Research is also underway to develop new therapeutics, based on the molecular structure of Artemisinin. For her efforts, Tu received the Nobel Prize in Physiology or Medicine in 2015.

Paradigm Shift

o Thomas Kuhn was a philosopher of science who explored scientific epistemology. For any scientific discipline, the set of concepts, theories, research methods and postulates used by scientists makes up the paradigm of that discipline.
o Kuhn said that the scientific paradigm consists of normal science and puzzle-solving science. Normal science is everyday science, where scientists conduct inquiries into the paradigm, for example, by verifying hypotheses. The ‘discoveries’ of normal science are expected findings, based on the prevailing paradigm. Over time, anomalies will appear in the prevailing paradigms.
o Those anomalies are referred to as puzzle-solving science. Scientists then conduct investigations to understand the anomalies. The discoveries in the puzzle-solving sciences are unexpected and lead to paradigm shifts (also called scientific revolutions). We will explore these ideas further in the subsequent slides.
o According to Kuhn, a paradigm shift occurs in 3 stages:
❖ In Stage 1, normal science dominates. Scientists go about verifying the prevailing concepts through observation and experimentation. During this stage, hypotheses that are supported by evidence are retained, while those that are not are rejected. This builds into a body of scientific knowledge that forms the paradigms for the various scientific disciplines.
❖ In Stage 2, scientists note that there are anomalies to the prevailing paradigms. Anomalies are experimental data or observations that cannot be explained by contemporary concepts.
❖ In Stage 3, the anomalies force scientists to search for new explanations. When identified, the new concepts, ideas and explanations result in a paradigm shift.
    1. The prevailing view of how populations change over time was the model proposed by Lamarck. According to him, the events that an individual experiences in its lifetime will be transmitted to the offspring. This model is called the Inheritance of Acquired Characters.
For example, the children of a person engaged in physical labour will develop a stronger physique.
    2. There were many anomalies that could not be explained by this paradigm. One such anomaly came from the experiment conducted by the German biologist August Weismann. He took a population of mice and amputated their tails. Those mice were allowed to breed. All of the offspring had normal tails. Once again, he amputated their tails and bred them. He did this for 19 generations, but all of the offspring in every generation had normal tails. Therefore, this observation was an anomaly to the theory of the Inheritance of Acquired Characters.
    3. Charles Darwin then proposed the Theory of Evolution by Natural Selection, which was a different model to the Inheritance of Acquired Characters. Eventually the paradigms of evolutionary biology shifted, and Lamarck’s theory was discarded. Darwin’s theory of evolution, together with advances in genetics and cell biology, changed our understanding of the inheritance of biological traits in populations.
o When anomalies appear in a scientific discipline, there are 2 ways by which paradigm shifts may occur. The first is theory replacement. This describes the situation when a new theory replaces an old theory. In modelling the arrangements and movements of planets, the geocentric model was replaced with the heliocentric model.
o The second way by which paradigm shifts occur is called theory modification. Here, the old theory is not replaced, but modified. In the example shown on this slide, the paradigm of Newtonian mechanics was modified to include Einstein’s theories of relativity. Both theories are relevant and valid, but are used to describe different systems.
o Both Galileo and Newton laid the foundations of classical mechanics. Classical mechanics describes the motion of macroscopic (large) objects, including those that are at rest. This movement is described in terms of the masses of objects, and the forces acting on them.
Parameters such as distance, speed, time and space characterise the properties of a moving body. The principles of classical mechanics were very successful because, if we know some initial conditions of a moving body, we can predict certain future outcomes. For example, if the speed of a moving object is known, we can calculate the time taken for the object to travel a certain distance. Classical mechanics applied equally to everything, everyone and everywhere: for example, Newton’s 2nd Law of motion (F = ma) is as applicable on the moon as it is on the Earth. Many aspects of modern society are based on the principles of classical mechanics, for example, calculating travel times.
o Despite the success of classical mechanics, scientists noted that it could not explain some specific types of phenomena, for example, very small objects (e.g. atoms), very large objects (e.g. stars), or objects moving very fast (near the speed of light). Indeed, Newtonian mechanics was incompatible with Maxwell’s description of electromagnetism.
o Einstein explored some of these anomalies using a combination of ideas in physics and mathematics. Special relativity describes the physics of particles and waves moving at, or close to, the speed of light. On the other hand, general relativity deals with the physics of large objects (gravitation). Relativity did not displace the principles of Newtonian physics, but modified them. For example, special relativity provides an equation for calculating the momentum of a particle that is moving close to the speed of light. However, for a particle that is moving at slower speeds, Einstein’s equation approximates to the equation for momentum in classical mechanics. Special relativity also introduced new concepts in physics, such as length contraction, time dilation, relativistic mass, a universal speed limit, and mass–energy equivalence (E = mc²).
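The claim that the relativistic momentum equation approximates to the classical one at low speeds can be checked numerically. The sketch below assumes the standard textbook forms, p = mv for classical momentum and p = γmv with γ = 1/√(1 − v²/c²) for relativistic momentum; the notes do not write these equations out, and the function names are hypothetical ones chosen for this example.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact by SI definition)


def classical_momentum(mass_kg, speed_m_s):
    """Classical (Newtonian) momentum: p = m * v."""
    return mass_kg * speed_m_s


def relativistic_momentum(mass_kg, speed_m_s):
    """Relativistic momentum: p = gamma * m * v (assumed standard form)."""
    if abs(speed_m_s) >= C:
        raise ValueError("speed must be below the speed of light")
    gamma = 1.0 / math.sqrt(1.0 - (speed_m_s / C) ** 2)
    return gamma * mass_kg * speed_m_s


def momentum_ratio(fraction_of_c):
    """Ratio of relativistic to classical momentum at v = fraction_of_c * c."""
    speed = fraction_of_c * C
    return relativistic_momentum(1.0, speed) / classical_momentum(1.0, speed)


if __name__ == "__main__":
    # Compare the two formulas from everyday speeds up to near light speed.
    for fraction in (0.000001, 0.001, 0.1, 0.5, 0.9, 0.99):
        print(f"v = {fraction:>8} c   relativistic / classical = "
              f"{momentum_ratio(fraction):.6f}")
```

At small fractions of the speed of light the ratio is indistinguishable from 1, so the classical equation is an excellent approximation; the ratio grows without limit as the speed approaches c, which is where the two paradigms give different answers.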
General relativity extends Newton’s Theory of Universal Gravitation. According to Newton’s model, gravity is an attractive force between two objects. General relativity extends that idea by describing gravitation as the warping of space-time around massive objects. Indeed, some aspects of classical mechanics are seen to be an approximation of special relativity at low velocities, and special relativity is an approximation of general relativity in low gravitational fields.
o Modern scientific practice is based on philosophical thinking that developed over many centuries in Europe, as well as the cultural observational knowledge of many societies around the world. Modern science is a powerful method of inquiry. The findings of scientific inquiry eventually develop into a dynamic body of knowledge, known as science. Empiricism is a strong force that shapes scientific inquiry: the emphasis on evidence and experience is central to the development of scientific understanding.
Influences on Current Scientific Thinking
Ethics, Morality, & the Law
o Consider the following scenario: imagine you are driving a car and are approaching a traffic intersection. The light has just turned red. What do you do? You will, I hope, come to a stop at the intersection. Now, think about your reasons for stopping at the intersection: did you do it because it is illegal to cross the intersection when the light is red? Or did you think about the safety issues of not following the traffic signals? What if, after having stopped at the intersection, you notice that no other vehicles are approaching the intersection: will you be tempted to cross the intersection even though the lights are still red?
o Since we live in societies among many other people, our behaviours will affect the people around us. The decisions we make may also affect the other living things with whom we share the planet, as well as affect its environments.
Therefore, in all societies, rules of behaviour and norms are developed in the interest of the greater good. Essentially, there are three influences on human behaviour: ethics, morality and the law.
o Scientific ethics is concerned with the truth and integrity of scientific practice. The processes used to conduct investigations, and to analyse and communicate the findings, should all be based on honesty. That honesty is one of the most powerful features of science.
o Ethically Questionable Practices
❖ The case of cold fusion shows one aspect of questionable scientific ethics. Cold fusion means generating energy from nuclear fusion reactions at ambient temperatures. Stanley Pons and Martin Fleischmann at the University of Utah concluded that they had found evidence of deuterium fusion occurring at room temperature (this was a ‘holy grail’ of energy research). Rather than publishing their findings in a peer-reviewed journal, they announced their findings at a press conference. However, other scientists could not replicate Pons and Fleischmann’s experiments. A few weeks later, the U.S. Department of Energy concluded that Pons and Fleischmann had not achieved cold fusion. Although their work was not considered to be scientific fraud, it was unethical as they did not follow the scientific process.
o Ethical Frameworks
❖ The principle of autonomy: making voluntary and informed decisions (i.e. capacity to act intentionally, with understanding, and without controlling influences)
❖ The principle of non-maleficence: no subject in a study is intentionally harmed or injured, either through acts of commission or omission
❖ The principle of beneficence: produce beneficial outcomes, and take positive steps to prevent and to remove harm from the patient
❖ The principle of justice: equal access to care, benefits, compensation
❖ The principle of confidentiality: maintaining anonymity and privacy.
❖ The principle of non-deception: maintaining open and truthful communications
Current Influences on Scientific Thinking
o Economic
o Political
o Global
Influence of Ethical Frameworks on Scientific Research
o Human Research
❖ Human experimentation refers to scientific investigations of humans (it excludes studies in other areas, such as social science, education, etc.). Human experimentation may involve manipulation (e.g. clinical trials), or be purely observational. The history of human experimentation is a mixed one. Although our knowledge of human biology advanced in leaps and bounds through experimentation, many of those studies would not be conducted in the original manner today.
❖ For example, vaccination is a powerful medical therapy that has improved the human condition worldwide. The English doctor, Edward Jenner, is credited with the first scientific demonstration of vaccination (strictly speaking, Jenner inoculated with cowpox material; the older practice of inoculating with smallpox material is called variolation). However, by today’s standards, Jenner’s studies would be considered to be unethical.
❖ When the details of the Tuskegee Study of Untreated Syphilis were revealed to the public, they caused an international uproar. In this study, 399 syphilitic and 201 non-syphilitic African-American men were part of a study to determine the physiological effects of syphilis infections on humans. After reading the synopsis of the research on the website indicated in the slide, you may notice the following:
▪ The participants were not told that they were involved in a human experiment.
▪ The participants were given inducements to take part in the experiment.
▪ The participants were denied treatment for syphilis infections, even though the treatment became available during the study.
❖ All human experimentation in Australia is governed by the ethical frameworks developed by the National Health and Medical Research Council, also referred to as the NHMRC.
These ethical frameworks are based on the same principles of universal ethics. The four key frameworks are:
❖ In Australia, at institutions that undertake human research, all research proposals must be approved by the Human Research Ethics Committee (HREC). Human research cannot be conducted unless an ethics permit has been obtained.
❖ HRECs consist of researchers, non-researchers and community members.
❖ The HRECs evaluate applications based on:
1. How is the research question/theme identified or developed?
2. How do the research methods align with the research aims?
3. How will the researchers and the participants engage with one another?
4. How will the research data or information be collected, stored, and used?
5. How will the results or outcomes be communicated?
6. What will happen to the data and information upon completion of the project?
o Experimentation on Animals
❖ Several ethical frameworks also govern the use of animals in research. Animals are used in many areas of study. The reasons for using animals in experimentation are two-fold:
1. To learn about the biology and behaviour of the animals themselves – this is important for veterinary science, agriculture, management of wildlife (e.g. zoos, aquaria, parks) and conservation.
2. To use animals as models of human biology – since there are many biochemical, physiological and genetic similarities between many animals and humans, animal models can provide a wealth of information about how human biology works.
❖ There are several reasons why animals are used for biomedical research:
1. Animals are biologically very similar to humans (mice and humans share more than 98% genetic similarity)
2. Animals are susceptible to many of the same health problems as humans – cancer, diabetes, heart disease, etc.
3. With a shorter life cycle than humans, animal models can be studied throughout their whole life span and across several generations.
❖ There are both scientific and ethical imperatives for looking after animals in research. Animals that are not cared for usually experience stress, which tends to affect other physiological systems. This may lead to anomalous results, as well as results that are not reproducible.
❖ The ethical frameworks for animal experimentation apply to vertebrate animals. These are animals with backbones (fish, amphibians, reptiles, birds and mammals). This is because vertebrates can experience pain, while current evidence suggests that invertebrates do not experience pain.
❖ In Australia, individual states and territories are responsible for overseeing animal ethics. As for human experimentation, institutions must have Animal Ethics Committees (AECs) to review all research involving vertebrate animals. Researchers must obtain ethics permits to conduct their research.
❖ AECs focus on the care of animals in experiments, as well as the disposal of animals after the investigations are completed. There are strict guidelines on how animals should be housed, fed, cleaned and maintained during the investigations. If animals are to be euthanised, then the researchers must use approved methods to kill the animals. These methods have been approved by expert committees so as to reduce the stress and pain burden on animals.
❖ The 3R rule was devised to reduce reliance on the use of animals in experiments:
▪ Replacement of animals with other methods – where possible, viable alternatives to the use of animals should be explored.
▪ Reduction in the number of animals used – researchers should use the minimum number of animals in their experiments. There are statistical models that they can use to determine the minimum number of animals that can be used in an experiment without affecting the reliability and validity of their findings.
▪ Refinement of techniques used to minimise the adverse impact on animals – researchers should always use the latest findings regarding the manipulation of animals so as to minimise pain and stress on them.
❖ The following is a pathway for the discovery of new medical treatments. In vitro and in silico refer to studies that are conducted with cell or tissue cultures, or to discoveries made with molecular arrays and computer simulations. Small animal research is usually the starting point for the development of a new therapeutic. Once it is proven, the therapeutic is tested on large animals. Large animal studies provide important information about physiological responses to the therapeutic. Only when its potential is realised in large animals will the product be used in human experiments (clinical trials). Both animal and human ethics apply at multiple stages in this discovery pipeline.
o Biobanks
❖ Biomedical researchers often need access to biological samples for their experiments. In some types of research, they may need access to samples from specific sectors of the population (e.g. for disease tissues). Biobanks are repositories of biological samples that researchers can use. To ensure that researchers have access to all relevant information about the samples in the biobanks, there is a detailed cataloguing process for all samples.
❖ Information about the type, origin and date of collection of each sample, along with other details, is kept in a database. The tissue samples are stored in various ways, but usually in cold storage (-70 °C or liquid nitrogen). One example of the use of biobank materials is the research conducted by the Kathleen Cuningham Foundation National Consortium for Research on Familial Breast Cancer (KConFaB).
❖ There, the researchers are looking to determine the:
▪ population rates of mutations in breast cancer genes;
▪ kinds of mutations that predispose to breast and ovarian cancer;
▪ risk of breast and other types of cancer;
▪ age at which cancers occur; and
▪ effect of lifestyle and environmental factors on the risk of developing cancer and age of onset.
❖ More than 100 research projects worldwide rely on samples from this biobank.
❖ Many ethical issues surround the collection, maintenance and use of the materials in biobanks. Some of those issues are listed below.
▪ Informed consent – all donors must be willing participants who have been informed about how their donated samples may be used. Proper communications should also be set up between the biobanks and the donors. Most importantly, no one must be compelled to donate samples. Vulnerable donors, including those who cannot make informed decisions, need to be respected and protected. Where relevant, cultural sensitivities must be taken into account.
▪ The information contained in biobanks must be treated confidentially. The privacy of donors must be respected. Most biobanks have coded identification systems where confidential information is not released, except as required by the researchers.
▪ Some research activities may result in commercially valuable discoveries. Biobanks must ensure that the benefits of research and development are shared with all people involved, in accordance with the law.
Use of Research Data
o The communication of research findings is at the core of the scientific enterprise. Research data, once verified, are the raw materials of scientific knowledge construction. For scientists, the generation and publication of scientific data are the hallmarks of professionalism. Peer-reviewed publications are the primary means of such communication, but scientists also use other forms of communication.
Most research funding agencies, such as the Australian Research Council and the National Health and Medical Research Council, require scientists to publish the data they generate through funded research programs. When applying for such funding, scientists have to indicate how they plan to publicise their findings. This creates transparency, and the community can see the benefits of supporting scientific research.
o Sometimes, research may produce data that should not be openly shared. For example, research that has military or security implications, or data that may be used to achieve harmful ends (e.g. bioterrorism), should not be published. Furthermore, discoveries with commercial potential may not be published in full (at least, critical data may be withheld).
o Data sharing is beneficial, as it:
❖ Encourages further scientific enquiry and promotes innovation.
❖ Leads to new collaborations between data users and data creators.
❖ Maximises transparency and accountability.
❖ Reduces the cost of duplicating data collection.
o The ethics of data sharing centres on the following questions:
1. What data or information are required to achieve the objectives of the project?
2. How and by whom will the data or information be generated, collected and accessed?
3. How and by whom will the data or information be used and analysed?
4. Will the data or information be disclosed or shared and, if so, with whom?
5. How will the data or information be stored and disposed of?
6. What are the risks associated with the collection, use and management of data or information and how can they be minimised?
7. What is the likelihood and severity of any harm/s that might result?
The Scientific Research Proposal Notes
Table of Contents
Developing the Question & Hypothesis
Reliability, Validity & Accuracy
What makes a source 'reliable'?
Using citations to locate other relevant journal articles
How to find more full-text articles
Scientific Research Proposal
Plan to Investigate the Scientific Hypothesis
Referencing Protocols
Methodology & Data Collection
Uncertainty in Experimental Evidence
Use of Errors
Quantitative & Qualitative Research Methods
Methods Used to Obtain Large Data Sets
Processing Data for Analysis
Impact of Kepler Telescope Data
Developing the Question & Hypothesis
Reliability, Validity & Accuracy
▪ When a scientist repeats an experiment with a different group of people or a different batch of the same chemicals and gets very similar results, those results are said to be reliable.
Reliability is measured by a percentage – if you get exactly the same results every time then they are 100% reliable.
▪ Try holding a ruler above a friend’s open hand and dropping it – they have to catch the ruler but may not move until they see the ruler start to move. Note down the measurement where the ruler was caught. Do this ten times and calculate the mean (average) result.
▪ Is the ‘dropping a ruler’ experiment a reliable measure of reaction time?
▪ Validity describes whether the results of an experiment really do measure the concept being tested. Does seeing how far a ruler can drop through someone’s hand really measure reaction time? What other variables may be influencing the results?
▪ Is the ‘dropping a ruler’ experiment a valid measure of reaction time?
▪ Accuracy describes how well a measuring instrument determines the variable it is measuring. The term can be used in two ways:
▪ An accurate measuring instrument, say a thermometer, is one whose readings confirm a known result.
▪ The level of accuracy of a measuring instrument determines the detail to which it can measure. A micrometer measures length to a greater level of accuracy than a ruler, which in turn measures length to a greater level of accuracy than a ‘clicker’ wheel.
▪ In order to be accurate in their work, scientists need first to select a measuring instrument that allows an appropriate level of accuracy (e.g. a micrometer for the diameter of a piece of wire and a ruler marked in mm for its length) and then to calibrate it. Calibrating an instrument involves measuring already known quantities to assess how accurately it is working.
What makes a source 'reliable'?
o I would be looking for journal articles which:
❖ Are published in a high ranking journal (usually ranked according to how many citations articles in that journal receive; see the discussion at https://en.wikipedia.org/wiki/Journal_ranking).
❖ Have a large number of citations (depending on how old the article is), meaning that other researchers refer to this paper in their own research articles. The number of citations is a measure of how highly regarded the work is by other researchers in the field.
❖ Have authors from well-regarded universities/institutions (but this is not really as important as the first two).
❖ There may also be circumstances in which reliable information is available on a website maintained by a reputable institution such as NASA (for example, in my sample investigation, the NASA Earth Observatory).
Using citations to locate other relevant journal articles
o When you find a highly reliable and relevant article, look at both the articles it cites and the articles which cite it to find other highly relevant articles.
o You can find the articles that cite your article using, for example, Google Scholar. When I search for my review article on "the albedo of earth" by Stephens et al. on Google Scholar I can see that it has been cited by 30 other articles. I click on the "citing literature" link to find those articles. In this way you 'follow your nose' through the research until you find the information you want and/or come up against the edge of what is known, with articles published in the last year or two.
How to find more full-text articles
o You can consider using the browser add-on "Unpaywall", which finds legal full-text versions of articles (often stored in university repositories of their researchers' work or in preprint archives). If you are having trouble finding a full-text version of an article, you can also sometimes find the researcher's own personal webpage, which may list full-text versions (for example, I keep full-text versions of my articles available on my personal website), as it is in the researcher's interests that their articles be accessible.
o As mentioned on the "tools" page, joining the state and national libraries may also help you find full-text papers.
Scientific Research Proposal
Plan to Investigate the Scientific Hypothesis
o The Overall Strategy
o Methodology
o Data Analysis
o Representation & Communication of the Scientific Research
o Timelines
o Benchmarks
Referencing Protocols
o APA
o Harvard
o MLA
Methodology & Data Collection
Uncertainty in Experimental Evidence
Systematic Errors
Two types:
✓ Offset uncertainty - all measurements are larger or smaller than the "true" value by a constant amount. For example: a thermometer is not in sufficiently good thermal contact with a hot object, so the readings are all lower than the "true" temperature of the object. See this paper for an example of a systematic offset: https://fathomingphysics.nsw.edu.au/wp-content/uploads/2017/07/TEHumphrey_VCalisa_Phys_Teach_vol_52_iss_3_142_2014.pdf
✓ Gain uncertainty - measurements are larger or smaller than the "true" value by a fixed percentage. For example: a long measuring tape is stretched so that all 1 cm markings on the tape are now actually separated by 1.01 cm. Each reading will give a value that is about 1% lower than the "true" value.
Random Errors
Random errors correspond to the "scatter" in experimental data, and will result in readings that are scattered around the "true" value, usually with a normal distribution. The data in the paper we looked at earlier (https://fathomingphysics.nsw.edu.au/wp-content/uploads/2017/07/TEHumphrey_VCalisa_Phys_Teach_vol_52_iss_3_142_2014.pdf) also show significant random error, with data points distributed above and below the line of best fit.
Use of Errors
Quantitative & Qualitative Research Methods
o Qualitative Research: This can be described as research that cannot easily be communicated or understood in numerical terms. It usually involves open-ended questions and responses (such as interview questions or case studies).
o Quantitative Research: This is an approach for testing objective theories by examining the relationships among variables. The variables can be measured and analysed numerically.
o Mixed Methods Research: Contains elements of both types of research.
o Different Types of Scientific Inquiry
❖ a commitment to deductive testing (i.e. the idea that experimental/observational evidence determines if a theory can be accepted)
❖ an experimental design that protects against bias
❖ a consideration of alternative explanations of the results
❖ the interpretation of data (either qualitative or quantitative) to produce results that are reproducible and generalisable
❖ a discussion of how the research relates to other work done in that field
Methods Used to Obtain Large Data Sets
o Remote Sensing
❖ Remote sensing is obtaining information about an area or phenomenon through a device that does not touch the area or phenomenon under study.
❖ Passive remote sensors detect natural energy that is reflected or emitted from an observed object or scene (most commonly, reflected sunlight). Examples include a camera or a spectrometer (or your eyes!).
❖ Active remote sensors provide their own energy (electromagnetic radiation) to illuminate the object or scene they are observing, and then detect the radiation that is reflected or backscattered from that object. Examples include radar (radio detection and ranging) and lidar (light detection and ranging) instruments.
❖ Many remote sensing devices are on board satellites that monitor the Earth from space.
o Streamed Data
❖ There are many devices now used in research which can operate in an autonomous or semi-autonomous mode in which data is recorded continuously and then "streamed" out of the sensor for further processing and analysis.
❖ While this technology has opened new opportunities in research, it has also brought challenges. With advances in information technology it has become possible to record very large amounts of data in a short time.
In some cases, e.g. the Square Kilometre Array (SKA) discussed below, it is necessary to process the data in real time to reduce the amount of data that is placed in long-term storage. In other systems the constraints may be on the communication link from the instrument. For example, the Kepler telescope was situated in a Sun-centred orbit and had a limited-capacity radio link back to Earth (once a month only) (see: https://www.nasa.gov/mission_pages/kepler/spacecraft/index.html). Similar considerations apply in sensor networks where the communication back to base is limited by the amount of power required.
❖ Other examples include:
▪ Internet of People (consisting of wearable devices)
▪ Social media
▪ Financial transactions
▪ Industrial Internet of Things
▪ Cyberphysical Systems (requiring real-time responses)
▪ Satellite and airborne monitors
▪ National Security
▪ Astronomy
▪ Light Sources
▪ Instruments like the LHC
▪ Sequencers (all involving large volumes of data)
▪ Data Assimilation (where there is sensitivity to latency)
▪ Analysis of Simulation Results
▪ Steering and Control.
Processing Data for Analysis
Impact of Kepler Telescope Data
o The Kepler and K2 missions have provided an unprecedented data set with a precision and duration that will not be rivalled for decades. Even though the data has already contributed to nearly 2,500 scientific publications, the scientific community continues to extract new discoveries from the archive data every day.
o To help new users understand where there may be important scientific gains left to be made in analysing Kepler data, and to encourage the continued use of the archives, we have prepared a white paper which discusses a non-exhaustive list of 21 important data analysis projects which can be executed using the public data that are readily available in the archives today.
Each project contains a link to an issue on the GitHub repository of the white paper where we invite researchers to discuss their ideas or progress towards resolving the challenge.
o The studies discussed in the paper show that many of Kepler's contributions still lie ahead of us, owing to the emergence of complementary new data sets, novel data analysis methods, and advances in computing power.
The Data, Evidence and Decisions Notes
Table of Contents
Patterns & Trends
Data vs. Evidence
Qualitative vs. Quantitative Data Sets
o Content & Thematic Analysis
o Descriptive Statistics
Tools for Data Representation
o Spreadsheets
o Graphical Representations
o Models
o Digital Technologies
Limitations of Data Analysis & Interpretation
o Quantitative Data
o Qualitative Data
Statistics in Scientific Research
Descriptive Statistics
o Mean
o Median
o Standard Deviation
Performance Measures
o Error
o Accuracy
o Precision
o Bias
o Data Cleansing
Statistical Tests
o Student’s t-test
o Chi-square test
o F-test
Bivariate Correlation
o Correlation Coefficient
Correlation vs. Causation
Decisions from Data & Evidence
Collective & Individual Decision-Making
o Collective Decision-Making
o Individual Decision-Making
Impact of New Data on Established Scientific Ideas
o Gravitational Waves on General Relativity
Data Modelling
Data Modelling Techniques
o Predictive
o Statistical
o Descriptive
    o Graphical

Patterns & Trends

Data vs. Evidence
o Data is raw information and has no intrinsic meaning on its own.
o Evidence has to be evidence for or of something: an argument, an opinion, a viewpoint or a hypothesis.

Qualitative vs. Quantitative Data Sets
o Content & Thematic Analysis
  ❖ Content analysis is the study of documents and communication artefacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner.
  ❖ Thematic analysis is one of the most common forms of analysis within qualitative research. It emphasizes identifying, analysing and interpreting patterns of meaning within qualitative data.
o Descriptive Statistics
  ❖ A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information, while descriptive statistics is the process of using and analysing those statistics.

Tools for Data Representation
o Spreadsheets
o Graphical Representations
o Models
  ❖ Physical, computational and/or mathematical
o Digital Technologies

Limitations of Data Analysis & Interpretation
o Data analysis and interpretation is the process of assigning meaning to the collected data and determining the conclusions, significance, and implications of the findings. It is a crucial step in the research process. In most research studies, analysis follows data collection.
o There are two main kinds of data to interpret.
o Quantitative Data
  ❖ Quantitative data is numerical and is usually structured, meaning it is more rigid and defined. This kind of data is measured using values and numbers, which makes it well suited to statistical analysis.
  ❖ E.g.
    ▪ Experiments
    ▪ Surveys
    ▪ Metrics
    ▪ Tests
o Qualitative Data
  ❖ Qualitative data is non-statistical and is usually unstructured or semi-structured in nature. This data isn’t necessarily measured using the hard numbers used to build graphs and charts. Instead, it is categorized based on properties, attributes, labels, and other identifiers.
  ❖ E.g.
    ▪ Symbols and images
    ▪ Video and audio recordings
    ▪ Texts and documents
    ▪ Observations and notes
o There are many issues that researchers should be aware of with respect to data analysis. Some of those issues are as follows.
  ❖ Having the necessary skills to analyse
  ❖ Simultaneously selecting data collection methods and appropriate analysis
  ❖ Drawing unbiased conclusions
  ❖ Unsuitable subgroup analysis
  ❖ Lack of clearly defined and objective outcome measurements
  ❖ Providing honest and exact analysis
  ❖ Data recording process
  ❖ Splitting up ‘text’ when analysing qualitative data
  ❖ Accuracy, authenticity and validity

Statistics in Scientific Research

Descriptive Statistics
o Mean
o Median
o Standard Deviation

Performance Measures
o Error
o Accuracy
o Precision
o Bias
  ❖ Bias is any trend or deviation from the truth in data collection, data analysis, interpretation and publication which can cause false conclusions. Bias can occur either intentionally or unintentionally.
o Data Cleansing
  ❖ Data cleansing is the process of detecting and correcting corrupt or inaccurate records in a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.

Statistical Tests
o Student’s t-test
  ❖ Student’s t-test, in statistics, is a method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown.
  ❖ The t distribution is a family of curves in which the number of degrees of freedom (the number of independent observations in the sample minus one) specifies a particular curve. As the sample size (and thus the degrees of freedom) increases, the t distribution approaches the bell shape of the standard normal distribution. In practice, for tests involving the mean of a sample of size greater than 30, the normal distribution is usually applied.
  ❖ It is usual first to formulate a null hypothesis, which states that there is no effective difference between the observed sample mean and the hypothesized or stated population mean, i.e., that any measured difference is due only to chance.
  ❖ In an agricultural study, for example, the null hypothesis could be that an application of fertilizer has had no effect on crop yield, and an experiment would be performed to test whether it has increased the harvest. In general, a t-test may be either two-sided (also termed two-tailed), stating simply that the means are not equivalent, or one-sided, specifying whether the observed mean is larger or smaller than the hypothesized mean. The test statistic t is then calculated. If the observed t-statistic is more extreme than the critical value determined by the appropriate reference distribution, the null hypothesis is rejected. The appropriate reference distribution for the t-statistic is the t distribution. The critical value depends on the significance level of the test (the probability of erroneously rejecting the null hypothesis).
  ❖ For example, suppose a researcher wishes to test the hypothesis that a sample of size n = 25 with mean x̄ = 79 and standard deviation s = 10 was drawn at random from a population with mean μ = 75 and unknown standard deviation. Using the formula for the t-statistic,
  \[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{79 - 75}{10 / \sqrt{25}} \]
  ❖ The calculated t equals 2.
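As a quick check of the arithmetic in this example, the following Python sketch computes the one-sample t-statistic from the figures given. The function names are illustrative and do not come from the notes. The critical value 2.064 is the tabulated two-sided value for α = 0.05 on 24 degrees of freedom, and it is passed in by hand here rather than computed.

```python
import math


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Return the one-sample t-statistic and its degrees of freedom."""
    # Standard error of the mean: s / sqrt(n)
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - pop_mean) / standard_error
    return t, n - 1


def rejects_null_two_sided(t, critical_value):
    """Two-sided decision: reject only if |t| exceeds the critical value."""
    return abs(t) > critical_value


# Figures from the worked example: n = 25, mean 79, s = 10, mu = 75.
t_stat, dof = one_sample_t(79, 75, 10, 25)
print(t_stat, dof)  # 2.0 24

# 2.064 is an assumed table lookup (alpha = 0.05, 24 degrees of freedom).
print(rejects_null_two_sided(t_stat, 2.064))  # False
```

Keeping the critical value as an explicit argument avoids depending on a statistics library; a package that provides the t distribution could supply it instead.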
For a two-sided test at a common level of significance α = 0.05, the critical values from the t distribution on 24 degrees of freedom are −2.064 and 2.064. The calculated t does not exceed these values; hence the null hypothesis cannot be rejected with 95 percent confidence. (The confidence level is 1 − α.)
  ❖ A second application of the t distribution tests the hypothesis that two independent random samples have the same mean. The t distribution can also be used to construct confidence intervals for the true mean of a population (the first application) or for the difference between two sample means (the second application).
o Chi-square test
  ❖ A chi-square (χ²) statistic measures how expectations compare to actual observed data (or model results). The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. For example, the results of tossing a coin 100 times meet these criteria.
  ❖ There are two main kinds of chi-square tests: the test of independence, which asks a question of relationship, such as, "Is there a relationship between gender and SAT scores?"; and the goodness-of-fit test, which asks something like "If a coin is tossed 100 times, will it come up heads 50 times and tails 50 times?"
  ❖ For these tests, degrees of freedom are used to determine whether a certain null hypothesis can be rejected, based on the total number of variables and samples within the experiment.
  ❖ For example, when considering students and course choice, a sample size of 30 or 40 students is likely not large enough to yield statistically significant results. Getting the same or similar results from a study using a sample size of 400 or 500 students is more convincing.
  ❖ In another example, consider tossing a coin 100 times. The expected result of tossing a fair coin 100 times is that heads will come up 50 times and tails will come up 50 times.
The actual result might be that heads comes up 45 times and tails comes up 55 times. The chi-square statistic quantifies any discrepancy between the expected results and the actual results.
o F-test
  ❖ An F statistic is a value you get when you run an ANOVA test or a regression analysis to find out if the means of two or more populations are significantly different. It is similar to a t statistic from a t-test: a t-test will tell you if a single variable is statistically significant, and an F test will tell you if a group of variables is jointly significant.
  ❖ Simply put, if you have a significant result, it means that your results likely did not happen by chance. If your results are not statistically significant, you cannot reject the null hypothesis.
  ❖ You can use the F statistic when deciding whether to reject the null hypothesis. In your F test results, you’ll have both an F value and an F critical value.
    ▪ The value you calculate from your data is the F value, also called the F statistic.
    ▪ The F critical value is the threshold taken from the F distribution for the chosen significance level and degrees of freedom.
  ❖ In general, if your calculated F value in a test is larger than the F critical value, you can reject the null hypothesis. However, the statistic is only one measure of significance in an F test. You should also consider the p value. The p value is determined by the F statistic and is the probability of obtaining a result at least this extreme if the null hypothesis were true.
  ❖ The F statistic must be used in combination with the p value when you are deciding if your overall results are significant. Why? If you have a significant result, it doesn’t mean that all your variables are significant. The statistic is just comparing the joint effect of all the variables together.
  ❖ For example, if you are using the F statistic in regression analysis (perhaps for a change in R squared, the coefficient of determination), you would use the p value to get the “big picture.”
    1.
If the p value is less than the alpha level, go to Step 2 (otherwise your results are not significant, and you cannot reject the null hypothesis). A common alpha level for tests is 0.05.
    2. Study the individual p values to find out which of the individual variables are statistically significant.

Bivariate Correlation
o Correlation Coefficient
  ❖ The Pearson product-moment correlation coefficient is a measure of the strength of the linear relationship between two variables. It is referred to as Pearson's correlation or simply as the correlation coefficient. If the relationship between the variables is not linear, then the correlation coefficient does not adequately represent the strength of the relationship between the variables.
  ❖ The symbol for Pearson's correlation is "ρ" when it is measured in the population and "r" when it is measured in a sample. Because we will be dealing almost exclusively with samples, we will use r to represent Pearson's correlation unless otherwise noted.
  ❖ Pearson's r can range from -1 to 1. An r of -1 indicates a perfect negative linear relationship between variables, an r of 0 indicates no linear relationship between variables, and an r of 1 indicates a perfect positive linear relationship between variables.

Correlation vs. Causation
  ❖ Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.
  ❖ Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.
  ❖ Theoretically, the difference between the two types of relationship is easy to identify: an action or occurrence can cause another (e.g.
smoking causes an increase in the risk of developing lung cancer), or it can correlate with another (e.g. smoking is correlated with alcoholism, but it does not cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.
  ❖ The objective of much research or scientific analysis is to identify the extent to which one variable relates to another variable. For example:
    ▪ Is there a relationship between a person's education level and their health?
    ▪ Is pet ownershi
