MAE 541 Foundation of Data Analysis PDF

Document Details

Uploaded by CapableAwareness299

UiTM

Noor Maizatul Nazuha Mohamad

Tags

data analysis statistics inferential statistics descriptive statistics

Summary

This document is a set of lecture notes for a course on data analysis. It covers fundamental concepts in statistics with specific attention paid to descriptive and inferential statistics. The material explores various types of statistical analysis as well as the different levels of measurement. The document introduces concepts, such as types of variables.

Full Transcript

MAE 541 FOUNDATION OF DATA ANALYSIS
CHAPTER 1
NOOR MAIZATUL NAZUHA MOHAMAD

LEARNING OUTCOMES
By the end of this topic, you will be able to:
- Understand the meaning of statistics.
- Distinguish between descriptive and inferential statistics.
- Understand the meaning of inferential statistics.
- Understand populations and samples.
- Understand variables.
- Understand the scales of measurement.

WHAT IS STATISTICS?
Statistics is the science of:
- Collecting data
- Organizing data
- Presenting data
- Analyzing data
- Interpreting data / decision making

PURPOSE/USES OF STATISTICS
1. Statistics helps in providing a better understanding and accurate description of nature’s phenomena.
2. Statistics helps in the proper and efficient planning of a statistical inquiry in any field of study.
3. Statistics helps in collecting appropriate quantitative data.
4. Statistics helps in presenting complex data in a suitable tabular, diagrammatic and graphic form for easy and clear comprehension of the data.
5. Statistics helps in drawing valid inferences about the population parameters from the sample data, along with a measure of their reliability.

THE IMPORTANCE OF STATISTICS
❖ Be able to read, understand and make sense of the data obtained.
❖ Be able to design experiments, collect data, analyze data, summarize it and possibly make reliable predictions or forecasts for future use.
❖ Use knowledge of statistics to become a good researcher, because researchers have to:
   i. conduct research
   ii. read and evaluate journal articles
   iii. further develop critical thinking and analytic skills
   iv. know when to hire outside statistical help

TYPES OF STATISTICS

Descriptive
✓ Describe and summarize characteristics.
✓ Consist of collecting, organizing, summarizing and presenting data.
✓ Examples: percentage, mean, median, bar chart, pie chart, frequency table.

Inferential
✓ Make inferences from samples about populations.
✓ Involve statistical tests.
✓ Results are used to draw conclusions.
✓ Examples: hypothesis testing (t-test, z-test), forecasting, regression.

Inferential Statistics
o Inferential statistics are methods for using sample data to make general conclusions (inferences) about populations.
o Because a sample is typically only a part of the whole population, sample data provide only limited information about the population. As a result, sample statistics are generally imperfect representatives of the corresponding population parameters.
o The purposes of inferential statistics are:
✓ To analyze the relationship between variables by using statistical tests.
✓ To allow us to make inferences and assumptions about a larger group from the data we collected.
* We cannot collect data on a very large group, so we collect it from a smaller sample.

Population and Samples
Population - A set of all items being studied. The items can be humans, animals, things or others.
Sample - A subset or part of the population.

EXAMPLE 1
A substitute teacher wants to know how students in the class did on their last test. The teacher asks the 10 students sitting in the front row to state their latest test score. He concludes from their report that the class did extremely well.
What is the sample? What is the population? Can you identify any problems with choosing the sample in the way that the teacher did?

EXAMPLE 2
A coach is interested in how many cartwheels the average college freshman at his university can do. Eight volunteers from the freshman class step forward. After observing their performance, the coach concludes that college freshmen can do an average of 16 cartwheels in a row without stopping.
What is the sample? What is the population?
Can you identify any problems with choosing the sample in the way that the coach did?

Simple Random Sampling
Simple random sampling requires every member of the population to have an equal chance of being selected into the sample. In addition, the selection of one member must be independent of the selection of every other member. That is, picking one member from the population must not increase or decrease the probability of picking any other member (relative to the others). In this sense, we can say that simple random sampling chooses a sample by pure chance.

EXAMPLE 3
A research scientist is interested in studying the experiences of twins raised together versus those raised apart. She obtains a list of twins from the National Twin Registry, and selects two subsets of individuals for her study. First, she chooses all those in the registry whose last name begins with Z. Then she turns to all those whose last name begins with B. Because there are so many names that start with B, however, our researcher decides to incorporate only every other name into her sample. Finally, she mails out a survey and compares characteristics of twins raised apart versus together.
1. What is the population?
2. What is the sample?
3. Was the sample picked by simple random sampling?
4. Is it biased?

In Example 3, the population consists of all twins recorded in the National Twin Registry. It is important that the researcher only make statistical generalizations to the twins on this list, not to all twins in the nation or world. That is, the National Twin Registry may not be representative of all twins. Even if inferences are limited to the Registry, a number of problems affect the sampling procedure we described. For instance, choosing only twins whose last names begin with Z does not give every individual an equal chance of being selected into the sample.
Moreover, such a procedure risks over-representing ethnic groups with many surnames that begin with Z. There are other reasons why choosing just the Z's may bias the sample. Perhaps such people are more patient than average because they often find themselves at the end of the line! The same problem occurs with choosing twins whose last name begins with B. An additional problem for the B's is that the “every-other-one” procedure prevented adjacent names on the B part of the list from both being selected. This defect alone means the sample was not formed through simple random sampling.

Stratified Sampling
Since simple random sampling often does not ensure a representative sample, a sampling method called stratified random sampling is sometimes used to make the sample more representative of the population. This method can be used if the population has a number of distinct “strata” or groups. In stratified sampling, you first identify the members of your population who belong to each group. Then you randomly sample from each of those subgroups in such a way that the sizes of the subgroups in the sample are proportional to their sizes in the population.

Let's take an example: Suppose you were interested in views of capital punishment at an urban university. You have the time and resources to interview 200 students. The student body is diverse with respect to age; many older people work during the day and enroll in night courses (average age is 39), while younger students generally enroll in day classes (average age of 19). It is possible that night students have different views about capital punishment than day students. If 70% of the students were day students, it makes sense to ensure that 70% of the sample consisted of day students. Thus, your sample of 200 students would consist of 140 day students and 60 night students. The proportion of day students in the sample and in the population (the entire university) would be the same.
Inferences to the entire population of students at the university would therefore be more secure.

[Figure: Simple Random Sampling vs. Stratified Sampling]

Variables
❑ Variables are properties or characteristics of some event, object, or person that can take on different values or amounts.
❑ Example: An experimenter might compare the effectiveness of four types of antidepressants. In this case, the variable is “type of antidepressant.” When a variable is manipulated by an experimenter, it is called an independent variable. The experiment seeks to determine the effect of the independent variable on relief from depression. In this example, relief from depression is called a dependent variable.
❑ In general, the independent variable is manipulated by the experimenter and its effects on the dependent variable are measured.

Examples

Example #1: Can blueberries slow down aging?
A study indicates that antioxidants found in blueberries may slow down the process of aging. In this study, 19-month-old rats (equivalent to 60-year-old humans) were fed either their standard diet or a diet supplemented by either blueberry, strawberry, or spinach powder. After eight weeks, the rats were given memory and motor skills tests. Although all supplemented rats showed improvement, those supplemented with blueberry powder showed the most notable improvement.
1. What is the independent variable?
2. What are the dependent variables?
Solution:

Example #2: Does beta-carotene protect against cancer?
Beta-carotene supplements have been thought to protect against cancer. However, a study published in the Journal of the National Cancer Institute suggests this is false. The study was conducted with 39,000 women aged 45 and up. These women were randomly assigned to receive a beta-carotene supplement or a placebo, and their health was studied over their lifetime.
Cancer rates for women taking the beta-carotene supplement did not differ systematically from the cancer rates of those women taking the placebo.

Example #3: How bright is right?
An automobile manufacturer wants to know how bright brake lights should be in order to minimize the time required for the driver of a following car to realize that the car in front is stopping and to hit the brakes.

TYPES OF VARIABLES

Quantitative - Measured on a numerical scale

   Discrete
   ▪ Assume only exact values.
   ▪ Numerical responses from a counting process.
   ▪ Examples: number of students, total income, number of accidents, etc.

   Continuous
   ▪ Can be expressed to a certain degree of accuracy.
   ▪ Numerical responses from a measuring process.
   ▪ Examples: length, weight of children, height of adults, etc.

Qualitative or Categorical - Measured on a non-numerical scale
▪ Express a qualitative attribute such as hair color, eye color, religion, favorite movie, gender, and so on.
▪ The values of a qualitative variable do not imply a numerical ordering.
▪ Qualitative variables are sometimes referred to as categorical variables.

Levels of Measurement

1) Nominal Scale - a figurative labeling scheme in which the numbers serve only as labels or tags for identifying and classifying objects. Data are classified into categories and the frequency of each category is counted. It is the lowest scale of measurement.
E.g.: jersey number, identification number, color of car.

2) Ordinal Scale - a ranking scale in which numbers are assigned to objects to indicate the relative extent to which the objects possess some characteristic. It allows us to determine whether an object has more or less of a characteristic than some other object, but not how much more or less. Thus, an ordinal scale indicates relative position, not the magnitude of the differences between the objects.
E.g.: education level (PhD, Master, Bachelor, Diploma), ranking of football players.
3) Interval Scale - contains all the information of an ordinal scale, but it also allows the differences between objects to be compared. Data at this level do not have a natural zero starting point.
E.g.: temperature, opinions, shoe size.

STATISTICAL TERMS

Research Survey: A study done using statistical methods in order to understand a certain problem.
Element: The respondent or object on which data is taken.
Population: A set of all items being studied. The items can be humans, animals, things or others.
Sample: A subset or part of the population.
Sampling Frame: A list of all members in a population, prepared so that the sample can be randomly selected from it. Examples: a list of students' names in a class, a list of companies registered in KLSE, or others.
Pilot Survey: A study done on a small scale before the actual survey. The purpose is to identify any problems that may arise during the actual survey and to pre-test the relevance of the questionnaire, so that it can be improved for use in the actual survey.
Statistic: A summary numerical value calculated from a sample. Example: the mean age and standard deviation of age of students, calculated from a random sample of students.
Census: A survey that takes in all members of a defined population. The best example is a population census. A census is normally taken if the population to be studied is small.
Parameter: A summary measure or characteristic obtained from a population.
Variables: Characteristics of the population under study.

EXAMPLE
Consider a population of 120,000 students in Terengganu. It was found that the mean height of the students is 148 cm and the variance is 1.5 cm². It was also found that the mean height of 1,500 students in Dungun High School is 152 cm and the variance is 2 cm². State the population, sample, element and variables from the above statement.
Population -
Sample -
Element -
Variable -
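
The two sampling schemes described under Simple Random Sampling and Stratified Sampling can be sketched in a few lines of Python. This is only a minimal illustration and not part of the original notes: the student roster, the function names `simple_random_sample` and `stratified_sample`, and the 700/300 split are assumptions chosen to match the 70% day / 30% night example with 200 interviews.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum's share of the sample
    matches its share of the population.

    Note: rounding can make the total differ slightly from n when the
    proportions do not divide evenly; it is exact in this example.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)  # stratum's quota
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical roster: 700 day students and 300 night students (70% / 30%).
day = [f"day-{i}" for i in range(700)]
night = [f"night-{i}" for i in range(300)]

# Simple random sampling ignores the groups entirely.
srs = simple_random_sample(day + night, 200, seed=1)

# Stratified sampling fixes the group sizes in advance.
strat = stratified_sample({"day": day, "night": night}, 200, seed=1)

print(len(strat["day"]), len(strat["night"]))  # 140 60
```

With simple random sampling the number of day students in the sample varies from draw to draw; with proportional stratified sampling it is fixed at 140, which is what makes the sample's composition match the population's.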
