STA111 Statistics and Data Descriptions PDF
Summary
This document provides an introduction to statistics and data descriptions. It covers the meaning of statistics, its importance, and various aspects of data analysis and collection. It's structured as an academic textbook.
# Statistics and Data Descriptions

## Chapter One

### Statistics and Data Descriptions

#### Introduction

The word *"statistics"* is derived from the Latin word *status* as well as the Italian word *statista*. Both words mean "political state" or "government". In the past, the use of statistics was limited; it was mainly used by kings and rulers to obtain information about the areas they governed. Statistics helped them understand and administer populations, commerce, land, wealth, taxation, and other aspects of government.

Statistics has developed gradually over the last few centuries. A great deal of work was done at the end of the nineteenth century, and during the twentieth century many statisticians were active in developing new methods, theories, and applications. Today, the availability of electronic computers is a major factor in the modern development of statistics.

#### Meaning of Statistics

Statistics is a branch of applied mathematics dealing with data collection, organization, analysis, interpretation, and presentation. It is a form of mathematical analysis that uses quantified models, representations, and synopses for a given set of experimental or real-life data, such as population counts, incomes, and ages.

In the plural sense, statistics are numerical facts or data, based on measurements or observations, that are assembled, classified, and tabulated so as to present significant information about a given subject.

#### Importance and Scope of Statistics

Statistical knowledge helps one to use proper methods to collect data, employ correct analyses, and present the results effectively. Statistics is a crucial part of how discoveries are made in science, how decisions are made on the basis of evidence, and how predictions are made.
Some of the areas of human life in which statistics is practically applicable and important are:

* Planning
* Mathematics
* Economics
* Social Sciences
* Trade
* Research Work

#### Statistics in Planning

Statistics is indispensable in planning, whether in business, in economics, or at the government level. The modern age is termed the age of planning, and almost all organizations, governments, and businesses resort to planning in order to work efficiently and to formulate policies. To achieve this, statistical data relating to production, consumption, births, deaths, investment, income, and many other quantities are of paramount importance.

#### Statistics in Mathematics

Statistics is intimately related to, and essentially dependent upon, mathematics. The modern theory of statistics is founded on the theory of probability, which in turn is a particular branch of the more advanced mathematical theory of measure and integration. The ever-increasing role of mathematics in statistics has led to the development of a new branch of statistics called Mathematical Statistics.

#### Statistics in Economics

Statistics and economics are so intertwined that it would be foolish to separate them. The development of modern statistical methods has led to an extensive use of statistics in economics. All the important branches of economics (consumption, public finance, production, exchange, distribution) use statistics for comparison, presentation, interpretation, and decision-making.

#### Statistics in Social Science

Statistics is used to measure and compare the variation in observations from place to place, object to object, and time to time. Statistical tools such as regression and correlation analysis are used to study and isolate the effect of each of these factors on the given observations.
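The use of correlation and regression analysis in the social sciences can be sketched with a small example. The figures below are invented purely for illustration (hypothetical years of schooling and monthly income in arbitrary units), and the function names `pearson_r` and `least_squares_line` are assumptions made for this sketch, not part of the original text.

```python
from math import sqrt


def _deviation_sums(xs, ys):
    """Return (sxy, sxx, syy, mean_x, mean_y) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    sxy, sxx, syy, _, _ = _deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the simple (bivariate) regression of y on x."""
    sxy, sxx, _, mean_x, mean_y = _deviation_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of schooling and monthly income (arbitrary units).
schooling_years = [8, 10, 12, 14, 16]
monthly_income = [20, 24, 31, 35, 40]

r = pearson_r(schooling_years, monthly_income)
slope, intercept = least_squares_line(schooling_years, monthly_income)

print(f"correlation r = {r:.3f}")
print(f"fitted line: income = {intercept:.2f} + {slope:.2f} * years")
```

For these invented figures the correlation is close to 1 and the fitted slope is about 2.55, meaning each extra year of schooling is associated with roughly 2.55 more units of income in this toy dataset. An association of this kind does not by itself establish cause and effect.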
#### Statistics in Trade

Since statistics is a body of methods for making wise decisions in the face of uncertainty, it is most important in business and trade, which are full of uncertainties and risks. Business persons have to forecast and speculate, and this can result in gains or losses. Can forecasting be done without taking the past and present into account? Probably not. The future trend of the market can be estimated when statistics are used, and failure in forecasting or speculation can mean failure of the business.

#### Statistics in Research Work

Statistics is used extensively in research. In reporting research findings, the effect of a variable on a particular problem under differing conditions can be established through the use of statistical methods. To sustain research interests and activities, the researcher needs to keep his or her knowledge and skills in statistical methods up to date.

#### Types of Statistics

Statistics is broadly categorized into two types:

1. **Descriptive Statistics**
2. **Inferential Statistics**

##### Descriptive Statistics

These are methods used to organize, represent, and describe a collection of data. They typically involve tables, graphs, percentages, the mean, the standard deviation, and similar summaries. Descriptive statistics is classified into categories that include:

* The frequency distribution
* Central tendency
* Variability of a dataset

##### Inferential Statistics

This type of statistics is used to interpret, or give meaning to, descriptive statistics. Once the data have been collected, analyzed, and summarized, the results are used to explain what the collected data mean. Inferential statistics is used to draw conclusions from data that are subject to random variation, such as observational errors and sampling variation.
Inferential statistics is classified into different categories:

* One sample hypothesis test
* Confidence interval
* Contingency table and chi-square statistic
* T-test/ANOVA
* Pearson correlation
* Bivariate regression
* Multivariate regression

#### Terms and Common Components in Statistical Problem

* **Variables (X):** Within the context of a research investigation, the concepts being studied are generally referred to as variables. A variable is something that varies. Examples are age, gender, exports, income, expenses, family size, country of birth, capital expenditure, class grades, and blood pressure readings.

##### Types of Variables

* **Qualitative variables**
* **Quantitative variables**
* **Discrete variables**
* **Continuous variables**
* **Dependent variables**
* **Independent variables**

* **Population:** A population is a set of similar items or events that is of interest for some question or experiment. A statistical population can be a group of existing objects or a hypothetical and potentially infinite group of objects conceived from experience. A population is also a distinct group of individuals with a common characteristic. In statistics, it is the pool of individuals or elements from which a statistical sample is drawn for a study. Thus, any selection of individuals grouped together by a common feature can be said to be a population.

##### Types of Population

* **Finite Population**
* **Infinite Population**
* **Existent Population**
* **Hypothetical Population**

* A **finite population** is a collection of objects or individuals that are the subject of research and that occupy a defined area. A finite population has clear boundaries that distinguish it from other population groups. It may also be called a countable population. The number of vehicles crossing a bridge every day, the number of births per year, and the number of wards are examples of finite populations. The size of a finite population is denoted by $N$.
* An **infinite population** is a population whose units cannot be counted. Such a population is referred to as infinite or uncountable. Suppose, for example, that we want to examine the number of germs in the body of a sick patient. This will most likely be uncountable.
* The **existent population** can be described as a population of concrete individuals. In other words, a population whose units are available in physical form is known as an existent population. Examples are textbooks, students, and animals.
* The **hypothetical population** is a theoretically estimated numerical total for a group that does not actually exist. For example: if five hundred penguins were moved to Alaska, how many would survive after five years? Suppose the estimate is 458 of the original penguins, together with 294 additional hatchings since the transfer. It is then estimated that 42 of the original penguins (500 - 458) would not survive, owing to starvation, disease, predator attack, or age.
* **Sample:** A sample is a smaller, manageable version of a population. It is a subset that carries the characteristics of the larger population. Samples are used in statistical testing when the population is too large for the test to include all possible members or observations.
* **Sample Space:** In probability theory, the sample space of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is usually written in set notation, with the possible outcomes listed as the elements of the set.
* **Statistical Error:** This is the amount by which an observation differs from its expected value. The expected value can be based on the whole population from which the statistical unit was chosen at random. For example, if the mean height in a population of 27-year-old men is 1.75 meters and one randomly chosen man is 1.80 meters tall, then the "error" is 0.05 meters; if the randomly chosen man is 1.70 meters tall, then the error is -0.05 meters.
The expected value, being the mean of the entire population, is typically unobservable, and hence the statistical error cannot be observed either.
* **Parameter:** A parameter is a useful component of statistical analysis. It is a characteristic used to define a given population, that is, a quantity describing a specific feature of the entire population. Common notation includes:
  * $\bar{x}$ represents the sample mean (strictly a sample statistic rather than a parameter; it is used to estimate $\mu$)
  * $\mu$ represents the population mean
  * $\sigma^2$ represents the population variance
  * $\sigma$ represents the population standard deviation
  * $\rho$ represents the population correlation coefficient
  * $r_s$ (also written $\rho_s$) represents Spearman's rank correlation coefficient

#### Statistical Data: Sources, Types and Methods of Data Collection

Data are individual pieces of factual information about an entity, systematically recorded and used for the purpose of analysis. They are the raw information from which statistics are created. Thus, data are facts or figures from which conclusions may be drawn.

##### Sources of Data

There are two sources of data in statistics:

* **Statistical Sources:** These are data collected primarily for creating official statistics, and include statistical surveys and censuses.
* **Non-Statistical Sources:** These are data collected for other, administrative purposes or by the private sector.

##### Types of Data

There are two types of data: primary data and secondary data.

* **Primary Data:** This can be described as first-hand information collected by the researcher. The data so collected are pure and original and are collected for a specific purpose. They have never undergone any statistical treatment before. The collected data may be published as well. The census of a country and patients' hospital records are examples of primary data.

##### Advantages of Primary Data

* They can be used to address targeted issues.
* The data are recent.
* They address specific research issues.
* The researcher has greater control over the data.
* They eliminate proprietary issues, since the researcher owns the data.
* Data interpretation is better.

##### Disadvantages of Primary Data

* Primary data are costly to obtain.
* Collecting them is time-consuming.
* Subjects or respondents may give inaccurate feedback.
* More resources are required.

* **Secondary Data:** These are existing data that have already been collected and published, either by an individual or by an organization. Secondary data are used by surveyors for analysis, to map out trends, and to make predictions. Secondary data are regarded as "improved data" in the sense that they have undergone statistical treatment at least once.

##### Advantages of Secondary Data

* They are inexpensive to gather.
* They are easily accessible, since they are in tangible form, e.g. printed or digital.
* They offer the researcher an alternative to gathering raw data.
* Less labour is needed to handle such data.

##### Disadvantages of Secondary Data

* There is no originality.
* The information may lack detail.
* On some subject matters, secondary data may not be available.

#### Methods of Primary Data Collection

Primary data are obtained using the following means:

* **Personal Investigation:** The surveyor collects the data himself or herself. Data collected this way are reliable, but the method is suited only to small projects.
* **Collection via Investigators:** Trained investigators are employed to contact the respondents and collect data.
* **Questionnaire:** A list of questions on a specific subject matter is used to elicit responses from respondents. The questionnaire may be mailed or sent online to respondents.
* **Telephone Investigations:** Data are collected by asking respondents questions over the telephone, to get quick and accurate information.
#### Methods of Secondary Data Collection

* **Official publications**, such as reports from the Ministry of Finance, Federal Bureaus of Statistics, agricultural boards, etc.
* **Data published by Chambers of Commerce, trade associations, and boards**
* **Articles in newspapers, journals, and technical publications**

-----

### Exercises

1. Define statistics extensively.
2. State the importance of statistics to your course of study.
3. What is data?
4. Describe types of statistics and data.
5. Explain data collection.
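
The terms population, sample, parameter, and statistical error defined in this chapter can be illustrated with a short Python sketch. It assumes a tiny, invented finite population of eight heights chosen so that the population mean is 1.75 meters, matching the height example in the chapter; the variable names and the helper `statistical_errors` are hypothetical. In practice the population mean is usually unobservable, so the errors computed here are available only because the whole toy population is known.

```python
import random
from statistics import mean, pstdev

# Hypothetical finite population of N = 8 heights in meters (invented data).
population_heights = [1.62, 1.70, 1.75, 1.68, 1.80, 1.85, 1.73, 1.87]

N = len(population_heights)         # size of the finite population
mu = mean(population_heights)       # population mean, a parameter
sigma = pstdev(population_heights)  # population standard deviation, a parameter


def statistical_errors(observations, expected_value):
    """Return each observation minus the expected value."""
    return [obs - expected_value for obs in observations]


# Draw a simple random sample of n = 3 units without replacement.
# The seed is fixed only so that repeated runs give the same sample.
rng = random.Random(0)
sample = rng.sample(population_heights, k=3)

x_bar = mean(sample)                      # sample mean, a statistic
errors = statistical_errors(sample, mu)   # one error per sampled unit

print(f"N = {N}, mu = {mu:.2f}, sigma = {sigma:.3f}")
print(f"sample = {sample}, x_bar = {x_bar:.3f}")
print("errors =", [f"{e:+.2f}" for e in errors])
```

A man of 1.80 meters in this population has an error of +0.05 meters and a man of 1.70 meters has an error of -0.05 meters, as in the chapter's example. Taken over the whole population, the errors sum to zero, which follows from the definition of the mean.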