STA 111: Descriptive Statistics PDF - Kwara State University
Summary
This document discusses descriptive statistics and its applications across various fields, including computer science, physics, geology, engineering, biological sciences, and economics. It highlights the importance of statistical tools in data analysis, prediction, and decision-making. It also covers examples and applications in each field.
KWARA STATE UNIVERSITY-MALETE
Department of Mathematics and Statistics, Faculty of Pure and Applied Sciences,
P.M.B. 1530, Ilorin, Kwara State, Nigeria

STA 111: Descriptive Statistics

Reference Texts:
Elementary Statistics: A Step by Step Approach (Eighth Edition) by Bluman
Understandable Statistics: Concepts and Methods by C. H. Brase and C. P. Brase

INTRODUCTION
Statistics plays a vital role in numerous disciplines by providing tools for data collection, analysis, interpretation, and decision-making. Below is an exploration of how statistics is applied across different fields, highlighting its importance and practical use cases.

Statistics in Computer Science:
In computer science, statistics is essential for algorithm design, machine learning, data mining, and performance analysis. It helps in managing and interpreting large datasets, often referred to as "big data," to derive insights and make predictions.
Applications:
1. Machine Learning and AI: Statistical models form the backbone of machine learning algorithms, where data is used to train models that can predict outcomes.
   Example: A spam filter in an email service uses logistic regression, a statistical method, to classify emails as spam or non-spam.
2. Performance Analysis: Statistics helps in analyzing the efficiency and performance of algorithms.
   Example: Calculating the average execution time of sorting algorithms to compare their efficiency.
3. Natural Language Processing (NLP): Statistical methods are used to process and analyze textual data.
   Example: Sentiment analysis of social media posts using statistical classification techniques.

Statistics in Physics:
Physics often deals with experimental data and uncertainties, making statistics crucial for analyzing and interpreting results. Statistical tools help in understanding complex systems and validating theories through empirical data.
Applications in Physics:
1.
Quantum Mechanics: Probability distributions describe the behavior of particles at the quantum level.
   Example: The Heisenberg Uncertainty Principle uses statistical concepts to explain the limits on measuring a particle's position and momentum simultaneously.
2. Experimental Data Analysis: Physicists use statistical methods to analyze experimental data and reduce the effect of measurement errors.
   Example: Analyzing data from the Large Hadron Collider (LHC) to discover new particles.

Statistics in Geology:
Geologists use statistics to analyze geological data, predict natural disasters, and assess the availability of natural resources.
Applications:
1. Mineral Exploration: Statistical models help estimate the probability of finding minerals in a given area.
   Example: Geostatistical techniques are used to predict the presence of oil reserves.
2. Seismology: Analyzing seismic data to predict earthquakes and understand tectonic movements.
   Example: Calculating the probability of future earthquakes based on historical data.

Statistics in Engineering:
Engineering relies on statistics for quality control, reliability testing, and optimization of processes. It helps engineers design systems that meet performance and safety standards.
Applications in Engineering:
1. Quality Control: Statistical Process Control (SPC) ensures that products meet quality standards.
   Example: Monitoring the diameter of manufactured parts to ensure they fall within specified tolerances.
2. Reliability Engineering: Predicting the lifespan of components and systems.
   Example: Calculating the failure rate of electronic components to design more reliable systems.

Statistics in Biological Sciences and Medicine:
Statistics is indispensable in biology and medicine for designing experiments, analyzing biological data, and evaluating the effectiveness of treatments.
Applications:
1. Clinical Trials: Statistical methods evaluate the effectiveness of new drugs or treatments.
   Example: A randomized controlled trial uses statistics to determine whether a new cancer drug is more effective than existing treatments.
2. Epidemiology: Studying the distribution and determinants of diseases in populations.
   Example: Analyzing data from COVID-19 cases to predict infection rates and outcomes.
3. Genetics: Analyzing genetic data to understand hereditary diseases.
   Example: Statistical models identify genes associated with diseases like diabetes.

Statistics in Economics:
Economists use statistics to analyze economic data, forecast trends, and evaluate policies. Statistics helps in understanding market behavior and economic indicators.
Applications:
1. Economic Forecasting: Predicting future economic trends based on historical data.
   Example: Using time series analysis to forecast inflation rates.
2. Policy Evaluation: Assessing the impact of government policies on the economy.
   Example: Analyzing the effects of tax cuts on consumer spending.
3. Market Analysis: Understanding consumer behavior and market dynamics.
   Example: Analyzing survey data to understand demand for new products.

Statistics in Banking and Finance:
In banking and finance, statistics helps in risk assessment, investment analysis, and portfolio management. It supports decision-making and financial planning.
Applications in Banking and Finance:
1. Risk Management: Identifying and quantifying risks associated with investments.
   Example: Value-at-Risk (VaR) models estimate the potential loss in a portfolio.
2. Credit Scoring: Assessing the creditworthiness of borrowers using statistical models.
   Example: Banks use logistic regression to predict the likelihood of loan default.
3. Stock Market Analysis: Predicting stock prices and market trends.
   Example: Statistical arbitrage strategies in quantitative trading.

Statistics in Accounting:
In accounting, statistics aids in auditing, cost control, and financial analysis. It helps ensure accuracy and compliance with financial regulations.
Applications in Accounting:
1. Auditing: Statistical sampling techniques select a representative sample of transactions for auditing.
   Example: Auditors use random sampling to verify financial statements.
2. Cost Analysis: Estimating costs and identifying cost-saving opportunities.
   Example: Analyzing production costs to identify inefficiencies.
3. Forecasting: Predicting future financial performance.
   Example: Using regression analysis to forecast revenue.

Statistics in Education:
In education, statistics helps in evaluating teaching methods, student performance, and curriculum effectiveness. It supports decision-making in educational policies.
Applications in Education:
1. Student Performance Analysis: Analyzing test scores to assess student learning.
   Example: Identifying factors that affect student performance using regression analysis.
2. Program Evaluation: Evaluating the effectiveness of educational programs.
   Example: Comparing student outcomes before and after implementing a new teaching method.
3. Survey Research: Collecting data on student and teacher satisfaction.
   Example: Conducting surveys to assess the quality of online learning.

Statistics in Agricultural Sciences:
In agriculture, statistics is used to improve crop yields, optimize resource use, and assess the impact of environmental factors.
Applications:
1. Experimental Design: Designing experiments to test the effects of fertilizers or pesticides.
   Example: Comparing crop yields under different irrigation methods using analysis of variance (ANOVA).
2. Yield Prediction: Predicting crop yields based on weather and soil data.
   Example: Using regression models to forecast wheat production.
3. Resource Optimization: Allocating resources efficiently to maximize output.
   Example: Optimizing fertilizer use to reduce costs and increase yields.

Statistics in Food Science:
In food science, statistics helps in quality control, sensory evaluation, and product development.
It helps ensure food safety and compliance with standards.
Applications in Food Science:
1. Quality Control: Monitoring the quality of food products during production.
   Example: Using control charts to monitor the weight of packaged products.
2. Sensory Evaluation: Analyzing consumer preferences and product acceptability.
   Example: Conducting taste tests to compare different flavors of a product.
3. Shelf Life Studies: Estimating the shelf life of food products.
   Example: Analyzing microbial growth data to determine expiration dates.

Descriptive statistics is a branch of statistics that focuses on summarizing, organizing, and analyzing data to provide meaningful insights. It involves methods that describe and visualize data through numerical measures, tables, and graphical representations. Unlike inferential statistics, which makes predictions or inferences about a population based on a sample, descriptive statistics only describes the data at hand. This course will equip students with the fundamental tools needed to explore and summarize datasets effectively.
Key concepts in descriptive statistics include:
▪ Measures of central tendency (mean, median, and mode).
▪ Measures of dispersion (range, variance, and standard deviation).
▪ Data visualization techniques (charts, histograms, and scatter plots).

DATA
Today, we work with laboratory equipment that continuously produces large quantities of high-dimensional data. However, without an understanding of statistics, the set of techniques required to analyse, summarize, and interpret these data, we are very limited in what we can learn from our observations, which in turn hinders our ability to move forward in research, decision making, and planning. Even with experiments that generate very little data, there is a need to simulate phenomena by modeling the behaviour of systems and their parameters, which again often needs to be done statistically.
It is therefore imperative that we understand some basic concepts of statistics in our field. Knowledge of statistics assists in the following ways:
i) It enables you to read and understand the various statistical studies performed in your field. To have this understanding, you must be knowledgeable about the vocabulary, symbols, concepts, and statistical procedures used in these studies.
ii) It allows you to conduct research in your field, since statistical procedures are basic to research. To accomplish this, you must be able to design experiments; collect, organize, analyse, and summarize data; and possibly make reliable predictions or forecasts for future use. You must also be able to communicate the results of the study in your own words.
iii) It helps you become a better consumer and citizen. For example, you can make intelligent decisions about what products to purchase based on consumer studies, about government spending based on utilization studies, and so on.

Statistical data are the basic raw materials for statistical investigation. In Statistics, information is essentially referred to as data. In everything we do, we seek information to guide our activities. In fact, the activities we embark upon today will provide information to guide us in executing similar activities in the future. Gathering information may be formal or informal.

Formal Gathering of Information involves documented information, in which every bit of what has been observed in the past or is being observed currently is expected to be kept in its original (or raw) form.

Informal Gathering of Information involves information about past experiences that were not immediately captured. It may not always provide the level of detail that the formal method, with its complete retrieval of records, can provide.
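As a small illustration of organizing and summarizing data, the sketch below computes the measures of central tendency and dispersion named in the introduction. The exam scores and the helper name describe are assumptions made for illustration and do not come from the course text; the variance and standard deviation use the sample (n - 1) formulas from Python's standard statistics module.

```python
import statistics

# Hypothetical exam scores (illustrative values, not from the course text).
scores = [45, 52, 52, 60, 67, 71, 78]


def describe(values):
    """Return common descriptive measures for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For these scores the median is 60, the mode is 52, and the range is 33. If the scores were a complete population rather than a sample, statistics.pvariance and statistics.pstdev would be the matching choices.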
DATA: Data are the values (measurements or observations) that a variable such as age, weight, height, exam score, or shoe size can assume. Biological data, in particular, are data or measurements collected from biological sources, which are often stored or exchanged in digital form. Biological data are commonly stored in files or databases.
Examples of biological data include:
▪ Sequences: DNA, RNA, Protein
▪ Structures of biological molecules
▪ Gene expression profiles
▪ Biochemical pathways
▪ Chromosomal mapping
▪ Phylogenetic data
▪ Single Nucleotide Polymorphisms (SNPs), etc.
The challenge thus lies in using statistical methods to analyse biological data and draw meaningful inferences for immediate and future use.

DATA COLLECTION
There are two main sources of data in statistics, namely:
- Primary source
- Secondary source

Primary source of data
Data from a primary source are datasets obtained directly from the object concerned. Primary sources provide data compiled from a population count, or from a sample of the population where the population is too large for an individual count. Primary data can be collected by:
i. Direct personal observation (e.g. laboratory experiments)
ii. Personal interview
iii. Mailed questionnaire
iv. Questionnaires administered by enumerators
v. Direct interview by people

Advantages of Primary source of data
i. It supplies exact information
ii. It gives more reliable data than the secondary source
iii. It gives more detailed data than the secondary source

Disadvantages of Primary source of data
i. It is very expensive
ii. It takes time
iii. It may involve a large number of non-responses

Secondary source of data
Secondary sources provide readily available or previously used data from administrative sources such as journals, newspapers, databases, and official compilations.

Advantages of secondary source of data
i.
It gives quicker information than the primary source
ii. It is more timely than the primary source
iii. It is not as expensive as the primary source

Disadvantages of secondary source of data
i. It gives less information than the primary source
ii. It may be wider or narrower than the objectives of the research
iii. It may not be as detailed as the primary source

NOTE
▪ VARIATE (VARIABLE): A variate is any quantity or attribute whose values vary from one unit of investigation to another. In other words, a variable is a characteristic or attribute that can assume different values. Variables whose values are determined by chance are called random variables.
▪ OBSERVATION: An observation is the value taken by a variate or variable for a particular unit of investigation.

TYPES OF DATA (Quantitative and Qualitative)
1. Quantitative (or Numerical) Data: Quantitative data are observations that are measured on a numerical scale. The most common type of data is quantitative data, since many descriptive variables in nature are measured on numerical scales. Examples of quantitative data are: number of leaves per plant, yield of cowpea, and the heights (or weights) of students in a class. The measurements in these examples are all numerical. Quantitative variables can be divided into two types: continuous and discrete.
Continuous Variable: These are variables that can assume an infinite number of values between any two specific values. They are obtained by measuring and often include fractions and decimals. Examples include: weight of seeds, age of plants, amount of water, etc.
Discrete Variable: Discrete variables assume values that can be counted; their values change by steps or jumps. For example, the number of plants per plot. Decimals and fractions are not allowed for this type of variable.
2. Qualitative (or Categorical) Data: All data that are not quantitative are qualitative.
Qualitative data are data whose values cannot be put in any numerical order. That is, they are observations that are categorical rather than numerical and are not capable of being measured. Examples are the political affiliations of a group of people and the gender of a person. Qualitative variables can also be classified as discrete.
Hence, qualitative data describe attributes, characteristics, or categories. They do not involve numerical values and are used to label or classify elements within a dataset.
Examples:
✓ Eye color (blue, green, brown).
✓ Marital status (single, married, divorced).
✓ Type of vehicle (car, truck, motorcycle).

Types of Qualitative Data:
1. Nominal Data: Categories without a specific order.
   Example: Blood types (A, B, AB, O). There is no ranking or order.
2. Ordinal Data: Categories with a meaningful order but no fixed interval between them.
   Example: Customer satisfaction ratings (satisfied, neutral, dissatisfied).

SCALES OF MEASUREMENT
Scales of measurement, also known as levels of measurement or measurement scales, are a fundamental concept in statistics that categorize data based on their nature and characteristics. These scales help researchers and statisticians determine the appropriate statistical techniques and operations that can be applied to a particular set of data. There are four primary scales of measurement:
1) NOMINAL: The nominal scale of measurement classifies data into mutually exclusive (non-overlapping) categories in which no order or ranking can be imposed on the data. Data in this category are qualitative. Examples include: gender, marital status, colour of a substance, course of study, etc.
2) ORDINAL: The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
For instance, when people are classified according to their shoe size (small, medium, or large), a large variation exists among the individuals in each class. Other examples include: class of degree, position in a competition, HIV test result, etc.
3) INTERVAL: Interval data have ordered categories with equal and meaningful intervals between them, but they lack a true zero point. Examples include temperature measured in Celsius or Fahrenheit and IQ scores. You can perform arithmetic operations like addition and subtraction on interval data, but multiplication and division (ratios) are not meaningful. Means, standard deviations, and parametric statistical tests can be used.
4) RATIO: Ratio data have all the properties of interval data, but they also have a true zero point, which indicates the absence of the measured quantity. Examples include height, weight, age, and income. All arithmetic operations (addition, subtraction, multiplication, division) are meaningful with ratio data. You can use means, standard deviations, and a wide range of statistical techniques.
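The difference between the interval and ratio scales can be checked numerically. The sketch below is an illustration only: the temperatures, weights, and function names are assumed, and the kilogram-to-pound factor is approximate. Converting Celsius to Fahrenheit changes the ratio of two temperatures, because neither scale has a true zero, whereas converting kilograms to pounds leaves the ratio of two weights unchanged.

```python
def c_to_f(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kilograms):
    """Convert a weight from kilograms to pounds (approximate factor)."""
    return kilograms * 2.20462


# Interval scale: temperature (hypothetical readings).
warm_c, cool_c = 20.0, 10.0
ratio_c = warm_c / cool_c                   # 2.0 in Celsius
ratio_f = c_to_f(warm_c) / c_to_f(cool_c)   # 68 / 50 = 1.36 in Fahrenheit

# Ratio scale: weight (hypothetical readings).
heavy_kg, light_kg = 80.0, 40.0
ratio_kg = heavy_kg / light_kg
ratio_lb = kg_to_lb(heavy_kg) / kg_to_lb(light_kg)

print(f"Temperature ratio: {ratio_c:.2f} (Celsius) vs {ratio_f:.2f} (Fahrenheit)")
print(f"Weight ratio: {ratio_kg:.2f} (kg) vs {ratio_lb:.2f} (lb)")
```

The temperature ratio depends on the unit chosen (2.00 versus 1.36), so "twice as hot" has no fixed meaning on an interval scale. The weight ratio is 2.00 in both units, which is what a true zero point guarantees.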