Stat Unit 1 PDF
Summary
This document covers the basics of statistical data collection and various types, including definitions, branches, methods, functions, and limitations. It explains the role of statistics in different fields, like science and economics.
Full Transcript
# UNIT I - STATISTICAL DATA COLLECTION AND TYPES

## Definition:

- Statistics is the branch of mathematics and science that involves the collection, analysis, interpretation, presentation, and organization of data.
- It is numerical information from which certain conclusions can be drawn.
- It encompasses methods and techniques for summarizing and making inferences from data, helping people understand patterns, trends, and variability in various phenomena.
- Statistics plays a crucial role in various fields, including science, economics, social sciences and more, by providing tools to make informed decisions and draw meaningful conclusions from data.

## Main Division of Statistics (Branches):

- Statistical Methods
- Applied Statistics

## Statistical Methods:

- Also called mathematical statistics or theory of statistics.
- Deals with the procedures of statistical analysis and interpretation of numerical data.
- It is a tool for decision making.

## Applied Statistics:

- It is the application of statistical methods to concrete situations in areas like agriculture, industry, population, medicine, etc.

## Some Branches of Applied Statistics:

- Biostatistics
- Econometrics
- Social statistics
- Environmental statistics
- Financial statistics
- Business analytics
- Demography
- Statistical quality control
- Agricultural statistics

## Functions of Statistics:

- **Descriptive Function:** Helps in summarizing and describing data, making it more understandable and manageable with the help of tools such as mean, median, and mode.
- **Inferential Function:** Statistics allows researchers and decision-makers to draw conclusions about populations based on samples.
- **Exploratory Function:** Statistics aids in exploring data to discover patterns, relationships, and trends.
- **Comparative Function:** Enables comparison of different groups, variables, and time periods.
- **Predictive Function:** Statistical models and methods are used to predict future outcomes based on historical data, which is valuable in forecasting sales, stock prices, weather, and many other areas.
- **Quality Control Function:** In manufacturing and service industries, statistics is used to monitor and control processes to ensure consistent product quality and efficiency.
- **Research Function:** Statistics is fundamental in scientific research, providing tools to design experiments, collect data, analyze results, and test hypotheses.
- **Decision-Making Function:** In business and policy-making, statistics provides insight.
- **Statistics presents the facts** in a definite form.

## Limitations of Statistics:

1. **Data Quality:** Statistics relies on the quality of the data collected. Inaccurate or incomplete data can lead to incorrect conclusions.
2. **Sampling Bias:** If the data used for statistical analysis is not representative of the population it is meant to describe, the results can be biased and not applicable to the broader context.
3. **Assumption of Independence:** Many statistical methods assume that data points are independent of each other. When this assumption is violated, it can lead to incorrect conclusions.
4. **Interpretation Challenges:** Statistics can be misinterpreted or misused, especially when non-experts attempt to analyze complex data.
5. **Limited Scope:** Statistics provides summaries and generalizations, which might not capture all the nuances and details of a complex phenomenon. Statistical laws are true only on average.
6. **Ethical Concerns:** Ethical considerations can limit the collection and use of certain types of data for statistical analysis, especially in sensitive areas like healthcare or social sciences.
7. **Incompleteness:** Statistical analysis can only tell you about the relationships and patterns present in the data you have. It cannot provide information about data that wasn't collected.
8. **Complexity:** Some phenomena are inherently complex and cannot be fully understood or described using traditional statistical methods.
9. **It does not deal with individuals and does not deal with qualitative data.** Only experts can make the best possible use of statistics.

## Units or Individuals:

1. In statistics, units typically refer to the individual elements or objects within a dataset whose characteristics are studied in any statistical survey.
2. These units can represent a wide range of things depending on the context of the data.
3. **Example:**
   - In a survey, each person surveyed is a unit.
   - In experiments, each measurement or data point is a unit.
   - In time-based data, each time point (day, month, year) can be considered a unit.
4. Units are essential because they are the basis for collecting, analyzing, and interpreting data.

## Population or Universe:

1. In statistics, the term "population" or "universe" refers to the entire set of individuals, items, or data points that are the subject of a statistical study or analysis.
2. The population is the complete collection of all elements we study.
3. **Example:**
   - If we want to understand the voting preferences of all eligible voters in a country, the population would consist of every eligible voter in that country.
   - If we want to study the quality of apples in an orchard, the population would include every apple on every tree in that specific orchard.

## Finite Population:

1. In statistics, a "finite population" refers to a well-defined and limited set of individuals, items, or data points that you are studying or analyzing.
2. A finite population has a clear, countable number of elements or units.
3. **Example:**
   - If the test scores of students in a specific classroom are to be analyzed, the population would consist of the students in that classroom, which is a finite number.

## Infinite Population:

- In statistics, an "infinite population" refers to a theoretical concept used for mathematical purposes, where the set of individuals, items, or data points under consideration is so large that it is effectively considered limitless. In reality, truly infinite populations are exceedingly rare.

## Sample:

1. In statistics, a "sample" refers to a subset of individuals, items, or data points selected from a larger group known as the population.
2. It is also defined as the representative units of a population.
3. The purpose of taking a sample is to make inferences or draw conclusions about the entire population without having to study or collect data from every individual or item within it.
4. It is a fundamental technique in statistical analysis and research.
5. The size of the sample is denoted as "n".
6. Random sampling is a common method used to select a sample, which helps to reduce bias.
7. Larger samples generally provide more accurate estimates of population parameters, but they can be more costly and time-consuming.
8. Some of the sampling methods are simple random sampling, systematic sampling, stratified sampling, and cluster sampling.

## Sampling:

- Sampling is the process of collecting data from a subset (the sample) and then generalizing the results to the larger population.

## Quantitative Characteristics:

1. In statistics, quantitative characteristics are also known as quantitative variables or numerical variables.
2. They are types of data that represent measurable quantities and can be expressed as numbers.
3. These characteristics provide information about the magnitude or amount of something and are typically analyzed using various statistical methods.
4. **Example:**
   - **Age:** Represents the number of years a person has lived and is expressed as a numerical value.
   - **Height:** A measure of a dimension of an object or individual.
   - **Income:** Amount of money earned by an individual or household, typically measured in currency units.
   - Other examples are temperature, weight, test scores, stock prices, etc.
5. Quantitative characteristics are important in statistical analysis because they allow for various numerical computations, such as calculating means, medians, and standard deviations, and conducting various statistical tests.

## Qualitative Characteristics:

1. In statistics, qualitative characteristics, also known as categorical variables, are types of data that represent categories, labels, or distinct groups.
2. They are typically non-numeric and do not have a natural numerical order.
3. **Example:**
   - **Gender:** A categorical variable with values such as male and female, which have no numerical order.
   - **Color:** Categories such as red and blue, which are not ordered.
   - **Marital status:** Categories like single, married, divorced, widowed, etc.
   - **Types of cars:** Categories based on model or make.
   - **Education:** Categories like high school, college, etc.
   - **Blood type:** Categories like A, B, AB, O, etc.
4. **Qualitative characteristics are often used to group and categorize data.** While they cannot be subjected to numerical calculations like means, they are valuable for organizing and summarizing information, conducting frequency counts, and visualizing data using charts.

## Variable

- **A quantitative characteristic which varies from unit to unit or individual to individual.** Eg: number of members in a family. An independent variable is denoted "X"; a dependent variable, which depends on the independent variable, is denoted "Y".
- **Some types are:**
  - Categorical
  - Continuous
  - Discrete
- **Categorical variables:** These variables represent categories or groups, such as gender, color, type of car, etc.
- **Continuous variables:** Numeric variables that can take any value within a range, such as height or age.
- **Discrete variables:** Numeric variables that can take only specific, distinct values, such as the number of children in a family.

## Attribute:

- In statistics, an attribute is a characteristic or property of an object or individual that can be measured or described. In these notes, attributes are treated as synonymous with variables.
- **Example:**
  - In a dataset of people's information, attributes could include age, gender, income, education level, etc.
  - These can take on different values for different individuals, making them variables in statistical analysis.

## DATA

- A collection of facts such as numbers, text, words, measurements, observations, or even just descriptions of things.
- Data are individual pieces of factual information that are recorded and used for analysis in research. Data is organized in the form of graphs, charts, or tables.

## Types of Data

1. **Numerical or Quantitative Data**
2. **Categorical or Qualitative Data**

## Numerical or Quantitative Data:

- Numerical or quantitative data is a type of data that consists of measurable quantities and is expressed with numbers.
- It represents information that can be counted or measured and subjected to mathematical operations.
- **Some common examples of quantitative data are:**
  - **Age:** The age of individuals is a numerical variable.
  - **Income:** Monthly or annual income is represented as a numerical value.
  - **Weight:** The weight of an object or person is expressed numerically, like 100 pounds.
  - **Time:** Time intervals, such as 10 seconds, are quantitative.
  - **Temperature:** Temperature readings are quantitative data.

### Numerical/Quantitative Data Classified into Two Types:

1. **Discrete**
2. **Continuous**

#### Discrete Data:

- Discrete quantitative data is a specific type of numerical data that represents distinct, separate values that are typically counted rather than measured.
- These values are often integers and cannot take on fractional or continuous values.
  It has clear gaps between values, and it is often based on counting or enumeration.
- **Example:**
  - **Number of children in a family:** This is discrete because each child is counted and the result is an integer; no one can have 2.5 kids, only 2 or 3.
  - **Number of cars in a parking lot:** The count is a whole number, say 10, 20, and so on.
  - **Number of customers in a store or number of books on a shelf.**
  - **Number of defective products in a batch, etc.**

#### Continuous Data:

- Continuous quantitative data is a type of numerical data that can take on an infinite number of values within a given range and is typically measured using real numbers.
- Unlike discrete data, which consists of distinct, separate values, continuous data represents measurements that can be divided into smaller and smaller units, often with a high level of precision.
- **Example:**
  - **Height of individuals:** Heights can vary continuously within a range, such as 153, 153.2, 153.4, or 175 cm.
  - **Temperature:** Temperature can take on a wide range of values, such as 30.5 or 32.2 degrees.
  - **Time:** Time measurements can be continuous, with fractions such as 9.83 s.
  - **Distance:** Distance measured in meters is continuous data.
- Continuous quantitative data allows for mathematical operations such as addition, and it can be further analyzed using statistical techniques.

## Categorical or Qualitative Data:

- It consists of non-numerical information that describes qualities or characteristics.
- It is often subjective and categorical in nature. It consists of categories or labels that represent different groups or characteristics.
- These categories are not inherently numerical and are typically used to describe attributes or qualities such as colors, types of animals, or survey response options. Examples of qualitative data include colours: "red", "blue".
- Qualitative data cannot be measured numerically, but it can be organised into categories or groups for analysis.
- Measurement of quantitative data is straightforward, but measurement of qualitative data requires different scales of measurement.
- This data deals with characteristics and descriptors that can't be easily measured but can be observed subjectively, such as smell and taste.
- Sometimes categorical data is recorded with numbers, but those numbers have no mathematical meaning. A birth date, for example, is written with digits but is not a quantity on which arithmetic makes sense.

### Categorical or Qualitative Data (Scale):

Two types:

1. **Nominal Data**
2. **Ordinal Data**

#### Nominal Data (Nominal Scale):

- Helps to label the variable without providing a numerical or quantitative value.
- It is classified without any intrinsic order or rank.
- It cannot be ordered or measured.
- It is the lowest level of measurement in statistics.
- Nominal data are examined using the grouping method, i.e. the data are grouped into categories, and then the frequency or percentage of each category is calculated.
- These data are commonly represented visually using pie charts.
- The significant feature of nominal data is that the difference between data values is not defined.
- Nominal data can be summarized using frequency counts or percentages to understand the distribution of categories within a dataset, but you cannot perform mathematical operations like addition on nominal data because the categories have no numerical meaning or order.
- **For Example:**
  - Colors: Red, blue, green, etc.
  - Animal types: Cat, dog, bird, etc.
  - Marital status: Single, married, divorced, etc.

#### Ordinal Data:

- Ordinal data (an ordinal variable) is a type of data that follows a natural or predetermined order.
- Ordinal data represents categories with a meaningful order or ranking but does not have a consistent interval between them.
- Unlike nominal data, ordinal data has a natural order or hierarchy among its categories, but the differences between the categories are not quantifiable or uniform.
- This kind of variable is mostly found in surveys, finance, questionnaires, etc.
- These data are investigated and interpreted through many visualisation tools.
- Ordinal data is commonly represented using a bar chart.
- The information may also be expressed using tables in which each row shows a distinct category.
- **Examples of ordinal data include:**
  - Educational levels: High school, diploma, bachelor's degree, master's degree, PhD.
  - Customer satisfaction ratings: Very dissatisfied, dissatisfied, neutral, satisfied, very satisfied.
  - Socioeconomic status: Lower class, middle class, upper class.
  - Health: Poor, good, excellent.
- Common statistical analyses for ordinal data use the median and mode as measures of central tendency.

## Data Collection:

- Data collection is the methodical process of gathering and analyzing specific information to answer relevant questions and evaluate the results of a particular subject matter.
- The critical objective of data collection is that the collected information is accurate, rich, and reliable.

### Key Aspects of Data Collection:

- Defining the objectives
- Selecting data sources
- Designing data collection methods
- Developing data collection instruments
- Sampling
- Data collection process
- Data recording
- Quality control
- Privacy and ethical considerations
- Data storage
- Data documentation
- Data validation and verification
- Data analysis
- Interpretation and reporting

## Methods of Collection of Data:

- **The investigator** is the person who conducts the statistical enquiry.
- **The enumerator** is the person who collects the information for the investigator. This person should be well trained and efficient.
- **The statistician** collects and analyzes the characteristics under study for further statistical analysis.
- **The respondents (informants)** are the persons from whom the information will be collected.
- **Collection of data** is the process of enumeration together with a proper recording of results. The success of an enquiry depends on the proper collection of data.

## Classification of Data

- **Primary data collection methods**
- **Secondary data collection methods**

### Primary Data Collection Methods:

- Primary data, or raw data, is information obtained directly from the first-hand or original source through individuals, groups, or objects.
- These methods are used when researchers need specific information that is not readily available from existing sources.

### Common Primary Data Collection Methods:

- Surveys
- Interviews
- Observations
- Experiments
- Focus groups
- Questionnaires
- In-depth interviews
- Case studies
- Diaries or journals
- Photography and video
- Surveillance and sensor data
- GPS (geospatial data)

When choosing a primary data collection method, researchers should consider the research objectives, the type of data needed (quantitative or qualitative), the target population or sample, ethical considerations, budget constraints, and available resources. Often, a combination of these methods may be used to gather comprehensive data for a research project.

### Secondary Data Collection Methods:

- Secondary data is collected by someone other than the actual user.
- It is information that is readily available and that someone else has already analyzed.
- Secondary data includes magazines, newspapers, books, journals, etc., in published or unpublished form. Published data is available in various resources, including government publications, public records, historical and statistical documents, business documents, and technical and trade journals. Unpublished data includes diaries, letters, unpublished biographies, etc.
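
Choosing respondents for a primary collection method usually means applying one of the sampling methods named in the Sample section. The sketch below illustrates simple random, systematic, and stratified sampling using only Python's standard library. It is a minimal illustration, not a prescribed procedure: the roster, the `section` field, the function names, and the 10% sampling fraction are all hypothetical choices made for the example.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Every unit has the same chance of selection (sampling without replacement)."""
    return random.Random(seed).sample(population, n)


def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th unit, where k = N // n."""
    if not 0 < n <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Split units into strata, then draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(units, size))
    return sample


# Hypothetical finite population: 100 students, 60 in section A and 40 in section B.
students = [{"id": i, "section": "A" if i < 60 else "B"} for i in range(100)]

srs = simple_random_sample(students, 10)
systematic = systematic_sample(students, 10)
stratified = stratified_sample(students, lambda s: s["section"], 0.10)

print([s["id"] for s in systematic])  # ids spaced exactly 10 apart
print(sum(s["section"] == "A" for s in stratified), "from A,",
      sum(s["section"] == "B" for s in stratified), "from B")
```

Stratified sampling here uses proportional allocation, so each section contributes to the sample in proportion to its size; cluster sampling is omitted to keep the sketch short.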
## Data Collection Tools:

- The devices/instruments used to collect data are questionnaires, surveys, interviews, and focus group discussions.

### Questionnaires:

- It is the process of collecting data through an instrument or form consisting of a series of questions and prompts to receive a response from individuals or groups.
- A questionnaire is a part of a survey. This method can also help to gather information about a population's general knowledge of variables such as products, brands, or companies, and how those variables affect them as consumers.
- **Pros:**
  - Cost-effective
  - Can be administered in large numbers
  - Can be used to compare and contrast with previous research to note changes
  - Easy to visualize
  - Easy to analyze (actionable data)
  - Identity of the respondent is not revealed
  - Covers all areas of a topic
- **Cons:**
  - Answers may not be honest
  - It does not produce qualitative data
  - Responses may be incomplete
  - Respondents may have a hidden agenda
  - Not all questions can be analyzed
- **Best data collection tools for questionnaires:** (a) Form and online questionnaire (b) Paper questionnaire.

#### Forming a Questionnaire:

1. The number of questions should be as small as possible.
2. The questions should be simple to understand.
3. Questions should be arranged logically.
4. Answers to the questions should be short.
5. As far as possible, questions regarding personal matters should be avoided.
6. Any necessary clarification regarding a question should be supplied in the form of a footnote.
7. Necessary instructions should be given to the informants.
8. Questions which require mathematical rigor should be avoided.
9. A questionnaire should be attractive.
10. The questions should be framed so that the validity of the information supplied by the informants can be cross-checked.

#### Two Kinds of Questionnaire:

1. **Close-ended**
2. **Open-ended**

- **Close-ended questions** can be answered with "yes" or "no", or they have a limited set of answers.
  Ex: (a) Are you feeling better today? (b) May I use your pen?
- **Open-ended questions** are questions that cannot be answered with a "yes" or "no" response. Open-ended questions are broad and can be answered in detail.
- **Example:**
  - Tell me about your college.
  - How do you see your future?

### Survey (Existing Data):

- Data collection through surveys is a widely used method to gather information and insights from individuals or groups of people.
- Surveys can be conducted in various formats, including online questionnaires, phone surveys, paper-based forms, and face-to-face interactions.
- Survey data is defined as the resultant data collected from a sample of respondents who took a survey.
- This data is comprehensive information gathered from a target audience about a specific topic to conduct research.
- There are many methods used for survey data collection and statistical analysis.
- Various mediums are used to collect feedback and opinions from the desired sample of individuals. While conducting survey research, researchers prefer multiple sources to gather data, such as online surveys, telephone surveys, and face-to-face surveys.
- Factors such as how the interviewer will contact the respondents and how the information is communicated to them decide the effectiveness of the gathered data.
- Working with existing data means posing new questions in addition to the initial questions asked when the data was gathered. It adds to a study or piece of research.
- **Pros:**
  - Accuracy is very high
  - Easily accessible information
- **Cons:**
  - Problems with evaluation
  - Difficulty in understanding
- **Best Data Collection Tools for Existing Data (Survey):**
  - Research journals
  - Surveys

### Research Journals

- A journal is a scholarly publication containing articles written by researchers, professors, and other experts.
- Research journals are intended for an academic or technical audience.
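
Because close-ended questions have a limited set of answers, survey responses to them can be tabulated directly as counts and percentages, which is part of why questionnaires are listed as easy to analyze. The following is a minimal Python sketch under assumed data: the `responses` list is invented for illustration and `tabulate` is a hypothetical helper name.

```python
from collections import Counter

# Invented answers to one close-ended question (illustrative only).
responses = ["yes", "no", "yes", "yes", "no", "yes", "no response", "yes"]


def tabulate(answers):
    """Return {answer: (count, percentage)} ordered from most to least frequent."""
    counts = Counter(answers)
    total = len(answers)
    return {
        answer: (count, round(100 * count / total, 1))
        for answer, count in counts.most_common()
    }


table = tabulate(responses)
for answer, (count, pct) in table.items():
    print(f"{answer:<12} {count:>3} {pct:>6.1f}%")
```

Open-ended answers do not fit this scheme as they stand; they would first need to be coded into categories by a person, a step this sketch does not attempt.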
### Interviews:

- An interview is a face-to-face conversation between two individuals with the sole purpose of collecting relevant information to satisfy a research purpose.
- Data collection through interviews is a valuable method for gathering qualitative data and in-depth insights from individuals or groups. Interviews involve engaging in direct conversation with participants to explore their experiences, opinions, and perspectives on specific topics.
- Interviews are a way to gather a lot of information from a small population.
- You can perform an interview by finding a small target population and asking them questions about crucial data. This method of data collection is used to gather qualitative information, such as an individual's views and opinions, and to understand why they may feel a certain way about a product, brand, company, or event.
- Using this method to gather information about an event that the target population observed directly means finding multiple people who witnessed the event and asking what they observed from their own perspectives.
- Different types of interviews: structured, semi-structured, and unstructured, each varying slightly from the others.
- **Pros:**
  - In-depth information
  - Freedom and flexibility
  - Accurate data
- **Cons:**
  - Time-consuming
  - Expensive to collect

#### Structured Interviews:

- A verbally administered questionnaire.
- It is surface-level and lacks depth, but it is highly recommended for speed and efficiency.

#### Semi-structured Interviews:

- These interviews have several key questions which cover the scope of the areas to be explored.
- They give the researcher a little more room than structured interviews to explore the subject matter.

#### Unstructured Interviews:

- This interview is in-depth; it allows the researcher to collect a wide range of information with a purpose.
- It gives the researcher freedom to combine structure with flexibility, even though it is more time-consuming.
- **Best Data Collection Tools for Interviews:** Audio recorder, digital camera, camcorder, etc.

### Focus Groups:

- It is a data collection method focusing more on qualitative research.
- It falls under the primary category, for data based on the feelings and opinions of the respondents.
- This research involves asking open-ended questions to a group of individuals, usually ranging from 6 to 10 people, to provide feedback.
- Data collection through focus groups is a qualitative research method that involves facilitated group discussions to gather insights, perceptions, and opinions on a specific topic.
- Focus groups can provide rich data by capitalizing on the group dynamics and interactions among participants.
- Focus groups are a type of qualitative research. Observation of the group's dynamics, their answers to focus group questions, and even their body language can guide future research on customer decisions, products and services, or controversial topics.
- Focus groups are used in marketing, library science, social science, and user research disciplines.
- They can provide more natural feedback than individual interviews and are easier to organize than experiments or large-scale surveys.
- **Pros:**
  - Information obtained is usually very detailed
  - Cost-effective when compared to one-on-one interviews
  - Delivers results with speed and efficiency
- **Cons:**
  - Lacking depth in covering the nitty-gritty (practical details or important aspects of a subject)
  - Bias might still be evident
  - Requires interviewer training
  - The researcher has very little control over the outcome
  - A few vocal voices can drown out the rest
  - Difficulty in assembling an all-inclusive group
- **Best Data Collection Tools for Focus Groups:** A focus group is a data collection method that is tightly facilitated and structured around a set of questions. The very purpose of the meeting is to extract from the participants detailed responses to these questions.
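
Once data has been collected by any of the tools above, the appropriate summary depends on the data type described earlier in this unit: mean, median, and standard deviation for quantitative data, frequency counts for nominal data, and median or mode for ordinal data. The Python sketch below illustrates this with the standard `statistics` module. All data values are made up for the example (the heights reuse figures mentioned in the Continuous Data section), and `ordinal_median` and `SCALE` are hypothetical names.

```python
import statistics
from collections import Counter

# Illustrative data only; none of these values come from a real study.
heights_cm = [153.0, 153.2, 153.4, 160.0, 175.0]       # continuous
children = [0, 1, 2, 2, 3]                              # discrete
blood_types = ["A", "O", "B", "O", "AB", "O"]           # nominal
ratings = ["satisfied", "neutral", "very satisfied",
           "satisfied", "dissatisfied"]                 # ordinal

# Assumed ordering of the satisfaction scale, lowest to highest.
SCALE = ["very dissatisfied", "dissatisfied", "neutral",
         "satisfied", "very satisfied"]


def ordinal_median(values, scale):
    """Median category: rank each value on the scale, take the (lower) middle rank."""
    ranks = [scale.index(v) for v in values]
    return scale[statistics.median_low(ranks)]


# Quantitative data: arithmetic summaries are meaningful.
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
sd_height = statistics.stdev(heights_cm)      # sample standard deviation
mode_children = statistics.mode(children)

# Nominal data: only counting is meaningful.
blood_counts = Counter(blood_types)

# Ordinal data: order is meaningful, distances between categories are not.
median_rating = ordinal_median(ratings, SCALE)
mode_rating = statistics.mode(ratings)

print(f"height: mean={mean_height:.2f} median={median_height} sd={sd_height:.2f}")
print("children mode:", mode_children)
print("blood type counts:", dict(blood_counts))
print("rating median:", median_rating, "| mode:", mode_rating)
```

Taking a mean of `blood_types` or `ratings` would not be meaningful, which is the practical consequence of the nominal and ordinal scales described above.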