Data Classification
This document provides an overview of data classification, including different types such as geographical, chronological, qualitative, and quantitative. It explains the purpose of data classification and discusses continuous and discrete data with examples. Furthermore, it introduces the concepts of elements, variables, and observations in data.
DATA CLASSIFICATION

MEANING
Data classification is the process of arranging data into groups or classes on the basis of certain properties.

Purpose:
1. Condenses raw data into a form suitable for statistical analysis.
2. Removes complexities and highlights the features of the data.
3. Facilitates comparisons and drawing inferences from the data.
4. Provides information about the mutual relationships among the elements of a data set.
5. Helps statistical analysis by separating the elements of a data set into homogeneous groups, which brings out their points of similarity and dissimilarity.

BASIS OF CLASSIFICATION
Data are generally classified on the following four bases:

1. Geographical Classification
- Data are classified on the basis of geographical or locational differences between the elements of the data set, such as cities, districts, or villages.
- Example:

  City                                  Mumbai   Kolkata   Delhi   Chennai
  Population density (per square km)    654      685       423     205

- Elements can be arranged alphabetically or in order of the size of their values.

2. Chronological Classification
- Data classified on the basis of time are known as a chronological classification, also called a time series.
- The data are arranged in ascending or descending order of time, such as years, quarters, months, or weeks.
- Example:

  Year                  1941   1951   1961   1971   1981   1991   2001
  Population (crore)    31.9   36.9   43.9   54.7   75.6   85.9   98.6

3. Qualitative Classification
- Data are classified on the basis of descriptive characteristics or attributes that cannot be quantified, such as gender, literacy, region, caste, or education.
- This is done in two ways:
  1. Simple classification: only one attribute is studied, and each class is divided into two sub-classes. Examples: male and female, blind and not blind, educated and uneducated.
  2. Manifold classification: two or more attributes are studied together. A class is therefore divided into more than two sub-classes, which may be subdivided further.
     Example: The population of a country can be classified by gender into male and female. These two sub-classes may be further classified by literacy into literate and illiterate.

4. Quantitative Classification
- Data are classified on the basis of characteristics that can be measured on a numerical scale. Examples: height, weight, income, expenditure, production, sales.
- There are two types of quantitative data:
  1. Discrete (or discontinuous) data can take only certain values and are obtained by counting. Examples: the number of students in a class, the result of rolling two dice, the number of children in a family.

     Discrete series
     Number of children   Number of families
     0                    10
     1                    30
     2                    60
     3                    90
     4                    110
     5                    20

  2. Continuous data can take any value within a range and are obtained by measuring. Examples:
     - A person's weight can be any value within the range of human weights, not just certain fixed values.
     - The time taken in a race can be measured even to fractions of a second.

     Continuous series
     Weight (kg)   Number of persons
     40-50         10
     50-60         20
     60-70         25
     70-80         35
     80-90         50

DIFFERENCE BETWEEN DISCRETE AND CONTINUOUS DATA

Discrete data | Continuous data
Expressed in whole numbers, not fractions or decimals | Can take any numeric value, including fractions and decimals
Countable | Measurable
Broken (separate) data | Unbroken data
When plotted on a graph, points are isolated and may or may not form a pattern | When plotted on a graph, points form a pattern, either a straight line or a curve
Examples: number of months in a year, number of days in a week | Examples: time, weight, height, temperature

DATA AND ITS COMPONENTS
- Data are facts and figures collected for statistical analysis and interpretation. All the data collected in a particular study are referred to as the data set for the study. For example, a data set containing the prices of shares over a period of time.
- Elements are the entities on which data are collected. For example, the shares for which price data are collected.
- A variable is a characteristic of interest for the elements. For example, in the data set above, the share price is a variable.
- The data obtained for a particular element form an observation. For example, the price of an RIL share on a particular date is Rs. 2527.

DATA, DATA SETS, ELEMENTS, VARIABLES, AND OBSERVATIONS

Company        Stock Exchange   Annual Sales (Rs.)   Earnings per Share (Rs.)
Dataram        BSE              73.10                0.86
EnergySouth    NSE              74.00                1.67
Keystone       Nasdaq           365.70               0.86
LandCare       MCX              111.40               0.33
Psychemedics   N                17.60                0.13

How to read the table:
- The Company column gives the element names.
- Stock Exchange, Annual Sales, and Earnings per Share are the variables.
- Each row (for example, the EnergySouth row) is one observation.
- The table as a whole is the data set.

SCALES OF MEASUREMENT
Data fall into two groups:
- Categorical (qualitative) data, which are measured on a nominal or an ordinal scale.
- Quantitative data, which are measured on an interval/ratio scale.

NOMINAL DATA
- Also known as categorical or qualitative data.
- These are descriptions or labels with no sense of order. Examples: gender, color, and the flavor of a chocolate or ice cream.
  Gender: Female, Male
  Color: Red, Yellow, Green
  Flavor: Chocolate, Strawberry, Vanilla
- To summarize nominal data, use frequencies or percentages. The mean (average) cannot be used for nominal data.

ORDINAL DATA
- Ordinal data are categorical data with a set order or scale, such as ranks or ratings.
- One of the most notable features of ordinal data is that the differences between data values cannot be determined or are meaningless. The distances between successive categories may also be unequal.
- Examples: rank, satisfaction.
  Satisfaction scale: 1 = Very Satisfied, 2 = Satisfied, 3 = Unsatisfied, 4 = Very Unsatisfied

INTERVAL/RATIO
- Includes items that can be measured rather than classified or ordered. Examples: weight, age, size.
- Also known as scale data, quantitative data, or parametric data.
- Can be discrete, i.e., whole numbers (5 customers, 17 points, 12 shops).
- Can also be continuous, i.e., values with fractions (4.2 miles, 25.3 degrees, 3.8 minutes).

EXAMPLE
Priya sells choconutties. She is interested in adding a new product to the choconutties line.
She develops a questionnaire and asks a random sample of 50 customers to fill it out.

Questions:
1. How old are you?
2. Are you: Male / Female
3. How much do you spend on groceries each week?
4. How many chocolate bars do you buy in a week?
5. Which type of chocolate do you like best? Milk / Dark / White
6. How satisfied are you with choconutties? Very Satisfied / Satisfied / Unsatisfied / Very Unsatisfied
7. How likely are you to buy a box of 10 packets of choconutties?

Scale of each question:
- Interval/ratio: questions 1, 3, and 4 (age, grocery spending, chocolate bars)
- Nominal: questions 2 and 5 (gender, chocolate type)
- Ordinal: questions 6 and 7 (satisfaction, likelihood of a bulk purchase)

Priya's Choconutties Survey

Customer   Age   Gender   Groceries (Rs.)   Choco-bars   Type    Satisfaction   Bulk
1          25    Female   300               2            Milk    1              1
2          34    Male     1000              3            Dark    2              2
3          40    Male     500               1            Milk    3              4
4          22    Female   600               2            White   3              3
5          36    Female   700               3            Dark    2              2
6          20    Female   140               2            Dark    2              1
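The scale of measurement decides which summary is appropriate. The Python sketch below illustrates this on the six survey rows shown above:
- Nominal columns get frequencies and percentages.
- Ordinal columns get an order-based median.
- Interval/ratio columns get the mean.

Using the median for ordinal data is a common convention assumed here; these notes do not state it. The field names (`age`, `bars`, `bulk`, and so on) and the `summarize` helper are hypothetical.

```python
from collections import Counter
from statistics import mean, median_low

# The six rows of Priya's survey table (hypothetical field names).
# Satisfaction uses 1 = Very Satisfied ... 4 = Very Unsatisfied; Bulk is also a 1-4 ordinal code.
SURVEY = [
    {"age": 25, "gender": "Female", "groceries": 300, "bars": 2, "type": "Milk", "satisfaction": 1, "bulk": 1},
    {"age": 34, "gender": "Male", "groceries": 1000, "bars": 3, "type": "Dark", "satisfaction": 2, "bulk": 2},
    {"age": 40, "gender": "Male", "groceries": 500, "bars": 1, "type": "Milk", "satisfaction": 3, "bulk": 4},
    {"age": 22, "gender": "Female", "groceries": 600, "bars": 2, "type": "White", "satisfaction": 3, "bulk": 3},
    {"age": 36, "gender": "Female", "groceries": 700, "bars": 3, "type": "Dark", "satisfaction": 2, "bulk": 2},
    {"age": 20, "gender": "Female", "groceries": 140, "bars": 2, "type": "Dark", "satisfaction": 2, "bulk": 1},
]

# Scale of measurement for each column, following the question-by-question classification.
SCALES = {
    "age": "interval/ratio",
    "groceries": "interval/ratio",
    "bars": "interval/ratio",
    "gender": "nominal",
    "type": "nominal",
    "satisfaction": "ordinal",
    "bulk": "ordinal",
}


def summarize(column):
    """Return a summary suited to the column's scale of measurement."""
    values = [row[column] for row in SURVEY]
    scale = SCALES[column]
    if scale == "nominal":
        # Labels have no order or size: report each label's frequency and percentage.
        counts = Counter(values)
        return {label: (n, 100 * n / len(values)) for label, n in counts.items()}
    if scale == "ordinal":
        # Order matters but distances do not; median_low always returns an actual
        # category code instead of averaging two neighbouring codes.
        return median_low(values)
    # Interval/ratio values are genuine numbers, so the mean is meaningful.
    return mean(values)


for column in SCALES:
    print(column, SCALES[column], summarize(column))
```

With these six rows, the sketch gives the following results:
- `summarize("age")` gives 29.5.
- `summarize("gender")` reports Female 4 (about 66.7%) and Male 2 (about 33.3%).
- `summarize("satisfaction")` gives 2, that is, Satisfied.

Averaging the satisfaction codes instead would treat the gaps between categories as equal, which ordinal data do not guarantee.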
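The discrete and continuous series described in these notes can be built mechanically:
- A discrete series counts how often each value occurs.
- A continuous series counts how many measurements fall in each class interval.

The Python sketch below applies both to columns of Priya's survey. It treats Choco-bars as discrete and, as an assumption, Age as continuous. It also assumes the exclusive method for class limits: a class 20-30 includes 20 but not 30. These notes do not specify which class a boundary value belongs to. The function names are hypothetical.

```python
from collections import Counter

# Columns copied from the six survey rows in Priya's table.
bars = [2, 3, 1, 2, 3, 2]        # discrete: obtained by counting
ages = [25, 34, 40, 22, 36, 20]  # treated as continuous here: obtained by measuring


def discrete_series(values):
    """Return (value, frequency) pairs in ascending order of value."""
    return sorted(Counter(values).items())


def continuous_series(values, start, width, n_classes):
    """Return (class label, frequency) pairs for equal-width class intervals.

    Assumes the exclusive method: class 'a-b' holds a <= x < b, so a value equal
    to an upper limit is counted in the next class. Values outside the covered
    range are left out.
    """
    series = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for x in values if lower <= x < upper)
        series.append((f"{lower}-{upper}", freq))
    return series


for value, freq in discrete_series(bars):
    print(value, freq)
for label, freq in continuous_series(ages, start=20, width=10, n_classes=3):
    print(label, freq)
```

For the six rows shown, the sketch gives:
- Discrete series: (1, 1), (2, 3), (3, 2).
- Continuous series: 20-30: 3, 30-40: 2, 40-50: 1.

Under the exclusive method, the 40-year-old customer falls in 40-50 rather than 30-40.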