Data Types PDF
Zarqa University
Dr. Mais Haj Qasem
Summary
This document provides information on various types of data, such as numerical, categorical, image, text, time series, audio, sensor, and structured data. It explains the characteristics of each data type, including examples.
Full Transcript
Data
Prepared by: Dr. Mais Haj Qasem

Data is a collection of information that is stored in digital format and can then be used as a basis for analysis and decision making. Data science aims to organize data so that it is easy to understand and use. Data science connects the world by combining scattered information into small, manageable units.

The first thing to realize is that not all data is equal. Data collected by social media apps, for example, is not the same as data generated by point-of-sale or supply chain systems.

Data scientists must understand that the data they collect can be one of many different types. It is important to know the type because it determines how best to collect the data and how to display it. Data may be divided into several categories:
❖ Numerical Data
❖ Categorical Data
❖ Image Data
❖ Text Data
❖ Time Series Data
❖ Audio Data
❖ Sensor Data
❖ Structured Data

Numerical Data
Any data that can be expressed as numbers is considered numerical data. Real numbers, integers, and floats all belong to this category. It includes data from surveys, experiments, and other sources. There are two forms of numerical data:
1. Discrete Data
2. Continuous Data

1. Discrete Data: numerical data that can take only definite, specified values. Discrete data has a few basic characteristics:
− It draws on a fixed set of values, typically whole numbers.
− It is easily visualized with simple charts such as bar graphs, line charts, histograms, and pie charts.
− Some examples of discrete data are:
❖ Shoe sizes available in a shoe store
❖ Number of blue cars sold in a showroom
− These examples illustrate the fixed-value characteristic of discrete data. It is not possible to have a shoe size of 7.003, 500.5 cars sold, or a quarter of a movie ticket.
Because the possible values are fixed in this way, the data is called discrete.

2. Continuous Data: numerical data that can take any value within a range, that is, between two fixed values.
− Some examples of continuous data are:
❖ Height of a plant
❖ Weight of newborn kittens
❖ Temperature changes in a desert

Categorical Data
Categorical data is data that is not measured but can be sorted into categories, such as pet preference (dogs vs. cats). Since the differences between categories cannot be measured, categorical data only allows the categories to be compared. Gender (male, female, or other) and hair color (red, blonde, or black) are two examples of categorical data. They are categorical because no value is greater than another. Red hair, for example, is only different from black hair; it is not a greater or lesser value. Categorical data divides data into distinct groups without giving each group a fixed numerical value. There are two forms of categorical data:
1. Nominal Data
2. Ordinal Data

1. Nominal Data: data with no natural order. Nominal data deals with categories that cannot be ranked. Some instances of nominal data are brand names, colors, and animal breeds. It is not correct to say that dogs are greater than cats.

2. Ordinal Data: data that can naturally be ranked or ordered but does not have a continuous, measurable value. An example is clothing sizes: the differences between small, medium, and large are not measurable, but the sizes are clearly ordered.

Image Data
This kind of data is made up of the pixel values that form an image. It is used for tasks like object recognition and categorization.
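To illustrate what "pixel values" means here, the following minimal sketch stores a tiny grayscale image as a grid of numbers. The 3x3 size, the 0 to 255 brightness scale, and the values themselves are assumptions chosen for illustration; they do not come from the slides.

```python
# A hypothetical 3x3 grayscale image. Each number is one pixel's brightness,
# assumed here to run from 0 (black) to 255 (white).
image = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]

height = len(image)        # number of pixel rows
width = len(image[0])      # number of pixels per row

# Flatten the grid into a single list so simple statistics can be computed.
pixels = [value for row in image for value in row]
mean_brightness = sum(pixels) / len(pixels)

print(f"{width}x{height} image with {len(pixels)} pixels")
print(f"mean brightness: {mean_brightness:.1f}")
```

A color image would typically hold several numbers per pixel (for example red, green, and blue) instead of one.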
Image data may come from a variety of sources, including digital cameras, scanners, and satellite imagery. For computers to recognize things, people, and scenes, the data must be tagged or annotated. Labeled image data can include object bounding boxes, facial recognition data, and image segmentation data. The data may also be used for object identification, image classification, and other purposes.

Text Data
This data consists of words and sentences. Its applications include sentiment analysis and text categorization. Sentiment analysis helps you understand the emotions behind a text. It assigns labels such as negative, positive, neutral, or sad to the text data. Speech transcripts, emails, articles, social media posts, customer reviews, and other unstructured text are examples of text data. For example, text data may be used to categorize emails, identify emotion in customer reviews, or produce content based on a list of keywords. It may also be used to summarize long documents, find topics within a text, and extract relevant entities or relationships from the text.

Time Series Data
Time series data is a collection of data points recorded over time, generally in order of occurrence. It is used in forecasting and anomaly detection. A time series might be used, for example, to forecast stock market values or to find trends in weather patterns. Time series data is often gathered at regular intervals, such as hourly, daily, weekly, or monthly.

Audio Data
Audio data consists of conversations, speeches, music, and other sound recordings. These recordings are used to build datasets from which machine learning models can learn.

Sensor Data
Sensor data includes data from motion sensors, temperature sensors, and other physical sensors. It may come from a variety of sources, including mobile devices, sensors on robots, cameras, and other Internet of Things (IoT) devices. Sensor data may also be used to better understand user behavior and optimize for it. Sensors can detect almost any physical quantity.

Structured Data
Structured data, typically categorized as quantitative data, is data that has been arranged and formatted so that both people and machines can easily understand it. Structured data is commonly found in databases and spreadsheets and is distinguished by its organization. In most cases, each data element is assigned a specific field or column in the schema, and each record or row represents a specific instance of that data.

Structured data can be of any type. For example, customer information may include numbers (age, income, num.vehicles), text (housing.type), Boolean values (is.employed), and categorical data (Gender, marital.stat). What matters is that every value we see here, whether it is a number, a category, or a text, is labeled. In other words, we know what that number, category, or text means.

Unstructured Data
Unstructured data, typically categorized as qualitative data, is information in various formats that does not adhere to traditional data models, which makes it challenging to store and manage in a traditional relational database. One of the most common types of unstructured data is text. Unstructured text is generated and collected in a wide range of forms, including Word documents, email messages, PowerPoint presentations, survey responses, transcripts of call center interactions, and posts from blogs and social media sites.
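The contrast between labeled and unlabeled data can be sketched as follows. The field names are taken from the customer example in the Structured Data section; the values and the review text are invented for illustration only.

```python
# Structured: every value sits in a named field, so its meaning is known.
# Field names follow the customer example; the values are made up.
customer = {
    "age": 34,                   # number
    "income": 52000.0,           # number
    "num.vehicles": 2,           # number
    "housing.type": "rented",    # text
    "is.employed": True,         # Boolean
    "marital.stat": "married",   # categorical
}

# Unstructured: free text with no fields. The same kind of information may be
# present, but it has to be extracted before it can be analyzed.
review = "I moved to a rented flat last year and bought a second car."

# With structured data, a value is found directly by its label.
vehicles = customer["num.vehicles"]

# With unstructured text, only a crude keyword check is possible without
# further processing.
mentions_rented = "rented" in review

print(vehicles, mentions_rented)
```

The dictionary stands in for one row of a table; in a database or spreadsheet, each key would be a column.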
Unstructured data consists of a wide range of content, including documents, videos, audio files, social network posts, and emails. These data formats can be challenging to standardize and classify. Unstructured data is frequently made up of collections of data rather than a single data unit, such as a document of thousands of words that covers a variety of subjects.

Key Differences Between Structured and Unstructured Data
− Structured data is labeled, while unstructured data is not.
− Structured data is often kept in data warehouses, and unstructured data is typically saved in data centres.
− Structured data is simple to search and analyze, but unstructured data takes more effort to process and understand.
− Structured data is available in established formats, but unstructured data is available in a variety of formats.

Traditional Data and Big Data
Traditional data is structured data of the kind maintained by organizations of all sizes, from small to large. A typical database system uses a centralized database architecture to store and manage the data in an organized format, or in fields in a file.

Big data is a more advanced form of traditional data. It deals with data sets that are too large or complex to manage with standard data-processing application software, and it handles massive amounts of both structured and unstructured data.

Data Collections
There are many places online to look for sets or collections of data. Here are some of those sources:
1. Open Data: The idea behind open data is that some data should be freely available in the public domain for anyone to use as they wish, without restrictions from copyright, patents, or other mechanisms of control. For example, you can visit the data repositories published by the US Government.
2. Social Media Data: Social media has become a gold mine of data to analyze for research or marketing purposes. Collection is facilitated by the Application Programming Interfaces (APIs) that social media companies provide to researchers and developers, such as the Facebook Graph API.
3. Multimodal Data: We live in a world where more and more devices exist and are connected to the Internet, creating the emerging trend of the Internet of Things (IoT). These devices generate and use large amounts of data, not all of which is of the traditional types (numbers, text).
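As a short recap of the numerical and categorical types described earlier, the sketch below shows how each type limits what can sensibly be done with the data. The sample values are invented, and the variable names are hypothetical; this is one possible illustration, not a prescribed method.

```python
from collections import Counter

# Ordinal data: the categories have a natural order, but the distance
# between them is not measurable. Sorting by rank is meaningful.
SIZE_ORDER = ["small", "medium", "large"]
shirts = ["large", "small", "medium", "small"]
sorted_shirts = sorted(shirts, key=SIZE_ORDER.index)

# Nominal data: the categories have no order. Counting is meaningful,
# ranking is not.
pets = ["dog", "cat", "dog", "dog"]
pet_counts = Counter(pets)

# Discrete numerical data: a count that can only take whole values.
cars_sold = 12

# Continuous numerical data: a measurement that can take any value in a range.
plant_height_cm = 18.35

print(sorted_shirts)
print(pet_counts.most_common(1))
print(type(cars_sold).__name__, type(plant_height_cm).__name__)
```

Sorting the pets alphabetically would run without error, but the resulting order would carry no meaning, which is the practical difference between nominal and ordinal data.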