Full Transcript

Unit 4 Data and Analysis

Introduction

A computer system is a fundamental and important part of our daily life. It has revolutionized the way we work, communicate, learn, and entertain ourselves. In today's world, we are surrounded by a lot of data, which may be on our computer system or elsewhere. This data is continuously growing, and to get meaningful information from it we need to follow some discipline. Data science is the branch of knowledge in which computer programming skills, along with mathematics and statistics, are used to extract meaningful information from a collection of data.

4.1 Data and Analysis

Data is a collection of information or facts that we gather. It can be represented by numbers, measurements, descriptions, sounds or pictures. Here are two examples of data. In a science experiment, when you record the temperatures at different times, those temperature values are data. If you conduct a survey of your classmates and get to know how many of them like Mathematics, that will also be called data.

Data Analytics

Data analytics refers to the process of carefully examining and studying data to identify patterns, draw conclusions, or make the data meaningful. It is like solving a puzzle or retrieving meaningful results from the given or collected data. To analyze data, you can use mathematical calculations, statistical techniques, charts, or other tools. For example, after recording hourly temperature data in a science experiment, you can create a graph to see how the temperature changed over time. If the graph leads you to conclude that it got warmer as the day went on, that conclusion is the result of your data analysis.

Therefore, data is the information we collect, and data analytics is the way we observe it to get meaningful information. Data analytics can be quantitative (numeric) and qualitative.

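The temperature example above can be sketched in a few lines of Python. This is only an illustration: the readings are made-up values, and the function names `average_change_per_hour` and `describe_trend` are hypothetical, not part of the unit.

```python
from statistics import mean

# Hypothetical hourly readings: (hour of day, temperature in degrees Celsius).
readings = [(8, 18.0), (9, 19.5), (10, 21.0), (11, 22.5), (12, 24.0)]


def average_change_per_hour(data):
    """Average temperature change per hour between consecutive readings."""
    changes = [
        (t2 - t1) / (h2 - h1)
        for (h1, t1), (h2, t2) in zip(data, data[1:])
    ]
    return mean(changes)


def describe_trend(data):
    """Turn the average change into a plain-language conclusion."""
    rate_of_change = average_change_per_hour(data)
    if rate_of_change > 0:
        return "warmer"
    if rate_of_change < 0:
        return "cooler"
    return "unchanged"


rate = average_change_per_hour(readings)
trend = describe_trend(readings)
print(f"Average change: {rate:.2f} degrees per hour, so it got {trend}.")
```

Here the recorded values are the data, and the conclusion printed at the end is the result of the analysis.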
4.1.1 Data Science

Data science is an interdisciplinary field that uses mathematics, statistics, data analysis, and machine learning to analyze data and to extract knowledge and insights from it. It is like a pipeline from data to insights. This insight or knowledge is used to find patterns in the data. The results can be used for making informed decisions to solve real world problems, e.g., in medicine, education, scientific research, and business.

4.1.2 Concepts of Data Science

Data science consists of many components, theories, and algorithms. To understand data science and use it productively, the following are some key concepts or components that lay its foundation:

Data: As mentioned earlier, data is a collection of observations, facts or information collected from different sources. This data can be in the form of numbers, measurements, or observations, or in audio or video form. It could be structured (processed) data, which is in the form of tables, or unstructured (unprocessed) data in the form of audio, video, tweets, pdf files, etc.
Dataset: A dataset is a structured or processed collection of data, usually associated with a unique body of work. The data in this collection is related in some way. For example, a collection of brain CT scans of brain tumor patients is a dataset which can be used to evaluate a certain pattern or trend common to the entire data.
Statistics and Probability: Statistics is the analysis of the frequency of past events, and probability is used to predict the likelihood of future events. Data scientists use statistics and probability to find patterns and trends in the data.
Mathematics: Mathematics is a fundamental part of data science which helps to solve problems, optimize model performance, and interpret huge, complex data into simple and clear results for decision making.
Machine Learning: Machine learning is a branch of Artificial Intelligence and computer science which emphasizes the use of data and algorithms to let computers imitate human learning.
Deep Learning: Deep learning is a subset of machine learning, with emphasis on simulating or imitating the behavior of the human brain by using artificial neural networks.
Data Mining: Data mining is a subset of data science which primarily focuses on discovering patterns and relationships in existing datasets. The use of techniques and tools is limited in data mining as compared to data science.
Data Visualization: Data visualization is the graphical representation of data using common charts, plots, infographics, and animations. These visual displays of information communicate complex data relationships and data driven insights in a way that is easy to understand.
Big Data: Big data refers to handling large volumes of data. Data scientists use big data to find patterns and trends in datasets, to obtain more accurate and reliable results. The huge size of the data provides more opportunities for machine learning and provides better results.
Predictive Analysis: Predictive analysis is the use of historical data to predict future trends and events.
Natural Language Processing (NLP): It is the study of interaction between human language and computers. The common uses of NLP are chatbots, language translators and sentiment analysis.

4.1.3 Scope and application of Data Science

Data science is used for a wide range of analytics, machine learning, data visualization, recommendation systems, sentiment analysis, fraud detection, and decision-making in various industries like healthcare, finance, marketing, and technology.

**Sentiment analysis** is the term used for identifying the sentiments of a customer by analyzing their review of a product. The sentiment can be positive, negative, or neutral. Sentiment analysis can be performed on review text, opinions, etc.

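To make the idea of sentiment analysis concrete, here is a deliberately tiny sketch that labels a review as positive, negative, or neutral by counting known words. The word lists and the function name `classify_sentiment` are assumptions made for illustration; real systems typically rely on much larger lexicons or trained machine learning models.

```python
# Hypothetical word lists, far smaller than anything used in practice.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "broken"}


def classify_sentiment(review):
    """Label a review by comparing counts of positive and negative words."""
    words = [word.strip(".,!?").lower() for word in review.split()]
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


reviews = [
    "Great phone, I love the camera!",
    "Terrible battery and a broken charger.",
    "It arrived on Tuesday.",
]
labels = [classify_sentiment(review) for review in reviews]
for label, review in zip(labels, reviews):
    print(label, "-", review)
```

Word counting like this misses sarcasm and negation ("not good"), which is one reason practical sentiment analysis uses NLP models rather than fixed lists.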
A business problem is a gap between the existing and desired state of a situation. It calls for an action or series of actions to achieve an objective. Various business problems can be solved through data science. Some of them are as follows:

To decide the best routes for shipping goods or for passenger airplanes.
To choose the best product among many, for example whether to buy A or B.
To foresee delays of a flight, ship, train, etc. (through predictive analysis).
To create promotional offers (which products are more popular than others).
To find the most suitable time to deliver goods to reduce cost.
To forecast next year's revenue for a company.
To analyze the health benefits of physical training programs.
To predict a forthcoming event, such as who will win an election.

4.1.4 Business problems and Data Science

Data science can be applied to various businesses after analyzing the available data. Some of them are:

Industry: Data science can be used to make data driven decisions by analyzing historical data and predicting future trends. It can also help in effective marketing and improving quality control.
Consumer goods: Data science skills can be used to optimize inventory according to the demand forecast for particular goods in particular social groups, communities, and demographics.
Logistic companies: These companies can apply data science for route optimization, demand forecasting, real-time tracking, load balancing, carrier selection, cost reduction and global trade optimization.
Stock markets: Data science techniques and tools can be helpful in algorithmic trading, market sentiment analysis, volatility predictions, quantitative analysis, machine learning based trading, market surveillance, risk management, etc.
E-commerce: In e-commerce, data science helps in recommendation systems, customer segmentation, shopping cart analysis, fraud detection, supply chain optimization, customer sentiment analysis, etc.

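One of the business problems listed earlier, forecasting next year's revenue, can be sketched with a straight-line fit to past figures. The revenue numbers below are invented, the helper `fit_line` is hypothetical, and a straight line is only one of many possible forecasting techniques, chosen here because it is easy to follow.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly revenue in millions.
years = [2019, 2020, 2021, 2022, 2023]
revenue = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(years, revenue)
forecast_2024 = slope * 2024 + intercept
print(f"Forecast for 2024: {forecast_2024:.1f} million")
```

The forecast simply extends the historical trend, so it is only as trustworthy as the assumption that the trend continues.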
4.2 Data types in Data Science

In data science we can mainly classify data as quantitative (numeric) and qualitative (categorical).

Qualitative or Categorical data describes an object or a group of objects that can be labeled according to some group or category. It cannot be represented in numerical form. Examples include colors, places, etc. It is further subdivided into two types:
i. Ordinal data
ii. Nominal data

Ordinal Data: Ordinal data has a specific order or ranking. It uses a certain scale or measure to group data into categories, such as test grades, economic status, or military rank.

Nominal Data: Nominal data does not have any order. It can be labelled into mutually exclusive categories, which cannot be ordered meaningfully, for example the categories of transportation: car, bus or train. Similarly, gender, city, color, and employment status are also examples of nominal data.

Quantitative or Numerical data deals with numeric values that can be computed mathematically to draw some conclusions. Examples of numeric data are height, weight, number of students in a school, fruits in a basket, etc. Quantitative data can be further divided into two types:
i. Discrete data
ii. Continuous data

Discrete Data: It includes data which can only take certain values and cannot be further subdivided into smaller units. This data can be counted and has a finite number of values. Examples are the number of product reviews, tickets sold, computers in certain departments, employees in a company, etc.

Continuous Data: It refers to the unspecified number of possible measurements between two realistic points or numbers. Examples are daily wind speed, the weight of newborn babies, a freezer's temperature, etc.

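The four data types above differ in which operations make sense on them. The following sketch uses made-up values to show this; the grade scale in `GRADE_ORDER` and all variable names are assumptions for illustration.

```python
from collections import Counter
from statistics import mean

# Ordinal: categories with a meaningful order (hypothetical grade scale, lowest first).
GRADE_ORDER = ["F", "D", "C", "B", "A"]
grades = ["B", "A", "C", "B", "F"]
sorted_grades = sorted(grades, key=GRADE_ORDER.index)  # ranking is meaningful

# Nominal: categories without order; counting works, ranking does not.
transport = ["car", "bus", "train", "bus", "bus"]
most_common_transport = Counter(transport).most_common(1)[0][0]

# Discrete: countable whole numbers, e.g. tickets sold per day.
tickets_sold = [120, 95, 130]
total_tickets = sum(tickets_sold)

# Continuous: measurements that can fall anywhere in a range, e.g. weights in kg.
newborn_weights = [3.2, 2.95, 3.48]
average_weight = mean(newborn_weights)

print(sorted_grades, most_common_transport, total_tickets, round(average_weight, 2))
```

Sorting the grades relies on the scale, whereas the transport categories can only be counted, which is the practical difference between ordinal and nominal data.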
4.2.1 Sources of data

To analyze data for predictive analysis and decision making, the initial step is data collection through various reliable sources. Data can be divided into two categories: primary data and secondary data. Primary data is collected directly through questionnaires, surveys, and interviews. Primary data can also be collected through experiments and by recording observations. Secondary data is collected from previously recorded primary data. Following are some sources of data:

Websites: Collecting tweets regarding some topic or thread.
Surveys: Collecting firsthand data by performing surveys about some event [...].
Sensors: Collecting seismic data regarding changes under the earth which cause earthquakes.

4.2.2 Dataset and Database

A dataset is a structured or organized collection of data, which is usually associated with a unique body of work. A database, however, is an organized collection of data stored as multiple datasets or tables. These tables can be accessed electronically from a computer system for further manipulation and update.

To perform actions on the data stored in a database, we need a Database Management System (DBMS). A DBMS is the interface between the database and the user, providing the ability to create, modify, and retrieve data. There are different types of database management systems available, depending on the type of database being used. For example, relational databases, which store data in tables, can be managed by database management systems such as MySQL and Oracle. These are the most used databases in data science for data which is presented in tabular format. Non-relational databases, which store data in forms such as key-value pairs, column families, or graphs, can be managed by database management systems such as MongoDB and Cassandra. Non-relational DBMS are also called NoSQL DBMS.

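The create, modify, and retrieve operations that a DBMS provides can be tried out with Python's built-in `sqlite3` module. SQLite is not one of the systems named in this unit; it is used here only as an assumption of convenience, because it is a relational database that needs no server. The table name, column names, and student records are all invented for the example.

```python
import sqlite3

# An in-memory relational database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")

# Create: define a table and insert rows.
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, favourite_subject TEXT)"
)
conn.executemany(
    "INSERT INTO students (name, favourite_subject) VALUES (?, ?)",
    [
        ("Ayesha", "Mathematics"),
        ("Bilal", "Mathematics"),
        ("Sara", "Mathematics"),
        ("Omar", "Computer Science"),
    ],
)

# Modify: update one student's record.
conn.execute(
    "UPDATE students SET favourite_subject = ? WHERE name = ?", ("Physics", "Bilal")
)

# Retrieve: count how many students like each subject.
rows = conn.execute(
    "SELECT favourite_subject, COUNT(*) FROM students "
    "GROUP BY favourite_subject ORDER BY favourite_subject"
).fetchall()
conn.close()

for subject, count in rows:
    print(subject, count)
```

The same survey stored as a plain dataset would just be a list of answers; the DBMS adds the ability to update and query it with statements like these.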
4.2.3 Role of database in data science

Before the advent of database systems, computer scientists relied on file management systems to store and manage data. However, without a structured method of storing data, it would be of little use. This is why databases were introduced to manage and store large amounts of data. The first database management system was developed in [...]. There are two key reasons why databases have become so popular in recent years:
1) The rapid increase in data generation
2) The dependence of data science on data

To better understand the importance of databases in our daily lives, let's take the example of the evolution of supermarkets. In this case study you will learn how data science has had an impact on shopping in the current age.

DO YOU KNOW
Survey: The method of collecting information from individuals. The basic purpose of a survey is to collect data to describe different characteristics such as quality, price, kindness, etc. It involves asking many people questions about a product or service.

In the old days, people used to buy necessities from various shops. For example, if you had to buy a calculator, a shoe polish, a box of yogurt, and a pair of school uniform socks, you were supposed to visit four different shops. Such shopping was never an enjoyable experience because shops often had less space for customers, and they had to wait for the shopkeeper to find their desired item. The introduction of supermarkets, however, changed that, as they made shopping much more pleasant by displaying all the products in a large space and making them easily accessible to customers. As the number of products and customers in supermarkets increased, the need for a database system to keep track of all the purchases became critical. Data science plays a crucial role in determining the shelf placement of various products in various sections of the supermarket.
For example, the information gathered from the databases will guide us to place the products with less shelf life on the most easily accessible shelves. Similarly, predictive analysis provides adequate guidelines on which products would be in high demand in which season or month. For example, in Pakistan during the months of religious and national festivals, the demand for food items and clothing increases as compared to the rest of the year. By analyzing sales data from different supermarket branches, supermarket owners can identify which products need to be stocked in larger quantities and during which months the sales are highest.

Case study for the Database and Data Science

To determine the months with the heaviest customer traffic, a graph was plotted between the month and gross income. The analysis showed that the sales were highest in the months of festivals. In this way data science provides maximum benefits to supermarket owners as well as customers, who can find their desired items easily.

4.2.4 Data Collection in Data Science

Data collection is the process of collecting information to find a solution to the given statistical enquiry. Data collection is the first and foremost step in a statistical investigation. Data collection methods are divided into two categories:
Primary data collection
Secondary data collection

Primary data collection methods: These involve the collection of original data directly from the source through interaction with the respondent. A respondent is a person from whom the statistical information required for the enquiry is collected. Some common primary data collection methods are as follows:
i. Surveys and Questionnaires
ii. Interviews
iii. Observations
iv. Experiments
v. Focus groups
vi. Sensors
vii. IoT devices
viii. Biometric devices

Secondary data collection methods: These involve data collection using existing data collected by someone else for some purpose.
Such data is usually available in the form of published material like research papers, books, websites, etc. Some common secondary data collection methods are as follows:
1. Published sources
2. Online databases
3. Social media data/posts
4. Publicly available data
5. Government and institutional records
6. Surveys and questionnaires conducted in the past
7. Past research studies

**DO YOU KNOW!**
Enumerators: To collect information for analysis, an investigator needs the help of some people. These people are known as enumerators.
Investigator: An investigator is a person who conducts the statistical enquiry.

4.2.5 Data Storage

After data collection, effective storage of data is an essential step for managing and analyzing large volumes of data. There are various data storage methods according to the nature of the data. Some common data storage methods are as follows:
Relational/NoSQL databases
Data warehouse
Distributed file systems
Cloud based data storage
Blockchain

4.2.6 Data Visualization

Data visualization is the graphical representation of data to get meaningful insight, trends, and patterns from data. The visual elements which help in data visualization are charts, graphs, maps, figures, dashboards, etc.

4.2.7 Summary statistics

Summary statistics are information about the data in a sample. They can help in understanding the values better. They may include the total number of values, the minimum value, and the maximum value, along with the mean value and the standard deviation corresponding to a data collection. Summary statistics help to understand the trends, outliers, and distribution of values in a data set.

4.2.8 Requirement of Summary Statistics

The summary statistics provide a quick overview of characteristics of data.
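The summary statistics named in section 4.2.7 (count, minimum, maximum, mean, and standard deviation) can be computed with Python's standard `statistics` module. The ages below are a made-up sample, and flagging values more than two standard deviations from the mean is just one common rule of thumb for spotting outliers, assumed here for illustration.

```python
from statistics import mean, stdev

# Hypothetical sample of ages; 90 lies far from the other values.
ages = [34, 29, 41, 38, 30, 36, 90]

summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "std_dev": stdev(ages),  # sample standard deviation
}

# Rule of thumb (an assumption, not a fixed standard): a value is an outlier
# if it lies more than two standard deviations away from the mean.
outliers = [
    age for age in ages if abs(age - summary["mean"]) > 2 * summary["std_dev"]
]

for name, value in summary.items():
    print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
print("outliers:", outliers)
```

A handful of numbers like these is often enough to notice a suspicious value before any chart is drawn.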
Summary statistics also lead towards a better understanding of data cleaning, data preprocessing, feature selection and data visualization.

4.3 Big Data

Big data contains greater variety, arriving in increasing volumes and with more velocity. This is also known as the three Vs. Big data refers to larger, more complex datasets, especially from new data sources. These datasets are so voluminous that traditional data processing software cannot manage them. These massive volumes of data can be used to address business problems which were difficult to handle before. The three Vs of big data are:

Volume: It refers to the amount of data. Big data deals with huge volumes of low-density, unstructured data. The size or volume of data may vary from system to system. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.

Velocity: It refers to the speed of data. Velocity is the fast rate at which data is received. High-velocity data streams directly into memory rather than being written to disk. Some Internet-enabled smart products operate in real-time and will require real-time evaluation and action without any delay.

Variety: It refers to the various formats and types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new data types. These unstructured data types (text, images, videos) and semi-structured data types (JSON, JavaScript Object Notation, and XML, Extensible Markup Language) require additional preprocessing to derive meaningful insight.

4.3.1 The history of big data

The term big data emerged in the early 2000s to describe the exponential growth of data. Around 2005 people began to realize just how much data users generated through Facebook, YouTube, and other online services.
In 2005 a tool called Hadoop (an open-source framework created specifically to store and analyze big datasets) was developed, which helped store and manage huge data. With the advent of the Internet of Things (IoT), more objects and devices are connected to the internet, gathering data on customer usage patterns and product performance. The emergence of machine learning has produced still more data. The analysis of this huge data provides business insight for optimized decision making.

4.3.2 Advantages and benefits of big data

Big data contains more information, therefore it helps individuals, organizations, and businesses to optimize and generate cost effective solutions. Big data has many advantages for the betterment and progress of business. Some of them are as follows:

Product development: Developing and creating new products, services or brands is much easier when based on data collected about customers' needs and wants. Companies use big data to anticipate customer demand. They build predictive models for new products and services by classifying key attributes of past and current products.

Predictive maintenance: It is a proactive maintenance strategy that uses the analysis of existing data to predict when equipment, machinery or a product is likely to fail. Therefore, it indicates potential issues before the problems happen.

Customer experience/satisfaction: A clearer view of customer experience is more possible now than ever before. Big data enables businesses to gather data from social media, web visits, call logs, and other sources to improve customer satisfaction.

Fraud and compliance: Big data analytics can identify and detect unusual, suspicious patterns and anomalies.
As a result, it provides an effective tool to detect fraudulent activities and enhance cybersecurity measures.

4.3.3 Big data challenges

Businesses encounter many challenges when working with big data. Some of them are as follows:
i. Data quality: Poor quality of data may lead to errors, inefficiency, and misleading insight after data analysis.
ii. Data security and privacy: It is difficult to manage the protection and privacy of massive datasets to prevent unauthorized access.
iii. Rapid growth of data: Making systems that can handle more and more data as it keeps on growing, without slowing down, is challenging.
iv. Big data tool selection: Ensuring compatibility and seamless interaction between different big data tools and platforms.
v. Data integration: To create harmony among diverse data formats and structures is a difficult task.

4.3.4 Application of big data in business

Big data applications can help companies to make better business decisions by analyzing large volumes of data and discovering hidden patterns. The following are a few business domains where big data can be applied:
1. Healthcare
2. Media and Entertainment
3. IoT
4. Manufacturing
5. Government

Health care

Big data is making a major impact on the huge healthcare industry. Wearable devices and sensors collect patient data which is then fed in real-time to an individual's electronic health records. Healthcare providers are now using big data to predict epidemic outbreaks, provide real-time alerting, predict and prevent serious medical conditions, etc. Researchers analyze the data to determine the best treatment for a particular disease, the side effects of drugs, health risk forecasts, etc.

Media and entertainment

The media and entertainment industries are creating, advertising, and distributing their content using new business models. The media houses are targeting audiences by predicting what they would like to see, how to target the ads, content monetization, etc. Big data systems are thus increasing the revenues of such media houses by analyzing viewer patterns.

Internet of Things (IoT)

Big data plays an important role in enhancing the capabilities of IoT devices. IoT devices generate continuous data. The analytics based on this huge data helps in improving customer experience. In brief, big data is essential for unlocking the full potential of IoT devices by providing meaningful insight derived from the massive amount of data generated.

Manufacturing

Big data helps manufacturing companies to make better products and decisions. It helps in predicting when machines might need maintenance, making sure they don't unexpectedly stop working. Big data also looks at how products can be made better and cheaper. It is like having a smart assistant that goes through the whole manufacturing process, making things more efficient and helping companies build the best products.

The following are some of the major advantages of employing big data applications in manufacturing industries:
High product quality
Tracking faults
Supply planning
Predicting the output
Increasing energy efficiency
Testing and simulation of new manufacturing processes
Large scale customization of manufacturing

Government

Analytics through big data management techniques allows governments to understand the needs of their citizens, combat fraud, minimize system errors and improve operations, reducing costs and improving the services of any government entity. By adopting big data systems, the government can attain efficiency in terms of cost, output and novelty.

Big data applications can be applied in each and every field. Some of the areas where big data finds application include:
Agriculture
Aviation
Cyber security and intelligence
Crime prediction and prevention
E-commerce
Fake news detection
Fraud detection
Pharmaceutical drug evaluation
Scientific research
Weather forecasting
Tax compliance
