Module in Data Science Analytics

Module 1
Overview of Data, Data Science Analytics, and Tools

What is Data Science?

Data science is a field of study that focuses on techniques and algorithms to extract knowledge from data. The area combines data mining and machine learning with data-specific domains.

Data analysis refers to the process of inspecting, cleaning, transforming, and interpreting data to discover valuable insights, draw conclusions, and support decision-making. It involves using various techniques and tools to analyze large sets of data and extract meaningful patterns, trends, correlations, and relationships. Data analysis is essential across industries and disciplines because it uncovers information that can be used to optimize processes, solve problems, and make informed decisions.

What is the purpose of data analysis?

The purpose of data analysis is to gain meaningful insights from raw data to support decision-making, identify patterns, and extract valuable information. Some of the key objectives of data analysis include: identifying trends and patterns, making data-driven decisions, finding correlations and relationships, detecting anomalies, improving performance, and predictive modeling.

AN INTRODUCTION TO DATA

This section focuses on defining "data" before going to any complicated topic.

What is Data?

The word “data” has the following meaning, based on the Oxford dictionary: data refers to facts and statistics collected for reference or analysis. Based on the definition, data has three aspects: (1) data comes from facts and statistics, (2) data is collected, and (3) data is used for reference or analysis.
The simplest form of data

A table is probably the simplest form of data. Surprisingly, most implementations of data science algorithms still use tabular data as inputs. Data scientists prefer to convert complex data (such as text, images, or time series) to tables so that existing tools can be leveraged for analysis.

As an example, let us say that a company keeps information about its employees in an Excel table. Here is the table.

One reason for the popularity of the tabular representation is the ease of storing tabular data directly in the main memory of the computer. Regardless of the number of rows or columns, a tabular dataset can always be stored in a 2-dimensional array.

Given some data, a data scientist tries to retrieve interesting information that might help the data owner make decisions.

Can data speak?

Given the following data table regarding employees, we will try to retrieve interesting information from it. Look closely at the table for several minutes. Then, write down anything interesting you can find.

Here is what I could find from the table above. You can see how many of your findings match the findings listed below and how many of your findings are not listed.

Jane and Dave earn the highest salary.
Delilah earns the least.
Jane and Dave are the oldest people in the group.
Delilah is the youngest among all the employees in the table.

These are all interesting findings. What else? Is the following statement correct based on the information provided in the table?

Older people earn more in the company from which the data was collected.

It is indeed correct based on the data provided to us in the table.

Now, let us go back to the definition: data refers to “facts and statistics collected for reference or analysis.” This table has facts. This table was collected from a company.
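As a minimal sketch of the exercise above, the table can be held in a 2-dimensional array (a list of rows) and the findings checked in Python. The original table is not reproduced in this module, so every age and salary below, and the two rows labelled Employee A and Employee B, are invented for illustration; only the names Jane, Dave, and Delilah come from the text. The helper names `column` and `pearson` are likewise assumptions.

```python
from math import sqrt

# Column names for the 2-dimensional array below.
COLUMNS = ["name", "age", "salary"]

# Hypothetical rows: all numbers, and the "Employee A/B" rows, are invented.
table = [
    ["Jane", 50, 95000],
    ["Dave", 50, 95000],
    ["Employee A", 41, 72000],
    ["Employee B", 33, 58000],
    ["Delilah", 24, 40000],
]


def column(name):
    """Return one column of the table as a list."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in table]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ages = column("age")
salaries = column("salary")

# Finding 1: who earns the most (ties are kept).
top_salary = max(salaries)
highest_paid = [row[0] for row in table if row[2] == top_salary]

# Finding 2: who is the youngest.
youngest = min(table, key=lambda row: row[1])[0]

# Finding 3: do age and salary move together?
age_salary_corr = pearson(ages, salaries)

print("Highest paid:", highest_paid)
print("Youngest:", youngest)
print("Age/salary correlation:", round(age_salary_corr, 3))
```

A coefficient close to 1 would be consistent with the statement that older people earn more, although with so few rows it is only suggestive.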
We used the table for analysis. We revealed that the company appreciates experienced employees. Basically, the data reflects a general trend: experience, wisdom, and money (the salary in this case) come with age. There can be debates regarding the conclusion, but the main point is that data speaks. Data gives us insights. Data gives us those light-bulb moments.

The Difference between Data Science and Data Analytics

What is Data Science?

Data Science is the process of using scientific methods, tools and systems to shape raw data into meaningful information. Data scientists use machine learning algorithms to build complex, predictive models that find patterns and trends in the data. Data Science software is used to manipulate, organise and build predictive data models.

What is Data Analytics?

Data Analytics is the science of analysing either raw or processed data to derive useful insights that can then be turned into actionable plans or strategies. Data Analytics often builds on the work first done by Data Science, using the predictive information to make decisions.

What is Business Analytics?

Business Analytics is the application of Data Analytics tools and techniques in a business context. The historical data of a business is statistically analysed in order to understand past performance, predict future market trends, create more accurate budgets and much more.

The Function of Data Science vs Data Analytics

The function of Data Science is to build the foundation on which Data Analytics then works. The key functions associated with this field are:

Programming: Coding algorithms and computer models that can analyse large data sets. The most common programming languages used in Data Science are R, SQL and Python.
Data wrangling: Cleaning the data and then organising it coherently so that it is easier to use and more readily available.
Statistical modelling: Using statistical assumptions and mathematical models, such as regression analysis and k-means clustering, to identify relationships between two or more variables. This function is tied to quantitative research methodologies.

The function of Data Analytics is to apply a set of analysis-specific frameworks and tools to data sets in order to generate information that can be used to make decisions. These frameworks are:

Predictive Analytics: The use of past trends, patterns and historical data to make predictions about future events, and act accordingly. An example of this would be to increase the inventory count of an item that sees spikes in sales during a specific month or season.
Prescriptive Analytics: This uses all available data to determine the best strategy, action or plan that should be taken in a specific scenario in order to reach the objective. It is considered a more advanced form of Predictive Analytics. An example of this would be e-commerce websites that show consumers a specific product they know would entice a purchase, based on that consumer’s lifestyle data, browsing patterns and previous purchase history.
Descriptive Analytics: The means of summarising data to analyse, understand and describe ‘what happened’, either in real time or at a particular point in time in a business. An example of this would be KPI reports or dashboards that depict the current figures against an established benchmark.
Diagnostic Analytics: This uses data to understand and analyse ‘why something happened’. An example of this would be identifying why a social media campaign fared either very poorly or very well, in order to either avoid or duplicate the parameters.

What are the steps in the data science process?

The data science process typically involves several steps, including:

1. Defining the problem: Identifying the problem or question that the data science project is intended to solve.
2. Collecting and cleaning data: Gathering the data needed for the project and preparing it for analysis by cleaning and preprocessing it.
3. Exploring and visualizing data: Examining the data to get a better understanding of its characteristics and patterns. Visualization techniques, such as plots and charts, can be used to help identify trends and relationships in the data.
4. Modeling and evaluation: Building and testing machine learning models to make predictions or inferences from the data. This step may involve selecting and tuning the model, as well as evaluating its performance using metrics such as accuracy or precision.
5. Communicating findings: Presenting the results of the data science project to stakeholders, including key findings and recommendations.

What are some common applications of data science?

Data science is used in a wide range of industries and sectors, including finance, healthcare, retail, and technology. Some common applications of data science include:

Predictive modeling: Using machine learning algorithms to predict future outcomes based on past data.
Customer segmentation: Grouping customers into different categories based on their characteristics and behaviors.
Fraud detection: Identifying fraudulent activity using patterns and anomalies in data.

Fraud and Risk Detection

The earliest applications of data science were in finance. Companies were fed up with bad debts and losses every year. However, they had a lot of data that was collected during the initial paperwork while sanctioning loans. They decided to bring in data scientists to rescue them from losses. Over the years, banking companies learned to divide and conquer data via customer profiling, past expenditures, and other essential variables to analyze the probabilities of risk and default. Moreover, it also helped them to push their banking products based on customers’ purchasing power.
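To make the idea of spotting fraud through anomalies in data a little more concrete, here is a minimal sketch that flags unusual transaction amounts with a robust z-score (distance from the median, scaled by the median absolute deviation). This technique is chosen only for illustration and is not claimed to be what banks actually use; the transaction amounts, the function name `robust_z_scores`, and the cutoff of 3.5 are all assumptions.

```python
from statistics import median

# Hypothetical transaction amounts for one customer (invented for illustration).
amounts = [42.0, 38.5, 51.0, 47.2, 40.0, 39.9, 980.0, 44.1, 36.7, 49.3]


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    # Median absolute deviation: the typical distance from the median.
    mad = median(abs(v - med) for v in values)
    # 0.6745 rescales the MAD so the scores are comparable to ordinary
    # z-scores when the data is roughly normally distributed.
    return [0.6745 * (v - med) / mad for v in values]


THRESHOLD = 3.5  # assumed cutoff (a common rule of thumb), not from the text

scores = robust_z_scores(amounts)
flagged = [a for a, z in zip(amounts, scores) if abs(z) > THRESHOLD]

print("Flagged for review:", flagged)
```

The median is used instead of the mean so that a single extreme amount does not distort the baseline it is compared against. A real fraud or credit-risk model would combine many more variables, such as the customer profile and past expenditures mentioned above.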
Supply chain optimization: Analyzing data to improve efficiency and reduce costs in the supply chain.

Healthcare

The healthcare sector, especially, receives great benefits from data science applications.

1. Medical Image Analysis

Procedures such as detecting tumors and artery stenosis and delineating organs employ various methods and frameworks, such as MapReduce, to find optimal parameters for tasks like lung texture classification. These procedures apply machine learning methods, support vector machines (SVM), content-based medical image indexing, and wavelet analysis for solid texture classification.

2. Genetics & Genomics

Data science applications also enable an advanced level of treatment personalization through research in genetics and genomics. The goal is to understand the impact of DNA on our health and find individual biological connections between genetics, diseases, and drug response. Data science techniques allow integration of different kinds of data with genomic data in disease research, which provides a deeper understanding of genetic issues in reactions to particular drugs and diseases. As soon as we acquire reliable personal genome data, we will achieve a deeper understanding of human DNA. Advanced genetic risk prediction will be a major step towards more individual care.

3. Drug Development

The drug discovery process is highly complicated and involves many disciplines. The greatest ideas are often constrained by billions of tests and by huge financial and time expenditure. On average, it takes twelve years to make an official submission. Data science applications and machine learning algorithms simplify and shorten this process, adding a perspective to each step from the initial screening of drug compounds to the prediction of the success rate based on the biological factors.
Such algorithms can forecast how the compound will act in the body using advanced mathematical modeling and simulations instead of “lab experiments”. The idea behind computational drug discovery is to create computer model simulations as a biologically relevant network, simplifying the prediction of future outcomes with high accuracy.

4. Virtual assistance for patients and customer support

Optimization of the clinical process builds upon the concept that in many cases it is not actually necessary for patients to visit doctors in person. A mobile application can give a more effective solution by bringing the doctor to the patient instead. AI-powered mobile apps can provide basic healthcare support, usually through chatbots. You simply describe your symptoms or ask questions, and then receive key information about your medical condition derived from a wide network linking symptoms to causes. Apps can remind you to take your medicine on time and, if necessary, schedule an appointment with a doctor. This approach promotes a healthy lifestyle by encouraging patients to make healthy decisions, saves them time waiting in line for an appointment, and allows doctors to focus on more critical cases. The most popular applications nowadays are Your.MD and Ada.

Internet Search

This is probably the first thing that comes to mind when you think of data science applications. When we speak of search, we think ‘Google’. Right? But there are many other search engines, like Yahoo, Bing, Ask, AOL, and so on. All these search engines (including Google) make use of data science algorithms to deliver the best result for our searched query in a fraction of a second. Consider the fact that Google processes more than 20 petabytes of data every day. Had there been no data science, Google wouldn’t have been the ‘Google’ we know today.
Targeted Advertising

If you thought search would have been the biggest of all data science applications, here is a challenger: the entire digital marketing spectrum. From the display banners on various websites to the digital billboards at airports, almost all of them are decided by using data science algorithms. This is the reason why digital ads have been able to get a much higher click-through rate (CTR) than traditional advertisements. They can be targeted based on a user’s past behavior. This is the reason why you might see ads for data science training programs while I see an ad for apparel in the same place at the same time.

Website Recommendations

Aren’t we all used to the suggestions about similar products on Amazon? They not only help you find relevant products from the billions of products available but also add a lot to the user experience. A lot of companies have eagerly used this kind of recommendation engine to promote their products in accordance with the user’s interest and the relevance of information. Internet giants like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDb, and many more use this system to improve the user experience. The recommendations are made based on previous search results for a user.

Speech Recognition

Some of the best examples of speech recognition products are Google Voice, Siri, and Cortana. With the speech-recognition feature, your life doesn’t stop even if you aren’t in a position to type a message. Simply speak the message and it will be converted to text. At times, however, you will find that speech recognition doesn’t perform accurately.

Airline Route Planning

The airline industry across the world is known to bear heavy losses. Except for a few airline service providers, companies are struggling to maintain their occupancy ratio and operating profits. The sharp rise in air-fuel prices and the need to offer heavy discounts to customers have made the situation worse.
It wasn’t long before airline companies started using data science to identify strategic areas for improvement. Now, using data science, airline companies can:

Predict flight delays
Decide which class of airplanes to buy
Decide whether to land directly at the destination or take a halt in between (for example, a flight can have a direct route from New Delhi to New York; alternatively, it can halt in another country on the way)
Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have embraced data science to bring changes to their way of working.

Gaming

Games are now designed using machine learning algorithms that improve or upgrade themselves as the player moves up to a higher level. In motion gaming, too, your opponent (the computer) analyzes your previous moves and shapes its game accordingly. EA Sports, Zynga, Sony, Nintendo, and Activision-Blizzard have taken the gaming experience to the next level using data science.

Augmented Reality

This is the last of the data science applications, and the one that seems most exciting for the future. Data science and virtual reality do have a relationship, considering that a VR headset combines computing knowledge, algorithms and data to provide you with the best viewing experience. A very small step towards this is the high-trending game Pokemon GO, with its ability to let you walk around and look at Pokemon on walls and streets, things that aren’t really there. The creators of this game used the data from Ingress, the previous app from the same company, to choose the locations of the Pokemon and gyms.