BI L1 PDF
Document Details
Uploaded by GiftedUniverse
Westmont College
Summary
This document provides an overview of business intelligence, data science, and database administration concepts. It explores data types, processing methods, applications, and professional roles in the field. It also includes references to organizations and academic institutions.
Full Transcript
Business Intelligence and Database Administration

Lots of data is being collected and warehoused:
- Web data, e-commerce
- Financial transactions, bank/credit transactions
- Online trading and purchasing
- Social Network

- Google processes 20 PB* a day (2008)
- Facebook has 60 TB of daily logs
- eBay has 6.5 PB of user data + 50 TB/day (5/2009)
- 1000 Genomes Project: 200 TB
- Cost of 1 TB of disk: $35
- Time to read 1 TB disk: 3 hrs (100 MB/s)

*A petabyte is a measure of memory or data storage capacity that is equal to 2 to the 50th power of bytes.

Big Data is any data that is expensive to manage and hard to extract value from.
- Volume: the size of the data
- Velocity: the latency of data processing relative to the growing demand for interactivity
- Variety and Complexity: the diversity of sources, formats, quality, structures

- Relational Data (Tables/Transaction/Legacy Data)
- Text Data (Web)
- Semi-structured Data (XML)
- Graph Data: Social Network, Semantic Web (RDF), …
- Streaming Data: you can afford to scan the data once

- Aggregation and Statistics: data warehousing and OLAP (online analytical processing)
- Indexing, Searching, and Querying: keyword-based search, pattern matching (XML/RDF)
- Knowledge discovery: data mining, statistical modeling

“… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist

The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011).

- New Data Science institutes being created or repurposed: NYU, Columbia, Washington, UCB,...
- New degree programs, courses, boot camps: e.g., at Berkeley: Stats, I-School, CS, Astronomy…
- One proposal (elsewhere) for an MS in “Big Data Science”

Data science is an area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data.
- Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges in big data.
- Data science principles apply to all data, big and small.
- Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning.

Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many areas, such as science, engineering, economics, politics, finance, and education:
- Computer Science: pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
- Mathematics: mathematical modeling
- Statistics: statistical and stochastic modeling, probability

These insights can be used to guide decision making and strategic planning. The data science lifecycle involves various roles, tools, and processes, which enable analysts to glean actionable insights.

Companies learn your secrets, shopping patterns, and preferences. For example, can we know if a woman is pregnant, even if she doesn’t want us to know? (Target case study)

Data science and elections (2008, 2012): 1 million people installed the Obama Facebook app that gave access to info on “friends”.

A data science project undergoes the following stages:
- Data ingestion: The lifecycle begins with data collection, covering both raw structured and unstructured data from all relevant sources, using a variety of methods.
- Data storage and data processing: Since data can have different formats and structures, companies need to consider different storage systems based on the type of data that needs to be captured. This stage includes cleaning, deduplicating, transforming, and combining the data using ETL (extract, transform, load) jobs or other data integration technologies.
- Data analysis: Data scientists conduct an exploratory data analysis to examine biases, patterns, ranges, and distributions of values within the data.
- Communicate: Finally, insights are presented as reports and other data visualizations that make the insights, and their impact on the business, easier for business analysts and other decision-makers to understand.

Data science is considered a discipline, while data scientists are the practitioners within that field. Data scientists are not necessarily directly responsible for all the processes involved in the data science lifecycle. Data scientist responsibilities commonly overlap with those of a data analyst, particularly in exploratory data analysis and data visualization.

Data Scientist: The Sexiest Job of the 21st Century
- They find stories and extract knowledge. They are not reporters.
- Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions.

- National Security
- Cyber Security
- Business Analytics
- Engineering
- Healthcare
- And more…

A data scientist must be able to:
- Know enough about the business to ask pertinent questions and identify business pain points.
- Apply statistics and computer science, along with business acumen, to data analysis.
- Use a wide range of tools and techniques for preparing and extracting data: everything from databases and SQL to data mining to data integration methods.
- Extract insights from big data using predictive analytics and artificial intelligence (AI), including machine learning models, natural language processing (NLP), and deep learning.
- Write programs that automate data processing and calculations.
- Tell, and illustrate, stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical understanding.
- Explain how the results can be used to solve business problems.
- Collaborate with other data science team members, such as data and business analysts, IT architects, data engineers, and application developers.

- R (commonly used through the RStudio IDE): An open source programming language and environment for statistical computing and graphics.
- Python: A dynamic and flexible programming language. Its ecosystem includes numerous libraries, such as NumPy, pandas, and Matplotlib, for analyzing data quickly.

To facilitate sharing code and other information, data scientists may use GitHub and Jupyter notebooks.

Some data scientists may prefer a user interface, and two common enterprise tools for statistical analysis include:
- SAS: A comprehensive tool suite, including visualizations and interactive dashboards, for analyzing, reporting, data mining, and predictive modeling.
- IBM SPSS: Offers advanced statistical analysis, a large library of machine learning algorithms, text analysis, open source extensibility, integration with big data, and seamless deployment into applications.

Who is a Data Scientist? (Gartner: tasks, mission, profile, talent, responsibility, peculiarity)
- A data scientist may or may not have specialized industry knowledge to aid in modeling business problems and with understanding and preparing data.
- In addition to advanced analytic skills, this individual is also proficient at integrating and preparing large, varied datasets, architecting specialized database and computing environments, and communicating results.
- Creating value from data requires a range of talents: from data integration and preparation, to architecting specialized computing/database environments, to data mining and intelligent algorithms.
- The data scientist has emerged as a new role, distinct from, but with similarities to, those of business intelligence (BI) analysts and statisticians.
- An individual responsible for modeling complex business problems, discovering business insights, and identifying opportunities through the use of statistical, algorithmic, mining, and visualization techniques.
- Data scientists can be invaluable in generating insights, especially from "big data," but their unique combination of technical and business skills, together with their heightened demand, makes them difficult to find or cultivate.

It may be easy to confuse the terms “data science” and “business intelligence” (BI) because they both relate to an organization’s data and analysis of that data, but they do differ in focus. Business intelligence (BI) is an umbrella term for the technology that enables data preparation, data mining, data management, and data visualization.

In BI we use tools and techniques to turn data into meaningful information.
- Process: Methods used by the organization to turn data into knowledge.
- Product: Information that allows businesses to make decisions.
Business intelligence tools and processes allow end users to:
- Identify actionable information from raw data.
- Facilitate data-driven decision-making within organizations across various industries.
- Focus on data from the past to understand what happened and inform a course of action.

Business analysts (BAs) are responsible for bridging the gap between IT and the business, using data analytics to assess processes, determine requirements, and deliver data-driven recommendations and reports to executives and stakeholders. Business analysts use their business analysis capabilities to work within the core of many companies, small and large, to improve and streamline processes that help an organization meet its objectives and reach its goals.

- Analyzing and evaluating the company's current business processes and identifying areas of improvement
- Researching and reviewing up-to-date business processes and new IT advancements to make systems more modern
- Presenting ideas and findings in meetings
- Training and coaching staff members
- Creating initiatives depending on the business’s requirements and needs
- Developing projects and monitoring project performance
- Collaborating with users and stakeholders
- Working closely with senior management, partners, clients, and technicians

Business intelligence focuses on descriptive analytics: BI prioritizes descriptive analytics, which provides a summary of historical and present data to show what has happened or what is currently happening. BI answers the questions “what” and “how” so you can replicate what works and change what does not.
Business analytics focuses on predictive analytics: Business analytics (BA) prioritizes predictive analytics, which uses data mining, modeling, and machine learning (ML) to determine the likelihood of future outcomes. BA answers the question “why” so it can make more educated predictions about what will happen. With BA, you can anticipate developments and make the changes necessary to succeed.

Example: «Jewelry online store»
- Business intelligence provides helpful reports on the past and current state of your business. BI tells you that sales of your blue feather earrings have spiked in Utah in the past three weeks. As a result, you decide to make more blue feather earrings to keep up with demand.
- Business analytics asks, “Why did sales of blue feather earrings spike in Utah?” By mining your website data, you learn that a majority of traffic has come from a post by a Salt Lake City fashion blogger who wore your earrings. This insight helps you decide to send complimentary earrings to a few other prominent fashion bloggers throughout the US. You use the previous sales information to anticipate how many earrings you will need to make and how many supplies you will need to order to keep up with demand if the bloggers were to post about the earrings.

Data analytics is a broad umbrella for finding insights in data: It can refer to any form of analysis of data, whether in a spreadsheet, database, or app, where the intent is to uncover trends, identify anomalies, or measure performance.

Business analytics focuses on identifying operational insights: It focuses on the overall function and day-to-day operation of the business. A business analyst would deal less with the technical aspects of analysis and more with the practical applications of data insights.
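To make the descriptive/predictive split in the jewelry-store example concrete, here is a minimal sketch in plain Python. The sales figures, the state codes, and the function names are all invented for illustration; they do not come from the slides. The straight-line trend is only a simple stand-in for the data mining, modeling, and machine learning the text mentions, and it says nothing about why sales changed.

```python
from collections import defaultdict

# Hypothetical weekly sales records: (week, state, units sold).
# These numbers are invented purely for illustration.
SALES = [
    (1, "UT", 12), (1, "CA", 30),
    (2, "UT", 14), (2, "CA", 29),
    (3, "UT", 25), (3, "CA", 31),
    (4, "UT", 38), (4, "CA", 30),
    (5, "UT", 51), (5, "CA", 32),
]


def units_by_state_and_week(records):
    """Descriptive step (BI-style): total units per state per week."""
    totals = defaultdict(dict)
    for week, state, units in records:
        totals[state][week] = totals[state].get(week, 0) + units
    return totals


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_next_week(weekly_units):
    """Predictive step (BA-style): extrapolate the fitted trend one week ahead."""
    weeks = sorted(weekly_units)
    slope, intercept = fit_line(weeks, [weekly_units[w] for w in weeks])
    return slope * (weeks[-1] + 1) + intercept


totals = units_by_state_and_week(SALES)
for state in sorted(totals):
    # Print what happened (per-week totals) next to a forecast for the following week.
    print(state, totals[state], round(forecast_next_week(totals[state]), 1))
```

The first function only summarizes what has already happened, which is the kind of report the text attributes to BI. The forecast uses the same history to estimate next week's demand, which is the kind of planning input the text attributes to business analytics.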
Same Example: «Jewelry online store»
- A data analyst would look at how people are using your website, identify trends in traffic, analyze visitor demographics, and maybe even create a system for tracking how customers click through different pages.
- A business analyst would deal more with the practical applications of this data and how it can help you make decisions for purchasing ads, creating new products, and updating your website.

In reality, a business needs both business intelligence and business analytics (descriptive and predictive analytics) to succeed.

To develop a business intelligence strategy, you need to ask important questions, such as:
- Who are the key stakeholders?
- Who will be using this system?
- What departments need business intelligence and what will be measured?
- What support do content authors and information consumers need?

1. Business understanding: Understanding the domain and the business problem that can be avoided or improved for better business results. This involves multiple stakeholders, such as domain experts, business analysts, data engineers, data scientists, and software engineers. Each has an important role in defining the requirements and contributions at each stage of the project.
2. Data Acquisition & understanding: This is the beginning of the data science work. In this step, we collect all the available data from different sources and start analyzing the data. This is the longest stage of the data science project life cycle, and an important one. It involves multiple steps.
3. Feature Engineering: In this stage we make the data suitable for building models.
4. Model Building: In this phase we build a statistical or machine learning model that helps the business make decisions.
5. Model deployment: We deploy the best model to production so that it can be used to make decisions on new data.
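The technical stages of the project life cycle above (data acquisition, feature engineering, model building, deployment) can be sketched end to end in a few lines of plain Python. Everything here is an assumption made for illustration: the customer records are invented, the function names are hypothetical, and the nearest-centroid classifier is just one simple choice of model, not one the slides prescribe. Business understanding, the first stage, is not something code can show.

```python
# Hypothetical raw records:
# (customer_id, visits_per_month, avg_basket_usd, bought_again)
RAW = [
    (1, 2, 20.0, 0),
    (2, 9, 55.0, 1),
    (2, 9, 55.0, 1),   # exact duplicate row
    (3, 1, None, 0),   # row with a missing value
    (4, 8, 60.0, 1),
    (5, 3, 25.0, 0),
    (6, 10, 70.0, 1),
]


def acquire_and_clean(rows):
    """Data acquisition and understanding: drop duplicates and incomplete rows."""
    seen, clean = set(), []
    for row in rows:
        if row in seen or any(value is None for value in row):
            continue
        seen.add(row)
        clean.append(row)
    return clean


def fit_scaler(rows):
    """Feature engineering: learn the min and max of each feature column."""
    columns = zip(*[(r[1], r[2]) for r in rows])
    return [(min(col), max(col)) for col in columns]


def scale(features, scaler):
    """Rescale features to [0, 1]; assumes max > min for every column."""
    return [(x - lo) / (hi - lo) for x, (lo, hi) in zip(features, scaler)]


def build_model(rows, scaler):
    """Model building: nearest-centroid classifier, one centroid per label."""
    groups = {}
    for _, visits, basket, label in rows:
        groups.setdefault(label, []).append(scale((visits, basket), scaler))
    return {
        label: [sum(col) / len(col) for col in zip(*points)]
        for label, points in groups.items()
    }


def predict(features, scaler, centroids):
    """Deployment: score a new, unseen record with the trained model."""
    point = scale(features, scaler)

    def squared_distance(centroid):
        return sum((p - c) ** 2 for p, c in zip(point, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


clean = acquire_and_clean(RAW)
scaler = fit_scaler(clean)
centroids = build_model(clean, scaler)
print(predict((7, 50.0), scaler, centroids))
```

In a real project each stage would be far larger, and deployment would mean serving the model behind an application rather than calling a function, but the order of the steps and the hand-off from one to the next are the same.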