Data Analysis & Decision Making PDF

WEEK 12

DIKW Model

Decision Making
Includes evaluating choices, considering consequences and stakeholder responses in a world of competing interests to find acceptable solutions.

Traditional decision making is based on:
- Human judgment
- Experience
- Expertise
- Tacit / Explicit knowledge

Decision Types
Management Level & DM Complexity

Operational decisions:
○ Daily operations
○ Data input: transaction processing systems, sensors, EFTPOS, etc.
○ Eg: How many products to make every hour/day/week (e.g. in an office equipment manufacturing company)

Managerial/Tactical decisions:
○ Allocating and utilising resources, short-term
○ Data input: from both the strategic level (e.g. budget info, strategies etc.) and the operational level (e.g. operational reports)
○ Eg: How much budget or how many labourers are required to produce one table

Strategic decisions:
○ Concerned with long-term and organisation-wide issues
○ Eg: Do we need to have multiple offices across the globe to increase our market share?
○ Data input: general reports (not detailed). A high level of knowledge is required
○ Are almost always made in groups (in medium and large enterprises)

Data-driven vs Judgment-driven

Data Ethics
PAPA Framework
4 dimensions of Ethics for the Information Age

Privacy
- What information must people reveal about themselves to others?
- Are there some things that people do not have to reveal about themselves?
- Can the information that people provide be used to identify their personal preferences or history when they don’t want those preferences to be known?
- Can the information that people provide be used for purposes other than those for which they were told that it would be used?

Property
- Who owns the data/information?
- What are the just and fair prices for its exchange?
- Who owns the channels of data/information?

Accuracy
- Who is responsible for the reliability, authenticity and accuracy of data collected?
- How can we ensure that information will be processed properly and presented accurately to users?
- How can we ensure that errors in databases, data transmissions and data processing are not intentional?
- Who is to be held accountable for errors in data/information and how is an injured party compensated?

Accessibility
- What information does a person or organisation have a right to obtain, with what safeguards, and under what conditions?
- Who can access personal information kept in files/databases?
- Who is allowed to access what data/information and for what purpose?

Extended PAPA frameworks
The four dimensions of the original PAPA framework are still relevant.
However, new ethical challenges created by
- new data types (e.g. big data, unstructured data such as pictures)
- different types of BA (descriptive/predictive/prescriptive)
- other data-intensive technologies (e.g. social media, open data platforms)
- new organisational and societal uses of data such as: digital surveillance, datafication of individuals (e.g. digital traces of individuals), integration and propagation of data across organisational boundaries
create the need for extended PAPA frameworks with more than 4 dimensions.

Skills required for Business Analysts

Soft skills
- Communication
- Critical thinking
- Problem-solving
- Collaboration
- Adaptability
- Negotiation
- Time management

Hard skills
- Data analysis (Excel, SQL, statistical software)
- Data visualisation (Tableau, Power BI)
- Programming (Python or R)
- Technical Writing
- Database Management
- Project Management
- Business Acumen / Savviness

WEEK 11

Clustering
Natural grouping (or clustering) of data based on their similarities
There is no output or target variable
Eg: Clustering customers into different segments based on their demographics and past purchase behaviours.
Unlike classification, the cluster labels are unknown
A human expert is needed to interpret the clusters and label them
Part of the machine learning family
The logic for clustering is to divide cases into groups, or clusters, so that the degree of similarity is strong among members of the same cluster and weak among members of different clusters
Key requirement: Needs a good measure of similarity between the cases

K-means
The most referenced clustering algorithm
k: stands for the pre-determined number of clusters
○ To be decided by the analyst
○ An input to the algorithm
Logic: Assigns each data point (or instance) to the cluster whose centre is the nearest
Similarity measure: Distance to the centre of the cluster

K-means Clustering Algorithm
Step 0: Determine the value of k
Step 1: Randomly generate k points as initial cluster centres
Step 2: Assign each point to the nearest cluster centre
Step 3: Re-compute the new cluster centres
Repetition step: Repeat steps 2 and 3 until some convergence criterion is met (usually that the assignment of points to clusters becomes stable)

Interpreting the clusters
Clustering algorithms only divide items into clusters
Interpreting the business meaning of the clusters is the job of a human (the analyst), based on the characteristics of the items in each cluster

K-means weaknesses
- Requires a pre-decided k
- Only works with numeric data, not nominal or ordinal data
- Sensitive to outliers and noise, because they can shift the cluster centres radically

Association Rule Mining
Finds interesting relationships among variables in large datasets
→ In other words: finds things that commonly co-occur
There is no output variable
Also known as market basket analysis
Often used as an example to describe data mining to ordinary people
Eg: Beers and diapers(!) going together in market-basket analysis
Input: simple point-of-sale transaction data
Output: the most frequent relationships among items
Eg: according to the transaction data… “Customers who bought a laptop computer and virus protection software also bought an extended warranty plan 70 percent of the time.”
How do you use such a pattern/knowledge?
○ Put the items next to each other
○ Promote the items as a package
○ Place items far apart from each other

Association Rule Algorithms
The algorithms help identify the frequent itemsets, which are then converted to association rules

Apriori Algorithm
Finds subsets that are common to at least a minimum number of the itemsets → Support Threshold
Uses a bottom-up approach
○ frequent subsets are extended one item at a time (the size of frequent subsets increases from one-item subsets to two-item subsets, then three-item subsets, and so on), and
○ groups of candidates at each level are tested against the data for minimum support

Data Mining vs Statistics
Statistics = Foundation of data mining

WEEK 9

Data Mining
- The process of extracting non-trivial, novel, valid, potentially useful and ultimately understandable patterns or knowledge from large amounts of data
- Technically speaking, data mining is a process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and subsequent knowledge (or patterns) from large sets of data

Data Mining Applications
Customer Relationship Management
○ Improve customer retention (churn analysis: Why do they leave?)
○ Maximise customer value (cross-, up-selling)
○ Identify and treat most valued customers
Banking & Other Financial
○ Automate approval of loan application process
○ Detecting fraudulent transactions
○ Optimising cash reserves with forecasting
Medicine: Discovering new drugs by identifying potential compounds and understanding patient responses to different treatments.
Insurance: Assessing risks associated with policyholders. For example, insurers analyse driving history and demographics to predict the likelihood of accidents, which helps them set appropriate premiums for car insurance.
Tourism: Recommending travel destinations, activities, and accommodations based on a user’s past behaviours and preferences.
Retailing and Logistics
Manufacturing and Maintenance

Main Types of Data Mining Methods
1. Prediction
2. Classification (some people put it under the prediction group, as by nature it can be used for prediction)
3. Clustering (Segmentation)
4. Associations
5. Time-Series

Data Mining Algorithms
1. Prediction → Regression
2. Classification → Decision Tree
3. Clustering → K-means
4. Associations → Apriori

Prediction
Tells the nature of future occurrences of certain events based on what has happened in the past

Regression
Linear Regression: Investigating the relationship between a dependent variable (or target) and a (set of) independent variable(s) (or predictors)
- Data type: number

Classification
- Perhaps the most frequently used data mining method in real-world problems
- Part of the machine learning family
- Learn from past data, classify new data
+ Learns patterns from past data in order to place new instances into their respective groups or classes
Eg: to predict
- Whether a person will be a likely customer, or no hope (for target marketing)
- Whether a customer is likely to turn to another phone company (yes or no) (for customer retention)

Classification vs Regression
Both classification and regression are about “prediction”
- If what is being predicted is a numeric value (e.g., temperature, such as 40°C), the prediction problem is called regression [numerical data]
- If what is being predicted is a class label (e.g., “sunny,” “rainy,” or “cloudy”), the prediction problem is called classification [categorical data]

Decision Tree

WEEK 6

Big Data
Data that cannot be stored or processed easily using traditional tools/means
Typically refers to data that comes in many different forms: large, structured, unstructured, continuous (image, video…)
○ 3Vs – Volume, Variety, Velocity (speed), (Veracity)
Data (Big Data or otherwise) is worthless if it does not provide business value → for it to provide business value, it has to be analysed
A major ingredient for Business Intelligence / Artificial Intelligence / Data Science / Data Analytics

Sources of Big Data

Big Data Uses
- Retail organisations monitor social networks to engage brand advocates and identify brand adversaries
- Advertising and marketing agencies track comments on social media
- Hospitals analyse medical data and patient records
- Consumer product companies monitor social networks to gain insight into consumer behaviour
- Financial service organisations use data to identify customers who are likely to be attracted to increasingly targeted and sophisticated offers

Data Warehouse
A large, central repository of current and historical enterprise data, integrated from one or more disparate sources. It is considered a core component of Business Intelligence, which supports decision making → prepared, processed data

Data Mart
A subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making

Data Lake
Takes a “store everything” approach to Big Data, saving all the data in its raw and unaltered form (and also processed data)

Hadoop
- A distributed framework that stores and analyses data (MapReduce)
- Uses many servers → decentralised

Data Visualization
The graphical representation of information and data.
By using visual elements like charts, graphs, and maps, data visualisation tools provide an accessible way to see and understand trends, outliers, and patterns in data → make the data presentable & understandable

Data Visualization Process

Business Intelligence
- Includes a wide range of applications, practices, and technologies for the extraction, transformation, integration, visualisation, analysis, interpretation, and presentation of data to support improved decision making
- Data used in BI is often pulled from multiple sources and may come from sources internal and external to the organisation. Data can be used to build large collections of data called data warehouses, data marts, and data lakes
- Business analytics analyses data, which usually results in reports or data visualizations such as dashboards. These help managers to digest and understand data, and give them actionable insights that help them make decisions

BI and analytics are used to achieve a number of benefits:
- Detect fraud
- Improve forecasting
- Increase sales
- Optimise operations
- Reduce costs

WEEK 4 + 5

Database
A database is a self-describing collection of integrated records
A database contains:
1. Tables
2. Relationships among tables
3. Metadata

Tables
Represent business objects
- E.g., Student, Customer, Employee, Product, etc.
Hierarchy of data elements
- Bytes/characters are grouped into columns/fields
- Columns/fields are grouped into rows/records
- Rows/records are grouped into tables/files

Relationships among tables
Relationships exist between rows in different tables → Implement & represent business rules
How are 2 different tables connected with each other? → foreign key
Values in one table may relate to rows/records in other tables

Primary Key
- A column or group of columns that uniquely identifies a row in a table
- Each table has a primary key

Foreign Keys
- Fields that are Primary Keys in other tables
- The two keys create a relationship (Primary Key – Foreign Key)

Relational databases
- Databases using tables, primary keys, and foreign keys

Metadata
Data about the data → To support the organisation and management of the database
Examples of Metadata:
- Field name (what is the column called?)
- Data type (text, number, date…)
- Field properties (e.g., length)
- Description (what is in it?)

Database development process

Creating a database: Data Modelling
Also called: Conceptual Modelling, Entity Relationship Modelling (ER Modelling)
- Identifying and graphically representing business objects (entities) you want to store information about (customers, products, suppliers, transactions, etc.)
- Representing the logical relationships between entities
- The outcome of the ER Modelling process = a data model

Entity Relationship Diagram (ERD)
Relational & Structured database

Entity = Table: collection of examples
- Concept (typically people, places or things) about which you wish to store information
→ A table in your database / a list

Entity instance: one example/record
- Single occurrence of an entity type
→ A row (record) in this table/list

Attribute
- Characteristic of an entity type relevant to the organisation
→ A column (field) in a table

Primary key
- An attribute (or a combination of attributes) that uniquely identifies each instance of an entity type
→ A unique number or code

ERD Notations

Entity
- Name of ENTITY is placed inside a rectangle
- Name is stated in singular and in capital letters
- A noun

Attribute
- Name of ATTRIBUTE is placed inside an ellipse
- Capitalise the first letter of each word
- Where the attribute name is two words, use an underscore between words
- Where an attribute is the identifier (primary key) for the entity, it is underlined
- A noun

Relationship
- Drawn as a diamond, with continuous lines connecting it to entities
- A singular verb

Cardinalities
A relationship between entities will be one of the following types:
One to many (1:M)
- A many to one (M:1) relationship is the same as a one to many (1:M), only stated in reverse
Many to many (M:M)
- Equivalent to two 1:M relationships
One to one (1:1)
- We will treat one to one (1:1) relationships as one to many (1:M) relationships in this course

Golden Rule for modelling relationships
The PK of the entity on the 1-side of the relationship is ALWAYS the FK in the entity on the M-side of the relationship

Associative Entity
Links two entities and contains attributes unique to their relationship.
- Always has a composite primary key
○ Combination of the primary keys of the linked entities (e.g. SID + MID)
- Normally named as a combination of the entity names
○ Retain the verb of the relationship and then write the new name in capital letters above or below the box.
- Sometimes has a unique name (e.g. TRANSACTION, OFFER)

ERD Steps
Step 1. Identify entities
- Business objects you need to store information about
Step 2. Identify business rules
- Find ‘NOUN – verb – NOUN’ relationships in the narrative
Step 3. Define relationships and represent cardinality
- Some business objects are actually relationships → associative entity
Step 4. Identify attributes
- Characteristics of entities
- Represent information that you want to store about entities

PACER
- Primary Key
- Attribute
- Cardinalities
- Entity
- Relationship

Unary relationships
A relationship between two different instances of the same entity.
Eg: when one employee supervises other employees.

Two kinds of important relationships:
1. The transaction (many-to-many)
Modelled with an associative entity (e.g., booking, order).
Extend the composite primary key (e.g. with time) to allow for multiple transactions between the same subject and object (e.g. hotel guest and room) over time
2. The classifying relationship (one-to-many) [put into categories to save space]
Use a TYPE relationship when an attribute isn’t unique to each entity instance, but to a class (type) of entities.
In this case you create a new entity (TYPE), and a 1:M relationship with the object entity (e.g. ROOM and ROOM TYPE)
You can then assign certain attributes to the TYPE (e.g. Rate as an attribute of ROOM TYPE), and others to the actual object you are classifying (e.g., Floor as an attribute of ROOM)

WEEK 3

Why is Business Process Management important to organisations?
Every business needs/has business processes
Processes change over time: adapting to changing environments → often deliberate changes for the sake of improvements!
Business Process Management (BPM)
- Systematic process of creating, assessing, and improving business processes
- BPM applies to all organisations (including not-for-profit and government agencies/departments)
- Involves four stages

4 Stages of Business Process Management
1. Create a model of the current business process
- “As-is” model documents the current process
- Business users (that is YOU!) review and adjust the model; it is changed to solve process problems
- The result is a “to-be” model → new process
2. Create system components
- Create or alter Information Systems to support/facilitate the new process
- Uses the 5 elements of IS (hardware, software, data, procedures, people)
3. Implement new business process
- Change the way the organisation operates, train people, implement IS
4. Create policy and procedures to assess process effectiveness on an ongoing basis
- Adjust and repeat cycles (repeat the steps)

3 varieties of BPM
1. Functional Processes
- Activities in a single department or function
- BPM is easier at this level
- The problem is that it may lead to “isolated silos”
- Example: Marketing campaign management
2. Cross-Functional Processes
- Activities across/among many business departments
- Eliminate or reduce isolated systems and data
3. Inter-organisational Processes
- Activities that cross organisational boundaries
+ Supply chain management (SCM), credit card transaction processing
- More difficult than functional and cross-functional processes
- Requires negotiation, contracts, litigation to resolve conflicts between organisations

The activity of representing business processes = creating process models

BPMN
Business Process Modeling Notation
A standard set of terms and graphical notations for documenting business processes → create models
- Models will be understandable across various organisations and modellers (common language).
- Facilitate exchange of knowledge and experience across situations

Can have multiple end events
Can have multiple events after parallel gate
Simple layout
Swim-lane layout
Database Interaction layout

WEEK 2

Business Process (BP)
A Business Process (BP) is a structured network of activities supported by resources, facilities, and information that interact to achieve some business function
BPs turn input into higher value output.
A business process is a system; sometimes business processes are referred to as business systems
Cross-functional

Components of a business process
- Activities: Transform resources and information of one type into another type
- Decisions: Question that can be answered ‘Yes’ or ‘No’
- Roles: Look after sets of procedures (who does what)
- Resources: People, facilities or computer programs that are assigned to roles
- Repositories: Collection of business records (databases)
- Data / information flow: Movement of a data item from one activity to another activity or to a repository/database

INFLUENCE OF IT USE IN BPS
More accurate information
- BPs draw on databases, which ensure accurate information across many activities and BPs
More automated
- Some activities that were manual before can be automated
- Example: Automated customer credit check through a specialised computer system
More streamlined → faster
- Example: Enterprise applications such as ERP systems can facilitate quick hand-over of activities between workplaces
More efficient → less cost
- All of the above together can lead to significantly reduced cost

Characteristics of well-designed Business Processes
Complete
- Include all activities necessary to achieve the business goal.
Minimal
- Do not include unnecessary activities (cost efficiency).
Well-structured
- Activities are organised in a logical sequence
Embedded
- Logically connect with other BPs in the organisation
→ Outcomes of a well-designed business process are:
- increased effectiveness (value for the customer)
- increased efficiency (less cost for the company).

WEEK 1

Business Analytics?
BA is a set of disciplines and technologies for solving business problems using data analysis, statistical models and other quantitative methods

Data analysis Process
Planning
○ Define goals
○ Organise resources
○ Coordinate people
○ Schedule project
Data preprocessing
○ Get the data
○ Clean the data
○ Explore data
○ Refine/transform data
Modelling
○ Create model
○ Validate model
○ Evaluate model
○ Refine model
Follow up
○ Present model
○ Deploy model
○ Revisit model
○ Archive assets

Statistical Model
Quantitative Methods

3 TYPES OF ANALYTICS
Some other common levels: diagnostic/explanatory, inferential, etc.

Descriptive Analytics
What happened? What is happening?
Descriptive or reporting analytics
Answering the question of what happened
Retrospective analysis of historic data
Enablers
○ Descriptive statistics
○ Business reporting
○ Dashboard
○ Scoreboard
○ Data warehouse
○ Data visualisation
→ Outcome: well-defined business problems and opportunities

Predictive Analytics
What will happen? Why will it happen?
Aims to determine what is likely to happen in the future
Looking at past data to predict the future
Enablers:
○ Forecasting (e.g., time series)
○ Data mining
○ Text mining/Web/Media mining

Prescriptive Analytics
What should I do? Why should I do it?
Aims to determine the best possible decision
Uses both descriptive and predictive analytics to create the alternatives, and then determines the best one
Enablers
○ Optimization
○ Simulation
○ Multi-Criteria Decision Modeling
○ Programming/Expert system

ROLES AND RESPONSIBILITIES OF BEING A BUSINESS ANALYST
Evaluate actions to improve the operation of a business system.
Again, this may require an examination of organisational structure and staff development needs, to ensure that they are in line with any proposed process redesign and IT system development.
Document the business requirements for the IT system support using appropriate documentation standards.
Elaborate requirements, in support of the business users, during evolutionary system development.
Strategy implementation – here the business analysts work closely with senior management to help define the most effective business system to implement elements of the business strategy.
Business case production – more senior business analysts usually do this, typically with assistance from Finance specialists.
Benefits realisation – the business analysts carry out post-implementation reviews, examine the benefits defined in the business case and evaluate whether or not the benefits have been achieved. Actions to achieve the business benefits are also identified and sometimes carried out by the business analysts.
Specification of IT requirements – typically using standard modelling techniques such as data modelling or use case modelling.

Information system?
Five fundamental components of computer-based information systems:
1. Computer hardware
2. Software
3. Data
4. Procedures
5. People

Types of decision modelling

Deterministic Models
Where all the input data values are known with complete certainty (e.g. it is easy to calculate the profit of producing certain products)
The most commonly used technique is Linear Programming (LP)

Probabilistic Models (Stochastic Models)
Where some input data values are uncertain: the values of some important variables will not be known before decisions are made (e.g. the decision of whether to start a new business)
Its techniques: analysis, queuing, simulation and forecasting
Probabilistic Models use probabilities to incorporate uncertainty

Steps involved in Decision Modelling
1. Formulation
Translating a problem scenario from words to a mathematical model
2. Solution
Solving the model to obtain the optimal solution, i.e. solving the mathematical expressions in the formulation
3. Interpretation and Sensitivity Analysis
Analysing results and implementing a solution
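
As a rough illustration of the three decision modelling steps for a deterministic model, the sketch below formulates a tiny two-product linear programme, solves it by checking every corner point of the feasible region, and then re-solves it with one extra hour of a resource as a simple sensitivity check. All numbers (profits of 7 and 5, 240 carpentry hours, 100 painting hours) and the names PROFIT, CONSTRAINTS and solve_lp are hypothetical and do not come from these notes. Corner-point enumeration is used only because it fits in a few lines; real LP work would normally rely on a dedicated solver.

```python
from itertools import combinations

# Step 1 (formulation), using hypothetical data:
# maximise 7*T + 5*C, where T = tables made and C = chairs made.
PROFIT = (7.0, 5.0)
CONSTRAINTS = [            # each row means a*T + b*C <= limit
    (4.0, 3.0, 240.0),     # carpentry hours available
    (2.0, 1.0, 100.0),     # painting hours available
    (-1.0, 0.0, 0.0),      # T >= 0
    (0.0, -1.0, 0.0),      # C >= 0
]


def solve_lp(profit, constraints, tol=1e-9):
    """Step 2 (solution): return (best_value, (T, C)) by testing corner points."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:         # parallel boundaries do not intersect
            continue
        # Intersection of the two boundary lines (Cramer's rule).
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            value = profit[0] * x + profit[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best


best_value, (tables, chairs) = solve_lp(PROFIT, CONSTRAINTS)
print(f"Make {tables:.0f} tables and {chairs:.0f} chairs, profit {best_value:.0f}")

# Step 3 (sensitivity): what is one extra carpentry hour worth?
more_hours = [(4.0, 3.0, 241.0)] + CONSTRAINTS[1:]
extra_value, _ = solve_lp(PROFIT, more_hours)
print(f"One extra carpentry hour adds {extra_value - best_value:.2f} profit")
```

With these made-up numbers the best plan is 30 tables and 40 chairs for a profit of 410, and one extra carpentry hour is worth 1.50. Interpreting that figure (is it worth paying for more carpentry time?) is the judgment part of step 3.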
