Questions and Answers
What is a primary reason organizations and people deviate from established processes?
Which approach is most effective for improving the performance of a process?
What is one key factor in controlling a process more effectively?
What might be a potential outcome of redesigning a process without proper analysis?
When is it most likely that organizations will need to deviate from existing processes?
What is the primary distinction mentioned in the content regarding focus in events?
When might someone's focus shift away from complete events?
Which of the following statements is NOT supported by the content?
What can affect the choice of focus regarding events?
Why might someone want to focus on withdrawals?
Study Notes
Process Mining Goal
- The goal of process mining is to answer questions about operational processes
Examples
- What truly happened in the past?
- Why did it happen?
- What is likely to happen in the future?
- When and why do organizations and people deviate?
- How to control a process better?
- How to redesign a process to improve its performance?
ETL Process
- In the context of BI and data mining, the phrase "Extract, Transform, and Load" (ETL) describes a process:
- Extracting data from external sources
- Transforming the data to fit operational needs (addressing syntactical and semantical issues, ensuring quality levels)
- Loading the transformed data into a designated system (e.g., a data warehouse or relational database)
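The three ETL steps above can be sketched in Python. This is only an illustration under stated assumptions: the CSV layout, the column names (case_id, activity, timestamp), the cleaning rules, and the SQLite table named events are all hypothetical and not taken from the notes.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an external source (assumed layout).
RAW_CSV = """case_id,activity,timestamp
1, Register Request ,2010-12-30T11:02
1,check ticket,2010-12-31T10:06
,decide,2011-01-05T11:22
2,register request,2010-12-30T11:32
"""

def extract(text):
    """Extract: read rows from the external source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: fix syntactic issues and enforce a minimal quality level."""
    records = []
    for row in rows:
        if not row["case_id"]:
            continue  # assumed quality rule: drop events without a case id
        activity = row["activity"].strip().lower()  # unify spelling
        records.append((int(row["case_id"]), activity, row["timestamp"]))
    return records

def load(records, conn):
    """Load: store the cleaned records in the target system."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(case_id INTEGER, activity TEXT, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse or database
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT case_id, activity, timestamp FROM events ORDER BY timestamp"
).fetchall()
```

Keeping the three steps as separate functions mirrors the ETL phrase itself and makes it easy to swap the source or the target without touching the cleaning rules.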
Data Warehouse
- A data warehouse is a single logical repository that combines an organization's transactional and operational data.
- Its goal is to consolidate information for reporting, analysis, and forecasting.
Data Quality Examples
- One data source might use a patient's last name and birth date, while another uses their social security number.
- Different sources might use different date formats, such as "31-12-2010" versus "2010/12/31."
- If a data warehouse exists, it can provide valuable input for process mining.
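The date-format mismatch mentioned above can be handled with a small normalization helper. The sketch below assumes that only the two formats from the example occur; the helper name normalize_date and the list KNOWN_FORMATS are hypothetical.

```python
from datetime import date, datetime

# Formats assumed for this example; a real project would list
# whatever formats its own sources actually use.
KNOWN_FORMATS = ("%d-%m-%Y", "%Y/%m/%d")

def normalize_date(text: str) -> date:
    """Try each known format in turn and return a date object."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognized date format: {text!r}")

same_day = normalize_date("31-12-2010") == normalize_date("2010/12/31")
```

Raising an error for unknown formats, rather than guessing, keeps silent data quality problems out of the resulting event log.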
Scoping
- Scoping is a crucial step in process mining.
- The quality of the collected data is important.
- Often, the issue is selecting suitable data rather than just performing syntactical conversion.
- Only relevant events are included during the extraction step.
- The chosen viewpoint and questions will influence the events considered.
- Event logs are typically filtered.
- Coarse-grained scoping is typically done when creating event logs.
- Filtering, in contrast, is frequently done as a fine-grained process.
- Filtering can be based on initial data analysis results.
- For example, in process discovery, focusing on the 10 most frequent activities simplifies the model.
- Process mining frequently leads to further questions and the need for more detailed data extraction.
- Iterative extraction, filtering, and mining steps are common.
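Filtering on the most frequent activities, as in the process discovery example above, might look like the following sketch. It assumes a log represented as a list of traces, each a list of activity names; the function name keep_most_frequent and the sample activities are made up for illustration.

```python
from collections import Counter

def keep_most_frequent(log, k=10):
    """Keep only events whose activity is among the k most frequent ones.

    `log` is assumed to be a list of traces, each a list of activity names.
    """
    counts = Counter(activity for trace in log for activity in trace)
    frequent = {activity for activity, _ in counts.most_common(k)}
    filtered = [[a for a in trace if a in frequent] for trace in log]
    return [trace for trace in filtered if trace]  # drop traces left empty

log = [
    ["register", "check", "decide", "pay"],
    ["register", "check", "decide", "reject"],
    ["register", "check", "reinitiate", "check", "decide", "pay"],
]
top3 = keep_most_frequent(log, k=3)
```

Because filtering is cheap to repeat, k can be adjusted between mining runs, which fits the iterative extract, filter, and mine cycle described above.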
Event Logs - Assumptions
- Event logs contain data related to a single process.
- Each event in the log references a single process instance (a "case").
- Events are usually linked to an activity.
Processes and Activities
- A process is a series of activities that make up a lifecycle.
- Events have a timestamp.
- Resources (people) and costs associated with the event can optionally be recorded as well.
- Each case is a sequence of events that apply to one process instance.
- Events in a case can have attributes (activity, time, costs, resources).
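- Events of the same activity typically carry the same set of attributes. Standard attributes of an event e often include:
- activity(e): the activity the event refers to
- time(e): the timestamp of the event
- resource(e): the resource (e.g., person) associated with the event
- transaction type(e): the lifecycle step the event records (e.g., schedule, start, complete, suspend)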
Two Activity Instances With Identical Footprints
- Two instances of the same activity can overlap in time. Different underlying scenarios (for example, different pairings of start and complete events) may then leave identical footprints in the event log, so the log alone does not reveal which one occurred.
Correlation Problem
- The primary correlation problem concerns linking events to cases (process instances).
- This, along with the secondary problem of correlating events within the same case to activity instances (e.g., matching a start event to its complete event), can require extra manual effort or heuristics.
- One such heuristic is a timeout on start events (e.g., if a start event is not followed by a complete event within 45 minutes, remove the start event).
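The 45-minute timeout heuristic can be sketched as follows. The event representation (dicts with the keys case, activity, type, and time) and the rule that a start is matched to the earliest unused complete event of the same case and activity are assumptions of this example, not something the notes prescribe.

```python
from datetime import datetime, timedelta

def drop_unmatched_starts(events, timeout=timedelta(minutes=45)):
    """Remove start events with no matching complete event within `timeout`.

    Each event is assumed to be a dict with the hypothetical keys
    'case', 'activity', 'type' ('start' or 'complete'), and 'time'.
    A complete event is paired with at most one start event.
    """
    events = sorted(events, key=lambda e: e["time"])
    used = set()  # indexes of complete events that are already paired
    kept = []
    for i, e in enumerate(events):
        if e["type"] != "start":
            kept.append(e)
            continue
        for j in range(i + 1, len(events)):
            c = events[j]
            if c["time"] - e["time"] > timeout:
                break  # later events are even further away
            if (j not in used and c["type"] == "complete"
                    and c["case"] == e["case"]
                    and c["activity"] == e["activity"]):
                used.add(j)
                kept.append(e)
                break
    return kept

t0 = datetime(2010, 12, 30, 11, 0)
events = [
    {"case": 1, "activity": "check", "type": "start", "time": t0},
    {"case": 1, "activity": "check", "type": "complete",
     "time": t0 + timedelta(minutes=20)},
    {"case": 2, "activity": "check", "type": "start",
     "time": t0 + timedelta(minutes=5)},
    {"case": 2, "activity": "check", "type": "complete",
     "time": t0 + timedelta(hours=2)},
]
cleaned = drop_unmatched_starts(events)
```

In this sample, the start event of case 2 is removed because its complete event arrives almost two hours later; the late complete event itself is kept, since the heuristic as stated only removes start events.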
Role of Activities in Processes
- Activities are central to process models.
- Various modeling notations (e.g., Petri nets, YAWL, EPCs, BPMN) all portray activities.
Process Mining Techniques
- Some process mining techniques use the transactional model, while others focus on atomic events.
- Sometimes, only complete events are analyzed.
- Filtering is possible (removing certain subtypes of event data).
- A classifier maps the attributes of an event to the label used for it in the process model (e.g., the "name" of the event).
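A classifier can be thought of as a plain function from an event to a label. The sketch below is one possible reading: the Event class, its field names, and the three classifier functions are hypothetical, chosen to mirror the standard attributes listed earlier. It also shows the "complete events only" filter.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    activity: str
    time: datetime
    resource: str
    trans: str  # transaction type, e.g. "start" or "complete"

def by_activity(e: Event) -> str:
    """Label an event by its activity name only."""
    return e.activity

def by_activity_and_trans(e: Event) -> str:
    """Label an event by activity name plus transaction type."""
    return f"{e.activity}+{e.trans}"

def by_resource(e: Event) -> str:
    """Label an event by the resource that executed it."""
    return e.resource

def classify(trace, classifier):
    """Apply a classifier to every event of a trace."""
    return [classifier(e) for e in trace]

def complete_only(trace):
    """Filter: keep only events of transaction type 'complete'."""
    return [e for e in trace if e.trans == "complete"]

trace = [
    Event("check ticket", datetime(2010, 12, 30, 11, 2), "Mike", "start"),
    Event("check ticket", datetime(2010, 12, 30, 11, 20), "Mike", "complete"),
]
labels = classify(trace, by_activity_and_trans)
```

Swapping the classifier changes what the mined model talks about: activity names give a control-flow view, while resource names give a view of who hands work to whom.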
Simple Event Logs
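- A simple event log is a multi-set of traces. A trace is the sequence of activity names of the events in a case.
- In a simple log, cases and events are no longer uniquely identifiable: two cases that follow the same sequence of activities become indistinguishable.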
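A multi-set of traces maps naturally onto a counter keyed by the trace. The sketch below assumes cases are given as a dict from case id to an ordered list of activity names; the function name to_simple_log and the single-letter activities are illustrative only.

```python
from collections import Counter

def to_simple_log(cases):
    """Collapse cases into a multi-set of traces.

    `cases` is assumed to map a case id to its ordered list of activity
    names. Case ids are discarded; only trace frequencies remain.
    """
    return Counter(tuple(trace) for trace in cases.values())

cases = {
    1: ["a", "b", "d"],
    2: ["a", "c", "d"],
    3: ["a", "b", "d"],
}
simple_log = to_simple_log(cases)
```

Cases 1 and 3 collapse into a single trace with multiplicity 2, which is exactly the loss of identity that distinguishes a simple log from a full event log.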
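XES: Five Standard Extensions
- The concept extension defines name attributes for traces and events, plus an instance attribute for events.
- The lifecycle, organizational, and time extensions cover transaction types, resources, and timestamps.
- The semantic extension adds semantic attributes.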
Challenges in Extracting Event Logs
- Correlation: Difficulty in linking/correlating events within cases/instances.
- Events can be dispersed over multiple databases/systems.
- Inter-organizational communication matching can be challenging.
- Timestamps: Event logs may have imprecise timestamps (e.g., only dates, not time of day).
- Snapshots: Event logs may only show snippets of ongoing operational processes (and not necessarily represent the full process lifecycle).
Data Quality
- Missing Data: Issues with recording events in an event log (missing events, activities with a missing timestamp or value).
- Events that occurred but were not properly recorded in the system.
- Events recorded that never occurred.
- Hidden Events: Event data which is present in the system, but is obfuscated.
- Attribute Issues: Event attributes may have imprecise values or be missing.
- Recurring Issues: Data quality problems can persist or recur over the recording period (continuous, intermittent, or changing).
Guidelines for Data Logging
- Data quality matters more than logging speed.
- Data should be logged as a by-product of the process itself.
- The guidelines comprise 12 points, with a focus on consistent naming, timestamps, and relating occurrences to each other for greater quality and clarity.
Flattening Reality into Event Logs
- Conversion from existing data formats to an event format (e.g., XES).
- Iterative Application of Filtering
- Use of Views (e.g., 2D slices of 3D data to view data from different angles).
Description
This quiz covers essential concepts of process mining, including its goals, the ETL process, and the function of data warehouses. Participants will explore questions related to operational processes, data extraction, transformation, and loading, as well as the importance of data quality in decision-making. Test your understanding of these crucial BI components!