MDM SaaS Match and Merge Best Practices PDF
Uploaded by RestoredAsh
Summary
This document details best practices for master data management (MDM) in SaaS solutions. It examines various techniques, strategies, and considerations for effective data matching and merging, and provides insights into different stages of the process. The document covers topics like data profiling and the creation of match rules to identify differing data.
Full Transcript
MDM SaaS Match and Merge Best Practices

Agenda
Best practices for:
1. Match Process Overview
2. Data Profiling Key Points
3. Match Model Key Points
4. Match Rule Key Points
5. Match Fields Key Points
6. Match Tuning Key Points

What Data Should be Mastered?
- Fundamental entities upon which a business is based
  o SPACE = Suppliers/Vendors, Products/Services, Assets, Customers, Employees/Contractors (+ Contracts + Accounts …)
  o Company structure: Divisions, Regions, Territories, Departments, etc.
  o Related data (child tables): Addresses, Communication Methods, Certifications, etc.
- Represents people, places and things related to the organization
  o Customer details, product details, employee details
- Master data is relatively static and non-transactional
- Filter out transactional data through CDI jobs. Transactional data:
  o Generates excessive matches
  o Impacts performance negatively

Overview: Match Rules Implementation
Goal: find the match rules configuration that uses the fewest rules, most of them auto match-merge, and requires the fewest manual inspections possible.
- Step 1: Understanding the business' needs.
- Step 2: Data profiling: data analysis to investigate customer data.
- Step 3: Translating business requirements into an initial set of match rules.
- Step 4: Testing the match rules configuration.
- Step 5: Data analysis of the match rules outcome, and recording/tracking of match rules configuration changes made for each iteration of testing (run).
- Step 6: Making changes to match rules (e.g. search range, key width or type, attribute weights, threshold, purpose, population, attribute set, etc.).
[Diagram: the six steps above, two of them marked "Key Decision Point". Tools named on the diagram: workshops and meetings, manual data analysis, Excel, B360 Console, SSA_Name3 workbench.]

Understand Business Needs
- Hold Discovery Workshops
- Engage Stakeholders
  o MDM Business Users / Data Stewards
  o Integrating System / Applications
- Identify Data-Driven Business Processes
- Identify business problems to solve
  o Start with most critical ones
- Document Current State
- Create a Business Case
- Data Profiling
  o Analyse if matching duplicate data is required
  o Identify critical attributes for matching, e.g. Name, Addresses, Phone / email addresses etc.
- Continuous Feedback Loop
  o Share feedback with stakeholders based on technical review and match report
  o Artifacts for future improvements based on continuous feedback

Data Profiling Key Points
- Use data profiling at project start to discover if data is suitable for MDM
- Identify duplicates and missing values
- Identify inconsistencies in data formats
- Identify stale data and contradicting data
- Identify distinct count and percent
- Helps in making the fuzzy match key(s) and creation of match rules

Match Configuration Steps
- Master data identification
- Match Model Configuration
- Candidate Selection Criteria
- Declarative Rules
- Match Fields & Segmentation
- Survivorship

Match Model Key Points
- Configuration that determines how your data is matched
- Use a meaningful model name
- You cannot change the model's name after the model is published
- A match model can be copied to a new one
- Predefined models can't be deleted
- Before deleting a match model, remove it from its dependencies (e.g. Job, Search Match API)
- A maximum of 25 published match models can be created
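
The checks listed under Data Profiling Key Points (missing values, duplicates, distinct count and percent) can be approximated outside the product before any match rule is written. The following is a minimal sketch in plain Python, assuming the column has already been exported as a list of strings; `profile_column` and the sample values are hypothetical and are not part of MDM SaaS or CDI.

```python
from collections import Counter


def profile_column(values):
    """Return basic profiling statistics for one column of raw string values."""
    total = len(values)
    # Assumption: None and blank strings both count as missing.
    present = [v.strip() for v in values if v is not None and v.strip()]
    # Compare case-insensitively so "A@X.com" and "a@x.com" count as one value.
    counts = Counter(v.lower() for v in present)
    distinct = len(counts)
    return {
        "total": total,
        "missing": total - len(present),
        "distinct_count": distinct,
        "distinct_percent": round(100.0 * distinct / total, 1) if total else 0.0,
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }


# Hypothetical sample column.
emails = ["a@x.com", "A@X.com ", None, "", "b@y.com"]
stats = profile_column(emails)
print(stats)
```

A column with a low distinct percent or many missing values is usually a weak basis for a match key, which is one way to read the profiling output when choosing fields.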

Population Key Points
- A population is a definition of certain characteristics of data
- Choose the right population file based on the requirement; the default is USA
- Populations contain the logic to generate match keys
- The system matches similar records that belong to the same population
- A population set encapsulates intelligence about name, address, and other identification information
- If you have mixed data from different languages

Candidate Selection Criteria
- Match candidates are record pairs that are possible matches
- Avoid choosing multiple candidate selection criteria unless it is essential and your use case requires it
- Having multiple criteria improves quality but affects the performance of the candidate selection process
- With multiple candidate selection criteria, the result is a union of all candidates from all the criteria
- Configure candidate selection criteria that are necessary to enhance the
- Remove extraneous/bad data to improve candidate selection and avoid a large number of candidates being returned
- Ensure that you publish when you modify the population file and candidate selection criteria
- Ensure that you regenerate the match keys for the records when you modify the population file and candidate selection criteria

Candidate Selection Criteria Key Points
- The business entity field used for generating the match keys and match candidates
- You can use numeric/alphanumeric columns depending on the cardinality level; for example, SSN and passport numbers are valid. Phone numbers are commonly used.
- Use filter criteria, where applicable, to limit the number of candidates returned
- As an example, use full name instead of first name or last name
- Based on your discussion with the business users and the data audit, as a rule of thumb, you would use the following as match keys:
  o If the data contains organization names, use organization name as the field
  o If the data contains individual names only, use person name as the field
  o If the data contains addresses only, use address part1 as the field

Candidate Selection Criteria Key Points (cont.)
Field Type:
- Indicates the type of data contained in the "Field Name" field, a field that serves to capture the potential candidates
- Ensure that the selected field meets the selection criteria best practices
Filter Candidates:
- Adding a filter might reduce the number of match candidates
- Fewer match candidates improve performance and accuracy
- Find a filter column that correctly removes the incorrect matches without losing good matches
- This can be used effectively to ensure that the 10K limit is not reached

Key Generation Level Key Points
- Defines the thoroughness with which match keys are generated to identify the records for matching
- Decide the key generation level based on:
  o Size and quality of the data
  o Reliability of the matched records
  o Processing time
- Standard is the default and is recommended
- Use Extended for high-risk or critical search applications
  o Larger candidate sets at search time
- Using Limited reduces the search reliability

Candidate Search Level Key Points
- Defines how stringently and thoroughly to search for match candidates
- Declarative rules apply to the match candidates and not to the entire data
- Decide on a candidate search level based on the following considerations:
  o Size and quality of the data
  o Criticality of the matches
  o Time constraints

Declarative Rules Key Points
- Set of conditions and BE fields to identify the duplicate records
- Configure a match model with the least possible number of match rules for optimal match results
- A meaningful description helps identify a declarative rule and its purpose
- Separate out rules per business logic
- Start with simple, tighter rules

Match Strategy Key Points
- You can define a declarative match rule for exact or fuzzy matching
- Configure the exact match strategy if the quality of the data is good
- Configure the fuzzy strategy for a probabilistic match based on data patterns, for example:
  o 'Tech Corp'
  o 'Technology Corporation'
- Rank rules in the order you want to run them for matching
- Unranked rules get applied if the matching pair doesn't meet ranked rule conditions

Match Criteria Strategy Key Points
- Applicable to fuzzy rules; every fuzzy rule must have a match criteria
- Two rules with the same properties but different match criteria result in different sets of matching records
- For example, if address is important to determine a match between two person records, use the Resident match criteria, and use Division for organization and address
- Refer to the documentation for complete details of match criteria

Merge Strategy Key Points
- A merge strategy indicates the action to be performed on the records that have gone through the match process
- Use Automated for rules that are based on unique identifier fields, such as social security number
- Use the manual strategy for data steward review
- Avoid rules that generate too many matches. You do not want to flood the data stewards with too many matches.

Declarative Rules Key Points (cont.)
- Always have some exact match filters on every match rule, especially fuzzy match rules
- Excessive use of fuzzy matching algorithms without tight filters can impact system performance
- Remove redundant rules that don't capture any matches
- Move rules up or down in the match set to see if the order improves results

Threshold Based Rules Key Points
- Use thresholds to control the level of similarity for a match
- A lower threshold will result in higher recall but lower precision, while a higher threshold will increase precision but reduce recall
- Configure tight thresholds to avoid false positives for name and address matches without unique identifiers
- Review the documentation for predefined rules while using the threshold-based merge strategy
- When there is a tie between manual merge and automated merge, the match outcome always goes to REVIEW so that data stewards can decide how to resolve the conflict. Refer to the article for an example.
- All THB rules get executed, and using a high number of them impacts performance

Match Fields Key Points
- Add an exact match field to match identical data
- Exact fields are used for filtering, which reduces the number of rows that go through fuzzy matching
- Use fuzzy match fields for identifying data that are similar
- Pay attention to recommendations from the CLAIRE engine
  o Suitable for exact match -> 'Name', 'phone', 'SSN'
  o Unsuitable for exact match -> 'DOB', 'email', 'AddressLine1', 'City', 'ZiP'

Segment Matching Key Points
- Use segment matching to limit match rules to specific subsets of data (e.g., country code, dpt code)
- Segment matching can only be enabled on exact match fields
- For example, if you specify the segment value as Japan, the rule uses the records that have Japan as a country value for matching
- Select "Match segment data with other data" to match the segment data with the rest of the data
- Achieve distinct matching using the workaround mentioned in the article

Survivorship Rule Key Points
- Implement survivorship rules to determine how conflicting data from matched records should be resolved
- Must be configured before record ingress
- By default, the Informatica Customer 360 source is Rank 1 and the Default system is Rank 2
- The ability to survive a field group as a block is available for preview from July

Match Report Key Points
- After an initial run, review and assess the match output
- A match report with information on 150K pairs can be downloaded; reach out to GCS for the complete report
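
The threshold trade-off stated under Threshold Based Rules Key Points can be checked on a small labelled sample, for example pairs reviewed from a match report. The sketch below is illustrative only: the scores and duplicate labels are invented, and `precision_recall` is a hypothetical helper, not an MDM SaaS function.

```python
def precision_recall(scored_pairs, threshold):
    """scored_pairs: list of (similarity_score, is_true_duplicate) tuples."""
    # Pairs at or above the threshold are the ones the rule would call matches.
    predicted = [(score, dup) for score, dup in scored_pairs if score >= threshold]
    true_pos = sum(1 for _, dup in predicted if dup)
    actual_pos = sum(1 for _, dup in scored_pairs if dup)
    # Convention assumed here: an empty denominator yields 1.0.
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / actual_pos if actual_pos else 1.0
    return precision, recall


# Invented sample: (score, reviewer says it is a real duplicate).
pairs = [
    (0.95, True), (0.90, True), (0.85, False),
    (0.80, True), (0.70, False), (0.60, False),
]

low = precision_recall(pairs, 0.75)   # looser threshold
high = precision_recall(pairs, 0.88)  # tighter threshold
print("threshold 0.75:", low)
print("threshold 0.88:", high)
```

On this sample the looser threshold finds every true duplicate but also lets in a false positive, while the tighter threshold returns only true duplicates and misses one, which is the behaviour the key point describes.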

Match Operation Key Points
- In MDM SaaS, matching happens at the XREF level and not at the master level
- If the record pair meets the conditions set by the first match rule, it is not processed further
- Null to Null and Null to Non-Null matching is available in the August release, on demand
- By default, the match process identifies up to the first 10,000 candidates for each record to improve performance and accuracy
- Currently, one master can have only 1000 source XREF records

Reset Job Key Points
- The Reset step resets the already matched records
- The Reset job is not recommended for production environments
- The record state changes from MATCHED to MATCHED_INDEXED
- The Reset job doesn't remove the CONSOLIDATED and MATCH_INDEXED records
- Already merged/consolidated records are not unmerged
- It affects only the cross-reference state

Job Definitions Key Points
- During match tuning, select Match Only when executing match. This allows you to reset the data for re-match in case you make match configuration changes.
- If required, rerun key generation after making match changes
- Use NotReadyForMatch to specify whether a source record can participate in the match process
- Use NotReadyForMatch to keep a large cluster record from participating in match
- NotReadyForMatch can be set via API, UI and CDI ingress
- Use search match instead of the Match job for real-time matching

Operational Insights Key Points
- Use Operational Insights to view data processing analytics and monitoring statistics for MDM SaaS
- You can analyze the match job metrics and decide whether to modify the match model configuration
- You can analyze the merge job metrics to determine the number of record pairs and record pair groups that were created by the job
- You can compare two to five jobs and view their key metrics

Tuning Exercise Key Points
What rules do we start with?
[Diagram: Good Rules, Bad Rules, New Rules, Profit?]
Starting Ruleset
- Rules that work already
- Rules that are troublesome today
- Rules utilizing new attributes
Don't overthink it!
The more iterations you execute, the less important your starting point becomes.

Match Best Practice Checklist
- Know your Data
- Define Candidate Selection Criteria
- Identify Hotspots in Data
- Define Fuzzy Column
- Define Exact Column
- AUTO, Manual, Threshold rules
- Leverage Segment Matching
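
Several checklist items (exact columns, fuzzy columns, AUTO and manual rules) come together in how a ranked rule set is evaluated. The sketch below illustrates two points from the deck: exact fields act as a filter before any fuzzy comparison, and a pair stops at the first rule it satisfies. Everything in it is an assumption made for illustration: the rule names, fields and threshold are hypothetical, Python's `difflib.SequenceMatcher` stands in for the product's fuzzy matching, and blank values are treated as non-matching.

```python
from difflib import SequenceMatcher

# Hypothetical rules in rank order:
# (name, exact fields, fuzzy field, threshold, merge strategy).
RULES = [
    ("ssn_exact", ["ssn"], None, None, "AUTOMATED"),
    ("city_exact_name_fuzzy", ["city"], "name", 0.8, "MANUAL"),
]


def similarity(a, b):
    """Stand-in fuzzy score in [0, 1]; not the MDM SaaS matching algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(rec_a, rec_b, rules=RULES):
    """Return (rule name, merge strategy) for the first satisfied rule, else None."""
    for name, exact_fields, fuzzy_field, threshold, strategy in rules:
        # Exact filter: every exact field must be populated and identical.
        if any(not rec_a.get(f) or rec_a.get(f) != rec_b.get(f) for f in exact_fields):
            continue
        if fuzzy_field is not None:
            left, right = rec_a.get(fuzzy_field), rec_b.get(fuzzy_field)
            # Skip the rule when a fuzzy value is missing or too dissimilar.
            if not left or not right or similarity(left, right) < threshold:
                continue
        return name, strategy  # first satisfied rule wins; later rules are skipped
    return None


a = {"ssn": "123", "name": "Jon Smith", "city": "Austin"}
b = {"ssn": "123", "name": "John Smith", "city": "Austin"}
c = {"ssn": "", "name": "John Smith", "city": "Austin"}
d = {"ssn": "999", "name": "Mary Jones", "city": "Austin"}

print(evaluate(a, b))  # same SSN: caught by the first, automated rule
print(evaluate(a, c))  # no usable SSN: falls through to the fuzzy, manual rule
print(evaluate(a, d))  # different SSN and dissimilar name: no rule applies
```

Placing the identifier-based rule first keeps most pairs out of the fuzzy comparison, which is consistent with the advice to put exact filters on every rule and to reserve manual review for the ambiguous cases.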