DAMA-DMBOK 3 PDF
Document Details
2017
Summary
This book, "DAMA-DMBOK: Data Management Body of Knowledge", Second Edition (DMBOK2), is a comprehensive guide to data management. It covers essential concepts, frameworks, and activities, including data handling ethics, governance, architecture, modeling, and storage. It is a professional-level reference for practitioners in data management and related fields.
Full Transcript
DAMA-DMBOK
DATA MANAGEMENT BODY OF KNOWLEDGE
SECOND EDITION
DAMA International
Technics Publications
BASKING RIDGE, NEW JERSEY

Dedicated to the memory of Patricia Cupoli, MLS, MBA, CCP, CDMP (May 25, 1948 – July 28, 2015) for her lifelong commitment to the Data Management profession and her contributions to this publication.

Published by:
2 Lindsley Road
Basking Ridge, NJ 07920 USA
https://www.TechnicsPub.com

Senior Editor: Deborah Henderson, CDMP
Editor: Susan Earley, CDMP
Production Editor: Laura Sebastian-Coleman, CDMP, IQCP
Bibliography Researcher: Elena Sykora, DGSP
Collaboration Tool Manager: Eva Smith, CDMP
Cover design by Lorena Molinari

All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the publisher, except for the inclusion of brief quotations in a review.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

All trade and product names are trademarks, registered trademarks or service marks of their respective companies and are the property of their respective holders and should be treated as such.

Second Edition
First Printing 2017
Copyright © 2017 DAMA International

ISBN, Print ed. 9781634622349
ISBN, PDF ed. 9781634622363
ISBN, Server ed. 9781634622486
ISBN, Enterprise ed. 9781634622479
Library of Congress Control Number: 2017941854

Contents

Preface

Chapter 1: Data Management
1. Introduction
2. Essential Concepts
2.1 Data
2.2 Data and Information
2.3 Data as an Organizational Asset
2.4 Data Management Principles
2.5 Data Management Challenges
2.6 Data Management Strategy
3. Data Management Frameworks
3.1 Strategic Alignment Model
3.2 The Amsterdam Information Model
3.3 The DAMA-DMBOK Framework
3.4 DMBOK Pyramid (Aiken)
3.5 DAMA Data Management Framework Evolved
4. DAMA and the DMBOK
5. Works Cited / Recommended

Chapter 2: Data Handling Ethics
1. Introduction
2. Business Drivers
3. Essential Concepts
3.1 Ethical Principles for Data
3.2 Principles Behind Data Privacy Law
3.3 Online Data in an Ethical Context
3.4 Risks of Unethical Data Handling Practices
3.5 Establishing an Ethical Data Culture
3.6 Data Ethics and Governance
4. Works Cited / Recommended

Chapter 3: Data Governance
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Define Data Governance for the Organization
2.2 Perform Readiness Assessment
2.3 Perform Discovery and Business Alignment
2.4 Develop Organizational Touch Points
2.5 Develop Data Governance Strategy
2.6 Define the DG Operating Framework
2.7 Develop Goals, Principles, and Policies
2.8 Underwrite Data Management Projects
2.9 Engage Change Management
2.10 Engage in Issue Management
2.11 Assess Regulatory Compliance Requirements
2.12 Implement Data Governance
2.13 Sponsor Data Standards and Procedures
2.14 Develop a Business Glossary
2.15 Coordinate with Architecture Groups
2.16 Sponsor Data Asset Valuation
2.17 Embed Data Governance
3. Tools and Techniques
3.1 Online Presence / Websites
3.2 Business Glossary
3.3 Workflow Tools
3.4 Document Management Tools
3.5 Data Governance Scorecards
4. Implementation Guidelines
4.1 Organization and Culture
4.2 Adjustment and Communication
5. Metrics
6. Works Cited / Recommended

Chapter 4: Data Architecture
1. Introduction
1.1 Business Drivers
1.2 Data Architecture Outcomes and Practices
1.3 Essential Concepts
2. Activities
2.1 Establish Data Architecture Practice
2.2 Integrate with Enterprise Architecture
3. Tools
3.1 Data Modeling Tools
3.2 Asset Management Software
3.3 Graphical Design Applications
4. Techniques
4.1 Lifecycle Projections
4.2 Diagramming Clarity
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Organization and Cultural Change
6. Data Architecture Governance
6.1 Metrics
7. Works Cited / Recommended

Chapter 5: Data Modeling and Design
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Plan for Data Modeling
2.2 Build the Data Model
2.3 Review the Data Models
2.4 Maintain the Data Models
3. Tools
3.1 Data Modeling Tools
3.2 Lineage Tools
3.3 Data Profiling Tools
3.4 Metadata Repositories
3.5 Data Model Patterns
3.6 Industry Data Models
4. Best Practices
4.1 Best Practices in Naming Conventions
4.2 Best Practices in Database Design
5. Data Model Governance
5.1 Data Model and Design Quality Management
5.2 Data Modeling Metrics
6. Works Cited / Recommended

Chapter 6: Data Storage and Operations
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Manage Database Technology
2.2 Manage Databases
3. Tools
3.1 Data Modeling Tools
3.2 Database Monitoring Tools
3.3 Database Management Tools
3.4 Developer Support Tools
4. Techniques
4.1 Test in Lower Environments
4.2 Physical Naming Standards
4.3 Script Usage for All Changes
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Organization and Cultural Change
6. Data Storage and Operations Governance
6.1 Metrics
6.2 Information Asset Tracking
6.3 Data Audits and Data Validation
7. Works Cited / Recommended

Chapter 7: Data Security
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Identify Data Security Requirements
2.2 Define Data Security Policy
2.3 Define Data Security Standards
3. Tools
3.1 Anti-Virus Software / Security Software
3.2 HTTPS
3.3 Identity Management Technology
3.4 Intrusion Detection and Prevention Software
3.5 Firewalls (Prevention)
3.6 Metadata Tracking
3.7 Data Masking/Encryption
4. Techniques
4.1 CRUD Matrix Usage
4.2 Immediate Security Patch Deployment
4.3 Data Security Attributes in Metadata
4.4 Metrics
4.5 Security Needs in Project Requirements
4.6 Efficient Search of Encrypted Data
4.7 Document Sanitization
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Organization and Cultural Change
5.3 Visibility into User Data Entitlement
5.4 Data Security in an Outsourced World
5.5 Data Security in Cloud Environments
6. Data Security Governance
6.1 Data Security and Enterprise Architecture
7. Works Cited / Recommended

Chapter 8: Data Integration and Interoperability
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Data Integration Activities
2.1 Plan and Analyze
2.2 Design Data Integration Solutions
2.3 Develop Data Integration Solutions
2.4 Implement and Monitor
3. Tools
3.1 Data Transformation Engine/ETL Tool
3.2 Data Virtualization Server
3.3 Enterprise Service Bus
3.4 Business Rules Engine
3.5 Data and Process Modeling Tools
3.6 Data Profiling Tool
3.7 Metadata Repository
4. Techniques
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Organization and Cultural Change
6. DII Governance
6.1 Data Sharing Agreements
6.2 DII and Data Lineage
6.3 Data Integration Metrics
7. Works Cited / Recommended

Chapter 9: Document and Content Management
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Plan for Lifecycle Management
2.2 Manage the Lifecycle
2.3 Publish and Deliver Content
3. Tools
3.1 Enterprise Content Management Systems
3.2 Collaboration Tools
3.3 Controlled Vocabulary and Metadata Tools
3.4 Standard Markup and Exchange Formats
3.5 E-discovery Technology
4. Techniques
4.1 Litigation Response Playbook
4.2 Litigation Response Data Map
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Organization and Cultural Change
6. Documents and Content Governance
6.1 Information Governance Frameworks
6.2 Proliferation of Information
6.3 Govern for Quality Content
6.4 Metrics
7. Works Cited / Recommended

Chapter 10: Reference and Master Data
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 MDM Activities
2.2 Reference Data Activities
3. Tools and Techniques
4. Implementation Guidelines
4.1 Adhere to Master Data Architecture
4.2 Monitor Data Movement
4.3 Manage Reference Data Change
4.4 Data Sharing Agreements
5. Organization and Cultural Change
6. Reference and Master Data Governance
6.1 Metrics
7. Works Cited / Recommended

Chapter 11: Data Warehousing and Business Intelligence
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Understand Requirements
2.2 Define and Maintain the DW/BI Architecture
2.3 Develop the Data Warehouse and Data Marts
2.4 Populate the Data Warehouse
2.5 Implement the Business Intelligence Portfolio
2.6 Maintain Data Products
3. Tools
3.1 Metadata Repository
3.2 Data Integration Tools
3.3 Business Intelligence Tools Types
4. Techniques
4.1 Prototypes to Drive Requirements
4.2 Self-Service BI
4.3 Audit Data that can be Queried
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Release Roadmap
5.3 Configuration Management
5.4 Organization and Cultural Change
6. DW/BI Governance
6.1 Enabling Business Acceptance
6.2 Customer / User Satisfaction
6.3 Service Level Agreements
6.4 Reporting Strategy
6.5 Metrics
7. Works Cited / Recommended

Chapter 12: Metadata Management
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Define Metadata Strategy
2.2 Understand Metadata Requirements
2.3 Define Metadata Architecture
2.4 Create and Maintain Metadata
2.5 Query, Report, and Analyze Metadata
3. Tools
3.1 Metadata Repository Management Tools
4. Techniques
4.1 Data Lineage and Impact Analysis
4.2 Metadata for Big Data Ingest
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Organizational and Cultural Change
6. Metadata Governance
6.1 Process Controls
6.2 Documentation of Metadata Solutions
6.3 Metadata Standards and Guidelines
6.4 Metrics
7. Works Cited / Recommended

Chapter 13: Data Quality
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Define High Quality Data
2.2 Define a Data Quality Strategy
2.3 Identify Critical Data and Business Rules
2.4 Perform an Initial Data Quality Assessment
2.5 Identify and Prioritize Potential Improvements
2.6 Define Goals for Data Quality Improvement
2.7 Develop and Deploy Data Quality Operations
3. Tools
3.1 Data Profiling Tools
3.2 Data Querying Tools
3.3 Modeling and ETL Tools
3.4 Data Quality Rule Templates
3.5 Metadata Repositories
4. Techniques
4.1 Preventive Actions
4.2 Corrective Actions
4.3 Quality Check and Audit Code Modules
4.4 Effective Data Quality Metrics
4.5 Statistical Process Control
4.6 Root Cause Analysis
5. Implementation Guidelines
5.1 Readiness Assessment / Risk Assessment
5.2 Organization and Cultural Change
6. Data Quality and Data Governance
6.1 Data Quality Policy
6.2 Metrics
7. Works Cited / Recommended

Chapter 14: Big Data and Data Science
1. Introduction
1.1 Business Drivers
1.2 Principles
1.3 Essential Concepts
2. Activities
2.1 Define Big Data Strategy and Business Needs
2.2 Choose Data Sources
2.3 Acquire and Ingest Data Sources
2.4 Develop Data Hypotheses and Methods
2.5 Integrate / Align Data for Analysis
2.6 Explore Data Using Models
2.7 Deploy and Monitor
3. Tools
3.1 MPP Shared-nothing Technologies and Architecture
3.2 Distributed File-based Databases
3.3 In-database Algorithms
3.4 Big Data Cloud Solutions
3.5 Statistical Computing and Graphical Languages
3.6 Data Visualization Tools
4. Techniques
4.1 Analytic Modeling
4.2 Big Data Modeling
5. Implementation Guidelines
5.1 Strategy Alignment
5.2 Readiness Assessment / Risk Assessment
5.3 Organization and Cultural Change
6. Big Data and Data Science Governance
6.1 Visualization Channels Management
6.2 Data Science and Visualization Standards
6.3 Data Security
6.4 Metadata
6.5 Data Quality
6.6 Metrics
7. Works Cited / Recommended

Chapter 15: Data Management Maturity Assessment
1. Introduction
1.1 Business Drivers
1.2 Goals and Principles
1.3 Essential Concepts
2. Activities
2.1 Plan Assessment Activities
2.2 Perform Maturity Assessment
2.3 Interpret Results
2.4 Create a Targeted Program for Improvements
2.5 Re-assess Maturity
3. Tools
4. Techniques
4.1 Selecting a DMM Framework
4.2 DAMA-DMBOK Framework Use
5. Guidelines for a DMMA
5.1 Readiness Assessment / Risk Assessment
5.2 Organizational and Cultural Change
6. Maturity Management Governance
6.1 DMMA Process Oversight
6.2 Metrics
7. Works Cited / Recommended

Chapter 16: Data Management Organization and Role Expectations
1. Introduction
2. Understand Existing Organization and Cultural Norms
3. Data Management Organizational Constructs
3.1 Decentralized Operating Model
3.2 Network Operating Model
3.3 Centralized Operating Model
3.4 Hybrid Operating Model
3.5 Federated Operating Model
3.6 Identifying the Best Model for an Organization
3.7 DMO Alternatives and Design Considerations
4. Critical Success Factors
4.1 Executive Sponsorship
4.2 Clear Vision
4.3 Proactive Change Management
4.4 Leadership Alignment
4.5 Communication
4.6 Stakeholder Engagement
4.7 Orientation and Training
4.8 Adoption Measurement
4.9 Adherence to Guiding Principles
4.10 Evolution Not Revolution
5. Build the Data Management Organization
5.1 Identify Current Data Management Participants
5.2 Identify Committee Participants
5.3 Identify and Analyze Stakeholders
5.4 Involve the Stakeholders
6. Interactions Between the DMO and Other Data-oriented Bodies
6.1 The Chief Data Officer
6.2 Data Governance
6.3 Data Quality
6.4 Enterprise Architecture
6.5 Managing a Global Organization
7.
Data Management Roles _________________________________________________ 568 7.1 Organizational Roles _______________________________________________________ 568 7.2 Individual Roles ___________________________________________________________ 568 8. Works Cited / Recommended _____________________________________________ 571 10 DMBOK2 Chapter 17: Data Management and Organizational Change Management __ 573 1. Introduction __________________________________________________________ 573 2. Laws of Change ________________________________________________________ 574 3. Not Managing a Change: Managing a Transition _____________________________ 575 4. Kotter’s Eight Errors of Change Management _______________________________ 577 4.1 Error #1: Allowing Too Much Complacency ____________________________________ 577 4.2 Error #2: Failing to Create a Sufficiently Powerful Guiding Coalition ________________ 578 4.3 Error #3: Underestimating the Power of Vision _________________________________ 578 4.4 Error #4: Under Communicating the Vision by a Factor of 10, 100, or 1000 __________ 579 4.5 Error #5: Permitting Obstacles to Block the Vision_______________________________ 580 4.6 Error #6: Failing to Create Short-Term Wins ___________________________________ 580 4.7 Error #7: Declaring Victory Too Soon _________________________________________ 581 4.8 Error # 8: Neglecting to Anchor Changes Firmly in the Corporate Culture____________ 581 5. Kotter’s Eight Stage Process for Major Change ______________________________ 582 5.1 Establishing a Sense of Urgency ______________________________________________ 583 5.2 The Guiding Coalition_______________________________________________________ 586 5.3 Developing a Vision and Strategy _____________________________________________ 590 5.4 Communicating the Change Vision ____________________________________________ 594 6. The Formula for Change_________________________________________________ 598 7. 
Diffusion of Innovations and Sustaining Change _____________________________ 599 7.1 The Challenges to be Overcome as Innovations Spread ___________________________ 601 7.2 Key Elements in the Diffusion of Innovation ____________________________________ 601 7.3 The Five Stages of Adoption _________________________________________________ 601 7.4 Factors Affecting Acceptance or Rejection of an Innovation or Change ______________ 602 8. Sustaining Change _____________________________________________________ 603 8.1 Sense of Urgency / Dissatisfaction ____________________________________________ 604 8.2 Framing the Vision _________________________________________________________ 604 8.3 The Guiding Coalition_______________________________________________________ 605 8.4 Relative Advantage and Observability _________________________________________ 605 9. Communicating Data Management Value __________________________________ 605 9.1 Communications Principles __________________________________________________ 605 9.2 Audience Evaluation and Preparation _________________________________________ 606 9.3 The Human Element________________________________________________________ 607 9.4 Communication Plan _______________________________________________________ 608 9.5 Keep Communicating _______________________________________________________ 609 10. 
Works Cited / Recommended ___________________________________________ 609 Acknowledgements _____________________________________________ 611 Index _________________________________________________________ 615 Figures Figure 1 Data Management Principles ____________________________________________________________ 22 Figure 2 Data Lifecycle Key Activities_____________________________________________________________ 29 Figure 3 Strategic Alignment Model (Henderson and Venkatraman) _____________________________________ 34 Figure 4 Amsterdam Information Model (adapted) __________________________________________________ 35 Figure 5 The DAMA-DMBOK2 Data Management Framework (The DAMA Wheel) ___________________________ 36 Figure 6 DAMA Environmental Factors Hexagon ____________________________________________________ 36 Figure 7 Knowledge Area Context Diagram ________________________________________________________ 37 Figure 8 Purchased or Built Database Capability ____________________________________________________ 40 Figure 9 DAMA Functional Area Dependencies _____________________________________________________ 41 Figure 10 DAMA Data Management Function Framework _____________________________________________ 42 Figure 11 DAMA Wheel Evolved ________________________________________________________________ 44 Figure 12 Context Diagram: Data Handling Ethics ___________________________________________________ 50 Figure 13 Ethical Risk Model for Sampling Projects __________________________________________________ 64 Figure 14 Context Diagram: Data Governance and Stewardship _________________________________________ 69 Figure 15 Data Governance and Data Management __________________________________________________ 72 Figure 16 Data Governance Organization Parts _____________________________________________________ 74 Figure 17 Enterprise DG Operating Framework Examples _____________________________________________ 75 Figure 18 CDO Organizational Touch Points 
Figure 19 An Example of an Operating Framework
Figure 20 Data Issue Escalation Path
Figure 21 Context Diagram: Data Architecture
Figure 22 Simplified Zachman Framework
Figure 23 Enterprise Data Model
Figure 24 Subject Area Models Diagram Example
Figure 25 Data Flow Depicted in a Matrix
Figure 26 Data Flow Diagram Example
Figure 27 The Data Dependencies of Business Capabilities
Figure 28 Context Diagram: Data Modeling and Design
Figure 29 Entities
Figure 30 Relationships
Figure 31 Cardinality Symbols
Figure 32 Unary Relationship - Hierarchy
Figure 33 Unary Relationship - Network
Figure 34 Binary Relationship
Figure 35 Ternary Relationship
Figure 36 Foreign Keys
Figure 37 Attributes
Figure 38 Dependent and Independent Entity
Figure 39 IE Notation
Figure 40 Axis Notation for Dimensional Models
Figure 41 UML Class Model
Figure 42 ORM Model
Figure 43 FCO-IM Model
Figure 44 Data Vault Model
Figure 45 Anchor Model
Figure 46 Relational Conceptual Model
Figure 47 Dimensional Conceptual Model
Figure 48 Relational Logical Data Model
Figure 49 Dimensional Logical Data Model
Figure 50 Relational Physical Data Model
Figure 51 Dimensional Physical Data Model
Figure 52 Supertype and Subtype Relationships
Figure 53 Modeling is Iterative
Figure 54 Context Diagram: Data Storage and Operations
Figure 55 Centralized vs. Distributed
Figure 56 Federated Databases
Figure 57 Coupling
Figure 58 CAP Theorem
Figure 59 Database Organization Spectrum
Figure 60 Log Shipping vs. Mirroring
Figure 61 SLAs for System and Database Performance
Figure 62 Sources of Data Security Requirements
Figure 63 Context Diagram: Data Security
Figure 64 DMZ Example
Figure 65 Security Role Hierarchy Example Diagram
Figure 66 Context Diagram: Data Integration and Interoperability
Figure 67 ETL Process Flow
Figure 68 ELT Process Flow
Figure 69 Application Coupling
Figure 70 Enterprise Service Bus
Figure 71 Context Diagram: Documents and Content
Figure 72 Document Hierarchy based on ISO 9001-4.2
Figure 73 Electronic Discovery Reference Model
Figure 74 Information Governance Reference Model
Figure 75 Context Diagram: Reference and Master Data
Figure 76 Key Processing Steps for MDM
Figure 77 Master Data Sharing Architecture Example
Figure 78 Reference Data Change Request Process
Figure 79 Context Diagram: DW/BI
Figure 80 The Corporate Information Factory
Figure 81 Kimball's Data Warehouse Chess Pieces
Figure 82 Conceptual DW/BI and Big Data Architecture
Figure 83 Release Process Example
Figure 84 Context Diagram: Metadata
Figure 85 Centralized Metadata Architecture
Figure 86 Distributed Metadata Architecture
Figure 87 Hybrid Metadata Architecture
Figure 88 Example Metadata Repository Metamodel
Figure 89 Sample Data Element Lineage Flow Diagram
Figure 90 Sample System Lineage Flow Diagram
Figure 91 Context Diagram: Data Quality
Figure 92 Relationship Between Data Quality Dimensions
Figure 93 A Data Quality Management Cycle Based on the Shewhart Chart
Figure 94 Barriers to Managing Information as a Business Asset
Figure 95 Control Chart of a Process in Statistical Control
Figure 96 Abate Information Triangle
Figure 97 Context Diagram: Big Data and Data Science
Figure 98 Data Science Process
Figure 99 Data Storage Challenges
Figure 100 Conceptual DW/BI and Big Data Architecture
Figure 101 Services-based Architecture
Figure 102 Columnar Appliance Architecture
Figure 103 Context Diagram: Data Management Maturity Assessment
Figure 104 Data Management Maturity Model Example
Figure 105 Example of a Data Management Maturity Assessment Visualization
Figure 106 Assess Current State to Create an Operating Model
Figure 107 Decentralized Operating Model
Figure 108 Network Operating Model
Figure 109 Centralized Operating Model
Figure 110 Hybrid Operating Model
Figure 111 Federated Operating Model
Figure 112 Stakeholder Interest Map
Figure 113 Bridges’s Transition Phases
Figure 114 Kotter’s Eight Stage Process for Major Change
Figure 115 Sources of Complacency
Figure 116 Vision Breaks Through Status Quo
Figure 117 Management/Leadership Contrast
Figure 118 Everett Rogers Diffusion of Innovations
Figure 119 The Stages of Adoption

Tables

Table 1 GDPR Principles
Table 2 Canadian Privacy Statutory Obligations
Table 3 United States Privacy Program Criteria
Table 4 Typical Data Governance Committees / Bodies
Table 5 Principles for Data Asset Accounting
Table 6 Architecture Domains
Table 7 Commonly Used Entity Categories
Table 8 Entity, Entity Type, and Entity Instance
Table 9 Modeling Schemes and Notations
Table 10 Scheme to Database Cross Reference
Table 11 Data Model Scorecard® Template
Table 12 ACID vs BASE
Table 13 Sample Regulation Inventory Table
Table 14 Role Assignment Grid Example
Table 15 Levels of Control for Documents per ANSI-859
Table 16 Sample Audit Measures
Table 17 Simple Reference List
Table 18 Simple Reference List Expanded
Table 19 Cross-Reference List
Table 20 Multi-Language Reference List
Table 21 UNSPSC (Universal Standard Products and Services Classification)
Table 22 NAICS (North America Industry Classification System)
Table 23 Critical Reference Data Metadata Attributes
Table 24 Source Data as Received by the MDM System
Table 25 Standardized and Enriched Input Data
Table 26 Candidate Identification and Identity Resolution
Table 27 DW-Bus Matrix Example
Table 28 CDC Technique Comparison
Table 29 Common Dimensions of Data Quality
Table 30 DQ Metric Examples
Table 31 Data Quality Monitoring Techniques
Table 32 Analytics Progression
Table 33 Typical Risks and Mitigations for a DMMA
Table 34 Bridges’s Transition Phases
Table 35 Complacency Scenarios
Table 36 Declaring Victory Too Soon Scenarios
Table 37 Diffusion of Innovations Categories Adapted to Information Management
Table 38 The Stages of Adoption (Adapted from Rogers, 1964)
Table 39 Communication Plan Elements

Preface

DAMA International is pleased to release the second edition of the DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK2). Since the publication of the first edition in 2009, significant developments have taken place in the field of data management. Data Governance has become a standard structure in many organizations, new technologies have enabled the collection and use of ‘Big Data’ (semi-structured and unstructured data in a wide range of formats), and the importance of data ethics has grown along with our ability to explore and exploit the vast amount of data and information produced as part of our daily lives.

These changes are exciting. They also place new and increasing demands on our profession. DAMA has responded to these changes by reformulating the DAMA Data Management Framework (the DAMA Wheel), adding detail and clarification, and expanding the scope of the DMBOK:

• Context diagrams for all Knowledge Areas have been improved and updated.
• Data Integration and Interoperability has been added as a new Knowledge Area to highlight its importance (Chapter 8).
• Data Ethics has been called out as a separate chapter due to the increasing necessity of an ethical approach to all aspects of data management (Chapter 2).
• The role of governance has been described both as a function (Chapter 3) and in relation to each Knowledge Area.
• A similar approach has been taken with organizational change management, which is described in Chapter 17 and incorporated into the Knowledge Area chapters.
• New chapters on Big Data and Data Science (Chapter 14) and Data Management Maturity Assessment (Chapter 15) help organizations understand where they want to go and give them the tools to get there.
• The second edition also includes a newly formulated set of data management principles to support the ability of organizations to manage their data effectively and get value from their data assets (Chapter 1).

We hope the DMBOK2 will serve data management professionals across the globe as a valuable resource and guide. Nevertheless, we also recognize it is only a starting point. Real advancement will come as we apply and learn from these ideas. DAMA exists to enable members to learn continuously, by sharing ideas, trends, problems, and solutions.

Sue Geuens, President, DAMA International
Laura Sebastian-Coleman, Publications Officer, DAMA International

CHAPTER 1
Data Management

1. Introduction

Many organizations recognize that their data is a vital enterprise asset. Data and information can give them insight about their customers, products, and services. It can help them innovate and reach strategic goals. Despite that recognition, few organizations actively manage data as an asset from which they can derive ongoing value (Evans and Price, 2012). Deriving value from data does not happen in a vacuum or by accident. It requires intention, planning, coordination, and commitment. It requires management and leadership.

Data Management is the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their lifecycles.
A Data Management Professional is any person who works in any facet of data management (from technical management of data throughout its lifecycle to ensuring that data is properly utilized and leveraged) to meet strategic organizational goals. Data management professionals fill numerous roles, from the highly technical (e.g., database administrators, network administrators, programmers) to strategic business (e.g., Data Stewards, Data Strategists, Chief Data Officers).

Data management activities are wide-ranging. They include everything from the ability to make consistent decisions about how to get strategic value from data to the technical deployment and performance of databases. Thus data management requires both technical and non-technical (i.e., ‘business’) skills. Responsibility for managing data must be shared between business and information technology roles, and people in both areas must be able to collaborate to ensure an organization has high quality data that meets its strategic needs.

Data and information are not just assets in the sense that organizations invest in them in order to derive future value. Data and information are also vital to the day-to-day operations of most organizations. They have been called the ‘currency’, the ‘life blood’, and even the ‘new oil’ of the information economy.¹ Whether or not an organization gets value from its analytics, it cannot even transact business without data.

¹ Google ‘data as currency’, ‘data as life blood’, and ‘the new oil’, for numerous references.

To support the data management professionals who carry out the work, DAMA International (The Data Management Association) has produced this book, the second edition of The DAMA Guide to the Data Management Body of Knowledge (DMBOK2). This edition builds on the first one, published in 2009, which provided foundational knowledge on which to build as the profession advanced and matured.

This chapter outlines a set of principles for data management.
It discusses challenges related to following those principles and suggests approaches for meeting these challenges. The chapter also describes the DAMA Data Management Framework, which provides the context for the work carried out by data management professionals within various Data Management Knowledge Areas.

1.1 Business Drivers

Information and knowledge hold the key to competitive advantage. Organizations that have reliable, high quality data about their customers, products, services, and operations can make better decisions than those without data or with unreliable data. Failure to manage data is similar to failure to manage capital. It results in waste and lost opportunity. The primary driver for data management is to enable organizations to get value from their data assets, just as effective management of financial and physical assets enables organizations to get value from those assets.

1.2 Goals

Within an organization, data management goals include:

• Understanding and supporting the information needs of the enterprise and its stakeholders, including customers, employees, and business partners
• Capturing, storing, protecting, and ensuring the integrity of data assets
• Ensuring the quality of data and information
• Ensuring the privacy and confidentiality of stakeholder data
• Preventing unauthorized or inappropriate access, manipulation, or use of data and information
• Ensuring data can be used effectively to add value to the enterprise

2. Essential Concepts

2.1 Data

Long-standing definitions of data emphasize its role in representing facts about the world.² In relation to information technology, data is also understood as information that has been stored in digital form (though data is not limited to information that has been digitized and data management principles apply to data captured on paper as well as in databases). Still, because today we can capture so much information electronically, we call many things ‘data’ that would not have been called ‘data’ in earlier times – things like names, addresses, birthdates, what one ate for dinner on Saturday, the most recent book one purchased.

Such facts about individual people can be aggregated, analyzed, and used to make a profit, improve health, or influence public policy. Moreover, our technological capacity to measure a wide range of events and activities (from the repercussions of the Big Bang to our own heartbeats) and to collect, store, and analyze electronic versions of things that were not previously thought of as data (videos, pictures, sound recordings, documents) is close to surpassing our ability to synthesize these data into usable information.³ To take advantage of the variety of data without being overwhelmed by its volume and velocity requires reliable, extensible data management practices.

Most people assume that, because data represents facts, it is a form of truth about the world and that the facts will fit together. But ‘facts’ are not always simple or straightforward. Data is a means of representation. It stands for things other than itself (Chisholm, 2010). Data is both an interpretation of the objects it represents and an object that must be interpreted (Sebastian-Coleman, 2013). This is another way of saying that we need context for data to be meaningful. Context can be thought of as data’s representational system; such a system includes a common vocabulary and a set of relationships between components. If we know the conventions of such a system, then we can interpret the data within it.⁴ These conventions are often documented in a specific kind of data referred to as Metadata.

However, because people often make different choices about how to represent concepts, they create different ways of representing the same concepts. From these choices, data takes on different shapes. Think of the range of ways we have to represent calendar dates, a concept about which there is an agreed-to definition. Now consider more complex concepts (such as customer or product), where the granularity and level of detail of what needs to be represented is not always self-evident, and the process of representation grows more complex, as does the process of managing that information over time. (See Chapter 10).

Even within a single organization, there are often multiple ways of representing the same idea. Hence the need for Data Architecture, modeling, governance, and stewardship, and Metadata and Data Quality management, all of which help people understand and use data. Across organizations, the problem of multiplicity multiplies. Hence the need for industry-level data standards that can bring more consistency to data.

Organizations have always needed to manage their data, but changes in technology have expanded the scope of this management need as they have changed people’s understanding of what data is. These changes have enabled organizations to use data in new ways to create products, share information, create knowledge, and improve organizational success. But the rapid growth of technology, and with it the human capacity to produce, capture, and mine data for meaning, has intensified the need to manage data effectively.

² The New Oxford American Dictionary defines data as “facts and statistics collected together for analysis.” The American Society for Quality (ASQ) defines data as “A set of collected facts” and describes two kinds of numerical data: measured or variable and counted or attributed. The International Organization for Standardization (ISO) defines data as “re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing” (ISO 11179). This definition emphasizes the electronic nature of data and assumes, correctly, that data requires standards because it is managed through information technology systems. That said, it does not speak to the challenges of formalizing data in a consistent way, across disparate systems. Nor does it account well for the concept of unstructured data.
³ http://ubm.io/2c4yPOJ (Accessed 2016-12-04). http://bit.ly/1rOQkt1 (Accessed 2016-12-04).
⁴ For additional information on the constructed-ness of data see: Kent, Data and Reality (2012) and Devlin, Business Unintelligence (2013).

2.2 Data and Information

Much ink has been spilled over the relationship between data and information. Data has been called the “raw material of information” and information has been called “data in context”.⁵ Often a layered pyramid is used to describe the relationship between data (at the base), information, knowledge, and wisdom (at the very top). While the pyramid can be helpful in describing why data needs to be well-managed, this representation presents several challenges for data management.

• It is based on the assumption that data simply exists. But data does not simply exist. Data has to be created.
• By describing a linear sequence from data through wisdom, it fails to recognize that it takes knowledge to create data in the first place.
• It implies that data and information are separate things, when in reality, the two concepts are intertwined with and dependent on each other. Data is a form of information and information is a form of data.

Within an organization, it may be helpful to draw a line between information and data for purposes of clear communication about the requirements and expectations of different uses by different stakeholders. (“Here is a sales report for the last quarter [information]. It is based on data from our data warehouse [data]. Next quarter these results [data] will be used to generate our quarter-over-quarter performance measures [information]”). Recognizing that data and information need to be prepared for different purposes drives home a central tenet of data management: Both data and information need to be managed.
Both will be of higher quality if they are managed together with uses and customer requirements in mind. Throughout the DMBOK, the terms will be used interchangeably.

[5] See English, 1999 and DAMA, 2009.

2.3 Data as an Organizational Asset

An asset is an economic resource that can be owned or controlled and that holds or produces value. Assets can be converted to money. Data is widely recognized as an enterprise asset, though understanding of what it means to manage data as an asset is still evolving. In the early 1990s, some organizations questioned whether goodwill should be given a monetary value. Now, the ‘value of goodwill’ commonly shows up as an item on the Profit and Loss Statement (P&L). Similarly, while not universally adopted, monetization of data is becoming increasingly common. It will not be too long before we see this as a feature of P&Ls. (See Chapter 3.)

Today’s organizations rely on their data assets to make more effective decisions and to operate more efficiently. Businesses use data to understand their customers, create new products and services, and improve operational efficiency by cutting costs and controlling risks. Government agencies, educational institutions, and not-for-profit organizations also need high quality data to guide their operational, tactical, and strategic activities. As organizations increasingly depend on data, the value of data assets can be more clearly established.

Many organizations identify themselves as ‘data-driven’. Businesses aiming to stay competitive must stop making decisions based on gut feelings or instincts, and instead use event triggers and apply analytics to gain actionable insight. Being data-driven includes the recognition that data must be managed efficiently and with professional discipline, through a partnership of business leadership and technical expertise.
Furthermore, the pace of business today means that change is no longer optional; digital disruption is the norm. To react to this, business must co-create information solutions with technical data professionals working alongside line-of-business counterparts. They must plan for how to obtain and manage data that they know they need to support business strategy. They must also position themselves to take advantage of opportunities to leverage data in new ways.

2.4 Data Management Principles

Data management shares characteristics with other forms of asset management, as seen in Figure 1. It involves knowing what data an organization has and what might be accomplished with it, then determining how best to use data assets to reach organizational goals. Like other management processes, it must balance strategic and operational needs. This balance can best be struck by following a set of principles that recognize salient features of data management and guide data management practice.

- Data is an asset with unique properties: Data is an asset, but it differs from other assets in important ways that influence how it is managed. The most obvious of these properties is that data is not consumed when it is used, as are financial and physical assets.
- The value of data can and should be expressed in economic terms: Calling data an asset implies that it has value. While there are techniques for measuring data’s qualitative and quantitative value, there are not yet standards for doing so. Organizations that want to make better decisions about their data should develop consistent ways to quantify that value. They should also measure both the costs of low quality data and the benefits of high quality data.
- Managing data means managing the quality of data: Ensuring that data is fit for purpose is a primary goal of data management. To manage quality, organizations must ensure they understand stakeholders’ requirements for quality and measure data against these requirements.
- It takes Metadata to manage data: Managing any asset requires having data about that asset (number of employees, accounting codes, etc.). The data used to manage and use data is called Metadata. Because data cannot be held or touched, understanding what it is and how to use it requires definition and knowledge in the form of Metadata. Metadata originates from a range of processes related to data creation, processing, and use, including architecture, modeling, stewardship, governance, Data Quality management, systems development, IT and business operations, and analytics.

[Figure 1 Data Management Principles. The figure groups the principles under five headings: Data is valuable; Data Management Requirements are Business Requirements; Data Management depends on diverse skills; Data Management is lifecycle management; Effective data management requires leadership commitment.]

- It takes planning to manage data: Even small organizations can have complex technical and business process landscapes. Data is created in many places and is moved between places for use. Coordinating work and keeping the end results aligned requires planning from an architectural and process perspective.
- Data management is cross-functional; it requires a range of skills and expertise: A single team cannot manage all of an organization’s data. Data management requires both technical and non-technical skills and the ability to collaborate.
- Data management requires an enterprise perspective: Data management has local applications, but it must be applied across the enterprise to be as effective as possible. This is one reason why data management and data governance are intertwined.
- Data management must account for a range of perspectives: Data is fluid. Data management must constantly evolve to keep up with the ways data is created and used and the data consumers who use it.
- Data management is lifecycle management: Data has a lifecycle and managing data requires managing its lifecycle. Because data begets more data, the data lifecycle itself can be very complex. Data management practices need to account for the data lifecycle.
- Different types of data have different lifecycle characteristics: For this reason, they have different management requirements. Data management practices have to recognize these differences and be flexible enough to meet different kinds of data lifecycle requirements.
- Managing data includes managing the risks associated with data: In addition to being an asset, data also represents risk to an organization. Data can be lost, stolen, or misused. Organizations must consider the ethical implications of their uses of data. Data-related risks must be managed as part of the data lifecycle.
- Data management requirements must drive Information Technology decisions: Data and data management are deeply intertwined with information technology and information technology management. Managing data requires an approach that ensures technology serves, rather than drives, an organization’s strategic data needs.
- Effective data management requires leadership commitment: Data management involves a complex set of processes that, to be effective, require coordination, collaboration, and commitment. Getting there requires not only management skills, but also the vision and purpose that come from committed leadership.
2.5 Data Management Challenges

Because data management has distinct characteristics derived from the properties of data itself, it also presents challenges in following these principles. Details of these challenges are discussed in Sections 2.5.1 through 2.5.13. Many of these challenges refer to more than one principle.

2.5.1 Data Differs from Other Assets [6]

Physical assets can be pointed to, touched, and moved around. They can be in only one place at a time. Financial assets must be accounted for on a balance sheet. However, data is different. Data is not tangible. Yet it is durable; it does not wear out, though the value of data often changes as it ages. Data is easy to copy and transport. But it is not easy to reproduce if it is lost or destroyed. Because it is not consumed when used, it can even be stolen without being gone. Data is dynamic and can be used for multiple purposes. The same data can even be used by multiple people at the same time – something that is impossible with physical or financial assets. Many uses of data beget more data. Most organizations must manage increasing volumes of data and the relation between data sets.

[6] This section derives from Redman, Thomas. Data Quality for the Information Age (1996) pp. 41-42, 232-36; and Data Driven (2008), Chapter One, “The Wondrous and Perilous Properties of Data and Information.”

These differences make it challenging to put a monetary value on data. Without this monetary value, it is difficult to measure how data contributes to organizational success. These differences also raise other issues that affect data management, such as defining data ownership, inventorying how much data an organization has, protecting against the misuse of data, managing risk associated with data redundancy, and defining and enforcing standards for Data Quality.

Despite the challenges with measuring the value of data, most people recognize that data, indeed, has value. An organization’s data is unique to itself.
Were organizationally unique data (such as customer lists, product inventories, or claim history) to be lost or destroyed, replacing it would be impossible or extremely costly. Data is also the means by which an organization knows itself – it is a meta-asset that describes other assets. As such, it provides the foundation for organizational insight.

Within and between organizations, data and information are essential to conducting business. Most operational business transactions involve the exchange of information. Most information is exchanged electronically, creating a data trail. This data trail can serve purposes in addition to marking the exchanges that have taken place. It can provide information about how an organization functions.

Because of the important role that data plays in any organization, it needs to be managed with care.

2.5.2 Data Valuation

Value is the difference between the cost of a thing and the benefit derived from that thing. For some assets, like stock, calculating value is easy. It is the difference between what the stock cost when it was purchased and what it was sold for. But for data, these calculations are more complicated, because neither the costs nor the benefits of data are standardized.

Since each organization’s data is unique to itself, an approach to data valuation needs to begin by articulating general cost and benefit categories that can be applied consistently within an organization.
Sample categories include [7]:

- Cost of obtaining and storing data
- Cost of replacing data if it were lost
- Impact on the organization if data were missing
- Cost of risk mitigation and potential cost of risks associated with data
- Cost of improving data
- Benefits of higher quality data
- What competitors would pay for data
- What the data could be sold for
- Expected revenue from innovative uses of data

[7] While the DMBOK2 was preparing to go to press, another means of valuing data was in the news: the WannaCry ransomware attack (17 May 2017) impacted more than 100K organizations in 150 countries. The culprits used the software to hold data hostage until victims paid ransom to get their data released. http://bit.ly/2tNoyQ7.

A primary challenge to data asset valuation is that the value of data is contextual (what is of value to one organization may not be of value to another) and often temporal (what was valuable yesterday may not be valuable today). That said, within an organization, certain types of data are likely to be consistently valuable over time. Take reliable customer information, for example. Customer information may even grow more valuable over time, as more data accumulates related to customer activity.

In relation to data management, establishing ways to associate financial value with data is critical, since organizations need to understand assets in financial terms in order to make consistent decisions. Putting value on data becomes the basis of putting value on data management activities.[8] The process of data valuation can also be used as a means of change management. Asking data management professionals and the stakeholders they support to understand the financial meaning of their work can help an organization transform its understanding of its own data and, through that, its approach to data management.

2.5.3 Data Quality

Ensuring that data is of high quality is central to data management. Organizations manage their data because they want to use it.
If they cannot rely on it to meet business needs, then the effort to collect, store, secure, and enable access to it is wasted. To ensure data meets business needs, they must work with data consumers to define these needs, including characteristics that make data of high quality.

Largely because data has been associated so closely with information technology, managing Data Quality has historically been treated as an afterthought. IT teams are often dismissive of the data that the systems they create are supposed to store. It was probably a programmer who first observed ‘garbage in, garbage out’ – and who no doubt wanted to let it go at that. But the people who want to use the data cannot afford to be dismissive of quality. They generally assume data is reliable and trustworthy, until they have a reason to doubt these things. Once they lose trust, it is difficult to regain it.

Most uses of data involve learning from it in order to apply that learning and create value. Examples include understanding customer habits in order to improve a product or service and assessing organizational performance or market trends in order to develop a better business strategy. Poor quality data will have a negative impact on these decisions.

As importantly, poor quality data is simply costly to any organization. Estimates differ, but experts think organizations spend between 10% and 30% of revenue handling data quality issues. IBM estimated the cost of poor quality data in the US in 2016 was $3.1 trillion.[9] Many of the costs of poor quality data are hidden, indirect, and therefore hard to measure. Others, like fines, are direct and easy to calculate. Costs come from:

- Scrap and rework
- Work-arounds and hidden correction processes
- Organizational inefficiencies or low productivity
- Organizational conflict
- Low job satisfaction
- Customer dissatisfaction
- Opportunity costs, including inability to innovate
- Compliance costs or fines
- Reputational costs

The corresponding benefits of high quality data include:

- Improved customer experience
- Higher productivity
- Reduced risk
- Ability to act on opportunities
- Increased revenue
- Competitive advantage gained from insights on customers, products, processes, and opportunities

As these costs and benefits imply, managing Data Quality is not a one-time job. Producing high quality data requires planning, commitment, and a mindset that builds quality into processes and systems. All data management functions can influence Data Quality, for good or bad, so all of them must account for it as they execute their work. (See Chapter 13.)

[8] For case studies and examples, see Aiken and Billings, Monetizing Data Management (2014).
[9] Reported in Redman, Thomas. “Bad Data Costs U.S. $3 Trillion per Year.” Harvard Business Review. 22 September 2016. https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year.

2.5.4 Planning for Better Data

As stated in the chapter introduction, deriving value from data does not happen by accident. It requires planning in many forms. It starts with the recognition that organizations can control how they obtain and create data. If they view data as a product that they create, they will make better decisions about it throughout its lifecycle. These decisions require systems thinking because they involve:

- The ways data connects business processes that might otherwise be seen as separate
- The relationship between business processes and the technology that supports them
- The design and architecture of systems and the data they produce and store
- The ways data might be used to advance organizational strategy

Planning for better data requires a strategic approach to architecture, modeling, and other design functions. It also depends on strategic collaboration between business and IT leadership. And, of course, it depends on the ability to execute effectively on individual projects.
The challenge is that there are usually organizational pressures, as well as the perennial pressures of time and money, that get in the way of better planning. Organizations must balance long- and short-term goals as they execute their strategies. Having clarity about the trade-offs leads to better decisions.

2.5.5 Metadata and Data Management

Organizations require reliable Metadata to manage data as an asset. Metadata in this sense should be understood comprehensively. It includes not only the business, technical, and operational Metadata described in Chapter 12, but also the Metadata embedded in Data Architecture, data models, data security requirements, data integration standards, and data operational processes. (See Chapters 4 – 11.)

Metadata describes what data an organization has, what it represents, how it is classified, where it came from, how it moves within the organization, how it evolves through use, who can and cannot use it, and whether it is of high quality. Data is abstract. Definitions and other descriptions of context enable it to be understood. They make data, the data lifecycle, and the complex systems that contain data comprehensible.

The challenge is that Metadata is a form of data and needs to be managed as such. Organizations that do not manage their data well generally do not manage their Metadata at all. Metadata management often provides a starting point for improvements in data management overall.

2.5.6 Data Management is Cross-functional

Data management is a complex process. Data is managed in different places within an organization by teams that have responsibility for different phases of the data lifecycle.
Data management requires design skills to plan for systems, highly technical skills to administer hardware and build software, data analysis skills to understand issues and problems, analytic skills to interpret data, language skills to bring consensus to definitions and models, as well as strategic thinking to see opportunities to serve customers and meet goals.

The challenge is getting people with this range of skills and perspectives to recognize how the pieces fit together so that they collaborate well as they work toward common goals.

2.5.7 Establishing an Enterprise Perspective

Managing data requires understanding the scope and range of data within an organization. Data is one of the ‘horizontals’ of an organization. It moves across verticals, such as sales, marketing, and operations… Or at least it should. Data is not only unique to an organization; sometimes it is unique to a department or other sub-part of an organization. Because data is often viewed simply as a by-product of operational processes (for example, sales transaction records are the by-product of the selling process), it is not always planned for beyond the immediate need.

Even within an organization, data can be disparate. Data originates in multiple places within an organization. Different departments may have different ways of representing the same concept (e.g., customer, product, vendor). As anyone involved in a data integration or Master Data Management project can testify, subtle (or blatant) differences in representational choices present challenges in managing data across an organization. At the same time, stakeholders assume that an organization’s data should be coherent, and a goal of managing data is to make it fit together in common-sense ways so that it is usable by a wide range of data consumers.

One reason data governance has become increasingly important is to help organizations make decisions about data across verticals. (See Chapter 3.)
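The representational differences described in Section 2.5.7 can be made concrete with a small sketch. It is an illustration only, not a method prescribed by the DMBOK: the two departmental record layouts, the field names (cust_no, first, last, signup, id, name, joined), the sample values, and the shared Customer shape are all hypothetical assumptions. It shows one customer, recorded differently by two departments, being translated into a single agreed representation so that the records can be compared.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    """Hypothetical shared representation agreed on across departments."""
    customer_id: str
    full_name: str
    signup_date: date


def from_sales(record: dict) -> Customer:
    # Assumed sales layout: numeric ID, split name fields, MM/DD/YYYY dates.
    return Customer(
        customer_id=str(record["cust_no"]),
        full_name=f'{record["first"]} {record["last"]}',
        signup_date=datetime.strptime(record["signup"], "%m/%d/%Y").date(),
    )


def from_marketing(record: dict) -> Customer:
    # Assumed marketing layout: "C"-prefixed ID, "Last, First" names,
    # ISO 8601 (YYYY-MM-DD) dates.
    last, first = [part.strip() for part in record["name"].split(",", 1)]
    return Customer(
        customer_id=record["id"].lstrip("C"),
        full_name=f"{first} {last}",
        signup_date=date.fromisoformat(record["joined"]),
    )


# Made-up sample rows describing the same person in both layouts.
sales_row = {"cust_no": 1042, "first": "Ada", "last": "Lovelace", "signup": "03/07/2016"}
marketing_row = {"id": "C1042", "name": "Lovelace, Ada", "joined": "2016-03-07"}

# Once both rows are in the shared shape, they can be compared directly.
same_customer = from_sales(sales_row) == from_marketing(marketing_row)
print(same_customer)
```

In this sketch the two rows compare equal only after each department's conventions (ID prefix, name order, date format) have been written down as explicit translation rules. Those rules are themselves a simple form of Metadata, and agreeing on them across verticals is the kind of decision data governance is meant to support.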
2.5.8 Accounting for Other Perspectives

Today’s organizations use data that they create internally, as well as data that they acquire from external sources. They have to account for different legal and compliance requirements across national and industry lines. People who create data often forget that someone else will use that data later. Knowledge of the potential uses of data enables better planning for the data lifecycle and, with that, for better quality data. Data can also be misused. Accounting for this risk reduces the likelihood of misuse.

2.5.9 The Data Lifecycle

Like other assets, data has a lifecycle. To effectively manage data assets, organizations need to understand and plan for the data lifecycle. Well-managed data is managed strategically, with a vision of how the organization will use its data. A strategic organization will define not only its data content requirements, but also its data management requirements. These include policies and expectations for use, quality, controls, and security; an enterprise approach to architecture and design; and a sustainable approach to both infrastructure and software development.

The data lifecycle is based on the product lifecycle. It should not be confused with the systems development lifecycle. Conceptually, the data lifecycle is easy to describe (see Figure 2). It includes processes that create or obtain data, those that move, transform, and store it and enable it to be maintained and shared, and those that use or apply it, as well as those that dispose of it.[10] Throughout its lifecycle, data may be cleansed, transformed, merged, enhanced, or aggregated. As data is used or enhanced, new data is often created, so the lifecycle has internal iterations that are not shown on the diagram. Data is rarely static. Managing data involves a set of interconnected processes aligned with the data lifecycle.
The specifics of the data lifecycle within a given organization can be quite complicated, because data not only has a lifecycle, it also has lineage (i.e., a pathway along which it moves from its point of origin to its point of usage, sometimes called the data chain). Understanding the data lineage requires documenting the origin of data sets, as well as their movement and transformation through systems where they are accessed and used. Lifecycle and lineage intersect and can be understood in relation to each other. The better an organization understands the lifecycle and lineage of its data, the better able it will be to manage its data.

The focus of data management on the data lifecycle has several important implications:

- Creation and usage are the most critical points in the data lifecycle: Data management must be executed with an understanding of how data is produced, or obtained, as well as how data is used. It costs money to produce data. Data is valuable only when it is consumed or applied. (See Chapters 5, 6, 8, 11, and 14.)

[10] See McGilvray (2008) and English (1999) for information on the product lifecycle and data.

[Figure 2 Data Lifecycle Key Activities: Plan; Design & Enable; Create / Obtain; Store / Maintain; Use; Enhance; Dispose of]

- Data Quality must be managed throughout the data lifecycle: Data Quality Management is central to data management. Low quality data represents cost