Lecture 12 - Data Governance (DW) Spliced PDF
Uploaded by TriumphantPrologue
University of Santo Tomas Manila
Summary
This lecture discusses Data Governance, including various key elements such as data cataloging, data quality, data security, and data lineage. It emphasizes the importance of data governance in today's digital landscape and the organizational benefits it provides, such as improved operational efficiency and better decision-making. This lecture also covers essential aspects like data discovery, data sharing, and the core principles of data governance, such as accountability, standardization, and transparency.
Full Transcript
# CS-ELEC1C: Data Warehousing

## Data Governance

### What is Data Governance?

Data Governance is the overall management and strategy of an organization's data assets to ensure the availability, usability, integrity, accuracy, and security of the data.

* It involves defining and implementing policies, procedures, and controls to govern the entire data lifecycle, from creation and collection to storage, analysis, dissemination, and deletion.
* A good Data Governance strategy aims to establish accountability, transparency, and consistency in data-related decision-making processes across the organization.

### Why is Data Governance Important?

* Data governance is of paramount importance in today's digital landscape.
* It ensures that data definitions are clear and standardized across an organization, removing the ambiguity and confusion that inconsistent definitions can cause.
* Data governance fosters data literacy within organizations, empowering employees to understand, interpret, and effectively use data for decision-making and problem-solving.
* Governance programs also pave the way for data science initiatives by providing a solid foundation of quality data, so that data scientists can trust the data they analyze and derive meaningful insights from it.

### What are the Organizational Benefits of Data Governance?

* **Increased Operational Efficiency and Reduced Costs:** Allows organizations to create a single source of truth for their data estate, preventing data sprawl and silos and reducing duplication.
* **Improved Productivity and Faster Decision-Making:** Promotes data democratization by ensuring quick access to high-quality, accurate, consistent, and trustworthy data.
* **Enhanced Collaboration and Value Realization:** Lays the foundation for enhanced data collaboration and sharing across teams, business units, and partners, promoting a culture of knowledge sharing.
* **Enhanced Security and Privacy:** Mitigates security and privacy risks by implementing controls and processes that prevent unauthorized access to, and misuse of, sensitive data.
* **Better Compliance with Regulations and Standards:** Results in better compliance with regulatory requirements, such as those enforced by the National Privacy Commission (NPC), protecting the organization's reputation and avoiding potential financial and legal consequences.

### Key Elements of Data Governance

* **Data cataloging:** Effective data governance requires knowledge of the data that exists within an organization. Data catalogs provide a centralized metadata repository for an organization's data assets.
    * A data catalog allows stakeholders to quickly discover, understand, and access the data they need, improving data-related activities such as discovery, governance, and analytics.
* **Data quality:** Ensuring high data quality is crucial for accurate analytics, informed decision-making, and cost-effectiveness.
* **Data security:** Organizations need to give their teams access to high-quality data to drive insights and business value, while protecting sensitive data against unauthorized access.
    * Effective data access management is crucial for data security and governance. A good data security governance program should include access controls that define which groups or individuals can access what data.
* **Data classification:** Involves organizing and categorizing data based on its sensitivity, value, and criticality.
    * Classification lets organizations identify data by its risk level and importance, so they can apply appropriate security measures and policies.
* **Auditing data entitlements and access:** Effective data access auditing is a critical aspect of data governance and security governance programs, particularly in regulated industries.
    * By understanding who has access to what data and tracking recent access, organizations can proactively identify over-entitled users or groups and adjust their access accordingly, minimizing the risk of data misuse.
* **Data lineage:** Helps organizations ensure data quality and trustworthiness by providing a better understanding of data sources and consumption.
    * It captures relevant metadata and events throughout the data's lifecycle, providing an end-to-end view of how data flows across an organization's data estate.
    * Data lineage empowers data consumers to perform better analyses and helps data teams perform root cause analysis of errors, significantly reducing debugging time.
* **Data discovery:** As organizations continue to gather massive amounts of data from various sources, it is becoming increasingly important to make this data easily discoverable for analytics, AI, or ML use cases.
    * This is critical to accelerating data democratization and unlocking the true value of the data.
* **Data sharing and collaboration:** Organizations exchange data with internal teams, external partners, and customers across multiple clouds, data platforms, and regions.
    * As the demand for external data continues to grow, organizations must exchange data securely while maintaining control and visibility over how their sensitive information is used.
    * Organizations should invest in open-format, interoperable, multicloud data sharing technologies to meet their data-driven innovation needs.
    * Data marketplaces also serve as a bridge between data providers and consumers, facilitating the discovery and distribution of data sets.

### What does a good data governance solution look like?
Data-forward organizations prioritize data, analytics, and AI to drive business outcomes, and they build their data strategies around a cohesive and sustainable data platform with a set of key capabilities:

* **Centralized Data Catalog**: Stores all your data, ML models, and analytics artifacts, as well as metadata for each object; blends in data from other catalogs.
* **Data Auditing**: Central auditing with alerting and monitoring capabilities to promote accountability and security.
* **Data Lineage**: End-to-end visibility into how data flows in the lakehouse, from source to consumption, down to the column level.
* **Unified Data Access Controls**: A unified permissions model across all assets, including attribute-based access control (ABAC) for personally identifiable information (PII).
* **Data Sharing and Collaboration**: Data sharing with fine-grained access controls across clouds, regions, and platforms, preventing silos from forming.
* **Data Quality Management**: Robust data quality (DQ) management with built-in quality controls, testing, monitoring, and enforcement to ensure that accurate and useful data is available.
* **Data Discovery**: Helps developers and stakeholders find and reference relevant data, accelerating time to value.
* **Open Marketplace**: Discover, access, and deploy data sets, as well as AI and analytical assets, without proprietary platform dependencies or complicated ETL.
* **Privacy-Safe Collaboration**: Collaborate on sensitive data with internal or external stakeholders in a privacy-preserving environment.

### Who is Involved in Data Governance?

* **Data Owner**: These individuals or departments are responsible for defining and maintaining the overall data strategy, policies, and standards within an organization. Data owners ensure that data aligns with business goals and objectives.
* **Data Steward**: Individuals responsible for the day-to-day management and oversight of specific data assets. They ensure data quality, integrity, and compliance with established policies and procedures.
* **Data Custodian**: Responsible for the technical implementation and management of data storage, infrastructure, and security. They ensure that organizational data is stored, backed up, and protected.
* **Data Users**: Individuals or departments that rely on data to perform their job functions. Data users should understand and adhere to data governance policies and procedures to ensure data integrity and accuracy.

### Core and Basic Principles of Data Governance

* **Principle I: Accountability**: Each agent, steward, and owner must know what their task is and how to perform it. If something does go wrong, there must be a clear process for holding the responsible individual or team to account and ensuring it does not happen again.
* **Principle II: Standardization**: Understanding and following regulatory laws is key to creating a successful strategy. Businesses should follow these rules on data security and storage as standard practice, so that their data framework can operate consistently across legal jurisdictions.
* **Principle III: Transparency:** The democratization of data, analytics, and AI is at the core of data governance. It is about making data accessible and usable throughout your organization without compromising on quality or security.

**Data governance has everything to do with managing other people's data, such as that of your employees, clients, or customers. You must make it clear to everyone what your policies are, what data is collected, and why.**

### What are the main steps of Data Governance?

1. Identifying all sources of data.
2. Preparing metadata for the data and organizing a metadata storage option.
3. Setting up mechanisms to track data lineage for data flow and usage.
4. Scanning for sensitive data in your data estate.
5. Creating a governance framework, with set policies and procedures for the various overseers.
6. Checking the framework's compliance with relevant regulatory instruments.
7. Performing data quality checks by creating a rules library that is centrally managed and versioned, and updating the rules library periodically with new rules.
8. Regularly auditing data entitlements and access to ensure compliance.
9. Identifying further risks, such as lack of data security or excess access to sensitive data sets.
10. Securing the go-ahead from the senior executive team and data governance committee.
11. Hiring or training essential staff members, such as data stewards.
12. Establishing a means of data distribution, so everyone in your organization can locate the centralized data catalog.
13. Having regular reviews with senior executives (inviting team member feedback).
14. Constantly adapting the governance model, such as when new data sets are introduced.

### Assessment

1. **Suppose you are the data governance officer of a multinational organization.** During a routine audit, you discover that sensitive customer data has been accessed by unauthorized personnel due to weak access controls. The breach has not yet been made public, but you have a legal obligation to notify the authorities within 72 hours under GDPR. **Write an essay discussing your approach to handling this situation.** Your essay should include:
    * The immediate steps you would take to mitigate the breach.
    * How you would inform stakeholders (internal teams, customers, regulators) and manage their expectations.
    * Measures you would implement to prevent similar incidents in the future.
    * The role of data governance policies and frameworks in managing such scenarios.
2. **Your organization is exploring the use of artificial intelligence (AI) to improve customer services, which involves analyzing vast amounts of personal data.** While the innovation promises significant business growth, it also introduces risks of non-compliance with data protection regulations like GDPR, as well as potential ethical concerns. **Write an essay outlining how you would balance innovation and compliance as a data governance officer.** Your essay should include:
    * Strategies for ensuring compliance with data protection laws while pursuing AI initiatives.
    * How to incorporate ethical considerations into data governance policies.
    * Steps to engage stakeholders in decision-making to align business goals with regulatory requirements.
    * The importance of maintaining transparency and accountability in the organization's data practices.
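
### Sketch: A Versioned Data Quality Rules Library

Step 7 of the main steps of Data Governance calls for data quality checks backed by a rules library that is centrally managed, versioned, and periodically updated with new rules. The Python sketch below shows one minimal way to picture that idea. It is not a prescribed implementation. The `Rule` and `RulesLibrary` names, the one-dict-per-record layout, and the sample customer rows are hypothetical and chosen only for illustration. Each time a rule is added or replaced, the library version increases. This lets a set of check results be traced back to the exact rule set that produced them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical record shape for this sketch: one row = column name -> value.
Record = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named data quality rule: values in `column` must pass `check`."""
    name: str
    column: str
    check: Callable[[Any], bool]


@dataclass
class RulesLibrary:
    """Central rules library; every change produces a new version number."""
    version: int = 0
    rules: dict[str, Rule] = field(default_factory=dict)
    history: list[tuple[int, str]] = field(default_factory=list)

    def add_rule(self, rule: Rule) -> None:
        # Adding or replacing a rule bumps the version and logs the change,
        # so results can be traced to the rule set that produced them.
        action = "updated" if rule.name in self.rules else "added"
        self.rules[rule.name] = rule
        self.version += 1
        self.history.append((self.version, f"{action} {rule.name}"))

    def run(self, records: list[Record]) -> dict[str, list[int]]:
        """Return, for each rule, the indexes of the records that fail it."""
        failures: dict[str, list[int]] = {name: [] for name in self.rules}
        for i, record in enumerate(records):
            for name, rule in self.rules.items():
                # A missing column is passed to the check as None.
                if not rule.check(record.get(rule.column)):
                    failures[name].append(i)
        return failures


# Usage with hypothetical customer rows.
library = RulesLibrary()
library.add_rule(Rule("email_present", "email", lambda v: bool(v)))
library.add_rule(
    Rule("age_in_range", "age", lambda v: isinstance(v, int) and 0 <= v <= 120)
)

customers = [
    {"email": "ana@example.com", "age": 34},
    {"email": "", "age": 29},
    {"email": "ben@example.com", "age": 150},
]
print(library.version)         # 2
print(library.run(customers))  # {'email_present': [1], 'age_in_range': [2]}
```

The sketch keeps every rule in a single shared object instead of scattering checks across pipelines. That choice mirrors the "centrally managed" requirement. It assumes the library would also be persisted and access-controlled in a real system, which this sketch does not do.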