Lineage building fundamentals.docx
Uploaded by AmpleAqua
**Auto Lineage Prerequisites**
==============================

Certain prerequisites must be completed when creating a connection before lineage can be built automatically. In Administration \> Connectors, adding a connector involves entering connector details and license add-on options. Only the **Connector Creator** can select the Auto Lineage option to build lineage automatically. This option enables tracking of where data comes from, where it goes, and the path it takes. Provide all connection details, save, and validate the connection to confirm that the setup is correct.

**Crawler Settings**
--------------------

Users can select a validated connector, click the nine-dot menu, select the Settings option, select the Crawler option on the top pane, and then configure the options required for that connector.

![](media/image30.png)

**For example**, the setup of an RDBMS connector differs from that of a report connector. Users with the Integration Admin role can configure these settings.

### **RDBMS Connector**

-
-
-

### **Reporting Connectors**

![](media/image27.png)

i.
ii.

### **ETL Connector**

**Building Lineage**
====================

Building lineage can be done in two ways:

1. Auto Lineage
2. Manual Lineage

**Auto Lineage**
----------------

**Field**                **Description**
------------------------ ------------------------------------------------------------------------------------------------------------------------------------------------------
**Connector Name**       Shows the name of the chosen connector, helping users identify the data source for lineage.
**Codes Loaded**         Displays the total number of source codes crawled within the selected connector, indicating how much code is available.
**Codes Processed**      Shows the number of source codes already processed in the lineage-building process, allowing users to track progress.
**Codes Unprocessed**    Shows the number of source codes not yet processed for lineage building, giving users an overview of the remaining work.
**Last Updated Count**   Shows the number of source codes added since the previous crawl.
**Last Job Status**      Reports the most recent job submitted to build lineage within the connector, indicating success, ongoing status, or issues.
**Last Run Date**        Displays the timestamp of the last lineage-building job within the connector, helping users track activity and keep data lineage up to date.

### **Ways of Building Auto Lineage**

Users can select source codes from the list and build auto lineage from the nine-dot menu. The menu provides five options to meet specific requirements. The options are:

#### **Build Lineage for Selected Codes**

#### **Build Lineage for New or Changed Codes**

#### **Build Lineage for All Codes**

#### **Fetch Code Log and Build Lineage**

The following options configure the query log settings:

**Query Log Settings**   **Description**
------------------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Query                    Displays the queries used to retrieve the query logs from the schema.
Look back Period         Fetches queries that were processed in the data source within the specified number of past days. For example, the look back period can be set to one day, two days, or more. The maximum look back period is seven days.
Include Query Type       Users can include query types such as Select, Insert, Update, and Delete.
Exclude Users            Users can exclude unnecessary users from the query logs. If a username is specified in the Exclude Users field, queries executed by that user are not retrieved.
Schema                   Users can select a specific schema to which the query log settings apply. If this field is left blank, the query log is fetched for all schemas available for that database connection.

#### **Export Source Code to File**

#### **Import Source Code from File**

### **Viewing Source Codes**

After a job is submitted and executed, the \"Build Auto Lineage\" page shows key information for each query, giving users detailed insight into how their data lineage was created. The key fields are described below.

![](media/image15.png)

**Schema Name:** Shows the schema that the code belongs to in the connector. It helps users understand the code\'s structure and context.

**Code Name:** Shows the name of the code, so users can quickly identify and refer to it in their lineage.

**Lineage Status:** Indicates the current status of the query in OvalEdge and how far lineage building has progressed. The statuses include:

**Status**                                         **Parse**   **Lineage Discovered**   **Table Type**
-------------------------------------------------- ----------- ------------------------ -------------------------
**SUCCESS\_LINEAGE\_BUILD**                        100%        YES                      Real Table
**SUCCESS\_WITH\_TEMP**                            100%        YES                      Temp Table
**SUCCESS\_LINEAGE\_DOES\_NOT\_EXIST**             100%        NO                       --
**SUCCESS\_LINEAGE\_FIXED**                        100%        YES                      Real Table
**SUCCESS\_LINEAGE\_PARTIALLY\_BUILD**             \

b.
c.
d.

1.
2.

**Manual Lineage**
------------------

Users can manually build lineage using:

1. Lineage Maintenance
2. Load Metadata From Files
3. OvalEdge APIs

### **Lineage Maintenance**

a.
b.

-
-
-
-

#### **How to Add Column Mapping**

-
-
-
-
-
-

### **Load Metadata From Files**

### **OvalEdge APIs**

![](media/image4.png)

**Accessing Lineage in Data Catalog**
=====================================

Understanding data lineage is crucial for effective data management. OvalEdge has tools to help users access and interpret this data.
This overview shows how users can navigate and understand lineage data in the Data Catalog.

**Accessing Lineage**
---------------------

Users access lineage information at the object level in the Data Catalog. The Lineage tab, available for every object, details how data objects are interconnected and how data flows through the connected source systems.

### **Graphical View**

The Graphical View provides a visual overview of the lineage, showing connections and relationships between data objects. The graphical view includes the following:

#### **Lineage Nodes**

a.
b.
c.
d.
e.
f.
g.
    i.
    ii.
    iii.
    iv.
    v.
h.
i.

#### **Lineage Differentiation in OvalEdge**

-
-

#### **Column Mapping**

![](media/image26.png)

### **Tabular View**

Lineage data is presented in a structured table format. This representation helps users follow the data flow, and the table can be downloaded for detailed analysis.

Note: The tabular view displays only one level of upstream objects and one level of downstream objects.

#### **Download**

Users can download the tabular lineage representation for in-depth data lineage analysis. OvalEdge lets users customize their download preferences by specifying what to download and how many levels of lineage data to include, so the download contains exactly the information needed for the analysis.

![](media/image3.png)

***Note:*** Users can download lineage data at Level -1, which covers all lineage levels and gives a complete overview of the data\'s journey.
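The level semantics described above can be illustrated with a small sketch. This is not OvalEdge code: the `EDGES` data, the object names, and the `lineage_rows` function are hypothetical, and treating level `-1` as "no depth limit" is an assumption based on the note above.

```python
from collections import deque

# Hypothetical lineage edges (source object -> target object); not real OvalEdge data.
EDGES = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.sales"),
    ("mart.sales", "report.revenue"),
]


def lineage_rows(edges, start, level=1):
    """Return (direction, depth, object) rows for the lineage around `start`.

    level=1 mirrors the tabular view (one hop in each direction).
    level=-1 is assumed to mean "follow every level", as in the download option.
    """
    downstream, upstream = {}, {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
        upstream.setdefault(dst, []).append(src)

    rows = []
    for direction, graph in (("upstream", upstream), ("downstream", downstream)):
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if level != -1 and depth >= level:
                continue  # depth limit reached; do not expand this node
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    rows.append((direction, depth + 1, neighbor))
                    queue.append((neighbor, depth + 1))
    return rows
```

With this sample data, `lineage_rows(EDGES, "staging.orders")` returns one upstream row (`raw.orders`) and one downstream row (`mart.sales`), while `level=-1` additionally returns `report.revenue` at depth 2.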
### **Process Upstream/Downstream Objects**

Process Upstream and Downstream Objects allows updating three key parameters:

-
-
-

### **Caution for All Downstream Objects**

### **Remove Caution All the Downstream Objects**

### **Copy Metadata using Lineage**

**Advanced Job**
================

**Refresh Has-Lineage:** This job refreshes the Has Lineage column on the Data Catalog list page. The column indicates whether an object has lineage.
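As a rough illustration of what a has-lineage flag represents, the sketch below marks an object as having lineage when it appears in at least one lineage link. The `CATALOG` and `EDGES` data and the `refresh_has_lineage` function are hypothetical; how OvalEdge actually computes the column is not described here.

```python
# Hypothetical catalog objects and lineage edges; not real OvalEdge data.
CATALOG = ["raw.orders", "staging.orders", "mart.sales", "archive.notes"]
EDGES = [("raw.orders", "staging.orders"), ("staging.orders", "mart.sales")]


def refresh_has_lineage(catalog, edges):
    """Map each catalog object to True if it appears in at least one lineage edge."""
    linked = {obj for edge in edges for obj in edge}
    return {obj: obj in linked for obj in catalog}
```

In this sample, `archive.notes` is the only object flagged `False`, because no edge references it.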