Automation Techniques
Summary
This document provides an overview of automation techniques within Google Cloud, exploring various services like Cloud Scheduler, Cloud Composer, Cloud Run functions, and Eventarc. It discusses their functionalities, use cases, and how they can be used for recurring tasks and event-driven workflows.
Full Transcript
06 Automation Techniques

In this module, you learn to:

1. Explain the automation patterns and options available for pipelines.
2. Use Cloud Scheduler to trigger a Dataform SQL workflow.
3. Use Cloud Composer to orchestrate pipelines.
4. Use Cloud Run functions to execute code in response to a Google Cloud event.
5. Explain the functionality and automation use cases for Eventarc.

In this module, first, you review the automation patterns and options available for pipelines. Second, you explore Cloud Scheduler and Workflows. Then, you review the functionality and use cases for Cloud Composer. Next, you review the capabilities of Cloud Run functions. Finally, you look at the functionality and automation use cases for Eventarc.

In this section, you explore automation patterns and options for pipelines.

ELT and ETL workloads can be automated to run on a recurring basis

- Scheduled ELT example: schedule -> BigQuery -> Dataform -> BigQuery
- Event-driven ETL batch example: file upload -> Cloud Storage -> Dataproc -> Cloud Storage

On Google Cloud, ELT and ETL workloads can be automated for recurring execution. In the scheduled ELT example, a defined schedule triggers data extraction from BigQuery, transformation via Dataform, and loading back into BigQuery. In the event-driven ETL example, a file upload to Cloud Storage initiates a batch process using Dataproc, culminating in data landing in Cloud Storage.

Google Cloud provides multiple services for automating and orchestrating your workloads

- One-off or scheduled execution: Cloud Scheduler, Cloud Composer
- Workflow orchestration: Cloud Composer
- Event-based execution: Cloud Run functions, Eventarc

Google Cloud offers a suite of services to automate and orchestrate your workloads. For scheduled tasks or one-off jobs, you can leverage Cloud Scheduler and Cloud Composer. If your workflows require orchestration, Cloud Composer is the ideal choice. To trigger actions based on events, consider using Cloud Run functions or Eventarc.

In this section, you explore Cloud Scheduler and Workflows.

Cloud Scheduler invokes your workloads at recurring intervals

Cloud Scheduler empowers you to automate tasks by invoking your workloads at specified, recurring intervals. It grants you the flexibility to define both the frequency (in unix-cron format) and the precise time of day for job execution, along with a retry configuration. Triggers can be based on HTTP/S calls, App Engine HTTP calls, Pub/Sub messages, or Workflows invoked via HTTP.

Example: Trigger a Dataform SQL workflow

In this example, a Cloud Scheduler schedule triggers Workflows, whose YAML configuration drives a Dataform execution. The configuration from the slide, with the API URLs and some values elided:

    - createCompilationResult:
        call: http.post
        args:
          url: ${"https://dataform.googleapis.com/[...]"}
          auth:
            type: OAuth2
          body:
            gitCommitish:  # value elided on the slide
        result: compilationResult
    - createWorkflowInvocation:
        call: http.post
        args:
          url: ${"https://dataform.googleapis.com/[...]"}
          auth:
            type: OAuth2
          body:
            compilationResult: ${compilationResult.body.name}
            invocationConfig:
              includedTags:  # tag values elided on the slide
        result: workflowInvocation

Cloud Scheduler can be used to trigger a Dataform SQL workflow. In the example code, a scheduled job in Cloud Scheduler initiates the process, defined in a YAML config file. The workflow involves two main steps: creating a compilation result from your Dataform code and then triggering a workflow invocation using that result, ensuring only specific parts of your Dataform project execute based on included tags.
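The slide shows the Workflows YAML but not the Cloud Scheduler job itself. The following is a minimal sketch of creating such a recurring job, assuming the google-cloud-scheduler Python client library; the project, region, workflow, and service account names are hypothetical:

    # Minimal sketch (not from the slides): create a Cloud Scheduler job that
    # starts a Workflows execution every day at 06:00 via an authenticated
    # HTTP call. All resource names below are hypothetical.
    from google.cloud import scheduler_v1

    project = "my-project"           # hypothetical project ID
    location = "us-central1"         # hypothetical region
    workflow = "dataform-workflow"   # hypothetical workflow name

    client = scheduler_v1.CloudSchedulerClient()
    parent = f"projects/{project}/locations/{location}"

    job = scheduler_v1.Job(
        name=f"{parent}/jobs/trigger-dataform-workflow",
        schedule="0 6 * * *",        # frequency in unix-cron format
        time_zone="Etc/UTC",
        http_target=scheduler_v1.HttpTarget(
            # Workflows executions API endpoint for the target workflow.
            uri=(
                "https://workflowexecutions.googleapis.com/v1/"
                f"{parent}/workflows/{workflow}/executions"
            ),
            http_method=scheduler_v1.HttpMethod.POST,
            # Authenticate as a service account allowed to invoke the workflow.
            oauth_token=scheduler_v1.OAuthToken(
                service_account_email=f"scheduler-sa@{project}.iam.gserviceaccount.com"
            ),
        ),
        retry_config=scheduler_v1.RetryConfig(retry_count=3),
    )

    created = client.create_job(parent=parent, job=job)
    print("Created job:", created.name)

The same pattern applies to any HTTP-triggered target: only the target URI and the authentication settings change.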
In this section, you explore Cloud Composer.

Cloud Composer orchestrates your pipelines on different systems into workflows

Cloud Composer acts as a central orchestrator, seamlessly integrating your pipelines across diverse systems, whether on Google Cloud, on-premises, or in multicloud environments. Cloud Composer leverages Apache Airflow, incorporating essential elements like operators, tasks, and dependencies to define and manage your workflows as a DAG, and it provides rich integrations with Google Cloud services and others. Additionally, Cloud Composer offers robust features for triggering, monitoring, and logging, ensuring comprehensive control over your pipeline executions.

Develop your DAG in Python using Apache Airflow operators and submit it to Cloud Composer for execution

Developing and executing workflows using Apache Airflow and Cloud Composer is easily done using Python. First, you leverage Apache Airflow operators, such as those for Cloud Storage, Dataflow, Dataproc, and BigQuery, to craft your directed acyclic graph, or DAG, in a dag.py file, defining the tasks and their dependencies. Next, the DAG is deployed to Cloud Composer, which handles the parsing and scheduling of your workflow. Cloud Composer further manages the execution of your tasks, incorporating features like error handling, retries, monitoring, and logging to ensure smooth operation.

Example: Run a data analytics DAG

The Composer workflow retrieves a file from Cloud Storage, loads it into BigQuery, executes a JOIN query in BigQuery, inserts the result set into a BigQuery table, and transforms data with Dataproc. The code from the slide, with the operator arguments elided:

    with models.DAG(
        "data_analytics_dag",
        # Define schedule and default args
    ) as dag:

        create_batch = DataprocCreateBatchOperator(
            # Specify Dataproc settings, e.g. which Python file to execute
        )

        load_external_dataset = GCSToBigQueryOperator(
            # Specify Cloud Storage source file to load
            # and BigQuery table destination
        )

        with TaskGroup("join_bq_datasets") as bq_join_group:
            # Define the SQL query in BigQuery to join
            # the loaded table with another one
            bq_join_holidays_weather_data = BigQueryInsertJobOperator(
                # Execute query and insert result into BigQuery table
            )

        # Define the dependencies of the workflow
        load_external_dataset >> bq_join_group >> create_batch

With minimal effort, Cloud Composer can be used to run a data analytics DAG. In the example code, the workflow retrieves a file from Cloud Storage, loads it into BigQuery, and then performs a JOIN operation with an existing BigQuery table. The joined results are then inserted into a new BigQuery table. Finally, Dataproc is used for further data transformation.
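The slide code elides the operator arguments. As a hedged sketch, a complete, runnable version of the DAG above might look like the following, assuming the apache-airflow-providers-google operators; the project, bucket, dataset, and table names are hypothetical:

    # Hedged sketch of the data analytics DAG with the elided arguments filled
    # in. All project, bucket, and table names are hypothetical; the operator
    # import paths follow the apache-airflow-providers-google package.
    import datetime

    from airflow import models
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.operators.dataproc import DataprocCreateBatchOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
    from airflow.utils.task_group import TaskGroup

    PROJECT_ID = "my-project"   # hypothetical
    BQ_DATASET = "analytics"    # hypothetical

    with models.DAG(
        "data_analytics_dag",
        schedule_interval="@daily",
        start_date=datetime.datetime(2024, 1, 1),
        catchup=False,
    ) as dag:

        # Load a CSV file from Cloud Storage into a BigQuery table.
        load_external_dataset = GCSToBigQueryOperator(
            task_id="load_external_dataset",
            bucket="my-bucket",                        # hypothetical source bucket
            source_objects=["holidays/holidays.csv"],  # hypothetical source file
            destination_project_dataset_table=f"{PROJECT_ID}.{BQ_DATASET}.holidays",
            write_disposition="WRITE_TRUNCATE",
        )

        with TaskGroup("join_bq_datasets") as bq_join_group:
            # Join the loaded table with another one and write the result set
            # into a new BigQuery table.
            bq_join_holidays_weather_data = BigQueryInsertJobOperator(
                task_id="bq_join_holidays_weather_data",
                configuration={
                    "query": {
                        "query": f"""
                            SELECT h.date, h.holiday, w.temperature
                            FROM `{PROJECT_ID}.{BQ_DATASET}.holidays` AS h
                            JOIN `{PROJECT_ID}.{BQ_DATASET}.weather` AS w
                            USING (date)
                        """,
                        "destinationTable": {
                            "projectId": PROJECT_ID,
                            "datasetId": BQ_DATASET,
                            "tableId": "holidays_weather_joined",
                        },
                        "writeDisposition": "WRITE_TRUNCATE",
                        "useLegacySql": False,
                    },
                },
            )

        # Run a serverless Dataproc batch for further transformation.
        create_batch = DataprocCreateBatchOperator(
            task_id="create_batch",
            project_id=PROJECT_ID,
            region="us-central1",
            batch_id="data-transform-batch",
            batch={
                "pyspark_batch": {
                    "main_python_file_uri": "gs://my-bucket/transform.py"  # hypothetical
                }
            },
        )

        # Define the dependencies of the workflow.
        load_external_dataset >> bq_join_group >> create_batch

Uploading this dag.py to the Composer environment's Cloud Storage bucket lets Composer parse the file and schedule daily runs.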
In this section, you explore Cloud Run functions.

Use Cloud Run functions to execute code based on Google Cloud events

Cloud Run functions allow you to execute code in response to various Google Cloud events. These events can originate from sources like HTTP requests, Pub/Sub messages, Cloud Storage changes, Firestore updates, or custom events through Eventarc. When triggered, a Cloud Run function receives the event and its metadata and can make API calls; it provides a serverless execution environment where your code runs, supporting multiple programming languages for flexibility.

Example: Trigger a Dataproc workflow template after a file upload to Cloud Storage

In this event-driven process, a new-file event in Cloud Storage triggers a Cloud Run function, which calls the Dataproc API; Dataproc executes the job and stores the output in Cloud Storage. The Node.js code from the slide, with the client library import added:

    // pre-work: define project ID, workflow template, region
    const dataproc = require('@google-cloud/dataproc').v1;  // import not shown on the slide

    // set up Dataproc API client in specific region
    const client = new dataproc.WorkflowTemplateServiceClient({
      apiEndpoint: `${region}-dataproc.googleapis.com`,
    });

    // retrieve bucket and name of new object on Cloud Storage
    const file = data;
    const inputBucketUri = `gs://${file.bucket}/${file.name}`;

    // construct request to Dataproc API
    const request = {
      name: client.projectRegionWorkflowTemplatePath(
        projectId, region, workflowTemplate),
      parameters: {"INPUT_BUCKET_URI": inputBucketUri},
    };

    // call API to launch the workflow
    client.instantiateWorkflowTemplate(request)
      .then(responses => {
        console.log("Launched Dataproc Workflow:", responses);
      })
      .catch(err => {
        console.error(err);
      });

Cloud Run functions make it easy to automate routine tasks on Google Cloud. In the example code, a Dataproc workflow template is triggered after a file is uploaded to Cloud Storage. A Cloud Run function captures the Cloud Storage new-file event and calls the Dataproc API. The Dataproc API then executes the specified workflow template, using the uploaded file as an input parameter. The final result of the workflow execution is stored in Cloud Storage.

In this section, you explore Eventarc.

Build a unified event-driven architecture for loosely coupled services with Eventarc

Eventarc enables the creation of a unified event-driven architecture for loosely coupled services. Eventarc connects various event sources, including Google Cloud direct events, Cloud Audit Logs, third-party systems, the Eventarc API, and custom events via Pub/Sub messages, to a range of event targets such as Cloud Run functions, GKE, internal HTTP endpoints, and Workflows. By delivering every event as a standardized CloudEvent message, Eventarc simplifies the integration of diverse systems and facilitates the development of responsive, scalable applications.
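Eventarc triggers are typically created from the console or the gcloud CLI, but they can also be created programmatically. As a minimal sketch, assuming the google-cloud-eventarc Python client library, creating a trigger that routes the BigQuery audit-log events used in the next example to a Cloud Run service could look like this; the project, service, and service account names are hypothetical:

    # Hedged sketch: create an Eventarc trigger that routes BigQuery
    # audit-log events to a Cloud Run service as CloudEvent messages.
    # Assumes the google-cloud-eventarc client library; all resource
    # names below are hypothetical.
    from google.cloud import eventarc_v1

    project = "my-project"      # hypothetical
    location = "us-central1"    # hypothetical

    client = eventarc_v1.EventarcClient()
    parent = f"projects/{project}/locations/{location}"

    trigger = eventarc_v1.Trigger(
        # Match Cloud Audit Logs entries written by BigQuery job inserts.
        event_filters=[
            eventarc_v1.EventFilter(
                attribute="type", value="google.cloud.audit.log.v1.written"),
            eventarc_v1.EventFilter(
                attribute="serviceName", value="bigquery.googleapis.com"),
            eventarc_v1.EventFilter(
                attribute="methodName",
                value="google.cloud.bigquery.v2.JobService.InsertJob"),
        ],
        # Route matching events to a Cloud Run service.
        destination=eventarc_v1.Destination(
            cloud_run=eventarc_v1.CloudRun(
                service="rebuild-dashboard",  # hypothetical Cloud Run service
                region=location,
            )
        ),
        service_account=f"eventarc-sa@{project}.iam.gserviceaccount.com",
    )

    # create_trigger is a long-running operation; wait for it to complete.
    operation = client.create_trigger(
        parent=parent, trigger=trigger, trigger_id="bq-insert-trigger")
    print("Created trigger:", operation.result().name)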
Example: Respond to INSERT events in BigQuery

An INSERT into a BigQuery table is an infrequent event that produces a Cloud Audit Logs entry, which serves as the Eventarc trigger. Eventarc can then route the event to Cloud Run services that rebuild a dashboard, retrain an ML model, or take other custom actions. The audit log event from the slide, with the project, dataset, and table names elided:

    {
      protoPayload: {
        @type: "type.googleapis.com/google.cloud.audit.AuditLog"
        metadata: {
          @type: "type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"
          tableDataChange: {
            insertedRowsCount: "2"
            jobName: "projects//jobs/bquxjob_4d3da71c_190de71a2f3"
            reason: "QUERY"
          }
        }
        methodName: "google.cloud.bigquery.v2.JobService.InsertJob"
        resourceName: "projects//datasets//tables/"
        serviceName: "bigquery.googleapis.com"
      }
    }

Eventarc lets you monitor audit logs and other events that occur less frequently on Google Cloud. In the example, Eventarc is used to trigger actions in response to data insertion events in BigQuery. When an insert operation occurs in a BigQuery table, it generates a Cloud Audit Logs event. Eventarc can capture this event and initiate various actions, such as rebuilding a dashboard, retraining an ML model, or executing any other custom action, based on the specific requirements.

Compare automation options

    Option                 Trigger type        Serverless   Coding effort   Programming languages
    Cloud Scheduler        schedule, manual    yes          low             YAML (with Workflows)
    Cloud Composer         schedule, manual    no           medium          Python
    Cloud Run functions    event               yes          high            Python, Java, Go, Node.js, Ruby, PHP, .NET Core
    Eventarc               event               yes          high            any

In summary, there are various Google Cloud data-related automation options. Cloud Scheduler and Cloud Composer are suitable for scheduled or manual triggers, while Cloud Run functions and Eventarc are event-driven. Cloud Scheduler offers low coding effort with YAML, and Cloud Composer requires medium effort with Python. Cloud Run functions support multiple languages, while Eventarc is language-agnostic. As a final note, all options except Cloud Composer are serverless.

Lab: Use Cloud Run Functions to Load BigQuery (45 min)

Learning objectives:

- Create a Cloud Run function.
- Deploy and test the Cloud Run function.
- View data in BigQuery and review Cloud Run function logs.

In this lab, you create a Cloud Run function to load BigQuery. You create a Cloud Run function using the Cloud SDK. You then deploy and test the Cloud Run function. Finally, you view data in BigQuery and review Cloud Run function logs.
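To connect the lab to the patterns above, here is a minimal sketch, not the lab's actual solution, of a Cloud Run function that loads a newly uploaded Cloud Storage file into BigQuery; it assumes the functions-framework and google-cloud-bigquery libraries, and the destination table name is hypothetical:

    # Hedged sketch of a Cloud Run function that loads a newly uploaded
    # Cloud Storage file into BigQuery. Not the lab's exact solution; the
    # destination table name is hypothetical.
    import functions_framework
    from google.cloud import bigquery

    TABLE_ID = "my-project.my_dataset.loaded_data"  # hypothetical destination table

    @functions_framework.cloud_event
    def load_to_bigquery(cloud_event):
        # The CloudEvent payload for a Cloud Storage finalize event carries
        # the bucket and object name of the uploaded file.
        data = cloud_event.data
        uri = f"gs://{data['bucket']}/{data['name']}"

        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,  # infer the schema from the file
        )

        # Start the load job and block until it finishes so that any
        # errors surface in the function logs.
        load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
        load_job.result()
        print(f"Loaded {uri} into {TABLE_ID}")

Deploying such a function with a Cloud Storage trigger and then checking the destination table and the function logs mirrors the lab's three learning objectives.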