Professional Data Engineer Sample Questions

Questions and Answers

You are working on optimizing BigQuery for a query that is run repeatedly on a single table. The data queried is about 1 GB, and some rows are expected to change about 10 times every hour. You have optimized the SQL statements as much as possible. You want to further optimize the query's performance. What should you do?

  • Create a materialized view based on the table, and query that view. (correct)
  • Enable caching of the queried data so that subsequent queries are faster.
  • Create a scheduled query, and run it a few minutes before the report has to be created.
  • Reserve a larger number of slots in advance so that you have maximum compute power to execute the query.

Several years ago, you built a machine learning model for an ecommerce company. Your model made good predictions. Then a global pandemic occurred, lockdowns were imposed, and many people started working from home. Now the quality of your model has degraded. You want to improve the quality of your model and prevent future performance degradation. What should you do?

  • Retrain the model with data from the first 30 days of the lockdown.
  • Monitor data until usage patterns normalize, and then retrain the model.
  • Retrain the model with data from the last 30 days. After one year, return to the older model.
  • Retrain the model with data from the last 30 days. Add a step to continuously monitor model input data for changes, and retrain the model. (correct)

A new member of your development team works remotely. The developer will write code locally on their laptop, which will connect to a MySQL instance on Cloud SQL. The instance has an external (public) IP address. You want to follow Google-recommended practices when you give access to Cloud SQL to the new team member. What should you do?

  • Ask the developer for their laptop's IP address, and add it to the authorized networks list.
  • Remove the external IP address, and replace it with an internal IP address. Add only the IP address for the remote developer's laptop to the authorized list.
  • Give instance access permissions in Identity and Access Management (IAM), and have the developer run Cloud SQL Auth proxy to connect to the MySQL instance. (correct)
  • Give instance access permissions in Identity and Access Management (IAM), change the access to "private service access" for security, and allow the developer to access Cloud SQL from their laptop.

Your Cloud Spanner database stores customer address information that is frequently accessed by the marketing team. When a customer enters the country and the state where they live, this information is stored in different tables connected by a foreign key. The current architecture has performance issues. You want to follow Google-recommended practices to improve performance. What should you do?

  • Create interleaved tables, and store states under the countries. (correct)

Your company runs its business-critical system on PostgreSQL. The system is accessed simultaneously from many locations around the world and supports millions of customers. Your database administration team manages the redundancy and scaling manually. You want to migrate the database to Google Cloud. You need a solution that will provide global scale and availability and require minimal maintenance. What should you do?

  • Migrate to Cloud Spanner. (correct)

Your company collects data about customers to regularly check their health vitals. You have millions of customers around the world. Data is ingested at an average rate of two events per 10 seconds per user. You need to be able to visualize data in Bigtable on a per user basis. You need to construct the Bigtable key so that the operations are performant. What should you do?

  • Construct the key as user-id#device-id#activity-id#timestamp. (correct)

Your company is hiring several business analysts who are new to BigQuery. The analysts will use BigQuery to analyze large quantities of data. You need to control costs in BigQuery and ensure that there is no budget overrun while you maintain the quality of query results. What should you do?

  • Set a customized project-level or user-level daily quota to acceptable values. (correct)

Your Bigtable database was recently deployed into production. The scale of data ingested and analyzed has increased significantly, but the performance has degraded. You want to identify the performance issue. What should you do?

  • Use Key Visualizer to analyze performance. (correct)

Your company is moving your data analytics to BigQuery. Your other operations will remain on-premises. You need to transfer 800 TB of historic data. You also need to plan for 30 Gbps of daily data transfers that must be appended for analysis the next day. You want to follow Google-recommended practices to transfer your data. What should you do?

  • Use a Transfer Appliance to move the existing data to Google Cloud. Set up a Dedicated or Partner Interconnect for daily transfers. (correct)

Your team runs Dataproc workloads where the worker node takes about 45 minutes to process. You have been exploring various options to optimize the system for cost, including shutting down worker nodes aggressively. However, in your metrics you see that the entire job takes even longer. You want to optimize the system for cost without increasing job completion time. What should you do?

  • Set a graceful decommissioning timeout greater than 45 minutes. (correct)

Study Notes

Optimizing BigQuery Performance

• For a query that runs repeatedly on a single 1 GB table with roughly 10 row changes per hour, create a materialized view based on the table and query the view; BigQuery keeps the view incrementally up to date, so repeated queries avoid reprocessing the base table (see the sketch below).
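
A minimal sketch of the DDL, submitted through the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Define a materialized view over the frequently queried base table.
# BigQuery refreshes it incrementally as the ~10 row changes per hour land,
# so repeated queries read the precomputed results instead of the base table.
ddl = """
CREATE MATERIALIZED VIEW `my-project.reports.orders_summary_mv` AS
SELECT customer_id, SUM(order_total) AS total_spend
FROM `my-project.reports.orders`
GROUP BY customer_id
"""

client.query(ddl).result()  # run the DDL statement and wait for completion
```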

Improving Machine Learning Model Quality

• When usage patterns shift (for example, because of a pandemic and lockdowns), retrain the model on data from the last 30 days, and add a step that continuously monitors the model's input data for changes so the model is retrained again whenever drift appears (a simple drift check is sketched below).
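
In practice the monitoring step would usually be a managed service such as Vertex AI Model Monitoring; as a plain illustration, here is a minimal drift check in Python, assuming pandas DataFrames `train_df` and `recent_df` that share the same numeric feature columns (hypothetical names).

```python
import pandas as pd

def drifted_features(train_df: pd.DataFrame, recent_df: pd.DataFrame,
                     threshold: float = 3.0) -> list[str]:
    """Return columns whose recent mean deviates from the training mean
    by more than `threshold` training standard deviations."""
    flagged = []
    for col in train_df.select_dtypes(include="number").columns:
        mu, sigma = train_df[col].mean(), train_df[col].std()
        if sigma > 0 and abs(recent_df[col].mean() - mu) > threshold * sigma:
            flagged.append(col)
    return flagged

# If any feature has drifted, trigger retraining on the last 30 days of data.
```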

Securely Granting Access to Cloud SQL

• To follow Google-recommended practices, grant the developer instance access permissions in IAM and have them run the Cloud SQL Auth proxy; the proxy authenticates with IAM and encrypts the connection, so the instance's public IP does not need to be opened to arbitrary networks (see the connection sketch below).
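
A minimal connection sketch, assuming the Cloud SQL Auth proxy is already running locally and listening on 127.0.0.1:3306, and using the PyMySQL driver; the instance connection name, user, and database names are hypothetical.

```python
# Proxy started separately by the developer, e.g.:
#   cloud-sql-proxy my-project:us-central1:my-instance
# The proxy uses the developer's IAM credentials and forwards encrypted
# traffic, so the application connects to localhost instead of a public IP.
import pymysql

conn = pymysql.connect(
    host="127.0.0.1",   # the local proxy endpoint, not the instance's IP
    port=3306,
    user="dev_user",
    password="dev_password",
    database="app_db",
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
finally:
    conn.close()
```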

Optimizing Cloud Spanner Performance

• When countries and states are stored in separate tables connected by a foreign key, create interleaved tables and store the state rows under their parent country rows; interleaving co-locates child rows with the parent and removes the cost of the join (DDL sketch below).
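
A minimal DDL sketch for interleaving states under countries, submitted through the google-cloud-spanner Python client; the instance, database, table, and column names are hypothetical.

```python
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("customers-db")

ddl = [
    """CREATE TABLE Countries (
         CountryId STRING(36) NOT NULL,
         Name      STRING(MAX)
       ) PRIMARY KEY (CountryId)""",
    # Interleaving stores each state row physically with its parent country
    # row, so reading a country together with its states needs no join.
    """CREATE TABLE States (
         CountryId STRING(36) NOT NULL,
         StateId   STRING(36) NOT NULL,
         Name      STRING(MAX)
       ) PRIMARY KEY (CountryId, StateId),
       INTERLEAVE IN PARENT Countries ON DELETE CASCADE""",
]

database.update_ddl(ddl).result()  # wait for the schema change to finish
```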

Migrating PostgreSQL to Google Cloud

• For a business-critical PostgreSQL system that serves millions of customers worldwide and must require minimal maintenance, migrate to Cloud Spanner, which provides global scale, high availability, and managed redundancy and scaling.

Constructing Performant Bigtable Keys

• To visualize health-vitals events on a per-user basis, construct the row key as user-id#device-id#activity-id#timestamp; leading with the user ID keeps each user's rows contiguous for efficient scans, while the remaining fields keep keys unique and spread writes across users (see the sketch below).
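
A minimal sketch of the key layout; the field names are hypothetical.

```python
def make_row_key(user_id: str, device_id: str,
                 activity_id: str, timestamp_ms: int) -> bytes:
    """Build a Bigtable row key of the form user-id#device-id#activity-id#timestamp.

    Leading with the user ID keeps all of a user's rows contiguous, so a
    per-user dashboard is a single prefix scan, while the trailing fields
    keep keys unique and distribute load across the many users.
    """
    return f"{user_id}#{device_id}#{activity_id}#{timestamp_ms}".encode("utf-8")

# Per-user reads then become a prefix scan over keys starting with b"user123#".
```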

Controlling BigQuery Costs

• To prevent budget overruns by analysts who are new to BigQuery, set a customized project-level or user-level daily quota to acceptable values so that bytes processed per day are capped without degrading the quality of query results (a complementary per-query safeguard is sketched below).
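
Custom daily quotas are configured in the Google Cloud console rather than in code; a complementary per-query safeguard is `maximum_bytes_billed` on the query job, sketched here with hypothetical names and an illustrative limit.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Fail any query that would bill more than ~10 GB before it runs, keeping an
# individual analyst's queries well inside the project's daily quota.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)

query = """
SELECT order_id, order_total
FROM `my-project.reports.orders`
WHERE order_total > 100
"""
rows = client.query(query, job_config=job_config).result()
```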

Identifying Bigtable Performance Issues

• When Bigtable performance degrades as the scale of ingestion and analysis grows, use Key Visualizer to analyze access patterns across the key space and identify hotspots or poorly balanced row keys.

Transferring Data to BigQuery

• To move 800 TB of historic data from on-premises, use a Transfer Appliance; for the 30 Gbps of daily transfers that must be available for analysis the next day, set up a Dedicated or Partner Interconnect connection.

Optimizing Dataproc Workloads

• To reduce cost without lengthening jobs whose tasks take about 45 minutes per worker, set a graceful decommissioning timeout greater than 45 minutes so that workers finish in-flight work before they are removed (see the sketch below).
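
A minimal sketch using the google-cloud-dataproc client to scale workers down with a graceful decommissioning timeout longer than the ~45-minute task time; the project, region, and cluster names are hypothetical, and the equivalent gcloud flag is --graceful-decommission-timeout.

```python
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Scale the cluster down, but give running tasks up to 60 minutes (> 45 min)
# on each worker before it is decommissioned, so no work is lost and the
# overall job does not get longer.
operation = client.update_cluster(
    request={
        "project_id": "my-project",
        "region": region,
        "cluster_name": "etl-cluster",
        "cluster": {"config": {"worker_config": {"num_instances": 2}}},
        "update_mask": {"paths": ["config.worker_config.num_instances"]},
        "graceful_decommission_timeout": {"seconds": 3600},
    }
)
operation.result()  # wait for the resize to complete
```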
