Questions and Answers
You are working on optimizing BigQuery for a query that is run repeatedly on a single table. The data queried is about 1 GB, and some rows are expected to change about 10 times every hour. You have optimized the SQL statements as much as possible. You want to further optimize the query's performance. What should you do?
Several years ago, you built a machine learning model for an ecommerce company. Your model made good predictions. Then a global pandemic occurred, lockdowns were imposed, and many people started working from home. Now the quality of your model has degraded. You want to improve the quality of your model and prevent future performance degradation. What should you do?
A new member of your development team works remotely. The developer will write code locally on their laptop, which will connect to a MySQL instance on Cloud SQL. The instance has an external (public) IP address. You want to follow Google-recommended practices when you give access to Cloud SQL to the new team member. What should you do?
Your Cloud Spanner database stores customer address information that is frequently accessed by the marketing team. When a customer enters the country and the state where they live, this information is stored in different tables connected by a foreign key. The current architecture has performance issues. You want to follow Google-recommended practices to improve performance. What should you do?
Your company runs its business-critical system on PostgreSQL. The system is accessed simultaneously from many locations around the world and supports millions of customers. Your database administration team manages the redundancy and scaling manually. You want to migrate the database to Google Cloud. You need a solution that will provide global scale and availability and require minimal maintenance. What should you do?
Your company collects data about customers to regularly check their health vitals. You have millions of customers around the world. Data is ingested at an average rate of two events per 10 seconds per user. You need to be able to visualize data in Bigtable on a per user basis. You need to construct the Bigtable key so that the operations are performant. What should you do?
Your company is hiring several business analysts who are new to BigQuery. The analysts will use BigQuery to analyze large quantities of data. You need to control costs in BigQuery and ensure that there is no budget overrun while you maintain the quality of query results. What should you do?
Your Bigtable database was recently deployed into production. The scale of data ingested and analyzed has increased significantly, but the performance has degraded. You want to identify the performance issue. What should you do?
Your company is moving your data analytics to BigQuery. Your other operations will remain on-premises. You need to transfer 800 TB of historic data. You also need to plan for 30 Gbps of daily data transfers that must be appended for analysis the next day. You want to follow Google-recommended practices to transfer your data. What should you do?
Your team runs Dataproc workloads where the worker node takes about 45 minutes to process. You have been exploring various options to optimize the system for cost, including shutting down worker nodes aggressively. However, in your metrics you see that the entire job takes even longer. You want to optimize the system for cost without increasing job completion time. What should you do?
Study Notes
Optimizing BigQuery Performance
- To optimize a query that is run repeatedly against a single 1 GB table whose rows change about 10 times per hour, create a materialized view over the table and query the view instead. BigQuery maintains materialized views incrementally, so repeated queries read precomputed results rather than rescanning the base table; query caching alone does not help here, because the cache is invalidated every time the table changes. A sketch follows.
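A minimal Python sketch of the idea, assuming hypothetical project, dataset, and table names (`my-project.sales.orders`); the aggregation itself is illustrative, not taken from the question:

```python
# Sketch: create a materialized view so repeated queries read incrementally
# maintained results instead of rescanning the base table.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project ID

ddl = """
CREATE MATERIALIZED VIEW `my-project.sales.daily_totals_mv` AS
SELECT order_date, SUM(amount) AS total_amount
FROM `my-project.sales.orders`
GROUP BY order_date
"""
# DDL runs as an ordinary query job; BigQuery then refreshes the view
# automatically as the base table changes.
client.query(ddl).result()

# Repeated reads now target the view instead of the 1 GB base table.
for row in client.query(
    "SELECT * FROM `my-project.sales.daily_totals_mv` ORDER BY order_date"
).result():
    print(row.order_date, row.total_amount)
```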
Improving Machine Learning Model Quality
- To restore a degraded model's quality and guard against future degradation, retrain the model on recent data that reflects post-pandemic behavior, and set up continuous evaluation so retraining is triggered automatically when data drift or performance decay is detected.
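One simple way to detect drift is to compare a feature's training-time distribution with its recent serving distribution. A sketch using a two-sample Kolmogorov-Smirnov test; the feature name, simulated data, and threshold are all illustrative assumptions, not a production policy:

```python
# Sketch: flag a feature as drifted when its recent distribution differs
# significantly from the training distribution.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, recent_values, p_threshold=0.01):
    """Return True if the recent distribution differs significantly."""
    statistic, p_value = ks_2samp(train_values, recent_values)
    return p_value < p_threshold

rng = np.random.default_rng(seed=42)
train_basket_size = rng.normal(loc=3.0, scale=1.0, size=10_000)
# Simulated post-lockdown shift: larger, less frequent orders.
recent_basket_size = rng.normal(loc=4.5, scale=1.5, size=10_000)

if feature_drifted(train_basket_size, recent_basket_size):
    print("Drift detected: trigger retraining on recent data.")
```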
Securely Granting Access to Cloud SQL
- To follow Google-recommended practices when granting Cloud SQL access to a new team member, grant the developer the Cloud SQL Client IAM role and have them connect through the Cloud SQL Auth Proxy instead of directly to the external IP address; the proxy provides IAM-based authorization and TLS encryption without managing authorized networks or client certificates.
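For Python code, the Cloud SQL Python Connector embeds the same IAM-authorized, TLS-encrypted tunnel as the Auth Proxy. A sketch, with the instance connection name, database, and credentials as placeholders:

```python
# Sketch: connect to Cloud SQL by instance connection name, not public IP.
import sqlalchemy
from google.cloud.sql.connector import Connector

connector = Connector()

def getconn():
    return connector.connect(
        "my-project:us-central1:dev-instance",  # assumed instance name
        "pymysql",
        user="dev-user",
        password="change-me",
        db="app_db",
    )

engine = sqlalchemy.create_engine("mysql+pymysql://", creator=getconn)
with engine.connect() as conn:
    print(conn.execute(sqlalchemy.text("SELECT NOW()")).scalar())
connector.close()
```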
Optimizing Cloud Spanner Performance
- To improve the performance of a Cloud Spanner database where country and state are stored in separate tables connected by a foreign key, convert the pair into a parent-child interleaved relationship. Interleaving physically co-locates each state row with its parent country row, so the frequent join becomes a local read instead of a cross-table lookup.
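A DDL sketch of the interleaved schema, issued through the Spanner Python client; instance, database, and table names are illustrative:

```python
# Sketch: interleave State rows under their parent Country row so the
# frequent country/state join becomes a co-located read.
from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("my-instance").database("customers")

operation = database.update_ddl([
    """CREATE TABLE Countries (
         CountryId   INT64 NOT NULL,
         CountryName STRING(MAX)
       ) PRIMARY KEY (CountryId)""",
    # The child key must start with the parent key for interleaving.
    """CREATE TABLE States (
         CountryId INT64 NOT NULL,
         StateId   INT64 NOT NULL,
         StateName STRING(MAX)
       ) PRIMARY KEY (CountryId, StateId),
       INTERLEAVE IN PARENT Countries ON DELETE CASCADE""",
])
operation.result()  # wait for the schema change to complete
```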
Migrating PostgreSQL to Google Cloud
- To migrate a business-critical PostgreSQL system serving millions of customers worldwide, use Cloud Spanner. Its PostgreSQL interface eases migration, and as a fully managed database with synchronous multi-region replication it provides global scale and availability without the manual redundancy and scaling work the team does today. (Cloud SQL is also managed PostgreSQL, but it is a regional service and does not scale globally.)
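A PostgreSQL-dialect Spanner database can be reached with an ordinary PostgreSQL driver through PGAdapter, a local sidecar that speaks the Postgres wire protocol. A sketch assuming PGAdapter is already running on localhost:5432 and the database and table names are placeholders:

```python
# Sketch: existing psycopg2 code keeps working against Spanner's
# PostgreSQL interface via a local PGAdapter process.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432, database="orders-db"  # assumed setup
)
with conn.cursor() as cur:
    cur.execute("SELECT customer_id, total FROM orders LIMIT 5")
    for customer_id, total in cur.fetchall():
        print(customer_id, total)
conn.close()
```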
Constructing Performant Bigtable Keys
- To construct a performant Bigtable key for per-user visualization, use a composite row key of the user ID followed by a reverse timestamp (for example, user_id#(MAX_TIMESTAMP - event_time)). The user-ID prefix keeps each user's events in one contiguous range for fast per-user scans and spreads writes across users, while the reverse timestamp sorts the newest events first.
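A sketch of the key scheme with the Bigtable Python client; the instance, table, and column-family names are placeholders:

```python
# Sketch: user-ID prefix plus reversed timestamp, so each user's rows sort
# newest-first in one contiguous key range.
import datetime
from google.cloud import bigtable

MAX_MICROS = 2**63 - 1  # sentinel used to reverse the timestamp

def make_row_key(user_id: str, event_time: datetime.datetime) -> bytes:
    micros = int(event_time.timestamp() * 1_000_000)
    # Zero-pad so lexicographic order matches numeric order.
    return f"{user_id}#{MAX_MICROS - micros:020d}".encode()

client = bigtable.Client(project="my-project")
table = client.instance("vitals-instance").table("vitals")

now = datetime.datetime.now(datetime.timezone.utc)
row = table.direct_row(make_row_key("user-123", now))
row.set_cell("vitals", b"heart_rate", b"72")
row.commit()

# Per-user reads scan a single contiguous prefix, newest events first.
rows = table.read_rows(
    start_key=b"user-123#", end_key=b"user-123$"  # '$' is the byte after '#'
)
```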
Controlling BigQuery Costs
- To control BigQuery costs for analysts who are new to the tool, set custom query quotas (per-user and per-project limits on bytes scanned per day) so the budget cannot be overrun, and cap individual queries with a maximum-bytes-billed limit. Budget alerts only notify after the fact; quotas and byte caps actually stop the spend while leaving query results unchanged.
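A sketch of the per-query cap: if a query would scan more than the limit, BigQuery rejects the job before it runs (and before it bills). Project and table names are illustrative; the project-wide per-user quotas are configured in the console or with gcloud, not per job:

```python
# Sketch: cap how much a single query may scan.
from google.api_core.exceptions import BadRequest
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.QueryJobConfig(
    maximum_bytes_billed=100 * 1024**2,  # 100 MB cap for this query
)

query = (
    "SELECT name, COUNT(*) AS n "
    "FROM `my-project.analytics.events` GROUP BY name"
)
try:
    for row in client.query(query, job_config=job_config).result():
        print(row.name, row.n)
except BadRequest as exc:  # raised when the byte cap would be exceeded
    print(f"Query blocked by cost cap: {exc}")
```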
Identifying Bigtable Performance Issues
- To identify the performance issue in a Bigtable database whose ingest and analysis load has grown significantly, use Key Visualizer in the Google Cloud console to inspect row-key access patterns over time and locate hotspots, and review cluster CPU and latency metrics in Cloud Monitoring.
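Key Visualizer itself is a console tool, but the underlying server-side metrics can be pulled programmatically to confirm a diagnosis, for example rising CPU on the hottest node. A sketch with the Cloud Monitoring API; the project name is assumed:

```python
# Sketch: read Bigtable's hottest-node CPU metric for the last hour.
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = time.time()
interval = monitoring_v3.TimeInterval({
    "end_time": {"seconds": int(now)},
    "start_time": {"seconds": int(now - 3600)},
})

results = client.list_time_series(request={
    "name": "projects/my-project",
    "filter": (
        'metric.type = '
        '"bigtable.googleapis.com/cluster/cpu_load_hottest_node"'
    ),
    "interval": interval,
    "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
})
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.double_value)
```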
Transferring Data to BigQuery
- To transfer 800 TB of historic data and 30 Gbps of daily transfers to BigQuery, ship the historic data with a Transfer Appliance (moving 800 TB over most network links would take months) and provision Dedicated Interconnect for the ongoing daily transfers, landing the files in Cloud Storage and appending them to BigQuery with scheduled batch load jobs so they are available for analysis the next day.
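A sketch of the daily append step, assuming the day's files have already arrived in a Cloud Storage bucket over the Interconnect link; bucket, table, and path layout are placeholders:

```python
# Sketch: append the day's files from Cloud Storage to the analysis table.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my-landing-bucket/daily/2024-01-15/*.parquet",  # assumed layout
    "my-project.analytics.events",
    job_config=job_config,
)
load_job.result()  # wait; data is queryable the next day as required
print(f"Loaded {load_job.output_rows} rows.")
```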
Optimizing Dataproc Workloads
- To optimize Dataproc workloads for cost without increasing job completion time, enable cluster autoscaling with a graceful decommissioning timeout of at least 45 minutes, matching the worker processing time. Nodes then finish their in-flight work before shutting down; aggressively shutting down workers mid-task forces that work to be redone elsewhere, which is why the overall job got longer.
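A sketch of such an autoscaling policy created with the Dataproc Python client; the region, project, policy name, and scaling bounds are illustrative assumptions:

```python
# Sketch: autoscaling policy whose graceful decommission timeout (45 min)
# matches worker processing time, so nodes drain work before removal.
from google.cloud import dataproc_v1
from google.protobuf import duration_pb2

region = "us-central1"
client = dataproc_v1.AutoscalingPolicyServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

policy = dataproc_v1.AutoscalingPolicy(
    id="drain-before-downscale",
    basic_algorithm=dataproc_v1.BasicAutoscalingAlgorithm(
        yarn_config=dataproc_v1.BasicYarnAutoscalingConfig(
            graceful_decommission_timeout=duration_pb2.Duration(
                seconds=45 * 60
            ),
            scale_up_factor=1.0,
            scale_down_factor=1.0,
        ),
        cooldown_period=duration_pb2.Duration(seconds=120),
    ),
    worker_config=dataproc_v1.InstanceGroupAutoscalingPolicyConfig(
        min_instances=2, max_instances=20
    ),
)

client.create_autoscaling_policy(
    parent=f"projects/my-project/regions/{region}", policy=policy
)
```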
Description
Professional Data Engineer Sample Questions