Questions and Answers
You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples.
Which two characteristics support this method? (Choose two.)
A. There are very few occurrences of mutations relative to normal samples.
B. There are roughly equal occurrences of both normal and mutated samples in the database.
C. You expect future mutations to have different features from the mutated samples in the database.
D. You expect future mutations to have similar features to the mutated samples in the database.
E. You already have labels for which samples are mutated and which are normal in the database.
Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?
Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)
A. Supervised learning to determine which transactions are most likely to be fraudulent.
B. Unsupervised learning to determine which transactions are most likely to be fraudulent.
C. Clustering to divide the transactions into N categories based on feature similarity.
D. Supervised learning to predict the location of a transaction.
E. Reinforcement learning to predict the location of a transaction.
F. Unsupervised learning to predict the location of a transaction.
You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change the column's data type to TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?
You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. You initially designed the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?
Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?
You are building a model to predict whether or not it will rain on a given day. You have thousands of input features and want to see if you can improve training speed by removing some features while having a minimum effect on model accuracy. What can you do?
Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits the training data well. However, when tested against new data, it performs poorly. What method can you employ to address this?
Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery.
Which three approaches can you take? (Choose three.)
A. Disable writes to certain tables.
B. Restrict access to tables by role.
C. Ensure that the data is encrypted at all times.
D. Restrict BigQuery API access to approved users.
E. Segregate data across multiple tables or databases.
F. Use Google Stackdriver Audit Logging to determine policy violations.
You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?
You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices.
What should you do?
You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day as inexpensively as possible. What should you do?
Your company has recently grown rapidly and is now ingesting data at a significantly higher rate than before. You manage the daily batch MapReduce analytics jobs in Apache Hadoop. However, the recent increase in data has meant the batch jobs are falling behind. You have been asked to recommend ways the development team could increase the responsiveness of the analytics without increasing costs. What should you recommend?
You are developing an application that uses a recommendation engine on Google Cloud. Your solution should display new videos to customers based on past views. Your solution needs to generate labels for the entities in videos that the customer has viewed. Your design must be able to provide very fast filtering suggestions based on data from other customer preferences on several TB of data. What should you do?
Study Notes
Unsupervised Anomaly Detection
- Evaluating an unsupervised anomaly detection method for classifying tissue samples as either normal or mutated
- Two characteristics that support this method are:
- There are very few occurrences of mutations relative to normal samples
- You expect future mutations to have different features from the mutated samples in the database (anomaly detection flags anything unlike the normal samples, so it does not depend on past mutation features)
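To make the intuition concrete, here is a minimal sketch of unsupervised anomaly detection using scikit-learn's IsolationForest; the feature arrays and contamination rate are hypothetical stand-ins for the tissue-sample database:

```python
# Minimal unsupervised anomaly detection sketch (scikit-learn IsolationForest).
# The synthetic features and contamination rate are hypothetical stand-ins
# for the tissue-sample database; no labels are used for training.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal_samples = rng.normal(loc=0.0, scale=1.0, size=(990, 8))  # abundant normals
mutated_samples = rng.normal(loc=4.0, scale=1.0, size=(10, 8))  # rare and different
X = np.vstack([normal_samples, mutated_samples])

# Fit on unlabeled data; the model learns what "normal" looks like and
# flags points that deviate from it, so rare mutations surface as outliers.
model = IsolationForest(contamination=0.01, random_state=42).fit(X)
predictions = model.predict(X)  # +1 = normal, -1 = anomaly

print("flagged as anomalous:", int((predictions == -1).sum()))
```

Because training uses no labels, the model can flag future mutations even when their features differ from the mutated samples already in the database.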
Database Migration
- Migrating on-premises Apache Hadoop servers to Google Cloud Dataproc
- 50 TB of Google Persistent Disk per node required for a like-for-like migration
- Need to minimize storage cost of the migration
Machine Learning Applications
- Three machine learning applications for a database of bank transactions are:
- Supervised learning to determine which transactions are most likely to be fraudulent
- Clustering to divide the transactions into categories based on feature similarity
- Unsupervised learning to identify anomalous transactions
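As an illustration of the clustering application, a minimal sketch using scikit-learn's KMeans; the transaction features below are hypothetical:

```python
# Minimal clustering sketch (scikit-learn KMeans): divide transactions into
# N categories by feature similarity. The feature values are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical transaction features: [amount, encoded_location, encoded_type]
transactions = np.array([
    [12.50, 1, 0],
    [980.00, 3, 2],
    [15.75, 1, 0],
    [1020.00, 3, 2],
    [14.20, 2, 0],
])

# Scale features so amount does not dominate the distance metric,
# then group the rows into N=2 clusters.
X = StandardScaler().fit_transform(transactions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster ID per transaction
```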
Data Schema and Query Optimization
- Loading data from CSV files into a Google BigQuery table
- Changing data type of a column from STRING to TIMESTAMP
- Need to minimize migration effort and prevent computationally expensive queries
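One low-effort approach consistent with this note is to materialize a corrected table with a single query rather than reloading the CSVs; a sketch using the google-cloud-bigquery client, where the project, dataset, and destination table names are assumptions:

```python
# Sketch: convert the STRING epoch column DT to TIMESTAMP by writing a new
# table from one query, so future queries avoid per-row casts.
# "myproject", "mydataset", and CLICK_STREAM_V2 are assumed names.
from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT
  * REPLACE (TIMESTAMP_SECONDS(CAST(DT AS INT64)) AS DT)
FROM `mydataset.CLICK_STREAM`
"""
job_config = bigquery.QueryJobConfig(
    destination="myproject.mydataset.CLICK_STREAM_V2",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(query, job_config=job_config).result()  # wait for completion
```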
Real-time Data Ingestion and Analysis
- Storing and analyzing social media postings in Google BigQuery at a rate of 10,000 messages per minute
- Using streaming inserts and data aggregations
- Need to ensure strong consistency in reports and avoid missing in-flight data
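One common adjustment is to buffer postings and replace streaming inserts with frequent batch load jobs, since load jobs commit atomically and the loaded rows are immediately queryable; a sketch assuming messages are staged as newline-delimited JSON in a Cloud Storage bucket (bucket and table names are hypothetical):

```python
# Sketch: replace streaming inserts with small, frequent batch load jobs.
# Load jobs commit atomically, so follow-up aggregation queries do not race
# the streaming buffer. Bucket and table names are assumptions.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://my-staging-bucket/postings/batch-*.json",
    "myproject.mydataset.postings",
    job_config=job_config,
)
load_job.result()  # once this returns, the rows are queryable and consistent
```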
Data Deduplication
- Receiving inventory data every 6 hours from a proprietary system
- Need to deduplicate the data efficiently so that re-transmitted records are not counted twice
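Because a re-transmission repeats the payload but carries a new timestamp, one efficient approach is to hash the payload fields (excluding the timestamp) into a stable key and keep only the first occurrence; a minimal sketch with hypothetical field names:

```python
# Sketch: deduplicate re-transmitted records by hashing the payload fields.
# The transmission timestamp is excluded because it changes on each retry.
# The field name "transmitted_at" is a hypothetical example.
import hashlib
import json

def payload_key(record: dict) -> str:
    """Stable hash of the payload, ignoring the transmission timestamp."""
    payload = {k: v for k, v in record.items() if k != "transmitted_at"}
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def deduplicate(records):
    seen, unique = set(), []
    for record in records:
        key = payload_key(record)
        if key not in seen:  # keep only the first transmission
            seen.add(key)
            unique.append(record)
    return unique
```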
Feature Selection and Model Training
- Building a model to predict whether it will rain on a given day
- Having thousands of input features and wanting to remove some features for faster training
- Need to minimize the effect on model accuracy
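One way to prune features with minimal accuracy impact is to drop one column from each highly correlated pair, since correlated features carry largely redundant signal; a minimal pandas sketch where the 0.95 threshold is an assumption:

```python
# Sketch: remove redundant input features by dropping one column from each
# highly correlated pair; correlated features carry overlapping signal, so
# removal speeds training with little accuracy cost. Threshold is assumed.
import numpy as np
import pandas as pd

def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```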
Overfitting and Model Performance
- Building a TensorFlow neural-network model with many neurons and layers
- Model fits well for training data but performs poorly on new data
- Need to address overfitting
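A standard remedy for overfitting is regularization such as dropout; a minimal tf.keras sketch in which the layer sizes and dropout rate are illustrative:

```python
# Sketch: combat overfitting in a large network with dropout, which randomly
# zeroes activations during training so the model cannot simply memorize the
# training set. Layer sizes and the 0.5 rate are illustrative choices.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # active only during training
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```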
Access Control
- Ensuring individual users have access only to the minimum amount of information required
- Three approaches to enforce this requirement in Google BigQuery are:
- Restrict access to tables by role
- Restrict BigQuery API access to approved users
- Use Google Stackdriver Audit Logging to determine policy violations
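Role-based restriction can be applied at the dataset level; a sketch using the google-cloud-bigquery client, where the dataset ID and user email are hypothetical:

```python
# Sketch: grant a user read-only access to a single dataset so they see only
# the tables their job requires. Dataset ID and email are assumptions.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("myproject.hr_reports")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="analyst@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # persist the new policy
```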
Maintaining User Privacy
- Working on a sensitive project involving private user data
- Need to maintain users' privacy when collaborating with an external consultant
Data Storage and Querying
- Designing storage for large text files on Google Cloud
- Need to support ANSI SQL queries and compression and parallel load
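One design that satisfies all three requirements is to convert the text files to a compressed, splittable format such as Avro and load them into BigQuery, which serves ANSI SQL; a sketch where the bucket and table names are assumptions:

```python
# Sketch: load compressed Avro files from Cloud Storage into BigQuery in
# parallel using a wildcard URI; BigQuery then serves ANSI SQL over the
# data. Bucket and table names are assumptions.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)

load_job = client.load_table_from_uri(
    "gs://my-pipeline-bucket/text-data/*.avro",  # wildcard enables parallel load
    "myproject.mydataset.big_text",
    job_config=job_config,
)
load_job.result()
```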
Processing Log Files
- Batching application log files together into a single log file once a day
- Need to process the log file once per day as inexpensively as possible
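One inexpensive pattern is to run the Dataflow job as a batch template triggered by a scheduler shortly after 2:00 AM; a sketch that launches a template through the Dataflow REST API, where the project, bucket, and template paths are assumptions:

```python
# Sketch: launch a templated Dataflow batch job from a scheduled trigger
# (e.g. Cloud Scheduler invoking this code) shortly after the 2:00 AM batch
# file lands. Project, region, bucket, and template paths are assumptions.
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")
request = dataflow.projects().locations().templates().launch(
    projectId="myproject",
    location="us-central1",
    gcsPath="gs://my-bucket/templates/process-daily-log",
    body={
        "jobName": "process-daily-log",
        "parameters": {"inputFile": "gs://my-bucket/logs/daily.log"},
    },
)
response = request.execute()
```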
Increasing Analytics Responsiveness
- Managing daily batch MapReduce analytics jobs in Apache Hadoop
- Need to increase the responsiveness of analytics without increasing costs
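One commonly recommended option is rewriting the batch jobs in Apache Spark, which can run on the same cluster but keeps intermediate data in memory instead of spilling to disk between MapReduce stages; a minimal PySpark sketch with hypothetical input paths and column names:

```python
# Sketch: a daily aggregation expressed in Spark rather than MapReduce.
# Spark pipelines stages in memory, so the same hardware finishes sooner.
# Input path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-analytics").getOrCreate()

events = spark.read.json("hdfs:///data/events/2024-01-01/")
daily_counts = (
    events.groupBy("event_type")
          .agg(F.count("*").alias("events"), F.sum("bytes").alias("total_bytes"))
)
daily_counts.write.mode("overwrite").parquet("hdfs:///reports/daily/")
```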
Recommendation Engine
- Developing an application that uses a recommendation engine on Google Cloud
- Need to generate labels for entities in videos that the customer has viewed
- Need to provide fast filtering suggestions based on data from other customer preferences
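For the label-generation half of this design, a sketch that annotates a viewed video with the Cloud Video Intelligence API; the input URI is an assumption, and the resulting labels could then be written to a low-latency store such as Cloud Bigtable for fast filtering:

```python
# Sketch: generate entity labels for a viewed video with the Cloud Video
# Intelligence API. The input URI is an assumption; the labels would then
# be stored for low-latency filtering against other customers' preferences.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://my-videos/watched/clip-001.mp4",
    }
)
result = operation.result(timeout=300)

for annotation in result.annotation_results[0].segment_label_annotations:
    print(annotation.entity.description)  # e.g. "dog", "beach"
```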
Description
This quiz covers Google Cloud data engineering and machine learning scenarios, including unsupervised anomaly detection for tissue-sample classification, machine learning on bank-transaction data, Hadoop migration to Cloud Dataproc, BigQuery schema changes and access control, streaming ingestion and deduplication, and model tuning.