Questions and Answers
What is the benefit of using subqueries in Hive versus writing multiple separate queries to solve a problem?
- Each MapReduce job becomes smaller.
- The jobs can always be converted to map-only tasks, which execute faster.
- There is a potential for fewer MapReduce jobs to be executed. (correct)
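As a rough illustration of the correct option, the sketch below compares two separate statements with a single statement that uses a subquery. The table `cities` and its columns `city_name`, `country_name` and `population` are hypothetical names chosen for the example, and the actual number of jobs depends on the Hive version and its optimizer settings.

```sql
-- Approach 1: two separate statements.
-- Each statement is planned and run on its own, and the
-- intermediate result is written out as a table in between.
CREATE TABLE country_totals AS
SELECT country_name, SUM(population) AS total_population
FROM cities
GROUP BY country_name;

SELECT country_name, total_population
FROM country_totals
WHERE total_population > 100000000;

-- Approach 2: one statement with a subquery in the FROM clause.
-- Hive plans the whole statement at once, so the filter can be
-- applied in the same job that computes the aggregate.
-- (Hive requires an alias, here "t", for a subquery in FROM.)
SELECT t.country_name, t.total_population
FROM (
  SELECT country_name, SUM(population) AS total_population
  FROM cities
  GROUP BY country_name
) t
WHERE t.total_population > 100000000;
```

Prefixing either approach with EXPLAIN shows the stages Hive plans, which is one way to check whether the subquery form really saves a job on a given installation.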
Which of the following statements is true of Hive?
- SORT BY produces globally sorted output even when two reducers are used.
- ORDER BY only sorts the data within each reducer.
- ORDER BY is always faster than SORT BY.
- SORT BY only sorts the data within each reducer. (correct)
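The difference between the two clauses can be seen with a small sketch. It assumes a hypothetical table `cities` with columns `city_name` and `population`, and it forces two reducers through the `mapred.reduce.tasks` property (named `mapreduce.job.reduces` in newer Hadoop releases).

```sql
-- Ask for two reducers so the difference becomes visible.
SET mapred.reduce.tasks = 2;

-- SORT BY: each reducer sorts only the rows it receives.
-- With two reducers the result is two sorted runs,
-- not one globally ordered list.
SELECT city_name, population
FROM cities
SORT BY population DESC;

-- ORDER BY: guarantees a total order over all rows.
-- Hive typically achieves this by sending every row through
-- a single reducer, which can be slow on large inputs.
SELECT city_name, population
FROM cities
ORDER BY population DESC;
```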
Suppose you are given a table with the following columns: City Name, Country Name, Population. The table contains every city for every country in the world and their population. You are asked to output the city with the highest population among all cities in Australia. Which Hive commands would you use?
- SELECT, GROUP BY, LIMIT
- SELECT, WHERE, ORDER BY, LIMIT (correct)
- SELECT, ORDER BY, GROUP BY, LIMIT
- SELECT, WHERE, GROUP BY, ORDER BY, LIMIT
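A minimal sketch of the correct option, assuming the table is called `cities` and its columns are named `city_name`, `country_name` and `population` (the question does not give exact identifiers):

```sql
-- WHERE keeps only Australian cities, ORDER BY ranks them by
-- population from largest to smallest, and LIMIT 1 keeps the top row.
SELECT city_name, population
FROM cities
WHERE country_name = 'Australia'
ORDER BY population DESC
LIMIT 1;
```

No GROUP BY is needed here because each row already represents one city, so nothing has to be aggregated.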
Why does Hive allow users to define complex data types like structs, whereas traditional relational databases do not allow it?
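For reference, this is roughly what a struct column looks like in practice. The table `employees` and its fields are hypothetical and serve only to show the syntax.

```sql
-- A table whose "address" column is a struct with two named fields.
CREATE TABLE employees (
  name    STRING,
  address STRUCT<street:STRING, city:STRING>
);

-- Struct fields are read with dot notation.
SELECT name, address.city
FROM employees;
```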
Suppose you are asked to output the url, ipAddress and time of the most recent log entry in the mytraffic table of lab 3, where each row of the mytraffic table represents a log entry. Which of the following sets of Hive commands best solves this problem?
Which statement about the MapReduce job tracker is false?
Suppose you are given a table with the following columns: City Name, Country Name, Population. The table contains every city for every country in the world and their populations. You are asked to list the top 10 countries in descending order of total population using Hive. Which Hive commands would you use?
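One way to approach this, sketched under the assumption that the table is called `cities` with columns `country_name` and `population` (the question does not give exact identifiers):

```sql
-- GROUP BY collapses the city rows into one row per country,
-- SUM adds up the city populations within each country,
-- ORDER BY ranks the countries from largest to smallest total,
-- and LIMIT 10 keeps the first ten.
SELECT country_name, SUM(population) AS total_population
FROM cities
GROUP BY country_name
ORDER BY total_population DESC
LIMIT 10;
```

Unlike the single-country question, no WHERE clause is needed because every country takes part in the ranking.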