50 Questions
The expression expr("(((someCol + 5) * 200) - 6) < otherCol") is only valid in Python.
False
Spark DataFrames can be represented using SQL code or DataFrame code, but the former is more efficient.
False
The columns property of a DataFrame can be used to access a specific column by its name.
False
In Spark, each row in a DataFrame is an object of type Record.
False
Spark uses column expressions to manipulate the byte array interface of Row objects.
False
When working with individual DataFrames, the fundamental objective is to create DataFrames from raw data sources.
False
The printSchema method is used to programmatically access columns in a DataFrame.
False
Spark DataFrames can be loaded from JSON files using the spark.read.format("json") method.
True
We can sort data by values in columns when working with DataFrames.
True
Transforming a row into a column is not a possible operation in DataFrames.
False
The expr function is used to compile DataFrame code to SQL code.
False
Creating DataFrames from raw data sources is discussed in Chapter 11.
False
DataFrame transformations are depicted in Figure 11-2.
False
We can add columns but not rows when working with DataFrames.
False
DataFrame transformations only involve adding or removing rows.
False
The most common DataFrame transformations take multiple columns and change them row by row.
False
The withColumn function takes only one argument, which is the column name.
False
The expr function is used to execute SQL queries in DataFrames.
False
Renaming a column is not a possible operation in DataFrames.
False
The show method is used to display the entire DataFrame.
False
The withColumn function can only be used to add new columns, not to modify existing ones.
False
The columns property of a DataFrame is an immutable list.
True
Spark DataFrames can only be loaded from CSV files.
False
The expr function is only available in Python.
False
You can specify the number of partitions you would like when repartitioning a DataFrame in Spark.
True
Repartitioning a DataFrame always incurs a full shuffle.
True
Coalesce can be used to partition a DataFrame by a specific column.
False
Collecting rows to the driver can be used to manipulate data on a local machine.
True
Repartitioning is necessary when the future number of partitions is less than the current number of partitions.
False
The number of partitions of a DataFrame can be obtained using the getNumPartitions method on its underlying RDD in Spark.
True
Coalesce always incurs a full shuffle.
False
Repartitioning is only used for filtering by a certain column often.
False
In Scala, the Metadata object is used to specify the data type of a column.
False
In Python, the StructType class is used to create a schema for a DataFrame.
True
Spark maintains its own type information independent of the per-language types.
True
Columns in Spark are physical constructions that store data on disk.
False
Expressions in Spark are used to select, manipulate, and remove columns from DataFrames.
True
The StructField class in Scala is used to create a DataFrame.
False
The schema method in Spark is used to specify the data type of a column.
False
The load method in Spark is used to read data from a JSON file into a DataFrame.
True
What is the purpose of using the AS keyword in Spark?
To rename a column
What is the difference between select and selectExpr in Spark?
selectExpr is a shorthand for select(expr(...)): it accepts SQL expression strings, including aggregations, while select takes column references and expressions
What is the result of using the alias method on a column in Spark?
It changes the column name to a new name
What is the purpose of using expr in Spark?
To parse a string into a Spark column expression
What is the advantage of using selectExpr in Spark?
It allows SQL-style expressions, including aggregations over the DataFrame, to be written as plain strings
What happens when you use the alias method on a column that has already been renamed?
The column simply takes the most recent alias; the earlier rename is overwritten
What is the result of using selectExpr with multiple columns in Spark?
It creates a new DataFrame with the specified columns
What is the main difference between select and selectExpr in terms of column manipulation?
Plain string arguments to select can only reference columns, while selectExpr can evaluate SQL expressions such as aggregations over the whole DataFrame
What is the advantage of using expr in Spark DataFrames?
It allows for more complex expressions to be built up
Learn how to create a manual schema for JSON data using Apache Spark and Scala. This quiz covers the import of necessary types, creation of a StructType, and loading of JSON data.