Assignment 1 Quiz 5 CSE5BDC T5 2023
6 Questions

Questions and Answers

Which of the following statements regarding data caching in Apache Spark is false?

  • Caching only a part of an RDD has no performance benefits. (correct)
  • Caching data is especially important for the performance of iterative programs.
  • Caching reduces the amount of disk access and therefore speeds up query execution.
  • RDDs in Apache Spark are only cached if you explicitly specify that you want the RDD to be cached.
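The caching statements above can be illustrated with a toy model in plain Python. This is only a sketch, not Spark code: `LazyDataset`, `expensive_load`, and the `computations` counter are hypothetical names made up for this example. The sketch assumes a dataset that is recomputed on every action unless caching was explicitly requested, which mirrors the idea that an iterative program benefits most from caching.

```python
class LazyDataset:
    """Toy stand-in for an RDD: recomputed on every action unless cache() was called."""

    def __init__(self, compute):
        self._compute = compute          # function that rebuilds the data
        self._cache_requested = False    # nothing is cached by default
        self._cached = None
        self.computations = 0            # how many times the data was rebuilt

    def cache(self):
        # Caching must be requested explicitly; it only marks the dataset.
        self._cache_requested = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        self.computations += 1
        data = self._compute()
        if self._cache_requested:
            self._cached = data
        return data


def expensive_load():
    # Hypothetical expensive step, e.g. reading and parsing input from disk.
    return [x * x for x in range(1_000)]


uncached = LazyDataset(expensive_load)
cached = LazyDataset(expensive_load).cache()

# An "iterative program": the same dataset is used in five passes.
for _ in range(5):
    sum(uncached.collect())
    sum(cached.collect())

print(uncached.computations, cached.computations)  # prints "5 1"
```

In this model the uncached dataset is rebuilt on every pass, while the cached one is built once and reused.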

Which of the following statements about the Parquet storage format is false?

  • Parquet storage format stores the schema with the data.
  • Given a dataframe with 100 columns, it is faster to query a single column of the dataframe if the data are stored using the CSV storage format compared to the parquet storage format. (correct)
  • Parquet storage format stores all values of the same column together.
  • Given a dataframe with 100 columns, it is faster to query a single column of the dataframe if the data are stored using the parquet storage format compared to the data being stored in the CSV storage format.
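Why a column-oriented layout helps single-column queries can be sketched with a toy model in plain Python. This is not Parquet or Spark code, and the function names are hypothetical. It assumes that a row-oriented (CSV-like) reader has to pass over every field of every row, while a column-oriented (Parquet-like) reader can go straight to the stored values of one column. It only counts values touched and ignores compression, encoding, and I/O details.

```python
NUM_ROWS, NUM_COLS = 1_000, 100

# Row-oriented layout: each record keeps all 100 fields together.
rows = [[r * NUM_COLS + c for c in range(NUM_COLS)] for r in range(NUM_ROWS)]

# Column-oriented layout: all values of the same column are stored together.
columns = [[rows[r][c] for r in range(NUM_ROWS)] for c in range(NUM_COLS)]


def read_column_from_rows(rows, col):
    """Scan every field of every row, keeping only the requested column."""
    touched = 0
    result = []
    for row in rows:
        for idx, value in enumerate(row):
            touched += 1
            if idx == col:
                result.append(value)
    return result, touched


def read_column_from_columns(columns, col):
    """Read only the values stored for the requested column."""
    values = columns[col]
    return list(values), len(values)


row_values, row_touched = read_column_from_rows(rows, 7)
col_values, col_touched = read_column_from_columns(columns, 7)

print(row_touched, col_touched)  # prints "100000 1000"
```

Both readers return the same values, but in this model the row-oriented reader touches 100 times as many values.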

Which of the following statements is false?

  • Executing queries using SparkSQL DataFrame and Dataset functions is at least as fast as using their RDD counterparts, and often faster.
  • You can add columns to a dataframe using the withColumn function.
  • DataSets contain schemas whereas DataFrames do not contain schemas. (correct)
  • After performing a self-join on a dataframe, the resulting columns will contain duplicate column names.

What is a benefit of using the partitionBy function in SparkSQL?

  • It allows you to quickly retrieve all data associated with a given value on the partitioned column. (correct)
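The benefit described above can be sketched in plain Python as a toy model. This is not Spark code: `partition_by`, `lookup_partitioned`, and `lookup_unpartitioned` are hypothetical helpers, and the sample records are made up. The sketch assumes that partitioning groups records by the value of one column (in Spark, writing with partitionBy produces one directory per value, such as `country=AU`), so a lookup on that column only examines the matching group.

```python
from collections import defaultdict

# Made-up sample data for illustration only.
records = [
    {"country": "AU", "amount": 10},
    {"country": "NZ", "amount": 7},
    {"country": "AU", "amount": 3},
    {"country": "JP", "amount": 12},
    {"country": "NZ", "amount": 5},
    {"country": "JP", "amount": 1},
]


def partition_by(records, column):
    """Group records into one bucket per distinct value of the column."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[column]].append(record)
    return dict(buckets)


def lookup_partitioned(buckets, value):
    """Return matching records and how many records were examined."""
    matches = buckets.get(value, [])
    return matches, len(matches)


def lookup_unpartitioned(records, column, value):
    """Full scan: every record is examined to find the matches."""
    matches = [r for r in records if r[column] == value]
    return matches, len(records)


buckets = partition_by(records, "country")
fast_matches, fast_examined = lookup_partitioned(buckets, "AU")
slow_matches, slow_examined = lookup_unpartitioned(records, "country", "AU")

print(fast_examined, slow_examined)  # prints "2 6"
```

Both lookups return the same two records, but the partitioned lookup skips every record that belongs to another value.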

Which of the following statements about query optimisation in Spark is false?

  • Spark automatically applies query optimisation on a sequence of RDD transformations. (correct)

Which of the following statements is false?

  • You need to explicitly invoke a combiner in order to enjoy the benefits of reduced data shuffle when using the reduceByKey function. (correct)
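The statement above is false because reduceByKey merges values per key inside each partition before the shuffle, without any explicit combiner call. The toy model below sketches that effect in plain Python. It is not Spark code: the function names and sample partitions are hypothetical, and the "shuffle" is simply a list of records that would cross the network.

```python
def shuffle_without_combine(partitions):
    """groupByKey-style: every (key, value) record crosses the shuffle."""
    return [record for part in partitions for record in part]


def shuffle_with_combine(partitions, func):
    """reduceByKey-style: values are merged per key inside each partition first."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    return shuffled


def reduce_shuffled(records, func):
    """Final reduce step applied to the records that arrived after the shuffle."""
    result = {}
    for key, value in records:
        result[key] = func(result[key], value) if key in result else value
    return result


def add(x, y):
    return x + y


# Two made-up partitions of (word, 1) pairs, as in a word count.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

plain = shuffle_without_combine(partitions)
combined = shuffle_with_combine(partitions, add)

print(len(plain), len(combined))  # prints "8 5"
print(reduce_shuffled(combined, add) == reduce_shuffled(plain, add))  # prints "True"
```

In this model the final counts are identical, but only 5 records are shuffled instead of 8 because the per-partition merge happens first.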
