You have an Azure Blob storage account that contains a folder. The folder contains 120,000 files. Each file contains 62 columns. Each day, 1,500 new files are added to the folder. You plan to incrementally load five data columns from each new file into an Azure Synapse Analytics workspace. You need to minimize how long it takes to perform the incremental loads. What should you use to store the files, and in which format? Select the appropriate options in the answer area.

Understand the Problem

The question asks how to store and format the files in Azure Blob storage so that the daily incremental load into Azure Synapse Analytics is as fast as possible. Only the 1,500 new files, and only five of the 62 columns in each file, need to be read each day, so the folder layout and file format should let the load skip everything else.

Answer

Timeslice partitioning in the folders; Apache Parquet.

The storage option should be 'Timeslice partitioning in the folders' and the file format should be 'Apache Parquet'.

More Information

Timeslice partitioning (for example, folders named by year, month, and day) means the incremental load only has to read the folders that contain the 1,500 new files instead of scanning all 120,000. Apache Parquet is a columnar format, so reading five of the 62 columns retrieves only those column chunks, which keeps each load small and fast. A sketch of such a load is shown below.
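
As an illustration only, here is a minimal PySpark sketch of an incremental load under these choices. The storage account name, folder layout, and column names are hypothetical, not part of the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical timeslice-partitioned path: only the current day's folder
# is read, so the other ~120,000 files are never touched.
daily_path = (
    "abfss://data@examplestorage.dfs.core.windows.net/"
    "landing/year=2024/month=05/day=14/"
)

# Parquet is columnar, so selecting five columns reads only those
# column chunks rather than all 62 columns of every file.
df = (
    spark.read.parquet(daily_path)
    .select("col_a", "col_b", "col_c", "col_d", "col_e")
)

# Append the day's slice to a hypothetical staging location for Synapse.
df.write.mode("append").parquet(
    "abfss://data@examplestorage.dfs.core.windows.net/staging/daily_increment/"
)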

Tips

Avoid JSON or CSV when load time is critical: both are row-oriented text formats, so reading five columns still requires parsing every row in full, whereas Parquet can skip the unneeded columns entirely. The sketch below makes this concrete.
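
For a rough sense of the difference, this sketch uses pyarrow to read only five columns from a Parquet file; the file name and column names are hypothetical. A CSV or JSON reader would have to parse every row in full before discarding the other 57 columns.

import pyarrow.parquet as pq

# Parquet stores data by column, so requesting five columns
# skips the remaining 57 entirely.
table = pq.read_table(
    "new_file.parquet",
    columns=["col_a", "col_b", "col_c", "col_d", "col_e"],
)

print(table.num_rows, table.num_columns)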
