Questions and Answers
What is a database in Databricks equivalent to?
What command is used to create a database in Databricks?
Where is the central Hive metastore accessed in Databricks?
What is the default location for table data in Databricks?
What extension is used for database folders in the Hive metastore?
If a table is created without specifying a database name, where will its metadata be stored?
How do you create other databases apart from the default one in Databricks?
What does the Hive metastore primarily store information about?
What is indicated by the LOCATION keyword when creating a schema?
What happens to the data files of a managed table when the table is dropped?
How does Hive manage an external table's data?
Which command is used to specify which database to use when creating an external table?
Where is the definition of an external table stored?
What is the default case for table creation in Databricks?
Can you create an external table in a database created in a custom location?
What is the primary responsibility of Hive regarding managed tables?
Study Notes
Overview of Databricks and Hive Metastore
- Databricks utilizes a Hive metastore to manage database and table metadata.
- A database in Databricks is equivalent to a schema in the Hive metastore.
- A database can be created with either the CREATE DATABASE or the CREATE SCHEMA syntax; the two are functionally identical.
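As a minimal sketch of that equivalence, the two statements below would register the same kind of object. The name demo_db is hypothetical, and IF NOT EXISTS is added only so the statements can be rerun safely.

```sql
-- Registers a database (schema) named demo_db in the metastore.
CREATE DATABASE IF NOT EXISTS demo_db;

-- Same operation under the other keyword; a no-op here because
-- demo_db already exists.
CREATE SCHEMA IF NOT EXISTS demo_db;
```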
Hive Metastore
- The Hive metastore serves as a repository for metadata, storing information about databases, tables, and partitions.
- It contains details regarding table definitions, data formats, and the physical storage locations of the data.
Default Database in Databricks
- Each Databricks workspace has a default database named "default."
- Tables can be created in the default database using the CREATE TABLE statement without specifying a database name.
- By default, the table data is stored in the /user/hive/warehouse directory.
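The behavior described above could be checked with a sketch like the following. The table name managed_default and its columns are hypothetical, and the sketch assumes the session's current database is still default.

```sql
-- No database name is given, so the table is registered in the
-- current database (default, unless changed with USE).
CREATE TABLE IF NOT EXISTS managed_default (width INT, length INT, height INT);

INSERT INTO managed_default VALUES (3, 2, 1);

-- Inspect the table metadata; the Type and Location rows are the
-- ones of interest.
DESCRIBE EXTENDED managed_default;
```

With the Hive metastore defaults, the Location row would be expected to fall under /user/hive/warehouse and the Type row to read MANAGED.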
Creating Additional Databases
- Additional databases can be created using the CREATE DATABASE or CREATE SCHEMA syntax.
- Each new database is registered in the Hive metastore, and its folder is created in the default directory with a .db extension that identifies it as a database folder.
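A short sketch of creating an additional database and checking where it landed; new_default is a hypothetical name.

```sql
-- Create a database without a LOCATION clause, so it uses the
-- default warehouse directory.
CREATE SCHEMA IF NOT EXISTS new_default;

-- Show the database metadata, including its Location.
DESCRIBE DATABASE EXTENDED new_default;
```

Under the defaults described above, the reported Location would be expected to end in new_default.db.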
Custom Database Locations
- Databases can be created outside of the default Hive directory by specifying a custom path using the LOCATION keyword.
- The database definition remains in the Hive metastore, while the folder for the database data is located at the specified custom path.
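For illustration, a database with a custom location might be created as sketched below. Both the name custom and the path are hypothetical placeholders; substitute a path that exists in your own storage.

```sql
-- The definition is stored in the Hive metastore, but the database
-- folder is placed at the path given after LOCATION.
CREATE SCHEMA IF NOT EXISTS custom
LOCATION 'dbfs:/Shared/schemas/custom.db';

-- Location should now show the custom path rather than the
-- default warehouse directory.
DESCRIBE DATABASE EXTENDED custom;
```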
Types of Tables in Databricks
- Two main types of tables exist in Databricks: managed tables and external tables.
- Managed tables store their data in the database directory and are owned entirely by Hive, which manages their lifecycle.
- Dropping managed tables will delete both the table definition and the underlying data files.
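The managed-table lifecycle can be sketched as follows, assuming a workspace backed by the Hive metastore. The table name managed_demo and its columns are hypothetical.

```sql
-- No LOCATION clause, so this is a managed table stored in the
-- database directory.
CREATE TABLE IF NOT EXISTS managed_demo (id INT, label STRING);

INSERT INTO managed_demo VALUES (1, 'a');

-- Removes the table definition from the metastore and, because the
-- table is managed, its underlying data files as well.
DROP TABLE managed_demo;
```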
External Tables
- External tables store their metadata in the Hive metastore, but the actual data resides outside the database directory, at the path specified by the LOCATION keyword.
- Dropping an external table does not delete the underlying data files, so the data persists after the table is removed.
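By contrast, an external table's files should survive a drop. The sketch below assumes the Delta format and uses a hypothetical table name and path.

```sql
-- LOCATION makes this an external table: metadata goes to the
-- metastore, data files go to the given path.
CREATE TABLE IF NOT EXISTS external_demo (id INT, label STRING)
USING DELTA
LOCATION 'dbfs:/mnt/demo/external_demo';

INSERT INTO external_demo VALUES (1, 'a');

-- Only the table definition is removed; the files at the path stay.
DROP TABLE external_demo;

-- Re-register a table over the same path; the schema is read from
-- the existing Delta files.
CREATE TABLE external_demo
USING DELTA
LOCATION 'dbfs:/mnt/demo/external_demo';

-- The row inserted before the drop should still be readable.
SELECT * FROM external_demo;
```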
Creating Tables
- External tables can be created within the default database or any specified database by using the CREATE TABLE statement with the LOCATION keyword.
- Switching between databases can be done using the USE keyword, followed by the database name.
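Putting the two points together, a sketch of creating an external table inside a specific database could look like this. The names demo_db and external_in_demo and the path are hypothetical.

```sql
-- Make sure the target database exists, then switch to it.
CREATE SCHEMA IF NOT EXISTS demo_db;
USE demo_db;

-- The table is registered in demo_db, while its data files are
-- written to the external path.
CREATE TABLE IF NOT EXISTS external_in_demo (id INT, label STRING)
LOCATION 'dbfs:/mnt/demo/external_in_demo';

-- Type should read EXTERNAL and Location should show the path above.
DESCRIBE EXTENDED external_in_demo;
```

An alternative to USE is to qualify the table name directly, for example demo_db.external_in_demo, which leaves the session's current database unchanged.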
Summary of Operations
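- Managed and external tables are created with similar syntax, but their data management and lifecycle differ: a managed table's files live in the database directory and are deleted when the table is dropped, whereas an external table's files live at a user-specified location and are left in place.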
- Understanding how to effectively manage tables and databases within Databricks is essential for efficient data organization and manipulation.
Description
This quiz explores relational entities in Databricks, focusing on how databases and tables operate within the platform. Learn how the LOCATION keyword overrides the default storage directory and how databases relate to schemas in the Hive metastore.