Friday, December 26, 2014

Databases in Hive

In Hive, database is just a catalog or namespace of tables. However, they are very useful for larger clusters with multiple teams and users, as a way of avoiding table name collisions. It’s also common to use databases to organize production tables into logical groups.
If you don’t specify a database, the default database is used.

Create a DataBase

hive> CREATE DATABASE workshop;
Throw an error if workshop database already exists. 

You can suppress these warnings with below variation:
hive> CREATE DATABASE IF NOT EXISTS workshop;
IF NOT EXISTS clause is useful for scripts that should create a database on-the-fly, if necessary, before proceeding.

Note:
Hive will create a directory for each database under a top-level directory specified by the property hive.metastore.warehouse.dir. (default value is /user/hive/warehouse).

Eg: when the 'workshop' database is created, Hive will create the directory /user/hive/warehouse/workshop.db. (Note the .db extension).

All the tables of the database will be stored in sub-directories of the database directory. The exception is tables in the default database, which doesn't have its own directory.

You can override the default location for the new directory as shown in this example:
hive> CREATE DATABASE workshop LOCATION '/hive_workshop’;

You can add a descriptive comment to the database, which will be shown by the
hive> CREATE DATABASE workshop COMMENT 'Holds all my exercises in Hive';

Database Schema

hive> DESCRIBE DATABASE workshop;

OK
workshop Holds all my exercises in Hive hdfs://satishkumar/user/hive/warehouse/workshop.db satishkumar
Time taken: 0.026 seconds, Fetched: 1 row(s)

DESCRIBE DATABASE also shows the directory location for the database. If you are running in pseudo-distributed mode, then the master server will be localhost. For local mode, the path will be a local path, file:///user/hive/warehouse/workshop.db.

You can associate key-value properties with the database, although their only function currently is to provide a way of adding information to the output of DESCRIBE DATABASE EXTENDED <database>:

hive> CREATE DATABASE workshop WITH DBPROPERTIES ('creator' = 'Satish Kumar', 'date' = '2014-12-28');

hive> DESCRIBE DATABASE workshop;                                                                    
OK
workshop hdfs://satishkumar/user/hive/warehouse/workshop.db satishkumar
Time taken: 0.026 seconds, Fetched: 1 row(s)

List of DataBases

At any time, you can see the databases that already exist as follows:
hive> SHOW DATABASES;
default
workshop

If you have a lot of databases, you can restrict the ones listed using a regular expression like
hive> SHOW DATABASES LIKE 'w.*';
workshop 
It lists only those databases that start with the letter ‘w’ and end with any other characters.

Set Working Database

The USE command sets a database as your working database, analogous to changing working directories in a filesystem:

hive> USE employee_db;

Note: There is no command to show you which database is your current working database.

Drop a DataBase

hive> DROP DATABASE IF EXISTS workshop;
The IF EXISTS is optional and suppresses warnings if workshop doesn't exist.

By default, Hive won’t permit you to drop a database if it contains tables. You can either drop the tables first or append the CASCADE keyword to the command, which will cause the Hive to drop the tables in the database first:

hive> DROP DATABASE IF EXISTS financials CASCADE;

Using the RESTRICT keyword instead of CASCADE is equivalent to the default behavior, where existing tables must be dropped before dropping the database. When a database is dropped, its directory is also deleted.

Alter a DataBase

You can set key-value pairs in the DBPROPERTIES associated with a database using the
ALTER DATABASE command. No other metadata about the database can be changed,
Including its name and directory location:

hive> ALTER DATABASE workshop SET DBPROPERTIES ('edited-by' = 'Teja');

There is no way to delete or “unset” a DBPROPERTY.

No comments:

Post a Comment