HiveQL

Posted on Updated on

Hive Shell:

Hive commands can be executed either via hive shell in interactive mode or by using a script and invoking with ‘hive -f script_name

Creating a database:

hive> create database hive_test;

Prints ‘OK’ if the command is executed successfully. You can also execute ‘show databases’ to list the database names .

Type ‘use hive_test’ to start using this database.

Create table:

‘CREATE TABLE test_tab(name STRING,id INT)
row format delimited
fields terminated by ‘,’;

This will create a table named ‘test_tab’ in hive_test database.

‘row format delimited’ -> delimiter expecting by this table. default delimiter is newline
fields terminated by ‘,’ -> Here we will load a csv file so the feilds are terminated by ‘,’

Table can also be created using ‘CREATE EXTERNAL TABLE‘ .

If we create table using ‘CREATE TABLE’ without ‘external’ keyword then Hive will move the data during load operation to hive warehouse. Unnecessarily we are duplicating the same data at both Hadoop filesystem and in Hive
warehouse. This can be avoided by using ‘External’ keyword.

CREATE EXTERNAL TABLE test_tab(name STRING,id INT)
LOCATION ‘/home/external_table’;
LOAD DATA INPATH ‘/home/test/test.txt’ INTO TABLE test_tab;

Load data to Hive table:
data can be loaded from local file system or from Hadoop file system.

load data local inpath ‘/home/test/test.csv’into table test_tab; will load data from local file system(Note the local keyword before inpath).

load data inpath ‘/home/test/test.txt’ into table test_tab; will load data from Hadoop file system.

Note: Hive wont throw any error during ‘load’ operation related to table schema because actual validation takes place during execution time.

Alter:

Hive table can be modified using alter . For instance we can add or drop columns from a table using alter.

ALTER TABLE test_tab ADD COLUMNS column_test

ALTER TABLE test_tab DROP column_test

to rename table name ALTER TABLE test_tab RENAME to test_table

Drop:

We can drop a table in Hive using ‘drop’.  When we drop data then both metadata and date would be deleted but for external tables only metadata would be deleted ( data would be untouched ).

DROP TABLE [ IF EXISTS ] test_table

Partition and buckets:

For efficient querying Hive supports partitioning and buckets.

Using partition a table can be divided into related parts. For instance, a table can be partitioned based on columns such as city,date or country. Using partition, it becomes easy to query a portion of data in a table.

Buckets further sub divides a table or partition into buckets. Bucketing works based on the value of hash function of some column of a table.

< full post under construction >

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s