Hive commands can be executed either via hive shell in interactive mode or by using a script and invoking with ‘hive -f script_name‘
Creating a database:
hive> create database hive_test;
Prints ‘OK’ if the command is executed successfully. You can also execute ‘show databases’ to list the database names .
Type ‘use hive_test’ to start using this database.
‘CREATE TABLE test_tab(name STRING,id INT)
row format delimited
fields terminated by ‘,’;
This will create a table named ‘test_tab’ in hive_test database.
‘row format delimited’ -> delimiter expecting by this table. default delimiter is newline
fields terminated by ‘,’ -> Here we will load a csv file so the feilds are terminated by ‘,’
Table can also be created using ‘CREATE EXTERNAL TABLE‘ .
If we create table using ‘CREATE TABLE’ without ‘external’ keyword then Hive will move the data during load operation to hive warehouse. Unnecessarily we are duplicating the same data at both Hadoop filesystem and in Hive
warehouse. This can be avoided by using ‘External’ keyword.
CREATE EXTERNAL TABLE test_tab(name STRING,id INT)
LOAD DATA INPATH ‘/home/test/test.txt’ INTO TABLE test_tab;
Load data to Hive table:
data can be loaded from local file system or from Hadoop file system.
load data local inpath ‘/home/test/test.csv’into table test_tab; will load data from local file system(Note the local keyword before inpath).
load data inpath ‘/home/test/test.txt’ into table test_tab; will load data from Hadoop file system.
Note: Hive wont throw any error during ‘load’ operation related to table schema because actual validation takes place during execution time.
Hive table can be modified using alter . For instance we can add or drop columns from a table using alter.
ALTER TABLE test_tab ADD COLUMNS column_test
ALTER TABLE test_tab DROP column_test
to rename table name ALTER TABLE test_tab RENAME to test_table
We can drop a table in Hive using ‘drop’. When we drop data then both metadata and date would be deleted but for external tables only metadata would be deleted ( data would be untouched ).
DROP TABLE [ IF EXISTS ] test_table
Partition and buckets:
For efficient querying Hive supports partitioning and buckets.
Using partition a table can be divided into related parts. For instance, a table can be partitioned based on columns such as city,date or country. Using partition, it becomes easy to query a portion of data in a table.
Buckets further sub divides a table or partition into buckets. Bucketing works based on the value of hash function of some column of a table.
< full post under construction >