Friday, December 26, 2014

Why Go for Hive When Pig Is There?


  • Pig 
    • Procedural data-flow language 
    • Used mainly by programmers and researchers.
    • Runs on the client side; no server is required.
    • For managing and querying unstructured data. 
      
  • Hive 
    • Declarative, SQL-like language
    • Used mainly by analysts generating data reports.
    • Can run on the cluster side through an optional Thrift server.
    • For managing and querying structured data. 
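
The difference between a declarative, SQL-like style and a procedural data-flow style can be sketched with a rough analogy in plain Python. This is not Hive or Pig code: sqlite3 stands in for the SQL-like side, ordinary Python steps stand in for the data-flow side, and the sample click records and function names are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user, page) click records.
clicks = [("alice", "home"), ("bob", "home"), ("alice", "cart"), ("alice", "home")]


def declarative_counts(rows):
    """SQL-like style: declare a schema first, then state the result wanted."""
    conn = sqlite3.connect(":memory:")
    # The table and its schema must exist before any query can run.
    conn.execute("CREATE TABLE clicks (user TEXT, page TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT user, COUNT(*) FROM clicks GROUP BY user ORDER BY user"
    ).fetchall()
    conn.close()
    return result


def dataflow_counts(rows):
    """Data-flow style: spell out each step (group, count, order) in turn."""
    grouped = defaultdict(list)
    for user, page in rows:  # step 1: group records by user
        grouped[user].append(page)
    # step 2: count the records in each group
    counted = [(user, len(pages)) for user, pages in grouped.items()]
    return sorted(counted)  # step 3: order by user


if __name__ == "__main__":
    print(declarative_counts(clicks))
    print(dataflow_counts(clicks))
```

Both functions return the same per-user counts. The first only says what result is wanted and leaves the execution plan to the engine, while the second fixes the order of the steps by hand, which is roughly the trade-off between a Hive query and a Pig script.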

Features | Hive | Pig
---------|------|----
Language | SQL-like | Pig Latin
Schemas/Types | Yes (explicit). Tables must be created beforehand; the schema is stored in a shared or local metadata database. | Yes (implicit). No need to create tables.
Partitions | Yes | No
Server | Optional (Thrift) | No
UDF | Yes (Java) | Yes (Java)
Custom Serializer/Deserializer | Yes | Yes
DFS Direct Access | Yes (implicit). We never point to the actual HDFS folder. | Yes (explicit). We explicitly point to the HDFS folder.
Join/Order/Sort | Yes | Yes
Shell | Yes | Yes
Streaming | Yes | Yes
Web Interface | Yes | No
JDBC/ODBC | Yes (limited) | No
