Wednesday, 10 July 2013

Impala: Open Source Real-Time SQL Queries on Hadoop

Cloudera Impala is an open source system which provides real-time SQL querying functionality on top of Hadoop.

Impala High-level Architectural View

Cloudera, which created Impala, said the design was technically inspired by Google's Dremel paper, which convinced them that real-time, ad-hoc queries were possible in Apache Hadoop.

In October 2012, when announcing Impala, Cloudera introduced it as follows:
“Real-Time Queries in Apache Hadoop, For Real”
Impala adopted Hive-SQL as its interface. Hive-SQL is syntactically similar to SQL, the widely used query language, so users can access data stored in HDFS in a very familiar way.

Because Hive also uses Hive-SQL, you can access the same data through the same method in either system. However, Impala does not support every Hive-SQL statement. The safe way to think about it is that a Hive-SQL statement that runs in Impala will also run in Hive, but not necessarily the other way around.
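As a rough illustration of "the same data through the same method", the sketch below sends one query text to two engines through Python DB-API style cursors. Everything specific here is an assumption made for illustration: the `web_logs` table, its `page` column, and the helper names are hypothetical, and in-memory SQLite cursors stand in for real Hive and Impala connections so that the example runs on its own.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
# The statement sticks to basic SELECT / GROUP BY / ORDER BY / LIMIT syntax.
QUERY = """
SELECT page, COUNT(*) AS hits
FROM web_logs
GROUP BY page
ORDER BY hits DESC
LIMIT 3
"""


def run_on_each(query, cursors):
    """Run the same query text on every named DB-API style cursor."""
    results = {}
    for name, cursor in cursors.items():
        cursor.execute(query)
        results[name] = cursor.fetchall()
    return results


def make_stand_in_cursor():
    """Build an in-memory SQLite cursor as a stand-in for a real engine."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE web_logs (page TEXT)")
    cur.executemany(
        "INSERT INTO web_logs VALUES (?)",
        [("/home",), ("/home",), ("/about",)],
    )
    return cur


# Both entries are SQLite stand-ins; real code would pass one cursor
# connected to Hive and one connected to Impala.
results = run_on_each(
    QUERY,
    {"hive": make_stand_in_cursor(), "impala": make_stand_in_cursor()},
)
print(results["hive"] == results["impala"])
```

In a real deployment the two cursors would point at the same tables in HDFS, and a statement outside the subset that Impala supports would run only on the Hive side.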
