Real-Time Hadoop queries will be a reality in 2013

Real-Time Hadoop queries will be a reality in 2013 thanks to two new projects from Cloudera: Impala and Trevni.

Impala is an open source query engine inspired by Dremel, Google’s proprietary big data query solution. A first beta is available and the production version is expected in Q1 2013.

Impala allows you to run real-time queries on top of Hadoop’s HDFS, HBase and Hive. No migration is necessary.

However, things will get even better once Trevni, from Doug Cutting [the creator of Lucene, Hadoop, etc.], is integrated into Impala. Trevni is a new columnar data storage format that promises superior performance when reading large data sets stored by column.
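To see why a columnar layout can help read-heavy queries, here is a minimal Python sketch. It is only an illustration of the general idea and not Trevni's actual file format or API; the sample records and the function names (`to_columns`, `sum_from_rows`, `sum_from_columns`) are hypothetical. It counts how many stored values a query has to touch in each layout when it only needs one column.

```python
# Illustrative only: this is NOT Trevni's on-disk format or API.
# All names and sample data below are hypothetical.

# Sample records, as a row-oriented store would keep them.
rows = [
    {"user": "alice", "country": "BE", "clicks": 12},
    {"user": "bob", "country": "NL", "clicks": 7},
    {"user": "carol", "country": "BE", "clicks": 30},
]


def to_columns(records):
    """Pivot row records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_from_rows(records, column):
    """Row layout: every field of every record is visited to reach one column."""
    values_touched = 0
    total = 0
    for record in records:
        for name, value in record.items():
            values_touched += 1
            if name == column:
                total += value
    return total, values_touched


def sum_from_columns(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
print(sum_from_rows(rows, "clicks"))        # (total, values touched) for the row layout
print(sum_from_columns(columns, "clicks"))  # (total, values touched) for the column layout
```

Both calls return the same total, but the column layout touches only the values of the requested column. On wide tables with many columns, that difference is where a columnar format is expected to gain most of its read performance.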

Together, Impala and Trevni promise real-time big data queries with multiple joins, on par with Google’s Dremel in performance but with more functionality…

  1. December 5, 2012 at 10:46 am

    How will this affect Storm/Trident? Your thoughts?

    • December 5, 2012 at 10:55 am

      Very good question. I think they are complementary. Storm/Trident deals with real-time events, but when it is combined with Impala’s real-time queries, those events can be enriched tremendously.

      • December 5, 2012 at 11:16 am

        Hi Maarten, thanks for sharing your thoughts.

        Presently, I have a Storm topology plus Cassandra counters app that pushes RT events/alarms to a dashboard. To retrieve the alarm details, I have to query my Hadoop ecosystem.

        So it seems from your reading that with Impala I can completely get rid of Cassandra? Will I be able to store my RT counters and the associated alarm/event definitions directly in Hadoop as well?


      • December 5, 2012 at 11:23 am

        I wouldn’t replace Cassandra with Impala; instead, replace Cassandra with HBase and add Impala. HBase will give you real-time storage and Impala real-time complex queries. Impala is for queries (like in a data warehouse), not for maintaining counters [reading them is possible and would be a perfect scenario for Impala]. HBase would be superior for this setup because all the data would be in Hadoop instead of spread over multiple systems. Impala would also be needed for complex queries. If your queries do not need joins, then Cassandra or HBase is good enough and no Impala is needed.
  2. December 5, 2012 at 11:29 am

    Thanks a lot.

