SQL on Hadoop

Posted on Feb 23, 2014


A recent article published on bloorresearch.com highlights some interesting views about running SQL on Hadoop. In the article, Philip Howard answers three main questions: firstly, will we see more vendors porting their warehouse products onto Hadoop (or HDFS); secondly, how quickly will Cloudera or Hortonworks (with its SQL implementation) be able to produce an optimiser that can compete reasonably well with these intruders into their market; and, thirdly, how much does this matter?

“The good thing about running SQL on Hadoop is that SQL is a declarative language, which means that you don’t need to know where the data is, you just have to ask for it and then the database works out how to get the information you need”, according to Howard.
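To illustrate what "declarative" means here, the sketch below computes the same totals twice: once by spelling out the steps by hand, and once by stating the desired result in SQL and leaving the execution plan to the engine. This is our own minimal example, not taken from Howard's article. It uses Python's built-in sqlite3 module as a stand-in for a SQL-on-Hadoop engine, and the `sales` table, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (region, amount) sales rows.
rows = [("north", 120.0), ("south", 80.0), ("north", 45.5), ("east", 60.0)]


def totals_imperative(data):
    """Imperative version: we spell out how to scan and accumulate."""
    totals = {}
    for region, amount in data:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_declarative(data):
    """Declarative version: we state the result we want; the engine decides how."""
    # An in-memory SQLite database stands in for a SQL-on-Hadoop engine (assumption).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", data)
        cur = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cur.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_imperative(rows))
    print(totals_declarative(rows))
```

The query text says nothing about where the rows live or in what order they are read. On a Hadoop cluster, that is the part an optimiser would have to work out across many nodes.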

His view is that developing a good optimiser still takes years, despite the level of expertise available. However, the number of years has reduced considerably, mainly because the level of talent available has changed greatly in recent years.

Our view is that HDFS and SQL each have distinct strengths, which can be combined effectively and quickly to design optimal data analytics solutions. MDP already uses this approach, combining the two in our own Hadoop clusters.

Use the following link to read the opinion article as it appeared on Bloor Research:
http://www.bloorresearch.com/analysis/sql-on-hadoop/