This paper provides a framework to help in the evaluation of Big Data analytical tools.
We review ten factors we believe should be paramount when evaluating Big Data analytical software packages, and we present these factors in a way that can be easily tailored to your organization's needs.

The need for sensemaking across large and growing data stores has given rise to new approaches to data infrastructure, including the use of capabilities like Apache Hadoop.
Hadoop overcomes traditional limitations of storage and compute by delivering capabilities that run on commodity hardware and can leverage any data type. Hadoop scales to the largest of data sets in a very cost-effective way, making it the infrastructure of choice for organizations seeking to make sense of their growing data stores. Its ability to store data without a predefined data model means information can be leveraged without first knowing what questions will be asked of the data, making this a system with far more agility than legacy databases.
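This "schema-on-read" agility can be illustrated with a minimal sketch (plain Python, not the Hadoop API; the record contents and the `query_purchases` helper are hypothetical): raw records are stored as-is, and structure is imposed only at the moment a new question is asked of the data.

```python
import json

# Raw records stored with no predefined data model -- illustrative only.
raw_store = [
    '{"user": "alice", "action": "login", "ts": 1}',
    '{"user": "bob", "action": "purchase", "amount": 42.0, "ts": 2}',
    '{"user": "alice", "action": "purchase", "amount": 13.5, "ts": 3}',
]

def query_purchases(store):
    """Impose a schema (user, amount) at read time to answer a new question."""
    for line in store:
        rec = json.loads(line)
        if rec.get("action") == "purchase":
            yield rec["user"], rec["amount"]

print(list(query_purchases(raw_store)))
```

Because no schema was fixed at write time, a different question (say, counting logins) needs only a new read-time query, not a reload of the data.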
The core capability of Hadoop has now grown into a full framework of tools that includes a data warehouse infrastructure (Hive), a high-level platform for parallel computation (Pig), a scalable distributed database able to store large tables (HBase), a scalable distributed file system (HDFS), and tools for rapidly importing and managing data and coordinating the infrastructure (such as Sqoop, Flume, Oozie, and ZooKeeper). This framework of Hadoop tools has given rise to a wave of innovation in sensemaking over large quantities of data and has laid the foundation for dramatic growth in new analytical tools that operate over these Big Data infrastructures.
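The parallel-computation pattern underlying these tools can be sketched in miniature. The following is an illustrative single-process model of the map, shuffle, and reduce steps that Hadoop distributes across a cluster; the function names and sample documents are our own, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(documents):
    # Emit (word, 1) pairs, as a word-count mapper would.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group values by key; Hadoop performs this between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word, as a reducer would.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big insights", "data tools for big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # "big" and "data" each appear 3 times across the documents
```

In a real cluster, the map and reduce steps run in parallel on the nodes where the data blocks reside, which is what lets the same simple pattern scale to very large data sets on commodity hardware.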