The annual West Coast Strata Hadoop World is always an important event for Cask, and this year’s turned out to be the best one yet: productive for the team and encouraging for the ecosystem. There was great participation from vendors across the big data industry and broad representation of the many Hadoop stakeholders, from developers and data architects to data scientists and engineering leaders. We would like to thank O’Reilly and Cloudera for providing an amazing event and venue that gathered the best minds in the industry, Hadoop ecosystem developers, new and experienced users, and the vibrant vendor ecosystem.
Hadoop has come a long way in 10 years: from early meetups with 10s of people, to early conferences with 100s of people, to the 1000s we see today. Despite the continued expansion of infrastructure technologies and the acknowledged struggles enterprises face in delivering business value on Hadoop, it was clear this year that the tide is shifting. What started as small pockets of Hadoop success is turning into industry-wide adoption of Hadoop-enabled use cases such as fraud detection, customer intelligence, and security. While the ecosystem as a whole is still very infrastructure-oriented, the expo floor this year was filled with substantially more products focused on business use cases and on providing end-user value on Hadoop. The major trends we observed this year fell primarily into four tracks:
- Data Integration – Products and services for ingestion, management and governance of data into Hadoop, both on-premise and cloud-based.
- Data Prep and Visualization – Products aimed at end-users to provide self-service data prep, exploration and reporting.
- Data Science Frameworks – Frameworks for developers and data scientists to more easily build and deploy smart products with Hadoop.
- Hadoop in the Cloud – An increase in cloud-native products and services, and a rise in overall discussion of services like Amazon EMR and Azure HDInsight.
Traction and excitement around these tracks were amazing, and we believe they indicate that the Hadoop industry as a whole is maturing and orienting toward the challenges of enterprises and the need to deliver differentiated business value with big data. Things are shaping up for our industry to solve these challenges and enable wider adoption of Hadoop.
At the Cask booth we demoed the second generation of Cask Hydrator and introduced a new tool called Cask Tracker, both focused on the data integration aspects of Hadoop. These CDAP Extensions are available for download now as part of CDAP 3.4 RC1. You can learn more in the CDAP Extensions Data Sheet or watch this demo.