Putting the Promise of the Data Lake in Reach

September 26, 2017

By Rik Tamm-Daniels, VP Technology and Partnerships, Paxata

Agility, elastic scaling, and unlimited analytic potential: the promise of a data lake is very appealing to organizations whose quest to drive business with data has been thwarted by legacy infrastructure. Regrettably, the journey to that promised land has proved daunting, leading to countless failed initiatives or projects that delivered only a sliver of the anticipated ROI on the data lake investment.

So where do the dragons live that make so many data lake endeavors go up in flames?

The Three Big Impediments

  1. Limited user access to the data lake
    If only a tiny fraction of your employees can access the data in the lake, you have simply created a new bottleneck. Self-service access to your data lake provides incredible scaling potential when it comes to treating data as an asset.
  2. The high cost of on-premises “elastic” scaling
    One of the big benefits of a data lake is elastic compute: resources can be repurposed from job to job and from use case to use case. The problem with an on-premises data lake is that the lead time to add new capacity can be six to nine months, and pre-buying capacity that is not yet needed is a non-starter for many organizations. This limitation disappears in the cloud: you can spin up new compute in minutes, spin it down when you are done, and pay only for what you use.
  3. The complexity of managing layers in the data lake
    Putting in Hadoop is only the starting point. You still need to figure out how to prepare data and how to analyze it in a scalable, efficient way. With all of the tools in the market today, there is no universal handbook explaining how to plug product A into product B into product C; moreover, not all of these products play nicely in the sandbox. The uncertainty about how all the pieces fit together has led many IT organizations to lock down their data lake infrastructure to such a degree that they ended up with the same old bottlenecks that drove them down the data lake path in the first place.

CloudT30 to the rescue!

At Strata in New York this week, we are showcasing an answer that makes the data lake vision a reality and drives business outcomes in record time.  CloudT30 is a solution developed by our partner, Clarity Insights, that leverages AWS CloudFormation templates to provide a point-and-click way for anyone to stand up a fully functional data lake – from data management to data preparation to interactive, high-performance querying to visualization and analytics. CloudT30 comes with best-of-breed tools pre-installed and pre-configured so business users can start solving business challenges immediately.
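To give a flavor of the CloudFormation-template approach, here is a minimal, hypothetical sketch of what such a template looks like. This is not the actual CloudT30 template (whose contents are not described in this article); the resource names, instance type, and AMI placeholder are illustrative only:

```yaml
# Hypothetical sketch of a CloudFormation template for a data lake stack.
# Not the real CloudT30 template -- names and values are placeholders.
AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal sketch of a one-click data lake stack

Parameters:
  NodeInstanceType:
    Type: String
    Default: m5.xlarge      # illustrative default compute size
    Description: EC2 instance type for the cluster node

Resources:
  # Object storage for raw data landing in the lake
  DataLakeBucket:
    Type: AWS::S3::Bucket

  # A single compute node; a real stack would define a full cluster,
  # networking, IAM roles, and the pre-installed tool stack
  ClusterNode:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref NodeInstanceType
      ImageId: ami-xxxxxxxx  # placeholder AMI ID

Outputs:
  BucketName:
    Value: !Ref DataLakeBucket
```

Launching a stack from a template like this is a single operation in the AWS console or CLI, which is what makes the "data lake in 30 minutes" claim plausible: all the installation and wiring decisions are baked into the template rather than made by hand.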

CloudT30 provides a data lake in the cloud in 30 minutes with the following industry-leading components:

  • Cloudera – The modern platform for machine learning and analytics optimized for the cloud
  • Paxata – Self-service data preparation at scale with enterprise-grade security and governance
  • Zoomdata – Cutting edge visualizations at scale

With CloudT30, business users can immediately load data into their data lake, prepare it, and visualize it without writing a single line of code.

To learn more about CloudT30 and how you can take it for a test drive, visit www.clarityinsights.com.
