Reinventing the Information Pipeline

Contributing author: Rik Tamm-Daniels, VP of Technology and Partnerships

An attendee from Pinterest sat next to me at MapR’s Big Data Everywhere event in Redwood City, CA. He sketched out his problem on a notepad: his finance team is trying to cut costs by comparing billing statements from Amazon Web Services against actual usage, as recorded in the log files for their hosted servers.

The finance guys know the billing statements, but they can’t understand these log files.

The traditional information pipeline creates a bottleneck. The finance team doesn’t have the tools to understand these log files, so it asks IT for help, and together they fall into this familiar pattern:

  1. Business teams funnel their data requirements to IT
  2. IT runs requirements through a linear ETL process, executed with manual scripting or coding
  3. IT reviews with business, makes changes, fixes errors. Steps 1-3 repeat for several rounds.
  4. Data requests have taken too long. Business teams make decisions long before data is available, or they ask for changes and restart the process.
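To make step 2 concrete, here is a minimal sketch of the kind of hand-written script IT might produce for the billing comparison. Everything in it is an assumption for illustration: the log format, the billing columns, and the function names are hypothetical and are not actual AWS formats.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log lines: timestamp, instance id, hours the server ran that day
RAW_LOG = """\
2016-11-01T00:00:00Z i-aaa 24
2016-11-02T00:00:00Z i-aaa 24
2016-11-01T00:00:00Z i-bbb 10
"""

# Hypothetical billing export: instance id and hours billed
RAW_BILL = """\
instance_id,billed_hours
i-aaa,48
i-bbb,24
"""


def usage_by_instance(log_text):
    """Sum the logged hours per instance from whitespace-separated log lines."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, instance_id, hours = line.split()
        totals[instance_id] += float(hours)
    return dict(totals)


def billed_by_instance(bill_text):
    """Read billed hours per instance from the CSV billing export."""
    reader = csv.DictReader(io.StringIO(bill_text))
    return {row["instance_id"]: float(row["billed_hours"]) for row in reader}


def find_discrepancies(log_text, bill_text):
    """Return billed-minus-used hours for every instance where they differ."""
    used = usage_by_instance(log_text)
    billed = billed_by_instance(bill_text)
    result = {}
    for instance_id in sorted(set(used) | set(billed)):
        diff = billed.get(instance_id, 0.0) - used.get(instance_id, 0.0)
        if diff != 0:
            result[instance_id] = diff
    return result


discrepancies = find_discrepancies(RAW_LOG, RAW_BILL)
print(discrepancies)
```

Every change the finance team asks for, such as a new column, a different grouping, or another log format, means another edit to a script like this one. That is where the repeated rounds in step 3 come from.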

[Figure: traditional data prep]

What’s wrong with this system? Complicated. No visibility. Time-consuming. Error-prone. Expensive.

Legacy infrastructure for data preparation was not designed to scale to the data and consumers of today’s information-driven world. This model relies on a small set of highly skilled IT data scientists and data developers to interpret business requirements and execute a highly prescribed, lengthy, waterfall process. These data scientists and data developers lack business context, so they spend extra time trying to understand the requirements.

My fellow attendee from Pinterest is experiencing the pains of this traditional approach. His finance team needs to combine these massive log files with the associated billing statements. What if his technical team enabled the finance analysts to work directly with data in an interactive environment instead of having to translate their requirements to someone else? Imagine a self-service architecture that enabled the finance team to perform the familiar tasks of data preparation (data integration, exploration, quality, cleansing, enrichment, and shaping) without code or scripts.

A modern architecture re-imagines the relationship between IT and business.

IT’s role changes:

  • Collect and centralize access to raw data
  • Provide the business with scalable infrastructure to drive self-service data prep and self-service analytics
  • Maintain full governance, manage scale and efficiency

[Figure: modern information pipeline]

A modern architecture requires an exploratory and interactive experience that analysts can understand. With a smart interface that is easy to use, business teams and data analysts accomplish their data prep goals on their own. Because the data is governed by IT, business teams will trust the data they have.

So what?

Companies spend millions of dollars investing in traditional processes that maintain slow, bottlenecked information pipelines. Their business teams lose time waiting for prepared data or struggling with data preparation without the appropriate tools.

The Paxata Adaptive Information Platform is a critical component of modernizing your organization.

  • Centralized access to data in a schema-on-read model
  • Elastic scale-out compute provided by Spark
  • Multi-user, multi-tenant model for business-scaled data prep
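For readers unfamiliar with the term, schema-on-read means raw data is stored as-is and a structure is applied only when someone reads it. The sketch below illustrates the general idea in plain Python. It is not Paxata's implementation, and the records, schemas, and function name are all hypothetical.

```python
import json

# Raw records are kept exactly as collected; nothing is enforced at write time.
RAW_RECORDS = [
    '{"instance": "i-aaa", "hours": "24", "region": "us-west-2"}',
    '{"instance": "i-bbb", "hours": "10"}',
]

# Each consumer declares the fields and types it needs (hypothetical schemas).
BILLING_SCHEMA = {"instance": str, "hours": float}
OPS_SCHEMA = {"instance": str, "region": str}


def read_with_schema(raw_records, schema):
    """Parse each raw record and project it onto the requested schema.

    Fields missing from a record come back as None instead of failing.
    """
    rows = []
    for raw in raw_records:
        record = json.loads(raw)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


billing_view = read_with_schema(RAW_RECORDS, BILLING_SCHEMA)
ops_view = read_with_schema(RAW_RECORDS, OPS_SCHEMA)
print(billing_view)
print(ops_view)
```

The same raw records serve two different views without being reloaded or reshaped up front, which is what lets one centralized store support many teams.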

The Paxata self-service data prep app gives business teams the freedom and flexibility to work directly with their data on a modern platform:

  • Consumer experience for intuitively interacting with data
  • Smart algorithms to guide data preparation best practices
  • Collaborative and secure environment to work with trusted data across teams

Our customers choose Paxata to build a modern information architecture. Business teams work quickly and independently with the exploratory, interactive experience enabled by IT with complete data governance, scale, and efficiency.

  • On December 9, 2016
Tags: analytics, BI, big data everywhere, converged data platform, data lake, data prep, ETL, Excel, governance, Hadoop, information pipeline, IT, mapR, self-service, Spark