By Farnaz Erfan
Excel has long been the tool business analysts reach for to perform lightweight data preparation tasks – identifying outliers and errors, aggregating values, and combining data into a single spreadsheet for analysis. All too often, however, business users waste time manually profiling and processing data in Excel.
The truth is that Excel is inadequate for enterprise projects that involve large-scale data sets, require group collaboration, and demand data accuracy on tight timelines.
Among many shortcomings, there are three areas where Excel's limitations are – to put it nicely – limiting, and far too time consuming, for data preparation at scale:
1) Interacting with Data Beyond 1 Million Rows: Excel caps each worksheet at roughly one million rows (1,048,576). Even below that ceiling, the larger the number of rows, the slower Excel gets and the greater the chance of it crashing – taking all of the user's unsaved changes down with it.
2) Data Profiling: To profile data in Excel, users typically create filters and pivot tables – but problems arise when a column contains thousands of distinct values, or duplicates caused by inconsistent spellings. And because Excel filters give no visual summary of each value's frequency, the user must switch back and forth between pivot tables and filtered views to gain even a partial understanding of the data.
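A small sketch of what column profiling looks like outside Excel, using pandas with made-up state abbreviations: frequency counts surface the distinct values at a glance, and a simple normalization collapses the spelling variants that Excel filters would list as separate entries.

```python
import pandas as pd

# Illustrative column with duplicates hiding behind spelling variants.
df = pd.DataFrame({"state": ["CA", "ca", "California", "NY", "N.Y.", "NY"]})

# Raw profile: every distinct spelling shows up separately.
print(df["state"].value_counts())

# Normalizing case and punctuation collapses the variants,
# revealing the true cardinality of the column.
cleaned = df["state"].str.upper().str.replace(".", "", regex=False)
print(cleaned.value_counts())
```

Here the raw column reports five distinct values; after normalization it reports three, making the duplicate spellings immediately visible.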
3) Data Governance and Trust: Excel provides no real audit trail or data lineage. You cannot see the steps taken to cleanse a particular dataset without spending time deciphering complex macros – and even then, you must save every version of the file and add comments to mark significant changes.
These requirements and more demonstrate where data preparation with Excel falls well short of 'enterprise' readiness.
Download our eBook, 10 Reasons Not to Use Excel for Data Preparation in the Enterprise, to learn more.