IT execs are debating whether ETL (Extract, Transform and Load) is still relevant in a big data and cloud-based data warehouse world. Let’s take a look.
During the last four decades, ETL (extract, transform, and load) has been a mainstay method for organizations that need to move data from source systems into a data warehouse or other data repository for analytics.
ETL extracts raw data from disparate source systems (e.g., CRM software, inventory software, e-commerce applications, web analytics), transforms it into a format suitable for querying and analysis, and finally loads it into a target system, typically a data warehouse but potentially any data repository. For an overview of ETL and a collection of useful resources, check out this ETL wiki.
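To make the three stages concrete, here is a minimal Python sketch using only the standard library. Everything in it is hypothetical: the source systems are stubbed as in-memory lists, the field names and the `fact_orders` table are invented for illustration, and an in-memory SQLite database stands in for the data warehouse. The point it shows is the ordering that defines ETL: data is cleaned and conformed *before* it is loaded.

```python
import sqlite3
from decimal import Decimal

# Hypothetical raw records from two source systems with different field names and formats.
CRM_ROWS = [
    {"CustomerEmail": " Alice@Example.com ", "Region": "emea"},
    {"CustomerEmail": "bob@example.com", "Region": "NA"},
]
SHOP_ROWS = [
    {"email": "alice@example.com", "order_total": "19.99"},
    {"email": "BOB@example.com", "order_total": "5.00"},
    {"email": "bob@example.com", "order_total": "7.50"},
]


def extract():
    """Extract: pull raw records from each source (stubbed here with in-memory lists)."""
    return CRM_ROWS, SHOP_ROWS


def transform(crm_rows, shop_rows):
    """Transform: clean and conform the data to one schema before it reaches the target."""
    regions = {r["CustomerEmail"].strip().lower(): r["Region"].upper() for r in crm_rows}
    orders = []
    for row in shop_rows:
        email = row["email"].strip().lower()
        # Parse money with Decimal and store integer cents to avoid float rounding errors.
        cents = int(Decimal(row["order_total"]) * 100)
        orders.append((email, regions.get(email, "UNKNOWN"), cents))
    return orders


def load(conn, orders):
    """Load: write the already-transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (email TEXT, region TEXT, total_cents INTEGER)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", orders)
    conn.commit()


def run_etl(conn):
    crm_rows, shop_rows = extract()
    load(conn, transform(crm_rows, shop_rows))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    query = "SELECT region, SUM(total_cents) FROM fact_orders GROUP BY region ORDER BY region"
    print(conn.execute(query).fetchall())  # [('EMEA', 1999), ('NA', 1250)]
```

Because the target only ever receives conformed rows, a BI tool pointed at `fact_orders` never sees the inconsistent casing or text-typed amounts of the sources. The cost is that the transform step must finish before anyone can query the data.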
Many ETL tools are available to help run an efficient ETL process; alternatively, some enterprises hire developers to hand-code their ETL logic. Meanwhile, a debate rages in IT circles over whether ETL is even relevant anymore in a world of big data and cloud-based data warehouses.
Let’s discuss whether ETL is still relevant in 2018 and look at some of the ETL alternatives available now.
ETL and modern data analytics
Statistics company Statista estimates that the Hadoop market will grow from $6 billion in 2015 to $50 billion by 2020. Big data as a whole will grow from $27 billion to $100 billion over the same period. The argument that ETL is outdated emerged partly from the huge growth of Hadoop and other big data platforms. Is there a need to ETL data when organizations can just dump it in Hadoop and analyze it there? After all, large storage systems such as Hadoop let organizations store and analyze huge volumes of both structured and unstructured data in one place, seemingly making ETL redundant.
However, making sense of raw data as it sits in Hadoop clusters requires considerable skill and specialized data knowledge. Most professionals who use BI or reporting tools want to connect those tools to a well-defined data model, with clean data that conforms to business terminology.
Data scientists can, of course, glean insights and trends from raw data. BI analysts, however, arguably still need ETL tools so that they can do their jobs and deliver value to the business without first wrestling with difficult data exploration.
Cloud-based data warehouses
Another trend affecting the relevance of ETL is the emergence of cloud-based data warehouse systems as a replacement for on-premise systems. According to a 2017 data warehouse report, 80% of the data warehouse tools used by organizations are now cloud-based rather than on-premise, and 61% of respondents were not using any ETL tool at all.
Cloud-based data warehouse providers such as AWS and Microsoft Azure use a network of remote servers and computing resources in the cloud to provide data warehouse functionality. Given the power of these systems, some experts believe that ETL is now unnecessary: enterprises can take raw data from source systems and load it straight into the data warehouse.
While the argument has merit, it is incorrect to say that ETL is redundant. Not all enterprises use cloud-based data warehouses, so ETL still has a role to play in legacy systems. As cloud adoption increases, ETL will become less relevant over time, but it is not yet outdated for every use case.
Some ETL alternatives
A variation called “extract, load, and transform” (ELT) is becoming the favored alternative, as it suits many modern use cases better than ETL. In ELT, data is extracted from source systems and loaded, still raw, into the target system, which is typically a cloud-based data warehouse used for BI.
The key difference is that raw data is transformed only within the target system, on an as-needed basis: when someone wants to query the data, it is transformed for that purpose. The main benefit of ELT over ETL is shorter waiting times, since data is available as soon as it is loaded rather than after a separate transformation stage.
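For contrast with ETL, here is a minimal ELT sketch, again assuming an in-memory SQLite database as a stand-in for a cloud data warehouse. The `raw_orders` table, the `orders_by_customer` view, and the sample rows are hypothetical. Raw data is loaded untouched, and the transformation lives inside the target as a SQL view, so it runs only when someone queries it.

```python
import sqlite3

# Hypothetical raw export from an e-commerce system, loaded without any cleanup.
RAW_ORDERS = [
    (" Alice@Example.com ", "19.99"),
    ("BOB@example.com", "5.00"),
    ("bob@example.com", "7.50"),
]


def extract_and_load(conn):
    """Extract + load: copy source rows into a raw table exactly as they arrive."""
    conn.execute("CREATE TABLE raw_orders (email TEXT, order_total TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", RAW_ORDERS)
    conn.commit()


def define_transform(conn):
    """Transform: declared inside the target as a view, so it runs at query time."""
    conn.execute(
        """
        CREATE VIEW orders_by_customer AS
        SELECT LOWER(TRIM(email)) AS email,
               SUM(CAST(ROUND(CAST(order_total AS REAL) * 100) AS INTEGER)) AS total_cents
        FROM raw_orders
        GROUP BY LOWER(TRIM(email))
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn)
    define_transform(conn)
    # Raw data is queryable as soon as it lands.
    print(conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone())  # (3,)
    # The cleanup happens only now, when the cleaned view is requested.
    print(conn.execute("SELECT * FROM orders_by_customer ORDER BY email").fetchall())
    # [('alice@example.com', 1999), ('bob@example.com', 1250)]
```

A plain view recomputes the transformation on every query, which mirrors the "transform as needed" idea above; in practice, teams may also materialize frequently used transformations as tables inside the warehouse to trade freshness for speed.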
Integration platform as a service (iPaaS) offers another viable ETL alternative. iPaaS solutions can provide real-time data integration to meet the on-demand analytic needs of modern BI analysts. These services use cloud resources to connect disparate cloud-based and on-premise systems quickly, without complex coding or separate ETL tools.
In the end, it’s not the end of ETL…yet
The emergence of new technologies and powerful cloud-based systems has not yet sounded the death knell for ETL. There are still uses for ETL software, particularly for enterprises using on-premise data warehouses and for professionals who need to work with data that has been transformed and conformed to business terminology they actually understand.