New Features: Models and History Tables

I’m excited to tell you about two new features we’re launching today: Models and History Tables.

 

Models

Etleap has long supported single-source transformations through data wrangling. This is great for cleaning, structuring, and filtering data, and for removing unwanted data, such as PII, before it is loaded to the destination. Today, we’re announcing the general availability of models, which enable transformations expressed as SQL queries. Two primary use cases for models are combining data from different sources to build data views optimized for analytics, and aggregating data to speed up analytics queries.

Etleap models are Redshift tables backed by SQL SELECT queries that you define, running against data that has been loaded to Redshift. Etleap creates tables that are the result of these SELECT queries, and updates these tables incrementally or through full refreshes. Etleap triggers updates based on changes to dependent tables, or on a schedule.
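To make the idea concrete, here is a minimal sketch of a model as a table backed by a SELECT query and rebuilt through a full refresh. It is an illustration only, not Etleap's implementation: it uses Python's built-in sqlite3 module as a stand-in for Redshift, and the `users`, `orders`, and `revenue_by_plan` tables, their columns, and the `full_refresh` helper are all hypothetical names.

```python
import sqlite3

# Hypothetical model definition: a SELECT that joins two loaded sources and
# aggregates them. Table and column names are invented for illustration.
MODEL_QUERY = """
    SELECT u.plan AS plan, COUNT(*) AS order_count, SUM(o.amount) AS revenue
    FROM orders o
    JOIN users u ON u.id = o.user_id
    GROUP BY u.plan
"""


def full_refresh(conn, model_name, select_sql):
    """Rebuild the model table from scratch from its SELECT query."""
    with conn:  # commit on success, roll back on error
        conn.execute(f"DROP TABLE IF EXISTS {model_name}")
        conn.execute(f"CREATE TABLE {model_name} AS {select_sql}")


# SQLite stands in for the warehouse here; the source tables play the role
# of data that has already been loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'free'), (2, 'pro');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 20.0), (12, 2, 30.0);
""")

full_refresh(conn, "revenue_by_plan", MODEL_QUERY)
rows = conn.execute(
    "SELECT plan, order_count, revenue FROM revenue_by_plan ORDER BY plan"
).fetchall()
print(rows)
```

A full refresh is the simplest update strategy to show. An incremental update would recompute only the rows affected by new data, which this sketch does not attempt.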

History Tables

For regular pipelines into Redshift, Etleap fetches new and updated records from the source. Following transformation, new rows are appended to the destination table, and updated rows are overwritten. This update strategy is known as type-1 Slowly Changing Dimensions in data warehouse speak.

Sometimes it’s useful to go back in time and query the past state of a record, or to investigate how a record has changed over time. Etleap now provides the ability to retain the history of a record collection, using the technique known as type-2 Slowly Changing Dimensions. Here’s how it works in Etleap: an end-date column is added to the table. When a record is initially inserted into the destination table, this column’s value is null. Whenever the record changes in the source, a new row with a null end-date value is appended instead of overwriting the existing row in the destination table, and the existing row’s end-date value is set to the new row’s update timestamp.
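The update rule described above can be sketched in a few lines of Python. This is a toy in-memory version, not Etleap's code: the history table is a plain list of dicts, and the column names `id`, `updated_at`, and `end_date`, as well as the `apply_change` and `as_of` helpers, are assumptions made for the example.

```python
from datetime import datetime


def apply_change(history, key, values, updated_at):
    """Apply one source change to a type-2 history table (a list of dict rows)."""
    for row in history:
        if row["id"] == key and row["end_date"] is None:
            # Close the current version instead of overwriting it.
            row["end_date"] = updated_at
    # Append the new version; a null end date marks it as current.
    history.append({"id": key, **values, "updated_at": updated_at, "end_date": None})


def as_of(history, key, when):
    """Return the version of a record that was current at time `when`, if any."""
    for row in history:
        if row["id"] != key:
            continue
        started = row["updated_at"] <= when
        not_ended = row["end_date"] is None or when < row["end_date"]
        if started and not_ended:
            return row
    return None


history = []
apply_change(history, 1, {"status": "trial"}, datetime(2018, 1, 1))
apply_change(history, 1, {"status": "paid"}, datetime(2018, 3, 1))

# Both versions are retained; only the latest has a null end date.
print(as_of(history, 1, datetime(2018, 2, 1))["status"])
print(as_of(history, 1, datetime(2018, 4, 1))["status"])
```

Querying the past state of a record then amounts to finding the row whose update timestamp and end date bracket the moment of interest, which is what `as_of` does.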

Starting today, history tables are available for all pipelines from sources that have a primary key and an update timestamp. To get a history table, check the ‘retain history’ box during single or batch pipeline setup.

 

Want to see these features in action? Request a demo here!

Scaling Etleap with funding from First Round Capital, SV Angel, and more

Today we’re excited to share that we’ve raised $1.5M from First Round Capital, SV Angel, Liquid2, BoxGroup, and others to continue to scale our enterprise-grade ETL solution for building and managing cloud data warehouses.

ETL has traditionally been associated with expensive projects that take months of custom development by specialized engineers. We started Etleap because we believe in a world where analytics teams manage their own data pipelines, and IT teams aren’t burdened with complex ETL infrastructure and tedious operations.

Etleap runs in the cloud and requires no engineering work to set up, maintain, and scale. It helps companies drastically lower the cost and complexity of their ETL solution and improve the usefulness of their data.

Over the past few years we’ve spent a lot of time with analytics teams in order to understand their challenges and have built features for integration, wrangling, and modeling. It’s a thrill to see data-driven customers, including Airtable, Okta, and AXS, use them. Their analytics teams are pushing the boundaries of what’s possible today, and we’re hard at work building features to help bring their productivity to new levels.

 


 

Curious how Etleap can solve your analytics infrastructure challenges? Click here to get a demo of Etleap!

Distributed CSV Parsing

tl;dr: This post is about how to split and process CSV files in pieces! Newline characters within fields make it tricky, but with the help of a finite-state machine it’s possible to work around that in most real-world cases.

Comma-separated values (CSV) is perhaps the world’s most common data exchange format. It’s human-readable, it’s compact, and it’s supported by pretty much any application that ingests data. At Etleap we frequently encounter really big CSV files that would take a long time to process sequentially. Since we want our clients’ data pipelines to have minimal latency, we split these files into pieces and process them in a distributed fashion.
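As a rough illustration of why newlines inside fields matter, here is a small sketch of a two-state machine that tracks whether the scanner is inside a quoted field, and only treats a newline as a record boundary when it is not. It is not necessarily the approach taken in the full post: it assumes double-quote quoting with doubled quotes as escapes, and it scans from the start of the data so the initial state is known. The helper names `record_boundaries` and `split_records` are hypothetical.

```python
import csv
import io


def record_boundaries(data, quote='"', newline="\n"):
    """Yield offsets just past each newline that ends a record."""
    in_quotes = False
    for i, ch in enumerate(data):
        if ch == quote:
            # An escaped quote ("") toggles twice and leaves the state unchanged.
            in_quotes = not in_quotes
        elif ch == newline and not in_quotes:
            yield i + 1


def split_records(data, target_size):
    """Cut data into pieces of at least target_size characters, on record boundaries."""
    chunks, start = [], 0
    for end in record_boundaries(data):
        if end - start >= target_size:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


data = 'id,note\n1,"line one\nline two"\n2,plain\n'
chunks = split_records(data, 10)

# Each piece can be parsed on its own because no record is cut in half.
for chunk in chunks:
    print(list(csv.reader(io.StringIO(chunk, newline=""))))
```

A naive split on every newline would cut the second record in two. The harder problem, deciding the quoting state when a worker starts reading in the middle of a file, is not handled by this sketch.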
