Etleap Launches Snowflake Integration

I am pleased to announce our integration with Snowflake. Snowflake is the second data warehouse we support, alongside our existing Amazon Redshift data warehouse and S3/Glue data lake offerings.

Etleap lets you integrate all your company’s data into Snowflake, and transform and model it as necessary. The result is clean and well-structured data in Snowflake that is ready for high-performance analytics. Unlike traditional ETL tools, Etleap does not require engineering effort to create, maintain, and scale. Etleap provides sophisticated data error handling and comprehensive monitoring capabilities. Because it is delivered as a service, there is no infrastructure to maintain.

 

[Image: Etleap product graphic]

 

Like any other pipeline set up in Etleap, pipelines to Snowflake can extract from any of Etleap’s supported sources, including databases, web services, file stores, and event streams. Using Etleap’s interactive data wrangler, users have full control over how data is cleaned, structured, and de-identified before it is loaded into Snowflake. From there, Etleap’s native integration with Snowflake is designed to maximize flexibility for users in specifying attributes such as Snowflake schemas, roles, and cluster keys. Once the data is loaded, Etleap’s SQL-based modeling features can be used to further improve the usability and performance of the data for analytics.

Not only does Etleap’s integration with Snowflake provide a seamless user experience, it is also a natural fit technically. Etleap is built on AWS and stores extracted and transformed data in S3. Since Snowflake on AWS also stores its data in S3, loading data into Snowflake is fast and efficient. Architecturally, part of what differentiates Snowflake is its separate, elastic scaling of compute and storage resources. Etleap is built on the same principle, which lets it overcome traditional ETL bottlenecks by scaling storage and compute resources for extraction and transformation separately and elastically. By taking advantage of AWS building blocks, we are able to provide a powerful yet uncomplicated data analytics stack for our customers.

Etleap is devoted to helping teams build data warehouses and data lakes on AWS, and we offer both hosted and in-VPC deployment options. Like Snowflake, Etleap takes advantage of AWS services such as S3 and EC2 to provide performance and cost benefits not possible with traditional ETL solutions.

As more and more teams building analytics infrastructure on AWS want to use Snowflake as their data warehouse, offering support for Snowflake was a natural next step for us. 

If you would like to explore building a Snowflake data warehouse with Etleap, you can sign up for a demo here.

 

New Features: Models and History Tables

I’m excited to tell you about two new features we’re launching today: Models and History Tables.

 

Models

Etleap has long supported single-source transformations through data wrangling. This is great for cleaning, structuring, and filtering data, and for removing unwanted data, such as PII, before it is loaded to the destination. Today, we’re announcing the general availability of models, which enable transformations expressed as SQL queries. Two primary use cases for models are combining data from different sources to build data views optimized for analytics, and aggregating data to speed up analytics queries.

Etleap models are Redshift tables backed by SQL SELECT queries that you define, running against data that has been loaded to Redshift. Etleap creates tables that are the result of these SELECT queries, and updates these tables incrementally or through full refreshes. Etleap triggers updates based on changes to dependent tables, or on a schedule.


 

 

History Tables

For regular pipelines into Redshift, Etleap fetches new and updated records from the source. Following transformation, new rows are appended to the destination table, and updated rows are overwritten. This update strategy is known as type-1 Slowly Changing Dimensions in data warehouse speak.

Sometimes it’s useful to go back in time and query the past state of a record, or to investigate how a record has changed over time. Etleap now provides the ability to retain the history of a record collection, using the technique known as type-2 Slowly Changing Dimensions. Here’s how it works in Etleap: An end-date column is added to the table. When a record is initially inserted into the destination table, this column’s value is null. Whenever the record changes in the source, instead of overwriting the existing row in the destination table, Etleap appends a new row with a null end-date value and sets the existing row’s end-date value to the new row’s update timestamp.
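To make the end-date mechanics concrete, here is a minimal in-memory sketch of a type-2 update. It is not Etleap’s implementation: the `HistoryRow` shape, the `applyChange` and `asOf` helpers, and the sample values are all hypothetical, and a real warehouse would do this with SQL against the destination table.

```typescript
// Hypothetical row shape: `id` is the source primary key, `updatedAt` the
// source update timestamp, and `endDate` the added end-date column.
interface HistoryRow {
  id: number;
  value: string;
  updatedAt: Date;
  endDate: Date | null; // null marks the current version of the record
}

// Apply one new or changed source record to a history table held in memory.
function applyChange(
  table: HistoryRow[],
  change: { id: number; value: string; updatedAt: Date }
): HistoryRow[] {
  // Close the current version, if any, at the new record's update timestamp.
  const closed = table.map((row) =>
    row.id === change.id && row.endDate === null
      ? { ...row, endDate: change.updatedAt }
      : row
  );
  // Append the new version with a null end date.
  return [...closed, { ...change, endDate: null }];
}

// State of a record as of a point in time: the version whose validity
// interval [updatedAt, endDate) contains that time.
function asOf(table: HistoryRow[], id: number, at: Date): HistoryRow | undefined {
  return table.find(
    (row) =>
      row.id === id &&
      row.updatedAt.getTime() <= at.getTime() &&
      (row.endDate === null || at.getTime() < row.endDate.getTime())
  );
}

// Example: one record that is inserted and later updated in the source.
let history: HistoryRow[] = [];
history = applyChange(history, { id: 1, value: "trial", updatedAt: new Date("2018-01-01T00:00:00Z") });
history = applyChange(history, { id: 1, value: "paid", updatedAt: new Date("2018-02-01T00:00:00Z") });
```

After the two changes the table holds two rows for the record: the older one closed at the newer row’s update timestamp, and the newer one still open. Looking up the record with `asOf` for a date in mid-January returns the older version.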

Starting today, history tables are available for all pipelines from sources that have a primary key and an update timestamp. To get a history table, check the ‘retain history’ box during single or batch pipeline setup.

 

[Image: retain history wizard]

 

Want to see these features in action? Request a demo here!

Scaling Etleap with funding from First Round Capital, SV Angel, and more

Today we’re excited to share that we’ve raised $1.5M from First Round Capital, SV Angel, Liquid2, BoxGroup, and others to continue to scale our enterprise-grade ETL solution for building and managing cloud data warehouses.

ETL has traditionally been associated with expensive projects that take months of custom development by specialized engineers. We started Etleap because we believe in a world where analytics teams manage their own data pipelines, and IT teams aren’t burdened with complex ETL infrastructure and tedious operations.

Etleap runs in the cloud and requires no engineering work to set up, maintain, and scale. It helps companies drastically lower the cost and complexity of their ETL solution and improve the usefulness of their data.

Over the past few years we’ve spent a lot of time with analytics teams in order to understand their challenges and have built features for integration, wrangling, and modeling. It’s a thrill to see data-driven customers, including Airtable, Okta, and AXS, use them. Their analytics teams are pushing the boundaries of what’s possible today, and we’re hard at work building features to help bring their productivity to new levels.

 


 

Curious how Etleap can solve your analytics infrastructure challenges? Click here to get a demo of Etleap!

Building ETL Infrastructure that Analysts Love

This recorded session is from DataEngConf NYC 17. Slides are available on the event page.

There’s an often-quoted statistic that says that data analysts spend 80% of their time preparing data and only 20% actually analyzing it. There’s a lot that we as data engineers can do to help our analytics teams be more productive and spend less time worrying about data preparation. This session discusses common problems in data warehousing infrastructure from the point of view of analytics teams, and suggests practical solutions.

Watch the session video or read the key takeaways below. Continue reading “Building ETL Infrastructure that Analysts Love”

SVG in React

React.js is a great library for creating user interfaces consisting of components. In the browser, React is used to output DOM elements like divs, sections, and... SVG! The DOM supports SVG elements, so there is nothing stopping us from outputting them inline directly with React. This allows for easy creation of SVG components that are updated with props and state just like any other component.

Why SVG?

Even though a lot is possible with plain CSS, creating complex shapes like hearts or elephants is very difficult and requires a lot of code. This is because you are restricted to a limited set of primitive shapes that you have to combine to create more complex ones. SVG, on the other hand, is an image format and gives you a lot more flexibility in creating custom paths. This makes it much easier to create complex shapes, as you are free to draw any shape you want. If you need convincing, check out these slides from Sara Soueidan’s great talk about SVG UI components.

Our use

At Etleap we have used React with SVG output in some of our graphical components. A great example of this is our circular progress bar.

[Image: Circular progress bar used on our dashboard.]

This component uses SVG to display the circular progress bar and works just like any other React component. It accepts a few props, including the percentage value to display, and updates whenever new props are received. The reason we opted for SVG in this case was that creating a circular progress bar in CSS is tricky. SVG was a much better fit, and using React to output the SVG markup directly to the DOM was straightforward. Let’s compare the two approaches.

SVG Progress Bar

The essential SVG markup required to render the progress bar is very simple:


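<svg>
  {/* Rotate -90 degrees so the stroke starts at the top (assumes the circle is centered at 100, 100). */}
  <g transform="rotate(-90 100 100)">
    {/* Background ring */}
    <circle className="ProgressBarCircular-bar-background" r={radius} cx={posX} cy={posY} />
    {/* Progress ring: a single dash whose length encodes the percentage */}
    <circle className="ProgressBarCircular-bar" strokeDasharray={strokeDashArray} strokeLinecap="round" r={radius} cx={posX} cy={posY} />
  </g>
</svg>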

We need two circles, one for the dark background, and one for the lighter progress display. The circles are transparent, and the strokes of the circles show the progress and background. To show the amount of progress we use a dashed outline for the circle. If the gap between the first and second dash is at least the circumference of the circle, only one dash will be shown, and we can manipulate the length of that dash to show the current progress. We use stroke-dasharray to specify the length of each dash and the gap between dashes, and stroke-linecap: round to get rounded ends.
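As a rough sketch of the dash arithmetic, the stroke-dasharray value could be computed like this. The `progressDashArray` helper is hypothetical and not taken from our component; it only shows one way to turn a percentage and a radius into a single visible dash.

```typescript
// Compute a stroke-dasharray value that draws `percentage` of a circle's outline.
// Hypothetical helper for illustration only.
function progressDashArray(percentage: number, radius: number): string {
  const circumference = 2 * Math.PI * radius;
  // Keep the value in the 0-100 range so the dash never exceeds the outline.
  const clamped = Math.min(100, Math.max(0, percentage));
  const dash = (clamped / 100) * circumference;
  // A gap as long as the whole circumference guarantees a second dash never appears.
  return `${dash} ${circumference}`;
}

// Example: half of a circle with radius 50.
const halfway = progressDashArray(50, 50);
```

The resulting string can be passed straight to the circle’s stroke-dasharray attribute. One detail to watch for: with round line caps, a zero-length dash may still render as a small dot, so a component might want to hide the progress circle entirely at 0%.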

 

CSS Progress Bar

Let’s have a look at how we can create a similar progress bar in CSS:

Since CSS supports neither stroke-dasharray nor stroke-linecap, we are immediately at a disadvantage, so let’s simplify the problem and start by creating a pie chart. We create two circles here as well, one for the background and one for the progress bar. To display progress we need to be able to cut away part of the circle, so that we are left with a pie slice. To make this happen we can use the CSS clip property (unfortunately it has been deprecated, and the replacement clip-path has very poor browser support). This enables us to define a rectangular mask for the circle so that we can hide parts of it. The problem is that this only works for a maximum of 50% at a time, so we actually need two of these, one for the right half and one for the left. As you can see, this is already getting pretty complicated, and we have not even looked at how to handle the rounded edges. To avoid doubling the length of this post, we’ll stop here. If you are interested in the full solution (without rounded edges), check out this post by Anders Ingemann.

When to use SVG

SVG should not be a replacement for all graphical user interface elements, but it can be used to more easily achieve tricky UI effects where CSS falls short. The most important difference is that SVG supports custom paths. This means you can create any complex shape you want and easily display it, or use it as a mask. This is especially relevant in scenarios involving charts or line drawings. Other interesting features that CSS lacks include drawing text along a path, animating paths, and support for a bunch of filters. That being said, CSS is catching up with SVG and has gained support for several filters, masks, and even custom clip paths. For now though, if your designer has created some truly fancy UI effect that you would instinctively dismiss as impossible, perhaps it is a good time to look into SVG and make it a reality after all.

Reducing the size of your Webpack bundle

To ensure a great user experience it is important to keep the initial page load as fast as possible. There are two main ways of doing this: one is to reduce the number of file requests made when the site is loading, and the other is to reduce the size of the files. To automate this it is common to use a tool that combines all your JavaScript into one minified bundle file.

Continue reading “Reducing the size of your Webpack bundle”

Generating password reset tokens

There are a few requirements for a good password reset token:

  1. the user should be able to reset their password with the token they receive in an email
  2. the token should not be guessable
  3. the token should expire
  4. the user should not be able to re-use the token

Ideally, the web framework of your choice should already have a built-in way to generate reset tokens. However, we use Play and it does not provide a way to do that, so we have to roll our own.
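To illustrate the four requirements, here is a minimal sketch in TypeScript on Node.js. It is not the Play-based implementation this post goes on to describe: the `ResetTokenStore` class, its in-memory map, and the one-hour default lifetime are assumptions made for the example. Only `randomBytes` and `createHash` from Node’s `crypto` module are real library calls.

```typescript
import { randomBytes, createHash } from "crypto";

// Hypothetical record kept for each issued token.
interface ResetRecord {
  userId: string;
  expiresAt: number; // milliseconds since the epoch
  used: boolean;
}

// Store only a hash of the token, so a leaked store does not reveal usable tokens.
function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

class ResetTokenStore {
  private records = new Map<string, ResetRecord>();
  private ttlMs: number;

  constructor(ttlMs: number = 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Requirement 2: 32 bytes from a cryptographically secure generator are not guessable.
  issue(userId: string, now: number = Date.now()): string {
    const token = randomBytes(32).toString("hex");
    this.records.set(hashToken(token), { userId, expiresAt: now + this.ttlMs, used: false });
    return token; // Requirement 1: this value is what gets emailed to the user.
  }

  // Returns the user id if the token is valid, otherwise null.
  redeem(token: string, now: number = Date.now()): string | null {
    const record = this.records.get(hashToken(token));
    if (!record) return null; // unknown token
    if (now >= record.expiresAt) return null; // Requirement 3: expired
    if (record.used) return null; // Requirement 4: already used
    record.used = true;
    return record.userId;
  }
}
```

A production version would persist the records in a database rather than a map, but the checks in `redeem` would stay the same.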

Continue reading “Generating password reset tokens”

Distributed CSV Parsing

tl;dr: This post is about how to split and process CSV files in pieces! Newline characters within fields make it tricky, but with the help of a finite-state machine it’s possible to work around that in most real-world cases.

Comma-separated values (CSV) is perhaps the world’s most common data exchange format. It’s human-readable, it’s compact, and it’s supported by pretty much any application that ingests data. At Etleap we frequently encounter really big CSV files that would take a long time to process sequentially. Since we want our clients’ data pipelines to have minimal latency, we split these files into pieces and process them in a distributed fashion.
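To show why newlines inside fields matter, here is a small sketch of a two-state machine that finds record boundaries while scanning a CSV string from the beginning. The function names and the assumptions about the dialect (double-quoted fields, `""` as an escaped quote) are ours for illustration. The sketch also sidesteps the hard part of the distributed case, where a worker starting in the middle of a file does not know which state it is in.

```typescript
type State = "unquoted" | "quoted";

// Return the offsets just past each newline that ends a record,
// i.e. each newline that is not inside a quoted field.
function recordBoundaries(csv: string): number[] {
  const boundaries: number[] = [];
  let state: State = "unquoted";
  for (let i = 0; i < csv.length; i++) {
    const ch = csv[i];
    if (ch === '"') {
      // An escaped quote ("") inside a quoted field toggles the state twice,
      // which leaves it unchanged.
      state = state === "quoted" ? "unquoted" : "quoted";
    } else if (ch === "\n" && state === "unquoted") {
      boundaries.push(i + 1);
    }
  }
  return boundaries;
}

// Cut the input into chunks of at least `targetSize` characters,
// splitting only at record boundaries.
function splitIntoChunks(csv: string, targetSize: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  for (const boundary of recordBoundaries(csv)) {
    if (boundary - start >= targetSize) {
      chunks.push(csv.slice(start, boundary));
      start = boundary;
    }
  }
  // Keep any trailing data that did not reach the target size.
  if (start < csv.length) chunks.push(csv.slice(start));
  return chunks;
}
```

Splitting naively on every newline would cut a record such as `1,"line one\nline two"` in half. Tracking the quoted state keeps it in one piece.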

Continue reading “Distributed CSV Parsing”

TypeScript at Etleap

TypeScript has been getting significant attention in the past year, and with over 2 million downloads per month on npm, there has undoubtedly been an increase in adoption. However, many people are still unsure whether TypeScript will benefit their project, and there are few resources that show how TypeScript can be used in large projects and what the practical benefits are. In this post we aim to highlight how we use TypeScript at Etleap so that people can get an impression of why we decided to use it and how we benefit from it.

Continue reading “TypeScript at Etleap”