AWS re:Invent 2019 Roundup

Materialized Views, Amazon Redshift Ready, and more!

Last week Etleap put on another exciting show at AWS re:Invent, where we announced some new features and integrations with AWS services, were interviewed by the tech experts over at “theCUBE,” hosted a session all about data lakes, and most importantly, spoke with countless attendees about ETL. Here’s a roundup of all the Etleap action you may have missed at AWS re:Invent 2019.


Etleap’s booth was a veritable oasis of ETL discussion and Etleap product demos

Amazon Redshift launches Materialized Views with help from Etleap

Among AWS’ numerous announcements at re:Invent this year was the availability of Materialized Views in preview on Amazon Redshift. The Materialized Views feature is designed to help customers achieve up to 100x faster query performance on analytical workloads such as dashboarding queries from Business Intelligence (BI) tools and ELT data processing. Etleap helped launch this feature by integrating it into a beta version of Etleap Models (let us know if you want to be included in the beta!) and showing that it can give an ~8x performance boost. The Redshift team showcased our results in their chalk talk on “Accelerating performance with Materialized Views.”


Yannis (seated, left) and Vuk (standing, right) from the Amazon Redshift team showcase Etleap at their Redshift Materialized Views Chalk Talk

“We are delighted to have Etleap help launch the Materialized Views feature in Amazon Redshift,” said Andi Gutmans, Vice President, Analytics, Amazon Web Services, Inc. “Amazon Redshift Materialized Views allow customers to realize a significant boost in query performance in ETL pipelines and BI dashboards. By integrating Etleap with this new functionality, customers can seamlessly get the benefits of Amazon Redshift Materialized Views without needing to make any application changes.”

You can read the full Etleap press release about Amazon Redshift Materialized Views here.

Etleap Founder makes the case for more analyst-friendly data lakes, alongside Redshift team

Many Etleap customers use our solution to build their S3/Glue data lakes, so data lakes are a topic we’ve learned a thing or two about over the years. For re:Invent this year, we thought we’d share our data lake expertise with the world by hosting a session alongside the Redshift team entitled “Five data lake considerations with Amazon Redshift, Amazon S3 & AWS Glue.”


Etleap founder and CEO, Christian Romming, led the session focused on data lakes

Have an interest in data lakes yourself? You can check out the session here.

Etleap featured on enterprise tech talk show

After our data lakes session, Etleap Founder and CEO Christian Romming sat down with the hosts of “theCUBE,” re:Invent’s resident technology interview show. Check it out:

Etleap founder sits down with David Vellante and John Walls of theCUBE

Etleap achieves Amazon Redshift Ready Designation

Distinguishing ourselves in the Amazon Redshift partner ecosystem, we announced that Etleap has achieved the “Amazon Redshift Ready” designation, a recently introduced status for partners that have demonstrated integration with Amazon Redshift.

Etleap was featured in the keynote announcement among a select few debuting partners

“Etleap is proud to achieve Amazon Redshift Ready status,” said Christian Romming, Founder and CEO of Etleap. “Our team is dedicated to helping companies achieve maintenance-free, enterprise-grade ETL by leveraging the agility, breadth of services, and pace of innovation that AWS provides. Our status as an Amazon Redshift Ready partner shows our continued commitment to Amazon Redshift and the AWS ecosystem.”

You can read the full Etleap press release covering the Amazon Redshift Ready announcement here.


This concludes our roundup of the biggest Etleap news stories from AWS re:Invent 2019. Stay tuned for more Etleap trade show news, and for all things ETL, you’re already in the right place.

Etleap Achieves Amazon Redshift Ready designation

Recently-announced designation distinguishes Etleap on the Redshift platform

SAN FRANCISCO, Calif. – December 4, 2019 — Etleap announced today that it has achieved the Amazon Redshift Ready designation. This designation recognizes that Etleap has demonstrated successful integration with Amazon Redshift. 

Achieving the Amazon Redshift Ready designation differentiates Etleap as an AWS Partner Network (APN) member with a product that integrates with Amazon Redshift and is generally available and fully supported for AWS customers. AWS Service Ready Partners have demonstrated success building products integrated with AWS services, helping AWS customers evaluate and use their technology productively, at scale, and at varying levels of complexity. 

“Etleap is proud to achieve Amazon Redshift Ready status,” said Christian Romming, Founder and CEO of Etleap. “Our team is dedicated to helping companies achieve maintenance-free, enterprise-grade ETL by leveraging the agility, breadth of services, and pace of innovation that AWS provides. Our status as an Amazon Redshift Ready partner shows our continued commitment to Amazon Redshift and the AWS ecosystem.”

To support the seamless integration and deployment of these solutions, AWS established the AWS Service Ready Program to help customers identify products integrated with AWS services, so they can spend less time evaluating new tools and more time scaling their use of products that are integrated with AWS services.

Etleap is analyst-friendly ETL-as-a-service for Amazon Redshift and Snowflake data warehouses and Amazon S3/AWS Glue data lakes. Etleap replaces time-consuming ETL setup and maintenance with intuitive software and a managed service that automates data pipelines and reduces time to value.

For more information, email info@etleap.com, follow us on Twitter @etleap, or like us on Facebook @etleap.


About Etleap: Etleap was founded by Christian Romming in 2013. Before founding Etleap, Romming was the CTO of an ad-tech company, where he recognized that the available solutions for building data pipelines required monumental engineering resources to implement, maintain, and scale. Etleap is backed by world-class investment firms First Round Capital, SV Angel, BoxGroup, and Y Combinator. Our mission is to make data analytics teams more productive. Our ETL solution lets analysts build data warehouses without internal IT resources or knowledge of complex scripting languages. This reduces the time of typical ETL projects from weeks to hours, and takes out the pain of maintaining data pipelines over time.

What is the “length” of a string?

Finding the length of a string in JavaScript is simple: you use the .length property and that’s it, right?

Not so fast. The “length” of a string may not be exactly what you expect. It turns out that the string length property is the number of UTF-16 code units in the string, not the number of characters (or, more specifically, graphemes) as we might expect. For example, “😃” has a length of 2, and “👱‍♂️” has a length of 5!
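To make those two numbers concrete, the snippet below spells both strings out with escape sequences so each code unit is visible. It is a minimal sketch that assumes any ES2015+ engine (for the \u{...} escape syntax).

```javascript
// 😃 is a single code point above U+FFFF, stored as two UTF-16 code units.
const smiley = "\u{1F603}";
// 👱‍♂️ is 👱 (two code units) + zero width joiner + ♂ + variation selector (one code unit each).
const manBlond = "\u{1F471}\u200D\u2642\uFE0F";

console.log(smiley.length);   // 2
console.log(manBlond.length); // 5
```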

Screenshot from Etleap’s data wrangler where the column width depends on the column contents.

In our application we have a data wrangler that lets you view a sample of your data in a tabular format. Since this table supports infinite scrolling, both rows and columns are rendered on demand as you scroll vertically or horizontally. We can’t render all the rows and columns at once since a table could easily include more than a hundred thousand cells, which would bring the browser to its knees.

Imagine that most rows of a column contain a small amount of data, such as a single word, but a single row contains more data, such as a sentence. If this row is outside the currently viewed area, we don’t want the column to expand as you scroll down, and we definitely don’t want to cram the sentence into the same small space that’s required by the word. This means that we need to find the widest cell in the column before rendering any of its cells. It’s fast and straightforward to find the length of the content in each cell, but what if the cell contains emojis or other content where we can’t rely on the length property to give us an accurate value?
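As a hypothetical strawman (not Etleap’s code), here is the naive approach of picking the widest cell by .length, and how emoji content can fool it. The helper name widestCellByLength is made up for illustration.

```javascript
// Strawman: treat the cell with the largest .length as the widest one.
function widestCellByLength(cells) {
  let widest = "";
  for (const cell of cells) {
    if (cell.length > widest.length) widest = cell;
  }
  return widest;
}

const manBlond = "\u{1F471}\u200D\u2642\uFE0F"; // 👱‍♂️, length 5
const column = ["wrangler", manBlond + manBlond];

// "wrangler" has length 8 and the two emojis have length 10,
// so the emoji cell "wins" even though it draws only two symbols.
console.log(widestCellByLength(column) === column[1]); // true
```

In most fonts, two emoji glyphs take up less room than eight letters, so the naive choice is likely the wrong one here.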

Code units vs. code points

Let’s do a quick Unicode recap. Each character in Unicode is identified by a unique code point, a number between 0 and 0x10FFFF (usually written U+0000 to U+10FFFF). Unfortunately, 0x10FFFF is a large number: it needs 21 bits, so storing every character at a fixed width would take 4 bytes per character. To avoid allocating 4 bytes for each character, Unicode also specifies several encoding forms, including UTF-16, which is the internal string encoding used by JavaScript.
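A small sketch of inspecting code points with standard ES2015 methods. isInCodePointRange is a hypothetical helper; it relies on String.fromCodePoint throwing a RangeError for numbers outside 0 to 0x10FFFF.

```javascript
// codePointAt (ES2015) reads a whole code point, even when it spans two code units.
const smileyCodePoint = "😃".codePointAt(0);
console.log(smileyCodePoint.toString(16)); // 1f603

// The largest code point, U+10FFFF, needs 21 bits.
console.log((0x10ffff).toString(2).length); // 21

// Hypothetical helper: String.fromCodePoint rejects values outside 0..0x10FFFF.
function isInCodePointRange(n) {
  try {
    String.fromCodePoint(n);
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

console.log(isInCodePointRange(0x10ffff)); // true
console.log(isInCodePointRange(0x110000)); // false
```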

UTF-16 is a variable-length encoding, which means that it uses either 2 or 4 bytes for each code point, depending on what is required. Put differently, UTF-16 uses one or two 16-bit code units to represent one Unicode code point. The most commonly used characters (those in the Basic Multilingual Plane, U+0000 to U+FFFF) all fit into one code unit, but some of the more exotic characters, such as most emojis, require two code units, known as a surrogate pair.
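The split into one or two code units follows a fixed formula from the UTF-16 definition: subtract 0x10000, then spread the remaining 20 bits over a high surrogate (0xD800 to 0xDBFF) and a low surrogate (0xDC00 to 0xDFFF). toUtf16CodeUnits is an illustrative sketch of that formula, not a function JavaScript provides.

```javascript
// Illustrative sketch: encode one code point as UTF-16 code units.
function toUtf16CodeUnits(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError(`Not a code point: ${codePoint}`);
  }
  if (codePoint <= 0xffff) {
    return [codePoint]; // Basic Multilingual Plane: a single code unit
  }
  const offset = codePoint - 0x10000;     // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits -> low surrogate
  return [high, low];
}

const units = toUtf16CodeUnits(0x1f603); // 😃
console.log(units.map((u) => u.toString(16))); // [ 'd83d', 'de03' ]
```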

This is where a problem arises. Since the .length property returns the number of code units, and not the number of code points, it does not directly map to what you may expect. For example, the emoji “😃” has a length of 2, even though it is a single code point and looks like a single character.
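Because .length counts code units, counting code points by hand means treating each high surrogate followed by a low surrogate as one unit. countCodePointsManually is a hypothetical helper written only to show the mechanics; the ES2015 iterator approach described next is the simpler option.

```javascript
// Hypothetical helper: count code points by pairing surrogates manually.
function countCodePointsManually(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes a single code point.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) i++; // skip the low half
    }
    count++;
  }
  return count;
}

console.log("😃".length, countCodePointsManually("😃")); // 2 1
console.log(countCodePointsManually("\u{1F471}\u200D\u2642\uFE0F")); // 4
```

A lone surrogate, which JavaScript strings are allowed to contain, is counted as one code point here.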

How can we work around this? ES2015 introduced a way of splitting a string into its code points by making strings iterable: the string iterator yields one code point at a time. Both Array.from and the spread operator ([...string]) use this iterator internally, so either can be used to get the length of a string in code points.
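A quick sketch of the ES2015 approach. countCodePoints is a hypothetical one-liner; Array.from, the spread syntax, and for...of are standard and all go through the same string iterator.

```javascript
const manBlond = "\u{1F471}\u200D\u2642\uFE0F"; // 👱‍♂️

// Spread and Array.from both use the string iterator, which yields code points.
const countCodePoints = (str) => [...str].length;
console.log("😃".length, countCodePoints("😃")); // 2 1
console.log(Array.from(manBlond).length);       // 4

// for...of uses the same iterator, one code point per step.
for (const ch of manBlond) {
  console.log(ch.codePointAt(0).toString(16)); // 1f471, 200d, 2642, fe0f (one per line)
}
```

Note that this still reports 4 for “👱‍♂️”, which is why code points alone are not the whole story.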

Combining Characters

It turns out that code points are not the only caveat regarding string lengths in JavaScript. Another is combining characters. A combining character is a character that doesn’t stand on its own, but rather modifies the character that precedes it. This is supported in Unicode, meaning that a character such as “è” can be made up of two code points, “e” and “\u0300” (a combining grave accent), even though it also exists as the single precomposed code point “\u00E8”. A related technique is widely used to combine emojis into a new representation, such as “👱‍♂️”, which is a combination of “👱” and “♂” (followed by the variation selector \uFE0F) with a zero width joiner (\u200D) in between.
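The snippet below shows both cases with standard APIs. String.prototype.normalize is built in; note that NFC normalization only merges sequences that have a precomposed form, so it does nothing for emoji zero width joiner sequences.

```javascript
const precomposed = "\u00E8"; // è as one code point
const decomposed = "e\u0300"; // e + combining grave accent

console.log(precomposed.length, decomposed.length);       // 1 2
console.log(precomposed === decomposed);                   // false
// NFC composes "e" + U+0300 into the precomposed "è".
console.log(decomposed.normalize("NFC") === precomposed);  // true

// 👱‍♂️: four code points glued together by a zero width joiner.
const manBlond = "\u{1F471}\u200D\u2642\uFE0F";
console.log([...manBlond].map((c) => "U+" + c.codePointAt(0).toString(16).toUpperCase()));
// [ 'U+1F471', 'U+200D', 'U+2642', 'U+FE0F' ]
console.log(manBlond.normalize("NFC").length); // 5, normalization leaves it alone
```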

Working around this is more complicated. Currently there is no built-in way of reliably counting graphemes in JavaScript. A current stage 2 proposal suggests adding Intl.Segmenter, which can split a string into graphemes so that they can be counted; however, there’s no guarantee that it will make it into the spec (there’s a polyfill for the proposal if you’re desperate).
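A minimal grapheme counter, assuming a runtime that provides Intl.Segmenter (it has since shipped in current versions of major browsers and Node.js) and falling back to code points otherwise. countGraphemes is a hypothetical helper name, and the fallback over-counts combining sequences.

```javascript
// Hypothetical helper: count extended grapheme clusters (user-perceived characters).
function countGraphemes(str, locale = "en") {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
    // segment() returns an iterable of segment objects, one per grapheme.
    return Array.from(segmenter.segment(str)).length;
  }
  // Fallback: code points, which over-count combining and ZWJ sequences.
  return [...str].length;
}

console.log(countGraphemes("\u{1F471}\u200D\u2642\uFE0F")); // 1 where Intl.Segmenter exists
console.log(countGraphemes("e\u0300"));                     // 1 where Intl.Segmenter exists
```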

Environment Specific Differences

Did you know there’s a ninja cat emoji? Neither did we, because it’s a Windows-only emoji! It’s represented by “🐱” and “👤” joined by a zero width joiner. This means that Windows users will see this combination as one character, while other users will see it as two characters. Depending on the user’s choice of fonts, they could even see something completely different. You could try to prevent this issue by choosing a specific font for your web app; however, that won’t be sufficient, because the browser will still fall back to other fonts on the system if a character is not available in your chosen font.
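Grapheme segmentation follows the Unicode text segmentation rules, not the fonts installed on a particular machine. The sketch below, assuming a runtime with Intl.Segmenter, shows that the ninja cat sequence counts as one grapheme cluster everywhere, including on systems that draw it as two separate images.

```javascript
const ninjaCat = "\u{1F431}\u200D\u{1F464}"; // 🐱 + zero width joiner + 👤

console.log(ninjaCat.length);      // 5 code units
console.log([...ninjaCat].length); // 3 code points

// Assumes Intl.Segmenter is available in the runtime.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log(Array.from(segmenter.segment(ninjaCat)).length); // 1 grapheme cluster
```

So even an exact grapheme count cannot tell you how many glyphs, or how many pixels, a given system will actually draw.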

Checkmate?

The various environment-specific differences mean that there’s generally no way of calculating the rendered width of a string mathematically. Therefore, the only way to determine its pixel width is to render it and measure. For our use case in the wrangler, this is exactly what we wanted to avoid in the first place. However, there are some optimizations that we can make. 
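The post doesn’t say how Etleap renders and measures text. One common browser approach is an offscreen 2D canvas with CanvasRenderingContext2D.measureText, sketched below. The function names and the "14px sans-serif" default are assumptions for illustration, and canvas measurements may differ slightly from how the DOM lays out the same text.

```javascript
// Browser-only sketch: returns a measure(text) -> pixel width function.
// The font should match the CSS font of the table cells (assumed here).
function createCanvasMeasurer(font = "14px sans-serif") {
  const ctx = document.createElement("canvas").getContext("2d");
  ctx.font = font;
  return (text) => ctx.measureText(text).width;
}

// Widest cell in a column, given any measure(text) -> pixel width function.
// Measuring every cell like this is the expensive step the wrangler wants to avoid.
function columnWidth(cells, measure) {
  let max = 0;
  for (const cell of cells) max = Math.max(max, measure(cell));
  return max;
}
```

Passing the measurer in as a function keeps columnWidth independent of the DOM, which also makes it easy to swap in a cached measurer later.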

Instead of rendering every string in each column, we can split the strings into their graphemes and render those individually. This allows us to cache the pixel width of each grapheme we encounter. Since a table contains substantially fewer distinct graphemes than unique strings, this significantly reduces the total amount of rendering. This way we can easily determine the correct width of a column, all while keeping the scrolling snappy and your browser happy.
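The caching idea can be sketched as follows, assuming Intl.Segmenter for the grapheme split and any measure(text) function (for example a canvas-based one) for the actual rendering. createCachedWidthEstimator is a hypothetical name, not Etleap’s API. Summing per-grapheme widths ignores kerning between neighbouring graphemes, so in this sketch the result is a close estimate rather than an exact layout width.

```javascript
// Hypothetical sketch: measure each distinct grapheme once, then sum cached widths.
function createCachedWidthEstimator(measure, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  const cache = new Map(); // grapheme -> pixel width

  return function estimateWidth(text) {
    let width = 0;
    for (const { segment } of segmenter.segment(text)) {
      let w = cache.get(segment);
      if (w === undefined) {
        w = measure(segment); // only graphemes not seen before are rendered
        cache.set(segment, w);
      }
      width += w;
    }
    return width;
  };
}

// Usage in a browser (font is an assumption):
// const ctx = document.createElement("canvas").getContext("2d");
// ctx.font = "14px sans-serif";
// const estimateWidth = createCachedWidthEstimator((s) => ctx.measureText(s).width);
// const width = cells.reduce((max, c) => Math.max(max, estimateWidth(c)), 0);
```

Using reduce rather than Math.max(...array) avoids passing a very large column as function arguments.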