AXS knocks it out of the park with modern data analytics

Learn how Etleap and Looker helped AXS reduce manual ETL work and reporting, allowing them to focus on growth for themselves and their clients.

AXS is a ticketing company for live entertainment

AXS powers the ticket-buying experience for over 350 worldwide partners

AXS is a leading ticketing, data, and marketing solutions provider in the US, UK, and Europe. The company and its solutions empower more than 200 clients (teams, arenas, theaters, clubs, and colleges) to turn data into action, maximize the value of all their events, and create joy for fans. Its enterprise event technology platform serves venues, promoters, and sports teams, giving fans the opportunity to purchase tickets directly from their favorite venues via a user-friendly ticketing interface. While customers know AXS as a destination for tickets, clients recognize the company for its data services: transforming, reporting, analyzing, and more.

The data services offered by AXS have always been incredibly helpful for clients. But delivering that value required a significant amount of ETL and reporting work, which created challenges for the team. To learn more about those challenges, and how the team eventually found a solution in Etleap and Looker, we spoke with Ben Fischer, Sr. Director of Business Intelligence and Strategy.

THE CHALLENGES

Ben oversees the Business Intelligence and Strategy team, which manages everything from integrations and building data models to powering the data warehouse and products across AXS. The team’s main objectives are to power AXS’s internal data services, while also delivering data services for clients.

Before Etleap and Looker, the Data Engineering team was spending more and more of their time working on internal and external requests for one-off ingestions and custom data sources. Each data source would take anywhere from half a day to weeks (or even months) to implement, which meant the team was spending most of their time on ETL work, and not enough time on making the data useful.

“With Etleap, we’re able to do the ETL end-to-end and get it directly into the hands of whoever’s trying to use it right away.”

– Ben Fischer, Sr. Director of Business Intelligence

Ben told us, “The whole team was just getting sucked into ETL work constantly, which was not the best use of their time. We wanted to be working on modeling and on the products.”

SEARCHING FOR A BETTER DATA SOLUTION

To find the right fit for their needs, the AXS team compared several modern ETL solutions.

When comparing the options, they found that most of the tools were good solutions for getting ETL out of the engineers’ hands, allowing less technical people to consistently bring in the data, while also offering support and monitoring. However, Etleap stood out in two main areas: transformation and transparency.

For AXS, transformation was important because they wanted the ability to not only bring in a new data source, but automatically transform it into something useful for analysts. “Stitch and Fivetran are really focused on the ‘extract’ and ‘load,’ so they’ll bring data in from an outside source and put it into your data warehouse, but they don’t offer much in the way of transformation. You still have to transform it afterwards into something that’s usable, which means you’re still relying on engineering to access the data. With Etleap, we’re able to do the ETL end-to-end and get it directly into the hands of whoever’s trying to use it right away,” said Ben.

Etleap’s data wrangler makes parsing and structuring data take minutes instead of months.

Beyond the transformation aspect, the AXS team was also impressed by Etleap’s level of transparency around reliability. “A lot of the competitors emphasize this idea of 100% reliability. They would say that they would never miss any data, everything would come through perfectly, and you would never have any issues. But we knew that wasn’t the case. No tool is 100% perfect, and when talking with Etleap, they were much more open about what we could expect. They acknowledged that 100% reliability is the objective, but that it’s challenging to achieve in practice, so it’s something they’re continuously working toward. Some of the competitors wouldn’t even acknowledge that reliability could possibly be an issue, which makes you feel like they may not support you if anything goes wrong,” said Ben.

To top things off, Etleap was also very straightforward to use. It required very little training and offered reliable support, which meant AXS could get up and running immediately. When the team first started evaluating ETL solutions, they encountered complexity with managing the tools and building integrations. “But with Etleap, it’s pretty straightforward. There’s always somebody available if you need to reach out. That meant we could start using Etleap in just a matter of days, rather than undergoing weeks of training,” said Ben.

Once the team found an ETL solution, it was time to help out the analysts. They looked at a variety of products for business intelligence, and even tried a few different solutions, but Looker stood out in part because it could get report building out of the hands of the analysts. Ben told us, “With Etleap, it was about getting ETL out of engineering. With Looker, it was about getting report building out of analytics, so analysts can spend their time actually forming opinions, defining strategy, doing analyses, and digging into the data, rather than just building reports day in and day out.”

Consistency and confidence are critical to democratizing data, and Looker’s data modeling layer allows people across AXS to pull their own insights and reports very quickly without having to worry about whether the numbers match. This means the Business Intelligence & Strategy team can now stay focused on building models and driving insights, instead of just building reports.

WITH ETLEAP AND LOOKER, THE ENTIRE AXS TEAM IS ABLE TO FOCUS ON HIGH-VALUE TASKS THAT DRIVE THEIR BUSINESS FORWARD.

Since implementing Etleap and Looker, the entire AXS team, as well as their clients, has felt the positive impact.

First, the Data Engineering and Business Intelligence & Strategy teams are spending far less time on manual ETL and reporting work, and much more time on high-value tasks that contribute to internal growth, as well as client successes.

“These tools make our various teams more impactful across the business. For example, if our engineers were just doing all the ETL work manually, we would not be able to do even half of the work that we’re doing to drive the business forward. And the same applies with Looker. Right now, we’ve got people all over the organization looking at reports every day in Looker and answering their own questions about what’s going on with the business.

“Our lives have become much less about pulling reports or bringing in data, and more about really driving value for the company beyond the mundane day-to-day.” Beyond making life easier and their work more impactful, reporting has also become much faster. Previously, you might have had to wait weeks, or even months, to get access to a new data source for your analyses. Now, you can solve that yourself in a couple of hours, without having to wait for other people.

Looker gives companies a single source of truth for all their data.

“With Looker, it was about getting report building out of analytics, so analysts can spend their time actually forming opinions, defining strategy, doing analyses, and digging into the data, rather than just building reports day in and day out.”

– Ben Fischer, Sr. Director of Business Intelligence

“It also makes iteration much faster. We can define something, put it into production, and report off of it. If three days later we realize we forgot something, it’s a two-minute fix, rather than going back to engineering and having someone spend half a day on it,” said Ben.

Finally, having access to Etleap allows the team to easily look at data from different angles, making the analysis and insights for clients even more valuable. “Etleap has a function for modeling data, which is useful for reporting, as it allows you to build the aggregations you need to power impactful reports. We can have processes that run every day and get a quick summary of the data from all different perspectives. Before, it would have taken an engineer a couple of days to build that,” said Ben.

With Etleap and Looker, the AXS team finally has the time and resources to focus on bigger initiatives, including GDPR, internationalization, increasing accessibility across the organization, and providing even more data services to clients. With these tools in their arsenal, the sky is truly the limit.

What is the “length” of a string?

Finding the length of a string in JavaScript is simple: you use the .length property and that’s it, right?

Not so fast. The “length” of a string may not be exactly what you expect. It turns out that the string length property is the number of code units in the string, not the number of characters (or, more specifically, graphemes) as we might expect. For example, “😃” has a length of 2, and “👱‍♂️” has a length of 5!
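
A quick console check makes this concrete:

"😃".length;   // 2: two UTF-16 code units, one visible character
"👱‍♂️".length; // 5: an emoji sequence built from several code points
"abc".length;  // 3: for plain ASCII, code units and characters match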

Screenshot from Etleap’s data wrangler where the column width depends on the column contents.

In our application we have a data wrangler that lets you view a sample of your data in a tabular format. Since this table supports infinite scrolling, both rows and columns are rendered on demand as you scroll vertically or horizontally. We can’t render all the rows and columns at once since a table could easily include more than a hundred thousand cells, which would bring the browser to its knees.

“The ‘length’ of a string may not be exactly what you expect.”

Imagine if most rows of a column contain a small amount of data, such as a single word, but a single row contains more data, such as a sentence. If this row is outside of the currently viewed area, we don’t want the column to expand as you scroll down, and we definitely don’t want to cram the sentence into the same small space that’s required by the word. This means that we need to find the widest cell in the column before rendering all the cells. It’s fast and straightforward to find the length of the content in each cell, but what if the cell contains emojis or other content where we can’t rely on the length property to give us an accurate value?

Code units vs. code points

Let’s do a quick Unicode recap. Each character in Unicode is identified by a unique code point, represented by a number between 0 and 10FFFF (hexadecimal). Unfortunately, 10FFFF is a large number, and a fixed-width encoding would have to allocate 4 bytes for every character. To avoid this, Unicode also specifies several encoding standards, including UTF-16, which is the internal string encoding used by JavaScript.

UTF-16 is a variable-length encoding, which means that it uses either 2 or 4 bytes for each code point, depending on what is required. To differentiate, we say that UTF-16 uses one or two code units to represent one Unicode code point. The most commonly used characters all fit into one code unit, but some of the more exotic characters, such as emojis, require two code units.
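
We can inspect the two code units of a single emoji directly in JavaScript:

// "😃" is U+1F603, outside the Basic Multilingual Plane, so UTF-16
// encodes it as a surrogate pair of two code units:
"😃".charCodeAt(0);  // 55357 (0xD83D, the high surrogate)
"😃".charCodeAt(1);  // 56835 (0xDE03, the low surrogate)
"😃".codePointAt(0); // 128515 (0x1F603, the actual code point)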

“It turns out that code points are not the only caveat regarding string lengths in JavaScript.”

This is where a problem arises. Since the .length property returns the number of code units, and not the number of code points, it does not directly map to what you may expect. As an example, the emoji “😃” has a length of 2, even though it looks like only one character.

How can we work around this? ES2015 introduced ways of splitting a string into its respective code points by providing a string iterator. Both Array.from and the spread operator [...string] use this internally, so either can be used to get the length of a string in code points.
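
For example:

const emoji = "😃";
emoji.length;             // 2 (code units)
[...emoji].length;        // 1 (code points, via the string iterator)
Array.from(emoji).length; // 1 (same mechanism)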

Combining Characters

It turns out that code points are not the only caveat regarding string lengths in JavaScript. Another is combining characters. A combining character is a character that doesn’t stand on its own, but rather modifies the characters around it. Unicode supports this, meaning that a character such as “è” can actually be made up of two code points, “e” and “\u0300”. The same mechanism is widely used to combine emojis into new representations, such as “👱‍♂️”, which is a combination of “👱” and “♂” with a zero-width joiner (\u200D) in between.
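
This means counting code points still falls short of counting graphemes:

[..."e\u0300"].length; // 2: a base letter plus a combining accent, rendered as "è"
[..."👱‍♂️"].length;     // 4: 👱, a zero-width joiner, ♂, and a variation selector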

Working around this is more complicated. Currently there is no built-in way of reliably counting graphemes in JavaScript. A stage 2 proposal suggests adding Intl.Segmenter, which can be used to split a string into its graphemes, but there’s no guarantee that it will make it into the spec (there’s a polyfill for the proposal if you’re desperate).
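
Assuming the API shape from the proposal (or its polyfill), grapheme counting would look something like this:

const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const countGraphemes = str => [...segmenter.segment(str)].length;

countGraphemes("😃");      // 1
countGraphemes("👱‍♂️");    // 1
countGraphemes("e\u0300"); // 1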

Environment Specific Differences

Did you know there’s a ninja cat emoji? Neither did we, because it’s a Windows-only emoji! It’s represented by a combination of “🐱” and “👤”. This means that Windows users will see this combination as one character, while other users will see it as two. Depending on the user’s choice of fonts, they could even see something completely different. You could try to prevent this issue by choosing a specific font for your web app, but that won’t be sufficient, as the browser will still search through other fonts on the system if a character is not available in your chosen font.

“The various environment-specific differences mean that there’s generally no way of measuring the rendered width of a string mathematically.”

Checkmate?

The various environment-specific differences mean that there’s generally no way of measuring the rendered width of a string mathematically. Therefore, the only way to determine the pixel length is to render it and measure. For our use case in the wrangler, this is exactly what we wanted to avoid in the first place. However, there are some optimizations that we can make.

Instead of rendering all the strings in each column, we can split the strings into their corresponding graphemes and render them individually. This allows us to cache the pixel length of each grapheme we encounter. Since there are substantially fewer graphemes than unique strings in a table, this results in a significant reduction in total rendering. This way we can easily determine the correct width of a column, all while keeping the scrolling snappy and your browser happy.
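
Here is a minimal sketch of the idea; it assumes a splitIntoGraphemes helper (for example the segmenter above), and the names are illustrative rather than our production code:

// An off-screen canvas context whose font must match the table cells.
const ctx = document.createElement("canvas").getContext("2d");
ctx.font = "13px Helvetica, sans-serif";

const graphemeWidths = new Map();

function graphemeWidth(grapheme) {
  // Render and measure each distinct grapheme only once.
  if (!graphemeWidths.has(grapheme)) {
    graphemeWidths.set(grapheme, ctx.measureText(grapheme).width);
  }
  return graphemeWidths.get(grapheme);
}

function cellWidth(text) {
  // Summing per-grapheme widths ignores kerning between graphemes,
  // which is a fine approximation for sizing columns.
  return splitIntoGraphemes(text).reduce((sum, g) => sum + graphemeWidth(g), 0);
}

function columnWidth(cells) {
  return Math.max(...cells.map(cellWidth));
}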

High Pipeline Latency Incident Post-Mortem

Between 15:30 UTC on 8/27 and 14:00 UTC on 8/29 we experienced periods of higher-than-usual pipeline latencies. Between 04:00 and 10:00 UTC on 8/29 most pipelines were completely stopped. At Etleap we want to be transparent about system issues that affect customers, so this post summarizes the timeline of the incident, our team’s response, and what we are doing to prevent a similar incident from happening again.

Number of users with at least one pipeline with higher-than-normal latency.

What happened and what was the impact?

At around 11:30 UTC on 8/27 our ops team was alerted about spikes in two different metrics: CPU of a Zookeeper node and stop-the-world garbage collection (STW GC) time in a Java process responsible for orchestrating certain ETL activities. The two processes were running in different Docker containers on the same host. From this point onwards we saw intermittent spikes in both metrics and periods of downtime of the orchestration process, until the final fix was put in place at 14:00 UTC on 8/29. Additionally, at 15:30 UTC on 8/27 we received the first alert regarding high pipeline latencies. There were intermittent periods of high latency until 10:00 UTC on 8/29.

Incident Response

When our ops team received the first alert, they followed our incident response playbook in order to diagnose the problem. It includes checking on potential causes such as spikes in usage, recently deployed changes, and infrastructure component health. The team determined that the issue had to do with the component that sets up source extraction activities, but found no other correlations.

Suspecting that an external change related to a pipeline source was leading to the increased garbage collection activity, they went on to attempt to narrow down the problem in terms of dimensions such as source, source type, and customer. Etleap uses a Zookeeper cluster for things like interprocess locking and rate limiting, and the theory was that a misbehaving pipeline source was causing the extraction logic to put a significant amount of additional load on the Zookeeper process, while at the same time causing memory pressure within the orchestration process itself. However, after an exhaustive search it was determined that the problem could not be attributed to a single source or customer. Memory analysis of the Java process with garbage collection issues also showed nothing out of the ordinary.

The Culprit

Next, the team looked at the memory situation for the host itself. While each process was running within its defined memory bounds, in aggregate the processes’ memory usage exceeded the amount of physical memory available on the host. The host was configured with swap space, and while this is often a good practice, it is not for Zookeeper: by being forced to swap to disk, Zookeeper’s response times went up, leading to queued requests.

Stats show Zookeeper node in an unhealthy state.

In other words, incrementally crossing the host’s overall physical memory limit caused a dramatic degradation in Zookeeper’s performance, which in turn resulted in long garbage collection pauses in a client process. The immediate solution was to increase the physical memory on the host, which brought the Zookeeper stats back to normal levels (along with the CPU and STW GC metrics mentioned above).

Zookeeper back in a healthy state after memory increase.

Next steps

We are taking several steps to prevent a similar issue in the future. First, we are configuring Zookeeper not to use swap space. Second, we’re adding monitoring of the key Zookeeper stats, such as latency and outstanding connections. Third, we are adding monitoring of available host physical memory to make sure we know when pressure is getting high. Any of the three configuration and monitoring improvements in isolation would have led us to find the issue sooner, and all three will help prevent issues like this from happening in the first place.

While it’s impossible to guarantee there will never be high latencies for some pipelines, periods of high latencies across the board are unacceptable. What made this incident particularly egregious was that it went on for over 40 hours, and the whole Etleap team is sorry that this happened. The long resolution time was in large part because we didn’t have the appropriate monitoring to lead us toward the root cause; we have learned from this and are putting more monitoring of key components in place going forward.

Etleap Launches Snowflake Integration

I am pleased to announce our integration with Snowflake. This is the second data warehouse we support, augmenting our existing Amazon Redshift data warehouse and our S3/Glue data lake offering. 

Etleap lets you integrate all your company’s data into Snowflake, and transform and model it as necessary. The result is clean and well-structured data in Snowflake that is ready for high-performance analytics. Unlike traditional ETL tools, Etleap does not require engineering effort to create, maintain, and scale pipelines. Etleap provides sophisticated data error handling and comprehensive monitoring capabilities. Because it is delivered as a service, there is no infrastructure to maintain.

Like any other pipeline set up in Etleap, pipelines to Snowflake can extract from any of Etleap’s supported sources, including databases, web services, file stores, and event streams. Using Etleap’s interactive data wrangler, users have full control over how data is cleaned, structured, and de-identified before it is loaded into Snowflake. From there, Etleap’s native integration with Snowflake is designed to maximize flexibility for users in specifying attributes such as Snowflake schemas, roles, and cluster keys. Once the data is loaded, Etleap’s SQL-based modeling features can be used to further improve the usability and performance of the data for analytics.

Not only does Etleap’s integration with Snowflake provide a seamless user experience, it is also a natural fit technically. Etleap is built on AWS and stores extracted and transformed data in S3. Since Snowflake stores data in S3, loading data into Snowflake is fast and efficient. Architecturally, part of what differentiates Snowflake is its separate, elastic scaling of compute and storage resources. Etleap is built on the same principle, thus enabling it to overcome traditional bottlenecks in ETL by scaling storage and compute resources for extraction and transformation separately and elastically. By taking advantage of AWS building blocks we are able to provide a powerful yet uncomplicated data analytics stack for our customers. 

Etleap is devoted to helping teams build data warehouses and data lakes on AWS, and we offer both hosted and in-VPC deployment options. Like Snowflake, Etleap takes advantage of AWS services such as S3 and EC2 to provide performance and cost benefits not possible with traditional ETL solutions.

As more and more teams building analytics infrastructure on AWS want to use Snowflake as their data warehouse, offering support for Snowflake was a natural next step for us. 

If you would like to explore building a Snowflake data warehouse with Etleap, you can sign up for a demo here.


New Features: Models and History Tables

I’m excited to tell you about two new features we’re launching today: Models and History Tables.

Models

Etleap has long supported single-source transformations through data wrangling. This is great for cleaning, structuring, and filtering data, and for removing unwanted data, such as PII, before it is loaded to the destination. Today, we’re announcing the general availability of models, which enable transformations expressed as SQL queries. Two primary use cases for models are combining data from different sources to build data views optimized for analytics, and aggregating data to speed up analytics queries.

Etleap models are Redshift tables backed by SQL SELECT queries that you define, running against data that has been loaded to Redshift. Etleap creates tables that are the result of these SELECT queries, and updates these tables incrementally or through full refreshes. Etleap triggers updates based on changes to dependent tables, or on a schedule.

History Tables

For regular pipelines into Redshift, Etleap fetches new and updated records from the source. Following transformation, new rows are appended to the destination table, and updated rows are overwritten. This update strategy is known as type-1 Slowly Changing Dimensions in data warehouse speak.

Sometimes it’s useful to be able to go back in time and query the past state of a record, or to investigate how a record has changed over time. To support this, Etleap now provides the ability to retain the history of a record collection, using the technique known as type-2 Slowly Changing Dimensions. Here’s how it works in Etleap: an end-date column is added to the table. When a record is first inserted into the destination table, this column’s value is null. Whenever the record changes in the source, instead of overwriting the existing record in the destination table, a new row with a null end-date value is appended, and the existing row’s end-date is set to the new record’s update timestamp.
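
As an illustration, here is the update strategy in miniature. This in-memory sketch uses hypothetical names; Etleap applies the same logic inside the warehouse:

// `table` stands in for the destination table; `incoming` is a changed record.
function applyType2Update(table, incoming) {
  const current = table.find(
    row => row.id === incoming.id && row.endDate === null
  );
  if (current) {
    // Close out the previous version instead of overwriting it.
    current.endDate = incoming.updatedAt;
  }
  // Append the new version as the current record (null end-date).
  table.push({ ...incoming, endDate: null });
}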

Starting today, history tables are available for all pipelines from sources that have a primary key and an update timestamp. To get a history table, check the ‘retain history’ box during single or batch pipeline setup.

Want to see these features in action? Request a demo here!

Scaling Etleap with funding from First Round Capital, SV Angel, and more

Today we’re excited to share that we’ve raised $1.5M from First Round Capital, SV Angel, Liquid2, BoxGroup, and others to continue to scale our enterprise-grade ETL solution for building and managing cloud data warehouses.

ETL has traditionally been associated with expensive projects that take months of custom development by specialized engineers. We started Etleap because we believe in a world where analytics teams manage their own data pipelines, and IT teams aren’t burdened with complex ETL infrastructure and tedious operations.

Etleap runs in the cloud and requires no engineering work to set up, maintain, and scale. It helps companies drastically lower the cost and complexity of their ETL solution and improve the usefulness of their data.

Over the past few years we’ve spent a lot of time with analytics teams in order to understand their challenges and have built features for integration, wrangling, and modeling. It’s a thrill to see data-driven customers, including Airtable, Okta, and AXS, use them. Their analytics teams are pushing the boundaries of what’s possible today, and we’re hard at work building features to help bring their productivity to new levels.

Curious how Etleap can solve your analytics infrastructure challenges? Click here to get a demo of Etleap!

Building ETL Infrastructure that Analysts Love

This recorded session is from DataEngConf NYC 17. Slides are available on the event page.

There’s an often-quoted statistic that says that data analysts spend 80% of their time preparing data and only 20% actually analyzing it. There’s a lot that we as data engineers can do to help our analytics teams be more productive and spend less time worrying about data preparation. This session discusses common problems in data warehousing infrastructure from the point of view of analytics teams, and suggests practical solutions.

Watch the session video or read the key takeaways below.

SVG in React

React.js is a great library for creating user interfaces consisting of components. In the browser, React is used to output DOM elements like divs, sections and... SVG! The DOM supports SVG elements, so there is nothing stopping us from outputting it inline directly with React. This allows for easy creation of SVG components that are updated with props and state just like any other component.

Why SVG?

Even though a lot is possible with plain CSS, creating complex shapes like hearts or elephants is very difficult and requires a lot of code. This is because you are restricted to a limited set of primitive shapes that you have to combine to create more complex ones. SVG, on the other hand, is an image format, and it gives you much more flexibility in creating custom paths: you are free to create any shape you want. If you need convincing, check out these slides from Sara Soueidan’s great talk about SVG UI components.

Our use

At Etleap we have used React with SVG output in some of our graphical components. A great example of this is our circular progress bar.

Circular progress bar used on our dashboard.

This component uses SVG to display the circular progress bar and works just like any other React component. It accepts a few props, including the percentage value to display, and updates whenever new props are received. The reason we opted for SVG in this case is that creating a circular progress bar in CSS is tricky, while outputting the SVG markup directly to the DOM with React is straightforward. Let’s compare the two approaches.

SVG Progress Bar

The essential SVG markup required to render the progress bar is very simple:


<svg viewBox="0 0 200 200">
  <g transform="rotate(-90 100 100)">
    <circle className="ProgressBarCircular-bar-background" r={radius} cx={posX} cy={posY} />
    <circle className="ProgressBarCircular-bar" strokeDasharray={strokeDasharray} strokeLinecap="round" r={radius} cx={posX} cy={posY} />
  </g>
</svg>

We need two circles: one for the dark background, and one for the lighter progress display. The circles themselves are transparent; their strokes show the background and the progress. To show the amount of progress we use a dashed outline for the circle. If the space between the first and second dash is at least the length of the circumference of the circle, only one dash will be shown, and we can manipulate the length of that dash to show the current progress. We use stroke-dasharray to specify the length of and distance between each dash, and stroke-linecap: round to get rounded ends.
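
The calculation behind the strokeDasharray value looks roughly like this, with percentage and radius being the component’s props and derived values:

// One dash as long as the progress arc, followed by a gap at least as long
// as the full circumference, so a second dash is never drawn.
const circumference = 2 * Math.PI * radius;
const strokeDasharray = `${(percentage / 100) * circumference} ${circumference}`;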

CSS Progress Bar

Let’s have a look at how we can create a similar progress bar in CSS:

Since CSS supports neither stroke-dasharray nor stroke-linecap, we are immediately at a disadvantage, so let’s simplify the problem and start by creating a pie chart. We create two circles here as well, one for the background and one for the progress bar. To display progress we need to be able to cut away part of the circle, so that we are left with a pie slice. To make this happen we can use the CSS clip property (unfortunately it has been deprecated, and its replacement, clip-path, has very poor browser support). This enables us to define a rectangular mask for the circle so that we can hide parts of it. The problem is that this only works for a maximum of 50% of the circle at a time, so we actually need two masks, one for the right half and one for the left. As you can see, this is already getting pretty complicated, and we have not even looked at how to handle the rounded edges, so to prevent doubling the length of this post we’ll stop here. If you are interested in the full solution (without rounded edges), check out this post by Anders Ingemann.

When to use SVG

SVG should not be a replacement for all graphical user elements, but it can be used to more easily achieve tricky UI effects where CSS falls short. The most important difference is that SVG supports custom paths. This means you can create any complex shape you want and easily display it, or use it as a mask. This is especially relevant in scenarios involving charts or line drawings. Other interesting features that CSS lacks include drawing text along a path, animating paths, and support for a range of filters. That being said, CSS is catching up with SVG and has gained support for several filters, masks, and even custom clip paths. For now though, if your designer has created some truly fancy UI effect that you instinctively dismiss as impossible, perhaps it is a good time to look into SVG and make it a reality after all.

Reducing the size of your Webpack bundle

To ensure a great user experience it is important to keep the initial page load as fast as possible. There are two main ways of doing this: one is to reduce the number of file requests made when the site is loading, and the other is to reduce the size of the files. To automate this it is common to use a tool that combines all your JavaScript into one minified bundle file.
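
With webpack, for example, a minimal production configuration along these lines yields a single minified bundle (webpack 4+ shown; the entry path and filenames are illustrative):

// webpack.config.js
const path = require("path");

module.exports = {
  mode: "production", // enables minification out of the box
  entry: "./src/index.js",
  output: {
    path: path.resolve(__dirname, "dist"),
    filename: "bundle.min.js",
  },
};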


Generating password reset tokens

There are a few requirements for a good password reset token:

  1. the user should be able to reset their password with the token they receive in an email
  2. the token should not be guessable
  3. the token should expire
  4. the user should not be able to re-use the token

Ideally, the web framework of your choice should already have a built-in way to generate reset tokens. However, we use Play, which does not provide one, so we have to roll our own.
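
As an illustration of the requirements above (our actual implementation is in Play, so this Node.js sketch, including its saveToken store, is hypothetical):

const crypto = require("crypto");

const TOKEN_TTL_MS = 60 * 60 * 1000; // the token expires after one hour (#3)

function createResetToken(userId) {
  // #2: not guessable: 32 bytes from a cryptographically secure RNG.
  const token = crypto.randomBytes(32).toString("hex");
  // Store only a hash, so a leaked database doesn't expose usable tokens.
  const tokenHash = crypto.createHash("sha256").update(token).digest("hex");
  saveToken({
    userId,
    tokenHash,
    expiresAt: Date.now() + TOKEN_TTL_MS,
    used: false, // flipped on first use to prevent re-use (#4)
  });
  return token; // #1: emailed to the user as part of a reset link
}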
