Sauce Labs combines ETL with data warehousing to harness global data growth and boost analytics

By Etleap Marketing
April 28, 2021

Learn how Etleap partnered with Sauce Labs and Snowflake to create a scalable data cloud that enabled analytics. 

Sauce Labs is a company that provides a cloud-hosted automated testing platform for web and mobile applications.

Inspiring digital confidence requires greater analytics capabilities

Sauce Labs had been providing continuous testing solutions to America’s leading companies for over a decade, but the company needed to enhance its data analytics infrastructure to keep pace with the rapid growth of its SaaS platform.

“The last few years have been significant in terms of expanding our customer base,” explains Bob Henrikson, Data Engineering Manager at Sauce Labs. “But that also means more data.”

Meanwhile, the business had expanded beyond North America, including into Europe. As it went global, Sauce Labs needed a new approach that could harness data growth and meet evolving analytics requirements.

“Etleap had the basics around the different SaaS applications we needed, like Salesforce and Mixpanel, as well as the ability to do log-based replication from MySQL and MongoDB.”

Bob Henrikson, Data Engineering Manager at Sauce Labs

CHALLENGE

Transform data analytics to enable new projects and opportunities

The Sauce Labs customer base includes Fortune 500 companies such as Visa and Walmart, some of which run thousands of concurrent tests per day. Each organization has its own team structure and wants to understand how its teams are using the platform.

“There are many different use cases around understanding our customer usage and where they're at in their journey,” says Henrikson. “So, how can we offer opportunities to resolve these issues and get them more productive?”

Running a large platform across multiple regions also requires identifying problem areas and capacity bottlenecks so that infrastructure investments can be prioritized. Henrikson led a small data team, and because its analytics processes also needed improvement, new data processing needs were straining the team’s ability to meet infrastructure demands without hiring more engineers.

As their business expanded, Sauce Labs resisted the prospect of scaling headcount as the primary solution to growth. Instead, they aimed to keep their team small while still meeting accelerated infrastructure demand.

Enable global analytics

The company’s initial analytics capabilities were based on a single MySQL table that held system and customer usage metrics, such as daily test and error counts. The data lacked granularity. And because the table lived in a regional OLTP database, it held only regional data, which limited the analysis possible compared with what a centralized database could provide.

These constraints led the team to hand-build lengthy Python code. The code extracted data from MySQL and MongoDB, wrote it to Parquet files, stored those files in Amazon S3, and then queried them using Amazon Athena. The Python processes took hours to run and required large amounts of memory, which not only consumed time but also limited the range of new projects the team could pursue.

“When we started talking about setting up the European data center it became clear that we needed to do something different for our analytics infrastructure,” explains Henrikson.

Reduce the risk of ETL pipeline failures and breakages

Although the existing infrastructure fed data to customer-facing APIs and web pages in the application, the Python code Sauce Labs was using could not handle the growing data volume, which led to an increasingly high failure rate in its ETL pipelines.

“It ran for a long time on really beefy hardware,” says Henrikson. “The Python code pulled data from MySQL and put it back into MySQL.”

To address this, Sauce Labs needed an ETL tool that could scale easily while providing pipeline stability and security.

“In data engineering, there are all kinds of problems that you hit, so it was important for me to have a company like Etleap that I can consider part of the team.”

Bob Henrikson, Data Engineering Manager at Sauce Labs

SOLUTION

Etleap + Snowflake

Sauce Labs saw the obvious first step as improving on its existing Amazon Athena interactive query service by establishing a scalable data cloud.

“When we started the transition, Snowflake was a pretty easy decision,” says Henrikson. “It's a modern cloud platform that has all the features a data engineer would love.”

Snowflake’s separation of compute and storage also suited Sauce Labs’ data needs particularly well. The challenge in moving to a cloud data warehouse, however, was the sheer number of ETL tools that promised easy transformation and loading of data into the cloud.

“A big requirement going in was the ability to do log-based replication for both MySQL and MongoDB,” explains Henrikson. “That quickly eliminated a lot of the players.”

After Sauce Labs trialed several options, it was clear that Etleap not only provided the capabilities the company required but also offered support and error resolution that other services could not match. For example, a well-known competing tool took several weeks to diagnose a problem and roll out a fix. When the same problem arose during the Etleap evaluation, a diagnosis was produced within minutes.

“It turned out that Etleap's product already had a way to work around this,” recalls Henrikson. “So, in a five-minute phone call, we're done with what took four weeks previously. That was pretty significant.”

By replicating data from MySQL to Snowflake with Etleap, then writing SQL transformations in the more productive and scalable Snowflake environment, Sauce Labs can limit the risk of ETL pipeline failures and breakages. With Etleap’s pre-built and pre-tested connectors, setting up pipelines no longer takes days. Instead, replication of source data can begin in minutes, and Henrikson’s team can have a pipeline up and running within a single afternoon.
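To make the pattern above concrete (replicate raw source rows first, then transform them with SQL inside the warehouse), here is a minimal Python sketch. It is an illustration under stated assumptions, not Etleap's or Sauce Labs' implementation: Python's built-in `sqlite3` module stands in for Snowflake, and the table names (`raw_test_runs`, `daily_test_summary`) and columns are hypothetical.

```python
import sqlite3

# Hypothetical raw rows, standing in for records replicated as-is from a MySQL source.
RAW_ROWS = [
    ("2021-04-01", "acme", "passed"),
    ("2021-04-01", "acme", "error"),
    ("2021-04-01", "globex", "passed"),
    ("2021-04-02", "acme", "passed"),
]


def build_daily_summary(conn: sqlite3.Connection) -> list:
    """Load raw rows untouched, then derive an analytics table with SQL."""
    # Step 1: land the replicated data in a raw table without reshaping it.
    conn.execute(
        "CREATE TABLE raw_test_runs (run_date TEXT, customer TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_test_runs VALUES (?, ?, ?)", RAW_ROWS)

    # Step 2: transform inside the database (SQLite here, as a stand-in for Snowflake).
    conn.execute(
        """
        CREATE TABLE daily_test_summary AS
        SELECT run_date,
               customer,
               COUNT(*) AS test_count,
               SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) AS error_count
        FROM raw_test_runs
        GROUP BY run_date, customer
        """
    )
    return conn.execute(
        "SELECT * FROM daily_test_summary ORDER BY run_date, customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for row in build_daily_summary(conn):
        print(row)
    conn.close()
```

Because the raw table is kept unchanged, a faulty transformation can be corrected and rerun in SQL without extracting from the source again, which is one way this ordering can reduce the impact of pipeline breakages.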

RESULTS

Sauce Labs has grown its business while maintaining a slim data team

Sauce Labs still pursues incremental development work, but its pipelines are operated and maintained on Etleap’s infrastructure. This keeps overhead stable while freeing the team to pursue new projects.

“We’re a small data team at Sauce, but there's a team at Etleap that I can interact with and get stuff done,” explains Henrikson. “We haven’t had to hire more data engineers to do the work of maintaining pipelines.”

By moving its Python workloads into Snowflake’s elastic compute environment, Sauce Labs achieves scale on a level that was previously out of reach. Snowflake makes it easy to tune compute capacity to different use cases. Sauce Labs can now consider novel use cases, test and build data-centric product offerings tailored to its customers, and optimize customer-facing analytics.

“You can easily increase compute power or decrease it to match whatever your needs are. There's just no way we'd be doing what we're doing today. If we didn't have the Etleap and Snowflake solution in place we definitely would need more people.”

Bob Henrikson, Data Engineering Manager at Sauce Labs

Sauce Labs has integrated new sources to enhance analytics

Internally, the Sauce Labs customer success team digs into data to understand how each customer interacts with the platform and to identify users who may need extra engagement and support. To develop targeted new features, the company needed a better understanding of its customers and where each one was in its journey with the Sauce Labs service.

Etleap lets Sauce Labs build out new sources that can simply be plugged in, such as Mixpanel, Salesforce, and Recurly. Recurly yields valuable insight into customer subscriptions and payments, which, once loaded into Snowflake, can be combined with account information from Salesforce.
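As a rough illustration of combining the two sources, the sketch below joins hypothetical subscription records (of the kind Recurly provides) with hypothetical account records (of the kind Salesforce provides) on a shared account ID. The field names and the shared `account_id` key are assumptions; real Recurly and Salesforce schemas differ, and in practice this join would more likely be written in SQL inside Snowflake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:  # hypothetical shape of a Salesforce account record
    account_id: str
    name: str
    region: str


@dataclass(frozen=True)
class Subscription:  # hypothetical shape of a Recurly subscription record
    account_id: str
    plan: str
    monthly_amount: float


def join_accounts_and_subscriptions(accounts, subscriptions):
    """Inner-join subscriptions to accounts on account_id.

    Subscriptions whose account_id has no matching account are dropped,
    as they would be in a SQL inner join.
    """
    accounts_by_id = {a.account_id: a for a in accounts}
    joined = []
    for sub in subscriptions:
        account = accounts_by_id.get(sub.account_id)
        if account is None:
            continue  # no matching account record
        joined.append(
            {
                "account": account.name,
                "region": account.region,
                "plan": sub.plan,
                "monthly_amount": sub.monthly_amount,
            }
        )
    return joined


if __name__ == "__main__":
    accounts = [Account("A1", "Acme Corp", "EU"), Account("A2", "Globex", "NA")]
    subscriptions = [
        Subscription("A1", "enterprise", 1200.0),
        Subscription("A3", "team", 300.0),  # no matching account, so it is skipped
    ]
    for row in join_accounts_and_subscriptions(accounts, subscriptions):
        print(row)
```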

Sauce Labs has brought data to the forefront

With insights gained from running millions of tests per day on its platform, Sauce Labs can productize data in new ways and interpret what the data is telling it. When a customer is experiencing a high error rate, Sauce Labs can readily identify failures and clusters of failures, analyze them in a broader context to see whether they are occurring across the entire test suite, and diagnose the issue with a high degree of accuracy. As a result, the customer gets a speedy resolution.
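One simple way to look for clusters of failures is to group failed test results by error message and check how much of the suite each error touches. The sketch below is hypothetical: it assumes results arrive as `(test_name, passed, error_message)` tuples and uses an arbitrary `min_share` threshold. The post does not describe Sauce Labs' actual method.

```python
from collections import defaultdict


def failure_clusters(results, min_share=0.5):
    """Group failures by error message and flag errors that hit much of the suite.

    results: iterable of (test_name, passed, error_message) tuples (hypothetical shape).
    min_share: fraction of distinct tests an error must affect to count as suite-wide.
    Returns (error_message, affected_test_count, is_suite_wide) tuples,
    sorted by affected_test_count, largest first.
    """
    all_tests = set()
    tests_by_error = defaultdict(set)
    for test_name, passed, error_message in results:
        all_tests.add(test_name)
        if not passed:
            tests_by_error[error_message].add(test_name)

    clusters = []
    for error_message, tests in tests_by_error.items():
        share = len(tests) / len(all_tests)
        clusters.append((error_message, len(tests), share >= min_share))
    clusters.sort(key=lambda c: c[1], reverse=True)
    return clusters


if __name__ == "__main__":
    results = [
        ("login", False, "timeout"),
        ("checkout", False, "timeout"),
        ("search", True, None),
        ("profile", False, "element not found"),
    ]
    for cluster in failure_clusters(results):
        print(cluster)
```

Counting distinct tests per error, rather than raw failure events, keeps one flaky test that fails repeatedly from looking like a suite-wide problem.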

Most significantly, their new data ecosystem allows them to merge customer-facing and internal data needs.

“In the past, we've had different systems producing those two sides of things, and they’re always in conflict,” explains Henrikson. “Now, we're talking about serving up both customer-facing and internal analytics, from the same data in Snowflake.”

By combining Snowflake’s data cloud with Etleap, Sauce Labs has transformed its analytics platform, allowing the company to expand on its own terms. Data is no longer a constraint but a tool that enables growth and fosters innovation.

“The tooling that Etleap provides, mixed with the huge productivity gains with Snowflake, allows us to take on some of these tasks that we'd otherwise say, ‘No, we have our hands full already.’”

Bob Henrikson, Data Engineering Manager at Sauce Labs

About Sauce Labs

Sauce Labs is the leading provider of continuous testing solutions that deliver digital confidence. The Sauce Labs Continuous Testing Cloud delivers a 360-degree view of a customer’s application experience, ensuring that web and mobile applications look, function, and perform exactly as they should on every browser, OS, and device, every single time. Sauce Labs is a privately held company funded by TPG, Salesforce Ventures, IVP, Adams Street Partners, and Riverwood Capital. For more information, please visit https://saucelabs.com.

About Etleap

Etleap’s mission is to make data analytics teams more productive. Etleap’s ETL solution lets analysts build data warehouses without internal IT resources or knowledge of complex scripting languages. This reduces the time of typical ETL projects from weeks to hours, and takes out the pain of maintaining data pipelines over time. Etleap is backed by world-class investment firms First Round Capital, SV Angel, BoxGroup, and Y Combinator.