As a company grows, it continually seeks out solutions that help its teams achieve better performance and scale of their data analytics, especially when business growth has eclipsed current capabilities. Migrating to a new architecture is often a key component of this. However, a migration path that is painless, flexible, and supported is not always available.
In this post, we walk through how Tophatter—a virtual auction house where buyers and sellers interact, chat, and transact in diverse categories—recently migrated from DS2 to RA3 nodes in Amazon Redshift. We highlight the steps they took, how they improved stability and lowered costs, and the lessons other companies can follow.
Tophatter stores the majority of their product data in MySQL databases, while sending some webhook and web events to Amazon Simple Storage Service (Amazon S3). Additionally, some of their vendors drop data directly into dedicated S3 buckets. Etleap integrates with these sources. Every hour (according to the schedule configured by Tophatter), Etleap extracts all the new data that has been added or changed in the source, transforms the new data according to the pipeline rules defined by the user in the UI, and loads the resulting data into Amazon Redshift.
Tophatter relies on Mode Analytics and Looker for data analysis, and uses Etleap’s model feature based on Amazon Redshift materialized views to persist the results of frequently used business intelligence (BI) queries. Tophatter configures the update schedule of the model to happen at defined times or when certain source tables have been updated with new data.
Ultimately, these critical data pipelines fuel Tophatter dashboards that both internal analysts and users interact with.
The following diagram illustrates how Tophatter uses Etleap’s AWS-native extract, transform, and load (ETL) tool to ingest data from their operational databases, applications, and Amazon S3 into Amazon Redshift.
Before the migration, Tophatter’s team operated 4 DS2 Reserved Instance (RI) nodes (ds2.xlarge) in Amazon Redshift, which use HDD drives as opposed to relatively faster SSDs.
As their user base expanded and online auction activity increased exponentially, Tophatter’s ETL needs grew. In response, Etleap seamlessly scaled to support their increased volume of ingestion pipelines and materialized data models. But Tophatter’s Amazon Redshift cluster—which they managed internally—wasn’t as easy to scale. When Amazon Redshift usage increased, Tophatter had to resize the cluster manually or reduce the frequency of certain analytics queries or models. Finding the optimal cluster size often required multiple iterations.
Due to the time-sensitive nature of data needed for live online auctions, Tophatter used automated monitoring to notify on-call engineers when data pipeline latency had exceeded the desired threshold. Latencies and errors began to pop up more frequently—at least once or twice a week. These events caused distress for the on-call engineers. When the issue couldn’t be resolved internally, they notified Etleap support, who typically recommended either canceling or reducing the frequency of certain long-running model queries.
While the issue was still being resolved, the latencies resulted in downstream issues for the analytics team, such as certain tables being out of sync with others, resulting in incorrect query results.
To improve stability and reduce engineering maintenance, Tophatter decided to migrate from DS2 to RA3 nodes. Amazon Redshift RA3 with managed storage is the latest generation node type and would allow Tophatter to scale compute and storage independently.
With DS2 nodes, there was pressure to offload or archive historical data to other storage because of fixed storage limits. RA3 nodes with managed storage are an excellent fit for analytics workloads that require high storage capacity, such as operational analytics, where the subset of data that’s most important continually evolves over time.
Moving to the RA3 instance type would also enable Tophatter to capitalize on the latest features of Amazon Redshift, such as AQUA (Advanced Query Accelerator), Data Sharing, Amazon Redshift ML, and cross-VPC support.
Tophatter understood the benefits of migrating to RA3, but worried that their 3-year DS2 Reserved Instances commitment would present a roadblock. They still had 2 years remaining in their agreement and were unsure if an option was available that would allow them to change course.
They found out about the AWS RA3 upgrade program from the AWS account team, which helps customers convert their DS2 Reserved Instance commitments into RA3 without breaking the commitment agreement. This path enables you to seamlessly migrate from your legacy Amazon Redshift clusters to one of three RA3 node types: ra3.xlplus, ra3.4xlarge, or ra3.16xlarge.
Tophatter’s engagement with the program consisted of five steps:
Tophatter had the opportunity to evaluate three options for their RA3 migration:
Tophatter worked closely with their AWS account team for the migration and chose elastic resize due to the minimal downtime that option presented as well as prior experience using it. They completed the migration in under 2 hours, which included the requisite pre- and post-testing. As a pre-migration step, they took a snapshot of the cluster and were prepared to restore it if something went wrong.
After migrating to a 2 node ra3.4xlarge cluster, the Tophatter team realized numerous benefits:
The following graph illustrates the average latency of a typical ingestion pipeline before and after the migration from DS2 to RA3.
When Tophatter set out to migrate their DS2 instances to RA3, they were unaware of the AWS RA3 upgrade program and its benefits. They were happy to learn that the program avoided the administrative overhead of getting approvals ahead of time for various specific configurations, and let them try several options in order to find a stable configuration.
Tophatter migrated from DS2 to RA3 without breaking their current commitment and enhanced their analytics to be agile and versatile. Now, Tophatter is aiming to realize greater scaling benefits to support its exponential growth by exploring new RA3 features such as:
We hope Tophatter’s migration journey can help other AWS customers reap the benefits from the AWS RA3 upgrade program from DS2 or DC2 cluster families to RA3. We believe this enables better performance and cost benefits while unlocking valuable new Amazon Redshift features.