How Etleap automates its infrastructure process with Terraform & Ansible

Introduction

“Infrastructure as Code” (IaC) is a term every system administrator has heard by now. We can think about it as the process of managing and provisioning IT infrastructure through source code instead of performing tasks manually. As we will explore, this helps DevOps teams efficiently and safely adapt infrastructure to meet the ever-changing requirements dictated by the business, and so better serve the organization.

How can this paradigm help you? It encourages the adoption of software development practices such as keeping infrastructure definitions and configuration scripts in a source control system, testing code automatically, and conducting peer reviews. These practices are tried and true in software development, and infrastructure management benefits from them in the same ways.

If you’re starting your journey into IaC, there are many resources you can reference to familiarize yourself with the concepts and terminology associated with this approach. Kief Morris’ “Infrastructure as Code: Managing Servers in the Cloud” is an essential book on the topic (alternatively, Martin Fowler’s blog gives a great overview).

At Etleap, we embrace IaC to build and improve our service every day. This practice helps us in our ongoing effort to make Etleap the best ETL platform it can be.

“IaC makes it possible to effortlessly and reliably spin up any element of an infrastructure at any time, or even the entire infrastructure, in a matter of minutes.”

Let’s take a look at a few examples of how using IaC has helped Etleap build a better product.

Service uptime and disaster recovery

One advantage of IaC is that it makes it possible to effortlessly and reliably spin up any element of an infrastructure at any time, or even the entire infrastructure, in a matter of minutes. The new infrastructure will be consistent with the previous one, which is to say that its software and configuration are the same (every security patch is applied, OS is configured the same way, allocated resources are identical).

Imagine a scenario where extreme weather or a natural disaster destroys the data centers where Etleap is hosted. For obvious reasons, it’s vital that we have a plan to recover from such an ordeal. Using IaC, we’re able to easily and reliably reproduce the entire infrastructure needed by Etleap and get it running in a new data center in short order. And so, even in this extreme case we’re able to recover from a service disruption incredibly quickly.

“With IaC tools available, almost every aspect of an infrastructure’s configuration can be defined in a configuration file or scripted.”

Another common issue is configuration drift, where the actual state of the infrastructure gradually diverges from its intended configuration as ad hoc changes accumulate. It is a major concern for services that must maintain high availability and a reliable disaster recovery strategy. If left unchecked, configuration drift increases the risk of prolonged outages or loss of data. By making sure every change to the infrastructure configuration is made through the definition files or scripts, we can eliminate configuration drift at its source. This way, we reduce the risk of misconfiguration issues when we need to re-provision our infrastructure.
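To make the idea of drift concrete, here is a minimal Python sketch that compares a desired configuration, as it might be recorded in a definition file, against what a server actually reports. The function name, the settings, and their values are hypothetical and are not taken from Etleap’s tooling; real IaC tools perform a far more thorough version of this comparison.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (desired_value, actual_value)} for every setting that differs.

    A setting that is absent on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example values, for illustration only.
desired = {"instance_type": "m5.xlarge", "os_patch_level": "2019.11", "open_ports": [22, 443]}
actual = {"instance_type": "m5.xlarge", "os_patch_level": "2019.09", "open_ports": [22, 443, 8080]}

for setting, (want, have) in find_drift(desired, actual).items():
    print(f"{setting}: expected {want!r}, found {have!r}")
```

If every change goes through the definition files, a report like this stays empty; any entry points to a change that was made by hand.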

Finally, to keep Etleap up and running at all times, we should be able to add more resources or replace an unhealthy component at any time. Let’s imagine that a server instance stops serving requests because it’s running out of memory. In this case we should be able to provision a new server, with more memory, and redirect the traffic to it. Etleap has dealt with a similar challenge, where we encountered memory shortages when running an Amazon Elastic MapReduce (EMR) cluster. After the cluster had become unhealthy, we traced the root cause to memory degradation. Because the EMR cluster provisioning and configuration were scripted, it was straightforward to update the configuration, start a new cluster, and point Etleap to it once it had launched, with zero downtime for our users.

Improved monitoring, highly secure

With IaC tools available, almost every aspect of an infrastructure’s configuration can be defined in a configuration file or scripted. This covers not only physical hardware, networks, and storage, but also identity and access management (IAM), monitoring, alarm systems, and much more.

Going back to our example of a server running out of memory: when things go sideways it’s essential to have a monitoring system that alerts us to these issues so we can avoid service outages. If we know a certain node is going into a bad state, we can take the needed action to improve its behavior or, in the worst case, replace the node outright. This way, we’re usually able to resolve the issue before our customers notice any problems or downtime. It also makes a lot of sense to tie the definition of these alarms to the infrastructure they’re monitoring: any time the infrastructure changes, its monitoring is updated as well.
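One way to picture alarms that are tied to the infrastructure they monitor is to derive each alarm from the server definition itself. The Python sketch below does that; the class names, the 90% threshold, and the server sizes are all assumptions made for illustration, not a description of how Etleap defines its alarms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    memory_mb: int


@dataclass(frozen=True)
class MemoryAlarm:
    server_name: str
    threshold_mb: int  # alert once memory in use exceeds this value


def alarms_for(servers: list[Server], max_used_percent: int = 90) -> list[MemoryAlarm]:
    """Derive one memory alarm per server from the server's own definition."""
    return [
        MemoryAlarm(s.name, s.memory_mb * max_used_percent // 100)
        for s in servers
    ]


# Hypothetical fleet: resizing a server automatically moves its alarm threshold.
before = alarms_for([Server("app-1", 16384)])
after = alarms_for([Server("app-1", 32768)])
print(before[0].threshold_mb, after[0].threshold_mb)
```

Because the threshold is computed from the definition, replacing the server with a larger one cannot leave a stale alarm behind.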

IAM is hugely important when it comes to security. Meticulously defining the right access levels and ingress rules to different parts of the infrastructure is crucial for data and system protection. By restricting access to production servers we can prevent unauthorized persons from gaining access to sensitive data. Finally, audits and reviews of the configuration and any changes allow us to maintain the right access at all times. 

Etleap productization

At Etleap, IaC practices enable a repeatable deployment process. Each time we provision our infrastructure the result is a known quantity, and that’s something we take advantage of in multiple ways.

Etleap is SaaS, meaning our product runs in the cloud and our users don’t need to install or maintain anything to start using it. However, some of our customers, especially those with strict security requirements, require that Etleap run in an isolated AWS VPC. Embracing IaC helps us efficiently deploy Etleap to a completely new environment. The installation process is well-defined and tested, and is a daily occurrence for us. This allows us to ensure that Etleap running in one environment will behave identically to another instance running in a different environment, which saves time when identifying issues and reduces the need for customers to contact the support team. Thinking of infrastructure as a product itself gives Etleap a competitive advantage, as it allows us to serve customers with complex security requirements.

“IaC not only helps manage production environments but the entire software development lifecycle.”

Running identical instances of Etleap in multiple environments also simplifies updates. For example, diagnosing and fixing a bug for a user running Etleap in their own VPC would be really challenging if the environments differed from one another. By ensuring parity between all environments where Etleap is deployed, we eliminate this potential headache.

Streamline development and delivery cycle

IaC not only helps manage production environments but the entire software development lifecycle. During development, we can provision an isolated sandbox environment to safely make changes without the risk of breaking something. We can test new changes against our sandbox environment to more quickly detect if they would negatively affect the production environment when deployed. Having each new feature or bug fix properly tested during development reduces the risk of introducing issues when changes are rolled out. Once thoroughly tested, changes are automatically deployed through a CI/CD process: any new feature or bug fix is rolled out to our users as soon as it’s merged into the master branch.

For example, some time ago I was tasked with improving our validation process for users wanting to add or edit an S3 data lake or S3 input connection. One of our goals was to give users more accurate information about misconfiguration problems with their connections. In both cases, most of these configuration issues were related to incorrect policies being attached to a given IAM user. It would have been quite tedious to add all these cases manually through the AWS console. Instead, we were able to quickly and easily script the policies that matched the cases we wanted to test and roll them out to the sandbox environment.
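As a rough illustration of what scripting such policies can look like, the Python sketch below generates IAM policy documents for a few S3 permission scenarios. The bucket name and the three scenarios are hypothetical, since the actual cases we tested are not listed here, and the sketch only builds the JSON documents; attaching them to an IAM user is left to whatever provisioning tool is in use.

```python
import json


def s3_policy(bucket: str, actions: list[str]) -> dict:
    """Build an IAM policy document granting the given S3 actions on one bucket.

    s3:ListBucket applies to the bucket itself, while object actions such as
    s3:GetObject and s3:PutObject apply to the objects inside it.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    bucket_level = [a for a in actions if a == "s3:ListBucket"]
    object_level = [a for a in actions if a != "s3:ListBucket"]
    statements = []
    if bucket_level:
        statements.append({"Effect": "Allow", "Action": bucket_level, "Resource": bucket_arn})
    if object_level:
        statements.append({"Effect": "Allow", "Action": object_level, "Resource": f"{bucket_arn}/*"})
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical scenarios: one correct policy and two misconfigured ones.
SCENARIOS = {
    "valid": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
    "cannot_list": ["s3:GetObject", "s3:PutObject"],
    "read_only": ["s3:ListBucket", "s3:GetObject"],
}

policies = {name: s3_policy("example-sandbox-bucket", actions) for name, actions in SCENARIOS.items()}

for name, policy in policies.items():
    print(name)
    print(json.dumps(policy, indent=2))
```

Keeping the scenarios in one table makes it cheap to add another misconfiguration and check that the validation reports it accurately.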

Another case where we took advantage of our ability to effortlessly provision a sandbox environment during development was when we improved our ZooKeeper cluster. We switched from a standalone ZooKeeper node to an ensemble of nodes. We scripted the cluster configuration and provisioned it in a sandbox environment. This way, we could test that the cluster was working as expected. We were also able to stress test the cluster to see how it behaved. There were some questions we wanted to answer before rolling it out, like: How well does the cluster behave when nodes are disconnected? Are new nodes automatically incorporated into the cluster? Will the leader role switch to another node when the current leader becomes unhealthy? We tested each of these scenarios in the safety of our sandbox environment without affecting production. When we finally rolled the new ZooKeeper cluster out, we could rest easy that it would work as expected, as we’d already tested against many of the possible points of failure during development.
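Part of the answer to the disconnected-nodes question is simple arithmetic: a ZooKeeper ensemble keeps serving requests only while a majority of its nodes can reach each other. The short Python sketch below works through that rule for a few ensemble sizes; the sizes are examples only, as the size of our ensemble is not stated here.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be lost while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for size in (1, 3, 4, 5):
    print(f"{size} nodes: quorum {quorum_size(size)}, tolerates {tolerated_failures(size)} failure(s)")
```

Note that four nodes tolerate no more failures than three, which is why ensembles are usually given an odd number of nodes.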

Conclusion

By leveraging IaC, Etleap benefits in numerous ways. Hosting the infrastructure design in definition files and scripts ensures a consistent environment, where each node has exactly the desired configuration. This makes it easier and less risky to update many aspects of the infrastructure. Errors can be identified and fixed faster, or in the worst case, infrastructure can be reverted to the last functional configuration. Changes can be made quickly and with little effort, and we can easily scale by increasing the number of nodes or their size.

AWS re:Invent 2019 Roundup

Materialized Views, Amazon Redshift Ready, and more!

Last week Etleap put on another exciting show at AWS re:Invent, where we announced some new features and integrations with AWS services, were interviewed by the tech experts over at “theCUBE,” hosted a session all about data lakes, and most importantly, spoke with countless attendees about ETL. Here’s a roundup of all the Etleap action you may have missed at AWS re:Invent 2019.


Etleap’s booth was a veritable oasis of ETL discussion and Etleap product demos

Amazon Redshift launches Materialized Views with help from Etleap

Among AWS’ numerous announcements at re:Invent this year was the availability of Materialized Views in preview on Amazon Redshift. The Materialized Views feature is designed to help customers achieve up to 100x faster query performance on analytical workloads such as dashboarding queries from Business Intelligence (BI) tools and ELT data processing. Etleap helped launch this feature by integrating it into a beta version of Etleap Models (let us know if you want to be included in the beta!) and showing that it can give an ~8x performance boost. The Redshift team showcased our results in their chalk talk on “Accelerating performance with Materialized Views.”


Yannis (seated, left) and Vuk (standing, right) from the Amazon Redshift team showcase Etleap at their Redshift Materialized Views Chalk Talk

“We are delighted to have Etleap help launch the Materialized Views feature in Amazon Redshift,” said Andi Gutmans, Vice President, Analytics, Amazon Web Services, Inc. “Amazon Redshift Materialized Views allow customers to realize a significant boost in query performance in ETL pipelines and BI dashboards. By integrating Etleap with this new functionality, customers can seamlessly get the benefits of Amazon Redshift Materialized Views without needing to make any application changes.”

You can read the full Etleap press release about Amazon Redshift Materialized Views here.

Etleap Founder makes the case for more analyst-friendly data lakes, alongside Redshift team

Many Etleap customers use our solution to build their S3/Glue data lakes, so data lakes are a topic we’ve learned a thing or two about over the years. For re:Invent this year, we thought we’d share our data lake expertise with the world by hosting a session alongside the Redshift team entitled “Five data lake considerations with Amazon Redshift, Amazon S3 & AWS Glue.”


Etleap founder and CEO, Christian Romming, led the session focused on data lakes

Have an interest in data lakes yourself? You can check out the session here.

Etleap featured on enterprise tech talk show

After our data lakes session, Founder and CEO of Etleap, Christian Romming, sat down with the hosts of “theCUBE,” re:Invent’s resident technology interview show. Check it out:

Etleap founder sits down with David Vellante and John Walls of theCUBE

Etleap achieves Amazon Redshift Ready designation

Distinguishing ourselves in the Amazon Redshift partner ecosystem, we announced that Etleap has achieved the designation of “Amazon Redshift Ready,” a recently announced status among partners who have proven integration with Amazon Redshift.

Etleap was featured in the keynote announcement among a select few debuting partners

“Etleap is proud to achieve Amazon Redshift Ready status,” said Christian Romming, Founder and CEO of Etleap. “Our team is dedicated to helping companies achieve maintenance-free, enterprise-grade ETL by leveraging the agility, breadth of services, and pace of innovation that AWS provides. Our status as an Amazon Redshift Ready partner shows our continued commitment to Amazon Redshift and the AWS ecosystem.”

You can read the full Etleap press release covering the Amazon Redshift Ready announcement here.


This concludes our roundup of the biggest Etleap news stories from AWS re:Invent 2019. Stay tuned for more Etleap trade show news, and for all things ETL you’re already in the right place.

On-Demand Webinar: Etleap presents "Customer First Technology"

In this webinar, we explore how and why eMoney puts its customers first by choosing technologies that solve customer challenges, and we walk through its use case for running Etleap and Looker within a highly secure VPC environment.

Ready to try Etleap for yourself? Click here to get started!

Stay tuned to this blog for more webinars and other Etleap content, and for all things ETL you’re already in the right place.

Etleap Achieves Amazon Redshift Ready designation

Recently-announced designation distinguishes Etleap on the Redshift platform

SAN FRANCISCO, Calif. – December 4, 2019 — Etleap announced today that it has achieved the Amazon Redshift Ready designation. This designation recognizes that Etleap has demonstrated successful integration with Amazon Redshift. 

Achieving the Amazon Redshift Ready designation differentiates Etleap as an AWS Partner Network (APN) member with a product that integrates with Amazon Redshift and is generally available and fully supported for AWS customers. AWS Service Ready Partners have demonstrated success building products integrated with AWS services, helping AWS customers evaluate and use their technology productively, at scale and at varying levels of complexity.

“Etleap is proud to achieve Amazon Redshift Ready status,” said Christian Romming, Founder and CEO of Etleap. “Our team is dedicated to helping companies achieve maintenance-free, enterprise-grade ETL by leveraging the agility, breadth of services, and pace of innovation that AWS provides. Our status as an Amazon Redshift Ready partner shows our continued commitment to Amazon Redshift and the AWS ecosystem.”

To support the seamless integration and deployment of these solutions, AWS established the AWS Service Ready Program to help customers identify products integrated with AWS services and spend less time evaluating new tools, and more time scaling their use of products that are integrated with AWS Services.

Etleap is analyst-friendly ETL-as-a-service for Amazon Redshift and Snowflake data warehouses and Amazon S3/AWS Glue data lakes. Etleap replaces time-consuming ETL setup and maintenance with intuitive software and a managed service that automates data pipelines and reduces time to value.

For more information, email info@etleap.com; Follow us on Twitter @etleap; or Like us on Facebook @etleap.


About Etleap: Etleap was founded by Christian Romming in 2013. Before founding Etleap, Romming was the CTO of an ad-tech company, where he recognized the available solutions for building data pipelines required monumental engineering resources to implement, maintain, and scale. Etleap is backed by world-class investment firms First Round Capital, SV Angel, BoxGroup, and Y Combinator. Our mission is to make data analytics teams more productive. Our ETL solution lets analysts build data warehouses without internal IT resources or knowledge of complex scripting languages. This reduces the time of typical ETL projects from weeks to hours, and takes out the pain of maintaining data pipelines over time.

Etleap announces support for Amazon Redshift Materialized Views

Etleap customers will benefit from new technology in Etleap for faster query performance

SAN FRANCISCO, Calif. – December 2, 2019 – Today, Etleap, an Advanced Technology Partner in the Amazon Web Services (AWS) Partner Network (APN) and provider of fully-managed Extract, Transform, Load (ETL)-as-a-service, announced support for Amazon Redshift Materialized Views. The new feature is designed to help customers achieve up to 100x faster query performance on analytical workloads such as dashboarding queries from Business Intelligence (BI) tools and ELT data processing. Because Etleap was built from the ground up to handle data integration for Amazon Redshift users, including orchestration of transformations within Amazon Redshift, the company is uniquely positioned to test this new capability and provide support for it in its product.

“We are delighted to have Etleap help launch the Materialized Views feature in Amazon Redshift,” said Andi Gutmans, Vice President, Analytics, Amazon Web Services, Inc. “Amazon Redshift Materialized Views allow customers to realize a significant boost in query performance in ETL pipelines and BI dashboards. By integrating Etleap with this new functionality, customers can seamlessly get the benefits of Amazon Redshift Materialized Views without needing to make any application changes.”

“For as long as Amazon Redshift has been around, Etleap has been making some of the most complex data pipelines easier and faster for AWS users, so working with the Amazon Redshift team to improve post-load transformations with Amazon Redshift Materialized Views was a perfect fit for us,” said Christian Romming, Founder and CEO of Etleap. “Etleap was designed for AWS and delivers analyst-friendly, enterprise-grade ETL-as-a-service. By collaborating with the Amazon Redshift team on this project, we continue to show our commitment to our customers and AWS, and have taken another major step in our quest to make data integration less of a headache without sacrificing control or visibility — and we couldn’t be more excited.”

Customers value Etleap’s modeling feature because it allows them to gain advanced intelligence from their data. One challenge for customers is the time it takes to refresh a model when data changes. Amazon Redshift Materialized Views allows Etleap to refresh model tables faster and use fewer Amazon Redshift cluster resources in the process, which frees up more resources for other Amazon Redshift workloads. This allows a customer’s engineering and analyst teams to deliver on the desired outcome more efficiently.
