This recorded session is from DataEngConf NYC 17. Slides are available on the event page.
An often-quoted statistic says that data analysts spend 80% of their time preparing data and only 20% actually analyzing it. There’s a lot that we as data engineers can do to help our analytics teams be more productive and spend less time on data preparation. This session discusses common problems in data warehousing infrastructure from the point of view of analytics teams, and suggests practical solutions.
Watch the session video or read the key takeaways below.
Continue reading “Building ETL Infrastructure that Analysts Love”
Continue reading “Reducing the size of your Webpack bundle”
There are a few requirements for a good password reset token:
- the user should be able to reset their password with the token they receive in an email
- the token should not be guessable
- the token should expire
- the user should not be able to reuse the token
Ideally, the web framework of your choice should already have a built-in way to generate reset tokens. However, we use Play and it does not provide a way to do that, so we have to roll our own.
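To make the four requirements concrete, here is a minimal sketch in Python rather than Play. It is not the implementation from the post: the `ResetTokenStore` class, its in-memory dictionary, the one-hour default lifetime, and the choice to store only a SHA-256 hash of each token are all assumptions made for illustration.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone


class ResetTokenStore:
    """Hypothetical in-memory stand-in for a database table of reset tokens."""

    def __init__(self, ttl=timedelta(hours=1)):
        self._ttl = ttl
        self._by_hash = {}  # token hash -> (user_id, expires_at)

    @staticmethod
    def _hash(token):
        # Assumption: only the hash is stored, so a leaked table
        # does not reveal tokens that are still valid.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id, now=None):
        """Create a token to be emailed to the user."""
        now = now or datetime.now(timezone.utc)
        # Not guessable: 32 random bytes from the OS CSPRNG, URL-safe encoded.
        token = secrets.token_urlsafe(32)
        # Expires: remember the deadline alongside the user.
        self._by_hash[self._hash(token)] = (user_id, now + self._ttl)
        return token

    def redeem(self, token, now=None):
        """Return the user id for a valid token, or None otherwise."""
        now = now or datetime.now(timezone.utc)
        # Not reusable: pop removes the entry on the first attempt.
        entry = self._by_hash.pop(self._hash(token), None)
        if entry is None:
            return None
        user_id, expires_at = entry
        if now >= expires_at:
            return None
        return user_id
```

Calling `store.issue(user_id)` yields the string to embed in the reset email, and `store.redeem(token)` returns the user id exactly once while the token is still within its lifetime. A real application would back this with a persistent table instead of a dictionary.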
Continue reading “Generating password reset tokens”
tl;dr: This post is about how to split and process CSV files in pieces! Newline characters within fields make it tricky, but with the help of a finite-state machine it’s possible to work around that in most real-world cases.
Comma-separated values (CSV) is perhaps the world’s most common data exchange format. It’s human-readable, it’s compact, and it’s supported by pretty much any application that ingests data. At Etleap we frequently encounter really big CSV files that would take a long time to process sequentially. Since we want our clients’ data pipelines to have minimal latency, we split these files into pieces and process them in a distributed fashion.
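As a rough illustration of the state-machine idea, the Python sketch below tracks whether the scanner is inside a quoted field and only treats a newline as a record boundary when it is not. The function names `record_boundaries` and `split_csv` are hypothetical, and the sketch assumes double-quote quoting with `""` as the escape. It also scans sequentially from the start of the input, so it shows the two states but not how to pick up the correct state when starting from an arbitrary offset in a large file.

```python
import csv
import io

OUTSIDE, INSIDE = 0, 1  # the two states: outside or inside a quoted field


def record_boundaries(text, quote='"'):
    """Yield the index just past each newline that ends a record."""
    state = OUTSIDE
    for i, ch in enumerate(text):
        if ch == quote:
            # An escaped quote ("") toggles twice, so the state is unchanged.
            state = INSIDE if state == OUTSIDE else OUTSIDE
        elif ch == "\n" and state == OUTSIDE:
            yield i + 1


def split_csv(text, target_size):
    """Cut text into pieces of at least target_size characters (except
    possibly the last), breaking only at record boundaries."""
    pieces, start = [], 0
    for end in record_boundaries(text):
        if end - start >= target_size:
            pieces.append(text[start:end])
            start = end
    if start < len(text):
        pieces.append(text[start:])
    return pieces


def parse(piece):
    """Parse one piece on its own with the standard csv module."""
    return list(csv.reader(io.StringIO(piece, newline="")))
```

Because every piece ends on a record boundary, each one can be passed to `parse` independently, and concatenating the per-piece rows gives the same result as parsing the whole input at once.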
Continue reading “Distributed CSV Parsing”
TypeScript has been getting significant attention in the past year, and with over 2 million downloads per month on npm, there has undoubtedly been an increase in adoption. However, many people are still unsure whether TypeScript will benefit their project, and there are few resources that show how TypeScript can be used in large projects and what the practical benefits are. In this post we aim to highlight how we use TypeScript at Etleap so that people can get an impression of why we decided to use it and how we benefit from it.
Continue reading “Typescript at Etleap”
At Etleap, what we do is help customers ETL their data. Our customers consume data from many different services like Salesforce, Marketo and Google AdWords, but the vast majority of them also have data in traditional SQL databases.
Continue reading “Preventing database connection leaks”