Distributed CSV Parsing

tl;dr: This post is about how to split and process CSV files in pieces! Newline characters within fields makes it tricky, but with the help of a finite-state machine it’s possible to work around that in most real-world cases.

Comma-separated values (CSV) is perhaps the world’s most common data exchange format. It’s human-readable, it’s compact, and it’s supported by pretty much any application that ingests data. At Etleap we frequently encounter really big CSV files that would take a long time to process sequentially. Since we want our clients’ data pipelines to have minimal latency, we split these files into pieces and process them in a distributed fashion.

Continue reading “Distributed CSV Parsing”

Typescript at Etleap

Typescript has been getting significant attention in the past year and with over 2 million downloads per month on npm, there has undoubtedly been an increase in adoption. However, many people are still unsure if Typescript will benefit their project, and there are few resources that show how Typescript can be used in large projects and what the practical benefits are. In this post we aim to highlight how we use Typescript at Etleap so that people can get an impression of why we decided to use it and how we benefit from it.

Continue reading “Typescript at Etleap”