Preventing database connection leaks

At Etleap, what we do is help customers ETL their data. Our customers consume data from many different services like Salesforce, Marketo and Google AdWords, but the vast majority of them also have data in traditional SQL databases.

From a recent company offsite: drinking coldbrew next to a coffee grinder that looks like a fire hydrant is another thing that we do
From a recent company offsite: drinking coldbrew next to a coffee grinder that looks like a fire hydrant is another thing that we do

There are many strategies for data warehousing and it can be confusing with so many buzzwords flying around (beware of your data lake collapsing into singularity due to data gravity!). Increasingly we see that many customers are settling on aggregating all their data into Amazon Redshift.

As you can see, Etleap needs to connect to customers’ SQL databases on both ends of the data pipeline. There are many things that we do to ensure your data moves from one side to the other securely and consistently, and we will write about them on this blog. Today I am going to write about database connection leaks.

Database connection is a precious resource. Each Redshift cluster only allows 500 concurrent connections, for example. If we don’t manage our connections carefully, it will directly affect our customers’ ability to query their data.

Since we are mostly a Java shop, we use JDBC extensively, along with HikariCP to manage our database connections. Our first line of defense in preventing leaks is Java 7’s try-with-resources, which means we don’t have to remember to explicitly close connections:

try (Connection connection = dataSource.getConnection()) {
// do something with the connection
}

This obviously only helps if we remember to use try-with-resources in the first place. Also, sometimes it’s not trivial to structure the code so that the connection can be opened and closed in the same block. Fortunately, HikariCP has a built-in connection leak detection mechanism, which is based on timeouts:

HikariConfig hikariConfig = new HikariConfig();
hikariConfig.setLeakDetectionThreshold(60*1000);

And if a leak is detected, we will see this in the log:

11:28:50.535 [warn] [Hikari Housekeeping Timer (pool HikariCP Pool 1 (small) example example.com 3306)] com.zaxxer.hikari.pool.LeakTask - [p:] Connection leak detection triggered for connection Connection@6494a9b8, stack trace follows
java.lang.Exception: Apparent connection leak detected
at service.JDBCConnectionService.getConnection(JDBCConnectionService.java) ~[classes/:na]
...

From HikariCP’s wiki:

⌚leakDetectionThreshold
This property controls the amount of time that a connection can be out of the pool before a message is logged indicating a possible connection leak. A value of 0 means leak detection is disabled. Lowest acceptable value for enabling leak detection is 2000 (2 secs). Default: 0

Leak detection is disabled by default because incorrect timeout values would lead to frequent false positives. There are many methods and techniques for minimizing Redshift loading time (Redshift’s Best Practices for Loading Data is a good starting point), but loading large amounts of data can still take a very long time. Contention with other database queries can also make it tricky to determine what the correct timeout should be. At Etleap, we set it to a moderately high value and live with occasional false positives.

Our third line of defense is JDBC lint. It has an extensive list of checks that it can perform but we only use it for connection leak detection:

Configuration jdbcLintConfig = new Configuration(EnumSet.of(Check.CONNECTION_MISSING_CLOSE),
Arrays.asList(Configuration.PRINT_STACK_TRACE_ACTION));
connection = ConnectionProxy.newInstance(connection, jdbcLintConfig);

Under the hood, it works by checking if close() has been called at object finalization time. So if the connection is leaked, and Java’s GC kicks in, and object finalization is performed, we will see this in the log:

java.sql.SQLException: java.sql.SQLException
at com.maginatics.jdbclint.Configuration$1.apply(Configuration.java:71)
at com.maginatics.jdbclint.Utils.fail(Utils.java:30)
at com.maginatics.jdbclint.ConnectionProxy.finalize(ConnectionProxy.java:128)
...
Caused by: java.sql.SQLException
at com.maginatics.jdbclint.ConnectionProxy.(ConnectionProxy.java:40)
at com.maginatics.jdbclint.ConnectionProxy.newInstance(ConnectionProxy.java:60)
at service.JDBCConnectionService.getConnection(JDBCConnectionService.java) ~[classes/:na]
...

Unlike HikariCP’s leak detection which can have false positives, JDBC lint doesn’t detect all the errors so it can have false negatives. In particular, there’s no guarantee that a particular leaked object is garbage collected. And after an object is GC’ed, when or even if finalizer is run is also JVM implementation dependent. One way to verify that our setup is working by inducing a leak ourselves and coercing the JVM to do the right thing:

connectionService.getConnection(); // leak
System.gc();
System.runFinalization();

So in a nutshell, we ensure we don’t leak database connections by:

  1. Adopting good coding practice (try-with-resources)
  2. Investing in layered defenses to ensure that we can detect a bug if we have one. Both HikariCP’s and JDBC lint’s leak detection are running 100% of the time in our production environment.

ORM frameworks often abstract away database connection handling from application code, however it’s not well suited for ETL style workloads where we connect to databases with schemas that we have no control over. There are other times when ORM is undesirable, but as this post shows we don’t have to sacrifice all the safety mechanisms of an ORM framework even when we have to roll our own SQL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s