Perth, Western Australia - 6th to 10th January 2014

The best CTDB bugs ever!

Project: CTDB

CTDB is the Clustered Trivial Database (TDB) and is used to provide clustering support for Samba. The CTDB daemon is highly asynchronous, event-driven, low latency, single-threaded, non-blocking and just a bit scary. CTDB makes extensive use of the talloc and tevent libraries.

While trying to improve performance, we found lengthy packet queues, extreme CPU usage and nodes running out of memory. This forced us to learn some basic performance analysis, and it wasn't rocket science! Fixing the performance issues required trivial data structure improvements, some code refactoring and even the redesign of subsystems. The resulting performance improvements have been stunning and allow CTDB to support thousands of concurrent SMB connections per node.

On systems with idle CPUs we have seen tevent complain that it has taken a long time - sometimes minutes - to process events that normally take milliseconds. How can this happen? Why has this become our indicator for general system performance issues?

When can a simple "time robustness" test render a cluster useless?
What triggered race conditions that caused (apparently) inexplicable hangs?
Why would a pointer comparison stop CTDB from shutting down once in a while?

We will take you on an entertaining journey through our "favourite" CTDB bugs and the lessons we have learned. You'll laugh, you'll cry, you'll nod in agreement and shake your head in disbelief. You will be amazed!

Amitay Isaacs

Amitay Isaacs has been a Linux hacker for the last 20 years and has been using Linux in engineering and scientific computing. His interests are distributed systems, high performance computing and optimization algorithms. In recent years he has been working on Samba and is the current maintainer of CTDB.

Martin Schwenke

Martin Schwenke has been developing Open Source software for nearly 15 years. Before that he did research into functional programming, lectured in computer science and did system administration. His early hacks mostly relate to Emacs. In recent years he has been hacking on CTDB, the Clustered Trivial DataBase used for clustering Samba.