Announcing Camelot, a Python Library to Extract Tabular Data from PDFs

I originally wrote this post for the SocialCops engineering blog.

Photo by Carles Rabada on Unsplash

The PDF (Portable Document Format) was born out of the Camelot Project, which set out to create “a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks”. Basically, the goal was to make documents viewable on any display and printable on any modern printer. PDF was built on top of PostScript (a page description language), which had already solved this “view and print anywhere” problem. A PDF encapsulates every component that such a “view and print anywhere” document needs: characters, fonts, graphics and images.
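To make those components a little more concrete, here is a minimal sketch that hand-assembles a one-page PDF using only the Python standard library. It is not part of Camelot, and the function name `build_minimal_pdf` is hypothetical. The page references one font (the standard Type1 Helvetica) and a content stream holding a single text instruction, `BT /F1 24 Tf 72 720 Td (...) Tj ET`, which draws one string at the coordinates (72, 720), measured in points from the bottom-left corner of a US Letter page.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Return the bytes of a one-page PDF that draws `text` (hypothetical helper)."""
    # Escape characters that are special inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: begin text, pick font F1 at 24 pt, move to (72, 720),
    # show the string, end text.
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")

    # Object bodies, numbered 1..5 in list order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"

    # Cross-reference table: each entry is exactly 20 bytes.
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()

    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode()
    return bytes(out)
```

Writing the result to disk (for example `open("hello.pdf", "wb").write(build_minimal_pdf("Hello"))`) should give a file that common viewers can open. Notice that the file says nothing about words or lines as such: it only tells the renderer which glyphs to paint, in which font, and where.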

A PDF file defines instructions to place characters (and other components) at precise


