Nowadays, a lot of valuable data is locked inside Portable Document Format (PDF) documents instead of being available in a ready-to-use format. Fortunately, there are a number of PDF-to-Excel converters to choose from.
Excel
First of all, many people don’t know that Excel can import PDFs directly, but only if you have a Microsoft 365 or Office 365 subscription on Windows. In our tests it was a good choice for a simple file but grew more cumbersome as PDF complexity rose. It may also confuse people who aren’t familiar with Excel’s Power Query (Get & Transform) interface.
How to import a PDF directly into Excel:
On the ribbon, go to Data > Get Data > From File > From PDF and select your file. For a single table, you’ll likely have just one item to import. Select it, and you should see a preview of the table along with options to load it or to transform the data before loading. Click Load, and the table will appear in your worksheet.
For a single table on one page, this is a quick and reasonably simple choice. It also works well for multiple tables in a multi-page PDF, as long as each table is confined to one page. Things get more complex if one table spans several PDF pages, though: you’ll need some knowledge of Power Query commands to combine the pieces.
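Power Query handles this with its own M functions. The same stitching idea can be sketched in Python, assuming each page’s slice of the table has already been exported as rows (for example, one CSV file per page) and that later pages either repeat the header verbatim or omit it. The function names are illustrative only, not part of any Excel API.

```python
import csv


def read_rows(path):
    """Read one exported page slice (a CSV file) into a list of rows."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [row for row in csv.reader(fh)]


def stitch_pages(page_tables):
    """Append per-page slices of one logical table, keeping a single header.

    Assumes the first row of the first slice is the header; any later row
    identical to it (ignoring surrounding whitespace) is a repeated header.
    """
    if not page_tables or not page_tables[0]:
        return []
    header = [cell.strip() for cell in page_tables[0][0]]
    combined = [page_tables[0][0]]
    for rows in page_tables:
        for row in rows:
            # Skip the header wherever it appears, including on page 1.
            if [cell.strip() for cell in row] == header:
                continue
            combined.append(row)
    return combined
```

For example, call `stitch_pages([read_rows(p) for p in paths])` with `paths` listed in page order; if you collect the files with a glob, zero-pad the page numbers so they sort correctly.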
Adobe Acrobat Export PDF
As the creator of the Portable Document Format standard, Adobe does a great job of parsing PDFs. A full-featured conversion subscription is somewhat pricey, but there’s also an inexpensive $1.99/month plan (annual subscription required) that includes unlimited PDF-to-Excel conversions. (You can also output Microsoft Word files with this tool.)
The Excel conversions include any text on pages that have both text and tables. This can be a benefit if you’d like to keep that context or a drawback if you just want data for additional analysis.
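If you only want the data, one rough way to strip the surrounding text is to save the exported sheet as CSV and keep only the rows that fill several cells. This is a heuristic sketch resting on an assumption (that paragraphs land in a single cell while table rows span columns); it is not how Adobe’s exporter marks tables, and the helper names are hypothetical.

```python
import csv


def load_csv(path):
    """Load a sheet that was saved from Excel as CSV."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))


def table_rows(rows, min_filled=2):
    """Keep rows with at least `min_filled` non-empty cells.

    Heuristic: exported prose tends to occupy one cell per row, while
    table rows fill several columns.
    """
    return [row for row in rows if sum(1 for cell in row if cell.strip()) >= min_filled]
```

A data row with only one filled cell (a section label, say) would also be dropped, so check the result or lower `min_filled`.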
AWS Textract
For an AWS cloud service, Textract is surprisingly easy to use. While you certainly can go through the usual multi-step AWS setup and coding process for Textract, Amazon also offers a drag-and-drop web demo that lets you download results as zipped CSVs.
To use the demo, you just need to sign up for a free AWS account.
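Beyond the web demo, the same analysis is available through the AWS SDK. The sketch below assumes boto3 and configured AWS credentials for the API call itself; the parsing function only reads the response’s `Blocks` list, in which a `TABLE` block points to its `CELL` blocks (each with `RowIndex` and `ColumnIndex`) and each cell points to its `WORD` blocks. The helper names are our own. The synchronous call shown is meant for single-page documents; multi-page PDFs go through Textract’s asynchronous API with the file stored in S3.

```python
import csv


def analyze_single_page(path):
    """Run Textract's synchronous AnalyzeDocument with table detection."""
    import boto3  # imported lazily so the parser below works without boto3

    client = boto3.client("textract")
    with open(path, "rb") as fh:
        return client.analyze_document(Document={"Bytes": fh.read()}, FeatureTypes=["TABLES"])


def _child_ids(block):
    """Yield the Ids of a block's CHILD relationships."""
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            yield from rel["Ids"]


def textract_tables(response):
    """Turn TABLE/CELL/WORD blocks into a list of tables (lists of rows)."""
    blocks = {b["Id"]: b for b in response["Blocks"]}
    tables = []
    for table in (b for b in response["Blocks"] if b["BlockType"] == "TABLE"):
        cells = [blocks[i] for i in _child_ids(table) if blocks[i]["BlockType"] == "CELL"]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [[""] * n_cols for _ in range(n_rows)]
        for cell in cells:
            # Join the cell's words; other child types (e.g. checkboxes) are ignored.
            words = [blocks[i]["Text"] for i in _child_ids(cell) if blocks[i]["BlockType"] == "WORD"]
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables


def write_csv(rows, path):
    """Write one table to a CSV file, similar to the demo's download."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
```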
Cometdocs
This web-based service is notable for the range of formats it supports: in addition to Excel, it can download results as Word, PowerPoint, AutoCAD, HTML, OpenOffice, and other formats. Free accounts can convert up to five files per week (30MB each); paid users get unlimited conversions, subject to a 2GB daily data limit.
Cometdocs is a supporter of public service journalism; the service offers free premium accounts to Investigative Reporters & Editors members.
PDFTables
PDFTables performed well on most of the app-generated PDF tables, even recognizing that a two-row column header would work best as a single header row. It did have some difficulty with columns that were mostly empty but contained a few cells whose data spread over two lines.
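When a converter does leave a header split across two rows, it can be merged afterward. A minimal sketch, assuming that a label in the top row applies to every blank top cell to its right, and that a top-row label with nothing beneath it is a standalone column:

```python
def flatten_header(top, bottom, sep=" "):
    """Merge a two-row header into one row of column names."""
    merged, group = [], ""
    for upper, lower in zip(top, bottom):
        upper, lower = upper.strip(), lower.strip()
        if upper and not lower:
            # Standalone column: the label simply spans both header rows.
            merged.append(upper)
            group = ""
            continue
        if upper:
            # A new group label, e.g. "Sales" over "2022" and "2023".
            group = upper
        merged.append(sep.join(p for p in (group, lower) if p))
    return merged
```

For instance, a top row of `Region, Sales, (blank)` over `(blank), 2022, 2023` becomes `Region, Sales 2022, Sales 2023`.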
A key advantage of this service is automation. Its API is well documented and supports everything from Windows PowerShell and VBA (Visual Basic for Applications, the Office macro language) to programming languages such as Java, C++, PHP, Python, and R.
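As an illustration of that automation, here is a standard-library-only Python sketch modeled on the curl example in PDFTables’ documentation, which posts the PDF as form field `f` to `https://pdftables.com/api` with `key` and `format` query parameters. Treat the endpoint, field name, and format list as assumptions to verify against the current docs; PDFTables’ own client libraries may be the simpler route.

```python
import urllib.parse
import urllib.request
import uuid
from pathlib import Path

# Assumed from PDFTables' published curl example; verify against current docs.
API_URL = "https://pdftables.com/api"
FORMATS = {"csv", "xml", "xlsx-single", "xlsx-multiple", "html"}


def build_url(api_key, fmt="xlsx-single"):
    """Build the conversion URL; the output format is a query parameter."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urllib.parse.urlencode({"key": api_key, "format": fmt})
    return f"{API_URL}?{query}"


def multipart_body(field, filename, data, boundary):
    """Encode a single file upload as multipart/form-data."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    return head + data + f"\r\n--{boundary}--\r\n".encode("utf-8")


def convert(pdf_path, out_path, api_key, fmt="xlsx-single"):
    """Upload one PDF (form field "f", per the curl example) and save the result."""
    pdf = Path(pdf_path)
    boundary = uuid.uuid4().hex
    body = multipart_body("f", pdf.name, pdf.read_bytes(), boundary)
    request = urllib.request.Request(
        build_url(api_key, fmt),
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
```

A call would look like `convert("report.pdf", "report.xlsx", api_key="YOUR_KEY")`, where the file names and key are placeholders.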
PDFtoExcel.com
This is a freemium platform. It was the only free choice that could handle our nightmarish scanned PDF, and it offers a nice balance of cost and features. It is most compelling for complex scanned PDFs, although other tools did better when cell data ran across multiple lines.
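Wrapped cell text can often be repaired after export. The sketch below assumes each real record has a value in one key column (the first, by default) and treats any row whose key cell is blank as the continuation of the row above; that assumption breaks for tables that legitimately leave the key column empty.

```python
def merge_wrapped_rows(rows, key_col=0, sep=" "):
    """Fold continuation rows (blank key column) into the row above."""
    merged = []
    for row in rows:
        key = row[key_col].strip() if key_col < len(row) else ""
        if merged and not key:
            previous = merged[-1]
            # Pad the previous row if the continuation row is wider.
            if len(row) > len(previous):
                previous.extend([""] * (len(row) - len(previous)))
            for i, cell in enumerate(row):
                if cell.strip():
                    previous[i] = sep.join(p for p in (previous[i].strip(), cell.strip()) if p)
        else:
            # Copy so the caller's rows are left untouched.
            merged.append(list(row))
    return merged
```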
Tabula
Unlike some free options from the Python world, Tabula is easy both to install and to use. It has both a command-line and a browser-based interface, making it equally useful for batch conversions and point-and-click work.
Tabula did very well on PDFs of low or moderate complexity, although it had trouble with our most complex test file (as did many of the paid platforms). Tabula requires a separate Java installation on Windows and Linux.
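For batch work, the command-line version can be driven from a script. This sketch assumes tabula-java’s `--pages`, `--format`, `--outfile`, and `--lattice` options and a locally downloaded jar; `TABULA_JAR` is a hypothetical path. tabula-java also offers a `--batch` option for converting a whole directory, which may make the loop unnecessary.

```python
import subprocess
from pathlib import Path

# Hypothetical path: point this at the tabula-java jar you downloaded.
TABULA_JAR = "tabula-java.jar"


def tabula_command(pdf, outfile, pages="all", fmt="CSV", lattice=False, jar=TABULA_JAR):
    """Build a tabula-java command line for one PDF."""
    cmd = ["java", "-jar", str(jar), "--pages", pages, "--format", fmt, "--outfile", str(outfile)]
    if lattice:
        # Lattice mode is aimed at tables drawn with ruling lines between cells.
        cmd.append("--lattice")
    cmd.append(str(pdf))
    return cmd


def convert_folder(folder, jar=TABULA_JAR):
    """Convert every PDF in `folder` to a CSV file next to it."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(tabula_command(pdf, pdf.with_suffix(".csv"), jar=jar), check=True)
```

Building the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.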
Teknita has the expert resources to support all your technology initiatives.
We are always happy to hear from you.