PDF to Excel is the largest monster in the ETL realm. It is powered by an industrial-strength PDF conversion utility that is capable of extracting thousands of pages in milliseconds. The extracted pages disappear into the maw of a custom Raw Data Parser that digests them and spits out a clean comma-separated values (CSV) file.
Once the CSV has been prepared, Excel imports it via Power Query, where the data is further cleaned, shaped and filtered. The last step is to load the data into a table in a new Excel workbook.
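To make the Raw Data Parser step a little less mysterious, here is a minimal sketch of the idea: match each line of converter output against a regular expression and write the captured fields to CSV. It is written in Python purely for brevity; the parser described in this post is a Visual Basic program, and this is not its code. The line layout (date, description, amount), the column names and the sample text are all assumptions made up for illustration.

```python
import csv
import io
import re

# Hypothetical line layout: date, free-text description, amount.
# The real pattern depends entirely on the PDF being converted.
ROW_PATTERN = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})$"
)


def parse_raw_text(raw_text: str) -> str:
    """Turn raw converter output into CSV text, skipping non-matching lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "description", "amount"])
    for line in raw_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # page headers, footers and blank lines are dropped
        # Strip thousands separators so Excel reads the amount as a number.
        amount = match.group("amount").replace(",", "")
        writer.writerow([match.group("date"), match.group("description"), amount])
    return out.getvalue()


# Made-up sample of what a converter might emit for one page.
sample = """\
ACME Bank Statement          Page 1 of 3
01/02/2024  Coffee, beans and filters     -1,234.50
01/03/2024  Salary                         2,000.00
"""
print(parse_raw_text(sample))
```

Using a CSV writer instead of joining fields with commas matters here: a description that itself contains a comma is quoted automatically, so Power Query still sees three columns.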
The PDF to Excel monster is too powerful to be placed into the hands of mere mortals. If you need this kind of raw power on your desktop, you’ll have to become a wizard who can cast the following spells:
- Regex: arcane regular expressions that enhance the power of the Raw Data Parser
- M: the formula language of Power Query
- Visual Basic: the language of the Raw Data Parser
In addition, your inventory must include the all-powerful PDF converter, the latest Excel software, Visual Studio and a Regular Expression cauldron.