If you have to deal with legacy XFA forms in your workflow, read on for a complex but scalable method to convert all of them into ordinary PDFs in one go.

In the beginning Adobe LiveCycle was created. This has made a lot of people very angry and been widely regarded as a bad move…except by courts. Sadly, many of these forms are still in the wild.

Please wait...if this image is not eventually replaced by the contents of your document, your PDF viewer may not be able to display this type of document.
The ugly message that people have seen for years when trying to open an Adobe LiveCycle PDF. Chrome and Firefox are starting to support the format natively.

LiveCycle forms can have a lot of interactive features. Pretty often, though, courts just used LiveCycle as a design tool, and printing the form on paper still leaves it usable. At Suffolk, we are building a Form Explorer tool that will process all of a state’s forms in one go. It’s important for us to be able to work with all of the various kinds of forms that a state has in a consistent way. The small but still sizeable number of XFA forms makes it much harder to compare forms to each other, and also to automate the form in Docassemble, which expects form fields to be in the more standard AcroForm format.

It wasn’t going to be enough to use a proprietary tool that we can’t redistribute, since our whole toolchain is something we want other people to be able to use for free.

So, I started looking into methods to convert XFA forms into standard PDFs that would work in batch and that could be fully automated. A different part of our workflow will process and add standard Acroform fields to those standard PDFs.

To get it out of the way: pdftocairo does not support rendering XFA forms into another visual representation. Neither do pdf2ps, qpdf, or pikepdf. While there are many proprietary tools and libraries that render XFA files, something we couldn’t freely share wasn’t going to work. The only open source options appeared to be Chrome and Firefox. After finding and discarding multiple closed source tools that would have been much easier to automate, I finally landed on this workflow:

  • Use Mozilla Firefox’s pdf.js to render the PDF in server mode
  • Use Chrome headless to visit the local pdf.js server and save the rendered page back to a PDF

This workflow came from a very helpful suggestion on the Mozilla Firefox Element chat.

Note that I use Ubuntu under the Windows Subsystem for Linux (WSL). Some of the steps may be simpler if you are using a native Ubuntu Linux machine.

  1. I ran git clone git@github.com:mozilla/pdf.js.git
  2. I had previously installed Node.js
  3. Inside the pdf.js directory, I ran npm install -g gulp-cli followed by npm install
  4. Then, to start up the server, I ran gulp server
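The steps above, collected as a script (a sketch; it assumes Node.js and npm are already installed):

```shell
# Clone pdf.js and set up its build tooling
git clone git@github.com:mozilla/pdf.js.git
cd pdf.js
npm install -g gulp-cli   # gulp command-line runner
npm install               # pdf.js dependencies

# Start the development server; the viewer then lives at
# http://localhost:8888/web/viewer.html
gulp server
```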

This worked great! I was able to visit http://localhost:8888/web/viewer.html and open and even save an XFA PDF from inside Chrome.

Next, I had to copy the XFA forms into a subdirectory of the pdf.js repository. For security reasons, the server can only access local files that are in the repository directory. I made a new directory, mkdir test_xfa, and placed the files in that directory. Then I was able to directly open a link to a PDF with a URL like this: http://localhost:8888/web/viewer.html?file=/test_xfa/XFA-PDF-Sample.pdf, where /test_xfa/ was a directory directly beneath the pdf.js directory.
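That step looks roughly like this (the source folder ~/forms is my own placeholder, not from the original workflow):

```shell
# Run from inside the pdf.js checkout: make a directory the server can reach
mkdir test_xfa
cp ~/forms/*.pdf test_xfa/

# Any file in test_xfa/ can now be opened in the viewer, e.g.:
# http://localhost:8888/web/viewer.html?file=/test_xfa/XFA-PDF-Sample.pdf
```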

My first attempt was to run google-chrome headless directly, but I kept getting plain white PDFs without any content. My guess was that this was caused by some kind of race condition, where Chrome tried to print the PDF before pdf.js had finished the work of rendering it. So I switched to a method that allowed me to add a delay.
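For reference, the failed first attempt looked something like the following; the exact flags are my reconstruction, not a command from the original post:

```shell
# Naive approach: print the viewer page straight from headless Chrome.
# This produced blank PDFs, presumably because Chrome printed the page
# before pdf.js had finished rendering the XFA content.
google-chrome --headless --disable-gpu \
  --print-to-pdf=output.pdf \
  'http://localhost:8888/web/viewer.html?file=/test_xfa/XFA-PDF-Sample.pdf'
```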

First, I installed google-chrome.
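One common way to install it on Ubuntu/WSL, where Chrome is not in the default apt repositories (these commands are a standard recipe, not from the original post):

```shell
# Download Google's .deb package and install it, pulling in dependencies
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
```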

This blog post pointed me to a method to print to PDF with a custom delay. It was simple enough.

I created a new directory in my home directory: mkdir test_render.

I installed html-pdf-chrome with npm by typing npm install --save html-pdf-chrome.

Next, I followed the directions on the html-pdf-chrome documentation page to download and configure pm2, then started up an instance of Chrome on port 9222:

# install pm2 globally
npm install -g pm2
# start Chrome and be sure to specify a port to use in the html-pdf-chrome options.
pm2 start google-chrome \
  --interpreter none \
  -- \
  --headless \
  --disable-gpu \
  --disable-translate \
  --disable-extensions \
  --disable-background-networking \
  --safebrowsing-disable-auto-update \
  --disable-sync \
  --metrics-recording-only \
  --disable-default-apps \
  --no-first-run \
  --mute-audio \
  --hide-scrollbars \
  --remote-debugging-port=9222

Next, I created my own print.js file with the following contents. I had to extend the timeout: 5000 ms was not enough. I also had to specify the port, as something (perhaps WSL) prevented node from dynamically creating a new instance of Chrome. On plain Ubuntu that might not be needed; but on the other hand, if you are batch processing PDFs you probably want to run Chrome in the background anyway.

const htmlPdf = require('html-pdf-chrome');

const options = {
    // Connect to the Chrome instance already running under pm2
    port: 9222,
    // Give pdf.js plenty of time to finish rendering the XFA form
    // before printing; 5000 ms was not enough
    completionTrigger: new htmlPdf.CompletionTrigger.Timer(50000),
    chromeFlags: ['--disable-web-security', '--headless']
};

// The pdf.js viewer URL for the form to convert
const url = 'http://localhost:8888/web/viewer.html?file=/test_xfa/XFA-PDF-Sample.pdf';
htmlPdf.create(url, options).then((pdf) => pdf.toFile('./output.pdf'));

Finally, I ran my print.js file by typing node print.js.

Hope this is helpful! It should be relatively straightforward to tweak these instructions to process a whole folder full of XFA forms.

Other thoughts: it would be great if Mozilla’s pdf.js library were extended so that the pdf.js-dist commandline version could handle XFA forms. You can render a PDF to an image, and one idea I had was a workflow of PDF=>Image=>PDF. But trying to use the image conversion examples in the commandline version with an XFA form just rendered the same unhelpful message about installing Adobe Reader.

