Friday, October 12, 2012

How to convert from Confluence to Sphinx

A collaboration between Frank and Jody has packaged up the uDig User Guide and Developers Guide into the ever fashionable "Sphinx" documentation system.

Sphinx is used by GeoMoose, GeoServer, GeoTools, MapServer, OSGeoLive and, well, everyone! Sphinx uses reStructuredText (RST) to capture documentation as text files. This allows us to manage the documentation alongside our source code when making a release.

Kudos to Frank for the excellent research in setting this up so we can use git to publish straight to github.

The same toolchain is used to publish the user guide into the online help included with the application.

With those improvements working out, the wiki has slowly started to fade in anticipation of shutting down, and has developed an unexpected "Moved to Github" link at the top of each page.

Hooking Sphinx up to the Java Maven / Ant Build Chain

Justin Deoliveira gets the credit for this approach, which is used by the GeoServer and GeoTools projects.
  1. Add a pom.xml build plugin compile target for the maven-antrun-plugin
  2. Set up a build.xml to run sphinx, taking care to check that it is available
The above build.xml is especially recommended, as out of the box Sphinx produces only a make.bat and a Makefile (which do little good in a Java tool chain).

Conversion from Confluence (Textile) to reStructuredText (RST)

Thanks to Paul for the initial conversion scripts; I was able to use them as a starting point when converting from the Confluence wiki textile format to the reStructuredText format used by Sphinx.

I ended up going with Pandoc, which converts one file at a time, wrapped in a Java BulkConvert script that calls Pandoc once per file, then cleans up the mess produced by Confluence, copies the images over, fixes some header levels and generally gives it a good go.

Usage: java html.BulkConvert [index.html] [rst directory]
Where:
  index.html Where you have unzipped the confluence html export
  rst directory location where you would like the html files saved
If not provided the application will prompt you for the above information
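The "call Pandoc once per file" part of the bulk conversion can be sketched roughly as follows. This is a minimal illustration and not the actual script: the class name PandocBatch and both method names are hypothetical, it assumes pandoc is on the PATH, and it leaves out the cleanup, image copying and header fixes mentioned above. The -f, -t and -o options are standard pandoc options for input format, output format and output file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the source of the real conversion script.
class PandocBatch {

    /** Builds the pandoc command line that converts one HTML page to RST. */
    static List<String> pandocCommand(Path html, Path rstDir) {
        // "uDig Overview.html" becomes "uDig Overview.rst" in the target directory
        String name = html.getFileName().toString().replaceFirst("\\.html?$", "");
        Path target = rstDir.resolve(name + ".rst");
        return Arrays.asList("pandoc", "-f", "html", "-t", "rst",
                "-o", target.toString(), html.toString());
    }

    /** Runs pandoc once for every HTML file that sits next to index.html. */
    static void convertAll(Path index, Path rstDir)
            throws IOException, InterruptedException {
        Files.createDirectories(rstDir);
        List<Path> pages;
        try (Stream<Path> files = Files.list(index.toAbsolutePath().getParent())) {
            pages = files.filter(p -> p.toString().endsWith(".html"))
                         .collect(Collectors.toList());
        }
        for (Path page : pages) {
            Process process = new ProcessBuilder(pandocCommand(page, rstDir))
                    .inheritIO()   // let pandoc warnings show up in our console
                    .start();
            if (process.waitFor() != 0) {
                System.err.println("pandoc failed for " + page);
            }
        }
    }
}
```

Keeping the command construction separate from the process launch makes it easy to check the generated command line without having pandoc installed.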

If any other project is considering making the change, the source code is here.

Conversion of Open Office (ODT) to reStructuredText (RST)

There are also scripts covering conversion of Open Office documentation to RST. The odt2sphinx script does a fairly good job, but cannot handle image references. Breaking the link in Open Office and then converting produces some very amusing image names, resulting in a Java ImageRename script:
Usage: java html.ImageRename [file.rst] [rename.properties]
Where:
  file.rst Used to locate your odt2sphinx files
  rename.properties used to rename files in your images folder
If not provided the application will prompt you for the above information
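The renaming idea can be sketched as below. This is an assumption-laden illustration, not the real ImageRename: the class and method names are hypothetical, and it assumes rename.properties maps each generated image name to a friendlier one (old=new), which the usage text above does not spell out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch; the property file layout (old=new) is an assumption.
class ImageRenameSketch {

    /** Replaces every old image name found in the RST text with its new name. */
    static String renameReferences(String rst, Properties renames) {
        String result = rst;
        for (String oldName : renames.stringPropertyNames()) {
            // Plain text replacement; assumes no new name contains another old name
            result = result.replace(oldName, renames.getProperty(oldName));
        }
        return result;
    }

    /** Renames the matching files inside the images folder. */
    static void renameFiles(Path imagesDir, Properties renames) throws IOException {
        for (String oldName : renames.stringPropertyNames()) {
            Path source = imagesDir.resolve(oldName);
            if (Files.exists(source)) {
                Files.move(source, imagesDir.resolve(renames.getProperty(oldName)));
            }
        }
    }
}
```

Updating the references and renaming the files from the same Properties object keeps the two in step.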

Checking Eclipse Help TOC.XML files

The final bit of quality assurance is enforcing a "no page left behind" policy. Set up as a normal JUnit test case, we rely on TocCheck.java to throw an error if the toc.xml file misses a page or contains a link to a page that no longer exists.
<topic href="EN/uDig Overview.html" label="uDig Overview">
</topic>

As an added bonus it will send the XML fragment (such as the above) required to fix the problem to standard out (for a quick cut and paste fix).
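The comparison behind such a check can be sketched as follows. This is not the actual TocCheck.java: the class and method names are hypothetical, and the regular expression is a simplification that only looks at href attributes on topic elements. In a real test the page set would come from listing the generated HTML files and the result would be asserted from a JUnit test method.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a "no page left behind" check.
class TocCheckSketch {

    // Captures the href of each <topic> element, ignoring any #anchor suffix
    private static final Pattern HREF =
            Pattern.compile("<topic\\s+[^>]*href=\"([^\"#]+)");

    /** Collects every page that toc.xml links to. */
    static Set<String> tocLinks(String tocXml) {
        Set<String> links = new TreeSet<>();
        Matcher matcher = HREF.matcher(tocXml);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    /** Pages that exist but are never mentioned in toc.xml. */
    static Set<String> missingFromToc(Set<String> pages, Set<String> links) {
        Set<String> missing = new TreeSet<>(pages);
        missing.removeAll(links);
        return missing;
    }

    /** Links in toc.xml that point at pages that no longer exist. */
    static Set<String> brokenLinks(Set<String> pages, Set<String> links) {
        Set<String> broken = new TreeSet<>(links);
        broken.removeAll(pages);
        return broken;
    }

    /** Builds the XML fragment needed to add a missing page to toc.xml. */
    static String topicFragment(String page) {
        String label = page.substring(page.lastIndexOf('/') + 1)
                           .replaceFirst("\\.html$", "");
        return "<topic href=\"" + page + "\" label=\"" + label + "\">\n</topic>";
    }
}
```

A test would fail when either set is non-empty, printing topicFragment for each missing page so the fix can be pasted straight into toc.xml.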
