10 June 2015

Self-publishing with pandoc, etc

Depending on your thinking, I'm either just done with or approaching the end of a sabbatical. ("Just done with" if you think that once commencement occurs, it's just a regular summer.) Among the things I produced in the past few months is a very short "note" on a topic that doesn't fall within my usual area of research. I sent it to a couple of OA journals, but neither wanted to publish it as is. I'm not interested in doing more with it at this time, but it seems silly to have it just sit on my hard drive doing nothing. It's the kind of thing I'd do as a conference paper, if I went to a conference at which I think it'd be welcome. But since I don't go to such conferences, I figure I'll just put it out there for people to check out anyway. (The advantages of tenure and the internet!)

I could do it as a blog post, though it's already written in a more "academic" style than I use for this blog. Instead I'm going to post it as html on github and as a pdf on my account at figshare, where it's easily accessible, archived, and even gets a DOI. I'll also link to it from my academia.edu page (as well as here, obviously).

The Workflow

I've started using markdown with pandoc to generate documents. I was inspired by Dennis Tenen and Grant Wythoff's post last year, "Sustainable Authorship in Plain Text using Pandoc and Markdown," but I've long been a fan of avoiding proprietary formats that are likely to become obsolete (no doubt in part because I work with very old texts and materials professionally). It's easy enough to do simple stuff this way, but getting to more complex documents requires some work. Here's a list of stuff I do/use:

  • For editing my markdown documents, I use the free MacDown, which gives a nice split screen, showing the raw markdown on the left and the interpreted version on the right. There are a number of pandoc "enhancements" to markdown that MacDown can't handle, but it gets the vast majority of the formatting right and it prevents me from making stupid mistakes in that majority.
  • I keep all my bibliography in Zotero. I export it all as a bibtex file using Better BibTeX, which provides some nice customization of the export entries. Once this bibtex file is created, I can easily cite the works in it within markdown and then let pandoc-citeproc expand them as appropriate.
  • I've given up on using pandoc to produce final versions of the same file in different formats. I'm mostly interested in html, OpenDoc, pdf and, sadly, Word. There are just too many complications in academic documents (footnotes, etc) and my skills and time are limited. LaTeX PDFs have a certain look to them, but anything I can print, OS X can turn into a PDF, so that's not a big deal for me. In most other cases, I don't need both html and odt/docx versions, so I can skip it there too. (I was really hoping that I could generate my CV in html and PDF directly, since I've been maintaining html and odt versions, the latter of which I turned into PDF via OS X, but I've yet to get enough into LaTeX to be able to reproduce my mildly complicated CV format.)
  • For html, I use a few different versions of my standard css file. It's all up on github, so you can see what I've done. I just discovered rawgit.com, so now I'm converting my html files to refer to the css files there, instead of on the institutional server that I've been using (and which has recently become a bit more difficult to keep updated from home). FYI, I usually edit css files manually with BBEdit, but I'm trying out CSSEdit, which seems to have been EOL'ed.
  • On the pandoc side, I've tweaked the default templates for html, odt and docx, so that they can handle multiple authors, as well as a license field in the header. Again, all on github.
  • To tie it together a bit, I wrote an applescript that takes the frontmost document in MacDown and runs it through pandoc, outputting whatever type of file you want (based on extension) and also allowing user-inputted pandoc switches.
The overall process then runs like this: write in MacDown, incorporating the citation keys from Zotero; process that with pandoc to generate the desired final file type; publish/share/whatever.
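
For anyone who wants to try the same last step without AppleScript, here is a rough sketch in Python of how the pandoc call might be put together. It is not the script mentioned above: the function name, the file names and the list of accepted extensions are all made up for illustration. It relies on pandoc inferring the output format from the extension given to `-o`, and it only builds the argument list; it doesn't run anything.

```python
from pathlib import Path
from typing import List, Optional, Sequence

# Assumption: the output types named in this post. Extend as needed.
KNOWN_EXTENSIONS = {".html", ".odt", ".docx", ".pdf"}


def build_pandoc_command(
    source: str,
    output: str,
    bibliography: Optional[str] = None,
    extra_switches: Sequence[str] = (),
) -> List[str]:
    """Return the argument list for one pandoc run (hypothetical helper).

    pandoc picks the output format from the extension of the -o file,
    so the extension is the only thing that selects the file type here.
    """
    out_path = Path(output)
    if out_path.suffix.lower() not in KNOWN_EXTENSIONS:
        raise ValueError(f"unexpected output extension: {out_path.suffix!r}")

    command = ["pandoc", str(source), "--standalone", "-o", str(out_path)]

    if bibliography is not None:
        # Matches the pandoc-citeproc setup described in this post; newer
        # pandoc releases replace the filter with a built-in --citeproc flag.
        command += ["--filter", "pandoc-citeproc", "--bibliography", str(bibliography)]

    # User-supplied switches go last, untouched.
    command += list(extra_switches)
    return command
```

To actually convert a file, the list can be handed to `subprocess.run(build_pandoc_command("note.md", "note.html", bibliography="library.bib"), check=True)`, assuming pandoc (and, for citations, pandoc-citeproc) is installed and on the PATH.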

It seems easy when I write it like that.

Numbering for citation

My one concern with html output of the note that I just wrote was that html has no default pagination, and pages are usually the way one cites an article. So instead of numbered pages, I decided to go with numbered paragraphs. (Read about Sebastian Heath's approach to articles he's editing for ISAW.) But how to number them, so that the numbers were visible (for easy citation) and so that I didn't have to manually put them in? With a little help from the pandoc Google group, I combined some features of pandoc with css. Since pandoc automatically gives IDs to headers and css allows for formatting those headers and even for auto-numbering them, I put a nearly empty level-6 header at the start of each paragraph in my markdown document and I used css to number them and put that number off in the margin. (They're nearly empty because markdown won't create empty headers.) Although the numbers are visible to a human reader, the IDs aren't ideal: section, section-1, section-2, and so on, but they are sequential and linkable. The headers are also a bit ugly in markdown, but they work and they also make it possible for me to indicate logical paragraphs instead of the actual ones. This is useful, for example, when there's a block quote, which technically creates a new paragraph, right in the middle of a logical paragraph.
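
To make the idea concrete, here is a small Python sketch of the two halves: a css fragment that numbers level-6 headers with a counter, and a helper that works out which ID a given paragraph number ends up with. The css is an illustration, not my actual stylesheet, and the names `NUMBERING_CSS`, `paragraph_anchor` and `paragraph_link` are invented for the example. The helper assumes that every marker header reduces to an empty identifier (so pandoc falls back to `section`) and that no other header in the document produces that ID.

```python
# Illustrative css only (an assumption, not the stylesheet used for the note):
# count every h6, show the count via generated content, and pull it into
# the left margin.
NUMBERING_CSS = """
body { counter-reset: para; }
h6 {
  counter-increment: para;
  float: left;
  margin: 0 0 0 -3em;
  font-weight: normal;
}
h6::before { content: counter(para); }
"""


def paragraph_anchor(number: int) -> str:
    """Return the auto-generated ID for the nth numbered paragraph.

    The first header gets 'section', the second 'section-1', and so on,
    so the visible number is always one more than the ID suffix.
    """
    if number < 1:
        raise ValueError("paragraph numbers start at 1")
    return "section" if number == 1 else f"section-{number - 1}"


def paragraph_link(base_url: str, number: int) -> str:
    """Build a link to a numbered paragraph of a page (hypothetical helper)."""
    return f"{base_url}#{paragraph_anchor(number)}"
```

So someone citing paragraph 5 would link to `#section-4`, which is the off-by-one that makes the IDs less than ideal.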

One more thing: since I'm using css to number the paragraphs in the html version, those numbers are technically part of the display of the article, not part of its content. So you can see them, but you can't find them if you search in your browser. That's not the case in the PDF; there the numbers are "real" and you can find them in a search.

The Article

So about the article itself...it has to do with the original nature of the Golden Calf in Exodus 32. I speculate that it was in origin a "corn calf," to be associated with a lost harvest ritual. Go have a read:
