Writing and Revision Control

I have been doing a lot of writing lately and became interested in automatic versioning, so that I could look back and see how my writing changes over time. I think it would be really interesting to see a visualization of a book being written from scratch. Normally you only see the end product; tracking changes over time would let others see the sausage being made. This could help teachers improve their students’ writing process, help writers analyze their craft, or show aspiring writers how books really get written.

Here’s a demo of what I envisioned, built from a recent blog post that I wrote using the following method.

The system uses git for version history, along with a Vim hook that commits the current file every time the buffer is written:

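" Typing :autocommit expands to a call that turns on per-write commits.
cabbr autocommit call Autocommit()
fun! Autocommit()
  " A named group cleared with au! keeps repeated calls from stacking hooks.
  augroup autocommit
    au!
    au BufWritePost * silent execute '!git add -- ' . shellescape(expand('<afile>'), 1)
          \ . ' && git commit -q -m "Generated commit" -- ' . shellescape(expand('<afile>'), 1)
  augroup END
endfu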

This is about the finest grain of editing that I can imagine being both useful and practical; anything finer and you are essentially recording the document as the cursor moves around. Commits are nearly instantaneous, and you can amend a commit afterward to explain a complicated change. Git branching also works well with this system, so you can keep multiple streams of writing going at once. If you’re working with other people, you might be writing a new chapter when feedback arrives on the previous one that you would like to incorporate. Simply create a branch from the point in time when you sent the document out, and you should see exactly what the reviewer saw. In addition, collaborators can use git’s push/pull functionality to share copies, which is probably better than emailing documents around. See this page on collaborative writing for more ideas.
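For the branching case, here is a minimal Python sketch of “branch from the time you sent the document out.” It assumes commit dates are good enough to identify what the reviewer received. `git rev-list -1 --before=<date> <ref>` prints the newest commit on that ref committed before the date, and `git branch <name> <commit>` creates the branch; the helper function names are my own, hypothetical ones.

```python
import subprocess
from typing import List

# Hypothetical helpers that build the git commands for "branch from a point in time".


def commit_before_cmd(when: str, ref: str = "HEAD") -> List[str]:
    """Command that prints the newest commit on `ref` committed before `when`."""
    return ["git", "rev-list", "-1", f"--before={when}", ref]


def branch_cmd(name: str, commit: str) -> List[str]:
    """Command that creates branch `name` pointing at `commit`."""
    return ["git", "branch", name, commit]


def branch_from_time(name: str, when: str, repo: str = ".") -> str:
    """Create branch `name` at the last commit before `when`; return that commit's hash."""
    result = subprocess.run(commit_before_cmd(when), cwd=repo, check=True,
                            capture_output=True, text=True)
    sha = result.stdout.strip()
    if not sha:
        raise ValueError(f"no commit found before {when!r}")
    subprocess.run(branch_cmd(name, sha), cwd=repo, check=True)
    return sha
```

Calling `branch_from_time("as-reviewed", "2009-10-01 17:00")` inside the writing repository and then checking out `as-reviewed` should show the text as it stood at that time. One caveat: `--before` compares committer dates, and amending a commit gives it a new committer date.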

As for the current visualization, I used a Ruby suite that I found called DocDiff. I think it is based on, or uses, wdiff, a difference engine focused on words. As I understand it, wdiff writes each word to its own line and then runs the standard GNU diff algorithm to detect changes. In any case, DocDiff was good enough for a rough visualization of the changes between commits, so I used it and hacked together the navigation with some additional scripting.
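To make the “one word per line” idea concrete, here is a small Python sketch of a word-level diff. It splits both versions on whitespace and compares the resulting word lists. Note that it uses Python’s `difflib.SequenceMatcher` rather than the GNU diff algorithm, so its alignments can differ from wdiff’s; the `[-deleted-]` and `{+inserted+}` markers mimic wdiff’s default output.

```python
import difflib
from typing import List


def word_diff(old: str, new: str) -> str:
    """Mark deleted words as [-...-] and inserted words as {+...+}, wdiff-style."""
    a, b = old.split(), new.split()  # treat each word as one "line"
    out: List[str] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

For example, `word_diff("the quick brown fox", "the slow brown fox")` returns `"the [-quick-] {+slow+} brown fox"`. Because the text is split on whitespace, line breaks and spacing changes are ignored, which suits prose.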

Improvements could include at least:

a more accurate diff function
showing diffs in the opposite direction when you go backwards in time
representing branching
advancing changes automatically instead of manually
showing sections moving smoothly in real-time
Doogie Howser typewriter noises :)
highlighting words with a different background color once they reach their final wording and position
showing commit information in a corner for context
showing the product over time based on the source (think PDFs with images)
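The “opposite direction” item is mostly bookkeeping: if the viewer keeps the list of committed snapshots, stepping backward just diffs the version on screen against the older one, so insertions and deletions swap roles. Here is a minimal Python sketch, assuming snapshots are plain strings and using difflib’s word matching rather than DocDiff; `changes` and `step` are hypothetical names.

```python
import difflib
from typing import List, Tuple

Change = Tuple[str, str]  # ("=", kept words), ("-", deleted words), or ("+", inserted words)


def changes(old: str, new: str) -> List[Change]:
    """Word-level changes that turn `old` into `new`."""
    a, b = old.split(), new.split()
    result: List[Change] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result.append(("=", " ".join(a[i1:i2])))
            continue
        if i2 > i1:  # "delete" or "replace" removes old words
            result.append(("-", " ".join(a[i1:i2])))
        if j2 > j1:  # "insert" or "replace" adds new words
            result.append(("+", " ".join(b[j1:j2])))
    return result


def step(snapshots: List[str], index: int, forward: bool = True) -> Tuple[int, List[Change]]:
    """Move one commit forward or backward from `index`.

    The diff always runs from the version on screen to the one shown next,
    so moving backward turns earlier insertions into deletions.
    """
    target = index + 1 if forward else index - 1
    if not 0 <= target < len(snapshots):
        raise IndexError("no commit in that direction")
    return target, changes(snapshots[index], snapshots[target])
```

With snapshots `["a b", "a b c"]`, stepping forward from 0 reports `c` as inserted, while stepping back from 1 reports the same `c` as deleted.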

For blog writing, I’ve been pretty happy sticking with HTML, although Markdown would probably be better. For longer works, I recently found Pandoc, a Haskell-based implementation of Markdown (and more) with fewer bugs than the standard Markdown interpreter. You get support for other file formats, conversion between them, and the ability to produce PDFs via LaTeX! LaTeX is nice for editing large works, but it can be cumbersome to read at times. Pandoc lets you use Markdown for most things and drop into LaTeX for things like equations. Markdown also seems to play well with versioning and with seeing changes over time.

Non-programmers

I pretty much agree with all of the points Derek Hammer brings up in Personal Source Control. On a Linux machine, I would contend that git is fairly painless to set up, although you still need to realize that you’re starting something worth tracking; it’s not automatic yet. Plus, there’s the learning curve of actually using the system, so it’s not fully in the background. As for his overall point, I definitely think that putting your local files and documents under version control is a great idea. If it becomes so easy that users don’t need to think about it, that will be a real advance.

I read a book a couple of months ago called About Face 3: The Principles of Interaction Design, in which the authors argue for moving to a more document-oriented user interface model. The book gives some interesting examples of why this is a good idea.

Pretend you have never used a computer before. Would it be intuitive to rename the word processing document you are currently editing by clicking “Save As…”, saving it under a different name, and then deleting the original? I think this makes little sense, because files are an implementation detail rather than how people actually think about their projects. I don’t really care whether the computer stores my document on a hard drive, in the cloud, or in a shoebox; I just want to be able to rename the document I have open.

Versioning of documents deserves the same analysis. Manually turning on change tracking and then being inundated with the visualization is a chore. And what happens when users forget to track changes? Complex merging is painful regardless of how diligent users are.

While I was looking for a system that would commit text files automatically and push them to a central repository when I was ready, I came across an article about flashbake, a git- and cron-based system for recording changes to writing. While I’m not all that interested in what song was playing when I committed, having contextual information might be helpful in some cases.
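A flashbake-style setup boils down to a script that cron runs periodically, which commits only when something has changed and puts some context into the message. The following is a minimal Python sketch of that idea, not flashbake’s actual code; the function names and the message format are my own assumptions, and `git add -A` stages every change in the repository.

```python
import subprocess
from datetime import datetime
from typing import Dict, Optional


def has_changes(porcelain: str) -> bool:
    """`git status --porcelain` prints nothing when the working tree is clean."""
    return bool(porcelain.strip())


def commit_message(now: datetime, context: Optional[Dict[str, str]] = None) -> str:
    """Timestamped message, followed by optional context lines (key: value)."""
    lines = [f"Auto-commit {now:%Y-%m-%d %H:%M}"]
    for key, value in sorted((context or {}).items()):
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


def autocommit(repo: str = ".", context: Optional[Dict[str, str]] = None) -> bool:
    """Commit all changes in `repo` if there are any; return True if a commit was made."""
    status = subprocess.run(["git", "status", "--porcelain"], cwd=repo, check=True,
                            capture_output=True, text=True).stdout
    if not has_changes(status):
        return False
    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-q", "-m", commit_message(datetime.now(), context)],
                   cwd=repo, check=True)
    return True
```

Assuming the code is saved as a hypothetical `autocommit.py` in the writing directory, a crontab entry such as `*/15 * * * * cd ~/writing && python3 -c 'from autocommit import autocommit; autocommit()'` would check for changes every fifteen minutes. Pushing to a central repository stays a separate, manual step.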

Google Wave has a pretty nice way of replaying waves (conversations) that resembles the revision history of a wiki. I could see using it for a personal knowledge management system when it comes out. It’s appealing because it should be accessible from many different platforms as long as you have an internet connection. One downside for me is that you are not in full control of the data (backups, privacy, security).

As for collaborative text editors, I’ve looked into Gobby for some work projects. It was the best one I saw: you can watch everyone in the session type in real time, each in a different color and with no locking, and it is cross-platform FOSS.

Other thoughts

Here is an article about using relative dates. This seems like a helpful concept. Instead of saying that I read About Face a couple of months ago, I could enter an approximate date and let software do the translation. That way, people who stumble upon this post in a couple of years will know that I’m probably not interested in talking about the book anymore. Many things on the internet are time-specific, so anything that states this clearly seems to be moving in the right direction. I can’t count how many times I’ve read something, thought “this doesn’t seem quite right,” and then looked for a date and realized it was horribly out of date. Storing machine-readable dates and displaying them relatively is very much a semantic web idea.
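Here is roughly what “let software do the translation” could look like: the post stores an absolute date, and the page computes the phrase whenever it is viewed. This is a minimal Python sketch; the cutoffs (a week, sixty days, two years) and the 30-day month and 365-day year are arbitrary approximations I chose.

```python
from datetime import date


def _ago(n: int, unit: str) -> str:
    """Format "n unit(s) ago" with simple pluralization."""
    return f"{n} {unit}{'' if n == 1 else 's'} ago"


def relative_date(then: date, today: date) -> str:
    """Approximate, human-friendly description of how long ago `then` was."""
    days = (today - then).days
    if days < 0:
        raise ValueError("date is in the future")
    if days == 0:
        return "today"
    if days == 1:
        return "yesterday"
    if days < 7:
        return _ago(days, "day")
    if days < 60:
        return _ago(days // 7, "week")
    if days < 730:
        return _ago(days // 30, "month")  # rough: 30-day months
    return _ago(days // 365, "year")  # rough: ignores leap days
```

A post dated August 1, 2009 would read “2 months ago” on October 1, 2009, and a reader arriving two years after that would see “2 years ago” instead, without anyone editing the post.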

I’d like to see more document histories on the web, with links that point to specific versions of documents. This would let programs cache the documents you link to and alert you when their contents change or newer versions appear, so that you can update your references. Instead of a fully broken link when a page is down, your web server would simply fall back to its cached copy. This would go a long way toward ensuring that useful pages are backed up in a distributed fashion. People trying to restore a lost site or link could run a script that knows which sites are backing up external sites and link to those caches, adding further redundancy. This reminds me of Linus Torvalds saying (hopefully tongue-in-cheek): “Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it.” There would obviously be logistical hurdles to overcome, such as someone manually changing their cache or versions getting out of sync.
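The versioned-link idea can be sketched with content hashes: record a hash of the page when you link to it, keep a copy keyed by that hash, and on each check report whether the live page still matches, has changed, or is unreachable (in which case the cached copy is served). This is a minimal in-memory Python sketch; `LinkCache`, `VersionedLink`, and the injected `fetch` function are hypothetical, and a real system would need persistent storage and actual HTTP fetching.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


def digest(content: bytes) -> str:
    """Content hash used as the version identifier."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class VersionedLink:
    url: str
    version: str  # hash of the page content when the link was made


class LinkCache:
    """Hypothetical cache that keeps every linked version, keyed by its hash."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]]):
        self.fetch = fetch  # returns None when the page is unreachable
        self.store: Dict[str, bytes] = {}

    def link(self, url: str) -> VersionedLink:
        """Link to the page as it is now and cache that exact version."""
        content = self.fetch(url)
        if content is None:
            raise LookupError(f"cannot link to unreachable page {url}")
        version = digest(content)
        self.store[version] = content
        return VersionedLink(url, version)

    def resolve(self, link: VersionedLink) -> Tuple[bytes, str]:
        """Return (content, status): 'current', 'changed' (alert), or 'cached' (fallback)."""
        live = self.fetch(link.url)
        if live is None:
            return self.store[link.version], "cached"
        if digest(live) != link.version:
            return live, "changed"
        return live, "current"
```

A “changed” status is the cue to alert the author to update the reference, while “cached” is the fallback that replaces a broken link.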

While we’re at it, let’s add transparency to government operations. Imagine if every change to every bill in the legislative branch, and every signing statement, had an associated author, time, and commit message explaining the rationale. Citizens could then see who really makes positive changes to bills and whom to hold responsible for the pork. Are you sure you want to release a hundred changes one minute before voting begins on the House floor? True “blame” and “praise” (in the SVN sense).
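The “blame” part is the same line-attribution idea that `svn blame` and `git blame` implement. Here is a minimal Python sketch, assuming each revision of a bill is available as full text with a single author: every line keeps the author of the revision that introduced it. Matching uses difflib, so results may differ from what svn or git would report; the function name and the sample bill text are made up.

```python
import difflib
from typing import List, Tuple


def blame(revisions: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Given (author, full text) revisions in order, return (author, line) for the final text."""
    lines: List[str] = []
    owners: List[str] = []
    for author, text in revisions:
        new_lines = text.splitlines()
        new_owners = [author] * len(new_lines)  # assume every line is new...
        matcher = difflib.SequenceMatcher(None, lines, new_lines, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):  # ...unless it survived from the previous revision
                new_owners[block.b + k] = owners[block.a + k]
        lines, owners = new_lines, new_owners
    return list(zip(owners, lines))
```

In this sketch, a line edited in a later revision is credited to whoever edited it last, which is also how `blame` tools typically report it.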

Anthony Panozzo's Blog

Ideas + Implementation
