JavaScript Based Code Prettification

It appears that the recent syntax highlighting enhancements to Google Code’s source browser are implemented with a slightly modified version of Mike Samuel’s JavaScript Code Prettifier: a 100% browser-side JavaScript/CSS syntax highlighting engine with a sufficiently simple implementation (~750 LOC / 37KB unpacked / 12KB gzipped), a liberal license, and decent language support. The README claims that C (and friends), Java, Python, Bash, SQL, HTML, XML, CSS, JavaScript, and Makefiles are highlighted well and Ruby, PHP, Awk, and Perl are handled passably. As luck would have it, I just went through the process of wiring this script up last week with a few interesting tweaks to get automatic highlighting on all my posts.

The prescribed usage was fairly simple: bring in the stylesheet and external script, add a “prettyprint” class to <pre> and/or <code> blocks, and call prettyPrint() on load/ready.

This setup has a few (easily remedied) drawbacks, however:

  1. Manually adding a “prettyprint” class is a bit of a pain with Markdown since there’s no way to add a class name to a code block.

  2. Manually adding a “prettyprint” class is a bit of a pain because I don’t feel like flipping through looking for every post I’ve ever written with a code block in it.

  3. Adding prettify.js to all pages means a 37K download even when no syntax highlighting is required.

  4. The stock stylesheet’s color palette is somewhat lacking.

Making it Automatic

The following wee bit of code is running on each page to automatically highlight all code blocks on the page. I used jQuery but this should be easily ported to other JavaScript libraries or standard DOM.

$(document).ready(function() {

    // add prettyprint class to all <pre><code></code></pre> blocks
    var prettify = false;
    $("pre code").parent().each(function() {
        $(this).addClass('prettyprint');
        prettify = true;
    });

    // if code blocks were found, bring in the prettifier ...
    if ( prettify ) {
        $.getScript("/js/prettify.js", function() { prettyPrint() });
    }

});

We run through the DOM after a page is fully loaded and look for <code> elements nestled within a <pre> (this is what Markdown puts out for code blocks). We add the “prettyprint” class to each of the <pre> parent elements and note that the prettifier is required. Lastly, we use jQuery’s getScript method to bring in the prettifier but only if we’ve detected that it’s required (jQuery.getScript is implemented by appending a <script> tag to <head>).
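
For the standard DOM route, something along these lines ought to do the same tagging step without jQuery. Treat it as a rough sketch rather than what actually runs on this site: markCodeBlocks is a made-up helper name, it only tags a <pre> that is the direct parent of a <code>, and the addEventListener wiring assumes a browser that supports it (older versions of IE would need attachEvent or window.onload instead).

```javascript
// Sketch only: markCodeBlocks is a hypothetical helper name.
// Adds the "prettyprint" class to every <pre> that directly wraps a <code>
// element and returns how many <pre><code> pairs were found.
function markCodeBlocks(doc) {
  var codes = doc.getElementsByTagName('code');
  var count = 0;
  for (var i = 0; i < codes.length; i++) {
    var parent = codes[i].parentNode;
    if (!parent || String(parent.nodeName).toLowerCase() !== 'pre') {
      continue; // inline <code>, not a Markdown code block
    }
    var classes = ' ' + (parent.className || '') + ' ';
    if (classes.indexOf(' prettyprint ') === -1) {
      parent.className = parent.className
        ? parent.className + ' prettyprint'
        : 'prettyprint';
    }
    count++;
  }
  return count;
}

// Browser wiring (skipped when there is no document, e.g. under Node).
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    if (markCodeBlocks(document) > 0) {
      // bring in /js/prettify.js here, then call prettyPrint()
    }
  });
}
```

Passing the document in as an argument is only there to make the function easy to poke at outside a browser; in a page you would just call markCodeBlocks(document).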

More Prettier

As a finishing touch, we’ll add a bit of flavor to the highlighting color palette. There are 11 syntactical elements that can be styled. The meaning of each should be fairly obvious from the abbreviated class name. Here’s what I’m rolling with, at the moment:

.str { color:#181; font-style:italic }
.kwd { color:#369 }
.com { color:#666 }
.typ { color:#c40 }
.lit { color:#900 }
.pun { color:#000; font-weight:bold  }
.pln { color:#333 }
.tag { color:#369; font-weight:bold  }
.atn { color:#939; font-weight:bold  }
.atv { color:#181 }
.dec { color:#606 }

Examples

And without further ado, here are a few fibs to get a feel for how well the highlighting works with different languages:

Python:

import sys

# calculate the nth fibonacci number
def fib(n):
  if n < 2:
    return 1
  else:
    return fib(n-2) + fib(n-1)

for i in range(int(sys.argv[1])):
  print "the %dth fibonacci number is %d" % (i, fib(i)) 

Ruby:

# calculate the nth fibonacci number
def fib(n)
  if n < 2
    1
  else
    fib(n-2) + fib(n-1)
  end
end

ARGV[0].to_i.times do |i|
  puts "the #{i}th fibonacci number is #{fib(i)}"
end

PHP:

<?php
  // calculate the nth fibonacci number
  function fib($n)
  {
      if( $n < 2 )
          return 1;
      else
          return fib($n-2) + fib($n-1);
  }

  for( $i=0; $i < 36; $i++)
  {
       printf("the %d'th fibonacci number is %d\n", $i, fib($i));
  }
?>

HTML and JavaScript:

<!DOCTYPE html>
<html>
  <head>
    <title>Fibonacci</title>
    <script type='text/javascript'>
      /* calculate the nth fibonacci number */
      function fib(n) {
          if ( n < 2 )
              return 1;
          else
              return fib(n-2) + fib(n-1);
      }
      
      $(document).ready(function() {
        for(var i = 0; i < 36; i++)
          $("pre#output").append("the " + i + "th fibonacci number is " + 
            fib(i) + "\n");
      });
    </script>
  </head>
  <body>
    <pre id='output'></pre>
  </body>
</html> 

Perl:

use strict;
use warnings;

for ( 0..36 ) {
    print "the " . $_ . "th fibonacci number is " . fib($_) . "\n";
}

# calculate the nth fibonacci number
sub fib {
  my $n = shift || 0;
  return 1 if ( $n < 2 );
  return fib( $n-2 ) + fib( $n-1 );
} 

C:

#include <stdio.h>
#include <stdlib.h>  /* atoi */

/* calculate nth fibonacci number */
int fib(int n) {
    if (n < 2)
        return 1;
    return fib(n - 2) + fib(n - 1);
}

int main(int argc, char ** argv) {
    int i = 0;
    for (i = 0; i < atoi(argv[1]); i++)
        printf("the %dth fibonacci number is %d\n", i, fib(i));
    return 0;
}

Bourne Shell:

#!/bin/sh

# calculate the nth fibonacci number
fib() {
  if [ "$1" -lt 2 ]; then
    echo 1
    return 0
  else
    expr $(fib $(expr "$1" - 2)) + $(fib $(expr "$1" - 1))
    return 0
  fi
}

for i in $(seq 1 $1)
do
  printf "the %dth fibonacci number is %d\n" $i $(fib $i)
done

Not too shabby under Minefield - how’s it look in your browser?


Errata: jQuery.getScript Is Cache-Breaking

When loading external scripts dynamically with jQuery’s getScript, a cache-busting parameter is appended to the request URL. So, instead of writing something like <script src='/js/foo.js'>, it writes something like <script src='/js/foo.js?_=ts2477874287'>, causing the script to be pulled anew each time. I assume this is so that calling getScript repeatedly on a single page causes the script to be executed multiple times.

Boo. In reducing the amount of data fetched for pages that don’t require syntax highlighting, we inadvertently increased the amount of data fetched across multiple pages with highlighting.
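
To make the effect concrete, here is a tiny illustration of why a changing query string defeats the browser cache. This is not jQuery's source: withCacheBuster is a hypothetical helper, and the stamp values are arbitrary stand-ins for whatever changing token gets appended.

```javascript
// Illustration only, NOT jQuery's actual code. withCacheBuster is a
// hypothetical helper that mimics the effect described above.
function withCacheBuster(url, stamp) {
  var sep = url.indexOf('?') === -1 ? '?' : '&';
  return url + sep + '_=' + stamp;
}

// Two page views, two different stamps (arbitrary example values).
var first = withCacheBuster('/js/prettify.js', 1000);
var second = withCacheBuster('/js/prettify.js', 2000);

// The browser caches by full URL, so these count as two separate
// resources and the script is fetched again on the second page.
var sameResource = (first === second); // false
```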

Under the hood, the getScript method calls jQuery.ajax with a dataType of 'script', and jQuery turns caching off for script requests unless told otherwise. We can turn caching back on by replacing the call to getScript with the following:

$.ajax({
  type: 'GET',
  url: '/js/prettify.js',
  cache: true,
  success: function() { prettyPrint() },
  dataType: 'script',
  data: null
});

It might actually be easier to bypass jQuery altogether here and just add in the <script> element using basic DOM stuff.
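
A minimal sketch of that basic DOM approach might look like the following. loadScript is a hypothetical helper (it is not part of jQuery or prettify.js), and the onreadystatechange handling is an assumption aimed at older versions of IE that do not fire onload on script elements.

```javascript
// Sketch only: loadScript is a hypothetical helper name.
// Appends a <script> element to <head> and calls onLoad once it has loaded.
function loadScript(doc, src, onLoad) {
  var script = doc.createElement('script');
  script.type = 'text/javascript';
  script.src = src; // used as-is, so no cache-busting parameter is added

  var done = false;
  // onload covers most browsers; onreadystatechange covers older IE.
  script.onload = script.onreadystatechange = function () {
    var state = script.readyState;
    if (done || (state && state !== 'loaded' && state !== 'complete')) {
      return;
    }
    done = true;
    script.onload = script.onreadystatechange = null;
    if (onLoad) {
      onLoad();
    }
  };

  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}

// In the page, in place of the $.ajax call:
//   loadScript(document, '/js/prettify.js', function () { prettyPrint(); });
```

Since the URL never changes, the browser is free to serve prettify.js from its cache on later pages.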