It appears that the recent syntax highlighting enhancements to Google Code’s source browser are implemented with a slightly modified version of Mike Samuel’s JavaScript Code Prettifier: a 100% browser-side JavaScript/CSS syntax highlighting engine with a sufficiently simple implementation (~750 LOC / 37KB unpacked / 12KB gzipped), a liberal license, and decent language support. The README claims that C (and friends), Java, Python, Bash, SQL, HTML, XML, CSS, JavaScript, and Makefiles are highlighted well and Ruby, PHP, Awk, and Perl are handled passably. As luck would have it, I just went through the process of wiring this script up last week with a few interesting tweaks to get automatic highlighting on all my posts.

The prescribed usage was fairly simple: bring in the stylesheet and external script, add a “prettyprint” class to <pre> and/or <code> blocks, and call prettyPrint() on load/ready.

This setup has a few (easily remedied) drawbacks, however:

  1. Manually adding a “prettyprint” class is a bit of a pain with Markdown since there’s no way to add a class name to a code block.

  2. Manually adding a “prettyprint” class is a bit of a pain because I don’t feel like flipping through looking for every post I’ve ever written with a code block in it.

  3. Adding prettify.js to all pages means a 37K download even when no syntax highlighting is required.

  4. The stock stylesheet’s color palette is somewhat lacking.

Making it Automatic

The following wee bit of code is running on each page to automatically highlight all code blocks on the page. I used jQuery but this should be easily ported to other JavaScript libraries or standard DOM.

$(document).ready(function() {

    // add prettyprint class to all <pre><code></code></pre> blocks
    var prettify = false;
    $("pre code").parent().each(function() {
        $(this).addClass('prettyprint');
        prettify = true;
    });

    // if code blocks were found, bring in the prettifier ...
    if ( prettify ) {
        $.getScript("/js/prettify.js", function() { prettyPrint() });
    }

});

We run through the DOM after a page is fully loaded and look for <code> elements nestled within a <pre> (this is what Markdown puts out for code blocks). We add the “prettyprint” class to each of the <pre> parent elements and note that the prettifier is required. Lastly, we use jQuery’s getScript method to bring in the prettifier but only if we’ve detected that it’s required (jQuery.getScript is implemented by appending a <script> tag to <head>).
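For anyone not using jQuery, here is a rough sketch of the marking step in standard DOM. The function name markCodeBlocks and its doc parameter are made up for this example (doc is just the page's document, passed in so the function is easy to exercise on its own). One difference from the jQuery version: this sketch only marks a <pre> that is the direct parent of a <code>, which is the shape Markdown produces.

```javascript
// Standard-DOM sketch of the jQuery snippet above (names are hypothetical).
// Adds the "prettyprint" class to every <pre> that directly wraps a <code>
// and returns how many such <code> elements were found.
function markCodeBlocks(doc) {
  var codes = doc.getElementsByTagName("code");
  var marked = 0;
  for (var i = 0; i < codes.length; i++) {
    var parent = codes[i].parentNode;
    // only handle <pre><code>...</code></pre>, as Markdown emits it
    if (!parent || parent.nodeName.toLowerCase() !== "pre") {
      continue;
    }
    // avoid adding the class twice if it is already present
    var classes = " " + (parent.className || "") + " ";
    if (classes.indexOf(" prettyprint ") === -1) {
      parent.className = parent.className
        ? parent.className + " prettyprint"
        : "prettyprint";
    }
    marked++;
  }
  return marked;
}
```

Call markCodeBlocks(document) once the DOM is ready, and fetch prettify.js only when it returns something greater than zero.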

More Prettier

As a finishing touch, we’ll add a bit of flavor to the highlighting color palette. There are 11 syntactical elements that can be styled. The meaning of each should be fairly obvious from the abbreviated class name. Here’s what I’m rolling with, at the moment:

.str { color:#181; font-style:italic }
.kwd { color:#369 }
.com { color:#666 }
.typ { color:#c40 }
.lit { color:#900 }
.pun { color:#000; font-weight:bold  }
.pln { color:#333 }
.tag { color:#369; font-weight:bold  }
.atn { color:#939; font-weight:bold  }
.atv { color:#181 }
.dec { color:#606 }

Examples

And without further ado, here’s a few fibs to get a feel for how well the highlighting works with different languages:

Python:

import sys

# calculate the nth fibonacci number
def fib(n):
  if n < 2:
    return 1
  else:
    return fib(n-2) + fib(n-1)

for i in range(int(sys.argv[1])):
  print("the %dth fibonacci number is %d" % (i, fib(i)))

Ruby:

# calculate the nth fibonacci number
def fib(n)
  if n < 2
    1
  else
    fib(n-2) + fib(n-1)
  end
end

ARGV[0].to_i.times do |i|
  puts "the #{i}th fibonacci number is #{fib(i)}"
end

PHP:

<?php
  // calculate the nth fibonacci number
  function fib($n)
  {
      if( $n < 2 )
          return 1;
      else
          return fib($n-2) + fib($n-1);
  }

  for( $i=0; $i < 36; $i++)
  {
       printf("the %d'th fibonacci number is %d\n", $i, fib($i));
  }
?>

HTML and JavaScript:

<!DOCTYPE html>
<html>
  <head>
    <title>Fibonacci</title>
    <script type='text/javascript'>
      /* calculate the nth fibonacci number */
      function fib(n) {
          if ( n < 2 )
              return 1;
          else
              return fib(n-2) + fib(n-1);
      }

      $(document).ready(function() {
        for(var i = 0; i < 36; i++)
          $("pre#output").append("the " + i + "th fibonacci number is " +
            fib(i) + "\n");
      });
    </script>
  </head>
  <body>
    <pre id='output'></pre>
  </body>
</html>

Perl:

use strict;
use warnings;

for ( 0..36 ) {
    print "the " . $_ . "th fibonacci number is " . fib($_) . "\n";
}

# calculate the nth fibonacci number
sub fib {
  my $n = shift || 0;
  return 1 if ( $n < 2 );
  return fib( $n-2 ) + fib( $n-1 );
}

C:

#include <stdio.h>
#include <stdlib.h>  /* atoi */

/* calculate nth fibonacci number */
int fib(int n) {
    if (n < 2)
        return 1;
    return fib(n - 2) + fib(n - 1);
}

int main(int argc, char ** argv) {
    int i = 0;
    for (i = 0; i < atoi(argv[1]); i++)
        printf("the %dth fibonacci number is %d\n", i, fib(i));
    return 0;
}

Bourne Shell:

#!/bin/sh

# calculate the nth fibonacci number
fib() {
  if [ "$1" -lt 2 ]; then
    echo 1
    return 0
  else
    expr $(fib $(expr "$1" - 2)) + $(fib $(expr "$1" - 1))
    return 0
  fi
}

for i in $(seq 1 $1)
do
  printf "the %dth fibonacci number is %d\n" $i $(fib $i)
done

Not too shabby under Minefield - how’s it look in your browser?


Errata: jQuery.getScript Is Cache-Breaking

When loading external scripts dynamically with jQuery’s getScript, a cache busting parameter is added to the request URL. So, instead of writing something like <script src='/js/foo.js'>, it writes something like <script src='/js/foo.js?_=ts2477874287'>, causing the script to be pulled anew each time. I assume this is so that calling getScript repeatedly on a single page causes the script to be executed multiple times.

Boo. In reducing the amount of data fetched for pages that don’t require syntax highlighting, we inadvertently increased the amount of data fetched across multiple pages with highlighting.
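To see why the extra parameter hurts, here is a rough illustration. This is not jQuery's source; cacheBust is a made-up helper and the timestamps are arbitrary. The only point is that two requests for the same file end up with different URLs, and the browser treats different URLs as different resources.

```javascript
// Illustration only, NOT jQuery's actual code: append a changing "_"
// parameter to a URL, the way a cache-busting loader would.
function cacheBust(url, stamp) {
  var sep = url.indexOf("?") === -1 ? "?" : "&";
  return url + sep + "_=" + stamp;
}

// Two page views a second apart (the timestamps are arbitrary examples).
var first = cacheBust("/js/prettify.js", 1205848980000);
var second = cacheBust("/js/prettify.js", 1205848981000);

// The two strings differ, so the copy cached for `first` can't be
// reused when `second` is requested.
var sameUrl = (first === second); // false
```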

Under the hood, the getScript method calls jQuery.ajax, explicitly setting cache to false. We can turn caching back on by replacing the call to getScript with the following:

$.ajax({
  type: 'GET',
  url: '/js/prettify.js',
  cache: true,
  success: function() { prettyPrint() },
  dataType: 'script',
  data: null
});

It might actually be easier to bypass jQuery altogether here and just add in the <script> element using basic DOM stuff.
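A minimal sketch of that DOM-only approach might look like the following. The loadScript name and its parameters are made up for this example, and it assumes the browser fires a load event on <script> elements (older versions of Internet Explorer reportedly needed onreadystatechange instead). Since the URL is used as-is, with no extra parameter tacked on, normal browser caching applies.

```javascript
// DOM-only script loader sketch (hypothetical helper, not part of jQuery
// or prettify.js). Appends a <script> to <head> and runs `callback`
// once the script has loaded.
function loadScript(doc, src, callback) {
  var script = doc.createElement("script");
  script.type = "text/javascript";
  script.src = src; // used unchanged, so the cached copy can be reused
  script.onload = function () {
    script.onload = null; // make sure the callback only runs once
    if (callback) {
      callback();
    }
  };
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}

// In the page, in place of the $.ajax call:
// loadScript(document, "/js/prettify.js", function () { prettyPrint(); });
```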

This entry has been tagged examples, coding, html, css, javascript, howto.

Discuss

  1. All of your highlighting looks pretty good to me.

    I’ve used the Google highlighter a few times, but none of the JS-driven highlighters work in Google Reader. The only source code highlighting that seems to make it through the Google Reader filter is color=”#rrggbb” on a span tag. Annoying.

    Mike on Tuesday, March 18, 2008 at 07:23 AM #

  2. I actually use the same library on my blog.

    I wrote a small patch for it to (dynamically) add line numbers in a separate <div/> on the left side of the code block. That way you can have line numbers, but copy&paste still works ok. You can find the patch here.

    Martin Probst on Tuesday, March 18, 2008 at 11:17 AM #

  3. Actually I didn’t know about this library. I needed something like this for my blog (tonylog.altervista.org). Thank you, that’s neat.

    Antonio on Tuesday, March 18, 2008 at 11:53 AM #

  4. I use the same javascript script on one of my sites, but I took a few shortcuts in setting it up:

    1. I patched Markdown to tag its pre/code blocks (I already had modified it to fix a few other bugs)
    2. I decided that loading the script on every page was fine, as most readers hit my front page, and the front page will have code snips on it regularly. Once it’s in the cache, it’s there (at no additional cost per subsequent view on any other page).

    Bruce on Tuesday, March 18, 2008 at 09:33 PM #

  5. The code highlighting is great, but I really recommend you try a different algorithm for writing the first 36 Fibonacci numbers!

    Yossi on Wednesday, March 19, 2008 at 04:46 AM #

  6. Cool markup. I was gonna say “bland style” but actually it’s growing on me: reminds me of my old Amiga color scheme. Why not just include the google script in the page statically (with good caching natch’), and use Brandon Aaron’s “livequery” for JQuery: http://brandonaaron.net/docs/livequery/ That thing’ll cause scripts to run after any JQuery DOM manipulations.

    Bill Burcham on Wednesday, March 19, 2008 at 09:08 AM #

  7. Manually adding a “prettyprint” class is a bit of a pain because I don’t feel like flipping through looking for every post I’ve ever written with a code block in it.

    This is exactly the reason why I’m reluctant to move from my current quirky setup to something more conventional. What I do now is keep the entire archive in a single file (an Atom feed) that I edit directly to write or amend a post. As my typography and markup style evolves, I can therefore bring the entirety of my archives up to current standard quite easily, even with a mere global search and replace for the simpler changes.

    I can’t give up that sort of power and convenience.

    Aristotle Pagaltzis on Wednesday, March 19, 2008 at 11:14 PM #

  8. i think that including the script tag without the conditional logic is the most efficient way of doing it. get it once, cache it forever. so the logic becomes an unnecessary step.

    fernando trasviña on Friday, March 21, 2008 at 05:04 PM #

  9. You’ll probably find that the jQuery cache breaker is there to fix an annoying bug in IE6: gzipped JS files are never re-evaluated from the cache when loaded on a second page. The file is accessed from the local cache by IE, but it stores the Gzipped version and just decides that the JS file is gibberish, without letting the user / developer know.

    Spent an hour or two debugging really weird failure situations with that issue. You can skip it if you’re sure that your webserver won’t gzip JS files.

    Brad Shuttleworth on Saturday, March 22, 2008 at 08:11 PM #

  10. FWIW the syntax highlighting doesn’t render in my RSS Reader (RSSOwl). Otherwise looks great.

    toby on Tuesday, March 25, 2008 at 05:53 PM #

  11. I have an interesting spin on the colors for this, sort of a reverse idea going on with a dark background and light text colors.

    Check it out here: http://www.kevinleary.net/blog/2008/03/12/safe-mailtos-with-jquery/

    Kevin on Tuesday, April 01, 2008 at 09:37 PM #

  12. Kevin: perdy. Looks almost like TextMate’s VividChalk color scheme. I’ve been trying to get that highlighting scheme in gvim for some time now.

    Ryan Tomayko on Wednesday, April 02, 2008 at 08:00 AM #

  13. Ryan, take a look at highlight.js. It is also not very big (15KB, not including language definitions), uses a common convention for marking up code blocks (<pre><code>...</code></pre>), and lets you specify a language if auto-detection fails (which doesn’t happen very often).

    Ivan Sagalaev on Friday, April 04, 2008 at 07:14 AM #
