I've never been very happy with BlueCloth, Ruby's de facto Markdown library. It was well-developed throughout 2005, reached a fairly complete 1.0 release that year, and then... just... stopped. There hasn't been so much as a maintenance release since 2005 -- and that's certainly not due to a lack of bugs and feature requests.

BlueCloth is slow. Really slow. Gruber's Markdown.pl (which was never designed for speed, if I remember correctly) can process the basic syntax test document a full three times in the same amount of time it takes BlueCloth to process it once.

BlueCloth is also broken:

$ echo "Oh _is_ it?" | Markdown.pl
<p>Oh <em>is</em> it?</p>

$ ruby -rbluecloth -e "puts BlueCloth.new('Oh _is_ it?').to_html"
<p>Oh _is_ it?</p>

BlueCloth is broken, slow, and unmaintained. What to do?

Mislav Marohnić recently created a Git clone of the BlueCloth subversion repository to fix bugs. That's a good start. There's also Maruku, another pure-Ruby implementation that's a bit faster and includes a variety of interesting extensions to the core Markdown grammar.

Here's another idea:

class BabyShitGreenCloth
  def initialize(text)
    @text = text
  end
  def to_html
    open("|perl Markdown.pl", 'r+') do |io|
      io.write(@text)
      io.close_write
      io.read
    end
  end
end
BlueCloth = BabyShitGreenCloth

You laugh!? Don't. It would be funny if it were actually an inferior implementation compared to BlueCloth. It isn't.

Shrug. Just sayin...

Announcing Two New Fast Markdown Libraries for Ruby

(Three if you include the pipe-to-perl implementation above.)

I have two experimental but fairly solid Ruby extension libraries: one that wraps John MacFarlane's peg-markdown and one that wraps David Loren Parsons's Discount. Both are complete implementations of core Markdown plus SmartyPants in C.

Why two? Well, there are some pretty big differences between implementations:

  • Discount has a BSD-style license; peg-markdown is GPL. The Ruby extensions adopt the license of their parent work.

  • peg-markdown uses a PEG-based grammar definition and a parser generator called leg. That's just fucking cool. It stimulates both the CompSci weenie and pirate areas of my brain simultaneously. Also, this should -- theoretically, of course -- make peg-markdown easier to maintain and extend and guarantee a high level of correctness, assuming the grammar is defined properly.

  • Discount is thread-safe, has good memory management, and includes a stable set of functions geared toward library use. peg-markdown has none of that, but the author is entertaining suggestions (in the form of patches).

  • Discount is quite a bit faster than peg-markdown (~8x in my tests), although either will blow the doors off BlueCloth (or Markdown.pl for that matter) in raw performance.

Bottom line: Discount makes for a better Ruby extension presently but peg-markdown has legs (hardy har har).

Installing, Using, Hacking

UPDATE: The Discount class has been renamed to RDiscount. The gem has also been renamed from discount to rdiscount. It is recommended that you uninstall the original discount gem.

Git clones are available on GitHub for monitoring, hacking, and browsing the source / documentation files: rdiscount and rpeg-markdown.

Gems have been released to RubyForge. Install as usual:

$ sudo gem install rdiscount
$ sudo gem install rpeg-markdown

(If you have a spare moment, please consider installing either or both of these and note any compilation errors along with your platform in the comments.)

Both extensions implement the basic protocol popularized by RedCloth and adopted by BlueCloth:

require 'rdiscount'
markdown = RDiscount.new("Hello World!")
puts markdown.to_html

For rpeg-markdown:

require 'peg_markdown'
markdown = PEGMarkdown.new("Hello World!")
puts markdown.to_html

In addition, both libraries set the top-level Markdown constant (when it is not already defined) to their implementation class, making it possible to write code that doesn't care which Markdown implementation is in use:

markdown = Markdown.new("Hello World!")
puts markdown.to_html
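On the library side, that convention can be as small as one guarded assignment. Here's a minimal sketch of the pattern, using a hypothetical stand-in class (FakeDiscount) instead of either real extension, and assuming a defined? guard; the actual libraries may go about it differently:

```ruby
# Hypothetical stand-in for an implementation class such as RDiscount.
class FakeDiscount
  def initialize(text)
    @text = text
  end

  # Not a real Markdown conversion; just enough to show the protocol.
  def to_html
    "<p>#{@text}</p>"
  end
end

# Claim the generic name only if nothing else has claimed it first.
Markdown = FakeDiscount unless defined?(Markdown)

puts Markdown.new("Hello World!").to_html
```

Whichever library gets required first wins the constant, so load order decides which implementation calling code ends up with.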

Lastly, you can inject either library into BlueCloth-using code by substituting require 'bluecloth' statements with the following:

begin
  require 'rdiscount'
  BlueCloth = RDiscount
rescue LoadError
  require 'bluecloth'
end
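The same rescue-and-fall-back trick extends to more than two candidates. A rough sketch, assuming the require names and class names used in this post; first_available is a made-up helper, not part of any of these libraries:

```ruby
# Try each [library, class name] pair in order and return the first
# class whose library loads. Returns nil if none are installed.
def first_available(candidates)
  candidates.each do |lib, class_name|
    begin
      require lib
      return Object.const_get(class_name)
    rescue LoadError
      next # not installed; move on to the next candidate
    end
  end
  nil
end

impl = first_available([
  ['rdiscount',    'RDiscount'],
  ['peg_markdown', 'PEGMarkdown'],
  ['bluecloth',    'BlueCloth']
])

if impl
  puts impl.new("Hello World!").to_html
else
  puts "No Markdown library installed"
end
```

Order the list by preference; the first library that loads is the one that gets used.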

Benchmarks

Here are the results of processing the Basic Markdown Syntax test file over 100 iterations with BlueCloth, Maruku, Discount, and rpeg-markdown on my 2GHz MacBook Pro. All values are wall-clock time.

$ ruby benchmark.rb
Results for 100 iterations
BlueCloth: 13.029987s total time, 00.130300s average
   Maruku: 08.424132s total time, 00.084241s average
RDiscount: 00.082019s total time, 00.000820s average
PEGMarkdown: 00.715275s total time, 00.007153s average
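
To put those totals in relative terms, here's a small sketch that divides each one by the fastest. The numbers are copied straight from the run above; nothing here re-runs the benchmark:

```ruby
# Total wall-clock seconds for 100 iterations, copied from the run above.
totals = {
  'BlueCloth'   => 13.029987,
  'Maruku'      => 8.424132,
  'RDiscount'   => 0.082019,
  'PEGMarkdown' => 0.715275
}

fastest = totals.values.min

# Print each implementation's slowdown relative to the fastest one.
totals.sort_by { |_, secs| secs }.each do |name, secs|
  printf "%12s %7.1fx\n", "#{name}:", secs / fastest
end
```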

Here's the code used to perform the benchmarks (benchmark.rb):

 iterations = 100
 test_file = "#{File.dirname(__FILE__)}/benchmark.txt"
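 implementations = %w[BlueCloth RDiscount Maruku PEGMarkdown]

 # Library names that differ from the downcased class name
 # (rpeg-markdown is loaded with require 'peg_markdown').
 library_names = { 'PEGMarkdown' => 'peg_markdown' }

 # Attempt to require each implementation and remove any that are not
 # installed.
 implementations.reject! do |class_name|
   library = library_names.fetch(class_name, class_name.downcase)
   begin
     require library
     false
   rescue LoadError
     puts "#{class_name} excluded: could not require '#{library}'"
     true
   end
 end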

 # Grab actual class objects.
 implementations.map! { |class_name| Object.const_get(class_name) }

 def benchmark(implementation, text, iterations)
   start = Time.now
   iterations.times do |i|
     implementation.new(text).to_html
   end
   Time.now - start
 end

 test_data = File.read(test_file)

 puts "Spinning up ..."
 implementations.each { |impl| benchmark(impl, test_data, 1) }

 puts "Running benchmarks ..."
 results =
   implementations.inject([]) do |r,impl|
     GC.start
     r << [ impl, benchmark(impl, test_data, iterations) ]
   end

 puts "Results for #{iterations} iterations:"
 results.each do |impl,time|
   printf "  %10s %09.06fs total time, %09.06fs average\n",
     "#{impl}:", time, time / iterations
 end
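
If you'd rather not do the Time.now arithmetic by hand, Ruby's standard benchmark library has Benchmark.realtime, which returns elapsed wall-clock seconds for a block. A minimal sketch of the same timing loop; EchoMarkdown is a hypothetical stand-in so it runs without any Markdown gem installed, and timed_run is just a name I made up:

```ruby
require 'benchmark'

# Hypothetical stand-in so the sketch runs without any Markdown gem installed.
class EchoMarkdown
  def initialize(text)
    @text = text
  end

  def to_html
    "<p>#{@text}</p>"
  end
end

# Same loop as the benchmark method above, timed by Benchmark.realtime.
def timed_run(implementation, text, iterations)
  Benchmark.realtime do
    iterations.times { implementation.new(text).to_html }
  end
end

iterations = 100
elapsed = timed_run(EchoMarkdown, "Hello World!", iterations)
printf "%s: %09.06fs total time, %09.06fs average\n",
  EchoMarkdown, elapsed, elapsed / iterations
```

Swap EchoMarkdown for RDiscount, PEGMarkdown, or BlueCloth to time the real thing.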