Moving Past BlueCloth
I’ve never been very happy with BlueCloth, Ruby’s de facto Markdown library. It was well-developed throughout 2005, reached a fairly complete 1.0 release that year, and then… just… stopped. There hasn’t been so much as a maintenance release since 2005 – and that’s certainly not due to a lack of bugs and feature requests.
BlueCloth is slow. Really slow. Gruber’s Markdown.pl (which was never designed for speed, if I remember correctly) can process the basic syntax test document a full three times in the same amount of time it takes BlueCloth to process it once.
BlueCloth is also broken:
$ echo "Oh _is_ it?" | Markdown.pl
<p>Oh <em>is</em> it?</p>
$ ruby -rbluecloth -e "puts BlueCloth.new('Oh _is_ it?').to_html"
<p>Oh, _is_ it?</p>
BlueCloth is broken, slow, and unmaintained. What to do?
Mislav Marohnić recently created a Git clone of the BlueCloth Subversion repository to fix bugs. That’s a good start. There’s also Maruku, another pure-Ruby implementation that’s a bit faster and includes a variety of interesting extensions to the core Markdown grammar.
Here’s another idea:
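class BabyShitGreenCloth
  def initialize(text)
    @text = text
  end

  def to_html
    # IO.popen is used instead of open("|...") because newer Rubies
    # deprecate the pipe form of Kernel#open.
    IO.popen("perl Markdown.pl", 'r+') do |io|
      io.write(@text)
      io.close_write
      io.read
    end
  end
end

BlueCloth = BabyShitGreenCloth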
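The same pipe trick works with any filter program that reads stdin and writes stdout. Here is a minimal sketch, assuming a Ruby new enough to accept the array form of IO.popen (1.9 or later). The PipedFilter class and the upcasing child process are hypothetical stand-ins, chosen so the example runs without Markdown.pl installed:

```ruby
require 'rbconfig'

# Hypothetical generalization of the class above: pipe text through any
# external filter command and return whatever the command prints.
class PipedFilter
  def initialize(text, command)
    @text = text
    @command = command # array form, so no shell quoting is involved
  end

  def to_html
    IO.popen(@command, 'r+') do |io|
      io.write(@text)
      io.close_write # send EOF so the filter knows the input is complete
      io.read
    end
  end
end

# Stand-in filter: a child Ruby process that upcases its stdin.
# For real use this would be something like %w[perl Markdown.pl].
UPCASE_FILTER = [RbConfig.ruby, '-e', 'print STDIN.read.upcase']

puts PipedFilter.new("Oh _is_ it?", UPCASE_FILTER).to_html
```

One caveat: writing all of the input before reading any output is fine for document-sized text, but it could stall on very large inputs if the filter starts emitting output before it has consumed everything.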
You laugh!? Don’t. It would be funny if it were actually an inferior implementation compared to BlueCloth. It isn’t.
Shrug. Just sayin…
Announcing Two New Fast Markdown Libraries for Ruby
(Three if you include the pipe-to-perl implementation above.)
I have two solid Ruby extension libraries: one that wraps John MacFarlane’s peg-markdown and one that wraps David Loren Parsons’s Discount. Both are complete implementations of core Markdown plus SmartyPants in C.
Why two? Well, there are some pretty big differences between implementations:
- Discount has a BSD-style license; peg-markdown is GPL. The Ruby extensions adopt the license of their parent work.
- peg-markdown uses a PEG-based grammar definition and a parser generator called leg. That’s just fucking cool. It stimulates both the CompSci weenie and pirate areas of my brain simultaneously. Also, this should – theoretically, of course – make peg-markdown easier to maintain and extend, and guarantee a high level of correctness, assuming the grammar is defined properly.
- Discount is thread-safe, has good memory management, and includes a stable set of functions geared toward library use. peg-markdown has some of that and the author is entertaining suggestions (in the form of patches).
- Discount is quite a bit faster than peg-markdown (~8x in my tests), although either will blow the doors off BlueCloth (or Markdown.pl for that matter) in raw performance.
Bottom line: Discount makes for a better Ruby extension presently but peg-markdown has legs (hardy har har).
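To give a feel for why a PEG-style definition reads well, here is a toy sketch of ordered choice for a single inline rule, written by hand with Ruby’s StringScanner. It is not peg-markdown’s grammar and does not use leg. The rule names and the render_inline helper are made up for illustration:

```ruby
require 'strscan'

# Toy grammar, tried in order at each position (PEG-style ordered choice):
#
#   inline <- emph / text / any
#   emph   <- '_' [^_ ] [^_]* '_'
#   text   <- [^_]+
#   any    <- .
#
# Hypothetical helper; it handles only underscore emphasis.
def render_inline(source)
  scanner = StringScanner.new(source)
  output = String.new

  until scanner.eos?
    if scanner.scan(/_([^_\s][^_]*)_/)   # emph
      output << "<em>#{scanner[1]}</em>"
    elsif scanner.scan(/[^_]+/)          # text
      output << scanner.matched
    else                                 # any: a lone underscore
      output << scanner.getch
    end
  end

  output
end

puts render_inline("Oh _is_ it?")
```

Each alternative is tried in a fixed order and the first match wins, which is what makes this kind of grammar unambiguous by construction.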
Installing, Using, Hacking
UPDATE: The Discount class has been renamed to RDiscount. The gem has also been renamed from discount to rdiscount. It is recommended that you uninstall the original discount gem.
Git clones are available on GitHub for monitoring, hacking, and browsing the source / documentation files: rdiscount and rpeg-markdown.
GEMs have been released to RubyForge. Install as usual:
$ sudo gem install rdiscount
$ sudo gem install rpeg-markdown
(If you have a spare moment, please consider installing either or both of these and note any compilation errors along with your platform in the comments.)
Both extensions implement the basic protocol popularized by RedCloth and adopted by BlueCloth:
require 'rdiscount'
markdown = RDiscount.new("Hello World!")
puts markdown.to_html
For rpeg-markdown:
require 'peg_markdown'
markdown = PEGMarkdown.new("Hello World!")
puts markdown.to_html
In addition, both libraries set the top-level Markdown constant (when it is not already defined) to their implementation classes, making it possible to write code that doesn’t care which Markdown implementation is in use:
markdown = Markdown.new("Hello World!")
puts markdown.to_html
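A library can claim a shared constant without clobbering an earlier one by guarding the assignment with defined?. The following is a sketch of that idea using two made-up stand-in classes (FakeMarkdownA and FakeMarkdownB are hypothetical, not part of either gem), and it assumes nothing else has defined Markdown yet:

```ruby
# Hypothetical stand-ins for two implementation classes. Both follow the
# same protocol: new(text) followed by to_html.
class FakeMarkdownA
  def initialize(text)
    @text = text
  end

  def to_html
    "<p>#{@text}</p>"
  end
end

class FakeMarkdownB
  def initialize(text)
    @text = text
  end

  def to_html
    "<div>#{@text}</div>"
  end
end

# Whichever library is loaded first claims the constant; later ones
# leave it alone.
Markdown = FakeMarkdownA unless defined?(Markdown)
Markdown = FakeMarkdownB unless defined?(Markdown)

puts Markdown.new("Hello World!").to_html
```

With this pattern, load order decides which implementation the constant points at, so code that needs a specific one should still name it explicitly.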
Lastly, you can inject either library into BlueCloth-using code by substituting require 'bluecloth' statements with the following:
begin
  require 'rdiscount'
  BlueCloth = RDiscount
rescue LoadError
  require 'bluecloth'
end
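The fallback above extends naturally to more than two candidates. Here is a rough sketch, where first_available and CANDIDATES are hypothetical names and the library/class pairs are the ones mentioned in this post:

```ruby
# Hypothetical helper: return the first implementation class whose
# library can be loaded, or nil if none of them can.
def first_available(candidates)
  candidates.each do |library, class_name|
    begin
      require library
      return Object.const_get(class_name)
    rescue LoadError, NameError
      next # not installed (or class missing); try the next candidate
    end
  end
  nil
end

# Preference order: fastest first, BlueCloth as the last resort.
CANDIDATES = [
  %w[rdiscount RDiscount],
  %w[peg_markdown PEGMarkdown],
  %w[bluecloth BlueCloth]
]

implementation = first_available(CANDIDATES)
puts implementation ? "Using #{implementation}" : "No Markdown library installed"
```

Note that LoadError is not a StandardError, so it has to be named explicitly in the rescue clause.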
Benchmarks
Here are the results of processing the Basic Markdown Syntax test file over 100 iterations with BlueCloth, Maruku, Discount, and rpeg-markdown on my 2GHz MacBook Pro. All values are wall-clock time.
$ ruby benchmark.rb
Results for 100 iterations
BlueCloth: 13.029987s total time, 00.130300s average
Maruku: 08.424132s total time, 00.084241s average
RDiscount: 00.082019s total time, 00.000820s average
PEGMarkdown: 00.715275s total time, 00.007153s average
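To put those totals in relative terms, this short sketch divides each one by the fastest. The numbers are copied straight from the results above, and nothing is re-measured:

```ruby
# Total wall-clock seconds for 100 iterations, copied from the results above.
totals = {
  'BlueCloth'   => 13.029987,
  'Maruku'      => 8.424132,
  'RDiscount'   => 0.082019,
  'PEGMarkdown' => 0.715275
}

fastest = totals.values.min

# Print each implementation as a multiple of the fastest total.
totals.sort_by { |_name, seconds| seconds }.each do |name, seconds|
  printf "%12s: %7.1fx the fastest time\n", name, seconds / fastest
end
```

By these figures RDiscount comes out roughly 159 times faster than BlueCloth and a little under 9 times faster than PEGMarkdown, which lines up with the ~8x estimate given earlier.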
Here’s the code used to perform the benchmarks (benchmark.rb):
iterations = 100
test_file = "#{File.dirname(__FILE__)}/benchmark.txt"
implementations = %w[BlueCloth RDiscount Maruku PEGMarkdown]

# Library and gem names that differ from the downcased class name.
libraries = { 'PEGMarkdown' => 'peg_markdown' }
gems      = { 'PEGMarkdown' => 'rpeg-markdown' }

# Attempt to require each implementation and remove any that are not
# installed.
implementations.reject! do |class_name|
  begin
    require libraries.fetch(class_name, class_name.downcase)
    false
  rescue LoadError
    puts "#{class_name} excluded. Try: gem install #{gems.fetch(class_name, class_name.downcase)}"
    true
  end
end

# Grab actual class objects.
implementations.map! { |class_name| Object.const_get(class_name) }

def benchmark(implementation, text, iterations)
  start = Time.now
  iterations.times do |i|
    implementation.new(text).to_html
  end
  Time.now - start
end

test_data = File.read(test_file)

puts "Spinning up ..."
implementations.each { |impl| benchmark(impl, test_data, 1) }

puts "Running benchmarks ..."
results =
  implementations.inject([]) do |r,impl|
    GC.start
    r << [ impl, benchmark(impl, test_data, iterations) ]
  end

puts "Results for #{iterations} iterations:"
results.each do |impl,time|
  printf " %10s %09.06fs total time, %09.06fs average\n",
    "#{impl}:", time, time / iterations
end
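
If you rerun this on a current Ruby, one possible variation is to time with the monotonic clock, which is not affected by system clock adjustments the way Time.now is. This is a sketch only: time_implementation and NullMarkdown are hypothetical names, and NullMarkdown is a do-nothing stand-in so the example runs without any Markdown gem installed:

```ruby
# Hypothetical stand-in that follows the new(text).to_html protocol.
class NullMarkdown
  def initialize(text)
    @text = text
  end

  def to_html
    "<p>#{@text}</p>"
  end
end

# Same shape as the benchmark method above, but using the monotonic
# clock instead of Time.now.
def time_implementation(implementation, text, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { implementation.new(text).to_html }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

elapsed = time_implementation(NullMarkdown, "Hello World!", 100)
printf "%s: %.6fs total time, %.6fs average\n", NullMarkdown, elapsed, elapsed / 100
```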