I've never been very happy with BlueCloth, Ruby's de facto Markdown library. It was well-developed throughout 2005, reached a fairly complete 1.0 release that year, and then... just... stopped. There hasn't been so much as a maintenance release since 2005 -- and that's certainly not due to a lack of bugs and feature requests.
BlueCloth is slow. Really slow. Gruber's Markdown.pl (which was never
designed for speed, if I remember correctly) can process the basic syntax
test document a full three times in the same amount of time it takes
BlueCloth to process it once.
BlueCloth is also broken:
$ echo "Oh _is_ it?" | Markdown.pl
<p>Oh <em>is</em> it?</p>
$ ruby -rbluecloth -e "puts BlueCloth.new('Oh _is_ it?').to_html"
<p>Oh _is_ it?</p>
BlueCloth is broken, slow, and unmaintained. What to do?
Mislav Marohnić recently created a Git clone of the BlueCloth subversion repository to fix bugs. That's a good start. There's also Maruku, another pure-Ruby implementation that's a bit faster and includes a variety of interesting extensions to the core Markdown grammar.
Here's another idea:
class BabyShitGreenCloth
  def initialize(text)
    @text = text
  end

  def to_html
    open("|perl Markdown.pl", 'r+') do |io|
      io.write(@text)
      io.close_write
      io.read
    end
  end
end

BlueCloth = BabyShitGreenCloth
You laugh!? Don't. It would be funny if it were actually an inferior implementation compared to BlueCloth. It isn't.
Shrug. Just sayin...
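For what it's worth, the same pipe trick can be sketched with Open3 from the standard library, which also lets you check the child's exit status. The PipeCloth name and the command argument are made up for illustration, and the default assumes perl and Markdown.pl are reachable from the current directory.

```ruby
require 'open3'

# Hypothetical wrapper: pipes text through an external filter command
# and exposes the usual new(text).to_html protocol.
class PipeCloth
  def initialize(text, command = %w[perl Markdown.pl])
    @text = text
    @command = command
  end

  def to_html
    # capture2 writes @text to the child's stdin and returns its stdout
    # along with the process status.
    output, status = Open3.capture2(*@command, stdin_data: @text)
    raise "#{@command.join(' ')} exited with #{status.exitstatus}" unless status.success?
    output
  end
end
```

Usage would look like PipeCloth.new("Oh _is_ it?").to_html, with the caveat that spawning a process per document is only tolerable for small workloads.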
Announcing Two New Fast Markdown Libraries for Ruby
(Three if you include the pipe-to-perl implementation above.)
I have two experimental but solid Ruby extension libraries: one that wraps John MacFarlane's peg-markdown and one that wraps David Loren Parsons's Discount. Both are complete implementations of core Markdown plus SmartyPants in C.
Why two? Well, there are some pretty big differences between implementations:
Discount has a BSD-style license; peg-markdown is GPL. The Ruby extensions adopt the license of their parent work.
peg-markdown uses a PEG-based grammar definition and a parser generator called leg. That's just fucking cool. It stimulates both the CompSci weenie and pirate areas of my brain simultaneously. Also, this should -- theoretically, of course -- make peg-markdown easier to maintain and extend and guarantee a high level of correctness, assuming the grammar is defined properly.
Discount is thread-safe, has good memory management, and includes a stable set of functions geared toward library use. peg-markdown has some of that and the author is entertaining suggestions (in the form of patches).
Discount is quite a bit faster than peg-markdown (~8x in my tests), although either will blow the doors off BlueCloth (or Markdown.pl for that matter) in raw performance.
Bottom line: Discount makes for a better Ruby extension presently but peg-markdown has legs (hardy har har).
Installing, Using, Hacking
UPDATE: The Discount class has been renamed to RDiscount. The gem has also been renamed from discount to rdiscount. It is recommended that you uninstall the original discount gem.
Git clones are available on GitHub for monitoring, hacking, and browsing the source / documentation files: rdiscount and rpeg-markdown.
GEMs have been released to RubyForge. Install as usual:
$ sudo gem install rdiscount
$ sudo gem install rpeg-markdown
(If you have a spare moment, please consider installing either or both of these and note any compilation errors along with your platform in the comments.)
Both extensions implement the basic protocol popularized by RedCloth and adopted by BlueCloth:
require 'rdiscount'
markdown = RDiscount.new("Hello World!")
puts markdown.to_html
For rpeg-markdown:
require 'peg_markdown'
markdown = PEGMarkdown.new("Hello World!")
puts markdown.to_html
In addition, both libraries set the top-level Markdown constant
(when not already defined) to their implementation classes, making it
possible to write code that expresses no interest in Markdown implementation:
markdown = Markdown.new("Hello World!")
puts markdown.to_html
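Here's a rough sketch of how that kind of conditional constant can work. StubMarkdown is a made-up stand-in so the example runs without either gem installed; the real libraries would assign their own class in its place.

```ruby
# Hypothetical stand-in for an implementation class such as RDiscount.
class StubMarkdown
  def initialize(text)
    @text = text
  end

  def to_html
    "<p>#{@text}</p>"
  end
end

# Claim the top-level constant only if nothing else has defined it yet,
# so whichever library loads first wins.
Markdown = StubMarkdown unless defined?(Markdown)

puts Markdown.new("Hello World!").to_html
```

Calling code only ever touches Markdown.new and to_html, so it doesn't need to know which class is behind the constant.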
Lastly, you can inject either library into BlueCloth-using code by
substituting require 'bluecloth' statements with the following:
begin
  require 'rdiscount'
  BlueCloth = RDiscount
rescue LoadError
  require 'bluecloth'
end
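If you want more than one fallback, the same rescue-LoadError idea generalizes to a list. This is only a sketch: first_available and MARKDOWN_CANDIDATES are hypothetical names, and the library/class pairs are the ones used elsewhere in this post.

```ruby
# Hypothetical helper: returns the first implementation class whose
# library can be loaded, or nil when none of them are installed.
def first_available(candidates)
  candidates.each do |library, class_name|
    begin
      require library
      return Object.const_get(class_name)
    rescue LoadError
      next # not installed; try the next candidate
    end
  end
  nil
end

# Prefer RDiscount, then rpeg-markdown, then fall back to BlueCloth.
MARKDOWN_CANDIDATES = [
  %w[rdiscount RDiscount],
  %w[peg_markdown PEGMarkdown],
  %w[bluecloth BlueCloth]
]

implementation = first_available(MARKDOWN_CANDIDATES)
puts implementation ? "Using #{implementation}" : "No Markdown library installed"
```

Returning nil instead of raising leaves it to the caller to decide whether a missing Markdown library is fatal.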
Benchmarks
Here are the results of processing the Basic Markdown Syntax test file over 100 iterations with BlueCloth, Maruku, Discount, and rpeg-markdown on my 2GHz MacBook Pro. All values are wall-clock time.
$ ruby benchmark.rb
Results for 100 iterations
BlueCloth: 13.029987s total time, 00.130300s average
Maruku: 08.424132s total time, 00.084241s average
RDiscount: 00.082019s total time, 00.000820s average
PEGMarkdown: 00.715275s total time, 00.007153s average
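To put those totals in relative terms, here's a small sketch that divides each total by the fastest one. The numbers are copied from the results above; the variable names are just for illustration.

```ruby
# Total times in seconds for 100 iterations, copied from the results above.
totals = {
  'BlueCloth'   => 13.029987,
  'Maruku'      => 8.424132,
  'RDiscount'   => 0.082019,
  'PEGMarkdown' => 0.715275
}

fastest = totals.values.min

# How many times slower each implementation is than the fastest one.
slowdowns = totals.map { |name, seconds| [name, seconds / fastest] }.to_h

slowdowns.sort_by { |_, ratio| ratio }.each do |name, ratio|
  printf "%-12s %6.1fx\n", "#{name}:", ratio
end
```

On these figures BlueCloth comes out at roughly 159 times the RDiscount total and PEGMarkdown at a little under 9 times.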
Here's the code used to perform the benchmarks (benchmark.rb):
iterations = 100
test_file = "#{File.dirname(__FILE__)}/benchmark.txt"
implementations = %w[BlueCloth RDiscount Maruku PEGMarkdown]

# Attempt to require each implementation and remove any that are not
# installed.
implementations.reject! do |class_name|
  begin
    # PEGMarkdown is loaded with require 'peg_markdown'; the others load
    # under their downcased class names.
    require(class_name == 'PEGMarkdown' ? 'peg_markdown' : class_name.downcase)
    false
  rescue LoadError => boom
    puts "#{class_name} excluded. Try: gem install #{class_name.downcase}"
    true
  end
end
# Grab actual class objects.
implementations.map! { |class_name| Object.const_get(class_name) }

def benchmark(implementation, text, iterations)
  start = Time.now
  iterations.times do |i|
    implementation.new(text).to_html
  end
  Time.now - start
end

test_data = File.read(test_file)

puts "Spinning up ..."
implementations.each { |impl| benchmark(impl, test_data, 1) }

puts "Running benchmarks ..."
results =
  implementations.inject([]) do |r,impl|
    GC.start
    r << [ impl, benchmark(impl, test_data, iterations) ]
  end

puts "Results for #{iterations} iterations:"
results.each do |impl,time|
  printf " %10s %09.06fs total time, %09.06fs average\n",
    "#{impl}:", time, time / iterations
end
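One caveat on the script above: Time.now follows the system clock, which can be adjusted mid-run. A possible variation, assuming Ruby 2.1 or later for Process.clock_gettime, reads a monotonic clock instead. EchoCloth and monotonic_benchmark are hypothetical names, and EchoCloth is a trivial stand-in so the sketch runs without any Markdown gem.

```ruby
# Hypothetical stand-in that follows the new(text).to_html protocol.
class EchoCloth
  def initialize(text)
    @text = text
  end

  def to_html
    "<p>#{@text}</p>"
  end
end

# Same shape as benchmark() above, but timed with a monotonic clock
# that is unaffected by system clock adjustments.
def monotonic_benchmark(implementation, text, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { implementation.new(text).to_html }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

elapsed = monotonic_benchmark(EchoCloth, "Hello World!", 100)
printf "%10s %09.06fs total time, %09.06fs average\n",
  "EchoCloth:", elapsed, elapsed / 100
```

For runs as short as the RDiscount ones the difference is unlikely to matter, but it removes one source of noise.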
Discuss
Wow I had no idea BlueCloth was so slow. I'll be switching my blog over to Discount right now. Thanks for your contributions.
— Arya A. on Saturday, May 31, 2008 at 12:30 AM #
Hi Ryan,
So I successfully installed this on my Mac but when I was ready to deploy and SSH'd over into my CentOS machine (x86_64) and ran sudo gem install discount, I got this:
ERROR: Error installing discount:
/usr/local/bin/ruby extconf.rb install discount
I cd'd into the ext directory and ran extconf.rb then make && make install and that worked fine. But the gem won't install properly. Got any ideas?
— Arya A. on Saturday, May 31, 2008 at 01:02 AM #
Any chance of renaming the discount class? Discount is likely to be a widely used ActiveRecord model name for ecommerce web sites.
— Andrew White on Saturday, May 31, 2008 at 01:53 AM #
Arya: thanks. I have no idea but I should be able to test a bit.
Andrew: sure. How about InsanelyFastCloth? Although, superheros may object.
— Ryan Tomayko on Saturday, May 31, 2008 at 04:20 AM #
Hey Ryan, nice work. Both compile successfully here on Gentoo amd64.
I just want to give Maruku a bit of a plug. While I'll probably be moving to rpeg-markdown myself, Maruku still has quite a few advantages. It has a lot of very nice extensions and it is very easy to further extend with ruby. I also found it to be more than an order of magnitude slower than BlueCloth for some larger documents.
I'm personally a bit of a performance nut so I'll probably try adding some of the extensions I like to rpeg-markdown but if you want a pure Ruby Markdown parser forget BlueCloth and go with Maruku. It rocks!
— Dave Balmain on Saturday, May 31, 2008 at 04:41 AM #
Whoops I meant to say I found Maruku to be an order of magnitude faster than BlueCloth in some cases. Sorry.
— Dave Balmain on Saturday, May 31, 2008 at 05:08 AM #
I did some benchmarks myself. My blog is static, built from plain markdown files with HAML templates by a ruby script. I just added a loop that built my entire blog 100 times (including HAML parsing and file IO):
Discount seems nice and fast, but under my environment rpeg-markdown ran considerably slower than the old BlueCloth.
— Eivind Uggedal on Saturday, May 31, 2008 at 06:05 AM #
I tried to hit plan-watch.com this morning to check in on how your venture was going...but it appeared to be down. Plan watch / pro-serve still going well?
— Dan on Saturday, May 31, 2008 at 08:59 AM #
I have had the same problem with markdown in Ruby for a while. I'll definitely be checking both of these out, thanks!
— Paul Barry on Saturday, May 31, 2008 at 09:27 AM #
Eivind: huh. Those numbers look a lot different than mine. In my tests, Discount is almost 159x BlueCloth whereas in yours it's only about 2.5x. Something's not right with one (or both) of our benchmarks.
— Ryan Tomayko on Saturday, May 31, 2008 at 12:56 PM #
Ryan,
It's very strange, I am able to compile the extension and generate the .so file but it won't successfully install. After building the extension and copying .so into the lib directory and requiring 'discount.rb' directly (not as a gem), it works. But when I do:
It doesn't work.
Any pointers at all would be greatly appreciated.
— Arya A. on Saturday, May 31, 2008 at 02:56 PM #
Just tried out Discount. It installed on my MBP running 10.5 fine. It seems to work well (unlike BlueCloth and Maruku) and is blazing fast. Thanks!
— Paul Barry on Saturday, May 31, 2008 at 06:27 PM #
Ryan: If you read my comment my test does not only do markdown parsing, but also HAML and file reading and writing.
— Eivind Uggedal on Sunday, June 01, 2008 at 05:42 AM #
As someone said above, it builds fine on 10.5 Intel but the Discount name's kinda picky, what about just using FastCloth? :-)
— Federico on Sunday, June 01, 2008 at 11:21 PM #
We've been using handy scripting languages for so long, it's easy to forget how wickedly, ruthlessly, and diabolically fast C code actually runs. :)
— John on Monday, June 02, 2008 at 01:58 PM #
The Discount gem name is now "rdiscount":
The class is now named "RDiscount":
I was this close to naming it DarkBlueCloth.
— Ryan Tomayko on Tuesday, June 03, 2008 at 07:34 AM #
FYI, in case you haven't seen it - there is a Markdown testbed available online, which allows you to try out a number of Markdown implementations next to each other.
The URL for this is http://babelmark.bobtfish.net/
— Tomas Doran on Tuesday, June 03, 2008 at 08:37 AM #
How much work would it take to make these gems work on Windows? Is that even possible? I am cursed with having to target the windows platform... but Maruku is super slow...
— Simon Lau on Friday, June 06, 2008 at 04:51 PM #
ah; DarkBlueCloth would have been awesome :D
Ryan, this is fantastic; thanks for the post. btw, no problem compiling and running on osx 10.4 and Gentoo 1.12.9
— Adam Greene on Friday, June 06, 2008 at 11:36 PM #
Simon: A lot and maybe. I've been wanting to experiment with cross-compiling for win32 using mingw so maybe I'll give it a shot. I'm afraid it might be a waste of time though because the extension relies on non-standard funopen/fopencookie functions in libc. From what I can tell, BSD, Mac, and Linux are the only platforms that support these.
— Ryan Tomayko on Saturday, June 07, 2008 at 12:38 AM #
Ryan, these are very sweet indeed; no issues installing on Gentoo. WDYT about announcing these on the markdown-discuss list?
My main reasons for not yet switching from Maruku (which I love) to RDiscount are 1) the syntax extensions for "Markdown Extra" and 2), from rdiscount.rb,
Is filter_html on your hitlist?
— Thomas Nichols on Friday, June 13, 2008 at 03:18 PM #
I love Maruku. It has awesome features, supports Markdown and Markdown Extra, and has extensions.
Caching is your friend.
Awesome post, regardless. But I love Maruku's features too much - the sacrifice in speed is worth it for me ... at least for now. I'll have to see how featureful these 2 libraries are.
— remi on Tuesday, June 17, 2008 at 01:01 AM #
This looks really cool and I'm gonna check it out soon.
How about "PurpleCloth" or "IndigoCloth" or perhaps my favorite "color": "PlaidCloth"
:)
— Rev. Dan on Tuesday, June 17, 2008 at 12:48 PM #
This looks really cool and I'm gonna check it out soon.
How about "PurpleCloth" or "IndigoCloth" or perhaps my favorite "color": "PlaidCloth"
:)
— Rev. Dan on Tuesday, June 17, 2008 at 12:48 PM #
RDiscount on Windows seems to be a no-go. It's looking for C libraries and stuff it seems like.
— Eleo on Tuesday, June 17, 2008 at 04:36 PM #
Hey Ryan,
I've been looking for alternatives to funopen and fopencookie for Windows without success.
Since you're using callbacks for the write buffer, this is not possible to do with plain C (it can be done using C++ MemoryStreams, but that is C++).
The other alternative would be to use temporary files (tmpfile) to hold the whole buffer and then convert it to a Ruby string, but that change would be only for Windows.
What do you think?
Drop me a line to luislavena (at) gmail (dot) c o m
Thanks for your time.
— Luis Lavena on Friday, June 20, 2008 at 07:53 AM #
— You Might Want To Implement :filter_html on Sunday, June 29, 2008 at 03:16 PM #
Heh.
— Anonymous Coward on Sunday, June 29, 2008 at 03:17 PM #
Heh type 2
— You're filtering out css properties? Heh... on Sunday, June 29, 2008 at 03:18 PM #
Gee. Thanks, commenter #27-#29. I have a separate html5lib based sanitizer in place so I'm not too worried about it at the moment.
Maybe you should think about implementing :filter_html -- patches are quite welcome.
In case anyone's curious, here's the body of the comments above:
27. <style>*{display:none !important}</style>
28. <b style="display: block; position: absolute; top: 0; left: 0; color: red; font-size: 50px">Heh.</b>
29. <b style='font-size: 5000px; margin-bottom: -2000px'>Heh type 2</b>
Solid attempts, all :)
— Ryan Tomayko on Sunday, July 06, 2008 at 08:57 PM #
Hi Ryan,
thanks for the links! I tried it for converting rather long Markdown text in a Sinatra app (in Czech) and the numbers are impressive:
127.0.0.1 ... "GET /.... HTTP/1.1" 200 25356 2.4435
127.0.0.1 ... "GET /.... HTTP/1.1" 200 25427 0.0179
Karel
— Karel Minarik on Friday, July 25, 2008 at 01:15 PM #
Maruku supports the (very welcome) extensions of PHP Markdown Extra including definition lists, IDs on headers, optional markdown within block elements such as tables, abbreviations and more.
Important to note is that Discount does not support these extensions.
Finally I'll emphasize that the GPL'd nature of peg-markdown means you cannot use it in proprietary (in fact, in non-GPL) applications.
Thanks for the useful write-up.
— Alan Hogan on Thursday, July 31, 2008 at 06:04 PM #
That's a feature.
— Ryan Tomayko on Saturday, August 02, 2008 at 04:20 AM #
Thanks for this!
Big (un-measured of course) speed improvement for me. The one bit of markdown that sent me screaming was taking 14s to process with BlueCloth. It was imperceptible with RDiscount. I have tiny bits of markdown'd text all over my site so this should really be a boon.
— Chris Nolan.ca on Thursday, August 07, 2008 at 10:49 PM #
Hi Ryan
I've also had problems installing both gems on OS X:
ERROR: Error installing rdiscount:
ERROR: Failed to build gem native extension.
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb install rdiscount
can't find header files for ruby.
— James on Monday, August 18, 2008 at 02:54 AM #
Thank you so friggn much... you took my application from having an average of 1,200 ms/request to 150 ms/request on 400 request/second... I love you... I love you!!! And I love EngineYard as well for finding the problem with BlueCloth!!!
— John Kopanas on Monday, August 18, 2008 at 12:45 PM #