Showing posts with label ruby. Show all posts

21 March 2010

First-class functions make printf-debugging obsolete

tl;dr: Instead of looking through lines of text in an error log, store the variable bindings of your error cases and debug in a REPL in the context of your error.

One of the most productive features of a language is a read-eval-print loop: type in an expression, and see its value. A standard RDBMS has one for SQL, and a standard browser has one for JavaScript. Good languages like Ruby, Python, Haskell and Lisp each have one. They're incredibly useful for exploratory programming.

If you're playing with library functions, or doing a join on a new set of tables, or trying to do complex Ajax calls, you'll have an easier time if you see the results of your code immediately. The faster the results come back, the better you keep your train of thought, and the easier it is to debug your code and build on it until you've solved your problem.

In an error case, you'd ideally like to be in that read-eval-print loop, poking around to see where your assumptions didn't hold true, but oftentimes there are only a few print statements logging a few values. This is one way of getting back into that read-eval-print loop in Ruby.

As a motivating example, let's say you're trying to get the average number of letters per vowel in English words. Assuming that /usr/share/dict/words is sufficient, we can loop over every word, count the characters and count the vowels, divide, and then do the average.

As a first guess, let's try (average isn't a core Array method, so assume a small helper that provides it)

words = File.readlines("/usr/share/dict/words")
words.map {|word| word.length / word.count("aeiouy") }.average

and bam, a divide-by-zero error. Ruby doesn't tell us the local variable bindings, what "word" was set to when it broke. It'd be great if, instead of doing

words.map {|word| print word; word.length / word.count("aeiouy") }.average

we could make a read-eval-print loop that operated in the context of the function. We already have the print part, and for the read part we can pass in a string, so all we need to do is evaluate a string in the context of the function, and store that closure over the function context outside its scope. The function eval evaluates a string, we can make a function via lambda, and we can store our function into a global variable, as follows:

words.map {|word| $repl = lambda {|string| eval string }; word.length / word.count("aeiouy") }.average

and now, when the divide-by-zero error happens, $repl is a function that evaluates its argument in the loop context of the failing words, with all of its local variables set. In my case,

>>$repl.call("word")
=> "A\n"

Voila! We didn't account for uppercase letters, and there's also some trailing whitespace coming through. Map each word to its lowercase, whitespace-stripped version, and try again:

words=File.readlines("/usr/share/dict/words").map {|word| word.strip.downcase }
words.map {|word| $repl = lambda {|string| eval string}; word.length / word.count("aeiouy") }.average

and this time

>> $repl.call("word")
=> "b"

we see another case to watch out for.
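
The whole trick fits in a few lines if you want to try it without a dictionary file. This is a minimal sketch, not the exact session above: the four-word list is a made-up stand-in for /usr/share/dict/words, and the rescue is only there so the script keeps running after the error.

```ruby
# Made-up stand-in for /usr/share/dict/words (an assumption for this sketch).
words = ["apple", "kiwi", "grrr", "banana"]

begin
  words.map do |word|
    # Overwritten on every iteration, so after a failure it still closes
    # over the local variables of the iteration that blew up.
    $repl = lambda { |string| eval(string) }
    word.length / word.count("aeiouy")   # integer division raises on zero vowels
  end
rescue ZeroDivisionError => e
  puts "error: #{e.message}"
end

puts $repl.call("word")                   # the word that broke the loop
puts $repl.call("word.count('aeiouy')")   # and why it broke
```

Because eval is called with no explicit binding, it runs in the scope where the lambda was created, which is why "word" resolves to the failing element.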

But why should we junk up our mapping function? Is there some way to package up variable bindings in a library? One of Ruby's core classes, actually, is Binding. An instance of Binding is basically the lookup table that the interpreter uses to look up the value of a variable, which you can then pass to eval. You can get the current binding at any point by calling

Kernel.binding

and you can call .binding on a block or a lambda, to get their bindings. Assuming the file "save_bindings.rb" had the following code:

$bindings = {}
def save_binding(key, &block)
$bindings[key] = block.binding
block.call
end
def use_binding_to_eval(key, string)
b = $bindings[key]
b ? eval(string,b) : "invalid key"
end

then we could do something refreshing like

require "./save_bindings"  # "." left the load path in Ruby 1.9.2, so give the path explicitly
words.map {|word| save_binding("counting") { word.length / word.count("aeiouy") } }.average

so instead of doing a printf, we call save_binding and pass in a block that contains the code we want to run. save_binding stores the block's binding in a hash table under a key (the same key every time, so each call overwrites the last), runs the code, and then, when our divide-by-zero gets thrown, we can see what the problem is by calling

>> use_binding_to_eval("counting", "word")
=> "b"

or, equally easily, anything more complex:

>> use_binding_to_eval("counting", "word.length")
=> 1
>> use_binding_to_eval("counting", "word == 'b'")
=> true
>> use_binding_to_eval("counting", "word.upcase!")
=> "B"
>> use_binding_to_eval("counting", "word")
=> "B"

and the variables in the saved binding all point to live values, not copies, as we can see with the destructively-updating String#upcase!.
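
To see the "live values" point end to end, here is a self-contained sketch that inlines the two helpers from save_bindings.rb and runs them against a made-up three-word list (the list and the rescue are assumptions for illustration). Binding#local_variable_get is a later addition (Ruby 2.1 and up) that reads a variable without going through eval.

```ruby
$bindings = {}

# Remember the block's binding under a key, then run the block.
def save_binding(key, &block)
  $bindings[key] = block.binding
  block.call
end

# Evaluate a string against a previously saved binding.
def use_binding_to_eval(key, string)
  b = $bindings[key]
  b ? eval(string, b) : "invalid key"
end

words = %w[apple b kiwi]   # made-up sample; "b" has no vowels

begin
  words.map { |word| save_binding("counting") { word.length / word.count("aeiouy") } }
rescue ZeroDivisionError
  # The binding of the failing iteration is still sitting in $bindings.
end

puts use_binding_to_eval("counting", "word")          # the failing word
use_binding_to_eval("counting", "word.upcase!")       # mutate it through the binding
puts words.inspect                                    # the original array sees the change
puts $bindings["counting"].local_variable_get(:word)  # same object, no eval needed
```

The array changes because the binding holds a reference to the same String object that lives in words, not a snapshot of it.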

Imagine if this were in a webapp. Take the following Sinatra code, in "letters-per-vowel.rb":

require "rubygems"
require "sinatra"

get("/letters-per-vowel") {
word = params["word"]
(word.length / word.count("aeiouy")).to_s
}

and run it in production mode to avoid Sinatra's helpful error messages:

ruby letters-per-vowel.rb -e production -p 4567

and go to localhost:4567/letters-per-vowel?word=aeiou or whatever vowelful word you want, and you get back what you expect: the length of the word divided by its number of vowels, as a string. But what to do when you hit localhost:4567/letters-per-vowel?word=grrr or something like that? Use "save_bindings.rb" from earlier and make yourself a browser-based read-eval-print loop:

require "rubygems"
require "sinatra"
require "./save_bindings"  # "." left the load path in Ruby 1.9.2, so give the path explicitly

SECRET = "password1"

get("/letters-per-vowel") {
word = params["word"]
save_binding("LPV") {
(word.length / word.count("aeiouy")).to_s
}
}

get("/debug") {
use_binding_to_eval(params["key"], params["string"]).to_s if params["secret"] == SECRET
}

We need a secret, to avoid people evaluating arbitrary code on the server. And now, if you hit localhost:4567/letters-per-vowel?word=grrr you'll still get that error message, but then go to localhost:4567/debug?secret=password1&key=LPV&string=request.env['HTTP_USER_AGENT'] and you can see the user agent from the environment of the request that caused the error, or go to localhost:4567/debug?secret=password1&key=LPV&string=word.size to see how long the word was. It's a bit of a security hole, but you could probably get even more exciting results if you put it behind the slick front-end of http://tryruby.org.


Lee

27 May 2009

Ruby's memcache-client 1.7.2 does NOT support compression; patch it in with 7 lines of code

The Ruby memcache-client 1.7.2 code seems like the most popular memcached client. Alas, memcache-client 1.7.2 does not recognize :compress => true, all the demos out there notwithstanding.
Let's fix this by monkeypatching Zlib compression into Marshal.


require 'zlib'

module Marshal
@@load_uc = method :load
@@dump_uc = method :dump
def self.load(v) @@load_uc[Zlib::Inflate.inflate(v)] end
def self.dump(v) Zlib::Deflate.deflate(@@dump_uc[v]) end
end

And there we go!  Four lines of patching Marshal.load for a better memory footprint.

To dissect each phrase of that sentence:
  1. "Four lines": I just wanted to see how well zlib would work on my Marshalled ActiveRecord objects and HTML fragments, and it did so handily, compressing almost 3:1.  Indeed, the only reason I poked around at the source code is that one of my largest but still highly-compressible HTML fragments was 1.2MB, over the size limit.  I've since gone back to storing large HTML fragments on disk (uncompressed), having found many more values to store in Memcached.
  2. "patching Marshal.load": monkeypatching Marshal is not as bad as monkeypatching String.  Chances are, you use the Marshal format as a blob, and you keep your Marshal files to yourself (and leave external serialization to friendlier fare like JSON).  So, all in all, it's much easier to change the Marshal format than to muck through the memcache-client code.
  3. "better memory footprint": instead of Zlib, try LZMA, with slightly smaller compressed sizes than BZIP and faster decompression times, good properties for cache compression.  But Zlib is already in the standard library, so it's a good first approximation.
The ersatz alias_method_chaining feels kludgy, as does Ruby's distinction between methods and lambdae.  Ah well.
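
If patching Marshal globally feels too invasive, the same round trip can be sketched as a standalone wrapper. This is a different technique from the patch above (no monkeypatching), the module name CompressedMarshal is made up, and the repeated HTML fragment is invented data, so the sizes it prints say nothing about real cache contents.

```ruby
require 'zlib'

# Hypothetical wrapper: same deflate/inflate steps, but Marshal stays untouched.
module CompressedMarshal
  def self.dump(obj)
    Zlib::Deflate.deflate(Marshal.dump(obj))
  end

  def self.load(bytes)
    Marshal.load(Zlib::Inflate.inflate(bytes))
  end
end

# Invented, highly repetitive fragment, just to have something compressible.
fragment = "<li class=\"item\"><a href=\"/things\">thing</a></li>\n" * 1000

raw    = Marshal.dump(fragment)
packed = CompressedMarshal.dump(fragment)

puts "raw: #{raw.bytesize} bytes, packed: #{packed.bytesize} bytes"
puts CompressedMarshal.load(packed) == fragment   # round trip is lossless
```

The catch is that memcache-client calls Marshal itself, so a wrapper like this only helps for values you serialize by hand before storing them.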

Thoughts?

02 May 2007

microsoft knows drm.

I've been hearing a lot about this string of hexadecimal numbers. It starts with 09, ends with c0, and I think it has f9 and 88 somewhere inside.

Let's ask Microsoft search what it is!

require('open-uri') && puts(open('http://search.msn.com/results.aspx?q=09+f9+88+c0') {|f| f.read}.downcase.gsub(/<[^>]+>/,'').tr('^0-9a-f','').scan(/09.+?c0/).inject(Hash.new(0)) {|h,nu| h[nu]+=1;h}.sort_by {|str,freq| freq}.last.first)

Whew!

For non-Rubyists:
require that we can open URLs like files, and put this string: open the MSN search page for all pages that have 09, f9, 88, and c0, read it in one gulp, lowercase it, regexp out all HTML tags, delete any character that's not a hex digit, scan for all substrings that start with 09 and end with c0, make a histogram* of the array, sort by frequency, and take the most popular string.
(* Inject a hash table through the array of scanned substrings; the strings are the keys, the frequencies are the values; add one to the value every time you see any string, starting at zero.)
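
The same pipeline, minus the network, can be sketched step by step. The sample page below is made up (placeholder hex runs, not real search results), the method name is hypothetical, and each_with_object and max_by stand in for the inject and sort_by in the one-liner.

```ruby
# Made-up page text; the hex runs are placeholders, not real search results.
SAMPLE_PAGE = "<p>09 12 34 C0</p> <b>09-12-34-c0</b> <i>09 aa c0</i>"

def most_common_hex_run(html)
  hex = html.downcase
            .gsub(/<[^>]+>/, '')   # regexp out the tags
            .delete('^0-9a-f')     # drop everything that is not a hex digit
  # Histogram: a hash defaulting to zero, bumped once per scanned substring.
  histogram = hex.scan(/09.+?c0/).each_with_object(Hash.new(0)) do |run, counts|
    counts[run] += 1
  end
  histogram.max_by { |_run, freq| freq }   # [string, frequency], or nil if nothing matched
end

p most_common_hex_run(SAMPLE_PAGE)
```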

Remember folks, this is Microsoft's suggested answer to the 128-bit programming challenge posed earlier today, so like love, and Cambridge weather, it's just temporary.

08 April 2007

kata 6.

prag dave's anagrams resonated with me, because i'm working on hashing text down.

so follow along in irb, if you have /usr/share/dict/words:

class Symbol
def to_proc(*args) lambda {|*a| a.first.send self, *(args+a[1..-1])} end
alias [] to_proc
end # for faux currying

w = File.readlines('/usr/share/dict/words').map {|w| w.strip.downcase}.uniq ;:done
h = Hash.new([])
w.each {|word| nu = word.split('').sort.join; h[nu]+=[word]} ;:done
anas = h.values.find_all {|v| v.size > 1} ;:done
puts anas.map(&:join[',']) # all anagram n-tuples
puts "---"
puts anas.sort_by(&:size)[-30..-1].map(&:join[',']) # the top by set size
puts "---"
puts anas.sort_by {|a| a.first.size}[-30..-1].map(&:join[',']) # the top by word size

in fairness, the symbol-currying is in "sym2proc.r", so it's really just 10 lines of code, but that's the general idea.
(lots of the library functions look haskelly, but ruby just felt better for string processing)
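
for a version without the Symbol patch, here's a rough sketch using Enumerable#group_by. the method name and the six-word list are made up for illustration; swap in the dictionary file to get the real thing.

```ruby
# key each word by its sorted letters; words sharing a key are anagrams
def anagram_groups(words)
  words.group_by { |word| word.chars.sort.join }
       .values
       .select { |group| group.size > 1 }
end

# made-up word list standing in for /usr/share/dict/words
sample = %w[listen silent enlist google tinsel banana]
anagram_groups(sample).each { |group| puts group.join(',') }
```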

08 March 2007

eval is amazing.

In terms of things, eval is much bigger than JSON callbacks, even bigger than Lisp itself. It's big like the ribosome --- eval is how things come alive.

For example, I'm rendering URLs and associated metadata in the browser from a Ruby CGI script. Instead of some big XML specification, I'm just passing an array of strings from server to client. It's big_array.inspect in Ruby, and eval(inspectedBigArray) in JavaScript. No monadic parsers, no macro magic; inspect and eval, code I don't even have to write myself. (And if the inspected big array is too big, the browser can start lysis; so it goes.)
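
The Ruby half of that round trip can be sketched as follows. The URLs are made-up examples, and the claim only holds for plain strings like these: strings with unusual characters may inspect into escapes that a JavaScript or JSON parser reads differently. The JSON.parse line is there to show that, for this simple case, the inspected text is also what a stricter parser on the other end would accept.

```ruby
require 'json'

# Hypothetical data standing in for the real URL list.
URLS = ["http://example.com/a", "http://example.com/b?x=1"]

WIRE = URLS.inspect             # what the server would send
puts WIRE

puts eval(WIRE) == URLS         # eval turns the text back into the array
puts JSON.parse(WIRE) == URLS   # for plain strings, the same text parses as JSON
```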

G J Sussman: Programming is a good medium for expressing poorly-understood and sloppily-formulated ideas: exactly the opposite of people who'd want to plague me with type theory.