Friday, 10 February 2012

Ruby YAML and sorting top-level keys

I wanted to sort the top-level keys of a hash when dumping it to YAML in Ruby, as a one-off (that is, not sort all the time, just this once). I found two methods.

The first one is based on something I found on Stack Overflow. Given a hash:

a = { 
  "a" => "b",
  "e" => "b",
  "c" => "b",
  "p" => "b",
  "z" => "b",
  "x" => "b",
  "d" => "b",
}

The function looks something like:

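require 'yaml'

# Relies on Syck's emitter API (YAML::quick_emit), as shipped with Ruby 1.8.
def sort_yaml1(obj)
  YAML::quick_emit(obj) do |out|
    out.map(obj.taguri, obj.to_yaml_style) do |map|
      # Add the entries in sorted key order instead of hash order.
      obj.keys.sort.each do |k|
        map.add(k, obj[k])
      end
    end
  end
end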

puts sort_yaml1(a)

The second method is very manual and not quite as elegant, but is probably less dependent on the YAML internals:

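def sort_yaml2(obj)
  arr = obj.sort  # array of [key, value] pairs, sorted by key
  out = "---\n"
  arr.each do |element|
    entry = {element[0] => element[1]}
    # Dump each pair as its own one-entry document and drop the header line.
    out += entry.to_yaml.to_a[1..-1].join + "\n"
  end
  out
end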

puts sort_yaml2(a)
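
One caveat: String#to_a was removed in Ruby 1.9, so sort_yaml2 as written only runs on 1.8. Below is a minimal sketch of the same per-entry idea that should also run on later Rubies. The name sort_yaml_by_entry is made up for illustration, and the sketch assumes that the to_yaml output of each one-entry hash starts with a single "---" header line.

```ruby
require 'yaml'

# Hypothetical helper (not part of the YAML library): dump each top-level
# pair as its own one-entry hash, drop that dump's "---" header line, and
# glue the pieces together in sorted key order.
def sort_yaml_by_entry(hash)
  body = hash.sort.map do |key, value|
    # String#lines works where String#to_a no longer exists.
    { key => value }.to_yaml.lines.to_a.drop(1).join
  end
  "---\n" + body.join
end

sample = { "e" => "b", "a" => "b", "c" => { "z" => 1, "y2" => 2 } }
puts sort_yaml_by_entry(sample)
```

Only the top-level keys are reordered; nested hashes keep whatever order to_yaml gives them. An empty hash produces just the header line, which loads back as nil rather than {}.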


It's a shame the SortedKeys option isn't supported out of the box in Ruby 1.8.7; it seems the option isn't passed to the underlying Syck driver at all.
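
On Ruby 1.9 and later the problem largely goes away, as far as I can tell: hashes remember insertion order and the Psych emitter writes keys in that order, so building a new hash from the sorted pairs should be enough. A minimal sketch, where sort_yaml3 is a hypothetical name of my own and not something the YAML library provides:

```ruby
require 'yaml'

# Hypothetical helper: rebuild the hash from its sorted [key, value] pairs,
# then dump it. Assumes an order-preserving Hash (Ruby 1.9+) and keys that
# can be compared with each other (for example, all strings).
def sort_yaml3(obj)
  Hash[obj.sort].to_yaml
end

a = { "e" => "b", "c" => "b", "a" => "b" }
puts sort_yaml3(a)
```

As with the two methods above, only the top-level keys are sorted; nested hashes are dumped in their existing order.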

1 comment:

  1. Small fix, so both are compliant:

    def sort_yaml2(obj)
      arr = obj.sort
      out = "--- \n"
      arr.each do |element|
        entry = {element[0] => element[1]}
        out += entry.to_yaml.to_a[1..-1].join
      end
      out
    end

    For given hash:

    (main):096:0> sort_yaml1(a).hash == sort_yaml2(a).hash
    => true

    Plus, between the two:

    (main):104:0> quick(10000) { sort_yaml2(a) }
    Rehearsal ------------------------------------
       9.790000   0.730000  10.520000 ( 10.585600)
    -------------------------- total: 10.520000sec

           user     system      total        real
       9.890000   0.610000  10.500000 ( 10.505404)
    => nil

    (main):105:0> quick(10000) { sort_yaml1(a) }
    Rehearsal ------------------------------------
       5.790000   0.340000   6.130000 (  6.254850)
    --------------------------- total: 6.130000sec

           user     system      total        real
       5.770000   0.460000   6.230000 (  6.265030)
    => nil

    The first is the obvious winner.

    KW
