Wednesday, 28 December 2011

Esper and Structured Content Struggles

Working with structured data in Esper means dealing with a variety of issues, including special characters and nesting. Because Esper is written in Java, all this data must be represented as a java.util.Map before Esper can process it, which introduces some special cases.

I've compiled a number of problems and solutions that I have experienced directly. The code and data examples use JRuby, since this is what I develop with. For details on getting this working, see my earlier article.


Doing a select on a stream that contains special characters

Sometimes structured data contains special characters that need escaping. This is especially a problem in the key (property name) part of the data. Let's take the character '@'. To escape it, wrap the property name in back-ticks.

Input Data:

{"itemName"=>"test","price"=>100, "@foo" => { "bar" => "baz" }}

Select statement:

select `@foo`.bar from OrderEvent

This would return:

{"@foo.bar"=>"baz"}


Without the escape, you would get the error:


NativeException: com.espertech.esper.client.EPStatementSyntaxException: Incorrect syntax near '@' at line 1 column 7, please check the select clause [select @foo.bar from OrderEvent]
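If you build statements programmatically, it helps to add the back-ticks automatically. Below is a minimal sketch built on an assumption: any name that is not a plain identifier (letters, digits and underscores, not starting with a digit) needs escaping. Both esper_escape and property_path are hypothetical helpers and are not part of Esper. The identifier rule is also my assumption, not Esper's documented grammar. For example, reserved keywords used as property names may need escaping as well.

```ruby
# Hypothetical helper (not part of Esper): wrap a property name in back-ticks
# unless it is a plain identifier. The identifier rule is an assumption.
def esper_escape(name)
  (name =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/) ? name : "`#{name}`"
end

# Hypothetical helper: build a nested property path such as `@foo`.bar
def property_path(*parts)
  parts.map { |part| esper_escape(part) }.join(".")
end

statement = "select #{property_path('@foo', 'bar')} from OrderEvent"
puts statement
```

This prints the same statement used above: select `@foo`.bar from OrderEvent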


Returning a column with special characters


The converse problem is returning a column whose name contains special characters. Let's say we want to return the value from the example above under the column name '@selected'. You'll need to escape it:


select `@foo`.bar as `@selected` from OrderEvent

The problem is that the returned column name still contains the back-ticks. I'm not sure whether this is intended behaviour or a bug, but I wrote a small function to clean it up:


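# Return a copy of a result hash with every back-tick removed from its keys,
# e.g. "`@selected`" becomes "@selected".
def clean_escapes(old_hash)
  new_hash = {}
  old_hash.each do |k, v|
    new_key = k.gsub('`', '')
    new_hash[new_key] = v
  end
  new_hash
end # def clean_escapes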

This strips every back-tick in a key. You may want something more selective that only removes a back-tick when it is the first or last character.
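Here is a minimal sketch of that selective variant. It is plain Ruby, and clean_edge_escapes is a hypothetical name. It removes one leading and one trailing back-tick from each key and leaves any back-ticks inside the name alone.

```ruby
# Hypothetical helper: strip a back-tick only at the start and/or end of each key.
def clean_edge_escapes(old_hash)
  old_hash.each_with_object({}) do |(k, v), new_hash|
    new_key = k.sub(/\A`/, '').sub(/`\z/, '')
    new_hash[new_key] = v
  end
end

result = { "`@selected`" => "baz", "price" => 100 }
p clean_edge_escapes(result)
```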


Selecting a property that is dynamic


Dynamic properties are allowed in Esper, but they must be handled properly in the select clause. Otherwise you will receive an error when you hit an event that does not contain the property.


Let's say we have three pieces of input data:


{"itemName"=>"test","price"=>100, "@foo" => { "bar" => "baz" }}
{"itemName"=>"test","price"=>200}
{"itemName"=>"test","price"=>300}


The property '@foo' only exists in the first event, which makes it a dynamic property.


First, declare the event type. For some reason, if @foo is not declared here you will get an error (this only seems to happen with top-level properties that require escaping):


order_event_type = { 
  "itemName" => "string",
  "price" => "double",
  "@foo" => {}, 
}
ep_config.addEventType("OrderEvent", order_event_type)

The select statement to grab @foo.bar whenever it exists would be:


select `@foo`.bar? from OrderEvent


This then returns:


{"@foo.bar?"=>"baz"}
{"@foo.bar?"=>nil}
{"@foo.bar?"=>nil}

As you can see, only the first event produces a value. For the other two events the property evaluates to nil instead of raising an error.
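To make these semantics concrete outside Esper, here is a plain Ruby sketch that mimics the `?` lookup over the same three events. dynamic_get is a hypothetical helper. It only illustrates the behaviour shown above and says nothing about how Esper implements it. A missing key, or a non-hash value part-way along the path, simply yields nil.

```ruby
# Hypothetical helper: walk a nested path, returning nil instead of failing
# when a key is missing or an intermediate value is not a Hash.
def dynamic_get(event, *path)
  path.reduce(event) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

events = [
  { "itemName" => "test", "price" => 100, "@foo" => { "bar" => "baz" } },
  { "itemName" => "test", "price" => 200 },
  { "itemName" => "test", "price" => 300 }
]

# Mimic: select `@foo`.bar? from OrderEvent
results = events.map { |e| { "@foo.bar?" => dynamic_get(e, "@foo", "bar") } }
results.each { |r| p r }
```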

Returning original and new data

Sometimes you want to preserve the original data as well as add new properties. Let's say you have the following set of data:

{"id" => 4, "itemName"=>"test", "price"=>100}
{"id" => 5, "itemName"=>"test", "price"=>100}
{"id" => 6, "itemName"=>"test", "price"=>300}
{"id" => 7, "itemName"=>"test", "price"=>300}
{"id" => 8, "itemName"=>"foo", "price"=>300}
{"id" => 9, "itemName"=>"test", "price"=>500}

You want to add a brand new property called 'doublePrice'. Normally you would select it like this:

select price * 2 as doublePrice from OrderEvent


Which produces the following output:



{"doublePrice"=>200}
{"doublePrice"=>200}
{"doublePrice"=>600}
{"doublePrice"=>600}
{"doublePrice"=>600}
{"doublePrice"=>1000}

If instead you want to return both the original and the new properties, the EPL statement can be modified as follows:

select *, price * 2 as doublePrice from OrderEvent

The problem, however, is that your UpdateListener now receives events of a different class:

com.espertech.esper.event.WrapperEventBean

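This wrapper's underlying object holds the original event and the new properties separately. With a small change to my handler code, I merge the two whenever a wrapper arrives:

# "select *, ..." delivers a WrapperEventBean whose underlying object is a pair:
# the original event (first) and the newly selected properties (second).
if event.class == com.espertech.esper.event.WrapperEventBean then
  pair = event.getUnderlying
  c = pair.first.merge(pair.second)
  puts c.inspect
else
  # Plain events can be printed directly.
  puts event.getUnderlying.inspect
end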

Now I get combined output:

{"id"=>4, "itemName"=>"test", "price"=>100, "doublePrice"=>200}
{"id"=>5, "itemName"=>"test", "price"=>100, "doublePrice"=>200}
{"id"=>6, "itemName"=>"test", "price"=>300, "doublePrice"=>600}
{"id"=>7, "itemName"=>"test", "price"=>300, "doublePrice"=>600}
{"id"=>8, "itemName"=>"foo", "price"=>300, "doublePrice"=>600}
{"id"=>9, "itemName"=>"test", "price"=>500, "doublePrice"=>1000}

This won't work for nested data. Ruby's Hash#merge only operates at the top level, and for duplicate keys the value from the right-hand side replaces the one on the left. If you want to merge structured data, you will need a more intelligent, recursive way of merging.
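As a starting point, here is a minimal recursive merge sketch using the block form of Hash#merge. deep_merge is a hypothetical helper, not part of Esper or core Ruby. When both sides hold a Hash for the same key, those hashes are merged too. Otherwise the right-hand value wins, as with plain Hash#merge.

```ruby
# Hypothetical helper: recursively merge nested hashes.
def deep_merge(left, right)
  left.merge(right) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val) # both nested: merge them as well
    else
      new_val                      # otherwise the right-hand value wins
    end
  end
end

original = { "id" => 4, "@foo" => { "bar" => "baz" } }
added    = { "doublePrice" => 200, "@foo" => { "qux" => 1 } }
p deep_merge(original, added)
```

This assumes both sides are Ruby Hashes. The maps handed back through JRuby may be Java Map objects, in which case the is_a?(Hash) checks will not match them and they may need converting first.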


