Thursday, March 27, 2008

Dapper has a lot of promise, but boy can it be annoying!

I haven't done a review in a while, so I thought I'd get back into it, starting with Dapper. I came across it in a Hinchcliffe blog entry about the most promising mashup tools. He had Dapper on the list, along with better-known tools such as JackBe, which still won't let me test its product.

Dapper allows developers to pull content from websites and expose it using various APIs. There is nothing new about this. Nearly all the products I've reviewed have this capability. Dapper differentiates itself with the number of different APIs it supports, which I'll get to later. The list is very impressive, but doesn't include SOAP.

Too bad about SOAP, but I understand. SOAP is overkill for screen-scraped content. You don't need transactional integrity or security (not that SOAP has that problem licked) if you're just pulling read-only content from a page. Still, it does mean I can't use the content in something like a BPEL orchestration. Clearly that use isn't something Dapper has in mind. Then again, Intel MashMaker and Kapow didn't have it in mind either.

For consistency with my other reviews, I attempted to create a feed from the news page on the Serena website. No luck. Dapper couldn't load the page. So next I went to Digg to add recent news items. While Dapper loaded the page, it scrambled the page elements. I couldn't pull the top news stories from digg/science or digg/technology.

Next I went to my own blog to see if I could pull my content into a feed. I know, everyone can already get a blog as a feed, but this was an experiment. The instructions for Dapper say its selection algorithm will work better with multiple similar pages, so I added the links for my most recent three posts and went to the next step, selecting the contents to scrape.

Unlike other screen-scraping technologies I've played with, Dapper has some smarts built in. Their algorithm supposedly helps mashers select the right content to pull into the API without having to mess around with XPath. Well, certainly there is an algorithm in place, but I found it much more of an annoyance than a help. I couldn't get selections to work correctly, and when I tried to de-select manually, I got a page script error and the interface stopped in its tracks. I couldn't interact with the application at all, and had to reload the entire page with a new URL.
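
For anyone who hasn't had to mess around with XPath, here is roughly what selecting content by hand looks like. This is a minimal Python sketch of the manual approach, not Dapper's algorithm. The sample markup, the class names (post, post-title) and the function name select_titles are all made up for illustration, and the snippet is well-formed XML so the standard library parser can read it; real pages usually need a more forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a blog page (not real markup).
PAGE = """
<html><body>
  <div class="post"><h3 class="post-title">First post</h3><p>Hello</p></div>
  <div class="post"><h3 class="post-title">Second post</h3><p>World</p></div>
  <div class="sidebar"><h3>About me</h3></div>
</body></html>
"""

def select_titles(page):
    """Return the text of every post title, skipping the sidebar heading."""
    root = ET.fromstring(page)
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    path = ".//div[@class='post']/h3[@class='post-title']"
    return [node.text for node in root.findall(path)]

if __name__ == "__main__":
    print(select_titles(PAGE))
```

The path expression is the part a tool like Dapper tries to work out for you from your clicks. Writing it by hand means knowing the page structure and fixing the expression whenever the site changes its markup.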

I went back to the start and tried it again, and got similar results, except that I didn't even try manually de-selecting page elements. Instead I wondered whether I should give Dapper only a single page to work with. I selected 'Back' in the interface to return to the page where I selected my inputs, intending to remove all but the latest blog entry. And guess what?

Right! I got an error on the page again. The 'Back' link didn't work either. At least this time the interface didn't freeze up.

After a while I defined something close to the selections I wanted. (I never did get the exact content.) And now for the reason Dapper is different, the reason I kept playing with it despite its many flaws: I could expose the content as POX, RSS, Filtered RSS, HTML, a Google gadget, a Netvibes module, a PageFlake, a Google map, an image loop, an iCalendar, Atom, CSV, JSON, XSL, YAML or even as an email. True, a lot of these formats don't make sense for blog content, but it's nice to have the option.
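
To make the "one set of content, many output formats" idea concrete, here is a small Python sketch that takes the same scraped records and renders them as JSON, CSV and a bare-bones RSS 2.0 feed. It only illustrates the concept and says nothing about how Dapper does it internally. The records, the example.com URLs and the function names (to_json, to_csv, to_rss) are assumptions invented for the example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical scraped records; titles and URLs are placeholders.
ITEMS = [
    {"title": "First post", "link": "http://example.com/first"},
    {"title": "Second post", "link": "http://example.com/second"},
]

def to_json(items):
    """Serialize the records as a JSON array."""
    return json.dumps(items, indent=2)

def to_csv(items):
    """Serialize the records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "link"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

def to_rss(items, feed_title, feed_link):
    """Serialize the records as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped items"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    print(to_json(ITEMS))
    print(to_csv(ITEMS))
    print(to_rss(ITEMS, "My blog", "http://example.com/"))
```

Once the content is in a neutral structure like a list of records, each extra format is just another small serializer, which is presumably why a long format list is feasible for a tool like this.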

I especially liked the preview that let me take a look at the content before finalizing my output format choice. That was sweet.

Bottom line: I wouldn't use Dapper today for production mashups. It just isn't ready. However, when Dapper fixes their algorithm so it isn't annoying, when they do some serious debugging, when they fix their performance issues and when they otherwise clean up their usability, it will be one killer application for creating mashable content.




2 comments:

Unknown said...

Hi, my recommendation is to take a look at WSO2 and openkapow as well. Thanks for this post.

LOTONtech said...

Nice article.

Readers interested in mashup development environments should also check out Microsoft Popfly, Yahoo! Pipes, and the Google Mashup Editor (GME).

Tony Loton, author --
"Introduction to Microsoft Popfly, No Programming Required"
"Working with Yahoo! Pipes, No Programming Required"
"Mashup Case Studies with Yahoo! Pipes"
"Creating Google Mashups with the Google Mashup Editor"
http://www.lotontech.com/it_books.htm