jQuery, hammers and nails, and screen scraping

November 24, 2008

I am currently playing around with jQuery, and I am in a mode where jQuery is my hammer and every imaginable web problem looks like a nail. So I am trying it out on all kinds of web problems, from UI beautification (widgets like accordions and tabs, details like rounded corners) to more structural, backend-ish things.

One thing I just did is very similar to screen scraping, but in a more intelligent way. I call it “intelligent screen scraping”. I’m pretty happy with my result.

In short, imagine you have two websites. Site A has a page which is something like this:

<h1>Current news</h1>
<a href="...url....">News item 1</a>
<a href="...url....">News item 2</a>
<h1>Archive</h1>
<a href="...url....">Old news item 1</a>
<a href="...url....">Old news item 2</a>
<a href="...url....">Old news item 3</a>

Your mission: to re-post the first paragraph of every item under “Current news” (but NOT archive) to site B.

Now, traditionally, you would build some sort of server-side aggregation thing. Maybe with RSS or something. But I didn’t have that luxury, and let’s say for simplicity that both of these were simple HTML pages without any backend “system”.

So I took this nail and hammered it with jQuery. On page B, I put a piece of jQuery which, as soon as page B is loaded, goes out and fetches page A by Ajax. Then it does the following magic on the DOM of page A:

$(data).find('#layerName h1:first').nextAll()
    .parent().find('h1:last').prevAll().find('a').each(function(i) {
        // handle each "Current news" link here
    });

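Which translates to the following in English:

In the fetched page, look inside the element with the id layerName and find the first h1. Take all the siblings that follow it, go up to their parent, and find the last h1 in there. Take all the siblings that precede that one, find every link inside them, and run a function on each link.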

A bit intimidating to look at, but makes perfect sense. To me, anyway. And this result in particular is nothing extraordinary, it’s just an example of the class of problems you can solve with jQuery.
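
A minimal sketch of the whole round trip might look like the following. It is not the code from page B: PAGE_A_URL and the #news target are made-up placeholders, and instead of the chained selector it walks the container's children in a plain loop, so the selection logic can be read and run without a browser. It also assumes page A can be fetched from page B's origin, since browsers block cross-site Ajax requests by default.

```javascript
// Sketch only. PAGE_A_URL and '#news' are made-up placeholders;
// '#layerName' is the container id from the snippet above.
var PAGE_A_URL = '/site-a/news.html';

// Pure helper, no DOM needed: given a container's children in document
// order as { tag, href, text } records, return the links that sit after
// the first <h1> and before the last <h1>.
function linksBetweenFirstAndLastH1(children) {
  var first = -1;
  var last = -1;
  for (var i = 0; i < children.length; i++) {
    if (children[i].tag === 'h1') {
      if (first === -1) { first = i; }
      last = i;
    }
  }
  var links = [];
  for (var j = first + 1; j < last; j++) {
    if (children[j].tag === 'a') { links.push(children[j]); }
  }
  return links;
}

// Browser wiring: runs only where jQuery is loaded (i.e. on page B).
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    $.get(PAGE_A_URL, function (data) {
      // Parse page A into a detached <div> and flatten the container's children.
      var children = $('<div>').html(data).find('#layerName').children()
        .map(function () {
          return {
            tag: this.tagName.toLowerCase(),
            href: this.getAttribute('href'),
            text: $(this).text()
          };
        }).get();

      // Copy each "Current news" link into page B.
      $.each(linksBetweenFirstAndLastH1(children), function (i, link) {
        $('#news').append($('<a>').attr('href', link.href).text(link.text));
      });
    }, 'html');
  });
}
```

This only copies the links themselves. Pulling the first paragraph of each item would take one more Ajax request per link, which the sketch leaves out.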

Now, this exact same thing could have been done with “classic” screen scraping. But I called the jQuery stuff “intelligent screen scraping” because from my point of view it has these advantages.
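
For comparison, here is a rough sketch of what the classic, string-based approach could look like on the sample page above. The function name scrapeCurrentNews is made up for illustration, and the sketch assumes the headings and links are written exactly as in the sample: it treats the page as text rather than as a DOM.

```javascript
// Classic, string-based scraping of the sample page: no DOM, just text.
// Hypothetical helper; assumes the exact heading text and double-quoted
// href attributes used in the sample markup.
function scrapeCurrentNews(html) {
  // Cut out the text between the "Current news" and "Archive" headings.
  var start = html.indexOf('<h1>Current news</h1>');
  var end = html.indexOf('<h1>Archive</h1>');
  if (start === -1 || end === -1 || end < start) { return []; }
  var section = html.slice(start, end);

  // Pull the href and link text out of each <a> with a regular expression.
  var linkPattern = /<a\s+href="([^"]*)"[^>]*>([^<]*)<\/a>/g;
  var links = [];
  var match;
  while ((match = linkPattern.exec(section)) !== null) {
    links.push({ href: match[1], text: match[2] });
  }
  return links;
}
```

A pattern like this stops matching as soon as the markup changes shape, for example if a link gains a class attribute before its href or wraps its text in another tag.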

I’m not saying that this is an ideal solution to the sort of aggregation need that I described above. But I did find that for my particular need, I got the work done with far less code than any server-side solution would have needed.