Episode 13 - Google blog search - part 2

Released: Dec 09, 2009, Running time: 35 min

In this episode:

We add logic to avoid fetching duplicate search results from Google Blog Search.

Tags: rspec, google, model, search, mysql, remarkable

Now that we are using Google Blog Search to fetch search results, we are running into the same problem with duplicates that we had when querying Twitter.

Twitter let us specify a “since” parameter to limit the search results to statuses created after a certain status ID.

Since the Google Blog Search API does not offer such an option (or at least we could not find one), we had to find another way to filter out incoming results whose URL we have already stored. Initially we wanted to apply a hash function such as MD5 or SHA1 to the URLs.
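The hashing idea we first considered could look roughly like the sketch below. It is only an illustration, not the code from the episode: the helper names `url_fingerprint` and `reject_duplicates`, and the shape of a result as a hash with a `:url` key, are assumptions made for this example.

```ruby
require 'digest'

# Hypothetical helper: turns a URL into a fixed-length SHA1 fingerprint.
def url_fingerprint(url)
  Digest::SHA1.hexdigest(url.strip)
end

# Hypothetical helper: keeps only results whose URL fingerprint has not
# been seen yet. `seen` is a Hash used as a set, so passing the same
# object again remembers fingerprints across several fetches.
def reject_duplicates(results, seen = {})
  results.select do |result|
    key = url_fingerprint(result[:url])
    next false if seen.key?(key)  # already known, drop it
    seen[key] = true              # remember it and keep the result
  end
end
```

Calling `reject_duplicates` with the same `seen` hash on every fetch would drop any result whose URL came in earlier. The fingerprints have a fixed length, which is convenient for indexing, but they still have to be stored and looked up somewhere.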

In the end we decided to add a unique constraint on URLs to our SearchResult model and let the database and ActiveRecord validations handle it.
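The effect of such a uniqueness rule can be illustrated without Rails or a database. The plain-Ruby sketch below is a stand-in, not the SearchResult model from the episode: the `ResultStore` class and its `create` method are hypothetical, and they only mimic how a save is refused when the URL already exists.

```ruby
require 'set'

# Hypothetical in-memory stand-in for a table with a unique URL column.
class ResultStore
  attr_reader :rows

  def initialize
    @rows = []
    @urls = Set.new
  end

  # Returns true when the row is stored, false when the URL is missing
  # or has already been stored.
  def create(attrs)
    url = attrs[:url]
    return false if url.nil? || url.empty?
    # Set#add? returns nil when the element is already present.
    return false unless @urls.add?(url)
    @rows << attrs
    true
  end
end

store = ResultStore.new
store.create(:url => "http://example.com/post", :title => "First fetch")
store.create(:url => "http://example.com/post", :title => "Second fetch")
puts store.rows.length
```

With this approach the fetching code does not need to know which results are new. It tries to store everything, and duplicates are simply rejected at save time.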

See part 1 of Google Blog Search for how we fetch the results that need filtering.
