Released: Dec 09, 2009, Running time: 35 min
In this episode:
We add logic to avoid fetching duplicate search results from Google Blog search.
Now that we are using Google Blog Search to fetch search results, we have run into the same problem with duplicates that we had when querying Twitter.
Twitter let us specify a “since” parameter to limit the search results to statuses created after a given status ID.
The Google Blog Search API does not offer such an option (or at least we could not find one), so we needed another way to filter out incoming results that have the same URL. Our first idea was to apply a hash function such as MD5 or SHA1 to the URLs.
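To illustrate the hashing idea (which was not the approach the episode settled on), here is a minimal sketch in plain Ruby. It assumes each result is a hash with a `:url` key, and `filter_new_results` and the in-memory `Set` of digests are hypothetical stand-ins for whatever persistent store would hold digests between fetches. A fixed-length digest can be convenient as an indexed key when URLs are long.

```ruby
require 'digest/sha1'
require 'set'

# Hypothetical helper: keeps only results whose URL digest has not been seen.
# `seen_digests` is an in-memory Set here; a real app would need to persist it.
def filter_new_results(results, seen_digests)
  results.select do |result|
    digest = Digest::SHA1.hexdigest(result[:url])
    # Set#add? returns nil if the digest was already present, so repeats are dropped
    seen_digests.add?(digest)
  end
end

seen = Set.new
first_batch = [
  { :url => "http://example.com/a", :title => "A" },
  { :url => "http://example.com/b", :title => "B" }
]
second_batch = [
  { :url => "http://example.com/b", :title => "B again" },
  { :url => "http://example.com/c", :title => "C" }
]

p filter_new_results(first_batch, seen).map { |r| r[:url] }
# prints ["http://example.com/a", "http://example.com/b"]
p filter_new_results(second_batch, seen).map { |r| r[:url] }
# prints ["http://example.com/c"]
```

Duplicates are dropped both within a single batch and across batches, as long as the same `seen` set is reused.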
In the end, we decided to add a uniqueness constraint on the URL to our SearchResult model and let the database and ActiveRecord validations handle it.
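A minimal sketch of what such a constraint could look like, written in the Rails 2.x style of the episode's era. The `search_results` table name follows Rails conventions for the SearchResult model; the `url` column name and the `save_result` helper are assumptions, not code from the episode. This is a framework fragment and needs a Rails application to run.

```ruby
# Migration: a unique index on search_results.url makes the database
# reject any second row with the same URL.
class AddUniqueIndexToSearchResultsUrl < ActiveRecord::Migration
  def self.up
    add_index :search_results, :url, :unique => true
  end

  def self.down
    remove_index :search_results, :url
  end
end

# Model: the validation catches most duplicates before an INSERT is attempted.
class SearchResult < ActiveRecord::Base
  validates_uniqueness_of :url
end

# Hypothetical helper used while importing fetched results.
# Returns true if a new row was stored, false if the URL already existed.
def save_result(attributes)
  SearchResult.new(attributes).save
rescue ActiveRecord::StatementInvalid
  # The unique index rejected a duplicate that slipped past the validation
  # (for example, two imports running at once). Later Rails versions raise
  # ActiveRecord::RecordNotUnique, which is a subclass of StatementInvalid.
  false
end
```

The two layers complement each other: `validates_uniqueness_of` checks with a query before inserting, which leaves a small window for concurrent writers, and the unique index closes that window at the database level.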
See part 1 of Google Blog Search for how we fetch the results that need filtering.