Episode 11 - The Dashboard, part 3 - removing duplicates

Released: Nov 09, 2009, Running time: 72 min

In this episode:

We remove duplicates from our Twitter search results by remembering the ID of the last result and only fetching tweets that have occurred since then.

Tags: cucumber, rspec, twitter

We get duplicates in our search results for two reasons:

  1. Running the same search again and again will return some of the results already fetched in the previous runs.
  2. Search terms can be defined more than once, and we get results for each of the search terms.

In this episode we deal with the first source of duplicates. To do this we add a column to our Search model that stores the ID of the last result we received, and we use it when querying Twitter so it won’t send us results we already have.
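The idea can be sketched in plain Ruby. This is only an illustration and not the code from the episode: the real Search is a Rails model, and the `last_tweet_id` attribute, the `fetch_new_results` method, and the `FAKE_FETCHER` stand-in for Twitter are all hypothetical names chosen for this example. The stand-in mimics a search that accepts a "since" ID and returns only newer tweets.

```ruby
# Plain-Ruby sketch of "remember the last ID, fetch only newer results".
# All names here are assumptions for illustration, not the Brandizzle code.
class Search
  attr_reader :term, :last_tweet_id

  # fetcher is any callable taking (term, since_id) and returning
  # an array of hashes that each have an :id key.
  def initialize(term, fetcher)
    @term = term
    @fetcher = fetcher
    @last_tweet_id = nil # nil means "nothing fetched yet"
  end

  # Asks the fetcher only for results newer than the last one seen,
  # then remembers the highest ID for the next run.
  def fetch_new_results
    results = @fetcher.call(term, last_tweet_id)
    newest_id = results.map { |tweet| tweet[:id] }.max
    @last_tweet_id = newest_id if newest_id
    results
  end
end

# In-memory stand-in for the Twitter search: returns tweets whose
# ID is greater than since_id, or everything when since_id is nil.
TWEETS = [
  { id: 1, text: "first" },
  { id: 2, text: "second" },
  { id: 3, text: "third" }
]

FAKE_FETCHER = lambda do |_term, since_id|
  TWEETS.select { |tweet| since_id.nil? || tweet[:id] > since_id }
end
```

With this sketch, the first call to `Search.new("brandizzle", FAKE_FETCHER).fetch_new_results` returns all three tweets, and a second call on the same object returns an empty array until a tweet with a higher ID appears. In a Rails model the remembered ID would live in the new database column rather than in an instance variable.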

Before we add the extra column, we refactor our search code in order to support the changes in logic we want to introduce.

Unfortunately, due to circumstances beyond our control, we don’t finish the algorithm in this episode, but we get to the point where only a couple of steps remain, and we explain those. To see the finished code, look at the Search model and the Search model spec in the Brandizzle project on GitHub.
