Gavin Sheridan, innovation director of Storyful, blogs today about his attempts to put geocoded Twitter info to use in finding patterns that could be turned into stories.
He ran searches on Twitter and fed the results into Datasift. What you’re seeing in the map below are 135,000 tweets over this past weekend from users who mentioned GOP candidates by name. Green dots denote generally positive tweets. Red, negative.
That’s just a screen cap. The Google map itself is zoomable, of course.
One of the things Gavin learned, he says:
When it came to Rick Santorum, we found that more females appeared to be discussing him, while the opposite was the case with Ron Paul.
I think what Gavin means is that the 3.2 percent difference in these two red pie chart slices may be significant. Or not.
Now, perhaps this data is good and perhaps it’s not. The actual number of tweets he’s looking at is only 3,196. The reason, Gavin says, is that not everyone on Twitter uses geo-tagging.
Gavin writes:
The result was relatively small: only about 2.4 percent of those tweeting geo-tagged their tweets
And that’s the rub. It’s been way too long since I calculated margin of error. But I’d imagine the margin of error for 3,196 — if you’re hoping to have Twitter users stand in for, say, “likely voters in the U.S.” — might be more than 3.2 percent.
[In fact, if we want our sample to stand in for, say, 200 million voters in the U.S., the margin of error would be only 1.73 percentage points. So I stand corrected. Go knock yourself out playing with this new toy.]
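For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It rests on assumptions that are mine, not Gavin's: a simple random sample, a 95 percent confidence level (z = 1.96) and the worst-case 50/50 split. The function name margin_of_error is just an illustrative label.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Margin of error (as a proportion) for a simple random sample.

    Assumptions: 95 percent confidence (z = 1.96) and a worst-case
    split of p = 0.5 unless told otherwise.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction; negligible when the population
        # is far larger than the sample, as it is here.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error

# 3,196 geo-tagged tweets standing in for 200 million voters
moe = margin_of_error(3196, population=200_000_000)
print(f"Margin of error: {moe * 100:.2f} percentage points")

# Share of the 135,000 collected tweets that carried a geo-tag
geo_share = 3196 / 135_000
print(f"Geo-tagged share: {geo_share * 100:.1f} percent")
```

The population size barely moves the result. With a sample this small relative to 200 million, nearly all of the margin comes from the 3,196.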
In addition, the set of Twitter users nationwide who happen to geo-tag is not a random sample. So whatever data you get from this would not necessarily be fairly representative of, again, “likely voters.” Or any other group. Including run-of-the-mill Twitter users, for that matter.
The worst thing about using data like this, though: It identifies the tweeters. A little too closely for my tastes.
Those little green and red dots? If you zoom in close and click on an individual dot, you’ll be able to read each tweet in your database.
Not just see the tweet. But the identity of the user as well. (I’ve redacted this screencap, but you get the idea.)
Now, we all know Twitter is open to the public. Assuming you’ve not locked your tweets, anyone can read what you’re broadcasting. And, yes, other users can pull data on you and put it to various uses.
But what will your readers think when they find a story on your news web site, zoom in close on their neighborhood and find you’ve quoted them, identified them by name and then mapped their home addresses?
Quite a difference from the old days, when a reporter would stop someone at the mall and ask for their name and their opinion. At least those folks had a choice whether or not their opinion showed up in the newspaper. And we never ran maps guiding loons from the opposing political party to their doorstep.
If nothing else, Gavin’s idea of mixing geocoded Twitter data with Datasift is a powerful, powerful case for not using geo-tagging with Twitter.
So before you use a tool like this, stop and think. If there’s a chance your readers might not be amused to find themselves identified so prominently, then consider not posting a map like this.
We’re supposed to be the good guys, remember?
—
Thanks to Niketa Patel of CNNMoney for retweeting Gavin’s blog post this afternoon.






In our digital age, it strikes me as reasonable to accept a user’s publishing location information publicly as consent to have that information included in a project such as this.
That being said, abstracting the location data to the city level wouldn’t have hurt this visualization.
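
One rough way to do that kind of abstraction, sketched in Python: round each coordinate before plotting it. This is a stand-in for true city-level grouping, which would need a reverse-geocoding lookup. The function name coarsen and the sample point are hypothetical. One decimal place of latitude works out to roughly 11 km.

```python
def coarsen(lat, lon, places=1):
    """Round a coordinate pair so it no longer pinpoints an address.

    One decimal place is roughly 11 km of latitude. That is a crude
    substitute for snapping each tweet to its city.
    """
    return round(lat, places), round(lon, places)

# Hypothetical geo-tagged tweet location
precise = (40.712776, -74.005974)
blurred = coarsen(*precise)
print(blurred)
```

Every tweet inside the same rounded cell lands on the same dot, so zooming in no longer leads to anyone's front door.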