Transmissions from a new project on the data frontier:
Three days wandering landscape of one-billion-plus geolocated Tweets. Overstimulated. Devising computer-aided system to generate density fields. Will calm noise, help to answer important questions:
How does global conversation vary over time and space?
Windiest place – spot with most characters per Tweet – in world?
“Hella” tweeted more from downtown San Francisco or Telegraph Avenue in Berkeley?
Density estimation apparatus under development.
Inverse-squared approach consumes all available resources. Still running.
Kernel density looks blobby, oversmeared, gappy. Feels arbitrary, artificial.
Binning replaces fine data detail with cell structure. Like feeding Tweets through woodchipper, slopping pulp into buckets.
Frustrated. Radioed home base. Response drowned out by Retweet static. Will rest here.
Carrier pigeon arrived. Note suggests k-nearest neighbors algorithm. Wired together prototype with k-d tree, priority queues. Bushwhacked to vicinity of TwitterPlex, wiggled “nearest neighbor” knob from 1 to 10,000:
Conclusions: Twitter oatmeal is lumpy, people love to Tweet from airport. More as becomes apparent.
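Field sketch of the k-nearest-neighbors idea, for the record. Not the expedition's actual apparatus: a minimal Python illustration, assuming flat x/y coordinates (real longitude/latitude would want great-circle distance). Names build_kdtree, knn_distances, knn_density and the sample points are invented here. Density at a spot taken as k divided by (total points times area of the disc reaching the k-th nearest Tweet). Small k follows every lump, large k smooths.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Build a 2-D k-d tree as nested tuples: (point, left, right)."""
    if not points:
        return None
    axis = depth % 2  # alternate between splitting on x and on y
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (
        pts[mid],
        build_kdtree(pts[:mid], depth + 1),
        build_kdtree(pts[mid + 1:], depth + 1),
    )


def knn_distances(tree, query, k):
    """Return the sorted distances from `query` to its k nearest points."""
    # Bounded priority queue: heapq is a min-heap, so squared distances are
    # negated to keep the worst of the best-k candidates on top.
    heap = []

    def visit(node, depth):
        if node is None:
            return
        point, left, right = node
        d2 = (point[0] - query[0]) ** 2 + (point[1] - query[1]) ** 2
        if len(heap) < k:
            heapq.heappush(heap, -d2)
        elif d2 < -heap[0]:
            heapq.heapreplace(heap, -d2)
        axis = depth % 2
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near, depth + 1)
        # Cross the splitting line only if a closer point could hide behind it.
        if len(heap) < k or diff * diff < -heap[0]:
            visit(far, depth + 1)

    visit(tree, 0)
    return sorted(math.sqrt(-d) for d in heap)


def knn_density(tree, n_points, query, k):
    """k-NN density estimate: k / (n * area of disc out to the k-th neighbor)."""
    r_k = knn_distances(tree, query, k)[-1]
    if r_k == 0:
        return float("inf")  # k or more points stacked exactly on the query
    return k / (n_points * math.pi * r_k ** 2)


# Hypothetical sample: four points in a tight clump, one far-flung straggler.
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]
tree = build_kdtree(sample)
for k in (1, 3, 5):  # wiggle the "nearest neighbor" knob
    print(k, knn_density(tree, len(sample), (0.5, 0.5), k))
```

Sorting at every level of the tree build is wasteful but keeps the sketch short. At a billion points one would want a proper spatial index, not this.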
Trolls soon. Signing off for now, Steve.