The six formative years that I spent in the Southern U.S. gave me many things: a deep understanding of cockroaches, impeccable water skills, and a year-round tan. And, last but not least, the precious, lingering gift of the word y’all.
I could sing the praises of y’all ’til the end of time! Short for you all, it’s a simple, monosyllabic utterance that evokes lemonade on the veranda, strolls under oaks and Spanish moss, and warm, uncomplicated, friendly times. The essence of the South wrapped into four tidy letters and an apostrophe! How could you not love y’all, y’all?
Despite these feelings, my thoughts sometimes wander, and I find myself asking: could there be another such quirky little word buried in the Southern lexicon?
When such questions arise, I’m predisposed to throw algorithms at them, and I’m always on the lookout for an excuse to do some hard-core statistical data mining. So, as they say, game on! An urgent signal went out to my crack team of computer scientists, and at our first meeting, we formulated a slightly more scientific query:
Could we quantify the differences between Southerner and Yankee by analyzing the everyday communications of the average Joe?
Hell yeah! First, we defined the Northeast as New Jersey, New York, Maine, and everything in between, and the Deep South as Louisiana, Mississippi, Alabama, Georgia, and the Carolinas. Then we gathered our raw data, on sale at a discount, from the aisles of the Internet Dot Com, in the form of 4,000 random blog feeds from a major social networking site, tied to our regions via user profiles. After a bit of text extraction and some filtering to handle the degenerate cases (e.g., a post with a thousand repeats of “I love guinea pigs!”), we had a 5,000,000-word sample from the Yankees and another of similar size from the Southerners.
We fed these into the Corpusculator, a custom suite of text-analysis software. For several minutes it rumbled, as regional differences percolated and our bloggy inputs, in mutual opposition, slowly neutralized the smells of teen spirit.
Then, Eureka! Out popped two lists, one for North and one for South, each cataloging the words that appeared in excess relative to their frequencies in the other region.
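The Corpusculator’s internals aren’t described here, so for the curious, here is a minimal sketch of one common way to flag words that appear “in excess” in one corpus relative to another: compare each word’s smoothed relative frequency in the two samples and keep those whose ratio clears a threshold. Everything in it (the function names `tokenize` and `excess_words`, the `min_ratio` and `min_count` cutoffs, and add-one smoothing) is a hypothetical illustration, not the actual software.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes
    so that contractions like "y'all" survive as single tokens."""
    # Normalize curly apostrophes (U+2019) to straight ones first.
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z']+", text)


def excess_words(corpus_a, corpus_b, min_ratio=2.0, min_count=5):
    """Return (word, ratio) pairs for words over-represented in corpus_a
    relative to corpus_b, sorted by ratio, highest first.

    Hypothetical sketch: uses add-one (Laplace) smoothing so a word that
    never appears in corpus_b doesn't cause a division by zero."""
    counts_a = Counter(tokenize(corpus_a))
    counts_b = Counter(tokenize(corpus_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # Size of the combined vocabulary, used as the smoothing denominator term.
    vocab = len(set(counts_a) | set(counts_b))

    results = []
    for word, n_a in counts_a.items():
        if n_a < min_count:
            continue  # skip rare words, whose ratios are mostly noise
        freq_a = (n_a + 1) / (total_a + vocab)
        freq_b = (counts_b[word] + 1) / (total_b + vocab)
        ratio = freq_a / freq_b
        if ratio >= min_ratio:
            results.append((word, ratio))

    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    south = "Y’all come back now, y’all hear? lol y’all"
    north = "See you all this winter, you all"
    for word, ratio in excess_words(south, north, min_ratio=2.0, min_count=2):
        print(f"{word}: {ratio:.2f}")
```

On the toy input above, only y’all is frequent enough to pass `min_count=2`, and it comes out roughly 3.8 times more common in the “Southern” text. A real analysis on millions of words would more likely rank words with a significance test such as Dunning’s log-likelihood (G²) rather than a raw ratio, since ratios for rare words are unstable.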
Via the wondrous Wordle, I built a word cloud for each, and assembled them into a two-chapter novella that I call “A Tale Of Two Regional, Multi-State Areas.” Click on the picture below to see the whole thing, with the caveat that Northeasterners are quite fond of dropping the F-bomb, which appears prominently:
What we have here is two solid blocks of differential Zeitgeist, chock full of inter-regional revelations. Yankees refer more to summer and winter, probably because in Dixie the seasons are rarely more than a curiosity, whereas up north the difference between August and January is fundamental. Northerners tend to reference books, while the South seems more preoccupied with the doctor. Then there’s the aforementioned profanity, with Yankees partial to the F-word and my dear Southerners given to damn, frankly.
As for my precious y’all? Yup, there it is on the southern side. But a quick scan revealed that its would-be kissin’ cousins, the other quirky entries on the Dixie list, were all texting shorthand such as lol and omg. Color me disappointed, but I suppose that’s the price of progress, y’all!