Perhaps the only thing more regrettable than drunk texting is drunk tweeting. Publicly broadcasting intoxication is definitely not the best way to bolster one’s social media clout, and yet a lot of people can’t resist boasting about their alcoholic escapades. Researchers have now trained an algorithm to spot alcohol-related tweets, and even to guess whether the tweeter was drinking at the time of posting.
Nabil Hossain at the University of Rochester, upstate New York, decided to combine Twitter and machine learning to keep track of alcohol use across a given community.
To do that, he and his team collected thousands of geotagged posts tweeted between July 2013 and July 2014 in New York state, and then winnowed them down to tweets containing booze-related keywords (ranging from “beer keg” to “shitfaced”).
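The keyword-filtering step can be illustrated with a minimal sketch. It assumes tweets are plain dictionaries with a `text` field; apart from “beer keg” and “shitfaced,” the keyword list and the function names are hypothetical and not taken from the paper.

```python
import re

# Illustrative keyword list. Only "beer keg" and "shitfaced" are mentioned
# in the article; the other entries are assumptions for the example.
ALCOHOL_KEYWORDS = ["beer keg", "shitfaced", "drunk", "wine"]

# Word boundaries keep "wine" from matching inside "swine".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in ALCOHOL_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def mentions_alcohol(text):
    """Return True if the text contains any of the keywords."""
    return _PATTERN.search(text) is not None

def filter_tweets(tweets):
    """Keep only tweets (dicts with a "text" key) that mention a keyword."""
    return [t for t in tweets if mentions_alcohol(t["text"])]
```

A filter like this only narrows the pool. It cannot tell whether a tweet is really about drinking, which is why the human labeling step comes next.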
They later churned about 11,000 tweets through Amazon’s Mechanical Turk crowdsourcing service. Each tweet passed through three human “Turkers,” who were asked three questions:
Q1: Does the tweet make any reference to drinking alcoholic beverages?
Q2: If so, is the tweet about the tweeter himself or herself drinking alcoholic beverages?
Q3: If so, is it likely that the tweet was sent at the time and place the tweeter was drinking alcoholic beverages?
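One plausible way to turn three Turkers’ answers into a single label per question is a simple majority vote, sketched below. The article does not say how the consensus was computed, so the majority rule, the dictionary layout, and the function names are all assumptions.

```python
def majority(answers):
    """True if more than half of the yes/no answers are yes."""
    return sum(answers) > len(answers) / 2

def consensus_labels(votes):
    """Combine three Turkers' answers into one label per question.

    votes: list of dicts such as {"q1": True, "q2": True, "q3": False}.
    The questions are nested, so a later question can only be "yes"
    when the earlier ones were (assumed rule, not from the paper).
    """
    q1 = majority([v["q1"] for v in votes])
    q2 = q1 and majority([v["q2"] for v in votes])
    q3 = q2 and majority([v["q3"] for v in votes])
    return {"q1": q1, "q2": q2, "q3": q3}
```

With three voters and yes/no answers there is never a tie, which is one practical reason to use an odd number of annotators.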
Hossain used the answers to train three separate classifiers, linear support vector machines (SVMs), each answering one of the questions. The SVMs were trained on 80 percent of the Turk-labeled tweets and then tested on the remaining 20 percent.
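To give a feel for what training a linear SVM on an 80/20 split involves, here is a toy sketch in plain Python. It is not the researchers’ implementation: the bag-of-words features, the whitespace tokenizer, the omission of a bias term, and every hyperparameter are simplifying assumptions, and a real project would more likely use an established library.

```python
import random

def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()

def split_train_test(examples, train_fraction=0.8, seed=0):
    """Shuffle labeled examples and split them, 80/20 by default."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01, seed=0):
    """Fit one weight per word by stochastic subgradient descent on the
    L2-regularized hinge loss. Labels are +1 (yes) or -1 (no)."""
    rng = random.Random(seed)
    weights = {}
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for text, label in data:
            words = tokenize(text)
            margin = label * sum(weights.get(w, 0.0) for w in words)
            # Regularization step: shrink every weight slightly.
            for w in weights:
                weights[w] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin move weights.
            if margin < 1.0:
                for w in words:
                    weights[w] = weights.get(w, 0.0) + lr * label
    return weights

def predict(weights, text):
    """Return +1 if the weighted word score is positive, else -1."""
    score = sum(weights.get(w, 0.0) for w in tokenize(text))
    return 1 if score > 0 else -1
```

In the setup the article describes, a model like this would be trained three times, once per question, each time with that question’s consensus answers as the labels.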
The success rate—that is, the rate at which the machines’ answers matched the Turkers’ consensus—ranged from 92 percent for the algorithm answering Q1, to 82 percent for the drunk-spotting algorithm answering Q3.
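The success rate described here is simply the fraction of test tweets on which the machine and the human consensus agree. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def agreement_rate(predicted, consensus):
    """Fraction of items where the model's answer equals the human label."""
    if len(predicted) != len(consensus):
        raise ValueError("predicted and consensus must have the same length")
    if not predicted:
        raise ValueError("need at least one item to compute a rate")
    matches = sum(p == c for p, c in zip(predicted, consensus))
    return matches / len(predicted)
```

For instance, matching the consensus on four out of five tweets gives a rate of 0.8, or 80 percent.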
The team then took things a step further, trying to make out whether the drunk-tweeters were at home or somewhere else when posting. To pin down a Twitter-geolocated place as somebody’s home, they put together a list of words people are likely to use when they are home (like “bath,” “sofa,” “TV,” “sleep,” and “home”) and filtered thousands of tweets accordingly.
Again, they asked Turkers to establish whether the tweets had been sent from somebody’s home—and then honed the resulting dataset with other information such as the location of the last tweet of the day. In this way, they created another algorithm they claim can determine a user’s home location with an accuracy of up to 80 percent.
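One of the signals mentioned above, the location of the last tweet of the day, can be turned into a crude home-location guess on its own. The sketch below is only that single heuristic, not the team’s classifier; the tweet layout (`time`, `lat`, `lon`), the grid precision, and the function names are assumptions.

```python
from collections import Counter
from datetime import datetime

def last_tweet_per_day(tweets):
    """Return each calendar day's latest tweet.

    tweets: list of dicts with "time" (datetime), "lat" and "lon".
    """
    latest = {}
    for t in tweets:
        day = t["time"].date()
        if day not in latest or t["time"] > latest[day]["time"]:
            latest[day] = t
    return list(latest.values())

def guess_home(tweets, precision=3):
    """Guess home as the most frequent rounded location among the
    last tweets of each day; returns (lat, lon) or None."""
    cells = Counter(
        (round(t["lat"], precision), round(t["lon"], precision))
        for t in last_tweet_per_day(tweets)
    )
    return cells.most_common(1)[0][0] if cells else None
```

A heuristic like this misfires for people who tweet after midnight or rarely tweet from home, which helps explain why a single signal would not be enough on its own.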
Combining the two results, the team could tell whether New York State residents preferred to drink at home, in a club, or somewhere else. This highlighted some interesting patterns, like the fact that most people in New York City tend to drink at home or close to where they live (probably because there’s a bar on every block in NYC), unlike folks in the suburbs. The researchers also think the technique could be used to spot areas where alcohol consumption is more widespread.
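Once a drinking tweet and a home location each have coordinates, comparing them comes down to a distance calculation. The sketch below uses the standard haversine formula; the 100-metre “at home” radius and the function names are illustrative assumptions, not figures from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def drinking_setting(tweet_loc, home_loc, home_radius_km=0.1):
    """Label a drinking tweet "home" if it was sent within the assumed
    radius of the user's inferred home, otherwise "elsewhere"."""
    distance = haversine_km(*tweet_loc, *home_loc)
    return "home" if distance <= home_radius_km else "elsewhere"
```

Keeping the raw distance rather than only the label would also allow the “close to where they live” comparison the article mentions for New York City.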
While Twitter is not an ideal source for a representative dataset (its users skew younger and toward certain minority groups), Hossain plans to use this research as a starting point for making sense of alcohol use.
“Our future work will perform a comprehensive study of alcohol consumption in social media around features such as user demographics, settings people go to drink-and-tweet,” the paper reads. “We can explore the social network of drinkers to find out how social interactions and peer pressure in social media influence the tendency to reference drinking.”
This post originated on Ars Technica UK