In just under seven years, Twitter has grown to count nearly 3% of the entire global population among its active users, who have sent more than 170 billion 140-character messages. Today the service plays such a significant role in American culture that the Library of Congress has assembled a permanent archive of the site back to its first tweet, updated daily. With its open API, Twitter has become one of the most popular data sources for social research, yet the majority of the literature has treated it as a source of text or network graphs, with only limited efforts to date focusing exclusively on the geography of Twitter and assessing the various sources of geographic information on the service and their accuracy. More than 3% of all tweets are found to have native location information available, while a naive geocoder based on a simple gazetteer of major cities and relying on the user-provided Location and Profile fields is able to geolocate more than a third of all tweets with high accuracy when measured against the GPS-based baseline. Geographic proximity is found to play a minimal role both in whom users communicate with and in what they communicate about, providing evidence that social media is shifting the communicative landscape.
Authors: Kalev Leetaru, Shaowen Wang, Guofeng Cao, Anand Padmanabhan, Eric Shook
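
To make the idea of a naive gazetteer-based geocoder concrete, the following is a minimal sketch and not the implementation used in the study. It assumes a tiny illustrative gazetteer (the `GAZETTEER` dictionary, its five cities, and their rounded coordinates are placeholders), and the function names `match_city` and `geocode_user` are hypothetical. It checks the user-provided Location field first and falls back to the Profile field, matching city names as whole words.

```python
import re
from typing import Optional, Tuple

# Hypothetical toy gazetteer: lowercase city name -> (latitude, longitude).
# The study's actual gazetteer is not reproduced here; coordinates are rounded.
GAZETTEER = {
    "new york": (40.71, -74.01),
    "london": (51.51, -0.13),
    "tokyo": (35.68, 139.69),
    "sao paulo": (-23.55, -46.63),
    "jakarta": (-6.21, 106.85),
}

# Compile one whole-word pattern per city, longest names first so that
# multi-word names are tried before any shorter name.
_PATTERNS = [
    (name, re.compile(r"\b" + re.escape(name) + r"\b"))
    for name in sorted(GAZETTEER, key=len, reverse=True)
]


def match_city(text: Optional[str]) -> Optional[str]:
    """Return the first gazetteer city mentioned in the text, or None."""
    if not text:
        return None
    # Lowercase and collapse runs of whitespace before matching.
    normalized = re.sub(r"\s+", " ", text.lower())
    for name, pattern in _PATTERNS:
        if pattern.search(normalized):
            return name
    return None


def geocode_user(
    location: Optional[str], profile: Optional[str]
) -> Optional[Tuple[str, float, float]]:
    """Try the Location field first, then the Profile field."""
    for field in (location, profile):
        city = match_city(field)
        if city is not None:
            lat, lon = GAZETTEER[city]
            return city, lat, lon
    return None
```

Under these assumptions, `geocode_user("London, UK", "")` resolves to the London entry, while a Location value such as "Londonderry" is left unmatched because only whole-word matches count. A realistic gazetteer would also need to handle alternate spellings and ambiguous city names, which this sketch ignores.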