Social networks are becoming an increasingly intriguing source of analysis for interdisciplinary research, mixing and merging algebraic formalism with users’ activities. Interdisciplinary fields entirely dedicated to what media agencies have been doing since the late 2000s are flourishing and evolving, multiplying and virtually competing to offer the best tool, write the best script and deliver a complex and complete analysis of these 2.0 weapons that are changing the world, at least for the time being.
There is a conspicuous list of tools available for free, and many of them are very ‘flexible’; that is, they can be used for many purposes, from mapping social relations to social media activities to biological networks. As long as there is a network, there is a tool.
In the specific case of social networking sites analysis, the available tools have two weaknesses, which are going to become three unless something is done. They will be listed and then explained in greater detail.
- Lack of what could be defined as ‘data-platform integration’
- Lack of what could be defined as ‘mood integration’
With ‘data-platform integration’ this post refers to the possibility of integrating data coming from different platforms (Twitter, Facebook, LinkedIn, emails, blogs) and displaying it as one complex, heterogeneous network. To date, this has not been possible, especially due to the complexity and the nature of the ‘raw data’. What does it all mean? It is not too complex.
Social networking sites analysis relies on the possibility of studying relations (hyperlinks) established between the complex set of objects that constitute the World Wide Web, or an intense cobweb of URLs (nodes). Whether these URLs are users or videos only matters when these relations need to be extrapolated and studied. There are many ways of extrapolating data from social networking sites. Facebook has launched Graph Search; Twitter has its own tools to check who is doing what, which conversations are trendier (#) and which account (@) is more influential in specific geographical locations. Marketing specialists and advertisers can go a bit further and plan very well geo-targeted campaigns, but even the basic and general information offered for free by these tools is interesting from a sociological point of view. Universities and other organisations have developed their own tools, too many to list; some of them are open, some are exclusive to the developing organisation. In all cases, what is lacking is data integration: the possibility of investigating all tools and platforms together, in the same ‘place’, as part of one big network rather than as different, isolated platform-related networks. The reason is almost straightforward. Data is harvested differently on Twitter because the source is different, the variables are different and the typology of the data is different from Facebook data. Blogs and forums are yet another, completely different story. In the case of Facebook and emails, for example, the data can only be harvested in relation to an ego-network, or only based on what you, the user, follow, ‘friend’ (another Facebook revolution: friend has also become a verb) and communicate with (sender-receiver). This observation brings us to the next point.
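As a minimal sketch of what such data-platform integration could look like, the snippet below normalises relations harvested from different platforms into a single typed edge list, so the merged graph stays heterogeneous instead of being flattened into anonymous hyperlinks. All platform exports, field names and sample records here are hypothetical, invented purely for illustration:

```python
from collections import namedtuple

# A common schema: every platform-specific relation becomes one typed edge.
Edge = namedtuple("Edge", ["source", "target", "platform", "relation"])

def normalise(platform, records, source_key, target_key, relation):
    """Map raw, platform-specific records onto the shared Edge schema."""
    return [Edge(r[source_key], r[target_key], platform, relation)
            for r in records]

# Hypothetical raw exports, each with its own field names.
twitter_raw = [{"follower": "alice", "followed": "bob"}]
facebook_raw = [{"user_a": "bob", "user_b": "carol"}]
linkedin_raw = [{"member": "alice", "contact": "carol"}]

# One big network, with the originating platform kept as an edge attribute.
merged = (
    normalise("twitter", twitter_raw, "follower", "followed", "follows")
    + normalise("facebook", facebook_raw, "user_a", "user_b", "friend")
    + normalise("linkedin", linkedin_raw, "member", "contact", "connection")
)

for edge in merged:
    print(edge.source, "->", edge.target, f"[{edge.platform}:{edge.relation}]")
```

The hard part, of course, is not the merge itself but writing one `normalise` adapter per platform against data sources that were never designed to be compared.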
- Mood Integration
What is mood integration? The answer lies in a long list of questions, such as: how can we quantify a ‘like’ on Facebook and a ‘retweet’ on Twitter? How can we quantify, and even compare, a ‘repost’ and a ‘#’ on Twitter? The questions continue: if we apply traditional Social Network Analysis (SNA), what is the difference between a ‘friend’ on Facebook, a link between two bloggers, a follower on Twitter and a follower on Facebook? What about a connection on LinkedIn and a friend on Facebook?
There is no possible answer. Although we could reduce all of these ‘semiotics’ to an analysis of hyperlinks, the gathered data will be of a different nature; which means that, unless we manually input every URL we stumble upon into Gephi, there is no viable tool that allows a researcher to harvest Facebook, Twitter, blogs and forums, LinkedIn and emails and create a uniform database.
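One way to at least experiment with such a uniform database is to impose an explicit, admittedly arbitrary weighting on the different interaction types and export the result in the plain CSV edge-list layout that Gephi can import. The weights and interactions below are illustrative assumptions, not a validated equivalence; choosing those numbers is exactly the open problem described above:

```python
import csv
import io

# Illustrative, arbitrary weights: how much 'relational strength' each
# platform-specific gesture is assumed to carry. These numbers are the
# researcher's hypothesis, not a fact about the platforms.
WEIGHTS = {
    ("facebook", "like"): 1.0,
    ("twitter", "retweet"): 2.0,
    ("linkedin", "connection"): 3.0,
    ("facebook", "friend"): 3.0,
    ("twitter", "follow"): 1.5,
}

# Hypothetical interactions gathered across platforms.
interactions = [
    ("alice", "bob", "facebook", "like"),
    ("alice", "bob", "twitter", "retweet"),
    ("bob", "carol", "linkedin", "connection"),
]

# Write a weighted edge list with the Source/Target/Weight columns
# a Gephi CSV import expects, plus the provenance of each edge.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight", "Platform", "Relation"])
for src, tgt, platform, relation in interactions:
    writer.writerow([src, tgt, WEIGHTS[(platform, relation)], platform, relation])

print(buffer.getvalue())
```

The point of making the weights an explicit table is that the ‘mood integration’ assumption becomes visible and criticisable, rather than buried inside a tool.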
Another problem arises, especially when it comes to collecting, understanding and aggregating data on Twitter.
Twitter has the characteristic of allowing users to open an account and participate in discussions by creating hashtags, where a hashtag is nothing more than an issue around which a public is formed. Every user has only 140 characters at her or his disposal; in this very small space, links can be shared, opinions can be voiced and news can be given. Twitter is good for Android users, iPhone fanatics and people who could not care less about writing a blog post. Pictures are shared on Twitter; moods and ideas are displayed. So far we could argue that Facebook does the same, and LinkedIn has recently added the ‘status’ and ‘post’ features (along with other annoying applications). They all share the possibility of having their icon directly on the phone screen. The difference is that whereas on LinkedIn or on Facebook nobody would expect a sudden and continuous change of status, Twitter works the opposite way: nobody would expect anybody not to tweet constantly. What happens behind the scenes is that an enormous database of information is created, perhaps stored and available for retrieval. Already in 2010 Twitter highlighted the difficulty of dealing with such big data and launched FlockDB, a graph database able to keep up with the humongous amount of data produced by users at the wake of the Twitter era. What is even more problematic is that there is very little hope for any researcher who wants not only to access a graph database and get a grasp of big datasets; it is also a problem of historical analysis. Whether we extrapolate data and statistics using open-access software and tools, go through OAuth and similar mechanisms, or decide to buy reports through Twittercounter for only 29 USD per profile (http://twittercounter.com), we can only retrieve users’ data, and only one user at a time; good luck making connections between different users. The problem becomes even more complicated when the # (hashtags) are to be analysed.
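Even the first step, pulling the hashtags (#) and mentions (@) out of a tweet’s 140 characters so that issues and accounts can become nodes, is usually left to the researcher. A minimal sketch with Python’s standard regular-expression library; the patterns are an approximation, since real tweet grammar has more edge cases (punctuation, Unicode) than `\w+` covers:

```python
import re

# Simple patterns: a '#' or '@' followed by word characters.
# An approximation of Twitter's actual entity-extraction rules.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def parse_tweet(text):
    """Return the hashtags and mentions found in a single tweet."""
    return HASHTAG.findall(text), MENTION.findall(text)

tweet = "Reading @alice on network analysis #SNA #bigdata"
hashtags, mentions = parse_tweet(tweet)
print(hashtags)  # ['SNA', 'bigdata']
print(mentions)  # ['alice']
```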
It is possible to resort to the search APIs and obtain information on the most recent hashtags, but nothing close to a historical set of data that can show the evolution of a trend. The creation of such an application is rather urgent, especially if the hype and/or decline of an issue is not only to be described through a touch-and-feel approach and a set of hypotheses but also shown and integrated into a network graph.
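If a researcher did somehow accumulate timestamped tweets over a period, turning them into the historical trend line argued for here is the easy part. A sketch, with an invented archive; the substring match is a deliberate simplification (it would also match longer hashtags sharing the same prefix):

```python
from collections import Counter
from datetime import date

# Hypothetical archive of (date, tweet text) pairs collected over time.
archive = [
    (date(2013, 5, 1), "first thoughts on #SNA"),
    (date(2013, 5, 1), "more on #SNA and #bigdata"),
    (date(2013, 5, 2), "#bigdata everywhere"),
    (date(2013, 5, 2), "#bigdata again, #SNA fading"),
]

def trend(archive, hashtag):
    """Daily occurrence counts of a hashtag: the raw material of a trend line."""
    counts = Counter(day for day, text in archive if f"#{hashtag}" in text)
    return sorted(counts.items())

print(trend(archive, "SNA"))      # rise and fall of #SNA, day by day
print(trend(archive, "bigdata"))  # rise of #bigdata over the same days
```

The obstacle, as the paragraph above makes clear, is not this counting step but getting hold of the archive in the first place.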
Shall we go on hypothesising and hoping to receive a conspicuous grant that allows a researcher to spend 20,000 GBP on the acquisition of clean CSV reports from twittercounter.com, and about the same on a task force collating information coming from different @? Shall we still go on and hope that at some point a very interested and kind-hearted developer creates, free of charge, a hypothetical TwitHisto, an ideal Twitter app that allows us to trace the history of a #, and that, again, 20,000 GBP pop up from somewhere to aggregate all of the desired and needed # so as to gain some idea of how a trend rose, how it developed and how it situated itself in a broader Twitter (and non-Twitter) context?
These are questions that do not only pertain to the world of developers and the Twitter staff, who are always super-helpful to anybody willing to create a new application. They are also questions for academics, and especially researchers: those in charge of understanding how data can be managed, altered, displayed, displaced and integrated in a way that allows social research to shed old methodologies and describe a whole new world of coded information, lying in a virtual mundaneum that can be anywhere at any time. It is not only about what to do with a big, volatile and ever-changing database; it is also about how to make sure we have a grasp of it for innovative and cutting-edge research.