Tuesday, 8 October 2013

Social network analysis of one's Facebook friends

This blog post is in English, as you may have noticed. It was written for a training session held at Södertörn University for Swedish media organizations.

Objective:
Create a network visualization of one's Facebook friends.

Requirements:
Gephi, Python, MongoDB, Google Chrome, Scraper, Google Account

0. Network visualization examples


First, here are some examples where network analysis has given insight into a story.
  1. Social network analysis of a shooter suspect's Facebook friends
  2. Probing the murky web of foreign ownership in Finnish firms
  3. Spotlight - Party of the True Finns and islamophobia
As you can see, network visualizations can be used in various ways. They can be used in the background, and also as an interactive end-user application.

1. Get your Facebook friends


Your own Facebook friends data is available through the Facebook API. To access your friends list through the API, browse to the address:
https://graph.facebook.com/me/friends?access_token={place_valid_access_token_here}

A valid access token can be acquired from Graph API Explorer. (Pic 1)

You can get your own Facebook friends data from Facebook API in JSON format fairly easily. (Pic 1)
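
The same request can be sketched in Python. The snippet below builds the URL shown above and pulls the names and ids out of a response body. The helper names and the sample payload are made up for illustration, and the response is assumed to be a JSON object with a `data` list of `name`/`id` entries. The network call itself is left out so that the parsing can be tried offline.

```python
import json
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com/me/friends"


def friends_url(access_token):
    """Build the friends-list URL for a given access token."""
    return GRAPH_URL + "?" + urlencode({"access_token": access_token})


def parse_friends(payload):
    """Return (id, name) pairs from a JSON response body."""
    document = json.loads(payload)
    return [(friend["id"], friend["name"]) for friend in document.get("data", [])]


# Hypothetical response body, shortened to two friends.
sample = '{"data": [{"name": "Alice Example", "id": "1001"}, {"name": "Bob Example", "id": "1002"}]}'

print(friends_url("TOKEN"))
print(parse_friends(sample))
```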

However, only your own Facebook friends data is available through the API. The friends data of any other profile is not. So if you want to make a social network analysis of a friend's Facebook friends, you need to take a little detour.

The detour is called screen scraping. Screen scraping is a method where we use the computer to read the data that we see on our screen. We do this because computers are much faster at it and make fewer mistakes. At its simplest, screen scraping is done on a single page, but it can be done on any number of pages.

A tool called Scraper is very good for this when we just need to scrape data from one or two pages. Scraper is a Google Chrome add-on. If you want to know more, see the tutorial by Jens Finnäs.

For scraping we will use Facebook's mobile interface because it has a much simpler layout. This is nice for screen scraping because there is less code to deal with.

The following steps allow you to screen scrape your own friends list (which is also available through the API).
  1. Go to your Friends list. It is located at https://m.facebook.com/{place_your_username_here}?v=friends
  2. First scroll down as many times as you need to get all of your friends to show on the screen. The Page Down key is quite handy for this. You may experience problems if you have more than 500 friends or so.
  3. Next, right-click one of your friends to open the context menu and select Scrape similar. You need to have the Scraper plugin installed and enabled. (Pic 2)
  4. After clicking Scrape similar you'll get a new window. On the left side you define what you are scraping for, and on the right side you see what you get. (Pic 3)
  5. To define what you are looking for you need to use either XPath or jQuery. The default selection is XPath, and that suits us. Enter the following code, without the quotes, into the available field: "//div[@class='_4mn c']/a" and click Scrape. The code says that we want the anchor elements (i.e. links) located directly under a <div> element whose class attribute value is "_4mn c". In the Columns section set XPath to "@href". That tells Scraper to fetch the href attribute from those anchors.
  6. Export the dataset to Google Docs and open the document in Google Docs. Remove the unnecessary columns B to E.
  7. Download the data as a .csv file to your computer: File -> Download as -> Comma separated values (.csv, current sheet)

(Pic 2)

(Pic 3)
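
To get a feel for what the XPath in step 5 selects, it can be tried on a small piece of markup with Python's standard library. The markup below is made up and much simpler than the real page, and the link formats in it are assumptions. Only the class name "_4mn c" and the expression itself come from the steps above.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup that mimics the structure the XPath targets.
SAMPLE = """
<html><body>
  <div class="_4mn c"><a href="/alice.example?fref=fr_tab">Alice Example</a></div>
  <div class="_4mn c"><a href="/profile.php?id=1002&amp;fref=fr_tab">Bob Example</a></div>
  <div class="other"><a href="/settings">Settings</a></div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Same expression as in step 5; ElementTree needs a leading "." for this search.
anchors = root.findall(".//div[@class='_4mn c']/a")

# Equivalent of setting the column XPath to "@href".
hrefs = [anchor.get("href") for anchor in anchors]
print(hrefs)
```

The link inside the div with another class is skipped, which is why the class filter matters.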

2. Create the connections between friends


Now we have a .csv file of all your friends. From this alone we could make a network visualization, but it would only show that all your friends are connected to you.
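
As a rough sketch, the .csv file could be read like this. It assumes one scraped link per row in the first column, no header row, and links that look like either "/username?..." or "/profile.php?id=...". These are guesses about the export, so check them against your own file. The function names and the file name in the comment are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit, parse_qs


def profile_key(href):
    """Turn a scraped link into a username or numeric id."""
    parts = urlsplit(href)
    if parts.path == "/profile.php":
        # Profiles without a username are assumed to carry ?id=<number>.
        return parse_qs(parts.query).get("id", [""])[0]
    return parts.path.strip("/")


def read_friends(csv_file):
    """Read the first column of each row and return the profile keys."""
    keys = []
    for row in csv.reader(csv_file):
        if row and row[0].strip():
            key = profile_key(row[0].strip())
            if key:
                keys.append(key)
    return keys


# Stand-in for open("friends.csv", newline=""); the file name is hypothetical.
sample = io.StringIO("/alice.example?fref=fr_tab\n/profile.php?id=1002&fref=fr_tab\n")
friends = read_friends(sample)
print(friends)

# With only this list, the network is a star: every friend linked to you.
star_edges = [("me", friend) for friend in friends]
print(star_edges)
```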

What we want to do is figure out how your friends are connected with each other. This can be accomplished via the Facebook API's Friend property, which allows us to ask whether two Facebook profiles are friends with each other. This is nice because you can do it for any two Facebook profiles regardless of their privacy settings.

The key point here is that one can't ask the Facebook API for all the friends of a Facebook profile, but one can ask, one pair at a time, whether two profiles are friends. So if you have a list of Facebook profiles, it is possible to build the network. And as you may have realized, we have that list because we screen scraped it. This also means that sometimes you can't get the profile list you desire due to a user's privacy settings. For example, users can hide their friends list from users they have not friended.
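
Asking about every pair once means the number of questions grows quickly with the size of the list. A small sketch with placeholder names shows both the pairs and the count. The function names are only for illustration.

```python
from itertools import combinations


def friend_pairs(profiles):
    """Return every unordered pair of profiles exactly once."""
    return combinations(profiles, 2)


def pair_count(n):
    """Number of pairwise checks needed for n profiles: n * (n - 1) / 2."""
    return n * (n - 1) // 2


profiles = ["alice", "bob", "carol", "dave"]  # placeholder names
print(list(friend_pairs(profiles)))
print(pair_count(len(profiles)))  # 6
print(pair_count(300))            # 44850
```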

But how do we use the Friend property? One can access it via Graph API Explorer, which we already used to acquire the access token.

With Graph API Explorer we could build the network by hand by entering the queries one by one. For example, the query:

"SELECT uid1 FROM friend WHERE uid2 = 635279474 AND uid1 = 732028610"

tells us whether these two user ids are friends with each other. But for any larger network this would be a pain in the ass because you would have to make hundreds and hundreds of queries. (Pic 4)

(Pic 4)
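
The query above follows a fixed pattern, so it can be generated for any pair of ids. The sketch below is not the script used in this post. The `/fql` endpoint with a `q` parameter and the rule that a non-empty `data` list means "friends" are assumptions, and all function names are made up.

```python
from urllib.parse import urlencode

QUERY_TEMPLATE = "SELECT uid1 FROM friend WHERE uid2 = {b} AND uid1 = {a}"


def friendship_query(uid_a, uid_b):
    """Build the query that checks whether two user ids are friends."""
    # int() guards against anything but numeric ids ending up in the query.
    return QUERY_TEMPLATE.format(a=int(uid_a), b=int(uid_b))


def fql_url(query, access_token):
    """Assumed endpoint: the query is sent to /fql in a q parameter."""
    params = urlencode({"q": query, "access_token": access_token})
    return "https://graph.facebook.com/fql?" + params


def are_friends(response):
    """A non-empty data list is taken to mean the two profiles are friends."""
    return bool(response.get("data"))


query = friendship_query(732028610, 635279474)
print(query)
print(fql_url(query, "TOKEN"))
```

Entering one such query per pair by hand is what makes larger networks impractical.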
Fortunately you don't have to do this by hand. I have written a Python script that does the work for us. Unfortunately, I'm quite sure that the script won't work out of the box for most users.

Download the script.

Place the script file somewhere you can easily access it and copy the .csv file to the same folder. Next open a terminal, browse to that folder and run:

python facebook_network.py {filename} ({use_existing = True|False})

for example:

python facebook_network.py teemo_tebest.csv False

When you are asked for an access token, get one from Graph API Explorer. You may face several problems while running the script. First, check that you have Python installed. Second, you need a running MongoDB. Third, you need at least two Python libraries installed: pyfacegraph and pymongo. (Pic 5)

The script needs to be run in a terminal environment. (Pic 5)
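
A quick way to see which of the libraries are missing is to ask Python whether it can find them. This sketch assumes the import names are the same as the library names given above. It does not check whether MongoDB is running.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Libraries named in the text; whether they are installed depends on your machine.
required = ["pymongo", "pyfacegraph"]
missing = missing_modules(required)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All required libraries were found.")
```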
The script runs for a while. First it fetches all the metadata available from the profiles. After that it forms the connections between the users. When all is done, the script outputs the data in a network file format.
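
To give an idea of what the last step produces, here is a minimal sketch that builds a small undirected GEXF document with the standard library. It is not the script's actual output code. The ids, labels and function name are placeholders, and real files usually carry more attributes per node.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(labels, edges):
    """Build a minimal undirected GEXF document.

    labels: dict mapping node id -> label
    edges:  iterable of (source id, target id) pairs
    """
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"mode": "static", "defaultedgetype": "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in labels.items():
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {"id": str(index), "source": str(source), "target": str(target)})
    return ET.ElementTree(gexf)


labels = {"1001": "Alice Example", "1002": "Bob Example", "1003": "Carol Example"}
edges = [("1001", "1002"), ("1002", "1003")]
tree = build_gexf(labels, edges)

# Print the document; tree.write("some_name.gexf", encoding="utf-8",
# xml_declaration=True) would save it to a file instead.
print(ET.tostring(tree.getroot(), encoding="unicode"))
```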

3. Visualize the data with Gephi


Now that we have the data in a network format (.gexf) we can visualize it with available tools. One great and free tool is called Gephi.

If you failed to create your own dataset, you can download an example dataset.

First open Gephi and use the basic file opening functionality to open the .gexf file created in the previous step. You'll see a pop-up window that gives you an overview of the data. After Gephi has loaded the data you'll see a jumble of data. (Pic 6)

The data is laid out randomly when the file is loaded for the first time. (Pic 6)

Without adjusting any settings we don't get much out of the data. We can zoom in and point at single nodes to see the edges leaving them, but that is about it.

What we need to do is define the layout algorithm, the node sizes and the node colors to get more out of the data.

Just to make it clear:
Node: a balloon in the picture, here one Facebook profile.
Edge: a connection between two nodes.

For social networks a layout algorithm called ForceAtlas2 is a proper choice, so we will use that. Check that your settings match the picture. (Pic 7)

ForceAtlas2 is great for social networks. Other available layout algorithms include, for example, Geo Layout. (Pic 7)
After you hit Run you'll see the magic happen in just a few seconds. The nodes settle into place based on the edges between them: nodes that have more connections with each other are pulled closer together, and nodes that have fewer connections with each other are pushed apart.

Next you want to define the colors and the node sizes. Before that you need to calculate some statistics, mainly Modularity and Eigenvector Centrality. Modularity divides the nodes into groups, and eigenvector centrality lets us define the node sizes. (Pic 8)

Statistics can be calculated from the right side. (Pic 8)
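
To make eigenvector centrality less of a black box: a node scores high when its neighbours score high. The sketch below computes it by simple power iteration on a tiny made-up graph. It illustrates the idea only and is not necessarily how Gephi computes or scales the values. Every neighbour must also appear as a key in the dictionary.

```python
def eigenvector_centrality(adjacency, iterations=100, tolerance=1e-9):
    """Power iteration on an undirected graph given as {node: set(neighbours)}."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Start from the previous score (an A + I shift) so that the iteration
        # also settles on bipartite graphs, then add the neighbours' scores.
        updated = dict(scores)
        for node, neighbours in adjacency.items():
            for neighbour in neighbours:
                updated[neighbour] += scores[node]
        norm = sum(value * value for value in updated.values()) ** 0.5
        updated = {node: value / norm for node, value in updated.items()}
        change = sum(abs(updated[node] - scores[node]) for node in adjacency)
        scores = updated
        if change < tolerance:
            break
    return scores


# A triangle (a, b, c) with one extra friend d who only knows a.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
centrality = eigenvector_centrality(adjacency)
for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 3))
```

Here a comes out on top, and d, whose only link is to a, comes out last.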
Now we can use these figures to define the colors. (Pic 9)

You can change the colors if you want. (Pic 9)
And the node sizes. (Pic 10)

You may define the min and the max node size. (Pic 10)
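
The idea behind the min and max size can be sketched as a mapping from the centrality values to a size range. A plain linear mapping is assumed here, and the scores and the default sizes are placeholders.

```python
def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each value into the range [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # All nodes equally central: give everyone the same mid-range size.
        return {node: (min_size + max_size) / 2 for node in values}
    span = high - low
    return {
        node: min_size + (value - low) / span * (max_size - min_size)
        for node, value in values.items()
    }


scores = {"alice": 0.0, "bob": 0.5, "carol": 1.0}  # placeholder centrality scores
sizes = scale_sizes(scores)
print(sizes)
```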
After doing so your network could look, for example, like this. (Pic 11)

The separate groups are shown in different colors defined by Modularity, and the node sizes tell how central each node is in the network. (Pic 11)
You can zoom into the groups and point at single nodes to see their edges. To enable the labels, click the small arrow icons below the network. (Pic 12)
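
Modularity is also a score: it measures how much more densely the nodes are connected inside their groups than you would expect by chance. Gephi searches for a good grouping on its own. The sketch below only scores a grouping that is given to it, using the standard modularity formula on a made-up graph of two triangles joined by one edge.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each edge listed once
    community: dict mapping node -> community label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # edges with both ends inside a community
    degree_sum = {}  # total degree of the nodes in a community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(round(modularity(edges, two_groups), 3))
print(round(modularity(edges, one_group), 3))
```

Splitting the graph into the two triangles scores clearly higher than putting everyone into one group, which scores zero.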

You may enable the node labels. (Pic 12)
You are all done. You may adjust your colors and node sizes. For example, you can try using the gender information to define the colors. You can also try other layout algorithms to see what is available.

See also Olli Parviainen's great slides about visualizing your Twitter network with Gephi. Slides 8 to 17 show you the main visualizing steps inside Gephi that we've also used here.

4. Summary


Using network visualizations to show the connections between your Facebook friends, or social media connections in general, is a great way to gain insight.

There are also several JavaScript tools available that enable you to publish your network online as an interactive visualization.
Network visualizations are not as familiar to the common user as bar charts or maps. People tend not to understand what they are about and call them odd spiderweb visualizations. From this perspective I am not that eager to publish them online. But for professionals like journalists and programmers, network visualizations can give really good insight into the data. Knowing how to do network visualization most often gives you a unique perspective that no one else has thought of.
