O'Malley Library: DAsH: Using Gephi to do Network Analysis

What is Gephi?

Gephi is an open-source data visualization program that is especially useful for creating network analysis graphs. You can upload CSV files listing the connections (edges) between data points (nodes), along with attribute sheets describing those nodes. The data can be visualized in different layouts and with different symbol and color options. Gephi also includes analysis tools that calculate measures for each node and for the graph as a whole. It is more complex and less user-friendly than simpler graphing tools, but it offers many customization and analysis options, so for complex subjects the trial and error will likely be worth it.

Learning Goals

Gephi is a more sophisticated tool for network graphing. It gives you more customization options, and you can export the information it produces. If you want to do more than visualize, and are interested in what your graph reveals (for example, the density of your graph compared with other graphs, or the centrality of a given node), you can also calculate and export metrics for the graph as a whole and for individual nodes.

In this module, you'll graph the interactions between characters in A Midsummer Night's Dream by importing the CSV files below into Gephi and visualizing the results. You'll add labels and change the colors and sizes to best display your data.

Data

The CSV files below come from the lesson on how to create structured data from unstructured text. If you already created that sheet and want to reuse your data, save each sheet as its own CSV file. In the Links file, rename the first two column headings from Stage Mate 1 and Stage Mate 2 to Source and Target. In the Nodes file, rename the Node heading to Id.
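If you'd rather rename the headings programmatically than by hand, the Python sketch below shows one way to do it with only the standard library's csv module. The rename mappings come from the instructions above; the function name `rename_headers` and the sample rows are hypothetical, chosen only for illustration.

```python
import csv
import io

# Heading changes described above: Source/Target for the edges (links) table,
# Id for the nodes table.
EDGE_RENAMES = {"Stage Mate 1": "Source", "Stage Mate 2": "Target"}
NODE_RENAMES = {"Node": "Id"}


def rename_headers(csv_text, renames):
    """Return csv_text with its first (header) row renamed via `renames`.

    Columns not listed in `renames` keep their original names, and the
    data rows are copied through unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample rows, not the actual lesson spreadsheet.
links = "Stage Mate 1,Stage Mate 2,Act,Scene\nPuck,Oberon,II,1\n"
print(rename_headers(links, EDGE_RENAMES))
```

To work on your real files, you could read each one with `open(path, newline="", encoding="utf-8")`, pass its contents through `rename_headers`, and write the result to a new file so the original stays intact.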

Getting Started

If you haven't already done so, download Gephi from its website. This tutorial uses version 0.10.
Installation is straightforward, but check first that your computer meets Gephi's system requirements. Unfortunately, there is no online version, so if your computer doesn't have the required disk space or RAM, you'll need to use a different tool.
Once installation is finished, find Gephi in your Start menu and launch the program.
In the welcome window that appears, click New Project.
Gephi has three main windows. The first, Overview, displays a graph of the uploaded data and lets you customize colors and graphics, calculate statistics and apply filters. The second, Data Laboratory, holds the tables of nodes and links you upload, so go there if you need to inspect or modify the data. The last, Preview, offers additional visual customization of the graph. I won't use it much in this module because, in my opinion, the Overview window offers better customization options, but feel free to experiment with it.

Importing your Spreadsheets

A convenient feature of Gephi is that a single window can hold multiple workspaces, so if you want to look at a set of data in several different ways, you can open a separate workspace for each. We'll first import the A Midsummer Night's Dream spreadsheets for your nodes and links.

File > Import Spreadsheet
Start with the links spreadsheet. Gephi should automatically detect that the columns are separated by commas. Choose to import it as an Edges table (edges is Gephi's term for the connections between items).
When you click Finish, an Import Report appears, telling you that there are parallel edges and asking you to pick a strategy for handling them. Parallel edges are multiple edges between the same pair of nodes. You'll notice them quickly if you look through your sheet: Puck and Oberon share the stage in several scenes, and are linked multiple times within the same scene. Gephi can't draw these as separate parallel lines, so it needs to know how to handle them. In this case, select Don't merge. You won't be able to see every line between a pair of nodes, because they are drawn on top of each other, but each connection keeps its own row, so you will be able to filter the graph to display only certain acts.
Also check that Self-loops are allowed, that the Graph Type is set to Mixed, and that the new sheet will be appended to your existing workspace.
Click OK; the window will close and a graph will appear in the Overview.
Switch to the Data Laboratory tab of your current workspace. Scrolling through it shows that the edges have been imported along with additional information, such as the act and scene of each connection. However, if you click the Nodes tab, there is no additional information about your nodes. That's because we've only imported the links; next, we'll import the nodes sheet.
Go to File > Import Spreadsheet, select your nodes spreadsheet, Midsummer Night's Dream - Nodes, and click Open. In the options, make sure you have chosen to import it as a Nodes table.
In the Import Report that appears, choose to append it to the existing workspace.
Click OK. The Nodes section of the Data Laboratory now contains the additional character information (group and gender) from the nodes table.

Calculating Metrics

In addition to visualizing how nodes are connected, Gephi can calculate metrics describing how well connected each node is to the others and how dense the graph is as a whole. Gephi offers many metrics; to start, we'll look at degree and centrality.

Degree tells you how many connections (edges) each node has. Centrality measures how much a node helps connect other nodes. In A Midsummer Night's Dream, for example, Bottom shares scenes with characters from many different groups, so he is more central than someone like Theseus, who appears almost only in scenes with other Royals.
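To make degree concrete, here is a minimal Python sketch that counts degree from an edge list, assuming each edge is a (source, target) pair and treating every edge as undirected. Because parallel edges are kept (as with Don't merge), two characters who share the stage twice add two to each other's degree. The function names and the toy edge list are hypothetical, and Gephi's own Average Degree figure may be computed differently for directed or mixed graphs.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node.

    Edges are treated as undirected, and parallel edges are counted
    separately, so repeated stage appearances raise the degree.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1  # a self-loop therefore adds 2 to one node
    return degree


def average_degree(degree):
    """Mean degree over all nodes that appear in at least one edge."""
    return sum(degree.values()) / len(degree)


# Toy edge list (hypothetical): A and B share the stage twice.
edges = [("A", "B"), ("A", "B"), ("B", "C")]
deg = degrees(edges)
print(dict(deg))            # {'A': 2, 'B': 3, 'C': 1}
print(average_degree(deg))  # 2.0
```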

On the side of your Overview screen, click on the tab that says Statistics. This will show you the different metrics that you can calculate for your graph.
First, click Run next to Average Degree. The Statistics panel will show the average degree for your graph. Switch to the Data Laboratory for this workspace: the nodes table now stores the degree of each node.
Click Run next to Network Diameter. This computes the shortest paths between every pair of nodes and adds Betweenness Centrality, Closeness Centrality and Eccentricity columns to the Data Laboratory, which means you can use these measures in your graph. Leave Directed checked in this case.

To talk about paths between nodes more informally for a moment, think about the party game Six Degrees of Kevin Bacon, in which you try to connect Kevin Bacon to any other actor through movies they have appeared in together. For example, to connect Kevin Bacon and Daniel Kaluuya, you could say that Kevin Bacon is in X-Men: First Class with Caleb Landry Jones, who is in Get Out with Daniel Kaluuya.

If there were a network graph of Kevin Bacon's movies, an actor with high betweenness centrality in that graph would be one who very often appears on the path connecting Kevin Bacon to other actors. One example might be Laurence Fishburne, who is in Mystic River with Kevin Bacon and who also appeared in Apocalypse Now with Marlon Brando, who in turn appeared opposite many other prolific actors from the 1950s through the 1990s.
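Betweenness can be computed by counting how many shortest paths pass through each node. The sketch below uses Brandes' algorithm, a standard method for this, on an unweighted, undirected graph with parallel edges collapsed; it does not establish how Gephi normalizes scores or handles directed and parallel edges. The toy graph (two triangles joined by a bridge node x, like an actor who links two otherwise separate casts) and the function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its distinct neighbours (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []                     # nodes in order of discovery
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair is counted once from each end.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy graph: triangle a-b-c, triangle d-e-f, bridge node x.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"), ("x", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
print(betweenness(build_adjacency(edges)))
```

In this toy graph x scores 9.0, since all nine paths between the two triangles run through it, c and d score 8.0, and the remaining nodes score 0.0.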

An actor with high closeness centrality would be one who is not only in a movie with Kevin Bacon, but is also in many other movies with actors who have been in movies with Kevin Bacon, so the average distance between that actor and any other actor in the network is very small. Someone like Tom Cruise, who is in A Few Good Men with Kevin Bacon, might fit: the two got their start at around the same time and in some of the same genres, so they would plausibly have appeared in movies with many of the same people.

Eccentricity is the number of steps (hops from node to node) it takes to get from a given node to the node farthest away from it. In the Kevin Bacon game, a debut actor who has appeared in few movies other than one with Kevin Bacon would have high eccentricity. An actor who has appeared in many movies with large casts, whose members were themselves in many other movies, would have low eccentricity.
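Closeness and eccentricity both come from the shortest-path distances that a breadth-first search produces. The sketch below assumes an unweighted, undirected graph and defines closeness as the reciprocal of a node's average distance to the nodes it can reach, which is one common definition; Gephi's exact formula may differ. The function names and the toy chain are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its distinct neighbours (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Number of hops from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentricity_and_closeness(adj):
    """Return {node: (eccentricity, closeness)} for every node."""
    result = {}
    for node in adj:
        others = [d for other, d in bfs_distances(adj, node).items()
                  if other != node]
        eccentricity = max(others, default=0)   # hops to the farthest node
        closeness = len(others) / sum(others) if others else 0.0
        result[node] = (eccentricity, closeness)
    return result


# Hypothetical chain a - b - c - d: the end nodes are the most "eccentric".
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "d")])
for node, (ecc, clo) in eccentricity_and_closeness(adj).items():
    print(node, ecc, round(clo, 2))
```

For the chain, the end nodes a and d get eccentricity 3 and closeness 0.5, while the middle nodes b and c get eccentricity 2 and closeness 0.75.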

After Network Diameter has run, each node's Betweenness Centrality, Closeness Centrality and Eccentricity are stored in the Data Laboratory, so we can visualize those measures on the graph of our data, the characters of A Midsummer Night's Dream, and better see what they mean.

Changing Node Attributes

In the Overview tab, you can use the Appearance section to change what your graph looks like from its original default configuration.

In the Nodes section, with the color palette icon selected, click Partition and choose group. Gephi assigns a default color to each group. Click any of the color squares to choose a different color. I've chosen lavender for the Royals, pink for the Fairies and brown for the Mechanicals.
When you are finished, select Apply and the graph will change so that the nodes are the color of their corresponding group.
If you hover the cursor over a node, Gephi highlights its connections by dimming the nodes that are not connected to it. Browse the graph this way. When I do this, it looks like the purple nodes (the Royals) are mostly connected to other purple nodes and to the brown ones, which represent the Mechanicals (the actors who put on a play for the Royals), and that the pink nodes representing the Fairies are less connected to other groups. But without labels, we can't tell which character each dot represents.
Next, let's add labels so we can see which node represents which character. Go to the Data Laboratory and click Copy data to other column. Choose Id, since that column holds the character names, and choose to copy it to Label. This copies the character names into the Label column.
Next, go back to the Overview window and click the black T icon at the bottom to turn on node labels. Adjust the slider on the right until the labels are large enough to read but not overwhelmingly large. Now when you hover over a node, you can see the names of the characters it is connected to.
Click the next option in Nodes, the icon with three interlocked rings; this menu changes node size. Click Ranking and choose Degree. Set Min size to 5 and Max size to 20, then click Apply. The nodes are now sized by how many links are attached to them: the Royals and Bottom are the largest, followed by the other Mechanicals.
To see how degree measures something different from centrality, use the dropdown to rank the nodes by Betweenness Centrality instead. Node size now reflects how important a node is for connecting different clusters of nodes, rather than how many connections it has. In this graph Bottom's node is the largest, meaning this character has high betweenness centrality, though Oberon and Puck are on the larger side too.
Try other measures: what do they tell you about how this network is structured? When you're done, switch back to Degree.
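The Ranking option maps each node's value onto the range between Min size and Max size. The sketch below assumes a straight linear mapping, which may not match every Gephi interpolation setting; the function name and the toy degrees are hypothetical.

```python
def ranking_sizes(values, min_size=5.0, max_size=20.0):
    """Linearly rescale node values (e.g. degree) to node sizes.

    The smallest value gets min_size and the largest gets max_size.
    """
    low, high = min(values.values()), max(values.values())
    if high == low:
        # Every node has the same value, so give them all the minimum size.
        return dict.fromkeys(values, min_size)
    span = max_size - min_size
    return {node: min_size + (v - low) / (high - low) * span
            for node, v in values.items()}


# Hypothetical degrees for three nodes.
print(ranking_sizes({"A": 1, "B": 3, "C": 5}))  # {'A': 5.0, 'B': 12.5, 'C': 20.0}
```

Switching the ranked column from Degree to Betweenness Centrality only changes `values`; the mapping onto sizes 5 to 20 stays the same, which is why node sizes shift when you change the measure.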

Changing Layouts

The current layout is essentially a default, but you can try different layouts to see what they reveal about your data. These layouts involve an element of randomness, so yours may look slightly different from mine.

In the Layout window of the Overview tab, choose Force Atlas 2, click Run, and then click Stop once the graph has stopped moving. Instead of the groups being mixed together, this layout pulls the more closely connected nodes together.
Now try Fruchterman Reingold and notice how differently it arranges the nodes.
You can also adjust a layout's properties more finely. For instance, if you switch to Yifan Hu Proportional, the nodes cluster together to an almost unreadable extent. But if you change the Optimal Distance value from 100 to 500 and run the layout again, the edges become longer and the graph spreads out.
Leave it on this layout for now.
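Layouts such as Fruchterman Reingold are force-directed: every pair of nodes pushes apart, every edge pulls its endpoints together, and nodes move in shrinking steps until the picture settles. The sketch below is a simplified version of the textbook Fruchterman-Reingold algorithm, not Gephi's implementation, which has its own parameters; the `area`, the cooling rate and the toy graph are assumptions. The random starting positions are one reason your layout can differ from mine, and fixing `seed` makes a run repeatable.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=100, area=10000.0, seed=0):
    """Return {node: [x, y]} after a simple force-directed layout."""
    rng = random.Random(seed)            # random starting positions
    side = math.sqrt(area)
    pos = {n: [rng.uniform(0, side), rng.uniform(0, side)] for n in nodes}
    k = math.sqrt(area / len(nodes))     # ideal edge length
    temperature = side / 10              # maximum step per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 0.01
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction: each edge pulls its endpoints together with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 0.01
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
        temperature *= 0.95              # cool down so the layout settles
    return pos


# Hypothetical toy graph: a triangle a-b-c plus an isolated node d.
layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "a")])
for node, (x, y) in layout.items():
    print(node, round(x, 1), round(y, 1))
```

Settings like Gephi's Optimal Distance play a role similar to `k` here: a larger ideal edge length spreads connected nodes further apart.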

Saving and Exporting Your Graph

You can save your workspace at any time by using File > Save.
If you want to save a picture of the whole graph, click the camera icon at the bottom of the Overview window; Gephi will ask where to save the PNG file. To choose a specific image size, first open the dropdown next to the camera icon and select Configure, set your size, then click the camera icon, name your file and choose where to save it.

To see how I set up my graph and how my screenshot came out, see the files below. You'll need to extract the Gephi file from the zip file it is in.

Learning Goals

In the last exercise, we created a simple graph in Gephi. Next, we'll explore additional tools Gephi offers for seeing how different nodes are connected, as well as filters that narrow the graph to particular sets of nodes and edges. We'll continue from the graph built in the last section of this lesson; you can use the Gephi file in the zip file below.

MidsummerNightsDream1.zip

Getting Started

Go to where you downloaded the zip file above, extract the Gephi file inside it, MidsummerNightsDream1.gephi, and open it in Gephi.
Take a look at the Data Laboratory and Overview windows to acquaint or reacquaint yourself with what's in this workspace. The data records when characters in A Midsummer Night's Dream are on stage together.
In the Overview window, these relationships are graphed with node color showing each character's group and node size determined by degree: how many connections the node has to other nodes in the graph, i.e. how many times the character is onstage with other characters in the play.

Using Additional Tools

In the last section we covered how to turn your data into a graph. Beyond displaying the relationships between nodes visually, Gephi has a couple of other tools that show other facets of the network in your data. They are most useful for larger, less straightforward network graphs.

Heat Map. To see how closely other nodes are connected to a particular node, use the Heat Map tool. It recolors the other nodes according to how near or far they are from your chosen source node, measured in the number of steps needed to reach them.
- Click the Heat Map icon on the left side of the Overview window. It is the last icon in the column and shows the label Heat Map when you hover over it.
- The bar at the top of the window changes to prompt you to click a node. It also has a dropdown for displaying the colors as a palette or a gradient. The default should be gradient, with a series of arrows that control the colors at either end of the gradient and where the midpoint sits. Click the far-right arrow and choose a dark blue. This makes it easier to see on the graph which nodes are more or less easily reachable from your source node.
- Next, click your source node. Choosing an outlying node shows the connections, or lack of them, more clearly, so I've picked Egeus. The node colors change: some become blue-purple, some lavender, and some orange. Your source node takes the color at the far left of the scale.
- Use the gradient bar above to read your graph's new color scheme. The closer a node's color is to the left end of the gradient, the fewer steps separate it from your source node; the closer it is to the right end, the more steps there are. A display at the top also tells you the maximum distance between your source node and the farthest node, in this case 4 steps.
- The orange nodes are only one step away from Egeus, your source node, the lighter orange ones are the next ring out, and the blue-purple ones are the furthest out. At a glance it's hard to tell which of the Fairies is the furthest away (that maximum of 4 steps), but the next tool lets you see the paths between nodes that are not directly connected.
- To prepare for it, first reset the nodes to their original colors: in the Appearance panel, select Nodes, then the color palette, then Partition, to return to the menu where you chose colors by group. Click Apply to restore those colors.
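A heat map like this needs only the number of hops from the chosen node to every other node, which a breadth-first search provides. The Python sketch below assumes unweighted edges and colors each reachable node by linear interpolation between two endpoint colors; the RGB values and function names are hypothetical stand-ins for whatever you pick in the gradient, and nodes that cannot be reached are simply left out.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its distinct neighbours (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def heat_map_colors(adj, source, near=(255, 140, 0), far=(0, 0, 139)):
    """Color reachable nodes from `near` (0 hops) to `far` (maximum hops)."""
    dist = hop_distances(adj, source)
    farthest = max(dist.values())
    colors = {}
    for node, d in dist.items():
        t = d / farthest if farthest else 0.0   # 0 at the source, 1 at the edge
        colors[node] = tuple(round(a + (b - a) * t) for a, b in zip(near, far))
    return colors, farthest


# Hypothetical chain a - b - c: a is the source, c the farthest node.
colors, farthest = heat_map_colors(build_adjacency([("a", "b"), ("b", "c")]), "a")
print(farthest, colors)
```

The returned `farthest` value corresponds to the maximum distance Gephi reports at the top of the window (4 steps for Egeus in this tutorial's graph).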
Shortest Path. This tool visualizes the shortest path between two nodes that are not directly connected. While the heat map shows the path length from your source node to every other node, this tool focuses on the actual path between a source node and a target node.
- Click the icon shaped like a line with a dot at each end, just above the heat map icon. The display at the top of the Overview window changes, asking you to select a source node.
- Click the node for Egeus.
- The display changes to ask you to select a target node. Click the node for Moth.
- The graph changes to show the path connecting Egeus to Moth. The other nodes become more transparent, and the nodes for Demetrius and Oberon are highlighted in red, along with the edges that make up the shortest path.
- Clear the path by reapplying the earlier color schemes in the Appearance panel and clicking Apply for each: Edges set to Unique with the grey color, and Nodes set to Partition by group.
- Using Shortest Path, see if you can find a longer path between two nodes.
- This tool isn't the most useful for a relatively dense graph like this one, but for graphs with more distinct groups or many more nodes, it shows exactly how closely two entities are connected. You may also encounter graphs in which two nodes are not connected at all, which is itself a useful finding.
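For an unweighted graph, a shortest path can be found with a breadth-first search that remembers how it reached each node. The sketch below is a stand-in under stated assumptions, not Gephi's code: it ignores edge weights, returns one shortest path when several exist, and returns None when the two nodes are not connected at all. The function names and toy graph are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its distinct neighbours (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """Return one shortest path from source to target as a list, or None."""
    previous = {source: None}      # how we first reached each node
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:   # walk back to the source
                path.append(v)
                v = previous[v]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in previous:
                previous[w] = v
                queue.append(w)
    return None


# Hypothetical graph: two triangles joined through x, plus a separate pair y-z.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"),
                       ("x", "d"), ("d", "e"), ("e", "f"), ("d", "f"),
                       ("y", "z")])
print(shortest_path(adj, "a", "f"))  # ['a', 'c', 'x', 'd', 'f']
print(shortest_path(adj, "a", "y"))  # None: the nodes are not connected
```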

Using Filters

Currently, the edges you see represent all the connections between each pair of characters, drawn on top of each other. To see what this means for tracking how characters are linked (or not) over the course of the play, try the configuration below, which shows why the connections can't be displayed separately without some extra work.

Click Edges in the Appearance panel. Choose Partition as the basis for color, choose Act as the attribute, and click Apply. Uh oh. Even though all five acts appear as colors in the edge legend, only the colors for Acts III, IV and V appear on the graph. Because the edges are drawn on top of each other, you can only see the edge for the latest act in which two characters are linked, which is why the colors for Acts I and II don't show up.
Now suppose that, as part of your project, you're interested not just in how characters interact over the entire play but in how that changes from act to act, and whether different acts have networks of different density. You can't display that on the main graph, but you can use Filters to display only certain parts of the graph at a time.
Open the Filters window on the side of the screen. It has two parts: Library, which contains the kinds of filters you can apply to the graph and data, and Queries, where you use those filters to build queries. We'll find the filters we want in the Library and drag them into the Queries section.
In Library, expand Attributes and then Equal, and scroll down until you see act. Click the filter icon next to it and drag it down to the Queries section.
Use the plus signs in the Queries window under Equal (act) to expand down to Parameters, then click pattern: null to highlight it. In the Equal (act) settings window below, type I and click OK. The pattern is the value we are matching: we're telling Gephi to show only edges that occur in Act I according to our data. Once the query is set up, click Filter.
The graph now contains connections between only some of the characters. The color assigned to Act I (orange, in my case) is now the only edge color in use, and some characters (the Fairies) have no connecting lines at all.
Click Stop and the graph will return to its previous display.
Try changing the filter to other acts and see how your graph changes. Which acts seem to have the most connections between characters of different groups?
You can also filter on node attributes, not just edge attributes. For example, to show only connections between characters whose gender is Woman, right-click Equal (act) under Queries and choose Remove. Then drag gender (string:node), found under Equal in the Library, into the Queries section.
Select pattern again, type Woman, click OK, and then click Filter. With only the female characters shown, it's evident that they are linked only to other women within their own group.
You can filter on numerical attributes too, for instance to show only characters above or below a certain degree. Click Stop on your Equal (gender) filter, then right-click that filter under Queries and choose Remove.
Scroll down in the Filters Library to Range, expand it, and drag Degree (integer) into the Queries section.
A slider appears below, which you can use to narrow the range. I'll narrow it to the middle of the range and click Filter.
With that filter applied, the only characters within the range appear to be the Mechanicals and one of the Fairies.
To get rid of the filter and see the full graph again, you can select Stop and the graph will go back to the full range.
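Each filter used above is essentially a rule for which rows to keep. The Python sketch below mimics the three filters on plain lists of dictionaries. The column names Source, Target, Id, act and gender follow this tutorial's data, but the function names and toy rows are hypothetical, and Gephi's own filters may behave differently (for example in how the pattern is matched, or in how nodes with no edges are handled; this sketch ignores such nodes in the degree filter).

```python
from collections import Counter


def equal_edge_filter(edges, column, pattern):
    """Keep only edges whose `column` value equals `pattern` (e.g. act == "I")."""
    return [e for e in edges if e[column] == pattern]


def equal_node_filter(nodes, edges, column, pattern):
    """Keep matching nodes, plus only the edges between two kept nodes."""
    kept = {n["Id"] for n in nodes if n[column] == pattern}
    return kept, [e for e in edges if e["Source"] in kept and e["Target"] in kept]


def degree_range_filter(edges, low, high):
    """Keep nodes whose degree lies in [low, high], plus edges among them."""
    degree = Counter()
    for e in edges:
        degree[e["Source"]] += 1
        degree[e["Target"]] += 1
    kept = {n for n, d in degree.items() if low <= d <= high}
    return kept, [e for e in edges if e["Source"] in kept and e["Target"] in kept]


# Hypothetical toy rows, not the real play data.
edges = [{"Source": "A", "Target": "B", "act": "I"},
         {"Source": "B", "Target": "C", "act": "II"},
         {"Source": "A", "Target": "C", "act": "I"},
         {"Source": "A", "Target": "D", "act": "III"}]
nodes = [{"Id": "A", "gender": "Woman"}, {"Id": "B", "gender": "Man"},
         {"Id": "C", "gender": "Woman"}, {"Id": "D", "gender": "Man"}]

print(len(equal_edge_filter(edges, "act", "I")))          # 2 edges in Act I
print(equal_node_filter(nodes, edges, "gender", "Woman")[1])
print(sorted(degree_range_filter(edges, 2, 2)[0]))       # ['B', 'C']
```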

Learning Goals

In the last module you looked at how to create a network graph and used the filtering capabilities to see which connections persisted in which act of the play. In this one we'll create a different workspace, in which the edges connecting the characters have different weights based on how often those characters appear together. Note, however, that these weights are created by merging the separate entries for each connection, so you will no longer be able to tell in which act and scene a given connection happened. That's why we're creating it in a separate workspace.

Data

Attached are the spreadsheets you'll import to create your Gephi graph. If you've completed the previous module, you'll already have downloaded them.

Getting Started

If you haven't already done so, download Gephi from its website. This tutorial uses version 0.10.
Installation is straightforward, but check first that your computer meets Gephi's system requirements. Unfortunately, there is no online version, so if your computer doesn't have the required disk space or RAM, you'll need to use a different tool.
Once installation is finished, find Gephi in your Start menu and launch it.
In the welcome window that appears, click New Project.
If you're continuing from the previous tutorial and already have Gephi open, just go to Workspace and choose New.

Importing your Files into a Merged Graph

Even though you may have already imported the spreadsheets in the previous tutorials, you'll need to import them in a different way to create this new graph.

Go to File > Import Spreadsheet and select Midsummer Nights Dream - Links.csv from wherever you saved it.
Make sure the dropdown has selected that you are importing it as an Edges table. Then click Next. Keep all the defaults on the next screen and choose Finish.
When the Import Report appears, set the Edges merge strategy to Sum. Here we care about how many times each character shares the stage with each other character, not about where in the play those connections happen.
If you've created a new workspace for this graph within your original project, choose Append to existing workspace. Otherwise, choose New workspace.
Choose Undirected rather than Mixed as the graph type. Because we care about the total number of times each character is onstage with each other character, rather than the point in the play at which it happens, the graph doesn't need to be directed.

To understand the difference between directed and undirected graphs, think about social media. On Instagram or X (Twitter), many people might follow Taylor Swift, but that doesn't mean she follows them back: for most users, they and Taylor Swift are connected in only one direction, which makes it a directed graph. On Facebook, by contrast, if two people are friends they are by definition connected both ways, since both have to agree to the connection, so it is an undirected graph. Other examples of directed networks include a company where work is passed from one person to another, or a campaign finance map showing which organizations give money to which candidates. In the previous module we made a mixed graph of both directed and undirected connections, but for this one, let's see what the graph looks like when it is undirected.
Click OK, and a new workspace will open showing a network graph with lines of varying thickness. You'll also get a warning that some edges have been removed; this is expected, because the parallel edges between each pair of characters have been merged into single weighted edges.
Go to the Data Laboratory. Unlike the previous graph, which had a separate edge for every time two characters appeared together, this one has a single edge for each pair. Its act and scene information comes from their last appearance together, and its weight records how many times they appeared together.
Now add the node information. Go to File > Import Spreadsheet, select Midsummer Night's Dream - nodes.csv and make sure you're importing it as a Nodes table. Click Next and then Finish, and make sure you choose to append it to the existing workspace.
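The Sum merge strategy combined with an undirected graph can be pictured this way: edges between the same two characters, in either direction, collapse into one edge whose weight is the sum of the originals. This Python sketch assumes every original edge has weight 1 unless a Weight column says otherwise, and, matching what the Data Laboratory shows, keeps the act and scene of the last row for each pair. The function name and toy rows are hypothetical.

```python
def merge_parallel_edges(rows):
    """Collapse undirected parallel edges into one edge, summing the weights."""
    merged = {}
    for row in rows:
        # Sorting the endpoints makes A->B and B->A the same undirected pair.
        key = tuple(sorted((row["Source"], row["Target"])))
        entry = merged.setdefault(key, {"Weight": 0.0})
        weight = entry["Weight"] + float(row.get("Weight", 1))
        entry.update(row)  # later rows overwrite act/scene, keeping the last one
        entry.update(Source=key[0], Target=key[1], Weight=weight)
    return list(merged.values())


# Hypothetical toy rows: A and B appear together twice, A and C once.
rows = [{"Source": "A", "Target": "B", "act": "II", "scene": "1"},
        {"Source": "B", "Target": "A", "act": "III", "scene": "2"},
        {"Source": "A", "Target": "C", "act": "IV", "scene": "1"}]
for edge in merge_parallel_edges(rows):
    print(edge["Source"], edge["Target"], edge["Weight"], edge["act"])
```

Here the A-B edge ends up with weight 2.0 and act III, which is why the merged graph can show connection strength but can no longer be filtered by act.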

Adding Attributes and Labels on the Chart

Right now, the graph in the Overview window gives the viewer no context about what each node represents or what its attributes are. Let's change that by adjusting how the nodes appear and adding labels.

Go to the Overview window where the graph is. In the Appearance panel, select Partition under Nodes so that color is determined by a category, and choose group as that category. Click the colors next to each partition to change them; let's keep the same scheme of purple for Royals, pink for Fairies and brown for Mechanicals. Click Apply and the graph will change. In my case, this also changed the color of the edges, which I didn't want, so I went to the Edges section and changed them back.
Now we can see a bit more about the nature of the connections between nodes. Members of the Royal group appear to have many thick connections with each other, while the Fairy nodes have only thin lines, meaning those characters share the stage with others less often. However, without labels this graph is still rather cryptic, so let's add them. Go to the Data Laboratory and open the Nodes tab. Choose Copy data to other column, select Id as the data to copy, and choose Label as the destination column.
In the Overview, turn on labels by clicking the icon with the black T at the bottom of the window. Use the slider to set a size that is readable but doesn't take up too much space on the graph.
The layout still seems a bit tight to me: some nodes overlap in a way that makes the graph harder to decipher, so I'll experiment with the Layout dropdown on the left-hand side of the screen. For me, Fruchterman Reingold works best, because it spaces out the nodes enough for every label to be read and makes clear that the strongest connections are between the Royals.

Adding Statistics to Visualization

In the previous graph, we sized the nodes by degree, the number of connections they had to other nodes. Now that the edges have been merged so they can be drawn thicker or thinner depending on the strength of each connection, we might want a different metric for sizing the nodes. Either way, to have more options for displaying the nodes, we first need to calculate the graph's statistics.

In the Overview, click the Statistics tab to see the options for calculating statistics on your graph. Click Run next to Average Degree, Average Weighted Degree, and Network Diameter (for this last one, keep the defaults in the window that pops up).
When you run each statistic, a window pops up showing the distribution of the measurements. The values are also stored in the Data Laboratory for each node; switch to that window and browse the new columns. You can sort by clicking a column heading to compare the measures, but ultimately you'll want to display these differences on your graph.
Return to the Overview. In the Appearance panel, go to the Nodes tab and then the size section (the icon with three interlocking rings). Choose Ranking, select Degree from the dropdown, set Min size to 5 and Max size to 20, and click Apply.
To see what the different measures tell you about your graph, try switching the measure that determines node size, and watch which nodes grow or shrink and what kinds of connections they have to other nodes. Since we're using essentially the same data as in the last exercise, Bottom is again the node with the most connectivity to other nodes (the highest Betweenness Centrality). Which view to use in your project depends on what you want to show the viewer, and because you can take screenshots with the camera icon at the bottom of the Overview window, you can include several views in your presentation or paper.
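Weighted degree differs from degree in that each edge counts by its weight rather than as 1, so after the Sum merge it recovers how often a character shares the stage with others. Below is a minimal Python sketch, assuming an undirected graph given as (source, target, weight) tuples; the function name and toy weights are hypothetical, and Gephi's reported average may be defined differently for directed graphs.

```python
from collections import defaultdict


def weighted_degrees(edges):
    """Sum the weights of the edges touching each node (undirected)."""
    totals = defaultdict(float)
    for source, target, weight in edges:
        totals[source] += weight
        totals[target] += weight
    return dict(totals)


# Hypothetical merged edges: A and B appear together 3 times, B and C once.
edges = [("A", "B", 3.0), ("B", "C", 1.0)]
wd = weighted_degrees(edges)
print(wd)                        # {'A': 3.0, 'B': 4.0, 'C': 1.0}
print(sum(wd.values()) / len(wd))  # average weighted degree
```

In this toy graph, A has degree 1 but weighted degree 3.0, so sizing nodes by Weighted Degree instead of Degree rewards frequent partnerships rather than many different partners.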

Saving and Exporting

You can save your workspace at any time by using File > Save.
If you want to save a picture of the whole graph, click the camera icon at the bottom of the Overview window. Gephi will ask where to save the PNG file, and you'll get an image like the one attached below, in which the nodes are sized by Closeness Centrality. To make the saved image larger or smaller in pixels, click the small arrow next to the camera icon and choose Configure.
To see how I configured my graph, you can take a look at the Gephi file in the zip file below.