Gephi is an open-source data visualization program that is especially useful for creating network analysis graphs. You can import CSV files listing the connections (edges) between nodes (data points), along with attribute sheets for those nodes. The data can be visualized in different layouts and with different symbol and color options. Gephi also contains analysis tools that let you see different measures for each node and for the graph as a whole. It is a bit more complex and less user-friendly than simpler tools, but it offers a lot of customization and analysis options, so the trial and error for complex subjects will likely be worth it.
Gephi is a more sophisticated way to do network graphing. You have more options for customization, and you can export the information that you get from it. Further, if you want to do more than just visualize and are interested in what your graph shows, such as the density of your graph compared to other graphs or the centrality of a given node, you can also calculate and export the different metrics for your graph as a whole and for the individual nodes.
With this module, you'll graph the interactions between characters in A Midsummer Night's Dream by importing the CSV files below into Gephi and visualizing the results. You'll add labels and change the colors and sizes to best display your data.
The CSV files below are from the lesson on how to create structured data from unstructured text. If you created that sheet already and want to reuse your data, you'll just need to save each sheet as its own CSV file. In the Links file, rename the first two columns from Stage Mate 1 and Stage Mate 2 to Source and Target. In the Nodes file, rename the Node column to Id.
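If you would rather script the column renames described above than edit the headers by hand, a minimal Python sketch follows. The sample rows and the extra Act and Group columns are made up for illustration; only the header names Stage Mate 1, Stage Mate 2, Node, Source, Target, and Id come from this lesson.

```python
import csv
import io

def rename_columns(csv_text, renames):
    """Return csv_text with header names replaced according to `renames`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) is changed; data rows pass through.
    rows[0] = [renames.get(name.strip(), name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Hypothetical sample data; your own sheets will have their own rows.
links_sample = "Stage Mate 1,Stage Mate 2,Act\nBottom,Quince,1\n"
nodes_sample = "Node,Group\nBottom,Mechanicals\n"

links_fixed = rename_columns(
    links_sample, {"Stage Mate 1": "Source", "Stage Mate 2": "Target"}
)
nodes_fixed = rename_columns(nodes_sample, {"Node": "Id"})

print(links_fixed)
print(nodes_fixed)
```

To use this on real files, read each file's contents into a string, pass it through rename_columns, and write the result to a new CSV file.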
A convenient thing about Gephi is that you can open multiple workspaces within one window. So if you have a set of data that you want to look at in multiple ways, you can open a different workspace for each. We'll first import the Midsummer Night's Dream spreadsheets for your Nodes and Links.
In addition to creating a visual representation of how the different nodes are connected, Gephi can calculate metrics that describe how connected each node is to the other nodes and how densely connected the graph is as a whole. There are many metrics available. To start with, we'll take a look at degree and centrality.
Degree is a measure that tells you how many other nodes each node is connected to. Centrality is a measure that tells you how much a particular node helps to facilitate connections between other nodes. In the Midsummer Night's Dream data, for example, Bottom is in scenes with characters from many different groups, so he is more central than someone like Theseus, who is almost only in scenes with other members of the royal family.
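To make the idea of degree concrete, here is a small Python sketch that counts how many distinct other nodes each node is connected to. The edge list is a made-up sample, not the lesson's actual data, and the number Gephi reports may differ if the same pair of characters appears in several rows of your Links file.

```python
from collections import defaultdict

def degrees(edges):
    """Count the distinct neighbors of each node in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore a node listed as connected to itself
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(others) for node, others in neighbors.items()}

# Hypothetical sample edges for illustration only.
sample_edges = [
    ("Bottom", "Quince"),
    ("Bottom", "Titania"),
    ("Bottom", "Puck"),
    ("Theseus", "Hippolyta"),
    ("Bottom", "Theseus"),
    ("Bottom", "Quince"),  # a repeated pair is only counted once here
]

degree_by_node = degrees(sample_edges)
for node, degree in sorted(degree_by_node.items(), key=lambda item: -item[1]):
    print(node, degree)
```

In this sample, Bottom has the highest degree because he is connected to four different characters, while Theseus is connected to only two.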
To talk about paths between nodes more informally for a second, think about the party game Six Degrees of Kevin Bacon, where you try to connect Kevin Bacon and any other actor or actress based on movies they were in together. For example, if you wanted to connect Kevin Bacon and Daniel Kaluuya, you could say that Kevin Bacon is in X-Men: First Class with Caleb Landry Jones, who is in Get Out with Daniel Kaluuya.
If there were a network analysis graph of the movies of Kevin Bacon, then an actor or actress with a high betweenness centrality in that graph would be one who very often appears on the path connecting Kevin Bacon with other actors. One of those might be Laurence Fishburne, who is in Mystic River with Kevin Bacon and who also appeared in Apocalypse Now with Marlon Brando, who appeared opposite many other prolific actors from the 1950s through the 1990s.
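As a rough illustration of betweenness, the Python sketch below counts, for each actor, how many shortest paths between other pairs of actors pass through them. It uses only the four co-star links named in the examples above, and it follows the well-known Brandes approach for unweighted graphs. This is a sketch of the general idea, not necessarily how Gephi computes or scales its own value.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in the order BFS reaches them
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        paths = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the furthest nodes, passing credit to predecessors.
        credit = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != s:
                score[w] += credit[w]
    # Each pair was counted from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}

# The co-star links mentioned in the text.
links = [
    ("Kevin Bacon", "Caleb Landry Jones"),
    ("Caleb Landry Jones", "Daniel Kaluuya"),
    ("Kevin Bacon", "Laurence Fishburne"),
    ("Laurence Fishburne", "Marlon Brando"),
]
adjacency = {}
for a, b in links:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

between = betweenness(adjacency)
for actor, value in sorted(between.items(), key=lambda item: -item[1]):
    print(actor, value)
```

In this tiny network, Laurence Fishburne scores above zero because every path from Marlon Brando to the other actors runs through him, while Marlon Brando and Daniel Kaluuya, who sit at the ends of the chain, score zero.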
An actor or actress with a high closeness centrality would be one who is not only in a movie with Kevin Bacon but is also in many other movies with actors who are in movies with Kevin Bacon, so the average distance between that actor and any other actor in the network would be very small. One example might be Tom Cruise, who is in A Few Good Men with Kevin Bacon. The two got their start at around the same time and in some of the same genres, and so hypothetically would have appeared in movies with many of the same people.
Eccentricity measures how many hops it takes to get from a given node to the node furthest away from it. So in the case of the Kevin Bacon game, a debut actor who has appeared in few movies other than one with Kevin Bacon would have a high eccentricity. An actor or actress who has been in a lot of movies with large casts of people who were themselves in many other movies would have a low eccentricity.
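The two distance-based ideas above, average distance (the basis of closeness) and eccentricity, can be sketched in Python with a breadth-first search that counts hops. The links are the co-star examples from the text, and the sketch assumes every actor can reach every other actor. Tools differ on whether they report closeness as the average distance or as its inverse, so treat the numbers as illustrative rather than as what Gephi will display.

```python
from collections import deque

def hop_distances(adj, start):
    """Breadth-first search: number of hops from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def distance_metrics(adj):
    """Return {node: (average distance to others, eccentricity)}."""
    results = {}
    for node in adj:
        others = [d for n, d in hop_distances(adj, node).items() if n != node]
        if others:
            results[node] = (sum(others) / len(others), max(others))
    return results

# The co-star links mentioned in the text.
links = [
    ("Kevin Bacon", "Caleb Landry Jones"),
    ("Caleb Landry Jones", "Daniel Kaluuya"),
    ("Kevin Bacon", "Laurence Fishburne"),
    ("Laurence Fishburne", "Marlon Brando"),
]
adjacency = {}
for a, b in links:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

metrics = distance_metrics(adjacency)
for actor, (average, eccentricity) in metrics.items():
    print(actor, "average distance:", average, "eccentricity:", eccentricity)
```

Here Kevin Bacon has the smallest average distance and the lowest eccentricity because he sits in the middle of the chain, while Daniel Kaluuya and Marlon Brando, at the two ends, have the highest eccentricity.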
In the Overview tab, you can use the Appearance section to change what your graph looks like from its original default configuration.
Currently the layout chosen is somewhat of a default, but you can try different layouts and see what they tell you about your data. Note that there is an element of randomness within these layouts so yours might look slightly different from mine.
To see how I set up my graph and how my screenshot came out, see the files below. You will have to unzip and extract the Gephi file from the zip file it is in.
In the last exercise, we created a simple graph in Gephi. Next, we'll explore additional tools that Gephi offers to let us better see how different nodes are connected. There is also the option of using filters to narrow the graph to just certain sets of nodes and edges. We'll be proceeding from the graph we built in the last section of this lesson; you can use the Gephi file within the zip file below.
In the last section we talked about how to turn your data into a graph. Beyond displaying the relationships between nodes visually, Gephi has a couple of other tools that let you see other facets of the network within your data once you have created a graph. They are most useful for larger and less straightforward network graphs.
Currently, the edges you see represent all the different connections between each pair of characters laid on top of each other. To see what this means for your ability to see how characters are linked (or not) throughout the play, try the configuration below. It shows that the individual connections can't be displayed without some extra modification.
In the last module you looked at how to create a network graph and how to use the filtering capabilities to see which connections persisted in which act of the play. In this one, we'll create a different workspace, one that lets us visualize the edges connecting the characters with different weights based on how often those characters are connected to each other. It's important to note, however, that these weights are created by grouping together the separate entries for each connection. This means that you will no longer be able to break out which act and scene a given connection happened in. That's why the weighted graph is being created in a different workspace.
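To see why merging edges discards the act and scene information, here is a small Python sketch of the grouping idea. The rows and their act and scene numbers are a made-up sample, and counting one unit of weight per row is an assumption about how the merged weight is built, not a description of Gephi's internals.

```python
from collections import Counter

def merge_edges(rows):
    """Collapse repeated (source, target, ...) rows into one weight per pair."""
    weights = Counter()
    for source, target, *_ in rows:
        # Sort the pair so Bottom-Quince and Quince-Bottom count as one edge.
        pair = tuple(sorted((source, target)))
        weights[pair] += 1
    return dict(weights)

# Hypothetical sample rows: (source, target, act, scene).
sample_rows = [
    ("Bottom", "Quince", 1, 2),
    ("Quince", "Bottom", 3, 1),
    ("Bottom", "Titania", 3, 1),
    ("Bottom", "Titania", 4, 1),
    ("Bottom", "Quince", 4, 2),
    ("Theseus", "Hippolyta", 1, 1),
]

merged = merge_edges(sample_rows)
for pair, weight in merged.items():
    print(pair, weight)
```

Each merged edge keeps only the pair of characters and a weight. The act and scene columns of the original rows have nowhere to go, which is the trade-off described above.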
Attached are the spreadsheets you'll be importing to create your Gephi workspace. If you've completed the previous module, you'll already have these downloaded.
Even though you may have already imported the spreadsheets in the previous tutorials, you'll need to import them in a different fashion to create this new graph.
Right now, the chart in the Overview window doesn't give the viewer any context about what each node represents or what its attributes are. Let's change that by changing how the nodes appear and adding labels.
In the previous graph we made, we chose to size the nodes based on their degree, or the number of connections that they had to other nodes. Now that the edges have been merged so that they can be displayed as larger or smaller depending on how strong each connection is, we might want to pick a different metric for sizing the nodes. Regardless, to add more possibilities for how we can display nodes, we'll first need to calculate the statistics for the graph.