O'Malley Library: DAsH: Using NodeXL to Create a Network Analysis

What is NodeXL?

NodeXL is an open-source template that can be used in Excel to generate a network graph. The free version, Basic, is what will be used in the exercise below. There is also a Pro version that allows data to be gathered from Twitter and imported from other social media platforms. You can use NodeXL Basic to graph data that you have already gathered; the new version of Twitter, however, does not allow you to gather data using NodeXL Basic. Basic allows a lot of customization of the color and shape of the nodes, and it also contains analysis tools that will group nodes based on the patterns of connection they exhibit. If you want to make a graph that requires more customization than Palladio offers, but doesn't require the level of complexity of a program like Gephi, then NodeXL can help you create that graph.

Using NodeXL With Pre-Gathered Data

Learning Goals

NodeXL can make graphs out of data you input manually and it allows you to do a lot of customization. In this lesson you'll create a NodeXL graph from previously created data about the interactions in the Shakespeare play A Midsummer Night's Dream. You'll customize the graph to display the information about the characters, the time scale and other information by using different colors and sizes for different categories.

Data

If you've already created the A Midsummer Night's Dream dataset in the previous tutorial for Palladio, you can use it here. Otherwise, the link to that data is below.

Midsummer Night's Dream spreadsheet

Getting Started

Open the link to the Midsummer Night's Dream data spreadsheet and take a look at the headings to see how the information in it is structured. The different characters and their attributes are stored in the sheet Nodes. Information on the connections between them is stored in the sheet Links.
If you haven't already downloaded NodeXL from their site, visit their download center. Unzip the resulting folder you receive and run the installer.
Locate NodeXL Excel Template in your Start Menu and click to open it. A popup window will appear prompting you to download NodeXL Pro; close it by clicking the X in the right-hand corner.
It will open like a normal Excel sheet, except that it already contains some column headers and sheets. The rows below those headers are still empty. Scroll through the different sheets to get a sense of how the data is going to be organized. Generally when talking about network analyses, I like to use the words nodes and connections (or links). This sheet uses the terms vertices and edges. If I switch between the two in this tutorial, that is why: a node is a vertex, and an edge is a connection or link between two of them.
- It opens to the sheet Edges. This is where the data on the connections between the different vertices will be stored. A vertex (plural: vertices) is an entity whose connections you want to examine. In this case, each vertex is a character in the play A Midsummer Night's Dream. Each row on the Edges sheet stores information on a connection between two vertices (in our sheet it represents a time that two characters were on stage together), and has additional columns for how to represent that connection on the graph, such as what color to make the line. We'll be adding columns with information on the act and the scene.
- The next sheet is Vertices. Each row represents a different vertex (or node). In this case the nodes are characters in the play, but nodes are the entities whose interconnections you want to explore. The additional columns allow for more options on what color to make each node that can either be generated by the program, or customized by you.
- After that come Groups and Group Vertices. In Groups, you can set up the different categories you want to assign to the vertices, and decide how the vertices in each group will be drawn. In Group Vertices, you assign the vertices to a particular group. This can be done manually, or the program can do so for you. For instance, you may want to separate the characters into men, women and unknown. Or you might want the program to separate out the vertices depending on their properties in the graph, such as how connected the nodes are or how many scenes (edges) they are in together.
- The last sheet in this workbook, Overall Metrics, is where the program stores measurements it gets from analyzing the graph as a whole, such as the graph's density.
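
If the vertex and edge vocabulary is easier to follow in code, here is a minimal Python sketch of the two main sheets. It is only an illustration of the idea, not how NodeXL stores anything internally; the class names, field names, and sample rows are made up for this example and are not copied from the spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """One row of the Vertices sheet: a character (a node)."""
    name: str


@dataclass(frozen=True)
class Edge:
    """One row of the Edges sheet: two characters on stage together."""
    vertex1: str
    vertex2: str
    act: str
    scene: int


# Illustrative rows only, not copied from the tutorial spreadsheet.
vertices = [Vertex("Oberon"), Vertex("Puck"), Vertex("Bottom")]
edges = [
    Edge("Oberon", "Puck", act="II", scene=1),
    Edge("Puck", "Bottom", act="III", scene=1),
]


def neighbors(name, edge_rows):
    """Return the set of vertices that share at least one edge with `name`."""
    found = set()
    for edge in edge_rows:
        if edge.vertex1 == name:
            found.add(edge.vertex2)
        elif edge.vertex2 == name:
            found.add(edge.vertex1)
    return found


print(sorted(neighbors("Puck", edges)))  # ['Bottom', 'Oberon']
```

An edge never stands alone: it always names two vertices, which is why the Edges sheet begins with the Vertex 1 and Vertex 2 columns.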

Adding in Edges

The Links sheet, which contains the connections between the different characters, is what you'll be adding to the Edges sheet.

From the Midsummer Night's Dream spreadsheet in the Links tab take the information in the rows below the columns Stage Mate 1, Stage Mate 2 and copy them into the Vertex 1 and Vertex 2 columns in the Edges sheet in NodeXL. Leave Vertex 1 and Vertex 2 as the column names.
Instead of copying the other attributes straight into the next columns over, as they are on your original sheet, go to where it says Add Your Own Column Here and change that to Act. Add a column to the right of that called Scene. NodeXL will automatically change the color and add a dropdown option. The columns directly to the right of Vertex 1 and Vertex 2 control the colors and styles of the edges. If you move them over to accommodate other attributes, the template won't function correctly and your graph won't work, which is why you add new columns only where the program indicates.
Paste in your data for the Acts and Scenes of these connections from your Links sheet in the Midsummer Night's Dream data.
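
The copy-and-rename steps above can also be sketched in Python, in case you ever want to prepare the rows before pasting them. This is a rough sketch that assumes the Links sheet has been exported as CSV with the headers Stage Mate 1, Stage Mate 2, Act and Scene; the two sample rows and the function name links_to_edges are invented for illustration.

```python
import csv
import io

# Hypothetical stand-in for a CSV export of the Links sheet.
links_csv = """Stage Mate 1,Stage Mate 2,Act,Scene
Oberon,Puck,II,1
Puck,Bottom,III,1
"""


def links_to_edges(text):
    """Rename the two Stage Mate columns to NodeXL's Vertex 1 and Vertex 2."""
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        edges.append({
            "Vertex 1": row["Stage Mate 1"],
            "Vertex 2": row["Stage Mate 2"],
            # Act and Scene are the two user-added columns.
            "Act": row["Act"],
            "Scene": row["Scene"],
        })
    return edges


edge_rows = links_to_edges(links_csv)
print(edge_rows[0]["Vertex 1"], edge_rows[0]["Vertex 2"])  # Oberon Puck
```

Only the two endpoint columns are renamed; Act and Scene keep their names because they go into the columns you add yourself.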

Adding in Vertices

Since you've manually entered the edges, the vertices won't automatically transfer over. You'll have to add them in, but fortunately, you've already got them in the Nodes sheet in your Midsummer Night's Dream data.

Copy the characters listed in the Nodes column on your Nodes sheet, and paste them underneath where it says Vertex on your NodeXL sheet.
Similar to what you did with the edges, even though the columns for Faction and Gender sit right next to the Node column on the original sheet, on this sheet you'll need to put them over to the right, where Add Your Own Columns Here is highlighted in Column AC. The formulas that make the sheet's graphing features work depend on the other columns staying where they are.
Add the columns for the other attributes that you have for the nodes Faction and Gender where it says Add Your Own Columns Here, and paste in the contents.
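
Because the edges and vertices are pasted in separately, a name that is spelled differently on the two sheets can slip through. Here is a small Python sketch of a consistency check you could run on your own data beforehand. The function name and the sample rows are hypothetical, and "Titania" is deliberately left out of the vertex list to show what the check reports.

```python
def missing_vertices(vertex_names, edge_rows):
    """Return names used in the edge rows that are absent from the vertex list."""
    used = set()
    for v1, v2 in edge_rows:
        used.add(v1)
        used.add(v2)
    return sorted(used - set(vertex_names))


# Illustrative data; "Titania" is deliberately missing from the vertex list.
vertex_names = ["Oberon", "Puck", "Bottom"]
edge_rows = [("Oberon", "Puck"), ("Titania", "Bottom")]

print(missing_vertices(vertex_names, edge_rows))  # ['Titania']
```

An empty list means every character named in an edge also has a row of its own.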

Starting Graph

With this amount of information, you can actually already make a graph. At the top of your sheet is a Document Actions window that is currently displaying information about NodeXL.

Click where it says Show Graph in that window, and it will show you the graph of just the nodes and their connections in plain black and grey.
At this stage you can only tell the information about which nodes are connected to other nodes, though they aren't labeled so you do not know who they are. If you click on any node, it will highlight the edges connected to that node, and what other nodes those edges connect to. If you want to see a specific node or connection, you can locate it in the vertices and edges sheets and it will highlight in red on the graph.
At this point it's a good idea to save what you've done so far so you don't lose it. I've called mine MidsummerNightsDream. Save it as an Excel document.

Altering Colors and Adding Labels

With a well-made graph, you'll want your viewer to be able to figure out what your graph's message is at a glance. You can assist in this by adding labels and colors that represent different information about the nodes and the links for them.

In the menu that you get to by clicking on NodeXL Basic in the menu bar at the top, there's an icon called Autofill Columns. Click on this icon to open the Autofill Columns menu. Some aspects of your graph can be changed as a batch so you don't have to go in and manually change each row.
Since Faction is a column that you've recorded for the different nodes (or vertices as they're called here) that you'd like displayed, let's start with that. In this play, the characters fall into three groups, Royals, Mechanicals, and Fairies, and that information has been recorded about them in the column Faction.
Click to the Vertices tab in the Autofill Columns window, since this is where you can batch edit options about the nodes. Click on the dropdown next to Vertex Color and choose Faction. Next click on the arrow next to options and choose Vertex Color Options. This will let you make every vertex that's in a given faction the same color so they are easily identifiable.
In that window, choose what kind of values the column holds. In this case they are Categories, but if you had numerical data, for instance a character's age or number of lines, you might choose to display that information with a color scale instead, with a lighter or cooler color meaning a smaller number. For now, select Categories and choose OK. Click on Autofill and the Color column on your Vertices sheet will autofill with a randomly assigned color for each group.
You can also do this for the vertex labels. In the dropdown next to Vertex Label choose Vertex. This will fill the label column with the character names without you needing to manually copy and paste in each name. Click OK and then click on Refresh Graph and your vertices will now have a different color based on what group they are in.
Unfortunately, you can't autofill the shapes column by categories, so if you want to have men, women and unspecified represented by different shapes, you'll have to do that manually. So sort the sheet by gender and assign a different shape for Man, Woman and Unspecified. In my case I've chosen Solid Square for Man, Solid Diamond for Woman and Solid Triangle for Unspecified.
Click on Refresh Graph and now your viewer will be able to tell at a glance which clusters contain members of which group, and what the gender of each character in that group is.
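
The idea behind autofilling a color by category can be sketched in a few lines of Python. This is not NodeXL's own code: NodeXL picks the colors randomly, whereas this sketch hands them out from a fixed palette in order of first appearance so the result is repeatable. The function name, the palette, and the sample rows are assumptions made for illustration.

```python
def autofill_colors(rows, column, palette):
    """Give each distinct value in `column` its own color, in order of first appearance."""
    assigned = {}
    for row in rows:
        value = row[column]
        if value not in assigned:
            if len(assigned) >= len(palette):
                raise ValueError("palette has fewer colors than categories")
            assigned[value] = palette[len(assigned)]
        # Every row in the same category receives the same color.
        row["Color"] = assigned[value]
    return assigned


# Illustrative rows; the Faction values are the three groups named in the tutorial.
rows = [
    {"Vertex": "Theseus", "Faction": "Royals"},
    {"Vertex": "Bottom", "Faction": "Mechanicals"},
    {"Vertex": "Puck", "Faction": "Fairies"},
    {"Vertex": "Oberon", "Faction": "Fairies"},
]

mapping = autofill_colors(rows, "Faction", ["Blue", "Orange", "Green"])
print(mapping)  # {'Royals': 'Blue', 'Mechanicals': 'Orange', 'Fairies': 'Green'}
```

The same lookup would work for the shapes you assigned by hand: swap the palette for a list of shape names and the column for Gender.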

From this new graph you can see which nodes seem to have a lot of connections to other nodes. Unfortunately, since the edges are placed on top of each other, you can't see in this graph just how many connections go into each vertex. Whether Oberon is on stage with Puck in all acts or just once in the whole play, there is only one line between them, which makes it harder to visualize the number of connections that one node has with another. This is somewhat resolvable in the next step, where you'll calculate each node's degree (how many connections are made to it), which is another thing you'll be able to visualize.

Calculating Graph Metrics

Not all of a graph's metrics can be calculated in NodeXL Basic, but one of the more basic and useful ones is degree. A vertex's (or node's) degree is determined by how many links are connected to it. You'll be using NodeXL's metrics function to calculate the degree of each of your nodes.
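
To see what a degree count involves, here is a minimal Python sketch. It assumes that repeated edges between the same two characters count once, so that degree means "number of distinct other characters"; whether NodeXL treats duplicate rows exactly this way is an assumption here. The sample edges are illustrative and not taken from the dataset.

```python
from collections import defaultdict


def degrees(edge_rows):
    """Count distinct neighbors per vertex; a repeated pair counts once."""
    adjacent = defaultdict(set)
    for v1, v2 in edge_rows:
        if v1 == v2:
            continue  # ignore self-loops in this sketch
        adjacent[v1].add(v2)
        adjacent[v2].add(v1)
    return {name: len(others) for name, others in adjacent.items()}


# Illustrative edges, not taken from the tutorial dataset.
edge_rows = [
    ("Oberon", "Puck"),
    ("Oberon", "Puck"),  # the same pair in another scene
    ("Puck", "Bottom"),
    ("Oberon", "Titania"),
]

print(degrees(edge_rows))
# {'Oberon': 2, 'Puck': 2, 'Bottom': 1, 'Titania': 1}
```

Oberon ends up with a degree of 2 even though he appears in three rows, because two of those rows connect him to the same character.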

In the NodeXL section of the menu, click on Graph Metrics. In the window that opens, check the boxes next to the first four options, which will measure the general qualities of the whole graph, as well as the degrees for each of the different nodes in the graph.
Click on Calculate Metrics and NodeXL will analyze the graph's overall connections, and how connected each node is. It will take you to the Overall Metrics tab which we'll be going over later, but for now, let's go back to the Vertices tab, since we'll be using this new option to change how the vertices appear on the graph.
Now there's a whole new set of columns inserted into the Vertices sheet, but the only one that is populated is Degree. Scroll through and click on a few records to see what it means. A vertex with a high degree number is connected to a higher number of other vertices, so more edges are highlighted when you click on it in the graph; a vertex with a lower degree number has fewer connections to highlight. This is an important bit of information to convey to your audience, and you can do this by changing the size of the vertices in proportion to their degrees.
The Autofill Columns option will come in handy again, so click on it. Click to the Vertices tab and, using the dropdown next to Vertex Size, choose Degree. Then click on the arrow beside it and choose Vertex Size Options.
Change the Max option to 20 and click OK and you'll see that the Size column has now had different options filled in, from 1.5 to 20 depending on how large the degree is. Click on Refresh Graph and this information will now be represented on the graph.
The graph now has different sized nodes on it, but it is still pretty crowded.
Click to the Overall Metrics sheet. This is where measurements for your graph as a whole are stored. Some measurements are self-explanatory, such as the total number of vertices and edges. Other measures that can tell you something about the graph are:
- Maximum Geodesic Distance, which in this graph is 3. This means that the maximum distance between two nodes is 3. Distance is a measure of how many other connections a path would need to go through to get from one node to the other. So to get from Moth to Egeus, the path would need to go 1) to Bottom 2) to Theseus or one of the other Royals and 3) from that Royal to Egeus. The distance is 3. Average Geodesic Distance is 1.4, which means that for the most part, if nodes are not directly connected to each other, they are only one additional connection away.
- Graph Density, which in this graph is 0.55. If no nodes are connected at all, the graph has a density of 0. If all nodes are connected directly to all other nodes, it has a density of 1. The measure for this graph says that it is somewhere in the middle of these two figures.
- Minimum Degree, Maximum Degree, Average Degree and Median Degree, along with the accompanying chart, tell you a bit about the distribution of degrees among the nodes. By this measure, a node is connected on average to 12 other nodes, so each character on average appears onstage at least once with 12 separate characters across the play.
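
The measures above can be reproduced on a toy graph with a short Python sketch. It uses the common textbook definitions: density is the number of distinct edges divided by the number of possible pairs, and geodesic distance is the shortest path length, found here with a breadth-first search. NodeXL's exact conventions (for example, which pairs enter the average) are not confirmed here, so treat this as an approximation. The four-character chain mirrors the Moth to Egeus example, but the edges are illustrative rather than the real dataset.

```python
from collections import deque


def build_adjacency(vertex_names, edge_rows):
    """Map each vertex to the set of vertices it is directly connected to."""
    adjacent = {name: set() for name in vertex_names}
    for v1, v2 in edge_rows:
        if v1 != v2:
            adjacent[v1].add(v2)
            adjacent[v2].add(v1)
    return adjacent


def density(adjacent):
    """Distinct edges divided by the number of possible vertex pairs."""
    n = len(adjacent)
    if n < 2:
        return 0.0
    edge_count = sum(len(others) for others in adjacent.values()) / 2
    return edge_count / (n * (n - 1) / 2)


def distances_from(start, adjacent):
    """Breadth-first search: hop count from `start` to each reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in adjacent[current]:
            if other not in dist:
                dist[other] = dist[current] + 1
                queue.append(other)
    return dist


def geodesic_summary(adjacent):
    """Return (maximum, average) shortest-path length over reachable pairs."""
    lengths = []
    for start in adjacent:
        for other, hops in distances_from(start, adjacent).items():
            if other != start:
                lengths.append(hops)
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


# Illustrative chain: Moth - Bottom - Theseus - Egeus.
names = ["Moth", "Bottom", "Theseus", "Egeus"]
chain = [("Moth", "Bottom"), ("Bottom", "Theseus"), ("Theseus", "Egeus")]
graph = build_adjacency(names, chain)

longest, average = geodesic_summary(graph)
print(density(graph))  # 0.5
print(longest)         # 3
```

In this toy chain, 3 of the 6 possible pairs are directly connected, which gives a density of 0.5, and the longest shortest path (Moth to Egeus) takes 3 steps.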

Manually Altering the Graph

If your graph looks like mine, the Royals group is rather snarled and clustered so it's difficult to read the labels for the different edges, and harder to tell how many acts any two characters appear in together. You can change this by manually moving different vertices.

To give yourself a larger graph canvas to work with, hover over the boundary between your sheet and the Document Actions section, where you have the graph, and drag it down.
Make sure that you have the arrow icon highlighted at the top of the graph then click on one of the Royals vertices that is bound in tightly with the others, and drag it down. I've chosen Hermia.
Drag each node that is crowded together outwards until your graph is more legible. Note however that this layout will change back if you refresh your graph again.

Using Filters

NodeXL doesn't have any kind of function to allow you to create multiple parallel edges. In other words, whether Bottom and Oberon are onstage in five scenes together, or just one, there will be one connecting edge between them. Any attempt to color, say, Act I edges blue and Act III edges red will just mean that the edge appears red and the blue edge is hidden underneath it. If you want to display connections between nodes differently depending on some attribute, you'll need to create a series of different graphs. Fortunately, you can do this rather easily using filters, just as you would filter any other Microsoft Excel document.
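
If you want to know how many rows are hiding behind a single drawn line, you can count them yourself. Here is a minimal Python sketch that merges rows for the same pair of characters, regardless of which one is listed first, and counts them. The function name and the sample rows are hypothetical, and what you do with the counts (for example, storing them in a column of your own) is up to you.

```python
from collections import Counter


def collapse_parallel_edges(edge_rows):
    """Merge rows for the same unordered pair and count how many there were."""
    counts = Counter()
    for v1, v2, _act in edge_rows:
        # Sort the two names so (A, B) and (B, A) land on the same key.
        pair = tuple(sorted((v1, v2)))
        counts[pair] += 1
    return counts


# Illustrative rows, not taken from the tutorial dataset.
edge_rows = [
    ("Oberon", "Bottom", "III"),
    ("Bottom", "Oberon", "IV"),
    ("Oberon", "Puck", "II"),
]

weights = collapse_parallel_edges(edge_rows)
print(weights[("Bottom", "Oberon")])  # 2
```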

In order to only display the connections between characters in Act I, go to the Edges section of your sheet and click on the dropdown arrow next to Act and make sure that only I is selected. Click OK and your sheet will change to represent only edges that occur in Act I.
Click on Refresh Graph and the Graph will also change in appearance to only display the characters' interactions in Act I.

Figure: A smaller graph in which the different groups are not connected.

Play around with the filters to change the acts displayed and then refresh the graph. This can let you ask questions of your analysis, such as: do any of the acts contain a connection between all three groups of characters, or are those connections only made if you look at the play as a whole?
If this were a larger project, you could try and make the same kind of graph for other Shakespeare plays and see how they compared.
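
The question of whether an act connects the different groups can also be asked in code. This Python sketch filters edge rows by act, as the Excel filter does, and then lists which pairs of factions are joined by at least one edge. The function names and the sample rows are invented for illustration and do not reproduce the real dataset.

```python
def edges_in_act(edge_rows, act):
    """Keep only the rows whose Act column matches, like the Excel filter."""
    return [row for row in edge_rows if row["Act"] == act]


def factions_linked(edge_rows, faction_of):
    """Return the set of faction pairs joined by at least one edge."""
    linked = set()
    for row in edge_rows:
        f1 = faction_of[row["Vertex 1"]]
        f2 = faction_of[row["Vertex 2"]]
        if f1 != f2:
            linked.add(frozenset((f1, f2)))
    return linked


# Illustrative data only.
faction_of = {
    "Theseus": "Royals",
    "Hermia": "Royals",
    "Bottom": "Mechanicals",
    "Quince": "Mechanicals",
    "Puck": "Fairies",
}
edge_rows = [
    {"Vertex 1": "Theseus", "Vertex 2": "Hermia", "Act": "I"},
    {"Vertex 1": "Bottom", "Vertex 2": "Quince", "Act": "I"},
    {"Vertex 1": "Puck", "Vertex 2": "Bottom", "Act": "III"},
]

act_one = edges_in_act(edge_rows, "I")
print(len(act_one))                          # 2
print(factions_linked(act_one, faction_of))  # set()
```

In this made-up sample, the Act I rows link no two factions, while the full list links the Fairies and the Mechanicals, which is the kind of difference the filtered graphs let you see.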

Changing the Layout

By default, the graph is using a structure called Fruchterman-Reingold, but there are other options.

If you want to try different options to see if one makes a clearer graph, you can click on the dropdown next to Refresh Graph, choose a different layout, and then select Refresh Graph. For instance, when I select Harel-Koren it makes a graph that more clearly shows how connected Bottom is to nearly all the other nodes.
For more customization, you can select Layout Options to change things like how separated groups and vertices are.
Anytime you make a change that you want to see the results of on the graph, select Refresh Graph or Lay Out Again and the graph will change.

Exporting

While you can save the spreadsheet you've created and the settings that are used on the graph, if you want to use the graph as an illustration you'll need to take a screenshot of it, since there isn't a way to easily save the graph as an image file. You can, however, right click on the graph and choose Copy image to Clipboard and then paste it into image software or screenshot software.

Using a screenshot software like PicPick, highlight the graph, and save it as an image.
You'll also want to save as a NodeXL document so you have all the data and settings that you added to it.

If you want to see how my graph was composed, please see the sheet below, which contains the graph, and the jpegs that contain the graphs for each act of the play.
