DAsH

Research Guide for DAsH (or digital humanities) resources and tools

AntConc

Learning Goals

AntConc is software that makes it straightforward to search and analyze large numbers of text files. In this tutorial, you'll download the program and use it to plot the use of a term or phrase in different works of gothic fiction. You'll also see how to search for words when they appear in the context of other terms of interest, and how to pop out the results window so you can compare results for multiple words or parameters.

Installing AntConc

  • If you haven't already installed AntConc, head to the AntConc website. Click on the link under Downloads that matches the computer system you are using.
  • Antconc.exe will download, and once it completes you'll be set to run it just by double-clicking on the file.

Data

Since one of AntConc's big selling points is its ability to analyze large amounts of text, and compare those texts to each other, I'll be loading in an entire directory of files of gothic fiction downloaded from Project Gutenberg. This information is in a zip file below that you should unzip as a folder somewhere that is easy to get to on your computer since you'll be navigating to it later.

Getting Started

  • The first thing that you'll need to do to start analyzing text is to upload the files you want to analyze into AntConc. Keep in mind that you can only load in plain text files, not pdfs, word documents, web pages, or any other file formats. So in your own research you'll need to make sure the items you want analyzed are converted to or downloaded in plain text (.txt) format. 
  • Go to File and choose Open Dir. On other occasions you may want to look at a single file or a few chosen files, and then you'd choose Open File(s). Open Dir allows you to open all of the files in a directory with AntConc at once.
  • Navigate to the folder you created when you unzipped GothicFiction.zip, select it and click OK.
  • When you select OK the section to the left of the window marked Corpus Files will populate with the names of all the files in the directory.
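If you ever want to check a folder of plain text files outside of AntConc, the loading step can be sketched in a few lines of Python. This is only an illustration of the idea, not how AntConc works internally; the function name load_corpus and the choice of UTF-8 encoding are my own assumptions.

```python
from pathlib import Path

def load_corpus(folder):
    """Return {file name: text} for every .txt file directly inside folder."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps going if a file contains stray non-UTF-8 bytes
        corpus[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return corpus

# Example use (the folder name is whatever you unzipped the data into):
# corpus = load_corpus("GothicFiction")
# print(len(corpus), "files loaded")
```

Anything that isn't a .txt file is skipped, which mirrors the plain-text-only restriction described above.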

Searching for One Term in Concordance

A concordance is a list of every place a word or phrase appears in a corpus (group of texts), with each occurrence shown alongside the words around it. The Concordance tab lets you see words in the context in which they are used in each document, but you have to supply the terms to search for. So it is most useful if you already know which terms interest you, rather than if you are trying to find the most frequently used words in a piece.

  • Pick a term whose use in Gothic fiction interests you. For my example I'll be using love. Type it in below Search Term, make sure Words is checked, and click on Start.
  •  The window will populate with parts of sentences that contain the word love under the heading KWIC (Keyword In Context) and to the right under File you can see the name of the file that the phrase appears in.
    The AntConc window containing sentences that contain the word love
  • If you want to see more of the sentence containing your search term, go to where it says Search Window Size and use the arrows to increase the number. I'll choose 100. Then click on Start again, and more of the sentence on either side will appear in your window.
  • Scroll until you find a sentence that you are interested in seeing in its entirety, and click on where love is highlighted in blue. 
  • Your AntConc window will switch to the File View tab and show you that term within the work you are interested in, as well as how many times it appears in that file.
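To make the idea of a keyword-in-context list concrete, here is a minimal Python sketch of the same kind of search. It is a rough stand-in rather than AntConc's own method: the function name kwic is mine, and I'm assuming the window is counted in characters on each side of the hit.

```python
import re

def kwic(text, term, window=30):
    """Return (left context, match, right context) for each whole-word hit of term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        hits.append((left, m.group(), right))
    return hits

sample = "I love the moor. She was loved, but love is rare."
for left, word, right in kwic(sample, "love", window=12):
    # Right-align the left context so the keyword lines up in a column
    print(f"{left:>12} | {word} | {right}")
```

Note that loved is not returned here, because the search matches the whole word love only. Raising window plays the same role as raising Search Window Size.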

Concordance Plot

A Concordance Plot allows you to see when in the narrative your search term is used by displaying bars across a box indicating where in the text it's used. Perhaps you want to see at a glance which of the novels have weddings at the end: you could use wedding as your search term and look for plots with a lot of black lines towards the end of the box. If you are looking at only one work and want to see which parts of the book prominently feature one character rather than another, you could compare the plots for each of those terms.

  • Click over to the tab Concordance Plot. If there is not a search term currently being displayed, type in love under Search Term and select Start
  • Scroll through the results and check out the variation between books. Dr. Jekyll and Mr. Hyde has nearly no mentions of the term. In The Picture of Dorian Gray, it is used mostly in the second quarter. For Dracula it isn't quite as clustered as in Dorian Gray, but is used at a pretty steady clip throughout the book. 
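The idea behind the plot can be sketched in Python: record where each hit falls as a fraction of the text's length, then mark those positions along a fixed-width bar. This is only an illustration under my own assumptions (the function names and the text-based bar are hypothetical, and positions are measured in characters).

```python
import re

def hit_positions(text, term):
    """Return each whole-word hit's position as a fraction of the text length."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.start() / len(text) for m in pattern.finditer(text)]

def text_plot(text, term, width=40):
    """Draw a one-line plot: '|' where a slice of the text has a hit, '.' elsewhere."""
    cells = ["."] * width
    for pos in hit_positions(text, term):
        cells[min(int(pos * width), width - 1)] = "|"
    return "".join(cells)

# Example use with one novel already read into a string called novel_text:
# print(text_plot(novel_text, "love"))
```

A bar with marks bunched in one region corresponds to the clustering you see for Dorian Gray, while evenly spread marks correspond to the steadier use in Dracula.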

Searching for Variants of a Term

If you simply type in the word love, select Words, and search, you'll only get uses of the word in that exact form. You won't get uses of loves, loved, loving or other variants. There are a couple of different ways you can make sure you're getting a more complete view of the use of a term by searching for its other forms as well.

Wildcards. A wildcard is a symbol like * or ? that stands in for other characters, letting the search match words that begin or end with your term. To use one in this case, instead of typing love, you'd type lov* to search for words that start with lov followed by any other combination of characters.

  • In the Concordance window type in lov* as the search term, and click Start.
  • Now the number of hits has gone up from 2600 to 3794. Scroll through the results and see the other terms that you're getting with it. You'll see that love, loved, loves  and loving are now included so you're getting other verb forms, but also words like lover, lovely, and loveliness, so if you are very strictly only interested in the word as describing an emotion of a character, this is too scattershot of a solution for you to get those variants. 
     

Advanced Search:  This will allow you to add multiple terms to be searched. You can type them in or even load them from a list that you've saved to a text file. 

  • Click on Advanced next to Search Term, and in the Advanced Search window that opens, click on the box marked Use search term(s) from list below, and then type in each term you are interested in, separating them by pressing enter.
    In the Advanced Search window, the words love, loving, loved and loves are entered in the search term window.
  • Click OK and you'll return to the search window. The box below Search Term will be greyed out since you've added the terms you want in another window. If you click on Start, it will narrow the search to just those items. You'll see that your hits have now been narrowed to 3162 and when you scroll, words like loveliness and love-lorn will now be gone. 

Use of | to connect terms: If you only have a few terms and don't want to upload a list, you can just enter the terms in the Search Term box and separate them with the | symbol. This symbol is called a pipe.

the terms love|loves|loving|loved in the search term window
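The difference between the wildcard search and the listed terms can be sketched with Python regular expressions. The patterns below are rough equivalents that I've written for illustration; they are not AntConc's own syntax, and the sample sentence is made up.

```python
import re

text = "She loved him; he loves the lovely moor, loving its loveliness, and called it love."

# lov* : any word starting with "lov", whatever follows
wildcard = re.compile(r"\blov\w*", re.IGNORECASE)

# love|loves|loving|loved : only the listed forms, matched as whole words
listed = re.compile(r"\b(?:love|loves|loving|loved)\b", re.IGNORECASE)

print(wildcard.findall(text))
print(listed.findall(text))
```

The first pattern also picks up lovely and loveliness, while the second returns only the four verb forms, which is the same trade-off described above.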

Searching for Phrases

The Search Term boxes in AntConc allow for you to type in not just single words, but also phrases. If you're trying to trace the use of a phrase, or the use of someone's full name, this can be very helpful. I'll be showing you below how to look at the different words used after loves or loved depending on if the person in question is male or female. 

  • Type into the search box he loves|he loved|she loves|she loved and click Start.
  • Scroll through the results and see if you notice any patterns. If you aren't getting enough information from the sentences available, you can increase the Search Window Size and search again. 
  • In this case, only a few of the books have multiple hits for both versions of the phrase. You can click on Concordance Plot to see which ones might be suitable for closer study by this method.
  • If you are working with a small sample size, you might want to add searches with character names before the word love so you don't leave anything out. AntConc is only searching for the words he and she, and it has no knowledge of a character's gender when a name or a pronoun such as I or you is used. You can create this list in a text file and load it into the Advanced Search.

Advanced Search: Context Words

You can go deeper into this with the Collocates tab later, but if you only want hits for your search term when it falls within a certain number of words of another term of interest, you can use the Advanced search to do so. I'll be doing this with love and dark, since we're looking at gothic fiction and those topics are commonly connected in that genre. You can also do this with a character and different adjectives or verbs, or with a location and an adjective. Maybe you want to see how often it's mentioned that the castle is on the moor, or how often someone is actually described as dark and handsome. You can narrow down how close the words have to be to each other by changing the Context Horizon.

  • Click on Advanced next to the search term box, and in the window that opens click on the box next to Use Context Words And Horizons in the lower half of the box. 
  • You can't use wildcards in this instance, so you'll have to add the terms one by one: black, blacker, blackness, dark, darken, darkened, darkening, darker, darkest, darkness, dim, gloom, night. After typing each one, click on Add. Go ahead and add other synonyms or leave some of these out depending on what you're interested in.
  • Below that box is the Context Horizon. This is the number of words allowed between your search term and a context word for it to still count as a hit. The default is 5L and 5R (five words to the left and to the right), but use the arrows to stretch it to 10L and 10R.
  • Click on Apply, and then in the Concordance window press Start. You'll be able to see the words in the context of the sentence and the work they appear in. This won't return the largest number of results, but feel free to try out any combinations you're interested in.
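The logic of context words and horizons can be sketched as follows. This is a simplified illustration, not AntConc's implementation: the function name, the way I split text into words, and the sample sentence are all my own assumptions.

```python
import re

def hits_with_context(text, term, context_words, left=5, right=5):
    """Return word positions where term occurs with a context word inside the horizon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    wanted = {w.lower() for w in context_words}
    found = []
    for i, tok in enumerate(tokens):
        if tok != term.lower():
            continue
        # Words up to `left` before the hit and up to `right` after it
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        if wanted.intersection(window):
            found.append(i)
    return found

sample = ("In the dark night his love grew cold. Years later and far away "
          "from that old house, love returned.")
print(hits_with_context(sample, "love", ["dark", "gloom"]))
print(hits_with_context(sample, "love", ["dark", "gloom"], left=20, right=20))
```

With the default horizon only the first love counts as a hit. Widening the horizon lets the second one in, which is why a larger Context Horizon returns more results.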

 

Clone Results

 

If you are interested in one set of search results and want to be able to compare it to another, click on the button Clone Results at the bottom of the window. The results will pop out into a new window. You can conduct a new search in the original AntConc window and then compare the two side by side. 
A search results window and the cloned results window of a previous search results window side by side

Saving Results

You can save the results of a search to a text file by choosing File and then Save Output to Text File. Note that you'll probably need to copy and paste this text into a .csv or Excel file yourself afterwards and add the column headers, as AntConc doesn't transfer over the headers for the data.
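If you'd rather not copy and paste by hand, the conversion can be sketched in Python. This assumes the saved file is tab-separated, which you should check by opening it first; the function name and the header names in the example are hypothetical and need to match whatever columns your own output has.

```python
import csv

def add_headers(txt_path, csv_path, headers):
    """Copy a tab-separated results file to CSV, writing headers as the first row."""
    with open(txt_path, encoding="utf-8", newline="") as src, \
         open(csv_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(headers)
        # QUOTE_NONE stops quotation marks in the novels' text from confusing the reader
        for row in csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row:  # skip blank lines
                writer.writerow(row)

# Example use (file and column names are placeholders):
# add_headers("antconc_results.txt", "antconc_results.csv", ["Hit", "KWIC", "File"])
```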

These tools above will let you see the context that your chosen words or phrases are used in for the works in your corpus. The tools in the next module Clusters/N-grams will let you know which phrases are most popular so that you might gather new phrases to search for. 

Learning Goals

AntConc can be used not just to search for terms you know you have an interest in but also to analyze overall trends within the work. It isn't limited only to the uses of single terms but can be used to look for that word in two or more word phrases, known as n-grams. N in this context means a number, so it's a stand-in to let you know that the phrase has multiple words. For example, bigrams are two word phrases, trigrams are three words, and so on. N-gram just lets you include all those options in one bucket term. You'll see how to look for n-grams within the text using this tab.

Searching for Clusters/N-Grams

In the previous exercise, you were able to search for words of interest to you in the corpus and see the frequency of their use, and the context of their use in the different novels that make up your Gothic Fiction corpus. The Clusters/N-Grams tool in AntConc will allow you to see what phrases the word you are interested in is often a part of. 

  • Pick a term that you are interested in searching for in this corpus. I'm putting in the term dark and adding a * to the end so that it captures words like darkness, darkened, darkening etc because I'm interested to see if the term is really used as often in gothic literature as I would think. Then hit Start 
  • When it finishes, you'll get a screen that looks something like this:
    The results of the n-gram search, how often it is used, how many works it is used in, the search term dark or darkened followed by the word used with it.
  • The field Cluster contains the phrase that your keyword is part of. Rank is the place for that phrase overall depending on how frequently it's used (the information in the field Freq) and how many of the files it is used in (the information in the field Range). You can also tell from the top line of information how many different clusters were found with that word (in the example's case 1037) and the total number of tokens involved (1976)
  • One interesting result that I see here is that "dark eyes" ranks as the fourth most used phrase with dark at the beginning, not just because it is used 35 times in the corpus but also because it is used across 16 books. 
  • By default, the search results are sorted by Freq, but if you want to instead have it sorted by the total number of books a phrase appears in, go to the dropdown underneath Sort by (this is below the search term window), select Sort by Range and then click on Sort
  • If you want to gather more information on the context in which a phrase is used, you can click on the cluster, and AntConc will make it a search term for the Concordance window. For instance, if you click on dark eyes you'll see that Charlotte Bronte was a big fan of reminding you of a certain male character's dark eyes in Jane Eyre.
    The phrase Dark Eyes and the Concordance table for it
  • You can also click over to Concordance Plot and see where the use of this phrase is placed within the novels. 
  • Clicking on any of the phrases in the Concordance window will take you to the File View tab and let you see where the word is used.
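To show what Freq and Range mean, here is a small Python sketch that counts two-word clusters beginning with a term across several files. It is a simplified illustration under my own assumptions: the function name is mine, the term is matched exactly (no wildcard), and the two tiny sample texts are made up.

```python
import re
from collections import Counter

def clusters(corpus, term, size=2):
    """Count clusters of `size` words that start with `term`.

    corpus maps file name -> text. Returns {cluster: (freq, range)}, where
    freq is the total number of uses and range is the number of files.
    """
    freq = Counter()
    files = {}
    for name, text in corpus.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - size + 1):
            if tokens[i] == term:
                cluster = " ".join(tokens[i:i + size])
                freq[cluster] += 1
                files.setdefault(cluster, set()).add(name)
    return {c: (n, len(files[c])) for c, n in freq.items()}

sample = {
    "a.txt": "His dark eyes met her dark eyes.",
    "b.txt": "Dark eyes and a dark night.",
}
print(clusters(sample, "dark"))
```

Here dark eyes is used three times across two files, while dark night is used once in one file, so it would rank lower on both Freq and Range.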

Modifying your Search

You can do further customization of your search to explore the different uses of your term of interest

A list of the search term options (listed below) that are available on your cluster n-grams search tool

Search Term Position: By default, AntConc searches for phrases where your search term is on the left. Under Search Term Position you can switch that to On Right, or leave both boxes unchecked and AntConc will search for the key word on either side of a phrase.

Min Freq. : By default, the Min. Freq. option is set to one, meaning that if a phrase is used even one time, it will be included in your results. You may want to change that to be 2, 3, or more depending on how often you want a term to be used before you see it in the results.  

Min Range: By default, the Min. Range option is set to one, so if a phrase is used in even one document in your corpus, it'll be included as a result. If you are only interested in phrases that are used in at least a certain share of the corpus (say, a quarter of the books), you could change this number to that minimum number of books.

Cluster Size: Since this is a tool that allows you to see when your chosen word is part of a phrase, you'll want to be able to control how many other words are part of that phrase. Using Min and Max you can change how many words are part of the phrase you're looking at. For instance, if you are interested in both two and three-word phrases you can change the max to 3

Play with these search options and see how it changes your results. 
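The effect of Min. Freq., Min. Range and the sort order can be sketched as a filter over a table of results. Everything here is illustrative: the function name is mine, and apart from dark eyes (35 uses across 16 books, from my results above) the numbers are invented for the example.

```python
def filter_clusters(results, min_freq=1, min_range=1, sort_by="freq"):
    """results maps cluster -> (freq, range). Keep rows meeting both minimums, then sort."""
    column = 1 if sort_by == "freq" else 2
    kept = [(c, f, r) for c, (f, r) in results.items()
            if f >= min_freq and r >= min_range]
    return sorted(kept, key=lambda row: row[column], reverse=True)

results = {
    "dark eyes": (35, 16),
    "dark and": (60, 20),    # made-up numbers
    "dark hair": (40, 9),    # made-up numbers
    "dark abyss": (1, 1),    # made-up numbers
}
print(filter_clusters(results, min_freq=2))
print(filter_clusters(results, min_range=10, sort_by="range"))
```

Raising either minimum drops the one-off phrases, and sorting by range instead of frequency can change which phrases come out on top.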

Clone Results

You can compare two different words or set of search parameters by using the button marked Clone Results

This pops out the window containing the search results that you've done and then you can do another search in the main window and see how your two searches compare. 

The search results window and the Clone Results window compared for two different searches.

Saving Results

You can save the results of a search to a text file by choosing File, then Save Output to Text File. Note that you'll probably need to copy and paste this into a .csv or Excel file yourself afterwards and add the headers, as AntConc doesn't transfer over the headers for the data. Without them it will be difficult to tell what the table is showing.

Learning Goals

AntConc can be used not just to search for terms you know you have an interest in but also to analyze overall trends within the work.

Searching for Collocates

Collocates are words or terms that occur together more often than would be expected by chance.

  • After loading in your corpus, go to the Collocates tab and type the term that you are most interested in into the search bar. I've chosen wife to see the terms used near women described that way in gothic fiction.
  • You'll get a message saying that first, AntConc needs to jump to the Word List tool and generate a word list. Select OK. This tool doesn't just need to know how frequent your key word is, but also how frequent each word is, so that it can analyze how often your keyword and another word appear together relative to how much they appear in total separately throughout the works. 
  • There will be a delay while your files are processed, and once it finishes, you'll see a list of the words collocated with your word of choice. 
  • The measures for these results are Rank, which is the order of the list according to the stat you've chosen to sort by; Freq, which is the number of times the two words are used within the span you've set; Freq(L), the number of times the collocate appears to the left of the search term; and Freq(R), the number of times it appears to the right. Stat is the measure of how strongly these words are collocated.
  • By default the list is sorted by the statistical measure of how strongly collocated the words are, and you'll notice that many of the top-ranked words have a low frequency and are uncommon words. This is because the default measure, Mutual Information, weights a collocate by comparing how often it appears with your search term to how often it appears in the corpus overall. This means a very infrequent word can get the highest score if it just happens to be used close to your keyword on the only occasion, or one of the few occasions, that it is used.
    For example, as with the other tools, if you click on the collocate, you can see the words used together in context. So when you click on dogmatize which has the highest rating, you'll see that it's included in a context where wife is used twice in quick succession, but then never again. 
    Concordance hit for dogmatize - He does not dogmatize to his wife" "My wife is...
  • You can change which measure is used to figure out the level of collocation by going to Tool Preferences, selecting Collocates Preferences and changing the Selected Collocate Measure to T-Score.
  • When you change this, note that the list of terms now has very different words with a high score. The T-Score measure is mostly concerned with how often the words appear together, and gives no extra weight to a collocate that rarely appears without your key word. So the words near the top are more likely to be commonly used words.
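The contrast between the two measures is easier to see with numbers. The sketch below uses the common textbook forms of Mutual Information and T-Score; I'm assuming these are close to what AntConc calculates, but its exact formulas may differ in detail. All of the counts are invented for illustration.

```python
import math

def expected(node_freq, colloc_freq, corpus_size, span):
    """Co-occurrences expected by chance if the two words were unrelated."""
    return node_freq * colloc_freq * span / corpus_size

def mutual_information(observed, exp):
    """How many times more often than chance the pair occurs, on a log2 scale."""
    return math.log2(observed / exp)

def t_score(observed, exp):
    """How far the observed count is above chance, scaled by its uncertainty."""
    return (observed - exp) / math.sqrt(observed)

corpus_size, wife_freq, span = 1_000_000, 500, 10   # invented counts

# A rare word used only twice in the corpus, both times near "wife"
rare_exp = expected(wife_freq, 2, corpus_size, span)
# A common word used 20,000 times, 150 of them near "wife"
common_exp = expected(wife_freq, 20_000, corpus_size, span)

print("rare:  ", mutual_information(2, rare_exp), t_score(2, rare_exp))
print("common:", mutual_information(150, common_exp), t_score(150, common_exp))
```

With these made-up counts the rare word gets the higher Mutual Information score and the common word gets the higher T-Score, which matches the change in ranking you see when you switch measures.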

Modifying Your Search

Other than changing which measure is used for how collocated a given pair of words are, there are other modifications you can make.

Window Span: If you want to cast a larger net as to how closely the words appear you can make the number larger, but if you want to narrow how quickly in succession the words appear you can narrow it to as little as 1 word (which would be the same as your n-grams). The check box for Same makes it so that you can narrow both measures at the same time

Collocate Options

Min. Collocate Frequency: This sets how many times a word must appear near your search term before it shows up on the list. The default is 1, but if you scroll through the list you'll see that a huge number of words have appeared with the keyword at least once, so it is likely they aren't all significant.

  • Re-do your search by increasing the Min. Collocate Frequency to 5 and decreasing the Window Span  to 3 on each side and then click on Start.
  • In the case of wife the search now only yields 184 words. You can decide how to set these options by looking at samples of your corpus to see how far apart related terms tend to be. If words within 1-3 words of wife seem to be discussing the wife and her actions most of the time, that's probably a good span. If by 5 or 6 words away the topic has usually shifted, then that span is probably too wide.
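The way Window Span and Min. Collocate Frequency shape the list can be sketched in Python. This is a simplified illustration rather than AntConc's own procedure: the function name and word-splitting rule are my assumptions, and the sample text is made up around the dogmatize example.

```python
import re
from collections import Counter

def collocates(text, term, span_left=5, span_right=5, min_freq=1):
    """Return {collocate: (freq, freq_left, freq_right)} for words near term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        left.update(tokens[max(0, i - span_left):i])       # words before the hit
        right.update(tokens[i + 1:i + 1 + span_right])     # words after the hit
    total = left + right
    return {w: (n, left[w], right[w]) for w, n in total.items() if n >= min_freq}

sample = "He does not dogmatize to his wife. My wife is kind to his wife."
print(collocates(sample, "wife", span_left=3, span_right=3))
print(collocates(sample, "wife", span_left=3, span_right=3, min_freq=3))
```

With a minimum frequency of 1, every nearby word is listed, including dogmatize. Raising the minimum to 3 leaves only his and to, which shows how quickly the list shrinks.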

Just like with the Concordance tab, you can use wildcards, or use the Advanced search to add multiple variants of a term to widen your results. 

Clone Results

You can compare two different words or set of search parameters by using the button marked Clone Results

This pops out the window containing the search results for your latest search setup. Then you can do another search in the main window and see how your two searches compare. 

The cloned results for one search result compared to the search results window

Saving Results

You can save the results of a search to a text file by choosing File, and Save Output to Text File. Note that you'll probably need to copy and paste this into a .csv or Excel file yourself afterwards. AntConc doesn't transfer over the headers for the data, so you'll need to note them down by hand to know what each column means.

Learning Goals

AntConc can be used not just to analyze the corpus that you give it as a whole, but also to see how individual works within it compare to a larger corpus. If you have a group of newspaper articles on the same subject but from different locations, you can see which words are common in one location's articles versus another's, or whether the discussion of that topic has a universal vocabulary. You can see whether works from the beginning of an author's career differ in vocabulary from those towards the end. Keyword List allows you to do that by letting you compare one work to a reference corpus of other works so you can see which words are used more often in the one work as opposed to the other works. You can also compare a whole group of works to a whole other group of works and get the same information.

Getting Started

In the previous tutorials, you loaded the entire group of files from the zip file into AntConc because you were analyzing patterns across the corpus. With Keyword List you'll start by opening only the file for the work you want compared to the rest of the corpus.

  • Go to File -> Open File and navigate to the folder where you unzipped the text files from the GothicFiction zip file.
  • Select the book you want to have analyzed against the rest of the corpus from the list of files and click Open. I'll be looking at 1815_Frankenstein_MaryShelley, so please use that if you want to follow along, but if there is a different file you want to try instead, the steps are the same. 
  • When you've done this, the Corpus Files list will just list the file that you've chosen. This is the target file or the one you want to be analyzed against your reference corpus.
    The Corpus Files list only lists Frankenstein
  • Click over to Keyword List. Right now it is blank, and even if you try to enter in a search term, you'll get an error message telling you that since there's no reference corpus word list, you'll have to generate one of those first

Setting up Keyword List With a Reference Corpus

A reference corpus is the group of documents that you want your target file to be analyzed against, to determine which words are unusually frequent in the target file compared to the reference corpus. In this case we'll be using the other pieces of Gothic Fiction that we've downloaded as the reference corpus, but there is also the option to upload a word list if that's what you have for a document.

  • Click on Tool Preferences on the menu bar and in the window that opens, select the option for Keyword List
  • Underneath Keyness Values, there are options for what method AntConc will use to figure out which words are key (or used uniquely more often) in your target corpus as compared to the reference corpus. For now, just leave on the default of Log-Likelihood.  This is also where you can set the Threshold Value to only include a subgroup of the words like the top 100 words used in your target file. In this case, leave it on the default of All Values. 
  • Click on the box next to Show negative keywords. This will mean that not only will the keyword list contain words that are used more frequently in your target file than in the corpus you're comparing it to, but at the bottom of that list, you'll also get the words that are used less frequently in your target file than in the corpus you're comparing it to. 
  • Where it says Reference Corpus, leave Use raw file(s) selected, and then click on the button marked Add Files. If you have a whole directory of files you want to add, you can choose Add Directory, but in this case, since Frankenstein is in the directory with all the other works of gothic fiction, we don't want to skew the results by including it in the reference corpus we want to analyze it against.
  • In the window that pops up to let you choose the files, use Ctrl and Shift to select all the works except for Frankenstein and click on Open 
  • The window below Reference Corpus will be populated with the names for each of the files you selected as well as the total amount of files selected
  • Click on the button for Load and a green bar will list the progress as it processes them. When it finishes the check box next to Loaded will be checked. Click Apply
  • Leaving the Search Term blank in Keyword List, click on Start. You'll get a message saying that AntConc needs to jump to the Word List Tool. Click OK.
  • When it finishes processing, you'll have a results list where the top is words that are used more frequently in Frankenstein than on average throughout the reference corpus.
    The Keyword List From Frankenstein, i and my are near the top along with a bunch of names.
  • Take a look at the results. Some are names, which makes sense: unless a name happens to be common for major characters across many different novelists, the names in Frankenstein will be used uniquely often in that book. I and my are also very high, with my having the highest Keyness score. This makes sense if you know a bit about the book. The narration is first person, and it contains a lengthy story from the creature which is also told in first person. This might give you an avenue of future research: was telling Frankenstein in the first person an unusual choice compared to the narrators of other gothic fiction?
  • Terms such as ice (a setting for part of the story) and misery and miserable score highly too. The higher the Keyness score is, the more uniquely often it is used in your target corpus vs. the reference corpus. 
    • If you scroll down to the bottom you'll see eventually that the color of the words switches to blue.  These are the negative keywords or words that are less frequently used in your target file Frankenstein than the reference corpus it is being compared to.  
      The negative keywords, starting with she her and mr
      In this case the top 20 negative keywords have lower Keyness scores (ranging from 273 down to 64), so the difference isn't quite as severe as for the top 20 positive keywords (1155 down to 114). Words like she, her, miss and mrs ranking so highly could indicate that there are fewer female characters in this work, and the inclusion of a lot of simple words might indicate that Shelley tends to use complex words instead of one-syllable ones.
  • By default, the list is sorted by Keyness score, but you can change this by clicking on the dropdown below Sort by, choosing Sort by Freq, and then clicking on the Sort button. When the list is sorted by Freq, you can instead see which of the keywords are also among the most frequently used words in Frankenstein. Only a few of the keywords occur hundreds of times.
  • If there is a specific term that you are curious about in your target file, you can type it into the Search Term box, but instead of clicking on Start, choose Search Only.
  • I'll choose monster. When I search for that word, I see that it ranks in the 40s when the list is sorted by Keyness, so it is not all that distinctive to Frankenstein.
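
If you're curious what a word list and a positive or negative keyword boil down to, here is a minimal Python sketch. It is not AntConc's own code. It counts words in two toy strings (made-up stand-ins for the target file and the reference corpus, not the real texts) and compares how often a word occurs per 1,000 words in each. The function names word_list and relative_freq are hypothetical, and the simple tokenizer is an assumption; AntConc's definition of a word may differ.

```python
import re
from collections import Counter


def word_list(text):
    """Lowercase the text and count each word (letters and apostrophes only)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def relative_freq(counts, word):
    """Occurrences of `word` per 1,000 tokens in a word list."""
    total = sum(counts.values())
    return 1000 * counts[word] / total if total else 0.0


# Toy stand-ins for the target file and the reference corpus (invented sentences).
target = word_list("I saw my creature and my misery grew as I fled")
reference = word_list("She told her sister that she saw Mr Smith and her aunt")

for word in ("my", "she"):
    t = relative_freq(target, word)
    r = relative_freq(reference, word)
    # More frequent in the target suggests a positive keyword, less frequent a negative one.
    direction = "positive" if t > r else "negative"
    print(f"{word}: target {t:.1f} vs reference {r:.1f} per 1,000 words -> {direction} candidate")
```

A word that is relatively more frequent in the target is a candidate positive keyword, and one that is relatively less frequent is a candidate negative keyword. The Keyness score then measures how strong that difference is.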

Clone Results

Now you have some results that you're interested in looking at further and comparing to other searches. Click on Clone Results, so AntConc will pop out your results into a new window. You'll use this second window to see how different tool preferences will change the results that you get.

Changing Keyword List Tool Preferences

Previously, we used the default keyword generation method, Log-Likelihood, which calculates the keyness of a word by comparing its count to the count of all other words, in both the target corpus and the reference corpus. The other method available is Chi Squared, which is structured a bit differently from Log-Likelihood. In general, Log-Likelihood is recommended, since words with small expected frequencies can have their keyness miscalculated with Chi Squared.
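
To make the two methods a little more concrete, here is a hedged Python sketch of one common way each statistic is computed for a single word. This is an assumption about the general technique, not AntConc's source code, and AntConc's exact formulas (for example, any continuity correction) may differ. The counts in the example are invented for illustration.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood keyness.

    a, b: count of the word in the target / reference corpus
    c, d: total number of words in the target / reference corpus
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:  # a zero count contributes nothing (and log(0) is undefined)
        total += a * math.log(a / expected_target)
    if b > 0:
        total += b * math.log(b / expected_reference)
    return 2 * total


def chi_squared(a, b, c, d):
    """Pearson chi-squared on the 2x2 table (this word vs. all other words),
    without a continuity correction."""
    observed = [[a, c - a], [b, d - b]]
    row_totals = [c, d]
    col_totals = [a + b, (c + d) - (a + b)]
    n = c + d
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                total += (observed[i][j] - expected) ** 2 / expected
    return total


# Invented counts: a word seen 120 times in a 75,000-word target
# and 300 times in a 3,000,000-word reference corpus.
print(log_likelihood(120, 300, 75_000, 3_000_000))
print(chi_squared(120, 300, 75_000, 3_000_000))
```

Both functions return 0 when the word is equally common in both corpora, and larger values as the difference grows. Neither value says which corpus uses the word more; that direction comes from comparing the relative frequencies.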

  • To see the difference between these two methods, go to Tool Preferences, change the Keyword Generation Method to Chi Squared, and click Apply.
  • Then when it brings you back to the Keyword List menu, click on Start.
  • When comparing the two windows you'll see the results are now different. The Chi-Squared results are on the left and Log-Likelihood are on the right. 
  • In the Chi-Squared test, proper names are higher up on the chart and in fact most of the top 10 are now names. 

Using Multiple Target Files

With the keyness test you aren't limited to comparing one file to a larger corpus. You can compare multiple files that share an attribute you're interested in to multiple files without that attribute. Maybe you're interested in books from one decade vs. another. Maybe you're interested in books from female vs. male writers. In this case, you'll be comparing books from the beginning of the corpus, which goes back to the 18th century, to those at the end, which date from the late 19th and early 20th century.

  • First, you'll need to close the file you already have open. Go to File -> Close All Files.
  • Then, go back up to File -> Open File(s). Select from 1764_CastleOfOtranto... to 1824_PrivateMemoirsAndConfessions... from the Gothic Fiction folder and click Open.
  • You'll also need to change your reference corpus to the late 19th and early 20th century items that you are interested in comparing your target corpus to. Go to Tool Preferences and use Clear and then Clear List to get rid of your existing reference corpus. 
  • Click on Add Files and select 1859_TheWomanInWhite... through 1911_LairOfTheWhiteWorm... from your Gothic Fiction folder, then click on Open, and then Load.
  • When your corpus has finished loading, switch Keyword Generation Method back to Log-Likelihood and choose Apply.
  • Go to Word List and click on Start to regenerate the word list with your new target and reference corpora.
  • Then click back to Keyword List and click Start to see the new results.
  • In the positive keywords, many again are names, but also there are archaic word forms like thy and thou, and words less often used in the later era like upon. You can scroll through to get a sense of which words were used more often in gothic literature of the late 18th and early 19th century vs that of the late 19th to early 20th century
    The results for this keyword search, mostly names.
  • When you scroll down far enough, the negative keywords will start to show up. These are words used much less often in the target corpus (early gothic literature) than in the reference corpus (later gothic literature).

  • This list does have a few proper names, but there are other interesting items, like two different forms of the word doctor, suggesting that this was a more common profession for characters in the latter half of the 19th century than before. We and us are also present, so perhaps later gothic literature has more stories involving collective action by characters rather than action by individuals.
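
If your corpus is larger than this one, picking the target and reference files by hand gets tedious. Because the file names here begin with a year, one option is to sort them into the two groups with a short script and then copy each group into its own folder to load into AntConc. The Python sketch below assumes every file name starts with a four-digit year followed by an underscore. The file names in the example are hypothetical placeholders, not the actual files in the zip, and split_by_year is an illustrative name.

```python
import re


def split_by_year(filenames, target_range, reference_range):
    """Sort file names into target and reference lists by their leading year.

    Each range is an inclusive (first_year, last_year) tuple. Files without a
    year prefix, or with a year in neither range, are left out of both lists.
    """
    target, reference = [], []
    for name in filenames:
        match = re.match(r"(\d{4})_", name)
        if not match:
            continue
        year = int(match.group(1))
        if target_range[0] <= year <= target_range[1]:
            target.append(name)
        elif reference_range[0] <= year <= reference_range[1]:
            reference.append(name)
    return target, reference


# Hypothetical file names that follow the year-prefix pattern.
files = [
    "1764_ExampleA.txt",
    "1820_ExampleB.txt",
    "1840_ExampleC.txt",
    "1897_ExampleD.txt",
    "notes.txt",
]
target, reference = split_by_year(files, (1764, 1824), (1859, 1911))
print("Target:", target)
print("Reference:", reference)
```

The year ranges match the ones used in the steps above. Anything dated between the two ranges is deliberately skipped, which keeps the early and late groups clearly separated.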

 

Saving Output

At any point you can save your output lists for comparison or graphing in other programs by using File -> Save Output to Text File. You'll want to paste the contents into a csv or xls file so you can add a heading explaining what each column is, since that information does not carry over to the text file.
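
If you'd rather not paste by hand, the header can also be added with a few lines of Python. This is only a sketch under two assumptions that you should check against your own saved file: that the columns are separated by tabs, and that they appear in the order Rank, Freq, Keyness, Keyword. The sample rows are invented, and add_header is a hypothetical helper name.

```python
import csv
import io

# Assumed column order. Compare it with the columns AntConc shows on screen
# and adjust if your version lists them differently or includes more columns.
HEADER = ["Rank", "Freq", "Keyness", "Keyword"]


def add_header(tsv_text, header=HEADER):
    """Turn tab-separated output into CSV text with a header row on top."""
    rows = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Invented sample rows standing in for a saved output file.
sample = "1\t10\t25.5\texampleword\n2\t7\t18.2\tanotherword\n"
csv_text = add_header(sample)
print(csv_text)
```

To use it on a real file, you could read the saved text file into a string, pass it to add_header, and write the result to a new .csv file that opens directly in a spreadsheet program.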

Things to Keep in Mind

The kind of distant reading that this text analysis tool allows isn't meant as a substitute for other knowledge of the subject. It is a way to look for patterns in texts when you have some subject knowledge but not the time to read the 40, 100, 300, or 3,000 documents that pertain to the subject.