AntConc is a program that makes it straightforward to search and analyze large numbers of text files. In this tutorial, you'll download the program and use it to plot the use of a term or phrase across different works of gothic fiction. You'll also see how to search for words when they appear near other terms of interest, and how to pop out the results window so you can compare results for multiple words or parameters.
Since one of AntConc's big selling points is its ability to analyze large amounts of text, and compare those texts to each other, I'll be loading in an entire directory of files of gothic fiction downloaded from Project Gutenberg. This information is in a zip file below that you should unzip as a folder somewhere that is easy to get to on your computer since you'll be navigating to it later.
A concordance is a list of every occurrence of a search term in a corpus (group of texts), shown with the words around it. The Concordance tab lets you see your terms in the context in which they are used in each document, but you have to supply the terms to search for. It is therefore most useful if you already know which terms interest you, rather than if you are trying to find the most frequently used words in a piece.
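If it helps to see the idea outside of AntConc, here is a minimal Python sketch of a keyword-in-context listing. It is not how AntConc is implemented; the function name kwic, the sample sentence, and the character-based context width are all assumptions made for illustration.

```python
import re

def kwic(text, term, width=30):
    """Return (left, hit, right) context tuples for each whole-word hit of term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    rows = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]   # characters before the hit
        right = text[m.end():m.end() + width]               # characters after the hit
        rows.append((left, m.group(0), right))
    return rows

# Hypothetical sample text, not taken from the tutorial corpus.
sample = "The castle was dark. She feared the castle at night."
for left, hit, right in kwic(sample, "castle", width=15):
    print(f"{left:>15} | {hit} | {right}")
```

Each row lines the search term up in the middle with its surroundings on either side, which is the same layout the Concordance tab shows.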
A Concordance Plot lets you see where in the narrative your search term is used by displaying bars across a box, one for each place in the text where it's used. Perhaps you want to see at a glance which of the novels have weddings at the end: use wedding as a search term and look for plots with a lot of black lines towards the end of the box. If you are looking at only one work and want to see which parts of the book prominently feature one character rather than another, you could compare the plots for each of those terms.
If you simply type in the word love, select Words, and search, you'll only get uses of the word in that exact form. You won't get uses of loves, loved, loving or other variants. There are a few different ways to get a more complete view of the use of a term by searching for its other forms as well.
Wildcards: A wildcard is a symbol like * or ? that lets the search engine know that your term can be the beginning or ending of a longer word. To use one in this case, instead of typing love, you'd type lov* in order to search for words that start with lov and have any other combination of characters after it.
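The idea behind the plot can be sketched in a few lines of Python: find the position of each hit, then scale it to the width of the box. This is an illustration only, with hypothetical function names and a very simple tokenizer, and it draws the box as a row of text characters rather than a graphic.

```python
import re

def hit_indices(text, term):
    """Return the token positions where term occurs, plus the total token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [i for i, tok in enumerate(tokens) if tok == term.lower()]
    return hits, len(tokens)

def ascii_plot(hits, total, width=40):
    """Draw a text 'box' with a | wherever a hit falls, scaled to width columns."""
    bar = ["."] * width
    for i in hits:
        bar[i * width // total] = "|"   # integer scaling keeps the index inside the box
    return "".join(bar)

# Hypothetical ten-word "novel" with a wedding at the start and at the end.
hits, total = hit_indices("wedding a b c d e f g h wedding", "wedding")
print(ascii_plot(hits, total, width=10))
```

A cluster of | marks at the right-hand edge would correspond to the black lines near the end of the box described above.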
Advanced Search: This will allow you to add multiple terms to be searched. You can type them in or even load them from a list that you've saved to a text file.
Use of | to connect terms: If you only have a few terms and don't want to upload a list, you can just enter the terms in the Search Term box and separate them with the | symbol. This symbol is called a pipe.
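To make the wildcard and pipe behaviour concrete, here is a small Python sketch that translates a search string such as lov*|ador* into a regular expression. It assumes that * stands for zero or more characters and ? for exactly one; AntConc's wildcard meanings can be changed in its settings, so treat those meanings, and the function name query_to_regex, as assumptions.

```python
import re

def query_to_regex(query):
    """Translate a query like 'lov*|ador*' into a whole-word, case-insensitive regex."""
    alternatives = []
    for term in query.split("|"):                 # the pipe separates alternative terms
        escaped = re.escape(term.strip())
        # Assumed meanings: * = zero or more word characters, ? = exactly one.
        escaped = escaped.replace(r"\*", r"\w*").replace(r"\?", r"\w")
        alternatives.append(escaped)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

# Hypothetical sample sentence.
sentence = "She loves him; he loved her. Lovely glove."
print(query_to_regex("lov*").findall(sentence))
print(query_to_regex("love|glove").findall(sentence))
```

Note that lov* picks up loves, loved and Lovely but not glove, because the term has to sit at the start of the word.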
The Search Term boxes in AntConc allow you to type in not just single words, but also phrases. If you're trying to trace the use of a phrase, or the use of someone's full name, this can be very helpful. I'll be showing you below how to look at the different words used after loves or loved depending on whether the person in question is male or female.
You can get more into this with the collocates tab later, but if you only want to look for your search term if it's within a certain number of words of another term of interest for you, you can use the Advanced search to do so. I'll be doing this with love and dark, since it's gothic fiction we are looking at and those topics are rather commonly connected in that genre. You can also do this with different adjectives or verbs and a character you're interested in or a location and an adjective you're interested in. Maybe you want to see how often it's mentioned that the castle is on the moor, or how often someone is really described as dark and handsome. You can narrow down how close the words have to be to each other by changing the Context Horizon.
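As a rough picture of what a context search does, the Python sketch below keeps a hit only if a second word appears within a set number of words on either side. The function name near, the sample sentence, and the decision to count the horizon in words are assumptions for illustration, not a description of AntConc's internals.

```python
import re

def near(text, term, context_term, horizon=5):
    """Return token positions of term that have context_term within horizon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        # Words up to `horizon` positions to the left and right of the hit.
        window = tokens[max(0, i - horizon):i] + tokens[i + 1:i + 1 + horizon]
        if context_term in window:
            hits.append(i)
    return hits

# Hypothetical sample sentence.
text = "His love was a dark and secret thing but her love was bright"
print(near(text, "love", "dark", horizon=3))   # only the first love is close enough
print(near(text, "love", "dark", horizon=6))   # a wider horizon catches both
```

Widening the horizon from 3 to 6 brings in the second use of love, which is the same trade-off you make when you change the Context Horizon.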
If you are interested in one set of search results and want to be able to compare it to another, click on the button Clone Results at the bottom of the window. The results will pop out into a new window. You can conduct a new search in the original AntConc window and then compare the two side by side.
You can save the results of a search to a text file by choosing File and then Save Output to Text File. Note that you'll probably need to copy and paste this text into a .csv or Excel file yourself afterwards, as AntConc doesn't transfer over the headers for the data.
The tools above let you see the context in which your chosen words or phrases are used in the works in your corpus. The tools in the next module, Clusters/N-grams, will show you which phrases are most common so that you can gather new phrases to search for.
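If you would rather not paste the output by hand, a short Python sketch like the one below can add a header row for you. It assumes the saved file is tab-separated, and the column names are hypothetical; check them against what the AntConc window shows for the tool you used.

```python
import csv
import io

def add_headers(raw_text, headers, delimiter="\t"):
    """Return CSV text made of a header row followed by the delimited rows of raw_text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers)
    for line in raw_text.splitlines():
        if line.strip():                       # skip blank lines
            writer.writerow(line.split(delimiter))
    return out.getvalue()

# Hypothetical saved output and hypothetical column names.
saved = "1\t52\tlove\n2\t40\tdark\n"
print(add_headers(saved, ["Rank", "Freq", "Word"]))
```

To use it on a real file, read the file's contents into a string first and write the returned text to a new .csv file.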
AntConc can be used not just to search for terms you know you have an interest in but also to analyze overall trends within the work. It isn't limited to single terms: it can also look for phrases of two or more words, known as n-grams. N in this context means a number, so it's a stand-in to let you know that the phrase has multiple words. For example, bigrams are two-word phrases, trigrams are three-word phrases, and so on. N-gram just lets you include all those options in one bucket term. You'll see how to look for n-grams within the text using this tab.
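For a concrete picture of what an n-gram count is, here is a minimal Python sketch. The function name ngrams, the simple tokenizer, and the sample sentence are assumptions for illustration.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Count every run of n consecutive words in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample sentence.
sample = "the dark night and the dark castle"
print(ngrams(sample, 2).most_common(3))   # bigrams
print(ngrams(sample, 3).most_common(3))   # trigrams
```

Changing n from 2 to 3 is all it takes to move from bigrams to trigrams, which is why the single term n-gram covers every phrase length.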
If you didn't do the previous exercise, please download the data now.
Since one of AntConc's big selling points is its ability to analyze large amounts of text, and compare those texts to each other, I'll be loading in an entire directory of files of gothic fiction downloaded from Project Gutenberg. This information is in a zip file below that you should unzip as a folder somewhere that it is easy to get to on your computer since you'll be navigating to it later.
If you didn't do the previous exercise, here is how to open your data.
In the previous exercise, you were able to search for words of interest to you in the corpus and see the frequency of their use, and the context of their use in the different novels that make up your Gothic Fiction corpus. The Clusters/N-Grams tool in AntConc will allow you to see what phrases the word you are interested in is often a part of.
You can customize your search further to explore the different uses of your term of interest:
Search Term Position: By default, AntConc will search for phrases where your search term is on the left. Under Search Term Position, you can switch that to On Right, or leave both boxes unchecked and AntConc will search for the term on either side of a phrase.
Min. Freq.: By default, the Min. Freq. option is set to one, meaning that if a phrase is used even one time, it will be included in your results. You may want to change that to 2, 3, or more depending on how often you want a phrase to be used before you see it in the results.
Min. Range: By default, the Min. Range option is set to one, so if a phrase is used in even one document in your corpus, it'll be included as a result. If you are only interested in phrases that are used across a certain share of the corpus (say, a quarter of the books), you could change this number to whatever minimum number of books that is.
Cluster Size: Since this tool lets you see when your chosen word is part of a phrase, you'll want to be able to control how many other words are part of that phrase. Using Min and Max you can change how many words are in the phrases you're looking at. For instance, if you are interested in both two- and three-word phrases, you can change the Max to 3.
Play with these search options and see how it changes your results.
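The options above can be pictured with a small Python sketch that counts phrases containing a search term, tracks how many documents each phrase appears in, and then filters on both numbers. Everything here (the function name clusters, its parameters, and the two tiny sample documents) is hypothetical and only meant to mirror the options, not AntConc's own code.

```python
import re
from collections import Counter, defaultdict

def clusters(docs, term, size=2, position="left", min_freq=1, min_range=1):
    """docs maps a document name to its text. Returns {phrase: (freq, range)}."""
    freq = Counter()
    seen_in = defaultdict(set)
    for name, text in docs.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - size + 1):
            gram = tokens[i:i + size]
            # The search term must be the first word (left) or the last word (right).
            anchor = gram[0] if position == "left" else gram[-1]
            if anchor == term:
                phrase = " ".join(gram)
                freq[phrase] += 1
                seen_in[phrase].add(name)
    return {p: (c, len(seen_in[p])) for p, c in freq.items()
            if c >= min_freq and len(seen_in[p]) >= min_range}

# Hypothetical two-document corpus.
docs = {"a": "dark night dark night dark castle", "b": "dark night"}
print(clusters(docs, "dark"))                  # every phrase starting with dark
print(clusters(docs, "dark", min_range=2))     # only phrases found in both documents
print(clusters(docs, "dark", position="right"))
```

Raising min_range to 2 drops dark castle, because it only occurs in one of the two documents.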
You can compare two different words or set of search parameters by using the button marked Clone Results.
This pops out a window containing the results of the search you've just done. You can then do another search in the main window and see how your two searches compare.
You can save the results of a search to a text file by choosing File, then Save Output to Text File. Note that you'll probably need to copy and paste this into a .csv or Excel file yourself afterwards, as AntConc doesn't transfer over the headers for the data, and without them it will be difficult to tell what the table is showing.
AntConc can be used not just to search for terms you know you have an interest in but also to analyze overall trends within the work.
If you didn't do the previous exercise, please download the data now.
Since one of AntConc's big selling points is its ability to analyze large amounts of text, and compare those texts to each other, I'll be loading in an entire directory of files of gothic fiction downloaded from Project Gutenberg. This information is in a zip file below that you should unzip as a folder somewhere that it is easy to get to on your computer since you'll be navigating to it later.
If you didn't do the previous exercise, here is how to open your data.
A collocation is a pair or group of words that occur together more often than would be expected by chance.
Other than changing which measure is used for how strongly a given pair of words is collocated, there are other modifications you can make:
Window Span: If you want to cast a larger net as to how closely the words appear, you can make the number larger. If you want to narrow how quickly in succession the words appear, you can reduce it to as little as 1 word (which would be the same as your n-grams). The check box for Same lets you change both measures at the same time.
Min. Collocate Frequency: This can be used to increase the number of times that a word has to appear alongside your search term before it appears on the list. The default is 1, but if you scroll through the list you'll see that a huge number of words have appeared with the keyword at least once, so it is likely they aren't all significant.
Just like with the Concordance tab, you can use wildcards, or use the Advanced search to add multiple variants of a term to widen your results.
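To show how a window span, a minimum frequency, and a "more often than chance" score fit together, here is a Python sketch. The score is a simple mutual-information style calculation (observed co-occurrences compared with the number expected if words were scattered at random); it is an assumption for illustration and is not guaranteed to match the exact formula behind any of AntConc's measures. The function name and sample text are also hypothetical.

```python
import math
import re
from collections import Counter

def collocates(text, node, left=5, right=5, min_freq=1):
    """Return {word: (observed, score)} for words found near node."""
    tokens = re.findall(r"[a-z']+", text.lower())
    totals = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        observed.update(tokens[max(0, i - left):i])        # words to the left
        observed.update(tokens[i + 1:i + 1 + right])       # words to the right
    n = len(tokens)
    results = {}
    for word, count in observed.items():
        if count < min_freq:
            continue
        # Co-occurrences expected by chance, given both frequencies and the span.
        expected = totals[node] * totals[word] * (left + right) / n
        results[word] = (count, math.log2(count / expected))
    return results

# Hypothetical sample text.
text = "dark love and dark love but bright hope"
print(collocates(text, "love", left=1, right=1))
print(collocates(text, "love", left=1, right=1, min_freq=2))
```

A score above zero means the pair turns up together more often than chance would predict; raising min_freq trims the words that only co-occurred once.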
You can compare two different words or set of search parameters by using the button marked Clone Results.
This pops out the window containing the search results for your latest search setup. Then you can do another search in the main window and see how your two searches compare.
You can save the results of a search to a text file by choosing File, then Save Output to Text File. Note that you'll probably need to copy and paste this into a .csv or Excel file yourself afterwards, as it doesn't transfer over the headers for the data, so you'll need to add them by hand to know what each column means.
AntConc can be used not just to analyze the corpus that you give it as a whole, but also to see how individual works within it compare to a larger corpus. If you have a group of newspaper articles on the same subject but from different locations, you can see which words are common in one location's articles versus another's, or whether the discussion of that topic has a universal vocabulary. You can see whether works from the beginning of an author's career differ in vocabulary from those towards the end. Keyword List allows you to do that by letting you compare one work to a reference corpus of other works, so you can see which words are used more often in the one work than in the others. You can also compare a whole group of works to a whole other group of works and get the same information.
If you didn't do the previous exercise, please download the data now.
Since one of AntConc's big selling points is its ability to analyze large amounts of text, and compare those texts to each other, I'll be loading in an entire directory of files of gothic fiction downloaded from Project Gutenberg. This information is in a zip file below that you should unzip as a folder somewhere that it is easy to get to on your computer since you'll be navigating to it later.
In the previous tutorials, you loaded the entire group of files from the zip file into AntConc because you were analyzing patterns across the corpus. With Keyword List, you'll initially open only the file for the work you want compared to the rest of the corpus.
A reference corpus is the group of documents that you'll want your target file to be analyzed against to determine the unique word frequencies that the target file has compared to the reference corpus. In this case we'll be using the other pieces of Gothic Fiction that we've downloaded as the reference corpus, but there is also the option to upload a word list if that's what you have for a document.
Now you have some results that you're interested in looking at further and comparing to other searches. Click on Clone Results, and AntConc will pop out your results into a new window. You'll use this second window to see how different tool preferences change the results that you get.
Previously, we used the default keyword generation method, Log-Likelihood, which works out the keyness of a word by comparing how often it occurs in the target and reference corpora with how often it would be expected to occur given the total number of words in each. The other method available is Chi-Squared, which is calculated a bit differently. In general Log-Likelihood is recommended, since words with small expected frequencies can have their keyness miscalculated with Chi-Squared.
With the keyness test you aren't limited to comparing one file to a larger corpus; you can also compare multiple files that have an attribute you're interested in to multiple files without that attribute. Maybe you're interested in books from one decade vs. another. Maybe you're interested in books from female vs. male writers. In this case, you'll be comparing books from the beginning of the corpus, which goes back to the 18th century, to those at the end, from the early 20th century.
When you scroll down far enough, the negative keywords will start to show up: words used much less often in the target corpus (early gothic literature) than in the reference corpus (later gothic literature).
This list does have a few proper names, but there are other interesting items as well, like two different forms of the word doctor, suggesting that this was a more common profession for characters in the later half of the 19th century than previously. We and us are also present, so perhaps later gothic literature has more stories involving collective action by characters rather than action by individuals.
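For readers who want to see the arithmetic, the Python sketch below uses the log-likelihood calculation that is commonly given for corpus comparison: work out the frequency you would expect in each corpus if the word were spread evenly, then measure how far the observed counts are from that. Whether this matches AntConc's output digit for digit is an assumption, and the sign convention (negative when the word is relatively rarer in the target) is added here to mark negative keywords. The word counts in the example are made up.

```python
import math

def log_likelihood(a, b, c, d):
    """Keyness of a word.

    a, b: occurrences of the word in the target and reference corpus.
    c, d: total number of words in the target and reference corpus.
    """
    e1 = c * (a + b) / (c + d)    # expected occurrences in the target
    e2 = d * (a + b) / (c + d)    # expected occurrences in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    # Negative sign marks words that are relatively rarer in the target.
    return ll if a / c >= b / d else -ll

# Made-up counts: a word used 30 times in a 1,000-word target
# and 10 times in a 1,000-word reference corpus.
print(round(log_likelihood(30, 10, 1000, 1000), 2))
print(round(log_likelihood(10, 30, 1000, 1000), 2))
```

Swapping the two counts flips the sign but not the size of the score, which is why negative keywords sit at the far end of the same list.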
At any point you can save your output lists for comparison or graphing in other programs by using File -> Save Output to Text File. You'll want to paste the output into a .csv or .xls file so you can add headings explaining what each column is, since that information does not carry over to the text file.
The kind of distant reading that this text analysis tool allows isn't meant as a substitute for other knowledge of the subject. It is a way to look for patterns in texts when you have some subject knowledge but not the time to read the 40, 100, 300, or 3,000 documents that pertain to it.