How an author uses a word in context can tell you something about what it represents to them. A tool called Word Tree lets you visualize the contexts in which a word appears throughout a whole work. It shows you every place a word you've chosen is used, along with the phrases that come before or after it, arranged as the branches of a tree. It's a project by Jason Davies, a DH scholar, based on a technique developed by Martin Wattenberg and Fernanda Viégas.
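The core idea behind a word tree can be approximated in a few lines of Python. This is a minimal sketch, not Word Tree's actual code. It assumes simple lowercase word tokens and collects only the words that follow each occurrence of a keyword, which is the raw material a word tree groups into branches. The function name `word_tree_branches` and the sample text are made up for illustration.

```python
import re


def word_tree_branches(text, keyword, width=4):
    """Return the `width` words that follow each occurrence of `keyword`."""
    # Lowercase the text and split it into simple word tokens (letters and apostrophes).
    tokens = re.findall(r"[a-z']+", text.lower())
    target = keyword.lower()
    # For every position where the keyword occurs, join the next `width` tokens.
    return [
        " ".join(tokens[i + 1 : i + 1 + width])
        for i, tok in enumerate(tokens)
        if tok == target
    ]


sample = "Ariel sings to Ferdinand. Ariel obeys Prospero, and Ariel longs for freedom."
for branch in word_tree_branches(sample, "Ariel", width=3):
    print("ariel ->", branch)
```

Taking `tokens[max(0, i - width):i]` instead would give the words that precede each occurrence, which corresponds to the "before" side of the tree.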
You can enter one of the characters' names from a work and see what they are usually doing or saying compared with another character. You can check whether the author tends to use 'dark' as an adjective or a noun, and when. You can see whom the author describes as taking certain actions. All the tool needs is text pasted into it.
Below I've attached the text of The Tempest, since it is freely available online, but you can use anything from a text that long to something as short as a few news articles or speeches. The Tempest is what I'll be using in my example, but if you have another work you want to experiment with, go ahead.
It is important to note, though, that there isn't yet any kind of export function for this tool; all you can do is take a screenshot. So it's better suited to exploration than to detailed technical analysis.
This is a copy of Shakespeare's The Tempest that I downloaded as plain text from Project Gutenberg. I've already stripped out the notes about when it was uploaded and other introductory material. Whenever you undertake any kind of text analysis, remove things like introductions or 'About the Author' sections where you can, so they don't skew your results.
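If you prepare a Project Gutenberg file yourself, part of the cleanup can be scripted. This is a minimal sketch that assumes the file marks the start and end of the book with lines beginning `*** START OF` and `*** END OF`. The marker wording varies between files, so check yours first. Any introduction or editor's note that sits inside the markers still has to be removed by hand. The function name `strip_gutenberg` and the sample text are hypothetical.

```python
def strip_gutenberg(raw):
    """Return only the text between Project Gutenberg's START and END marker lines.

    Assumes marker lines that begin with '*** START OF' and '*** END OF';
    if no markers are found, the whole text is returned unchanged.
    """
    lines = raw.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1  # the book begins on the line after the START marker
        elif line.startswith("*** END OF"):
            end = i  # the book ends just before the END marker
            break
    return "\n".join(lines[start:end]).strip()


sample = (
    "Title: The Tempest\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK THE TEMPEST ***\n"
    "ACT I\n"
    "SCENE I. On a ship at sea\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK THE TEMPEST ***\n"
    "(license text)\n"
)
print(strip_gutenberg(sample))
```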
Word Tree is a useful exploration tool, but on its own it doesn't tell you which words are most frequent or which are most often used together. You are limited to the words you already have in mind to search for, or to what you discover by experimenting with the results of those searches. It can help you experiment and provides elegant illustrations of patterns you've noticed in a work, but you can't download the results as a data set for offline use or for use in another program.
WordWanderer is an online program by Marian Dörk and Dawn Knight at Newcastle University that allows you to:
In this exercise, you'll use WordWanderer on the text of Frankenstein by Mary Shelley and see what a distant reading of the novel can highlight.
This is a copy of Frankenstein by Mary Shelley that I downloaded from Project Gutenberg. I've taken the extra step of deleting the metadata (when it was posted, who transcribed it, etc.) so that only the text of the novel goes into WordWanderer. It's a good idea to remove anything that isn't the actual text you are studying before you put it into a text analysis tool.
By default, WordWanderer filters out certain common words but displays all other kinds of words (adjectives, nouns, verbs, etc.). Using the Options menu at the top of the screen, you can change these settings to remove further words from the display and frequency counts, to let the common words back in, or to display only certain parts of speech.
Common words are often called stopwords; they are mostly function words such as articles, pronouns, and conjunctions ('the', 'a', 'and'). You might want to leave them in if you're analyzing how different writers use them. Alternatively, you may want to remove a certain word or name from the interface, or narrow the display to adjectives or verbs, depending on what you're interested in analyzing in a piece. You'll try these settings out on the text of Frankenstein you've loaded in and see how the results change.
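To see what removing stopwords does to a frequency count, here is a minimal sketch in Python. The stopword list below is a tiny hypothetical one chosen for illustration. WordWanderer's own list is not given here, and real tools usually ship much longer ones. Filtering by part of speech would additionally require a part-of-speech tagger and is not shown.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "i", "my", "it", "was", "in"}


def word_frequencies(text, keep_stopwords=False, exclude=()):
    """Count words, optionally dropping stopwords and any extra words in `exclude`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    drop = {w.lower() for w in exclude}  # e.g. a character name you want hidden
    if not keep_stopwords:
        drop |= STOPWORDS
    return Counter(t for t in tokens if t not in drop)


sample = "I beheld the wretch, the miserable monster whom I had created."
print(word_frequencies(sample).most_common(3))
print(word_frequencies(sample, keep_stopwords=True).most_common(2))
```

Comparing the two printed lists shows how much of the raw count is taken up by words like 'the' and 'I', which is why tools hide them by default.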
Only a limited number of words appear in the interface at a time, although WordWanderer has searched the entire document. If you're interested in a word that didn't appear often enough in the work to be included in the initial display, you can still find it by typing it into the 'search for words' box at the top of the page. Once the keyword is found, you can use the same tools to see which words it is often used with.
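"Words it is often used with" can be measured in several ways. One simple approach is to count the words that fall within a fixed window on either side of each occurrence of the keyword. The sketch below does that. It is an assumption for illustration only, because how WordWanderer actually computes its associations isn't described here. The five-word default window, the function name `collocates`, and the sample text are all hypothetical.

```python
import re
from collections import Counter


def collocates(text, keyword, window=5, stopwords=frozenset()):
    """Count words appearing within `window` tokens on either side of `keyword`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Clamp the window to the start and end of the text.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in stopwords:
                counts[tokens[j]] += 1
    return counts


sample = "The creature fled. The creature wept alone. Misery followed the creature."
print(collocates(sample, "creature", window=2, stopwords={"the"}).most_common(3))
```

A larger window finds looser associations. A smaller one finds words used right next to the keyword.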
You can take screenshots of any visualizations of word relationships you discover, but there isn't any kind of export function. WordWanderer is good for exploring and illustrating concepts, but if you want to take the data it gives you and analyze it in other ways, a more full-featured program covered elsewhere in the Text Analysis tutorial section, such as AntConc, might be more useful.
The previous two modules in this tutorial dealt with text analysis when you have an individual text to analyze, such as one novel or play, or a representative body of work by one author. With the visualization program Bookworm, you can analyze word use across an entire genre over time, in some instances comparing individual words or phrases. Bookworm was created by several scholars at the Cultural Observatory and is designed to be used on a whole corpus. A corpus is a large set of texts organized by topic, genre, source, or some other category; it can consist of books, speeches, periodicals, or even TV and movie subtitles. Bookworm graphs the use of a term or phrase over time across whatever group of works is fed into it, and it is hosted at several sites, each with its own corpus. Its code is freely available (open-source) and can be set up to analyze any corpus, but doing the setup yourself is extremely complicated, so for the general user a site that already hosts the program for a corpus of interest will do.
In this exercise, you'll see how to use Bookworm on different bodies of work hosted at several different sites.
HathiTrust is a digital library with scanned copies of millions of books from college research libraries all over the country. You can't read the full text of all of these books for copyright reasons, but searching them for key terms in text analysis projects like Bookworm falls within the fair use provisions of copyright law. Using Bookworm on HathiTrust's archives lets you search through these volumes en masse to track the use of a certain word over time. You can narrow the graph to certain date ranges and certain genres. However, you can't get too granular in your selections (no picking just 8 books from the 1920s and 8 from the 1940s by certain authors), so keep that in mind when planning to use it in a project.
While this version of Bookworm does let you link to and download your results, it has some important limitations you should keep in mind. One is that you cannot use it to search for phrases (n-grams); you can search for single terms only. Another is that while you can narrow it within certain categories, you can't narrow it to a single work or to a set group of works. The HathiTrust website offers some other tools that let you pick and choose which (public domain) works to include in your analysis, and we'll go into those in a later example, but Bookworm on this site can't be used that narrowly.
Scholar Ben Schmidt was one of the designers of Bookworm, and on his own website he hosts several digital tools built with Bookworm or other programs. One he created, bookworm:movies, uses the Open Subtitles database to search through movies and television shows and see how their vocabulary changed from 1960 to 2014. This can be relevant to your research if you're interested in communications, media, or cultural history.
Read the blog post at the top of the page first; it contains disclaimers about where the data for this graph comes from and how it can be used.
The more specific your search (just a particular director or a particular TV show), the easier it is to judge how complete your results are. This isn't a database of all television shows or all movies, only the ones that were in the Open Subtitles database when the program was built. With a narrower lens (just The Simpsons, say), you'll know better when a low percentage means the term was actually used less and when it means there's a lack of data for that year in a certain genre.
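The percentage on a graph like this is a relative frequency: the number of times a term appears in a year divided by the total number of words from that year. The minimal sketch below shows why a thin year can mislead. It assumes a plain percentage of total words, and Bookworm's exact normalization may differ (per-million-words rates are also common). The function name `yearly_percentages` and the tiny sample corpus are hypothetical.

```python
from collections import Counter


def yearly_percentages(docs, term):
    """docs: iterable of (year, list_of_tokens).

    Returns {year: (percent_of_words_that_are_term, total_words_that_year)}.
    """
    hits, totals = Counter(), Counter()
    for year, tokens in docs:
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if t == term)
    return {y: (100 * hits[y] / totals[y], totals[y]) for y in sorted(totals) if totals[y]}


docs = [
    (1995, "doh homer doh marge bart lisa".split()),
    (1996, "homer marge".split()),
    (1997, "doh".split()),
]
for year, (pct, total) in yearly_percentages(docs, "doh").items():
    print(f"{year}: {pct:.1f}% of {total} words")
```

Reporting the total word count next to each percentage makes thin years obvious. Here, 1997's 100% rests on a single word of data.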
This one is a bit more niche, but for scholars of societal trends as expressed through fashion, the Robots Reading Vogue tool created at Yale can be extremely illuminating. In the exercise below, we'll check whether trends in a term's use in advertising come before or after its use in features, and whether pockets were ever a popular feature in women's fashion or women have always been stuck carrying a purse. You can look at these trends alongside any other societal trends you may be studying.
Currently, this Bookworm is set up to search all parts of Vogue for these terms: advertisements, articles, photo shoots, and even letters to the editor. So if you want to see whether different parts of the magazine (say, the articles and the advertisements) show different trends for these terms, you can add filters and additional search boxes.
It looks like both 'handbag' and 'purse' rose throughout the first half of the 20th century, taking over from both forms of pockets in the 1950s, then briefly making a comeback in the 1960s before dropping off again. 'Handbag' has been on the rise since 1986.
Play around with other fashion or gender-related vocabulary words that you're curious about and see what you can find with this site.
Though other tools let you explore n-grams and word frequencies in more depth, Textalyser is useful if you're at a computer where you can't install programs and you just want a quick n-gram analysis or word frequency count for a piece. It's simple to use, and its results are straightforward. You'll be using this tool to look at word frequencies and n-grams in Roger Ebert's no-star reviews.
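An n-gram is simply a run of n consecutive words, so an n-gram count is a frequency count over those runs. The minimal sketch below shows the idea. It is not Textalyser's implementation, and its tokenization is an assumption: lowercase words, punctuation ignored, so n-grams can span sentence boundaries. The function name `ngram_counts` and the sample text are hypothetical.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count every run of `n` consecutive words (n=1 gives plain word frequencies)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the token list: (w0, w1), (w1, w2), ...
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


sample = "This movie is bad. This movie is very bad."
print(ngram_counts(sample, n=2).most_common(2))
```

Note the bigram "bad this", which spans two sentences. Splitting the text into sentences first would avoid such cross-sentence pairs.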
Textalyser also lets you paste in text or just a website URL for the item you want analyzed, but in this case you'll be provided with the text file below, containing Roger Ebert's reviews of movies rated 20% or lower on Rotten Tomatoes. If you'd rather look at another document, feel free; just remember that if you're uploading a file, it needs to be plain text (.txt).
After Textalyser finishes analyzing the text, you'll see a long list at the bottom of the page under the heading Textalyser Results.
Textalyser doesn't let you compare works to each other directly, and it has no visualization option, but you can easily copy and paste any of its results tables into an Excel spreadsheet or Google Sheets. From there you can graph or filter the data if you want.
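If you compute frequency counts yourself instead of copying them from a web page, writing them to a CSV file lets Excel or Google Sheets open them directly. This is a minimal sketch using Python's standard `csv` module. It assumes the counts are held in a `collections.Counter`. The function name `counts_to_csv` and the two-column layout are illustrative choices.

```python
import csv
import io
from collections import Counter


def counts_to_csv(counts, fileobj):
    """Write a Counter as a two-column CSV (term, count), most frequent first."""
    writer = csv.writer(fileobj)
    writer.writerow(["term", "count"])
    for term, n in counts.most_common():
        writer.writerow([term, n])


counts = Counter({"worst": 3, "bad": 1})
buf = io.StringIO()  # stands in for a real file in this example
counts_to_csv(counts, buf)
print(buf.getvalue())
```

To write a real file, pass `open("counts.csv", "w", newline="", encoding="utf-8")` in place of the `StringIO` object. The `newline=""` argument is what the `csv` documentation recommends, and it avoids blank rows on Windows.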