DAsH

Research Guide for DAsH (or digital humanities) resources and tools

Learning Goals

How an author uses a word in context can tell you something about what it represents to them. A tool called WordTree lets you visualize the context in which a word is used throughout a whole work. It can show you every place a word you've chosen appears in a work, along with the text that comes before or after it. It is a project by Jason Davies, a DH scholar, based on a technique by Martin Wattenberg and Fernanda Viégas.

You can put in one of the character's names from a work and see what they are usually doing or saying as compared to another character. You can see if the author tends to use dark as an adjective or noun and when. You can see who the author describes as taking certain actions. All you need is pasted-in text.

Below I've attached the text of The Tempest since it is freely available online, though you can use something as long as that or as short as a few news articles or speeches. The Tempest is what I'll be using in my example, but if you have another work you want to experiment with, go ahead.

It is important to note, though, that there isn't yet any kind of export function for this tool; all you can do is take a screenshot. So it's better suited to exploring than to detailed technical analysis.

Data

This is a copy of Shakespeare's The Tempest that I downloaded as plain text from Project Gutenberg. I've already stripped out the additional language regarding when it was uploaded and other introductory material. Be sure when you're undertaking any kind of text analysis that you get rid of things like introductions or 'About the Authors' if you can so it doesn't skew your results. 
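If you would rather do this cleanup with a script than by hand, the idea can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the file marks the body of the work with lines beginning `*** START OF` and `*** END OF`, which many Project Gutenberg plain-text files do, though the exact wording varies. The function name and the sample text are made up for illustration.

```python
import re

# Assumption: the body of the work sits between marker lines like these.
# The wording differs between files, so the patterns are deliberately loose.
START_RE = re.compile(r"^\*\*\* ?START OF.*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\* ?END OF.*$", re.MULTILINE)


def strip_gutenberg_boilerplate(raw_text):
    """Return only the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(raw_text)
    end = END_RE.search(raw_text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw_text)
    return raw_text[begin:finish].strip()


# Invented sample standing in for a downloaded file.
sample = (
    "Title: The Tempest\n"
    "Some introductory material.\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK THE TEMPEST ***\n"
    "First line of the play.\n"
    "Last line of the play.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK THE TEMPEST ***\n"
    "License text follows here.\n"
)

print(strip_gutenberg_boilerplate(sample))
```

It is still worth skimming the result by eye, since introductions or notes can sit inside the markers too.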

Getting Started

  • Go to https://www.jasondavies.com/wordtree/ in your web browser of choice.  There are a couple of sample choices on the main page if you want to first see how Word Tree operates.
  • In a word tree, the keyword is in the largest font on the page, with branches leading from it to the phrases that start with that keyword.  If the keyword is followed (or, in the case of a reverse tree, preceded) by the same word multiple times, that word will appear in a bigger font and then branch off into the sentences that continue after that word. Here's an example from Bob Dylan's "Blowin' In the Wind".
    "How many" is in large letters, and from it branches the phrases and sentences that come after it
  • 'How many' is the keyword phrase. Since it is followed multiple times by 'times' and 'years can', those words are larger and branch off into the phrases that follow them. At the bottom of the page are the sentences starting with words that don't occur repeatedly after 'How many', so they are smaller and don't contain further branches. If this still seems a little unclear, try the other pre-loaded texts on the front page of WordTree until it is clear what this page is illustrating.
  • Open up the copy of The Tempest that you've downloaded, highlight all the text and copy it. 
  • Go to the box where it says Paste Text on the WordTree site and paste in the text of The Tempest. Note that there's a box that says Shareable that is checked by default. Here it makes sense to leave that box checked, since The Tempest isn't copyright protected and doesn't contain private information. But if you were instead pasting in transcriptions of personal interviews that contain sensitive information, or materials that you don't hold the copyright for, you'd want to uncheck that box.
  • When you have The Tempest pasted in, click on Generate WordTree!

Working in the Word Tree Interface

  • By default, it'll use the first word in your document, which is SCENE, as the keyword. The display shows the word tree in the panel taking up the left side, with the full text of the document on the right and the portion containing the first instance of the word highlighted. There is a scroll bar between the two, with a line with notches in it telling you where in the full text the keyword is found. The highlighted portion of the scroll bar shows what position in the text you are scrolled to.
  • You can hold down the shift key and click on any word either in the word tree or in the text on the side of the screen to make it the new root of the tree. Hold down the shift key and click on 'thunder' in that first sentence. The display will change to make thunder the root of the word tree. You'll see that the word is followed a few times by an entrance of a character, and that it's used before '-stroke' and 'and lightning' multiple times.
  • The word thunder is at the root of the word tree that branches out into different words that have followed the word thunder.
  • When you generate a WordTree, the default is to show you what sentence comes after the key words you are interested in. You can instead see the word that comes immediately before it by going up to where it says reverse tree next to the search bar and checking that box. Now you can instead see 'thunder' in relation to the words that come before it.
    In this word tree, thunder is at the far right and various other words branch from it to the left.
  • You aren't limited to the words that you see in the interface. Using the search bar, you can search for any word to see what context it's used in (and whether it's in the work at all). Try 'monster' by typing it into the search bar and pressing Enter.
  • You can see that it is often used before Cal speaks (Caliban is sometimes addressed as monster) or Trin or Steph for Trinculo and Stephano.  If you click reverse tree you can see the words and punctuation used before 'monster', that sometimes it's hyphenated with other terms, and that it can often appear at the beginning of a sentence,  which shows that it's a form of address. 
  • If you are interested in having only a briefer selection before or after the keyword you've made your root, you can click on the box before one phrase per line.
  • If you want to get a closer look at only one of the branches, say the one where 'The poor' precedes 'monster' just click on the phrase you're interested in, and the word tree will narrow to only show you that part in the interface, and highlight it within the text to the right. Just click on your root word 'monster' in order to remove this filter.
  • You can click to reverse the tree and then it will show you what follows that phrase 'poor monster'.
  • You can type a phrase into the search bar as well. For instance, 'beseech you' will show you who is being asked desperately to do something for the speaker.

The phrase Beseech you is the root followed by branches into the text after it.

Word Tree is a useful exploring tool, but it doesn't tell you on its own which words are most frequent or most frequently used together. You are limited to whatever list of words you already have in mind to search for, or what you discover by experimenting with the results of those searches. It can help you experiment and provides elegant illustrations for patterns that you've noticed in a work, but you can't download the results as a data set for offline use or for use in another program.
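If you do need counts you can save, the first level of branches in a word tree can be approximated with a short script. The sketch below is not how WordTree itself works; it is a simplified illustration that counts which word comes directly after (or, with `reverse=True`, directly before) a keyword. The function name and the sample sentence are invented, and the sketch ignores sentence boundaries and punctuation.

```python
import re
from collections import Counter


def neighbour_counts(text, keyword, reverse=False):
    """Count the word directly after (or before) each use of `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    target = keyword.lower()
    counts = Counter()
    for i, word in enumerate(words):
        if word != target:
            continue
        j = i - 1 if reverse else i + 1
        if 0 <= j < len(words):
            counts[words[j]] += 1
    return counts


# Invented sample text, not a quotation from the play.
sample = "A noise of thunder and lightning. Thunder rolls. More thunder and rain."

print(neighbour_counts(sample, "thunder"))                # words that follow 'thunder'
print(neighbour_counts(sample, "thunder", reverse=True))  # words that precede it
```

Because the result is an ordinary `Counter`, you could write it out to a spreadsheet, which the screenshot-only interface doesn't allow.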

Learning Goals

WordWanderer is an online program by Marian Dörk and Dawn Knight at Newcastle University that allows you to:

  • see the most frequently used words in a work
  • see which words are most often used near a word you choose as a keyword
  • see the context that keyword is used in (sort of like Word Tree in the previous tutorial, but more basic)
  • choose two keywords and see the words that both keywords appear near
  • filter the words you are performing these tasks on to only certain parts of speech, and to leave out other bits of punctuation

In this exercise, you'll use WordWanderer to look at the text of Frankenstein by Mary Shelley and see what distant reading of it using WordWanderer can highlight about the text. 

Data

This is a copy of Frankenstein by Mary Shelley I have downloaded from Project Gutenberg. I've undertaken the extra step of deleting the metadata (when it was posted, who transcribed it, etc) so that only the text of the novel will go into the Word Wanderer. It's a good idea to get rid of anything that isn't the actual text that you are studying before you put it into a text analysis mechanism.

Getting Started

  • Go to the website for WordWanderer -  http://wordwanderer.org/
  • It offers the option of either selecting one of the texts it has available or pasting in the text that you want analyzed. 
  • Open 1815_Frankenstein_MaryShelley.txt and copy the text inside. Paste it into the box on the center-right of the screen that says Paste your text here.
  • There's a box labeled turn text to lower case; check it. Turning all words to lower case makes the text easier for the program to analyze. Otherwise it will see a word beginning with a capital letter and classify it as a different word than that same word beginning with a lower case letter.  This will skew your frequency counts, as there will be one count for 'The' at the beginning of a sentence and another for 'the' in the middle of a sentence. You may not want to have this checked if you know there's a word within your project that means something different when capitalized, for instance if there's a character named 'Will' and you don't want to confuse the character with the word 'will'.
  • When the text is pasted into the box, click on the button that says wander.
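To see why the lower case option matters, here is a minimal Python sketch of a frequency count with and without it. It illustrates the general idea only and is not WordWanderer's own code; the function name and sample sentence are made up.

```python
import re
from collections import Counter


def word_frequencies(text, lowercase=True):
    """Count how often each word occurs, optionally ignoring capitalization."""
    if lowercase:
        text = text.lower()
    return Counter(re.findall(r"[A-Za-z']+", text))


# Invented sample: 'The'/'the' and 'Will'/'will' differ only by case.
sample = "The creature fled. Will said the creature will return."

lowered = word_frequencies(sample)
case_sensitive = word_frequencies(sample, lowercase=False)

print(lowered["the"])                                # 'The' and 'the' counted together
print(case_sensitive["The"], case_sensitive["the"])  # counted separately
print(case_sensitive["Will"], case_sensitive["will"])
```

With lower-casing on, the character 'Will' and the verb 'will' are merged into one count, which is exactly the trade-off described in the step above.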

Using the WordWanderer Interface

  • After the program is done running, the next screen that will load is a list of the most commonly used words in the text. The larger the font size on a word, the more often it is used. This tells you something about word frequency and what common topics and themes might be but you can take it a step further.
  • One of the larger terms on here is the word 'elizabeth', the name of Victor Frankenstein's betrothed. If you want to see the words most often used near her name, you can hover over the name and the display will change, with the words most commonly used with her name appearing outlined in a darker shade of purple. You see a lot of superlatives here, a lot of words for family relations, and a lot of words concerning negative circumstances like 'anguish', 'grief' and 'horror'. If you wanted, you could tally how many of these darker outlined words fall into positive, neutral or negative categories.
  • If you click on her name (the term you are interested in) you'll get a different diagram, with her name in the center. Words of a larger size are the ones that most often occur near 'elizabeth', and the words closest to the center are the ones that are the closest to her name in sentences. Below the diagram you can see every use of the keyword in context. You can see how associated she is with Victor's father, and with emotion-related words such as 'hope', 'horror' and 'sweet'.
  • Click back on the highlighted term in the middle 'elizabeth' in order to return to the initial interface
  • To see what words two different terms are both used in common context with, just click on one term and then the other and draw a line between them. You can only connect words that are used in proximity to each other, so when you hover over a word, your choices for what to connect it to are limited to those that are highlighted. In this case, choose 'thoughts' and 'feelings' by first clicking on 'thoughts' and then dragging the line that appears over to connect with 'feelings'.
  • The interface now switches to a display where each of the keywords goes to an opposite side of the chart. Within the chart are all the words that appeared commonly with both of the highlighted keywords. If the word is in a larger font, that means that combined, it was used many times between the two words. The words towards the center are commonly used near both keywords equally, those close horizontally to 'thoughts' are used more often near that word, and those close horizontally to 'feelings' are used more often near that word.

    Is there anything interesting that you notice about the words that were common to both? About the words used more often with feelings than thoughts?
  • Click on 'feelings' and then click on it again when it moves to the center of the interface to clear it back to the original screen
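The comparison in the steps above can be sketched in Python as counting the words that fall within a few words of each keyword and then keeping the ones both keywords share. This is only an approximation under stated assumptions: the source doesn't say how WordWanderer measures proximity, so the fixed `window` size, the function names and the sample sentence are all invented for illustration.

```python
import re
from collections import Counter


def nearby_words(text, keyword, window=3):
    """Count words appearing within `window` words of each use of `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            counts.update(before + after)
    return counts


def shared_neighbours(text, first, second, window=3):
    """Map each word found near both keywords to its (first, second) counts."""
    a = nearby_words(text, first, window)
    b = nearby_words(text, second, window)
    return {w: (a[w], b[w]) for w in a.keys() & b.keys()}


# Invented sample sentence, not a quotation from the novel.
sample = "my thoughts were gloomy and my feelings were gloomy too"

print(shared_neighbours(sample, "thoughts", "feelings", window=2))
```

A word whose two counts are lopsided would sit closer to one keyword in a chart like the one described above; a word with balanced counts would sit near the middle.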

Filtering on WordWanderer

By default, WordWanderer gets rid of certain common words but displays all other kinds of language (adjectives, nouns, verbs, etc.).  You can alter these settings to remove further words from the display and frequency counts, to allow those common words in, and to display only certain kinds of language by using the Options menu at the top of the screen.

Common words are often called stopwords and consist of things like conjunctions, pronouns and articles, such as 'and', 'the' and 'a'. It may be useful to you to leave the common words in because you're analyzing how different writers use them. You may want to get rid of a certain word or name from the interface. You also might want to narrow down what is shown to adjectives or verbs, depending on what about a piece you are interested in analyzing. You'll try these out on the text of Frankenstein you've loaded in and see how the results change.

  • Go up to Options at the top of the interface and click on it. There are two sets of options on the menu, Show where you can pick which parts of speech are included in the interface, and Hide where you can exclude certain items.
  • " is one of the largest items on the results page right now, despite Punctuation not being checked as a part of speech to show. To correct this oversight, go to the Hide section and add " in the blank space that lets you add more words to the hide list. Add ' too and click Apply.
  • You'll see that the interface shifts to not contain those items. If you want them to return again, you can just uncheck the box. To see what it would look like for common words not to be hidden, uncheck that box and click Apply
  • Now other words are still contained in the interface but items such as i and of and to are much larger than anything else. So it isn't quite as useful to you to determine a word's relative frequency, since all other words are going to be of a smaller size than those.
  • But suppose you wanted to leave in these common pronouns and see how the words describing the actions differ? Go up to Options and change the boxes under Show to only be checked off for Verbs, Adverbs and Pronouns. Click Apply 
  • Now, if you want to see how a speaker describes their own actions, and how they describe the actions of someone else, you can click on i, he, and she and look at how the interfaces differ, both in the verbs associated with these different pronouns and the proximity to the subject.
  • Go back up to Options and restore it to its default, the boxes in the first two columns checked, and the common parts of speech now hidden.
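The effect of hiding common words can be illustrated with a small Python sketch. The stopword list here is a tiny made-up one, not the list WordWanderer uses, and the function name and sample sentence are also invented.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not WordWanderer's own list).
STOPWORDS = frozenset({"i", "of", "to", "the", "and", "a"})


def top_words(text, hide=frozenset(), n=3):
    """Return the `n` most frequent words, skipping any word in `hide`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in hide)
    return counts.most_common(n)


# Invented sample sentence.
sample = "I thought of the monster and I spoke to the monster of the ice"

print(top_words(sample, n=1))                  # common words left in
print(top_words(sample, hide=STOPWORDS, n=1))  # common words hidden
```

With the common words left in, 'the' crowds out everything else; with them hidden, 'monster' comes to the top, which mirrors what you see when you toggle the box in Options.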

Searching the Interface

Only a limited number of words will appear in the interface at a time, although Word Wanderer has searched the entire document. If you are interested in a word that didn't appear frequently enough in the work to be included in the initial interface you can still find it by typing it into the box with search for words written in it that appears at the top of the page. When the keyword is found, you can use the same tools to find out words that it is often used with.

  • Type face into the search box and take note of the words appearing around it
  • Click and hold on face, drawing a line between it and monster to see the words common between the two. Click on face and then see if the words that it shares with man are similar. 

You can take screenshots of any of the visualizations of word relationships you discover, but there isn't any kind of export function. This is a good tool for exploring and for illustrating concepts, but if you want to take the data you get from it and analyze it in other ways, one of the more complex programs detailed elsewhere in the Text Analysis tutorial section, such as AntConc, might be more useful.

Learning Goals

The previous two modules in this tutorial dealt with how to do text analysis when you have an individual text to analyze, such as one novel or play, or a representative amount of work by an author. With the visualization program Bookworm, you can analyze word use in an entire genre over time, comparing individual words or, in some instances, phrases. This program was created by several scholars at the Cultural Observatory. It is designed to be used on a whole corpus. A corpus is a large set of texts organized by topic, genre, source or other category. This can be books, speeches, periodicals or even TV and movie subtitles. Bookworm graphs the use of a term or phrase over time across whatever group of works is fed into it, and it is hosted in several places, each over a different corpus. Its code is freely available (open-source) and can be set up to analyze any corpus, but doing the setup yourself is extremely complicated, so for the general user one of the sites that hosts the program for their corpus of interest will do.

With this exercise, you'll see how to use Bookworm applied to different bodies of work hosted at several different sites. 

Using Bookworm at HathiTrust

HathiTrust is a digital library with scanned copies of hundreds of thousands of books from college research libraries all over the country. You can't read the full text of all of these books for copyright reasons, but searching them for key terms in text analysis projects like Bookworm falls within the fair use provisions of copyright law. Using Bookworm on HathiTrust's archives lets you search through these volumes en masse to track the use of a certain word or phrase over time. You can narrow the graph down to certain date ranges and certain genres. However, you can't get too granular in your selections (so no picking just 8 books from the 1920s and 8 from the 1940s by certain authors), so keep that in mind when planning to use it in a project.

  • Go to https://bookworm.htrc.illinois.edu. A page will load with a default search already filled in, so you can see a bit about how the tool works. At the top are the search terms, each in the color of the line on the graph that represents it. You can delete a term or add a new one using the plus and minus signs to the left. To the right of the search box is the filter being used.
    In the example, the blue line represents 'computer' being used in US publications and the orange represents 'computer' being used in UK publications. You can change these filters by clicking on the funnel shaped icon to the right and changing which filters are in use and what they are filtering for (only novels? only history? only science?). Further options are available at the top right that will do things like narrow the date range, change if the search is case sensitive, and change what the graph is measuring (total volumes containing your keyword? # of keywords used per million words? total mentions of keyword?).
    If you hover over the line you see the year the point you are over corresponds to, and if you click you can see information on the top volumes using the term, though not all items that contain it. 
    The bookworm interface described above.
  • To alter the search terms and filters to match the geographic area you're interested in, first click on the filter icon for each and then click on the box next to Publication Country. If some countries are already selected, click on each to unselect it. The selection should now read in All Volumes.
  • To see if there have been fluctuations in scholars studying Greek, Roman and Egyptian history, add 'Greek', 'Roman' and 'Egyptian' as your search terms. Put 'Greek' on the first line and 'Roman' on the second, then press the + sign to the left of the search box to create a new line. In that line type 'Egyptian', and then click the button that says Search.
  • You'll get a graph where the Roman line far surpasses the other two, and though the Greek line at times gets a bit closer to the Roman one, the Egyptian line stays hovering near the bottom of the chart.
    A graph of the use of the words Greek, Roman and Egyptian across all volumes
  • It is important, however, to note what parameters are set on the corpus that Bookworm is searching, and you can always check that to the right of the search term. In this case, it's searching everything HathiTrust has. So if an item was actually talking about modern Greece and used the term 'Greek', that would be counted. To get something a bit closer to your research question about history scholars, you'd want to make sure it was only searching works about history.
  • Go to the top line for the  'Greek' search term and click on the filter icon at the far right. Take a look at all the different restrictions you can place on where your term is being searched for. Scroll down to the box next to Class and click on it to see the options. Toward the top of the list there is one that says World History And History of Europe, Asia, Africa, Australia, New Zealand, Etc. This will narrow your search to only items classified as being about those subjects. Click on this option and it will add it to the class. You can add multiple classes to a search if needed, but let's stick with this one for now.
    Class option
  • Go into the filters for Roman  and Egyptian and make the same change to the Class to make it World History And History of Europe, Asia, Africa, Australia, New Zealand, Etc then scroll to Search at the top and click on it. It will take some time for Bookworm to process. 
  • The new graph still has the three with very different levels but it is interesting to note things like the use of 'Roman' experiencing more obvious peaks and valleys than measured for the other two in the 19th century, and that in the 1940 to 1960 range the term 'Egyptian' was rising while 'Greek' fell and 'Roman' fluctuated
    A different graph of the three terms
  • A larger graph like this can tell you which periods might make for interesting further examination. In this case, let's take a look at the period between 1940-1960.  You can narrow down your date range by going to the Dates menu above the graph, and moving the blue circles until it contains the date range of 1940-1960.
  • Change the Smoothing mechanism from 10 years down to 2 since you're dealing with a smaller amount of time. When you've changed this, select Search.
  • This more zoomed in view shows some interesting things, like briefly, 'Greek' was used more often than 'Roman' in history texts in 1944, something you missed in the more smoothed out graph. It also shows that 'Greek' mostly fell from 1946 onwards while the other two mostly rose until the end of the 1950s. 
    a more zoomed in version of the graph
  • For any larger project you'd want to graph other terms related to Roman, Egyptian or Greek history to see if they confirm or dispute the trends shown by this graph. For example, if it is ancient history in particular that you are interested in, do 'Pharaoh' and 'Caesar' display falls and rises like 'Egyptian' and 'Greek'?
  • You can create a link to this view of the graph by going to the icon that looks like a chain link, and can export a screenshot of it by clicking on the icon that looks like a cloud with an arrow pointing out of it.
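To get a feel for what the Smoothing setting does to a graph, here is a minimal Python sketch of a centred moving average. This is an assumption made for illustration: the source doesn't say which smoothing method Bookworm uses, and the function name and yearly figures below are invented.

```python
def smooth(values, span):
    """Average each point with up to `span` neighbours on either side."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - span)
        hi = min(len(values), i + span + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Invented yearly figures with a one-year spike in the middle.
yearly = [10, 10, 40, 10, 10]

print(smooth(yearly, 0))  # no smoothing: the spike is fully visible
print(smooth(yearly, 1))  # light smoothing: the spike is flattened
```

The wider the span, the more a short-lived peak gets averaged away, which is why a brief crossover like the one in 1944 only shows up once the smoothing is turned down.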

While this version of Bookworm does allow you to link and download, it has some important limitations you should keep in mind. One is that you cannot use it to search for phrases (or n-grams); you can search for single terms only. Another is that while you can narrow it to certain categories, you can't narrow it to a single work or to a set group of works. There are some tools on the HathiTrust website that let you pick and choose which (public domain) works you'd like included in your analysis, and we'll go into those in a later example, but Bookworm on this site can't be used that narrowly.

Using Bookworm to Search TV and Movies

Scholar Ben Schmidt was one of the designers for Bookworm, and at his own website, he hosts several digital tools using Bookworm or other programs. bookworm:movies is one he created that uses the Open Subtitles database to search through movies and television shows to see how vocabulary has changed in these media from 1960-2014. This can be relevant to your research if you're interested in communications, media and cultural history.

  • Go to http://movies.benschmidt.org/ 
  • There will be a default search already plugged in to illustrate how the site works, but you can explore the different options for narrowing your search filter to a certain genre or medium.
  • You use the + and - signs to add or delete search terms. Click on the box next to the + and - signs, and it will open a box showing which filters you can use.  You can restrict your search to movies or television, to a particular genre, or even to a particular television show or director.
  • You can use it to compare the use of a word in shows over time. Let's say I want to see if evidence being presented on cop shows is similar to the kind gathered in real life cases. All three CSIs are in this database so I could see how often words like 'DNA' or 'trajectory' or 'witness' were used. 
    • Search for the word 'DNA' in the three CSI shows available in the TV Show category. Enter 'DNA' as the search term in each search box, adding one more search box using the plus sign. Then, using the dropdown next to TV Show, select 'CSI: Crime Scene Investigation' for the first search term, 'CSI: Miami' for the next and 'CSI: NY' for the last, and click on Redraw Chart.

       
    • It looks like all three shows' use of 'DNA' in subtitles rose and fell at about the same rate until a dip in 2009, after which the two spinoffs began using it less and the main show's use climbed back to its original rate.
    • Click on the link that says Design Chart above the graph. Y: Words Per Million will let you change what's being graphed. Currently it's displaying how many words out of every million are our search term, 'DNA'. If I were interested in the percentage of episodes in which 'DNA' appeared, I could choose % of texts. If I just wanted to see, as an unweighted sum, how many times a character said it, I'd choose word count, and if I just wanted to see the number of episodes it appeared in, regardless of how many there were in total, I'd choose text count.
  • You can use Bookworm to compare the use of a phrase rather than a single word. Reload the site to clear your parameters, then enter 'single mother' as the search term on two lines. Under Medium, add the filter 'TV Show' to one line and 'Movie' to the other, and then click Search.
    You'll notice that for much of TV's history the term was almost entirely absent, until about the late 80s, when it starts appearing and then surpasses the use of the term in movies for a time. This doesn't necessarily mean that depictions of single mothers were absent; it could mean that this wasn't the term that was used. If you click a point where you hover, you'll see the work(s) with the top search result for that term. Not every movie and TV show appears here, so you'll want to make clear in any reporting of your results what you were searching.
  • Since you're interested in television, and judging by this graph there are no television matches before 1955, you may want to start the search for both items at that date (as interesting as it is that there appear to be movies in the silent and pre-Code eras that mention single mothers). To do this, go to the line you have set up for single mother in the movies, choose the filter icon, and select year.
  • From the options that follow, pick all the years from 1955 onward, by selecting them with a click. Unfortunately, there isn't another way to select a whole range. 
    In the filter for year, the years from 1955 onward are marked with a check mark
  • This makes the time frame that the graph has to cover smaller and therefore makes it easier to see the details of what the usage difference is between the two. 
    The graph covers a smaller time span so is more detailed
  • It's important to note that with this new version, you can only graph 3 different terms, so plan what questions you are asking accordingly
  • You can export your graph by clicking on the download arrow and choosing the file type you'd like it saved as. You can also click on the link up top if you'd like a link to your graph that you can share with people. This way they'd actually be able to click on any outliers and see which work's subtitles they came from.
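The four measurements offered under Design Chart (words per million, % of texts, word count and text count) can be worked through on a toy example. The Python sketch below is an illustration under stated assumptions, not Bookworm's code: each string stands in for one episode's subtitles, the sample lines are invented, the function name is made up, and the list of texts is assumed to be non-empty.

```python
def usage_metrics(texts, term):
    """Compute four ways of measuring how much `term` is used across `texts`."""
    term = term.lower()
    total_words = 0
    word_count = 0   # total mentions of the term
    text_count = 0   # number of texts mentioning it at least once
    for text in texts:
        words = text.lower().split()
        hits = words.count(term)
        total_words += len(words)
        word_count += hits
        if hits:
            text_count += 1
    return {
        "word_count": word_count,
        "text_count": text_count,
        "percent_of_texts": 100 * text_count / len(texts),
        "words_per_million": 1_000_000 * word_count / total_words,
    }


# Invented stand-ins for four episodes' subtitles.
episodes = [
    "the dna matches the dna sample",
    "no witness came forward",
    "run the dna test now",
    "check the trajectory again",
]

print(usage_metrics(episodes, "dna"))
```

Here 'dna' is said 3 times (word count) in 2 of the 4 episodes (text count, or 50% of texts), and 3 out of 19 words works out to roughly 157,895 per million. Such a high rate is only an artifact of the tiny sample, but it shows how the same data can look very different depending on the measurement you pick.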

You should read the blog post at the top that contains the disclaimers about where the information for this graph is coming from and how it can be used.

The more specific you go (just a particular director or just a particular TV show) the more complete your results will be. This isn't a database of all television shows or all movies, just the ones that were in the Open Subtitles database at the time that the program was constructed. If you are using a more narrow lens (just the Simpsons, say) you'll know better when a low % means that the term was actually used less, and when it means that there is a lack of data for that particular year for a certain genre.

Using Bookworm to search Vogue

This one is a bit more niche but for scholars of societal trends as shown through fashion, the Robots Reading Vogue tool created by Yale can be extremely illuminating. In the exercise below, we'll look and see if  trends in the use of a term in advertising occur before or after its use in features, and if pockets being a feature in women's fashion were ever popular or if women have always been stuck having to carry a purse. You can look at these trends alongside any other societal trends you may be studying.

  • Go to http://bookworm.library.yale.edu. A default display will show up, usually plotting the differing use of women and girls in the publication, though sometimes there are other examples. As in the other instances of this program that you've seen, when you hover over a line at a point you're interested in, it will tell you the year, and if you click you can see information on the texts in which the word turns up. If you are looking at this from a social scientist's angle, what do you make of the fact that girls rises steadily throughout the early 20th century to become more often used than women between 1945 and 1969?

Currently the Bookworm is set up to search all parts of Vogue, advertisements, articles, photo shoots and even letters to the editor for these terms. So if you wanted to see if different parts of the magazine, say the articles and the advertisements for example showed different trends for these terms, you can add filters, and add additional search boxes. 

  • Click the + sign to the left of each search term so that you now have both women and girls listed twice. 
  • Go to the filter icon to the right of each search term and under genre select Advertisement for one of the search lines for women and Article for the other. Do the same for girls. Click on the Search button
  • Take a look at how restricting the search to these two categories changes things. For one thing, it shows that there were different points at which girls ceased being a term of choice for articles and for advertisements, with the term all but disappearing from advertisements around 1980, and that there were several periods of time where it was used in advertisements while women was used in articles.
  • If you decide you want to include several different genres on the same line, for example to look at both letters to the editor and articles as compared to advertisements, you can click on the filter and use the dropdown that appears when you click on the line next to genre to also select Letters to the Editor. When you do this, however, it doesn't seem to impact the overall trend for either, but it is something to keep in mind when designing how you'll search.
  • Now, to compare different terms, go back to the filter icon and de-select the options you've previously selected. Replace the terms you have in each with pocket, pockets, handbag and purse. I've included both pocket and pockets because both get used when talking about having a handy pouch in your clothing, while handbag and purse seem to be talked about in the singular. I ran this with handbags and purses as options and it just cluttered up the graph, as both were used less than their singular versions at all times. Click on the Search button.
    Graph for different terms

It looks like both handbag and purse rose throughout the first half of the 20th century, taking over from both forms of pocket in the 50s, then briefly making a comeback in the 60s before dropping back off. Handbag has been on the rise since 1986.

  • You can also use multiple-word phrases. For instance, if you make your search terms knee length and ankle length, you can see how those lengths for skirts, pants and dresses have ebbed and flowed. Take note, though, that in each of these graphs the words-per-million scale has gone down quite a bit, so knee length and ankle length were not commonly used terms.

Play around with other fashion or gender-related vocabulary words that you're curious about and see what you can find with this site.
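
If you'd like a feel for what a "words per million" figure means, here is a minimal Python sketch. It is not how Bookworm itself is implemented (the guide doesn't describe that). It simply assumes the usual definition: occurrences of a word or phrase divided by the total words for that year, times one million. The function name `per_million` and the tiny sample texts are made up for illustration.

```python
import re

def per_million(texts_by_year, phrase):
    """Return {year: occurrences of `phrase` per million words}.

    Assumes "words per million" means hits / total words * 1,000,000.
    """
    target = phrase.lower().split()
    n = len(target)
    rates = {}
    for year, text in texts_by_year.items():
        # Lowercase and split into simple word tokens; a hyphen splits a word,
        # so "knee-length" and "knee length" are counted the same way.
        words = re.findall(r"[a-z']+", text.lower())
        if not words:
            rates[year] = 0.0
            continue
        # Count every position where the phrase appears word for word.
        hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
        rates[year] = hits / len(words) * 1_000_000
    return rates

# Hypothetical sample data, just to show the shape of the input.
sample = {
    1950: "Knee-length skirts and knee length dresses",
    1960: "ankle length",
}
print(per_million(sample, "knee length"))
```

Because the rate is relative to the total number of words, a term can look rare on the graph even when it appears in many issues.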

Learning Goals

Though there are other tools that let you explore n-grams and word frequencies with more complexity, Textalyser is useful if you are at a computer where you can't download any programs and you just want a quick n-gram analysis or word frequency count for a piece. It is simple to use, and its results are straightforward. You'll be using this tool to look at word frequencies and n-grams in Roger Ebert's no-star reviews.

Data

Though Textalyser will also let users paste in text or a website URL for the item they want analyzed, in this case you'll be provided with the text file below of Roger Ebert's reviews of movies that are 20% or less on Rotten Tomatoes to explore. If you'd rather look at another document, feel free. Just remember that if it's a file you're uploading, it will need to be plain text (.txt).

Getting Started

  • Textalyser has a very simple interface and is easy to use. Go to http://textalyser.net/, click where it says Choose File and navigate to where you've saved TwoThumbsDown on your computer.
  • There are options you may want to set before analyzing:
    • You can apply your own stop list, or none at all, by using Apply Stoplist. Leave it on the default of English for now.
    • You can specify particular words you want analyzed, or limit how many words are analyzed. In this case, set the number to 40.
    • You can check the box for Exhaustive polyword phrases, which will list all the n-grams in the work, even those that were only used once. Leave this unchecked for now.
  • Click on Analyze the text to get your analysis.
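
If you're curious what a stop list and a top-40 word count are doing behind the scenes, the idea can be sketched in a few lines of Python. This is only an illustration, not Textalyser's actual code: the `top_words` function is hypothetical, and the stop list here is a tiny made-up sample rather than the English list the site uses.

```python
from collections import Counter
import re

# A tiny illustrative stop list (an assumption; the real English list is much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def top_words(text, limit=40, stop_words=STOP_WORDS):
    """Return the `limit` most frequent words, skipping anything in the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(limit)

# Usage sketch, assuming the file is saved as TwoThumbsDown.txt in the current folder:
# with open("TwoThumbsDown.txt", encoding="utf-8") as f:
#     for word, count in top_words(f.read(), limit=40):
#         print(word, count)
```

Passing an empty set as `stop_words` is the equivalent of using no stop list at all.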

Analyzing Results

After Textalyser is done analyzing the text, you'll see a long list headed Textalyser Results at the bottom of the page.

  • The first section is basic statistics such as the total word count, the number of different words, average syllables, sentence length, etc. Though this might not be as useful on its own, if you are planning on analyzing a group of different books, you'll probably want them to have similar word counts, or else come up with some way to weight the works. There are also readability and complexity statistics.
  • The next section contains the top words used in the text. Though words such as "the" have been eliminated as stop words, you'll see that some of the top words are still fairly basic, such as you, like, and what. Not surprisingly, movie and film are very frequent.
  • It's followed by statistics on the most common word lengths in characters and on syllable counts. You can see this isn't an author who uses very elaborate words, as one- and two-syllable words are most common, though the distribution of character length is a bit more varied.
  • After you've scrolled past this, you'll get to the n-grams, or phrases of two or more words. The two-word phrases don't tell us all that much, but when you scroll to the three-word phrases you can see that there are some phrases Ebert repeats frequently. "not funny when" and "is not funny" indicate that this is one of his more universal complaints about bad movies, while "consider a scene" suggests that's one of his ways of introducing a particularly illustrative bad scene. These common phrases could be compared to other critics' no-star reviews or to Ebert's positive reviews.
  • At the end is simply an unfiltered count of each word's frequency.
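
To make the idea of an n-gram concrete, here is a minimal Python sketch that counts three-word phrases. It is an illustration under assumptions, not Textalyser's method: the hypothetical `ngram_counts` function ignores punctuation and lets phrases run across sentence boundaries, which may differ from what the site does.

```python
from collections import Counter
import re

def ngram_counts(text, n=3, min_count=2):
    """Return (phrase, count) pairs for n-word phrases seen at least `min_count` times."""
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words across the text and count each phrase.
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(gram, count) for gram, count in grams.most_common() if count >= min_count]

# Made-up sample sentence, not a quotation from the reviews.
print(ngram_counts("It is not funny. This is not funny when it tries.", n=3))
```

Setting `min_count=1` lists every phrase, including those used only once, which is the same idea as checking Exhaustive polyword phrases.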

Exporting

While this site doesn't let you compare works to each other directly or visualize what's revealed, you can easily copy and paste any of these charts into an Excel document or Google Sheets. From there you can graph or filter the data if you want.
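
If you ever produce word counts yourself rather than copying them from the results page, a CSV file is one simple format that both Excel and Google Sheets can open. The sketch below uses Python's built-in csv module. The function name `write_counts_csv`, the column headings, and the file name are all assumptions for illustration and have nothing to do with Textalyser itself.

```python
import csv

def write_counts_csv(rows, path):
    """Write (word, count) pairs to a CSV file with a header row."""
    # newline="" is what the csv module expects, so rows aren't double-spaced on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(rows)

# Usage sketch with made-up counts and a hypothetical file name:
# write_counts_csv([("movie", 2), ("bad", 2)], "word_counts.csv")
```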