Research Guide for DAsH (or digital humanities) resources and tools

Getting Data

To do a project with digital data, you first need to get that data. Sometimes this is easy, for example when your research question grew out of data you already had (a group of novels, a census report, a play, etc.). Other times you'll be searching for data that can answer your question. An important part of this is getting the data in the format required by the program you'll use to analyze it. This will likely be a plain text (.txt) file for text analysis programs, or a CSV (.csv) file for numerical analysis, graphing, or mapping. Sometimes you'll be lucky and your data source will offer its information in those formats. In other cases you may need to get more creative.

Copyright can complicate this. In the United States, works published more than 95 years ago are generally in the public domain (the cutoff was 1923 for many years and has moved forward by one year each January since 2019), but works published after that will be trickier to get in a plain text format. For numerical data, government-collected data should usually be freely available in accessible formats. Some companies, however, may offer you small chunks of data for free but charge for the full dataset in the format you want. Others may post their data online in a format that isn't easy to copy and paste into the kind of programs you'll be using for analysis. When they are available, web scraping or APIs let you collect data on a much larger scale.
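As a rough illustration of web scraping, the sketch below uses only Python's standard library to pull the rows out of an HTML table and write them as CSV text. It is a minimal example under stated assumptions, not a general-purpose scraper: the class and function names (TableParser, table_to_csv, fetch_page) and the sample table are made up for illustration, and it assumes a page with a single simple table and UTF-8 text.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in an HTML string as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


def fetch_page(url):
    """Download a page as text. Assumes the page is encoded as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # A made-up table standing in for a real page; to scrape a live site,
    # replace this string with fetch_page("<address of the page>").
    sample = (
        "<table>"
        "<tr><th>Town</th><th>Population</th></tr>"
        "<tr><td>Springfield</td><td>1,200</td></tr>"
        "</table>"
    )
    print(table_to_csv(sample))
```

The CSV text can then be saved to a .csv file and opened in a spreadsheet or mapping program. Real pages are often messier than this (nested tables, data loaded by scripts), so a dedicated scraping library or the site's API, where one exists, may be a better fit.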

Data Sources