DSL Workshop: Text Analysis and Data Visualization using R
Digital Scholarship Lab Workshop: Text Analysis and Data Visualization using R
Holmes Finch and Matthias Raess
Thursday, March 22, 2018, 2-5 PM
Schwartz Complex, Bracken Library 105E
Ball State’s Digital Scholarship Lab will host a workshop on doing text analysis (and other forms of data visualization) with R. R is a powerful statistical programming language that can be used to analyze textual data, including novels, newspaper articles, interview transcripts, and collections of tweets. R enables researchers to detect patterns of language and meaning in large and complex sets of texts.
The workshop leaders will introduce participants to methods for analyzing textual data and creating plots from it (e.g. of the most frequent words) using the R statistical programming language. More precisely, participants will see how to get from textual data input to a finished plot, and how to run simple sentiment analyses on textual data. While some of the data come from ready-made datasets (e.g. the works of Jane Austen), participants will also be able to ‘witness’ live data collection from Twitter, from pre-processing steps to finalized plots and sentiment analysis. (Links to the scripts used in the workshop are at the bottom of this page.)
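As a rough illustration of the path from text input to a frequency plot and a simple sentiment score, here is a minimal sketch in base R. It is not the workshop's own script: the three sample sentences, the stop-word list, and the three-word sentiment lexicon are all made up for this example, and a real analysis would use a full corpus and an established lexicon.

```r
# Made-up sample documents (stand-ins for tweets, transcripts, or chapters).
docs <- c(
  "The workshop was great and the plots were great",
  "The data cleaning was slow and painful",
  "Great examples, but the wifi was slow"
)

# Tokenize: lower-case the text and split on anything that is not a letter
# or an apostrophe, then drop empty strings.
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z']+"))
  words[words != ""]
}
tokens_by_doc <- lapply(docs, tokenize)
all_tokens <- unlist(tokens_by_doc)

# Remove very common words (a tiny, hypothetical stop-word list).
stop_words <- c("the", "was", "were", "and", "but")
content_words <- all_tokens[!all_tokens %in% stop_words]

# Count the remaining words and sort from most to least frequent.
word_counts <- sort(table(content_words), decreasing = TRUE)
print(head(word_counts, 5))

# Plot the most frequent words as a bar chart.
barplot(head(word_counts, 5), las = 2,
        main = "Most frequent words", ylab = "Count")

# Simple lexicon-based sentiment: sum the scores of known words per document.
# The lexicon below is a toy assumption, not a published resource.
lexicon <- c(great = 1, slow = -1, painful = -1)
score_doc <- function(words) sum(lexicon[words], na.rm = TRUE)
doc_scores <- vapply(tokens_by_doc, score_doc, numeric(1))
print(doc_scores)
```

Each document gets one score: positive when words from the positive side of the lexicon dominate, negative when the opposite holds, and zero when they balance out or no lexicon word appears.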
The workshop will have two parts (participants can register for both parts or for Part 1 only).
Part 1: 2:00-3:30 PM – Introduction to using R for text analysis. This portion is for those mainly interested in learning more about text analysis.
Part 2: 4:00-5:00 PM – B.Y.O.D. (Bring Your Own Data). This portion is for those with a textual data set that is close to being ready to go and who are interested in trying some hands-on experimentation.
If you plan to participate in the second part of the workshop, please bring your laptop and make the following preparations:
- Download and install R and RStudio on your machine. For a quick-and-dirty explanation of this process, go to http://web.cs.ucla.edu/~gulzar/rstudio/.
- If you want to analyze Twitter data, please install the rtweet R package (http://rtweet.info/) on your machine and have a developer app set up on Twitter (https://apps.twitter.com/).
- Please determine in advance what variables you have in your text data and what research question(s) you seek to answer. Ideally, your data should be in either CSV format (if mostly quantitative) or DOCX/TXT format (if mostly text documents). If you need assistance organizing your data, we can provide that during the second part of the workshop.
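To check that your data are ready, it may help to try loading them into R beforehand. The sketch below uses base R only and writes two tiny stand-in files so that it runs anywhere; the file contents and column names (`id`, `text`, `retweets`) are hypothetical, and you would substitute the paths to your own files. Reading DOCX files requires an add-on package and is not shown here.

```r
# Create two small stand-in files (replace these with your own file paths).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,text,retweets",
             "1,Great workshop today,4",
             "2,Slow wifi again,0"), csv_path)

txt_path <- tempfile(fileext = ".txt")
writeLines(c("First line of a transcript.",
             "Second line of a transcript."), txt_path)

# CSV: one row per document; keep the text column as character, not factor.
tweets <- read.csv(csv_path, stringsAsFactors = FALSE)
str(tweets)  # shows which variables you have and their types

# TXT: readLines() returns one element per line of the file.
transcript <- readLines(txt_path, encoding = "UTF-8")

# Collapse the lines into a single string if you want one document per file.
transcript_text <- paste(transcript, collapse = " ")
print(transcript_text)
```

Looking at the output of `str()` is a quick way to answer the question of which variables your data set contains before the workshop.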
Contact the DSL at email@example.com with questions about preparing for the workshop.
For more on R, check out these resources:
R cheat sheets on various topics
Text mining in R ebook
Lynda – Up and Running with R
Lynda – R Statistics Essential Training
Code School – Try R
DataCamp – Various courses on R, including a focus on visualization, text mining, and the stringr package.
R-bloggers – 750 contributors.
STHDA – Just a ton of cool stats and R stuff, including tips on the ggplot2 visualization package.
Stack Overflow – This is where you ask questions.
Click here for R scripts on how to clean up data fetched from Twitter. (Click on the download button)
Click here for R scripts on how to visualize data fetched from Twitter. (Click on the download button)
Click here for R scripts on how to mine Jane Austen texts for data. (Click on the download button)