Predictive modeling with text using tidy data principles

October 2020

Abstract

Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems? Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling? This tutorial is geared toward an R user with intermediate familiarity with R, RStudio, the basics of regression and classification modeling, and tidyverse packages such as dplyr and ggplot2. This person is comfortable with the main functions from dplyr and ggplot2 and is now ready to learn how to analyze and model text using tidy data principles. This R user has some experience with statistical modeling (such as using lm() and glm()) for prediction and classification and wants to learn how to build models with text.

Type

Workshop

Publication

Presented at 2020 Conference

University Southern California