
Blog: Data Cleaning Tips in R*

Posted on July 8, 2020 by David Keyes in Blog

Founder, R for the Rest of Us

I recently came across a set of data cleaning tips in Excel from EvaluATE, which provides support for people looking to improve their evaluation practice.


Screenshot of the Excel Data Cleaning Tips

As I looked through the tips, I realized that I could show how to carry out each of the five tips in R. Many people come to R from Excel, so having a set of R equivalents for common Excel tasks (also see this post on a similar topic) is helpful.

The tips are not intended to be comprehensive, but they do show some common things that people do when cleaning messy data. I did a live stream recently where I took each tip listed in the document and showed its R equivalent.

As I mention at the end of the video, while you can certainly do data cleaning in Excel, switching to R enables you to make your work reproducible. Say you have some surveys that need cleaning today. You write your code and save it. Then, when you get 10 new surveys next week, you can simply rerun your code, saving you countless Excel points and clicks.
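To make the "write once, rerun later" idea concrete, here is a minimal sketch in base R. The function name `clean_surveys`, the column names, and the data are all hypothetical, invented for illustration; the point is only that the same saved code can be applied to each new batch.

```r
# Hypothetical cleaning steps wrapped in one reusable function.
clean_surveys <- function(raw) {
  cleaned <- raw[!duplicated(raw), ]      # drop fully identical rows
  cleaned$site <- trimws(cleaned$site)    # strip stray leading/trailing spaces
  cleaned
}

# Made-up batches standing in for this week's and next week's surveys.
this_week <- data.frame(
  id = c(1, 2, 2),
  site = c(" North", "South ", "South "),
  stringsAsFactors = FALSE
)
next_week <- data.frame(
  id = c(3, 4),
  site = c("East ", " West"),
  stringsAsFactors = FALSE
)

# The same code handles both batches with no extra pointing and clicking.
cleaned_this_week <- clean_surveys(this_week)
cleaned_next_week <- clean_surveys(next_week)
```

In practice the batches would more likely be read from files (for example with `read.csv()`), but the rerun step stays the same.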

You can watch the full video at the very bottom or go to each tip using the videos immediately below. I hope it’s helpful in giving an overview of data cleaning in R!

Tip #1: Identify all cells that contain a specific word or (short) phrase in a column with open-ended text
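A minimal sketch of this tip in base R, not necessarily the approach shown in the video. The `survey` data frame and its column names are hypothetical, made up for illustration.

```r
# Hypothetical open-ended survey responses.
survey <- data.frame(
  id = 1:4,
  comment = c("The training was great", "Too long",
              "Great facilitator", "No comment"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each element that contains the pattern;
# ignore.case = TRUE matches "great" and "Great" alike.
survey$mentions_great <- grepl("great", survey$comment, ignore.case = TRUE)

# Keep only the rows that mention the word.
great_rows <- survey[survey$mentions_great, ]
print(great_rows)
```

For tidyverse users, `stringr::str_detect()` inside `dplyr::filter()` plays the same role.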

Tip #2: Identify and remove duplicate data
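One way to sketch this tip in base R, assuming "duplicate" means a row that matches an earlier row in every column. The data below is hypothetical.

```r
# Hypothetical responses; the second and third rows are exact duplicates.
responses <- data.frame(
  respondent = c("A", "B", "B", "C", "A"),
  score = c(3, 5, 5, 4, 2),
  stringsAsFactors = FALSE
)

# duplicated() marks the second and later occurrences of identical rows.
is_dup <- duplicated(responses)

dups <- responses[is_dup, ]      # inspect what will be dropped
deduped <- responses[!is_dup, ]  # keep the first occurrence of each row
print(dups)
print(deduped)
```

Note that respondent "A" appears twice with different scores, so neither row counts as a duplicate here. `dplyr::distinct()` is a common tidyverse alternative.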

Tip #3: Identify the outliers within a data set
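The tip does not say how an outlier is defined, so the sketch below assumes the common 1.5 × IQR rule; that choice, and the made-up `scores` vector, are assumptions rather than the method used in the video.

```r
# Hypothetical scores with one suspicious value.
scores <- c(12, 15, 14, 13, 16, 15, 14, 48)

# First and third quartiles (names = FALSE returns a plain numeric vector).
q <- quantile(scores, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Assumed rule: flag values more than 1.5 * IQR beyond the quartiles.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
is_outlier <- scores < lower | scores > upper

print(scores[is_outlier])
```

A quick `boxplot(scores)` applies the same rule visually and is often a useful first look.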

Tip #4: Separate data from a single column into two or more columns
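A minimal base R sketch of splitting one column into two, assuming every value contains exactly one space between the two parts. The `people` data frame is hypothetical.

```r
# Hypothetical data with first and last names in a single column.
people <- data.frame(
  full_name = c("Ada Lovelace", "Grace Hopper", "Alan Turing"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row.
parts <- strsplit(people$full_name, " ", fixed = TRUE)

# Pull the first and second piece of each split into new columns.
people$first_name <- vapply(parts, function(p) p[1], character(1))
people$last_name  <- vapply(parts, function(p) p[2], character(1))
print(people)
```

Names with more than two parts would need extra handling; in the tidyverse, `tidyr::separate()` covers the same task.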

Tip #5: Categorize data in a column, such as class assignments or subject groups
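One way to sketch categorizing in base R is a named lookup vector. The course codes, group labels, and the rule that the letters before the digits identify the subject are all hypothetical assumptions for illustration.

```r
# Hypothetical course codes to be grouped by subject.
classes <- data.frame(
  course = c("BIO101", "CHEM201", "BIO210", "MATH150"),
  stringsAsFactors = FALSE
)

# Strip the trailing digits to get the subject prefix (e.g. "BIO").
prefix <- sub("[0-9]+$", "", classes$course)

# Named vector acting as a lookup table: prefix -> subject group.
lookup <- c(
  BIO = "Life sciences",
  CHEM = "Physical sciences",
  MATH = "Mathematics"
)

# Index the lookup by prefix; unname() drops the carried-over names.
classes$subject_group <- unname(lookup[prefix])
print(table(classes$subject_group))
```

Prefixes missing from the lookup come back as `NA`, which makes uncategorized rows easy to spot. `dplyr::case_when()` is a common alternative when the rules are conditions rather than exact matches.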

Full Video

*This is a repost of David Keyes’ blog post, Data Cleaning Tips in R.