David Keyes

Founder, R for the Rest of Us

David Keyes has over a decade of experience conducting research and evaluation. He has led the Mexican Migration Field Research and Training Program at the University of California, San Diego; conducted evaluation work as part of the Oregon Community Foundation research team; and served as a data visualization consultant to other researchers and evaluators. In recent years, David has also trained evaluators (and others) to use R—the most powerful tool for data analysis and visualization—as the founder of R for the Rest of Us.

Blog: Data Cleaning Tips in R*

Posted on July 8, 2020 by David Keyes in Blog


Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

I recently came across a set of data cleaning tips in Excel from EvaluATE, which provides support for people looking to improve their evaluation practice.

Screenshot of the Excel Data Cleaning Tips

As I looked through the tips, I realized that I could show how to do each of the five tips listed in the document in R. Many people come to R from Excel, so having a set of Excel-to-R equivalents (also see this post on a similar topic) is helpful.

The tips are not intended to be comprehensive, but they do show some common things that people do when cleaning messy data. I did a live stream recently where I took each tip listed in the document and showed its R equivalent.

As I mention at the end of the video, while you can certainly do data cleaning in Excel, switching to R enables you to make your work reproducible. Say you have some surveys that need cleaning today. You write your code and save it. Then, when you get 10 new surveys next week, you can simply rerun your code, saving you countless Excel points and clicks.
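To make the reproducibility point concrete, here is a minimal base R sketch. The `clean_surveys()` function, the data frames, and the column names are all hypothetical, made up for illustration; they are not taken from the video. The idea is only that cleaning steps written once can be rerun unchanged on next week's data.

```r
# Hypothetical cleaning steps wrapped in a function so they can be reused.
clean_surveys <- function(raw) {
  # Drop rows that are exact duplicates of an earlier row.
  cleaned <- raw[!duplicated(raw), ]
  # Standardize a text column: lower case, no stray spaces.
  cleaned$site <- trimws(tolower(cleaned$site))
  cleaned
}

# Made-up survey batches standing in for this week's and next week's data.
this_week <- data.frame(
  id = c(1, 2, 2),
  site = c(" North", "south", "south"),
  stringsAsFactors = FALSE
)
next_week <- data.frame(
  id = c(3, 4),
  site = c("NORTH ", "East"),
  stringsAsFactors = FALSE
)

# The same code cleans both batches; nothing has to be redone by hand.
cleaned_now <- clean_surveys(this_week)
cleaned_later <- clean_surveys(next_week)
print(cleaned_now)
print(cleaned_later)
```

In practice the new batch would more likely be read from a file, but the function itself would not need to change.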

You can watch the full video at the very bottom or go to each tip using the videos immediately below. I hope it’s helpful in giving an overview of data cleaning in R!

Tip #1: Identify all cells that contain a specific word or (short) phrase in a column with open-ended text
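As a rough illustration of this tip (a base R sketch, not necessarily the approach shown in the video), `grepl()` can flag every row whose open-ended text contains a word. The `responses` data frame and its columns are made up for the example.

```r
# Hypothetical open-ended survey responses.
responses <- data.frame(
  id = 1:4,
  comment = c("The training was great", "Need more hands-on time",
              "Great instructor", "No comment"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each row whose text contains the pattern;
# ignore.case = TRUE matches "great" and "Great" alike.
mentions_great <- grepl("great", responses$comment, ignore.case = TRUE)

# Keep only the matching rows.
great_rows <- responses[mentions_great, ]
print(great_rows)
```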

Tip #2: Identify and remove duplicate data
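One way to sketch this in base R, assuming a duplicate means a row that repeats an earlier row exactly, is with `duplicated()`. The `surveys` data below is invented for illustration, and the video may take a different route.

```r
# Hypothetical survey data in which respondent A02 was entered twice.
surveys <- data.frame(
  respondent = c("A01", "A02", "A02", "A03"),
  score = c(4, 5, 5, 3),
  stringsAsFactors = FALSE
)

# duplicated() marks each row that repeats an earlier row.
is_duplicate <- duplicated(surveys)

# Identify the duplicates, then keep everything else.
duplicate_rows <- surveys[is_duplicate, ]
deduplicated <- surveys[!is_duplicate, ]
print(duplicate_rows)
print(deduplicated)
```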

Tip #3: Identify the outliers within a data set
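There are many definitions of an outlier. The sketch below assumes the common 1.5 × IQR rule, which may not be the rule used in the Excel tips or the video, and applies it in base R to a made-up vector of training hours.

```r
# Hypothetical training hours; 98 looks like a data entry error.
hours <- c(12, 15, 14, 13, 16, 15, 98)

# Quartiles and interquartile range (unname() drops the "25%" / "75%" labels).
q1 <- unname(quantile(hours, 0.25))
q3 <- unname(quantile(hours, 0.75))
iqr <- q3 - q1

# Flag values more than 1.5 * IQR below Q1 or above Q3.
is_outlier <- hours < q1 - 1.5 * iqr | hours > q3 + 1.5 * iqr
outliers <- hours[is_outlier]
print(outliers)
```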

Tip #4: Separate data from a single column into two or more columns
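A minimal base R sketch of this idea, assuming a hypothetical `full_name` column in which a single space separates first and last name, uses `strsplit()`. The data is invented, and real names would need more careful handling.

```r
# Hypothetical data with first and last name stored in one column.
people <- data.frame(
  full_name = c("Ada Lovelace", "Grace Hopper", "Alan Turing"),
  stringsAsFactors = FALSE
)

# strsplit() breaks each name at the space, giving one character vector per row.
parts <- strsplit(people$full_name, " ", fixed = TRUE)

# Pull the first and second piece of each name into their own columns.
people$first_name <- vapply(parts, function(p) p[1], character(1))
people$last_name <- vapply(parts, function(p) p[2], character(1))
print(people)
```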

Tip #5: Categorize data in a column, such as class assignments or subject groups
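As one possible illustration, base R's `cut()` can assign each value in a numeric column to a labeled group. The `students` data, the cut points, and the group labels below are all assumptions made for the example rather than anything from the original tips.

```r
# Hypothetical test scores to sort into groups.
students <- data.frame(
  name = c("Ana", "Ben", "Cy", "Dee"),
  score = c(55, 72, 88, 95),
  stringsAsFactors = FALSE
)

# cut() places each score into an interval and attaches the matching label.
# The intervals here are (0, 60], (60, 80], and (80, 100].
students$group <- cut(
  students$score,
  breaks = c(0, 60, 80, 100),
  labels = c("Needs support", "On track", "Advanced")
)
print(students)
```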

Full Video

*This is a repost of David Keyes’ blog post Data Cleaning Tips in R

Blog: How I Came to Learn R, and Why You Should Too!

Posted on February 5, 2020 by David Keyes in Blog




A few years ago, I left my job on the research team at the Oregon Community Foundation and started working as an independent evaluation consultant. No longer constrained by the data analysis software choices made by others, I was free to use whatever tool I wanted. As an independent consultant, I couldn’t afford proprietary software such as SPSS, so I used Excel. But the limits of Excel quickly became apparent, and I went in search of other options.

I had heard of R, but it was sort of a black box in my mind. I knew it was a tool for data analysis and visualization, but I had no idea how to use it. I had never coded before, and the prospect of learning was daunting. But my desire to find a new tool was strong enough that I decided to take up the challenge of learning R.

My journey to successfully using R was rocky and circuitous. I would start many projects in R before finding I couldn’t do something, and I would have to slink back to Excel. Eventually, though, it clicked, and I finally felt comfortable using R for all of my work.

The more I used R, the more I came to appreciate its power.

  1. The code that had caused me such trouble when I was learning became second nature. And I could reuse code in multiple projects, so my workflow became more efficient.
  2. The data visualizations I made in R were far better and more varied than anything I had produced in Excel.
  3. The most fundamental shift in my work, though, has come from using RMarkdown. This tool enables me to go from data import to final report in R, avoiding the dance across, say, SPSS (for analyzing data), Excel (for visualizing data), and Word (for reporting). And when I receive new data, I can simply rerun my code, automatically generating my report.

In 2019, I started R for the Rest of Us to help evaluators and others learn to embrace the power of R. Through online courses, workshops, coaching, and custom training for organizations, I’ve helped many people transition to R.

I’m delighted to share some videos here that show you a bit more about what R is and why you might consider learning it. You’ll learn about what importing data into R looks like and how you can use a few lines of code to analyze your data, and you’ll see how you can do this all in RMarkdown. The videos should give you a good sense of what working in R looks like and help you decide if it makes sense for you to learn it.
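To give a flavor of what "a few lines of code" can look like, here is a small base R sketch. It is not taken from the videos. The file and its contents are made up, and the CSV is written to a temporary file only so that the example runs on its own.

```r
# Create a small, made-up CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("site,satisfaction", "North,4", "North,5", "South,3", "South,4"),
  csv_path
)

# Import the data.
survey <- read.csv(csv_path, stringsAsFactors = FALSE)

# Analyze it: average satisfaction for each site.
mean_by_site <- aggregate(satisfaction ~ site, data = survey, FUN = mean)
print(mean_by_site)
```

With real data, `csv_path` would simply point to the file you want to import.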

I always tell people considering R that it is challenging to learn. But I also tell them that the time and energy you invest in learning R is very much worth it in the end. Learning R will not only improve the quality of your data analysis, data visualization, and workflow, but also ensure that you have access to this powerful tool forever—because, oh, did I mention that R is free? Learning R is an investment in your current self and your future self. What could be better than that?

R Video Series