Over the course of analyzing data for a number of projects, I have encountered several situations that can make it hard to work with spreadsheet-based data, especially when working with multiple programs. Today, I would like to share some frustration-saving tips for preparing spreadsheet data for use with programs like Excel, Access, and SPSS.

1) In Excel, use an apostrophe at the beginning of numbers that start with a zero: Excel will usually remove zeros at the beginning of a field, so ZIP Codes or other numbers with leading zeros must be entered with a single apostrophe in front. Do not worry about this with numbers that are not categorical.

2) Minimize the number of columns in your data download: Survey systems like Qualtrics or Survey Monkey usually provide an option to export single-question responses as multiple columns, (i.e., Gender_male, Gender_female, and Gender_other). Using this option can make it difficult to clean and analyze your data later. If the question was a “select all that apply” type question, using multiple columns is appropriate. If the respondent can only select one option, then keep the answers as one column, with distinct codes for each type of response.

3) Use simple column headings when setting up spreadsheets: Both SPSS and Access like to read the first row of spreadsheets as variable names. The second row will always be interpreted as data. To save yourself frustration, instead of putting the question title “GPA” in the first line, and a series of years in the second line, simply have one line of variable names that includes “GPA2014” and “GPA2015.”

4) Avoid special characters: Many computer programs hate special characters (like quotation marks or @ symbols) in variable names or text fields. For example, Access will read quotation marks as delimiters in text fields during some operations, which will trigger errors that can cause the database to fault.

5) Use built-in number formats instead of hand-entering symbols: Avoid hand-entering percent symbols and dollar signs in Excel fields. Instead, enter .xx for percentages and xx.xx for dollars, and then set the number format for the whole column to percentage or currency (drop down menu on the ribbon). Excel will keep the numbers as entered, but will display them properly. Also make sure all cells in a column are set to the same number format.

6) Assign a code to identify missing data: Empty cells are interpreted differently by different programs, depending on the data type. Without going into too much detail as to the reason why, using a number that could never appear in your data (like 99999) is better than using zeros to represent missing data. Use formulas to exclude those codes from any calculations you do in Excel (using the codes above, the Excel formula =averageif(a:a,” <777” would exclude cells in column A that are coded as missing). This is also more upload-friendly to Access or SPSS. Use the word “None” to represent missing qualitative data in text columns. Doing this will speed up error checking and upload/transfer to other programs.

Following these tips can save you time and effort in the long run.

About the Authors

Miranda Lee

Miranda Lee box with arrow

Doctoral Candidate, The Evaluation Center at Western Michigan University

Miranda Lee is a project manager at the Evaluation Center at Western Michigan Unviersity. She is also a doctoral candidate in the Interdisciplinary Ph.D. in Evaluation program. Her primary responsibilities with EvaluATE involve the maintenance of our contact list and our database of evaluation data, conduct of the annual survey, webmaster, and contributor to various EvaluATE products.

Creative Commons

Except where noted, all content on this website is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Related Blog Posts

Nation Science Foundation Logo EvaluATE is supported by the National Science Foundation under grant number 1841783. Any opinions, findings, and conclusions or recommendations expressed on this site are those of the authors and do not necessarily reflect the views of the National Science Foundation.