We EvaluATE - Data...

Blog: Scavenging Evaluation Data

Posted on January 17, 2017, in Blog

Director of Research, The Evaluation Center at Western Michigan University

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

But little Mouse, you are not alone,
In proving foresight may be vain:
The best laid schemes of mice and men
Go often askew,
And leave us nothing but grief and pain,
For promised joy!

From To a Mouse, by Robert Burns (1785), modern English version

Research and evaluation textbooks are filled with elegant designs for studies that will illuminate our understanding of social phenomena and programs. But as any evaluator will tell you, the real world is fraught with all manner of hazards and imperfect conditions that wreak havoc on design, bringing grief and pain, rather than the promised joy of a well-executed evaluation.

Probably the biggest hindrance to executing planned designs is that evaluation is just not the most important thing to most people. (GASP!) They are reluctant to give two minutes for a short survey, let alone an hour for a focus group. Your email imploring them to participate in your data collection effort is one of hundreds of requests for their time and attention that they are bombarded with daily.

So, do all the things the textbooks tell you to do. Take the time to develop a sound evaluation design and do your best to follow it. Establish expectations early with project participants and other stakeholders about the importance of their cooperation. Use known best practices to enhance participation and response rates.

In addition: Be a data scavenger. Here are two ways to get data for an evaluation that do not require hunting down project participants and convincing them to give you information.

1. Document what the project is doing.

I have seen a lot of evaluation reports in which evaluators painstakingly recount a project’s activities as a tedious story rather than a straightforward account. This task typically requires the evaluator to ask many questions of project staff, pore through documents, and track down materials. It is much more efficient for project staff to keep a record of their own activities. For example, see EvaluATE’s resume. It is a no-nonsense record of our funding, activities, dissemination, scholarship, personnel, and contributors. In and of itself, our resume does most of the work of the accountability aspect of our evaluation (i.e., Did we do what we promised?). In addition, the resume can be used to address questions like these:

  • Is the project advancing knowledge, as evidenced by peer-reviewed publications and presentations?
  • Is the project’s productivity adequate in relation to its resources (funding and personnel)?
  • To what extent is the project leveraging the expertise of the ATE community?

2. Track participation.

If your project holds large events, use a sign-in sheet to get attendance numbers. If you hold webinars, you almost certainly have records with information about registrants and attendees. If you hold smaller events, pass around a sign-in sheet asking for basic information like name, institution, email address, and job title (or major if it’s a student group). If the project has developed a course, get enrollment information from the registrar.  Most importantly: Don’t put these records in a drawer. Compile them in a spreadsheet and analyze the heck out of them. Here are example data points that we glean from EvaluATE’s participation records:

  • Number of attendees
  • Number of attendees from various types of organizations (such as two- and four-year colleges, nonprofits, government agencies, and international organizations)
  • Number and percentage of attendees who return for subsequent events
  • Geographic distribution of attendees
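The data points above can be pulled from a compiled sign-in spreadsheet with a few lines of code. The following is a minimal sketch, not EvaluATE's actual process: the record layout, the sample rows, and the function name `summarize` are all assumptions made for illustration, and attendees are matched by email address.

```python
from collections import Counter

# Hypothetical sign-in records: (event, attendee email, organization type, state).
records = [
    ("Webinar 1", "ana@example.edu", "two-year college", "MI"),
    ("Webinar 1", "ben@example.org", "nonprofit", "OH"),
    ("Webinar 2", "ana@example.edu", "two-year college", "MI"),
    ("Webinar 2", "cy@example.gov", "government agency", "TX"),
]


def summarize(records):
    """Compute attendance counts, organization mix, return rate, and geography."""
    events_by_person = {}
    org_by_person = {}
    state_by_person = {}
    for event, email, org, state in records:
        events_by_person.setdefault(email, set()).add(event)
        org_by_person[email] = org
        state_by_person[email] = state

    total = len(events_by_person)
    # A returning attendee is anyone who signed in at more than one event.
    returning = sum(1 for events in events_by_person.values() if len(events) > 1)
    return {
        "unique_attendees": total,
        "by_org_type": Counter(org_by_person.values()),
        "returning": returning,
        "returning_pct": round(100 * returning / total, 1) if total else 0.0,
        "by_state": Counter(state_by_person.values()),
    }


summary = summarize(records)
print(summary)
```

Matching on email is a simplification; people who register with different addresses would be counted twice, so a real analysis may need a cleanup pass first.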

Project documentation and participation data will be most helpful for process evaluation and accountability. You will still need cooperation from participants for outcome evaluation—and you should engage them early to garner their interest and support for evaluation efforts. Still, you may be surprised by how much valuable information you can get from these two sources—documentation of activities and participation records—with minimal effort.

Get creative about other data you can scavenge, such as institutional data that colleges already collect; website data, such as Google Analytics; and citation analytics for published articles.

Blog: Possible Selves: A Way to Assess Identity and Career Aspirations

Posted on September 14, 2016, in Blog

Professor of Psychology, Arkansas State University

Children are often asked the question “What do you want to be when you grow up?” Many of us evaluate programs where the developers are hoping that participating in their program will change this answer. In this post, I’d like to suggest using “possible self” measures as a means of evaluating if a program changed attendees’ sense of identity and career aspirations.

What defines the term?

Possible selves are our representations of our future. We all think about what we ideally would like to become (the hoped-for possible self), things that we realistically expect to become (the expected possible self), and things that we are afraid of becoming (the feared-for possible self).[1][2] Possible selves can change many times over the lifespan and thus can be a useful measure to examine participants’ ideas about themselves in the future.

How can it be measured?

There are various ways to measure possible selves. One of the simplest is to use an open-ended measure that asks people to describe what they think will occur in the future. For example, we presented the following (adapted from Oyserman et al., 2006[2]) to youth participants in a science enrichment camp (funded by an NSF-ITEST grant to Arkansas State University):

Probably everyone thinks about what they are going to be like in the future. We usually think about the kinds of things that are going to happen to us and the kinds of people we might become.

  1. Please list some things that you most strongly hope will be true of you in the future.
  2. Please list some things that you think will most likely be true of you in the future.

The measure was used both before and after participating in the program. We purposely did not include a feared-for possible self, given the context of a summer camp.

What is the value-added?

Using this type of open-ended measure allows for participants’ own voices to be heard. Instead of imposing preconceived notions of what participants should “want” to do, it allows participants to tell us what is most important to them. We learned a great deal about participants’ world views, and their answers helped us to fine-tune programs to better serve their needs and to be responsive to our participants. Students’ answers focused on careers, but also included hoped-for personal ideals. For instance, European-American students were significantly more likely to mention school success than African-American students. Conversely, African-American students were significantly more likely to describe hoped-for positive social/emotional futures compared to European-American students. These results allowed program developers to gain a more nuanced understanding of motivations driving participants. Although we regarded the multiple areas of focus as a strength of the measure, evaluators considering using a possible selves measure may also want to include more directed, follow-up questions.

For more information on how to assess possible selves, see Professor Daphna Oyserman’s website.


[1] Markus, H. R., & Nurius, P. (1986). Possible selves. American Psychologist, 41, 954–969.

[2] Oyserman, D., Bybee, D., & Terry, K. (2006). Possible selves and academic outcomes: How and when possible selves impel action. Journal of Personality and Social Psychology, 91, 188–204.

Blog: Six Data Cleaning Checks

Posted on September 1, 2016, in Blog

Research Associate, WestEd’s STEM Program

Data cleaning is the process of verifying and editing data files to address issues of inconsistency and missing information. Errors in data files can appear at any stage of an evaluation, making it difficult to produce reliable data. Data cleaning is a critical step in program evaluation because clients rely on accurate results to inform decisions about their initiatives. Below are six essential steps I include in my data cleaning process to minimize issues during data analysis:

1. Compare the columns of your data file against the columns of your codebook.

Sometimes unexpected columns might appear in your data file or columns of data may be missing. Data collected from providers external to your evaluation team (e.g., school districts) might include sensitive participant information like social security numbers. Failures in software used to collect data can lead to responses not being recorded. For example, if a wireless connection is lost while a file is being downloaded, some information in that file might not appear in the downloaded copy. Unnecessary data columns should be removed before analysis and, if possible, missing data columns should be retrieved.
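As a rough illustration of this check, the sketch below compares the column names in a data file against those in a codebook. The column names and the helper `compare_columns` are hypothetical; in practice the two lists would be read from the header row of the data file and from the codebook itself.

```python
def compare_columns(data_columns, codebook_columns):
    """Return (unexpected, missing) column names, preserving their original order."""
    data_set, codebook_set = set(data_columns), set(codebook_columns)
    unexpected = [c for c in data_columns if c not in codebook_set]
    missing = [c for c in codebook_columns if c not in data_set]
    return unexpected, missing


# Hypothetical example: the district file includes an SSN column and lacks post-test scores.
codebook_columns = ["student_id", "site", "pre_score", "post_score"]
data_columns = ["student_id", "ssn", "site", "pre_score"]

unexpected, missing = compare_columns(data_columns, codebook_columns)
print("Remove before analysis:", unexpected)
print("Try to retrieve:", missing)
```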

2. Check your unique identifier column for duplicate values.

An identifier is a unique value used to label a participant and can take the form of a person’s full name or a number assigned by the evaluator. Multiple occurrences of the same identifier in a data file usually indicate an error. Duplicate identifier values can occur when participants complete an instrument more than once or when a participant identifier is mistakenly assigned to multiple records. If participants move between program sites, they might be asked to complete a survey for a second time. Administrators might record a participant’s identifier incorrectly, using a value assigned to another participant. Data collection software can malfunction and duplicate rows of records. Duplicate records should be identified and resolved.
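One simple way to run this check is to count how often each identifier appears and list those that occur more than once. This is a minimal sketch; the field name `participant_id` and the sample rows are assumptions for illustration.

```python
from collections import Counter


def find_duplicate_ids(rows, id_field="participant_id"):
    """Return a dict of identifiers that appear more than once, with their counts."""
    counts = Counter(row[id_field] for row in rows)
    return {pid: n for pid, n in counts.items() if n > 1}


# Hypothetical records: P001 appears at two sites.
rows = [
    {"participant_id": "P001", "site": "A", "score": 12},
    {"participant_id": "P002", "site": "A", "score": 15},
    {"participant_id": "P001", "site": "B", "score": 14},
]

duplicates = find_duplicate_ids(rows)
print(duplicates)
```

The code only flags the duplicates. Deciding whether a repeated identifier reflects a second survey completion, a mistyped value, or a software glitch still requires looking at the records themselves.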

3. Transform categorical data into standard values.

Non-standard data values often appear in data gathered from external data providers. For example, school districts often provide student demographic information but vary in the categorical codes they use. The following table shows a range of values I received from different districts to represent students’ ethnicities:


To aid in reporting on participant ethnicities, I transformed these values into the race and ethnicity categories used by the National Center for Education Statistics.

When cleaning your own data, you should decide on standard values to use for categorical data, transform ambiguous data into a standard form, and store these values in a new data column.  OpenRefine is a free tool that facilitates data transformations.
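The same idea can be sketched in code: map each raw value to a standard label, store the result in a new column, and keep the original column untouched. The raw codes, the labels, and the column names below are hypothetical stand-ins, not the values from the districts or an official category list.

```python
# Hypothetical mapping from raw district codes (lowercased) to standard labels.
STANDARD_VALUES = {
    "h": "Hispanic or Latino",
    "hisp": "Hispanic or Latino",
    "hispanic": "Hispanic or Latino",
    "w": "White",
    "white": "White",
    "a": "Asian",
    "asian": "Asian",
}


def standardize(rows, source="ethnicity_raw", target="ethnicity_std"):
    """Add a standardized column and collect raw values that need manual review."""
    unmapped = set()
    cleaned = []
    for row in rows:
        raw = row[source]
        new_row = dict(row)  # copy so the original record is left unchanged
        new_row[target] = STANDARD_VALUES.get(raw.strip().lower())
        if new_row[target] is None:
            unmapped.add(raw)
        cleaned.append(new_row)
    return cleaned, unmapped


rows = [{"ethnicity_raw": "Hisp"}, {"ethnicity_raw": " W "}, {"ethnicity_raw": "7"}]
cleaned, unmapped = standardize(rows)
print(cleaned)
print("Needs review:", unmapped)
```

Returning the unmapped values, rather than guessing at them, keeps ambiguous codes visible until someone confirms their meaning with the data provider.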

4. Check your data file for missing values.

Missing values occur when participants choose not to answer an item, are absent the day of administration, or skip an item due to survey logic. If missing values are found, apply a code to indicate the reason for the missing data point. For example, 888888 can indicate an instrument was not administered and 999999 can indicate a participant chose not to respond to an item. The use of codes can help data analysts determine how to handle the missing data. Analysts sometimes need to report on the frequency of missing data, use statistical methods to replace the missing data, or remove the missing data before analysis.
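Here is a small sketch of how the two example codes might be applied and then kept out of a calculation. The function names and sample values are assumptions; a real data file might also need a third code for items skipped by survey logic.

```python
NOT_ADMINISTERED = 888888  # instrument was not administered
NO_RESPONSE = 999999       # participant chose not to respond
MISSING_CODES = {NOT_ADMINISTERED, NO_RESPONSE}


def code_missing(value, administered):
    """Replace an empty value with a code that records why it is missing."""
    if value is None or value == "":
        return NO_RESPONSE if administered else NOT_ADMINISTERED
    return value


def summarize_item(values):
    """Report missing-data frequencies and a mean that excludes the codes."""
    valid = [v for v in values if v not in MISSING_CODES]
    return {
        "n_valid": len(valid),
        "n_not_administered": values.count(NOT_ADMINISTERED),
        "n_no_response": values.count(NO_RESPONSE),
        "mean": sum(valid) / len(valid) if valid else None,
    }


# Hypothetical item responses paired with whether the instrument was administered.
raw = [(4, True), ("", True), (None, False), (2, True)]
coded = [code_missing(value, administered) for value, administered in raw]
item_summary = summarize_item(coded)
print(coded)
print(item_summary)
```

Excluding the codes before computing the mean matters: leaving a value like 999999 in the column would badly distort any average.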

5. Check your data file for extra or missing records.

Attrition and recruitment can occur at all stages of an evaluation. Sometimes people who are not participating in the evaluation are allowed to submit data. Check the number of records in your data file against the number of recruited participants for discrepancies. Tracking dates when participants join a project, leave a project, and complete instruments can facilitate this review.
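A quick way to spot such discrepancies is to compare the identifiers in the data file against the recruitment roster. The sketch below assumes both are available as lists of IDs; the IDs and the function name are hypothetical.

```python
def compare_records(roster_ids, data_ids):
    """List IDs that are in the data but not recruited, and recruited but not in the data."""
    roster, data = set(roster_ids), set(data_ids)
    return {
        "extra": sorted(data - roster),
        "missing": sorted(roster - data),
    }


# Hypothetical roster and data file identifiers.
roster_ids = ["P001", "P002", "P003"]
data_ids = ["P001", "P003", "P009"]

discrepancies = compare_records(roster_ids, data_ids)
print(discrepancies)
```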

6. Correct erroneous or inconsistent values.

When instruments are completed on paper, participants can enter unexpected values. Online tools may be configured incorrectly and allow illegal values to be submitted. Create a list of validation criteria for each data field and compare all values against this list. de Jonge and van der Loo provide a tutorial for checking invalid data using R.
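The tutorial mentioned above uses R. As a language-neutral illustration of the same idea, here is a minimal Python sketch that holds one validation rule per field and reports every value that fails. The fields, allowed ranges, and sample rows are assumptions made for the example.

```python
# Hypothetical validation criteria: one rule per data field.
VALIDATION_RULES = {
    "age": lambda v: isinstance(v, int) and 10 <= v <= 19,
    "grade": lambda v: v in {9, 10, 11, 12},
    "satisfaction": lambda v: v in {1, 2, 3, 4, 5},
}


def find_invalid(rows, rules=VALIDATION_RULES):
    """Return (row index, field, value) for every value that fails its rule."""
    problems = []
    for index, row in enumerate(rows):
        for field, is_valid in rules.items():
            if field in row and not is_valid(row[field]):
                problems.append((index, field, row[field]))
    return problems


rows = [
    {"age": 15, "grade": 10, "satisfaction": 4},
    {"age": 150, "grade": 10, "satisfaction": 6},
]

problems = find_invalid(rows)
print(problems)
```

Keeping the rules in one place makes the validation criteria easy to review alongside the codebook and to reuse each time new data arrive.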

Data cleaning can be a time-consuming process. These checks can help reduce the time you spend on data cleaning and get results to your clients more quickly.

Blog: How Real-time Evaluation Can Increase the Utility of Evaluation Findings

Posted on July 21, 2016, in Blog
Elizabeth Peery and Stephanie B. Wilkerson

Evaluations are most useful when evaluators make relevant findings available to project partners at key decision-making moments. One approach to increasing the utility of evaluation findings is to collect real-time data and provide immediate feedback at crucial moments to foster progress monitoring during service delivery. Based on our experience evaluating multiple five-day professional learning institutes for an ATE project, we discovered the benefits of providing real-time evaluation feedback and the vital elements that contributed to the success of this approach.

What did we do?

With project partners we co-developed online daily surveys that aligned with the learning objectives for each day’s training session. Daily surveys measured the effectiveness and appropriateness of each session’s instructional delivery, exercises and hands-on activities, materials and resources, content delivery format, and session length. Participants also rated their level of understanding of the session content and preparedness to use the information. They could submit questions, offer suggestions for improvement, and share what they liked most and least. Based on the survey data that evaluators provided to project partners after each session, partners could monitor what was and wasn’t working and identify where participants needed reinforcement, clarification, or re-teaching. Project partners could make immediate changes and modifications to the remaining training sessions to address any identified issues or shortcomings before participants completed the training.

Why was it successful?

Through the process, we recognized that there were a number of elements that made the daily surveys useful in immediately improving the professional learning sessions. These included the following:

  • Invested partners: The project partners recognized the value of the immediate feedback and its potential to greatly improve the trainings. Thus, they made a concentrated effort to use the information to make mid-training modifications.
  • Evaluator availability: Evaluators had to be available to pull the data after hours from the online survey software program and deliver it to project partners immediately.
  • Survey length and consistency: The daily surveys took less than 10 minutes to complete. While tailored to the content of each day, the surveys had a consistent question format that made them easier to complete.
  • Online format: The online format allowed for a streamlined and user-friendly survey. Additionally, it made retrieving a usable data summary much easier and timelier for the evaluators.
  • Time for administration: Time was carved out of the training sessions to allow for the surveys to be administered. This resulted in higher response rates and more predictable timing of data collection.

If real-time evaluation data will provide useful information that can help make improvements or decisions about professional learning trainings, it is worthwhile to seek resources and opportunities to collect and report this data in a timely manner.

Here are some additional resources regarding real-time evaluation:

Blog: Student Learning Assessments: Issues of Validity and Reliability

Posted on June 22, 2016, in Blog

Senior Educational Researcher, SRI International

In my last post, I talked about the difference between program evaluation and student assessment. I also touched on using existing assessments if they are available and appropriate, and if not, constructing new assessments. Of course, that new assessment would need to meet test quality standards; otherwise, it will not be able to measure what you need to have measured for your evaluation. Test quality has to do with validity and reliability.

When a test is valid, it means that when a student responds with a wrong answer, it would be reasonable to conclude that they did so because they did not learn what they were supposed to have learned. There are all kinds of impediments to an assessment’s validity. For example, if in a science class you are asking students a question aimed at determining if they understand the difference between igneous and sedimentary rocks, yet you know that some of them do not understand English, you wouldn’t want to ask them the question in English. In testing jargon, what you are introducing in such a situation is “construct irrelevant variance.” In this case, the variance in results may be as much due to whether they know English (the construct irrelevant part) as to whether they know the construct, which is the differences between the rock types. Hence, these results would not help you determine if your innovation is helping them learn the science better.

Reliability has to do with test design, administration, and scoring. Examples of unreliable tests are those that are too long, introducing test-taking fatigue that interferes with their being reliable measures of student learning. Another common example of unreliability is when the scoring directions or rubric are not clear enough about how to judge the quality of an answer. This type of problem will often result in inconsistent scoring, otherwise known as low interrater reliability.
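One common way to quantify interrater reliability is Cohen's kappa, which compares how often two raters agree with how often they would be expected to agree by chance. The post does not name a specific statistic, so treat this as one possible choice; the rubric scores below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same answers on a categorical rubric."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of answers.")
    n = len(rater_a)
    # Observed agreement: share of answers given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on how often each rater used each score.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        # Both raters used a single identical score; treated here as full agreement
        # (an assumption, since kappa is strictly undefined in this case).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1 to 3) from two raters on eight student answers.
rater_a = [3, 2, 3, 1, 2, 3, 1, 2]
rater_b = [3, 2, 2, 1, 2, 3, 1, 3]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

Values near 1 indicate strong agreement, while values near 0 suggest the raters agree no more often than chance, which is a sign that the rubric or scoring directions need to be clarified.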

To summarize, a student learning assessment can be very important to your evaluation if a goal of your project is to directly impact student learning. Then you have to make some decisions about whether you can use existing assessments or develop new ones, and if you make new ones, they need to meet technical quality standards of validity and reliability. For projects not directly aiming at improving student learning, an assessment may actually be inappropriate in the evaluation because the tie between the project activities and the student learning may be too loose. In other words, the learning outcomes may be mediated by other factors that are too far beyond your control to render the learning outcomes useful for the evaluation.

Blog: Project Data for Evaluation: Google Groups Project Team Feedback

Posted on February 11, 2016, in Blog

President and CEO, Censeo Group

At Censeo Group, a program evaluation firm located in northeast Ohio, we are evaluating a number of STEM projects and often face the challenge of how to collect valid and reliable data about the impact of curriculum implementation: What implementation looks like, students’ perceptions of the program, project leaders’ comfort with lessons, and the extent to which students find project activities engaging and beneficial.

We use various methods to gather curriculum implementation data. Observations offer a glimpse into how faculty deliver new curriculum materials and how students interact with and react to those materials, but are time-intensive and require clear observation goals and tools. Feedback surveys offer students and staff the opportunity to provide responses that support improvement or provide a summative analysis of the implementation, but not everyone responds and some responses may be superficial. During a recent project, we were able to use an ongoing, rich, genuine, and helpful source of project information for the purpose of evaluation.

Google Groups Blog

Project leaders created a Google Group and invited all project staff and the evaluation team to join with the following message:

“Welcome to our Google Group! This will be a format for sharing updates from interventions and our sites each week. Thanks for joining in our discussion!”

The team chose Google Groups because everybody was comfortable with the environment, and it is free, easy to use and easy to access.

Organizing the Posts

Project leaders created a prompt each week, asking staff to “Post experiences from Week X below.” This chronological method of organization kept each week’s feedback clustered. However, a different organizing principle could be used, for example, curriculum unit or school.

In the case of this Google Group, the simple prompt resonated well with project staff, who wrote descriptive and reflective entries. Graduate students, who were delivering a new curriculum to high school students, offered recommendations for colleagues who would be teaching the content later in the week about how to organize instruction, engage students, manage technology, or address questions that were asked during their lessons. Graduate students also referred to each other’s posts, indicating that this interactive method of project communication was useful and helpful for them as they worked in the schools, for example, in organizing materials or modifying lessons based on available time or student interest.

Capturing and Analyzing the Data 

The evaluation team used NVIVO’s NCapture, a Web browser add-on for NVIVO qualitative data analysis software that allows the blog posts to be quickly imported into the software for analysis. Once in NVIVO, the team coded the data to analyze the successes and challenges of using the new curriculum in the high schools.

Genuine and Ongoing Data

The project team is now implementing the curriculum for the second time with a new group of students. Staff members are posting weekly feedback about this second implementation. This ongoing use of the Google Group blog will allow the evaluation team to analyze and compare implementation by semester (Fall 2015 versus Spring 2016), by staff type (to reveal changes in graduate students’ skills and experience), by school, and by other relevant categories.

From a strictly data management perspective, a weekly survey of project staff using a tool such as Google Forms or an online survey system, from which data could be transferred directly into a spreadsheet, likely would have been easier to manage and analyze. However, the richness of the data that the Google Groups entries generated was well worth the trade-off of the extra time required to capture and upload each post. Rather than giving staff an added “evaluation” activity that was removed from the work of the project, and to which likely not all staff would have responded as enthusiastically, these blog posts provided evaluation staff with a glimpse into real-time, genuine staff communication and classroom implementation challenges and successes. The ongoing feedback about students’ reactions to specific activities supported project implementation by helping PIs understand which materials needed to be enhanced to support students of different skill levels as the curriculum was being delivered. The blog posts also provided insights into the graduate students’ comfort with the curriculum materials and highlighted the need for additional training for them about specific STEM careers. The blog allowed PIs to quickly make changes during the semester and provided the evaluation team with information about how the curriculum was being implemented and how changes affected the project over the course of the semester.

You can find additional information about NVIVO here: http://www.qsrinternational.com/product. The site includes training resources and videos about NVIVO.

You can learn how to create and use a Google Group at the Google Groups Help Center:

Blog: Strategic Knowledge Mapping: A New Tool for Visualizing and Using Evaluation Findings in STEM

Posted on January 6, 2016, in Blog

Director of Research and Evaluation, Meaningful Evidence, LLC

A challenge to designing effective STEM programs is that they address very large, complex goals, such as increasing the numbers of underrepresented students in advanced technology fields.

To design the best possible programs to address such a large, complex goal, we need a large, complex understanding (from looking at the big picture). It’s like when medical researchers seek to develop a new cure: they need a deep understanding of how medications interact with the body and with other medications, and of how they will affect the patient based on age and medical history.

A new method, Integrative Propositional Analysis (IPA), lets us visualize and assess information gained from evaluations. (For details, see our white papers.) At the 2015 American Evaluation Association conference, we demonstrated how to use the method to integrate findings from the PAC-Involved (Physics, Astronomy, Cosmology) evaluation into a strategic knowledge map. (View the interactive map.)

A strategic knowledge map supports program design and evaluation in many ways.

Measures understanding gained.
The map is an alternative logic model format that provides broader and deeper understanding than usual logic model approaches. Unlike other modeling techniques, IPA lets us quantitatively assess information gained. Results showed that the new map incorporating findings from the PAC-Involved evaluation had much greater breadth and depth than the original logic model. This indicates increased understanding of the program, its operating environment, how they work together, and options for action.

Graphic 1

Shows what parts of our program model (map) are better understood.
In the figure below, the yellow shadow around the concept “Attendance/attrition challenges” indicates that this concept is better understood. We better understand something when it has multiple causal arrows pointing to it—like when we have a map that shows multiple roads leading to each destination.

Graphic 2

Shows what parts of the map are most evidence supported.
We have more confidence in causal links that are supported by data from multiple sources. The thick arrow below shows a relationship that many sources of evaluation data supported. All five evaluation data sources (the project team interviews, student focus group, review of student reflective journals, observation, and student surveys) provided evidence that more experiments/demos/hands-on activities caused students to be more engaged in PAC-Involved.

Graphic 3

Shows the invisible.
The map also helps us to “see the invisible.” If something does not have arrows pointing to it, we know that there is “something” that should be added to the map. This indicates that more research is needed to fill those “blank spots on the map” and improve our model.
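The ideas in the last three sections can be approximated with simple counts over a list of causal links. The sketch below is a simplified illustration, not the authors' IPA procedure: it assumes "better understood" means two or more incoming arrows, treats concepts with no incoming arrows as blank spots, and measures evidence support by the number of distinct data sources. Apart from the hands-on activities link described above, the links are hypothetical.

```python
from collections import defaultdict

# Causal links: (cause, effect, evaluation data sources that support the link).
links = [
    ("Hands-on activities", "Student engagement",
     ["interviews", "focus group", "journals", "observation", "surveys"]),
    ("Student engagement", "Attendance", ["surveys"]),            # hypothetical link
    ("Transportation problems", "Attendance", ["interviews"]),    # hypothetical link
]


def analyze_map(links):
    """Count incoming arrows per concept and data sources per causal link."""
    concepts = set()
    incoming = defaultdict(int)
    for cause, effect, _sources in links:
        concepts.update((cause, effect))
        incoming[effect] += 1
    return {
        # Assumption: "multiple causal arrows" means two or more.
        "better_understood": sorted(c for c in concepts if incoming[c] >= 2),
        # Concepts with no incoming arrows point to where more research is needed.
        "blank_spots": sorted(c for c in concepts if incoming[c] == 0),
        "evidence_strength": {
            (cause, effect): len(set(sources)) for cause, effect, sources in links
        },
    }


result = analyze_map(links)
print(result)
```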

Graphic 4

Supports collaboration.
The integrated map can support collaboration among the project team. We can zoom in to look at what parts are relevant for action.

Graphic 5

Supports strategic planning.
The integrated map also supports strategic planning. Solid arrows leading to our goals indicate things that help. Dotted lines show the challenges.

Graphic 6

Clarifies short-term and long-term outcomes.
We can create customized map views to show concepts of interest, such as outcomes for students and connections between the outcomes.

Graphic 7

We encourage you to add a Strategic Knowledge Map to your next evaluation. The evaluation team, project staff, students, and stakeholders will benefit tremendously.

Blog: Using Embedded Assessment to Understand Science Skills

Posted on August 5, 2015, in Blog
Cathlyn Stylinski
Senior Agent
University of Maryland Center for Environmental Science
Karen Peterman
Karen Peterman Consulting
Rachel Becker-Klein
Senior Research Associate
PEER Associates

As our field explores the impact of informal (and formal) science programs on learning and skill development, it is imperative that we integrate research and evaluation methods into the fabric of the programs being studied. Embedded assessments (EAs) are “opportunities to assess participant progress and performance that are integrated into instructional materials and are virtually indistinguishable from day-to-day [program] activities” (Wilson & Sloane, 2000, p. 182). As such, EAs allow learners to demonstrate their science competencies through tasks that are integrated seamlessly into the learning experience itself.

Since they require that participants demonstrate their skills, rather than simply rate their confidence in using them, EAs offer an innovative way to understand and advance the evidence base for knowledge about the impacts of informal science programs. EAs can take on many forms and can be used in a variety of settings. The essential defining feature is that these assessments document and measure participant learning as a natural component of the program implementation and often as participants apply or demonstrate what they are learning.

Related concepts that you may have heard of:

  • Performance assessments: EA methods can include performance assessments, in which participants do something to demonstrate their knowledge and skills (e.g., scientific observation).
  • Authentic assessments: Authentic assessments are assessments of skills where the learning tasks mirror real-life problem-solving situations (e.g., the specific data collection techniques used in a project) and could be embedded into project activities. (Rural School and Community Trust, 2001; Wilson & Sloane, 2000)

You can use EAs to measure participants’ abilities alongside more traditional research and evaluation measures and also to measure skills across time. So, along with surveys of content knowledge and confidence in a skill area, you might consider adding experiential and hands-on ways of assessing participant skills. For instance, if you were interested in assessing participants’ skills in observation, you might already be asking them to make some observations as a part of your program activities. You could then develop and use a rubric to assess the depth of that observation.

Although EA offers many benefits, the method also poses some significant challenges that have prevented widespread adoption to date. For the application of EA to be successful, there are two significant challenges to address: (1) the need for a standard EA development process that includes reliability and validity testing and (2) the need for professional development related to EA.

With these benefits and challenges in mind, we encourage project leaders, evaluators, and researchers to help us to push the envelope by:

  • Thinking critically about the inquiry skills fostered by their informal science projects and ensuring that those skills are measured as part of the evaluation and research plans.
  • Considering whether projects include practices that could be used as an EA of skill development and, if so, taking advantage of those systems for evaluation and research purposes.
  • Developing authentic methods that address the complexities of measuring skill development.
  • Sharing these experiences broadly with the community in an effort to highlight the valuable role that such projects can play in engaging the public with science.

We are currently working on a National Science Foundation grant (Embedded Assessment for Citizen Science – EA4CS) that is investigating the effectiveness of embedded assessment as a method to capture participant gains in science and other skills. We are conducting a needs assessment and working on creating embedded assessments at each of three different case study sites. Look for updates on our progress and additional blogs over the next year or so.

Rural School and Community Trust (2001). Assessing Student Work. Available from http://www.ruraledu.org/user_uploads/file/Assessing_Student_Work.pdf

Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2), 181-208. Available from http://dx.doi.org/10.1207/S15324818AME1302_4

Blog: Visualizing Network Data

Posted on July 29, 2015, in Blog
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Rick Orlina
Evaluation and Research Consultant
Rocinante Research
Veronica S. Smith
Data Scientist, Founder
data2insight, LLC

Social Network Analysis (SNA) is a methodology that we have found useful when answering questions about relationships. For example, our independent evaluation work with National Science Foundation-funded Integrative Graduate Education Traineeship (IGERT) programs typically includes a line of inquiry about the nature of interdisciplinary relationships across trainees and faculty, and how those relationships change over time.

Sociograms are data displays that stakeholders can use to understand network patterns and identify potential ways to effect desired changes in the network. There are currently few, if any, rules regarding how to draw sociograms to facilitate effective communication with stakeholders. While there is only one network (the particular set of nodes and the ties that connect them), there are many ways to draw it. We share two methods for visualizing networks and describe how they have been helpful when communicating evaluation findings to clients.

Approach 1: Optimized Force-Directed Maps

Figure 1 presents sociograms for one of the relationships defined as part of an IGERT evaluation as measured at two time points. Specifically, this relationship reflects whether participants reported that they designed or taught a course, seminar, or workshop together.

In this diagram, individuals (nodes) who share a tie tend to be close together, while individuals who do not share a tie tend to be farther apart. When drawn in this way, the sociogram reveals how people organize into clusters. Red lines represent interdisciplinary relationships, making it possible to see patterns in the connections that bridge disciplinary boundaries. These sociograms combine data from three years, so nodes do not move from one sociogram to the next. Nodes appear and disappear as individuals enter and leave the network, and the ties connecting people appear and disappear as reported relationships change. Thus, it is easy to see how connections—around individuals and across the network—evolve over time.

One shortcoming of this data display is that it can be difficult to identify the same person (node) in a set of sociograms spanning multiple time periods. However, with additional data processing, it is possible to create a set of aligned sociograms (in which node positions are fixed) that make visual analysis of changes over time easier.
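The "additional data processing" for aligned sociograms can be sketched roughly as follows: pool the ties from every year, lay out the pooled network once, and then draw each year's ties at those shared positions. This is a minimal illustration, not how NetDraw or Gephi work internally. The toy force-directed routine, the names, departments, and ties, and the gray color for within-department ties are all assumptions made for the example, as is the simplification that a person appears in a year only if they have at least one tie.

```python
import math

def force_directed_layout(nodes, ties, iterations=200, max_step=0.05):
    """Toy spring layout: every pair of nodes repels, tied pairs attract."""
    nodes = sorted(nodes)
    n = len(nodes)
    k = 1.0 / math.sqrt(n)  # preferred distance between tied nodes
    # Start on a circle so the result is deterministic.
    pos = {
        v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    for _ in range(iterations):
        push = {v: [0.0, 0.0] for v in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist  # repulsion weakens with distance
                push[a][0] += force * dx / dist
                push[a][1] += force * dy / dist
                push[b][0] -= force * dx / dist
                push[b][1] -= force * dy / dist
        for a, b in ties:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k  # attraction grows with distance
            push[a][0] -= force * dx / dist
            push[a][1] -= force * dy / dist
            push[b][0] += force * dx / dist
            push[b][1] += force * dy / dist
        for v in nodes:
            length = max(math.hypot(*push[v]), 1e-9)
            scale = min(length, max_step) / length  # cap movement per round
            pos[v][0] += push[v][0] * scale
            pos[v][1] += push[v][1] * scale
    return {v: tuple(p) for v, p in pos.items()}

# Invented people, departments, and ties for two survey years.
department = {"Ana": "Biology", "Ben": "Biology", "Cy": "Physics", "Dee": "Physics"}
ties_by_year = {
    3: {("Ana", "Ben"), ("Cy", "Dee")},
    4: {("Ana", "Ben"), ("Ben", "Cy")},
}

# Pool ties from all years and lay out the pooled network once.
all_ties = set().union(*ties_by_year.values())
all_nodes = {person for tie in all_ties for person in tie}
fixed_pos = force_directed_layout(all_nodes, all_ties)

def sociogram(year):
    """Nodes and ties present in one year, placed at the shared fixed positions."""
    ties = ties_by_year[year]
    present = {person for tie in ties for person in tie}
    return {
        "positions": {p: fixed_pos[p] for p in present},
        # Ties that cross departments are marked red; gray is an assumed default.
        "ties": [
            (a, b, "red" if department[a] != department[b] else "gray")
            for a, b in sorted(ties)
        ],
    }
```

Calling `sociogram(3)` and `sociogram(4)` returns the same coordinates for anyone who appears in both years, so the same person can be found in the same spot in each drawing.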

Figure 1: Sociograms with fixed node locations based on ties reported across all years

a) “Taught with” relationship, year 4

b) “Taught with” relationship, year 3

Approach 2: Circular/Elliptical Maps

Figure 2 introduces another way to present a sociogram: a circular layout that places all nodes on the perimeter of a circle with ties drawn as chords passing through the middle of the circle (or along the periphery when connecting neighboring nodes). Using the same data used for Figure 1, Figure 2 groups nodes along the elliptical boundary by department and, within each department, by role. By imposing this arrangement on the nodes, interdisciplinary ties pass through the central area of the ellipse, making it easy to see the density of interdisciplinary ties and to identify people and departments that contribute to interdisciplinary connections.

One limitation of this map is that it is difficult to see the clustering and to distinguish people who are central in the group versus people who tend to occupy a position around the group’s periphery.
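The grouping behind the circular or elliptical layout can be sketched in a few lines: sort people by department and then by role, and space them evenly around an ellipse. The sketch below is only an illustration under assumed inputs. The names, departments, roles, ties, and ellipse dimensions are invented, and the share of interdisciplinary ties is one possible numeric summary of the density that the map shows visually.

```python
import math

def elliptical_layout(people, semi_axis_x=2.0, semi_axis_y=1.0):
    """Place nodes evenly on an ellipse, ordered by department and then role."""
    ordered = sorted(
        people,
        key=lambda name: (people[name]["department"], people[name]["role"]),
    )
    n = len(ordered)
    return {
        name: (
            semi_axis_x * math.cos(2 * math.pi * i / n),
            semi_axis_y * math.sin(2 * math.pi * i / n),
        )
        for i, name in enumerate(ordered)
    }

def interdisciplinary_share(ties, people):
    """Fraction of ties that connect people from different departments."""
    crossing = [
        (a, b) for a, b in ties
        if people[a]["department"] != people[b]["department"]
    ]
    return len(crossing) / len(ties)

# Invented people and ties, listed in no particular order.
people = {
    "Dee": {"department": "Physics", "role": "trainee"},
    "Ana": {"department": "Biology", "role": "faculty"},
    "Cy": {"department": "Physics", "role": "faculty"},
    "Ben": {"department": "Biology", "role": "trainee"},
}
ties = [("Ana", "Ben"), ("Ben", "Cy"), ("Ana", "Dee"), ("Cy", "Dee")]

layout = elliptical_layout(people)
share = interdisciplinary_share(ties, people)
```

Because members of the same department sit next to each other on the boundary, ties within a department stay short, while ties between departments have to cross the interior of the ellipse.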

Figure 2: Sociograms with all nodes from all years of the survey placed in a fixed circular layout

a) “Taught with” relationship, year 4

b) “Taught with” relationship, year 3

Because both network diagrams have strengths and limitations, consider using multiple layouts and choose maps that best address stakeholders’ questions. Two excellent—and free—software packages are available for people interested in getting started with network visualization: NetDraw (https://sites.google.com/site/netdrawsoftware/home), which was used to create the sociograms in this post, and Gephi (http://gephi.github.io), which is also capable of computing a variety of network measures.

Blog: Crowdsourcing Interview Data Analysis

Posted on June 24, 2015, in Blog

Associate Professor, Claremont Graduate University

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Most of the evaluations I conduct include interview or focus group data. This data provides a sense of student experiences and outcomes as they progress through a program. After collecting this data, we transcribe, read, code, re-read, and recode to identify themes and experiences that capture the complex interactions between the participants, the program, and their environment. However, in our reporting of this data, we are often restricted to describing themes and providing illustrative quotes to represent the participant experiences. This is an important part of the report, but I have always felt that we could do more.

This led me to think of ways to quantify the transcribed interviews to obtain a broader impression of participant experiences and compare across interviews. I also came across the idea of crowdsourcing, which means that you get a lot of people to perform a very specific task for payment. For example, a few years ago 30,000 people were asked to review satellite images to locate a crashed airplane. Crowdsourcing has been around for a long time (e.g., the Oxford English Dictionary was crowdsourced), but it has become considerably easier to access the “crowd.” Amazon’s Mechanical Turk (MTurk.com) gives researchers access to over 500,000 people around the world. It allows you to post specific tasks and have them completed within hours. For example, if you want to test the reliability of a survey or survey items, you can post it on MTurk and have 200 people take the survey (depending on the survey’s length, you can pay them $.50 to $1.00).

So the idea of crowdsourcing got me thinking about the kind of information we could get if we had 100 or 200 or 300 people read through interview transcripts. For simplicity, I wanted MTurk participants (called Workers on MTurk) to read transcripts and rate (using a Likert scale) students’ experiences in specific programs, as well as select text that they deemed important and illustrative of those participant experiences. We conducted a series of studies using this procedure and found that the crowd’s average ratings of the students’ experiences were stable and consistent across five different samples. We also found that the text the crowd selected was the same across the five different samples. This is important from a reporting standpoint, because it helped to identify the most relevant quotes for the reports, and the ratings provided a summary of the student experiences that could be used to compare different interview transcripts.
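One simple way to check this kind of stability is to compare the average rating and the most frequently selected passage across independent samples of Workers. The sketch below shows the idea with invented data; the sample names, ratings, and quote labels are placeholders, and this is not the analysis used in the studies described above.

```python
from collections import Counter
from statistics import mean

# Invented 1-5 Likert ratings of one transcript from three Worker samples.
ratings_by_sample = {
    "sample_1": [4, 5, 4, 3, 4],
    "sample_2": [4, 4, 5, 4, 3],
    "sample_3": [5, 4, 4, 4, 4],
}

# Invented labels for the passage each Worker selected as most illustrative.
selections_by_sample = {
    "sample_1": ["quote_a", "quote_b", "quote_a", "quote_c", "quote_a"],
    "sample_2": ["quote_a", "quote_a", "quote_b", "quote_a", "quote_b"],
    "sample_3": ["quote_b", "quote_a", "quote_a", "quote_c", "quote_a"],
}

# Average rating per sample, and the gap between the highest and lowest average.
sample_means = {name: mean(r) for name, r in ratings_by_sample.items()}
spread = max(sample_means.values()) - min(sample_means.values())

# Most frequently selected passage in each sample.
top_quote = {
    name: Counter(selected).most_common(1)[0][0]
    for name, selected in selections_by_sample.items()
}
same_top_quote = len(set(top_quote.values())) == 1
```

A small spread and the same top passage in every sample would be consistent with stable results. With real data, a more formal reliability statistic would likely be more appropriate than this quick comparison.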

If you are interested in trying this approach out, here are a few suggestions:

1) Make sure that you remove any identifying information about the program from the transcripts before posting them on MTurk (to protect privacy and comply with HSIRB requirements).

2) Pay MTurk Workers more for work that takes more time. If a task takes 15 to 20 minutes, I would suggest a minimum payment of $.50 per response. If the task takes more than 20 minutes, I would suggest paying $.75 to $2.00, depending on how long it takes to complete.

3) Be specific about what you want the crowd to do. There should be no ambiguity about the task (this can be accomplished by pilot testing the instructions and tasks and asking the MTurk participants to give you feedback on the clarity of the instructions).

I hope that you found this useful and please let me know how you have used crowdsourcing in your practice.