Archive: evaluation design

Blog: Surveying Learners in Open Online Technical Courses

Posted on July 22, 2015 in Blog

Assistant Professor, Engineering Education, Purdue University

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

I lead evaluation for nanoHUB-U, one of the educational opportunities provided through nanoHUB. The concept of nanoHUB-U is to provide free access to interdisciplinary, highly technical topics of current significance in STEM research, particularly those related to nanotechnology. In fact, many recent nanoHUB-U course topics are so new that the information is not yet available in published textbooks. My job is to lead the effort to determine the merit of offering the courses and to provide usable information for improving future courses. Sounds great, right? So what’s the problem?

Open online technical courses are similar to a broader group of learning opportunities frequently referred to as MOOCs (massive open online courses). However, technical courses are not intended for a massive audience. How many people on the globe really want to learn about nanophotonic modeling? One of the major challenges for evaluation in open contexts is that anyone in the world with an Internet connection can access the course, whether or not they intend to “complete” it, have the language proficiency to understand the instructor, or wish only to reference materials rather than complete all aspects of the course. In short: we know little about who is coming into the course and why.

To reach evaluative conclusions, evaluators must begin with an understanding of stakeholders’ reasons for offering the course and a deep understanding of the learners. Demographic questions must go well beyond the usual race/ethnicity and gender identity. For this blog, I’m focusing on the survey aspects of open online technical course evaluation.

Practical Tips:

  1. Design the survey collaboratively with the course instructors. Instructors are experts in the technical content and will know what type of background information is necessary to be successful in the course (e.g., Have you ever taken a differential equations course?)
  2. Design questions that target learners’ motivations, goals, and intentions for the course. Some examples include: How much time per week do you intend to spend on this course? How much of the course do you intend to complete? How concerned are you with grade outcomes? What do you hope to gain from this experience? Are you currently employed full-time or part-time, a full-time or part-time student, or unemployed?
  3. Embed the pre-survey in the first week’s course material. While not technically a true “pretest,” we have found that this technique results in a significantly higher response rate than having the instructor email a link to the survey.
  4. Capture the outcomes of the group the course was designed for. The opinions of thousands may not be in alignment with who the stakeholders intended the course to serve. Design questions with a logic that targets the intended learner group by using deeper, open-ended questions (e.g., If this information has not been provided, where would you have learned this type of material?)
  5. Embed the post-survey in the last week’s course material. Again, our experience has been that this approach for surveys has generated a much higher response rate than emailing the course participants a link (even with multiple reminders). Most likely those that take the post-survey are the learners who participated in most aspects of the course.
  6. Use the survey results to identify groups of learners within the course. It is really useful to compare what learners’ intentions were in the course to what their behavior was, as well as their pre- and post-survey responses. When interpreting the results, it is important to examine responses based on groups of learners, rather than summing up the overall course averages.
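The comparison described in tip 6 can be sketched with a few lines of pandas. The column names, intention categories, and confidence scale below are hypothetical placeholders, not part of the actual nanoHUB-U instruments:

```python
# Sketch of tip 6: segment learners by stated intention and compare
# pre/post responses per group. All names and values are hypothetical.
import pandas as pd

# Merged pre/post survey records, one row per learner
responses = pd.DataFrame({
    "intent": ["complete all", "browse", "complete all", "browse", "reference only"],
    "pre_confidence": [2, 3, 1, 4, 3],
    "post_confidence": [4, 3, 4, 4, 3],
})

# Per-learner change from pre-survey to post-survey
responses["gain"] = responses["post_confidence"] - responses["pre_confidence"]

# Report per intention group rather than a pooled course average
by_group = responses.groupby("intent")["gain"].agg(["mean", "count"])
print(by_group)
```

Reporting the pooled average would mask the pattern the tip warns about: learners who intended only to browse or reference materials show little change by design, and they can swamp the signal from learners who intended to complete the course.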

Surveys are one aspect of the evaluation design for nanoHUB-U. Evaluation in an open educational context requires much more contextualization, and traditional educational evaluation metrics should be adapted to provide the most useful, action-oriented information possible.

Blog: ATE Small Project Evaluation

Posted on February 18, 2015 in Blog

Executive Director, The Evaluation Center at Western Michigan University


All ATE proposals, except for planning grants, are required to specify a budget line for an independent evaluator. But the solicitation offers no guidance as to what a small-scale project evaluation should look like or the kinds of data to collect. The Common Guidelines for Education Research and Development—issued jointly by the National Science Foundation and the Institute of Education Sciences—specify that evidence of impact requires randomized controlled trials, while evidence of promise is generated by correlational and quasi-experimental studies.

The Common Guidelines aren’t well aligned to the work done by many ATE projects and centers, especially projects awarded through the “Small Grants” track. Small ATE projects are funded to do things like create new degree programs, offer summer camps, expand recruitment, provide compensatory education, and develop industry partnerships. These sorts of endeavors are quite distinct from the research and development work to which the Common Guidelines are oriented.

NSF expects small ATE projects to be grounded in research and utilize materials developed by ATE centers. Generally speaking, the charge of small projects is to do, not necessarily to innovate or prove. Therefore, the charge for small project evaluations is to gather and convey evidence about how well this work is being done and how the project contributes to the improvement of technician education. Evaluators of small projects should seek empirical evidence about the extent to which…

The project’s activities are grounded in established practices, policies, frameworks, standards, etc. If small projects are not generating their own evidence of promise or impact, then they should be leveraging the existing evidence base to select and use strategies and materials that have been shown to be effective. Look to authoritative, trusted sources such as the National Academies Press (for example, see the recent report, Reaching Students: What Research Says About Effective Instruction in Undergraduate Science and Engineering) and top-tier education research journals.

The target audience is engaged. All projects should document who is participating in the project (students, faculty, partners, advisors, etc.) and how much. A simple tracking spreadsheet can go a long way toward evaluating this aspect of a project. Showing sustained engagement by a diverse set of stakeholders is important for demonstrating the project’s perceived relevance and quality.

The project contributes to changes in knowledge, skill, attitude, or behavior among the target audience. For any project that progresses beyond development to piloting or implementation, there is presumably some change being sought among those affected. What do they know that they didn’t know before? What new/improved skills do they have? Did their attitudes change? Are they doing anything differently? Even without experimental and quasi-experimental designs, it’s possible to establish empirical and logical linkages between the project’s activities and outcomes.

The ATE program solicitation notes that some projects funded through its Small Grants track “will serve as a prototype or pilot” for a subsequent project. As such, ATE small grant recipients should ensure their evaluations generate evidence that their approaches to improving technician education are worth the next level of investment by NSF.

To learn more about…
‒ the Common Guidelines, see EvaluATE’s Evaluation and Research in the ATE Program webinar recording and materials
‒ evaluation of small projects, see EvaluATE’s Low-Cost, High-Impact Evaluation for Small Projects webinar recording and materials
‒ alternative means for establishing causation, see Jane Davidson’s Understand Causes of Outcomes and Impacts webinar recording and slides

Blog: Indicators and the Difficulty With Them

Posted on January 21, 2015 in Blog

EvaluATE Blog Editor


Evaluators working in education contexts are often required to use externally created criteria and standards, such as GPA targets, graduation rates, and other such metrics, when evaluating program success. These standardized goals create a problem that program directors and their evaluators should watch for: goal displacement, which occurs when one chases a target indicator at the expense of other parts of a larger mission (Cronin & Sugimoto, 2014). An example of goal displacement was provided in a recent blog post by Bernard Marr:

“Another classic example comes from a Russian nail factory. When the government centrally planned the economy it created targets of output for the factory, measured in weight. The result was that the factory produced a small number of very heavy nails. Obviously, people in Russia didn’t just need massively big nails so the target was changed to the amount of nails the factory had to produce. As a consequence, the nail factory produced a massive amount of only tiny nails.”

The lesson here is that indicators are not truth; they are pointers to truth. As such, it is bad assessment practice to rely on a single indicator in assessment and evaluation. In the Russian nail factory example, suppose what you were really trying to measure was the factory’s success in meeting the country’s need for nails. Clearly, even though the factory was able to meet the targets for the weight or quantity indicators, it failed at its ultimate target: meeting the need for the right kind of nails.

I was moved to write about this issue when thinking about a real-world evaluation of an education program that must meet federally mandated performance indicators, such as the percentage of students who meet a certain GPA. The program works with students who tend toward low academic performance and who have few role models for success. To fully understand the program’s value, it was important to look not only at the number of students who met the federal target, but also at how students with different initial GPAs and different levels of parental support performed over time. This trend data told the real story: even students who were not meeting the uniform federal target were still improving.

More often, students with less educated role models started with lower GPAs and raised them over time in the program, while students with more educated role models tended to start off better but did not improve as much. This suggests that, through mentoring, the program was having a substantial impact on the neediest students (low initial performers), whether or not they met the full federal standard. Although the program still needs improvement to reach the federal standards, we now know an important leverage point that can help students improve even further: increased mentoring to compensate for a lack of educated role models in their personal lives. Thus we were able to look past the single indicator and find what was really important to the program’s success.

Wouters, P. (2014). The citation: From culture to infrastructure. In B. Cronin & C. R. Sugimoto (Eds.), Beyond bibliometrics: Harnessing multidimensional indicators of scholarly impact (p. 47). MIT Press.

Blog: Evaluating ATE Efforts Using Peer-Generated Surveys

Posted on December 17, 2014 in Blog



During the course of evaluating the sustainability of NSF’s Advanced Technological Education program, I introduced a new method for creating evaluation surveys. I call it a Peer-Generated Likert Scale because it uses actual statements of the population of interest as the basis for the items on the survey. Listed below are the steps one would follow to develop a peer-generated Likert-type survey, using a generic example of a summer institute in the widget production industry.

1. Describe the subject of the evaluation and the purpose of the evaluation.
In this step, you want to develop a sense of the scope of your evaluation activity, the relevant content, and the relevant subjects. For example:

“This is a six-day faculty development program designed for middle and high school teachers, college faculty, administrators, and others to learn about the widget industry. The purpose of the evaluation is to obtain information about the success of the program.”

2. Define the domain of content to be measured by the survey.
This would require a review of the curriculum materials, conversations with the instructors, and perhaps a couple of classroom observations. Let us suppose the following are some of the elements of the domain to be addressed by a survey:

a. perceived learning about the widget industry
b. attitudes toward the institute
c. judgments about the quality of instruction
d. backgrounds of participants
e. institute organization and administration
f. facilities
g. etc.

3. Collect statements from the participants about the activity related to those domains.
Participants who are involved in the educational activity are given the opportunity to reflect anonymously upon their experiences. They are given prompts, such as:

a. Please list three strengths of the summer institute.
b. Please list three limitations of the institute.

4. Review the statements, select potential survey items, and pilot the survey.
These statements are then reviewed by the evaluation team and selected according to their match with the elements of the domain. They are put in a Likert-type format ranging from Strongly Agree, Agree, Uncertain, Disagree, to Strongly Disagree. Plan on a response time of about 30 seconds per item. Most surveys will consist of 20–30 items.

5. Collect data and interpret the results.
The most effective way to report the results of this type of survey is to show the percentage agreeing or strongly agreeing with the positively stated items (e.g., “This was one of the most effective workshops that I have ever taken.”) and the percentage disagreeing with the negatively stated items (e.g., “There was too much lecture and not enough hands-on experiences.”).
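This reporting step can be sketched in a few lines of Python. The item wording and responses below are hypothetical; the only logic taken from the text is that “agree”/“strongly agree” count as favorable for positively stated items and “disagree”/“strongly disagree” for negatively stated ones:

```python
# Sketch of step 5: percent favorable per Likert item.
# Responses and the helper name are hypothetical illustrations.
from collections import Counter

SCALE = ["strongly agree", "agree", "uncertain", "disagree", "strongly disagree"]

def percent_favorable(responses, positively_worded=True):
    """Share of respondents on the favorable side of a Likert item."""
    favorable = ({"strongly agree", "agree"} if positively_worded
                 else {"disagree", "strongly disagree"})
    counts = Counter(responses)
    total = sum(counts.values())
    return 100.0 * sum(counts[r] for r in favorable) / total

# Example: 3 of 5 respondents on the favorable side of a positive item
item_responses = ["agree", "strongly agree", "uncertain", "agree", "disagree"]
print(f"{percent_favorable(item_responses):.0f}% favorable")  # prints "60% favorable"
```

Computing one favorable-percentage figure per item, with the direction flipped for negatively worded items, keeps the report on a single, comparable scale across all 20–30 items.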

The survey I developed for my ATE research contained 23 such items, and I estimated it would take about 15 minutes to complete. Although I was evaluating ATE sustainability, ATE team leaders could use the process to evaluate their program or individual products and activities. Further information on the details of the procedure can be found in Welch, W. W. (2011). A study of the impact of the advanced technological education program. This study is available from University of Colorado’s DECA Project.

Newsletter: Evaluation that Seriously Gets to the Point – and Conveys It Brilliantly

Posted on April 1, 2013 in Newsletter

Evaluation, much as we love it, has a reputation among nonevaluators for being overly technical and academic, lost in the details, hard work to wade through, and in the end, not particularly useful. Why is this? Many evaluators were originally trained in the social sciences. There we added numerous useful frameworks and methodologies into our toolkits. But, along the way, we were inculcated with several approaches, habits, and ways of communicating that are absolutely killing our ability to deliver the value we could be adding. Here are the worst of them:

  1. Writing question laundry lists – asking long lists of evaluation questions that are far too narrow and detailed (often at the indicator level)
  2. Leaping to measurement – diving into identifying intended outcomes and designing data collection instruments without a clear sense of who or what the evaluation is for
  3. Going SMART but unintelligent – focusing on what’s most easily measurable rather than making intelligent choices to go after what’s most important (SMART = specific, measurable, achievable, relevant, and time-based)
  4. Rorschach inkblotting – assuming that measures, metrics, indicators, and stories are the answers; they are not!
  5. Shirking valuing – treating evaluation as an opinion-gathering exercise rather than actually taking responsibility for drawing evaluative conclusions based on needs, aspirations, and other relevant values
  6. Getting lost in the details – leaving the reader wading through data instead of clearly and succinctly delivering the answers they need
  7. Burying the lead – losing the most important key messages by loading way too many “key points” into the executive summaries, not to mention the report itself, or using truly awful data visualization techniques
  8. Speaking in tongues – using academic and technical language that just makes no sense to normal people

Thankfully, hope is at hand! Breakthrough thinking and approaches are all around us, but many evaluators just aren’t aware of them. Some have been there for decades. Here’s a challenge for 2013. Seek out and get really serious about infusing the following into your evaluation work:

  • Evaluation-Specific Methodology (ESM) – the methodologies that are distinctive to evaluation, i.e., the ones that go directly after values. Examples include needs and values assessment; merit determination methodologies; importance weighting methodologies; evaluative synthesis methodologies; and value-for-money analysis
  • Actionable Evaluation – a pragmatic, utilization-focused framework for evaluation that asks high-level explicitly evaluative questions, and delivers direct answers to them using ESM
  • Data Visualization & Effective Reporting – the best of the best of dataviz, reporting, and communication to deliver insights that are not just understandable but unforgettable