Hezel Associates is a research and evaluation firm with a long history of studying education innovations. As CEO, I work alongside my colleagues with developers of programs and innovations intended to realize lasting outcomes for STEM education stakeholders. Our NSF-funded work has included evaluations of two ATE grants, along with ten projects across eight other programs over the past five years.
Those projects vary substantially, but one constant is what I call “The NSF Conundrum”: all-too-common, problematic inconsistencies in how people distinguish RESEARCH to “advance knowledge” in STEM education (the Intellectual Merit review criterion) from EVALUATION of Foundation-funded activities. In Hezel Associates’ internal lexicon, PIs should collect and analyze data for “research” aims defined for the project. The evaluator, typically external, is responsible for a “program evaluation” examining the PI’s work, including that research. Both require data collection and analysis, but for different uses.
I make this distinction not to suggest that it is universal, but to illuminate instances where individuals (PIs, program officers, evaluators, and review panelists) make important decisions based on different assumptions and principles. A particularly egregious example involved a Hezel Associates client proposing an ITEST Strategies project in early 2014. One panelist, supporting a “POOR” rating, wrote…
“The authors provide a general description of evaluation questions, data needs, instruments, but with no statistical analysis for the data once collected. The authors are advised to develop a comprehensive evaluation plan that would provide credible results from the comparison of the two programs…”
…apparently missing that such a plan was detailed as half of the “research and development” (R&D) effort proposed earlier in the document, including random assignment, validated instruments, and inferential statistics examining group differences (ANOVA) with tests of assumptions and post hoc corrections. The “research” compared the two program offerings, while the “evaluation” assessed the rigor and implementation of the collaborative R&D activities. This panelist’s personal definition of terms predisposed him to look in the wrong place for crucial proposal content.
Help is here, however, in the form of The Common Guidelines for Education Research and Development (http://www.nsf.gov/pubs/2013/nsf13126/nsf13126.pdf). Released in 2013 by NSF and the U.S. Department of Education, this report aims to “enhance the efficiency and effectiveness of both agencies’ STEM education research and development programs.” The Guidelines enumerate six “types” of research, differentiating purposes and defining standards of rigor for studies ranging from foundational research to effectiveness tests of innovations implemented “at scale.” The 2014 ATE program solicitation specifically invokes the Guidelines for the Targeted Research on Technician Education track, although all activities described as appropriate for projects in any of the ATE program tracks could arguably be situated in that framework. And the new Guidelines have the potential to largely resolve the conundrum.
So consider this a plea for NSF staff, PIs, evaluators, and other stakeholders to adopt the R&D orientation and framework defined by the Guidelines and to agree to key definitions. The resulting conception of research as a development function and the opportunity to clarify the evaluator’s role should help us all better support the Foundation’s goals.