We EvaluATE - Evaluation Design

Blog: Kirkpatrick Model for ATE Evaluation

Posted on October 2, 2019 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Jim Kirkpatrick, Senior Consultant, Kirkpatrick Partners
Wendy Kayser Kirkpatrick, President, Kirkpatrick Partners

The Kirkpatrick Model is an evaluation framework organized around four levels of impact: reaction, learning, behavior, and results. It was developed more than 50 years ago by Jim’s father, Dr. Don Kirkpatrick, specifically for evaluating training initiatives in business settings. For decades, it has been widely believed that the four levels are applicable only to evaluating the effectiveness of corporate training programs. However, we and hundreds of global “four-level ambassadors” — including Lori Wingate and her colleagues at EvaluATE — have successfully applied Kirkpatrick outside of the typical “training” box. The Kirkpatrick Model has broad appeal because of its practical, results-oriented approach.

The Kirkpatrick Model provides the foundation for evaluating almost any kind of social, business, health, or education intervention. The process starts with identifying what success will look like and driving through with a well-coordinated, targeted plan of support, accountability, and measurement. It is a framework for demonstrating ultimate value through a compelling chain of evidence.

Kirkpatrick Model Visual

Whether your Advanced Technological Education (ATE) grant focuses on enhancing a curricular program, providing professional development to faculty, developing educational materials, or serving as a resource and dissemination center, the four levels are relevant.

At the most basic level (Level 1: Reaction), you need to know what your participants think of your work and your products. If they don’t value what you’re providing, you have little chance of producing higher-level results.

Next, it’s important to determine how and to what extent participants’ knowledge, skills, attitudes, confidence, and/or commitment changed because of the resources and follow-up support you provided (Level 2: Learning). Many evaluations, unfortunately, don’t go beyond Level 2. But it’s a big mistake to assume that if learning takes place, behaviors change and results happen. It’s critical to determine the extent to which people are doing things differently because of their new knowledge, skill, etc. (Level 3: Behavior).

Finally, you need to be able to answer the question “So what?” In the ATE context, that means determining how your work has impacted the landscape of advanced technological education and workforce development (Level 4: Results).

The four levels are the foundation of the model, but there is much more to it. We hope you’ll take the time to examine and reflect on how this approach can bring value to your initiative and its evaluation. To learn more about Kirkpatrick, visit our website, kirkpatrickpartners.com, where you’ll find a wealth of free resources, as well as information on our certificate and certification programs.

Want to learn more about this topic? View EvaluATE’s webinar ATE Evaluation: Measuring Reaction, Learning, Behavior, and Results.

 

Blog: 5 Tips for Evaluating Multisite Projects*

Posted on August 21, 2019 in Blog

Senior Research Manager, Social & Economic Sciences Research Center at Washington State University

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Conducting evaluations for multisite projects can present unique challenges and opportunities. For example, evaluators must be careful to ensure that consistent data are captured across sites, which can be challenging. However, having results for multiple sites can lead to stronger conclusions about an intervention’s impact. The following are helpful tips for evaluating multisite projects.

1. Investigate the consistency of project implementation. Just because the same guidelines have been provided to each site does not mean that they have been implemented the same way! Variations in implementation can create difficulties in collecting the data and interpreting the evaluation results.

2. Standardize data collection tools across sites. This will minimize confusion and result in a single dataset with information on all sites (a minimal sketch of this appears after the list). On the downside, this may result in having to limit the data to a subset of information that is available across all sites.

3. Help the project managers at each site understand the evaluation plan. Provide a clear, comprehensive overview of the evaluation plan that includes the expectations of the managers. Simplify their roles as much as possible.

4. Be sensitive in reporting side-by-side results of the sites. Consult with project stakeholders to determine if it is appropriate or helpful to include side-by-side comparisons of the performance of the various sites.

5. Analyze to what extent differences in outcomes are due to variations in project implementation. Variation in results across sites may provide clues to factors that may facilitate or impede the achievement of certain outcomes.

6. Report the evaluation results back to the site managers in whatever form would be the most useful to them. This is an excellent opportunity to recruit the site managers as supporters of evaluation, especially if they see that the evaluation results can be used to aid their participant recruitment and fundraising efforts.
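The sketch below, referenced in tip 2, illustrates one way to combine standardized per-site data files into a single dataset and then summarize an outcome by site (tip 5). The file pattern, column names, and outcome variables are hypothetical; this is a rough sketch under those assumptions, not a prescribed workflow.

```python
# A minimal sketch, assuming each site submits a CSV with the same standardized
# columns (hypothetical names: "site", "participant_id", "completed", "score").
import glob

import pandas as pd

# Combine the per-site files into one dataset (tip 2).
frames = [pd.read_csv(path) for path in glob.glob("data/site_*.csv")]
combined = pd.concat(frames, ignore_index=True)

# Summarize outcomes by site to surface variation worth explaining (tip 5).
by_site = combined.groupby("site").agg(
    participants=("participant_id", "nunique"),
    completion_rate=("completed", "mean"),
    mean_score=("score", "mean"),
)
print(by_site)
```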

 

* This blog is a reprint of a conference handout from an EvaluATE workshop at the 2011 ATE PI Conference.

 

FOR MORE INFORMATION

Smith-Moncrieffe, D. (2009, October). Planning multi-site evaluations of model and promising programs. Paper presented at the Canadian Evaluation Society Conference, Ontario, Canada.

Lawrenz, F., & Huffman, D. (2003). How can multi-site evaluations be participatory? American Journal of Evaluation, 24(4), 471–482.

Blog: Building Research-Practice Collaborations for Effective STEM + Computing Education Evaluation Design

Posted on November 29, 2018 in Blog

Director of Measurement, Evaluation, and Learning, Kapor Center

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

 

At the Kapor Center, our signature three-summer educational program (SMASH Academy) aims to prepare underrepresented high school students of color to pursue careers in science, technology, engineering, and mathematics (STEM) and computing through access to courses, support networks, and opportunities for social and personal development.

In the nonprofit sector, evaluations can be driven by funder requirements, which often focus on outcomes. However, by solely focusing on outcomes, teams can lose sight of the goal of STEM evaluation: to inform programming (through the creation of process evaluation tools such as observation protocols and course evaluations) to ensure youth of color are prepared for the future STEM economy.

To keep that goal in focus, the Kapor Center ensures that the evaluation method driving its work is utilization-focused evaluation. Utilization-focused evaluation begins with the premise that the success metric of an evaluation is the extent to which it is used by key stakeholders (Patton, 2008). This framework requires joint decision making between the evaluator and stakeholders to determine the purpose of the evaluation, the kind of data to be collected, the type of evaluation design to be created, and the uses of the evaluation. Using this framework shifts evaluation from a linear, top-down approach to a feedback loop involving practitioners.

Figure 1. Evaluation Cycle of SMASH Academy

The evaluation cycle at the Kapor Center, a collaboration between our research team and SMASH’s program team, is outlined below:

  1. Inquiry: This stage begins with conversations with the stakeholders (e.g., programs and leadership teams) about common understandings of short-, medium-, and long-term outcomes as well as the key strategies that drive outcomes. Delineating outcomes has been integral to working transparently toward program priorities.
  2. Instrument Development: Once groups are in agreement about the goal of the evaluation and our path to it, we develop instruments. Instrument mapping, linking each tool and question to specific outcomes, has been a good practice to open the communication channels among teams.
  3. Instrument Administration: When working with seasonal staff at the helm of evaluation administration, documentation of processes has been crucial for fidelity. Not surprisingly, with varying levels of experience among program staff, the creation of systems to standardize data collection has been key, including scoring rubrics to be used during observations and guides for survey administration.

  4. Data Analysis and Reporting: When synthesizing data, analyses and reporting need to not only tell a broad impact story but also provide concrete targets and priorities for the program. In this regard, analyses have encompassed pre-post outcome differences and reports on program experiences (a minimal sketch of a pre-post comparison follows this list).
  5. Reflection and Integration: At the end of the program cycle, the program team reflects on the data together to inform their path forward. In such a meeting, the team engages in answering three questions: 1) What did you observe about the data? 2) What can you infer about the data and what evidence supports your inference? and 3) What are the next steps to develop and prioritize program modifications?
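As a rough illustration of the pre-post outcome differences mentioned in the Data Analysis and Reporting stage, the sketch below runs a paired comparison on hypothetical pre/post scores. The scale, the values, and the choice of a paired t-test are assumptions for illustration, not the program’s actual analysis.

```python
# A minimal sketch of a pre-post outcome comparison using hypothetical scores
# for the same participants (e.g., a 1-5 STEM interest scale).
from scipy import stats

pre = [2.8, 3.1, 2.5, 3.4, 2.9, 3.0, 2.7, 3.2]
post = [3.4, 3.3, 2.9, 3.8, 3.5, 3.1, 3.0, 3.6]

# Paired (repeated-measures) t-test on the pre/post scores.
t_stat, p_value = stats.ttest_rel(post, pre)
mean_gain = sum(b - a for a, b in zip(pre, post)) / len(pre)

print(f"Mean pre-post gain: {mean_gain:.2f} (t = {t_stat:.2f}, p = {p_value:.3f})")
```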

Developing stronger research-practice ties has been integral to the Kapor Center’s understanding of what works, for whom, and in what contexts to ensure more youth of color pursue and persist in STEM fields. Beyond the SMASH program, the practice of collective cooperation between researchers and practitioners provides an opportunity to impact strategies across the field.

 

References

Patton, M. Q. (2008). Utilization-focused evaluation. Newbury Park, CA: Sage.

 

Blog: Measure What Matters: Time for Higher Education to Revisit This Important Lesson

Posted on May 23, 2018 in Blog

Senior Partner, Cosgrove & Associates

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

If one accepts Peter Drucker’s premise that “what gets measured, gets managed,” then two things are apparent: measurement is valuable, but measuring the wrong thing has consequences. Data collection efforts focusing on the wrong metrics lead to mismanagement and failure to recognize potential opportunities. Focusing on the right measures matters. For example, in Moneyball, Michael Lewis describes how the Oakland Athletics improved their won-loss record by revising player evaluation metrics to more fully understand players’ potential to score runs.

The higher education arena has equally high stakes concerning evaluation. A growing number of states (more than 30 in 2017)[1] have adopted performance funding systems to allocate higher education funding. Such systems focus on increasing the number of degree completers and have been fueled by calls for increased accountability. The logic of performance funding seems clear: Tie funding to the achievement of performance metrics, and colleges will improve their performance. However, research suggests we might want to re-examine this logic.  In “Why Performance-Based College Funding Doesn’t Work,” Nicholas Hillman found little to no evidence to support the connection between performance funding and improved educational outcomes.

Why are more states jumping on the performance-funding train? States are under political pressure, with calls for increased accountability and limited taxpayer dollars. But do the chosen performance metrics capture the full impact of education? Do the metrics result in more efficient allocation of state funding? The jury may be still out on these questions, but Hillman’s evidence suggests the answer is no.

The disconnect between performance funding and improved outcomes may widen even more when one considers open-enrollment colleges or colleges that serve a high percentage of adult, nontraditional, or low-income students. For example, when a student transfers from a community college (without a two-year degree) to a four-year college, should that behavior count against the community college’s degree completion metric? Might that student have been well-served by their time at the lower-cost college? When community colleges provide higher education access to adult students who enroll on a part-time basis, should they be penalized for not graduating such students within the arbitrary three-year time period? Might those students and that community have been well-served by access to higher education?

To ensure more equitable and appropriate use of performance metrics, colleges and states would be well-served to revisit current performance metrics and more clearly define appropriate metrics and data collection strategies. Most importantly, states and colleges should connect the analysis of performance metrics to clear and funded pathways for improvement. Stepping back to remember that the goal of performance measurement is to help build capacity and improve performance will place both parties in a better position to support and evaluate higher education performance in a more meaningful and equitable manner.

[1] Jones, T., & Jones, S. (2017, November 6). Can equity be bought? A look at outcomes-based funding in higher ed [Blog post].

Blog: Documenting Evaluations to Meet Changing Client Needs: Why an “Evaluation Plan” Isn’t Enough

Posted on April 11, 2018 in Blog

CEO, Hezel Associates

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

No plan of action survives first contact with the enemy – Helmuth von Moltke (paraphrased)

Evaluations are complicated examinations of complex phenomena. It is optimistic to assume that the details of an evaluation won’t change, particularly for a multiyear project. So how can evaluators deal with the inevitable changes? I propose that purposeful documentation of evaluations can help. In this blog, I focus on the distinctions among three types of documents—the contract, scope of work, and study protocol—each serving a specific purpose.

  • The contract codifies legal commitments between the evaluator and client. Contracts inevitably outline the price of the work, period of the agreement, and specifics like payment terms. They are hard to change after execution, and institutional clients often insist on using their own terms. Given this, while it is possible to revise a contract, it is impractical to use the contract to manage and document changes in the evaluation. I advocate including operational details in a separate “scope of work” (SOW) document, which can be external or appended to the contract.
  • The scope of work translates the contract into an operational business relationship, listing the responsibilities of both the evaluator and client, tasks, deliverables, and timeline in detail sufficient for effective management of quality and cost. Because the scope of an evaluation will almost certainly change (timelines seem to be the first casualty), it is necessary to establish a process for documenting “change orders”—recording which SOW details were revised, who proposed each change (either party may), and who accepted it—to avoid conflict. If a change to the scope does not affect the price of the work, it may be possible to manage and record changes without having to revisit the contract. I encourage evaluators to maintain “working copies” of the SOW, with changes, dates, and details of approval communications from clients. At Hezel Associates, our practice is to share iterations of the SOW with the client when the work changes, with version dates documenting the evaluation-as-implemented so everyone has the same picture of the work.
Working Scope of Work


  • The study protocol then goes further, defining technical aspects of the research central to the work being performed. A complex evaluation project might require more than one protocol (e.g., for formative feedback and impact analysis), each being similar in concept to the Methods section of a thesis or dissertation. A protocol details questions to be answered, the study design, data needs, populations, data collection strategies and instrumentation, and plans for analyses and reporting. A protocol frames processes to establish and maintain appropriate levels of study rigor, builds consensus among team members, and translates evaluation questions into data needs and instrumentation to assure collection of required data before it is too late. Technical aspects of the evaluation are central to the quality of the work but likely to be mostly opaque to the client. I argue that it is crucial that changes to these technical details be formally documented in the protocol, but I suggest maintaining such technical information as internal documents for the evaluation team—unless a given change impacts the SOW, at which point the scope must be formally revised as well.

Each of these types of documentation serves an entirely different function as part of what might be called an “evaluation plan,” and all are important to a successful, high-quality project. Any part may be combined with others in a single file, transmitted to the client as part of a “kit,” maintained separately, or perhaps not shared with the client at all. Regardless, our experience has been that effective documentation will help avoid confusion after marching onto the evaluation field of battle.

Blog: Part 2: Using Embedded Assessment to Understand Science Skills

Posted on January 31, 2018 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Rachel Becker-Klein, Senior Research Associate, PEER Associates
Karen Peterman, President, Karen Peterman Consulting
Cathlyn Stylinski, Senior Agent, University of Maryland Center for Environmental Science

In our last EvaluATE blog, we defined embedded assessments (EAs) and described the benefits and challenges of using EAs to measure and understand science skills. Since then, our team has been testing the development and use of EAs for three citizen science projects through our National Science Foundation (NSF) project, Embedded Assessment for Citizen Science. Below we describe our journey and findings, including the creation and testing of an EA development model.

Our project first worked to test a process model for the development of EAs that could be both reliable and valid (Peterman, Becker-Klein, Stylinski, & Grack-Nelson, in press). Stage 1 was about articulating program goals and determining evidence for documenting those goals. In Stage 2, we collected both content validity evidence (the extent to which a measure was related to the identified goal) and response process validity evidence (how understandable the task was to participants). Finally, the third stage involved field-testing the EA. The exploratory process, with stages and associated products, is depicted in the figure below.

We applied our EA development approach to three citizen-science case study sites and were successful at creating an EA for each. For instance, for Nature’s Notebook (an online monitoring program where naturalists record observations of plants and animals to generate long-term datasets), we worked together to create an EA focused on the skill of paying close attention. This EA was developed for participants to use in the in-person workshop, where they practiced observation skills by collecting data about flora and fauna at the training site. Participants completed a Journal and Observation Worksheet as part of their training, and the EA process standardized the worksheet and also included a rubric for assessing how participants’ responses reflected their ability to pay close attention to the flora and fauna around them.

Embedded Assessment Development Process

Lessons Learned:

  • The EA development process had the flexibility to accommodate the needs of each case study to generate EAs that included a range of methods and scientific inquiry skills.
  • Both the SMART goals and Measure Design Template (see Stage 1 in the figure above) proved useful as a way to guide the articulation of project goals and activities, and the identification of meaningful ways to document evidence of inquiry learning.
  • The response process validity component (from Stage 2) resulted in key changes to each EA, such as changes to the assessment itself (e.g., streamlining the activities) as well as the scoring procedures.

Opportunities for using EAs:

  • Modifying existing activities. All three of the case studies had project activities that we could build off to create an EA. We were able to work closely with program staff to modify the activities to increase the rigor and standardization.
  • Formative use of EAs. Since a true EA is indistinguishable from the program itself, the process of developing and using an EA often resulted in strengthened project activities.

Challenges of using EAs:

  • Fine line between EA and program activities. If an EA is truly indistinguishable from the project activity itself, it can be difficult for project leaders and evaluators to determine where the program ends and the assessment begins. This ambiguity can create tension in cases where volunteers are not performing scientific inquiry skills as expected, making it difficult to disentangle whether the results were due to shortcomings of the program or a failing of the EA designed to evaluate the program.
  • Group versus individual assessments. Another set of challenges for administering EAs relates to the group-based implementation of many informal science projects. A group score may not reflect the skills of each individual in the group, making the results biased and difficult to interpret.

Though the results of this study are promising, we are at the earliest stages of understanding how to capture authentic evidence to document learning related to science skills. The use of a common EA development process, with common products, has the potential to generate new research to address the challenges of using EAs to measure inquiry learning in the context of citizen science projects and beyond. We will continue to explore these issues in our new NSF grant, Streamlining Embedded Assessment for Citizen Science (DRL #1713424).

Acknowledgments:

We would like to thank our case study partners: LoriAnne Barnett from Nature’s Notebook; Chris Goforth, Tanessa Schulte, and Julie Hall from Dragonfly Detectives; and Erick Anderson from the Young Scientists Club. This work was supported by the National Science Foundation under grant number DRL#1422099.

Resource:

Peterman, K., Becker-Klein, R., Stylinski, C., & Grack-Nelson, A. (2017). Exploring embedded assessment to document scientific inquiry skills within citizen science. In C. Herodotou, M. Sharples, & E. Scanlon (Eds.), Citizen inquiry: A fusion of citizen science and inquiry learning (pp. 63–82). New York, NY: Routledge.

Blog: Addressing Challenges in Evaluating ATE Projects Targeting Outcomes for Educators

Posted on November 21, 2017 in Blog

CEO, Hezel Associates

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Kirk Knestis here, CEO of Hezel Associates and a former career and technology educator and professional development provider, sharing some strategies for addressing challenges unique to evaluating Advanced Technological Education (ATE) projects that target outcomes for teachers and college faculty.

In addition to funding projects that directly train future technicians, the National Science Foundation (NSF) ATE program funds initiatives to improve the abilities of grade 7-12 teachers and college faculty—the expectation being that improving their practice will directly benefit technical education. ATE tracks focusing on professional development (PD), capacity building for faculty, and technological education teacher preparation all rely implicitly on theories of action (typically illustrated by a logic model) that presume outcomes for educators will translate into outcomes for student technicians. This assumption can present challenges to evaluators trying to understand how such efforts are working. Reference this generic logic model for discussion purposes:

Setting aside project activities acting directly on students, any strategy aimed at educators (e.g., PD workshops, faculty mentoring, or preservice teacher training) must leave them fully equipped with dispositions, knowledge, and skills necessary to implement effective instruction with students. Educators must then turn those outcomes into actions to realize similar types of outcomes for their learners. Students’ action outcomes (e.g., entering, persisting in, and completing training programs) depend, in turn, on them having the dispositions, knowledge, and skills educators are charged with furthering. If educators fail to learn what they should, or do not activate those abilities, students are less likely to succeed. So what are the implications—challenges and possible solutions—of this for NSF ATE evaluations?

  • EDUCATOR OUTCOMES ARE OFTEN NOT WELL EXPLICATED. Work with program designers to force them to define the new dispositions, understandings, and abilities that technical educators require to be effective. Facilitate discussion about all three outcome categories to lessen the chance of missing something. Press until outcomes are defined in terms of persistent changes educators will take away from project activities, not what they will do during them.
  • EDUCATORS ARE DIFFICULT TO TEST. To truly understand if an ATE project is making a difference in instruction, it is necessary to assess if precursor outcomes for them are realized. Dispositions (attitudes) are easy to assess with self-report questionnaires, but measuring real knowledge and skills requires proper assessments—ideally, performance assessments. Work with project staff to “bake” assessments into project strategies, to be more authentic and less intrusive. Strive for more than self-report measures of increased abilities.
  • INSTRUCTIONAL PRACTICES ARE DIFFICULT AND EXPENSIVE TO ASSESS. The only way to truly evaluate instruction is to see it, assessing pedagogy, content, and quality with rubrics or checklists. Consider replacing expensive on-site visits with the collection of digital videos or real-time, web-based telepresence.

With clear definitions of outcomes and collaboration with ATE project designers, evaluators can assess whether technician training educators are gaining the necessary dispositions, knowledge, and skills, and if they are implementing those practices with students. Assessing students is the next challenge, but until we can determine if educator outcomes are being achieved, we cannot honestly say that educator-improvement efforts made any difference.

Blog: Partnering with Clients to Avoid Drive-by Evaluation

Posted on November 14, 2017 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
John Cosgrove, Senior Partner, Cosgrove & Associates
Maggie Cosgrove, Senior Partner, Cosgrove & Associates

If a prospective client says, “We need an evaluation, and we will send you the dataset for evaluation,” our advice is that this type of “drive-by evaluation” may not be in their best interest.

As calls for program accountability and data-driven decision making increase, so does demand for evaluation. Given this context, evaluation services are being offered in a variety of modes. Before choosing an evaluator, we recommend the client pause to consider what they would like to learn about their efforts and how evaluation can add value to such learning. This perspective requires one to move beyond data analysis and reporting of required performance measures to examining what is occurring inside the program.

By engaging our clients in conversations related to what they would like to learn, we are able to begin a collaborative and discovery-oriented evaluation. Our goal is to partner with our clients to identify and understand strengths, challenges, and emerging opportunities related to program/project implementation and outcomes. This process helps clients understand not only which strategies worked but also why they worked, and it lays the foundation for sustainability and scaling.

These initial conversations can be a bit of a dance, as clients often focus on funder-required accountability and performance measures. This is when it is critically important to elucidate the differences between evaluation and auditing or inspecting. Ann-Murray Brown examines this question and provides guidance as to why evaluation is more than just keeping score in Evaluation, Inspection, Audit: Is There a Difference? As we often remind clients, “we are not the evaluation police.”

During our work with clients to clarify logic models, we encourage them to think of their logic model in terms of storytelling. We pose commonsense questions such as: When you implement a certain strategy, what changes do you expect to occur? Why do you think those changes will take place? What do you need to learn to support current and future strategy development?

Once our client has clearly outlined their “story,” we move quickly to connect data collection to client-identified questions and, as soon as possible, we engage stakeholders in interpreting and using their data. We incorporate Veena Pankaj and Ann Emery’s (2016) data placemat process to engage clients in data interpretation.  By working with clients to fully understand their key project questions, focus on what they want to learn, and engage in meaningful data interpretation, we steer clear of the potholes associated with drive-by evaluations.

Pankaj, V. & Emery, A. (2016). Data placemats: A facilitative technique designed to enhance stakeholder understanding of data. In R. S. Fierro, A. Schwartz, & D. H. Smart (Eds.), Evaluation and Facilitation. New Directions for Evaluation, 149, 81-93.

Blog: Integrating Perspectives for a Quality Evaluation Design

Posted on August 2, 2017 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
John Dorris, Director of Evaluation and Assessment, NC State Industry Expansion Solutions
Dominick Stephenson, Assistant Director of Research Development and Evaluation, NC State Industry Expansion Solutions

Designing a rigorous and informative evaluation depends on communication with program staff to understand planned activities and how those activities relate to the program sponsor’s objectives and the evaluation questions that reflect those objectives (see white paper related to communication). At NC State Industry Expansion Solutions, we have worked long enough on evaluation projects to know that such communication is not always easy because program staff and the program sponsor often look at the program from two different perspectives: The program staff focus on work plan activities (WPAs), while the program sponsor may be more focused on the evaluation questions (EQs). So, to help facilitate communication at the beginning of the evaluation project and assist in the design and implementation, we developed a simple matrix technique to link the WPAs and the EQs (see below).


For each of the WPAs, we link one or more EQs and indicate what types of data collection events will take place during the evaluation. During project planning and management, the crosswalk of WPAs and EQs will be used to plan out qualitative and quantitative data collection events.
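A crosswalk like this can also be kept in a lightweight, machine-readable form so gaps become visible before data collection begins. The sketch below is a rough illustration only; the work plan activities, evaluation questions, and data collection events are hypothetical placeholders, not taken from an actual project.

```python
# A minimal sketch of a WPA x EQ crosswalk (all names below are hypothetical).
# Each work plan activity lists the evaluation questions it informs and the
# data collection events planned for it.
crosswalk = {
    "WPA1: Faculty professional development workshops": {
        "eqs": ["EQ1", "EQ2"],
        "data_events": ["pre/post workshop survey", "classroom observation"],
    },
    "WPA2: Curriculum module development": {
        "eqs": ["EQ2", "EQ3"],
        "data_events": ["document review", "student focus group"],
    },
}

all_eqs = {"EQ1", "EQ2", "EQ3", "EQ4"}

# Check that every evaluation question is linked to at least one activity
# and therefore to at least one planned data collection event.
covered = {eq for entry in crosswalk.values() for eq in entry["eqs"]}
print("Evaluation questions not yet covered:", sorted(all_eqs - covered))
```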


The above framework may be most helpful for the formative assessment (process questions and activities). However, it can also enrich the knowledge gained from the participant outcome analysis in the summative evaluation in the following ways:

  • Understanding how the program has been implemented will help determine fidelity to the program as planned, which in turn helps determine the degree to which participant outcomes can be attributed to the program design.
  • Details on program implementation gathered during the formative assessment, when combined with the evaluation of participant outcomes, can suggest hypotheses about factors that would lead to program success (positive participant outcomes) if the program is continued or replicated.
  • Details about the data collection process gathered during the formative assessment will help assess the quality and limitations of the participant outcome data, and the reliability of any conclusions based on those data.

So, for us this matrix approach is a quality-check on our evaluation design that also helps during implementation. Maybe you will find it helpful, too.

Blog: Logic Models for Curriculum Evaluation

Posted on June 7, 2017 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Rachel Tripathy, Research Associate, WestEd
Linlin Li, Senior Research Associate, WestEd

At the STEM Program at WestEd, we are in the third year of an evaluation of an innovative, hands-on STEM curriculum. Learning by Making is a two-year high school STEM course that integrates computer programming and engineering design practices with topics in earth/environmental science and biology. Experts in the areas of physics, biology, environmental science, and computer engineering at Sonoma State University (SSU) developed the curriculum by integrating computer software with custom-designed experiment set-ups and electronics to create inquiry-based lessons. Throughout this project-based course, students apply mathematics, computational thinking, and the Next Generation Science Standards (NGSS) Scientific and Engineering Design Practices to ask questions about the world around them, and seek the answers. Learning by Making is currently being implemented in rural California schools, with a specific effort being made to enroll girls and students from minority backgrounds, who are currently underrepresented in STEM fields. You can listen to students and teachers discussing the Learning by Making curriculum here.

Using a Logic Model to Drive Evaluation Design

We derived our evaluation design from the project’s logic model. A logic model is a structured description of how a specific program achieves an intended learning outcome. The purpose of the logic model is to precisely describe the mechanisms behind the program’s effects. Our approach to the Learning by Making logic model is a variant on the five-column logic format that describes the inputs, activities, outputs, outcomes, and impacts of a program (W.K. Kellogg Foundation, 2014).

Learning by Making Logic Model


Logic models are read as a series of conditionals. If the inputs exist, then the activities can occur. If the activities do occur, then the outputs should occur, and so on. Our evaluation of the Learning by Making curriculum centers on the connections indicated by the orange arrows connecting outputs to outcomes in the logic model above. These connections break down into two primary areas for evaluation: 1) teacher professional development, and 2) classroom implementation of Learning by Making. The questions that correspond to the orange arrows above can be summarized as:

  • Are the professional development (PD) opportunities and resources for the teachers increasing teacher competence in delivering a computational thinking-based STEM curriculum? Does Learning by Making PD increase teachers’ use of computational thinking and project-based instruction in the classroom?
  • Does the classroom implementation of Learning by Making increase teachers’ use of computational thinking and project-based instruction in the classroom? Does classroom implementation promote computational thinking and project-based learning? Do students show an increased interest in STEM subjects?
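To make the conditional reading of the logic model concrete, here is a minimal sketch that treats the model as a chain in which a failed link means later stages should not be expected. The stage names follow the five-column format; the status flags are hypothetical.

```python
# A minimal sketch of reading a logic model as a chain of conditionals.
STAGES = ["inputs", "activities", "outputs", "outcomes", "impacts"]

def chain_break(observed):
    """Return the first stage that was not realized, or None if every stage
    held, in which case downstream results can reasonably be expected."""
    for stage in STAGES:
        if not observed.get(stage, False):
            return stage  # the logic model "breaks" at this stage
    return None

# Example: inputs and activities in place, but outputs (e.g., effective PD
# or classroom implementation) not observed.
broken_at = chain_break({"inputs": True, "activities": True, "outputs": False})
print(f"Chain breaks at: {broken_at}; later stages are unlikely to be observed.")
```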

Without effective teacher PD or classroom implementation, the logic model “breaks,” making it unlikely that the desired outcomes will be observed. To answer our questions about outcomes related to teacher PD, we used comprehensive teacher surveys, observations, bi-monthly teacher logs, and focus groups. To answer our questions about outcomes related to classroom implementation, we used student surveys and assessments, classroom observations, teacher interviews, and student focus groups. SSU used our findings to revise both the teacher PD resources and the curriculum itself to better situate these two components to produce the outcomes intended. By deriving our evaluation design from a clear and targeted logic model, we succeeded in providing actionable feedback to SSU aimed at keeping Learning by Making on track to achieve its goals.