Archive: evaluation

Blog: Not Just an Anecdote: Systematic Analysis of Qualitative Evaluation Data

Posted on August 30, 2017 by  in Blog ()

President and Founder, Creative Research & Evaluation LLC (CR&E)

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

As a Ph.D. trained anthropologist, I spent many years learning how to shape individual stories and detailed observations into larger patterns that help us understand social and cultural aspects of human life.  Thus, I was initially taken aback when I realized that program staff or program officers often initially think of qualitative evaluation as “just anecdotal.” Even people who want “stories” in their evaluation reports can be surprised at what is revealed through a systematic analysis of qualitative data.

Here are a few tips that can help lead to credible findings using qualitative data.  Examples are drawn from my experience evaluating ATE programs.

  • Organize your materials so that you can report which experiences are shared among program participants and what perceptions are unusual or unique. This may sound simple, but it takes forethought and time to provide a clear picture of the overall range and variation of participant perceptions. For example, in analyzing two focus group discussions held with the first cohort of students in an ATE program, I looked at each transcript separately to identify the program successes and challenges raised in each focus group. Comparing major themes raised by each group, I was confident when I reported that students in the program felt well prepared, although somewhat nervous about upcoming internships. On the other hand, although there were multiple joking comments about unsatisfactory classroom dynamics, I knew these were all made by one person and not taken seriously by other participants because I had assigned each participant a label and I used these labels in the focus group transcripts.
  • Use several qualitative data sources to provide strength to a complex conclusion. In technical terms, this is called “triangulation.” Two common methods of triangulation are comparing information collected from people with different roles in a program and comparing what people say with what they are observed doing. In some cases, data sources converge and in some cases they diverge. In collecting early information about an ATE program, I learned how important this program is to industry stakeholders. In this situation, there was such a need for entry-level technicians that stakeholders, students, and program staff all mentioned ways that immediate job openings might have a short-term priority over continuing immediately into advanced levels in the same program.
  • Think about qualitative and quantitative data together in relation to each other.  Student records and participant perceptions show different things and can inform each other. For example, instructors from industry may report a cohort of students as being highly motivated and uniformly successful at the same time that institutional records show a small number of less successful students. Both pieces of the picture are important here for assessing a project’s success; one shows high level of industry enthusiasm, while the other can provide exact percentages about participant success.

Additional Resources

The following two sources are updated classics in the fields of qualitative research and evaluation.

Miles, M. B., Huberman, A. M., & Saldana, J. (2014). Qualitative data analysis: A methods sourcebook. Thousand Oaks, CA: Sage.

Patton, M. Q. (2015). Qualitative research & evaluation methods: Integrating theory and practice: The definitive text of qualitative inquiry frameworks and options (4th ed.). Thousand Oaks, CA: Sage.

Blog: Reporting Anticipated, Questionable, and Unintended Project Outcomes

Posted on August 16, 2017 by  in Blog ()

Education Administrator, Independent

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Project evaluators are aware that evaluation aims to support learning and improvement. Through a series of planned interactions, event observations, and document reviews, the evaluator is charged with reporting to the project leadership team and ultimately the project’s funding agency, informing audiences of the project’s merit. This is not to suggest that reporting should only aim to identify positive impacts and outcomes of the project. Equally, there is substantive value in informing audiences of unintended and unattained project outcomes.

Evaluation reporting should discuss aspects of the project’s outcomes, whether anticipated, questionable, or unintended. When examining project outcomes the evaluator analyzes obtained information and facilitates project leadership through reflective thinking exercises for the purpose of defining the significance of the project and summarizing why outcomes matter.

Let’s be clear, outcomes are not to be regarded as something negative. In fact, with the projects that I have evaluated over the years, outcomes have frequently served as an introspective platform informing future curriculum decisions and directions internal to the institutional funding recipient. For example, the outcomes of one STEM project that focused on renewable energy technicians provided the institution with information that prompted the development of subsequent proposals and projects targeting engineering pathways.

Discussion and reporting of project outcomes also encapsulates lessons learned and affords the opportunity for the evaluator to ask questions such as:

  • Did the project increase the presence of the target group in identified STEM programs?
  • What initiatives will be sustained during post funding to maintain an increased presence of the target group in STEM programs?
  • Did project activities contribute to the retention/completion rates of the target group in identified STEM programs?
  • Which activities seemed to have the greatest/least impact on retention/completion rates?
  • On reflection, are there activities that could have more significantly contributed to retention/completion rates that were not implemented as part of the project?
  • To what extent did the project supply regional industries with a more diverse STEM workforce?
  • What effect will this have on regional industries during post project funding?
  • Were partners identified in the proposal realistic contributors to the funded project? Did they ensure a successful implementation enabling the attainment of anticipated outcomes?
  • What was learned about the characteristics of “good” and “bad” partners?
  • What are characteristics to look for and avoid to maximize productivity with future work?

Factors influencing outcomes include, but are not limited to:

  • Institutional changes, e.g., leadership;
  • Partner constraints or changes; and
  • Project/budgetary limitations.

In some instances, it is not unusual for the proposed project to be somewhat grandiose in identifying intended outcomes. Yet, when project implementation gets underway, intended activities may be compromised by external challenges. For example, when equipment is needed to support various aspects of a project, procurement and production channels may contribute to delays in equipment acquisition, thus adversely effecting project leadership’s ability to launch planned components of the project.

As a tip, it is worthwhile for those seeking funding to pose the outcome questions at the front-end of the project – when the proposal is being developed. Doing this will assist them in conceptualizing the intellectual merit and impact of the proposed project.

Resources and Links:

Developing an Effective Evaluation Report: Setting the Course for Effective Program Evaluation. Atlanta, Georgia: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health, Division of Nutrition, Physical Activity and Obesity, 2013.

Blog: Evaluator, Researcher, Both?

Posted on June 21, 2017 by  in Blog ()

Professor, College of William & Mary

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Having served as a project evaluator and as a project researcher, it is apparent to me how critical it is to have conversations about roles at the onset of funded projects.  Early and open conversations can help avoid confusion, help eliminate missed timing to collect critical data, and highlight where differences exist for each project team role. The blurring of lines over time regarding strict differences between evaluator and researcher requires project teams, evaluators, and researchers to create new definitions for project roles, to understand scope of responsibility for each role, and to build data systems that allow for sharing information across roles.

Evaluation serves a central role in funded research projects. The lines between the role of the evaluator and that of the researcher can blur, however, because many researchers also conduct evaluations. Scriven (2003/2004) saw the role of evaluation as a means to determine “the merit, worth, or value of things” (para. #1), whereas social science research instead is “restricted to empirical (rather than evaluative) research, and bases its conclusion only on factual results—that is, observed, measured, or calculated data” (para. #2).  Consider too, how Powell (2006) posited “Evaluation research can be defined as a type of study that uses standard social research methods for evaluative purposes” (p. 102).  It is easy to see how confusion arises.

Taking a step back can shed light on the differences in these roles and ways they are now being redefined. The role of researcher shows a different project perspective, as a goal of research is the production of knowledge, whereas the role of the external evaluator is to provide an “independent” assessment of the project and its outcomes. Typically, an evaluator is seen as a judge of a project’s merits, which assumes a perspective that a “right” outcome exists. Yet inherent in the role of evaluation are the values held by the evaluator, the project team, and the stakeholders as context influences the process and who makes decisions on where to focus attention, why, and how feedback is used (Skolits, Morrow, & Burr, 2009).  Knowing more about how the project team intends to use evaluation results to help improve project outcomes requires a shared understanding of the role of the evaluator (Langfeldt & Kyvik, 2011).

Evaluators seek to understand what information is important to collect and review and how to best use the findings to relate outcomes to stakeholders (Levin-Rozalis, 2003).  Researchers instead focus on diving deep into investigating a particular issue or topic with a goal of producing new ways of understanding in these areas. In a perfect world, the roles of evaluators and researchers are distinct and separate. But, given requirements for funded projects to produce outcomes that inform the field, new knowledge is also discovered by evaluators. The swirl of roles results in evaluators publishing results of projects that informs the field, researchers leveraging their evaluator roles to publish scholarly work, and both evaluators and researchers borrowing strategies from each other to conduct their work.

The blurring of roles requires project leaders to provide clarity about evaluator and researcher team functions. The following questions can help in this process:

  • How will the evaluator and researcher share data?
  • What are the expectations for publication from the project?
  • What kinds of formative evaluation might occur that ultimately changes the project trajectory? How do these changes influence the research portion of the project?
  • How does shared meaning of terms, role, scope of work, and authority for the project team occur?

Knowing how the evaluator and researcher will work together provides an opportunity to leverage expertise in ways that move beyond the simple additive effect of both roles.  Opportunities to share information is only possible when roles are coordinated, which requires advanced planning. It is important to move beyond siloed roles and towards more collaborative models of evaluation and research within projects. Collaboration requires more time and attention to sharing information and defining roles, but the time spent on coordinating these joint efforts is worth it given the contributions to both the project and to the field.


References

Levin-Rozalis, M. (2003). Evaluation and research: Differences and similarities. The Canadian Journal of Program Evaluation, 18(2):1-31.

Powell, R. R. (2006).  Evaluation research:  An overview.  Library Trends, 55(1), 102-120.

Scriven, M. (2003/2004).  Michael Scriven on the differences between evaluation and social science research.  The Evaluation Exchange, 9(4).

Blog: Logic Models for Curriculum Evaluation

Posted on June 7, 2017 by , in Blog ()
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Rachel Tripathy Linlin Li
Research Associate, WestEd Senior Research Associate, WestEd

At the STEM Program at WestEd, we are in the third year of an evaluation of an innovative, hands-on STEM curriculum. Learning by Making is a two-year high school STEM course that integrates computer programming and engineering design practices with topics in earth/environmental science and biology. Experts in the areas of physics, biology, environmental science, and computer engineering at Sonoma State University (SSU) developed the curriculum by integrating computer software with custom-designed experiment set-ups and electronics to create inquiry-based lessons. Throughout this project-based course, students apply mathematics, computational thinking, and the Next Generation Science Standards (NGSS) Scientific and Engineering Design Practices to ask questions about the world around them, and seek the answers. Learning by Making is currently being implemented in rural California schools, with a specific effort being made to enroll girls and students from minority backgrounds, who are currently underrepresented in STEM fields. You can listen to students and teachers discussing the Learning by Making curriculum here.

Using a Logic Model to Drive Evaluation Design

We derived our evaluation design from the project’s logic model. A logic model is a structured description of how a specific program achieves an intended learning outcome. The purpose of the logic model is to precisely describe the mechanisms behind the program’s effects. Our approach to the Learning by Making logic model is a variant on the five-column logic format that describes the inputs, activities, outputs, outcomes, and impacts of a program (W.K. Kellogg Foundation, 2014).

Learning by Making Logic Model

Click image to view enlarge

Logic models are read as a series of conditionals. If the inputs exist, then the activities can occur. If the activities do occur, then the outputs should occur, and so on. Our evaluation of the Learning by Making curriculum centers on the connections indicated by the orange arrows connecting outputs to outcomes in the logic model above. These connections break down into two primary areas for evaluation: 1) teacher professional development, and 2) classroom implementation of Learning by Making. The questions that correlate with the orange arrows above can be summarized as:

  • Are the professional development (PD) opportunities and resources for the teachers increasing teacher competence in delivering a computational thinking-based STEM curriculum? Does Learning by Making PD increase teachers’ use of computational thinking and project-based instruction in the classroom?
  • Does the classroom implementation of Learning by Making increase teachers’ use of computational thinking and project-based instruction in the classroom? Does classroom implementation promote computational thinking and project-based learning? Do students show an increased interest in STEM subjects?

Without effective teacher PD or classroom implementation, the logic model “breaks,” making it unlikely that the desired outcomes will be observed. To answer our questions about outcomes related to teacher PD, we used comprehensive teacher surveys, observations, bi-monthly teacher logs, and focus groups. To answer our questions about outcomes related to classroom implementation, we used student surveys and assessments, classroom observations, teacher interviews, and student focus groups. SSU used our findings to revise both the teacher PD resources and the curriculum itself to better situate these two components to produce the outcomes intended. By deriving our evaluation design from a clear and targeted logic model, we succeeded in providing actionable feedback to SSU aimed at keeping Learning by Making on track to achieve its goals.

Blog: National Science Foundation-funded Resources to Support Your Advanced Technological Education (ATE) Project

Posted on August 3, 2016 by  in Blog ()

Doctoral Associate, EvaluATE

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Did you know that other National Science Foundation programs focused on STEM education have centers that provide services to projects? EvaluATE offers evaluation-specific resources for the Advanced Technological Education program, while some of the others are broader in scope and purpose. They offer technical support, resources, and information targeted at projects within the scope of specific NSF funding programs. A brief overview of each of these centers is provided below, highlighting evaluation-related resources. Make sure to check the sites out for further information if you see something that might be of value for your project!

The Community for Advancing Discovery Research in Education (CADRE) is a network for NSF’s Discovery Research K-12 program (DR K-12). The evaluation resource on the CADRE site is a paper on evaluation options (formative and summative), which differentiates evaluation from the research and development efforts carried out as part of project implementation.  There are other more general resources such as guidelines and tools for proposal writing, a library of reports and briefs, along with a video showcase of DR K-12 projects.

The Center for the Advancement of Informal Science Education (CAISE) has an evaluation section of its website that is searchable by type of resource (i.e., reports, assessment instruments, etc.), learning environment, and audience. For example, there are over 850 evaluation reports and 416 evaluation instruments available for review. The site hosts the Principal Investigator’s Guide: Managing Evaluation in Informal STEM Education Projects, which was developed as an initiative of the Visitor Studies Association and has sections such as working with an evaluator, developing an evaluation plan, creating evaluation tools and reporting.

The Math and Science Partnership Network (MSPnet) supports the math and science partnership network and the STEM+C (computer science) community. MSPnet has a digital library with over 2,000 articles; a search using the term “eval” found 467 listings, dating back to 1987. There is a toolbox with materials such as assessments, evaluation protocols and form letters. Other resources in the MSPnet library include articles and reports related to teaching and learning, professional development, and higher education.

The Center for Advancing Research and Communication (ARC) supports the NSF Research and Evaluation on Education in Science and Engineering (REESE) program through technical assistance to principal investigators. An evaluation-specific resource includes material from a workshop on implementation evaluation (also known as process evaluation).

The STEM Learning and Research Center (STELAR) provides technical support for the Innovative Technology Experiences for Students and Teachers (ITEST) program. Its website includes links to a variety of instruments, such as the Grit Scale, which can be used to assess students’ resilience for learning, which could be part of a larger evaluation plan.

Blog: The Shared Task of Evaluation

Posted on November 18, 2015 by  in Blog (, )

Independent Educational Program Evaluator

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Evaluation was an important strand at the recent ATE meeting in Washington, DC. As I reflected on my own practice as an external evaluator and listened to the comments of my peers, I was impressed once again with how dependent evaluation is on a shared effort by project stakeholders. Ironically, the more external an evaluator is to a project, the more important it is to collaborate closely with PIs, program staff, and participating institutions. Many assessment and data collection activities that are technically part of the outside evaluation are logistically and financially dependent on the internal workings of the project.

This has implications for the scope of work for evaluation and for the evaluation budget. A task might appear in the project proposal as, “survey all participants,” and it would likely be part of the evaluator’s scope of work. But in practice, tasks such as deciding what to ask on the survey, reaching the participants, and following up with nonresponders are likely to require work by the PIs or their assistants.

Occasionally you hear certain percentages cited as appropriate levels of effort for evaluation. Whatever overall portion evaluation plays in a project, my approach is to think of that portion as the sum of my efforts and those of my clients. This has several advantages:

  • During planning, it immediately highlights data that might be difficult to collect. It is much easier to come up with a solution or an alternative in advance and avoid a big gap in the evidence record.
  • It makes clear who is responsible for what activities and avoids embarrassing confrontations along the lines of, “I thought you were going to do that.”
  • It keeps innocents on the project and evaluation staffs from being stuck with long (and possibly uncompensated) hours trying to carry out tasks outside their expected job descriptions.
  • It allows for more accurate budgeting. If I know that a particular study involves substantial clerical support for pulling records from school databases, I can reduce my external evaluation fee, while at the same time warning the PI to anticipate those internal evaluation costs.

The simplest way to assure that these dependencies are identified is to consider them during the initial logic modelling of the project. If an input is professional development, and its output is instructors who use the professional development, and the evidence for the output is use of project resources, who will have to be involved in collecting that evidence? Even if the evaluator proposes to visit every instructor and watch them in practice, it is likely that those visits will have to be coordinated by someone close to the instructional calendar and daily schedule. Specifying and fairly sharing those tasks produces more data, better data, and happier working relationships.

Blog: Checklists for Evaluating K-12 and Credentialing Testing Programs

Posted on October 14, 2015 by , in Blog ()
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Kosh Cizek
Audra Kosh
Doctoral Student
Learning Sciences and Psychological Studies
University of North Carolina, Chapel Hill
Gregory Cizek
Professor
Educational Measurement and Evaluation
University of North Carolina, Chapel Hill

In 2012, we created two checklists for evaluators to use as a tool for evaluating K-12 and credentialing assessment programs. The purpose of these checklists is to assist evaluators in thoroughly reviewing testing programs by distilling the best practices for testing outlined by various professional associations, including the Standards for Educational and Psychological Testing, the U.S. Department of Education’s Standards and Assessment Peer Review Guidance, the Standards for the Accreditation of Certification Programs, the Code of Fair Testing Practices in Education, and the Rights and Responsibilities of Test Takers.

The checklists were developed to allow evaluation of five aspects of testing: 1) Test Development, 2) Test Administration, 3) Reliability Evidence, 4) Validity Evidence, and 5) Scoring and Reporting. A separate checklist was developed for each area; each of the checklists presents detailed indicators of quality testing programs that evaluators can check off as observed (O), not observed (N), or not applicable (NA) as they conduct evaluations. Three examples of checklist items are included below (one each from the Test Development, Test Administration, and Scoring and Reporting checklists).

The checklists are intended to be used by those wishing to evaluate K-12 or credentialing assessment programs against consensus criteria regarding quality standards for such programs. One of the main sources informing development of the original checklists was the guidance provided in the then-current edition of the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999). However, much has changed in testing since the publication of the 1999 Standards, and the Standards were revised in 2014 to address emerging methods and concerns related to K-12 and credentialing assessment programs. Consequently, revised checklists have been produced to reflect the new Standards.

The latest edition of the Standards, as compared to the 1999 edition, pays greater attention to testing diverse populations and the role of new technologies in testing. For example, the following three key revisions to the Standards are reflected in the new checklists:

  1. Validity and reliability evidence should be produced and documented for subgroups of test takers. Testing programs should collect validity evidence for various subgroups of test takers from different socioeconomic, linguistic, and cultural backgrounds, as opposed to aggregating validity evidence for an entire sample of test takers. A focus on validity evidence within unique subgroups helps ensure that test interpretations remain valid for all members of the intended testing population.
  2. Tests should be administered in an appropriate language. Given that test takers can come from linguistically diverse backgrounds, evaluators should check that tests are administered in the most appropriate language for the intended population and intended purpose of the test. Interpreters, if used, should be fluent in both the language and content of the test.
  3. Automated scoring methods should be described. Current tests increasingly rely on automated scoring methods to score constructed-response items previously scored by human raters. Testing programs should document how automated scoring algorithms are used and how scores obtained from such algorithms should be interpreted.

Although these three new themes in the Standards illustrate the breadth of coverage of the checklists, they provide only a sample of the changes embodied in the full version of the revised checklists, which contain approximately 100 specific practices that testing programs should follow distilled from contemporary professional standards for assessment programs. The revised checklists are particularly helpful in that they provide users with a single-source compilation of the most up-to-date and broadly endorsed elements of defensible testing practice.  Downloadable copies of the revised checklists for K-12 and credentialing assessment programs can be found at (bit.ly/checklist-assessment).