This checklist provides information on what should be included in evaluation plans for proposals to the National Science Foundation’s (NSF) Advanced Technological Education (ATE) program. Grant seekers should carefully read the most recent ATE program solicitation (ATE Program Solicitation) for details about the program and proposal submission requirements.
ATE Evaluation Plan Checklist Field Test
EvaluATE invites individuals who are developing proposals for the National Science Foundation’s Advanced Technological Education (ATE) program to field test our updated ATE Evaluation Plan Checklist and provide feedback for improvement.
The field test version of the checklist is available below.
How to participate in the field test:
(1) Use the checklist while developing the evaluation plan for an ATE proposal.
(2) After you have completed your proposal, complete the brief feedback form.
After a few questions about the context of your work, this form will prompt you to answer four open-ended questions about your experience with the checklist:
• What was especially helpful about this checklist?
• What did you find confusing or especially difficult to apply?
• What would you add, change, or remove?
• If using this checklist affected the contents of your evaluation plan or your process for developing it, please describe how it influenced you.
There is a dearth of research on evaluation practice, particularly of the sort that practitioners can use to improve their own work, according to Nick Smith in “Using Action Design Research to Research and Develop Evaluation Practice,” a chapter in a forthcoming issue of New Directions for Evaluation.¹,²
Dr. Smith describes action design research as a “strategy for developing and testing alternative evaluation practices within a case-based, practical reasoning view of evaluation practice.” This approach is grounded in the understanding that evaluation is not a “generalizable intervention to be evaluated, but a collection of performances to be investigated” (p. 5). Importantly, action design research is conducted in real time, in authentic evaluation contexts. Its purpose is not only to better understand evaluation practices, but also to develop effective solutions to common challenges.
We at EvaluATE are always on the lookout for opportunities to test out ideas for improving evaluation practice as well as our own work in providing evaluation education. A chronic problem for many evaluators is low response rates. Since 2009, EvaluATE has presented 4 to 6 webinars per year, each concluding with a brief feedback survey. Given that these webinars are about evaluation, a logical conclusion is that participants are predisposed to evaluation and will readily complete the surveys, right? Not really. Our response rates for these surveys range from 34 to 96 percent, with an average of 60 percent. I believe we should consistently be in the 90 to 100 percent range.
So, in the spirit of action design research on evaluation, I decided to try a little experiment. At our last webinar, before presenting any content, I showed a slide with the following statement beside an empty checkbox: “I agree to complete the <5-minute feedback survey at the end of this webinar.” I noted the importance of evaluation for improving our center’s work and for our accountability to the National Science Foundation. We couldn’t tell exactly how many people checked the box, but it was clear that several did. I was optimistic that asking for this public (albeit anonymous) commitment at the start of the webinar would boost response rates substantially.
The result: 72 percent completed the survey. Pretty good, but well short of my standard for excellence. It was our eighth-highest response rate ever and the highest for the past year, but four of the five webinar surveys in 2013-14 had response rates between 65 and 73 percent. As is so often the case in research, the initial results are inconclusive, and we will have to investigate further: How are webinar response rates affected by audience composition, perceptions of the webinar’s quality, or asking for participation multiple times? As Nick Smith pointed out in his review of a draft of this blog: “What you are really after is not just a high response rate, but a greater understanding of what affects webinar evaluation response rates. That kind of insight turns your efforts from local problem solving to generalizable knowledge – from Action Design Problem Solving to Action Design Research.”
I am sharing this experience not because I found the sure-fire way to get people to respond to webinar evaluation surveys. Rather, I am sharing it as a lesson learned and to invite you to conduct your own action design research on evaluation and tell us about it here on the EvaluATE blog.
1 Disclosure: Nick Smith is the chairperson of EvaluATE’s National Visiting Committee, an advisory panel that reports to the National Science Foundation.
2 Smith, N. L. (in press). Using action design research to research and develop evaluation practice. In P. R. Brandon (Ed.), Recent developments in research on evaluation. New Directions for Evaluation.
Learning Sciences and Psychological Studies
University of North Carolina, Chapel Hill
Educational Measurement and Evaluation
University of North Carolina, Chapel Hill
In 2012, we created two checklists that evaluators can use as tools for evaluating K-12 and credentialing assessment programs. The purpose of these checklists is to help evaluators thoroughly review testing programs by distilling the best practices for testing set out in guidance from various professional associations and agencies, including the Standards for Educational and Psychological Testing, the U.S. Department of Education’s Standards and Assessment Peer Review Guidance, the Standards for the Accreditation of Certification Programs, the Code of Fair Testing Practices in Education, and the Rights and Responsibilities of Test Takers.
The checklists were developed to allow evaluation of five aspects of testing: 1) Test Development, 2) Test Administration, 3) Reliability Evidence, 4) Validity Evidence, and 5) Scoring and Reporting. A separate checklist was developed for each area; each of the checklists presents detailed indicators of quality testing programs that evaluators can check off as observed (O), not observed (N), or not applicable (NA) as they conduct evaluations. Three examples of checklist items are included below (one each from the Test Development, Test Administration, and Scoring and Reporting checklists).
The checklists are intended to be used by those wishing to evaluate K-12 or credentialing assessment programs against consensus criteria regarding quality standards for such programs. One of the main sources informing development of the original checklists was the guidance provided in the then-current edition of the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999). However, much has changed in testing since the publication of the 1999 Standards, and the Standards were revised in 2014 to address emerging methods and concerns related to K-12 and credentialing assessment programs. Consequently, revised checklists have been produced to reflect the new Standards.
The latest edition of the Standards, as compared to the 1999 edition, pays greater attention to testing diverse populations and the role of new technologies in testing. For example, the following three key revisions to the Standards are reflected in the new checklists:
Validity and reliability evidence should be produced and documented for subgroups of test takers. Testing programs should collect validity evidence for various subgroups of test takers from different socioeconomic, linguistic, and cultural backgrounds, as opposed to aggregating validity evidence for an entire sample of test takers. A focus on validity evidence within unique subgroups helps ensure that test interpretations remain valid for all members of the intended testing population.
Tests should be administered in an appropriate language. Given that test takers can come from linguistically diverse backgrounds, evaluators should check that tests are administered in the most appropriate language for the intended population and intended purpose of the test. Interpreters, if used, should be fluent in both the language and content of the test.
Automated scoring methods should be described. Current tests increasingly rely on automated scoring methods to score constructed-response items previously scored by human raters. Testing programs should document how automated scoring algorithms are used and how scores obtained from such algorithms should be interpreted.
Although these three new themes in the Standards illustrate the breadth of coverage of the checklists, they are only a sample of the changes embodied in the full revised checklists, which contain approximately 100 specific practices, distilled from contemporary professional standards for assessment programs, that testing programs should follow. The revised checklists are particularly helpful because they give users a single-source compilation of the most up-to-date and broadly endorsed elements of defensible testing practice. Downloadable copies of the revised checklists for K-12 and credentialing assessment programs are available at bit.ly/checklist-assessment.
DRAFT: This checklist is designed to help project staff create a project resume. A project resume is a list of all key activities or accomplishments of a project. It can easily be created in a word processor and then uploaded to the project’s website. Make the resume easy to find on the website, for example in the “About” section. For a more dynamic resume, include links to supporting documents, staff biographies, or personal Web pages; this will allow users to quickly locate items referenced in the resume. Tracking all activities over the life of a project will make it easier to complete annual reports, apply for future funding, and respond to information requests. For an example, see our own project resume (About > EvaluATE’s Resume).