
Blog: Checklists for Evaluating K-12 and Credentialing Testing Programs

Posted on October 14, 2015, in Blog
Audra Kosh
Doctoral Student
Learning Sciences and Psychological Studies
University of North Carolina, Chapel Hill
Gregory Cizek
Professor
Educational Measurement and Evaluation
University of North Carolina, Chapel Hill

In 2012, we created two checklists that evaluators can use to evaluate K-12 and credentialing assessment programs. The checklists are designed to help evaluators review testing programs thoroughly by distilling the best practices for testing set out in documents from various professional organizations and agencies, including the Standards for Educational and Psychological Testing, the U.S. Department of Education’s Standards and Assessment Peer Review Guidance, the Standards for the Accreditation of Certification Programs, the Code of Fair Testing Practices in Education, and the Rights and Responsibilities of Test Takers.

The checklists were developed to support evaluation of five aspects of testing: 1) Test Development, 2) Test Administration, 3) Reliability Evidence, 4) Validity Evidence, and 5) Scoring and Reporting. A separate checklist was developed for each area; each checklist presents detailed indicators of a high-quality testing program, which evaluators can mark as observed (O), not observed (N), or not applicable (NA) as they conduct an evaluation. Three examples of checklist items are included below (one each from the Test Development, Test Administration, and Scoring and Reporting checklists).

The checklists are intended for anyone who wishes to evaluate K-12 or credentialing assessment programs against consensus criteria for the quality of such programs. One of the main sources informing the original checklists was the guidance in the then-current edition of the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999). However, much has changed in testing since the 1999 Standards were published, and the Standards were revised in 2014 to address emerging methods and concerns related to K-12 and credentialing assessment programs. Consequently, we have produced revised checklists that reflect the new Standards.

The latest edition of the Standards, as compared to the 1999 edition, pays greater attention to testing diverse populations and the role of new technologies in testing. For example, the following three key revisions to the Standards are reflected in the new checklists:

  1. Validity and reliability evidence should be produced and documented for subgroups of test takers. Testing programs should collect validity evidence for subgroups of test takers from different socioeconomic, linguistic, and cultural backgrounds, instead of aggregating validity evidence across the entire sample of test takers. Focusing on validity evidence within distinct subgroups helps ensure that test score interpretations remain valid for all members of the intended testing population.
  2. Tests should be administered in an appropriate language. Given that test takers can come from linguistically diverse backgrounds, evaluators should check that tests are administered in the most appropriate language for the intended population and intended purpose of the test. Interpreters, if used, should be fluent in both the language and content of the test.
  3. Automated scoring methods should be described. Current tests increasingly rely on automated scoring methods to score constructed-response items previously scored by human raters. Testing programs should document how automated scoring algorithms are used and how scores obtained from such algorithms should be interpreted.

Although these three new themes in the Standards illustrate the breadth of the checklists, they are only a sample of the changes embodied in the full revised checklists, which contain approximately 100 specific practices, distilled from contemporary professional standards for assessment programs, that testing programs should follow. The revised checklists are particularly helpful because they give users a single-source compilation of the most up-to-date and broadly endorsed elements of defensible testing practice. Downloadable copies of the revised checklists for K-12 and credentialing assessment programs are available at bit.ly/checklist-assessment.