Archive: assessment

Blog: Part 2: Using Embedded Assessment to Understand Science Skills

Posted on January 31, 2018 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Rachel Becker-Klein
Senior Research Associate
PEER Associates
Karen Peterman
President
Karen Peterman Consulting
Cathlyn Stylinski
Senior Agent
University of Maryland Center for Environmental Science

In our last EvaluATE blog, we defined embedded assessments (EAs) and described the benefits and challenges of using EAs to measure and understand science skills. Since then, our team has been testing the development and use of EAs for three citizen science projects through our National Science Foundation (NSF) project, Embedded Assessment for Citizen Science. Below we describe our journey and findings, including the creation and testing of an EA development model.

Our project first worked to test a process model for the development of EAs that could be both reliable and valid (Peterman, Becker-Klein, Stylinski, & Grack-Nelson, in press). Stage 1 was about articulating program goals and determining evidence for documenting those goals. In Stage 2, we collected both content validity evidence (the extent to which a measure was related to the identified goal) and response process validity evidence (how understandable the task was to participants). Finally, the third stage involved field-testing the EA. The exploratory process, with stages and associated products, is depicted in the figure below.

We applied our EA development approach to three citizen-science case study sites and were successful at creating an EA for each. For instance, for Nature’s Notebook (an online monitoring program where naturalists record observations of plants and animals to generate long-term datasets), we worked with program staff to create an EA focused on paying close attention. This EA was developed for participants to use in the in-person workshop, where they practiced observation skills by collecting data about flora and fauna at the training site. Participants completed a Journal and Observation Worksheet as part of their training, and the EA process standardized the worksheet and added a rubric for assessing how well participants’ responses reflected their ability to pay close attention to the flora and fauna around them.

[Figure: Embedded Assessment Development Process]

Lessons Learned:

  • The EA development process was flexible enough to accommodate the needs of each case study, generating EAs that covered a range of methods and scientific inquiry skills.
  • Both the SMART goals and Measure Design Template (see Stage 1 in the figure above) proved useful as a way to guide the articulation of project goals and activities, and the identification of meaningful ways to document evidence of inquiry learning.
  • The response process validity component (from Stage 2) resulted in key changes to each EA, such as changes to the assessment itself (e.g., streamlining the activities) as well as the scoring procedures.

Opportunities for using EAs:

  • Modifying existing activities. All three of the case studies had project activities that we could build on to create an EA. We were able to work closely with program staff to modify these activities to increase their rigor and standardization.
  • Formative use of EAs. Since a true EA is indistinguishable from the program itself, the process of developing and using an EA often resulted in strengthened project activities.

Challenges of using EAs:

  • Fine line between EA and program activities. If an EA is truly indistinguishable from the project activity itself, it can be difficult for project leaders and evaluators to determine where the program ends and the assessment begins. This ambiguity can create tension in cases where volunteers are not performing scientific inquiry skills as expected, making it difficult to disentangle whether the results were due to shortcomings of the program or a failing of the EA designed to evaluate the program.
  • Group versus individual assessments. Another set of challenges for administering EAs relates to the group-based implementation of many informal science projects. A single group score may reflect the contributions of the most skilled or most active members rather than each individual’s abilities, making the results biased and difficult to interpret.

Though the results of this study are promising, we are at the earliest stages of understanding how to capture authentic evidence to document learning related to science skills. The use of a common EA development process, with common products, has the potential to generate new research to address the challenges of using EAs to measure inquiry learning in the context of citizen science projects and beyond. We will continue to explore these issues in our new NSF grant, Streamlining Embedded Assessment for Citizen Science (DRL #1713424).

Acknowledgments:

We would like to thank our case study partners: LoriAnne Barnett from Nature’s Notebook; Chris Goforth, Tanessa Schulte, and Julie Hall from Dragonfly Detectives; and Erick Anderson from the Young Scientists Club. This work was supported by the National Science Foundation under grant number DRL#1422099.

Resource:

Peterman, K., Becker-Klein, R., Stylinski, C., & Grack-Nelson, A. (2017). Exploring embedded assessment to document scientific inquiry skills within citizen science. In C. Herodotou, M. Sharples, & E. Scanlon (Eds.), Citizen inquiry: A fusion of citizen science and inquiry learning (pp. 63-82). New York, NY: Routledge.

Blog: Addressing Challenges in Evaluating ATE Projects Targeting Outcomes for Educators

Posted on November 21, 2017 in Blog

Kirk Knestis
CEO, Hezel Associates

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Kirk Knestis—CEO of Hezel Associates and a former career and technology educator and professional development provider—here to share some strategies for addressing challenges unique to evaluating Advanced Technological Education (ATE) projects that target outcomes for teachers and college faculty.

In addition to funding projects that directly train future technicians, the National Science Foundation (NSF) ATE program funds initiatives to improve the abilities of grade 7-12 teachers and college faculty—the expectation being that improving their practice will directly benefit technical education. ATE tracks focusing on professional development (PD), capacity building for faculty, and technological education teacher preparation all rely implicitly on theories of action (typically illustrated by a logic model) that presume outcomes for educators will translate into outcomes for student technicians. This assumption can present challenges to evaluators trying to understand how such efforts are working. Reference this generic logic model for discussion purposes:

Setting aside project activities acting directly on students, any strategy aimed at educators (e.g., PD workshops, faculty mentoring, or preservice teacher training) must leave them fully equipped with the dispositions, knowledge, and skills necessary to implement effective instruction with students. Educators must then turn those outcomes into actions to realize similar types of outcomes for their learners. Students’ action outcomes (e.g., entering, persisting in, and completing training programs) depend, in turn, on their having the dispositions, knowledge, and skills educators are charged with furthering. If educators fail to learn what they should, or do not activate those abilities, students are less likely to succeed. So what are the implications—challenges and possible solutions—of this for NSF ATE evaluations?

  • EDUCATOR OUTCOMES ARE OFTEN NOT WELL EXPLICATED. Work with program designers to force them to define the new dispositions, understandings, and abilities that technical educators require to be effective. Facilitate discussion about all three outcome categories to lessen the chance of missing something. Press until outcomes are defined in terms of persistent changes educators will take away from project activities, not what they will do during them.
  • EDUCATORS ARE DIFFICULT TO TEST. To truly understand if an ATE project is making a difference in instruction, it is necessary to assess whether the precursor outcomes for educators are realized. Dispositions (attitudes) are easy to assess with self-report questionnaires, but measuring real knowledge and skills requires proper assessments—ideally, performance assessments. Work with project staff to “bake” assessments into project strategies, making them more authentic and less intrusive. Strive for more than self-report measures of increased abilities.
  • INSTRUCTIONAL PRACTICES ARE DIFFICULT AND EXPENSIVE TO ASSESS. The only way to truly evaluate instruction is to see it, assessing pedagogy, content, and quality with rubrics or checklists. Consider replacing expensive on-site visits with the collection of digital videos or real-time, web-based telepresence.

With clear definitions of outcomes and collaboration with ATE project designers, evaluators can assess whether technician training educators are gaining the necessary dispositions, knowledge, and skills, and if they are implementing those practices with students. Assessing students is the next challenge, but until we can determine if educator outcomes are being achieved, we cannot honestly say that educator-improvement efforts made any difference.

Blog: Thinking Critically about Critical Thinking Assessment

Posted on October 31, 2017 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Vera Beletzan
Senior Special Advisor, Essential Skills
Humber College
Paula Gouveia
Dean, School of Liberal Arts and Sciences
Humber College

Humber College, as part of a learning outcomes assessment consortium funded by the Higher Education Quality Council of Ontario (HEQCO), has developed an assessment tool to measure student gains in critical thinking (CT) as expressed through written communication (WC).

In Phase 1 of this project, a cross-disciplinary team of faculty and staff researched and developed a tool to assess students’ CT skills through written coursework. The tool was tested for usability by a variety of faculty and in a variety of learning contexts. Based on this pilot, we revised the tool to focus on two CT dimensions: comprehension and integration of writer’s ideas, within which are six variables: interpretation, analysis, evaluation, inference, explanation, and self-regulation.

In Phase 2, our key questions were:

  1. What is the validity and reliability of the assessment tool?
  2. Where do students experience greater levels of CT skill achievement?
  3. Are students making gains in learning CT skills over time?
  4. What is the usability and scalability of the tool?

To answer the first question, we examined the inter-rater reliability of the tool and compared CTWC assessment scores with students’ final grades. We conducted a cross-sectional analysis by comparing diverse CT and WC learning experiences in different contexts, namely our mandatory semester I and II cross-college writing courses, where CTWC skills are taught explicitly and reinforced as course learning outcomes; vocationally oriented courses in police foundations, where the skills are implicitly embedded as deemed essential by industry; and a critical thinking course in our general arts and sciences programs, where CT is taught as content knowledge.

We also performed a longitudinal analysis by assessing CTWC gains in a cohort of students across two semesters in their mandatory writing courses.

Overall, our tests showed positive results for reliability and validity. Our cross-sectional analysis showed the greatest CT gains in courses where the skill is explicitly taught. Our longitudinal analysis showed only modest gains, suggesting that a two-semester span may be too short for significant improvement to occur.
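
As a simple illustration of the kind of criterion-related evidence described above (comparing assessment scores with final grades), the sketch below computes a Pearson correlation on hypothetical data. It is not the project's actual analysis; the scores and grades are invented.

```python
# Minimal sketch: relating rubric scores to final grades as one piece of
# criterion-related validity evidence. All data below are invented.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

ctwc_scores = [2.0, 3.5, 1.5, 4.0, 2.5, 3.0]   # hypothetical rubric scores (0-4)
final_grades = [68, 85, 60, 92, 74, 80]        # hypothetical final grades (0-100)

print(f"r = {pearson_r(ctwc_scores, final_grades):.2f}")
```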

In terms of usability, faculty agreed that the revised tool was straightforward and easy to apply. However, there was less agreement on the tool’s meaningfulness to students, indicating that further research needs to include student feedback.

Lessons learned:

  • Build faculty buy-in at the outset and recognize workload issues
  • Ensure project team members are qualified
  • For scalability, align project with other institutional priorities

Recommendations:

  • Teach CT explicitly and consistently, as a skill, and over time
  • Strategically position courses where CT is taught explicitly throughout a program for maximum reinforcement
  • Assess and provide feedback on students’ skills at regular intervals
  • Implement faculty training to build a common understanding of the importance of essential skills and their assessment
  • For the tool to be meaningful, students must understand which skills are being assessed and why

Our project will inform Humber’s new Essential Skills Strategy, which includes the development of an institutional learning outcomes framework and assessment process.

A detailed report, including our assessment tool, will be available through HEQCO in the near future. For further information, please contact the authors: vera.beletzan@humber.ca  or paula.gouveia@humber.ca

Blog: National Science Foundation-funded Resources to Support Your Advanced Technological Education (ATE) Project

Posted on August 3, 2016 in Blog

Doctoral Associate, EvaluATE

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Did you know that other National Science Foundation programs focused on STEM education have centers that provide services to projects? EvaluATE offers evaluation-specific resources for the Advanced Technological Education program, while some of these other centers are broader in scope and purpose. They offer technical support, resources, and information targeted at projects within the scope of specific NSF funding programs. A brief overview of each of these centers is provided below, highlighting evaluation-related resources. Make sure to check out the sites for further information if you see something that might be of value for your project!

The Community for Advancing Discovery Research in Education (CADRE) is a network for NSF’s Discovery Research K-12 program (DR K-12). The evaluation resource on the CADRE site is a paper on evaluation options (formative and summative), which differentiates evaluation from the research and development efforts carried out as part of project implementation. There are other, more general resources, such as guidelines and tools for proposal writing, a library of reports and briefs, and a video showcase of DR K-12 projects.

The Center for the Advancement of Informal Science Education (CAISE) has an evaluation section of its website that is searchable by type of resource (e.g., reports, assessment instruments), learning environment, and audience. For example, there are over 850 evaluation reports and 416 evaluation instruments available for review. The site hosts the Principal Investigator’s Guide: Managing Evaluation in Informal STEM Education Projects, which was developed as an initiative of the Visitor Studies Association and has sections on working with an evaluator, developing an evaluation plan, creating evaluation tools, and reporting.

The Math and Science Partnership Network (MSPnet) supports NSF’s Math and Science Partnership (MSP) and STEM+C (STEM plus computing) communities. MSPnet has a digital library with over 2,000 articles; a search using the term “eval” found 467 listings, dating back to 1987. There is a toolbox with materials such as assessments, evaluation protocols, and form letters. Other resources in the MSPnet library include articles and reports related to teaching and learning, professional development, and higher education.

The Center for Advancing Research and Communication (ARC) supports the NSF Research and Evaluation on Education in Science and Engineering (REESE) program through technical assistance to principal investigators. Evaluation-specific resources include materials from a workshop on implementation evaluation (also known as process evaluation).

The STEM Learning and Research Center (STELAR) provides technical support for the Innovative Technology Experiences for Students and Teachers (ITEST) program. Its website includes links to a variety of instruments, such as the Grit Scale, which can be used to assess students’ perseverance in learning and could be incorporated into a larger evaluation plan.

Blog: Student Learning Assessments: Issues of Validity and Reliability

Posted on June 22, 2016 in Blog

Senior Educational Researcher, SRI International

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In my last post, I talked about the difference between program evaluation and student assessment. I also touched on using existing assessments if they are available and appropriate, and constructing new assessments if they are not. Of course, a new assessment needs to meet test quality standards; otherwise, it will not measure what you need measured for your evaluation. Test quality has to do with validity and reliability.

When a test is valid, it means that when a student responds with a wrong answer, it is reasonable to conclude that they did so because they did not learn what they were supposed to have learned. There are all kinds of impediments to an assessment’s validity. For example, if in a science class you are asking students a question aimed at determining whether they understand the difference between igneous and sedimentary rocks, yet you know that some of them do not understand English, you wouldn’t want to ask them the question in English. In testing jargon, what you are introducing in such a situation is “construct irrelevant variance.” In this case, the variance in results may be due as much to whether they know English (the construct irrelevant part) as to whether they know the construct itself, the difference between the rock types. Hence, these results would not help you determine whether your innovation is helping them learn the science better.

Reliability has to do with test design, administration, and scoring. Examples of unreliable tests are those that are too long, introducing test-taking fatigue that interferes with their being reliable measures of student learning. Another common example of unreliability is when the scoring directions or rubric are not clear enough about how to judge the quality of an answer. This type of problem often results in inconsistent scoring, otherwise known as low interrater reliability.
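
To make interrater reliability concrete, here is a minimal sketch that computes percent agreement and Cohen's kappa for two hypothetical raters scoring the same set of responses on a three-level rubric. The ratings and rubric levels are invented for illustration.

```python
# Minimal sketch: two raters score the same responses on a 3-level rubric.
# Percent agreement and Cohen's kappa are common interrater reliability indices.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if raters scored independently at their observed rates
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

rater_1 = ["low", "med", "high", "med", "low", "high", "med", "low"]
rater_2 = ["low", "med", "med",  "med", "low", "high", "high", "low"]

agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
print(f"Percent agreement: {agreement:.0%}")
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")
```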

To summarize, a student learning assessment can be very important to your evaluation if a goal of your project is to directly impact student learning. Then you have to make some decisions about whether you can use existing assessments or develop new ones, and if you make new ones, they need to meet technical quality standards of validity and reliability. For projects not directly aiming at improving student learning, an assessment may actually be inappropriate in the evaluation because the tie between the project activities and the student learning may be too loose. In other words, the learning outcomes may be mediated by other factors that are too far beyond your control to render the learning outcomes useful for the evaluation.

Blog: Using Learning Assessments in Evaluations

Posted on June 8, 2016 in Blog

Senior Educational Researcher, SRI International

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

If you need to evaluate your project but get confused about whether there should be a student assessment component, it’s important to understand the difference between project evaluation and student assessment. Both involve rendering judgments about something, and the two terms are often used interchangeably. In the world of grant-funded education projects, however, they have overlapping yet quite different meanings. When you do a project evaluation, you are looking at the extent to which the project is meeting its goals and achieving its intended outcomes. When you are assessing, you are looking at student progress in meeting learning goals. The most commonly used instrument of assessment is a test, but there are other mechanisms for assessing learning as well, such as student reports, presentations, or journals.

Not all project evaluations require student assessments, and not all assessments are components of project evaluations. For example, the goal of the project may be to restructure an academic program, introduce some technology to the classroom, or get students to persist through college. Of course, in the end, all projects in education aim to improve learning. Yet, by itself, an individual project may not aspire to directly influence learning, but rather to influence it through a related effort. Conversely, not all assessments are conducted as components of project evaluations. Rather, they are most frequently used to determine the academic progress of individual students.

If you are going to put a student assessment component in your evaluation, answer these questions:

  1. How much assessment data will you need to properly generalize from your results about how well your project is faring? For example, how many students are impacted by your program? Do you need to assess them all, or can you limit your assessment administration to a representative sample? (A simple sample-size sketch follows this list.)
  2. Should you administer the assessment early enough to determine if the project needs to be modified midstream? This would be called a formative assessment, as opposed to a summative assessment, which you would do at the end of a project, after you have fully implemented your innovation with the students.
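
For the sampling question in item 1, one rough starting point is the standard sample-size formula for estimating a proportion, with a finite-population correction. The sketch below is illustrative only; the margin of error, confidence level, and program size are assumed values, and a real evaluation design may call for a different approach.

```python
# Minimal sketch: how many students to assess out of a program of N,
# using the sample-size formula for a proportion with a finite-population
# correction. All inputs here are assumptions for illustration.
from math import ceil

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population estimate
    n = n0 / (1 + (n0 - 1) / population)        # finite-population correction
    return ceil(n)

print(sample_size(population=250))   # e.g., roughly 152 of 250 students
```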

Think also about what would be an appropriate assessment instrument. Maybe you could simply use a test that the school is already using with the students. This would make sense, for example, if your goal is to provide some new curricular innovations in a particular course that the students are already taking. If your project fits into this category, an existing test is attractive because it has likely already been validated, meaning it was piloted and subsequently modified as needed to ensure that it truly measures what it is designed to measure.

An existing assessment instrument may not be appropriate for you, however. Perhaps your innovation is introducing new learnings that those tests are not designed to measure. For example, it may be facilitating students’ learning of new skills, such as using new mobile technologies to collect field data. In this situation, you would want your project’s goal statements to be clear about whether the intention of your project is to provide an improved pathway to already-taught knowledge or skills, a pathway to new learnings entirely, or both. New learnings would require a new assessment. In my next post, I’ll talk about validity and reliability issues to address when developing assessments.

Blog: Checklists for Evaluating K-12 and Credentialing Testing Programs

Posted on October 14, 2015 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Audra Kosh
Doctoral Student
Learning Sciences and Psychological Studies
University of North Carolina, Chapel Hill
Gregory Cizek
Professor
Educational Measurement and Evaluation
University of North Carolina, Chapel Hill

In 2012, we created two checklists for evaluators to use as a tool for evaluating K-12 and credentialing assessment programs. The purpose of these checklists is to assist evaluators in thoroughly reviewing testing programs by distilling the best practices for testing outlined by various professional associations, including the Standards for Educational and Psychological Testing, the U.S. Department of Education’s Standards and Assessment Peer Review Guidance, the Standards for the Accreditation of Certification Programs, the Code of Fair Testing Practices in Education, and the Rights and Responsibilities of Test Takers.

The checklists were developed to allow evaluation of five aspects of testing: 1) Test Development, 2) Test Administration, 3) Reliability Evidence, 4) Validity Evidence, and 5) Scoring and Reporting. A separate checklist was developed for each area; each of the checklists presents detailed indicators of quality testing programs that evaluators can check off as observed (O), not observed (N), or not applicable (NA) as they conduct evaluations. Three examples of checklist items are included below (one each from the Test Development, Test Administration, and Scoring and Reporting checklists).

The checklists are intended to be used by those wishing to evaluate K-12 or credentialing assessment programs against consensus criteria regarding quality standards for such programs. One of the main sources informing development of the original checklists was the guidance provided in the then-current edition of the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999). However, much has changed in testing since the publication of the 1999 Standards, and the Standards were revised in 2014 to address emerging methods and concerns related to K-12 and credentialing assessment programs. Consequently, revised checklists have been produced to reflect the new Standards.

The latest edition of the Standards, as compared to the 1999 edition, pays greater attention to testing diverse populations and the role of new technologies in testing. For example, the following three key revisions to the Standards are reflected in the new checklists:

  1. Validity and reliability evidence should be produced and documented for subgroups of test takers. Testing programs should collect validity evidence for various subgroups of test takers from different socioeconomic, linguistic, and cultural backgrounds, as opposed to aggregating validity evidence for an entire sample of test takers. A focus on validity evidence within unique subgroups helps ensure that test interpretations remain valid for all members of the intended testing population. (A brief sketch of this idea follows this list.)
  2. Tests should be administered in an appropriate language. Given that test takers can come from linguistically diverse backgrounds, evaluators should check that tests are administered in the most appropriate language for the intended population and intended purpose of the test. Interpreters, if used, should be fluent in both the language and content of the test.
  3. Automated scoring methods should be described. Current tests increasingly rely on automated scoring methods to score constructed-response items previously scored by human raters. Testing programs should document how automated scoring algorithms are used and how scores obtained from such algorithms should be interpreted.
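
As a brief sketch of the disaggregation idea in item 1, the code below computes an internal-consistency estimate (Cronbach's alpha) separately for each subgroup rather than only for the pooled sample. The item scores and group labels are invented, and a full analysis would also examine other validity and reliability evidence by subgroup.

```python
# Minimal sketch: estimating reliability (Cronbach's alpha) within each
# subgroup instead of only for the aggregated sample. Data are invented;
# rows are test takers, columns are item scores.
def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(rows):
    k = len(rows[0])                                        # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

scores_by_subgroup = {
    "subgroup_a": [[1, 2, 2], [3, 3, 4], [2, 2, 3], [4, 4, 4]],
    "subgroup_b": [[1, 1, 2], [2, 3, 3], [3, 3, 4], [4, 4, 4]],
}

for group, rows in scores_by_subgroup.items():
    print(f"{group}: alpha = {cronbach_alpha(rows):.2f}")
```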

Although these three new themes in the Standards illustrate the breadth of coverage of the checklists, they provide only a sample of the changes embodied in the full version of the revised checklists, which contain approximately 100 specific practices that testing programs should follow, distilled from contemporary professional standards for assessment programs. The revised checklists are particularly helpful in that they provide users with a single-source compilation of the most up-to-date and broadly endorsed elements of defensible testing practice. Downloadable copies of the revised checklists for K-12 and credentialing assessment programs can be found at bit.ly/checklist-assessment.
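
To show how checklist results of this kind might be recorded and summarized during an evaluation, here is a minimal sketch in which indicators are marked observed (O), not observed (N), or not applicable (NA). The indicator text is invented for illustration and does not quote the actual checklists.

```python
# Minimal sketch: tallying checklist indicators recorded as
# observed (O), not observed (N), or not applicable (NA).
# The indicator text below is invented for illustration only.
from collections import Counter

test_development = {
    "Item specifications are documented": "O",
    "Items are reviewed for bias and sensitivity": "O",
    "Field-test data inform item selection": "N",
    "Translated forms follow adaptation guidelines": "NA",
}

tally = Counter(test_development.values())
applicable = tally["O"] + tally["N"]
print(dict(tally))
print(f"Observed: {tally['O']}/{applicable} applicable indicators")
```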

Blog: Using Embedded Assessment to Understand Science Skills

Posted on August 5, 2015 in Blog
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Cathlyn Stylinski
Senior Agent
University of Maryland Center for Environmental Science
Karen Peterman
President
Karen Peterman Consulting
Rachel Becker-Klein
Senior Research Associate
PEER Associates

As our field explores the impact of informal (and formal) science programs on learning and skill development, it is imperative that we integrate research and evaluation methods into the fabric of the programs being studied. Embedded assessments (EAs) are “opportunities to assess participant progress and performance that are integrated into instructional materials and are virtually indistinguishable from day-to-day [program] activities” (Wilson & Sloane, 2000, p. 182). As such, EAs allow learners to demonstrate their science competencies through tasks that are integrated seamlessly into the learning experience itself.

Since they require that participants demonstrate their skills, rather than simply rate their confidence in using them, EAs offer an innovative way to understand and advance the evidence base about the impacts of informal science programs. EAs can take many forms and can be used in a variety of settings. The essential defining feature is that these assessments document and measure participant learning as a natural component of program implementation, often as participants apply or demonstrate what they are learning.

Related concepts that you may have heard of:

  • Performance assessments: EA methods can include performance assessments, in which participants do something to demonstrate their knowledge and skills (e.g., scientific observation).
  • Authentic assessments: Authentic assessments are assessments of skills where the learning tasks mirror real-life problem-solving situations (e.g., the specific data collection techniques used in a project) and could be embedded into project activities (Rural School and Community Trust, 2001; Wilson & Sloane, 2000).

You can use EAs to measure participants’ abilities alongside more traditional research and evaluation measures and also to measure skills across time. So, along with surveys of content knowledge and confidence in a skill area, you might consider adding experiential and hands-on ways of assessing participant skills. For instance, if you were interested in assessing participants’ skills in observation, you might already be asking them to make some observations as a part of your program activities. You could then develop and use a rubric to assess the depth of that observation.
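
As a concrete illustration of that last point, the sketch below scores a written observation against a simple, hypothetical three-level rubric. The rubric levels, keyword check, and sample entry are all invented; a real EA rubric would be developed with project staff and applied by trained raters.

```python
# Minimal sketch: scoring the "depth" of a participant observation with a
# simple rubric. The levels and criteria are hypothetical illustrations.
RUBRIC = {
    3: "Names the organism and describes multiple specific, verifiable details",
    2: "Names the organism and gives at least one specific detail",
    1: "General impression only, no specific details",
}

def score_observation(text, detail_terms=("color", "count", "size", "behavior")):
    """Assign a rubric level based on how many detail categories appear.

    A real rubric would be applied by trained raters; this keyword check
    only stands in for that judgment for demonstration purposes."""
    details = sum(term in text.lower() for term in detail_terms)
    if details >= 2:
        return 3
    if details == 1:
        return 2
    return 1

entry = "Red maple: leaf color turning, counted 12 buds on the lowest branch."
level = score_observation(entry)
print(level, "-", RUBRIC[level])
```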

Although EA offers many benefits, the method also poses some significant challenges that have prevented widespread adoption to date. For the application of EA to be successful, there are two significant challenges to address: (1) the need for a standard EA development process that includes reliability and validity testing and (2) the need for professional development related to EA.

With these benefits and challenges in mind, we encourage project leaders, evaluators, and researchers to help us to push the envelope by:

  • Thinking critically about the inquiry skills fostered by their informal science projects and ensuring that those skills are measured as part of the evaluation and research plans.
  • Considering whether projects include practices that could be used as an EA of skill development and, if so, taking advantage of those systems for evaluation and research purposes.
  • Developing authentic methods that address the complexities of measuring skill development.
  • Sharing these experiences broadly with the community in an effort to highlight the valuable role that such projects can play in engaging the public with science.

We are currently working on a National Science Foundation grant (Embedded Assessment for Citizen Science – EA4CS) that is investigating the effectiveness of embedded assessment as a method to capture participant gains in science and other skills. We are conducting a needs assessment and working on creating embedded assessments at each of three different case study sites. Look for updates on our progress and additional blogs over the next year or so.

Rural School and Community Trust (2001). Assessing Student Work. Available from http://www.ruraledu.org/user_uploads/file/Assessing_Student_Work.pdf

Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2), 181-208. Available from http://dx.doi.org/10.1207/S15324818AME1302_4

Report: Assessing the Impact and Effectiveness of Professional Development in the ATE Program

Posted on October 1, 2014 in Resources

The purpose of this report is to describe and assess ATE professional development experiences and to aid community colleges throughout the nation in their efforts to meet the new challenges posed by rapidly developing high-technology sectors.

File: Click Here
Type: Report
Category: ATE Research & Evaluation
Author(s): Karen Powe, Norman Gold