Option 1: Design tests to duplicate the diploma exam’s norms and items as closely as possible.
Consider my graphical analysis of the province-wide results on the January 2015 Mathematics 30-1 diploma exam:
[Graph not shown: distribution of school-awarded marks versus diploma examination marks by letter grade. Each star represents 100 test-takers.]
Note the dramatic difference between the number of students who received an F for their school-awarded mark (about 400) and the number who received an F for their diploma examination mark (about 2500). One interpretation is that teachers across the province are awarding marks in a manner consistent with competency-based learning, whereas the province is designing the Mathematics 30-1 diploma exam, to some extent, as a fair competition.
In my opinion, Alberta Education develops many items that are artificially difficult. For example, several items each tested four separate sub-items; if a student answers any one of the sub-items incorrectly, the student loses the entire mark at stake. By testing four different things in one item, Alberta Ed is emphasizing a more content-based learning model. This is not necessarily the wrong thing to do; it all depends on what one is trying to accomplish and why. Alberta's post-secondaries accept diploma exam marks as valid measurements of students' aptitudes, and if the exams were not a fair competition, the universities might need to employ SATs or ACTs to select students. Therefore, perhaps the diploma exam is the assessment model that should be mimicked.
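To make the effect of those bundled four-part items concrete, here is a rough sketch. The 90% per-sub-item success rate is an assumption chosen for illustration, and the sub-items are treated as independent:

```python
# All-or-nothing scoring of a four-part item (illustrative numbers only).
# A student who would get each sub-item right 90% of the time, independently,
# earns the single mark at stake only 0.9**4 of the time.
p_sub_item = 0.90
p_full_mark = p_sub_item ** 4
print(f"Chance of earning the mark: {p_full_mark:.0%}")  # roughly 66%
```

In other words, bundling drives the success rate on the item well below the success rate on any of its parts.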
Criticism of Option 1
Duplicating the provincial results in our department would place our students on Procrustean beds by locking our assessments onto a normative scale instead of assessing competencies directly. Some diploma exam items introduce difficulty for the sake of difficulty so that grades are suppressed. Alberta Ed needs the Math 30-1 diploma exam to be, to a significant degree, a competition, and not entirely a competency-based assessment, so that students are placed in appropriate post-secondary programs. (They could simply scale the marks with no loss in validity, but doing so would not be politically palatable.)
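To illustrate the scaling point in that parenthesis: a linear rescaling of raw marks preserves every student's rank (and therefore the exam's usefulness for sorting applicants) while changing only the reported numbers. The scores and targets below are made up for the example:

```python
# Minimal sketch of linear mark scaling; all numbers are hypothetical.
raw_scores = [42, 55, 61, 68, 74, 88]

mean = sum(raw_scores) / len(raw_scores)
sd = (sum((x - mean) ** 2 for x in raw_scores) / len(raw_scores)) ** 0.5

target_mean, target_sd = 65.0, 12.0           # arbitrary reporting scale
scaled = [target_mean + target_sd * (x - mean) / sd for x in raw_scores]

print([round(s, 1) for s in scaled])          # same rank order, different numbers
```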
Consider these curious stats I data mined from the January 2015 Mathematics 30-1 diploma exam School Report:
- The average score on questions designated as Standard of Excellence was 65%.
- The average score on questions designated as Acceptable Standard was 66%. Incredibly, these two classes of items (Standard of Excellence and Acceptable Standard) have, on average, about the same level of difficulty.
- Numerical-response item 6: 46% of Alberta students answered this item correctly. This item is at the lowest cognitive level, “Procedure,” and at the Acceptable Standard level, yet it is one of the most difficult items on the test.
- Multiple-choice item 16: 87% of Alberta students answered this item correctly. This item is at the highest cognitive level, “Problem Solving,” and at the Standard of Excellence level, yet it is one of the easiest items on the test.
I have often heard that difficulty is independent of cognitive levels or the standards. This leads me to ask: What, then, is the purpose of difficulty? It’s not mentioned in the curriculum. My only conclusion is that difficulty is being used to make diploma exams a competition (presumably, a fair one).
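For what it’s worth, the kind of comparison in the bullets above is easy to reproduce from item-level results. A minimal sketch follows; the item values are placeholders, not the actual School Report data:

```python
# Group hypothetical item results by designated standard and average them.
items = [
    ("Acceptable Standard", 0.46),      # e.g., a hard "Procedure" item
    ("Acceptable Standard", 0.81),
    ("Standard of Excellence", 0.87),   # e.g., an easy "Problem Solving" item
    ("Standard of Excellence", 0.58),
]

by_standard = {}
for standard, p_correct in items:
    by_standard.setdefault(standard, []).append(p_correct)

for standard, values in by_standard.items():
    print(f"{standard}: {100 * sum(values) / len(values):.0f}% average")
```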
I see the Math 30-1 diploma exam as partly an arms race: as teachers become better at teaching the course material and students become better at answering the trickier questions by finding calculator-based workarounds and by using other heuristics, the diploma exam needs to be made progressively more difficult to keep marks down. This happened with Pure Math 30. See: http://acfonthesamepage.blogspot.ca/2014/09/provincial-assessment-for-mathematics.html.
Option 2: Build tests directly from the specifications in the Program of Studies
Another option is to develop test blueprints based on a fair interpretation of the achievement indicators precisely as they are stated: that is, develop exam items that meet the defined standard (acceptable or excellent) but are no more difficult than they need to be (no artificial difficulty just for the sake of beating marks down). This method diminishes the emphasis on the competitive aspects of testing and places more emphasis on competency; gives the benefit of the doubt to our students, since it is more likely to test math skills than IQ; and is more consistent with a UDL-based assessment philosophy. (CAST’s definition of UDL: “UDL is intended to increase access to learning by reducing physical, cognitive, intellectual, and organizational barriers to learning, as well as other obstacles.”)
We could also stop trying to second-guess the ever-morphing diploma exam: equivalency exams could be designed according to Alberta Ed’s own standards rather than to mimic the diploma exam’s difficulty and item styles. Moreover, the equivalency exam and the diploma exam are now weighted at 30% of the final mark instead of 50%, so perhaps more effort should go into diversifying assessments beyond tests rather than chasing the all-multiple-choice diploma exams.
Criticism of Option 2
Perhaps Option 2 is excessively soft-hearted and insufficiently hard-headed. One purpose of assessment is to gather evidence that can be used when exhorting teachers and students to strive for ever-increasing excellence. How can anyone achieve excellence without first surmounting significant difficulty? Moreover, extraordinary efforts and resources are invested to ensure that diploma exams are highly valid.
Conclusion
A blend of the two options is possible, and there are many other valid ways and reasons to assess. I don’t have all the answers to the initial questions, but I hope I have provided some valid explorations of them.
Regards,
Michael