Monday, 16 September 2013

Item Writing Guide for Math and Science from Alberta Education

An instructor asked me about a guidebook/primer produced by Alberta Education that is used to train their item writers. I often refer to this guidebook when writing math and science items: on the department drive, navigate to “STAFF FOLDERS\Michael Gaschnitz\OTSP” to find Alberta Education’s Writing Effective Machine-Scored Questions for Mathematics & Science.

This 84-page document is divided into several helpful sections:

In Why Use Multiple-Choice Questions and Why Use Numerical Response Questions, the situations best-suited to each item type are addressed. Positive features and cautions are listed for each type.

In Test Wiseness Quiz, a quiz is provided that demonstrates several items that can be answered by testwise students who may actually be ignorant of the concept the item is trying to test. We want our tests to be as testwiseness-proof as possible in order to maintain the validity of scores. Otherwise, we end up testing a kind of intelligence or street-smarts instead of the relevant content area.

In Guidelines for Writing Multiple-Choice Questions and Guidelines for Writing Numerical-Response Questions, 25 or so specific guidelines are provided, such as “do not use 5 as the deciding factor for rounding” (a value ending in 5 rounds differently under different rounding conventions). Most of them also appear in the broader item-writing research and literature.

In Commonly Used Question Formats and Stems, stem templates are provided. This allows for what is called “item modelling,” a strategy item writers use to help prevent the deeply dreaded state of item-writer’s block. If item writers are stumped when trying to create an item for a particular concept, they can look to these templates for some inspiration.

In Blueprints for Tables of Test Specifications, test blueprinting guidelines are provided. One curious bit of advice states that “In general, 40% to 50% of the marks should be targeted for the ‘acceptable standard’ and 25% for the ‘standard of excellence’ with the rest in between.” The remaining marks (roughly 30%) fall into a category described in the Chemistry 30 Information bulletin as “intermediate standard.” However, no information bulletin that I know of clearly describes the intermediate standard.

In Example: Unit Test Blueprints, a Physics 30 blueprint is provided that categorizes items as easy, medium or hard. I’ve never seen that schema in any information bulletin for any course, but it seems quite intuitive.

In Fix-It Questions, several problematic items are provided and we are asked to identify the problem and improve the item. Answers are provided on p.56.

This document is from the “Writing Effective Machine Scored Questions” session delivered by the science and mathematics diploma exam managers and hosted by the Calgary Regional Consortium (http://www.crcpd.ab.ca/). I highly recommend these sessions because you can pick up a wealth of information that is never posted at http://education.alberta.ca/. For example, here are a few “rules” for marking numerical-response items that I’ve never seen in writing:

Blanks (columns without an entry) are generally ignored or assumed to be zero. Although the instructions say to left-justify all responses, right-justification is also accepted. For example, if the answer is “42,” then any of the following responses are accepted as correct:

4 2 _ _
_ 4 2 _
_ _ 4 2
4 _ _ 2
4 _ 2 _
_ 4 _ 2
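
To make the blank-handling rule concrete, here is a minimal sketch (in Python) of how a scorer might normalize a four-column gridded response. The function name, the grid representation, and really the whole routine are my own assumptions for illustration; this is not Alberta Education’s actual scoring code.

def normalize_grid(columns):
    """Collapse a four-column gridded response by dropping blank columns.

    `columns` holds the four bubbled columns as strings, with "" marking
    a blank. Because blanks are simply ignored, left-justified,
    right-justified, and oddly spaced entries all reduce to the same
    answer string.
    """
    return "".join(c for c in columns if c.strip())

# Every gridded response listed above reduces to "42":
examples = [
    ["4", "2", "", ""],
    ["", "4", "2", ""],
    ["", "", "4", "2"],
    ["4", "", "", "2"],
    ["4", "", "2", ""],
    ["", "4", "", "2"],
]
assert all(normalize_grid(e) == "42" for e in examples)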

If the answer to a numerical-response item works out to be 0.776, then any of the following responses are accepted as correct:

0 . 7 8
_ . 7 8
. 7 8 _
. _ 7 8
 
In Biology 30 and Chemistry 30, truncated answers are also accepted as correct:
0 . 7 7
_ . 7 7
. 7 7 _ 
. _ 7 7
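
Here is a similarly rough sketch of how both the rounded and the truncated answer might end up on a key. The function, the use of Python’s decimal module, and the choice of round-half-up are my own assumptions for illustration, not Alberta Education’s marking code.

from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

def acceptable_answers(exact_value, decimals=2, allow_truncation=False):
    """Build the set of answer strings a marker might accept.

    `exact_value` is the full-precision answer as a string (e.g. "0.776")
    and `decimals` is the number of decimal places the item asks for.
    The rounded answer is always accepted; the truncated answer is added
    only for courses (such as Biology 30 and Chemistry 30) that accept
    truncation.
    """
    value = Decimal(exact_value)
    quantum = Decimal(1).scaleb(-decimals)  # e.g. Decimal("0.01")
    accepted = {str(value.quantize(quantum, rounding=ROUND_HALF_UP))}
    if allow_truncation:
        accepted.add(str(value.quantize(quantum, rounding=ROUND_DOWN)))
    return accepted

print(acceptable_answers("0.776"))                         # {'0.78'}
print(acceptable_answers("0.776", allow_truncation=True))  # '0.78' and '0.77'

In practice the scorer would also have to normalize formats before comparing, since the gridded examples above show that a response with the leading zero omitted is still accepted.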

Furthermore, the psychometricians look at the results for a particular numerical-response item and may add an answer to the key if many stronger students provide it and the answer makes sense. In other words, the key for numerical-response items has to evolve as results come in so that we are fair to students; we cannot blindly mark these items. Item writers often cannot predict every possible correct answer because numerical-response items are “constructed-response” items; to some extent, that is why they are being used in place of written-response items. Some numerical-response items are becoming incredibly complex! Other testing agencies besides Alberta Education adjust their numerical-response keys as well, so this is, apparently, an accepted practice.
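
As a purely hypothetical sketch of the kind of review this implies, the following flags unkeyed answers that many strong students gave so that a person can decide whether they make sense. The thresholds, the function name, and the use of total test score as a stand-in for student strength are all my own assumptions.

from collections import Counter

def flag_candidate_keys(responses, scores, key, min_count=20, top_fraction=0.25):
    """Flag unkeyed responses that many strong students gave, for human review.

    `responses` holds one normalized answer string per student, `scores`
    the same students' total test scores, and `key` the answers currently
    keyed as correct. A response is flagged when at least `min_count`
    students in roughly the top `top_fraction` of the score distribution
    gave it.
    """
    cutoff = sorted(scores, reverse=True)[int(len(scores) * top_fraction)]
    strong = [r for r, s in zip(responses, scores) if s >= cutoff]
    counts = Counter(r for r in strong if r not in key)
    return [answer for answer, n in counts.most_common() if n >= min_count]

A flagged answer would still be judged by a person before being added to the key, exactly as described above.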

Regards, Michael
