Tuesday, 17 April 2018

The Importance of Accurate Student Assessment - Take-Aways from the Calgary Regional Consortium’s “Interpreting Diploma Exam Results: A Formative Tool to Enhance Instructional Practices” Professional Learning Session


As the name suggests, the focus of this session was on the many factors needed to interpret diploma exam results appropriately, particularly discrepancies between an individual learner’s school-awarded mark (SAM) and diploma exam mark (DIP), as well as province-wide averages of SAMs and DIPs.  Participants were even provided several Excel workbooks into which data can be entered to generate enhanced, visual interpretations of diploma exam results.  While the focus was on high school, the take-aways apply to teachers at any level, because it matters to all of us that a learner’s grades accurately reflect his or her skills.

In relation to the reports provided by Alberta Education, it is interesting to look at provincial averages in comparison to our learners’ averages.  But before throwing our hands in the air in defeat, or alternatively jumping for joy, we should consider that our small sample size at BVC doesn’t give us very much data to work with in any particular exam sitting.  Instead of looking at a single year, we should consider trends through the multi-year reports and also reflect on whether the demographics of our learners are representative of the demographics of the province.

What we can examine at BVC for a particular exam sitting are discrepancies between a learner’s SAM and DIP.  The province-wide trend is for DIPs to be lower than SAMs.  But why?

Tim Coates, a former Social Studies teacher who served as the Director of the Diploma Examination Program Branch at Alberta Education from 2005 to 2014, facilitated the session and offered some points for consideration.

  • A bad day at the exam.  It happens: a learner may simply have an off day and underperform on the exam.
  • Grades for non-outcomes.  Awarding or reducing grades for attendance, participation, attitude, effort, or other behaviours such as turning work in early/late can impact learners’ grades substantially regardless of their ability in a course.  While instructors may be frustrated that a learner does not take his or her work seriously, or delighted that he or she makes a tremendous effort, these things should not directly inform any part of a learner’s grade.
  • Grades for “polluted” data.  Coates provided a memorable example.  Imagine you need a lab test.  You go to a lab and provide your sample.  The technician then tells you, “we’ll give you your results just as soon as we have samples from three other people; we’re going to mix the samples together, run the test, and then distribute the result among the group.”  Clearly, this would not work to measure an individual’s lab results, and it doesn’t work to measure an individual learner’s ability in a group work situation, either.  We need “clean” data.
  • Grades for formative assessment and resubmission of assignments.  We can agree that learning happens through practice, that assessing this practice serves as formative assessment for learning, and that it therefore should not be counted as a weighted grade.  But what about allowing a learner to take feedback from a summative assessment, make improvements, and then resubmit the assessment?  Clearly the learner’s grade will improve.  Good.  But is this an accurate representation of the learner’s ability to do the work independently?  Coates emphasized that this method of resubmission and reassessment should be considered formative, not summative, and therefore no weight should be attached to the grade.  For learners to demonstrate their mastery of a skillset, they need a new task to independently show what they know.
  • Assessments are unreliable.  Coates explained at considerable length the process of assessing the reliability of exams and showed a variety of calculations we can make to determine both the difficulty and the discrimination of our exams.  (This is not for an ELA teacher to attempt to explain.)  Through this, he stressed that we consider the purpose of our assessments: can we determine the difference between an 80% and a 90% learner, or a 30% and a 50% learner?  Does the assessment make it possible for learners to perform reasonably well despite not knowing much, or to perform only at an average level despite knowing a lot?  In other words, does our assessment discriminate?  If the assessments we provide our learners throughout the course cannot really showcase what they do or do not know, their SAMs are not going to correlate with their DIPs.
    •  As a side note, this is the purpose of analytics programs such as Form Return (which we used in our department for a short time) and Smarter Grades (a newer program we are investigating), which can run these calculations on multiple-choice and short-answer tests and display the results, even aggregating data from multiple classes, terms, or years.  Although it would mean considerable backend work, using such a program to analyze the reliability of unit and equivalency exams could help us develop better exams.  (A rough sketch of what these calculations look like follows this list.)
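
For anyone curious what the difficulty and discrimination calculations mentioned above actually look like, here is a minimal sketch in Python.  Everything in it is illustrative: the five-item response matrix is made up, and the discrimination measure shown is the common point-biserial (corrected item-total) correlation, which may or may not be exactly what the session’s Excel workbooks or programs like Form Return and Smarter Grades compute.

```python
# Minimal sketch of item-analysis calculations (illustrative only).
# responses: rows are learners, columns are exam items, 1 = correct, 0 = incorrect.

def item_difficulty(responses, item):
    """Proportion of learners who answered the item correctly (higher = easier item)."""
    scores = [row[item] for row in responses]
    return sum(scores) / len(scores)

def item_discrimination(responses, item):
    """Point-biserial correlation between the item and the rest of the test.

    Values well above 0 mean the item separates stronger learners from weaker
    ones; values near 0 or negative suggest the item does not discriminate.
    """
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]  # total excluding this item
    n = len(responses)
    mean_i = sum(item_scores) / n
    mean_r = sum(rest_scores) / n
    cov = sum((i - mean_i) * (r - mean_r) for i, r in zip(item_scores, rest_scores)) / n
    var_i = sum((i - mean_i) ** 2 for i in item_scores) / n
    var_r = sum((r - mean_r) ** 2 for r in rest_scores) / n
    if var_i == 0 or var_r == 0:
        return 0.0  # everyone answered the same way; the item cannot discriminate
    return cov / (var_i ** 0.5 * var_r ** 0.5)

# Hypothetical results from a five-item multiple-choice quiz (five learners).
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

for item in range(len(responses[0])):
    print(f"Item {item + 1}: difficulty = {item_difficulty(responses, item):.2f}, "
          f"discrimination = {item_discrimination(responses, item):.2f}")
```

In this made-up data, the last item comes out with a negative discrimination (the learners with the highest totals got it wrong), which is exactly the kind of item a reliability analysis would flag for review.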

Unfortunately, at the time I attended the session, our DIP reports were not available to me, but I am looking forward to analyzing my own learners’ diploma examination results in this new light and I hope to have reasonable access to the data in a timely manner after future exam sittings.  I think looking at the results within subject discipline groups will foster useful discussion about exam and assessment reliability and consistency between classes.

This session proved to be a good reminder that all assessments should reflect the outcomes, the entire set of outcomes, and only the outcomes of the course.  It also brought me back to the old conflict between what I feel is philosophically sound and what I feel is practically and/or usefully manageable.  For instance, how do we survive the marking load from multiple learners who fail to complete assessments on time and then submit a horde of them the last week of the term?  Coates reasons that it is not our job to teach the learner a lesson in responsibility and that our obligation is to provide the most accurate report of learning; therefore, no assignment is too late and no marks should be deducted for lateness.  So if we do accept late assignments and allow them full grades as Coates suggests, do we also laboriously make comments on the late work, or can we say the point of providing feedback is now lost?  Or what about formative assessments (e.g., a writing assignment submitted not for a grade but for feedback) when the associated summative assessment is already past due?  Or what about denying access to a discussion forum after a new forum has started and most of the class has moved on?

What do you think?  Do you face similar conflicts?  How have you resolved them?

Lorna
