
Category Archives: formative

By Peter Preston, Poland

Teachers do calculate the average score from tests, but then nothing serious is done with it. Even when the average score is close to the pass mark, little statistical comment is made about the glaring problem that this represents. For example, if the average and the pass mark are the same and the population is normally distributed around the average, this means that 50% of the students fail. Can it be considered acceptable for 50% of the candidates to fail an end-of-the-year examination or, even worse, an end-of-the-course examination?
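To make the arithmetic concrete, here is a minimal sketch using Python's standard library. It assumes, as above, that scores are normally distributed; the mean, standard deviation and pass mark are made-up figures for illustration only, and `expected_fail_rate` is a name chosen for this example.

```python
from statistics import NormalDist


def expected_fail_rate(mean, std_dev, pass_mark):
    """Share of a normally distributed cohort expected to score below the pass mark."""
    return NormalDist(mu=mean, sigma=std_dev).cdf(pass_mark)


# Hypothetical figures: the average equals the pass mark.
same_as_pass_mark = expected_fail_rate(mean=60, std_dev=10, pass_mark=60)

# Hypothetical figures: the average sits one standard deviation above the pass mark.
above_pass_mark = expected_fail_rate(mean=70, std_dev=10, pass_mark=60)

print(f"Average equals pass mark: {same_as_pass_mark:.0%} expected to fail")
print(f"Average one SD above pass mark: {above_pass_mark:.0%} expected to fail")
```

Under these assumptions the first case gives 50% and the second about 16%, which shows how much the distance between the average and the pass mark matters.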

In fact at our college the last third-year UoE exam failed 80% of the students. Now you would think that a statistically-minded person would immediately start asking questions about validity of the exam. Construct validity – did the items set test the points intended to be tested? Course validity – did the items tested figure in the course syllabus? Is there a proper tie-up between the course syllabus and the test specifications (if the latter exist at all)? Did the distribution of correct responses discriminate between the weak and strong candidates? Were the items either too easy [not in this case] or too difficult? Is there any objective reference to competence standards built into the teaching programme? To ask just a few relevant questions.
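Two of these questions (were the items too easy or too difficult, and did they discriminate between weak and strong candidates) can be checked with a simple item analysis. The sketch below is one common classroom approach, not necessarily what any particular institution uses: it computes a facility index (proportion answering correctly) and a discrimination index (top group minus bottom group) for each item. The response data, the one-third group size and the name `item_analysis` are all assumptions made for illustration.

```python
def item_analysis(responses, group_fraction=1 / 3):
    """Return (facility, discrimination) for each item.

    responses: one list per student, holding 1 (correct) or 0 (wrong) per item.
    facility: proportion of all students who answered the item correctly.
    discrimination: proportion correct in the top-scoring group minus
        the proportion correct in the bottom-scoring group.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, strongest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_students * group_fraction))
    top, bottom = ranked[:group_size], ranked[-group_size:]

    results = []
    for i in range(n_items):
        facility = sum(student[i] for student in responses) / n_students
        top_correct = sum(student[i] for student in top)
        bottom_correct = sum(student[i] for student in bottom)
        discrimination = (top_correct - bottom_correct) / group_size
        results.append((facility, discrimination))
    return results


# Hypothetical responses: six students, four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

analysis = item_analysis(responses)
for number, (facility, discrimination) in enumerate(analysis, start=1):
    print(f"Item {number}: facility {facility:.2f}, discrimination {discrimination:.2f}")
```

In this made-up data set, item 1 is answered correctly by everyone and item 4 by no one, so neither tells the weak and strong candidates apart; items 2 and 3 do.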

I would love to hear that other institutions do use statistical analysis of exam data and look at the variance between different exam sittings using the same exam or different ones, but I wonder if small institutes can ever bring together the required expertise to carry out such work either before the exam goes live or afterwards. It would be great to conduct a poll on this matter to try to assess the use of statistics in the analysis of exam data at as many institutes as possible.
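Even without specialist expertise, a first comparison between sittings is not hard to produce. The following is only a rough sketch with invented scores and an invented pass mark; it reports the mean, the sample standard deviation and the fail rate for each sitting, and `summarize_sitting` is a name made up for this example.

```python
from statistics import mean, stdev


def summarize_sitting(scores, pass_mark):
    """Basic descriptive statistics for one exam sitting."""
    fails = sum(1 for score in scores if score < pass_mark)
    return {
        "n": len(scores),
        "mean": mean(scores),
        "std_dev": stdev(scores),  # sample standard deviation
        "fail_rate": fails / len(scores),
    }


# Hypothetical scores for two sittings, with a hypothetical pass mark of 50.
PASS_MARK = 50
sittings = {
    "first sitting": [40, 45, 50, 55, 60],
    "second sitting": [50, 55, 60, 65, 70],
}

summaries = {name: summarize_sitting(scores, PASS_MARK) for name, scores in sittings.items()}
for name, summary in summaries.items():
    print(
        f"{name}: n={summary['n']}, mean={summary['mean']:.1f}, "
        f"sd={summary['std_dev']:.1f}, fail rate={summary['fail_rate']:.0%}"
    )
```

Kept year after year, a table like this is the kind of record that makes a sudden jump in the fail rate visible and open to question.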


My own experience inclines me to believe that exams are in fact not so much an educational evaluation of the work being done as a policy instrument to give face validity to the programme. As such one does not need to worry about the quality of the exam since one can adjust the results before publication. Or in the case of my institute the exam can be repeated by order from above until the teachers get the message.

I do not like the cynical manipulation of exam data, so having good quality statistical information and quality control of all documents involved in the course would be the start to a reevaluation of the course and teaching methods. By accurate assessment at the beginning of a course it should be possible to predict the level students could get to after a given number of teaching hours, taking into account the realities of life. By keeping proper statistical records over a few years one would accumulate powerful information. This is what insurance companies do to calculate their premiums.


By Nik Bramblett – UCF, Orlando FL, USA

Sometimes we need to evaluate L2 socialization skills using an alternative assessment and not a paper test.

Here’s what I would do:

(a) Work with students (using appropriate combination of whole group, breakout small-group, and/or individual/paired strategies) to develop a rubric for a role-playing activity. Discuss what “socialization skills” means and how you might measure mastery of them. Let the students decide what’s important and what they will be graded on (with appropriate guidance from you as necessary, of course).

(b) Have students work in pairs or trios for the assessment. Students would randomly select a social problem-solving situation from a collection that you created on cards or whatever, for example: “You need to make an important call [make up a specific scenario] and your cell phone is dead; there are two strangers nearby [perhaps it’s a bus stop or whatever]. Interact with those people to solve your problem.” Students would have a brief period to plan/rehearse, and would then more or less improv a scene.

(c) Both you and the student audience would use the rubric you designed together (and reviewed clearly and modeled and practiced before these presentations began) to measure the ability of the students to perform whatever specific tasks, roles, etc. you had decided were the measurable objectives. Students’ ability to effectively judge their peers’ performance would (rightly) be part of the grade. This would not only measure the mastery of the skills but also the metacognition behind the skills.

By Noriko Ishihara – University of Minnesota, USA / Hosei University

[An excellent way to test students’ language abilities is in a realistic setting. But how can that be done? Noriko Ishihara explains.]

How can we do a scenario-based assessment of socializing skills? In my view, it’s very close to assessing sociolinguistic/pragmatic ability, which has usually been done with a situational approach.

In this instruction and assessment, learner language is elicited using realistic scenarios and the teacher chooses from a range of language- and culture-focused features to assess, for example,

– directness, politeness, and formality
– organization/discourse structure
– language form, semantic strategies, word choice
– tone (verbal and non-verbal cues)
– understanding and use of sociocultural norms
– the extent to which the speaker’s intentions match the listener’s most likely interpretation

The selected feature(s) can be assessed using various rubrics and/or checklists by the teacher and the learners themselves, either as a rather formal assessment or as part of everyday instruction/informal assessment. If anyone is interested, I’d be happy to share a paper in press that details this approach with various sample scenarios, learners’ language, and sample assessment using authentic learner language.

By Maria Spelleri, Manatee Community College, USA

Every semester I mess around with my scoring systems, trying to find “something better.” I’m not happy patrolling for homework, nor do I like points for attendance because I find myself making all kinds of exceptions for people. I teach in a community college, so I cannot just assume behavior and habits conducive to college learning; therefore, it seems that part of my grading needs to reward the development of successful student behavior like completing homework and showing up to class. Yet I loathe the time-swallowing nickel-and-dime approach to grading: daily collection of points spread out over many different categories so that no one category seems more important than another, or perhaps in our students’ eyes, all categories seem equally unimportant.

Just as an observation as I mulled over my own grading problems, I realized that we give points/grades for behaviors we want to encourage (attendance, homework, completing a paper according to format), points for the amount of effort put into something (bigger tasks get more points, we may reward a quantity of something or a completion of something, quality of research, or we take off points for late submissions), and points for demonstrating achievement (tests, quizzes, essays, presentations). I believe that if students develop certain behaviors and if a certain effort is expended, then the last of these, a demonstration of achievement, will almost always occur.

To get away from 2-4 mini-categories of grading (attendance 10%, HW 15%, etc.), I am trying a catch-all category for a larger percentage of the grade called “Specified in-class and out-of-class assignments and activities.” My idea is that this category encourages the behavior I want and the amount of effort going into studies, which will then lead to success in the class. This category does not include tests or major class projects like major essays in a writing class or major presentations in a speech class.

As I see fit, I will pre-announce that a specific homework assignment will be for points, that a particular class discussion will receive points for quality of participation, that a pair activity writing an introduction will be given points, or that a quiz will be for points. Not only do I get a larger and, I believe, more meaningful percentage value, but I also don’t have the daily grind of remembering who participated and to what extent, nor do I have to go around with a grade book like a third-grade teacher checking for homework. (Well, I do that, but maybe only once every 4 classes!) My quiz category is enveloped into this mega-category as well. (I don’t care about quiz grades as a measure of evaluation; I leave that for the tests. I give quizzes to keep students on their toes studying and to find areas of weakness.)

I’m almost halfway through the semester using this method in 5 courses. I certainly have been less frustrated than when I have to be overly strict, tediously marking every lousy point, or when I am too lax and students walk all over me. We’ll see how it goes. Some class examples: in my high intermediate grammar class, the scoring for the class is tests, 60%, and “specified in-class and out-of-class assignments and activities,” 40%. My high intermediate writing class does six major papers for 70% of the grade, and “specified in-class…” for 30%.
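For anyone who wants to try the same two-category split in a spreadsheet or a short script, here is a rough sketch of the calculation. The weights are the ones given above for the grammar and writing classes; the student scores, the category names and the `course_grade` function are invented for illustration.

```python
def course_grade(scores, weights):
    """Weighted course grade on a 0-100 scale.

    scores: category name -> percentage score (0-100) in that category.
    weights: category name -> share of the final grade; shares must total 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("Weights must add up to 100% of the grade")
    return sum(scores[category] * weight for category, weight in weights.items())


# Weights from the two class examples above.
GRAMMAR_WEIGHTS = {"tests": 0.60, "assignments_and_activities": 0.40}
WRITING_WEIGHTS = {"major_papers": 0.70, "assignments_and_activities": 0.30}

# Hypothetical student: 80% on tests, 90% on the catch-all category.
grammar_grade = course_grade(
    {"tests": 80, "assignments_and_activities": 90}, GRAMMAR_WEIGHTS
)

# Hypothetical student: 70% on the six papers, 90% on the catch-all category.
writing_grade = course_grade(
    {"major_papers": 70, "assignments_and_activities": 90}, WRITING_WEIGHTS
)

print(f"Grammar class grade: {grammar_grade:.1f}")
print(f"Writing class grade: {writing_grade:.1f}")
```

With these made-up scores the grammar student ends on 84 and the writing student on 76, so the catch-all category is heavy enough to move a final grade noticeably.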