Assessing professional competence: from methods to programmes


Reliability refers to the reproducibility of the scores obtained from an assessment. It is generally expressed as a coefficient ranging from zero (no reliability) to 1 (perfect reliability). A value of 0.80 is often regarded as the minimum acceptable, although the threshold may be lower or higher depending on the purpose of the examination (a licensing examination, for instance, will require a higher value). Reliability can be negatively affected by many sources of error or bias, and research has provided conclusive evidence that increasing reliability requires sampling that takes account of all these unwanted sources of variance. A good understanding of the issues involved in sampling may offer us many more degrees of freedom in test development.
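The relationship between sampling and reliability described here can be made concrete with the Spearman-Brown prophecy formula from classical test theory, which projects how the reliability coefficient changes as a test is lengthened (i.e. as the sample of items, cases or stations grows). The formula itself is standard; the starting reliability of 0.60 below is purely an illustrative assumption, not a figure from this paper.

```python
def spearman_brown(r: float, k: float) -> float:
    """Projected reliability of a test lengthened by factor k,
    given the reliability r of the original test (Spearman-Brown)."""
    return k * r / (1 + (k - 1) * r)

# A hypothetical test with reliability 0.60, lengthened by broader sampling:
for k in (1, 2, 4):
    print(f"{k}x length: reliability {spearman_brown(0.60, k):.2f}")
# → 1x length: reliability 0.60
# → 2x length: reliability 0.75
# → 4x length: reliability 0.86
```

Note the diminishing returns: doubling the sample lifts 0.60 to 0.75, but each further doubling yields less, which is why adequate initial sampling across content matters so much.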

The predominant condition affecting the reliability of assessment is domain or content specificity, because competence is highly dependent on context and content. This means that reliable scores can only be achieved with a large sample across the content of the subject to be tested.8 If the assessment involves other conditions with a potential effect on reliability, such as examiners and patients, careful sampling across those conditions is equally essential. With intelligent test designs that sample efficiently across conditions (such as using different examiners for each station in an OSCE), reliable scores will generally be obtained within a reasonable testing time.

So far nothing new. What is new, however, is the recent insight that reliability is not conditional on objectivity and standardisation. The frequent confusion of objectivity with reliability was addressed theoretically some time ago,9 but the empirical evidence is now becoming convincingly clear and may point the way to new directions in assessment. To illustrate our point, let us look at the OSCE. The OSCE was developed as an alternative to the then prevailing subjective and unreliable clinical assessment methods, such as vivas and clinical ratings. The main perceived advantage of the OSCE was objectivity and standardisation, which were regarded as the main underpinnings of its reliability. However, an abundance of evidence from studies has by now shown that the reliability of an OSCE is contingent on careful sampling, particularly across clinical content, and on an appropriate number of stations, which generally means that several hours of testing time are needed.10 What actually occurred was that the brevity of the clinical samples (leading to a larger sample overall than in previous methods) and the fact that students rotated through the stations (optimal sampling across patients and examiners) led to more adequate sampling, which in turn had a far greater impact on reliability than any amount of standardisation could have had. This finding is not unique to the OSCE. In recent years many studies have demonstrated that reliability can also be achieved with less standardised assessment situations and more subjective evaluations, provided the sampling is appropriate. Table 1 illustrates this by presenting reliability estimates for several instruments with differing degrees of standardisation. For comparative purposes, the reliability estimates are expressed as a function of the testing time needed.

[Insert Table 1 about here]
The comparative data should not be interpreted too strictly, since only a single study was included for each type of method and reliability estimations were based on different designs across studies. For our discussion it is irrelevant to know the exact magnitude of the reliability or which method can be hailed as the "winner". The important point is to illustrate that all methods require substantial sampling and that methods which are less structured or standardised, such as the oral examination, the long case examination, the mini-CEX and the incognito standardised patient method, can be entirely or almost as reliable as other more structured and objective measures. In a recent review, a similar conclusion was presented for global clinical performance assessments.11 They are not included in Table 1 since the unit of testing time is unavailable, but a sufficiently reliable global estimate of competence requires somewhere between 7 and 11 ratings, probably not requiring more than a few hours of testing time. All these reliability studies show that sampling remains the pivotal factor in achieving reliable scores with any instrument and that there is no direct connection between reliability and the level of structuring or standardisation.

This insight has far-reaching consequences for the practice of assessment. Basically, the message is that no method is inherently unreliable and that any method can be sufficiently reliable, provided sampling across the conditions of measurement is appropriate. An important consequence of this shift in perspective on reliability is that there is no need to banish from our assessment toolbox instruments that are rather more subjective or not perfectly standardised, on condition that we use those instruments sensibly and expertly. Conversely, we should not be deluded into thinking that reliability is automatically guaranteed as long as our assessment toolbox contains only structured and standardised instruments.
