Development of a new valid and reliable microsurgical skill assessment scale for ophthalmology residents
BMC Ophthalmology volume 18, Article number: 68 (2018)
Concerns have grown about the ability of new medical graduates to meet the demands of today’s practice environment. In this study, we aimed to develop a valid, reliable and standardized tool for assessing the basic microsurgical skills of residents in a microsurgery laboratory, so that they are well prepared before entering the surgical realm of ophthalmology.
Twenty-three experts who have teaching experience reviewed the assessment scale. Constructive comments were incorporated to ensure face and content validity. Twenty-one attendings from different specialties then graded eight corneal rupture suturing videos with the scale to investigate interrater reliability. Fourteen of them graded the same videos 3 months later to investigate intrarater reliability (repeatability).
A total of 280 assessment scales were completed. All ICC values for interrater reliability were greater than 0.8, with 75% of the values greater than 0.9 (range 0.860–0.976). All ICC values for intrarater reliability (repeatability) were also greater than 0.8, with 63% of the values greater than 0.9 (range 0.833–0.954).
The assessment scale we developed is valid and reliable. This tool could be useful to ensure that junior residents achieve a certain level of microsurgical technique in a laboratory environment before training in the operating room. Hopefully, this tool will provide a structured template for other residency programs to assess their residents for basic microsurgical skills.
As ophthalmic medical education has developed, surgical skills training has become a key part of it. Educators increasingly recognize the importance of residents’ competence in the operating room; however, traditional methods for assessing surgical skills are largely subjective and lack standardization, consistency and reliability. Moreover, the residents being assessed often did not know the standards and goals of their surgical training. Educators worldwide have done a great deal of work to change this situation. International ophthalmic educators have developed a variety of surgical competency assessment tools, such as OASIS (Objective Assessment of Skills in Intraocular Surgery), GRASIS (Global Rating Assessment of Skills in Intraocular Surgery), OSACSS (Objective Structured Assessment of Cataract Surgical Skill) and OSCAR (Ophthalmology Surgical Competency Assessment Rubric), and both expert feedback and practical application of these tools have shown excellent results [1,2,3,4,5,6,7]. So far, most of these assessments focus on residents’ performance during real-life operations, especially cataract surgery.
China is a developing, industrialized country. Ocular rupture, especially corneal rupture, is a common and dangerous ophthalmic emergency, and its repair is usually a resident’s first independent real-life surgery. Prompt and meticulous wound management may reduce severe postoperative complications such as wound leak and endophthalmitis. Thus, residents should be well prepared before they go into the operating room. Moreover, suturing technique is a critical and fundamental part of microsurgery. Standardized and adept micromanipulation and suturing pave the way for entering the surgical realm of ophthalmology. Therefore, in Shanghai, suturing a corneal rupture on pig eyes is a mandatory periodic examination of the residency program. Appropriate evaluation of this procedure is essential because weaknesses in training and teaching are difficult to correct without factual data [9, 10]. Since no rating assessment for suturing corneal rupture has been created before, Chinese ophthalmic educators need to develop a comprehensive assessment scale in response to the current demand. In this study, we aimed to establish an efficient and reliable assessment scale for suturing corneal rupture to ensure the basic surgical competency of residents.
This study was approved by the Ethics Committee of Shanghai General Hospital. All the operations were performed in a microsurgery laboratory using pig eyes (Fig. 1a). Each resident was given detailed information about the procedure they were to perform. The ruptures were “L” shaped and involved the limbus. First, we made a full-thickness horizontal incision (about 6 mm) from the 9 o’clock limbus to the central cornea. The incision was then extended downward vertically for another 3 mm (Fig. 1b). All necessary instruments, as well as distractor instruments, were laid out on the table. The whole process, from gloves on to gloves off, was videotaped and stored for later review. Senior attendings from different specialties were asked to watch the recorded videos and complete the assessment scales accordingly. The videotapes were chosen from residents at different rotation levels to include a range of surgical skills, and evaluators were blinded to each resident’s level of training. In addition, 3 months later, each attending was asked to watch the same videos and complete the scales again. To reduce recall of the earlier scores, the videos were played in a different order.
Validity of the assessment scale
A questionnaire was created (Fig. 2) to evaluate the scale’s face validity (i.e., the extent to which the components address the vital aspects) and content validity (i.e., the extent to which the components assess resident competency and skill) [3, 7]. The questionnaire along with the assessment scale was sent to experts from several teaching and research offices including one member of the committee of Shanghai standardized residency program, and then the scale was revised according to their comments and suggestions.
Reliability and repeatability of the assessment scale
Senior attendings from different specialties were included in this evaluation to achieve a broad representation. The interrater reliability of different observers, as well as the intrarater reliability of the same observer (repeatability), was tested using the intraclass correlation coefficient (ICC). The ICC is defined as the ratio of the between-subjects variance to the sum of the within-subjects and between-subjects variances. The ICC can vary between 0 and 1, with 1 indicating perfect agreement. It should be greater than 0.7 for a newly developed scale to be considered reliable [13,14,15]. We calculated the ICC using SPSS version 13.0 (Chicago, IL, USA). Because we had a sample of observers and a sample of cases, we used the Two-Way Random model. The Single Measures results were used to evaluate repeatability, and the Average Measures results were used for reliability. The significance level and confidence coefficient were set to 0.05 and 0.95, respectively.
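The two-way random ICC described above can be sketched in a few lines. This is a minimal illustration, not the SPSS procedure used in the study: it assumes the absolute-agreement form of the coefficient (the text does not state whether agreement or consistency was selected), it assumes a complete cases-by-raters score matrix, and the function name `icc_two_way_random` and the example scores are hypothetical.

```python
import numpy as np


def icc_two_way_random(scores):
    """Return (single-measures ICC, average-measures ICC).

    `scores` is an n_cases x k_raters matrix with no missing values.
    Assumption: two-way random effects, absolute agreement,
    i.e. ICC(2,1) and ICC(2,k) in the Shrout and Fleiss notation.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares for cases (rows), raters (columns) and residual error.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA table.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


if __name__ == "__main__":
    # Hypothetical ratings: 4 videos (rows) scored by 3 raters (columns).
    demo = [[4, 4, 5], [2, 3, 2], [5, 5, 5], [3, 3, 4]]
    single, average = icc_two_way_random(demo)
    print(f"single measures: {single:.3f}, average measures: {average:.3f}")
```

The single-measures value describes the reliability of one rating, while the average-measures value describes the reliability of the mean across all raters, which is why the latter is typically higher for the same data.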
Validity of the assessment scale
Twenty-three experts completed the questionnaire, and the results are summarized in Table 1. Four experts recommended adding an assessment of “preoperative preparation and postoperative cleaning up” to the scale, since the videotapes contained those parts and they are aspects of surgical skill. Two experts felt that some of the descriptors were too detailed and burdensome to read and would benefit from simplification. Three experts suggested using separate rating scales for “knotting”, “knots tightness”, and “knots exposure”. One expert recommended adding “Suturing” to the scale to assess general suturing performance, such as needle load and needle entry. Five experts felt there was no need to include an assessment of “abnormal events management”. All comments and suggestions were considered, and appropriate suggestions were incorporated into the assessment scale, thus establishing a level of face and content validity.
The finalized assessment scale is shown in Table 2. It includes 6 measures of basic surgical skills (preoperative preparation, microscope use, instrument handling, hands coordination, postoperative clean up and overall performance) and 9 measures of the stages of suturing (suturing, suturing order, sutures interval, sutures width, sutures depth, knotting, knots tightness, knots exposure, and wound leakage and anterior chamber formation), each rated on a 5-point Likert scale, with each point anchored by explicit behavioral descriptors.
Reliability and repeatability of the assessment scale
Twenty-one attendings from different specialties watched eight videotaped corneal suturing procedures and completed the assessment scales for the first time. The specialties represented were cataract (4), glaucoma (3), cornea (3), strabismus (1), and retina (10). Only 14 attendings completed the scales again 3 months later. A total of 280 assessment scales were completed. All experts reported that they could complete the scale within 5 min.
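The total of 280 completed scales is consistent with the counts reported above, assuming that every participating attending graded all eight videos in each round:

```latex
\[
\underbrace{21 \times 8}_{\text{first round}}
+ \underbrace{14 \times 8}_{\text{second round}}
= 168 + 112 = 280
\]
```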
The interrater reliability of each surgical procedure step and of the overall score, considering all 21 observers together, is summarized in Table 3. All the ICC values were greater than 0.8, with 75% of the values greater than 0.9. “Microscope use” showed the highest reliability (0.976, 95%CI 0.942–0.994). The intrarater reliability (repeatability) of each step and of the overall score is listed in Table 4. All values were greater than 0.8, with 63% of the values greater than 0.9. “Suturing order” showed the highest repeatability (0.954, 95%CI 0.934–0.968).
Previous studies suggest a trend towards better acquisition of microsurgical skill among trainees who are allowed to practice on simulators and/or in the wet laboratory [16,17,18]. Nevertheless, in the early twenty-first century, the ophthalmic education of residents in China was unstructured and of variable quality. Concerns grew about the ability of new medical graduates to meet the demands of today’s practice environment. Thus, China started a residency program about 10 years ago, and Shanghai was one of the pilot cities. To date, each city remains responsible for its own resident training and examination. In Shanghai, the committee of ophthalmic resident training standardized the program as 3 years of ophthalmology education, and residents sit an ophthalmology residency-in-training examination every year. The major purpose of these examinations is to evaluate residents’ competence in 4 aspects: (1) medical knowledge, (2) patient care and communication skills, (3) case-based learning and analyzing, and (4) surgical skills. Suturing technique is a critical and fundamental part of microsurgery. Standardized and adept micromanipulation and suturing pave the way for entering the surgical realm of ophthalmology. Therefore, the surgical skills of junior residents are assessed by their performance in suturing a corneal rupture on pig eyes. This examination has been held for 5 years, and ophthalmic educators have found that the traditional scoring method may be unreliable because of grade inflation and overtly subjective assessments [10, 19, 20]. A residency examination is meant to ensure competence in all aspects by collecting performance data that reliably and accurately reflect the resident’s real ability. Thus, a valid and reliable assessment tool is urgently needed.
To our knowledge, this is the first thorough assessment scale for corneal rupture suturing in the wet laboratory. Fisher et al. developed a phacoemulsification/wound construction and suturing technique assessment scale for ophthalmology residents, but the suturing technique assessment was only one part of the scale and contained 8 general items. The scale was simple and offered only 2 choices (not done/incorrect and done correctly). There was no behavioral or skill-based rubric for the observers to use when assessing the resident’s performance. Feldman et al. used a corneal laceration repair assessment to evaluate microsurgical skill improvement after training on a simulator. However, the assessment was entirely objective and measured only suture depth, bite size and suture spacing. In this study, we created a comprehensive, globally applicable assessment scale to evaluate the key components of corneal rupture suturing. This assessment scale breaks down into 15 essential items, including 6 measures of basic surgical skills and 9 measures of the stages of suturing, with basic skill measures similar to those employed in GRASIS and OSCAR. Moreover, the scale is rated on a 5-point Likert scale with behavioral anchors for each level in each step of the surgical procedure.
The reliability and repeatability of the assessment tools mentioned above have seldom been examined. In this study, we investigated the validity, reliability and repeatability of our assessment scale. For validity, we consulted 23 experts from different teaching and research offices; all comments were considered, and appropriate suggestions were incorporated into the assessment scale. Therefore, a level of face and content validity was established. For reliability across the entire group of 21 observers, the ICC values were higher than 0.8 (range 0.860–0.976) in all 15 individual categories as well as the overall score, indicating reliability of the tool as a whole. In addition, the assessment scale yielded very good repeatability, with ICC values ranging from 0.833 to 0.954. An assessment scale is considered to give almost perfect outcomes when the ICC value is 0.75 or above [13, 15, 22].
Drawbacks of the assessment scale are that it is relatively simple and that it cannot provide information about a resident’s judgment and handling of complications in real operations. However, it is a standardized tool that can be used to determine whether a resident is adequately prepared, in terms of basic microsurgical skills, to enter the operating room. The “passing” threshold could be set at a score of > 3 for each item on the 5-point Likert scale. In addition, the procedure in the wet laboratory can be standardized so that each resident is assessed under comparable circumstances, and ophthalmic educators can easily track residents’ improvement or adjust the complexity for residents at different rotation levels by changing the rupture (straight or “Y” shaped, with or without limbal involvement).
In this study, we aimed to create a standardized tool to assess basic surgical skills and to improve the overall process of early surgical education. In summary, the assessment scale we developed is valid and reliable. It is an analytical scoring system that contains observable and measurable components of surgical performance. It will help educators to reduce the subjectivity of the assessment and to tell residents clearly what is expected of them to attain competence. Hopefully, this tool will provide a structured template for other residency programs to assess their residents for basic surgical skills.
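The suggested passing rule (a score above 3 on every item) could be applied as in the following sketch. It is an illustration only: the item keys are hypothetical identifiers derived from the 15 measures named in the text, and the function name `meets_threshold` and its default threshold are assumptions, not part of the published scale.

```python
# Hypothetical keys for the 15 items named in the text.
ITEMS = [
    "preoperative_preparation", "microscope_use", "instrument_handling",
    "hands_coordination", "postoperative_clean_up", "overall_performance",
    "suturing", "suturing_order", "sutures_interval", "sutures_width",
    "sutures_depth", "knotting", "knots_tightness", "knots_exposure",
    "wound_leakage_and_anterior_chamber_formation",
]


def meets_threshold(ratings, threshold=3):
    """Return (passed, failed_items) for one completed scale.

    `ratings` maps each item key to a Likert score from 1 to 5.
    A resident passes only if every item scores strictly above `threshold`.
    """
    missing = [item for item in ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    failed = [item for item in ITEMS if ratings[item] <= threshold]
    return len(failed) == 0, failed


if __name__ == "__main__":
    # Hypothetical scale: every item rated 4 except knot tightness.
    example = {item: 4 for item in ITEMS}
    example["knots_tightness"] = 3
    print(meets_threshold(example))
```

Returning the list of items at or below the threshold, rather than only a pass or fail flag, keeps the feedback tied to specific steps of the procedure.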
GRASIS: Global rating assessment of skills in intraocular surgery
ICC: Intraclass correlation coefficient
OASIS: Objective assessment of skills in intraocular surgery
OSACSS: Objective structured assessment of cataract surgical skill
OSCAR: Ophthalmology surgical competency assessment rubric
Fisher JB, Binenbaum G, Tapino P, Volpe NJ. Development and face and content validity of an eye surgical skills assessment test for ophthalmology residents. Ophthalmology. 2006;113:2364–70.
Cremers SL, Ciolino JB, Ferrufino-Ponce ZK, Henderson BA. Objective assessment of skills in intraocular surgery (OASIS). Ophthalmology. 2005;112:1236–41.
Cremers SL, Lora AN, Ferrufino-Ponce ZK. Global rating assessment of skills in intraocular surgery (GRASIS). Ophthalmology. 2005;112:1655–60.
Feldman BH, Geist CE. Assessing residents in phacoemulsification. Ophthalmology. 2007;114:1586.
Saleh GM, Gauba V, Mitra A, Litwin AS, Chung AK, Benjamin L. Objective structured assessment of cataract surgical skill. Arch Ophthalmol. 2007;125:363–6.
Golnik KC, Beaver H, Gauba V, Lee AG, Mayorga E, Palis G, et al. Cataract surgical skill assessment. Ophthalmology. 2011;118:427.e1-5.
Golnik KC, Haripriya A, Beaver H, Gauba V, Lee AG, Mayorga E, et al. Cataract surgical skill assessment. Ophthalmology. 2011;118:2094–e2.
Kong GY, Henderson RH, Sandhu SS, Essex RW, Allen PJ, Campbell WG. Wound-related complications and clinical outcomes following open globe injury repair. Clin Exp Ophthalmol. 2015;43:508–13.
Scott DJ, Valentine RJ, Bergen PC, Rege RV, Laycock R, Tesfay ST, et al. Evaluating surgical competency with the American Board of Surgery in-Training Examination, skill testing, and intraoperative assessment. Surgery. 2000;128:613–22.
Moorthy K, Munz Y, Sarker SK, Darzi A. Objective assessment of technical skills in surgery. BMJ. 2003;327:1032–7.
Koch GG. Intraclass correlation coefficient. In: Kotz S, Johnson NL, editors. Encyclopedia of statistical sciences, vol. 4. New York: Wiley; 1982. p. 213–7.
Meyer JJ, Gokul A, Vellara HR, Prime Z, McGhee CN. Repeatability and agreement of Orbscan II, Pentacam HR, and Galilei tomography systems in corneas with keratoconus. Am J Ophthalmol. 2017;175:122–8.
Zaki R, Bulgiba A, Nordin N, Azina IN. A systematic review of statistical methods used to test for reliability of medical instruments measuring continuous variables. Iran J Basic Med Sci. 2013;16:803–7.
Cronbach LJ, Shavelson RJ. My current thoughts on coefficient alpha and successor procedures. Educ Psychol Meas. 2004;64:391–418.
Barraquer RI, Pinilla Cortés L, Allende MJ, Montenegro GA, Ivankovic B, D'Antin JC, et al. Validation of the nuclear cataract grading system BCN 10. Ophthalmic Res. 2017;57:247–51.
Thomsen AS, Subhi Y, Kiilgaard JF, la Cour M, Konge L. Update on simulation-based surgical training and assessment in ophthalmology: a systematic review. Ophthalmology. 2015;122:1111–30.e1.
Bourcier T, Chammas J, Becmeur PH, Sauer A, Gaucher D, Liverneaux P, et al. Robot-assisted simulated cataract surgery. J Cataract Refract Surg. 2017;43:552–7.
Thomsen AS, Bach-Holm D, Kjærbo H, Højgaard-Olsen K, Subhi Y, Saleh GM, et al. Operating room performance improves after proficiency-based virtual reality cataract surgery training. Ophthalmology. 2017;124:524–31.
Lee AG, Carter KD. Managing the new mandate in resident education: a blueprint for translating a national mandate into local compliance. Ophthalmology. 2004;111:1807–12.
Mills RP, Mannis MJ. American Board of Ophthalmology Program Directors’ task force on competencies. Report of the American Board of Ophthalmology Task Force on the competencies. Ophthalmology. 2004;111:1267–8.
Feldman BH, Ake JM, Geist CE. Virtual reality simulation. Ophthalmology. 2007;114:828.e1-4.
Dong J, Jia YD, Wu Q, Zhang S, Jia Y, Huang D, et al. Interchangeability and reliability of macular perfusion parameter measurements using optical coherence tomography angiography. Br J Ophthalmol. 2017;101:1542–9.
This work was supported by the National Natural Science Foundation of China (81600704), the Interdisciplinary Program of Shanghai Jiao Tong University (YG2015QN19), and the Shanghai Ophthalmology Practical Training Platform Construction Grant. The funders had no role in the design or conduct of this research.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
This study was approved by the Ethics Committee of Shanghai General Hospital. Written informed consent was obtained from all residents.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Zhang, Z., Zhou, M., Liu, K. et al. Development of a new valid and reliable microsurgical skill assessment scale for ophthalmology residents. BMC Ophthalmol 18, 68 (2018). https://doi.org/10.1186/s12886-018-0736-z
- Assessment scale
- Cornea suturing
- Medical education
- Microsurgical skill