Case Study: The Leadership Test (ID-Leadership).
Are participants who complete a psychometric test twice at an advantage?
Organizations regularly ask candidates to complete psychometric tests as part of their staffing processes. Successful completion of these tests is often a prerequisite for obtaining the position. Can having already completed a test as part of a personal development process favour a candidate? This is the question we have attempted to answer in this White Paper.
Psychometric tests are used for many reasons, including hiring, training and development, compensation, and promotion, or even demotion (EEOC, 1978; U.S. Department of Labor Employment and Training Administration, 1999). In fact, organizations can invest a great deal of time and money in the use of these tests. For example, the Centre for Economics and Business Research (CEBR) estimates that for a typical firm with 20 to 49 employees, it costs about $40,000 to hire one employee (CEBR, 2014). Organizations must therefore ensure that they assess candidates using reliable and valid tests. Candidates must also perceive the assessments positively, so that they do not regard the tests, and the process as a whole, as unfair or unethical (Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993).
At the same time, the Principles for the Validation and Use of Personnel Selection Procedures of the SIOP (2017) and the EEOC (1978) state that employers should provide opportunities for re-evaluation and re-examination of candidates whenever technically and administratively feasible. Therefore, one or more candidates may have already completed a test in a previous process. This raises the question of whether this situation benefits these individuals (Martin, 2014).
Several research studies have been conducted on retest situations in various jobs (Kelley, Jacobs, & Farr, 1994; Kolk, Born, & van der Flier, 2003) and, more specifically, in the field of law enforcement (Hausknecht, Trevor, & Farr, 2002; Maurer, Solamon, & Troxtel, 1998), in military contexts (Carretta, 1992), for firefighters (Dunlop, Morrison, & Cordery, 2011), and in school admissions processes (Lievens, Buyse, & Sackett, 2005; Lievens, Reeve, & Heggestad, 2007; Puddey, Mercer, Andrich, & Styles, 2014). Several types of tests have also been reviewed in the literature, including cognitive ability tests (Bartels, Wegrzyn, Wiedl, Ackermann, & Ehrenreich, 2010; Hausknecht, Halpert, Di Paolo, & Gerrard, 2007; Hausknecht et al., 2002; Kulik, Kulik, & Bangert, 1984; Lievens et al., 2005; Lievens et al., 2007), personality tests (Kelley et al., 1994; Walmsley & Sackett, 2013), knowledge and skill tests (Carretta, 1992; Dunlop et al., 2011; Lievens et al., 2005; Van Iddekinge, Morgeson, Schleicher, & Campion, 2011), assessment centers (Brummel, Rupp, & Spain, 2009; Kolk et al., 2003), and situational judgment tests (Lievens et al., 2005; Lievens & Sackett, 2007; Maurer et al., 1998), which are used in many contexts, including leadership assessment.
Intuitively, one would think that repeated administration of a test would tend to improve results.
Indeed, test-retest situations can decrease anxiety (Hausknecht et al., 2002; Maurer et al., 1998; Van Iddekinge et al., 2011), provide better knowledge of the test format and questions (Hausknecht et al., 2002; Maurer et al., 1998; Van Iddekinge et al., 2011), or increase candidate motivation (Carretta, 1992; Hausknecht, 2010; Van Iddekinge et al., 2011).
At the same time, one would think that the number of retest attempts would have an impact on the results.
However, studies show inconsistent results. Some research indicates improvements across administrations (Hausknecht et al., 2002), while other studies show that results improve from test 1 to test 2 but not from test 2 to test 3 (Bartels et al., 2010; Dunlop et al., 2011; Kelley et al., 1994; Puddey et al., 2014; Randall & Villado, 2017).
When it comes to measuring leadership, situational judgment tests (SJTs) are often favoured because they assess how a candidate would react to statements or scenarios in a work context (McDaniel & Nguyen, 2001). However, a recent study (Reichin, 2018) found that previous experience with an in-basket exercise or an SJT had no impact on the results obtained during a second assessment.
In general, therefore, it can be concluded that repeated administration of a situational judgment test has no significant impact on the results obtained. To verify this point, we conducted a study examining how ID-Leadership results evolve when the test is administered twice to the same candidates.
The ID-Leadership is a situational judgment test that evaluates participants’ leadership characteristics. The test can be used in competency development or staffing processes. If the test is used for development purposes, a generic report is produced.
This report provides general information on:
- Leadership style (transformational or transactional);
- The type of motivation driving the participant to exercise leadership (intrinsic or extrinsic);
- The type of approach preferred (results orientation vs. people orientation / focus on the internal team or on external stakeholders);
- The type of environment and employees with whom one feels most comfortable (stable or turbulent environment / employees with little experience or who have mastered their jobs).
The report that is provided after the test presents all the results associated with these elements in addition to providing candidates with avenues for development and reflection.
However, no information is presented concerning the specific competencies associated with leadership.
In the case where the test is used in staffing, a specific report presents the candidates’ results according to each of the following 17 competencies grouped into four categories:
- Leadership of People
  - Interpersonal Communication
  - Caring for others
  - Knowing how to work in a team
  - Build high-performance teams
  - People Orientation
- Leadership in Action
  - Action Orientation
  - Knowing how to adapt
  - Need to succeed
  - Initiative / Entrepreneurship
  - Creativity / Innovation
- Decisional Leadership
  - Knowing how to order
  - Knowing how to mobilize
  - Knowing how to organize / orchestrate
  - Knowing how to control
- Situational Leadership
  - Knowing how to transmit information
  - Sense of the environment
No information is provided regarding the styles, approaches, and preferences identified in the People Leadership Development Report.
To determine whether the information received when taking the ID-Leadership in development mode can have an impact on the results in staffing mode, we examined the ID-Leadership database to identify the people who completed the test twice. A total of 50 participants fell into this category. The characteristics of the participants were as follows:
- Gender: 60% male / 40% female
- Average length of time between the two administrations: 1.8 years (minimum: 6 months / maximum: 3.8 years)
- Order of presentation:
  - Administration #1: “Development” report
  - Administration #2: “Staffing” report
It should be noted that all participants who completed the test at “Time 1” not only read the “development” report but also received individual feedback from an advisor.
The purpose of this feedback was to explain the results to participants so that they could improve. This was therefore a situation in which participants had the maximum chance of improving their results when they moved to “Time 2”.
Figure #1 presents the average results obtained in the two administrations based on the information presented to participants in the “development” report. As can be seen, the variations observed are extremely small and not statistically significant.
Also, in some cases an increase in results is observed (e.g., decisional leadership / intrinsic motivation), while in other cases a decrease is observed (e.g., action leadership / people orientation). Statistically speaking, test-retest reliability varies between 0.46 and 0.67, with an average of 0.54 (significant at p < .01), a very high level of stability for components of a test.
Figure #2 presents the average results obtained in the two administrations based on the information presented to participants in the “staffing” report. As was the case with the previous information, the variations observed are extremely small and not significant. The same observation can be made for Figure 3, where the averages for the different categories are virtually identical from one administration to the next. Statistically speaking, test-retest reliability varies between 0.50 and 0.71, with a mean of 0.61 (significant at p < .01), a very high level of stability for components of a test.
The results obtained in this study confirm the observations made in the scientific and professional literature. Indeed, the stability of the results is extremely high for situational judgment tests.
In the case of ID-Leadership, a very high level of stability is observed even though participants have access to their test results and receive personalized feedback. Leadership therefore appears to be a very stable characteristic that varies little over time. The development of leadership skills would therefore take place over a medium- or long-term horizon (more than 2 years).
Furthermore, the results allow us to conclude that taking the test in “development” mode has little impact on the results of a second administration in “staffing” mode. The fluctuations observed are normal and reflect the fact that fundamental human characteristics vary slightly with time and circumstances.
Bartels, C., Wegrzyn, M., Wiedl, A., Ackermann, V., & Ehrenreich, H. (2010). Practice effects in healthy adults: A longitudinal study on frequent repetitive cognitive testing. BMC Neuroscience, 11(1), 118-130.
Brummel, B. J., Rupp, D. E., & Spain, S. M. (2009). Constructing parallel simulation exercises for assessment centers and other forms of behavioral assessment. Personnel Psychology, 62(1), 137-170.
Carretta, T. R. (1992). Short-term retest reliability of an experimental U.S. Air Force pilot candidate selection test battery. The International Journal of Aviation Psychology, 2(3), 161-173.
Centre for Economics and Business Research. (2014). Cost of small business employment. Retrieved from https://cebr.com/reports/cost-of-small-business-employment/
Dunlop, P. D., Morrison, D. L., & Cordery, J. L. (2011). Investigating retesting effects in a personnel selection context. International Journal of Selection & Assessment, 19(2), 217.
Equal Employment Opportunity Commission. (1978). The uniform guidelines on employee selection procedures. Retrieved from https://www.gpo.gov/fdsys/pkg/CFR-2016-title29-vol4/xml/CFR-2016-title29-vol4-part1607.xml
Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: implications for practice effects, training performance, and turnover. Journal of Applied Psychology, 87(2), 243-255.
Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Gerrard, M. M. (2007). Retesting in selection: A meta-analysis of coaching and practice effects for tests of cognitive ability. Journal of Applied Psychology, 92(2), 373-385.
Hausknecht, J. P. (2010). Candidate persistence and personality test practice effects: Implications for staffing system management. Personnel Psychology, 63(2), 299-324.
Kelley, P. L., Jacobs, R. R., & Farr, J. L. (1994). Effects of multiple administrations of the MMPI for employee screening. Personnel Psychology, 47(3), 575-592.
Kolk, N. J., Born, M. P., & van der Flier, H. (2003). The transparent assessment centre: The effects of revealing dimensions to candidates. Applied Psychology: An International Review, 52(4), 648-668.
Kulik, J. A., Kulik, C. C., & Bangert, R. L. (1984). Effects of practice on aptitude and achievement test scores. American Educational Research Journal, 21(2), 435-447.
Lievens, F., Buyse, T., & Sackett, P. R. (2005). Retest effects in operational selection settings: Development and test of a framework. Personnel Psychology, 58(4), 981-1007.
Lievens, F., Reeve, C. L., & Heggestad, E. D. (2007). An examination of psychometric bias due to retesting on cognitive ability tests in selection settings. Journal of Applied Psychology, 92(6), 1672-1682.
Martin, W. (2014). The problem with using personality tests for hiring. Retrieved from https://hbr.org/2014/08/the-problem-with-using-personality-tests-for-hiring
Maurer, T., Solamon, J., & Troxtel, D. (1998). Relationship of coaching with performance in situational employment interviews. Journal of Applied Psychology, 83(1), 128-136.
McDaniel, M., & Nguyen, N. (2001). Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9(1-2), 103-113.
Puddey, I. B., Mercer, A., Andrich, D., & Styles, I. (2014). Practice effects in medical school entrance testing with the undergraduate medicine and health sciences admission test (UMAT). BMC Medical Education, 14(1), 48-63.
Randall, J., & Villado, A. (2017). Take two: Sources and deterrents of score change in employment retesting. Human Resource Management Review, 27(3), 536-553.
Reichin, S.L. (2018). Investigating factors related to score change at retest: Examining promotional assessments. M.A. Thesis. Middle Tennessee State University. May 2018.
Smither, J. W., Reilly, R. R., Millsap, R. E., Pearlman, K., & Stoffey, R. W. (1993). Applicant reactions to selection procedures. Personnel Psychology, 46(1), 49-76.
Society for Industrial and Organizational Psychology. (2017). Principles for the validation and use of personnel selection procedures. Retrieved from http://www.siop.org/_principles/principles.pdf
U.S. Department of Labor Employment and Training Administration. (1999). Testing and assessment: An employer's guide to good practices. Washington, DC: Author.
Van Iddekinge, C. H., Morgeson, F. P., Schleicher, D. J., & Campion, M. A. (2011). Can I retake it? Exploring subgroup differences and criterion-related validity in promotion retesting. Journal of Applied Psychology, 96(5), 941-955.
Walmsley, P., & Sackett, P. (2013). Factors affecting potential personality retest improvement after initial failure. Human Performance, 26(5), 390-408.