DISPARATE IMPACT IN PROMOTIONAL TESTING
Nearly all written promotional examinations have a disparate impact or adverse impact on minorities. Abundant research indicates that blacks score an average of one standard deviation lower than whites on high-stakes cognitive tests. This disparity is attributed to a variety of reasons – none of them related to the knowledge, skills and abilities of firefighters with respect to each other or in the performance of the job.
Nor is such performance related to study habits, as so many cities and firefighters like to believe. It is easy to take courses to prepare you to study and “take multiple-guess exams” but how you score on such written exams is not a reflection of how well you will perform on the job – which is the legal requirement of what any promotional or hiring examination should test for.
Because all of us grow up through school taking various tests to pass grades and move on, we think we are used to them and that they are a proper fixture of everyday life. But fire fighting does not lend itself well to the written examination, which can never in a multiple-guess format capture the nuances of a situation and always has inherent ambiguities. The problem with written tests as a measure of fire fighting command capability was perhaps best put by a judge in the First Circuit Court of Appeals case of Boston Chapter NAACP v. Beecher:
- There is a difference between memorizing (or absorbing through past experience) the fire fighter terminology and being a good fire fighter. If the Boston Red Sox recruited players on the basis of their knowledge of baseball history and vocabulary, the team might acquire authorities … who could not bat, pitch, or catch.
Tests that merely reiterate old or simplistic job knowledge do not account for command ability, performance, and other better predictors, such as those seen in many assessment centers (which can range from real fire scenes to other scenarios designed to actually challenge the candidate and see him or her in action).
Lest We Forget: Promotional and Hiring Tests are High-Stakes
Testing has its own vernacular. Promotional exams are what is referred to as "high-stakes testing." "High-stakes testing" is a term ascribed to those tests or selection devices that will have a major impact on a person’s life, such as school admissions, hiring or promotion. "Cognitive tests" are tests that rely heavily on cognitive skills, e.g. reading comprehension. For example, multiple choice job knowledge or technical knowledge tests are cognitive tests. Another example is the SAT - which some major universities and colleges no longer use for student admissions because of its inherent deficiencies as a selection device.
In the promotion context, one of the biggest myths in testing is that cognitive tests select the most qualified candidates for promotion. They do not – particularly in a firefighter context. Job knowledge tests have little value in predicting how any candidate will actually perform in the job. In fact, at least one peer-reviewed study has demonstrated that lack of job knowledge is generally a minor reason for the failure of any executive on the job. One need look no further than the financial crisis and the American auto executives to see this in action. The massive failures we have witnessed in managing the economy or these industrial giants have little to do with how smart these guys all are. A pretty strong case can be made that the smartest guys in the room were nothing of the sort, but our guess is that they all did great on written tests at whatever business school they attended.
A further indication of this is that some jurisdictions have dispensed with written promotional tests altogether for their safety forces, or use them only as pass/fail threshold requirements and then combine them with other factors, such as performance, assessment centers, and the like.
Regardless, in firefighter testing one invariably sees the protestations of non-minorities – usually through unions – couched in terms of written tests selecting the "most qualified" and the claim that "those that study the hardest" do the best. Neither assertion is even close to being true. Since these tests are generally multiple-guess formats where the correct answer is always before the test taker, they are reduced to being nothing more than memorization and recognition exams, i.e., tests of cognitive skills. As all of you know from protests, such exams are often imperfect even at testing routine knowledge. In the book, Challenging the Myths of Fair Employment Practices, author Richard S. Barrett (who testified in the original seminal disparate impact cases and also helped author the EEOC guidelines on testing) states:
- Myth: The typical multiple-choice test gets at important information. Generally [in fire fighting], important information and the situation to which it is applied are too complex for the multiple-choice format. The result is a test that relies heavily on trivia. … It is easy to see if a person knows the dates of Columbus’ voyages, but multiple-choice tests cannot reflect the complexity of the times, the decisions that were made, and the pressures that led to his downfall. … [Such tests] do not reward – in fact, they punish – the applicant who carefully and fully considers problems before acting.
Notably, Dr. Barrett is an I/O psychologist who administers examinations for various fields. He does not approve of the multiple-choice format for dynamic areas such as fire fighting, finding there are other, better alternatives to determine who’s best prepared to handle life-and-death situations, deal with citizens in need, and deploy to disaster sites through FEMA, etc.
A non-firefighter given the same preparation material and time to study or training in “fire fighter exam-taking” might pass the typical job knowledge or technical knowledge written examination and rank high enough to be promoted – but obviously would not be a very effective officer. This underscores the fact that written examinations have little, if any, value in determining promotions.
It should also be noted that union officials and fire officers have little, if any, knowledge about testing. The same is true for most city personnel department employees. Test development is a specialized discipline that requires training and practical expertise to perform competently. This is why municipalities hire testing consultants, who are almost invariably industrial/organizational psychologists, with the occasional behavioral psychologist sprinkled in. We know this from numerous depositions of city and fire officials taken from around the country. Therefore, any opinions expressed by these officials regarding testing are usually nothing more than just that: opinions -- and the age-old adage about everybody having one, and thinking that everyone else’s but theirs stinks, is apropos. This is particularly true with some fire fighter unions that have tremendous political influence over many city administrations and tend to support multiple-guess tests and the status quo. It is these types of tests that continue to be accorded the most weight in most testing programs. There are some notable exceptions, such as Jefferson County, Ala., and Minneapolis, Minn., and others, which have learned and adapted their testing processes to reflect the needs of the communities that own the fire departments serving them.
Disparate Impact Analysis
Disparate impact arises in the context of a facially neutral policy or practice that has a disparate impact upon some group. In testing cases, the facially neutral policy is always the selection process, which generally includes some form of testing. There are some important aspects of disparate impact litigation that separate these cases from disparate treatment cases, which are the typical discrimination cases. Most notably, in disparate impact cases, the plaintiff is not required to prove intent, that is, that the employer intended to discriminate against him or her personally. It is the impact of the facially neutral policy that triggers liability, particularly where the employer continues to insist on using a bad testing method.
Adverse impact is measured by use of an “impact ratio.” And yes, it is math -- but do not glaze over. Disparate impact cases rely almost exclusively on statistics, but calculation of the impact ratio is not that difficult. Simply stated, the impact ratio compares the selection rate of a group with a lower selection rate to that of the group with the highest selection rate: the lower rate divided by the highest rate. Groups are generally defined demographically - i.e. race, gender, age - which in statistical parlance are known as subgroups.
The "Rule of Thumb" for disparate impact cases is “the 4/5s Rule” or the "80% Rule." (As an interesting historical aside, the term "rule of thumb" is often said to relate to the thickness of a switch a man could use to beat his wife, no bigger than the diameter of his thumb; that origin story is popular but disputed.) The rule is set forth in the Uniform Guidelines on Employee Selection Procedures (“UGES” – pronounced "you-gess"), which were promulgated by the EEOC in 1978. The UGES have been adopted by the USDOL, OFCCP, and all other federal agencies. The rule is this:
- If the selection rate of one subgroup is less than 80% - or 4/5 - of that of the group with the highest rate of selection, then there is adverse impact.
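The arithmetic behind the rule can be sketched in a few lines of Python. This is only an illustration: the function names, the group labels, and the candidate counts are hypothetical and are not drawn from any real examination or from the text of the guidelines.

```python
def selection_rate(selected, candidates):
    """Fraction of a group's candidates who were selected."""
    if candidates <= 0:
        raise ValueError("a group needs at least one candidate")
    return selected / candidates


def impact_ratios(groups):
    """Divide each group's selection rate by the highest group's rate.

    `groups` maps a group label to (number selected, number of candidates).
    """
    rates = {name: selection_rate(sel, total) for name, (sel, total) in groups.items()}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no one was selected, so no ratio can be computed")
    return {name: rate / highest for name, rate in rates.items()}


def below_four_fifths(groups, threshold=0.8):
    """Flag each group whose impact ratio falls under the 80% threshold."""
    return {name: ratio < threshold for name, ratio in impact_ratios(groups).items()}


# Hypothetical promotional list: (promoted, candidates) per group.
example = {"Group A": (30, 60), "Group B": (10, 40)}

ratios = impact_ratios(example)
flags = below_four_fifths(example)
for name in example:
    print(f"{name}: impact ratio {ratios[name]:.2f}, below 4/5: {flags[name]}")
```

In this made-up example Group A is selected at a rate of 50% and Group B at 25%, so Group B's impact ratio is 0.50, well under the 0.80 threshold.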
There can also be adverse impact, however, with impact ratios that are greater than 80% based upon statistical significance (2 standard deviations) – but for this type of statistical analysis to have any weight, sample sizes must be large enough (usually 120+) to be meaningful; smaller sample sizes are not as statistically reliable. Expert testimony is necessary for this aspect of determining adverse impact using statistics. Regardless, determination of the impact ratio is the starting point in any disparate impact case. This is the prima facie case that a plaintiff generally must prove. Then the employer who insists on using the test anyway for hiring or promotion must prove by a preponderance of the evidence that the tests as fashioned are a matter of business necessity.
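One common way to express a difference between two selection rates in "standard deviations" is a two-proportion z-test with a pooled standard error. The sketch below assumes that method purely for illustration; an expert in an actual case may well use a different test, and the function name and all of the counts are hypothetical.

```python
import math


def two_proportion_z(selected_a, total_a, selected_b, total_b):
    """Number of standard errors separating two groups' selection rates.

    Uses the pooled-proportion standard error (an assumed method).
    """
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    pooled = (selected_a + selected_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("selection rates are all 0% or all 100%")
    return (rate_a - rate_b) / std_err


# Two hypothetical lists with identical selection rates: 60% versus 50%.
impact_ratio = (250 / 500) / (600 / 1000)         # about 0.83, above the 80% line
z_large = two_proportion_z(600, 1000, 250, 500)   # 1,500 candidates in total
z_small = two_proportion_z(12, 20, 5, 10)         # 30 candidates in total

print(f"impact ratio: {impact_ratio:.2f}")
print(f"large sample: {z_large:.2f} standard deviations")
print(f"small sample: {z_small:.2f} standard deviations")
```

Both hypothetical lists have the same selection rates and the same impact ratio, which sits above 80%. Only the larger list produces a difference of more than two standard deviations, which is why sample size carries so much weight in this kind of analysis.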
This is when the exams themselves are scrutinized by experts and the court – if a case gets that far. Again, it is the employer’s burden (a heavy one) to prove that the tests are sufficiently job-related and sufficiently predict future successful performance in the particular position tested for. This is done by using one of a variety of validation strategies set forth by the UGES.
Assuming that the employer can prove the job-relatedness of its promotional examinations, the burden of proof then returns to the employee to show that there are alternative methods available that the employer failed to use that would have had a demonstrably less adverse impact. When many of the promotional and hiring cases were working their way through the courts 25-30 years ago, testing was still pretty much in its unruly adolescence and there were not clearly established alternative methods available. Therefore, it was difficult to prevail in those cases where the employer could establish the job-relatedness of its examinations.
It is generally from this pea-soup of psycho mumbo-jumbo that one hears the "most qualified" and "those who study the hardest get promoted" comments. Put another way: "if you do not know what you are talking about, say it loudly and repeat it often."
The purpose in any high-stakes testing is to develop a fair test that eliminates the noise of inherent defects, such as that which resides in written tests. In our experience, minority firefighters are neither unwilling nor afraid to compete; they are just weary of competing in a rigged game. Our country is well served by all firefighters, but most certainly those of color and women who have confronted so many obstacles to just being able to serve.
The ultimate question is, if an adverse impact exists, is such an impact based on impermissible qualities, such as race, age, or gender? What is the particular department situation and testing method - is it a written "job knowledge" test combined with an oral examination and an "assessment center"? This tends to be most typical. But test results will vary widely based on a host of inter-connected criteria.
For example, one must consider:
- The weight assigned to each portion of the examination;
- The degree of subjectivity involved in establishing and grading the examination;
- The skill with which the examination was constructed and applied to your particular department’s inner workings;
- The suitability of the underlying job analysis done by the tester;
- The care used in administering, grading, and computing the examination results;
- The department’s own directives to the tester and the tester's own internal goals (e.g., profit; time-saving measures leading to sloppiness; etc.);
- The method of choosing from among eligible candidates.
These are but a few factors that will affect the outcome and score of any promotional examination. Of course, there are many more in general and also many more unique to a departmental profile.
Although statistics are an integral part of proving the disparate impact case, the heart of it is the underlying methodology employed by the particular tester. This forms the bedrock (or mud, depending on the tester) of the required results of reliability (repeatability) and validity (whether the test measures what it claims to measure). Again, it is important to realize that you don’t have to prove intent with this type of litigation. Even facially neutral criteria can work together in such a way as to result in impermissible disparate impact discrimination – not just based on race, but also gender and, increasingly, age.
The Assessment Center: Is it Adequate?
Nearly all examinations for safety forces involve an assessment portion. Because such examinations vary so widely, this portion poses particular dangers for adequate and neutral testing and criteria. You need to assess the factors in the examinations, such as whether they are written, oral, or an actual simulation of a fire scene, etc. Potential bias can creep in through subjective grading, a lack of checks and balances, inexperienced assessors, an exam that does not properly gauge officer qualities, the format in which the test scenarios are presented, etc. And again, documentation is essential.
With the software and hardware technologies that exist and are always being developed and improved, the cost effectiveness of computerized simulations has made alternative measures much more accessible than in the past.
Many cities, including Cleveland in our experience, are so intent on cutting costs that they permit shoddy examination methods in this area and conduct what may not be “true” assessment centers (e.g., a paper and pencil “in-basket” computer-graded test), which will virtually always result in promotion of lesser-qualified individuals. Such assessment center examinations must therefore be very closely examined.
Getting Involved
Disparate impact litigation is complicated and not for the faint-hearted. But it’s a vital part of ensuring the most qualified individuals are not carved out from the rest of the group because of impermissible factors. We’re continually amazed at the prevalence of disparate impact in promotional testing; indeed many cities have ignored their own prior consent decrees with seeming impunity.
It takes courage to get involved - a willingness to be possibly shunned or harassed by others not in the protected group, for example. And it takes fortitude: you’ll likely be living with this litigation for some years to come. On the other hand, it’s better to take action, rather than continue to see the whittling down of your rights, loss of seniority, and lack of access to better opportunities in the department. It's just as important for the future, and for those to come after you, as it is to rectify past wrongdoings.
To get involved, talk to others in your situation. Gather all manuals - including SOPs and SOGs, policies, general orders, ordinances, etc. Assuming you're a public employee, conduct a public records request under your state’s "sunshine" laws. Compile as much information as you can. Virtually all states have sunshine or FOIA laws, and your public employer must obey the laws or be subject to a writ filed in court with attendant damages and attorney fees.
In short, don’t let the specter of disparate impact litigation intimidate you. With proper application and analysis by experienced lawyers and statistical/testing experts, you can go a long way toward righting decades of wrongs in the workplace and helping to ensure a secure future for yourself and others.