A Peer Review of

ErgoScience Research

© 2000 Valpar International Corporation
All rights reserved.

A careful review of research conducted on the ErgoScience Physical Work Performance Evaluation (PWPE), as well as of ErgoScience marketing materials, reveals significant flaws and shortcomings in the research, and questionable use of that research by ErgoScience for blatant and misleading self-promotion. These issues may not be readily apparent to the casual reader, but they are important to note when considering the purchase of this FCE system.

The study in question (the Article) is titled "Reliability and Validity of a Newly Developed Test of Physical Work Performance" by Deborah Lechner, et al., published in the Journal of Occupational Medicine, September 1994. The following highlights important facts about the Article and the suspect ways that ErgoScience uses it for sales promotion.

Peer Review

ErgoScience cites the Article extensively during sales presentations and in sales literature, heavily stressing that it is "peer-reviewed." This is done to an extent that makes one believe that peer review is an endorsement process that somehow bestows special status on whatever is peer-reviewed. This is nonsensical sales hype. Publication in a peer-reviewed journal is never, in any way, an endorsement of a product. Peer review is intended to ensure that articles:

    1. Comply with scientific research principles and methods
    2. Contribute to the larger body of knowledge on a given subject

It should also be noted that the concept of peer review has existed for hundreds of years and can take many forms. Publication of research in a peer-reviewed professional journal is just one of those forms. In addition, not all peer-reviewed journals have the same standards of scrutiny and review.

False Claims

ErgoScience asserts that "The PWPE is the only FCE that has been studied in it’s (sic) entirety for interrater reliability." This is false on two counts. First, other FCE systems have conducted interrater reliability studies in at least as much depth as ErgoScience. Second, only three of the PWPE's six sections were included in the reliability study; the endurance, balance, and coordination sections were excluded. So, in fact, interrater reliability was studied for only part of the PWPE.


Conclusions drawn from research should not extend beyond the elements formally studied. Anything else is pure conjecture. Competent researchers know this and are obliged by professional ethics to confine discussion of the results accordingly. The Article examined interrater reliability and validity only. ErgoScience marketing materials, however, reference the Article and claim that it "demonstrates that 2 day testing is not necessary to achieve reliability and validity in FCE (sic)." In order to make this determination, temporal stability (test-retest) reliability must be studied. It was not, so this claim is unsubstantiated.


Individuals were recruited into the study from a rheumatology clinic. These subjects were not randomly selected, and they were not injured workers. Arthritis is not a diagnosis representative of typical FCE client populations. For these reasons alone, the study findings do not pertain to injured workers at all and, in fact, cannot be generalized beyond the sample itself.

The Article says the subjects were either working at least 20 hours per week or not working at all because they had been declared medically unfit for work of any kind. Unfortunately, the authors did not say how many of the 50 in the sample fit into the non-working category, nor did they reveal how many of those who were working fell into each of the several DOL strength classifications. These omissions make it impossible for the reader to evaluate either the reliability or validity evidence discussed in the Article.

The authors do not state why they chose to include persons in the sample who were not working and who had been medically deemed unfit to work. The inclusion of these subjects does have the effect of increasing the magnitude of the statistics the researchers calculated. As Anastasi (1988) states, "The question of sample heterogeneity is relevant to the measure of validity…It will be recalled that, other things being equal, the wider the range of scores, the higher will be the correlation. This fact should be kept in mind when interpreting the validity coefficients."
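Anastasi's range-restriction point can be illustrated with a short simulation. The variable names and effect sizes below are purely illustrative (they are not taken from the study); the simulation simply shows that computing a correlation over a heterogeneous sample yields a larger coefficient than computing it over a restricted, homogeneous subgroup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a criterion (e.g., actual work capacity) and a test score
# that is moderately correlated with it. Hypothetical values only.
n = 10_000
capacity = rng.normal(0, 1, n)
test_score = 0.6 * capacity + 0.8 * rng.normal(0, 1, n)

def corr(x, y):
    """Pearson correlation between two samples."""
    return float(np.corrcoef(x, y)[0, 1])

# Correlation over the full, heterogeneous sample
r_full = corr(capacity, test_score)

# Correlation after restricting the sample to a narrow, homogeneous
# band of the criterion (range restriction)
mask = np.abs(capacity) < 0.5
r_restricted = corr(capacity[mask], test_score[mask])

print(f"full-range r:       {r_full:.2f}")
print(f"restricted-range r: {r_restricted:.2f}")
# Other things being equal, the wider the range of scores, the higher
# the correlation -- so including subjects far outside the population
# of interest inflates a validity coefficient.
```

Conversely, deliberately widening the sample, as the researchers did by including persons declared medically unfit for any work, pushes the coefficient upward without saying anything about the population the test is meant to serve.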

The Article states that "only 14-18% of those evaluated were found to be working above the level predicted by the PWPE." Consider, however, that the researchers chose persons for the sample who weren’t working at all, and who, in fact, had zero chance of performing jobs that made higher demands than the PWPE could have predicted. The authors did not share with their readers how many of the sample fit into that category, even though this information is essential to have in order to evaluate the 14-18% figures.

Shifting Statistics

The Article initially states (p. 1000) that it will investigate concurrent criterion-related validity. Midway through the Article (p. 1003), the authors suddenly switch to construct validity and then quickly shift to convergent validity. No explanation is given for the shift away from the design of the study as described at the beginning of the Article. Further, no discussion is provided to support the new directions. It is remarkable that the peer reviewers did not catch this egregious error but, as noted earlier, not all peer-reviewed journals have the same standards of scrutiny and review.

Validity Conclusions

The statistically derived coefficients presented for the PWPE (.41 and .55) were criterion-related validity correlations. A coefficient of .41 means that the PWPE is only about 9% better than random chance at assigning subjects to the proper job classification. A coefficient of .55 translates to about 16% better than random chance. Far from demonstrating the validity of the PWPE, those coefficients are inadequate for clinical work, particularly if there is a possibility of having to defend assessment results in court.
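The "percent better than chance" figures correspond to the classical index of forecasting efficiency, which expresses the reduction in prediction error achieved by a test with validity coefficient r. Assuming that is the statistic intended, the arithmetic works out as follows:

```latex
% Index of forecasting efficiency for a validity coefficient r
E = 1 - \sqrt{1 - r^{2}}

% r = .41:  E = 1 - \sqrt{1 - .1681} = 1 - .912 \approx .09  (about 9\%)
% r = .55:  E = 1 - \sqrt{1 - .3025} = 1 - .835 \approx .16  (about 16\%)
```

In other words, even the larger of the two coefficients reduces prediction error by roughly one-sixth compared with guessing.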

Daubert v. Merrell Dow Pharmaceuticals

ErgoScience frequently aligns the PWPE with a 1993 U.S. Supreme Court decision, Daubert v. Merrell Dow Pharmaceuticals. For example, in a marketing packet sent to a prospective customer, ErgoScience states, "The PWPE is the only FCE that meets the standard set forth by the Supreme Court." This is patently false because there is no such Supreme Court standard. The ruling in question has nothing whatsoever to do with FCE results as they are typically used, even in court. In addition, the PWPE's marginal correlation coefficients and the inability to generalize from the study sample to the working population would not likely stand up to close scrutiny in any court.


When one takes the time to thoroughly examine the 1994 study, it is evident that it provides only minimal support for the PWPE. Comparing the actual study results to the marketing claims reveals a fantastic level of hype and a few outright misrepresentations in ErgoScience sales materials and presentations.

One has to wonder about the company that stands behind the PWPE as much as one has to wonder about the research that will support the PWPE in practice and in court.


Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.

Lechner, D.E., Jackson, J.R., Roth, D.L., & Straaton, K.V. (1994). Reliability and validity of a newly developed test of physical work performance. Journal of Occupational Medicine, 36, 997-1004.

