Constant-Murley Score: Systematic Review and Standardized Evaluation in Different Shoulder Pathologies
https://doi.org/10.1007/s11136-018-1875-7
REVIEW
Abstract
Purpose The objective of this study was to evaluate the psychometric properties of the Constant–Murley Score (CMS) in
various shoulder pathologies, based on a systematic review and expert standardized evaluations.
Methods A systematic review was performed in MEDLINE and EMBASE databases. Titles and abstracts were reviewed and
finally the included articles were grouped according to patients’ pathologies. Two expert evaluators independently assessed
the CMS properties of reliability, validity, responsiveness to change, interpretability and burden score in each group, using
the EMPRO (Evaluating Measures of Patient Reported Outcomes) tool. The CMS properties were assessed per attribute and
overall for each considered group. Only the concept and measurement model was assessed globally.
Results Five individual pathologies (i.e. subacromial, fractures, arthritis, instability and frozen shoulder) and two additional groups (i.e. various pathologies and healthy subjects) were considered. Overall EMPRO scores ranged from 58.6 points for subacromial pathology to 30.6 points for instability. Responsiveness to change was the only quality to obtain at least 50 points across all groups, except for frozen shoulder. Insufficient information was obtained in relation to the concept and measurement model and great variability was seen in the other evaluated attributes.
Conclusions The current evidence does not support the CMS as a gold standard in shoulder evaluation. Its use is advisable for subacromial pathology, but data are inconclusive for other shoulder conditions. Prospective studies exploring the psychometric properties of the scale, particularly for fractures, arthritis, instability and frozen shoulder, are needed.
Level of evidence Systematic review.
Keywords Constant–Murley score · Systematic review · Shoulder pathologies · EMPRO tool · Standardized evaluation ·
Psychometric properties
2218 Quality of Life Research (2018) 27:2217–2226
pathologies [5, 7, 8]; differences according to age and sex have been observed [9, 10] and lack of standardization in measuring the strength component has been criticized [11, 12].

In an attempt to clarify certain aspects related to its administration, the original author published an article with modifications and guidelines for the instrument’s use in 2008 [13]. A visual analog scale (VAS) was suggested for the pain item and part of the ADL questions, and specific instructions on how to evaluate the strength component were presented. It was also stated that the CMS is not valid for evaluating episodic severe pain, as in dislocation. Finally, a score modification, adjusting for age and sex, was proposed [13].

The psychometric properties of the CMS questionnaire have been the subject of literature reviews [3, 4], general systematic reviews [14] and reviews on specific shoulder pathologies [15, 16]. However, to date, no standardized evaluation of its properties in various shoulder diagnoses has been presented.

The evaluating measures of patient reported outcomes (EMPRO) tool was created for evaluating the psychometric properties of patient reported outcomes (PRO) [17]. This tool is composed of a broad spectrum of questions and specific recommendations on how each property should be assessed. It requires the involvement of expert evaluators and offers standardized and comparable results. It assesses the concept and measurement model of a scale as well as the attributes of reliability, validity and responsiveness to change, among others; and it has been previously used in the evaluation of different PRO scales [18–20].

The purpose of the current study was to perform a systematic literature review and a standardized evaluation of the CMS properties. The evidence was grouped according to the type of shoulder diagnosis. Subacromial, fractures, arthritis, instability and frozen shoulder pathologies were assessed, while data on various pathologies and healthy subjects were also evaluated. The current results will offer clinicians and researchers more insight on the CMS psychometric properties, allowing for the latter to be compared between different diagnostic groups. To the best of our knowledge, this is the first time that a CMS evaluation with these characteristics has been performed.

Materials and methods

Literature review

Systematic searches were conducted in MEDLINE and EMBASE databases for the period between January 1st 1986 and May 2nd 2014. For specific strategies see Online Appendix 1.

Articles presenting information on the development process, the psychometric properties and the administration of the CMS tool were eligible for inclusion. Articles written in English, Spanish, French, German and Italian were included in the evaluation stage. Opinion letters, congress abstracts, study protocols, case studies, articles on animal and cadaveric studies presenting information on surgical or other techniques applicable to shoulder pathologies were excluded.

Titles, abstracts and full texts were independently reviewed by two investigators (KV & MA) in a three-step process. A third researcher (YP) was appointed to resolve possible discrepancies if needed. In order to complete the search, the reference lists of all finally selected articles were also hand searched. General shoulder review articles were not given to the evaluators, but were read and their references hand searched by the previous two authors. Review articles on specific shoulder pathologies were not evaluated per se, but were given to the evaluators for consideration and possible identification of further references on relevant data. Patient pathologies of all included articles were noted and were subsequently grouped according to their characteristics. The grouping criteria were established by one of the co-authors (RC: orthopaedic surgeon experienced in upper extremities), considering the main shoulder pathologies, in line with the indications of the American Academy of Orthopaedic Surgeons (AAOS).

The CMS scale

The CMS is a multi-item functional scale assessing pain, ADL, ROM and strength of the affected shoulder. Its score ranges from 0 to 100 points, representing worst and best shoulder function, respectively.

In the original publication, the pain experienced during normal activities of daily living was scored as: no pain = 15 points, mild = 10, moderate = 5 and severe = 0 points [1]. The most recent publication recommends these options to be replaced by a VAS, maintaining the 15 points score range [13].

The ADL component is assigned a maximum of 20 points and evaluates limitations in doing normal work, recreational activities, unaffected night sleep and positioning the arm up to a certain level. The first two items were originally scored as: no limitation = 4, moderate = 2 and severe = 0 points [1]. In the latest publication a VAS was suggested for both questions [13], while the score range of the other two would remain the same. Night sleep is assessed as: unaffected = 2, sometimes disturbed = 1, always disturbed = 0 points. And finally arm positioning: up to waist = 2, xiphoid = 4, neck = 6, head = 8, above head = 10 points.

The ROM part evaluates four active ranges of motion, receiving 10 points each, i.e. pain-free forward and lateral elevation, external and internal rotation. Elevation degrees
are measured with a goniometer in a seated position and scores range from: 0°–30° = 0 to 151°–180° = 10 points. External rotation is based on five unassisted hand manoeuvers, assigned 2 points each: hand behind head with elbow forward, hand behind head with elbow back, hand on top of head with elbow forward, hand on top of head with elbow back and full elevation. Internal rotation was initially measured with the dorsum of the hand pointing to certain parts of the body, but in the most recent publication, the thumb was suggested as a pointer to the following anatomic landmarks: lateral thigh = 0, buttock = 2, lumbosacral junction = 4, waist = 6, 12th dorsal vertebra = 8 and interscapular region = 10 points.

The strength component is given 25 points. Originally, the use of an unsecured cable tensiometer or spring balance was instructed and scoring was based on the number of pounds of pull that a subject could resist, in up to a maximum of 90° of abduction [1]. In the updated recommendations, this is done at 90° of abduction, with the hand facing downward, using either a dynamometer or a defined spring balance technique. The maximum value of three consecutive repetitions should be used. When the desired abduction cannot be reached, the subject is given 0 points [13].

Given the importance that age and sex have in the functional capacity of the shoulder, an alternative CMS scoring, adjusting for these two variables, was also proposed. Based on values derived from 900 healthy subjects, the relative CMS is calculated as the original CMS divided by the respective age and sex-matched healthy values [13].

The EMPRO tool

The EMPRO is a standardized scale, designed to evaluate the psychometric properties of PRO questionnaires, based on published evidence [17]. It is composed of 39 items divided into 8 attributes: concept and measurement model, reliability, validity, responsiveness to change, interpretability, respondent and administrative burden, alternative modes of administration and finally cross-cultural and linguistic adaptations. Each item is accompanied by specific instructions, is rated on a 4-point Likert scale from 1 (strongly disagree) to 4 (strongly agree) and includes a “no information” option. Five items have an additional “not applicable” option. The EMPRO is a reliable and valid tool and has been used in the evaluation of condition specific and generic PRO instruments [17, 19–22]. Eleven shoulder PRO scales have also been evaluated with this tool [18].

The articles corresponding to each diagnostic group were rated independently, via EMPRO, by 2 evaluators with expertise in PRO. Most evaluators belonged to the EMPRO development working group or had undergone an EMPRO training course. All evaluators reviewed the corresponding full text articles, filled in the assessment tool and were subsequently given access to the evaluation of their pair. Discrepancies were discussed and a final consensus was reached in all cases.

EMPRO scores

An attribute and an overall score were derived per pathology. Attribute scores were the response mean of all replies when at least 50% of the attribute items were rated; otherwise no score was given. Items with “no information” were assigned 1 point (lowest possible), while “not applicable” items were assigned the mean value of the rest of the attribute items, excluding the “no information” ones. Mean responses were linearly transformed to a 100-point scale, with higher values suggesting better properties; scores of 50 or more points are considered to be acceptable [18]. Two sub-scores are estimated for the attributes of reliability (i.e. internal consistency and reproducibility) and burden (i.e. respondent and administrative burden), with the higher of the two being the attribute’s global score. The burden scores are presented separately and do not affect any further calculations. The overall EMPRO score was based on the rating of 5 attributes: concept and measurement model, reliability, validity, responsiveness and interpretability. This score was calculated if at least 3 of those 5 attributes had a rating, and attributes with insufficient information were given 0 points. The rating algorithm was run in SPSS version 23 (SPSS, Chicago, IL, USA).

For the needs of this study, the attribute of concept and measurement model was evaluated only once, by two of the authors (KV & MA), also participating in the evaluation process. It was not deemed necessary for all reviewers to repeat this evaluation, given that the same published information would have to be evaluated by all. The score of this attribute entered in the final EMPRO score of all considered pathology groups.

It was hypothesized that respondent burden would vary according to pathology; while aspects like “time required” and “training and expertise needed”, assessed as part of the administrative burden, may also depend upon it. For these reasons, the two burden attributes were evaluated per pathology group. Likewise, the alternative forms of administration attribute was also evaluated per pathology. None of these three attributes is included in the final EMPRO scores.

Results

The systematic literature search identified 3337 unique titles, of those 2594 were excluded, for not being related to the studied topic. A total of 743 abstracts were reviewed, of
which 624 were excluded, mainly for not mentioning CMS use (68) or not reporting data on CMS properties (495). The rest were excluded for being secondary research articles, case studies, study protocols, commentaries, animal and cadaveric studies and non-shoulder-related studies. Finally, at the full text revision phase, 24 articles were additionally excluded for not fulfilling the inclusion criteria. One article was identified by hand search. Thus, a total of 96 full text articles were considered at the EMPRO evaluation phase (Fig. 1).

The included articles were subsequently divided into five individual pathology groups, named: subacromial pathology, fractures, arthritis, instability and frozen shoulder. Studies presenting data on heterogeneous shoulder pathologies (various pathologies) and studies on healthy subjects were also evaluated. Information on the exact pathologies considered and the number of finally included articles per group is presented in Table 1.
Fig. 1 PRISMA flowchart with numbers of included and excluded articles at each step of the systematic literature review
Identification: initial search in MEDLINE indexed publications yielded 2,845 citations; initial search in EMBASE indexed publications yielded 2,531 citations; 2,039 duplicates were removed
Screening: 3,337 titles were reviewed and 2,594 articles were excluded; 743 abstracts were reviewed and 624 articles were excluded
1 article identified by hand search
96 articles were included in the review
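The counts reported in the text and in Fig. 1 can be cross-checked with simple arithmetic. The Python sketch below is only an illustration: the function name `prisma_flow` and its parameter names are hypothetical, and the intermediate count of full texts assessed is derived here rather than stated in the article.

```python
def prisma_flow(citations, duplicates, titles_excluded, abstracts_excluded,
                full_texts_excluded, hand_searched):
    """Return the number of articles remaining after each selection step."""
    titles = sum(citations) - duplicates          # unique titles after de-duplication
    abstracts = titles - titles_excluded          # titles passing the first screen
    full_texts = abstracts - abstracts_excluded   # derived, not reported in the text
    included = full_texts - full_texts_excluded + hand_searched
    return {"titles": titles, "abstracts": abstracts,
            "full_texts": full_texts, "included": included}


# Counts taken from the Results text and Fig. 1.
flow = prisma_flow(citations=[2845, 2531], duplicates=2039,
                   titles_excluded=2594, abstracts_excluded=624,
                   full_texts_excluded=24, hand_searched=1)
print(flow)
# {'titles': 3337, 'abstracts': 743, 'full_texts': 119, 'included': 96}
```

The derived totals agree with the 3337 titles, 743 abstracts and 96 included articles reported by the authors.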
Each pair of evaluators reviewed between 1 (i.e. frozen shoulder) and 37 (i.e. subacromial pathology) published articles. Articles presenting elaborate data on more than one pathology were additionally given to the corresponding pathology group evaluators. The subacromial pathology evaluators also assessed the concept and measurement model attribute based on the two publications written by the original CMS author. The list of all considered publications is presented in Online Appendix 2.

The total EMPRO scores of the individual pathology groups ranged from a maximum of 58.6 points for subacromial pathology to a minimum of 30.6 points for instability (Table 2). The subacromial group was the only one to surpass the threshold of 50 total points. Fractures and arthritis obtained 43.5 and 41.7 points, respectively. Various pathologies and healthy subjects were assigned 49.3 and 37.9 points, respectively. Information on CMS properties in frozen shoulder was insufficient. For this reason, neither attribute scores (except for the concept and measurement model) nor total EMPRO scores were derived.

Internal consistency scores were low and calculated only for the subacromial and various pathologies groups, which obtained 25 and 37.5 points, respectively. On the other hand, reproducibility scores were noticeably higher. Over 50 points were given to subacromial and fracture groups. Arthritis was assigned 33.3 points and instability obtained the lowest possible EMPRO score of 0 points. Both various pathologies and healthy subjects had values > 50 in this attribute. Lack of item response theory (IRT) information penalized reproducibility evaluations.

Validity scores of the five individual pathology groups oscillated between 44.4 points for subacromial diagnoses and 31.3 points for fractures, while arthritis and instability both obtained 41.7 points. Various pathologies and healthy subjects were assigned 53.3 and 55.6 points, respectively.

Responsiveness to change was overall the best evaluated property, with all obtained scores being between 83.3 and 50 points. The highest score among the five individual pathologies was obtained by the subacromial group, followed by arthritis, fractures and instability. In addition, various pathologies obtained 55.6 points, while no score was calculated for healthy subjects. Responsiveness to change is not usually evaluated in healthy subjects, as no change is expected in this group. This property was excluded when calculating the healthy subjects total score.

As far as the attribute of interpretability was concerned, within the individual pathology groups, the highest score was 61.1 points for subacromial pathology. Instability presented the lowest value of 27.8 points and the other two groups obtained 44.4 points each. Among the additional groups, various pathologies received 50 points and no score was calculated for healthy subjects.

In relation to the respondent and administrative burden scores, the subacromial group obtained ≥ 50 in both attributes. Arthritis obtained 33.3 and 83.3 points, whereas for instability both values were < 21 points. Various pathologies reached 44.4 points for respondent and 75 points for administrative burden, while no scores were obtained for the healthy subjects group.

The attribute of alternative forms of administration was evaluated only for various pathologies, reaching 33.3 points. This evaluation was based on an article presenting a totally self-administered CMS tool. Based on a series of explicit instructions and photos, subjects are guided on how to reply
Table 1

| Group | Pathologies considered | Articles |
|---|---|---|
| Individual pathologies | | |
| Subacromial pathology | Impingement syndrome; rotator cuff deficiencies; bursitis; tendinitis, tendinosis of the shoulder; calcific tendinitis of the shoulder | 37 |
| Fractures | Proximal humeral fractures | 7 |
| Arthritis | Glenohumeral osteoarthritis; rheumatoid arthritis; degenerative shoulder joint disease; avascular necrosis of the humeral head | 6 |
| Instability | Traumatic or non-traumatic shoulder instability; recurrent luxation; recurrent dislocation | 5 |
| Frozen shoulder | Frozen shoulder; adhesive capsulitis | 1 |
| Additional groups | | |
| Various pathologies | Various pathologies; shoulder pain | 29 |
| Healthy subjects | No shoulder pathology; healthy individuals | 9 |
| Concept & measurement model (a) | | 2 |
| Total | | 96 |

(a) The concept and measurement model was evaluated globally for the CMS scale
Table 2 Item, attribute and total EMPRO scores for all considered pathology groups

Individual pathology groups: subacromial pathology, fractures, arthritis, instability, frozen shoulder. Additional groups: various pathologies, healthy subjects.

| Attribute | Subacromial pathology | Fractures | Arthritis | Instability | Frozen shoulder | Various pathologies | Healthy subjects‡ |
|---|---|---|---|---|---|---|---|
| Concept and measurement model¥ | 33.3 | 33.3 | 33.3 | 33.3 | 33.3 | 33.3 | 33.3 |
| Reliability: global score | 70.8 | 58.3 | 33.3 | 0 | | 54.2 | 62.5 |
| Reliability: internal consistency | 25 | | | | | 37.5 | |
| Data collection methods described | +++ | – | – | – | – | +++ | – |
| Cronbach alpha adequate | ++ | – | – | – | – | +++ | – |
| IRT estimates provided | – | – | – | – | – | – | – |
| Testing in different populations | – | NA | – | – | – | ++ | – |
| Reliability: reproducibility | 70.8 | 58.3 | 33.3 | 0 | | 54.2 | 62.5 |
| Data collection methods described | ++++ | +++ | ++ | + | – | +++ | ++++ |
| Test–retest and time interval adequate | ++++ | +++ | +++ | + | – | ++++ | ++++ |
| Reproducibility coefficients adequate | ++++ | ++++ | ++ | + | – | +++ | +++ |
| IRT estimates provided | – | – | – | – | – | – | – |
| Validity | 44.4 | 31.3 | 41.7 | 41.7 | | 53.3 | 55.6 |
| Content validity adequate | – | + | + | – | – | + | – |
| Construct/criterion validity adequate | +++ | +++ | +++ | ++ | – | ++++ | +++ |
| Sample composition described | +++ | ++ | ++++ | ++ | – | ++++ | +++ |
| Prior hypothesis stated | +++ | +++ | +++ | +++ | + | +++ | +++ |
| Rationale for criterion validity | NA | NA | NA | NA | NA | NA | NA |
| Tested in different populations | – | – | – | +++ | – | ++ | ++++ |
| Responsiveness to change | 83.3 | 50 | 55.6 | 50 | | 55.6 | |
| Adequacy of methods | +++ | ++++ | +++ | ++++ | + | +++ | – |
| Description of estimated change magnitude | ++++ | +++ | ++++ | +++ | – | +++ | – |
| Comparison of stable and unstable groups | ++++ | ++ | ++ | + | – | ++ | – |
| Interpretability | 61.1 | 44.4 | 44.4 | 27.8 | | 50 | |
| Rationale of external criteria | ++++ | +++ | +++ | ++ | – | +++ | ++++ |
| Description of interpretation strategies | +++ | ++ | +++ | ++ | – | ++ | – |
| How data should be reported stated | ++ | ++ | – | ++ | – | +++ | – |
| Total EMPRO score | 58.6 | 43.5 | 41.7 | 30.6 | | 49.3 | 37.9 |
| Burden score | | | | | | | |
| Burden I: respondent | 61.1 | | 33.3 | 16.7 | | 44.4 | |
| Skills and time needed | +++ | – | +++ | ++ | – | +++ | – |
| Impact on respondents | ++ | + | ++ | ++ | – | + | ++ |
| Not suitable circumstances | ++++ | – | – | + | – | +++ | – |
| Burden II: administrative | 50 | | 83.3 | 20.8 | | 75.0 | |
| Resources required | + | +++ | ++++ | + | – | +++ | – |
| Time required | +++ | – | +++ | + | – | ++++ | – |
| Training and expertise needed | ++ | – | +++ | + | – | +++ | – |
| Burden of score calculation | ++++ | – | ++++ | ++++ | – | +++ | + |
| Alternative forms of administration | | | | | | 33.3 | |
| Metric characteristics of alternative forms | – | – | – | – | – | ++ | – |
| Comparability of alternative forms | – | – | – | – | – | ++ | – |

Scores range from strongly agree (++++) to strongly disagree (+) and no information (–), not applicable (NA). IRT item response theory. For all pathology groups, the overall EMPRO scores include the Concept and measurement model score of 33.3 points.
¥ The items of concept and measurement model attribute were evaluated globally as follows: Concept of measurement stated (++++); Obtaining and combining items described (+); Rationality for dimensionality and scales (+); Involvement of target population (–); Scale variability described and adequate (+++); Level of measurement described (++); Procedures for deriving scores (++)
‡ Healthy subjects score was based on four attributes, excluding Responsiveness to change
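As a rough illustration of the scoring rules stated in the Materials and methods, the Python sketch below recomputes a few Table 2 entries. It rests on assumptions that are not taken from the article: the function names are hypothetical, the symbols + to ++++ are encoded as 1 to 4, "rated" is read as "given a Likert rating", and the linear transformation is taken to be (mean − 1) / 3 × 100. The authors ran their own algorithm in SPSS, and under these assumptions some entries (for example the subacromial reproducibility score) do not reproduce exactly from the consensus symbols shown.

```python
from statistics import mean

NO_INFO = None   # "no information" (–)
NA = "NA"        # "not applicable"


def attribute_score(items):
    """Return a 0-100 attribute score, or None if fewer than 50% of items are rated."""
    rated = [x for x in items if x is not NO_INFO and x != NA]
    if len(rated) < 0.5 * len(items):
        return None
    rated_mean = mean(rated)
    # "no information" counts as 1; "not applicable" takes the mean of the rated items.
    values = [1 if x is NO_INFO else rated_mean if x == NA else x for x in items]
    return (mean(values) - 1) / 3 * 100


def overall_score(attribute_scores):
    """Mean of attribute scores; unscored attributes count as 0; needs at least 3 scored."""
    if sum(s is not None for s in attribute_scores) < 3:
        return None
    return mean(0 if s is None else s for s in attribute_scores)


# Concept and measurement model items (footnote ¥ of Table 2).
concept_model = attribute_score([4, 1, 1, NO_INFO, 3, 2, 2])
# Validity items for subacromial pathology (Table 2).
validity_subacromial = attribute_score([NO_INFO, 3, 3, 3, NA, NO_INFO])
# Internal consistency items for fractures: nothing rated, so no score.
internal_consistency_fractures = attribute_score([NO_INFO, NO_INFO, NO_INFO, NA])

print(round(concept_model, 1))          # 33.3
print(round(validity_subacromial, 1))   # 44.4
print(internal_consistency_fractures)   # None

# Overall scores from the five attribute scores reported in Table 2.
total_subacromial = overall_score([33.3, 70.8, 44.4, 83.3, 61.1])
# Healthy subjects: four attributes, interpretability unscored and counted as 0.
total_healthy = overall_score([33.3, 62.5, 55.6, None])
```

With these inputs the two overall scores come out close to the 58.6 and 37.9 points reported for subacromial pathology and healthy subjects.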
the ROM and strength parts of the scale, originally designed for the clinicians [23]. The cultural adaptation attribute was not evaluated in this study. Information based on culturally adapted CMS versions was not assessed separately; it was considered part of the standardized evaluation. This approach has also been followed in previous articles [18].

Discussion

The CMS scale has been accepted and widely used, without ever being properly validated [2, 4, 13, 14]. In the current study the psychometric properties of the CMS were assessed, in seven pathology groups, by expert evaluators using the EMPRO tool. In general, assigned scores were low. Subacromial and various pathologies obtained the best overall evaluations, but only the first group’s total EMPRO score was considered acceptable. Healthy subjects presented higher attribute scores, compared to most individual pathologies. This was due to the fact that most of the respective publications were evaluating at least one CMS property. They were thus more likely to adequately analyse and report the corresponding information. Lack of interpretability data penalized this group’s total score. In shoulder fractures, reproducibility surpassed the desired threshold and responsiveness to change was borderline, but none of the other attributes were regarded as adequate. Others have argued that the current evidence does not really support the broad CMS use in this kind of patients [15, 24]. Responsiveness to change and administrative burden were the only two attributes with acceptable estimations for arthritis. Administering the CMS in rheumatoid arthritis patients has also been criticized, mainly due to the difficulty of an accurate strength component registration [25]. Shoulder instability can lead to luxation episodes and severe pain, reducing overall function, but these characteristics are not constantly present, which is why the CMS cannot properly assess this particular condition. This has been previously addressed and also accepted by the original CMS author [5, 13, 26, 27]. The current study corroborates this already known fact. Expert evaluations did not indicate good CMS properties for frozen shoulder. It is worth mentioning that one of the purposes of the single article assessed in that group was studying the CMS drawbacks when administered to frozen shoulder patients [7].

As far as the different psychometric properties are concerned, responsiveness to change and reliability are of major importance to the clinicians [28–30]. Instruments capable of capturing changes over time, and free of random error, are of great relevance. In the current study, responsiveness to change was the best evaluated quality, with ≥ 50 points across all groups (except frozen shoulder). Among them, better evidence was obtained for subacromial pathology. Further information, especially in relation to comparing stable and unstable patients, would have been desirable in the other groups.

Reliability was overall the second best scored quality, with reproducibility being more frequently and adequately presented than internal consistency. Cronbach alphas were > 0.60, but a value of 0.37 was also seen [31]. Scarce information was surprising, as internal consistency is one of the commonest reported scale properties [28]. Given that many perceive the CMS as a gold standard, it may be that internal consistency is not of concern to them. On the other hand, it may also reflect selective reporting. A recently published study, on patients with humeral fractures, concluded insufficient evidence as far as the CMS internal consistency was concerned [32].

The available evidence supports scale reproducibility for subacromial pathology and healthy subjects. Fractures and various pathologies also scored over the established threshold but information, particularly on data collection methods, was not sufficient. A previous systematic review reported the CMS reproducibility to be acceptable in different shoulder conditions [14]. However, some of the corresponding estimations were based on Spearman’s correlation coefficients. This statistic does not capture systematic score differences and cannot be considered an appropriate reproducibility measure [33]. Intraclass correlation coefficients or the 95% limits of agreement are more adequate methods for evaluating this property [17, 34].

Validity, the degree to which an instrument measures what it is supposed to measure [30], was acceptable only for various pathologies and healthy subjects. Within this attribute, content validity was the worst evaluated aspect in all groups. On the other hand, prior hypothesis related to convergent and known group validity received the same score (+++) across all shoulder diagnoses.

Interpretability, the degree to which a scale’s score can be assigned an easily understood meaning [30], was acceptable for the subacromial group and on the threshold for various pathologies. The concept and measurement model attribute highlighted that data on the CMS development are scarce. Even though the concept of measurement has been clearly stated and scale variability properly described, no information related to a target population has been presented. Rationale for item selection and scale components is insufficient, whereas the level of measurement and justification of score derivation are not properly explained. Similar observations have been made by previous authors [2–4].

Finally, great variability was observed in the burden attributes. Subacromial pathology was the only one with an acceptable respondent burden. This group, along with various pathologies and arthritis, also obtained the highest scores in administrative burden.

At this point, it is relevant to mention that subacromial pathology was the most frequent shoulder condition in the
various articles regarding pathologies. It is thus likely that the evaluations of this very group may have been affected by this fact.

Based on 34 articles, and performing a descriptive synthesis of the evidence, the above-mentioned work of Roy et al. concluded that the convergent validity of the scale was well established, reliability coefficient values reached acceptable benchmarks and that, with the exception of shoulder instability, the CMS had excellent responsiveness [14]. Our results support the convergent validity of the scale, with the exception of frozen shoulder, but disagree with the generalizability of the other two statements. According to the current evaluations, based on broader evidence, reliability cannot be claimed in the cases of arthritis, instability and frozen shoulder. Further responsiveness to change information would have been desirable for all groups except subacromial pathology.

Recently, another systematic review and standardized evaluation of various shoulder scales in rotator cuff patients, using the COSMIN checklist, was published [16]. Based on 17 articles the authors concluded positive evidence for CMS reproducibility and responsiveness, indeterminate evidence for internal consistency, measurement error and criterion validity, while negative or lack of evidence was found for the remaining evaluated attributes.

The administered intervention is an important factor in the evolution of any pathology [35]. The possibility of additional evaluations, considering the applied intervention, irrespective of diagnosis, was contemplated in a secondary phase of this study. However, grouping the articles anew, based on this characteristic, was very difficult to accomplish. Single studies applied different interventions for the same underlying pathologies; frequently information on intervention type and procedures was not available and commonly results were presented globally, ignoring the intervention type. For the above reasons no such evaluations were eventually performed.

Certain limitations should be addressed. After the latest recommendations [13], the modified CMS version (i.e. VAS for pain and ADL activities) or the age and sex adjusted score has been implemented in certain publications. Most articles presenting these “updated” scores also reported the original CMS values. When this was not the case, the “updated” values were considered to be the same as the original ones. The recommended modifications are perceived as improvements of an instrument, rather than different score tools, which justifies their joint evaluation. Moreover, due to the low number of these publications, separate evaluations would not have been possible. An additional limitation is the fact that most included articles did not explain the exact way of assessing the scale’s strength component. It is possible that studies implementing an ISOBEX dynamometer or similar, as recently recommended [13], obtained more accurate and reproducible findings. Information related to the exact CMS administration is still to be improved. The reviewers’ expertise may have introduced certain variability to the obtained results. However, EMPRO-specific instructions and consensus of all evaluations should have minimized the effect of this bias. Finally, the number of articles identified per group may have affected certain evaluations. It is important to highlight that the inclusion criteria were applied irrespective of diagnoses. Included articles were those presenting psychometric information. Grouping them by diagnosis was done a posteriori. There is a chance that better evidence could be provided if more publications were found, but the systematic literature review steps followed and the specific inclusion criteria should have reduced the possibility of excluding relevant articles. Finally, the EMPRO instrument was created for evaluating PRO and the CMS is a functional scale with a PRO component. Nonetheless, it is accepted that both instrument types should possess the same psychometric properties, which justifies the aim and approach of the current study [28, 36].

The systematic review followed by expert evaluations constitutes the main strength of this work. To the best of our knowledge our study is the first to use the EMPRO assessment tool for exploring the CMS attributes in different shoulder pathologies. The current results offer a clearer perception of the scale’s psychometric properties, indicating its positive and negative qualities. Our conclusions are in line with previously published works [6, 15, 16, 25, 37, 38]. The present evaluation should be of interest to the clinicians who administer the scale, and to the investigators who may wish to improve the available information. Exploring the CMS properties in different intervention types or developing variations of the scale, applicable to certain shoulder pathologies for example, could be possible future investigation lines.

Conclusions

The CMS use is advisable for patients with subacromial pathology. As far as other shoulder conditions are concerned, the evidence suggests certain capacity in capturing changes over time, but the data were not conclusive. The obtained results do not justify the CMS as a gold standard in shoulder evaluation. Prospective studies set up to explore the psychometric properties of the scale, particularly for fractures, arthritis, instability and frozen shoulder, are needed.

Acknowledgements We thank the EMPRO Group for its assistance and expert reviews in this study; especially Montse Ferrer and Miren Orive for their help and guidance.

Author contributions KV, MA, AE and RC are responsible for the concept, design, and information collection; KV, AE and RC obtained funding for this project; KV, MA, MM, MMA, YP, OG, CZ and NG
Quality of Life Research (2018) 27:2217–2226 2225
participated in the standardized evaluation process and data interpretation; KV and MA drafted the first version of the manuscript; all authors have critically revised and accepted all manuscript versions.

Funding This study was funded by the Health Department of the Basque Country Government (No.: 2013111087). The funding source was not involved in any phase of the current project.

Data availability The articles on which the standardized evaluations were based are presented in Online Appendix 2. Final standardized evaluations are presented in Table 2. Individual standardized evaluations are available from the corresponding author, upon reasonable request.

Compliance with ethical standards

Conflict of interest None of the authors has any conflicts of interest.

Ethical approval This article does not contain any studies with human participants performed by any of the authors.

Informed consent This is a systematic review study. For this reason, informed consent was not needed.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Constant, C. R., & Murley, A. H. (1987). A clinical method of functional assessment of the shoulder. Clinical Orthopaedics & Related Research, 214, 160–164.
2. Barra-Lopez, M. E. (2007). El test de Constant-Murley. Una revision de sus caracteristicas. Rehabilitacion (Madr), 41, 228–235.
3. Kirkley, A., Griffin, S., & Dainty, K. (2003). Scoring systems for the functional assessment of the shoulder. Arthroscopy, 19, 1109–1120.
4. Angst, F., Schwyzer, H. K., Aeschlimann, A., Simmen, B. R., & Goldhahn, J. (2011). Measures of adult shoulder function: disabilities of the arm, shoulder, and hand questionnaire (DASH) and its short version (QuickDASH), shoulder pain and disability index (SPADI), American shoulder and elbow surgeons (ASES) Society standardized shoulder assessment form, Constant (Murley) score (CS), simple shoulder test (SST), Oxford shoulder score (OSS), shoulder disability questionnaire (SDQ), and Western Ontario shoulder instability index (WOSI). Arthritis Care & Research (Hoboken), 63(Suppl 11), S174–S188.
5. Conboy, V. B., Morris, R. W., Kiss, J., & Carr, A. J. (1996). An evaluation of the Constant–Murley shoulder assessment. The Journal of Bone and Joint Surgery, 78, 229–232.
6. Blonna, D., Scelsi, M., Marini, E., Bellato, E., Tellini, A., Rossi, R., et al. (2012). Can we improve the reliability of the Constant–Murley score? Journal of Shoulder and Elbow Surgery, 21, 4–12.
7. Othman, A., & Taylor, G. (2004). Is the Constant score reliable in assessing patients with frozen shoulder? 60 shoulders scored 3 years after manipulation under anaesthesia. Acta Orthopaedica Scandinavica, 75, 114–116.
8. van den Ende, C. H., Rozing, P. M., Dijkmans, B. A., Verhoef, J. A., Voogt-van der Harst, E. M., & Hazes, J. M. (1996). Assessment of shoulder function in rheumatoid arthritis. The Journal of Rheumatology, 23, 2043–2048.
9. Walton, M. J., Walton, J. C., Honorez, L. A., Harding, V. F., & Wallace, W. A. (2007). A comparison of methods for shoulder strength assessment and analysis of Constant score change in patients aged over fifty years in the United Kingdom. Journal of Shoulder and Elbow Surgery, 16, 285–289.
10. Yian, E. H., Ramappa, A. J., Arneberg, O., & Gerber, C. (2005). The Constant score in normal shoulders. Journal of Shoulder and Elbow Surgery, 14, 128–133.
11. Lillkrona, U. (2008). How should we use the Constant score? A commentary. Journal of Shoulder and Elbow Surgery, 17, 362–363.
12. Bankes, M. J., Crossman, J. E., & Emery, R. J. (1998). A standard method of shoulder strength measurement for the Constant score with a spring balance. Journal of Shoulder and Elbow Surgery, 7, 116–121.
13. Constant, C. R., Gerber, C., Emery, R. J., Sojbjerg, J. O., Gohlke, F., & Boileau, P. (2008). A review of the Constant score: Modifications and guidelines for its use. Journal of Shoulder and Elbow Surgery, 17, 355–361.
14. Roy, J. S., Macdermid, J. C., & Woodhouse, L. J. (2010). A systematic review of the psychometric properties of the Constant–Murley score. Journal of Shoulder and Elbow Surgery, 19, 157–164.
15. Slobogean, G. P., & Slobogean, B. L. (2011). Measuring shoulder injury function: Common scales and checklists. Injury, 42, 248–252.
16. Huang, H., Grant, J. A., Miller, B. S., Mirza, F. M., & Gagnier, J. J. (2015). A systematic review of the psychometric properties of patient-reported outcome instruments for use in patients with rotator cuff disease. The American Journal of Sports Medicine, 43, 2572–2582.
17. Valderas, J. M., Ferrer, M., Mendivil, J., Garin, O., Rajmil, L., Herdman, M., et al. (2008). Development of EMPRO: A tool for the standardized assessment of patient-reported outcome measures. Value in Health, 11, 700–708.
18. Schmidt, S., Ferrer, M., Gonzalez, M., Gonzalez, N., Valderas, J. M., Alonso, J., et al. (2014). Evaluation of shoulder-specific patient-reported outcome measures: a systematic and standardized comparison of available evidence. Journal of Shoulder and Elbow Surgery, 23, 434–444.
19. Garin, O., Herdman, M., Vilagut, G., Ferrer, M., Ribera, A., Rajmil, L., et al. (2014). Assessing health-related quality of life in patients with heart failure: A systematic, standardized comparison of available measures. Heart Failure Reviews, 19(3), 359–367. https://doi.org/10.1007/s10741-013-9394-7.
20. Schmidt, S., Garin, O., Pardo, Y., Valderas, J. M., Alonso, J., Rebollo, P., et al. (2014). Assessing quality of life in patients with prostate cancer: A systematic and standardized comparison of available instruments. Quality of Life Research, 23, 2169–2181.
21. Maratia, S., Cedillo, S., & Rejas, J. (2016). Assessing health-related quality of life in patients with breast cancer: A systematic and standardized comparison of available instruments using the EMPRO tool. Quality of Life Research, 25, 2467–2480.
22. Sinclair, S., Russell, L. B., Hack, T. F., Kondejewski, J., & Sawatzky, R. (2017). Measuring compassion in healthcare: A comprehensive and critical review. Patient, 10(4), 389–405. https://doi.org/10.1007/s40271-016-0209-5.
23. Levy, O., Haddo, O., Massoud, S., Mullett, H., & Atoun, E. (2014). A patient-derived Constant–Murley score is comparable to a clinician-derived score. Clinical Orthopaedics and Related Research, 472, 294–303.
24. Baker, P., Nanda, R., Goodchild, L., Finn, P., & Rangan, A. (2008). A comparison of the Constant and Oxford shoulder scores in patients with conservatively treated proximal humeral fractures. Journal of Shoulder and Elbow Surgery, 17, 37–41.
25. Christie, A., Hagen, K. B., Mowinckel, P., & Dagfinrud, H. (2009). Methodological properties of six shoulder disability measures in patients with rheumatic diseases referred for shoulder surgery. Journal of Shoulder and Elbow Surgery, 18, 89–95.
26. Dawson, J., Fitzpatrick, R., & Carr, A. (1999). The assessment of shoulder instability: The development and validation of a questionnaire. The Journal of Bone and Joint Surgery, 81, 420–426.
27. Plancher, K. D., & Lipnick, S. L. (2009). Analysis of evidence-based medicine for shoulder instability. Arthroscopy, 25, 897–908.
28. McDowell, I. (2006). Measuring health: A guide to rating scales and questionnaires (3rd ed.). Oxford: Oxford University Press.
29. Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40, 532.
30. Scientific Advisory Committee of the Medical Outcomes Trust (2002). Assessing health status and quality-of-life instruments: Attributes and review criteria. Quality of Life Research, 11, 193–205.
31. Oh, J. H., Jo, K. H., Kim, W. S., Gong, H. S., Han, S. G., & Kim, Y. H. (2009). Comparative evaluation of the measurement properties of various shoulder outcome instruments. The American Journal of Sports Medicine, 37, 1161–1168.
32. Mahabier, K. C., Den, H. D., Theyskens, N., Verhofstad, M. H. J., & Van Lieshout, E. M. M. (2017). Reliability, validity, responsiveness, and minimal important change of the disabilities of the arm, shoulder and hand and Constant–Murley scores in patients with a humeral shaft fracture. Journal of Shoulder and Elbow Surgery, 26, e1–e12.
33. Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1, 307–310.
34. Altman, D. G. (1999). Practical statistics for medical research. Boca Raton: Chapman & Hall.
35. Murphy, R. J., & Carr, A. J. (2010). Shoulder pain. BMJ Clinical Evidence, 2010, 1107.
36. Mitchell, L. E., Ziviani, J., Oftedal, S., & Boyd, R. N. (2013). A systematic review of the clinimetric properties of measures of habitual physical activity in primary school aged children with cerebral palsy. Research in Developmental Disabilities, 34, 2419–2432.
37. Angst, F., Goldhahn, J., Drerup, S., Aeschlimann, A., Schwyzer, H. K., & Simmen, B. R. (2008). Responsiveness of six outcome assessment instruments in total shoulder arthroplasty. Arthritis & Rheumatism, 59, 391–398.
38. Rocourt, M. H., Radlinger, L., Kalberer, F., Sanavi, S., Schmid, N. S., Leunig, M., et al. (2008). Evaluation of intratester and intertester reliability of the Constant–Murley shoulder assessment. Journal of Shoulder and Elbow Surgery, 17, 364–369.
Affiliations