
A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary, Third Edition

ESTHER STRAUSS
ELISABETH M. S. SHERMAN
OTFRIED SPREEN

OXFORD UNIVERSITY PRESS



Contents 



List of Acronyms, xiii

1. Psychometrics in Neuropsychological Assessment, 1
2. Norms Selection in Neuropsychological Assessment, 44
3. History Taking, 55
4. Test Selection, Test Administration, and Preparation of the Patient, 75
5. Report Writing and Feedback Sessions, 86
6. General Cognitive Functioning, Neuropsychological Batteries, and Assessment of Premorbid Intelligence, 98
    Introduction, 98
    Bayley Scales of Infant Development — Second Edition (BSID-II), 114
    Cognitive Assessment System (CAS), 133
    Dementia Rating Scale — 2 (DRS-2), 144
    Kaplan Baycrest Neurocognitive Assessment (KBNA), 159
    Kaufman Brief Intelligence Test (K-BIT), 164
    Mini-Mental State Examination (MMSE), 168
    National Adult Reading Test (NART), 189
    NEPSY: A Developmental Neuropsychological Assessment, 201
    Neuropsychological Assessment Battery (NAB), 218
    Raven's Progressive Matrices (RPM), 229
    Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), 237
    Stanford-Binet Intelligence Scales — Fifth Edition (SB5), 258
    The Test of Nonverbal Intelligence — 3 (TONI-3), 268
    The Speed and Capacity of Language Processing Test (SCOLP), 272
    Wechsler Abbreviated Scale of Intelligence (WASI), 279
    Wechsler Adult Intelligence Scale — III (WAIS-III), 283
    Wechsler Intelligence Scale for Children — Fourth Edition (WISC-IV), 310
    Wechsler Preschool and Primary Scale of Intelligence — Third Edition (WPPSI-III), 337
    Wechsler Test of Adult Reading (WTAR), 347
    Woodcock-Johnson III Tests of Cognitive Abilities (WJ III COG), 351
7. Achievement Tests, 363
    Introduction, 363
    The Gray Oral Reading Test — Fourth Edition (GORT-4), 365
    Wechsler Individual Achievement Test — Second Edition (WIAT-II), 370
    Wide Range Achievement Test — 3 (WRAT3), 384
    Woodcock-Johnson III Tests of Achievement (WJ III ACH), 390
8. Executive Functions, 401
    Introduction, 401
    Behavioral Assessment of the Dysexecutive Syndrome (BADS), 408
    CANTAB, 415
    Category Test (CT), 424
    Cognitive Estimation Test (CET), 437
    Delis-Kaplan Executive Function System (D-KEFS), 443
    Design Fluency Test, 450
    Five-Point Test, 456
    The Hayling and Brixton Tests, 460
    Ruff Figural Fluency Test (RFFT), 466
    Self-Ordered Pointing Test (SOPT), 471
    Stroop Test, 477
    Verbal Fluency, 499
    Wisconsin Card Sorting Test (WCST), 526
9. Attention, 546
    Introduction, 546
    Brief Test of Attention (BTA), 547
    Color Trails Test (CTT) and Children's Color Trails Test (CCTT), 554
    Comprehensive Trail Making Test (CTMT), 557
    Conners' Continuous Performance Test II (CPT-II), 562
    Integrated Visual and Auditory Continuous Performance Test (IVA + Plus), 575
    Paced Auditory Serial Addition Test (PASAT) and Children's Paced Auditory Serial Addition Test (CHIPASAT), 582
    Ruff 2 & 7 Selective Attention Test (2 & 7 Test), 610
    Symbol Digit Modalities Test (SDMT), 617
    Test of Everyday Attention (TEA), 628
    Test of Everyday Attention for Children (TEA-Ch), 638
    Test of Variables of Attention (T.O.V.A.), 645
    Trail Making Test (TMT), 655
10. Memory, 678
    Introduction, 678
    Autobiographical Memory Interview (AMI), 687
    Benton Visual Retention Test (BVRT-5), 691
    Brief Visuospatial Memory Test — Revised (BVMT-R), 701
    Brown-Peterson Task, 704
    Buschke Selective Reminding Test (SRT), 713
    California Verbal Learning Test-II (CVLT-II), 730
    California Verbal Learning Test — Children's Version (CVLT-C), 735
    Children's Memory Scale (CMS), 746
    Doors and People Test (DPT), 755
    Hopkins Verbal Learning Test — Revised (HVLT-R), 760
    Recognition Memory Test (RMT), 769
    Rey Auditory Verbal Learning Test (RAVLT), 776
    Rey-Osterrieth Complex Figure Test (ROCF), 811
    Rivermead Behavioural Memory Test — Second Edition (RBMT-II), 841
    Ruff-Light Trail Learning Test (RULIT), 851
    Sentence Repetition Test, 854
    Wechsler Memory Scale — Third Edition (WMS-III), 860
    Wide Range Assessment of Memory and Learning — Second Edition (WRAML2), 881
11. Language Tests, 891
    Introduction, 891
    Boston Diagnostic Aphasia Examination — Third Edition (BDAE-3), 892
    Boston Naming Test — 2 (BNT-2), 901
    Dichotic Listening — Words, 916
    Expressive One-Word Picture Vocabulary Test — Third Edition (EOWPVT3), 922
    Expressive Vocabulary Test (EVT), 928
    Multilingual Aphasia Examination (MAE), 933
    Peabody Picture Vocabulary Test — Third Edition (PPVT-III), 940
    Token Test (TT), 953
12. Tests of Visual Perception, 963
    Introduction, 963
    Balloons Test, 965
    Bells Cancellation Test, 968
    Clock Drawing Test (CDT), 972
    Facial Recognition Test (FRT), 983
    Hooper Visual Organization Test (VOT), 990
    Judgement of Line Orientation (JLO), 997
    Visual Object and Space Perception Battery (VOSP), 1006
13. Tests of Somatosensory Function, Olfactory Function, and Body Orientation, 1012
    Introduction, 1012
    Finger Localization, 1013
    Right-Left Orientation (RLO), 1017
    Rivermead Assessment of Somatosensory Performance (RASP), 1020
    Smell Identification Test (SIT), 1023
    Tactual Performance Test (TPT), 1031
14. Tests of Motor Function, 1042
    Introduction, 1042
    Finger Tapping Test (FTT), 1043
    Grip Strength, 1052
    Grooved Pegboard, 1061
    Purdue Pegboard Test, 1068
15. Assessment of Mood, Personality, and Adaptive Functions, 1080
    Introduction, 1080
    Beck Depression Inventory — Second Edition (BDI-II), 1084
    Behavior Rating Inventory of Executive Function (BRIEF), 1090
    Geriatric Depression Scale (GDS), 1099
    Instrumental Activities of Daily Living (IADL), 1107
    Minnesota Multiphasic Personality Inventory-2 (MMPI-2), 1113
    Personality Assessment Inventory (PAI), 1126
    Scales of Independent Behavior — Revised (SIB-R), 1134
    Trauma Symptom Inventory (TSI), 1140
16. Assessment of Response Bias and Suboptimal Performance, 1145
    Introduction, 1145
    The b Test, 1158
    The Dot Counting Test (DCT), 1161
    Rey Fifteen-Item Test (FIT), 1166
    Test of Memory Malingering (TOMM), 1171
    21 Item Test, 1176
    Victoria Symptom Validity Test (VSVT), 1179
    Word Memory Test (WMT), 1184

Test Index, 1189
Subject Index, 1204



List of Acronyms 



3MS	Modified Mini-Mental State Examination
2 & 7 Test	Ruff 2 & 7 Test
AAE	African American English
ACI	Attention/Concentration Index
AcoA	Anterior communicating artery
AD	Alzheimer's disease or absolute deviation
ADHD	Attention Deficit Hyperactivity Disorder
ADL	Activities of daily living
AERA	American Educational Research Association
AIS	Autobiographical Incidents Schedule
ALS	Amyotrophic lateral sclerosis
AMI	Autobiographical Memory Interview or Asocial Maladaptive Index
AMNART	American National Adult Reading Test
A-MSS	Age-corrected MOANS scaled score
A&E-MSS	Age- and education-corrected MOANS scaled score
ANELT-A	Nijmegen Everyday Language Test
APA	American Psychological Association
APM	Advanced Progressive Matrices
Arith	Arithmetic
ASHA	American Speech and Hearing Association
ASTM	Amsterdam Short Term Memory Test
ATR	Atypical Response
ATT	Attention
Aud. Imm.	Auditory Immediate
Aud. Del.	Auditory Delay
Aud. Recog	Auditory Recognition
AVLT	Auditory Verbal Learning Test
BADS	Behavioral Assessment of the Dysexecutive Syndrome
BAI	Beck Anxiety Inventory
BASC	Behavior Assessment System for Children
BCET	Biber Cognitive Estimation Test
BCT	Booklet Category Test
BD	Block Design
BDAE	Boston Diagnostic Aphasia Examination
BDI	Beck Depression Inventory
BIA	Brief Intellectual Ability
BLC	Big Little Circle
BNT	Boston Naming Test
BQSS	Boston Qualitative Scoring System
BRB-N	Brief Repeatable Battery of Neuropsychological Tests
BRI	Behavior Regulation Index
BRIEF	Behavior Rating Inventory of Executive Function
BRIEF-A	Behavior Rating Inventory of Executive Function — Adult
BRIEF-P	Behavior Rating Inventory of Executive Function — Preschool Version
BRIEF-SR	Behavior Rating Inventory of Executive Function — Self-Report
BRR	Back Random Responding
BRS	Behavior Rating Scale
BSI	Brief Symptom Inventory
BSID	Bayley Scales of Infant Development
B-SIT	Brief Smell Identification Test
BTA	Brief Test of Attention
BVMT-R	Brief Visuospatial Memory Test — Revised
BVRT	Benton Visual Retention Test
CAARS	Conners' Adult ADHD Rating Scale
CANTAB	Cambridge Neuropsychological Test Automated Batteries
CARB	Computerized Assessment of Response Bias
CAS	Cognitive Assessment System
C-AUSNART	Contextual Australian NART
CBCL	Child Behavior Checklist
CCC	Consonant Trigrams
CCF	Cattell Culture Fair Intelligence Test
CCRT	Cambridge Contextual Reading Test
CCTT	Children's Color Trails Test
CCT	Children's Category Test
CDF	Cashel Discriminant Function
CDT	Clock Drawing Test
CELF	Clinical Evaluation of Language Fundamentals
CERAD	Consortium for the Establishment of a Registry for Alzheimer's Disease
CES-D	Center for Epidemiological Studies Depression Scale
CET	Cognitive Estimation Test
CFT	Complex Figure Test
CFQ	Cognitive Failures Questionnaire
CHC	Carroll-Horn-Cattell
CHI	Closed head injury
CHIPASAT	Children's Paced Auditory Serial Addition Test
CI	Confidence interval
CLR	Conceptual Level Responses
CLTR	Consistent Long-Term Retrieval
CMS	Children's Memory Scale
CNS	Central nervous system
CO	Correlation Score
Comp	Comprehension
CONCEPT	Conceptualization
Cons	Consistency
CONSRT	Construction
COWA	Controlled Oral Word Association
CPM	Colored Progressive Matrices
CPT	Continuous Performance Test
CRI	Concussion Resolution Index
CSHA	Canadian Study of Health and Aging
CT	Category Test or computed tomography
CTMT	Comprehensive Trail Making Test
CTT	Color Trails Test
CV	Consonant Vowel
CVA	Cerebrovascular accident
CVLT	California Verbal Learning Test
CVLT-C	California Verbal Learning Test — Children's Version
CVLT-SF	California Verbal Learning Test — Short Form
CW	Color-Word
DAFS	Direct Assessment of Functional Status
DAI	Diffuse axonal injury
DAS	Das Assessment System
DCT	Dot Counting Test
DEF	Defensiveness Index
DEX	Dysexecutive Questionnaire
DFR	Delayed Free Recall
DFT	Design Fluency Test
DH	Dominant Hand
DICA	Diagnostic Interview for Children and Adolescents
D-KEFS	Delis-Kaplan Executive Function System
DL	Dichotic Listening
DLB	Dementia with Lewy bodies
DMS	Delayed Matching to Sample
DPT	Doors and People Test
DR	Delayed Recall or Recognition
DRI	Delayed Recall Index
DRS	Dementia Rating Scale
DSM-IV	Diagnostic and Statistical Manual of Mental Disorders
DSp	Digit Span
DSS-ROCF	Developmental Scoring System for the Rey-Osterrieth Complex Figure
DSym	Digit Symbol
DT	Dual Task
Ed	Education
EI	Exaggeration Index
ELF	Excluded Letter Fluency
EMI	Externalized Maladaptive Index
EMQ	Everyday Memory Questionnaire
EOWPVT	Expressive One-Word Picture Vocabulary Test
ERP	Event-related potential
ERR	Errors
E-Score	Effort-Index Score
EVT	Expressive Vocabulary Test
EXIT25	Executive Interview
Fam Pic (Pix)	Family Pictures
FAS	Letters commonly used to assess phonemic fluency (COWA) or Fetal Alcohol Syndrome
FBS	Fake Bad Scale
FDWT	Fused Dichotic Words Test
FIM	Functional Independence Measures
FFD	Freedom from Distractibility
FIT	Rey Fifteen-Item Test
FLD	Frontal lobe dementia
FLE	Frontal lobe epilepsy
FMS	Failure to Maintain Set
FP	False Positive
FRT	Facial Recognition Test
FTD	Frontotemporal dementia
FTT	Finger Tapping Test
FSIQ	Full Scale IQ
FWSTM	Four-Word Short-Term Memory Test
GAI	General Ability Index
Ga	Auditory Processing
Gc	Crystallized Ability
GCS	Glasgow Coma Scale
GDS	Geriatric Depression Scale or Gordon Diagnostic System
GEC	Global Executive Composite
Gf	Fluid Ability
GLR	Long-Term Retrieval
GQ	Quantitative Knowledge
Grw	Reading-Writing
GS	Processing Speed
Gsm	Short-Term Memory
GV	Visual Processing
GIA	General Intellectual Ability
GLC	General Language Composite
GMI	General Memory Index or General Maladaptive Index
GNDS	General Neuropsychological Deficit Scale
GORT	Gray Oral Reading Test
GRI	General Recognition Index
HD	Huntington's disease
HMGT	Homophone Meaning Generation Test
HRCT	Halstead-Reitan Category Test
HRNES	Halstead-Russell Neuropsychological Evaluation System
HS	High school
HVLT-R	Hopkins Verbal Learning Test — Revised
IADL	Instrumental Activities of Daily Living
ICC	Intraclass correlation
IED	Intra/Extra-Dimensional Shift
IES	Impact of Event Scale
IMC	Information-Memory-Concentration Test
IMI	Immediate Memory Index or Internalized Maladaptive Index
ImPACT	Immediate Post-Concussion Assessment and Cognitive Testing
Info	Information
INS	International Neuropsychological Society
I/P	Initiation/Perseveration
IR	Immediate Recognition
ISI	Inter-Stimulus Interval
IVA	Integrated Visual and Auditory Continuous Performance Test
JOLO (JLO)	Judgement of Line Orientation
K-ABC	Kaufman Assessment Battery for Children
KAIT	Kaufman Adolescent and Adult Intelligence Test
K-BIT	Kaufman Brief Intelligence Test
K-FAST	Kaufman Functional Academic Skills Test
KBNA	Kaplan Baycrest Neurocognitive Assessment
KTEA	Kaufman Test of Educational Achievement
LAMB	Learning and Memory Battery
LD	Learning disability
LDFR	Long Delay Free Recall
LEA	Left ear advantage
LEI	Learning Efficiency Index
Let-Num Seq	Letter Number Sequencing
LH	Left hemisphere
LL	Learning to Learn
LM	Logical Memory
LOC	Loss of consciousness
LOT	Learning Over Trials
LTPR	Long-Term Percent Retention
LTR	Long-Term Retrieval
LTS	Long-Term Storage
MAE	Multilingual Aphasia Examination
MAI	Multilevel Assessment Instrument
MAL	Malingering Index
MAVDRI	Mayo Auditory-Verbal Delayed Recall Index
MAVLEI	Mayo Auditory-Verbal Learning Efficiency Index
MAVPRI	Mayo Auditory-Verbal Percent Retention Index
MBA	Mini-Battery of Achievement
MC	Multiple Choice
MCG	Medical College of Georgia
MCI	Mild Cognitive Impairment or Metacognitive Index
MDI	Mental Development Index
MEM	Memory
MHT	Moray House Test
MMPI	Minnesota Multiphasic Personality Inventory
MMSE	Mini-Mental State Examination
MND	Malingered Neurocognitive Dysfunction
MOANS	Mayo's Older Americans Normative Studies
MOAANS	Mayo's African American Normative Studies
MOT	Motor
MR	Matrix Reasoning
MRI	Magnetic resonance imaging
MS	Multiple Sclerosis
MSB	Meyers Short Battery
MSI	Memory Screening Index
MSS	MOANS Scaled Score
MTBI	Mild traumatic brain injury
MTCF	Modified Taylor Complex Figure
MTS	Matching to Sample
NAB	Neuropsychological Assessment Battery
NAN	National Academy of Neuropsychology
NART	National Adult Reading Test
NAART	North American Adult Reading Test
NCCEA	Neurosensory Center Comprehensive Examination for Aphasia
NCE	Normal Curve Equivalent
NDH	Non-Dominant Hand
NFT	Neurofibrillary tangles
NIM	Negative Impression Management
NIS	Neuropsychological Impairment Scale
NPA	Negative Predictive Accuracy
NPE	Non-Perseverative Errors
NPP	Negative predictive power
NPV	Negative predictive value
NVIQ	Nonverbal IQ
OA	Object Assembly
OARS	Older Americans Resources and Services
ODD	Oppositional Defiant Disorder
OPIE	Oklahoma Premorbid Intelligence Estimate
ORQ	Oral Reading Quotient
PA	Picture Arrangement
PAI	Personality Assessment Inventory
PAL	Paired Associate Learning
PASAT	Paced Auditory Serial Addition Test
PASS	Planning, Attention, Simultaneous, Successive
PC	Picture Completion
PCS	Post-concussive symptoms (or syndrome)
PD	Parkinson's disease
PDD	Pervasive developmental disorder
PDI	Psychomotor Development Index
PDRT	Portland Digit Recognition Test
PE	Perseverative Errors or practice effects
PET	Positron emission tomography
PIM	Positive Impression Management
PKU	Phenylketonuria
POI	Perceptual Organization Index
PICA	Porch Index of Communicative Ability
PIQ	Performance IQ
PMA	Primary Mental Abilities
PNES	Psychogenic nonepileptic seizures
PPA	Positive predictive accuracy
PPP	Positive predictive power
PPV	Positive predictive value
PPVT	Peabody Picture Vocabulary Test
PR	Perseverative Responses or percentile rank
PRI	Perceptual Reasoning Index or Percent Retention Index
PRM	Pattern Recognition Memory
PSAT	Paced Serial Addition Test
PSI	Processing Speed Index
PSQ	Processing Speed Quotient
PSS	Personal Semantic Schedule
PTA	Post-traumatic amnesia
PTSD	Post-traumatic stress disorder
PVSAT	Paced Visual Serial Addition Test
RASP	Rivermead Assessment of Somatosensory Performance
RAVLT	Rey Auditory Verbal Learning Test
RAVLT-EI	Rey Auditory Verbal Test — Exaggeration Index
RBANS	Repeatable Battery for the Assessment of Neuropsychological Status
RBMT	Rivermead Behavioral Memory Test
RBMT-C	Rivermead Behavioral Memory Test for Children
RBMT-E	Rivermead Behavioral Memory Test — Extended Version
rc	Recognition (hit rate) or Restructured Clinical
rCBF	Regional cerebral blood flow
RCI	Reliable Change Index
RCI-PE	Reliable Change Index with practice effects
RD	Reading disorder
RDF	Rogers Discriminant Function
RDS	Reliable Digit Span
REA	Right ear advantage
REC	Recognition
RFFT	Ruff Figural Fluency Test
RH	Right hemisphere
RLO	Right-Left Orientation
RLTR	Random Long-Term Retrieval
RMF	Recognition Memory for Faces
RMI	Rarely Missed Index or Relative Mastery Index
RMT	Recognition Memory Test
ROC	Receiver Operating Curve
ROCF	Rey-Osterrieth Complex Figure
ROR	Reach Out and Read
ROWPVT	Receptive One-Word Picture Vocabulary Test
RPC	Recognition Percent Correct
RPI	Relative Proficiency Index
RPM	Raven's Progressive Matrices
RT	Reaction Time
RULIT	Ruff-Light Trail Learning Test
RVP	Rapid Visual Information Processing
SADI	Self-Awareness of Deficits Interview
SAILS	Structured Assessment of Independent Living Skills
SAS	Standard Age Score
SAT9	Stanford Achievement Test, Ninth Edition
SB	Stanford-Binet
SB-IV	Stanford-Binet Intelligence Scales, Fourth Edition
SBE	Spanish Bilingual Edition
SBI	Screening Battery Index
SCID	Structured Clinical Interview for DSM
SCL-90	Symptom Checklist-90
SCOLP	Speed and Capacity of Language Processing Test
SCT	Short Category Test
SD	Standard deviation
SDMT	Symbol Digit Modalities Test
SEE	Standard error of the estimate
SEM	Standard error of measurement
SEP	Standard error of prediction
SES	Socioeconomic status
SIB-R	Scales of Independent Behavior — Revised
SIM	Similarities
SIT	Smell Identification Test
SOAP	Subject-Relative, Object-Relative, Active, Passive
SOC	Stockings of Cambridge
SOP	Sub-Optimal Performance
SOPT	Self-Ordered Pointing Test
SPECT	Single photon emission computed tomography
SPIE	Australian version of the OPIE
SPM	Standard Progressive Matrices
SPS	Standardized Profile Score
SRB	Standardized regression-based
SRM	Spatial Recognition Memory
SRT	Selective Reminding Test
ss	Symbol Search or standard score
SSP	Spatial Span
SST	Silly Sentences Test
ST	Sub-Test
STPR	Short-Term Percent Retention
STR	Short-Term Recall
SVT	Symptom Validity Test
SWT	Spot-the-Word Test
SWM	Spatial Working Memory
TACL-R	Test for Auditory Comprehension of Language — Revised
TBI	Traumatic brain injury
TE	Total Errors
TEA	Test of Everyday Attention
TEA-Ch	Test of Everyday Attention for Children
TIA	Transient ischemic attack
TLE	Temporal lobe epilepsy
TMT	Trail Making Test
TOLD-P	Test of Language Development — Primary
TOMAL	Test of Memory and Learning
TOMM	Test of Memory Malingering
TONI	Test of Nonverbal Intelligence
T.O.V.A.	Test of Variables of Attention
TP	True Positive
TPT	Tactual Performance Test
TRF	Teacher's Report Form
TSI	Trauma Symptom Inventory
TT	Token Test
TTF	Trials to Complete First Category
TVIP	Test de Vocabulario en Imagenes Peabody
UCO	Uses for Common Objects
UH-DRS	Unified Huntington's Disease Rating Scale
UNIT	Universal Nonverbal Intelligence Test
UPSIT	University of Pennsylvania Smell Identification Test (see also SIT)
VABS	Vineland Adaptive Behavior Scale
VaD	Vascular dementia
VCI	Verbal Comprehension Index
VD	Vascular dementia
VIQ	Verbal IQ
Vis. Imm.	Visual Immediate
Vis. Del.	Visual Delay
VLS	Victoria Longitudinal Study
VMI	Developmental Test of Visual-Motor Integration
VOC	Vocabulary
VOSP	Visual Object and Space Perception Battery
VOT	Hooper Visual Organization Test
VPA	Verbal Paired Associates
VR	Visual Reproduction
VST	Victoria Stroop Test
VSVT	Victoria Symptom Validity Test
WAB	Western Aphasia Battery
WAIS	Wechsler Adult Intelligence Scale
WAIS-R	Wechsler Adult Intelligence Scale, Revised
WAIS-III	Wechsler Adult Intelligence Scale, Third Edition
WASI	Wechsler Abbreviated Scale of Intelligence
WCST	Wisconsin Card Sorting Test
WIAT	Wechsler Individual Achievement Test
WISC	Wechsler Intelligence Scale for Children
WISC-III	Wechsler Intelligence Scale for Children, Third Edition
WISC-IV	Wechsler Intelligence Scale for Children, Fourth Edition
WJ III ACH	Woodcock-Johnson Tests of Achievement, Third Edition
WJ III COG	Woodcock-Johnson Tests of Cognitive Abilities, Third Edition
WJ-R	Woodcock-Johnson Revised
WMH	White matter hyperintensities
WMI	Working Memory Index
WMS	Wechsler Memory Scale
WMT	Word Memory Test
WPPSI	Wechsler Preschool and Primary Scale of Intelligence
WPPSI-III	Wechsler Preschool and Primary Scale of Intelligence, Third Edition
WPPSI-R	Wechsler Preschool and Primary Scale of Intelligence, Revised
WRAML	Wide Range Assessment of Memory and Learning
WRAT	Wide Range Achievement Test
WTAR	Wechsler Test of Adult Reading






1 

Psychometrics in Neuropsychological Assessment 

with Daniel J. Slick 



OVERVIEW 






The process of neuropsychological assessment depends to a 
large extent on the reliability and validity of neuropsycholog- 
ical tests. Unfortunately, not all neuropsychological tests are 
created equal, and, like any other product, published tests 
vary in terms of their "quality," as defined in psychometric 
terms such as reliability, measurement error, temporal stabil- 
ity, sensitivity, specificity, predictive validity, and with respect 
to the care with which test items are derived and normative 
data are obtained. In addition to commercial measures, nu- 
merous tests developed primarily for research purposes have 
found their way into wide clinical usage; these vary consider- 
ably with regard to psychometric properties. With few excep- 
tions, when tests originate from clinical research contexts, 
there is often validity data but little else, which makes esti- 
mating measurement precision and stability of test scores a 
challenge. 

Regardless of the origins of neuropsychological tests, their 
competent use in clinical practice demands a good working 
knowledge of test standards and of the specific psychometric 
characteristics of each test used. This includes familiarity 
with the Standards for Educational and Psychological Testing 
(American Educational Research Association [AERA] et al., 
1999) and a working knowledge of basic psychometrics. Texts 
such as those by Nunnally and Bernstein (1994) and Anastasi 
and Urbina (1997) outline some of the fundamental psycho- 
metric prerequisites for competent selection of tests and in- 
terpretation of obtained scores. Other, neuropsychologically 
focused texts such as Mitrushina et al. (2005), Lezak et al. 
(2004), Baron (2004), Franklin (2003a), and Franzen (2000) 
also provide guidance. The following is intended to provide a 
broad overview of important psychometric concepts in neu- 
ropsychological assessment and coverage of important issues 
to consider when critically evaluating tests for clinical usage. 
Much of the information provided also serves as a conceptual 
framework for the test reviews in this volume. 



THE NORMAL CURVE

The frequency distributions of many physical, biological, and
psychological attributes, as they occur across individuals in 
nature, tend to conform, to a greater or lesser degree, to a bell- 
shaped curve (see Figure 1-1). This normal curve or normal 
distribution, so named by Karl Pearson, is also known as the 
Gaussian or Laplace-Gauss distribution, after the 18th-century 
mathematicians who first defined it. The normal curve is the 
basis of many commonly used statistical and psychometric 
models (e.g., classical test theory) and is the assumed distri- 
bution for many psychological variables. 1 

Definition and Characteristics 

The normal curve has a number of specific properties. It is 
unimodal, perfectly symmetrical and asymptotic at the tails. 
With respect to scores from measures that are normally dis- 
tributed, the ordinate, or height of the curve at any point 
along the x (test score) axis, is the proportion of persons 
within the sample who obtained a given score. The ordinates 
for a range of scores (i.e., between two points on the x axis) 
may also be summed to give the proportion of persons that 
obtained a score within the specified range. If a specified nor- 
mal curve accurately reflects a population distribution, then 
ordinate values are also equivalent to the probability of ob- 
serving a given score or range of scores when randomly sam- 
pling from the population. Thus, the normal curve may also 
be referred to as a probability distribution. 



Figure 1-1 The normal curve.







The normal curve is mathematically defined as follows: 



f(x) = [1 / (σ√(2π))] e^(−(x − μ)² / (2σ²))     [1]

Where:

x = measurement values (test scores)
μ = the mean of the test score distribution
σ = the standard deviation of the test score distribution
π = the constant pi (3.14 . . . )
e = the base of natural logarithms (2.71 . . . )
f(x) = the height (ordinate) of the curve for any given test score
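For readers who want to check values of equation [1] directly, the following is a minimal Python sketch that evaluates the normal ordinate; the function name and the example metric (a T-score scale with M = 50, SD = 10) are illustrative assumptions rather than anything specified in the text.

```python
import math

def normal_pdf(x, mu, sigma):
    # Height (ordinate) of the normal curve at x, per equation [1]
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Ordinate at the mean of a T-score metric (M = 50, SD = 10)
print(round(normal_pdf(50, 50, 10), 4))  # 0.0399
```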



Relevance for Assessment 

As noted previously, because it is a frequency distribution, 
the area under any given segment of the normal curve indi- 
cates the frequency of observations or cases within that inter- 
val. From a practical standpoint, this provides psychologists 
with an estimate of the "normality" or "abnormality" of any 
given test score or range of scores (i.e., whether it falls in the 
center of the bell shape, where the majority of scores lie, or 
instead, at either of the tail ends, where few scores can be 
found). The way in which the degree of "normality" or "ab- 
normality" of test scores is quantified varies, but perhaps 
the most useful and inherently understandable metric is the 
percentile. 



Z Scores and Percentiles 

A percentile indicates the percentage of scores that fall at or 
below a given test score. As an example, we will assume that 
a given test score is plotted on a normal curve. When all of 
the ordinate values at and below this test score are summed, 
the resulting value is the percentile associated with that test 
score (e.g., a score in the 75th percentile indicates that 75% of 
the reference sample obtained equal or lower scores). 

To convert scores to percentiles, raw scores may be linearly 
transformed or "standardized" in several ways. The simplest 
and perhaps most commonly calculated standard score is the 
z score, which is obtained by subtracting the sample mean 
score from an obtained score and dividing the result by the 
sample SD, as shown below:



z=(x-X)/SD 



[2] 



Where: 



x = measurement value (test score) 

X= the mean of the test score distribution 

SD = the standard deviation of the test score distribution 

The resulting distribution of z scores has a mean of 0 and an
SD of 1, regardless of the metric of raw scores from which they
were derived. For example, given a mean of 25 and an SD of 5,
a raw score of 20 translates into a z score of −1. The percentile



corresponding to any resulting z score can then be easily 
looked up in tables available in most statistical texts. Z score 
conversions to percentiles are also shown in Table 1-1. 
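As a rough illustration of the raw-score-to-percentile path just described, the sketch below standardizes a score with equation [2] and converts the resulting z score to a percentile using the cumulative normal distribution (computed here from the error function in Python's standard library); the sample mean and SD are the illustrative values used in the text.

```python
import math

def z_score(x, sample_mean, sample_sd):
    # Equation [2]: standardize a raw score against a reference sample
    return (x - sample_mean) / sample_sd

def percentile_from_z(z):
    # Percentage of a normal distribution falling at or below z
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = z_score(20, 25, 5)                 # raw score 20, sample mean 25, SD 5 -> z = -1.0
print(round(percentile_from_z(z), 1))  # about the 16th percentile (15.9)
```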

Interpretation of Percentiles 

An important property of the normal curve is that the rela- 
tionship between raw or z scores (which for purposes of this 
discussion are equivalent, since they are linear transforma- 
tions of each other) and percentiles is not linear. That is, a 
constant difference between raw or z scores will be associated 
with a variable difference in percentile scores, as a function of 
the distance of the two scores from the mean. This is due to the 
fact that there are proportionally more observations (scores) 
near the mean than there are farther from the mean; otherwise, 
the distribution would be rectangular, or non-normal. This 
can readily be seen in Figure 1-2, which shows the normal 
distribution with demarcation of z scores and corresponding 
percentile ranges. 

The nonlinear relation between z scores and percentiles 
has important interpretive implications. For example, a one- 
point difference between two z scores may be interpreted 
differently, depending on where the two scores fall on the nor- 
mal curve. As can be seen, the difference between a z score of
0 and a z score of +1 is 34 percentile points, because 34% of
scores fall between these two z scores (i.e., the scores being 
compared are at the 50th and 84th percentiles). However, the 
difference between a z score of +2 and a z score of +3 is less 
than 3 percentile points, because only 2.5% of the distribu- 
tion falls between these two points (i.e., the scores being com- 
pared are at the 98th and 99.9th percentiles). On the other 
hand, interpretation of percentile-score differences is also not 
straightforward, in that an equivalent "difference" between 
two percentile rankings may entail different clinical implica- 
tions if the scores occur at the tail end of the curve than if they 
occur near the middle of the distribution. For example, a 30- 
point difference between scores at the 1st percentile versus the 
31st percentile may be more clinically meaningful than the 
same difference between scores at the 35th percentile versus 
the 65th percentile. 

Linear Transformation of Z Scores: T Scores 
and Other Standard Scores 

In addition to the z score, linear transformation can be used 
to produce other standardized scores that have the same prop- 
erties with regard to easy conversion via table look-up (see 
Table 1-1). The most common of these are T scores (M = 50, SD = 10), scaled scores (M = 10, SD = 3), and standard scores such as those used in most IQ tests (M = 100, SD = 15). It
must be remembered that z scores, T scores, standard scores, 
and percentile equivalents are derived from samples; although 
these are often treated as population values, any limitations of 
generalizability due to reference sample composition or test- 
ing circumstances must be taken into consideration when 
standardized scores are interpreted. 
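A minimal sketch of the linear transformations named above; the three metrics (T score, scaled score, IQ-style standard score) are as given in the text, while the function name is an illustrative assumption.

```python
def rescale(z, new_mean, new_sd):
    # Linear transformation of a z score onto another standard-score metric
    return new_mean + z * new_sd

z = -1.0
print(rescale(z, 50, 10))   # T score: 40.0
print(rescale(z, 10, 3))    # scaled score: 7.0
print(rescale(z, 100, 15))  # IQ-style standard score: 85.0
```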



Table 1-1 Score Conversion Table

The table gives equivalent values across the common standard-score metrics. Reading across each row: IQ score, T score, scaled score (SS), and percentile for scores below the mean; the corresponding z score range; and the percentile, scaled score, T score, and IQ score for scores above the mean. Rows run from the extremes of the distribution (z of 3.00 and beyond: IQ <55 or >145, T <20 or >80, SS <1 or >19, percentile <0.1 or >99.9) in small z increments down to the center (z of .00 to .01: IQ 100, T 50, SS 10, 50th percentile).

IQ: M = 100, SD = 15; SS: M = 10, SD = 3.
Note: SS = Scaled Score.



Figure 1-2 The normal curve demarcated by z scores.



The Meaning of Standardized Test Scores: 
Score Interpretation 

As well as facilitating translation of raw scores to estimated 
population ranks, standardization of test scores, by virtue of 
conversion to a common metric, facilitates comparison of 
scores across measures. However, this is only advisable when 
the raw score distributions for tests that are being compared 
are approximately normal in the population. In addition, if 
standardized scores are to be compared, they should be derived 
from similar samples, or more ideally, from the same sample. A 
score at the 50th percentile on a test normed on a population 
of university students does not have the same meaning as an 
"equivalent" score on a test normed on a population of elderly 
individuals. When comparing test scores, one must also take 
into consideration both the reliability of the two measures and 
their intercorrelation before determining if a significant differ- 
ence exists (see Crawford & Garthwaite, 2002). In some cases, 
relatively large disparities between standard scores may not ac- 
tually reflect reliable differences, and therefore may not be 
clinically meaningful. Furthermore, statistically significant or 
reliable differences between test scores may be common in a 
reference sample; therefore, the baserate of differences must 
also be considered, depending on the level of the scores (an IQ 
of 90 versus 110 as compared to 110 versus 130). One should 
also keep in mind that when test scores are not normally dis- 
tributed, standardized scores may not accurately reflect actual 
population rank. In these circumstances, differences between 
standard scores may be misleading. 
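The text does not give a formula here, but one common textbook approach to the reliability question (not necessarily the Crawford and Garthwaite procedure cited above) is to compare an observed difference against the standard error of the difference between two scores expressed on the same metric. The sketch below assumes both scores are T scores and that the reliabilities shown are hypothetical.

```python
import math

def se_difference(sd, r_xx, r_yy):
    # Standard error of the difference between two scores on the same metric,
    # given each test's reliability (classical test theory approximation)
    return sd * math.sqrt(2.0 - r_xx - r_yy)

# Hypothetical example: two T scores (SD = 10) with reliabilities .90 and .80
se_diff = se_difference(10.0, 0.90, 0.80)
observed_diff = 8.0
print(round(se_diff, 2))               # about 5.48
print(observed_diff / se_diff > 1.96)  # False: an 8-point gap is not a reliable difference here
```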

Note also that comparability across tests does not imply 
equality in meaning and relative importance of scores. For ex- 
ample, one may compare standard scores on measures of 
pitch discrimination and intelligence, but it will rarely be the 
case that these scores are of equal clinical or practical meaning 
or significance. 



Interpreting Extreme Scores

A final critical issue with respect to the meaning of standardized scores (e.g., z scores) has to do with extreme observations. In clinical practice, one may encounter standard scores that are
either extremely low or extremely high. The meaning and com- 
parability of such scores will depend critically on the charac- 
teristics of the normative sample from which they derive. 

For example, consider a hypothetical case in which an ex- 
aminee obtains a raw score that is below the range of scores 
found in a normal sample. Suppose further that the SD in the 
normal sample is very small and thus the examinee's raw score 
translates to a z score of —5, indicating that the probability of 
encountering this score in the normal population would be 3 
in 10 million (i.e., a percentile ranking of .00003). This repre- 
sents a considerable extrapolation from the actual normative 
data, as (1) the normative sample did not include 10 million individuals, and (2) not a single individual in the normative sample obtained a score anywhere close to the examinee's score.
The percentile value is therefore an extrapolation and confers 
a false sense of precision. While one may be confident that 
it indicates impairment, there may be no basis to assume that 
it represents a meaningfully "worse" performance than a z 
score of -3, or of —4. 

The estimated prevalence value of an obtained z score (or 
T score, etc.) can be calculated to determine whether interpre- 
tation of extreme scores may be appropriate. This is simply ac- 
complished by inverting the percentile score corresponding to 
the z score (i.e., dividing 1 by the percentile score). For exam- 
ple, a z score of —4 is associated with an estimated frequency of 
occurrence or prevalence of approximately 0.00003. Dividing 1 
by this value gives a rounded result of 31,560. Thus, the esti- 
mated prevalence value of this score in the population is 1 in 
31,560. If the normative sample from which a z score is derived 
is considerably smaller than the denominator of the estimated 
prevalence value (i.e., 31,560 in the example), then some cau- 
tion may be warranted in interpreting the percentile. In addi- 
tion, whenever such extreme scores are being interpreted, 
examiners should also verify that the examinee's raw score falls 
within the range of raw scores in the normative sample. If the 
normative sample size is substantially smaller than the esti- 
mated prevalence sample size and the examinee's score falls 
outside the sample range, then considerable caution may be 
indicated in interpreting the percentile associated with the 
standardized score. Regardless of the z score value, it must also 
be kept in mind that interpretation of the associated percentile 
value may not be justifiable if the normative sample has a sig- 
nificantly non-normal distribution (see later for further dis- 
cussion of non-normality). In sum, the clinical interpretation 
of extreme scores depends to a large extent on the properties of 
the normal samples involved; one can have more confidence 
that the percentile is reasonably accurate if the normative sam- 
ple is large and well constructed and the shape of the norma- 
tive sample distribution is approximately normal, particularly 
in tail regions where extreme scores are found. 
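A small sketch of the prevalence arithmetic just described; the lower-tail probability is computed from the error function, and the z value of −4 is the example used in the text.

```python
import math

def lower_tail_probability(z):
    # Proportion of a normal distribution falling at or below z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = -4.0
p = lower_tail_probability(z)   # roughly 0.00003
prevalence = round(1.0 / p)     # roughly 1 case in 31,574 (the text rounds to 31,560)
print(p, prevalence)
```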






The Normal Curve and Test Construction 

Figure 1-3 Skewed distributions (panels show positive skew and negative skew).

Although the normal curve is from many standpoints an ideal or even expected distribution for psychological data, test score samples do not always conform to a normal distribution.
When a new test is constructed, non-normality can be "cor- 
rected" by examining the distribution of scores on the proto- 
type test, adjusting test properties, and resampling until a 
normal distribution is reached. For example, when a test is 
first administered during a try-out phase and a positively 
skewed distribution is obtained (i.e., with most scores cluster- 
ing at the tail end of the distribution), the test likely has too 
high a floor, causing most examinees to obtain low scores. 
Easy items can then be added so that the majority of scores 
fall in the middle of the distribution rather than at the lower 
end (Anastasi & Urbina, 1997). When this is successful, the 
greatest numbers of individuals obtain about 50% of items 
correct. This level of difficulty usually provides the best differ- 
entiation between individuals at all ability levels (Anastasi & 
Urbina, 1997). 

It must be noted that a test with a normal distribution in 
the general population may show extreme skew or other di- 
vergence from normality when administered to a population 
that differs considerably from the average individual. For ex- 
ample, a vocabulary test that produces normally distributed 
scores in a general sample of individuals may display a neg- 
atively skewed distribution due to a low ceiling when admin- 
istered to doctoral students in literature, and a positively 
skewed distribution due to a high floor when administered to 
preschoolers from recently immigrated, Spanish-speaking 
families (see Figure 1-3 for examples of positive and negative 
skew). In this case, the test would be incapable of effectively 
discriminating between individuals within either group be- 
cause of ceiling effects and floor effects, respectively, even 
though it is of considerable utility in the general population. 
Thus, a test's distribution, including floors and ceilings, must 
always be considered when assessing individuals who differ 
from the normative sample in terms of characteristics that af- 
fect test scores (e.g., in this example, degree of exposure to En- 
glish words). In addition, whether a test produces a normal 
distribution (i.e., without positive or negative skew) is also an 
important aspect of evaluating tests for bias across different 
populations (see Chapter 2 for more discussion of bias). 

Depending on the characteristics of the construct being 
measured and the purpose for which a test is being designed, a 
normal distribution of scores may not be obtainable or even 
desirable. For example, the population distribution of the con- 
struct being measured may not be normally distributed. Alter- 
natively, one may want only to identify and/or discriminate 
between persons at only one end of a continuum of abilities 



(e.g., a creativity test for gifted students). In this case, the 
characteristics of only one side of the sample score distribu- 
tion (i.e., the upper end) are critical, while the characteristics 
on the other side of the distribution are of no particular con- 
cern. The measure may even be deliberately designed to have 
floor or ceiling effects. For example, if one is not interested in 
one tail (or even one-half) of the distribution, items that 
would provide discrimination in that region may be omitted 
to save administration time. In this case, a test with a high 
floor or low ceiling in the general population (and with posi- 
tive or negative skew) may be more desirable than a test with a 
normal distribution. In most applications, however, a more 
normal-looking curve within the targeted subpopulation is 
usually desirable. 

Non-Normality 

Although the normal curve is an excellent model for psycho- 
logical data and many sample distributions of natural pro- 
cesses are approximately normal, it is not unusual for test 
score distributions to be markedly non-normal, even when 
samples are large (Micceri, 1989). 2 For example, neuropsy-
chological tests such as the Boston Naming Test (BNT) and 
Wisconsin Card Sorting Test (WCST) do not have normal dis- 
tributions when raw scores are examined, and, even when de- 
mographic correction methods are applied, some tests continue 
to show a non-normal, multimodal distribution in some pop- 
ulations (Fastenau, 1998). (An example of a non-normal dis- 
tribution is shown in Figure 1-4.) 

The degree to which a given distribution approximates the 
underlying population distribution increases as the number 
of observations (N) increases and becomes less accurate as N 
decreases. This has important implications for norms com- 
prised of small samples. Thus, a larger sample will produce a 
more normal distribution, but only if the underlying popu- 
lation distribution from which the sample is obtained is 
normal. In other words, a large N does not "correct" for non- 
normality of an underlying population distribution. However, 



Figure 1 -4 A non-normal test score distribution. 

Percentiles 
0.8 68 84 




50 

Raw Score 
Mean = 50, SD = 



10 






small samples may yield non-normal distribution due to 
random sampling effects, even though the population from 
which the sample is drawn has a normal distribution. That 
is, one may not automatically assume, given a non-normal 
distribution in a small sample, that the population distribu- 
tion is in fact non-normal (note that the converse may also 
be true). 

Several factors may lead to non-normal test score distribu- 
tions: (a) the existence of discrete subpopulations within the 
general population with differing abilities, (b) ceiling or floor 
effects, and (c) treatment effects that change the location of 
means, medians, and modes and affect variability and distri- 
bution shape (Micceri, 1989).

Skew 

As with the normal curve, some varieties of non-normality 
may be characterized mathematically. Skew is a formal mea- 
sure of asymmetry in a frequency distribution that can be cal- 
culated using a specific formula (see Nunnally & Bernstein, 
1994). It is also known as the third moment of a distribution 
(the mean and variance are the first and second moments, re- 
spectively). A true normal distribution is perfectly symmetri- 
cal about the mean and has a skew of zero. A non-normal but 
symmetric distribution will have a skew value that is near 
zero. Negative skew values indicate that the left tail of the dis- 
tribution is heavier (and often more elongated) than the right 
tail, which may be truncated, while positive skew values indi- 
cate that the opposite pattern is present (see Figure 1-3). 
When distributions are skewed, the mean and median are not 
identical because the mean will not be at the midpoint in rank 
and z scores will not accurately translate into sample per- 
centile rank values. The error in mapping of z scores to sam- 
ple percentile ranks increases as skew increases. 
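Skew can also be computed directly from a sample. The sketch below uses the conventional third standardized moment, which is one of several definitions in use and not necessarily the exact formula given in Nunnally and Bernstein; the data are hypothetical.

```python
def skewness(scores):
    # Third standardized moment: near 0 for a symmetric sample,
    # positive when the right tail is heavier, negative when the left tail is
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in scores) / n

print(skewness([1, 1, 2, 2, 3, 10]))  # positive: long right tail
```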

Truncated Distributions 

Significant skew often indicates the presence of a truncated 
distribution. This may occur when the range of scores is re- 
stricted on one side but not the other, as is the case, for exam- 
ple, with reaction time measures, which cannot be lower than 
several hundred milliseconds, but can reach very high positive 
values in some individuals. In fact, distributions of scores 
from reaction time measures, whether aggregated across trials 
on an individual level or across individuals, are often charac- 
terized by positive skew and positive outliers. Mean values 
may therefore be positively biased with respect to the "central 
tendency" of the distribution as defined by other indices, such 
as the median. Truncated distributions are also commonly seen 
on error scores. A good example of this is Failure to Maintain 
Set (FMS) scores on the WCST (see review in this volume). 
In the normative sample of 30- to 39-year-old persons, observed raw scores range from 0 to 21, but the majority of persons (84%) obtain scores of 0 or 1, and less than 1% obtain scores greater than 3.



Floor/Ceiling Effects 

Floor and ceiling effects may be defined as the presence of 
truncated tails in the context of limitations in range of item 
difficulty. For example, a test may be said to have a high floor 
when a large proportion of the examinees obtain raw scores at 
or near the lowest possible score. This may indicate that the 
test lacks a sufficient number and range of easier items. Con- 
versely, a test may be said to have a low ceiling when the oppo- 
site pattern is present (i.e., when a high number of examinees 
obtain raw scores at or near the highest possible score). Floor 
and ceiling effects may significantly limit the usefulness of a 
measure. For example, a measure with a high floor may not be 
suitable for use with low functioning examinees, particularly 
if one wishes to delineate level of impairment. 
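A quick way to screen a sample for the floor and ceiling effects described above is to look at the proportion of examinees sitting at the score limits; the data and the 20% threshold below are arbitrary illustrative assumptions, not published criteria.

```python
def floor_and_ceiling_rates(scores, min_score, max_score):
    # Proportion of examinees at the lowest and highest possible raw scores
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return at_floor, at_ceiling

scores = [0, 0, 0, 1, 2, 2, 3, 5, 8, 13]
floor_rate, ceiling_rate = floor_and_ceiling_rates(scores, 0, 30)
print(floor_rate > 0.20)   # True: 30% of this hypothetical sample sits at the floor
```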



Multimodality and Other Types 
of Non-Normality 

Multimodality is the presence of more than one "peak" in a 
frequency distribution (see histogram in Figure 1-4 for an ex- 
ample). Another form of significant non-normality is the uni-
form or near-uniform distribution (a distribution with no or 
minimal peak and relatively equal frequency across scores). 
When such distributions are present, linearly transformed 
scores (z scores, T scores, and other deviation scores) may be 
totally inaccurate with respect to actual sample/population 
percentile rank and should not be interpreted in that frame- 
work. In these cases, sample-derived rank percentile scores 
may be more clinically useful. 

Non-Normality and Percentile Derivations 

Non-normality is not trivial; it has major implications for 
derivation and interpretation of standard scores and compar- 
ison of such scores across tests: standardized scores derived by 
linear transformation (e.g., z scores) will not correspond to 
sample percentiles, and the degree of divergence may be quite 
large. 

Consider the histogram in Figure 1-4, which shows the 
distribution of scores obtained for a hypothetical test. This 
test, with a sample size of 1000, has a mean raw score of 50 
and a standard deviation of 10; therefore (and very conve- 
niently), no linear transformation is required to obtain T 
scores. An expected normal distribution based on the ob- 
served mean and standard deviation has been overlaid on the 
observed histogram for purposes of comparison. 

The histogram in Figure 1-4 shows that the distribution of 
scores for the hypothetical test is grossly non-normal, with a 
truncated lower tail and significant positive skew, indicating 
floor effects and the existence of two distinct subpopulations. 
If the distribution were normal (i.e., if we follow the normal 
curve, superimposed on the histogram in Figure 1-4, instead 
of the histogram itself), a raw score of 40 would correspond 
to a T score of 40, a score that is 1 SD or 10 points from the 






mean, and translate to the 16th percentile (percentile not 
shown in the graph). However, when we calculate a percentile 
for the actual score distribution (i.e., the histogram), a score 
of 40 is actually below the 1st percentile with respect to 
the observed sample distribution (percentile = 0.8). Clearly, 
the difference in percentiles in this example is not trivial and 
has significant implications for score interpretation. 
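The divergence described above can be checked empirically by comparing the percentile implied by a normal curve with the percentile actually observed in the sample. The sketch below uses a small, hypothetical positively skewed sample rather than the 1,000-case distribution of Figure 1-4.

```python
import math

def normal_percentile(x, mean, sd):
    # Percentile assuming the scores are normally distributed
    z = (x - mean) / sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_percentile(x, scores):
    # Percentage of the observed sample scoring at or below x
    return 100.0 * sum(1 for s in scores if s <= x) / len(scores)

scores = [42, 44, 45, 45, 46, 47, 48, 50, 55, 78]   # positively skewed sample
mean = sum(scores) / len(scores)
sd = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
print(round(normal_percentile(45, mean, sd), 1))    # percentile under the normal assumption
print(round(empirical_percentile(45, scores), 1))   # observed sample rank; the two disagree
```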

Normalizing Test Scores 

When confronted with problematic score distributions, many 
test developers employ "normalizing" transformations in an 
attempt to correct departures from normality (examples of 
this can be found throughout this volume, in the Normative 
Data section for tests reviewed). Although helpful, these pro- 
cedures are by no means a panacea, as they often introduce 
problems of their own with respect to interpretation. Addi- 
tionally, many test manuals contain only a cursory discussion 
of normalization of test scores. Anastasi and Urbina (1997) 
state that scores should only be normalized if: (1) they come 
from a large and representative sample, or (2) any deviation 
from normality arises from defects in the test rather than 
characteristics of the sample. Furthermore, as we have noted 
above, it is preferable to adjust score distributions prior to 
normalization by modifying test content (e.g., by adding or 
modifying items) rather than statistically transforming non- 
normal scores into a normal distribution. Although a detailed 
discussion of normalization procedures is beyond the scope 
of this chapter (interested readers are referred to Anastasi & 
Urbina, 1997), ideally, test makers should describe in detail 
the nature of any significant sample non-normality and the 
procedures used to correct it for derivation of standardized 
scores. The reasons for correction should also be justified, and 
direct percentile conversions based on the uncorrected sample 
distribution should be provided as an option for users. De- 
spite the limitations inherent in correcting for non-normality, 
Anastasi and Urbina (1997) note that most test developers 
will probably continue to do so because of the need to use test 
scores in statistical analyses that assume normality of distri- 
butions. From a practical point of view, test users should be 
aware of the mathematical computations and transforma- 
tions involved in deriving scores for their instruments. When 
all other things are equal, test users should choose tests that 
provide information on score distributions and any proce- 
dures that were undertaken to correct non-normality, over 
those that provide partial or no information. 
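Normalizing transformations are typically rank-based: each raw score is replaced by the z score whose normal-curve percentile matches that score's rank in the sample. The sketch below shows one simple rank-based version using Python's standard library (Python 3.8+); the (rank − 0.5)/n convention and the tiny data set are illustrative assumptions, and test publishers' exact procedures vary.

```python
from statistics import NormalDist

def normalized_z_scores(raw_scores):
    # Rank-based normalization: map each score's sample rank to the z score
    # holding the same position under a normal curve
    n = len(raw_scores)
    order = sorted(raw_scores)
    z_for_score = {}
    for score in set(raw_scores):
        # mid-rank convention (rank - 0.5) / n, averaging ranks for tied scores
        ranks = [i + 1 for i, s in enumerate(order) if s == score]
        mid_rank = sum(ranks) / len(ranks)
        z_for_score[score] = NormalDist().inv_cdf((mid_rank - 0.5) / n)
    return [round(z_for_score[s], 2) for s in raw_scores]

print(normalized_z_scores([3, 5, 5, 6, 9, 30]))  # the outlying raw score of 30 is pulled in
```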

Extrapolation/Interpolation 

Despite all the best efforts, there are times when norms fall 
short in terms of range or cell size. This includes missing data 
in some cells, inconsistent age coverage, or inadequate demo- 
graphic composition of some cells compared to the popula- 
tion. In these cases, data are often extrapolated or interpolated 
using the existing score distribution and techniques such as 



multiple regression. For example, Heaton and colleagues have 
published sets of norms that use multiple regression to cor- 
rect for demographic characteristics and compensate for few 
subjects in some cells (Heaton et al., 2003). Although multiple 
regression is robust to slight violations of assumptions, esti- 
mation errors may occur when using normative data that vio- 
lates the assumptions of homoscedasticity (uniform variance 
across the range of scores) and normal distribution of scores 
necessary for multiple regression (Fastenau & Adams, 1996; 
Heaton et al., 1996).

Age extrapolations beyond the bounds of the actual ages of 
the individuals in the samples are also sometimes seen in nor- 
mative data sets, based on projected developmental curves. 
These norms should be used with caution due to the lack of 
actual data points in these age ranges. Extrapolation methods, 
such as those that employ regression techniques, depend on 
the shape of the distribution of scores. Including only a subset 
of the distribution of age scores in the regression (e.g., by 
omitting very young or very old individuals) may change the 
projected developmental slope of certain tests dramatically. 
Tests that appear to have linear relationships, when consid- 
ered only in adulthood, may actually have highly positively 
skewed binomial functions when the entire age range is con- 
sidered. One example is vocabulary, which tends to increase 
exponentially during the preschool years, shows a slower 
rate of progress during early adulthood, remains relatively 
stable with continued gradual increase, and then shows a mi- 
nor decrease with advancing age. If only a subset of the age 
range (e.g., adults) is used to estimate performance at the tail 
ends of the distribution (e.g., preschoolers and elderly), the 
estimation will not fit the shape of the actual distribution. 

Thus, normalization may introduce error when the rela- 
tionship between a test and a demographic variable is non- 
linear. In this case, linear correction using multiple regression 
distorts the true relationship between variables (Fastenau, 
1998). 



MEASUREMENT PRECISION: RELIABILITY 
AND STANDARD ERROR 

Like all forms of measurement, psychological tests are not 
perfectly precise; rather, test scores must be seen as estimates 
of abilities or functions, each associated with some degree of 
measurement error. 3 Each test differs in the precision of the 
scores that it produces. Of critical importance is the fact 
that no test has only one specific level of precision. Rather, 
precision always varies to some degree, and potentially sub- 
stantially, across different populations and test-use settings. 
Therefore, estimates of measurement error relevant to specific 
testing circumstances are a prerequisite for correct interpreta- 
tion. For example, even the most precise test may produce 
highly imprecise results if administered in a nonstandard 
fashion, in a nonoptimal environment, or to an uncoopera- 
tive examinee. Aside from these obvious caveats, a few basic 






principles help in determining whether a test generally pro- 
vides precise measurements in most situations where it will be 
used. We begin with an overview of the related concepts of re- 
liability, true scores, obtained scores, the various estimates of 
measurement error, and the notion of confidence intervals. 
These are reviewed below. 

Definition of Reliability 

Reliability refers to the consistency of measurement of a given 
test and can be defined in several ways, including consistency 
within itself (internal consistency reliability), consistency over 
time (test-retest reliability), consistency across alternate forms 
(alternate form reliability), and consistency across raters (in- 
terrater reliability). Indices of reliability indicate the degree to 
which a test is free from measurement error (or the propor- 
tion of variance in observed scores attributable to variance in 
true scores). The interpretation of such indices is often not so 
straightforward. 

It is important to note that the term "error" in this context 
does not actually refer to "incorrect" or "wrong" information. 
Rather, "error" consists of the multiple sources of variability 
that affect test scores. What may be termed error variance in 
one application may be considered part of the true score in 
another, depending on the construct being measured (state or 
trait), the nature of the test employed, and whether it is 
deemed relevant or irrelevant to the purpose of the testing 
(Anastasi & Urbina, 1997). An example relevant to neuropsy- 
chology is that internal reliability coefficients tend to be 
smaller at either end of the age continuum. This finding has 
been attributed to both limitations of tests (e.g., measurement 
error) and increased intrinsic performance variability among 
very young and very old examinees. 

Factors Affecting Reliability 

Reliability coefficients are influenced by (a) test characteristics 
(e.g., length, item type, item homogeneity, and influence of 
guessing) and (b) sample characteristics (e.g., sample size, 
range, and variability). The extent of a test's "clarity" is inti- 
mately related to its reliability: reliable measures typically 
have (a) clearly written items, (b) easily understood test in- 
structions, (c) standardized administration conditions, (d) 
explicit scoring rules that minimize subjectivity, and (e) a 
process for training raters to a performance criterion (Nun- 
nally & Bernstein, 1994). For a list of commonly used reliabil- 
ity coefficients and their associated sources of error variance, 
see Table 1-2. 



Internal Reliability 

Internal reliability reflects the extent to which items within a 
test measure the same cognitive domain or construct. It is a 
core index in classical test theory. A measure of the intercorre- 
lation of items, internal reliability is an estimate of the corre- 
lation between randomly parallel test forms, and by extension, 



Table 1-2 Sources of Error Variance in Relation to Reliability Coefficients

Type of Reliability Coefficient      Error Variance
Split-half                           Content sampling
Kuder-Richardson                     Content sampling
Coefficient alpha                    Content sampling
Test-retest                          Time sampling
Alternate-form (immediate)           Content sampling
Alternate-form (delayed)             Content sampling and time sampling
Interrater                           Interscorer differences

Source: Adapted from Anastasi & Urbina, 1997, with permission.



of the correlation between test scores and true scores. This is 
why it is used for estimating true scores and associated stan- 
dard errors (Nunnally & Bernstein, 1994). All things being 
equal, longer tests will generally yield higher reliability esti- 
mates (Sattler, 2001). Internal reliability is usually assessed 
with some measure of the average correlation among items 
within a test (Nunnally & Bernstein, 1994). These include the 
split-half or Spearman-Brown reliability coefficient (obtained 
by correlating two halves of items from the same test) and co- 
efficient alpha, which provides a general estimate of reliability 
based on all the possible ways of splitting test items. Alpha is 
essentially based on the average intercorrelation between test 
items and any other set of items, and is used for tests with 
items that yield more than two response types (i.e., possible 
scores of 0, 1, or 2). For additional useful references concern- 
ing alpha, see Cronbach (2004) and Streiner (2003a, 2003b). 
The Kuder-Richardson reliability coefficient is used for items 
with yes/no answers or heterogeneous tests where split-half 
methods must be used (i.e., the mean of all the different split- 
half coefficients if the test were split into all possible ways). 
Generally, Kuder-Richardson coefficients will be lower than 
split-half coefficients when tests are heterogeneous in terms of 
content (Anastasi & Urbina, 1997). 
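A brief Python sketch of these two internal consistency estimates follows; the item responses are simulated for illustration and do not come from any actual test.

import numpy as np

def spearman_brown_split_half(items):
    """Correlate odd- and even-item half scores, then apply the
    Spearman-Brown correction for full test length."""
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

def coefficient_alpha(items):
    """Cronbach's alpha: based on the ratio of summed item variances
    to the variance of the total score."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Simulated 20-item test taken by 200 examinees (illustrative only).
rng = np.random.default_rng(2)
ability = rng.normal(0, 1, (200, 1))
items = (ability + rng.normal(0, 1, (200, 20)) > 0).astype(float)  # 0/1 items

print(f"Split-half (Spearman-Brown): {spearman_brown_split_half(items):.2f}")
print(f"Coefficient alpha:           {coefficient_alpha(items):.2f}")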

The Special Case of Speed Tests 

Tests involving speed, where the score exclusively depends on 
the number of items completed within a time limit rather 
than the number correct, will cause spuriously high internal 
reliability estimates if internal reliability indices such as split- 
half reliability are used. For example, dividing the items into 
two halves to calculate a split-half reliability coefficient will 
yield two half- tests with 100% item completion rates, whether 
the individual obtained a score of 4 (i.e., yielding two half- 
tests totaling 2 points each, or perfect agreement) or 44 (i.e., 
yielding two half-tests both totaling 22 points, also yielding 
perfect agreement). The result in both cases is a split-half reli- 
ability of 1.00 (Anastasi & Urbina, 1997). Some alternatives 
are to use test-retest reliability or alternate form reliability, 
ideally with the alternate forms administered in immediate 
succession to avoid time sampling error. Reliabilities can also 






be calculated for any test that can be divided into specific time 
intervals; scores per interval can then be compared in a proce- 
dure akin to the split-half method, as long as items are of rela- 
tively equivalent difficulty (Anastasi & Urbina, 1997). For 
most of the speed tests reviewed in this volume, reliability is 
estimated by using the test-retest reliability coefficient, or else 
by a generalizability coefficient (see below). 

Test-Retest Reliability 

Test-retest reliability, also known as temporal stability, pro- 
vides an estimate of the correlation between two test scores 
from the same test administered at two different points in time. 
A test with good temporal stability should show little change 
over time, providing that the trait being measured is stable and 
there are no differential effects of prior exposure. It is impor- 
tant to note that tests measuring dynamic (i.e., changeable) 
abilities will by definition produce lower test-retest reliabilities 
than tests measuring domains that are more trait-like and sta- 
ble (Nunnally & Bernstein, 1994). See Table 1-3 for common 
sources of bias and error in test-retest situations. 

A test has an infinite number of possible test-retest reliabil- 
ities, depending on the length of the time interval between 
testing. In some cases, reliability estimates are inversely related 
to the time interval between baseline and retest (Anastasi & 
Urbina, 1997). In other words, the shorter the time interval 
between test and retest, the higher the reliability coefficient 
will be. However, the extent to which the time interval affects 
the test-retest coefficient will depend on the type of ability 
evaluated (i.e., stable versus more variable). Reliability may 
also depend on the type of individual being assessed, as some 
groups are intrinsically more variable over time than others. 
For example, the extent to which scores fluctuate over time 
may depend on subject characteristics, including age (e.g., 
normal preschoolers will show more variability than adults) 
and neurological status (e.g., TBI examinees' scores may vary 
more in the acute state than in the post-acute state). Ideally, 
reliability estimates should be provided for both normal indi- 
viduals and the clinical populations in which the test is in- 
tended to be used, and the specific demographic characteristics 
of the samples should be fully specified. Test stability coeffi- 
cients presented in published test manuals are usually derived 
from relatively small normal samples tested over much 
shorter intervals than are typical for retesting in clinical prac- 
tice and should therefore be considered with due caution 
when drawing inferences regarding clinical cases. However, 
there is some evidence that duration of interval has less of 
an impact on test-retest scores than subject characteristics 
(Dikmen et al, 1999). 

Prior Exposure and Practice Effects 

Variability in scores on the same test over time may be related 
to situational variables such as examinee state, examiner state, 
examiner identity (same versus different examiner at retest), 
or environmental conditions that are often unsystematic and 



Table 1-3 Common Sources of Bias and Error in Test-Retest Situations

Bias
  Intervening variables          Events of interest (e.g., surgery, medical intervention, rehabilitation); extraneous events
  Practice effects               Memory for content; procedural learning; other factors: (a) familiarity with testing context and examiner, (b) performance anxiety
  Demographic considerations     Age (maturational effects and aging); education; gender; ethnicity; baseline ability
Error
  Statistical errors             Measurement error (SEM); regression to the mean (SE_E)
  Random or uncontrolled events

Source: From Lineweaver & Chelune, 2003, p. 308. Reprinted with permission from Elsevier.



may or may not be considered sources of measurement error. 
Apart from these variables, one must consider, and possibly 
parse out, effects of prior exposure, which are often conceptu- 
alized as involving implicit or explicit learning. Hence the 
term practice effects is often used. However, prior exposure to 
a test does not necessarily lead to increased performance at 
retest. Note also that the actual nature of the test may some- 
times change with exposure. For instance, tests that rely on a 
"novelty effect" and/or require deduction of a strategy or 
problem solving (e.g., WCST, Tower of London) may not be 
conducted in the same way once the examinee has prior fa- 
miliarity with the testing paradigm. 

Like some measures of problem-solving abilities, measures 
of learning and memory are also highly susceptible to practice 
effects, though these are less likely to reflect a fundamental 
change in how examinees approach tasks. In either case, prac- 
tice effects may lead to low test-retest correlations by effec- 
tively lowering the ceiling at retest, resulting in a restriction of 
range (i.e., many examinees obtain scores at or near the maxi- 
mum possible at retest). Nevertheless, restriction of range 
should not be assumed when test-retest correlations are low 
until this has been verified by inspection of data. 

The relationship between prior exposure and test stability 
coefficients is complex, and although test-retest coefficients 
may be affected by practice or prior exposure, the coefficient 
does not indicate the magnitude of such effects. That is, retest 
correlations will be very high when individual retest scores all 
change by a similar amount, whether the practice effect is nil or 
very large. When stability coefficients are low, there may be (1) no systematic effect of prior exposure, (2) a nonlinear relation between prior exposure and retest performance, or (3) ceiling effects/restriction of range related to prior exposure that attenuate the coefficient. For example, certain subgroups may benefit 
more from prior exposure to test material than others (e.g., 
high-IQ individuals; Rapport et al., 1998), or some subgroups 
may demonstrate more stable scores or consistent practice ef- 
fects than do others. This causes the score distribution to 
change at retest (effectively "shuffling" the individuals' rank- 
ings in the distribution), which will attenuate the correlation. 
In these cases, the test-retest correlation may vary significantly 
across subgroups and the correlation for the entire sample 
will not be the best estimate of reliability for any of the sub- 
groups, overestimating reliability for some and underestimat- 
ing reliability for others. In some cases, practice effects, as 
long as they are relatively systematic and accurately assessed, 
will not render a test unusable from a reliability perspective, 
though they should always be taken into account when retest 
scores are interpreted. In addition, individual factors must 
always be considered. For example, while improved perfor- 
mance may usually be expected with a particular measure, an 
individual examinee may approach tests that he or she had 
difficulty with previously with heightened anxiety that leads 
to decreased performance. Lastly, it must be kept in mind that 
factors other than prior exposure (e.g., changes in environ- 
ment or examinee state) may affect test-retest reliability. 

Alternate Forms Reliability 

Some investigators advocate the use of alternate forms to 
eliminate the confounding effects of practice when a test must 
be administered more than once (e.g., Anastasi & Urbina, 
1997). However, this practice introduces a second form of er- 
ror variance into the mix (i.e., content sampling error), in ad- 
dition to the time sampling error inherent in test-retest 
paradigms (see Table 1-3; see also Lineweaver & Chelune, 
2003). Thus, tests with alternate forms must have extremely 
high correlations between forms in addition to high test-retest 
reliability to confer any advantage over using the same form 
administered twice. Moreover, they must demonstrate equiva- 
lence in terms of mean scores from test to retest, as well as 
consistency in score classification within individuals from test 
to retest. Furthermore, alternate forms do not necessarily 
eliminate effects of prior exposure, as exposure to stimuli and 
procedures can confer some positive carry-over effect (e.g., 
procedural learning) despite the use of a different set of items. 
These effects may be minimal across some types of well- 
constructed parallel forms, such as those assessing acquired 
knowledge. For measures such as the WCST, where specific 
learning and problem solving are involved, it may be difficult 
or impossible to produce an equivalent alternate form that 
will be free of effects of prior exposure to the original form. 
While it is possible to attain this degree of psychometric so- 
phistication through careful item analysis, reliability studies, 
and administration to a representative normative group, it is 
rare for alternate forms to be constructed with the same psy- 
chometric rigor as were the original forms from which they 



were derived. Even well-constructed alternate forms often lack 
crucial validation evidence such as similar correlations to cri- 
terion measures as the original test form. This is especially 
true for older neuropsychological tests, particularly those 
with original forms that were never subjected to any item 
analysis or reliability studies whatsoever (e.g., BVRT). Inade- 
quate test construction and psychometric properties are also 
found for alternate forms in more general published tests in 
common usage (e.g., WRAT-3). Because so few alternate forms are available, and because few of those that exist meet these psychometric standards, our tendency is to use reliable change 
indices or standardized regression-based scores for estimating 
change from test to retest. 

Interrater Reliability 

Most test manuals provide specific and detailed instructions 
on how to administer and score tests according to standard 
procedures to minimize error variance due to different exam- 
iners and scorers. However, some degree of examiner variance 
remains in individually administered tests, particularly when 
scores involve a degree of judgment (e.g., multiple-response verbal tests such as the Wechsler Vocabulary subtests, which require the rater to assign a score from 0 to 2). In this case, 
an estimate of the reliability of administration and scoring 
across examiners is needed. 

Interrater reliability can be evaluated using percentage 
agreement, kappa, product-moment correlation, and intra- 
class correlation coefficient (Sattler, 2001). For any given test, 
Pearson correlations will provide an upper limit for the intra- 
class correlations, but intraclass correlations are preferred be- 
cause, unlike Pearson's r, they distinguish paired assessments made by the same set of examiners from those made by different examiners. Thus, the intraclass correlation 
distinguishes those sets of scores ranked in the same order 
from those that are ranked in the same order but have low, 
moderate, or complete agreement with each other, and cor- 
rects for interexaminer or test-retest agreement expected by 
chance alone (Cicchetti & Sparrow, 1981). However, advan- 
tages of the Pearson correlation are that it is familiar, is readily 
interpretable, and can be easily compared using standard sta- 
tistical techniques; it is best for evaluating consistency in 
ranking rather than agreement per se (Fastenau et al., 1996). 
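The difference is easy to demonstrate. In the Python sketch below (with invented ratings), a constant scoring offset between two raters leaves Pearson's r at 1.00 but lowers one common absolute-agreement variant, ICC(2,1), computed here by hand from the usual ANOVA mean squares.

import numpy as np

def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC(2,1),
    computed from the standard ANOVA mean squares."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # subjects
    msc = ss_cols / (k - 1)                 # raters
    mse = ss_err / ((n - 1) * (k - 1))      # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two raters who rank examinees identically, but rater B scores 3 points higher.
rater_a = np.array([10, 12, 15, 18, 20, 22, 25, 27], dtype=float)
rater_b = rater_a + 3
ratings = np.column_stack([rater_a, rater_b])

print(f"Pearson r: {np.corrcoef(rater_a, rater_b)[0, 1]:.2f}")  # 1.00
print(f"ICC(2,1):  {icc_2_1(ratings):.2f}")                     # below 1.00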

Generalizability Coefficients 

One reliability coefficient type not covered in this list is the 
generalizability coefficient, which is starting to appear more 
frequently in test manuals, particularly in the larger test bat- 
teries (e.g., Wechsler scales and NEPSY). In generalizability 
theory, or G theory, reliability is evaluated by decomposing 
test score variance using the general linear model (e.g., vari- 
ance components analysis). This is a variant of the mathe- 
matical methods used to apportion variance in general linear 
model analyses such as ANOVA. In the case of G theory, the 
between-groups variance is considered an estimate of a true 






score variance and within-groups variance is considered an 
estimate of error variance. The generalizability coefficient is 
the ratio of estimated true variance to the sum of the esti- 
mated true variance and estimated error variance. A discus- 
sion of this flexible and powerful model is beyond the scope 
of this chapter, but detailed discussions can be found in 
Nunnally and Bernstein (1994) and Shavelson et al. (1989). 
Nunnally and Bernstein (1994) also discuss related issues 
pertinent to estimating reliabilities of variables reflecting 
sums such as composite scores, and the fact that reliabilities 
of difference scores based on correlated measures can be very 
low. 
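As a minimal illustration of this ratio, the following Python sketch estimates variance components for a simple one-facet (persons x items) design and forms the generalizability coefficient for the k-item average; the data are simulated, and the design is far simpler than those used in actual test manuals (for this one-facet case the result coincides with coefficient alpha).

import numpy as np

def g_coefficient(scores):
    """One-facet (persons x items) generalizability coefficient for the mean
    of k items: estimated person (true) variance over person variance plus
    residual (error) variance divided by k."""
    n, k = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)
    ss_p = k * ((person_means - grand) ** 2).sum()
    ss_i = n * ((item_means - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i
    ms_p = ss_p / (n - 1)
    ms_res = ss_res / ((n - 1) * (k - 1))
    var_p = (ms_p - ms_res) / k     # estimated true (person) variance
    var_res = ms_res                # estimated error variance
    return var_p / (var_p + var_res / k)

# Simulated scores for 100 examinees on 10 items/occasions (illustrative only).
rng = np.random.default_rng(3)
scores = rng.normal(0, 1, (100, 1)) + rng.normal(0, 0.8, (100, 10))
print(f"Generalizability coefficient: {g_coefficient(scores):.2f}")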






Evaluating a Test's Reliability 

A test cannot be said to have a single or overall level of relia- 
bility. Rather, tests can be said to exhibit different kinds of re- 
liability, the relative importance of which will vary depending 
on how the test is to be used. Moreover, each kind of reliabil- 
ity may vary across different populations. For instance, a test 
may be highly reliable in normally functioning adults, but be 
highly unreliable in young children or in individuals with 
neurological illness. It is important to note that while high re- 
liability is a prerequisite for high validity, the latter does not 
follow automatically from the former. For example, height 
can be measured with great reliability, but it is not a valid in- 
dex of intelligence. It is usually preferable to choose a test of 
slightly lesser reliability if it can be demonstrated that the test 
is associated with a meaningfully higher level of validity 
(Nunnally & Bernstein, 1994). 

Some have argued that internal reliability is more impor- 
tant than other forms of reliability; thus, if alpha is low but 
test-retest reliability is high, a test should not be considered 
reliable (Nunnally, 1978, as cited by Cicchetti, 1989). Note 
that it is possible to have low alpha values and high test-retest 
reliability (if a measure is made up of heterogeneous items 
but yields the same responses at retest), or low alpha values 
but high interrater reliability (if the test is heterogeneous in 
item content but yields highly consistent scores across 
trained experts; an example would be a mental status exami- 
nation). Internal consistency is therefore not necessarily the 
primary index of reliability, but should be evaluated within 
the broader context of test-retest and interrater reliability 
(Cicchetti, 1989). 

Some argue that test-retest reliability is not as important as 
other forms of reliability if the test will only be used once and 
is not likely to be administered again in future. However, de- 
pending on the nature of the test and retest sampling proce- 
dures (as discussed previously), stability coefficients may 
provide valuable insight into the replicability of test results, 
particularly as these coefficients are a gauge of "real-world" 
reliability rather than accuracy of measurement of true scores 
or hypothetical reliability across infinite randomly parallel 
forms (as is internal reliability). In addition, as was stated pre- 
viously, clinical decision making will almost always be based 
on the obtained score. Therefore, it is critically important to 



What Is an Adequate Reliability Coefficient? 

The reliability coefficient can be interpreted directly in terms 
of the percentage of score variance attributed to different 
sources (i.e., unlike the correlation coefficient, which must be 
squared). Thus, with a reliability of .85, 85% of the variance 
can be attributed to the trait being measured, and 15% can be 
attributed to error variance (Anastasi & Urbina, 1997). When 
all sources of variance are known for the same group (i.e., 
when one knows the reliability coefficients for internal, test- 
retest, alternate form, and interrater reliability on the same 
sample), it is possible to calculate the true score variance (for 
an example, see Anastasi & Urbina, 1997, pp. 101-102). As 
noted above, although a detailed discussion of this topic is be- 
yond the scope of this volume, the portioning of total score 
variance into components is the crux of generalizability the- 
ory of reliability, which forms the basis for reliability esti- 
mates for many well-known speed tests (e.g., Wechsler scale 
subtests such as Digit Symbol). 

Sattler (2001) notes that reliabilities of .80 or higher are 
needed for tests used in individual assessment. Tests used for 
decision making should have reliabilities of .90 or above. Nun- 
nally and Bernstein (1994) note that a reliability of .90 is a 
"bare minimum" for tests used to make important decisions 
about individuals (e.g., IQ tests), and .95 should be the optimal 
standard. When important decisions will be based on test 
scores (e.g., placement into special education), small score dif- 
ferences can make a great difference to outcome, and precision 
is paramount. They note that even with a reliability of .90, the 
SEM is almost one-third as large as the overall SD of test scores. 

Given these issues, what is a clinically acceptable level of 
reliability? According to Sattler (2001), tests with reliabilities 
below .60 are unreliable; those above .60 are marginally reli- 
able, and those above .70 are relatively reliable. Of note, tests 
with reliabilities of .70 may be sufficient in the early stages of 
validation research to determine whether the test correlates 
with other validation evidence; if so, additional effort can be 
expended to increase reliabilities to more acceptable levels 
(e.g., .80) by reducing measurement error (Nunnally & Bern- 
stein, 1994). In outcome studies using psychological tests, in- 
ternal consistencies of .80 to .90 and test-retest reliabilities of 
.70 are considered a minimum acceptable standard (Andrews 
et al., 1994; Burlingame et al, 1995). 




Table 1-4 Magnitude of Reliability Coefficients 

Magnitude of Coefficient 

Very high (.90+) 
High (.80-.89) 
Adequate (.70-.79) 
Marginal (.60-.69) 
Low (<.59) 



In terms of internal reliability of neuropsychological tests, 
Cicchetti et al. (1990) have proposed that internal consistency 
estimates of less than .70 are unacceptable, reliabilities be- 
tween .70 and .79 are fair, reliabilities between .80 and .89 are 
good, and reliabilities above .90 are excellent. 

For interrater reliabilities, Cicchetti and Sparrow (1981) 
report that clinical significance is poor for reliability coeffi- 
cients below .40, fair between .40 and .59, good between .60 
and .74, and excellent between .75 and 1.00. Fastenau et al. 
(1996), in summarizing guidelines on the interpretation of in- 
traclass correlations and kappa coefficients for interrater reli- 
ability, consider coefficients larger than .60 as substantial and 
of .75 or .80 as almost perfect. 

These are the general guidelines that we have used 
throughout the text to evaluate the reliability of neuropsycho- 
logical tests (see Table 1-4) so that the text can be used as a 
reference when selecting tests with the highest reliability. 
Users should note that there is a great deal of variability with 
regard to the acceptability of reliability coefficients for neu- 
ropsychological tests, as perusal of this volume will indicate. 
In general, for tests involving multiple subtests and multiple 
scores (e.g., Wechsler scales, NEPSY, D-KEFS), including 
those derived from qualitative observations of performance 
(e.g., error analyses), the farther away a score gets from the 
composite score itself and the more difficult the score is to 
quantify, the lower the reliability. A quick review of the relia- 
bility data presented in this volume also indicates that verbal 
tests, with few exceptions, tend to have consistently higher re- 
liability than tests measuring other cognitive domains. 

Limits of Reliability 

Although it is possible to have a reliable test that is not valid for 
some purposes, the converse is not the case (see later). Further, 
it is also conceivable that there are some neuropsychological 
domains that simply cannot be measured reliably. Thus, even 
though there is the assumption that questionable reliability is 
always a function of the test, reliability may depend on the na- 
ture of the psychological process measured or on the nature of 
the population evaluated. For example, many of the executive 
functioning tests reviewed in this volume have relatively mod- 
est reliabilities, suggesting that this ability is difficult to assess 
reliably. Additionally, tests used in populations with high re- 
sponse variability, such as preschoolers, elderly individuals, or 
individuals with brain disorders, may invariably yield low reli- 
ability coefficients despite the best efforts of test developers. 



Lastly, as previously discussed, reliability coefficients do 
not provide complete information on the reproducibility of 
individual test scores. Thus, with regard to test-retest reliabil- 
ity, it is possible for a test to have high reliability (r= .80) but 
have retest means that are 10 points higher than baseline 
scores. Reliability coefficients do not provide information on 
whether individuals retain their relative place in the distribu- 
tion from baseline to retest. Procedures such as the Bland- 
Altman method (Altman & Bland, 1983; Bland & Altman, 
1986) are one way to determine the limits of agreement be- 
tween two assessments for individuals in a group. 



MEASUREMENT ERROR 

A good working understanding of conceptual issues and meth- 
ods of quantifying measurement error is essential for compe- 
tent clinical practice. We start our discussion of this topic with 
concepts arising from classical test theory. 

True Scores 

A central element of classical test theory is the concept of a 
true score, or the score an examinee would obtain on a mea- 
sure in the absence of any measurement error (Lord & Novick, 
1968). True scores can never be known. Instead, they are esti- 
mated, and are conceptually defined as the mean score an ex- 
aminee would obtain across an infinite number of randomly 
parallel forms of a test, assuming that the examinee's scores 
were not systematically affected by test exposure/practice or 
other time-related factors such as maturation (Lord & Novick, 
1968). In contrast to true scores, obtained scores are the actual 
scores yielded by tests. Obtained scores include any measure- 
ment error associated with a given test. 4 That is, they are the 
sum of true scores and error. 

In the classical model, the relation between obtained and 
true scores is expressed in the following formula, where error 
(e) is random and all variables are assumed to be normal in 
distribution: 



x = t + e    [3]

Where:

x = obtained score
t = true score
e = error

When test reliability is less than perfect, as is always the case, 
the net effect of measurement error across examinees is to 
bias obtained scores outward from the population mean. That 
is, scores above the mean are most likely to be higher than 
true scores, while those below the mean are most likely to be 
lower than true scores (Lord & Novick, 1968). Estimated true 
scores correct this bias by regressing obtained scores toward 
the normative mean, with the amount of regression depend- 
ing on test reliability and deviation of the obtained score from 
the mean. The formula for estimated true scores (t') is: 






t' = X + [r_xx(x − X)]    [4]

Where:

X = mean test score
r_xx = test reliability (internal consistency reliability in classical test theory)
x = obtained score

If working with z scores, the formula is simpler:

t'_z = r_xx(z_x)    [5]



Formula 4 shows that an examinee's estimated true score is 
the sum of the mean score of the group to which he or she be- 
longs (i.e., the normative sample) and the deviation of his or 
her obtained score from the normative mean weighted by test 
reliability (as derived from the same normative sample). Fur- 
ther, as test reliability approaches unity (i.e., r_xx = 1.0), esti- 
mated true scores approach obtained scores (i.e., there is little 
measurement error, so estimated true scores and obtained 
scores are nearly equivalent). Conversely, as test reliability ap- 
proaches zero (i.e., when a test is extremely unreliable and 
subject to excessive measurement error), estimated true scores 
approach the mean test score. That is, when a test is highly reli- 
able, greater weight is given to obtained scores than to the nor- 
mative mean score, but when a test is very unreliable, greater 
weight is given to the normative mean score than to obtained 
scores. Practically speaking, estimated true scores will always 
be closer to the mean than obtained scores are (except, of 
course, where the obtained score is at the mean). 

The Use of True Scores in Clinical Practice 

Although the true score model is abstract, it has practical util- 
ity and important implications for test score interpretation. 
For example, what may not be immediately obvious from for- 
mulas 4 and 5 is readily apparent in Table 1-5: estimated true 
scores translate test reliability (or lack thereof ) into the same 
metric as actual test scores. 

As can be seen in Table 1-5, the degree of regression to the 
mean of true scores is inversely related to test reliability and 
directly related to degree of deviation from the reference 
mean. This means that the more reliable a test is, the closer are 
obtained scores to true scores and that the further away the ob- 
tained score is from the sample mean, the greater the discrep- 



Table 1-5 Estimated True Score Values for Three Observed Scores at Three Levels of Reliability

                            Observed Scores (M = 100, SD = 15)
          Reliability       110        120        130
Test 1        .95           110        119        129
Test 2        .80           108        116        124
Test 3        .65           107        113        120

Note: Estimated true scores rounded to whole values.



ancy between true and obtained scores. For a highly reliable 
measure such as Test 1 (r= .95), true score regression is mini- 
mal, even when an obtained score lies a considerable distance 
from the sample mean; in this example, a standard score of 
130, or two SDs above the mean, is associated with an esti- 
mated true score of 129. In contrast, for a test with low relia- 
bility such as Test 3 (r=.65), true score regression is quite 
substantial. For this test, an obtained score of 130 is associated 
with an estimated true score of 120; in this case, fully one- 
third of the observed deviation is "lost" to regression when the 
estimated true score is calculated. 
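The values in Table 1-5 can be reproduced directly from formula 4; the few lines of Python below (for illustration only) make the regression toward the mean explicit.

def estimated_true_score(obtained, reliability, mean=100.0):
    """Formula 4: regress the obtained score toward the normative mean
    in proportion to test reliability."""
    return mean + reliability * (obtained - mean)

for label, rxx in [("Test 1", 0.95), ("Test 2", 0.80), ("Test 3", 0.65)]:
    row = [estimated_true_score(obt, rxx) for obt in (110, 120, 130)]
    print(f"{label} (r_xx = {rxx:.2f}):", [f"{t:.1f}" for t in row])

# Test 1 (r_xx = 0.95): ['109.5', '119.0', '128.5']
# Test 2 (r_xx = 0.80): ['108.0', '116.0', '124.0']
# Test 3 (r_xx = 0.65): ['106.5', '113.0', '119.5']
# Rounded to whole values, these match Table 1-5.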

Such information may have important implications with 
respect to interpretation of test results. For example, as shown 
in Table 1-5, as a result of differences in reliability, obtained 
scores of 120 on Test 1 and 130 on Test 3 are associated with 
essentially equivalent estimated true scores (i.e., 119 and 120, 
respectively). If only obtained scores are considered, one 
might interpret scores from Test 1 and Test 3 as significantly 
different, even though these "differences" actually disappear 
when measurement precision is taken into account. It should 
also be noted that such differences may not be limited to com- 
parisons of scores across different tests within the same indi- 
vidual, but may also apply to comparisons between scores 
from the same test across different individuals when the indi- 
viduals come from different groups and the test in question 
has variable reliability across those groups. 

Regression to the mean may also manifest as pronounced 
asymmetry of confidence intervals centered on true scores, 
relative to obtained scores, as discussed in more detail later. 
Although calculation of true scores is encouraged as a means 
of gauging the limitations of reliability, it is important to con- 
sider that any significant difference between characteristics of 
an examinee and the sample from which a mean sample score 
and reliability estimate were derived may invalidate the pro- 
cess. For example, in some cases it makes little sense to esti- 
mate true scores for severely brain-injured individuals on 
tests of cognition using test parameters from healthy norma- 
tive samples, as mean scores within the brain-injured popula- 
tion are likely to be substantially different from those seen in 
healthy normative samples; reliabilities may differ substan- 
tially as well. Instead, one may be justified in deriving esti- 
mated true scores using data from a comparable clinical sample 
if this is available. Overall, these issues underline the complex- 
ities inherent in comparing scores from different tests in dif- 
ferent populations. 

The Standard Error of Measurement 

Examiners may wish to quantify the margin of error associ- 
ated with using obtained scores as estimates of true scores. 
When the sample SD and the reliability of obtained scores are 
known, an estimate of the SD of obtained scores about true 
scores may be calculated. This value is known as the standard 
error of measurement, or SEM (Lord & Novick, 1968). More 
simply, the SEM provides an estimate of the amount of error 
in a person's observed score. It is a function of the reliability 






of the test and of the variability of scores within the sample. The SEM is inversely related to the reliability of the test: the greater the reliability of the test, the smaller the SEM, and the more confidence the examiner can have in the precision of the score. 

The SEM is defined by the following formula:

SEM = SD √(1 − r_xx)    [6]

Where:

SD = the standard deviation of the test, as derived from an appropriate normative sample
r_xx = the reliability coefficient of the test (usually internal reliability)

Confidence Intervals 

While the SEM can be considered on its own as an index of 
test precision, it is not necessarily intuitively interpretable, 5 
and there is often a tendency to focus excessively on test scores 
as point estimates at the expense of consideration of associ- 
ated estimation error ranges. Such a tendency to disregard 
imprecision is particularly inappropriate when interpreting 
scores from tests of lower reliability. Clinically, it may there- 
fore be very important to report, in a concrete and easily un- 
derstandable manner, the degree of precision associated with 
specific test scores. One method of doing this is to use confi- 
dence intervals. 

The SEM is used to form a confidence interval (or range 
of scores), around estimated true scores, within which obtained 
scores are most likely to fall. The distribution of obtained scores 
about the true score (the error distribution) is assumed to be 
normal, with a mean of zero and an SD equal to the SEM; 
therefore, the bounds of confidence intervals can be set to in- 
clude any desired range of probabilities by multiplying by the 
appropriate z value. Thus, if an individual were to take a large 
number of randomly parallel versions of a test, the resulting 
obtained scores would fall within an interval of ±1 SEM of the 
estimated true score 68% of the time, and within 1.96 SEM 
95% of the time (see Table 1-1). 

Obviously, confidence intervals for unreliable tests (i.e., 
with a large SEM) will be larger than those for highly reliable 
tests. For example, we may again use data from Table 1-5. For 
a highly reliable test such as Test 1, a 95% confidence interval 
for an obtained score of 110 ranges from 103 to 116. In con- 
trast, the confidence interval for Test 3, a less reliable test, is 
larger, ranging from 89 to 124. 

It is important to bear in mind that confidence intervals 
for obtained scores that are based on the SEM are centered on 
estimated true scores. 6 Such confidence intervals will be sym- 
metric around obtained scores only when obtained scores are 
at the test mean or when reliability is perfect. Confidence in- 
tervals will be asymmetric about obtained scores to the same 
degree that true scores diverge from obtained scores. There- 
fore, when a test is highly reliable, the degree of asymmetry 
will often be trivial, particularly for obtained scores within 



one SD of the mean. For tests of lesser reliability, the asymme- 
try may be marked. For example, in Table 1-5, consider the 
obtained score of 130 on Test 2. The estimated true score in 
this case is 124 (see equations 4 and 5). Using equation 6 and a z-multiplier of 1.96, we find that a 95% confidence interval for the obtained score spans ±13 points, or from 111 to 137. 
This confidence interval is substantially asymmetric about the 
obtained score. 
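This worked example can be reproduced in a few lines of Python (illustrative only), combining formula 4 for the estimated true score with formula 6 for the SEM.

import math

def sem(sd, rxx):
    """Standard error of measurement (formula 6)."""
    return sd * math.sqrt(1 - rxx)

def true_score_ci(obtained, rxx, mean=100.0, sd=15.0, z=1.96):
    """95% interval for obtained scores, centered on the estimated true score."""
    t_hat = mean + rxx * (obtained - mean)   # formula 4
    half_width = z * sem(sd, rxx)
    return t_hat, t_hat - half_width, t_hat + half_width

t_hat, lo, hi = true_score_ci(obtained=130, rxx=0.80)
print(f"SEM = {sem(15.0, 0.80):.1f}")           # 6.7
print(f"Estimated true score = {t_hat:.0f}")    # 124
print(f"95% CI = {lo:.0f} to {hi:.0f}")         # 111 to 137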

It is also important to note that SEM-based confidence in- 
tervals should not be used for estimating the likelihood of ob- 
taining a given score at retesting with the same measure, as 
effects of prior exposure are not accounted for. In addition, 
Nunnally and Bernstein (1994) point out that use of SEM- 
based confidence intervals assumes that error distributions 
are normally distributed and homoscedastic (i.e., equal in 
spread) across the range of scores obtainable for a given test. 
However, this assumption may often be violated. A number of 
alternate error models do not require these assumptions and 
may thus be more appropriate in some circumstances (see 
Nunnally and Bernstein, 1994, for a detailed discussion). 7 

Lastly, as with the derivation of estimated true scores, when 
an examinee is known to belong to a group that markedly dif- 
fers from the normative sample, it may not be appropriate to 
derive SEMs and associated confidence intervals using nor- 
mative sample parameters (i.e., SD and r_xx), as these would 
likely differ significantly from parameters derived from an ap- 
plicable clinical sample. 

The Standard Error of Estimation 

In addition to estimating confidence intervals for obtained 
scores, one may also be interested in estimating confidence in- 
tervals for estimated true scores (i.e., the likely range of true 
scores about the estimated true score). For this purpose, one 
may construct confidence intervals using the standard error of 
estimation (SE_E; Lord & Novick, 1968). The formula for this is:

SE_E = SD √[r_xx(1 − r_xx)]    [7]

Where:

SD = the standard deviation of the variable being estimated
r_xx = the test reliability coefficient

The SE_E, like the SEM, is an indication of test precision. As with the SEM, confidence intervals are formed around estimated true scores by multiplying the SE_E by a desired z value. That is, one would expect that over a large number of randomly parallel versions of a test, an individual's true score would fall within an interval of ±1 SE_E of the estimated true score 68% of the time, and fall within 1.96 SE_E 95% of the time. As with confidence intervals based on the SEM, those based on the SE_E will usually not be symmetric around obtained scores. All of the other caveats detailed previously regarding SEM-based confidence intervals also apply.

The choice of constructing confidence intervals based on 
the SEM versus the SE_E will depend on whether one is more 






interested in true scores or obtained scores. That is, while the 
SEM is a gauge of test accuracy in that it is used to determine 
the expected range of obtained scores about true scores over 
parallel assessments (the range of error in measurement of the 
true score), the SE_E is a gauge of estimation accuracy in that it is used to determine the likely range within which true scores fall (the range of error of estimation of the true score). Regardless, both SEM-based and SE_E-based confidence intervals 
are symmetric with respect to estimated true scores rather 
than the obtained scores, and the boundaries of both will be 
similar for any given level of confidence interval when a test is 
highly reliable. 

The Standard Error of Prediction 

When the standard deviation of obtained scores for an alternate form is known, one may calculate the likely range of obtained scores expected on retesting with an alternate form. For this purpose, the standard error of prediction (SE_P; Lord & Novick, 1968) may be used to construct confidence intervals. The formula for this is:

SE_P = SD_y √(1 − r_xx²)    [8]

Where:

SD_y = the standard deviation of the parallel form administered at retest
r_xx = the reliability of the form used at initial testing

In this case, confidence intervals are formed around estimated true scores (derived from initial obtained scores) by multiplying the SE_P by a desired z value. That is, one would expect that when retested over a large number of randomly parallel versions of a test, an individual's obtained score would fall within an interval of ±1 SE_P of the estimated true score 68% of the time, and fall within 1.96 SE_P 95% of the time. As with confidence intervals based on the SEM, those based on the SE_P will generally not be symmetric around obtained scores. All of the other caveats detailed previously regarding the SEM-based confidence intervals also apply. In addition, while it may be tempting to use SE_P-based confidence intervals for evaluating significance of change at retesting with the same measure, this practice violates the assumptions that a parallel form is used at retest and, particularly, that no prior exposure effects apply.
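For comparison, the following Python sketch computes all three error terms for the same hypothetical test parameters used above (SD = 15, r_xx = .80, and, for simplicity, an alternate form assumed to have the same SD), along with the 95% interval each implies around the estimated true score.

import math

SD, RXX = 15.0, 0.80   # hypothetical test parameters (same as Test 2 above)

sem  = SD * math.sqrt(1 - RXX)            # formula 6: error of measurement
se_e = SD * math.sqrt(RXX * (1 - RXX))    # formula 7: error of estimation
se_p = SD * math.sqrt(1 - RXX ** 2)       # formula 8: error of prediction
                                          # (alternate-form SD assumed equal to SD)
obtained = 130
t_hat = 100 + RXX * (obtained - 100)      # estimated true score = 124

for name, se in [("SEM", sem), ("SE_E", se_e), ("SE_P", se_p)]:
    lo, hi = t_hat - 1.96 * se, t_hat + 1.96 * se
    print(f"{name} = {se:4.1f}   95% interval: {lo:.0f} to {hi:.0f}")

# For a given SD and r_xx, SE_E is always smaller than the SEM, and SE_P larger.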

SEMs and True Scores: Practical Issues 

Nunnally and Bernstein (1994) note that most test manuals 
do "an exceptionally poor job of reporting estimated true 
scores and confidence intervals for expected obtained scores 
on alternative forms. For example, intervals are often erro- 
neously centered about obtained scores rather than estimated 
true scores. Often the topic is not even discussed" (p. 260). 
Sattler (2001) also notes that test manuals often base confi- 
dence intervals on the overall SEM for the entire standardiza- 
tion sample, rather than on SEMs for each age band. Using the 
average SEM across age is not always appropriate, given that 



some age groups are inherently more variable than others 
(e.g., preschoolers versus adults). In general, confidence inter- 
vals based on age-specific SEMs are preferable to those based 
on the overall SEM (particularly at the extremes of the age 
distribution, where there is the most variability) and can often 
be constructed using age-based SEMs found in most manuals. 
It is important to acknowledge that while estimated true 
scores and associated confidence intervals have merit, there 
are practical reasons to focus on obtained scores instead. For 
example, essentially all validity studies and actuarial predic- 
tion methods for most tests are based on obtained scores. 
Therefore, obtained scores must usually be employed for di- 
agnostic and other purposes to maintain consistency with prior 
research and test usage. For more discussion regarding the 
calculation and uses of the SEM, SE E , SE P , and alternative er- 
ror models, see Dudek (1979), Lord and Novick (1968), and 
Nunnally and Bernstein (1994). 



VALIDITY 

Models of validity are not abstract conceptual frameworks 
that are only minimally related to neuropsychological prac- 
tice. The Standards for Educational and Psychological Testing 
(AERA et al., 1999) state that validation is the joint responsi- 
bility of the test developer and the test user (1999). Thus, a 
working knowledge of validity models and the validity char- 
acteristics of specific tests is a central requirement for respon- 
sible and competent test use. From a practical perspective, 
a working knowledge of validity allows users to determine 
which tests are appropriate for use and which fall below stan- 
dards for clinical practice or research utility. Thus, neuropsy- 
chologists who use tests to detect and diagnose neurocognitive 
difficulties should be thoroughly familiar with commonly 
used validity models and how these can be used to evaluate 
neuropsychological tools. Assuming that a test is valid because 
it was purchased from a reputable test publisher, appears to 
have a large normative sample, or came with a large user's 
manual can be a serious error, as some well-known and com- 
monly used neuropsychological tests are lacking with regard 
to crucial aspects of validity. 

Definition of Validity 

Cronbach and Meehl (1955) were some of the first theorists to 
discuss the concept of construct validity. Since then, the basic 
definition of validity evolved as testing needs changed over 
the years. Although construct validity was first introduced as a 
separate type of validity (e.g., Anastasi & Urbina, 1997), it has 
moved, in some models, to encompass all types of validity 
(e.g., Messick, 1993). In other models, the term "construct 
validity" has been deemed redundant and has simply been re- 
placed by "validity," since all types of validity ultimately in- 
form as to the construct measured by the test. Accordingly, the 
term "construct validity" has not been used in the Standards 
for Educational and Psychological Testing since 1974 (AERA 






et al., 1999). However, whether it is deemed "construct valid- 
ity" or simply "validity," the concept is central to evaluating 
the utility of a test in the clinical or research arena. 

Test validity may be defined at the most basic level as the 
degree to which a test actually measures what it is intended to 
measure, or in the words of Nunnally and Bernstein (1994), 
"how well it measures what it purports to measure in the con- 
text in which it is to be applied" (p. 112). As with reliability, an 
important point to be made here is that a test cannot be said 
to have one single level of validity. Rather, it can be said to ex- 
hibit various types and levels of validity across a spectrum of 
usage and populations. That is, validity is not a property of a 
test, but rather, validity is a property of the meaning attached to 
a test score; validity can only arise and be defined in the spe- 
cific context of test usage. Therefore, while it is certainly nec- 
essary to understand the validity of tests in particular contexts, 
ultimate decisions regarding the validity of test score interpre- 
tation must take into account any unique factors pertaining to 
validity at the level of individual assessment, such as devia- 
tions from standard administration, unusual testing environ- 
ments, examinee cooperation, and the like. 

In the past, assessment of validity was generally test- 
centric. That is, test validity was largely indexed by compari- 
son with other tests, especially "standards" in the field. Since 
Cronbach (1971), there has been a move away from test-based 
or "measure-centered validity" (Zimiles, 1996) toward the in- 
terpretation and external utility of tests. Messick (1989, 1993) 
expanded the definition of validity to encompass an overall 
judgment of the extent to which empirical evidence and theo- 
retical rationales support the adequacy and effectiveness of 
interpretations and actions resulting from test scores. Subse- 
quently, Messick (1995) proposed a comprehensive model of 
construct validity wherein six different, distinguishable types 
of evidence contribute to construct validity. These are (1) 
content related, (2) substantive, (3) structural, (4) generaliz- 
ability, (5) external, and (6) consequential evidence sources 
(see Table 1-6), and they form the "evidential basis for score 

Table 1-6 Messick's Model of Construct Validity

Type of Evidence     Definition
Content-related      Relevance, representativeness, and technical quality of test content
Substantive          Theoretical rationales for the test and test responses
Structural           Fidelity of scoring structure to the structure of the construct measured by the test
Generalizability     Scores and interpretations generalize across groups, settings, and tasks
External             Convergent and divergent validity, criterion relevance, and applied utility
Consequential        Actual and potential consequences of test use, relating to sources of invalidity related to bias, fairness, and distributive justice (a)

Source: Adapted from Messick, 1995.
(a) See Lees-Haley (1996) for limitations of this component.



interpretation" (Messick, 1995, p. 743). Likewise, the Stan- 
dards for Educational and Psychological Testing (AERA et al., 
1999) follows a model very much like Messick's, where differ- 
ent kinds of evidence are used to bolster test validity based on 
each of the following sources: (1) evidence based on test con- 
tent, (2) response processes, (3) internal structure, (4) rela- 
tions to other variables, and (5) consequences of testing. The 
most controversial aspect of these models is the requirement 
for consequential evidence to support validity. Some argue 
that judging validity according to whether use of a test results 
in positive or negative social consequences is too far-reaching 
and may lead to abuses of scientific inquiry, as when a test re- 
sult does not agree with the overriding social climate of the 
time (Lees-Haley, 1996). Social and ethical consequences, al- 
though crucial, may therefore need to be treated separately 
from validity (Anastasi & Urbina, 1997). 

Validity Models 

Since Cronbach and Meehl, various models of validity have 
been proposed. The most frequently encountered is the tripar- 
tite model whereby validity is divided into three components: 
content validity, criterion-related validity, and construct valid- 
ity (see Anastasi & Urbina, 1997; Mitrushina et al, 2005; Nun- 
nally & Bernstein, 1994; Sattler, 2001). Other validity subtypes, 
including convergent, divergent, predictive, treatment, clinical, 
and face validity, are subsumed within these three domains. 
For example, convergent and divergent validity are most often 
treated as subsets of construct validity (Sattler, 2001) and con- 
current and predictive validity as subsets of criterion validity 
(e.g., Mitrushina et al., 2005). Concurrent and predictive valid- 
ity only differ in terms of a temporal gradient; concurrent va- 
lidity is relevant for tests used to identify existing diagnoses or 
conditions, whereas predictive validity applies when determin- 
ing whether a test predicts future outcomes (Anastasi & Urbina, 1997). Although face validity appears to have fallen out 
of favor as a type of validity, the extent to which examinees be- 
lieve a test measures what it appears to measure can affect mo- 
tivation, self-disclosure, and effort. Consequently, face validity 
can be seen as a moderator variable affecting concurrent and 
predictive validity that can be operationalized and measured 
(Bornstein, 1996; Nevo, 1985). Again, all these labels for dis- 
tinct categories of validity are ways of providing different types 
of evidence for validity and are not, in and of themselves, differ- 
ent types of validity, as older sources might claim (AERA et al., 
1999; Yun & Ulrich, 2002). Lastly, validity is a matter of degree 
rather than an all-or-none property; validity is therefore never 
actually "finalized," since tests must be continually reevaluated 
as populations and testing contexts change over time (Nun- 
nally & Bernstein, 1994). 



How to Evaluate the Validity of a Test 

Pragmatically speaking, all the theoretical models in the world 
will be of no utility to the practicing clinician unless they 
can be translated into specific, step-by-step procedures for 






evaluating a test's validity. Table 1-7 presents a comprehensive 
(but not exhaustive) list of specific features users can look for 
when evaluating a test and reviewing test manuals. Each is or- 
ganized according to the type of validity evidence provided. 
For example, construct validity can be assessed via correla- 
tions with other tests, factor analysis, internal consistency 
(e.g., subtest intercorrelations), convergent and discriminant 
validation (e.g., multitrait-multimethod matrix), experimen- 
tal interventions (e.g., sensitivity to treatment), structural 
equation modeling, and response processes (e.g., task decom- 
position, protocol analysis; Anastasi & Urbina, 1997). Most 
importantly, users should also remember that even if all other 
conditions are met, a test cannot be considered valid if it is 
not reliable (see previous discussion). 

It is important to note that not all tests will have sufficient 
evidence to satisfy all aspects of validity, but test users should 
have a sufficiently broad knowledge of neuropsychological 
tools to be able to select one test over another, based on the 
quality of the validation evidence available. In essence, we 



have used this model to critically evaluate all the tests re- 
viewed in this volume. 

Note that there is a certain degree of overlap between cat- 
egories in Table 1-7. For example, correlations between a 
specific test and another test measuring IQ can simultane- 
ously provide criterion-related evidence and construct-related 
evidence of validity. Regardless of the terminology, it is im- 
portant to understand how specific techniques such as fac- 
tor analysis serve to inform the validity of test interpretation 
across the range of settings in which neuropsychologists 
work. 

What Is an Adequate Validity Coefficient? 

Some investigators have proposed criteria for evaluating evi- 
dence related to criterion validity in outcome assessments. For 
instance, Andrews et al. (1994) and Burlingame et al. (1995) 
recommend that a minimum level of acceptability for correla- 
tions involving criterion validity is .50.



Table 1-7 Sources of Evidence and Techniques for Critically Evaluating the Validity of Neuropsychological Tests

Content-related evidence (refers to themes, wording, format, tasks, or questions on a test, and administration and scoring)
  Description of theoretical model on which test is based
  Review of literature with supporting evidence
  Definition of domain of interest (e.g., literature review, theoretical reasoning)
  Operationalization of definition through thorough and systematic review of test domain from which items are to be sampled, with listing of sources (e.g., word frequency sources for vocabulary tests)
  Collection of sample of items large enough to be representative of domain and with sufficient range of difficulty for target population
  Selection of panel of judges for expert review, based on specific selection criteria (e.g., academic and practical backgrounds or expertise within specific subdomains)
  Evaluation of items by expert panel based on specific criteria concerning accuracy and relevance
  Resolution of judgment conflicts within panel for items lacking cross-panel agreement (e.g., empirical means such as Index of Item Congruence; Hambleton, 1980)

Construct-related evidence
  Formal definition of construct
  Formulation of hypotheses to measure construct
  Gathering empirical evidence of construct validation
  Evaluating psychometric properties of instrument (i.e., reliability)
  Demonstration of test sensitivity to developmental changes, correlation with other tests, group differences studies, factor analysis, internal consistency (e.g., correlations between subtests, or to composites within the same test), convergent and divergent validation (e.g., multitrait-multimethod matrix), sensitivity to experimental manipulation (e.g., treatment sensitivity), structural equation modeling, and analysis of process variables underlying test performance

Criterion-related evidence
  Identification of appropriate criterion
  Identification of relevant sample group reflecting the entire population of interest; if only a subgroup is examined, then generalization must remain within subgroup definition (e.g., keeping in mind potential sources of error such as restriction of range)
  Analysis of test-criterion relationships through empirical means such as contrasting groups, correlations with previously available tests, classification accuracy statistics (e.g., positive predictive power), outcome studies, and meta-analysis

Response processes
  Determining whether performance on the test actually relates to the domain being measured
  Analysis of individual responses to determine the processes underlying performance (e.g., questioning test takers about strategy, analysis of test performance with regard to other variables, determining whether the test measures the same construct in different populations, such as age)

Source: Adapted from Anastasi & Urbina, 1997; American Educational Research Association et al., 1999; Messick, 1995; and Yun and Ulrich, 2002.



However, Nunnally
and Bernstein (1994) note that validity coefficients rarely ex- 
ceed .30 or .40 in most circumstances involving psychological 
tests, given the complexities involved in measuring and pre- 
dicting human behavior. There are no hard and fast rules 
when evaluating evidence supportive of validity, and interpre- 
tation should consider how the test results will be used. Thus, 
tests with even quite modest predictive validities (r— .30) may 
be of considerable utility, depending on the circumstances in 
which they will be used (Anastasi & Urbina, 1997; Nunnally & 
Bernstein, 1994), particularly if they serve to significantly in- 
crease the test's "hit rate" over chance. It is also important to 
note that in some circumstances, criterion validity may be 
measured in a categorical rather than continuous fashion, 
such as when test scores are used to inform binary diagnoses 
(e.g., demented versus not demented). In these cases, one 
would likely be more interested in indices such as predictive 
power than other measures of criterion validity (see below for 
a discussion of classification accuracy statistics). 



USE OF TESTS IN THE CONTEXT OF 
SCREENING AND DIAGNOSIS: 
CLASSIFICATION ACCURACY STATISTICS 

In some cases, clinicians use tests to measure how much of an 
attribute (e.g., intelligence) an examinee has, while in other 
cases, tests are used to help determine whether or not an exam- 
inee has a specific attribute, condition, or illness that may be 
either present or absent (e.g., Alzheimer's disease). In the latter 
case, a special distinction in test use may be made. Screening 
tests are those which are broadly or routinely used to detect a 
specific attribute, often referred to as a condition of interest, or 
COI, among persons who are not "symptomatic" but who may 
nonetheless have the COI (Streiner, 2003c). Diagnostic tests
are used to assist in ruling in or out a specific condition in per- 
sons who present with "symptoms" that suggest the diagnosis 
in question. Another related use of tests is for purposes of pre- 
diction of outcome. As with screening and diagnostic tests, the 
outcome of interest may be defined in binary terms — it will ei- 
ther occur or not occur (e.g., return to the same type and level 
of employment). Thus, in all three cases, clinicians will be in- 
terested in the relation of the measure's distribution of scores 
to an attribute or outcome that is defined in binary terms. 

Typically, data concerning screening or diagnostic accu- 
racy are obtained by administering a test to a sample of per- 



sons who are also classified, with respect to the COI, by a so- 
called gold standard. Those who have the condition according 
to the gold standard are labeled COI+, while those who do not
have the condition are labeled COI-. In medicine, the gold
standard is often a highly accurate diagnostic test that is more 
expensive and/or has a higher level of associated risk of 
morbidity than some new diagnostic method that is being 
evaluated for use as a screening measure or as a possible re- 
placement for the existing gold standard. In neuropsychology, 
the situation is often more complex, as the COI may be a psy- 
chological construct (e.g., malingering) for which consensus 
with respect to fundamental definitions is lacking or diagnos- 
tic gold standards may not exist. These issues may be less 
problematic when tests are used to predict outcome (e.g., re- 
turn to work), though other problems that may afflict out- 
come data such as intervening variables and sample attrition 
may complicate interpretation of predictive accuracy. 

The simplest way to relate test results to binary diagnoses or 
outcomes is to utilize a cutoff score. This is a single point along 
the continuum of possible scores for a given test. Scores at or 
above the cutoff classify examinees as belonging to one of two 
groups; scores below the cutoff classify examinees as belonging 
to the other group. Those who have the COI according to the 
test are labeled as Test Positive (Test+), while those who do not
have the COI are labeled Test Negative (Test-).

Table 1-8 shows the relation between examinee classifica- 
tions based on test results versus classifications based on a 
gold standard measure. By convention, test classification is de- 
noted by row membership and gold standard classification is 
denoted by column membership. Cell values represent the to- 
tal number of persons from the sample falling into each of 
four possible outcomes with respect to agreement between a 
test and respective gold standard. By convention, agreements 
between gold standard and test classifications are referred to 
as True Positive and True Negative cases, while disagreements 
are referred to as False Positive and False Negative cases, with 
positive and negative referring to the presence or absence of a 
COI as per classification by the gold standard. When consid- 
ering outcome data, observed outcome is substituted for the 
gold standard. It is important to keep in mind while reading 
the following section that while gold standard measures are 
often implicitly treated as 100% accurate, this may not always 
be the case. Any limitations in accuracy or applicability of a 
gold standard or outcome measure need to be accounted for 
when interpreting classification accuracy statistics. 



Table 1-8 Classification/Prediction Accuracy of a Test in Relation to a "Gold Standard" or Actual
Outcome

                      Gold Standard
Test Result     COI+                  COI-                  Row Total
Test+           A (True Positive)     B (False Positive)    A + B
Test-           C (False Negative)    D (True Negative)     C + D
Column total    A + C                 B + D                 N = A + B + C + D






Test Accuracy and Efficiency 

The general accuracy of a test with respect to a specific COI is 
reflected by data in the columns of a classification accuracy 
table (Streiner, 2003c). The column-based indices include 
Sensitivity, Specificity, and the Positive and Negative Likelihood 
Ratios (LR+ and LR-). The formulas for calculation of the
column-based classification accuracy statistics from data in 
Table 1-8 are given below:

Sensitivity = A / (A + C)    [9]

Specificity = D / (D + B)    [10]

LR+ = Sensitivity / (1 - Specificity)    [11]

LR- = Specificity / (1 - Sensitivity)    [12]

Sensitivity is defined as the proportion of COI+ examinees
who are correctly classified as such by a test. Specificity is de-
fined as the proportion of COI- examinees who are correctly
classified as such by a test. The Positive Likelihood Ratio
(LR+) combines sensitivity and specificity into a single index
of overall test accuracy indicating the odds (likelihood) that a
positive test result has come from a COI+ examinee. For ex-
ample, a likelihood ratio of 3.0 may be interpreted as indicat-
ing that a positive test result is three times as likely to have
come from a COI+ examinee as a COI- one. The LR- is inter-
preted conversely to the LR+. As the LR approaches 1, test clas-
sification approximates random assignment of examinees.
That is, a person who is Test+ is equally likely to be COI+ or
COI-. Given that they are derived from an adequate norma-
tive sample, Sensitivity, Specificity, and LR+/- are assumed to
reflect fixed (i.e., constant) properties of a test that are appli-
cable whenever a test is used within the normative popula-
tion. For purposes of working examples, Table 1-9 presents
hypothetical test and gold standard data.

Using equations 9 to 12 above, the hypothetical test 
demonstrates moderate Sensitivity (.75) and high Specificity 
(.95), with an LR+ of 15 and an LR- of 3.8. Thus, for the hypo-
thetical measure, a positive result is 15 times more likely to be 
obtained by an examinee who has the COI than one who does 
not, while a negative result is 3.8 times more likely to be ob- 
tained by an examinee who does not have the COI than one 
who does. 
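The arithmetic can be sketched in a few lines of Python (an illustration of ours, not part of the original text); the cell counts are those of Table 1-9 and the variable names follow the cell labels of Table 1-8:

# Column-based accuracy indices (Formulas 9-12) for the hypothetical
# data in Table 1-9. Cell labels A-D follow Table 1-8.
A, B = 30, 2     # true positives, false positives
C, D = 10, 38    # false negatives, true negatives

sensitivity = A / (A + C)                      # .75
specificity = D / (D + B)                      # .95
lr_positive = sensitivity / (1 - specificity)  # 15.0
lr_negative = specificity / (1 - sensitivity)  # 3.8
print(sensitivity, specificity, lr_positive, lr_negative)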

Note that Sensitivity, Specificity, and LR+/- are parameter
estimates that have associated errors of estimation that can be 
quantified. The magnitude of estimation error is inversely re- 
lated to sample size, and can be quite large when sample size is small.



Table 1-9 Classification/Prediction Accuracy of a Test in Relation
to a "Gold Standard" or Actual Outcome — Hypothetical Data

                      Gold Standard
Test Result     COI+      COI-      Row Total
Test+           30        2         32
Test-           10        38        48
Column total    40        40        N = 80



The formulae for calculating standard errors for Sen-
sitivity, Specificity, and the LR are complex and will not be 
presented here (see McKenzie et al., 1997). Fortunately, these 
values may also be easily calculated using a number of readily 
available computer programs. Using one of these (by Mackin- 
non, 2000) with data from Table 1-9, the 95% confidence in- 
terval for Sensitivity was found to be .59 to .87, while that for 
Specificity was .83 to .99. LR+ was 3.8 to 58.6, and LR- was 2.2
to 6.5. Clearly, the range of measurement error is not trivial 
for this hypothetical study. In addition to appreciating issues 
relating to estimation error, it is also important to understand 
that while column-based indices provide useful information 
about test validity and utility, a test may nevertheless have 
high sensitivity and specificity but still be of limited clinical 
value in some situations, as will be detailed below. 

Predictive Power 

As opposed to being concerned with test accuracy at the group 
level, clinicians are typically more concerned with test accu- 
racy in the context of diagnosis and other decision making at 
the level of individual examinees. That is, clinicians wish to 
determine whether or not an individual examinee does or does 
not have a given COI. In this scenario, clinicians must consider 
indices derived from the data in the rows of a classification ac- 
curacy table (Streiner, 2003c). These row-based indices are 
Positive Predictive Power (PPP) and Negative Predictive Power 
(NPP). 9 The formulas for calculation of these from data in 
Table 1-8 are given below: 



PPP = A / (A + B)    [13]

NPP = D / (C + D)    [14]



Positive Predictive Power is defined as the probability that an 
individual with a positive test result has the COI. Conversely, 
Negative Predictive Power is defined as the probability that an 
individual with a negative test result does not have the COI. 
For example, predictive power estimates derived from the data 
presented in Table 1-9 indicate that PPP = .94 and NPP = .79; 
thus, in the hypothetical data set, 94% of persons who obtain 
a positive test result actually have the COI, while 79% of peo- 
ple who obtain a negative test result do not in fact have the 
COI. When predictive power is close to .50, examinees are ap- 
proximately equally likely to be COI+ as COI-, regardless of
whether they are Test+ or Test-. When predictive power is less
than .50, test-based classifications or diagnoses will be incor- 
rect more often than not. 10 
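As a rough illustration (ours, not the authors'), the row-based indices for the Table 1-9 data can be computed in the same way as the column-based indices above:

# Row-based indices (Formulas 13-14) for the hypothetical data in Table 1-9.
A, B = 30, 2     # true positives, false positives
C, D = 10, 38    # false negatives, true negatives

ppp = A / (A + B)   # .94: probability that a Test+ examinee is COI+
npp = D / (C + D)   # .79: probability that a Test- examinee is COI-
print(round(ppp, 2), round(npp, 2))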

As with Sensitivity and Specificity, PPP and NPP are pa- 
rameter estimates that should always be considered in the 
context of estimation error. Unfortunately, standard errors or 
confidence intervals for estimates of predictive power are 
rarely listed when these values are reported; clinicians are thus 
left to their own devices to calculate them. Fortunately, these 
values may be easily calculated using a number of computer 
programs. Using one of these (by Mackinnon, 2000) with data 
from Table 1-9, the 95% confidence intervals for PPP and 
NPP were found to be .94 to .99 and .65 to .90, respectively. 






Figure 1-5 Relation of predictive power to prevalence (hypothetical
data). Sensitivity = .75, Specificity = .95. [Graph of PPP and NPP
plotted against base rate, from 0 to 1.0.]


Clearly, the CI range is not trivial for this small data set. Of
critical importance to clinical interpretation of test scores,
PPP and NPP are not fixed properties of a test like the
column-based indices, but vary with the baserate or prevalence of a COI.

Sample vs. Actual Baserates and 
Relation to Predictive Power 

The prevalence of a COI is defined with respect to Table 1-8 as: 

(A + C)/N [15] 

As should be readily apparent from inspection of Table 1-9, 
the prevalence of the COI in the sample is 50 percent. Formu- 
las for deriving Predictive Power for any level of sensitivity 
and specificity and a specified prevalence are given below: 



PPP = (Prevalence x Sensitivity) /
      [(Prevalence x Sensitivity) + ((1 - Prevalence) x (1 - Specificity))]    [16]

NPP = ((1 - Prevalence) x Specificity) /
      [((1 - Prevalence) x Specificity) + (Prevalence x (1 - Sensitivity))]    [17]

From inspection of these formulas, it should be apparent that 
regardless of sensitivity and specificity, predictive power will 
vary between 0 and 1 as a function of prevalence. Application
of formulas 16 and 17 to the data presented in Table 1-9 
across the range of possible baserates provides the range of 
possible PPP and NPP values depicted in Figure 1-5. 

As can be seen in Figure 1-5, the relation between predic- 
tive power and prevalence is curvilinear and asymptotic with 



endpoints at 0 and 1. For any given test cutoff score, PPP will
always increase with baserate, while NPP will simultaneously 
decrease. For the hypothetical test being considered, one can 
see that both PPP and NPP are moderately high (at or above 
.80) when the COI baserate ranges from 20% to 50%. The 
tradeoff between PPP and NPP at high and low baserate levels 
is also readily apparent; as the baserate increases above 50%, 
PPP exceeds .95, while NPP declines, falling below .50 as the 
baserate exceeds 80%. Conversely, as the baserate falls below 
30%, NPP exceeds .95 while PPP rapidly drops off, falling be- 
low 50% as the baserate falls below 7%. 

From the foregoing, it is apparent that the predictive power
values derived from data presented in Table 1-9 would not be 
applicable in settings where baserates vary from the 50% 
value in the hypothetical data set. This is important because 
in practice, clinicians may often be presented with PPP values 
based on data where "prevalence" values are near 50%. This is 
due to the fact that, regardless of the prevalence of a COI in 
the population, diagnostic validity studies typically employ 
equal-sized samples of COI+ and COI- individuals to facilitate
statistical analyses. In contrast, the actual prevalence of COIs 
in the population is rarely 50%. The actual prevalence of a 
COI and the PPP in some clinical settings may be substan- 
tially lower than that reported in validity studies, particularly 
when a test is used for screening purposes. 

For example, suppose that the data from Table 1-9 were 
from a validity trial of a neuropsychological measure designed 
for administration to young children for purposes of predict- 
ing later development of schizophrenia. The question then 
arises: should the measure be used for broad screening given a 
lifetime schizophrenia prevalence of .008? Using Formula 16, 
one can determine that for this purpose the measure's PPP is
only .11 and thus the "positive" test results would be incorrect 
89% of the time. 11 Conversely, the prevalence of a COI may in 
some settings be substantially higher than 50%. As an exam- 
ple of the other extreme, the baserate of head injuries among 
persons referred to a head-injury rehabilitation service based 
on documented evidence of a blow to the head leading to loss 
of consciousness is essentially 100%, in which case the use of 
neuropsychological tests to determine whether or not exami- 
nees had sustained a "head injury" would not only be redun- 
dant, but very likely lead to false negative errors (such tests 
could of course be legitimately used for other purposes, such 
as grading injury severity). Clearly, clinicians need to carefully 
consider published data concerning sensitivity, specificity, and 
predictive power in light of intended test use and, if necessary, 
calculate PPP and NPP values and COI baserate estimates 
applicable to specific groups of examinees seen in their own 
practices. 
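A short Python sketch (ours; the function names are arbitrary) makes the dependence of predictive power on prevalence concrete, reproducing both the 50% baserate value and the screening example above:

# Predictive power as a function of prevalence (Formulas 16-17),
# using the hypothetical test's sensitivity (.75) and specificity (.95).
def ppp(prevalence, sens, spec):
    return (prevalence * sens) / (
        prevalence * sens + (1 - prevalence) * (1 - spec))

def npp(prevalence, sens, spec):
    return ((1 - prevalence) * spec) / (
        (1 - prevalence) * spec + prevalence * (1 - sens))

print(round(ppp(0.50, 0.75, 0.95), 2))    # 0.94 at the validity-study baserate
print(round(ppp(0.008, 0.75, 0.95), 2))   # 0.11 at a prevalence of .008
print(round(npp(0.008, 0.75, 0.95), 3))   # NPP approaches 1.0 at low baserates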

Difficulties With Estimating and Applying Baserates 

Prevalence estimates for some COIs may be based on large- 
scale epidemiological studies that provide very accurate preva- 
lence estimates for the general population or within specific 
subpopulations (e.g., the prevalence rates of various psychiatric 



disorders in inpatient psychiatric settings). However, in some 
cases, no prevalence data may be available or reported preva- 
lence data may not be applicable to specific settings or sub- 
populations. In these cases, clinicians who wish to determine 
predictive power must develop their own baserate estimates. 
Ideally, these can be derived from data collected within the 
same setting in which the test will be employed, though this is 
typically time consuming and many methodological chal- 
lenges may be faced, including limitations associated with 
small sample sizes. Methods for estimating baserates in such 
contexts are beyond the scope of this chapter; interested read-
ers are directed to Mossmann (2003), Pepe (2003), and Rorer 
and Dawes (1982). 

Why Are Classification Accuracy Statistics Not Ubiquitous
in Neuropsychological Research and Clinical Practice?

Of note, the mathematical relations between sensitivity, speci- 
ficity, prevalence, and predictive power were first elucidated 
by Thomas Bayes and published in 1763; methods for deriv- 
ing predictive power and other related indices of confidence 
in decision making are thus often referred to as Bayesian sta- 
tistics. 12 Needless to say, Bayes's work predated the first diag- 
nostic applications of psychological tests as we know them 
today. However, although neuropsychological tests are rou- 
tinely used for diagnostic decision making, information on the 
predictive power of most tests is often absent from both test 
manuals and applicable research literature. This is so despite 
the fact that the importance and relevance of Bayesian ap- 
proaches to the practice of clinical psychology was well de- 
scribed 50 years ago by Meehl and Rosen (1955), and has been 
periodically addressed since then (Willis, 1984; Elwood, 1993; 
Ivnik et al., 2001). Bayesian statistics are finally making major 
inroads into the mainstream of neuropsychology, particularly 
in the research literature concerning symptom validity mea- 
sures, in which estimates of predictive power have become 
de rigueur, although these are still typically presented with- 
out associated standard errors, thus greatly reducing utility of 
the data. 

Determining the Optimum Cutoff Score— ROC 
Analyses and Other Methods 

The foregoing discussion has focused on the diagnostic accu-
racy of tests using specific cutoff points, presumably ones that 
are optimum cutoffs for given tasks such as diagnosing de- 
mentia. A number of methods for determining an optimum 
cutoff point are available and although they may lead to simi- 
lar results, the differences between them are not trivial. Many 
of these methods are mathematically complex and/or compu- 
tationally demanding, thus requiring computer applications. 

The determination of an optimum cutoff score for detec- 
tion or diagnosis of a COI is often based on simultaneous 
evaluation of sensitivity and specificity or predictive power 
across a range of scores. In some cases, this information, in 



Figure 1-6 An ROC graph. [Plot of the true-positive proportion
against the false-positive proportion across the range of test scores.]

tabular or graphical form, is simply inspected and a score is 
chosen based on a researcher or clinician's comfort with a 
particular error rate. For example, in malingering research, 
cutoffs that minimize false-positive errors or hold them below 
a low threshold are often implicitly or explicitly chosen, even 
when such cutoffs are associated with relatively large false- 
negative error rates. 

A more formal, rigorous, and often very useful set of tools
for choosing cutoff points and for evaluating and comparing
test utility for diagnosis and decision making falls under the rubric of Re-
ceiver Operating Characteristics (ROC) analyses. Clinicians 
who use tests for diagnostic or other decision-making pur- 
poses should be familiar with ROC procedures. The statistical 
procedures utilized in ROC analyses are closely related to and 
substantially overlap those of Bayesian analyses. The central 
graphic element of ROC analyses is the ROC graph, which is a 
plot of the true positive proportion (Y-axis) against the false 
positive proportion (X-axis) associated with each specific 
score in a range of test scores. Figure 1-6 shows an example 
ROC graph. The area under the curve is equivalent to the 
overall accuracy of the test (proportion of the entire sample 
correctly classified), while the slope of the curve at any point 
is equivalent to the LR+ associated with a specific test score.
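The mechanics of an ROC plot are simple enough to sketch in a few lines of Python; the score lists below are invented solely for illustration, and higher scores are assumed to indicate the COI:

# ROC points (true-positive vs. false-positive proportions at each cutoff)
# and a trapezoidal estimate of the area under the curve.
coi_pos = [12, 14, 15, 15, 17, 18, 19, 20]   # scores of COI+ examinees
coi_neg = [5, 7, 8, 9, 10, 11, 13, 16]       # scores of COI- examinees

points = [(0.0, 0.0)]
for cutoff in sorted(set(coi_pos + coi_neg), reverse=True):
    tpp = sum(s >= cutoff for s in coi_pos) / len(coi_pos)
    fpp = sum(s >= cutoff for s in coi_neg) / len(coi_neg)
    points.append((fpp, tpp))

auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(round(auc, 2))   # area under the curve for the hypothetical scores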

A number of ROC methods have been developed for de- 
termining cutoff points that consider not only accuracy, but 
also allow for factoring in quantifiable or quasi-quantifiable 
costs and benefits, and the relative importance of specific 
costs and benefits associated with any given cutoff score. ROC 
methods may also be used to compare the diagnostic utility of 
two or more measures, which may be very useful for purposes 
of test selection. Although ROC methods can be very useful 
clinically, they have not yet made great inroads in clinical 






neuropsychological literature. A detailed discussion of ROC 
methods is beyond the scope of this chapter; interested read- 
ers are referred to Mossmann and Somoza (1992), Pepe 
(2003), Somoza and Mossmann (1992), and Swets, Dawes and 
Monahan (2000).

Evaluation of Predictive Power Across a Range 
of Cutoff Scores and Baserates 

As noted above, it is important to recognize that positive and 
negative predictive power are not properties of tests, but 
rather are properties of specific test scores in specific contexts. 
The foregoing sections describing the calculation and interpre-
tation of predictive power have focused on methods for evalu-
ating the value of a single cutoff point for a given test for
purposes of classifying examinees as COI+ or COI-. However,
by focusing exclusively on single cutoff points, clinicians are 
essentially transforming continuous test scores into binary 
scores, thus discarding much potentially useful information, 
particularly when scores are considerably above or below a 
cutoff. Lindeboom (1989) proposed an alternative approach 
in which predictive power across a range of test scores and 
baserates can be displayed in a single Bayesian probability 
table. In this approach, test scores define the rows and baser- 
ates define the columns of a table; individual table cells con- 
tain the associated PPP and NPP for a specific score and 
specific baserate. Such tables have rarely been constructed for 
standardized measures, but examples can be found in some 
test manuals (e.g., the Victoria Symptom Validity Test; Slick 
et al., 1997). The advantage of this approach is that it allows 
clinicians to consider the diagnostic confidence associated 
with an examinee's specific score, leading to more accurate 
assessments. A limiting factor for use of Bayesian probability 
tables is that they can only be constructed when sensitivity 
and specificity values for an entire range of scores are avail- 
able, which is rarely the case for most tests. In addition, pre- 
dictive power values in such tables are subject to any validity 
limitations of underlying data, and should include associated 
standard errors or confidence intervals. 
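A sketch of what such a table involves (ours; the per-cutoff sensitivity and specificity values below are invented purely for illustration):

# A miniature Bayesian probability table: rows are cutoff scores, columns
# are baserates, and each cell holds the PPP for that score and baserate.
# NPP could be tabled in exactly the same way.
cutoff_accuracy = {        # cutoff: (sensitivity, specificity) -- illustrative
    10: (0.95, 0.70),
    12: (0.85, 0.85),
    14: (0.75, 0.95),
}
baserates = [0.05, 0.15, 0.30, 0.50]

def ppp(prev, sens, spec):
    return (prev * sens) / (prev * sens + (1 - prev) * (1 - spec))

print("cutoff " + " ".join(f"{b:>6.0%}" for b in baserates))
for cutoff, (sens, spec) in cutoff_accuracy.items():
    cells = " ".join(f"{ppp(b, sens, spec):6.2f}" for b in baserates)
    print(f"{cutoff:>6} {cells}")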

Evaluating Predictive Power in the Context 
of Multiple Tests 

Often more than one test that provides data relevant to a spe- 
cific diagnosis is administered. In these cases, clinicians may 
wish to integrate predictive power estimates across measures. 
There may be a temptation to use the PPP associated with a 
score on one measure as the "baserate" when the PPP for a 
score from a second measure is calculated. For example, sup- 
pose that the baserate of a COI is 15%. When a test designed 
to detect the COI is administered, an examinee's score trans- 
lates to a PPP of 65%. The examiner then administers a sec- 
ond test designed to detect the COI, but when PPP for the 
examinee's score on the second test is calculated, a "baserate" 
of 65% is used rather than 15%, as the former is now the 



assumed prior probability that the examinee has the COI, 
given their score on the first test administered. The resulting 
PPP for the examinee's score on the second measure is now 
99% and the examiner concludes that the examinee has the 
COI. While this procedure may seem logical, it will produce 
an inflated PPP estimate for the second test score whenever 
the two measures are correlated, which will almost always be 
the case when both measures are designed to screen for or di- 
agnose the same COI. At present, there is no simple mathe- 
matical model that can be used to correct for the degree of 
correlation between measures so that they can be used in such 
an iterative manner; therefore this practice should be avoided. 
A preferred psychometric method for integrating scores 
from multiple measures, which can only be used when nor- 
mative data are available, is to construct optimum group 
membership (i.e., COI+ vs. COI-) prediction equations or
classification rules using logistic regression or multiway fre- 
quency analyses, which can then be cross-validated, and 
ideally distributed in an easy-to-use format such as software. 
More details on methods for combining information across 
measures may be found in Franklin (2003b) and Pepe (2003).



ASSESSING CHANGE OVER TIME 

Neuropsychologists are often interested in and/or confronted 
with issues of change in function over time. In these contexts 
three interrelated questions arise: 

• To what degree do changes in examinee test scores 
reflect "real" changes in function as opposed to mea- 
surement error? 

• To what degree do real changes in examinee test scores 
reflect clinically significant changes in function as 
opposed to clinically trivial changes? 

• To what degree do changes in examinee test scores 
conform to expectations, given the application of 
treatments or the occurrence of other events or pro- 
cesses occurring between test and retest, such as head 
injury, dementia or brain surgery? 

A number of statistical/psychometric methods have been de- 
veloped for assessing changes observed over repeated admin- 
istrations of neuropsychological tests; these differ considerably 
with respect to mathematical models and assumptions re- 
garding the nature of test data. As with most areas of psycho- 
metrics, the problems and processes involved in decomposing 
observed scores (i.e., change scores) into measurement error 
and "true" scores are often complex. Clinicians are certainly 
not aided by the lack of agreement about which methods to 
use for analyzing test-retest data, limited retest data for many
tests, and limited coverage and direction concerning retest
procedures in most test manuals. Only a relatively brief dis- 
cussion of this important area of psychometrics is presented 
here. Interested readers are referred to other sources (e.g., 
Chelune, 2003) for a more in-depth review. 






Reference Group Change Score Distributions 

If a reference or normative sample is administered a test 
twice, the distribution of observed change scores ("change 
score" = retest score minus baseline score) can be quantified. 
When such information is available, individual examinee 
change scores can be transformed into standardized change 
scores (e.g., percentiles), thus providing information on the 
degree of unusualness of any observed change in score. Un- 
fortunately, it is rarely possible to use this method of evaluat- 
ing change due to major limitations in most data available in 
test manuals. Retest samples tend to be relatively small for 
many tests, thus limiting generalizability. This is particularly 
important when change scores vary with demographic vari- 
ables (e.g., age and level of education) and/or initial test score 
level (e.g., normal vs. abnormal), because retest samples typi- 
cally are restricted with respect to both. Second, retest samples 
are often obtained within a short period of time after initial 
testing, typically less than two months, whereas in clinical 
practice typical test-retest intervals are often much longer. 
Thus any effects of extended test-retest intervals on change 
score distributions are not reflected in most change-score data 
presented in test manuals. Lastly, change score information is 
typically presented in the form of summary statistics (e.g., 
mean and SD) that have limited utility if change scores are not 
normally distributed (in which case percentile tables would be 
much preferable). As a result of these limitations, clinicians 
often must turn to other methods for analyzing change. 
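Where reference change-score statistics are available, the standardization itself is straightforward; a minimal sketch (ours, with invented reference values) follows:

# Standardizing an observed change score against a reference sample's
# distribution of change scores (values below are illustrative only).
from math import erf, sqrt

ref_change_mean = 2.0    # mean retest gain in the reference sample
ref_change_sd = 4.0      # SD of change scores in the reference sample

observed_change = -7.0   # examinee's retest score minus baseline score
z = (observed_change - ref_change_mean) / ref_change_sd
percentile = 0.5 * (1 + erf(z / sqrt(2))) * 100   # valid only if change
print(round(z, 2), round(percentile, 1))          # scores are roughly normal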



The Reliable Change Index (RCI) 

Jacobson and Truax (1991; see also Jacobson et al., 1999) pro- 
posed a psychometric method for determining if changes in 
test scores over time are reliable (i.e., not an artefact of imper- 
fect test reliability). This method involves calculation of a Re-
liable Change Index (RCI). The RCI is an indicator of the 
probability that an observed difference between two scores 
from the same examinee on the same test can be attributed to 
measurement error (i.e., to imperfect reliability). When there 
is a low probability that the observed change is due to mea- 
surement error, one may infer that it reflects other factors, 
such as progression of illness, treatment effects, and/or prior 
exposure to the test. 

The RCI is calculated using the Standard Error of the Dif- 
ference (SE D ), an index of measurement error derived from 
classical test theory. It is the standard deviation of expected 
test-retest difference scores about a mean of 0, given an as-
sumption that no actual change has occurred. 13 The formula 
for the SE D is: 

SE_D = √(2 x SEM²)    [18]

where SEM is the Standard Error of Measurement, as previ- 
ously defined in Formula 6. Inspection of Formula 18 reveals 
that tests with a large SEM will have a large SE D . The RCI for a 
specific score is calculated by dividing the observed amount of 



change by the SE_D, transforming observed change scores into
SE_D units. The formula is given below:



RCI = (S2 - S1) / SE_D    [19]

Where:

S1 = an examinee's initial test score
S2 = an examinee's score at retest on the same measure

The resulting RCI scores can be either negative or positive and 
can be thought of as a type of z score that can be interpreted 
with reference to upper or lower tails of a normal probability 
distribution. Therefore, RCI scores falling outside a range of 
-1.96 to 1.96 would be expected to occur less than 5% of the 
time as a result of measurement error alone, assuming that an 
examinee's true retest score had not changed since the first 
test. The assumption that an examinee's true score has not 
changed can therefore be rejected at p < .05 (two-tailed) when 
his or her RCI score is above 1.96 or below -1.96. 
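A minimal sketch of the calculation (ours; the SD and reliability values are illustrative only):

# Reliable Change Index (Formulas 18-19) from classical test theory.
from math import sqrt

sd = 15.0            # reference-sample standard deviation
reliability = 0.90   # internal-consistency reliability estimate

sem = sd * sqrt(1 - reliability)   # Standard Error of Measurement (Formula 6)
se_d = sqrt(2 * sem ** 2)          # Standard Error of the Difference (Formula 18)

s1, s2 = 100.0, 112.0              # initial and retest scores
rci = (s2 - s1) / se_d             # Formula 19
print(round(sem, 2), round(se_d, 2), round(rci, 2))
# |RCI| > 1.96 would be unlikely (p < .05) under measurement error alone;
# here RCI is about 1.79, so the change is not reliable at that threshold.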

The RCI is directly derived from classical test theory. Thus 
internal-consistency reliability (Cronbach's a) is used to esti- 
mate measurement error rather than test-retest reliability, as 
the latter reflects not just test-intrinsic measurement error, 
but also any additional variation over time arising from real 
changes in function and the effect of other intervening vari- 
ables. Thus use of test-retest reliability introduces additional 
complexity into the meaning of the RCI. 

The RCI is often calculated using SD (to calculate SEM) 
and reliability estimates obtained from test normative sam- 
ples. However, as these values may not be applicable to the 
clinical group to which an examinee belongs, care must be 
taken in interpretation of the RCI in such circumstances. It 
may be preferable to use SD and reliability estimates from 
samples similar to an examinee, if these are available. Because 
the SE D value is constant for any given combination of test 
and reference sample, it can be used to construct RCI confi- 
dence intervals applicable to any initial test score obtained 
from a person similar to the reference sample, using the for- 
mula below: 



RCI CI = S1 ± (z x SE_D)    [20]

Where:

S1 = Initial test score
z = z score associated with a given confidence range
(e.g., 1.64 for a 90% C.I.)

Retest scores falling outside the desired confidence interval 
about initial scores can be considered evidence of a significant 
change. Note that while a "significant" RCI value may be con- 
sidered as a prerequisite, it is not by itself sufficient evidence 
that clinically significant change has occurred. Consider RCIs 
in the context of highly reliable tests: relatively small score 
changes at retest can produce significant RCIs, but both the 
initial test score and retest score may remain within the same 
classification range (e.g., normal) so that the clinical implica- 
tions of observed change may be minimal. In addition, use of 






the RCI implicitly assumes that no practice effects pertain. 
When practice effects are present, significant RCI values may 
partially or wholly reflect effects of prior test exposure rather 
than a change in underlying functional level. 

To allow RCIs to be used with tests that have practice ef- 
fects, Chelune et al. (1993) suggest a modification to calcula- 
tion of the RCI in which the mean change score for a 
reference group is subtracted from the observed change score 
of an individual examinee and the result is used as an Ad- 
justed Change Score for purposes of calculating an Adjusted 
RCI. Alternatively, an RCI confidence interval calculated using 
Formula 20 could have its endpoints adjusted by addition of
the mean change score. 



Adj. RCI CI = (S1 + Mc) ± (z x SE_D)    [21]

Where:

S1 = Initial test score
Mc = Mean change score (Retest - Test)
z = z score associated with a given confidence range
(e.g., 1.64 for a 90% C.I.)
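Continuing the illustrative values used in the sketch above (ours, not the text's), Formulas 20 and 21 translate directly into confidence intervals around the initial score:

# RCI confidence intervals without (Formula 20) and with (Formula 21)
# an adjustment for a mean practice effect.
from math import sqrt

sd, reliability = 15.0, 0.90
se_d = sqrt(2 * (sd * sqrt(1 - reliability)) ** 2)

s1 = 100.0          # initial score
z = 1.64            # 90% confidence level
mean_change = 3.0   # illustrative mean practice effect (retest minus test)

rci_ci = (s1 - z * se_d, s1 + z * se_d)                              # Formula 20
adj_ci = (s1 + mean_change - z * se_d, s1 + mean_change + z * se_d)  # Formula 21
print([round(v, 1) for v in rci_ci])   # about 89.0 to 111.0
print([round(v, 1) for v in adj_ci])   # about 92.0 to 114.0
# A retest score falling outside the interval suggests reliable change.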

This approach appears to offer some advantages over the tra- 
ditional RCI, particularly for tests where large practice effects 
are expected. However, adjusting for practice in this way is 
problematic in a number of ways, first and foremost of which 
is the use of a constant term for the practice effect, which will 
not reflect any systematic variability in practice effects across 
individuals. Secondly, neither standard nor adjusted RCIs ac- 
count for regression toward the mean because the associated 
estimated measurement error is not adjusted proportionally 
for the extremity of observed change. 

Standardized Regression-Based Change Scores 

The RCI may provide useful information regarding the likeli- 
hood of a meaningful change in the function being measured 
by a test, but as noted above, it may have limited validity in 
some circumstances. Many quantifiable factors not accounted 
for by RCI may influence or predict retest scores, including 
test-retest interval, baseline ability level (Time 1 score), scores 
from other tests, and examinee characteristics such as gender, 
education, age, acculturation, and neurological or medical 
conditions. In addition, while RCI scores factor in test reliabil- 
ity, error is operationalized as a constant that does not account 
for regression to the mean (i.e., the increase in measurement 
error associated with more extreme scores). One method for 
evaluating change that does allow clinicians to account for ad- 
ditional predictors and also controls for regression to the 
mean is the use of linear regression models (Crawford & 
Howell, 1998; Hermann et al, 1991). 14 

With linear regression models, predicted retest scores are 
derived and then compared with observed retest scores for 
purposes of determining if deviations are "significant." In the 
preferred method, this is accomplished by dividing the differ- 
ence between obtained retest scores and regression-predicted 
retest scores by the Standard Error for Individual Predicted 



Scores (SE_Y). Because score differences are divided by a stan-
dard error, the resulting value is considered to be standard- 
ized. The resulting standardized score is in fact a t statistic that 
can be translated into a probability value using an appropriate 
program or table. Small probability values indicate that the 
observed retest score differs significantly from the predicted 
value. The SE_Y is used because, unlike the Standard Error of
the Regression, it is not constant across cases, but increases as 
individual values of independent variables deviate from the 
mean, thus accounting for regression to the mean on a case- 
by-case basis (Crawford & Howell, 1998). Thus, persons who 
are outliers with respect to their scores on predictor variables 
will have larger margins of error associated with their pre- 
dicted scores and thus larger changes in raw scores will be re- 
quired to reach significance for these individuals. 

As with other standardized scores (e.g., z scores), stan- 
dardized regression-based change scores (SRB scores) from 
different measures can be directly compared, regardless of the 
original test score metric. However, a number of inferential 
limitations of such comparisons, described in the section on 
standardized scores earlier in this chapter, still apply. Regres- 
sion models can also be used when one wishes to consider 
change scores from multiple tests simultaneously; these are 
more complex and will not be covered here (see McCleary, 
et al., 1996). 

As an example of the application of SRB scores, consider 
data on IQ in children with epilepsy reported by Sherman 
et al. (2003). They found that in samples of children with in- 
tractable epilepsy who were not treated surgically, FSIQ scores 
at retest could be predicted by baseline FSIQ and number of 
anti-epileptic medications (AEDs) that the children were tak- 
ing at baseline. The resulting multiple R² value was large
(.92), indicating that the equation had acceptable predictive 
value. The resulting regression equation is given below: 

FSIQ_retest = (0.965 x FSIQ_baseline) + (-4.519 x AEDs_baseline) + 7.358

It can be seen from inspection of this equation that predicted 
retest FSIQ values were positively related to baseline IQ and 
inversely related to number of AEDs being taken at baseline. 
Therefore, for children who were not taking any AEDs at 
baseline, a modest increase in FSIQ at retest was expected, 
while for those taking one or more AEDs (a marker of epilepsy 
severity), IQs tended to decline over time. Given a baseline 
FSIQ of 100, the predicted FSIQs at retest for children taking 0, 
1, 2, and 3 AEDs at baseline were 104, 99, 95, and 90, respec- 
tively. Using a program developed by Crawford & Howell
(1998), Sherman et al. (2003) were able to determine which 
children in the surgery sample demonstrated unusual change, 
relative to expectations for children who did not receive sur- 
gery. For example, a child in the sample was taking 2 AEDs and 
had a FSIQ of 53 at baseline. The predicted retest IQ was thus 
49 but the actual retest IQ following right anterior temporal 
lobectomy was 63. The observed change was 14 points higher 
than the predicted change; the associated p value was .039 and 
thus the child was classified as obtaining a significantly higher 






than predicted retest score. The inference in this case was that 
better than expected FSIQ outcome was a positive effect of 
epilepsy surgery. Other examples of regression equations de- 
veloped for specific neuropsychological tests are presented 
throughout this volume. 
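The prediction step of this example is easy to reproduce (a sketch of ours using the published equation; the significance test itself requires the SE_Y values from the original study, which are not reproduced here):

# Regression-based prediction of retest FSIQ (Sherman et al., 2003).
def predicted_retest_fsiq(baseline_fsiq, n_aeds):
    return 0.965 * baseline_fsiq - 4.519 * n_aeds + 7.358

# Baseline FSIQ of 100 with 0-3 AEDs: approximately 104, 99, 95, and 90.
print([round(predicted_retest_fsiq(100, n)) for n in range(4)])

# The child described above: baseline FSIQ 53, taking 2 AEDs.
predicted = predicted_retest_fsiq(53, 2)      # about 49
observed = 63
print(round(predicted), observed - round(predicted))   # difference of about 14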

Limitations of Regression-Based Change Scores 

It is important to understand the limitations of regression 
methods. Regression equations based on smaller sample sizes 
will lead to large error terms so that meaningful predicted- 
obtained differences may be missed. Equations from large- 
scale studies or from cross-validation efforts are therefore 
preferred. In order to maximize utility, sample characteristics 
should match populations seen clinically and predictor vari- 
ables should be carefully chosen to match data that will likely 
be available to clinicians. Test users should generally avoid 
extrapolation — that is, they should avoid applying a regres-
sion equation to an examinee's data (predictor variables and 
test-retest scores) when the data values fall outside the ranges 
for corresponding variables comprising the regression equa- 
tion. For example, if a regression equation is developed for 
predicting IQ at retest from a sample with initial IQ scores 
ranging from 85 to 125, it should not be applied to an exami- 
nee whose initial IQ is 65. Finally, SRB scores should only be 
derived and used when necessary assumptions concerning 
residuals are met (see Pedhazur, 1997, pp. 33-34). 

It is critical to understand that SRB scores do not necessar- 
ily indicate whether a clinically significant change from base- 
line level has occurred — for which use of RCIs may be more 
appropriate. Instead, SRB scores are an index of the degree to 
which observed change conforms to established trends in a 
reference population. These trends may consist of increases or 
decreases in performance over time in association with com- 
binations of influential predictor variables, such as type and 
severity of illness, treatment type, baseline cognitive level, 
gender, age, and test-retest interval. Expected trends may in- 
volve increased scores at retest for healthy individuals, but de- 
creased scores for individuals with progressive neurological 
disease. The following two examples will illustrate this point. 

In the first example, consider a hypothetical scenario of a 
treatment for depression that is associated with improved 
post-treatment scores on a depression inventory, such that in 
a clinical reference sample, the test-retest correlation is high 
and the average improvement in scores at retest exceeds the 
threshold for clinical significance as established by RCI. In the 
simplest case (i.e., using only scores from Time 1), regression- 
predicted retest scores would be equivalent to the mean score 
change observed in the clinical reference sample. In this case, 
an examinee who at retest obtained a depression score at or 
near the post-treatment mean would obtain a non-significant 
SRB score but a significant RCI score, indicating that they 
demonstrated the typically seen clinically significant improve- 
ment in response to treatment. Conversely, an examinee who 
obtained an unchanged depression score following treatment 
would obtain a significant SRB score but a non-significant RCI 



score, indicating that they did not show the typically seen sig- 
nificant improvement in response to treatment. 

In the second example, consider a hypothetical scenario of 
a memory test that has significant prior-exposure (i.e., learn- 
ing) effects such that in the normative sample the test-retest 
correlation is high and the average improvement in scores at 
retest exceeds the threshold for clinical significance as estab- 
lished by RCI. As with the depression score example, in the 
simplest case (i.e., using only scores from Time 1), regression- 
predicted retest scores would be equivalent to the mean score 
change observed in the reference sample. In this case, an ex- 
aminee who at retest obtained a memory score at or near the 
retest mean would obtain a non-significant SRB score but a
significant RCI score, indicating that they demonstrated the 
typically seen prior exposure/learning effect (note the differ- 
ence in interpretation from the previous example — the im- 
provement in score is assumed to reflect treatment effects in 
the first case and to be artifactual in the second case). Con- 
versely, an examinee who obtained an unchanged memory 
score following treatment would obtain a significant SRB score 
but a non-significant RCI score, indicating that they did not 
show the typically seen prior exposure/learning effect. Con- 
ceivably, in the context of a clinical referral, the latter finding 
might be interpreted as reflective of memory problems (see 
Sawrie et al., 1996, and Temkin et al., 1999, for excellent ex-
amples of studies comparing utility of RCIs and SRB scores in 
clinical samples). 

Clinically Significant Change 

Once a clinician has determined that an observed test score 
change is reliable, he or she will usually need to determine 
whether the amount of change is clinically meaningful. Jacob- 
son and Truax (1991) proposed that clinically significant 
change occurs, in the context of treatment, when an exami- 
nee's score (e.g., on the Beck Depression Inventory) moves 
from within the clinical "depressed" range into the normal 
population range. However, this definition of clinically signif- 
icant change is not always relevant to neuropsychological as- 
sessment. There are at present no widely accepted criteria for 
clinically significant change within the context of neuropsy- 
chological assessment. Rather, the determination of clinical 
significance of any observed change that is reliable will de- 
pend greatly on the specific context of the assessment. 



NORMAL VARIATION 

Ingraham and Aiken (1996) have noted that when clinicians 
attempt to interpret examinee profiles of scores across multi- 
ple tests they "confront the problem of determining how 
many deviant scores are necessary to diagnose a patient as ab- 
normal or whether the configuration of scores is significantly 
different from an expected pattern" (p. 120). They further 
note that the likelihood that a profile of test scores will ex-
ceed criteria for abnormality increases as: (1) the number of 






tests in a battery increases; (2) the z score cutoff used to clas- 
sify a test score as abnormal decreases; and (3) the number of 
abnormal test scores required to reach criteria decreases. In- 
graham and Aiken (1996) developed a mathematical model 
that may be used for determining the likelihood of obtaining 
an abnormal test result from a given number of tests. Implicit 
in this model is an assumption that some "abnormal" test 
scores are spurious. As Ingraham and Aiken (1996) note, the 
problem of determining whether a profile of test scores meets 
criteria for abnormality is considerably complicated by the 
fact that most neuropsychological measures are intercorre- 
lated and therefore the probabilities of obtaining abnormal 
results from each test are not independent. However, they 
provide some suggested guidelines for adapting their model 
or using other methods to provide useful approximations. 
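Under the simplifying assumption that the tests are independent (which, as just noted, is usually violated for neuropsychological measures, so the result is only a rough approximation), the basic calculation can be sketched as follows (our illustration, not Ingraham and Aiken's published model):

# Probability that at least m of k independent, normally distributed test
# scores fall below a z-score cutoff in a healthy examinee.
from math import comb, erf, sqrt

def p_below(z_cutoff):
    return 0.5 * (1 + erf(z_cutoff / sqrt(2)))   # normal CDF

def p_at_least_m_abnormal(k, m, z_cutoff):
    p = p_below(z_cutoff)
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(m, k + 1))

# With 20 tests and a cutoff of z = -1.5, a healthy examinee has roughly a
# 75% chance of producing at least one "abnormal" score.
print(round(p_at_least_m_abnormal(20, 1, -1.5), 2))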

In a related vein of research, Schretlen et al. (2003), noting 
that little is known about what constitutes the normal range 
of intraindividual variation across cognitive domains and, by 
extension, associated test scores, evaluated normal variation 
in a sample of 197 healthy adults who were participants in a 
study on normal aging. For each individual, the Maximum 
Discrepancy, or MD (the absolute difference between stan- 
dard scores from two measures expressed in units of standard 
deviation) across scores from 15 commonly used neuropsy- 
chological measures was calculated. The smallest MD value 
observed was 1.6 SD, while the largest was 6.1 SD. Two thirds 
of the sample obtained MD scores in excess of 3 SD, and when 
these were recalculated with highest and lowest scores omit- 
ted, 27% of the sample still obtained MD scores exceeding 
3 SD. Schretlen et al. (2003) concluded from this data that 
"marked intraindividual variability is very common in nor- 
mal adults, and underscores the need to base diagnostic infer- 
ences on clinically recognizable patterns rather than 
psychometric variability alone" (p. 864). While the number of 
"impaired" scores obtained by each healthy participant was 
not reported, 44% of the sample were found to have at least 
one test score more than 2 SD below their estimated IQ score. 
Similarly Palmer et al. (1998) and Taylor and Heaton (2001) 
have reported that it is not uncommon for healthy people to 
show isolated weakness in one test or area. These data are cer- 
tainly provocative and strongly suggest that additional large- 
scale studies of normal variability and prevalence of 
impaired-range scores among healthy persons are clearly war- 
ranted. Clinicians should always consider available data on 
normal variability (e.g., Index Score discrepancy baserates for 
Wechsler Scales) when interpreting test scores. When these 
data are not available, mathematical models and research data 
suggest that a conservative approach to interpretation is war- 
ranted when considering a small number of score discrepan- 
cies or abnormal scores from a large test battery. 

A Final Word on the Imprecision of 
Psychological Tests 

Though progress has been made, much work remains to be 
done in developing more psychometrically sound and clinically 
efficient and useful measures. At times, the technical limita- 



tions of many neuropsychological tests currently available 
with regard to measurement error, reliability, validity, diag- 
nostic accuracy, and other important psychometric character- 
istics may lead to questions regarding their worth in clinical 
practice. Indeed, informed consideration may, quite appropri- 
ately, lead neuropsychologists to limit or completely curtail 
their use of some measures. The extreme argument would be 
to completely exclude any tests that entail measurement error, 
effectively eliminating all forms of objective measurement of 
human characteristics. However, it is important to keep in 
mind the limited and unreliable nature of human judgment, 
even expert judgment, when left to its own devices. 

Dahlstrom (1993) provides the following historical example.
While in the midst of their groundbreaking work on human 
intelligence and prior to the use of standardized tests to diag- 
nose conditions affecting cognition, Binet and Simon (1907) 
carried out a study on the reliability of diagnoses assigned to 
children with mental retardation by staff psychiatrists in three 
Paris hospitals. The specific categories included "l'idiotie,"
"l'imbecilite," and "la debilite mentale" (corresponding to the
unfortunate diagnostic categories of idiot, imbecile, and mo- 
ron, respectively). Binet and Simon reported the following: 

We have made a methodical comparison between the 
admission certificates filled out for the same children 
within only a few days' interval by the doctors of Sainte- 
Anne, Bicetre, the Salpetriere, and Vaucluse. We have 
compared several hundreds of these certificates, and we 
think we may say without exaggeration that they looked 
as if they had been drawn by chance out of a sack. (p. 76) 

Dahlstrom (1993) goes on to state that "this fallibility in the
judgments made by humans about fellow humans is one of the 
primary reasons that psychological tests have been developed 
and applied in ever-increasing numbers over the past century" 
(p. 393). In this context, neuropsychological tests need not be 
perfect, or even psychometrically exceptional; they need only 
meaningfully improve clinical decision making and signifi- 
cantly reduce errors of judgment — those errors stemming from 
prejudice, personal bias, halo effects, ignorance, and stereotyp- 
ing — made by people when judging other people (Dahlstrom, 
1993; see also Meehl, 1973). The judicious selection, appropri- 
ate administration, and well-informed interpretation of stan- 
dardized tests will usually achieve this result. 



GRAPHICAL REPRESENTATIONS OF TEST DATA 

It is often useful to have a visual representation of test perfor- 
mance in order to facilitate interpretation and cross-test com- 
parison. For this purpose, we include example profile forms 
(Figures 1-7 and 1-8 on pp. 32-43), which we use to graphi- 
cally or numerically represent neuropsychological performance 
in the individual patient. We suggest that these forms be used 
to draw in confidence intervals for each test rather than point 
estimates. We also include a sample form that we use for eval- 
uation of children involving repeat assessment, such as epilepsy 
surgical candidates. 
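
As a rough, hypothetical illustration of the kind of graphical display described above (not a reproduction of the forms themselves), the sketch below plots a few invented standard scores as 95% confidence intervals rather than point estimates; the test names, scores, and SEM values are made up for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical scores on a standard-score metric (mean 100, SD 15), shown
# as 95% confidence intervals (score +/- 1.96 * SEM) rather than as points.
tests = ["Test A", "Test B", "Test C", "Test D"]
scores = [92, 104, 85, 110]
sems = [3.5, 4.2, 5.0, 3.0]                 # illustrative SEMs per test
halfwidths = [1.96 * s for s in sems]

fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar(range(len(tests)), scores, yerr=halfwidths, fmt="o", capsize=4)
ax.set_xticks(range(len(tests)))
ax.set_xticklabels(tests)
ax.axhline(100, linestyle="--", linewidth=1)  # population mean for reference
ax.set_ylabel("Standard score (M = 100, SD = 15)")
plt.tight_layout()
plt.show()
```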






NOTES 

1. It should be noted that Pearson later stated that he regretted 
his choice of "normal" as a descriptor for the normal curve; this 
"[had] the disadvantage of leading people to believe that all other 
distributions of frequency are in one sense or another 'abnormal'. 
That belief is, of course, not justifiable" (Pearson, 1920, p. 25). 

2. Micceri analyzed 400 datasets, including 30 from national 
tests and 131 from regional tests, on 89 different populations ad- 
ministered various psychological and education tests and found 
that extremes of asymmetry and "lumpiness" (i.e., appearance of 
distinct subpopulations in the distribution) were the norm rather 
than the exception. General ability measures tended to fare better 
than other types of tests such as achievement tests, but the results 
suggested that the vast majority of groups tested in the real world 
consist of subgroups that produce non-normal distributions, lead- 
ing Micceri to state that despite "widespread belief [...] in the 
naive assumption of normality," there is a "startling" lack of evi- 
dence to this effect for achievement tests and psychometric mea- 
sures (p. 156). 

3. Ironically, measurement error cannot be known precisely and 
must also be estimated. 

4. Note that this model focuses on test characteristics and does 
not explicitly address measurement error arising from particular 
characteristics of individual examinees or testing circumstances. 

5. Although the SEM is usually provided in the same metric as test scores (i.e., standard score units), users should note that some test publishers report SEMs in raw score units, which further impedes interpretation.

6. When interpreting confidence intervals based on the SEM, it is 
important to bear in mind that while these provide useful informa- 
tion about the expected range of scores, such confidence intervals are 
based on a model that assumes expected performance across a large 
number of randomly parallel forms. Ideally, test users would there- 
fore have an understanding of the nature and limitations of classical 
test models and their applicability to specific tests in order to use es- 
timates such as the SEM appropriately. 

7. There are quite a number of alternate methods for estimating 
error intervals and adjusting obtained scores for regression to 
the mean and other sources of measurement error (Glutting, 
McDermott & Stanley, 1987) and there is no universally agreed-upon 
method. Indeed, the most appropriate methods may vary across 
different types of tests and interpretive uses, though the majority of 
methods will produce roughly similar results in many cases. A review 
of alternate methods for estimating and correcting for measurement 
error is beyond the scope of this book; the methods presented were 
chosen because they continue to be widely used and accepted and 
they are relatively easy to grasp conceptually and mathematically. Re- 
gardless, in most cases, the choice of which specific method is used 
for estimating and correcting for measurement error is far less im- 
portant than the issue of whether any such estimates and corrections 
are calculated and incorporated into test score interpretation. That is, 
test scores should not be interpreted in the absence of consideration 
of measurement error. 
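
For readers who want to see the arithmetic behind notes 5 through 7, the following sketch applies the standard classical-test-theory formulas (SEM = SD * sqrt(1 - r); estimated true score = M + r(X - M); standard error of estimation = SD * sqrt(r(1 - r))). It is one common approach among the several defensible methods alluded to in note 7, not the only correct one, and the numeric inputs are illustrative.

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement under classical test theory."""
    return sd * sqrt(1.0 - reliability)

def true_score_ci(obtained, mean=100.0, sd=15.0, reliability=0.90, z=1.96):
    """Estimated true score (regressed toward the mean) with a confidence
    interval based on the standard error of estimation. Other methods center
    the interval on the obtained score and use the SEM directly."""
    est_true = mean + reliability * (obtained - mean)
    see = sd * sqrt(reliability * (1.0 - reliability))
    return est_true, (est_true - z * see, est_true + z * see)

print(round(sem(15, 0.90), 2))   # 4.74
print(true_score_ci(70))         # approximately (73.0, (64.2, 81.8))
```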

8. COIs and outcomes of interest may also be defined along a 
continuum from binary (present-absent) to multiple discrete cate- 
gories (mild, moderate, severe) to fully continuous (percent impair- 
ment). This chapter will only consider the binary case. 

9. In the medical literature, these may be referred to as the Predictive Value of a Positive Test (PV+) or Positive Predictive Value (PPV) and the Predictive Value of a Negative Test (PV−) or Negative Predictive Value (NPV).



10. Predictive power values at or below .50 should not be auto- 
matically interpreted as indicating that a COI is not present or that a 
test has no utility. For example, if the population prevalence of a 
COI is .05 and the PPP based on test results is .45, a clinician can 
rightly conclude that an examinee is much more likely to have the 
COI than members of the general population, which may be clini- 
cally relevant. 
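
The arithmetic behind statements like the one in note 10 is simply Bayes' theorem applied to sensitivity, specificity, and prevalence. The sketch below uses illustrative values chosen by us (it does not reproduce the data in Table 1-9) to show how a low base rate pulls PPP well below .50 even for a reasonably accurate test, while still leaving it far above the base rate itself.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive power for a binary test, computed
    from sensitivity, specificity, and the base rate (Bayes' theorem)."""
    tp = sensitivity * prevalence                  # true positives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    fn = (1.0 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1.0 - prevalence)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# Illustrative only: 5% base rate, sensitivity .90, specificity .95.
ppp, npp = predictive_values(0.90, 0.95, 0.05)
print(round(ppp, 2), round(npp, 2))   # ~0.49, ~0.99
```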

11. Recalculating the PPP for this scenario using low and high 
values of Sensitivity and Specificity as defined by 95% confidence 
limits derived earlier from the data in Table 1-9 gives a worst-case to 
best-case PPP range of .03 to .41. 

12. In Bayesian terminology, prevalence of a COI is known as the 
prior probability, while PPP and NPP are known as posterior probabili- 
ties. Conceptually, the difference between the prior and posterior prob- 
abilities associated with information added by a test score is an index 
of the diagnostic utility of a test. There is an entire literature concern- 
ing Bayesian methods for statistical analysis of test utility. These will 
not be covered here and interested readers are referred to Pepe (2003). 

13. Compare this approach with the use and limitations of the SE_P, as described earlier in this chapter.

14. The basics of linear regression will not be covered here; see 
Pedhazur (1997).



REFERENCES 

Altman, D. G., & Bland, J. M. (1983). Measurement in medicine: The 
analysis of method comparison. Statistician, 32, 307-317. 

American Educational Research Association, American Psychological 
Association, & National Council on Measurement in Education. 
(1999). Standards for educational and psychological testing. Wash- 
ington, DC: American Psychological Association. 

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Up- 
per Saddle River, NJ: Prentice Hall. 

Andrews, G., Peters, L., & Teesson, M. (1994). The measurement of 
consumer outcomes in mental health. Canberra, Australia: Aus- 
tralian Government Publishing Services. 

Axelrod, B. N., & Goldman, R. S. (1996). Use of demographic 
corrections in neuropsychological interpretation: How stan- 
dard are standard scores? The Clinical Neuropsychologist, 10(2), 
159-162. 

Baron, I. S. (2004). Neuropsychological evaluation of the child. New 
York: Oxford University Press. 

Binet, A., & Simon, T. (1907). Les enfants anormaux. Paris: Armond 
Colin. 

Bland, J. M., & Altman, D. G. (1986). Statistical methods for assess- 
ing agreement between two methods of clinical measurement. 
Lancet, i, 307-310. 

Bornstein, R. F. (1996). Face validity in psychological assessment: Im- 
plications for a unified model of validity. American Psychologist, 
51(9), 983-984. 

Burlingame, G. M., Lambert, M. J., Reisinger, C. W., Neff, W. M., & 
Mosier, J. (1995). Pragmatics of tracking mental health outcomes 
in a managed care setting. Journal of Mental Health Administration, 
22, 226-236. 

Canadian Psychological Association. (1987). Guidelines for educa- 
tional and psychological testing. Ottawa, Canada: Canadian Psy- 
chological Association. 

Chelune, G. J. (2003). Assessing reliable neuropsychological change. 
In R. D. Franklin (Ed.), Prediction in Forensic and Neuropsychology: 
Sound Statistical Practices (pp. 65-88). Mahwah, NJ: Lawrence 
Erlbaum Associates. 






Cronbach, L. (1971). Test validation. In R. Thorndike (Ed.), Educa- 
tional measurement (2nd ed., pp. 443-507). Washington, DC: 
American Council on Education. 

Cronbach, L., & Meehl, P. E. (1955). Construct validity in psycho- 
logical tests. Psychological Bulletin, 52, 167-186. 

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for 
evaluating normed and standardized assessment instruments in 
psychology. Psychological Assessment, 6(4), 284-290. 

Cicchetti, D. V., & Sparrow, S. S. (1981). Developing criteria for estab- 
lishing interrater reliability of specific items: Applications to as- 
sessment of adaptive behavior. American Journal of Mental 
Deficiency, 86, 127-137. 

Cicchetti, D. V., Volkmar, F., Sparrow, S. S., Cohen, D., Fermanian, J., 
& Rourke, B. P. (1992). Assessing reliability of clinical scales when 
data have both nominal and ordinal features: Proposed guidelines 
for neuropsychological assessments. Journal of Clinical and Exper- 
imental Neuropsychology, 14(5), 673-686. 

Crawford, J. R., & Garthwaite, P. H. (2002). Investigation of the single 
case in neuropsychology: Confidence limits on the abnormality 
of test scores and test score differences. Neuropsychologia, 40, 
1196-1208. 

Crawford, J. R., & Howell, D. C. (1998). Regression equations in clin- 
ical neuropsychology: An evaluation of statistical methods for 
comparing predicted and obtained scores. Journal of Clinical and 
Experimental Neuropsychology, 20(5), 755-762. 

Dahlstrom, W. G. (1993). Small samples, large consequences. Ameri- 
can Psychologist, 48(4), 393-399. 

Dikmen, S. S., Heaton, R. K., Grant, I., & Temkin, N. R. (1999). Test- 
retest reliability and practice effects of expanded Halstead-Reitan 
Neuropsychological Test Battery. Journal of the International Neu- 
ropsychological Society, 5(4), 346-356. 

Dudek, F. J. (1979). The continuing misinterpretation of the standard 
error of measurement. Psychological Bulletin, 86(2), 335-337. 

Elwood, R. W. (1993). Clinical discriminations and neuropsychologi- 
cal tests: An appeal to Bayes' theorem. The Clinical Neuropsycholo- 
gist, 7, 224-233. 

Fastenau, P. S. (1998). Validity of regression-based norms: An empir- 
ical test of the Comprehensive Norms with older adults. Journal 
of Clinical and Experimental Neuropsychology, 20(6), 906-916. 

Fastenau, P. S., & Adams, K. M. (1996). Heaton, Grant, and 
Matthews' Comprehensive Norms: An overzealous attempt. Jour- 
nal of Clinical and Experimental Neuropsychology, 18(3), 444-448. 

Fastenau, P. S., Bennett, J. M., & Denburg, N. L. (1996). Application 
of psychometric standards to scoring system evaluation: Is "new" 
necessarily "improved"? Journal of Clinical and Experimental 
Neuropsychology, 18(5), 462-472. 

Ferguson, G. A. (1981). Statistical analysis in psychology and educa- 
tion (5th ed.). New York: McGraw-Hill. 

Franklin, R. D. (Ed.). (2003a). Prediction in Forensic and Neuropsy- 
chology: Sound Statistical Practices. Mahwah, NJ: Lawrence Erl- 
baum Associates. 

Franklin, R. D., & Krueger, J. (2003b). Bayesian Inference and Belief 
networks. In R. D. Franklin (Ed.), Prediction in Forensic and Neu- 
ropsychology: Sound Statistical Practices (pp. 65-88). Mahwah, NJ: 
Lawrence Erlbaum Associates. 

Franzen, M. D. (2000). Reliability and validity in neuropsychological as- 
sessment (2nd ed.). New York: Kluwer Academic/Plenum Publishers. 

Gasquoine, P. G. (1999). Variables moderating cultural and ethnic 
differences in neuropsychological assessment: The case of His- 
panic Americans. The Clinical Neuropsychologist, 13(3), 376-383. 

Glutting, J. J., McDermott, P. A., & Stanley, J. C. (1987). Resolving 
differences among methods of establishing confidence limits for test scores. Educational and Psychological Measurement, 47(3), 
607-614. 

Gottfredson, L. S. (1994). The science and politics of race-norming. 
American Psychologist, 49(11), 955-963. 

Greenlaw, P. S., & Jensen, S. S. (1996). Race-norming and the Civil 
Rights Act of 1991. Public Personnel Management, 25(1), 13-24. 

Hambleton, R. K. (1980). Test score validity and standard-setting 
methods. In R. A. Berk (Ed.), Criterion-referenced measurement: 
The state of the art (pp. 80-123). Baltimore, MD: Johns Hopkins 
University Press. 

Harris, J. G., & Tulsky, D. S. (2003). Assessment of the non-native En- 
glish speaker: Assimilating history and research findings to guide 
clinical practice. In D. S. Tulsky, D. H. Saklofske, G. J. Chelune, 
R. K. Heaton, R. Ivnik, R. Bornstein, A. Prifitera, & M. F. Ledbetter 
(Eds.), Clinical interpretation of the WAIS-III and WMS-III 
(pp. 343-390). New York: Academic Press. 

Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. G., & Curtiss, G. 
(1993). Wisconsin Card Sorting Test Manual. Odessa, FL: PAR. 

Heaton, R. K., Taylor, M. J., & Manly, J. (2003). Demographic effects 
and use of demographically corrected norms with the WAIS-III 
and WMS-III. In D. S. Tulsky, D. H. Saklofske, G. J. Chelune, 
R. K. Heaton, R. Ivnik, R. Bornstein, A. Prifitera, & M. F. Ledbetter 
(Eds.), Clinical interpretation of the WAIS-III and WMS-III 
(pp. 181-210). New York: Academic Press. 

Hermann, B. P., Wyler, A. R., VanderZwagg, R., LeBailly, R. K., Whit- 
man, S., Somes, G., & Ward, J. (1991). Predictors of neuropsy- 
chological change following anterior temporal lobectomy: Role 
of regression toward the mean. Journal of Epilepsy, 4, 139-148. 

Ingraham, L. J., & Aiken, C. B. (1996). An empirical approach to de- 
termining criteria for abnormality in test batteries with multiple 
measures. Neuropsychology, 10(1), 120-124. 

Ivnik, R. J., Smith, G. E., & Cerhan, J. H. (2001). Understanding the 
diagnostic capabilities of cognitive tests. The Clinical Neuropsycholo- 
gist, 15(1), 114-124. 

Jacobson, N. S., Roberts, L. J., Berns, S. B., & McGlinchey, J. B. (1999). 
Methods for defining and determining the clinical significance of 
treatment effects: Description, application, and alternatives. Jour- 
nal of Consulting and Clinical Psychology, 67(3), 300-307. 

Jacobson, N. S. & Truax, P. (1991). Clinical significance: A statistical 
approach to defining meaningful change in psychotherapy re- 
search. Journal of Consulting and Clinical Psychology, 59, 12-19. 

Kalechstein, A. D., van Gorp, W. G., & Rapport, L. J. (1998). Variabil- 
ity in clinical classification of raw test scores across normative 
data sets. The Clinical Neuropsychologist, 12(3), 339-347. 

Lees-Haley, P. R. (1996). Alice in validityland, or the dangerous con- 
sequences of consequential validity. American Psychologist, 51(9), 
981-983. 

Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsycho- 
logical assessment (4th ed.). New York: Oxford University Press. 

Lindeboom, J. (1989). Who needs cutting points? Journal of Clinical 
Psychology, 45(4), 679-683. 

Lineweaver, T. T., & Chelune, G. J. (2003). Use of the WAIS-III and 
WMS-III in the context of serial assessments: Interpreting reli- 
able and meaningful change. In D. S. Tulsky, D. H. Saklofske, 
G. J. Chelune, R. K. Heaton, R. Ivnik, R. Bornstein, A. Prifitera, & 
M. F. Ledbetter (Eds.), Clinical interpretation of the WAIS-III and 
WMS-III (pp. 303-337). New York: Academic Press. 

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test 
scores. Reading, MA: Addison-Wesley. 

Mackinnon, A. (2000). A spreadsheet for the calculation of com- 
prehensive statistics for the assessment of diagnostic tests and inter- 
rater agreement. Computers in Biology and Medicine, 30, 127-134. 






McCleary, R., Dick, M. B., Buckwalter, G., Henderson, V., & Shankle, 
W. R. (1996). Full-information models for multiple psychometric 
tests: Annualized rates of change in normal aging and dementia. 
Alzheimer Disease and Associated Disorders, 10(4), 216-223. 

McFadden, T. U. (1996). Creating language impairments in typically 
achieving children: The pitfalls of "normal" normative sampling. 
Language, Speech, and Hearing Services in Schools, 27, 3-9. 

McKenzie, D., Vida, S., Mackinnon, A. J., Onghena, P. and Clarke, D. 
(1997) Accurate confidence intervals for measures of test perfor- 
mance. Psychiatry Research, 69, 207-209. 

Meehl, P. E. (1973). Why I do not attend case conferences. In P. E. Meehl 
(Ed.), Psychodiagnosis: Selected papers (pp. 225-302). Minneapo- 
lis: University of Minnesota. 

Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the 
efficiency of psychometric signs, patterns, or cutting scores. 
Psychological Bulletin, 52, 194-216. 

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measure- 
ment (3rd ed., pp. 13-103). New York: Macmillan. 

Messick, S. (1993). Validity. In R. L. Linn (Ed.), Educational measure- 
ment (3rd ed., pp. 13-103). Phoenix, AZ.: The Oryx Press. 

Messick, S. (1995). Validity of psychological assessment: Validation of 
inferences from persons' responses and performance as scientific 
inquiry into score meaning. American Psychologist, 50(9), 741-749. 

Messick, S. (1996). Validity of psychological assessment: Validation 
of inferences from persons' responses and performances as scientific 
inquiry into score meaning. American Psychologist, 50(9), 741-749. 

Micceri, T. (1989). The unicorn, the normal curve, and other im- 
probable creatures. Psychological Bulletin, 105(1), 156-166. 

Mitrushina, M. N., Boone, K. B., & D'Elia, L. F. (1999). Handbook of 
normative data for neuropsychological assessment. New York: Ox- 
ford University Press. 

Mitrushina, M. N., Boone, K. B., Razani, J., & D'Elia, L. F. (2005). 
Handbook of normative data for neuropsychological assessment 
(2nd ed.). New York: Oxford University Press. 

Mossman, D. (2003). Daubert, cognitive malingering, and test accu- 
racy. Law and Human Behavior, 27(3), 229-249. 

Mossman D., & Somoza E. (1992). Balancing risks and benefits: an- 
other approach to optimizing diagnostic tests. Journal of Neu- 
ropsychiatry and Clinical Neurosciences, 4(3), 331-335. 

Nevo, B. (1985). Face validity revisited. Journal of Educational Mea- 
surement, 22, 287-293. 

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd 
ed.). New York: McGraw-Hill, Inc. 

Palmer, B. W., Boone, K. B., Lesser, I. R., & Wohl, M. A. (1998). Base rates 
of "impaired" neuropsychological test performance among healthy 
older adults. Archives of Clinical Neuropsychology, 13, 503-511. 

Pavel, D., Sanchez, T., & Machamer, A. (1994). Ethnic fraud, native 
peoples and higher education. Thought and Action, 10, 91-100. 

Pearson, K. (1920). Notes on the history of correlation. Biometrika, 
13, 25-45. 

Pedhazur, E. (1997). Multiple Regression in Behavioral Research. New 
York: Harcourt Brace. 

Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Clas- 
sification and Prediction. New York: Oxford. 

Puente, A. E., Mora, M. S., & Munoz-Cespedes, J. M. (1997). Neu- 
ropsychological assessment of Spanish-Speaking children and 
youth. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of 
clinical child neuropsychology (2nd ed., pp. 371-383). New York: 
Plenum Press. 

Rapport, L. J., Brines, D. B., & Axelrod, B. N. (1997). Full Scale IQ as 
mediator of practice effects: The rich get richer. Clinical Neu- 
ropsychologist, 11(4), 375-380. 



Rey, G. J., Feldman, E., & Rivas-Vazquez. (1999). Neuropsychological 
test development and normative data on Hispanics. Archives of 
Clinical Neuropsychology, 14(7), 593-601. 

Rorer, L. G., & Dawes, R. M. (1982). A base-rate bootstrap. Journal of 
Consulting and Clinical Psychology, 50(3), 419-425. 

Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other 
forms of score adjustment in preemployment testing. American 
Psychologist, 49(11), 929-954. 

Sattler, J. M. (2001). Assessment of children: Cognitive applications 
(4th ed.). San Diego: Jerome M. Sattler Publisher, Inc. 

Sawrie, S. M., Chelune, G. J., Naugle, R. I., & Luders, H. O. (1996). 
Empirical methods for assessing meaningful neuropsychological 
change following epilepsy surgery. Journal of the International 
Neuropsychological Society, 2, 556-564. 

Schretlen, D. J., Munro, C. A., Anthony, J. C., & Pearlson, G. D. (2003). 
Examining the range of normal intraindividual variability in 
neuropsychological test performance. Journal of the International 
Neuropsychological Society, 9(6), 864-870. 

Shavelson, R. J., Webb, N. M., & Rowley, G. (1989). Generalizability 
theory. American Psychologist, 44, 922-932. 

Sherman, E. M. S., Slick, D. J., Connolly, M. B., Steinbok, P., Martin, 
R., Strauss, E., Chelune, G. J., & Farrell, K. (2003). Re-examining 
the effects of epilepsy surgery on IQ in children: An empirically 
derived method for measuring change. Journal of the Interna- 
tional Neuropsychological Society, 9, 879-886. 

Slick, D. J., Hopp, G., Strauss, E., & Thompson, G. (1997). The Victo- 
ria Symptom Validity Test. Odessa, FL: Psychological Assessment 
Resources. 

Sokal, R. R., & Rohlf, J. F. (1995). Biometry. San Francisco, CA: 
W. H. Freeman. 

Somoza E., & Mossman D. (1992). Comparing diagnostic tests using 
information theory: the INFO-ROC technique. Journal of Neu- 
ropsychiatry and Clinical Neurosciences, 4(2), 214-219. 

Streiner, D. L. (2003a). Being inconsistent about consistency: When 
coefficient alpha does and doesn't matter. Journal of Personality 
Assessment, 80(3), 217-222. 

Streiner, D. L. (2003b). Starting at the beginning: An introduction to 
coefficient alpha and internal consistency. Journal of Personality 
Assessment, 80(1), 99-103. 

Streiner, D. L. (2003c). Diagnosing tests: Using and misusing diag- 
nostic and screening tests. Journal of Personality Assessment, 
81(3), 209-219. 

Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological sci- 
ence can improve diagnostic decisions. Psychological Science in the 
Public Interest, 1(1), 1-26. 

Taylor, M. J., & Heaton, R. K. (2001). Sensitivity and specificity of 
WAIS-III/WMS-III demographically corrected factor scores in 
neuropsychological assessment. Journal of the International Neu- 
ropsychological Society, 7, 867-874. 

Temkin, N. R., Heaton, R. K., Grant, I., & Dikmen, S. S. (1999). De- 
tecting significant change in neuropsychological test performance: 
A comparison of four models. Journal of the International Neu- 
ropsychological Society, 5, 357-369. 

Tulsky, D. S., Saklofske, D. H., & Zhu, J. (2003). Revising a standard: 
An evaluation of the origin and development of the WAIS-III. In 
D. S. Tulsky, D. H. Saklofske, G. J. Chelune, R. K. Heaton, R. Ivnik, 
R. Bornstein, A. Prifitera, & M. F. Ledbetter (Eds.), Clinical inter- 
pretation of the WAIS-III and WMS-III (pp. 43-92). New York: 
Academic Press. 

Willis, W. G. (1984). Reanalysis of an actuarial approach to neu- 
ropsychological diagnosis in consideration of base rates. Journal 
of Consulting and Clinical Psychology, 52(4), 567-569. 






Woods, S. P., Weinborn, M., & Lovejoy, D. W. (2003). Are classifica- 
tion accuracy statistics underused in neuropsychological re- 
search? Journal of Clinical and Experimental Neuropsychology, 
25(3), 431-439. 



Yun, J., & Ulrich, D. A. (2002). Estimating measurement validity: A 
tutorial. Adapted Physical Activity Quarterly, 19, 'il-H. 

Zimiles, H. (1996). Rethinking the validity of psychological assess- 
ment. American Psychologist, 51(9), 980-981. 



Figure 1-7 Profile form — Adult. [Record form (pp. 32-37) with spaces for identifying information (name, D.O.B., age, sex, education, handedness, test dates, previous testing, examiner) and for entering each test's score, age-scaled or z score, and percentile on a common 10-90 percentile graph. Sections and tests: COGNITIVE (WAIS-III subtests, IQs, and Index scores; NAART-predicted IQs; Raven's Matrices); EXECUTIVE FUNCTION (WCST, Category Test, Cognitive Estimation Test, Stroop, Verbal Fluency [FAS, Animals], Ruff Figural Fluency, BADS); ATTENTION/CONCENTRATION (Trails A and B, CPT, BTA, Symbol-Digit, PASAT); MEMORY (WMS-III subtests and indexes, CVLT-II, Rey Complex Figure, BVMT-R, RAVLT); MEMORY/MOTIVATION (VSVT, TOMM, Word Memory Test); LANGUAGE (Boston Naming Test, Dichotic Listening, Token Test, PPVT-III); VISUAL (Hooper VOT, VOSP, Judgment of Line Orientation, Right-Left Orientation); MOTOR (Grooved Pegboard, Finger Tapping, Dynamometer); SOMATOSENSORY/OLFACTION (Smell Identification, TPT); ACADEMIC ACHIEVEMENT (WRAT-III, WJ-III Achievement); PERSONALITY/MOOD (MMPI-2, PAI, BDI-II, IADL).]



Figure 1-8 Profile form — Children. [Children's Neuropsychological Test Profile (pp. 38-43): a repeat-assessment record form with identifying information (name, patient number, D.O.B., sex, handedness) and columns for up to three test dates (age, grade, raw score, scaled score/age equivalent, percentile). Sections and tests: WISC-IV (subtests, composites, and index discrepancies); Attention/Executive (CPT, ADHD-IV Parent and Teacher ratings, WCST-64/128, D-KEFS Verbal Fluency and Color-Word Interference, MNI Design Fluency, NEPSY Tower, Auditory Attention and Response Set, and Visual Attention, BRIEF Parent and Teacher forms); Learning & Memory (CMS Dot Locations, Faces, Family Pictures, Stories, and Word Pairs plus index scores, CVLT, RCFT); Language (Dichotic Listening, Token Test, CELF-3 Concepts & Directions, PPVT-III, EOWPVT, WJ-R/III oral language and comprehension tests, WRAT-3, GORT-4); Visual Motor (VMI, Purdue Pegboard); Rating Scales (CBCL, TRF, CDI, SIB-R).]



Norms Selection in Neuropsychological Assessment 



OVERVIEW 

In this chapter, we present an overview of factors pertinent to 
(1) understanding normative data and (2) selecting norms so 
that they best meet the goals of the assessment and meet the 
needs of the patient evaluated. The goal of the chapter is to fa- 
cilitate the user's task in making informed choices about 
norms. 

One of the main considerations in choosing norms is 
whether a broadly representative sample should be selected, 
or whether a more specific subgroup is more appropriate, 
such as one defined by specific gender, education, ethnicity, 
socioeconomic status (SES), or other variables. Additional 
considerations are sample size, sample composition, and date 
of norms collection. All of these factors are discussed in this 
chapter. Other pertinent psychometric issues such as score 
transformations, extrapolation/interpolation, and normative 
adjustments by case weighting are also essential factors to 
consider when selecting normative datasets for clinical prac- 
tice or research; these are considered in Chapter 1. 



NORMS SELECTION: BASIC PRINCIPLES 

Selecting an appropriate normative dataset is a prerequisite 
for effective and competent neuropsychological practice. Norms 
selection is as important as test selection; choosing an inade- 
quate normative sample is as disastrous as choosing a test 
with poor reliability or validity, as considerable variability 
in obtained scores can occur depending on which normative 
dataset was the basis of normed-referenced scores (Kalech- 
stein et al, 1998; Mitrushina et al, 2005). Because scores are 
directly tied to measurable consequences such as diagnoses, 
treatment recommendations, and funding decisions, neu- 
ropsychologists need to be aware of the specific characteristics 
of the norms they use. 

Ideally, norms should be selected a priori in order to avoid 
confirmatory bias (Kalechstein et al., 1998). From a practical standpoint, this also avoids wasted time and resources; many 
clinicians have administered a test to a patient, but only later 
realized that there existed no appropriate normative data cor- 
responding to the patient's specific demographic characteris- 
tics (Mitrushina et al., 2005). 

As most clinicians know, the process of norms selection of- 
ten occurs in tandem with test selection. Subtle administra- 
tion differences may characterize similar paradigms tied to 
different normative data sets. Word fluency is a case in point. 
To administer this test, users may employ the administration 
protocol described in this volume and derive standardized 
scores based on a variety of norms from several published 
studies. Or, one can use a word fluency paradigm as opera- 
tionalized by the D-KEFS, NEPSY, or RBANS, and use the 
specific normative dataset tied to each test. Because each nor- 
mative dataset differs in terms of its demographic composi- 
tion, deciding which word fluency test to use will rest on an 
evaluation of the individual patient's demographic character- 
istics such as age and education. Norms selection is therefore 
facilitated by a strong working knowledge of the similarities 
and differences between tests in the field, as well as by the 
varying characteristics of their respective normative data sets. 
In addition to our volume, Mitrushina et al. (2005) and Baron 
(2004) are invaluable sources in this regard. 



ARE LARGER NORMATIVE DATASETS 
ALWAYS BETTER? 

When choosing a test, most clinicians are primarily concerned 
with the size of the normative sample. The assumption is that 
the larger the N, the more reliable and representative the scores 
derived. As we have seen in Chapter 1, norms should be as 
large as possible to avoid sampling error, and to maximize the 
representativeness of the sample vis-a-vis the general popula- 
tion. As a rule of thumb, at least 200 subjects are needed to 
conduct item analysis (Nunnally & Bernstein, 1994), and many 
sources consider norms of less than 150 cases inadequate. 






However, it is important to note that even large normative sets 
yield small cell sizes when scores are divided into demographi- 
cally defined subgroups according to variables such as age, 
gender, and education. For example, for a given patient, a 
smaller, homogenous normative dataset comprised only of in- 
dividuals from a similar demographic subgroup (e.g., elderly, 
white females from Minnesota with 12 years of education) 
may actually provide a larger N and a better demographic fit 
than norms from large, commercially produced, nationally 
representative tests whose cell sizes are the result of minute 
subdivision according to multiple demographic factors. This 
is one of the reasons that we provide additional normative 
datasets for commercially available tests that already have large 
normative databases. 

On the other hand, there are clear advantages to norms de- 
rived from large datasets from the general population based 
on statistical techniques that correct for the irregularities in- 
herent in smaller samples. To maximize clinical validity, users 
should be fully familiar with the properties of the normative 
sets they use, including techniques such as case weighting, to 
increase or adjust the composition of norm subgroups (e.g., 
by statistically increasing the number of cases defined by a de- 
mographic variable such as race or age, when the actual cell 
size falls short). Users should note that although techniques 
such as case weighting are relatively common and used in 
many large-scale tests (e.g., Wechsler scales), this practice 
means that derived scores are partially based on an estimate 
rather than on actual, real-world data. 
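
In practice, case weighting of the kind just described usually amounts to reweighting each examinee so that the weighted sample proportions match census proportions for the relevant demographic cells. The following sketch illustrates that idea with invented proportions; actual norming programs use more elaborate weighting and raking procedures than this.

```python
def poststratification_weights(sample_props, census_props):
    """Weight for each demographic cell = census proportion / sample
    proportion, so that the weighted sample mirrors the census distribution."""
    return {cell: census_props[cell] / sample_props[cell] for cell in sample_props}

# Hypothetical example: a normative sample that under-represents one group.
sample = {"group_a": 0.80, "group_b": 0.20}
census = {"group_a": 0.70, "group_b": 0.30}
print(poststratification_weights(sample, census))
# {'group_a': 0.875, 'group_b': 1.5} -- group_b cases count 1.5 times as much
```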



THE FLYNN EFFECT AND THE DATE OF NORMING 

New tests and new normative datasets appear with regularity 
in the neuropsychological assessment field. As a result, users 
are regularly faced with deciding whether to use existing, well- 
known normative datasets or the newest, most recently pub- 
lished normative datasets. Some have suggested that when 
choosing norms, recency of publication is less important than 
demographic match and cell size (Kalechstein et al., 1998). 
However, the effect of the passage of time on test scores 
should not be underestimated. The Flynn effect, which is the 
general trend for increased IQs over time with each subse- 
quent generation, is estimated to contribute to an increase of 
0.3 IQ points per year. Depending on the test, estimates range 
from 3 to 9 points per decade. A 20-point increase over one 
generation on some nonverbal tests such as the Raven's has 
even been recorded (e.g., Flynn, 1984, 2000). Gains have also 
been noted on neuropsychological tests such as the Halstead- 
Reitan Battery (Bengston et al., 1996). However, there is some 
evidence that the Flynn effect is more pronounced in fluid/ 
nonverbal tests than in crystallized/verbal tests (see Kanaya et 
al., 2003a, 2003b for review). Score increases attributed to the 
Flynn effect are thought to result from improved nutrition, 
cultural changes, experience with testing, changes in school- 
ing or child-rearing practices, or other factors as yet unknown 
(Neisser et al., 1996). 



We know that normed tests have a "lifespan" of about 15 to 
20 years, assuming that all other factors, such as the relevance 
of test items, remain constant (Tulsky et al., 2003). After this 
time, new norms need to be collected; this reanchors scores to correct for the Flynn effect and counteracts the tendency for scores to rise over time. With re- 
norming, the older test becomes "easier" and the new test 
becomes "harder": an individual given both the old and the 
new versions of the same test would obtain higher scores on 
the test with older norms. In light of the well-known Flynn ef- 
fect, one would assume that most normative datasets used by 
neuropsychologists include only data from the most recent 
age cohorts. Surprisingly, some normative samples still used 
by neuropsychologists contain data that are as much as 40 
years old. These old datasets, when considered in light of the 
Flynn effect, introduce the potential for considerable score in- 
flation. 
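
A rough sense of the size of this inflation can be had from the average rate of about 0.3 IQ points per year cited earlier; the sketch below simply multiplies that rate by the age of the norms. The function and the default rate are illustrative only, since actual Flynn-effect gains vary considerably by test, construct, and population (roughly 3 to 9 points per decade).

```python
def estimated_flynn_inflation(year_normed, year_tested, points_per_year=0.3):
    """Back-of-the-envelope estimate of score inflation (in standard-score
    points) attributable to the Flynn effect when an examinee is tested with
    norms collected years earlier. The 0.3-points-per-year default is the
    approximate average rate cited in the text."""
    return max(0, year_tested - year_normed) * points_per_year

# Example: norms collected in 1981, examinee tested in 2005.
print(estimated_flynn_inflation(1981, 2005))   # 7.2 points
```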

The Flynn effect has not been studied sufficiently in neu- 
ropsychology. However, we do know that it has a number of 
potential effects of major consequence for neuropsychological 
assessment. The first is that it alters cutoffs used for determi- 
nation of certain conditions, depending on the recency of the 
norms used. For example, in the year a new re-standardized 
normative sample from an IQ test is released, an average child 
obtains a hypothetical score of 100. If tested with the older in- 
strument, his or her score might be 105. Although this slight 
shift in scores has minimal consequences in the middle of the 
score distribution (i.e., within the average range), it may have 
major consequences for children whose ability lies closer to 
the tail ends of the distribution. If their scores lie in the vicin- 
ity of standard cut-scores used in diagnosing major condi- 
tions such as mental retardation or giftedness (e.g., 70 and 
130, on IQ tests), these children may obtain scores above or 
below the cut-score, depending on whether they were given 
the older test or the re-standardized version. Conceivably, this 
could also affect factors such as eligibility for the death 
penalty, in some countries (Kanaya et al, 2003b). Thus, in 
these cases, the choice of norms has significant impact on 
whether an obtained score lies above or below a determined 
cutoff, and greatly alters the proportion of individuals with a 
specific condition that are detected by a test. 

On a group level, the Flynn effect also has far-reaching 
consequences. For instance, in a large longitudinal study, 
Kanaya et al. (2003a, 2003b) documented the effects of subse- 
quent re-normings of IQ tests, including cyclical fluctuations 
in the average score obtained by children consisting of in- 
creases up until the year of publication of new norms, fol- 
lowed by an immediate decrease in scores. Score levels were 
therefore tied to how closely children were tested to the year 
in which new norms appeared. On a national level, the year in 
which children were tested affected the reported prevalence of 
mental retardation, with possible large-scale implications for 
national policy regarding education financing, social security 
eligibility, and other issues of national importance. Similar 
findings have been reported for children with LD tested with 
new and older versions of the WISC-III (Truscott & Frank, 2001), potentially affecting the reported prevalence of LD. Ad- 
ditional research is needed to determine how the Flynn effect 
might affect other kinds of cut-scores, such as those used to 
screen for dementia. 

Another reason that the Flynn effect is relevant for neu- 
ropsychologists is that single tests are rarely used as the sole 
basis of the neuropsychological evaluation. Potentially, the 
Flynn effect could add additional sources of complexity for 
users of flexible batteries, if a number of different tests normed 
in different decades are used. Batteries such as the NAB or 
NEPSY, which provide co-normed subtests in a variety of 
neuropsychological domains, provide a clear advantage over 
other test combinations in this regard. Alternatively, existing 
large-scale batteries from the intelligence, memory, and psy- 
choeducational domain, co-normed contemporaneously on a 
single sample (i.e., WAIS-III/ WMS-III/ WIAT-II; WJ-III- 
COG/WJ-III-ACH), can also form much of the core of a neuro- 
psychological battery, when supplemented with additional 
tests normed during a similar timeframe. 

As a result of the Flynn effect, some tests may lose their 
ability to discriminate among individuals with higher ability 
levels. This happens because score inflation secondary to the 
Flynn effect causes more and more higher functioning indi- 
viduals to reach ceiling on the test. In fact, in some cases (e.g., 
Raven's or SPM), ceiling effects are now quite noticeable. 



OLDER NORMS AND INTERACTIONS WITH 
SOCIODEMOGRAPHIC FACTORS 

Importantly, outdated norms can interact with demographic factors to raise the risk of misinterpretation of test results. Over the last few decades, the demographic composition of the U.S. population has changed dramatically, with increasing ethnic minority representation, particularly in the younger age ranges (e.g., Llorente et al., 1999). By 2010, Hispanics will comprise 15% of the U.S. population; by 2050, this percentage is projected to reach 25% (Ponton & Leon-Carrion, 2001). Because these demographic factors impact test scores, use of older norms may differentially penalize individuals from minority groups because of limited minority representation compared with more contemporary norms (see discussion of the PPVT-R in the PPVT-III review, for example). 

Cohort effects also raise the risk that older normative datasets may be obsolete. In elderly individuals, educational level may be restricted by a lack of universal access to education; when norms based on these cohorts are applied to later cohorts with better educational opportunities, the result is artificial score inflation and reduced test sensitivity. Additional distortion is introduced to particular subgroups when cohort-specific features such as access to education interact with demographic factors such as gender and ethnicity as a result of institutionalized sexism and racism (e.g., Manly et al., 2003). Special care must therefore be exercised when selecting norms in these groups. Note that cohort effects have been documented in even the youngest of subjects, in some cases as early as age 3 (Bocerean et al., 2003). 

PRACTICAL CONSIDERATIONS REGARDING 
THE DATE OF NORMING 

Practically speaking, consumers of increasingly expensive psychological tests must carefully scrutinize test manuals for the date of normative data collection before assuming that recently released normative sets actually contain recently collected normative data. The reality is that some recently published norms actually include very old data. Several instances of this were noted in tests reviewed for this volume. The year in which normative data were collected should be explicitly stated in test manuals; omitting this information is not appropriate practice (see Standards for Educational and Psychological Testing; AERA et al., 1999). Consequently, users are urged to contact test publishers when this information is lacking. In this volume, we have attempted to include the date of norming to facilitate the task for users. 

TYPES OF NORMS 



It was not so long ago that neuropsychologists relied exclu- 
sively on raw scores and cut-offs for determining level of per- 
formance, and that normative samples consisted solely of 
"normals" and "brain-damaged" individuals. Over time, the 
field has shifted toward the notion of general population 
norms comprised of a large sample of individuals character- 
ized by specific demographic characteristics. Use of these 
norms then allows the user to better determine whether the 
individual differs from the general population. While IQ and 
achievement tests have for many years set the standard and 
perfected techniques for constructing large, representative 
normative datasets, neuropsychological datasets have only re- 
cently begun to include a variety of individuals reflecting the 
composition of the general population. 

At the same time, demographically corrected normative 
datasets, which represent only a subset of the general popula- 
tion most similar to the patient under scrutiny, are increasingly 
used and demanded by practitioners. Demographic correc- 
tions have come to be routinely applied to most normative 
data in neuropsychology, first in the form of age-corrected 
norms, and in some cases, gender-corrected norms, and later, 
as education-corrected norms. As more and more neuropsy- 
chological tests become commercially produced, some norms 
specifically include a certain proportion of individuals who 
represent groups defined by race/ethnicity, in order to make 
norms as representative of the general population as possible. 
This has also allowed additional levels of within-group norm- 
ing. This gradual shift away from raw scores as the basis of 
interpretation to scores adjusted for multiple demographic factors occurred because of an acknowledgment by clinicians 
and researchers that demographics are significantly correlated 
with performance on neuropsychological tests, as they are for 
most cognitive tests in common usage (see test reviews in this 
volume for specific references, as well as sources below). The 
reasons for this shift, and the ways in which general popula- 
tion and demographically corrected normative datasets can 
be used effectively, will be reviewed in this chapter. 



NORMS: POPULATION-BASED VERSUS 
DEMOGRAPHICALLY ADJUSTED 

There are two schools of thought regarding how closely 
matched the norms must be to the demographic characteristics 
of the individual being assessed, and these views are diametri- 
cally opposed. These are: (1) that norms should be as represen- 
tative of the general population as possible, and (2) that norms 
should approximate, as closely as possible, the unique sub- 
group to which the individual belongs. The latter view is the 
central tenet of Mitrushina et al.'s text (Mitrushina et al., 2005), 
which is essentially a guidebook for choosing norms that most 
closely fit the demographic characteristics of the individual pa- 
tient being assessed. Although most neuropsychologists as- 
sume that the latter is always preferable, this is not necessarily 
the best choice at all times. Because a test's sensitivity, speci- 
ficity and impairment cutoffs (see Chapter 1) depend to some 
extent on the norms selected, choosing norms necessitates a 
trade-off between the risk of making false negative errors and 
the risk of making false positive errors. Thus, the use of broadly 
representative versus demographically specific norms will de- 
pend on the purpose of the testing. 

At times, it will be paramount to compare the individual to 
all other persons of the same age in the general population. 
Determining a diagnosis of mental retardation or learning dis- 
ability would be one example. At other times, the goal will be 
to compare the individual to the best-matched demographic 
subgroup. Here, the goal might involve mapping out an indi- 
vidual's relative strengths and weaknesses in order to inform 
diagnostic considerations, plan classroom accommodations or 
return to work, or to obtain a best estimate of premorbid level 
in a dementia work-up. In many clinical situations, the assess- 
ment will involve both approaches because the results serve to 
address questions at many levels, including diagnosis, individ- 
ual strengths and weaknesses, and areas in need of 
treatment/accommodation. For a more complete discussion of 
these and related issues, see Ethnicity, Race and Culture. 



STRATIFIED GENERAL POPULATION NORMS 

The rationale for stratifying norms according to demographic 
characteristics is to maximize the likelihood that norms are 
representative of the general population. Major tests are usu- 
ally stratified based on age, gender, education, ethnicity/race, and socioeconomic status (SES), with the latter defined as either actual SES, occupation, or education/parental education. Additional 
variables include geographic region and urban versus rural 
residence. Stratification occurs according to representative 
proportions such as those provided by U.S. Census data to 
best approximate the composition of the general population. 

Although each of these demographic factors tends to be treated as a separate stratification variable, it is important to note that there is considerable overlap between supposedly sepa- 
rate demographic characteristics. In most cases, some demo- 
graphic characteristics (e.g., ethnicity, geographic location) 
are actually surrogate variables for other more fundamental 
variables that influence test scores (e.g., SES, education). 

When nationally representative norms are considered, the 
sample should have an even distribution of demographic 
variables across ages, unless certain demographics are more 
heavily weighted in some age groups versus others. For exam- 
ple, because of factors related to parity and longevity, there is 
a larger ethnic minority representation among young children 
in the United States than in adults, and a higher representa- 
tion of women in the oldest age bands. Because the demo- 
graphic composition of nations changes over time, additional caution is warranted when using outdated normative data, as the norms may no longer be representative of the population being evaluated.



DEMOGRAPHICALLY CORRECTED NORMS 

The second kind of normative dataset is one that includes 
only individuals from a specific category; these are sometimes 
known as within-group norms. The utility of these norms 
rests in the fact that individuals with particular demographic 
characteristics can be compared with the subgroup that best 
approximates their unique and specific demographic constel- 
lation. Demographically corrected norms will be discussed 
throughout the following sections, but in particular in the 
sections devoted to ethnicity, race, and culture. 



AGE RANGE, DEVELOPMENTAL GROWTH CURVES, 
AND AGE BANDS 

Most neuropsychological functions exhibit specific develop- 
mental growth curves when considered across age. For exam- 
ple, growth curves for different abilities will depend on 
factors such as the age at which the function displays its 
greatest growth, the extent to which the function changes 
during adulthood, and the extent of its vulnerability to age- 
related decline. Thus, a developmental growth curve for a test 
of executive function may differ markedly from that of a test 
of vocabulary in terms of slope during the period from early 
childhood to adolescence (i.e., a steeper, earlier increase for 
the vocabulary test), and during old age (i.e., a steeper, earlier 
decline for the executive function test). Figure 2-1 shows
age-related changes in performance in the different WJ-III
cognitive and achievement domains.


Figure 2-1 Developmental growth curves for abilities measured by the WJ-III, plotted
against age in years: (a) Short-Term Memory (Gsm), including working memory (MW);
(b) Comprehension-Knowledge (Gc), including general information (K0/K2), language
development/lexical knowledge (LD/VL), and listening ability (LS); (c) Long-Term
Retrieval (Glr), including naming facility (NA), associative memory (MA), and
meaningful memory (MM); (d) Reading (Grw); (e) Writing (Grw); (f) Quantitative
Reasoning/Fluid Reasoning (Gq/Gf), including quantitative reasoning (RQ). Source:
Reprinted with permission from McGrew & Woodcock, 2001.

While age correction has become an almost automatic pro- 
cess in transforming and interpreting scores in neuropsycho- 
logical assessment, the selection of age range and the range of 
each age band should not be arbitrarily determined. Rather, 
the partitioning of norms by age and the range of each age



band will depend on the shape of the growth curve for each 
test. Ideally, a well-constructed test will base its age bands on growth curves, providing shorter age bands during periods of rapid developmental change and longer age bands when the ability in question shows little developmental change.
Batteries that use uniform age bands across domains (e.g., the 
same age bands for each subtest) should demonstrate that this 



approach is based on empirical evidence of similarity of de- 
velopmental trajectories across each domain sampled. 

A working knowledge of developmental growth curves for 
functions assessed by neuropsychological tests is useful in or- 
der to evaluate norms and their derived test scores. Note that 
age effects are also often considered evidence of test validity in 
those functions that display developmental growth curves. Al- 
though most manuals for major intelligence, achievement, 
and language tests tend to include this information, growth 
curves are not routinely included in manuals for neuropsy- 
chological tests. As a result, we have endeavored to provide in- 
formation on developmental/age effects for each of the tests 
reviewed in this volume, in the Normative Data section of 
each test. 



EDUCATION 

Many normative datasets offer education-corrected norms. In 
the case of children, parental education is often used as a 
stratification variable. Although some authors provide de- 
tailed instructions on how to code educational level in order 
to minimize error variance, most often this is left to the indi- 
vidual examiner who has to deal with complex educational 
histories that require considerable judgment. Heaton et al. 
(2004) offer guidelines for use with their normative data and 
these are shown in Table 2-1. A discussion of education ef- 
fects is included in almost all test reviews in this volume. 



Table 2-1 Guidelines for Assigning Years of Education

Counted in Years of Education
    Only full years of regular academic coursework that are
      successfully completed are counted
    Regular college or university
    No matter how much time it takes to complete a diploma or
      degree, standard numbers of education years are assigned:
        High school          12
        Associate's degree   14
        Bachelor's degree    16
        Master's degree      18
        Doctoral degree      20

Not Counted
    Years in which the person obtained failing grades; partial
      years are not counted
    General Equivalency Diploma (GED)
    Vocational training

Source: Reproduced by special permission of the publisher, Psychological Assessment
Resources, Inc., 16204 North Florida Avenue, Lutz, FL 33549, from the Revised Com-
prehensive Norms for an Expanded Halstead-Reitan Battery: Demographically adjusted
Neuropsychological Norms for African American and Caucasian Adults by Robert K.
Heaton, PhD, S. Walden Miller, PhD, Michael J. Taylor, PhD, and Igor Grant, MD,
Copyright 1991, 1992, 2004 by PAR, Inc. Further reproduction is prohibited
without permission from PAR, Inc. 
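
As a rough illustration of how such guidelines might be coded in practice (a Python sketch of our own; the degree-to-years mapping follows Table 2-1, but the function itself is not part of the Heaton et al. materials), the following assigns years of education from the highest completed credential, counting only full, successfully completed years and giving no credit for a GED or vocational training:

    # Standard year values assigned to completed diplomas/degrees (Table 2-1).
    STANDARD_YEARS = {
        "high school": 12,
        "associate": 14,
        "bachelor": 16,
        "master": 18,
        "doctorate": 20,
    }

    def years_of_education(highest_degree=None, completed_full_years=0):
        """Return coded years of education.

        highest_degree: one of the STANDARD_YEARS keys, or None if no regular
            diploma or degree was earned (a GED is treated as no degree).
        completed_full_years: full years of regular academic coursework that were
            successfully completed, used when no standard degree applies.
        """
        if highest_degree is not None:
            # Completed degrees map to standard values regardless of time taken.
            return STANDARD_YEARS[highest_degree]
        return completed_full_years

    # Examples: a bachelor's degree codes as 16 no matter how long it took;
    # a person who completed 10 full years and then earned a GED codes as 10.
    print(years_of_education("bachelor"))                     # 16
    print(years_of_education(None, completed_full_years=10))  # 10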



ETHNICITY, RACE, AND CULTURE 

Differences Between Ethnicity, Race, and Culture 

The terms "ethnicity," "culture," and "race" are often used in- 
terchangeably. However, ethnicity and culture are multidi- 
mensional constructs that reflect groups characterized by 
common language, customs, heritage, or nationality, whereas 
race carries the implication of genetically based traits (Ardila 
et al., 1994; see also Gasquoine, 1999; Harris & Tulsky, 2003; 
and Tulsky et al., 2003, for further discussion). All of these 
terms, in most instances, are used when referring to minority 
groups that differ from the majority group. Test manuals that 
stratify according to ethnicity and/or race, or that provide 
within-group norms of racial or cultural groups, do not typi- 
cally provide detailed information on how minority groups 
are defined, even though the method by which race, culture, 
or ethnicity is determined may influence the composition of 
norms. Thus, minority groups defined by self-identification, 
observed physical differences, or rating scales reflecting degree 
of acculturation will necessarily differ to some extent. In addi- 
tion, there is increasing evidence that some of the variance in 
test scores predicted by "ethnicity" actually relates to level of 
acculturation, literacy, or quality of education (Harris & Tulsky, 2003; Manly et al., 1998, 2002, 2003; Shuttleworth-Edwards et al., 2004).

Rationale for Ethnically Adjusted Norms 

There are numerous studies that document lower perfor- 
mance estimates on cognitive tests in minority populations 
(e.g., Manly & Jacobs, 2001; Manly et al., 1998, 2002; Ponton
& Ardila, 1999). When scores are then used to make inferences 
about brain functioning, the result is an overestimate of 
deficits and misattribution about neuropsychological dys- 
function. For example, when normally functioning African 
Americans are assessed, a statistically high base rate of impair- 
ment has been documented using conventional cutoff scores 
(e.g., Campbell et al., 2002; Heaton et al., 2004; Patton et al.,
2003). Numerous examples can also be found with regard to 
groups such as Spanish-speaking individuals (e.g., Ardila 
et al., 1994) and people in southern Africa (Shuttleworth- 
Edwards et al., 2004). In other words, the specificity (i.e., the 
extent to which normal individuals are correctly identified) of 
many neuropsychological tests is inadequate when unadjusted 
cutoffs are used in minority and/or disadvantaged groups. 

In children, the whole issue of ethnicity and neuropsycho- 
logical assessment is relatively uncharted territory, despite 
considerable evidence from intelligence and language research 
that ethnicity significantly impacts test scores in children 
(Brooks-Gunn et al., 2003; Padilla et al., 2002; Sattler, 2001).
Children's neuropsychological norms that take into account 
additional demographic variables besides age would therefore 
be of considerable utility. 

In addition to ethical issues regarding misdiagnosis, the 
costs of false positive errors in normal individuals are obvious. 






These include adverse psychological effects on the patient, un- 
necessary treatment, and negative financial repercussions 
(e.g., Patton et al., 2003). Thus, there is a critical need for rep- 
resentative normative data for minority subgroups, as well as 
adjusted cutoffs to reflect more clinically acceptable impair- 
ment levels within subgroups (e.g., Campbell et al., 2002; 
Heaton et al., 2004; Heaton et al., 2003; Manly & Jacobs, 2001; Patton et al., 2003). The field has begun to respond to this need, as shown by an ever-increasing pool of demographically
corrected normative data, and by the recent publication of 
several neuropsychological texts devoted to cross-cultural as- 
sessment. The reader is highly encouraged to consult these 
texts for a more complete discussion of the issues (e.g., Ardila 
et al., 1994; Ferraro, 2002; Fletcher-Janzen et al., 2000; Ponton
& Leon-Carrion, 2001). 

Types of Norms for Minority Groups 

Norms that take into consideration ethnicity, race, and/or cul- 
ture typically consist of separate normative datasets that in- 
clude only specific members such as African Americans or 
Spanish speakers (e.g., Manly et al., 1998; Ardila et al., 1994; 
Heaton et al., 2004; Ponton, 2001). There are other methods
to adjust scores, including: bonus points (i.e., adding a con- 
stant to the score of all members of a subgroup so that the 
group mean is equivalent to the majority group's), separate 
cutoffs for each subgroup, and banding (i.e., treating all individ- 
uals within a specific score range as having equivalent scores 
to avoid interpretation of small score differences) (Sackett &
Wilk, 1994). With the exception of the use of subgroup-specific 
cutoffs, these techniques are not commonly used in neuro- 
psychology. 
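
As a minimal sketch of two of these adjustment methods (a Python illustration of our own, with hypothetical group means and an arbitrary band width; it is not drawn from Sackett & Wilk, 1994), the following shows bonus-point adjustment, which equates subgroup means by adding a constant, and banding, which collapses nearby scores so that small differences are not interpreted:

    def bonus_point_adjust(score, group, group_means, reference_group):
        """Add a constant so the subgroup mean equals the reference group's mean."""
        return score + (group_means[reference_group] - group_means[group])

    def band(score, band_width=5):
        """Collapse scores into bands so small score differences are not interpreted."""
        return score // band_width * band_width

    group_means = {"majority": 100, "subgroup": 92}   # hypothetical group means
    print(bonus_point_adjust(90, "subgroup", group_means, "majority"))  # 98
    print(band(93), band(97))                          # 90 95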

Of historical interest is the fact that neuropsychology's re- 
cent shift toward demographically corrected scores based on 
race/ethnicity and other variables has occurred with surpris- 
ingly little fanfare or controversy despite ongoing debate in 
other domains of psychology. For example, when race-norming 
was applied to pre-employment screening in the United States 
to increase the number of minorities chosen as job applicants, 
the result was the Civil Rights Act of 1991, which outlawed 
race-norming for applicant selection or referral (see Sackett & 
Wilk, 1994; Gottfredson, 1994; and Greenlaw & Jensen, 1996, 
for an interesting historical review of the ill-fated attempt at 
race-norming the GATB). 

Limitations of Demographically Adjusted Norms 

There are some cases where adjustment for demographic in- 
fluences might be questioned. For example, age and education 
are risk factors for dementia; removing the effects of these de- 
mographic variables might therefore remove some of the pre- 
dictive power of measures of cognitive impairment (Sliwinski 
et al., 1997, 2003). O'Connell et al. (2004) have recently re- 
ported that use of age- and education-corrected normative 
data failed to improve the diagnostic accuracy of the 3MS, a ver- 
sion of the MMSE, when screening for dementia or cognitive
impairment. Instead, they recommended that the unadjusted
normative data be used for screening purposes. In a similar
vein, Reitan and Wolfson (1995, 1996) have argued that age
and education corrections are appropriate for normal, healthy
individuals, but that they are not needed for brain-damaged
subjects. This view has been disputed, however (e.g., Lezak et
al., 2004; also see Test Selection in this volume). There is also
evidence that when "corrective norms" are applied, some
demographic influences remain, and overcorrection may occur,
resulting in score distortion for some subgroups and a risk of
increased false negatives (e.g., Fastenau, 1998).


Figure 2-2 Model of Hispanic diversity in neuropsychological assessment, depicting
subgroups defined by country of origin (Mexico, Puerto Rico, Cuba, other), language
use (Spanish, bilingual, English), and years of education (9-11, 12-15, >16). Source:
Adapted from Ponton & Leon-Carrion, 2001.

Some of the limitations of ethnically adjusted norms are 
summarized by Sattler (2001): such norms may (1) provide a basis for negative comparisons between groups, (2) create lower expectations for children from groups that differ culturally and linguistically from the majority, and (3) have little relevance outside of the specific geographic area in which they were collected. Gasquoine (1999) also argues that within-
group norming in neuropsychology has limited merit due to 
the complexities inherent in measuring ethnicity and culture; 
subgroups can be divided exponentially, limited only by 
the number of demographic variables available. Figure 2-2 
shows the various subgroups that can be derived from subdi- 
viding within the U.S. Hispanic population alone. 

One alternative to within-group norming is to directly take 
into account the influence of specific variables on test scores, 
since these factors presumably operate across ethnicity and 
culture. These include factors such as English language flu- 
ency, acculturation, length of residence in the United States, 
education, quality of education, and SES, including quality of 
the home environment, health/nutritional status, income, and 
degree and persistence of poverty (Gasquoine, 1999). There- 
fore, norms that correct for ethnicity may be correcting for 
the "wrong" variable, particularly if other variables appear to 
better account for observed differences between groups. For 
instance, differences between minority groups and the major- 
ity culture on cognitive tests may remain even after important 
variables such as number of years of education are accounted 






for (e.g., Farias et al., 2004; Shuttleworth-Edwards et al., 
2004), but procedures that take into account quality of educa- 
tion by coding for quality directly (e.g., Shuttleworth-Edwards 
et al., 2004) or using literacy corrections yield few group differ- 
ences (e.g., Byrd et al., 2004; Manly et al., 1998, 2002).

It is important to note, as well, that variables such as the 
persistence of poverty and education level/quality within and 
across minority groups cannot be fully dissociated. For in- 
stance, in some Western countries, the prevalence of continu- 
ous periods of poverty versus temporary instances of poverty 
may be much higher in some minority subgroups than in 
children from the majority culture with similar SES. It is diffi- 
cult to conceptualize how norms could be made to correct for 
these multiple factors, but multivariate corrections on large, 
varied samples, including cross-validation, might be feasible. 

Lastly, there are individuals who, despite apparent similari- 
ties such as race, differ substantially from the overall charac- 
teristics of the target normative subgroup, and who actually 
may be more closely matched to another group that does not 
share obvious racial, cultural, or ethnic characteristics. For ex- 
ample, in the United States, the terms "Hispanic" and "African 
American" are associated with a unique sociopolitical envi- 
ronment related to SES, education, and health status. How- 
ever, these terms have limited applicability to individuals who 
may share only superficial characteristics such as race or lan- 
guage (e.g., Shuttleworth-Edwards et al., 2004), but who are recent immigrants to the United States, or who are citizens of other countries. Thus, demographically corrected norms developed in the United States may have limited utility (1) for individuals who differ from the target subgroup in important aspects such as quality of education, or (2) in other countries where ethnic subgroups may have different educational/
SES correlates. However, given the state of the field, the provi- 
sion of normative data for minority groups is a step in the 
right direction, as is increased awareness on the part of practi- 
tioners and researchers that diversity is an inherent character- 
istic of human populations that needs to be reflected in our 
neuropsychological tools and practice. 



Caveats in the Clinical Use of Demographically Based
Norms for Minority Groups

As we have already alluded to, within-group norms significantly affect sensitivity and specificity with regard to tests that employ cutoffs for assigning individuals to specific groups, such as those based on diagnosis (e.g., diagnostic criteria for dementia or language disorder), or for assigning individuals to treatment (e.g., Aricept trial, language intervention). The costs of a false positive may be high in one subgroup (e.g., African American elders undergoing a dementia workup), but false negatives might be of larger consequence in another subgroup (e.g., Hispanic preschoolers in an early screening program). In the former, a false positive is associated with clearly adverse consequences (e.g., a false dementia diagnosis). However, in the latter, a false negative might be worse if it means losing access to a service that might have far-reaching benefits (e.g., a screening program that provides free language intervention, literacy materials, and parent support for at-risk inner-city preschoolers). Ideally, the decision to use adjusted cutoffs and within-group norms should be based on a full understanding of the context in which they are destined to be used, including the base rate of the particular disorder within the subgroup, the sensitivity and specificity of the measure, and the costs of false positive and false negative errors.

Importantly, with regard to the WAIS-III/WMS-III, the Psychological Corporation explicitly states that demographically adjusted scores are not intended for use in psychoeducational assessment, determination of intellectual deficiency, vocational assessment, or any other context where the goal is to determine absolute functional level (IQ or memory) in comparison to the general population. Rather, demographically adjusted scores are best used for neurodiagnostic assessment in order to minimize the impact of confounding variables on the diagnosis of cognitive impairment. That is, they should be used to infer strengths and weaknesses relative to a presumed premorbid standard (The Psychological Corporation, 2002). Therefore, neuropsychologists need to balance the risks and benefits of using within-group norms, and use them with a full understanding of their implications and the situations in which they are most appropriate.

Finally, eliminating score differences across demographic groups with demographic corrections or within-group norms will not adjust the current disparities in life circumstances and outcomes that are at times reflected in test scores, nor will it adjust for the relative lack of neuropsychological models that include sociocultural factors (e.g., Campbell et al., 2002; Perez-Arce, 1999). Above all else, as noted by Ardila et al. (1994), "the clinical neuropsychologist must entertain the notion that human diversity does not translate into human deficiency" (p. 5).


PRACTICAL CONSIDERATIONS IN THE USE
OF DEMOGRAPHICALLY CORRECTED NORMS



By necessity, neuropsychologists often adopt a compromise 
between using population-wide norms and within-group 
norms. Almost all tests provide age-based scores, but the 
availability of norms based on education and minority status 
(ethnicity/race/culture) varies greatly across tests. As a result, practicing clinicians are only able to demographically correct some scores but not others, unless a uniform
battery such as the WAIS-III/WMS-III or NAB is employed. 
The problem is that if only some scores are adjusted for mod- 
erator variables (e.g., education), and then compared with 
non-corrected scores, false attributions of impairment may 
occur (Kalechstein et al., 1998). 

As a practical solution, given equivalent sampling quality, 
the a priori selection of the normative set should be primarily 
guided by the particular moderating variable that is most 
likely to affect the classification of test performance (Kalech- 
stein et al., 1998). For example, on tests of psychomotor 






speed, it would be preferable to match on the basis of age 
rather than education, if a choice was required. In contrast, on 
tests of verbal achievement, education would likely be the pri- 
mary selection criterion. 
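
As a simple illustration of why the choice of moderator matters (a Python sketch with hypothetical normative means and SDs; none of the values are from a published norm set), the following shows how the same raw score can fall in the impaired range against one normative set but within normal limits against another:

    from statistics import NormalDist

    NORMS = {
        # Hypothetical normative sets for the same test.
        "age-matched":       {"mean": 48, "sd": 10},
        "education-matched": {"mean": 42, "sd": 9},
    }

    def classify(raw_score, norm, impairment_z=-1.5):
        # Standardize against the chosen normative set and apply a fixed z cutoff.
        z = (raw_score - NORMS[norm]["mean"]) / NORMS[norm]["sd"]
        percentile = NormalDist().cdf(z) * 100
        label = "impaired range" if z <= impairment_z else "within normal limits"
        return z, percentile, label

    for norm in NORMS:
        z, pct, label = classify(33, norm)
        print(f"{norm}: z = {z:.2f}, percentile = {pct:.1f}, {label}")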

In each test review in this volume, we have included infor- 
mation on demographic influences on test performance. With 
this information, users can determine for themselves whether 
demographic corrections are necessary, and whether test 
scores need to be interpreted within a demographically rele- 
vant context. 






EXCLUSIONARY CRITERIA AND THE PROBLEM 
OF THE TRUNCATED DISTRIBUTION 

Although many test developers make specific efforts to ex- 
clude individuals with disabilities or deficits from normative 
samples, there is a strong argument for including impaired 
individuals within normative sets, in a proportion that ap- 
proximates that found in the general population. The reason 
is that excluding individuals with disabilities from normative 
sets actually distorts the normal distribution of scores (Mc- 
Fadden, 1996). When the "tail" of the population is truncated 
by removing the scores of individuals with disabilities or dis- 
ease, this forces lower-functioning but healthy individuals in 
the normative group to then represent the lowest perfor- 
mance rankings in the population and to substitute for the 
missing lower tail. When this distribution is then used to de- 
rive a norm-derived score for a healthy but low functioning 
person, the result is a significant overestimation of deficits 
(see PPVT-III for additional discussion of this issue).
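
As a simple demonstration of this distortion (a Python simulation of our own, with hypothetical population parameters; it is not taken from McFadden, 1996), the following draws a large normal sample, removes the low tail, and shows that the same raw score receives a more extreme z score against the truncated norms than against the full distribution:

    import random
    from statistics import mean, stdev

    random.seed(0)
    population = [random.gauss(100, 15) for _ in range(100_000)]  # full population
    truncated = [x for x in population if x > 78]                 # low tail excluded

    def z(raw, sample):
        # Standardize a raw score against a given normative sample.
        return (raw - mean(sample)) / stdev(sample)

    raw = 80  # a healthy but low-functioning examinee
    print(f"z against full norms:      {z(raw, population):+.2f}")
    print(f"z against truncated norms: {z(raw, truncated):+.2f}")  # more extreme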

In particular, using exclusionary criteria in older individu- 
als based on health status may disproportionately restrict nor- 
mative samples because of the increased prevalence of medical 
or other health conditions in this age range. The result is a 
"normal" sample that includes only the upper ranges of scores 
for older individuals, and which will disproportionately ren- 
der impairment scores for low-functioning but typically aging 
elders (Kalechstein et al., 1998).

Hebben and Milberg (2002) discuss additional problems 
specific to constructing appropriate norms for use in the neu- 
ropsychological examination of the elderly patient. These in- 
clude cohort effects, the presence of non-neurological illness 
in many normal elderly persons, the increased likelihood of 
persons with early dementia in older age groups, and floor ef- 
fects. The inclusion of older adults in the terminal stages of 
decline associated with impending mortality also needs to be 
evaluated since such individuals are likely to show cognitive 
dysfunction (Macdonald, 2004). 

For individual patient evaluations, there is the potential for 
erroneous interpretations of strengths and weaknesses and 
inconsistencies across testing domains if test scores derived 
from truncated distributions are unknowingly compared with 
scores derived from full distributions (McFadden, 1996). Thus, 
a well-constructed normative set would indicate the number of individuals with disabilities in the sample (typically in a proportion that parallels that found in the general population), and indicate how the sample was screened for conditions affecting performance on the test. Users would then need to adjust interpretations when comparing tests with full and truncated distributions. In most cases, we have indicated screening criteria for the tests reviewed in this volume so that neuropsychologists can effectively make these comparisons. However, further research on the comparability of tests with different inclusion and exclusion criteria regarding individuals with disabilities or health conditions is needed.



GEOGRAPHIC LOCATION 

Test manuals should always include information on where 
normative data were collected because of regional variation in 
demographic factors that can affect test scores, such as SES, 
education, and cultural background. Most large test batteries 
developed in the United States sample data from four or five 
major regions. These include the Northeast, South, West, and 
North Central, with or without the additional category of Southwest. In most cases, both urban and rural individuals are included in order to capture regional variations in test performance. However, not all tests include this addi-
tional stratification variable. 

On a much larger scale, norms may also differ to a signifi- 
cant extent depending on the country in which they were col- 
lected. Differences occur on two levels: content differences 
due to different exposure or culturally specific responses to 
individual items, and ability differences related to socioeco- 
nomic, sociodemographic, or other differences between coun- 
tries. For instance, there are significant demographic 
differences between English-speaking nations such that peo- 
ple from Britain and individuals from the United States per- 
form differently on common language tests such as the 
PPVT-III and WTAR, even when words are "translated" to 
reflect national dialect. Similarly, major IQ tests are typically 
re-normed in other countries because of cultural and socio- 
economic differences that translate into measurable score dif- 
ferences. For example, Canadian individuals tend to have 
higher IQ scores when U.S.-based IQ test versions are used;
this may or may not be related to educational differences, but 
also likely reflects the different sociodemographic composi- 
tion of the two countries (e.g., see WAIS-III, Comment, in this 
volume). Similar findings have been reported for Australian 
children with regard to U.S. norms on Wechsler tests (e.g., 
Kamieniecki et al., 2002). Regardless of the underlying factors, 
these differences between countries necessitate re-normings 
for all the major Wechsler tests, including the WPPSI-III, 
WISC-IV, WAIS-III and WIAT-II. A detailed review of WAIS- 
based differences across nations was recently provided by 
Shuttleworth-Edwards et al. (2004). 

Because the vast majority of English-language neuropsy- 
chological tests are normed in the United States, additional 
research on the representativeness of different neuropsycho- 
logical norms across regions and across nations would be 






helpful. Caution is warranted in using norms collected in other regions because of variation in important demographic factors (Patton et al., 2003). Local norms are therefore an asset when available. With regard to cross-test
comparisons, clinicians need to be aware of the origins of the 
normative data they use, and keep any possible demographic 
differences in mind when comparing scores across tests 
normed in different countries or geographic regions. 

A FINAL COMMENT ON SAMPLES 
OF CONVENIENCE 

A well-constructed normative data set should consist of an
actual standardization sample rather than a sample of conve- 
nience. This means that standard procedures should have 
been followed in the recruitment of subjects, test administra- 
tion, scoring, and data collection, including assessing exam- 
iners for consistency in administration and scoring, data 
error checks, and the like. Although this point seems obvious, 
the number of neuropsychological tests that rely on samples 
of convenience is surprising. These samples consist of data 
originally collected for another purpose, such as research, or 
under non- or semi-standardized conditions, including tests 
with aggregate normative data collected over several years, 
control data collected as part of several research projects, pi- 
lot data, partial standardization data, and even data collected 
as part of neuropsychological workshops for practitioners. As 
much as possible, we have attempted to identify the type and 
source of the normative data included in this volume so that 
clinicians can assess the conditions under which norms were 
collected, and judge the quality of the norming effort for 
themselves. 



REFERENCES 

American Educational Research Association, American Psychological 
Association, & National Council on Measurement in Education. 
(1999). Standards for educational and psychological testing. Wash- 
ington, DC: American Psychological Association. 

Ardila, A., Rosselli, M., & Puente, A. E. (1994). Neuropsychological 
evaluation of the Spanish speaker. New York: Plenum Press. 

Baron, I. S. (2004). Neuropsychological evaluation of the child. New 
York: Oxford University Press. 

Bengston, M. L., Mittenberg, W., Schneider, B., & Seller, A. (1996). An 
assessment of Halstead-Reitan test score changes over 20 years. 
[Abstract]. Archives of Clinical Neuropsychology, 11, 386. 

Bocerean, C., Fisher, J.-P., & Flieller, A. (2003). Long-term compari-
son (1921-2001) of numerical knowledge in three to five-and-a- 
half year-old children. European Journal of Psychology of 
Education, XVIII, 4, 403-424. 

Brooks-Gunn, J., Klebanov, P. K., Smith, J., Duncan, G. J., & Lee, K. 
(2003). The Black- White test score gap in young children: Contri- 
butions of test and family characteristics. Applied Developmental 
Science, 7(4), 239-252. 

Byrd, D. E., Touradji, P., Tang, M.-X., & Manly, J. J. (2004). Cancella- 
tion test performance in African American, Hispanic, and White 



elderly. Journal of the International Neuropsychological Society, 10, 
401-411. 

Campbell, A. L., Ocampo, C., Rorie, K. D., Lewis, S., Combs, S., Ford-
Booker, P., Briscoe, J., Lewis-Jack, O., Brown, A., Wood, D., Den- 
nis, G., Weir, R., & Hastings, A. (2002). Caveats in the 
neuropsychological assessment of African Americans. Journal of 
the National Medical Association, 94(7), 591-601. 

Farias, S. T., Mungas, D., Reed, B., Haan, M. N., & Jagust, W. J. (2004).
Everyday functioning in relation to cognitive functioning and neu-
roimaging in community-dwelling Hispanic and non-Hispanic
older adults. Journal of the International Neuropsychological Society,
10, 342-354.

Fastenau, P. S. (1998). Validity of regression-based norms: An
empirical test of the comprehensive norms with older adults. 
Journal of Clinical and Experimental Neuropsychology, 20(6), 
906-916. 

Fastenau, P. S., & Adams, K. M. (1996a). Heaton, Grant, and Matthews'
Comprehensive Norms: An overzealous attempt. Journal of Clinical 
and Experimental Neuropsychology, 18(3), 444-448. 

Ferraro, F. R. (2002). Minority and cross-cultural aspects of neuropsy- 
chological assessment. Lisse, Netherlands: Swets & Zeitlinger Pub- 
lishers. 

Fletcher-Janzen, E., Strickland, T. L., & Reynolds, C.R. (2000). Hand- 
book of cross-cultural neuropsychology. New York: Springer. 

Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 
1978. Psychological Bulletin, 95, 29-51. 

Flynn, J. R. (2000). The hidden history of IQ and special education: 
Can the problems be solved? Psychology, Public Policy and Law, 
6(1), 191-198. 

Gasquoine, P. G. (1999). Variables moderating cultural and eth- 
nic differences in neuropsychological assessment: The case of 
Hispanic Americans. The Clinical Neuropsychologist, 13(3),
376-383. 

Gottfredson, L. S. (1994). The science and politics of race-norming. 
American Psychologist, 49(11), 955-963. 

Greenlaw, P. S., & Jensen, S. S. (1996). Race-norming and the Civil
Rights Act of 1991. Public Personnel Management, 25(1), 13-24. 

Harris, J. C., & Tulsky, D. S. (2003). Assessment of the non-native En-
glish speaker: Assimilating history and research findings to guide 
clinical practice. In D. S. Tulsky, D. H. Saklofske, G. J. Chelune, 
R. K. Heaton, R. Ivnik, R. Bornstein, A. Prifitera, & M. F. Ledbetter 
(Eds.), Clinical interpretation of the WAIS-III and WMS-III 
(pp. 343-390). New York: Academic Press. 

Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised 
Comprehensive Norms for an Expanded Halstead-Reitan Battery: 
Demographically adjusted neuropsychological norms for African 
American and Caucasian adults. Lutz, FL: PAR. 

Heaton, R. K., Taylor, M. J., & Manly, J. (2003). Demographic effects 
and use of demographically corrected norms with the WAIS-III 
and WMS-III. In D. S. Tulsky, D. H. Saklofske, G. J. Chelune, 
R. K. Heaton, R. Ivnik, R. Bornstein, A. Prifitera, & M. F. Ledbetter 
(Eds.), Clinical interpretation of the WAIS-III and WMS-III 
(pp. 181-210). New York: Academic Press. 

Hebben, N., & Milberg, W. (2002). Essentials of neuropsychological as- 
sessment. New York: John Wiley & Sons. 

Kalechstein, A. D., van Gorp, W. G., & Rapport, L. J. (1998). Variabil- 
ity in clinical classification of raw test scores across normative 
data sets. The Clinical Neuropsychologist, 12, 339-347. 

Kamieniecki, G. W., & Lynd-Stevenson, R. M. (2002). Is it appropri- 
ate to use United States norms to assess the "intelligence" of Aus- 
tralian children? Australian Journal of Psychology, 54(2), 67-78. 






Kanaya, T., Ceci, S. J., & Scullin, M. H. (2003a). The rise and fall of IQ 
in special ed: Historical trends and their implications. Journal of 
School Psychology, 41, 453-465. 

Kanaya, T., Scullin, M. H., & Ceci, S. J. (2003b). The Flynn effect and
U.S. policies: The impact of rising IQ scores on American society
via mental retardation diagnoses. American Psychologist, 58(10),
887-890. 

Lezak, M. D., Howieson, D. B., Loring, D. W., Hannay, H. J., & Fis-
cher, J. S. (2004). Neuropsychological assessment (4th ed.). New 
York: Oxford University Press. 

Llorente, A. M., Ponton, M. O., Taussig, I. M., & Satz, P. (1999). Pat- 
terns of American immigration and their influence on the acqui- 
sition of neuropsychological norms for Hispanics. Archives of 
Clinical Neuropsychology, 14(7), 603-614. 

Macdonald, S. W. S. (2004). Longitudinal profiles of terminal decline: 
Associations between cognitive decline, age, time to death, and 
cause of death. PhD dissertation, University of Victoria. 

Manly, J. J., & Jacobs, D. M. (2001). Future directions in neuropsy- 
chological assessment with African Americans. In F. R. Ferraro 
(Ed.), Minority and cross-cultural aspects of neuropsychological as- 
sessment (pp. 79-96). Heereweg, Lisse, The Netherlands: Swets & 
Zeitlinger. 

Manly, J. J., Jacobs, D. M., Touradji, P., Small, S. A., & Stern, Y. (2002). 
Reading level attenuates differences in neuropsychological test 
performance between African American and White elders. Jour- 
nal of the International Neuropsychological Society, 8, 341-348. 

Manly, J. J., Miller, W., Heaton, R. K., Byrd, D., Reilly, J., Velasquez, 
R. J., Saccuzzo, D. P., Grant, I., and the HIV Neurobehavioral Re- 
search Center (HNRC) Group. (1998). The effect of African- 
American acculturation on neuropsychological test performance 
in normal and HIV-positive individuals. Journal of the Interna- 
tional Neuropsychological Society, 4, 291-302. 

Manly, J. J., Touradji, P., Tang, M. X., & Stern, Y. (2003). Literacy and
memory decline among ethnically diverse elders. Journal of Clini- 
cal and Experimental Neuropsychology, 25(5), 680-690. 

McFadden, T. U. (1996). Creating language impairments in typically
achieving children: The pitfalls of "normal" normative sampling. 
Language, Speech, and Hearing Services in Schools, 27, 3-9. 

McGrew, K. S., & Woodcock, R. W. (2001). Woodcock-Johnson III 
technical manual. Itasca, IL: Riverside Publishing. 

Mitrushina, M. N., Boone, K. B., Razani, J., & D'Elia, L. F. (2005).
Handbook of normative data for neuropsychological assessment 
(2nd ed.). New York: Oxford University Press. 

Mitrushina, M. N., Boone, K. B., & D'Elia, L. F. (1999). Handbook of
normative data for neuropsychological assessment. New York: Ox- 
ford University Press. 

Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody,
N., Ceci, S. J., Halpern, D. F., Loehlin, J. C., Perloff, R., Sternberg,
R. J., & Urbina, S. (1996). Intelligence: Knowns and unknowns.
American Psychologist, 51(2), 77-101.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd
ed.). New York: McGraw-Hill. 

O'Connell, M. E., Tuokko, H., Graves, R. E., & Kadlec, H. (2004). 
Correcting the 3MS for bias does not improve accuracy when 



screening for cognitive impairment or dementia. Journal of Clini- 
cal and Experimental Neuropsychology, 26, 970-980. 

Padilla, Y. C., Boardman, J. D., Hummer, R. A., & Espitia, M. (2002).
Is the Mexican American epidemiologic paradox advantage at 
birth maintained through early childhood? Social Forces, 80(3), 
1101-1123. 

Patton, D. E., Duff, K., Schoenberg, M. R., Mold, J., Scott, J. C., &
Adams, R. L. (2003). Performance of cognitively normal African 
Americans on the RBANS in community dwelling older adults. 
The Clinical Neuropsychologist, 17(4), 515-530. 

Perez-Arce, P. (1999). The influence of culture on cognition. Archives
of Clinical Neuropsychology, 14(7), 581-592. 

Ponton, M. O. (2001). Research and assessment issues with Hispanic 
populations. In M. O. Ponton & J. Leon-Carrion (Eds.), Neu-
ropsychology and the Hispanic patient (pp. 39-58). Mahwah, NJ: 
Lawrence Erlbaum Associates. 

Ponton, M. O., & Ardila, A. (1999). The future of neuropsychology 
with Hispanic populations in the United States. Archives of Clini- 
cal Neuropsychology, 14(7), 565-580. 

Ponton, M. O., & Leon-Carrion, J. (2001). Neuropsychology and the 
Hispanic patient. Mahwah, NJ: Lawrence Erlbaum Associates, 
Publishers. 

The Psychological Corporation. (2002). WAIS-III/WMS-III Updated 
Technical Manual. San Antonio, TX: Author. 

Reitan, R. M., & Wolfson, D. (1995). Influence of age and education 
on neuropsychological test results. The Clinical Neuropsychologist, 
9, 151-158. 

Reitan, R. M., & Wolfson, D. (1996). The influence of age and educa- 
tion on the neuropsychological test performance of older chil- 
dren. Child Neuropsychology, 1, 165-169. 

Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other
forms of score adjustment in preemployment testing. American 
Psychologist, 49(11), 929-954. 

Sattler, J. M. (2001) Assessment of children: Cognitive applications (4th 
ed.). San Diego, CA: J. M. Sattler. 

Shuttleworth-Edwards, A. B., Kemp, R. D., Rust, A. L., Muirhead, J. G. 
L., Hartman, N. P., & Radloff, S. E. (2004). Cross-cultural effects 
on IQ test performance: A review and preliminary normative in- 
dications on WAIS-III test performance. Journal of Clinical and
Experimental Neuropsychology, 26(7), 903-920.

Sliwinski, M., Buschke, H., Stewart, W. F., Masur, D., & Lipton, R. B.
(1997). The effect of dementia risk factors on comparative and 
diagnostic selective reminding norms. Journal of the International 
Neuropsychological Society, 3, 317-326. 

Sliwinski, M., Lipton, R., Buschke, H., & Wasylyshyn, C. (2003). Op- 
timizing cognitive test norms for detection. In R. Petersen (Ed.), 
Mild cognitive impairment: Aging to Alzheimer's disease. New York:
Oxford University Press. 

Truscott, S. D., & Frank, A. J. (2001). Does the Flynn effect affect IQ 
scores of students classified as LD? Journal of School Psychology, 
39(4), 319-334. 

Tulsky, D. S., Saklofske, D. H., Chelune, G. J., Heaton, R. K., Ivnik, R., 
Bornstein, R., Prifitera, A., & Ledbetter, M. F. (2003). Clinical inter-
pretation of the WAIS-III and WMS-III. New York: Academic Press.



History Taking 



The interview is one of the four main pillars of the assessment 
process, along with formal tests, observations, and informal 
test procedures (Sattler, 2002). The patient's account of his or 
her prior and present life circumstances, the current problem, 
and his or her behavior/style during the initial interview can 
provide a wealth of information regarding the presence, na- 
ture, and impact of neuropsychological disturbances. Conse- 
quently, taking a detailed history is essential to the evaluation 
and often invaluable since it helps to guide the test selection 
process, allows for interpretation of test data within a suitable 
context, and may yield important clues to a correct diagnosis, 
to the effects of the disorder on daily life, and to decisions re- 
garding rehabilitative interventions. In fact, the interview and 
case history will determine whether it is even possible to pur- 
sue formal testing (Vanderploeg, 2000). If formal assessment 
is possible, the history helps to determine the range of pro- 
cesses to be examined, the complexity of tests to be used, the 
accommodations that may be necessary given the patient's 
disabilities, and the level of cooperation and insight of the pa- 
tient. In neuropsychological assessment, test scores take on di- 
agnostic or practical significance only when evaluated in the 
context of the individual's personal circumstances, academic 
or vocational accomplishments, and behavior during the in- 
terview (Lezak et al., 2004). 

Competent history taking is a skill that requires, in addi- 
tion to a broad knowledge of neuropsychology and a wide ex- 
periential base, an awareness of the interactions that may 
occur in interview situations. Often patients are tense and 
concerned about their symptoms. They are often seeking an evaluation in spite of their fears about what they will learn
about themselves or their family member. The clinician's job 
is to put the client at ease and convey that the assessment is a 
collaborative venture to determine the presence, nature, 
severity, and impact of the problem (see also Chapter 4, Test 
Selection, Test Administration, and Preparation of the Patient).
Yet, as Sattler (2002) points out, the clinical interview is not 
an ordinary conversation; nor is it a psychotherapeutic ses- 
sion. Rather, it involves an interpersonal interaction that has a 



specific goal, with formal, clearly defined roles and a set of 
norms governing the interaction. Thus, the interview is typi- 
cally a formally arranged meeting where the patient is obliged 
to attend and respond to questions posed by the examiner. 
The interviewer sets the agenda, which covers specific topics 
and directs the interaction, using probing questions to obtain 
detailed and accurate information. Further, the examiner fol- 
lows guidelines concerning confidentiality and privileged 
communication. 

Although interviews can be structured or unstructured, 
history taking typically follows a semistructured, flexible for- 
mat where referring questions are clarified and the following 
information is obtained (see also Lezak et al., 2004; Stowe, 
1998; Vanderploeg, 2000): 

1. basic descriptive data including age, marital status, place 
of birth, etc.; 

2. developmental history including early risk factors and 
deviations from normal development; 

3. social history including educational level/achievement, 
vocational history, family/personal relationships, and 
confrontations with the legal system;

4. relevant past medical history including alcohol/drug/
medication use, exposure to toxins, and relevant family 
history; 

5. current medical status including description of the 
illness (nature of onset; duration; physical, intellectual, 
and emotional/behavioral changes), and 
medication/treatment regimens (including 
compliance); and 

6. the effect of the disorder on daily life, aspirations, and 
personal relations. 

In addition, with a view toward future management and 
rehabilitation, the clinician should obtain information on the 
coping styles and compensatory techniques that the patient 
uses to negotiate impairments and reduce or solve problems. 
Because compensatory strategies rely on functions that are rel- 
atively spared, the clinician should also identify the patient's 









resources and strengths. At the end of the history, the clinician 
may wish to go over the material with the patient to ensure 
that the details and chronology are correct and to check for 
additional information or answer questions/concerns that the 
patient may have. The examiner should also clarify the cir- 
cumstances surrounding the examination (e.g., medical-legal 
context) and what the patient believes may be gained or lost 
from the examination, since this may affect test performance. 
Informal test questions about a patient's orientation and 
memory for current events, TV shows, local politics, or geog- 
raphy may be included in the interview. Ideally, the examiner 
should proceed in an orderly fashion, finishing one area be- 
fore going on to the next. However, the examiner must be 
flexible and prepared to deal with unanticipated findings. 

The length of the interview varies but often lasts between 30 and 90 minutes. If the clinical focus is a young child, the parent(s) or caregiver is typically interviewed alone since the
child's presence may compromise the data to be collected 
(Sattler, 2002) as the parent/caregiver may feel uncomfortable 
disclosing concerns. In addition, small children can be dis- 
tracting. 



FRAMING OF QUESTIONS 

The way in which questions are framed will determine the in- 
formation elicited during the assessment. Open-ended ques- 
tions are useful in discovering the patient's point of view and 
emotional state. More-focused questions are better at eliciting 
specific information and help to speed the pace of the assess- 
ment. Both have their place in the assessment. There are, how- 
ever, a number of question types that should be avoided (e.g., 
yes/no questions; double-barreled questions; long, multiple 
questions; coercive, accusatory, or embarrassing questions), 
since they may not elicit the desired information and may cre- 
ate a climate of interrogation and defensiveness (Sattler, 
2002). 



OBSERVATIONS OF BEHAVIOR 

The content of the interview is important; however, the pa- 
tient's behavior and style are also noteworthy. During the in- 
terview, additional, diagnostically valuable information can 
be gleaned from (a) level of consciousness, (b) general appear- 
ance (e.g., eye contact, modulation of face and voice, personal 
hygiene, habits of dress), (c) motor activity (e.g., hemiplegia, 
tics, tenseness, hyper- or hypokinesia), (d) mood, (e) degree 
of cooperation, and (f) abnormalities in language, prosody,
thinking, judgment, or memory. 

Because memory is unreliable, it is important to take care- 
ful notes during the interview. Patients are usually forgiving of 
examiners' writing rates. However, the note taking should not 
be so slow as to interfere with rapport. Audiotapes can also be 
used; however, the examiner must obtain written consent to 
use an audio recorder, and it should be remembered that 



audio recording can impact neuropsychological test perfor- 
mance. Constantinou et al. (2002) found that in the presence 
of an audio recorder the performance of participants on 
memory tests declined (see Chapter 4 on Test Selection for 
additional discussion of the presence of observers and audio- 
video recordings). Clinicians who see clients for repeat as- 
sessments may also want to obtain consent for a photograph 
for the file. 



REVIEW OF RECORDS 

Often, the patient's account will have to be supplemented 
by information from other sources (e.g., informants, docu- 
ments), since a patient's statements may prove misleading 
with regard to the presence/absence of events and symptoms, 
the gravity of symptoms, and the time course of their evolu- 
tion. For example, patients with a memory disorder or who 
lack insight are likely to be poor historians. Similarly, chil- 
dren may not be reliable informants. In forensic matters in 
particular, it is important to verify the history given by the 
patient by obtaining and reviewing records. Distortion of 
past performance may be linked to prospects for compensa- 
tion, and the clinician runs the risk of setting the normal 
reference range too high, causing false diagnoses of neuro- 
logical conditions (Greiffenstein & Baker, 2003; Greiffenstein 
et al., 2002). Reynolds (1998) provides a listing of various 
records that the clinician should consider for review, shown 
in Table 3-1. The relevance of these records varies, depend- 
ing on the case. 



Table 3-1 Records to Consider in the Evaluation of a
Patient's History

Educational records        Including primary, secondary, and postsecondary
                           records, results of any standardized testing, and
                           records of special education
Employment records         Including evaluations, workers' compensation
                           claims, and results of any personnel testing
Legal records              Including criminal and civil litigation
Medical records            Both pre- and postmorbid
Mental health records      Including any prior contact with a mental health
                           professional, whether psychiatrist, psychologist,
                           social worker, counselor, etc., including therapy
                           notes, history and intake forms, test results,
                           reports, and protocols
Substance abuse            Such records may be more difficult to obtain
  treatment records        because of federal privacy laws
Military records           Including results of ASVAB, GATB, etc.

Source: Adapted from Reynolds, 1998.






SYMPTOM REPORT/RESPONSE STYLES 

It should be noted that patients may falsely attribute preexist- 
ing symptoms to an accident or insult, report a higher than ac- 
tual level of premorbid function, catastrophize or over-report 
current symptoms, or have difficulty reporting symptoms pre- 
cisely, without intending to deceive (Greiffenstein et al., 2002; 
Slick et al., 1999). In addition to motivational issues (e.g., Youngjohn et al., 1995), a number of non-neurologic factors
have been shown to influence self-report of premorbid status 
and subjective complaints of physical, cognitive, and psycho- 
logical functioning. These include emotional status and nega- 
tive affectivity (e.g., Fox et al., 1995b; Gunstad & Suhr, 2001, 2002; Sawchyn et al., 2000; Seidenberg et al., 1994), chronic pain (Iverson & McCracken, 1997), suggestibility or expectations regarding outcome (Ferguson et al., 1999; Mittenberg
et al., 1992; Suhr & Gunstad, 2002, 2005), and response biases 
shaped by the social context (Greiffenstein et al., 2002). There 
is also evidence that following any negative life event (be it ac- 
cident or illness, neurological or not), individuals may attrib- 
ute all symptoms, both current and preexisting, to that 
negative event (Gunstad & Suhr, 2001, 2002). 

Patients also have certain ways of answering questions, and 
some of these response styles or sets may affect the accuracy 
of the information. For example, children with developmental 
disabilities may be deficient in assertiveness skills and thus 
may be prone to acquiesce by answering "yes" to yes/no ques- 
tions (Horton & Kochurka, 1995). Other pitfalls in obtaining 
information from certain patients include position bias (e.g., 
the tendency to choose the first answer in a list), misunder- 
standing negatively worded items (prominent in young chil- 
dren), and difficulty with time sense (that is, relating past 
symptoms to current health status; Dodson, 1994). 

These considerations underscore the fact that self-report 
of particular symptoms or symptom clusters can suggest but 
should not be viewed as diagnostic of specific disorders. For 
example, complaints of headache, fatigue, and irritability may 
be found in such diverse conditions as head injury, chronic 
fatigue syndrome, gastrointestinal disorder, and the common 
cold (Binder, 1997; Gunstad & Suhr, 2002). Further, base rates 
of reported postconcussive symptoms (PCSs) have been 
found to be similar in comparisons of injured and noninjured 
individuals using symptom checklists (e.g., Ferguson et al., 
1999; Fox et al., 1995a, 1995b; Gouvier et al., 1988; Gunstad & Suhr, 2001; Paniak et al., 2002), suggesting that self-reports of PCSs are not unique to that disorder. In a similar vein, com-
plaints of memory disturbance are a consistent feature of 
early AD but are also evident in other disorders such as de- 
pression (American Psychiatric Association, 1994). 



QUESTIONNAIRES 

A good interview takes careful planning. It requires that the 
examiner learn as much as possible about the person and the 
purpose(s) of the assessment prior to the interview (e.g., by 



reviewing medical reports). Questionnaires have also been de- 
veloped to provide documentation of information routinely 
collected during the interview (e.g., Baron, 2004; Baron et al., 
1995; Dougherty & Schinka, 1989a, 1989b; Goldstein & Gold- 
stein, 1999; Sattler, 2002; Schinka, 1989). Clinicians may con- 
struct their own forms or refer to ones that we have 
developed. We use two questionnaires, one designed for adult 
patients (Figure 3-1) and the other for use with parents when 
children are clients (Figure 3-2). Note that none of the forms 
(our own or others) has data on validity or reliability. Accord- 
ingly, the questionnaires should be viewed as complementary 
to and not as substitutes for the interview. We emphasize that 
they are not diagnostic instruments. The most efficient use 
of these instruments is to have the client or caregiver com- 
plete them prior to the meeting. In this way, the interview can 
clarify or focus on the major concerns of the individual and 
need not touch on minor details already provided in the ques- 
tionnaire. Such a complementary process increases the confi- 
dence in the information obtained and ensures that no topics 
are overlooked. 



REFERENCES 

American Psychiatric Association. (1994). Diagnostic and Statistical 
Manual of Mental Disorders (4th ed.). Washington, DC: Author. 

Baron, I. S. (2004). Neuropsychological evaluation of the child. New 
York: Oxford University Press. 

Baron, I. S., Fennell, E. B., & Voeller, K. S. (1995). Pediatric neuropsy- 
chology in the medical setting. New York: Oxford University Press. 

Binder, L. M. (1997). A review of mild head trauma. Part II: Clinical 
implications. Journal of Clinical and Experimental Neuropsychol- 
ogy, 19, 432-457. 

Constantinou, M., Ashendorf, L., & McCaffrey, R. J. (2002). When the
third party observer of a neuropsychological examination is an 
audio-recorder. The Clinical Neuropsychologist, 16, 407-412. 

Dodson, W. E. (1994). Quality of life measurement in children with 
epilepsy. In M. R. Trimble & W. E. Dodson (Eds.), Epilepsy and 
quality of life (pp. 217-226). New York: Raven Press. 

Dougherty, E., & Schinka, J. A. (1989a). Developmental History
Checklist. Odessa, FL: PAR. 

Dougherty, E., & Schinka, J. A. (1989b). Personal History Checklist — 
Adolescent. Odessa, FL: PAR. 

Ferguson, R. J., Mittenberg, W., Barone, D. F., & Schneider, B. (1999).
Postconcussion syndrome following sports-related head injury: 
expectation as etiology. Neuropsychology, 13, 582-589. 

Fox, D. D., Lees-Haley, P. R., Earnest, K., & Dolezal-Wood, S. (1995a). 
Base rates of postconcussive symptoms in health maintenance or- 
ganization patients and controls. Neuropsychology, 9, 606-611.

Fox, D. D., Lees-Haley, P. R., Earnest, K., & Dolezal-Wood, S. (1995b). 
Post-concussive symptoms: base rates and etiology in psychiatric 
patients. Clinical Neuropsychologist, 9, 89-92. 

Goldstein, A., & Goldstein, M. (1999). Childhood History Form. Salt 
Lake City: Neurology, Learning and Behavior Center. 

Gouvier, W. D., Uddo-Crane, M., & Brown, L. M. (1988). Base rates 
of post-concussional symptoms. Archives of Clinical Neuropsy- 
chology, 3, 273-278. 

Greiffenstein, M. F., & Baker, W. J. (2003). Premorbid clues? Prein- 
jury scholastic performance and present neuropsychological 






functioning in late postconcussion syndrome. The Clinical Neu- 
ropsychologist, 17, 561-573. 

Greiffenstein, M. F., Baker, W. J., & Johnson-Greene, D. (2002). Actual
versus self-reported scholastic achievement of litigating postcon- 
cussion and severe closed head injury claimants. Psychological As- 
sessment, 14, 202-208. 

Gunstad, J., & Suhr, J. A. (2001). Expectation as etiology versus the 
good old days: Postconcussion syndrome reporting in athletes, 
headache sufferers, and depressed individuals. Journal of the In- 
ternational Neuropsychological Society, 7, 323-333. 

Gunstad, J., & Suhr, J. A. (2002). Perception of illness: nonspecificity 
of postconcussion syndrome symptom expectation. Journal of the 
International Neuropsychological Society, 8, 37-47. 

Horton, C. B., & Kochurka, K. A. (1995). The assessment of children 
with disabilities who report sexual abuse: A special look at those 
most vulnerable. In T. Ney (Ed.), True and false allegations of child 
sexual abuse: Assessment and case management (pp. 275-289). 
New York: Brunner/Mazel. 

Iverson, G., & McCracken, L. (1997). "Postconcussive" symptoms in
persons with chronic pain. Brain Injury, 10, 783-790. 

Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsycho- 
logical assessment (4th ed.). New York: Oxford University Press. 

Mittenberg, W., Diguilio, D. V., Perrin, S., & Bass, A. E. (1992). Symp- 
toms following mild head injury: Expectation as aetiology. Jour- 
nal of Neurology, Neurosurgery and Psychiatry, 55, 200-204. 

Paniak, C., Reynolds, S., Phillips, K., Toller-Lobe, G., Melnyk, A., & 
Nagy, J. (2002). Patient complaints within 1 month of mild trau- 
matic brain injury: A controlled study. Archives of Clinical Neu- 
ropsychology, 17, 319-334. 

Reynolds, C. R. (1998). Common sense, clinicians, and actuarialism 
in the detection of malingering during head injury litigation. In 
C. R. Reynolds (Ed.), Detection of malingering during head injury 
litigation (pp. 261-286). New York: Plenum Press. 



Sattler, J. (2002). Assessment of children: behavioral and clinical appli- 
cations, (4th ed.) San Diego: Sattler. 

Sawchyn, J., Brulot, M., & Strauss, E. (2000). Note on the use of the 
Postconcussion Symptom Checklist. Archives of Clinical Neuro- 
psychology, 15, 1-8. 

Schinka, J. A. (1989). Personal History Checklist— Adult. Odessa, FL: 
PAR. 

Seidenberg, M., Taylor, M. A., & Haltiner, A. (1994). Personality and 
self-report of cognitive functioning. Archives of Clinical Neuropsy- 
chology, 9, 353-361.

Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic crite- 
ria for malingered neurocognitive dysfunction: Proposed standards 
for clinical practice and research. The Clinical Neuropsychologist, 13, 
545-561. 

Stowe, R. M. (1998). Assessment methods in behavioural neurology 
and neuropsychiatry. In G. Goldstein, P. D. Nussbaum, & S. R. 
Beers (Eds.), Neuropsychology (pp. 439-485). New York: Plenum 
Press. 

Suhr, J. A., & Gunstad, J. (2002). "Diagnosis threat": The effect of 
negative expectations on cognitive performance in head injury. 
Journal of Clinical and Experimental Neuropsychology, 24, 
448-457. 

Suhr, J. A., & Gunstad, J. (2005). Further exploration of the effect of 
"diagnosis threat" on cognitive performance in individuals with 
mild head injury. Journal of the International Neuropsychological 
Society, 11,23-29. 

Vanderploeg, R. D. (2000). Interview and testing: The data collection 
phase of neuropsychological evaluations. In R. D. Vanderploeg 
(Ed.), Clinician's guide to neuropsychological assessment (2nd ed.,
pp. 3-38). New Jersey: Lawrence Erlbaum Associates. 

Youngjohn, J. R., Burrows, L., & Erdal, K. (1995). Brain damage or 
compensation neurosis? The controversial post-concussion syn- 
drome. The Clinical Neuropsychologist, 9, 112-123. 



Figure 3-1 Background questionnaire — Adult version. 



BACKGROUND QUESTIONNAIRE-ADULT 

Confidential 

The following is a detailed questionnaire on your development, medical history, and current functioning at home and at work. This informa- 
tion will be integrated with the testing results in order to provide a better picture of your abilities as well as any problem areas. Please fill out 
this questionnaire as completely as you can. 



Client's Name: ______________________________    Age: ______    Today's Date: ______________

(If not client, name of person completing this form: ______________________________
Relationship to Client: ______________________________)

Home address: ______________________________
Work: ______________________________

Client's Phone (H): ______________    (W): ______________

Date of Birth: ______________    Place of Birth: ______________    Sex: ______

Primary Language: ______________    Secondary Language: ______________    Fluent/Nonfluent (circle one)

Hand used for writing (check one):    □ Right    □ Left

Medical Diagnosis (if any):    (1) ______________________________    (2) ______________________________

Who referred you for this evaluation? ______________________________
Briefly describe problem: ______________________________

Date of the accident, injury, or onset of illness: ______________

What specific questions would you like answered by this evaluation?
(1) ______________________________
(2) ______________________________
(3) ______________________________



SYMPTOM SURVEY 

For each symptom that applies, place a check mark in the box. Add any comments as needed. 



Physical Concerns 
Motor 

□ Headaches 

□ Dizziness 

□ Nausea or vomiting 

□ Excessive fatigue 

□ Urinary incontinence 

□ Bowel problems 

□ Weakness 

on one side of body 
(Indicate body part) 

□ Problems with fine 
motor control 

□ Tremor or shakiness 

□ Tics or strange movements 

□ Balance problems 

□ Often bump into things 

□ Blackout spells (fainting) 

□ Other motor problems 



Rt        Both        Date of Onset






Sensory 

□ Loss of feeling/numbness 
(Indicate where) 

□ Tingling or strange skin 

sensations (Indicate where) 

□ Difficulty telling hot from cold 

□ Visual Impairment 

□ Wear glasses □ Yes □ No 

□ Problems seeing on one side 

□ Sensitivity to bright lights 

□ Blurred vision 

□ See things that are not there 

□ Brief periods of blindness 

□ Need to squint or move 
closer to see clearly 

□ Hearing loss 

□ Wear hearing aid    □ Yes    □ No

□ Ringing in ears 

□ Hear strange sounds 

□ Unaware of things on one 
side of my body 

□ Problems with taste (Increased / Decreased sensitivity)

□ Problems with smell (Increased / Decreased sensitivity)

□ Pain (describe) 

□ Other sensory problems 



Rt        Both        Date of Onset



Intellectual Concerns 
Problem Solving 

□ Difficulty figuring out how to do new things 

□ Difficulty figuring out problems that most others can do 

□ Difficulty planning ahead 

□ Difficulty changing a plan or activity when necessary 

□ Difficulty thinking as quickly as needed 

□ Difficulty completing an activity in a reasonable time 

□ Difficulty doing things in the right order (sequencing) 



Date of Onset 



Language and Math Skills 

□ Difficulty finding the right word 

□ Slurred speech 

□ Odd or unusual speech sounds 

□ Difficulty expressing thoughts 

□ Difficulty understanding what others say 

□ Difficulty understanding what I read 

□ Difficulty writing letters or words (not due to 
motor problems) 

□ Difficulty with math (e.g., balancing checkbook, making 
change, etc.) 

□ Other language or math problems 



Date of Onset 



Nonverbal Skills 

□ Difficulty telling right from left 

□ Difficulty drawing or copying 

□ Difficulty dressing (not due to motor problems) 

□ Difficulty doing things I should automatically be able to do 
(e.g., brushing teeth) 

□ Problems finding way around familiar places 

□ Difficulty recognizing objects or people 

□ Parts of my body do not seem as if they belong to me 

□ Decline in my musical abilities 



Date of Onset 






□ Not aware of time (e.g., day, season, year) 

□ Slow reaction time 

□ Other nonverbal problems 



Awareness and Concentration Date of Onset 

□ Highly distractible 

□ Lose my train of thought easily 

□ My mind goes blank a lot 

□ Difficulty doing more than one thing at a time 

□ Become easily confused and disoriented 

□ Aura (strange feelings) 

□ Don't feel very alert or aware of things 

□ Tasks require more effort or attention 



Memory Date of Onset 

□ Forget where I leave things (e.g., keys, gloves, etc.) 

□ Forget names 

□ Forget what I should be doing 

□ Forget where I am or where I am going 

□ Forget recent events (e.g., breakfast) 

□ Forget appointments 

□ Forget events that happened long ago 

□ More reliant on others to remind me of things 

□ More reliant on notes to remember things 

□ Forget the order of events 

□ Forget facts but can remember how to do things 

□ Forget faces of people I know (when not present) 

□ Other memory problems 

Mood/Behavior/Personality        Rate Severity: Mild / Moderate / Severe        Date of Onset

□ Sadness or depression

□ Anxiety or nervousness

□ Stress

□ Sleep problems (falling asleep □  staying asleep □)

□ Experience nightmares on a daily/weekly basis

□ Become angry more easily

□ Euphoria (feeling on top of the world)

□ Much more emotional (e.g., cry more easily)

□ Feel as if I just don't care anymore

□ Easily frustrated

□ Doing things automatically (without awareness)

□ Less inhibited (do things I would not do before)

□ Difficulty being spontaneous

□ Change in energy (□ loss  □ increase)

□ Change in appetite (□ loss  □ increase)

□ Increase □ or loss □ of weight

□ Change in sexual interest (increase □  decline □)

□ Lack of interest in pleasurable activities

□ Increase in irritability

□ Increase in aggression

□ Other changes in mood or personality or in how you deal with people



Have others commented to you about changes in your thinking, behavior, personality, or mood? If yes, who and what have they 
said? □ Yes □ No 






Are you experiencing any problems in the following aspects of your life? If so, please explain: 
Marital/Family 



Financial/Legal . 



Housekeeping/Money Management . 



Driving . 



Overall, my symptoms have developed    □ slowly    □ quickly

My symptoms occur    □ occasionally    □ often

Over the past six months my symptoms have    □ improved    □ stayed the same    □ worsened

Is there anything you can do (or someone does) that gets the problems to stop or be less intense, less frequent, or 
shorter? 



What seems to make the problems worse? . 



In summary, there is    □ definitely something wrong with me
                        □ possibly something wrong with me
                        □ nothing wrong with me

What are your goals and aspirations for the future? 



EARLY HISTORY 

You were born:    □ on time    □ prematurely    □ late
Your weight at birth:

Were there any problems associated with your birth (e.g., oxygen deprivation, unusual birth position, etc.) or the period afterward (e.g., need for oxygen, convulsions, illness, etc.)?    □ Yes    □ No

Describe: 

Check all that applied to your mother while she was pregnant with you: 

□ Accident 

□ Alcohol use 

□ Cigarette smoking 

□ Drug use (marijuana, cocaine, LSD, etc.) 

□ Illness (toxemia, diabetes, high blood pressure, infection, etc.) 

□ Poor nutrition 

□ Psychological problems 

□ Medications (prescribed or over the counter) taken during pregnancy 

□ Other problems 

List all medications (prescribed or over the counter) that your mother took while pregnant: 

Rate your developmental progress as it has been reported to you, by checking one description for each area: 

                        Early    Average    Late

Walking                   □         □        □
Language                  □         □        □
Toilet training           □         □        □
Overall development       □         □        □






As a child, did you have any of these conditions: 

□ Attentional problems □ Learning disability 

□ Clumsiness □ Speech problems 

□ Developmental delay □ Hearing problems 

□ Hyperactivity □ Frequent ear infections 

□ Muscle weakness □ Visual problems 

MEDICAL HISTORY 

Medical problems prior to the onset of current condition 

If yes, give date(s) and brief description 

□ Head injuries 

□ Loss of consciousness 

□ Motor vehicle accidents 

□ Major falls, sports accidents, 

or industrial injuries 

□ Seizures 

□ Stroke 

□ Arteriosclerosis 

□ Dementia 

□ Other brain infection or disorder 
(meningitis, encephalitis, 

oxygen deprivation etc.) 

□ Diabetes 

□ Heart disease 

□ Cancer 

□ Back or neck injury 

□ Serious illnesses/disorder 
(Immune disorder, cerebral 

palsy, polio, lung, etc.) 

□ Poisoning 

□ Exposure to toxins (e.g., lead, solvents, chemicals)

□ Major surgeries 

□ Psychiatric problems 

□ Other 



Are you currently taking any medication? 

Name Reason for taking Dosage Date started 



Are you currently in counseling or under psychiatric care?    □ Yes    □ No
Please list date therapy initiated and name(s) of professional(s) treating you: 



Have you ever been in counseling or under psychiatric care?    □ Yes    □ No
If yes, please list dates of therapy and name(s) of professional(s) who treated you: 



Please list all inpatient hospitalizations including the name of the hospital, date of hospitalization, duration, and 
diagnosis: 






SUBSTANCE USE HISTORY 

I started drinking at age:

□ less than 10 years old    □ 10-15    □ 16-19    □ 20-21    □ over 21

I drink alcohol:    □ rarely or never    □ 1-2 days/week    □ 3-5 days/week    □ daily

I used to drink alcohol but stopped:    Date stopped:

Preferred type(s) of drinks:

Usual number of drinks I have at one time:

My last drink was:    □ less than 24 hours ago    □ 24-48 hours ago    □ over 48 hours ago

Check all that apply:

□ I can drink more than most people my age and size before I get drunk.

□ I sometimes get into trouble (fights, legal difficulty, work problems, conflicts with family, accidents, etc.) after drinking (specify):

□ I sometimes black out after drinking.

Please check all the drugs you are now using or have used in the past:

                                                Presently Using    Used in Past

Amphetamines (including diet pills)                   □                 □
Barbiturates (downers, etc.)                          □                 □
Cocaine or crack                                      □                 □
Hallucinogenics (LSD, acid, STP, etc.)                □                 □
Inhalants (glue, nitrous oxide, etc.)                 □                 □
Marijuana                                             □                 □
Opiate narcotics (heroin, morphine, etc.)             □                 □
PCP (or angel dust)                                   □                 □
Others (list)                                         □                 □

Do you consider yourself dependent on any of the above drugs?    □ Yes    □ No
If yes, which one(s):

Do you consider yourself dependent on any prescription drugs?    □ Yes    □ No
If yes, which one(s):

Check all that apply:

□ I have gone through drug withdrawal.

□ I have used IV drugs.

□ I have been in drug treatment.

Has use of drugs ever affected your work performance?

Has use of drugs or alcohol ever affected your driving ability?

Do you smoke?    □ Yes    □ No
If yes, amount per day:

Do you drink coffee:    □ Yes    □ No
If yes, amount per day:



FAMILY HISTORY 

The following questions deal with your biological mother, father, brothers, and sisters: 

Is your mother alive?    □ Yes    □ No

If deceased, what was the cause of her death? 

Mother's highest level of education: 

Mother's occupation: 






Does your mother have a known or suspected learning disability?    □ Yes    □ No

If yes, describe: 

Is your father alive?    □ Yes    □ No

If deceased, what was the cause of his death? 

Father's highest level of education: 

Father's occupation: 



Does your father have a known or suspected learning disability?    □ Yes    □ No

If yes, describe: 

How many brothers and sisters do you have? 

What are their ages? . 



Are there any unusual problems (physical, academic, psychological) associated with any of your brothers or sisters? 
If yes, describe: 



Please check all problems that exist(ed) in close biological family members (parents, brothers, sisters, grandparents, aunts, uncles). Note who 
it is (was) and describe the problem where indicated 

Who? Describe 

Neurologic disease 

□ Alzheimer's disease or senility 

□ Huntington's disease 

□ Multiple sclerosis 

□ Parkinson's disease 

□ Epilepsy or seizures 

□ Other neurologic disease 

Psychiatric illness 

□ Depression 

□ Bipolar illness (manic-depression) 

□ Schizophrenia 

□ Other 
Other disorders 

□ Mental retardation 

□ Speech or language disorder 

□ Learning problems 

□ Attention problems 

□ Behavior problems 



□ Other major disease or disorder 



PERSONAL HISTORY 

Marital History 

Current marital status:    □ Single    □ Married    □ Common-law    □ Separated    □ Divorced    □ Widowed

Years married to current spouse: . 



Dates of previous marriages: From to 

From to 

Spouse's name: Age: 

Spouse's occupation: 

Spouse's health:    □ Excellent    □ Good    □ Poor

Children (include stepchildren) 

Name Age Gender Occupation 



Who currently lives at home? 

Do any family members have any significant health concerns/special needs? 




Educational History

                        Name of School Attended    Grades and Years Attended    Degree/certifications

Elementary
High school
College/university
Trade school



If a high school diploma was not awarded, did you complete a GED? . 

Were any grades repeated?    □ Yes    □ No

Reason: 



Were there any special problems learning to read, write, or do math? . 



Were you ever in any special class(es) or did you ever receive special services?    □ Yes    □ No

If yes, what grade(s) or age? 

What type of class? 



How would you describe your usual performance as a student? 

□ A & B        Provide any additional helpful comments about your academic performance:
□ B & C
□ C & D
□ D & F



Military Service 

Did you serve in the military?    □ Yes    □ No

If yes, what branch? Dates: 

Certifications/Duties: 



Did you serve in wartime? If so, what arena?



Did you receive injuries or were you ever exposed to any dangerous or unusual substances during your service?    □ Yes    □ No



If yes, explain: . 



Do you have any continuing problems related to your military service? Describe: 



Occupational History 

Are you currently working?    □ Yes    □ No

Current job title: 

Name of employer: 

Current responsibilities: 

Dates of employment: 



Are you currently experiencing any problems at work?    □ Yes    □ No
If yes, describe: 



Do you see your current work situation as stable?    □ Yes    □ No




Approximate annual income: Prior to injury or illness 

After injury or illness 



Previous employers: 

Name Dates Duties/position Reason for leaving 



Recreation 

Briefly list the types of recreations (e.g., sports, games, TV, hobbies, etc.) that you enjoy: 



Are you still able to do these activities? . 



Recent Tests 

Check all tests that recently have been done and report any abnormal findings. 

Check if normal        Abnormal findings

□ Angiography 

□ Blood work 

□ CT scan 

□ MRI 

□ PET scan 

□ SPECT 

□ Skull x-ray 

□ EEG 

□ Neurological exam 

□ Other 



Identify the physician who is most familiar with your recent problems: 



Date of last vision exam: 

Date of last hearing exam: . 



Have you had a prior psychological or neuropsychological exam?    □ Yes    □ No
If yes, complete the following: 

Name of psychologist: 

Date: 



Reason for evaluation: 

Findings of the evaluation:



Please provide any additional information that you feel is relevant to this referral: 






Figure 3-2 Background questionnaire — Child version. 



BACKGROUND QUESTIONNAIRE -CHILD 

Confidential 

The following is a detailed questionnaire on your child's development, medical history, and current functioning at home and at 
school. This information will be integrated with the testing results to provide a better picture of your child's abilities as well as 
any problem areas. Please fill out this questionnaire as completely as you can. 

Child's Family 



Child's Name: 

Birthdate: 

Birth Country: . 



Today's Date: 



Age: 



Grade: 



Name of School: 

Age on arrival in country if born elsewhere:. 



Person filling out this form: □ Mother □ Father □ Stepmother □ Stepfather □ Other: 

Biological Mother's Name: ______________________    Age: ______    Highest Grade Completed: ______
Number of Years of Education: ______    Degree/Diploma (if applicable): ______________
Occupation: ______________________

Biological Father's Name: ______________________    Age: ______    Highest Grade Completed: ______
Number of Years of Education: ______    Degree/Diploma (if applicable): ______________
Occupation: ______________________

Marital status of biological parents: □ Married □ Separated □ Divorced □ Widowed □ Other: 

If biological parents are separated or divorced: 

How old was this child when the separation occurred? 

Who has legal custody of the child? (Check one) □ Mother □ Father □ Joint/Both □ Other: 

Stepparent's Name: Age: Occupation: 

If this child is not living with either biological parent: 

Reason: 

□ Adoptive parents □ Foster parents □ Other family members □ Group home □ Other: 

Name(s) of legal guardian(s): 



List all people currently living in your child's household: 
Name Relationship to child 



Age 



If any brothers or sisters are living outside the home, list their names and ages: 



Primary language spoken in the home: 

Other languages spoken in the home: 

If your child's first language is not English, please complete the following:

Child's first language: ______________
Age at which your child learned English: ______________



Current Medications 

List all medications that your child is currently taking: 



Medication        Reason taken        Dosage (if known)        Start date














































Behavior Checklist 

Place a check mark (✓) next to behaviors that you believe your child exhibits to an excessive or exaggerated degree when compared to other children his or her age.



Sleeping and Eating 

□ Nightmares 

□ Trouble sleeping 

□ Eats poorly 

□ Eats excessively 



□ Dangerous to self or others (describe) 



□ Purposely harms or injures self (describe): 



Social Development 

□ Prefers to be alone 

□ Excessively shy or timid 

□ More interested in objects than in people 

□ Difficulty making friends 

□ Teased by other children 

□ Bullies other children 

□ Not sought out for friendship by peers 

□ Difficulty seeing another person's point of view 

□ Doesn't empathize with others 

□ Overly trusting of others 

□ Doesn't appreciate humor 

Behavior 

□ Stubborn 

□ Irritable, angry, or resentful 

□ Frequent tantrums 

□ Strikes out at others 

□ Throws or destroys things 

□ Lying 

□ Stealing 

□ Argues with adults 

□ Low frustration threshold 

□ Daredevil behavior 

□ Runs away 

□ Needs a lot of supervision 

□ Impulsive (does things without thinking) 

□ Poor sense of danger 

□ Skips school 



□ Talks about killing self (describe): 



□ Unusual fears, habits or mannerisms (describe): 

□ Seems depressed 

□ Cries frequently 

□ Excessively worried and anxious 

□ Overly preoccupied with details 

□ Overly attached to certain objects 

□ Not affected by negative consequences 

□ Drug abuse 

□ Alcohol abuse 

□ Sexually active 

Other Problems 

□ Bladder control problems (not during seizure) 

□ Poor bowel control (soils self) 

□ Motor/vocal tics 

□ Overreacts to noises 

□ Overreacts to touch 

□ Excessive daydreaming and fantasy life 

□ Problems with taste or smell 

Motor Skills 

□ Poor fine motor coordination 

□ Poor gross motor coordination 



Other Problems:



Education Program 

Does your child have a modified learning program? □ Yes □ No 

Is there an individual education plan (IEP)? □ Yes □ No 

Are you satisfied with your child's current learning program? If not, please explain: 






Has your child been held back a grade?    □ No    □ Yes (Indicate grade: ______)

Is your child in any special education classes?    □ Yes    □ No






If yes, please describe: 








Is your child receiving learning assistance at school? □ Yes □ No 






If yes, please describe: 








Has your child been suspended or expelled from school? □ Yes □ No 






If yes, please describe: 








Has your child ever received tutoring? □ Yes □ No 






If yes, please describe: 








Briefly describe classroom or school problems if applicable: 


Cognitive Skills 






Rate your child's cognitive skills relative to other children of the same age. 






                              Above average    Average    Below average    Severe problem

Speech                              □             □             □                □
Comprehension of speech             □             □             □                □
Problem solving                     □             □             □                □
Attention span                      □             □             □                □
Organizational skills               □             □             □                □
Remembering events                  □             □             □                □
Remembering facts                   □             □             □                □
Learning from experience            □             □             □                □
Understanding concepts              □             □             □                □
Overall intelligence                □             □             □                □


Check any specific problems: 






□ Poor articulation                                  □ Frequently loses belongings
□ Difficulty finding words to express self           □ Difficulty planning tasks
□ Disorganized speech                                □ Doesn't foresee consequences of actions
□ Ungrammatical speech                               □ Slow thinking
□ Talks like a younger child                         □ Difficulty with math/handling money
□ Slow learner                                       □ Poor understanding of time
□ Forgets to do things
□ Easily distracted
□ Frequently forgets instructions






Describe briefly any other cognitive problems that your child may have: 






Describe any special skills or abilities that your child may have: 


Developmental History 






If your child is adopted, please fill in as much of the following information as you are aware of. 




During pregnancy, did the mother of this child: 






Take any medication? □ Yes □ No 






If yes, what kind? 









Smoke? □ Yes □ No 

If yes, how many cigarettes each day? 

Drink alcoholic beverages? □ Yes □ No 
If yes, what kind? 



Approximately how much alcohol was consumed each day? 

Use drugs? □ Yes □ No 
If yes, what kind? 



How often were drugs used? 



List any complications during pregnancy (excessive vomiting, excessive staining/blood loss, threatened miscarriage, 
infections, toxemia, fainting, dizziness, etc.): 



Duration of pregnancy (weeks): Duration of labor (hours): Apgars: / . 



Were there any indications of fetal distress? □ Yes □ No 
If yes on any of the above, for what reason?



Check any that apply to the birth: □ Labor induced □ Forceps □ Breech □ Cesarean 
If yes on any of the above, for what reason?



What was your child's birth weight? 

Check any that apply following birth: □ Jaundice □ Breathing problems □ Incubator □ Birth defect 
If any, please describe: 



Were there any other complications? □ Yes □ No 
If yes, please describe: 



Were there any feeding problems? □ Yes □ No 
If yes, please describe: 



Were there any sleeping problems? □ Yes □ No 
If yes, please describe: 



Were there any growth or development problems during the first few years of life? □ Yes □ No 

If yes, please describe: 

Were any of the following present (to a significant degree) during infancy or the first few years of life? 

□ Unusually quiet or inactive □ Colic □ Headbanging 

□ Did not like to be held or cuddled □ Excessive restlessness □ Constantly into everything 

□ Not alert    □ Excessive sleep    □ Excessive number of accidents compared with other children

□ Difficult to soothe □ Diminished sleep 




Please indicate the approximate age at which your child first showed the following behaviors by checking the appropriate 
box. Check "Never" if your child has never shown the listed behavior. 



                    Early    Average    Late    Never

Smiled                □         □         □       □
Rolled over           □         □         □       □
Sat alone             □         □         □       □
Crawled               □         □         □       □
Walked                □         □         □       □
Ran                   □         □         □       □
Babbled               □         □         □       □
First word            □         □         □       □
Sentences             □         □         □       □


                         Early    Average    Late    Never

Tied shoelaces             □         □         □       □
Dressed self               □         □         □       □
Fed self                   □         □         □       □
Bladder trained, day       □         □         □       □
Bladder trained, night     □         □         □       □
Bowel trained              □         □         □       □
Rode tricycle              □         □         □       □
Rode bicycle               □         □         □       □

Medical History

Vision problems     □ No    □ Yes (describe: ______________________)
Hearing problems    □ No    □ Yes (describe: ______________________)

Date of last vision examination: ______________
Date of last hearing examination: ______________



Place a check next to any illness or condition that your child has had. When you check an item, also note the approxi- 
mate date of the illness (if you prefer, you can simply indicate the child's age at illness). 



Illness or condition                      Date(s) or age(s)

□ Measles
□ German measles
□ Mumps
□ Chicken pox
□ Whooping cough
□ Diphtheria
□ Scarlet fever
□ Meningitis
□ Pneumonia
□ Encephalitis
□ High fever
□ Seizures
□ Allergy
□ Hay fever
□ Injuries to head
□ Broken bones
□ Hospitalizations
□ Operations
□ Ear infections
□ Paralysis
□ Loss of consciousness
□ Poisoning
□ Severe headaches
□ Rheumatic fever
□ Tuberculosis
□ Bone or joint disease
□ Sexually transmitted disease
□ Anemia
□ Jaundice/hepatitis
□ Diabetes
□ Cancer
□ High blood pressure
□ Heart disease
□ Asthma
□ Bleeding problems
□ Eczema or hives
□ Physical abuse
□ Sexual abuse
□ Other: ______________



Family Medical History

Place a check next to any illness or condition that any member of the immediate family (i.e., brothers, sisters, aunts, uncles, cousins, grandparents) has had. Please note the family member's relationship to the child.



Condition                                    Relationship to child

□ Seizures or epilepsy
□ Attention deficit
□ Hyperactivity
□ Learning disabilities
□ Mental retardation
□ Tics or Tourette's syndrome
□ Alcohol abuse
□ Drug abuse
□ Suicide attempt
□ Physical abuse






Condition                                    Relationship to child

□ Childhood behavior problems
□ Mental illness
□ Depression or anxiety
□ Sexual abuse
□ Neurological illness or disease
□ Antisocial behavior (assaults, thefts, etc.)

List any previous assessments that your child has had:

                        Dates of testing        Name of examiner

Psychiatric
Psychological
Neuropsychological
Educational
Speech pathology



List any form of psychological/psychiatric treatment that your child has had (e.g., psychotherapy, family therapy, inpatient 
or residential treatment): 



Type of treatment        Dates        Name of therapist



Have there been any recent stressors that you think may be contributing to your child's difficulties (e.g., illness, deaths, opera- 
tions, accidents, separations, divorce of parents, parent changed job, changed schools, family moved, family financial prob- 
lems, remarriage, sexual trauma, other losses)? 



Other Information 

What are your child's favorite activities? 



Has your child ever been in trouble with the law? □ Yes □ No 
If yes, please describe briefly: 



On the average, what percentage of the time does your child comply with requests or commands? 



What have you found to be the most satisfactory ways of helping your child? 



What are your child's assets or strengths? 



Is there any other information that you think may help me in assessing your child? 






Thank you for filling out this questionnaire. 



Test Selection, Test Administration, 
and Preparation of the Patient 



TEST SELECTION 

Although the terms "psychological testing" and "psychological 
assessment" are often used synonymously, they do represent 
different aspects of practice. While "testing" is one aspect, of- 
ten referring to the administration of a particular scale to ob- 
tain a specific score, psychological assessment is the more 
appropriate term for the evaluation of individual clients and 
includes the "integration of test results, life history informa- 
tion, collateral data, and clinical observations into a unified 
description of the individual being assessed" (Hunsley, 2002, 
p. 139). It is "a complex activity requiring the interplay of 
knowledge of psychometric concepts with expertise in an area 
of professional practice or application. Assessment is a con- 
ceptual, problem-solving process of gathering dependable, 
relevant information ... to make informed decisions" 
(Turner et al., 2001, p. 1100). As such, it provides a valuable
source of clinical information, perhaps as informative as that 
provided by medical tests (Meyer et al., 2001). 

The American Psychological Association (APA; Turner 
et al., 2001) has developed guidelines to inform test users and 
the public of the qualifications the APA considers important 
for the competent and responsible use of psychological tests. 
The guidelines indicate that psychologists possess (a) core 
generic psychometric knowledge and skills and (b) specific 
qualifications for the responsible use of tests in particular set- 
tings or for specific purposes. With regard to test selection, the 
guidelines specify that "test users should select the best test or 
test version for a specific purpose and should have knowledge 
of testing practice in the content area and of the most appro- 
priate norms when more than one normative set is available. 
Knowledge of test characteristics such as psychometric prop- 
erties, basis in theory and research, and normative data . . . 
should influence test selection" (Turner et al., 2001, p. 1101).

Thus, test selection should be based on knowledge of the 
literature and the appropriateness of a particular test for a 
given individual under a unique set of circumstances and the 
distinctive set of referral questions. Well-standardized tests 



should be chosen, as the standardization process minimizes 
the error variance within a patient's assessment data (Miller & 
Rohling, 2001). 

Norms 

The reader is referred to Chapter 2 (Norms Selection in Neu- 
ropsychological Practice) for a discussion of factors to consider 
with regard to the adequacy of norms. 

Approaches to Assessment 

There are three main approaches to assessment. In the fixed 
battery approach, the clinician gives the same tests to every pa- 
tient regardless of the specific referral question. In the flexible 
battery approach, the unique nature of the patient's deficits 
guides the choice of tasks, and this selection may vary from 
one patient to another. An intermediate position is the flexible 
battery approach in which the assessment is tailored so that 
homogeneous groups of patients are routinely given specific 
subsets of tests. 

In our practice, the selection of tests for a given patient 
usually starts with a "core battery" based on the information 
available (or initial hypotheses) about the patient's problem 
(e.g., epilepsy, traumatic head injury, dementia, etc.) and the 
specific referral questions (e.g., differential diagnosis, post- 
treatment reevaluation, routine follow-up). Thus, we may be- 
gin with a general screening battery (e.g., RBANS, NAB, 
DRS-2) designed to be sensitive to various conditions and fol- 
low up with more detailed testing. Alternatively, depending 
on the patient and our understanding of the presenting prob- 
lem(s), our approach may incorporate a routine grouping of
tests that includes an intelligence test appropriate for the age 
of the patient and several tests in the areas of presumed deficit 
(e.g., tests of memory, concentration/attention, executive func- 
tion) to confirm the presence and severity of the deficit. Tests 
sampling other domains (e.g., language, sensory function, 
mood) are also included to explore alternative hypotheses, 






obtain a broad overview of the person's functioning, and pro- 
vide information useful for a given question or referral 
source. The remainder of the test selection is often "open- 
ended" (i.e., not all tests to be given in a particular case are de- 
termined in advance). Rather, some tests are selected after a 
review of the results of the initial testing, of the client's com- 
plaints, and of the person's behavior during testing. In this 
way, we can clarify relevant issues and more precisely charac- 
terize the nature of any areas of impairment. Occasionally, it 
is obvious that the patient fatigues easily or is likely to fail de- 
manding tests. In that case, the selection of additional or al- 
ternative tests is critical, and must focus directly on the target 
problems and the patient's capacity to work with the exam- 
iner. When we work exclusively with a particular population, 
we often use specific "custom" batteries designed with a flexi- 
ble approach but administered routinely to all patients with 
similar presenting problems. This facilitates retesting for clin- 
ical purposes as well as using data for research and program 
evaluation. 

The conorming of some commonly used tests (e.g., Wech- 
sler intelligence with memory and with achievement tests, 
MAYO norms, NAB) permits the comparison of one set of 
test scores directly with another. However, such coordinated 
norming is relatively uncommon. Thus, in most cases, the 
flexible battery approach does not allow the use of score com- 
parisons across tests with common metrics. This is a distinct 
advantage of fixed batteries, or of "impairment indices" re- 
sulting from fixed test batteries (e.g., the Halstead Impair- 
ment Index, the GNDS: Oestreicher & O'Donnell, 1995; 
Reitan & Wolfson, 1988; HRNES: Russell, 2000a, 2000b). 
There are also statistical problems (e.g., the likelihood of ob- 
taining at least one score in the impaired range increases as 
more tests are administered) and biases (hindsight bias, over-
reliance on salient data, underutilization of base rates, failure 
to take into account covariation) inherent in the administra- 
tion of multiple tests. Miller and Rohling (2001) presented a 
statistically based method for calculating summary scores and 
interpreting data within a flexible battery approach. Test 
scores are converted to a common metric (T scores) and as- 
signed to particular cognitive domains. Overall, domain and 
individual test battery means are then evaluated with refer- 
ence to premorbid estimates using various statistical indices. 
The percentage of tests that fall in the impaired range is also 
calculated. Rohling et al. (2003) have provided some evidence 
supporting their claim that their method of analysis for gen- 
erating summary scores from a flexible battery performs as 
well as the GNDS, Average Impairment Rating, and the Hal- 
stead Impairment Index in determining neuropathology. 
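
To make the two quantitative points above concrete, the short sketch below (in Python; not part of the published method) first shows why administering many tests inflates the chance of an isolated "impaired" score, and then illustrates, only schematically, a Miller and Rohling (2001)-style aggregation: scores are converted to a common T-score metric, averaged by domain and overall, and the percentage of scores falling below an assumed cutoff (here T < 40) is computed. The test names, normative means and standard deviations, domain assignments, and cutoff are all hypothetical, and the probability calculation assumes independent tests, which real batteries only approximate.

def p_at_least_one_low(n_tests, p_low=0.05):
    """Probability that at least one of n independent scores falls below a
    cutoff that only p_low of the normative population fails."""
    return 1 - (1 - p_low) ** n_tests

# With 20 independent tests and a 5% cutoff, roughly 64% of intact examinees
# will show at least one "impaired" score by chance alone.
print(round(p_at_least_one_low(20), 2))  # 0.64

def to_t(raw, norm_mean, norm_sd):
    """Convert a raw score to a T score (mean 50, SD 10) using its norms."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

# Hypothetical battery: (test name, raw score, normative mean, normative SD),
# grouped into hypothetical cognitive domains.
battery = {
    "memory":    [("list_recall", 42, 50, 10), ("story_recall", 38, 50, 10)],
    "attention": [("digit_span", 9, 10, 3)],
}

t_scores = {domain: [to_t(raw, m, sd) for _, raw, m, sd in tests]
            for domain, tests in battery.items()}
domain_means = {d: sum(ts) / len(ts) for d, ts in t_scores.items()}
all_ts = [t for ts in t_scores.values() for t in ts]
overall_mean = sum(all_ts) / len(all_ts)
pct_impaired = 100 * sum(t < 40 for t in all_ts) / len(all_ts)  # assumed cutoff

print(domain_means)                         # domain T-score means (memory 40.0, attention about 46.7)
print(round(overall_mean, 1))               # overall test battery mean
print(f"{pct_impaired:.0f}% of scores below T = 40")

The published approach goes further (e.g., evaluating the summary scores against premorbid estimates with formal statistical indices); the sketch is intended only to show the mechanics of pooling tests on a common metric.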

Sweet et al. (2000a, 2000b) report that, according to a 
survey of 422 neuropsychologists in 1999, the flexible bat- 
tery approach (i.e., variable but routine groupings of tests 
for different types of patients such as head injury, alco- 
holism, elderly, etc.) is the favorite, with endorsement rates 
of about 70%. Fixed (or standard) battery use has dwindled 
to about 15% as has a totally flexible approach to test selection. 
Thus, impairment indices have become increasingly obsolete as



neuropsychologists are less and less called upon to deter- 
mine whether deficits are "organic or not organic," a ques- 
tion that is now better answered by neuroradiological and 
electrophysiological techniques. Instead, neuropsychological 
evaluations have moved toward providing a detailed de- 
scription of the nature of the deficit, its impact on daily liv- 
ing, and its rehabilitation. 

A variant of the flexible battery framework is the Boston 
"process" approach, which focuses on a qualitative exploration 
of how the patient attained the test score and how he or she 
succeeded or failed at a task (Kaplan, 1988; Lezak et al., 2004). 
The process approach requires careful observation of the pa- 
tient's strategy during each task and a follow-up of unusual 
approaches or errors by questioning or by the readministra- 
tion of tasks with modified materials or instructions to clarify 
the nature of the specific deficit underlying poor performance 
(see Milberg et al., 1986). Thus, to understand how the neu- 
ropathological process has impacted response strategy, the ex- 
aminer may "test the limits" by allowing the person more time 
to complete the problem or by providing specific structure or 
cueing not present in the standard administration format 
(Bauer, 2000). In general, the Boston Process Approach has 
been criticized for not having sufficient norms or detailed in- 
formation regarding reliability and validity (e.g., Erickson, 
1995; Slick et al., 1996). In addition, the practice of readminis- 
tering tasks in a modified or nonstandard fashion complicates
later readministration. Thus, when patients have been evaluated
using a process approach in the past, interpretation of results 
from readministration with the standard instrument should be 
done with caution. Process-based versions of well-known intel- 
ligence tests have been published (e.g., WAIS-R NI, WISC-IV 
Integrated), and recently, proponents of the Boston Process 
Approach have published versions of list-learning and execu- 
tive functioning tasks that are normed and based on advance- 
ments in cognitive psychology (e.g., CVLT-2, D-KEFS). 
Parallel to this trend, researchers have attempted to quantify 
the process by which patients solve existing tests (e.g., by sup- 
plementing traditional memory tests with incidental and 
recognition tasks, by generating new indices of clustering and 
switching for fluency tasks; Poreh, 2000). These tests and tech- 
niques show promise in their ability to parse neuropsycholog- 
ical processes in a way that might allow a more detailed and 
thorough detection and characterization of neuropsychologi- 
cal deficits. 



TEST ADMINISTRATION 

Timing of Assessment 

Lezak et al. (2004) recommend that formal assessment typi- 
cally should not be done during the acute or postacute pe- 
riod. During the first 3 months, changes in the patient's 
status can be so rapid as to make test results obtained in one 
week obsolete by the next. In addition, fatigue often has 
significant deleterious effects during this period, making it 






difficult to evaluate actual patient competencies. Examiners 
should also bear in mind that both fatigue and awareness of 
poor performance can activate symptoms of depression and 
anxiety, which can further interfere with cognitive function- 
ing and perpetuate and exacerbate symptoms. Accordingly, 
scheduling an initial assessment typically occurs within 
about three to six months after the event, unless early base- 
line testing is needed to document severity of deficits and to 
track progress over time. 

The assessment may be needed for a variety of reasons in- 
cluding to determine if the individual is able to resume previ- 
ous activities (e.g., employment, schooling); to evaluate 
competency (e.g., manage financial affairs); to ascertain 
strengths and weaknesses; to provide information for diag- 
nostic purposes; to help determine the types of adaptations, 
supports, or remediations that may be needed; and to provide 
input regarding likely long-term outcome. The course of re- 
covery is frequently prolonged (e.g., following traumatic 
brain injury), with many patients experiencing dramatic 
change in outcome from the time of insult to one year 
postevent (Dikmen et al., 1995). The rate of change tends to 
slow in subsequent years (Powell & Wilson, 1994). Accord- 
ingly, a second assessment may be needed about 6 to 12 
months after the first to provide firmer opinions regarding 
outcome. In evolving conditions (e.g., dementia, MS, Parkin- 
son's), additional assessments at one- or two-year intervals 
may also be useful (for a more complete discussion, see Lezak 
et al., 2004).

Amount of Time Required 

Not surprisingly, a large amount of time is spent in test ad- 
ministration alone. We include administration times for the 
measures provided in this volume. A recent survey of U.S. 
clinical neuropsychologists revealed an average estimate of 
about five hours, with forensic evaluations taking consider- 
ably longer (Sweet et al., 2002). The reader can also refer to a 
useful survey from Lundin and DeFilippis (1999) pertaining 
to usual times to complete specific neuropsychological tests 
(note this refers to time to administer, score, interpret, and 
report). 

Use of Psychometrists 

Although many neuropsychologists prefer to administer 
the tests personally, test administration by a well-trained psy- 
chometrician is both accepted and widespread (Brandt & 
van Gorp, 1999; The NAN Policy and Planning Committee, 
2000a). Sweet et al.'s (2000a; 2002) surveys showed that be- 
tween 42% and 69% of neuropsychologists use a psychometri- 
cian. In general, neuropsychologists in private practice are less 
likely to use an assistant than clinicians employed in institu- 
tions. However, most (>75%) conducted the interview per- 
sonally and most (>75%) observed the patient during some 
part of the testing. Record review, interpretation and report 
write-up, patient feedback, and consultation to the referral 



source are almost always (>80%) conducted by the clinician 
(Sweet et al., 2002).

Written Notes 

Regardless of who does the testing (clinician, technician), it is 
important to keep detailed written notes about the patient's 
behaviors and responses during the course of the assessment. 
This is critical because the examiner may forget salient infor- 
mation or not realize its significance until after the examina- 
tion is complete. It is also good practice to keep a continuous 
record of the order in which the tests were administered as 
well as the time used for each test and rest period. This helps 
later if the potential effects of interference or fatigue need 
clarification. 



Computer Administration 

This is becoming popular, in part because of the advantages 
offered to the neuropsychologist in terms of presentation for- 
mat. Thus, extensive training is not required for the tester, and 
by limiting client/examiner interaction, a potential source of 
bias is removed and a truly standard administration is achieved. 
The computer provides precisely the same test each time; 
responses are automatically recorded and analyzed, also re- 
ducing error. The program can also be modified to allow con- 
siderable variation in test parameters. Another advantage is 
the ease with which different indices of performance can be 
obtained (e.g., accuracy, reaction time, variability). A number 
of tests have been specially developed for computer adminis- 
tration (e.g., ANAM: Bleiberg et al., 2000; CANTAB: Robbins 
et al, 1994; CARB: Condor et al., 1992; CPT-II: Connors & 
MHS staff, 2000; VSVT: Slick et al., 1997). However, close su- 
pervision is needed for computer- administered tests to ensure 
that the patient is able to follow instructions properly to pro- 
vide valid results. 

Computer-administered tests do not necessarily provide 
identical or even similar results as the same test administered 
by an examiner in person (Feldstein et al., 1999; Ozonoff, 
1995; Van Schijndel & van der Vlugt, 1992). Hence, the existing 
normative data may not be applicable. For example, a study by 
Feldstein et al. (1999) compared the distribution properties of 
central tendency, variability, and shape (e.g., skewness) be-
tween the manual version of the WCST and four computerized 
versions. None of the computerized versions was found to be 
equivalent to the manual version on all assessment measures, 
suggesting that norms provided for the standard version could 
not be used for the computer versions. Rather, new norms 
needed to be established for each computer version. A study by 
French and Beaumont (1987) compared automated and stan- 
dard forms of 8 tests in 367 subjects: While some tests, includ- 
ing Raven's Progressive Matrices, the Mill Hill Vocabulary Test, 
and Eysenck's Personality Questionnaire, showed acceptable 
reliability, others, especially the Digit Span Test and the Dif- 
ferential Aptitude Test, produced quite low reliabilities. The 
authors concluded that some tests were not amenable to 






automation. It is also possible that computer administration 
can mask deficits that would otherwise be apparent in some 
populations. For example, some individuals (e.g., those with 
autism) may perform better when faced with a computer than 
a person, tapping cognitive functions at a level that is rarely de- 
manded in real-life settings (Luciana, 2003). In short, using a 
computer for administration of standard tests requires 
demonstrating that meaningful and equivalent results can 
be obtained with computer testing. Moreover, there is some 
evidence that as computer-related anxiety increases, perfor- 
mance on computer administered measures tends to decrease 
(Browndyke et al., 2002).

Order of Administration 

In general, test order has little impact on performance. For ex- 
ample, participants in the standardization sample took the 
WAIS-III and WMS-III in one session, in roughly counterbal- 
anced order. There were few test-order effects noted (Zhu & 
Tulsky, 2000). When tests (e.g., WCST, Category Test) do show 
order of administration effects, the clinician should consider 
which of the instruments will provide the best information 
regarding the referral question and should use only that test 
for the patient. 

There are a number of issues to consider with regard to or- 
dering the tests. Thus, test administration does require careful 
planning to avoid interference effects. For example, various 
memory tests call for the recall of stimuli after a period of 10 
to 30 minutes. While the examiner may wish to fill this delay 
with other tasks, it is important that tests with similar visual 
or verbal content be avoided because the patient may substi- 
tute the content of intervening tests during the delayed recall 
of the first test. 

There are other considerations as well. For example, some 
measures (e.g., broad-ranging measures such as Wechsler In- 
telligence Scales) are frequently placed early in the examina- 
tion to answer initial questions and to generate hypotheses 
regarding spared and impaired functions. Others (e.g., rela- 
tively easy ones) may be saved for last so that the patient 
leaves with a sense of success. Certain measures of motiva- 
tional status are best placed at the very beginning of a neu- 
ropsychological examination. Some of these tasks (e.g., Rey 
15-Item, 21-Item) likely will prove less sensitive to exaggera-
tion if used reactively (i.e., if used midway through the evalu- 
ation after suspicions of biased effort have been raised). The 
patient will have been exposed to a variety of difficult neu- 
ropsychological measures so that these tasks will appear rela- 
tively simple and straightforward. 



NEUROPSYCHOLOGICAL ASSESSMENT 

Informed Consent 

The 2002 APA Ethics Code, effective June 1, 2003, specifies the 
need for informed consent for assessments, evaluations, or 



diagnostic services, albeit with several exceptions in which 
patient assent represents the appropriate standard of care 
(Johnson-Greene, 2005). The code states that consent should 
include (a) an explanation of the nature and purpose of the 
assessment, (b) fees, (c) involvement of third parties, and 
(d) limits of confidentiality. The examinee should also be pro- 
vided with sufficient opportunity to ask questions and receive 
answers. Johnson-Greene (2005) states that there are also 
good practical and ethical reasons to provide information 
concerning the referral source, foreseeable risks, discomforts 
and benefits, and time commitment, as such elements may 
well be intrinsic to consent that really is adequately informed. 
Informed consent is not required in some instances in which assent, defined as the absence of objection to assessment procedures, would be considered sufficient. Such situations include the following (Johnson-Greene, 2005): (a) testing is mandated by law or governmental regulations; (b) informed consent is implied because testing is conducted as a routine educational, institutional, or organizational activity; or (c) the purpose of testing is to evaluate decisional capacity. In such cases, as well as with children, patients
should be provided with basic information about the proce- 
dures, their preferences should be noted, and their assent 
should be documented along with the consent of a legally 
authorized person. Forensic cases are viewed similarly in 
that a normal doctor-patient relationship does not exist but 
the basic components of patient assent would be expected 
(Johnson-Greene, 2005). Persons undergoing forensic evalu- 
ations may also be precluded from receiving an explanation 
of their test results normally afforded to patients, which 
should be explained in advance of any forensic evaluation 
(Johnson-Greene, 2005). 

The National Academy of Neuropsychology strongly en- 
courages neuropsychologists to provide informed consent to 
patients seeking services and views its conveyance as a basic 
professional and ethical responsibility. A flowchart (Johnson- 
Greene, 2005) is provided in Figure 4-1, which outlines the 
process of determining consent content and conveyance. A 
sample informed consent document (Johnson-Greene, 2005) 
is provided in Figure 4-2. 

Additional Considerations 

Relatively little attention has been paid to the consumer side of 
neuropsychological assessments (see also Prigatano, 2000). To 
be of maximum benefit to the patient, however, the neuropsy- 
chologist should be prepared to follow some of the basic rules 
that emerged from a mail follow-up of 129 outpatients seen in 
five Australian centers. These rules would seem to apply equally 
to other geographic locations (adapted from Bennett-Levy 
et al., 1994):

• Patient Preparation. Patients are usually not prepared 
for what to expect during an assessment. Sixty percent 
of the patients in the study had no information on 
what to expect or were unaware that the assessment 






Figure 4-1 Flowchart for informed consent. Source: Johnson-Greene, 2005. Reprinted with 
permission from Elsevier. 



Special Referral Situations
Mandated by law or governmental regulations; routine educational, institutional, or organizational activity where consent is implied; evaluation of decisional capacity.

Obtain Patient's Assent for Assessment Procedure
Explain nature and purpose of the assessment. Use language that is reasonably understandable to the person being assessed. Consider patient's preferences and best interests. Take reasonable steps to protect patient's rights and welfare. Obtain substitute consent from authorized person when permitted or required by law.* Document written or oral assent.

*Note: When substitute consent from authorized persons is unavailable, contact the Department of Social Services in your state.

Patient Competency
Patient presumed or known to be competent.

Consider Assessment Characteristics
a. Source of referral
b. Referral question(s) and goal of the assessment
c. Anticipated uses of assessment
d. Inpatient or outpatient setting?
e. Involvement of third parties?
f. Special legal mandates or circumstances?
g. Special limits of confidentiality?
h. Use of interpreter?
i. Recording of voice or images?

Obtain Patient's Consent for Assessment Procedure
a. Content
   1. Referral source
   2. Purpose of the assessment
   3. Foreseeable risks, discomforts, and benefits
   4. Fees and testing time
   5. Limits of confidentiality
   6. Involvement of third parties
b. Provide opportunity for patient to ask questions and receive answers
c. Ask probing questions to assess understanding
d. Document written or oral consent (varies depending on situation)


could take up to three hours. This can be remedied by 
educating referring agents, sending an informative 
letter before the assessment (e.g., who will be 
performing the assessment, how long it will last, what 
to bring), and giving an introduction before the 
beginning of testing that can be tailored for children 
and adults (see also Hebben & Milberg, 2002; Sattler, 
2001; and Figure 4-2). 

• Provision of feedback. Feedback on strengths and problem areas, with
suggestions about how to get around problem areas, is recommended.
• Presence of third parties/recordings. Inviting a relative to accompany
the patient may be beneficial to alleviate anxiety, but may also help in
history taking and in the informing interview. In most cases, the
accompanying person should be interviewed separately to avoid
embarrassment to both patient and interviewee. Also, the discrepancies
between statements made by the informant and the patient may provide
clues about the patient's insight or awareness of deficits and about the
effects of such deficits on daily functioning. However, the presence of
third-party observers in the test situation (or behind one-way mirrors
or via electronic recording) is generally discouraged since it creates
the potential for distraction and may increase the risk that
motivational disturbances may impact test performance (The NAN Policy
and Planning Committee, 2000b). Even audio recording can affect
neuropsychological test performance: Constantinou et al. (2002) found
that in the presence of an audio recorder, the performance of
participants on memory (but not motor) tests declined. Also bear in mind
that neuropsychological test measures have been standardized under a
specific set of highly controlled circumstances that did not include the
presence of a third-party observer. Therefore, the presence of a
third-party observer may represent a threat to the validity and
reliability of the data generated under such circumstances. In addition,
exposure of test procedures to nonpsychologists jeopardizes the validity
of these methods for future use (The NAN Policy and Planning Committee,
2000c). Note, however, that there are circumstances that support the
presence of a neutral party in nonforensic situations (e.g., in the
training of students; a parent providing a calming influence during the
evaluation of a child).



Figure 4-2 Sample informed consent. Source: Johnson-Greene (2005). Reprinted with 
permission from Elsevier. 



Please note that this is a general template for informed consent that may not apply to 
your specific jurisdiction. It is recommended that psychologists seek advice from per- 
sonal counsel to determine if this consent is appropriate for their specific jurisdictions. 

Referral Source: You have been referred for a neuropsychological assessment (i.e.,
evaluation of your thinking abilities) by _______________ (name of referral source).



Nature and Purpose of Assessment: The goal of neuropsychological assessment 
is to determine if any changes have occurred in your attention, memory, language, 
problem solving, or other cognitive functions. A neuropsychological assessment may 
point to changes in brain function and suggest possible methods and treatments for re- 
habilitation. In addition to an interview where we will be asking you questions about 
your background and current medical symptoms, we may be using different techniques 
and standardized tests including but not limited to asking questions about your knowl- 
edge of certain topics, reading, drawing figures and shapes, listening to recorded tapes, 
viewing printed material, and manipulating objects. Other specific goals and anticipated 
uses of the information we gather today include the following: 



Foreseeable Risks, Discomforts, and Benefits: For some individuals, assess- 
ments can cause fatigue, frustration, and anxiousness. Other anticipated risks, discom- 
forts, and benefits associated with this assessment include the following: 



Fees and Time Commitment: The hourly fee for this assessment is _______ per
hour. Assessments may take several hours or more of face-to-face testing and several
additional hours for scoring, interpretation, and report preparation. This evaluation is
estimated to take approximately _______ hours of face-to-face assessment time.

Though the fees are generally covered by insurance, patients are responsible for any 
and all fees for the assessment. 

Limits of Confidentiality: Information obtained during assessments is confidential 
and can ordinarily be released only with your written permission. There are some spe- 
cial circumstances that can limit confidentiality, including (a) a statement of intent to 
harm self or others, (b) statements indicating harm or abuse of children or vulnerable 
adults, and (c) issuance of a subpoena from a court of law. Other foreseeable limits to 
confidentiality for this assessment include: 



I have read and agree with the nature and purpose of this assessment and to each of the 
points listed above. I have had an opportunity to clarify any questions and discuss any 
points of concern before signing. 



Patient Signature Date 



Parent/Guardian or Authorized Surrogate (if applicable) Date 



Witness Signature Date 




• Reduction of stress. The patient's anxiety should be 
alleviated as much as possible. The testing experience 
can have a significant impact on self-confidence. 
Reassurance should be provided routinely that most clients cannot
complete all items of a test and that on some tests (e.g., PAI,
MMPI-2) there are no "correct" answers. Most clients understand that a test
contains high-difficulty items to avoid a ceiling effect 
for very bright subjects. 

• Reduction of discomfort/fatigue. A comfortable testing 
environment should be provided, and the patient 
should be asked how it can be made more 
comfortable. More than 90% of clients in the Bennett- 
Levy study mentioned that it was too hot, too cold, or 
too noisy. Some patients complain of backache; the 
availability of a back support may alleviate these 
complaints. Children should be provided with 
appropriately sized desks and chairs. Adequate rest 
breaks and hot and cold refreshments should be 
provided. Usually, a break after 1½ hours of testing is
indicated. The patient should be offered the choice of 
a one-session assessment, or a split one. Seventy-two 
percent of clients in the Bennett-Levy study stated 
that, for a 3-hour assessment, they would have 
preferred two sessions instead of one. 

Cooperation/Effort 

In situations where the client is a "third party" (i.e., in forensic 
assessments where the psychologist works for the defendant 
or insurer, ostensibly "against" the client), the client should be 
assured that the aim of the assessment is to provide a valid 
picture of his or her strengths and weaknesses. The client 
should be encouraged to cooperate fully to avoid misleading 
results. 



Most patients who are referred for neuropsychological ex- 
aminations try to perform optimally, particularly if they 
understand the reasons for testing and that their efforts may 
improve their opportunities with regard to their treatment, 
job, or school performance. We rarely encounter clients who 
refuse outright to collaborate. In such cases, a repeated brief 
discussion of the purpose of the assessment and/or a break is 
indicated. If a patient refuses to collaborate on a given test, 
switching to a very different test may be an option. General 
refusal to cooperate, though extremely rare, should be ac- 
cepted by the examiner, who then discontinues the session. In 
such cases, offer the patient a face-saving way out of the situa- 
tion by assuring him or her that the tests can be scheduled at 
another time when the patient feels better or if he or she 
changes his or her mind. Test refusal in children may also in- 
dicate poor underlying skills (Mantynen et al., 2001). 

Some patients, however, do not display optimal effort. This 
may occur for a variety of reasons. For example, the person 
may be ill, may not understand the reasons for the assessment, 
or may gain (e.g., financially) from poor performance. A re- 
cent survey (Slick et al., 2004) revealed that experts rely on in- 
dicators of suboptimal performance from conventional tests 
and always give at least one symptom validity test, although 
the precise measure varies from one expert to another. There 
was no firm consensus on when the tests should be given or 
whether warnings at the outset of testing should be provided. 
Once suspicion is aroused, however, experts typically alter test 
routines. They administer additional symptom validity tests 
and encourage clients to give good effort. On occasion, they
discontinue the testing session entirely.

Testing of Children 

When testing children or adolescents, the establishment of a 
good rapport is especially important. As with adults, an atti- 
tude of acceptance, understanding, and respect for the patient 
is more important than entertaining tricks to establish and 
maintain rapport. Older children may think of the testing as 
similar to a school test situation, which may evoke fear of fail- 
ure and anxiety. The use of reinforcements such as stickers or 
tokens used in many schools may contribute to better coopera- 
tion. However, at present, there is little evidence that scores ob- 
tained using tangible or social reinforcement provide better 
estimates of children's abilities than those obtained under 
standard administration conditions (Sattler, 2001). It should 
also be noted that most tests were not standardized with the 
use of incentives. Therefore, their use should be reserved for 
exceptional circumstances (e.g., an extremely uncooperative 
child) and their use should be noted in the test protocol and 
report. Reassurance and encouragement for the effort may 
work better than praise for correct responses, because most 
tests progress from easy to difficult items, where failure is in- 
evitable (Sattler, 2001). Younger children and children with be- 
havior problems may try to manipulate the examiner by asking 
questions, refusing to answer, getting up, constantly asking for 




breaks, or even displaying open hostility or leaving the testing 
room. This can be avoided by changing to tasks with which the 
examinee is more comfortable. The examiner should keep in 
mind that the behavior of the child does not reflect on the ex- 
aminer, but shows a manner of "coping" the child may use in 
similar situations and that is a clinically relevant piece of infor- 
mation. The examiner should also note whether refusals to co- 
operate occur only with certain types of tasks (e.g., young 
children with aphasic disturbances may act out during verbally 
challenging tasks but show interest in nonverbal tasks). Open 
confrontation ("Do you want to do this, or should I send you 
back to your parents?") should be avoided. In cases of outright 
test refusal, although test scores may not be interpretable (i.e., 
scores could be poor due to refusal or to inability), the testing 
behavior itself will still be informative, particularly when taken 
in the context of other sources of information (e.g., parents' 
and teachers' questionnaires). It is important to note that it 
may be better to discontinue testing altogether in an uncoop- 
erative child than to obtain test data that may lead to erro- 
neous estimates of cognitive ability. Retesting can be attempted 
when the child is older or when the testing situation is less 
stressful for the child (e.g., following discharge from hospital 
or after family intervention). 

Testing Older Adults 

When evaluating older adults, it is critical that the examiner 
determine that the patient's vision and audition are sufficient 
for adequate task performance, and if not, to assist him or her 
to compensate for any losses (Lezak et al., 2004). Storandt
(1994) discusses characteristics of elderly patients that may 
affect the progress and the results of the assessment: for the 
overly loquacious, usually elderly adult who may frequently 
stray off-target and may relay long personal stories while be- 
ing tested, Storandt recommends a pleasant but businesslike 
attitude on the part of the examiner, deferring all discussions 
to the end of the session, explaining that during testing there 
is much to be covered. Again, the examiner should be aware 
that tangential and overly talkative behavior may be clinically 
significant and be evidence of possible executive or compre- 
hension difficulties. In contrast, the depressed or despondent 
patient may require considerable encouragement and pa- 
tience from the examiner (and also more time). In all cases, it 
is important that the examiner maintains a friendly, neutral 
attitude and be aware of countertransference issues (e.g., view-
ing the elderly patient as a parent, showing irritation toward a 
difficult child). 

Circadian arousal may impact performance, at least in 
older adults (May & Hasher, 1998; Paradee et al., 2005). There
is evidence that when older adults are tested at their nonopti- 
mal times (i.e., evening), processing is compromised. Thus, 
time of testing must be considered in the administration of 
tasks to allow appropriate assessment of behavior in both 
single-test and test-retest situations. It is also possible that 
normative data need to be reevaluated with testing time con- 
trolled (May & Hasher, 1998).



Test Modifications and Testing Patients With 
Special Needs or English as a Second Language 

With patients who have significant disabilities (e.g., problems 
with vision, audition, or the use of upper limbs), it is often 
necessary to modify standard testing procedures, either by al- 
lowing more time or by using modifications of the response 
mode (e.g., pointing rather than giving a verbal response). 
Most modifications invalidate the existing norms, although 
they may lead to valid inferences. Braden (2003) noted that 
the essential purpose of accommodations is to maintain as- 
sessment validity by decreasing (and hopefully eliminating) 
construct irrelevant variance. He noted that assessments are 
intended to measure certain constructs; however, the process 
may capture other, unintended constructs. For example, a low 
score on the Wechsler Performance Scale in a vision-impaired 
client confounds visual acuity with the construct intended to 
be measured (fluid intelligence). However, clinicians may un- 
derrepresent the construct of interest when they use methods 
that fail to capture adequately the intended construct. Thus, 
omitting the Performance Scale would reduce construct irrel- 
evant variance but may underrepresent the construct of intel- 
ligence (limiting it to an assessment of crystallized ability). 
The clinician might ensure representation of fluid abilities by 
including an orally administered test of fluid abilities (e.g., a 
verbal analogies test) along with the standard Wechsler verbal 
subtests. Accommodations must balance the need to reduce 
construct irrelevant variance with the simultaneous goal of 
maintaining construct representation, or they run the risk of 
invalidating the assessment results (Braden, 2003). Any ac-
commodations should be clearly noted on the test protocols 
as well as stated in the report, and any conclusions (including 
limitations in representing the construct) should be qualified 
appropriately (see Braden, 2003, and Sattler, 2001, for sugges- 
tions on test administration for visually impaired and other 
populations). 

The bilingual patient with English as a second language de- 
serves specific considerations. Even though the patient's En- 
glish may appear fully fluent, some first-language habits, such 
as silent counting and spelling in the first language, frequently 
persist and may invalidate some test results requiring these 
skills (e.g., digit repetition, recitation of the alphabet). Lan- 
guage preference and the amount and quality of education re- 
ceived within North America significantly impact cognitive 
performance, even when traditional demographic variables 
(e.g., age, level of education) are taken into account (Harris 
et al., 2003). If the examinee does not indicate a preference for 
English and is only a recent resident in North America, then 
his or her scores may be adversely affected, and as the individ- 
ual diverges from the standardization sample, the norms may 
become less meaningful (Harris et al., 2003; O'Bryant et al., 
2004). If the examination proceeds in English, then this should 
be noted in the report. 

Even more problems are posed by the patient with poor 
English and different sociocultural experiences; this may in- 
validate not only verbal tests, but also the so-called nonverbal 




and culture-free tests with complex instructions. Sociocul- 
tural effects have been reported even for seemingly nonverbal 
tasks, such as the TPT, Seashore Rhythm, Category Test, and 
Performance portions of the WAIS-III (e.g., Arnold et al., 
1994; Harris et al., 2003; Shuttleworth-Edwards et al., 2004b).
Of note, recent research suggests that the WAIS-III Digit 
Symbol-Incidental Learning optional procedure (Pairing and 
Free Recall) may be a relatively culture-independent task with 
utility as a neuropsychological screening instrument. 
Shuttleworth-Edwards et al. (2004a) gave the task to a South 
African sample stratified for ethnicity in association with lan- 
guage of origin (white English first language versus black 
African first language), education level (grade 12, graduate), 
and quality of education (advantaged, disadvantaged). They 
found no significant differences for ethnicity/language of ori- 
gin and level or quality of education. 

The psychologist may wish to resort to an interpreter, but 
interpreters are not skilled clinicians and good practice sug- 
gests that the patient be referred to a colleague who is fully 
fluent in that language (Artiola I Fortuny & Mullaney, 1998). 
In addition to limited clinical skills of interpreters, there is no 
way for the clinician to assess whether clinician and patient 
statements are in fact translated accurately, which seriously 
undermines any conclusions that are made from interview 
data. Inability to communicate directly makes it difficult to 
assess not only the patient's statements, but also the quality 
and fluency of the language produced, the modulations of 
phrasing, the mood, and the level of cooperation (Artiola I 
Fortuny & Mullaney, 1998). However, in the event of a highly 
uncommon language and the unavailability of a colleague 
from the patient's country or linguistic group, use of an inter- 
preter may be unavoidable. Artiola I Fortuny and Mullaney 
state that in such cases, a professional interpreter or an indi- 
vidual with advanced fluency should be used. Friends or rela- 
tives of the patient should not be used because of the 
potential to contaminate data collection in ways that the neu- 
ropsychologist cannot appreciate. It should be noted that a 
simple translation of English-language tests may also distort 
the results. In particular, translated verbal tests may not be 
equivalent to the original English version in terms of lexical 
difficulty, lexical frequency, linguistic correctness, and cultural 
relevance and thus invalidate the available norms. The use of 
foreign-language versions of existing tests (e.g., the Wechsler 
Scales are available in a large number of foreign adaptations; 
see Harris et al., 2003, for a recent listing) may be appropriate,
but this approach may also lead to problems: A test adapted 
and standardized for Spain or France may not contain items 
appropriate for Spanish-speaking Americans or for Quebec 
Francophones, nor would the published norms for such tests be 
valid. Fortunately, some local (e.g., Hispanic, French-Canadian) 
language versions have been developed, and these are listed 
together with foreign adaptations in the test descriptions of 
this book. In addition, a number of authors have published 
instruments with normative data applicable to Spanish- 
speaking populations (e.g., Artiola I Fortuny et al., 1999; 
Ostrosky-Solis et al., 1999), although issues of reliability and



validity (e.g., impact of racial socialization, sensitivity to neu- 
rological insult) require further study. For a more complete 
discussion of cross-cultural and minority concerns pertaining 
to assessment, see Artiola I Fortuny et al. (2005), Ferraro 
(2002), Gasquoine (2001), Manly et al. (2002), Ponton and 
Leon-Carrion (2001), and Shuttleworth-Edwards et al. (2004a, 
2004b), as well as Chapter 2 in this volume. 



REFERENCES 

Arnold, B. R., Montgomery, G. T., Castaneda, I., & Longoria, R. 
(1994). Acculturation and performance of Hispanics on selected 
Halstead-Reitan neuropsychological tests. Assessment, 1, 239-248. 

Artiola I Fortuny, L., & Mullaney, H. A. (1998). Assessing patients 
whose language you do not know: Can the absurd be ethical? The 
Clinical Neuropsychologist, 12, 113-126. 

Artiola I Fortuny, L., Romo, D. H., Heaton, R. K., & Pardee, R. E.
(1999). Manual de normas y procedimientos para la bateria
neuropsicologica en espanol. Lisse, the Netherlands: Swets &
Zeitlinger.

Artiola I Fortuny, L., Garola, M., Romo, D. H., Feldman, E., Barillas,
E., Keefe, R., Lemaitre, M. J., Martin, A. O., Mirsky, A., Monguio,
I., Morote, G., Parchment, S., Parchment, L. J., de Pena, E., Politis,
D. G., Sedo, M. A., Taussik, I., Valdivia, E., de Valdivia, L. S., &
Maestre, K. V. (2005). Research with Spanish-speaking popula-
tions in the United States: Lost in translation. A commentary and
a plea. Journal of Clinical and Experimental Neuropsychology, 27,
555-564.

Bauer, R. M. (2000). The flexible battery approach to neuropsycho- 
logical assessment. In R. D. Vanderploeg (Ed.), Clinician's guide to 
neuropsychological assessment (2nd ed., pp. 419-448). Mahwah, 
NJ: LEA. 

Bennett-Levy, J., Klein-Boonschate, M. A., Batchelor, J., McCarter, R.,
& Walton, N. (1994). Encounters with Anna Thompson: The con-
sumer's experience of neuropsychological assessments. The Clini-
cal Neuropsychologist, 8, 219-238.

Bleiberg, J., Kane, R. L., Reeves, D. L., Garmoe, W. S., & Halpern, E. 
(2000). Factor analysis of computerized and traditional tests used 
in mild brain injury research. The Clinical Neuropsychologist, 14, 
287-294. 

Braden, J. P. (2003). Accommodating clients with disabilities on the
WAIS-III and WMS-III. In D. S. Tulsky, D. H. Saklofske, G. J.
Chelune, R. K. Heaton, R. J. Ivnik, R. A. Bornstein, A. Prifitera, &
M. F. Ledbetter (Eds.), Clinical interpretation of the WAIS-III and
WMS-III. San Diego, CA: Academic Press.

Brandt, J., & van Gorp, W. G. (1999). American Academy of Clinical
Neuropsychology policy on the use of non-doctoral-level person-
nel in conducting clinical neuropsychological evaluations. The
Clinical Neuropsychologist, 13, 385.

Browndyke, J. N., Albert, A. L., Malone, W., Schatz, P., Paul, R. H.,
Cohen, R. A., Tucker, K. A., & Gouvier, W. D. (2002). Computer-
related anxiety: Examining the impact of technology-specific
affect on the performance of a computerized neuropsychological
assessment measure. Applied Neuropsychology, 4, 210-218.

Conder, R., Allen, L., & Cox, D. (1992). Computerized assessment of
response bias test manual. Durham, NC: Cognisyst.

Conners, C. K., & MHS Staff. (2000). Conners' Continuous Perfor- 
mance Test II (CPT II). Toronto, Ontario: Multi-Health Systems, 
Inc. 




Constantinou, M., Ashendorf, L., & McCaffrey, R. J. (2002). When the 
third party observer of a neuropsychological examination is an 
audio-recorder. The Clinical Neuropsychologist, 16, 407-412. 

Dikmen, S. S., Machamer, J. E., Winn, H. R., & Temkin, N. R. (1995). 
Neuropsychological outcome at 1-year post head injury. Neu- 
ropsychology, 9, 80-91. 

Erickson, R. C. (1995). A review and critique of the process approach 
in neuropsychological assessment. Neuropsychology Review, 5, 
223-243. 

Feldstein, S. N., Keller, F. R., Portman, R. E., Durham, R. L., Klebe,
K. J., & Davis, H. P. (1999). A comparison of computerized and
standard versions of the Wisconsin Card Sorting Test. The Clini-
cal Neuropsychologist, 13, 303-313.

Ferraro, F. R. (Ed.). (2002). Minority and cross-cultural aspects of neu- 
ropsychological assessment. Studies on neuropsychology, develop- 
ment, and cognition. Bristol, PA: Swets & Zeitlinger Publishers. 

French, C. C., & Beaumont, J. G. (1987). The reaction of psychiatric
patients to computerized assessment. British Journal of Clinical
Psychology, 26, 267-278.

Gasquoine, P. G. (2001). Research in clinical neuropsychology with 
Hispanic American participants: A review. The Clinical Neuropsy- 
chologist, 15, 2-12. 

Harris, J. G., Tulsky, D. S., & Schultheis, M. T. (2003). Assessment of
the non-native English speaker: Assimilating history and research
findings to guide clinical practice. In D. S. Tulsky, D. H. Saklofske,
G. J. Chelune, R. K. Heaton, R. J. Ivnik, R. Bornstein, A. Prifitera,
& M. F. Ledbetter (Eds.), Clinical interpretation of the WAIS-III
and WMS-III. New York: Academic Press.

Hebben, N., & Milberg, W. (2002). Essentials of neuropsychological as- 
sessment. New York: John Wiley & Sons. 

Hunsley, J. (2002). Psychological testing and psychological assess- 
ment: A closer examination. American Psychologist, 57, 139-140. 

Johnson-Greene, D. (2005). Informed consent in clinical neu-
ropsychology practice. Official statement of the National Acad-
emy of Neuropsychology. Archives of Clinical Neuropsychology,
20, 335-340.

Kaplan, E. (1988). A process approach to neuropsychological assess-
ment. In T. Boll & B. K. Bryant (Eds.), Clinical neuropsychology and
brain function: Research, measurement, and practice (pp. 127-167).
Washington, DC: American Psychological Association.

Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsy- 
chological assessment (4th ed.). New York: Oxford University 
Press. 

Luciana, M. (2003). Practitioner review: Computerized assessment of 
neuropsychological function in children: Clinical and research 
applications of the Cambridge Neuropsychological Testing Auto- 
mated Battery (CANTAB). Journal of Child Psychology and Psychi- 
atry, 44, 649-663. 

Lundin, K. A., & DeFilippis, N. A. (1999). Proposed schedule of usual
and customary test administration times. The Clinical Neuropsy-
chologist, 13, 433-436.

Manly, J. J., Jacobs, D. M., Touradji, P., Small, S. A., & Stern, Y. (2002).
Reading level attenuates differences in neuropsychological test
performance between African American and White elders. Jour-
nal of the International Neuropsychological Society, 8, 341-348.

Mantynen, H., Poikkeus, A. M., Ahonen, T., Aro, T., & Korkman, M.
(2001). Clinical significance of test refusal among young children.
Child Neuropsychology, 7, 241-250.

May, C. P., & Hasher, L. (1998). Synchrony effects in inhibitory con- 
trol over thought and action. Journal of Experimental Psychology: 
Human Perception and Performance, 24, 363-379. 



Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies,
R. R., Eisman, E. J., Kubiszyn, T. W., & Reed, G. M. (2001). Psy-
chological testing and psychological assessment: A review of evi-
dence and issues. American Psychologist, 56, 128-165.

Milberg, W. P., Hebben, N., & Kaplan, E. (1986). The Boston process
approach to neuropsychological assessment. In I. Grant &
K. M. Adams (Eds.), Neuropsychological assessment of neuropsychi-
atric disorders (pp. 65-86). New York: Oxford University Press.

Miller, L. S., & Rohling, M. L. (2001). A statistical interpretive 
method for neuropsychological test data. Neuropsychology Re- 
view, 11, 143-169. 

The NAN Policy and Planning Committee. (2000a). The use of neu- 
ropsychology test technicians in clinical practice: Official state- 
ment of the National Academy of Neuropsychology. Archives of 
Clinical Neuropsychology, 15, 381. 

The NAN Policy and Planning Committee. (2000b). Presence of 
third party observers during neuropsychological testing: Official 
statement of the National Academy of Neuropsychology. Archives 
of Clinical Neuropsychology, 15, 379. 

The NAN Policy and Planning Committee. (2000c). Test security: 
Official statement of the National Academy of Neuropsychology. 
Archives of Clinical Neuropsychology, 15, 383-386. 

O'Bryant, S. E., O'Jile, J. R., & McCaffrey, R. J. (2004). Reporting of de- 
mographic variables in neuropsychological research: Trends in the 
current literature. The Clinical Neuropsychologist, 18, 229-233. 

Oestreicher, J. M., & O'Donnell, J. P. (1995). Validation of the General
Neuropsychological Deficit Scale with nondisabled, learning-
disabled, and head-injured young adults. Archives of Clinical Neu-
ropsychology, 10, 185-191.

Ostrosky-Solis, F., Ardila, A., & Rosselli, M. (1999). NEUROPSI: A
brief neuropsychological test battery in Spanish with norms by
age and educational level. Journal of the International Neuropsy-
chological Society, 5, 413-433.

Ozonoff, S. (1995). Reliability and validity of the Wisconsin Card
Sorting Test in studies of autism. Neuropsychology, 9, 491-500.

Paradee, C. V., Rapport, L. J., Hanks, R. A., & Levy, J. A. (2005). Circa-
dian preference and cognitive functioning among rehabilitation
inpatients. The Clinical Neuropsychologist, 19, 55-72.

Ponton, M. O., & Leon-Carrion, J. (2001). Neuropsychology and the
Hispanic patient: A clinical handbook. Mahwah, NJ: Lawrence Erl-
baum Associates.

Poreh, A. M. (2000). The quantified process approach: An emerging 
methodology to neuropsychological assessment. The Clinical Neu- 
ropsychologist, 14, 212-222. 

Powell, G. E., & Wilson, S. L. (1994). Recovery curves for patients 
who have suffered very severe brain injury. Clinical Rehabilitation, 
8, 54-69. 

Prigatano, G. P. (2000). Neuropsychology, the patient's experience, 
and the political forces within our field. Archives of Clinical Neu- 
ropsychology, 15, 71-82. 

Reitan, R. M., & Wolfson, D. (1988). Traumatic brain injury: Recovery 
and rehabilitation. Tucson, AZ: Neuropsychology Press. 

Robbins, T., James, M., Owen, A., Sahakian, B., McInnes, L., & Rab-
bitt, P. (1994). The Cambridge Neuropsychological Test Auto-
mated Battery (CANTAB): A factor analytic study in a large
number of elderly volunteers. Dementia, 5, 266-281.

Rohling, M. L., Williamson, D. J., Miller, L. S., & Adams, R. L. (2003).
Using the Halstead-Reitan Battery to diagnose brain damage: A
comparison of the predictive power of traditional techniques to
Rohling's Interpretive Method. The Clinical Neuropsychologist, 17,
531-543.




Russell, E. W. (2000a). The cognitive-metric, fixed battery approach to 
neuropsychological assessment. In R. D. Vanderploeg (Ed.), Clini- 
cian's guide to neuropsychological assessment (2nd ed., pp. 449-481). 
Mahwah, NJ: LEA. 

Russell, E. W. (2000b). The application of computerized scoring pro- 
grams to neuropsychological assessment. In R. D. Vanderploeg 
(Ed.), Clinician's guide to neuropsychological assessment (2nd ed., 
pp. 483-515). Mahwah, NJ: LEA. 

Sattler, J. M. (2001). Assessment of children: Cognitive applications 
(4th ed.). San Diego: J. M. Sattler. 

Shuttleworth-Edwards, A. B., Donnelly, M. J. R., Reid, I., & Radloff, S. E. 
(2004a). A cross-cultural study with culture fair normative indi- 
cations on WAIS-III Digit Symbol-Incidental Learning. Journal of 
Clinical and Experimental Neuropsychology, 26, 921-932. 

Shuttleworth-Edwards, A. B., Kemp, R. D., Rust, A. L., Muirhead, 
G. G. L., Hartman, N. P., & Radloff, S. E. (2004b). Cross-cultural 
effects on IQ test performance: A review and preliminary norma- 
tive indications on WAIS-III test performance. Journal of Clinical 
and Experimental Neuropsychology, 26, 903-920. 

Slick, D., Hopp, G., Strauss, E., Fox, D., Pinch, D., & Stickgold, K. (1996).
Effects of prior testing with the WAIS-R NI on subsequent retest
with the WAIS-R. Archives of Clinical Neuropsychology, 11, 123-130.

Slick, D., Hopp, G., Strauss, E., & Thompson, G. B. (1997). Victoria 
Symptom Validity Test. Odessa, FL: PAR. 

Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detect-
ing malingering: A survey of experts' practices. Archives of Clinical
Neuropsychology, 19, 465-473.



Storandt, M. (1994). General principles of assessment of older adults. 
In M. Storandt & G. R. Vanden Bos (Eds.), Neuropsychological as- 
sessment of dementia and depression. Washington, DC: American 
Psychological Association. 

Sweet, J. J., Moberg, P. J., & Suchy, Y. (2000a). Ten-year follow-up sur-
vey of clinical neuropsychologists: Part I. Practices and beliefs.
The Clinical Neuropsychologist, 14, 18-37.

Sweet, J. J., Moberg, P. J., & Suchy, Y. (2000b). Ten-year follow-up sur-
vey of clinical neuropsychologists: Part II. Private practices and
economics. The Clinical Neuropsychologist, 14, 479-495.

Sweet, J. J., Peck, E. A., Abramowitz, C., & Etzweiler, S. (2002). National
Academy of Neuropsychology/Division 40 of the American Psy-
chological Association Practice Survey of clinical neuropsychology,
Part I: Practitioner and practice characteristics, professional activi-
ties, and time requirements. The Clinical Neuropsychologist, 16,
109-127.

Turner, S. M., DeMers, S. T., Fox, H. R., & Reed, G. M. (2001). APA's
guidelines for test user qualifications: An executive summary.
American Psychologist, 56, 1099-1113.

Van Schijndel, F. A. A., & van der Vlugt, H. (1992). Equivalence 
between classical neuropsychological tests and their com- 
puter version: Four neuropsychological tests put to the test. 
Journal of Clinical and Experimental Neuropsychology, 14, 45 
(abstract). 

Zhu, J., & Tulsky, D. S. (2000). Co-norming the WAIS-III and WMS- 
III: Is there a test-order effect on IQ and memory scores? The 
Clinical Neuropsychologist, 14, 461-467. 



Report Writing and Feedback Sessions 



THE NEUROPSYCHOLOGICAL REPORT 

Ultimately, the purpose of the neuropsychological assessment 
is to answer the referral question; practically speaking, the as- 
sessment serves to meet the needs of the referring party while 
helping the patient. In all cases, a report is prepared that re- 
flects background, test findings, observations, and recommen- 
dations. Neuropsychological assessment reports vary greatly in 
format, content, and language; no fixed format is appropriate 
for all purposes. Nevertheless, the steps in preparing the re- 
port are often similar across settings; these will be outlined in 
this chapter. Additional guidelines on report writing for the 
neuropsychologist can be found in Axelrod (2000), Baron 
(2004), Hebben and Milberg (2002), and Williams and Boll 
(2000). Sample neuropsychological reports are presented in 
these texts as well as in Donders (1999) and other sources. For 
general guides on report writing, see Ownby (1997), Sattler 
(2001), and Tallent (1993). This chapter will cover style, con- 
tent, confidentiality, computer use, and other relevant issues 
in report writing. 

Wording 

Wording should be kept as clear and simple as possible regardless of
who the recipient of the report is. In particular,
psychological jargon and acronyms should be avoided and 
technical terms, if necessary, should be explained. When 
choosing wording, it must be kept in mind that in many cases 
the patient or family will read the report (see Feedback Ses- 
sion). In addition, the report is likely to be read by many other 
individuals other than the referring party and will likely re- 
main on file for several years to come (e.g., as part of a hospi- 
tal chart or permanent school record). Consequently, wording 
must be carefully and deliberately chosen, with an avoidance 
of unsubstantiated statements, wording that minimizes the 
individual, and terms with unnecessarily negative connota- 
tions (i.e., the patient should not be described as "a 43-year- 
old hemiplegic," but rather as "a 43-year-old man with 



hemiplegia"). The patient should be referred to by name 
rather than as "the patient" (Hebben & Milberg, 2002). A clin- 
ician's thesaurus for wording psychological reports (American 
Psychological Association [APA], 1997; Zuckerman, 1995) 
and dictionary of neuropsychological terms (Loring, 1999) 
may be useful for choosing appropriate, precise, and compre- 
hensible wording. 

Style and Length 

The report should be problem-oriented and should clearly 
answer the referral question. Depending on the recipient, the 
style and length of reports will vary and can be as informal as 
letters, as brief as a consultation form, or as comprehensive as 
a narrative report (Axelrod, 2000). The length of a report also 
depends on the purpose and on the complexity of the find- 
ings. However, the report should be kept as brief as possible 
by avoiding irrelevant or redundant information. Duplication 
of information readily available from other sources should be 
avoided. This only adds bulk to the report and wastes both the 
writer's and the reader's time. The guiding principle for inclu- 
sion should be whether the information is relevant to the pur- 
poses of the report and whether it contributes to a clearer 
understanding of the current test findings, interpretations, 
and recommendations. 

Ideally, the report should be written in short sentences to 
maximize clarity. The report should also "contain enough in- 
formation so an educated lay person will be able to grasp the 
major ideas, conclusions, and recommendations" (Axelrod, 
2000, p. 247). However, too much information, particularly 
when paired with excessive verbosity, is aversive to readers. 
Further, lengthy and wordy reports are less likely to be read in 
full by the recipient: a busy physician may read no more than 
the summary statement. Brevity can be maximized by cover- 
ing normal-range test results in a single sentence (e.g., "All 
other test results were in the normal range"; "Motor and sen- 
sory testing showed average results without significant side 
differences"). 






In some settings, a single-page report may be perfectly ade- 
quate. A recent survey by Donders (2001) found a mean 
length of seven pages for neuropsychological reports. Those 
surveyed included clinicians whose reports routinely were 
only one page long while others regularly prepared reports of 
30 pages or more. Donders (1999) pleaded for brevity in re- 
ports dealing with children; he provides an example report 
that is only one to two pages in length. 

Privacy and Confidential Information 

Information that is valid but not relevant to the referral ques- 
tion (e.g., dental history) should be omitted, unless it is perti- 
nent to test findings. Omitting irrelevant information 
throughout the report is also mandated by principle 4.04 of 
the APA Ethical Principles of Psychologists and Code of Con- 
duct (2002). According to this principle, the psychologist 
must not include statements that are not germane to the eval- 
uation or that are an undue invasion of privacy. Similarly, 
caution should be employed when including confidential or 
negative information that involves a third party (e.g., ex- 
spouse, parent, sibling). Further, reports with identifiable ref- 
erences to a third party should not be released without that 
person's consent. Unless the information is absolutely essen- 
tial to the referral question and has been verified, third-party 
information should be avoided as much as possible. In cases 
where references to third parties are important for the referral 
question, the information should be written in a way that 
does not identify individuals without their consent (e.g., 
"There is a family history of bipolar disorder in first-degree 
relatives," rather than "The client's brother John has bipolar 
disorder"). In many cases, information on third parties is only 
marginally relevant to the referral question; in all cases, inclu- 
sion of third-party information should be done with caution. 

Content 

Most neuropsychological assessment reports contain certain 
basic information. These sections can either be formally iden- 
tified subheadings or be only briefly touched upon, depend- 
ing on the format of the report. See Table 5-1 for an example. 

Identifying Information 

To avoid confusion, the full name, birth date, age, date of test- 
ing, date of the report, and referral source should be listed, 
along with any other crucial identifying information such as 
chart number. Whether the report is written on a standard 
form provided by the hospital or agency or in letter form, this 
information is best listed separately at the beginning of the re- 
port. 

Reason for Referral 

It is essential to state the reason for referral at the beginning of 
the report. This serves to focus the report and to clarify why 



Table 5-1 Organization of Information in the Neuropsychological Report

Report Sections

Identifying information
Reason for referral
Relevant history
Review of relevant previous reports
Current concerns
Report of informant(s)
Observations during history taking and testing
Test results
    General intellectual status
    Achievement
    Executive function
    Attention
    Memory
    Language
    Visual-spatial skills
    Motor function
    Sensory function
    Psychosocial functioning
Summary and opinion
Recommendations
Appendix: Tests administered



the evaluation was conducted and who requested it. Since the 
reason for referral often guides the selection of tests, citing 
this information helps clarify why certain tests were given as 
well as the rationale for the formulation and recommenda- 
tions. A sentence or two confirms that the neuropsychologist 
has understood the request and will address relevant ques- 
tions in the body of the report (e.g., "The patient was referred 
because of possible cognitive decline"; "The patient was re- 
ferred because of learning and memory problems"). This does 
not preclude addressing other important issues in the report. 
Some referrals are made without a clear statement of pur- 
pose (e.g., "request neuropsychological assessment"; "query 
organicity"). In some cases, the actual reason for referral is ev- 
ident after review of pertinent records. In other cases, the re- 
ferring party may need to clarify the reason for the referral. A 
referral form that lists several types of evaluations can be used 
(e.g., "Diagnostic Evaluation"; "Follow-Up Evaluation"; "Post- 
surgical Evaluation"; "Screening Evaluation"), as this helps to 
focus the reason for referral and provides referring parties 
with information on other services provided by the neuropsy- 
chologist. 

Relevant History 

The history section sets the stage for the interpretation of test 
results and provides the context for conclusions and recom- 
mendations. This section is based on history taking as outlined 
in Chapter 3 and typically includes information regarding the 
examinee's relevant personal and family medical history (in- 
cluding birth history, developmental milestones), educational 




attainment and school history, occupational history, alcohol 
and drug use, legal history, family and living situation, and in- 
terpersonal relationships. As noted previously, any informa- 
tion listed here will be relevant and germane to the assessment 
itself. 

The history section contains information that bears di- 
rectly on the interpretation of results. For example, in demen- 
tia evaluations, any genetic contributions can be outlined by a 
description of family medical history, and the description of 
cognitive complaints contributes to an understanding of the 
course of the disorder (e.g., abrupt or insidious onset). This 
section also provides clues to premorbid functioning by de- 
scribing highlights of educational and occupational achieve- 
ment. It also contributes to an understanding of the impact of 
the disorder on the patient's social and occupational situa- 
tion. The extent to which particular aspects are emphasized 
over others will depend on the type of evaluation conducted. 
For instance, if the assessment was requested to determine 
whether the client suffered traumatic brain injury, a detailed 
report of loss of consciousness and posttraumatic amnesia 
would be crucial since this information will have bearing on 
the diagnosis and prognosis. However, this information might 
not be reviewed with the same amount of detail if the same 
patient were later seen to determine the presence of a specific 
type of learning disability. 

Review of Relevant Previous Reports 

Specific sources of information, in addition to information 
gleaned in the interview, will include medical reports (e.g., 
neurological reports, MRI, CT, EEG), school reports, and em-
ployment records (see also Chapter 3). Treatment informa- 
tion, including current medications and ongoing mental 
health interventions (e.g., therapy/counseling), should be in- 
cluded, and prior testing, including prior psychological or 
neuropsychological evaluations, should be summarized. Sny- 
der and Ceravolo (1998) provide steps to efficiently retrieve 
all the relevant information from medical charts, a task that 
can be time-consuming unless approached strategically. Mc- 
Connell (1998) provides assistance for understanding results 
from laboratory tests commonly found in hospital charts that 
may have implications for neuropsychologists (e.g., hemato- 
logic and endocrinologic tests). 

The client's background history is often obtained through 
a mix of self-report and information gleaned from other rec- 
ords. It is therefore important to differentiate the source of 
the information. For instance, a sentence such as "Mr. Smith 
reported that he was depressed as an adolescent" is not as in- 
formative for the reader as "Mr. Smith was diagnosed with de- 
pression in adolescence by Dr. Jones." Such seemingly minor 
changes in wording may influence the interpretation of re- 
sults. The source of the information obtained should be indi- 
cated clearly, along with the date. Specifying the source can 
serve to inform the reader about the accuracy of the informa- 
tion as well as indicate any potential for bias (Hebben & Mil- 
berg, 2002). Again, detailed repetition of information already 



available to the reader is not helpful. For instance, in some 
medical settings such as inpatient wards, only the most cur- 
sory review of neurological history may be required because 
the information is already fully available to the treatment 
team. 

Current Concerns 

It is important to include a description of the examinee's 
complaints and concerns (or of those of spouses or parents, in 
cases of low-functioning adults or children; see below). In ad- 
dition to physical and cognitive concerns, this section should 
also include information about the patient's emotional state 
(e.g., stress, anxiety, depression) and the impact of symptoms 
or complaints on daily living, since this may affect the inter- 
pretation of the test results as well as the recommendations. 
The history of presenting problems should include a descrip- 
tion of current complaints in terms of severity, pervasiveness, 
duration, and onset (Hebben & Milberg, 2002). In many 
cases, each area (physical, cognitive, emotional) will be specif- 
ically reviewed (e.g., "In terms of physical complaints, the pa- 
tient reported nausea, dizziness, etc."). Children and patients 
with limited insight should also be queried about their cur- 
rent concerns, albeit in a briefer and simpler form. 

The patient's current concerns may be quite different from 
those of the referral source. Often, this discrepancy directly af- 
fects the interpretation of test results, which underlines the 
value of conducting a thorough interview with the patient 
and not relying solely on test scores for interpretation. In 
some cases, a discrepancy between the referring party and the 
patient is due to the patient's impaired insight secondary to a 
neurological condition (e.g., dementia). In others, the dis- 
crepancy suggests that the difficulties that prompted the refer- 
ral are due to other factors altogether (e.g., a patient referred 
for a dementia workup due to poor work performance denies 
cognitive difficulties but reveals a complicated grief reaction 
after a spouse's death). In the case of children, discrepancies 
between the referring party and parents' concerns are com- 
monplace and differ depending on the age of the child. 
Awareness and sensitivity about issues important to the child 
or adolescent, even when these differ markedly from those of 
the adults involved, ensures better cooperation with any sub- 
sequent treatment recommendations. This may also apply to 
adults who have significant cognitive impairments and whose 
concerns differ from those of their caregivers. 

Report of Informant(s) 

This section includes information provided by other infor- 
mants such as a spouse, relative, teacher, or employer. Al- 
though informants such as parents and teachers are routinely 
polled in evaluations of children, informant information is a 
valuable component of the evaluation of adults, particularly 
in cases where adult self-report may not be accurate, either 
because of neurological disorder affecting insight (e.g., TBI) 
or because of outside factors such as litigation. In many cases, 




informant report is essential for diagnosis to determine whether 
a condition is situation-specific or pervasive (e.g., ADHD). In 
most cases, information from informants may highlight addi- 
tional concerns and symptoms as well as uncover examinee 
statements that are misleading with regard to both the gravity 
of the symptoms and the time course of their evolution, 
which is of interest from a clinical standpoint. In children and 
in adults with limited insight, this section may be greatly ex- 
panded compared with the previous section. Specific forms 
have been developed to document informant and client re- 
ports (see Chapter 3). 

Observations During History Taking and Testing 

Since testing frequently extends over several hours or days, 
the patient's behavior during that period provides valuable 
information about day-to-day functioning. Competence as a 
historian, personal appearance, punctuality, cooperativeness, 
rapport with the examiner, approach to novel or routine 
tasks, comprehension of instructions, response to encourage- 
ment, reaction to failure, and degree of effort should all be 
evaluated by the examiner. The behavior at the beginning and 
toward the end of the session or the effect of breaks during 
testing on subsequent motivation allows an estimate of persis- 
tence, fatigue, speed of response, and emotional control. In 
fact, substantial changes in behavior during the course of the 
assessment should be carefully documented because they may 
affect the validity of the test results. Any concerns that assess- 
ment findings may not be reliable or valid indicators of an in- 
dividual's ability should be stated clearly in this section along 
with reasons for the concerns (e.g., "The test results may un- 
derestimate the client's abilities because he had a severe cold on 
the day of testing" or "Concern about her motivation to perform
well emerged on the tests described below. Accordingly, the test 
results likely do not provide an accurate indication of her current 
strengths and weaknesses"). Again, this section of the report 
should be kept to a minimum of relevant details. There is no 
need to overload the report with lengthy observations not 
pertinent to the purpose of the assessment. 

In many cases, the specific behaviors listed will differ de- 
pending on the setting. For instance, evaluations of examinees 
in psychiatric settings may include lengthy observations on 
reality testing, thought content, and coherence of verbal ex- 
pression. These would not be addressed in detail in other set- 
tings, where the focus may be on distractibility, overactivity, 
or inattentiveness during testing (e.g., ADHD workup) or on 
effort, consistency of self-reported complaints versus test per-
formance, and pain behaviors (e.g., medico-legal TBI evalua- 
tion). 

In some cases, neuropsychology reports include a section 
on the validity of test findings; in others, this consists of a de- 
tailed, stand-alone section (e.g., medico-legal reports). In 
other cases, the entire report may be prefaced with a qualify- 
ing statement that specifies that the results are time-limited 
and may no longer be applicable after a certain period of 
time. This is particularly true in pediatric reports due to de- 



velopmental changes, but may also be the case in settings 
where patients are seen while in an acute state (e.g., inpatient 
TBI or in the postoperative period after neurosurgery). Other 
general caveats can also be detailed, such as the relevance of 
norms for the particular individual and the degree to which the
results provide an adequate reflection of the individual's ability,
given the individual's perceived effort and the degree to which
testing conditions were optimal.

Test Results 

The section on test results is the most technical aspect of the 
report. While the tendency may be to describe scores test by 
test, ideally, the information is organized into logical do- 
mains, and results are presented in terms of performance 
(e.g., "Sustained attention skills were significantly below age 
expectations") rather than in terms of the test (e.g., "The pa- 
tient's score on a computerized test requiring her to respond 
to the letter X when presented with multiple letter trials over 
time was poor"). 

The domains covered (see Table 5-1) may vary depending 
on the purpose of the assessment. Noncontributory results 
may be omitted or summed up briefly. Other formats exist as 
well. For example, Donders (1999) advocates the use of a brief 
report that summarizes the entire results in one paragraph 
(devoid of scores), describing results in a way that is easily in- 
terpretable by a lay reader. Other settings also demand a more 
general summary of results without a domain-by-domain 
explanation (e.g., inpatient medical chart entries, screening 
evaluations). 

Many outpatient reports include separate sections for each 
domain (with subtitles). We use a variety of formats, depend- 
ing on the referral question; in some cases, a one- or two- 
paragraph summary of results is sufficient to integrate all the 
neuropsychological data. In longer reports, topic sentences in- 
tegrate the information from different tests that are relevant to 
interpreting the examinee's functioning within that domain. 
This is followed by a listing of the test data that led to the inter- 
pretation, with supporting data (usually described in per- 
centile form) and references to the relevant tests. For example: 

Verbal comprehension appeared intact. She was able to 
follow complex commands (Token Test — within 
average range). By contrast, expressive functions were 
poor. Her ability to generate words on command 
within a fixed time period (Verbal Fluency) was low, 
below the 10th percentile. She also had considerable 
difficulty naming pictured objects (Boston Naming 
Test), her score falling in the impaired range, below the 
5th percentile. Some paraphasias [substitution of 
sounds] were apparent on naming tasks (e.g., acorn — 
aircorn). The provision of phonemic cues facilitated 
naming. 

It is also useful to keep in mind that most tests provide re- 
sults for several different domains. For example, the WAIS-III 
subtest scores need not all be referenced under "Intellectual 




Ability." Rather, Digit Span may be considered under "Atten- 
tion" or "Working Memory," Vocabulary under "Language," 
Block Design under "Visual- Spatial Ability," etc. Similarly, the 
Trail Making Test may be considered under "Executive Func- 
tion," "Attention," and "Visuomotor Ability." 

Special mention should be made regarding validity indices. 
There is growing concern in the medical-legal context that in- 
dividuals may familiarize themselves with neuropsychological 
tests to manipulate the test findings for gain (e.g., to evade 
detection in the case of malingering). Accordingly, names of 
tests of motivational status can be provided in the list of tests 
administered, but it should not be stated in the body of the re- 
port that these tasks are measures of effort (see also Chapter 
16, Assessment of Response Bias and Suboptimal Performance). 
That is, comments on the validity of test data should be made 
without naming any specific test (Green, 2003). For example: 

The patient obtained very low scores on a number of 
memory tests that are objectively very easy. While such 
low scores can occur in severely demented individuals, 
they rarely if ever occur in normal individuals or in people suffering from mild brain injuries. These 
findings raise significant concerns regarding the validity 
of the test results. 

Alternatively: 

The patient was given some tasks that are relatively 
insensitive to severe brain injury, but that can be 
greatly affected by effort. Her performance was at a 
level that is rarely seen among non-compensation- 
seeking patients with documented significant brain 
damage and indicates suboptimal effort. 

It is important to note areas of strength as well as weak- 
nesses. Strengths in certain cognitive or other areas provide 
the main basis for intervention strategies. Crosson (2000) 
notes that a deficit-centered approach can have a negative im- 
pact when providing feedback to the patient and family. This 
is also true with regard to report writing. Again, the discus- 
sion of strengths and weaknesses refers to functional domains 
and not to tests. In cases where few strengths can be found on 
objective testing (i.e., severe developmental delay), the indi- 
vidual's personal assets can be highlighted (e.g., "Despite 
these cognitive limitations, Sarah is an engaging, sociable 
girl"). 

Describing Test Scores 

The question of whether to include raw data and standard or 
other scores within reports is controversial. Naugle and Mc- 
Sweeny (1995) point out that the practice of including raw 
data and scores such as IQs may lead to misinterpretations and 
contravenes the Ethical Principles of Psychologists, which state 
that psychological data should not be released to individuals 
unqualified to interpret them. By contrast, the routine report- 
ing of raw scores is recommended by authors like Freides 



(1993, 1995), Mitrushina et al. (2005), and Tallent (1993). Both 
Matarazzo (1995) and Freides (1995) argue that it is illogical to 
restrict access to scores when conclusions and interpretations 
of these scores, which are equally sensitive, are presented in re- 
ports. Others have argued that the practice of deliberately 
omitting important and detailed information such as IQ 
scores in reports goes against the accepted practice of all other 
professionals such as physicians, who freely provide the infor- 
mation gathered during their assessments to other health pro- 
fessionals. For example, Lees-Haley and Courtney (2000) 
argue that the current practice of restricting the disclosure of tests and raw test data to the courts undermines psychologists' credibility and is contrary to the best interests of consumers. Routinely 
omitting scores also makes interpretations impossible to verify. 
In this vein, Hebben and Milberg (2002) argue that test scores 
are the only common referent that will be used by future read- 
ers of the report and that labels such as "average" or "below av- 
erage" are not precise and may refer to different score ranges 
depending on the individual clinician. 

The practice of including scores in reports also allows for 
more precise information to be conveyed to the reader and 
permits the next examiner to measure change more accu- 
rately. Hebben and Milberg recommend providing actual test 
scores in standard form in the body of the report or a sum- 
mary table, with specification of the norms used and raw 
score, if several norms exist for a test (e.g., Boston Naming 
Test). A similar approach is taken by Donders (1999), who ap- 
pends a list of test scores in standard or in raw score format to 
the report, along with the corresponding normative means 
and standard deviations. Most neuropsychologists include 
scores in their reports (Donders, 2001). According to a survey 
of Division 40 members, raw scores are rarely reported in the 
body of the report, although there are some exceptions (e.g., 
the number of categories achieved on the WCST). Age and 
grade equivalents are also rarely mentioned, consistent with 
the significant psychometric problems associated with such 
scores (e.g., the lack of equal distances between measurement 
points and the gross exaggeration of small performance dif- 
ferences) (Donders, 2001). Most neuropsychologists express 
scores in percentile ranks or standard scores (e.g., z or T 
scores; Donders, 2001), as described in Chapter 1. 
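
For readers less familiar with these metrics, the conversions are straightforward when scores are approximately normally distributed. The following brief Python sketch, which uses the SciPy library and an invented normative mean and standard deviation purely for illustration (it is not part of any published scoring program), shows how a single raw score maps onto z, T, and percentile values:

    # Illustration only: the normative mean and SD below are hypothetical.
    from scipy.stats import norm

    def standard_scores(raw, norm_mean=50.0, norm_sd=10.0):
        """Convert a raw score to z, T, and percentile rank (normal model)."""
        z = (raw - norm_mean) / norm_sd      # z score: mean 0, SD 1
        t = 50 + 10 * z                      # T score: mean 50, SD 10
        percentile = norm.cdf(z) * 100       # percent scoring at or below
        return z, t, percentile

    z, t, pct = standard_scores(42)
    print(f"z = {z:.2f}, T = {t:.1f}, percentile = {pct:.0f}")
    # prints: z = -0.80, T = 42.0, percentile = 21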

Few lay individuals understand the meaning of test scores, 
including percentiles. Bowman (2002) found that third-year 
undergraduates in a psychometrics course grossly underesti- 
mated below-normal percentile values and overestimated 
above-average percentile values when asked to interpret them 
in terms of corresponding IQ values. Thus, the communica- 
tion of scores is facilitated by the use of descriptors of ranges 
of ability. Commonly accepted classification systems of ability 
levels are provided by Wechsler (1997; see Table 5-2) and by 
Heaton et al. (2004; Table 5-3). 
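
As a rough illustration of how such a classification system can be applied consistently, the following Python sketch assigns the Heaton et al. (2004) descriptors from Table 5-3 on the basis of a T score; the function name and structure are ours and are not part of any published scoring software:

    def heaton_descriptor(t_score):
        """Return the Table 5-3 descriptor whose lower limit the T score meets."""
        bands = [
            (55, "above average"),
            (45, "average"),
            (40, "below average"),
            (35, "mildly impaired"),
            (30, "mildly to moderately impaired"),
            (25, "moderately impaired"),
            (20, "moderately to severely impaired"),
        ]
        for lower_limit, label in bands:
            if t_score >= lower_limit:
                return label
        return "severely impaired"           # T scores of 0-19

    print(heaton_descriptor(37))             # prints: mildly impaired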

Unless the recipient of the report can be expected to be 
fully familiar with the system used, we recommend that the 
test results section be prefaced with an explanation of the 
metric used. For example: 






Table 5-2 Classification/Descriptors of Test Scores Using the Wechsler System

Classification    IQ              z Score           T Score         Percent     Lower Limit of
                                                                    Included    Percentile Range
Very superior     130 and above   +2 and above      70+             2.2         98
Superior          120-129         1.3 to 2          63-69           6.7         91
High average      110-119         0.6 to 1.3        56-62           16.1        75
Average           90-109          ±0.6              44-55           50.0        25
Low average       80-89           -0.6 to -1.3      37-43           16.1        9
Borderline        70-79           -1.3 to -2.0      30-36           6.7         2
Extremely low     69 and below    -2.0 and below    29 and below    2.2         —

Source: Based on WAIS-III (Wechsler, 1997) description system.



The following description of Ms. A's abilities is based 
on her performance in comparison to same-aged peers. 
A percentile ranking refers to the percentage of people 
in her age group who would be expected to score equal 
to or below her on that particular measure. Thus, a 
score of 50, falling at the 60th percentile, would mean 
that 40% of her peers obtained higher scores while 
60% obtained equal or lower scores. Test scores that are better 
than 75% to 84% of individuals with the same 
background are considered to be above average. Scores 
that fall within the 25th to 74th percentile are 
considered to be within the average range. Scores that 
are within the 9th to 24th percentile are termed low 
average. Scores that fall within the 2nd to 8th percentile 
are borderline or mildly impaired, while scores below 
this range are considered to be extremely low or 
moderately/severely impaired. 

One drawback of percentile-rank reporting is the pseudoac- 
curacy implied in such scores. Basically, a percentile rank expresses the proportion of the normal distribution falling at or below a given score, indexed in standard deviation units. One-half of a stan-
dard deviation from the mean changes the percentile ranks 
from 50 to 69, a seemingly large difference that is in most 
cases clinically insignificant. With computer calculations, 
scores that differ by even small fractions of a standard devia- 
tion can be translated into percentile points that may seem to 
reflect real differences to the reader unfamiliar with basic psy- 
chometrics. 
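
A quick computational check of these figures, assuming normally distributed scores, makes the point; the values below also anticipate the Wechsler scaled-score example that follows, where scaled scores of 9 and 11 lie roughly one-third of a standard deviation below and above the mean:

    # Assumes normally distributed scores; percentile = 100 * Phi(z).
    from scipy.stats import norm

    for z in (0.0, 0.5, -1/3, 1/3):
        print(f"z = {z:+.2f}  ->  percentile {norm.cdf(z) * 100:.0f}")
    # z = +0.00  ->  percentile 50
    # z = +0.50  ->  percentile 69
    # z = -0.33  ->  percentile 37
    # z = +0.33  ->  percentile 63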

Lezak et al. (2004) provide an illustrative example of this 
phenomenon. Lay individuals who see a Wechsler report of a 
Similarities score at the 37th percentile (scaled score of 9) and 
an Arithmetic score at the 63rd percentile (scaled score of 11) 
are likely to conclude erroneously that the client performs 
better in Arithmetic than in verbal reasoning. However, score 
differences of this magnitude are chance variations, and the 
individual's performance in these two areas is best viewed as 
equivalent. 

Some test scores are not normally distributed, and norms 
for these tests allow only a very limited range of scores (see 
Chapter 1). When most individuals succeed on the majority 
of items (e.g., Boston Naming Test, Hooper, Rey Complex 



Figure, RBANS, some WMS-III subtests), the distribution is 
negatively skewed and variability of scores falling within the 
normal or above-normal range is highly limited. The test has 
its highest discriminative power at the lower end of ability 
levels (Mitrushina et al., 2005). In the case where test items 
present difficulty for most of the subjects (e.g., Raven's Ad- 
vanced Progressive Matrices), the score distribution is posi- 
tively skewed and variability within the lower ranges is highly 
limited. Such a task would be most appropriate for the selec- 
tion of a few outstanding individuals from a larger sample 
(Mitrushina et al., 2005). In either case, the use of standard 
scores (e.g., z scores, T scores) is not advised. In such cases, it 
is more appropriate to describe the test results in terms of a 
cutoff of frequency of occurrence (e.g., at the 5th percentile; 
Lezak et al., 2004). 
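
Where norms take the form of a raw-score distribution rather than a normal curve, the percentile can be read directly from the normative sample. A minimal sketch, using invented normative data for a negatively skewed task (most healthy examinees score near ceiling), might look as follows:

    from scipy.stats import percentileofscore

    # Invented normative scores, for illustration only.
    normative_scores = [60, 60, 59, 59, 58, 58, 57, 55, 52, 44]
    patient_raw = 52

    pct = percentileofscore(normative_scores, patient_raw, kind="weak")
    print(f"Raw score {patient_raw} falls at about the {pct:.0f}th percentile")
    # prints: Raw score 52 falls at about the 20th percentile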

The interpretation of scores described according to impair- 
ment classifications must consider the premorbid abilities of 
the examinee. "Average" or "normal range" scores in a previ- 
ously gifted individual may very well indicate considerable loss 
of abilities. On the other hand, for a person with borderline 
premorbid intelligence, impairment may only be inferred from 
scores two or three standard deviations below the mean. Refer- 
ence to indicators of premorbid functioning and to demo- 
graphically corrected scores is required for the interpretation 
of scores for people with nonaverage premorbid abilities 



Table 5-3 Classification/Descriptors of Test Scores Using the Heaton et al. System

Performance Range   Classification        T-Score Range   Lower Limit of Percentile Range
Normal              Above average         55+             68
                    Average               45-54           31
                    Below average         40-44           16
Impaired            Mild                  35-39           7
                    Mild to moderate      30-34           2
                    Moderate              25-29           <1
                    Moderate to severe    20-24           —
                    Severe                0-19            —

Source: Heaton et al., 2004.






(Bell & Roper, 1998; Heaton et al., 2003, 2004; Larrabee, 2000a; 
Tremont, 1998). 

It is also important to bear in mind that there are relation- 
ships among tests due to shared common factors (Larrabee, 
2000a). Each score must be interpreted in the context of 
scores on other tests of related abilities. Further, the findings 
must make neuropsychological sense; that is, they must be 
meaningful in terms of the patient's history and the suspected 
disorder (Lezak et al., 2004). Note, too, that a single "deviant" 
score may be the result of one of several error sources: misun- 
derstanding of instructions, inattention, distraction, momen- 
tary lapse of effort, etc. Thus, it is not at all uncommon for 
healthy people to show isolated impairment on one or two 
tests in the course of an assessment (Heaton et al., 2004; In- 
graham & Aiken, 1996; Taylor & Heaton, 2001). However, 
consistent impairment within specific cognitive domains is 
relatively unusual (Palmer et al., 1998). 

Norms 

When different norms are available for a test, the clinician 
may want to specify the norm set that was used to derive 
scores (either in parentheses in the body of the report or in an 
appendix of test scores). This information can be particularly 
helpful given that patients are often assessed multiple times. 
See Chapter 2 for an extended discussion of norms selection 
in neuropsychological assessment. 



Test Adaptations and Deviations from 
Standard Administration 

Test adaptations for the client with visual, auditory, or motor 
impairments, and any test modifications used for testing 
clients for whom English is a second language as discussed in 
Chapter 4, should be clearly stated, and restrictions on the use 
of published norms and consequent limitations in interpreta- 
tion should be explained in the report. 



Summary, Opinion, and Recommendations 

A summary is often provided in point form. It includes a brief 
restatement of the client's history, the major neuropsychologi- 
cal findings, and a diagnostic statement that includes the pre- 
sumed origin of the deficit. A prognostic statement is also 
expected. For example: 

1. This is a 54-year-old right-handed accountant with no 
previous history of neurological disorder. 

2. She was in a motor vehicle accident on Dec. 12, 2000, 
and suffered a severe head injury; she was comatose for 
two weeks and had significantly abnormal findings on 
neurological examination and imaging. 

3. Neuropsychological findings reveal significant deficits 
in memory and slowed processing speed. 

4. There are no concurrent conditions to account for 
these cognitive difficulties (i.e., she does not have 



significant depression, pain, or PTSD). In addition, she 
appeared to put forth maximum effort on the tests 
administered. 

5. It is therefore likely that she suffered significant 
neuropsychological compromise as a result of her 
accident. 

6. Given the time since her injury, further significant 
improvement is unlikely. 

Alternatively, this part of the report can be divided into 
relevant sections of interest to different readers. For example, 
the report can include a "Neurological Implications" or "Med- 
ical Implications" section written primarily for medical staff 
including likely etiology, chronicity, diffuse or focal nature of 
deficits, and need for medical follow-up. Often, the summary 
will include major recommendations. For example: 

The pattern of neuropsychological test results was 
consistent with dementia. Given the neurological 
results and history provided by his wife, it is likely that 
this represents dementia of the Alzheimer's type. 
Consultation with his neurologist or treating physician 
is recommended to ensure neurological follow-up. 

Or: 

Neuropsychological testing suggests preserved language 
skills in this left-handed boy with intractable seizures 
and a history of perinatal stroke in the left frontal lobe 
in areas presumed to subserve language functioning. 
Consequently, the possibility of atypical language 
dominance (i.e., bilateral or right hemisphere 
language) should be confirmed with Wada testing prior 
to epilepsy surgery. Language mapping is also 
recommended prior to any surgery involving the left 
frontal lobe. 

The interpretation, based on quantitative and qualitative 
information, must respect what is known about brain-behavior 
relations (i.e., it must make sense from a neurological and 
neuropsychological point of view) and must also take into ac- 
count information regarding test reliability and validity. 
Hebben and Milberg (2002) stress that if neuropsychological 
data do not fit with the other information such as neurologi- 
cal data or prior history, this should be stated. Further, they 
caution that there are few single test scores that are valid pre- 
dictors of lesions in specific areas of the brain. Even though 
conclusions about impaired function can be made, predic- 
tions about abnormal brain areas should be made with cau- 
tion. They provide a list of questions to follow when 
interpreting neuropsychological data, including considering 
base rates, premorbid levels, confounding factors, and specific 
points to consider when making inferences about brain dam- 
age or dysfunction based on neuropsychological data. Cimino 
(2000) also notes that conceptualization and interpretation 
need to take into account the influence of subject-specific 
variables, effects of interaction between different cognitive 
domains, and consistency/inconsistency within cognitive 






domains, and should avoid erroneous assumptions (e.g., 
overinterpretation of test scores) and other issues central to 
making appropriate interpretations of neuropsychological 
data. For a full discussion of these and related issues, the 
reader is urged to consult comprehensive texts on neuropsy- 
chological assessment such as Baron (2004), Groth-Marnat 
(2000), Lezak et al. (2004), McCaffrey et al. (1997), Mitrushina 
et al. (2005), and Vanderploeg (2000). 

Diagnostic information and recommendations relevant for 
follow-up by other professionals are also highlighted in this 
section. This often includes implications for daily living and 
education (e.g., Baron et al., 1995, 2004). For example: 

Neuropsychological results show a clear discrepancy 
between measured intelligence and achievement levels 
in the area of reading. As a result, John meets criteria 
for a reading disability and will require a modified 
program and designation as a student with special 
educational needs. 

Or: 

Mr. Jones's pattern of test results and history suggest 
that his post-injury course has been complicated by 
PTSD. Referral to a therapist or counselor with 
expertise in this area is highly recommended. Referral 
has been discussed with Mr. Jones and names of local 
providers have been suggested. 

Unfortunately, in many settings, recommendations remain 
the most overlooked aspect of the neuropsychological evalua- 
tion (Hebben & Milberg, 2002). Recommendations should be 
both practical and realistic. The neuropsychologist preparing 
the assessment report should be familiar with remedial tech- 
niques, therapies, and basic management procedures in his or 
her field of expertise as well as with the available local re- 
sources that provide such assistance. Names and phone num- 
bers for specific treatment, training, or support facilities may 
need to be included or provided as separate attachments dur- 
ing the feedback session. 

Other recommendations include practical hints for the pa- 
tient and the caregiver for the management of particular 
problems in daily living, educational and occupational impli- 
cations, and, in some cases, an estimate of when reassessment 
should be scheduled to measure future progress or decline. 
Examples of recommendations can be found in many sources 
(e.g., Eslinger, 2002; Ylvisaker, 1997). 

Driving deserves special mention. The ability to drive a 
motor vehicle is often an issue of concern for the patient, the 
caregiver, and the referring psychologist or physician. In many 
jurisdictions, psychologists must, by law, report any patient 
they suspect is not competent to drive. Neuropsychological 
tests do not provide direct information about the ability to 
drive safely, although they can provide some relevant infor- 
mation (e.g., Brown et al., 2005), and in instances of severe 
impairment it is obvious that the driver's license should be 
suspended. Brouwer and Withaar ( 1997) provide a framework 
for determining fitness to drive after brain injury. For further 



discussion of driving and neuropsychological assessment, see 
Dobbs (1997) and Lemsky (2000). 

Appendix: Tests Administered 

The need for appendices varies depending on the report type. 
Some neuropsychologists include a full listing of all tests ad- 
ministered in the course of the assessment, while others supple- 
ment the list with actual test scores. Others include the test list 
in the body of the report. Briefer reports may not include any 
mention of the tests administered. When brevity is required, we 
favor the practice of including test names for the more impor- 
tant findings and describing other findings in general terms in 
the body of the report (e.g., "The patient showed severe execu- 
tive functioning deficits [<1st percentile, WCST]. However, at-
tention, memory and language were all within normal limits"). 

RECIPIENTS 

The contents of a neuropsychological report are confidential 
and not to be shared with others not specifically designated by 
the patient or referring party. Even if the report is not directly 
shared with the patient, in most cases, he or she has the right 
to see the report. Although this is not always feasible in med- 
ical settings, it is good practice to allow the patient to read the 
report before it is distributed (see Feedback Session) and to 
clarify with the patient in writing who should receive a copy. 
This practice is followed, at least on an occasional basis, by a 
majority of neuropsychologists in a recent survey (Donders, 
2001). Neuropsychologists whose clientele is predominantly 
pediatric may defer report distribution until after review with 
the patient or family (e.g., see Baron, 2004; Baron et al., 1995). 
The distribution list should be indicated at the end of the 
report. Most neuropsychologists do not vary the content of 
the report depending on the distribution list. However, some 
clinicians working with children may draft an abbreviated 
version of the report or a letter for distribution to the school 
when the full report contains sensitive material that parents 
may not want disclosed to the school system. 

FORENSIC ASSESSMENT REPORTS 

Forensic assessment is a specialty in itself (McCaffrey et al., 
1997; Otto & Heilbrun, 2002). Typically, forensic reports are 
longer, more detailed, and more comprehensive than those 
used in routine clinical practice (Donders, 2001). However, it 
is important to remember that any report written for clinical 
purposes may be scrutinized in court at a later time. Forensic 
assessment reports are written for lawyers on both sides of the 
case and should be couched in clear language that cannot be 
misinterpreted. Most often the report will address questions 
regarding the existence of brain injury, cause of brain injury, 
degree of impairment, and prognosis. Reports may also focus 
on child custody, diminished criminal responsibility, and 
competency. Competency can be assessed in terms of specific 






questions such as the competency to stand trial, make a will, 
manage one's affairs (e.g., contracts, estate), determine resi- 
dence (commitment to an institution or hospital), or give in- 
formed consent (Grisso & Appelbaum, 1998). In addition to 
neuropsychological tests, specific instruments have been de- 
veloped for assessing competency (e.g., the MacArthur Com- 
petency Assessment Tool-Criminal Adjudication; Appelbaum 
& Grisso, 1995). 

In reports written for forensic purposes, the qualifications 
of the psychologist (registration or license, board certifica- 
tion, area of specialization, years of experience) are required 
and usually form the first sentence of the report ("I am a psy- 
chologist registered to practice in . Attached is my curricu- 
lum vitae outlining some of my qualifications"). The section 
detailing informant reports may also be quite detailed in 
medico-legal evaluations when corroboration of subjective 
complaints is necessary. The section on observations will also 
frequently be expanded to include a detailed opinion about 
the validity of the test findings (see later, and Chapter 16, 
Assessment of Response Bias and Suboptimal Performance, in this vol-
ume). Additionally, the report should include specifics regard- 
ing norms (Williams, 1997a) because the neuropsychologist 
should be able to justify why a particular normative set was 
used. Some neuropsychologists also divide the interpretation 
section into "preexisting" (e.g., prior head injury), "concur- 
rent" (e.g., medication use), and "intervening factors" (e.g., 
PTSD) to fully discuss all the possible factors that may 
account for the obtained pattern of test results (Williams, 
1997b). 

The question of admissibility of testimony by neuropsy- 
chologists in various states is guided by two standards. The 
Frye rule holds that scientific evidence is admissible only if the technique on which it rests has been sufficiently established to have gained general acceptance in the particular field to which it belongs. The stan-
dards for admissibility changed in 1993 with the U.S. Supreme 
Court's decision in Daubert v. Merrell Dow Pharmaceuticals. It 
rejected the Frye standard on the ground that it focused too 
narrowly on a single criterion, general acceptance; rather, the 
court must decide whether the reasoning and methodology 
underlying the testimony is scientifically valid and can prop- 
erly be applied to the facts at issue. Criteria for deciding ac- 
ceptability include testability of the theoretical basis, error 
rates of the methods used, peer review and publication, and 
general acceptance. 

Due to the adversarial nature of court proceedings, the tes- 
tifying neuropsychologist can expect extremely critical analy- 
sis of his or her report. Well-prepared lawyers are familiar 
with many of the tests and their weaknesses, based on books 
specifically written for that purpose (e.g., Doerr & Carlin, 
1991; Faust et al., 1991; Hall & Pritchard, 1996; Melton et al., 
1997; Sbordone, 1995; Ziskin & Faust, 1988). Additionally, ex- 
pert neuropsychologists are frequently hired by the opposing 
side to critically review the neuropsychological report. Attacks 
may be expected, particularly regarding results based on per- 
sonality tests like the MMPI or the Rorschach Test and on 
results based on experimental or nonstandard measures of 



neuropsychological functioning (see also McCaffrey et al., 
1997, and Mitrushina et al., 2005, for a discussion of guide- 
lines for determining whether a neuropsychological test is 
considered standard). The report writer may also consider the 
ecological validity of neuropsychological tests, which may be 
dubious unless supported by sound studies. Authors like 
Larrabee (2000b), Lees-Haley and Cohen (1999), and Sweet 
(1999) provide guidance for the neuropsychologist, while 
Pope et al. (1993) focus their book primarily on the MMPI, 
the MMPI-2, and the MMPI-A. Neuropsychologists specializ- 
ing in forensic assessment should also refer to the Specialty 
Guidelines for Forensic Psychologists (Committee on Ethical 
Guidelines for Forensic Psychologists, 1991). 

A particularly important aspect of forensic reports is the 
question of symptom validity: Was the patient cooperating 
fully? Are some or all of the symptoms valid or are they influ- 
enced by a tendency to exaggerate or even to malinger? Many 
tests, especially the MMPI-2 and other personality tests, include scales or indices that can assist in the detection of 
response bias (see Chapter 15, Assessment of Mood, Personality, 
and Adaptive Functions). In addition, special testing for symp- 
tom validity has been developed (described in Chapter 16, 
Assessment of Response Bias and Suboptimal Performance). Such 
information is crucial for the interpretation of test results. 
However, the problem is complicated by the fact that these 
tests can at best only indicate that motivational and/or emo- 
tional factors (e.g., depression, anxiety, lack of effort) may be 
influencing task performance. Even in cases where financial or 
other incentives exist and the patient's performance is suspect, 
the patient may be impaired and/or acting without conscious 
intent. Accurate diagnosis requires examination of both test 
and extra-test behavior as well as a thorough evaluation of the 
patient's history and pertinent reports, including injury char- 
acteristics (Slick et al., 1999). Communication of findings can 
be problematic, given the difficulty in diagnosing malingering 
and the complexity of an individual's motivations. The report 
should be written in a factual manner, providing a detailed de- 
scription of the patient's behavior, and should acknowledge 
any limitations in the assessment. In some cases, the clinician 
may merely comment that the invalidity of the testing pre- 
cludes firm conclusions. A recent survey of experts in the area 
(Slick et al., 2004) suggests that the term "malingering" is rarely 
used. Rather, most experts typically state that the test results 
are invalid, inconsistent with the severity of the injury, or in- 
dicative of exaggeration (for a further discussion, see Chapter 
16, Assessment of Response Bias and Suboptimal Performance). 



COMPUTER-GENERATED SCORES AND REPORTS 

Computer scoring programs are fairly common, and the so- 
phistication of interpretative programs has increased over 
time. Several data management and storage, "report building," 
and report writing computer programs are available, with 
options ranging from organizing inputted test scores to pro- 
viding outlines and fully written reports. The clinician, how- 






ever, remains responsible for what goes into the report. For 
this reason, he or she should be thoroughly familiar with all 
material used in such programs as well as with the standards 
for educational and psychological testing that pertain to com- 
puter testing and interpretation (American Educational Re- 
search Association et al., 1999). 

Clinicians may be tempted to use the computer as a "cheap 
consultant" and take the validity of the computer-generated 
report for granted. It is important to bear in mind that 
computer-generated reports, by their very nature, use a shot- 
gun rather than a problem-oriented approach, and they ad- 
dress a hypothetical, "typical" individual based on averages of 
certain test scores, not the particular examinee who is the sub- 
ject of the report. It follows, then, that a computer-generated 
report can only be used selectively and with modifications re- 
quired by the referral questions and the specific circumstances 
of the examinee. 

Computer scoring can save time and often avoids compu- 
tation errors. However, the translation of raw scores into stan- 
dardized scores must be scrutinized by the psychologist to 
check which normative database is used and to ensure that 
proper corrections for age, education, and other factors are 
applied. 



FEEDBACK SESSION 

Psychological assessment results are of direct interest to the pa- 
tient, who is usually concerned about his or her mental or 
emotional problems. Informing interviews serve three pur- 
poses: (1) to review and clarify test results, (2) to acquire addi- 
tional information of relevance to the assessment process, and 
(3) to educate the patient and family about their condition 
(Baron, 2004; Baron et al., 1995). A step-by-step guide to com-
municating results to patients and treatment teams is provided 
by Ryan et al. (1998). Crosson (2000) also discusses pitfalls and 
principles of giving feedback to patients and families. 

It is good practice to schedule an informing interview with 
the examinee soon after the assessment. Spouses and primary 
caregivers are typically invited to attend. In many cases (espe- 
cially with children), teachers, employers, case workers, or 
other persons directly involved may also be included at the 
discretion of the family. The informing interview usually 
starts with a review of the purpose of the assessment. The cli- 
nician may also ask the examinee and/or family about their 
expectations and what they hope to learn from the assess- 
ment, which helps clarify misinformation about the purpose 
and limits of the evaluation (Baron, 2004; Baron et al., 1995). 
Test results can be summarized briefly and should be ex- 
plained in easily understood terms. It is good practice to be 
explicit and use examples. To prevent the impression that test 
results are kept "secret" from the client, it may be appropriate 
to explain why the psychologist came to a certain conclusion 
("You remember that list of words that you had to repeat? You 
had a lot of trouble remembering words compared to other 
people your age"). While it is true that few examinees have the 



training and sophistication to fully understand test scores 
(Lezak et al., 2004), terms like "average" or "seriously below 
average for your age" do make sense to most people. 

The most important parts of the informing interview, 
however, are the conclusions reached and the recommenda- 
tions. The patient wants to know whether he or she has a seri- 
ous problem, whether it is progressive, and what can be done 
about it. This should be discussed at some length and re- 
peated as necessary. Most clients retain only a small portion of 
the information given during a single feedback session. It is 
helpful to provide the examinee and/or family with additional 
written materials in some cases (e.g., phone and address of 
therapist, training group, or rehabilitation facility). 

At times, the psychologist may gain additional information 
during the informing interview that necessitates modifying 
the report in some way or leads to additional recommenda- 
tions. For this reason, some psychologists typically send out 
reports only after the informing interview is concluded 
(Baron, 2004; Baron et al., 1995). However, we find that this 
practice can cause unnecessary delays in the distribution of 
the report to the referring party in settings where results are 
needed quickly (e.g., inpatient settings). Additional pertinent 
information may always be sent out in a letter or as an adden- 
dum to the report. 

As noted previously with regard to the report itself, focus- 
ing primarily on deficits during the feedback session may 
make patients (and their families) feel devastated; insensitive 
delivery of negative findings can also injure self-esteem and 
increase the risk of significant depression (Crosson, 2000); in 
the case of children, parents may feel hopeless and responsi- 
ble for their children's limitations. When extreme, these reac- 
tions may interfere with adjustment and rehabilitation 
processes. Such considerations underscore the importance of 
emphasizing strengths as well as weaknesses. Baron (2004; 
Baron et al., 1995) provides guidelines useful to pediatric 
neuropsychologists on how to gain parental acceptance, 
communicate results simply and define terminology, use re- 
statement, provide examples, encourage questions and par- 
ticipation, and decide when to include children in feedback 
sessions. 

As noted above, it may be appropriate to allow patients to 
read the report at the end of the session and to take a copy 
home. Providing written as well as oral feedback is one of the 
recommendations resulting from a study of consumers of 
neuropsychological assessment (Bennett-Levy et al., 1994; 
Gass & Brown, 1992). 



REFERENCES 

American Educational Research Association, American Psychological 
Association, & National Council on Measurement in Education. 
(1999). Standards for educational and psychological testing. Wash- 
ington, DC: American Educational Research Association. 

American Psychological Association. (1996). Statement on the dis- 
closure of test data. American Psychologist, 51, 644-648. 






American Psychological Association. (1997). Thesaurus of psychologi- 
cal index terms (8th ed.). Washington, DC: Author. 

American Psychological Association. (2002). Ethical principles of 
psychologists and code of conduct. American Psychologist, 57, 
1060-1073. 

Appelbaum, P. S., & Grisso, T. (1995). The MacArthur treatment 
competency study. I: Mental illness and competency to consent to 
treatment. Law and Human Behavior, 19, 105-126. 

Axelrod, B. N. (2000). Neuropsychological report writing. In 
R. D. Vanderploeg (Ed.), Clinician's guide to neuropsychological as- 
sessment (2nd ed., pp. 245-273). Mahwah, NJ: Lawrence Erlbaum 
Associates. 

Baron, I. S. (2004). Neuropsychological evaluation of the child. New 
York: Oxford University Press. 

Baron, I. S., Fennell, E. B., & Voeller, K. K. S. (1995). Pediatric neuropsy- 
chology in the medical setting. New York: Oxford University Press. 

Baum, C., Edwards, D., Yonan, C., & Storandt, M. (1996). The relation-
ship of neuropsychological test performance to performance on 
functional tasks in dementia of the Alzheimer type. Archives of 
Clinical Neuropsychology, 11, 69-75. 

Bell, B. D., & Roper, B. L. (1998). "Myths of neuropsychology": An- 
other view. The Clinical Neuropsychologist, 12, 237-244. 

Bennett-Levy, J., Klein-Boonschate, M. A., Batchelor, J., et al. (1994). 
Encounters with Anna Thompson: The consumer's experience of 
neuropsychological assessment. The Clinical Neuropsychologist, 8, 
219-238. 

Bowman, M. L. (2002). The perfidy of percentiles. Archives of Clinical 
Neuropsychology, 17, 295-303. 

Brouwer, W. H., & Withaar, F. K. (1997). Fitness to drive after traumatic 
brain injury. Neuropsychological Rehabilitation, 3, 177-193. 

Brown, L. B., Stern, R. A., Cahn-Weiner, D. A., Rogers, B., Messer, M. 
A., Lannon, M. C., Maxwell, C., Souza, T., White, T., & Ott, B. R. 
(2005). Driving scenes test of the Neuropsychological Assessment 
Battery (NAB) and on-road driving performance in aging and 
very mild dementia. Archives of Clinical Neuropsychology, 20, 
209-216. 

Cimino, C. (2000). Principles of neuropsychological interpretation. 
In R. D. Vanderploeg (Ed.), Clinician's guide to neuropsychological 
assessment (2nd ed., pp. 69-109). Mahwah, NJ: LEA Publishers. 

Committee on Ethical Guidelines for Forensic Psychologists (1991). 
Specialty guidelines for forensic psychologists. Law and Human 
Behavior, 6, 655-665. 

Crosson, B. (2000) . Application of neuropsychological assessment re- 
sults. In R. D. Vanderploeg (Ed.), Clinician's guide to neuropsycho- 
logical assessment (2nd ed., pp. 195-244). Mahwah, NJ: LEA 
Publishers. 

Dobbs, A. R. (1997). Evaluating the driving competence of dementia 
patients. Alzheimer Disease and Associated Disorders, 11(Suppl. 1), 
8-12. 

Donders, J. (1999). Pediatric neuropsychological reports: Do they re- 
ally have to be so long? Child Neuropsychology, 5, 70-78. 

Donders, J. (2001). A survey of report writing by neuropsychologists. 
II: Test data, report format, and document length. Clinical Neu- 
ropsychologist, 15, 150-161. 

Doerr, H. O., & Carlin, A. S. (Eds.). (1991). Forensic neuropsychology: 
Legal and scientific bases. Odessa, FL: Psychological Assessment 
Resources. 

Dougherty, E., & Bortnick, D. M. (1990). Report writer: Adult's Intel- 
lectual Achievement and Neuropsychological Screening Test. 
Toronto, Ontario: Multi-Health Systems. 



Erickson, R. C., Eimon, P., & Hebben, N. (1992). A bibliography of 
normative articles on cognition tests for older adults. The Clinical 
Neuropsychologist, 6, 98-102. 

Eslinger, P. (2002). Neuropsychological interventions: Clinical research 
and practice. New York: Guilford Press. 

Faust, D., Ziskin, J., & Hiers, J. B. (1991). Brain damage claims: Coping 
with neuropsychological evidence (Vol. 2). Odessa, FL: Psychologi- 
cal Assessment Resources. 

Freides, D. (1993). Proposed standard of professional practice: Neu- 
ropsychological reports display all quantitative data. The Clinical 
Neuropsychologist, 7, 234-235. 

Freides, D. (1995). Interpretations are more benign than data? The 
Clinical Neuropsychologist, 9, 248. 

Gass, C. S., & Brown, M. C. (1992). Neuropsychological test feedback 
to patients with brain dysfunction. Psychological Assessment, 4, 
272-277. 

Green, P. (2003). Green's Word Memory Test. Edmonton: Green's Pub- 
lishing Inc. 

Grisso, T., & Appelbaum, P. (1998). Assessing competence to consent to 
treatment: A guide for physicians and other health professionals. 
New York: Oxford University Press. 

Groth-Marnat, G. (2000). Neuropsychological assessment in clinical 
practice. New York: John Wiley & Sons. 

Hall, H. V., & Pritchard, D. A. (1996). Detecting malingering and deception: Forensic decision analysis. Delray Beach, FL: St. Lucie 
Press. 

Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised 
comprehensive norms for an Expanded Halstead-Reitan Battery: 
Demographically adjusted neuropsychological norms for African 
American and Caucasian adults. Lutz, FL: Psychological Assess- 
ment Resources. 

Heaton, R. K., Taylor, M. J., & Manly, J. J. (2003). Demographic effects 
and use of demographically corrected norms with the WAIS-III 
and WMS-III. In D. S. Tulsky, D. H. Saklofske, R. K. Heaton, G. J. Chelune, R. J. Ivnik, R. A. Bornstein, A. Prifitera, & M. F. Ledbetter (Eds.), Clinical interpretation of the WAIS-III and WMS-III 
(pp. 183-210). San Diego, CA: Academic Press. 

Hebben, N., & Milberg, W. (2002). Essentials of neuropsychological 
assessment. New York: John Wiley and Sons. 

Ingraham, L. J., & Aiken, C. B. (1996). An empirical approach to de- 
termining criteria for abnormality in test batteries with multiple 
measures. Neuropsychology, 10, 120-124. 

Larrabee, G. J. (2000a). Association between IQ and neuropsycho- 
logical test performance: Commentary on Tremont, Hoffman, 
Scott and Adams (1998). The Clinical Neuropsychologist, 14, 
139-145. 

Larrabee, G. J. (2000b). Forensic neuropsychological assessment. In 
R. D. Vanderploeg (Ed.), Clinician's guide to neuropsychological as- 
sessment, (2nd ed.). Mahwah, NJ: LEA. 

Lees-Haley, P. R., & Cohen, L. J. (1999). The neuropsychologist as ex- 
pert witness: Toward credible science in the courtroom. In 
J. J. Sweet (Ed.), Forensic neuropsychology: Fundamentals and prac- 
tice. Lisse, the Netherlands: Swets & Zeitlinger. 

Lees-Haley, P. R., & Courtney, J. C. (2000). Disclosure of tests and raw 
test data to the courts: A need for reform. Neuropsychology Re- 
view, 10(3), 169-182. 

Lemsky, C. M. (2000). Neuropsychological assessment and treatment 
planning. In G. Groth-Marnat (Ed.), Neuropsychological assess- 
ment in clinical practice: A guide to test interpretation and integra- 
tion (pp. 535-574). New York: John Wiley & Sons, Inc. 






Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsycho- 
logical assessment (4th ed.). New York: Oxford University Press. 

Loring, D. W. (Ed.). (1999). INS dictionary of neuropsychology. New 
York: Oxford University Press. 

Matarazzo, J. D. (1995). Psychological report standards in neuropsy- 
chology. The Clinical Neuropsychologist, 9, 249-250. 

McCaffrey, R. J., Williams, A. D., Fisher, J. M., & Laing, L. C. (1997). 
The practice of forensic neuropsychology: Meeting challenges in the 
courtroom. New York: Plenum Press. 

McConnell, H. W. (1998). Laboratory testing in neuropsychology. In 
P. J. Snyder & P. D. Nussbaum (Eds.), Clinical neuropsychology: A 
pocket handbook for assessment (pp. 29-53). New York: American 
Psychological Association. 

Melton, G. B., Petrila, J., Poythress, N. G., & Slobogin, C. (1997). Psy- 
chological evaluations for the courts (2nd ed.). New York: Guilford. 

Mitrushina, M. N., Boone, K. B., Razani, J., & D'Elia, L. F. (2005). 
Handbook of normative data for neuropsychological assessment 
(2nd ed.). New York: Oxford University Press. 

Naugle, R. I., & McSweeny, A. J. (1995). On the practice of routinely 
appending neuropsychological data to reports. The Clinical Neu- 
ropsychologist, 9, 245-247. 

Otto, R. K., & Heilbrun, K. (2002). The practice of forensic psychol- 
ogy. American Psychologist, 57, 5-18. 

Ownby, R. L. (1997). Psychological reports (3rd ed.). New York: Wiley. 

Palmer, B. W., Boone, K. B., Lesser, J. M., & Wohl, M. A. (1998). Base 
rates of "impaired" neuropsychological test performance among 
healthy older adults. Archives of Clinical Neuropsychology, 13, 
503-511. 

Pope, K. S., Butcher, J. N., & Seelen, J. (1993). The MMPI, MMPI-2, 
and MMPI-A in court: A practical guide for expert witnesses and 
attorneys. Washington, D.C.: American Psychological Association. 

Reynolds, C. R., & Fletcher-Janzen, E. (1997). Handbook of clinical 
child neuropsychology (2nd ed.). New York: Plenum Press. 

Ryan, C. M., Hammond, K., & Beers, S. R. (1998). General assessment 
issues for a pediatric population. In P. J. Snyder & P. D. Nussbaum 
(Eds.), Clinical neuropsychology: A pocket handbook for assessment 
(pp. 105-123). Washington, D.C.: American Psychological Associ- 
ation. 

Sattler, J. (2001). Assessment of children: Cognitive applications (4th 
ed.). San Diego, CA: Jerome M. Sattler. 

Sbordone, R. J. (1995). Neuropsychology for the attorney. Delray 
Beach, FL: St. Lucie Press. 

Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic cri- 
teria for malingered neurocognitive dysfunction: Proposed stan- 
dards for clinical practice and research. Clinical Neuropsychologist, 
13, 545-561. 



Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detecting 
malingering: A survey of experts' practices. Archives of Clinical 
Neuropsychology, 19, 465-473. 

Snyder, P. J., & Ceravolo, N. A. (1998). The medical chart: Efficient 
information-gathering strategies and proper chart noting. In 
P. J. Snyder & P. D. Nussbaum (Eds.), Clinical neuropsychology: A 
pocket handbook for assessment (pp. 3-10). New York: American 
Psychological Association. 

Sweet, J. J. (1999). Forensic neuropsychology: Fundamentals and prac- 
tice. Lisse, the Netherlands: Swets & Zeitlinger. 

Sweet, J. J., Newman, P., & Bell, B. (1992). Significance of depression 
in clinical neuropsychological assessment. Clinical Psychology Re- 
view, 12, 21-45. 

Tallent, N. (1993). Psychological report writing (4th ed.). Englewood 
Cliffs, NJ: Prentice Hall. 

Taylor, M. J., & Heaton, R. K. (2001). Sensitivity and specificity of 
WAIS-III/WMS-III demographically corrected factor scores in 
neuropsychological assessment. Journal of the International Neu- 
ropsychological Society, 7, 867-874. 

Tremont, G. (1998). Effect of intellectual level on neuropsychological 
test performance: A response to Dodrill (1997). The Clinical Neu- 
ropsychologist, 12, 560-567. 

Vanderploeg, R. D. (2000). Clinician's guide to neuropsychological as- 
sessment (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. 

Wechsler, D. (1997). Wechsler Adult Intelligence Scale — III. San Anto- 
nio, TX: The Psychological Corporation. 

Williams, A. D. (1997a). Fixed versus flexible battery. In R. J. McCaf- 
frey, A. D. Williams, J. M. Fisher, & L. C. Laing (Eds.), The practice 
of forensic neuropsychology: Meeting challenges in the courtroom 
(pp. 57-70). New York: Plenum Press. 

Williams, A. D. (1997b). The forensic evaluation of adult traumatic 
brain injury. In R. J. McCaffrey, A. D. Williams, J. M. Fisher, & 
L. C. Laing (Eds.), The practice of forensic neuropsychology: Meeting 
challenges in the courtroom (pp. 37-56). New York: Plenum Press. 

Williams, M. A., & Boll, T. J. (2000). Report writing in clinical neu- 
ropsychology. In G. Groth-Marnat (Ed.), Neuropsychological as- 
sessment in clinical practice: A guide to test interpretation and 
integration (pp. 575-602). New York: John Wiley & Sons, Inc. 

Ylvisaker, M. (1997). Traumatic brain injury rehabilitation: Children 
and adolescents (2nd ed.). New York: Butterworth-Heinemann. 

Ziskin, J., & Faust, D. (1988). Coping with psychiatric and psychologi- 
cal testimony (4th ed., Vols. 1-3). Marina Del Rey, CA: Law & Psychology Press. 

Zuckerman, E. L. (1995). The clinician's thesaurus: A guidebook for 
wording psychological reports and other evaluations (4th ed.). 
Toronto, Ontario: Mental Health Systems. 



General Cognitive Functioning, 
Neuropsychological Batteries, and Assessment 
of Premorbid Intelligence 



GENERAL COGNITIVE FUNCTIONING 
IQ Tests 

Models of Intelligence 

There are diverse conceptions of intelligence and how it should 
be measured (e.g., Neisser et al., 1996; Sattler, 2001). Some the- 
orists (e.g., Spearman, 1927) have emphasized the importance 
of a general factor, g, which represents what all the tests (or 
subtests) have in common and determines how one would 
perform on the variety of tasks tapping intelligence; others 
(e.g., Thurstone, 1938) have theorized that there are actually 
multiple intelligences, each reflecting a specific dimension of 
cognitive ability, such as memory, verbal comprehension, or 
number facility. One of the most important theories in con- 
temporary intelligence research is based on the factor-analytic 
work of Carroll (1993, 1997) and Horn and Noll (1997). The 
Carroll-Horn-Cattell (CHC) framework (Flanagan & Mc- 
Grew, 1997) is a model that synthesizes Carroll and Horn- 
Cattell's models and stresses several broad classes of abilities at 
the higher level (e.g., fluid ability [Gf], crystallized intelligence 
[Gc], short-term memory, long-term storage and retrieval, 
processing speed), and a number of primary factors at the 
lower level (e.g., quantitative reasoning, spelling ability, free re- 
call, simple reaction time). The CHC factor-analytic theory of cognitive abilities is shown in Figure 6-1. Because contemporary theories of the structure of cognitive functioning emphasize multiple, somewhat independent factors of intelligence (e.g., Carroll, 1997; Flanagan et al., 2000; Larrabee, 2004; McGrew, 1997; Tulsky et al., 2003a, 2003b), intelligence is best 
evaluated with multifaceted instruments and techniques. 

While modern models of intelligence such as the CHC 
model derive from an empirical framework (i.e., factor analy- 
sis), they are a relatively recent development in the history of 
applied intelligence testing. From the time of the very first 
standardized tests, intelligence tests were used by clinicians 
not because of empirically demonstrated rigor, but because of 



practical concerns and clinical utility. For instance, part of the 
impetus for the development of the Binet-Simon Scale (1905) 
was the French government's request that Binet and Simon 
find a way to examine school-aged children who had mental 
retardation (Sattler, 2001; Tulsky et al., 2003b). The goals of 
the Army Alpha and Beta forms developed by Robert M. 
Yerkes (1921) and his colleagues were to guide military selec- 
tion, that is, to screen intellectually incompetent individuals 
to exempt them from military service and to identify those ca- 
pable of more responsibility and greater duties (Tulsky et al., 
2003b). David Wechsler developed the first version of his fa- 
mous test to detect impairments in psychiatric inpatients in 
New York's Bellevue Hospital. Thus, the main criterion for se- 
lecting and constructing these early IQ tests was clinical rele- 
vance, not congruence with empirically derived theories of 
cognitive functioning. 

While they were revered and almost universally accepted 
by practicing clinicians, traditional IQ tests were not well re- 
ceived by contemporary theorists, particularly in the light of 
hindsight afforded by statistical techniques. As a result, tests 
grounded in clinical tradition, such as the Wechsler scales, 
have been criticized for spawning simplistic models of human 
intelligence such as the dichotomy of verbal versus nonverbal 
intelligence (Flanagan & McGrew, 1997), or more recently, the 
four-factor model of intelligence composed of verbal, non- 
verbal, working memory, and processing speed components 
(Flanagan & McGrew, 1997). However, regardless of empiri- 
cally based criticisms, traditional tests, including the many descendants of the first Binet and Wechsler scales, are some of 
the most important milestones in the history of psychological 
assessment, and as such, have contributed extensively to our 
knowledge of cognitive function in abnormal and normal 
populations. What is most intriguing is that there appears to 
be a rapprochement between traditional clinically based tests 
and those based on factor-analytic models of intelligence. For 
example, tests derived from different traditions are becoming 
increasingly similar in content, as a review of several of the 
major intelligence batteries in this volume will indicate, as test 






Figure 6-1 An integrated Cattell-Horn-Carroll Gf-Gc model of the structure of cognitive 
abilities. Note: Italic font indicates abilities that were not included in Carroll's three-stratum 
model but were included by Carroll in the domains of knowledge and achievement. Bold font 
indicates abilities that are placed under different Gf-Gc broad abilities than in Carroll's model. 
These changes are based on the Cattell-Horn model and/or recent research (see McGrew, 1997, 
and McGrew & Flanagan, 1998). Source: Reprinted with permission from Flanagan et al., 2000. 







[Figure 6-1 diagram not reproduced; legible node labels include Decision/Reaction Time Speed, Simple Reaction Time, Choice Reaction Time, Semantic Processing, and Mental Comparison.]






developers adapt traditional tests to fit models such as the 
CHC model (e.g., SB-5, WJ III, WPPSI-III, WISC-IV). An- 
other example is the use of factor-analytic techniques to parse 
traditional measures into more contemporary models of cog- 
nitive functioning (e.g., Tulsky's six-factor model of the 
WAIS-III/WMS-III). Still another is the combination of sub- 
tests from traditional and nontraditional batteries, including 
the Stanford-Binet, the Wechsler, and factor-analysis-based 
scales such as the WJ III, into combined batteries that assess 
all the different dimensions of cognitive ability posited by fac- 
tor-analytic approaches (i.e., Cross-Battery Approach) (Flana- 
gan & McGrew, 1997; Flanagan et al., 2000). Overall, new tests 
based on empirical models such as factor analysis (e.g., 
Woodcock-Johnson-III, CAS) and new conceptualizations of 
existing tests (e.g., WAIS-III, WISC-IV) tend to assess a 
broader spectrum of abilities than previously represented and 
tend to be more consonant with factor-based theories of cog- 
nitive functioning. 

Content 

The more common measures of general intellectual function 
(e.g., the Wechsler Tests, WJ III, CAS, and the Stanford-Binet) 
include many different types of items, both verbal and non- 
verbal. Examinees may be asked to give the meaning of words, 
to complete a series of pictures, to construct block patterns, 
etc. Performance can then be scored to yield several subscores 
and composite scores. Although intelligence tests usually cor- 
relate highly with each other (e.g., .50 to .70), each test provides 
a significant amount of unique variance such that intelligence 
tests are not necessarily interchangeable. Thus, because the 
various intelligence tests sample different combinations of 
abilities, an individual's IQ is likely to vary from one test to 
another. It is also worth bearing in mind that a wide range of 
human abilities are outside the domain of standard intelli- 
gence tests (e.g., Neisser et al., 1996; Tulsky et al., 2003a). Ob- 
vious facets include wisdom, creativity, practical knowledge, 
and social skills. 

Utility of IQ scores 

There has been dispute over the utility of IQ scores. Some ar- 
gue that these scores are exceedingly reliable and therefore 
worthy of attention. Indeed, IQ tests are unique in that their 
reliabilities commonly exceed .95, a level that few other tests 
are able to achieve (e.g., Kaufman & Lichtenberger, 1999). 
Others (e.g., Lezak et al., 2004) contend that because of the 
multiplicity of cognitive functions assessed in these batteries, 
composites such as IQ scores are not useful in describing cog- 
nitive test performance and only serve to obscure important 
information obtainable only by examining discrete scores. In 
addition to more traditional summary or IQ scores, most 
contemporary tests (e.g., Wechsler, WJ III, CAS, Stanford- 
Binet) typically include measures of more discrete factors and 
domains. Such an approach is also consistent with current 
views of cognitive functioning outlined previously. 



Table 6-1 Percentile Ranks by Educational Level

                                 Years of Education
Percentile     0-7      8       9-11     12      13-15    16+

VIQ
  95th         105      108     119      120     126      135
  75th          91       98     105      108     115      123
  50th          82       90      96      100     108      116
  25th          73       83      87       92     100      108
  5th           60       72      73       80      90       97

PIQ
  95th         109      117     122      122     125      132
  75th          95      103     108      109     113      120
  50th          84       93      98      100     105      111
  25th          74       83      88       91      97      102
  5th           60       69      73       78      86       90

FSIQ
  95th         106      111     120      121     126      135
  75th          92       99     106      108     115      123
  50th          82       91      96      100     107      115
  25th          73       83      87       92     100      107
  5th           59       71      73       79      89       95

Source: Adapted from Ryan et al., 1991. Copyright by the American Psychological Association. Reprinted with permission.



Even those who are critical of IQ tests generally do not dispute that IQ scores predict certain forms of achievement, especially school achievement, quite effectively. Table 6-1 presents estimated percentile ranks for WAIS-R VIQ, PIQ, and FSIQ at six different educational levels (Ryan et al., 1991). Similar findings are obtained for the WAIS-III (see WTAR Manual, 2001; also, see WAIS-III in this volume). This table shows how IQ levels differ across education levels. For example, one might infer that following a traumatic brain injury, a child who obtains an FSIQ of 90 or lower would be unlikely to complete college, since fewer than 5% of university-educated individuals have IQ scores in this range.
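To make such table-based reasoning concrete, the following minimal sketch (in Python) checks an obtained FSIQ against the education-stratified 5th-percentile values in Table 6-1; the dictionary, the band labels, and the function name are ours and are not part of the published norms.

FSIQ_5TH_PERCENTILE = {  # years of education -> 5th-percentile WAIS-R FSIQ (Table 6-1)
    "0-7": 59, "8": 71, "9-11": 73, "12": 79, "13-15": 89, "16+": 95,
}

def below_5th_percentile(fsiq, education_band):
    """Return True if the obtained FSIQ falls below the 5th percentile for that educational band."""
    return fsiq < FSIQ_5TH_PERCENTILE[education_band]

# Example from the text: an FSIQ of 90 falls below the 5th percentile for
# university-educated adults (16+ years) but not for those with 12 years of education.
print(below_5th_percentile(90, "16+"))   # True
print(below_5th_percentile(90, "12"))    # False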

Because IQ scores predict educational achievement, they 
also predict occupational and financial outcome (Neisser et al., 
1996). The correlation between IQ scores and grades is about 
.50; correlations with achievement test results are higher (the 
Psychological Corporation, 2002). Note, however, that correlations of this magnitude account for only about 25% of the overall variance (the square of a correlation of .50). Other individual characteristics (e.g., persis-
tence, interest) are probably of equal or greater importance in 
achieving academically. However, we may not have equally re- 
liable ways of measuring these traits. With these reservations 
in mind, IQ scores are clearly relevant to the neuropsycholog- 
ical context and need to be considered. 



Comprehensive IQ Batteries 

Choosing the appropriate IQ test is an integral part of the as- 
sessment process. The breadth of abilities that are tapped by various measures is obviously a critical concern. For example, apart from its Matrix Reasoning subtest, the WAIS-III does not contain any strong indicators of fluid reasoning (see Flanagan et al., 2000, for a more complete discussion of the limitations of most current batteries in sampling the full range of abilities that define current views of intelligence). Further, while
many clinicians routinely administer the same IQ test to all 
examinees, there are instances where a different IQ battery 
may be more appropriate. For instance, individuals who will 
require longitudinal assessment may be better suited to an IQ 
test that spans all the age ranges (e.g., WJ III), whereas others 
may require a test with a major nonverbal component (e.g., 
SB5). Considerations about conorming should also inform 
battery selection (e.g., depending on whether a memory 
workup or learning disability assessment will be conducted), 
since not all IQ tests are conormed with major memory or 
achievement tests. In Table 6-2, we list the major assessment 
batteries, with respective age ranges, domains assessed, typical 
administration times, and conormed tests. We tend to give the 
IQ test early in the course of the assessment because it allows 
the examiner to observe how the client behaves on a wide 
array of tasks. In this way, the examiner can develop hypothe- 
ses about the patient's spared and impaired abilities that can 
then be tested more thoroughly during the course of the as- 
sessment. 

In this chapter, we present the Wechsler Intelligence Scales 
(WPPSI-III, WISC-IV, WAIS-III), which have played a cen- 
tral role in neuropsychological thinking about intelligence 
and psychological testing. For example, the WAIS-III (and its 
previous versions) is one of the most frequently used mea- 
sures in neuropsychological batteries (e.g., Camara et al., 
2000; Rabin et al., 2005) and is often considered "the gold 
standard" in intelligence testing (e.g., Ivnik et al., 1992). The 
same can be said for the WISC-IV, and to a lesser extent, for 
the WPPSI-III. 

A number of authors (e.g., Flanagan et al., 2000; Larrabee, 
2004; Tulsky et al., 2003b) argue that a more complete evalua- 
tion of intellectual functioning necessitates supplementing the 
traditional Wechsler scales with other instruments to measure 
facets of g that are either not covered or insufficiently covered 
by the Wechsler tests. Thus, Tulsky et al. (2003b) have devel- 
oped a six-factor model from the WAIS-III and WMS-III (see 
WAIS-III and WMS-III) that includes verbal, perceptual, pro- 
cessing speed, working memory, auditory memory, and visual 
memory constructs. Whether the model increases clinical sen- 
sitivity and specificity remains to be determined. Another 
technique is to use the Cross-Battery Approach, which adds 
on specific subtests from other major IQ tests to supplement 
areas not fully assessed by the Wechsler scales (e.g., Flanagan 
et al., 2000).

We include a number of other measures of general intel- 
lectual functioning that have taken advantage of recent theo- 
retical advances in the field. The structure of the WJ III COG 
is closely modeled after the CHC theory of intelligence. It has 
a three-level hierarchical structure consisting of (a) psycho- 



metric g (i.e., the General Intellectual Ability score [GIA]), 
(b) Stratum II level abilities (represented by seven factors), 
and (c) Stratum III abilities (represented by 20 subtests mea- 
suring narrow abilities). In addition, it yields five Clinical 
Clusters that measure abilities important for clinical evalua- 
tion, including executive functioning, working memory, and 
attention. 

The SB5 also measures a range of abilities in the CHC 
framework. The nonverbal domain includes a broad range of 
subtests that may prove useful in the neuropsychological set- 
ting. However, the omission of measures of speed of pro- 
cessing likely limits the test's sensitivity to impairment. 

The Cognitive Assessment System (CAS; Naglieri & Das,
1997) is a theory-based measure of intelligence, deriving from 
the work of A. Luria. It is based on the notions that "cognitive 
processes" should replace the term "intelligence" and that a 
test of cognitive processing should rely as little as possible on 
acquired knowledge such as vocabulary or arithmetic. The au- 
thors propose that cognition depends on four interrelated es- 
sential elements: planning, attention, simultaneous processes, 
and successive processes. These are thought to interact with 
the individual's knowledge base and skills. 

The Bayley Scales (BSID-II; Bayley, 1993) are the most 
widely used measures for infants and very young children. 
The BSID-II is a developmental assessment test that measures 
cognitive, motor, and physical development with an item con- 
tent that is theoretically eclectic and broad. Despite the con- 
siderable challenges of assessing infants and toddlers, the test 
has a strong psychometric base. 

IQ Screening Methods 

For screening purposes, when a global IQ estimate is suffi- 
cient or when time constraints are an issue, examiners might 
consider giving a test such as the K-BIT (Kaufman & Kauf- 
man, 1990). It has the advantage of being motor-free, is brief 
to administer, and provides measures of both verbal and non- 
verbal functioning. 

However, tests such as the WASI (the Psychological Corpo- 
ration, 2001), with its four subtests (Vocabulary, Similarities, 
Block Design, Matrix Reasoning), show lower correlations 
among subtests and therefore may provide the clinician with 
more meaningful information by tapping a broader array of 
cognitive functions (Hays et al., 2002). The WASI also has the 
advantage that it was developed using linking samples with 
the WISC-III and WAIS-III. Wechsler short forms (see WAIS- 
III in this volume), however, are somewhat better at predict- 
ing WAIS-III summary scores and some (e.g., Kaufman's 
tetrad of Arithmetic, Similarities, Picture Completion, and 
Digit Symbol) require less time to administer. In addition, the 
short forms have the advantage that, should there be a need 
for a complete Wechsler profile, the nonadministered subtests 
can be administered at a later date (Eisenstein & Engelhart, 
1997). Many other established batteries (e.g., CAS, WJ III) 
provide a short-form IQ estimate based on a smaller number 
of subtests than the standard form. 



Table 6-2 Characteristics of Tests of General Intellectual Functioning

Test | Age Range | Administration Time | Domains Purported to Assess | Normative Linking Samples
Bayley | 1 month-3 years, 5 months | Depends on age: 25-60 min | Mental Development; Motor Development; Behavioral Rating Scale |
CAS | 5-17 years, 11 months | Basic: 40 min; Standard: 60 min | Planning Processes; Attention Processes; Simultaneous Processes; Successive Processes |
DRS-2 | 55+ years | Depends on mental status: healthy elderly, 10-15 min; impaired, 30-45 min | Attention; Initiation/Perseveration; Construction; Conceptualization; Memory |
K-BIT | 4-90 years | 15-30 min | Verbal Knowledge/Reasoning; Nonverbal Reasoning |
KBNA | 20-89 years | 120 min | Attention/Concentration; Memory (Immediate, Delayed, Recognition); Spatial Processing; Verbal Fluency; Reasoning/Conceptual Shifting |
NAB | 18-97 years | Depends on which modules are given: Screening, 45 min; individual modules, 25-45 min; all modules except Screening, 3 hr; all modules, ~4 hr | Screening; Attention; Language; Memory; Spatial; Executive Functions |
NEPSY | 3-12 years | Depends on age: ages 3-4, 45 min Core (1 hr full); ages 5+, 65 min Core (2 hr full) | Attention/Executive; Language; Sensorimotor Functions; Visuospatial Processing; Memory and Learning |
MMSE | 18-85+ years (some limited data available for children) | 10-15 min | Orientation; Registration; Attention/Calculation; Recall; Language |
Raven's | 5 years, 5 months+ | SPM: 40 min; CPM: 25 min; APM: 40-60 min | Nonverbal Reasoning |
RBANS | 20-89 years | 20-30 min | Immediate Memory; Visuospatial/Constructional; Language; Attention; Delayed Memory |
SB5 | 2-85+ years | Abbreviated: 15-20 min; Full: 45-75 min | Fluid Reasoning; Knowledge; Quantitative Reasoning; Visual-Spatial Reasoning; Working Memory | WJ III ACH
TONI-3 | 6-89 years, 11 months | 15-20 min | Nonverbal Reasoning |
WAIS-III | 16-89 years | 45-90 min | Verbal Knowledge/Reasoning; Working Memory; Nonverbal Reasoning; Speed of Processing | WMS-III; WIAT-II; WASI; WTAR
WISC-IV | 6-16 years, 11 months | ~90 min | Verbal Knowledge/Reasoning; Perceptual Reasoning; Working Memory; Speed of Processing | WIAT-II
WPPSI-III | 2 years, 6 months-7 years, 3 months | Young children: ~40 min; older children: 40-85 min, depending on which subtests are given | Verbal Knowledge/Reasoning; Nonverbal Reasoning; Speed of Processing; Language Ability |
WASI | 6-89 years | 15-30 min, depending on whether two or four subtests are given | Verbal Reasoning; Nonverbal Reasoning | WAIS-III; WISC-III
WJ III COG | 2-90+ years | Standard: 25-35 min; Extended: 90-120 min; Brief: 15 min | Comprehension-Knowledge; Long-Term Retrieval; Visual-Spatial Thinking; Auditory Processing; Fluid Reasoning; Processing Speed; Short-Term/Working Memory; Attention; Cognitive Fluency; Executive Processes; Delayed Recall | WJ III ACH



Overall, screening forms should not be used to categorize 
an individual's intelligence for diagnostic purposes or disability 
determination. Rather, they should be reserved for situations 
where gross estimates will suffice or when patient stamina is 
an issue. The choice of test should be guided by a number of 
considerations, including the test's clinical utility, psychomet- 
ric characteristics, and time constraints. 

Nonverbal IQ Tests 

The TONI-3 (Brown et al., 1997) and Raven's Progressive Ma- 
trices (Raven, 1938, 1947, 1965) may be considered in patients
whose test performance may be confounded by language, 
hearing, or motor impairments, or who lack proficiency with 
English and/or come from diverse ethnic and racial back- 
grounds. The Raven's tests are older and better researched. 
While not culture-free, they are more culture-fair than tradi- 
tional IQ tests such as the Wechsler. New versions of the 
Raven's tests have also been developed to overcome the ero- 
sion of discriminative power as a result of the worldwide in- 
crease in intellectual ability over the years (i.e., Flynn effect). 

It is important to bear in mind that the one-dimensional nature of tasks such as the Raven's and the TONI-3 provides little information about an individual's strengths or weaknesses, and both suffer from psychometric limitations. The motor-reduced component and untimed nature of these tasks
may also make them relatively insensitive to various forms of 
impairment. Lastly, comprehensive nonverbal batteries such 
as the UNIT, C-TONI, or Leiter-R are more appropriate for 
diagnostic assessment than the one-dimensional tests such as 
the TONI-3 or Raven's (these are reviewed in Essentials of 
Nonverbal Assessment: McCallum et al., 2001). 

Neuropsychological Batteries 

Broad-based neuropsychological test batteries, such as the 
Halstead-Reitan Neuropsychological Battery (HRNB; Reitan & 
Wolfson, 1993) and the Luria-Nebraska Neuropsychological 
Battery (LNNB; Golden et al., 1985) have been used in the past 
to assess the presence, location, and extent of cerebral damage. 
With the advent of sophisticated neuroimaging procedures, 
the role of neuropsychology has shifted, with clinicians being 
asked to address other issues (e.g., nature of the cognitive 
deficit, potential for cognitive compensation/retraining and 
functional impact). In addition, the economics of health care 
has shifted focus toward brief assessment. In response to these 
changes, a number of authors have developed new tools for the 
assessment of a wide array of cognitive skills. Thus, the KBNA
(Leach et al., 2000) consists of 25 subtests, some of which can 
be combined into indices that represent higher-order domains 
of functioning. The KBNA also measures behaviors commonly 
overlooked by neuropsychologists and behavioral neurologists 
(e.g., praxis, emotion expression). 

The Neuropsychological Assessment Battery (NAB; Stern &
White, 2003) is a "new-generation" battery that provides a 
fairly comprehensive evaluation of functions in about 3.5 to 
4 hours (White & Stern, 2003). It offers a separate screening 
module to determine the presence/absence of impaired per- 
formance and the need for additional follow-up testing with 
any of the main NAB modules. Table 6-2 shows the six NAB 
modules. Each NAB module has two equivalent/parallel 
forms (Form 1 and 2) and each form consists of 33 individual 
tests. Of note, each module contains one daily living task that 
is designed to be congruent with an analogous real-world be- 
havior. Because each test was normed on the same standardi- 
zation sample, the examiner can use a single set (rather than 
several different sets) of normative tables including appropri- 
ate age, sex, and education corrections. These coordinated 
norms allow for within- and between-patient score compar- 
isons across the NAB tests. 

For children, the NEPSY (Korkman et al., 1998) is the first 
instrument designed exclusively and a priori as a neuropsy- 
chological battery for children. Although there are other neu- 
ropsychological batteries for children, they are based on 
modifications and downward extensions of existing adult bat- 
teries. 

Screening Tools 

Comprehensive batteries provide considerable data but can be 
time consuming. Another approach involves brief screening 
instruments such as the MMSE (Folstein et al., 1975), DRS 
(Jurica et al., 2001; Mattis, 1976), and RBANS (Randolph, 
1998). 

The MMSE is very popular. Most studies report that the 
MMSE summary score is sensitive to the presence of demen- 
tia, particularly in those with moderate to severe forms of 
cognitive impairment. However, it is less than ideal in those 
with mild cognitive deficits. 

The DRS is more comprehensive than the MMSE. It more 
accurately tracks progression of decline and is better able to 
discriminate among patients with dementia. Nonetheless, the 
clinician may prefer the MMSE, particularly with those indi- 
viduals who have difficulty concentrating for longer than 5 to 10 minutes. Further, although sensitive to differences at the
lower end of functioning, the DRS may not detect impair- 
ment in the higher ranges of intelligence, since it was devel- 
oped to avoid floor effects in clinically impaired populations 
rather than ceiling effects in high-functioning individuals. 

The RBANS appears to be more sensitive to impairment 
than either the MMSE or the DRS. However, it too was de- 
signed for use with healthy adults as well as people with mod- 
erate/severe dementia, suggesting that it may have limited 
utility at the higher end of the intellectual continuum. 



PREMORBID ESTIMATION 

In neuropsychological assessment, the diagnosis of impairment 
requires some standard against which to compare current per- 
formance. The usefulness of comparing a person's performance to some population average score depends on how closely the
individual matches the population in terms of demographics 
and past experiences. However, the predictions are to a hypo- 
thetical average rather than to the specific individual under 
consideration (Franzen et al., 1997). The same test score can
represent an entirely normal level of functioning for one 
individual and yet a serious decline for another (Crawford 
et al., 2001). Therefore, it is necessary to compare performance 
against an individualized comparison standard. Since premor- 
bid neuropsychological test data are rarely available, it becomes 
necessary to estimate an individual's premorbid level of func- 
tioning. 

Self- report is not a reliable basis for the estimation of pre- 
morbid ability. Even individuals with no external incentives to 
misrepresent their past achievements tend to inflate their re- 
call of their high school grades and past achievements. Studies 
of college students (Bahrick et al., 1996) and of psychiatric
patients (Johnson-Greene et al., 1997) reveal that accuracy of 
grade recall declines with actual letter grade; namely, worse 
actual achievement is associated with more inflated reports. 
There are probably a variety of reasons that patients, and non- 
patients, overestimate their achievements. Some may inflate 
their preinjury ability or deny premorbid difficulties as a re- 
sponse to an adversarial context (Greiffenstein & Baker, 2001; 
Greiffenstein et al., 2002); others may misrepresent their abili- 
ties because of cognitive compromise. If a clinician interprets 
test scores in the context of exaggerated reports of past per- 
formance, he or she may overestimate the extent of a patient's 
decline and may infer deterioration when none exists (Glad- 
sjo et al., 1999). 

Past school, employment, or military records may provide clues regarding premorbid ability. Relying on school records avoids the pitfalls of accepting unsubstantiated self-report, although grades and on-the-job performance reports are confounded by the subjectivity of the various raters' judgments. Nonetheless, there is evidence that grade point average
(GPA) is related to subsequent neuropsychological test per- 
formance in well-motivated neurologically intact individuals, 
suggesting that grade markings can provide an important 
context for interpretation of such measures (Greiffenstein & 
Baker, 2003). 

Standardized test scores from school records can be in- 
valuable as indicators of preinsult ability. In addition, it is 
known that achievement tests show moderate/strong cor- 
relations with measures of intelligence (e.g., Baade & Schoen- 
berg, 2004; the Psychological Corporation, 2002). Spelling 
scores correlate less with Wechsler FSIQ than do reading and 
mathematics scores. Recently, Baade and Schoenberg (2004) 
suggested a procedure to predict WAIS-R FSIQ using group- 
administered achievement test scores. However, additional 
studies that examine the relations between achievement tests
(e.g., American College Test, Scholastic Achievement Test) 
given during school years and measures such as the WAIS- 
III given many years later are needed to ensure the adequacy 
of this method. Further, historical data are often difficult to 
acquire or may not even exist. Therefore, other approaches 
to estimating premorbid cognitive functioning are neces- 
sary. 

Premorbid Estimation in Adults 

The most investigated procedures to estimate premorbid abil- 
ity are based on (a) demographics alone, (b) current perfor- 
mance on tasks thought to be fairly resistant to neurological 
insult, and (c) combinations of demographics and current 
performance. We consider each of these methods in the fol- 
lowing sections. It should be noted that much of the work 
conducted in this area has focused on premorbid estimation 
of IQ (and typically as measured by the Wechsler scale); how- 
ever, increasingly efforts are being directed to other functions 
as well. 

Interestingly, while methods to predict premorbid ability 
receive considerable study in the literature, practitioners tend 
not to use these measures in clinical practice (Smith-Seemiller 
et al., 1997). The extent to which this may reflect a lack of 
awareness of these measures, as opposed to the perception 
that they are unnecessary or ineffective, is unclear (Smith- 
Seemiller et al., 1997). Some of the limitations of these meth- 
ods are that they may not detect patients with clinically 
significant impairment, either (a) because of inadequate sen- 
sitivity (e.g., error band is too wide) or (b) because of inade- 
quate resistance of the predictor to withstand the effects of the 
disease. However, our view is that it is worthwhile to use one 
of these methods. If the discrepancy between estimated and 
obtained performance does not exceed the cutoff, then the 
technique has failed to show evidence of abnormality and the 
patient may still have clinically significant decline; however, if 
the obtained discrepancy does exceed the cutoff, then the re- 
sult may have clinical value (Graves et al., 1999). Although the 
various methods reviewed have limited sensitivity, it is impor- 
tant to bear in mind that clinicians' informal estimates are 
even less accurate (Crawford et al., 2001; Williams, 1997). 

Demographic Prediction 

In this approach, premorbid ability is estimated on the basis 
of demographic variables, such as age, education, occupation, 
sex, and ethnicity. In the case of IQ, education, race, and occu- 
pation are the strongest predictors and these variables are 
combined in multiple regression equations to predict the var- 
ious IQ scores. The power of demographics in predicting 
Wechsler IQ scores is highest for FSIQ and VIQ (and VCI). 
Thus, Wilson et al. (1978) evaluated the ability of five demo- 
graphic variables (age, sex, race, education, and occupation) 
to predict WAIS IQs and obtained moderately favorable re- 
sults. The equations were able to predict 54%, 53%, and 42% 
of the variance in FSIQ, VIQ, and PIQ, respectively. The origi- 



nal WAIS was revised in 1981 and Barona et al. (1984) up- 
dated the formulas for use with the revision, the WAIS-R (see 
Spreen & Strauss, 1998, for the formulas and worksheet). 
These indices were able to account for only 36%, 38%, and 
24% of FSIQ, VIQ, and PIQ scores, respectively. In addition, 
the standard errors of estimate for the regression equations 
were rather large (e.g., 12.14 for WAIS-R FSIQ). The smaller 
proportion of variance explained in nonverbal ability has also 
been noted by Crawford and Allan (1997). They used the de-
mographic approach to predict the WAIS-R IQ in the United 
Kingdom. Their regression equations predicted 53%, 53%, 
and 32% of the variance in FSIQ, VIQ, and PIQ, respectively. 
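The mechanics of such demographic prediction can be illustrated with a minimal sketch (in Python). It does not reproduce the published Wilson, Barona, or Crawford equations; it simply shows that a regression-based point estimate carries a wide error band, here using the WAIS-R FSIQ standard error of estimate of 12.14 noted above. The function and variable names are ours.

Z_90 = 1.645                    # z value for a two-sided 90% interval
SEE_WAIS_R_FSIQ = 12.14         # standard error of estimate reported for the Barona et al. FSIQ equation

def prediction_interval(predicted_iq, see=SEE_WAIS_R_FSIQ, z=Z_90):
    """Return the lower and upper bounds of a 90% band around a regression-based IQ estimate."""
    return predicted_iq - z * see, predicted_iq + z * see

# A demographically predicted FSIQ of 102 is compatible with premorbid scores
# ranging from the low 80s to the low 120s, a band of roughly 40 points.
low, high = prediction_interval(102.0)
print(round(low), round(high), round(high - low))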

Methods for predicting premorbid IQ involving demo- 
graphic factors alone have also been developed for the most 
recent version of the WAIS (WAIS-III) based on a subsample 
of the Wechsler standardization sample (see Table 6-3). The 
WTAR Manual (2001; see review in this volume) provides 
regression-based indices and tables (separately for ages 16-19 
and 20-89 years) that allow the examiner to predict WAIS-III 
VIQ, PIQ, FSIQ, and VCI from a combination of educational 
level, race/ethnicity, and sex. The strongest predictor is educa- 
tion, followed by ethnicity. The contribution of sex is larger 
for minority groups than for Caucasians. As was the case with 
earlier Wechsler versions, prediction is better for VIQ, VCI, 
and FSIQ than for PIQ. 

Demographic methods (formal regression equations) have 
the advantage of being applicable to a wide variety of clients 
and, unlike performance on cognitive tests, they are not subject 
to decline in clinical conditions (e.g., dementia). They are also 
not compromised by suboptimal effort — a problem that may 
plague performance-based measures, particularly in medical- 
legal contexts. In fact, large discrepancies between expected 
and obtained scores may even prove helpful in identifying in- 
sufficient effort (Demakis et al., 2001). In addition, regression 
equations are more accurate than clinicians' informal esti- 
mates based on the same information (Crawford et al., 2001). 
Actuarial methods apply the optimal weights to the demo- 
graphic predictor variables, thereby maximizing accuracy, 
whereas clinical estimates are often based on vague or dis- 
torted impressions of cognition-demographic relationships 
(Crawford et al., 2001). Actuarial methods also have higher 
rates of interrater reliability than clinical judgment because 
they limit subjective estimates (Barona et al., 1984). 

There are however, a number of concerns associated with 
demographic methods. First, the band of error associated 
with these various demographic equations is considerable (see 
also Basso et al., 2000). For example, the predicted WAIS-III 
FSIQ for a white male aged 20 to 89 years with 12 years of ed- 
ucation is 102; however, the 90% confidence interval has a 
range of almost 40 points (83-121). As such, we may be 90% 
confident that the individual may have had a premorbid intel- 
ligence that was either low average, average, above average, or 
superior! Unless a very large decline has occurred, it may be 
very difficult to decide whether a patient's obtained IQ is gen- 
uinely less than expected. Further, given that 68% of the 
Wechsler normative sample obtained IQs between 85 and 115, the demographic prediction renders minimal improvement over an estimate based on base-rate information alone (Basso et al., 2000). In addition, the demographic method yields severely restricted IQ ranges and is affected by regression to the mean, resulting in serious overestimation of premorbid IQ at the lower end and underestimation at the higher end of ability (e.g., Barona et al., 1984; Basso et al., 2000; Langeluddecke & Lucas, 2004; Sweet et al., 1990). As a consequence, those at the higher end of the IQ continuum who have suffered a decline may go undetected, while those at the lower end of the IQ distribution risk being diagnosed with impairment when none has occurred.




Table 6-3 Examples of Methods to Predict Premorbid Ability in Adults

Method | Tool | Prediction
Demographics only | WTAR Manual (age, educational level, race/ethnicity, and gender) | WAIS-III VIQ, PIQ, FSIQ, VCI
Current performance | NART/NAART/AMNART | WAIS-R VIQ, PIQ, FSIQ; FAS; PASAT; Raven's Progressive Matrices; Doors and People Test
Current performance | WTAR | WAIS-III VIQ, PIQ, FSIQ, VCI, POI, WMI, PSI; WMS-III Immediate Memory, General Memory, Working Memory
Current performance | SCOLP Spot-the-Word | IQ; Boston Naming Test; FAS
Current performance | WRAT-R/3 Reading | IQ
Current performance | Word Accentuation Test | WAIS; Raven's Progressive Matrices
Combined method | WTAR and demographics | WAIS-III VIQ, PIQ, FSIQ, VCI, POI, WMI, PSI; WMS-III Immediate Memory, General Memory, Working Memory
Combined method | NART/NAART/AMNART and demographics | WAIS-R FSIQ; WAIS-R Vocabulary; Raven's SPM
Combined method | SCOLP Spot-the-Word and demographics | FAS
Combined method | OPIE-3 (demographics and WAIS-III Vocabulary, Matrix Reasoning, Information, Picture Completion) | WAIS-III FSIQ, VIQ, PIQ

Note: The relevant prediction equations are located in the chapters on the WAIS-III (OPIE), NART, SCOLP, Raven's, PASAT, FAS, and BNT.




Accordingly, the demographic indices should be used with 
considerable caution in the individual case since much pre- 
diction error may be expected. While they may be of use with 
individuals whose premorbid IQ is likely to have been in the 
average range, they should not be used to estimate the pre- 
morbid ability of exceptional individuals such as the gifted, 
mentally handicapped, or even slow learners (e.g., Basso et al., 
2000; Langeluddecke & Lucas, 2004; Ryan & Prifitera, 1990). 
Performance-based methods (e.g., OPIE, NART, WTAR), con- 
sidered below, tend to provide better predictors of IQ than de- 



mographic indices (e.g., Basso et al., 2000; Blair & Spreen,
1989; Crawford, 1992; Langeluddecke & Lucas, 2004). 

Demographic indices alone may be preferred for estimat- 
ing premorbid IQ in patients for whom reliance on cognitive 
performance (e.g., NART, WRAT-3, WTAR) would be inap- 
propriate (e.g., patients with moderate or advanced dementia 
or aphasia, with severe brain injury, or who are suspected 
of insufficient effort). It is worth noting, however, that the 
deficits in cases of moderate/severe dysfunction are often clin- 
ically obvious and establishing a patient's premorbid IQ may 
be less critical in such cases. Caution should also be exercised 
when using demographic-based estimates with patients suf- 
fering from disorders that may have a lengthy prodromal 
phase. In such cases, the prodromal phase may have resulted 
in a failure to achieve patients' educational and occupational 
potential. Of course, a similar degree of caution must be exer- 
cised with performance-based estimates of premorbid ability 
(Crawford et al., 2001).

The power of demographics to predict functions other than 
IQ appears limited. For example, low levels of association have 
been reported between demographic variables (age, education, 
race/ethnicity, sex, geographic region) and performance on
measures of memory (Gladsjo et al., 1999; Williams, 1997), 
including the WMS-III indices (correlations ranging from .02 
to .31 in the U.S. standardization sample; the Psychological 
Corporation, 2001). 

Current Performance 

Assumptions that overall premorbid functioning can be reli- 
ably predicted from a single "spared" cognitive domain ("best 
performance" method) have been criticized by research show- 
ing that normal individuals demonstrate significant variabil- 
ity in functioning across different cognitive domains (e.g., 
Taylor & Heaton, 2001). As a consequence, the final result will 
be a systematic overestimation of intellectual impairment or 
deficits (Mortensen et al., 1991; but see Hoofien et al., 2000). 
Lezak et al. (2004) caution that an estimate of premorbid abil- 
ity should never be based on a single test score but should take 
into account as much information as possible about the 
patient. 

An alternative approach relies on the assessment of current 
abilities that are considered to be relatively resistant to the ef- 
fects of cerebral insult (i.e., "hold" tests). This method uses 
test scores obtained during the formal test session to estimate 
premorbid ability. These are reviewed below. 

Wechsler Subtests. Wechsler (1958) suggested that tests 
such as Vocabulary, Information, and Picture Completion 
were minimally affected by aging and brain im-
pairment in adults and could therefore serve to estimate over- 
all levels of premorbid cognitive function. However, this 
method has significant limitations. Although Vocabulary is 
among the most resistant of the Wechsler subtests, perfor- 
mance is markedly impaired in a range of clinical conditions. 
Vocabulary scores are therefore likely to seriously underesti- 
mate premorbid intelligence (Crawford, 1989; Lezak et al., 
2004). The Information subtest reflects a person's general 
fund of knowledge. However, this score may be misleading in 
examinees with poor educational opportunities. In addition, 
there is evidence that tests such as Information and Picture 
Completion demonstrate decline following neurologic injury 
(e.g., Russell, 1972). 

Word Reading Tests. Nelson and O'Connell (Nelson, 1982; 
Nelson & O'Connell, 1978) proposed that a reading test for ir- 
regularly spelled words would be a better indicator of pre- 
morbid ability based on the rationale that (a) reading skills 
are fairly resistant to brain insult, and (b) that irregularly 
spelled words cannot be decoded phonologically and hence 
rely on previously acquired skills. They developed in Britain a 
test called the National Adult Reading Test (NART), which 
consists of 50 irregular words (e.g., debt, naive; see review in 
this volume). Subsequently, the NART was standardized 
against the WAIS-R (Crawford, 1992; Ryan & Paolo, 1992). 
Blair and Spreen (1989) adapted the NART for use with a 
North American population (North American Adult Reading 
Test or NAART or NART-R). Similar versions (AMNART: 



Grober & Sliwinski, 1991; ANART: Schwartz & Saffran, 1987, 
cited in Grober & Sliwinski, 1991) have been developed for 
use in the United States. 

NART scores correlate well with measures of IQ given con- 
currently. In addition, the NART has been validated against an 
actual premorbid criterion. Crawford et al. (2001) followed 
up 177 people who were given an IQ test at age 11. They found 
a correlation of .77 between these scores and NART scores at 
age 77. 

NART (NAART/AMNART) scores correlate highly with 
FSIQ and VIQ and less with PIQ, a not surprising finding 
since it is a verbal test. Although much of the research on the 
NART has focused on the estimation of premorbid Wechsler
IQ, prediction equations have also been developed for use 
with tests such as the FAS, Raven's, PASAT, and Doors and 
People Test (see Table 6-3). 

Prediction of IQ tends to be more accurate with equations 
based on NART (and its various adaptations) scores than with 
the Wechsler Vocabulary subtest (Crawford et al., 1989; 
Sharpe & O'Carroll, 1991) or with demographic variables 
(Blair & Spreen, 1989; Bright et al., 2002; Grober & Sliwinski,
1991; Ryan & Paolo, 1992). Further, while a fairly large decline 
in cognitive ability (about 15-20 points) may need to occur 
before the reading test can reliably identify abnormality, other 
methods (e.g., demographic estimates) require larger declines 
(about 20-25 points) (e.g., Graves et al., 1999).

The NART is not insensitive to decline; deterioration in 
NART performance does occur in patients with a variety of 
neurological conditions, for example, in cases of moderate 
to severe dementia (Patterson et al., 1994; Stebbins, Wilson, 
et al., 1988, 1990) and in those with mild dementia who have 
accompanying linguistic deficits (Stebbins, Gilley, et al., 
1990). However, the NART appears less sensitive to neurologi- 
cal compromise than other measures (e.g., Wechsler Vocabu- 
lary). In short, it may prove useful in providing a lower limit 
to the estimate of premorbid IQ (Stebbins, Gilley, et al., 1990; Stebbins, Wilson, et al., 1990).

One should also note that the NART cannot be used with 
aphasic or dyslexic patients, nor in patients with significant 
articulatory or visual acuity problems (Crawford, 1989, 1992). 
Further, like any regression-based procedure, the NART over- 
estimates FSIQ at the lower end of the IQ range and underes- 
timates it at the higher end (Ryan & Paolo, 1992; Wiens et al.,
1993). 

Nelson, the NART's author, selected English words with an 
irregular pronunciation whose proper reading would depend 
on the previous knowledge of the subject rather than on 
phonological decoding skills. In the Spanish language, this 
cannot be done because every word is read in a regular way 
(with phonological decoding). Del Ser et al. (1997) therefore 
developed a reading task (Word Accentuation Test, WAT) 
with an ambiguous graphic clue for the Spanish reader: the 
accentuation of 30 infrequent words written in capital letters 
without accent marks. Its correlations with the WAIS (.84) 
and the Raven's Progressive Matrices (.63) are high, and the 
task appears fairly resistant to mild cognitive deterioration. 






The test was developed in Spain, and since Spanish has 
marked geographical differences, a version has also been de- 
veloped for use in Argentina (Burin et al., 2000). The reliabil- 
ity and validity of the WAT among Spanish speakers in other 
regions (e.g., the United States) has not yet been determined. 

The WTAR (the Psychological Corporation, 2001; see also 
description in this volume) is a relatively new tool that is sim- 
ilar to the NART in that it requires reading of irregularly 
spelled words. Although its validity is less researched, only the 
WTAR, not the NART (or its variants), has been validated 
against the WAIS-III. Like the NART, WTAR scores correlate 
highly with FSIQ and the verbal composites (VIQ and Verbal 
Comprehension Index) and only moderately well with other 
Wechsler scores. 

While the WTAR appears to provide a reasonable estimate 
of intellectual functioning, its utility in other domains (e.g., 
memory) appears more modest. Nonetheless, it displays a bet- 
ter prediction of WAIS-III and WMS-III scores than demo- 
graphic variables. For example, the amount of variance 
accounted for by WTAR-predicted FSIQ (56%) is higher than 
that based on demographics alone (36%; the Psychological 
Corporation, 2001). Like the NART, it is relatively but not 
completely resistant to cerebral insult; also like the NART, it 
should not be used in those with a preexisting learning disor- 
der or in those who suffer language or perceptual disorders. 

The SCOLP (Baddeley et al., 1992) Spot-the-Word test
(STW) is a brief lexical decision task consisting of 60 word- 
nonword pairs. Individuals have to identify the real word in 
each pair. Scores on the task correlate moderately well with 
crystallized ability. However, other tests (e.g., NART) prove 
more resistant to neurological impairment and correlate 
higher with Wechsler IQ (e.g., Watt & O'Carroll, 1999). Of 
note, equations based on the STW test have been developed 
for a variety of functions (see Table 6-3 and test descriptions 
elsewhere in this volume). In addition, because it does not in- 
volve a spoken response, the STW may be particularly useful 
for aphasic or dysarthric people. It can even be administered 
in a group setting to provide a gross estimate of ability. 

The reading subtest of the Wide Range Achievement Test 
(WRAT, WRAT-R, WRAT-3) is used more frequently than 
other reading tests (e.g., NART) as an indicator of premorbid 
intellectual status (Smith-Seemiller et al., 1997). WRAT Read-
ing shows a moderate relation with WAIS-R/WAIS-III IQ 
(r = .45-.63; Griffin et al., 2002; Johnstone et al., 1996; Wiens 
et al., 1993), although the relation is somewhat lower than 
that observed for the NART (Griffin et al., 2002) or the STW test (Lucas et al., 2003).

WRAT Reading scores do not remain stable in patients 
with a history of neurological insult, suggesting that this mea- 
sure can be affected by CNS disease (Johnstone & Wilhelm, 
1996) and other factors (e.g., motivational status). In addition, 
the task tends to underestimate Wechsler IQ, particularly in 
the high IQ ranges, to an even greater degree than the NART/ 
NAART (Griffin et al., 2002; Wiens et al., 1993). At the lower
end of the IQ range (e.g., below 89), the WRAT-R/WRAT-3 may, 
however, be more appropriate than the NART/NAART or 



demographic indicators (Griffin et al., 2002; Johnstone et al., 
1996; Wiens et al., 1993). In the average range, WRAT-3 and
NAART scores provide equivalent and adequate classification 
(Griffin et al., 2002). In short, no one estimation method ap- 
pears uniformly accurate, and different estimation methods 
are relatively better within specific IQ categories (see also 
Basso et al., 2000).

Accordingly, clinicians have to consider using multiple 
methods to derive estimates of premorbid ability rather than 
rely on a single estimate. In the event of differing classifications, 
clinicians are forced to rely on clinical judgment in weighing which methodology most accurately reflects premorbid functioning. Hybrid methods of premorbid estimation tend to provide more accurate estimates.

Combinations of Demographics 
and Current Performance 

Pairing test behavior with data from demographic variables 
appears to increase the power of prediction, producing less 
range restriction and less over- or underestimation of pre- 
morbid ability. The addition of demographics serves to buffer 
some of the effects of clinical status impacting cognitive per- 
formance (the Psychological Corporation, 2001); the inclu- 
sion of current performance indicators (e.g., reading test 
scores) can improve predictive accuracy particularly in those 
with unusual combinations (e.g., lower than expected reading 
ability given their educational achievement; Gladsjo et al., 
1999). 

Demographic variables (education level, race/ethnicity, 
and gender) and WTAR performance have been combined to 
predict intellectual (WAIS-III) and memory (WMS-III) func- 
tioning. The addition of demographic data to WTAR perfor- 
mance resulted in an increase of 4% to 7% over WTAR 
performance alone in the prediction of intellectual and mem- 
ory performance (the Psychological Corporation, 2001). In 
addition, the ranges of possible scores are increased modestly 
(the Psychological Corporation, 2001). Others have reported 
similar gains when combining demographics with scores on 
reading tests such as the NART (e.g., Grober & Sliwinski, 
1991; Watt & O'Carroll, 1999; Willshire et al., 1991).

Another method (OPIE, or Oklahoma Premorbid Intelli- 
gence Estimate formulas) combines select WAIS-III subtests 
that are relatively insensitive to neurological dysfunction (Vo- 
cabulary, Matrix Reasoning, Information, and Picture Com- 
pletion) and demographic information (i.e., age, education, 
ethnicity, region of country, and gender) in prediction algo- 
rithms. Such formulas were initially developed by Krull et al. 
(1995) and Vanderploeg et al. (1996) for use with the WAIS-R 
and have been shown to correlate highly with premorbid abil- 
ity (premorbid military data) (Hoofien et al., 2000). Formulas
have recently been generated by Schoenberg et al. (2002) for 
the WAIS-III. Six OPIE-3 algorithms were developed from the 
WAIS-III standardization sample to predict FSIQ (other for- 
mulas are available to predict VIQ and PIQ; see Premorbid 
Estimation in our review of the WAIS-III in this volume). The
subtests comprising the formulas are as follows (note that ST 
refers to subtest): 

• OPIE-3 (4 ST): Vocabulary, Information, Matrix Reasoning, and Picture Completion
• OPIE-3 (2 ST): Vocabulary and Matrix Reasoning
• OPIE-3V: Vocabulary only
• OPIE-3P: Matrix Reasoning and Picture Completion
• OPIE-3MR: Matrix Reasoning only
• OPIE-3 (Best): OPIE-3V used if the Vocabulary age-scaled score is higher, OPIE-3MR used if the Matrix Reasoning age-scaled score is higher, and OPIE-3 (2 ST) used if the age-scaled scores are equivalent (this selection rule is sketched below)
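A minimal sketch (in Python) of the OPIE-3 (Best) selection rule follows. The three estimator functions passed in are placeholders for the published Schoenberg et al. (2002) regression equations, which are not reproduced here; only the selection logic comes from the description above, and the function names are ours.

def opie3_best(vocab_scaled, matrix_scaled, opie3v, opie3mr, opie3_2st):
    """Choose among OPIE-3V, OPIE-3MR, and OPIE-3 (2 ST) based on the higher age-scaled score."""
    if vocab_scaled > matrix_scaled:
        return opie3v(vocab_scaled)                 # Vocabulary is the higher age-scaled score
    if matrix_scaled > vocab_scaled:
        return opie3mr(matrix_scaled)               # Matrix Reasoning is the higher age-scaled score
    return opie3_2st(vocab_scaled, matrix_scaled)   # the two age-scaled scores are equivalent

# Usage with purely illustrative stand-in estimators (not the published formulas):
estimate = opie3_best(
    vocab_scaled=12, matrix_scaled=9,
    opie3v=lambda v: 70 + 3 * v,
    opie3mr=lambda m: 72 + 3 * m,
    opie3_2st=lambda v, m: 70 + 1.5 * (v + m),
)
print(estimate)   # here the Vocabulary-based estimator is selected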

The OPIE-3 algorithms yield estimates of FSIQ that closely 
match the general population mean (Langeluddecke & Lucas, 
2004; Schoenberg et al., 2002). They show a better range of 
scores in comparison with demographically based indices 
(Langeluddecke & Lucas, 2004) as well as NART-based esti- 
mates (see NART chapter in this volume), and while suscepti- 
ble to relatively large errors of prediction (standard errors of estimation, SEEs), other methods (i.e., demographic indexes) may yield larger prediction errors (Basso et al., 2000). The OPIE-3V, OPIE-3MR, and OPIE-3 (2 ST) appear to be fairly robust where intellectual impairment is mild/moderate; the OPIE-3 (Best) is suitable for cases of severe brain injury (Langeluddecke & Lucas, 2004; Schoenberg et al., 2003). However,
OPIE estimates, particularly those including two nonverbal 
subtests, are susceptible to neurological impairment (Lange- 
luddecke & Lucas, 2004). 

The differential clinical utility of the OPIE-3 and reading 
test-demographics (e.g.,WTAR-Demographics) approaches is 
not known. The OPIE-3 correlations with WAIS-III IQ are 
likely inflated due to the lack of independence (WAIS-III sub- 
tests are used in the computation of IQ scores), which also in- 
flates the overlap in score distributions (Schoenberg et al., 
2002); the OPIE-3 algorithms provide a greater range of IQ es- 
timates (depending on the equation, range: 72-112 IQ score 
points; Schoenberg et al., 2002) than the WTAR used alone (de- 
pending on the IQ index, range: 30-49; the Psychological Cor- 
poration, 2001, p. 45) or when combined with demographic 
variables (range: 38-50; the Psychological Corporation, 2001, 
p. 57). Similarly, the proportion of OPIE-3 FSIQ estimates that 
fall within ±10 points of actual IQ (75%-93%; Schoenberg et al., 2002) is greater than the WTAR alone (70.4%), demo-
graphics alone (60.9%), or the combined WTAR-demographics 
approach (73.4%; the Psychological Corporation, 2001, p. 59). 

Despite the potential theoretical and psychometric prob- 
lems of the OPIE estimates (regression to the mean, high cor- 
relations between predictors and criterion), the available data 
suggest that the OPIE-3 is useful in predicting premorbid abil- 
ity in neurologically impaired populations (Langeluddecke & Lucas, 2004; Schoenberg et al., 2003). An advantage of the
OPIE-3 is that an estimate can be obtained when the WTAR 
(or NART) is not administered. Nonetheless, both methods 
(OPIE-3 and WTAR-demographics) require additional valida- 
tion studies, including against an actual premorbid criterion. 



Table 6-4 Examples of Methods to Predict Premorbid Ability in Children

Method | Tool | Prediction
Demographics only | Parental education, ethnicity | WISC-III VIQ, PIQ, and FSIQ
Performance of family members | IQ of parents or siblings | IQ
Combined methods | Parent ratings, maternal ethnicity, SES, word recognition | CVLT; VMI; WISC-III PIQ





Premorbid Estimation in Children 

Methods to estimate premorbid cognitive functioning in 
adults have been well researched. In contrast, researchers have 
directed little attention to the development of prediction 
methods for children. The issue is also more complicated in 
children as their skills are not fully developed, and unlike 
adults, children often do not achieve stable levels of function- 
ing prior to trauma or disease onset (Redfield, 2001). Table 6-4 
shows that a number of methods have been used to predict 
premorbid ability in children. These include (a) demographic 
information, (b) familial intellect, and (c) a combination of so- 
ciodemographic information and current performance. Over- 
all, attempts to develop measures of premorbid or expected 
ability for children have generally been less successful than 
for adults, typically producing estimates with inadequate pre- 
cision for clinical use (Klesges & Sanchez, 1981; Yeates & Tay- 
lor, 1997). 



Table 6-5 Regression Equations to Predict WISC-III IQ Scores

Outcome Measure | Regression Equation | SE(est) | R²
FSIQ | 5.44 (mean education) + 2.80 (white/non-white) - 9.01 (black/non-black) + 81.68 | 12.56 | .28
VIQ | 5.71 (mean education) + 4.64 (white/non-white) - 5.04 (black/non-black) + 79.06 | 12.79 | .27
PIQ | 4.18 (mean education) + 0.26 (white/non-white) - 11.85 (black/non-black) + 88.09 | 13.35 | .20

Note: Ethnicity is composed of two coded variables, white/non-white (white = 1, non-white = 0) and black/non-black (black = 1, non-black = 0); Hispanic examinees are uniquely identified by a code of 0 on both white/non-white and black/non-black. The regression equations had information for only three categories (white, black, and Hispanic); therefore, the regression equations should not be used with other ethnic groups. For parental education, the mean of the codings for the mother's and father's level of education is used in the regression formulas, or the single parent's educational code if data from only one parent are available (parental education: 0-8 years = 1; 9-11 years = 2; 12 years or GED = 3; 13-15 years = 4; 16+ years = 5).

Source: From Vanderploeg et al., 1998. Reprinted with permission of APA.
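As a worked illustration, the following minimal sketch (in Python) applies the Table 6-5 FSIQ equation; the coefficients, the 0/1 ethnicity codes, the education codings, and the SEE come from the table, while the function and variable names are ours.

EDUCATION_CODES = {"0-8": 1, "9-11": 2, "12_or_GED": 3, "13-15": 4, "16+": 5}
SEE_FSIQ = 12.56   # standard error of estimation for the FSIQ equation in Table 6-5

def predicted_wisc3_fsiq(mean_parental_education_code, white, black):
    """Table 6-5 FSIQ equation; white and black are the 0/1 codes defined in the table note
    (Hispanic examinees are coded 0 on both)."""
    return 5.44 * mean_parental_education_code + 2.80 * white - 9.01 * black + 81.68

# Example: both parents completed high school (education code 3), white child.
code = EDUCATION_CODES["12_or_GED"]
fsiq_hat = predicted_wisc3_fsiq((code + code) / 2, white=1, black=0)
print(round(fsiq_hat, 1))   # 100.8, with an SEE of about 12.6 points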






Reynolds and Gutkin (1979) used demographic variables 
(father's occupational status, child's gender, ethnicity, urban 
versus rural residence, and geographic region) from the 
WISC-R to predict IQ scores. The equations had multiple cor- 
relations of .44, .44, and .37 with FSIQ, VIQ, and PIQ, respec- 
tively. Similar equations have been developed on the basis of 
the WISC-III standardization data. Vanderploeg et al. (1998) 
used mean parental education and ethnicity to predict IQ 
scores. The equations had correlations with FSIQ, VIQ, and 
PIQ of .53, .52, and .45, respectively, accounting for slightly 
more variance in actual IQ scores than the formulas devel- 
oped by Reynolds and Gutkin (1979) from the WISC-R stan- 
dardization data. Equations that relied on demographics alone proved just as effective as equations combining demographic variables with a WISC-III subtest scaled score in a best-performance fashion (i.e., including Vocabulary or Picture Completion, whichever produced the higher estimated score); both approaches were significantly more effective than the individual Vocabulary-plus-demographics or Picture Completion-plus-demographics formulas. Given that a purely demographic approach proved just as effective as the more complicated best-performance approach, it may be a better choice for premorbid estimation in children (Vanderploeg et al., 1998). The equations are provided in Table 6-5. Note that the equations refer to the WISC-III, not the more up-to-date WISC-IV, and that the demographic variables account for less than 30% of the variance in IQ scores. Note, too, the large standard errors of estimation, which suggest that a very large discrepancy between observed and expected values will be needed to infer decline. In addition, bear in mind that any regression-based approach predicts scores toward the population mean, overestimating scores of persons with low ability and underestimating scores of those with high ability.


Table 6-6 Cumulative Percentages of Expected Discrepancies (Regardless of Sign) Between Obtained and Predicted IQ for Several Estimation Methods

Amount of Discrepancy (IQ points) | One Parent's IQ (a) | One Sibling's IQ (a) | Two Parents' Mean IQ (a) | Demographic Variables (b) | Previous IQ (a)
30 | 3 | 2 | 2 | 2 |
29 | 3 | 3 | 3 | 2 |
28 | 4 | 3 | 3 | 3 |
27 | 5 | 4 | 4 | 3 |
26 | 6 | 5 | 5 | 4 |
25 | 7 | 6 | 5 | 5 | 0.01
24 | 8 | 7 | 6 | 6 | 0.01
23 | 9 | 8 | 8 | 7 | 0.02
22 | 11 | 10 | 9 | 8 | 0.04
21 | 12 | 11 | 11 | 9 | 0.1
20 | 14 | 13 | 12 | 11 | 0.1
19 | 16 | 15 | 14 | 13 | 0.2
18 | 19 | 17 | 17 | 15 | 0.4
17 | 21 | 20 | 19 | 18 | 0.6
16 | 24 | 23 | 22 | 20 | 1
15 | 27 | 26 | 25 | 23 | 2
14 | 30 | 29 | 28 | 27 | 2
13 | 34 | 33 | 32 | 30 | 4
12 | 38 | 36 | 36 | 34 | 5
11 | 42 | 41 | 40 | 38 | 8
10 | 46 | 45 | 44 | 43 | 11
9 | 51 | 50 | 49 | 47 | 15
8 | 56 | 55 | 54 | 52 | 20
7 | 61 | 60 | 59 | 58 | 26
6 | 66 | 65 | 64 | 63 | 33
5 | 71 | 71 | 70 | 69 | 42
4 | 77 | 76 | 76 | 75 | 52
3 | 83 | 82 | 82 | 81 | 63
2 | 88 | 88 | 88 | 87 | 75
1 | 94 | 94 | 94 | 94 | 87

Note: All values refer to discrepancies between obtained and estimated Full Scale IQ; discrepancies between obtained IQs of family members may be larger. For directional comparisons (e.g., the cumulative frequency with which an obtained IQ is below predicted IQ by a specified amount), divide the entries by 2.
(a) Estimated IQ is calculated by the equation IQest = r(x - 100) + 100, where IQest is the estimated IQ score, r is the correlation coefficient with IQ for a particular estimation method (.42 for one parent's IQ, .47 for one sibling's IQ, .50 for two parents' mean IQ, .91 for a previous IQ), and x is the familial or previous IQ used as an estimator.
(b) Equation for estimating IQ using demographic variables from Vanderploeg et al., 1998.

Source: From Redfield, 2001. Reprinted by kind permission of Psychology Press.







Some have suggested using parental or sibling intelligence 
as an index of expected intellectual ability of children (e.g., 
Baron, 2000; Reynolds, 1997) — a recommendation based on 
the well-known finding that IQs among biological relatives 
are correlated. IQ is indeed significantly correlated within families (r values ranging from .42 to .50), but at a level that
limits the precision of IQ estimates based on measures of 
other family members' abilities (Redfield, 2001). The confi- 
dence limits around an IQ estimate based on familial intelli- 
gence, with or without the addition of demographic variables, 
span a wide range of values. To assist clinicians wishing to 
compare a child's IQ with estimates based on familial IQ or 
demographic variables, Table 6-6 lists cumulative percentages 
of expected discrepancies between obtained and predicted IQ 
for several estimation methods. Reference to Table 6-6 shows 
that sizeable discrepancies from familial IQ are required to in- 
fer abnormality, reducing the sensitivity of this approach (Redfield,
2001). Thus, while a significant discrepancy may be meaning- 
ful, the absence of one is often inconclusive. Redfield (2001) 
notes that estimates based on family members' IQ are not 
substantially more accurate than discrepancies derived from 
demographic indices. In short, there appears to be little justi- 
fication to actually test family members' IQ, because an 
equally accurate measure can be obtained from the less costly 
procedure of ascertaining selected demographic characteris- 
tics (Redfield, 2001). 
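A worked example may help. The following minimal sketch (in Python) applies the estimation formula given in the note to Table 6-6, IQest = r(x - 100) + 100, using the correlations listed there; the dictionary and function names are ours.

CORRELATIONS = {                 # correlation with the child's IQ, from the note to Table 6-6
    "one_parent": 0.42,
    "one_sibling": 0.47,
    "two_parents_mean": 0.50,
    "previous_iq": 0.91,
}

def estimated_iq(source_iq, method):
    """Regress the familial or previous IQ toward the population mean of 100."""
    r = CORRELATIONS[method]
    return r * (source_iq - 100) + 100

# Example: a parent with an IQ of 120 yields an expected child IQ of about 108.
# A child who obtains 95 would show a discrepancy of roughly 13 points, which,
# per Table 6-6, occurs in about a third of cases and so cannot support an inference of decline.
expected = estimated_iq(120, "one_parent")
print(round(expected, 1), round(expected - 95, 1))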

Attempts have also been made to predict premorbid ability 
using sociodemographic variables and measures of skills such 
as word recognition. Yeates and Taylor (1997) derived predic- 
tion equations from 80 children with orthopedic injuries be- 
tween the ages of 6 and 12 years. A combination of parent 
ratings of premorbid school performance (Child Behavior 
Checklist, CBCL), maternal ethnicity, family socioeconomic 
status, and children's word recognition skill (WJ-R) predicted 
13%, 36%, and 45% of the variance in three measures (re- 
spectively, a shortened version of the CVLT, Developmental 
Test of Visual-Motor Integration [VMI], and WISC-III pro- 
rated PIQ). Although prediction was statistically significant 
for all three outcomes, the regression equations were not es- 
pecially accurate for individual children, especially those with 
the lowest and highest scores. Less than two-thirds of the 
group had a predicted score within 10 points of their actual 
scores regardless of the outcome considered. 

In short, while clinicians can refer to multiple sources of 
information (i.e., demographics, family members' IQ, parent 
ratings, concurrent word reading skills) to predict premorbid 
functioning in children, the resulting estimates are not likely 
to be sufficiently accurate for individual cases, particularly 
those with milder forms of cerebral dysfunction. The various 



equations, however, may be of greater use in research contexts 
(e.g., to show that a group with TBI was similar to a group of
noninjured counterparts premorbidly).



REFERENCES 

Baade, L. E., & Schoenberg, M. R. (2004). A proposed method to esti- 
mate premorbid intelligence utilising group achievement mea- 
sures from school records. Archives of Clinical Neuropsychology, 
19, 227-243.

Baddeley, A., Emslie, H., & Nimmo-Smith, I. (1992). The Speed and
Capacity of Language-Processing Test. Suffolk, England: Thames 
Valley Test Company. 

Bahrick, H. P., Hall, L. K., & Berger, S. A. (1996). Accuracy and distor- 
tion in memory for high school grades. Psychological Science, 7, 
265-271. 

Baron, I. S. (2000). Clinical implications and practical applications of 
child neuropsychological evaluations. In K. O. Yeates, M. D. Ris,
& H. G. Taylor (Eds.), Pediatric neuropsychology: Research, theory, 
and practice (pp. 439-456). New York: Guilford Press. 

Barona, A., Reynolds, C. R., & Chastain, R. (1984). A demographi- 
cally based index of pre-morbid intelligence for the WAIS-R. 
Journal of Consulting and Clinical Psychology, 52, 885-887. 

Basso, M. R., Bornstein, R. A., Roper, B. L., & McCoy, V. L. (2000). 
Limited accuracy of premorbid intelligence estimators: A demon- 
stration of regression to the mean. The Clinical Neuropsychologist, 
14, 325-340. 

Bayley, N. (1993). Bayley Scales of Infant Development (2nd ed.; Bay- 
ley-II). San Antonio, TX: Psychological Corporation. 

Blair, J. R., & Spreen, O. (1989). Predicting premorbid IQ: A revision 
of the National Adult Reading Test. The Clinical Neuropsycholo- 
gist, 3, 129-136. 

Bright, P., Jaldow, E., & Kopelman, M. D. (2002). The National Adult
Reading Test as a measure of premorbid intelligence: A compari- 
son with estimates derived from premorbid levels. Journal of the 
International Neuropsychological Society, 8, 847-854. 

Brown, L., Sherbenou, R. J., & Johnson, S. K. (1997). Test of Nonverbal 
Intelligence: A language-free measure of cognitive ability (3rd ed.). 
Austin, TX: Pro-Ed. 

Burin, D. I., Jorge, R. E., Aizaga, R. A., & Paulsen, J. S. (2000). Estimation 
of premorbid intelligence: The word accentuation test — Buenos 
Aires version. Journal of Clinical and Experimental Neuropsychology, 
22, 677-685. 

Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test 
usage: Implications in professional psychology. Professional Psy- 
chology: Research and Practice, 31, 141-154. 

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor ana- 
lytic studies. Cambridge: Cambridge University Press.

Carroll, J. B. (1997). The three-stratum theory of cognitive abilities. In 
D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contempo-
rary intellectual assessment: Theories, tests and issues (pp. 122-130). 
New York: Guilford. 

Crawford, J. R. (1989). Estimation of premorbid intelligence: A re- 
view of recent developments. In J. R. Crawford & D. M. Parker 
(Eds.), Developments in clinical and experimental neuropsychology. 
London: Plenum. 

Crawford, J. R. (1992). Current and premorbid intelligence measures 
in neuropsychological assessment. In J. R. Crawford, D. M. Parker, 
& W. M. McKinlay (Eds.), A handbook of neuropsychological as- 
sessment. West Sussex: LEA. 






Crawford, J. R., & Allan, K. M. (1997). Estimating premorbid WAIS- 
R IQ with demographic variables: Regression equation derived 
from a U.K. sample. The Clinical Neuropsychologist, 11, 192-197. 

Crawford, J. R., Millar, J., & Milne, A. B. (2001). Estimating premor- 
bid IQ from demographic variables: A comparison of a regression 
equation vs. clinical judgement. British Journal of Clinical Psychol- 
ogy, 40, 97-105. 

Del Ser, T., Gonzalez-Montalvo, J-L, Martinez-Espinosa, S., Delgado- 
Villapalos, C., & Bermejo, F. (1997). Estimation of premorbid in-
telligence in Spanish people with the Word Accentuation Test and 
its application to the diagnosis of dementia. Brain and Cognition, 
33, 343-356. 

Demakis, G. J., Sweet, J. J., Sawyer, T. P., Moulthrop, M., Nies, K., & 
Clingerman, S. (2001). Discrepancy between predicted and ob- 
tained WAIS-R IQ scores discriminates between traumatic brain 
injury and insufficient effort. Psychological Assessment, 13, 240-248. 

Eisenstein, N., & Engelhart, C. I. (1997). Comparison of the K-BIT 
with short forms of the WAIS-R in a neuropsychological popula- 
tion. Psychological Assessment, 9, 57-62. 

Flanagan, D. P., & McGrew, K. S. (1997). A cross-battery approach in 
assessing and interpreting cognitive abilities: Narrowing the gap 
between practice and cognitive science. In D. P. Flanagan & 
J. L. Genshaft (Eds.), Contemporary intellectual assessment: Theo-
ries, tests, and issues. New York: Guilford Press. 

Flanagan, D. P., McGrew, K. S., & Ortiz, S.O. (2000). The Wechsler In- 
telligence Scales and Gf-Gc theory: A contemporary approach to in- 
terpretation. Boston: McGraw-Hill. 

Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). "Mini-mental
state": A practical method for grading the cognitive state of
patients for the clinician. Journal of Psychiatric Research, 12,
189-198. 

Franzen, M. D., Burgess, E. J., & Smith-Seemiller, L. (1997). Methods 
of estimating premorbid functioning. Archives of Clinical
Neuropsychology, 12, 711-738.

Gladsjo, J. A., Heaton, R. K., Palmer, B. W., Taylor, M. J., & Jeste, D. V. 
(1999). Use of oral reading to estimate premorbid intellectual and 
neuropsychological functioning. Journal of the International Neu- 
ropsychological Society, 5, 247-254. 

Golden, C. J., Purisch, A. D., & Hammeke, T. A. (1985). Luria- 
Nebraska Neuropsychological Battery: Forms I and II. Los Angeles: 
Western Psychological Services. 

Graves, R. E., Carswell, L. M. N., & Snow, W. G. (1999). An evaluation 
of the sensitivity of premorbid IQ estimators for detecting cogni- 
tive decline. Psychological Assessment, 11, 29-38. 

Greiffenstein, M. F., & Baker, W. J. (2001). Comparison of premorbid
and postinjury MMPI-2 profiles in late postconcussion claimants. 
The Clinical Neuropsychologist, 15, 162-170. 

Greiffenstein, M. F., & Baker, W. J. (2003). Premorbid clues? Preinjury
scholastic performance and present neuropsychological function-
ing in late postconcussion syndrome. The Clinical Neuropsycholo-
gist, 17, 561-573.

Greiffenstein, M. F., Baker, W. J., & Johnson-Greene, D. (2002). Actual
versus self-reported scholastic achievement of litigating postcon- 
cussion and severe closed head injury claimants. Psychological As- 
sessment, 14, 202-208. 

Griffin, S. L., Mindt, M. R., Rankin, E. J., Ritchie, A. J., & Scott, J. G. 
(2002). Estimating premorbid intelligence: Comparison of tradi- 
tional and contemporary methods across the intelligence contin- 
uum. Archives of Clinical Neuropsychology, 17, 497-507. 

Grober, E., & Sliwinski, M. (1991). Development and validation of a 
model for estimating premorbid verbal intelligence in the elderly. 



Journal of Clinical and Experimental Neuropsychology, 13, 
933-949. 

Hays, J. R., Reas, D. L., & Shaw, J. B. (2002). Concurrent validity of the 
Wechsler Abbreviated Scale of Intelligence and the Kaufman Brief 
Intelligence Test among psychiatric patients. Psychological Re- 
ports, 90, 355-359. 

Hoofien, D., Vakil, E., & Gilboa, A. (2000). Criterion validation of 
premorbid intelligence estimation in persons with traumatic 
brain injury: "Hold/don't hold" versus "best performance" proce- 
dures. Journal of Clinical and Experimental Neuropsychology, 22, 
305-315. 

Horn, J. L., & Noll, J. (1997). Human cognitive capabilities. In
D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary
intellectual assessment: Theories, tests, and issues (pp. 53-91).
New York: Guilford Press.

Ivnik, R. J., Malec, J. F., Smith, G. E., Tangalos, E. G., Petersen, R. C.,
Kokmen, E., & Kurland, L. T. (1992). Mayo's Older American 
Normative Studies: WAIS-R norms for ages 56 to 97. The Clinical 
Neuropsychologist, 6(Supplement), 1-30. 

Johnson-Greene, D., Dehring, M., Adams, K. M., Miller, T., Arora,
S., Beylin, A., & Brandon, R. (1997). Accuracy of self-reported ed-
ucational attainment among diverse patient populations: A pre- 
liminary investigation. Archives of Clinical Neuropsychology, 12, 
635-643. 

Johnstone, B., Callahan, C. D., Kapila, C. J., & Bouman, D. E. (1996). 
The comparability of the WRAT-R reading test and NAART as es- 
timates of premorbid intelligence in neurologically impaired pa- 
tients. Archives of Clinical Neuropsychology, 11, 513-519. 

Johnstone, B., & Wilhelm, K. L. (1996). The longitudinal stability of 
the WRAT-reading subtest: Is it an appropriate estimate of pre- 
morbid intelligence? Journal of the International Neuropsychologi- 
cal Society, 2, 282-285. 

Jurica, P. J., Leitten, C. L., & Mattis, S. (2001). Dementia Rating Scale- 
2. Odessa, FL: Psychological Assessment Resources. 

Kaufman, A. S., & Kaufman, N. L. (1990). Kaufman Brief Intelligence 
Test. Circle Pines, MN: American Guidance Service. 

Kaufman, A. S., & Lichtenberger, E. O. (1999). Essentials of WAIS-III
assessment. New York: John Wiley & Sons, Inc. 

Klesges, R. C., & Sanchez, V. C. (1981). Cross-validation of an index
of premorbid intellectual functioning in children. Journal of Con- 
sulting and Clinical Psychology, 49, 141. 

Korkman, M., Kirk, U., & Kemp, S. (1998). NEPSY: A developmental
neuropsychological assessment manual. San Antonio, TX: The Psy- 
chological Corporation. 

Krull, K. R., Scott, J. G., & Sherer, M. (1995). Estimation of premor- 
bid intelligence from combined performance and demographic 
variables. The Clinical Neuropsychologist, 9, 83-88. 

Langeluddecke, P. M., & Lucas, S. K. (2004). Evaluation of two meth- 
ods for estimating premorbid intelligence on the WAIS-III in a 
clinical sample. The Clinical Neuropsychologist, 18, 423-432. 

Larrabee, G. J. (2004). A review of clinical interpretation of the 
WAIS-III and WMS-III: Where do we go from here and what 
should we do with WAIS-IV and WMS-IV? Journal of Clinical and 
Experimental Neuropsychology, 24, 707-717. 

Leach, L., Kaplan, E., Rewilak, D., Richards, B., & Proulx, G.-B. (2000).
Kaplan Baycrest Neurocognitive Assessment. San Antonio, TX: The 
Psychological Corporation. 

Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsycho- 
logical assessment (4th ed.). New York: Oxford University Press.

Lucas, S. K., Carstairs, J. R., & Shores, E. A. (2003). A comparison
of methods to estimate premorbid intelligence in an Australian 






sample: Data from the Macquarie University neuropsychological 
normative study (MUNNS). Australian Psychologist, 38,
227-237. 

Mattis, S. (1976). Mental status examination for organic mental syn- 
drome in the elderly patient. In L. Bellak & T. B. Karasu (Eds.), 
Geriatric psychiatry. New York: Grune & Stratton. 

McCallum, R. S., Bracken, B. A., & Wasserman, J. D. (2001). Essentials
of nonverbal assessment. New York: John Wiley & Sons. 

McGrew, K. S. (1997). Analysis of the major intelligence batteries ac- 
cording to a proposed comprehensive Gf-Gc framework. In
D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary
intellectual assessment: Theories, tests, and issues (pp. 151-182). 
New York: Guilford Press. 

Mortensen, E. L., Gade, A., & Reinisch, J. M. (1991). A critical note on 
Lezak's "best performance method" in clinical neuropsychology. 
Journal of Clinical and Experimental Neuropsychology, 13, 361-371. 

Naglieri, J. A., & Das, J. P. (1997). Cognitive Assessment System inter- 
pretive handbook. Itasca, IL: Riverside Publishing. 

Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., 
Ceci, S. J., Halpern, D. F., Loehlin, J. C., Perloff, R., Sternberg, R. J.,
& Urbina, S. (1996). Intelligence: Knowns and unknowns. Ameri- 
can Psychologist, 51, 77-101. 

Nelson, H. E. (1982). National Adult Reading Test (NART): Test man- 
ual. Windsor, England: NFER Nelson. 

Nelson, H. E., & O'Connell, A. (1978). Dementia: The estimation of 
pre-morbid intelligence levels using the New Adult Reading Test. 
Cortex, 14, 234-244. 

Patterson, K., Graham, N., & Hodges, J. R. (1994). Reading in demen- 
tia of the Alzheimer type: A preserved ability? Neuropsychology, 8, 
395-407. 

The Psychological Corporation. (2001). Wechsler Test of Adult Read- 
ing manual. San Antonio, TX: Author. 

The Psychological Corporation. (2002). WAIS-III/WMS-III technical 
manual: Updated. San Antonio, TX: Author. 

Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices 
of clinical neuropsychologists in the United States and Canada: A 
survey of INS, NAN, and APA Division 40 members. Archives of 
Clinical Neuropsychology, 20, 33-66. 

Randolph, C. (1998). RBANS manual. San Antonio, TX: The Psycho- 
logical Corporation. 

Raven, J. C. (1938). Progressive Matrices: A perceptual test of intelli- 
gence. London: H.K. Lewis. 

Raven, J. C. (1947). Colored Progressive Matrices sets A, Ab, B. London: 
H.K. Lewis. 

Raven, J. C. (1965). Advanced Progressive Matrices sets I and II. Lon-
don: H.K. Lewis. 

Redfield, J. (2001). Familial intelligence as an estimate of expected 
ability in children. The Clinical Neuropsychologist, 15, 446-460. 

Reitan, R. M., & Wolfson, D. (1993). The Halstead-Reitan neuropsy- 
chological test battery: Theory and clinical interpretation (2nd ed.). 
Tucson, AZ: Neuropsychology Press. 

Reynolds, C. R. (1997). Postscripts on premorbid ability estima- 
tion: Conceptual addenda and a few words on alternative and 
conditional approaches. Archives of Clinical Neuropsychology, 12, 
769-778. 

Reynolds, C. R., & Gutkin, T. B. (1979). Predicting the premorbid in- 
tellectual status of children using demographic data. Clinical 
Neuropsychology, 1, 36-38. 

Russell, E. (1972). WAIS factor analysis with brain damaged subjects 
using criterion measures. Journal of Consulting and Clinical Psy- 
chology, 39, 133-139. 



Ryan, J. J., & Paolo, A.M. (1992). A screening procedure for estimating 
premorbid intelligence in the elderly. The Clinical Neuropsycholo- 
gist, 6, 53-62. 

Ryan, J. J., Paolo, A. M., & Findley, G. (1991). Percentile rank conversion 
tables for WAIS-R IQs at six educational levels. Journal of Clinical 
Psychology, 47, 104-107. 

Ryan, J. J., & Prifitera, A. (1990). The WAIS-R index for estimating 
premorbid intelligence: Accuracy in predicting short form IQ. In- 
ternational Journal of Clinical Neuropsychology, 12, 20-23. 

Sattler, J. M. (2001). Assessment of children: Cognitive applications 
(4th ed.). San Diego: Jerome M. Sattler, Publishers, Inc. 

Schoenberg, M. R., Duff, K., Scott, J. G., & Adams, R. L. (2003). An 
evaluation of the clinical utility of the OPIE-3 as an estimate of 
premorbid WAIS-III FSIQ. The Clinical Neuropsychologist, 17, 
308-321. 

Schoenberg, M. R., Scott, J. G., Duff, K., & Adams, R. L. (2002). Esti- 
mation of WAIS-III intelligence from combined performance and 
demographic variables: Development of the OPIE-3. The Clinical 
Neuropsychologist, 16, 426-438. 

Sharpe, K., & O'Carroll, R. (1991). Estimating premorbid intellectual
level in dementia using the National Adult Reading Test: A Cana-
dian study. British Journal of Clinical Psychology, 30, 381-384.

Smith-Seemiller, L., Franzen, M. D., Burgess, E. J., & Prieto, L. R. 
(1997). Neuropsychologists' practice patterns in assessing premor- 
bid intelligence. Archives of Clinical Neuropsychology, 12, 739-744. 

Spearman, C. (1927). The abilities of man. New York: Macmillan.

Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological 
tests: Administration, norms and commentary. New York: Oxford 
University Press. 

Stebbins, G. T., Gilley, D. W., Wilson, R. S., Bernard, B. A., & Fox, J. H.
(1990). Effects of language disturbances on premorbid estimates 
of IQ in mild dementia. The Clinical Neuropsychologist, 4, 64-68. 

Stebbins, G. T., Wilson, R. S., Gilley, D. W., Bernard, B. A., & Fox, J. H.
(1988). Estimation of premorbid intelligence in dementia. Journal 
of Clinical and Experimental Neuropsychology, 10, 63-64. 

Stebbins, G. T., Wilson, R. S., Gilley, D. W., Bernard, B. A., & Fox, J. H.
(1990). Use of the National Adult Reading Test to estimate pre- 
morbid IQ in dementia. The Clinical Neuropsychologist, 4, 18-24. 

Stern, R. A., & White, T. (2003). Neuropsychological Assessment Bat- 
tery. Lutz, FL: PAR. 

Sweet, J., Moberg, P., & Tovian, S. (1990). Evaluation of Wechsler 
adult intelligence scale — revised premorbid IQ clinical formulas 
in clinical populations. Psychological Assessment, 2, 41-44. 

Taylor, M. J., & Heaton, R. K. (2001). Sensitivity and specificity of 
WAIS-III/WMS-III demographically corrected factor scores in 
neuropsychological assessment. Journal of the International Neu- 
ropsychological Society, 7, 867-874. 

Thurstone, L. L. (1938). Primary mental abilities. Chicago: University 
of Chicago Press. 

Tulsky, D. S., Ivnik, R. J., Price, L. R., & Wilkins, C. (2003). Assessment 
of cognitive functioning with the WAIS-III and WMS-III: Devel- 
opment of a six-factor model. In D. S. Tulsky, D. H. Saklofske, 
G. J. Chelune, R. K. Heaton, R. J. Ivnik, R. Bornstein, A. Prifitera, 
& M. F. Ledbetter (Eds.), Clinical interpretation of the WAIS-III 
and WMS-III (pp. 149-182). San Diego: Academic Press. 

Tulsky, D. S., Saklofske, D. H., & Ricker, J. H. (2003). Historical overview
of intelligence and memory: Factors influencing the Wechsler 
Scales. In D. S. Tulsky, D. H. Saklofske, G. J. Chelune, R. K. Heaton, 
R. J. Ivnik, R. Bornstein, A. Prifitera, & M. F. Ledbetter (Eds.), 
Clinical interpretation of the WAIS-III and WMS-III (pp. 7-41). 
San Diego: Academic Press. 






Vanderploeg, R. D., Schinka, J. A., & Axelrod, B. N. (1996). Estima- 
tion of WAIS-R premorbid intelligence: Current ability and de- 
mographic data used in a best-performance fashion. Psychological 
Assessment, 8, 404-411. 

Vanderploeg, R. D., Schinka, J. A., Baum, K. M., Tremont, G., & Mit-
tenberg, W. (1998). WISC-III premorbid prediction strategies: 
Demographic and best performance approaches. Psychological 
Assessment, 10, 277-284. 

Watt, K. J., & O'Carroll, R. E. (1999). Evaluating methods for estimat- 
ing premorbid intellectual ability in closed head injury. Journal of 
Neurology, Neurosurgery, and Psychiatry, 66, 474-479. 

Wechsler, D. (1958). The measurement and appraisal of adult intelli- 
gence (4th ed.). Baltimore, MD: Williams and Wilkins. 

White, T., & Stern, R. A. (2003). Neuropsychological Assessment Bat- 
tery: Psychometric and technical manual. Lutz, FL: PAR. 



Wiens, A. N., Bryan, J. E., & Crossen, J. R. (1993). Estimating WAIS-R 
FSIQ from the National Adult Reading Test — Revised in normal 
subjects. The Clinical Neuropsychologist, 8, 70-84. 

Wilson, R. S., Rosenbaum, G., Brown, G., Rourke, D., Whitman, D., & 
Grisell, J. (1978). An index of premorbid intelligence. Journal of
Consulting and Clinical Psychology, 46, 1554-1555. 

Williams, J. M. (1997). The prediction of premorbid memory ability. 
Archives of Clinical Neuropsychology, 12, 745-756. 

Willshire, D., Kinsella, G., & Prior, M. (1991). Estimating WAIS-R IQ
from the National Adult Reading Test: A cross-validation. Journal of
Clinical and Experimental Neuropsychology, 13, 204-216. 

Yeates, K. O., & Taylor, H. G. (1997). Predicting premorbid neuropsy- 
chological functioning following pediatric traumatic brain injury. 
Journal of Clinical and Experimental Neuropsychology, 19, 
825-837. 



Bayley Scales of Infant Development — Second Edition (BSID-II)



PURPOSE 

The Bayley Scales of Infant Development, Second Edition 
(BSID-II) are designed to assess mental, motor, and behav- 
ioral development of infants and preschoolers. 



SOURCE 

The BSID-II (Bayley, 1993) can be ordered from the Psycho- 
logical Corporation, P.O. Box 9954, San Antonio, TX 78204- 
0354 (www.harcourtassessment.com). A complete kit with 
manual, 25 Record Forms, Motor Scale Forms, Behavior Rat- 
ing Scale Forms, Stimulus Cards, and other necessary materials 
costs $995 US. A Dutch version normed in the Netherlands, 
the BOS-II, will also soon be available. 



AGE RANGE 

The test can be administered to children between the ages of 1 
month and 3 1/2 years (i.e., 42 months). 



DESCRIPTION 

Overview 

The Bayley is the most important and most widely used devel- 
opmental assessment test. The original Bayley Scales of Infant 
Development were published in 1969 (Bayley, 1969); a second 
edition appeared in 1993 (BSID-II; Bayley, 1993). A new edi- 
tion is forthcoming (BSID-III). Items from the Bayley tests 
have as origins some of the earliest tests of infant develop- 
ment (e.g., Bayley, 1933, 1936; Gesell, 1925; Jaffa, 1934; see
Bendersky & Lewis, 2001; Bracken & Walker, 1997; and 
Brooks-Gunn & Weintraub, 1983, for reviews of the early his- 
tory of infant assessment). 

The BSID-II is designed to assess cognitive, physical, lan- 
guage, and psychosocial development of infants, toddlers, and 
preschoolers (Bayley, 1993). Although it arose out of the study 



of normal infant development, one of its primary uses is to 
evaluate infants suspected of delay or atypical development to 
determine eligibility for services and to track progress over 
time. Often, these are young children who were born prema- 
ture or of low birth weight or who have a major congenital 
anomaly, delayed milestones, or other risk factors for develop- 
mental disability. In addition, it is a preferred tool for devel- 
opmental research: hundreds of studies on clinical and 
nonclinical populations have been published involving the 
original BSID (see Black & Matula, 2000), and many more ap- 
pear each year using the BSID-II. 

The BSID-II is a developmental test, not an intelligence test 
(Bayley, 1993). Unlike intelligence tests, which assume a curvilinear
function between age and ability, a developmental test, the manual
specifies, assesses different abilities that are present at different
ages. The BSID-II is therefore "designed to sample a wide
array of emergent developmental abilities and to inventory the 
attainment of developmental milestones" (Bayley, 1993, p. 8). 

According to the manual, the Bayley Scales were designed 
to provide a set of standard situations and tasks that would al- 
low the child to display an observable set of behavioral re- 
sponses. One of its primary goals is to provide a method for 
standardized assessment that nevertheless allows considerable 
flexibility in administration. For example, the examiner can 
vary the order of item administration depending on the child's 
temperament, interest level, and the level of rapport estab- 
lished with the child (e.g., an item can be administered later, 
once an initially shy child becomes more sociable; Bayley, 
1993). Incidentally observed behaviors are also scorable, even 
if they occur before or after the actual test administration; test- 
ing can also occur over more than one session if required. 



Test Structure 

The BSID-II consists of three parts (see Table 6-7). According 
to the manual, each is designed to be complementary and to 
contribute unique information. The first two scales consist of 
structured assessment batteries designed to assess general 






Table 6-7 BSID-II Scales and Scores

Mental Scale
  Score: Mental Development Index (MDI)
  Overview: Measures general cognitive development, including items that tap memory, habituation, problem solving, early number concepts, generalization, classification, vocalizations, language, and social skills; facet scores can also be derived using specific item clusters

Motor Scale
  Score: Psychomotor Development Index (PDI)
  Overview: Measures overall motor development, including items assessing gross and fine motor skills (e.g., rolling, crawling, creeping, sitting, standing, walking, running, jumping, prehension, use of writing implements, imitation of hand movements)

Behavior Rating Scale (BRS)
  Scores: Total Score; Attention/Arousal (up to 6 months); Orientation/Engagement (6 months +); Emotional Regulation (6 months +); Motor Quality (all ages)
  Overview: A rating scale completed by the examiner that provides a qualitative assessment in percentile form that reflects two or three factors, depending on age

Adapted from Bayley, 1993.



cognitive and motor development (Mental and Motor Scales), 
and the third component is a rating scale (Behavior Rating 
Scale [BRS] ). The Mental Scale provides an index score for gen- 
eral cognitive development (i.e., the Mental Development In- 
dex [MDI]). The Motor Scale provides an index of overall 
motor development (i.e., Psychomotor Development Index 
[PDI] ). From these two scales, four "facet" scores can be derived: 
Cognitive, Language, Motor, and Personal/Social. Although the 
manual does not provide additional information on the devel- 
opment or psychometric characteristics of the facet subscales, it 
notes that the facet scores are not intended for identifying im- 
pairment in specific subdomains, but rather for identifying rel- 
ative strengths and weaknesses. The facet scores were derived 
on rational and semi-empirical grounds (i.e., item placement
into facets by expert review and based on correlation between 
each item and the final scales; Bayley, 1993). 

The BRS provides a qualitative assessment of behavior re- 
flecting three main factors: Orientation/Engagement, Emotional 
Regulation, and Motor Quality (infants under 6 months of age 
are assessed for Motor Quality and Attention/Arousal only).
Items are rated on a five-point Likert scale by the examiner, 
based in part on information provided by the caregiver. Unlike 
tests for older children that primarily assess test performance 
(e.g., Wechsler scales), the BRS is considered a critical aspect of 
the BSID-II because an infant's state (including engagement, 
arousal level, and motivation at the time of testing) may sub- 
stantially influence the Mental and Motor Scale scores (Black & 
Matula, 2000). In other words, the BRS is designed to supple- 
ment the Mental and Motor Scales by providing information on 
the validity of the evaluation, as well as on the child's relations 
with others and how the child adapts to the testing situation. 

Item Sets and Test Content 

One of the major features of the BSID-II is the organization of 
the test into a series of separate item sets of increasing diffi- 
culty that correspond to developmental ages. This arrange- 
ment makes sense theoretically because the test is designed to 



measure attainment of developmental milestones rather than 
specific cognitive domains. Developmental progression is re- 
flected in the relative domain content of item sets across age, in 
addition to the difficulty level of each item set. The result is a 
Mental Scale that primarily measures sensory and perceptual 
development in infancy and language skills and other cognitive 
abilities after 12 months (Fugate, 1998). Likewise, the Motor 
Scale assesses primarily gross motor skills in infancy and fine 
motor skills, along with other aspects of gross motor abilities, 
after 12 months (Fugate, 1998). The manual notes that this re- 
sults in less predictability from one testing to the next com- 
pared to intelligence tests (Bayley, 1993) because different item 
types are tested at different ages. However, items reflecting 
abilities that predict later functioning were specifically added 
to the BSID-II, and items for the older age bands are similar to 
those of intelligence tests for preschoolers such as the WPPSI- 
R (Bayley, 1993), which should afford some predictive validity. 
Item content for the BSID-II, according to the manual, is 
"theoretically eclectic," with items gleaned from a broad cross- 
section of developmental research and developmental testing 
(Bayley, 1993). New items were selected based on careful re- 
view of developmental research; these include items tapping 
recognition memory, habituation of attention, problem solv- 
ing, number concepts, language, personal/social development, 
and motor abilities (see pp. 12-14 of manual). 

BSID-II Versus BSID 

The BSID-II differs from the original BSID in several ways. It 
has extended age bands (1-42 months compared with 2-30 
months of age) and a newer normative sample. Many original 
items were dropped and many new items were added; older 
items were also rewritten and modified (Black & Matula, 
2000). About 76% of the Mental Scale items and 84% of the 
Motor Scale items were retained. Stimulus materials were also 
redesigned and updated (Bayley, 1993). In the original BSID, 
items were arranged in order of increasing difficulty, and 
basals and ceilings were established by passing or failing 






10 consecutive items. Because this technique was thought to 
be too time-consuming and frustrating for children, the 
BSID-II was revised to include item sets that correspond to 
specific age levels. As a result of the newer normative sample, 
altered format, and changed content, children retested with 
the BSID-II obtain significantly lower scores than when tested 
with the original BSID. Caution is therefore recommended 
when comparing scores from different versions of the test, par- 
ticularly when retest scores suggest a decline in functioning 
(DeWitt et al., 1998; see also Validity for details). 

BSID-II Translations 

The BSID-II Manual does not provide information on how to 
use the test with children whose first language is not English, 
and, to our knowledge, there currently exist no official ver- 
sions of the BSID-II normed in other languages apart from the 
BOS-II. It has nevertheless been used within the United States 
to assess Spanish-speaking children (e.g., Leslie et al., 2002) and 
in Canada to assess Francophone children from Quebec (e.g., 
Pomerleau et al., 2003). The test has also been used in Norway
(Moe & Slinning, 2001), Finland (Lyytinen et al., 1999; Sa-
janiemi et al., 2001a, 2001b), Germany (e.g., Walkowiak et al.,
2001), Malaysia (Ong et al., 2001), and Bangladesh (Hamadani
et al., 2002). The original BSID was adapted for use in Nigeria; 
the modified version appears to show higher predictive ability 
than the original scale (Ogunnaike & Houser, 2002). Similar 
adaptations have been employed in Kenya (Sigman et al., 1988). 

ADMINISTRATION 

For details, see manual. Testing is conducted under optimal 
conditions (i.e., when the infant is fully alert and in the pres-
ence of the primary caregiver). Other aspects of testing infants,
such as an appropriate testing environment, reinforcement, 
providing feedback, and dealing with tantrums and fatigue 
are discussed in Black and Matula (2000) and in the manual. 
Above all else, examiners should ensure that examinees are 
well-fed and well-rested prior to attempting administration 
(Bendersky & Lewis, 2001). 

Because the examiner alters test administration in response 
to the child's performance, examiners should be knowledge- 
able about infant behavior and be experienced in test admin- 
istration. Black and Matula (2000) state that testing infants 
with the Bayley is a difficult skill that requires mastery of ad- 
ministration rules, thorough knowledge of normal and atypi- 
cal infant development, and careful attention to the infant 
and caregiver during test administration. Both novice and ex- 
perienced test users should attend at least one workshop in its 
use prior to administering the BSID-II (Nellis & Gridley, 
1994). Training should include lectures, demonstrations, 
practice testing of infants, and evaluation of videotaped ad- 
ministrations, as well as training to a criterion of greater than 
90% agreement between experienced examiners (Black & 
Matula, 2000). Special care should be given to learning spe- 
cific items, given the low agreement between raters. Chandlee 



et al. (2002) provide detailed administration/scoring guide- 
lines for specific items that tend to have low interrater agree- 
ment even in experienced, trained examiners. These are 
reproduced in Table 6-8. Familiarity with Black and Matula's 
text (2000) is also highly recommended. 

Materials and Setting 

Manipulatives are an integral part of the Bayley. The test kit 
therefore contains numerous small items (e.g., toys, cups, 
blocks, etc.). These are appealing and child-friendly, and main- 
tain the interest of most infants and toddlers (Nellis & Gridley, 
1994). Other items needed for administration but not in- 
cluded in the testing kit are plain paper, small plastic bags, tis- 
sues, and a stopwatch. Following each testing session, testing 
materials should be washed. Note that testing needs to occur 
in a location that allows administration of all the motor items, 
which includes access to three steps and to at least 9 feet of 
floor space to allow the child to stop from a full run. Because 
of practicality and safety issues related to using stairs in public 
settings (C. Miranda, personal communication, January 
2005), many examiners build a set of stairs according to the 
height/width specifications detailed in the manual. 

Although the test is designed to be portable, the kit itself 
is quite large and heavy, and contains many small parts that 
can be difficult to pack away quickly (Black & Matula, 2000). 
The testing manual includes both administration instruc- 
tions and technical information. Record booklets are well 
designed and contain much helpful information on admin- 
istration. 



Start Points, Basals, and Ceilings 

Start points for item sets are based on chronological age. 
However, start points can be altered depending on the exam- 
iner's judgment of the child's actual ability levels. For exam- 
ple, earlier start points can be selected for low-functioning 
children or those with atypical development, based on the ex- 
aminer's estimation of the child's developmental level as de- 
termined by behavioral observation, caregiver report, or other 
test scores (but see Comments for further discussion of this 
issue).

Testing proceeds by item set until the child has met basal 
and ceiling criteria. On the Mental Scale, the basal is five or 
more correct items anywhere within the item set. A ceiling is 
attained when a child fails three or more items anywhere 
within the item set. On the Motor Scale, these values are four 
and two, respectively. 
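For illustration only (the helper name and data structure below are hypothetical, not taken from the BSID-II manual), the basal and ceiling rules just described can be summarized as follows:

# Hypothetical sketch of the basal/ceiling rules described above.
# `passes` is a list of True/False results for the items of one item set.
RULES = {"mental": {"basal": 5, "ceiling": 3},
         "motor":  {"basal": 4, "ceiling": 2}}

def item_set_status(passes, scale="mental"):
    rule = RULES[scale]
    n_pass = sum(passes)
    n_fail = len(passes) - n_pass
    return {"basal_met": n_pass >= rule["basal"],       # enough passes anywhere in the set
            "ceiling_met": n_fail >= rule["ceiling"]}   # enough failures anywhere in the set

# Example: a 12-item Mental Scale set with 7 passes and 5 failures satisfies
# both criteria, so testing need not move to an earlier or later item set.
print(item_set_status([True] * 7 + [False] * 5, "mental"))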



BRS 

The BRS is filled out by the examiner following completion of 
the test, combined with information provided by the caregiver 
(e.g., whether the session was typical of the child's behavior 
and reflective of his or her skills). Items are rated on a five- 
point Likert-type scale. 



Table 6-8 Recommended Administration and Scoring Guidelines for BSID-II Items With Low Interscorer Agreement

BSID-II Mental Scale Item 111
  Possible problems: Incidental scoring issue. Vague scoring criteria — minimal examples provided as to what constitutes a pass.
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Examiner must carefully determine whether word and gesture occur simultaneously. Scoring criteria need clarification in manual (e.g., further exemplars). As stated in manual, caregiver's interpretation or report is not sufficient for credit.

Item 113
  Possible problems: Incidental scoring issue. Vague scoring criteria — examiners may differ on acceptance of "poorly articulated words" and "word approximations" as well as degree of temporal delay necessary for words to be considered nonimitative.
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Examiner must carefully note whether the child's "intent is clear" for poorly articulated words or word approximations to be credited. Examiner must carefully determine whether words occur spontaneously rather than in imitation. Scoring criteria need clarification in manual (e.g., degree of temporal delay necessary for words to be considered nonimitative). As stated in manual, caregiver's interpretation or report is not sufficient for credit.

Item 114
  Possible problems: Incidental scoring issue. Vague scoring criteria — examiners may differ on multiple criteria, including word articulation, interpretation of one or more concepts, and pauses between words.
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Examiner must carefully note whether words signify "different concepts," are not separated by "distinct pause," and are used "appropriately." Scoring criteria need clarification in manual (e.g., further specification of above concepts). As stated in manual, caregiver's interpretation or report is not sufficient for credit.

Item 116
  Possible problems: Vague scoring criteria — difficult to differentiate between a "scribble" and a stroke. Child may make definitive stroke on Item 91, then scribble definitively on Item 116, but does not make another stroke.
  Recommended practices: Scoring criteria need clarification in manual (e.g., whether imitation or discrimination of different fine-motor movements is critically variable). According to manual, if child does not make another stroke on Item 116 after making scribble, child does not receive pass (temporal delay from Item 91 negates pass).

Item 117
  Possible problems: Vague scoring criteria — examiners may differ on acceptance of "poorly".
  Recommended practices: Scoring criteria need clarification in manual.

Item 119
  Possible problems: Inaccurate timing — child may pick up first peg before examiner completes directions, delaying start of timing. Child may place and remove pegs.
  Recommended practices: Critically important for examiner to time accurately (e.g., begin timing as soon as child picks up first peg; stop timing immediately when six pegs are in board; time allotted 70 seconds). All six pegs must be "standing" in pegboard at end of 25 seconds to score a pass. As stated in manual, this item may be administered up to three times.

Item 121
  Possible problems: May be incidental scoring issue. Vague scoring criteria — examiners may differ on acceptability of pronouns that are not used in a "grammatically correct" manner.
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Examiner should note specific directions in the manual for eliciting a response if none occurs spontaneously. Scoring criteria need clarification in manual (e.g., exemplars of pronouns not used in grammatically correct manner, but credited). As stated in manual, use of pronouns need not be grammatically correct.

Item 127
  Possible problems: Incidental scoring issue. Vague scoring criteria — examiners may differ on acceptability of multiple-word utterances that are not "grammatically correct."
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Unclear whether examiner must note whether words are not separated by "distinct pause," and are used "appropriately" (see Item 114). Scoring criteria need clarification in manual (e.g., clarify above concepts). As stated in manual, caregiver's interpretation or report is not sufficient for credit.

Item 129
  Possible problems: Incidental scoring issue. Vague scoring criteria — requires judgment as to whether "new information" is included.
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Critical for examiner to accurately record each vocalization to determine if multiple-word utterances include topics of prior utterances. Also critical for examiner to make judgment as to whether "new information" is included in multiple-word utterances. Scoring criteria need clarification in manual (e.g., further exemplars of utterances including "new information").

Item 136
  Possible problems: Incidental scoring issue. Vague scoring criteria — examiners may differ on degree of temporal delay necessary for question to be nonimitative.
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Examiner should note specific directions in the manual for evoking a response if none occurs spontaneously. Scoring criteria need clarification in manual (e.g., how much temporal delay necessary for question to be considered nonimitative).

Item 142
  Possible problems: Vague scoring criteria — multiple-word utterances may not be different. Child may make utterances during reading of the story (Item 131).
  Recommended practices: Two identical multiple-word utterances may be scored as pass if temporal delay between them. Utterances during reading of story scored as pass. Scoring criteria need clarification in manual (e.g., further exemplars and nonexemplars).

Item 148
  Possible problems: May be incidental scoring issue. Vague scoring criteria — examiners may differ on acceptability of past tense verbs that are not "correctly formed."
  Recommended practices: Examiner must closely monitor and record each vocalization produced. Scoring criteria need clarification in manual (e.g., further exemplars).

Item 159
  Possible problems: Vague scoring criteria — child may state numbers in same order (Items 146 and 164), but not stop at same endpoint.
  Recommended practices: Even if child does not stop at same endpoint, if numbers stated in same order (Items 146 and 164), score as pass. Scoring criteria need clarification in manual (e.g., inclusion of exemplars and nonexemplars).

Item 165
  Possible problems: Inaccurate timing — child may pick up first shape before examiner completes directions, delaying start of timing. Child may place and remove shapes.
  Recommended practices: Critically important for examiner to time accurately (e.g., begin timing as soon as child picks up first shape; stop timing immediately when all nine shapes are in board; time allotted 150 seconds). All nine shapes must be completely inserted in board at end of 30 seconds to score a pass. As stated in manual, only one trial permitted.

Source: From Chandlee et al., 2002. Reprinted with permission.



ADMINISTRATION TIME 

Testing time for children less than 15 months of age is 25 to 35 
minutes; for older children, it is up to 60 minutes. Actual test- 
ing time varies depending on the experience of the examiner, 
the choice of an appropriate basal level, the child's level of co- 
operation, variability of responses, and level of competence. 



SCORING 

Scores 

There is no computerized scoring system available as of this 
writing, but one is anticipated for the third edition. Raw scores 
are converted into standard scores (MDI and PDI) based on 




Table 6-9 BSID-II Developmental Index Scores Classification and Frequencies

Score Range     Classification                      Theoretical Normal Curve %   Mental Scale (Actual Sample %)   Motor Scale (Actual Sample %)
115 and above   Accelerated Performance             16.0                         14.8                             16.5
85-114          Within Normal Limits                68.0                         72.6                             68.7
70-84           Mildly Delayed Performance          13.5                         11.1                             12.5
69 and below    Significantly Delayed Performance    2.5                          1.5                              2.3

Source: From Bayley, 1993. Reprinted with permission.



the child's chronological age by using the appropriate tables in 
the manual (M= 100, SD= 15, range 50-150). Corrected 
age (i.e., gestational as opposed to chronological) is used 
for premature infants (see Comment for further discussion 
of this issue). Significance levels and cumulative percent- 
ages of the standardization sample obtaining various MDI- 
PDI discrepancies can be obtained in the manual (Bayley, 
1993). 
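As a minimal sketch of the conventional age-correction arithmetic (corrected age = chronological age minus the weeks of prematurity relative to a 40-week term; the manual's own procedure and tables should of course be used in practice):

from datetime import date

# Sketch of the usual correction for prematurity; values are illustrative.
def corrected_age_months(birth: date, test: date, gestation_weeks: float) -> float:
    chronological_days = (test - birth).days
    prematurity_days = max(0.0, (40.0 - gestation_weeks) * 7.0)
    return (chronological_days - prematurity_days) / 30.44   # average days per month

# A child born at 32 weeks' gestation and tested 10 months after birth is
# scored against the norms for roughly an 8-month-old.
print(round(corrected_age_months(date(2004, 1, 1), date(2004, 11, 1), 32), 1))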

Facet scores reflecting age-equivalent performance in four 
areas of development can also be derived (Cognitive, Lan- 
guage, Motor, and Social). However, only developmental ages 
can be derived, based on the number of items passed at each 
age level; these are shown on the Record Form. Percentile 
ranks only are provided for the BRS. 



Classification of Developmental Index Scores 
and BRS Scores 

BSID-II scores can be interpreted using the four recom- 
mended classifications shown in the manual (e.g., Acceler- 
ated Performance, Within Normal Limits, Mildly Delayed 
Performance, or Significantly Delayed Performance). These 
are presented in Table 6-9, along with corresponding score 
ranges. For the BRS, scores are described in terms of three 
classification levels (i.e., Within Normal Limits, Question- 
able, and Nonoptimal). Corresponding ranges are shown in 
Table 6-10. 
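A minimal sketch (hypothetical helper functions, mirroring the ranges listed in Tables 6-9 and 6-10) of how the two sets of classifications map onto scores:

# Classification ranges for the Developmental Index scores (Table 6-9)
# and the BRS percentiles (Table 6-10); function names are hypothetical.
def classify_index(score):
    if score >= 115:
        return "Accelerated Performance"
    if score >= 85:
        return "Within Normal Limits"
    if score >= 70:
        return "Mildly Delayed Performance"
    return "Significantly Delayed Performance"

def classify_brs(percentile):
    if percentile >= 26:
        return "Within Normal Limits"
    if percentile >= 11:
        return "Questionable"
    return "Nonoptimal"

print(classify_index(82), "|", classify_brs(18))
# -> Mildly Delayed Performance | Questionable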



Additional Domain Scores 

Although not expressly designed for this purpose, re- 
searchers have attempted to organize the items into sub- 
scales assessing specific domains of ability. For instance, 
Siegel et al. (1995) formed language subtests from BSID



Table 6-10 BSID-II Behavior Rating Scale (BRS) Cutoff Scores

Score                  Percentile Range
Within Normal Limits   At or above the 26th percentile
Questionable           Between 11th and 25th percentile
Nonoptimal             At or below the 10th percentile



items that were useful in the identification of children with 
delayed language. From the BSID-II, Choudhury and Gor- 
man (2000) developed a scale of items measuring cognitive 
ability with reported sensitivity to attentional functioning 
(COG scale). For a list of items, see Table 6-11. Although 
they show promise in providing additional information on 
strengths and weaknesses, these scales need further refine- 
ment before they can be used in the diagnostic evaluation of 
individual children. 



Table 6-11 BSID-II Mental Scale Items Used for the COG Scale

Item Number   Description
97            Builds tower of two cubes
98            Places pegs in 70 seconds
102           Retrieves toy (visible displacement)
103           Imitates crayon stroke
104           Retrieves toy (clear box II)
105           Uses rod to attain toy
112           Places four pieces in 150 seconds
115           Completes pink board in 180 seconds
116           Differentiates scribble from stroke
118           Identifies objects in photograph
119           Places pegs in 25 seconds
120           Completes reversed pink board
123           Builds tower of six cubes
125           Matches pictures
128           Matches colors
130           Completes blue board in 75 seconds
132           Attends to story
135           Places beads in tube in 120 seconds
137           Builds tower of eight cubes
138           Matches four colors
139           Builds train of cubes
143           Imitates horizontal stroke
144           Recalls geometric form
144           Discriminates picture I
147           Compares masses

Source: From Bayley, 1993. Reprinted with permission.

Note: Items excluded from the COG composite score were predominantly language-related items that assessed preexisting skills (semantics and conceptual labels). It is important to note that although attention is required to correctly respond to these language-based items, they do not necessarily require children to sustain attention to the items to correctly respond and receive credit.

Source: From Choudhury & Gorman, 2000. Reprinted with permission.






DEMOGRAPHIC EFFECTS 



Age 



As noted above, items for the BSID-II were specifically in- 
cluded if they showed sensitivity to age. 



Gender 

There is some evidence, using the previous test version, that 
girls' scores are more predictive of later IQ than boys' (Anders- 
son et al., 1998). However, most studies report little or no effect 
of gender on test scores (e.g., r = -.16; Lyytinen et al., 1999),
apart from interactions between developmental insult and gen- 
der suggesting that boys are more vulnerable to cognitive delays 
(e.g., Moe & Slinning, 2001; Sajaniemi et al., 2001a). 

Education/Socioeconomic Status 

Parental education, often used as a marker for socioeconomic 
status (SES), tends to be positively associated with scores on 
cognitive tests in children. Evidence for an effect of parental 
education on BSID-II scores is mixed, due to the use of differ- 
ent samples across studies. For example, correlations between 
BSID-II scores and parental education were moderate to high 
in one study of normal 27-month-olds, with both mothers' 
and fathers' education appearing to have about the same asso- 
ciation with the MDI (r= .45-.50; Roberts et al., 1999). In 
contrast, modest correlations have been reported in other 
samples for maternal education (e.g., r= .21) but not paternal 
education (Lyytinen et al., 1999, 2001). In other studies in- 
volving low SES samples with diverse ethnic backgrounds, no 
association between demographic variables such as mater- 
nal/paternal education, ethnicity, and BSID-II MDI scores are 
found (Shannon et al., 2002), possibly due to restricted range. 
However, scores of children from high parental education/SES 
backgrounds are typically higher than those of low educa- 
tion/SES backgrounds. For example, in one sample of highly 
educated, affluent parents, children obtained mean BSID-II 
MDI scores of 108.2 (SD = 11.9; range 84-134). In compari- 
son, MDI scores of 6-month-old infants of at-risk mothers 
characterized by young age, low education, and low income 
were slightly lower, though still within normal limits (M = 93.8, 
SD= 5.0; Pomerleau et al., 2003). Note that several studies 
have found that for children of low SES/low parental educa- 
tion, BSID-II scores exhibit a relative decrease or lag over 
time as children develop (e.g., Mayes et al., 2003; see also
Ethnicity/SES).

Ethnicity/SES 

Empirical and rational methods for the elimination of bias 
were employed in the development of the BSID-II, which 
should minimize differences based on ethnicity/culture (see 
Validity, Content). Additionally, the widespread use of the test 



across cultures and languages suggests that the test is largely 
unbiased (see Translations); nor does the test appear to dif-
ferentially penalize ethnic subgroups within the larger U.S. 
population. For instance, African American preschoolers 
attending community-based childcare centers obtain scores 
that are generally within the average range at 18 months of 
age (i.e., 95.7, SD= 10.1, range 79-122), despite a low-income 
background (Burchinal et al., 2000). Similarly, the proportion 
of children identified as significantly delayed, mildly delayed, 
or within normal limits does not differ proportionally by 
ethnicity in children apprehended for neglect or abuse (Leslie 
et al., 2002), suggesting that children of minority status are not 
differentially penalized on the test. Even children living in ex- 
treme poverty in the slums of Bangladesh typically achieve 
scores that are within normal limits, at least on the MDI (i.e., 
MDI = 102.6, SD = 10; PDI = 95.7, SD = 15; Hamadani et al.,
2002). 

Nevertheless, it is important to note that the attainment of 
developmental milestones may differ depending on cultural 
beliefs and caregiving practices; therefore, some groups may 
obtain lower scores on some items, reflecting alternative prac- 
tices rather than developmental deviance (Leslie et al., 2002). 
For example, normally developing Brazilian infants obtain 
lower scores than U.S. infants on the Motor Scale at 3, 4, and 5 
months of age. After this time, scores are equivalent to U.S. 
norms (Santos et al., 2000, 2001). Items exhibiting possible 
bias (i.e., passed by 15% or less of Brazilian infants) measured 
sitting, grasping, and hand posture (see Table 6-12 for specific 
items). However, items tapping motor behavior characteris- 
tics of older infants, such as crawling, cruising, and walking, 
were not different between groups. This is attributed to differ- 
ent child rearing practices, possibly mediated in part by 
parental education, such that Brazilian infants are held and 
carried most of the time, are rarely placed on the floor or 
seated without support, and have fewer opportunities to ma- 
nipulate toys compared with U.S. infants (Santos et al., 2000, 
2001). Other research within the United States indicates that 
performance on some BSID-II motor items are mediated in 
part by caregiving practices involving sleep position (Ratliff- 
Schaub et al., 2001; see Normative Data for more details).

Table 6-12 Items Failed by 85% or More of Normally Developing Brazilian Infants (Ages 3 to 5 Months)

Item                                          Movement Group
Rotates wrist                                 Ungrouped
Sits alone momentarily                        Sitting
Uses whole hand to grasp rod                  Grasping
Uses partial thumb opposition to grasp cube   Grasping
Attempts to secure pellets                    Grasping
Sits alone for 30 seconds                     Sitting
Sits alone while playing with toys            Sitting
Sits alone steadily                           Sitting
Uses whole hand to grasp pellets              Grasping

Source: From Santos et al., 2000. Reprinted with permission.






When scoring the BSID-II for children from cultural groups
whose caregiving practices differ from North American practices,
adjusting scores on the affected items may be appropriate to avoid
underestimating ability.

It is important to note that research shows that children 
from at-risk backgrounds, including those of poor, primarily 
minority, single-parent families with significant familial chaos, 
have decreasing BSID-II scores over time when multiple assess- 
ments are compared, despite normal-range scores in infancy 
(e.g., DeWitt et al., 1998; Mayes et al., 2003). The oft-cited ex-
planation for this decline is that environmental factors become 
more important in determining skills as children grow, which 
maximizes the adverse impact of inadequate environments on 
development over time. An additional explanation is that the 
change in BSID-II item type from infancy to toddlerhood in- 
teracts with cultural factors. Thus, familiarity with situations 
similar to item sets for older toddlers (e.g., compliance and 
table-top activities) may differ across ethnic groups because of 
cultural factors. Lower scores in older children may therefore 
signify that the BSID-II does not capture relevant compe-
tence as defined in these cultural subgroups, which explains the 
decrease in scores over time for some low SES, primarily mi- 
nority groups (Black et al., 2001; Leslie et al., 2002). It is impor-
tant to note that, regardless of the reason for the decrease in 
scores over time, the result is that a larger proportion of at-risk 
children may be identified as requiring intervention services 
with increasing age. 

NORMATIVE DATA 

Standardization Sample 

The BSID-II standardization sample consists of a national, 
stratified, random sample of 1700 children, stratified accord- 
ing to data from the 1988 U.S. Census. Table 6-13 shows 
sample characteristics with regard to stratification variables 
(e.g., parental education, ethnicity). One hundred children in 
17 separate age bands corresponding to specific ages were 
tested (i.e., 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 18, 21, 24, 27, 30, 36, 
and 42 months). Note that not all age intervals are repre- 
sented. Therefore, to derive standard scores for the Mental 
and Motor Scales across age, raw score distributions for each 
age group from the standardization sample were normalized. 
Irregularities within and across ages were eliminated by 
smoothing, and raw to index score conversions for age 
groups not included in the sampled age bands were interpo- 
lated (Bayley, 1993). 
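Purely to illustrate the idea of interpolating conversions between sampled age bands (the manual's actual normalization and smoothing procedures are not reproduced here, and the index values below are invented), a linear interpolation between two adjacent bands might look like this:

# Hypothetical example: a given raw score converts to an index of 96 at the
# 12-month band and 104 at the 15-month band; estimate the value at 13 months.
def interpolate_index(age, age_lo, index_lo, age_hi, index_hi):
    frac = (age - age_lo) / (age_hi - age_lo)
    return index_lo + frac * (index_hi - index_lo)

print(round(interpolate_index(13, 12, 96, 15, 104)))   # -> 99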

To derive percentiles for the BRS, data from the standardi- 
zation sample, in addition to data from 370 children with 
clinical diagnoses, were analyzed for three age groups (1-5 
months, 6-12 months, and 13-42 months). Frequency distri- 
butions were constructed, and cutoffs were derived to reflect 
three categories: (1) Within Normal Limits, (2) Question- 
able, and (3) Nonoptimal (see Table 6-10 for actual score 
ranges). 



Table 6-13 Characteristics of the BSID-II Normative Sample

Number: 1700 (a)
Age: 1 to 42 months
Geographic location: Northeast 18.7%; South 33.4%; North Central 23.9%; West 24.0%
Sample type: National, stratified, random sample (b)
Parental education: 0-11 years 16.5%; 12 years 36.5%; 13-15 years 26.0%; 16 years or more 21.2%
SES: Not specified
Gender: Males 50%; Females 50%
Race/ethnicity: African American 15.0%; Hispanic 11.6%; White 69.7%; Other 3.7%
Screening: Only healthy children were included, defined as any child born without significant medical complications, without history of medical complications, and not currently diagnosed with or receiving treatment for mental, physical, or behavioral problems, based on parent report

(a) Based on 17 age groupings of 100 cases each; more age groupings appear in the younger children (1-12 months) because of more rapid development compared to the older children (13-42 months); actual ages included are 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 18, 21, 24, 27, 30, 36, and 42 months.
(b) Based on 1988 U.S. Census data.

Source: Adapted from Bayley, 1993.



Demographically Adjusted Norms 

Given an apparent effect of caregiving practices on the emer- 
gence of certain motor milestones (see Ethnicity), score ad- 
justment may be considered for infants with cultural 
backgrounds that differ from that of the normative group. 
Raw score ranges and confidence intervals for Brazilian in- 
fants are shown in Table 6-14, given the temporary delay in 
the emergence of some motor skills in this group compared 
with the standardization sample. 

Note that even within North American samples, caregiving 
practices may influence performance on certain motor items. 
For example, healthy infants who sleep supine (i.e., on their 
back) fail more motor items than infants who sleep prone, 






Table 6-14 Raw Score Ranges for Brazilian Infants on the BSID-II PDI (Modal Maternal Education Level = 8 Years or Less)

Age in Months    N     Mean Score    95% Confidence Interval
1                47    11.0          10.4-11.7
2                40    16.2          15.6-16.9
3                42    21.1          20.5-21.8
4                36    25.1          24.4-25.9
5                38    31.0          30.1-32.1
6                39    38.7          37.7-39.7

Note: Raw scores represent the same children retested over time.
Source: From Santos et al., 2000. Reprinted with permission.



particularly those items assessing head control tested prone. 
In addition, infants are more likely to fail items that are ad- 
ministered in the position opposite to their usual sleeping 
position (Ratliff-Schaub et al., 2001). The authors note that 
the BSID-II was normed during a time when the majority of 
young infants slept prone, a practice that is no longer rec- 
ommended because of an associated increased risk of Sud- 
den Infant Death Syndrome (SIDS), but which may have 
been associated with better head control in the standard- 
ization sample. Sleep practices may therefore need to be 
queried closely, particularly in infants with suspected motor 
delays. 

Estimation of Ability Levels Beyond Available 
Standard Score Range 

For very high or very low functioning infants (e.g., a standard 
score above 150 or below 50), the raw scores do not allow the 
calculation of a precise standard score. Because the test is in- 
tended for assessment of children with atypical and delayed 
development, scores below 50 are often needed. Age equiva- 
lents provided in the manual are one alternative (p. 325 of the 
manual). Derived (i.e., extrapolated) scores are another. Lind- 
say and Brouwers (1999) provide alternate age equivalents 
based on a linear derivation method for evaluating longitudi- 
nal development in very low or very high functioning infants. 
These are presented in Table 6-15. 

Robinson and Mervis (1996) also derived a regression 
equation to provide extrapolated standard scores for the MDI 
and PDI for very low scorers obtaining standard scores below 
50. Extrapolated standard scores are provided in Tables 6-16a 
and 6-16b. Because these scores are extrapolated and not 
based on actual data points, the authors note that they should 
be considered estimates only and used for research purposes. 
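As an illustration (a hypothetical case), reading down the 12-month column of Table 6-16a, a 12-month-old who obtains a Mental Scale raw score of 60 would receive an extrapolated MDI of approximately 39.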

RELIABILITY 

Internal Reliability 

Internal consistency of the scales as measured by average coef- 
ficient alpha across age is high (i.e., .88 for the MDI, .84 for 
the PDI, and .88 for the BRS; Bayley, 1993). Within age bands, 



coefficients are also generally high, except for some of the 
younger age bands on the BRS (i.e., BRS alpha coefficient at 
age 2 months = .64, and falls between .71 and .77 for ages 3-5 
months; see manual for details). 

In the standardization sample, the SEM is 5.21 for the 
Mental Scale and 6.01 for the Motor Scale. Knowledge of 
these SEMs therefore provides a basis for constructing confi- 
dence intervals around the obtained score. However, the man- 
ual provides 90% and 95% confidence intervals for index 
scores based on the SE_E, which is a more precise estimation
method (see manual for details). 
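For illustration (a hypothetical obtained score, and a simple normal-theory interval built directly from the SEM rather than the manual's SE_E-based values): a 95% confidence interval is approximately the obtained score plus or minus 1.96 x SEM, so an MDI of 85 would carry an interval of about 85 +/- (1.96)(5.21), roughly 85 +/- 10, or approximately 75 to 95.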

Test-Retest Reliability 

Reliability is high for the Mental Scale and adequate for the 
Motor Scale, across age (i.e., r= .87 and .78, respectively, after 
a median interval of 4 days). Although adequate given the na- 
ture of the sample, test-retest reliability of some BSID-II 
scales, in some age groups, falls short in terms of standards for 
diagnostic evaluation of individual children. Overall, Mental 
Scale reliability is impressive at 2 to 3 years, and certainly 
meets criteria for clinical decision making. BRS total score re- 
liabilities are variable (i.e., quite low in the youngest group 
tested, but acceptable in 12-month-olds; range of .55-.90, de-
pending on age). BRS factors, on average, are low to marginal 
except for Motor Quality, which demonstrates acceptable to 
high reliability across age. Stability by age is shown in Table 
6-17 based on test-retest stability estimates reported in the 
manual for 175 children from the standardization sample, 
drawn from four age groups (1, 12, 24, and 36 months) and 
retested after a median interval of four days. 

Because reliability coefficients do not inform on whether 
actual score ranges are maintained over time, percentage 
agreement for classification categories of the BRS are also pre- 
sented in the manual, based on dichotomous classification 
(e.g., Within Normal Limits vs. Nonoptimal/Questionable). 
Classification accuracy ranged from 73% (i.e., total score at 1 
month) to almost 97% (i.e., Motor Quality at 24-36 months), 
depending on age. Overall, Motor Quality ratings were highly 
consistent across age (all over 90% correctly classified), with 
all other factor scores showing at least 80% classification 
accuracy (Bayley, 1993). Classification accuracy information 
over time, to our knowledge, is not available for other BSID-II 
scores. 

Long-term stability of the BSID-II has not been thoroughly 
investigated. One study reports a one-year test-retest reliability 
of .67, based on testing at 12 months and 24 months in nor- 
mally developing toddlers (Markus et al., 2000). Pomerleau 
et al. (2003) report low test-retest reliability from a one-month 
visit to six-month visit in a small mixed sample of infants of 
high-risk, moderate-risk, and low-risk mothers, with scores ac- 
tually decreasing at 6 months (i.e., r=.30 for MDI; N=68). 
Thus, as noted by Bayley (1993), the test is not designed for 
long-term prediction of developmental level. 

It is important to note that test-retest reliability coef- 
ficients may be misleading (i.e., a low r may reflect sample 



Table 6-15 Extrapolated Age Equivalents for the BSID-II for High- and Low-Functioning Infants

For each Mental Scale raw score, the table lists the corresponding developmental age in months as estimated by three methods: a loess-based method, the age equivalents published in the manual, and the linear derivation method (Lindsay & Brouwers, 1999). The manual's equivalents are listed only as <1 month for the lowest raw scores and as 42+ months for raw scores of 166 and above, whereas the loess and linear estimates extend beyond these limits.

Source: From Lindsay & Brouwers, 1999. Reprinted with permission.



Table 6-16a Extrapolated Scores for the BSID-II Mental Developmental Index (MDI)

Age in Months
MDI 2 3 4 5 6 8 10 12 15 18 21 24 27 30 36 42

50  3 — — — 38 47 55 — 74 — 99 108 117 — 130 140
49  — 8 19 30 — — — 65 73 87 — — — 122 — —
48  2 — — — 37 46 — — — — 98 107 116 — 129 139
47  — 7 18 29 — — 54 64 72 86 — — — 121 — —
46  1 — — — 36 45 — — — — 97 106 115 — 128 138
45  — 6 17 28 — — 53 63 71 85 — — — 120 — —
44  — — — — 35 44 — — — — 96 105 114 — 127 137
43  — 5 16 27 — — 52 62 70 84 — — — 119 — —
42  — — — — 34 43 — — — — 95 104 113 — 126 136
41  — 4 15 26 — — 51 61 69 83 — — — 118 — —
40  — — — — 33 42 — — — — 94 103 112 — 125 135
39  — 3 14 25 — — 50 60 68 82 — — — 117 — —
38  — — — — 32 41 — — — — 93 102 111 — 124 134
37  — 2 13 24 — — 49 — 67 81 — — — 116 — —
36  — — — — 31 40 — 59 — — 92 101 110 — 123 133
35  — 1 12 23 — — 48 — 66 80 — — — 115 — —
34  — — — — 30 39 — 58 — — 91 100 109 — 122 132
33  — — 11 22 — — 47 — 65 79 — — — 114 — —
32  — — — — 29 38 — 57 — — 90 99 108 — 121 131
31  — — 10 21 — — — — 64 78 — — — 113 — —
30  — — — — 28 37 46 56 — — 89 98 107 — 120 130

Note: To use the table, find the appropriate age column and move down the rows until you reach the obtained raw score; the number in the far left column of that row represents the estimated index score. Researchers using these tables should be aware that both the accuracy and the interval nature of the index scores may not be retained in the estimated tables.

Source: Adapted from Robinson & Mervis, 1996. Reprinted with permission.

Table 6-16b Extrapolated Scores for the Psychomotor Developmental Index (PDI)

Age in Months 
PDI 2 3 4 5 6 8 10 12 15 18 21 24 27 30 36 42 

50 — — — — — — — — — — — — — — — —

49 — 8 — — 23 — — 52 57 — 66 71 77 — 84 — 

48 ___ 17 ___________ 92 

47 — — 11 — — 35 41 — — 62 — — — 80 — — 

46 — 7 — — 22 — — 51 — — — — — — 83 

45 — — — 16 — — — — 56 — 65 70 76 — — — 

44 — — 10 — — 34 40 — — — — — — — — 91 

43 — 6 — — 21 — — — — 61 — — — 79 82 — 

42 9 15 — 50 — 64 69 — — 

41 — — — — — 33 39 — 55 — — — 75 — — 90 

40 — 5 — — 20 — — — — — — — — — 81 

39 — — — 14 — — — — — 60 — — — 78 — 

38 — 8 — — 32 38 49 — — — 68 — — — 

37 — 4 — — 19 — — — 54 — 63 — 74 — 80 — 

36 ___ 13 ___________ 89 

35 — — 7 — — 31 37 — — 59 — — — 77 — — 

34 — 3 — — 18 — — 48 — — — 67 — — 79 

33 6 12 36 53 62 73 

32 — — — — 30 — — — — — — — — — 88 

31 — 2 — — 17 — — — — 58 — — — 76 78 — 

30 — 5 11 — 29 35 47 — — — 66 — — — — 

Note: To use the table, find the appropriate age column and move down the rows until you reach the obtained raw score; the number in the far left column of that row represents the 
estimated index score. Researchers using these tables should be aware that both the accuracy and the interval nature of the index scores may not be retained in the estimated tables. 

Source: Adapted from Robinson & Mervis, 1996. Reprinted with permission. 



Table 6-17 Test-Retest Reliability Coefficients for the BSID-II, by Age

The table classifies the Mental Scale, the Motor Scale, the BRS Total Score, and the BRS factor scores (Motor Quality, Attention/Arousal, Orientation/Engagement, Emotional Regulation) by the magnitude of their test-retest coefficients (very high, .90+; high, .80-.89; adequate, .70-.79; marginal, .60-.69; low, <.59), separately for the 1-month, 12-month, and 24- and 36-month retest samples.

Note: Scores for Mental and Motor Scales are standard scores; scores for BRS are raw scores. Mental and Motor Scale coefficients at 1 and 12 months are based on combined 1- and 12-month data.

Source: Adapted from Bayley, 1993.



characteristics rather than test characteristics). Stability es- 
timates may therefore differ significantly across conditions, 
depending on expected developmental trajectory, as well as 
on the age at which children are tested. For example, Nic- 
cols and Latchman (2002) reported that one-year BSID-II 
stability coefficients for children with Down syndrome were 
higher than those for medically fragile infants (i.e., .65 and 
.37, respectively). However, when actual developmental 
quotient scores were examined, on average, scores of chil- 
dren with Down syndrome dropped after one year, but 
scores of medically fragile infants increased in the same 
time span, consistent with their expected developmental 
trajectories. Additionally, BSID-II classifications (i.e., Nor- 
mal/Borderline vs. Abnormal) in infancy, compared with 
those obtained one year later, showed low agreement for 
Down syndrome children but moderate agreement for 
medically fragile infants (kappa = .04 vs. .41). As noted 
above (see Ethnicity/SES), research shows that children
from at-risk backgrounds, including those with high rates 
of poverty, single-parent families, and familial chaos, have 
decreasing BSID-II scores with time when multiple assess- 
ments are compared (e.g., Mayes et al., 2003). Lower test- 
retest stability estimates may occur in these subgroups. 
Additionally, these studies underline the need to examine 
other indices of test-retest stability (including expected de- 
velopmental trajectories and actual score ranges) in addi- 
tion to reliability coefficients when evaluating test characteristics 
in clinical groups. 
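To illustrate why (with hypothetical figures): if every child in a sample scored exactly 10 points lower at retest, the test-retest correlation would remain perfect (r = 1.0) even though the group mean had dropped substantially, whereas a sample with a very restricted range of scores can yield a low r even when individual scores change little in absolute terms.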

Practice Effects 

The manual reports that scores increased an average of two 
points on both the Mental and Motor Scales across all ages, 
indicating a minimal practice effect in the standardization 
sample after a median interval of four days (Bayley, 1993). 



Practice effects have not been found in all studies (e.g., 
Markus et al., 2000). Again, practice effects would not be ex- 
pected in populations whose developmental trajectory in- 
cludes a decrease from infancy to toddlerhood. 

Assessing Change 

A new way of modeling change using developmental trajecto- 
ries based on Rossavik modeling shows promise in determin- 
ing whether individual infants deviate from their projected 
developmental growth curve. These are estimates of change 
modeled on BSID-II raw scores for normally and atypically 
developing infants tested serially over time. Using these mod- 
els, Deter et al. (2001) found that differences of ±10% in raw
scores between two assessment points in time were beyond
the bounds expected by measurement or prediction error,
and were thus indicative of probable alteration of normal
development in the individual infant between 4 and 26
months of age.
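For example (hypothetical figures, and assuming the deviation is expressed as a percentage of the model-predicted raw score): if the fitted growth curve predicted a Mental Scale raw score of 100 at a follow-up visit and the infant obtained 88, the deviation of (88 - 100)/100 = -12% would exceed the ±10% bound and flag a probable departure from the expected trajectory, whereas an obtained score of 93 (-7%) would fall within expected limits.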

Interrater Reliability 

Interscorer reliability is reported as .96 for the Mental Scale 
and .75 for the Motor Scale. BRS interscorer reliability is gen- 
erally high (over .90 for total and factor scores at the 1- to
5-month level, and over .87 at the 13- to 42-month level; Bayley, 1993).
Good interobserver agreement has also been found in other 
studies, in samples differing significantly from the standardi- 
zation sample (e.g., Hamadani et al., 2002). However, when 
interscorer reliability of experienced Bayley examiners is as- 
sessed at the item level, considerable variability may occur on 
certain items. Chandlee et al. (2002) reported that although 
agreement was generally high for the majority of items, there 
were certain items that were not scored consistently across 
raters (i.e., 23% of items sampled). This inconsistency yielded 






underestimations or overestimations of as much as -16 to +14 
standard score points on the MDI, even though the mean dif- 
ference across raters was small (i.e., -1.7 points). They also
found greater scoring variability in younger children and for 
items requiring incidental observation of language or involv- 
ing timing. As noted above, Chandlee et al. (2002) provide rec- 
ommendations for administering and scoring these items to 
improve interrater agreement (see Table 6-8). 

VALIDITY 

Content 

Items for the BSID-II were selected with considerable care to 
ensure appropriate item content and absence of bias. Tech- 
niques included a survey of BSID users, literature review, 
panel review including experts in child development who 
were provided with specific guidelines on how to evaluate 
items, new item generation (based on literature review and 
generation of items from 25 experts), three pilot testings, 
testing using a tryout version, and bias analyses (including 
empirical analysis such as Rasch techniques, and expert panel 
bias review). After completion of standardization testing, 
items were re-reviewed, and only those demonstrating strong 
age trends, appropriate difficulty level, statistical relationship 
to other scale items, lack of bias, and no administration/ 
scoring problems were retained in the final test version (Bay- 
ley, 1993). 

Comparison With the Original BSID 

Correlations between the two versions are high in the stan- 
dardization sample for the Mental and Motor Scales (i.e., 
r= .62 and .63, respectively). However, the size of the correla- 
tions indicates that the two tests nevertheless have a signifi- 
cant amount of unique variance (Bayley, 1993). Similar 
correlations have been reported in other samples, including 
children of low-SES African American mothers (r = .78
and .70, respectively; Black et al., 2001). Raw scores and age 
equivalents of profoundly delayed individuals whose ages fall 
outside the test norms are very highly intercorrelated between 
test versions (r = .90-.97; DeWitt et al., 1998).

BSID scores are, on average, 10 to 12 points higher than 
BSID-II scores when both tests are administered (Bayley, 
1993; Black & Matula, 2000). For example, mean standard 
score differences in MDI between the two scales have ranged 
from 9.2 to 18.2 points in normally developing infants (Bay- 
ley, 1993; Glenn et al., 2001; Tasbihsazan et al., 1997) and 9 to
11 raw score points in profoundly delayed individuals outside
the age range (DeWitt et al., 1998). Similar results have been 
reported in children with Down syndrome (8.4 standard score 
points; Glenn et al., 2001) and preterm infants (7.3 points; 
Goldstein et al., 1995). Additionally, children who obtain 
BSID scores that differ significantly from the norm (i.e., at the 
extremes of the distribution such as developmental delay or 



above average), tend to have larger BSID-BSID-II differences 
(i.e., 13 to 18 points; Bayley, 1993; Gagnon & Nagle, 2000; 
Glenn et al., 2001; Tasbihsazan et al., 1997), although this is 
not always found (e.g., DeWitt et al., 1998). 

Subscale Intercorrelations 

According to the manual, the Mental and Motor Scales are 
moderately intercorrelated across age (average r= .44, range 
.24 to .72; Bayley, 1993). One study on normal 6-month-old 
infants reported a high correlation between scales (r= .66; 
Porter et al., 2003). The manual also reports that the BRS's 
correlation to the Mental and Motor Scales is low to moder- 
ate across age (i.e., .27 to .45 for the BRS total score). Porter 
et al. (2003) reported negligible correlations between the 
BRS Emotional Regulation factor and MDI/PDI in 6- 
month-old infants (r= .11). Overall, these findings suggest 
that each of the three BSID-II components tap different 
sources of variance not necessarily covered by the other two 
BSID-II scales. 

Factor Structure 

The manual does not provide information on whether the 
three BSID-II components actually emerge as separate factors 
(Flanagan & Alfonso, 1995), or whether facet scores are replic- 
able in factor solutions. Factor analyses of clinical groups are 
also needed (Fugate, 1998). 

With regard to the BRS, the manual reports factor solu- 
tions corresponding to the factor scores in the scale. These 
include two factors, Motor Quality and Attention/Arousal, in 
the youngest age group (1-5 months), and three factors in 
the two older age groups, representing Orientation/Engage- 
ment, Motor Quality, and Emotional Regulation. Of note, the 
factor solutions only accounted for moderate amounts of 
variance (i.e., 46%, 54%, and 44%, respectively), which sug- 
gests that other ways of portioning variance may provide a 
better fit. In particular, one study suggests that Motor Quality 
appeared to dominate factor solutions generated using stan- 
dardization and clinical samples in both first-order and second- 
order factor solutions, either because motor behavior is easily 
observed by raters, or because motor quality mediates all 
other aspects of performance on the BRS (Thompson et al., 
1996). Of note, factor solutions became increasingly complex 
as samples became more heterogeneous, with the simplest 
factor solution occurring in the youngest, normally develop- 
ing infants (Thompson et al., 1996). 

Correlations With IQ 

Even though the manual is careful to explain that the test is a 
developmental scale and not an IQ test, the BSID-II shows 
substantial associations with IQ. For example, the BSID-II 
Mental Scale correlates highly with composite measures of 
general intelligence, including scores from the McCarthy 






Scales (r = .79), WPPSI-R (r = .73), and DAS (r = .49; Bayley,
1993). As expected, correlations between the Motor Scale and 
composites from these measures are not as high. For example, 
the manual reports moderate correlations between the Motor 
Scale and McCarthy Scales (r= .45), WPPSI-R (r= .41), and 
DAS (r= .35; Bayley, 1993). 

Substantial correlations between tests do not necessarily in- 
form on whether scores obtained from different instruments 
are similar, particularly in children whose development may be
uneven or atypical. For example, in preschoolers with autism, 
Magiati and Howlin (2001) found that BSID-II scores were 
significantly lower than those yielded by the Merrill-Palmer, 
despite high correlations between the two tests (r=.82). In 
contrast, BSID-II and Vineland Scales appeared to show higher 
agreement; intercorrelations were high, and scores were com- 
parable (55.6 versus 55.1). Correlations to other measures are 
discussed in further detail in the manual (Bayley, 1993), and 
validity studies are also discussed in Black and Matula (2000). 
There is currently limited empirical evidence on the validity of 
the facet scores in terms of correlations to other measures. 

Scores on the BSID-II are usually predictive of subsequent 
IQ, but this may depend on the sample assessed and the time- 
frame evaluated. Overall, BSID-II scores of infants with sig- 
nificant delays or atypical development are more predictive of 
later IQ than those with scores in the average range (Bracken 
& Walker, 1997). For example, in one study of extremely low 
birth weight infants, BSID-II scores at the age of 2 were highly 
related to WPPSI-R IQ at 4 years, especially in girls (r = .73;
Sajaniemi et al., 2001b). BSID-II scores of children under age 
2 may be less predictive of later IQ because perceptual motor 
skills, rather than mental skills, are mostly assessed by earlier 
items on the Mental Scale (Roberts et al., 1999). 

Correlations With Other Cognitive Tests 

Research has also examined whether the BSID-II is a good 
predictor of more specific cognitive domains such as language 
or attention. Some studies report that BSID-II MDI scores are 
significantly correlated with language measures, with increas- 
ingly large associations with increasing age (e.g., .26-.33 in
14-month-olds versus .50-.53 in 18-month-olds; Lyytinen
et al., 1999). However, this has not been found in all studies. For 
example, in toddlers with developmental delay, prelinguistic 
vocalizations (e.g., rate, extent of consonant use, and rate of 
interactive vocalizations), but not BSID-MDI or Mental Age 
scores, predict expressive language scores assessed one year 
later (McCathren et al., 1999). Similarly, children with a posi- 
tive family history of dyslexia perform similarly to controls 
(i.e., both groups within normal limits) on the BSID-II MDI, 
despite group differences on expressive language tests (Lyyti- 
nen et al., 2001). A lack of group differences was also found for
the BSID-II expressive score, a subset of items thought to be a 
more precise estimate of language competence compared to 
the entire scale (but see Seigel et al., 1995). This raises ques- 
tions about the test's sensitivity to language deficits. 



Other research has examined the association between 
BSID-II and attention. For example, the MDI is highly related 
to the ability to respond to joint attention (Markus et al., 
2000), an early precursor of focused and sustained attention 
(Bono & Stifter, 2003). Further, a composite score comprised 
of specific MDI items measuring cognitive ability (COG) ap- 
pears to be more predictive of the duration of attention in 17-
to 24-month-olds than the standard MDI (see Table 6-11 for
specific items). This may be because the MDI reflects broader 
aspects of ability tapping general cognitive and perceptual 
skill development rather than specific cognitive skills such as 
attention (Choudhury & Gorman, 2000). 

Clinical Studies 

The manual provides preliminary data on BSID-II perfor- 
mance in specific clinical samples such as children with prema- 
turity, HIV infection, prenatal drug exposure, birth asphyxia, 
development delay, chronic otitis media, autism, and Down 
syndrome. Many of the clinical groups performed below nor- 
mal limits on the MDI and the PDI, with the Down syndrome 
group having the lowest scores and the otitis media group 
the highest scores. Although the manual recommends that these 
data be interpreted with caution, it is difficult to determine 
their clinical significance in the absence of information on 
score differences compared to matched normal samples, as well 
as diagnostic classification statistics such as sensitivity, speci- 
ficity, and positive/negative predictive power. Note that other 
studies on children with Down syndrome (Niccols & Latch- 
man, 2002) and drug exposure (Schuler et al., 2003) report 
significantly lower mean scores than those presented in the 
manual. 

The BSID-II has been used in a large number of studies 
on various clinical populations. For instance, in infants with 
extremely low birth weight, each additional pre-/perinatal 
risk factor (e.g., low maternal education, neonatal ultra- 
sound abnormalities, intraventricular hemorrhage, respira- 
tory intervention, etc.) is associated with a 3.5-point decrease 
in BSID-II MDI scores (Sajaniemi et al., 2001b). Similarly, 
periventricular leukomalacia in premature infants is found to 
be a strong risk factor for low BSID-II scores (Nelson et al., 
2001). The test has shown sensitivity to the effects of prenatal 
exposure to polysubstance abuse, where boys show increased 
vulnerability to delay (Moe et al., 2001). The BSID-II has also
demonstrated sensitivity to developmental trajectories in 
Down syndrome such that scores typically drop from the first 
to the second year of life (Niccols & Latchman, 2002). In con- 
trast, BSID-II scores of medically fragile children tested in the 
first year of life may underestimate cognitive potential, as seen 
by the increase in scores with age (Niccols & Latchman, 2002). 
The BSID-II has also been used in studies assessing devel- 
opmental delay in adoptees from Eastern European coun- 
tries, where the rate of developmental delay detected by the 
test may be as high as 55% (Boone et al., 2003). The test has 
also been used in large-scale studies on vulnerable children, 






including those managed by child welfare and foster care 
(Leslie et al., 2002).

Variability in BSID-II scores within groups, as evidenced 
by large standard deviations or large score ranges, may also be 
a feature of at-risk children. This has been documented in 
children with early risk factors such as neurological abnor- 
mality or prematurity (e.g., MDIs ranging from <50 to +110; 
Nelson et al., 2001), in children of families with extreme 
poverty and environmental instability (Mayes et al., 2003), 
and in prenatally drug-exposed children (Moe & Slinning, 
2001). 

The BSID-II has also been used in treatment studies, 
including multisensory intervention administered in the 
neonatal intensive care unit to infants with neurological in- 
jury or extreme prematurity (Nelson et al., 2001; Sajaniemi 
et al., 2001b). In children with PKU, the test has shown 
sensitivity to mild delay and has provided evidence for the 
beneficial effect of breastfeeding on neurodevelopment via 
intake of long-chain polyunsaturated fatty acids, thought to 
be involved in the development of frontal regions in the 
developing brain (Agostoni et al., 2003). In healthy infants, 
supplementation of formula with docosahexaenoic acid (DHA) 
during the first four months of life is reported to be associ- 
ated with a seven-point increase in MDI but not PDI or 
BRS, supporting the hypothesis that this nutritional compo-
nent is selectively related to cognitive development (Birch 
etal, 2000). 

In typically developing children, the BSID-II has con- 
tributed to our understanding of normal development. For 
instance, BSID-II MDI scores are significantly related to sym- 
bolic play in preschoolers (Lyytinen et al., 1999); symbolic
play is thought to represent prelinguistic skills that are a foun- 
dation for later language ability (Lyytinen et al., 2001). The 
BSID-II has also been used in research showing the impor- 
tance of family environment on early cognitive development. 
BSID-II scores are moderately correlated to measures of the 
home environment reflecting the quality of parenting in med- 
ically fragile infants (r = .41-.42; Holditch-Davis et al., 2000).
Similarly, in research on developmental neurotoxicity exam- 
ining the effect of PCB in breast milk, increased concentra- 
tions are associated with poorer BSID-II scores; however, a 
good home environment appears to be a significant protective 
factor that may offset adverse cognitive effects of exposure 
(Walkowiak et al., 2001). BSID-II scores also appear to be re-
lated to parenting characteristics, which provides evidence for 
the association between rearing practices and cognitive devel- 
opment. For instance, maternal vocalizations and maternal 
contingency (i.e., contingent responding to infant needs) are 
moderately associated with MDI and BRS scores, but not PDI 
(Pomerleau et al., 2003). Specific parenting behaviors that in- 
hibit toddlers' competence (e.g., forceful redirection, ignor- 
ing/reinforcing misbehavior) are moderately and inversely 
related to MDI scores (Coleman et al., 2002). Similarly, 
parental beliefs about parenting competence are moderately 
related to MDI scores in easy toddlers, but not in difficult tod- 



dlers (Coleman & Karraker, 2003). Extent of marital conflict 
appears to be inversely related to scores on the BRS Emotional 
Regulation factor in infants (Porter et al., 2003), suggesting 
that conflict in the home may adversely affect developing reg- 
ulatory abilities in infants. The BSID-II has also been used in 
research demonstrating the role of fathers' parenting style in 
cognitive development. For example, a responsive-didactic 
parenting style in fathers is predictive of risk of delay in low- 
income fathers of varied ethnicity (Shannon et al., 2002). 
When other factors are controlled, the MDI is also related to 
the quality of daycare centers (i.e., operationalized as adult to 
child ratio and group size) in low-income African American 
preschoolers (Burchinal et al., 2000), again providing evi- 
dence for the impact of early environmental influences on 
cognitive development. 

COMMENT 

The BSID-II is a well-designed, psychometrically sound test
that is probably the most frequently used infant development 
assessment test in North America and, like its predecessor, is 
considered the best available assessment tool for infants (Sat- 
tler, 2001). It is also a preferred test for research. Especially 
compelling is the fact that, despite the considerable challenges 
of assessing infants and preschoolers, including inherent 
variability in behavior, limited attention, distractibility, and 
susceptibility to interference (Chandlee et al., 2002), the test 
boasts impressive psychometrics (Dunst, 1998; Flanagan & 
Alfonso, 1995). Additionally, the manual is well written and 
comprehensive (including historical overview), and test con- 
tent was developed with substantial care, including literature 
review and bias review. 

However, as with any test, users need to be aware of its lim- 
itations. First, there are a few psychometric issues. Although 
the test has impressive reliabilities, particularly with regard to 
the MDI and BRS Motor Quality Scales, there are still con- 
cerns about stability of some scores over time (e.g., Bradley- 
Johnson, 2001; Flanagan & Alfonso, 1995). Some BRS factor 
scores also have questionable reliability at some ages (i.e., At- 
tention/Arousal). The test may also have steep item gradients 
in the younger age bands (Nellis & Gridley, 1994). Facet scores 
have limited empirical evidence of validity, have a potential 
for misuse (Schock & Buck, 1995), and should be used, if at 
all, "with extreme caution" (Fugate, 1998; see also Interpreta- 
tion). Information on sensitivity, specificity, and other classifi- 
cation statistics such as positive predictive power are lacking 
for the MDI, PDI, and BRS, as well as for facet scores and BRS 
factor scores. 

The norms are based on 1988 Census data, which makes 
them outdated in terms of demographic composition and 
potentially vulnerable to the Flynn effect. Healthy children 
only are included in the sample, which is problematic because 
the test is commonly used to assess children with develop- 
mental delay (see McFadden, 1996, for the limitations of 
"truncated" norm distributions). On the other hand, other 






tests are hampered by floor effects, so that standard scores for 
low-functioning children are based on a flat profile consisting 
of few or no raw score points, which provides no information 
on what a child can actually accomplish (Flanagan & Alfonso, 
1995). Unlike these, the BSID-II has a range of items that are 
appropriate for even low-functioning preschoolers. As a re- 
sult, most children pass several items, which provides sub- 
stantial information on their capabilities — not just their 
limitations — and is of considerable utility in planning inter- 
ventions and remedial programs (Flanagan & Alfonso, 1995). 
Others disagree (Dunst, 1998), and note that the test was not 
designed for this purpose, nor is it capable of identifying be- 
havioral goals for intervention, particularly when the same 
MDI score can be obtained by different children passing and 
failing completely different combinations of item types and 
levels (Bendersky & Lewis, 2001). Note, too, that the BSID-II 
has limited utility in discriminating ability level below mild to 
moderate delay (i.e., minimum standard score = 50), and has 
been criticized as an assessment tool for children with disabil- 
ities (Dunst, 1998). Use of developmental age equivalents for 
quantifying deficit in extremely low-functioning individuals 
instead is an inadequate substitute. 

There is also the issue of cutoff points for adjacent norms 
tables for children whose ages are close to the next oldest or 
youngest age classification. Bradley-Johnson (2001) provides
the following example: on a given day, a child aged 4 months, 
15 days obtains a raw score of 40, which yields an MDI of 91. 
If the child had been tested the next day instead, when she was 
aged 4 months, 16 days, her MDI score would be 69. Note that 
this problem with age cutoffs only occurs between 1 and 5 
months of age (Bradley-Johnson, 2001). As a possible solu- 
tion, users are urged to consider adjacent age-norms tables 
when calculating scores for infants whose ages are at the top 
or the bottom of an age band. However, this appears a rather 
inadequate solution, especially when scores can result in such 
divergent classifications. 

Because toddlers and preschoolers are known to either 
(a) not be able to express themselves verbally, (b) not choose to 
express themselves verbally, or (c) be difficult to understand 
when they actually do express themselves verbally, a nonverbal 
score would be a significant asset (Bradley-Johnson, 2001). In 
addition, lack of a nonverbal score limits the utility of the test 
in children with language or speech impairments given the 
large number of items that are language-based, particularly in 
the older age bands (Bradley-Johnson, 2001). In assessing
young children with developmental delay, the BSID-II is prob- 
ably not sufficient to accurately predict future language scores 
(McCathren et al., 1999); the test should therefore be supple- 
mented by other language measures. Likewise, the test is heavily 
loaded with motor items, and therefore may not be appropri- 
ate for children with physical impairments such as cerebral 
palsy (Mayes, 1999). 

Given the outdated norms that inflate test scores, use of the 
original BSID for clinical or research purposes is not recom- 
mended (Tasbihsazan et al., 1997). Alternatively, researchers 



who have been using the BSID for longitudinal studies may 
consider reporting raw scores (Black & Matula, 2000). 

Start Points 

Competent administration of the BSID-II demands a thor- 
ough understanding of the practical limits and benefits of us- 
ing estimated starting points, whether based on estimated level 
or corrected age (i.e., gestational versus chronological). Specif- 
ically, BSID-II scores may vary depending on which item set 
was chosen as the starting point of the child's assessment. Mi- 
nor changes to the administration of the Bayley, such as those 
based on the examiner's assumptions of a child's levels, can 
significantly affect test scores and lead to underestimation or 
overestimation of levels (Gauthier et al., 1999; Washington 
et al., 1998). Note that one major contributor to poor replica- 
bility of MDI scores across raters is the use of different start 
points by different examiners (Chandlee et al., 2002).

Ross and Lawson (1997) conducted an informal survey and 
found that most psychologists use the corrected age as a basis 
for selecting the starting point in premature infants. However, 
when the authors compared MDI and PDI scores of prema- 
ture children using both corrected and chronological ages, 
equivalent scores were obtained only for children with major 
developmental delays. Children with average scores when 
chronological age was used as a starting point had lower levels 
when corrected age was used to determine the initial item set. 
In another study (Glenn et al., 2001), infants could pass items 
in higher item sets despite reaching ceiling criteria on lower 
ranked items. The authors noted that some children with atyp- 
ical development tend to have scattered or variable perfor- 
mance, which could exacerbate any inaccuracies related to the 
starting point. To avoid this problem, they recommend preced- 
ing the BSID-II administration by the administration of other 
nonverbal tests/materials covering a wide range of age levels to 
identify an appropriate BSID-II starting point. This may be 
time-consuming in some contexts. A practical alternative is 
simply to use item types that span several start points, such 
as items involving cubes, pegs, or picture naming/pointing 
(C. Miranda, personal communication, January 2005). 

Users should keep in mind that delay in one area might not 
necessarily mean delay in other areas of functioning (Matula 
et al., 1997). For example, selecting a starting point based on a 
child's language deficits may result in item sets that are inap- 
propriately low for their motor strengths, and vice versa. Black 
and Matula (2000) recommend that designated starting points 
therefore only be altered when definitive information exists on 
a child's levels. Alternatively, the simplest approach is simply to 
use chronological age in all cases (Gauthier et al., 1999). 

Scoring: Corrected Age Versus 
Chronological Age 

The second issue concerns the selection of age level for 
conversion of raw scores to standard scores. The manual 






recommends that corrected age be used for children with pre- 
maturity when deriving standard scores. However, it is un- 
clear whether a correction should be applied at all ages or only 
to younger infants. In Ross and Lawson's (1997) survey, most 
psychologists corrected for age until 2 years of age, after which 
chronological age was used in score conversions. They noted 
that this was based on convention, not on empirical evidence. 
The authors review some of the research on age correction, 
which indicates that the need to correct for age is more com- 
plex than it appears (e.g., it may relate to such factors as birth 
weight and functional domain assessed, given cross-domain 
variability in children with uneven development). For more 
information on this issue, see Black and Matula (2000). 
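As a simple illustration of the correction itself (hypothetical figures): an infant born at 32 weeks' gestation is 8 weeks premature, so if tested at a chronological age of 6 months (about 26 weeks), the corrected age used to select norms would be approximately 18 weeks, or about 4 months.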



Interpretation 

It is important to note that BSID-II scores should be used as 
general indicators of functioning, not as definitive predictors 
of future ability or potential (Nellis & Gridley, 1994). The 
BSID-II was not designed to provide information on specific 
subdomains of infant ability, and the test does not measure 
specific, well-defined cognitive processes, but is rather "a stan- 
dardized, developmentally ordered checklist of complex crite- 
rion behaviors" (Bendersky & Lewis, 2001, p. 443). The 
manual explicitly indicates that the test should not be used to 
measure deficit in a specific skill area such as language, nor to 
obtain a norm-referenced score for a severely physically or 
sensorially impaired child. 

As noted above, there are psychometric issues and limited 
validity evidence for the facet scores, including unequal item 
representation across ages. Black and Matula (2000) recom- 
mend that these be used with caution, keeping in mind that the 
scores may provide a general description of functioning within 
domains but that a precise evaluation of particular abilities 
should be obtained via other instruments designed for this pur- 
pose. The manual stresses that only the MDI and PDI summary 
scores should be used in interpreting test results, and that fail- 
ure on a particular cluster of items (e.g., language) "should not 
be used as a measure of deficit in a specific skill area" (p. 4). 

Lastly, the test is also at times used to characterize the de- 
velopmental level of severely or profoundly impaired children 
and adults who are outside the actual BSID-II age range by 
deriving developmental age equivalents of raw scores. Al- 
though this technique can yield useful clinical information, 
use of the test in this way should be done with caution, keep- 
ing in mind the practical and psychometric limitations of age 
equivalents (Bayley, 1993; DeWitt et al., 1998). 



REFERENCES 

Agostoni, C., Verduci, E., Massetto, N., Radaelli, G., Riva, E., & Gio-
vannini, M. (2003). Plasma long-chain polyunsaturated fatty 
acids and neurodevelopment through the first 12 months of life 
in phenylketonuria. Developmental Medicine & Child Neurology, 
45, 257-261. 



Andersson, H. W., Sonnander, K., & Sommerfelt, K. (1998). Gender 
and its contribution to the prediction of cognitive abilities at 
5 years. Scandinavian Journal of Psychology, 39, 267-274. 

Atkinson, L. (1990). Intellectual and adaptive functioning: Some ta- 
bles for interpreting the Vineland in combination with intelli- 
gence tests. American Journal of Mental Retardation, 95, 198-203. 

Bayley, N. (1933). The California First-Year Mental Scale. Berkeley: 
University of California Press. 

Bayley, N. (1936). The California Infant Scale of Motor Development. 
Berkeley: University of California Press. 

Bayley, N. (1969). Bayley Scales of Infant Development. Manual. New 
York: Psychological Corporation. 

Bayley, N. (1970). Development of mental abilities. In P. H. Mussen 
(Ed.), Carmichael's manual of child psychology (3rd ed.). New 
York: Wiley. 

Bayley, N. (1993). Bayley Scales of Infant Development (2nd ed.; 
Bayley-II). San Antonio, TX: Psychological Corporation. 

Bendersky, M., & Lewis, M. (2001). The Bayley Scales of Infant De- 
velopment: Is there a role in biobehavioral assessment? In 
L. T. Singer & P. S. Zeskind (Eds.), Biobehavioral assessment of the 
infant (pp. 443-462). New York: Guilford Press. 

Birch, E. E., Garfield, S., Hoffman, D. R., Uauy, E., & Birch, D. G. 
(2000). A randomized controlled trial of early dietary supply of 
long-chain polyunsaturated fatty acids and mental development 
in term infants. Developmental Medicine & Child Neurology, 42, 
174-181. 

Black, M. M., Hess, C., & Berenson-Howard, J. (2001). Toddlers from
low-income families have below normal mental, motor and be- 
havior scores on the revised Bayley scales. Journal of Applied De- 
velopmental Psychology, 21, 655-666. 

Black, M. M., & Matula, K. (2000). Essentials of Bayley Scales of 
Infant Development-II assessment. New York: John Wiley and
Sons, Inc. 

Bono, M. A., & Stifter, C. A. (2003). Maternal attention-directing 
strategies and infant focused attention during problem solving. 
Infancy, 4(2), 235-250. 

Boone, I. L., Hostetter, M. K., & Weitzman, C. C. (2003). The predic-
tive accuracy of pre-adoption video review in adoptees from 
Russian and Eastern European orphanages. Clinical Pediatrics, 42, 
585-590. 

Bracken, B. A. (1987). Limitations of preschool instruments and 
standards for minimal levels of technical adequacy. Journal of Psy-
choeducational Assessment, 4, 313-326. 

Bracken, B. A., & Walker, K. C. (1997). The utility of intelligence tests 
for preschool children. In D. P. Flanagan, J. L. Genshaft, & 
P. L. Harrison (Eds.), Contemporary intellectual assessment: Theo- 
ries, tests and issues (pp. 484-502). New York: Guilford Press. 

Bradley-Johnson, S. (2001). Cognitive assessment for the youngest
children: A critical review of tests. Journal of Psychoeducational 
Assessment, 19, 19-44. 

Braungart, J. M., Plomin, R., DeFries, J. C., & Fulker, D. W. (1992).
Genetic influence on tester-rated infant temperament as assessed 
by Bayley's Infant Behavior Record: Nonadoptive and adoptive
siblings and twins. Developmental Psychology, 28, 40-47. 

Brooks-Gunn, J., & Weinraub, M. (1983). Origins of infant intelli- 
gence testing. In M. Lewis (Ed.), Origins of intelligence: Infancy and 
early childhood (2nd ed., pp. 25-66). New York: Plenum Press. 

Burchinal, M. R., Roberts, J. E., Riggins, R., Zeisel, S. A., Neebe, E., &
Bryant, D. (2000). Relating quality of center-based child care to 
early cognitive and language development longitudinally. Child 
Development, 71(2), 339-357. 






Campbell, S. K., Siegel, E., & Parr, C. A. (1986). Evidence for the need
to renorm the Bayley Scales of Infant Development based on the 
performance of a population-based sample of 12-month-old in- 
fants. Topics in Early Childhood Special Education, 6, 83-96. 

Cattell, P. (1940). Cattell Infant Intelligence Scale. San Antonio, TX: 
Psychological Corporation. 

Chandlee, J., Heathfield, L. T., Selganik, M., Damokosh, A., & Rad- 
cliffe, J. (2002). Are we consistent in administering and scoring 
the Bayley Scales of Infant Development-II? Journal of Psychoedu- 
cational Assessment, 20, 183-200. 

Choudhury, N., & Gorman, K. S. (2000). The relationship between 
sustained attention and cognitive performance in 17-24-month 
old toddlers. Infant and Child Development, 9, 127-146. 

Coleman, P. K., & Karraker, K. H. (2003). Maternal self-efficacy be- 
liefs, competence in parenting, and toddler's behavior and devel- 
opment status. Infant Mental Health Journal, 24(2), 126-148. 

Coleman, P. K., Trent, A., Bryan, S., King, B., Rogers, N., & Nazir, M. 
(2002). Parenting behavior, mother's self-efficacy beliefs, and tod- 
dler performance on the Bayley Scales of Infant Development. 
Early Child Development and Care, 172(2), 123-140. 

Cook, M. J., Holder-Brown, L., Johnson, L. J., & Kilgo, J. L. (1989). An 
examination of the stability of the Bayley Scales of Infant Devel- 
opment with high-risk infants. Journal of Early Intervention, 13, 
45-49. 

Damarin, F. (1978). Bayley Scales of Infant Development. In
O. K. Buros (Ed.), The eighth mental measurement yearbook (Vol. 1, 
pp. 290-293). Highland Park, NJ: Gryphon. 

Deter, R. L., Karmel, B., Gardner, J. M., & Flory, M. J. (2001). Predict- 
ing 2nd year Bayley raw scores in normal infants: Individualized 
assessment of early developmental trajectories using Rossavik 
modeling. Infant Behavior & Development, 24, 57-82. 

DeWitt, M. B., Schreck, K. A., & Mulick, J. A. (1998). Use of Bayley 
Scales in individuals with profound mental retardation: Compar- 
ison of the first and second editions. Journal of Developmental and 
Physical Disabilities, 10, 307-313. 

DiLalla, L. F., Thompson, L. A., Plomin, R., Phillips, K., Fagan, J. F.,
Haith, M. M., Cyphers, L. H., & Fulker, D. W. (1990). Infant pre- 
dictors of preschool and adult IQ: A study of infant twins and 
their parents. Developmental Psychology, 26, 759-769. 

Dunst, C. (1998). Review of the Bayley Scales of Infant Development — 
Second edition. In J. C. Impara & B. S. Plake (Eds.), The thirteenth 
mental measurements yearbook (pp. 92-93). Lincoln, NE: The 
University of Nebraska-Lincoln. 

Flanagan, D. P., & Alfonso, V. C. (1995). A critical review of the tech- 
nical characteristics of new and recently revised intelligence tests 
for preschool children. Journal of Psychoeducational Assessment, 
13, 66-90. 

Fugate, M. H. (1998). Review of the Bayley Scales of Infant Develop- 
ment — Second Edition. In J. C. Impara & B. S. Plake (Eds.), The 
thirteenth mental measurements yearbook (pp. 93-96). Lincoln, 
NE: The University of Nebraska-Lincoln. 

Gagnon, S. G., & Nagle, R. J. (2000). Comparison of the revised and 
original versions of the Bayley Scales of Infant Development. 
School Psychology International, 21, 293-305. 

Gauthier, S. M., Bauer, C. R., Messinger, D. S., & Closius, J. M. (1999). 
The Bayley Scales of Infant Development-II: Where to start? Jour- 
nal of Developmental and Behavioral Pediatrics, 20, 75-79. 

Gesell, A. (1925). The mental growth of the preschool child. New York:
MacMillan. 

Glenn, S. M., Cunningham, C. C., & Dayus, B. (2001). Comparison of
1969 and 1993 standardizations of the Bayley Mental Scales of In- 



fant Development for infants with Down syndrome. Journal of In- 
tellectual Disability Research, 45, 56-62. 

Goldstein, D. J., Fogle, E. E., Wieber, J. L., & O'Shea, T M. (1995). 
Comparison of the Bayley Scales of Infant Development, Second 
Edition, and the Bayley Scales of Infant Development with pre- 
mature infants. Journal of Psychosocial Assessment, 13, 391-396. 

Hamadani, J. D., Fuchs, G. J., Osendarp, S. J. M., Huda, S. N., & 
Grantham-McGregor, S. M. (2002). Zinc supplementation during 
pregnancy and effects on mental development and behaviour of 
infants: A follow-up study. The Lancet, 360, 290-294. 

Holditch-Davis, D., Tesh, E. M., Goldman, B. D., Miles, M. S., & 
D'Auria, J. (2000). Use of the HOME inventory with medically 
fragile infants. Children's Health Care, 29(4), 257-278. 

Jaffa, A. S. (1934). The California Preschool Mental Scale. Berkeley: 
University of California Press. 

Kaplan, M. G., Jacobson, S. W., & Jacobson, J. L. (1991). Alternative 
approaches to clustering and scoring the Bayley Infant Behavior 
Record at 13 months. Paper presented at the meeting of the Soci- 
ety for Research in Child Development, Seattle, WA. 

Kohen-Raz, R. (1967). Scalogram analysis of some developmental se- 
quences of infant behavior as measured by the Bayley Infant Scales 
of Mental Development. Genetic Psychology Monographs, 76, 3-21. 

Lehr, C. A., Ysseldyke, J. E., & Thurlow, M. L. (1987). Assessment 
practices in model early childhood special education programs. 
Psychology in the Schools, 24, 390-399. 

Leslie, L. K, Gordon, J. N., Ganger, W., & Gist, K. (2002). Develop- 
mental delay in young children in child welfare by initial place- 
ment type. Infant Mental Health Journal, 23(5), 496-516. 

LeTendre, D., Spiker, D., Scott, D. T, & Constantine, N. A. (1992). 
Establishing the "ceiling" on the Bayley Scales of Infant Develop- 
ment at 25 months. Advances in Infancy Research, 7, 187-198. 

Lindsay, J. C, & Brouwers, P. (1999). Extrapolation and extrapola- 
tion of age-equivalent scores for the Bayley II: A comparison of 
two methods of estimation. Clinical Neuropharmacology, 22, 
44-53. 

Lyytinen, P., Laasko, M.-L., Poikkeus, A.-M., & Rita, N. (1999). The 
development and predictive relations of play and language across 
the second year. Scandinavian Journal of Psychology, 40, 177-186. 

Lyytinen, P., Poikkeus, A.-M., Laasko, M.-L., Eklund, K., & Lyytinen, 
H. (2001). Language development and symbolic play in children 
with and without familial risk for dyslexia. Journal of Speech, Lan- 
guage, and Hearing Research, 44, 873-885. 

Magiati, I., & Howlin, P. (2001). Monitoring the progress of pre- 
school children with autism enrolled in early intervention pro- 
grammes. Autism, 5(4), 399-406. 

Markus, J., Mundy, P., Morales, M., Delgado, C. E. E, & Yale, M. 
(2000). Individual differences in infant skills as predictors of 
child-caregiver joint attention and language. Social Development, 
9(3), 302-315. 

Matula, K., Gyurke, J. S., & Aylward, G. P. (1997). Bayley Scales II. 
Journal of Developmental and Behavioral Pediatrics, 18, 112-113. 

Mayes, S. D. (1999). Mayes Motor-Free Compilation (MMFC) for 
assessing mental ability in children with physical impairments. 
International Journal of Disability, Development and Education, 
46(4), 475-482. 

Mayes, L. C, Cicchetti, D., Acharyya, S., & Zhang, H. (2003). Devel- 
opmental trajectories of cocaine-and-other-drug-exposed and 
non-cocaine-exposed children. Developmental and Behavioral Pe- 
diatrics, 24(5), 323-335. 

McCathren, R. B., Yoder, P. J., & Warren, S. F. ( 1999). The relationship 
between prelinguistic vocalization and later expressive vocabulary 



132 General Cognitive Functioning, Neuropsychological Batteries, and Assessment of Premorbid Intelligence 



in young children with developmental delay. Journal of Speech, 
Language and Hearing Research, 42, 915-924. 

McFadden, T. U. (1996). Creating language impairments in typically 
achieving children: The pitfalls of "normal" normative sampling. 
Language, Speech, and Hearing in the Schools, 27, 3-9. 

Moe, V., & Slinning, K. (2001). Children prenatally exposed to sub- 
stances: Gender-related differences in outcome from infancy to 3 
years of age. Infant Mental Health Journal, 22(3), 334-350. 

Nellis, L., & Gridley, B. E. (1994). Review of the Bayley Scales of In- 
fant Development, Second Edition. Journal of School Psychology, 
32(2), 201-209. 

Nelson, M. N., White-Traut, R. C, Vasan, U., Silvestri, I., Comiskey, 
E., Meleedy-Rey, P., Littau, S., Gu, G., & Patel, M. (2001). One- 
year outcome of auditory- tactile-visual-vestibular intervention in 
the neonatal intensive care unit: Effects of severe prematurity and 
central nervous system injury. Journal of Child Neurology, 16, 
493-498. 

Niccols, A., & Latchman, A. (2002). Stability of the Bayley Mental 
Scale of Infant Development with high-risk infants. British Jour- 
nal of Developmental Disabilities, 48, 3-13. 

Ogunnaike, O. A., & Houser, R. F. (2002). Yoruba toddler's engage- 
ment in errands and cognitive performance on the Yoruda Mental 
Subscale. International Journal of Behavioral Development, 26(2), 
145-153. 

Ong, L., Boo, N., & Chandran, V. (2001). Predictors of neurodevelop- 
mental outcome of Malaysian very low birthweight children at 
4 years of age. Journal of Paediatric Child Health, 37, 363-368. 

Pomerleau, A., Scuccimarri, C, & Malcuit, G. (2003). Mother-infant 
behavioral interactions in teenage and adult mothers during the 
first six months postpartum: Relations with infant development. 
Infant Mental Health Journal, 24(5), 495-509. 

Porter, C. L., Wouden-Miller, M., Silva, S. S., & Porter, A. E. (2003). 
Marital harmony and conflict: Links to infants' emotional regula- 
tion and cardiac vagal tone. Infancy, 4(2), 297-307. 

Ramey, C. T., Campbell, F. A., & Nicholson, J. E. (1973). The predic- 
tive power of the Bayley Scales of Infant Development and the 
Stanford-Binet Intelligence Test in a relatively constant environ- 
ment. Child Development, 44, 790-795. 

Ratliff-Schaub, K., Hunt, C. E., Crowell, D., Golub, H., Smok- 
Pearsall, S., Palmer, P., Schafer, S., Bak, S., Cantey-Kiser, ]., 
O'Bell, R., & the CHIME Study Group. (2001). Relationship be- 
tween infant sleep position and motor development in preterm 
infants. Developmental and Behavioral Pediatrics, 22(5), 
293-299. 

Roberts, E., Bornstein, M. H., Slater, A. M., & Barrett, J. (1999). Early 
cognitive development and parental education. Infant and Child 
Development, 8, 49-62. 

Robinson, B. R, & Mervis, C. B. (1996). Extrapolated raw scores for 
the second edition of the Bayley Scales of Infant Development. 
American Journal on Mental Retardation, 100(6), 666-670. 

Ross, G., & Lawson, K. (1997). Using the Bayley-II: Unresolved is- 
sues in assessing the development of prematurely born chil- 
dren. Journal of Developmental and Behavioral Pediatrics, 18, 
109-111. 

Sajaniemi, N., Hakamies-Blomqvist, L., Katainen, S., & von Wendt, L. 
(2001a). Early cognitive and behavioral predictors of later perfor- 
mance: A follow-up study of ELBW children from ages 2 to 4. 
Early Childhood Research Quarterly, 16, 343-361. 



Sajaniemi, N., Makela, J., Salokorpi, T, von Wendt, L., Hamalainen, 
T, & Hakamies-Blomqvist, L. (2001b). Cognitive performance 
and attachment patterns at four years of age in extremely low 
birth weight infants after early intervention. European Child & 
Adolescent Psychiatry, 10, 122-129. 

Samson, J. F., & de Groot, L. (2001). Study of a group of extremely 
preterm infants (25-27 weeks): How do they function at 1 year of 
age? Journal of Child Neurology, 16, 832-837. 

Santos, D. C. C, Gabbard, C, & Goncalves, V. M. G. (2000). Motor 
development during the first 6 months: The case of Brazilian in- 
fants. Infant and Child Development, 9(3), 161-166. 

Santos, D. C. C, Gabbard, C, & Goncalves, V. M. G. (2001). Motor 
development during the first year: A comparative study. The Jour- 
nal of Genetic Psychology, 162(2), 143-153. 

Sattler, J. M. (2001). Assessment of children: Cognitive applications 
(4th ed.). San Diego, CA: lerome M. Sattler Publisher, Inc. 

Schock, H. H., & Buck, K. (1995). Review of Bayley Scales of In- 
fant Development — Second Edition. Child Assessment News, 5(2), 
1, 12. 

Schuler, M. E., Nair, P., & Harrington, D. (2003). Developmental out- 
come of drug-exposed children through 30 months: A compari- 
son of Bayley and Bayley-II. Psychological Assessment, 15(3), 
435-438. 

Seigel, L. S., Cooper, D. C, Fitzhardinge, P. M., & Ash, A. J. (1995). 
The use of the Mental Development Index of the Bayley Scale to 
diagnose language delay in 2-year-old high risk infants. Infant Be- 
havior and Development, 18, 483-486. 

Shannon, J. D., Tamis-LeMonda, C. S., London, K., & Cabrera, N. 
(2002). Beyond rough and tumble: Low-income fathers' interac- 
tions and children's cognitive development at 24 months. Parent- 
ing: Science and practice, 2(2), 77-104. 

Shapiro, B. K., Palmer, F. B., Antell, S. E., Bilker, S., Ross, A., & Capute, 
A. J. (1989). Giftedness: Can it be predicted in infancy? Clinical 
Pediatrics, 28, 205-209. 

Sigman, M., Neumann, C, Carter, E., Cattle, D. J., D'Souza, N., & 
Bwibo, N. (1988). Home interactions and the development of 
Embu toddlers in Kenya. Child Development, 59, 1251-1261. 

Tasbihsazan, R., Nettelbeck, T, & Kirby, N. (1997). Increasing mental 
development index in Australian children: A comparative study 
of two versions of the Bayley Mental Scale. Australian Psycholo- 
gist, 32, 120-125. 

Thompson, B., Wasserman, J. D., & Matula, K. (1996). The factor 
structure of the Behavioral Rating Scale of the Bayley Scales of In- 
fant Development-II. Educational and Psychological Measurement, 
56, 460-474. 

Walkowiak, J., Wiener, J.-A., Fastabend, A., Heinzow, B., Kramer, U, 
Schmidt, E., Steingruber, H. J., Wundram, S., & Winneke, G. 
(2001). Environmental exposure to polychlorinated biphenyls 
and quality of the home environment: Effects on psychodevelop- 
ment in early childhood. Lancet, 358, 1602-1607. 

Washington, K., Scott, D. T, & Wendel, S. (1998). The Bayley Scales 
of Infant Development-II and children with developmental de- 
lays: A clinical perspective. Developmental and Behavioral Pedi- 
atrics, 19(5), 346-349. 

Yarrow, L. I., Morgan, G. A., Jennings, K. D., Harmon, R., & Gaiter, J. 
(1982). Infants' persistence at tasks: Relationship to cognitive 
functioning and early experience. Infant Behavior and Develop- 
ment, 5, 131-141. 



Cognitive Assessment System (CAS) 






PURPOSE 

The Cognitive Assessment System (CAS) is designed to assess 
cognitive processes in children and adolescents. 

SOURCE 

The CAS (Naglieri & Das, 1997) can be ordered from River- 
side Publishing Company, 425 Spring Lake Drive, Itasca, IL 
60143-2079 (1-800-323-9540; fax: 630-467-7192; http://www.riverpub.com). The complete test kit price is $629 US. Computer scoring (CAS Rapid Score) is $165 US. There is also a Dutch translation of the test (Kroesbergen et al., 2003).

AGE RANGE 

The test is designed for ages 5 years to 17 years, 11 months.

DESCRIPTION 

Theoretical Underpinnings 

The CAS is based on the PASS theory of cognitive processing, 
which posits that cognition depends on four interrelated 
functions (i.e., Planning, Attention, Simultaneous, and Suc- 
cessive) that interact with the individual's knowledge base and 
skills (Das et al., 1994; see Figure 6-2). Like other nontradi- 
tional IQ tests such as the WJ III and the Kaufman tests (e.g., 
K-ABC, KAIT), its theoretical underpinnings include cognitive
psychology and factor analysis (Naglieri, 1996). The CAS is 
based on the theory that PASS processes are the essential ele- 
ments of human cognition (Naglieri & Das, 1997). This dif- 
fers from traditional IQ tests such as the Wechsler scales and 
the Stanford-Binet, which posit a general ability score ("g") 
and whose content derives from clinical applications rather than specific models of intelligence (note that the newest versions of these instruments are also based, to some extent, on factor-analytic theories of intelligence). The CAS model assumes that the term "cognitive processes" should replace the term "intelligence" and that a test of cognitive processing should rely as little as possible on acquired knowledge such as vocabulary or arithmetic. The CAS is conceived by its author to be a technological improvement over its more traditional predecessors (Naglieri, 1999a).

Figure 6-2 Model of PASS processes. Source: Reprinted with permission from Naglieri, 1999a.

The CAS consists of four main scales assessing each of the 
four PASS processes. Table 6-18 shows the functions mea- 
sured by each scale. Briefly, the CAS Planning subtest mea- 
sures self-control, self-monitoring, and plan development; the 
Attention subtest measures the various attentional processes 
as well as inhibition (e.g., sustained, focused, selective atten- 
tion, resistance to distraction). Simultaneous subtests mea- 
sure the ability to integrate information into a whole (e.g., 
integration of words into ideas), and Successive subtests mea- 
sure the ability to integrate information into a specific pro- 
gression involving strong serial or syntactic components (e.g., 
the linking of separate sounds into speech). 

According to the test authors, the four PASS processes 
correspond to Luria's three functional units, which Luria as- 
sociated with specific brain systems. Attention is posited to 
reflect Luria's first functional unit (brainstem, diencephalon, 
medial regions). Simultaneous and Successive processes re- 
flect the second functional unit (occipital, parietal, and tem- 
poral lobes posterior to the central sulcus), and Planning 
reflects the third functional unit (frontal lobes, particularly 
prefrontal cortex). Like Luria's model, the four PASS pro- 
cesses are thought to be interrelated, not independent. Al- 
though more than one PASS process might be involved in 
each CAS subtest, the PASS scale to which each subtest be- 
longs reflects the PASS process with the most influence on 
subtest performance (Naglieri, 1999a). 

The CAS is of particular interest to neuropsychologists be- 
cause it is based on Lurian theory and contains several sub- 
tests that approximate well-known neuropsychological testing 
paradigms such as the Trail Making Test and Stroop. In addi- 
tion, it contains scales of potential use in the neuropsycholog- 
ical assessment of children (i.e., Planning and Attention) and 
a method for tracking strategy use during test administration 
that may be of utility in assessing executive functions (i.e., 
Strategy Assessment; see Figure 6-3). 



Table 6-18 Descriptive Characteristics of PASS Processes

Planning Scale: Generation of strategies; Execution of plans; Anticipation of consequences; Impulse control; Organization of action; Planful responses to new situations; Self-control; Self-evaluation; Self-monitoring; Strategy use; Use of feedback

Simultaneous Scale: Integration of words into ideas; Seeing parts as a whole or group; Seeing several things at one time; Comprehension of word relationships; Understanding of inflection; Understanding verbal relationships and concepts; Working with spatial information

Attention Scale: Directed concentration; Focus on essential details; Focus on important information; Resistance to distraction; Selective attention; Sustained attention over time; Sustained effort

Successive Scale: Articulation of separate sounds into a consecutive series; Comprehension when word order drives meaning; Execution of movements in order; Perception of stimuli in sequence; Serial organization of spoken speech; Working with sounds in a specific order

Source: Adapted from Naglieri & Das, 1997.

Figure 6-3 Example protocol for a CAS Planning subtest (i.e., Planned Codes). Source: Reprinted with permission from Naglieri, 1999a.

Uses

According to the authors, the CAS allows examiners to determine (a) intraindividual strengths and weaknesses, (b) competence relative to peers, and (c) the relationship between cognitive functioning and achievement. This means that the CAS, like other intelligence tests, is intended to be used for diagnosis (i.e., learning disability, mental retardation, giftedness) and eligibility for services (e.g., state- or province-mandated special education criteria). However, because it is purported by its authors to be broader in scope than other more traditional IQ tests, the CAS is intended to be sensitive to conditions that typically elude other intelligence tests. These include ADHD, learning disability, traumatic brain injury, and giftedness (Gutentag et al., 1998; Naglieri, 1999a, 2001; Naglieri & Das, 1997; Naglieri & Edwards, 2004; Naglieri et al., 2003).

The way in which particular diagnoses are detected also 
differs. For example, in the traditional model, IQ tests 
are presumed to be insensitive to learning disability. Conse- 
quently, a discrepancy between IQ and achievement is 
required for diagnosis in most cases. This differs from 
the CAS, which is presumed to detect specific scale weak- 
nesses in learning disabled children. A discrepancy/consistency 
analysis is therefore required, presuming both discrepancies 
and similarities between the CAS and the achievement mea- 
sure used in diagnosis (Naglieri, 1999a; Naglieri & Edwards, 
2004). 
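As an illustration, the discrepancy/consistency idea can be sketched in a few lines of code. This is only a schematic: the scale names are real, but the critical values and scores below are hypothetical placeholders, and actual decisions require the significance values and procedures in the CAS manuals (Naglieri, 1999a; Naglieri & Edwards, 2004).

# Schematic illustration of the discrepancy/consistency idea described above.
# The critical values below are hypothetical placeholders, not the values
# published in the CAS manuals.

def discrepancy_consistency(pass_scores, achievement, crit_ipsative=10, crit_consistency=12):
    """pass_scores: the four PASS standard scores (M = 100, SD = 15);
    achievement: a standard score from an achievement measure on the same metric."""
    pass_mean = sum(pass_scores.values()) / len(pass_scores)
    flags = {}
    for scale, score in pass_scores.items():
        is_discrepant = (pass_mean - score) >= crit_ipsative         # weakness relative to the child's own mean
        is_consistent = abs(score - achievement) < crit_consistency  # similar to the low achievement score
        flags[scale] = is_discrepant and is_consistent
    return flags

example = {"Planning": 82, "Attention": 95, "Simultaneous": 101, "Successive": 99}
print(discrepancy_consistency(example, achievement=80))
# Only Planning is flagged: it is both discrepant from the child's own mean and
# consistent with the low achievement score.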

Unlike other intelligence tests, the CAS is also designed for use in planning specific interventions. To this end, it can be used in conjunction with a related training program, the PASS Remedial Program. Interested readers can consult the interpretive manual as well as the following publications for more information on this program: Das et al. (1994, 1995, 1997). This approach includes specific programs for training "planful" approaches to completing schoolwork, which may be of particular interest to neuropsychologists involved in rehabilitation of children with attention or executive deficits. Examples of interventions that can be used to facilitate learning for children with specific PASS weaknesses are also outlined in Naglieri (1999b).

Table 6-19 CAS Subtests by PASS Domain (Standard Battery)

Planning: Matching Numbers; Planned Codes; Planned Connections
Simultaneous: Nonverbal Matrices; Verbal-Spatial Relations; Figure Memory
Attention: Expressive Attention; Number Detection; Receptive Attention
Successive: Word Series; Sentence Repetition; Speech Rate; Sentence Questions

Note: Subtests included in the Basic Battery are shown in italics; all subtests are included in the Standard Battery.

Test Structure 

The CAS has two forms: the Standard Battery, and a shorter 
version, the Basic Battery. The Standard Battery includes three 
subtests for each of the four PASS domains, for a total of 12 
subtests. The Basic Battery involves two subtests per domain, 
for a total of eight subtests. See Table 6-19 for a summary of 
the subtests in the Standard and Basic Batteries (Basic Battery 
subtests are shown in italics). A detailed description of each 
subtest is provided in Table 6-20. 

The CAS is organized into three levels: the Full-Scale score, 
the four separate PASS scales, and the 12 separate subtests 
making up the PASS scales. However, it is not intended to 
have a hierarchical structure (Naglieri, 1999b). PASS scales are 
derived from multiple subtests that are each thought to mea- 
sure that particular PASS process. When these are combined, 
the result is a PASS scale with higher reliability than the indi- 
vidual subtests themselves. In other words, the different sub- 
tests of the Attention scale were not designed to measure 
different components of attention, but rather were chosen be- 
cause they were presumed to be good measures of the broader 
construct of attention. Interpretation is therefore at the PASS 
scale level. Naglieri (1999a) recommends that subtest-level in- 
terpretation occur only if there is a specific reason to do so 
(e.g., inconsistent strategy use associated with variable Plan- 
ning subtests). The authors note that the CAS Full-Scale score, 
which is a composite score based on the four PASS scales, is 
only provided for convenience to allow designations consis- 
tent with state regulations for special education, not because it 
is based on a hierarchical model of intelligence. 




ADMINISTRATION TIME 

The Basic Battery takes about 40 minutes; the Standard Bat- 
tery takes about 60 minutes. 

ADMINISTRATION 

Materials 

The CAS comes in an easily portable carrying case. It con- 
tains fewer materials than other intelligence batteries for 
children, which is an advantage when portability is an issue. 
Test materials are well designed. The manual is split-back, 
which allows it to stand independently to facilitate test ad- 
ministration, similar to the Wechsler scales manuals. The test 
also includes a tabbed, spiral-bound stimulus book. Scoring 
templates are bound separately in a tabbed, spiral-bound 
booklet, which makes them easy to use and less likely to 
get damaged or lost (note that there are 19 different scoring 
templates in all). A separate Interpretive Handbook provides 
information on test development, standardization, psycho- 
metric properties, and interpretation. Test protocols are well 
designed and easy to use. There are three Response Books 
(one for each age group and a separate booklet for responses 
to the Figure Memory subtest). There is also a Record Form 
for use by the examiner for entering scores and recording re- 
sponses, including a space to record strategy use during Plan- 
ning subtests (i.e., the Strategy Assessment Checklist). Two 
red pencils are also included for use in some of the paper- 
and-pencil subtests. 

General Administration 

See manual and Naglieri (1999a) for additional administra- 
tion guidelines. Items are administered according to two age 
classifications (5-7 years and 8-17 years). With few excep- 
tions, this simply means using different start-points for differ- 
ent ages. Each subtest is administered until the time limit runs 
out or the discontinue criterion is met (i.e., four consecutive 
failed responses). 

Unlike other tests that require strict adherence to instruc- 
tions, the CAS allows the examiner to provide a brief explana- 
tion if the child does not understand what is required after 
standard sample items and demonstration. The additional in- 
structions can include gestures or verbal explanations in any 
language. 

Strategy Assessment 

In addition to providing subtest scores, the Planning sub- 
tests allow the examiner to conduct a Strategy Assessment. 
This allows the examiner to record the strategy used by 
the child and to determine whether it was similar to that 
used by the standardization sample. Strategies are recorded 
in two ways. The "Observed Strategies" are recorded by the 
examiner during the test, and "Reported Strategies" are 



Table 6-20 Description of CAS Subtests 



Planning 
Matching Numbers 



Planned Codes 



Planned Connections 



Attention 
Expressive Attention 



Number Detection 



Receptive Attention 



Simultaneous 
Nonverbal Matrices 

Visual-Spatial Relations 
Figure Memory 



Successive 
Word Series 



Sentence Repetition 



This is similar to a paper-and-pencil cancellation task. The child is instructed to find two matching numbers 
among a line of numbers, under time constraints. Although this subtest is purported to measure Planning, it 
does require that the child employ visual scanning, visual attention, and processing speed. The total score is a 
ratio of number correct and time. 

The child must fill in codes (Xs or Os) that correspond to letters presented at the top of the page, under time 
constraints. The basic paradigm is similar to the Wechsler Digit Symbol and Coding tasks, but uses a less 
involved motor response (X or O only), is more complex (requires a code pair instead of a single symbol), and 
does not require that items be completed in a prescribed order. Instead, the child is explicitly told to "do it any 
way you want," which is presumed to elicit strategy use that would maximize the number of items completed 
before the time limit. The total score is a ratio of number correct and time. 

This subtest follows the basic Trail Making Test paradigm. Numbers are initially presented on a page in random 
array. The child is instructed to connect the numbers as quickly as possible, under time constraints. Longer 
number series are presented until ceiling is reached. As in Trails, the child is corrected when an error is made. 
For younger children (5-7 years), this is the extent of the test. Older children are given two additional items 
that require switching between number and letters (similar to part B of the Trail Making Test). The total score 
is the sum of the time taken (one score reflects both serial order and switching items). 



This is a Stroop task analog. For ages 5 to 7, the interference paradigm involves naming the size of pictured 
animals according to two features: relative size of the animal on the page or absolute size of the animal itself in 
real life. The interference trial shows conflict between the relative and absolute size (e.g., small picture of a 
dinosaur, big picture of a butterfly). Items are presented in rows, so that the child can "read" across the page. 
For older children (8-17 years), the subtest closely follows the Stroop paradigm (i.e., word reading, color 
reading, color- word interference trial). The subtest score is a ratio score of time and accuracy for the 
interference trial only. 

This is a classic cancellation task. The child is shown specified targets (three to six numbers, depending on age), 
and then must cross out these targets among an array of numbers, working row by row from left to right under 
time constraints. Older children are also presented targets in two different fonts to make the task more 
difficult. The total score is the number of hits (corrected for errors of commission), which is then transformed 
to a ratio score involving time taken. Errors of commission and hits cannot be scored separately. 

This is another paper-and-pencil task that requires the child to underline pairs of like objects or letters, working 
row by row from left to right under time constraints. Targets are those pairs that "have the same name" (i.e., 
may differ on appearance, but are from the same category, such as two kinds of trees, or upper- and lowercase 
letters). A ratio score of time and hits (corrected for errors of commission) is then computed; hits and 
errors of commission cannot be scored separately. 



This is a classic nonverbal pattern-matching task similar to Raven's Progressive Matrices. The total score is the 
number correct. 

This is a multiple-choice receptive language test that requires the examinee to pick out a specific picture 
corresponding to a sentence read aloud by the examiner. Sentences get increasingly grammatically and 
syntactically complex. 

The child is instructed to remember a geometric figure presented for 5 seconds; the figure must then be identified 
and traced from memory within a larger, more complicated geometric figure presented in the record book. 
Given the task requirements, this subtest likely involves aspects of visual-spatial pattern matching and 
visual memory. 

The child must repeat words in series of increasing length, to a maximum of nine words (e.g., "Cat — 

Boy — Book"). The words are all of equal difficulty and not linked semantically or logically. Given these task 

requirements, this subtest appears to measure aspects of verbal memory span. 

Unlike other sentence repetition subtests from other children's batteries (e.g., WPPSI-R, NEPSY, WRAML), this 
involves novel sentences that cannot be easily recalled based on prior knowledge or on filling in missing words 
based on logic. All sentences are constructed of color words only (e.g., "The white greened the yellow"). 

{continued) 



Cognitive Assessment System (CAS) 137 



Table 6-20 Description of CAS Subtests (continued) 



Speech Rate 



Sentence Questions 



Administered only to 5 to 7-year-olds, this subtest involves the rapid repetition of short word series under time 
constraints. The child is instructed to repeat a word sequence 10 times (e.g., "wall, car, girl"), with a time limit 
of 30 seconds. Errors are alterations in the word sequence, but not mispronunciations, distortions, or other 
articulation difficulties. If the child has not completed 10 correct sequences in the 30-second time limit, he or 
she is given a time of 31 for that item. The total score is the time taken, summed across items. This subtest 
requires oralmotor speed as well as verbal memory span. 

Administered only to 8 to 17-year-olds, this subtest involves the auditory comprehension of novel spoken 
sentences (e.g., "The green is yellowing. Who is yellowing?"). As such, it requires an understanding of grammar 
and semantic relationships independent of vocabulary level, and would be an interesting measure of auditory 
comprehension when vocabulary skills are suspected to be weak. 



Adapted from Naglieri & Das, 1997. 



queried by the examiner using prompts (e.g., "Tell me how 
you did these"). During each of the Planning subtests, the 
examiner may complete the Strategy Assessment Checklist, 
which allows a quantification of strategy use. See Figure 6-4 
for an example. 

In theory, the examiner is instructed to observe and record 
strategy use during the test. In practice, this is sometimes diffi- 
cult to do, given the nature of the task and the speed with 
which it is executed (e.g., cancellation-type tasks such as 
Matching Numbers). Practically speaking, although this should 
be possible for all Planning subtests, only Planned Codes al- 
lows the examiner to objectively record strategy use without 
depending to some extent on self- report (Haddad, 2004). 

SCORING

Scores

See manual, as well as additional scoring guidelines provided in Naglieri (1999a). Raw subtest scores are converted to scaled scores (M = 10, SD = 3), which are then summed to derive PASS domain standard scores and the Full-Scale standard score. The four PASS scales are equally weighted to form the Full-Scale score, as are each group of three subtests making up each PASS scale score. Percentile ranks and confidence intervals (90% and 95%) are provided in the manual for both the Standard and Basic Battery. Tables in the manual also provide confidence intervals based on the estimated true score. Age equivalents are also provided, as are cumulative percentages of differences between PASS scores and Full-Scale score, and differences between subtest scaled scores required for statistical significance. Several tables are also provided to compare CAS scores to predicted WJ-R scores. These may be of limited utility now that a newer version of the WJ has been published (WJ III).
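Confidence intervals based on the estimated true score follow the conventional approach described in Chapter 1. A minimal sketch, using illustrative reliability and score values rather than the manual's tabled figures:

# Minimal sketch of a confidence interval built around the estimated true score.
# Reliability, mean, and SD values are illustrative; use the CAS manual's tables
# in practice.
from math import sqrt

def true_score_interval(obtained, reliability, mean=100.0, sd=15.0, z=1.96):
    estimated_true = mean + reliability * (obtained - mean)   # regression toward the mean
    see = sd * sqrt(reliability * (1.0 - reliability))        # standard error of estimation
    return estimated_true - z * see, estimated_true + z * see

low, high = true_score_interval(obtained=85, reliability=0.95)
print(round(low, 1), round(high, 1))  # approximately 79.3 and 92.2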



Strategy Use

Strategy use is coded on the response forms according to a checklist (see Figure 6-5). Higher scores reflect better strategy use. However, the manual provides limited information on how to score observed strategies and does not provide information on the relative accuracy of observed versus reported strategy use for different ages.



Figure 6-4 Example protocol for a CAS Attention subtest (i.e., Number Detection). Source: Reprinted with permission from Naglieri, 1999a.



Figure 6-5 Sample CAS Strategy Assessment Checklist. Source: Reprinted with permission from Naglieri, 1999a.




Repeat Testing 

The manual allows standard score comparisons for use with 
repeat testing (i.e., the score range expected, given a specific 
baseline score). The ranges are based on a method that ac- 
counts for regression to the mean and internal reliability. 
Specifics are outlined in Anderson (1991, as cited in Naglieri 
& Das, 1997). Tables in the manual can be used to determine 
whether a change in CAS performance pre- and postinterven- 
tion is statistically significant. These scores are based on calcu- 
lation of the standard error of prediction (SEP; see Chapter 1), and are not based on test-retest data per se.
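The general logic behind these expected retest ranges can be sketched as follows. The reliability, mean, and SD values below are illustrative placeholders; the manual's tables, not this approximation, should be used for actual interpretation.

# Sketch of a predicted retest range using the standard error of prediction
# (see Chapter 1). Values are illustrative; use the CAS manual's tables in practice.
from math import sqrt

def expected_retest_range(baseline, reliability, mean=100.0, sd=15.0, z=1.96):
    predicted = mean + reliability * (baseline - mean)   # accounts for regression to the mean
    se_pred = sd * sqrt(1.0 - reliability ** 2)          # standard error of prediction
    return predicted - z * se_pred, predicted + z * se_pred

low, high = expected_retest_range(baseline=85, reliability=0.89)
print(round(low, 1), round(high, 1))  # approximately 73.2 and 100.1
# A retest score falling outside this interval would suggest change beyond what
# measurement error and regression to the mean can explain.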

DEMOGRAPHIC EFFECTS

Age

The CAS's sensitivity to age-related changes is supported by raw score increases with increasing age for all subtests, as shown in the manual (p. 51, Naglieri & Das, 1997). Age-related changes in the percentage of children using strategies on the Planning subtests are also shown in the manual (p. 82, Naglieri & Das, 1997). However, the authors note that not all strategies are of equivalent utility across age (e.g., saying codes aloud might be effective in the younger age group, but not in older children). Further, while only 25% of younger children use strategies on the CAS, almost all adolescents do so (e.g., Planned Connections). In terms of predictive validity, the amount of variance in achievement accounted for by the CAS increases with age (Naglieri & Rojahn, 2004).

Gender

Gender differences have been found on PASS scales, with girls obtaining slightly higher scores than boys on Planning and Attention subtests.

Ethnicity

The CAS is purported to be less culturally biased than other traditional IQ tests due to its lack of heavily language- or achievement-based subtests and its flexible instructions. During construction of the CAS, detailed bias analysis was conducted, including differential item functioning analysis to determine whether any subgroups performed differently on certain items of the test. In addition, the test was evaluated for predictive bias (i.e., whether the relationship of the test to a criterion measure, in this case achievement, would differ by subgroup). There were no significant differences across groups defined by gender, race, or Hispanic status with regard to regression slopes predicting WJ-R performance (Naglieri & Das, 1997).

In a comparison of the CAS and WISC-III in a group of children in special education placement, the WISC-III identified disproportionately more African American children as having mental retardation, consistent with criticisms that the use of the WISC-III is partly responsible for the overrepresentation of African American children in special education programs (Naglieri & Rojahn, 2001). Compared with Caucasian children, African American children had lower scores on the "achievement-loaded" WISC-III Verbal scales (Naglieri & Rojahn, 2001, p. 365). In contrast, African American children had higher scores on the Planning and Attention scales of the CAS. Importantly, use of the CAS would have led to a 30% decrease in the identification of children with mental retardation within this group, which has significant implications for using this measure in diagnosis in minority groups. Based on these and other results, Naglieri (2001) notes that minority children are more likely to be fairly evaluated using the CAS than with other intelligence tests that rely more on English language or academic skills.


NORMATIVE DATA 

Standardization Sample 

The standardization sample consists of 2200 children strati- 
fied according to age, gender, race, Hispanic origin, region, 
community setting, and parental education level based on 
1990 U.S. Census data. See Table 6-21 for sample char- 
acteristics. To allow ability-achievement comparisons, 1600 
children were administered the CAS and the WJ-R, with 
sampling again closely matching the 1990 U.S. Census. An 
additional 872 children were recruited for validity/reliability 
studies. 

Derivation of Scaled and Standard Scores 

To construct norms for the CAS, cumulative frequency dis- 
tributions from the raw score distributions from each 1-year age interval from the standardization sample were normalized to produce scaled scores. Scaled score progressions both within and across age bands were examined, and smoothing
was used to correct for any irregularities (Naglieri & Das, 
1997). 
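The normalization step can be illustrated schematically: raw scores within an age band are converted to cumulative proportions, passed through the inverse normal distribution, and rescaled to the subtest metric (M = 10, SD = 3). The data in the sketch below are invented, and the actual CAS derivation also involved smoothing within and across age bands.

# Illustration of normalizing raw scores within one age band to scaled scores
# (M = 10, SD = 3). Data are invented; the actual CAS norms also involved
# smoothing within and across age bands.
import numpy as np
from scipy.stats import norm, rankdata

def raw_to_scaled(raw_scores):
    ranks = rankdata(raw_scores)                      # mid-ranks handle tied raw scores
    proportions = (ranks - 0.5) / len(raw_scores)     # cumulative proportions in (0, 1)
    z = norm.ppf(proportions)                         # inverse-normal transformation
    return np.clip(np.round(10 + 3 * z), 1, 19)       # rescale and bound to the 1-19 range

sample_raws = np.array([4, 7, 9, 10, 12, 13, 15, 18, 21, 25])
print(raw_to_scaled(sample_raws))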

RELIABILITY 

Internal Reliability 

CAS Full-Scale reliability coefficients provided in the manual 
are high, ranging from .95 to .97 across age. Reliabilities for 
the PASS scales are also high (i.e., with average reliabilities of 
r = .88 for Planning, .88 for Attention, .93 for Simultaneous,
and .93 for Successive). Additionally, reliability coefficients 
were adequate to high for all CAS subtests, averaged across 
age (see Table 6-22). Of note, internal reliability estimates are 
based on split-half reliability (Spearman-Brown; Simultane- 
ous and Successive subtests except Speech Rate), test-retest 
reliability (timed tests such as Planning and Attention sub- 
tests, and Speech Rate), or on a formula for estimating the re- 
liability of linear combinations (composite scores). Reliability 
information on the Strategy Assessment is lacking. 
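For reference, the split-half and composite estimates mentioned above rest on standard formulas covered in Chapter 1. The brief sketch below uses illustrative values rather than figures from the CAS manual.

# Standard formulas behind the internal reliability estimates mentioned above.
# Illustrative values only; they are not taken from the CAS manual.

def spearman_brown(half_test_r):
    # Corrects a split-half correlation to the reliability of the full-length test.
    return 2 * half_test_r / (1 + half_test_r)

def composite_reliability(subtest_rs, mean_intercorrelation):
    # Reliability of an equally weighted sum of k standardized (equal-variance)
    # subtests with a common average intercorrelation among them.
    k = len(subtest_rs)
    error_variance = k - sum(subtest_rs)                       # sum of (1 - r_ii)
    composite_variance = k + k * (k - 1) * mean_intercorrelation
    return 1 - error_variance / composite_variance

print(round(spearman_brown(0.78), 2))                              # 0.88
print(round(composite_reliability([0.82, 0.80, 0.84], 0.50), 2))   # 0.91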






Table 6-21 Normative Sample Characteristics for the CAS

Number: 2200 (a)
Age: 5 to 17 years, 11 months
Geographic location: Midwest 25%; Northeast 19%; South 34%; West 23%
Sample type: National, stratified, random sample (b)
Parental education: Less than high school 20%; High school 29%; Some college 29%; Four or more years of college 23%
Community setting: Urban 75%; Rural 25%
Gender: Males 50%; Females 50%
Race: Black 14%; White 77%; Other 10%
Hispanic origin: Hispanic 11%; Non-Hispanic 89%
Screening: None indicated; sample includes special populations such as learning disabled (5%), speech/language impaired (1%), seriously emotionally disturbed (0.8%), mentally retarded (1%), and gifted (4%) children

(a) Based on nine age groupings representing one-year intervals for ages 5 to 10 years, 11 months, two-year intervals for ages 11 to 14 years, 11 months, and a three-year interval for ages 15 to 17 years, 11 months; 100 to 150 cases are represented in each age band.
(b) Based on 1990 U.S. Census data and collected between 1993 and 1996.

Source: Adapted from Naglieri & Das, 1997.



Standard Error of Measurement 

Average SEMs across age for the Full Scale and four compos- 
ites are within acceptable limits (i.e., 5.4 for the Full Scale and 
between 4.8 and 6.2 for the four composites). Average subtest 
SEMs across age range from 1.0 to 1.5 (Naglieri & Das, 1997). 
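As context, the SEM is conventionally computed as the SD multiplied by the square root of (1 - reliability). The illustrative check below uses the subtest scaled score metric (SD = 3), where reliabilities in the .75 to .90 range produce SEMs close to the 1.0 to 1.5 reported above; the reliability values here are placeholders, not the manual's figures.

# SEM = SD * sqrt(1 - reliability). Illustrative values on the subtest scaled
# score metric (SD = 3); results fall near the 1.0-1.5 range noted above.
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    return sd * sqrt(1.0 - reliability)

print(round(standard_error_of_measurement(3, 0.80), 2))  # 1.34
print(round(standard_error_of_measurement(3, 0.89), 2))  # 0.99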

Test-Retest Reliability 

Test-retest stability was assessed in 215 children tested twice 
over a median interval of 21 days (Naglieri & Das, 1997). For 
the standard battery, PASS scales demonstrated adequate to 
high reliability (r= .77 to .86 for individual PASS scales, and 
.89 for the Full-Scale score). At the subtest level, most CAS 
subtests showed adequate to high test-retest stability (Table 
6-23). Verbal-Spatial Relations showed marginal stability in 
every age range assessed, as did Figure Memory in the two 
younger age groups. There is no information on the stability 
of the Strategy Assessment in the manual.

Table 6-23 Test-Retest Stability Coefficients for CAS Subtests Across Age Groups

Ages 5-7
High (.80-.89): Planned Codes; Sentence Repetition
Adequate (.70-.79): Matching Numbers; Planned Connections; Nonverbal Matrices; Number Detection; Receptive Attention; Word Series; Speech Rate
Marginal (.60-.69): Verbal-Spatial Relations; Figure Memory; Expressive Attention

Ages 8-11
High (.80-.89): Planned Codes; Planned Connections; Nonverbal Matrices; Sentence Questions
Adequate (.70-.79): Expressive Attention; Number Detection; Word Series; Sentence Repetition
Marginal (.60-.69): Matching Numbers; Verbal-Spatial Relations; Figure Memory; Receptive Attention

Ages 12-17
High (.80-.89): Planned Codes; Sentence Repetition
Adequate (.70-.79): Matching Numbers; Planned Connections; Nonverbal Matrices; Figure Memory; Expressive Attention; Number Detection; Receptive Attention; Word Series; Sentence Questions
Marginal (.60-.69): Verbal-Spatial Relations

Note: Correlations were corrected for restriction in range.
Source: Adapted from Naglieri and Das, 1997.

Practice Effects 

The test-retest scores differed by approximately four standard 
score points on the Planning and Successive scales, by five 
points on the Simultaneous scale, and by six points on the At- 
tention scale and Full-Scale score (Naglieri & Das, 1997). This 
suggests a small practice effect on the Standard Battery, based 
on data presented in the manual for test-retest stability. For 
the Basic Battery, these values were three, two, four, and five 
points, respectively. 



VALIDITY 

Content 

As noted in Description, the test is based on a detailed theoret- 
ical foundation. Significant effort also went into the creation 
of the CAS subtests, including item selection, data analysis, test revision, pilot testing, national tryout, and standardization (Naglieri, 1999a). See also Demographic Effects, Ethnicity for information on bias analyses conducted in the development of the test.

Table 6-22 Internal Reliability Coefficients for CAS Subtests, Averaged Across Age

High (.80-.89): Planned Codes; Nonverbal Matrices; Verbal-Spatial Relations; Figure Memory; Expressive Attention; Word Series; Sentence Repetition; Speech Rate; Sentence Questions
Adequate (.70-.79): Matching Numbers; Planned Connections; Number Detection; Receptive Attention

Source: Adapted from Naglieri and Das, 1997.



Subtest Intercorrelations 

As reported in the Interpretive Manual, subtest intercorrelations within each PASS scale are high in the standardization sample (Naglieri & Das, 1997). For instance, on average, subtests for the Simultaneous domain are moderately to highly intercorrelated (r = .45-.53), as are the Successive subtests (r = .48-.65). Planning and Attention also have moderate to high subtest intercorrelations within each scale (r = .39-.57, Planning; r = .39-.44, Attention), but also appear to be related to subtests from other scales. In fact, in some cases, correlations between Attention and Planning subtests were higher than those within each scale (e.g., r = .51, Matching Numbers and Receptive Attention; r = .50, Planned Connections and Expressive Attention). This is not entirely surprising, given the similarities between some of the subtests in these two domains and the interrelated nature of planning/executive and attention skills in the real world. This is also supported by some factor-analytic studies (see later discussion).

Factor Structure and Scale Specificity 

Factor-analytic studies provided in the manual support the 
four-factor PASS model, along with a three-factor solution 
that combines the Planning and Attention scales. The authors 
report that their decision to retain the four-factor model rather 
than the three-factor solution was based on clinical, empir- 
ical, and theoretical grounds. Other authors have reported 



Table 6-23 Test-Retest Stability Coefficients for CAS Subtests Across Age Groups 



Magnitude of 
Coefficient 

Very high (.90+) 
High (.80-89) 



Adequate (.70-.79) 



Marginal (.60-.69) 



Low(<.59) 



Ages 5-7 

Planned Codes 
Sentence Repetition 



Matching Numbers 
Planned Connections 
Nonverbal Matrices 
Number Detection 
Receptive Attention 
Word Series 
Speech Rate 



Verbal-Spatial 

Relations 
Figure Memory 
Expressive Attention 



Ages 8-11 



Planned Codes 
Planned Connections 
Nonverbal Matrices 
Sentence Questions 

Expressive Attention 
Number Detection 
Word Series 
Sentence Repetition 



Matching Numbers 
Verbal-Spatial Relations 
Figure Memory 
Receptive Attention 



Ages 12-17 

Planned Codes 
Sentence Repetition 



Matching Numbers 
Planned Connections 
Nonverbal Matrices 
Figure Memory 
Expressive Attention 
Number Detection 
Receptive Attention 
Word Series 
Sentence Questions 

Verbal-Spatial Relations 



Note: Correlations were corrected for restriction in range. 
Source: Adapted from Naglieri and Das, 1997. 



Cognitive Assessment System (CAS) 141 



different findings. In a confirmatory factor analysis of the 
standardization data, Kranzler and Keith (1999) and Keith 
et al. (2001) found only marginal fit for the four-factor model, 
and Planning and Attention were virtually indistinguishable 
in the factor solution. The best fit was a hierarchical solution 
with four first-order factors corresponding to the PASS pro- 
cesses, a second-order Attention-Planning factor, and a large 
third-order "g" factor. Kranzler and Keith (1999) also found 
better fit statistics for other ability tests such as the WISC-III 
and WJ-R. Overall, the CAS was felt to lack structural fidelity 
(i.e., correspondence between theory and test structure), a 
necessary but not sufficient condition for construct validity 
(Keith & Kranzler, 1999). Naglieri (1999a) countered that 
these authors overinterpreted a minor difference in factor- 
analytic results, overemphasized the role of factor analysis in 
determining the structure of human abilities, and underval- 
ued the need to measure validity in a broad manner. Interest- 
ingly, although Naglieri (1999a) asserts that the CAS does not 
have a hierarchical structure, which Keith et al. dispute on the 
basis of factor analysis, Luria's theory was hierarchical in na- 
ture. Primary, secondary, and tertiary zones were found within 
each functional unit, and the tertiary zones of the frontal 
lobes were considered a "superstructure" above all other parts 
of the cortex (Luria, 1973, p. 89). 

Keith et al. also conducted a joint confirmatory factor 
analysis of the CAS and WJ III (Keith et al, 2001; Kranzler 
et al., 2000), which led them to conclude that PASS scales ac- 
tually measure other abilities than those specified by 
Naglieri — namely, processing speed for Attention and Plan- 
ning, verbal memory span for the Successive scale, and fluid intelligence and broad visualization for the Simultaneous scale.
Note that Haddad (2004) concluded that one of the Planning 
subtests (Planned Codes) was not a simple speed measure; 
over 50% of children performed better on Planned Codes 
when they were not told to complete the test items in sequen- 
tial (i.e., nonstrategic) order. 

Keith et al. (2001) concluded that only the Successive scale 
measured g, and that it was the only scale with enough unique 
variance to be interpreted separately from the rest of the test. 
They concluded that, contrary to the authors' claims, the CAS 
measures the same psychometric g as other tests of intelligence 
and that the PASS theory was likely an inadequate model of 
human cognitive abilities. In contrast, Naglieri (1999a) 
stressed that the CAS has ample validity based on predictive 
validity and relevance to treatment and that when specificity 
was calculated for the PASS scales, all four scales were found to 
have sufficient unique variance to be interpreted on their own. 

Correlations With Other Neuropsychological Tests 

CAS subtests have moderate criterion validity when compared 
with other neuropsychological tests. In a small sample of chil- 
dren with TBI described in the manual (N= 22), the Planning 
scale was related to Trail Making A (r= -.40) but not Trail Mak- 
ing B, or to the Tower of London, a test measuring planning, in- 



hibition, and working memory. The implication is that the 
Planning scale is more a measure of speed than of planning (see 
later discussion). The authors explain the latter finding by stat- 
ing that the Tower test has a stronger successive than planning 
aspect, consistent with the sequential nature of the task (Das & 
Naglieri, 1997). In this same group, the Simultaneous scale was 
related to the Token Test and Embedded Figures (r= .45 and 
.39, respectively). The Attention scale was related to Trail Mak- 
ing A and Stroop, and the Successive scale was related to Trail 
Making A, the Tower of London, and the Sentence Repetition 
Test. In another study, Simultaneous and Successive scales were 
related to facial recognition (Kroeger et al, 2001). 

Correlations With IQ 

Based on data presented in the manual, correlations between 
CAS and Wechsler IQ tests are for the most part high, particu- 
larly between the CAS Full-Scale score and FSIQ scores from 
Wechsler scales (r= .66-.71, WISC-III; r = .60, WPPSI-R). Of 
the four PASS scales, Simultaneous and Successive appear to 
measure an aspect of cognitive ability most related to the tra- 
ditional notion of FSIQ, with average correlations between 
WISC-III FSIQ and Simultaneous/Successive scales in the 
moderate to high range (i.e., .36-.64, with most > .59). In contrast, correlations between WISC-III FSIQ and Planning/Attention domains are moderate (i.e., .30-.48). Similar
effects occur with the WPPSI-R, with high correlations be- 
tween WPPSI-R FSIQ and CAS Simultaneous/Successive
scores (r= .73 and .67, respectively) but only modest correla- 
tions between WPPSI-R FSIQ and CAS Planning/Attention 
(r= .12 and .22; Naglieri & Das, 1997). 

At the PASS or factor index level, Planning correlates most 
highly with WISC-III Processing Speed (r= .70), PIQ (r= .53), 
and Perceptual Organization (r= .45, in a small sample of reg- 
ular education students described in the manual; N= 54). The 
Simultaneous scale correlates moderately to highly with all the 
WISC-III factor scores (r = .35-.69). Successive processing also appears to correlate strongly with all WISC-III index scores (r = .58-.64), except Processing Speed, where the correlation
fails to reach significance (r= .32). CAS Attention correlates 
most highly with Processing Speed (r= .54) and PIQ (r= .35). 
These results seem to support the view of a substantial pro- 
cessing speed component to the CAS Planning and Attention 
scales (but see Haddad, 2004). 

In comparisons using the WPPSI-R in a small sample of 
5- to 7-year-olds (N= 33; see manual), Simultaneous pro- 
cessing was most related to PIQ (r= .76) and FSIQ (r= .73), 
but the correlation with VIQ was also substantial (r = .53).
Successive processing was related to both VIQ (r= .57) and 
PIQ (r= .52). Neither Attention nor Planning was related to 
any WPPSI-R indices, as would be expected based on the the- 
ories underlying the CAS. 

When children are administered the CAS and other tests of intelligence, there are slight Full-Scale score differences (see test manual). In general, CAS scores are higher. For instance, there is a one-point difference between CAS and WISC-III Full-Scale scores in normal children (N = 54), a two-point difference in children with learning disabilities (N = 81), and a five-point difference in children with mental handicap (N = 84; CAS = 65.9, WISC-III = 60.7). The CAS Full-Scale score was seven points higher in younger children administered the WPPSI-R (N = 33).
In a separate study, children in special education placements 
obtained Full-Scale CAS scores that were approximately six 
points higher than WISC-III Full-Scale scores (Naglieri & Ro- 
jahn, 2001). Score differences using the WISC-IV and WPPSI- 
III are likely to be smaller, given the more recent standardization 
samples for these measures. 

Correlations With Achievement 

Like all IQ tests, the CAS is moderately to highly related to 
achievement. In fact, in a recent review, Naglieri and Born- 
stein (2003) noted that CAS correlations with achievement 
were some of the highest of all the intelligence batteries, along 
with the K-ABC and WJ III (i.e., r= .70-.74). CAS associa- 
tions with achievement are reviewed in the Interpretive Man- 
ual and in Naglieri and Edwards (2004). 

Unlike other IQ tests that are minimally sensitive to 
learning difficulties (hence the need for ability-achievement 
discrepancy analyses), specific patterns of CAS performance 
across PASS subtests are claimed to be related to certain 
kinds of academic difficulties. Specific profile differences on 
the CAS are also reported for different subgroups of chil- 
dren with academic difficulties. For instance, children with 
reading difficulties have lower Successive scores than chil- 
dren in regular education and children with ADHD, and 
lower Simultaneous scores than children in regular educa- 
tion (Naglieri & Edwards, 2004). The manual also indicates 
that PASS scales are associated with specific weaknesses on 
the WJ-R (e.g., low Planning scores are associated with low 
Calculation, Dictation, and Basic Writing on the WJ-R). The 
CAS authors also compared the sensitivity of the CAS to that 
of the WISC-III in detecting learning disability using the 
WJ-R. Both scales were moderately correlated with the Skills 
cluster of the WJ-R in regular students. The authors note 
that this occurred despite the fact that the CAS contains no 
items relating to achievement (i.e., learned information such 
as Vocabulary, Information, and Arithmetic), whereas the WISC-III does.

Clinical Studies 

The manual provides CAS data on several clinical groups as 
evidence of further construct validity (Naglieri & Das, 1997). 
Children with ADHD (N= 66) were found to have relative 
weaknesses on the Planning and Attention scales (M = 88.4, SD = 10.0, and M = 92.1, SD = 11.9, respectively), which the
authors argued was consistent with the conceptualization of 
ADHD as a disorder of inhibition processes. In a subsequent 
study, lower Planning scores were also found for ADHD chil- 



dren compared with both normal children and children with 
anxiety/depression (Naglieri et al., 2003). The manual also 
indicates that children with reading disability identified using 
WISC-III/WJ-R discrepancies have relative weaknesses in the 
Successive scale (N= 24), which the authors interpreted as 
consistent with the view of reading disability as being related 
to phonological processing deficits. This has also been shown 
in a subsequent study, along with differences in Simultaneous 
scores (Naglieri & Edwards, 2004). The manual also reports 
that children with intellectual deficits (N= 86) had consis- 
tently depressed PASS scale scores consistent with presumed 
global deficits, whereas gifted children had Full-Scale scores 
that were, on average, one standard deviation above the mean, 
with slightly different elevations across domains (Das & 
Naglieri, 1997). In published studies, head-injured children 
have shown a relative weakness in Planning compared with 
non-head-injured children (Gutentag et al., 1998). Children 
with written expression disabilities also show specific CAS 
profiles (Johnson et al., 2003). 

In the educational setting, children with low Planning 
scores benefit from intervention to increase planning skills, 
while untreated controls show no improvement in math 
computation (Naglieri & Johnson, 2000). In another study, 
children with a selective Planning weakness benefited from 
a cognitive strategy instruction program in increasing their 
posttest reading comprehension scores (Haddad et al., 2003). 
However, not all studies have found a positive association be- 
tween CAS profiles and responsivity to intervention. Kroes- 
bergen et al. (2003) found that an intervention to increase 
math skills did not differentially improve scores of children 
with mathematical learning difficulties, despite evidence that 
children with mathematically based learning difficulties 
could be grouped into three subgroups based on CAS perfor- 
mance. 

Children with higher strategy use during Planning tasks 
appear to earn higher Planning scores than those who do not 
use strategies (Naglieri & Das, 1997). This serves as some evi- 
dence of the tests' construct validity. Planning subtests, which 
may tap creativity, may also be useful in the assessment of 
giftedness (Naglieri, 2001). There is also some indication that 
the CAS may have some utility in training environments and 
that its predictive validity may extend beyond achievement, 
based on research involving aviation simulations (Fein & Day, 
2004). This may be because the CAS is more highly loaded 
with attention and executive tasks than most standard IQ 
tests. 

COMMENT 

From a test development perspective, the CAS appears to be a 
well-designed, psychometrically sound instrument with a large 
standardization sample and good reliability. The manual is 
well written, thorough, and comprehensive, and contains 
detailed information on the psychometric properties of the 
test that is often not provided in other test manuals 






(e.g., psychometrics of the short form or Basic Battery). 
The test is also based on modern theories of ability and on 
over 20 years of research and development by the test au- 
thors. Other features may make it more useable across popu- 
lations and cultures (i.e., limited reliance on language- or 
achievement-type content, flexible instructions to facilitate 
comprehension, and evidence of lack of bias in minorities). It 
also includes interesting and novel subtests, including a num- 
ber of executive and attention subtests, as well as subtests 
modeled after classic neuropsychological tests such as the 
Stroop and Trail Making Test. Although these deserve further 
study, the test is also associated with intervention programs 
based on CAS strengths and weaknesses. A small number of 
research studies have shown that PASS profiles may be used 
to determine responsivity to training in cognitive strategies 
to improve planning skills, which should be of considerable 
practical and clinical utility in making recommendations for 
individual children, as well as in designing interventions to 
improve executive skills. 

On the other hand, there continue to be significant con- 
cerns about its factor structure and the meaning of the fac- 
tors. Tables to enable CAS-achievement comparisons are also 
outdated because of the arrival of the WJ III. Some subtests 
demonstrate marginal test-retest reliabilities (see Table 6-23). 
The psychometric properties of the Strategy Assessment are 
mostly unknown; these scores should therefore not be used in 
diagnostic decision making. Users should also note that Planned 
Codes is the only Planning subtest on which strategy use can be 
objectively observed by the examiner rather than inferred from 
verbal report (Haddad, 2004). 

In addition, some of the features do not allow a process- 
based analysis of performance, despite the test's foundation in 
Lurian theories. For instance, there is no differentiation be- 
tween accuracy (pass/fail) and speed (time) on several Plan- 
ning and Attention subtests, and Attention subtests do not 
allow the differentiation of inattention (missed targets) from 
response inhibition (errors of commission). Although allow- 
ing this level of analysis was never a goal of the CAS, in the 
neuropsychological context, knowledge of the patient's ability 
on these separate processes is helpful in diagnosis and treat- 
ment planning. Granted, many other intelligence tests such as 
the Wechsler scales also use similar omnibus scores, and the 
reality is that parsing performance in this way sometimes 
leads to less-than-stellar reliabilities for some scores (e.g., see 
CPT-II; D-KEFS). However, because some CAS subtests are 
modeled after neuropsychological tests that commonly allow 
these distinctions (e.g., Stroop), the clinician must find alternate 
means to assess these processes in detail. 
In this way, the CAS suffers somewhat from the difficulties 
inherent in translating Luria's methods to a clinically applied 
instrument (e.g., omnibus scores that reflect multiple neu- 
ropsychological processes are not consistent with Lurian prin- 
ciples), as have other instruments based on Lurian theory 
(e.g., Adams, 1980; Spiers, 1981). However, it is important to 
state that, overall, the CAS is a well-constructed, well-researched, 



theory-based measure of intelligence with several features of 
particular interest to neuropsychologists. 



REFERENCES 

Adams, K. M. (1980). In search of Luria's battery: A false start. Jour- 
nal of Consulting and Clinical Psychology, 48, 511-516. 

Atkinson, L. (1991). Three standard errors of measurement and the 
Wechsler Memory Scale-Revised. Psychological Assessment, 3(1), 
136-138. 

Das, J. P., Carlson, J., Davidon, M. B., & Longe, K. (1997). PREP: PASS 
remedial program. Seattle, WA: Hogrefe. 

Das, J. P., Mishra, R. K., & Pool, J. (1995). An experiment on cognitive 
remediation of word-reading difficulty. Journal of Learning Dis- 
abilities, 28, 66-79. 

Das, J. P., Naglieri, J. A., & Kirby, J.R. (1994). Assessment of cognitive 
processes: The PASS theory of intelligence. Boston: Allyn & Bacon. 

Fein, E. C., & Day, E. A. (2004). The PASS theory of intelligence and 
the acquisition of a complex skill: A criterion-related validation 
study of Cognitive Assessment System scores. Personality and In- 
dividual Differences, 37, 1123-1136. 

Gutentag, S., Naglieri, J. A., & Yeates, K. O. (1998). Performance of 
children with traumatic brain injury on the Cognitive Assessment 
System. Assessment, 5, 263-272. 

Haddad, F. A. (2004). Planning versus speed: An experimental 
examination of what planned codes of the Cognitive Assess- 
ment System measures. Archives of Clinical Neuropsychology, 19, 
313-317. 

Haddad, F. A., Garcia, Y. E., Naglieri, J. A., Grimditch, M., McAn- 
drews, A., & Eubanks, J. (2003). Planning facilitation and reading 
comprehension: Instructional relevance and the PASS theory. 
Journal of Psychoeducational Assessment, 21, 282-289. 

Johnson, J. A., Bardos, A. N., & Tayebi, K. A. (2003). Discriminant va- 
lidity of the Cognitive Assessment System for students with writ- 
ten expression disabilities. Journal of Psychoeducational Assessment, 
21, 180-195. 

Keith, T. Z., & Kranzler, J. H. (1999). The absence of structural fi- 
delity precludes construct validity: Rejoinder to Naglieri on what 
the Cognitive Assessment System does and does not measure. 
School Psychology Review, 28(2), 303-321. 

Keith, T. Z., Kranzler, J. H., & Flanagan, D. P. (2001). What does the 
Cognitive Assessment System (CAS) measure? Joint confirmatory 
factor analysis of the CAS and the Woodcock-Johnson Tests of Cog- 
nitive Ability (3rd Edition). School Psychology Review, 30(1), 89-119. 

Kranzler, J. H., & Keith, T. Z. (1999). Independent confirmatory factor 
analysis of the Cognitive Assessment System (CAS): What does the 
CAS measure? School Psychology Review, 28(1), 117-144. 

Kranzler, J. H., Keith, T. Z., & Flanagan, D. P. (2000). Independent ex- 
amination of the factor structure of the Cognitive Assessment Sys- 
tem (CAS): Further evidence challenging the construct validity of 
the CAS. Journal of Psychoeducational Assessment, 18(2), 143-159. 

Kroeger, T. L., Rojahn, J., & Naglieri, J. A. (2001). Role of planning, 
attention, and simultaneous and successive cognitive processing 
in facial recognition in adults with mental retardation. American 
Journal on Mental Retardation, 106(2), 151-161. 

Kroesbergen, E. H., Van Luit, J. E. H., & Naglieri, J. A. (2003). Mathe- 
matical learning difficulties and PASS cognitive processes. Journal 
of Learning Disabilities, 36(6), 574-582. 






Luria, A. R. (1973). The working brain: An introduction to neuropsy- 
chology. New York: Basic Books. 

Naglieri, J. A. (1996). Cognitive assessment: Nontraditional intelli- 
gence tests. In T. Fagan & P. Warden (Eds.), Encyclopedia of school 
psychology (pp. 69-70). Westport, CT: Greenwood Press. 

Naglieri, J. A. (1997). IQ: Knowns and unknowns, hits and misses. 
American Psychologist, 52(1), 75-76. 

Naglieri, J. A. (1999a). Essentials of CAS assessment. New York: John 
Wiley & Sons. 

Naglieri, J. A. (1999b). How valid is PASS theory and CAS? School 
Psychology Review, 28(1), 145-162. 

Naglieri, J. A. (2001). Understanding intelligence, giftedness and cre- 
ativity using the PASS theory. Roeper Review, 23(3), 151-157. 

Naglieri, J. A., & Bornstein, B. T. (2003). Intelligence and achieve- 
ment: Just how correlated are they? Journal of Psychoeducational 
Assessment, 21(3), 244-260. 

Naglieri, J. A., & Das, J. P. (1997). Cognitive Assessment System inter- 
pretive handbook. Itasca, IL: Riverside Publishing. 

Naglieri, J. A., & Edwards, G. H. (2004). Assessment of children with 
attention and reading difficulties using the PASS theory and Cog- 
nitive Assessment System. Journal of Psychoeducational Assess- 
ment, 22, 93-105. 



Naglieri, J. A., Goldstein, S., Iseman, J. S., & Schwebach, A. (2003). 
Performance of children with attention deficit hyperactivity dis- 
order and anxiety/depression on the WISC-III and Cognitive As- 
sessment System (CAS). Journal of Psychoeducational Assessment, 
21, 32-42. 

Naglieri, J. A., & Gottling, S. H. (1997). Mathematics instruction and 
PASS cognitive processes: An intervention study. Journal of Learn- 
ing Disabilities, 30(5), 513-520. 

Naglieri, J. A., & Johnson, D. (2000). Effectiveness of a cognitive strat- 
egy intervention in improving arithmetic computation based on 
the PASS theory. Journal of Learning Disabilities, 33(6), 591-597. 

Naglieri, J. A., & Rojahn, J. (2001). Gender differences in planning, at- 
tention, simultaneous and successive (PASS) cognitive processes 
and achievement. Journal of Educational Psychology, 93(2), 430-437. 

Naglieri, J. A., & Rojahn, J. (2004). Construct validity of the PASS 
theory and CAS: Correlations with achievement. Journal of Edu- 
cational Psychology, 96(1), 174-181. 

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd 
ed.). New York: McGraw-Hill, Inc. 

Spiers, P. A. (1981). Have they come to praise Luria or to bury him? 
The Luria-Nebraska neuropsychological battery controversy. 
Journal of Consulting and Clinical Psychology, 49, 331-341. 



Dementia Rating Scale-2 (DRS-2) 



PURPOSE 

The purpose of this scale is to provide an index of cognitive 
function in people with known or suspected dementia. 

SOURCE 

The test (including Professional Manual, 50 scoring booklets 
and profile forms, and one set of stimulus cards) can be or- 
dered from Psychological Assessment Resources, Inc., P.O. 
Box 998, Odessa, FL (www.parinc.com), at a cost of $229 US. 
An alternate form is also available. 

AGE RANGE 

The test is intended for individuals aged 55 years and older. 



DESCRIPTION 

Some patients, such as the elderly with profound cognitive 
impairments, may generate very few responses on such stan- 
dard tests as the Wechsler Adult Intelligence Scale or the 
Wechsler Memory Scale, making it difficult to assess the mag- 
nitude of their mental impairments. The Dementia Rating 
Scale (DRS) was developed to quantify the mental status of 
such patients (Coblentz et al., 1973; Mattis, 1976, 1988). A 
new version, the Dementia Rating Scale-2 (DRS-2), has re- 
cently been published (Jurica et al., 2001). Although the 
manual has been updated, the scoring booklet improved, and 
new norms provided, the test is the same. In addition, an 
alternate form (DRS-2: Alternate Form), consisting of new 
item content, has been provided (Schmidt & Mattis, 2004). 

The items on the test are similar to those employed by neu- 
rologists in bedside mental status examinations. They are 
arranged hierarchically, from difficult to easier items, so that 
adequate performance on an initial item allows the examiner 
to discontinue testing within that section and to assume that 
credit can be given for adequate performance on the subse- 
quent tasks. A global measure of dementia severity is derived 
from subscales of specific cognitive capacities. The subscales 
include measures of attention (e.g., digit span, detecting A's), 
initiation and perseveration (e.g., performing alternating 
movements, copying repeated patterns, semantic fluency), 
construction (e.g., copying designs, writing name), conceptu- 
alization (e.g., similarities), and verbal and nonverbal short- 
term memory (e.g., sentence recall, design recognition; see 
Table 6-24). 



ADMINISTRATION 

See Source. Briefly, the examiner asks questions or gives instruc- 
tions (e.g. "In what way are an apple and a banana alike?") and 
records responses. The DRS subtests are presented in a fixed or- 
der generally corresponding to the Attention (ATT), Initiation/ 
Perseveration (I/P), Construction (CONST), Conceptualiza- 
tion (CONCEPT), and Memory (MEM) subscales; however, 
not all Attention tasks are presented in a sequence because 
some also serve as time-filling distracters between presentations 
of memory tasks. 

Generally, if the first one or two tasks in a subscale are 
performed well, subsequent (easier) tasks are credited with a 
correct performance, and the examiner proceeds to the next 
subscale. 
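
The following sketch is illustrative only and is not taken from the DRS-2 manual; the task list, point values, and the pass criterion used to trigger the discontinue rule are hypothetical placeholders meant to show the logic described above in compact form.

    # Illustrative sketch of the DRS-style discontinue rule described above.
    # Task names, point values, and the pass criterion are hypothetical.

    def score_subscale(tasks, administer):
        """Score one subscale whose tasks are ordered from hardest to easiest.

        tasks: list of (task_name, max_points) tuples.
        administer: function that presents a task and returns points earned.
        """
        total = 0
        for i, (name, max_points) in enumerate(tasks):
            earned = administer(name)
            total += earned
            # Hypothetical criterion: a perfect score on one of the first
            # two tasks counts as "performed well".
            if i < 2 and earned == max_points:
                # Credit the remaining, easier tasks and stop testing here.
                total += sum(points for _, points in tasks[i + 1:])
                break
        return total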






Table 6-24 Subscales and Subtests of the DRS-2 

Subscale (Maximum Points)        Subtests 

Attention (37)                   Digit Span; Two Successive Commands; Single Command; Imitation; 
                                 Counting Distraction 1 and 2; Verbal Recognition — Presentation; 
                                 Visual Matching 

Initiation/Perseveration (37)    Complex Verbal Initiation/Perseveration; Consonant Perseveration; 
                                 Vowel Perseveration; Double Alternating Movements; Alternate Tapping; 
                                 Graphomotor Design 

Construction (6)                 Construction Designs 

Conceptualization (39)           Identities and Oddities; Similarities; Priming Inductive Reasoning; 
                                 Differences; Similarities-Multiple Choice 

Memory (25)                      Orientation; Verbal Recall — Reading; Verbal Recall — Sentence Initiation; 
                                 Verbal Recognition; Visual Recognition 

ADMINISTRATION TIME 

The time required is approximately 10 to 15 minutes for nor- 
mal elderly subjects. With a demented patient, administration 
may take 30 to 45 minutes to complete. 



SCORING 

See Source. One point is given for each item performed cor- 
rectly. Maximum score is 144. 
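As a simple check on hand scoring, the subscale maxima shown in Table 6-24 sum to this ceiling: 37 + 37 + 6 + 39 + 25 = 144. 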



DEMOGRAPHIC EFFECTS 

Age 

Age affects performance, with younger adults obtaining 
higher scores than older ones (e.g., Bank et al., 2000; Lucas 
et al., 1998; Rilling et al., 2005; Smith et al., 1994). 

Education/IQ 

Numerous authors (Bank et al., 2000; Chan et al., 2001, 2003; 
Freidl et al., 1996, 1997; Kantarci et al., 2002; Lucas et al., 
1998; Marcopulos & McLain, 2003; Marcopulos et al., 1997; 
Monsch et al., 1995; Rilling et al., 2005; Schmidt et al., 1994; 



Smith et al., 1994) have reported that performance varies not 
only by age, but also by education and IQ. Accordingly, nor- 
mative data broken down by age and education are preferred. 



Gender/Race and Culture 

Gender (Bank et al., 2000; Chan et al., 2001; Lucas et al., 1998; 
Monsch et al., 1995; Rilling et al., 2005; Schmidt et al., 1994) 
has little impact on test scores. African Americans tend to ob- 
tain lower scores than Caucasians (Rilling et al., 2005), sug- 
gesting the need for ethnicity-specific norms. Cultural factors 
also appear to have an impact (see Validity). 



NORMATIVE DATA 

Mattis (1976; cited in Montgomery, 1982) initially recom- 
mended a cutoff score of 137 for identifying impairment. 
However, this cutoff is of limited value since the sample sizes 
on which the score is based (Coblentz et al., 1973) were ex- 
tremely small (i.e., 20 brain-damaged subjects, 11 normals). 
Different cutoff scores (e.g., DRS < 123) have been provided 
by others (e.g., Montgomery, 1982); however, the clinical util- 
ity of these scores is also limited, because sample sizes were 
small and the subjects were relatively well educated. Because 






studies have demonstrated significant relationships between 
age as well as education and DRS performance (see previous 
discussion as well as DRS-2 manual), simple cutoff scores are 
considered inappropriate. Recent studies have provided data 
stratified by both age and education. 

Schmidt et al. (1994) collected data from 1001 Austrian 
subjects, aged 50 to 80 years, who were free of neuropsychi- 
atric or severe general diseases. The mean age of participants 
was 66.3 years (range 50-80), with a mean education of 10.8 
years (SD = 2.3). Although the sample size is large, the 
data may not be suitable for use with North American sam- 
ples, given the cultural and language differences and the lack 
of participants over the age of 80 years. 

Lucas et al. (1998; see also DRS-2 manual) present norms 
for 623 community-dwelling adults over the age of 55 years 
(age, M = 79.2 years, SD = 7.6). The participants were pre- 
dominantly Caucasian with a relatively high level of educa- 
tion (M = 13.1 years, SD = 7.6) and were reported by their 
physician to have no active medical disorder with potential to 
affect cognition. These data were collected as part of Mayo's 
Older Americans Normative Studies (MOANS) and afford 
the clinician the advantage of being able to compare DRS 
scores to patient performance on other tests having the 
MOANS norms (e.g., WAIS-R, WMS-R, RAVLT) (Lucas et 
al., 1998). The DRS data are shown in Tables 6-25 through 
6-33. Age-corrected MOANS scaled scores (M = 10, SD = 3) 
are presented in the left-hand column of each table, while 
corresponding percentile ranks are given in the right-hand 
column. To further adjust for the effects of education, a stan- 
dard linear regression was used to derive age- and educa- 
tion-corrected MOANS Total scaled scores. This formula is 



presented in Table 6-34. Efforts to provide adjustment for 
education at the subtest level resulted in scaling problems 
due to the highly skewed nature of some subtests. Therefore, 
education corrections are applied only to the total score. 
Note, however, that the underrepresentation of participants 
with limited educational backgrounds (i.e., <8 years) cau- 
tions against the application of this formula in such indi- 
viduals. 
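
As a rough guide (a normal-curve approximation, not taken from the source tables), a MOANS scaled score of 7 lies one standard deviation below the normative mean of 10 (SD = 3) and therefore corresponds to approximately the 16th percentile; the percentile ranges printed in the published tables give the empirically derived values and should be preferred in practice. 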

Although the MOANS (Lucas et al., 1998) data represent a 
very important contribution to the normative base, the sam- 
ple consists largely of Caucasian adults living in economically 
stable regions of the United States. As a result, their norms 
will likely overestimate cognitive impairment in those with 
limited education and different cultural or health-related ex- 
periences (Bank et al., 2000; Yochim et al., 2003). A number of 
authors (Bank et al., 2000; Vangel & Lichtenberg, 1995) have 
attempted to address the issue by increasing African American 
representation in the normative samples. Recently, Rilling 
et al. (2005) provided age- and education-adjusted normative 
data based on 307 African American community-dwelling 
participants from the MOAANS (Mayo African American 
Normative Studies) project in Jacksonville, Florida. Partici- 
pants were predominantly female (75%), ranged in age from 
56 to 94 years (M = 69.6 years, SD = 6.87), and varied in edu- 
cation from 0 to 20 years of formal education (M = 12.2 years, 
SD = 3.48). They were screened to exclude those with active 
neurological, psychiatric, or other conditions that might af- 
fect cognition. Age-corrected MOAANS scaled scores and 
percentile ranks for the DRS total and subtest scores are pre- 
sented in Tables 6-35 through 6-41. The computational for- 
mula to calculate age- and education-corrected MOAANS 



Table 6-25 MOANS Scaled Scores for Persons Under Age 69 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 






Table 6-26 MOANS Scaled Scores for Persons Aged 69-71 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 



scaled scores (MSS A&E ) for the DRS total score is shown in 
Table 6-42. 

The MOAANS normative data are similar to normative 
estimates provided by others (Banks et al., 2000; Marcopulos 
& McLain, 2003), based on mixed-ethnic samples with 
significant proportions of older African American partici- 



pants. Of note, norms for the DRS were developed in con- 
junction with norms for other tests (e.g., BNT, RAVLT, JLO, 
verbal fluency, MAE Token Test; see descriptions elsewhere in 
this volume), allowing the clinician to compare an individ- 
ual's performance across tasks included in the MOANS/ 
MOAANS battery. 



Table 6-27 MOANS Scaled Scores for Persons Aged 72-74 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 



Table 6-28 MOANS Scaled Scores for Persons Aged 75-77 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 



RELIABILITY 

Internal Consistency 

Gardner et al. (1981) reported a split-half reliability of 
.90 for the total scale in a sample of nursing home patients 
with neurological disorders. Vitaliano, Breen, Russo, et al. 



(1984) examined internal consistency of the DRS in a 
small sample of individuals with probable AD. The alpha 
coefficients were adequate to high for the subscales: Atten- 
tion (.95), Initiation/Perseveration (.87), Conceptualization 
(.95), and Memory (.75). Smith et al. (1994) found mixed 
support for the reliability of DRS scales in a sample of 274 



Table 6-29 MOANS Scaled Scores for Persons Aged 78-80 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 






Table 6-30 MOANS Scaled Scores for Persons Aged 81-83 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 



older patients with cognitive impairment. Internal con- 
sistency (Cronbach's alpha) was greater than .70 for Construc- 
tion, Conceptualization, Memory, and total score, greater 
than .65 for Attention, and only about .45 for Initiation and 
Perseveration. Interpretation of the Initiation and Perse- 
veration scale as measuring a single construct is, therefore, 
hazardous. 



Test-Retest Reliability and Practice Effects 

When 30 patients with provisional diagnoses of AD were 
retested following a one-week interval, the correlation for the 
DRS total score was .97, whereas subscale correlations ranged 
from .61 to .94 (Coblentz et al., 1973). The means and stan- 
dard deviations for the total score and subtest scores at test 



Table 6-31 MOANS Scaled Scores for Persons Aged 84-86 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 






Table 6-32 MOANS Scaled Scores for Persons Aged 87-89 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 



and retest, as well as the retest correlations, are shown in Table 
6-43 and reveal minimal effects of practice in this population. 
Smith et al. (1994) retested a sample of 154 older normal indi- 
viduals following an interval of about one year and found that 
DRS total score declines of 10 points or greater occurred in 
less than 5% of normals. Not surprisingly, over this compara- 
ble interval, 61% of 110 dementia patients displayed a decline 
in DRS total scores of 10 or more points. 



Alternate Form Reliability 

This is reported to be high in community-dwelling elderly in- 
dividuals, with a correlation coefficient of .82 for the total 
score and correlations ranging from .66 (Initiation/Persevera- 
tion) to .80 (Memory) for the subscales. In addition, no sig- 
nificant differences were found between total scale and 
subscale scores of the two forms (Schmidt et al., 2005). 



Table 6-33 MOANS Scaled Scores for Persons Over Age 89 Years 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology Press. 






Table 6-34 Regression Formula for Age- and Education-Corrected 
MOANS Scaled Scores for DRS Total 

Age- and education-corrected MOANS scaled scores (AEMSS) can 
be calculated for DRS Total scores by using age-corrected MOANS 
scaled scores (AMSS) and education (expressed in years 
completed) in the following formula: 

AEMSS = 2.56 + (1.11 x AMSS) - (0.30 x EDUC). 

Source: From Lucas et al., 1998. Reprinted with the kind permission of Psychology 
Press. 
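
As an illustration of how the Table 6-34 formula is applied (the input values below are hypothetical, and the result would ordinarily be rounded to the nearest whole scaled score):

    # Age- and education-corrected MOANS scaled score for the DRS Total,
    # per the regression in Table 6-34. Example values are hypothetical.

    def aemss(amss, educ_years):
        """AEMSS = 2.56 + (1.11 x AMSS) - (0.30 x EDUC)."""
        return 2.56 + 1.11 * amss - 0.30 * educ_years

    print(aemss(10, 8))   # 11.26 -> scaled score of about 11
    print(aemss(10, 16))  # 8.86  -> scaled score of about 9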



VALIDITY 

Construct Validity 

The test correlates well with the Wechsler Memory Scale Mem- 
ory Quotient (.70), the WAIS Full-Scale IQ (.67), cortical 
metabolism (.59) as determined by positron emission tomog- 
raphy (PET) (Chase et al., 1984), and composite scores de- 
rived from standard neuropsychological tests (Knox et al., 
2003), supporting its use as a global assessment measure. 
Smith et al. (1994) found that in older adults who were cogni- 
tively impaired, DRS total score shared 54% of its variance 
with MAYO FSIQ and 57% with MAYO VIQ. In individuals 
with mental deficiency, the test loaded on the same factor as 
the Peabody Picture Vocabulary Test-Revised (Das et al., 
1995). Moreover, the test correlates highly (about r = .70-.80) 
with other commonly used standardized mental status exami- 
nations, such as the Mini-Mental State Examination (MMSE) and 
the Information-Memory-Concentration test (IMC), suggest- 
ing that they evaluate overlapping mental abilities (e.g., Bob- 



holz & Brandt, 1993; Salmon et al., 1990). Freidl et al. (1996, 
2002), however, found only a weak relationship between the 
DRS and MMSE (r = .29) and low agreement with regard to 
cognitive impairment in a community sample. Conversion 
formulas are available (see Comparing Scores on Different 
Tests) but given their lack of precision should be used with 
considerable caution. 

In designing the test, Mattis grouped the tasks according to 
their face validity into five subsets: Memory, Construction, 
Initiation and Perseveration, Conceptualization, and Attention. 
Although there are generally high correlations between DRS 
and MMSE total scores, subscales of the DRS do not always 
show the expected relationships with items of the MMSE. For 
example, Bobholz and Brandt (1993) reported that the Atten- 
tion item of the MMSE (serial sevens) was not significantly 
correlated with the Attention subscale of the DRS. Smith et al. 
(1994) provided evidence of convergent validity for some of 
the DRS scales. In a sample of 234 elderly patients with cogni- 
tive impairment, DRS subscale scores for Memory, Attention, 
and Conceptualization were significantly correlated with ap- 
propriate indices (GMI, ACI, and VIQ, respectively) from the 
WAIS-R and Wechsler Memory Scale-Revised, as assessed in 
the Mayo Older Americans Normative Studies. Support for the 
convergent validity of the Construction scale was more prob- 
lematic. Smith et al. found that this scale correlated more 
highly with VIQ and ACI scales than with PIQ, raising the con- 
cern that it may provide a better index of attention and general 
cognitive status than of visual-perceptual/visual-constructional 
skills per se. Marson et al. (1997) reported that in a sample of 
50 patients with mild to moderate AD, four of the five DRS 
subscales correlated most strongly with their assigned criterion 



Table 6-35 MOAANS Scaled Scores for Persons Aged 56-62 Years (Midpoint Age = 61, Age Range for Norms = 56-66, N = 108) 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOAANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for Medical Education and Research. 



Table 6-36 MOAANS Scaled Scores for Persons Aged 63-65 Years (Midpoint Age = 64, Age Range for Norms = 59-69, N = 130) 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOAANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for Medical Education and Research. 



variables (Attention with WMS-R Attention, Initiation/Perse- 
veration with COWA, Conceptualization with WAIS-R Simi- 
larities, Memory with WMS-R Verbal Memory). However, the 
Construction scale correlated as highly with Block Design as 
with WMS-R Attention. Brown et al. (1999) found that in a 
sample of patients with Parkinson's disease, some DRS sub- 
scales correlated significantly with conceptually related mea- 
sures from other tests (Attention with WAIS-R Digit Span 



Forward, Initiation/Perseveration with WCST perseverative re- 
sponses, Conceptualization with WAIS-R Similarities, Memory 
with WMS Immediate Logical Memory). No significant cor- 
relation was observed between the Construction subscale and 
other tests. Thus, the available literature suggests that the DRS 
does not adequately assess visual-constructional/visual-spatial 
functioning and that additional measures are needed to 
examine this domain adequately. 



Table 6-37 MOAANS Scaled Scores for Persons Aged 66-68 Years (Midpoint Age = 67, Age Range for Norms = 62-72, N = 167) 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOAANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for Medical Education and Research. 




Table 6-38 MOAANS Scaled Scores for Persons Aged 69-71 Years (Midpoint Age = 70, Age Range for Norms = 65-75, N = 182) 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOAANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for Medical Education and Research. 



Factor-analytic studies suggest that the five subscales do not 
reflect exclusively the constructs with which they are labeled 
(Colantonio et al., 1993; Kessler et al., 1994; Woodard et al., 
1996). Kessler et al. (1994) found that a two-factor model, 
specifying separate verbal and nonverbal functions, provided 
the best fit for the data in a heterogeneous sample, approxi- 
mately two-thirds of which carried psychiatric (depression) or 
dementia (AD-type) diagnoses. Similar results were reported 



by Das et al. (1995) in a sample of individuals with mental 
retardation of moderate to severe degree. In a study by Colan- 
tonio et al. (1993) with a sample of patients with probable AD, 
three factors emerged, which they labeled: (1) Conceptualiza- 
tion/Organization, containing tasks of priming inductive rea- 
soning, similarities, differences, identities and oddities, and 
sentence generation; (2) Visuospatial, containing subtests of 
graphomotor, construction, attention, alternating movements, 



Table 6-39 MOAANS Scaled Scores for Persons Aged 72-74 Years (Midpoint Age = 73, Age Range for Norms = 68-78, N = 157) 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOAANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for Medical Education and Research. 






Table 6-40 MOAANS Scaled Scores for Persons Aged 75-77 Years (Midpoint Age = 76, Age Range for Norms = 71-81, N = 119) 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOAANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for Medical Education and Research. 



and word and design recognition memory; and (3) Memory, 
consisting of sentence recall and orientation. Similar results 
have been reported by Woodard et al. (1996) in a sample of pa- 
tients with probable AD. Moderate correlations were also 
found between these factors and supplementary neuropsycho- 
logical measures, supporting the validity of these factors. On 
the other hand, Hofer et al. (1996) examined the factor struc- 



ture of the test in patients with dementia and healthy elderly 
controls grouped together. They found five factors, which they 
labeled Long-term Memory (Recall)/Verbal Fluency, Con- 
struction, Memory (Short-term), Initiation/Perseveration, and 
Simple Commands. The contrasting results (Hofer et al., 1996, 
and Kessler et al., 1994, versus Colantonio et al., 1993, and 
Woodard et al., 1996) highlight the fact that the resulting factor structure 



Table 6-41 MOAANS Scaled Scores for Persons Aged 78+ Years (Midpoint Age = 79, Age Range for Norms = 74-94, N = 79) 

[Raw-score equivalents on the Dementia Rating Scale subtests (Attention, Initiation/Perseveration, Construction, Conceptualization, Memory) and the DRS Total score for age-corrected MOAANS scaled scores of 2-18, with corresponding percentile ranges.] 

Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for Medical Education and Research. 






Table 6-42 Regression Formula for Age- and Education-Corrected 
MOAANS Scaled Scores for DRS Total 

Age- and education-corrected MOAANS scaled scores (MSS A&E) 
can be calculated for DRS Total scores by using age-corrected 
MOAANS scaled scores (MSS A) and education (expressed in years 
completed) in the following formula: 

MSS A&E = 3.01 + (1.19 x MSS A) - (0.41 x EDUC) 



Source: From Rilling et al., 2005. Reprinted by permission of the Mayo Foundation for 
Medical Education and Research. 
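
Applied to hypothetical values, an age-corrected MOAANS scaled score (MSS A) of 10 for a participant with 12 years of education gives MSS A&E = 3.01 + (1.19 x 10) - (0.41 x 12) = 3.01 + 11.90 - 4.92 = 9.99, or approximately 10; that is, for a person near the sample's average education (about 12 years), the correction leaves the scaled score essentially unchanged. 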



depends critically on the characteristics of the population 
studied, including the severity of their impairment. 

Clinical Studies 

The DRS is useful in detecting cognitive impairment in older 
adults (e.g., Yochim et al., 2003). It can differentiate patients 
with dementia of the Alzheimer's type from normal elderly 
subjects (Chan et al., 2003; Monsch et al., 1995; Salmon et al., 
2002); is sensitive to early stages of dementia (Knox et al., 2003; 
Monsch et al., 1995; Salmon et al., 2002; Vitaliano, Breen, 
Albert, et al., 1984), even in individuals with mental retarda- 
tion (Das et al., 1995); and is useful in identifying stages (sever- 
ity) of impairment (Chan et al., 2003; Shay et al., 1991; 
Vitaliano, Breen, Albert, et al., 1984). Further, the DRS has 
demonstrated the ability to accurately track progression of 
cognitive decline, even in the later stages of AD (Salmon et al., 
1990). Although the DRS tends to be highly correlated with 
other mental status exams, such as the Information Memory 
Concentration (IMC) and MMSE, and shows equal sensitivity 
to dementia (Chan et al., 2003; van Gorp et al., 1999), it pro- 
vides more precise estimates of change than these tests, likely 
due to its wider sampling of item difficulty (Gould et al., 2001; 
Salmon et al., 1990). Therefore, to follow progression in se- 
verely demented patients, the DRS is clearly the instrument of 
choice. 

Unlike other standardized mental status examinations that 
were developed as screening instruments (e.g., MMSE), the 
DRS was designed with the intention of discriminating 
among patients with dementia. There is evidence that pattern 



analysis of the DRS can distinguish the dementias associated 
with AD from those associated with Huntington's disease 
(HD), Parkinson's disease (PD), or vascular dementia (VaD) 
(e.g., Cahn-Weiner et al., 2002; Kertesz & Clydesdale, 1994; 
Lukatela et al., 2000; Paolo et al., 1994; Paulsen et al., 1995; 
Salmon et al., 1989). Patients with AD display more severe 
memory impairment; patients with HD are more severely im- 
paired on items that involve the programming of motor se- 
quences (Initiation/Perseveration subtest), while patients with 
PD or VaD display more severe constructional problems. Pa- 
tients with PD may also show memory impairment on the 
DRS, although this might reflect the effects of depression 
(Norman et al., 2002). Further, patients with frontotemporal 
dementia are less impaired on the Memory subscale than AD 
patients (Rascovsky et al., 2002). It is worth noting that these 
various distinctions among patient groups emerge even when 
patients are similar in terms of overall level of cognitive im- 
pairment. 

These findings are generally consistent with neuroimaging 
findings suggesting relationships between particular brain re- 
gions and specific cognitive functions assessed by the DRS. 
For example, Fama et al. (1997) found that Memory subscale 
scores in patients with AD were related to MRI-derived hip- 
pocampal volumes, while Initiation/Perseveration scores were 
related to prefrontal sulcal widening. Others have also ob- 
served that scores on select subscales are related to the in- 
tegrity of specific brain regions. In patients with vascular 
dementia, performance on the Memory subscale is associated 
with whole brain volume, whereas the Initiation and Con- 
struction subscales are related to subcortical hyperintensities 
(Paul et al., 2001). Even in the absence of dementia, subcorti- 
cal ischemic vascular disease is associated with subtle declines 
in executive functioning, as measured by the Initiation/Perse- 
veration subscale (Kramer et al., 2002). 

There is evidence that depression impairs DRS perfor- 
mance, at least to some extent (Harrell et al., 1991; van 
Reekum et al., 2000). For example, one group of investigators 
(Butters et al., 2000) studied 45 nondemented, elderly de- 
pressed patients before and after successful treatment with 
pharmacotherapy. Among depressed patients with concomi- 
tant cognitive impairment at baseline, successful treatment of 



Table 6-43 DRS Test-Retest Reliability in People With Presumed AD 

                              Initial Test          Retest            Test-Retest Correlation 
                              M        SD           M        SD       r 
Total Score                   79.55    33.98        83.18    30.60    .97 
Attention                     23.55     9.91        24.16     6.80    .61 
Initiation/Perseveration      21.36     9.78        22.00     7.34    .89 
Construction                   2.55     1.81         2.91     1.70    .83 
Conceptualization             21.18    10.58        21.91     9.28    .94 
Memory                        10.91     6.58        12.20     6.00    .92 

Source: From Coblentz et al., 1973. Reprinted with permission from the AMA. 






depression was associated with gains on the DRS measures of 
Conceptualization and Initiation/Perseveration. Nonetheless, 
the overall level of cognitive functioning in these patients re- 
mained mildly impaired, especially in the Memory and Initia- 
tion/Perseveration domains. 

Although the DRS is typically used with older adults, it has 
found use in other groups as well. Thus, it has been given to 
adolescents and adults with mental retardation (spanning the 
spectrum from mild to severe; Das et al., 1995; McDaniel & 
McLaughlin, 2000). However, it should not be used to diag- 
nose mental retardation. 

Ecological Validity 

DRS scores show modest correlations with measures of func- 
tional competence (the ability to perform activities of daily 
living as well as engage in complex recreational activities; e.g., 
Cahn et al., 1998; LaBuda & Lichtenberg, 1999; Lemsky et al., 
1996; Loewenstein et al., 1992; Smith et al., 1994; Vitaliano, 
Breen, Albert, et al., 1984; Vitaliano, Breen, Russo, et al., 1984). 
In particular, the Initiation/Perseveration and Memory sub- 
tests have proved valuable indicators of functional status in 
the elderly (Nadler et al., 1993; Plehn et al., 2004). 

In addition, the DRS may be useful in predicting func- 
tional decline (Hochberg et al., 1989) and survival (Smith 
et al., 1994). Smith et al. (1994) reported that in a sample of 274 
persons over age 55 with cognitive impairment, DRS total 
scores supplemented age information and provided a better 
basis for estimating survival than did gender or duration of 
disease. Median survival for those with DRS total scores below 
100 was 3.7 years. 



Other DRS Versions and Item Bias 

Attempts to develop other language versions of the DRS have 
met with varying success, perhaps because of cultural bias in- 
herent in the subscales or individual subscale items. For ex- 
ample, Hohl et al. (1999) found that Hispanic AD patients 
performed significantly worse than non-Hispanics in terms of 
total DRS score (on a translated version), despite being 



matched by MMSE score. This difference was accounted for 
primarily by poorer performance of the Hispanic patients, 
relative to the non-Hispanic patients, on the Memory and 
Conceptualization subtests. A Chinese version (Chan et al., 
2001, 2003) has also been developed that shows sensitivity 
and specificity similar to those of the English version. Comparison of 
age- and education-matched groups in Hong Kong and San 
Diego revealed differences in the pattern of subtest perfor- 
mance, even though groups did not differ in total DRS scores. 
Individuals in Hong Kong scored significantly higher than the 
San Diego participants on the Construction scale, whereas the 
opposite pattern was observed on the Initiation/Perseveration 
and Memory subscales. 

Woodard et al. (1998) investigated possible racial bias in 
the test by comparing 40 pairs of African American and Cau- 
casian dementia patients matched for age, years of education, 
and gender. Principal component analysis revealed similar 
patterns and magnitudes across component loadings for each 
racial group, suggesting no evidence of test bias. In addition, 
they identified only 4 of the 36 items of the DRS that showed 
differential item functioning: "palm up/palm down, fist 
clenched/fist extended, point out and count the As, and visual 
recognition." The implication is that the DRS may be used in 
both African American and Caucasian populations to assess 
dementia severity. Another study, by Teresi et al. (2000), found 
that most items of the Attention subscale of the DRS per- 
formed in an education-fair manner. 



Comparing Scores on Different Tests 

If a patient has been given a different mental status test (e.g., 
MMSE), one can translate the score on the test into scale-free 
units such as a z score or percentile score, or one can use a 
conversion formula. Equations have been developed to con- 
vert total scores from one test to the other (Bobholz & Brandt, 
1993; Meiran et al., 1996; Salmon et al., 1990), but given the
mixed results in the literature (see Construct Validity), these
should be used with caution. The equations are shown in
Table 6-44. The formulas should be applied only to patients
similar to those in the samples from which they were derived.



Table 6-44 Conversion Formulas to Derive DRS Scores From the MMSE

Test    Formula                 Reference
DRS     41.53 + 3.26 (MMSE)     Salmon et al., 1990 (based on a sample of 92 patients with probable AD)
DRS     33.86 + 3.39 (MMSE)     Bobholz & Brandt, 1993 (based on a sample of 50 patients with suspected cognitive impairment)
DRS     45.5 + 3.01 (MMSE)      Meiran et al., 1996 (based on a sample of 466 patients in a memory disorders clinic; the expected error associated with this formula is ±11.1)
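To make the conversion arithmetic concrete, the sketch below (Python; not part of any published scoring software, and the function names and example normative values are ours) applies the Table 6-44 equations and the scale-free z-score translation described above. In practice, the choice of equation should match the patient population, and the ±11.1 error band around the Meiran et al. estimate illustrates how imprecise such cross-test conversions can be.

# Minimal sketch: estimating a DRS total from an MMSE total using the
# published regression equations (Table 6-44), plus a generic z-score
# translation. Function names and the example values are illustrative only.

def mmse_to_drs(mmse, formula="meiran"):
    """Point estimate of the DRS total score from an MMSE total score."""
    equations = {
        "salmon": lambda m: 41.53 + 3.26 * m,    # Salmon et al., 1990 (92 probable AD patients)
        "bobholz": lambda m: 33.86 + 3.39 * m,   # Bobholz & Brandt, 1993 (50 patients)
        "meiran": lambda m: 45.5 + 3.01 * m,     # Meiran et al., 1996 (466 patients; error about +/-11.1)
    }
    return equations[formula](mmse)

def to_z(score, normative_mean, normative_sd):
    """Translate a raw test score into scale-free z units, given normative
    values taken from an appropriate table (placeholders used below)."""
    return (score - normative_mean) / normative_sd

if __name__ == "__main__":
    print(round(mmse_to_drs(20), 1))      # 105.7 using the Meiran et al. equation
    print(round(to_z(105, 100, 15), 2))   # 0.33, with placeholder normative values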






COMMENT 

The DRS is a fairly comprehensive screening test, evaluating 
aspects of cognition (e.g., verbal conceptualization, verbal flu- 
ency) not well assessed by tests such as the MMSE. It also appears
to be more useful than other measures (e.g., the MMSE) in 
tracking change. On the other hand, the DRS takes about four 
times longer to administer (Bobholz & Brandt, 1993) and may 
be more susceptible to cultural or educational factors in some 
populations (e.g., Hispanics) (Hohl et al., 1999). 

The summary score appears to have relatively good con- 
current and predictive validity. Given that the DRS may also 
be helpful in distinguishing among dementing disorders 
even in later stages of disease, the focus should also be on 
specific cognitive dimensions that the test offers. In this con- 
text, it is worth bearing in mind that the Conceptualization 
and Memory subscales appear fairly reliable and seem to 
represent discrete constructs. Construction, Attention, and 
Initiation/Perseveration items should also be administered, 
but given concerns regarding reliability and validity, their 
interpretation is more problematic. 

It is also important to note that the test is a screening de- 
vice and the clinician may need to follow up with a more in- 
depth investigation. Thus, individuals identified as cognitively 
impaired on the test should undergo additional assessment to 
determine the presence or absence of dementia. Further, al- 
though sensitive to differences at the lower end of function- 
ing, the DRS may not detect impairment in the higher ranges 
of intelligence (Jurica et al., 2001; Teresi et al., 2001). This is 
because the DRS was developed to avoid floor effects in clini- 
cally impaired populations rather than ceiling effects in high- 
functioning individuals (Jurica et al., 2001).

Some (Chan et al., 2002; Monsch et al., 1995) have sug- 
gested that an abbreviated version composed only of the 
Memory and Initiation/Perseveration subscales may be useful 
as a quick screening for individuals suspected of AD. This fo- 
cus is consistent with literature suggesting that deterioration 
of memory is an early, prominent symptom of the disease. 
However, with use of such a shortened procedure, the com- 
prehensive data on different aspects of cognitive functioning 
will be lost (Chan et al., 2003).



REFERENCES 

Bank, A. L., Yochim, B. P., MacNeill, S. E., & Lichtenberg, P. A. (2000). 
Expanded normative data for the Mattis dementia rating scale for 
use with urban, elderly medical patients. The Clinical Neuropsy- 
chologist, 14, 149-156. 

Bobholz, J. H., & Brandt, J. (1993). Assessment of cognitive impair- 
ment: Relationship of the Dementia Rating Scale to the Mini- 
Mental State Examination. Journal of Geriatric Psychiatry and 
Neurology, 6, 210-213. 

Brown, G. G., Rahill, A. A., Gorell, J. M., McDonald, C., Brown, S. I., 
Sillanpaa, M., & Shults, C. (1999). Validity of the Dementia Rat- 
ing Scale in assessing cognitive function in Parkinson's disease. 
Journal of Geriatric Psychiatry and Neurology, 12, 180-188. 



Butters, M. A., Becker, J. T., Nebes, R. D., Zmuda, M. D., Mulsant, B. 
H., Pollock, B. G., & Reynolds, C. F., III. (2000). Changes in cogni-
tive functioning following treatment of late-life depression. 
American Journal of Psychiatry, 157, 1949-1954. 

Cahn, D. A., Sullivan, E. V., Shear, P. K., Pfefferbaum, A., Heit, G., & 
Silverberg, G. (1998). Differential contributions of cognitive and 
motor component processes to physical and instrumental activi- 
ties of daily living in Parkinson's disease. Archives of Clinical Neu- 
ropsychology, 13, 575-583. 

Cahn-Weiner, D. A., Grace, J., Ott, B. R., Fernandez, H. H., & 
Friedman, J. H. (2002). Cognitive and behavioural features dis-
criminate between Alzheimer's and Parkinson's disease. Neu- 
ropsychiatry, Neuropsychology, & Behavioural Neurology, 15, 
79-87. 

Chan, A. S., Choi, A., Chiu, H., & Liu, L. (2003). Clinical validity of 
the Chinese version of Mattis Dementia Rating Scale in differen- 
tiating dementia of Alzheimer's type in Hong Kong. Journal of the 
International Neuropsychological Society, 9, 45-55. 

Chan, A. S., Salmon, D. P., & Choi, M-K. (2001). The effects of age, 
education, and gender on the Mattis Dementia Rating Scale per- 
formance of elderly Chinese and American individuals. Journal of 
Gerontology: Series B: Psychological Sciences & Social Sciences, 56B, 
356-363. 

Chase, T. N., Foster, N. L., Fedio, P., Brooks, R., Mansi, L., & Di Chiro,
G. (1984). Regional cortical dysfunction in Alzheimer's disease as 
determined by positron emission tomography. Annals of Neurol- 
ogy, 15, S170-S174. 

Coblentz, J. M., Mattis, S., Zingesser, L. H., Kasoff, S. S., Wisniewski, 
H. M., & Katzman, R. (1973). Presenile dementia. Archives of Neu- 
rology, 29, 299-308. 

Colantonio, A., Becker, J. T., & Huff, F. J. (1993). Factor structure of
the Mattis Dementia Rating Scale among patients with probable
Alzheimer's disease. The Clinical Neuropsychologist, 7, 313-318.

Das, J. P., Mishra, R. K., Davison, M., & Naglieri, J. A. (1995). Mea-
surement of dementia in individuals with mental retardation: 
Comparison based on PPVT and Dementia Rating Scale. The 
Clinical Neuropsychologist, 9, 32-31 . 

Fama, R., Sullivan, E. V., Shear, P. K., Marsh, L., Yesavage, J., Tinklen-
berg, J. R., Lim, K. O., & Pfefferbaum, A. (1997). Selective cortical
and hippocampal volume correlates of Mattis Dementia Rating 
Scale in Alzheimer disease. Archives of Neurology, 54, 719-728. 

Freidl, W., Schmidt, R., Stronegger, W. J., Fazekas, F., & Reinhart, B.
(1996). Sociodemographic predictors and concurrent validity of 
the Mini Mental State Examination and the Mattis Dementia Rat- 
ing Scale. European Archives of Psychiatry and Clinical Neuro- 
science, 246, 317-319. 

Freidl, W., Schmidt, R., Stronegger, W. J., & Reinhart, B. (1997). The 
impact of sociodemographic, environmental, and behavioural 
factors, and cerebrovascular risk factors as potential predictors on 
the Mattis Dementia Rating Scale. Journal of Gerontology, 52A, 
M111-M116. 

Freidl, W., Stronegger, W. J., Berghold, A., Reinhart, B., Petrovic, K.,
& Schmidt, R. (2002). The agreement of the Mattis Dementia 
Rating Scale with the Mini-Mental State Examination. Interna- 
tional Journal of Psychiatry, 17, 685-686. 

Gardner, R., Oliver-Munoz, S., Fisher, L., & Empting, L. (1981). Mat- 
tis Dementia Rating Scale: Internal reliability study using a dif- 
fusely impaired population. Journal of Clinical Neuropsychology, 3, 
271-275. 

Gould, R., Abramson, I., Galasko, D., & Salmon, D. (2001). Rate of 
cognitive change in Alzheimer's disease: Methodological ap- 






proaches using random effects models. Journal of the Interna- 
tional Neuropsychological Society, 7, 813-824. 

Harrell, L. E., Duvall, E., Folks, D. G., Duke, L., Bartolucci, A., Con- 
boy, T., Callaway, R., & Kerns, D. (1991). The relationship of high- 
intensity signals on magnetic resonance images to cognitive and 
psychiatric state in Alzheimer's disease. Archives of Neurology, 48, 
1136-1140. 

Hochberg, M. C., Russo, J., Vitaliano, P. P., Prinz, P. N., Vitiello, M. V.,
& Yi, S. (1989). Initiation and perseveration as a subscale of the 
Dementia Rating Scale. Clinical Gerontologist, 8, 27-41. 

Hofer, S. M., Piccinin, A. M., & Hershey, D. (1996). Analysis of struc- 
ture and discriminative power of the Mattis Dementia Rating 
Scale. Journal of Clinical Psychology, 52, 395-409. 

Hohl, U., Grundman, M., Salmon, D. P., Thomas, R. C., & Thal, L. J.
(1999). Mini-Mental State Examination and Mattis Dementia 
Rating Scale performance differs in Hispanic and non-Hispanic 
Alzheimer's disease patients. Journal of the International Neu- 
ropsychological Society, 5, 301-307. 

Jurica, P. J., Leitten, C. L., & Mattis, S. (2001). Dementia Rating Scale-
2. Odessa, FL: Psychological Assessment Resources. 

Kantarci, K., Smith, G. E., Ivnik, R. J., Petersen, R. C., Boeve, B. F.,
Knopman, D. S., Tangalos, E. G., & Jack, C. R. (2002). H magnetic
resonance spectroscopy, cognitive function, and apolipoprotein E 
genotype in normal aging, mild cognitive impairment and 
Alzheimer's disease. Journal of the International Neuropsychologi- 
cal Society, 8, 934-942. 

Kertesz, A., & Clydesdale, S. (1994). Neuropsychological deficits in 
vascular dementia vs Alzheimer's disease. Archives of Neurology, 
51, 1226-1231. 

Kessler, H. R., Roth, D. L., Kaplan, R. F., & Goode, K. T. (1994). Con-
firmatory factor analysis of the Mattis Dementia Rating Scale. 
The Clinical Neuropsychologist, 8, 451-461. 

Knox, M. R., Lacritz, L. H., Chandler, M. J., & Cullum, C. M. (2003). 
Association between Dementia Rating Scale performance and 
neurocognitive domains in Alzheimer's disease. The Clinical Neu- 
ropsychologist, 17, 216-219. 

Kramer, J. H., Reed, B. R., Mungas, D., Weiner, M. W., & Chui, H. C.
(2002). Executive dysfunction in subcortical ischaemic vascular dis-
ease. Journal of Neurology, Neurosurgery and Psychiatry, 72, 217-220.

LaBuda, J., & Lichtenberg, P. (1999). The role of cognition, depres- 
sion, and awareness of deficit in predicting geriatric rehabilitation 
patients' IADL performance. The Clinical Neuropsychologist, 13, 
258-267. 

Lemsky, C. M., Smith, G., Malec, J. F., & Ivnik, R. J. (1996). Identify-
ing risk for functional impairment using cognitive measures: An 
application of CART modeling. Neuropsychology, 10, 368-375. 

Loewenstein, D. A., Rupert, M. P., Berkowitz-Zimmer, N., Guterman, 
A., Morgan, R., Hayden, S. (1992) Neuropsychological test perfor- 
mance and prediction of functional capacities in dementia. Be- 
havior, Health, and Aging, 2, 149-158. 

Lucas, J. A., Ivnik, R. J., Smith, G. E., Bohac, D. L., Tangalos, E. G.,
Kokmen, E., Graff-Radford, N. R., & Petersen, R. C. (1998). Nor- 
mative data for the Mattis Dementia Rating Scale. Journal of Clin- 
ical and Experimental Neuropsychology, 20, 536-547. 

Lukatela, K., Cohen, R. A., Kessler, H., Jenkins, M. A., Moser, D. J., Stone,
W. F., Gordon, N., & Kaplan, R. F. (2000). Dementia Rating Scale
performance: A comparison of vascular and Alzheimer's dementia. 
Journal of Clinical and Experimental Neuropsychology, 22, 445-454. 

Marcopulos, B. A., & McLain, C. A. (2003). Are our norms "normal"? 
A 4-year follow-up study of a biracial sample of rural elders with 
low education. The Clinical Neuropsychologist, 17, 19-33. 



Marcopulos, B. A., McLain, C. A., & Giuliano, A. J. (1997). Cognitive 
impairment or inadequate norms? A study of healthy, rural, older 
adults with limited education. The Clinical Neuropsychologist, 11, 
111-113. 

Marson, D. C., Dymek, M. P., Duke, L. W., & Harrell, L. E. (1997).
Subscale validity of the Mattis Dementia Rating Scale. Archives of 
Clinical Neuropsychology, 12, 269-275. 

Mattis, S. (1976). Mental status examination for organic mental syn- 
drome in the elderly patient. In L. Bellak & T. B. Karasu (Eds.),
Geriatric psychiatry. New York: Grune and Stratton. 

Mattis, S. (1988). Dementia Rating Scale: Professional manual. 
Odessa, FL: Psychological Assessment Resources. 

McDaniel, W. F., & McLaughlin, T. (2000). Further support for using
the Dementia Rating Scale in the assessment of neuro-cognitive 
functions of individuals with mental retardation. The Clinical 
Neuropsychologist, 14, 72-75. 

Meiran, N., Stuss, D. T., Guzman, D. A., Lafleche, G., & Willmer, J.
(1996). Diagnosis of dementia: Methods for interpretation of 
scores of 5 neuropsychological tests. Archives of Neurology, 53, 
1043-1054. 

Monsch, A. U., Bondi, M. W., Salmon, D. P., Butters, N., Thal, L. J.,
Hansen, L. A., Wiederholt, W. C., Cahn, D. A., & Klauber, M. R.
(1995). Clinical validity of the Mattis Dementia Rating Scale in de- 
tecting dementia of the Alzheimer type. Archives of Neurology, 52, 
899-904. 

Montgomery, K. M. (1982). A normative study of neuropsychological 
test performance of a normal elderly sample (unpublished Master's 
thesis). University of Victoria, Victoria, British Columbia. 

Nadler, J. D., Richardson, E. D., Malloy, P. E, Marran, M. E., & 
Hostetler Brinson, M. E. (1993). The ability of the Dementia Rat- 
ing Scale to predict everyday functioning. Archives of Clinical 
Neuropsychology, 8, 449-460. 

Norman, S., Troster, A. I., Fields, J. A., & Brooks, R. (2002). Effects of 
depression and Parkinson's disease on cognitive functioning. 
Journal of Neuropsychiatry and Clinical Neurosciences, 14, 31-36. 

Paolo, A. M., Troster, A. I., Glatt, S. L., Hubble, J. P., & Koller, W. C.
(1994). Utility of the Dementia Rating Scale to differentiate the de- 
mentias of Alzheimer's and Parkinson's disease. Paper presented to 
the International Neuropsychological Society, Cincinnati, OH. 

Paul, R. H., Cohen, R. A., Moser, D., Ott, B. R., Zawacki, T., Gordon,
N., Bell, S., & Stone, W. (2001). Performance on the Mattis De- 
mentia Rating Scale in patients with vascular dementia: Relation- 
ships to neuroimaging findings. Journal of Geriatric Psychiatry &
Neurology, 14, 33-36. 

Paulsen, J. S., Butters, N., Sadek, B. S., Johnson, B. S., Salmon, D. P., 
Swerdlow, N. R., & Swenson, M. R. (1995). Distinct cognitive pro-
files of cortical and subcortical dementia in advanced illness. 
Neurology, 45, 951-956. 

Plehn, K., Marcopulos, B. A., & McLain, C. A. (2004). The relation- 
ship between neuropsychological test performance, social func- 
tioning, and instrumental activities of daily living in a sample 
of rural older adults. The Clinical Neuropsychologist, 18, 
101-113. 

Rascovsky, K., Salmon, D. P., Ho, G. J., Galasko, D., Peavy, G. M.,
Hansen, L. A., & Thal, L. J. (2002). Cognitive profiles differ in
autopsy-confirmed frontotemporal dementia and AD. Neurology, 
1801-1807. 

Rilling, L. M., Lucas, J. A., Ivnik, R. J., Smith, G. E., Willis, F. B., Ferman, 
T. J., Petersen, R. C., & Graff-Radford, N. R. (2005). Mayo's Older
African American Normative Studies: Norms for the Mattis De- 
mentia Rating Scale. The Clinical Neuropsychologist, 19, 229-242. 






Salmon, D. P., Kwo-on-Yuen, P. F., Heindel, W. C., Butters, N., & Thal,
L. J. (1989). Differentiation of Alzheimer's disease and Hunting-
ton's disease with the Dementia Rating Scale. Archives of Neurol- 
ogy, 46, 1204-1208. 

Salmon, D. P., Thal, L. J., Butters, N., & Heindel, W. C. (1990). Longi-
tudinal evaluation of dementia of the Alzheimer's type: A com- 
parison of 3 standardized mental status examinations. Neurology, 
40, 1225-1230. 

Salmon, D. P., Thomas, R. G., Pay, M. M., Booth, A., Hofstetter, C. R., 
Thal, L. J., & Katzman, R. (2002). Alzheimer's disease can be accu-
rately diagnosed in very mildly impaired individuals. Neurology, 
59, 1022-1028. 

Schmidt, R., Freidl, W., Fazekas, R, Reinhart, P., Greishofer, 
P., Koch, M., Eber, B., Smith, G. E., Ivnik, R. J., Malec, J. R, Kok- 
men, E., Tangalos, E., & Petersen, R. C. (1994). Psychometric 
properties of the Mattis Dementia Rating Scale. Assessment, 1, 
123-131. 

Schmidt, K., & Mattis, S. (2004). Dementia Rating Scale-2: Alternate 
form. Lutz, FL: PAR. 

Schmidt, K. S., Mattis, P. J., Adams, J., & Nestor, P. (2005). Alternate- 
form reliability of the Dementia Rating Scale-2. Archives of Clini- 
cal Neuropsychology, 20, 435-441. 

Shay, K. A., Duke, L. W., Conboy, T., Harrell, L. E., Callaway, R., &
Folks, D. G. (1991). The clinical validity of the Mattis Dementia 
Rating Scale in staging Alzheimer's dementia. Journal of Geriatric 
Psychiatry and Neurology, 4, 18-25. 

Smith, G. E., Ivnik, R. J., Malec, J. F., Kokmen, E., Tangalos, E. G., &
Petersen, R. C. (1994). Psychometric properties of the Mattis De- 
mentia Rating Scale. Assessment, 1, 123-131. 

Teresi, J. A., Holmes, D., Ramirez, M., Gurland, B. J., & Lantigua, R. 
(2001). Performance of cognitive tests among different racial/eth- 
nic and education groups: Findings of differential item functioning 
and possible item bias. Journal of Mental Health and Aging, 17, 
79-89. 



Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2000). Modern 
psychometric methods for detection of differential item func- 
tioning: Application to cognitive assessment measures. Statistics 
in Medicine, 19, 1651-1683. 

Van Gorp, W. G., Marcotte, T. D., Sultzer, D., Hinkin, C., Mahler, M.,
& Cummings, J. L. (1999). Screening for dementia: Comparison 
of three commonly used instruments. Journal of Clinical and Ex- 
perimental Neuropsychology, 21, 29-38. 

Van Reekum, R., Simard, M., Clarke, D., Conn, D., Cohen, T., &
Wong, J. (2000). The role of depression severity in the cognitive 
functioning of elderly subjects with central nervous system dis- 
ease. Journal of Psychiatry and Neuroscience, 25, 262-268. 

Vangel Jr., S. J., & Lichtenberg, P. A. (1995) Mattis Dementia Rating 
Scale: Clinical utility and relationship with demographic vari- 
ables. The Clinical Neuropsychologist, 9, 209-213. 

Vitaliano, P. P., Breen, A. R., Albert, M. S., Russo, J., & Prinz, P. N. (1984). 
Memory, attention, and functional status in community-residing 
Alzheimer type dementia patients and optimally healthy aged in- 
dividuals. Journal of Gerontology, 39, 58-64. 

Vitaliano, P. P., Breen, A. R., Russo, J., Albert, M., Vitiello, M., & Prinz, 
P. N. (1984). The clinical utility of the Dementia Rating Scale for 
assessing Alzheimer's patients. Journal of Chronic Diseases,
37(9/10), 743-753. 

Woodard, J. L., Auchus, A. P., Godsall, R. E., & Green, R. C. (1998). An
analysis of test bias and differential item functioning due to race 
on the Mattis Dementia Rating Scale. Journal of Gerontology: Psy- 
chological Sciences, 53B, 370-374. 

Woodard, J. L., Salthouse, T. A., Godsall, R. E., & Green, R. C. (1996). 
Confirmatory factor analysis of the Mattis Dementia Rating Scale in 
patients with Alzheimer's disease. Psychological Assessment, 8, 85-91. 

Yochim, B. P., Bank, A. L., Mast, B. T., MacNeill, S. E., & Lichtenberg,
P. A. (2003). Clinical utility of the Mattis Dementia Rating Scale 
in older, urban medical patients: An expanded study. Aging, Neu- 
ropsychology and Cognition, 10, 230-237. 



Kaplan Baycrest Neurocognitive Assessment (KBNA) 



PURPOSE 

The purpose of this test is to provide a comprehensive evalua- 
tion of cognitive abilities in individuals aged 20 to 89 years. 

SOURCE 

The test (including Test Manual, Stimulus Book, Response 
Chips, cassette, Response Grid, 25 Response Booklets, and 
Record Forms) can be ordered from the Psychological Cor- 
poration, 19500 Bulverde Road, San Antonio, TX 78259
(www.harcourtassessment.com). The kit costs $264 US.

AGE RANGE 

The test can be given to individuals aged 20 to 89 years. 

DESCRIPTION 

The KBNA (Leach et al., 2000) was designed as a comprehen- 
sive test to identify and characterize mild as well as severe 



forms of cognitive dysfunction in adults, including the elderly. 
The KBNA is intended to capture within a reasonable time 
period (less than two hours) the full range of neuropsycho- 
logical functioning using tasks that derive from both behav- 
ioral neurology and psychometric approaches. Thus, the 
KBNA consists of 25 subtests similar in format to those found 
in widespread clinical use (Orientation, Sequences, Numbers, 
Word Lists 1 and 2, Complex Figure 1 and 2, Motor Program- 
ming, Auditory Signal Detection, Symbol Cancellation, Clocks, 
Picture Naming, Sentence Reading — Arithmetic, Reading 
Single Words, Spatial Location, Verbal Fluency, Praxis, Picture 
Recognition, Expression of Emotion, Practical Problem Solv- 
ing, Conceptual Shifting, Picture Description — Oral, Auditory 
Comprehension, Repetition, Picture Description — Written; 
see Table 6-45 for descriptions). 

The subtests were designed to measure specific aspects of
functioning and to yield multiple scores (including error
analyses) so that the clinician can evaluate the process or
processes by which the person completed the tasks. Some of
the subtest scores can also be combined to represent higher
order domains of functioning that are represented by eight
indices. The indices and their contributing subtests (or
components of subtests) are shown in Table 6-46. Note that
only half of the subtests contribute to the indices. Subtests
with acceptable score distributions are included in the
indices; subtests with highly skewed distributions are excluded.



Table 6-45 Description of KBNA Subtests

1. Orientation: Declarative memory for personally relevant information (e.g., date of birth, age)
2. Sequences: Mental control tasks (e.g., recite months of year in normal and reverse sequence, name letters that rhyme with word key)
3. Numbers: Recall set of telephone numbers in two oral-response trials and one written-response trial
4. Word Lists 1: Learn and remember a list of 12 words on four list-learning trials; the list is categorized with four words representing each of three categories
5. Complex Figure 1: Copy of a complex figure
6. Motor Programming: Examinee performs five alternating movements with hands
7. Auditory Signal Detection: Examinee listens to a tape of alphabet letters and signals, by tapping, each time the letter A appears; subtest lasts for 195 seconds
8. Symbol Cancellation: Examinee is presented with a page containing over 200 geometric figures and asked to circle the figures that match a designated target; time limit is 2 minutes
9. Clocks: Consists of five components: free drawing (with hands set at 10 after 11), predrawn, copy, reading without numbers, and reading with numbers
10. Word List 2: Free recall, cued recall, and recognition of target words from Word List 1
11. Complex Figure 2: Recall and recognition of Complex Figure 1
12. Picture Naming: Naming of 20 black-and-white line drawings; semantic and phonemic cues are provided if patient cannot name item
13. Sentence Reading — Arithmetic: One task requires examinee to read two word problems out loud and calculate the answers, using paper and pencil. The second task requires solving nine calculations involving addition, subtraction, or multiplication
14. Reading Single Words: Examinee reads aloud a set of 10 words and five nonsense words
15. Spatial Location: Examinee is shown a series of figures, each consisting of a rectangle in which are arrayed three to seven dots. The design is hidden and the examinee is asked to place response chips on a response grid in the corresponding locations
16. Verbal Fluency: Consists of three components: "c" words, animals, and first names; 1 minute for each
17. Praxis: Tests of ideomotor praxis for intransitive (e.g., waving), transitive (e.g., turning a key), and buccofacial (e.g., blowing out a candle) movements. If patient fails to perform any movement, the patient is asked to imitate the examiner's performance
18. Picture Recognition: Patient is presented with a series of 40 pictures including those from the Picture Naming task and is asked to indicate if each picture was presented before
19. Expression of Emotion: Examinee must demonstrate a series of facial expressions (angry, happy, surprised, sad); if patient fails, he or she is asked to imitate the examiner's expression
20. Practical Problem Solving: Examiner reads aloud scenarios representing situations of urgency (e.g., if smelled smoke) and examinee has to indicate how he or she would respond
21. Conceptual Shifting: Examinee is presented with a set of four line drawings that can be variously grouped and must indicate the three drawings that are alike and in what way. The examinee must then select three of the same four designs according to another shared attribute and state or describe the attribute
22. Picture Description — Oral: Examinee must describe orally events depicted in a line drawing
23. Auditory Comprehension: Examinee is read five questions (e.g., "Do you put on your shoes after your socks?") and must respond with yes or no
24. Repetition: Examinee is asked to repeat five orally presented items, ranging from single words (e.g., president) to complete sentences (e.g., "If he comes, I will go")
25. Picture Description — Written: Examinee is asked to describe events depicted in a line-drawn scene






Table 6-46 KBNA Indices and Contributing Scores

Index: Contributing scores
Attention/Concentration: Sequences total score; Spatial Location adjusted score
Memory — Immediate Recall: Word Lists 1 — Recall Total score; Complex Figure 1 — Recall Total score
Memory — Delayed Recall: Word Lists 2 — Recall Total score; Complex Figure 2 — Recall Total score
Memory — Delayed Recognition: Word Lists 2 — Recognition Total score; Complex Figure 2 — Recognition Total score
Spatial Processing: Complex Figure 1 — Copy/Clocks combined score
Verbal Fluency: Verbal Fluency — Phonemic score; Verbal Fluency — Semantic score
Reasoning/Conceptual Shifting: Practical Problem Solving/Conceptual Shifting combined score
Total: Attention/Concentration Index; Memory — Immediate Recall Index; Memory — Delayed Recall Index; Memory — Delayed Recognition Index; Spatial Processing Index; Verbal Fluency Index; Reasoning/Conceptual Shifting Index






ADMINISTRATION 

Directions for administering the items, time limits, and cor- 
rect responses appear in the test manual. Instructions for 
recording information for each subtest are also provided on 
the record form. The subtests should be given in the num- 
bered order in which they appear in the subtest instructions 
because the durations of the delay intervals are determined by 
the administration of the intervening tasks. The authors rec- 
ommend that the test be given in one test session. However, 
not all subtests need be given to each client (L. Leach, personal 



communication, November 7, 2002); the choice depends on 
whether specific problems need more elaborate testing. In ad- 
dition, the authors caution that circumstances may warrant 
deviations from the order and time frame. For example, in 
cases of suspected aphasia, it is suggested that the examiner 
begin with the Picture Naming subtest since doing so may al- 
low the examiner to assess the impact of language disturbance 
on tests of memory and comprehension. 

ADMINISTRATION TIME 

The test can be administered in about two hours. 



SCORING 

Most of the subtests require little interpretation of scoring cri- 
teria. However, detailed scoring criteria are provided (see Ap- 
pendix A of the test manual) for Complex Figure 1, Picture 
Description — Oral, Picture Description — Written, and the 
Free Drawing Component of Clocks. 

The record form provides space to note responses, to con- 
vert raw scores to age-based scaled scores (1-19, M= 10, 
SD= 3), and to index scores (T score, M= 50, SD= 10) using 
tables in Appendices B and C of the Examiner's Manual. 
Percentile-rank equivalents and confidence intervals at the 
90% and 95% levels are also provided for the index scores 
(Appendix C of the manual). A graph is also available to plot 
the index scores. The various process scores can also be con- 
verted to percentile ranges (<2, 2-16, >16) using age-based 
tables in Appendix D of the manual. These bands correspond 
to below average, equivocal, and average, respectively. Subtest 
and index discrepancy criteria are also provided (Appendix E). 
The difference between pairs of subtest or index scores re- 
quired for statistical significance (.05 level) ranges from 
about four to six points. Discrepancies between indices of 
about 20 points or more would be considered rare, although 
this varies considerably depending upon the particular in- 
dices selected. 
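As a concrete illustration of these score conventions, the following sketch (Python; not the publisher's scoring software, and the 5-point criterion is simply the midpoint of the 4-to-6-point range quoted above) classifies a process-score percentile into the manual's three bands and flags a notable subtest or index discrepancy. The actual raw-score conversions rely on the lookup tables in the manual appendices.

# Minimal sketch of the KBNA score metrics described above:
# process-score percentile bands (<2, 2-16, >16) and a simple check
# against the roughly 4-6 point discrepancy criterion.

def percentile_band(percentile):
    """Map a process-score percentile to the manual's three bands."""
    if percentile < 2:
        return "below average"
    if percentile <= 16:
        return "equivocal"
    return "average"

def discrepancy_is_notable(score_a, score_b, criterion=5):
    """True if two subtest or index scores differ by at least the criterion
    (5 points here, the middle of the 4-6 point range quoted above)."""
    return abs(score_a - score_b) >= criterion

if __name__ == "__main__":
    print(percentile_band(10))              # "equivocal"
    print(discrepancy_is_notable(42, 55))   # True (13-point index difference)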



DEMOGRAPHIC EFFECTS 

Age 

The test authors noted that age impacts performance, with 
scores declining with advancing age. Accordingly, perfor- 
mance is evaluated relative to age-based norms. 

Education 

The authors report that level of education may also impact 
performance, and they provide some data (Table 5.2 in the 
test manual) showing that performance across the various in- 
dices increases with increasing education. However, the vari- 
ous subtest and index scores are not broken down by age as 
well as education. 






Ethnicity 

No information is available. 

NORMATIVE DATA 

Standardization Sample 

Norms presented in the KBNA Manual are based on a sample 
of 700 healthy individuals, aged 20 to 89 years, considered 
representative of the U.S. population (See Table 6-47). 






RELIABILITY 

Internal Consistency 

Split-half reliability coefficients for the subtests vary depend- 
ing upon the age group. Considering the average reliability 
coefficients, they range from marginal (Sequences, r = .67) to
high (Word Lists 2-Recall, r = .90) (see Table 6-48). The aver-
age reliability coefficients for the index scores are reported by 
the authors to be in the .70s to .80s. The average reliability for 
the total scale score is .81. 

Test-Retest Reliability and Practice Effects 

The stability of KBNA scores was evaluated in 94 adults (from
the normative sample; L. Leach, personal communication,
November 15, 2004) who were retested following an interval
ranging from two to eight weeks. Reliability coefficients were
high for the total scale (.85), but lower for the individual
subtests (see Table 6-48). Many of these stability coefficients
are low due to truncated distributions. Classification as to
extent of impairment (below average, equivocal, average) for
many of the subtests was, however, relatively consistent from
test to retest (see KBNA Manual).

Mean retest scores were generally higher than initial scores
(particularly in the areas of memory and spatial processing,
where gains of about 8 to 10 standard score points were
evident); however, on two of the indices (Memory — Delayed
Recognition, Verbal Fluency), scores declined by about one
point on retest.



Table 6-47 Characteristics of the KBNA Normative Sample

Number: 700
Age: 20-89 years (a)
Geographic location: Proportional representation from northeast, north central, south, and west regions of the United States
Sample type: Stratified sample according to 1999 U.S. Census data
Education: <8 to >16 years
Gender: Approximate census proportions of males and females in each age group. About 53% female overall
Race/ethnicity: For each age group, based on racial/ethnic proportions of individuals in those age bands in the U.S. population according to Census data. About 78% Caucasian
Screening: Screened by self-report for medical and psychiatric conditions that could affect cognitive functioning

(a) Broken down into seven age groups: 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80-89, each consisting of 100 participants.



Interrater Reliability 

Interscorer agreement for some of the subtests that require in- 
terpretation (e.g., Complex Figure; Picture Description — Oral,
Written; and the Free Drawing Component of Clocks) is not 
provided, raising concern about the interpretation of these 
scores. 



VALIDITY 

Construct Validity 

The authors report correlational data suggesting that the 
KBNA is broadly consistent with the results of other global 
measures of cognitive status. Thus, the authors report that the 
KBNA total index score correlated .67 with the WASI FSIQ in 
about 500 nonimpaired people who participated in the stan- 
dardization sample. In a small (N = 14) mixed clinical sample,
the KBNA total index score correlated fairly strongly (.82) 
with the total score on the Dementia Rating Scale (DRS). 

The subtests comprising each index were determined by 
the authors in part on theoretical grounds. However, intercor- 
relations between subtests comprising an index are low (see 
Table 6-49), raising questions regarding the meaning of these 
indices in individual patients. The manual provides some evi- 
dence of construct validity by examining the pattern of inter- 
correlations among the various indices. The authors note that
the correlations between related KBNA indices (i.e., Im-
mediate, Delayed, and Recognition Memory) are relatively
high (.6-.8), while correlations with other indices are relatively
lower. However, in light of the low correlations between sub-
tests, further research is needed to determine whether these 
index scores are measuring distinct cognitive constructs. 

The authors provide some evidence of convergent validity 
for some of the KBNA indices in small, clinically mixed sam- 
ples. Thus, correlations between the KBNA Attention/Concen- 
tration Index and various measures of attention on the WAIS-R 
(Digit Symbol, r = .64; Digit Span, r = .76) and WMS-III (Spa-
tial Span, r = .71) were high, with one exception (WMS-III
Mental Control, r = .24). With regard to the memory indices,
verbal memory (CVLT) scores showed a moderate to high de- 
gree of association with the relevant memory indices of the 
KBNA (.48-.77); however, correlations were low between the 
Rey-O Delayed Recall measure and the delayed measures of 






Table 6-48 Magnitude of Reliability Coefficients of KBNA Subtests and Indices 



Magnitude of Coefficient Internal Consistency 



Very high (.90+) 
High (.80-.89) 



Adequate (.70-.79)



Marginal (.60-.69) 



Low (<.59)



Word List 2— Recall 

Word Lists 1 

Word List 2 — Recognition 
Complex Figure 1 — Recall 
Complex Figure 2 — Recognition 
Attention/Concentration Index 
Memory — Immediate 

Recall Index 
Memory — Delayed 

Recall Index 
Spatial Processing Index 
Total Index 

Complex Figure 2 — Recall 
Complex Figure 1 — Copy/Clocks 
Spatial Location 
Practical Problem Solving/ 

Conceptual Shifting 
Memory — Delayed 

Recognition Index 
Verbal Fluency Index 
Reasoning/Conceptual 

Shifting Index 

Sequences 

Verbal Fluency — Phonemic 

Verbal Fluency — Semantic 



Test-Retest 



Sequences 

Attention/Concentration Index 
Total Index 



Word Lists 1 
Word Lists 2 — Recall 
Verbal Fluency — Semantic 
Memory-Immediate Recall Index 
Memory — Delayed Recall Index 
Memory — Delayed Recognition 
Index 



Complex Figure 1 — Recall 
Complex Figure 2 — Recognition 
Spatial Location 
Spatial Processing Index 

Word Lists 2 — Recognition 
Complex Figure 2 — Recall 
Complex Figure 1 — Copy/Clocks 
Practical Problem Solving/ 

Conceptual Shifting 
Verbal Fluency Index 
Reasoning/Conceptual Shifting 

Index 



the KBNA (-.03 to -.12). As might be expected, the KBNA 
spatial processing index correlated most strongly with the 
Rey-O copy score, and the KBNA Reasoning/Conceptual 
Shifting Index showed a strong relation to WAIS-R measures 
of crystallized (Vocabulary, r = .80) and fluid (Block Design,
r = .81) ability. While FAS verbal fluency correlated highly
with the KBNA Verbal Fluency-Phonemic score (r = .91), as-
sociations between KBNA Verbal Fluency and the Boston
Naming Test were moderate (r = .49).

It should also be noted that no information is provided in 
the manual regarding the construct validity of subtests/pro- 
cess scores not included in the indices. 

Clinical Studies 

The authors compared a mixed sample, consisting mostly of 
patients (size of sample not reported) with dementia or head 
injury, with a matched group of controls and found that the 



patients scored significantly below the nonimpaired individu- 
als on all indices except the Reasoning/Conceptual Shifting 
Index. Unfortunately, no information is provided regarding 
the severity of impairments in the clinical sample. Therefore, 
it is not clear if the KBNA is sensitive to mild forms of cogni- 
tive disturbance. 

COMMENT 

The KBNA is not intended as a brief screening tool. Rather, 
the goal of the authors was to design a measure that could 
identify domain-specific disorders while maintaining a reason- 
able administration time. Thus, it has the potential to provide 
considerable information regarding an individual's function- 
ing in an efficient manner. In addition to assessing a wide ar- 
ray of traditional areas (e.g., memory, attention, naming), it 
measures behaviors commonly overlooked by neuropsycholo- 
gists (e.g., praxis, emotion expression). Further, the battery 






Table 6-49 Intercorrelations Between Subtests Comprising
Indices for Ages 20-89

Index: Correlation between subtests
Attention/Concentration: .42
Memory — Immediate Recall: .20
Memory — Delayed Recall: .24
Memory — Delayed Recognition: .22
Verbal Fluency: .57



Note that the Spatial Processing and Reasoning/Conceptual Shifting Index scores are 
direct linear transformations of their respective contributing scaled scores. 



approach facilitates cross-subtest comparisons afforded by a 
common normative sample. The quantification of process 
scores (e.g., intrusions, repetitions, perseverations, semantic 
errors) is also an asset. 

However, the KBNA is a new tool, and some of its psycho- 
metric properties are less than optimal. Although age-based 
normative data are provided, users cannot evaluate test per- 
formance in the context of both age and education. Further, 
information on the impact of other demographic variables 
(e.g., sex, ethnicity/culture) is not reported. The lack of data 
on the impact of ethnicity suggests considerable caution in 
the use of the test with members of minority groups. 

Many subtests (except Word Lists 1 and Phonemic and Se- 
mantic Fluency) have truncated floors and ceilings. For exam- 
ple, an 80-year-old who recalls nothing of the complex figure 
following the delay obtains a scaled score of 5. Similarly, a 20- 
year-old who recognizes all of the items from the Complex 



Figure only achieves a scaled score of 12. Accordingly, scores 
from the various subtests must be interpreted with care. 

Information needs to be provided regarding interrater relia- 
bility. Another consideration is that some of the subtests have re- 
liabilities necessitating considerable caution in interpretation — 
particularly when the issue of change is of concern. Users are en- 
couraged to refer to tables in the test manual (Tables 3.7-3.12) 
to determine the confidence that can be had in the accuracy of a 
particular score. 

In addition, the meaning of many of the subtests is unclear 
and the evidence supporting interpretation of the indices is 
rather weak. How effective the test is at identifying and char- 
acterizing different disorders is also not known, and whether 
it is more sensitive to impairment than other tests (e.g., DRS, 
RBANS) remains to be determined. Its sensitivity to change/ 
progression of disorder and relation to functional capacity 
also needs study. 

The KBNA yields a variety of scores, including index 
scores, subtest scaled scores, process scores, and discrepancy 
scores. Given the large number of scores that are evaluated in 
this battery, the process of score conversion/recording is cum- 
bersome. A computerized scoring program would be benefi- 
cial. A scoring template for the Symbol Cancellation task 
would also be helpful. 



REFERENCE 

Leach, L., Kaplan, E., Rewilak, D., Richards, B., & Proulx, G.-B. (2000).
Kaplan Baycrest Neurocognitive Assessment. San Antonio, TX: The 
Psychological Corporation. 



Kaufman Brief Intelligence Test (K-BIT) 



PURPOSE 

The aim of this individually administered test is to provide a 
brief estimate of intelligence for screening and related purposes. 



SOURCE 

The kit (including Manual, easel, 25 Record Forms, and carry 
bag) can be ordered from the American Guidance Service, 
4201 Woodland Road, Circle Pines, MN 55014-1796 
(www.agsnet.com). A new version, the K-BIT-2, has been re- 
cently released but is not yet available to us. The cost is 
$199.99 US for the new version. 



AGE RANGE 

The test can be given to individuals aged 4 through 90 years. 

DESCRIPTION 

The test (Kaufman & Kaufman, 1990) is based on the mea- 
surement of both verbal and nonverbal abilities. It consists of 



two subtests, Vocabulary and Matrices, that are presented in 
an easel format. The Vocabulary subtest provides an estimated 
Verbal IQ, the Matrices subtest provides an estimated Nonver- 
bal IQ, and the scores from both measures provide a Compos- 
ite IQ. 

Subtest 1, Vocabulary, is an 82-item measure of verbal abil- 
ity that demands oral responses for all items. Part A, Expres- 
sive Vocabulary (45 items), administered to individuals of all 
ages, requires the person to provide the name of a pictured 
object such as a lamp or calendar. Part B, Definitions (37 
items), administered to individuals 8 years and older, requires 
the person to provide the word that best fits two clues (a 
phrase description and a partial spelling of the word). For ex- 
ample, a dark color: BR_W_. The Vocabulary subtest mea-
sures word knowledge and verbal concept formation and is 
thought to assess crystallized intelligence. 

Subtest 2, Matrices, is a 48-item nonverbal measure com- 
posed of several types of items involving visual stimuli, both 
meaningful (people and objects) and abstract (designs and 
symbols). All items require understanding of relations among 
stimuli, and all are multiple choice, requiring the patient 
either to point to the correct response or to say the letter 






corresponding to the position of the item. For the easiest 
items, the patient selects which one of five items goes best 
with a stimulus picture (e.g., a car goes with a truck). For the 
next set of items, the patient must choose which one of six or 
eight pictures completes a 2 × 2 or 3 × 3 matrix. Abstract ma-
trices were popularized by Raven (1956, 1960) as a method of 
assessing intelligence in a more "culture-fair" manner. The 
ability to solve visual analogies, especially those with abstract 
stimuli, is considered an excellent measure of fluid reasoning 
(i.e., the ability to be flexible when encountering novel 
problem-solving situations). 

ADMINISTRATION 

Directions for administering the items, the correct responses, 
and examples of typical responses that should be queried ap- 
pear on the easel facing the examiner. The only task that re- 
quires timing is Definitions. Clients are allowed 30 seconds to 
respond to each item. Starting points for each task are tailored 
to the patient's age. Items are organized in units, and the ex- 
aminer discontinues the task if the patient fails every item in 
one unit. The order of K-BIT tasks (Vocabulary, Matrices) is 
fixed but not inviolable. The inclusion of both verbal and 
nonverbal subtests in the K-BIT allows the examiner flexibil- 
ity when testing a patient with special needs (e.g., patients 
with aphasic disorders can be given only the Matrices sub- 
test). 

ADMINISTRATION TIME 

The test can be administered in about 15 to 30 minutes, de- 
pending in part on the age of the patient. 

SCORING 

The examiner records scores on each task and converts, by 
means of tables provided in the K-BIT Manual, raw scores to 
age-based standard scores (M= 100, SD= 15) for the sepa- 
rate subtests (Vocabulary and Matrices) and the total scale 
(the K-BIT IQ Composite). Space is also provided on the 
record form to record confidence intervals (a 90% confi- 
dence interval is recommended), percentile ranks, descrip- 
tive categories (e.g., average, below average, etc.), normal 
curve equivalents, and stanines, using tables provided in the 
K-BIT Manual. Examiners can also compare the patient's 
performance on the two K-BIT subtests to determine if the 
difference is significant and unusual by referring to tables in 
the manual. 

DEMOGRAPHIC EFFECTS 

Age 

Age affects performance. Raw scores on the Expressive Vocab- 
ulary task increase steadily from ages 4 to 15 years, begin to 
peak at about age 16, and maintain that same high level 
through age 74 before declining in the oldest age group (see 



Source). Mean raw scores on the Definitions task increase 
steadily from ages 6 to 44 years before declining gradually 
from ages 45 to 69 years; a large decrease is evident for ages 70 
and above (see Source). Average raw scores on the Matrices 
subtest increase steadily from ages 4 to 17, peak at ages 17 to 
19, and decline steadily across the rest of the adult age range 
(see Source). 

Education 

In clinical populations, K-BIT scores show a moderate rela- 
tion with educational attainment (r = about .40) (Hays et al.,
2002; Naugle et al., 1993).

Gender 

Item analysis suggests no consistent gender differences (Web- 
ber & McGillivray, 1998); that is, no single item appears easier 
for one gender. 

Ethnicity 

No information is available. 

NORMATIVE DATA 

Standardization Sample 

The K-BIT was normed on a nationwide standardization 
sample of 2022 people, ages 4 to 92 years (age 4-6: N = 327;
age 7-19: N = 1195; age 20-44: N = 320; age 45-92: N = 180),
stratified according to 1985 and 1990 U.S. Census data on 
four background variables: gender, geographic region, socioe- 
conomic status, and race or ethnic group (see Table 6-50). 
Educational attainment of subjects ages 20 to 90 and of the 
parents of subjects ages 4 to 19 was used to estimate socioeco- 
nomic status. 

RELIABILITY 

Internal Consistency 

Split-half reliability coefficients for Vocabulary are excellent, 
ranging from .89 to .98 (M=.92) depending upon the age 
range. Matrices split-half coefficients range from .74 to .95 
(M= .87). Matrices coefficients for very young children, ages 
4 to 6 years, are acceptable (M= .78) but improve with older 
age groups. The split-half reliability of the K-BIT IQ compos- 
ite is excellent, with values ranging from .88 to .98 (M= .93) 
(see Source; Webber & McGillivray, 1998). 

Standard Errors of Measurement 

The K-BIT IQ Composite and Vocabulary standard scores 
have an average standard error of measurement (SEM) of 
about four points across the entire age range, whereas the Ma- 
trices standard scores have an average SEM of about 5.5 






Table 6-50 Characteristics of the K-BIT Normative Sample

Number: 2022
Age: 4-90 years (a)
Geographic location: Proportional representation from northeast, north central, south, and west regions of the United States
Sample type: Stratified sample so that equal numbers of males and females were tested and the sample had the same proportional distribution as the U.S. population across dimensions of geographic region, SES, and race or ethnic group according to 1985 and 1990 U.S. Census data
Education: <4 to >16 years (b)
Gender: Male 50.4%; Female 49.6%
Race/ethnicity: Caucasian 72%; African American 14.8%; Hispanic 9.4%; Other 3.8%
Screening: Not reported

(a) The oldest person was actually 92 years old.
(b) For ages 4-19, parental education was used; ages 20-90 rely on the examinee's educational level.



points. A slightly higher standard error was found for the Ma- 
trices subtest (about 8.0) in Australian adolescents with an in- 
tellectual disability (Webber & McGillivray, 1998). 
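The SEM values quoted above can be turned into confidence intervals and subtest-difference criteria with standard psychometric arithmetic. The sketch below (Python) is only illustrative: it uses the conventional obtained-score ± z × SEM band and the standard error of a difference, with the approximate SEM figures from the text rather than the exact values tabled in the K-BIT Manual, which should be preferred in practice.

# Minimal sketch: 90% confidence interval around a K-BIT standard score
# and the smallest Vocabulary-Matrices difference exceeding chance, using
# the approximate SEMs quoted above (about 4 for Vocabulary/Composite,
# about 5.5 for Matrices). Illustrative only; prefer the manual's tables.

import math

Z_90 = 1.645  # multiplier for a two-sided 90% confidence interval

def confidence_interval(standard_score, sem, z=Z_90):
    """(lower, upper) band around an obtained standard score."""
    half_width = z * sem
    return standard_score - half_width, standard_score + half_width

def critical_difference(sem_a, sem_b, z=Z_90):
    """Difference needed between two scores before it exceeds chance,
    based on the standard error of the difference."""
    return z * math.sqrt(sem_a ** 2 + sem_b ** 2)

if __name__ == "__main__":
    print(confidence_interval(100, 4))            # roughly (93.4, 106.6)
    print(round(critical_difference(4, 5.5), 1))  # roughly 11.2 points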

Test-Retest Reliability and Practice Effects 

Test-retest reliability was evaluated by administering the K- 
BIT twice to 232 normal children and adults ages 5 to 89. The 
interval between tests ranged from 12 to 145 days, with a 
mean of 21 days. Reliability coefficients are high (>.90) for all 
age groups (see Source; see also Webber & McGillivray, 1998). 
Slight practice effects emerge following such short retest peri- 
ods. One can expect increases of about three standard score 
points on the K-BIT IQ Composite and about two to four 
standard score points on the Vocabulary and Matrices sub- 
tests on retest. These small increases caused by practice apply 
equally to all age groups. Thompson et al. (1997) reported 
similar findings. 
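One simple way to take the practice effects just cited into account when interpreting a short-interval retest is sketched below (Python). The 3-point adjustment is the approximate Composite gain reported above, and subtracting the expected gain from the observed change is a simplification rather than a formal reliable-change analysis.

# Minimal sketch: adjusting an observed K-BIT IQ Composite retest change
# for the approximately 3-point practice effect reported over short
# intervals. Illustrative only; not a substitute for a reliable change index.

def practice_adjusted_change(baseline, retest, expected_gain=3):
    """Observed change minus the expected practice-related gain."""
    return (retest - baseline) - expected_gain

if __name__ == "__main__":
    # A 4-point gain over a few weeks exceeds the expected practice effect
    # by only 1 point.
    print(practice_adjusted_change(98, 102))  # 1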

VALIDITY 

Age Differentiation 

As evidence of construct validity, a test that purports to mea- 
sure intelligence must demonstrate age differentiation. Con- 
sistent with expectation (Horn, 1985), average raw scores on 



the Expressive Vocabulary task increase steadily during child- 
hood, peak at about age 16, and maintain that same high level 
through age 74 before declining for the oldest age group. Raw 
scores on the Definitions task increase steadily to 44 years and 
then decline gradually. Average raw scores on the Matrices 
subtest increase up to 17 years and decline steadily with ad- 
vancing age (see Demographic Effects and Source). 

Correlations to IQ Tests 

K-BIT verbal and nonverbal IQ scores correlate moderately 
well (r = about .59) with one another (see Source). K-BIT IQ
scores also correlate well with other measures of intelligence
such as the Wechsler (WISC-R, WISC-III, WAIS-R, WASI),
the Stanford-Binet, the Kaufman Assessment Battery for 
Children, the Peabody Picture Vocabulary Test-3, the Ship- 
ley Institute for Living Scale, the Slosson, the Raven's Col- 
ored Progressive Matrices and the Test of Nonverbal 
Intelligence (see Source; Bowers & Pantle, 1998; Chin et al., 
2001; Donders, 1995; Eisenstein & Engelhart, 1997; Grados 
& Russo-Garcia, 1999; Lafavor & Brundage, 2000; Naugle 
et al., 1993; Powell et al., 2002; Prewett, 1992a, 1992b, 1995;
Prewett & McCaffery, 1993; Thompson et al., 1997; Webber
& McGillivray, 1998). Correlations ranging from .61 to .89 
have been reported between K-BIT Composite and Wechsler 
Full-Scale IQ scores. Correlations tend to be higher for the 
Vocabulary/VIQ and Composite/Full-Scale IQ indices than 
for Matrices and Performance IQ scores. For example, in a 
heterogeneous group of 200 patients referred for neuropsy- 
chological assessment, Naugle et al. (1993) reported that 
correlations between the Verbal, Nonverbal, and Composite 
scales of the two measures were .83, .77, and .88, respectively. 
K-BIT scores tend to be about five points higher than their 
Wechsler counterparts (Chin et al., 2001; Eisenstein & Engel- 
hart, 1997; Grados & Russo-Garcia, 1999; Hays et al., 2002; 
Naugle et al., 1993; Prewett, 1995; Thompson et al., 1997;
Webber & McGillivray, 1998). However, standard errors of 
estimation between corresponding indices tend to be large 
(6-12 points), and in a significant proportion of individuals 
(more than 20%), K-BIT scores under- or overestimate 
Wechsler scores by 10 points or more (Axelrod & Naugle, 
1998; Chin et al., 2001; Donders, 1995; Thompson et al.,
1997). For example, Naugle et al. (1993) reported that K-BIT 
composite scores ranged from 12 points lower to 22 points 
higher than WAIS-R FSIQ; 5% of the differences between 
tests exceeded 15 points, or one standard deviation. In short, 
in individual cases, K-BIT scores can differ markedly from 
their corresponding Wechsler scores. If the aim is to predict 
Wechsler IQ indices, then a more accurate estimate can be 
obtained by using a Wechsler short form (Axelrod & Naugle, 
1998; Thompson et al., 1997). Thus, Axelrod and Naugle 
(1998) noted that on a seven-subtest short form, 92% of 
the cases fell within 5% of their full WAIS-R score, while on 
the K-BIT only 50% fell within five points. Even two- and 
four-subtest short forms of the Wechsler test do better than the 
K-BIT in predicting Wechsler IQ scores (Axelrod & Naugle, 






1998; Eisenstein and Engelhart, 1997). K-BIT IQ composite 
scores are on average about five points lower than the mean 
Stanford-Binet Test Composite (Prewett & McCaffery, 1993). 

Like longer tests of intelligence (e.g., the Wechsler tests), 
the K-BIT includes measures of both verbal and nonverbal in- 
telligence. For this reason, the K-BIT has an advantage over al- 
ternative screening measures such as the Peabody Picture 
Vocabulary Test-3, the Test of Nonverbal Intelligence, or the 
Raven Progressive Matrices, which tap primarily one type of 
ability (Naugle et al., 1993). However, tests such as the WASI 
(with its four subtests: Vocabulary, Similarities, Block Design, 
Matrix Reasoning) show lower correlations among subtests 
and therefore may provide the clinician with more clinically 
meaningful information by tapping a broader array of cogni- 
tive functions than the K-BIT (Hays et al., 2002). Wechsler 
short forms may also be appropriate in some situations (see 
review in this volume). 

Although the verbal-nonverbal dichotomy invites the user 
to contrast the two, the discrepancy derived from the K-BIT 
tends to correlate only modestly (.23-.59) with that of the
WAIS-R (Naugle et al., 1993). Structural equation analysis of 
the K-BIT and the WAIS-R in a sample of neurologically im- 
paired adults revealed that the K-BIT Vocabulary subtest had 
a significant visual-spatial component (Burton et al., 1995). 
That is, the K-BIT appears to provide less of a differentiation 
between verbal and nonverbal intellectual functions than the 
WAIS-R. Given this substantial visual-spatial component on 
the Vocabulary subtest, the K-BIT Verbal IQ may give a spuri- 
ously low estimate of verbal intelligence in persons with 
visual-spatial difficulties and consequently may obscure per- 
formance discrepancies between K-BIT Verbal and Matrices 
IQs (Burton et al., 1995).

Correlations With Other Cognitive Tests 

Additional evidence of validity comes from using achieve- 
ment tests as the criteria. K-BIT Composite scores correlate 
moderately well (.3-.8) with measures of achievement, such
as the WRAT-R/3, the Kaufman Test of Educational Achieve- 
ment, and the K-FAST (see Source; Bowers & Pantle, 1998; 
Klimczak et al., 2000; Powell et al., 2002; Prewett & McCaffery, 
1993). Correlations with measures of memory (CVLT) appear 
high (.57-.68) (Powell et al., 2002).

Clinical Studies 

There is evidence that the test is sensitive to severe forms of 
central nervous system disturbance. Donovick et al. (1996) re- 
ported that neurosurgical patients in the acute stages of re- 
covery and psychiatric patients performed more poorly than 
healthy college students, individuals with closed head injuries, 
and children with learning disabilities. However, the test ap- 
pears to be relatively insensitive to deficits that may occur in 
those with mild or moderate head injury (Donovick et al., 
1996). Donders (1995) reported that in children with loss of 
consciousness greater than 30 minutes (mean length of coma 



about six days), severity of traumatic brain injury (as mea- 
sured by length of coma) was not related to K-BIT indices, 
whereas correlations were statistically significant for several 
WISC-III indices. The K-BIT does not emphasize speed of performance, which may explain why it showed little sensitivity to sequelae associated with traumatic brain injury.

COMMENT 

The K-BIT is a useful screening measure of verbal, nonverbal, 
and general intellectual ability when global estimates are suffi- 
cient (e.g., for research purposes or when a subtest profile is 
not needed), when the examiner wishes to avoid speeded 
tasks, when time constraints or functional abilities of the pa- 
tient preclude the use of a longer measure, or when the pa- 
tient is familiar with standard tests such as the Wechsler. It is 
also relatively simple to administer, does not require manual 
or rapid responses, and covers a wide age range. Psychometric 
properties (reliability, validity) are very good, and the test was 
normed as a brief test, not as an extrapolation from a compre- 
hensive test — a problem with many of the Wechsler short 
forms (Kaufman & Kaufman, 2001). 

K-BIT scores, however, should be considered tentatively 
when making clinical decisions, particularly with regard to 
the presence of impairment and differences between verbal 
and nonverbal performance. That is, when clinicians need to 
characterize a person's IQ for possible diagnosis or placement, 
or will be making inferences about the person's ability profile, 
then test results need to be supported by a more comprehen- 
sive assessment (Kaufman & Kaufman, 2001). Despite the 
finding that K-BIT scores are highly correlated with IQ scores 
from other standardized tests, K-BIT scores can differ 
markedly from their counterparts. Thus, the K-BIT and 
Wechsler test are not interchangeable. If the goal is to predict 
Wechsler IQ scores, then Wechsler short forms are the tools of 
choice. Further, the limited response alternatives (e.g., no 
manual or rapid responses) that make the K-BIT easy to ad- 
minister and score preclude an assessment of the diversity of 
behavior often required in a clinical setting (Naugle et al., 
1993). Tests such as the WASI may tap more diverse functions 
and provide more clinically useful information; however, this 
issue requires further study. It is also important to bear in 
mind that the K-BIT does require reading and spelling. The 
K-BIT-2 does not require these skills and therefore may be 
more appropriate for individuals with written language dis- 
orders. 

Users should also bear in mind that the norms for the K- 
BIT were collected about 20 years ago and some inflation in 
scores is expected due to the Flynn effect. Therefore, use of the 
newer version (K-BIT-2) is recommended. 

Finally, it should be noted that the K-BIT does not allow 
the examiner to make meaningful discriminations among in- 
dividuals with very low levels of functioning (Powell et al., 
2002). For example, individuals aged 25 to 34 years who obtain 
raw scores of 25 or below on the Vocabulary subtest receive 



168 General Cognitive Functioning, Neuropsychological Batteries, and Assessment of Premorbid Intelligence 



a standard score of 40. In patients with severe deficits, tests that 
have a lower floor (e.g., the Stanford-Binet) may be preferred. 



REFERENCES 

Axelrod, B. N., & Naugle, R. I. (1998). Evaluation of two brief and reliable estimates of the WAIS-R. International Journal of Neuroscience, 94, 85-91.
Bowers, T. L., & Pantle, M. L. (1998). Shipley Institute for Living Scale and the Kaufman Brief Intelligence Test as screening instruments for intelligence. Assessment, 5, 187-195.
Burton, D. B., Naugle, R. I., & Schuster, J. M. (1995). A structural equation analysis of the Kaufman Brief Intelligence Test and the Wechsler Intelligence Scale — Revised. Psychological Assessment, 7, 538-540.
Chin, C. E., Ledesma, H. M., & Cirino, P. T. (2001). Relation between Kaufman Brief Intelligence Test and WISC-III scores of children with RD. Journal of Learning Disabilities, 34, 2-8.
Donders, J. (1995). Validity of the Kaufman Brief Intelligence Test (K-BIT) in children with traumatic brain injury. Assessment, 2, 219-224.
Donovick, P. J., Burright, R. G., Burg, J. S., Gronendyke, S. J., Klimczak, N., Mathews, A., & Sardo, J. (1996). The K-BIT: A screen for IQ in six diverse populations. Journal of Clinical Psychology in Medical Settings, 3, 131-139.
Eisenstein, N., & Engelhart, C. I. (1997). Comparison of the K-BIT with short forms of the WAIS-R in a neuropsychological population. Psychological Assessment, 9, 57-62.
Grados, J. J., & Russo-Garcia, K. A. (1999). Comparison of the Kaufman Brief Intelligence Test and the Wechsler Intelligence Scale for Children — Third Edition in economically disadvantaged African American youth. Journal of Clinical Psychology, 59, 1063-1071.
Hays, J. R., Reas, D. L., & Shaw, J. B. (2002). Concurrent validity of the Wechsler Abbreviated Scale of Intelligence and the Kaufman Brief Intelligence Test among psychiatric patients. Psychological Reports, 90, 355-359.
Horn, J. L. (1985). Remodeling old models of intelligence. In B. B. Wolman (Ed.), Handbook of intelligence. New York: Wiley.
Kaufman, A. S., & Kaufman, N. L. (1990). Kaufman Brief Intelligence Test manual. Circle Pines, MN: American Guidance Service.
Kaufman, J. C., & Kaufman, A. S. (2001). Time for changing of the guard: A farewell to short forms of intelligence. Journal of Psychoeducational Assessment, 19, 245-267.
Klimczak, N. C., Bradford, K. A., Burright, R. G., & Donovick, P. J. (2000). K-FAST and WRAT-3: Are they really different? The Clinical Neuropsychologist, 14, 135-138.
Lafavor, J. M., & Brundage, S. B. (2000). Correlation among demographic estimates of intellectual abilities, performance IQ scores, and verbal IQ scores in non-brain-damaged and aphasic adults. Aphasiology, 14, 1091-1103.
Naugle, R. I., Chelune, G. J., & Tucker, G. D. (1993). Validity of the Kaufman Brief Intelligence Test. Psychological Assessment, 5, 182-186.
Powell, S., Plamondon, R., & Retzlaff, P. (2002). Screening cognitive abilities in adults with developmental disabilities: Correlations of the K-BIT, PPVT-3, WRAT-3, and CVLT. Journal of Developmental and Physical Disabilities, 14, 239-246.
Prewett, P. N. (1992a). The relationship between the K-BIT and the Wechsler Intelligence Scale for Children — Revised (WISC-R). Psychology in the Schools, 29, 25-27.
Prewett, P. N. (1992b). The relationship between the Kaufman Brief Intelligence Test (K-BIT) and the WISC-R with incarcerated juvenile delinquents. Educational and Psychological Measurement, 52, 977-982.
Prewett, P. N. (1995). A comparison of two screening tests (the Matrix Analogies Test — short form and the Kaufman Brief Intelligence Test) with the WISC-III. Psychological Assessment, 7, 69-72.
Prewett, P. N., & McCaffery, L. K. (1993). A comparison of the Kaufman Brief Intelligence Test (K-BIT) with the Stanford-Binet, a two subtest short form, and the Kaufman Test of Educational Achievement (K-TEA brief form). Psychology in the Schools, 30, 299-304.
Raven, J. C. (1956). Guide to using the Coloured Progressive Matrices (rev. ed.). London: H. K. Lewis.
Raven, J. C. (1960). Guide to using the Standard Progressive Matrices (rev. ed.). London: H. K. Lewis.
Thompson, A., Browne, J., Schmidt, E., & Boer, M. (1997). Validity of the Kaufman Brief Intelligence Test and a four-subtest WISC-III short form with adolescent offenders. Assessment, 4, 385-394.
Webber, L. S., & McGillivray, J. A. (1998). An Australian validation of the Kaufman Brief Intelligence Test (K-BIT) with adolescents with an intellectual disability. Australian Psychologist, 33, 234-237.



Mini-Mental State Examination (MMSE) 



PURPOSE 

The purpose of this test is to screen for mental impairment, 
particularly in the elderly. 

SOURCE 

The test (MMSE User's Guide and 50 Test Forms) can be ob- 
tained from Psychological Assessment Resources, Inc., P.O. 
Box 998, Odessa, FL (www.parinc.com), at a cost of $63 US. 

AGE RANGE 

The test can be given to individuals aged 18 to 85+, although 
some limited data are available for children aged 10 and older 
for the MMSE and aged 4 and older for the 3MS. 



DESCRIPTION 

The MMSE is a popular measure to screen for cognitive im- 
pairment, to track cognitive changes that occur with time, and 
to assess the effects of potential therapeutic agents on cogni- 
tive functioning. It is attractive because it is brief, easily ad- 
ministered, and easily scored. 

Many of the items were used routinely by neurologists to 
screen mental ability informally. The items were formalized 
by Folstein et al. (1975) to distinguish neurological from psy- 
chiatric patients. The items were designed to assess orienta- 
tion to time and place, attention and calculation (serial 7s, 
spell "world" backward), language (naming, repetition, com- 
prehension, reading, writing, copying), and immediate and 
delayed recall (three words; see Figure 6-6). Over 100 transla- 
tions of the MMSE have also been developed, although most have not been extensively validated (Auer et al., 2000). The interested reader can contact PAR (MMSE Permissions) for one of the available translations.

Figure 6-6 Mini-Mental State Examination. Note that the choice of words used to test a person's ability to learn and retain three words was left originally to the discretion of the examiner. Most studies, however, have adopted the words apple, penny, and table. For the purpose of testing Canadian patients, the orientation item is modified by replacing state and county with country and province (Lamarre & Patten, 1991). Source: Reprinted from Tombaugh & McIntyre, 1992. Copyright Blackwell Publishing.

Item (maximum points)
1. What is the: Year? Season? Date? Day? Month? (5)
2. Where are we: State (Country)? County (Province)? Town or City? Hospital (Place)? Floor (Street)? (5)
3. Name three objects (apple, penny, table), taking one second to say each. Then ask the patient to tell you the three words. Repeat the answers until the patient learns all three, up to six trials. The score is based on the first trial. (3)
4. Serial 7s: Subtract 7 from 100. Then subtract 7 from that number, etc. Stop after five subtractions (93, 86, 79, 72, 65). Score the total number of correct answers. Alternate: Spell "world" backwards. The score is the number of letters in correct order (e.g., dlrow = 5, dlorw = 3). (5)
5. Ask for the names of the three objects learned in #3. (3)
6. Point to a pencil and watch. Have the patient name them as you point. (2)
7. Have the patient repeat "No ifs, ands, or buts" (only one trial). (1)
8. Have the patient follow a three-stage command: "Take the paper in your right hand. Fold the paper in half. Put the paper on the floor." (3)
9. Have the patient read and obey the following: "Close your eyes." (Write in large letters.) (1)
10. Have the patient write a sentence of his or her own choice. (1)
11. Have the patient copy the following design (overlapping pentagons). (1)

An expanded version of the MMSE, the Modified Mini- 
Mental State Examination (3MS), has been developed (Teng & 
Chui, 1987) to increase the test's sensitivity. Four additional 
questions that assess temporal and spatial orientation, the abil- 
ity to see relations between objects, and verbal fluency (i.e., 
date and place of birth, word fluency, similarities, and delayed 
recall of words) were added (Figure 6-7). In addition, the 3MS 
includes items that assess different aspects of memory includ- 
ing cued recall, recognition memory, delayed free and cued re- 
call, and delayed recognition memory. The maximum score 
was increased from 30 to 100 points, and a modified scoring 
procedure was introduced that allows partial credit for some 
items. One of the advantages of the 3MS is that both a 3MS and 
an MMSE score can be derived from a single administration. 



The 3MS has been studied in a pediatric sample (Besson & 
Labbe, 1997) and the MMSE has been adapted for use in pedi- 
atric settings (Ouvrier et al., 1993). The items are shown in 
Figure 6-8. 



ADMINISTRATION 

The examiner asks questions and records responses. Ques- 
tions are asked in the order listed (see Figures 6-6 through 
6-8) and are scored immediately. In addition, the following 
suggestions are offered: 

1. The version for adults should not be given unless the 
person has at least an eighth-grade education level and 
is fluent in English (Tombaugh & Mclntyre, 1992). 

2. A written version of the test may be preferable for 
hearing-impaired individuals (Uhlmann et al., 1989). 



Figure 6-7 Modified Mini-Mental State (3MS) Examination. Source: Reprinted from Teng & Chui, 1987. Copyright 1987 by the Physicians Postgraduate Press. Reprinted with permission.

Identifying information: Name, Education, Date, Gender, Age, Administered by. Total Score: /100

Date & Place of Birth (total /5)
"What is your date of birth?" Year (1), Month (1), Day (1). "What is your place of birth?" Town/City (1), Province/State (1).

Registration (total /3)
"I shall say three words for you to remember. Repeat them after I have said all 3 words": SHIRT, BROWN, HONESTY (or: SHOES, BLACK, MODESTY; or: SOCKS, BLUE, CHARITY). "Remember what they are because I am going to ask you to name them again in a few minutes." Accurate repetition, 1 point each; record the number of trials needed to repeat all three words (no score).

Mental Reversal (total /7)
"Can you count from 1 to 10? Like this, 1, 2, 3, all the way to 10. Go." If correct say: "Now, can you count backwards from 5? Go." Accurate (2); 1 or 2 misses (1). "Now I am going to spell a word forwards and I want you to spell it backwards. The word is 'world,' W-O-R-L-D. Spell 'world' backwards." (Print letters: D L R O W.) One point for each correctly placed letter (maximum 5).

First Recall (total /9)
"What are the 3 words that I asked you to remember?" (If not recalled, provide a category prompt; if still not recalled, ask them to choose one of the three multiple-choice options; if the correct answer is not given, score 0 and provide the correct answer.) Spontaneous recall (3 points each): shirt, brown, honesty. Category prompt (2 points each): something to wear; a color; a good personal quality. Multiple choice (1 point each): shoes, shirt, socks; blue, black, brown; honesty, charity, modesty.

Temporal Orientation (total /15)
"What is the year?" Accurate (8); miss by 1 year (4); miss by 2-5 years (2). "What is the season?" Accurate or within 1 month (1). "What is the month?" Accurate within 5 days (2); miss by 1 month (1). "What is the date?" Accurate (3); miss by 1-2 days (2); miss by 3-5 days (1). "What is the day of the week?" Accurate (1).

Spatial Orientation (total /5)
"Can you tell me where we are right now? For instance, what state/province are we in?" (2). "What city are we in?" (1). "Are we in a hospital or office building or home?" (1). "What is the name of this place?" (1). "What floor of the building are we on?" (no score).

Naming (total /5)
"What is this called?" (show wristwatch; no score). "What is this called?" (show pencil; no score). "What is this called?" (point to a part of your own body; score if subject cannot readily name): Shoulder (1), Chin (1), Forehead (1), Elbow (1), Knuckle (1).

Four-Legged Animals (total /10)
"What animals have four legs?" Allow 30 seconds (record responses). If no response after 10 seconds, repeat the question once. Prompt after only the first incorrect answer by saying: "I want four-legged animals." One point per animal, maximum 10.

Similarities (total /6)
"In what way are an arm and a leg alike?" Both are body parts, limbs, etc. (2); less accurate answer (0 or 1). If incorrect, this time only, prompt with: "An arm and leg are both limbs or parts of the body." "In what way are laughing and crying alike?" Both are feelings, emotions (2); less accurate answer (0 or 1). "In what way are eating and sleeping alike?" Both are essential for life (2); less accurate answer (0 or 1).

Repetition (total /5)
"Repeat what I say: 'I would like to go home/out.'" Accurate (2); 1-2 missed/wrong words (0 or 1). "Now repeat: No ifs, ands, or buts." Accurate (3); otherwise one point per correct word (1 or 2); no credit if the "s" is left off a word.

Read & Obey (total /3)
Hold up a piece of paper on which the command "Close your eyes" is printed. "Please do this." (Wait 5 seconds; if no response, provide the next prompt.) "Read and do what this says." (If the subject has already said or only reads the sentence, provide the next prompt.) "Do what this says." Obeys without prompting (3); obeys after prompting (2); reads aloud only, no eye closure (0 or 1).

Writing (total /5)
Provide the subject with a pencil with eraser and say: "Please write this down: 'I would like to go home/out.'" Repeat the sentence word by word, if necessary. Allow one minute. Do not penalize self-corrections. One point for each word, except "I."

Copying Two Pentagons (total /10)
"Here is a drawing. Please copy the drawing on the same paper." Allow one minute; if the subject needs longer, document how much of the design was completed in one minute and allow him or her to finish. Each pentagon: 5 approximately equal sides (4); 5 unequal sides (>2:1) (3); other enclosed figure (2); 2 or more lines (1); less than 2 lines (0). Intersection: 4 corners (2); not-4-corner enclosure (1); no enclosure (0).

Three-Stage Command (total /3)
Hold up a sheet of plain paper and say: "Take this paper with your left hand, fold it in half, and hand it back to me." (Note: use the right hand for left-handed patients; do not repeat any part of the command or give visual cues to return the paper, such as keeping a hand in a ready-to-receive posture. One point for each step.)

Second Recall (total /9)
"What are the 3 words that I asked you to remember?" (If not recalled, provide a category prompt; if still not recalled, ask them to choose one of the three multiple-choice options; if the correct answer is not given, score 0.) Spontaneous recall (3 points each): shirt, brown, honesty. Category prompt (2 points each): something to wear; a color; a good personal quality. Multiple choice (1 point each): shoes, shirt, socks; blue, black, brown; honesty, charity, modesty.

Total Score: /100



Figure 6-8 MMSE for children. Source: From Ouvrier et al., 1993. Reprinted with permission of BC Decker.

Orientation
1. What is the: Year? Season? Date? Day? Month?
2. Where are we? Country, State or territory, Town or city, Hospital or suburb, Floor or address

Registration
3. Name three objects, taking one second to say each. Then ask the patient all three after you have said them twice (tree, clock, boat). Give one point for each correct answer. Repeat the items until the patient learns all three.

Attention and Calculation
4. Serial 7s. Give one point for each correct answer. Stop after five answers.
5. Spell "world" backwards. The practice word "cat" can be used for younger children or children suspected of significant intellectual deficits. Older children are first asked to spell "world" forward and then backward.

Recall
6. Ask for the names of the three objects learned in Q3. Give one point for each correct answer.

Language
7. Point to a pencil and a watch. Have the patient name them as you point.
8. Have the patient repeat "No ifs, ands, or buts." Say the sentence twice before asking the patient to repeat.
9. Have the patient follow a three-stage command: "Take a piece of paper in your right hand. Fold the paper in half. Put the paper on the floor."
10. Have the patient read and obey the following: "Close your eyes." (Write in large letters.)
11. Have the patient write a sentence of his or her choice. (The sentence should contain a subject and an object, and should make sense. Ignore spelling errors when scoring.)
12. Have the patient copy the design printed below. (Give one point if all sides and angles are preserved and if the intersecting sides form a diamond shape.)

Maximum total score: 35






3. Serial 7s and "world" should not be considered equivalent items. Tombaugh and McIntyre (1992) recommend that both items should be given and the higher of the two used, although Espino et al. (2004) recommend giving serial 7s alone to increase reliability and discriminative ability. Folstein et al. (2001) indicate that "world" should only be given if the examinee refuses to perform the serial 7s task. Further, "world" should be spelled forward (and corrected) prior to spelling it backward (Folstein et al., 2001; Tombaugh & McIntyre, 1992).

4. Item 2 of the MMSE (i.e., "where are we" orientation-to-place questions) should be modified. The name of the county (province) where the person lives should be asked rather than the name of the county where the test is given. The name of the street where the individual lives should be asked rather than the name of the floor where the testing takes place (Tombaugh & McIntyre, 1992).

5. The words apple, penny, and table should be used for registration and recall. If necessary, the words may be administered up to three times to obtain perfect registration, but the score is based on the first trial (Tombaugh & McIntyre, 1992).

6. Alternative word sets (e.g., pony, quarter, orange) can be substituted (Folstein et al., 2001; Teng & Chui, 1987) when retesting an examinee. Note, however, that the equivalence of the various sets has not been established.

7. If any physical limitations prevent the individual from 
using the right hand on the Comprehension task or for 
placing the paper on the floor, it is acceptable to 
instruct the individual to use his or her left hand or to 
place the paper on a table. 



ADMINISTRATION TIME 

The task can be administered in 5 to 10 minutes. 



SCORING 

The score is the total number of correct answers. The maxi- 
mum score for the MMSE is 30 points for the adult version 
and 35 points for the children's version. See Folstein et al. 
(2001) for detailed scoring criteria. For the 3MS, the maxi- 
mum score is 100. Failures to respond should be scored as er- 
rors (Fillenbaum, Hughes, et al., 1988). 

Scoring criteria are fairly explicit for the 3MS. However, 
the following is recommended: 

1. Registration/recall: Accept only the correct suffix. For
example, do not accept "shoe" for "shoes" or "honest" 
for "honesty." 

2. Repetition: Give no credit if the "s" is omitted. 

3. Writing: Assign one point for each correct word, but give 
no credit for "I." Do not penalize self-corrected errors. 
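
Two of the scoring rules above are easy to get wrong in practice: the "world"-backwards item (scored as the number of letters in correct order, so that dlrow = 5 and dlorw = 3; see Figure 6-6) and serial 7s (one point per correct subtraction, stopping after five). The short Python sketch below is one plausible reading of those rules, offered only as an illustration; it is not the published scoring criteria (see Folstein et al., 2001), and the function names are illustrative.

# Illustrative sketch only: scores "world" backwards by counting letters in the
# correct serial position relative to "dlrow" (this reproduces the dlrow = 5 and
# dlorw = 3 examples in Figure 6-6); how omitted letters should be handled is an
# assumption, so defer to the official criteria for real scoring.
def score_world_backwards(response: str) -> int:
    target = "dlrow"
    response = response.lower().strip()
    return sum(1 for expected, given in zip(target, response) if expected == given)

# Serial 7s: one point for each of the five expected responses (93, 86, 79, 72, 65).
def score_serial_sevens(responses) -> int:
    expected = [93, 86, 79, 72, 65]
    return sum(1 for e, r in zip(expected, responses) if e == r)

print(score_world_backwards("dlrow"))              # 5
print(score_world_backwards("dlorw"))              # 3
print(score_serial_sevens([93, 86, 79, 72, 65]))   # 5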



DEMOGRAPHIC EFFECTS 



Age 



MMSE/3MS scores increase with age in children (Ouvrier et al., 1993) and decrease with advancing age in adults (Anstey et al., 2000; Bleecker et al., 1988; Bravo & Hebert, 1997a; Brown et al., 2003; Crum et al., 1993; Dufouil et al., 2000; Freidl et al., 1996; Jorm et al., 1988; O'Connell et al., 2004; O'Connor et al., 1989; Olin & Zelinski, 1991; Starr et al., 1992; Tombaugh & McIntyre, 1992; Tombaugh et al., 1996). In children, scores for both the MMSE and 3MS reach a plateau at about 9 or 10 years of age (Besson & Labbe, 1997; Ouvrier et al., 1993). Most of the age-related change in adults begins at about age 55 to 60 and then accelerates dramatically at age 75 to 80. These age effects persist even when individuals are stratified by educational level.

IQ/Education 

MMSE/3MS scores are related to premorbid intelligence and educational attainment: Individuals with higher premorbid ability and/or more education tend to score higher than those with lower IQs and/or fewer years of schooling (Anthony et al., 1982; Anstey et al., 2000; Bravo & Hebert, 1997a; Brown et al., 2003; Christensen & Jorm, 1992; Crum et al., 1993; Dufouil et al., 2000; Fountoulakis et al., 2000; Freidl et al., 1996; Ishizaki et al., 1998; Jorm et al., 1988; Marcopulos et al., 1997; O'Connell et al., 2004; O'Connor et al., 1989; Olin & Zelinski, 1991; Ouvrier et al., 1993; Starr et al., 1992; Taussig et al., 1996; Tombaugh et al., 1996; Van Der Cammen et al., 1992). IQ has a stronger relationship to MMSE scores than education (Bieliauskas et al., 2000). There is evidence that low educational or intelligence levels increase the likelihood of misclassifying normal people as cognitively impaired, while higher ability and educational levels may mask mild impairment. Education and premorbid ability, however, may also reflect etiological factors (e.g., hypertension, obesity) critical in the process that eventually results in some form of dementia (e.g., ischemic vascular dementia). In short, education may represent a psychometric bias and/or a risk factor (for further discussion, see Crum et al., 1993; Jorm et al., 1988; Tombaugh & McIntyre, 1992).



Gender 

Gender has little impact on the total score (e.g., Anstey et al., 2000; Besson & Labbe, 1997; Bleecker et al., 1988; Bravo & Hebert, 1997a; O'Connor et al., 1989; but see Brown et al., 2003). Some authors (Bleecker et al., 1988; Jones & Gallo, 2002; O'Connor et al., 1989) have reported that gender influences performance on some items. For example, Jones and Gallo (2002) reported that women are more likely to err on serial subtractions and men on spelling and other language tasks; however, the magnitude of the effect tends to be quite small.






Race/Ethnicity/Language 

There is some evidence that MMSE scores are affected by race/ethnicity and social class (e.g., Espino et al., 2004; Mulgrew et al., 1999; Taussig et al., 1996). MMSE scores tend to be lower in individuals of nonwhite ethnicity (Espino et al., 2001; Espino et al., 2004; Shadlen et al., 1999; but see Ford et al., 1996; Marcopulos et al., 1997; Marcopulos & McLain, 2003) and lower social class. Ethnic differences, at least in the case of Mexican Americans, appear related to educational differences and location of residence (neighbourhood), with barrio residents scoring considerably lower than Mexican Americans living in transitional neighbourhoods and the suburbs (Espino et al., 2001). Espino and colleagues (2001) have speculated that the regional differences reflect cultural and/or social factors (e.g., differences in familiarity with the kinds of skills measured by the MMSE, daily stress, assimilation).

Some of the items appear to be biased with respect to 
race/ethnicity and education (Jones & Gallo, 2002; Mulgrew 
et al., 1999; Teresi et al., 2001). For example, the sentence pro- 
duction item appears easier for Caucasians than for African 
Americans. The items "close your eyes" and serial 7s are also 
problematic, having different results for various ethnic and 
education groups. 

There is also some evidence that language of testing may 
impact performance. For example, Bravo and Hebert (1997a) 
noted that English-speaking older adults performed slightly 
better than their French-speaking counterparts, although the 
effect was very small (less than one point for the MMSE and 
about two points for the 3MS). 



Other 

Health status (e.g., history of heart disease) impacts performance (Anstey et al., 2000). There is also evidence that impending mortality lowers performance. The mortality-related effects appear most pronounced within three years before death; however, the magnitude of the effect is relatively small (Tan, 2004).



NORMATIVE DATA-ADULTS 

MMSE 

Extensive norms by age (18 to approximately 85 years) and education (no formal schooling to one or more college degrees) have been reported (Crum et al., 1993), based on probability sampling of more than 18,000 community-dwelling adults. The sample includes individuals, regardless of their physical or mental health status, from five metropolitan areas: New Haven, Baltimore, Durham, St. Louis, and Los Angeles. The data are presented in Table 6-51. Iverson (1998) derived age- and education-corrected cutoff scores for individuals aged 60 and above in this dataset. The cutoffs, also shown in Table 6-51, are greater than 1.64 standard deviations below the sample mean (if normally distributed, 90% of all scores should fall within ±1.64 z score units from the mean). Note that the MMSE scores were based on either the response to serial 7s or spelling "world" backwards, whichever yielded the higher score. Also note the wider range of scores in the lowest educational groups and at the oldest ages. MMSE scores ranged from a median of 29 for those aged 18 to 24 years to 25 for individuals aged 80 years and older. The median MMSE score was 29 for individuals with at least 9 years of schooling, 26 for those with 5 to 8 years of schooling, and 22 for those with 0 to 4 years of schooling.
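
Iverson's (1998) abnormal cutoffs in Table 6-51 follow directly from the cell means and standard deviations: a score is flagged when it falls more than 1.64 standard deviations below the mean for that age and education group. A minimal sketch of that arithmetic in Python (the function name is illustrative; the example values are the 0 to 4 years of education, age 60-64 cell of Table 6-51):

def abnormal_cutoff(mean: float, sd: float, z: float = 1.64) -> float:
    # Scores below (mean - z * sd) fall more than z standard deviations below the
    # mean; with z = 1.64, roughly 5% of a normal distribution lies below this point.
    return mean - z * sd

# Example: 0-4 years of education, age 60-64 (mean = 23, SD = 1.9).
print(round(abnormal_cutoff(23, 1.9), 2))   # 19.88, consistent with the tabled cutoff of 19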

Similar data have been reported for rural U.S. elderly (ages 55+) with low education (Marcopulos & McLain, 2003), older adults (aged 62-95) living in retirement villages and institutions in Australia (Anstey et al., 2000), and participants (ages 65+) drawn from various geographical regions in Canada, who were classified as cognitively intact on the basis of an extensive battery of neuropsychological and medical tests (Bravo & Hebert, 1997a; Tombaugh et al., 1996). About 95% of nondemented older adults score over 23 on the MMSE (Bravo & Hebert, 1997a; Meiran et al., 1996).

3MS 

Bravo and Hebert (1997a) provide data based on 7,754 adults, 
aged 65+, randomly chosen to take part in the Canadian 
Study of Health and Aging (CSHA). Individuals classed as 
cognitively impaired or demented following a clinical and 
neuropsychological examination were excluded. The reference 
values, stratified by age and education, are shown in Table 6-52. 
About 95% of the sample obtained a score over 76. 

Other smaller normative sets have also been provided. 
Thus, Tombaugh et al. (1996) report percentile scores derived 
from a select subsample of the CSHA, judged to be cognitively 
intact on the basis of an extensive clinical examination. The 
normative data were stratified across two age groups (65-79 
and 80-89) and two educational levels (0-8 and 9+ years). 
Jones et al. (2002) present normative data on the 3MS for a 
sample of 393 U.S. community-dwelling, primarily Caucasian, 
older adults. Their sample of individuals aged 80+ or with less than 12 years of education is relatively small (N = 44).

Brown et al. (2003) have recently reported normative data 
based on a sample of 238 African American, community- 
dwelling older adults, aged 60 to 84. Tables 6-53 and 6-54 
provide these data along with the score adjustments for edu- 
cation and gender. Brown et al. (2003) caution that the qual- 
ity of education varies greatly within the population of 
African Americans and that matching on years of education 
does not necessarily mean that the quality of education is 
comparable. 
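
Brown et al. (2003) report the percentiles (Table 6-53) and the education and gender adjustments (Table 6-54) separately, and the tables themselves do not spell out the arithmetic for combining them. The sketch below assumes the usual convention for demographic corrections, namely adding the tabled adjustment to the obtained raw score before the age-based percentile is consulted; that assumption, the function name, and the worked case should be checked against the original source before clinical use.

def adjust_3ms_raw_score(raw_score: int, age: int, education_years: int, sex: str) -> int:
    # Adjustments from Table 6-54 (elderly African American sample, Brown et al., 2003),
    # keyed by (age band, education band, sex). Ages outside 60-84 are not covered.
    adjustments = {
        ("60-71", "<12", "male"): 4,   ("60-71", "<12", "female"): 0,
        ("60-71", "12", "male"): 0,    ("60-71", "12", "female"): -2,
        ("60-71", ">12", "male"): -4,  ("60-71", ">12", "female"): -7,
        ("72-84", "<12", "male"): 7,   ("72-84", "<12", "female"): -3,
        ("72-84", "12", "male"): -1,   ("72-84", "12", "female"): -11,
        ("72-84", ">12", "male"): -13, ("72-84", ">12", "female"): -12,
    }
    age_band = "60-71" if age <= 71 else "72-84"
    if education_years < 12:
        educ_band = "<12"
    elif education_years == 12:
        educ_band = "12"
    else:
        educ_band = ">12"
    return raw_score + adjustments[(age_band, educ_band, sex)]

# Hypothetical example: a 75-year-old woman with 10 years of education and a raw 3MS of 85.
print(adjust_3ms_raw_score(85, 75, 10, "female"))   # 82, which would then be taken to Table 6-53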

NORMATIVE DATA-CHILDREN

MMSE 

For the children's version of the MMSE, Ouvrier et al. (1993) 
tested a heterogeneous sample of 117 children who attended 



Table 6-51 Mini-Mental State Examination Score by Age and Education Level, Number of Participants, Mean, SD, and Selected Percentiles

Ages 18-59 (years):

Educational Level        Statistic         18-24   25-29   30-34   35-39   40-44   45-49   50-54   55-59
0-4 years                N                   17      23      41      33      36      28      34      49
                         Mean                22      25      25      23      23      23      23      22
                         SD                  2.9     2.0     2.4     2.5     2.6     3.7     2.6     2.7
                         Lower quartile      21      23      23      20      20      20      20      20
                         Median              23      25      26      24      23      23      22      22
                         Upper quartile      25      27      28      27      27      26      25      26
5-8 years                N                   94      83      74     101     100     121     154     208
                         Mean                27      27      26      26      27      26      27      26
                         SD                  2.7     2.5     1.8     2.8     1.8     2.5     2.4     2.9
                         Lower quartile      24      25      24      23      25      24      25      25
                         Median              28      27      26      27      27      27      27      27
                         Upper quartile      29      29      28      29      29      29      29      29
9-12 years or high       N                 1326     958     822     668     489     423     462     525
school diploma           Mean                29      29      29      28      28      28      28      28
                         SD                  2.2     1.3     1.3     1.8     1.9     2.4     2.2     2.2
                         Lower quartile      28      28      28      28      28      27      27      27
                         Median              29      29      29      29      29      29      29      29
                         Upper quartile      30      30      30      30      30      30      30      30
College experience       N                  783    1012     989     641     354     259     220     231
or higher degree         Mean                29      29      29      29      29      29      29      29
                         SD                  1.3     0.9     1.0     1.0     1.7     1.6     1.9     1.5
                         Lower quartile      29      29      29      29      29      29      28      28
                         Median              30      30      30      30      30      30      30      29
                         Upper quartile      30      30      30      30      30      30      30      30
Total                    N                 2220    2076    1926    1443     979     831     870    1013
                         Mean                29      29      29      29      28      28      28      28
                         SD                  2.0     1.3     1.3     1.8     2.0     2.5     2.4     2.5
                         Lower quartile      28      28      28      28      27      27      27      26
                         Median              29      29      29      29      29      29      29      29
                         Upper quartile      30      30      30      30      30      30      30      30

Ages 60 years and older:

Educational Level        Statistic         60-64   65-69   70-74   75-79   80-84    85+    Total
0-4 years                N                   88     126     139     112     105      61     892
                         Mean                23      22      22      21      20      19      22
                         SD                  1.9     1.9     1.7     2.0     2.2     2.9     2.3
                         Lower quartile      19      19      19      18      16      15      19
                         Median              22      22      21      21      19      20      22
                         Upper quartile      26      25      24      24      23      23      25
                         Abnormal cutoff     19      18      19      17      16      14
5-8 years                N                  310     633     533     437     241     134    3223
                         Mean                26      26      26      25      25      23      26
                         SD                  2.3     1.7     1.8     2.1     1.9     3.3     2.2
                         Lower quartile      24      24      24      22      22      21      23
                         Median              27      27      26      26      25      24      26
                         Upper quartile      29      29      28      28      27      27      28
                         Abnormal cutoff     22      23      23      21      21      17
9-12 years or high       N                  626     814     550     315     163      99    8240
school diploma           Mean                28      28      27      27      25      26      28
                         SD                  1.7     1.4     1.6     1.5     2.3     2.0     1.9
                         Lower quartile      27      27      26      25      23      23      27
                         Median              28      28      28      27      26      26      29
                         Upper quartile      30      29      29      29      28      28      30
                         Abnormal cutoff     25      25      24      24      21      22
College experience       N                  270     358     255     181      96      52    5701
or higher degree         Mean                29      29      28      28      27      27      29
                         SD                  1.3     1.0     1.6     1.6     0.9     1.3     1.3
                         Lower quartile      28      28      27      27      26      25      29
                         Median              29      29      29      28      28      28      29
                         Upper quartile      30      30      29      29      29      29      30
                         Abnormal cutoff     26      27      25      25      25      24
Total                    N                 1294    1931    1477    1045     605     346  18,056
                         Mean                28      27      27      26      25      24      28
                         SD                  2.0     1.6     1.8     2.1     2.2     2.9     2.0
                         Lower quartile      26      26      24      23      21      21      27
                         Median              28      28      27      26      25      25      29
                         Upper quartile      29      29      29      28      28      28      30
                         Abnormal cutoff     24      24      24      22      21      19

Data from the Epidemiologic Catchment Area household surveys in New Haven, CT; Baltimore, MD; St. Louis, MO; Durham, NC; and Los Angeles, CA, between 1980 and 1984. The data are weighted based on the 1980 U.S. population Census by age, gender, and race.

Source: From Crum et al., 1993. Copyright, American Medical Association. Iverson (1998) provides the abnormal cutoff scores, which are greater than 1.64 standard deviations below the sample means for participants aged 60 and above.



private or outpatient clinics. They suggest that values below 
27/35 are abnormal in children over the age of 10 years; how- 
ever, definition of lower limits of normal at various ages re- 
mains to be determined. 



3MS 

Besson and Labbe (1997) gathered preliminary data on the 
3MS in a sample of 79 children, free of neurological impairment 
as reported by parents and medical history. Means and stan- 



Table 6-52 Age- and Education-Specific Reference Values for the 3MS Based on a Sample of 7754 Normal Elderly Across Canada

                                              Age (Years)
Education             65-69            70-74            75-79            80-84            85+
0-4 years             N = 78           N = 85           N = 93           N = 78           N = 65
                      82.0 (8.7)       82.6 (7.5)       81.0 (5.4)       79.6 (8.1)       77.0 (8.8)
                      (70, 79, 82)     (71, 78, 83)     (70, 77, 83)     (65, 76, 81)     (50, 74, 80)
5-8 years             N = 495          N = 422          N = 556          N = 277          N = 239
                      87.1 (7.7)       87.1 (8.1)       85.7 (5.8)       84.0 (6.0)       82.6 (5.1)
                      (76, 83, 88)     (78, 83, 87)     (75, 81, 86)     (70, 79, 85)     (66, 78, 83)
9-12 years            N = 942          N = 752          N = 921          N = 455          N = 332
                      91.7 (6.5)       90.7 (6.3)       89.8 (4.7)       87.5 (5.1)       85.6 (4.3)
                      (81, 89, 93)     (80, 87, 92)     (79, 86, 90)     (76, 83, 88)     (72, 81, 86)
13 years and over     N = 581          N = 375          N = 535          N = 236          N = 208
                      93.9 (5.7)       92.9 (6.4)       91.3 (5.2)       89.8 (5.3)       88.0 (4.2)
                      (85, 92, 95)     (82, 91, 94)     (80, 88, 92)     (79, 86, 91)     (75, 84, 89)

Data reported as sample size, mean (standard deviation), and (5th, 25th, 50th) percentiles.

Source: From Bravo & Hebert, 1997a. Reproduced with permission, John Wiley & Sons.






Table 6-53 Percentile Scores for 3MS Raw Scores Based on a Sample of Elderly African American Adults

                     Age Group
Raw Score        60-71        72-84
100
 99               97           99
 98               94           98
 97               89           98
 96               87           94
 95               84           92
 94               80           89
 93               75           81
 92               69           78
 91               60           76
 90               54           73
 89               48           69
 88               46           63
 87               41           60
 86               38           58
 85               34           52
 84               28           50
 83               23           47
 82               20           43
 81               16           38
 80               13           36
 79               12           33
 78               10           32
 77                8           32
 76                8           29
 75                7           26
 74                6           26
 73                5           25
 72                4           23
 71                3           21
 70                3           21
 69                2           18
 68                2           17
 67                1           15
 66                1           15
 65                1           14
 64                1           13
 63               <1           10
 62               <1            9
 61               <1            7
 60               <1            6
 59               <1            6
 58               <1            6
 57               <1            3
 56               <1            2
 55               <1            2
 54               <1            1

Based on a sample of 238 African Americans who reported no history of neurological disorder. The sample was not screened for psychiatric disorder.

Source: From Brown et al., 2003. Reproduced with the kind permission of Psychology Press.



Table 6-54 Adjustments for 3MS Raw Scores for Elderly African Americans

                              Age 60-71                Age 72-84
Years of Education        Male       Female        Male       Female
<12                        +4          0            +7         -3
12                          0         -2            -1        -11
>12                        -4         -7           -13        -12

Source: From Brown et al., 2003. Reproduced with the kind permission of Psychology Press.



dard deviations of scores are shown in Table 6-55. Note that 
cell sizes are very small. 



RELIABILITY 



MMSE 



Internal Consistency. With regard to the standard version, estimates of internal consistency range from .31 for community-based samples to .96 for a mixed group of medical patients (Espino et al., 2004; Foreman, 1987; Hopp et al., 1997; Jorm et al., 1988; Lopez et al., 2005; Tombaugh et al., 1996). The lower reliability in some samples likely reflects the reduced variability in healthy and better-educated samples. Lopez et al. (2005) suggest that reliabilities may be sufficiently high for clinical use if examiners use true score confidence intervals.

The MMSE can be scored using three strategies: serial 7s or spelling (using the greater number of correct responses from either item), serial 7s only, and spelling only. The serial 7s-only method maximized variability and yielded the highest alpha coefficient (r = .81) in a community-based sample. The serial 7s or spelling method yielded a marginally adequate level (r = .68), while the alpha coefficient for the spelling-only method was less than optimal (r = .59; Espino et al., 2004).
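
Lopez et al.'s (2005) point about true score confidence intervals can be made concrete with the standard error of measurement, which shrinks as reliability rises. The sketch below is the textbook calculation, not necessarily the exact procedure Lopez et al. used, and the example numbers (an obtained score of 26, a sample SD of 2.0, and the alpha of .68 quoted above) are for illustration only.

import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    # SEM = SD * sqrt(1 - rxx)
    return sd * math.sqrt(1.0 - reliability)

def score_confidence_interval(score: float, sd: float, reliability: float, z: float = 1.96):
    # 95% interval around the obtained score; some authors instead center the interval
    # on the estimated true score, which is regressed toward the sample mean.
    half_width = z * standard_error_of_measurement(sd, reliability)
    return (score - half_width, score + half_width)

low, high = score_confidence_interval(26, 2.0, 0.68)
print(round(low, 1), round(high, 1))   # 23.8 28.2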



Table 6-55 Children's Norms for the 3MS

Age     Mean     SD      N
4       26.9     11.6    11
5       46.6     11.2    16
6       59.7      9.5    11
7       78.1     12.9     9
8       84.5      4.3    11
9       85.3      6.2     7
10      89.6      2.4     4
11      92.0      2.9     5
12      90.6      6.5     5

Based on a sample of 79 children, aged 4-12 years, with no history of neurological disorder.

Source: From Besson & Labbe, 1997. Reprinted with permission of BC Decker.






Table 6-56 Regression Formulas for Detecting Reliable Change on the MMSE and 3MS

Measure                % Variance Explained     Formula for Obtaining Predicted Score                          Value Needed for Detecting
                       by All Variables (R2)                                                                   Reliable Change
MMSE Short Interval    .41                      .38 (test 1) - .07 (age) + .10 (educ) + 21.65                  ±2.73
3MS Short Interval     .60                      .53 (test 1) - .27 (age) + .20 (educ) + 62.60                  ±7.41
MMSE Long Interval     .37                      .45 (test 1) - .09 (age) + 1.06 (sex) + 19.12                  ±3.60
3MS Long Interval      .53                      .52 (test 1) - .23 (age) + .30 (educ) + 1.93 (sex) + 53.86     ±9.82

Based on a sample of 232 older adults who were retested following three-month and five-year intervals and who received consensus diagnoses of no cognitive impairment on both examinations. RCI-Reg (regression-based Reliable Change Index): after the predicted retest score is obtained, it is subtracted from the observed retest score. If this change score exceeds the Value Needed for Detecting Reliable Change, it is considered to represent a significant change at .05 (one-tailed). Age and education are expressed in years. Sex was coded as male = 1 and female = 2.

Source: Reprinted from Tombaugh (2005). Reprinted with permission from Elsevier.



Test-Retest Reliability and Practice Effects. Test-retest reliability estimates for intervals of less than two months generally fall between .80 and .95 (see Clark et al., 1999; Folstein et al., 1975; O'Connor et al., 1989; Tombaugh & McIntyre, 1992, for a review). In patients with probable AD retested within a two-week period, slight improvement is noted (.2 to +2.1 points), with most patients (95%) showing a short-term change of four points or less (Doraiswamy & Kaiser, 2000). Following retest intervals of about three months, nondemented individuals tend to show slight improvement (less than one point) (Tombaugh, 2005), while individuals with dementia or mild cognitive impairment tend not to benefit from prior exposure to the test (Helkala et al., 2002).

With lengthier retest intervals (e.g., 1-2 years), normal 
subjects typically show a small amount of change (less than 
two points), and retest correlations are lower (<.80) (Hopp 
et al., 1997; Mitrushina & Satz, 1991; Olin & Zelinski, 1991),
perhaps due in part to the inclusion at one time of individuals 
with mild cognitive impairment (Tombaugh, in press). Recall 
and Attention subtests tend to be the least reliable (Olin & 
Zelinski, 1991). 

In patients with probable AD, the average annual change in 
MMSE score is about four points, although there is high mea- 
surement error (which almost equals the average annual score 
change) and there is striking variability among individuals 
(Clark et al., 1999; Doody et al., 2001). It is not uncommon 
for patients to have a stable or even an improved score during 
a one-year interval (Chan et al., 1999). The implication of 
these findings is that clinicians monitoring change in older 
adults should be cautious in interpreting small changes in 
scores. Iverson (1998) has reported that depending upon the 
age and education of the patient, changes of about two points 
may be statistically reliable. However, Clark et al. (1999) have 
suggested that to be clinically meaningful (rather than merely 
reflecting testing imprecision), a change in MMSE score must 
exceed three points. Doody et al. (2001) recommend a drop of 
five or more points to reflect clinically meaningful decline. 

Over intervals of five years, test-retest correlations are limited 
for older adults (65-99 years) classed as intact on both test occa- 
sions (.55) as well as for individuals who were reclassified as 



showing mild cognitive impairment on the second test session 
(.59) (Tombaugh, 2005). In part, the modest correlation for in- 
tact individuals is due to regression to the mean (high scores 
tend to decline whereas low scores tend to increase on subse- 
quent testing). However, there is also considerable variability 
among older individuals over the five-year period. On average, 
MMSE scores remain relatively stable among intact older 
adults, changing less than one point (+.08) over the five-year 
interval, but declining (-1.38) in those with mild cognitive 
impairment. 

Tombaugh (2005) used reliable change methodology to 
provide a better estimate of whether an individual's retest 
score has changed significantly from the initial test score over 
both short (<3 months) and long (five years) retest intervals. 
Table 6-56 shows the regression formulas, the percent of vari- 
ance explained by all variables (e.g., test one score, age, educa- 
tion, gender), and the amount of change needed to exceed the 
.05 level (one-tailed). Note that most of the variance is ac- 
counted for by the score on the first test occasion. A normative 
table containing the percentile equivalents for various change 
scores is also provided (see Table 6-57). The data are based on 
a sample of 232 older adults who were retested following both 
short and long intervals and who received consensus diagnoses 
of no cognitive impairment on both examinations. The values 
needed for clinically meaningful change agree fairly well with 
those suggested by Clark et al. (1999), Doody et al. (2001), and 
Eslinger et al. (2003). 
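
The regression-based procedure in Table 6-56 reduces to two steps: predict the retest score from the baseline score and demographics, then compare the observed-minus-predicted difference with the tabled value. A minimal sketch for the MMSE short-interval formula (the coefficients and the ±2.73 criterion are those reported in Table 6-56; the function names and the worked case are illustrative).

def predicted_mmse_short_interval(test1: float, age: float, educ: float) -> float:
    # MMSE, retest interval of less than 3 months (Table 6-56; R2 = .41).
    return 0.38 * test1 - 0.07 * age + 0.10 * educ + 21.65

def is_reliable_change(observed_retest: float, predicted_retest: float,
                       critical_value: float = 2.73) -> bool:
    # Treats a difference beyond the tabled value in either direction as
    # significant at .05 (one-tailed), per the note to Table 6-56.
    return abs(observed_retest - predicted_retest) > critical_value

# Hypothetical case: baseline MMSE 28, age 72, 12 years of education, retest score 24.
predicted = predicted_mmse_short_interval(28, 72, 12)   # approximately 28.45
print(is_reliable_change(24, predicted))                 # True: a drop of about 4.5 points exceeds 2.73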

Interrater Reliability. Scoring of some items (e.g., over- 
lapping polygons) is somewhat subjective, and there is no 
suggested time limit for any item. Interrater reliability is 
marginal (above .65) (Folstein et al., 1975; Foster et al., 1988) 
and could be enhanced with more precise administration 
and scoring criteria (Molloy et al., 1991; Olin & Zelinski, 
1991). 



Pediatric Adaptations 

No information regarding reliability is yet available for the pe- 
diatric adaptation of the MMSE by Ouvrier et al. (1993). 




Table 6-57 Percentiles for the Difference Between Obtained Retest Scores and Retest Scores Predicted on the Basis of Regression Equations Using Baseline Scores and Demographic Information as Regressors for the MMSE and 3MS

                     MMSE                 MMSE                3MS                  3MS
                     Short Interval       Long Interval       Short Interval       Long Interval
Percentile           (<3 months)          (5 years)           (<3 months)          (5 years)
98 (+2 SD)             3.08                 4.26                11.27                12.06
95                     2.53                 3.79                 7.48                 8.90
90                     2.05                 3.44                 4.90                 7.05
84 (+1 SD)             1.43                 3.16                 3.99                 4.62
75                     1.06                 2.55                 2.66                 3.45
50 (0 SD)              0.13                 1.18                 0.21                 0.47
25                    -0.78                 0.01                -1.54                -3.12
16 (-1 SD)            -1.50                -0.84                -2.96                -5.47
10                    -2.16                -1.67                -5.40                -7.79
5                     -3.20                -2.99                -8.47               -10.15
2 (-2 SD)             -4.22                -4.67               -10.28               -17.31

Based on a sample of 232 older adults who were retested following three-month and five-year intervals and who received consensus diagnoses of no cognitive impairment on both examinations.

Source: From Tombaugh, 2005. Reprinted with permission from Elsevier.



3MS 

The reliability (internal consistency, test-retest) of the 3MS tends to be higher than that of the MMSE (e.g., Besson & Labbe, 1997; Bravo & Hebert, 1997b; Tombaugh et al., 1996; Tombaugh, in press). For example, Tombaugh et al. (1996) reported that in normal individuals, Cronbach's alpha was .82 for the 3MS and .62 for the MMSE. For patients with AD, Cronbach's alpha was .88 for the 3MS and .81 for the MMSE. The consistently higher alphas for the 3MS reflect, at least in part, its larger number of items.

Test-retest reliability is reported to be high for intervals of one to four weeks in children (.85-.99; Besson & Labbe, 1997) and for intervals of about three months in individuals diagnosed with dementia (intraclass correlation of .85; Correa et al., 2001).

A recent study (Correa et al., 2001) showed that, under conditions of repeat testing (<90 days but >14 days) with two different assessors in a setting compatible with no change in cognitive status, the discrepancy between repeat 3MS scores of people with dementia can be as large as ±16 points. This suggests that the smallest individual change in score that can be reliably detected must exceed 16 points. With lengthier retest intervals (five years), declines of about seven points are unusual in normal older adults (Tombaugh, in press; see Table 6-57).

Interrater reliability is reported to be high (r = .98) for the overlapping figures (Teng & Chui, 1987). The interrater reliability of the 3MS was moderate when measured by agreement of clinician categorization of cognitive impairment versus no cognitive impairment based on 3MS scores (kappa = 0.67; Lamarre & Patten, 1991).



VALIDITY 

Construct Validity— MMSE 

The MMSE shows modest to high correlations with other brief screening tests such as the Blessed Test, the DRS, Spanish versions of the Mental Status Questionnaire, the Information-Memory-Concentration Test, the Orientation-Memory-Concentration Test, and the Clock Drawing Task (e.g., Adunsky et al., 2002; Fillenbaum et al., 1987; Foreman, 1987; Freidl et al., 1996; Salmon et al., 1990; Taussig et al., 1996), suggesting that the various tests tap similar, though not necessarily identical, cognitive domains. For example, 9 of 30 (40%) points on the MMSE relate to memory and attention. By contrast, only 25 of 144 points (17%) on the DRS are for memory items. Not surprisingly, agreement between tests (e.g., MMSE and DRS) regarding cognitive impairment can be low (Freidl et al., 1996, 2002). Thus, in a large sample of healthy adults (N = 1957), Freidl et al. reported a correlation of .29 between the total scores of the MMSE and DRS. Using recommended cutoff scores for each test, the DRS classified 4.2% of study participants as cognitively impaired in comparison to 1.6% for the MMSE. Equations have been developed to convert total scores from one test to the other (e.g., MMSE and DRS), but given the mixed results in the literature, they should be used with caution (see also DRS-2 in this volume). The equations are provided in Table 6-58.
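
As a simple illustration of why such conversions should be interpreted cautiously, the three equations in Table 6-58 can be evaluated side by side for the same DRS total; they typically differ by a point or two, on top of the error inherent in each formula (the function name and the example DRS value are illustrative).

def drs_to_mmse_estimates(drs_total: float) -> dict:
    # Conversion equations reported in Table 6-58.
    return {
        "Salmon et al., 1990": -12.72 + 0.31 * drs_total,
        "Bobholz & Brandt, 1993": -10.0 + 0.29 * drs_total,
        "Meiran et al., 1996": -3.1 + 0.23 * drs_total,
    }

for label, estimate in drs_to_mmse_estimates(100).items():
    print(label, round(estimate, 1))
# Salmon et al., 1990: 18.3; Bobholz & Brandt, 1993: 19.0; Meiran et al., 1996: 19.9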

Modest to high correlations have also been reported be- 
tween total MMSE scores and a variety of other cognitive mea- 
sures, including tests of intelligence, memory, attention/ 
concentration, constructional ability, and executive function 
(e.g., Axelrod et al., 1992; Bieliauskas et al., 2000; Feher et al., 






Table 6-58 Conversion Formulas From DRS to MMSE

Test      Formula                   Reference
MMSE      -12.72 + 0.31 (DRS)       Salmon et al., 1990. Based on a sample of 92 patients with probable AD.
MMSE      -10.0 + 0.29 (DRS)        Bobholz & Brandt, 1993. Based on a sample of 50 patients with suspected cognitive impairment.
MMSE      -3.1 + 0.23 (DRS)         Meiran et al., 1996. Based on a sample of 466 patients in a memory disorders clinic. The expected error associated with this formula is ±3.


1992; Folstein et al., 1975; Giordani et al., 1990; Jefferson et al., 2002; Mitrushina & Satz, 1991; Perna et al., 2001; Tombaugh & McIntyre, 1992). Thus, the total score seems to measure some general cognitive ability.

Folstein et al. (1975) grouped the items into discrete sub- 
sections (e.g., orientation, registration, attention and calcula- 
tion, recall, language); however, these categories were derived 
without empirical justification and it is not clear whether the 
MMSE subsections and individual items can be viewed as 
measures of specific aspects of cognition (Giordani et al., 
1990; Mitrushina & Satz, 1994). Concordance rates between 
individual MMSE tasks and neuropsychological tests address- 
ing corresponding cognitive domains can be quite low (Bene- 
dict & Brandt, 1992; Giordani et al., 1990; Jefferson et al.,
2002; Mitrushina & Satz, 1994). 

Factor-analytic studies of the MMSE often yield a two-factor solution (Braekhus et al., 1992; Giordani et al., 1990; Tombaugh & McIntyre, 1992), although other solutions have also been found. For example, a study with a large (N = 8556) sample of community-dwelling older adults suggested the presence of five separate dimensions (concentration, language and praxis, orientation, memory, and attention), although the MMSE also satisfied criteria of unidimensionality (Jones & Gallo, 2000). Similar findings have been reported by Banos and Franklin (2002) in 339 adult inpatients at a nonforensic state psychiatric hospital. It is difficult to compare the various studies due to differences in variable construction, sample composition, and statistical method. The two most recent studies (Banos & Franklin, 2002; Jones & Gallo, 2000) provide empirical support for some of the traditional categories, such as orientation (time and place), attention (serial 7s), and memory (recall of three words). There is less empirical support for the categories of registration, language, and construction.

Clinical Findings 

MMSE. Most studies report that the MMSE summary 
score is sensitive to the presence of dementia, particularly in 



those with moderate to severe forms of cognitive impairment; 
however, the MMSE appears less than ideal when those with 
mild cognitive impairment are evaluated, when focal neuro- 
logical deficits are present (e.g., poststroke), or when psychi- 
atric patients are included (Benedict & Brandt, 1992; Feher et 
al., 1992; Grut et al., 1993; Kupke et al, 1993; Kuslansky et al, 
2004; Meyer et al, 2001; Nys et al., 2005; O'Connor et al, 
1989; Shah et al., 1992; Tombaugh 8c Mclntyre, 1992; Van Der 
Cammen et al., 1992; Wells et al., 1992). There are a number 
of possible explanations for this decreased sensitivity and 
specificity. One possibility rests on the fact that the MMSE is 
biased toward verbal items and does not adequately measure 
other functions such as ability to attend to relevant input, 
ability to solve abstract problems, ability to retain information 
over prolonged time intervals, visual-spatial ability, construc- 
tional praxis, and mood. Accordingly, it may overestimate de- 
mentia in aphasic patients. At the same time, it may be 
relatively insensitive to various dysexecutive and amnestic 
syndromes as well as disturbances of the right hemisphere, re- 
sulting in an increase in false negatives. In addition, the lan- 
guage items are very simple, and mild impairments may go 
undetected. 

The results from the factor-analytic studies reported previ- 
ously (Construct Validity) imply that the individual subsection 
scores of the MMSE should not be used in lieu of more com- 
prehensive assessments if a detailed diagnostic profile is de- 
sired (Banos & Franklin, 2002; Giordani et al., 1990). This
does not mean that the MMSE cannot provide useful infor- 
mation in differentiating among patients with dementia. For 
example, Brandt et al. (1988) showed different profiles on the 
MMSE in patients with AD and patients with Huntington's 
disease. The differences between the groups rested on differ- 
ent scores on the memory and attention/concentration items. 
Patients with AD did worse on the memory items, whereas 
patients with Huntington's disease did worse on the atten- 
tion/concentration items. Similarly, Jefferson et al. (2002) 
found that patients with AD scored lower than patients with 
ischemic vascular dementia (VaD) or Parkinson's disease 
(PD) on temporal orientation and recall tasks, while those 
with VaD obtained lower scores than patients with AD on 
motor/constructional tasks (copying, writing) and an index 
comprising items requiring working memory (spelling 
"world" backward, carrying out three-step commands). The 
VaD and PD groups also made more errors in writing a sen- 
tence and copying intersecting polygons. 

The MMSE may also be useful in predicting who will de- 
velop AD or VaD (e.g., Small, Herlitz, et al., 1997; Jones et al., 
2004). For example, Jones et al. (2004) found that lower base- 
line scores on the MMSE in nondemented persons were asso- 
ciated with an increased risk of AD or VaD after a three-year 
follow-up period. Delayed memory was the best predictor in 
both preclinical VaD and preclinical AD. It is worth bearing in 
mind, however, that although a large proportion (more than 
two-thirds) of cognitively impaired individuals (one or more 
SD below their age and education mean) become demented 
or die in three years, a substantial proportion improve over 






the same period without a higher risk of later progressing to 
dementia (Palmer et al., 2002).

The MMSE is also sensitive to cognitive decline; however, as 
the disorder becomes more severe, the test loses its sensitivity 
to change (Salmon et al., 1990; Tombaugh & McIntyre, 1992).
In such cases, other tests are preferred, such as the DRS, which 
includes more easy items. For example, the DRS contains a 
number of simple items that assess attentional processes. Se- 
verely demented patients can complete these items since atten- 
tion is often preserved until late in the course of the disease. 
Thus, severely demented patients who can make few, if any, 
correct responses on the MMSE can still respond adequately 
on the DRS. 

A number of investigators report an average annual rate 
of decline of about two to four points on the MMSE for pa- 
tients with probable dementia of the Alzheimer type (Becker 
et al., 1988; Clark et al., 1999; Doody et al., 2001; Salmon et
al., 1990; Small, Viitanen, et al., 1997). However, progression 
rates of AD are nonlinear and quite variable between per- 
sons (Clark et al., 1999; Doody et al., 2001). Nonetheless,
there is some consistency in that patients who begin with 
progression rates that are more rapid than average (>5 
MMSE points per year) continue to decline sooner than pa- 
tients who begin at slow (<1.9 points per year) or average 
rates (2-4.9 points per year; Doody et al., 2001). Further, in- 
dividuals with AD decline at a faster rate than patients with 
VaD (Nyenhuis et al., 2002) or frontotemporal dementia 
(Pasquier et al., 2004).
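
Annualized change is simply the number of points lost per year between two administrations, which can then be set against the Doody et al. (2001) bands cited above. The short Python sketch below restates that arithmetic; the function names and the handling of the band boundaries are ours, not part of the original report.

def annual_mmse_decline(earlier_score, later_score, years_between):
    """MMSE points lost per year between two administrations (positive = decline)."""
    return (earlier_score - later_score) / years_between

def progression_band(points_per_year):
    """Rapid (>5), average (2-4.9), or slow (<1.9) decline per Doody et al. (2001)."""
    if points_per_year > 5:
        return "rapid"
    if points_per_year >= 2:
        return "average"
    return "slow"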

MMSE scores correlate with histopathological findings in 
in vivo brain images and event-related potentials (Aylward 
et al., 1996; Bigler et al., 2002; Colohan et al., 1989; DeKosky
et al., 1990; Finley et al., 1985; Martin et al., 1987; Pearlson &
Tune, 1986; Stout et al., 1996; Tsai & Tsuang, 1979). For in- 
stance, a number of authors (e.g., Bigler et al., 2002; Stout et 
al. 1996) have reported an association between white matter 
lesions, noted on MRI, and impaired cognitive function as 
measured by the MMSE. Clinical-pathological study of pa-
tients with AD reveals that the best predictors of MMSE
scores are the total counts of neurofibrillary tangles (NFT) in
the entorhinal cortex and area 9, as well as the degree of neuronal
loss in the CA1 field of the hippocampus (Giannakopoulos
et al., 2003).

Sensitivity of Items 

Analyses of individual items reveal that errors rarely occur on 
questions related to orientation to place and language; for 
both normal and demented individuals, most errors occur for 
the recall of three words, serial 7s/"world," pentagon, and ori- 
entation to time. In short, these latter items are the most sen- 
sitive to normal aging and a variety of diseases (e.g., diabetes, 
cardiovascular disease) including dementing processes (Hill & 
Backman, 1995; Nilsson et al., 2002; Tombaugh & McIntyre,
1992; Tombaugh et al., 1996; Wells et al., 1992). Sensitivity of
specific items to cognitive impairment in children has not yet 
been established. 



Serial 7s/"World" 

There is evidence that serial 7s and reverse spelling of "world" 
represent different tasks. Spelling "world" backward consis- 
tently produces higher scores than does counting backward by 
sevens (Tombaugh & Mclntyre, 1992). Serial 7s maximizes 
variability, increases internal consistency, and reduces mea- 
surement error, thus increasing the likelihood of discriminat- 
ing between individuals in their level of cognitive ability 
(Espino et al., 2004). In fact, Espino et al. (2004) have argued 
that only serial 7s should be given. However, users should bear 
in mind that performance on the serial 7s task appears heavily 
influenced by basic arithmetic skills and therefore should be 
used with caution as a measure of concentration (Karzmark, 
2000). 



Ecological Validity 

MMSE. MMSE scores show modest relations with mea- 
sures of functional capacity (e.g., driving, cooking, caring for 
finances, consent to participate in studies), functional out- 
come after stroke, and time to nursing home care and death 
(e.g., Adunsky et al., 2002; Bigler et al., 2002; Burns et al.,
1991; Fillenbaum et al., 1988a, 1988b; Kim & Caine, 2002;
Lemsky et al., 1996; Marcopulos et al., 1997; Stern et al., 1997;
Taussig et al., 1996; see also Ruchinskas & Curyto, 2003, for a 
recent review). For example, Gallo et al. (1998) reported that 
poor performance on the polygon-copying item is associated
with an increased rate of motor vehicle crashes. In addition, the test
is somewhat sensitive to mortality-related effects with de- 
clines in scores evident about three to five years prior to death 
(e.g., Nguyen et al., 2003; Tan, 2004). The greatest risk is for 
those with moderate to severe cognitive impairment, al- 
though mild impairment (e.g., MMSE scores 18-23) is also 
associated with increased risk (Nguyen et al., 2003). A decline 
of at least four points over two years is also predictive of an 
increased risk of mortality, perhaps reflecting symptoms of 
medical diseases that carry with them long-term risk of mor- 
tality (Nguyen et al., 2003). Stern et al. (1997) have developed 
an equation to predict the estimated time to nursing home 
care and death in people with AD. The prediction equation is 
available at the following site: cpmcnet.columbia.edu/dept/ 
sergievsky/predictor.html. 

Because cognitive deficits can render self-report (e.g., in- 
cluding responses on the Geriatric Depression Scale, Beck De- 
pression Inventory) invalid, interpretation is considered 
hazardous once MMSE scores decline below 20 (Bedard et al., 
2003). In such cases, reports from informants and observa- 
tions are critical. A threshold of 20 may also apply to patient 
involvement in medical decision making (Hirschman et al., 
2003). However, Kim and Caine (2002) noted that in a sample 
of patients with mild to moderate Alzheimer's disease, a fairly 
wide range of MMSE scores (21-25, which includes an often 
used cutoff for normal) did not discriminate consent capacity 
status well. They recommend that if there are strong ethical 
reasons to select only patients who are clearly capable of 






providing consent, then the researcher/practitioner is advised 
to use a higher MMSE cutoff score (e.g., 26). 

The 3MS and Other Versions 

Several attempts have been made to improve the utility of the 
MMSE by omitting items of limited diagnostic utility and/or 
adding items or tests known to be sensitive to cognitive im- 
pairment. For example, Loewenstein et al. (2000) incorpo- 
rated three extended-delay recall trials (for the recall items) at 
five-minute intervals. The modification showed increased 
sensitivity and specificity in differentiating cognitively normal 
older adults from those with mild cognitive impairment. 

As noted earlier (see Description), the 3MS by Teng et al. 
(1987) added four additional items (date and place of birth, 
word fluency, similarities, and delayed recall of words), in- 
cluded items that assess different aspects of memory, and in- 
creased the maximum score to permit greater differentiation 
among individuals. A factor analysis of the 3MS yielded the 
following five domains: psychomotor skills, memory, identifi- 
cation and association, orientation, and concentration and 
calculation (Abraham et al., 1993). In addition, the 3MS has 
moderate to high correlations with neuropsychological tests 
that assess similar domains (Grace et al., 1995), providing evi- 
dence for its concurrent validity. 

Although some studies have found the sensitivity and 
specificity of the 3MS and MMSE to be similar (Nadler et al., 
1995; Tombaugh et al., 1996), others have found the 3MS to
be more sensitive in detecting cognitive deficits and to be a 
better predictor of functional outcome (Grace et al., 1995). In 
all studies, the criterion validity of the 3MS is high. For exam- 
ple, Tombaugh et al. (1996) reported that the sensitivity of the 
3MS was 92.6% when screening for AD versus no cognitive 
impairment at a cutoff score of 77/78 (i.e., those who scored 
77 or below were classed as impaired whereas those who 
scored 78 or above were classed as normal). 

Recently, the 3MS has been modified (3MS-R; Tschanz
et al., 2002). The modifications are primarily in the area of as- 
sessing remote memory where the authors substituted the re- 
call of personal demographic information (date and place of 
birth) with the recall of current and past prominent politi- 
cians to make the information easier to verify. They also 
changed some of the item scaling in the orientation section 
and shortened the time allotted for the verbal fluency item 
(from 30 to 20 seconds). As might be expected, the test is sen- 
sitive to dementia. Lower age, higher education, and female 
gender were linked to higher 3MS-R scores. Normative data, 
stratified by age, education, and gender, are available based on 
a sample of over 2000 cognitively intact older people. 

COMMENT 

The MMSE has been used widely to detect dementia for over 
25 years. Part of its popularity can be attributed to its ease of 
administration, its brevity, and the large volume of literature 
that has accumulated. 



The choice of screening measure for identifying dementia 
will depend on the goals of the examination and on the sam- 
ple studied. In geriatric or neurological samples with a high 
prevalence of patients with illiteracy or language or motoric 
disorders, the MMSE may not be ideal and may lead to an over- 
estimation of dementia (e.g., Kuslansky et al., 2004; Tombaugh
& Mclntyre, 1992). At the same time, it may miss a significant 
proportion of individuals who have mild memory or other 
cognitive losses. Nonetheless, it may detect dementia as
effectively as lengthier tests (e.g., the DRS, Neurobehav-
ioral Cognitive Status Examination, Hopkins Verbal Learning
Test), at least in some populations (Chan et al., 2003; Kuslan-
sky et al., 2004; van Gorp et al., 1999; but see Meyer et al., 
2001). However, it is important to bear in mind that agree- 
ment between tests is not assured. 

Further, the MMSE lacks diagnostic specificity: Low scores 
signal that there may be important changes in cognition and 
health. Analysis of some of the items (particularly those related 
to orientation, attention, and memory) may generate more 
targeted questions and/or offer clues with regard to the type of 
disorder. In short, the presence and nature of cognitive impair- 
ment should not be diagnosed on the basis of MMSE scores 
alone. The examiner needs to follow up suspicions of impair- 
ment with a more in-depth evaluation. 

The use of age- and education-stratified normative vari- 
ables when screening for dementia or cognitive impairment 
has recently been questioned. O'Connell et al. (2004) reported 
that correcting for demographic influences (age, education, 
gender) failed to improve the accuracy of the 3MS. Similar 
findings have been reported with regard to the MMSE (Krae- 
mer et al., 1998). Because age and education are in themselves 
apparent risk factors for dementia, removal of the effects of 
these demographic variables (i.e., age or education, or both) 
might remove some of the predictive power of the screening 
measure (Sliwinski et al., 1997). Additional research is needed
to determine whether correction is important to improve di- 
agnostic accuracy and whether it is critical in some contexts 
but not others. 

It should also be noted that simple cutoffs, although widely
used, oversimplify the situation by ignoring the preva-
lence of dementia in a given setting. In other words, the sensi-
tivity and specificity of specific cutoffs will vary with the base 
rates of dementia in each setting. Meiran et al. (1996) show
that a score of 26 is unlikely to reflect dementia in a setting
where most individuals do not have dementia (e.g., 20% prevalence):
in that low-base-rate setting, a score of 26 indicates dementia
with only a 52% probability. However, a score of 26 in a nursing home setting is likely to
ity. However, a score of 26 in a nursing home setting is likely to 
be associated with dementia since many of the patients (e.g., 
50%) have dementia (i.e., high base rate). In this case, the 
chances are 81% that the patient has dementia. On the other 
hand, in ambulatory memory disorder clinics, the prevalence of
dementia is about 75%. In that setting, a score of 26 indicates 
dementia with a 93% probability. 
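
These figures follow directly from Bayes' theorem. Meiran et al.'s sensitivity and specificity values are not reproduced here, but a positive likelihood ratio of roughly 4.3 is consistent with the three probabilities cited above; the short Python sketch below (function and variable names are ours) shows the calculation in odds form.

def ppv(prevalence, positive_lr):
    """Positive predictive value via Bayes' theorem in odds form."""
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = prior_odds * positive_lr
    return posterior_odds / (1.0 + posterior_odds)

# Illustrative likelihood ratio (not reported in the text) chosen to be
# consistent with the three worked examples above.
LR_POSITIVE = 4.3
for prevalence in (0.20, 0.50, 0.75):
    print(f"prevalence {prevalence:.0%}: "
          f"P(dementia | MMSE = 26) = {ppv(prevalence, LR_POSITIVE):.0%}")

Run as written, this prints approximately 52%, 81%, and 93% for prevalences of 20%, 50%, and 75%, matching the probabilities reported above.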

While the test may be a useful screening instrument to as- 
sess level of performance, it has limited value in measuring 






progression of disease in individual patients. This is because 
of a large measurement error and substantial variation in 
change in annual scores (Clark et al., 1999). 

Finally, a number of revisions of the MMSE have been pro- 
posed, including the 3MS — a modification that has promising 
psychometric characteristics. The various revisions, however, 
have not yet gained widespread use. 



REFERENCES 

Abraham, I. L., Manning, C. A., Boyd, M. R., Neese, J. B., Newman, 
M. C, Plowfield, L. A., et al. (1993). Cognitive screening of nurs- 
ing home residents: Factor analytic structure of the modified 
Mini-Mental State (3MS) examination. International Journal of 
Geriatric Psychiatry, 8, 133-138. 

Adunsky, A., Fleissig, Y., Levenkrohn, S., Arad, M., & Noy, S. (2002). 
Clock drawing task, Mini-Mental State Examination and 
cognitive-functional independence measure: Relation to func- 
tional outcome of stroke patients. Archives of Gerontology and 
Geriatrics, 35, 153-160. 

Anthony, J. C, LeResche, L., Niaz, U., Von Korff, M. R., & Folstein, M. F. 
(1982). Limits of the "Mini-Mental State" as a screening test for 
dementia and delirium among hospital patients. Psychological 
Medicine, 12, 397-408. 

Antsey, K. J., Matters, B., Brown, A. K., & Lord, S. R. (2000). Norma- 
tive data on neuropsychological tests for very old adults living in 
retirement villages and hostels. The Clinical Neuropsychologist, 14, 
309-317. 

Auer, S., Hampel, H., Moller, H-J., & Reisberg, B. (2000). Translations
of measurements and scales: Opportunities and diversities. Inter- 
national Geriatrics, 12, 391-394. 

Axelrod, B. N., Goldman, R. S., & Henry, R. R. (1992). Sensitivity of 
the Mini-Mental State Examination to frontal lobe dysfunction in 
normal aging. Journal of Clinical Psychology, 48, 68-71. 

Aylward, E. H., Rasmussen, D. X., Brandt, J., Raimundo, L., Fol-
stein, M., & Pearlson, G. D. (1996). CT measurement of supra-
sellar cistern predicts rate of cognitive decline in Alzheimer's
disease. Journal of the International Neuropsychological Society, 
2, 89-95. 

Banos, J. FL, & Franklin, L. M. (2002). Factor structure of the Mini- 
Mental State Examination in adult psychiatric inpatients. Psycho- 
logical Assessment, 14, 397-400. 

Becker, J. X, Huff, F. J., Nebes, R. D., Holland, A., & Boiler, F. (1988). 
Neuropsychological function in Alzheimer's disease: Pattern of 
impairment and rate of progression. Archives of Neurology, 45, 
263-268. 

Bedard, M., Molloy, D. W., Squire, L., Minthorn-Biggs, M-B., Dubois, 
S., Lever, J. A., & O'Donnell, M. (2003). Validity of self-reports 
in dementia research: The Geriatric Depression Scale. Clinical 
Gerontologist, 26, 155-163. 

Benedict, R. H. B., & Brandt, J. (1992). Limitation of the Mini-Mental 
State Examination for the detection of amnesia. Journal of Geri- 
atric Psychiatry and Neurology, 5, 233-237.

Besson, P. S„ & Labbe, E. E. (1997). Use of the modified Mini-Mental 
State Examination with children. Journal of Child Neurology, 12, 
455-460. 

Bieliauskas, L. A., Depp, C, Kauszler, M. L., Steinberg, B. A., & Lacy, 
M. (2000). IQ and scores on the Mini- Mental State Examination 
(MMSE). Aging, Neuropsychology, and Cognition, 7, 227-229. 



Bigler, E. D., Kerr, B., Victoroff, J., Tate, D. F., & Breitner, J. C. S. (2002).
White matter lesions, quantitative magnetic resonance imaging and
dementia. Alzheimer Disease and Associated Disorders, 16, 161-170.

Bleecker, M. L., Bolla-Wilson, K., Kawas, C, & Agnew, J. (1988). Age- 
specific norms for the Mini-Mental State Exam. Neurology, 33, 
1565-1568. 

Bobholz, J. H., & Brandt, J. (1993). Assessment of cognitive impair- 
ment: Relationship of the Dementia Rating Scale to the Mini-Men- 
tal State Examination. Journal of Geriatric Psychiatry and Neurology, 
12, 180-188. 

Braekhus, A., Laake, K., & Engedal, K. (1992). The Mini-Mental State 
Examination: Identifying the most efficient variables for detect- 
ing cognitive impairment in the elderly. Journal of the American 
Geriatrics Society, 40, 1139-1145. 

Brandt, J., Folstein, S. E., & Folstein, M. F. (1988). Differential cogni- 
tive impairment in Alzheimer's disease and Huntington's disease. 
Annals of Neurology, 23, 555-561. 

Bravo, G., & Hebert, R. (1997a). Age- and education-specific refer- 
ences values for the Mini-Mental and Modified Mini-Mental State 
Examinations derived from a non-demented elderly population. 
International Journal of Geriatric Psychiatry, 12, 1008-1018. 

Bravo, G., & Hebert, R. (1997b). Reliability of the modified Mini- 
Mental State Examination in the context of a two-phase commu- 
nity prevalence study. Neuroepidemiology, 16, 141-148. 

Brown, L. M., Schinka, J. A., Mortimer, J. A., & Graves, A. B. (2003). 
3MS normative data for elderly African Americans. Journal of 
Clinical and Experimental Neuropsychology, 25, 234-241. 

Burns, A., Jacoby, R., & Levy, R. (1991). Progression of cognitive im-
pairment in Alzheimer's disease. Journal of the American Geriatric 
Society, 39, 39-45. 

Chan, A. S., Choi, A., Chiu, H., & Liu, L. (2003). Clinical validity of 
the Chinese version of Mattis dementia rating scale in differenti- 
ating dementia of Alzheimer's type in Hong Kong. Journal of the 
International Neuropsychological Society, 9, 45-55. 

Christensen, H., & Jorm, A. F. (1992). Effect of premorbid intelli- 
gence on the Mini- Mental State and IQCODE. International Jour- 
nal of Geriatric Psychiatry, 7, 159-160. 

Clark, C. M., Sheppard, L., Fillenbaum, G. G., Galasko, D., Morris, I. C, 
Koss, E., Mohs, R., Heyman, A., & the Cerad Investigators. (1999). 
Variability in annual Mini-Mental State Examination score in pa- 
tients with probable Alzheimer disease. Archives of Neurology, 56, 
857-862. 

Colohan, H., O'Callaghan, E., Larkin, C, Waddington, J. L. (1989). 
An evaluation of cranial CT scanning in clinical psychiatry. Irish 
Journal of Medical Science, 158, 178-181. 

Correa, ]. A., Perrault, H., & Wolfson, C. (2001). Reliable individual 
change scores on the 3MS in older persons with dementia: Results 
from the Canadian study of health and aging. International Psy- 
chogeriatrics, 13, 71-78. 

Crum, R. M., Anthony, J. C., Bassett, S. S., & Folstein, M. F. (1993).
Population-based norms for the Mini-Mental State Examination 
by age and educational level. Journal of the American Medical As- 
sociation, 269, 2386-2391. 

DeKosky, S. T., Shih, W. L., Schmitt, F. A., Coupal, L., & Kirkpatrick, C.
(1990). Assessing utility of single photon emission computed to- 
mography (SPECT) scan in Alzheimer disease: Correlation with 
cognitive severity. Alzheimer Disease and Associated Disorders, 4, 
14-23. 

Doody, R. S., Massman, P., & Dunn, J. K. (2001). A method for esti-
mating progression rates in Alzheimer disease. Archives of Neurol- 
ogy, 58, 449-454. 






Doraiswamy, P. M., & Kaiser, L. (2000). Variability of the Mini- 
Mental State Examination in dementia. Neurology, 54, 1538-1539. 

Dufouil, C, Clayton, D., Brayne, C, Chi, L. Y., Dening, T. R., Paykel, E. 
S., O'Connor, D. W., Ahmed, A., McGee, M. A., & Huppert, F. A. 
(2000). Population norms for the MMSE in the very old. Neurol- 
ogy, 55, 1609-1612. 

Eslinger, P. J., Swan, G. E., & Carmelli, D. (2003). Changes in the 
mini-mental state exam in community-dwelling older persons 
over 6 years: Relationships to health and neuropsychological 
measures. Neuroepidemiology, 22, 23-30. 

Espino, D. V., Lichtenstein, M. J., Palmer, R. E, & Hazuda, H. P. 
(2001). Ethnic differences in Mini-Mental State Examination 
(MMSE) scores: Where you live makes a difference. Journal of the 
American Geriatric Society, 49, 538-548. 

Espino, D. V., Lichtenstein, M. J., Palmer, R. E, & Hazuda, H. P. (2004). 
Evaluation of the Mini-Mental State Examination's internal con- 
sistency in a community-based sample of Mexican-American and 
European-American elders: Results from the San Antonio longitu- 
dinal study of aging. Journal of the American Geriatrics Society, 52, 
822-827. 

Feher, E. P., Mahurin, R. K., Doody, R. S., Cooke, N., Sims, J., & Piroz- 
zolo, F. J. (1992). Establishing the limits of the Mini-Mental State. 
Archives of Neurology, 49, 87-92. 

Fillenbaum, G. G, George, L. K., & Blazer, D. G. (1988a). Scoring 
nonresponse on the Mini-Mental State Examination. Psychologi- 
cal Medicine, 18, 1021-1025. 

Fillenbaum, G. G., Heyman, A., Wilkinson, W. E., & Haynes, C. S. 
(1987). Comparison of two screening tests in Alzheimer's disease. 
Archives of Neurology, 44, 924-927. 

Fillenbaum, G. G., Hughes, D. C, Heyman, A., George, L. K., et al. 
(1988b). Relationship of health and demographic characteristics 
to Mini-Mental State Examination scores among community res- 
idents. Psychological Medicine, 18, 719-726. 

Finley, W. W., Faux, S. E, Hutcheson, J., & Amstutz, L. (1985). Long- 
latency event-related potentials in the evaluation of cognitive 
function in children. Neurology, 35, 323-327. 

Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). "Mini-Mental
State." A practical method for grading the cognitive state of pa- 
tients for the clinician. Journal of Psychiatric Research, 12, 189-198. 

Folstein, M. F., Folstein, S. E., McHugh, P. R., & Fanjiang, G. (2001).
Mini-Mental State Examination: User's guide. Odessa, FL: PAR. 

Ford, G. R., Haley, W. E., Thrower, S. L., West, C. A. C., & Harrell, L. E.
(1996). Utility of Mini-Mental State Exam scores in predicting func- 
tional impairment among White and African American dementia 
patients. Journal of Gerontology: Medical Sciences, 51A, M185-M188. 

Foreman, M. D. (1987). Reliability and validity of mental status ques- 
tionnaires in elderly hospitalized patients. Nursing Research, 36, 
216-220. 

Foster, J. R., Sclan, S., Welkowitz, J., Boksay, 1., & Seeland, I. (1988). Psy- 
chiatric assessment in medical long-term care facilities: Reliability 
of commonly used rating scales. International Journal of Geriatric 
Psychiatry, 3, 229-233. 

Fountoulakis, K. N., Tsolaki, M., Chantzi, H., & Kazis, A. (2000). 
Mini Mental State Examination (MMSE): A validation study in 
Greece. American Journal of Alzheimer's Disease and Other De- 
mentias, 15, 342-345. 

Freidl, W, Schmidt, R., Stronegger, W. J., Fazekas, E, & Reinhart, B. 
(1996). Sociodemographic predictors and concurrent validity of 
the Mini Mental State Examination and the Mattis Dementia Rat- 
ing Scale. European Archives of Psychiatry and Clinical Neuro- 
science, 246, 317-319. 



Freidl, W., Stronegger, W.-J., Berghold, A., Reinhart, B., Petrovic, K„ 
& Schmidt, R. (2002). The agreement of the Mattis Dementia 
Rating Scale with the Mini-Mental State Examination. Interna- 
tional Journal of Psychiatry, 17, 685-686. 

Gallo, J. J., Rebok, G., & Lesikar, S. (1998). The driving habits of 
adults aged 60 years and older. Journal of the American Geriatrics 
Society, 47, 335-341. 

Giannakopoulos, P., Herrmann, F. R., Bussiere, T, Bouras, G, Kovari, E., 
Perl, D. P., Morrison, J. H., Gold, G, & Hof, P. R. (2003). Tangle and 
neuron numbers, but not amyloid load, predict cognitive status in 
Alzheimer's disease. Neurology, 60, 1495-1500. 

Giordani, B., Boivin, M. J., Hall, A. L., Foster, N. L., Lehtinen, S. J.,
Bluemlein, M. S., & Berent, S. (1990). The utility and generality of
Mini-Mental State Examination scores in Alzheimer's disease.
Neurology, 40, 1894-1896.

Grace, J., Nadler, J. D., White, D. A., Guilmette, T J., et al. (1995). Fol- 
stein vs modified Mini-Mental State Examination in geriatric 
stroke: Stability, validity, and screening utility. Archives of Neurol- 
ogy, 52, 477-484. 

Grut, M., Fraiglioni, L., Viitanen, M., & Winblad, B. (1993). Accuracy 
of the Mini-Mental Status Examination as a screening test for de- 
mentia in a Swedish elderly population. Acta Neurologica Scandi- 
navica, 87, 312-317. 

Helkala, E-L., Kivipelto, M., Hallikainen, M., Alhainene, K., Heinonen, 
H., Tuomilehto, J., Soininen, H., & Nissines, A. (2002). Usefulness 
of repeated presentation of Mini-Mental State Examination as a di- 
agnostic procedure — a population-based study. Acta Neurologica 
Scandinavica, 106, 341-346. 

Hill, R. D., & Backman, L. (1995). The relationships between the 
Mini-Mental State Examination and cognitive functioning in 
normal elderly adults: A componential analysis. Age Aging, 24, 
440-446. 

Hirschman, K. B., Xie, S. X., Feudtner, C, & Karlawish, J. H. T 
(2003). How does Alzheimer's disease patient's role in medical 
decision making change over time? Journal of Geriatric Psychiatry 
& Neurology, 17, 55-60. 

Hopp, G. A., Dixon, R. A., Backman, I., & Grut, M. (1997). Stability of 
two measures of cognitive functioning in nondemented old-old 
adults. Journal of Clinical Psychology, 53, 673-686. 

Ishizaki, J., Meguro, K., Ambo, H., Shimada, M., Yamaguchi, S., 
Hayasaka, C, Komatsu, H., Sekita, Y, & Yamadori, A. (1998). A 
normative, community-based study of Mini-Mental State in el- 
derly adults: The effect of age and educational level. Journal of 
Gerontology: Psychological Sciences, 53B, P359-P363. 

Iverson, G. L. (1998). Interpretation of Mini-Mental State Examina- 
tion scores in community-dwelling elderly and geriatric neu- 
ropsychiatry patients. International Journal of Geriatric Psychiatry, 
13, 661-666. 

Jefferson, A. L., Consentino, S. A., Ball, S. K., Bogdanoff, B., Kaplan, 
E., & Libon, D. J. (2002). Errors produced on the Mini-Mental 
State Examination and neuropsychological test performance in 
Alzheimer's disease, ischemic vascular dementia, and Parkinson's 
disease. Journal of Neuropsychiatry and Clinical Neurosciences, 14, 
311-320. 

Jones, R. N., & Gallo, J. J. (2000). Dimensions of the Mini-Mental 
State Examination among community dwelling older adults. Psy- 
chological Medicine, 30, 605-618. 

Jones, R. N., & Gallo, J. J. (2002). Education and sex differences in the 
Mini-Mental State Examination: Effects of differential item func- 
tioning. Journals of Gerontology: Series B: Psychological Sciences & 
Social Sciences, 57B, P548-P558. 






Jones, S., Laukka, E. J., Small, B. J., Fratiglioni, L., & Backman, L. 
(2004). A preclinical phase in vascular dementia: Cognitive im- 
pairment three years before diagnosis. Dementia & Geriatric Cog- 
nitive Disorders, 18, 233-239. 

Jones, T. G., Schinka, J. A., Vanderploeg, R. D., Small, B. J., Graves, A. B., 
& Mortimer, J. A. (2002). 3MS normative data for the elderly. 
Archives of Clinical Neuropsychology, 17, 171-177. 

Jorm, A. F., Scott, R., Henderson, A. S., & Kay, D. W. (1988). Educa-
tional level differences on the Mini-Mental State. Psychological 
Medicine, 18, 727-788. 

Karzmark, P. (2000). Validity of the serial seven procedure. Interna- 
tional Journal of Geriatric Psychiatry, 15, 677-679. 

Kim, S. Y. K., & Caine, E. D. (2002). Utility and limits of the Mini 
Mental State Examination in evaluating consent capacity in Al- 
zheimer's disease. Psychiatric Services, 53, 1322-1324. 

Kraemer, H. C, Moritz, D. J., & Yesavage, J. (1998). Adjusting Mini- 
Mental State Examination scores for age and education level to 
screen for dementia: Correcting bias or reducing variability. Inter- 
national Psychogeriatrics, 10, 43-51. 

Kupke, T., Revis, E. S., & Gantner, A. B. (1993). Hemispheric bias of 
the Mini-Mental State Examination in elderly males. The Clinical 
Neuropsychologist, 7, 210-214. 

Kuslansky, G., Katz, M., Verhese, J., Hall, C. B., Lapuerta, P., LaRuffa, 
G., & Lipton, R. B. (2004). Detecting dementia with the Hopkins 
learning test and the Mini-Mental State Examination. Archives of 
Clinical Neuropsychology, 19, 89-104. 

Lamarre, C. J., & Patten, S. B. (1991). Evaluation of the modified 
Mini-Mental State Examination in a general psychiatric popula- 
tion. Canadian Journal of Psychiatry, 36, 507-511.

Lemsky, C. M., Smith, G., Malec, J. R., & Ivnik, R. J. (1996). Identify- 
ing risk for functional impairment using cognitive measures: An 
application of CART modeling. Neuropsychology, 10, 368-375. 

Loewenstein, D. A., Barker, W. W., Harwood, D. G., Luis, C, Acevedo, 
A., Rodriguez, I., & Duara, R. (2000). Utility of a modified Mini- 
Mental State Examination with extended delayed recall in screen- 
ing for mild cognitive impairment and dementia among 
community dwelling elders. International Journal of Geriatric Psy- 
chiatry, 15, 434-440. 

Lopez, M. N., Charter, R. A., Mostafavi, B., Nibut, L. P., & Smith, W. E.
(2005). Psychometric properties of the Folstein Mini-Mental 
State Examination. Assessment, 12, 137-144. 

Marcopulos, B. A., & McLain, C. A. (2003). Are our norms "normal"? 
A 4-year follow-up study of a biracial sample of rural elders with 
low education. The Clinical Neuropsychologist, 17, 19-33. 

Marcopulos, B. A., McLain, C. A., & Giuliano, A. J. (1997). Cognitive 
impairment or inadequate norms? A study of healthy, rural, older 
adults with limited education. The Clinical Neuropsychologist, 11, 
111-131. 

Martin, E. M., Wilson, R. S., Penn, R. D., Fox, J. H.,
et al. (1987). Cortical biopsy results in Alzheimer's disease: Corre-
lation with cognitive deficits. Neurology, 37, 1201-1204. 

Meiran, N., Stuss, D. T., Guzman, A., Lafleche, G., & Willmer, J. (1996).
Diagnosis of dementia: Methods for interpretation of scores of 5 
neuropsychological tests. Archives of Neurology, 53, 1043-1054. 

Meyer, J. S., Li, Y-S., & Thornby, J. (2001). Validating Mini-Mental 
Status, cognitive capacity screening and Hamilton Depression 
scales utilizing subjects with vascular headaches. International 
Journal of Geriatric Psychiatry, 16, 430-435. 

Mitrushina, M., & Satz, P. (1991). Reliability and validity of the Mini- 
Mental State Exam in neurologically intact elderly. Journal of 
Clinical Psychology, 47, 537-543. 



Mitrushina, M., & Satz, P. (1994). Utility of Mini-Mental State Exam- 
ination in assessing cognition in the elderly. Paper presented to 
the International Neuropsychological Society, Cincinnati, OH. 

Molloy, D. W., Alemayehu, E., & Roberts, R. (1991). Reliability of a 
standardized Mini-Mental State Examination compared with the 
traditional Mini-Mental State Examination. American Journal of 
Psychiatry, 148, 102-105. 

Mulgrew, C. L., Morgenstern, N., Shtterly, S. M., Baxter, J., Baron, A. E., 
& Hamman, R. F. (1999). Cognitive functioning and impairment 
among rural elderly Hispanics and non-Hispanic Whites as as- 
sessed by the Mini- Mental State Examination. Journal of Geron- 
tology: Psychological Sciences, 54B, P223-230. 

Nadler, J. D., Relkin, N. R., Cohen, M. S., Hodder, R. A., Reingold, J., 
& Plum, F. (1995). Mental status testing in the elderly nursing 
home populations. Journal of Geriatric Neurology, 8, 177-183. 

Nilsson, E., Fastbom, J., & Wahlin, A. (2002). Cognitive functioning in 
a population-based sample of very old non-demented and non- 
depressed persons: The impact of diabetes. Archives of Gerontology 
and Geriatrics, 35, 95-105. 

Nguyen, H. T, Black, S. A., Ray, L., Espino, D. V, & Markides, K. S. 
(2003). Cognitive impairment and mortality in older Mexican 
Americans. Journal of the American Geriatrics Society, 51, 178-183. 

Nyenhuis, D. L., Gorelick, P. B„ Freels, S., & Garron, D. C. (2002). 
Cognitive and functional decline in African Americans with VaD, 
AD, and stroke without dementia. Neurology, 58, 56-61. 

Nys, G. M. S., van Zandvoort, M. J. E., de Kort, P. L. M., Jansen, B. P. W., 
Kappelle, L. J., & de Haan, E. H. F. (2005). Restrictions of the 
Mini-Mental State Examination in acute stroke. Archives of Clini- 
cal Neuropsychology, 20, 623-629. 

O'Connell, M. E., Tuokko, H., Graves, R. E., & Kadlec, H. (2004). 
Correcting the 3MS for bias does not improve accuracy when 
screening for cognitive impairment or dementia. Journal of Clini- 
cal and Experimental Neuropsychology, 26, 970-980. 

O'Connor, D. W., Pollitt, P. A., Treasure, F. P., Brook, C. P. B., & Reiss, 
B. B. (1989). The influence of education, social class and sex on 
Mini-Mental State scores. Psychological Medicine, 19, 771-776. 

Olin, J. T., & Zelinski, E. M. (1991). The 12-month stability of the Mini-
Mental State Examination. Psychological Assessment, 3, 427-432.

Ouvrier, R. A., Goldsmith, R. E, Ouvrier, S., & Williams, I. C. (1993). 
The value of the Mini-Mental State Examination in childhood: A 
preliminary study. Journal of Child Neurology, 8, 145-149. 

Palmer, K., Wang, H-X., Backman, L., Winblad, B., & Fratiglioni, L. 
(2002). Differential evolution of cognitive impairment in nonde- 
mented older persons: Results from the Kungholmen project. 
American Journal of Psychiatry, 159, 436-442. 

Pasquier, F., Richard, F., & Lebert, F. (2004). Natural history of fron-
totemporal dementia: Comparison with Alzheimer's disease. De- 
mentia & Geriatric Cognitive Disorders, 17, 253-257. 

Pearlson, G. D., & Tune, L. E. (1986). Cerebral ventricular size and 
cerebrospinal fluid acetylcholinesterase levels in senile dementia 
of the Alzheimer type. Psychiatry Research, 17, 23-29.

Perna, R. B., Bordini, E. J., & Boozer, R. H. (2001). Mini-Mental Sta- 
tus Exam: Concurrent validity with the Rey Complex Figure Test 
in a geriatric population with depression. The Journal of Cognitive 
Rehabilitation, 19, 24-29. 

Ruchinskas, R. A., & Curyto, K. J. (2003). Cognitive screening in geri- 
atric rehabilitation. Rehabilitation Psychology, 48, 14-22. 

Salmon, D. P., Thal, L. J., Butters, N., & Heindel, W. C. (1990). Longi-
tudinal evaluation of dementia of the Alzheimer type: A compar- 
ison of 3 standardized mental status examinations. Neurology, 40, 
1225-1230. 






Shadlen, M-E, Larson, E. B„ Gibbons, L., McCormick, W. C, & Teri, L. 
(1999). Alzheimer's disease symptom severity in blacks and 
whites. Journal of the American Geriatrics Society, 47, 482-486. 

Shah, A., Phongsathorn, V., George, C., Bielawska, C., & Katona, C. 
(1992). Psychiatric morbidity among continuing care geri- 
atric inpatients. International Journal of Geriatric Psychiatry, 7, 
517-525. 

Sliwinski, M., Buschke, H., Stewart, W. E, Masur, D., & Lipton, R. D. 
(1997). The effect of dementia risk factors on comparative and 
diagnostic selective reminding norms. Journal of the International 
Neuropsychological Society, 3, 317-326. 

Small, B. J., Herlitz, A., Fratiglioni, L., Almkvist, O., & Backman, L. 
(1997a). Cognitive predictors of incident Alzheimer's disease: A 
prospective longitudinal study. Neuropsychology, 11, 413-420. 

Small, B. J., Viitanen, M., Winblad, B., & Backman, L. (1997b). Cog- 
nitive changes in very old persons with dementia: The influence 
of demographic, psychometric, and biological variables. Journal 
of Clinical and Experimental Neuropsychology, 19, 245-260. 

Starr, J. M., Whalley, L. J., Inch, S., & Shering, P. A. (1992). The quan- 
tification of the relative effects of age and NART-predicted IQ on 
cognitive function in healthy old people. International Journal of 
Geriatric Psychiatry, 7, 153-157. 

Stern, Y., Tang, M-X., Albert, M. S., Brandt, J., Jacobs, D. M., Bell, K., 
Marder, K., Sano, M., Devanand, D., Albert, S. M., Bylsma, F., &
Tsai, W-Y. (1997). Predicting time to nursing home care and death in in-
dividuals with Alzheimer disease. Journal of the American Medical 
Association, 277, 806-812. 

Stout, J. C, Jernigan, T. L., Archibald, S. L., & Salmon, D. P. (1996). 
Association of dementia severity with cortical gray matter and 
abnormal white matter volumes in dementia of the Alzheimer 
type. Archives of Neurology, 53, 742-749. 

Tan, J. (2004). Influence of impending death on the MMSE. MSc The- 
sis, University of Victoria. 

Taussig, I. M., Mack, W. J., & Henderson, V. W. (1996). Concur- 
rent validity of Spanish-language versions of the Mini-Mental 
State Examination, Mental Status Questionnaire, Information- 
Concentration Test, and Orientation-Memory-Concentration 
Test: Alzheimer's disease patients and non-demented elderly 
comparison subjects. Journal of the International Neuropsycho- 
logical Society, 2, 286-298. 

Teng, E. L., & Chui, H. C. (1987). The modified Mini-Mental State 
(3MS) Examination. Journal of Clinical Psychiatry, 48, 314-318. 

Teng, E. L., Chiu, H. C, Schneider, L. S., & Metzger, L. E. (1987). 
Alzheimer's dementia: Performance on the Mini-Mental State Ex- 



amination. Journal of Consulting and Clinical Psychology, 55, 
96-100. 

Teresi, J. A., Holmes, D., Ramirez, M., Gurland, B. J., & Lantigua, R. 
(2001). Performance of cognitive tests among different racial/eth- 
nic and education groups: Findings of differential item function- 
ing and possible item bias. Journal of Mental Health and Aging, 7, 
79-89. 

Tombaugh, T N. (2005). Test-retest reliable coefficients and 5-year 
change scores for the MMSE and the 3MS. Archives of Clinical 
Neuropsychology, 20, 485-503. 

Tombaugh, T. N. (in press). How much change in the MMSE and 
3MS is a significant change? Psychological Assessment. 

Tombaugh, T. N., McDowell, I., Kristjansson, B., & Hubley, A. M.
(1996). Mini-Mental State Examination (MMSE) and the modi- 
fied MMSE (3MS): A psychometric comparison and normative 
data. Psychological Assessment, 8, 48-59. 

Tombaugh, T. N., & McIntyre, N. J. (1992). The Mini-Mental State
Examination: A comprehensive review. Journal of American Geri- 
atric Society, 40, 922-935. 

Tsai, L., & Tsuang, M. T. (1979). The Mini-Mental State Test and 
computerized tomography. American Journal of Psychiatry, 136, 
436-439. 

Tschanz, J. T, Welsh-Bohmer, K. A., Plassman, B. L., Norton, M. C, 
Wyse, B. W., & Breitner, J. C. S. (2002). An adaptation of the mod- 
ified Mini-Mental State Examination: Analysis of demographic 
influences and normative data. Neuropsychiatry, Neuropsychology, 
and Behavioral Neurology, 15, 28-38. 

Uhlmann, R. F., Teri, L., Rees, T. S., Mozlowski, K. J., & Larson, E. B. 
(1989). Impact of mild to moderate hearing loss on mental 
status testing. Journal of the American Geriatrics Society, 37, 
223-228. 

Van Der Cammen, T J. M., Van Harskamp, E, Stronks, D. L., Pass- 
chier, J., & Schudel, W. J. (1992). Value of the Mini-Mental 
State Examination and informants' data for the detection of de- 
mentia in geriatric outpatients. Psychological Reports, 71, 1003- 
1009. 

Van Gorp, W. G., Marcotte, T D., Sultzer, D., Hinkin, C, Mahler, M., 
& Cummings, J. L. (1999). Screening for dementia: Comparison 
of three commonly used instruments. Journal of Clinical and Ex- 
perimental Neuropsychology, 21, 29-38. 

Wells, J. C., Keyl, P. M., Aboraya, A., Folstein, M. F., & Anthony, J. C.
(1992). Discriminant validity of a reduced set of Mini-Mental 
State Examination items for dementia and Alzheimer's disease. 
Acta Psychiatrica Scandinavica, 86, 23-31. 



National Adult Reading Test (NART) 



OTHER TEST NAMES 

There are three main versions of this test (see Table 6-59). 
Nelson (1982) developed the NART using a British sample 
and the original WAIS. For the second edition (NART-2), the 
test was restandardized for use with the WAIS-R (Nelson & 
Willison, 1991). The test has also been adapted for use in the 
United States (American National Adult Reading Test, AM- 
NART: Grober & Sliwinski, 1991) and in the United States 
and Canada (North American Adult Reading Test, NAART: 
Blair & Spreen, 1989). 



PURPOSE 

The purpose of the test is to provide an estimate of premorbid 
intellectual ability. 

SOURCE 

The NART-2 (word card and booklet, manual including pro- 
nunciation guide, and scoring forms) can be purchased from 
NFER- Nelson, Unit 28, Bramble Road, Techno Trading Centre, 
Swindon, Wiltshire, SNZ 8E2, at a cost of £74.25 sterling + vat. 






Table 6-59 NART Versions

Test       Test Predicted                        Number of Items
NART-2     WAIS-R, FAS, PASAT, Raven's SPM       50
NAART      WAIS-R, Vocabulary                    61
AMNART     WAIS-R                                45



Figure 6-9 The New Adult Reading Test-2. Source: Extract from 
the National Adult Reading Test. Hazel S. Nelson, 1991, by 
permission of the publishers, NFER-Nelson. 



There is no commercial source for the NAART or the AM-
NART. Users may refer to the following figures to prepare their own
materials (Figures 6-10 and 6-11).

AGE RANGE 

The test can be used with individuals aged 18 and older. 

DESCRIPTION 

There are a number of clinical, medico-legal, or research situ- 
ations where knowledge of premorbid cognitive ability (e.g., 
IQ) is essential. Since premorbid test data are rarely available, 
methods of estimation are needed. The National Adult Read- 
ing Test or NART-2 (Nelson, 1982; Nelson & O'Connell, 
1978; Nelson & Willison, 1991), a reading test of 50 irregu- 
larly spelled words (e.g., ache, naive, thyme), has promise as 
an assessment tool for the determination of premorbid in- 
tellectual function (Figure 6-9). Assuming that the patient 
is familiar with the word, accuracy of pronunciation is used 
to predict IQ. As the words are short, patients do not have 
to analyze a complex visual stimulus, and because they are 
irregular, phonological decoding or intelligent guesswork 
will not provide the correct pronunciation. Therefore, it has 
been argued that performance depends more on previous 
knowledge than on current cognitive capacity (Nelson & O'- 
Connell, 1978). The value of the test lies in (a) the high corre- 
lation between reading ability and intelligence in the normal 
population (Crawford, Stewart, Cochrane, et al., 1989), (b) 
the fact that word reading tends to produce a fairly accu- 
rate estimate of preinjury IQ (Crawford et al., 2001; Moss & 
Dowd, 1991), and (c) the fact that the ability to pronounce 
irregular words is generally retained in mildly demented in- 
dividuals (Crawford, Parker, et al., 1988; Fromm et al., 1991; 
Sharpe & O'Carroll, 1991; Stebbins, Gilley, et al., 1990; see
Validity). It is important to note that the NART and its vari- 
ous versions (see Table 6-59), as of this writing, can only be 
used to predict WAIS-R IQ. To predict WAIS-III scores, the 
reader is referred to the reviews of the WTAR and WAIS III 
and the introduction to Chapter 6, which also considers meth- 
ods for premorbid estimation (General Cognitive Function- 
ing, Neuropsychological Batteries and Assessment of Premorbid 
Intelligence). 



Figure 6-9 word list (50 items): Ache, Procreate, Leviathan, Debt, Quadruped, Aeon, Psalm, Catacomb, Detente, Depot, Superfluous, Gauche, Chord, Radix, Drachm, Bouquet, Assignate, Idyll, Deny, Gist, Beatify, Capon, Hiatus, Banal, Heir, Simile, Sidereal, Aisle, Rarefy, Puerperal, Subtle, Cellist, Topiary, Nausea, Zealot, Demesne, Equivocal, Abstemious, Campanile, Naive, Gouge, Labile, Thyme, Placebo, Syncope, Courteous, Facade, Prelate, Gaoled, Aver





Nelson (1982) developed the test in England for use with 
the WAIS. Nelson and Willison (1991) restandardized the 
NART (NART-2) on a British sample so that it is possible to 
convert NART-2 scores directly to WAIS-R scores. Ryan and 
Paolo (1992) also standardized the NART for the WAIS-R, 
using an American sample of people 75 years and older. 
Blair and Spreen (1989) modified the test for use with North 
American populations (NAART) and validated it against the 
WAIS-R. The NAART consists of a list of 61 words printed 
in two columns on both sides of an 8.5" x 11" card, which is
given to the subject to read. The examiner records errors on 
a NAART scoring sheet. A sample scoring sheet along with 
the correct pronunciations is given in Figure 6-10. A short 
version, the NAART35, has recently been developed by Uttl 
(2002) and appears to provide a reliable and valid measure 
of verbal intelligence. The 35 items comprising this version 
are highlighted in Figure 6-10. Grober and Sliwinski (1991) 
also developed their own North American version, the AM- 
NART, which consists of 45 items. The items are shown in 
Figure 6-11. 

In addition to these three main versions, other modifica- 
tions have also appeared in the literature. An abbreviated 
NART (Short NART) is described later, which is based on the 
first half of the test (Beardsall & Brayne, 1990). Another useful 
modification is to place the words into sentences (e.g., the 
Cambridge Contextual Reading Test or CCRT: Beardsall & 
Huppert, 1994; C-AUSNART: Lucas et al., 2003), since the 






Figure 6-10 NAART and NAART35 sample scoring sheet. Pronunciation symbols follow
Webster's. Single asterisk indicates correct U.S. pronunciation only. Double asterisks indicate 
correct Canadian pronunciation only. Items in bold comprise the NAART35. 



NAART Sample Scoring Sheet (words listed with their accepted Webster-style pronunciations on the original sheet)

Page 1: DEBT, SUBPOENA, DEBRIS, PLACEBO, AISLE, PROCREATE, REIGN, PSALM, DEPOT, BANAL, SIMILE, RAREFY, LINGERIE, GIST, RECIPE, CORPS, GOUGE, HORS D'OEUVRE, HEIR, SIEVE, SUBTLE, HIATUS, CATACOMB, GAUCHE, BOUQUET, ZEALOT, GAUGE, PARADIGM, COLONEL, FACADE

Page 2: CELLIST, LEVIATHAN, INDICT, PRELATE, DETENTE, QUADRUPED, IMPUGN, SIDEREAL, CAPON, ABSTEMIOUS, RADIX, BEATIFY, AEON, GAOLED, EPITOME, DEMESNE, EQUIVOCAL, SYNCOPE, REIFY, ENNUI, INDICES, DRACHM, ASSIGNATE, CIDEVANT, TOPIARY, EPERGNE, CAVEAT, VIVACE, TALIPES, SUPERFLUOUS, SYNECDOCHE



provision of semantic and syntactic cues (context) results in a 
larger number of words being read correctly and, hence, in a 
higher estimate of IQ, particularly among demented people 
and poor- to-average readers (Beardsall & Huppert, 1994; Con- 
way & O'Carroll, 1997; Watt & O'Carroll, 1999). Beardsall 
(1998) has developed an equation to convert CCRT errors to 
WAIS-R Verbal IQ scores based on a relatively small sample 
(73) of healthy British individuals aged 70 years and older. 
The inclusion of demographic information (education, gen- 
der) improved prediction. The CCRT as well as a conversion 
table (to convert CCRT errors to Verbal IQ) is available upon 
request from the author (L. Beardsall, School of Psychology, 



University of Birmingham, Edgbaston, Birmingham, B15 
2TT, UK). The test has not yet been adjusted for use in North 
America. 

ADMINISTRATION 

NART-2 

See Source. Briefly, the patient is presented with the word card 
and is instructed to read each word. Because the reading of 
words in a list format may be confusing for some subjects, 
the NART-2 is available in booklet format with each word 






Figure 6-11 AMNART list of words. Source: From Grober &
Sliwinski, 1991. Reprinted with permission of the authors and
Psychology Press.

ACHE, CHASSIS, AISLE, CELLIST, CAPON, ALGAE, DEBT, SUPERFLUOUS, CHORD, CHAMOIS, HEIR, THYME, DENY, APROPOS, BOUQUET, VIRULENT, CAPRICE, ZEALOT, GAUGE, FACADE, WORSTED, CABAL, DEPOT, ABSTEMIOUS, NAUSEA, DETENTE, NAIVE, SCION, SUBTLE, PAPYRUS, PUGILIST, QUADRUPED, FETAL, PRELATE, BLATANT, EPITOME, PLACEBO, BEATIFY, HIATUS, HYPERBOLE, SIMILE, IMBROGLIO, MERINGUE, SYNCOPE, SIEVE

Table 6-60 Conversion Table for Predicted Full NART
Error Score From Short NART Score

Short NART Correct Score    Predicted Full NART Error Score
0-11                        As in full NART (50 minus correct)
12                          38
13                          36
14                          34
15                          33
16                          31
17                          30
18                          28
19                          26
20                          24
21+                         As in full NART (50 minus correct)

Note: Compute the number of correct words in the Short NART. If 0 to 20
are correct, then do not continue to the full NART. If 21 to 25 are
correct, then continue to the full NART. These scores can then be
converted to predicted IQ scores using the appropriate equations.

Source: From Beardsall & Brayne, 1990. Copyright British Psycholog-
ical Society. Reprinted with permission.



displayed in large print on a separate card. The reading of
words is paced by requiring the patient to pause between
words until the examiner calls, "next."

NAART/NAART35

The instructions are shown in Figure 6-12.



Short NART 

An optional criterion for discontinuation of the NART-2 
(14 incorrect in 15 consecutive responses) is presented in 
the test manual. However, Beardsall and Brayne (1990) have 
reported that, in a sample of elderly subjects, this criterion 
was rarely met. To reduce anxiety or distress in people with 
poor reading skills, they developed an equation that esti- 
mates a person's score on the second half of the NART 
(items 26-50) from the first half (the Short NART). If a pa- 
tient scores less than 12 correct on the Short NART, this is 
taken as the total correct score, since Beardsall and Brayne 
(1990) showed that people who score 0 to 11 correct are un-
likely to add to their score by completing the second half of 
the test. For those scoring between 12 and 20, a conversion 
table (Table 6-60) is used to predict the full error score. For 
people scoring more than 20 correct, the complete NART is 
administered. The accuracy of the Short NART in estimat- 
ing premorbid IQ has been found to be virtually equivalent 
to the full NART in a cross-validation study (Crawford 
et al., 1991).
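
For illustration, the discontinuation and conversion rules described above can be expressed as a small Python function. This is a minimal sketch based on Table 6-60 and the Beardsall and Brayne (1990) rules as summarized here; the function and variable names are ours.

def full_nart_errors(short_correct, full_correct=None):
    """Predicted full NART error score (out of 50) from the Short NART (items 1-25)."""
    # Table 6-60 conversion values for 12-20 correct on the Short NART.
    short_to_full_errors = {12: 38, 13: 36, 14: 34, 15: 33, 16: 31,
                            17: 30, 18: 28, 19: 26, 20: 24}
    if short_correct <= 11:
        # Scores of 0-11 correct are taken as the total correct score.
        return 50 - short_correct
    if short_correct <= 20:
        return short_to_full_errors[short_correct]
    # 21 or more correct: the complete NART should be administered.
    if full_correct is None:
        raise ValueError("Administer the full NART and supply full_correct.")
    return 50 - full_correct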



AMNART 

For the AMNART, the person is asked to read each word 
aloud. If the individual changes a response, the last response is 
scored as correct or not correct. Boekamp et al. (1995) re- 
ported that no discontinuation rule could be established for 
the AMNART because word difficulty order was not invariant 
across different samples. 

ADMINISTRATION TIME 

The approximate time required is 10 minutes. 

SCORING 

The use of a pronunciation guide and a tape recorder is rec- 
ommended to facilitate scoring. Each incorrectly pronounced 
word counts as one error. Slight variations in pronunciation 
are acceptable when they are due to regional accents. The total 
number of errors is tabulated. 

NART-2 and Short NART 

These scores can be converted to WAIS-R VIQ, PIQ, and FSIQ 
using a table provided in the manual (Nelson & Willison, 
1991). Note that Short NART scores must be converted to full 
NART-2 scores. An additional table is provided in the test 




Figure 6-12 Instructions for NAART and NAART35.



"I want you to read slowly down this list of words starting here (indicate 'debt') and continuing down this column and on to 
the next. When you have finished reading the words on the page, turn the page over and begin here" (indicate top of sec- 
ond page). 

"After each word please wait until I say 'Next' before reading the next word." 

"I must warn you that there are many words that you probably won't recognize; in fact, mosf people don't know them, so just 
guess at these, OK? Go ahead." 

The examinee should be encouraged to guess, and all responses should be reinforced ("good," "that's fine," etc.). The exami- 
nee may change a response if he or she wishes to do so but if more than one version is given, the examinee must decide on 
the final choice. No time limit is imposed. 



manual to evaluate whether the predicted minus obtained 
discrepancy is unusual. Alternatively, examiners working with 
American patients can use Ryan and Paolo's (1992) equations 
to predict WAIS-R IQ: 

Estimated VIQ = 132.3893 + (NART-2 errors)(-1.164)

Estimated PIQ = 123.0684 + (NART-2 errors)(-0.823)

Estimated FSIQ = 131.3845 + (NART-2 errors)(-1.124)

The standard errors of estimate (SEEs) are 7.70, 12.08, and
8.83 for WAIS-R VIQ, PIQ, and FSIQ, respectively. Note that
the regression equations were developed on a sample of nor- 
mal, community-dwelling subjects, 75 years and older. The 
inclusion of education (plus NART errors) in the equation 
did not significantly improve prediction. Carswell et al. (1997) 
cross-validated the equation for VIQ and also noted that de- 
mographic variables (age and education) failed to signifi- 
cantly improve the accuracy of postdiction. 

Wiltshire et al. (1991), however, reported that combining demographic
variables with the NART provides a substantially better
estimate of premorbid cognitive functioning than that given
by the NART or by demographic information alone. The
equation (appropriate for subjects between the ages of 55 and 
69 years) is as follows: 

Estimated IQ = 123.7 - 0.8 (NART errors) + 3.8 (education) - 7.4 (gender)

To use this equation, note that educational level comprised 
the following five categories: (1) some primary school, (2) 
some secondary school, (3) some secondary school plus trade 
qualifications, (4) secondary school completed, and (5) terti- 
ary education begun. Gender was assigned as males = 1 and 
females = 2. 
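
As a worked illustration, the Ryan and Paolo (1992) and Wiltshire et al. (1991) equations above can be applied directly. The Python sketch below simply restates them (the function names are ours), and the sample restrictions noted in the text still apply.

def ryan_paolo_wais_r(nart2_errors):
    """Estimated WAIS-R IQs from NART-2 errors (Ryan & Paolo, 1992; ages 75+)."""
    return {
        "VIQ": 132.3893 - 1.164 * nart2_errors,   # SEE = 7.70
        "PIQ": 123.0684 - 0.823 * nart2_errors,   # SEE = 12.08
        "FSIQ": 131.3845 - 1.124 * nart2_errors,  # SEE = 8.83
    }

def wiltshire_iq(nart_errors, education_category, gender_code):
    """Wiltshire et al. (1991) estimate for ages 55-69.
    education_category is coded 1-5 and gender_code 1 (male) or 2 (female),
    as defined in the text above."""
    return 123.7 - 0.8 * nart_errors + 3.8 * education_category - 7.4 * gender_code

For example, 20 NART-2 errors yield an estimated WAIS-R FSIQ of about 109 (131.3845 - 1.124 x 20) by the Ryan and Paolo formula.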

NAART/NAART35 

NAART equations (Blair & Spreen, 1989) to predict WAIS-R 
VIQ, PIQ, and FSIQ are as follows: 



Estimated VIQ = 128.7 - .89 (NAART errors)

Estimated PIQ = 119.4 - .42 (NAART errors)

Estimated FSIQ = 127.8 - .78 (NAART errors)

The SEEs for VIQ, PIQ, and FSIQ are 6.56, 10.67, and 7.63,
respectively.
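For illustration, a hypothetical examinee making 30 NAART errors would yield estimates of approximately VIQ = 128.7 - 26.7 = 102, PIQ = 119.4 - 12.6 = 107, and FSIQ = 127.8 - 23.4 = 104 (values rounded).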

NAART equations (Uttl, 2002) to predict WAIS-R Vocabu- 
lary are: 

Estimated Vocabulary (raw score)
= 31.30 + 0.622 (NAART correct) (SEE = 5.14)

Estimated Vocabulary (raw score)
= 25.71 + 0.566 (NAART correct) + 0.508 (Education) (SEE = 5.02)

Estimated Vocabulary Scaled Score
= 5.383 + 0.179 (NAART) (SEE = 1.71)

Estimated Vocabulary Scaled Score
= 4.112 + 0.167 (NAART) + 0.115 (Education) (SEE = 1.69)

For the NAART35, Uttl (2002) provides the following equa- 
tions to predict WAIS-R Vocabulary: 

Estimated Vocabulary (raw score)
= 38.67 + 0.811 (NAART35 correct) (SEE = 5.11)

Estimated Vocabulary (raw score)
= 32.50 + 0.740 (NAART35 correct) + 0.500 (Education) (SEE = 5.00)

Estimated Vocabulary Scaled Score
= 7.52 + 0.233 (NAART35) (SEE = 1.71)

Estimated Vocabulary Scaled Score
= 6.12 + 0.217 (NAART35) + 0.114 (Education) (SEE = 1.69)

The equation to predict NAART from the NAART35 is: 

NAART correct = 12.39 + 1.282 (NAART35) (SEE = 1.54)
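For illustration, a hypothetical examinee with 25 NAART35 words correct and 14 years of education would have an estimated Vocabulary raw score of 32.50 + 18.50 + 7.00 = 58, an estimated Vocabulary scaled score of 6.12 + 5.43 + 1.60 = 13 (rounded), and an estimated full NAART score of 12.39 + 32.05 = 44 words correct (rounded).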






AMNART 

For the AMNART (Grober & Sliwinski, 1991), the equation to 
estimate WAIS-R VIQ is: 

Estimated VIQ
= 118.2 - .89 (AMNART errors) + .64 (years of education)

The SEE is 7.94.
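For illustration, a hypothetical patient with 12 years of education who makes 20 AMNART errors would have an estimated premorbid VIQ of 118.2 - 17.8 + 7.7 = 108 (rounded).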

DEMOGRAPHIC EFFECTS 

As might be expected, NART (NART-2, NAART) errors sys- 
tematically decrease with increasing FSIQ (Wiens et al., 1993). 
NART (NAART, AMNART) performance is correlated with 
years of education and social class. Age, gender, and ethnicity 
(Caucasian versus African American) have little effect on perfor- 
mance (Anstey et al., 2000; Beardsall, 1998; Boekamp et al., 1995; Cockburn et al., 2000; Crawford, Stewart, et al., 1988; Freeman & Godfrey, 2000; Graf & Uttl, 1995; Grober & Sliwinski, 1991; Ivnik et al., 1996; Nelson, 1982; Nelson & Willison, 1991; Starr et al., 1992; Storandt et al., 1995; Wiens et al., 1993), although when a wide age range is studied (well-educated healthy individuals aged 16-84 years), an age-related increase in "correct" NAART scores appears to emerge (see Table 6-61) (Graf & Uttl, 1995; Parkin & Java, 1999; Uttl, 2002; Uttl & Graf, 1997), due largely to the relatively weak performance of young adults.

While cross-sectional studies do not suggest significant de- 
cline in NART (all versions) scores in older adults, a recent lon- 
gitudinal study (Deary et al., 1998) showed evidence of decline 
with aging. In that study, 387 healthy elderly people were tested 
with the NART-2 at baseline and followed up four years later. 
NART-estimated IQ fell by a mean of 2.1 points over four years.
Further, the amount of decline was differentially related to initial 
cognitive status, social class, and education. Those with higher 
baseline ability, in higher social groups, with more education, 
and who were younger were relatively protected from decline. 

INTERPRETIVE GUIDELINES 

Prediction of Wechsler Intelligence Scores 

NART-2. Nelson and Willison (1991) have provided a 
discrepancy table to determine the probability of a chance oc- 



currence of a discrepancy in favor of NART-2-estimated IQ 
over observed WAIS-R IQ. The equations are based on a 
British sample. For North American samples, Ryan and Paolo 
(1992) developed regression equations to predict WAIS-R IQs 
from NART error scores (see Scoring), but they did not pro- 
vide corresponding discrepancy tables. 

Willshire et al. (1991) have developed regression equations 
to predict WAIS-R IQs for use with adults, aged 55 to 69 years, 
based on a combination of NART errors and demographic 
variables. The equation (see Scoring) is based on an Australian 
sample. However, standard errors of estimate and discrepancy 
tables are not provided. 

Crawford, Stewart, Parker, et al. (1989) have also developed 
regression equations that combine NART errors and demo- 
graphic variables to predict premorbid IQ. Unfortunately, 
these equations are based on the WAIS, not the newer versions 
(WAIS-R, WAIS-III). 

Because there are limitations in estimating premorbid 
ability from NART scores (e.g., regression to the mean, lim- 
ited range of scores, vulnerability of performance to disease), 
Crawford et al. (1990) have developed a regression equation 
to predict NART scores from demographic variables (years of 
education, social class, age, and gender). 1 The equation is as 
follows: 



Predicted NART error score
= 37.9 - 1.77 (education) + 2.7 (class) - 0.07 (age) - 0.03 (gender); SE est = 6.93



The equation can also be downloaded from Dr. Crawford's 
site www.psyc.abdn.ac.uk. The equation allows the clinician 
to compare a current NART score against a predicted score. A 
large discrepancy between the predicted and obtained scores 
(obtained error score more than 11.4 points over the pre- 
dicted score) suggests impaired NART performance and alerts 
the clinician to the fact that the NART will not provide an ac- 
curate estimate of premorbid ability. 
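The same logic can be scripted; the following minimal Python sketch (ours, and dependent on the coefficients as transcribed above) returns the demographic prediction and flags obtained error scores that exceed it by more than 11.4 points. The function names and example values are illustrative only.

    def predicted_nart_errors(education_years, social_class, age, gender):
        # Crawford et al. (1990): social class coded 1-5 (see Note 1); gender coded male = 1, female = 2
        return 37.9 - 1.77 * education_years + 2.7 * social_class - 0.07 * age - 0.03 * gender

    def nart_below_expectation(obtained_errors, education_years, social_class, age, gender):
        # True when the obtained error score exceeds the demographic prediction by more than
        # 11.4 points, suggesting the NART will underestimate premorbid ability
        predicted = predicted_nart_errors(education_years, social_class, age, gender)
        return (obtained_errors - predicted) > 11.4

    # A hypothetical 70-year-old woman (gender = 2), social class 3, 10 years of education,
    # who makes 36 errors: predicted score is about 23.3, discrepancy about 12.7, so the flag is True
    print(nart_below_expectation(36, 10, 3, 70, 2))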

NAART. Blair and Spreen (1989) recommended that for 
PIQ, a positive discrepancy of at least 21 points between esti- 
mated and actual IQs indicates the possibility of deteriora- 
tion. For VIQ and FSIQ, a positive discrepancy of 15 or more 
points between estimated and actual IQ scores indicates the 



Table 6-61 NAART and NAART35 Mean Number Correct by Age Group:
Performance on NAART and NAART35 by Midpoint Overlapping Age Groups

Midpoint   Range    N     NAART M   SD      NAART35 M   SD
20         18-25    52    38.46     9.29    20.60       6.63
25         20-30    63    39.90     8.30    21.76       6.04
30         25-35    55    39.44     8.57    21.39       6.32
35         30-40    51    38.58     9.33    20.59       7.11
40         35-45    51    40.05     10.87   21.80       8.48
45         40-50    59    40.20     10.79   21.72       8.29
50         45-55    68    41.57     9.16    22.75       7.27
55         50-60    62    42.88     8.33    23.76       6.66
60         55-65    59    44.38     8.26    24.84       6.32
65         60-70    56    42.28     9.31    22.90       7.31
70         65-75    57    43.15     9.42    23.70       7.59
75         70-80    52    43.55     8.84    24.18       7.08
80         75-91    48    43.82     8.09    24.51       6.59

Source: Adapted from Uttl, 2002.






possibility of intellectual deterioration or impairment (based 
on the calculation of 95% confidence levels). Wiens et al. (1993) 
reported that only about 10% of normal individuals have 
predicted-obtained FSIQ differences as large as 15 points. Thus, 
a difference of this size is infrequent enough among healthy 
people to merit clinical attention (see also Berry et al., 1994). 
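For illustration, if the NAART-estimated FSIQ for a hypothetical patient is 112 and the obtained WAIS-R FSIQ is 95, the 17-point positive discrepancy exceeds the 15-point criterion and raises the possibility of intellectual deterioration.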

To evaluate whether the obtained NAART score is within 
the expected range, Uttl (2002) has developed NAART predic- 
tion equations based on age as well as age and education. The 
equations are as follows: 

NAART correct = 36.60 + 0.0925 (Age) (SEE = 9.15)
NAART correct = 14.07 + 1.518 (Education) + 0.071 (Age) (SEE = 8.38)
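For illustration, a hypothetical 70-year-old with 16 years of education would be expected to obtain about 14.07 + 24.29 + 4.97 = 43 NAART words correct (rounded); an obtained score falling far below this value would suggest that the NAART is underestimating that person's premorbid ability.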

AMNART. Grober and Sliwinski (1991) recommend a dis- 
crepancy of 10 or more points between estimated and obtained 
VIQ values to suggest the possibility of intellectual decline. 
However, Boekamp et al. (1995) suggest caution in applying 
this rule; in their study, such a heuristic resulted in consider- 
able overlap between participants who were demented and 
those who were not. Ivnik et al. (1996) have provided age- and 
education-based normative information derived from a sample 
of older adults (aged 55+), almost exclusively Caucasian, living 
in an economically stable region of the United States (as part of 
the MOANS project). Their data are shown in Table 6-62 and 
may help the examiner to determine whether the obtained score 
provides a reasonable estimate of premorbid ability (see also 
Comment). 



Caveat. It is important to note the ranges of possible pre- 
dicted WAIS-R scores. The range of possible NAART-predicted 
IQs is 129 to 74 for the Verbal Scale, 119 to 94 for the Perfor- 
mance Scale, and 128 to 80 for the Full Scale (Wiens et al, 
1993). The range of possible NART-predicted IQs is 132 to 74 
for the Verbal Scale, 123 to 82 for the Performance Scale, and 
131 to 75 for the Full Scale (Ryan & Paolo, 1992). The range of 
possible VIQ estimates for the AMNART is about 131 to 83 
(Grober & Sliwinski, 1991). Thus, there is truncation of the
spread of predicted IQs on either end of the distribution, lead- 
ing to unreliable estimates for individuals at other than average 
levels of ability (Ryan & Paolo, 1992; Wiens et al, 1993). 

Prediction of Other Cognitive Tests 

The majority of research on the NART has focussed on esti- 
mating premorbid intelligence and has employed the Wechsler 
intelligence scales as the criterion variable. NART equations 
are, however, also available for estimating premorbid perfor- 
mance on other cognitive tasks (see Table 6-59), including the 
FAS verbal fluency task (Crawford et al, 1992), the PASAT 
(Crawford et al., 1998), and the Raven's Standard Progressive 
Matrices (Freeman & Godfrey, 2000; Van den Broek & Bradshaw, 1994). The equations are provided below (see also reviews of Verbal Fluency, PASAT, and Raven's in this book):

Estimated FAS 

= 57.5 - (0.76 x NART errors), SE est = 9.09; 
also see www.abdn.ac.uk 
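For illustration, a hypothetical examinee making 15 NART errors would have an estimated premorbid FAS score of 57.5 - 11.4 = 46.1.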



Table 6-62 AMNART (Correct Count) MOANS Norms for Persons Aged 56-97 Years

Percentile   Scaled   Age group
Range        Score    56-62   63-65   66-68   69-71   72-74   75-77   78-80   81-83   84-86   87-89   90-97
<1           2        >34     >34     >34     >34     >34     >35     >35     >35     >35     >35     >35
1            3        34      34      34      34      34      35      35      35      35      35      35
2            4        33      33      33      33      33      34      34      34      34      34      34
3-5          5        30-32   30-32   30-32   30-32   30-32   30-33   32-33   32-33   32-33   32-33   32-33
6-10         6        28-29   28-29   28-29   28-29   28-29   28-29   28-31   28-31   28-31   28-31   28-31
11-18        7        25-27   25-27   25-27   25-27   25-27   25-27   25-27   25-27   25-27   25-27   25-27
19-28        8        23-24   23-24   23-24   23-24   23-24   23-24   23-24   23-24   23-24   23-24   23-24
29-40        9        20-22   20-22   20-22   20-22   20-22   20-22   20-22   20-22   20-22   20-22   20-22
41-59        10       14-19   15-19   15-19   17-19   17-19   17-19   17-19   17-19   17-19   17-19   17-19
60-71        11       11-13   12-14   12-14   13-16   13-16   13-16   13-16   13-16   13-16   13-16   13-16
72-81        12       9-10    9-11    9-11    9-12    9-12    9-12    10-12   10-12   10-12   10-12   10-12
82-89        13       7-8     7-8     7-8     7-8     7-8     7-8     7-9     7-9     7-9     7-9     7-9
90-94        14       6       6       6       6       6       6       6       6       6       6       6
95-97        15       5       5       5       5       5       5       5       5       5       5       5
98           16       3-4     4       4       4       4       4       4       4       4       4       4
99           17       1-2     1-3     1-3     2-3     3       3       3       3       3       3       3
>99          18       --      --      --      0-1     0-2     0-2     0-2     0-2     0-2     0-2     0-2
N                     160     169     152     134     125     112     91      83      83      83      83

Note: Age- and education-corrected MOANS scaled scores (MSS_A&E) are calculated from a person's age-corrected MOANS scaled score (MSS_A) and that person's education expressed in years of formal schooling completed as follows: MSS_A&E = 6.77 + (1.25 x MSS_A) - (0.73 x Education).

Source: Adapted from Ivnik et al., 1996, based on Mayo's Older Normative Studies (MOANS). Reprinted with the kind permission of Psychology Press.
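To illustrate the conversion given in the table note, a hypothetical examinee with an age-corrected MOANS scaled score (MSS_A) of 10 and 14 years of education would receive MSS_A&E = 6.77 + (1.25 x 10) - (0.73 x 14) = 6.77 + 12.50 - 10.22, or approximately 9.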






Estimated PASAT Total
= 215.74 - (1.85 x NART errors) - (.77 x Age),
SE est = 34.87; also see www.abdn.ac.uk

Estimated RSPM 

= 66.65 + (-.462 x NART errors) + (-.254 x age) 
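For illustration, a hypothetical 50-year-old making 20 NART errors would have an estimated premorbid PASAT total of 215.74 - 37.0 - 38.5 = 140 (rounded) and an estimated RSPM score of 66.65 - 9.24 - 12.70 = 45 (rounded).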



RELIABILITY 

The NART/NART-2/NAART/AMNART is among the most 
reliable tests in clinical use. 



Internal Consistency 

Reliability estimates are above .90 for the various versions, in- 
cluding the NAART35 (Blair & Spreen, 1989; Crawford, Stew- 
art, et al, 1988; Grober & Sliwinski, 1991; Uttl, 2002). 



Test-Retest Reliability and Practice Effects 

A test-retest reliability of .98 has been reported for the NART, 
with practice effects emerging over the short term (10 days; 
Crawford, Stewart, Besson, et al., 1989). However, the mean 
decrease is less than one NART error, suggesting that prac- 
tice effects are of little clinical significance. One-year test- 
retest coefficients are high (.89) (Deary et al., 2004). With 
longer retest intervals (e.g., four years), reliability is lower, 
though still respectable (.67-.72; Deary et al., 1998; Kondel et al., 2003).

Raguet et al. (1996) gave the NAART to 51 normal adults on 
two separate occasions, separated by about one year. NAART 
estimates were highly reliable, with a coefficient of .92. Practice 
effects were minimal. 



Interrater Reliability 

The NART also has high interrater reliability (above .88; 
Crawford, Stewart, Besson, et al, 1989; O'Carroll, 1987; Riley 
& Simmonds, 2003; Sharpe & O'Carroll, 1991). Some NART 
words, however, have a disproportionately high rate of inter- 
rater disagreement (aeon, puerperal, aver, sidereal, and prelate),
and particular care should be taken when scoring these words 
(Crawford, Stewart, Besson, et al., 1989). Training by an expe- 
rienced examiner and use of the pronunciation guide appear
to improve accuracy for these items (Alcott et al., 1999). Blair 
and Spreen (1989) report that a measure of interscorer relia- 
bility for the NAART was .99 (p < .001). 



VALIDITY 

Construct Validity 

Researchers generally report moderate to high correlations 
(.40-.80) between NART (NAART, AMNART) performance 
and concurrently given measures of general intellectual status 
(Blair & Spreen, 1989; Carswell et al, 1997; Cockburn et al, 



2000; Crawford, Stewart, Besson, et al, 1989, 2001; Freeman & 
Godfrey, 2000; Grober & Sliwinski, 1991; Johnstone et al., 
1996; Nelson & O'Connell, 1978; Paolo et al., 1997; Raguet 
et al, 1996; Sharpe & O'Carroll, 1991; Uttl, 2002; Wiens et al., 
1993; Willshire et al, 1991), WRAT-R Reading (Johnstone 
et al., 1993; Wiens et al., 1993), and education (Maddrey et al., 
1996). In the standardization sample, the NART predicted 55%, 
60%, and 30% of the variance in prorated WAIS Full Scale, 
Verbal, and Performance IQ, respectively (Nelson, 1982). Simi- 
lar results have been reported by others for the various versions 
(e.g., Blair & Spreen, 1989; Crawford, Stewart, Besson, et al, 
1989; Ryan & Paolo, 1992; Wiens et al, 1993). In short, the test 
is a good predictor of VIQ and FSIQ, but is relatively poor at 
predicting PIQ. Among verbal subtests, NART (NAART) er- 
rors correlate most highly with Vocabulary and Information 
(Wiens et al., 1993). Combined factor analysis of the NART 
and WAIS (Crawford, Stewart, Cochrane, et al, 1989) has re- 
vealed that the NART has a very high loading (.85) on the first 
unrotated principal component, which is regarded as repre- 
senting general intelligence (g). Further, the NAART appears to 
measure verbal intelligence with the same degree of accuracy 
in various age groups (Uttl, 2002). 

The test has good accuracy in the retrospective estimation 
of IQ (Berry et al., 1994; Carswell et al., 1997; Moss & Dowd, 
1991; Raguet et al., 1996). A recent study by Crawford et al. 
(2001) followed up 177 individuals who had been adminis- 
tered an IQ test (Moray House Test, MHT) at age 11. They 
found a correlation of .73 between these scores and NART 
scores at age 77. The NART also had a significant correlation 
with the MMSE (r = .25), but this correlation fell near zero af-
ter controlling for the influence of childhood IQ. These re- 
sults provide strong support for the claim that NART scores 
estimate premorbid, rather than current, intelligence. 

There is also evidence that an estimate of cognitive change 
(as indexed by the discrepancy between current performance 
on the NART and Raven's Standard Progressive Matrices) re- 
flects lifetime cognitive change. Deary et al. (2004) followed up 
80 nondemented people who took the MHT at age 11 and 
retested them at age 77, on the NART, Raven's, and two WAIS- 
R subtests (Digit Symbol, Object Assembly). The NART-Raven difference correlated highly with the MHT-Raven difference (r = .64) and with the MHT-WAIS difference (r = .66). That is, the estimate of cognitive change based on the NART-Raven's dif-
ference correlated highly with actual cognitive change that 
took place over a course of 67 years. 

Prediction of IQ tends to be more accurate with equations 
based on NART (or NAART/ AMNART) scores than with the 
Wechsler Vocabulary subtest (Crawford, Parker, & Besson, 
1988; Sharpe & O'Carroll, 1991) or with demographic vari- 
ables (Blair & Spreen, 1989; Bright et al., 2002; Grober & Sli- 
winski, 1991; Ryan & Paolo, 1992). However, Paolo et al. (1997) 
reported that both the NART and Barona demographic pro- 
cedures demonstrated adequate ability to detect intellectual 
decline in persons with mild dementia. 

Whether combining demographic and NART (or NAART/ 
AMNART) estimates increases predictive accuracy is of some 






debate. Bright et al. (2002) found that an equation combining 
NART scores with demographic variables did not significantly 
increase the amount of variance in WAIS/WAIS-R IQ ex- 
plained by NART only, either in patients (e.g., with AD, Korsakoff's) or healthy controls. By contrast, other authors have
concluded that the addition of NART to demographic infor- 
mation improves prediction. For example, Willshire et al. 
(1991) reported that in a sample of residents in Melbourne, 
Australia, 56% of the variance in WAIS-R IQ scores could be 
predicted on the basis of a formula that included NART error 
score, education, and gender. This was 24% more than could 
be predicted on the basis of education alone and 18% more 
than on the basis of NART error score alone (see Scoring). 
Similarly, Watt and O'Carroll (1999) reported that in healthy 
participants, the addition of demographic variables improved 
the amount of explained variance in current WAIS-R Verbal 
IQ when using the NART. Carswell et al. (1997) found that a 
regression equation that combined NART errors and WAIS-R 
Vocabulary age-scaled scores provided a better estimate of 
VIQ scores obtained three years previously than did NART er- 
rors or Vocabulary scores alone. Grober and Sliwinski (1991) 
found that inclusion of both AMNART errors and education 
permitted a larger range of possible VIQ estimates and was 
slightly more accurate in estimating VIQ than the equation 
using only AMNART errors. Finally, Gladsjo et al. (1999) found 
that in normal individuals (aged 20-81 years), addition of 
the ANART (a 50-item American variant of the NART) im- 
proved prediction of WAIS-R VIQ, FSIQ, and learning score 
(based on CVLT, Story and Figure Learning Tests) beyond 
that achieved by demographics alone. The use of the ANART 
with demographic information appeared particularly use- 
ful in those examinees who have unusual combinations of 
lower than expected reading ability given their educational 
achievement. Of note, the ANART did not improve estimates 
of other premorbid abilities (general cognitive impairment as 
indexed by the Average Impairment Rating, which is a sum- 
mary of 12 measures from the Halstead-Reitan Battery, De- 
layed Memory) beyond that accomplished by demographic 
correction. 

The test is generally resistant to neurological insults such 
as closed head injury (Crawford, Parker, et al., 1988; Watt & 
O'Carroll, 1999) and is one of the few cognitive measures that 
is relatively robust against the effects of disease and decline in 
old age (Anstey et al., 2001). However, the test is not insen- 
sitive to cerebral damage, and deterioration in reading test 
performance does occur in some patients with cerebral dys- 
function, for example, in patients with a severe TBI tested 
within 12 months of injury (Riley & Simmonds, 2003); in pa- 
tients with moderate to severe levels of dementia (Boekamp 
et al., 1995; Fromm et al., 1991; Grober & Sliwinski, 1991; 
Paolo et al, 1997; Stebbins et al., 1988; Stebbins, Gilley, et al, 
1990), even when obviously aphasic or alexic patients are ex- 
cluded (Taylor, 1999); in patients with mild dementia who 
have accompanying linguistic or semantic memory deficits 
(Grober & Sliwinski, 1991; Stebbins, Wilson, et al, 1990; 
Storandt et al., 1995); and in some other specific conditions 



(e.g., patients with multiple sclerosis (MS), particularly those 
with a chronic-progressive course; Friend & Grattan, 1998). Although some (O'Carroll et al., 1992) have reported that NART scores are low in patients with Korsakoff's syndrome or frontal disturbance, others (Bright et al., 2002; Crawford & Warrington, 2002) have not found that NART performance is
affected in these patients. 

There are other reports of deterioration in NART perfor- 
mance in the later stages of dementia. Patterson et al. (1994) 
found a dramatic decrease in NART performance as a func- 
tion of AD severity and reported a correlation of .56 between 
MMSE and NART scores. They attributed the specific reading 
deficit manifested on the NART to the deterioration of se- 
mantic memory in AD and to an impaired ability to perform 
specific phonological manipulations. Paque and Warrington 
(1995) compared the performance of 57 dementing patients 
on the NART and the WAIS-R. Patients were examined on two 
occasions spaced at least 10 months apart. Although NART 
performance declined over time, the deterioration on VIQ 
and PIQ was more rapid and severe. Patients whose reading 
declined tended to have a lower VIQ than PIQ, raising the 
concern that verbal skills may have already been compro- 
mised by disease. Taylor et al. (1996) tested a sample of AD 
patients on three or four occasions each separated by about 
one year. AMNART performance declined with increasing de- 
mentia severity as measured by the MMSE. Cockburn et al. 
(2000) assessed patients with AD on four yearly occasions. 
They also found that NART scores declined over time, par- 
ticularly if the initial MMSE score was low. Whereas initial 
NART scores were associated with educational level, the 
extent of change depended more on initial MMSE perfor- 
mance. However, they also noted individual differences in 
the rate of decline, suggesting that reliance on group data 
may be quite misleading. Cockburn et al. also found that 
lower frequency words disappear faster from the lexicon than 
higher frequency words. However, there were widespread in- 
dividual differences and also wide variability in word recog- 
nition over time. That is, words might be recognized at one 
visit, not a year later, but then correctly recognized in later 
visits. 

The influence of psychiatric disorder is not clear. Some 
have reported that NART performance is not affected by de- 
pression (Crawford et al., 1987). However, others (Watt & O'Carroll, 1999) have found that NART scores are influenced by
depression, at least in patients who have suffered head in- 
juries. Findings by Kondel et al. (2003) in people with schizo- 
phrenia suggest that the NART may provide a reasonable 
estimate of premorbid IQ in younger patients (20-51 years), 
but not necessarily in older ones (52-85 years). Russell et al. 
(2000) gave the NART and WAIS-R to a sample of adults 
with schizophrenia who had a measure of IQ (WISC/WISC- 
R) taken during childhood (i.e., about 23 years earlier). There 
were no significant differences between childhood and adult 
measures of IQ; however, there were significant differences 
between these two indices and NART-estimated IQ, particu- 
larly when IQ did not fall in the average range. The NART 






overestimated IQ by an average of 15 IQ points. The authors 
recommended the use of more than one index of premorbid 
functioning. 

COMMENT 

Administration of this measure is relatively straightforward 
and takes little time. The test has high levels of internal, test- 
retest, and interrater reliability. It correlates moderately well 
with measures of intelligence (particularly VIQ and FSIQ) 
and is less related to demographic variables than various mea- 
sures of cognitive functioning (e.g., Wechsler test; Bright 
et al., 2002). Although the reading ability assessed by the 
NART/NAART/AMNART may not be entirely insensitive to 
cerebral damage, the available evidence suggests that it may be 
less vulnerable than many other cognitive measures, such as 
the MMSE and Wechsler tests (Berry et al., 1994; Christensen 
et al., 1991; Cockburn et al, 2000; Maddrey et al., 1996). Thus, 
although far from perfect, tests like the NART may be the in- 
struments of choice for assessing premorbid IQ. 

The test, however, should not be used with patients who 
have compromised language or reading ability, with those 
who have VIQs less than PIQs, or with those who have signifi- 
cant articulatory or visual acuity problems. Use of the NART 
(or its variants) within 12 months of a severe head injury is 
also not recommended since doing so runs the risk of signifi- 
cantly underestimating premorbid IQ (Riley & Simmonds, 
2003). Further, while the test may provide an acceptable pre- 
morbid index in the early stages of a dementing disorder, it is 
susceptible to changes that occur with disease progression. 
The potential confounding effect of depression requires addi- 
tional study. 

It is also important to bear in mind that use of regression 
procedures has some limitations, including regression to- 
ward the mean and limited range of scores. These limita- 
tions suggest that two types of errors may occur when the 
equations are applied to individuals with suspected demen- 
tia (Boekamp et al., 1995; Ryan & Paolo, 1992; Wiens et al, 
1993). In the case of superior premorbid ability, the pre- 
dicted IQ will represent an underestimate of the amount of 
cognitive deterioration present. However, since patients with 
dementia are rarely referred for psychological assessment 
when their intelligence levels remain in the superior range, 
this ceiling effect should not necessarily invalidate the clini- 
cal utility of the test (Ryan & Paolo, 1992). On the other 
hand, in individuals whose premorbid abilities are relatively 
low, the estimated IQ might suggest cognitive deterioration 
when, in actuality, it has not occurred. Note, too, that a fairly 
large loss in cognitive ability (about 15-21 IQ points) may 
need to occur before the NART (NAART) can reliably iden- 
tify potential abnormality. Accordingly, the clinician needs 
to be cautious in inferring an absence of decline when cutoff 
discrepancies are not met. 

These limitations underscore the need to supplement NART 
(NAART /AMNART) estimates of premorbid functioning with 
clinical observations and information about a patient's edu- 



cational and occupational accomplishments as well as other 
performance-based data (e.g., MMSE, Vocabulary). Raguet 
et al. (1996) recommend averaging estimates from the Barona 
formula and NAART. Crawford et al. (1990) and Uttl (2002) 
have attempted to address the problem by developing re- 
gression equations to predict NART/NAART scores from 
demographic variables (see Interpretive Guidelines). A similar 
procedure has been developed for use with the new WTAR. 

The majority of research on the NART has focussed on es- 
timating premorbid intelligence. NART equations are, how- 
ever, also available for estimating premorbid performance on 
other cognitive tasks (see Table 6-59 and Interpretive Guide- 
lines). Some (Schlosser & Ivison, 1989) have speculated that 
NART equations based on memory test performance may be 
capable of assessing dementia earlier than the NART/WAIS-R 
combination. However, findings by Gladsjo et al. (1999) and 
Isella et al. (2005) suggest that the NART (or its variants)
does not improve accuracy of prediction of premorbid mem- 
ory abilities beyond that accomplished by demographic cor- 
rection. 

Although the rationale for developing various versions of 
the NART (NAART, AMNART, CCRT) appears sound, there 
is no empirical comparison of their relative efficacy (Franzen 
et al., 1997). The various forms and equations should not be 
used interchangeably, and the exact version should be speci- 
fied in clinical or research reports (Franzen et al., 1997). Note, 
too, that the equations developed for the NAART35 remain to 
be cross-validated in other samples. 

One other limitation deserves attention. The various NART 
versions have been developed for use with the WAIS and WAIS- 
R. To date, only the WTAR, not the NART (NAART, AM- 
NART), has been validated against the updated WAIS-III, the 
version currently used in most research and clinical settings. 



NOTE 

1. Social class coding: 1: professional (e.g., architect, church 
minister); 2: intermediate (e.g., computer programmer, teacher); 
3: skilled (e.g., carpenter, salesperson); 4: semiskilled (e.g., assembly- 
line worker, waiter); 5: unskilled (e.g., cleaner, laborer). Gender cod- 
ing: male = 1, female = 2. 



REFERENCES 

Alcott, D., Swann, R., & Grafhan, A. (1999). The effect of training on 
rater reliability on the scoring of the NART. British Journal of 
Clinical Psychology, 38, 431-434. 

Anstey, K. J., Luszcz, M. A., Giles, L. C, & Andrews, G. R. (2001). 
Demographic, health, cognitive, and sensory variables as pre- 
dictors of mortality in very old adults. Psychology and Aging, 16, 
3-11. 

Beardsall, L. (1998). Development of the Cambridge Contextual 
Reading Test for improving the estimation of premorbid verbal 
intelligence in older persons with dementia. British Journal of 
Clinical Psychology, 37, 229-240. 






Beardsall, L., & Brayne, C. (1990). Estimation of verbal intelligence in 
an elderly community: A prediction analysis using a shortened 
NART. British Journal of Clinical Psychology, 29, 83-90. 

Beardsall, L., & Huppert, F. A. (1994). Improvement in NART word 
reading in demented and normal older persons using the Cam- 
bridge Contextual Reading Test. Journal of Clinical and Experi- 
mental Neuropsychology, 16, 232-242. 

Berry, D. T. R., Carpenter, G. S., Campbell, D. A., Schmitt, F. A., Hel- 
ton, K., & Lipke-Molby, T. (1994). The new adult reading test- 
revised: Accuracy in estimating WAIS-R IQ scores obtained 3.5 
years earlier from normal older persons. Archives of Clinical Neu- 
ropsychology, 9, 239-250. 

Blair, J. R., & Spreen, O. (1989). Predicting premorbid IQ: A revision 
of the National Adult Reading Test. The Clinical Neuropsycholo- 
gist, 3, 129-136. 

Boekamp, J. R., Strauss, M. E., & Adams, N. (1995). Estimating pre- 
morbid intelligence in African-American and white elderly veter- 
ans using the American version of the National Adult Reading 
Test. Journal of Clinical and Experimental Neuropsychology, 17, 
645-653. 

Bright, P., Jaldow, E., & Kopelman, M. D. (2002). The National Adult
Reading Test as a measure of premorbid intelligence: A compari- 
son with estimates derived from premorbid levels. Journal of the 
International Neuropsychological Society, 8, 847-854. 

Carswell, L. M., Graves, R. E., Snow, W. G., & Tierney, M. C. (1997). 
Postdicting verbal IQ of elderly individuals. Journal of Clinical 
and Experimental Neuropsychology, 19, 914-921. 

Christensen, H., Hadzi-Pavlovic, D., & Jacomb, P. (1991). The psy- 
chometric differentiation of dementia from normal aging: A 
meta-analysis. Psychological Assessment, 3, 147-155. 

Cockburn, J., Keene, J., Hope, T, & Smith, P. (2000). Progressive de- 
cline in NART scores with increasing dementia severity. Journal of 
Clinical and Experimental Neuropsychology, 22, 508-517. 

Conway, S. C, & O'Carroll, R. E. (1997). An evaluation of the Cam- 
bridge Contextual Reading Test (CCRT) in Alzheimer's disease. 
British Journal of Clinical Psychology, 36, 623-625. 

Crawford, J. R., Allan, K. M., Cochrane, R. H. B., & Parker, D. M. 
(1990). Assessing the validity of NART-estimated premorbid IQs 
in the individual case. British Journal of Clinical Psychology, 29, 
435-436. 

Crawford, J. R., Besson, J. A. O., Parker, D. M., Sutherland, K. M., & 
Keen, P. L. (1987). Estimation of premorbid intellectual status in 
depression. British Journal of Clinical Psychology, 26, 313-314. 

Crawford, J. R., Deary, I. J., Starr, J., & Whalley, L. J. (2001). The 
NART as an index of prior intellectual functioning: A retrospec- 
tive validity study covering a 66-year interval. Psychological Medi- 
cine, 31, 451-458.

Crawford, J. R., Moore, J. W., & Cameron, I. M. (1992). Verbal 
fluency: A NART-based equation for the estimation of premor- 
bid performance. British Journal of Clinical Psychology, 31, 327- 
329. 

Crawford, J. R., Parker, D. M., Allan, K. M., Jack, A. M., & Morrison, 
F. M. (1991). The Short NART: Cross-validation, relationship to 
IQ and some practical considerations. British Journal of Clinical 
Psychology, 30, 223-229. 

Crawford, J. R., Obansawin, M. C, & Allan, K. M. (1998). PASAT 
and components of WAIS-R performance: Convergent and dis- 
criminant validity. Neuropsychological Rehabilitation, 8, 255-272. 

Crawford, J. R., Parker, D. M., & Besson, J. A. O. (1988). Estimation of 
premorbid intelligence in organic conditions. British Journal of 
Psychiatry, 153, 178-181. 



Crawford, J. R., Stewart, L. E., Garthwaite, P. H., Parker, D. M., and 
Besson, J. A. O. (1988). The relationship between demographic 
variables and NART performance in normal subjects. British 
Journal of Clinical Psychology, 27, 181-182. 

Crawford, J. R., Stewart, L. E., Besson, J. A. O., Parker, D. M., & De 
Lacey, G. (1989). Prediction of WAIS IQ with the National Adult 
Reading Test: Cross-validation and extension. British Journal of 
Clinical Psychology, 28, 267-273. 

Crawford, J. R., Stewart, L. E., Cochrane, R. H. B., Parker, D. M., & 
Besson, J. A. O. (1989). Construct validity of the National Adult 
Reading Test: A factor analytic study. Personality and Individual 
Differences, 10, 585-587. 

Crawford, J. R., Stewart, L. E., Parker, D. M., Besson, J. A. O., &
Cochrane, R. H. B. (1989). Estimation of premorbid intelligence: 
Combining psychometric and demographic approaches improves 
predictive accuracy. Personality and Individual Differences, 10, 
793-796. 

Crawford, J. R., & Warrington, E. K. (2002). The Homophone Mean- 
ing Generation Test: Psychometric properties and a method for 
estimating premorbid performance. Journal of the International 
Neuropsychological Society, 8, 547-554. 

Deary, I. J., MacLennan, W. J., & Starr, J. M. (1998). Is age kinder to 
the initially more able?: Differential ageing of a verbal ability 
in the healthy old people in Edinburgh study. Intelligence, 26, 
357-375. 

Deary, I. J., Whalley, L. J., & Crawford, J. R. (2004). An "instanta- 
neous" estimate of a lifetime's cognitive change. Intelligence, 32, 
113-119. 

Franzen, M. D., Burgess, E. J., & Smith-Seemiller, L. (1997). Methods
of estimating premorbid functioning. Archives of Clinical Neu- 
ropsychology, 12, 711-738. 

Freeman, J., & Godfrey, H. (2000). The validity of the NART-RSPM 
index in detecting intellectual declines following traumatic brain 
injury: A controlled study. British Journal of Clinical Psychology, 
39, 95-103. 

Friend, K. B., & Grattan, L. (1998). Use of the North American Adult 
Reading Test to estimate premorbid intellectual function in pa- 
tients with multiple sclerosis. Journal of Clinical and Experimental 
Neuropsychology, 20, 846-851.

Fromm, D., Holland, A. L., Nebes, R. D., & Oakley, M. A. (1991). A 
longitudinal study of word-reading ability in Alzheimer's disease: 
Evidence from the National Adult Reading Test. Cortex, 27, 367- 
376. 

Gladsjo, J. A., Heaton, R. K., Palmer, B. W., Taylor, M. J., & Jeste, D. V.
(1999). Use of oral reading to estimate premorbid intellectual and 
neuropsychological functioning. Journal of the International Neu- 
ropsychological Society, 5, 247-254. 

Graf, P., & Uttl, B. (1995). Component processes of memory: Changes 
across the adult lifespan. Swiss Journal of Psychology, 54, 113-130. 

Grober, E., & Sliwinski, M. (1991). Development and validation of a 
model for estimating premorbid verbal intelligence in the elderly. 
Journal of Clinical and Experimental Neuropsychology, 13, 933-949. 

Isella, V., Villa, M. L., Forapani, E., Piamarta, A., Russo, I. M., &
Appolonio, I. M. (2005). Ineffectiveness of an Italian NART- 
equivalent for the estimation of verbal learning ability in normal 
elderly. Journal of Clinical and Experimental Neuropsychology, 27, 
618-623. 

Ivnik, R. J., Malec, J. F., Smith, G. E., Tangalos, E. G., & Petersen, R. C.
(1996). Neuropsychological tests norms above age 55: COWAT, 
BNT, token, WRAT-R reading, AMNART, Stroop, TMT, JLO. The 
Clinical Neuropsychologist, 10, 262-278. 






Johnstone, B., Callahan, C. D., Kapila, C. J., & Bouman, D. E. (1996). 
The comparability of the WRAT-R reading test and NAART as 
estimates of premorbid intelligence in neurologically impaired 
patients. Archives of Clinical Neuropsychology, 11, 513-519. 

Kondel, T. K., Mortimer, A. M., Leeson, M. C, Laws, K. R., & 
Hirsch, S. R. (2003). Intellectual differences between schizo- 
phrenic patients and normal controls across the adult lifespan. 
Journal of Clinical and Experimental Neuropsychology, 25, 1045- 
1056. 

Lucas, S. K., Carstairs, J. R., & Shores, E. A. (2003). A comparison of 
methods to estimate premorbid intelligence in an Australian 
sample: Data from the Macquarie University Neuropsychological 
Normative Study (MUNNS). Australian Psychologist, 38, 227- 
237. 

Maddrey, A. M., Cullum, C. M., Weiner, M. F., & Filley, C. M. (1996).
Premorbid intelligence estimation and level of dementia in 
Alzheimer's disease. Journal of the International Neuropsychologi- 
cal Society, 2, 551-555. 

Moss, A. R., & Dowd, T. (1991). Does the NART hold after head 
injury: A case report. British Journal of Clinical Psychology, 30, 
179-180. 

Nelson, H. E. (1982). National Adult Reading Test (NART): Test man- 
ual. Windsor, UK: NFER Nelson. 

Nelson, H. E., & O'Connell, A. (1978). Dementia: The estimation of 
pre-morbid intelligence levels using the new adult reading test. 
Cortex, 14, 234-244. 

Nelson, H. E., & Willison, J. (1991). National Adult Reading Test 
(NART): Test manual (2nd ed.). Windsor, UK: NFER Nelson. 

O'Carroll, R. E. (1987). The inter-rater reliability of the National 
Adult Reading Test (NART): A pilot study. British Journal of Clin- 
ical Psychology, 26, 229-230. 

O'Carroll, R. E., Moffoot, A., Ebmeier, K. P., & Goodwin, G. M. 
(1992). Estimating pre-morbid intellectual ability in the alcoholic 
Korsakoff syndrome. Psychological Medicine, 22, 903-909. 

Paque, L., & Warrington, E. K. (1995). A longitudinal study of read- 
ing ability in patients suffering from dementia. Journal of the In- 
ternational Neuropsychological Society, 1, 517-524. 

Paolo, A. M., Troster, A. I., Ryan, J. J., & Koller, W. C. (1997). Compar-
ison of NART and Barona demographic equation premorbid IQ 
estimates in Alzheimer's disease. Journal of Clinical Psychology, 53, 
713-722. 

Parkin, A. J., & Java, R. I. (1999). Deterioration of frontal lobe func- 
tion in normal aging: Influences of fluid intelligence versus per- 
ceptual speed. Neuropsychology, 13, 539-545. 

Patterson, K., Graham, N., & Hodges, J. R. (1994). Reading in demen- 
tia of the Alzheimer type: A preserved ability? Neuropsychology, 8, 
395-407. 

Raguet, M. L., Campbell, D. A., Berry, D. T. R., Schmitt, F. A., & 
Smith, G. T. (1996). Stability of intelligence and intellectual pre- 
dictors in older persons. Psychological Assessment, 8, 154-160. 

Riley, G. A., & Simmonds, L. V. (2003). How robust is performance 
on the National Adult Reading Test following traumatic brain in- 
jury? British Journal of Clinical Psychology, 42, 319-328. 

Russell, A. J., Munro, J., Jones, P. B., Hayward, P., Hemsley, D. R., & 
Murray, R. M. (2000). The National Adult Reading Test as a mea- 
sure of premorbid IQ in schizophrenia. British Journal of Clinical 
Psychology, 39, 297-305. 



Ryan, J. J., & Paolo, A. M. (1992). A screening procedure for estimat- 
ing premorbid intelligence in the elderly. The Clinical Neuropsy- 
chologist, 6, 53-62. 

Schlosser, D., & Ivison, D. (1989). Assessing memory deterioration 
with the Wechsler Memory Scale, the National Adult Reading 
Test, and the Schonell Graded Word Reading Test. Journal of Clin- 
ical and Experimental Neuropsychology, 11, 785-792. 

Sharpe, K., & O'Carroll, R. (1991). Estimating premorbid intel- 
lectual level in dementia using the National Adult Reading Test: 
A Canadian study. British Journal of Clinical Psychology, 30, 
381-384. 

Starr, J. M., Whalley, L. J., Inch, S., & Shering, P. A. (1992). The quan- 
tification of the relative effects of age and NART-predicted IQ on 
cognitive function in healthy old people. International Journal of 
Geriatric Psychiatry, 7, 153-157. 

Stebbins, G. T., Gilley, D. W., Wilson, R. S., Bernard, B. A., & Fox, J. H.
(1990). Effects of language disturbances on premorbid esti- 
mates of IQ in mild dementia. The Clinical Neuropsychologist, 4, 
64-68. 

Stebbins, G. T., Wilson, R. S., Gilley, D. W., Bernard, B. A., & Fox, J. H.
(1988). Estimation of premorbid intelligence in dementia. Journal 
of Clinical and Experimental Neuropsychology, 10, 63-64. 

Stebbins, G. T., Wilson, R. S., Gilley, D. W., Bernard, B. A., & Fox, J. H.
(1990). Use of the National Adult Reading Test to estimate 
premorbid IQ in dementia. The Clinical Neuropsychologist, 4, 
18-24. 

Storandt, M., Stone, K., & LaBarge, E. (1995). Deficits in reading per- 
formance in very mild dementia of the Alzheimer type. Neuropsy- 
chology, 9, 174-176. 

Taylor, R. (1999). National Adult Reading Test performance in estab- 
lished dementia. Archives of Gerontology and Geriatrics, 29, 291- 
296. 

Taylor, K. I., Salmon, D. P., Rice, V. A., Bondi, M. W., Hill, L. R., 
Ernesto, C. R., & Butters, N. (1996). A longitudinal examination 
of American National Adult Reading Test (AMNART) perfor- 
mance in dementia of the Alzheimer type (DAT): Validation and 
correction based on rate of cognitive decline. Journal of Clinical 
and Experimental Neuropsychology, 18, 883-891. 

Uttl, B. (2002). North American Adult Reading Test: Age norms, reliability,
and validity. Journal of Clinical and Experimental Neuropsychol- 
ogy, 24, 1123-1137. 

Uttl, B., & Graf, P. (1997). Color Word Stroop test performance across
the life span. Journal of Clinical and Experimental Neuropsychology, 
19, 405-420. 

Van den Broek, M. D., & Bradshaw, C. M. (1994). Detection of ac- 
quired deficits in general intelligence using the National Adult 
Reading Test and Raven's Standard Progressive Matrices. British 
Journal of Clinical Psychology, 33, 509-515. 

Watt, K. J., & O'Carroll, R. E. (1999). Evaluating methods for estimat- 
ing premorbid intellectual ability in closed head injury. Journal of 
Neurology, Neurosurgery, and Psychiatry, 66, 474-479. 

Wiens, A. N., Bryan, J. E., & Crossen, J. R. (1993). Estimating WAIS-R 
FSIQ from the National Adult Reading Test-Revised in normal 
subjects. The Clinical Neuropsychologist, 7, 70-84. 

Willshire, D., Kinsella, G., & Prior, M. (1991). Estimating WAIS-R 
from the National Adult Reading Test: A cross-validation. Journal 
of Clinical and Experimental Neuropsychology, 13, 204-216. 



NEPSY: A Developmental Neuropsychological Assessment 



PURPOSE 

The NEPSY was designed to assess neuropsychological devel- 
opment in preschoolers and children. 

SOURCE 

The NEPSY (Korkman et al., 1998) can be obtained from 
the Psychological Corporation, 19500 Bulverde Rd, San Anto- 
nio, TX 78259 (www.harcourtassessment.com). American 
and Finnish versions are available from this publisher. The 
complete NEPSY testing kit costs approximately $600 US. The 
NEPSY Scoring Assistant for computer scoring costs approxi- 
mately $150 US. 

A French version of the test is available from Les Editions 
du Centre de Psychologie Appliquee (ECPA; www.ecpa.fr), and 
French-Canadian norms from Quebec have also reportedly 
been collected. A revised Swedish version was also published in 
2000 (www.psykologiforlaget.se). The test has also been used in 
German-speaking children from Austria (Perner et al, 2002). 

AGE RANGE 

The test can be administered to children aged 3 to 12 years. 

TEST DESCRIPTION 

Background 

The NEPSY is the American adaptation of the NEPS, a Finnish 
instrument that first appeared over 25 years ago (Korkman, 
1980). As noted by Korkman (1999), the test originally con- 
sisted of only two to five tasks for 5- and 6-year-olds, designed 
along traditional Lurian approaches, scored in a simple pass/ 
fail manner, and calibrated so that items were passed by the 
vast majority of children. The NEPS was revised and expanded 
in 1988 and 1990 to include more tasks (including the VMI and 
Token Test as complements) and a wider age range (NEPS-U; 
Korkman, 1988a; 1988b). A Swedish version was also devel- 
oped (Korkman, 1990). The most recent version of the NEPSY, 
again revised and expanded to an even broader age range, was 
standardized in Finland (Korkman et al., 1997) and in the 
United States (Korkman et al, 1998). Specific differences be- 
tween the 1988 Finnish version and the most recent American 
and Finnish versions are listed in several sources (Korkman, 
1999; Korkman et al, 1998; Mantynen et al., 2001). 

Overview and Theoretical Orientation 

The NEPSY is the first instrument designed exclusively and a pri- 
ori as a neuropsychological battery for children. Although there 
are other neuropsychological batteries for children, these are 
based on modifications and downward extensions of existing 
adult batteries (e.g., Luria-Nebraska Neuropsychological Bat- 
tery — Children's Version: Golden, 1987; Reitan-Indiana Neu- 



ropsychological Test Battery for Children and Halstead-Reitan 
Neuropsychological Test Battery for Older Children: Reitan & 
Davidson, 1974; Reitan & Wolfson, 1985, 1992). The NEPSY was 
created as an instrument with four main functions: (1) sensitivity
to subtle deficits that interfere with learning in children, (2) de- 
tection and clarification of the effects of brain damage or dys- 
function in young children, (3) utility for long-term follow-up of 
children with brain damage and dysfunction, and (4) provision 
of reliable and valid results for studying normal and atypical neu- 
ropsychological development in children (Korkman et al., 1998). 

The test's theoretical orientation is a melding of Lurian 
principles with developmental neuropsychological assessment. 
Although the precise way in which these two orientations in- 
teract is not fully described in the manual, further detail is 
provided by Korkman (1999). The most important similarity 
to Luria's method is the approach of analyzing neurocognitive 
disorders through an evaluation of component processes, us- 
ing systematic component-by-component assessment. Cog- 
nitive processes are viewed as complex capacities mediated by 
interacting functional systems. Some subtests were therefore 
designed to assess basic components within functional do- 
mains, whereas others were designed to assess cognitive func- 
tions that require integration of several functional domains. 
Level of performance and qualitative aspects of performance 
are both measured within and across functional domains 
(Korkman et al., 1998). The pattern of errors and qualitative 
behaviors is expected to change with age, which the authors 
indicate would provide information on normal and atypical 
development (Korkman et al, 1998). 

There are also significant differences compared to Luria's 
approach. Whereas Luria used a hypothesis-driven approach 
to assessment that included forming and revising hypotheses 
as the evaluation unfolded, the NEPSY approach is to screen 
across all Core domains first (using the Core battery), followed 
by a more in-depth analysis within domains where deficits are 
suspected (i.e., Expanded or Selective assessment; Korkman, 
1999). Another obvious difference is that the NEPSY comprises 
standardized subtests that are relatively homogeneous in content and that provide norm-based comparisons, an approach that is more in line with traditional psychological assessment of cognitive ability (e.g., Wechsler scales) than with Luria's methods.

To interpret the test, the authors recommend the use of 
Core domains to identify impairments and subtest-level analy- 
sis to analyze the nature of impairments, followed by verifica- 
tion with corroborating information (Korkman et al, 1998). 
The authors stress that without appropriate training in pedi- 
atric neuropsychology, interpretation must be limited to a de- 
scription of neurocognitive strengths and weaknesses. 



Test Structure 

The NEPSY was designed as a flexible testing instrument. It 
includes five Core domains, along with additional Expanded 
and Supplemental subtests that can be administered selectively, 






depending on the type of assessment and characteristics of 
the examinee. Interestingly, unlike other cognitive batteries 
such as the Wechsler scales and WJ III, or neuropsychological 
batteries such as the Halstead-Reitan, the NEPSY does not 
yield an overall score. This is consistent with a conceptualiza- 
tion of neuropsychological functioning as reflecting indepen- 
dent but related functional systems. 

The five Core domains, shown in Table 6-63 with their as- 
sociated subtests, include: (1) Attention/Executive Functions,
(2) Language, (3) Sensorimotor Functions, (4) Visuospatial 
Processing, and (5) Memory and Learning. There are two ver- 
sions of the test depending on the age range of the child (i.e., 



Table 6-63 NEPSY Core and Expanded Subtests Across Age

                                        Ages 3-4    Ages 5-12
Attention/Executive
  Tower                                             ✓
  Auditory Attention and Response Set               ✓
  Visual Attention                      ✓           ✓
  Statue                                ✓
  Design Fluency                                    *
  Knock and Tap                                     *
Language
  Body Part Naming                      ✓
  Phonological Processing               ✓           ✓
  Speeded Naming                                    ✓
  Comprehension of Instructions         ✓           ✓
  Repetition of Nonwords                            *
  Verbal Fluency                        *           *
  Oromotor Sequencing                   *           *
Sensorimotor Functions
  Fingertip Tapping                                 ✓
  Imitating Hand Movements              ✓           ✓
  Visuomotor Precision                  ✓           ✓
  Manual Motor Sequences                            *
  Finger Discrimination                             *
Visuospatial Processing
  Design Copy                           ✓           ✓
  Arrows                                            ✓
  Block Construction                    ✓           *
  Route Finding                                     *
Memory and Learning
  Memory for Faces                                  ✓
  Memory for Names                                  ✓
  Narrative Memory                      ✓           ✓
  Sentence Repetition                   ✓           *
  List Learning                                     *

✓ Core Subtest. * Expanded Subtest.
Source: Adapted from Korkman et al., 1998.



3-4 years, and 5-12 years). Core and Expanded subtests differ 
somewhat for the two age ranges (see Table 6-63). Subtests are 
described in Table 6-64. 

An Expanded or Selective Assessment, with additional 
subtests not included in the Core domains, can also be admin- 
istered when specific neuropsychological aspects need to be 
evaluated in more depth. Additionally, "Supplemental Scores" 
allow a more fine-grained analysis of performance on Core do- 
main subtests; these are conceptually separate from the two 
"Supplemental Subtests" that assess basic skills such as orienta- 
tion and handedness. Qualitative Observations, taken by the 
examiner during the administration of the NEPSY subtests, can 
also be quantified and compared with expected levels in the 
standardization sample to provide a process-approach compo- 
nent to the assessment. See Table 6-65 for a listing of specific 
Qualitative Observation scores. A schematic representation of 
NEPSY domains, subtests, and scores is shown in Figure 6-13. 

Because it allows broad and specific analyses of test perfor- 
mance along with qualitative analysis of testing behavior, the 
NEPSY allows the examiner to design a multidimensional as- 
sessment that can be customized to suit the needs of the indi- 
vidual child (Kemp et al., 2001). 

ADMINISTRATION TIME 

Administration time depends on the type of assessment per- 
formed and the age of the child. The Core Assessment takes 
approximately 45 minutes in children ages 3 to 4 years and 65 
minutes in children age 5 and older. The full NEPSY takes 
about one hour in younger children and two hours in chil- 
dren 5 and older. 



ADMINISTRATION 

See manual for specific instructions. Different subtests are ad- 
ministered depending on the age of the child (see Table 6-63). 
Consequently, different forms are used for younger (ages 3-4) 
and older children (ages 5-12). Specific administration proce- 
dures for children with special needs, including those with at- 
tention problems, language disorder, or hearing, visual, or 
motor impairments, are outlined in Kemp et al. (2001). 

Test refusal is common for some NEPSY subtests in 3½-year-olds, especially subtests that require verbal expression or that have no manipulatives (e.g., Sentence Repetition, Finger
Discrimination), with the best cooperation rate obtained for 
Visuomotor Precision (Mantynen et al, 2001). Users who test 
recalcitrant preschoolers may therefore wish to begin the eval- 
uation with tests that yield the best cooperation rates, such as 
Visuomotor Precision, Visual Closure, Phonological Pro- 
cessing, and Comprehension of Instructions (see Mantynen 
et al., 2001, for specific refusal rates per subtest). 

Materials 

NEPSY testing materials are bright, child-friendly, and easy to 
administer. Test protocols are attractively designed and easy to 



Table 6-64 Descriptions of NEPSY Subtests

Attention/Executive Domain

Tower: Designed to assess planning, monitoring, self-regulation, and problem solving; similar to other tower paradigms (e.g., Tower of Hanoi). The child must move three colored balls to specific positions on three pegs in a specific number of moves and under time constraints.

Auditory Attention and Response Set: Continuous performance task purported to measure vigilance, selective auditory attention, and set shifting. The first condition is an auditory attention task during which the child must respond to specific words on a tape and resist responding to distracters. The second condition introduces conflicting demands between the actual stimuli and the response required (i.e., when the child hears "red," he or she must pick up a yellow square), which demands set shifting and response inhibition.

Visual Attention: Two-part visual cancellation task that assesses both speed and accuracy at detecting targets among distracters.

Design Fluency: A nonverbal fluency task during which the child must draw as many novel designs as possible in a given time limit from both structured and unstructured dot arrays.

Statue: Assesses response inhibition and motor persistence. During this subtest, the child is asked to stand still over a 75-second interval during which the examiner produces distracting stimuli.

Knock and Tap: A motor task that measures self-regulation and response inhibition. The child is required to learn simple hand gestures in response to specific hand gestures from the examiner, and then learn a conflicting set of responses that requires the child to inhibit the previously learned gestures as well as the tendency to imitate the examiner's gestures.

Language Domain

Body Part Naming: Simple naming task for younger children that involves naming body parts on a picture of a child or on the child's own body.

Phonological Processing: A two-part test that requires the child to identify words based on presented word segments, and then to create a new word by omitting or substituting word segments or phonemes.

Speeded Naming: Requires the child to name the size, shape, and color of familiar stimuli as rapidly as possible.

Comprehension of Instructions: Auditory comprehension task that requires the child to point to the correct picture in response to examiner commands of increasing syntactic complexity.

Repetition of Nonsense Words: Assesses the child's phonological encoding and decoding skills with regard to novel sound patterns by requiring the child to repeat nonsense words presented on audiotape.

Verbal Fluency: A standard verbal fluency paradigm. The child must produce as many words as possible from semantic and/or phonological categories (in younger children, animals and foods; children ages 7+ also must provide as many words as possible beginning with the letters "F" and "S").

Oromotor Sequencing: Involves the repetition of tongue twisters to assess oromotor coordination.

Sensorimotor Domain

Fingertip Tapping: Tapping test that assesses speeded finger dexterity.

Imitating Hand Movements: Involves the child copying complex hand positions demonstrated by the examiner.

Visuomotor Precision: A paper-and-pencil task involving timed eye-hand coordination in which the child is asked to rapidly trace the path of a vehicle on paper without crossing any lines.

Manual Motor Sequences: Involves the imitation of rhythmic hand movement sequences.

Finger Discrimination: A finger agnosia test in which the examiner touches the child's fingers, who then must identify the fingers without visual input.

Visuospatial Processing Domain

Design Copy: Similar to the Beery Visuomotor Integration Test in that the child must copy two-dimensional designs of increasing difficulty on paper.

Arrows: Subtest is similar to the Judgment of Line Orientation test in that the child must select the arrow that points to a target from a number of arrows with different orientations.

Block Construction: Requires the child to reproduce three-dimensional block constructions using actual models and pictures, using unicolored blocks.

Route Finding: Visual-spatial task involving finding the correct route leading to a target on a map.

Memory and Learning Domain

Memory for Faces: A face recall task involving recalling a series of photographs of children's faces.

Memory for Names: Involves repeated exposure trials to a set of cards on which are drawn children's faces; the child is required to learn the name associated with each face.

Narrative Memory: A story recall task; the examiner reads a story to the child, who then must recite it from memory; if trials are failed, multiple-choice items are administered.

Sentence Repetition: Sentences are aurally presented to the child, who must then recite the sentences to the examiner.

Note: The NEPSY also includes additional Supplementary subtests assessing Orientation and Handedness.
Source: Adapted from Korkman et al., 1998.



use during test administration and hand scoring. Instructions 
for administering the NEPSY are included on an easel for 
some tests and in the manual for others. Because the manual 
is a small paperback, accessing and using the instructions in 
the manual while engaged in testing is somewhat cumber- 
some. However, the entire test is easily portable in a carrying 
case, which facilitates test administration at the bedside. 

SCORING 

Index scores for the five Core domains are provided in stan- 
dard score format (M= 100, SD= 15), with corresponding 
percentile ranks and confidence intervals. All of the Core and 
most of the Expanded subtest scores are derived in scaled 
score format (M= 10, SD=3). For Core domain subtests 
that have either non-normal distributions or significant ceil- 
ing or floor effects in the standardization sample, percentile 
ranges are provided instead. This includes Knock and Tap, 
Finger Discrimination, Route Finding, Manual Motor Se- 
quences, and Oromotor Sequences. Percentages of children in 
the normative sample with specific scores are provided for the 
optional Handedness and Orientation subtests. The NEPSY 
Manual provides tables to evaluate the significance of Core 
domain and subtest scores as well as frequencies of discrep- 
ancy scores. 
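
For readers who want to keep the different score metrics straight, the following minimal Python sketch (our own illustration, not part of the NEPSY scoring software; all names are hypothetical) shows how a normalized z-score maps onto the standard-score (M = 100, SD = 15) and scaled-score (M = 10, SD = 3) metrics and onto a percentile rank, assuming a normal distribution.

    from scipy.stats import norm

    def z_to_standard(z):
        # Index (domain) score metric: mean 100, SD 15
        return 100 + 15 * z

    def z_to_scaled(z):
        # Subtest scaled-score metric: mean 10, SD 3
        return 10 + 3 * z

    def z_to_percentile(z):
        # Percentile rank under the normal curve
        return 100 * norm.cdf(z)

    # Example: a performance one SD below the mean
    z = -1.0
    print(z_to_standard(z))              # 85.0
    print(z_to_scaled(z))                # 7.0
    print(round(z_to_percentile(z), 1))  # 15.9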

Supplemental Scores (see Table 6-65) are all derived as cu- 
mulative percentages of the standardization sample. Qualita- 
tive Observations are derived as either cumulative percentages 
of the standardization sample or as base rates reflecting the 
percentage of children in the standardization sample showing 
the particular behavior. The authors note that the Supple- 
mental and Qualitative Observation scores are not equivalent 
to the percentiles obtained for the Expanded subtests de- 
scribed previously. While the former reflect actual base rates 
of occurrence of specific behaviors, the latter represent per- 
centile ranks that underwent smoothing to adjust for sam- 
pling irregularities (Korkman et al., 1998). 

The NEPSY computer-scoring program greatly facilitates 
derivation of scores and is a significant time saver. It is well 
designed and easy to use, and printouts are easy to read and 
interpret. 



DEMOGRAPHIC EFFECTS 



Age 



Significant age effects are found on all NEPSY subtests. These 
are strongest from 5 to 8 years of age and moderate from 9 to 
12 years (Korkman et al., 2001). From 10 to 12 years, only flu- 
ency subtests and Sentence Repetition show age-related 
increases, which may relate to either psychometric limitations 
of the tasks or to an actual developmental plateau. Memory 
for Faces does not improve significantly after age 8, and ceil- 
ing effects may curtail age-related increases on Imitating 
Hand Positions. Overall, the subtests that show the greatest 
sensitivity to age, with continued, steady improvements over 
the age span, are Design Fluency, Verbal Fluency, Speeded 
Naming, and Sentence Repetition (Korkman et al., 2001). Sig- 
nificant age effects, graphically illustrated by growth curves, 
have also been published for Phonological Processing, Speeded 
Naming, and Sentence Repetition for the NEPSY's Finnish 
standardization sample (Korkman et al., 1999), with the most 
significant increments occurring before age 9. 



Gender 

Overall, gender effects are minimal. However, there is a small 
but significant, temporary advantage for girls on language 
subtests assessing Phonological Processing and Sentence Rep- 
etition (Korkman et al., 1999), at least at the start of reading 
instruction. A small gender effect favoring boys on the Arrows 
subtest in Zambian schoolchildren has also been reported 
(Mulenga et al., 2001). An interaction between gender and 
prenatal drug exposure favoring girls over boys has also been 
reported for the NEPSY Language domain (Bandstra et al., 
2002), with boys scoring about one-fifth of an SD below girls. 



Ethnicity 

There are few studies on cultural influences and performance on the NEPSY. A study in urban Zambian children found that most scores were within one SD of U.S. norms. However, lower scores were found on the NEPSY Attention/Executive and Language domains, and higher scores were found on the Visuospatial domain compared with U.S. norms (Mulenga et al., 2001). Interestingly, Mulenga et al. (2001) observed that Zambian schoolchildren tended to work slowly on NEPSY tests involving speed, despite explicit instructions to work quickly. In British preschoolers, scores are marginally higher than U.S. norms and show lower variability on subtest scores, particularly on language subtests. This suggests that the NEPSY may be less sensitive to underachievement in England if U.S. norms are used (Dixon & Kelly, 2001). In studies involving African American, inner-city children, mean scores on the Language Core domain are somewhat lower than average (Bandstra et al., 2002, 2004), but generally within the broad range typically obtained for language tests in this group (see PPVT-III).


Table 6-65 Supplemental Scores and Qualitative Observations for Each NEPSY Subtest

Attention/Executive Domain
Tower. Qualitative Observations: Rule violations; Motor difficulty; Off-task behavior.
Auditory Attention and Response Set. Supplemental Scores: Attention task score; Response Set task score; Omission Errors (by task); Commission Errors (by task). Qualitative Observations: Off-task behavior.
Visual Attention. Supplemental Scores: Time to Completion (by task); Omission Errors (by task); Commission Errors (by task).
Design Fluency. Supplemental Scores: Structured Array score; Random Array score.

Language Domain
Body Part Naming. Qualitative Observations: Poor articulation.
Phonological Processing. Qualitative Observations: Asks for repetition.
Speeded Naming. Supplemental Scores: Time to Completion; Accuracy. Qualitative Observations: Increasing voice volume; Reversed sequences; Body movement.
Comprehension of Instructions. Qualitative Observations: Asks for repetition.
Repetition of Nonsense Words. Qualitative Observations: Stable misarticulation.
Verbal Fluency. Supplemental Scores: Animals Trial score; Food/Drink Trial score; Semantic score; Phonemic score. Qualitative Observations: Increasing voice volume; Body movement.
Oromotor Sequences. Qualitative Observations: Stable misarticulation; Oromotor hypotonia; Rate change.

Sensorimotor Domain
Fingertip Tapping. Supplemental Scores: Repetitions score; Sequences score; Preferred Hand score; Nonpreferred Hand score. Qualitative Observations: Visual guidance; Incorrect positioning; Posturing; Mirroring; Overflow; Rate change.
Imitating Hand Positions. Supplemental Scores: Preferred Hand score; Nonpreferred Hand score. Qualitative Observations: Mirror hand; Other hand helps.
Visuomotor Precision. Supplemental Scores: Time to Completion (by item); Errors (by item). Qualitative Observations: Pencil grip.
Manual Motor Sequences. Qualitative Observations: Recruitment; Overflow; Perseveration; Loss of asymmetry; Body movement; Forceful tapping; Rate change.

Visuospatial Domain
Design Copying. Qualitative Observations: Pencil grip; Hand tremor.
Arrows. Qualitative Observations: Rotation.

Learning and Memory Domain
Sentence Repetition. Qualitative Observations: Asks for repetition; Off-task behavior.
Memory for Faces. Supplemental Scores: Immediate Memory score; Delayed Memory score.
Memory for Names. Supplemental Scores: Learning Trials score; Delayed Memory score.
Narrative Memory. Supplemental Scores: Free Recall points; Cued Recall points.
List Learning. Supplemental Scores: Learning effect; Interference effect; Delay effect; Learning curve. Qualitative Observations: Repetitions; Novel intrusions; Interference intrusions.

Note: Supplemental scores are in addition to Total scores for each subtest.
Source: Adapted from Korkman et al., 1998.


Figure 6-13 Schematic representation of NEPSY structure. The NEPSY has five domains; each domain is composed of Core and Expanded subtests; subtest scores may be further divided into Supplemental Scores, and Qualitative Observations may also be made at the subtest level. Source: Reprinted with permission from Korkman et al., 1998.

IQ 

As expected, IQ is related to NEPSY performance. Correla- 
tions of NEPSY Domain scores to WISC-III FSIQ in the stan- 
dardization sample are highest for Language (r= .59), in the 
moderate range for Visuospatial, Memory and Learning, and 
Attention/Executive (r=.45, .41, and .37, respectively), and 
modest for the Sensorimotor domain (r= .25). Correlations 
to WPPSI-R follow the same general pattern, with the highest 
association between IQ and NEPSY Language (r=.57) and 
the lowest occurring between NEPSY Sensorimotor and At- 
tention/Executive (r= .31 and .26, respectively). See also Va- 
lidity for further discussion. 

NORMATIVE DATA 

Standardization Sample 

Table 6-66 shows detailed sample characteristics. The NEPSY 
normative sample is a national, stratified random sample of 
1000 children, aged 3 to 12 years, with data collected between 
1994 and 1996. There were 100 children (50 boys, 50 girls) 
in each of 10 age groups. Stratification by age, race/ethnicity, 



geographic location, and parental education was based on 
1995 U.S. Census data (Korkman et al., 1998). Children with 
neurological or other conditions were excluded (Kemp et al., 
2001). 



Table 6-66 Characteristics of the NEPSY Normative Sample

Number: 1000 (a)
Age: 3 to 12 years, 11 months
Geographic location: Northeast 20%; South 35%; North Central 24%; West 23%
Sample type: National, stratified, random sample (b)
Parental education: 0-11 years 10%; 12-15 years 60%; 16 years or more 30%
SES: Reflected by parental education
Gender: Males 50% (overall, and in each age band); Females 50%
Race/ethnicity: African American 16%; Hispanic 12%; White 69%; Other 4%
Screening: Children with diagnosed neurological, psychological, developmental, or learning disabilities were excluded

(a) Based on 10 age groupings of 100 cases each. (b) Based on 1995 U.S. Census data, stratified according to age, gender, geographic location, race/ethnicity, and parental education.

Source: Adapted from Korkman et al., 1998, and Kemp et al., 2001.






Table 6-67 NEPSY Domain Scores and Subtest Scores for Zambian Schoolchildren

                                          9-Year-Olds        11-Year-Olds
Domains and Subtests                      M        SD        M        SD
Attention and Executive Functioning      93.56    10.30     89.60    15.50
  Tower                                    9.80     1.85     11.10     3.65
  Auditory Attention and Response Set    11.28     2.05     10.45     2.35
  Visual Attention                         6.40     3.73      4.25     3.31
  Design Fluency                           7.88     3.19      7.05     3.36
Language                                 85.64    11.32     87.05    13.37
  Phonological Processing                  8.84     2.85     10.15     2.70
  Speeded Naming                           6.28     3.60      5.95     2.87
  Comprehension of Instructions            8.12     3.72      7.65     3.96
  Repetition of Nonsense Words             7.48     2.68      6.45     4.85
  Verbal Fluency                           8.48     3.12      8.55     2.74
Sensorimotor Functioning                 94.16    19.71     99.45    10.00
  Fingertip Tapping                        9.52     4.56     10.80     2.14
  Imitating Hand Positions                 9.80     2.57     10.25     2.24
  Visuomotor Precision                     8.28     4.46      8.90     1.52
Visuospatial Processing                 111.80    16.14    113.70    20.38
  Design Copying                          15.44     3.15     15.30     3.66
  Arrows                                   8.72     4.60      9.55     5.35
  Block Construction                       9.36     2.91      8.25     3.68
Memory and Learning                     100.76    14.32     97.25    14.45
  Memory for Faces                        12.28     2.94     10.20     2.86
  Memory for Names                         9.72     2.113     8.20     3.68
  Narrative Memory                         8.24     3.33     10.45     2.83
  Sentence Repetition                      9.44     4.02      9.25     3.70
  List Learning                            9.92     2.50      8.95     2.21

Source: Mulenga et al., 2001. Reprinted with permission from Lawrence Erlbaum.



Other Normative Data 

Mulenga et al. (2001) reported that there is an overall lack of 
normative data for children in the developing world. As a re- 
sult, these investigators administered the NEPSY to 45 Zam- 
bian schoolchildren tested in English. Means and standard 
deviations are presented in Table 6-67. Note that children 
were schooled in English and spoke English as a second lan- 
guage, in addition to local dialects. Attention/Executive and 
Language domain scores are lower than those in the U.S. 
norms, but scores for the Visuospatial Processing domain are 
higher. In comparison, the mean score for 7-year-old, African 
American, low SES, inner-city children on the Language Core 
domain is approximately 87 points (M = 86.8, SD = 13.3, N = 176; Bandstra et al., 2002; M = 87.3, SD = 13.3, N = 192; Bandstra et al., 2004). 



Score Conversion Procedures 

To form the NEPSY normative tables, scale score conversions 
were derived through conversion of raw scores to normalized 
z-scores through direct lookup tables corresponding to normal 
curve values (Korkman et al., 1998). Comparisons of resulting 
distributions across age were evaluated, and irregularities were 
eliminated by smoothing and polynomial curve fitting. The 



process was repeated for sums of scaled scores to derive index 
scores. Six-month interval values for subtests were interpo- 
lated from the whole-year norms (Korkman et al., 1998). 
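
As a rough illustration of what "normalized" z-scores mean here, the sketch below (our own; it omits the smoothing and polynomial curve fitting described above and uses made-up raw scores) converts raw scores to z-scores via their percentile rank in the norm group and the inverse normal distribution.

    import numpy as np
    from scipy.stats import norm, rankdata

    def normalized_z(raw_scores):
        # Percentile rank of each raw score within the norm sample
        # (midpoint convention keeps ranks strictly between 0 and 1),
        # then the z-score cutting off that proportion of a normal curve.
        n = len(raw_scores)
        pct = (rankdata(raw_scores) - 0.5) / n
        return norm.ppf(pct)

    raw = np.array([12, 15, 15, 18, 20, 22, 25, 27, 30, 33])  # hypothetical norm data
    z = normalized_z(raw)
    print(np.round(10 + 3 * z, 1))  # corresponding scaled scores (M = 10, SD = 3)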

Five subtests had highly skewed raw score distributions. 
These were Finger Discrimination, Route Finding, Manual 
Motor Sequences, Oromotor Sequences, and Knock and Tap. 
Percentile rankings were derived for these subtests instead of 
scaled scores after elimination of minor sampling irregulari- 
ties by smoothing (Korkman et al., 1998). As noted in Kork- 
man et al. (2001), as a general rule, subtests with skewness of 
+2.5 were converted to percentiles. A table containing infor- 
mation on raw score ranges, means, SDs, standard error, and 
degree of skewness and kurtosis is provided in Korkman et 
al. (2001). Because percentile rankings can be misleading 
when they are derived from non-normal samples, the au- 
thors made a decision to include only general percentile 
ranking classifications for these subtests. These are shown in 
Table 6-68. 

Some subtests require consideration of both speed and accuracy to adequately describe performance (i.e., Visual Attention, Visuomotor Precision). However, in most cases, accuracy scores were highly skewed in the standardization sample. Weights were therefore assigned to accuracy and speed scores to normalize the distribution, and to enable assessment of the combined effect of speed and accuracy on performance. The total score for these subtests takes into account weighted accuracy and speed scores and is normally distributed (see p. 39 of the manual for further details; Korkman et al., 1998). Tables are provided in the manual to calculate these combined scores; this is done automatically by the scoring program.


Table 6-68 Percentile Ranking Classifications for NEPSY Subtests* With Non-Normal Raw Score Distributions

Percentile Ranking        Classification
< 2nd percentile          Well below expected level
3rd-10th percentile       Below expected level
11th-25th percentile      Borderline
26th-75th percentile      At expected level
> 75th percentile         Above expected level

*Finger Discrimination, Route Finding, Manual Motor Sequences, Oromotor Sequences, and Knock and Tap.

Source: Adapted from Korkman et al., 1998.
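
The percentile classifications in Table 6-68 amount to a simple lookup; the sketch below is our own illustration (the handling of the exact band boundaries is an assumption, since the table leaves the 2nd percentile itself unspecified).

    def classify_percentile(pct):
        # Descriptive classification bands from Table 6-68
        if pct <= 2:        # "< 2nd percentile" (boundary handling assumed)
            return "Well below expected level"
        elif pct <= 10:     # 3rd-10th percentile
            return "Below expected level"
        elif pct <= 25:     # 11th-25th percentile
            return "Borderline"
        elif pct <= 75:     # 26th-75th percentile
            return "At expected level"
        else:               # > 75th percentile
            return "Above expected level"

    print(classify_percentile(8))   # Below expected level
    print(classify_percentile(50))  # At expected level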

RELIABILITY 

Internal Consistency 

The manual presents detailed information on internal relia- 
bility based on the standardization sample (Korkman et al., 
1998). With regard to Core domain scores, reliability was ade- 
quate to high across age (Table 6-69). The authors note that 
the lower coefficient for the Attention/Executive domain in 3- 
to 4-year-olds, which is marginally adequate (r = .70), may be 
a result of developmental variability in this age range. The 
lowest domain score reliability in the 5 to 12 age group was for 
the Sensorimotor domain (r = .79). All other domain scores for 
this age group showed high reliability, but none were over .90, 
a minimal level recommended for clinical decision making 
(see Chapter 1). 

With regard to subtests, reliability coefficients were for 
the most part adequate to high. Reliabilities for ages 3 to 4 
and 5 to 12 are presented in Table 6-70. Reliabilities were 



marginal for Verbal Fluency in the 3 to 4 age group and for 
Visuomotor Precision in the 5 to 12 age group. Coefficients 
for Statue, Design Fluency, and Immediate Memory for Faces 
were low. 



Standard Error of Measurement 

For the Core domain scores, standard error of measurement 
(SEM) is largest for the Attention/Executive domain for ages 
3 to 4 (i.e., 8.22). Other SEMs range between four and seven 
standard score points. In the case of NEPSY subtests, the SEM 
for Statue at age 3 to 4 was 2.12; all other SEMs were less than 
two scaled score points (Korkman et al., 1998). Overall, these 
values are comparable to those obtained by other major as- 
sessment measures for children such as the WISC-III. 
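
For reference, the SEM values above follow from the usual formula SEM = SD x sqrt(1 - r), where r is the reliability coefficient; the short sketch below (our own arithmetic, not taken from the manual) reproduces the approximately 8.22-point value for the ages 3 to 4 Attention/Executive domain from its reliability of about .70.

    import math

    def sem(sd, reliability):
        # Standard error of measurement: SD * sqrt(1 - r)
        return sd * math.sqrt(1 - reliability)

    # Domain scores use the standard-score metric (SD = 15); with r = .70
    # this yields roughly the 8.22 value cited for Attention/Executive at ages 3-4.
    print(round(sem(15, 0.70), 2))  # 8.22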

Test-Retest Stability 

Because one of the NEPSY's goals is to facilitate assessment of 
children over time (Korkman et al., 1998), evidence of its tem- 
poral stability is particularly important. Test-retest stability 
coefficients in the manual are based on 168 normal children 
tested twice over a mean testing interval of 38 days (range 
2-10 weeks). Coefficients were calculated separately for each 
of five age groups consisting of 30 to 41 children and consist 
of Pearson's correlations corrected for variability. For those 
subtests with normative scores based on percentiles due to 
skewed distribution, consistency of percentile range classifica- 
tions from test to retest was used instead (e.g., Knock and Tap, 
Finger Discrimination, Route Finding, Manual Motor Se- 
quences, and Oromotor Sequences). 

A number of NEPSY subtests had less than optimal test- 
retest stability. Table 6-71 shows the NEPSY subtests classified 
according to the magnitude of their test-retest coefficients for 
different age groups. Over half of the NEPSY subtests had 
test-retest reliabilities that were marginal or low, even at this 
relatively short testing interval. This was the case for both 
age groups. Some reliabilities were adequate to high in the 
younger age group, but marginal or low in the older children 
(e.g., Narrative Memory, Imitating Hand Positions). The re- 
verse was also true (e.g., Verbal Fluency). 


Table 6-69 Internal Reliability of NEPSY Core Domain Scores

Very High (.90+)
  Ages 3-4: --
  Ages 5-12: --
High (.80-.89)
  Ages 3-4: Language; Memory and Learning
  Ages 5-12: Attention/Executive; Language; Visuospatial; Memory and Learning
Adequate (.70-.79)
  Ages 3-4: Sensorimotor; Visuospatial; Attention/Executive
  Ages 5-12: Sensorimotor
Marginal (.60-.69)
  Ages 3-4: --
  Ages 5-12: --
Low (<.59)
  Ages 3-4: --
  Ages 5-12: --

Source: Adapted from Korkman et al., 1998.


Table 6-70 Internal Reliability of NEPSY Subtests

Very high (.90+)
  Ages 3-4: Sentence Repetition
  Ages 5-12: Phonological Processing; List Learning
High (.80-.89)
  Ages 3-4: Phonological Processing; Comprehension of Instructions; Imitating Hand Positions; Visuomotor Precision (a); Design Copying; Block Construction; Narrative Memory
  Ages 5-12: Tower; Repetition of Nonsense Words; Imitating Hand Positions; Memory for Names; Sentence Repetition; Auditory Attention and Response Set (a); Immediate Memory for Names
Adequate (.70-.79)
  Ages 3-4: Body Part Naming; Visual Attention (a)
  Ages 5-12: Comprehension of Instructions; Design Copying; Arrows; Block Construction; Memory for Faces; Narrative Memory; Verbal Fluency (b); Fingertip Tapping (b); Visual Attention (a); Speeded Naming (a); Auditory Attention (b); Auditory Response Set (b)
Marginal (.60-.69)
  Ages 3-4: Verbal Fluency (b)
  Ages 5-12: Visuomotor Precision (a); Delayed Memory for Faces; Delayed Memory for Names
Low (<.59)
  Ages 3-4: Statue (b)
  Ages 5-12: Design Fluency (b); Immediate Memory for Faces

Note: Supplemental scores are shown in italics.
(a) Generalizability coefficient. (b) Test-retest reliability coefficient.

Source: Adapted from Korkman et al., 1998.

Many subtests that base scores on percentile rankings had 
poor test-retest classification accuracy (see Table 6-72). At age 
3 to 4, subtest classifications were at chance levels. Although 
consistency from test to retest is better in older children, al- 
most half of the sample was misclassified at retest on some 
subtests (e.g., Manual Motor Sequences). The highest accu- 
racy was obtained for Statue at age 5, which correctly classified 
69% of children at retest. While this is clearly an improvement 
over chance, about 30% of children would be misclassified at 
retest, which is problematic if the test is to be used in clinical 
evaluation. Note that classification accuracy may be affected 
by practice effects, if children are classified in a higher cate- 
gory at retest because of improved performance. Whether 
practice effects actually affected classification accuracy is not 
detailed in the manual. 

Practice Effects 

Both practice effects and test-retest ability are important in 
assessing an instrument's ability to produce valid measure- 
ments over time. Practice effects, defined as improvements in 



performance at retest, were largest for the Core Memory do- 
main score and the Memory and Learning subtests, based on 
tables provided in the NEPSY Manual (Korkman et al., 1998). 

Specifically, in the 3 to 4 age group, all Core domain scores 
increased on average three to four points from test to retest. In 
the 5 to 6 age group, the Attention/Executive and Memory 
and Learning Core domain scores increased approximately 15 
points, with the remainder of domain scores increasing six 
to seven standard score points. In the 9 to 10 age group, all 
changes were less than five points except for Memory and 
Learning, which increased 15 points from test to retest. In the 
11 to 12 age group, all changes were less than six points, with 
the exception of Memory and Learning, which increased 11 
points (Korkman et al., 1998). It is unclear whether lower 
gains seen in older children may be due to ceiling effects. 

Overall, the magnitude of standard score changes from test to retest was considerable for some scores (e.g., Memory and Learning, and Attention/Executive domain in the 5-to-6-year group). These large test-retest differences must be taken into account when interpreting test-retest scores of individual children, as only very large increases are likely to be due to actual improvement over and above the effects of practice. Interestingly, the NEPSY Visuospatial domain showed less of a practice effect (i.e., about 0-6 points, depending on age) than the conceptually similar Perceptual Organization Index from the WISC-III, which shows a 10- to 11-point increase from test to retest in its standardization sample (Wechsler, 1991). Again, it is unclear whether ceiling effects may be confounding results in the older age range.


Table 6-71 Test-Retest Stability Coefficients for NEPSY Subtests

Very high (.90+)
  Ages 3-4: --
  Ages 5-12: --
High (.80-.89)
  Ages 3-4: Narrative Memory; Sentence Repetition
  Ages 5-12: Auditory Attention and Response Set
Adequate (.70-.79)
  Ages 3-4: Imitating Hand Positions; Design Copying
  Ages 5-12: Repetition of Nonsense Words; Verbal Fluency; Fingertip Tapping; Design Copying; Memory for Names; Sentence Repetition; Auditory Attention; Auditory Response Set; Immediate Memory for Names; Delayed Memory for Names
Marginal (.60-.69)
  Ages 3-4: Visual Attention; Body Part Naming; Comprehension of Instructions; Visuomotor Precision
  Ages 5-12: Visual Attention; Phonological Processing; Block Construction; Narrative Memory; List Learning
Low (<.59)
  Ages 3-4: Statue; Phonological Processing; Verbal Fluency; Block Construction
  Ages 5-12: Tower; Design Fluency; Speeded Naming; Comprehension of Instructions; Imitating Hand Positions; Visuomotor Precision; Arrows; Memory for Faces; Immediate Memory for Faces; Delayed Memory for Faces

Note: Correlations denote Pearson's coefficients corrected for variability; Supplemental scores are shown in italics.
Source: Adapted from Korkman et al., 1998.



Table 6-72 Classification Accuracy of Test-Retest Rankings for NEPSY Subtests

Subtest                                    Ages 3-4    Ages 5-12
Statue                                        --          69%
Knock and Tap                                 --          65%
Route Finding                                 --          65%
Oromotor Sequences                           50%          62%
Finger Discrimination - Preferred             --          61%
Finger Discrimination - Nonpreferred          --          56%
Manual Motor Sequences                       47%          54%

Note: Values reflect the percentage of children correctly classified in the same percentile rankings at retest (i.e., decision consistency of classification).
Source: Adapted from Korkman et al., 1998. The test-retest correlation for Statue at age 3-4 is presented in Table 6-71.


Interrater Reliability

According to the manual, for tests requiring examiner judgment/interpretation for scoring (i.e., Design Copying, Visuomotor Precision, Repetition of Nonsense Words), a high degree of interrater reliability was obtained in a subsample of 50 children in the standardization sample (.97-.99; Korkman et al., 1998).

For Qualitative Observations, two trained raters independently rated a mixed sample of 21 children (Nash, 1995, as cited in Korkman et al., 1998). Interrater reliability of Qualitative Observations varied dramatically across domains (e.g., from as high as 1.00 for Tower Motor Difficulty to as low as 0.34 for Fingertip Tapping Mirroring; see Korkman et al., 1998, p. 188, or Kemp et al., 2001, p. 222, for specific values). Kappas were below .50 for four Qualitative Observation scores. These were Misarticulations During the Repetition of Nonsense Words, Mirroring and Rate Change During Fingertip Tapping, and Pencil Grip During Visuomotor Precision (Korkman et al., 1998). Due to their poor interrater reliability, interpretation of these specific scores is not recommended.
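
For readers unfamiliar with the statistic, Cohen's kappa indexes agreement between two raters after removing chance agreement; the sketch below is our own illustration with made-up ratings, not data from the NEPSY interrater study.

    def cohens_kappa(rater1, rater2):
        # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
        n = len(rater1)
        categories = set(rater1) | set(rater2)
        p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
        p_chance = sum(
            (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
        )
        return (p_obs - p_chance) / (1 - p_chance)

    # Hypothetical presence/absence ratings of a behavior for 10 children
    r1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    r2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    print(round(cohens_kappa(r1, r2), 2))  # 0.6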



VALIDITY 

Content 

The subtests comprising each Core domain were selected by 
the authors a priori, based on theoretical grounds and on 
prior research with the NEPSY. The test itself was developed 
in several phases. Based on experience with the previous 
Finnish and Swedish versions, the test was adapted for use in 
the United States following additional literature reviews, a pi- 
lot phase, bias review, a tryout version (comprised of 52 sub- 
tests), second bias review, second tryout version, and national 
standardization and further validation. 



Subtest Intercorrelations 

In some cases, intercorrelations of subtests within each Core 
domain are low, and some subtests are more highly correlated 
with subtests outside their assigned Core domain than with 
subtests within their Core domain (see pp. 361-363 in Kork- 
man et al., 1998, for subtest correlation matrix). Only the 
Language and Visuospatial subtests are moderately intercor- 
related within their respective Core domains (see Table 6-73 
for actual correlation ranges in the standardization sample). 
Strong intercorrelations of similar magnitude are also reported 
in a clinical study for these two domains (Till et al., 2001). 

Subtest intercorrelations for the Attention/Executive do- 
main are especially weak in the standardization sample. For 
example, in the 3 to 4 age group, the two subtests making up 
the Attention/Executive Core domain are only modestly cor- 
related (r= .24). Intercorrelations for the 5 to 12 age group 
are even lower (see Table 6-73). However, higher values are 
reported by Till et al. (2004), with moderate intercorrelations 
between some subtests (i.e., r = .42, NEPSY Visual Attention 
and NEPSY Statue) but not others (r = .17, NEPSY Visual At- 
tention and NEPSY Tower). See Comment for further discus- 
sion of this issue. 

Factor Structure and Subtest Specificity 

Only one study to date has examined whether the NEPSY's 
five-domain structure can be replicated empirically. Stinnett 
et al. (2002) conducted a principal axis factor analysis on 
standardization data for ages 5 to 12 and found little evi- 
dence for a five-factor model. Instead, they found a robust 
one-factor solution on which language-based subtests loaded 
most highly. This omnibus factor included Phonological 
Processing, Comprehension of Instructions, Memory for 
Names, and Speeded Naming. Stinnett et al. noted that fur- 
ther research using confirmatory factor analysis and factor 
analyses using clinical samples are needed. However, these 
results, in conjunction with the low correlations between 
some Core domain subtests, raise questions about the validity of interpreting Core domain scores in the evaluation of individual children.


Table 6-73 Intercorrelations Between Core Subtests Within NEPSY Core Domains

Domain                     Ages 3-4      Ages 5-12
Attention/Executive         .24           .07-.18
Language                    .40-.59       .32-.38
Sensorimotor                .25           .14-.18
Visuospatial                .40           .34
Memory and Learning         .40           .14-.34

Source: Adapted from Korkman et al., 1998.

Stinnett et al. (2002) also examined the subtest specificity 
of NEPSY subtests (i.e., comparing unique variance con- 
tributed by each factor to common variance with the test as a 
whole, while taking into account each subtest's error vari- 
ance). To be considered as having ample specificity, a subtest 
should have unique variance that exceeds 25% of the total 
variance and that is larger than its error variance (Kaufman, 
1994). Stinnett et al. concluded that only 
Phonological Processing and Memory for Names had enough 
unique variance and low enough error variance to be inter- 
preted separately from the main factor underlying the NEPSY. 
The remainder of subtests with sufficient unique variance had 
as much error variance as common variance, making individ- 
ual interpretation of subtests dubious at best. The authors 
also noted that nine subtests were judged to be too unreliable 
to be used individually in the interpretation of performance. 
These included Comprehension of Instructions, Speeded Nam- 
ing, Design Copying, Narrative Memory, Arrows, Visual At- 
tention, Visuomotor Precision, Finger Tapping, and Memory 
for Faces. Whether these conclusions also apply to clinical sam- 
ples remains an open question. 
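
The specificity rule can be made concrete with a small sketch (our own illustration with hypothetical numbers, not Stinnett et al.'s actual computations): a subtest's unique variance is estimated as its reliability minus its communality with the other subtests, and its error variance as one minus its reliability.

    def ample_specificity(reliability, communality):
        # Unique (specific) and error variance proportions
        specific = reliability - communality
        error = 1.0 - reliability
        # Ample specificity: unique variance >= 25% of total variance
        # and larger than error variance (Kaufman, 1994)
        return specific, error, (specific >= 0.25 and specific > error)

    # Hypothetical subtest with reliability .85 and communality .40
    spec, err, ok = ample_specificity(0.85, 0.40)
    print(round(spec, 2), round(err, 2), ok)  # 0.45 0.15 True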

Correlations With IQ 

Correlations between the NEPSY and WISC-III/WPPSI-R are 
shown in Table 6-74 for nonclinical data provided in the 
manual (Korkman et al., 1998). Overall, these indicate that 
the NEPSY clearly measures abilities that overlap with, but are 
distinct from, measures of general intelligence. Correlations 
to WISC-III FSIQ for NEPSY domain scores in the standardi- 
zation sample are highest for Language (r= .59), in the mod- 
erate range for Visuospatial, Memory and Learning, and 
Attention/Executive (r=.45, .41, and .37, respectively), and 
modest for Sensorimotor (r= .25; Korkman et al., 1998). Cor- 
relations to WPPSI-R FSIQ follow the same general pattern, 
with the highest association between IQ and NEPSY Lan- 
guage (r= .57) and the lowest for NEPSY Sensorimotor and 
Attention/Executive (r= .31 and .26, respectively). 

In particular, correlations between IQ tests and NEPSY 
provide particular support for the convergent validity of the 
NEPSY Language and NEPSY Visuospatial Index scores, which correlate moderately to highly with similar scores from IQ tests (e.g., r = .57 between NEPSY Language and WISC-III VCI, and r = .45 between NEPSY Visuospatial and WISC-III POI). Less convergent support is found for other NEPSY domains. However, the NEPSY Attention/Executive domain score is modestly related to WISC-III FFD to about the same extent as it is to the FSIQ (r = .35 and .37, respectively; Korkman et al., 1998).


Table 6-74 Correlations Between the NEPSY and Tests of IQ/Mental Development

NEPSY Index            WISC-III  WISC-III  WISC-III  WISC-III  WISC-III  WPPSI-R   BSID-II   BSID-II
                       FSIQ      VCI       POI       FFD       PSI       FSIQ      MDI       PSI
Attention/Executive     .37       .33       .25       .35       .26       .26      -.31      -.37
Language                .59       .58       .36       .57       .29       .57       .61      -.11
Sensorimotor            .25       .17       .18       .24       .32       .31       .31       .22
Visuospatial            .45       .36       .45       .30       .19       .47      -.04      -.09
Memory and Learning     .41       .43       .22       .35       .26       .51       .05      -.56

Note: WISC-III data represents 127 nonclinical cases (mean age of 9), WPPSI-R data represents 45 nonclinical cases (mean age of 4), and BSID-II data represents 20 3-year-olds. WISC-III = Wechsler Intelligence Scale for Children — Third Edition; FSIQ = Full-Scale Intelligence Quotient; VCI = Verbal Comprehension Index; POI = Perceptual Organization Index; FFD = Freedom from Distractibility; PSI = Processing Speed Index; WPPSI-R = Wechsler Preschool and Primary Scale of Intelligence — Revised; BSID-II = Bayley Scales of Infant Development — Second Edition; MDI = Mental Development Index; PSI = Psychomotor Development Index.

Source: Adapted from Korkman et al., 1998.

Note that the mean index scores for the WISC-III pre- 
sented in the manual are slightly higher than NEPSY scores 
(i.e., about three points), which may relate to the older WISC- 
III norms. However, NEPSY and WPPSI-R scores were gener- 
ally equivalent (see data in Korkman et al., 1998). Correlations 
to newer versions of these Wechsler scales (WISC-IV, WPPSI- 
III) are not yet available. 

The manual also presents NEPSY and BSID-II data for a 
small group of normal 3-year-olds. Unexpectedly, negative 
correlations are found between some scores (see Table 6-74), 
despite the strong association between NEPSY Language do- 
main and developmental level as measured by the MDI (i.e., 
r=.61; Korkman et al., 1998). Differences in mean scores 
were also relatively large; the most pronounced was between 
the NEPSY Attention/Executive score and the MDI, which 
showed a mean difference of over 16 points. While these dis- 
crepancies may reflect the limitations of a small sample size, 
caution is recommended in comparing test scores in this age 
range until further research is conducted comparing the two 
measures. 

Correlations With Achievement Tests 
and School Grades 

One of the main uses of testing children is predicting how 
deficits impact school functioning. As a result, measures that 
are sensitive to school performance are of considerable utility 
for clinical assessment. The NEPSY Manual presents data on 
correlations between NEPSY scores and school grades for a 
large sample of children (N = 445; Korkman et al., 1998). As is 
the case for most test batteries, the language-based score (i.e., 
NEPSY Language domain) was most predictive of school 



grades and showed the highest correlations with language- 
based school grades (r= .40). The NEPSY Language domain 
also predicted school grades in nonlanguage areas such as 
mathematics and science (r=.37 and .34, respectively). All 
other correlations were modest (range .10-.32), with some of 
the lowest correlations to school grades occurring for the 
NEPSY Attention/Executive domain (r = .10-.17) and Senso- 
rimotor domain (.13-.17). 

Similar relationships occur when the NEPSY is compared 
to standardized achievement tests such as the WIAT. In a 
small sample of children with LD presented in the manual 
(N= 39), NEPSY Language is highly related to almost all 
WIAT test composites, including Reading, Mathematics, and 
Writing (r= .26-.41; Korkman et al., 1998). One exception is 
NEPSY Attention/Executive, which demonstrates an impres- 
sive correlation with WIAT Language but not WIAT Reading 
(r= .57 versus r= .08). Although the NEPSY Visuospatial In- 
dex is moderately related to the Mathematics Composite, 
NEPSY Language is equally related to success in this area 
(r= .44 versus r— .41). The NEPSY Memory and Learning In- 
dex is virtually unrelated to WIAT scores (r= .06-15), except 
for a modest correlation with WIAT Language (r= .24). Corre- 
lations between the newer WIAT-II are not available. Correla- 
tions with other achievement tests are also shown in the 
manual; these generally show similar findings (i.e., that NEPSY 
Language is most associated with academic achievement). 

Correlations With Other 
Neuropsychological Tests 

Certain NEPSY subtests are very similar to classic neuropsycho- 
logical paradigms. The manual presents correlations between 
conceptually similar NEPSY subtests and other, well-known 
neuropsychological tests (Korkman et al., 1998). For the most 
part, these provide some criterion-based evidence of the valid- 
ity of NEPSY subtests. 

For example, there is a very high correlation between 
NEPSY Arrows and the conceptually similar Judgement of 
Line Orientation (r= .77) in a small mixed sample in the man- 
ual (N= 18; Korkman et al., 1998). Other correlations with 






conceptually similar tests are moderate (NEPSY Statue and 
Benton Motor Impersistence; NEPSY Design Copying and 
BVRT-Copy) or minimal (NEPSY Finger Discrimination 
and Benton Finger Localization; NEPSY Immediate Memory 
For Faces and Benton Facial Recognition; see manual for more 
details). Correlations between NEPSY Comprehension of In- 
structions and the MAE are moderate (r= .48, MAE Token 
test) to high (r= .76, MAE Aural Comprehension of Words 
and Phrases). Sentence Repetition subtests from the NEPSY 
and MAE are uncorrelated (r = .01), whereas word fluency 
tests from both measures exhibit a moderate degree of overlap 
(r= .44, NEPSY Verbal Fluency and MAE Controlled Word 
Association). 

With regard to memory, correlations between NEPSY 
memory subtests and conceptually similar CMS subtests 
are in the moderate to high range in a group of 27 nonclini- 
cal children presented in the manual (r = .36-.56; p. 210 of 
Korkman et al., 1998). CMS correlations with other NEPSY 
nonmemory subtests are not shown, which complicates inter- 
pretation of discriminant validity. However, index score inter- 
correlations are presented, which indicate that although 
NEPSY Memory and Learning is highly related to CMS General 
Memory (r=.57), NEPSY Memory and Learning is actually 
more highly related to CMS Attention/Concentration (r = .74) 
than to any other memory-related CMS index. Whether this is 
due to a large attentional component in the NEPSY Memory 
and Learning score or to sample characteristics is unknown, but 
does suggest care in comparing index scores from the two scales 
in individual children, as they may measure different cognitive 
domains despite similar nomenclature. 

With regard to attention, the NEPSY Attention/Executive 
domain score correlates moderately with the CMS Atten- 
tion/Concentration Index (r = .31). However, this NEPSY In- 
dex is not selectively associated with CMS-based attention, 
since moderate to high correlations are also found across 
CMS index scores purported to measure aspects of memory 
functioning (see manual for details, p. 211). In children with 
ADHD, the NEPSY Attention/Executive Index is moderately 
related to performance on the Auditory Continuous Perfor- 
mance Test (r = -.27 to -.28; Keith, 1994) but not to perfor- 
mance on the Conners' CPT (r= -.06 to -.09). This holds true 
even when only NEPSY Attention/Executive subtests specifi- 
cally related to attention are examined, such as Visual Atten- 
tion (r = -.05 to -.11). Instead, Conners' CPT scores appear to 
be more closely related to performance on the NEPSY Senso- 
rimotor and Visuospatial domains (see manual for details). In 
other studies, the NEPSY Attention/Executive domain sub- 
tests appear to correlate with some tests of theory of mind in 
preschoolers (Perner et al., 2002), even after partialing out IQ 
(N=22). 

Examples of criterion-based evidence of the validity of the 
NEPSY Language domain are provided by studies involving 
the EOWPVT-R, PPVT-III, and CELF-P, well-known and well- 
validated language measures for children. For instance, NEPSY 
Body Part Naming and EOWPVT-R are highly correlated in 
children with sleep-disordered breathing and matched con- 



trols (r = .52; Till et al., 2001). Likewise, NEPSY Phonological 
Processing and NEPSY Comprehension of Instructions are 
both moderately to highly associated with PPVT-III scores 
(r= .53 and .42, respectively; Till et al., 2001). In a longitudinal 
study on the effects of cocaine on language development, both 
the CELF-P at ages 3 and 5 and the NEPSY Language score at 
age 7 showed the same magnitude of cocaine-related perfor- 
mance decrement (i.e., approximately one-fifth of an SD). This 
provides some evidence for the fact that the CELF-P and the 
NEPSY Language domain measure a similar dimension of 
functioning in preschoolers (Bandstra et al., 2002). 

With regard to visual-spatial abilities, NEPSY Block Con- 
struction and NEPSY Hand Positions are both moderately 
correlated to the Wide Range Assessment of Visual-Motor 
Abilities (WRAVMA; Adams & Sheslow, 1995). 

Clinical Studies 

The manual provides NEPSY data on a number of small clini- 
cal groups compared with matched controls, including chil- 
dren with ADHD, LD/ADHD, LD-reading, language disorder, 
autism, FAS, TBI, and hearing impairment (N= 8-51; Kork- 
man et al., 1998). The manual concludes that these studies 
support the clinical validity of the test, and can be used as a 
guide for determining which NEPSY components to use in 
the diagnostic workup for conditions such as ADHD (see 
Kemp et al., 2001). However, the data are not compelling for 
three main reasons: (1) the actual number of children with 
clinical-level impairments in each group is sometimes quite 
low, (2) some group means are actually in the broadly normal 
range, despite statistically significantly lower scores com- 
pared with matched controls, and (3) no data showing that 
the NEPSY can discriminate between clinical groups are pro- 
vided, given that demonstration of group differences versus 
controls is an inadequate metric of the clinical utility of 
neuropsychological tests. A higher standard is the percent of 
children with impairments who are actually detected by the 
test, and an even higher standard is whether the test can dis- 
criminate between clinical groups. Overall, the percentage of 
children in the clinical groups with impaired performance 
on the NEPSY raises serious questions about the test's sensi- 
tivity to neurocognitive impairment in the context of these 
specific disorders. Table 6-75 shows the percentage of chil- 
dren identified as "impaired" (i.e., with scores <2 SD below 
the mean) for each of the clinical groups presented in the 
manual. 

Previous test reviews of the NEPSY have noted the paucity 
of validity information on the test (e.g., Ahmad & Warriner, 
2001). Nevertheless, composites comprised of NEPSY Lan- 
guage domain subtests, PPVT-III, and EOWPVT-R appear 
sensitive to the effects of prenatal organic solvent exposure 
based on group comparisons (Till et al., 2001), along with 
NEPSY subtests measuring graphomotor ability (Visuomotor 
Precision, Design Copying), but not Attention/Executive sub- 
tests or a composite comprised of NEPSY Visuospatial sub- 
tests and WRAVMA (Till et al., 2001). However, the NEPSY Attention/Executive Index score appears to be sensitive to the neurocognitive effects of sleep-disordered breathing in 5-year-olds, at least as measured by group differences (Gottlieb et al., 2004; O'Brien et al., 2004), as is the NEPSY Memory domain (Gottlieb et al., 2004). In particular, an index measuring the total number of arousals per hour of sleep time was negatively associated with performance on the Tower subtest (r = -.43; O'Brien et al., 2004). Preschoolers at risk for ADHD also appear to have lower Attention/Executive domain scores compared with controls (Perner et al., 2002). NEPSY Language domain scores are reportedly sensitive to the effects of prenatal cocaine exposure (Bandstra et al., 2002, 2004).


Table 6-75 Percent of Clinical Groups With Impaired Performance on the NEPSY

NEPSY Core Domain        ADHD    LD/ADHD  LD-Reading  Language   Autism   FAS     TBI     Hearing
                         N=51    N=20     N=36        Disorder   N=23     N=10    N=8     Impairment
                                                      N=19                                N=32
Attention/Executive       2.0    25.0      2.8        37.5       17.6     77.8    62.5    20.0
Language                  2.0    20.0      2.8         7.7       11.1     33.3    50.0     --
Sensorimotor              9.8     0.0     16.7        11.8       17.6     22.2    62.5     4.0
Visuospatial              2.0    10.0     13.9         5.6        8.7     30.0    28.6    12.5
Memory and Learning       3.9    15.0     11.1        31.6       27.3     40.0    25.0     9.4

Note: Impairment = scores less than two SDs below the mean.
Source: Adapted from Korkman et al., 1998.

A study by Schmitt and Wodrich (2004) examined whether 
the NEPSY domain scores provide additional information 
not already accounted for by IQ tests. They compared three 
groups of children (neurological, scholastic concerns, and stan- 
dardization controls) and found that, after controlling for 
IQ, only the NEPSY Language and NEPSY Sensorimotor do- 
main scores differed between groups. They concluded that al- 
though the test passes preliminary evidence of validity (i.e., 
group differences without controlling for IQ), other more rig- 
orous evidence supportive of the sensitivity of index scores to 
group differences is lacking. Specifically, differences between 
children with neurological conditions and controls disap- 
peared when IQ was controlled for on the Attention/Executive 
and Memory and Learning domains. Even when IQ was not 
controlled for, there were no group differences on the Visu- 
ospatial domain. The authors note that the main practical im- 
plication of these findings is that there is empirical evidence 
for the practice of supplementing IQ tests with NEPSY Lan- 
guage and Sensorimotor domains (i.e., Phonological Pro- 
cessing, Speeded Naming, and all the Sensorimotor subtests), 
but not for the other NEPSY subtests. We would temper these 
group-based conclusions by adding that only those Sensori- 
motor subtests with adequate reliabilities should be considered 
for differential diagnosis of individual children (e.g., Imitating 
Hand Movements in younger children and Fingertip Tapping 
in older children). 

Other group studies on the Finnish NEPSY, or on prior 
versions of the NEPSY in Finnish and Swedish samples, are 
reviewed in Korkman et al. (1998), Korkman (1999), and 



Kemp et al. (2001). These include outcome studies on con- 
genital hypothyroidism (Song, 2001), early-onset hemiparesis 
(Kolk & Talvik, 2000), congenital brain lesions and epilepsy 
(Kolk et al., 2001), juvenile neuronal ceroid lipofuscinosis 
(Lamminranta et al., 2001), fetal alcohol (Korkman et al., 
1998), and low birth weight and asphyxia (Korkman et al., 
1996; Sajaniemi et al., 2001; see also Korkman, 1988; Kork- 
man & Haekkinen-Rihu, 1994; and Korkman et al., 1996). 
Although of interest, these studies provide little information 
on the test's sensitivity/specificity to clinical conditions, other 
than group comparisons. Further, given considerable changes 
compared with the 1988 Finnish edition, some results based 
on earlier test versions may not necessarily be applicable to 
the newer American version. 



COMMENT 

When one considers some of the limitations of existing neu- 
ropsychological tests for children, the NEPSY is a major tech- 
nological advancement in pediatric neuropsychology. It is the 
only neuropsychological battery for children that is not sim- 
ply a downward extension of an adult battery, and it is the 
only existing comprehensive neuropsychological battery for 
children normed on a single, large, randomized, stratified 
normative sample. As such, it allows users to compare perfor- 
mance across domains, unencumbered by the psychometric 
and methodological limitations inherent in cross-test com- 
parisons of tests normed on different samples. In addition, it 
is a test that is modeled, to some extent, on Luria's approach 
to assessment, and many of its subtests are adaptations of 
classic neuropsychological testing paradigms (e.g., Word Flu- 
ency, Tower of London/Hanoi, Design Fluency, Finger Ag- 
nosia, Judgment of Line Orientation, etc.). It therefore has a 
respectable theoretical foundation, and from a practical stand- 
point, it provides users with a wide array of subtests that were 
previously unavailable for children or that had inadequate 
norms. In addition, even though it is well normed and em- 
ploys standardized administration procedures, it allows a 
process-oriented approach by measuring how children arrive 
at a certain level of performance (i.e., through the Qualitative 
Observation scores). It has brightly colored, child-friendly 






stimuli, is a relatively simple battery to learn and administer, 
and maintains the interests of most young children. Addition- 
ally, its age range extends to preschoolers (i.e., 3-5 years), an 
age band overlooked by many tests. Lastly, although many 
North Americans see it as a new test, it is actually one of the 
oldest instruments in pediatric neuropsychology, used in Fin- 
land for over 25 years. It is a test with a distinguished past and 
is hardly a newcomer in the field. 

Despite its many assets, the NEPSY has some significant 
limitations, and could benefit greatly from additional vali- 
dation research and psychometric refinement. Although it 
remains a major milestone in the evolution of pediatric neu- 
ropsychological assessment, it may have been overly ambi- 
tious in its attempts to cover the whole breadth and depth of 
neuropsychological functioning in children and may have 
been better served by including fewer subtests with better psy- 
chometric properties. 

Psychometric Properties 

Some NEPSY subtests, and their related Core domains, have 
poor psychometric properties and should probably rarely be 
administered in a diagnostic clinical context. These include 
Verbal Fluency, Statue, Phonological Processing, and Block 
Construction at ages 3 to 4, and Visuomotor Precision, Design 
Fluency, Tower, Speeded Naming, Comprehension of Instruc- 
tions, Imitating Hand Positions, Arrows, and Memory for 
Faces at ages 5 to 12. It is possible that in some cases (i.e., 
older children), poor reliabilities may be due to ceiling effects. 
Note that almost all the subtests that use percentile classifica- 
tions rather than scaled scores show poor reproducibility 
(e.g., Knock and Tap, Finger Discrimination, Route Finding, 
Manual Motor Sequences, Oromotor Sequences). Due to 
poor reliabilities for some Qualitative Observation scores and 
a lack of corroborative clinical evidence overall, these scores 
should be used with caution in the clinical assessment of indi- 
vidual children. Reliability and validity information are en- 
tirely lacking for a number of Supplemental Scores. Given that 
these are based on even fewer items than the total scores from 
which they derive, it is doubtful that their reliabilities would 
attain those of the subtest scores shown in Tables 6-70 
and 6-71. 

Although this seems like a long list of subtests and scores 
with psychometric limitations, it should be remembered that 
the NEPSY includes 27 subtests, a number of which demon- 
strate at least adequate reliability. It should nevertheless be 
noted that only two subtests have both high internal reliability 
and high test-retest reliability defined as r greater than .80 
(i.e., Sentence Repetition at ages 3-4, and Auditory Attention 
and Response Set at ages 5-12). No NEPSY subtest demon- 
strates both types of reliability above .90, a level recom- 
mended for clinical decision making. See Tables 6-70 to 6-72 
for specific subtests. 

Substantial practice effects must also be taken into account 
when assessing children for change over time with certain sub- 
tests such as those from the Memory and Learning domain. 



Clinical Utility and Differential Diagnosis 

The sensitivity of a test (i.e., its ability to detect impairment) 
is one of the main prerequisites for using it in a clinical con- 
text. As noted above in Validity, data from the manual on the 
proportion of children with impairments in specific domains 
within various neurocognitive conditions of childhood (e.g., 
ADHD, LD/ADHD, language disorder, autism, etc.) suggest 
poor sensitivity to impairment for NEPSY domain scores (see 
Table 6-75). It is difficult, for example, to see how the test 
would be of any utility in the evaluation of children with 
ADHD, since almost the entire group performed well on the 
test (e.g., only 2% had Attention/Executive scores in the im- 
paired range). Similarly, excluding conditions with excessively 
small sample sizes (i.e., TBI, FAS), the proportion of children 
identified was at best 38%, which again suggests that the test 
"missed" more children than it actually identified. 

While low test sensitivity may be the norm in certain 
conditions (e.g., see discussion of diagnosis of ADHD in re- 
views of CPTs in this volume), a minimally acceptable level of 
sensitivity — at the very least exceeding chance — should be 
demonstrated as evidence of a test's clinical utility and predic- 
tive validi