MEASUREMENT
HYPOTHETICAL QUIZ 1 (a measurement of
social work research knowledge)
1. Operationally define social worker
2. Name 2 of 3 authors of the required
text for this course
3. What is the atomic number of oxygen?
grade your own papers
What is wrong with this quiz as a measurement
of social research knowledge?
-
Content Validity: not a TRUE indicator of social
research knowledge because:
-
it measures things outside the construct
of social research knowledge
-
it fails to measure things that should be
included in the construct of social
research knowledge
-
Reliability: not a CONSISTENT indicator of social
research knowledge because:
-
there is probably no correlation between
the items
-
Measurement
= describing abstract
concepts in terms of specific
indicators by assigning numbers
or symbols to the
indicators according to rules
-
most variables of interest in the behavioral
sciences are not things
but concepts, and hence
require both conceptual and operational definitions in order to be measured
-
if you can't count it, you have to construct
it
-
eg: depression (abstract
concept; conceptual definition: an unplanned and unwanted reduction
of cognitive, affective, biological, and interpersonal organismic activity)
-
indicators
(arbitrary) of depression based on the above definition are:
-
sleep
-
time spent with other people
-
self esteem
-
numbers/symbols
(arbitrary) assigned to indicators
-
# hours slept
-
# times saw people other than family socially
outside work or school
-
score on a standardized self esteem scale
-
rules
-
self report of clock hours slept (0-24),
interval level
-
# times active for at least 5 min in a 24-hour
day, interval
-
Beck Depression Inventory (0-63), ordinal
treated as interval
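The three rules above work like a small data-entry specification: each indicator gets a number, and each number must fall in the range its rule allows. A minimal Python sketch, assuming a hypothetical `DepressionIndicators` record (the class and field names are illustrative, not part of any instrument), rejects values that break a rule:

```python
from dataclasses import dataclass


@dataclass
class DepressionIndicators:
    """Hypothetical record of one person's depression indicators for one day."""
    hours_slept: float    # rule: self-reported clock hours slept, 0-24
    social_contacts: int  # rule: times active for at least 5 min in a 24-hour day
    bdi_total: int        # rule: Beck Depression Inventory total, 0-63

    def __post_init__(self):
        # Each rule fixes the allowable range of numbers for its indicator.
        if not 0 <= self.hours_slept <= 24:
            raise ValueError("hours slept must be between 0 and 24")
        if self.social_contacts < 0:
            raise ValueError("number of social contacts cannot be negative")
        if not 0 <= self.bdi_total <= 63:
            raise ValueError("BDI total must be between 0 and 63")


# A valid (made-up) record; an out-of-range value such as 30 hours would raise ValueError.
record = DepressionIndicators(hours_slept=5.5, social_contacts=2, bdi_total=21)
```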
-
e.g. Race (conceptual definition: see Webster's dictionary)
-
indicators:
-
Physical (skin color, facial features, hair
texture)
-
self reference
-
numbers
-
categories (e.g. 1=Indigenous/Native American,
2=African-American, 3=European-American, etc.)
-
rules
-
eg, Caretaker Expectations (CE) regarding
Pediatric Asthma (conceptual definition =
belief in one's ability to take action, anticipate outcomes, and respond
to problems encountered in taking care of a child with asthma)
-
indicators:
-
self
efficacy
-
outcome
expectations
-
response
difficulty
-
symbols:
each subscale consists of 5 items rated on a 1 to 9
metric
-
rules: items are added to generate a subscale
score
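Applied to the CE instrument, the rule "items are added" means each subscale score is the sum of its five 1-9 ratings, so a subscale score can range from 5 to 45. A minimal sketch, assuming hypothetical key names for the three subscales and made-up ratings; the notes do not say whether any items are reverse-scored, so none are reversed here:

```python
# Hypothetical keys for the three CE subscales named above.
SUBSCALES = ("self_efficacy", "outcome_expectations", "response_difficulty")
ITEMS_PER_SUBSCALE = 5


def subscale_scores(responses: dict[str, list[int]]) -> dict[str, int]:
    """Sum the five 1-9 ratings in each subscale (rule: items are added)."""
    scores = {}
    for name in SUBSCALES:
        items = responses[name]
        if len(items) != ITEMS_PER_SUBSCALE:
            raise ValueError(f"{name} needs exactly {ITEMS_PER_SUBSCALE} items")
        if any(not 1 <= item <= 9 for item in items):
            raise ValueError(f"{name} items must be rated from 1 to 9")
        scores[name] = sum(items)
    return scores


# Made-up ratings from one caretaker
example = {
    "self_efficacy": [7, 8, 6, 9, 7],
    "outcome_expectations": [5, 5, 6, 4, 5],
    "response_difficulty": [3, 2, 4, 3, 2],
}
scores = subscale_scores(example)
```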
-
Discuss: If
you can't measure a problem, it doesn't exist
-
measurement depends on
how we collect data, which in turn is determined by the research question.
There are four common ways to collect data:
-
(1)
paper & pencil tests
-
Beck
Depression Inventory as a measure of depression
-
Child
Well-being scale as a measure of child functioning
-
Caretaker
Expectations Regarding Pediatric Asthma
as a measure of competence in managing pediatric asthma in a child in one's
care
-
SAT
as a measure of potential for success in college
-
MCMI
as a measure of personality functioning
-
voting
machine as a measure of political preference
-
questionnaire
as a measure of lots of things
-
if I wanted to measure knowledge of research
in a group of social work students using paper & pencil methods, I
would give them an examination
-
(2)
observation
-
counting
the number of times a client uses the word "I" during an interview
as a measure of ego strength
-
watching
for control & aggression while a father plays with a child
-
looking
for bruises in an ER
-
watching
a video or audiotape
-
observing
the sequence of interaction in a family session behind a 1-way mirror
-
if I
wanted to measure knowledge of research in a group of social work students
using observational methods, I could put
several of you into a focus group, ask you to design a research project,
then assign your grade based on my observations
-
(3)
interview
-
asking questions of your clients in the agency
-
oral surveys
-
oral exams
-
taking a social history
-
if I
wanted to measure knowledge of research in a group of social work students
using interview methods, I would give them an oral
examination (something we usually reserve for PhD students)
-
(4)
existing
records
-
reading case records
-
grades
-
public records such as voting records and taxes
paid
-
census documents
-
internet data
-
if I
wanted to measure knowledge of research in a group of social work students
using existing records, I could give you a grade
based on the grades you got in your other classes
-
to count as an "existing record," the data must
be used for something other than the purpose for which they were originally collected
-
I used to say attendance in class was an
existing record, but in the process of the IRB overhaul at UIC, I have
learned that isn't so
-
indicator = an
observation presumed to be evidence of a phenomenon
-
it is the "slipperiness" between a concept
and an indicator that is a big concern
-
one "indicator" may indicate many phenomena/variables,
eg: bruises on a child's legs may indicate something good (active, athletic
child) or something bad (child abuse) or something neutral (situational
clumsiness)
-
rapid speech =>
-
culture
-
family speech pattern
-
inner sense of urgency or fear (anxiety)
-
biological pressure (mania)
-
income =>
-
poverty
-
values
-
adherence to a Judeo-Christian ethic
-
racism or sexism
-
score on exam =>
-
understanding of material
-
compulsiveness
-
test-taking skill
-
luck
-
adaptivity to mainstream culture
-
In the same way
that a single observation may indicate many different phenomena, a
single phenomenon/variable may have many indicators, eg:
-
child abuse <=
-
child's self report (interview)
-
score on a scale of disciplinary techniques
(paper & pencil test)
-
written report in an agency record (existing
record)
-
how
else?
-
score on Milner's scale of child abuse
potential?
-
observation of discipline in a grocery store?
-
anxiety <=
-
self report (interview)
-
score on Burns Anxiety scale (P&P)
-
DSM diagnosis in a case file (existing record)
-
how
else?
-
observation of motor movements?
-
breathing and other autonomic indicators?
-
political preference <=
-
vote count (P&P)
-
conversation (interview)
-
how
else?
-
ears (Republicans have funny-looking ears)?
-
job description?
-
understanding research <=
-
class attendance (existing record)
-
participation (observation)
-
self report (interview)
-
how
else?
-
In general, it is desirable to measure the
same variable using multiple indicators, preferably using at least 2 means
of data collection
-
eg: measuring depression with a standard scale
such as the BDI is fine if that's all you can do
-
if possible, add an existing record (e.g.,
school attendance), a narrative (interview), and a report from a relative
(observation)
-
items,
indices, and scales
-
item =
single indicator of a variable
-
item #17 on the BDI is:
-
(0) I don't get more tired than usual
-
(1) I get tired more easily than I used to
-
(2) I get tired from doing almost anything
-
(3) I am too tired to do anything
-
item #17 on the Hudson Generalized Contentment
Scale asks respondents to rate on a 1-5 scale how often "I feel downtrodden"
-
items may also appear on a checklist (eg. the Achenbach
Child Behavior Checklist, the Denver Child Development Inventory, or a hyperactivity
rating list used by teachers or social workers); such checklists consist of items
-
index = two
or more items combined to indicate a variable
-
BDI has 21 items
-
MAST, Hudson scales have 25 items
-
Millon Clinical Multiaxial Inventory has 175
items
-
Achenbach (child behavior checklist) has
113 items
-
scale = an index
with known validity, reliability, and other psychometric characteristics
such as underlying factor structure
-
BDI is a scale measuring depression
-
MAST is a scale measuring alcoholism
-
we will talk about validity & reliability
later
-
Levels of measurement (review)
-
nominal
measures: classify a variable by categories which are mutually exclusive,
but which have no natural order or numbering (eg: religion, gender, political
preference)
-
e.g. "are you a democrat, republican, or
independent?"
-
the only mathematical properties are = (equal to) or ≠ (not equal to)
-
ordinal
measures: variables which may be classified and ordered, but whose numbers
show only rank, not equal distances between values (eg: attitude, opinion, belief, strength of preference)
-
e.g. "how democratic are you, 1-10?"
-
math
properties: = or ≠, plus > (greater than) or < (less than)
-
interval
measures: variables which may be classified, ordered, and numbered, with equal
distances between values (eg: number of children, age, income in dollars)
-
interval variables may be continuous or discrete
-
continuous: age
-
discrete: number of times married
-
if an interval measure has a true and observable
zero (eg. number of siblings, number of times arrested), it is called a ratio
level measure
-
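height, weight, and age have a true zero that is never observed in a person, so by
the definition above they are interval but not ratio (many texts nevertheless classify them as ratio)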
-
e.g. "how many times have you voted for a
democrat?"
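One way to see the hierarchy above is that each level keeps every operation of the levels below it and adds one more kind. A minimal Python sketch of that idea; the `Level` enum and `allowed_operations` helper are illustrative names, and the ratio level follows the definition used here (a true, observable zero):

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement; each level keeps the properties of the ones below it."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def allowed_operations(level: Level) -> set[str]:
    ops = {"=", "!="}           # nominal: same or different category
    if level >= Level.ORDINAL:
        ops |= {"<", ">"}       # ordinal: values can be ranked
    if level >= Level.INTERVAL:
        ops |= {"+", "-"}       # interval: equal distances, so differences mean something
    if level >= Level.RATIO:
        ops |= {"*", "/"}       # ratio: true zero, so "twice as many" means something
    return ops


# Examples taken from the questions above
VARIABLES = {
    "are you a democrat, republican, or independent?": Level.NOMINAL,
    "how democratic are you, 1-10?": Level.ORDINAL,
    "how many times have you voted for a democrat?": Level.RATIO,
}
vote_count_ops = allowed_operations(VARIABLES["how many times have you voted for a democrat?"])
```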
Evaluating
Measures: How Slippery Are They?
-
Validity, Reliability, and Error in Testing
-
X = T + R + S
-
X is the obtained
score
-
T is the true
score, the error-free ideal score
-
R is random
error, including:
-
transient personal factors
-
situational factors
-
e.g. distraction due to heat
-
S is systematic
error, including
-
enduring characteristics of the student
-
e.g. intelligence, test-taking skill
-
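the more error figures into the measurement process, the less we can trust
the measure: systematic error makes a measure invalid (consistently off target),
while random error makes it unreliable (inconsistent from one measurement to the next)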
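The error model X = T + R + S can be made concrete with a small simulation of one hypothetical test-taker whose true score is 80, who carries a systematic error of +5 (say, unusual test-taking skill), and whose random error has a standard deviation of 3. All numbers are made up. Averaging many obtained scores cancels R but not S:

```python
import random
import statistics


def simulate_obtained_scores(true_score, systematic_error, random_sd, n, seed=0):
    """Return n obtained scores X = T + R + S for one hypothetical test-taker."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        random_error = rng.gauss(0, random_sd)  # R: different on every administration
        # S is added unchanged every time, so it never averages out
        scores.append(true_score + random_error + systematic_error)
    return scores


scores = simulate_obtained_scores(true_score=80, systematic_error=5, random_sd=3, n=10_000)
print(round(statistics.mean(scores), 1))   # close to T + S = 85, not to T = 80
print(round(statistics.stdev(scores), 1))  # close to 3, the size of the random error
```

The mean lands near 85 rather than 80 (a validity problem), while the spread of the scores reflects only the random error (a reliability problem).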
-
validity = accuracy of the measure. Ways
of demonstrating VALIDITY:
-
face/content
validity: is there a logical relationship between the variable and
the measure?
-
did exam 1 look like it measured knowledge
of research?
-
did exam 1 leave out anything that should
have been asked, or did it ask things that should not have been there?
-
does vote count look like a good measure
of political belief?
-
is anything missed by using vote count as
a measure of political belief?
-
does income look like a measure of poverty?
-
are there elements of the poverty idea that
are not covered by "income"; does income introduce other factors that are
outside the idea of poverty?
-
criterion
validity: is there a correlation between the measure and another measure
which indicates the variable?
-
concurrent
criterion validity: present correlation
-
can quiz 1 be correlated to any other current
variable that is correlated to research knowledge? GPA? class attendance?
-
can income be correlated to any other variable
that is correlated to poverty? Parents' educational level?
-
can depression measure be correlated to another
measure, behavior, belief?
-
predictive
criterion validity: temporal correlation
-
can exam be correlated to any future variable?
final grade? income in 5 years? # articles published?
-
can income be correlated to any future indicator
of poverty? children's income in 10 years?
-
construct validity: to what extent does the
variable correlate with other variables which are elements of the theoretical
framework from which the variable arose?
-
IF
I were interested in the construct validity of QUIZ 1 as a measure of research
knowledge, AND
IF my theory that good practice requires good research is true, THEN
Quiz 1 scores should relate to some measure of practice skill, like your
grade in PX I or your score on your field instructor's evaluation.
-
To the extent there is such a correlation,
Quiz 1 has construct validity
-
IF
I were interested in the construct validity of the Beck Depression Inventory,
I would start from the fact that the BDI comes from the cognitive theory of depression.
According to this theory, the principal cause of suicide is hopelessness, a core
element of depression as the BDI measures it, so people who attempt suicide should
score higher on the BDI than a matched group of non-suicidal people.
-
To the extent there is a difference on the
BDI between suicidal and non-suicidal groups, the BDI has construct validity
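The known-groups comparison above is usually checked by comparing group means. A minimal sketch with made-up BDI totals for the two groups, using Welch's t statistic (a standard two-sample comparison that does not assume equal variances; one reasonable choice among several). A statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would also report a p-value:

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic: mean difference divided by its standard error."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


suicidal = [31, 28, 35, 26, 33, 29]    # hypothetical BDI totals, suicide attempters
comparison = [12, 15, 9, 18, 11, 14]   # hypothetical BDI totals, matched non-suicidal group
t = welch_t(suicidal, comparison)
print(round(t, 2))  # a large positive t supports the predicted group difference
```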
-
reliability
= consistency of a measure
-
how consistently does a measure detect a
variable when the variable is present, and not detect it when it is not
present?
-
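validity is usually argued from several kinds of evidence rather than summarized
in a single number (criterion validity is often reported as a correlation), while
reliability is represented in numerical terms, as a coefficient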
-
preferred reliability > 0.80; 0.70 is marginal
-
e.g. if a measure detected a condition only 50%
of the time, we would say (roughly) that its reliability is 0.50, which is not
too good, since flipping a coin would be just as effective
-
test-retest
reliability
-
if I gave you quiz 1 again, immediately afterward,
would you get the same score?
-
to the extent scores on Quiz 1 (a) correlate
with the scores on Quiz 1 (b), Quiz 1 is a reliable measure, specifically
it has test-retest reliability
-
multiple
forms reliability
-
if I gave you a different but equivalent
version of Quiz 1, there should be a high correlation between your scores on the two versions
-
to the extent Quiz 1.1 (version 1) correlated
with Quiz 1.2 (version 2), Quiz 1 is reliable; specifically, it has multiple forms
reliability
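Test-retest and multiple forms reliability both come down to the correlation between two sets of scores from the same people: first and second administration, or version 1 and version 2. A minimal Pearson correlation sketch with hypothetical Quiz 1 scores:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)


quiz1_a = [7, 9, 5, 8, 6, 10, 4]  # hypothetical scores, first administration (or version 1)
quiz1_b = [8, 9, 5, 7, 6, 9, 5]   # hypothetical scores, retest (or version 2)
r = pearson_r(quiz1_a, quiz1_b)
print(round(r, 2))
```

The resulting coefficient can be read against the benchmarks above (above 0.80 preferred, 0.70 marginal).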
-
split-half
reliability
-
Is there a correlation between your scores
on the odd-numbered and even-numbered questions?
-
to the extent Q1odd
and Q1even correlate, Q1
is reliable; specifically, it has split-half reliability
-
Cronbach's
alpha (internal consistency) can be thought of as the average of all
possible split-half reliability estimates; in practice it is computed by software