Introduction to Educational and Psychological Measurement Using R
Chapter 7: Item Response Theory
One could make a case that item response theory (IRT) is the most important statistical method about which most of us know little or nothing. IRT provides a foundation for statistical methods used in contexts such as test development, item analysis, equating, item banking, and computerized adaptive testing. Its applications also extend to the measurement of a variety of latent constructs across many disciplines. Given its role and influence in educational and psychological measurement, IRT has accumulated an extensive literature.
Numerous scale development procedures are reviewed and summarized into an overall framework of consecutive steps, each with a concise description. The issues covered are as follows. First, the theoretical underpinning of the scale construct is described, along with the response specifications and the available response formats, from the most popular (such as Likert) to more elaborate ones.
Then the item writing guidelines follow, together with strategies for discarding poor items when finalizing the item pool. The item selection criteria described comprise expert panel review, pretesting, and item analysis.
Finally, dimensionality evaluation is summarized, along with test scoring and standardization (norming). Scale construction has implications for research conclusions: it affects reliability and the statistical significance of the effects obtained or, stated differently, the accuracy and sensitivity of the instruments.
A questionnaire is a set of objective and standardized self-report questions whose responses are then summed to yield a score. Item score is defined as the number assigned to performance on the item, task, or stimulus (Dorans). The definition of a questionnaire or test is rather broad and encompasses everything from a scale to measure life satisfaction e.
The scale items are indicators of the measured construct, and hence the score is also an indicator of the construct (Zumbo et al.). Attitude, ability, and intellectual reasoning measures or personality measures are considered technical tools, equivalent e.
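The idea that item responses are summed to yield a score can be sketched in a few lines of Python. This is a minimal illustration, not a scoring procedure taken from the sources cited here: the function name `sum_score`, the 1 to 5 response range, and the handling of reverse-keyed items are all assumptions made for the example.

```python
from typing import Iterable, Mapping


def sum_score(
    responses: Mapping[str, int],
    reverse_keyed: Iterable[str] = (),
    scale_min: int = 1,
    scale_max: int = 5,
) -> int:
    """Sum item responses into a single scale score.

    The 1-5 range and the reverse-keying step are illustrative assumptions.
    """
    reverse = set(reverse_keyed)
    total = 0
    for item, value in responses.items():
        if not scale_min <= value <= scale_max:
            raise ValueError(
                f"{item}: response {value} is outside {scale_min}-{scale_max}"
            )
        # Mirror reverse-keyed items so a high value always means
        # more of the measured construct.
        if item in reverse:
            value = scale_min + scale_max - value
        total += value
    return total


answers = {"q1": 4, "q2": 2, "q3": 5}  # hypothetical respondent
print(sum_score(answers, reverse_keyed=["q2"]))  # 4 + (6 - 2) + 5 = 13
```

Each item score contributes equally to the total here; a weighted score would need a different function.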
Over the past decades, such instruments have become popular in psychology, mainly because they provide multiple related pieces of information on the latent construct being assessed (Raykov). The target population is the group for whom the test is developed (Dorans). Test development and standardization (or norming) are two related processes: test development comes first and standardization follows.
During test development, after item assembly and analysis, the items that are the strongest indicators of the latent construct are selected and the final pool emerges, whereas in standardization, standard norms are specified (Chadha). Effective scale construction has important implications for research inferences, affecting first the quality and size of the effects obtained and second the statistical significance of those effects (Furr), or in other words the accuracy and sensitivity of the instruments (Price). The purpose of this work is to provide a review of the scale development and standardization process.
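One way to make "strongest indicators" concrete during item analysis is the corrected item-total correlation: each item is correlated with the sum of the remaining items. The sketch below is only an illustration under stated assumptions. The text does not name this statistic, the data are invented, and the 0.30 retention cut-off is a hypothetical threshold chosen for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows):
    """Correlate each item with the sum of all the other items.

    rows: one list of item scores per respondent, all of equal length.
    """
    n_items = len(rows[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]  # total without item j
        result.append(pearson(item, rest))
    return result


# Invented responses: 5 respondents, 3 items.
rows = [[1, 2, 5], [2, 2, 1], [3, 4, 3], [4, 4, 2], [5, 5, 4]]
correlations = corrected_item_total(rows)

CUTOFF = 0.30  # hypothetical retention threshold
kept = [j for j, r in enumerate(correlations) if r >= CUTOFF]
print(kept)  # indices of the items retained for the final pool
```

With these invented data the third item is essentially uncorrelated with the rest and would be dropped, which mirrors the refinement of the item pool described above.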
The Scale Development Process Overview
The scale development process as described by Trochim is completed in five steps, as quoted by Dimitrov. In a similar vein, Furr also described it as a process completed in five steps. Steps d and e form an iterative process of refining the initial pool until the properties of the scale are adequate.
The test score can then be standardized (see the relevant section). There are several models of test development. Because the steps suggested by different sources differ, Table 1 presents the scale development process as described by each of several sources.
Note that the bottom of Table 1 contains an integrative approach that combines the steps from all sources. The phases of the scale development process are presented in the sections below.
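The standardization of a test score mentioned above can be illustrated with a small sketch that expresses a raw score relative to a norm sample, first as a z-score and then as a T-score (mean 50, standard deviation 10). The function name, the choice of the sample standard deviation, and the norm data are assumptions made for the example; the text itself does not prescribe a particular standard score.

```python
from statistics import mean, stdev


def standardize(raw, norm_sample):
    """Return (z, T) for a raw score relative to a norm sample.

    z = (raw - mean) / SD; T = 50 + 10 * z.
    The sample standard deviation (n - 1 denominator) is assumed here.
    """
    m = mean(norm_sample)
    s = stdev(norm_sample)
    if s == 0:
        raise ValueError("norm sample has zero variance")
    z = (raw - m) / s
    return z, 50 + 10 * z


norm_scores = [10, 12, 14, 16, 18]  # invented norm sample
z, t = standardize(14, norm_scores)
print(z, t)  # a raw score equal to the norm mean gives z = 0 and T = 50
```

In practice the norm sample would be drawn from the target population for which the test was developed.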
Instrument Purpose and Construct Measured
When instruments are developed effectively, they show adequate reliability and validity, supporting the use of the resulting scores.
To reach this goal, a systematic development approach is required (Price). However, the development of scales to assess subjective attributes is considered rather difficult and requires both mental and financial resources (Streiner et al.).
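As an illustration of what "adequate reliability" can mean in practice, the sketch below computes Cronbach's alpha, one common internal-consistency index. The text does not name this coefficient, so its use here is an assumption, and the data are invented. For k items, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score table.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented responses: 5 respondents, 3 items.
rows = [[1, 2, 5], [2, 2, 1], [3, 4, 3], [4, 4, 2], [5, 5, 4]]
print(round(cronbach_alpha(rows), 3))  # about 0.556 for these data
```

What counts as an adequate value depends on the purpose of the instrument, so no cut-off is built into the function.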
Before embarking on any test construction, the prerequisite is to be aware of all existing scales that could suit the purpose of the instrument you wish to develop, and to judge their usefulness without any tendency to exaggerate their deficiencies. Then there is one more consideration: feasibility. The dimensions to consider include time, cost, scoring, the method of administration, intrusiveness, the consequences of false-positive and false-negative decisions, and so forth (Streiner et al.).
After that, the scale development process can start with the definition of the purpose of the instrument within a specific domain, the instrument score, and the constraints inherent in the development (Dimitrov; Price). As a rule, in the research field of psychology, the general purpose of a scale is to discriminate between individuals with high levels of the construct being measured and those with lower levels (Furr). However, the test developer should first determine clearly the intended construct being measured. Writing ability is a good example of the kind of test that should be given in an essay response format.
This type of item, however, is difficult to score reliably and can require a significant ...
1. Introduction and Basic Concepts
A questionnaire (also called a test or a scale) is defined as a set of items designed to measure one or more underlying constructs, also called latent variables (Fabrigar & Ebel-Lam).
A general rule for the efficient implementation of readability rules is common sense (DeVellis), and the same is true for the item writing rules (Krosnick & Presser).
Generally, personalized wording is more involving and is preferred by most developers.
Scale Development: Theory and Applications (Applied Social Research Methods), by Robert F. DeVellis.
In the Fourth Edition of Scale Development, Robert F. DeVellis demystifies measurement by emphasizing a logical rather than strictly mathematical understanding.
CH6: 1. DeVellis guidelines
1. Define clearly what you want to measure. (Items as specific as possible)
2. Generate an item pool. (Avoid redundant items)
3. Avoid exceptionally long items. (Long items are often confusing or misleading)
4. Keep the level of reading difficulty appropriate for .