IRT i pomiar edukacyjny

Kondratek, Bartosz; Pokropek, Artur

Article details

Journal

Edukacja

2013 | 4(124) | 42–66

Article title

IRT i pomiar edukacyjny

Authors

Bartosz Kondratek, Artur Pokropek

Title variants

EN

IRT and educational measurement

Languages of publication

PL

Abstracts

PL

Pod nazwą „item response theory” kryje się rodzina narzędzi statystycznych wykorzystywanych do modelowania odpowiedzi na rozwiązywane zadania oraz umiejętności uczniów. Modele IRT czynią to poprzez wprowadzenie parametryzacji, która określa: właściwości zadań oraz rozkład poziomu umiejętności uczniów. W artykule przedstawiony zostanie ogólny opis jednowymiarowego modelu IRT, przybliżone zostaną najczęściej stosowane modele dla zadań ocenianych dwupunktowo (2PLM, 3PLM, 1PLM) oraz wielopunktowo (GPCM), a także zarysowana zostanie problematyka estymacji poziomu umiejętności. Artykuł ma za zadanie wprowadzić czytelnika w techniczne szczegóły związane z modelowaniem IRT oraz przedstawić wybrane zastosowania praktyczne w pomiarze edukacyjnym. Wśród zastosowań praktycznych omówiono wykorzystanie IRT w analizie skomplikowanych schematów badawczych, zrównywaniu/łączeniu wyników testowych, adaptatywnym testowaniu oraz przy tworzeniu map zadań.

EN

Item response theory (IRT) is a family of statistical tools used to model the relationship between item responses and student ability. IRT models achieve this by parameterising item properties and the distribution of ability among students. This article presents a general introduction to the unidimensional IRT model and the models most commonly used for dichotomously scored items (1PLM, 2PLM, 3PLM) and polytomously scored items (GPCM), and outlines the problem of estimating student ability. It aims to introduce the reader to the technical aspects of IRT modelling and presents selected practical applications in educational measurement: the analysis of complex research designs, test linking and equating, adaptive testing and item mapping.

Keywords

PL

IRT; skalowanie; złożone schematy badawcze; zrównywanie; testowanie adaptatywne; mapowanie zadań

EN

IRT; scaling; complex research designs; linking and equating; adaptive testing; item mapping

Publisher

Instytut Badań Edukacyjnych

Journal

Edukacja

Year

2013

Issue

4(124)

Pages

42–66

Physical description

Dates

published

2013-12-31

Contributors

author

Bartosz Kondratek

Instytut Badań Edukacyjnych

author

Artur Pokropek

Instytut Badań Edukacyjnych

References

Aitkin, I. i Aitkin, M. (2011). Statistical modeling of the National Assessment of Educational Progress. New York: Springer.
Ayala, R. J. de (2009). The theory and practice of Item Response Theory. New York – London: The Guilford Press.
Birnbaum, A. (1968). Some latent trait models. W: F. M. Lord i M. R. Novick (red.), Statistical theories of mental test scores. Reading: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.
Cheng, P. E. i Liou, M. (2000). Estimation of trait level in Computerized Adaptive Testing. Applied Psychological Measurement, 24(3), 257–265.
Choi, S. W., Cook, K. F. i Dodd, B. G. (1997). Parameter recovery for the partial credit model using MULTILOG. Journal of Outcome Measurement, 1(2), 114–142.
Cochran, W. G. i Cox, G. M. (1957). Experimental designs. New York: John Wiley & Sons.
De Boeck, P. i Wilson, M. (red.). (2004). Explanatory item response models: a generalized linear and nonlinear approach. New York: Springer.
DeMars, C. (2010). Item Response Theory. Oxford – New York: Oxford University Press.
Deutsch, R. (1969). Teoria estymacji. Warszawa: Państwowe Wydawnictwa Naukowe.
Deville, C. (1993). Flow as a testing ideal. Rasch Measurement Transactions, 7(3), 308.
Dubiecka, A., Szaleniec, H. i Węziak, D. (2006). Efekt egzaminatora w egzaminach zewnętrznych. W: B. Niemierko i M. K. Szmigel (red.), O wyższą jakość egzaminów szkolnych, cz. I, Etyka egzaminacyjna i zagadnienia ogólne (s. 526–355). Kraków: Polskie Towarzystwo Diagnostyki Edukacyjnej.
Frey, A., Hartig, J. i Rupp, A. A. (2009). An NCME Instructional module on booklet designs in large‐scale assessments of student achievement: theory and practice. Educational Measurement: Issues and Practice, 28(3), 39–53.
Gruijter, D. N. M. i Kamp, L. J. van der (2005). Statistical test theory for education and psychology. Pobrano z: http://irt.com.ne.kr/data/test_theory.pdf
Holland, P. W. (2007). A framework and history for score linking. W: N. J. Dorans, M. Pommerich i P. W. Holland (red.), Linking and aligning scores and scales (s. 5–30). New York: Springer.
IBE (2011). «Laboratorium myślenia». Diagnoza umiejętności gimnazjalistów w zakresie przedmiotów przyrodniczych. Raport z badań. Pobrano z: http://eduentuzjasci.pl/pl/publikacje-ee-lista/162-raport/raport-z-badania/laboratorium-myslenia/812-laboratorium-myslenia-raport-z-badania.html
Jasińska, A. i Modzelewski, M. (2012). Można inaczej. Wykorzystanie IRT do konstrukcji testów osiągnięć szkolnych. W: B. Niemierko i M. K. Szmigel (red.), Regionalne i lokalne diagnozy edukacyjne (s. 178–187). Kraków: Polskie Towarzystwo Diagnostyki Edukacyjnej.
Kaczan, R. i Rycielski, P. (2012). Diagnoza umiejętności dzieci 5-, 6- i 7-letnich za pomocą Testu Umiejętności na Starcie Szkolnym (TUnSS). Referat wygłoszony na konferencji Polskiego Towarzystwa Diagnostyki Edukacyjnej, Wrocław. Pobrano z: http://www.ptde.org/file.php/1/Archiwum/XVIII_KDE/XVIII%20KDE%20-%20referaty/Kaczan,Rycielski.pdf
Kolen, M. J. (2004). Linking assessments: concept and history. Applied Psychological Measurement, 28(4), 219–226.
Kolen, M. J. i Brennan, R. L. (2004). Test equating, scaling, and linking: method and practice (wyd. 2). New York: Springer.
Kondratek, B. (2012). Konteksty osiągnięć uczniów. W: M. Żytko (red.), Badanie umiejętności podstawowych uczniów trzecich klas szkoły podstawowej. Uczeń, szkoła, dom. Raport z badań (s. 187–217). Warszawa: Instytut Badań Edukacyjnych.
Koretz, D. (2008). Measuring up: what educational testing really tells us. Cambridge: Harvard University Press.
Kyngdon, A. (2011). Plausible measurement analogies to some psychometric models of test performance. British Journal of Mathematical and Statistical Psychology, 64(3), 478–497.
Lehmann, E. L. (1991). Teoria estymacji punktowej. Warszawa: Wydawnictwo Naukowe PWN.
Linacre, M. (2000). Computer Adaptive Testing: a methodology whose time has come. MESA Memorandum No. 69.
Linden, W. J. van der i Glas, C. A. W. (2000). Computerized Adaptive Testing: theory and practice. Norwell: Kluwer Academic.
Linden, W. J. van der i Pashley, P. J. (2010). Item selection and ability estimation in adaptive testing. W: W. J. van der Linden i C. A. W. Glas (red.), Elements of adaptive testing (s. 3–30). New York: Springer.
Linden, W. J. van der, Veldkamp, B. P. i Carlson, J. E. (2004). Optimizing balanced incomplete block designs for educational assessments. Applied Psychological Measurement, 28(5), 317–331.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum Associates.
Lord, F. M. i Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177–196.
Muraki, E. (1992). A generalized partial credit model: application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.
Muraki, E. i Bock, D. (2003). PARSCALE 4.0 [Instrukcja programu komputerowego]. Lincolnwood: Scientific Software International.
OECD (2009). PISA 2006. Technical Report. Paris: OECD Publishing.
OECD (2012). PISA 2009. Technical Report. Paris: OECD Publishing.
Pokropek, A. (2008). Metody obliczania edukacyjnej wartości dodanej dla szkół kończących się egzaminem maturalnym. W: B. Niemierko i M. K. Szmigel (red.), Uczenie się i egzamin w oczach nauczyciela (s. 237–247). Kraków: Polskie Towarzystwo Diagnostyki Edukacyjnej.
Pokropek, A. (2011). Missing by design: planned missing-data designs in social science. ASK. Research & Methods, 20, 81–105.
Pokropek, A. i Żółtak, T. (2012). Nowe modele jednorocznej EWD. W: B. Niemierko i M. K. Szmigel (red.), Regionalne i lokalne diagnozy edukacyjne (s. 178–187). Kraków: Polskie Towarzystwo Diagnostyki Edukacyjnej.
Preece, D. A. (1990). Fifty years of Youden squares: a review. Bulletin of the Institute of Mathematics and Its Applications, 26(4), 65–75.
Rao, C. R. (1982). Modele liniowe statystyki matematycznej. Warszawa: Państwowe Wydawnictwa Naukowe.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Reckase, M. D. (2009). Multidimensional Item Response Theory. New York: Springer.
Reise, S. P. i Yu, J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27(2), 133–144.
Rutkowski, L., Gonzalez, E., Joncas, M. i Davier, M. von (2010). International large-scale assessment data. Educational Researcher, 39(2), 142–151.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores [Psychometric Monograph No. 17]. Richmond: Psychometric Society.
Stone, C. A. (1992). Recovery of marginal maximum likelihood estimates in the two-parameter logistic response model: an evaluation of MULTILOG. Applied Psychological Measurement, 16(1), 1–16.
Szaleniec, H., Grudniewska, M., Kondratek, B., Kulon, F. i Pokropek, A. (2012). Wyniki egzaminu gimnazjalnego 2002–2010 na wspólnej skali. Edukacja, 119(3), 9–30.
Thissen, D. J. i Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51(4), 567–577.
Wang, S. i Wang, T. (2001). Precision of Warm's weighted likelihood estimates for a polytomous model in Computerized Adaptive Testing. Applied Psychological Measurement, 25(4), 317–331.
Warm, T. A. (1989). Weighted likelihood estimation of ability in the item response theory. Psychometrika, 54(3), 427–450.
Weiss, D. J. (2004). Computerized adaptive testing for effective and efficient measurement in counseling and education. Measurement and Evaluation in Counseling and Development, 37(2), 70–84.
Wilson, M. (2005). Constructing measures: an item response modeling approach. Mahwah: Lawrence Erlbaum Associates.
Woods, C. M. (2008). Consequences of ignoring guessing when estimating the latent density in item response theory. Applied Psychological Measurement, 32(5), 371–384.
Wright, B. D. (1983). Fundamental measurement in social science and education. Research Memorandum No. 33a MESA Psychometric Laboratory. Pobrano z: http://www.rasch.org/memo33a.htm
Wright, B. D. i Stone, M. (1979). Best test design. Chicago: MESA Press.
Wu, M. (2005). The role of plausible values in large-scale surveys. Studies in Educational Evaluation, 31, 114–128.

Notes

http://www.edukacja.ibe.edu.pl/images/numery/2013/4-3-kondratek-pokropek-irt-i-pomiar-edukacyjny.pdf

Identifiers

ISSN

0239-6858

YADDA identifier

bwmeta1.element.desklight-dda05385-4eb2-4cf2-b041-b529ec7d6ca5
