All reports in the Research Library are available upon request. Executive summaries are available below for the latest LSAT Technical Reports and other research published within the last 10 years.
Current Research:
Standard item response theory (IRT) models have been extended with testlet effects to account for the nesting of items; these are well known as (Bayesian) testlet models or random effect models for testlets. The testlet modeling framework has several disadvantages. A sufficient number of testlet items are needed to estimate testlet effects, and a sufficient number of individuals are needed to estimate testlet variance. The prior for the testlet variance parameter can only represent a positive association among testlet items.
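To make the last limitation concrete, here is a sketch of one common way a testlet model is written. The notation is an assumption chosen for illustration and is not taken from the report: \theta_j is the ability of test taker j, a_i and b_i are item parameters, d(i) is the testlet containing item i, and \gamma_{j d(i)} is the testlet effect.

```latex
% Illustrative notation only; symbols are assumptions, not the report's own.
\begin{align}
P\bigl(Y_{ij} = 1 \mid \theta_j, \gamma_{j d(i)}\bigr)
  &= \Phi\bigl(a_i \theta_j - b_i + \gamma_{j d(i)}\bigr), \\
\gamma_{j d} &\sim N\bigl(0, \sigma^2_{\gamma_d}\bigr),
  \qquad \sigma^2_{\gamma_d} \ge 0 .
\end{align}
```

In this formulation, two items in the same testlet share the effect \gamma_{jd}, so on the latent (probit) scale their covariance given \theta_j equals \sigma^2_{\gamma_d}. Because a variance cannot be negative, the implied association among items in a testlet can only be zero or positive.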
Bayesian covariance structure modeling (BCSM) offers a flexible approach to modeling complex interdependences that arise when gathering test-taker data through computerized testing. In addition to the scored responses, process data such as response times or action patterns are obtained. Data from different sources may be cross-correlated; furthermore, within each data source, blocks of correlated observations may form testlet structures. In previous reports, BCSM was limited to the assumption that all test takers are part of the same group.
The aim of this study was twofold: First, we investigated whether scores on an admission test administered in proctored and unproctored environments led to similar predictions of future academic success. Second, we explored how Bayesian modeling can be of help in interpreting admission-testing data. Results showed that the two modes of administering an admission test did not require the use of different models for predicting academic success, and that Bayesian modeling provides a very useful and easy-to-interpret framework for predicting future academic success.
Test collusion (TC) is the sharing of test materials or answers to test questions (items) before or during a test. Because of the potentially large advantages for the test takers involved, TC poses a serious threat to the validity of score interpretations. The proposed approach applies graph theory methodology to response similarity analyses to identify groups involved in TC while minimizing the false-positive detection rate. The new approach is illustrated and compared with a recently published method using real and simulated data.
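The general idea of combining response similarity with graph theory can be sketched as follows. This is a minimal illustration and not the report's method: the similarity measure (share of identical answers), the threshold of 0.9, and the function names `similarity` and `flag_groups` are all assumptions made for the example. Test takers are treated as nodes, unusually similar pairs are joined by an edge, and connected components become candidate groups.

```python
from itertools import combinations


def similarity(a, b):
    """Fraction of items on which two answer strings agree (illustrative measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def flag_groups(responses, threshold=0.9, min_size=2):
    """Return groups of test takers linked by high pairwise similarity.

    The threshold is a hypothetical value; an operational analysis would
    calibrate it to control the false-positive rate.
    """
    # Build an undirected graph: an edge joins each highly similar pair.
    neighbors = {name: set() for name in responses}
    for u, v in combinations(responses, 2):
        if similarity(responses[u], responses[v]) >= threshold:
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Extract connected components with a depth-first search.
    seen, groups = set(), []
    for start in responses:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        if len(component) >= min_size:
            groups.append(sorted(component))
    return groups


# Made-up answer strings for five test takers on a ten-item test.
responses = {
    "A": "ABCDABCDAB",
    "B": "ABCDABCDAB",
    "C": "ABCDABCDAC",
    "D": "DCBADCBADC",
    "E": "BADCBADCBA",
}
groups = flag_groups(responses)
print(groups)
```

In this toy data set, test takers A, B, and C end up in one component, while D and E are not linked to anyone and are therefore not reported.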
With computerized testing, it is possible to record not only the responses of test takers to test questions but also other details about the test taker’s activity, such as the amount of time spent responding to each question. These details comprise a new type of data called process data. This report proposes a new approach to modeling responses, response times, and other process data: Test-taker data that naturally belong together are grouped in a cross-classification structure. Five examples of models applying this approach are illustrated.
A new statistical model is proposed to study the effects of various testing conditions on a population of test takers. This flexible model allows for numerous effects to be considered simultaneously. A Bayesian approach is employed, taking prior information into consideration. An empirical example demonstrates the utility of the suggested model to test the influence of item presentation formats on the performance of test takers. This research could be of practical value in a potential transition of the Law School Admission Test (LSAT) from a paper-and-pencil format to a digital mode.
This report addresses a general type of cluster aberrancy in which a subgroup of test takers has an unfair advantage on some subset of administered items. Examples of cluster aberrancy include item preknowledge and test collusion. In general, cluster aberrancy is hard to detect due to the multiple unknowns involved: Unknown subgroups of test takers have an unfair advantage on unknown subsets of items. The issue of multiple unknowns makes the detection of cluster aberrancy a challenging problem from the standpoint of applied mathematics.
Most high-stakes testing programs apply methods to identify unlikely patterns of correct/incorrect responses to test questions. Some examples of why such patterns may occur include misinterpretation of questions, question preknowledge, answer copying, or guessing behavior. This report provides an overview of existing approaches to identifying atypical response patterns that fall into a class of analyses known as nonparametric statistics. Results of a simulation study comparing the different approaches, along with guidelines for applying these indices in practice, are also presented.
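As a small illustration of what a nonparametric index of this kind can look like, the sketch below counts Guttman errors, a widely used statistic in this class. The summary does not list which indices the report compares, so this choice, the function name `guttman_errors`, and the example data are assumptions. A Guttman error is an item pair in which the test taker answers the harder item correctly but the easier item incorrectly.

```python
def guttman_errors(scores, proportions_correct):
    """Count item pairs where a harder item is correct and an easier one is wrong.

    scores: list of 0/1 item scores for one test taker.
    proportions_correct: proportion of the group answering each item correctly,
    used here as the (assumed) measure of item easiness.
    """
    # Order the item scores from the easiest item to the hardest.
    order = sorted(range(len(scores)), key=lambda i: -proportions_correct[i])
    ordered = [scores[i] for i in order]

    errors = 0
    wrong_so_far = 0
    for score in ordered:
        if score == 0:
            wrong_so_far += 1
        else:
            # Each easier item already missed forms one error with this item.
            errors += wrong_so_far
    return errors


# Made-up easiness values for five items, easiest first.
p = [0.9, 0.8, 0.6, 0.4, 0.2]
consistent = guttman_errors([1, 1, 1, 0, 0], p)
aberrant = guttman_errors([0, 0, 0, 1, 1], p)
print(consistent, aberrant)
```

The first pattern (easy items right, hard items wrong) produces no errors, whereas the reversed pattern produces the maximum count for a score of two on five items.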
Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages, one of which is the ability to record not only the test taker’s response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. Research on how to represent and utilize response time data has proliferated, but most of the research is based on the assumption of constant working speed in relation to a certain accuracy level.
Test theory typically deals with categorical responses to test questions (items), for instance, correct/incorrect responses or responses that represent a choice from a finite number of alternatives. Whenever technically possible, it is attractive to collect information on continuous response variables that accompany these responses as a covariate. One obvious example is response time; other examples are information on cursor movement in computer-based testing, eye-tracking information, or physiological information.
In high-stakes testing, it is important to verify the validity of individual test scores. Although a test, in general, results in valid test scores for most test takers, there may be individual test takers with unusual answer patterns for whom test score validity is questionable. One example of such aberrance is a test taker who guesses on a large number of questions or one who has preknowledge of the answers to some questions. An effective statistical technique (developed for a single test) was extended to tests that consist of multiple subtests, such as the Law School Admission Test.
Several statistics used to detect inconsistent patterns of correct/incorrect answers to test questions (items) were evaluated based on data from one Analytical Reasoning (AR) and one Logical Reasoning (LR) section of the Law School Admission Test. Item score patterns were also evaluated based on gender and racial/ethnic subgroups. We showed that test takers who were consistently flagged by all statistics evaluated and for both the AR and the LR sections had relatively low scores, which may have been the result of extensive guessing.
With computerized testing, it is possible to record both the responses of test takers to test questions (i.e., items) and the amount of time spent by a test taker in responding to each question. Various models have been proposed that take into account both test-taker ability and working speed, with many models assuming a constant working speed throughout the test. The constant working speed assumption may be inappropriate for various reasons.
This report presents a new algorithm for detecting groups of test takers (aberrant groups) who had access to subsets of test questions (aberrant subsets) prior to an exam. This method is in line with the development of statistical methods for detecting test collusion, a new research direction in test security. Test collusion may be described as the large-scale sharing of test materials, including answers to test questions. The algorithm employs several new statistics to perform a sequence of statistical tests to identify aberrant groups.
Many standardized tests are now administered via computer rather than paper-and-pencil format. In a computer-based testing environment, it is possible to record not only the test taker’s response to each question (item), but also the amount of time spent by the test taker in considering and answering each item. Response times (RTs) provide information not only about the test taker’s ability and response behavior but also about item and test characteristics. The current study focuses on the use of RTs to detect aberrant test-taker responses.
In standardized testing, test takers may change their answer choices for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. Research on answer-changing behavior has recently branched off in several directions, including modeling of ACs and addressing scanning errors.
While an admission test may strongly predict success in university or law school programs for most test takers, there may be some test takers who are mismeasured. To address this issue, a class of statistics called person-fit statistics is used to check the validity of individual test scores. However, most person-fit statistics are designed for a single test, and not much is known about the performance of these statistics for admission tests consisting of multiple highly correlated subtests.
In standardized multiple-choice testing, test takers often change their answers for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. This report presents two new approaches to analyzing ACs at the individual test-taker level. The information about all previous answers is used only to partition the data into two disjoint subsets: responses where an AC occurred and responses where an AC did not occur.
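The partitioning step described above can be sketched as follows. The record layout (the `item`, `initial`, and `final` fields) and the function name `partition_by_answer_change` are hypothetical, and the sketch covers only the split into two disjoint subsets, not the statistical analyses the report builds on it.

```python
def partition_by_answer_change(records):
    """Split one test taker's responses into changed and unchanged subsets.

    Each record is a dict with hypothetical keys: 'item' (item identifier),
    'initial' (first recorded answer), and 'final' (answer that was scored).
    """
    changed = [r for r in records if r["initial"] != r["final"]]
    unchanged = [r for r in records if r["initial"] == r["final"]]
    return changed, unchanged


# Made-up responses for a four-item test.
records = [
    {"item": 1, "initial": "A", "final": "A"},
    {"item": 2, "initial": "B", "final": "C"},
    {"item": 3, "initial": "D", "final": "D"},
    {"item": 4, "initial": "A", "final": "B"},
]
changed, unchanged = partition_by_answer_change(records)
print([r["item"] for r in changed], [r["item"] for r in unchanged])
```

Every response lands in exactly one subset, so the earlier answers serve only to decide membership, which matches the role the summary gives them.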