LSAC

Research Library

All reports in LSAC’s Research Library are available upon request. Executive summaries are available below for the latest LSAT Technical Reports and other research published within the last 10 years.

Looking for older reports? Consult the Research Archive

Current Research:

In a large-scale, high-stakes testing program such as the Law School Admission Test (LSAT), it is necessary to maintain a large bank of test items to support the demand for a new test form at nearly every administration. To ensure that the item bank can support the test assembly requirements, ongoing monitoring of the quality of the item bank is necessary to identify deficiencies and direct item development efforts.

In standardized testing, test takers may change their answer choices for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. Research on answer-changing behavior has recently branched off in several directions, including modeling of ACs and addressing scanning errors.

While an admission test may strongly predict success in university or law school programs for most test takers, there may be some test takers who are mismeasured. To address this issue, a class of statistics called person-fit statistics is used to check the validity of individual test scores. However, most person-fit statistics are designed for a single test, and not much is known about the performance of these statistics for admission tests consisting of multiple highly correlated subtests.

Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing, or CAT. A second advantage is the ability to record not only the test taker’s response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item.

In standardized multiple-choice testing, test takers often change their answers for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. This report presents two new approaches to analyzing ACs at the individual test-taker level. The information about all previous answers is used only to partition the data into two disjoint subsets: responses where an AC occurred and responses where an AC did not occur.

Among the assumptions that should be met when applying an item response theory (IRT) model to the analysis of test data is measurement invariance. Measurement invariance requires that, after controlling for a test taker’s proficiency, group membership have no effect on the probability that that test taker will answer a test question correctly. Groups may be defined on the basis of many factors, including gender, race/ethnicity, and citizenship.
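The invariance condition described above can be written compactly. The notation here is an assumption introduced for illustration and is not taken from the report: $X_i$ is the scored response to item $i$ (1 = correct), $\theta$ is the test taker's proficiency, and $G$ is group membership.

```latex
% Measurement invariance for item i: once proficiency is controlled for,
% group membership carries no further information about a correct response.
P(X_i = 1 \mid \theta, G = g) = P(X_i = 1 \mid \theta)
\quad \text{for every group } g \text{ and every proficiency level } \theta .
```

An item for which this equality fails for some group is usually said to show differential item functioning.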

The statistical theory of estimating and testing item response theory (IRT) models for items (questions) with discrete (correct or incorrect) responses has been thoroughly developed (recall that IRT is a mathematical model that is typically used to analyze test data). In contrast, the theory for IRT models for items with continuous responses has hardly received any attention. This omission is mainly due to the fact that, so far, the continuous response format has hardly been used by the testing industry.

In this report we present a measure to identify unlikely patterns of correct/incorrect answers to test questions (commonly referred to as items). Such patterns may arise from misinterpretation of questions, item preknowledge, answer copying, or guessing. The proposed measure is the probability of exceedance (PE). PE provides information about the probability of a correct/incorrect answer pattern, conditional on the test taker’s total score. Although this concept is not new, it is rarely, if ever, applied in practice.
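As a rough illustration of the idea, the sketch below computes a PE value for a short test. It rests on two assumptions not stated in the summary. First, PE is taken to be the summed conditional probability of all answer patterns with the same total score that are no more likely than the observed one. Second, responses are assumed to follow a Rasch model, under which that conditional probability depends only on the item difficulties. The function names and difficulty values are hypothetical, and the brute-force enumeration is only practical for a small number of items.

```python
from itertools import combinations
from math import exp


def rasch_pattern_weight(pattern, difficulties):
    """Unnormalized P(pattern | total score) under an assumed Rasch model.

    Given the total score, the pattern probability is proportional to
    exp(-sum of the difficulties of the correctly answered items).
    """
    return exp(-sum(b for x, b in zip(pattern, difficulties) if x == 1))


def probability_of_exceedance(pattern, difficulties):
    """Sum the conditional probabilities of all same-score patterns that are
    at most as likely as the observed pattern (assumed definition of PE)."""
    n_items = len(pattern)
    score = sum(pattern)
    observed = rasch_pattern_weight(pattern, difficulties)

    total = 0.0              # normalizing constant over all same-score patterns
    at_most_as_likely = 0.0  # mass of patterns no more likely than the observed one
    for correct_items in combinations(range(n_items), score):
        candidate = [1 if i in correct_items else 0 for i in range(n_items)]
        weight = rasch_pattern_weight(candidate, difficulties)
        total += weight
        # Small relative tolerance so the observed pattern always counts itself.
        if weight <= observed * (1 + 1e-12):
            at_most_as_likely += weight
    return at_most_as_likely / total


# Hypothetical difficulties: item 0 is easiest, item 2 is hardest.
difficulties = [-1.0, 0.0, 1.0]
pe_expected = probability_of_exceedance([1, 0, 0], difficulties)    # only the easiest item correct
pe_unexpected = probability_of_exceedance([0, 0, 1], difficulties)  # only the hardest item correct
```

Under these assumptions, a small PE flags a pattern that is unusual for test takers with that total score, such as answering only the hardest item correctly.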

When a test taker has prior knowledge about an administered test question (item), then this event is called item preknowledge, the test taker is called aberrant, and the item is called compromised. Item preknowledge negatively affects the corresponding testing program and its test score users (universities, companies, government organizations) because the scores produced for aberrant test takers will be invalid. The performance of eight statistics for detection of item preknowledge (five existing, two modified, and one new) was studied via computer simulations.

Item response theory (IRT) is a mathematical model that is often applied in the development and analysis of educational and psychological assessments. Various IRT models exist, and practitioners must choose the model that is most appropriate for their particular assessment. Even when the most appropriate model is applied, the fit of the assessment data to the model is rarely perfect in practice. How serious, then, is model misfit for practical decision-making?

This study was conducted to evaluate the predictive validity of each of the current Law School Admission Test (LSAT) item types as well as the interrelationships among them. The current LSAT consists of three item types: Analytical Reasoning (AR), Logical Reasoning (LR), and Reading Comprehension (RC). Even though the correlation of overall LSAT scaled score with first-year average (FYA) in law school is examined on a regular basis at the Law School Admission Council (LSAC), the separate correlations for each of these three item types have only rarely been studied.

In the analysis of data for the Law School Admission Test (LSAT) and other similar standardized tests, a mathematical model called item response theory (IRT) is commonly used to estimate both the characteristics of the test questions (items) and the ability level of the test takers. Such analyses are based on the test takers’ correct and incorrect responses to the test items.
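To make the roles of item characteristics and test-taker ability concrete, here is a minimal sketch of one widely used IRT model, the three-parameter logistic (3PL) model. The summary does not say which IRT model the report uses, so the choice of the 3PL and the parameter names (`a` for discrimination, `b` for difficulty, `c` for the pseudo-guessing level) are assumptions made for illustration.

```python
from math import exp


def prob_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: test-taker ability
    a: item discrimination (how sharply the item separates abilities)
    b: item difficulty (ability level at the curve's midpoint)
    c: lower asymptote (chance of a correct answer at very low ability)
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty,
# five answer options (so c is set near 1/5).
low = prob_correct_3pl(theta=-2.0, a=1.0, b=0.0, c=0.2)
high = prob_correct_3pl(theta=2.0, a=1.0, b=0.0, c=0.2)
```

Estimation works in the opposite direction: given many test takers' correct and incorrect responses, the item parameters and abilities are chosen so that probabilities of this form match the observed data as closely as possible.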