

Research Library

All reports in the Research Library are available upon request. Executive summaries are available below for the latest LSAT Technical Reports and other research published within the last 10 years.

Looking for older reports? Consult the Research Archive

Current Research:

Prelaw advisors at undergraduate institutions serve a vital role in higher education and the legal profession, guiding people through one of the most consequential decisions of their lives: whether to pursue a law degree. Their advisees come to them with questions about where to apply, how to present themselves as applicants, and where to attend. Advisors perform this work under significant institutional constraints, such as a lack of time, funding, or other support.

Standard item response theory (IRT) models have been extended with testlet effects to account for the nesting of items; these are well known as (Bayesian) testlet models or random effect models for testlets. The testlet modeling framework has several disadvantages. A sufficient number of testlet items are needed to estimate testlet effects, and a sufficient number of individuals are needed to estimate testlet variance. The prior for the testlet variance parameter can only represent a positive association among testlet items.

Bayesian covariance structure modeling (BCSM) offers a flexible approach to modeling complex interdependences that arise when gathering test-taker data through computerized testing. In addition to the scored responses, process data such as response times or action patterns are obtained. Data from different sources may be cross-correlated; furthermore, within each data source, blocks of correlated observations may form testlet structures. In previous reports, BCSM was limited to the assumption that all test takers are part of the same group.

The aim of this study was twofold: First, we investigated whether scores on an admission test administered in proctored and unproctored environments led to similar predictions of future academic success. Second, we explored how Bayesian modeling can be of help in interpreting admission-testing data. Results showed that the two modes of administering an admission test did not require the use of different models for predicting academic success, and that Bayesian modeling provides a very useful and easy-to-interpret framework for predicting future academic success.

With computerized testing, it is possible to record not only the responses of test takers to test questions but also other details about the test taker’s activity, such as the amount of time spent responding to each question. These details comprise a new type of data called process data. This report proposes a new approach to modeling responses, response times, and other process data: Test-taker data that naturally belong together are grouped in a cross-classification structure. Five examples of models applying this approach are illustrated.

A new statistical model is proposed to study the effects of various testing conditions on a population of test takers. This flexible model allows for numerous effects to be considered simultaneously. A Bayesian approach is employed, taking prior information into consideration. An empirical example demonstrates the utility of the suggested model to test the influence of item presentation formats on the performance of test takers. This research could be of practical value in a potential transition of the Law School Admission Test (LSAT) from a paper-and-pencil format to a digital mode.

Automated methods have been developed for assembling test forms, evaluating a pool of test questions (i.e., items) to determine the number of test form assemblies it can support, and designing an item pool that can most efficiently support the test form assembly process. Automated methods have greatly facilitated and improved such activities, all of which are essential to the support of every testing program. This report reviews the major approaches that have been applied in the development of these methods.

The problems of item pool analysis and design are the subject of many recent studies. The rationale for this type of research is to increase the usability of existing item pools and to decrease the cost of designing new items. Clearly these are crucial problems for all testing agencies.

Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages, one of which is the ability to record not only the test taker’s response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. Research on how to represent and utilize response time data has proliferated, but most of the research is based on the assumption of constant working speed in relation to a certain accuracy level.

Test theory typically deals with categorical responses to test questions (items), for instance, correct/incorrect responses or responses that represent a choice from a finite number of alternatives. Whenever technically possible, it is attractive to collect information on continuous response variables that accompany these responses as a covariate. One obvious example is response time; other examples are information on cursor movement in computer-based testing, eye-tracking information, or physiological information.

Many standardized tests are now administered via computer rather than paper and pencil. The computer-based delivery mode brings with it certain advantages, such as the ability to record not only the test taker’s response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. The analysis of response times (RTs) is still a developing area of research.

In high-stakes testing, it is important to verify the validity of individual test scores. Although a test, in general, results in valid test scores for most test takers, there may be individual test takers with unusual answer patterns for whom test score validity is questionable. Examples of such aberrance are a test taker who guesses on a large number of questions and one who has preknowledge of the answers to some questions. An effective statistical technique (developed for a single test) was extended to tests that consist of multiple subtests, such as the Law School Admission Test.

Several statistics used to detect inconsistent patterns of correct/incorrect answers to test questions (items) were evaluated based on data from one Analytical Reasoning (AR) and one Logical Reasoning (LR) section of the Law School Admission Test. Item score patterns were also evaluated based on gender and racial/ethnic subgroups. We showed that test takers who were consistently flagged by all statistics evaluated and for both the AR and the LR sections had relatively low scores, which may have been the result of extensive guessing.

With computerized testing, it is possible to record both the responses of test takers to test questions (i.e., items) and the amount of time spent by a test taker in responding to each question. Various models have been proposed that take into account both test-taker ability and working speed, with many models assuming a constant working speed throughout the test. The constant working speed assumption may be inappropriate for various reasons.
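
As an illustration of what a constant working speed assumption looks like, the sketch below uses the lognormal response time model, one widely used formulation and not necessarily the one examined in this report. The names `beta` (item time intensity), `tau` (test-taker speed), and `alpha` (item time discrimination) are assumptions introduced here; `tau` is a single number per test taker, which is exactly the constant-speed assumption.

```python
import math
import random


def sample_response_time(beta, tau, alpha, rng):
    """Draw one response time (illustrative lognormal model, an assumption).

    log(T) is normal with mean beta - tau and standard deviation 1 / alpha.
    tau does not depend on the item, so speed is constant across the test.
    """
    return math.exp(rng.gauss(beta - tau, 1.0 / alpha))


def expected_response_time(beta, tau, alpha):
    """Mean of the lognormal distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(beta - tau + 0.5 / alpha ** 2)


# Hypothetical item time intensities (log-seconds) and one fixed speed.
rng = random.Random(0)
item_betas = [4.0, 4.2, 3.8]
times = [sample_response_time(b, tau=0.3, alpha=2.0, rng=rng) for b in item_betas]
```

A model that relaxes the assumption would let `tau` vary over the course of the test, for example by item position.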

A mathematical model called item response theory is often applied to high-stakes tests to estimate test-taker ability level and to determine the characteristics of test questions (i.e., items). Often, these tests contain subsets of items (testlets) grouped around a common stimulus. This grouping often leads to items within one testlet being more strongly correlated with each other than with items from other testlets, which can result in moderate to strong testlet effects.
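
A minimal sketch of how a testlet effect can enter an item response model, assuming a two-parameter logistic form with a person-specific testlet term subtracted from ability. This is one common parameterization, not necessarily the one used in the report, and the names `theta`, `a`, `b`, and `gamma` are assumptions introduced here.

```python
import math


def p_correct(theta, a, b, gamma=0.0):
    """Probability of a correct response (illustrative testlet model).

    theta: test-taker ability
    a, b:  item discrimination and difficulty
    gamma: the test taker's random effect for the testlet containing the item;
           gamma = 0 reduces to the ordinary two-parameter logistic model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))


# Items in the same testlet share one gamma for a given test taker,
# which is what makes their responses correlated beyond ability alone.
same_testlet = [p_correct(0.0, 1.0, b, gamma=0.5) for b in (-0.5, 0.0, 0.5)]
```

Because `gamma` is shared by every item in a testlet, a larger variance of `gamma` across test takers corresponds to a stronger testlet effect.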

Text similarity measurement provides a rich source of information and is increasingly being used in the development of new educational and psychological applications. However, due to the high-stakes nature of educational and psychological testing, it is imperative that a text similarity measure be stable (or robust) to avoid uncertainty in the data. The present research was sparked by this requirement. First, multiple sources of uncertainty that may affect the computation of semantic similarity between two texts are enumerated.

In a large-scale, high-stakes testing program such as the Law School Admission Test (LSAT), it is necessary to maintain a large bank of test items to support the demand for a new test form at nearly every administration. To assure that the item bank can support the test assembly requirements, ongoing monitoring of the quality of the item bank is necessary to identify deficiencies and direct item development efforts.

Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing, or CAT. A second advantage is the ability to record not only the test taker’s response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item.

Among the assumptions that should be met when applying an item response theory (IRT) model to the analysis of test data is measurement invariance. Measurement invariance requires that, after controlling for a test taker’s proficiency, group membership have no effect on the probability that that test taker will answer a test question correctly. Groups may be defined on the basis of many factors, including gender, race/ethnicity, and citizenship.
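
The invariance requirement described above can be written compactly. The notation is an assumption introduced here: $X_{ij}$ is the scored response of test taker $j$ to item $i$, $\theta_j$ is that test taker's proficiency, and $G_j$ is group membership.

```latex
P(X_{ij} = 1 \mid \theta_j, G_j = g) = P(X_{ij} = 1 \mid \theta_j)
\quad \text{for every group } g .
```

An item for which this equality fails for some group is said to show differential item functioning.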

The statistical theory of estimating and testing item response theory (IRT) models for items (questions) with discrete (correct or incorrect) responses has been thoroughly developed (recall that IRT is a mathematical model that is typically used to analyze test data). In contrast, the theory for IRT models for items with continuous responses has hardly received any attention. This omission is mainly due to the fact that, so far, the continuous response format has hardly been used by the testing industry.