All reports in the Research Library are available upon request. Executive summaries are available below for the latest LSAT Technical Reports and other research published within the last 10 years.
Current Research:
Prelaw advisors at undergraduate institutions serve a vital role in higher education and the legal profession, guiding people through one of the most consequential decisions of their lives: whether to pursue a law degree. Their advisees come to them with questions about where to apply, how to present themselves as applicants, and where to attend. Advisors perform this work under significant institutional constraints, such as a lack of time, funding, or other support.
Test collusion (TC) is the sharing of test materials or answers to test questions (items) before or during a test. Because of the potentially large advantages for the test takers involved, TC poses a serious threat to the validity of score interpretations. The proposed approach applies graph theory methodology to response similarity analyses to identify groups involved in TC while minimizing the false-positive detection rate. The new approach is illustrated and compared with a recently published method using real and simulated data.
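As a rough illustration of how graph methods can be combined with response similarity analysis, the sketch below links any two test takers whose answer strings agree on at least a chosen number of items, then reports the connected components of that graph as candidate groups. The raw match count, the `threshold` parameter, and the function names are assumptions made for this example; the summary does not specify the similarity statistic or the graph procedure the report actually uses.

```python
from itertools import combinations


def matching_count(a, b):
    """Number of items on which two answer strings agree (illustrative statistic)."""
    return sum(x == y for x, y in zip(a, b))


def flag_groups(responses, threshold):
    """Return connected groups of test takers linked by high response similarity.

    responses: dict mapping a test-taker id to an answer string.
    threshold: hypothetical cutoff; pairs agreeing on at least this many
               items are joined by an edge.
    """
    ids = list(responses)
    adjacency = {i: set() for i in ids}
    for u, v in combinations(ids, 2):
        if matching_count(responses[u], responses[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    groups = []
    for start in ids:
        if start in seen or not adjacency[start]:
            continue  # already grouped, or not similar to anyone
        component = set()
        stack = [start]
        while stack:  # depth-first search over the similarity graph
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


responses = {
    "t1": "ABCDA",
    "t2": "ABCDA",
    "t3": "ABCDB",
    "t4": "DCBAC",
    "t5": "CADBD",
}
groups = flag_groups(responses, threshold=4)
print(groups)
```

With this toy data, only t1, t2, and t3 are linked. In practice the cutoff would have to be calibrated to keep the false-positive rate low, which is the concern the summary raises.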
This report addresses a general type of cluster aberrancy in which a subgroup of test takers has an unfair advantage on some subset of administered items. Examples of cluster aberrancy include item preknowledge and test collusion. In general, cluster aberrancy is hard to detect due to the multiple unknowns involved: Unknown subgroups of test takers have an unfair advantage on unknown subsets of items. The issue of multiple unknowns makes the detection of cluster aberrancy a challenging problem from the standpoint of applied mathematics.
Most high-stakes testing programs apply methods to identify unlikely patterns of correct/incorrect responses to test questions. Some examples of why such patterns may occur include misinterpretation of questions, question preknowledge, answer copying, or guessing behavior. This report provides an overview of existing approaches to identifying atypical response patterns that fall into a class of analyses known as nonparametric statistics. Results of a simulation study comparing the different approaches, along with guidelines for applying these indices in practice, are also presented.
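One widely used nonparametric index of this kind is the count of Guttman errors: with items ordered from easiest to hardest, each pair in which the easier item is answered incorrectly and the harder item correctly counts as one error. The sketch below is offered only as an example of the class; the summary does not say which indices the report compares, and the function and variable names are assumptions.

```python
def guttman_errors(scores, p_correct):
    """Count Guttman errors in one test taker's 0/1 item scores.

    scores:    list of 0/1 item scores for one test taker.
    p_correct: proportion of the reference group answering each item
               correctly (higher means easier).
    """
    # Order items from easiest to hardest.
    order = sorted(range(len(scores)), key=lambda i: -p_correct[i])
    errors = 0
    wrong_so_far = 0
    for i in order:
        if scores[i] == 0:
            wrong_so_far += 1
        else:
            # A correct answer on a harder item pairs with every
            # easier item that was answered incorrectly.
            errors += wrong_so_far
    return errors


p_correct = [0.9, 0.8, 0.6, 0.4, 0.2]
typical = guttman_errors([1, 1, 1, 0, 0], p_correct)   # easy right, hard wrong
atypical = guttman_errors([0, 0, 0, 1, 1], p_correct)  # easy wrong, hard right
print(typical, atypical)
```

The first pattern yields 0 errors and the second yields 6, the maximum for three incorrect and two correct responses. Large counts relative to that maximum point to an atypical response pattern.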
Automated methods have been developed for assembling test forms, evaluating a pool of test questions (i.e., items) to determine the number of test form assemblies it can support, and designing an item pool that can most efficiently support the test form assembly process. Automated methods have greatly improved these activities, all of which are essential to the support of every testing program. This report reviews the major approaches that have been applied in the development of these methods.
The problems of item pool analysis and design are the subject of many recent studies. The rationale for this type of research is to increase the usability of existing item pools and to decrease the cost of designing new items. Clearly these are crucial problems for all testing agencies.
Stochastic Programming for Individualized Test Assembly With Mixture Response Time Models (RR 15-01)
Many standardized tests are now administered via computer rather than paper and pencil. The computer-based delivery mode brings with it certain advantages, such as the ability to record not only the test taker’s response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. The analysis of response times (RTs) is still a developing area of research.
A mathematical model called item response theory is often applied to high-stakes tests to estimate test-taker ability level and to determine the characteristics of test questions (i.e., items). Often, these tests contain subsets of items (testlets) grouped around a common stimulus. This grouping often leads to items within one testlet being more strongly correlated with one another than with items from other testlets, which can result in moderate to strong testlet effects.
Text similarity measurement provides a rich source of information and is increasingly being used in the development of new educational and psychological applications. However, due to the high-stakes nature of educational and psychological testing, it is imperative that a text similarity measure be stable (or robust) to avoid uncertainty in the data. The present research was sparked by this requirement. First, multiple sources of uncertainty that may affect the computation of semantic similarity between two texts are enumerated.
This report presents a new algorithm for detecting groups of test takers (aberrant groups) who had access to subsets of test questions (aberrant subsets) prior to an exam. This method is in line with the development of statistical methods for detecting test collusion, a new research direction in test security. Test collusion may be described as the large-scale sharing of test materials, including answers to test questions. The algorithm employs several new statistics to perform a sequence of statistical tests to identify aberrant groups.
Many standardized tests are now administered via computer rather than paper-and-pencil format. In a computer-based testing environment, it is possible to record not only the test taker’s response to each question (item), but also the amount of time spent by the test taker in considering and answering each item. Response times (RTs) provide information not only about the test taker’s ability and response behavior but also about item and test characteristics. The current study focuses on the use of RTs to detect aberrant test-taker responses.
In a large-scale, high-stakes testing program such as the Law School Admission Test (LSAT), it is necessary to maintain a large bank of test items to support the demand for a new test form at nearly every administration. To ensure that the item bank can support the test assembly requirements, ongoing monitoring of the quality of the item bank is necessary to identify deficiencies and direct item development efforts.
In standardized testing, test takers may change their answer choices for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. Research on answer-changing behavior has recently branched off in several directions, including modeling of ACs and addressing scanning errors.
While an admission test may strongly predict success in university or law school programs for most test takers, there may be some test takers who are mismeasured. To address this issue, a class of statistics called person-fit statistics is used to check the validity of individual test scores. However, most person-fit statistics are designed for a single test, and not much is known about the performance of these statistics for admission tests consisting of multiple highly correlated subtests.
Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing, or CAT. A second advantage is the ability to record not only the test taker’s response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item.
In standardized multiple-choice testing, test takers often change their answers for various reasons. The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at some testing organizations. This report presents two new approaches to analyzing ACs at the individual test-taker level. The information about all previous answers is used only to partition the data into two disjoint subsets: responses where an AC occurred and responses where an AC did not occur.
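The partitioning step described above can be sketched as follows. The record layout (an `answers` list of successive selections and a `key` field) and the comparison of proportion correct across the two subsets are assumptions made for illustration; the summary does not describe the statistics the two new approaches compute on the subsets.

```python
def partition_by_answer_change(records):
    """Split one test taker's responses into AC and non-AC subsets.

    Each record holds the successive selections for an item in "answers"
    (the last entry is the final answer). The answer history is used only
    to decide which subset the response belongs to.
    """
    changed = [r for r in records if len(r["answers"]) > 1]
    unchanged = [r for r in records if len(r["answers"]) == 1]
    return changed, unchanged


def proportion_correct(records):
    """Share of final answers matching the key; None for an empty subset."""
    if not records:
        return None
    correct = sum(r["answers"][-1] == r["key"] for r in records)
    return correct / len(records)


# Hypothetical responses for a single test taker.
records = [
    {"item": 1, "answers": ["A", "C"], "key": "C"},
    {"item": 2, "answers": ["B"], "key": "B"},
    {"item": 3, "answers": ["D", "A", "B"], "key": "B"},
    {"item": 4, "answers": ["C"], "key": "A"},
    {"item": 5, "answers": ["E"], "key": "E"},
]
changed, unchanged = partition_by_answer_change(records)
print(proportion_correct(changed), proportion_correct(unchanged))
```

The two subsets are disjoint and together cover every response, so any statistic computed on one can be compared directly with the same statistic on the other.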
When a test taker has prior knowledge about an administered test question (item), then this event is called item preknowledge, the test taker is called aberrant, and the item is called compromised. Item preknowledge negatively affects the corresponding testing program and its test score users (universities, companies, government organizations) because the scores produced for aberrant test takers will be invalid. The performance of eight statistics for detection of item preknowledge (five existing, two modified, and one new) was studied via computer simulations.