At this point, we have a bit over 30,000 individual questions in the SF database.

Our goal is to not only expand our question database, but also improve the questions and their effectiveness. We can do this by analyzing questions, then building on the best ones and removing/replacing the worst ones.

Thus, we need to be able to compare and critique the questions themselves. Because so many programs use the same question database, we have a rich source of data for analysis.

Question analysis will help us recognize questions that are effective (or not) in predicting a student's overall performance. This is done using a combination of statistical methods and item analysis techniques.

**1. Difficulty Index (P-Value):**

**Definition:** Measures the proportion of students who answered the question correctly.

**Calculation:** Divide the number of students who answered correctly by the total number of students.

**Interpretation:** A high value indicates an easy question; a low value indicates a difficult one. Values very close to 1.0 or 0 suggest the question may be too easy or too hard, respectively.

**Ranking:** In general, the best questions have a P-value in the 0.4 to 0.6 range. For ranking, we therefore use a standardized P-value, where the highest-quality questions score close to 1.

Standardized P-Value = 1 – |2P – 1|
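The two formulas above can be sketched in a few lines of Python; the response list here is hypothetical data, assuming each response is recorded as 1 (correct) or 0 (incorrect):

```python
def p_value(responses):
    """Difficulty index: proportion of students who answered correctly."""
    return sum(responses) / len(responses)

def standardized_p(p):
    """Maps P into [0, 1]; peaks at 1.0 when P = 0.5."""
    return 1 - abs(2 * p - 1)

responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical 1/0 responses
p = p_value(responses)                      # 0.7
print(standardized_p(p))                    # close to 0.6
```

Note that the standardized value treats questions slightly easier and slightly harder than P = 0.5 symmetrically, which matches the 0.4 to 0.6 "best question" band.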

**2. Discrimination Index (D-Value):**

**Definition:** Measures how well a question differentiates between high-performing and low-performing students.

**Calculation:** Compare performance on a specific question between the top 27% of scorers (high group) and the bottom 27% (low group). The formula is

D-Value = 2 × (# correct responses in top 27% − # correct responses in bottom 27%) / (total number of students)

**Interpretation:** A positive value indicates good discrimination, with higher values indicating better discrimination. A negative value, or one close to zero, suggests the question does not discriminate well between high and low performers.

**Ranking:** In general, the best questions have a D-value closest to +1. We therefore normalize the D-values and rank using a transformed D-value between 0 and 1, where the highest-quality questions are closest to 1.
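A minimal sketch of this calculation, using the formula above (dividing by the total number of students) and a simple linear rescale from [−1, 1] to [0, 1] for the normalization; the function and variable names are illustrative, not from our codebase:

```python
def d_value(question_correct, total_scores):
    """Discrimination index per the formula above.
    question_correct[i] is 1/0 for student i on this question;
    total_scores[i] is that student's overall test score."""
    n = len(total_scores)
    k = max(1, round(0.27 * n))  # size of each 27% group
    # Rank students by total score, descending.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    top = sum(question_correct[i] for i in order[:k])      # high group
    bottom = sum(question_correct[i] for i in order[-k:])  # low group
    return 2 * (top - bottom) / n

def normalized_d(d):
    """Linear rescale from [-1, 1] to [0, 1] for ranking."""
    return (d + 1) / 2

scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # hypothetical total scores
correct = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # only top scorers got it right
print(d_value(correct, scores))            # strongly positive
```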

**3. Point-Biserial Correlation Coefficient:**

**Definition:** Measures the correlation between the score on a particular question and the total score on the test.

**Calculation:** Use statistical software to calculate the point-biserial correlation coefficient between a dichotomous variable (correct/incorrect on the question) and a continuous variable (total test score).

**Interpretation:** A higher positive coefficient indicates that students who got the question right tend to have higher overall scores, suggesting the question is a good indicator of overall success.

**Ranking:** In general, the best questions have a point-biserial correlation coefficient closest to +1. We can normalize these (just like the D-values), then rank using a transformed coefficient between 0 and 1, where the highest-quality questions are closest to 1.
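For reference, the point-biserial coefficient is just the Pearson correlation with a 0/1 variable, so it can be computed directly from the definition without a statistics package. A sketch with illustrative names:

```python
def point_biserial(question_correct, total_scores):
    """Pearson correlation between a 0/1 question variable and total score."""
    n = len(total_scores)
    mean_x = sum(question_correct) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(question_correct, total_scores)) / n
    sd_x = (sum((x - mean_x) ** 2 for x in question_correct) / n) ** 0.5
    sd_y = (sum((y - mean_y) ** 2 for y in total_scores) / n) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical data: the two top scorers got the question right.
print(point_biserial([1, 1, 0, 0], [10, 9, 2, 1]))  # strongly positive
```

In practice we would use library routines (e.g. `scipy.stats.pointbiserialr`) rather than hand-rolling this, but the arithmetic above shows what is being measured.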

**Ranking Scores:**

Score = 0.30(Standardized P-Value) + 0.40(Normalized D-Value) + 0.30(Normalized P-B Correlation)

Questions are ranked from highest (best) to lowest (worst), based on this score.
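The weighted score and ranking can be sketched as follows, assuming the three inputs have already been transformed into [0, 1] as described above; the question IDs and values are hypothetical:

```python
def ranking_score(std_p, norm_d, norm_pb):
    """Composite quality score with the weights from the formula above."""
    return 0.30 * std_p + 0.40 * norm_d + 0.30 * norm_pb

# Hypothetical per-question metrics: (standardized P, normalized D, normalized P-B)
questions = {
    "Q1": ranking_score(0.6, 0.8, 0.9),
    "Q2": ranking_score(0.9, 0.3, 0.4),
}
ranked = sorted(questions, key=questions.get, reverse=True)
print(ranked)  # best question first
```

The heavier 0.40 weight on the discrimination index reflects that separating strong from weak students is what we most want a question to do.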

**Test Analysis Samples:**

We're now in a position to start learning from the data. Next year's data will be even more telling: the merging of questions is essentially complete, and more students will be writing tests built from these questions, so the data will be both larger and cleaner. The bottom line is that we'll be able to analyze questions more accurately starting next year.

| Exam | Sample Results |
| --- | --- |
| Physics 11 - Unit 1 Test | PH11-U1.pdf |
| English 12 - Final Exam | ENST12-FIN.pdf |
| PC Math 11 - Unit 1 Test | PCM11-U1.pdf |