Overview
The OnTarget Question Analysis Report is a comprehensive statistical review system that evaluates test questions after administration to ensure each item meets minimum quality control criteria. This post-administration analysis provides educators with detailed insights into question performance, difficulty, and validity.
Analysis Methods
The system offers three distinct approaches for analyzing questions:
1. Quality-Based Filtering
- All Questions: Comprehensive review of entire test
- Quality-Specific: Focus on poor-performing questions only for targeted improvement
2. Sorting Criteria
- Question Number: Sequential analysis by item position
- Question Difficulty: Organized by statistical difficulty measures
- Question Quality: Ranked by discriminatory power
3. Display Order
- Ascending Order: Lowest to highest values
- Descending Order: Highest to lowest values
Key Statistical Metrics
Question Difficulty (P-Value)
Range: 0.00 to 1.00
Formula: P-Value = (Number of students answering correctly) / (Total number of students)
Calculation Example:
- Total students: 100
- Students answering correctly: 93
- P-Value = 93/100 = 0.93
Interpretation Guidelines:
- Meaning: Proportion of students answering correctly
- Optimal Range: 0.30-0.70 for classroom assessments
- Ideal Difficulty: 0.50 (50% correct)
- Example: P-Value of 0.93 indicates 93% correct responses (too easy)
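For illustration, the P-Value calculation can be written as a short Python sketch; the function name and sample numbers below are illustrative, not part of the OnTarget system:

def p_value(num_correct: int, total_students: int) -> float:
    """Proportion of students answering the item correctly."""
    return num_correct / total_students

# Worked example from above: 93 of 100 students answered correctly.
print(p_value(93, 100))  # 0.93 -> outside the 0.30-0.70 optimal range (too easy)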
Question Difficulty (Rasch Scale)
Range: Typically -4 to +4
Formula (simplified): Difficulty (logits) = ln[(1-P)/P]
Where P = proportion correct (P-Value)
Calculation Example:
- P-Value = 0.93
- Difficulty = ln[(1-0.93)/0.93] = ln[0.07/0.93] = ln[0.075] = -2.59 logits
Interpretation Guidelines:
- Negative Values: Easier questions
- Positive Values: Harder questions
- Zero Point: Moderate difficulty
- Example: -2.59 logits (from the calculation above) indicates a very easy question
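A minimal Python sketch of the simplified logit transformation, using the worked example above (the helper name is hypothetical):

import math

def rasch_difficulty(p: float) -> float:
    """Simplified Rasch item difficulty in logits: ln[(1 - P) / P]."""
    return math.log((1 - p) / p)

# Worked example from above: P-Value = 0.93.
print(round(rasch_difficulty(0.93), 2))  # -2.59 logits -> very easy item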
Question Quality (Point Biserial)
Range: -1.00 to +1.00
Formula:
rpb = (X̄₁ - X̄₀) × √(p×q) / SD_total
Where:
- X̄₁ = Mean total score of students who answered correctly
- X̄₀ = Mean total score of students who answered incorrectly
- p = Proportion answering correctly (P-Value)
- q = Proportion answering incorrectly (1-p)
- SD_total = Standard deviation of total test scores
Calculation Example:
- Mean score (correct group): 85
- Mean score (incorrect group): 72
- P-Value: 0.93 (p = 0.93, q = 0.07)
- Total test SD: 12
- rpb = (85-72) × √(0.93×0.07) / 12 = 13 × √0.065 / 12 = 13 × 0.255 / 12 = 0.28
Interpretation Guidelines:
- Threshold: 0.30+ considered “good” quality
- Function: Measures ability to distinguish high vs. low-performing students
- Positive Values: Students who scored well overall were more likely to answer correctly
- Negative Values: Indicates a problematic item (high scorers tend to miss it while low scorers answer it correctly)
- Higher Positive Values: Better discrimination
- Example: 0.35 indicates good discriminatory power
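The point biserial formula translates directly into code; this Python sketch (illustrative only) reproduces the worked example above:

import math

def point_biserial(mean_correct: float, mean_incorrect: float,
                   p: float, sd_total: float) -> float:
    """rpb = (mean of correct group - mean of incorrect group) * sqrt(p * q) / SD."""
    q = 1 - p
    return (mean_correct - mean_incorrect) * math.sqrt(p * q) / sd_total

# Worked example from above: group means 85 and 72, P-Value 0.93, test SD 12.
print(round(point_biserial(85, 72, 0.93, 12), 2))  # 0.28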
Additional Statistical Measures
Discrimination Index (D)
Formula:
D = P_upper - P_lower
Where:
- P_upper = Proportion correct in upper 27% of scorers
- P_lower = Proportion correct in lower 27% of scorers
Interpretation:
- Excellent: D ≥ 0.40
- Good: D = 0.30-0.39
- Fair: D = 0.20-0.29
- Poor: D < 0.20
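As a sketch with made-up group proportions, the discrimination index is a simple difference:

def discrimination_index(p_upper: float, p_lower: float) -> float:
    """D = proportion correct in the top 27% minus proportion correct in the bottom 27%."""
    return p_upper - p_lower

# Hypothetical item: 90% of high scorers and 45% of low scorers answered correctly.
print(round(discrimination_index(0.90, 0.45), 2))  # 0.45 -> "Excellent" (D >= 0.40)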
Standard Error of Measurement (SEM)
Formula:
SEM = SD × √(1 - reliability)
Where:
- SD = Standard deviation of test scores
- reliability = Test reliability coefficient (e.g., Cronbach's α)
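A one-line Python sketch of the SEM calculation, with hypothetical inputs:

import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15.3 points, Cronbach's alpha = 0.88.
print(round(sem(15.3, 0.88), 2))  # about 5.3 score points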
Reliability (Cronbach’s Alpha)
Formula:
α = (k/(k-1)) × (1 - (Σσ²ᵢ/σ²_total))
Where:
- k = Number of test items
- σ²ᵢ = Variance of individual items
- σ²_total = Variance of total scores
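A compact sketch of the alpha formula applied to a toy score matrix (the data and function name are hypothetical):

import statistics

def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                        # one column per item
    item_var_sum = sum(statistics.pvariance(col) for col in columns)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Toy data: 4 students x 3 items, scored 1 (correct) / 0 (incorrect).
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(scores))  # 0.75 for this toy matrix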
Evidence of Validity Framework
The system includes a comprehensive validity review checklist with six evaluation criteria:
1. Standards Alignment
- Curriculum Match: Verification against state standards
- Learning Objectives: Alignment with intended outcomes
- Depth of Knowledge (DOK) Levels:
- DOK 1: Recall facts (identify, list)
- DOK 2: Apply skills/concepts (describe, compare)
- DOK 3: Strategic thinking (analyze, evaluate)
- DOK 4: Extended thinking (synthesize, create)
2. Bias and Sensitivity Review
- Cultural Bias Detection: Identification of cultural assumptions
- Socioeconomic Considerations: Avoiding disadvantages based on background
- Stereotype Prevention: Elimination of harmful assumptions
- Inclusive Content: Ensuring fairness across student populations
3. Language and Vocabulary Assessment
- Reading Level Appropriateness: Grade-level vocabulary verification
- Clarity Standards: Clear, concise wording requirements
- Terminology Consistency: Uniform vocabulary usage
- Active Voice Preference: Enhanced readability standards
4. Structure and Context Evaluation
- Organizational Flow: Logical information progression
- Realistic Scenarios: Relevant, authentic contexts
- Format Alignment: Question structure supports objectives
- Instruction Clarity: Unambiguous task requirements
5. Answer Choices Analysis (Multiple Choice)
- Plausible Distractors: Realistic incorrect options based on common misconceptions
- Length Consistency: Similar complexity across all choices
- Single Correct Answer: Elimination of ambiguity
- Avoiding “Gotcha” Elements: Fair assessment practices
6. Visual Elements Review
- Clarity Standards: Legible charts, graphs, and images
- Purpose Alignment: Graphics support question objectives
- Accessibility Compliance: Universal design principles
- Complete Information: All necessary data provided
Analysis Workflow
Individual Question Review Process
- Selection Phase: Choose analysis method and sorting criteria
- Statistical Review: Examine P-Value, Rasch, and Point Biserial metrics
- Validity Assessment: Apply six-criteria evaluation framework
- Documentation: Record findings and recommended actions
- Revision Planning: Note improvements for future iterations
Collaborative Features
- Team Analysis: Multi-educator review capabilities
- Professional Learning Community (PLC) Integration: Structured group analysis
- Individual Review Mode: Personal assessment workflow
- Note-Taking System: Built-in documentation for future reference
Interpretation Guidelines
Statistical Thresholds and Calculations
P-Value Classification
Difficulty Level = {
Very Easy: P ≥ 0.90
Easy: 0.70 ≤ P < 0.90
Moderate: 0.30 ≤ P < 0.70
Hard: 0.10 ≤ P < 0.30
Very Hard: P < 0.10
}
Point Biserial Quality Standards
Quality Rating = {
Excellent: rpb ≥ 0.40
Good: 0.30 ≤ rpb < 0.40
Fair: 0.20 ≤ rpb < 0.30
Poor: 0.10 ≤ rpb < 0.20
Very Poor: rpb < 0.10
}
Rasch Difficulty Interpretation
Difficulty Category = {
Very Easy: δ ≤ -2.0 logits
Easy: -2.0 < δ ≤ -1.0 logits
Moderate: -1.0 < δ ≤ 1.0 logits
Hard: 1.0 < δ ≤ 2.0 logits
Very Hard: δ > 2.0 logits
}
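These cut-points can be coded as simple lookup helpers; the Python sketch below mirrors the three tables (function names are illustrative):

def classify_p_value(p: float) -> str:
    if p >= 0.90:
        return "Very Easy"
    elif p >= 0.70:
        return "Easy"
    elif p >= 0.30:
        return "Moderate"
    elif p >= 0.10:
        return "Hard"
    return "Very Hard"

def classify_point_biserial(rpb: float) -> str:
    if rpb >= 0.40:
        return "Excellent"
    elif rpb >= 0.30:
        return "Good"
    elif rpb >= 0.20:
        return "Fair"
    elif rpb >= 0.10:
        return "Poor"
    return "Very Poor"

def classify_rasch(delta: float) -> str:
    if delta <= -2.0:
        return "Very Easy"
    elif delta <= -1.0:
        return "Easy"
    elif delta <= 1.0:
        return "Moderate"
    elif delta <= 2.0:
        return "Hard"
    return "Very Hard"

# Earlier worked example: P = 0.93, rpb = 0.28, difficulty = -2.59 logits.
print(classify_p_value(0.93), classify_point_biserial(0.28), classify_rasch(-2.59))
# Very Easy Fair Very Easy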
Sample Calculation Workflow
Step 1: Calculate Basic Statistics
Given Data:
- Total Students (N) = 120
- Correct Responses = 87
- Mean Score (Correct Group) = 82.5
- Mean Score (Incorrect Group) = 71.2
- Overall Test SD = 15.3
Step 2: Compute P-Value
P-Value = 87/120 = 0.725
Step 3: Compute Rasch Difficulty
δ = ln[(1-0.725)/0.725] = ln[0.275/0.725] = ln[0.379] = -0.97 logits
Step 4: Compute Point Biserial
p = 0.725, q = 0.275
rpb = (82.5-71.2) × √(0.725×0.275) / 15.3
rpb = 11.3 × √0.199 / 15.3
rpb = 11.3 × 0.446 / 15.3 = 0.33
Step 5: Interpret Results
- Difficulty: Easy (P = 0.725)
- Rasch: Easy (-0.97 logits)
- Quality: Good (rpb = 0.33)
- Recommendation: Consider increasing difficulty while maintaining good discrimination
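The five steps above can be reproduced end to end with a short Python script (a sketch only; the function is hypothetical, not part of OnTarget):

import math

def analyze_item(n_students, n_correct, mean_correct, mean_incorrect, sd_total):
    """Return (P-Value, Rasch difficulty in logits, point biserial) for one item."""
    p = n_correct / n_students                                           # Step 2
    q = 1 - p
    delta = math.log(q / p)                                              # Step 3
    rpb = (mean_correct - mean_incorrect) * math.sqrt(p * q) / sd_total  # Step 4
    return p, delta, rpb

p, delta, rpb = analyze_item(120, 87, 82.5, 71.2, 15.3)
print(round(p, 3), round(delta, 2), round(rpb, 2))  # 0.725 -0.97 0.33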
High-Quality Questions
Criteria:
- P-Value: 0.30-0.70 range
- Point Biserial: 0.30+ discrimination
- Rasch: -1.0 to +1.0 logits
- Full validity criteria compliance
Questions Requiring Revision
Statistical Red Flags
Revision Priority = {
High: P > 0.90 OR P < 0.10 OR rpb < 0.10
Medium: 0.70 < P ≤ 0.90 OR 0.10 ≤ P < 0.30 OR 0.10 ≤ rpb < 0.20
Low: P slightly outside optimal range OR rpb = 0.20-0.29
}
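Expressed as a Python sketch (an illustrative helper using the same cut-points as the rule block above):

def revision_priority(p: float, rpb: float) -> str:
    """Map P-Value and point biserial to a revision priority."""
    if p > 0.90 or p < 0.10 or rpb < 0.10:
        return "High"
    if 0.70 < p <= 0.90 or 0.10 <= p < 0.30 or 0.10 <= rpb < 0.20:
        return "Medium"
    return "Low"

# Earlier worked example: P = 0.93, rpb = 0.28.
print(revision_priority(0.93, 0.28))  # High -> the item is far too easy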
Specific Issues
- Too Easy: P-Value > 0.70, consider increasing difficulty
- Too Hard: P-Value < 0.30, review for clarity or content alignment
- Poor Discrimination: Point Biserial < 0.30, examine answer choices and distractors
- Validity Issues: Any “No” responses in validity checklist
Distractor Analysis Calculations
Distractor Effectiveness Formula
For each incorrect option i:
Attractiveness_i = (Students selecting option i) / (Total students answering incorrectly)
Discrimination_i = (% Low scorers selecting i) - (% High scorers selecting i)
Ideal Distractor Characteristics:
- Attractiveness: 15-35% of incorrect responses
- Discrimination: Positive values (more low scorers than high scorers selecting)
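A Python sketch of both distractor metrics for one item (the counts and function name below are hypothetical):

def distractor_stats(option_counts, low_counts, high_counts, n_low, n_high):
    """Attractiveness and discrimination for each incorrect option.

    option_counts: {option: number of students choosing it}, incorrect options only
    low_counts / high_counts: the same counts within the low- and high-scoring groups
    """
    total_incorrect = sum(option_counts.values())
    results = {}
    for option, count in option_counts.items():
        attractiveness = count / total_incorrect
        discrimination = (low_counts.get(option, 0) / n_low
                          - high_counts.get(option, 0) / n_high)
        results[option] = (round(attractiveness, 2), round(discrimination, 2))
    return results

# Hypothetical item: 40 incorrect responses spread across distractors B, C, D.
print(distractor_stats({"B": 18, "C": 14, "D": 8},
                       low_counts={"B": 10, "C": 8, "D": 5},
                       high_counts={"B": 3, "C": 2, "D": 1},
                       n_low=30, n_high=30))
# Approximate output: {'B': (0.45, 0.23), 'C': (0.35, 0.2), 'D': (0.2, 0.13)}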
Implementation Benefits
- Data-Driven Decisions: Statistical foundation for assessment improvement
- Quality Assurance: Systematic validation of test items
- Instructional Insights: Understanding of student learning patterns
- Assessment Bank Development: Building validated question repositories
- Professional Development: Enhanced educator assessment literacy