Question Analysis (Technical)

Overview

The OnTarget Question Analysis Report is a comprehensive statistical review system that evaluates test questions after administration to ensure each item meets minimum quality control criteria. This post-administration analysis provides educators with detailed insights into question performance, difficulty, and validity.

Analysis Methods

The system offers three configuration options for reviewing questions:

1. Quality-Based Filtering

  • All Questions: Comprehensive review of entire test
  • Quality-Specific: Review only poor-performing questions for targeted improvement

2. Sorting Criteria

  • Question Number: Sequential analysis by item position
  • Question Difficulty: Organized by statistical difficulty measures
  • Question Quality: Ranked by discriminatory power

3. Display Order

  • Ascending Order: Lowest to highest values
  • Descending Order: Highest to lowest values

Key Statistical Metrics

Question Difficulty (P-Value)

Range: 0.00 to 1.00

Formula: P-Value = (Number of students answering correctly) / (Total number of students)

Calculation Example:

  • Total students: 100
  • Students answering correctly: 93
  • P-Value = 93/100 = 0.93

Interpretation Guidelines:

  • Interpretation: Percentage of students answering correctly
  • Optimal Range: 0.30-0.70 for classroom assessments
  • Ideal Difficulty: 0.50 (50% correct)
  • Example: P-Value of 0.93 indicates 93% correct responses (too easy)
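
For quick reference, the P-Value calculation above can be reproduced with a short Python sketch (the function name and inputs are illustrative, not part of OnTarget):

def p_value(num_correct, num_students):
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

# Example from above: 93 of 100 students answered correctly
print(p_value(93, 100))  # 0.93 (too easy for a classroom assessment)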

Question Difficulty (Rasch Scale)

Range: Typically -4 to +4

Formula (simplified): Difficulty (logits) = ln[(1-P)/P]
Where P = proportion correct (P-Value)

Calculation Example:

  • P-Value = 0.93
  • Difficulty = ln[(1-0.93)/0.93] = ln[0.07/0.93] = ln[0.0753] ≈ -2.59 logits

Interpretation Guidelines:

  • Negative Values: Easier questions
  • Positive Values: Harder questions
  • Zero Point: Moderate difficulty
  • Example: -1.37 logits indicates an easy question
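
The simplified Rasch conversion can be sketched the same way; this is an illustrative Python snippet, not the estimation routine OnTarget uses internally:

import math

def rasch_difficulty(p_value):
    """Simplified item difficulty in logits: ln[(1 - P) / P]."""
    return math.log((1 - p_value) / p_value)

# Example from above: P-Value = 0.93
print(round(rasch_difficulty(0.93), 2))  # -2.59 logits (a very easy item)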

Question Quality (Point Biserial)

Range: -1.00 to +1.00

Formula:

rpb = (X̄₁ - X̄₀) × √(p×q) / SD_total

Where:
- X̄₁ = Mean total score of students who answered correctly
- X̄₀ = Mean total score of students who answered incorrectly  
- p = Proportion answering correctly (P-Value)
- q = Proportion answering incorrectly (1-p)
- SD_total = Standard deviation of total test scores

Calculation Example:

  • Mean score (correct group): 85
  • Mean score (incorrect group): 72
  • P-Value: 0.93 (p = 0.93, q = 0.07)
  • Total test SD: 12
  • rpb = (85-72) × √(0.93×0.07) / 12 = 13 × √0.0651 / 12 = 13 × 0.255 / 12 ≈ 0.28

Interpretation Guidelines:

  • Threshold: 0.30+ considered “good” quality
  • Function: Measures ability to distinguish high vs. low-performing students
  • Positive Values: Students who scored well overall were more likely to answer correctly
  • Negative Values: Indicate a problematic item (high scorers miss it while low scorers answer correctly)
  • Higher Positive Values: Better discrimination
  • Example: 0.35 indicates good discriminatory power
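
A minimal Python sketch of the point-biserial formula above, using the same worked numbers (illustrative only; a full implementation would compute the group means and standard deviation from raw scores):

import math

def point_biserial(mean_correct, mean_incorrect, p, sd_total):
    """rpb = (X̄₁ - X̄₀) × √(p × q) / SD_total."""
    q = 1 - p
    return (mean_correct - mean_incorrect) * math.sqrt(p * q) / sd_total

# Example from above: group means 85 and 72, P-Value 0.93, test SD 12
print(round(point_biserial(85, 72, 0.93, 12), 2))  # 0.28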

Additional Statistical Measures

Discrimination Index (D)

Formula:

D = P_upper - P_lower

Where:
- P_upper = Proportion correct in upper 27% of scorers
- P_lower = Proportion correct in lower 27% of scorers

Interpretation:

  • Excellent: D ≥ 0.40
  • Good: D = 0.30-0.39
  • Fair: D = 0.20-0.29
  • Poor: D < 0.20
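
Given per-student item responses and total scores, the upper/lower 27% comparison can be sketched as below (the grouping logic, including how group size is rounded, is an assumption; OnTarget's exact method may differ):

def discrimination_index(item_correct, total_scores, group_fraction=0.27):
    """D = proportion correct in the top 27% of scorers minus the bottom 27%.

    item_correct: 0/1 response to this item, one entry per student
    total_scores: total test score per student, in the same order
    """
    n = len(total_scores)
    k = max(1, round(n * group_fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower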

Standard Error of Measurement (SEM)

Formula:

SEM = SD × √(1 - reliability)

Where:
- SD = Standard deviation of test scores
- reliability = Test reliability coefficient (e.g., Cronbach's α)
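
For example, with an assumed reliability of 0.85 and the test SD of 15.3 used later in this article, the SEM works out as follows (illustrative values only):

import math

def sem(sd, reliability):
    """Standard error of measurement: SD × √(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

print(round(sem(15.3, 0.85), 2))  # ~5.93 score points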

Reliability (Cronbach’s Alpha)

Formula:

α = (k/(k-1)) × (1 - (Σσ²ᵢ/σ²_total))

Where:
- k = Number of test items
- σ²ᵢ = Variance of individual items
- σ²_total = Variance of total scores
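
A small Python sketch of the alpha formula, assuming item-level scores are available as one list per item (illustrative; statistical packages provide equivalent, better-tested routines):

def cronbach_alpha(items):
    """items: list of per-item score lists, each with one entry per student."""
    k = len(items)
    n_students = len(items[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    totals = [sum(item[s] for item in items) for s in range(n_students)]
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))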

Evidence of Validity Framework

The system includes a comprehensive validity review checklist with six evaluation criteria:

1. Standards Alignment

  • Curriculum Match: Verification against state standards
  • Learning Objectives: Alignment with intended outcomes
  • Depth of Knowledge (DOK) Levels:
    • DOK 1: Recall facts (identify, list)
    • DOK 2: Apply skills/concepts (describe, compare)
    • DOK 3: Strategic thinking (analyze, evaluate)
    • DOK 4: Extended thinking (synthesize, create)

2. Bias and Sensitivity Review

  • Cultural Bias Detection: Identification of cultural assumptions
  • Socioeconomic Considerations: Avoiding disadvantages based on background
  • Stereotype Prevention: Elimination of harmful assumptions
  • Inclusive Content: Ensuring fairness across student populations

3. Language and Vocabulary Assessment

  • Reading Level Appropriateness: Grade-level vocabulary verification
  • Clarity Standards: Clear, concise wording requirements
  • Terminology Consistency: Uniform vocabulary usage
  • Active Voice Preference: Enhanced readability standards

4. Structure and Context Evaluation

  • Organizational Flow: Logical information progression
  • Realistic Scenarios: Relevant, authentic contexts
  • Format Alignment: Question structure supports objectives
  • Instruction Clarity: Unambiguous task requirements

5. Answer Choices Analysis (Multiple Choice)

  • Plausible Distractors: Realistic incorrect options based on common misconceptions
  • Length Consistency: Similar complexity across all choices
  • Single Correct Answer: Elimination of ambiguity
  • Avoiding “Gotcha” Elements: Fair assessment practices

6. Visual Elements Review

  • Clarity Standards: Legible charts, graphs, and images
  • Purpose Alignment: Graphics support question objectives
  • Accessibility Compliance: Universal design principles
  • Complete Information: All necessary data provided

Analysis Workflow

Individual Question Review Process

  1. Selection Phase: Choose analysis method and sorting criteria
  2. Statistical Review: Examine P-Value, Rasch, and Point Biserial metrics
  3. Validity Assessment: Apply six-criteria evaluation framework
  4. Documentation: Record findings and recommended actions
  5. Revision Planning: Note improvements for future iterations

Collaborative Features

  • Team Analysis: Multi-educator review capabilities
  • Professional Learning Community (PLC) Integration: Structured group analysis
  • Individual Review Mode: Personal assessment workflow
  • Note-Taking System: Built-in documentation for future reference

Interpretation Guidelines

Statistical Thresholds and Calculations

P-Value Classification

Difficulty Level = {
  Very Easy:    P ≥ 0.90
  Easy:         0.70 ≤ P < 0.90
  Moderate:     0.30 ≤ P < 0.70
  Hard:         0.10 ≤ P < 0.30
  Very Hard:    P < 0.10
}

Point Biserial Quality Standards

Quality Rating = {
  Excellent:    rpb ≥ 0.40
  Good:         0.30 ≤ rpb < 0.40
  Fair:         0.20 ≤ rpb < 0.30
  Poor:         0.10 ≤ rpb < 0.20
  Very Poor:    rpb < 0.10
}

Rasch Difficulty Interpretation

Difficulty Category = {
  Very Easy:    δ ≤ -2.0 logits
  Easy:         -2.0 < δ ≤ -1.0 logits
  Moderate:     -1.0 < δ ≤ 1.0 logits
  Hard:         1.0 < δ ≤ 2.0 logits
  Very Hard:    δ > 2.0 logits
}
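
The three threshold tables above translate directly into simple classification helpers; the sketch below mirrors the ranges as written (boundary handling at the exact cut points follows the tables):

def classify_p_value(p):
    """Difficulty level from a P-Value."""
    if p >= 0.90: return "Very Easy"
    if p >= 0.70: return "Easy"
    if p >= 0.30: return "Moderate"
    if p >= 0.10: return "Hard"
    return "Very Hard"

def classify_point_biserial(rpb):
    """Quality rating from a point-biserial value."""
    if rpb >= 0.40: return "Excellent"
    if rpb >= 0.30: return "Good"
    if rpb >= 0.20: return "Fair"
    if rpb >= 0.10: return "Poor"
    return "Very Poor"

def classify_rasch(delta):
    """Difficulty category from a Rasch difficulty in logits."""
    if delta <= -2.0: return "Very Easy"
    if delta <= -1.0: return "Easy"
    if delta <= 1.0: return "Moderate"
    if delta <= 2.0: return "Hard"
    return "Very Hard"

print(classify_p_value(0.93), classify_rasch(-2.59), classify_point_biserial(0.28))
# Very Easy Very Easy Fair
print(classify_p_value(0.725), classify_rasch(-0.97), classify_point_biserial(0.33))
# Easy Moderate Good  (-0.97 sits just inside the moderate band)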

Sample Calculation Workflow

Step 1: Calculate Basic Statistics

Given Data:
- Total Students (N) = 120
- Correct Responses = 87
- Mean Score (Correct Group) = 82.5
- Mean Score (Incorrect Group) = 71.2
- Overall Test SD = 15.3

Step 2: Compute P-Value

P-Value = 87/120 = 0.725

Step 3: Compute Rasch Difficulty

δ = ln[(1-0.725)/0.725] = ln[0.275/0.725] = ln[0.379] = -0.97 logits

Step 4: Compute Point Biserial

p = 0.725, q = 0.275
rpb = (82.5-71.2) × √(0.725×0.275) / 15.3
rpb = 11.3 × √0.199 / 15.3
rpb = 11.3 × 0.446 / 15.3 = 0.33

Step 5: Interpret Results

  • Difficulty: Easy (P = 0.725)
  • Rasch: -0.97 logits, right at the easy/moderate boundary
  • Quality: Good (rpb = 0.33)
  • Recommendation: Consider increasing difficulty while maintaining good discrimination
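
The five steps above can be chained into a single pass; the sketch below reproduces the worked example (the function name and output format are illustrative):

import math

def analyze_item(n_students, n_correct, mean_correct, mean_incorrect, sd_total):
    """Compute P-Value, simplified Rasch difficulty, and point biserial for one item."""
    p = n_correct / n_students
    delta = math.log((1 - p) / p)
    rpb = (mean_correct - mean_incorrect) * math.sqrt(p * (1 - p)) / sd_total
    return {"p_value": round(p, 3), "rasch_logits": round(delta, 2), "point_biserial": round(rpb, 2)}

print(analyze_item(120, 87, 82.5, 71.2, 15.3))
# {'p_value': 0.725, 'rasch_logits': -0.97, 'point_biserial': 0.33}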

High-Quality Questions

Criteria:

  • P-Value: 0.30-0.70 range
  • Point Biserial: 0.30+ discrimination
  • Rasch: -1.0 to +1.0 logits
  • Full validity criteria compliance

Questions Requiring Revision

Statistical Red Flags

Revision Priority = {
  High:     P > 0.90 OR P < 0.10 OR rpb < 0.10
  Medium:   0.70 < P ≤ 0.90 OR 0.10 ≤ P < 0.30 OR 0.10 ≤ rpb < 0.20
  Low:      P slightly outside the 0.30-0.70 optimal range OR 0.20 ≤ rpb < 0.30
}
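
Read literally, the priority rules translate into a small helper like the one below; treating the "Low" tier's P-Value condition as already covered by the higher tiers is an assumption made for this sketch:

def revision_priority(p, rpb):
    """Map P-Value and point biserial to a revision priority per the table above."""
    if p > 0.90 or p < 0.10 or rpb < 0.10:
        return "High"
    if 0.70 < p <= 0.90 or 0.10 <= p < 0.30 or 0.10 <= rpb < 0.20:
        return "Medium"
    # P-Values outside 0.30-0.70 are already caught by the tiers above,
    # so at this point only the point-biserial band matters.
    if 0.20 <= rpb < 0.30:
        return "Low"
    return "No revision flagged"

print(revision_priority(0.725, 0.33))  # Medium (slightly too easy)
print(revision_priority(0.50, 0.45))   # No revision flagged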

Specific Issues

  • Too Easy: P-Value > 0.70, consider increasing difficulty
  • Too Hard: P-Value < 0.30, review for clarity or content alignment
  • Poor Discrimination: Point Biserial < 0.30, examine answer choices and distractors
  • Validity Issues: Any “No” responses in validity checklist

Distractor Analysis Calculations

Distractor Effectiveness Formula

For each incorrect option i:
Attractiveness_i = (Students selecting option i) / (Total students answering incorrectly)

Discrimination_i = (% Low scorers selecting i) - (% High scorers selecting i)

Ideal Distractor Characteristics:

  • Attractiveness: 15-35% of incorrect responses
  • Discrimination: Positive values (more low scorers than high scorers selecting)
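
A sketch of both distractor metrics, assuming per-student option selections and total scores are available (the 27% grouping mirrors the discrimination index above; OnTarget's exact grouping may differ):

def distractor_stats(choices, correct_option, total_scores, group_fraction=0.27):
    """Attractiveness and discrimination for each incorrect option.

    choices: the option each student selected (e.g. 'A'-'D'), one per student
    correct_option: the keyed answer
    total_scores: total test score per student, in the same order
    """
    n = len(choices)
    k = max(1, round(n * group_fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = set(order[:k]), set(order[-k:])
    incorrect = [i for i in range(n) if choices[i] != correct_option]

    results = {}
    for option in sorted(set(choices) - {correct_option}):
        chose = [i for i in range(n) if choices[i] == option]
        attractiveness = len(chose) / len(incorrect) if incorrect else 0.0
        discrimination = (sum(i in low for i in chose) - sum(i in high for i in chose)) / k
        results[option] = {"attractiveness": attractiveness, "discrimination": discrimination}
    return results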

Implementation Benefits

  • Data-Driven Decisions: Statistical foundation for assessment improvement
  • Quality Assurance: Systematic validation of test items
  • Instructional Insights: Understanding of student learning patterns
  • Assessment Bank Development: Building validated question repositories
  • Professional Development: Enhanced educator assessment literacy