Overview
The OnTarget Question Analysis Report is a comprehensive statistical review system that evaluates test questions after administration to ensure each item meets minimum quality control criteria. This post-administration analysis provides educators with detailed insights into question performance, difficulty, and validity.
Analysis Methods
The system offers three distinct approaches for analyzing questions:
1. Quality-Based Filtering
- All Questions: Comprehensive review of entire test
- Quality-Specific: Focus on poor-performing questions only for targeted improvement
2. Sorting Criteria
- Question Number: Sequential analysis by item position
- Question Difficulty: Organized by statistical difficulty measures
- Question Quality: Ranked by discriminatory power
3. Display Order
- Ascending Order: Lowest to highest values
- Descending Order: Highest to lowest values
Key Statistical Metrics
Question Difficulty (P-Value)
Range: 0.00 to 1.00
Formula: P-Value = (Number of students answering correctly) / (Total number of students)
Calculation Example:
- Total students: 100
- Students answering correctly: 93
- P-Value = 93/100 = 0.93
Interpretation Guidelines:
- Meaning: Proportion of students answering correctly
- Optimal Range: 0.30-0.70 for classroom assessments
- Ideal Difficulty: 0.50 (50% correct)
- Example: P-Value of 0.93 indicates 93% correct responses (too easy)
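For illustration, the P-Value calculation can be written as a short Python sketch; the function name and sample numbers below are illustrative, not part of the OnTarget system:

def p_value(num_correct: int, total_students: int) -> float:
    """Proportion of students answering the item correctly."""
    return num_correct / total_students

# Worked example from above: 93 of 100 students answered correctly.
print(p_value(93, 100))  # 0.93 -> outside the 0.30-0.70 optimal range (too easy)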
Question Difficulty (Rasch Scale)
Range: Typically -4 to +4
Formula (simplified): Difficulty (logits) = ln[(1-P)/P]
Where P = proportion correct (P-Value)
Calculation Example:
- P-Value = 0.93
- Difficulty = ln[(1-0.93)/0.93] = ln[0.07/0.93] = ln[0.075] = -2.59 logits
Interpretation Guidelines:
- Negative Values: Easier questions
- Positive Values: Harder questions
- Zero Point: Moderate difficulty
- Example: -2.59 logits (from the calculation above) indicates a very easy question
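A minimal Python sketch of the simplified logit transformation, using the worked example above (the helper name is hypothetical):

import math

def rasch_difficulty(p: float) -> float:
    """Simplified Rasch item difficulty in logits: ln[(1 - P) / P]."""
    return math.log((1 - p) / p)

# Worked example from above: P-Value = 0.93.
print(round(rasch_difficulty(0.93), 2))  # -2.59 logits -> very easy item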
Question Quality (Point Biserial)
Range: -1.00 to +1.00
Formula:
rpb = (X̄₁ - X̄₀) × √(p×q) / SD_total
Where:
- X̄₁ = Mean total score of students who answered correctly
- X̄₀ = Mean total score of students who answered incorrectly
- p = Proportion answering correctly (P-Value)
- q = Proportion answering incorrectly (1-p)
- SD_total = Standard deviation of total test scores
Calculation Example:
- Mean score (correct group): 85
- Mean score (incorrect group): 72
- P-Value: 0.93 (p = 0.93, q = 0.07)
- Total test SD: 12
- rpb = (85-72) × √(0.93×0.07) / 12 = 13 × √0.065 / 12 = 13 × 0.255 / 12 = 0.28
Interpretation Guidelines:
- Threshold: 0.30+ considered “good” quality
- Function: Measures ability to distinguish high vs. low-performing students
- Positive Values: Students who scored well overall were more likely to answer correctly
- Negative Values: Indicates a problematic item (high scorers tend to miss it while low scorers answer it correctly)
- Higher Positive Values: Better discrimination
- Example: 0.35 indicates good discriminatory power
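The point biserial formula translates directly into code; this Python sketch (illustrative only) reproduces the worked example above:

import math

def point_biserial(mean_correct: float, mean_incorrect: float,
                   p: float, sd_total: float) -> float:
    """rpb = (mean of correct group - mean of incorrect group) * sqrt(p * q) / SD."""
    q = 1 - p
    return (mean_correct - mean_incorrect) * math.sqrt(p * q) / sd_total

# Worked example from above: group means 85 and 72, P-Value 0.93, test SD 12.
print(round(point_biserial(85, 72, 0.93, 12), 2))  # 0.28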
Additional Statistical Measures
Discrimination Index (D)
Formula:
D = P_upper - P_lower
Where:
- P_upper = Proportion correct in upper 27% of scorers
- P_lower = Proportion correct in lower 27% of scorers
Interpretation:
- Excellent: D ≥ 0.40
- Good: D = 0.30-0.39
- Fair: D = 0.20-0.29
- Poor: D < 0.20
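As a sketch with made-up group proportions, the discrimination index is a simple difference:

def discrimination_index(p_upper: float, p_lower: float) -> float:
    """D = proportion correct in the top 27% minus proportion correct in the bottom 27%."""
    return p_upper - p_lower

# Hypothetical item: 90% of high scorers and 45% of low scorers answered correctly.
print(round(discrimination_index(0.90, 0.45), 2))  # 0.45 -> "Excellent" (D >= 0.40)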
Standard Error of Measurement (SEM)
Formula:
SEM = SD × √(1 - reliability)
Where:
- SD = Standard deviation of test scores
- reliability = Test reliability coefficient (e.g., Cronbach's α)
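A one-line Python sketch of the SEM calculation, with hypothetical inputs:

import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15.3 points, Cronbach's alpha = 0.88.
print(round(sem(15.3, 0.88), 2))  # about 5.3 score points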
Reliability (Cronbach’s Alpha)
Formula:
α = (k/(k-1)) × (1 - (Σσ²ᵢ/σ²_total))
Where:
- k = Number of test items
- σ²ᵢ = Variance of individual items
- σ²_total = Variance of total scores
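A compact sketch of the alpha formula applied to a toy score matrix (the data and function name are hypothetical):

import statistics

def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                        # one column per item
    item_var_sum = sum(statistics.pvariance(col) for col in columns)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Toy data: 4 students x 3 items, scored 1 (correct) / 0 (incorrect).
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(scores))  # 0.75 for this toy matrix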
Evidence of Validity Framework
The system includes a comprehensive validity review checklist with six evaluation criteria:
1. Standards Alignment
- Curriculum Match: Verification against state standards
- Learning Objectives: Alignment with intended outcomes
- Depth of Knowledge (DOK) Levels:
- DOK 1: Recall facts (identify, list)
- DOK 2: Apply skills/concepts (describe, compare)
- DOK 3: Strategic thinking (analyze, evaluate)
- DOK 4: Extended thinking (synthesize, create)
2. Bias and Sensitivity Review
- Cultural Bias Detection: Identification of cultural assumptions
- Socioeconomic Considerations: Avoiding disadvantages based on background
- Stereotype Prevention: Elimination of harmful assumptions
- Inclusive Content: Ensuring fairness across student populations
3. Language and Vocabulary Assessment
- Reading Level Appropriateness: Grade-level vocabulary verification
- Clarity Standards: Clear, concise wording requirements
- Terminology Consistency: Uniform vocabulary usage
- Active Voice Preference: Enhanced readability standards
4. Structure and Context Evaluation
- Organizational Flow: Logical information progression
- Realistic Scenarios: Relevant, authentic contexts
- Format Alignment: Question structure supports objectives
- Instruction Clarity: Unambiguous task requirements
5. Answer Choices Analysis (Multiple Choice)
- Plausible Distractors: Realistic incorrect options based on common misconceptions
- Length Consistency: Similar complexity across all choices
- Single Correct Answer: Elimination of ambiguity
- Avoiding “Gotcha” Elements: Fair assessment practices
6. Visual Elements Review
- Clarity Standards: Legible charts, graphs, and images
- Purpose Alignment: Graphics support question objectives
- Accessibility Compliance: Universal design principles
- Complete Information: All necessary data provided
Analysis Workflow
Individual Question Review Process
- Selection Phase: Choose analysis method and sorting criteria
- Statistical Review: Examine P-Value, Rasch, and Point Biserial metrics
- Validity Assessment: Apply six-criteria evaluation framework
- Documentation: Record findings and recommended actions
- Revision Planning: Note improvements for future iterations
Collaborative Features
- Team Analysis: Multi-educator review capabilities
- Professional Learning Community (PLC) Integration: Structured group analysis
- Individual Review Mode: Personal assessment workflow
- Note-Taking System: Built-in documentation for future reference
Interpretation Guidelines
Statistical Thresholds and Calculations
P-Value Classification
Difficulty Level = {
Very Easy: P ≥ 0.90
Easy: 0.70 ≤ P < 0.90
Moderate: 0.30 ≤ P < 0.70
Hard: 0.10 ≤ P < 0.30
Very Hard: P < 0.10
}
Point Biserial Quality Standards
Quality Rating = {
Excellent: rpb ≥ 0.40
Good: 0.30 ≤ rpb < 0.40
Fair: 0.20 ≤ rpb < 0.30
Poor: 0.10 ≤ rpb < 0.20
Very Poor: rpb < 0.10
}
Rasch Difficulty Interpretation
Difficulty Category = {
Very Easy: δ ≤ -2.0 logits
Easy: -2.0 < δ ≤ -1.0 logits
Moderate: -1.0 < δ ≤ 1.0 logits
Hard: 1.0 < δ ≤ 2.0 logits
Very Hard: δ > 2.0 logits
}
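These cut-points can be coded as simple lookup helpers; the Python sketch below mirrors the three tables (function names are illustrative):

def classify_p_value(p: float) -> str:
    if p >= 0.90:
        return "Very Easy"
    elif p >= 0.70:
        return "Easy"
    elif p >= 0.30:
        return "Moderate"
    elif p >= 0.10:
        return "Hard"
    return "Very Hard"

def classify_point_biserial(rpb: float) -> str:
    if rpb >= 0.40:
        return "Excellent"
    elif rpb >= 0.30:
        return "Good"
    elif rpb >= 0.20:
        return "Fair"
    elif rpb >= 0.10:
        return "Poor"
    return "Very Poor"

def classify_rasch(delta: float) -> str:
    if delta <= -2.0:
        return "Very Easy"
    elif delta <= -1.0:
        return "Easy"
    elif delta <= 1.0:
        return "Moderate"
    elif delta <= 2.0:
        return "Hard"
    return "Very Hard"

# Earlier worked example: P = 0.93, rpb = 0.28, difficulty = -2.59 logits.
print(classify_p_value(0.93), classify_point_biserial(0.28), classify_rasch(-2.59))
# Very Easy Fair Very Easy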
Sample Calculation Workflow
Step 1: Calculate Basic Statistics
Given Data:
- Total Students (N) = 120
- Correct Responses = 87
- Mean Score (Correct Group) = 82.5
- Mean Score (Incorrect Group) = 71.2
- Overall Test SD = 15.3
Step 2: Compute P-Value
P-Value = 87/120 = 0.725
Step 3: Compute Rasch Difficulty
δ = ln[(1-0.725)/0.725] = ln[0.275/0.725] = ln[0.379] = -0.97 logits
Step 4: Compute Point Biserial
p = 0.725, q = 0.275
rpb = (82.5-71.2) × √(0.725×0.275) / 15.3
rpb = 11.3 × √0.199 / 15.3
rpb = 11.3 × 0.446 / 15.3 = 0.33
Step 5: Interpret Results
- Difficulty: Easy (P = 0.725)
- Rasch: Easy (-0.97 logits)
- Quality: Good (rpb = 0.33)
- Recommendation: Consider increasing difficulty while maintaining good discrimination
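The five steps above can be reproduced end to end with a short Python script (a sketch only; the function is hypothetical, not part of OnTarget):

import math

def analyze_item(n_students, n_correct, mean_correct, mean_incorrect, sd_total):
    """Return (P-Value, Rasch difficulty in logits, point biserial) for one item."""
    p = n_correct / n_students                                           # Step 2
    q = 1 - p
    delta = math.log(q / p)                                              # Step 3
    rpb = (mean_correct - mean_incorrect) * math.sqrt(p * q) / sd_total  # Step 4
    return p, delta, rpb

p, delta, rpb = analyze_item(120, 87, 82.5, 71.2, 15.3)
print(round(p, 3), round(delta, 2), round(rpb, 2))  # 0.725 -0.97 0.33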
High-Quality Questions
Criteria:
- P-Value: 0.30-0.70 range
- Point Biserial: 0.30+ discrimination
- Rasch: -1.0 to +1.0 logits
- Full validity criteria compliance
Questions Requiring Revision
Statistical Red Flags
Revision Priority = {
High: P > 0.90 OR P < 0.10 OR rpb < 0.10
Medium: 0.70 < P ≤ 0.90 OR 0.10 ≤ P < 0.30 OR 0.10 ≤ rpb < 0.20
Low: P slightly outside optimal range OR rpb = 0.20-0.29
}
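Expressed as a Python sketch (an illustrative helper using the same cut-points as the rule block above):

def revision_priority(p: float, rpb: float) -> str:
    """Map P-Value and point biserial to a revision priority."""
    if p > 0.90 or p < 0.10 or rpb < 0.10:
        return "High"
    if 0.70 < p <= 0.90 or 0.10 <= p < 0.30 or 0.10 <= rpb < 0.20:
        return "Medium"
    return "Low"

# Earlier worked example: P = 0.93, rpb = 0.28.
print(revision_priority(0.93, 0.28))  # High -> the item is far too easy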
Specific Issues
- Too Easy: P-Value > 0.70, consider increasing difficulty
- Too Hard: P-Value < 0.30, review for clarity or content alignment
- Poor Discrimination: Point Biserial < 0.30, examine answer choices and distractors
- Validity Issues: Any “No” responses in validity checklist
Distractor Analysis Calculations
Distractor Effectiveness Formula
For each incorrect option i:
Attractiveness_i = (Students selecting option i) / (Total students answering incorrectly)
Discrimination_i = (% Low scorers selecting i) - (% High scorers selecting i)
Ideal Distractor Characteristics:
- Attractiveness: 15-35% of incorrect responses
- Discrimination: Positive values (more low scorers than high scorers selecting)
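A Python sketch of both distractor metrics for one item (the counts and function name below are hypothetical):

def distractor_stats(option_counts, low_counts, high_counts, n_low, n_high):
    """Attractiveness and discrimination for each incorrect option.

    option_counts: {option: number of students choosing it}, incorrect options only
    low_counts / high_counts: the same counts within the low- and high-scoring groups
    """
    total_incorrect = sum(option_counts.values())
    results = {}
    for option, count in option_counts.items():
        attractiveness = count / total_incorrect
        discrimination = (low_counts.get(option, 0) / n_low
                          - high_counts.get(option, 0) / n_high)
        results[option] = (round(attractiveness, 2), round(discrimination, 2))
    return results

# Hypothetical item: 40 incorrect responses spread across distractors B, C, D.
print(distractor_stats({"B": 18, "C": 14, "D": 8},
                       low_counts={"B": 10, "C": 8, "D": 5},
                       high_counts={"B": 3, "C": 2, "D": 1},
                       n_low=30, n_high=30))
# Approximate output: {'B': (0.45, 0.23), 'C': (0.35, 0.2), 'D': (0.2, 0.13)}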
Implementation Benefits
- Data-Driven Decisions: Statistical foundation for assessment improvement
- Quality Assurance: Systematic validation of test items
- Instructional Insights: Understanding of student learning patterns
- Assessment Bank Development: Building validated question repositories
- Professional Development: Enhanced educator assessment literacy