CSEDU2026 - LLM-Based Rubric Grading for Programming Assignments: An Empirical Study [Appendix 1]

APPENDICES

A.1 Evaluation Grid

Evaluation rubric used in the assessment process by both the instructor and the LLM-based system. The rubric is organized into four categories (first column), each associated with a percentage weight. Each category includes one or more specific indicators (second column). For every indicator, a guiding question is provided (third column), and the submission is rated on six predefined performance levels, from 0 to 5 (fourth column).

| Category | Indicator | Question | Levels (0–5) |
|---|---|---|---|
| Correctness (weight: 60%) | Functional Completeness | Does the application correctly implement the basic and advanced tasks described in the assignment? | 0: Application does not correctly implement the required tasks. 1: Many basic tasks are missing or incorrect. 2: Some basic tasks are incomplete or partially incorrect. 3: All basic tasks are correctly implemented. 4: All basic tasks and most advanced tasks are correctly implemented. 5: All basic and advanced tasks are correctly implemented. |
| | Robustness | Does the implementation correctly handle edge cases, errors, and abnormal conditions? | 0: No handling of edge cases or errors. 1: Error handling almost absent. 2: Minimal or incomplete error handling. 3: Main cases handled but some checks missing. 4: Most edge cases handled correctly. 5: All edge cases and errors identified and handled correctly. |
| | Design Autonomy | Does the solution include autonomous design choices or improvements not explicitly required by the assignment? | 0: Solution copied or nearly identical to examples or other submissions. 1: Mechanical solution replicating examples without meaningful adaptation. 2: Only the minimal required solution with rigid design. 3: Standard approach aligned with the assignment without extensions. 4: At least one autonomous design choice improving the project. 5: Multiple motivated design choices improving quality, flexibility, or robustness. |
| | Specification Adherence | Were the submission instructions (format, naming, required files, execution environment) respected? | 0: Submission not compliant or not evaluable. 1: Severe issues complicating evaluation. 2: Several instructions not respected. 3: Some formal inaccuracies, but the project is evaluable. 4: Minor non-critical formal inaccuracies. 5: All submission instructions respected. |
| Maintainability (weight: 20%) | Modularity | Is the code divided into modules or functions with well-defined responsibilities? | 0: Completely monolithic code. 1: Nearly monolithic code. 2: Limited modularity; functions perform multiple tasks. 3: Sufficient modularity but improvable. 4: Good modularity with rare violations of the single-responsibility principle. 5: Highly modular code with clearly separated responsibilities. |
| | Maintainability | Can the code be extended or modified without introducing significant errors? | 0: Code cannot be modified without rewriting. 1: Modifications extremely difficult. 2: Modifications complex and risky. 3: Modifications possible but require care. 4: Modifications possible with moderate effort. 5: Extensions and modifications are simple and safe. |
| Readability (weight: 10%) | Naming | Are variable, function, and class names clear and meaningful? | 0: Names incomprehensible or absent. 1: Misleading or confusing names. 2: Unclear or inconsistent names in several places. 3: Understandable but sometimes generic names. 4: Generally clear names with rare ambiguities. 5: Names always clear, consistent, and self-explanatory. |
| | Formatting | Do indentation, spacing, and coding style improve readability? | 0: No readable formatting. 1: Severely inadequate formatting. 2: Disorganized formatting. 3: Acceptable but inconsistent formatting. 4: Clear formatting with minor imperfections. 5: Impeccable and consistent formatting. |
| Documentation (weight: 10%) | Documentation | Do comments help the reader understand the structure and behavior of the code? | 0: No comments at all. 1: Misleading or unnecessary comments. 2: Rare or unhelpful comments. 3: Comments present but limited. 4: Generally useful comments. 5: Clear, useful, and well-distributed comments. |
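The rubric states the category weights but not how indicator levels are combined into a final grade. The Python sketch below shows one possible aggregation, assuming that the indicators within a category are averaged with equal weight and that the result is expressed on a 0 to 100 scale. Both assumptions, and the names `RUBRIC`, `validate_levels`, and `weighted_score`, are hypothetical and are not taken from the study. The validation step also illustrates a completeness check for a grading (for example, one produced by the LLM): it must assign exactly one integer level from 0 to 5 to each of the nine indicators.

```python
"""Hypothetical sketch: aggregating A.1 rubric levels into one weighted score.

Category weights and indicator names come from the rubric; the aggregation
rule (equal-weight mean inside each category, 0-100 output) is an assumption,
not the procedure described in the paper.
"""

MAX_LEVEL = 5  # each indicator is rated on the 0-5 scale

# Category -> (weight in percent, indicators belonging to that category).
RUBRIC: dict[str, tuple[int, list[str]]] = {
    "Correctness": (60, ["Functional Completeness", "Robustness",
                         "Design Autonomy", "Specification Adherence"]),
    "Maintainability": (20, ["Modularity", "Maintainability"]),
    "Readability": (10, ["Naming", "Formatting"]),
    "Documentation": (10, ["Documentation"]),
}
assert sum(weight for weight, _ in RUBRIC.values()) == 100


def validate_levels(levels: dict[str, int]) -> None:
    """Reject missing, unknown, or out-of-range indicator levels."""
    expected = {name for _, names in RUBRIC.values() for name in names}
    given = set(levels)
    missing, unknown = expected - given, given - expected
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
    for name, level in levels.items():
        # bool is a subclass of int, so exclude it explicitly.
        if (isinstance(level, bool) or not isinstance(level, int)
                or not 0 <= level <= MAX_LEVEL):
            raise ValueError(f"{name}: {level!r} is not an integer in 0-{MAX_LEVEL}")


def weighted_score(levels: dict[str, int]) -> float:
    """Return a 0-100 score: weighted sum of per-category mean levels."""
    validate_levels(levels)
    total = 0.0
    for weight, names in RUBRIC.values():
        category_mean = sum(levels[name] for name in names) / len(names)
        total += weight * category_mean / MAX_LEVEL
    return total


if __name__ == "__main__":
    example = {
        "Functional Completeness": 4, "Robustness": 3,
        "Design Autonomy": 3, "Specification Adherence": 5,
        "Modularity": 4, "Maintainability": 3,
        "Naming": 5, "Formatting": 4,
        "Documentation": 3,
    }
    print(weighted_score(example))  # 74.0
```

With these example levels, Correctness contributes 60 × 3.75 / 5 = 45 points, Maintainability 14, Readability 9, and Documentation 6, for a total of 74. A different aggregation rule (for instance, unequal indicator weights within Correctness) would yield a different score for the same levels.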