CSEDU2026 - LLM-Based Rubric Grading for Programming Assignments: An Empirical Study [Appendix 1]
APPENDICES
A.1 Evaluation Grid
Evaluation rubric used in the assessment process by both the instructor and the LLM-based system. The rubric is organized into four categories (first column), each associated with a percentage weight. Each category includes one or more specific indicators (second column). Every indicator has a corresponding guiding question (third column), and the answer is rated on predefined performance levels from 0 to 5 (fourth column).
| Category | Indicator | Question | Levels (0–5) |
|---|---|---|---|
| Correctness (weight: 60%) | Functional Completeness | Does the application correctly implement the basic and advanced tasks described in the assignment? | 0: Application does not correctly implement the required tasks. 1: Many basic tasks are missing or incorrect. 2: Some basic tasks are incomplete or partially incorrect. 3: All basic tasks are correctly implemented. 4: All basic tasks and most advanced tasks are correctly implemented. 5: All basic and advanced tasks are correctly implemented. |
| | Robustness | Does the implementation correctly handle edge cases, errors, and abnormal conditions? | 0: No handling of edge cases or errors. 1: Error handling almost absent. 2: Minimal or incomplete error handling. 3: Main cases handled but some checks missing. 4: Most edge cases handled correctly. 5: All edge cases and errors identified and handled correctly. |
| | Design Autonomy | Does the solution include autonomous design choices or improvements not explicitly required by the assignment? | 0: Solution copied or nearly identical to examples or other submissions. 1: Mechanical solution replicating examples without meaningful adaptation. 2: Only the minimal required solution with rigid design. 3: Standard approach aligned with the assignment without extensions. 4: At least one autonomous design choice improving the project. 5: Multiple motivated design choices improving quality, flexibility, or robustness. |
| | Specification Adherence | Were the submission instructions (format, naming, required files, execution environment) respected? | 0: Submission not compliant or not evaluable. 1: Severe issues complicating evaluation. 2: Several instructions not respected. 3: Some formal inaccuracies but the project is evaluable. 4: Minor non-critical formal inaccuracies. 5: All submission instructions respected. |
| Maintainability (weight: 20%) | Modularity | Is the code divided into modules or functions with well-defined responsibilities? | 0: Completely monolithic code. 1: Nearly monolithic code. 2: Limited modularity; functions perform multiple tasks. 3: Sufficient modularity but improvable. 4: Good modularity with rare violations of the single-responsibility principle. 5: Highly modular code with clearly separated responsibilities. |
| | Maintainability | Can the code be extended or modified without introducing significant errors? | 0: Code cannot be modified without rewriting. 1: Modifications extremely difficult. 2: Modifications complex and risky. 3: Modifications possible but require care. 4: Modifications possible with moderate effort. 5: Extensions and modifications are simple and safe. |
| Readability (weight: 10%) | Naming | Are variable, function, and class names clear and meaningful? | 0: Names incomprehensible or absent. 1: Misleading or confusing names. 2: Unclear or inconsistent names in several places. 3: Understandable but sometimes generic names. 4: Generally clear names with rare ambiguities. 5: Names always clear, consistent, and self-explanatory. |
| | Formatting | Do indentation, spacing, and coding style improve readability? | 0: No readable formatting. 1: Severely inadequate formatting. 2: Disorganized formatting. 3: Acceptable but inconsistent formatting. 4: Clear formatting with minor imperfections. 5: Impeccable and consistent formatting. |
| Documentation (weight: 10%) | Documentation | Do comments help understand the structure and behavior of the code? | 0: No comments at all. 1: Misleading or unnecessary comments. 2: Rare or not useful comments. 3: Comments present but limited. 4: Generally useful comments. 5: Clear, useful, and well-distributed comments. |
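
As an illustration of how the category weights could combine indicator levels into a single score, the sketch below averages the 0–5 levels of the indicators within each category and then takes the weighted sum across categories. This aggregation rule is an assumption made for the example; the rubric itself only fixes the weights and the level scale. The names `RUBRIC`, `weighted_score`, and the sample levels are hypothetical.

```python
from statistics import mean

# Category weights and indicator names as listed in the rubric above.
RUBRIC = {
    "Correctness": (0.60, ["Functional Completeness", "Robustness",
                           "Design Autonomy", "Specification Adherence"]),
    "Maintainability": (0.20, ["Modularity", "Maintainability"]),
    "Readability": (0.10, ["Naming", "Formatting"]),
    "Documentation": (0.10, ["Documentation"]),
}


def weighted_score(levels):
    """Return an overall score on the 0-5 scale.

    `levels` maps each indicator name to an integer level in 0..5.
    Assumption (not stated in the rubric): indicators within a category
    are averaged, then categories are combined by their weights.
    """
    total = 0.0
    for weight, indicators in RUBRIC.values():
        for name in indicators:
            level = levels[name]  # KeyError if an indicator was not graded
            if not isinstance(level, int) or not 0 <= level <= 5:
                raise ValueError(f"{name}: level must be an integer in 0..5")
        total += weight * mean(levels[name] for name in indicators)
    return total


# Hypothetical levels for one submission, for illustration only.
example_levels = {
    "Functional Completeness": 4, "Robustness": 3,
    "Design Autonomy": 3, "Specification Adherence": 5,
    "Modularity": 4, "Maintainability": 3,
    "Naming": 4, "Formatting": 5,
    "Documentation": 2,
}
example_score = weighted_score(example_levels)
print(round(example_score, 2))
```

Under this assumed rule, each Correctness indicator contributes 15% of the final score, whereas the single Documentation indicator contributes the full 10% of its category; a different aggregation (for example, weighting indicators unequally) would change these shares.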