Description
Synthetic Dataset and Evaluation Results for LLM Output Validation
This repository contains the synthetic dataset and evaluation results used in the Master's thesis: Validating AI-Generated Content: Security Challenges and Solutions Inspired by OWASP by Kamran Khan, Tampere University, 2025.
Contents
1. Dataset
dataset.csv: Synthetic dataset generated for evaluating LLM outputs
Contains 446 samples across 9 categories
Each sample includes: Context, LLMOutput, TrueLabel
Generated using: DeepSeek-V3.2-Exp, GPT-4
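A minimal sketch of how the dataset could be loaded and sanity-checked against the description above. It assumes dataset.csv is a comma-separated file with a header row containing exactly the column names Context, LLMOutput and TrueLabel. The helper name `summarize` is hypothetical. The description does not say which column holds the 9 categories, so the sketch only counts the values found in TrueLabel.

```python
import csv
from collections import Counter

# Column names as listed in the dataset description (assumed to match the CSV header).
EXPECTED_COLUMNS = ("Context", "LLMOutput", "TrueLabel")


def summarize(csv_file):
    """Return (row_count, label_counts) for an open CSV file object.

    Hypothetical helper: checks that the expected columns are present
    and counts how often each TrueLabel value occurs.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    label_counts = Counter(row["TrueLabel"] for row in reader)
    return sum(label_counts.values()), label_counts
```

To use it, open the file with `open("dataset.csv", newline="", encoding="utf-8")` (the encoding is an assumption) and pass the file object to `summarize`. If the file matches the description, the returned row count should be 446.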
2. Evaluation Results
BaselineResulat.csv: Results from the baseline model, which uses only procedural security and the schema/structure validator agent (non-LLM)
NoJudgeAgentResult: Results from the ablation study using semantic, validator and Semantic Agent without the Judge audit
EvalauationResult: Results from comple
| Date made available | 23 Dec 2025 |
|---|---|
| Publisher | Zenodo |
Field of science, Statistics Finland
- 113 Computer and information sciences