Challenge Significance

The rapid evolution of AIGC technologies has transformed document tampering into a sophisticated threat that blends visual forgery with high-level semantic manipulation. The GenText-Forensics Challenge was established to address these emerging risks through three strategic objectives.

Competition Rules and Incentives

Register & Submit on Codabench

1. Model Submission Requirements

2. Data Usage and Augmentation

3. Compliance and Authority

Challenge Task: Forgery Analysis Report Generation

This challenge focuses on the comprehensive analysis of text-centric forgeries. Participants are required to develop systems that automatically generate a structured forensic analysis report in Markdown format for a given text-rich image.

Defense Track

Forgery Analysis Report Generation

  • Participants must generate comprehensive reports that integrate detection, spatial grounding, and natural language explanation.

Evaluation Metrics

The evaluation measures performance across four distinct dimensions:

  • Detection (SDet): We use the standard F1-Score, i.e., the harmonic mean of Precision and Recall.
  • Grounding (SLoc): To evaluate localization precision, we compute the pixel-level F1-Score (mF1) and mean Intersection over Union (mIoU) from the overlap between predicted masks and ground-truth masks, strictly adhering to the protocol in TruFor.
  • Explanation (SExp): This metric evaluates the linguistic quality and semantic fidelity of the generated explanation. We utilize BERTScore to calculate the semantic similarity between the participant's generated text and the expert-annotated ground truth explanation.
  • Report Quality Rubrics (SRep): We employ an advanced LLM Judge (e.g., Qwen3-MAX or GPT-4o) to conduct a rubric-based evaluation. We define fine-grained scoring rubrics (normalized to 0-100) covering three critical dimensions: Factuality (accuracy of verdict and evidence), Reasoning (logical deduction from visual clues), and Completeness (coverage of manipulated regions and format compliance).
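
The per-image grounding scores can be sketched as follows. This is an illustrative computation only (the official protocol follows TruFor, and mF1/mIoU are averaged over the test set); it assumes binary masks where 1 marks a manipulated pixel.

```python
import numpy as np

def pixel_f1_and_iou(pred: np.ndarray, gt: np.ndarray):
    """Pixel-level F1 and IoU between a predicted and a ground-truth binary mask."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()          # correctly flagged pixels
    fp = np.logical_and(pred, ~gt).sum()         # false alarms
    fn = np.logical_and(~pred, gt).sum()         # missed manipulations
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    union = np.logical_or(pred, gt).sum()
    iou = tp / union if union else 0.0
    return f1, iou
```

Averaging these two values over all test images yields mF1 and mIoU respectively.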

The final ranking is determined by a weighted sum of the four components:

Final Score Formula
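
As a sketch, the weighted sum over the four components can be written as follows; the weights $w_i$ are set by the organizers and are an assumption here, shown only to make the aggregation explicit:

$$
S_{\mathrm{final}} = w_{1}\,S_{\mathrm{Det}} + w_{2}\,S_{\mathrm{Loc}} + w_{3}\,S_{\mathrm{Exp}} + w_{4}\,S_{\mathrm{Rep}}, \qquad \textstyle\sum_{i=1}^{4} w_{i} = 1
$$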

Submission Formats

Forgery Analysis Report Generation

To ensure seamless automated evaluation, all participants must follow a unified submission protocol. The final submission must be a single compressed file named prediction.jsonl.gz.

1. File Format & Packaging

The submission must be a JSON Lines (.jsonl) file where each line is a valid JSON object representing one test image.

  • image_name: (String) The exact filename of the test image.
  • report: (String) The complete forensic analysis report in Markdown format, embedded as an escaped string.

Submission Workflow:

  1. Generate the Markdown report for each image.
  2. Package image_name and report into a .jsonl file.
  3. Compress the file using Gzip: gzip prediction.jsonl, producing prediction.jsonl.gz.

2. Report Structure Requirements

The report string must strictly adhere to the following schema. The LLM Judge relies on specific tags (e.g., [...]) to extract data for scoring.

I. Overall Assessment

This section provides the high-level verdict.

  • [Conclusion]: Clearly declare the status as FORGED or AUTHENTIC.
  • [RISK_SCORE]: A numerical confidence score (0–100) representing the likelihood of manipulation.

II. Detailed Anomaly Analysis

For each detected forgery, participants must create an anomaly entry (e.g., ### ANOMALY_001). Each entry must contain:

  • [GROUNDING]: Normalized bounding box coordinates in the format [xmin, ymin, xmax, ymax].
  • [REASON]: A natural language explanation detailing visual artifacts (e.g., clumsy patching, noise inconsistency) and semantic contradictions (e.g., logical errors, identity fraud).
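
Since [GROUNDING] expects normalized coordinates, pixel-space boxes must be rescaled by the image dimensions. A minimal sketch follows; the `scale=1000` convention is an assumption (a common choice in VLM outputs), so substitute the scale the organizers actually specify:

```python
def normalize_bbox(xmin, ymin, xmax, ymax, width, height, scale=1000):
    """Convert a pixel-space [xmin, ymin, xmax, ymax] box to coordinates
    normalized relative to the image dimensions.

    scale=1000 is an assumed convention; pass scale=1 (and drop the
    rounding) for [0, 1] floats if that is what the protocol requires.
    """
    return [
        round(xmin / width * scale),
        round(ymin / height * scale),
        round(xmax / width * scale),
        round(ymax / height * scale),
    ]
```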

III. Summary

A final synthesis of the findings, summarizing how the identified anomalies collectively support the overall assessment.

3. Parsing & Evaluation Notice (Strict Compliance)

  • Schema Strictness: Reports failing to use the exact tags (e.g., using [Result] instead of [Conclusion]) will result in parsing failures and a score of zero for that sample.
  • Coordinate Accuracy: Grounding coordinates must be normalized relative to the image dimensions.
  • String Escaping: Since the Markdown report contains newlines (\n) and quotes ("), ensure the string is correctly escaped within the JSON object.
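
To see why exact tags matter, consider a tag-keyed extractor like the sketch below. It is illustrative only (the official parser may differ), but any parser keyed on these tags will return nothing for a report that writes, say, [Result] instead of [Conclusion]:

```python
import json
import re

def extract_fields(report: str):
    """Pull the scored fields out of a report string via the mandatory tags."""
    conclusion = re.search(r"\[Conclusion\]:\*{0,2}\s*(FORGED|AUTHENTIC)", report)
    risk = re.search(r"\[RISK_SCORE\]:\*{0,2}\s*(\d{1,3})", report)
    # Each [GROUNDING] tag is followed by a bracketed coordinate list.
    boxes = re.findall(r"\[GROUNDING\]:\s*\[([^\]]+)\]", report)
    return {
        "conclusion": conclusion.group(1) if conclusion else None,
        "risk_score": int(risk.group(1)) if risk else None,
        "boxes": [json.loads("[" + b + "]") for b in boxes],
    }
```

A missing or misspelled tag yields `None` here; under the challenge rules it yields a score of zero for that sample.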

4. Comprehensive Submission Example

Single line in prediction.jsonl (pretty-printed here for readability; the actual file stores each object on one line):

{
  "image_name": "sample_document_001.jpg",
  "report": "# FORGERY ANALYSIS REPORT\n\n**Overall Assessment:**\n    **[Conclusion]:** FORGED\n    **[RISK_SCORE]:** 73\n\n---\n\n## DETAILED ANOMALY ANALYSIS\n\n### ANOMALY_001: Visual Clumsy Alteration\n[GROUNDING]: [1081, 933, 1288, 998]\n[REASON]: A crude, solid black rectangular block has been applied to obscure the phone number. The sharp edges and uniform color create a clear discontinuity with the surrounding texture.\n\n### ANOMALY_002: Logical Fraud\n[GROUNDING]: [1372, 585, 1630, 655]\n[REASON]: The document states the meeting is at '5:00 a.m.', which contradicts standard business practice. The font of 'a.m.' shows slight misalignment, suggesting digital alteration.\n\n---\n\n## SUMMARY\nThe document identified 2 distinct anomalies. The forgery pattern involves a mix of crude redactions and logical inconsistencies.\n\n**END OF REPORT**"
}

5. Summary of Mandatory Tags

| Category    | Mandatory Markdown Tag | Expected Value             |
| ----------- | ---------------------- | -------------------------- |
| Verdict     | **[Conclusion]:**      | FORGED or AUTHENTIC        |
| Confidence  | **[RISK_SCORE]:**      | Integer 0 to 100           |
| Location    | [GROUNDING]:           | [xmin, ymin, xmax, ymax]   |
| Explanation | [REASON]:              | String (Natural Language)  |