Publications Dataset

Officially published, legitimate, and copyrighted ebooks across major disciplines.

Web Novels Dataset

Hand-picked high-quality Chinese web novels; fully customizable for specific needs.

Parallel Corpus Dataset

Large-scale multilingual parallel sentence pairs and document-level parallel pairs.

Scripts Dataset

Chinese and English scripts across film/TV/stage/anime/comedy and more.

Art Auction Image–Text Dataset

Fine art and auction image-text pairs with rich metadata; 1080p images.

Educational Video–Text Dataset

Lecture videos with timestamped subtitles (formulas as LaTeX in Markdown, or in phonetic form in SRT).

Video Q&A Dataset

Each video includes two questions (one multiple-choice, one short-answer) for reasoning and comprehension.

Text–Image Paired Dataset

Process-aligned image-text groups for instruction following and consistency tasks.

Front-End Coding Dataset

Runnable front-end projects: websites, dashboards, admin panels, games, and 3D scenes.

Professional Certification Exam Dataset

Exam questions across public security, civil service, medical, law, engineering, finance, and more.

K12 Exam Questions Dataset (With Images)

Multi-subject questions with structured fields for difficulty, analysis, subject, and more.

Programming Competition Problems Dataset

Algorithmic problems with multi-language solutions and optional images per item.

Multilingual Competition Questions Dataset

Real competition questions with structured fields for question, answer, and explanation.

Multilingual Handwritten OCR Dataset

OCR dataset covering 12 languages for text spotting and document/layout understanding.

English & LCTLs Academic Papers Dataset

Academic papers across disciplines and degrees with strong long-context coverage.

Technical Documentation Dataset

Up-to-date technical documentation covering workflows from installation through operation and maintenance.

3-Model High-Difficulty Test Questions Dataset

Science questions at undergraduate level and above, validated with three models and annotated with rich fields.

High-Difficulty Physics Test Questions Dataset (ChatGPT o3)

Physics questions at undergraduate level and above; o3 attempts each question 10 times, with an average pass rate below 30%.

Original Informatics Competition Questions Dataset

NOI-level difficulty, with a complete package per question: code, tests, and solutions.

Structured High-Emotional-Intelligence Dataset

High-EQ framework with evaluation criteria and multi-turn dialogues across 500+ categories.
