Data Selection for Multi-turn Dialogue Instruction Tuning
MDS selects training-effective multi-turn dialogues by combining global semantic coverage with local structural quality, building compact supervision sets that are more diverse, coherent, and reliable for dialogue instruction tuning.
Why multi-turn data selection needs a different design
Many selectors score isolated instruction-response pairs. MDS instead treats each dialogue as a coherent trajectory and selects full conversations that are both semantically representative and structurally reliable.
Multi-turn corpora are structurally noisier
Later turns may drift away from the original intent, end in chitchat tails, or violate the requested response format. These errors accumulate across the dialogue and are hard to detect at the single-turn level.
Turn-level scoring misses trajectory structure
Applying single-turn selectors to dialogue turns ignores cross-turn dependencies such as history anchoring, topic continuity, and query-answer form alignment.
MDS selects dialogues, not disconnected turns
By combining semantic coverage with dialogue-level quality, MDS builds a compact subset that suppresses noisy conversations while preserving long-tail intents and better-formed supervision.
A two-stage selector for semantic diversity and dialogue structure
MDS first preserves broad intent coverage in a dialogue-trajectory space, then reranks candidate dialogues using local structural signals that capture history grounding, information progress, and form consistency.
Trajectory-aware semantic coverage
User queries are embedded into a dialogue-level trajectory representation. MDS clusters this space into semantic bins and performs coverage-aware bin-wise selection to avoid collapsing onto a few frequent patterns.
- User-query trajectory embeddings
- Semantic bins via K-means
- Coverage and anti-redundancy within each bin
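The bin-wise step above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the function name `binwise_select`, the cosine similarity cap, and the assumption that K-means labels are precomputed are all illustrative choices.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def binwise_select(embeddings, bin_labels, budget, sim_cap=0.9):
    """Coverage-aware selection: round-robin over semantic bins so no
    frequent pattern dominates, rejecting near-duplicates within the
    picked set (anti-redundancy)."""
    bins = defaultdict(list)                # bin label -> candidate indices
    for idx, label in enumerate(bin_labels):
        bins[label].append(idx)
    selected = []
    while len(selected) < budget:
        progressed = False
        for label in sorted(bins):          # visit every bin each pass
            if len(selected) >= budget:
                break
            while bins[label]:
                cand = bins[label].pop(0)
                # anti-redundancy: skip candidates too close to prior picks
                if all(cosine(embeddings[cand], embeddings[s]) < sim_cap
                       for s in selected):
                    selected.append(cand)
                    progressed = True
                    break
        if not progressed:                  # all bins exhausted
            break
    return selected
```

With two bins of near-duplicate 2-D embeddings, the sketch picks one representative per bin and refuses the redundant copies even when the budget would allow more.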
Structural reliability inside each dialogue
MDS scores candidate dialogues using entity-grounded topic coherence and information-progress signals, together with a hard form-consistency filter that removes dialogues whose query-answer pairs are poorly aligned in form.
- Entity coherence and anti-redundancy
- History anchoring and information progress
- Query-answer form consistency
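These three signals can be combined in a minimal scorer. Everything here is a stand-in under stated assumptions: entities are approximated by capitalized tokens, the form filter only checks question-with-empty-answer, and the 50/50 weighting is arbitrary; the paper's actual entity extraction and form-consistency rules are not reproduced.

```python
def dialogue_structure_score(turns):
    """Score one dialogue (list of (query, answer) strings) by:
      (a) history anchoring  -- answer entities that reuse the history,
      (b) information progress -- answer entities that are new,
      (c) a hard form filter  -- a question must not get an empty answer.
    Entities are approximated as capitalized tokens (toy heuristic)."""
    def entities(text):
        return {w.strip(".,!?") for w in text.split() if w[:1].isupper()}

    history, anchor, progress = set(), [], []
    for query, answer in turns:
        # hard form-consistency filter: reject the whole dialogue
        if query.rstrip().endswith("?") and not answer.strip():
            return 0.0
        ans_ents = entities(answer)
        if history:  # anchoring is undefined on the first turn
            anchor.append(len(ans_ents & history) / max(len(ans_ents), 1))
        progress.append(len(ans_ents - history) / max(len(ans_ents), 1))
        history |= entities(query) | ans_ents
    a = sum(anchor) / len(anchor) if anchor else 1.0
    p = sum(progress) / len(progress) if progress else 0.0
    return 0.5 * a + 0.5 * p  # illustrative equal weighting
```

A dialogue whose second answer reuses one history entity and adds one new entity scores midway; any question answered with an empty string is filtered outright.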
Consistent gains across backbones, benchmarks, and domains
On the general-domain Baize pool, MDS achieves the best average rank on both LLaMA3-8B-Instruct and Qwen3-8B-Instruct. It also improves domain-specific training on Banking while retaining cross-domain transfer.
| Method | MT-Eval L-E | MT-Eval G-E | MT-Eval Ent-F1 | MT-Eval Cos | ConsistentChat L-E | ConsistentChat G-E | ConsistentChat Ent-F1 | ConsistentChat Cos | TopDial L-E | TopDial G-E | TopDial Ent-F1 | TopDial Cos | Avg. Rank ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MDS also improves Banking selection without sacrificing transfer
| Method | Banking Test G-E | Banking Test Ent-F1 | ConsistentChat G-E | ConsistentChat Ent-F1 |
|---|---|---|---|---|
Where the gains come from
Beyond main results, MDS shows stronger long-dialogue robustness, preserves order-sensitive cross-turn structure, and shifts the selected training pool toward cleaner, more grounded conversations.
Long-dialogue robustness
As dialogues extend to deeper turns, MDS preserves stronger entity coverage and semantic fidelity than competing selectors.
Error-type difference sets
MDS-only dialogues contain more clean examples and substantially fewer topic-drift and unsupported-content errors.
Controlled perturbation analysis
The paper also includes an order-perturbation study on the same 10K dialogues selected by MDS. Shuffling turns degrades order-sensitive consistency, especially on the high-history-dependency subset, supporting the design choice of explicitly modeling cross-turn anchoring and anti-redundancy.
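The perturbation probe can be mimicked with a minimal sketch. The token-overlap anchoring proxy and the `perturb_turn_order` helper below are illustrative assumptions, not the paper's metrics or tooling.

```python
import random

def anchoring_rate(dialogue):
    """Fraction of non-first turns whose query reuses a token introduced by
    an earlier answer -- a crude proxy for history dependence."""
    answer_toks, hits, total = set(), 0, 0
    for i, (query, answer) in enumerate(dialogue):
        if i > 0:
            total += 1
            hits += bool(set(query.lower().split()) & answer_toks)
        answer_toks |= set(answer.lower().split())
    return hits / total if total else 0.0

def perturb_turn_order(dialogue, seed=0):
    """Shuffle whole (query, answer) turns: each pair stays intact, but the
    cross-turn order that anchoring depends on is destroyed."""
    rng = random.Random(seed)
    out = list(dialogue)
    rng.shuffle(out)
    return out
```

On a two-turn dialogue where the second query refers back to the first answer, the original order is fully anchored while the reversed order is not, which is the asymmetry the order-perturbation study exploits.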
Reproducibility at a glance
The project centers on offline dialogue selection, compact subset construction, and multi-turn evaluation under a fixed dialogue budget.
Useful links
Deployment note
This page is provided as a single self-contained index.html, which makes local preview and GitHub Pages deployment straightforward.
BibTeX
If you find this work useful, please cite the paper below.
@misc{li2026dataselectionmultiturndialogue,
title = {Data Selection for Multi-turn Dialogue Instruction Tuning},
author = {Bo Li and Shikun Zhang and Wei Ye},
year = {2026},
eprint = {2604.07892},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2604.07892}
}