FineScope couples domain-aware data selection with structured pruning and self-distillation fine-tuning to produce compact, domain-specialized language models from large unlabeled corpora.
Starting from a small set of user-provided seed examples, FineScope trains sparse autoencoders (SAEs) on intermediate model activations and uses them to automatically extract semantically aligned examples from large unlabeled corpora. The curated dataset then guides structured pruning so that domain-relevant substructures are preserved, and it supports self-distillation fine-tuning to recover task-specific performance.
The goal is not just to shrink a model after adaptation. The goal is to specialize the model around the target domain so that pruning preserves useful capacity instead of removing it blindly.
Experiments across STEM, humanities, social sciences, math, and coding domains show that FineScope consistently outperforms baseline fine-tuning approaches while enabling up to 35% parameter pruning. On math reasoning tasks, it achieves an average improvement of 11.50 points across pruned models.
FineScope uses seed examples, SAE training on intermediate activations, data retrieval via cosine similarity in SAE code space, structured pruning, and teacher-guided distillation to produce compact domain-specialized models.
FineScope first identifies the most domain-relevant unlabeled data rather than pruning with a generic corpus. SAEs trained on intermediate activations extract semantically aligned examples from large unlabeled corpora using only a small seed set.
The curated set drives both structured pruning, via gradient-based attribution, and teacher-guided self-distillation fine-tuning, so that the compressed student retains the substructures critical for the target domain.
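As a rough illustration of pruning driven by gradient-based attribution, the sketch below scores each output row of a weight matrix with a first-order Taylor term and drops the lowest-scoring rows. The scoring rule, the toy squared-error loss, and the names `taylor_row_importance` and `prune_rows` are assumptions made for this example; the text above does not specify the exact attribution method FineScope uses.

```python
import numpy as np

def taylor_row_importance(W, X, Y):
    """First-order Taylor score |w * dL/dw|, summed over each output row of W.

    Assumption: a toy loss L = (1 / 2n) * sum of squared errors of X @ W.T vs Y.
    """
    resid = X @ W.T - Y                  # (n, d_out)
    grad = resid.T @ X / X.shape[0]      # dL/dW, shape (d_out, d_in)
    return np.abs(W * grad).sum(axis=1)  # one score per row (neuron)

def prune_rows(W, importance, ratio):
    """Drop the `ratio` fraction of rows with the lowest importance."""
    n_rows = W.shape[0]
    n_prune = int(round(ratio * n_rows))
    keep = np.sort(np.argsort(-importance)[: n_rows - n_prune])
    return W[keep], keep

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))    # stand-in for curated domain inputs
W = rng.normal(size=(20, 8))     # toy layer with 20 output neurons
Y = rng.normal(size=(256, 20))   # stand-in targets
importance = taylor_row_importance(W, X, Y)
W_pruned, kept = prune_rows(W, importance, ratio=0.35)
print(W_pruned.shape, kept)
```

Because the scores are computed on the curated inputs, rows that matter for those inputs are the ones that survive; a generic calibration set would rank the rows differently.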
The results below summarize improvements on MMLU domain slices, math reasoning (Pre-algebra, Algebra, Counting & Probability), and coding benchmarks after SAE-guided curation and pruning-aware fine-tuning.
Table 1 — MMLU Domain Results (STEM / Social Sciences / Humanities)
| Model | Pruned | Dataset | STEM | Social Sci. | Humanities |
|---|---|---|---|---|---|
| Vicuna-7B | ✗ | – | 33.10 | 40.23 | 43.69 |
| Vicuna-7B | ✓ | Alpaca | 30.61 | 35.44 | 36.11 |
| Vicuna-7B | ✓ | Full-OI | 29.09 | 35.43 | 36.19 |
| Vicuna-7B | ✓ | FineScope | 31.12 | 36.23 | 36.55 |
| MathCoder-CL-7B | ✗ | – | 31.14 | 11.11 | 9.22 |
| MathCoder-CL-7B | ✓ | Alpaca | 25.14 | 13.11 | 12.33 |
| MathCoder-CL-7B | ✓ | Full-OI | 23.91 | 12.81 | 12.67 |
| MathCoder-CL-7B | ✓ | FineScope | 25.89 | 13.81 | 13.68 |
| LLaMA 3.1-8B | ✗ | – | 48.01 | 49.61 | 49.32 |
| LLaMA 3.1-8B | ✓ | Alpaca | 38.22 | 40.19 | 39.79 |
| LLaMA 3.1-8B | ✓ | Full-OI | 39.32 | 39.91 | 40.93 |
| LLaMA 3.1-8B | ✓ | FineScope | 40.55 | 41.07 | 41.19 |
| GPT-3 (6.7B) | ✗ | – | 35.10 | 49.20 | 42.10 |
| OLMo-7B | ✗ | – | 22.19 | 31.01 | 30.26 |
| GPT-3 (175B) | ✗ | – | 36.70 | 50.40 | 40.80 |
Table 2 — Math Subdomain Results (Pre-Algebra / Algebra / Counting & Probability)
| Model | Pruned | Dataset | Pre-Algebra | Algebra | Count. & Prob. |
|---|---|---|---|---|---|
| Vicuna-7B | ✗ | – | 14.31 | 10.17 | 8.11 |
| Vicuna-7B | ✓ | Alpaca | 5.56 | 0.30 | 0.21 |
| Vicuna-7B | ✓ | Full-Math | 12.73 | 8.91 | 5.48 |
| Vicuna-7B | ✓ | FineScope | 12.91 | 10.12 | 7.01 |
| MathCoder-CL-7B | ✗ | – | 11.60 | 16.77 | 13.38 |
| MathCoder-CL-7B | ✓ | Alpaca | 1.29 | 6.94 | 3.33 |
| MathCoder-CL-7B | ✓ | Full-Math | 9.01 | 12.72 | 10.05 |
| MathCoder-CL-7B | ✓ | FineScope | 10.54 | 15.51 | 11.64 |
| LLaMA 3.1-8B | ✗ | – | 32.77 | 29.87 | 20.35 |
| LLaMA 3.1-8B | ✓ | Alpaca | 9.23 | 5.56 | 9.10 |
| LLaMA 3.1-8B | ✓ | Full-Math | 30.72 | 31.67 | 18.34 |
| LLaMA 3.1-8B | ✓ | FineScope | 30.83 | 32.21 | 19.34 |
| GPT-3 (13B) | ✗ | – | 6.80 | 5.30 | 4.50 |
| GPT-3 (175B) | ✗ | – | 7.70 | 6.00 | 4.70 |
Table 3 — Coding Results (HumanEval / MBPP)
| Model | Pruned | Dataset | HumanEval | MBPP |
|---|---|---|---|---|
| Vicuna-7B | ✗ | – | 0.14 | 0.03 |
| Vicuna-7B | ✓ | Alpaca | 0.07 | 0.00 |
| Vicuna-7B | ✓ | Full-OI | 0.09 | 0.05 |
| Vicuna-7B | ✓ | FineScope | 0.13 | 0.10 |
| MathCoder-CL-7B | ✗ | – | 0.03 | 0.01 |
| MathCoder-CL-7B | ✓ | Alpaca | 0.08 | 0.05 |
| MathCoder-CL-7B | ✓ | Full-OI | 0.10 | 0.09 |
| MathCoder-CL-7B | ✓ | FineScope | 0.11 | 0.10 |
| LLaMA 3.1-8B | ✗ | – | 0.50 | 0.46 |
| LLaMA 3.1-8B | ✓ | Alpaca | 0.25 | 0.13 |
| LLaMA 3.1-8B | ✓ | Full-OI | 0.30 | 0.29 |
| LLaMA 3.1-8B | ✓ | FineScope | 0.49 | 0.43 |
These analysis points summarize why SAE-guided data curation and domain-conditioned pruning improve efficiency while preserving domain-relevant performance.
The codes produced by the SAE encoder give a compact retrieval space for finding semantically aligned examples in large unlabeled corpora using only a small seed set (~10 examples). Cosine similarity in SAE code space captures feature-level domain alignment rather than surface-form similarity.
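The retrieval step can be sketched as follows: encode seed and corpus activations with the SAE encoder, then rank corpus items by cosine similarity to the seeds. This is a minimal sketch under stated assumptions: the ReLU encoder with random weights stands in for a trained SAE, the seeds are summarized by their mean code, and `sae_encode` and `retrieve_top_k` are hypothetical names. The actual FineScope scoring may differ.

```python
import numpy as np

def sae_encode(acts, W_enc, b_enc):
    """ReLU encoder of a (hypothetical) trained sparse autoencoder."""
    return np.maximum(acts @ W_enc + b_enc, 0.0)

def retrieve_top_k(seed_acts, corpus_acts, W_enc, b_enc, k):
    """Indices and scores of the k corpus items closest to the seed centroid."""
    seed_codes = sae_encode(seed_acts, W_enc, b_enc)
    corpus_codes = sae_encode(corpus_acts, W_enc, b_enc)
    eps = 1e-8                                    # guards against all-zero codes
    query = seed_codes.mean(axis=0)               # assumption: seeds averaged
    query = query / (np.linalg.norm(query) + eps)
    norms = np.linalg.norm(corpus_codes, axis=1, keepdims=True)
    scores = (corpus_codes / (norms + eps)) @ query   # cosine similarity
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(0)
d_act, d_code = 16, 64
W_enc = rng.normal(size=(d_act, d_code)) / np.sqrt(d_act)   # untrained stand-in
b_enc = np.zeros(d_code)

domain_dir = rng.normal(size=d_act)                          # toy "domain" direction
seed_acts = domain_dir + 0.1 * rng.normal(size=(10, d_act))  # ~10 seed examples
in_domain = domain_dir + 0.1 * rng.normal(size=(20, d_act))  # corpus rows 0-19
off_domain = rng.normal(size=(80, d_act))                    # corpus rows 20-99
corpus_acts = np.vstack([in_domain, off_domain])

top_idx, top_scores = retrieve_top_k(seed_acts, corpus_acts, W_enc, b_enc, k=20)
print(sorted(top_idx.tolist()))
```

In this toy setup the in-domain rows share a direction with the seeds, so they should dominate the top of the ranking; with a trained SAE the same ranking would operate on learned features instead of random projections.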
Top-K activation coordinates are selected by Jacobian row-norm sensitivity, which favors coordinates that are causally responsive to input content. In ablations, this selector outperforms magnitude-, variance-, and PCA-based selectors for SAE training.
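To make the selection rule concrete, the sketch below computes Jacobian row norms for a toy layer `h = tanh(W x)`, where the Jacobian is available in closed form as `diag(1 - h^2) W`, and keeps the K coordinates with the largest mean row norm. The toy layer, the averaging over a batch, and the names `jacobian_row_norms` and `select_top_k` are assumptions for illustration; the text does not state which Jacobian FineScope computes or how it is aggregated.

```python
import numpy as np

def jacobian_row_norms(W, X):
    """Mean L2 norm of each row of dh/dx for h = tanh(W x), over the rows of X."""
    H = np.tanh(X @ W.T)                 # activations, shape (n, d_h)
    gate = 1.0 - H**2                    # derivative of tanh at each activation
    w_norm = np.linalg.norm(W, axis=1)   # norm of each row of W, shape (d_h,)
    return (gate * w_norm).mean(axis=0)  # row i of the Jacobian is gate_i * W[i]

def select_top_k(scores, k):
    """Sorted indices of the k coordinates with the highest sensitivity."""
    return np.sort(np.argsort(-scores)[:k])

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(8, 4))
W[:4] = 0.0                              # coordinates 0-3 ignore the input entirely
X = rng.normal(size=(32, 4))

scores = jacobian_row_norms(W, X)
selected = select_top_k(scores, k=4)
print(selected)
```

Coordinates whose activations do not change with the input get a score of zero and are never selected, which is the property that distinguishes this rule from a magnitude- or variance-based selector.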
After pruning, teacher-guided distillation (TGD) helps the compressed student recover domain performance. TGD yields gains of 1.9% in STEM, 5.8% in Social Sciences, and 10.26% in Humanities over standard fine-tuning alone.
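For readers unfamiliar with distillation objectives, here is a minimal sketch of a standard temperature-scaled knowledge distillation loss, in which the unpruned model's logits act as the teacher signal for the pruned student. This is the textbook formulation, not the TGD loss itself: the text describes TGD only as a modified self-distillation, so the temperature `T`, the mixing weight `alpha`, and the function `distill_loss` are assumptions.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    log_p_s = log_softmax(student_logits / T)
    log_p_t = log_softmax(teacher_logits / T)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1).mean()
    rows = np.arange(len(labels))
    ce = -log_softmax(student_logits)[rows, labels].mean()
    return alpha * T**2 * kl + (1.0 - alpha) * ce

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 10))    # stand-in for the unpruned model
student_logits = teacher_logits + rng.normal(scale=0.5, size=(4, 10))
labels = np.array([1, 3, 5, 7])              # toy ground-truth token ids

loss = distill_loss(student_logits, teacher_logits, labels)
print(float(loss))
```

Setting `alpha=1.0` removes the label term entirely, which corresponds to training the student purely on the teacher's output distribution; the KL term is zero when student and teacher logits coincide.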
A sensitivity analysis that varies the number of initial seeds from 5 to 25 shows only marginal accuracy differences, indicating that FineScope does not require heavy user input. Performance largely plateaus after ~15–20 initial seeds.
Unified pipeline that connects domain-specific data selection with model pruning and fine-tuning to support efficient adaptation of large language models.
Novel SAE-guided data selection using sparse autoencoders trained on intermediate activations to identify semantically relevant samples from large unlabeled corpora, starting from only a small seed set.
Modified self-distillation fine-tuning (TGD) that transfers knowledge from the original unpruned model to the pruned student, helping recover domain-relevant behaviors lost during compression.
Consistent improvements across domains and model families (MMLU STEM/Social Sciences/Humanities, math subdomains, and coding benchmarks), with up to 35% parameter reduction across the Vicuna, MathCoder-CL, LLaMA 3.1, and Qwen model families.
If you use this work, please cite the FineScope submission below.