preprint

FineScope uses SAE-guided curation to build domain-specific pruned LLMs.

FineScope couples domain-aware data selection with structured pruning and self-distillation fine-tuning to produce compact, domain-specialized language models from large unlabeled corpora.

Chaitali Bhattacharyya1, Hyunsei Lee2, Junyoung Lee1, Shinhyoung Jang2, Il Hong Suh3, Yeseong Kim1
1 POSTECH   2 DGIST   3 COGA Robotics
Abstract

Data selection and pruning are treated as one coupled problem.

Starting from a small set of user-provided seed examples, FineScope trains sparse autoencoders (SAEs) on intermediate model activations and uses them to automatically extract semantically aligned examples from large unlabeled corpora. The curated dataset then guides structured pruning to preserve domain-relevant substructures and supports self-distillation fine-tuning to recover task-specific performance.

The goal is not just to shrink a model after adaptation. The goal is to specialize the model around the target domain so that pruning preserves useful capacity instead of removing it blindly.

Experiments across STEM, humanities, social sciences, math, and coding domains show that FineScope consistently outperforms baseline fine-tuning approaches while enabling up to 35% parameter pruning. On math reasoning tasks, it achieves an average improvement of 11.50 points across pruned models.


Method

The FineScope pipeline couples retrieval, pruning, and specialization.

FineScope uses seed examples, SAE training on intermediate activations, data retrieval via cosine similarity in SAE code space, structured pruning, and teacher-guided distillation to produce compact domain-specialized models.

1

Curate before compression

FineScope first identifies the most domain-relevant unlabeled data rather than pruning with a generic corpus. SAEs trained on intermediate activations extract semantically aligned examples from large unlabeled corpora using only a small seed set.
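As a rough illustration of what an SAE computes on an activation vector, here is a minimal sketch in plain Python. The page does not give the SAE architecture or loss, so the ReLU encoder, linear decoder, and L1 sparsity penalty below are assumptions, and `TinySAE` is a hypothetical name. Weights are random and untrained; the point is only the encode, decode, and loss shapes.

```python
import random

def matvec(W, v, b):
    """Compute W @ v + b for list-of-lists W and list vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

class TinySAE:
    """Hypothetical minimal sparse autoencoder (assumed ReLU encoder, linear decoder)."""

    def __init__(self, d_in, d_code, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_code)]
        self.b_enc = [0.0] * d_code
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_code)] for _ in range(d_in)]
        self.b_dec = [0.0] * d_in

    def encode(self, x):
        # Non-negative codes; many entries are zero after the ReLU.
        return [max(0.0, v) for v in matvec(self.W_enc, x, self.b_enc)]

    def decode(self, z):
        return matvec(self.W_dec, z, self.b_dec)

    def loss(self, x, l1=1e-3):
        # Reconstruction error plus an L1 penalty on the codes (assumed objective).
        z = self.encode(x)
        x_hat = self.decode(z)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1 * sum(z)  # z >= 0, so sum(z) is its L1 norm

sae = TinySAE(d_in=4, d_code=8)
activation = [1.0, 0.0, -1.0, 0.5]   # stand-in for selected activation coordinates
code = sae.encode(activation)
```

In practice the encoder output `code` is what a retrieval step would compare across examples; training would minimize `loss` over many activation vectors.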

2

Prune with domain signal

The curated set drives both structured pruning via gradient-based attribution and teacher-guided self-distillation fine-tuning, so the compressed student retains the substructures critical for the target domain.
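One common form of gradient-based attribution is a first-order Taylor score, where each output row of a weight matrix is scored by |sum(weight * gradient)| and the lowest-scoring rows are removed. The page does not state which attribution rule FineScope uses, so the sketch below is an assumption; `row_importance` and `rows_to_keep` are hypothetical helper names, and the gradients are assumed to come from a loss evaluated on the curated domain data.

```python
def row_importance(weights, grads):
    """First-order Taylor score per output row: |sum_j w_ij * g_ij|."""
    return [abs(sum(w * g for w, g in zip(w_row, g_row)))
            for w_row, g_row in zip(weights, grads)]

def rows_to_keep(weights, grads, prune_ratio):
    """Return indices of rows that survive after pruning the lowest-scoring rows."""
    scores = row_importance(weights, grads)
    n_prune = int(len(scores) * prune_ratio)
    ascending = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(ascending[:n_prune])
    return [i for i in range(len(scores)) if i not in pruned]

# Toy 4x2 layer with gradients assumed to come from the curated set.
weights = [[0.5, -0.2], [0.1, 0.1], [1.0, 0.3], [-0.4, 0.0]]
grads   = [[0.2, 0.1], [0.0, 0.01], [-0.3, 0.2], [0.05, 0.9]]

keep_25 = rows_to_keep(weights, grads, prune_ratio=0.25)
keep_50 = rows_to_keep(weights, grads, prune_ratio=0.50)
```

Because the gradients depend on the data used to compute them, swapping a generic corpus for the curated set changes which rows score as important, which is the sense in which pruning is domain-conditioned.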

STEP 01
Seed Input
Around 10 user-provided seed examples define the target domain starting point.
STEP 02
SAE Training
Sparse autoencoders are trained on top-K intermediate activations selected via Jacobian sensitivity.
STEP 03
Data Retrieval
Cosine similarity in SAE code space retrieves domain-aligned samples from a large unlabeled corpus.
STEP 04
Prune and Distill
Structured pruning with domain-specific data plus teacher-guided distillation fine-tuning yields a smaller specialized model.
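The retrieval step (Step 03) can be sketched as a cosine-similarity filter over SAE codes. This is a minimal illustration, not the authors' implementation: scoring each corpus item by its maximum similarity to any seed, and the `threshold` value, are assumptions, and `retrieve` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve(seed_codes, corpus_codes, threshold=0.8):
    """Return corpus indices whose best similarity to any seed meets the threshold,
    ordered from most to least similar."""
    selected = []
    for idx, code in enumerate(corpus_codes):
        best = max(cosine(code, seed) for seed in seed_codes)
        if best >= threshold:
            selected.append((idx, best))
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in selected]

seed_codes = [[1.0, 0.0, 0.5]]            # SAE codes of the seed examples
corpus_codes = [
    [0.9, 0.0, 0.6],                      # close to the seed
    [0.0, 1.0, 0.0],                      # orthogonal to the seed
    [2.0, 0.1, 1.0],                      # nearly parallel to the seed
]
selected = retrieve(seed_codes, corpus_codes)
```

Cosine similarity ignores vector magnitude, so the third item is ranked first even though its code is twice as large as the seed's.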
Models: Vicuna-7B, MathCoder-CL-7B, LLaMA 3.1-8B, Qwen 2-7B, Qwen 2.5-7B, Qwen 3-8B
35%
Maximum parameter pruning reported while preserving target-domain quality
FineScope outperforms SelfInstruct baselines at all pruning ratios; SelfInstruct degrades at 25% while FineScope holds stable accuracy up to 35%.

Results

FineScope reports consistent gains across specialization settings.

The results below summarize improvements on MMLU domain slices, math reasoning (Pre-algebra, Algebra, Counting & Probability), and coding benchmarks after SAE-guided curation and pruning-aware fine-tuning.

35%
Parameter pruning
while preserving domain accuracy
+4.45%
Average MMLU gain
over OpenInstruct (Full-OI) baselines
+11.50
Math reasoning lift
average across pruned models
Figure: Benchmark comparison of FineScope (pruned + distilled) against the best baseline (Alpaca or Full-corpus, pruned). Accuracy (%) on MMLU; FineScope uses SAE-curated data and the baseline uses Alpaca (pruned). All values are taken from the paper.

Table 1 — MMLU Domain Results (STEM / Social Sciences / Humanities)

Model            Pruned  Dataset    STEM   Social Sci.  Humanities
Vicuna-7B        -       -          33.10  40.23        43.69
Vicuna-7B        yes     Alpaca     30.61  35.44        36.11
Vicuna-7B        yes     Full-OI    29.09  35.43        36.19
Vicuna-7B        yes     FineScope  31.12  36.23        36.55
MathCoder-CL-7B  -       -          31.14  11.11         9.22
MathCoder-CL-7B  yes     Alpaca     25.14  13.11        12.33
MathCoder-CL-7B  yes     Full-OI    23.91  12.81        12.67
MathCoder-CL-7B  yes     FineScope  25.89  13.81        13.68
LLaMA 3.1-8B     -       -          48.01  49.61        49.32
LLaMA 3.1-8B     yes     Alpaca     38.22  40.19        39.79
LLaMA 3.1-8B     yes     Full-OI    39.32  39.91        40.93
LLaMA 3.1-8B     yes     FineScope  40.55  41.07        41.19
GPT-3 (6.7B)     -       -          35.10  49.20        42.10
OLMO-7B          -       -          22.19  31.01        30.26
GPT-3 (175B)     -       -          36.70  50.40        40.80

Table 2 — Math Subdomain Results (Pre-Algebra / Algebra / Counting & Probability)

Model            Pruned  Dataset    Pre-Algebra  Algebra  Count. & Prob.
Vicuna-7B        -       -          14.31        10.17     8.11
Vicuna-7B        yes     Alpaca      5.56         0.30     0.21
Vicuna-7B        yes     Full-Math  12.73         8.91     5.48
Vicuna-7B        yes     FineScope  12.91        10.12     7.01
MathCoder-CL-7B  -       -          11.60        16.77    13.38
MathCoder-CL-7B  yes     Alpaca      1.29         6.94     3.33
MathCoder-CL-7B  yes     Full-Math   9.01        12.72    10.05
MathCoder-CL-7B  yes     FineScope  10.54        15.51    11.64
LLaMA 3.1-8B     -       -          32.77        29.87    20.35
LLaMA 3.1-8B     yes     Alpaca      9.23         5.56     9.10
LLaMA 3.1-8B     yes     Full-Math  30.72        31.67    18.34
LLaMA 3.1-8B     yes     FineScope  30.83        32.21    19.34
GPT-3 (13B)      -       -           6.80         5.30     4.50
GPT-3 (175B)     -       -           7.70         6.00     4.70

Table 3 — Coding Results (HumanEval / MBPP)

Model            Pruned  Dataset    HumanEval  MBPP
Vicuna-7B        -       -          0.14       0.03
Vicuna-7B        yes     Alpaca     0.07       0.00
Vicuna-7B        yes     Full-OI    0.09       0.05
Vicuna-7B        yes     FineScope  0.13       0.10
MathCoder-CL-7B  -       -          0.03       0.01
MathCoder-CL-7B  yes     Alpaca     0.08       0.05
MathCoder-CL-7B  yes     Full-OI    0.10       0.09
MathCoder-CL-7B  yes     FineScope  0.11       0.10
LLaMA 3.1-8B     -       -          0.50       0.46
LLaMA 3.1-8B     yes     Alpaca     0.25       0.13
LLaMA 3.1-8B     yes     Full-OI    0.30       0.29
LLaMA 3.1-8B     yes     FineScope  0.49       0.43

Analysis

Why FineScope works

These analysis points summarize why SAE-guided data curation and domain-conditioned pruning improve efficiency while preserving domain-relevant performance.

01

SAE-guided retrieval

SAE encoder codes give a compact retrieval space for finding semantically aligned examples from large unlabeled corpora using only a small seed set (~10 examples). Cosine similarity in SAE code space captures feature-level domain alignment rather than surface-form similarity.

02

Jacobian-aware Top-K selection

Top-K activation coordinates are selected by Jacobian row-norm sensitivity, which favors coordinates that are causally responsive to input content. In the ablations, this selector outperforms magnitude-, variance-, and PCA-based selectors for SAE training.
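To make the selection rule concrete, the sketch below scores each output coordinate of a toy layer by the norm of its Jacobian row (the derivative of that coordinate with respect to every input) and keeps the K largest. It is an illustration under assumptions: the Jacobian is approximated with forward finite differences at a single input, whereas a real model would use automatic differentiation and presumably aggregate over many inputs; `toy_layer` and the helper names are hypothetical.

```python
import math

def jacobian_row_norms(f, x, eps=1e-5):
    """Approximate the L2 norm of each Jacobian row of f at x by finite differences."""
    base = f(x)
    squared = [0.0] * len(base)
    for j in range(len(x)):
        bumped = list(x)
        bumped[j] += eps
        out = f(bumped)
        for i in range(len(base)):
            d = (out[i] - base[i]) / eps   # estimate of d f_i / d x_j
            squared[i] += d * d
    return [math.sqrt(s) for s in squared]

def top_k_coordinates(f, x, k):
    """Indices (ascending) of the k output coordinates most sensitive to the input."""
    norms = jacobian_row_norms(f, x)
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:k])

def toy_layer(x):
    # Four output coordinates with very different input sensitivity.
    return [
        3.0 * x[0] + x[1],        # strongly input-dependent
        0.1 * x[1],               # weakly input-dependent
        math.tanh(2.0 * x[0]),    # moderately input-dependent
        5.0,                      # large magnitude, but constant
    ]

selected = top_k_coordinates(toy_layer, [0.1, -0.2], k=2)
```

The constant coordinate has the largest magnitude yet a zero Jacobian row, so a magnitude-based selector would pick it while the sensitivity-based one does not.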

03

Teacher-guided recovery

After pruning, teacher-guided distillation (TGD) helps the compressed student recover domain performance. TGD yields gains of 1.9% in STEM, 5.8% in Social Sciences, and 10.26% in Humanities over standard fine-tuning alone.
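The page does not spell out the TGD objective. As a point of reference, a standard temperature-scaled knowledge-distillation loss (KL divergence from teacher to student distributions) looks like the sketch below; treating TGD as this loss, and the temperature of 2.0, are assumptions, and `kd_loss` is a hypothetical helper name.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature**2."""
    p = softmax(teacher_logits, temperature)   # teacher target distribution
    q = softmax(student_logits, temperature)   # pruned student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2

teacher = [2.0, 0.5, -1.0]
aligned_student = [2.0, 0.5, -1.0]
drifted_student = [0.0, 1.5, 0.5]

loss_aligned = kd_loss(aligned_student, teacher)
loss_drifted = kd_loss(drifted_student, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the pruned student drifts from it, which is the signal fine-tuning would minimize on the curated data.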

04

Low seed sensitivity

A sensitivity analysis that varies the number of initial seeds from 5 to 25 shows only marginal accuracy differences, indicating that FineScope does not require heavy user input. Performance largely plateaus after ~15–20 initial seeds.


Contributions

What FineScope contributes


Citation

Cite FineScope

If you use this work, please cite the FineScope submission below.

@article{bhattacharyya2025finescope,
  title      = {FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation},
  author     = {Bhattacharyya, Chaitali and Lee, Hyunsei and Lee, Junyoung and Jang, Shinhyoung and Kim, Yeseong and others},
  journal    = {arXiv preprint arXiv:2505.00624},
  year       = {2025}
}