Tuning the pipeline
Once you've produced a few briefings and know what the defaults feel like, you'll usually want to tune the pipeline — which models run where, how many papers get deep-analysed, how many calls fire in parallel, which optional stages run at all. All of those knobs live in the Settings panel (sidebar → Settings), and this page walks through them.
Each stage costs roughly 10× more per paper than the one before it, so most of the tuning game is making sure the expensive stages only see the papers worth spending on. The Settings panel is organised around that idea too: the model slots come first, the cheap stages sit in small text boxes under Advanced Options, and the review gates live at the bottom.
The mental model
The pipeline is a waterfall. Each stage narrows the set of papers before the next stage runs, so cheap models can afford to be loose on hundreds of abstracts while expensive models only ever see the handful that make it through. Rough per-paper costs:
| Stage | What runs | Rough per-paper cost |
|---|---|---|
| Stage 1 · fetch | arXiv API | free |
| Stage 2 · filter | YES / MAYBE / NO triage | a fraction of a cent |
| Stage 3 · scoring | 0.0–10.0 with justification | a fraction of a cent |
| Stage 3.5 · post | consistency pass on top N | a fraction of a cent |
| Stage 4 · PDFs | deep read of top-ranked papers | a few cents each |
| Stage 5 · briefing | synthesis + hallucination audit | a few cents per run |
Tuning comes down to: get the right papers into Stage 4, don't overspend on the stages before it, and don't let the wrong papers through. The rest of this page goes through the knobs.
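As a concrete back-of-envelope, here is a minimal cost sketch in TypeScript. The per-paper figures are illustrative placeholders matching the rough magnitudes in the table above, not real provider pricing; substitute your own rates.

```ts
// Back-of-envelope cost model for one run. The per-paper figures are
// illustrative placeholders, not real provider pricing.
interface RunShape {
  fetched: number;          // papers returned by the arXiv query
  passRate: number;         // fraction the filter advances (YES + MAYBE)
  papersToAnalyze: number;  // Stage 4 cap
}

function estimateRunCostUSD({ fetched, passRate, papersToAnalyze }: RunShape): number {
  const scored = Math.round(fetched * passRate);
  const deepRead = Math.min(scored, papersToAnalyze);
  const filterCost = fetched * 0.001;  // a fraction of a cent per abstract
  const scoringCost = scored * 0.002;  // a fraction of a cent per abstract
  const pdfCost = deepRead * 0.03;     // a few cents per full paper
  const briefingCost = 0.05;           // a few cents per run, flat
  return filterCost + scoringCost + pdfCost + briefingCost;
}

// 250 fetched, 60% pass rate, top 30 deep-read: Stage 4 alone is $0.90 of a
// roughly $1.50 total, which is why Papers to Analyze is the big cost lever.
console.log(estimateRunCostUSD({ fetched: 250, passRate: 0.6, papersToAnalyze: 30 }));
```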
Per-stage model slots
Each stage has its own model slot. Current defaults are all Google:
| Slot | Setting label | Default | What it drives |
|---|---|---|---|
| filterModel | Quick Filter Model (Stage 1) | gemini-3.1-flash-lite | Stage 2 YES/MAYBE/NO verdicts |
| scoringModel | Abstract Scoring Model | gemini-3-flash | Stage 3 scoring (and the optional Stage 3.5 consistency pass) |
| pdfModel | Deep PDF Analysis Model | gemini-3.1-pro | Stage 4 full-text read |
| briefingModel | Briefing Model | gemini-3.1-pro | Stage 5 synthesis, the hallucination audit, and the profile-refinement flow |
| quickSummaryModel | Quick-summary model | gemini-3.1-flash-lite | ~300-word pre-read per paper, generated in parallel just before synthesis |
| notebookLMModel | (set when generating podcast) | unset until first podcast | Optional NotebookLM document bundle |
The Settings panel's own stage numbering is a little different from the numbering these docs use (the UI calls the filter "Stage 1"); the labels above are what you'll literally see on screen. All slots are disabled while a run is in progress.
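For orientation, the slots map naturally onto a settings object. This is a hypothetical sketch using the slot names from the table; the actual persisted structure may differ.

```ts
// Hypothetical shape of the model-slot settings, using the slot names from
// the table above. Aparture's real persisted structure may differ.
interface ModelSlots {
  filterModel: string;        // Stage 2 triage
  scoringModel: string;       // Stage 3 (+ optional Stage 3.5 consistency pass)
  pdfModel: string;           // Stage 4 full-text read
  briefingModel: string;      // Stage 5 synthesis + hallucination audit
  quickSummaryModel: string;  // ~300-word pre-reads
  notebookLMModel?: string;   // unset until the first podcast is generated
}

const defaults: ModelSlots = {
  filterModel: "gemini-3.1-flash-lite",
  scoringModel: "gemini-3-flash",
  pdfModel: "gemini-3.1-pro",
  briefingModel: "gemini-3.1-pro",
  quickSummaryModel: "gemini-3.1-flash-lite",
};
```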
Three rules of thumb for model selection:
Spend on what reads the most
PDF analysis reads whole papers — typically tens of thousands of tokens each. That's where a stronger model earns its keep: better grounding, more faithful summaries, fewer claims that have to be flagged downstream. Upgrading pdfModel tends to move the briefing quality more than upgrading anything else.
Synthesis reads all the final-round papers plus your profile. It benefits from a capable model too, but it's a single call per run, so the bill for upgrading briefingModel is much smaller than upgrading pdfModel.
Don't spend on what filters
The quick filter runs on every fetched paper, often a hundred or more per day. A cheap, fast model is exactly right for this — it only has to decide "plausibly relevant" or "plausibly not." gemini-3.1-flash-lite, gemini-2.5-flash-lite, claude-haiku-4.5, or gpt-5.4-nano all work well.
Scoring is similar: dozens to hundreds of abstracts, and most of what the model is doing is ranking within them. A mid-tier model is usually enough. gemini-3-flash or claude-sonnet-4.6 are good picks.
Mixing providers is fine
Nothing in Aparture requires one provider per run. A common pattern is Gemini Flash-Lite for filtering (speed and price), Claude Sonnet for scoring, Claude Opus for PDFs, and Gemini for synthesis. As long as every slot has a valid key for the provider it points at, the pipeline doesn't care.
See Model selection for the full registry, the per-stage recommendations, and example lineups (including a free-tier-only setup and a three-provider mix).
Parallelism
Five concurrency knobs control how many calls fire in parallel at each batch-capable stage. All default to 3 (quick-summaries default to 5), and all clamp to a range of 1–20.
Parallel filter calls
Parallel Filter Calls (default 3, range 1–20) controls how many /api/quick-filter batches fire concurrently. Since the filter runs on every fetched paper, this is the knob that matters most on large runs — a 250-paper filter on filterConcurrency: 1 takes ~84 sequential batches, and bumping to 5 cuts wall-clock time roughly fivefold.
Anthropic gets a single-flight cache warmup: the first batch completes alone so the prompt-cache entry is primed before siblings start. Google and OpenAI start all workers immediately.
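A minimal sketch of that fan-out pattern, with runBatch standing in for one filter-batch API call (the name is a hypothetical stand-in, not Aparture's actual internals):

```ts
// Sketch of the fan-out pattern described above: on Anthropic, the first
// batch completes alone to prime the prompt cache, then workers fan out.
async function runBatches<T>(
  batches: T[][],
  runBatch: (batch: T[]) => Promise<void>, // hypothetical stand-in for one API call
  concurrency: number,
  warmCacheFirst: boolean, // true for Anthropic; false for Google/OpenAI
): Promise<void> {
  let next = 0;
  if (warmCacheFirst && batches.length > 0) {
    await runBatch(batches[next++]); // single-flight warmup: first batch runs alone
  }
  const workers = Array.from({ length: concurrency }, async () => {
    while (next < batches.length) {
      const batch = batches[next++]; // safe: JS event loop, no concurrent increment
      await runBatch(batch);
    }
  });
  await Promise.all(workers);
}
```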
Parallel scoring calls
Parallel Scoring Calls (default 3, range 1–20) controls how many /api/score-abstracts batches fire concurrently. Scoring runs on the papers that passed filter, so the knob's payoff scales with the filter's pass-through rate — with a ~60% pass rate, a 250-paper run means ~50 scoring batches at scoringBatchSize: 3.
Same Anthropic cache-warmup behaviour as the filter.
Parallel review calls
Parallel Review Calls (default 3, range 1–20) controls how many /api/rescore-abstracts batches fire concurrently during the optional Stage 3.5 post-processing consistency pass. Only active when Enable Post-Processing is on.
Parallel PDF analyses
Papers to Analyze caps how many PDFs get read per run (default 30, range 1–100). Parallel PDF Analyses (default 3, range 1–20) controls how many of those PDF analyses fire concurrently. The arXiv download step is serialised server-side with a ~5-second spacing to respect arXiv's rate limits, but once the PDFs are in hand the LLM calls themselves run in parallel.
- On Anthropic providers, the first worker finishes its call alone before the others begin, so the prompt-cache entry is primed once instead of racing.
- On Google and OpenAI, all workers start immediately.
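In sketch form, assuming hypothetical downloadPdf and analyzePdf helpers (the Anthropic warmup from the filter sketch above is omitted for brevity):

```ts
// Sketch of the Stage 4 shape described above: downloads are serialised with
// ~5 s spacing for arXiv's rate limits, then LLM analyses fan out in parallel.
// downloadPdf and analyzePdf are hypothetical stand-ins.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function analyzeTopPapers(
  paperIds: string[],
  downloadPdf: (id: string) => Promise<Uint8Array>,
  analyzePdf: (pdf: Uint8Array) => Promise<string>,
  concurrency: number,
): Promise<string[]> {
  // Serial downloads, ~5 seconds apart.
  const pdfs: Uint8Array[] = [];
  for (const [i, id] of paperIds.entries()) {
    if (i > 0) await sleep(5000);
    pdfs.push(await downloadPdf(id));
  }
  // Parallel analyses, bounded by the concurrency knob.
  const results: string[] = new Array(pdfs.length);
  let next = 0;
  const workers = Array.from({ length: concurrency }, async () => {
    while (next < pdfs.length) {
      const i = next++;
      results[i] = await analyzePdf(pdfs[i]);
    }
  });
  await Promise.all(workers);
  return results;
}
```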
Parallel quick-summaries
Parallel calls under the Quick-summary model (default 5, range 1–20) sets how many /api/analyze-pdf-quick calls fan out at once during briefing prep. These produce the ~300-word pre-read summaries that expand inline on paper cards.
Tuning guidance
Three is a conservative default that works across provider tiers. Raising it to 5–8 is usually fine on higher provider tiers (Anthropic Tier 3+, Google Tier 2+, OpenAI Tier 3+); drop to 1 if you start seeing 429s on a rate-limit-sensitive tier. Free-tier Google has aggressive RPM caps on Flash-Lite, so start at 2–3 if you're relying on the free tier for filtering.
Batch sizes
Each stage batches papers into API calls to avoid per-paper overhead. Bigger batches are faster but risk hitting context limits or giving each paper less careful attention.
| Setting | Default | Reasonable range | Notes |
|---|---|---|---|
| Filter Batch Size | 3 | 3–10 | Short prompts; batching mostly helps throughput |
| Scoring Batch Size | 3 | 3–8 | Each paper carries a full abstract + justification |
| Review Batch Size | 5 | 3–10 | Post-processing compares multiple papers at once |
| PDF analysis batch | 1 | (always 1) | One PDF per call — see Parallel PDF Analyses above |
Most users never touch these. Raising batch sizes tends to help when you're on a fast provider with generous rate limits, your abstracts are short, or you're using a long-context model. Lowering them tends to help when you're hitting rate-limit errors or noticing sloppier scoring at larger batch sizes.
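The arithmetic for how batch size and concurrency interact is simple enough to sketch:

```ts
// Quick throughput arithmetic: how many API calls a stage makes, and how many
// sequential "waves" those calls form at a given concurrency.
function stageWaves(papers: number, batchSize: number, concurrency: number) {
  const batches = Math.ceil(papers / batchSize);
  return { batches, waves: Math.ceil(batches / concurrency) };
}

// 250 papers at the default filter settings (batch 3, concurrency 3):
console.log(stageWaves(250, 3, 3)); // { batches: 84, waves: 28 }
console.log(stageWaves(250, 3, 5)); // { batches: 84, waves: 17 }
```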
Which papers advance where
Two settings govern which papers move through the waterfall.
Filter verdicts to advance
Under Categories to Process (Advanced Options → Filter Options), three checkboxes control which filter verdicts move on to scoring. Default is YES + MAYBE. Dropping MAYBE saves tokens on borderline papers but also skips anything the filter wasn't sure about, which tends to include the most interesting edge cases. Dropping YES is rarely what you want.
Papers to Analyze
Papers to Analyze (default 30, range 1–100) is a hard cap on how many papers get PDF-analysed. Stage 3 ranks every scored paper; Stage 4 takes the top N by score. If the filter lets through 200 papers and this is set to 30, the bottom 170 will appear in the scored list but won't be deep-analysed.
Tighten it when cost matters more than comprehensiveness. Raise it when you want more papers read in depth and have the budget for it. The per-paper cost at Stage 4 is roughly 10× everything upstream, so this is the most cost-sensitive single knob in the panel.
No separate score threshold
Aparture doesn't currently expose a "minimum score to advance" setting — the cutoff is just "top N by score," controlled by Papers to Analyze. A paper with a 3.5 will get PDF-analysed if it's in the top 30 on a thin day, and a 6.5 will be skipped if it's position 31 on a busy one.
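In other words, the cutoff behaves like this hypothetical sketch:

```ts
// The Stage 3 -> Stage 4 cutoff is purely top-N by score; there is no
// minimum-score threshold. A hypothetical sketch of the selection:
interface ScoredPaper { id: string; score: number } // score is 0.0-10.0

function selectForPdfAnalysis(scored: ScoredPaper[], papersToAnalyze: number): ScoredPaper[] {
  return [...scored]
    .sort((a, b) => b.score - a.score)
    .slice(0, papersToAnalyze);
}
// On a thin day a 3.5 can make the cut; on a busy day a 6.5 at rank 31 won't.
```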
Optional stages
Post-processing
Enable Post-Processing (default on) runs an extra consistency pass on the top N scored papers (default 50) after Stage 3. Papers are compared head-to-head in small batches and their scores adjusted up or down to bring outliers into line with the broader ranking.
The reason it exists: Stage 3 scores each paper independently, which sometimes produces inconsistencies — two papers of similar quality can land a point apart just because they were in different batches. The post-processing pass irons that out.
Turn it off if run-time matters more than ranking quality, or if you're rapidly iterating on profile changes and want faster feedback. Leave it on for most production runs.
Quick filter
Whether Stage 2 runs at all is governed by the useQuickFilter config flag (default on). It's not currently exposed in the Settings UI — the flag lives in the app's saved state and would need to be edited there to turn it off — but it exists. Skipping the filter is reasonable when your categories are narrow enough that the day's volume is already manageable (under ~30 papers) and the filter is just overhead. For most users, especially on high-volume categories like cs.LG or cs.CL, the filter cuts scoring costs by a meaningful margin and is worth leaving on.
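If you do want to flip it, the edit looks something like the following. The storage key here is a placeholder assumption; inspect your browser's storage to find where Aparture actually persists its settings before editing anything.

```ts
// Hypothetical example of flipping the flag in saved state. The storage key
// is an assumption, not Aparture's documented location -- verify it first.
const KEY = "aparture-settings"; // hypothetical key
const saved = JSON.parse(localStorage.getItem(KEY) ?? "{}");
saved.useQuickFilter = false; // skip Stage 2 entirely
localStorage.setItem(KEY, JSON.stringify(saved));
```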
Review & confirmation
Four checkboxes under Review & confirmation govern when the pipeline pauses and when it retries.
| Checkbox | Default | What it does |
|---|---|---|
| Pause after filter to review overrides | on | Stops at Gate 1 so you can move papers between YES / MAYBE / NO buckets |
| Pause before briefing to review scores and add feedback | on | Stops at Gate 2 so you can star, dismiss, or comment before synthesis runs |
| Auto-retry briefing if hallucination check returns YES | on | Reruns synthesis once with a retry hint when the audit flags claims |
| Auto-retry briefing if hallucination check returns MAYBE | off | Same, but for the more ambiguous verdict |
The two pause gates are the training wheels. Leaving them on is the right starting point; turning them off is something you'll usually do deliberately once you've seen enough runs to know the filter and the scores are landing well for your profile. Review gates goes deeper on what each gate shows and when to skip which.
The retry checkboxes trade tokens for quality. A retry is a full second synthesis pass plus a second audit — cheap compared to Stage 4, but not free. Retry-on-YES is usually worth keeping on (a confident "hallucinations detected" verdict is rare but worth fixing). Retry-on-MAYBE is more defensible to turn off — the MAYBE verdict often catches paraphrases the auditor is unusually strict about rather than genuine hallucinations. The retry path fires at most once per briefing either way.
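Put together, the retry path behaves roughly like this sketch (the synthesis and audit calls are hypothetical stand-ins):

```ts
// Sketch of the retry logic as described: at most one retry per briefing,
// gated on the audit verdict and the two checkboxes. synthesize and audit
// are hypothetical stand-ins for the Stage 5 calls.
type Verdict = "YES" | "MAYBE" | "NO";

async function generateBriefing(
  synthesize: (retryHint?: string) => Promise<string>,
  audit: (briefing: string) => Promise<Verdict>,
  retryOnYes: boolean,
  retryOnMaybe: boolean,
): Promise<string> {
  let briefing = await synthesize();
  const verdict = await audit(briefing);
  const shouldRetry =
    (verdict === "YES" && retryOnYes) || (verdict === "MAYBE" && retryOnMaybe);
  if (shouldRetry) {
    // One full second pass: synthesis with a retry hint, then a second audit.
    briefing = await synthesize("The previous draft contained unsupported claims.");
    await audit(briefing); // recorded, but no further retry fires
  }
  return briefing;
}
```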
A few common configurations
The defaults ship sensibly tuned — all-Google models, both pause gates on, retry-on-YES on, retry-on-MAYBE off, Papers to Analyze at 30. From there:
Cost-sensitive. Switch filter and scoring to the cheapest Flash-Lite or Nano tier. Drop Papers to Analyze to 10–15. Turn off post-processing. Turn off retry-on-MAYBE if it isn't already.
Highest quality. Put a top-tier model on pdfModel and briefingModel (Claude Opus 4.7, Gemini 3.1 Pro, or GPT-5.4). Keep scoringModel on a mid-tier model like Claude Sonnet 4.6 or Gemini 3 Flash. Keep post-processing on. Raise Papers to Analyze to 50+. Turn both retry checkboxes on.
Unattended. Turn off both pause gates (nobody's there to click through them). Consider turning off both retry checkboxes — an unattended run can't weigh the trade-off when the audit flags something, and a retry ties up an extra few minutes. Tighten Papers to Analyze to something predictable so cost doesn't swing with daily paper volume.
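Expressed as a settings sketch (the key names are hypothetical; the values mirror the prose above):

```ts
// An "unattended" preset against hypothetical setting keys. The real
// persisted names may differ; the values follow the guidance above.
const unattended = {
  pauseAfterFilter: false,     // Gate 1 off: nobody is there to review
  pauseBeforeBriefing: false,  // Gate 2 off
  retryOnYes: false,           // skip retries to keep run time predictable
  retryOnMaybe: false,
  papersToAnalyze: 20,         // fixed cap so cost doesn't track daily volume
};
```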
Where to see what a past briefing used
Every briefing archives its full configuration in the Generation details panel — the small expandable section below the briefing itself. When you're looking at an old briefing and wondering which model was on the PDF slot that day, or whether the hallucination retry fired, open Generation details and check the fields there. This is also useful when a run produces a weird briefing and you want to compare against a known-good run to isolate what changed. See Reading a briefing → Generation provenance for what the panel shows.
Next
Review gates → — what each pause gate shows and when to turn it off.
Also worth reading:
- You want the system view of each stage, not just the knobs. → The pipeline
- You want to compare providers and pick a cost/quality combination. → Model selection
- You want to tune the synthesis prompt itself, not just which model runs it. → Prompts