
# Reports & Outputs

Understanding Aparture's output files and report structure.

## Output Files

Aparture generates three types of files, all saved to the `reports/` directory:

### Analysis Report

Filename: YYYY-MM-DD_arxiv_analysis_XXmin.md

The primary output containing complete analysis results.

Example: 2025-10-14_arxiv_analysis_45min.md

Format: Markdown (human-readable, GitHub-compatible)

Size: 50-500 KB depending on paper count and PDF analyses

### NotebookLM Document

Filename: YYYY-MM-DD_notebooklm_XXmin.md

Podcast-optimized document structured for audio generation.

Example: 2025-10-14_notebooklm_15min.md

Format: Markdown with special structuring

Size: 30-150 KB

Purpose: Upload to notebooklm.google.com for podcast creation

### Audio Podcast

Filename: YYYY-MM-DD_podcast.m4a

AI-generated audio overview (requires CLI automation).

Example: 2025-10-14_podcast.m4a

Format: M4A (audio)

Size: 5-30 MB depending on duration

Duration: 5-30 minutes (configurable)

> **Podcasts require CLI automation**
>
> The web interface can generate the NotebookLM document, but you'll need to upload it manually to Google NotebookLM to create the podcast. Use CLI automation for fully automated podcast generation.

## Analysis Report Structure

The analysis report follows a consistent structure:

### Header Section

```markdown
# arXiv Analysis Report

**Date**: 2025-10-14
**Analysis Duration**: 45 minutes
**Generated by**: Aparture v1.0

## Configuration

- **Categories**: cs.LG, cs.AI, stat.ML (47 papers)
- **Quick Filter**: Enabled (Haiku 4.5)
- **Abstract Scoring**: Enabled (Sonnet 4.5)
- **PDF Analysis**: Top 20 papers (Opus 4.1)
```

Key information:

- Run date and duration
- Categories analyzed
- Paper counts
- Models used

### Executive Summary

```markdown
## Executive Summary

Analyzed 47 recent papers from arXiv. Key findings:

**Top Themes:**

- 12 papers on transformer architectures
- 8 papers on Bayesian methods
- 6 papers on interpretability

**Highlights:**

- Novel attention mechanism (Score: 9.2)
- Scalable inference method (Score: 8.8)
- New benchmark dataset (Score: 8.5)

**Recommendations:**
Priority reading: Papers #1, #3, #5 for immediate relevance.
```

Provides quick overview of:

- Paper distribution
- Major themes
- Top highlights
- Reading recommendations

### Stage Results

Each processing stage gets a dedicated section:

#### Quick Filter Results

```markdown
## Stage 1: Quick Filter

**Model**: Claude Haiku 4.5
**Duration**: 2 minutes 34 seconds
**Cost**: $0.08

**Results:**

- ✓ YES: 18 papers (38%)
- ~ MAYBE: 12 papers (26%)
- ✗ NO: 17 papers (36%)

**Filtered out**: 17 papers
**Proceeding to scoring**: 30 papers
```

Shows filtering effectiveness and cost.

#### Abstract Scoring Results

```markdown
## Stage 2: Abstract Scoring

**Model**: Claude Sonnet 4.5
**Duration**: 15 minutes 22 seconds
**Cost**: $1.45

**Score Distribution:**

- 9-10: 3 papers (10%)
- 7-8: 8 papers (27%)
- 5-6: 12 papers (40%)
- 3-4: 7 papers (23%)

**Average Score**: 6.2 / 10
```

Provides scoring overview and statistics.

#### PDF Analysis Results

```markdown
## Stage 3: PDF Analysis

**Model**: Claude Opus 4.1
**Duration**: 28 minutes 15 seconds
**Cost**: $3.20

**Papers Analyzed**: 20 / 30
**Success Rate**: 100%
**Average PDF Size**: 2.3 MB
```

Shows deep analysis statistics.

### Paper Details

Each paper gets a detailed entry:

```markdown
### 1. Novel Attention Mechanism for Transformers

**Score**: 9.2 / 10
**Authors**: Smith et al.
**arXiv**: [2410.12345](https://arxiv.org/abs/2410.12345)
**PDF**: [Download](https://arxiv.org/pdf/2410.12345.pdf)

**Abstract:**
We propose a new attention mechanism that reduces computational
complexity from O(n²) to O(n log n) while maintaining performance...

**Relevance Justification:**
Highly relevant. Addresses key challenge in transformer scaling
with novel approach. Strong empirical results. Builds on recent
work in efficient attention.

**PDF Analysis:**
The paper introduces "Sparse Hierarchical Attention" (SHA) which...

**Key Contributions:**

- Reduces attention complexity to O(n log n)
- Maintains accuracy on standard benchmarks
- Provides theoretical analysis of approximation quality

**Methodology:**

- Hierarchical clustering of tokens
- Sparse attention patterns
- Gradient-based importance sampling

**Results:**

- 3x faster training on long sequences
- Comparable accuracy to full attention
- Scales to 100K token sequences

**Limitations:**

- Requires tuning of sparsity hyperparameters
- Limited evaluation on generation tasks

**Future Directions:**

- Extension to decoder-only models
- Application to multi-modal learning
```
Includes:

- Title and metadata
- Abstract and relevance score
- PDF analysis (if performed)
- Key contributions
- Methodology summary
- Results and limitations

```markdown
## Analysis Summary

**Total Papers**: 47
**Papers Scored**: 30
**Papers with PDF Analysis**: 20

**Total Duration**: 45 minutes 11 seconds
**Total Cost**: $4.73

**Model Breakdown:**

- Quick Filter (Haiku 4.5): $0.08
- Abstract Scoring (Sonnet 4.5): $1.45
- PDF Analysis (Opus 4.1): $3.20

---

_Generated by Aparture - AI-powered research paper discovery_
```

Provides complete cost and timing breakdown.

## NotebookLM Document Structure

The NotebookLM document uses a different structure optimized for audio:

### Conversational Format

```markdown
# Research Highlights: Computer Science (cs.LG, cs.AI)

## Overview

Today's analysis covered 47 papers in machine learning and AI,
with several exciting developments in transformer architectures
and Bayesian inference methods.

## Major Themes

### Transformer Efficiency

There's significant progress in making transformers more efficient.
Smith et al. introduce "Sparse Hierarchical Attention"...

### Bayesian Methods

Several papers explore Bayesian approaches to deep learning.
The most interesting is Jones et al.'s work on...
```

Key differences:

- **Narrative style**: flows like a conversation
- **Thematic organization**: groups related papers
- **Synthesis**: connects ideas across papers
- **Audio-friendly**: short sentences, clear structure

### Structured Sections

```markdown
## Deep Dive: Sparse Hierarchical Attention

This paper by Smith et al. tackles a fundamental challenge:
transformers are computationally expensive on long sequences.

**The Problem**: Standard attention is O(n²), making it
prohibitive for documents longer than a few thousand tokens.

**The Solution**: Hierarchical clustering creates sparse
attention patterns that approximate full attention.

**The Impact**: 3x faster training with minimal accuracy loss.

**Why It Matters**: Opens the door to processing much longer
documents and could enable new applications in...
```

Provides deep dives on top papers with context and implications.

## Working with Reports

### Opening Reports

Markdown viewers:

- **VS Code**: built-in preview (Ctrl/Cmd + Shift + V)
- **Obsidian**: rich markdown experience
- **Typora**: WYSIWYG markdown editor
- **GitHub**: upload for web viewing

Convert to other formats:

- **PDF**: use pandoc or markdown-to-pdf tools
- **HTML**: use marked or static site generators
- **Word**: use pandoc with DOCX output

### Searching Reports

Find papers by topic:

```bash
grep -i "bayesian" reports/2025-10-14_arxiv_analysis_45min.md
```

Find high-scoring papers:

```bash
grep -E "Score.*: (8|9|10)" reports/2025-10-14_arxiv_analysis_45min.md
```

Extract arXiv IDs:

```bash
grep -oE "arxiv\.org/abs/[0-9]+\.[0-9]+" reports/*.md
```
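These one-liners can be combined into a small helper; a sketch, assuming the `### N. Title` / `**Score**: X / 10` paper-entry layout shown earlier (the function name is ours):

```bash
# list_by_score: print "score<TAB>title", highest-scored papers first.
# Relies on the paper-entry layout shown above; adjust the patterns
# if your report format differs.
list_by_score() {
  grep -E '^### [0-9]+\. |^\*\*Score\*\*: ' "$1" \
    | paste - - \
    | sed -E 's/^### [0-9]+\. //; s/\*\*Score\*\*: //; s| / 10||' \
    | awk -F'\t' '{ printf "%s\t%s\n", $2, $1 }' \
    | sort -rn
}
```

For example, `list_by_score reports/2025-10-14_arxiv_analysis_45min.md` prints one `score<TAB>title` line per paper, best first.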

### Organizing Reports

By date:

```
reports/
  2025-10-14_arxiv_analysis_45min.md
  2025-10-15_arxiv_analysis_52min.md
  2025-10-16_arxiv_analysis_38min.md
```

By topic (manual):

```
reports/
  machine-learning/
    2025-10-14_arxiv_analysis_45min.md
  astrophysics/
    2025-10-15_arxiv_analysis_52min.md
```

Archive old reports:

```bash
mkdir -p reports/archive
mv reports/2025-09-* reports/archive/
```
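The same idea can be automated so you don't have to type a new glob each month; a sketch that keys off the `YYYY-MM-DD_` filename prefix rather than file modification times (the function name is ours):

```bash
# archive_old_reports: move every report not from the current month
# into reports/archive/. Relies on the YYYY-MM-DD_ filename prefix.
archive_old_reports() {
  current="$(date +%Y-%m)"
  mkdir -p reports/archive
  for f in reports/*_*.md; do
    [ -e "$f" ] || continue              # glob matched nothing
    case "$(basename "$f")" in
      "$current"-*) ;;                   # current month: keep in place
      *) mv "$f" reports/archive/ ;;
    esac
  done
}
```

Running `archive_old_reports` from the project root leaves this month's reports where they are and moves everything older into the archive.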

## Report Quality

### What to Expect

Good Reports:

- ✅ Consistent scoring across similar papers
- ✅ Detailed justifications for scores
- ✅ Comprehensive PDF analyses
- ✅ Clear executive summary
- ✅ Actionable recommendations

Common Issues:

- ⚠️ Score inflation (all papers 7-9)
- ⚠️ Generic justifications
- ⚠️ Missing PDF analyses (download failures)
- ⚠️ Inconsistent formatting

### Improving Quality

Better research criteria:

- Be specific about your interests
- Mention concrete techniques
- Provide example topics
- Update regularly based on results

Better model selection:

- Use Opus 4.1/GPT-5 for scoring (higher quality)
- Enable post-processing for consistency
- Use Haiku/Nano only for the quick filter

Better configuration:

- Select focused categories
- Adjust score thresholds
- Limit PDF analysis to top papers
- Review and iterate

See Multi-Stage Analysis for optimization tips.

## Sharing Reports

### Within Teams

Markdown format advantages:

- Version-control friendly (Git)
- Easy to diff and merge
- Readable in any text editor
- Renders nicely on GitHub

Collaborative workflows:

- Commit to a shared repository
- Use pull requests for review
- Track changes over time
- Search across all reports
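A minimal sketch of that workflow using plain Git (the function names are ours; pushing and pull requests follow your team's usual process):

```bash
# share_report: commit a finished report so teammates can review it.
share_report() {
  git add "$1"
  git commit -m "Add $(basename "$1")"
}

# search_reports: find a topic across all committed reports.
search_reports() {
  git grep -il "$1" -- "reports/*.md"
}
```

After `share_report reports/2025-10-14_arxiv_analysis_45min.md`, a teammate can run `search_reports bayesian` to locate every report mentioning the topic.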

### Public Sharing

Before sharing publicly:

- ⚠️ Remove any internal notes
- ⚠️ Check for sensitive information
- ⚠️ Verify that arXiv links work
- ✓ Add context for external readers

Publishing options:

- GitHub gists
- Personal blog/website
- Research group webpage
- arXiv "daily picks" lists

> **Research log**
>
> Commit reports to Git and track your research interests over time. Great for identifying trends and documenting your learning journey.

