# Reports & Outputs

Understanding Aparture's output files and report structure.

## Output Files

Aparture generates three types of files, all saved to the `reports/` directory:
### Analysis Report

**Filename**: `YYYY-MM-DD_arxiv_analysis_XXmin.md`

The primary output containing complete analysis results.

- **Example**: `2025-10-14_arxiv_analysis_45min.md`
- **Format**: Markdown (human-readable, GitHub-compatible)
- **Size**: 50-500 KB depending on paper count and PDF analyses
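The filename pattern above can be reproduced in the shell, which is handy when scripting against a run's report. This is a sketch based only on the documented pattern; the `MINUTES` value is whatever analysis duration you configured:

```shell
# Build today's expected report path from the documented pattern
# YYYY-MM-DD_arxiv_analysis_XXmin.md.
TODAY=$(date +%F)   # date +%F prints YYYY-MM-DD
MINUTES=45          # your configured analysis duration
REPORT="reports/${TODAY}_arxiv_analysis_${MINUTES}min.md"
echo "$REPORT"
```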
### NotebookLM Document

**Filename**: `YYYY-MM-DD_notebooklm_XXmin.md`

A podcast-optimized document structured for audio generation.

- **Example**: `2025-10-14_notebooklm_15min.md`
- **Format**: Markdown with special structuring
- **Size**: 30-150 KB
- **Purpose**: Upload to notebooklm.google.com for podcast creation
### Audio Podcast

**Filename**: `YYYY-MM-DD_podcast.m4a`

An AI-generated audio overview (requires CLI automation).

- **Example**: `2025-10-14_podcast.m4a`
- **Format**: M4A (audio)
- **Size**: 5-30 MB depending on duration
- **Duration**: 5-30 minutes (configurable)
> **Podcasts require CLI automation**
>
> The web interface can generate the NotebookLM document, but you'll need to upload it manually to Google NotebookLM to create the podcast. Use CLI automation for fully automated podcast generation.
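Because all three outputs share the run-date prefix, a small helper can collect everything a given run produced. This is a sketch assuming the default `reports/` layout and the filename patterns above; `aparture_outputs` is a hypothetical helper name, not part of Aparture itself:

```shell
# List every Aparture output file for one run date, relying on the
# shared YYYY-MM-DD prefix of all three file types.
aparture_outputs() {
  # $1 = run date, e.g. 2025-10-14
  for f in reports/"$1"_*; do
    [ -e "$f" ] || continue   # skip the literal glob when nothing matches
    echo "$f"
  done
}

aparture_outputs "2025-10-14"
```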
## Analysis Report Structure

The analysis report follows a consistent structure:
### Header Section

```markdown
# arXiv Analysis Report

**Date**: 2025-10-14
**Analysis Duration**: 45 minutes
**Generated by**: Aparture v1.0

## Configuration

- **Categories**: cs.LG, cs.AI, stat.ML (47 papers)
- **Quick Filter**: Enabled (Haiku 4.5)
- **Abstract Scoring**: Enabled (Sonnet 4.5)
- **PDF Analysis**: Top 20 papers (Opus 4.1)
```

Key information:

- Run date and duration
- Categories analyzed
- Paper counts
- Models used
### Executive Summary

```markdown
## Executive Summary

Analyzed 47 recent papers from arXiv. Key findings:

**Top Themes:**
- 12 papers on transformer architectures
- 8 papers on Bayesian methods
- 6 papers on interpretability

**Highlights:**
- Novel attention mechanism (Score: 9.2)
- Scalable inference method (Score: 8.8)
- New benchmark dataset (Score: 8.5)

**Recommendations:**
Priority reading: Papers #1, #3, #5 for immediate relevance.
```

Provides a quick overview of:

- Paper distribution
- Major themes
- Top highlights
- Reading recommendations
### Stage Results

Each processing stage gets a dedicated section:

#### Quick Filter Results

```markdown
## Stage 1: Quick Filter

**Model**: Claude Haiku 4.5
**Duration**: 2 minutes 34 seconds
**Cost**: $0.08

**Results:**
- ✓ YES: 18 papers (38%)
- ~ MAYBE: 12 papers (26%)
- ✗ NO: 17 papers (36%)

**Filtered out**: 17 papers
**Proceeding to scoring**: 30 papers
```

Shows filtering effectiveness and cost.
#### Abstract Scoring Results

```markdown
## Stage 2: Abstract Scoring

**Model**: Claude Sonnet 4.5
**Duration**: 15 minutes 22 seconds
**Cost**: $1.45

**Score Distribution:**
- 9-10: 3 papers (10%)
- 7-8: 8 papers (27%)
- 5-6: 12 papers (40%)
- 3-4: 7 papers (23%)

**Average Score**: 6.2 / 10
```

Provides scoring overview and statistics.
#### PDF Analysis Results

```markdown
## Stage 3: PDF Analysis

**Model**: Claude Opus 4.1
**Duration**: 28 minutes 15 seconds
**Cost**: $3.20

**Papers Analyzed**: 20 / 30
**Success Rate**: 100%
**Average PDF Size**: 2.3 MB
```

Shows deep analysis statistics.
### Paper Details

Each paper gets a detailed entry:

```markdown
### 1. Novel Attention Mechanism for Transformers

**Score**: 9.2 / 10
**Authors**: Smith et al.
**arXiv**: [2410.12345](https://arxiv.org/abs/2410.12345)
**PDF**: [Download](https://arxiv.org/pdf/2410.12345.pdf)

**Abstract:**
We propose a new attention mechanism that reduces computational
complexity from O(n²) to O(n log n) while maintaining performance...

**Relevance Justification:**
Highly relevant. Addresses key challenge in transformer scaling
with novel approach. Strong empirical results. Builds on recent
work in efficient attention.

**PDF Analysis:**
The paper introduces "Sparse Hierarchical Attention" (SHA) which...

**Key Contributions:**
- Reduces attention complexity to O(n log n)
- Maintains accuracy on standard benchmarks
- Provides theoretical analysis of approximation quality

**Methodology:**
- Hierarchical clustering of tokens
- Sparse attention patterns
- Gradient-based importance sampling

**Results:**
- 3x faster training on long sequences
- Comparable accuracy to full attention
- Scales to 100K token sequences

**Limitations:**
- Requires tuning of sparsity hyperparameters
- Limited evaluation on generation tasks

**Future Directions:**
- Extension to decoder-only models
- Application to multi-modal learning
```

Includes:

- Title and metadata
- Abstract and relevance score
- PDF analysis (if performed)
- Key contributions
- Methodology summary
- Results and limitations
### Footer Section

```markdown
## Analysis Summary

**Total Papers**: 47
**Papers Scored**: 30
**Papers with PDF Analysis**: 20
**Total Duration**: 45 minutes 11 seconds
**Total Cost**: $4.73

**Model Breakdown:**
- Quick Filter (Haiku 4.5): $0.08
- Abstract Scoring (Sonnet 4.5): $1.45
- PDF Analysis (Opus 4.1): $3.20

---
_Generated by Aparture - AI-powered research paper discovery_
```

Provides complete cost and timing breakdown.
## NotebookLM Document Structure

The NotebookLM document uses a different structure optimized for audio:

### Conversational Format

```markdown
# Research Highlights: Computer Science (cs.LG, cs.AI)

## Overview

Today's analysis covered 47 papers in machine learning and AI,
with several exciting developments in transformer architectures
and Bayesian inference methods.

## Major Themes

### Transformer Efficiency

There's significant progress in making transformers more efficient.
Smith et al. introduce "Sparse Hierarchical Attention"...

### Bayesian Methods

Several papers explore Bayesian approaches to deep learning.
The most interesting is Jones et al.'s work on...
```

Key differences:

- **Narrative style**: Flows like a conversation
- **Thematic organization**: Groups related papers
- **Synthesis**: Connects ideas across papers
- **Audio-friendly**: Short sentences, clear structure
### Structured Sections

```markdown
## Deep Dive: Sparse Hierarchical Attention

This paper by Smith et al. tackles a fundamental challenge:
transformers are computationally expensive on long sequences.

**The Problem**: Standard attention is O(n²), making it
prohibitive for documents longer than a few thousand tokens.

**The Solution**: Hierarchical clustering creates sparse
attention patterns that approximate full attention.

**The Impact**: 3x faster training with minimal accuracy loss.

**Why It Matters**: Opens the door to processing much longer
documents and could enable new applications in...
```

Provides deep dives on top papers with context and implications.
## Working with Reports

### Opening Reports

Markdown viewers:

- **VS Code**: Built-in preview (Ctrl/Cmd + Shift + V)
- **Obsidian**: Rich markdown experience
- **Typora**: WYSIWYG markdown editor
- **GitHub**: Upload for web viewing

Convert to other formats:

- **PDF**: Use `pandoc` or markdown-to-pdf tools
- **HTML**: Use `marked` or static site generators
- **Word**: Use `pandoc` with DOCX output
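As one concrete example, the `pandoc` conversions can be scripted. This sketch is guarded so it is a no-op when `pandoc` or the report is missing; PDF output additionally needs a LaTeX engine, so it is left as a comment:

```shell
# Convert an analysis report to HTML and Word with pandoc.
REPORT=reports/2025-10-14_arxiv_analysis_45min.md
if command -v pandoc >/dev/null 2>&1 && [ -f "$REPORT" ]; then
  pandoc "$REPORT" --standalone -o report.html   # HTML page
  pandoc "$REPORT" -o report.docx                # Word document
  # pandoc "$REPORT" -o report.pdf               # PDF (requires LaTeX)
fi
```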
### Searching Reports

Find papers by topic:

```shell
grep -i "bayesian" reports/2025-10-14_arxiv_analysis_45min.md
```

Find high-scoring papers:

```shell
grep "Score: [89]" reports/2025-10-14_arxiv_analysis_45min.md
```

Extract arXiv IDs:

```shell
grep -o "arxiv.org/abs/[0-9.]*" reports/*.md
```

### Organizing Reports
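These greps compose into longer pipelines. For instance, this sketch pulls every score out of all reports and lists the top values; it assumes the `**Score**: X.Y / 10` lines shown in the report structure above and stays silent when no reports exist:

```shell
# Rank paper scores across all reports, highest first.
grep -h '^\*\*Score\*\*' reports/*.md 2>/dev/null \
  | sed 's/[^0-9.]*\([0-9.]*\).*/\1/' \
  | sort -rn \
  | head -5
```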
By date:

```
reports/
  2025-10-14_arxiv_analysis_45min.md
  2025-10-15_arxiv_analysis_52min.md
  2025-10-16_arxiv_analysis_38min.md
```

By topic (manual):

```
reports/
  machine-learning/
    2025-10-14_arxiv_analysis_45min.md
  astrophysics/
    2025-10-15_arxiv_analysis_52min.md
```

Archive old reports:
```shell
mkdir reports/archive
mv reports/2025-09-* reports/archive/
```
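The manual `mv` generalizes to a tiny helper. This is a sketch relying only on the date-prefixed filenames documented above; `archive_month` is a hypothetical name, not an Aparture command:

```shell
# Sweep all reports from one month (e.g. 2025-09) into reports/archive/.
archive_month() {
  mkdir -p reports/archive
  for f in reports/"$1"-*; do
    [ -e "$f" ] || continue   # skip the literal glob when nothing matches
    mv "$f" reports/archive/
  done
}

archive_month "2025-09"
```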
## Report Quality

### What to Expect
Good Reports:
- ✅ Consistent scoring across similar papers
- ✅ Detailed justifications for scores
- ✅ Comprehensive PDF analyses
- ✅ Clear executive summary
- ✅ Actionable recommendations
Common Issues:
- ⚠️ Score inflation (all papers 7-9)
- ⚠️ Generic justifications
- ⚠️ Missing PDF analyses (download failures)
- ⚠️ Inconsistent formatting
### Improving Quality
Better research criteria:
- Be specific about interests
- Mention concrete techniques
- Provide example topics
- Update regularly based on results
Better model selection:
- Use Opus 4.1/GPT-5 for scoring (higher quality)
- Enable post-processing for consistency
- Use Haiku/Nano only for quick filter
Better configuration:
- Select focused categories
- Adjust score thresholds
- Limit PDF analysis to top papers
- Review and iterate
See Multi-Stage Analysis for optimization tips.
## Sharing Reports
### Within Teams
Markdown format advantages:
- Version control friendly (Git)
- Easy to diff and merge
- Readable in any text editor
- GitHub renders nicely
Collaborative workflows:
- Commit to shared repository
- Use pull requests for review
- Track changes over time
- Search across all reports
### Public Sharing
Before sharing publicly:
- ⚠️ Remove any internal notes
- ⚠️ Check for sensitive information
- ⚠️ Verify arXiv links work
- ✓ Add context for external readers
Publishing options:
- GitHub gists
- Personal blog/website
- Research group webpage
- arXiv "daily picks" lists
> **Research log**
>
> Commit reports to Git and track your research interests over time. Great for identifying trends and documenting your learning journey.