arXiv Categories
Understanding arXiv's taxonomy and how to select relevant categories.
What are arXiv Categories?
arXiv organizes papers into a hierarchical taxonomy with:
- Major categories - Top-level domains (cs, math, physics, etc.)
- Subcategories - Specialized topics within domains
- Cross-listings - Papers appearing in multiple categories
Papers can belong to multiple subcategories, with one designated as primary.
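As a rough illustration of how these identifiers are structured, the sketch below splits a category string into its top-level archive and its optional subject class. The `Category` type and `parse_category` helper are hypothetical names used only for this example; they are not part of arXiv or Aparture.

```python
from typing import NamedTuple, Optional


class Category(NamedTuple):
    archive: str            # top-level group, e.g. "cs" or "hep-th"
    subject: Optional[str]  # subject class, e.g. "LG"; None when absent


def parse_category(identifier: str) -> Category:
    """Split an arXiv category identifier at its first dot (hypothetical helper)."""
    archive, _, subject = identifier.strip().partition(".")
    if not archive:
        raise ValueError(f"not a category identifier: {identifier!r}")
    # Archives such as hep-th have no subject class, so subject may be empty.
    return Category(archive, subject or None)


if __name__ == "__main__":
    for name in ("cs.LG", "physics.comp-ph", "hep-th"):
        print(name, "->", parse_category(name))
```

Splitting at the first dot only keeps hyphenated subject classes such as `comp-ph` intact.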
Category Structure
Computer Science (cs)
Artificial Intelligence
- cs.AI - Artificial Intelligence
- cs.LG - Machine Learning
- cs.NE - Neural and Evolutionary Computing
- cs.CV - Computer Vision and Pattern Recognition
- cs.CL - Computation and Language (NLP)
Theory
- cs.CC - Computational Complexity
- cs.DS - Data Structures and Algorithms
- cs.LO - Logic in Computer Science
Systems
- cs.DC - Distributed, Parallel, and Cluster Computing
- cs.OS - Operating Systems
- cs.PL - Programming Languages
- cs.SE - Software Engineering
Other
- cs.CR - Cryptography and Security
- cs.DB - Databases
- cs.HC - Human-Computer Interaction
- cs.IR - Information Retrieval
- cs.RO - Robotics
Full list includes 40+ cs subcategories.
Mathematics (math)
Pure Mathematics
- math.AG - Algebraic Geometry
- math.NT - Number Theory
- math.GT - Geometric Topology
- math.RT - Representation Theory
Applied Mathematics
- math.NA - Numerical Analysis
- math.OC - Optimization and Control
- math.PR - Probability
- math.ST - Statistics Theory
Interdisciplinary
- math.CO - Combinatorics
- math.DS - Dynamical Systems
- math.MP - Mathematical Physics
Statistics (stat)
- stat.ML - Machine Learning
- stat.ME - Methodology
- stat.AP - Applications
- stat.CO - Computation
- stat.TH - Theory
Physics (physics, astro-ph, etc.)
Astrophysics (astro-ph)
- astro-ph.CO - Cosmology and Nongalactic Astrophysics
- astro-ph.GA - Astrophysics of Galaxies
- astro-ph.SR - Solar and Stellar Astrophysics
- astro-ph.EP - Earth and Planetary Astrophysics
- astro-ph.HE - High Energy Astrophysical Phenomena
- astro-ph.IM - Instrumentation and Methods
General Physics (physics)
- physics.comp-ph - Computational Physics
- physics.data-an - Data Analysis, Statistics and Probability
- physics.optics - Optics
- physics.plasm-ph - Plasma Physics
High Energy Physics (hep-)
- hep-th - High Energy Physics - Theory
- hep-ph - High Energy Physics - Phenomenology
- hep-ex - High Energy Physics - Experiment
Quantitative Biology (q-bio)
- q-bio.BM - Biomolecules
- q-bio.GN - Genomics
- q-bio.NC - Neurons and Cognition
- q-bio.QM - Quantitative Methods
Economics (econ)
- econ.EM - Econometrics
- econ.TH - Theoretical Economics
Electrical Engineering and Systems Science (eess)
- eess.IV - Image and Video Processing
- eess.SP - Signal Processing
- eess.SY - Systems and Control
Full taxonomy
See arxiv.org/category_taxonomy for the complete list of 150+ categories.
Choosing Categories
Strategy 1: Narrow & Focused
When to use:
- Clear, specific research interests
- Limited time for daily reading
- Want high signal-to-noise ratio
Example: Deep learning in astronomy
Selected categories:
- cs.LG (Machine Learning)
- astro-ph.GA (Astrophysics of Galaxies)
- astro-ph.CO (Cosmology)
Typical volume: 5-15 papers/day
Pros: Highly relevant results
Cons: May miss related work
Strategy 2: Broad & Exploratory
When to use:
- Interdisciplinary research
- Want to discover new connections
- Have time for exploration
Example: Bayesian methods across fields
Selected categories:
- stat.ME (Statistics Methodology)
- stat.ML (Machine Learning)
- astro-ph.IM (Instrumentation)
- physics.data-an (Data Analysis)
- econ.EM (Econometrics)
Typical volume: 20-50 papers/day
Pros: Discover unexpected connections
Cons: More noise, higher costs
Strategy 3: Domain-Focused
When to use:
- Deep expertise in one field
- Want comprehensive coverage
- Daily reading is core activity
Example: Complete astrophysics coverage
Selected categories:
- All astro-ph.* subcategories (6 total)
- physics.comp-ph
- stat.AP
Typical volume: 30-80 papers/day
Pros: Miss nothing in your domain
Cons: High volume, expensive
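A selection such as "all astro-ph.* subcategories" can be thought of as a wildcard over category names. The sketch below expands such patterns with the standard library's `fnmatch` module. The `KNOWN` list and the `expand` helper are assumptions made for this example; whether Aparture accepts wildcards in its configuration is not stated here.

```python
from fnmatch import fnmatchcase

# The six astro-ph subcategories listed earlier in this guide.
ASTRO_PH = [
    "astro-ph.CO", "astro-ph.EP", "astro-ph.GA",
    "astro-ph.HE", "astro-ph.IM", "astro-ph.SR",
]
# Hypothetical catalogue of known categories, kept short for the example.
KNOWN = ASTRO_PH + ["physics.comp-ph", "stat.AP", "cs.LG"]


def expand(patterns, known=KNOWN):
    """Expand shell-style patterns into concrete category names, in order."""
    selected = []
    for pattern in patterns:
        for name in known:
            if fnmatchcase(name, pattern) and name not in selected:
                selected.append(name)
    return selected


if __name__ == "__main__":
    print(expand(["astro-ph.*", "physics.comp-ph", "stat.AP"]))
```

For the domain-focused example this yields eight concrete categories: the six astro-ph subcategories plus physics.comp-ph and stat.AP.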
Cross-Listing Considerations
Papers often appear in multiple categories:
Example paper:
- Primary: cs.LG
- Cross-listed: stat.ML, cs.AI
Aparture handling:
- Fetches unique papers (no duplicates)
- Considers all selected categories
- Shows primary category in results
Implications:
- Selecting both cs.LG and stat.ML won't duplicate papers
- But you'll get papers where either is primary
- More categories = better coverage
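One way to picture the "no duplicates" behaviour is to merge per-category results keyed on the paper identifier. This is a minimal sketch under assumed data shapes (dicts with hypothetical `id` and `primary` fields and placeholder IDs), not Aparture's actual implementation.

```python
def merge_unique(results_by_category):
    """Merge per-category result lists, keeping one entry per paper ID.

    The first occurrence of each paper wins and first-seen order is preserved.
    """
    unique = {}
    for papers in results_by_category.values():
        for paper in papers:
            unique.setdefault(paper["id"], paper)
    return list(unique.values())


if __name__ == "__main__":
    # Placeholder IDs; paper "A" is cross-listed and appears in both result sets.
    results = {
        "cs.LG": [{"id": "A", "primary": "cs.LG"}, {"id": "B", "primary": "cs.LG"}],
        "stat.ML": [{"id": "A", "primary": "cs.LG"}, {"id": "C", "primary": "stat.ML"}],
    }
    for paper in merge_unique(results):
        print(paper["id"], paper["primary"])
```

Keying on the identifier rather than on the category means adding an overlapping category can only add papers, never repeat them.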
Category Selection Tips
Start Small
Begin with 2-3 highly relevant categories:
- Test with dry run - See paper volume
- Run minimal analysis - Check relevance
- Add categories gradually - Expand as needed
Monitor Volume
Track papers per category over time:
- Low volume (<5/day) - Consider adding related categories
- High volume (>50/day) - Consider enabling Quick Filter
- Overwhelming (>100/day) - Narrow selection or increase thresholds
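The thresholds above can be written as a small decision function. This is only a sketch: the function name is hypothetical, and the "no change needed" result for the 5-50 papers/day range is an assumption, since the list above does not cover that range.

```python
def volume_advice(papers_per_day: float) -> str:
    """Map a daily paper count to the suggestion from the list above."""
    if papers_per_day > 100:
        return "narrow selection or increase thresholds"
    if papers_per_day > 50:
        return "consider enabling Quick Filter"
    if papers_per_day < 5:
        return "consider adding related categories"
    # Assumption: volumes between 5 and 50 per day need no adjustment.
    return "no change needed"


if __name__ == "__main__":
    for volume in (3, 20, 60, 150):
        print(volume, "->", volume_advice(volume))
```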
Check Cross-Listings
Some papers appear in multiple categories you care about:
Example workflow:
- Run analysis with cs.LG only
- Check where top papers are cross-listed
- Add those categories for better coverage
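The second step of this workflow amounts to counting which unselected categories show up most often among your top papers. A minimal sketch, assuming each paper is a dict with a hypothetical `categories` list (primary plus cross-lists):

```python
from collections import Counter


def cross_list_counts(top_papers, selected):
    """Count categories on the top papers that are not already selected."""
    counts = Counter()
    for paper in top_papers:
        for category in paper["categories"]:
            if category not in selected:
                counts[category] += 1
    # Most frequent first; these are the candidates worth adding.
    return counts.most_common()


if __name__ == "__main__":
    top_papers = [
        {"categories": ["cs.LG", "stat.ML"]},
        {"categories": ["cs.LG", "stat.ML", "cs.AI"]},
        {"categories": ["cs.LG"]},
    ]
    print(cross_list_counts(top_papers, selected={"cs.LG"}))
```

Categories that recur near the top of this list are the natural ones to add next.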
Use Research Criteria
Category selection and research criteria work together: pair broad categories with specific criteria, or narrow categories with broad criteria.
Broad categories, specific criteria:
Categories: cs.AI, cs.LG, cs.CV, cs.CL
Research criteria: "Bayesian deep learning"
Narrow categories, broad criteria:
Categories: cs.LG, stat.ML
Research criteria: "Any machine learning advances"
Common Combinations
Machine Learning Researcher
Core:
- cs.LG (Machine Learning)
- cs.AI (Artificial Intelligence)
- stat.ML (Statistics - Machine Learning)
Optional:
- cs.CV (Computer Vision)
- cs.CL (NLP)
Volume: 20-40 papers/day
Computational Astrophysicist
Core:
- astro-ph.CO (Cosmology)
- astro-ph.GA (Galaxies)
- astro-ph.IM (Instrumentation)
Optional:
- physics.comp-ph (Computational Physics)
- physics.data-an (Data Analysis)
- stat.AP (Statistics Applications)
Volume: 15-35 papers/day
Applied Statistician
Core:
- stat.ME (Methodology)
- stat.AP (Applications)
- stat.ML (Machine Learning)
Optional:
- cs.LG (Machine Learning)
- econ.EM (Econometrics)
Volume: 10-25 papers/day
Theory-Focused CS
Core:
- cs.CC (Computational Complexity)
- cs.DS (Data Structures & Algorithms)
- cs.LO (Logic)
Optional:
- math.CO (Combinatorics)
- cs.CR (Cryptography)
Volume: 5-15 papers/day
Category Updates
arXiv occasionally updates its taxonomy:
Recent changes:
- 2017: Added eess.* (Electrical Engineering and Systems Science)
- 2017: Added econ.* (Economics)
- 2017: Reorganized cs.* subcategories
Staying current:
- Check arXiv taxonomy annually
- Watch for new subcategories in your field
- Update Aparture configuration as needed
New subcategory?
If you notice a new relevant subcategory, add it to your configuration and re-run historical analyses to catch papers you missed.
API Considerations
Fetching Limits
The arXiv API has rate limits:
- 1 request per 3 seconds
- Maximum 1000 results per request
Aparture handling:
- Automatically respects rate limits
- Batches requests for multiple categories
- Paginates large result sets
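To make the rate limiting and pagination concrete, here is a minimal sketch against the public arXiv API query endpoint, using only the standard library. It is not Aparture's code: the helper names are hypothetical, the page size and delay simply mirror the limits listed above, and combining all categories into one OR query is an assumption about how batching could work.

```python
import time
import urllib.parse
import urllib.request

API_URL = "http://export.arxiv.org/api/query"
REQUEST_INTERVAL = 3.0  # seconds between requests, per the limit above
PAGE_SIZE = 1000        # results per request, per the limit above


def build_query_url(categories, start=0, max_results=PAGE_SIZE):
    """Build one paginated query URL covering all given categories."""
    params = {
        "search_query": " OR ".join(f"cat:{c}" for c in categories),
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_pages(categories, total, page_size=PAGE_SIZE):
    """Yield raw Atom XML pages, pausing between consecutive requests."""
    for index, start in enumerate(range(0, total, page_size)):
        if index > 0:
            time.sleep(REQUEST_INTERVAL)  # stay within the rate limit
        url = build_query_url(categories, start, min(page_size, total - start))
        with urllib.request.urlopen(url) as response:
            yield response.read()


if __name__ == "__main__":
    print(build_query_url(["cs.LG", "stat.ML"], start=0, max_results=5))
```

Each yielded page is an Atom feed that still needs parsing, and a real client would also want retries and error handling, which are omitted here.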
Performance
More categories = longer fetching:
- 1-3 categories: ~1 minute
- 5-10 categories: ~2-3 minutes
- 20+ categories: ~5-10 minutes