Overview
- Skill Key: camopel/arxivkb
- Author: camopel
- Source Repo: openclaw/skills
- Version: -
- Source Path: skills/camopel/arxivkb
- Latest Commit SHA: 16bd32a36d5182a59ea9aa2069b6f96db31e86bf
Local arXiv paper manager with semantic search. Crawls arXiv categories, downloads PDFs, chunks content, and indexes with FAISS + Ollama embeddings. No cloud API keys required — everything runs locally.
Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC
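Copy the prompt below and send it to your AI assistant to complete the installation.
First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the arxivkb skill. If it is already installed, install the arxivkb skill directly.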
# ArXivKB: Science Knowledge Base

## Why This Skill?

🏠 **100% local**: crawls arXiv's free API, embeds with Ollama (nomic-embed-text), indexes in FAISS + SQLite. No cloud cost.

🔍 **Semantic search on paper content**: FAISS indexes PDF chunks (not just abstracts), so you find papers by what they contain.

📂 **arXiv category-based**: tracks official arXiv categories (155 available, 8 groups). No free-text queries.

🧹 **Auto-cleanup**: configurable expiry deletes old papers, PDFs, and chunks.

## Install

```bash
python3 scripts/install.py
```

Works on **macOS and Linux**. Installs Python deps (`faiss-cpu`, `pdfplumber`, `tiktoken`, `arxiv`, `numpy`), pulls `nomic-embed-text` via Ollama, creates data directories and DB.

### Prerequisites

- **Ollama**: must be installed and running (`ollama serve`)
- **Python 3.10+**

## Quick Start

```bash
# 1. Add arXiv categories to track
akb categories add cs.AI cs.CV cs.LG

# 2. Browse all available categories
akb categories browse

# 3. Ingest recent papers (last 7 days)
akb ingest

# 4. Check stats
akb stats
```

## Categories

```bash
akb categories list             # Show enabled categories
akb categories browse           # Browse all 155 arXiv categories
akb categories browse robotics  # Filter by keyword
akb categories add cs.AI cs.RO  # Enable categories
akb categories delete cs.AI     # Disable a category
```

Categories are official arXiv codes (e.g. `cs.AI`, `eess.IV`, `q-fin.ST`). The full taxonomy is built in.

## Ingestion

```bash
akb ingest            # Crawl, download PDFs, chunk, embed
akb ingest --days 14  # Look back 14 days
akb ingest --dry-run  # Preview only
akb ingest --no-pdf   # Index abstracts only (faster)
```

Pipeline: arXiv API → PDF download → text extraction (pdfplumber) → chunking (tiktoken, 500 tokens, 50 overlap) → embedding (Ollama nomic-embed-text) → FAISS + SQLite.

## Paper Details

```bash
akb paper 2401.12345...
```
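The chunking step of the ingestion pipeline described above (500 tokens per chunk, 50 tokens of overlap) can be illustrated with a minimal sketch. This is not the skill's actual code: `chunk_tokens` is a hypothetical helper, and it works on any already-tokenized list, whereas the real pipeline is stated to tokenize with tiktoken. How the skill treats the final short chunk is not documented here, so keeping it is an assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], size: int = 500, overlap: int = 50) -> List[List[T]]:
    """Split a token sequence into windows of `size` tokens that share
    `overlap` tokens with the previous window (hypothetical helper)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    step = size - overlap  # each new window starts this many tokens later
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    fake_tokens = list(range(1200))  # stand-in for real token ids
    for chunk in chunk_tokens(fake_tokens):
        print(chunk[0], chunk[-1], len(chunk))
```

With the defaults, consecutive windows start 450 tokens apart, so the last 50 tokens of one chunk reappear at the start of the next. That overlap is what lets a sentence that straddles a boundary still appear whole in at least one chunk.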
# arxivkb

An arXiv paper crawler with local semantic search (FAISS), topic management, and optional LLM summarization. All embedding is done locally, with no cloud APIs required.

Powers the **🔬 ArXiv** app in [PrivateApp](https://github.com/camopel/PrivateApp).

## Install

```bash
python3 scripts/install.py
```

This will:

- Install Python dependencies (`faiss-cpu`, `pdfplumber`, `arxiv`, `numpy`, `tiktoken`)
- Pull the default embedding model via Ollama (`nomic-embed-text`)
- Create the data directory at `~/workspace/arxivkb/`
- Set up a SQLite database with default arXiv categories
- Schedule a daily ingest cron (systemd timer on Linux, launchd on macOS)

## Usage

### Manage topics (arXiv categories)

```bash
# Browse available categories
akb topics browse
akb topics browse "machine learning"

# List enabled categories
akb topics list

# Enable categories
akb topics add cs.AI cs.CV cs.RO stat.ML

# Disable a category
akb topics delete cs.AI
```

### Ingest papers

```bash
# Ingest papers from the last 7 days
akb ingest --days 7

# Dry run (show what would be fetched)
akb ingest --days 3 --dry-run

# Expire old papers
akb expire --days 30
```

### Search papers

```bash
# Semantic search (requires embedding model)
python3 scripts/search.py "transformer attention mechanism" --top 10

# Paper details
akb paper 2310.00001
```

### Stats

```bash
akb stats
```

## Data Directory

Papers are stored in `~/workspace/arxivkb/`:

- `arxivkb.db`: SQLite database (papers, chunks, categories)
- `pdfs/`: Downloaded PDF files
- `faiss/`: FAISS vector index files
- `config.json`: Per-user configuration

## Embedding Models

By default, ArXivKB uses `nomic-embed-text` via [Ollama](https://ollama.ai). Make sure Ollama is running:

```bash
ollama serve
ollama pull nomic-embed-text
```

Alternative models can be configured in `~/workspace/arxivkb/config.json`.

## Background Service

The installer schedules daily paper ingestion:

```bash
# Linux: systemd timer
systemctl --user...
```
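To give a feel for what the semantic search under "Search papers" does conceptually, here is a small pure-Python sketch of top-k retrieval by cosine similarity. It is only a stand-in: the skill itself is described as using FAISS with Ollama `nomic-embed-text` embeddings, and neither the index type nor the similarity metric it uses is stated in this document. The names `normalize` and `top_k` and the toy 2-D vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query: Sequence[float],
          vectors: Sequence[Sequence[float]],
          k: int = 10) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar
    to the query, best first. Brute force: every vector is scored."""
    q = normalize(query)
    scored = []
    for idx, vec in enumerate(vectors):
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "chunk embeddings"; real ones would come from the embedding model.
    chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for idx, score in top_k([1.0, 0.1], chunk_vectors, k=2):
        print(idx, round(score, 3))
```

A vector index such as FAISS exists to avoid rescanning every stored vector on each query as this sketch does. The ranking idea is the same: embed the query, then return the stored chunks whose vectors lie closest to it.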
heyixuan2
Bambu Lab 3D printer control and automation. Activate when user mentions: printer status, 3D printing, slice, analyze model, generate 3D, AMS filament, print monitor, Bambu Lab, or any 3D printing task. Full pipeline: search → generate → analyze → colorize → preview → open BS → user slice → print → monitor. Supports all 9 Bambu Lab printers (A1 Mini, A1, P1S, P2S, X1C, X1E, H2C, H2S, H2D).
openstockdata
OpenClaw Skill for stock data analysis
capt-marbles
Generative Engine Optimization (GEO) for AI search visibility. Optimize content to appear in ChatGPT, Perplexity, Claude, and Google AI Overviews. Use when optimizing websites, pages, or content for LLM discoverability and citation.
camelsprout
DuckDB CLI specialist for SQL analysis, data processing and file conversion. Use for SQL queries, CSV/Parquet/JSON analysis, database queries, or data conversion. Triggers on "duckdb", "sql", "query", "data analysis", "parquet", "convert data".
calvinxhk
Role