rag-system-builder

Overview

Skill Key: alexfeng75/rag-system-builder
Author: alexfeng75
Source Repo: openclaw/skills
Version: -
Source Path: skills/alexfeng75/rag-system-builder
Latest Commit SHA: e7a0d5eab8134a6584af6eb3cbfbbd16714617c0

Extracted Content

SKILL.md excerpt

# RAG System Builder Skill

Build complete local RAG systems that work offline with document ingestion, semantic search, and AI-powered Q&A.

## 🎯 What This Skill Does

This skill guides you through building a complete RAG system that:
- **Ingests documents** from multiple formats (TXT, PDF, DOCX, MD, HTML, JSON, XML)
- **Generates embeddings** using sentence-transformers (offline, no API needed)
- **Stores vectors** locally using FAISS for fast similarity search
- **Provides a Q&A interface** through a CLI and a web interface
- **Works completely offline** once the embedding model has been downloaded; no external API calls are required
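
To make the "similarity search" step concrete, here is a minimal sketch of what an exact nearest-neighbor lookup computes. It is plain Python, not FAISS, and the names `normalize` and `top_k` are hypothetical helpers introduced for illustration. A FAISS flat inner-product index over unit-length vectors should return the same ranking, only much faster.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to the query."""
    q = normalize(query)
    scored = []
    for i, vec in enumerate(vectors):
        v = normalize(vec)
        scored.append((i, sum(a * b for a, b in zip(q, v))))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; all-MiniLM-L6-v2 vectors have 384 dimensions.
    stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    for idx, score in top_k([1.0, 0.0, 0.0], stored, k=2):
        print(idx, round(score, 3))
```

The brute-force loop is linear in the number of stored vectors, which is the cost a vector index is meant to reduce.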

## 📦 Prerequisites

```bash
# Python 3.8+ required
python --version

# Install dependencies
pip install sentence-transformers faiss-cpu click flask
```

## 🚀 Quick Start

### 1. Create Project Structure

```bash
# Create project directory
mkdir rag-system
cd rag-system

# Create main files
touch rag.py embeddings.py vector_store.py retriever.py config.py
```

### 2. Download Embedding Model

```bash
# Download sentence-transformers model locally
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='sentence-transformers/all-MiniLM-L6-v2', local_dir='./models/all-MiniLM-L6-v2')"
```
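
The download above is the one step that needs a network connection. Below is a hedged sketch of how a program might prefer the downloaded copy afterwards. `resolve_model_source` and `enable_offline_mode` are hypothetical helper names. `HF_HUB_OFFLINE` is an environment variable read by `huggingface_hub`, and it should be set before that library is imported.

```python
import os
from pathlib import Path


def resolve_model_source(local_path: str, hub_name: str) -> str:
    """Return the local model directory if it exists and is non-empty, else the Hub name."""
    path = Path(local_path)
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return hub_name


def enable_offline_mode() -> None:
    """Ask the Hugging Face libraries not to make network requests."""
    os.environ["HF_HUB_OFFLINE"] = "1"


if __name__ == "__main__":
    source = resolve_model_source(
        "./models/all-MiniLM-L6-v2",
        "sentence-transformers/all-MiniLM-L6-v2",
    )
    if os.path.isdir(source):
        # A local copy exists, so network access can be switched off.
        enable_offline_mode()
    print("Loading model from:", source)
    # With sentence-transformers installed, the model could then be loaded with:
    #   from sentence_transformers import SentenceTransformer
    #   model = SentenceTransformer(source)
```

Falling back to the Hub name keeps a first run working on a connected machine, while later runs stay fully local.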

### 3. Configure System

Create `config.py`:

```python
import os
from dataclasses import dataclass

@dataclass
class Config:
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    local_model_path: str = "./models/all-MiniLM-L6-v2"
    chunk_size: int = 512
    chunk_overlap: int = 128
    vector_store_path: str = "vector_store"
    default_top_k: int = 5
    supported_formats: tuple = (".txt", ".pdf", ".docx", ".md", ".html", ".json", ".xml")

# Shared instance; other modules import it with `from config import config`.
config = Config()
```
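
The excerpt does not say how `chunk_size` and `chunk_overlap` are applied, so the following is only a sketch of one common interpretation: a sliding window measured in characters (the real skill may count tokens or words instead). The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 128) -> List[str]:
    """Split text into windows of chunk_size characters.

    Each window shares its first chunk_overlap characters with the end of
    the previous window.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(1000))
    print([len(chunk) for chunk in chunk_text(sample)])
```

With the defaults, the window advances 384 characters at a time, so a 1000-character text yields three chunks starting at offsets 0, 384, and 768. The overlap means a sentence cut at one chunk boundary still appears whole in the neighboring chunk.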

### 4. Build Core Components

#### Embeddings Module (`embeddings.py`)

```python
import os
import numpy as np
from typing import List
from sentence_transformers import SentenceTransformer
from config import config

class EmbeddingModel:
    def __init__(self, model_name: str = None):
        self.model_name = m...
```

README excerpt

# RAG System Builder Skill

Build complete local RAG (Retrieval-Augmented Generation) systems that work offline with document processing, semantic search, and AI-powered Q&A.

## 🎯 What This Skill Does

This skill provides step-by-step guidance for building a complete RAG system from scratch:

- **Document Ingestion**: Support for TXT, PDF, DOCX, MD, HTML, JSON, XML
- **Embedding Generation**: Using sentence-transformers (offline, no API needed)
- **Vector Storage**: Local FAISS index for fast similarity search
- **Q&A Interface**: CLI and optional web interface
- **Completely Offline**: No external API calls required
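
As an illustration of the ingestion step, the sketch below collects files whose extensions match the supported formats listed above. `find_documents` and `SUPPORTED_FORMATS` are hypothetical names. Extracting text from PDF or DOCX files would need additional parsers that are not part of the dependency list shown here, so that part is left out.

```python
from pathlib import Path
from typing import List, Sequence

# Extensions taken from the supported-format list in this README.
SUPPORTED_FORMATS = (".txt", ".pdf", ".docx", ".md", ".html", ".json", ".xml")


def find_documents(
    docs_path: str,
    supported_formats: Sequence[str] = SUPPORTED_FORMATS,
) -> List[Path]:
    """Recursively list files under docs_path whose extension is supported."""
    root = Path(docs_path)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in supported_formats
    )


if __name__ == "__main__":
    for path in find_documents("."):
        print(path)
```

Lower-casing the suffix means files such as `REPORT.PDF` are picked up as well.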

## 🚀 Quick Start

### 1. Install Dependencies

```bash
pip install sentence-transformers faiss-cpu click flask
```

### 2. Download Embedding Model

```bash
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='sentence-transformers/all-MiniLM-L6-v2', local_dir='./models/all-MiniLM-L6-v2')"
```

### 3. Build and Use

```bash
# Ingest documents
python rag.py ingest --docs-path ./my-documents

# Query documents
python rag.py query --query "What is machine learning?"
```
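
The two commands above imply a CLI with `ingest` and `query` subcommands. The skill installs `click` for this. The sketch below uses the standard library's `argparse` instead so that it runs without extra packages, and the command bodies are placeholders rather than the skill's actual implementation.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Define the `ingest` and `query` subcommands and their options."""
    parser = argparse.ArgumentParser(prog="rag.py", description="Local RAG command-line interface")
    subparsers = parser.add_subparsers(dest="command", required=True)

    ingest = subparsers.add_parser("ingest", help="Index the documents under a directory")
    ingest.add_argument("--docs-path", required=True, help="Directory containing documents")

    query = subparsers.add_parser("query", help="Ask a question against the index")
    query.add_argument("--query", required=True, help="Question text")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse arguments and dispatch to the matching placeholder action."""
    args = build_parser().parse_args(argv)
    if args.command == "ingest":
        # Placeholder: a real implementation would chunk, embed, and index the files here.
        return f"would ingest documents from {args.docs_path}"
    # Placeholder: a real implementation would embed the query and search the index.
    return f"would search the index for: {args.query}"


if __name__ == "__main__":
    print(main())
```

Returning a string from `main` keeps the dispatch logic easy to exercise without touching the file system.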

## 📦 What's Included

This skill provides:

1. **Complete Code Templates**
   - `rag.py` - CLI interface
   - `embeddings.py` - Embedding generation
   - `vector_store.py` - FAISS storage
   - `retriever.py` - Search functionality
   - `config.py` - Configuration

2. **Step-by-Step Instructions**
   - Project setup
   - Model downloading
   - Component implementation
   - Testing and deployment

3. **Usage Examples**
   - Basic workflow
   - Advanced usage
   - Troubleshooting guide

## 🎯 Use Cases

- **Document Q&A**: Ask questions about your documents
- **Knowledge Base**: Search through document libraries
- **Research Assistant**: Find relevant information quickly
- **Offline AI**: Work without internet connection

## 📚 Requirements

- Python 3.8+
- 2GB+ disk space for embedding model
- RAM: scales with the total volume of indexed documents
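
For a rough sense of the memory requirement, the sketch below estimates the raw size of the stored vectors. It assumes an uncompressed index holding one float32 vector per chunk and uses the 384-dimensional output of all-MiniLM-L6-v2. Chunk text and metadata add to this figure. `estimate_index_bytes` is a hypothetical helper.

```python
def estimate_index_bytes(num_chunks: int, dim: int = 384, bytes_per_value: int = 4) -> int:
    """Raw size in bytes of num_chunks float32 vectors of the given dimension."""
    return num_chunks * dim * bytes_per_value


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        mib = estimate_index_bytes(n) / (1024 ** 2)
        print(f"{n:>9,} chunks: {mib:,.1f} MiB")
```

Under these assumptions each chunk costs 1,536 bytes, so 100,000 chunks come to about 146 MiB of vectors.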

## 🤝 Contributing

This skill is...
