This GitHub repository (Ollama) is an open-source tool for running large language models locally, with quick setup for mainstream open-source and commercial models including Kimi-K2.6, GLM-5.1, Qwen, and Gemma.
This article details the five-layer security architecture in Google's Agent Development Kit (ADK) for defending AI agents against prompt injection attacks, with concrete configuration steps and recommended mitigation practices.
LangChain is an open-source agent engineering platform for building LLM-powered applications, supporting chains, tool integration, and autonomous agents.
NousResearch/hermes-agent is an open-source autonomous agent framework designed to evolve with user needs, supporting multi-step reasoning and tool use for building scalable AI agent systems.
Dify is a production-ready open-source platform for building and deploying agentic workflows, with visual orchestration, model integration, and application publishing.
HazelJS 1.0.0 is a stable TypeScript framework designed for AI-native applications, offering LLM integration, agent orchestration, tool calling, and composable AI workflows as a production-grade toolchain for developers.
vLLM is a high-throughput, memory-efficient inference and serving engine for large language models, using techniques such as PagedAttention to improve decoding speed and GPU memory utilization.
Firecrawl is an open-source web data acquisition tool with a scalable API for searching, crawling, and interacting with the web at scale, optimized for AI applications such as RAG.
Langflow is an open-source, low-code visual platform for building, debugging, and deploying LLM-based AI workflows and agents.
This article explains how to integrate the mirrord tool into an AI-SRE (AI-driven site reliability engineering) workflow so that every AI-suggested fix is automatically verified against a live cluster before human review, making operational decisions more trustworthy.
This article examines the "silent failure" problem in AI agent systems. It argues that capacity fixes such as rate limit increases, retries, and caching improve uptime but do not guarantee correct results, distinguishes "uptime" from "correct uptime" as two separate service level objectives (SLOs), and calls for engineering practice that puts output quality first.
browser-use is an open-source library that gives AI agents a standardized, programmable way to interact with websites, bridging the gap between AI agents and real-world web environments for automating online tasks.
Anthropic publicly apologized for an undisclosed "Fable" distillation-based safety guardrail in Claude models; the hidden mechanism prompted broad debate in the AI community about transparency and safety governance.
LobeHub is an open-source AI agent orchestration platform that organizes multiple AI agents into an "AI team" able to run autonomously 24/7, with support for agent onboarding, scheduling, and performance reporting.
MoneyPrinterTurbo is an open-source project that uses large language models and multimodal AI to generate high-definition short videos with a single click, supporting local deployment and customization.
MuJoCo-Drones-Gym is an open-source, GPU-accelerated multi-drone simulation environment for control algorithm development and reinforcement learning training, compatible with the Gymnasium API.
This paper introduces EvoBrowseComp, a benchmark for evaluating search agents on dynamically evolving knowledge. It targets two weaknesses of static benchmarks such as BrowseComp, test-set contamination and parametric memorization, so that retrieval and reasoning ability can be assessed more faithfully.
EvoArena is a new benchmark suite for evaluating the memory evolution and robustness of LLM agents in dynamic environments, where terminals, software, and task conditions change over time. It addresses the gap between static benchmarks and real-world deployment.
Evoflux proposes a method for evolving executable tool workflows at inference time, aiming to improve the robustness and success rate of compact language models in dynamic tool-use scenarios. It targets common failures of small planners in tool discovery, dependency tracking, and execution validation.
This paper introduces Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that improves complex reasoning in large language models by retrieving analogical reasoning examples, addressing the limits of conventional retrieval based on semantic similarity.