NousResearch/hermes-agent is an open-source autonomous agent framework that supports dynamic task extension and context adaptation, so its capabilities grow alongside user needs and task complexity.
This hands-on tutorial describes how the author built an interactive coaching tool for practicing AI system design. It provides reproducible code, deployment instructions, and usage guidance to help developers improve their AI system design skills through practice.
Hugging Face Transformers is a widely used open-source library that offers thousands of pretrained models and a unified API for inference and training across text, vision, audio, and multimodal tasks.
f/prompts.chat is an open-source, community-driven prompt-sharing platform, formerly Awesome ChatGPT Prompts, that organizations can self-host to keep their data private.
LangChain is an open-source agent engineering platform for building LLM-powered applications, with support for chains, tool integration, and autonomous agent development.
Dify is a production-ready open-source platform for building agentic workflows, with visual orchestration, model integration, and application deployment.
Ollama is an open-source tool for running large language models locally, with one-command setup of popular open-source and commercial models including Kimi-K2.6, GLM-5.1, Qwen, and Gemma, among others.
Langflow is an open-source, low-code visual platform for building, debugging, and deploying LLM-powered AI agents and workflows.
This open-source GitHub repository collects 100+ AI agent and RAG applications that developers can run locally, clone, customize, and deploy as ready-to-use, hands-on resources.
llama.cpp is a lightweight open-source project written in C/C++ that focuses on efficient local LLM inference on CPUs, with no GPU required.
vLLM is a high-throughput, memory-efficient inference and serving engine for large language models. It uses techniques such as PagedAttention to improve decoding speed and GPU memory utilization.
This open-source project demonstrates real-time UAV detection on the RK3588S SoC, using the NPU to accelerate a dual YOLOv8n model and reaching 42 FPS, as a practical example of edge AI deployment.
This paper introduces SciAgentArena, a systematic benchmark for evaluating AI agents on multi-scale scientific challenges. It addresses gaps in existing benchmarks around complexity, heterogeneity, and interactive, extended reasoning.
RhymeFlow is a training-free acceleration method for video generation. It improves the inference efficiency of Diffusion Transformers (DiTs) through asynchronous denoising flow scheduling, relaxing the requirement in standard diffusion pipelines that every step be strictly synchronized.
ClinHallu is a new benchmark for medical multimodal large language models (MLLMs). It diagnoses the source of hallucinations by reasoning stage (visual recognition, medical knowledge recall, or reasoning integration), filling a structural gap in fine-grained evaluation of medical hallucinations.
This paper identifies policy-level diversity as a new axis for improving rollout diversity in GRPO (Group Relative Policy Optimization) for LLMs. It shows that smaller models within the same family inherently exhibit higher policy-level diversity, and better pass@k, than their larger counterparts, offering a more coherent alternative to diversity methods based on token-level stochasticity.
This paper introduces RedAct, a method for redacting AI agent execution traces to protect procedural skills, and proposes CapTraceBench, a benchmark of 7 tasks, to quantify the risk of skill leakage from traces.
MBench is a comprehensive benchmark for evaluating the memory capability of video world models. It addresses a gap in assessing long-horizon temporal state consistency and internal memory modeling, emphasizing a model's ability to maintain a stable, plausible internal state rather than only visual fidelity or text-video alignment.
Orchestra-o1 is a research framework for orchestrating omnimodal agent swarms (for example text, vision, and audio). It targets the limited ability of current LLM-based orchestration systems to generalize to heterogeneous, interacting modalities.
This paper introduces Persona-Pruner, a model pruning technique that customizes lightweight language models for role-playing tasks, preserving persona consistency while substantially reducing computational cost.