NousResearch/hermes-agent is an open-source, extensible agent framework designed to evolve with user needs, supporting autonomous task planning and tool use.
Ollama is an open-source tool for running large language models locally, with one-command pull and run for popular open-source and commercially aligned models including Kimi-K2.6, GLM-5.1, Qwen, and Gemma.
Microsoft has open-sourced pg_durable, an in-database durable execution framework for PostgreSQL that supports long-running, fault-tolerant AI and data workflows with recoverable state. It fills a gap in traditional databases around stateful asynchronous task scheduling and is of practical value for AI engineering and MLOps.
Dify is a production-ready open-source platform for building and deploying agentic workflows, with visual orchestration, model integration, and application publishing.
This article presents practical defenses against inference theft and denial-of-wallet attacks on AI inference endpoints, including bot detection, guardrails, cost-aware routing, and budget controls.
This post announces the Gemma 4 quantization-aware training (QAT) models released by Google's Gemini team. The models are optimized to run efficiently on resource-constrained devices such as mobile phones and laptops, with an emphasis on model compression and inference efficiency.
Gemma 4 12B is a newly announced unified, encoder-free multimodal model designed to run efficiently on local hardware. The article is only a release announcement and provides no technical details, code, or usage guide.
Open WebUI is an open-source, user-friendly local AI interface that supports multiple model backends, including Ollama and the OpenAI API, for quick deployment and interactive use of large language models.
An open-source, LLM-powered stock analysis system covering the A-share, Hong Kong, and US markets. It integrates multi-source market data, real-time news, and an LLM-driven decision dashboard, and offers zero-cost scheduled runs with multi-channel notifications.
vLLM is a high-performance, memory-efficient inference and serving engine for large language models, designed to maximize throughput and reduce GPU memory overhead.
Langflow is an open-source, low-code platform for visually building, debugging, and deploying LLM-powered AI agents and workflows. It integrates with mainstream frameworks such as LangChain and suits developers doing rapid prototyping.
Firecrawl is an open-source web data acquisition tool that offers a scalable API for searching, scraping, and interacting with the web at scale, optimized for AI applications such as RAG.
This is a reflective technical essay on what human intervention during an AI agent's task, such as taking back the keyboard midway, reveals about the limits of current AI systems in controllability, interruptibility, and human-AI collaboration. It argues that existing evaluation frameworks such as the PMP do not cover these dimensions.
This post shares a practical case of cutting AI engineering costs by 62% on identical hardware and models by adding one key configuration to Claude Code. The configuration is not named explicitly but is implied to be caching, state reuse, or deterministic sampling. The emphasis is on hands-on optimization rather than prompt engineering.
This is an open-source Claude Code plugin that, with just two lines of code, forces Claude to keep every reply within 5 lines, making responses more concise and readable.
Lowfat is a lightweight, pluggable command-line filtering tool designed to reduce the tokens an LLM consumes when processing CLI output, with measured savings of up to 91.8%. It can be used as a shell wrapper or an agent hook and provides a plugin system customizable per command.
This content presents a hands-on approach to style fine-tuning a modern LLM so that it generates technical documentation in the style of 1995, such as early HTML guides and terminal manuals. It covers data preparation, LoRA fine-tuning, and prompt engineering.
LobeHub is an open-source AI agent orchestration platform that organizes multiple AI agents into an automated team running 24/7, with management-style functions such as agent "hiring", scheduling, and reporting.
AnythingLLM is an open-source, local-first LLM agent framework that supports private deployment, document ingestion, RAG, and multi-model backend integration, designed to give users full control over their AI workflows.
SABER is a new benchmark for LLM-based coding agents that evaluates operational safety in stateful project workspaces. It addresses a gap in existing safety evaluations, which overlook how the environment's state evolves, by shifting the assessment from isolated prompt refusal to the integrity of the workspace after execution.