Open Source Toolkit for Building AI Agents in 2026
It used to take a lot of effort to get your first PR merged in open source. Now you can ship something real in a weekend thanks to coding agents like Claude Code.
But that also means the noise went way up. I saw the wave of OpenClaw repos a while back, and the ratio of hype to actual maintenance was way off.
I have a habit of exploring new projects almost every day, and I've been doing it for 2+ years. Lately I have been deep in the AI agents space, and these are the repos that actually stuck.
These aren't random projects: I came across them along the way and built with some of them. They cover agent harnesses, the frontend stack, engineering skills for coding agents, voice agents, browser automation, computer use and a lot more. If you are building AI agents in 2026, this list is for you.
If you are new to open source, check out this free guide I made a while back. For any project you're considering, look for a CONTRIBUTING.md and a healthy community profile.
Let's jump in.
Categories
- Frontend & UI Layer (6)
- Skills & Plugins (4)
- Computer Use (6)
- Agent Orchestration (6)
- Coding Agent Harness (4)
- Open-Source Coding Agents (7)
- Browser Automation (5)
- Web Scraping & Ingestion (5)
- Multi-Agent Frameworks (6)
- Document Processing (7)
- Voice Agents (7)
- Visual Builders (6)
- MCP & Tool Integration (5)
- Sandboxing & Code Execution (5)
- Agent Memory (5)
- Testing & Evaluation (7)
- Monitoring & Observability (5)
Just remember, there's no specific order in this post. Every open source project is good in its own way.
1. Frontend & UI Layer - CopilotKit
CopilotKit is the frontend stack for agents. Most agent stacks give you the backend and leave the user-facing layer entirely to you.
CopilotKit provides the building blocks for that layer: chat components, hooks, headless UI for custom agent interfaces, persistent threads, human-in-the-loop, shared state and a built-in Inspector for debugging.
It supports all three generative UI patterns in one runtime. Basically, generative UI lets the agent show components rather than just describe them (Google's A2UI is one of these patterns).
The part I really like is their MCP server for coding agents, which lets them fetch live docs without any usage limit.
You can connect directly to any LLM in just a few lines, without any agent framework on the backend, and make it context-aware about your app. They also offer first-party integrations with 13+ major frameworks.
```typescript
// Next.js App Router route handler (e.g. app/api/copilotkit/route.ts)
import {
  CopilotRuntime,
  copilotRuntimeNextJSAppRouterEndpoint,
} from "@copilotkit/runtime";
import { BuiltInAgent } from "@copilotkit/runtime/v2";
import { NextRequest } from "next/server";

// Built-in agent that talks to the model directly, no agent framework needed
const builtInAgent = new BuiltInAgent({
  model: "openai:gpt-5.5",
});

const runtime = new CopilotRuntime({
  agents: { default: builtInAgent },
});

export const POST = async (req: NextRequest) => {
  const { handleRequest } = copilotRuntimeNextJSAppRouterEndpoint({
    runtime,
    endpoint: "/api/copilotkit",
  });
  return handleRequest(req);
};
```
You can enable multimodal attachments (images, PDFs, audio, video) and pass reasoningEffort to control how hard the model thinks.
The framework-agnostic design is what makes it practical. It's built on the AG-UI protocol, the open event protocol for agent-user interaction, now adopted by Google, AWS, Microsoft, LangChain and many more. So if you switch agent frameworks on the backend, everything on the frontend stays exactly the same.
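To see why a shared event protocol decouples the frontend from the backend, here is a minimal sketch of a UI-side reducer that folds a stream of AG-UI-style events into chat messages. The event names mirror a small subset of AG-UI's lifecycle and text-message events. The field shapes are simplified assumptions, not the official AG-UI type definitions.

```typescript
// Simplified subset of AG-UI-style events (field shapes are assumptions).
type AgentEvent =
  | { type: "RUN_STARTED"; runId: string }
  | { type: "TEXT_MESSAGE_START"; messageId: string; role: "assistant" }
  | { type: "TEXT_MESSAGE_CONTENT"; messageId: string; delta: string }
  | { type: "TEXT_MESSAGE_END"; messageId: string }
  | { type: "RUN_FINISHED"; runId: string };

interface ChatMessage {
  id: string;
  role: "assistant";
  content: string;
  done: boolean;
}

interface ChatState {
  running: boolean;
  messages: ChatMessage[];
}

// Pure reducer: the UI depends only on the event stream,
// never on which agent framework produced it.
function applyEvent(state: ChatState, event: AgentEvent): ChatState {
  switch (event.type) {
    case "RUN_STARTED":
      return { ...state, running: true };
    case "TEXT_MESSAGE_START":
      return {
        ...state,
        messages: [...state.messages, { id: event.messageId, role: event.role, content: "", done: false }],
      };
    case "TEXT_MESSAGE_CONTENT":
      return {
        ...state,
        messages: state.messages.map((m) =>
          m.id === event.messageId ? { ...m, content: m.content + event.delta } : m,
        ),
      };
    case "TEXT_MESSAGE_END":
      return {
        ...state,
        messages: state.messages.map((m) => (m.id === event.messageId ? { ...m, done: true } : m)),
      };
    case "RUN_FINISHED":
      return { ...state, running: false };
  }
}

// Replay a streamed run from any backend that emits these events.
const events: AgentEvent[] = [
  { type: "RUN_STARTED", runId: "r1" },
  { type: "TEXT_MESSAGE_START", messageId: "m1", role: "assistant" },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "m1", delta: "Hello, " },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "m1", delta: "world" },
  { type: "TEXT_MESSAGE_END", messageId: "m1" },
  { type: "RUN_FINISHED", runId: "r1" },
];

const finalState = events.reduce<ChatState>(applyEvent, { running: false, messages: [] });
console.log(finalState.messages[0].content); // prints "Hello, world"
```

Swapping LangGraph for another backend changes nothing here, as long as the new backend emits the same events.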
I have used CopilotKit in a couple of my recent projects, including a job search assistant built with LangChain Deep Agents, and it helped me surface everything the agent was doing under the hood.
CopilotKit has 31.5k stars on GitHub.
Alternatives
TanStack AI - Framework-agnostic, vendor-neutral AI SDK from the TanStack team. Strong TypeScript support, modular adapters per provider. Direct alternative to Vercel AI SDK without the Next.js coupling.
Vercel AI SDK - Good for streaming and tool calling in Next.js. Stateless and tied to the Vercel ecosystem.
Tambo - React SDK focused purely on generative UI. Still early and not the full agent-chat stack.
Assistant UI - Headless React primitives for building chat UIs.
agent-native - Framework from Builder.io where agent and UI share the same action model. Define actions once, expose them to both. No separate agent API - if the UI can do it, the agent can and vice versa.
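The "define once, expose to both" idea can be sketched without any framework. The registry below is a hypothetical illustration, not agent-native's actual API. Each action is declared once. A UI handler calls it directly, and the same definitions become a tool list an agent could call by name.

```typescript
// Hypothetical shared action registry: one definition serves both UI and agent.
interface Action<A> {
  name: string;
  description: string; // shown to the agent as the tool description
  run: (args: A) => string;
}

interface Todo {
  id: number;
  title: string;
  done: boolean;
}

const todos: Todo[] = [];

const addTodo: Action<{ title: string }> = {
  name: "addTodo",
  description: "Add a todo item with the given title",
  run: ({ title }) => {
    todos.push({ id: todos.length + 1, title, done: false });
    return `added #${todos.length}`;
  },
};

const completeTodo: Action<{ id: number }> = {
  name: "completeTodo",
  description: "Mark the todo with the given id as done",
  run: ({ id }) => {
    const todo = todos.find((t) => t.id === id);
    if (!todo) return `no todo #${id}`;
    todo.done = true;
    return `completed #${id}`;
  },
};

const registry: Record<string, Action<any>> = {
  [addTodo.name]: addTodo,
  [completeTodo.name]: completeTodo,
};

// UI path: a button handler calls the action directly.
function onAddClicked(title: string): string {
  return addTodo.run({ title });
}

// Agent path: the same registry becomes a tool list, and tool calls dispatch by name.
function toolList(): { name: string; description: string }[] {
  return Object.values(registry).map(({ name, description }) => ({ name, description }));
}

function handleToolCall(name: string, args: unknown): string {
  const action = registry[name];
  if (!action) return `unknown action: ${name}`;
  return action.run(args);
}

onAddClicked("Write docs");
handleToolCall("addTodo", { title: "Review PR" });
handleToolCall("completeTodo", { id: 1 });
console.log(todos.map((t) => `${t.id}:${t.title}:${t.done}`).join(", "));
// prints "1:Write docs:true, 2:Review PR:false"
```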
2. Skills & Plugins - agent-skills
Anthropic shipped the Skills format and the ecosystem took off fast. A lot of people are even saying MCP is dead because of Skills (I don't really believe so).
A skill is basically a directory with a SKILL.md file at its root. The file starts with YAML frontmatter (a name and a description) followed by instructions, and it can sit alongside scripts and resources that give the agent additional capabilities.
The official repo has 138k stars, and it's worth reading the engineering blog to understand how progressive disclosure works in practice.
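Progressive disclosure is easy to see in miniature. At startup the agent only needs each skill's name and description. The full body is loaded when a skill is actually selected. The sketch below uses a deliberately naive frontmatter parser that handles flat key: value lines only and is not a real YAML parser. It also uses a hypothetical in-memory list of skills instead of reading directories.

```typescript
// Naive SKILL.md handling: only flat "key: value" frontmatter lines are supported.
interface SkillMeta {
  name: string;
  description: string;
}

interface Skill {
  meta: SkillMeta;
  body: string;
}

function parseSkill(markdown: string): Skill {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("SKILL.md must start with a --- frontmatter block");
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  if (!fields["name"] || !fields["description"]) throw new Error("name and description are required");
  return { meta: { name: fields["name"], description: fields["description"] }, body: match[2].trim() };
}

// Startup: only name + description enter the system prompt.
function skillIndex(skills: Skill[]): string {
  return skills.map((s) => `- ${s.meta.name}: ${s.meta.description}`).join("\n");
}

// On demand: the full instructions are loaded only when the agent picks a skill.
function loadSkill(skills: Skill[], name: string): string | undefined {
  return skills.find((s) => s.meta.name === name)?.body;
}

const skills = [
  parseSkill(
    "---\nname: code-review\ndescription: Review a diff for bugs and style\n---\n# Code review\n1. Read the diff.\n2. Flag risky changes.",
  ),
];
console.log(skillIndex(skills)); // prints "- code-review: Review a diff for bugs and style"
```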
agent-skills by Addy Osmani is the gold standard: 23 production-grade engineering skills and 7 slash commands that map to the full dev lifecycle, including /spec, /plan, /build, /test, /review and /ship.
The skills come with hard exit criteria, anti-rationalization tables and progressive disclosure. They also encode Google engineering culture: Hyrum's Law, the Beyoncé Rule and trunk-based development.
It has 43.8k stars on GitHub. Here are all the skills included.
```
agent-skills/
├── skills/                            # 23 skills (22 lifecycle + 1 meta)
│   ├── interview-me/                  # Define
│   ├── idea-refine/                   # Define
│   ├── spec-driven-development/       # Define
│   ├── planning-and-task-breakdown/   # Plan
│   ├── incremental-implementation/    # Build
│   ├── context-engineering/           # Build
│   ├── source-driven-development/     # Build
│   ├── doubt-driven-development/      # Build
│   ├── frontend-ui-engineering/       # Build
│   ├── test-driven-development/       # Build
│   ├── api-and-interface-design/      # Build
│   ├── browser-testing-with-devtools/ # Verify
│   ├── debugging-and-error-recovery/  # Verify
│   ├── code-review-and-quality/       # Review
│   ├── code-simplification/           # Review
│   ├── security-and-hardening/        # Review
│   ├── performance-optimization/      # Review
│   ├── git-workflow-and-versioning/   # Ship
│   ├── ci-cd-and-automation/          # Ship
│   ├── deprecation-and-migration/     # Ship
│   ├── documentation-and-adrs/        # Ship
│   ├── shipping-and-launch/           # Ship
│   └── using-agent-skills/            # Meta: how to use this pack
├── agents/                            # 3 specialist personas
├── references/                        # 4 supplementary checklists
├── hooks/                             # Session lifecycle hooks
├── .claude/commands/                  # 7 slash commands (Claude Code)
├── .gemini/commands/                  # 7 slash commands (Gemini CLI)
└── docs/                              # Setup guides per tool
```
Alternatives
skills.sh - The npm for agent skills (a marketplace). Install any skill with npx skills add <owner/repo>. The leaderboard surfaces what developers are actually using rather than what got hyped on launch day.
taste-skill - Portable design taste skills (minimalist, brutalist, GPT-tuned) that fix generic-looking AI slop. It's one of the few skills that visibly changes what the agent produces, and I have been using it for a couple of months.
Repomix - Packs an entire repo into one AI-friendly file. Pick when you need the agent to see the whole codebase at once.
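At its core, packing a repo into one file means listing the files, wrapping each in a clearly delimited section, and emitting a single string. The sketch below shows only that core idea. It is a hypothetical illustration over an in-memory file map, and it does not reproduce Repomix's actual output format or options.

```typescript
// Hypothetical repo packer over an in-memory file map (path -> contents).
function packRepo(files: Record<string, string>, exclude: RegExp[] = []): string {
  const paths = Object.keys(files)
    .filter((p) => !exclude.some((re) => re.test(p)))
    .sort(); // stable ordering so the packed output is reproducible
  const tree = paths.map((p) => `  ${p}`).join("\n");
  const sections = paths.map((p) => `=== File: ${p} ===\n${files[p]}`);
  return [`Directory structure:\n${tree}`, ...sections].join("\n\n");
}

const packed = packRepo(
  {
    "src/index.ts": 'export const hello = () => "hi";',
    "README.md": "# Demo",
    "node_modules/x/index.js": "ignored",
  },
  [/^node_modules\//],
);
console.log(packed);
```

Putting the directory listing first gives the model a map of the codebase before it reads any file contents.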
3. Computer Use - UI-TARS Desktop
Most computer-use agents take a screenshot and ask a generalist VLM to guess pixel coordinates. UI-TARS was trained end-to-end on GUI grounding - it understands UI elements as a first-class concept rather than image regions to click on.
What I find really interesting is the "System-2 reflection" -- after each action, it compares before/after screenshots and generates a corrective plan if something didn't land right, instead of just going through a broken sequence.
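The control flow behind this kind of reflection can be sketched generically: act, compare the screen before and after the action, and push a corrective plan when the check fails. Everything below is hypothetical. The Screen strings stand in for screenshots, and the expect and replan callbacks stand in for model calls. It illustrates the loop, not UI-TARS's implementation.

```typescript
// Hypothetical reflective action loop: act, compare before/after, re-plan on failure.
type Screen = string; // stand-in for a screenshot

interface Step {
  action: string;
  expect: (before: Screen, after: Screen) => boolean; // did the action land?
}

interface Env {
  screen: Screen;
  act: (action: string) => void;
}

function runWithReflection(
  env: Env,
  plan: Step[],
  replan: (failed: Step, screen: Screen) => Step[], // corrective plan for a failed step
  maxRetries = 2,
): string[] {
  const log: string[] = [];
  const queue = [...plan];
  const retries = new Map<string, number>();
  while (queue.length > 0) {
    const step = queue.shift()!;
    const before = env.screen;
    env.act(step.action);
    const after = env.screen;
    if (step.expect(before, after)) {
      log.push(`ok: ${step.action}`);
      continue;
    }
    const attempts = (retries.get(step.action) ?? 0) + 1;
    retries.set(step.action, attempts);
    if (attempts > maxRetries) throw new Error(`giving up on: ${step.action}`);
    log.push(`reflect: ${step.action}`);
    queue.unshift(...replan(step, after)); // fix the problem, then retry the step
  }
  return log;
}

// Fake environment: a popup blocks the submit button until it is closed.
const env: Env = {
  screen: "form with popup",
  act(action: string) {
    if (action === "close popup") env.screen = "form";
    if (action === "click submit" && env.screen === "form") env.screen = "confirmation";
  },
};

const submit: Step = { action: "click submit", expect: (_b: Screen, a: Screen) => a === "confirmation" };
const log = runWithReflection(env, [submit], (failed: Step, screen: Screen) =>
  screen.includes("popup")
    ? [{ action: "close popup", expect: (_b: Screen, a: Screen) => !a.includes("popup") }, failed]
    : [failed],
);
console.log(log.join(" | ")); // prints "reflect: click submit | ok: close popup | ok: click submit"
```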
UI-TARS scores higher than Claude Computer Use on OSWorld. Personally, I believe practical usage matters a lot more than benchmarks. 😅
They also ship Agent TARS - a CLI and Web UI that brings the same vision + MCP tool integration to your terminal and browser.
You can say something like: "Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline." All the demos are in the README.
It has 34k stars on GitHub.
One very interesting repo I found is Sutando. It's a personal AI agent for macOS that runs on your Claude Code subscription with minimal extra cost.
The use cases are pretty wild. You can say "join my 2pm call" - it reads your calendar, joins Zoom via the desktop app or Google Meet via browser, takes screenshots to identify participants, does live research when someone asks a question and writes you a summary when the call ends.
Or you can call it from your phone, say "summon" - it opens Zoom with screen sharing and you control your computer by voice while walking around.
When you're not giving it tasks, Sutando runs an autonomous build loop - it monitors its own health, detects patterns in how you work, discovers new skills, and builds missing capabilities, which is crazy.
It has only around 300 stars on GitHub, but it is genuinely interesting.
Alternatives
Midscene - Also from ByteDance's Web Infra team. Vision-driven UI automation across web, Android, and iOS from one API. Integrates with Playwright and Puppeteer, ships a Chrome extension, CLI, and MCP server.
Agent-S - Hierarchical planning approach that builds a knowledge base from past interactions and uses it to plan future tasks. Benchmarks well on OSWorld and WindowsAgentArena.
Bytebot - Self-hosted AI desktop agent in a containerized Linux environment. The agent gets its own full virtual desktop: browser, file system, password manager, any app. Run docker-compose up and it's running.
cua - macOS/Linux VM sandbox so the agent runs on a virtual machine, not your real machine.
OpenHands - Full developer environment that can browse, write code, run tests, and commit PRs. Covered again under Coding Agents.
4. Agent Orchestration - LangGraph
LangGraph is the stateful graph runtime built on LangChain. It is the most mature framework for building, managing, and deploying long-running, stateful agents.
The loop is a graph. Every step is a node. State is typed and checkpointed. You can pause at any node, serialize the entire state to disk, resume on a different machine days later.
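Checkpointing is easy to illustrate outside LangGraph. The sketch below is a hypothetical mini graph runner, not LangGraph's API. Nodes are functions over a typed state. After each run the runner serializes a checkpoint (the state plus the next node) to JSON, which is enough to pause and resume later, even in another process.

```typescript
// Hypothetical mini graph runner with JSON checkpoints (not LangGraph's API).
interface State {
  question: string;
  notes: string[];
  answer?: string;
}

type NodeName = "research" | "draft" | "end";
type GraphNode = (s: State) => { state: State; next: NodeName };

const nodes: Record<Exclude<NodeName, "end">, GraphNode> = {
  research: (s) => ({ state: { ...s, notes: [...s.notes, `facts about ${s.question}`] }, next: "draft" }),
  draft: (s) => ({ state: { ...s, answer: `Answer based on ${s.notes.length} note(s)` }, next: "end" }),
};

interface Checkpoint {
  state: State;
  next: NodeName;
}

// Runs from a serialized checkpoint until "end" or `stopAfter` steps; returns a new checkpoint.
function run(checkpointJson: string, stopAfter = Infinity): string {
  let { state, next } = JSON.parse(checkpointJson) as Checkpoint;
  let steps = 0;
  while (steps < stopAfter) {
    if (next === "end") break;
    const result = nodes[next](state);
    state = result.state;
    next = result.next;
    steps++;
  }
  const cp: Checkpoint = { state, next };
  return JSON.stringify(cp);
}

const start = JSON.stringify({ state: { question: "LangGraph", notes: [] }, next: "research" });
const paused = run(start, 1); // stop after one node, e.g. to wait for human review
// ...later, possibly on another machine:
const finished = JSON.parse(run(paused)) as Checkpoint;
console.log(finished.state.answer); // prints "Answer based on 1 note(s)"
```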
They also ship Deep Agents - a coding agent harness on top of LangGraph with planning, filesystem tools, sub-agents, and context compression, if you want to skip writing the graph yourself.
Combined with other products like LangSmith Engine, LangChain and Deep Agents, it gives developers a full suite of tools for building agents, and the combination is super useful for debugging.
If you're confused, here's a simple distinction:
- LangChain - agents via chains and create_agent. Simple, fast to get started, less control over the state. The foundation on which everything else is built. If the process dies, the agent starts over.
- LangGraph - stateful graph runtime built on LangChain. You can replay from any checkpoint to debug what went wrong.
- Deep Agents - harness built on top of LangGraph.
LangGraph has 32.3k stars on GitHub.
Alternatives
Agno - Lightweight framework for agents that need persistent memory and multimodal inputs. Ships with AgentOS, a pre-built FastAPI server with sessions, streaming, RBAC and observability. Claims 529x faster instantiation than LangGraph.
Mastra - TypeScript-first with RAG, observability, MCP, and workflows baked in. Pick if your team lives in JS/TS rather than Python.
Pydantic AI - Type-safe agent framework from the Pydantic team. Pick when you want validated structured outputs without writing the validators yourself.
Google ADK - Google's official agent dev kit with native Vertex AI integration. Pick if you're building on Google Cloud.
PocketFlow - A 100-line LLM framework. Genuinely minimal. Pick when LangGraph feels like too much infrastructure.
5. Coding Agent Harness - Deep Agents
A harness is everything around the model that makes it an agent - tools, state, planning, memory, feedback loops, guardrails.
Put simply: Agent = Model + Harness. LangChain showed this matters more than most teams expect. Harness-layer changes alone moved the same model from 52.8% to 66.5% on Terminal Bench 2.0, jumping from the Top 30 to the Top 5 with no model change.
Deep Agents is LangChain's batteries-included harness built on LangGraph. Planning, filesystem tools, sub-agents, and context compression out of the box.
```
User goal
  ↓
Deep Agent (LangGraph StateGraph)
  ├─ Plan:     write_todos → updates "todos" in state
  ├─ Delegate: task(...) → runs a subagent with its own tool loop
  ├─ Context:  ls/read_file/write_file/edit_file → persists working notes/artifacts
  ↓
Final answer
```
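The state behind that diagram can be sketched as plain data. The tool names follow the diagram, but everything else is a simplified assumption for illustration, not the deepagents API. That includes the tool signatures, the virtual filesystem kept in agent state, and the stub subagent.

```typescript
// Hypothetical harness state: a todo list plus a virtual filesystem kept in agent state.
interface Todo {
  content: string;
  status: "pending" | "in_progress" | "completed";
}

interface HarnessState {
  todos: Todo[];
  files: Record<string, string>;
}

type Subagent = (task: string) => string; // runs its own loop, returns only a final answer

function makeTools(state: HarnessState, subagent: Subagent) {
  return {
    // Plan: replace the todo list in state.
    write_todos: (todos: Todo[]): string => {
      state.todos = todos;
      return `updated ${todos.length} todos`;
    },
    // Context: persist notes/artifacts outside the prompt.
    write_file: (path: string, content: string): string => {
      state.files[path] = content;
      return `wrote ${path}`;
    },
    read_file: (path: string): string => state.files[path] ?? `error: ${path} not found`,
    ls: (): string[] => Object.keys(state.files).sort(),
    // Delegate: the subagent's intermediate steps never reach the main context.
    task: (description: string): string => subagent(description),
  };
}

const state: HarnessState = { todos: [], files: {} };
const tools = makeTools(state, (t) => `summary of: ${t}`); // stub subagent
tools.write_todos([
  { content: "research competitors", status: "in_progress" },
  { content: "write report", status: "pending" },
]);
const findings = tools.task("research competitors");
tools.write_file("/notes/competitors.md", findings);
console.log(tools.ls().join(", ")); // prints "/notes/competitors.md"
```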
The core problem with long-running agents is that they accumulate tool call results until the context window fills -- causing context poisoning, distraction, and confusion.
Their fix:
- Large tool outputs go to a virtual filesystem instead of the prompt
- Skills load only frontmatter at startup, full content on demand
- Conversation history gets compressed as sessions grow
- Sub-agents run in their own context window, the main agent only gets the final result
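Two of those fixes can be sketched in a few lines: offloading large tool outputs and compacting old history. Several pieces here are hypothetical stand-ins, not Deep Agents' actual implementation: the character threshold, the /tool_results/ path scheme and the summarize callback. A real harness would budget in tokens and ask a model to write the summary.

```typescript
// Hypothetical context management for a long-running agent loop.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

const MAX_TOOL_CHARS = 200; // assumption: real harnesses budget in tokens, not characters

// Large tool outputs go to the virtual filesystem; the prompt only gets a pointer.
function offloadToolResult(files: Map<string, string>, callId: string, output: string): Message {
  if (output.length <= MAX_TOOL_CHARS) return { role: "tool", content: output };
  const path = `/tool_results/${callId}.txt`;
  files.set(path, output);
  return {
    role: "tool",
    content: `Output too large (${output.length} chars), saved to ${path}. Preview: ${output.slice(0, 80)}`,
  };
}

// When history grows past a limit, fold the oldest messages into one summary message.
function compactHistory(history: Message[], keepRecent: number, summarize: (old: Message[]) => string): Message[] {
  if (history.length <= keepRecent) return history;
  const old = history.slice(0, history.length - keepRecent);
  const recent = history.slice(history.length - keepRecent);
  return [{ role: "user", content: `Summary of earlier work: ${summarize(old)}` }, ...recent];
}

const files = new Map<string, string>();
const big = offloadToolResult(files, "call_1", "x".repeat(5000));
const history: Message[] = [
  { role: "user", content: "Find pricing for 10 vendors" },
  big,
  { role: "assistant", content: "Saved raw results; summarizing next." },
  { role: "user", content: "Great, continue" },
];
const compacted = compactHistory(history, 2, (old) => `${old.length} earlier messages`);
console.log(compacted.length, files.has("/tool_results/call_1.txt")); // prints 3 true
```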
You can build a lot on top of Deep Agents, like a deep research assistant.
Deep Agents has 23.1k stars on GitHub.
Alternatives
Hive - Outcome-driven agent development framework. Agents evolve based on whether they actually achieved the goal, not just whether they completed steps.
Browser Harness - From the Browser Use team. Self-healing harness that gives the LLM maximum freedom -- instead of wrapping Chrome with thousands of lines of heuristics, it lets the LLM use CDP directly and add its own tools when it needs them. Different philosophy from most browser frameworks.
Archon - Open-source harness builder for AI coding. Describe what you want and it generates a deterministic, repeatable agent harness for you.
6. Open-Source Coding Agents - OpenCode
I have used Claude Code and Codex extensively. Both are great but locked to their respective ecosystems.
OpenCode is the open-source alternative - terminal-native, 75+ provider support, LSP integration, multi-session (run multiple agents in parallel on the same project), privacy-first.
What makes it stand out: genuinely provider-agnostic from day one. You can switch between Claude, Gemini, GPT-5 and local models in the same session without reconfiguring anything. Most other coding agents have a preferred model set up in the defaults.
You can also share a link to any session for reference or to debug. It is available as a terminal interface, desktop app, and IDE extension - though I have only used the terminal.
OpenCode has 162k stars on GitHub.
Alternatives
- Codex (OpenAI) - OpenAI's official terminal coding agent. Pick when you want first-party support and the cleanest GPT-5 integration.
- Gemini CLI - Google's official terminal agent with 1M token context. The free tier is hard to beat for experimentation.
- Cline - VS Code extension with per-step approval. Pick if you want IDE-native control rather than a terminal.
- Aider - Git-native terminal pair programmer. 70%+ of Aider's own code is now written by Aider. Fast and model-agnostic.
- OpenHands - Full agentic developer environment that can browse, run shell, and commit PRs. Heavier than the others.
- Goose - Block's extensible coding agent with first-class MCP and a clean extension model.
7. Browser Automation - Browser Use
Browser Use gives your agent a browser. Point it at a URL, describe what you want done, and it clicks, types and navigates. Thanks to the LLM-first design, you write intent, not selectors, and the agent reads the DOM and figures out the interaction itself.
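A common way to let an LLM act on a page without hand-written selectors works in three steps. First, flatten the DOM into a numbered list of interactive elements. Then let the model answer with an index. Finally, map that index back to an element. The sketch below shows this general technique over a hypothetical simplified node type. It is not Browser Use's internal code.

```typescript
// Hypothetical simplified DOM node (a real agent would read this from the browser).
interface DomNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: DomNode[];
}

const INTERACTIVE = new Set(["a", "button", "input", "select", "textarea"]);

// Depth-first walk that collects interactive elements in document order.
function indexInteractive(root: DomNode): DomNode[] {
  const found: DomNode[] = [];
  const walk = (node: DomNode): void => {
    if (INTERACTIVE.has(node.tag)) found.push(node);
    node.children?.forEach(walk);
  };
  walk(root);
  return found;
}

// Compact text the LLM sees instead of raw HTML.
function describe(elements: DomNode[]): string {
  return elements
    .map((el, i) => {
      const label = el.text ?? el.attrs?.placeholder ?? el.attrs?.name ?? "";
      return `[${i}] <${el.tag}> ${label}`.trim();
    })
    .join("\n");
}

const page: DomNode = {
  tag: "body",
  children: [
    { tag: "h1", text: "Sign in" },
    { tag: "input", attrs: { name: "email", placeholder: "Email" } },
    { tag: "div", children: [{ tag: "button", text: "Continue" }] },
  ],
};

const elements = indexInteractive(page);
console.log(describe(elements));
// The model replies with an index such as 1, which maps back to elements[1].
```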
A big reason Browser Use is so good is that the team built LLMs specifically for browser tasks. Their bu-ultra model scores 97% on Mind2Web vs 62% for claude-opus-4-6.
The open source library works with any model but their custom ones are what the benchmarks run on.
They also have a desktop app that controls your local Chrome directly and Browser Use Box - a 24/7 Claude Code agent you can deploy on any $5 VPS and control via Telegram.
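```python
# pip install browser-use-sdk
import asyncio

from browser_use_sdk.v3 import AsyncBrowserUse


async def main() -> None:
    client = AsyncBrowserUse()  # assumes your Browser Use API key is configured
    result = await client.run(
        "Go to amazon.com, extract 200 products with name, price and reviews, save to products.csv"
    )
    print(result)


asyncio.run(main())
```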
It has 94k stars on GitHub.
Alternatives
Stagehand - Four primitives: act, extract, observe and agent. Deterministic step-by-step control when you need it, autonomous execution when you don't. Self-healing: "click submit" survives page redesigns because it's resolved by AI at runtime, not hardcoded selectors.
Playwright MCP - Microsoft's MCP server wrapping Playwright. Pick if you already write Playwright tests and want your agent to drive the same browser.
Skyvern - Uses a swarm of agents + computer vision to operate on sites it's never seen before. No XPaths, no selectors -- maps visual elements to actions in real-time. Also ships a no-code workflow builder.
Scrapling - Adaptive scraper that survives selector drift. It bypasses anti-bot systems like Cloudflare Turnstile out of the box and allows concurrent, multi-session crawls with automatic proxy rotation.
8. Web Scraping & Ingestion - Firecrawl
Agents need to pull content from the web constantly - research, monitoring, competitive intel, RAG pipelines.
Most scrapers give you raw HTML full of nav menus, ads, and cookie banners that burn tokens and confuse the model.
Firecrawl converts any website into clean LLM-ready Markdown or structured JSON. Three core endpoints that cover everything:
- /search for web search with content already extracted
- /scrape for full-page Markdown
- /extract for structured JSON via a natural language prompt
They also have an /agent endpoint: you describe what you want in natural language, and it searches, navigates and extracts across multiple sites autonomously. No URLs needed.
```typescript
import Firecrawl from '@mendable/firecrawl-js';
import { z } from 'zod';

const firecrawl = new Firecrawl({
  apiKey: 'fc-YOUR-API-KEY', // replace with your own key
});

// Zod schema describing the structured JSON you want back
const schema = z.object({
  companies: z.array(z.object({
    name: z.string(),
    founders: z.array(z.string()),
    funding: z.string().optional(),
    website: z.string(),
  })),
});

// No URLs: the agent searches, navigates and extracts on its own
const result = await firecrawl.agent({
  prompt: 'Get all YC W24 companies',
  schema: schema,
});
```
Their FIRE-1 navigation agent (beta) can autonomously navigate complex sites. It clicks, scrolls, fills forms and handles multi-step flows before extracting. Pages behind a login or pagination are no longer a blocker.
There's a lot more so feel free to explore. It has 122k stars on GitHub.
Alternatives
Gitingest - Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of the codebase. You can filter by file size and include/exclude specific paths, and it supports private repos too.
Crawl4AI - Open-source, self-hosted, no API key needed. Built specifically for RAG pipelines - LLM-aware chunking, BM25 content filtering, full-site crawling with depth control. Pick when you want full control without per-request fees.
Jina Reader - Prepend r.jina.ai/ to any URL and get clean Markdown. Zero setup, zero SDK. Pick for quick one-off page conversion or prototyping, where you don't want any configuration.
ScrapeGraphAI - Prompt-driven scraping. Describe what you want extracted in natural language and it builds the scraping workflow. Pick when you need structured JSON extraction, not just Markdown.
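The Gitingest and Jina Reader tricks are simple string rewrites. The minimal sketch below assumes only the rules as stated: swap github.com for gitingest.com, or prefix the target URL with r.jina.ai/. The helper names are hypothetical, and the sketch doesn't fetch anything.

```typescript
// Gitingest: https://github.com/owner/repo -> https://gitingest.com/owner/repo
function toGitingestUrl(githubUrl: string): string {
  const match = /^https?:\/\/(www\.)?github\.com\/(.+)$/.exec(githubUrl);
  if (!match) throw new Error(`not a GitHub URL: ${githubUrl}`);
  return `https://gitingest.com/${match[2]}`;
}

// Jina Reader: prefix any page URL with https://r.jina.ai/
function toJinaReaderUrl(pageUrl: string): string {
  return `https://r.jina.ai/${pageUrl}`;
}

console.log(toGitingestUrl("https://github.com/browser-use/browser-use"));
// prints "https://gitingest.com/browser-use/browser-use"
console.log(toJinaReaderUrl("https://example.com/blog/post"));
// prints "https://r.jina.ai/https://example.com/blog/post"
```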
9. Multi-Agent Frameworks - CrewAI
CrewAI is the most widely ado










