What I Actually Learned Building with 5 Agentic AI Frameworks
I explored OpenAI, CrewAI, LangGraph, Autogen, and MCP frameworks by building real projects. This post shares insights, pros, cons, and use cases to help you pick the right one.

TL;DR
I spent weeks building projects with five different Agentic AI frameworks: OpenAI Agents SDK, CrewAI, LangGraph, Autogen, and MCP. Here's the honest truth: there's no perfect framework. Each one clicked for different reasons.
OpenAI SDK and CrewAI were the easiest to pick up. I was productive almost immediately. LangGraph gave me the most control when I needed to map out exactly how my agent should think. MCP solved a real headache when I needed to plug into multiple external services. Autogen? Still wrapping my head around it, but its distributed architecture is seriously impressive for the right use case.
Bottom line: pick the framework that matches your problem, not the one with the most hype.
Introduction
"Which framework should I use?"
I kept asking myself this question when I started diving into agentic AI. The landscape is crowded, and everyone seems to have a strong opinion. So instead of reading more blog posts, I decided to actually build things.
Here's what I worked on:
- OpenAI Agents SDK - A deep research system with output guardrails
- CrewAI - A stock picker with a hierarchical team of agents
- LangGraph - A "Sidekick" assistant that browses the web and evaluates its own work
- Autogen - A world where agents create other agents dynamically
- MCP - A trading floor where AI traders connect to market data, accounts, and search services
What follows is what I actually learned—the good parts, the frustrating bits, and when you'd actually want to use each one.
1. OpenAI Agents SDK — When You Just Want It to Work
The Project: Deep Research with Guardrails
I built a research assistant that plans searches, runs them in parallel, and writes up findings. The twist? It has guardrails that check the output quality before returning results.
What the Code Looks Like
Setting up an agent is dead simple:
from agents import Agent, WebSearchTool, ModelSettings

search_agent = Agent(
    name="Search agent",
    instructions=INSTRUCTIONS,
    tools=[WebSearchTool(search_context_size="low")],
    model="gpt-4o-mini",
    model_settings=ModelSettings(tool_choice="required"),
)

That's it. No ceremony, no boilerplate maze. I defined what I wanted, and it worked.
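The "runs them in parallel" part is ordinary asyncio fan-out. Here's a self-contained sketch of the pattern; `run_search` is a stand-in for the real `Runner.run(search_agent, item.query)` call, and the plan items are illustrative:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class WebSearchItem:
    reason: str
    query: str

async def run_search(item: WebSearchItem) -> str:
    # Stand-in for `await Runner.run(search_agent, item.query)` in the real project.
    await asyncio.sleep(0)
    return f"summary for: {item.query}"

async def perform_searches(plan: list[WebSearchItem]) -> list[str]:
    # Fire every search at once; gather preserves the plan's order.
    return await asyncio.gather(*(run_search(item) for item in plan))

plan = [
    WebSearchItem(reason="context", query="agentic AI frameworks"),
    WebSearchItem(reason="depth", query="OpenAI Agents SDK guardrails"),
]
results = asyncio.run(perform_searches(plan))
```

Each search is its own coroutine, so slow queries don't block fast ones.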
For structured outputs, Pydantic plays nicely:
from pydantic import BaseModel, Field

class WebSearchItem(BaseModel):
    reason: str = Field(description="Why this search matters.")
    query: str = Field(description="The actual search term.")

class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem]

planner_agent = Agent(
    name="PlannerAgent",
    instructions=INSTRUCTIONS,
    model="gpt-4o-mini",
    output_type=WebSearchPlan,  # Type-safe outputs!
)

The Guardrail Feature Stood Out
This is where things got interesting. I could attach guardrails that validate outputs before they're returned:
from agents import output_guardrail, GuardrailFunctionOutput, RunContextWrapper

@output_guardrail
async def guardrail_output_length(ctx: RunContextWrapper, agent: Agent, output: ReportData) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, output.markdown_report, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info={"output_too_long": result.final_output.reason},
        tripwire_triggered=False,
    )

writer_agent = Agent(
    name="WriterAgent",
    output_type=ReportData,
    output_guardrails=[guardrail_output_length],
)

I used this to check whether reports were too long, but you could validate anything: tone, accuracy, or safety. It's baked into the framework, not bolted on.
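Stripped of the SDK types, the tripwire idea is just "run a check, return a verdict, optionally halt". A minimal sketch of the concept (the word limit and dict shape are mine, not the SDK's):

```python
MAX_WORDS = 1000  # arbitrary threshold for illustration

def length_guardrail(report: str) -> dict:
    # Mirrors the shape of GuardrailFunctionOutput: some diagnostic info plus
    # a tripwire flag that, when True, stops the result from being returned.
    too_long = len(report.split()) > MAX_WORDS
    return {"output_info": {"output_too_long": too_long}, "tripwire_triggered": too_long}
```

The framework's version wraps this in types and runs it automatically on every output; the logic you write stays this small.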
The Good Stuff
- Fast to learn — I was building within an hour
- Structured outputs — Pydantic integration means type safety without extra work
- Guardrails — Built-in quality control with tripwire mechanisms
- Tracing — Debugging with trace IDs saved me hours
- Async-first — Parallel operations just work
The Rough Edges
- You're in OpenAI's world — Primarily designed for their models
- Fewer knobs to turn — When I wanted fine control, I hit limits
- Moving target — The SDK is newer, docs sometimes lag
When to Reach for It
If you need a research assistant, content generator, or anything that benefits from output validation, and you're already using OpenAI, this framework gets out of your way.
2. CrewAI — Thinking in Teams
The Project: Stock Picker
I wanted a system where different agents have different jobs: one finds trending companies, another researches them, and a final one picks the best investment. CrewAI's "crew" metaphor made this natural to build.
What the Code Looks Like
CrewAI uses decorators and YAML config, which keeps things organized:
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task

@CrewBase
class StockPicker:
    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    @agent
    def trending_company_finder(self) -> Agent:
        return Agent(
            config=self.agents_config["trending_company_finder"],
            tools=[SerperDevTool()],
            memory=True,
        )

    @agent
    def financial_researcher(self) -> Agent:
        return Agent(
            config=self.agents_config["financial_researcher"],
            tools=[SerperDevTool()],
        )

    @task
    def find_trending_companies(self) -> Task:
        return Task(
            config=self.tasks_config["find_trending_companies"],
            output_pydantic=TrendingCompanyList,
        )

YAML Config Changed How I Worked
Being able to tweak agent personalities without touching Python felt liberating:
trending_company_finder:
  role: >
    Financial News Analyst that finds trending companies in {sector}
  goal: >
    You read the latest news, then find 2-3 companies that are trending.
  backstory: >
    You are a market expert with a knack for picking out interesting companies.
  llm: openai/gpt-4o-mini
I could iterate on prompts fast, and non-technical teammates could understand what each agent does.
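Under the hood, `kickoff(inputs={...})` fills the `{sector}` placeholders in the YAML before the agents run. Conceptually it's plain string formatting; the dict below stands in for the parsed YAML file:

```python
# Stand-in for the parsed agents.yaml (illustrative, not the full config):
agents_config = {
    "trending_company_finder": {
        "role": "Financial News Analyst that finds trending companies in {sector}",
    }
}

def render(template: str, inputs: dict) -> str:
    # What CrewAI does conceptually with the inputs passed to kickoff().
    return template.format(**inputs)

role = render(agents_config["trending_company_finder"]["role"], {"sector": "Technology"})
```

That's why the same crew can analyze any sector: the prompts are templates, and the run-time inputs fill the blanks.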
The Memory System is No Joke
This blew me away. CrewAI has three types of memory built in:
from crewai.memory import LongTermMemory, ShortTermMemory, EntityMemory
from crewai.memory.storage.rag_storage import RAGStorage
from crewai.memory.storage.ltm_sqlite_storage import LTMSQLiteStorage

@crew
def crew(self) -> Crew:
    return Crew(
        agents=self.agents,
        tasks=self.tasks,
        process=Process.hierarchical,
        manager_agent=manager,
        memory=True,
        long_term_memory=LongTermMemory(
            storage=LTMSQLiteStorage(db_path="./memory/long_term_memory_storage.db")
        ),
        short_term_memory=ShortTermMemory(
            storage=RAGStorage(
                embedder_config={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
                type="short_term",
                path="./memory/",
            )
        ),
        entity_memory=EntityMemory(
            storage=RAGStorage(
                embedder_config={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
                type="short_term",
                path="./memory/",
            )
        ),
    )

Long-term memory persists across sessions. Short-term uses RAG. Entity memory tracks key information about people, companies, and concepts. This is enterprise-grade stuff.
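To make "persists across sessions" concrete: long-term memory is ultimately rows in a SQLite file that outlive the process. The sketch below shows the idea only; CrewAI's `LTMSQLiteStorage` has its own schema, and the table here is made up:

```python
import sqlite3

# Illustrative schema, not CrewAI's actual one. The real store writes to a
# file (e.g. long_term_memory_storage.db) so a new process can read it back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (task TEXT, result TEXT)")
conn.execute("INSERT INTO memories VALUES (?, ?)", ("find_trending_companies", "remembered result"))
conn.commit()

row = conn.execute(
    "SELECT result FROM memories WHERE task = ?", ("find_trending_companies",)
).fetchone()
```

Swap `:memory:` for a file path and the "memory" survives restarts, which is exactly what lets a crew recall last week's picks.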
The Good Stuff
- Role-based thinking — Matches how we actually organize teams
- YAML separation — Config lives apart from logic
- Hierarchical processes — Manager-worker patterns built in
- Memory that actually works — Long-term, short-term, entity—all covered
- Solid tool ecosystem — SerperDev and others ready to go
The Rough Edges
- Opinionated — Fighting the structure gets painful
- Debugging hierarchies — When the manager delegates incorrectly, good luck
- Resource hungry — Memory systems add overhead
When to Reach for It
If you're building something that naturally maps to a team of analysts, researchers, writers, and reviewers, CrewAI makes sense. Especially if you need agents to remember things across sessions.
3. LangGraph — When You Need a Map
The Project: Sidekick Assistant
I built an assistant that browses the web, runs Python code, and (here's the key part) evaluates its own work, looping back if the result isn't good enough. LangGraph's graph structure made this feedback loop explicit.
What the Code Looks Like
You literally define nodes and edges:
from typing import Annotated, Any, List, Optional, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

class State(TypedDict):
    messages: Annotated[List[Any], add_messages]
    success_criteria: str
    feedback_on_work: Optional[str]
    success_criteria_met: bool
    user_input_needed: bool

async def build_graph(self):
    graph_builder = StateGraph(State)

    # Nodes = things that happen
    graph_builder.add_node("worker", self.worker)
    graph_builder.add_node("tools", ToolNode(tools=self.tools))
    graph_builder.add_node("evaluator", self.evaluator)

    # Edges = how we move between them
    graph_builder.add_conditional_edges(
        "worker",
        self.worker_router,
        {"tools": "tools", "evaluator": "evaluator"},
    )
    graph_builder.add_edge("tools", "worker")
    graph_builder.add_conditional_edges(
        "evaluator",
        self.route_based_on_evaluation,
        {"worker": "worker", "END": END},
    )
    graph_builder.add_edge(START, "worker")

    self.graph = graph_builder.compile(checkpointer=self.memory)

Reading this, I can see the flow. The worker does something, maybe uses tools, then the evaluator checks it. If it's not good, back to the worker. If it's good, we're done.
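The two router functions aren't shown above. Here's a minimal sketch of what they can look like; the field names match the `State` definition, but the exact logic is my reconstruction, not the project's code:

```python
def worker_router(state: dict) -> str:
    # If the LLM's last message requested tool calls, run the tools node;
    # otherwise hand the draft to the evaluator.
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return "evaluator"

def route_based_on_evaluation(state: dict) -> str:
    # Finish when the success criteria are met or a human needs to step in;
    # otherwise loop back to the worker, which will see the evaluator's feedback.
    if state["success_criteria_met"] or state["user_input_needed"]:
        return "END"
    return "worker"
```

The strings these return are the keys in the `add_conditional_edges` mappings, which is how the graph knows where to go next.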
Self-Evaluation Was the Killer Feature
The evaluator pattern changed how I think about agent quality:
class EvaluatorOutput(BaseModel):
    feedback: str = Field(description="What needs improvement")
    success_criteria_met: bool
    user_input_needed: bool

def evaluator(self, state: State) -> State:
    eval_result = self.evaluator_llm_with_output.invoke(evaluator_messages)
    return {
        "messages": [...],
        "feedback_on_work": eval_result.feedback,
        "success_criteria_met": eval_result.success_criteria_met,
        "user_input_needed": eval_result.user_input_needed,
    }

The agent doesn't just try once; it keeps going until it meets the criteria or needs human help.
Tool Integration Was Rich
Since LangGraph sits on LangChain, I got access to everything:
from playwright.async_api import async_playwright
from langchain_community.agent_toolkits import PlayWrightBrowserToolkit
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.tools import Tool
from langchain_experimental.tools import PythonREPLTool

async def playwright_tools():
    playwright = await async_playwright().start()
    browser = await playwright.chromium.launch(headless=False)
    toolkit = PlayWrightBrowserToolkit.from_browser(async_browser=browser)
    return toolkit.get_tools(), browser, playwright

async def other_tools():
    return [
        Tool(name="send_push_notification", func=push, description="..."),
        Tool(name="search", func=serper.run, description="..."),
        PythonREPLTool(),
        WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()),
    ]

Browser automation, search, Python execution, and Wikipedia are all plugged in.
The Good Stuff
- Visual thinking — I could draw the flow, then code it
- Total control — Every state transition is explicit
- Built-in checkpointing — Conversation memory just works
- Conditional routing — Dynamic paths based on state
- LangChain ecosystem — Huge tool library available
- Testable — Each node works independently
The Rough Edges
- Conceptual overhead — You need to grok state machines
- More typing — It's verbose compared to simpler frameworks
- Learning curve — Took me longer to feel comfortable
When to Reach for It
If your agent needs to make decisions, branch, loop back, or follow a specific workflow, especially one you need to visualize or debug, LangGraph is worth the extra setup.
4. Autogen — Agents Creating Agents
The Project: Creative Agent World
This one was wild. I built a system where a "Creator" agent generates new agents at runtime. These agents then collaborate on business ideas, sometimes bouncing ideas off each other randomly.
What the Code Looks Like
Autogen uses message passing between routed agents:
import random

from autogen_core import MessageContext, RoutedAgent, message_handler
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

class Agent(RoutedAgent):
    system_message = """
    You are a creative entrepreneur. Your interests: Healthcare, Education.
    You're drawn to disruptive ideas. You're optimistic but impatient.
    """
    CHANCES_THAT_I_BOUNCE_IDEA_OFF_ANOTHER = 0.5

    def __init__(self, name) -> None:
        super().__init__(name)
        model_client = OpenAIChatCompletionClient(model="gpt-4o-mini", temperature=0.7)
        self._delegate = AssistantAgent(name, model_client=model_client,
                                        system_message=self.system_message)

    @message_handler
    async def handle_message(self, message: Message, ctx: MessageContext) -> Message:
        text_message = TextMessage(content=message.content, source="user")
        response = await self._delegate.on_messages([text_message], ctx.cancellation_token)
        idea = response.chat_message.content
        # 50% chance to bounce the idea off another agent
        if random.random() < self.CHANCES_THAT_I_BOUNCE_IDEA_OFF_ANOTHER:
            recipient = messages.find_recipient()
            response = await self.send_message(Message(content=idea), recipient)
            idea = response.content
        return Message(content=idea)

Dynamic Agent Creation Blew My Mind
The Creator agent writes Python code for new agents, saves it, imports it, and registers it all at runtime:
import importlib

class Creator(RoutedAgent):
    @message_handler
    async def handle_my_message_type(self, message: Message, ctx: MessageContext) -> Message:
        filename = message.content
        agent_name = filename.split(".")[0]
        # LLM generates the new agent's code
        response = await self._delegate.on_messages([text_message], ctx.cancellation_token)
        # Write it to disk
        with open(filename, "w", encoding="utf-8") as f:
            f.write(response.chat_message.content)
        # Import and register it dynamically
        module = importlib.import_module(agent_name)
        await module.Agent.register(self.runtime, agent_name, lambda: module.Agent(agent_name))
        # Get an idea from the new agent
        result = await self.send_message(Message(content="Give me an idea"),
                                         AgentId(agent_name, "default"))
        return Message(content=result.content)

Agents creating agents. Each with a unique personality. Collaborating randomly. It's chaos in the best way.
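One piece I glossed over: the `messages.find_recipient()` helper used for the random bouncing. Conceptually it just picks a random registered peer, something like the sketch below (the agent names are illustrative, and in the real code the chosen name gets wrapped in an `AgentId`):

```python
import random

# Hypothetical names of agents registered with the runtime.
KNOWN_AGENTS = ["agent1", "agent2", "agent3"]

def find_recipient(exclude: str) -> str:
    # Bounce the idea off any peer except yourself.
    candidates = [name for name in KNOWN_AGENTS if name != exclude]
    return random.choice(candidates)
```

The randomness is what makes the collaboration feel organic: no fixed pipeline decides who talks to whom.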
Distributed by Design
from autogen_core import AgentId
from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntime, GrpcWorkerAgentRuntimeHost

async def main():
    host = GrpcWorkerAgentRuntimeHost(address="localhost:50051")
    host.start()
    worker = GrpcWorkerAgentRuntime(host_address="localhost:50051")
    await worker.start()
    await Creator.register(worker, "Creator", lambda: Creator("Creator"))
    creator_id = AgentId("Creator", "default")
    coroutines = [create_and_message(worker, creator_id, i)
                  for i in range(1, HOW_MANY_AGENTS + 1)]
    await asyncio.gather(*coroutines)

gRPC runtime, parallel agent creation, and distributed execution. This is built for scale.
The Good Stuff
- True distribution — gRPC-based, scales horizontally
- Dynamic creation — Agents spawning agents
- Clean message model — Inter-agent communication is well-designed
- Built for scale — Large multi-agent systems are the use case
- Unique personalities — Each agent can be genuinely different
The Rough Edges
- Steep climb — RoutedAgent, MessageContext, handlers—lots to learn
- Infrastructure weight — gRPC setup isn't trivial
- Docs catching up — Not as mature as others
- Debugging distributed stuff — Tracing across agents is hard
When to Reach for It
Honestly? I'm still figuring this out. If you need dozens of agents talking to each other across machines, Autogen is built for that. For simpler cases, it's probably overkill.
5. MCP — The Universal Plug
The Project: AI Trading Floor
I built a trading system where multiple AI traders connect to market data (Polygon.io), account services, search (Brave), and memory (LibSQL)—all through MCP servers. Different traders even use different LLMs.
What the Code Looks Like
MCP standardizes how agents connect to external tools:
from agents.mcp import MCPServerStdio

trader_mcp_server_params = [
    {"command": "uv", "args": ["run", "accounts_server.py"]},
    {"command": "uv", "args": ["run", "push_server.py"]},
    market_mcp,  # Polygon.io
]

def researcher_mcp_server_params(name: str):
    return [
        {"command": "uvx", "args": ["mcp-server-fetch"]},
        {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-brave-search"],
            "env": {"BRAVE_API_KEY": os.getenv("BRAVE_API_KEY")},
        },
        {
            "command": "npx",
            "args": ["-y", "mcp-memory-libsql"],
            "env": {"LIBSQL_URL": f"file:./memory/{name}.db"},
        },
    ]

Notice the pattern? Each tool is an MCP server. It could be Python (uv run) or Node (npx); it doesn't matter, because the protocol is the same.
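What each of those servers actually exposes is a set of named, callable tools that any client can discover and invoke by name. The real servers use the `mcp` package and speak JSON-RPC over stdio; the core contract reduces to something like this illustrative registry (function and data are made up):

```python
# Illustrative only: a real MCP server is built with the `mcp` package and
# speaks JSON-RPC over stdio. This shows just the tool-registry contract.
TOOLS: dict = {}

def tool(fn):
    # Register a function under its name so clients can discover it.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_balance(account: str) -> float:
    # Hypothetical accounts-server tool.
    balances = {"trader1": 10_000.0}
    return balances.get(account, 0.0)

def call_tool(name: str, **kwargs):
    # Clients invoke tools generically, by name, never by import.
    return TOOLS[name](**kwargs)
```

Because clients only ever see "a tool named `get_balance` taking an `account` string", the implementation language and process boundary stop mattering.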
Connecting Agents to MCP Servers
from contextlib import AsyncExitStack

class Trader:
    async def run_with_mcp_servers(self):
        async with AsyncExitStack() as stack:
            trader_mcp_servers = [
                await stack.enter_async_context(
                    MCPServerStdio(params, client_session_timeout_seconds=120)
                )
                for params in trader_mcp_server_params
            ]
            async with AsyncExitStack() as research_stack:
                researcher_mcp_servers = [
                    await research_stack.enter_async_context(
                        MCPServerStdio(params, client_session_timeout_seconds=120)
                    )
                    for params in researcher_mcp_server_params(self.name)
                ]
                await self.run_agent(trader_mcp_servers, researcher_mcp_servers)

    async def create_agent(self, trader_mcp_servers, researcher_mcp_servers) -> Agent:
        tool = await get_researcher_tool(researcher_mcp_servers, self.model_name)
        self.agent = Agent(
            name=self.name,
            instructions=trader_instructions(self.name),
            model=get_model(self.model_name),
            tools=[tool],
            mcp_servers=trader_mcp_servers,
        )
        return self.agent

Multi-Model Support Was a Nice Bonus
Different traders can use different LLMs:
def get_model(model_name: str):
    if "/" in model_name:
        return OpenAIChatCompletionsModel(model=model_name, openai_client=openrouter_client)
    elif "deepseek" in model_name:
        return OpenAIChatCompletionsModel(model=model_name, openai_client=deepseek_client)
    elif "grok" in model_name:
        return OpenAIChatCompletionsModel(model=model_name, openai_client=grok_client)
    elif "gemini" in model_name:
        return OpenAIChatCompletionsModel(model=model_name, openai_client=gemini_client)
    else:
        return model_name

model_names = ["gpt-4.1-mini", "deepseek-chat", "gemini-2.5-flash-preview", "grok-3-mini-beta"]

Four traders, four different models, all using the same tools through MCP.
The Good Stuff
- Open standard — Not locked to any vendor
- Growing ecosystem — Brave, Polygon, fetch, memory servers ready to use
- Model freedom — Swap LLMs without changing tool code
- Clean boundaries — Tools are separate processes
- npm/uvx packages — Easy installation for common services
The Rough Edges
- Process overhead — Each server is its own process
- Setup complexity — Juggling multiple servers gets messy
- Debugging across processes — Harder than in-process tools
- Still maturing — Protocol evolving, expect changes
When to Reach for It
When you need to plug into multiple external services and want flexibility on which LLM to use. MCP acts like a universal adapter; once a tool speaks MCP, any agent can use it.
Quick Comparison
| What Matters | OpenAI SDK | CrewAI | LangGraph | Autogen | MCP |
|---|---|---|---|---|---|
| Time to first agent | Fast | Fast | Medium | Slow | Medium |
| Control over flow | Low | Medium | High | High | High |
| Memory support | Basic | Excellent | Good | Basic | Via servers |
| Tool options | Good | Good | Excellent | Good | Growing |
My Rankings (After Actually Building Stuff)
🥇 Tied for First: OpenAI Agents SDK & CrewAI
Both got me productive fast. OpenAI SDK's guardrails and type safety felt polished. CrewAI's role-based thinking and memory system impressed me. If I need something working by the end of the day, I'd reach for one of these.
🥈 Second: LangGraph
The graph approach took longer to click, but once it did, I had so much control. For complex workflows where I need to visualize and debug the agent's decision path, nothing else comes close.
🥉 Third: MCP
Not a framework per se, but a protocol that changes how I think about tools. The ability to mix and match LLMs while sharing the same tools is powerful. As more MCP servers appear, this gets more valuable.
Fourth: Autogen
I'll be honest: I haven't fully cracked this one yet. The distributed architecture and dynamic agent creation are genuinely innovative, but the learning curve is real. For large-scale multi-agent systems, it's probably the right choice. I just need more time with it.
Picking Your Framework
Here's my mental model:
Need something working fast? → OpenAI SDK or CrewAI
Building a team with distinct roles? → CrewAI
Complex workflow with branches and loops? → LangGraph
Multiple external services? → Use MCP alongside your framework
Distributed agents at scale? → Autogen (and clear your calendar)
Mixing different LLMs? → MCP makes this painless
Final Thoughts
None of these frameworks is "the best." They're different tools for different problems.
What surprised me most: the skills transfer. Learning LangGraph's state management helped me think about CrewAI's task flows. Understanding MCP's tool protocol made me appreciate how OpenAI SDK handles tools under the hood.
So pick one and build something. You'll learn more from a weekend project than from reading ten more comparison posts (including this one).
All the code I referenced is on GitHub. Poke around, break things, make them better.
→ Browse the full project on GitHub
Special thanks to Edward Donner for teaching and providing the materials for this course. Connect with him on LinkedIn.
Written after actually building with these frameworks. Your experience might differ. These tools evolve fast—check the official docs for the latest.