I Created An Enterprise MCP Gateway


TL;DR

Building enterprise AI applications requires managing MCP servers, tools, and integrations. An MCP gateway using Bifrost provides centralized control, security, and optimization for production systems.

Key Takeaways

  • Centralized MCP gateway solves scaling issues: single entry point for all tools, unified security, and monitoring.
  • Role-based access control prevents security breaches by restricting tool access based on user roles.
  • Rate limiting protects against runaway costs and API abuse by controlling tool usage frequency.
  • Complete audit trails and cost tracking provide visibility and accountability for AI workflows.
  • Bifrost offers high performance with 40x lower overhead, 68% less memory usage, and 100% success rate at 5,000 RPS.

Tags

opensource · ai · mcp · webdev

When you start building AI applications beyond simple experiments, everything changes. Models need access to files, databases, APIs, and internal services. That's where the Model Context Protocol (MCP) comes in.

But managing dozens of MCP servers, tools, and integrations in production quickly becomes a nightmare. I spent the last few months building an enterprise MCP gateway using Bifrost, and I want to share what I learned.

💻 The Problem: MCP Without a Gateway Is Bad

Here's what happens without proper infrastructure:

Your models spend precious tokens discovering available tools. Teams can't control who uses what. An engineer accidentally deletes the wrong database because the model had access it shouldn't have. API costs spike unexpectedly. You have no idea which AI workflows are running where.

The root issue: MCP was designed for flexibility, not governance. When you scale from a chatbot to production AI systems, you need:

  • Centralized tool management instead of scattered MCP servers
  • Fine-grained access control so marketing tools don't leak into engineering
  • Rate limiting per tool to prevent API abuse and runaway costs
  • Complete audit trails for compliance and debugging

👀 Why Bifrost?

Bifrost is a high-performance, Go-based LLM gateway that solves these problems:

# Quick start - up and running in 30 seconds
npx -y @maximhq/bifrost

# Opens http://localhost:8000


💎 Star Bifrost ☆

Key advantages:

  • 40x lower overhead than other gateways (11 µs vs 440 µs)
  • 68% less memory usage
  • 100% success rate at 5,000 RPS
  • Code Mode - Models generate orchestration code instead of step-by-step calls
  • Semantic caching - 40-60% cost reduction on similar queries
  • Built-in control - RBAC, rate limiting, cost tracking, audit logs
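Bifrost's own semantic-caching implementation isn't shown in this post, but the idea can be sketched in a few lines: cache responses keyed by an embedding of the query, and treat a new query as a hit when its embedding is close enough to a cached one. Everything below (the `SemanticCache` class, the 0.95 threshold, the pre-computed vectors) is illustrative, not Bifrost's API:

```javascript
// Toy semantic cache: cosine similarity over query embeddings.
// In a real system the vectors would come from an embedding model.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  constructor(threshold = 0.95) {
    this.threshold = threshold;
    this.entries = []; // { embedding, response }
  }

  get(embedding) {
    for (const entry of this.entries) {
      // Close enough to a cached query: reuse the cached answer.
      if (cosine(entry.embedding, embedding) >= this.threshold) {
        return entry.response;
      }
    }
    return null; // cache miss: call the model and store the result
  }

  set(embedding, response) {
    this.entries.push({ embedding, response });
  }
}

const cache = new SemanticCache(0.95);
cache.set([1, 0, 0.1], "cached answer");
console.log(cache.get([0.99, 0.01, 0.12])); // near-duplicate query: hit
console.log(cache.get([0, 1, 0]));          // unrelated query: miss (null)
```

The threshold is the whole trade-off: too low and unrelated queries get stale answers, too high and you lose the 40-60% savings on near-duplicates.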

📦 1. Collect All MCP Servers

Instead of direct model access to scattered MCP servers:

// Gateway configuration - single entry point
mcpConfig := &schemas.MCPConfig{
    ClientConfigs: []schemas.MCPClientConfig{
        {
            Name:           "filesystem",
            ConnectionType: schemas.MCPConnectionTypeSTDIO,
            StdioConfig: &schemas.MCPStdioConfig{
                Command: "npx",
                Args:    []string{"-y", "@anthropic/mcp-filesystem"},
            },
            ToolsToExecute: []string{"*"},
        },
        {
            Name:             "web_search",
            ConnectionType:   schemas.MCPConnectionTypeHTTP,
            ConnectionString: bifrost.Ptr("http://localhost:3001/mcp"),
            ToolsToExecute:   []string{"search", "fetch_url"},
        },
    },
}

client, err := bifrost.Init(context.Background(), schemas.BifrostConfig{
    Account:   account,
    MCPConfig: mcpConfig,
    Logger:    bifrost.NewDefaultLogger(schemas.LogLevelInfo),
})

Benefits:

  • Single source of truth for all tools
  • Unified security policies
  • Centralized monitoring and cost tracking
  • Consistent behavior across all models

⚙️ 2. Control Tool Access Based on Roles

Different teams need different tool access levels. Implement role-based access control:

const roleToToolsMapping = {
  engineering: ["filesystem", "database", "github_api"],
  marketing:   ["web_search", "document_generation"],
  finance:     ["cost_tracking"],
  admin:       ["*"],  // All tools
};

const roleLimits = {
  engineering: { filesystem: 1000, database: 500 },
  marketing:   { web_search: 100 },
  finance:     { cost_tracking: 50 },
};

// Check access ("*" grants every tool; unknown roles get nothing)
async function checkToolAccess(userId, role, toolName) {
  const allowedTools = roleToToolsMapping[role] ?? [];
  if (!allowedTools.includes("*") && !allowedTools.includes(toolName)) {
    throw new Error(`Tool '${toolName}' is denied for role '${role}'`);
  }
}

Real example - Access denied:

curl -X POST http://localhost:8000/v1/mcp/tool/execute \
  -H "Content-Type: application/json" \
  -d '{
    "tool_call": {
      "tool_name": "database",
      "params": {"query": "SELECT * FROM users"}
    },
    "user_role": "marketing"
  }'

# Response (403):
# {
#   "error": "Access Denied",
#   "message": "Tool 'database' is not allowed for role 'marketing'"
# }

This single change prevents entire categories of security issues.


🔎 3. Implement Rate Limiting

An AI workflow once got stuck in a loop, hammering the database with thousands of queries per second. Costs spiked by $2,000 in two hours before we caught it.

Rate limiting is your firewall against your own systems:

class RateLimiter {
  constructor() {
    this.windows = new Map(); // key -> timestamps of recent calls
  }

  async checkLimit(toolName, userId, limit) {
    const key = `${toolName}:${userId}`;
    const now = Date.now();
    const windowStart = now - 60000; // 1-minute sliding window

    // Drop timestamps that have fallen out of the window
    const timestamps = (this.windows.get(key) ?? [])
      .filter(t => t > windowStart);

    if (timestamps.length >= limit) {
      return {
        allowed: false,
        retryAfter: Math.ceil((timestamps[0] + 60000 - now) / 1000)
      };
    }

    timestamps.push(now);
    this.windows.set(key, timestamps); // persist the pruned window
    return { allowed: true, remaining: limit - timestamps.length };
  }
}

Real example - Rate limit exceeded:

curl -X POST http://localhost:8000/v1/mcp/tool/execute \
  -H "Content-Type: application/json" \
  -d '{
    "tool_call": {
      "tool_name": "web_search",
      "params": {"query": "another search"}
    },
    "user_id": "user-123",
    "user_role": "marketing"
  }'

# Response (429 - Rate Limited):
# {
#   "error": "Rate Limit Exceeded",
#   "message": "Tool 'web_search' limit exceeded (100/min)",
#   "retryAfter": 45
# }

The rate limiter caught what would have been a $5,000+ incident in under 30 seconds.
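On the client side, a 429 with `retryAfter` can be honored with a small backoff wrapper. This is a sketch against the response shape used in this post, not a published Bifrost contract; `callGateway` stands in for the actual HTTP call:

```javascript
// Retry a tool call when the gateway responds 429, waiting out retryAfter.
// `callGateway` is any async function returning { status, body }.
async function executeWithRetry(callGateway, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await callGateway();
    if (res.status !== 429) return res; // success or a non-rate-limit error

    // Wait the number of seconds the gateway told us to.
    const waitMs = (res.body.retryAfter ?? 1) * 1000;
    if (attempt < maxAttempts) {
      await new Promise(resolve => setTimeout(resolve, waitMs));
    }
  }
  throw new Error("Rate limit still exceeded after retries");
}
```

Honoring `retryAfter` instead of retrying immediately keeps a stuck workflow from making the rate-limit window worse.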


📊 4. Track Costs and Audit Everything

Production AI systems need accountability. Who ran what? When? How much did it cost?

type AuditLog struct {
    Timestamp  time.Time
    UserId     string
    UserRole   string
    ToolName   string
    Success    bool
    Cost       float64
    Duration   time.Duration
    Error      string
}

async function executeTool(toolName, params, context) {
  const startTime = Date.now();

  try {
    const result = await toolExecutor.execute(toolName, params);
    const duration = Date.now() - startTime;
    const cost = calculateCost(toolName, params);

    await auditLogger.log({
      userId: context.userId,
      userRole: context.userRole,
      toolName,
      success: true,
      cost,
      duration
    });

    return result;
  } catch (error) {
    const duration = Date.now() - startTime;
    await auditLogger.log({
      userId: context.userId,
      userRole: context.userRole,
      toolName,
      success: false,
      duration,
      error: error.message
    });
    throw error;
  }
}

Example - Cost breakdown:

GET /v1/analytics/costs?team_id=team-engineering&period=month

{
  "total_cost": "$127.45",
  "budget": "$1000.00",
  "remaining": "$872.55",
  "usage_by_tool": [
    {
      "tool": "web_search",
      "calls": 1234,
      "cost": "$12.34"
    },
    {
      "tool": "database",
      "calls": 567,
      "cost": "$56.70"
    }
  ]
}

This visibility was transformative. Teams saw exactly what they were spending. Anomalies became obvious.
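The per-tool breakdown above can be computed straight from the audit log. This is a sketch over the log shape used earlier in this post; the field names are this article's, not a fixed Bifrost schema:

```javascript
// Aggregate audit-log entries into a per-tool cost/usage report.
function costBreakdown(logs) {
  const byTool = new Map();
  for (const entry of logs) {
    if (!entry.success) continue; // only successful, billed calls count
    const agg = byTool.get(entry.toolName) ?? { calls: 0, cost: 0 };
    agg.calls += 1;
    agg.cost += entry.cost;
    byTool.set(entry.toolName, agg);
  }
  return {
    totalCost: [...byTool.values()].reduce((sum, t) => sum + t.cost, 0),
    usageByTool: [...byTool.entries()].map(([tool, t]) => ({ tool, ...t })),
  };
}

const report = costBreakdown([
  { toolName: "web_search", success: true, cost: 0.01 },
  { toolName: "web_search", success: true, cost: 0.01 },
  { toolName: "database", success: false, cost: 0 },
]);
console.log(report.totalCost);            // 0.02
console.log(report.usageByTool[0].calls); // 2
```

Because the report is derived from the same log the gateway already writes, the analytics endpoint never disagrees with the audit trail.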


🖋️ The Complete Flow

Here's what tool execution looks like with all control layers:

app.post("/v1/mcp/tool/execute", async (req, res) => {
  const { toolName, params, userId, userRole, teamId } = req.body;
  const startTime = Date.now();

  try {
    // 1. Check role-based access
    await checkToolAccess(userId, userRole, toolName);

    // 2. Check rate limits
    const limit = roleLimits[userRole]?.[toolName];
    const rateLimitCheck = await limiter.checkLimit(toolName, userId, limit);
    if (!rateLimitCheck.allowed) {
      return res.status(429).json({
        error: "Rate Limit Exceeded",
        retryAfter: rateLimitCheck.retryAfter
      });
    }

    // 3. Check budget
    const cost = estimateCost(toolName, params);
    const budgetCheck = await budgetTracker.deductCost(teamId, toolName, cost);
    if (!budgetCheck.allowed) {
      return res.status(402).json({
        error: "Budget Exceeded"
      });
    }

    // 4. Execute tool
    const result = await executeTool(toolName, params);
    const duration = Date.now() - startTime;

    // 5. Log the action
    await auditLogger.log({
      userId, userRole, teamId, toolName,
      success: true, cost, duration
    });

    res.json({ success: true, data: result });

    res.json({ success: true, data: result });

  } catch (error) {
    // Log failures too
    await auditLogger.log({
      userId, userRole, teamId, toolName,
      success: false, error: error.message
    });

    res.status(400).json({ success: false, error: error.message });
  }
});

✅ Code Mode

Instead of calling tools one by one, models generate TypeScript code that orchestrates them:

// Model generates this automatically
const tools = await listToolFiles();  // List available tools
const githubTool = await readToolFile('github');  // Read definition

// Execute a complete workflow
const results = await executeToolCode(async () => {
  const repos = await github.search_repos({ 
    query: "golang bifrost", 
    maxResults: 5 
  });

  const formatted = repos.items.map(repo => ({
    name: repo.name,
    stars: repo.stargazers_count,
    url: repo.html_url
  }));

  return { repositories: formatted, count: formatted.length };
});

Benefits:

  • ~40% reduction in token usage
  • Single execution vs multiple calls
  • Better control and debugging
  • Faster execution

📊 Key Metrics

Bifrost performance at 5,000 RPS:

Metric           | LiteLLM  | Bifrost | Improvement
Gateway Overhead | ~440 µs  | ~11 µs  | 40x faster
Memory Usage     | baseline | -68%    | 68% less
Queue Wait       | 47 µs    | 1.67 µs | 28x faster
Success Rate     | 89%      | 100%    | Perfect

Why Go?

  • Goroutines: lightweight concurrency (~2 KB each)
  • Compiled binary: no startup overhead
  • Memory efficient: 68% less than comparable gateways
  • True parallelism across CPU cores

⚙️ Getting Started

# 1. Install Bifrost (30 seconds)
npx -y @maximhq/bifrost

# 2. Configure API keys (.env)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# 3. Open dashboard
open http://localhost:8000

# 4. Make your first call
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello Bifrost!"}]
  }'

# 5. Drop-in replacement
# Change this:
base_url = "https://api.openai.com"
# To this:
base_url = "http://localhost:8000/openai"

💻 What I'd Do Differently

  1. Start with cost tracking from day one - Retrofit is painful
  2. Make rate limits configurable - Teams have different needs
  3. Implement caching aggressively - Semantic caching saves 40%+
  4. Build hierarchical permissions - Flat models don't scale
  5. Set up real-time alerting - Don't wait for weekly reviews
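For point 4, hierarchical permissions can be as simple as roles inheriting tool grants from a parent role, so each grant is defined once. A sketch of the idea; the role names and tool names are illustrative:

```javascript
// Roles inherit tool access from their parent role.
const roles = {
  viewer:   { parent: null,       tools: ["web_search"] },
  engineer: { parent: "viewer",   tools: ["filesystem", "database"] },
  admin:    { parent: "engineer", tools: ["*"] },
};

// Walk up the parent chain and collect every granted tool.
function allowedTools(roleName) {
  const tools = new Set();
  for (let r = roles[roleName]; r; r = roles[r.parent]) {
    r.tools.forEach(t => tools.add(t));
  }
  return tools;
}

function canUse(roleName, toolName) {
  const tools = allowedTools(roleName);
  return tools.has("*") || tools.has(toolName);
}

console.log(canUse("engineer", "web_search")); // true (inherited from viewer)
console.log(canUse("viewer", "database"));     // false
```

With a flat model, adding one tool means touching every role; with inheritance, you grant it once at the right level.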

✅ The Real Benefit

At the end of the day, the gateway isn't about being fancy. It's about control.

When you centralize tool management, you get:

  • Security - Tools isolated by role, mistakes bounded
  • Visibility - Every action logged and costs tracked
  • Optimization - See what's expensive and fix it
  • Debugging - Complete audit trail for incidents

For us, this infrastructure turned AI from "a cool demo" into something we could deploy to production with confidence.


Are you building AI infrastructure at scale? Let me know in the comments!

Thanks for reading!
