My AI Agent Hit a Login Wall: BrowserAct Let It Ask for Help and Resume

👋 Hey there, Tech Enthusiasts!

I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, Data, Analytics, Generative AI, and Agentic AI, building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.

Let's get into it! 🚀

I'm a cloud architect. I manage infrastructure across multiple AWS accounts, run CI/CD pipelines, and keep monitoring dashboards healthy for clients. A lot of my day involves checking web-based tools such as Grafana, GitHub, vendor portals, and internal dashboards, most of which sit behind login walls and anti-bot protection.

My AI agent could already run commands for me, but there was always a gap: it couldn't browse the web. It couldn't check a dashboard, read a protected page, or handle a login flow.

That changed when I integrated BrowserAct into my workflow. It's a browser layer that gives AI agents the ability to browse real websites with anti-detection, session management, and human handoff built in.

If you missed the first article where I covered the full setup, start there: I Gave My AI Agent a Real Browser - Here's What Actually Happened. This article focuses on the headless + human handoff pattern I've been running in production.

A Note on Tooling

I'm using Kiro as my AI agent; it's free during preview and can execute CLI commands directly. But BrowserAct works with anything that can run shell commands: Claude Code, Cursor, Codex, CrewAI, LangChain, or even a simple bash script. The pattern is the same regardless of which agent you use.

The Setup

I run BrowserAct on a Linux server: no desktop, no display, just a terminal. This is how it runs in production for my client: headless on a server, triggered by cron or by the agent.

Prerequisites

Before getting started, make sure the following components are installed on your system.

Verify Installed Versions

Run the following commands:

python3 --version
# Python 3.12+

node --version
# v18+

google-chrome --version
# Google Chrome 149.x.x.x

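On a headless server it can help to turn these manual checks into a script that cron or the agent can run. Below is a minimal sketch in Python, assuming only the minimums listed above (Python 3.12+, Node 18+) and that each tool prints its version to stdout. The function names are my own, not part of BrowserAct:

```python
import re
import shutil
import subprocess

# Minimum versions from the prerequisites above; Chrome only needs to be present.
MINIMUMS: dict[str, tuple[int, ...]] = {"python3": (3, 12), "node": (18,)}


def parse_version(text: str) -> tuple[int, ...]:
    """Return the first dotted number in a --version string, e.g. 'v18.19.0' -> (18, 19, 0)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text: str, minimum: tuple[int, ...]) -> bool:
    # Tuples compare element by element, so (3, 12, 3) >= (3, 12) is True.
    return parse_version(text) >= minimum


def check(command: str) -> bool:
    if shutil.which(command) is None:
        print(f"{command}: not installed")
        return False
    output = subprocess.run([command, "--version"], capture_output=True, text=True).stdout
    ok = meets_minimum(output, MINIMUMS[command])
    print(f"{command}: {output.strip()} ({'ok' if ok else 'too old'})")
    return ok


def main() -> bool:
    results = [check(command) for command in MINIMUMS]
    chrome_found = shutil.which("google-chrome") is not None
    print(f"google-chrome: {'found' if chrome_found else 'not installed'}")
    return all(results) and chrome_found


if __name__ == "__main__":
    main()
```

`main()` returns a boolean, so a wrapper can turn it into a non-zero exit code if you want cron or CI to flag a missing dependency.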

Install uv (If Not Already Installed)

BrowserAct uses Python tooling, and uv is the recommended package manager.

curl -LsSf https://astral.sh/uv/install.sh | sh


Install Google Chrome (If Not Already Installed)

Ubuntu / Debian

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb


Amazon Linux

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm


Creating a BrowserAct API Key

To allow your AI agent to control a real browser, you'll need a BrowserAct API key.

Step 1: Sign In

Sign in to your BrowserAct account.

Step 2: Open API Key Management

Click your profile email address in the top-right corner.
Select API Keys from the dropdown menu.
Click Manage Keys.

Step 3: Create a New API Key

Click Create Key.
Enter a descriptive name such as:

Amazon-Q
MCP-Server
Development
Click Create.

Step 4: Save the API Key

Copy the generated API key and store it securely. For security reasons, you may not be able to view the complete key again after leaving the page.

Treat your API key like a password. Never share it publicly or commit it to source code repositories.

Configure BrowserAct Authentication

Once you have your API key, authenticate BrowserAct using the following command:

browser-act auth set <your-api-key>


Successful authentication will return:

API key saved.

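To keep the key out of scripts you might commit, one option is to read it from an environment variable and pass it to the same `auth set` command. A minimal sketch, assuming a variable named `BROWSERACT_API_KEY`. That name is my own choice; BrowserAct does not define it:

```python
import os
import subprocess

# Hypothetical variable name; BrowserAct itself does not require it.
DEFAULT_ENV_VAR = "BROWSERACT_API_KEY"


def read_api_key(env_var: str = DEFAULT_ENV_VAR) -> str:
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running this script")
    return key


def auth_command(key: str) -> list[str]:
    # Same command as above: browser-act auth set <your-api-key>
    return ["browser-act", "auth", "set", key]


def configure_browseract(env_var: str = DEFAULT_ENV_VAR) -> None:
    # check=True raises CalledProcessError if the CLI exits non-zero.
    subprocess.run(auth_command(read_api_key(env_var)), check=True, capture_output=True, text=True)
```

The key still appears in the process's argument list while the command runs, so this mainly keeps it out of files and repositories rather than hiding it from other users on the same machine.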

At this point, BrowserAct is connected and ready to provide browser access to your AI agent. The integration takes less than a minute.

The only remaining step is to create a stealth browser instance:

browser-act browser create --type stealth --name "research"


id=101758963005571124 name="research" type=stealth


That id is your browser ID; you'll use it every time you open a session. Think of it like a browser profile: it keeps its own fingerprint, cookies, and anti-detection settings. You create it once and reuse it across sessions.

Note: The browser ID shown in this article (101758963005571124) is from my account. When you run browser create, you'll get your own unique ID. Use that in place of mine throughout the examples.
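If a script creates the browser, it can capture the ID instead of hard-coding one. A minimal sketch, assuming the output keeps the single-line `key=value` shape shown above, with quotes around values that need them. That format is inferred from the examples in this article, not from BrowserAct documentation, and the helper names are hypothetical:

```python
import shlex
import subprocess


def parse_kv_line(line: str) -> dict[str, str]:
    """Split 'id=123 name="research" type=stealth' into a dict; shlex strips the quotes."""
    pairs: dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            pairs[key] = value
    return pairs


def create_stealth_browser(name: str) -> str:
    """Run the create command from this article and return the new browser ID."""
    result = subprocess.run(
        ["browser-act", "browser", "create", "--type", "stealth", "--name", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_kv_line(result.stdout)["id"]
```

The same parser also handles other single-line responses in this style, such as `session_name=research-hn closed=true`.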

Managing Sessions

Before starting new sessions, check if any are already running:

browser-act session list


session_name: research-gh
browser_type: stealth
browser_id: 101758963005571124
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: research-hn
browser_type: stealth
browser_id: 101758963005571124
title: news.ycombinator.com
url: https://news.ycombinator.com/

session_name: research-ph
browser_type: stealth
browser_id: 101758963005571124
title: Product Hunt – The best new products in tech.
url: https://www.producthunt.com/

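When a script needs to check for existing sessions before opening new ones, the blank-line-separated `key: value` blocks above are straightforward to parse. A minimal sketch, assuming that layout stays as shown; `parse_session_list` and `active_sessions` are hypothetical helper names:

```python
import subprocess


def parse_session_list(text: str) -> list[dict[str, str]]:
    """Turn blank-line-separated 'key: value' blocks into one dict per session."""
    sessions: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:  # a blank line ends the current block
            if current:
                sessions.append(current)
                current = {}
            continue
        # Split on the first ": " only, so values like https://... stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            current[key] = value
    if current:
        sessions.append(current)
    return sessions


def active_sessions() -> dict[str, dict[str, str]]:
    """Run `browser-act session list` and index the result by session name."""
    output = subprocess.run(
        ["browser-act", "session", "list"], capture_output=True, text=True, check=True
    ).stdout
    return {s["session_name"]: s for s in parse_session_list(output) if "session_name" in s}
```

With that, `"research-hn" in active_sessions()` tells you whether a session name is already taken before you try to reuse it.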

To close a specific session:

browser-act --session research-hn session close


session_name=research-hn closed=true


Tip: Always close sessions when you're done. Open sessions keep the browser running and consume resources. If you hit a "session already in use" error, that session name is still active: either close it or use a different name.
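In a Python-driven workflow, one way to make "always close sessions" automatic is a context manager that opens a session and closes it in a `finally` block, even if a later command fails. This sketch only wraps commands shown in this article (`browser open`, `state`, `eval`, `session close`); the helper names are hypothetical:

```python
import subprocess
from collections.abc import Iterator
from contextlib import contextmanager


def cli(session: str, *args: str) -> list[str]:
    """Build a `browser-act --session <name> ...` command line."""
    return ["browser-act", "--session", session, *args]


def run(command: list[str]) -> str:
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


@contextmanager
def browser_session(session: str, browser_id: str, url: str) -> Iterator[str]:
    run(cli(session, "browser", "open", browser_id, url))
    try:
        yield session
    finally:
        # Runs on success and on error. No check=True here, so a failed close
        # does not hide the exception that caused the exit.
        subprocess.run(cli(session, "session", "close"), capture_output=True, text=True)
```

Usage would look like `with browser_session("research-hn", "<your-browser-id>", "https://news.ycombinator.com") as s: print(run(cli(s, "state")))`, with the session closed as soon as the block exits.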

Real Scenario: Morning Tech Research

One of the things I do for a client is compile a daily tech digest: what's trending, what's launching, and what competitors are shipping. It used to take me 30 minutes of tab-switching every morning.

Now my agent does it. Here's what that looks like.

Quick Extract - One Session, One Page

# Open a stealth browser session on the target page
browser-act --session research-hn browser open 101758963005571124 https://news.ycombinator.com


# Get the page state
browser-act --session research-hn state


The agent got back clean, structured content: the page title, the URL, and all interactive elements. From there, it can extract exactly what it needs using JS eval:

browser-act --session research-hn eval 'JSON.stringify(Array.from(document.querySelectorAll(".athing")).slice(0,4).map(el => ({title: el.querySelector(".titleline a")?.textContent, points: el.nextElementSibling?.querySelector(".score")?.textContent})))'


[
  {"title": "AI agent bankrupted their operator while trying to scan DN42", "points": "171 points"},
  {"title": "Nobody ever gets credit for fixing problems that never happened", "points": "348 points"},
  {"title": "If you are asking for human attention, demonstrate human effort", "points": "537 points"},
  {"title": "Show HN: Homebrew 6.0.0", "points": "1145 points"}
]


Two commands to open and inspect the page, one to extract. The agent can summarize this, filter by topic, or flag anything relevant to the client.
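Filtering can happen in plain code before the agent summarizes anything. A minimal sketch, assuming the eval command prints the JSON array as shown above; `rank_stories` and its `min_points` and `keyword` parameters are illustrative, not part of BrowserAct:

```python
import json
import re


def rank_stories(raw_json: str, min_points: int = 0, keyword: str = "") -> list[dict]:
    """Parse the eval output, turn '171 points' into 171, filter, and sort by points."""
    stories = []
    for item in json.loads(raw_json):
        title = item.get("title") or ""
        # Job posts on HN have no score, so "points" can be null.
        points_text = (item.get("points") or "0").replace(",", "")
        points = int(points_text.split()[0])
        if points < min_points:
            continue
        # Whole-word, case-insensitive match, so "ai" does not match "again".
        if keyword and not re.search(rf"\b{re.escape(keyword)}\b", title, re.IGNORECASE):
            continue
        stories.append({"title": title, "points": points})
    return sorted(stories, key=lambda story: story["points"], reverse=True)
```

Passing `keyword="ai"` to the output above would keep only the DN42 story, which is the kind of pre-filter that keeps the agent's summary focused.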

Where this fits: Any team that needs a daily briefing on tech trends, industry news, or competitor launches. The agent grabs the data, and the team reads a summary instead of spending 30 minutes browsing.

Parallel Research - Three Sites at Once

For the full morning digest, the agent opens three parallel sessions on the same browser.

You can use the browser you already created, or create a separate one to keep research isolated from other workflows:

browser-act browser create --type stealth
# Returns: id=101764340218654773


Then open sessions on it:

# Session 1: GitHub Trending
browser-act --session research-gh browser open 101764340218654773 https://github.com/trending


# Session 2: Hacker News
browser-act --session research-hn browser open 101764340218654773 https://news.ycombinator.com


# Session 3: Product Hunt
browser-act --session research-ph browser open 101764340218654773 https://www.producthunt.com


All three sessions run independently without conflicts, and the agent works through each one in turn. Listing the sessions confirms they're all open:

browser-act session list


session_name: research-gh
browser_type: stealth
browser_id: 101764340218654773
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: research-hn
browser_type: stealth
browser_id: 101764340218654773
title: news.ycombinator.com
url: https://news.ycombinator.com/

session_name: research-ph
browser_type: stealth
browser_id: 101764340218654773
title: Product Hunt – The best new products in tech.
url: https://www.producthunt.com/

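The three `browser open` calls above can also be scripted. A minimal sketch, assuming the CLI tolerates concurrent invocations for different session names. The article runs them one after another, so set `max_workers=1` if you see conflicts. `TARGETS`, `open_command`, and `open_all` are my own names:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Session names and URLs from the three commands above.
TARGETS = {
    "research-gh": "https://github.com/trending",
    "research-hn": "https://news.ycombinator.com",
    "research-ph": "https://www.producthunt.com",
}


def open_command(session: str, browser_id: str, url: str) -> list[str]:
    return ["browser-act", "--session", session, "browser", "open", browser_id, url]


def open_all(browser_id: str, max_workers: int = 3) -> dict[str, int]:
    """Open one session per target and return each session's exit code."""

    def open_one(item: tuple[str, str]) -> tuple[str, int]:
        session, url = item
        result = subprocess.run(open_command(session, browser_id, url), capture_output=True, text=True)
        return session, result.returncode

    # Threads are enough here: each worker just waits on a subprocess.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(open_one, TARGETS.items()))
```

Calling `open_all("<your-browser-id>")` returns a dict keyed by session name; any non-zero exit code marks a session to inspect or retry before extraction starts.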

Where this fits: Product teams that need multi-source intelligence before standup. Marketing teams tracking launches. DevOps engineers checking status pages across providers. Anything where you'd normally open 5+ tabs.

Structured Data Extraction

Instead of parsing full page HTML, the agent runs targeted JavaScript and gets clean JSON:

browser-act --session research-gh eval "JSON.stringify(Array.from(document.querySelectorAll('article.Box-row')).slice(0,3).map(r => ({repo: r.querySelector('h2 a')?.textContent.trim(), stars: r.querySelector('span.d-inline-block.float-sm-right')?.textContent.trim()})))"


[
  {"repo":"iptv-org /\n\n      iptv","stars":"2,650 stars today"},
  {"repo":"teslamate-org /\n\n      teslamate","stars":"35 stars today"},
  {"repo":"Panniantong /\n\n      Agent-Reach","stars":"1,045 stars today"}
]


Within the same session, the agent also navigated from Python trending to TypeScript trending without opening a new browser, then took a screenshot for the report. I covered extraction patterns in depth in the previous article.
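The `repo` values in the GitHub output still carry whitespace from the page markup, and the star counts are strings. A small post-processing step turns them into clean names and integers. This is a sketch that assumes the eval output looks like the sample above; `clean_trending` is a hypothetical helper:

```python
import json
import re


def clean_trending(raw_json: str) -> list[dict]:
    """Normalize 'owner /\\n\\n  repo' to 'owner/repo' and '2,650 stars today' to 2650."""
    rows = []
    for item in json.loads(raw_json):
        repo = re.sub(r"\s+", "", item.get("repo") or "")
        digits = re.sub(r"\D", "", item.get("stars") or "")
        rows.append({"repo": repo, "stars_today": int(digits) if digits else 0})
    return rows
```

Doing this in code keeps the JavaScript selector simple and gives the agent numbers it can sort or compare day over day.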

Where this fits: Competitor monitoring of prices, features, and reviews. The agent extracts exactly the data points you need as structured JSON, with no scraping framework to maintain. When a page layout changes, the agent can adjust its selectors instead of you patching a scraper. BrowserAct isn't a standalone scraping tool; it's a browser layer. Your AI agent is the brain that decides what to do, and BrowserAct is the eyes and hands that execute on the web.

Then the Agent Hits a Wall

Everything was going smoothly. The agent had data from three sources, screenshots saved, research compiling nicely. Then it tried to check my GitHub profile settings:

browser-act --session research-gh navigate https://github.com/settings/profile


Response:

url=https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fsettings%2Fprofile
title=Sign in to GitHub · GitHub

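The response shows no error, just a redirect to GitHub's login page, so a script has to recognize the wall itself before trusting the page content. A minimal heuristic sketch, assuming the `url=` and `title=` lines shown above. The login-path list and the helper names are my own guesses and would need tuning per site:

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical path endings that usually mean "please sign in"; adjust per site.
LOGIN_PATH_ENDINGS = ("/login", "/signin", "/sign_in")


def parse_navigate_output(text: str) -> dict[str, str]:
    """Read 'url=...' and 'title=...' lines, splitting on the first '=' only."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def detect_login_wall(output: str, requested_url: str) -> dict[str, object]:
    fields = parse_navigate_output(output)
    landed = urlparse(fields.get("url", ""))
    requested = urlparse(requested_url)
    redirected = (landed.netloc, landed.path) != (requested.netloc, requested.path)
    looks_like_login = landed.path.rstrip("/").endswith(LOGIN_PATH_ENDINGS) or fields.get(
        "title", ""
    ).lower().startswith("sign in")
    # parse_qs also percent-decodes, so return_to comes back as a normal URL.
    return_to = parse_qs(landed.query).get("return_to", [None])[0]
    return {"blocked": redirected and looks_like_login, "login_url": fields.get("url"), "return_to": return_to}
```

For the response above, this reports `blocked` as true and recovers `https://github.com/settings/profile` as the page to return to once someone has signed in.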