Control a Chrome session via Stagehand to browse, act, extract, and screenshot on demand inside the Factory CLI.
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
skills list
Skill Instructions
name: browser
description: Control a Chrome session via Stagehand to browse, act, extract, and screenshot on demand inside the Factory CLI.
Skill: Browser
Use this skill when you need live browser automation during a Factory session—opening sites, clicking through flows, gathering structured data, or capturing screenshots.
Inputs
- Target URL or task description (natural language)
- Optional structured extraction schema (JSON field → type)
Behavior
- Ensure Chrome is running via Stagehand with the local profile stored in .chrome-profile.
- Support these commands:
  - navigate <url>: open a page and capture a screenshot.
  - act "<instruction>": perform natural-language actions.
  - extract "<instruction>" '{"field":"type"}': return structured data.
  - observe "<goal>": list suggested steps the agent can take.
  - screenshot: capture the current viewport.
  - close: shut down the session when finished.
- Save screenshots to agent/browser_screenshots and report the file path in the response.
- When tasks finish, summarize what happened plus any follow-up steps for the user.
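To illustrate how the command lines above break into a command name, an instruction, and an optional schema, here is a minimal Python sketch. It is not part of the skill or of Stagehand. The function name parse_command, the ARITY table, and the choice to treat the extract schema as optional are assumptions made for illustration.

```python
import json
import shlex

# Command names come from the Behavior list; the (min, max) argument counts
# are an assumption of this sketch, not something the skill documents.
ARITY = {
    "navigate": (1, 1),
    "act": (1, 1),
    "extract": (1, 2),   # instruction, plus an optional JSON schema
    "observe": (1, 1),
    "screenshot": (0, 0),
    "close": (0, 0),
}


def parse_command(line):
    """Split a command line into its name, arguments, and optional schema."""
    tokens = shlex.split(line)  # honors both "double" and 'single' quotes
    if not tokens:
        raise ValueError("empty command")
    name, args = tokens[0], tokens[1:]
    if name not in ARITY:
        raise ValueError(f"unknown command: {name}")
    lo, hi = ARITY[name]
    if not lo <= len(args) <= hi:
        raise ValueError(f"{name} expects {lo} to {hi} arguments, got {len(args)}")
    schema = None
    if name == "extract" and len(args) == 2:
        # The schema is a JSON object mapping field names to type names.
        schema = json.loads(args.pop())
        if not isinstance(schema, dict):
            raise ValueError("schema must be a JSON object of field -> type")
    return {"command": name, "args": args, "schema": schema}
```

Single-quoting the schema, as in the extract form above, keeps the inner double quotes intact when the line is tokenized, so the JSON survives as one argument.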
Verification
- If a navigation/action fails, include the error message and prompt the user for next steps.
- Before ending the session, ensure close has been run so Chrome processes don’t linger.
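The two verification rules (report the error, and always close) fit a try/except/finally pattern. The sketch below uses a hypothetical FakeSession class as a stand-in for a browser session. It does not call Stagehand, and its method names are assumptions chosen to mirror the command names.

```python
class FakeSession:
    """Hypothetical stand-in for a browser session; not the Stagehand API."""

    def __init__(self):
        self.closed = False

    def navigate(self, url):
        # Simulate a navigation failure for anything that is not an http(s) URL.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"invalid URL: {url}")

    def close(self):
        self.closed = True


def run_task(session, url):
    """Navigate, report any error message, and always close the session."""
    try:
        session.navigate(url)
        return {"ok": True, "error": None}
    except Exception as exc:
        # Surface the error text so the user can decide on next steps.
        return {"ok": False, "error": str(exc)}
    finally:
        # Runs on success and on failure, so no browser process is left behind.
        session.close()
```

Putting close in the finally clause means it runs whether the navigation succeeded or failed.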
Notes
- This skill expects ANTHROPIC_API_KEY (for Stagehand) and, if used via droid exec, FACTORY_API_KEY to already be configured.
- The working directory should remain inside the cloned skill folder so relative paths resolve correctly.
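A quick pre-flight check for the environment variables named above might look like the following sketch. The helper name missing_keys and its via_droid_exec flag are hypothetical. Only the two variable names come from the notes.

```python
import os


def missing_keys(env, via_droid_exec=False):
    """Return the required variable names that are unset or empty in env."""
    required = ["ANTHROPIC_API_KEY"]
    if via_droid_exec:
        # Only needed when the skill is driven through droid exec.
        required.append("FACTORY_API_KEY")
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys(os.environ, via_droid_exec=True)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real process environment.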