browser

@majiayu000/browser

majiayu000

275

50 forks

Updated 5/6/2026


remotely orchestrate a (chromium) browser to utilize web apps

Installation

npx agent-skills-cli install @majiayu000/browser

Claude Code

Cursor

Copilot

Codex

Antigravity

Details

Repository: majiayu000/claude-skill-registry

Path: skills/development/browser/SKILL.md

Branch: main

Scoped Name: @majiayu000/browser

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions

name: browser
description: remotely orchestrate a (chromium) browser to utilize web apps

Browser

Uses browser-use (via the Chrome DevTools Protocol) to remotely orchestrate a browser and navigate any website. This covers any task that can be accomplished in a browser.

Instructions

Read README.md to understand this project.

QUICK REFERENCE

These are CLI commands you may use.

Task	Command	Purpose
Start Browser Server	`browser server start`	Start background browser server
Stop Browser Server	`browser server stop`	Stop background server
Server Status	`browser server status`	Check server status and latest logs summary
View Live Logs	`browser server logs --follow`	Tail live server logs
Get Page Status	`browser client status`	Get current page URL and title
Navigate to URL	`browser client navigate URL`	Navigate browser to specified URL
Execute JavaScript	`browser client execute JAVASCRIPT`	Execute JavaScript code (arrow function)
Get DOM Content	`browser client dom [SELECTOR]`	Get HTML content of element (default: body)
Fill Input Field	`browser client fill SELECTOR VALUE`	Fill form input field with value
Click Element	`browser client click SELECTOR`	Click on specified element
Click at Coordinates	`browser client click-at X Y`	Click at specific coordinates
Visualize Elements	`browser client visualize [--csv]`	Generate Set-of-Marks visualization
Detect Objects	`browser client detect [--csv]`	Run YOLO object detection
Segment Objects	`browser client segment`	Run SAM segmentation

Examples

Sample user prompt:

let's test this project:

start browser service
use client scripts to
- navigate to google.com
- type hamster dance into search box
- click search button
- read first 3 search results
- summarize them in your response to me
- stop the server

Sample agent output (e.g., Gemini 3 Flash):

GitHub Copilot: Created 6 todos

Starting: *Start browser service* (1/6)

Ran terminal command: browser server start

Stdout:
Starting browser server...
✅ Server started in background (PID 12345)
Logs: /home/user/.browser_agent/server.log

Starting: *Navigate to Google* (2/6)

Ran terminal command: browser client navigate "https://www.google.com"

Stdout:
✅ Navigated to https://www.google.com

Starting: *Type search query* (3/6)

Ran terminal command: browser client fill "[name='q']" "hamster dance"

Stdout:
✅ Filled [name='q']

Starting: *Click search button* (4/6)

Ran terminal command: browser client click "[name='btnK']"

Stdout:
✅ Clicked [name='btnK']

Starting: *Read search results* (5/6)

Ran terminal command: browser client execute "() => Array.from(document.querySelectorAll('#search h3')).slice(0, 3).map(h3 => ({title: h3.innerText, link: h3.closest('a')?.href}))"

Stdout:
Result: [{"title": "Hampton the Hamster \"The Hamsterdance Song\"", "link": "https://www.youtube.com/watch?v=1qN72LEQnaU"}, ...]

Starting: *Stop server* (6/6)

Ran terminal command: browser server stop

Stdout:
✅ Server stopped (PID 12345)

Completed: *Summarize results*

The browser service was successfully started, and I performed the requested search for "hamster dance" on Google. Here are the first three search results:

1.  **Hampton the Hamster "The Hamsterdance Song"**
    - [https://www.youtube.com/watch?v=1qN72LEQnaU](https://www.youtube.com/watch?v=1qN72LEQnaU)
2.  **Hampton the Hamster - The Hamsterdance Song (Official Video ...**
    - [https://www.youtube.com/watch?v=6WpMlwVwydo](https://www.youtube.com/watch?v=6WpMlwVwydo)
3.  **Hamster Dance - Wikipedia**
    - [https://en.wikipedia.org/wiki/Hamster_Dance](https://en.wikipedia.org/wiki/Hamster_Dance)
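
The session above can also be replayed from a script instead of an agent. The following is a minimal sketch, assuming the `browser` CLI from the quick reference is on the PATH and that it signals failure with a non-zero exit code (the source does not confirm this). The helper names `search_plan` and `run_plan` are hypothetical, and the selectors are copied from the sample session.

```python
import subprocess
from typing import List


def search_plan(query: str) -> List[List[str]]:
    """Return the CLI invocations used in the Google search walkthrough.

    Each command is an argv list, so selectors and the query need no
    shell quoting.
    """
    return [
        ["browser", "server", "start"],
        ["browser", "client", "navigate", "https://www.google.com"],
        ["browser", "client", "fill", "[name='q']", query],
        ["browser", "client", "click", "[name='btnK']"],
    ]


def run_plan(plan: List[List[str]]) -> None:
    """Run each command in order, then always try to stop the server."""
    try:
        for argv in plan:
            # Assumption: a failed command exits non-zero, which makes
            # subprocess.run raise CalledProcessError and abort the plan.
            subprocess.run(argv, check=True)
    finally:
        subprocess.run(["browser", "server", "stop"], check=False)
```

Calling `run_plan(search_plan("hamster dance"))` would issue the same first four commands as the sample session and then stop the server, even if an earlier step fails.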

Visual Grounding Example

Sample user prompt:

Find the "Sign in" button on this page using vision and click it.

Sample agent output:

# 1. Get visual coordinates of interactive elements (CSV is default)
browser client visualize | grep -i "Sign in"
# Output: 1194,28,5,Sign in

# 2. Click at the identified coordinates
browser client click-at 1194 28

Example Output:

CSV Representation (truncated):

44,30,0,About
99,30,1,Store
1010,28,2,Gmail
1066,28,3,Images
1124,28,4,
1194,28,5,Sign in
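
The `grep -i "Sign in"` step can be replaced by a small parser when the label match needs to be done in code. This is a sketch only, assuming the rows follow the `x,y,id,label` column order seen in the sample; the `Mark` type and both function names are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class Mark(NamedTuple):
    x: int
    y: int
    mark_id: int
    label: str


def parse_marks(csv_text: str) -> List[Mark]:
    """Parse `x,y,id,label` rows, skipping blank or malformed lines."""
    marks = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4:
            continue
        try:
            x, y, mark_id = int(row[0]), int(row[1]), int(row[2])
        except ValueError:
            continue  # not a data row (for example a header or log line)
        # Re-join in case an unquoted label itself contains commas.
        marks.append(Mark(x, y, mark_id, ",".join(row[3:]).strip()))
    return marks


def find_by_label(marks: List[Mark], text: str) -> Optional[Mark]:
    """Return the first mark whose label contains `text`, ignoring case."""
    needle = text.lower()
    for mark in marks:
        if needle in mark.label.lower():
            return mark
    return None


sample = """44,30,0,About
99,30,1,Store
1010,28,2,Gmail
1066,28,3,Images
1124,28,4,
1194,28,5,Sign in
"""
hit = find_by_label(parse_marks(sample), "sign in")
print(hit)  # Mark(x=1194, y=28, mark_id=5, label='Sign in')
```

The row with an empty label (ID 4) is kept rather than dropped, since an unlabeled element can still be clicked by coordinates.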

Object Detection Example (YOLOv8)

Sample user prompt:

Detect people and bicycles in the current image search results.

browser client detect
# Output:
# 256,180,0,person
# 240,320,1,bicycle
# ...


Image Segmentation Example (SAM)

Sample user prompt:

Segment the visual regions of the current page and click on the main logo (Segment 0).

# 1. Run segmentation to get IDs and coordinates
browser client segment
# Output:
# 450,300,0,segment
# 120,50,1,segment
# ...

# 2. Click at the coordinates corresponding to ID 0
browser client click-at 450 300


How it works: The segment command returns a CSV mapping (x,y,id,label). The id in the CSV matches the large number shown in the segmented_*.png image. An AI agent can look at the image to identify which segment it wants to interact with, find that ID in the CSV, and use the provided x,y coordinates for a click-at command.
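
The ID-to-coordinates lookup described above can be sketched as follows. This is an illustration under the assumption that each data row is exactly `x,y,id,label` with integer coordinates; `index_by_id` and `click_at_command` are hypothetical helper names, not part of the CLI.

```python
from typing import Dict, List, Tuple


def index_by_id(csv_text: str) -> Dict[int, Tuple[int, int]]:
    """Map each segment ID to its (x, y) click point."""
    points: Dict[int, Tuple[int, int]] = {}
    for line in csv_text.splitlines():
        parts = line.split(",", 3)  # the 4th field (label) is not needed here
        try:
            x, y, seg_id = (int(p) for p in parts[:3])
        except ValueError:
            continue  # blank line, too few fields, or non-numeric values
        points[seg_id] = (x, y)
    return points


def click_at_command(points: Dict[int, Tuple[int, int]], seg_id: int) -> List[str]:
    """Build the `browser client click-at X Y` argv for a chosen segment ID."""
    x, y = points[seg_id]  # raises KeyError if the ID is not in the CSV
    return ["browser", "client", "click-at", str(x), str(y)]


points = index_by_id("450,300,0,segment\n120,50,1,segment\n")
print(click_at_command(points, 0))
```

For the sample rows, ID 0 resolves to the same `click-at 450 300` call used in step 2 of the example.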
