
@majiayu000/browser
majiayu000 · 275 · 50 forks · Updated 5/6/2026

Remotely orchestrate a (Chromium) browser to use web apps.

Installation

npx agent-skills-cli install @majiayu000/browser
Available for: Claude Code, Cursor, Copilot, Codex, Antigravity

Details

  • Path: skills/development/browser/SKILL.md
  • Branch: main
  • Scoped Name: @majiayu000/browser

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: browser
description: remotely orchestrate a (chromium) browser to utilize web apps

Browser

Uses browser-use (via the Chrome DevTools Protocol) to browse any website. It remotely orchestrates a browser to navigate the web, which means it can handle any task that can be accomplished in a browser.

Instructions

  • Read README.md to understand this project.

QUICK REFERENCE

The following CLI commands are available:

| Task | Command | Purpose |
|---|---|---|
| Start Browser Server | `browser server start` | Start background browser server |
| Stop Browser Server | `browser server stop` | Stop background server |
| Server Status | `browser server status` | Check server status and latest logs summary |
| View Live Logs | `browser server logs --follow` | Tail live server logs |
| Get Page Status | `browser client status` | Get current page URL and title |
| Navigate to URL | `browser client navigate URL` | Navigate browser to specified URL |
| Execute JavaScript | `browser client execute JAVASCRIPT` | Execute JavaScript code (arrow function) |
| Get DOM Content | `browser client dom [SELECTOR]` | Get HTML content of element (default: `body`) |
| Fill Input Field | `browser client fill SELECTOR VALUE` | Fill form input field with value |
| Click Element | `browser client click SELECTOR` | Click on specified element |
| Click at Coordinates | `browser client click-at X Y` | Click at specific coordinates |
| Visualize Elements | `browser client visualize [--csv]` | Generate Set-of-Marks visualization |
| Detect Objects | `browser client detect [--csv]` | Run YOLO object detection |
| Segment Objects | `browser client segment` | Run SAM segmentation |
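The server commands are meant to bracket a session of client commands. Below is a minimal sketch of a wrapper that starts the server, runs one command, and stops the server even when that command fails. `with_browser_server` is a hypothetical helper, not part of the CLI, and it assumes (as in the sample session below) that client commands can be issued as soon as `browser server start` returns.

```shell
# Hypothetical helper (not part of the browser CLI): start the background
# server, run the given command, then stop the server whether the command
# succeeded or failed. Returns the command's exit status (1 if the server
# could not be started).
with_browser_server() {
  browser server start || return 1
  wbs_status=0
  # Run the command in a subshell so an `exit` inside it cannot skip the stop;
  # the `||` records a failure without tripping `set -e` in the caller.
  ( "$@" ) || wbs_status=$?
  browser server stop
  return "$wbs_status"
}
```

To run several steps, put them in a shell function of your own (for example one that calls `browser client navigate`, `fill`, and `click` in turn) and pass the function's name as the command.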

Examples

Sample user prompt:

let's test this project:

  • start browser service
  • use client scripts to
    • navigate to google.com
    • type hamster dance into search box
    • click search button
    • read first 3 search results
    • summarize them in your response to me
    • stop the server

Sample agent output (e.g., Gemini 3 Flash):

GitHub Copilot: Created 6 todos

Starting: *Start browser service* (1/6)

Ran terminal command: browser server start

Stdout:
Starting browser server...
✅ Server started in background (PID 12345)
Logs: /home/user/.browser_agent/server.log

Starting: *Navigate to Google* (2/6)

Ran terminal command: browser client navigate "https://www.google.com"

Stdout:
✅ Navigated to https://www.google.com

Starting: *Type search query* (3/6)

Ran terminal command: browser client fill "[name='q']" "hamster dance"

Stdout:
✅ Filled [name='q']

Starting: *Click search button* (4/6)

Ran terminal command: browser client click "[name='btnK']"

Stdout:
✅ Clicked [name='btnK']

Starting: *Read search results* (5/6)

Ran terminal command: browser client execute "() => Array.from(document.querySelectorAll('#search h3')).slice(0, 3).map(h3 => ({title: h3.innerText, link: h3.closest('a')?.href}))"

Stdout:
Result: [{"title": "Hampton the Hamster \"The Hamsterdance Song\"", "link": "https://www.youtube.com/watch?v=1qN72LEQnaU"}, ...]

Starting: *Stop server* (6/6)

Ran terminal command: browser server stop

Stdout:
✅ Server stopped (PID 12345)

Completed: *Summarize results*

The browser service was successfully started, and I performed the requested search for "hamster dance" on Google. Here are the first three search results:

1.  **Hampton the Hamster "The Hamsterdance Song"**
    - [https://www.youtube.com/watch?v=1qN72LEQnaU](https://www.youtube.com/watch?v=1qN72LEQnaU)
2.  **Hampton the Hamster - The Hamsterdance Song (Official Video ...**
    - [https://www.youtube.com/watch?v=6WpMlwVwydo](https://www.youtube.com/watch?v=6WpMlwVwydo)
3.  **Hamster Dance - Wikipedia**
    - [https://en.wikipedia.org/wiki/Hamster_Dance](https://en.wikipedia.org/wiki/Hamster_Dance)

Visual Grounding Example

Sample user prompt:

Find the "Sign in" button on this page using vision and click it.

Sample agent output:

```shell
# 1. Get visual coordinates of interactive elements (CSV is default)
browser client visualize | grep -i "Sign in"
# Output: 1194,28,5,Sign in

# 2. Click at the identified coordinates
browser client click-at 1194 28
```

Example Output: Visualized Elements

CSV Representation (truncated):

```csv
44,30,0,About
99,30,1,Store
1010,28,2,Gmail
1066,28,3,Images
1124,28,4,
1194,28,5,Sign in
```
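`grep -i "Sign in"` matches any row whose text merely contains "sign in" and prints the whole row, so the coordinates still have to be copied by hand. Here is a sketch of a stricter lookup that matches the label exactly (ignoring case) and prints only `x y`. It assumes the x,y,id,label column order shown above and that `visualize` emits CSV by default (the quick reference also lists a `--csv` flag, which may be needed); `find_label_coords` is a hypothetical helper name.

```shell
# Hypothetical helper: read x,y,id,label rows on stdin and print "x y" for the
# first row whose label equals $1, ignoring case. Exits 1 if nothing matches.
find_label_coords() {
  awk -F, -v want="$1" '
    {
      label = $0
      # Keep everything after the 3rd comma, so labels may contain commas.
      sub(/^[^,]*,[^,]*,[^,]*,/, "", label)
      if (tolower(label) == tolower(want)) { print $1, $2; found = 1; exit }
    }
    END { exit !found }
  '
}
```

Usage: `coords=$(browser client visualize | find_label_coords "Sign in") && browser client click-at $coords`, leaving `$coords` unquoted so that x and y are passed as two separate arguments.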

Object Detection Example (YOLOv8)

Sample user prompt:

Detect people and bicycles in the current image search results.

```shell
browser client detect
# Output:
# 256,180,0,person
# 240,320,1,bicycle
# ...
```

Example Output: Detected Objects
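The detection rows follow the same x,y,id,label layout, so counting how many objects of each class were found takes a single awk pass. This sketch assumes that layout and that `browser client detect` prints CSV as in the example above (the quick reference lists a `--csv` flag, which may be needed); `count_by_label` is a hypothetical helper. The final `sort` keeps the output order stable, because the order of awk's `for (key in array)` loop is unspecified.

```shell
# Hypothetical helper: read x,y,id,label rows on stdin and print
# "<count> <label>" once per distinct label, sorted by label.
count_by_label() {
  awk -F, '
    NF >= 4 {
      label = $0
      # The label is everything after the 3rd comma.
      sub(/^[^,]*,[^,]*,[^,]*,/, "", label)
      n[label]++
    }
    END { for (l in n) print n[l], l }
  ' | sort -k2
}
```

Piping `browser client detect` into `count_by_label` answers the sample prompt directly: one line for `person` and one for `bicycle`, each with its count.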

Image Segmentation Example (SAM)

Sample user prompt:

Segment the visual regions of the current page and click on the main logo (Segment 0).

```shell
# 1. Run segmentation to get IDs and coordinates
browser client segment
# Output:
# 450,300,0,segment
# 120,50,1,segment
# ...

# 2. Click at the coordinates corresponding to ID 0
browser client click-at 450 300
```

Example Output: Segmented Regions

How it works: The `segment` command returns CSV rows of the form x,y,id,label. The id in each row matches the large number drawn on the `segmented_*.png` image. An AI agent can look at the image to identify which segment it wants to interact with, find that ID in the CSV, and pass that row's x,y coordinates to a `click-at` command.
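The ID-to-coordinates step of that workflow can be scripted so the agent only has to pick an ID from the image. A minimal sketch, assuming the x,y,id,label column order; `coords_for_id` is a hypothetical helper whose output is meant to be passed unquoted to `click-at`, so that x and y become separate arguments.

```shell
# Hypothetical helper: read x,y,id,label rows on stdin and print "x y" for the
# row whose id field equals $1. Exits 1 if no row has that id.
coords_for_id() {
  awk -F, -v want="$1" '
    $3 == want { print $1, $2; found = 1; exit }
    END { exit !found }
  '
}
```

Usage: `coords=$(browser client segment | coords_for_id 0) && browser client click-at $coords`.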