Getting Started with chromedp Browser Automation

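Background

A recent project of mine needed a headless browser. I had used Selenium from Python before and wanted to try a Go equivalent this time. I hit a pitfall right away: some of the code found online targets older chromedp releases and does not work with the current version, so I am writing down what I learned along the way.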

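What is chromedp?

PhantomJS, once the most widely used headless browser solution, has announced that it is no longer maintained and recommends headless Chrome instead.

So what exactly is headless Chrome? It is Chrome running without a visible user interface: your program can use every feature Chrome supports without a browser window ever opening. It renders pages the same way any other modern browser does, and it lets you take screenshots, read cookies, fetch the HTML, and so on.

Using headless Chrome from a Go program requires a library. Several open-source libraries can talk to headless Chrome; this article uses chromedp, whose API resembles Selenium's and is easy to pick up.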

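Normal mode

Normal mode opens a browser window on your machine, so you can watch the effect of your code as it runs. The browser has to be closed once the run is finished.

Chrome headless mode

Headless mode does not open a browser window. If the Go program is interrupted while you run go run main.go repeatedly, the background headless Chrome process may never exit, and the next local debugging run can then fail. The fix is to kill the Chrome process by hand. For that reason, headless mode is not recommended while debugging Go code.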

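Some browser flags

--no-first-run skips the first-run setup tasks
--no-default-browser-check does not check whether Chrome is the default browser
--disable-gpu disables the GPU; servers usually have no graphics card
--remote-debugging-port port of the Chrome remote-debugging interface (chromedp's default at the time of writing was 9222; changing it is not recommended)
--no-sandbox disables the sandbox, which lowers resource usage on the server but also weakens its security; use it together with --remote-debugging-address=127.0.0.1
--disable-plugins disables Chrome plugins
--remote-debugging-address address the remote-debugging interface listens on; 0.0.0.0 allows access from outside the machine but is insecure, so keep the default 127.0.0.1
--window-size window size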

Usage in code:


    opts := append(chromedp.DefaultExecAllocatorOptions[:],
        chromedp.Flag("headless", false),  // false shows the browser window; true runs headless
        chromedp.ProxyServer("http://10.10.1.1:21869"), // route traffic through a proxy
        chromedp.Flag("mute-audio", false), // false leaves audio on; true mutes it
    )
    allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
    defer cancel()

Selectors:

Get familiar with the most commonly used methods:

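chromedp.NewContext() creates the chromedp context; every later operation on the page uses this context

chromedp.Run() runs a sequence of actions in Chrome

chromedp.Navigate() navigates the browser to a page

chromedp.WaitVisible() waits until an element is visible before continuing

chromedp.Click() simulates a mouse click on an element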

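chromedp.Value() reads an element's value

chromedp.ActionFunc() runs a custom function against the current page

chromedp.Text() reads an element's text

chromedp.Evaluate() evaluates a JavaScript expression, like typing it into the console

network.SetExtraHTTPHeaders() adds extra headers to the requests the page sends

chromedp.SendKeys() simulates keyboard input and types characters

chromedp.Nodes() collects the elements matching a selector (an XPath, for example) into a slice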

chromedp.NewRemoteAllocator() connects to an already running browser through its remote-debugging URL

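chromedp.OuterHTML() reads an element's outer HTML

chromedp.Screenshot() takes a screenshot of a single element

page.CaptureScreenshot() captures a screenshot of the page

chromedp.Submit() submits a form

chromedp.WaitNotPresent() waits until an element is no longer present, for example a "Searching..." indicator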

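In short: 1. set the options and launch the browser; 2. the browser carries out the actions you defined. Let's go straight to a code example.

Example: launch the browser and visit a site
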

package main

import (
    "context"
    "log"
    "time"

    "github.com/chromedp/chromedp"
)

func main() {

    // disable headless mode so that a browser window is shown
    opts := append(chromedp.DefaultExecAllocatorOptions[:],
        chromedp.Flag("headless", false),
    )
    allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
    defer cancel()

    // create chrome instance
    ctx, cancel := chromedp.NewContext(
        allocCtx,
        chromedp.WithLogf(log.Printf),
    )
    defer cancel()

    // create a timeout
    ctx, cancel = context.WithTimeout(ctx, 5*time.Second)
    defer cancel()

    // navigate to a page, wait for the body, then type into an input field
    // (the XPath assumes the page has an element with id="username")
    sel := `//*[@id="username"]`
    err := chromedp.Run(ctx,
        chromedp.Navigate(`https://github.com/awake1t`),
        chromedp.WaitVisible("body"),
        // pause briefly
        chromedp.Sleep(2*time.Second),

        chromedp.SendKeys(sel, "username", chromedp.BySearch), // BySearch matches XPath
    )
    if err != nil {
        log.Fatal(err)
    }

    log.Println("done")

}

Visit a site and take a screenshot


package main

import (
    "context"
    "io/ioutil"
    "log"
    "math"
    "time"

    "github.com/chromedp/cdproto/emulation"
    "github.com/chromedp/cdproto/page"
    "github.com/chromedp/chromedp"
)

func main() {

    // disable headless mode so that a browser window is shown
    opts := append(chromedp.DefaultExecAllocatorOptions[:],
        chromedp.Flag("headless", false),
    )
    allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
    defer cancel()

    // create chrome instance
    ctx, cancel := chromedp.NewContext(
        allocCtx,
        chromedp.WithLogf(log.Printf),
    )
    defer cancel()

    // create a timeout
    ctx, cancel = context.WithTimeout(ctx, 15*time.Second)
    defer cancel()

    // buffer that receives the image data
    var buf []byte
    // capture the full page; the quality value only affects JPEG output
    if err := chromedp.Run(ctx, fullScreenshot(`https://github.com/awake1t`, 90, &buf)); err != nil {
        log.Fatal(err)
    }
    if err := ioutil.WriteFile("./Screenshot.png", buf, 0644); err != nil {
        log.Fatal(err)
    }
    log.Println("screenshot written")

}

// fullScreenshot takes a screenshot of the entire browser viewport.
// Liberally copied from puppeteer's source.
// Note: this will override the viewport emulation settings.
func fullScreenshot(urlstr string, quality int64, res *[]byte) chromedp.Tasks {
    return chromedp.Tasks{
        chromedp.Navigate(urlstr),
        chromedp.ActionFunc(func(ctx context.Context) error {
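            // get layout metrics (newer cdproto releases return additional
            // CSS-pixel values from this call, so the result count may differ)
            _, _, contentSize, err := page.GetLayoutMetrics().Do(ctx)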
            if err != nil {
                return err
            }

            width, height := int64(math.Ceil(contentSize.Width)), int64(math.Ceil(contentSize.Height))

            // force viewport emulation
            err = emulation.SetDeviceMetricsOverride(width, height, 1, false).
                WithScreenOrientation(&emulation.ScreenOrientation{
                    Type:  emulation.OrientationTypePortraitPrimary,
                    Angle: 0,
                }).
                Do(ctx)
            if err != nil {
                return err
            }

            // capture screenshot
            *res, err = page.CaptureScreenshot().
                WithQuality(quality).
                WithClip(&page.Viewport{
                    X:      contentSize.X,
                    Y:      contentSize.Y,
                    Width:  contentSize.Width,
                    Height: contentSize.Height,
                    Scale:  1,
                }).Do(ctx)
            if err != nil {
                return err
            }
            return nil
        }),
    }
}

More examples:

https://github.com/chromedp/examples

Official documentation:

https://godoc.org/github.com/chromedp/chromedp
