Hiding Automation Fingerprints in Common Browser Automation Modules

2023-12-14 09:21:08

Preface

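On the road to writing crawlers, there is always some encryption that beginners like us cannot crack and some anti-crawling measure we cannot get around. That is when browser automation tools come in. In most cases, though, an automation tool used as-is gets detected by the target site, because dozens of telltale features are exposed. This article shows how to execute JS in several common automation tools and how to hide the browser features that give them away. It does not cover installation or environment setup; please look up a tutorial for that.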
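To make "exposed features" a bit more concrete, here is a minimal sketch of the kind of check a detection script might run. The browser properties are well-known signals, but the selection, the `PROBES` table, the `suspicious_signals` helper, and the thresholds are all assumptions made up for illustration; real sites check far more than this.

```python
# Illustrative only: a few well-known signals, not an exhaustive list.
# Each entry maps a label to a JavaScript expression that can be evaluated in the page.
PROBES = {
    "webdriver": "navigator.webdriver",
    "plugin_count": "navigator.plugins.length",
    "languages": "navigator.languages",
    "user_agent": "navigator.userAgent",
}


def suspicious_signals(values):
    """Return the labels whose values look like an automated browser.

    `values` maps the labels in PROBES to the results of evaluating them.
    The rules below are assumptions chosen for this sketch.
    """
    flagged = []
    if values.get("webdriver"):
        flagged.append("webdriver")
    if values.get("plugin_count") == 0:
        flagged.append("plugin_count")
    if not values.get("languages"):
        flagged.append("languages")
    if "HeadlessChrome" in (values.get("user_agent") or ""):
        flagged.append("user_agent")
    return flagged
```

A stealth script works by patching properties like these before the page's own scripts get a chance to read them.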

selenium

The first automation module I came across.

# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date: 2023-12-07 19:58:47
# @Last Modified by: Mehaei
# @Last Modified time: 2023-12-07 21:03:31
import time
from selenium import webdriver


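def start():
    driver = webdriver.Chrome()
    # Read the stealth script that patches the exposed automation features.
    with open('stealth.min.js', 'r', encoding='utf-8') as f:
        js = f.read()
    # Register the script via CDP so it runs before any page script on every new document.
    driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': js})
    driver.get("https://bot.sannysoft.com/")
    # Keep the window open long enough to inspect the detection results.
    time.sleep(60)
    driver.quit()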


if __name__ == '__main__':
    start()
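Instead of reading the result table by eye, the most famous flag can also be checked from code. This is a minimal sketch, assuming the stealth script has been registered as in the selenium example; `webdriver_flag_hidden` and `demo` are hypothetical helper names, and a falsy `navigator.webdriver` says nothing about the other features.

```python
def webdriver_flag_hidden(driver):
    """Return True when navigator.webdriver is falsy in the current page.

    `driver` is anything with Selenium's execute_script(script) method.
    """
    return not driver.execute_script("return navigator.webdriver")


def demo():
    # Requires selenium, a local Chrome, and stealth.min.js in the working directory.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        with open("stealth.min.js", "r", encoding="utf-8") as f:
            driver.execute_cdp_cmd(
                "Page.addScriptToEvaluateOnNewDocument", {"source": f.read()}
            )
        # The injected script only applies to documents loaded after registration.
        driver.get("https://bot.sannysoft.com/")
        print("webdriver flag hidden:", webdriver_flag_hidden(driver))
    finally:
        driver.quit()
```

Calling `demo()` opens the test page and prints whether the flag is hidden; the check has to run after `driver.get`, because the script is applied when a new document loads.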

pyppeteer

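In my testing, a small number of features still cannot be hidden this way, but there are other options. For example, pyppeteer_stealth is about as good as it gets for hiding pyppeteer's features.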

# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date: 2023-12-07 19:58:47
# @Last Modified by: Mehaei
# @Last Modified time: 2023-12-07 21:22:31
import asyncio
from pyppeteer import launch


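async def start():
    browser = await launch(headless=False)
    page = await browser.newPage()
    # Read the stealth script that patches the exposed automation features.
    with open('stealth.min.js', 'r', encoding='utf-8') as f:
        js = f.read()
    # Register the script so it is evaluated on every new document.
    await page.evaluateOnNewDocument(js)
    await page.goto("https://bot.sannysoft.com/")
    # Keep the window open long enough to inspect the detection results.
    await asyncio.sleep(60)
    await browser.close()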


if __name__ == '__main__':
    asyncio.run(start())
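For the pyppeteer_stealth route mentioned above, the flow looks roughly like this. It is a minimal sketch: `open_stealth_page` and `demo` are hypothetical helper names, and `launch` and `stealth` are passed in as parameters only to keep the order of the steps easy to see. In real use they would be `pyppeteer.launch` and `pyppeteer_stealth.stealth`.

```python
import asyncio


async def open_stealth_page(launch, stealth, url):
    """Launch a browser, apply the stealth patches, then navigate.

    In real use `launch` is pyppeteer.launch and `stealth` is
    pyppeteer_stealth.stealth; they are parameters here for readability.
    """
    browser = await launch(headless=False)
    page = await browser.newPage()
    # Apply the patches before navigating so they are in place when the page loads.
    await stealth(page)
    await page.goto(url)
    return browser, page


async def demo():
    # Requires: pip install pyppeteer pyppeteer-stealth
    from pyppeteer import launch
    from pyppeteer_stealth import stealth

    browser, page = await open_stealth_page(
        launch, stealth, "https://bot.sannysoft.com/"
    )
    # Keep the window open long enough to inspect the detection results.
    await asyncio.sleep(60)
    await browser.close()
```

Running `asyncio.run(demo())` opens the same test page as before, so the two approaches can be compared side by side.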

playwright

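A new-generation automation tool. It can record your manual actions and generate the corresponding code automatically, which makes it extremely handy.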

Official site: https://playwright.dev/

# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date: 2023-12-07 19:58:47
# @Last Modified by: Mehaei
# @Last Modified time: 2023-12-07 20:52:55
import time
from playwright.sync_api import sync_playwright


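def start():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        # Run the stealth script before any page script in every page of this context.
        context.add_init_script(path='stealth.min.js')
        page = context.new_page()
        page.goto("https://bot.sannysoft.com/", timeout=100000)
        # Keep the window open long enough to inspect the detection results.
        time.sleep(60)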


if __name__ == '__main__':
    start()
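To show what a stealth script does under the hood, here is a minimal sketch that injects a single hand-written patch through the same `add_init_script` call, using its `script` argument instead of `path`. The `HIDE_WEBDRIVER_JS` snippet and the `install_patch` and `demo` helpers are names made up for this example, and a patch this naive can itself be detected, so treat it as an illustration rather than a replacement for stealth.min.js.

```python
# One hand-written patch: make navigator.webdriver report undefined.
# stealth.min.js bundles many patches of this general kind.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
});
"""


def install_patch(context, script=HIDE_WEBDRIVER_JS):
    """Register `script` to run before page scripts in every page of `context`."""
    context.add_init_script(script=script)


def demo():
    # Requires: pip install playwright, plus an installed Chromium build.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        install_patch(context)
        page = context.new_page()
        page.goto("https://bot.sannysoft.com/", timeout=100000)
        # Read the flag back from the page to see whether the patch took effect.
        print("navigator.webdriver ->", page.evaluate("() => navigator.webdriver"))
        browser.close()
```

The patch is registered on the context before `new_page()` is called, so it applies to every page opened from that context.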

DrissionPage

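A newer automation tool that combines the convenience of requests with the power of browser automation. It also hides some automation features automatically and does not require installing a driver. If you are interested, see the official site: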

https://g1879.gitee.io/drissionpagedocs/

# -*- coding: utf-8 -*-
# @Author: Mehaei
# @Date: 2023-12-07 19:58:47
# @Last Modified by: Mehaei
# @Last Modified time: 2023-12-07 22:02:58
import time
from DrissionPage import ChromiumPage


def start():
    page = ChromiumPage()
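    with open('stealth.min.js', 'r', encoding='utf-8') as f:
        js = f.read()
    # run_js executes JS in the page, but running this stealth script
    # through it raises an error, so the call is left commented out.
    # page.run_js(js)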
    page.get("https://bot.sannysoft.com/")
    time.sleep(60)


if __name__ == '__main__':
    start()
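Since `run_js` fails on this script, one thing that may be worth trying is registering it through CDP instead, the same way the selenium example does. The sketch below assumes `ChromiumPage` exposes `run_cdp(cmd, **cmd_args)`; this has not been verified against every DrissionPage version, and `register_init_script` and `demo` are hypothetical helper names.

```python
def register_init_script(page, js_source):
    """Ask the browser to run `js_source` at the start of every new document.

    Assumes `page` exposes DrissionPage's run_cdp(cmd, **cmd_args).
    """
    return page.run_cdp("Page.addScriptToEvaluateOnNewDocument", source=js_source)


def demo():
    # Requires DrissionPage, a local Chromium-based browser, and stealth.min.js.
    from DrissionPage import ChromiumPage

    page = ChromiumPage()
    with open("stealth.min.js", "r", encoding="utf-8") as f:
        register_init_script(page, f.read())
    # The script only applies to documents loaded after it was registered.
    page.get("https://bot.sannysoft.com/")
```

Whether this fully hides the remaining features would still need to be checked on the test page.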
