How to Solve Web Scraping Challenges with Scrapy and Playwright in 2025

Emma Foster
Machine Learning Engineer
12-Nov-2024

What is Scrapy-Playwright?
Scrapy-Playwright is a Scrapy plugin (a download handler) that integrates Scrapy, a fast web scraping framework for Python, with Playwright, a browser automation library. The combination lets Scrapy handle JavaScript-heavy websites by using Playwright to render dynamic content, interact with pages, and manage browser contexts.
Why use Scrapy-Playwright?
Scrapy works well on static websites, but many modern sites rely on JavaScript to render their content. A plain Scrapy spider only sees the initial HTML, so it can miss data or fail to follow complex page structures on such sites. Scrapy-Playwright closes this gap by letting Scrapy drive a headless browser, so dynamic content is loaded before it is scraped.
Benefits of using Scrapy-Playwright
- JavaScript rendering: scrape websites that load their content dynamically with JavaScript.
- Headless browsing: run scraping jobs without a visible browser window, which reduces overhead.
- Advanced interactions: handle clicks, form input, and navigation between pages.
- Robust handling: rely on Playwright's built-in waiting and timeout mechanisms for pages that load slowly or unreliably.
Installation
To get started with Scrapy-Playwright, install Scrapy, the scrapy-playwright plugin, and the Playwright browser binaries:
- Install Scrapy:
bash
pip install scrapy
- Install Scrapy-Playwright:
bash
pip install scrapy-playwright
- Install the Playwright browsers. After installing Playwright, download the browser binaries it drives:
bash
playwright install
Getting started
Setting up a new Scrapy project
First, create a new Scrapy project if you do not already have one:
bash
scrapy startproject myproject
cd myproject
Configuring Playwright
Next, enable Playwright in the project's settings. Open settings.py and add the following configuration:
python
# settings.py
# scrapy-playwright is enabled through download handlers only.
# It ships no downloader middleware, so DOWNLOADER_MIDDLEWARES needs no entry for it.

# Route HTTP and HTTPS requests through the Playwright download handler
DOWNLOAD_HANDLERS = {
    'http': 'scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler',
    'https': 'scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler',
}

# scrapy-playwright requires the asyncio-based Twisted reactor
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'

# Playwright settings (optional)
PLAYWRIGHT_BROWSER_TYPE = 'chromium'  # can be 'chromium', 'firefox', or 'webkit'
PLAYWRIGHT_LAUNCH_OPTIONS = {
    'headless': True,
}
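A quick sanity check can catch a misconfigured settings.py before a crawl starts. The sketch below is plain Python and does not import Scrapy. The helper name `missing_playwright_settings` is hypothetical, and it checks only the download handler and reactor entries discussed above.

```python
HANDLER = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def missing_playwright_settings(settings):
    """Return a list of problems found in a settings mapping (empty if none)."""
    problems = []
    handlers = settings.get("DOWNLOAD_HANDLERS", {})
    for scheme in ("http", "https"):
        if handlers.get(scheme) != HANDLER:
            problems.append(f"DOWNLOAD_HANDLERS[{scheme!r}] is not the Playwright handler")
    if settings.get("TWISTED_REACTOR") != REACTOR:
        problems.append("TWISTED_REACTOR is not the asyncio reactor")
    return problems


# Example mapping mirroring the settings.py shown above
example = {
    "DOWNLOAD_HANDLERS": {"http": HANDLER, "https": HANDLER},
    "TWISTED_REACTOR": REACTOR,
}
print(missing_playwright_settings(example))
```

An empty list means the entries are in place; each string in a non-empty list names one setting to fix.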
Basic usage
Creating a spider
With the setup complete, let's create a simple spider that uses Playwright to scrape a JavaScript-rendered website. As an illustration, it targets a hypothetical site that loads its content dynamically.
Inside the spiders directory, create a new spider file named dynamic_spider.py:
python
# spiders/dynamic_spider.py
import scrapy
from scrapy_playwright.page import PageMethod  # replaces the older PageCoroutine


class DynamicSpider(scrapy.Spider):
    name = "dynamic"
    start_urls = ["https://example.com/dynamic"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={
                    "playwright": True,
                    # Formerly "playwright_page_coroutines"
                    "playwright_page_methods": [
                        PageMethod("wait_for_selector", "div.content"),
                    ],
                },
            )

    async def parse(self, response):
        # Extract data once JavaScript has rendered the content
        for item in response.css("div.content"):
            yield {
                "title": item.css("h2::text").get(),
                "description": item.css("p::text").get(),
            }
        # Handle pagination or further interactions here if needed
Handling JavaScript-rendered content
In the example above:
- playwright: True tells Scrapy to fetch this request through Playwright.
- playwright_page_methods (named playwright_page_coroutines in older scrapy-playwright releases) lists the actions Playwright performs on the page. Here it waits for the selector div.content, so the dynamic content is loaded before parsing starts.
- The async parse method lets the callback use asynchronous features when handling the response.
Solving captchas with CapSolver
One of the main obstacles in web scraping is the captcha, which is designed to block automated access. CapSolver provides captcha-solving services, including integrations with browser automation tools such as Playwright. This section shows how to integrate CapSolver with Scrapy-Playwright to handle captchas.
What is CapSolver?
CapSolver is a captcha-solving service that automates the solving of several captcha types, including reCAPTCHA. With CapSolver in the scraping workflow, captcha challenges can be solved without manual intervention, so the scraping job keeps running.
Integrating CapSolver with Scrapy-Playwright
To integrate CapSolver with Scrapy-Playwright, you need to:
- Obtain the CapSolver browser extension: CapSolver provides a browser extension that solves captchas automatically inside browser contexts.
- Configure Playwright to load the CapSolver extension: load the extension when the Playwright browser is launched so that captcha solving is available.
- Point Scrapy requests at the customised Playwright context: make sure the requests run in the Playwright context that has the CapSolver extension loaded.
Example implementation in Python
Below is a step-by-step guide to integrating CapSolver with Scrapy-Playwright, with example code.
1. Obtain the CapSolver browser extension
First, download the CapSolver browser extension and place it in your project directory. The examples assume the extension is located at CapSolver.Browser.Extension.
2. Configure the extension:
- Locate the configuration file ./assets/config.json in the CapSolver extension directory.
- Set enabledForcaptcha to true and, to have captchas solved automatically, set captchaMode to token.
Example config.json:
json
{
  "enabledForcaptcha": true,
  "captchaMode": "token"
  // Other settings stay unchanged
}
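The two options can also be set from a script, which is handy when the extension is unpacked again on each deployment. This is a minimal sketch that assumes config.json is plain JSON without comments; `enable_token_mode` is a hypothetical helper name, and only the two keys shown above are touched.

```python
import json
from pathlib import Path


def enable_token_mode(config_path):
    """Set the two captcha options in the extension config and return the result."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["enabledForcaptcha"] = True
    config["captchaMode"] = "token"
    # All other keys are written back unchanged
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Usage (path as assumed in this guide):
# enable_token_mode("CapSolver.Browser.Extension/assets/config.json")
```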
3. Update the Scrapy settings to load the extension
Modify settings.py so that Playwright loads the CapSolver extension. You need to give the path to the extension and pass the required launch arguments to Playwright.
python
# settings.py
import os

# Absolute path to the unpacked CapSolver extension
CAPSOLVER_EXTENSION_PATH = os.path.abspath('CapSolver.Browser.Extension')

# Existing Playwright settings
PLAYWRIGHT_BROWSER_TYPE = 'chromium'
PLAYWRIGHT_LAUNCH_OPTIONS = {
    'headless': False,  # must be False for the extension to load
    'args': [
        '--disable-extensions-except={}'.format(CAPSOLVER_EXTENSION_PATH),
        '--load-extension={}'.format(CAPSOLVER_EXTENSION_PATH),
    ],
}

# Make sure the asyncio Twisted reactor is set
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
Note: loading a browser extension requires the browser to run in non-headless mode, so set 'headless': False.
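A wrong extension path tends to fail quietly: the browser starts, but no extension is loaded. The sketch below builds the same launch options while checking that the directory exists first. `extension_launch_options` is a hypothetical helper; the two Chromium flags are the ones used in the settings above.

```python
from pathlib import Path


def extension_launch_options(extension_dir):
    """Build Playwright launch options that load one unpacked extension."""
    ext = Path(extension_dir).resolve()
    if not ext.is_dir():
        raise FileNotFoundError(f"Extension directory not found: {ext}")
    return {
        "headless": False,  # extensions need a headed browser
        "args": [
            f"--disable-extensions-except={ext}",
            f"--load-extension={ext}",
        ],
    }


# In settings.py (directory name as assumed in this guide):
# PLAYWRIGHT_LAUNCH_OPTIONS = extension_launch_options("CapSolver.Browser.Extension")
```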
4. Create a spider that handles captchas
Create a new spider, or modify an existing one, so that it interacts with captchas through the CapSolver extension.
python
# spiders/captcha_spider.py
import scrapy
from scrapy_playwright.page import PageMethod  # replaces the older PageCoroutine


class CaptchaSpider(scrapy.Spider):
    name = "captcha_spider"
    start_urls = ["https://site.example/captcha-protected"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={
                    "playwright": True,
                    # Needed so that response.meta["playwright_page"] is available
                    "playwright_include_page": True,
                    "playwright_page_methods": [
                        PageMethod("wait_for_selector", "iframe[src*='captcha']"),
                        PageMethod("wait_for_timeout", 1000),  # give the extension time to start
                    ],
                    "playwright_context": "default",
                },
                callback=self.parse_captcha,
            )

    async def parse_captcha(self, response):
        page = response.meta["playwright_page"]
        selector = response.selector  # fallback if captcha handling fails
        try:
            # Wait for the captcha iframe to appear
            await page.wait_for_selector("iframe[src*='captcha']", timeout=10000)
            captcha_frame = next((f for f in page.frames if "captcha" in f.url), None)
            if captcha_frame:
                # Click the captcha checkbox
                await captcha_frame.click("div#checkbox")
                # Wait for CapSolver to solve it; adjust the selector to the site's success indicator
                await page.wait_for_selector("div.captcha-success", timeout=60000)
                self.logger.info("Captcha solved successfully.")
            else:
                self.logger.warning("captcha iframe not found.")
            # Re-read the DOM: `response` was captured before the captcha was solved
            selector = scrapy.Selector(text=await page.content())
        except Exception as e:
            self.logger.error(f"Error handling captcha: {e}")
        finally:
            await page.close()  # always release the page

        # Parse the page after the captcha has been handled
        for item in selector.css("div.content"):
            yield {
                "title": item.css("h2::text").get(),
                "description": item.css("p::text").get(),
            }
        # Handle pagination or further interactions here if needed
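The spider above leans on Playwright's own wait_for_selector timeout. When the success signal is not a single selector, a generic polling loop is one alternative. The sketch below uses only the standard library and is not part of Playwright or CapSolver; `wait_until` and its parameters are hypothetical names.

```python
import asyncio
import time


async def wait_until(check, timeout=60.0, interval=0.5):
    """Poll an async `check()` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await check():
            return True
        await asyncio.sleep(interval)
    return False
```

Inside a callback, `check` could be a small coroutine that inspects the page, and a False return value can be logged as an unsolved captcha instead of raising.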
5. Run the spider
Make sure all dependencies are installed, then run the spider with:
bash
scrapy crawl captcha_spider
Advanced features
Once you are comfortable with the basics, Scrapy-Playwright offers several advanced features for larger scraping projects.
Handling multiple pages
Scraping several pages, or navigating through a website, can be streamlined with Playwright's navigation capabilities.
python
# spiders/multi_page_spider.py
import scrapy
from scrapy_playwright.page import PageMethod  # replaces the older PageCoroutine


class MultiPageSpider(scrapy.Spider):
    name = "multipage"
    start_urls = ["https://example.com/start"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={
                    "playwright": True,
                    "playwright_page_methods": [
                        PageMethod("wait_for_selector", "div.list"),
                        # Scroll to the bottom to trigger lazy-loaded items
                        PageMethod("evaluate", "window.scrollTo(0, document.body.scrollHeight)"),
                    ],
                },
            )

    async def parse(self, response):
        # Extract data from the current page
        for item in response.css("div.list-item"):
            yield {
                "name": item.css("span.name::text").get(),
                "price": item.css("span.price::text").get(),
            }

        # Follow the link to the next page
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse,
                meta={
                    "playwright": True,
                    "playwright_page_methods": [
                        PageMethod("wait_for_selector", "div.list"),
                    ],
                },
            )
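The next-page link is usually a relative href, which is why the spider passes it through response.urljoin before requesting it. The standard-library urljoin applies the same resolution rules, so it is an easy way to see what each form of href resolves to. In this sketch `resolve_next` is a hypothetical helper and the URLs are made up.

```python
from urllib.parse import urljoin


def resolve_next(page_url, href):
    """Resolve a next-page href against the current page URL (None if no link)."""
    if not href:
        return None
    return urljoin(page_url, href)


page_url = "https://example.com/start?page=1"
for href in ("/start?page=2", "?page=2", "page/2", "https://example.com/other", None):
    print(repr(href), "->", resolve_next(page_url, href))
```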
Using Playwright contexts
Playwright can create multiple browser contexts, which is useful for keeping sessions and cookies separate or for running scraping tasks in parallel.
python
# settings.py
PLAYWRIGHT_CONTEXTS = {
    "default": {
        "viewport": {"width": 1280, "height": 800},
        "user_agent": "CustomUserAgent/1.0",
    },
    "mobile": {
        "viewport": {"width": 375, "height": 667},
        "user_agent": "MobileUserAgent/1.0",
        "is_mobile": True,
    },
}
In your spider, specify the context:
python
# spiders/context_spider.py
import scrapy


class ContextSpider(scrapy.Spider):
    name = "context"
    start_urls = ["https://example.com"]

    def start_requests(self):
        yield scrapy.Request(
            self.start_urls[0],
            meta={
                "playwright": True,
                "playwright_context": "mobile",  # a key of PLAYWRIGHT_CONTEXTS
            },
        )

    async def parse(self, response):
        # Your parsing logic goes here
        pass
Integrating with middleware
Scrapy-Playwright can be combined with Scrapy middlewares to add functionality such as retries, proxy management, or custom headers.
python
# settings.py
# scrapy-playwright is a download handler, not a downloader middleware,
# so only regular Scrapy middlewares are listed here.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
}

# Example of setting custom headers
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'MyCustomAgent/1.0',
    'Accept-Language': 'en-US,en;q=0.9',
}
Best practices
To get the most out of Scrapy-Playwright and CapSolver, consider the following best practices:
- Optimise Playwright usage: use Playwright only for requests that need JavaScript rendering, which saves resources.
- Manage browser contexts: reuse browser contexts where possible to improve performance and reduce overhead.
- Handle timeouts gracefully: set appropriate timeouts and error handling for slow-loading pages.
- Respect robots.txt and terms of service: make sure your scraping complies with the target website's policies.
- Apply throttling and delays: scrape politely so that target servers are not overloaded.
- Keep your CapSolver API keys secure: store sensitive values such as API keys safely and avoid hard-coding them in scripts.
- Monitor and log scraping activity: track your scraping runs so that problems can be found and fixed quickly.
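Several of these practices map onto settings. The fragment below is a sketch of a polite configuration: the values are illustrative starting points, not recommendations from the original article, and the environment variable name CAPSOLVER_API_KEY is an assumption chosen for this example.

```python
import os

# Respect robots.txt and slow down requests to each site
ROBOTSTXT_OBEY = True
DOWNLOAD_DELAY = 1.0  # seconds between requests to the same site
AUTOTHROTTLE_ENABLED = True  # adapt the delay to server response times
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Fail slow navigations instead of hanging (scrapy-playwright setting, milliseconds)
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 30_000

# Read the API key from the environment instead of hard-coding it
# (hypothetical variable name; empty string if unset)
CAPSOLVER_API_KEY = os.environ.get("CAPSOLVER_API_KEY", "")
```

Keeping the key out of settings.py also keeps it out of version control; a spider can refuse to start when the value is empty.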
Bonus code
Claim your bonus code for captcha solutions at CapSolver: scrape. After redeeming it, you receive an extra 5% bonus on every recharge, with no limit on the number of times.

Conclusion
Scrapy-Playwright bridges the gap between static and dynamic content extraction. By combining Scrapy's crawling framework with Playwright's browser automation, you can handle demanding scraping tasks. Integrating CapSolver in addition lets you get past captcha challenges, so data collection can continue even on well-protected websites.
Whether you are scraping e-commerce sites, social media platforms, or any other JavaScript-heavy website, Scrapy-Playwright together with CapSolver provides the tools you need. By following the best practices above and using these integrations, you can build efficient, reliable, and scalable scraping solutions tailored to your requirements.
Ready to take your scraping projects further? Try Scrapy-Playwright and CapSolver and explore new possibilities for data collection and automation.