
On the same website, Playwright succeeded but PlaywrightCrawler failed with error: page.goto: net::ERR_EMPTY_RESPONSE #2989


Open
jeff1998-git opened this issue May 28, 2025 · 3 comments
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@jeff1998-git

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/playwright (PlaywrightCrawler)

Issue description

Running the code sample below produces the following console output.

Using Playwright succeeds:
INFO PlaywrightCrawler: Starting the crawler.
use Playwright message: 湖北省中医院

Using Crawlee (PlaywrightCrawler) fails:
WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.goto: net::ERR_EMPTY_RESPONSE at https://www.hbhtcm.com/
Call log:

ERROR PlaywrightCrawler: Request failed and reached maximum retries. page.goto: net::ERR_EMPTY_RESPONSE at https://www.hbhtcm.com/
Call log:

  • navigating to "https://www.hbhtcm.com/", waiting until "load"

    at gotoExtended (D:\crawleeRunVersion\notice_project\crawlee1.1.2\node_modules\@crawlee\playwright\internals\utils\playwright-utils.js:165:17)
    at PlaywrightCrawler._navigationHandler (D:\crawleeRunVersion\notice_project\crawlee1.1.2\node_modules\@crawlee\playwright\internals\playwright-crawler.js:117:52)
    at PlaywrightCrawler._handleNavigation (D:\crawleeRunVersion\notice_project\crawlee1.1.2\node_modules\@crawlee\browser\internals\browser-crawler.js:331:52)
    at async PlaywrightCrawler._runRequestHandler (D:\crawleeRunVersion\notice_project\crawlee1.1.2\node_modules\@crawlee\browser\internals\browser-crawler.js:260:13)
    at async PlaywrightCrawler._runRequestHandler (D:\crawleeRunVersion\notice_project\crawlee1.1.2\node_modules\@crawlee\playwright\internals\playwright-crawler.js:114:9)
    at async wrap (D:\crawleeRunVersion\notice_project\crawlee1.1.2\node_modules\@apify\timeout\cjs\index.cjs:54:21) {"id":"dcHa1anqjn3kF91","url":"https://www.hbhtcm.com","method":"GET","uniqueKey":"https://www.hbhtcm.com"}

So: on the same website, plain Playwright succeeds but PlaywrightCrawler fails with the error page.goto: net::ERR_EMPTY_RESPONSE.
This is very strange. Running the same site with Puppeteer also succeeds.
When I use Fiddler and point the proxy at 127.0.0.1:8888, PlaywrightCrawler succeeds again and the captured packets look normal.
This is a very strange phenomenon and I would like to know the reason.

Code sample

import { PlaywrightCrawler } from 'crawlee';
import * as playwright from 'playwright';

const url = 'https://www.hbhtcm.com';    // fail
// const url = 'https://www.baidu.com';  // success

// 1. use crawlee
const crawleeTest = new PlaywrightCrawler({

    useSessionPool: false,
    navigationTimeoutSecs: 120,
    requestHandlerTimeoutSecs: 300,
    maxRequestsPerCrawl: 50,

    launchContext: {
        // Launch options here are passed through to the Playwright launcher
        launchOptions: {
            // Other playwright options work as usual
            headless: true,
            channel: 'chrome',
            launcher: playwright.chromium,
            args: [
                '--disable-http2', // disable HTTP/2, force HTTP/1.1

                '--no-sandbox',
                '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
                '--disable-blink-features=AutomationControlled',
                "--ignore-certificate-errors",
                "--ignore-certificate-errors-spki-list",

            ]
        },
    },

    // ------------------------------------------
    preNavigationHooks: [
        async ({ page }) => {
            // set headers
            await page.setExtraHTTPHeaders({
                'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0',
                'sec-ch-ua': '"Microsoft Edge";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
                'sec-ch-ua-mobile': '?0',
                'sec-ch-ua-platform': '"Windows"',
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
                'Accept-Encoding': 'gzip, deflate, br, zstd',
                'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
                'Cache-Control': 'max-age=0',
                'Connection': 'keep-alive',
                'Sec-Fetch-Dest': 'document',
                'Sec-Fetch-Mode': 'navigate',
                'Sec-Fetch-Site': 'none',
                'Sec-Fetch-User': '?1',
                'Upgrade-Insecure-Requests': '1',
            });

            // set Cookie
            // await page.context().addCookies([
            //     { name: 'view', value: '1748198195', domain: 'www.hbhtcm.com', path: '/' },
            //     { name: 'PHPSESSID', value: 'aa5m36c3b87m38ofumijqilbvm', domain: 'www.hbhtcm.com', path: '/' },
            // ]);
        },
    ],
    // --------------------------------------------

    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);
        const title = await page.title()
        console.log("use PlaywrightCrawler message::", title);
    },

    // This function is called if the page processing failed more than maxRequestRetries+1 times.
    failedRequestHandler({ request, log }) {
        log.info(`Request ${request.url} failed too many times.`);
    },
});

await crawleeTest.addRequests([url]);

// ==================================================================
// 2. use Playwright 
const playwrightTest = async () => {

    const browser = await playwright.chromium.launch({
        headless: true,
    });

    const page = await browser.newPage();
    await page.goto(url);
    console.log('use Playwright message:', await page.title());
    await browser.close();
};

// Running results:
await Promise.all([crawleeTest.run(), playwrightTest()]);

Package version

[email protected] D:\crawleeRunVersion\notice_project\crawlee1.1.2 ├── @crawlee/[email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] ├── [email protected] └── [email protected]

Node.js version

Node v22.12.0

Operating system

Windows

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

@jeff1998-git jeff1998-git added the bug Something isn't working. label May 28, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label May 28, 2025
@barjin
Contributor

barjin commented May 29, 2025

Hello, and thank you for your interest in this project!

It seems that the network proxying layer in Crawlee (proxy-chain) exhibits different behaviour than a regular browser while loading the page (https://www.hbhtcm.com).

import { chromium } from 'playwright';
import { Server as ProxyChainServer } from 'proxy-chain';

const server = new ProxyChainServer({
    port: 0,
});
await server.listen();

const url = 'https://www.hbhtcm.com';

const browser = await chromium.launch({
    headless: false,
    proxy: {                                       // comment me out to make this "work"
        server: `http://127.0.0.1:${server.port}`, // comment me out to make this "work"
    }                                              // comment me out to make this "work"
});

const page = await browser.newPage();
await page.goto(url); // Fails with `page.goto: net::ERR_EMPTY_RESPONSE at https://www.hbhtcm.com/`
await browser.close();

By the way, locally, I didn't manage to load the page even without the proxy (vanilla Playwright). The page.goto call has timed out for me. Can you confirm that your Playwright instance can load the page contents (without Crawlee)?

This might be just those two libraries behaving differently on an empty HTTP response - while proxy-chain fails immediately, the browser might try to wait for the response for longer. If this is the case, it's IMO wontfix, as both behaviours seem reasonable.
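A minimal way to check this outside both libraries would be to probe the origin with Node's built-in https module and watch whether any response bytes arrive before the connection closes. This is only a rough diagnostic sketch (not something Crawlee or proxy-chain provides); the URL is the one from the report:

import https from 'node:https';

// Request the page directly (no browser, no proxy) and log whether the server
// sends any response at all or simply closes the connection.
const req = https.request('https://www.hbhtcm.com/', { method: 'GET' }, (res) => {
    console.log('status:', res.statusCode);
    console.log('headers:', res.headers);
    res.resume();
    res.on('end', () => console.log('response ended normally'));
});

req.on('error', (err) => {
    // A connection closed with zero response bytes typically surfaces here
    // (e.g. ECONNRESET), which the browser reports as net::ERR_EMPTY_RESPONSE.
    console.error('request failed:', err.code ?? err.message);
});

req.end();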

@jeff1998-git
Author

Hello, thank you for your reply. I reproduced your code: without the proxy it succeeds, and with the proxy it fails. However, if I start Fiddler first and point the proxy at Fiddler's working port 8888, the proxied run succeeds as well.

Likewise with the PlaywrightCrawler module: when Fiddler is used as the proxy, access succeeds; without the Fiddler proxy, it fails (a minimal Crawlee configuration for this is sketched after the snippet below).

I don't know where the problem lies.

const playwright = require('playwright');
const { Server: ProxyChainServer } = require('proxy-chain');

(async () => {
    // create a local proxy server
    const server = new ProxyChainServer({
        port: 0,
        prepareRequestFunction: ({ request }) => {
            return {
                upstreamProxyUrl: null,
            };
        },
    });

    await server.listen();
    console.log(`proxy running, port: ${server.port}`);

    const url = 'https://www.hbhtcm.com';

    try {
        // 1. launch the browser with the proxy
        const browser = await playwright.chromium.launch({
            headless: false,
            proxy: {
                server: `http://127.0.0.1:${server.port}`,
            },
        });

        const page = await browser.newPage();

        console.log(`accessing page: ${url}`);

        await page.goto(url, { timeout: 60000 });  // Fails with `page.goto: net::ERR_EMPTY_RESPONSE at https://www.hbhtcm.com/`

        console.log('page loaded successfully...');

        await page.waitForTimeout(5000);

        await browser.close();
    } catch (error) {
        console.error(`Navigation with proxy failed: ${error.message}`);

        // 2. try again --- launch the browser without the proxy
        // to determine whether the proxy is the cause
        try {
            console.log('trying without a proxy...');
            const directBrowser = await playwright.chromium.launch({ headless: false });
            const directPage = await directBrowser.newPage();
            await directPage.goto(url, { timeout: 60000 });
            console.log('Without a proxy: success...');   // output: Without a proxy: success...
            await directBrowser.close();
        } catch (directError) {
            console.error(`Without a proxy it fails too: ${directError.message}`);
            console.error('What is the problem....');
        }
    } finally {
        await server.close();
        console.log('proxy server closed');
    }
})();
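For completeness, here is a minimal sketch (my own addition, not code from the thread) of how the Fiddler workaround could be expressed with Crawlee itself. It assumes Fiddler is running on 127.0.0.1:8888 and that launchOptions are passed straight through to Playwright, so the browser talks to Fiddler directly instead of going through proxy-chain:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    launchContext: {
        launchOptions: {
            headless: true,
            // Route the browser through Fiddler directly, bypassing proxy-chain.
            proxy: { server: 'http://127.0.0.1:8888' },
            // Fiddler decrypts HTTPS with its own certificate, so ignore cert errors.
            args: ['--ignore-certificate-errors'],
        },
    },
    async requestHandler({ request, page, log }) {
        log.info(`Processed ${request.url}: ${await page.title()}`);
    },
});

await crawler.run(['https://www.hbhtcm.com']);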

@jeff1998-git
Author

Again, using the same code as above, I tested more websites. Results:

// const url = 'https://www.google.com';     // use proxy is fail    && no proxy is success  ||  Fiddler port->8888 success
// const url = 'https://www.youtube.com/';   // use proxy is fail    && no proxy is success  ||  Fiddler port->8888 success
// const url = 'https://www.baidu.com/';     // use proxy is success && no proxy is success  ||  Fiddler port->8888 success
// const url = 'https://www.163.com/'        // use proxy is success && no proxy is success  ||  Fiddler port->8888 success
// const url = 'https://www.bing.com/';      // use proxy is success && no proxy is success  ||  Fiddler port->8888 success
// const url = 'https://www.hbhtcm.com';     // use proxy is fail    && no proxy is success  ||  Fiddler port->8888 success

// Fiddler is listening on port 8888
// use port 8888, where the Fiddler software is listening
const server = new ProxyChainServer({
    port: 8888,
    prepareRequestFunction: ({ request }) => {
        return {
            upstreamProxyUrl: null,
        };
    },
});
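A rough sketch that automates this comparison (using only the playwright and proxy-chain packages already shown above; nothing here comes from Crawlee): each URL is loaded once through a fresh proxy-chain server and once directly, and the outcome is printed.

import { chromium } from 'playwright';
import { Server as ProxyChainServer } from 'proxy-chain';

const urls = [
    'https://www.google.com',
    'https://www.youtube.com/',
    'https://www.baidu.com/',
    'https://www.163.com/',
    'https://www.bing.com/',
    'https://www.hbhtcm.com',
];

const server = new ProxyChainServer({ port: 0 });
await server.listen();

// Load a URL in a fresh browser, optionally through the local proxy, and report the outcome.
const tryUrl = async (url, proxy) => {
    const browser = await chromium.launch(proxy ? { headless: true, proxy } : { headless: true });
    const page = await browser.newPage();
    try {
        await page.goto(url, { timeout: 60000 });
        return 'ok';
    } catch (err) {
        return `fail (${err.message.split('\n')[0]})`;
    } finally {
        await browser.close();
    }
};

for (const url of urls) {
    const viaProxy = await tryUrl(url, { server: `http://127.0.0.1:${server.port}` });
    const direct = await tryUrl(url);
    console.log(`${url}  via proxy-chain: ${viaProxy}  direct: ${direct}`);
}

await server.close();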

With plain Playwright, Fiddler can capture the data packets, but when the proxy-chain module is used, Fiddler cannot capture anything.
Fiddler also cannot capture the packets when the PlaywrightCrawler module is used. I suspect PlaywrightCrawler wraps proxy-chain internally, which would explain why Fiddler cannot see PlaywrightCrawler's traffic.

It can be inferred that this kind of proxy misses something when handling certain responses, causing page.goto: net::ERR_EMPTY_RESPONSE. However, when the proxy is pointed at Fiddler's working port 8888 and the traffic passes through Fiddler, Fiddler apparently compensates for whatever was missed and everything works normally again.

Although I don't know what processing the proxy module actually does, these tests show that whatever proxy-chain fails to handle properly is handled correctly once the traffic passes through Fiddler.

It may also be due to differences in the network. If PlaywrightCrawler's traffic could be inspected through an intermediary, something more might be discovered.
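One more experiment that might narrow this down (again only a sketch assembled from the snippets above, with Fiddler assumed to be on 127.0.0.1:8888): keep proxy-chain in the path but forward everything to Fiddler as an upstream proxy via upstreamProxyUrl. If this succeeds while the plain proxy-chain run fails, the difference lies in how proxy-chain talks to the origin rather than in the browser.

import { chromium } from 'playwright';
import { Server as ProxyChainServer } from 'proxy-chain';

const server = new ProxyChainServer({
    port: 0,
    prepareRequestFunction: ({ request }) => {
        // Log what proxy-chain sees and forward the tunnel to Fiddler.
        console.log('proxying', request.url);
        return { upstreamProxyUrl: 'http://127.0.0.1:8888' };
    },
});
await server.listen();

const browser = await chromium.launch({
    headless: false,
    proxy: { server: `http://127.0.0.1:${server.port}` },
});
const page = await browser.newPage();
await page.goto('https://www.hbhtcm.com', { timeout: 60000 });
console.log('title:', await page.title());
await browser.close();
await server.close();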
