Problem with the crawler

Posted: Wed Nov 08, 2017 7:50 pm
by irnerio
I'm having the following problem with some of the websites listed in my index:

https://www.sitename.com error response: java.io.IOException: Client can't execute: Connection reset duration=1411 for url https://www.sitename.com/ robots exist: crawl allowed 500 ms

So the site can't be indexed.

How can I solve this problem? Is it a known issue?

Kind regards.

Mario

Re: Problem with the crawler

Posted: Fri Nov 10, 2017 8:28 am
by luc
Hi Mario,
can you provide one or more sample URLs of the problematic websites, so that we can try to reproduce the error and check what is going wrong?

Best regards
Luc

Re: Problem with the crawler

Posted: Fri Nov 10, 2017 9:32 am
by irnerio
Hi Luc!

https://www.sabatino.pro

URL: https://www.sabatino.pro/
Access: error response: java.io.IOException: Client can't execute: Connection reset duration=1849 for url https://www.sabatino.pro/
Robots: robots exist: crawl allowed
Crawl-Delay: 500 ms
Sitemap: []

Thanks

Mario

Re: Problem with the crawler

Posted: Sat Nov 11, 2017 8:08 pm
by luc
Ok, I tried to start a crawl with depth 1 on this website with YaCy peers of version 1.92/9000 and 1.921/9447 ... and it worked fine.

Do you use specific crawler settings? Do you also have an error when checking the failing pages with YaCy URL Viewer (/ViewFile.html)?

Re: Problem with the crawler

Posted: Sun Nov 19, 2017 9:28 pm
by irnerio
No specific settings. Crawl depth 3 (it didn't work with depth 1 either).

This is the response of the URL Viewer:

Unable to download resource content.

error loading resource: java.io.IOException: Client can't execute: Connection reset duration=1450 for url https://www.sabatino.pro/

I've also attached a screenshot.

luc wrote: Ok, I tried to start a crawl with depth 1 on this website with YaCy peers of version 1.92/9000 and 1.921/9447 ... and it worked fine.

Do you use specific crawler settings? Do you also have an error when checking the failing pages with YaCy URL Viewer (/ViewFile.html)?

Re: Problem with the crawler

Posted: Wed Nov 22, 2017 8:09 am
by luc
Ok, the error message is rather generic. I guess it could be an issue with your network connection or hardware (though probably not, since your crawl works fine with other websites), or possibly this particular host is rejecting requests from your YaCy peer for some reason.
Does this website answer without error when you request it from the same computer on which you run YaCy, but with other tools such as a browser, curl, wget...?

Re: Problem with the crawler

Posted: Fri Nov 24, 2017 2:41 pm
by irnerio
Yes, the website is reachable from the server where YaCy is installed. Maybe it's a problem with SSL? www.sabatino.pro is my website. The problem started after I changed the SSL certificate. Do you think it's only a coincidence?

Re: Problem with the crawler

Posted: Mon Nov 27, 2017 10:14 am
by luc
Yes, it would be strange if it were only a coincidence... But I have "good" news: I could reproduce the exact same issue when running YaCy 1.92/9000 on an Oracle JVM (jdk1.7.0_80). I have not yet had time to check in depth what exactly is going wrong, but I can at least assure you that everything works fine with the same YaCy release on a recent 1.8 JVM (OpenJDK or Oracle 1.8.0_151), so I suggest that you at least upgrade your Java version.

Re: Problem with the crawler

Posted: Mon Nov 27, 2017 9:36 pm
by luc
A few more details: your website appears to be configured to use TLSv1.2. By default, TLSv1.1 and TLSv1.2 are disabled for client connections in JDK 1.7, while JDK 1.8 uses TLSv1.2 by default. I tried enabling TLSv1.2 in my JDK 1.7 install using the control panel, but I still could not crawl your website with a YaCy peer running on that Java version.
So this is one more good reason to upgrade to Java 1.8.
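
For anyone who wants to check this on their own machine, here is a minimal sketch that prints which TLS protocol versions the running JVM enables by default and which ones it merely supports. This is not YaCy code: the class name TlsProtocolCheck and its two helper methods are made up for illustration, and it relies only on the standard javax.net.ssl.SSLContext API. Run it with the same Java installation that runs YaCy.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper class for illustration, not part of YaCy.
final class TlsProtocolCheck {

    // Protocols enabled by default on the JVM's default SSL context.
    static List<String> defaultProtocols() {
        return Arrays.asList(defaultContext().getDefaultSSLParameters().getProtocols());
    }

    // Protocols the JVM can use if they are explicitly enabled.
    static List<String> supportedProtocols() {
        return Arrays.asList(defaultContext().getSupportedSSLParameters().getProtocols());
    }

    // Wraps the checked exception so callers do not have to declare it.
    private static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version:       " + System.getProperty("java.version"));
        System.out.println("Enabled by default: " + defaultProtocols());
        System.out.println("Supported:          " + supportedProtocols());
    }
}
```

If TLSv1.2 shows up only in the supported list and not in the default one, the JVM will not offer it unless it is explicitly enabled, which would be consistent with the JDK 1.7 behaviour described above.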

Re: Problem with the crawler

Posted: Tue Nov 28, 2017 9:45 pm
by irnerio
Ok, thanks. I'll do the update and let you know. Kind regards.

Mario

Re: Problem with the crawler

Posted: Tue Nov 28, 2017 10:43 pm
by irnerio
Dear Luc, it worked! I updated to Java 1.8.0_151 and everything works fine.

Thx again

Mario