Problem with the crawler

Post by irnerio » Wed Nov 08, 2017 7:50 pm

I'm having the following problem with some of the websites listed in my index:

https://www.sitename.com error response: java.io.IOException: Client can't execute: Connection reset duration=1411 for url https://www.sitename.com/ robots exist: crawl allowed 500 ms

So the site can't be indexed.

How can I solve this problem? Is it a known issue?

Kind regards.

Mario
irnerio
Posts: 17
Registered: Fri Mar 17, 2017 9:03 pm

Re: Problem with the crawler

Post by luc » Fri Nov 10, 2017 8:28 am

Hi Mario,
Can you provide one or more sample URLs of the problematic websites, so that we can try to reproduce your error and check what is going wrong?

Best regards
Luc
luc
Posts: 305
Registered: Wed Aug 26, 2015 1:04 am

Re: Problem with the crawler

Post by irnerio » Fri Nov 10, 2017 9:32 am

Hi Luc!

https://www.sabatino.pro

URL: https://www.sabatino.pro/
Access: error response: java.io.IOException: Client can't execute: Connection reset duration=1849 for url https://www.sabatino.pro/
Robots: robots exist: crawl allowed
Crawl-Delay: 500 ms
Sitemap: []

Thanks

Mario

Re: Problem with the crawler

Post by luc » Sat Nov 11, 2017 8:08 pm

OK, I tried to start a crawl with depth 1 on this website with YaCy peers of version 1.92/9000 and 1.921/9447... and it worked fine.

Do you use specific crawler settings? Do you also get an error when checking the failing pages with the YaCy URL Viewer (/ViewFile.html)?

Re: Problem with the crawler

Post by irnerio » Sun Nov 19, 2017 9:28 pm

No specific settings. Crawling depth 3 (but it still didn't work with 1).

This is the response from the URL Viewer:

Unable to download resource content.

error loading resource: java.io.IOException: Client can't execute: Connection reset duration=1450 for url https://www.sabatino.pro/

I've also attached a screenshot.

Attachment: irnerio-errore.JPG (38.86 KiB)

Re: Problem with the crawler

Post by luc » Wed Nov 22, 2017 8:09 am

OK, the error message is rather generic. I guess it could be an issue with your hardware network connection (but probably not, as your crawl works fine with other websites), or possibly this particular host is rejecting requests from your YaCy peer for some reason.
Does this website respond without error when you request it from the same computer on which you run YaCy, but with other tools such as a browser, curl, wget...?

Re: Problem with the crawler

Post by irnerio » Fri Nov 24, 2017 2:41 pm

Yes, the website is reachable from the server where YaCy is installed. Could it be a problem with SSL? www.sabatino.pro is my website. The problem started after I changed the SSL certificate. Do you think it's only a coincidence?

Re: Problem with the crawler

Post by luc » Mon Nov 27, 2017 10:14 am

Yes, it would be strange for that to be only a coincidence... But I have "good" news: I could reproduce exactly the same issue when running YaCy 1.92/9000 on an Oracle JVM jdk1.7.0_80. I have not yet had time to check in depth what exactly is going wrong, but at least I can assure you that everything works fine with the same YaCy release on a recent 1.8 JVM (OpenJDK or Oracle 1.8.0_151), so I suggest you upgrade at least your Java version.
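
To see which Java version a JVM actually reports, something like the following could be used. This is only a rough sketch: the class name JavaVersionCheck and the helper majorVersion are made up for this example and are not part of YaCy. It assumes the usual version string layouts ("1.7.0_80", "1.8.0_151", "11.0.2"), and it is only meaningful when run with the same java binary that starts YaCy.

```java
final class JavaVersionCheck {

    /**
     * Extracts the major Java version from a "java.version" style string.
     * Old layout: "1.7.0_80" -> 7, "1.8.0_151" -> 8.
     * New layout (Java 9 and later): "11.0.2" -> 11, "17" -> 17.
     */
    static int majorVersion(String version) {
        // Split on the separators that can appear in version strings.
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            // Old "1.x" layout: the second component is the major version.
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        int major = majorVersion(version);
        System.out.println("java.version = " + version + " (major " + major + ")");
        if (major < 8) {
            System.out.println("This JVM is older than Java 1.8; HTTPS crawls may fail as described in this thread.");
        } else {
            System.out.println("This JVM is Java 1.8 or newer.");
        }
    }
}
```

If the reported major version is 7 or lower, the upgrade suggested above is the first thing to try.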

Re: Problem with the crawler

Post by luc » Mon Nov 27, 2017 9:36 pm

A few more details: your website appears to be configured to use TLSv1.2. By default, TLSv1.1 and TLSv1.2 are disabled for client connections in JDK 1.7, while JDK 1.8 uses TLSv1.2 by default. I tried to enable TLSv1.2 in my JDK 1.7 installation using the control panel, but I still could not crawl your website with a YaCy peer running on that Java version.
So this is one more good reason to upgrade to Java 1.8.
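
To check which TLS protocol versions a given JVM enables by default, as opposed to the ones it merely supports, a small program along these lines might help. It is a sketch under assumptions: the class TlsProtocolCheck and its helper methods are hypothetical names for this example, not YaCy code, and it only inspects the JVM's default SSLContext, which is not necessarily configured the same way as the HTTP client YaCy uses internally.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;

final class TlsProtocolCheck {

    /** Protocols enabled by default on the JVM's default SSLContext. */
    static String[] defaultProtocols() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters().getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Protocols the JVM could use if they were explicitly enabled. */
    static String[] supportedProtocols() {
        try {
            return SSLContext.getDefault().getSupportedSSLParameters().getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Returns true if the given protocol name appears in the list. */
    static boolean contains(String[] protocols, String name) {
        return Arrays.asList(protocols).contains(name);
    }

    public static void main(String[] args) {
        String[] enabled = defaultProtocols();
        String[] supported = supportedProtocols();
        System.out.println("Enabled by default: " + Arrays.toString(enabled));
        System.out.println("Supported:          " + Arrays.toString(supported));
        System.out.println("TLSv1.2 enabled by default: " + contains(enabled, "TLSv1.2"));
    }
}
```

On a JVM where TLSv1.2 shows up under "Supported" but not under "Enabled by default", a "Connection reset" against a server that expects TLSv1.2 would be consistent with what is described above.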

Re: Problem with the crawler

Post by irnerio » Tue Nov 28, 2017 9:45 pm

Ok. Thanks. I'll make the update. Will let you know. Kind regards.

Mario

Re: Problem with the crawler

Post by irnerio » Tue Nov 28, 2017 10:43 pm

Dear Luc, it worked! I updated to Java 1.8.0_151 and everything works fine.

Thanks again

Mario