Can't index URL with 301 redirection

Post by mlagneaux » Tue Sep 13, 2016 9:45 am

Hello,

I'm using YaCy to add search functionality to a static web site. I've added a search form to my site's home page, which triggers a jQuery AJAX request to my local YaCy server. The static site has been indexed by this YaCy server.

I have a problem with the crawler. Indexing starts from my root URL, let's say http://my.domain.com/. This page references other pages, which are not indexed. For example, my home page has a link that points to http://my.domain.com/documentation. This page is not indexed.

In fact, when I access this page, I'm redirected to http://my.domain.com/documentation/ (with a trailing slash). The crawler doesn't seem to handle this case. In the log, I found this:
I 2016/09/13 10:40:24 REJECTED http://my.domain.com/documentation/ - cannot load: load error - java.io.IOException: CRAWLER Redirect of URL=http://my.domain.com/documentation to http://my.domain.com/documentation/ placed on crawler queue for double-check
I 2016/09/13 10:40:24 LOADER CRAWLER ..Redirecting request to: http://my.domain.com/documentation/
I 2016/09/13 10:40:24 LOADER CRAWLER Redirection detected ('HTTP/1.1 301 Moved Permanently') for URL http://my.domain.com/documentation
I 2016/09/13 10:40:24 LoaderDispatcher waited 5002 ms for http://my.domain.com/documentation
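To confirm what the crawler is seeing, the redirect can be reproduced outside YaCy. The sketch below uses only the Python standard library; `probe_redirect` is a hypothetical helper, not part of YaCy. It sends a HEAD request without following redirects and reports the status code plus the absolute `Location` target, assuming the server answers HEAD the same way it answers GET.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit


def probe_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request for `url` without following redirects.

    Returns the HTTP status code and, if the server sent a Location
    header, the absolute URL it points to (otherwise None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)

    # Request target: the path (default "/") plus the query string, if any.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        status = response.status
        location = response.getheader("Location")
    finally:
        conn.close()

    # Location may be relative (e.g. "/documentation/"), so resolve it
    # against the URL that was requested.
    absolute = urljoin(url, location) if location else None
    return status, absolute
```

If the server behaves as the log above shows, `probe_redirect("http://my.domain.com/documentation")` would be expected to return `(301, "http://my.domain.com/documentation/")`. Many web servers (Apache's `DirectorySlash` behaviour, Python's own `http.server`) send exactly this 301 when a directory is requested without a trailing slash.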

Is there a way to index that kind of page? A crawler parameter, for example?

Thanks in advance for your help.
mlagneaux
 
Posts: 5
Joined: Tue Sep 13, 2016 8:57 am

Re: Can't index URL with 301 redirection

Post by luc » Wed Sep 14, 2016 8:13 am

Hi, I also experienced some issues when crawling redirected pages (http://mantis.tokeek.de/view.php?id=636).

Maybe I am wrong, but in my opinion, until some fixes are applied, your best option (if you don't have too many failing redirected pages) is probably to recrawl the target URLs directly...
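Recrawling the target URLs by hand can be scripted when there are more than a few of them. A minimal sketch, assuming you already have a list of the failing links and a `probe` function (hypothetical; for example one that sends a HEAD request without following redirects) that returns the status code and absolute `Location` for a URL. It follows each redirect chain to its end and drops duplicates, so `/documentation` and `/documentation/` yield a single URL to crawl.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical probe signature: URL -> (HTTP status, absolute Location or None).
Probe = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def final_target(url: str, probe: Probe, max_hops: int = 5) -> str:
    """Follow redirects via `probe` and return the URL that serves content."""
    seen = {url}
    current = url
    # Up to max_hops redirects, plus one probe to confirm the last URL is final.
    for _ in range(max_hops + 1):
        status, location = probe(current)
        if status not in REDIRECT_CODES or location is None:
            return current
        if location in seen:
            raise ValueError(f"redirect loop at {location}")
        seen.add(location)
        current = location
    raise ValueError(f"more than {max_hops} redirects starting from {url}")


def crawl_targets(urls: Iterable[str], probe: Probe) -> List[str]:
    """Resolve every URL to its final target, keeping first-seen order."""
    targets: List[str] = []
    for url in urls:
        target = final_target(url, probe)
        if target not in targets:
            targets.append(target)
    return targets
```

Passing the probe in as a parameter keeps the redirect-following logic independent of how the HTTP request is made, and makes it easy to check with a dictionary of canned responses. The resulting list could then be used as start points for a new crawl.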
luc
 
Posts: 230
Joined: Wed Aug 26, 2015 1:04 am

Re: Can't index URL with 301 redirection

Post by mlagneaux » Thu Sep 15, 2016 9:34 am

That's what I've done :-)
Thank you for your answer.

Mickaël
mlagneaux
 
Posts: 5
Joined: Tue Sep 13, 2016 8:57 am
