Crawling, exporting data, recrawling failures, 404 errors...



Post by Chaoticum » Wed Oct 08, 2014 4:28 pm

Hello everybody,

I have a few questions, mostly about crawling websites.

1. Is it possible to just (pre)crawl a website? What I need is to quickly get a list of all URLs (even 404s, external links, etc.) with all nodes and edges. Some of you might be familiar with tools like Xenu or Screaming Frog, and that's also what I need: just a list of URLs, which I will then process for several purposes. In the meantime I of course also want to index the website. Xenu, which I mentioned, is quite unstable and constantly crashes.
2. Is there a way to export data from YaCy? For example, to export the list of URLs from a selected indexed website. Or is there a file on my hard drive where this data is stored? I've read something about SQL databases, but I couldn't find them.
3. Is there a way to get a list of 404s and 301 redirects from an indexed website?
4. How about recrawling only failed URLs?
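To make clearer what I'm after with points 3 and 4: once I can export a plain URL list, I plan to post-process it roughly like this (my own sketch, nothing YaCy-specific; the function and bucket names are just my working assumptions):

```python
# Rough sketch of my planned post-processing of an exported URL list:
# fetch each URL's status code, then partition into ok / broken / failed
# so the "failed" bucket can be fed back into a recrawl.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def check_status(url, timeout=10):
    """Return the HTTP status code for a URL, or None on network errors."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code          # e.g. 404, 500
    except URLError:
        return None            # DNS failure, connection refused, timeout...

def partition_by_status(status_by_url):
    """Group URLs into ok / broken / failed buckets from a {url: status} map."""
    buckets = {"ok": [], "broken": [], "failed": []}
    for url, status in status_by_url.items():
        if status is None:
            buckets["failed"].append(url)   # candidates for recrawling
        elif status >= 400:
            buckets["broken"].append(url)   # 404s and other client/server errors
        else:
            buckets["ok"].append(url)
    return buckets
```

One caveat I'm aware of: `urlopen` follows redirects automatically, so spotting 301s would need a custom redirect handler; for now I mainly care about separating dead and unreachable URLs.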

I'll be glad for any tips or ideas. I'm quite a new YaCy user and I'm really excited about it. It has a lot of potential for SEO.
Posts: 3
Registered: Wed Oct 08, 2014 3:52 pm

