
Crawling, exporting data, recrawling failures, 404 errors...

Posted: Wed Oct 08, 2014 4:28 pm
by Chaoticum
Hello everybody,

I have a few questions, mostly about crawling websites.

1. Is it possible to just (pre)crawl a website? What I need is to quickly get a list of all URLs (even 404s, external links, etc.) with all nodes and edges. Some of you might be familiar with tools like Xenu or Screaming Frog, and that's essentially what I need: just a list of URLs, which I will then process for several purposes. Of course I also want to index the website in the meantime. Xenu, which I mentioned, is quite unstable and crashes constantly.
2. Is there a way to export data from YaCy, for example a list of URLs from a selected indexed website? Or is there a file on my hard drive where this data is stored? I've read something about SQL databases but couldn't find them.
3. Is there a way to get a list of 404s and 301 redirects from an indexed website?
4. How about recrawling only failed URLs?
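To make question 1 more concrete: in the meantime I hacked together a small stand-alone script that does roughly what I want from Xenu, so you can see what I'm after. It's just a sketch (start URL, depth limit, and same-host rule are my own placeholder choices, not anything from YaCy):

```python
# Rough sketch: collect URLs (including broken ones) from a site,
# similar to the plain URL list Xenu produces.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from collections import deque

class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Return absolute URLs for every <a href> in the page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, link) for link in parser.links]

def crawl(start_url, max_pages=100):
    """BFS crawl; returns {url: status code}, 404s and external links included."""
    seen, results = set(), {}
    queue = deque([start_url])
    host = urlparse(start_url).netloc
    while queue and len(results) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            with urlopen(url, timeout=10) as resp:
                results[url] = resp.status
                # only follow links on the start host; externals are
                # still fetched once so their status gets recorded
                if urlparse(url).netloc == host:
                    body = resp.read().decode("utf-8", "replace")
                    queue.extend(extract_links(url, body))
        except HTTPError as e:
            results[url] = e.code   # 404s and other HTTP errors land here
        except URLError:
            results[url] = None     # DNS or connection failure
    return results
```

Something like `crawl("http://example.com/")` then gives me a dict I can dump to CSV, but obviously I'd rather get this straight out of YaCy.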
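And for question 2, this is roughly what I imagine the export looking like. I'm only guessing that YaCy exposes a `/yacysearch.json` endpoint on the local peer and that the response looks like `{"channels": [{"items": [{"link": ...}]}]}` — please correct me if the endpoint or the JSON shape is different:

```python
# Guessed sketch of exporting a URL list from a local YaCy peer via its
# search API. Endpoint name, parameters, and response shape are my
# assumptions, not confirmed from the YaCy docs.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def parse_links(raw_json):
    """Pull the 'link' field out of each result item in the response."""
    data = json.loads(raw_json)
    return [item["link"]
            for channel in data.get("channels", [])
            for item in channel.get("items", [])]

def export_urls(peer="http://localhost:8090", query="site:example.com", count=100):
    """Query the peer and return the matching indexed URLs."""
    params = urlencode({"query": query, "maximumRecords": count})
    with urlopen(f"{peer}/yacysearch.json?{params}", timeout=30) as resp:
        return parse_links(resp.read().decode("utf-8"))
```

If there's a proper export function (or a file on disk I can read directly), that would of course be much better than paging through search results like this.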

I'll be glad for any tips or ideas. I'm a fairly new YaCy user and I'm really excited about it; it has a lot of potential for SEO.