scraper cannot load URL: java.io.IOException: Download exce


Post by xioc752 » Wed Jan 14, 2015 9:13 pm

Hi, we need to scrape an HTML page that is, frankly, 53+ megabytes in size and full of links that we need to rescue from a sick server and reload onto a new server.
It is an HTML file saved from inside YaCy.

Crawling of "http://IP.Address/filename.html " failed. Reason: scraper cannot load URL: java.io.IOException: Download exceeded maximum value of 10485760 bytes/

How and where do we remove this limit, please?
Many thanks
xioc752
 
Posts: 68
Joined: Mon Jul 28, 2014 5:01 pm

Re: scraper cannot load URL: java.io.IOException: Download e

Post by Orbiter » Mon Feb 02, 2015 11:57 am

That's easy: open http://localhost:8090/Settings_p.html?page=crawler
and set a new value under HTTP Crawler Settings.
Orbiter
 
Posts: 5778
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main
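
For reference, the number in the error message (10485760 bytes) is exactly 10 MiB. A minimal sketch of the arithmetic for picking a new value, assuming the setting is entered in bytes as the error message suggests; `mib_to_bytes` is a hypothetical helper for illustration and not part of YaCy:

```python
MIB = 1024 * 1024  # number of bytes in one mebibyte


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB to bytes (hypothetical helper, not a YaCy API)."""
    return mib * MIB


# Value quoted in the "Download exceeded maximum value" error message.
default_limit = 10485760

print(default_limit // MIB)  # the default limit expressed in MiB
print(mib_to_bytes(53))      # lower bound for a page of 53 MiB
print(mib_to_bytes(64))      # the same with some headroom (assumed figure)
```

Since the page is "53+" megabytes, a value somewhat above 55574528 is the safer choice; the 64 MiB figure here is only an example.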

Re: scraper cannot load URL: java.io.IOException: Download e

Post by xioc752 » Mon Feb 02, 2015 6:14 pm

Thank you!
I have used the -1 setting for unlimited.
These are all cloud servers.
I am processing the page via /CrawlStartExpert.html.
I will let you know if it fails to load.
I know it is important that the circular indicator turns into a green check mark, which shows that the remote page has been fully loaded.
I imagine that this will take a long time.
Thanks again!
:D
xioc752
 

Re: scraper cannot load URL: java.io.IOException: Download e

Post by xioc752 » Mon Feb 02, 2015 7:20 pm

-1 did not work on one of our quiet servers.
Crawling of "http://0.000.00.00/123456789.html " failed. Reason: scraper cannot load URL: java.io.IOException: java.lang.OutOfMemoryError: Java heap space/

Any further thoughts, please?
Many thanks
xioc752
 

Re: scraper cannot load URL: java.io.IOException: Download e

Post by Orbiter » Tue Feb 03, 2015 11:37 am

Did you read the words "java.lang.OutOfMemoryError: Java heap space" to understand the meaning of that or did you just do a copy-paste?

?

The limit on the HTML file size is there for a reason: to protect people from exactly that error message (which is not an error at all!).

There is an obvious answer to your question, and I will leave it to you as an exercise: think! What would you need to do?
Orbiter
 

Re: scraper cannot load URL: java.io.IOException: Download e

Post by xioc752 » Tue Feb 03, 2015 3:33 pm

But of course, my dear fellow... it is obvious.
Thanks for the fast reply. Homework assignment 'done.'
Of course, it is not an error, per se.
...
However, the real question #1 is, of course: how much extra memory is "enough" to process an external, cloud-hosted HTML file that large (53+ megabytes, generated by a previous but sick YaCy server) if we use /CrawlStartExpert.html as the entry point on a new and healthy server?

Question #2, please: apart from the underlying cloud platform, where inside the healthy replacement YaCy is that extra memory applied, so that it ends up where it is needed?
Somewhere in the crawler is surely the correct place, and possibly a mix of settings is necessary for a cloud environment.
I am sure someone knows the current 'best practices' guidance on this. ...ha ha... Many, many thanks :)
xioc752
 

Re: scraper cannot load URL: java.io.IOException: Download e

Post by Orbiter » Tue Feb 03, 2015 5:13 pm

I don't know how much memory is needed, but I recommend splitting the 53 MB file into pieces and importing them separately, step by step. I cannot guess a good size for the pieces either; you will have to experiment.
Orbiter
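
The splitting suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the saved page is plain HTML whose links are ordinary `<a href="...">` tags, and the names `LinkCollector`, `extract_links`, `write_pieces`, and the `links_per_piece` default are all hypothetical. The source file is read in blocks so that only the list of links, not the whole 53 MB document, is held in memory.

```python
import html
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(path, block_size=1 << 20):
    """Read the file in blocks and return all link targets in order."""
    parser = LinkCollector()
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        while True:
            block = fh.read(block_size)
            if not block:
                break
            parser.feed(block)  # the parser buffers tags cut at block edges
    parser.close()
    return parser.links


def write_pieces(links, out_dir, links_per_piece=5000):
    """Write the links into small HTML files; the piece size is a guess."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pieces = []
    for start in range(0, len(links), links_per_piece):
        piece = out / f"piece_{start // links_per_piece:04d}.html"
        rows = "\n".join(
            f'<a href="{html.escape(url, quote=True)}">{html.escape(url)}</a><br>'
            for url in links[start:start + links_per_piece]
        )
        piece.write_text(f"<html><body>\n{rows}\n</body></html>\n", encoding="utf-8")
        pieces.append(piece)
    return pieces
```

Each resulting piece would still have to be placed somewhere the new server can reach and then crawled one at a time; how many links per piece works in practice has to be found by trial, as noted above.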
 

Re: scraper cannot load URL: java.io.IOException: Download e

Post by xioc752 » Tue Feb 03, 2015 5:30 pm

Seems eminently reasonable!
Many thanks...
xioc752
 

