How to import a lot of URLs

Post by roel912 » Sun Feb 21, 2016 9:25 pm

Hello, how can I import a list with a lot of URLs (>2 million) into YaCy? Thanks for your reply.

Re: How to import a lot of URLs

Post by luc » Mon Feb 22, 2016 8:52 am

Hello, you can use the Advanced Crawler page (/CrawlStartExpert.html): select the "From File" start point and paste the URL of a file containing your URL list (one URL per line).
Be aware that the whole file content will be loaded into memory, so make sure enough free memory is available for YaCy: check the file size, and check the free memory at /PerformanceMemory_p.html ("Now before GC" column).
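
To make the file size check above concrete, here is a minimal Python sketch that reports the size on disk and the number of non-empty lines of a one-URL-per-line file. The function name summarize_url_file is only an illustration and is not part of YaCy; the in-memory footprint inside YaCy may well be larger than the size on disk, so treat the result as a rough lower bound to compare against the free memory shown on /PerformanceMemory_p.html.

```python
import os


def summarize_url_file(path):
    """Return (size_in_bytes, url_count) for a one-URL-per-line text file.

    Hypothetical helper for illustration; it does not talk to YaCy at all.
    """
    size_bytes = os.path.getsize(path)
    url_count = 0
    # Read line by line so the check itself needs very little memory.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                url_count += 1
    return size_bytes, url_count
```

For example, summarize_url_file("urls.txt") returns the byte size and URL count of a local file named urls.txt (the file name is an assumption).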

Re: How to import a lot of URLs

Post by smokingwheels » Wed Mar 09, 2016 9:13 am

2 million URLs, wow.
I think that is a bit much in one hit. Why don't you try splitting the main file into smaller ones?
I have a program that runs in QB64 to do that, so you could try reducing the number of URLs per go.

https://github.com/smokingwheels/loklak_split/blob/master/split_linux.bas

Instructions on how to install QB64 in Linux
http://smokingwheels.mnsnet.ca/forum/topic.asp?TOPIC_ID=93

Windows: http://www.qb64.net/

It will run faster on QuickBasic 4.5 in Windows, but without long file names.
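
If you would rather not install QB64, the same splitting idea can be sketched in Python. This is not a port of the program linked above, just a rough illustration: the function name split_url_file, the output naming scheme and the default of 100,000 URLs per chunk are all assumptions, so pick a chunk size that fits your free memory.

```python
import os


def split_url_file(path, urls_per_chunk=100_000, out_dir="."):
    """Split a one-URL-per-line file into numbered chunk files.

    Hypothetical helper for illustration. Returns the list of chunk paths.
    """
    if urls_per_chunk < 1:
        raise ValueError("urls_per_chunk must be at least 1")
    base = os.path.splitext(os.path.basename(path))[0]
    chunk_paths = []
    out = None
    written = 0
    with open(path, "r", encoding="utf-8", errors="replace") as src:
        for line in src:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            # Open a new chunk file at the start and whenever the current one is full.
            if out is None or written == urls_per_chunk:
                if out is not None:
                    out.close()
                name = f"{base}_part{len(chunk_paths) + 1:04d}.txt"
                chunk_path = os.path.join(out_dir, name)
                out = open(chunk_path, "w", encoding="utf-8")
                chunk_paths.append(chunk_path)
                written = 0
            out.write(url + "\n")
            written += 1
    if out is not None:
        out.close()
    return chunk_paths
```

Each resulting chunk file can then be used as a separate "From File" start point, one crawl start at a time.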

