
The YaCy Grid

Posted: Wed Mar 29, 2017 9:58 am
by Orbiter
I'm currently working hard on a YaCy/2, now called "YaCy Grid".
As a first step, the main idea is to make this a large-scale search appliance.
In a second step, we can do two things: replace the old code parts in "Legacy YaCy" with the grid elements, and turn the YaCy Grid into a peer-to-peer architecture (again).
YaCy Grid is therefore a 'professional YaCy', with the vision that it stays a modern piece of software that may power the next-generation p2p search.

I posted a milestone plan and an architecture picture here:
https://twitter.com/yacy_search/status/ ... 1844357120
[Image]

Re: The YaCy Grid

Posted: Wed Mar 29, 2017 10:02 am
by Orbiter
"Legacy YaCy" (YaCy/1) will benefit from milestone 2: we will get a WARC parser which produces elasticsearch-like JSON index files, and YaCy will get a surrogate parser to read those files.
Then it will be easy to use crawlers outside of YaCy, for example wget:
Code:
wget "http://yacy.net" --warc-file="yacy"

...will generate a WARC file which YaCy/1 can then index using the Grid Parser.
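
To make the WARC-to-JSON step more concrete, here is a minimal sketch in Python of what such a conversion could look like. It is not the Grid Parser: the function names and the document fields `url`, `date` and `text` are made up for illustration, the output only imitates the Elasticsearch bulk layout (one action line, then one document line), and it expects an already uncompressed WARC stream.

```python
import io
import json


def read_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("not a WARC record header: %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def to_bulk_lines(stream):
    """Turn 'response' records into bulk-style line pairs (hypothetical fields)."""
    for headers, payload in read_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue  # skip warcinfo, request and metadata records
        # The payload is the raw HTTP response: status line, headers, blank line, body.
        _, _, body = payload.partition(b"\r\n\r\n")
        doc = {
            "url": headers.get("warc-target-uri", ""),
            "date": headers.get("warc-date", ""),
            "text": body.decode("utf-8", "replace"),
        }
        yield json.dumps({"index": {}})
        yield json.dumps(doc)
```

Note that wget compresses its WARC output by default, so the file from the command above would have to be opened with `gzip.open(path, "rb")` (or written with `--no-warc-compression`) before passing it to `to_bulk_lines`. A real parser would also extract text from the HTML instead of storing the raw body.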

Re: The YaCy Grid

Posted: Fri Mar 31, 2017 12:50 am
by reger
Oops,

I was looking into a WARC importer in parallel and read your post too late; see commit https://github.com/yacy/yacy_search_ser ... fd248d51f3

P.S. I looked at your grid prototype. I haven't grasped all the communication details so far, but I was a little surprised by the prerequisites (rabbit & ftp), which currently have no way around them.
At least for the FTP part, I embedded Apache FtpServer for my first tests (https://mina.apache.org/ftpserver-proje ... erver.html). Maybe something to consider.

Re: The YaCy Grid

Posted: Sat Apr 01, 2017 12:37 am
by Orbiter
Great work with the WARC importer!
reger wrote: prerequisite (rabbit & ftp) currently without a way around/out,

Well, actually, if the MCP does not find an FTP service, it will host the files itself. The same goes for the queue: if there is no RabbitMQ, it will handle queues with a poor man's queue implementation using an embedded MapDB.
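
The fallback idea can be sketched roughly like this. This is only an illustration in Python, not the MCP code: `LocalQueue`, `open_queue` and `connect_broker` are hypothetical names, and SQLite stands in here for the embedded MapDB.

```python
import sqlite3


class LocalQueue:
    """Fallback FIFO queue persisted in SQLite (stand-in for an embedded MapDB)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS q "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, body BLOB)"
        )

    def send(self, body):
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO q (body) VALUES (?)", (body,))

    def receive(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        with self.db:
            row = self.db.execute(
                "SELECT id, body FROM q ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.db.execute("DELETE FROM q WHERE id = ?", (row[0],))
            return row[1]


def open_queue(connect_broker, local_path=":memory:"):
    """Use the external broker if it is reachable, else the local fallback.

    connect_broker is a hypothetical callable that returns a broker-backed
    queue object or raises OSError when no broker can be reached.
    """
    try:
        return connect_broker()
    except OSError:
        return LocalQueue(local_path)
```

The caller only sees `send` and `receive`, so it does not need to know which implementation it got. The same pattern would apply to the file storage side.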

reger wrote: at least for the ftp I implemented for my first testing Apache embedded (https://mina.apache.org/ftpserver-proje ... erver.html). Maybe something to consider.

I considered that as well, but we can add that as an add-on later. The same goes for SMB or other protocols; any kind of file sharing should be usable. The idea is that everyone can choose their own place to share WARC/index files.

Re: The YaCy Grid

Posted: Sat Apr 01, 2017 11:11 am
by Huppi
@Orbiter: Thanks for sharing your plan! Looks great!