The YaCy Grid

Post by Orbiter » Wed Mar 29, 2017 9:58 am

I'm currently working hard on a YaCy/2, now called "YaCy Grid".
As a first step, the main idea is that this becomes a large-scale search appliance.
In a second step, we can do two things: first, replace the old code parts in "Legacy YaCy" with the grid elements, and second, turn the YaCy Grid into a peer-to-peer architecture (again).
YaCy Grid is therefore a 'professional YaCy', with the vision that it stays a modern piece of software that may power the next generation of p2p search.

I posted a milestone plan and an architecture picture here:
https://twitter.com/yacy_search/status/ ... 1844357120
[Image]
Orbiter

Posts: 5777
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: The YaCy Grid

Post by Orbiter » Wed Mar 29, 2017 10:02 am

"Legacy YaCy" (YaCy/1) will benefit from milestone 2: we will get a WARC parser which produces Elasticsearch-like JSON index files, and YaCy will get a surrogate parser to read those files.
Then it will be easy to use crawlers outside of YaCy, such as wget:
Code:
wget "http://yacy.net" --warc-file="yacy"

...will generate a WARC file, which YaCy/1 can then index using the Grid Parser.
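
To give a rough idea of what a parser finds inside such a file, here is a minimal sketch in Python that walks the records of a WARC file. It is not the Grid Parser and not YaCy code; the function names `read_warc_records` and `list_responses` are made up for this illustration, and the reader assumes simple, unfolded header lines with a `Content-Length` field on every record.

```python
import gzip
import io


def read_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("not a WARC record header: %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # The record body is exactly Content-Length bytes long.
        payload = stream.read(int(headers.get("Content-Length", "0")))
        yield headers, payload


def list_responses(path):
    """Return the target URIs of all 'response' records in a .warc.gz file."""
    # gzip.open reads files made of several concatenated gzip members transparently.
    with gzip.open(path, "rb") as stream:
        return [
            headers.get("WARC-Target-URI")
            for headers, _ in read_warc_records(stream)
            if headers.get("WARC-Type") == "response"
        ]
```

Calling `list_responses("yacy.warc.gz")` on the file written by the wget command above should list the URLs that were fetched, assuming wget wrote its default compressed output.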

Re: The YaCy Grid

Post by reger » Fri Mar 31, 2017 12:50 am

Oops,

I was looking into a WARC importer in parallel and read your post too late; see commit https://github.com/yacy/yacy_search_ser ... fd248d51f3

P.S. I looked at your grid prototype. I haven't grasped all the communication details so far, but I was a little surprised by the prerequisites (RabbitMQ & FTP), which currently have no way around them.
At least for FTP, I embedded the Apache FtpServer (https://mina.apache.org/ftpserver-proje ... erver.html) for my first tests. Maybe something to consider.
reger

Posts: 43
Joined: Wed Jan 02, 2013 9:23 am

Re: The YaCy Grid

Post by Orbiter » Sat Apr 01, 2017 12:37 am

Great work on the WARC importer!
reger wrote: prerequisite (rabbit & ftp) currently without a way around/out,

Well, actually, if the MCP does not find an FTP service, it will host the files itself. The same goes for the queue: if there is no RabbitMQ, it will handle queues with a poor man's queue implementation using an embedded MapDB.
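
The fallback idea can be sketched in a few lines of Python. This is only an illustration of the pattern (probe for the external service, otherwise use the embedded one), not the MCP's actual code; `service_reachable` and `choose_backends` are hypothetical names, and the ports are just the usual defaults for AMQP (5672) and FTP (21).

```python
import socket


def service_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_backends(services, probe=service_reachable):
    """Map each service name to 'external' if it answers, else 'embedded'."""
    return {
        name: "external" if probe(host, port) else "embedded"
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    # Assumed local setup: a broker and an FTP server on their default ports.
    print(choose_backends({
        "queue": ("localhost", 5672),
        "storage": ("localhost", 21),
    }))
```

The `probe` parameter is there so the decision logic can be exercised without a real broker or FTP server.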

reger wrote: at least for the ftp I implemented for my first testing Apache embedded (https://mina.apache.org/ftpserver-proje ... erver.html). Maybe something to consider.

I considered that as well, but we can add that as an add-on later. The same goes for SMB or other protocols; any kind of file sharing should be usable. The idea is that everyone can choose their own place to share WARC/index files.

Re: The YaCy Grid

Post by Huppi » Sat Apr 01, 2017 11:11 am

@Orbiter: Thanks for sharing your plan! Looks great!
Huppi

Posts: 897
Joined: Fri Jun 29, 2007 9:49 am
Location: Kürten

Re: The YaCy Grid

Post by Orbiter » Mon Apr 24, 2017 3:47 pm

YaCy Grid: Parser Microservice

You can now send a WARC file to a yacy_grid_parser microservice
and get the parsed full text and links back as JSON:

Code:
wget https://www.ffii.org --warc-file=ffii.org
curl -X POST -F "sourcebytes=@ffii.org.warc.gz" http://yacygrid.com:8500/yacy/grid/parser/parser.json

Here we still use wget as the loader. That component will soon be replaced with a headless browser which
generates WARC files.
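
For anyone who wants to call the parser from a script instead of curl, here is a minimal sketch in Python using only the standard library. The URL and the `sourcebytes` form field are taken from the curl command above; the helper names `build_multipart` and `parse_warc` are made up for this example, and it assumes that the service is reachable at that address and answers with a JSON body.

```python
import json
import os
import urllib.request
import uuid

PARSER_URL = "http://yacygrid.com:8500/yacy/grid/parser/parser.json"


def build_multipart(field, filename, data, boundary=None):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_warc(path, url=PARSER_URL):
    """POST a WARC file to the parser service and return the decoded JSON reply."""
    with open(path, "rb") as warc_file:
        body, content_type = build_multipart(
            "sourcebytes", os.path.basename(path), warc_file.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

`parse_warc("ffii.org.warc.gz")` would then be the counterpart of the curl line; the structure of the returned JSON is not described in this thread, so the sketch simply returns whatever the service sends.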