A generic API advanced crawler

Post by luc » Fri Feb 19, 2016 9:53 am

Hello, I recently found an interesting repository of public API descriptions in OpenAPI format: https://github.com/APIs-guru/api-models.

First, this made me think it would be a good idea to describe the YaCy API in a standardized format such as OpenAPI (Swagger), RAML, or whichever format is most relevant.

I also wonder whether it would be possible to build a generic advanced crawler able to query any public API, as an alternative to classical crawling. YaCy already has specialized MediaWiki and phpBB3 crawlers, as well as RSS and OAI-PMH importers.
It would be great to be able to query other APIs with only a little mapping work.

My basic idea would be to specify in a new YaCy screen:
- an API key, if one is required
- one or more resource listing/discovery services
- which result field(s) contain a public http resource to index
- or which result field(s) contain metadata to index
- ... any other necessary information
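
As a rough illustration of the list above, here is a minimal Python sketch of what such a mapping and a generic extraction step could look like. Everything in it is an assumption made for the example: the mapping keys (`listing_urls`, `items_path`, `url_fields`, `metadata_fields`), the helper names and the sample response are hypothetical and do not correspond to any existing YaCy format or API.

```python
from typing import Any, Iterator

# Hypothetical mapping for an imaginary API. These keys are invented for this
# sketch and are not an existing YaCy configuration format.
MAPPING = {
    "api_key": None,  # optional API key, if the service requires one
    "listing_urls": ["https://api.example.org/v1/items"],  # listing/discovery services
    "items_path": "data.items",  # where the result list sits in the response
    "url_fields": ["link"],  # fields holding a public http resource to index
    "metadata_fields": ["title", "summary"],  # fields holding metadata to index
}


def get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path such as 'data.items' through nested dicts."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_records(response: dict, mapping: dict) -> Iterator[dict]:
    """Yield one record per result item: URLs to crawl plus metadata to index."""
    items = get_path(response, mapping["items_path"]) or []
    for item in items:
        if not isinstance(item, dict):
            continue
        # Keep only string values that look like public http(s) resources.
        urls = [
            item[field]
            for field in mapping["url_fields"]
            if isinstance(item.get(field), str)
            and item[field].startswith(("http://", "https://"))
        ]
        metadata = {f: item[f] for f in mapping["metadata_fields"] if f in item}
        if urls or metadata:
            yield {"urls": urls, "metadata": metadata}


# A made-up response standing in for what a listing service might return.
sample_response = {
    "data": {
        "items": [
            {"link": "https://example.org/doc/1", "title": "First", "summary": "One"},
            {"link": "ftp://example.org/doc/2", "title": "Second"},
        ]
    }
}

records = list(extract_records(sample_response, MAPPING))
```

In this sketch the second item keeps its metadata but contributes no URL, because its link is not an http resource. Fetching the listing URLs, paging and authentication are left out on purpose.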

This mapping should be exportable so it can be shared with other YaCy users, possibly in a dedicated folder in the git repository.
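
One possible way to make a mapping shareable is to store it as a plain JSON file, one file per API. The Python sketch below assumes this; the mapping keys, the file naming and the choice to strip the API key before export are assumptions for illustration, not an existing YaCy feature.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping; the keys are invented for this sketch.
mapping = {
    "api_key": "my-private-key",
    "listing_urls": ["https://api.example.org/v1/items"],
    "url_fields": ["link"],
    "metadata_fields": ["title", "summary"],
}


def export_mapping(mapping: dict, folder: str, name: str) -> Path:
    """Write the mapping as JSON, without the user's private API key."""
    shareable = {k: v for k, v in mapping.items() if k != "api_key"}
    path = Path(folder) / f"{name}.json"
    path.write_text(json.dumps(shareable, indent=2, sort_keys=True), encoding="utf-8")
    return path


def import_mapping(path: Path) -> dict:
    """Read a mapping that another user exported."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip through a temporary folder standing in for the shared folder.
with tempfile.TemporaryDirectory() as folder:
    exported = export_mapping(mapping, folder, "example-api")
    exported_name = exported.name
    restored = import_mapping(exported)
```

Leaving the API key out of the exported file means each user would enter their own key after importing a shared mapping.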

What do you think? Feasible?
luc
 
Posts: 300
Joined: Wed Aug 26, 2015 1:04 am

Re: A generic API advanced crawler

Post by Orbiter » Wed Mar 30, 2016 9:11 am

Well, that looks like a lot of work. The big issue is that YaCy is already a big beast, and an API with user accounts would raise new questions.
My current attempt at an architecture that can grow, and that may include this kind of API, is to partition the whole search engine into microservices.
My thoughts are here: http://kaskelix.de/
This would also cover the idea of having a separate crawler.
Orbiter
 
Posts: 5796
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

