api and content types


Post by drdevil44 » Fri Sep 22, 2017 7:18 pm

Hi there

Two simple noob questions:

1) How do I stop YaCy from indexing images?

2) How can I use the API to loop through all of the pages in the index?

thanks
G

Re: api and content types

Post by luc » Sat Sep 23, 2017 7:53 am

Hi,
1) How do I stop YaCy from indexing images?

- For crawls you start yourself: on the /CrawlStartExpert.html page, simply uncheck the "index media" checkbox. Note that this also excludes video and audio content from indexing.
- For index entries received from other peers (when searching, or through the DHT distribution rules): I believe there is currently no such setting. You can only enable or disable index receive as a whole.

2) How can I use the API to loop through all of the pages in the index?

- You can use the Solr select servlet: basically /solr/select?q=*:*&start=[offset]&rows=[numberOfDocsPerPage]&core=collection1 (the link is provided on some YaCy pages, notably /IndexFederated_p.html and /Crawler_p.html). Note that start is the zero-based offset of the first document to return, not a page number, so page n (counting from 0) starts at n × rows. You can customize the result with additional parameters that control the output format (wt=json, wt=xml, wt=csv and so on), the returned fields (fl=[your fields]), or any other relevant Solr parameter.
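
As an illustration of the paging described above, here is a minimal Python sketch. It rests on assumptions that are not confirmed in this thread: the peer is reachable at http://localhost:8090, and wt=json returns the usual Solr JSON layout with the documents under response.docs. The function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local YaCy peer at this address; adjust to your setup.
BASE_URL = "http://localhost:8090"


def build_select_url(start, rows, base_url=BASE_URL, query="*:*", fields=None):
    """Build a /solr/select URL; `start` is a document offset, not a page number."""
    params = {"q": query, "start": start, "rows": rows,
              "core": "collection1", "wt": "json"}
    if fields:
        # Optional fl parameter restricting the returned fields.
        params["fl"] = ",".join(fields)
    return base_url + "/solr/select?" + urlencode(params)


def fetch_page(start, rows):
    """Fetch one page of documents.

    Assumption: the response uses the standard Solr JSON layout,
    with the documents listed under response.docs.
    """
    with urlopen(build_select_url(start, rows)) as resp:
        return json.load(resp)["response"]["docs"]


def iter_documents(rows=100, fetch=fetch_page):
    """Yield every document, advancing the offset by `rows` until a page is empty."""
    start = 0
    while True:
        docs = fetch(start, rows)
        if not docs:
            return
        yield from docs
        start += rows
```

Calling `for doc in iter_documents(rows=100): print(doc)` would then walk the whole index, 100 documents per request. The `fetch` argument is only there so the loop can be tried out with a stand-in function before pointing it at a real peer.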

Have a nice day

Re: api and content types

Post by drdevil44 » Tue Sep 26, 2017 7:30 pm

Thanks for your reply and the info.

Just two further questions:

1) How can I change an existing crawl to stop it crawling images?
2) How do I remove images from the index?

Re: api and content types

Post by luc » Thu Sep 28, 2017 7:35 pm

1) How can I change an existing crawl to stop it crawling images?

- This is possible on a running crawl: use the "Edit Profile" button on the /CrawlProfileEditor_p.html page (linked from Crawler Monitor > Scheduler and Profile Editor).

2) How do I remove images from the index?

- You can do this on the /IndexDeletion_p.html page (Index Administration > Index Deletion) using the "Delete by Solr Query" fieldset: type, for example, "content_type:image/*" in the query field. This should already remove most images that have a valid content type. If necessary, you can also extend the query to the URL file extension (the "url_file_ext_s" field).
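
Before deleting, it can help to see how many documents a query would match. The Python sketch below builds the query string from the answer above, optionally extended with file extensions, and a select URL with rows=0 that only asks for the match count. The extension list, the function names, and the http://localhost:8090 address are assumptions made for illustration. Whether the select servlet parses the query exactly as the deletion form does is also an assumption.

```python
from urllib.parse import urlencode

# Hypothetical extension list, for illustration only.
IMAGE_EXTENSIONS = ("png", "jpg", "jpeg", "gif")


def image_query(extensions=()):
    """Return the content-type query from the thread, optionally OR-ed with file extensions."""
    query = "content_type:image/*"
    if extensions:
        query += " OR url_file_ext_s:(" + " OR ".join(extensions) + ")"
    return query


def preview_url(query, base_url="http://localhost:8090"):
    """Build a select URL that requests no documents (rows=0), only the match count."""
    params = {"q": query, "rows": 0, "core": "collection1", "wt": "json"}
    return base_url + "/solr/select?" + urlencode(params)
```

The string returned by `image_query(IMAGE_EXTENSIONS)` is what would be pasted into the "Delete by Solr Query" field, and opening `preview_url(image_query(IMAGE_EXTENSIONS))` first gives a chance to check the match count before anything is removed.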