How to get the list of url indexed by a crawl?



Post by dClauzel » Wed Jun 04, 2014 9:47 am

I am looking for a way to get the list of all URLs indexed by a crawl, so I can check what has (or has not) been collected and refine the exclusion rules.

I looked into the Index administration page (IndexControlURLs_p.html), but it looks like there is no way to do that there.

dClauzel
 
Posts: 4
Joined: Wed Jun 04, 2014 9:30 am

Re: How to get the list of url indexed by a crawl?

Post by Orbiter » Wed Jun 04, 2014 11:20 am

Well, that is actually easy :)
- Assign a collection name to your crawl start that identifies the crawl (any name will do, see the field "Add Crawl result to collection(s)").
- Use the Solr search interface to get a list restricted to that collection. For example, if the collection name was 'crawl1', request the following path on your peer:
Code:
/solr/collection1/select?q=collection_sxt:crawl1&defType=edismax&start=0&rows=100&fl=sku

You can adjust the start and rows values here to page through the whole list or to fetch only part of it.
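For anyone who wants to script the paging instead of editing start and rows by hand, here is a minimal Python sketch. It is not part of YaCy. It assumes the peer answers in the standard Solr XML layout (a `result` element carrying `numFound`, with one `str name="sku"` per document), and the helper names `build_select_url`, `parse_page` and `iter_crawl_urls` are made up for this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_select_url(base_url, collection, start=0, rows=100):
    """Build the select URL from the post for one page of results."""
    query = urllib.parse.urlencode({
        "q": f"collection_sxt:{collection}",
        "defType": "edismax",
        "start": start,
        "rows": rows,
        "fl": "sku",
    })
    return f"{base_url}/solr/collection1/select?{query}"


def parse_page(xml_text):
    """Return (total hits, URLs on this page); assumes standard Solr XML."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None:
        raise ValueError("no <result> element in response")
    total = int(result.get("numFound", "0"))
    urls = [s.text for s in result.iter("str") if s.get("name") == "sku"]
    return total, urls


def fetch(url):
    """Download one page as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def iter_crawl_urls(base_url, collection, rows=100, fetch=fetch):
    """Yield every indexed URL of the collection, page by page."""
    start = 0
    while True:
        page = fetch(build_select_url(base_url, collection, start, rows))
        total, urls = parse_page(page)
        yield from urls
        start += rows
        # Stop on an empty page or once all reported hits were requested.
        if not urls or start >= total:
            break
```

Something like `for url in iter_crawl_urls("http://localhost:8090", "crawl1"): print(url)` would then print the list, one URL per line; the host and port are an assumption and must be replaced with the address of your own peer.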
Orbiter
 
Posts: 5787
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: How to get the list of url indexed by a crawl?

Post by dClauzel » Wed Jun 04, 2014 2:52 pm

That's perfect. Thanks!

We really need more web interfaces for exploring the index. I can work with XML, but the non-technical users… :/
dClauzel
 
Posts: 4
Joined: Wed Jun 04, 2014 9:30 am

