Recommended number of characters for each title in Solr index



Post by smokingwheels » Wed Nov 05, 2014 12:40 pm

On the page /IndexSchema_p.html:
Does the value for title_chars_val truncate the title length in the index?

I have been processing one of my dumps and got a "string too long" error. After I deleted the first 1000 lines of the dump, it processed without trouble.

Does anyone have a good number they use on their system?
Thanks
smokingwheels
 
Posts: 102
Joined: Sat Aug 31, 2013 7:16 am

Re: Recommended number of characters for each title in Solr index

Post by Orbiter » Tue Nov 11, 2014 10:54 am

I don't understand the problem. Can you try to rephrase it, point out what exactly you want to do, or give a reproducible step-by-step guide to trigger the bug?
Orbiter
 
Posts: 5769
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Recommended number of characters for each title in Solr index

Post by smokingwheels » Wed Nov 12, 2014 12:37 am

OK, sorry, I did not explain that very well.
I have exported my index.
I tried to process it with QuickBASIC 4.5, and when I read the data from the exported file I get an "Out of string space" error on a LINE INPUT.
For this error to happen, a line in the export file must be longer than 23800 characters, e.g. a title or description.
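
To see which lines of an export would trip such a limit before feeding the file to another tool, a small scan can help. The following is only a sketch in Python, not part of YaCy: the function name `find_long_lines` is hypothetical, and the 23800 threshold is simply the figure quoted above. Lengths are counted in bytes, on the assumption that the consuming program treats every byte as one character.

```python
# Hypothetical helper, not a YaCy feature: list the lines of a text export
# that are longer than a given limit.
LIMIT = 23800  # threshold quoted in the post; adjust as needed


def find_long_lines(path, limit=LIMIT):
    """Return (line_number, length_in_bytes) for every line longer than limit."""
    long_lines = []
    # Read as bytes so an unexpected encoding cannot raise a decode error.
    with open(path, "rb") as export_file:
        for number, raw in enumerate(export_file, start=1):
            length = len(raw.rstrip(b"\r\n"))  # ignore the line terminator
            if length > limit:
                long_lines.append((number, length))
    return long_lines
```

Calling `find_long_lines("export.txt")` (the file name is a placeholder) returns an empty list when no line exceeds the limit; otherwise the line numbers show where the oversized entries sit.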

Currently I have to open the exported file with a text editor and remove the first 100 lines of the file.
I will have to export it again and then copy what I find in a few days.
It mainly looks like a lot of extended ASCII characters, e.g. above code 127.

Is there any way to limit the length of the data in an export?
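
Independently of whether the export itself can be limited, the exported file could be shortened in a separate pass afterwards. The sketch below is one possible way to do that in Python. It assumes the export is UTF-8 text with one record per line; `truncate_lines` and its `max_bytes` parameter are hypothetical names chosen for illustration. Note that cutting a line may break markup such as an unclosed `<a>` tag.

```python
# Hypothetical post-processing step, not a YaCy option: copy an export file
# and cut every line down to at most max_bytes bytes.
def truncate_lines(src_path, dst_path, max_bytes=1000):
    """Copy src_path to dst_path, truncating over-long lines."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            body = raw.rstrip(b"\r\n")
            ending = raw[len(body):]  # keep the original line terminator
            if len(body) > max_bytes:
                # Assumes UTF-8: decoding with errors="ignore" drops a
                # multi-byte character that the cut split in half (and any
                # other invalid bytes inside the kept part).
                text = body[:max_bytes].decode("utf-8", errors="ignore")
                body = text.encode("utf-8")
            dst.write(body + ending)
```

Lines at or below the limit are copied unchanged, so the output can be compared against the original to see exactly which records were shortened.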

Looking at some of the exported data, the average would be approximately 150 characters or less, though it depends on the web site.

Links like this one I have to remove.
<a href="http://www.xyz.com/2012/02/%23D7%91%23D7%93%23D7%99%23D7%23A7%23D7%95%23D7%23AA-%23D7%239E%23D7%23A2%23D7%91%23D7%93%23D7%94-%23D7%239E%23D7%92%23D7%239C%23D7%95%23D7%23AA-%23D7%239E%23D7%94-%23D7%239E%23D7%23A1%23D7%23AA%23D7%23AA%23D7%23A8-%23D7%91%23D7%23AA%23D7%95%23D7%239A-%23D7%94%23D7%97%23D7%23A9%23D7%99%23D7%23A9/">בדיקות מעבדה מגלות: מה מסתתר בתוך החשיש, החגיגת, הקוקאין והאקסטזי | קנאביס - מגזין עם כיוון</a>
smokingwheels
 

Re: Recommended number of characters for each title in Solr index

Post by Orbiter » Wed Nov 12, 2014 10:36 am

The Solr export is simply a zip file of the Solr data directory, which contains Lucene index files in binary form. I wonder how you process them, but as far as I know there is no limit on any field at all. If any of the fields has the size you describe, then it was that size in the original HTML that was indexed. Do you actually parse the Lucene index files?
Orbiter
 

Re: Recommended number of characters for each title in Solr index

Post by smokingwheels » Fri Nov 14, 2014 1:09 am

Do you actually parse the lucene index files?

No, not at present.
I just use the export function built into YaCy and pick off what I need with QuickBASIC 4.5.
smokingwheels
 

Re: Recommended number of characters for each title in Solr index

Post by Orbiter » Thu Dec 11, 2014 9:45 am

smokingwheels wrote: It mainly looks like a lot of extended ASCII characters eg above code 127.

I believe that should be UTF-8.
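
A short Python illustration of why UTF-8 text can look like a run of "extended ASCII" characters to a byte-oriented tool. It uses the first two letters of the Hebrew title quoted earlier in the thread. Code page 437 is only an assumption about what a DOS-era program would display; the actual code page on a given system may differ.

```python
# First two letters of the Hebrew title quoted earlier in the thread.
title_start = "\u05d1\u05d3"

# In UTF-8 each of these letters becomes two bytes, all of them above 127.
utf8_bytes = title_start.encode("utf-8")
print(utf8_bytes.hex(" "))                 # d7 91 d7 93
print(all(b > 127 for b in utf8_bytes))    # True

# A program that treats every byte as one character (here: code page 437,
# an assumption) sees four "extended" characters instead of two letters.
misread = utf8_bytes.decode("cp437")
print(len(title_start), len(misread))      # 2 4
```

The D7 91 and D7 93 byte values also appear in the percent-encoded link quoted above, which suggests the long runs of characters above code 127 are ordinary non-Latin text rather than corruption.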
Orbiter
 

