Reason for the limitation of CrawlingThreadPoolSize

Posted over 3 years ago by Andreas Korntheuer

Solved

Hi Team,


Why is CrawlingThreadPoolSize limited to a maximum of 2?

This is mentioned here: https://copernicsearch.freshdesk.com/en/support/discussions/topics/47000642738/page/last#post-47001465793


I have a powerful workstation running Copernic server, and rescanning the folders for changes takes more than 3 weeks for 4 million files.


I'm sure that increasing CrawlingThreadPoolSize to 4 or 8 would not harm my machine.


Regards,

Andreas Korntheuer

0 Votes


12 Comments

LUST & Partners Jacky d'Hoest posted almost 3 years ago

I found out that you can set the crawling value to whatever you want if you first quit CDS, then delete the queue, and then change the value in the XML file. You can then restart CDS and start an update. Even "12" was possible. Scanning is then very fast. The downside is that CDS is sometimes unstable afterwards: it freezes at 99% of indexing, and the only thing you can do is restart CDS. The extraction value can never be higher than "4".
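
The XML edit in the third step could be scripted roughly as follows. This is only a sketch under stated assumptions: the thread does not show the config file's layout, so the idea that the setting is stored as an element named `CrawlingThreadPoolSize` whose text holds the number is a guess, and the helper `set_pool_size` and the sample fragment are hypothetical. CDS would still need to be fully closed and the queue deleted beforehand, as described.

```python
import xml.etree.ElementTree as ET


def set_pool_size(xml_text: str, setting: str, value: int) -> str:
    """Return xml_text with the first element named `setting` set to `value`.

    Assumption (not confirmed by the thread): the setting is an element whose
    text is the number, e.g. <CrawlingThreadPoolSize>2</CrawlingThreadPoolSize>.
    """
    root = ET.fromstring(xml_text)
    # The setting may be the root itself or nested anywhere below it.
    node = root if root.tag == setting else root.find(f".//{setting}")
    if node is None:
        raise KeyError(f"{setting} not found in config")
    node.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical config fragment, for illustration only.
sample = "<Settings><CrawlingThreadPoolSize>2</CrawlingThreadPoolSize></Settings>"
updated = set_pool_size(sample, "CrawlingThreadPoolSize", 12)
print(updated)
```

Working on the file contents as a string keeps the sketch independent of where the real config file lives, which the thread does not state.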

2 Votes

Alexander Borouhin posted almost 3 years ago

Currently both thread pools seem to be limited to a maximum of 4 threads. At least, a 4/4 configuration works for me; anything higher gets reset to 4/4. But that is still a very low limit. Since initial indexing of ca. 1 million documents takes days, and each index update takes hours, I'd prefer to use all 16 cores to the maximum to speed up this process, scheduling the indexing time so that it doesn't affect other tasks.

1 Votes

Roeland Mertens posted about 3 years ago

What I report now will sound a bit dumb – but here it is.


In total, I repeated the test of modifying both parameters 4 times. The first 3 times, the values “2/4” were immediately overwritten to “1/2” upon restart of the app. I’ve now run the test once again … and the “2/4” values stayed!


Go figure …


The only difference with the 4th test may be that the initial indexing had at long last finished, so the app was busy indexing the first 3 times and idle the 4th time. I have no other explanation.


Thanks for all the replies.

0 Votes

Mirco Persechino posted about 3 years ago

Hi Mertens,


Make sure you completely exit Copernic Desktop Search BEFORE making any changes in the config file. I think that could be the issue here. So completely exit CDS, then change the extraction thread setting in your config file, and then start Copernic. It should be okay.

0 Votes

LUST & Partners Jacky d'Hoest posted about 3 years ago

My configuration runs like this:


[image]


The values are only adjusted if you go higher.

0 Votes

Mirco Persechino posted about 3 years ago

Did you try 2 and 2, then start and see whether it reverts to 2 and 1? Then try 2 and 3.


It shouldn't behave like that. I'll make a few checks with our developers.

0 Votes

Roeland Mertens posted about 3 years ago

Not so. As I have just said, with the settings as suggested, at the next start it overwrites both settings with those "2" and "1" values again.

0 Votes

Mirco Persechino posted about 3 years ago

Hi Mertens,


"ExtractionThreadPoolSize" can be set to a maximum of 4.

"CrawlingThreadPoolSize", maximum 2.


If you exceed those numbers, the config file will simply reset to its default value.  
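
The behaviour described above can be pictured as a validation step at startup. The sketch below is an illustration only, not Copernic's actual code: the function `validate`, the per-setting reset, and the default values are assumptions (the thread reports defaults of "2" and "1" without making clear which setting gets which), and other posts in this thread report different effective limits.

```python
# Maximums as stated in this post.
MAX_SIZES = {"ExtractionThreadPoolSize": 4, "CrawlingThreadPoolSize": 2}


def validate(settings: dict, defaults: dict) -> dict:
    """Keep values within the stated maximum; reset any other value to its default."""
    result = {}
    for name, limit in MAX_SIZES.items():
        value = settings.get(name, defaults[name])
        result[name] = value if 1 <= value <= limit else defaults[name]
    return result


# Placeholder defaults, assumed for illustration; not confirmed values.
assumed_defaults = {"ExtractionThreadPoolSize": 1, "CrawlingThreadPoolSize": 2}

requested = {"ExtractionThreadPoolSize": 4, "CrawlingThreadPoolSize": 8}
print(validate(requested, assumed_defaults))
```

Here the extraction value of 4 is within the stated limit and survives, while the crawling value of 8 exceeds it and falls back to the assumed default.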

1 Votes

Roeland Mertens posted about 3 years ago

Same problem here.


I use a high-end PC (with, among other things, an AMD Ryzen Threadripper CPU with 24 cores). With the indexing performance set to unrestricted, the indexing process stays well below 3% CPU load all the time. That is of course ridiculous for this kind of PC, and the waste of time is enormous.


Remarkably, after I manually adjust the parameters ("CrawlingThreadPoolSize" and "ExtractionThreadPoolSize"), my settings are immediately overwritten at each startup and reset to those meager "2" and "1" values again.


Ask me why ... !

0 Votes

Andreas Korntheuer posted over 3 years ago

I think this is not the main problem in my case.


The process Copernic.Plugins.PluginsService.exe stays at 0% CPU usage.

If Copernic were performing an OCR rescan, this process should show more than 0% CPU usage.


The process Copernic.DesktopSearch.exe uses about 15% CPU.

With more than 2 crawling threads, indexing should be faster.


Regards


0 Votes

LUST & Partners Jacky d'Hoest posted over 3 years ago

If you are using the OCR plugin, that is the reason why it takes so long. I have an index with 1,300,000 items, and with the OCR plugin a full update takes more than three weeks. Without it, it still takes 10 days, because CDS reindexes all emails every time. It does not restrict itself to new or changed emails ...


I have complained about that several times without any success.

2 Votes

Andreas Korntheuer posted over 3 years ago

Please reply. Rescanning the folders for changes should not take 3 weeks.

0 Votes
