reason for limitation of CrawlingThreadPoolSize

Hi Team,


Why is CrawlingThreadPoolSize limited to a maximum of 2?

This limit is mentioned here: https://copernicsearch.freshdesk.com/en/support/discussions/topics/47000642738/page/last#post-47001465793


I have a powerful workstation running Copernic Server, and rescanning the folders for changes takes more than 3 weeks for 4 million files.


I'm sure that increasing CrawlingThreadPoolSize to 4 or 8 would not harm my machine.
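
To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes exactly 21 days of continuous rescanning; since the rescan actually takes more than 3 weeks, the real rate is somewhat lower than this.

```python
# Rough throughput estimate from the figures in this post.
# Assumption: "3 weeks" is taken as exactly 21 days of uninterrupted rescanning.
files = 4_000_000
days = 21

seconds = days * 24 * 60 * 60          # 1,814,400 seconds
files_per_second = files / seconds     # upper bound on the rescan rate

print(f"about {files_per_second:.1f} files per second")
```

That works out to roughly 2 files per second.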


regards

Andreas Korntheuer


Please reply. Rescanning the folders for changes should not take 3 weeks.

If you are using the OCR plugin, that is the reason it takes so long. I have an index with 1,300,000 items, and with the OCR plugin a full update takes more than three weeks. Without it, it still takes 10 days, because CDS reindexes all emails every time. It does not restrict itself to new or changed emails ...


I have complained about that several times without any success.

I think this is not the main problem in my case.


The process Copernic.Plugins.PluginsService.exe stays at 0% CPU usage.

If Copernic were doing an OCR rescan, this process should show more than 0% CPU usage.


The process Copernic.DesktopSearch.exe uses about 15% CPU.

With more than 2 crawling threads, the indexing should be faster.


regards


Same problem here.


I use a high-end PC (with, among other things, an AMD Ryzen Threadripper CPU with 24 cores). With the indexing performance set to unrestricted, the indexing process stays well below 3% CPU load all the time. That is of course ridiculous for this kind of PC, and the waste of time is enormous.


Remarkably, after I manually adjust the parameters ("CrawlingThreadPoolSize" and "ExtractionThreadPoolSize"), at each startup my settings are immediately overwritten and reset to those meager "2" and "1" values again.


Ask me why ... !

Hi Mertens,


"ExtractionThreadPoolSize" can be set to a maximum of 4.

"CrawlingThreadPoolSize", maximum 2.


If you exceed those numbers, the config file will simply reset to its default value.
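
The rule described above can be sketched roughly as follows. This is only a model of the behaviour as reported in this thread, not Copernic's actual code. The maximums (2 and 4) and the defaults (2 and 1) are the values quoted in the posts. The function name, treating each setting independently, and rejecting values below 1 are assumptions made for illustration.

```python
# Maximums and defaults as quoted in this thread (not from official documentation).
MAXIMUMS = {"CrawlingThreadPoolSize": 2, "ExtractionThreadPoolSize": 4}
DEFAULTS = {"CrawlingThreadPoolSize": 2, "ExtractionThreadPoolSize": 1}


def settings_after_startup(configured):
    """Hypothetical helper: return the values expected after a restart."""
    result = {}
    for name, default in DEFAULTS.items():
        value = configured.get(name, default)
        # An out-of-range value is not clamped to the maximum;
        # it falls back to the default (assumption: per setting).
        if 1 <= value <= MAXIMUMS[name]:
            result[name] = value
        else:
            result[name] = default
    return result


# Within the limits: values are kept.
print(settings_after_startup({"CrawlingThreadPoolSize": 2, "ExtractionThreadPoolSize": 4}))
# Above the limits: both fall back to the defaults.
print(settings_after_startup({"CrawlingThreadPoolSize": 8, "ExtractionThreadPoolSize": 8}))
```

Under this model, 2 and 4 should survive a restart, while 8 and 8 should come back as 2 and 1.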

Not so. As I have just said, with the settings as suggested, at the next start it overwrites both settings with those "2" and "1" values again.

Did you try 2 and 2, then start and see whether it goes back to 2 and 1, and then try 2 and 3?


It shouldn't behave like that. I'll make a few checks with our developers.

My configuration runs like this:


[image]


Only if you go higher are the values adjusted.

Hi Mertens,


Make sure you completely exit Copernic Desktop Search BEFORE making any changes in the config file. I think that could be the issue here. So completely exit CDS, then change the extraction thread setting in your config file, and then start Copernic. It should be okay.

What I report now will sound a bit dumb – but here it is.


In total I repeated the test of modifying both parameters 4 times. The first 3 times the values “2/4” were immediately overwritten to “1/2” upon restart of the app. I’ve now done another test … and the “2/4” values stayed on!


Go figure …


The only difference with the 4th test may be that at long last the initial indexing had been finalized; so the app was busy indexing the first 3 times, and idle at the 4th time. I have no other explanation.


Thx for all the replies.

Currently both thread pools seem to be limited to a maximum of 4 threads. At least, a 4/4 configuration works for me; anything higher gets reset to 4/4. But that is still a very low limit. Since initial indexing of ca. 1 million documents takes days, and each index update takes hours, I'd prefer to use all 16 cores to the maximum to speed up this process, scheduling the indexing time so that it doesn't affect other tasks.
