CDS should keep an easily accessible log of files it cannot index

Posted over 3 years ago by Larry Edwards

CDS should keep a list of files it was unable to index, so the user can try to resolve the problems with them. As it is, CDS futilely tries to index these files every time the index is updated.


The list should include the full pathname.


A good way to access the list would be a new menu item:  Tools > Unindexable files

8 Votes


6 Comments

Larry Edwards posted about 2 years ago

In the version released in fall of 2021, a feature was added so that a list of unindexed files can be viewed. This is a big addition. To use it, execute a search with all input fields blank. The second column from the left (with a cryptic icon too small to make out) says "Indexing Error" if you hover over it. Problem files will be at the top of the list, with an exclamation point in an orange circle.


Some of these files are actually OK (e.g. image-only PDF files, and mp3 and mp4 files with no metadata), but I was able to find several hundred files with genuine problems, such as very old MS Office files that newer MS Office versions will not read (though I was able to find workarounds for most of those), file fragments from ChkDsk fixes, Apple Mac image files that are useless to me, etc.


A further improvement I would like to see is to add "Indexing Problem" to the File Type drop-down menu. This would allow a search for only files whose contents could not be indexed, and then the resulting list could be sorted by file type, date, size or filename.


It would also be handy if image-only PDF files and media files that lack metadata had a yellow circle instead of an orange one, to distinguish them from genuinely unreadable or corrupted files.


Even so, it's nice to have this new feature! Here's hoping Copernic will develop it further, as described above.

1 Votes

Hubertus Fremerey posted over 2 years ago

I would like to see which types of data are indexable, which have already been indexed, and which have not.

After indexing for a couple of hours I had not made much progress, and I cannot see what has already been indexed.

The listed extensions relate to Microsoft Office, but I use LibreOffice and do not know whether its formats are supported.


I feel very underinformed. I wanted to index just a few selected items, but I can't because there is no filter for that.

1 Votes

Larry Edwards posted over 2 years ago

I agree completely. 

3 Votes

Alexander Borouhin posted over 2 years ago

One addition to this request.

It turned out that I have a fair number of very large PDF files in my document archive which are password-protected against copying their contents. Maybe CDS is right in refusing to index such files.


But why on earth does it attempt to reindex them on every index update? They have not changed, so the result is kinda predictable. And such attempts take a lot of time and slow down index updates considerably.


IMHO, it would be wise to distinguish between temporary (e.g. read errors) and permanent (such as this password protection) indexing failures, and exclude files with the latter from further reindexing.
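
A minimal sketch of the distinction suggested above, purely as an illustration. Nothing here reflects how CDS actually works internally; `FailureKind`, `FailureRecord`, and `should_retry` are hypothetical names, and comparing modification time and size is just one assumed way to detect that a file has changed.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    TEMPORARY = "temporary"  # e.g. a read error; worth retrying
    PERMANENT = "permanent"  # e.g. password protection; retrying is pointless


@dataclass
class FailureRecord:
    path: str
    kind: FailureKind
    mtime: float  # modification time when the failure was recorded
    size: int     # file size in bytes when the failure was recorded


def should_retry(record: FailureRecord, current_mtime: float, current_size: int) -> bool:
    """Decide whether a previously failed file deserves another indexing attempt."""
    if record.kind is FailureKind.TEMPORARY:
        return True
    # Permanent failure: retry only if the file has changed since it failed.
    return current_mtime != record.mtime or current_size != record.size
```

With a record like this, an unchanged password-protected PDF would be skipped on every update, while a file that failed because of a read error would still be retried.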

2 Votes

Nicholas Deinhardt posted about 3 years ago

Any way to see which files or emails were not indexed, so that they can be removed or updated, would be great.

1 Votes

Brian Hanan posted about 3 years ago

I agree.  The list mentioned by the OP should exist to prevent CDS from trying to index again unless the file has been modified.  The list should also be printable or exportable in a human-readable format.
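
As a rough idea of what an exportable, human-readable list could look like, here is a small sketch that writes full pathnames and failure reasons as CSV. The function name `export_failures`, the column names, and the sample paths and reasons are all made up for illustration; CDS offers no such export as far as this thread indicates.

```python
import csv
import io


def export_failures(failures, out):
    """Write (path, reason) pairs as CSV rows, sorted by path, with a header line."""
    writer = csv.writer(out)
    writer.writerow(["path", "reason"])
    for path, reason in sorted(failures):
        writer.writerow([path, reason])


# Hypothetical sample data; a real list would come from the indexer.
sample = [
    (r"C:\docs\old.doc", "unsupported format"),
    (r"C:\docs\big.pdf", "password protected"),
]
buffer = io.StringIO()
export_failures(sample, buffer)
report = buffer.getvalue()
```

A CSV file like this can be opened in any spreadsheet program, sorted by reason or path, and printed from there.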

2 Votes