A little weirdness. I indexed the word "Generator", then went to the View/Studio/Index Topic item and performed a "Find In Document". What it brought back was a word that is similar to "Generator" but NOT the same word at all: it found "General" and offered to index those entries under the topic "Generator".

As Data said once about how to pronounce his name - "one is my name, the other is not"

Screen Shot 2019-09-09 at 2.58.58 PM.png


It's probably trying to find similar words so you could index, for example, generator, generators, generating, etc. 

But "general" may be going a bit far :)


-- Walt

Windows 10 Home, version 1909 (183623.476),
   Desktop: 16GB memory, Intel Core i7-6700K @ 4.00GHz, GeForce GTX 970
   Laptop:  8GB memory, Intel Core i7-3625QM @ 2.30GHz, Intel HD Graphics 4000 or NVIDIA GeForce GT 630M
Affinity Photo 1.8.3.641 and 1.8.4.650 Beta   / Affinity Designer 1.8.3.641 and 1.8.4.650 Beta  / Affinity Publisher 1.8.3.641 and 1.8.4.651 Beta.


If it is truly intentional that it is trying to find similar words, then it may be hard to write software that can tell the difference between words that are related (in the same family) and those that are wholly unrelated but simply have a significant portion that happens to match.

Here, generator and general share the same genera- beginning, differing only in the endings -l versus -tor. Unless the software is actually able to understand suffixes, I think this is about as close as it could get in identifying similar words. Without understanding language, it can’t know that genera- with the suffixes -ted, -ting, -te, are in the same family, but genera- with the suffixes -l, -lly, -lize, are not. If you tried to hard-code which suffixes count as similar, the software would fail to find genuinely similar words in other cases.
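
To illustrate the point, here is a minimal Python sketch of a naive "similar word" check based only on the shared beginning of two words. This is purely an assumption for illustration; I have no idea how Publisher actually decides what is similar, and `shared_prefix_ratio`, `looks_similar`, and the 0.75 threshold are made-up names and values.

```python
import os

def shared_prefix_ratio(a: str, b: str) -> float:
    """Length of the common prefix divided by the length of the shorter word."""
    a, b = a.lower(), b.lower()
    prefix = os.path.commonprefix([a, b])
    return len(prefix) / min(len(a), len(b))

def looks_similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Hypothetical rule: two words are 'similar' if most of the shorter one is shared."""
    return shared_prefix_ratio(a, b) >= threshold

topic = "generator"
candidates = ["generators", "generating", "generated", "general", "generally", "gender"]

# Keep every candidate that passes the naive prefix test.
matches = [w for w in candidates if looks_similar(topic, w)]
print(matches)
```

With this rule, "general" is accepted right alongside "generators", "generating", and "generated", and it even scores higher than "generating" because it is a shorter word. A rule that knows nothing about language has no way to tell the false positive from the real relatives.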

I think that trying to parse words for meaning across multiple languages is asking too much of this kind of software.

My suggestion is instead to provide a checkbox in the Index Topic studio to optionally include similar words in the search results, so that the user can either find exactly the string as searched or include similar strings that may or may not include false positives.

10 minutes ago, garrettm30 said:

My suggestion is instead to provide a checkbox in the Index Topic studio to optionally include similar words in the search results, so that the user can either find exactly the string as searched or include similar strings that may or may not include false positives.

Or, possibly, to let the user provide the stemming suggestion. E.g., allow an index search for "generat*" or "generat.*" if the user wants the behavior.
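
As a rough sketch of what those two pattern styles would do (plain Python, not anything Publisher supports today as far as I know; `wildcard_search` and `regex_search` are hypothetical helper names):

```python
import fnmatch
import re

words = ["Generator", "generators", "generating", "General", "generally"]

def wildcard_search(pattern: str, words: list[str]) -> list[str]:
    """Shell-style wildcard match ("generat*"), ignoring case."""
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pattern.lower())]

def regex_search(pattern: str, words: list[str]) -> list[str]:
    """Regular-expression match ("generat.*") against the whole word, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.fullmatch(w)]

wildcard_hits = wildcard_search("generat*", words)
regex_hits = regex_search(r"generat.*", words)
print(wildcard_hits)
print(regex_hits)
```

Either pattern picks up "Generator", "generators", and "generating" and leaves out "General" and "generally", because the user has said explicitly where the stem ends.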


-- Walt


1 minute ago, walt.farrell said:

Or, possibly, to let the user provide the stemming suggestion. E.g., allow an index search for "generat*" or "generat.*" if the user wants the behavior.

That is not a bad suggestion, but I think it is one more thing that has to be discovered in a manual or video. If we could suggest something that is intuitively apparent, that would be better.


Thanks all. @garrettm30, I like your thoughts on how difficult it is for software to deal with similar words, and prefixes could be involved as well as suffixes. I think, perhaps, no change needs to be made. The choices are displayed, and the user chooses which words to index and omits the ones that do not belong under the Topic. It's very straightforward, even if sometimes surprising. Thanks again.
