
Index find - wrong word


thetasig


A little weirdness. I indexed the word "Generator", then went to the View/Studio/Index Topic item and performed a "Find In Document". It brought back another word that is similar to "Generator" but NOT the same word at all: "General". It then offered to index those entries under the topic "Generator".

As Data once said about how to pronounce his name: "One is my name, the other is not."

[Screenshot attached: Screen Shot 2019-09-09 at 2.58.58 PM.png]


It's probably trying to find similar words so you could index, for example, generator, generators, generating, etc. 

But "general" may be going a bit far :)

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.1.2, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.1.2


If it really is intentionally trying to find similar words, then it may be hard to write software that can tell the difference between words that are related (in the same family) and words that are wholly unrelated but simply happen to share a significant portion of their spelling.

Here, generator and general share the same genera- beginning, differing only in the endings -l versus -tor. Unless the software actually understands suffixes, I think this is about as close as it could get in identifying similar words. Without understanding language, it can’t know that genera- with the suffixes -ted, -ting, and -te belongs to the same family, while genera- with the suffixes -l, -lly, and -lize does not. If you tried to define which suffixes count as similar, it would fail to find genuinely similar words in other cases.
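To illustrate why a prefix-based match would catch "General", here is a minimal Python sketch. It assumes, hypothetically, that the search treats two words as similar when they share a long enough run of leading letters; the thread doesn't say how Affinity Publisher actually matches, and `similar_words` and `min_ratio` are made-up names for illustration only.

```python
import os


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring of a and b, ignoring case."""
    return len(os.path.commonprefix([a.lower(), b.lower()]))


def similar_words(term: str, words: list[str], min_ratio: float = 0.75) -> list[str]:
    """Return words that share at least min_ratio of term's letters as a prefix.

    min_ratio is a hypothetical tuning knob, not a real Affinity setting.
    """
    needed = max(1, int(len(term) * min_ratio))
    return [w for w in words if shared_prefix_len(term, w) >= needed]


words = ["Generator", "generators", "generating", "General", "generic", "gender"]
print(similar_words("Generator", words))
# → ['Generator', 'generators', 'generating', 'General']
```

With a 75% threshold on the nine letters of "Generator", six shared letters are enough, so "General" (genera-) slips in while "generic" (gener-, five letters) does not. That is exactly the kind of false positive a purely spelling-based rule can't avoid without knowing which endings belong to the same word family.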

I think that trying to parse the words for meaning across multiple languages is asking too much for this kind of software.

My suggestion is instead to provide a checkbox in the Index Topic studio to optionally include similar words in the search results, so that the user can either find exactly the string as searched or include similar strings that may or may not include false positives.
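A rough Python sketch of how such a checkbox might behave. The function name, the `include_similar` flag, and the fixed six-letter stem are all hypothetical stand-ins, not anything the Index Topic studio exposes.

```python
import re


def find_in_document(text: str, term: str, include_similar: bool = False) -> list[str]:
    """Return the words in text that match term.

    include_similar=False: whole-word, case-insensitive exact match only.
    include_similar=True: also accept any word starting with the term's first
    six letters (a crude, hypothetical stand-in for "similar words").
    """
    words = re.findall(r"[A-Za-z]+", text)
    target = term.lower()
    if not include_similar:
        return [w for w in words if w.lower() == target]
    stem = target[:6]  # hypothetical fixed-length stem; a real tool may differ
    return [w for w in words if w.lower().startswith(stem)]


text = "The Generator hums. General notes on generators and generic parts."
print(find_in_document(text, "Generator"))                        # exact only
print(find_in_document(text, "Generator", include_similar=True))  # with similar words
```

With the box unchecked only "Generator" comes back; checked, "General" and "generators" are offered too, and the user decides which of them to keep.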


10 minutes ago, garrettm30 said:

My suggestion is instead to provide a checkbox in the Index Topic studio to optionally include similar words in the search results, so that the user can either find exactly the string as searched or include similar strings that may or may not include false positives.

Or, possibly, to let the user provide the stemming suggestion. E.g., allow an index search for "generat*" or "generat.*" if the user wants the behavior.
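Both pattern styles can be tried out with Python's standard library: `fnmatch` for the glob-style `generat*` and `re` for the regular-expression `generat.*`. This is only a sketch of the idea. The helper names are hypothetical, and nothing here reflects how the Index Topic studio actually parses searches.

```python
import fnmatch
import re


def glob_search(words: list[str], pattern: str) -> list[str]:
    """Words matching a glob-style pattern such as "generat*", ignoring case."""
    # fnmatchcase on lowered strings gives the same result on every OS.
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pattern.lower())]


def regex_search(words: list[str], pattern: str) -> list[str]:
    """Words whose whole text matches a regular expression such as "generat.*"."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.fullmatch(w)]


words = ["Generator", "generators", "generating", "General", "generic"]
print(glob_search(words, "generat*"))
print(regex_search(words, "generat.*"))
```

Either pattern keeps "generators" and "generating" while leaving out "General", because the user, not the software, decides where the stem ends.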

-- Walt


1 minute ago, walt.farrell said:

Or, possibly, to let the user provide the stemming suggestion. E.g., allow an index search for "generat*" or "generat.*" if the user wants the behavior.

That is not a bad suggestion, but I think it is one more thing that has to be discovered in a manual or video. If we could suggest something that is intuitively apparent, that would be better.


  • 3 weeks later...

Thanks all. @garrettm30, I like your thoughts on how hard it is for software to deal with similar words, and prefixes could be involved as well as suffixes. I think, perhaps, no change needs to be made. The choices are displayed, and the user chooses which words to index and omits the ones that should not be indexed under the Topic. It's very straightforward, even if sometimes surprising. Thanks again.
