thetasig Posted September 9, 2019

A little weirdness. I indexed the word "Generator", then went to the View/Studio/Index Topic item and performed a "Find In Document". This is what it brought back: another word that is similar to "Generator" but NOT the same word at all. It brought back the word "General" and offered to index those entries under the topic "Generator".

As Data once said about how to pronounce his name: "One is my name. The other is not."
walt.farrell Posted September 9, 2019

It's probably trying to find similar words so you could index, for example, generator, generators, generating, etc. But "general" may be going a bit far.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC: Desktop: Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090
Laptop: Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad: iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard
Mac: 2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1
Jon P (Staff) Posted September 10, 2019

That does seem to be the case. I'm not sure it is intended, though, so I've logged it.

Serif Europe Ltd. - www.serif.com
garrettm30 Posted September 13, 2019

If it is truly intentional that it is trying to find similar words, then it may be hard to write software that can tell the difference between words that are related (in the same family) and those that are wholly unrelated but simply have a significant portion that happens to match. Here, generator and general share the same genera- beginning, differing only in the endings -l versus -tor. Unless the software is actually able to understand suffixes, I think this is about as close as it could get in identifying similar words. Without understanding language, it can’t know that genera- with the suffixes -ted, -ting, -te is in the same family, but genera- with the suffixes -l, -lly, -lize is not. If you tried to define which suffixes count as similar, it would fail to find genuinely similar words in other cases. I think that trying to parse the words for meaning across multiple languages is asking too much for this kind of software.

My suggestion is instead to provide a checkbox in the Index Topic studio to optionally include similar words in the search results, so that the user can either find exactly the string as searched or include similar strings that may or may not include false positives.
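To illustrate the point above, here is a minimal Python sketch of a matcher that knows nothing about suffixes and only compares leading letters. This is an assumption made for illustration: the thread does not say how Publisher actually finds similar words, and the function names, word list, and `min_shared` threshold are all hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters two words share, ignoring case."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def similar_by_prefix(topic: str, words: list[str], min_shared: int = 6) -> list[str]:
    """Return words sharing at least `min_shared` leading letters with the topic.

    Hypothetical rule: it has no notion of word families, only of matching letters.
    """
    return [w for w in words if shared_prefix_len(topic, w) >= min_shared]


words = [
    "generator", "generators", "generating", "generated",
    "general", "generally", "generalize", "gender",
]

# With a threshold equal to the length of "genera", the unrelated
# "general" family is returned alongside the "generate" family.
matches = similar_by_prefix("Generator", words)
print(matches)
```

A rule like this cannot separate generating from general on meaning; it can only be tuned by threshold, and any threshold that suits one word family may be too strict or too loose for another.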
walt.farrell Posted September 13, 2019

10 minutes ago, garrettm30 said: "My suggestion is instead to provide a checkbox in the Index Topic studio to optionally include similar words in the search results, so that the user can either find exactly the string as searched or include similar strings that may or may not include false positives."

Or, possibly, to let the user provide the stemming suggestion. E.g., allow an index search for "generat*" or "generat.*" if the user wants the behavior.

-- Walt
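A user-supplied pattern of this kind could behave roughly as in the following Python sketch. It is only an illustration of the suggestion, not something Publisher is known to support: the helper names and word list are hypothetical, "generat*" is treated as a shell-style wildcard via the standard `fnmatch` module, and "generat.*" as a regular expression via `re`.

```python
import fnmatch
import re

words = [
    "generator", "generators", "generating", "generated",
    "general", "generally", "generalize",
]


def match_wildcard(pattern: str, words: list[str]) -> list[str]:
    """Match whole words against a shell-style wildcard, ignoring case."""
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pattern.lower())]


def match_regex(pattern: str, words: list[str]) -> list[str]:
    """Match whole words against a regular expression, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.fullmatch(w)]


# Both patterns require the letter "t" after "genera",
# so the "general" family is excluded.
wildcard_hits = match_wildcard("generat*", words)
regex_hits = match_regex("generat.*", words)
print(wildcard_hits)
print(regex_hits)
```

The design choice here is that the user, not the software, decides where the stem ends, which sidesteps the need for the program to understand suffixes.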
garrettm30 Posted September 13, 2019

1 minute ago, walt.farrell said: "Or, possibly, to let the user provide the stemming suggestion. E.g., allow an index search for "generat*" or "generat.*" if the user wants the behavior."

That is not a bad suggestion, but I think it is one more thing that has to be discovered in a manual or video. If we could suggest something that is intuitively apparent, that would be better.
thetasig (Author) Posted September 28, 2019

Thanks all. @garrettm30, I like your thoughts on how difficult it is for software to deal with similar words, and prefixes could be involved as well as suffixes. I think, perhaps, no change needs to be made. The choices are displayed, and the user chooses which words to index and omits the ones that do not belong under the Topic. It's very straightforward, even if sometimes surprising. Thanks again.