
Tibetan word spacing and line breaks not working properly



I'm using a Unicode Tibetan font in Publisher. The character stacking (i.e., contextual alternates) is working properly. However, the line breaks are not recognizing word boundaries. There is a character called "task" that separates syllables or words and should be treated much like a space or hyphen in English. However, Publisher is treating these as characters within a word, so a whole paragraph is treated as a single word, and line breaks occur in the middle of words rather than after "tasks".

Ideally, Publisher would handle Unicode Tibetan properly. Is there a way to define a style with appropriate character and paragraph settings to do this? If not, is there a way to define whitespace characters that Publisher would use to determine word boundaries and where to put line breaks?


Hi David in MA,

Welcome to the Affinity Forums!

Sorry, I am not familiar with Tibetan fonts at all, so just two thoughts. Concerning the "task" character and the wrong hyphenation, I wonder whether you have set the spelling language and hyphenation dictionary in the Character Panel > Language. If your installation of Affinity doesn't include a suitable dictionary, you can add one manually.

Here are some hints how to add:
https://forum.affinity.serif.com/index.php?/topic/65149-dictionaries-support/&tab=comments#comment-339174

 

https://forum.affinity.serif.com/index.php?/topic/90129-hyphenation-dictionaries-not-found/&tab=comments#comment-485289

 

Concerning optional spaces, you might check whether the menu Text > Insert > Spaces and Tabs offers suitable items. Some of them have no width, and some influence hyphenation.

 

 

 

macOS 10.14.6 | MacBookPro Retina 15" | Eizo 27" | Affinity V1


Thanks. I had chosen Tibetan in the typography options. That got the contextual alternates working properly. 

I'll have to figure out the dictionary and hyphenation. However, I don't think that will do what I want. I don't want to break words at the end of the line; rather, I want Publisher to treat the "tsak" character as a space separating words (note: tsak, not task as the spell checker wanted to substitute).

I used your suggestion about inserting zero-width spaces and found that it works. Specifically, using Find and Replace, I searched for all tsaks and replaced each one with a tsak followed by a zero-width space. The extra character is invisible but allows Publisher to properly wrap lines after tsaks without breaking words. Thanks for the tip.
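
For anyone who would rather prepare the text before pasting it into Publisher, the same replacement can be sketched in Python. This is only an illustration of the Find and Replace step described above, not anything Publisher itself provides; the function name `add_break_opportunities` is made up for this example. It assumes the separator is U+0F0B and the inserted character is U+200B (ZERO WIDTH SPACE).

```python
import re

TSHEG = "\u0f0b"  # Tibetan syllable separator (the "tsak" in this thread)
ZWSP = "\u200b"   # ZERO WIDTH SPACE, invisible but allows a line break

def add_break_opportunities(text: str) -> str:
    """Insert a zero-width space after every tsheg that lacks one.

    The negative lookahead keeps the function safe to run twice:
    a tsheg already followed by a zero-width space is left alone.
    """
    return re.sub(f"{TSHEG}(?!{ZWSP})", TSHEG + ZWSP, text)

if __name__ == "__main__":
    # Two syllables, each ending in a tsheg, written as escapes.
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51\u0f0b"
    fixed = add_break_opportunities(sample)
    print(len(sample), len(fixed))  # the fixed text is two code points longer
```

Because the added characters are invisible, it is worth keeping an untouched copy of the original text in case a later application handles the separator correctly on its own.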

It would be nice not to have to do this, so I hope Affinity will someday support Unicode foreign languages properly. We had been using Adobe products for years, but with their subscription model, weak support for Tibetan Unicode, and the new macOS obsoleting the last versions that could be purchased outright, we were very happy to see your products come out as a replacement for the Adobe ones. I'm hoping you can improve the ability to import InDesign and Illustrator files. I understand that when Publisher 1.8 comes out, it will be able to import InDesign IDML files while maintaining threaded text boxes. Yay!


7 hours ago, David in MA said:

note: tsak, not task as the spell checker wanted to substitute

Thanks for that correction :)

My Unicode lookup tool does not recognize the name tsak. Is there another name for it? Or, can you tell me its Unicode value (U+????) so I can find some information about it?

Thanks.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1


Possibly this article might give a little more info on the issue:

https://w3c.github.io/tlreq/#line_breaking

As far as I understand, the 'tsek' character should allow a line break at the edge of the text frame, but there also seems to be no hyphenation in Tibetan at all.

 

Quote

 

4.2 Line breaking ...

Normally, Tibetan only breaks after a tsek (U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG), and doesn't break after spaces.

(...)

Tibetan never breaks inside a syllable, and has no hyphenation. If a word is composed of multiple syllables, it is also preferable to avoid breaking a line in the middle of the word.

 

Although I am glad to hear that adding a zero-width space works for you as a workaround, I wonder whether assigning a dictionary could avoid this additional step. For Western/Latin languages, the dictionary includes the information about line break and hyphenation positions, and Affinity might not break lines correctly without one.

If Affinity needs a dictionary to identify the right characters for line breaking, it might be worth testing a Tibetan dictionary, for instance one of these for the open source office suites LibreOffice or OpenOffice:

https://github.com/LibreOffice/dictionaries/commit/471ccf158958a7e7968c96796bdf4dcfe4142bb3

https://extensions.openoffice.org/de/node/18573

 

If you experience this issue only in Affinity, you may consider opening a thread in the Affinity bug forum for your operating system (Mac/Windows).
https://forum.affinity.serif.com/index.php?/forum/80-report-a-bug-in-affinity-publisher/

 


I just looked up the Unicode value. I guess the official name is "tseg" and the value is U+0F0B. It functions like a space in English, separating words. Publisher is treating it like a letter, so it considers a whole sentence as one word and doesn't know where the true word boundaries are. There is no hyphenation in Tibetan. I'll try to figure out how to add a Tibetan dictionary to see if that helps.
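
For anyone else trying to look the character up: the Unicode character name spells it "TSHEG", which is probably why searching for "tsak" or "tsek" finds nothing. A small sketch using Python's standard `unicodedata` module shows one way to confirm the name and general category for a code point. The helper name `describe` is made up for this example.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (code point label, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

if __name__ == "__main__":
    # The syllable separator discussed in this thread, and the
    # zero-width space used as the workaround.
    for ch in ("\u0f0b", "\u200b"):
        print(*describe(ch))
```

The general category only says that the separator is punctuation rather than a letter or a space. Whether a line may break after it is a separate Unicode property (the line breaking class), which `unicodedata` does not expose.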


PS: Microsoft Word 2011 and 2016 for Mac had this problem, but the Mac supplied TextEdit program and recent versions of LibreOffice and NeoOffice work properly. Microsoft Word for Mac 2019 fixed the problem in Word. So support for this seems dependent on the App and version. 


PPS: I already had two Tibetan dictionaries installed in macOS, so that wasn't the problem. I think it's just a matter of treating the tseg as a whitespace character instead of a letter, thereby allowing line breaks after a tseg. I'll submit a bug report as you suggested. Thanks.


Notice there are 4 menus to set a language.


I am not sure in which of them Tibetan needs to appear to produce correct line breaks.
If none of your installed dictionaries shows up in at least one of these pulldown menus, then possibly Affinity isn't picking up the file path, and the relevant file needs to be moved on your disk.


I do see Tibetan in the Typography script pulldown, and the options within that are working. Tibetan does not appear in the other three pulldowns. There is no documentation I can find about where to put dictionary files. On Mac, the standard location is in the dictionaries folder either in the user Library folder or the global Library folder. Is there some other place the dictionary file should go for Publisher on Mac?

Even so, I don't think this is a dictionary problem. In Tibetan, each "word" or "syllable" ends with a tsek. Lines should wrap after any tsek. Publisher is treating the tsek as a non-breaking space or letter within a word. It shouldn't matter whether the word is in a dictionary or not.
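
To make the expected behaviour concrete, here is a rough Python sketch of the rule described above: a line may end only after a tsek (U+0F0B), never inside a syllable. It is an illustration under simplifying assumptions, not how Publisher or any layout engine actually works. It measures width in code points rather than real glyph widths, it ignores other Tibetan punctuation, and the function name `wrap_after_tsheg` is made up for this example.

```python
from typing import List

TSHEG = "\u0f0b"  # the syllable separator; a line may end after it

def wrap_after_tsheg(text: str, max_chars: int) -> List[str]:
    """Greedy line wrap that breaks only after a tsheg."""
    # Split the text into chunks that each end with a tsheg
    # (the final chunk may lack one).
    chunks = []
    start = 0
    for i, ch in enumerate(text):
        if ch == TSHEG:
            chunks.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        chunks.append(text[start:])

    # Fill each line with whole chunks; a chunk longer than the
    # limit still gets a line of its own rather than being split.
    lines = []
    current = ""
    for chunk in chunks:
        if current and len(current) + len(chunk) > max_chars:
            lines.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    return lines
```

No dictionary lookup is involved at any point, which matches the argument above that the break positions depend only on where the separator occurs.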


1 hour ago, David in MA said:

in the user Library folder or the global Library folder. Is there some other place the dictionary file should go for Publisher on Mac?

If you have a dictionary but it doesn't appear in the Affinity menu:  A dictionary in Affinity needs a certain folder structure + specific file naming, as mentioned here:

 

1 hour ago, David in MA said:

Even so, I don't think this is a dictionary problem.

Maybe you are right. If the typed Tibetan text already contains the glyph that should allow a line break wherever it occurs, then a dictionary possibly (but not necessarily) offers nothing beyond spelling. I only thought a dictionary might solve the issue because, for Western languages, Affinity gets both the spelling and the correct line breaks (plus the added dashes for hyphenation) from the dictionary.

