Publisher text editor does not recognize cyrillic text


anto


Version 2 of Publisher (2.0 through 2.3.0.2157) does not handle Cyrillic text properly. It often fails to recognize certain letters and splits them into their component parts.
I can't reproduce this on demand. The text is usually copied and pasted from office documents (LibreOffice or Word). It normally affects two or three words in a text; today it affected seven.

macOS Ventura

 

[Attachment: SCR-20231129-mkma.png]

 

 


Can you share the .odt or .docx document you're copying from?

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.5, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.5


Looks like the diacritics are either being decomposed, or they were entered separately in the original document.

If they are entered separately usually the shaper will/should turn these into the single character. 

What is the original font? 

On my phone at the moment, but can look at this on my laptop in about an hour.

Note: In your movie the single dot reappears on the decomposed character which could be a font issue, or an issue with the shaper.


32 minutes ago, kenmcd said:

Looks like the diacritics are either being decomposed, or they were entered separately in the original document.

If they are entered separately usually the shaper will/should turn these into the single character. 

What is the original font? 

On my phone at the moment, but can look at this on my laptop in about an hour.

Note: In your movie the single dot reappears on the decomposed character which could be a font issue, or an issue with the shaper.

It doesn't depend on the font. The letters were entered in the normal mode. It does not happen with every word: the text contains many other words, and even other occurrences of the same words, that are displayed correctly.

 

 


A font without any composites would affect this - and probably work.
So I doubt it always "does not depend on the font."

This appears to be a problem with the Word shaper, the font composites, and the Affinity shaper.
When you say it was "entered in the normal mode" do you mean in Word?

In Liberation Serif the Ukrainian yi-cyrl (ї) U+0457 is composed of two glyphs:
a dotlessi (ı) U+0131 and a dieresis (¨) U+00A8.
That is the old legacy dieresis, not the dieresiscomb ( ̈ ) U+0308, which it should be.

Word apparently sees this and changes the dieresis to the dieresiscomb U+0308,
but it also changes the dotlessi to an ibyelorussianukrainian-cyrl (і) U+0456.
Then apparently the Word shaper removes the dot and applies the dieresiscomb.
Because of that it appears to be the yi-cyrl (ї) U+0457.
But when the text is copied, those two characters are what comes over:
ibyelorussianukrainian-cyrl (і) U+0456 and dieresiscomb U+0308.
And that is what you are seeing in APub.

In APub when I directly enter the yi-cyrl (ї) U+0457 as Unicode - it works fine.
If that correct character was coming over from Word - it would work fine in APub.
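
For anyone who wants to check which characters actually came across, here is a minimal Python sketch (standard library only) that lists the code points in a string. The `describe` helper and the two sample strings are just for illustration; paste your own text in place of them.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# Sample strings (assumed for illustration, not taken from the user's file)
pasted = "\u0456\u0308"  # i + combining dieresis, as described above
typed = "\u0457"         # the single precomposed yi

for label, s in (("pasted", pasted), ("typed", typed)):
    print(label, len(s), describe(s))
```

The two strings can look the same on screen in some applications, but the first has length 2 and the second has length 1.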

I am not sure if the Affinity shaper should be "fixing" this mess.
That would basically be fixing a Word-created error.
But that could be another help-the-users-deal-with-this situation.

Word compounded the error when it replaced the dotlessi.
If it just replaced the dieresis with dieresiscomb - that should work.
Or replace the whole thing with yi-cyrl (ї) U+0457
(which is what I think HarfBuzz does).


I wonder if it would work better to Open a .docx file rather than using copy/paste. I can try that when I'm at a different machine, but I'm not sure I'll recognize whether the problem has occurred :) 

-- Walt

Yes, I wanted to take a look inside the document.
But because it was a .doc and not a .docx, I cannot look inside.
I would like to see what actual characters are in there.

There could also be an issue with the .doc.
When I opened it in LibreOffice Writer the font is shown as:
Liberation Serif;Times New Roman

Which of course makes no sense.

Word just showed: Liberation Serif


14 hours ago, kenmcd said:

The characters for yi-cyrl (ї) in the .docx are what is coming over to APub.
ibyelorussianukrainian-cyrl (і) U+0456 and dieresiscomb U+0308

I'm not quite sure I understand what you mean there, but thanks and congratulations for the detailed explanations and detective work in this topic!

Are you saying that the proper character was contained in the Word file, composed correctly, but the Affinity application broke it during import?

-- Walt

2 hours ago, walt.farrell said:

Are you saying that the proper character was contained in the Word file, composed correctly, but the Affinity application broke it during import?

No. Word broke it. And Affinity is just displaying what is there.

This could be something which could be fixed during the Place processing of the Word file. A new test/fix to look-out-for-dumb-stuff-like-this and replace it with the one correct character. May be difficult to actually implement, but would help users and prevent issues like this in the future.
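
As a rough illustration of what such a look-out test might do, here is a Python sketch. It is only an assumption about the general idea, not how Affinity's Place processing works, and `find_decomposed` is a hypothetical helper. It scans text for a base letter followed by a combining mark and reports the pairs for which Unicode defines a single precomposed character.

```python
import unicodedata

def find_decomposed(text):
    """Return (index, pair, composed) for each base+mark pair that NFC merges."""
    hits = []
    for i in range(1, len(text)):
        is_mark = unicodedata.combining(text[i]) != 0
        prev_is_base = unicodedata.combining(text[i - 1]) == 0
        if is_mark and prev_is_base:
            pair = text[i - 1:i + 1]
            composed = unicodedata.normalize("NFC", pair)
            if len(composed) == 1:  # a single precomposed character exists
                hits.append((i - 1, pair, composed))
    return hits

# Hypothetical sample word containing the two-character form of yi
sample = "Укра\u0456\u0308на"
for index, pair, composed in find_decomposed(sample):
    print(index, [f"U+{ord(c):04X}" for c in pair], "->", f"U+{ord(composed):04X}")
```

A check like this only flags candidates; whether to replace them automatically is the design question raised above.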

Part of the problem may be in how the font is constructed. Current best practices are to only use combining accents in composite characters, not legacy accents. This could be Word's (DirectWrite's) way of dealing with older fonts which still used legacy accents in composites and did not have a dotless i glyph. Dunno. I can only guess.

If we de-compose the font (make all the composites go away), it would probably not be an issue.

@anto If I make you a decomposed version of that font, would you test it for us? (would probably not be until tomorrow) 


21 minutes ago, anto said:

But the problem is that the file was sent to me.

That is a major problem.

Above you said:

22 hours ago, anto said:

The letters were entered in the normal mode.

Which to me means you typed it on your keyboard into Word. 

But instead, you apparently received a defective file. So the source of those odd characters is completely unknown. 

This has been a giant waste of time.


9 minutes ago, kenmcd said:

Which to me means you typed it on your keyboard into Word. 

But instead, you apparently received a defective file. So the source of those odd characters is completely unknown. 

This has been a giant waste of time.

This is not the first time this has happened. It happens very often. I get files from different people, and they can't all be typing the text wrong. They all use different programs and versions (Word, LibreOffice, etc.). Even in this text, the same words also appear spelled correctly, not decomposed into elements.


FWIW 1
According to good old Google, "The Liberation Fonts (Sans and Serif) are a font family which aims at metric compatibility with Arial, Times New Roman and Courier New sponsored by Red Hat". So "Liberation Serif;Times New Roman" perhaps makes a little sense, not from a font selection perspective but in general. As far as I know, Liberation Serif is the default font in LibreOffice.

FWIW 2
If I change my keyboard to either 'Ukrainian - QWERTY' or 'Ukrainian - Legacy' on macOS and type the words in Microsoft Word, then both copying and pasting the text from Word and placing the Word file in Publisher display the text correctly, i.e.,

Instead of an ibyelorussianukrainian-cyrl (і) U+0456 and dieresiscomb U+0308 you have the U+0457 Unicode character ї, and
instead of a Cyrillic small letter i (и) U+0438 and combining breve U+0306 you have the U+0439 Unicode character й.

This to me suggests the issue is one of text input and not an issue with Publisher...
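
For what it's worth, the two-character sequences and the single characters are canonically equivalent in Unicode, so NFC normalization maps one onto the other. A small Python sketch (standard library only; `to_nfc` and the sample words are made up for illustration):

```python
import unicodedata

def to_nfc(text):
    """Compose base letters and combining marks where Unicode defines a single character."""
    return unicodedata.normalize("NFC", text)

decomposed_yi = "\u0456\u0308"       # i + combining dieresis
decomposed_short_i = "\u0438\u0306"  # i (U+0438) + combining breve

print(to_nfc(decomposed_yi) == "\u0457")       # True
print(to_nfc(decomposed_short_i) == "\u0439")  # True

# Two hypothetical spellings of the same word compare unequal until normalized
word_typed = "\u0457\u0457"
word_pasted = decomposed_yi * 2
print(word_typed == word_pasted)                  # False
print(to_nfc(word_typed) == to_nfc(word_pasted))  # True
```

This would also explain why the same word can appear both ways in one document: the two spellings are different character sequences that some applications render identically.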

Sample Word File

Source.docx

Affinity Designer 2.4.2 | Affinity Photo 2.4.2 | Affinity Publisher 2.4.2
Affinity Designer  Beta 2.5.0 (2463) | Affinity Photo Beta 2.5.0 (2463) | Affinity Publisher Beta 2.5.0 (2463)

MacBook Pro M3 Max, 36 GB Unified Memory, macOS Sonoma 14.4.1, Magic Mouse


1 hour ago, kenmcd said:

Which to me means you typed it on your keyboard into Word. 

But instead, you apparently received a defective file. So the source of those odd characters is completely unknown. 

This has been a giant waste of time.

The text was typed in LibreOffice on Linux Mint


8 hours ago, anto said:

The text was typed in LibreOffice on Linux Mint

"was typed" LOL - by whom? The Invisible Man?

This appears to be exactly opposite from my initial understanding/assumptions.

I assumed you, or The Invisible Man, had entered the characters correctly,
and the shaper had broken it.

But it appears that The Invisible Man (TIM) had entered the wrong characters,
and the shapers actually hacked the display to show the intended character.

The yi-cyrl (ї) U+0457 character is in Ukrainian, and not in Russian.

My guess is that TIM is using an old Russian flavored keyboard layout, and has to use some old trick to enter the character.
And the shapers in Word and LibreOffice are aware of this old hack, and they display the correct glyph - but the underlying characters are still wrong.

Unfortunately this has the effect of standardizing the hack,
when the correct fix is to correct the character input.

Windows has three different Russian keyboard layouts available, and also has two Ukrainian keyboard layouts available. And there are apparently some popular 3rd-party Russian keyboard layouts.
I assume there are similar keyboard layout choices available for Linux.

The best fix is for TIM to use a keyboard layout or tool which enables him to enter the correct character codes, not some hack which the shaper has to hack visually.
@Hangman demonstrated one working solution above.

Or you could use find-and-replace to correct the characters.
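
If the text can be run through a script before it goes into APub, the find-and-replace could look something like the Python sketch below. The `fix_decomposed` helper and its replacement table are hypothetical. The table covers only the ї and й pairs mentioned in this thread plus their uppercase forms, on the assumption that those are the only affected letters.

```python
# Two-character sequences and the single characters they stand for
REPLACEMENTS = {
    "\u0456\u0308": "\u0457",  # lowercase yi
    "\u0406\u0308": "\u0407",  # uppercase yi
    "\u0438\u0306": "\u0439",  # lowercase short i
    "\u0418\u0306": "\u0419",  # uppercase short i
}

def fix_decomposed(text):
    """Return (fixed_text, number_of_replacements_made)."""
    total = 0
    for bad, good in REPLACEMENTS.items():
        total += text.count(bad)
        text = text.replace(bad, good)
    return text, total

# Hypothetical sample containing one of each lowercase pair
fixed, count = fix_decomposed("\u0456\u0308\u0438\u0306")
print(count, [f"U+{ord(c):04X}" for c in fixed])
```

An explicit table keeps the change limited to the known problem characters; normalizing the whole text to NFC would be a broader alternative.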

I hope Affinity never adds this hack to their shaper as it just perpetuates this.
Affinity is just correctly showing what is actually there.
What is there should be corrected by the author.
