
Copying text out of a PDF document when the font has OpenType ligature capability



I have found a fantastic feature in Affinity Publisher. This feature is really top class quality.

I have been experimenting. Here is the latest result.

Suppose that one produces a PDF document (for publication on the web) where the text of the document is displayed using a font that has OpenType ligature capability.

Suppose that one now copies the text from the PDF document and pastes it into WordPad.

The underlying text is displayed. Not in the same font, but that is not the issue. The issue is that the underlying text is displayed, not blanks where some or all of the ligatures appear.

For this to work, though, the ligature glyph itself must not also be mapped to a code point.

This is a great facility. It means that one can publish a PDF document of a poem and have ligatures in the display, yet the poem can be copied from the PDF document and the underlying text pasted into another document, such as in, say, WordPad.

Gold star for that.

Something that I have not yet tested is what happens if one has an alternate glyph for a single letter and one produces a PDF document.

For example, a swash e at the end of a line of the text of a poem.

I made two fonts for the tests. The first one tried three possibilities: the st ligature mapped as regular Unicode, the ct ligature unmapped, and the et ligature mapped into the Private Use Area. When I observed the result, I made the second font with the ct ligature and the et ligature unmapped.

Actually, I have not turned OpenType on in Affinity Publisher. I just started typing using the font and the ligature glyphs appeared automatically. I had intended entering the text and then trying to find the OpenType facility.

Does anyone have any information about the use of alternate glyphs in Affinity Publisher, please?

I need to relearn how to make a font with an alternate swash e glyph where the glyph is unmapped and in an OpenType font. I have such a glyph available but it is not in an OpenType font as an OpenType alternate glyph.

Please find two fonts and two PDF documents attached.

William Overington

Wednesday 26 December 2018

 

gold_star.png

ligatest.otf

ligatst2.otf

ligature_test_affinity.pdf

ligature_test_2_affinity.pdf

Until December 2022, using a Lenovo laptop running Windows 10 in England. From January 2023, using an HP laptop running Windows 11 in England.


I have made good progress.

Please find attached a font and a PDF document.

I have produced both of them myself, respectively using High-Logic FontCreator 8 and Serif Affinity Publisher Beta. The PDF document uses the font.

The PDF document contains a poem that I have written today, written to show five features of the font, namely three ligatures and two stylistic alternates. The stylistic alternates are each for lowercase e.

The ligatures just appeared as I keyed the poem into the computer. I needed to highlight each particular letter e in the text (one at a time) and then use Text > Show Typography and choose the desired alternate glyph to replace the ordinary letter e at that location.

The really great thing about the PDF document is that if one copies the text from it and pastes the text into WordPad, one gets the underlying original text.

Not all desktop publishing programs do that. Some just have a blank for the two letters of the ligature glyph (except sometimes for st, which is a special case) or a blank for the stylistic alternate.

The Serif Affinity team have done really well to provide this facility. This facility makes Affinity Publisher a top class product.

William

Thursday 27 December 2018

 

 

ligatst3.otf

white.pdf


Having had great success with this feature, I tried something yesterday that was, in fact, really pushing the envelope.

Things did not work out totally well, so I thought about it and tried again and got a better result: certainly useful, though not quite perfect.

Bearing in mind the extreme envelope pushing involved I decided to just keep it all to myself.

Yet, thinking about it, I am posting details of what has happened, just in case it might highlight some bug that might be worth fixing.

In the three test fonts thus far posted, there are three ligature glyphs, one for a ct ligature, one for an et ligature, one for an st ligature.

The font Ligature test 4 in ligatst4.otf added two more ligatures to the liga table of the font.

I have for some years, since 2009, been carrying out a research project from time to time on communication through the language barrier.

Since 2016 I have been writing a novel based around some of the ideas and how they may be applied.

This test involves a part of the research that is in the novel yet not in a scientific research document, so the links here are to the novel, but just enough so as to give the necessary background to the experiment.

The novel, which is not at the present time complete, is linked, chapter by chapter, from the following web page. Most of the chapters are not very long, so there is not a lot of reading involved for this topic.

http://www.users.globalnet.co.uk/~ngo/novel.htm

For the present purpose,

please read Chapter 46 from the second section of page 1, and page 2;

the second section of Chapter 50 version 2, just the first page for this purpose;

and the fourth and fifth blue glyphs on page 3 of Chapter 72.

What it comes down to for this test is that there is a sequence !123 that is to be regarded as a ligature that will produce the symbol designed to represent 'Good day.' in a language-independent manner and that there is a sequence !987 that is to be regarded as a ligature that will produce the symbol designed to represent 'Best regards,' in a language-independent manner.
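The substitution described above can be pictured with a small Python sketch. This is only an illustration of the idea of a ligature rule replacing a run of glyphs with one glyph; it is not how Affinity Publisher or any real shaping engine is implemented. The rule table is hand-written for this example, the function name apply_ligatures is made up, and trying longer rules first is a simplifying assumption (a real font applies its rules in the order the font defines them).

```python
from typing import Dict, List, Tuple

# Hand-written rule table for illustration only. The component names are the
# usual PostScript glyph names for the characters; good_day and best_regards
# are the ligature glyph names mentioned later in this post.
LIGATURES: Dict[Tuple[str, ...], str] = {
    ("c", "t"): "c_t",
    ("e", "t"): "e_t",
    ("s", "t"): "s_t",
    ("exclam", "one", "two", "three"): "good_day",
    ("exclam", "nine", "eight", "seven"): "best_regards",
}


def apply_ligatures(glyphs: List[str]) -> List[str]:
    """Replace every run of glyphs that matches a rule with its ligature glyph."""
    # Simplifying assumption: try the longest rules first.
    rules = sorted(LIGATURES.items(), key=lambda item: -len(item[0]))
    result: List[str] = []
    i = 0
    while i < len(glyphs):
        for components, ligature in rules:
            n = len(components)
            if tuple(glyphs[i:i + n]) == components:
                result.append(ligature)
                i += n
                break
        else:
            # No rule starts at this position, so keep the glyph as it is.
            result.append(glyphs[i])
            i += 1
    return result


if __name__ == "__main__":
    run = ["exclam", "one", "two", "three", "space", "b", "e", "s", "t"]
    print(apply_ligatures(run))
```

The point of the sketch is that after substitution the glyph run no longer contains the glyphs for !, 1, 2 and 3, so whatever writes the PDF has to record separately which characters the single ligature glyph stands for if the text is to be copied out again.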

The first test is will Affinity Publisher substitute the symbol for !123 automatically? Yes it does.

The second test is can it be copied out of the PDF into plain text? No, it cannot.

I wondered whether the names I had given the glyphs might have something to do with it. I had named one glyph good_day and the other best_regards, whereas the ct ligature had been named c_t, the et ligature e_t and the st ligature s_t, so maybe the name provided a clue for decoding in some way.

So on to the font Ligature test 4a in ligatst4a.otf, which is as far as I have got at present.

So I looked up the glyph name for the exclamation mark, which is exclam, and renamed the glyphs as follows.

exclam_one_two_three

exclam_nine_eight_seven

The first test is will Affinity Publisher substitute the symbol for !123 automatically? Yes it does.

The second test is can it be copied out of the PDF into plain text? Yes, it can, but there seems to be an issue of missing out one or more space characters near the glyph that is decoded.

So it seems to be trying to work but it is not quite right.

By the way, the glyphs shown displayed in Chapter 72 were done using a font where the special glyphs are in an ordinary TrueType font and mapped into the Private Use Area.

If any reader wants to have a look at some more glyphs that have been produced as part of the project, Chapter 5 and Chapter 42 have a number shown. Chapter 34, whilst not showing any glyphs as such might give an insight into the ideas of the project.

William

 

 

ligatst4.otf

ligatst4a.otf

Edited by William Overington
Adding attachments


9 minutes ago, William Overington said:

The first test is will Affinity Publisher substitute the symbol for !123 automatically? Yes it does.

The second test is can it be copied out of the PDF into plain text? No, it cannot.

What were your export settings, William? Does the PDF file include an embedded subset of your font?

After you copied the symbol from the PDF, where did you try to paste it?

Alfred
Affinity Designer/Photo/Publisher 2 for Windows • Windows 10 Home/Pro
Affinity Designer/Photo/Publisher 2 for iPad • iPadOS 17.4.1 (iPad 7th gen)


Hello Alfred

> What were your export settings, William?

PDF for web.

> Does the PDF file include an embedded subset of your font?

Yes. I remember seeing a setting for that the other day with another project, but it is the default and I did not touch it.

> After you copied the symbol from the PDF, where did you try to paste it?

WordPad

Since posting I have been producing some carefully composed source files and PDF documents that show the issue. Those I did yesterday were just rough and I had lost them anyway.

Here are the attachments now, in chronological order.

The second .afpub file is just a Save As from the first one and then a change of font.

Please note how the spaces around the first glyph and the space in front of the second glyph do not get through to WordPad.
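For anyone wanting to pin down exactly which characters go missing, one possible approach is to compare the text as typed with the text as pasted. The following is a minimal Python sketch using the standard difflib module. The function name dropped_characters and the two sample strings are invented for illustration; they are not the contents of the attached test files.

```python
import difflib
from typing import List, Tuple


def dropped_characters(expected: str, pasted: str) -> List[Tuple[int, str]]:
    """Return (position in expected, text) for every run missing from pasted."""
    matcher = difflib.SequenceMatcher(None, expected, pasted, autojunk=False)
    dropped: List[Tuple[int, str]] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean that expected[i1:i2] did not
        # survive unchanged into the pasted text.
        if tag in ("delete", "replace"):
            dropped.append((i1, expected[i1:i2]))
    return dropped


if __name__ == "__main__":
    # Hypothetical strings, not taken from test004a.pdf.
    typed = "Hello !123 friend"
    pasted = "Hello!123friend"
    for position, text in dropped_characters(typed, pasted):
        print(position, repr(text))
```

With real data, the first argument would be the text as entered in the .afpub file and the second the text as it arrives in WordPad.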

William

 

 

 

 

test004.afpub

test004.pdf

test004a.afpub

test004a.pdf


Just in case it was a more general issue if the ligature has a space in front of it or after it, I produced the following .afpub files and PDF documents. The copying from the PDFs to WordPad works fine. So it is not general.

 

test002a.afpub

test002a.pdf

test004b.afpub

test004b.pdf


Link to comment
Share on other sites

BTW, this works properly for applications using the Adobe PDF engine. I tried a couple other applications that, like Affinity applications, use a different PDF engine and all failed. As such, I don't know if it is a failing of these other PDF engines as used or if they are incapable of using the decomposed code points for non-standard ligatures.

Link to comment
Share on other sites

Thank you.

Well, et is not a standard ligature, yet that has worked fine.

> BTW, this works properly for applications using the Adobe PDF engine.

Which particular test, with which font, does "this" in the above line refer please?

William

 


15 minutes ago, William Overington said:

Thank you.

Well, et is not a standard ligature, yet that has worked fine.

BTW, this works properly for applications using the Adobe PDF engine.

Which particular test, with which font, does "this" in the above line refer please?

William

But the et ligature is one that APub's PDF engine understands. And likely Dave Harris can make Affinity applications understand the sequence of
exclam one two three (etc.)
But it isn't this way currently.

I made one of my fonts to use a ligature named good_day. The glyph is a simple pair of rectangles with inner ovals, and I used the word start that uses a discretionary lig for the st combination. I also included the same string of text below the text using the dlig feature. In APub and the resulting PDF, it appears as such:

capture-002401.png.76c556b28007fd47c00e484d9d120732.png

If I look into a PDF using the above sequence (the bold words), then I see this in the PDF:

capture-002400.png.9e8c7de85e15c982de4135a2947969bd.png

As you can see, the st lig has the decomposed Unicode code points underlying it. But for the good_day lig, the underlying code point is U+FFFD, the Unicode code point used for any glyph that cannot be determined. From Wikipedia:

  • Quote

    U+FFFD  REPLACEMENT CHARACTER used to replace an unknown, unrecognized or unrepresentable character

If I use the same text in InDesign, though, the Unicode code point for the good_day lig is represented properly in the PDF:

capture-002402.png.ee596bdd94ab4f8e984159062ea5bf51.png

Because of this decomp in the PDF, the text from the PDF pastes as such into WordPad:

capture-002403.png.f5d27034edd5e40f5f88459669cd6983.png
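A quick way to check pasted text for glyphs that could not be decoded is to look for U+FFFD in it. Here is a minimal Python sketch, assuming the pasted text is available as a Python string; the function name find_undecoded and the sample string are made up for illustration.

```python
from typing import List

REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def find_undecoded(text: str) -> List[int]:
    """Return the index of every U+FFFD in the pasted text."""
    return [i for i, ch in enumerate(text) if ch == REPLACEMENT_CHARACTER]


if __name__ == "__main__":
    # Hypothetical pasted text: one glyph came through as U+FFFD.
    sample = "start \ufffd start"
    print(find_undecoded(sample))
```

An empty list means every glyph in the copied text had some underlying code point recorded for it; it does not prove that the code points are the intended ones.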

Mike

 

 


If the glyph for 'Good day.', which is accessed by !123 has a PostScript name within the font of exclam_one_two_three then the !123 can be pasted from the PDF to WordPad, but if the glyph for 'Good day.', which is accessed by !123 has a PostScript name within the font of good_day then the !123 cannot be pasted from the PDF to WordPad. It is as if the correct code points for !123 are decoded from the glyph name, which may or may not be the case.

However, the spaces around the !123 do not get pasted. I am wondering if this is because the width of the glyph for 'Good day.' is a lot wider than the combined width of the !123 characters upon which a glyph substitution takes place.

William

 


So I made a test font, Ligature test 5 in ligatst5.otf, in which I changed the glyph name for the 'Good day.' glyph to asterisk_a_h_h and changed the OpenType code accordingly. That is, the glyph name has nothing to do with the !123 sequence but is nevertheless made up of standard PostScript glyph names used in fonts.

The line of text in the OpenType code is as follows.

  sub exclam one two three -> asterisk_a_h_h;

I installed the font and made a copy of test004a.afpub as test005.afpub and simply formatted the text with the Ligature test 5 font.

The PDF document test005.pdf displayed the two special glyphs well.

Copying from the PDF and pasting to WordPad gave *ahh for the plain text version.
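The results reported in this thread (exclam_one_two_three pasting as !123, good_day not pasting at all, and asterisk_a_h_h pasting as *ahh) are consistent with the glyph name being decoded along the lines of the Adobe Glyph List Specification: drop anything after the first full stop, split the rest at underscores, and look up each component as a standard glyph name. Whether Affinity Publisher's PDF export actually works this way is not confirmed anywhere in this thread. The Python sketch below only illustrates that convention; the lookup table is a tiny hand-picked subset of the real glyph list, and the function names are made up.

```python
import re

# Tiny hand-picked subset of the Adobe Glyph List, enough for the names
# discussed in this thread. The real list has thousands of entries.
AGL_SUBSET = {
    "exclam": "!", "asterisk": "*", "space": " ",
    "one": "1", "two": "2", "three": "3",
    "seven": "7", "eight": "8", "nine": "9",
    "a": "a", "c": "c", "e": "e", "h": "h", "s": "s", "t": "t",
}


def _is_scalar(value: int) -> bool:
    """True for code points that are valid and are not surrogates."""
    return value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF


def component_to_text(component: str) -> str:
    """Map one underscore-separated component of a glyph name to text."""
    if component in AGL_SUBSET:
        return AGL_SUBSET[component]
    # "uni" followed by one or more groups of four uppercase hex digits.
    match = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", component)
    if match:
        digits = match.group(1)
        values = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(v) for v in values):
            return "".join(chr(v) for v in values)
        return ""
    # "u" followed by four to six uppercase hex digits.
    match = re.fullmatch(r"u([0-9A-F]{4,6})", component)
    if match and _is_scalar(int(match.group(1), 16)):
        return chr(int(match.group(1), 16))
    # Unknown component: nothing can be recovered from the name.
    return ""


def glyph_name_to_text(name: str) -> str:
    """Decode a glyph name such as exclam_one_two_three into text."""
    base = name.split(".", 1)[0]  # drop a suffix such as ".swash"
    return "".join(component_to_text(part) for part in base.split("_"))


if __name__ == "__main__":
    for glyph in ("exclam_one_two_three", "good_day", "asterisk_a_h_h", "s_t"):
        print(glyph, "->", repr(glyph_name_to_text(glyph)))
```

Under this convention good_day decodes to nothing, because neither good nor day is a standard glyph name, which would fit a PDF writer falling back to U+FFFD for that glyph.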

William

 

ligatst5.otf

test005.pdf


I have now added a transcript of my post of 27 December 2018 to my webspace. The font and the PDF are also available.

The transcript is near the end of the following web page.

http://www.users.globalnet.co.uk/~ngo/library.htm

William

 


  • 2 years later...

I'm not sure what you're expecting from "bringing it back to the top".

It's in the Feedback section, which typically the Serif staff uses for planning but will not comment in. As Feedback, it doesn't need to be "at the top" because its placement does not affect how it's used by Serif.

If you think you've found a bug of some kind, and are looking for a fix, this thread is in the wrong forum.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1


3 hours ago, walt.farrell said:

I'm not sure what you're expecting from "bringing it back to the top".

It's in the Feedback section, which typically the Serif staff uses for planning but will not comment in. As Feedback, it doesn't need to be "at the top" because its placement does not affect how it's used by Serif.

If you think you've found a bug of some kind, and are looking for a fix, this thread is in the wrong forum.

In the thread

https://forum.affinity.serif.com/index.php?/topic/138654-artwork-for-greetings-cards/

near the end of page 5 there is a link to a thread in the High-Logic forum and in that High-Logic thread is a link to this thread.

I brought it back to the top in case it might be of interest to some users of Affinity Publisher.

I have it in mind to have a look sometime at how the most recent version of Affinity Publisher reacts to this and maybe to have a go at making a font that has more capability with a view to making that font available as a free download.

So maybe the thread is now not in the best place for users of Affinity Publisher to see it.

William

 


17 minutes ago, William Overington said:

In the thread

https://forum.affinity.serif.com/index.php?/topic/138654-artwork-for-greetings-cards/

near the end of page 5 there is a link to a thread in the High-Logic forum and in that High-Logic thread is a link to this thread.

In the top right-hand corner of each forum post there is an ellipsis. Click on it to display a popup menu listing the available options.

199D1DDC-DBBD-42F4-9007-F4ABA10A814B.jpeg.9f2acd5b225abb2a53b4e25f9eea7bd2.jpeg

If you click on ‘Share’ you’ll get a further popup with a ‘Link to post’ box containing the URL of the post in question, and if you copy that URL and paste it into another post it will give you (by default) something like this:

You will also be presented with an option to change the way it’s displayed:

Quote

Your link has been automatically embedded.   Display as a link instead

 

Alfred
