pdussart

Character issue with File/Open PDF


Dear,

If I open a PDF file with Affinity 249 on Windows 10, in some cases the hyphens (-) disappear in Affinity, while this issue does not exist with Adobe Reader, Microsoft Word, Edge, Scribus, etc.

I think that during the open process, the hyphen gets transformed into some combination with the preceding character.

See the input PDF file, a description of the findings with screenshots, and the .AFPUB files attached.

Regards, Philippe

Affinity issue Open PDF.afpub

Affinity issue Open PDF.pdf

mania201962p04.pdf


The hyphen is transformed into a "tiret conditionnel" (conditional hyphen). Perhaps this is because, when opening a PDF, there is no way to distinguish them from regular hyphens. Since we use a lot of hyphens in French, it would be better to import only regular hyphens and manually delete the ones added by hyphenation.


It appears that the APub PDF import does not recognize the soft-hyphen there.
U+00AD : SOFT HYPHEN [SHY] {discretionary hyphen}
Not sure why the author would use a soft-hyphen there, but that was the Unicode ID I got.
I edited the PDF and copied the characters and pasted them into a Unicode identifier tool.
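For anyone who wants to repeat this check without an online tool, here is a minimal Python sketch that does the same job as a Unicode identifier tool, using only the standard `unicodedata` module. It assumes the text has already been copied out of the PDF; the function name `describe_chars` and the sample string are just for illustration, and depending on the viewer, copying may already change some characters.

```python
import unicodedata

def describe_chars(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' line per character, like a Unicode identifier tool."""
    lines = []
    for ch in text:
        # Control characters have no name in unicodedata, so supply a fallback.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {name}")
    return lines

# Hypothetical sample: "a", a soft hyphen, then "b".
for line in describe_chars("a\u00adb"):
    print(line)
```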

I edited the original PDF and replaced that character with a regular hyphen and then it imported fine.
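The same fix can be sketched at the text level in Python, assuming the text has already been extracted or copied from the PDF (this does not edit the PDF file itself). `harden_soft_hyphens` is a hypothetical helper name. Note the trade-off: it makes every soft hyphen visible, which is only right when, as in this PDF, they stand in for real hyphens.

```python
SOFT_HYPHEN = "\u00ad"   # U+00AD, normally invisible unless a line breaks there
HYPHEN_MINUS = "-"       # U+002D, the ordinary keyboard hyphen

def harden_soft_hyphens(text: str) -> str:
    """Turn every soft hyphen into a visible hyphen-minus.

    Genuine discretionary hyphens inside words would become visible too,
    so this only suits text where soft hyphens replaced real hyphens.
    """
    return text.replace(SOFT_HYPHEN, HYPHEN_MINUS)

print(harden_soft_hyphens("Jean\u00adMarie"))  # Jean-Marie
```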

This does not seem to be an APub error.
The soft-hyphen should only appear if it is needed at the end of a line.
So I do not know how it ended up being the required dash/hyphen between those words in that PDF.

 

@Wosven

You posted while I was testing.

How does a soft-hyphen or conditional hyphen end up in the middle of a sentence in French?
How is it used?
I do not understand.
O.o


@LibreTraining

They don't, unless we need them: we insert some manually to add a hyphenation point that doesn't follow the rules set in the paragraph style (usually for aesthetic reasons). That's the reason to have them in a text: if the text flows differently, they disappear.

Some can also be imported from Word, and they can appear/be visible in ID (why not in APub?), and we need to delete them (I didn't check their Unicode value; I thought they were converted to regular hyphens when exporting as PDF or copying and pasting).

But they shouldn't be used in words where they need to be visible, like compound words: Jean-Marie (J.-M.), sac-à-main…
Another strange point in this PDF is that there are no regular spaces either.


Yes, that is how I would expect soft-hyphens to be used.
So it is the same as I am familiar with in English.
No special use.

But it is weird that a soft-hyphen appears in this compound word.
I would expect to see a non-breaking hyphen, or a regular hyphen, but not a soft-hyphen.

The PDF document info says it was created with Scribus 1.5.4.

How can a soft-hyphen appear in the middle of a sentence?
I'm soooo confused.
:42_confused:


Dear all,

I have read different things about "soft hyphens" etc., and I have no clue what this is. I have used this version of Scribus and the other tools mentioned, without any issue, with my German printers for PDF/X-3 documents: about 20 magazines and four books. I even took the risk of using Publisher 145 for printing a book. It worked fine! (My old PC was on Windows 8.1 at the time; I got a new PC with Windows 10 in January.)

As far as I am concerned, I just use the hyphen that shows on my keyboard. No Alt+ combinations or anything "clever".

Note: Same issue with Affinity Photo and Affinity Designer

I can only conclude that there is a "glitch" somewhere with Affinity tools.

Note 2: I am using common fonts like Arial, Times New Roman, Helvetica and Verdana. Nothing fancy.

Regards, Philippe


The problem is not an Affinity "glitch."

The problem is that Scribus is producing PDFs with incorrect characters.
Any application importing these broken PDFs is going to properly import these wrong characters as written.

The PDFs will print properly, but that does not mean they are structurally correct.
There is no rational reason that I am aware of to justify replacing all spaces with non-breaking spaces.
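To check the spaces in the same way, a small Python sketch can count which kinds of space characters appear in text copied from the PDF. The helper names are hypothetical, and `normalize_spaces` only replaces U+00A0 NO-BREAK SPACE; U+202F NARROW NO-BREAK SPACE is counted but left alone, because French typography legitimately uses it (e.g. before "!" and "?").

```python
from collections import Counter

# Space-like characters worth telling apart in text extracted from a PDF.
SPACE_NAMES = {
    "\u0020": "SPACE",
    "\u00a0": "NO-BREAK SPACE",
    "\u202f": "NARROW NO-BREAK SPACE",
}

def count_spaces(text: str) -> Counter:
    """Count how often each space character listed in SPACE_NAMES occurs."""
    return Counter(SPACE_NAMES[ch] for ch in text if ch in SPACE_NAMES)

def normalize_spaces(text: str) -> str:
    """Replace U+00A0 NO-BREAK SPACE with an ordinary space; U+202F is kept."""
    return text.replace("\u00a0", " ")

sample = "mot\u00a0mot\u00a0mot"
print(count_spaces(sample))      # Counter({'NO-BREAK SPACE': 2})
print(normalize_spaces(sample))  # mot mot mot
```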


Dear All,

I have no way to display control characters of PDF files, so I cannot say whether Scribus writes a PDF file correctly or not (see above)
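Since Acrobat Reader does not show these characters, one workaround is to copy the text out of the PDF and make the invisible characters visible yourself. Below is a minimal Python sketch, loosely modelled on Word's formatting marks; the chosen symbols and the `show_invisibles` name are assumptions, and the copy step itself may already alter some characters depending on the viewer.

```python
# Hypothetical visible stand-ins, loosely modelled on Word's formatting marks.
VISIBLE = {
    "\u0020": "\u00b7",   # space          -> middle dot
    "\u00a0": "\u00b0",   # no-break space -> degree sign
    "\u00ad": "\u00ac",   # soft hyphen    -> not sign
    "\t": "\u2192",       # tab            -> rightwards arrow
    "\n": "\u00b6\n",     # line feed      -> pilcrow, keeping the break itself
}

def show_invisibles(text: str) -> str:
    """Return text with spaces, soft hyphens, tabs and line feeds shown as marks."""
    return "".join(VISIBLE.get(ch, ch) for ch in text)

print(show_invisibles("Jean\u00adMarie\u00a0!\n"))
```

For example, after pasting the copied text into a UTF-8 file (the name `copied.txt` is hypothetical): `print(show_invisibles(open("copied.txt", encoding="utf-8").read()))`.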

But I have another example of conversion issues with the Word -> PDF -> Publisher process.

1) Create a page with Microsoft Word (bought Jan 2019, new PC on Windows 10). Set Word to show all control characters.

2) Save the Word document as PDF (with the option "rasterize if not embedded") or as PDF/A. Both were tested; the results in Affinity look the same.

3) Open the PDF in Publisher 247 with the option "Group Lines in text frame".
- The result is "strange": Publisher creates text frames that do not seem consistent with Word's control characters. Some additional logic seems to be at work.
- E.g. in some cases (after 10 spaces) a new frame is created, regardless of any paragraph end.

In this case, I think Publisher should not create a new frame after a certain number of spaces, but only after a paragraph end.

I would also suggest adding a third option when opening PDF files: "Convert blocks of text" into a "structured" set of frames.
Ideally, the block categories would be: Heading, Core text (in one or more columns), Footer.
This would be a useful option for publishing simple documents such as books or magazines containing "structured" pages of text.

Alternatively, a third option could be: "For each page, put all text in a single text frame, while respecting the original visual position as displayed by Acrobat Reader."

Also, a simple tool to merge text frames would be welcome:
- Ctrl + right-click on the frames to select them, then merge.
- Option 1: keep the X/Y positions of the text as displayed by Acrobat Reader.
- Option 2: wrap the text in the merged frame.

Attached: 2 PDF files + 5 screenshots. Note: check the sequence of the file names.

Regards, Philippe

 

Publisher PDF import Text Frames Example A 1.jpg

Publisher PDF import Text Frames Example A 2.jpg

Publisher PDF import Text Frames Example A.pdf

Publisher PDF import Text Frames example B 1.jpg

Publisher PDF import Text Frames example B 2.jpg

Publisher PDF import Text Frames Example B.pdf

Publisher PDF import Text Frames Example A 0.jpg
