netera Posted January 5, 2022
I wonder if anyone could solve this. I am laying out a book which has Sanskrit characters with a roman translation underneath. I am copying from a Word file and pasting into Publisher. What's happening is that the characters change when pasted; I attach an example. I have tried formatting the original Word file, making it all Times New Roman 13 pt. I cannot remove all the formatting, as I need some of it. In the past I had trouble pasting characters with diacritic marks: they used to show up as box characters, so I just went over them and replaced them as necessary. This problem is different and I am scratching my head. I am sure someone has already figured this out? Notice how the long 'a' has been substituted with a capital long 'O'. This is consistent throughout the book and it's highly annoying. Any assistance would be gratefully received. Andrew :-)
lacerto Posted January 5, 2022
Can you attach a piece of the Word document (and the Publisher document, if relevant) so that it would be easier to test?
netera (author) Posted January 5, 2022
You mean save out each in its native file format?
netera (author) Posted January 5, 2022
Here are the files. I hope this is what is wanted. When you open the .afpub file it says there are 2 missing fonts; these are then substituted with Sanskrit 2003 and Times New Roman. I have tried a few ways, but when the text is imported into AfPub it still swaps out some letters. This problem appears throughout the book: every batch of text with diacritic marks usually has some errors when placed into Publisher. Any thoughts?
Attachments: Diacritic sample.afpub, Diacritic sample.docx
netera (author) Posted January 5, 2022
Quoting LondonSquirrel: "Follow up: I see that you are using Windows. Times New Roman (the standard version) should work there too."
Yes, Windows, I'm afraid. So I see it's showing (? Times New Roman Baltic) when I paste it into AfPub, but if I highlight the text and change it to Times New Roman it stays the same. Why would that be? See attached. Could it be the way the software is set up? I don't even have TNR Baltic installed, so where is that coming from? Thank you for your assistance, very much appreciated. Andrew
netera (author) Posted January 5, 2022
Yes, I am getting the same. I don't even have Baltic installed; where is that coming from? Very odd. How do I remove it so that pasted text displays as Times New Roman? Is there a setup panel to erase this and add Times New Roman? I have not attached a picture, as I see the same as you in my Text Styles panel, so at least that is good news. Andrew
netera (author) Posted January 5, 2022
I just copied the original text in Word, pasted it into another blank Word file and removed its formatting. It reverted to Calibri, the font I use as standard in Word. I then pasted this into AfPub (into a new document). It kept Calibri as its style, but the Text Styles panel says Calibri Baltic! What on earth is Baltic??? It seems to be added to the style. I'll take a look online. A.
lacerto Posted January 5, 2022
What happens if you use "Paste As" with Unicode? If I try it, the diacritics are retained. (EDIT: Also, Times New Roman (Baltic), which is just a misinterpretation, is gone and the pasted text is just "Times New Roman".)
netera (author) Posted January 5, 2022
Hi there, it's not so much retaining diacritics as substituting the wrong letter: the character should be a long 'a', but instead it's a long capital 'O'. Not sure what you mean?
Quoting lacerto: "(EDIT: Also, Times New Roman (Baltic), which is just a misinterpretation, is gone and the pasted text is just "Times New Roman".)"
How does one edit Baltic out? Looking online, it seems there is a free Times New Roman Baltic font, BUT I don't even have it installed. A.
netera (author) Posted January 5, 2022
It pastes the same; it still says 'Baltic' in the Text Styles panel. The original Word file was typed on a Mac in the US. Is there some weird formatting attached to the text? As I have been flowing the pages, it's really glitchy in AfPub... A.
netera (author) Posted January 5, 2022
Here is a screenshot of the font manager. Does this help? I have selected Times New Roman Regular as the substitute font. A.
netera (author) Posted January 5, 2022
I can download it and try. I'll give that a go, thank you! Damn, this is a real pain in the butt; it's throughout the whole book. Nightmare 😞 A.
lacerto Posted January 5, 2022
Quoting netera: "Not sure what you mean?"
I meant this: [video: paste_as_unicode.mp4]
I am not sure what actually causes this. Pasting as Unicode also seems to be character-based, so it does not support formatting; hence the additional steps with styles. If you use the Word files just as sources for collecting data, I would place them directly in Publisher via File > Place (on the pasteboard, off the page) and then copy and paste the required parts into the final text from there, as that would keep both the formatting and the correct Unicode encoding.
netera (author) Posted January 5, 2022
This would take too long, as there are too many instances, though I'm sure it would work, as I have essentially done that when correcting. The other suggested method, Paste Special, worked, so for now that would appear to be the best move. Thank you so much for your assistance! Very much appreciated. I've been using Publisher for ages and love it to bits, and seldom need to consult the experts, but it's nice to have help as I learn! Andrew 🙂
netera (author) Posted January 5, 2022
Quoting Lagarto: "I meant this: paste_as_unicode.mp4 [...]"
FAB! This works. It solves the problem, so that gets me off the hook! Thank you so much, you're a star! Andrew 🙂
netera (author) Posted January 7, 2022
I seem to have another issue with the same text that I hadn't noticed. I was concentrating on the roman script with diacritics, but it seems the Sanskrit is being changed when I use the Edit > Paste Special > Unicode function. I'll have to go back and change what I've done, as it's all wrong. I should have checked 😞 Any ideas why this text is being changed? In the original Word file it's Arial Unicode MS, as is the pasted text in Publisher. I have no font substitution set up. Thanks in advance, and sorry to bother you again. Andrew
Attachments: Sanskrit.afpub, Sanskrit.docx
lacerto Posted January 7, 2022
I am no expert and know practically nothing about writing in Sanskrit, but when I tested this with InDesign, I first got the same result as in Publisher. When I enabled World Tools (a free InDesign tool, which is needed to lay out e.g. Japanese and Arabic texts) and the World-Ready composer, I got the same result as in Word. It might be that it is not possible to lay out Sanskrit correctly in Publisher (at least with Arial Unicode MS), but I hope someone proves me wrong. [video: sanskrit.mp4]
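Why a layout engine matters here can be illustrated with a minimal Python sketch (an illustration only, not a reproduction of Publisher's behaviour). Devanagari text is stored as sequences of code points that a shaping engine must combine and reorder: "क्ष" (kṣa) is KA + VIRAMA + SSA and should be drawn as one conjunct, and in "कि" (ki) the vowel sign I is stored after KA but drawn to its left. If an application passes the correct code points through but does not fully shape them, the characters are right while the displayed forms look wrong; that this is what happens in Publisher is an assumption consistent with the InDesign test above. The helper name `code_points` is hypothetical.

```python
import unicodedata

def code_points(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# "क्ष" (kṣa): KA + VIRAMA + SSA, which a shaping engine should draw as one conjunct glyph.
ksha = "\u0915\u094D\u0937"
# "कि" (ki): KA + VOWEL SIGN I; stored after KA, but drawn to its left once shaped.
ki = "\u0915\u093F"

for word in (ksha, ki):
    print(word, code_points(word))
```

The point of listing code points rather than glyphs is that the stored text is identical whether or not it is shaped correctly; only the renderer differs.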
netera (author) Posted January 7, 2022
That would be a nightmare; I only have Publisher. I wondered if it was a spell-check issue, and when I go to check spelling it shows the right character in the spell checker. Why would that be? A.
lacerto Posted January 7, 2022
It might be an encoding-related issue (possibly UTF-16 vs. UTF-8), with the system supporting it (as when displaying system controls like dialog boxes) but not the app?
netera (author) Posted January 7, 2022
I'm downloading LibreOffice to see if that yields better results than Word. I also turned off spell check, but it made no difference. Nightmare, haha.
netera (author) Posted January 7, 2022
No, LibreOffice is the same. Weird that you can copy from Word to LibreOffice with no change in formatting. I also tried Affinity Photo and Affinity Designer, just in case I had ticked something weird in Publisher; alas, nope. It looks like I will have to export the Sanskrit as images and place them in the text. I wonder if anyone has posted about this on YouTube?
lacerto Posted January 7, 2022
Quoting netera: "No, LibreOffice is the same. Weird that you can copy from Word to LibreOffice with no change in formatting."
Not weird, if both support the same encoding. I tried copying from Word and LibreOffice Writer (between which Sanskrit text in Arial Unicode MS can be exchanged) to macOS Pages, and it works there too, so this appears to be an encoding (app-related) issue.
netera (author) Posted January 7, 2022
I found a few other posts relating to Devanagari/Sanskrit; these were from 2020, so maybe it has not been addressed yet. OK, it seems I will need to insert images to get around it. A hassle, but there is no other option. Thank you for your assistance! Andrew 🙂
kenmcd Posted January 7, 2022
There is some sort of odd invisible character within the Latin text. DOCX files are ZIP archives containing XML etc., so I opened it and looked directly at the document.xml file. What I usually do to find which Unicode characters are actually there is paste the text into https://r12a.github.io/app-listcharacters/, which does not show the expected ā; it shows a. So after pasting, there is nothing odd remaining.
This is what I see in the document.xml file (in a UTF-8 text editor):
<w:t>jyotiṣapaṭhanam</w:t> (notice no ā)
But when I move my cursor over the text there, the cursor jumps around oddly, so there is some sort of odd character (or characters) within that text. You may be able to fix these with a find/replace in the document.xml file.
When I change the encoding of that text to ANSI-Baltic, this is what I see:
<w:t>jyotiį¹£apaį¹hanam</w:t>
To me this just looks like it does not recognize those two characters (ṣṭ), as expected. So my guess is there are some characters in there with odd encoding. APub is getting that "Baltic" from something it sees in the text. Its guess may be wrong, so it may be something which needs to be looked at.
Update: Where did the original text come from? I am guessing it was pasted from somewhere, not typed into Word.
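The manual check above could be scripted. Below is a minimal Python sketch using only the standard library: it reads word/document.xml from a .docx (a ZIP archive), collects the text of each <w:t> element, and lists every non-ASCII character with its code point and Unicode name (similar in spirit to the r12a list-characters app), so invisible or unexpected characters stand out. The `as_baltic` helper reproduces the "ANSI-Baltic" view by decoding the UTF-8 bytes as Windows-1257; reading "ANSI-Baltic" as code page 1257 is an assumption about what the text editor meant. All function names are hypothetical and are not part of any Affinity or Word tooling.

```python
import unicodedata
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the <w:t> text elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text_runs(path):
    """Yield the text of every <w:t> element in a .docx file's main document part."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for node in root.iter(W_NS + "t"):
        if node.text:
            yield node.text

def list_chars(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    rows = []
    for ch in text:
        if ord(ch) > 0x7F:
            rows.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return rows

def as_baltic(text):
    """Re-read the UTF-8 bytes of text as Windows-1257 (assumed 'ANSI-Baltic')."""
    return text.encode("utf-8").decode("cp1257", errors="replace")
```

For example, `for run in docx_text_runs("Sanskrit.docx"): print(list_chars(run))` would print the non-ASCII characters in each text run of the sample file. For the word in the post, `as_baltic("jyotiṣapaṭhanam")` gives `jyotiį¹£apaį¹` followed by an invisible soft hyphen (U+00AD) and `hanam`, which appears to match the garbled string above once the invisible character is accounted for.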
kenmcd Posted January 7, 2022
Quoting netera: "...it seems the Sanskrit is being changed when I use the Edit > Paste Special > Unicode function. [...] Any ideas why this text is being changed? In the original Word file it's Arial Unicode MS, as is the pasted text in Publisher. I have no font substitution set up."
The Indic script support is not complete. Sometimes it works, sometimes not, which is why they do not claim support. But some users have reported that, oddly, the text comes out correctly when printed, and that does seem to work with your sample doc. It still looks wrong on screen, but it prints correctly. So, a bit of an odd workaround: print to a PDF printer driver, not Export to PDF. Printing to the Microsoft Print to PDF printer worked; the PDF looked correct. Printing to the FinePrint printer and then transferring that to pdfFactory worked; the PDF was fine. Some other PDF printers may also work. Give that a try.