New here — not much luck yet with forum search. If there's a discussion about this, apologies for not having found it.

Back in the Neolithic I used QuarkXPress, which supported a tagging method for text import: Simple codes embedded in plain text were transformed into complex formatting during import. The competition didn't have such a feature at the time. It was among several reasons for QXP's becoming the program of record for book pagination (until InDesign came along).

Even years before microcomputers took over the world, the typesetting systems I used had tagging and translation-table features. Same purpose: Prepare text containing simple codes, then get complex formatting during text import. The machines' CPUs ran at glacial speeds compared with what we have now. But the text-import systems were fast and efficient.

It's orders of magnitude faster than importing plain text into a design/pagination program and then hand-formatting it. Search/replace is not efficient unless a program supports complex search/replace enabling it to find starting and ending tags and formatting text located between those tags. Even at that, having to do it repetitively is tedious and time-consuming. (If search/replace can be controlled via scripting, that certainly helps.)

Manipulating text outside the pagination program is inherently more efficient. It can be done with powerful and fast tools ideal for that purpose (Python, Perl, Ruby, and so forth). 

Affinity Publisher looks like an excellent contender. It too needs this kind of feature. If the company has no such plans for the near future, I hope the program has a plug-in architecture enabling a third party to add this functionality. To anyone importing a lot of text, that kind of automation is worth paying for.


I too have requested tagged text. And if Serif decides to do this, I would prefer QXP's style of tagged text, as it isn't as verbose as ID's style. I prepare tagged text for every book to be laid out in Q (and ID) to this day. It is reason #1 why I won't use APub for books. The tagged text is generated from sources ranging from Word to CMS systems to raw data out of SQL, which I import into Access and generate the tagged text from.

I don't think Serif has committed to plug-ins, but there has been a lot of interest shown by users and at least one plug-in developer (Astute).

Mike


If the underlying architecture doesn't now support plug-ins, I wonder how much difficulty they'll have adding that later. I never used InDesign — I was out of the book-pagination business long before it became "a thing" — and haven't seen its tagging system. Is it ghastly, along the lines of SGML?

My recollection of XPress Tags is a bit hazy. If memory serves, XPress Tags permitted you to specify not just character styles, but named paragraph styles during text import — yes?  I actually worked on a book about QuarkXPress way back then. I suppose I could just go look it all up. :-)

Even rudimentary tagging would be better than nothing.

{pstyle:"some-style-name"}Some text {cstyle:"italics"}with formatting{/cstyle} via tagging.{/pstyle}

I suppose that to someone who's never used systems like that, it looks decidedly user-hungry. But to people accustomed to scripting it's a walk in the park. I did this in creating a simple e-book once. Devised my own system of text codes (far simpler than the above) and then wrote scripts to transform the simple codes into XHTML in which the style names matched what was already set up in the ePub editor. Not worth taking the time for a small job, but absolutely worth taking the time to set up if it's a long document. In the end you save a lot of time and headache...
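For anyone who wants to try the idea outside a layout program, here is a minimal Python sketch of the kind of script described above. It assumes the made-up {pstyle}/{cstyle} syntax from the example line (not any real product's format), and the function name tagged_to_xhtml is invented for illustration. It turns each tagged paragraph into XHTML whose class names match the style names.

```python
import html
import re

# Hypothetical tag syntax copied from the example line in the post.
PSTYLE = re.compile(r'\{pstyle:"([^"]+)"\}(.*?)\{/pstyle\}', re.DOTALL)
CSTYLE = re.compile(r'\{cstyle:"([^"]+)"\}(.*?)\{/cstyle\}', re.DOTALL)


def tagged_to_xhtml(source):
    """Turn {pstyle}/{cstyle} runs into <p class=...>/<span class=...> markup."""
    paragraphs = []
    for name, body in PSTYLE.findall(source):
        # Escape &, < and > in the text first (quotes are left alone so the
        # character-style tags still match afterwards).
        body = html.escape(body, quote=False)
        # Swap each character-style run for a span with a matching class.
        body = CSTYLE.sub(r'<span class="\1">\2</span>', body)
        paragraphs.append(f'<p class="{name}">{body}</p>')
    return "\n".join(paragraphs)


sample = ('{pstyle:"some-style-name"}Some text {cstyle:"italics"}with '
          'formatting{/cstyle} via tagging.{/pstyle}')
print(tagged_to_xhtml(sample))
```

Text outside any {pstyle} pair is silently dropped in this sketch; a real script would want to report it instead.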


I generally do not specify the actual p.style or c.style definitions themselves. But that is certainly possible. Instead I will generate the tagged text like you show. Then once inside of Q, I'll actually modify the styles to suit or import the tagged text into a template with styles already set up to the same names.

Here's an example text block from a tagged text file...

<v13.21><e9>
@01 Date:11
@02 Day:TUE<\c>@03 Title:Nosferatu <@FilmRating>(PG)
@04 Time:7<\a>9pm
@05 Location:Minghella Building
@06 Description:Germany 1922. Directed by F.W. Murnau <\a> with Max Schreck, Greta Schroder, Ruth Landshoff
@07 Length:81 min
@08 Spacing:

The first line is the QXP version + the format (<e9> = UTF8), followed by style names ended by the colon, then the paragraph's text. The <@FilmRating> in this case is how character styles are used. Because the paragraph ends with the c.style, there is no need to use a closing tag or the tag to reset the style back to the p.style (which is simply <$>) presently in use. The <\a> tag is for an en-dash.
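To make the structure described above concrete, here is a small Python sketch that splits a tagged block like the sample into (style name, paragraph text) pairs. It relies only on what the explanation states: a paragraph starts with "@", the style name runs to the colon, and "<@Name>" is a character style rather than a paragraph style. The function name and regular expression are illustrative assumptions, not part of QuarkXPress, and codes such as <\a> and <\c> are left untouched.

```python
import re

# A paragraph tag is "@" + style name + ":". The lookbehind skips "<@Name>",
# which the post describes as a character-style tag.
PARA_TAG = re.compile(r"(?<!<)@([^:<>\n]+):")


def split_paragraphs(tagged):
    """Return a list of (style_name, text) pairs in document order."""
    matches = list(PARA_TAG.finditer(tagged))
    pairs = []
    for i, m in enumerate(matches):
        # A paragraph's text runs until the next paragraph tag (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tagged)
        pairs.append((m.group(1), tagged[m.end():end].strip()))
    return pairs


SAMPLE = r"""<v13.21><e9>
@01 Date:11
@02 Day:TUE<\c>@03 Title:Nosferatu <@FilmRating>(PG)
@04 Time:7<\a>9pm
"""

for style, text in split_paragraphs(SAMPLE):
    print(f"{style!r}: {text!r}")
```

A sketch like this would trip over a literal "@" followed by a colon in running text (an e-mail address in a description, say), so it is a starting point rather than a complete parser.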

Importing into my template results in this section's text frames already formatted, across a few pages of listings.

[Attached image: Capture_000100.png]

I use tagged text on far more than books, like the above's newspaper. Each section is processed in the same way, whether an article, page ads, listings, classifieds, etc. It takes me less than an hour with all the sections to produce their respective tagged text files, minutes to import and about 2 hours to paginate and produce the first draft of the newspaper. Nearly every job/job type uses tagged text.

With a decent Word manuscript, a novel takes me between 2 and 4 hours beginning to end, depending upon images. I couldn't do the same in something that doesn't support tagged text.


Thanks for taking the time to post that. It's an excellent illustration of how critical such a system is for efficient work. It's triggering a memory of once having done a catalogue job in much the same way. The tagging saved a huge amount of time. (I take it from what you're writing that QXP has some life in it yet.)

At the shop where I worked we had many problems with a thousand little MS-Word "gotchas." I began to dread receiving customer source material in Word format. Because it contained formatting, working with it was faster than importing plain text. Still, there was always a lot of manual "massaging" afterward — sometimes, line-by-line searches for weird problems. Word would sometimes insert strange zero-width characters — I never did learn what they are — that had to be rooted out. It's a mixed blessing.

I hope Affinity Publisher's authors take this kind of thing seriously, and sooner rather than later. (Or, again, that there's a plug-in architecture making a third-party tool possible.)

Does AP at least accept plain text with HTML tagging — simple stuff like <strong> or even just <b> — as input? (I haven't bought it yet. I probably will. It really does look excellent in so many ways.)


Do download a trial version to test.

No to HTML formatting. At this time, only plain text and RTF file formats are allowed for importing text.

Yep, Q is alive and well.

If you end up with a layout application that imports tagged text and receive Word files, there is an add on that will both clean the file and export tagged text in minutes that is inexpensive. I use it every day. 

1 minute ago, MikeW said:

If you end up with a layout application that imports tagged text and receive Word files, there is an add on that will both clean the file and export tagged text in minutes that is inexpensive. I use it every day. 

Sorry to hear about the no-HTML. These are significant gaps, as it were.

The cleanup program sounds intriguing. It might be useful for other purposes, too. What program is it?


Thanks. Those tools sound extremely useful, all right. Makes me wonder if there's a tool that can take a plain text file containing tagging of some kind and convert it into MS-Word document, creating named paragraph and character styles as it goes (not just the usual "Heading 1", "Heading 2", etc. styles). That could make the job somewhat less painful.

AP supports RTF, eh? I used to try parsing that stuff in scripts. Painful.

32 minutes ago, MikeA said:

...Makes me wonder if there's a tool that can take a plain text file containing tagging of some kind and convert it into MS-Word document, creating named paragraph and character styles as it goes (not just the usual "Heading 1", "Heading 2", etc. styles). That could make the job somewhat less painful...

That's a "sort of." The answer is yes, but not for c.styles. Italic, bold, and bold italic, yes. From there one can import into Word and do a search/replace with character styles. It can also import simple tables and images.

There is another add-in I use to export from Word as Markdown; it can paste Markdown as well.

http://www.writage.com

Mike


I've been hunting about on the web and have run across a few more tools, including something called Pandoc, which its author describes as a Swiss Army Knife of conversion tools. It scores the usual 11 on the 1-to-10 geekiness scale and has the usual minimalist reference material. Whether it can convert from, say, Markdown to a .docx file containing named (user-defined) paragraph and character styles, I can't tell yet.


Formatting during import is something that needs to happen one way or another. We need style automation in import, whatever the method.

Usually creating the tags is the problematic part. If it is just as time-consuming as styling in the layout app, it is quite unnecessary.

I would think tagging is usually an unneeded extra step, as tagging/styling is something the layout app should be able to do itself. (In Ventura, tagging and styling were much the same concept.) Mapping Word styles to Publisher styles on import would be a good start here.

I understand there are workflows where tagged text gives more control with difficult materials, though.


I mark certain texts with simple old PageMaker styled tags like <Heading 1>, <Body text> etc. saved as autotext shortcuts in Word, and use true formatting for local things like bold and italics, etc. Then I use an InDesign script to remove the tags and have real formatting (already existing as InDesign styles) with the denoted paragraph styles applied to the text. This is useful since this completely ignores Word paragraph formatting (and its complexities) during import. 

As Fixx noted, if tagging is a tedious manual job in Word then it is usually not worth the effort (at least for relatively simple texts). Catalogs and such often come from databases and then it is easy to add complex tagging effectively and in such situations there is no substitute for tag-based formatting in page-layout program as highly complex layouts can be prepared in a snap.

Otherwise InDesign's and Publisher's inherent support for applying complex formatting for coherently structured highlighted catalog-kind of paragraph separated text using the (ideally looped) "Apply P1Style, then Next Styles" commands are quite effective as they work without any kind of tagging.

54 minutes ago, Lagarto said:

Otherwise InDesign's and Publisher's inherent support for applying complex formatting for coherently structured highlighted catalog-kind of paragraph separated text using the (ideally looped) "Apply P1Style, then Next Styles" commands are quite effective as they work without any kind of tagging.

Hmm. Hadn't even thought of whether AP's "next style" command would be effective for this purpose. Yes, if that worked during text import it would be handy (assuming that's what you meant).


I tried a couple of Word "clone" programs. LibreOffice was buggy...won't take the time to troubleshoot. Uninstalled it. Next was a demo version of an inexpensive program called SoftMaker Office TextMaker. No bugs so far. (I know, I know — give it time...) Next step: Small HTML file containing CSS definitions such as:

p.test {}

Right — just "{ }". Absent the braces, this doesn't work. The test document contains such text as:

<p class="test">something here</p>

This does create a paragraph style "test" in the TextMaker file. Don't know yet about character styles. Having already devised tagging schemes for plain text that became XHTML files, I can see that it wouldn't be difficult to do the same again. The scripting (Perl) is a bit tedious, but once it's done it's done and then you have your HTML file. Open it in the Word (clone) program, import the styles of a previously set-up .DOCX file, and this might be workable after all. Well, for self-authored text. Tediously re-tagging someone else's manuscript would be...ugh.
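The HTML route described above could be scripted along these lines. This is only a sketch in Python (the post mentions Perl), and it assumes the behaviour reported for TextMaker: an empty CSS rule such as "p.test {}" plus a matching class attribute is enough to create a named paragraph style. The function name build_html is invented for illustration, and whether any particular word processor accepts the output is untested.

```python
import html


def build_html(paragraphs):
    """Build a small HTML document from (style_name, text) pairs."""
    # One empty rule per style, e.g. "p.test {}"; the post reports that the
    # braces are required for the style to be created on opening the file.
    names = sorted({name for name, _ in paragraphs})
    css = "\n".join(f"p.{name} {{}}" for name in names)
    body = "\n".join(
        f'<p class="{name}">{html.escape(text)}</p>' for name, text in paragraphs
    )
    return (
        "<html>\n<head>\n<style>\n" + css + "\n</style>\n</head>\n"
        "<body>\n" + body + "\n</body>\n</html>\n"
    )


print(build_html([("test", "something here")]))
```

Style names are used as CSS class names here, so they would need to be free of spaces and punctuation.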

How AP treats incoming Word styles is another matter. No clue on that yet. And no clue yet whether AP will accept — or choke on — .docx files created in TextMaker.

18 minutes ago, MikeA said:

Hmm. Hadn't even thought of whether AP's "next style" command would be effective for this purpose. Yes, if that worked during text import it would be handy (assuming that's what you meant).

I mean this: not specifically formatting *during* import but manual formatting. This may or may not be useful, but if you have highly structured repetitive text like catalogs where each field is separated by a paragraph break and empty fields are imported, and records follow one another in a loop, you can format such a document in one go simply by selecting all text and applying "next style" formatting, without using any tags.

[Attached image: nextstyleformatting.jpg]
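The logic behind looping "next style" formatting can be sketched in a few lines of Python. This only imitates the idea outside the layout program; it is not how Publisher or InDesign implement the command, and the style names are hypothetical. Each paragraph gets the next style in a fixed cycle, which is why empty fields must be kept as empty paragraphs.

```python
from itertools import cycle


def assign_styles(paragraphs, style_loop):
    """Pair each paragraph with the next style in a repeating loop."""
    # zip() stops when the paragraphs run out, so the endless cycle is safe.
    return list(zip(cycle(style_loop), paragraphs))


# Two records of three fields each; the empty string is an empty field that
# still occupies its own paragraph, keeping later fields in step.
fields = ["Nosferatu", "", "Minghella Building",
          "Metropolis", "8pm", "Main Hall"]
for style, text in assign_styles(fields, ["Title", "Time", "Location"]):
    print(f"{style}: {text!r}")
```

If an empty field were dropped instead of imported, every later paragraph would shift by one style, which is the main fragility of this approach compared with explicit tags.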


Even without scripting support and automatic conversion of tags, you can still import your custom tagged plain text (or even html tags) and then use simply Search and Replace to remove the tags and apply equivalent paragraph and character formatting to text.

(There might be a bug in Publisher as it currently does NOT remove the tag even if the Replace field is left empty, if the replacement contains a style criterion -- InDesign does remove the searched content but applies replacement formatting for the found spot, so you do not need to do separate search and replace without formatting simply to remove the tags, as you currently need in Publisher). EDIT: Oops, it does not, but my script does as it knows the search criterion to be a tag! So one extra round is needed to remove the tags.

With regex support it is easy to format character styles, too, e.g. search <i>.*</i> replace with italics.

Not too tedious a job even without scripting support, but the point is that you do not need to have any special tools or plugins to be able to format effectively tagged text.

1 hour ago, Lagarto said:

(There might be a bug in Publisher as it currently does NOT remove the tag even if the Replace field is left empty, if the replacement contains a style criterion -- InDesign does remove the searched content but applies replacement formatting for the found spot, so you do not need to do separate search and replace without formatting simply to remove the tags, as you currently need in Publisher). EDIT: Oops, it does not, but my script does as it knows the search criterion to be a tag! So one extra round is needed to remove the tags.

A regex find/replace can do it in one pass:

Find: <i>(.*)</i>

Replace: \1
and specify the formatting, such as Emphasis or a style name.
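The text side of this replacement is easy to try in Python's re module, which is an assumption here since Publisher's regex engine may differ in details. The sketch also shows one caveat worth knowing: with a greedy ".*", two italic runs in the same paragraph are treated as one long match, while the lazy form ".*?" handles each run separately.

```python
import re

line = "A <i>first</i> and a <i>second</i> italic run."

# Greedy: ".*" runs to the LAST "</i>" on the line, so the inner tags survive
# and everything between the outer tags counts as one match.
greedy = re.sub(r"<i>(.*)</i>", r"\1", line)

# Lazy: ".*?" stops at the nearest "</i>", so each run is matched on its own.
lazy = re.sub(r"<i>(.*?)</i>", r"\1", line)

print(greedy)  # A first</i> and a <i>second italic run.
print(lazy)    # A first and a second italic run.
```

In a layout program the greedy version would presumably also apply the italic formatting to the words between the two runs, so the lazy form is the safer default when a paragraph can contain more than one tagged run.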


-- Walt

Windows 10 Home, version 1903 (18362.239), 16GB memory, Intel Core i7-6700K @ 4.00Gz, GeForce GTX 970
Affinity Photo 1.7.2.471 and 1.7.3.476 Beta   / Affinity Designer 1.7.2.471 and 1.7.3.476 Beta  / Affinity Publisher 1.7.2.471 and 1.7.3.475 Beta


Find (<i>)(.*)(</i>)

Replace \2

...I think. It would be nice to be able to save regex expressions. Because I script all recurrent tasks that benefit from parameters, or use saved expressions, I need to check the regex syntax over and over again and never get it right in one shot.

Just now, Lagarto said:

Find (<i>)(.*)(</i>)

Replace \2

...I think. It would be nice to be able to save regex expressions. Because I script all recurrent tasks that benefit from parameters, or use saved expressions, I need to check the regex syntax over and over again and never get it right in one shot.

Walt's means is shorter, easier. The open/closing tags will be removed automatically without being captured. Yours will also, but they don't need capturing in the first place.
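The point about capturing can be checked outside the layout program. In this Python sketch (using Python's re module as a stand-in, which is an assumption about how closely Publisher's engine behaves), both patterns give the same text. The whole match, tags included, is what gets replaced; the numbered group is only the part put back.

```python
import re

text = "Some <i>italic</i> words."

# One group: \1 is the text between the tags.
one_group = re.sub(r"<i>(.*)</i>", r"\1", text)

# Three groups: the tags are captured as \1 and \3 but never used;
# \2 is the text between them.
three_groups = re.sub(r"(<i>)(.*)(</i>)", r"\2", text)

m = re.search(r"<i>(.*)</i>", text)
print(m.group(0))  # <i>italic</i>  (the whole match, which is replaced)
print(m.group(1))  # italic         (the captured part, which is kept)
print(one_group == three_groups)  # True
```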

1 hour ago, MikeW said:

Walt's means is shorter, easier. The open/closing tags will be removed automatically without being captured. Yours will also, but they don't need capturing in the first place.

Sorry, true. I tried it but got something wrong as I ended up removing the actual text, as well.

My brain always hurts with regex, I'd imagine \1 would replace the only thing grouped in parentheses but instead it removes the non-marked excluded part! I hope Serif adds saved regex expressions already in some 1.x update, and also includes some useful things as factory defaults. The saved expressions if possible could also allow inclusions of formatting criteria and indicate it somehow in the title. Better yet, it would be nice to be able to run multiple regex expressions in a batch that can also be saved as a task... That would allow effective formatting of tagged text using existing style definitions.

15 minutes ago, Lagarto said:

I hope Serif adds saved regex expressions already in some 1.x update

I know it is a rather lame workaround, but what you already can do is save your expressions as small text snippets as assets. You could create a separate category just for these. Others reported that they keep a separate *.txt file with their collection of expressions.

This really is just a poor workaround but perhaps helpful.

Apart from that this thread is very educational to me :)

d.


Affinity Designer 1.7.1.404 (beta 1.7.3.476)   |   Affinity Photo 1.7.1.404 (beta 1.7.3.476)   |   Affinity Publisher 1.7.1.404 (beta 1.7.3.475)
Affinity Designer for iPad 1.7.0.7   |   Affinity Photo for iPad 1.6.8.77

Windows 10 (1809) 64-bit - Core i7 - 16GB - Intel HD Graphics 4600 & NVIDIA GeForce GTX 960M
iPad pro 9.7" + Apple Pencil

12 minutes ago, dominik said:

I know it is a rather lame workaround, but what you already can do is save your expressions as small text snippets as assets. You could create a separate category just for these. Others reported that they keep a separate *.txt file with their collection of expressions.

This really is just a poor workaround but perhaps helpful.

Apart from that this thread is very educational to me :)

d.

While I save certain expressions in my Favorites of my text editor, I also save a text file. I do the same with scripts & macros for the text editor. In all cases I save a text snippet in that text file that led to me writing the expression, script or macro as well. These are modified for the task at hand and saved to a new text file. Been doing this for eons.


»» you can still import your custom tagged plain text (or even html tags) and then use simply Search and Replace to remove the tags and apply equivalent paragraph and character formatting to text

That would become a lot of work if we're talking about a long document with complex formatting — the kind of work I'd hope to avoid. I've done that kind of thing in the past when I had to. It was tedious and time-consuming. In such cases it's as if the computer isn't helping so much as hindering — creating more work for you rather than less. When you can run scripting within a program to help automate such a task, that helps ease some of the pain.

»» There might be a bug in Publisher as it currently does NOT remove the tag even if the Replace field is left empty, if the replacement contains a style criterion

Sounds like a bug, all right.

If you want to be rid of the tags entirely (when formatting during the replacements is not an issue): Assuming the program supports character classes and lazy ("non-greedy") matches, this should work (not tested on any real-world document in AP, but I've done this kind of thing many times in the past):

Search for:  

<[^>]+?>

That is: find "<", then 1+ of anything that isn't ">", up to, but no further than, the next occurrence of ">" ... and for "replace" use: nothing at all. This would kill ALL of the elements at once: <p>, <p class="xyz">, <h1>, <ul>, et al. Because "/" is just another character that isn't ">", the same expression should also catch closing tags and those like "<br/>" in the same pass.
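Here is the same expression tried in Python's re module, as a quick check rather than a test of Publisher itself (whose regex engine may differ). The helper name strip_tags is made up for the example. In Python, at least, the pattern removes opening tags, closing tags, and self-closing tags alike.

```python
import re

# "<", then one or more characters that are not ">", then ">".
# The "?" makes the match lazy; because [^>] can never cross a ">",
# the result is the same with or without it.
TAG = re.compile(r"<[^>]+?>")


def strip_tags(text):
    """Delete every <...> element, leaving only the text between them."""
    return TAG.sub("", text)


sample = '<p class="xyz">One<br/>two</p><h1>Head</h1>'
print(strip_tags(sample))  # OnetwoHead
```

Note that removing "<br/>" and "</p>" outright also removes the only separation between "One", "two", and "Head", so in practice some tags would need replacing with a space or paragraph break first.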
 

Edited by MikeA
typo

15 hours ago, MikeA said:

»» There might be a bug in Publisher as it currently does NOT remove the tag even if the Replace field is left empty, if the replacement contains a style criterion

Sounds like a bug, all right.

No, I just mixed things up. It is definitely not a bug; this is how the feature should work when you have formatting criteria in the replacement part. Otherwise it would not be possible to use Find-Replace just for formatting (without touching the actual text content).

15 hours ago, MikeA said:

»» you can still import your custom tagged plain text (or even html tags) and then use simply Search and Replace to remove the tags and apply equivalent paragraph and character formatting to text

That would become a lot of work if we're talking about a long document with complex formatting — the kind of work I'd hope to avoid. I've done that kind of thing in the past when I had to. It was tedious and time-consuming. In such cases it's as if the computer isn't helping so much as hindering — creating more work for you rather than less. When you can run scripting within a program to help automate such a task, that helps ease some of the pain.

Assuming that you already have the styles created in Publisher, scripting makes tag-based formatting a matter of seconds even if you have a complex document. Paragraph-based formatting is fairly simple as you do not need to have ending tags but can have any arbitrary tags just marking the beginning of the paragraph style formatting. If you also tag character styles, simple html-style <i></i> and <b></b> enclosed tags do the job, but if you can create Word documents, it is probably much easier to just use local true formatting (as in the attached example document) and use tags only for paragraph formatting.

E.g., for a magazine kind of publication, it would require about 9 tags and regex rounds, something like below.

Find: <t>(.*?$)
Replace: \1 (format with Title)
Find: <s>(.*?$)
Replace: \1 (format with Sub Title)
Find: <a>(.*?$)
Replace: \1 (format with Authors)
Find: <lead>(.*?$)
Replace: \1 (format with Lead)
Find: <h>(.*?$)
Replace: \1 (format with Sub Heading)
Find: <bodyf>(.*?$)
Replace: \1 (format with Body first)
Find: <body>(.*?$)
Replace: \1 (format with Body text)
Find: <bullet>(.*?$)
Replace: \1 (format with Bullet)
Find: <e>(.*?$)
Replace: \1 (format with End Note)
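The nine rounds above amount to one lookup table, which is roughly what a script or a saved batch would do. The Python sketch below is an illustration only: it runs outside Publisher, cannot apply real formatting, and simply returns (style name, text) pairs. The names TAG_STYLES and apply_paragraph_tags are invented for the example.

```python
import re

# Tag-to-style table mirroring the find/replace rounds listed above.
TAG_STYLES = {
    "t": "Title", "s": "Sub Title", "a": "Authors", "lead": "Lead",
    "h": "Sub Heading", "bodyf": "Body first", "body": "Body text",
    "bullet": "Bullet", "e": "End Note",
}

# A tag at the start of a line, then the rest of that line (lazy, up to "$").
LINE = re.compile(r"^<(\w+)>(.*?)$", re.MULTILINE)


def apply_paragraph_tags(text):
    """Return (style, paragraph) pairs for every line with a known tag."""
    styled = []
    for tag, content in LINE.findall(text):
        style = TAG_STYLES.get(tag)
        if style is not None:  # lines with unknown or no tags are skipped
            styled.append((style, content))
    return styled


article = ("<t>Tagged Text\n<a>Mike\n"
           "<bodyf>First paragraph.\n<body>Second paragraph.")
for style, paragraph in apply_paragraph_tags(article):
    print(f"[{style}] {paragraph}")
```

Keeping the mapping in one table means a new paragraph type costs one extra entry rather than one more manual find/replace round.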

If you have bold and italic text enclosed in <i></i> and <b></b> kinds of tags then you would have to do additionally something like this (as noted by Walt above):

Find: <i>(.*)</i>
Replace: \1 (format with italic)
Find: <b>(.*)</b>
Replace: \1 (format with bold)

Typically you would need to perform this for each article; but let's assume that you have e.g. 20 articles included in the same document, or have any book-size job in one document, then performing this kind of processing is not too bad.

If you have any word processor that allows you to save tags as keyboard shortcuts, it is a quick job to tag most text types. If you have a catalogue-kind of highly structured document, tagging may become tiresome so if it is not possible to include tagging already at stage of exporting data from e.g. a database, it is worth a consideration whether it is possible to just see that data is arranged so that it can be formatted with "apply first style then next styles" command in a loop in one go. That would allow quick formatting of complex information without any tagging. 

If Publisher adds regex saving in future updates, and allows running saved expressions in a batch, this kind of tag-based formatting can become quite effective even without scripting and plugin support.

See the attached example files: 

Article.docx

Article.afpub
