  1. I got out of that business before InDesign became a "thing" and never used it. Adobe opposed a plugin dependency — yet supported plugin development by providing an open plugin interface? I'm not clear what that means. I can imagine certain users being downright delighted to be able to script home-grown solutions. I can also imagine plenty of users wanting to do their design work without also having to become developers. In which case, the presence of third-party tools — someone else does the heavy lifting development-wise — is a huge plus. As for recurring charge: Did plugin developers for QXP (or other page-composition programs) also go to some kind of subscription model?
  2. Matter of preference, I suppose. I would gladly trade off the minor inconvenience of having to re-type "i.e." if what came with it was the ability to leave the replacement string empty and thus successfully delete the "found" string in a single pass. If that isn't the design, so be it, and the "(.*)" and "\1" approach is effective instead.
  3. Permit me to offer: ugh. Point taken. I mis-spoke a bit. I was thinking about an underlying architecture that permits the development of third-party extensions. I don't think we can reasonably expect a given program's authors to have either the time or inclination to code every possible new feature someone might need. As Quark showed only too well, sometimes their own implementation of a feature was not so good, and along came someone else with a far better approach — and users were willing to pay for the extensions if they provided functionality not available in the program "proper" (or if the principal authors hadn't done a good job of it). Some extensions to QXP turned out to be ridiculous, buggy, and decidedly overpriced. But others were of the "can't live without it" variety. Word's an example. A lot of people loathed and despised its ribbon-bar — especially when Microsoft made it impossible to switch back to classic view. I worked at Microsoft at the time and, trust me, a WHOLE lot of people who worked there absolutely hated that ribbon bar. Outlook's and Excel's ribbon bars weren't so awful. Word's was bloody-awful. But, when the Office group said "jump" the company as a whole pretty much had to respond "Ok — how high?" and there was nothing to be done about it. Then along came certain "MVP" developers with inexpensive add-ons that enabled people to restore classic view when they wanted it. It was a huge "plus" and I bought one of those things the first I heard of it. "Extensibility" is a good thing.
  4. That's interesting. I wonder what would happen if "$" were simply omitted from the regular expression search (in AP, I mean). Something to try later on... I guess it explains more about expressions such as (.*?$). It caught my eye because if I were doing searches and replacements with Perl regular expressions it would be typical for the "$" to appear outside the ")" — or a line boundary would be specified as "\n" and might or might not appear within the parens — depends on the situation. All these different approaches to regular expression use ... it's the kind of thing about which some cynic remarked: The wonderful thing about standards is, you can have as many of 'em as you want.
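For comparison, here is a minimal sketch of how a Perl-style engine treats "$" inside versus outside the parens, and what happens when it is left out. It uses Python's re module, which is an assumption on my part: AP's engine may behave differently, and the sample text is made up.

```python
import re

text = "<t>Chapter One\nBody text here\n<t>Chapter Two"

# "$" inside the group, as in the expression quoted in the thread
inside = re.findall(r"<t>(.*?$)", text, flags=re.MULTILINE)
# "$" outside the group, the more typical Perl habit
outside = re.findall(r"<t>(.*?)$", text, flags=re.MULTILINE)
# No "$" at all: nothing forces the lazy quantifier to expand
no_anchor = re.findall(r"<t>(.*?)", text)

print(inside)
print(outside)
print(no_anchor)
```

Because "$" is a zero-width assertion, the first two forms capture the same text in this engine. Dropping "$" while keeping the lazy "?" leaves the capture empty; a greedy (.*) without "$" would still run to the end of the line, since "." does not match a newline by default.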
  5. >> [walt.farrell] The version you showed, with a captured string and a Replacement of \1 with formatting would work.
     From what I'm reading earlier in the thread, I guess the program is designed to do this when the replacement string is empty:
       • Find a paragraph starting with "<t>" or some other tag.
       • Leave the "<t>" (or whatever it is) in place and assign, to the entire paragraph, the paragraph style named in the replacement instruction.
     I tried this with the Word "clone" program I have here and it does work that way. That's a bit of a surprise. I would expect that if your instruction is "delete the 'found' string" — then it's to be deleted! I'm trying to think of a use case in which I would specify deletion and yet not want the deletion to occur. So far nothing's coming to mind. Well, if it's designed that way, so it goes. And then using (.*) and "\1" makes sense after all.
     In AP's regex parlance, does ^ mean start of paragraph and $ mean end of paragraph?
  6. >> You could also import all chapters as separate documents into the same layout, but is there a specific reason to keep them as independent documents? When I worked at a shop where customers supplied files in multiple formats, we didn't have the luxury of specifying precise file formatting or other preparation. We had to work with the material as-is. It's a perfectly understandable and sensible approach on the authors' parts to prepare books — and especially newsletter articles — as single files. I doubt many of them would have been happy with us if we'd said "We need you to go back and convert everything into a single large file." We wouldn't have been thrilled to have to concatenate all of the material into a single large file either. (I don't remember our ever having done that.) In my opinion the software shouldn't force users to work in a given way and only that way. The developers might find it reasonable, but it doesn't do much for users with varying needs. >> Also, a general note on (lack of) support for tags when importing text files: aren't tag based import filters basically a plain text based feature? They were when I was working with XPress Tags files. (I don't know how it works with InDesign.) At that shop we received a lot of plain text material that needed formatting. Because plain text can be heavily — and very rapidly — manipulated outside a page composition or word-processing program via scripting, it can be (not always "is," but "can be") a highly efficient approach. I'm talking about scores of files updated via script in a few seconds. Perl and other such tools are very quick. >> If so, they are not practical as there is plenty of stuff in Word format that is worth to be imported directly They might sound impractical. But, run a complex script routine on dozens of input files in seconds rather than minutes. It might well change one's opinion about what is and isn't practical. We could never predict how customer data would be supplied. 
If it arrived in some esoteric format that no program supported, we would have to strip it down to plain text and "massage" the text — say, by adding XPress Tags — for import into the page composition program. Those jobs became rather expensive for the customer due to the additional processing time. And a customer-supplied Word file, even if ideal in some respects, might contain any number of problems that all had to be corrected via search/replace or 100% manual formatting. Word alone isn't, of itself, necessarily a saving grace. >> Batch-based regex support with formatting criteria would allow much the same without requiring any scripting skills, and that would be kind of a minor revolutionary thing and I really hope this will happen at some point. It would certainly move the program along, yes. An entire automated formatting routine stored for immediate access and use would be excellent. >> It should be fairly easy to implement, compared to creating a scripting API with full access to document object model. I hope for users' sakes that the underlying architecture makes that feasible. Would it be easy to implement if there aren't any "hooks" for it in the present design? Possibly not. (I'm not doubting that AP has a suitable architecture; I don't know anything about what's underneath the bonnet.) Someone from AP, replying to mail I sent, noted "It's early days yet." I do get that. They're just getting started...
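To give a sense of what a stored, replayable cleanup routine looks like outside the page-composition program, here is a minimal Python sketch (Perl would do the same job). The rule list and the names clean_text and clean_files are hypothetical, invented for illustration; nothing here reflects a feature of AP.

```python
import re
from pathlib import Path

# Hypothetical cleanup rules: (pattern, replacement), applied in order.
RULES = [
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),  # strip trailing spaces/tabs
    (re.compile(r"\n{3,}"), "\n\n"),             # collapse runs of blank lines
    (re.compile(r" {2,}"), " "),                 # collapse runs of spaces
]

def clean_text(text: str) -> str:
    """Apply every rule in RULES to the text, in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

def clean_files(folder: str, suffix: str = ".txt") -> int:
    """Rewrite every matching file in a folder in place; return the count.

    Overwrites the files, so run it on copies of customer material.
    """
    count = 0
    for path in sorted(Path(folder).glob(f"*{suffix}")):
        original = path.read_text(encoding="utf-8")
        path.write_text(clean_text(original), encoding="utf-8")
        count += 1
    return count

sample = "First  line   \n\n\n\nSecond line\t\n"
print(repr(clean_text(sample)))
```

The point of keeping the rules in one list is that the whole routine is saved once and replayed on every incoming file, rather than retyped per job.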
  7. I was talking with a guy who's written extensively about publishing tools. He's a bit miffed about AP's lack of InDesign or QuarkXPress-style tag reading during text import. It might be that AP doesn't plan to try going head-to-head with InDesign in all aspects of the publishing business, such as book pagination. Absent a tagging system in AP, QXP and ID will remain more desirable. For shorter documents not requiring automation — could be a far different story. AP's graphics-related features are very impressive. If a book contains 20 chapters with 6 paragraph styles per chapter plus two kinds of "hard" formatting (italics and boldface), you're looking at 160 separate instances of typing out the regular expressions and manually entering the style assignments into the Replace dialog. That strikes me as a lot of work that screams for automation. In the absence of a batch operations feature or at least the ability to save search/replace instructions for quick recall — doing this correctly requires that you make zero errors when you're typing the instructions out 160 times. Otherwise you end up having to undo a mistake and re-do it. That's why the idea of using Word as an intermediate format appealed to me...in theory at least. Tagged text -> Perl script -> HTML file -> import into Word (with the CSS instructions actually creating the necessary paragraph styles) -> save as .DOCX -> import into AP document. Admittedly it'd be a workaround and not a solution, until a read-tagging-during-import feature is added. As for something like this: Find: <t>(.*?$) Replace: \1, format with Title Could you give yourself much less to type, repetitively, by doing this sort of thing: Find: <t> Replace: (with nothing) and format current paragraph with style named Title? Concerning the other replacements — e.g., Find: <i>(.*)</i> and Replace: \1 (format with italic) — I would suggest non-greedy matches for those rare situations in which a greedy match would give you some grief. 
Thus: Find: <i>(.*?)</i> Replace: \1 (format with italic)
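The kind of grief a greedy match can cause shows up as soon as one paragraph holds two italic spans. A small sketch in Python's re module (assumed here to behave like other Perl-style engines on this point; the sample sentence is made up, and the square brackets merely stand in for "format with italic"):

```python
import re

line = "A <i>first</i> and a <i>second</i> phrase."

# Greedy: (.*) runs to the LAST </i> on the line, swallowing the tags between
greedy = re.sub(r"<i>(.*)</i>", r"[\1]", line)
# Non-greedy: (.*?) stops at the FIRST </i> it can reach
lazy = re.sub(r"<i>(.*?)</i>", r"[\1]", line)

print(greedy)
print(lazy)
```

With the greedy form, everything between the first opening tag and the last closing tag is treated as one span, so the words in between would be italicized too and the inner tags would be left behind.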
  8. >> No, I just mixed the things. So when you specify that a string is to be deleted and you're assigning a paragraph style at the same time, it does delete the text string?
  9. »» you can still import your custom tagged plain text (or even html tags) and then use simply Search and Replace to remove the tags and apply equivalent paragraph and character formatting to text That would become a lot of work if we're talking about a long document with complex formatting — the kind of work I'd hope to avoid. I've done that kind of thing in the past when I had to. It was tedious and time-consuming. In such cases it's as if the computer isn't helping so much as hindering — creating more work for you rather than less. When you can run scripting within a program to help automate such a task, that helps ease some of the pain. »» There might be a bug in Publisher as it currently does NOT remove the tag even if the Replace field is left empty, if the replacement contains a style criterion Sounds like a bug, all right. If you want to be rid of the tags entirely — when formatting during the replacements is not an issue: Assuming the program supports character classes and "non-greedy" matches, this should work (not tested on any real-world document in AP, but I've done this kind of thing many times in the past): Search for: <[^>]+?> That is: find "<", then 1+ of anything that isn't ">", up to — but no further than — the next occurrence of ">" ... and for "replace" use: nothing at all. This would kill ALL of the elements at once — <p>, <p class="xyz">, <h1>, <ul>, et al. Because the character class excludes only ">", the same expression also catches closing tags and those like "<br/>" in a single pass; it would need to become more complicated only if some tags were to be kept.
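Here is that tag-stripping search tried in Python's re module, as a quick sketch. Whether AP's engine accepts the same syntax is an assumption, and the sample markup is invented.

```python
import re

# "<", then one or more characters that are not ">", then ">"
TAG = re.compile(r"<[^>]+?>")

html_text = '<h1>Title</h1><p class="xyz">Some <b>bold</b> text<br/>next line</p>'
stripped = TAG.sub("", html_text)

print(stripped)
```

In this engine, at least, the pattern also matches closing tags and "<br/>", so one pass removes everything. Note that the text on either side of a removed tag is joined with no space, which may call for a follow-up replacement in real material.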
  10. I tried a couple of Word "clone" programs. LibreOffice was buggy...won't take the time to troubleshoot. Uninstalled it. Next was a demo version of an inexpensive program called SoftMaker Office TextMaker. No bugs so far. (I know, I know — give it time...) Next step: Small HTML file containing CSS definitions such as: p.test {} Right — just "{ }". Absent the braces, this doesn't work. The test document contains such text as: <p class="test">something here</p> This does create a paragraph style "test" in the TextMaker file. Don't know yet about character styles. Having already devised tagging schemes for plain text that became XHTML files, I can see that it wouldn't be difficult to do the same again. The scripting (Perl) is a bit tedious, but once it's done it's done and then you have your HTML file. Open it in the Word (clone) program, import the styles of a previously set-up .DOCX file, and this might be workable after all. Well, for self-authored text. Tediously re-tagging someone else's manuscript would be...ugh. How AP treats incoming Word styles is another matter. No clue on that yet. And no clue yet whether AP will accept — or choke on — .docx files created in TextMaker.
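The tagged-text-to-HTML step could be sketched roughly as follows, in Python instead of Perl. The tagging scheme (one tag at the start of each line), the STYLES mapping, and the to_html name are all hypothetical. The empty "{}" rules follow what was observed in TextMaker above; I can't say whether other word processors create styles the same way.

```python
import html
import re

# Hypothetical mapping from plain-text tags to paragraph style names.
STYLES = {"t": "Title", "h": "Heading", "p": "Body"}

# A line is expected to look like "<t>Some text"
LINE = re.compile(r"^<(\w+)>(.*)$")

def to_html(tagged: str) -> str:
    """Turn lines such as '<t>My Title' into <p class="Title">My Title</p>."""
    paragraphs = []
    for raw in tagged.splitlines():
        match = LINE.match(raw.strip())
        if not match or match.group(1) not in STYLES:
            continue  # this sketch skips blank or untagged lines
        style = STYLES[match.group(1)]
        body = html.escape(match.group(2), quote=False)
        paragraphs.append(f'<p class="{style}">{body}</p>')
    # One empty rule per style: braces present, nothing inside them.
    css = "\n".join(f"p.{style} {{}}" for style in STYLES.values())
    return (
        "<html><head><style>\n" + css + "\n</style></head><body>\n"
        + "\n".join(paragraphs) + "\n</body></html>"
    )

sample = "<t>Chapter One\n<p>It was a dark & stormy night."
print(to_html(sample))
```

This handles paragraph-level tags only. Because the body text is escaped, inline tags such as <i> would need their own handling before the escape step.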
  11. Hmm. Hadn't even thought of whether AP's "next style" command would be effective for this purpose. Yes, if that worked during text import it would be handy (assuming that's what you meant).
  12. I've been hunting about on the web and have run across a few more tools, including something called Pandoc, which its author describes as a Swiss Army Knife of conversion tools. It scores the usual 11 on the 1-to-10 geekiness scale and has the usual minimalist reference material. Whether it can convert from, say, Markdown to a .docx file containing named (user-defined) paragraph and character styles, I can't tell yet.
  13. Thanks. Those tools sound extremely useful, all right. Makes me wonder if there's a tool that can take a plain text file containing tagging of some kind and convert it into an MS-Word document, creating named paragraph and character styles as it goes (not just the usual "Heading 1", "Heading 2", etc. styles). That could make the job somewhat less painful. AP supports RTF, eh? I used to try parsing that stuff in scripts. Painful.
  14. Sorry to hear about the no-HTML. These are significant gaps, as it were. The cleanup program sounds intriguing. It might be useful for other purposes, too. What program is it?
  15. Thanks for taking the time to post that. It's an excellent illustration of how critical such a system is for efficient work. It's triggering a memory of once having done a catalogue job in much the same way. The tagging saved a huge amount of time. (I take it from what you're writing that QXP has some life in it yet.) At the shop where I worked we had many problems with a thousand little MS-Word "gotchas." I began to dread receiving customer source material in Word format. Because it contained formatting, working with it was faster than importing plain text. Still, there was always a lot of manual "massaging" afterward — sometimes, line-by-line searches for weird problems. Word would sometimes insert strange zero-width characters — I never did learn what they are — that had to be rooted out. It's a mixed blessing. I hope Affinity Publisher's authors take this kind of thing seriously — sooner rather than later. (Or, again, that there's a plug-in architecture making a third-party tool possible.) Does AP at least accept plain text with HTML tagging — simple stuff like <strong> or even just <b> — as input? (I haven't bought it yet. I probably will. It really does look excellent in so many ways.)