Hyperlinks not Auto detected


I cut and pasted the entire resources/references section from a PDF with active links into Publisher, but all the links went dead. Publisher should recognize anything starting with http:// or https:// as a link and make it clickable in exported PDFs. Why does this not work? Am I missing something? Is there a workaround?

Example entry:

1. Hartsel, J. A., Boyar, K., Pham, A., Silver, R. J., & Makriyannis, A. (2019). Cannabis in Veterinary Medicine: Cannabinoid Therapies for Animals. In Nutraceuticals in Veterinary Medicine (pp. 121-155). Springer, Cham. https://www.researchgate.net/profile/Kyle_Boyar/publication/333306722_Cannabis_in_Veterinary_Medicine_Cannabinoid_Therapies_for_Animals/links/ 5d0178b0299bf13a385104fb/Cannabis-in-Veterinary-Medicine-Cannabinoid-Therapies-for-Animals.pdf

I guess it does actually work on export to PDF. However, many of the links were broken during the cut-and-paste process, and none of the links took on the industry-standard look of blue, underlined text.

Can this be added as a style, so that all links are styled differently from the body text?

  • Staff

Hi Chrisborgman,

I think it would be a good idea to have a separate setting in text styles just for hyperlinks. I will move this thread to the feature requests section.

Thanks

C

Please tag me using @ in your reply so I can be sure to respond ASAP.

2 hours ago, Callum said:

I think it would be a good idea to have a separate setting in text styles just for hyperlinks. I will move this thread to the feature requests section.

There is already a "Hyperlink" Character Text Style that appears when the user has added a Hyperlink and chosen that style in the Text > Interactive > Insert Hyperlink... dialog.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1

  • Staff
1 hour ago, walt.farrell said:

There is already a "Hyperlink" Character Text Style that appears when the user has added a Hyperlink and chosen that style in the Text > Interactive > Insert Hyperlink... dialog.

I believe that in (dare I say it) PagePlus a separate style wasn't needed for hyperlinks; instead, hyperlink settings could be defined within the text style of the block of text containing the hyperlink. However, I may be wrong, as it has been a while since I used that program. This is the behaviour I was referring to as a good idea.

6 minutes ago, Callum said:

This is the behaviour I was referring to as a good idea.

That does sound interesting. Thanks, Callum.

-- Walt

Just now, fde101 said:

Wouldn't regex style support eliminate the need for this?

Probably, but not easily. Constructing a regular expression to recognize a URL is harder than it seems at first glance, especially if one wants to highlight only the actual URL and to handle the case where it occurs immediately before some punctuation mark that is allowed in URLs but is not part of the URL the user intended (e.g., a URL that appears at the end of a sentence, or before a comma).

-- Walt

1 hour ago, walt.farrell said:

Probably, but not easily. Constructing a regular expression to recognize a URL is harder than it seems at first glance, especially if one wants to highlight only the actual URL and to handle the case where it occurs immediately before some punctuation mark that is allowed in URLs but is not part of the URL the user intended (e.g., a URL that appears at the end of a sentence, or before a comma).

True, but any other automatic detection scheme is likely to have problems with this case as well.

Just now, fde101 said:

True, but any other automatic detection scheme is likely to have problems with this case as well.

Yes, but (for example) I have seen applications that assume that if a URL contains ". " or ", ", it stops before the punctuation mark, while allowing periods or commas that are not followed by a space to be part of the URL. And that seems like a reasonable compromise (though it may still fail in some cases where the URL contains spaces or punctuation that are not encoded).

But doing that in a regular expression is trickier, I think.
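
For illustration only, here is a minimal Python sketch of that compromise done in two steps rather than in one regular expression: grab everything up to the next whitespace, then trim sentence punctuation from the end. The pattern, the `TRAILING` set, and the `find_urls` name are assumptions made for this example; this is not how Publisher or any particular application does it.

```python
import re

# Step 1: a deliberately loose candidate pattern (assumption: http/https only).
CANDIDATE = re.compile(r"https?://\S+")

# Step 2: punctuation that is dropped only when it ends the candidate,
# i.e. when it is directly followed by whitespace or the end of the text.
TRAILING = ".,;:!?"

def find_urls(text):
    """Return URL candidates with trailing sentence punctuation removed."""
    return [m.group(0).rstrip(TRAILING) for m in CANDIDATE.finditer(text)]

sample = "See https://example.com/a.b,c for details, or http://example.org/page."
print(find_urls(sample))
```

Periods and commas inside the URL survive because only the end of each candidate is trimmed. A URL that genuinely ends in one of those characters would be cut short, so the hits would still need a manual check.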

-- Walt

For whatever it probably isn't worth, here is a regex that seems to handle the URLs most likely to be of interest for such a detection algorithm, though it doesn't account for that corner case of punctuation immediately after the URL. You weren't kidding that it is quite tricky... and long if you want to do it correctly!

 

[A-Za-z]([a-zA-Z0-9+\-\.])*:(\/\/(((([A-Za-z0-9\-\_\.!~\*'\(\);:\&=+\$,]|\%[0-9A-Fa-f]{2})*\@)?(([A-Za-z0-9]([A-Za-z0-9\-]*[A-Za-z0-9])?\.)*[A-Za-z]([A-Za-z0-9\-]*[A-Za-z0-9])?\.?|\d+\.\d+\.\d+\.\d+):\d*)?|)(\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*(\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*)*)?|\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*(\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*)*)(\?([;\/?:\@\&=+\$,A-Za-z0-9\-\_\.!~\*'\(\)]|(\%[0-9A-Fa-f]{2}))*)?(\#([;\/?:\@\&=+\$,A-Za-z0-9\-\_\.!~\*'\(\)]|(\%[0-9A-Fa-f]{2}))*)?

 

EDIT: actually, this doesn't handle mailto: URLs correctly either...

Thanks, @fde101.

It might be wise to start it with an explicit list of protocols, too:

(http|https):

which, when it's adjusted to handle mailto, could become

(http|https|mailto):
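
As a rough sketch of how such a prefix could be maintained, the Python below builds the alternation from a plain list so that adding a protocol is a one-word change. The `SCHEMES` list and the `scheme_prefix` helper are hypothetical names used only for this example.

```python
import re

# Illustrative list only; extend it with whatever protocols the document uses.
SCHEMES = ["http", "https", "mailto"]

def scheme_prefix(schemes):
    """Compile a pattern matching any listed scheme followed by a colon."""
    # Escape each name (schemes may contain '+', '-' or '.') and list the
    # longest names first so the alternation is easy to read.
    ordered = sorted(schemes, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    return re.compile(r"\b(?:" + alternatives + r"):")

PREFIX = scheme_prefix(SCHEMES)
found = [m.group(0) for m in PREFIX.finditer("http://a https://b mailto:c ftp://d")]
print(found)
```

With this list, the `ftp://d` entry is skipped, which is exactly the trade-off of an explicit list.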

 

-- Walt

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))\b((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b

 

Though I usually handle emails differently, this was saved in my library.

9 minutes ago, walt.farrell said:

Thanks, @fde101.

It might be wise to start it with an explicit list of protocols, too:


(http|https):

which, when it's adjusted to handle mailto, could become


(http|https|mailto):

 

That excludes ftp, smb, and numerous other protocols which still form valid URLs.

32 minutes ago, fde101 said:

 


[A-Za-z]([a-zA-Z0-9+\-\.])*:(\/\/(((([A-Za-z0-9\-\_\.!~\*'\(\);:\&=+\$,]|\%[0-9A-Fa-f]{2})*\@)?(([A-Za-z0-9]([A-Za-z0-9\-]*[A-Za-z0-9])?\.)*[A-Za-z]([A-Za-z0-9\-]*[A-Za-z0-9])?\.?|\d+\.\d+\.\d+\.\d+):\d*)?|)(\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*(\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*)*)?|\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*(\/(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*(;(\%[0-9A-Fa-f]{2}|[A-Za-z0-9\-\_\.!~\*'\(\):\@&=+\$,])*)*)*)(\?([;\/?:\@\&=+\$,A-Za-z0-9\-\_\.!~\*'\(\)]|(\%[0-9A-Fa-f]{2}))*)?(\#([;\/?:\@\&=+\$,A-Za-z0-9\-\_\.!~\*'\(\)]|(\%[0-9A-Fa-f]{2}))*)?

 

Regular expressions have always seemed to me like a book with seven seals. Now I know why 😉

Impressive!

d.

Affinity Designer 1 & 2   |   Affinity Photo 1 & 2   |   Affinity Publisher 1 & 2
Affinity Designer 2 for iPad   |   Affinity Photo 2 for iPad   |   Affinity Publisher 2 for iPad

Windows 11 64-bit - Core i7 - 16GB - Intel HD Graphics 4600 & NVIDIA GeForce GTX 960M
iPad pro 9.7" + Apple Pencil

27 minutes ago, fde101 said:

That excludes ftp, smb, and numerous other protocols which still form valid URLs.

True. But having an explicit list of the protocols simplifies the processing when trying to match against general text, and should improve performance.

-- Walt

5 minutes ago, walt.farrell said:

True. But having an explicit list of the protocols simplifies the processing when trying to match against general text, and should improve performance.

I just ran the one I posted against a very long list with some buried in the text. Performance is fine, as I suspect fde's is fine too. 

No regex that does this will ever be "set it and forget it." The hits will always need reviewing if the purpose is to turn text into hyperlinks or to include it in an index.

Now, if someone tries to use such a regex in a grep style should that capability ever come to APub, they deserve the performance smack-down...

I'm not at my computer to check, but I would try something like this:

Do a regular expression Find for

(http|https):[^ ]+

That should find and highlight each one. So, do a Find to locate/highlight the first one, then Text > Interactive > Insert Hyperlink....

After inserting the first one, click Find again in the Find and Replace panel, and insert the next one. Repeat. 

-- Walt

I have to add my vote and input to this lack of a Hyperlink import feature.

A year ago I made a request and still nothing has changed. I would have thought in this digital age that it is a key feature needed in Publisher.

I have hundreds of hyperlinks again for my new book project. It is going to be a long and boring cut and paste for a few days to put those links all back on the book pages. Sure was hoping this time around I could avoid that pain. 😒

I have no idea what it takes to code this in but please spare us all this tedious grind...one day soon!

Dan

36 minutes ago, DNA0101 said:

I have to add my vote and input to this lack of a Hyperlink import feature.

A year ago I made a request and still nothing has changed. I would have thought in this digital age that it is a key feature needed in Publisher.

I have hundreds of hyperlinks again for my new book project. It is going to be a long and boring cut and paste for a few days to put those links all back on the book pages. Sure was hoping this time around I could avoid that pain. 😒

What form is your existing document in (PDF, .docx, ???), and what form do your hyperlinks take?

As I mentioned above, if they happen to look like URLs in your text it's easy to find them and insert hyperlinks. Of course, if they don't resemble URLs then it will be harder, as you indicate. In either case, I agree they should be handled automatically.

If they do happen to look like URLs, then as a workaround while waiting for a better implementation:

  1. Do a Regular Expression Find for
    (http|https):[^ ]+

    If you have other protocols besides http and https you can add them.

  2. Click Find.

  3. Having found one, the text is highlighted. If the end of the URL was detected properly, do a Ctrl+C to copy it. If not, adjust the end of the found text, and then copy it.
  4. Then, Text > Interactive > Insert Hyperlink..., change the link type to URL, and Ctrl+V in the URL field. Then click OK, and click the Find button again.
    (You can make this easier by assigning a keyboard shortcut to Text > Interactive > Insert Hyperlink.... I chose Ctrl+Alt+H.)
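
To get a feel for which hits are likely to need the "adjust the end" part of step 3, the pattern can be tried outside Publisher first. The following is only a sketch in Python's `re` module, which may not behave identically to Publisher's regular expression engine. The `SAFER` variant, the sample text, and the `find_hits` helper are assumptions added for the example.

```python
import re

FIND = re.compile(r"(http|https):[^ ]+")   # the pattern from step 1
SAFER = re.compile(r"(http|https):\S+")    # assumption: stop at any whitespace

def find_hits(pattern, text):
    """Return the full text of every match, in document order."""
    return [m.group(0) for m in pattern.finditer(text)]

text = "Source: https://example.com/report.pdf.\nNext line http://example.org/a, end"

for hit in find_hits(FIND, text):
    # Flag hits that end in punctuation or that ran across a line break.
    needs_review = hit[-1] in ".,;:!?)" or "\n" in hit
    print(repr(hit), "-> review" if needs_review else "-> ok")
```

In Python at least, `[^ ]` excludes only the space character, so a match can run across a line break or a tab. `\S+` stops at any whitespace, but the trailing period or comma still has to be trimmed by hand.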

-- Walt

I usually use those in GREP styles in ID:

For emails:
[a-z0-9.-]+@[a-z0-9-]+\.[a-z]{2,5}
or
\S+@\S+\.\S{2,5}

For sites, etc.:
(http|https|www|ftp|feed)\S+
or
(www\.|http:\/\/)?[a-z0-9.-]+\.[a-z]{2,5}
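
A quick way to see where such short patterns over-match or under-match is to run them on a few sample strings. This is a sketch using Python's `re` module rather than InDesign's GREP engine, so the behaviour may differ in details. The sample strings and the `hits` helper are made up for the example.

```python
import re

# The first email pattern and the first site pattern from above.
EMAIL = re.compile(r"[a-z0-9.-]+@[a-z0-9-]+\.[a-z]{2,5}")
SITE = re.compile(r"(http|https|www|ftp|feed)\S+")

def hits(pattern, text):
    """Return the full text of every match (not just the captured group)."""
    return [m.group(0) for m in pattern.finditer(text)]

# A simple address is matched whole.
print(hits(EMAIL, "write to jane.doe@example.org today"))
# A subdomain is cut short: only one dot is allowed after the @.
print(hits(EMAIL, "write to user@mail.example.org today"))
# Ordinary words starting with a listed prefix are matched too.
print(hits(SITE, "see www.example.com/page or send feedback"))
```

In this run the second address is truncated to `user@mail.examp`, and the word "feedback" is picked up as if it were a feed link, so the results still need a visual check.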

But we usually "clean" URLs, so the visible part is reduced to "www.site.ext/page" while the real link can be "https://www.site.ext/page". I had a magazine with simplified, clickable URLs in the text and the complete ones readable in the footnotes, since it was exported both as a web PDF and as a print PDF (with one, readers should be able to click in the text; with the other, to read the footnote and type the address).

But what would be interesting isn't for APub to recognize URLs/emails in the text, but for it to understand, when pasting some data (XML, HTML…), which part is the visible URL and which is the invisible one to be used as the link, and to create it automatically if the option is checked (it should be an option, since we don't always need it).

If we need APub to recognize them in order to apply a different style, that's more a need for GREP styles like in ID, and it would be useful for more than URLs and emails.
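
To make the "visible part versus real link" idea concrete, here is a small sketch of how the two can be separated when the pasted data is HTML. It uses only Python's standard `html.parser` module. The `LinkCollector` class and `extract_links` function are names invented for this example, and nothing here reflects how APub handles pasted data.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (visible text, target URL) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # target of the <a> currently open, if any
        self._text = []     # visible text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

snippet = '<p>Visit <a href="https://www.site.ext/page">www.site.ext/page</a> now.</p>'
print(extract_links(snippet))
```

Each pair holds the text the reader sees and the URL the link should point to, which is the information an import option would need.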

2 hours ago, walt.farrell said:

What form is your existing document in (PDF, .docx, ???), and what form do your hyperlinks take?

As I mentioned above, if they happen to look like URLs in your text it's easy to find them and insert hyperlinks. Of course, if they don't resemble URLs then it will be harder, as you indicate. In either case, I agree they should be handled automatically.

If they do happen to look like URLs, then as a workaround while waiting for a better implementation:

  1. Do a Regular Expression Find for
    
    (http|https):[^ ]+

    If you have other protocols besides http and https you can add them.

  2. Click Find.

  3. Having found one, the text is highlighted. If the end of the URL was detected properly, do a Ctrl+C to copy it. If not, adjust the end of the found text, and then copy it.
  4. Then, Text > Interactive > Insert Hyperlink..., change the link type to URL, and Ctrl+V in the URL field. Then click OK, and click the Find button again.
    (You can make this easier by assigning a keyboard shortcut to Text > Interactive > Insert Hyperlink.... I chose Ctrl+Alt+H.)

Walt thanks for the input.

I used every shortcut and hot key combo I could manage to streamline the process. Including a nice app called Ditto. Still it was (and will be again) a momentous task at 300+ links for my eBook.

The links I use are already embedded in WordPress pages. Yet there is no way to copy them into Publisher, either directly or via a go-between like Word, LibreOffice, or Google Docs. Nothing is going to work. 😕

9 minutes ago, DNA0101 said:

The links I use are already embedded on Wordpress pages. Yet there is no way to copy them into Publisher or via a go between  like Word, Libre Office, Google Docs. Nothing is going to work

I don't have any idea how it might work with Wordpress. However, all the hyperlinks (of any form) that I just used in a .docx file produced by LibreOffice were properly recognized when I Placed the .docx file. So if you can get them into LibreOffice and they are recognized there, they should be recognized by Publisher based on that experiment I just performed.
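
Before Placing a file, it may help to confirm that the .docx actually contains the hyperlinks. The sketch below lists the external link targets of the main document body using only the Python standard library. It assumes the usual (transitional) .docx layout, where link targets are stored in `word/_rels/document.xml.rels`, and it does not look at footnotes or headers. The `docx_hyperlinks` name and the `references.docx` file name are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Relationship part of the main document body (assumed standard .docx layout).
RELS_PART = "word/_rels/document.xml.rels"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)

def docx_hyperlinks(docx):
    """Return hyperlink targets found in the body of a .docx (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        if RELS_PART not in zf.namelist():
            return []
        root = ET.fromstring(zf.read(RELS_PART))
    return [
        rel.get("Target")
        for rel in root.iter(REL_NS + "Relationship")
        if rel.get("Type") == HYPERLINK_TYPE
    ]

# Example call with a hypothetical file name:
# print(docx_hyperlinks("references.docx"))
```

If the list comes back empty, the links were probably lost before the file reached Publisher, for example when the text was pasted into the word processor as plain text.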

-- Walt

I did a simple test: I copied and pasted some text containing two URLs into LibreOffice and exported it as a .docx file.

Next, I placed this file in an existing APub document: the two links are visible in the links panel (I use the French version, so I'm not sure of the exact name of this panel).

Why can't you do the same in a blank document, and then apply the needed text styles to those paragraphs and URLs? It's easy, since there's a small button at the bottom left of the Links panel that you can use to show and select the text of the link. And there's already a character style applied that you can customize.
