How to normalize a URL?




I need users to enter various URLs (for example, for their profiles). However, users do not always enter URLs in the https://example.com format. They might enter something like:


https://example.com


example.com


example.com/


example.com/somepage


me@example.com



How can I normalize the URLs to a format that can potentially lead to a web address? I see this behavior in web browsers. We almost always enter crappy things in a web browser's bar and they can distinguish whether that's a search or something that can be turned into a URL.



I have looked in many places, but I can't find any approach to this.



I would prefer a solution written for Node if it's possible. Thank you very much!





I don't mean to be pedantic, but in order to approach a problem like this, you must first define rigorously what string patterns you intend to treat as a URL, and what protocol you will assume when it's normalized to a valid URI, whether that's HTTP, HTTPS, etc.
– Patrick Roberts
Jul 2 at 22:11





Your point is right! I was actually looking for advice on that aspect too. I think our users will mostly enter addresses for websites without a secure connection though.
– Victor
Jul 3 at 19:42





1 Answer



Use Node's URL API, alongside some manual checks.



Example code:


const { URL } = require('url');
let myTestUrl = 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash';

try {
  if (!myTestUrl.startsWith('https://') && !myTestUrl.startsWith('http://')) {
    // The following line is based on the assumption that the URL will resolve using https.
    // Ideally, after all checks pass, the URL should be pinged to verify the correct protocol.
    // Better yet, the protocol should be provided by the user - there are nice UX techniques to address this.
    myTestUrl = `https://${myTestUrl}`;
  }

  // The constructor throws a TypeError if the string cannot be parsed as a URL.
  const normalizedUrl = new URL(myTestUrl);

  // Reject URLs that embed credentials (user:pass@host).
  if (normalizedUrl.username !== '' || normalizedUrl.password !== '') {
    throw new Error('Username and password not allowed.');
  }

  // Do your thing
} catch (e) {
  console.error('Invalid url provided', e);
}
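To connect this back to the inputs listed in the question, here is a minimal sketch that wraps the same steps in a function. The name `normalizeUrl` and the choice to return `null` for rejected input are assumptions made for illustration, as is defaulting to https when no scheme is given.

```javascript
const { URL } = require('url');

// Hypothetical helper: returns the normalized href, or null if the input is rejected.
function normalizeUrl(input) {
  let candidate = String(input).trim();

  if (!/^https?:\/\//i.test(candidate)) {
    // Assumption: input without a scheme is meant to be reached over https.
    candidate = `https://${candidate}`;
  }

  try {
    const url = new URL(candidate);
    // "me@example.com" becomes "https://me@example.com", which parses with
    // username "me", so it is rejected here.
    if (url.username !== '' || url.password !== '') {
      return null;
    }
    return url.href;
  } catch (e) {
    return null; // not parseable as a URL
  }
}

const samples = [
  'https://example.com',
  'example.com',
  'example.com/',
  'example.com/somepage',
  'me@example.com',
];

for (const sample of samples) {
  console.log(sample, '->', normalizeUrl(sample));
}
```

The first three samples all come out as `https://example.com/`, the fourth as `https://example.com/somepage`, and the email-like input is rejected.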



I have only handled http and https in this example, to give the gist.
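If other schemes can show up in the input, one possible refinement is to check the parsed protocol against an allowlist instead of relying on the prefix test alone. With the prefix test, an input such as ftp://example.com gets https:// prepended and may then be parsed as some other host rather than being rejected. The sketch below is an assumption about how that could be handled; `ALLOWED_PROTOCOLS`, `hasExplicitScheme` and `parseWebUrl` are hypothetical names.

```javascript
const { URL } = require('url');

// Assumption: only these protocols are acceptable for a profile link.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Hypothetical helper: true if the input names a scheme explicitly ("xyz://...").
function hasExplicitScheme(input) {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
}

// Hypothetical helper: returns a URL object, or null if the input is rejected.
function parseWebUrl(input) {
  const trimmed = String(input).trim();
  // Only prepend https:// when the user gave no scheme at all.
  const candidate = hasExplicitScheme(trimmed) ? trimmed : `https://${trimmed}`;

  let url;
  try {
    url = new URL(candidate);
  } catch (e) {
    return null; // not parseable as a URL
  }

  // url.protocol is lowercased by the parser and includes the trailing colon.
  return ALLOWED_PROTOCOLS.has(url.protocol) ? url : null;
}

console.log(parseWebUrl('ftp://example.com'));       // null
console.log(parseWebUrl('example.com').href);        // 'https://example.com/'
console.log(parseWebUrl('HTTP://Example.COM').href); // 'http://example.com/'
```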




Straight from the docs, a nice visualisation of the API:


┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            href                                             │
├──────────┬──┬─────────────────────┬─────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │        host         │           path            │ hash  │
│          │  │                     ├──────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │   hostname   │ port │ pathname │     search     │       │
│          │  │                     │              │      │          ├─┬──────────────┤       │
│          │  │                     │              │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.host.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │   hostname   │ port │          │                │       │
│          │  │          │          ├──────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │        host         │          │                │       │
├──────────┴──┼──────────┴──────────┼─────────────────────┤          │                │       │
│   origin    │                     │       origin        │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴─────────────────────┴──────────┴────────────────┴───────┤
│                                            href                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
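As a quick way to read the diagram, the sketch below parses the same sample URL and prints the properties exposed by the URL class. It is only an illustration; the variable name `url` is arbitrary.

```javascript
const { URL } = require('url');

const url = new URL('https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash');

console.log(url.protocol); // 'https:'
console.log(url.username); // 'user'
console.log(url.password); // 'pass'
console.log(url.host);     // 'sub.host.com:8080'
console.log(url.hostname); // 'sub.host.com'
console.log(url.port);     // '8080'
console.log(url.pathname); // '/p/a/t/h'
console.log(url.search);   // '?query=string'
console.log(url.hash);     // '#hash'
console.log(url.origin);   // 'https://sub.host.com:8080'

// The default port for a protocol is dropped during parsing.
console.log(new URL('https://example.com:443').port); // ''
```

The `username` and `password` properties are the ones the example code checks in order to reject input such as me@example.com.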





Looks like I was actually overthinking this when it was actually almost nothing more than a RegEx check. Thank you!
– Victor
Jul 3 at 19:43






