How to normalize a URL?

I am dealing with a situation where I need users to enter various URLs (for example: for their profiles). However, users do not always insert URLs in the https://example.com format. They might insert something like:
https://example.com
example.com
example.com/
example.com/somepage
me@example.com
How can I normalize the URLs to a format that can potentially lead to a web address? I see this behavior in web browsers. We almost always enter crappy things in a web browser's bar and they can distinguish whether that's a search or something that can be turned into a URL.
I have looked in many places, but I can't seem to find any approach to this.
I would prefer a solution written for Node if it's possible. Thank you very much!
Your point is right! I was actually looking for advice on that aspect too. I think our users will mostly enter addresses for websites without a secure connection though.
– Victor
Jul 3 at 19:42
1 Answer
Use Node's URL API, alongside some manual checks.
Example code:
const { URL } = require('url');

let myTestUrl = 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash';

try {
  if (!myTestUrl.startsWith('https://') && !myTestUrl.startsWith('http://')) {
    // The following line assumes that the URL will resolve using https.
    // Ideally, after all checks pass, the URL should be pinged to verify the correct protocol.
    // Better yet, the user should provide the protocol; there are nice UX techniques to address this.
    myTestUrl = `https://${myTestUrl}`;
  }

  // new URL() throws a TypeError if the string cannot be parsed as a URL.
  const normalizedUrl = new URL(myTestUrl);

  // Reject credentials; this also catches inputs like me@example.com.
  if (normalizedUrl.username !== '' || normalizedUrl.password !== '') {
    throw new Error('Username and password not allowed.');
  }

  // Do your thing
} catch (e) {
  console.error('Invalid url provided', e);
}
I have only used http and https in this example, just to give the gist.
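As a rough sketch, the same checks could be wrapped into a reusable function and run against the inputs from the question. The name normalizeUrl, the defaultProtocol parameter, and the "hostname must contain a dot" rule are my own assumptions for illustration, not part of Node's API; adjust them to whatever you decide counts as a URL.

```javascript
const { URL } = require('url');

// Hypothetical helper: returns a normalized href string, or null when the
// input cannot be treated as a plain web address.
function normalizeUrl(input, defaultProtocol = 'https:') {
  const trimmed = String(input).trim();
  if (trimmed === '') return null;

  // Prepend the assumed protocol only when the user did not supply http(s).
  const hasProtocol = /^https?:\/\//i.test(trimmed);
  const candidate = hasProtocol ? trimmed : `${defaultProtocol}//${trimmed}`;

  let url;
  try {
    url = new URL(candidate);
  } catch (e) {
    return null; // not parseable as a URL
  }

  // Inputs such as "me@example.com" parse with a username; reject them.
  if (url.username !== '' || url.password !== '') return null;

  // Assumption: a public host name contains at least one dot.
  // This deliberately rejects things like "localhost" or a bare word.
  if (!url.hostname.includes('.')) return null;

  return url.href;
}

const samples = [
  'https://example.com',
  'example.com',
  'example.com/',
  'example.com/somepage',
  'me@example.com',
];

for (const sample of samples) {
  console.log(sample, '->', normalizeUrl(sample));
}
```

With these rules the first three inputs all come out as https://example.com/, the fourth keeps its path, and me@example.com is rejected because it parses as a username. Pass 'http:' as the second argument if you expect most of your users' sites to lack a secure connection.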
Straight from the docs, a nice visualisation of the API:
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            href                                             │
├──────────┬──┬─────────────────────┬─────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │        host         │           path            │ hash  │
│          │  │                     ├──────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │   hostname   │ port │ pathname │     search     │       │
│          │  │                     │              │      │          ├─┬──────────────┤       │
│          │  │                     │              │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.host.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │   hostname   │ port │          │                │       │
│          │  │          │          ├──────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │        host         │          │                │       │
├──────────┴──┼──────────┴──────────┼─────────────────────┤          │                │       │
│   origin    │                     │       origin        │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴─────────────────────┴──────────┴────────────────┴───────┤
│                                            href                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
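To connect the diagram to actual code: as far as I understand the docs, the names above the URL string (auth, path, query) belong to the legacy url.parse() result, while the names below it are the properties of the WHATWG URL object used in the example. A small sketch that reads those properties for the same sample URL (the variable names parsed and parts are just for illustration):

```javascript
const { URL } = require('url');

const parsed = new URL('https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash');

// Properties from the lower half of the diagram (WHATWG URL object).
const parts = {
  protocol: parsed.protocol, // 'https:'
  username: parsed.username, // 'user'
  password: parsed.password, // 'pass'
  host: parsed.host,         // 'sub.host.com:8080'
  hostname: parsed.hostname, // 'sub.host.com'
  port: parsed.port,         // '8080'
  pathname: parsed.pathname, // '/p/a/t/h'
  search: parsed.search,     // '?query=string'
  hash: parsed.hash,         // '#hash'
  origin: parsed.origin,     // 'https://sub.host.com:8080'
};

console.log(parts);

// Query parameters are read through searchParams rather than a "query" property.
console.log(parsed.searchParams.get('query')); // 'string'
```

For validation, hostname, username and password are probably the most useful ones, since they tell you whether the user typed a site address or something else such as an email address.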
Looks like I was actually overthinking this when it was actually almost nothing more than a RegEx check. Thank you!
– Victor
Jul 3 at 19:43
I don't mean to be pedantic, but in order to approach a problem like this, you must first define rigorously what string patterns you intend to treat as a URL, and what protocol you will assume when it's normalized to a valid URI, whether that's HTTP, HTTPS, etc.
– Patrick Roberts
Jul 2 at 22:11