URL Validation or: How I Learned to Stop Worrying and Love the User
There comes a time for every web developer when they have to do some type of input validation. A form isn't a blog post where a user can wax poetic about their love of Yahoo Mail in an email field. Eventually, there need to be word count limits, checks for specific characters, and simple validation techniques that stop the user from sending a junk POST request.
However, what if you need to validate a URL? And to add another layer to the problem, what if you only want the hostname of a URL: no paths, no protocol, just the "dubya dubya dubya dot" (www.) and the .com?
Let's start with knowing if a URL is a URL. A link requires a second-level and a top-level domain (the walmart and the .com in walmart.com, respectively) and a scheme (https://). Without these parts, the link doesn't link to anything and is no different from a line of text.
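As a rough illustration of those parts, the sketch below runs a sample link through JavaScript's built-in URL class. The path and query string are made up for the example, and splitting the hostname on dots is a simplification that would misread multi-part suffixes such as .co.uk.

```typescript
// Illustrative only: walmart.com is the example domain from the text;
// the path and query string are invented for this sketch.
const example = new URL('https://www.walmart.com/cart?item=1');

const parts = {
  scheme: example.protocol,   // "https:"
  hostname: example.hostname, // "www.walmart.com"
  path: example.pathname,     // "/cart"
  query: example.search,      // "?item=1"
};

// Split the hostname into labels. In this simple case the last label is the
// top-level domain and the one before it is the second-level domain.
const labels = parts.hostname.split('.');
const topLevelDomain = labels[labels.length - 1];    // "com"
const secondLevelDomain = labels[labels.length - 2]; // "walmart"
```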
But now that we know the parts of a URL, we reach a fork in our development path. Should the validation restrict the user at the field, or sanitize the user input when the data is sent to the server?
There are merits and deficiencies in either option:
Validation Before Submission
If you restrict the user from submitting an invalid URL, you force them to submit the exact input structure you need, so the server can take the data with little extra work. In this case, the pattern attribute on the input element, combined with a regular expression, allows for some good old-fashioned field validation.
Here's an example of this approach:
<input type="text" pattern="https?://.*">
However, it comes with the downside of restricting the user. It requires the user's input to contain specific parts, and if all you need is a .com, a long regex pattern might be overkill.
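To see what that pattern actually accepts, here is a minimal sketch that mimics the browser's check in TypeScript. The helper name matchesPattern is hypothetical. It assumes the HTML behavior of anchoring the pattern to the whole value and skipping the check for empty values, and it uses the u flag, where current browsers may compile the pattern with a different Unicode flag.

```typescript
// Hypothetical helper: approximates how a browser evaluates the pattern attribute.
const matchesPattern = (value: string, pattern: string): boolean => {
  // An empty value is not treated as a pattern mismatch; the separate
  // "required" attribute is what rejects empty fields.
  if (value === '') return true;
  // The pattern must match the entire value, so wrap it in ^(?: ... )$.
  const anchored = new RegExp(`^(?:${pattern})$`, 'u');
  return anchored.test(value);
};

const urlPattern = 'https?://.*';
const results = [
  matchesPattern('https://walmart.com', urlPattern), // true
  matchesPattern('walmart.com', urlPattern),         // false: no scheme
  matchesPattern('https://', urlPattern),            // true: ".*" allows nothing after the scheme
];
```

The last case shows the trade-off from the other direction: a short pattern is easy on the user but lets through input that is not a usable link.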
Validation After Submission
On the other hand, if you choose to sanitize the data after the user submits it, the user can type anything and the server decides what to do with the data. JavaScript's URL constructor does the validation for you: it throws a TypeError if the input is invalid, and it lets you extract specific parts of the URL, like the origin or hostname.
Here's an example of this approach:
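export const formatWebsiteAfterDomain = (website: string): string => {
  const websiteTrimmed = website.trim();
  if (!websiteTrimmed.length) {
    return '';
  }

  // Check whether the user already typed a scheme such as https://
  const hasProtocol = /:\/\//.test(websiteTrimmed);
  // The URL constructor needs a scheme, so add one temporarily if it is missing
  const updatedWebsite = hasProtocol ? websiteTrimmed : `https://${websiteTrimmed}`;

  try {
    const url = new URL(updatedWebsite);
    // Drop paths and query strings; strip the scheme again if we added it
    return hasProtocol ? url.origin : url.origin.replace('https://', '');
  } catch (_err) {
    // Invalid URL: fall back to the trimmed input as typed
    return websiteTrimmed;
  }
};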
However, because you give the user so much freedom in their input, it requires some compromises in what the server does with the data. If the user enters an invalid URL, what do you do with it? Do you catch the TypeError and notify the user, or do you just allow the server to consume what the user sent? Furthermore, the URL constructor mostly checks that the input parses as an absolute URL with a scheme. It accepts schemes other than https:// or http://, and hosts without a top-level domain such as https://localhost, which may be too little validation for your uses.
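If the constructor alone is too lenient, a few extra checks can sit on top of it. The following is a minimal sketch, not the approach from the example above: the helper name parseHostname, the result type, and the rules (http or https only, at least one dot in the hostname) are all assumptions you would tune to your own case. Note that the dot rule still lets IP addresses through.

```typescript
type HostnameResult =
  | { ok: true; hostname: string }
  | { ok: false; reason: string };

// Hypothetical helper: returns the hostname or a reason the input was rejected.
const parseHostname = (input: string): HostnameResult => {
  const trimmed = input.trim();
  if (trimmed === '') return { ok: false, reason: 'empty input' };

  // Assume https when the user left the scheme off.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : `https://${trimmed}`;

  let url: URL;
  try {
    url = new URL(candidate);
  } catch (_err) {
    return { ok: false, reason: 'not a parseable URL' };
  }

  // Assumed rule: only web links are acceptable.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return { ok: false, reason: 'unsupported scheme' };
  }
  // Assumed rule: require a second-level and a top-level domain.
  if (!url.hostname.includes('.')) {
    return { ok: false, reason: 'hostname has no top-level domain' };
  }
  return { ok: true, hostname: url.hostname };
};
```

Returning a reason instead of throwing leaves the choice open: the caller can show the message to the user or quietly store the raw input.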
In the end, the path taken depends on the specific edge cases of your problem. A combination of both solutions might be the most comprehensive and versatile, or one of the choices might be just enough. The user can put in any input, and your solution will be determined by the amount of freedom you're willing to give the user. However, what remains universal is that the ability of the user to type anything will always force the user and developer to come to some sort of compromise (often the developer gets a specific input pattern and the user gets to use their application).
But since the peculiarities of user input are eternal, there will always be developers frantically pushing out solutions so their web apps don't break when users try to paste images in the URL field of a form.