Sanitization vs. Validation (and the importance of both in your forms)

One of the most important things an aspiring web developer can learn how to do is to implement forms. Not only do they facilitate user interaction with website owners, but they have a plethora of other uses. Forms are simply a mechanism for the end user to submit data to the form owner’s server. This data can be anything: email messages, forum posts, blog posts, credit card information, pictures, and so on. And while setting up forms to facilitate these interactions is relatively easy (generally speaking), a lot of newcomers to development don’t know that there’s more to it than setting up the sending and receiving ends of the form.

Sanitization vs. Validation

These two terms are often confused or misused by beginning developers. So what is the difference between sanitization and validation? Let’s start with validation, as it happens before sanitization. Simply put, validation is verifying that the data being submitted conforms to a rule or set of rules you (the developer) set for a particular input field. This could mean something as simple as verifying that a form field wasn’t left blank, or it could mean using a complicated regex pattern to verify that an email address or phone number is correctly formatted.
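To make the two kinds of rule concrete, here is a minimal sketch in JavaScript. The field names (`name`, `phone`), the function names, and the 10-digit phone format are all assumptions made up for illustration; your own forms will need their own rules.

```javascript
// Hypothetical validation rules for illustration only.

// Rule 1: the field must not be left blank (whitespace alone counts as blank).
function isNotBlank(value) {
  return typeof value === "string" && value.trim().length > 0;
}

// Rule 2: the field must look like a 10-digit phone number, for example
// "555-123-4567", "(555) 123-4567", or "5551234567" (assumed format).
function isPhoneNumber(value) {
  return /^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$/.test(value);
}

// Run every rule and collect one message per rule that fails.
function validateContactForm(fields) {
  const errors = [];
  if (!isNotBlank(fields.name)) {
    errors.push("Name is required.");
  }
  if (!isPhoneNumber(fields.phone)) {
    errors.push("Phone number is not in a recognized format.");
  }
  return errors;
}
```

An empty array from `validateContactForm()` means every rule passed; anything else is a list of messages you can show next to the form.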


Now that we have that out of the way, let’s talk about sanitization. Whereas validation requires user input to conform to a certain rule or rules put forth by the developer, sanitization only cares about making sure the data being submitted doesn’t contain code. Let’s say you have an input field in your form with no sanitization. It might look something like this:

This is an incredibly simplified example, but it gets the point across. The form method is set to POST, and the action is set to the same page the form is on. So, when the user submits the form, the page reloads with the $_POST superglobal populated with the data submitted in the input field “blah”. With no sanitization, whatever the user enters in that field is then echoed into the “value” attribute of the input element (a common practice for retaining data upon submission), and this is where the problem lies. A mischievous visitor to your site could easily add some code to change your page however they saw fit.

Using the above example form, take the following user input, for example: "></form><script>alert('All your website are belong to us.');</script>. The user has effectively closed your input element, followed by your form element, and then maliciously inserted their own <script> tag with which they can do whatever they so desire. This is why sanitization is so important. Not only do you want to validate that your user is entering correctly formatted data pertinent to the input fields they’re filling out, but you also want to safeguard your site and its visitors from attacks such as XSS and SQL injection.
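The effect of that input can be reproduced in a few lines of JavaScript. This is only a sketch: `renderInputUnsafely()` is a hypothetical stand-in for a server-side template that pastes the submitted value straight into the “value” attribute, as described above.

```javascript
// Hypothetical stand-in for the server-side template: it builds the input
// element by concatenating the submitted value with no escaping at all.
function renderInputUnsafely(submittedValue) {
  return '<input type="text" name="blah" value="' + submittedValue + '">';
}

// The malicious input from the paragraph above.
const payload = `"></form><script>alert('All your website are belong to us.');</script>`;

const html = renderInputUnsafely(payload);

// The leading `">` in the payload ends the value attribute and the input tag,
// so everything after it is parsed by the browser as markup, not as text.
console.log(html);
```

The printed markup starts with an empty input element, and the rest of the payload sits outside it as live HTML.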

There is so much that could be discussed on the topics of validation, sanitization, internet protocols, and so on that if I were to attempt to cover them all this post would never end. So, considering I’ve filed this under the “For Beginners” category, I suppose I should start wrapping up. Below I’ll give a few examples of validation and sanitization. You’ll notice that validating data takes a lot more effort on your part than sanitizing it. This is because scripting languages have general-purpose sanitization functions built in, whereas validation rules are usually specific to your form and have to be written by you (PHP’s filter_var() covers a few common cases, such as email addresses).

JavaScript email address validation

Here’s a function which thoroughly validates an email address using JavaScript. This is the best validation technique I’ve found thus far for email addresses, although it has a limit worth knowing about. Ordinary addresses of the form user@domain do not change as the web moves from IPv4 to IPv6; only addresses that use an IP address literal in place of a domain name look different under IPv6, and those are rare. So it is still safe to use this function to validate email addresses: it should correctly validate the overwhelming majority of valid ones.
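As a rough illustration of what client-side email validation involves, here is a deliberately simplified sketch. It is not the function described above: `isValidEmail()` is a hypothetical name, and the rules are assumptions chosen for brevity. It rejects some technically valid addresses (quoted local parts, IP address literals in place of a domain) and checks format only, not whether the mailbox exists.

```javascript
// Simplified format check; the name and the exact rules are assumptions.
function isValidEmail(address) {
  // Overall length limit commonly applied to email addresses.
  if (typeof address !== "string" || address.length > 254) {
    return false;
  }

  // Split on the last "@"; there must be at least one character before it.
  const at = address.lastIndexOf("@");
  if (at < 1) {
    return false;
  }
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  if (local.length > 64) {
    return false;
  }

  // Local part: runs of allowed characters separated by single dots,
  // so no leading, trailing, or doubled dots.
  const localOk =
    /^[A-Za-z0-9!#$%&'*+\/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+\/=?^_`{|}~-]+)*$/.test(local);

  // Domain: dot-separated labels of letters, digits, and hyphens, where no
  // label starts or ends with a hyphen, ending in an alphabetic TLD.
  const domainOk =
    /^([A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,}$/.test(domain);

  return localOk && domainOk;
}
```

You would typically call this when the form is submitted and block the submission if it returns false. Because client-side checks can be bypassed, the same check still needs to run on the server.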

PHP email address validation

Below is basically the same function as above, but with an additional level of verification. Since JavaScript in the browser is a client-side scripting language (meaning that it’s executed on the visitor’s computer), you can’t use it to query DNS directly. However, PHP is a server-side scripting language, so with PHP you can connect to remote hosts to verify data. Therefore, we run one additional check at the end, using checkdnsrr(), to verify that the domain actually has DNS records.

Input Sanitization

In contrast to its validation counterparts, the sanitization of data is quite easy.

This is an extremely simple example, and definitely not all-inclusive, but it will safeguard against the majority of nefarious attempts to compromise your site through user-generated data. Implementing this is simple: just wrap the newly created function around the data to be echoed onto the page after the user submits it.
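For readers who want to see the idea in code, here is a minimal sketch of such a wrapper in JavaScript. The name `sanitize()` is hypothetical, and the function only covers data placed into HTML text or a quoted attribute; in PHP, the built-in htmlspecialchars() does a similar job.

```javascript
// Hypothetical helper: replaces the characters that have special meaning in
// HTML with their entity equivalents, so user input is displayed as text
// instead of being parsed as markup.
function sanitize(input) {
  return String(input)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Usage: wrap the function around the value before it goes onto the page.
function renderInputSafely(submittedValue) {
  return '<input type="text" name="blah" value="' + sanitize(submittedValue) + '">';
}
```

With this in place, the malicious input from earlier ends up inside the “value” attribute as harmless text. Note that escaping for HTML output does nothing for data headed into a database query, which needs its own protection.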

Thanks for reading!

If you’re an advanced developer, then the topics covered in this post will almost certainly be old news to you. However, if you’re new to development or if you simply weren’t aware of the need to sanitize and validate user-generated data, then I hope you found this post informative. Keep in mind, there is a LOT more to it than the simple examples I listed above. I just meant this as a starting point for newcomers to grasp the basic concept and hopefully inspire them to research more robust methods on their own.

Josh

About Josh Jones

A self-taught web developer with a passion for all types of geekery. I've written a WordPress Plugin or 10, coded many themes, and in my spare time I'm working on a project to strengthen online security protocols. You can find me on Facebook, Twitter, Google+, LinkedIn, and GitHub, or you can check me out at NaughtNull.
