One of the most important things an aspiring web developer can learn is how to implement forms. Not only do they let users interact with website owners, but they have a plethora of other uses. Forms are simply a mechanism for the end user to submit data to the form owner’s server. This data can be almost anything: email messages, forum posts, blog posts, credit card information, pictures, and so on. And while setting up forms to handle these interactions is relatively easy (generally speaking), a lot of newcomers to development don’t realize that there’s more to it than setting up the transmission and receiving points of the form.
Sanitization vs. Validation
These two terms are often confused and/or misused by beginning developers. So what is the difference between sanitization and validation? Well, let’s start with validation, as it usually happens before sanitization. Simply put, validation is verifying that the data being submitted conforms to a rule or set of rules you (the developer) set for a particular input field. This could mean something as simple as verifying that a form field wasn’t left blank, or it could mean using a complicated regex pattern to verify that an email address or phone number is correctly formatted.
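To make that concrete, here is a minimal JavaScript sketch of two such rules: a required-field check and a pattern check. The function names and the ten-digit phone pattern are assumptions chosen purely for illustration; your own fields will need their own rules.

```javascript
// Hypothetical helpers for illustration; the names and rules are assumptions.

// Rule 1: the field must not be left blank (whitespace alone counts as blank).
function isFilledIn(value) {
    return typeof value === "string" && value.trim().length > 0;
}

// Rule 2: the field must look like a 10-digit phone number, with the groups
// optionally separated by a space, dot, or hyphen (e.g. 555-123-4567).
function looksLikePhoneNumber(value) {
    return /^\d{3}[\s.-]?\d{3}[\s.-]?\d{4}$/.test(value.trim());
}

// Collect human-readable problems; an empty list means the input passed.
function validatePhoneField(value) {
    var errors = [];
    if (!isFilledIn(value)) {
        errors.push("Phone number is required.");
    } else if (!looksLikePhoneNumber(value)) {
        errors.push("Phone number must contain 10 digits.");
    }
    return errors;
}

console.log(validatePhoneField(""));             // one error: the field is blank
console.log(validatePhoneField("555-123-4567")); // no errors
```

Returning a list of messages rather than a single true/false makes it easy to show the user exactly which rule their input broke.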
Now that we have that out of the way, let’s talk about sanitization. Whereas validation requires user input to conform to a certain rule or rules put forth by the developer, sanitization only cares about making sure the data being submitted doesn’t contain code. Let’s say you have an input field in your form with no sanitization. It might look something like this:
```php
<form method="post" action="">
    <input type="text" name="blah" value="<?php if(isset($_POST['blah'])) {echo $_POST['blah']; } ?>">
    <input type="submit" value="Send">
</form>
```
This is an incredibly simplified example, but it gets the point across. The form method is set to POST, and the action is set to the same page the form is on. So, when the user submits the form data, the page reloads with the $_POST superglobal set and containing the data submitted in the input field “blah”. With no sanitization, whatever the user inputs in that field is then going to be echoed into the “value” attribute of the input element (which is common practice for retaining data upon submission), and this is where the problem lies. If a mischievous visitor to your site were so inclined, they could easily add some code to change your page however they saw fit.
Using the above example form, take the following user input: "></form><script>alert('All your website are belong to us.');</script>. The user has effectively closed your input element, followed by your form element, and then maliciously inserted their own <script> tag, with which they can do whatever they desire. This is why sanitization is so important. Not only do you want to validate that your user is inputting correctly formatted data pertinent to the input fields they’re filling out, but you also want to safeguard your site and its visitors from attacks such as XSS and SQL injection.
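The same breakout can be reproduced outside PHP. The JavaScript sketch below mimics the unsanitized echo with plain string concatenation so the resulting markup can be inspected directly. The function name renderInput is hypothetical and only stands in for the PHP template.

```javascript
// Hypothetical stand-in for the PHP template: it drops the submitted value
// into the "value" attribute without escaping anything.
function renderInput(submittedValue) {
    return '<input type="text" name="blah" value="' + submittedValue + '">';
}

var payload = '"></form><script>alert(\'All your website are belong to us.\');</script>';
var html = renderInput(payload);

// The first two characters of the payload ("> ) end the attribute and the tag
// early, so everything after them is parsed as new markup.
console.log(html);
console.log(html.indexOf('value="">') !== -1); // true: the payload closed the attribute
console.log(html.indexOf("<script>") !== -1);  // true: a script tag is now part of the page
```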
There is so much that could be discussed on the topics of validation, sanitization, internet protocols, and so on that if I were to attempt to cover them all, this post would never end. So, considering I’ve filed this under the “For Beginners” category, I suppose I should start wrapping up. Below I’ll give a few examples of validation and sanitization. You’ll notice that validating data takes a lot more effort on your part than sanitizing it. This is because validation rules are specific to your form and usually have to be written by hand, whereas most scripting languages ship with ready-made functions for escaping output. (PHP does offer filter_var() for a few common validation cases, but a hand-written function shows more clearly what is being checked.)
JavaScript email address validation
Here’s a function which validates an email address fairly thoroughly using JavaScript. This is the best validation technique I’ve found thus far for email addresses, although it isn’t exhaustive. It rejects addresses whose domain part is an IP address literal in square brackets (IPv4 or IPv6), as well as internationalized addresses containing non-ASCII characters. Both are rare in practice, so the function should still correctly validate the vast majority of valid email addresses.
```javascript
function validEmail(email) {
    var isValid = true;
    // Split on the last "@" (mirrors strrpos() in the PHP version)
    var atIndex = email.lastIndexOf("@");
    if(atIndex === -1) {
        // no "@" found at all
        isValid = false;
    } else {
        var domain = email.slice(atIndex + 1);
        var local = email.slice(0, atIndex);
        var localLen = local.length;
        var domainLen = domain.length;
        // remove every escaped backslash before checking the local part
        var strippedLocal = local.replace(/\\\\/g, "");
        if(localLen < 1 || localLen > 64) {
            // local part length exceeded
            isValid = false;
        } else if(domainLen < 1 || domainLen > 255) {
            // domain part length exceeded
            isValid = false;
        } else if(local[0] == '.' || local[localLen-1] == '.') {
            // local part starts or ends with '.'
            isValid = false;
        } else if(/\.\./.test(local)) {
            // local part has two consecutive dots
            isValid = false;
        } else if(!/^[A-Za-z0-9.-]+$/.test(domain)) {
            // character not valid in domain part
            isValid = false;
        } else if(/\.\./.test(domain)) {
            // domain part has two consecutive dots
            isValid = false;
        } else if(!/^(\\.|[A-Za-z0-9!#%&`_=\/$'*+?^{}|~.-])+$/.test(strippedLocal)) {
            // character not valid in local part unless
            // local part is quoted
            if(!/^"(\\"|[^"])+"$/.test(strippedLocal)) {
                isValid = false;
            }
        }
    }
    return isValid;
}

if(validEmail('me@somewhere.com')) {
    console.log("Yep, it's valid.");
}
```
PHP email address validation
Below is basically the same function as above, but with an additional level of verification. Since JavaScript in the browser runs client-side (meaning that it’s executed on the visitor’s computer), you can’t use it to query DNS records directly. PHP, however, is a server-side scripting language, so with PHP you can connect to remote hosts to verify data. Therefore, we run one additional check at the end to verify that the domain is actually listed in the DNS records using checkdnsrr().
```php
function validEmail($email) {
    $isValid = true;
    $atIndex = strrpos($email, "@");
    if(is_bool($atIndex) && !$atIndex) {
        // no "@" found at all
        $isValid = false;
    } else {
        $domain = substr($email, $atIndex+1);
        $local = substr($email, 0, $atIndex);
        $localLen = strlen($local);
        $domainLen = strlen($domain);
        if($localLen < 1 || $localLen > 64) {
            // local part length exceeded
            $isValid = false;
        } else if($domainLen < 1 || $domainLen > 255) {
            // domain part length exceeded
            $isValid = false;
        } else if($local[0] == '.' || $local[$localLen-1] == '.') {
            // local part starts or ends with '.'
            $isValid = false;
        } else if(preg_match('/\\.\\./', $local)) {
            // local part has two consecutive dots
            $isValid = false;
        } else if(!preg_match('/^[A-Za-z0-9\\-\\.]+$/', $domain)) {
            // character not valid in domain part
            $isValid = false;
        } else if(preg_match('/\\.\\./', $domain)) {
            // domain part has two consecutive dots
            $isValid = false;
        } else if(!preg_match('/^(\\\\.|[A-Za-z0-9!#%&`_=\\/$\'*+?^{}|~.-])+$/', str_replace("\\\\","",$local))) {
            // character not valid in local part unless
            // local part is quoted
            if(!preg_match('/^"(\\\\"|[^"])+"$/', str_replace("\\\\","",$local))) {
                $isValid = false;
            }
        }
        if($isValid && !(checkdnsrr($domain,"MX") || checkdnsrr($domain,"A"))) {
            // domain not found in DNS
            $isValid = false;
        }
    }
    return $isValid;
}

if(validEmail("me@somewhere.com")) {
    echo "Yep, it's valid.";
}
```
Input Sanitization
In contrast to its validation counterparts, the sanitization of data is quite easy.
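```php
function clean_data($data) {
    // remove leading and trailing whitespace
    $data = trim($data);
    // remove backslashes added by legacy quote escaping
    $data = stripslashes($data);
    // convert &, <, >, " and ' to HTML entities so they can't be parsed as markup
    $data = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
    return $data;
}
```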
This is an extremely simple example, and definitely not all-inclusive, but it neutralizes the kind of markup injection shown earlier whenever user-generated data is echoed back into the page. It does not make data safe to use in a database query; that needs separate handling. Implementing it is simple: just wrap the newly created function around the data to be echoed onto the page after the user submits it.
```php
<form method="post" action="">
    <input type="text" name="blah" value="<?php if(isset($_POST['blah'])) {echo clean_data($_POST['blah']); } ?>">
    <input type="submit" value="Send">
</form>
```
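If the value is written into the page from JavaScript rather than PHP, a similar escaping step can be sketched by hand. The helper below is an illustrative assumption (the name escapeHtml and its replacement table are mine); it swaps out the same five characters that htmlspecialchars() converts when called with the ENT_QUOTES flag. Treat it as a starting point rather than a complete defense.

```javascript
// Hypothetical helper; the name and the replacement table are assumptions.
var HTML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#039;"
};

function escapeHtml(text) {
    // Replace each special character in a single pass, so an "&" produced
    // by one replacement is never escaped a second time.
    return String(text).replace(/[&<>"']/g, function (ch) {
        return HTML_ESCAPES[ch];
    });
}

var payload = '"></form><script>alert(1);</script>';
var safe = '<input type="text" name="blah" value="' + escapeHtml(payload) + '">';

// The payload now sits inside the attribute as harmless text.
console.log(safe);
```

Because the quote and angle brackets are turned into entities, the browser displays them inside the field instead of treating them as the end of the attribute.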
Thanks for reading!
If you’re an advanced developer, then the topics covered in this post will almost certainly be old news to you. However, if you’re new to development or if you simply weren’t aware of the need to sanitize and validate user-generated data, then I hope you found this post informative. Keep in mind, there is a LOT more to it than the simple examples I listed above. I just meant this as a starting point for newcomers to grasp the basic concept and hopefully inspire them to research more robust methods on their own.