Categories
php security sql-injection user-input xss

How can I sanitize user input with PHP?

1238

Is there a catchall function somewhere that works well for sanitizing user input for SQL injection and XSS attacks, while still allowing certain types of HTML tags?

8

  • 94

    Using PDO or MySQLi isn’t enough. If you build your SQL statements with untrusted data, like select * from users where name='$name', then it doesn’t matter if you use PDO or MySQLi or MySQL. You are still in danger. You must use parametrized queries or, if you must, use escaping mechanisms on your data, but that is much less preferable.

    Dec 20, 2013 at 17:01

  • 30

    @AndyLester Are you implying that someone uses PDO without prepared statements? 🙂

    – user1537415

    Mar 30, 2014 at 14:20

  • 75

    I’m saying that “Use PDO or MySQLi” is not enough information to explain to novices how to use them safely. You and I know that prepared statements matter, but I do not assume that everyone who reads this question will know it. That is why I added the explicit instructions.

    Mar 30, 2014 at 22:10

  • 35

    Andy’s comment is entirely valid. I converted my MySQL website to PDO recently, thinking that I was now somehow safe from injection attacks. It was only during the process that I realised some of my SQL statements were still built using user input. I then fixed that using prepared statements. To a complete novice, it’s not fully clear that there is a distinction, as many experts throw out the comment about using PDO but don’t specify the need for prepared statements. The assumption being that this is obvious. But not to a novice.

    May 25, 2014 at 8:15


  • 10

    @Christian: GhostRider and AndyLester are right. Let this be a lesson in communication. I was a novice once and it sucked because experts flat out don’t know how to communicate.

    – OCDev

    Nov 4, 2014 at 13:17

1267

It’s a common misconception that user input can be filtered. PHP even had a “feature” called magic quotes (deprecated in PHP 5.3 and removed in PHP 5.4) that built on this idea. It’s nonsense. Forget about filtering (or cleaning, or whatever people call it).

What you should do to avoid problems is quite simple: whenever you embed a piece of data within foreign code, you must treat it according to the formatting rules of that code. But you must understand that such rules can be too complicated to follow manually. For example, in SQL, the rules for strings, numbers and identifiers are all different. For your convenience, in most cases there is a dedicated tool for such embedding. For example, when you need to use a PHP variable in an SQL query, you have to use a prepared statement, which takes care of all the proper formatting and treatment.
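A minimal sketch of the prepared-statement case. It assumes PDO with the SQLite driver is installed; the in-memory database and the users table are made up for illustration, so adapt the DSN and schema to your own setup.

```php
<?php
// Hypothetical in-memory database so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

// Untrusted value, e.g. from $_POST['name'].
$name = "O'Reilly";

// The query text contains only a placeholder; the value travels separately.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
$insert->execute([$name]);

$select = $pdo->prepare('SELECT name FROM users WHERE name = ?');
$select->execute([$name]);
$found = $select->fetchColumn();

echo $found, PHP_EOL;
```

The apostrophe in the name never touches the SQL text, so there is nothing to escape by hand.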

Another example is HTML: if you embed strings within HTML markup, you must escape them with htmlspecialchars. This means that every single echo or print statement should use htmlspecialchars.
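For the HTML case, something like the following should do. The helper name e() is only an illustration, not a built-in, and the flags shown are one reasonable choice rather than the only one.

```php
<?php
// Hypothetical helper: escape a string for an HTML text or attribute context.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Untrusted value, e.g. from $_GET['comment'].
$comment = '<script>alert("xss")</script>';

// The markup around the value is trusted; only the value itself is escaped.
$html = '<p>' . e($comment) . '</p>';
echo $html, PHP_EOL;
```

The browser then shows the script tag as text instead of running it.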

A third example is shell commands: if you are going to embed strings (such as arguments) in external commands and call them with exec, then you must use escapeshellcmd and escapeshellarg (escapeshellarg for each individual argument, escapeshellcmd for a command string as a whole).
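A rough sketch for the shell case, using escapeshellarg on a single argument. The file name and the ls command are just placeholders, and the command is only built and printed here, not executed.

```php
<?php
// Untrusted value, e.g. a user-supplied file name (hypothetical).
$filename = 'report.txt; rm -rf /';

// escapeshellarg() quotes the value so the shell sees exactly one argument.
$command = 'ls -l ' . escapeshellarg($filename);

// Printed rather than run; in real code this string would go to exec().
echo $command, PHP_EOL;
```

Without the escaping, the semicolon would end the ls command and start a second one.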

Also, a very compelling example is JSON. The rules are so numerous and complicated that you would never be able to follow them all manually. That’s why you should never create a JSON string manually, but always use the dedicated function json_encode(), which will correctly format every bit of data.
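To illustrate the JSON point, here is a small sketch with made-up data containing quotes, a backslash and a newline. JSON_THROW_ON_ERROR needs PHP 7.3 or later; on older versions you would check the return value instead.

```php
<?php
// Sample data with characters that have special meaning inside JSON strings.
$data = [
    'name' => 'Bobby "Tables"',
    'path' => 'C:\\temp',
    'note' => "line one\nline two",
];

// Throws a JsonException on failure instead of silently returning false.
$json = json_encode($data, JSON_THROW_ON_ERROR);
echo $json, PHP_EOL;

// Decoding gives back the original values, so nothing was lost or mangled.
$roundTrip = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
```

Every quote, backslash and control character is escaped for you, which is exactly the part that is easy to get wrong by hand.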

And so on and so forth …

The only case where you need to actively filter data is when you’re accepting preformatted input, for example if you let your users post HTML markup that you plan to display on the site. However, you would be wise to avoid this at all costs, since no matter how well you filter it, it will always be a potential security hole.

31

  • 260

    “This means that every single echo or print statement should use htmlspecialchars” – of course, you mean “every … statement outputting user input”; htmlspecialchars()-ifying “echo ‘Hello, world!’;” would be crazy 😉

    Oct 20, 2008 at 13:32

  • 12

    There’s one case where I think filtering is the right solution: UTF-8. You don’t want invalid UTF-8 sequences all over your application (you might get different error recovery depending on code path), and UTF-8 can be filtered (or rejected) easily.

    – Kornel

    Sep 9, 2009 at 21:33

  • 6

    @jbyrd – no, LIKE uses its own specialised pattern language (the % and _ wildcards). You will have to escape your input string twice – once for the pattern and once for the MySQL string encoding. It’s code within code within code.

    – troelskn

    Oct 29, 2011 at 20:02


  • 9

    At this moment mysql_real_escape_string is deprecated. It’s considered good practice nowadays to use prepared statements to prevent SQL injection. So switch to either MySQLi or PDO.

    Jun 5, 2013 at 12:46

  • 4

    Because you limit the attack surface. If you sanitize early (at input time), you have to be certain that there are no other holes in the application through which bad data could enter. Whereas if you do it late, then your output function doesn’t have to “trust” that it is given safe data – it simply assumes that everything is unsafe.

    – troelskn

    Jul 15, 2014 at 17:33

238

Do not try to prevent SQL injection by sanitizing input data.

Instead, do not allow data to be used in creating your SQL code. Use prepared statements (i.e. parameters in a template query) with bound variables. It is the only way to be guaranteed against SQL injection.
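To see the difference, here is a small demonstration that runs the same payload both ways. It assumes PDO with the SQLite driver; the students table and its rows are invented for the example.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE students (name TEXT)');
$pdo->exec("INSERT INTO students (name) VALUES ('Alice'), ('Bob')");

// A classic injection payload instead of a real name.
$name = "x' OR '1'='1";

// Unsafe: the payload becomes part of the SQL code and matches every row.
$unsafe = $pdo->query("SELECT name FROM students WHERE name = '$name'")
              ->fetchAll(PDO::FETCH_COLUMN);

// Safe: the payload is bound as data and compared literally, matching nothing.
$stmt = $pdo->prepare('SELECT name FROM students WHERE name = :name');
$stmt->execute([':name' => $name]);
$safe = $stmt->fetchAll(PDO::FETCH_COLUMN);

printf("unsafe: %d rows, safe: %d rows\n", count($unsafe), count($safe));
```

The concatenated query returns both students, while the bound version returns none, because no student is literally named x' OR '1'='1.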

Please see my website http://bobby-tables.com/ for more about preventing SQL injection.

6

  • 20

    Or visit the official documentation and learn PDO and prepared statements. Tiny learning curve, but if you know SQL pretty well, you’ll have no trouble adapting.

    – a coder

    Nov 13, 2014 at 2:49

  • 2

    For the specific case of SQL Injection, this is the correct answer!

    May 30, 2015 at 2:04

  • 6

    Note that prepared statements don’t add any security, parameterised queries do. They just happen to be very easy to use together in PHP.

    – Basic

    Aug 16, 2015 at 3:01

  • It’s not the only guaranteed way. Hex-encoding the input and unhexing it in the query also prevents injection. Hex attacks are not possible either if you use hexing correctly.

    Feb 22, 2016 at 15:50

  • What if you’re inputting something specialized, like email addresses or usernames?

    Jan 9, 2017 at 8:34


82

No. You can’t generically filter data without any context of what it’s for. Sometimes you’d want to take a SQL query as input and sometimes you’d want to take HTML as input.

You need to filter input on a whitelist — ensure that the data matches some specification of what you expect. Then you need to escape it before you use it, depending on the context in which you are using it.
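A sketch of what the whitelist step could look like. Both function names, the age range and the list of sort columns are hypothetical and stand in for whatever your own specification says.

```php
<?php
// Hypothetical validator: return the typed value, or null if it is rejected.
function validateAge($raw): ?int
{
    $age = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    return $age === false ? null : $age;
}

// Hypothetical validator: accept only values from a fixed list.
function validateSortColumn($raw): ?string
{
    $allowed = ['name', 'created_at'];
    return in_array($raw, $allowed, true) ? $raw : null;
}

var_dump(validateAge('42'));
var_dump(validateAge('42; DROP TABLE users'));
var_dump(validateSortColumn('name'));
var_dump(validateSortColumn('1=1'));
```

Validation like this only decides whether to accept the value; it still has to be escaped or bound for whatever context it ends up in.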

The process of escaping data for SQL – to prevent SQL injection – is very different from the process of escaping data for (X)HTML, to prevent XSS.

0