In the databases course I took during my education (approx. 4 years ago), I recall being taught that you should avoid character strings as the primary key's data type.
Can someone tell me the pros and cons of choosing a character varying data type for a primary key in SQL, and how true the above premise is?
N.B.: I'm using a PostgreSQL database. I'm also dealing with a situation where I need to reference such a table from another one, which means putting a foreign key on a character varying column. Please take that into account as well.
The advantage of choosing a character data type for a primary key is that the key itself can carry meaningful data. For example, you could use the email address as the key of a users table, which eliminates the need for an additional ID column. Another advantage: if you have a common table that holds keys from several other tables (think of a NOTES table with references to FINANCE, CONTACT, and ADMIN tables), you can easily tell which table a row came from (e.g. your FINANCE table uses keys like F00001, your CONTACT table uses C00001, etc.). I'm afraid the list of disadvantages in this reply will be longer, as I'm against such an approach.
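A minimal sketch of the natural-key design described above, using a hypothetical `users`/`orders` schema (the table and column names are illustrative, not from the question). It uses Python's built-in `sqlite3` module so it runs without a server; in PostgreSQL you would declare the key column as `varchar` in the same way, but note that SQLite does not enforce declared `VARCHAR` lengths, so treat this as a picture of the schema shape rather than of PostgreSQL's behaviour.

```python
import sqlite3


def build_natural_key_schema() -> sqlite3.Connection:
    """Create a hypothetical users/orders schema keyed on the email address."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE users ("
        # NOT NULL is spelled out because SQLite, unlike PostgreSQL, otherwise
        # allows NULL in a non-INTEGER primary key.
        "  email VARCHAR(255) NOT NULL PRIMARY KEY,"
        "  name  VARCHAR(100) NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "  order_no   INTEGER PRIMARY KEY,"
        # The foreign key sits on a character column, as in the question.
        "  user_email VARCHAR(255) NOT NULL REFERENCES users(email))"
    )
    return conn


conn = build_natural_key_schema()
conn.execute("INSERT INTO users VALUES ('ann@example.com', 'Ann')")
conn.execute("INSERT INTO orders (user_email) VALUES ('ann@example.com')")
# The order row already names its user, so no join is needed to show it.
print(conn.execute("SELECT user_email FROM orders").fetchone()[0])  # ann@example.com
```

This is the "no additional column" advantage in practice: the referencing row is readable on its own, at the price of storing the full string in every referencing row.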
The disadvantages are as follows:
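- PostgreSQL provides the serial data type precisely for generating numeric keys like this.
- Sequentially generated numeric keys are inserted in order, so minimal index reorganization is needed. For example, if a table has the keys Apple and Carrot and you insert Banana, the new index entry has to go between the two, which can force the index to split a page and move entries around. With an ascending numeric key, new entries almost always go at the end of the index.
- Surrogate numeric keys that are unrelated to the data never need to change. A natural key such as an email address can change, and every foreign key that references it must then be updated too.
- Numeric keys are shorter and fixed-length: 4 bytes for integer (8 for bigint), whereas a varchar value takes as many bytes as the string itself plus a small length header, regardless of the declared maximum length.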
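To make the ordering point concrete, here is a toy model that treats an index as a plain sorted Python list and uses `bisect` to find where each new key must go. It is only an analogy: a real PostgreSQL B-tree stores keys in pages and splits a page when it fills, but where a new key lands is what decides whether inserts pile up at the end or scatter through the index. The helper names are hypothetical.

```python
import bisect


def insert_position(sorted_keys, new_key):
    """Return the slot where new_key goes to keep sorted_keys ordered."""
    return bisect.bisect_left(sorted_keys, new_key)


def count_appends(keys_to_insert):
    """Insert keys one by one into a sorted list; count how many land at the end."""
    index = []
    appends = 0
    for key in keys_to_insert:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            appends += 1  # new key went after every existing key
        index.insert(pos, key)
    return appends


# A text key can fall anywhere, here between two existing keys.
print(insert_position(["Apple", "Carrot"], "Banana"))  # 1 (middle)
# The next value from an ascending sequence always lands after the last key.
print(insert_position([1, 2], 3))  # 2 (end)
# Every key from a sequence is an append; arbitrary strings usually are not.
print(count_appends(range(1, 101)))  # 100
print(count_appends(["Carrot", "Apple", "Banana"]))  # 1
```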
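In your case, the foreign key can just as well reference a numeric key, so I'm not sure why you would want to force it to be varchar. Searching and filtering on a numeric column is theoretically faster than on a text column, because comparing two integers is cheaper than comparing strings character by character (in PostgreSQL, according to the column's collation). Generally speaking, you would have a numeric primary key and then create a separate index on the data column you are going to filter on a lot. (In SQL Server terms, the primary key would be non-clustered and that column clustered; PostgreSQL has no persistent clustered indexes, although its CLUSTER command can physically reorder a table by an index once.)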
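A sketch of the alternative suggested here: a numeric surrogate primary key, a foreign key pointing at it, and a unique index on the email column that gets filtered on. It again uses Python's `sqlite3` with a hypothetical schema; in SQLite an `INTEGER PRIMARY KEY` column is filled in automatically, which plays roughly the role that `serial` plays in PostgreSQL.

```python
import sqlite3


def build_surrogate_key_schema() -> sqlite3.Connection:
    """Create a hypothetical users/orders schema with an integer surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE users ("
        "  id    INTEGER PRIMARY KEY,"  # filled in automatically by SQLite
        "  email VARCHAR(255) NOT NULL UNIQUE,"  # UNIQUE also creates an index
        "  name  VARCHAR(100) NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "  order_no INTEGER PRIMARY KEY,"
        "  user_id  INTEGER NOT NULL REFERENCES users(id))"  # FK on the integer key
    )
    return conn


def add_order_for(conn: sqlite3.Connection, email: str) -> int:
    """Find the user by email (via the UNIQUE index) and insert an order for them."""
    row = conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
    if row is None:
        raise LookupError(f"no user with email {email!r}")
    cur = conn.execute("INSERT INTO orders (user_id) VALUES (?)", (row[0],))
    return cur.lastrowid


conn = build_surrogate_key_schema()
conn.execute("INSERT INTO users (email, name) VALUES ('ann@example.com', 'Ann')")
order_no = add_order_for(conn, "ann@example.com")
email = conn.execute(
    "SELECT u.email FROM orders o JOIN users u ON u.id = o.user_id WHERE o.order_no = ?",
    (order_no,),
).fetchone()[0]
print(email)  # ann@example.com
```

The email stays unique and searchable, but since nothing references it, it can be changed with a single-row update.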
Those are general conventions when writing SQL, but in benchmarks you will find that varchar columns are only a little slower than integer columns for joins and filtering. As long as your primary keys never, EVER change, you're fine.
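What happens when a natural key does change can be sketched with a hypothetical schema in Python's `sqlite3`. With `ON UPDATE CASCADE` on the foreign key (supported by both SQLite and PostgreSQL), renaming a user's email rewrites every referencing order row; without it, the database rejects the change while references exist. Either way, each key change costs extra work in every referencing table, which is what a never-changing surrogate key avoids.

```python
import sqlite3


def build_schema(cascade: bool) -> sqlite3.Connection:
    """Hypothetical users/orders schema keyed on email, optionally cascading updates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (email VARCHAR(255) NOT NULL PRIMARY KEY)")
    on_update = " ON UPDATE CASCADE" if cascade else ""
    conn.execute(
        "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, "
        "user_email VARCHAR(255) NOT NULL REFERENCES users(email)" + on_update + ")"
    )
    return conn


def seed(conn: sqlite3.Connection, email: str, n_orders: int) -> None:
    """Insert one user and n_orders orders referencing them, in one transaction."""
    with conn:
        conn.execute("INSERT INTO users VALUES (?)", (email,))
        conn.executemany(
            "INSERT INTO orders (user_email) VALUES (?)", [(email,)] * n_orders
        )


def rename_user(conn: sqlite3.Connection, old_email: str, new_email: str) -> None:
    """Change a user's primary key; commits on success, rolls back on error."""
    with conn:
        conn.execute("UPDATE users SET email = ? WHERE email = ?", (new_email, old_email))


conn = build_schema(cascade=True)
seed(conn, "ann@old.example", 3)
rename_user(conn, "ann@old.example", "ann@new.example")
count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE user_email = ?", ("ann@new.example",)
).fetchone()[0]
print(count)  # 3 (all referencing rows were rewritten)

strict = build_schema(cascade=False)
seed(strict, "bob@old.example", 1)
try:
    rename_user(strict, "bob@old.example", "bob@new.example")
except sqlite3.IntegrityError:
    print("rename rejected: orders still reference the old key")
```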