When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements:
Number 3 would be used for fairly small, mostly-read lookup tables that might have a unique, fixed-length string code or a numeric value such as a year or other number. For the most part, all other tables will have either an auto-incrementing integer or a unique identifier primary key.

The Question :-)

I have recently started working with databases that have no consistent row identifier, and whose primary keys are currently clustered across various columns. Some examples:
Is there a valid case for this? I would have always defined an identity or unique identifier column for these cases.

In addition, there are many tables without primary keys at all. What are the valid reasons, if any, for this? I'm trying to understand why tables were designed as they were, and it appears to be a big mess to me, but maybe there were good reasons for it.

A third question to sort of help me decipher the answers: in cases where multiple columns are used to comprise the compound primary key, is there a specific advantage to this method vs. a surrogate/artificial key? I'm thinking mostly with regard to performance, maintenance, administration, etc.
asked Dec 3, 2008 at 15:30
Lloyd Cotten

I follow a few rules:
On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
Kaii
answered Dec 3, 2008 at 19:25
Natural versus artificial keys is a kind of religious debate among the database community - see this article and others it links to. I'm neither in favour of always having artificial keys, nor of never having them. I would decide on a case-by-case basis, for example:
Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:
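As a rough illustration of that advice (a sketch, not the answer's original example), the snippet below uses Python's built-in `sqlite3` module. The table names `state_unconstrained` and `state` and the `state_name` column are assumptions made for the demonstration; only `state_id` and `state_code` come from the text above. It shows that a surrogate key alone happily accepts the same state code twice, while a unique constraint on the natural key rejects the second row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key only: nothing stops the same state_code being entered twice.
conn.execute("""
    CREATE TABLE state_unconstrained (
        state_id   INTEGER PRIMARY KEY,
        state_code TEXT NOT NULL,
        state_name TEXT NOT NULL
    )
""")
insert_unconstrained = (
    "INSERT INTO state_unconstrained (state_code, state_name) VALUES ('TX', 'Texas')"
)
conn.execute(insert_unconstrained)
conn.execute(insert_unconstrained)  # accepted: a second 'TX' with a new state_id
duplicate_rows = conn.execute(
    "SELECT COUNT(*) FROM state_unconstrained WHERE state_code = 'TX'"
).fetchone()[0]

# Surrogate key plus a unique constraint on the natural key.
conn.execute("""
    CREATE TABLE state (
        state_id   INTEGER PRIMARY KEY,
        state_code TEXT NOT NULL UNIQUE,
        state_name TEXT NOT NULL
    )
""")
insert_constrained = "INSERT INTO state (state_code, state_name) VALUES ('TX', 'Texas')"
conn.execute(insert_constrained)
try:
    conn.execute(insert_constrained)
    second_insert_rejected = False
except sqlite3.IntegrityError:
    # The unique constraint on state_code refuses the duplicate.
    second_insert_rejected = True
```

The same idea carries over to other databases; only the syntax for identity columns and constraints differs.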
1737973
answered Dec 3, 2008 at 16:19
Tony Andrews

I avoid using natural keys for one simple reason -- human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key. My primary keys are all handled by the database program in the background and the user is never aware of them.
answered Jul 12, 2011 at 20:55
Paul

Just an extra comment on something that is often overlooked. Sometimes not using a single surrogate key as primary has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever). Let's say we have these tables and columns:
In case that last bit doesn't make sense: in this model, it's not possible to screw up and reference a CostElement from one company and a CostCentre from another company. If a single surrogate key were used as the primary key on the CostElement and CostCentre tables, and without the foreign key relations in the Invoice table, it would be. The fewer chances to screw up, the better.
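The point can be sketched with Python's built-in `sqlite3` module. The table names come from the answer, but every column name here (`CompanyId`, `CostCentreId`, `CostElementId`, `InvoiceId`) is an assumption, since the original column list is not shown. The idea is that each child table carries the company as part of its primary key, and `Invoice` references its parents through composite foreign keys that share one `CompanyId` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE Company (
        CompanyId INTEGER PRIMARY KEY
    );
    CREATE TABLE CostCentre (
        CompanyId    INTEGER NOT NULL REFERENCES Company (CompanyId),
        CostCentreId INTEGER NOT NULL,
        PRIMARY KEY (CompanyId, CostCentreId)
    );
    CREATE TABLE CostElement (
        CompanyId     INTEGER NOT NULL REFERENCES Company (CompanyId),
        CostElementId INTEGER NOT NULL,
        PRIMARY KEY (CompanyId, CostElementId)
    );
    CREATE TABLE Invoice (
        CompanyId     INTEGER NOT NULL,
        InvoiceId     INTEGER NOT NULL,
        CostCentreId  INTEGER NOT NULL,
        CostElementId INTEGER NOT NULL,
        PRIMARY KEY (CompanyId, InvoiceId),
        -- Both references reuse the invoice's own CompanyId.
        FOREIGN KEY (CompanyId, CostCentreId)
            REFERENCES CostCentre (CompanyId, CostCentreId),
        FOREIGN KEY (CompanyId, CostElementId)
            REFERENCES CostElement (CompanyId, CostElementId)
    );
    INSERT INTO Company VALUES (1), (2);
    INSERT INTO CostCentre VALUES (1, 10);   -- belongs to company 1
    INSERT INTO CostElement VALUES (1, 21);  -- belongs to company 1
    INSERT INTO CostElement VALUES (2, 20);  -- belongs to company 2
""")

# Cost centre and cost element both belong to company 1: accepted.
conn.execute("INSERT INTO Invoice VALUES (1, 100, 10, 21)")

# Cost centre from company 1 but cost element 20 from company 2: rejected,
# because the pair (1, 20) does not exist in CostElement.
try:
    conn.execute("INSERT INTO Invoice VALUES (1, 101, 10, 20)")
    cross_company_rejected = False
except sqlite3.IntegrityError:
    cross_company_rejected = True
```

With single-column surrogate keys on `CostCentre` and `CostElement`, the second insert would satisfy both foreign keys, and the mismatch would have to be caught by application code or a trigger instead.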
Kaii
answered Dec 5, 2008 at 10:38
WW.

There's no problem in making your primary key from various fields; that's a Natural Key. You can use an Identity column (associated with a unique index on the candidate fields) to make a Surrogate Key. That's an old discussion. I prefer surrogate keys in most situations. But there's no excuse for the lack of a key.

RE: EDIT

Yeah, there's a lot of controversy about that :D I don't see any obvious advantage to natural keys, besides the fact that they are the natural choice. You will always think in Name, SocialNumber - or something like that - instead of idPerson. Surrogate keys are the answer to some of the problems that natural keys have (propagating changes, for example). As you get used to surrogates, they seem cleaner and more manageable. But in the end, you'll find out that it's just a matter of taste, or mindset. Some people "think better" with natural keys, and others don't.
answered Dec 3, 2008 at 16:04
Tables should always have a primary key. When a table has no natural one, it should have an auto-increment field. Sometimes people omit the primary key because they are transferring a lot of data and the key might slow the process down (depending on the database), but it should be added afterwards. Someone commented about link tables; that is right, they are an exception, but their fields should be foreign keys to keep integrity, and in some cases those fields can form the primary key too, if duplicate links are not allowed. To keep it simple, though, since exceptions are common in programming: a primary key should be present to keep the integrity of your data.
answered Dec 3, 2008 at 15:33
Besides all those good answers, I just want to share a good article I just read, The great primary-key debate. Just to quote a few points: The developer must apply a few rules when choosing a primary key for each table:
Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You had better read through that article; it is worth your time!)
answered Jun 3, 2013 at 1:46
RayLuo

Here are my own rules of thumb, which I have settled on after 25+ years of development experience.
The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity. Always having a single value primary key makes performing UPSERTs very straightforward.
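To make the UPSERT remark concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` table, its columns, and the `upsert_customer` helper are hypothetical names chosen for illustration, and the `ON CONFLICT ... DO UPDATE` syntax assumes SQLite 3.24 or later; other databases spell the same operation differently (for example with `MERGE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    )
""")


def upsert_customer(conn, customer_id, name, email):
    """Insert the row, or update it in place if the id already exists."""
    conn.execute(
        """
        INSERT INTO customer (id, name, email)
        VALUES (?, ?, ?)
        ON CONFLICT (id) DO UPDATE SET
            name  = excluded.name,
            email = excluded.email
        """,
        (customer_id, name, email),
    )


upsert_customer(conn, 1, "Ada", "ada@example.com")           # inserts
upsert_customer(conn, 1, "Ada Lovelace", "ada@example.com")  # updates the same row
rows = conn.execute("SELECT id, name, email FROM customer").fetchall()
```

With a single-column key, the conflict target is one column and the caller passes one identifying value; a composite key would need every key column listed in the conflict target and supplied by every caller.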
answered Mar 5, 2018 at 19:19
A natural key, if available, is usually best. So, if datetime/char uniquely identifies the row and both parts are meaningful to the row, that's great. If just the datetime is meaningful, and the char is just tacked on to make it unique, then you might as well just go with an identity field.
answered Dec 3, 2008 at 15:34
James Curran

What is special about the primary key?

What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and table relationships should drive the key that is used.

Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory. From a design perspective, other tables added or removed from a schema should not impact on the table in question. There must be a purpose for storing the data related only to the information itself. Understanding what is stored in a table should not require undergoing a scientific research project. No fact stored for the same purpose should be stored more than once.

Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).
It was said that primary keys should be as small as necessary. I would say that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but this occurrence is surely a sign of a poor schema design if used for all tables.

It was also said that primary keys should never change, as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from ‘Processed’ to ‘Cancelled’ would definitely corrupt the data). What should always be out of the question is destroying data meaning.

Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people that understand proper database design. On the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash).

Stay away from blogs and websites for important database design questions. If you are designing databases, look up CJ Date. You can also reference Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.
answered Jan 3, 2013 at 18:57
Luke

I suspect Steven A. Lowe's rolled up newspaper therapy is required for the designer of the original data structure. As an aside, GUIDs as a primary key can be a performance hog. I wouldn't recommend it.

Natural versus artificial keys to me is a matter of how much of the business logic you want in your database. Social Security number (SSN) is a great example. "Each client in my database will, and must, have an SSN." Bam, done, make it the primary key and be done with it. Just remember when your business rule changes you're burned. I don't like natural keys myself, due to my experience with changing business rules. But if you're sure it won't change, it might prevent a few critical joins.
answered Dec 3, 2008 at 19:26
Dan Williams

I too always use a numeric ID column. In Oracle I use number(18,0) for no real reason above number(12,0) (or whatever is an int rather than a long); maybe I just don't want to ever worry about getting a few billion rows in the db! I also include a created and modified column (type timestamp) for basic tracking, where it seems useful. I don't mind setting up unique constraints on other combinations of columns, but I really like my id, created, modified baseline requirements.
answered Dec 3, 2008 at 15:35
JeeBee

I look for natural primary keys and use them where I can. If no natural keys can be found, I prefer a GUID to an INT++ because SQL Server uses trees, and it is bad to always add keys to the end in trees. On tables that are many-to-many couplings I use a compound primary key of the foreign keys. Because I'm lucky enough to use SQL Server, I can study execution plans and statistics with the profiler and the query analyzer and find out how my keys are performing very easily.
answered Dec 3, 2008 at 19:33
Guge

You should use a 'composite' or 'compound' primary key that comprises multiple fields. This is a perfectly acceptable solution, go here for more info :)
answered Dec 3, 2008 at 15:35
adam

I always use an autonumber or identity field. I worked for a client who had used SSN as a primary key and then because of HIPAA regulations was forced to change to a "MemberID" and it caused a ton of problems when updating the foreign keys in related tables. Sticking to a consistent standard of an identity column has helped me avoid a similar problem in all of my projects.
answered Dec 3, 2008 at 15:53
Matt

GUIDs can be used as a primary key, but you need to create the right type of GUID so that it performs well. You need to generate COMB GUIDs. A good article about it and performance statistics is The Cost of GUIDs as Primary Keys. Also some code on building COMB GUIDs in SQL is in Uniqueidentifier vs identity (archive).
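For readers unfamiliar with the idea, a COMB GUID combines random bytes with a timestamp so that newly generated values sort roughly in creation order. The sketch below is an illustration in Python, not the code from the linked articles: the function name `comb_guid`, the choice of milliseconds since the Unix epoch, and the layout (10 random bytes followed by a 6-byte big-endian timestamp) are all assumptions. The timestamp goes last because, as far as I know, SQL Server gives the final six bytes the most weight when ordering `uniqueidentifier` values; check your own database's ordering rules before relying on this.

```python
import os
import time
import uuid


def comb_guid(timestamp_ms=None):
    """Return a COMB-style GUID: 10 random bytes followed by a 6-byte timestamp.

    Note: the result does not set the RFC 4122 version/variant bits; it is
    simply a 16-byte value with a time-ordered tail.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    random_part = os.urandom(10)
    # Keep the low 48 bits of the timestamp so it fits in exactly 6 bytes.
    time_part = (timestamp_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


earlier = comb_guid(1_000)
later = comb_guid(2_000)
# The trailing six bytes increase with the timestamp, whatever the random part is.
tail_is_ordered = earlier.bytes[10:] < later.bytes[10:]
```

Two values generated in the same millisecond share the same tail, so ordering within that millisecond is still random.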
matt
answered Dec 3, 2008 at 19:49
Donny V.

All tables should have a primary key. Otherwise, what you have is a HEAP - this, in some situations, might be what you want (heavy insert load when the data is then replicated via a service broker to another database or table for instance). For lookup tables with a low volume of rows, you can use a 3 CHAR code as the primary key as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.
answered Dec 3, 2008 at 16:29
Coolcoder

If you really want to read through all of the back and forth on this age-old debate, do a search for "natural key" on Stack Overflow. You should get back pages of results.
answered Dec 3, 2008 at 16:34
Tom H

We do a lot of joins and composite primary keys have just become a performance hog. A simple int or long takes care of many problems even though you are introducing a second candidate key, but it's a lot easier and more understandable to join on one field versus three.
answered Dec 3, 2008 at 15:41
Dan Blair

I'll be up-front about my preference for natural keys - use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:
Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use. Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:
So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID. Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!
answered Dec 30, 2008 at 22:34
Keith Williams

Which field is suitable as a primary key?

Often, a unique identification number, such as an ID number or a serial number or code, serves as a primary key in a table. For example, you might have a Customers table where each customer has a unique customer ID number. The customer ID field is the primary key.
Which key is used to access a primary key field of a table from another table?

A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It refers to the field (or fields) in another table that form that table's primary key. The primary key is what uniquely identifies a record in its own table, and only one primary key is allowed per table.