The 33 character "representation" of GUIDs is NOT what is compared by SQL Server any more than the 'Monday March 1st, 1995 12:00:00AM' is what is compared when doing a DATETIME comparison.
Rather, the value's 16-bytes are compared as raw binary. This is why NEWID() [aka randomized GUIDs] are so much worse than NEWSEQUENTIALID() [aka old-style MAC+TIME+TSEQ+SEQ]. The latter will be monotonically increasing within ranges.
So, GUIDs are not bad, randomized GUIDs are.
I will say that when you _CAN_ use an INT, BIGINT or segmented key [e.g. source-server TINYINT + IDENTITY INT], you should... this is when you can control the sources of the ids. But if you're dealing with clouds of servers, you'll need some globally unique value and that is what GUIDs are for.
Marc, I completely agree. The problem is, on my poll, only one respondent indicated the use of NEWSEQUENTIALID(). Most of us as developers are simply not aware of the raw size / fragmentation issues created by using UNIQUEIDENTIFIER as a primary key, because we are looking for convenience first and ignoring the details vis-a-vis performance.
Lack of education how to use a feature does not mean a feature is bad, or should be avoided, especially if YOU do know how to use the feature correctly.
that is why your "Im going to think twice..." comment makes no sense.
@Gaijin42, It really depends on how many rows you are going to be dealing with. Even the NEWSEQUENTIALID method has more fragmentation than an int identity PK, and fragmentation has performance implications.
So for me to say that I need to think about whether I am going to use GUIDs as primar keys is, I think, completely logical.
One way or the other, perhaps it's laziness -- I find that with storage being cheap, it was the clustering/indexing that was my main concern with GUIDs as a PK...
...but you can get away with that if you read another article from Kimberly Tripp... http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
@Anonymous, I read Tripp's article, I even quoted her in this post. But she does not provide a solution with NEWSEQUENTIALID() to get away from the space issue. You're still dealing with 16 byte wide ID's.
I only use GUIDs as PKs for UserIDs, for security reasons. I can also see using GUIDs for replication/merging databases, as you point out. But for pretty much everything else? Never. Nice to see some calculations and tests that verify my own reasons for eschewing GUIDs (I was always too lazy to test).
@jonf, One of the biggest problems is that if you've developed a database schema using GUIDs and you're in production with millions of rows of data, it's almost impossible to change at that point.
Only a developer would create a guid pk field in a database! DBA's nearly always always frown upon there use because of the fragmentation and size issues. Great post for those that do not know evils of guids in the db!
Peter is a Microsoft C# .NET MVP, Author, and UnEducator based in the Orlando Florida area. He writes mostly about .NET for eggheadcafe.com An ex-jazz musician and stockbroker, he is a conservative - libertarian who dreams of being a philanthropist and paying less in taxes. He believes in the US Constitution and free-market "Austrian" economic theories.
Research for a PhD in microeconomics has given Peter an avid interest in the study of economic and climate change trends and the design of computer algorithms to process historical data sets with neural networks.
Peter is a long time Mensa member. His favorite quotation is:
"Wrong is Right" -- Thelonius Monk.
You will find about 50 to 60% of Peter's posts having to do with politics and economics, and the rest about evenly split between .NET, programming, and science.
Peter often makes controversial posts or comments in order to stir controversy, so if you get your feelings hurt, don't say you haven't been warned! He often has strong opinions on various issues. However, he is ready willing and able to change those opinions when presented with compelling evidence that he was wrong.
10 Comments:
The 33 character "representation" of GUIDs is NOT what is compared by SQL Server any more than the 'Monday March 1st, 1995 12:00:00AM' is what is compared when doing a DATETIME comparison.
Rather, the value's 16-bytes are compared as raw binary. This is why NEWID() [aka randomized GUIDs] are so much worse than NEWSEQUENTIALID() [aka old-style MAC+TIME+TSEQ+SEQ]. The latter will be monotonically increasing within ranges.
So, GUIDs are not bad, randomized GUIDs are.
I will say that when you _CAN_ use an INT, BIGINT or segmented key [e.g. source-server TINYINT + IDENTITY INT], you should... this is when you can control the sources of the ids. But if you're dealing with clouds of servers, you'll need some globally unique value and that is what GUIDs are for.
Marc,
I completely agree. The problem is, on my poll, only one respondent indicated the use of NEWSEQUENTIALID(). Most of us as developers are simply not aware of the raw size / fragmentation issues created by using UNIQUEIDENTIFIER as a primary key, because we are looking for convenience first and ignoring the details vis-a-vis performance.
As mentioned previously, the guid is not stored as a string,and is using numeric methods for comparison and indexing so that point is flat out wrong.
The paging issue is real, which is why COMB guids were instituted.
http://www.informit.com/articles/article.aspx?p=25862
Lack of education how to use a feature does not mean a feature is bad, or should be avoided, especially if YOU do know how to use the feature correctly.
that is why your "Im going to think twice..." comment makes no sense.
@Gaijin42,
It really depends on how many rows you are going to be dealing with. Even the NEWSEQUENTIALID method has more fragmentation than an int identity PK, and fragmentation has performance implications.
So for me to say that I need to think about whether I am going to use GUIDs as primar keys is, I think, completely logical.
I've got some replication to consider.
One way or the other, perhaps it's laziness -- I find that with storage being cheap, it was the clustering/indexing that was my main concern with GUIDs as a PK...
...but you can get away with that if you read another article from Kimberly Tripp...
http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
@Anonymous,
I read Tripp's article, I even quoted her in this post. But she does not provide a solution with NEWSEQUENTIALID() to get away from the space issue. You're still dealing with 16 byte wide ID's.
I only use GUIDs as PKs for UserIDs, for security reasons. I can also see using GUIDs for replication/merging databases, as you point out. But for pretty much everything else? Never. Nice to see some calculations and tests that verify my own reasons for eschewing GUIDs (I was always too lazy to test).
@jonf,
One of the biggest problems is that if you've developed a database schema using GUIDs and you're in production with millions of rows of data, it's almost impossible to change at that point.
Only a developer would create a guid pk field in a database! DBA's nearly always always frown upon there use because of the fragmentation and size issues. Great post for those that do not know evils of guids in the db!
@Jeff,
I think you just hit the nail right between the eyes on that score!
Post a Comment
Links to this post:
Create a Link
<< Home