A Primary Key Especially An Auto-generated Surrogate Key Is Sufficient


As those of you who watched my recent webinar Data Modeling Fundamentals With Sisense ElastiCube might recall, a primary key is a unique identifier given to a record in our database, which we can use when querying the database or in order to join multiple sources. This article will discuss the concept of surrogate keys and show some examples of when and how to apply them using simple SQL.

Before going further, it is worth separating two closely related terms. A primary key is a minimal set of columns that uniquely identifies each record; it is a logical construct, requiring that the key columns be unique and non-null, and creating a primary key (or a unique constraint) also causes the database to build an index to enforce it. A surrogate key is one particular kind of primary key: a sequential, system-generated number whose only job is to identify each record, and ideally a row has both a surrogate key and a natural identifier. The two are not free, though. On an insert or update the database must maintain every key it knows about, not just the primary key, so adding a surrogate key on top of an existing unique constraint over several columns means an extra column, an extra index, and extra lookups on every write.
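As a rough sketch of that trade-off (the table, column, and constraint names here are hypothetical, and the syntax is SQL Server style), a table carrying both an auto-generated surrogate key and a unique constraint on its natural columns forces every insert to maintain two indexes:

-- Hypothetical customer table: auto-generated surrogate key plus a unique natural key.
CREATE TABLE Customer (
    CustomerKey   INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key (one index)
    SourceSystem  VARCHAR(20)  NOT NULL,
    SourceId      VARCHAR(50)  NOT NULL,
    CustomerName  VARCHAR(200) NOT NULL,
    CONSTRAINT UQ_Customer_Source UNIQUE (SourceSystem, SourceId)  -- natural key still enforced (second index)
);

-- Every insert or update now has to check and maintain both indexes.
INSERT INTO Customer (SourceSystem, SourceId, CustomerName)
VALUES ('CRM', 'C-000123', 'Acme Corp');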

General Guidelines for Selecting Primary Keys

Before we dive into natural vs. surrogate keys, let’s recall four important rules to follow when selecting a primary key for your data model:

  1. The primary key must be unique for each record. A primary key with duplicates will lead to inaccurate queries with duplicated counts and totals. If two customers are assigned the same primary key, their sales activity will be unintentionally blended together. If the customer is accidentally duplicated, their sales activity will also be duplicated. Database architects refer to this as a loss of referential integrity.
  2. The primary key must apply uniform rules for all records. Whether your key is strictly numeric, alphanumeric, or a random system-generated value, each record's key must follow a consistent format. This format must hold regardless of whatever complexities exist in the business requirements. An inconsistent format can make data analysis difficult, especially in parent/child data relationships.
  3. The primary key must stand the test of time. A key based on contextual data at the present time may not have the same contextual meaning later. For example, if a customer ID key is based on customer name, what happens when a customer is acquired or reorganized? Changing key formats should be avoided at all costs. Changing keys requires changing every stored procedure that references the key in a JOIN or WHERE clause, as well as UPDATEs to every existing reference to the old key in all of your database tables.
  4. The primary key must be read-only. In order to stand the test of time, primary keys should never be edited. Edited primary keys can have typos (123123 vs 132123), varying formats based on the user’s preference (1 vs 000001), and allow for overwriting a previously deleted record. Never allow anyone to edit the value of primary keys.
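As a minimal illustration of the first and fourth rules (the table and columns below are hypothetical; the syntax is SQL Server style), a PRIMARY KEY constraint rejects duplicate and NULL values, and letting the database assign the value keeps the key effectively read-only:

-- Hypothetical orders table: the key is unique, non-null, and never hand-edited.
CREATE TABLE Orders (
    OrderId     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerId  INT           NOT NULL,
    OrderDate   DATE          NOT NULL,
    TotalAmount DECIMAL(12,2) NOT NULL
);

-- Forcing in an explicit key value that already exists fails with a
-- primary key violation, which is exactly the protection rule 1 describes:
-- SET IDENTITY_INSERT Orders ON;
-- INSERT INTO Orders (OrderId, CustomerId, OrderDate, TotalAmount)
-- VALUES (1, 42, '2019-07-14', 100.00);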

Selecting a Primary Key: Surrogate vs. Natural Keys

First, let’s go over the difference between these two forms of primary keys:

A natural key is a key that has contextual or business meaning (for example, in a table containing STORE, SALES, and DATE, we might use the DATE field as a natural key when joining with another table detailing inventory).

A natural key can be system-generated, but natural keys are at least partially determined by a manual process. Some natural keys are entirely manually generated. One of the most widely recognized examples of a natural key is a stock ticker symbol – e.g. MSFT, AAPL, and GOOGL. Natural keys serve well as primary keys when contextual meaning is important.
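As a minimal sketch of the DATE example above (all table and column names are hypothetical), joining two sources on a shared natural key needs no extra enrichment, because the key already exists in both tables:

-- Join daily sales to inventory on the natural key they share: store and date.
SELECT s.Store,
       s.Sales,
       i.UnitsOnHand
FROM DailySales s
JOIN DailyInventory i
  ON i.Store = s.Store
 AND i.Date  = s.Date;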

A surrogate key is a key which does not have any contextual or business meaning. It is manufactured “artificially” and purely for the purposes of data analysis. The most frequently used form of surrogate key is an increasing sequential integer or “counter” value (e.g. 1, 2, 3). Surrogate keys can also be built from the current system date/time stamp, or from a random alphanumeric string.
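A minimal sketch of those three variants (hypothetical table; SQL Server style syntax, where MySQL would use AUTO_INCREMENT and PostgreSQL a sequence or identity column instead of IDENTITY):

-- None of these key values carries any business meaning.
CREATE TABLE SalesRecord (
    SalesKey  INT IDENTITY(1,1) PRIMARY KEY,     -- sequential "counter" surrogate key: 1, 2, 3, ...
    LoadStamp DATETIME2 DEFAULT SYSDATETIME(),   -- system date/time stamp, sometimes folded into keys
    RowGuid   UNIQUEIDENTIFIER DEFAULT NEWID(),  -- random alphanumeric alternative
    Amount    DECIMAL(12,2) NOT NULL
);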



When should you stick to natural keys in your data model?

The main advantage of natural keys is in their simplicity and in the fact that the data maintains its original context. They will often be (relatively) easy to recognize to people viewing the data, and relying on natural keys reduces the need to enrich the data using custom SQL. Additionally:

  • Natural keys are great for multiple data types in the database. Natural keys allow the user to easily identify the data type from the key, even when multiple data types use similar key formats. Financial databases frequently format their keys using a natural and sequential key together.

    For example, three records might all share the sequential ID 123, but a natural-key prefix (such as ER-123 for an expense report) lets the user immediately identify which type of record each one is.

  • Natural keys work well when connecting two systems with two different primary key formats. A shared natural value, such as a ticker symbol or a transaction date, can serve as the common key for joining records from two systems whose internal IDs are otherwise incompatible.

  • Natural keys make for an easier-to-understand GUI. A customer ID such as GOOGL is easy for a user to recognize (for instance, you likely knew this stock ticker symbol belongs to Google). Easier recognition also allows for easier search.

Drawbacks of using natural keys

While it might be tempting and initially easier to rely on existing natural keys, this could prove problematic when scaling the data model, or in a more complex environment, which we will demonstrate using an example of stock tickers:

  • Natural keys do not apply uniform rules for each record. Designators or variables in the natural key make the key difficult to query and understand after the fact. For example, stock ticker symbols of preferred shares have a multitude of designators, including P, PR, and /PR. Trying to query for the designator P (SELECT * FROM stock_quotes WHERE stock_ticker_symbol LIKE '%P') would return all results where the stock ticker symbol ends in P, regardless of whether the symbol is actually preferred stock or not (see the sketch after this list).
  • Natural keys do not stand the test of time. Symbols that once carried business meaning can become meaningless, or take on a different meaning, later on. For example, the symbols GOOG and GOOGL do not accurately represent the reorganization of the company from Google to Alphabet.
  • Natural keys can be easily confused with each other. Sticking with the previous example – when Twitter was ready to launch its IPO under the ticker TWTR, many investors bought shares of a defunct electronics company named Tweeter, trading under the ticker TWTRQ. Because TWTR and TWTRQ share the same first four letters, many investors unintentionally invested in the wrong stock. Tweeter later changed its ticker symbol to THEGQ, which could in turn be confused with GQ Magazine (a privately held company under Conde Nast).
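To make the first drawback concrete, here is the ambiguous query from above in full (the stock_quotes table and its columns are hypothetical):

-- Intended to return preferred shares, but actually returns every symbol ending in P,
-- including ordinary common-stock tickers such as SNAP.
SELECT *
FROM stock_quotes
WHERE stock_ticker_symbol LIKE '%P';

-- With an explicit attribute instead of a key designator, the intent is unambiguous:
-- SELECT * FROM stock_quotes WHERE share_class = 'Preferred';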

Advantages of using surrogate keys

As mentioned, a surrogate key sacrifices some of the original context of the data. However, it can be extremely useful for analytical purposes for the following reasons:

  • Surrogate keys are unique. Because surrogate keys are system-generated, it is impossible for the system to create and store a duplicate value.
  • Surrogate keys apply uniform rules to all records. The surrogate key value is the result of a program, which creates the system-generated value. Any key created as a result of a program will apply uniform rules for each record.
  • Surrogate keys stand the test of time. Because surrogate keys lack any context or business meaning, there will be no need to change the key in the future.
  • Surrogate keys allow for unlimited values. Sequential, timestamp, and random keys have no practical limits to unique combinations.

Combining Natural and Surrogate Keys

Certain business scenarios might require keeping the natural key intact as a means for users to interact with the database. In these cases:

  • If a natural key is recommended, use a surrogate key field as the primary key, and a natural key as a foreign key. While users may interact with the natural key, the database can still have surrogate keys outside of the users’ view, with no interruption to user experience.
  • If a natural key must be used without an additional surrogate key field, be sure to combine it with a surrogate element. In our financial database example, an expense report key (ER-123) uses a natural prefix in conjunction with a surrogate sequential number. This format prevents many of the natural-key side effects listed above (see the sketch below).
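A minimal sketch of the second pattern (hypothetical expense-report table, SQL Server style syntax): the stored identifier combines a natural prefix with a database-generated sequence, so keys such as ER-123 keep their business meaning while the surrogate element guarantees uniqueness.

-- Combined key: natural prefix 'ER-' plus a surrogate sequential number.
CREATE TABLE ExpenseReport (
    ExpenseSeq  INT IDENTITY(1,1) NOT NULL,                     -- surrogate element
    ExpenseId   AS ('ER-' + CAST(ExpenseSeq AS VARCHAR(10))),   -- combined key, e.g. ER-123
    EmployeeId  INT  NOT NULL,
    SubmittedOn DATE NOT NULL,
    CONSTRAINT PK_ExpenseReport PRIMARY KEY (ExpenseSeq)
);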

An Example of Adding a Surrogate Key Using Custom SQL

In the following example, we will look at a table containing historical data about product prices. By using a custom SQL expression in the Sisense ElastiCube Manager, we create the surrogate key Prod_Date_Key, which in this case is built by combining the other fields into a single, unique identifier that can easily be queried later.

Original:

SQL used to add surrogate key:

SELECT DISTINCT
tostring(PH.ProductID)+'_'+tostring(getyear(AD.Date))+'-'+tostring(getmonth(AD.Date))+'-'+tostring(getday(AD.Date)) AS Prod_Date_Key,
AD.Date,
PH.ProductID,
PH.ListPrice
FROM [ProductListPriceHistory] PH
JOIN [AllDates] AD ON AD.Date BETWEEN PH.StartDate AND PH.EndDate

Result:

Want to master data modeling? Watch our on-demand webinar and learn the fundamental skills every analyst should have.

This article explains how to implement surrogate keys from a logical dimensional model in a physical DBMS. There are clear-cut guidelines on this matter from Ralph Kimball and his group, but some people still manage to make a mess of it, so it seems worth explaining in finer detail. This article takes all its concepts from Ralph Kimball's writing and only aims to explain them further for an audience that may still have some doubts.

-What is a Surrogate Key?

By definition, a surrogate key is a meaningless key assigned in the data warehouse to uniquely identify the rows of a dimension. For example, consider the typical star schema of a logical dimensional model.

In a star schema, all the dimensions are connected to a central fact table in a one-to-many relationship. This relationship is identified by a surrogate key, which acts as the primary key for its dimension (i.e. one surrogate key for each dimension, such as Geography, Vendor, Product, etc.) and is referenced as a foreign key in the fact table.
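As a minimal sketch of that logical relationship (hypothetical product dimension and sales fact; note that no physical key or constraint is declared, which is the point of the next section):

-- Dimension: surrogate key assigned by the ETL process, natural key kept as an attribute.
CREATE TABLE DimProduct (
    ProductKey   INT          NOT NULL,   -- surrogate key (logical primary key)
    ProductId    VARCHAR(30)  NOT NULL,   -- business/natural key from the source system
    ProductName  VARCHAR(200) NOT NULL
);

-- Fact: the same surrogate key acts as the logical foreign key.
CREATE TABLE FactSales (
    ProductKey   INT           NOT NULL,
    DateKey      INT           NOT NULL,
    SalesAmount  DECIMAL(12,2) NOT NULL
);

-- The one-to-many relationship is exercised at query time:
SELECT d.ProductName, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimProduct d ON d.ProductKey = f.ProductKey
GROUP BY d.ProductName;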

-Surrogate Key in Physical DBMS Layer

While the surrogate key is used to establish the one-to-many relationship between fact and dimension tables, acting as the primary-key/foreign-key identifier in the dimension and fact table respectively, keep in mind that this is a logical relationship: it is not meant to be implemented in the database layer using the “Primary Key” mechanism the database provides. As a rule of thumb, we do not implement any keys or constraints in the physical layer, and they are not required in data warehousing systems.

In DW systems, and in our dimensional models in particular, physically enforced keys are not relevant. They act as a burden on the ETL load, are a cause of load failures, and at times bring the data warehouse to its knees. As Ralph Kimball says (and as I think anyone with common sense can deduce), keys and constraints enforced at the physical level add no value in a DW, because there they are process-oriented objects, unlike in operational systems, where they are entity-identifier objects.

So the first point is that we don't need to enforce a primary key via the database in an OLAP environment, since it is a controlled environment loaded only through ETL and therefore not prone to random updates like an OLTP environment. The logic to generate and maintain the surrogate keys is already part of the ETL layer.
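As one possible sketch of that ETL logic (hypothetical staging and dimension tables; the MAX-plus-ROW_NUMBER pattern is just one common way to do it), the load itself assigns the next surrogate key values for rows that are new to the dimension:

-- Assign surrogate keys in the ETL layer, continuing from the current maximum key.
INSERT INTO DimProduct (ProductKey, ProductId, ProductName)
SELECT COALESCE((SELECT MAX(ProductKey) FROM DimProduct), 0)
         + ROW_NUMBER() OVER (ORDER BY s.ProductId) AS ProductKey,
       s.ProductId,
       s.ProductName
FROM StagingProduct s
LEFT JOIN DimProduct d ON d.ProductId = s.ProductId
WHERE d.ProductId IS NULL;    -- only source rows not already present in the dimension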

Apart from this, surrogate keys enforced as primary keys at the database layer come with several serious drawbacks:

– Surrogate keys are maintained using SCD (slowly changing dimension) logic, which determines whether a dimension row's key needs to be updated, a new value added, or the old value retained. This depends on whether the column is SCD type 1, 2, or 3, which in turn is driven by business requirements. In such a scenario, an auto-generated primary key implemented at the database level for a surrogate key can be a serious troublemaker: it can throw the whole SCD logic out the window and produce unexpected results (see the sketch after these points).

– Further, a surrogate key performs the function of a primary key logically only within the given context of a data mart. If we re-use the same dimension as a shrunken dimension in another data mart where the grain is different, it will not be unique at that level.

– Apart from this, any database-level operation adds a burden to the ETL load by reducing its performance. More database-level keys mean more I/O, more CPU, and less time for the ETL load to do its work. This can become a nightmare if your data warehouse contains a sizable chunk of data, and it is commonly a cause of load failures as well.
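To illustrate the first point, here is a minimal SCD Type 2 sketch (hypothetical customer dimension and values): the ETL expires the current row and inserts a new one with a new ETL-assigned surrogate key, while the business key repeats. A database that insisted on generating or uniquely constraining these keys itself would fight exactly this logic.

-- The business key C-000123 now appears on two rows, told apart by
-- ETL-assigned surrogate keys and effective dates.
UPDATE DimCustomer
SET    RowEndDate = '2019-07-14', IsCurrent = 0
WHERE  CustomerId = 'C-000123' AND IsCurrent = 1;

INSERT INTO DimCustomer
    (CustomerKey, CustomerId, CustomerName, Region, RowStartDate, RowEndDate, IsCurrent)
VALUES
    (50001, 'C-000123', 'Acme Corp', 'APAC', '2019-07-14', '9999-12-31', 1);  -- new surrogate key from the ETL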

-Foreign Key Constraints (RI) in DBMS Layer

While surrogate keys “logically” act as foreign keys in the fact table, these foreign-key constraints (referential integrity) are also not required in the DBMS layer. As Ralph Kimball notes, this too proves to be more of a problem than a value addition.

To explain this further, here are a few more points:
– Data is loaded into the fact table only after it has been verified, cleaned, and loaded into the dimension tables, so implementing RI at the fact-table level serves no purpose.
– A fact table is a voluminous table with millions of rows, and implementing RI makes it prone to ETL load failures, because enforcing foreign keys incurs a huge cost in CPU and I/O on the server.
– Some people choose to declare foreign-key constraints on the fact table but NOT enforce them. This is normally done on DB2 systems, which are supposed to use this information for the aggregate navigator with MQTs (materialized query tables).

This aspect is normally misleading, because MQTs are not used automatically. Reports usually go through a BI tool, and whichever MQTs we create are accessed through the BI tool, whose query rewrite handles this well. In fact, in almost all cases the MQTs we use are built for a specific report, so the MQT is queried directly anyway. Further, since MQTs hold their own data once created, querying them is isolated from the source tables. When MQTs are implemented through the BI tool, the aggregate navigator is not used at all.

Apart from this, DB2's aggregate-navigator optimization is not known to give good results, and we can usually get better results by directing queries to the aggregates explicitly. On top of that, MQTs are only known to make matters worse: in our observation, MQT refreshes are the single most common reason for ETL failures.

-Political Challenge from DBA team

While this approach is correct in theory, the biggest challenge in moving forward with it is not technical but political: pushback from the DBA team, which Ralph Kimball also hinted at. A DBA team that is not familiar with data warehousing systems can be a nightmare to work with in such cases. A few points can help in explaining to them why implementing keys and constraints at the DBMS level is superfluous.


– Implementing keys and constraints at the physical layer duplicates the effort of the ETL team; this is already taken care of in the ETL code.
– A good ETL process means no garbage or random updates reach the data warehouse, so enforcing these things at the physical level amounts to saying our ETL jobs are poor.
– Run a demo of the ETL load with keys and constraints and without them, to show why they are a performance killer (a possible sketch follows below).
– Finally, explain that OLAP and OLTP systems are poles apart, and that things work very differently in the OLAP world.
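One possible shape for that demo (SQL Server syntax; the fact and staging tables are hypothetical): time the same load with foreign-key checking on and off and compare the CPU and elapsed times reported per statement.

-- Show CPU and elapsed time for each statement.
SET STATISTICS TIME ON;

-- Run 1: foreign-key constraints checked for every row.
INSERT INTO FactSales (ProductKey, DateKey, SalesAmount)
SELECT ProductKey, DateKey, SalesAmount FROM StagingSales;

-- Run 2: disable foreign-key (and check) constraints for the load, then re-enable.
-- (Reset FactSales between runs for a fair comparison.)
ALTER TABLE FactSales NOCHECK CONSTRAINT ALL;
INSERT INTO FactSales (ProductKey, DateKey, SalesAmount)
SELECT ProductKey, DateKey, SalesAmount FROM StagingSales;
ALTER TABLE FactSales CHECK CONSTRAINT ALL;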