Referential Integrity

indices and primary keys

Difference between an Index and a Primary Key

Indexes can be used for fast retrieval based on other columns. A primary key must be unique and cannot be nullable. A primary key is always indexed, meaning that a primary key is also always an index key. Keys are frequently good candidates for indexing and some DBMSs automatically create indexes for keys, but that doesn’t have to be so. A key (minimal super key) is a set of attributes, the values of which are unique for every tuple. An index is a performance optimization feature that enables data to be accessed faster.

Oracle Database enforces a UNIQUE key or PRIMARY KEY integrity constraint on a table by creating a unique index on the unique key or primary key. This index is automatically created by the database when the constraint is enabled. You can create indexes explicitly (outside of integrity constraints) using the SQL statement CREATE INDEX. Indexes can be unique or non-unique. Unique indexes guarantee that no two rows of a table have duplicate values in the key column (or columns). Non-unique indexes do not impose this restriction on the column values. Use the CREATE UNIQUE INDEX statement to create a unique index.

The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.

Difference between Key, Primary Key, Unique Key and Index in MySQL

Primary key does not allow null value but unique key allows null value. We can declare only one primary key in a table but a table can have multiple unique key(column assign). Primary key makes the table row unique. Unique key makes the table column in a table row unique. An index is typically created on columns used in JOIN, WHERE, and ORDER BY clauses. Primary keys and unique keys are similar. A primary key is a column, or a combination of columns, that can uniquely identify a row. It is a special case of unique key. Also note that columns defined as primary keys or unique keys are automatically indexed in MySQL.

foreign keys

You should always enforce referential integrity by using normal foreign keys. We use referential integrity to validate data. But does that really help or hinder performance? The answers to these questions are Yes, they can and No, they don’t. Foreign key constraint improve performance at the time of reading data but at the same time it slows down the performance at the time of inserting / modifying / deleting data. In case of reading the query, the optimizer can use foreign key constraints to create more efficient query plans as foreign key constraints are pre declared rules. This usually involves skipping some part of the query plan because for example the optimizer can see that because of a foreign key constraint, it is unnecessary to execute that particular part of the plan. run the following query from Query menu and include Actual Execution plan.

create table Employee(EmployeeID int primary key)
create table EmployeeOrder(OrderID int primary key, EmployeeID int not null constraint fkOrderCust references Employee(EmployeeID))
Select * from EmployeeOrder eo
where exists
(
  select * From Employee e
  where eo.EmployeeID = e.EmployeeID
)

You can notice that ~optimizer did not access Employee table~ and is not shown in execution plan. This is because the optimizer knows that it is not necessary to execute the EXISTS operator in this query because the foreign key constraint(Trusted constraint) requires all EmployeeOrders to refer to an existing Employee, which is what the WHERE clause checks.

Yes, having foreign key constraints in place can improve query performance. There are various transforms that are open to the optimizer when appropriate foreign key constraints exist that are not generally available. For example, if you were to join A and B but only select data from B, the optimizer could eliminate A from the query plan entirely if there was a foreign key constraint in place (this sort of thing comes in very handy when you’ve got useful views that join in more tables than your current query strictly needs because you don’t have to trade the performance costs of the extra joins against the code reuse from using an existing view). They also come in handy when you’re doing things like using things like query rewrite to rewrite a query to use a materialized view in a data warehouse/ DSS type system.

Foreign Keys are a referential integrity tool, not a performance tool. At least in SQL Server, the creation of an FK does not create an associated index, and you should create indexes on all FK fields to improve look up times.

SQL Server 7.0 / 2000 came with ‘index tuning wizard’ this functionality has been around for a long time. I’d recommend having a look at select * from sys.dm_db_missing_index_details. It tells you which indexes are ‘missing’, it’s trivial to look in that table and then create indexes.

pro con
They are already implemented within the DBMS You are duplicating the work that has already been done.
They are declarative, self-documenting and “obvious”. It’s imperative, probably “buried” deep in your application source code, and harder to maintain.
They cannot be bypassed (unless explicitly disabled or dropped). A single client application that has a bug can break the referential integrity (and corrupt the data).
They are correct. You are likely to implement them incorrectly in your application code. It looks simple from the outset, but in a concurrent environment, it is easy to introduce race conditions.
They are fast. Even if you have implemented them correctly, you probably used some form of locking to avoid race conditions, which is likely to be slower / less scalable than specially optimized FKs built into the DBMS.
They support cascading referential actions (such as ON DELETE CASCADE). You have to implement cascading yourself.
The DBMS knows the data is related, allowing it to find a better query plan in some cases. The DBMS doesn’t know the data is related, which may produce sub-optimal query plan.
If you are using an ORM tool, it can automatically generate references between objects. You may need to do more manual work in your ORMt ool of choice.

inner platform effect

In the database world, developers are sometimes tempted to bypass the RDBMS, for example by storing everything in one big table with three columns labelled entity ID, key, and value. While this entity-attribute-value model allows the developer to break out from the structure imposed by an SQL database, it loses out on all the benefits, since all of the work that could be done efficiently by the RDBMS is forced onto the application instead. Queries become much more convoluted, the indexes and query optimizer can no longer work effectively, and data validity constraints are not enforced. Performance and maintainability can be extremely poor.