Sunday, December 3, 2017

Working with XML Data Type

The XML data type, introduced in SQL Server 2005. XML data type can store either a complete XML or a fragment of XML document. Before xml data type the xml data was stored in varchar or text data type, that was proven to be poor in terms of querying and manipulating the XML data.

When to use XML data type:
  • If one wants to store the XML document in the database and retrieve and update the document as a whole - that is, if you never need to query or modify the individual XML components - you should consider using one of the large object data types like VARCHAR(MAX), NVARCHAR(MAX) or VARINARY(MAX).
  • If you want to preserve the original form such as legal documents or want to retain an exact textual copy, use large object storage.
  • The rest of the time, one should consider using XML data type which ensures that the data is well formed according to ISO standards, and supports fine-grained queries and modifications to specific elements and attributes within the XML. One can also index an XML column and associate its data with an XML schema collection in order to preserve its content and structure.
Limitations of using XML data type:
  • This cannot be used as a key in an index
  • Cannot exceed 2 GB size
  • XML data type cannot be compared, sorted, or used in GROUP BY
  • It does not support casting or converting to either text or ntext
  • It cannot be used as a parameter to any scalar, or built-in functions other than ISNULL, COALESCE and DATALENGTH
Data Manipulations on XML column: Below illustrations explains how to SELECT, INSERT, UPDATE and DELETE data in a XML column.

Below is the sample XML used to walk through the manipulations:
<Employees>
<Employee>
<ID>1</ID>
<Name>John Test</Name>
<Department>IT</Department>
<Address>
<Street1>4404, Main Road</Street1>
<Street2/>
<City>Reston</City>
<State>VA</State>
<Zip>20171</Zip>
</Address>
</Employee>
<Employee>
<ID>2</ID>
<Name>King George</Name>
<Department>Admin</Department>
<Address>
<Street1>3004, St. Patric Road</Street1>
<Street2>Apt D</Street2>
<City>Vienna</City>
<State>VA</State>
<Zip>20171</Zip>
</Address>
</Employee>
</Employees>
SELECTing Employee Name, Department and Address for the given Employee ID
SELECT
t.c.value('Name[1]', 'varchar(max)') as 'Employee Name',
t.c.value('Department[1]', 'varchar(max)') as 'Department',
t1.c1.value('Street1[1]', 'varchar(max)') + ', ' +
t1.c1.value('Street2[1]', 'varchar(max)') +
-- Conditionalyy append comma based on Street2 value, Append when Street2 available else don't
case when ((t1.c1.value('Street2[1]', 'varchar(max)') != '')) then (', ') else ('') end +
t1.c1.value('City[1]', 'varchar(max)') + t1.c1.value('State[1]', 'varchar(max)') + ' - ' +
t1.c1.value('Zip[1]', 'varchar(max)') as 'Address'
FROM
@XML.nodes('//Employees/Employee') as t(c)
CROSS APPLY t.c.nodes('Address') as t1(c1)
WHERE
t.c.value('ID[1]', 'int') = 1
UPDATing Street2 of Address for the given Employee ID
SET @employeeID = 2 -- Provide employee ID
-- Check for provided employee ID and if Street2 value is NULL
IF @Xml.exist('/Employees/Employee/ID[text()=sql:variable("@employeeID")]/../Address/Street2/text()') = @employeeID
BEGIN -- will execute for employeeID = 2
PRINT 'First Part'
SET
@Xml.modify('replace value of (/Employees/Employee/ID[text()=sql:variable("@employeeID")]/../Address/Street2/text())[1] with "Apt F"')
END
ELSE -- IF Street2 tag value is present execute below
BEGIN -- will execute for employeeID = 1
PRINT 'Second Part'
SET @Xml.modify('delete (/Employees/Employee/ID[text()=sql:variable("@employeeID")]/../Address/Street2)')
SET @Xml.modify('insert (<Street2>Apt A</Street2>) after (/Employees/Employee/ID[text()=sql:variable("@employeeID")]/../Address/Street1)[1]')
END
SELECT @Xml
DELETing Street2 of Address for the given Employee ID
SET @employeeID = 1 -- Provide employee ID
BEGIN
SET @Xml.modify('delete (/Employees/Employee/ID[text()=sql:variable("@employeeID")]/../Address/Street2)')
END
SELECT @Xml
INSERTing Street3 after Street2 of Address for the given Employee ID
SET @employeeID = 1 -- Provide employee ID
BEGIN
SET @Xml.modify('insert (<Street3>Near</Street3>) after (/Employees/Employee/ID[text()=sql:variable("@employeeID")]/../Address/Street2)[1]')
END
SELECT @Xml

Friday, December 1, 2017

Performance Tuning

LINQ to Entity is a ORM for querying and managing database. Below are some tips and tricks that we should keep an eye on while designing and querying database using entity framework ORM.

Cold Vs Warm Query Execution:
The very first time any query is made against a given model, the Entity Framework does lot of work behind the scenes to load and validate the model. This is called Cold query. These are slow in nature. Further queries against an already loaded model are known as Warm queries and are much faster.

Below section explains different ways of reducing performance cost of both cold and warm queries.
Caching
Entity Framework comes with below built-in cache mechanisms:

Object Caching: This is also called as First Level Caching and uses ObjectContext instance's ObjectStateManager to keep track of the objects in memory that have been retrieved using that instance.

The behavior of this caching is - when EF materializes an entity returned as resultant of a query, the ObjectContext will check if an entity with the same key has already been loaded into its ObjectStateManager, if found EF will include it in the results. Although EF will still issue the query against the database. This behavior will decrease the performance and can be bypassed to eliminate materializing the same entity multiple times as below:
  • Getting entities from the object cache using DbContext Find: Unlike a regular query, the Find method in DbSet will perform a search in memory before even issuing the query against the database. Find uses the primary key value to attempt to find an entity tracked by the context. NULL will be returned if the entity is not found in the context or database.

    Example of Find with auto-detect changes disabled:
    context.Configuration.AutoDetectChangesEnabled = false;
    var product = context.Products.Find(productId);
    context.configuration.AutoDetectChangesEnabled = true;

    Per MSDN, Find method can be used n below scenarios:
    • If the object is not cache the benefits of find are negated, but the syntax is still simpler than a query by key
    • If auto detect changes is enabled the cost of the Find method may increase by one order or magnitude, or even more depending on the complexity of your model and the amount in your object cache.
  • Issues when the object cache has many entities: If the Object Cache has a very large amount of entities loaded, it may effect operations like Add, Remove, Find,Entry and more. Especially DefectChanges will negatively affected by large object caches. This synchronizes the object graph with the object state manager and its performance will determine directly by the size of the object graph.

    Range methods like AddRange and RemoveRange should be used instead of iterating through the collection as the advantage of using range methods is that the cost of DetectChange is called only once for the entire set of entities as opposed to once per each added entity.
Query Plan Caching: Reusing the generated store command when a query is executed more than once. When the query is executed first time, it goes through the internal plan compiler to translate the conceptual query into the store command. When the query plan caching is enables, whenever the query is executed next time, it retrieves the store command from query plan cache for execution bypassing the plan compiler. The query plan caching is shared across ObjectContext instances within the same AppDomain.
  • Query plan cache is shared for all Entity SQL, LINQ to Entities and Compiled Query types
  • Query plan caching is enabled by default for Entity SQL queries and LINQ to Entity queries, whether they are executed through an EntityCommand or through an ObjectQuery.
  • Query plan cache can be disabled by setting the EnablePlanCaching property to false as below:
    var query = from customer in context.Customer where customer.CustomerId == id
    select new { customer.CustomerId, customer.Name };

    ObjectQuery oQuery = query as ObjectQuery;
    oQuery.EnablePlanCaching = false;
  • For parameterized queries, changing the parameter value will still use the cached query
Metadata Caching : This is essentially caching of type information and type-to-database mapping information across different connections to the same model and unique per AppDomain.

Results caching: This is also called as Second Level Caching which keeps the results of queries in a local cache. One can check this before running a query against the database/store. This caching is not directly supported by EF but can be implemented using wrapping provider.

MSDN wrapper samples can be found at: Wrappers
Avoid using One Single Entity Model:
Entity Model specifies a single unit of work and not the whole database. So to eliminate the wastage of space and performance degradation, create separate entity models for related DB objects. For ex: all logs, batch process etc.
Disable Change Tracking for Entity
Object Tracking is not required whenever we are handling the read only data. So, disable object tracking by using MergeOption as below:

TestContext context = new TestContext();
context.testTable.MergeOption = MergeOption.NoTracking;
Using Views
Whenever the ObjectContext is created first time in an application, entity framework creates a set of classes which are required to access database. These set of classes are called views and if the model is large then creating this view may delay the web application's response time for the first request. Try to create this view at compile time using T4 templates or EdmGen.exe command line tool to reduce the response time.
Fetch only Required Fields
Avoid fetching all the fields of a database table. For example, if a table contains 25 fields and the current scenario requires only 5 fields, then just fetch only the required 5 fields instead of all the 25 fields.

Good Practice
var customer = (from cust in dataContext.Customers select new {customer. CustomerID, customer.Name, customer.Address }). ToList ();
Bad Practice
var customer = (from cust in dataContext.Customers select cust).ToList();
Use Appropriate Collection
As LINQ is highly used to query ObjectContext in EF and LINQ has Var, IEnumerable, IQueryable, IList type collections for data manipulation, and each collection has its own purpose in regards to performance, beware of using these collections for data manipulation.
Use Compiled Query
Compiled queries are helpful in scenarios where we know that it is used very frequently to fetch records. Comparatively this query is slow while executing first time but boost the performance significantly in following calls. Query is compiled once and can be used any number of time. Compiled Query does not need to be recompiled even if the parameter of the query is being changed.

Steps to create Compiled Query:
  • Create a static class
  • Add System.Data.Linq namespace
  • Use LINQ's CompiledQuery class to create compiled LINQ query
Below example illustrates creating CompiledQuery with no parameters:
static class MyCompliedQueries
{
public static Func<DataClasses1DataContext ,IQueryable<Person>>
CompliedQueryForPesron = CompiledQuery.Compile((DataClasses1DataContext context)=>from c in context.Persons select c );
}
Above CompiledQuery can be called as below:
DataClasses1DataContext context = new DataClasses1DataContext();
var result = MyCompliedQueries.CompliedQueryForPesron(context);

Below example illustrates creating CompiledQuery with parameters:
static class MyCompliedQueries
{
public static Func<DataClasses1DataContext, int, IQueryable<Person>>
CompliedQueryForPesron = CompiledQuery.Compile(
(DataClasses1DataContext context, int personId)=>from c in context.Persons where c.PersonID == personId select c );

}
Above CompiledQuery can be called as below:
DataClasses1DataContext context = new DataClasses1DataContext();
var result = MyCompliedQueries.CompliedQueryForPesron(context, 9);
Retrieve only Required Records
To improve performance while retrieving records, retrieve only required no of records by using Take, While and Skip methods.

DataClasses1DataContext context = new DataClasses1DataContext();
int pageSize=10,startingPageIndex=2;

List lstCus = context.Persons.Take(pageSize).Skip(startingPageIndex * pageSize).ToList();
Avoid Contains
Try avoid using Contains of LINQ. This will be converted as "WHERE IN" in SQL which cause performance degrades.
Avoid Views
SQL Views degrade LIQ query performance. These are slow and impact performance badly. So avoid using Views in LINQ to Entities.
Debug and Optimize LINQ Query
With the help of LINQ Pad one can debug and optimize query. This tool can be used to optimize LINQ query.