MSBI (SSIS/SSRS/SSAS) Online Training

Thursday, October 25, 2012

MCITP 70-452 Passed

Hi All,

I have successfully completed the MCITP 70-452 certification.


Exam Profile:
MCITP 70-452: Designing a Business Intelligence Infrastructure Using Microsoft SQL Server 2008
http://www.microsoft.com/learning/en/us/exam.aspx?id=70-452#tab1



Regards,

Ravindra

Sunday, October 21, 2012

MS SSAS: Best Practices – Cube & MDX





Cube Design Best Practices - Dimensions


• Consolidate multiple hierarchies into a single dimension (unless they are related via the fact table)

• Avoid ROLAP storage mode

• Use role playing dimensions (e.g. OrderDate, BillDate, ShipDate) - avoids multiple physical copies

• Use parent-child dimensions prudently

o No aggregation support

• Set Materialized = true on reference dimensions

• Use many-to-many dimensions prudently

o Slower than regular dimensions, but faster than calculations

o Intermediate measure group must be “small” relative to primary measure group



Cube Design Best Practices – Attributes/Hierarchies

• Define all possible attribute relationships!

• Remove redundant attribute relationships

• Mark attribute relationships as rigid where appropriate

• Use integer (or numeric) key columns

• Set AttributeHierarchyEnabled to false for attributes not used for navigation (e.g. Phone#, Address)

• Set AttributeHierarchyOptimizedState to NotOptimized for infrequently used attributes

• Set AttributeHierarchyOrdered to false if the order of members returned by queries is not important

• Use natural hierarchies where possible



Cube Design Best Practices – Measures

• Use smallest numeric data type possible

• Use semi-additive aggregate functions instead of MDX calculations to achieve same behavior

• Put distinct count measures into separate measure group (BIDS does this automatically)

• Avoid string source column for distinct count measures



Cube Design Best Practices – OLAP Partitions

• No more than 20M rows per partition

• Specify partition slice

o Optional for MOLAP – server auto-detects the slice and validates it against the user-specified slice (if any)

o Must be specified for ROLAP

• Manage storage settings by usage patterns

o Frequently queried → MOLAP with lots of aggregations

o Periodically queried → MOLAP with few or no aggregations

o Historical → ROLAP with no aggregations

• Alternate disk drive - use multiple controllers to avoid I/O contention

• Remote partitions for scale out – VLDB



Cube Design Best Practices – Aggregations

• Define all possible attribute relationships

• Set accurate attribute member counts and fact table counts

• Set AggregationUsage to guide aggregation designer

o Set rarely queried attributes to None

o Set commonly queried attributes to Unrestricted

• Do not build too many aggregations

o In the 100s, not 1000s

• Do not build aggregations larger than 30% of fact table size (agg design algorithm doesn’t)



MDX Query Design Best Practices

• Use calculated members instead of calc cells where possible

• Use .MemberValue for calculations on numeric attributes

o Filter(Customer.members, Salary.MemberValue > 100000)

• Avoid using CalculationPassValue

o Rely on auto recursion resolution using scopes and assignments

• Avoid redundant use of .CurrentMember and .Value

o (Time.CurrentMember.PrevMember, Measures.CurrentMember).Value can be replaced with Time.PrevMember (see the sketch after this list)

• Avoid LinkMember, StrToSet, StrToMember, StrToValue

• Replace simple calculations with computed columns in DSV

o Calculation done at processing time is always better

• Many more at:

o http://sqljunkies.com/weblog/mosha

o http://sqlserveranalysisservices.com
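
To make the .CurrentMember bullet concrete, here is a minimal sketch of the simplification written as a full query. The cube, hierarchy, and measure names ([Adventure Works], [Date].[Calendar], [Measures].[Sales Amount]) are illustrative assumptions, not part of the original slides:

-- Hypothetical Adventure Works-style names; the tuple simplification is the point.
WITH MEMBER [Measures].[Prior Period Sales] AS
    -- verbose form: (Time.CurrentMember.PrevMember, Measures.CurrentMember).Value
    -- simplified equivalent:
    ( [Date].[Calendar].PrevMember, [Measures].[Sales Amount] )
SELECT { [Measures].[Sales Amount], [Measures].[Prior Period Sales] } ON COLUMNS,
       [Date].[Calendar].[Month].Members ON ROWS
FROM [Adventure Works]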



Source:

http://blogs.msdn.com/b/jbasilico/archive/2008/04/01/analysis-services-2005-design-best-practices-cubes-mdx.aspx




Monday, October 8, 2012

CDC in SQL Server 2008/2012

Source tables change over time. A data mart or data warehouse that is based on those tables needs to reflect these changes. However, a process that periodically copies a snapshot of the entire source consumes too much time and resources. Alternate approaches that include timestamp columns, triggers, or complex queries often hurt performance and increase complexity. What is needed is a reliable stream of change data that is structured so that it can easily be applied by consumers to target representations of the data. Change data capture in SQL Server provides this solution.




The change data capture feature of the Database Engine captures insert, update, and delete activity applied to SQL Server tables, and makes the details of the changes available in an easily-consumed, relational format. The change tables used by change data capture contain columns that mirror the column structure of the tracked source tables, along with the metadata needed to understand the changes that have occurred on a row by row basis.
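
As a minimal T-SQL sketch (the database and table names are placeholders), enabling CDC and reading the captured changes looks like this; sys.sp_cdc_enable_db, sys.sp_cdc_enable_table, and the generated cdc.fn_cdc_get_all_changes_<capture_instance> function are the documented entry points, and SQL Server Agent must be running for the capture job to harvest changes:

-- Enable CDC at the database level (placeholder database name).
USE SalesDB;
EXEC sys.sp_cdc_enable_db;

-- Track a source table; this creates the capture instance dbo_Orders
-- and the change table cdc.dbo_Orders_CT.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;   -- NULL = no gating role

-- Read every change captured so far for that instance.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');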

Source:

http://msdn.microsoft.com/en-us/library/bb895315.aspx

http://www.mattmasson.com/index.php/2011/12/cdc-in-ssis-for-sql-server-2012-2/



Best Regards,

Ravindra

SQL Server 2012 New Features



Quite simply, SQL Server 2012 is going to revolutionize the way we think about availability and recovery. Today, when we think about these concepts, we consider an individual database (log shipping, replication, mirroring) or an entire SQL Server instance (failover clustering). In SQL Server 2012, in addition to Failover Clustering enhancements, we’ll be able to treat a group of databases as an entity – a much more common requirement with today’s complex, multi-database applications.


You may download a free ebook from the Microsoft site below:

http://download.microsoft.com/download/F/F/6/FF62CAE0-CE38-4228-9025-FBF729312698/Microsoft_Press_eBook_Introducing_Microsoft_SQL_Server_2012_PDF.pdf



Microsoft has introduced SQL Server 2012 to the world, and it's time for IT professionals to come up to speed on what's new in this highly anticipated version of SQL Server.

1. AlwaysOn Availability Groups -- This feature takes database mirroring to a whole new level. With AlwaysOn, users will be able to fail over multiple databases in groups instead of individually. Also, secondary copies will be readable, and can be used for database backups. The big win is that your DR environment no longer needs to sit idle.
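
To give a feel for the syntax, here is a minimal sketch (the instance, endpoint, and database names are made up, and it assumes the HADR feature and database mirroring endpoints are already configured on both nodes; the statement runs on the primary, after which each secondary joins the group):

-- Minimal availability group: one database, two synchronous replicas,
-- with the secondary opened up for read-only workloads.
CREATE AVAILABILITY GROUP SalesAG
FOR DATABASE SalesDB
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));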

2. Windows Server Core Support -- If you don't know what Windows Server Core is, you may want to come up to speed before Windows 8 (MS is making a push back to the command line for server products). Core is the GUI-less version of Windows that uses the command prompt and PowerShell for user interaction. It has a much lower footprint (50% less memory and disk space utilization), requires fewer patches, and is more secure than the full install. Starting with SQL 2012, it is supported for SQL Server.

3. Columnstore Indexes -- This is a cool new feature that is completely unique to SQL Server. Columnstore indexes are a special type of read-only index designed to be used with data warehouse queries. Basically, data is grouped and stored in a flat, compressed column index, greatly reducing I/O and memory utilization on large queries.
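
As a minimal sketch with a hypothetical fact table (in SQL 2012 only the nonclustered form exists, and creating it makes the table read-only until the index is dropped or disabled):

-- Hypothetical warehouse fact table; include the columns large scans touch.
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_ColumnStore
ON dbo.FactSales (OrderDateKey, ProductKey, StoreKey, SalesAmount);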

4. User-Defined Server Roles -- DBAs have always had the ability to create custom database roles, but never server-wide ones. For example, if the DBA wanted to give a development team read/write access to every database on a shared server, traditionally the only ways to do it were either manually or using undocumented procedures, neither of which was a good solution. Now the DBA can create a role which has read/write access on every DB on the server, or any other custom server-wide role.
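
A short sketch (the role, login, and permissions are illustrative; the grants shown are server-scoped permissions, which is what user-defined server roles can hold):

-- Create a custom server role and add a login to it (placeholder names).
CREATE SERVER ROLE DevTeam;
ALTER SERVER ROLE DevTeam ADD MEMBER [CONTOSO\dev.lead];

-- Grant server-scoped permissions to the role.
GRANT VIEW SERVER STATE TO DevTeam;
GRANT ALTER ANY DATABASE TO DevTeam;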

5. Enhanced Auditing Features -- Audit is now available in all editions of SQL Server. Additionally, users can define custom audit specifications to write custom events into the audit log. New filtering features give greater flexibility in choosing which events to write to the log.
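
As a sketch (the file path, database, and object names are assumed), a server audit plus a database audit specification that writes SELECT activity on one table to the audit log:

-- Server-level audit target (placeholder path).
USE master;
CREATE SERVER AUDIT OrdersAudit TO FILE (FILEPATH = N'D:\AuditLogs\');
ALTER SERVER AUDIT OrdersAudit WITH (STATE = ON);

-- Database-level specification hooked to that audit.
USE SalesDB;
CREATE DATABASE AUDIT SPECIFICATION OrdersReadAudit
FOR SERVER AUDIT OrdersAudit
ADD (SELECT ON OBJECT::dbo.Orders BY public)
WITH (STATE = ON);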

6. BI Semantic Model -- This is replacing the Analysis Services Unified Dimensional Model (or cubes, as most people referred to them). It's a hybrid model in which one data model supports all BI experiences in SQL Server. Additionally, this will allow for some really neat text infographics.

7. Sequence Objects -- For those folks who have worked with Oracle, this has been a long-requested feature. A sequence is simply an object that acts as a counter -- a good example of its use would be incrementing values in a table, based on a trigger. SQL has always had similar functionality with identity columns, but now this is a discrete object.
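
A minimal sketch (object names assumed):

-- A sequence is a first-class counter object, independent of any table.
CREATE SEQUENCE dbo.OrderNumber AS int START WITH 1000 INCREMENT BY 1;

-- Pull values anywhere, e.g. in an INSERT statement or a trigger body.
SELECT NEXT VALUE FOR dbo.OrderNumber;   -- 1000
SELECT NEXT VALUE FOR dbo.OrderNumber;   -- 1001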

8. Enhanced PowerShell Support -- Windows and SQL Server admins should definitely start brushing up on their PowerShell scripting skills. Microsoft is driving a lot of development effort into instrumenting all of their server-based products with PowerShell. SQL 2008 gave DBAs some exposure to it, but there are many more cmdlets in SQL 2012.

9. Distributed Replay -- Once again, this is an answer to a feature that Oracle released (Real Application Testing). However (and in my opinion this is where the real value proposition of SQL Server lies), in Oracle it is a (very expensive) cost option for Enterprise Edition. With SQL, when you buy your licenses for Enterprise Edition, you get everything. Distributed Replay allows you to capture a workload on a production server and replay it on another machine. This way, changes in underlying schemas, support packs, or hardware can be tested under production conditions.

10. PowerView -- You may have heard of this under the name "Project Crescent". It is a fairly powerful self-service BI toolkit that allows users to create mashups of BI reports from all over the enterprise.

11. SQL Azure Enhancements -- These don't really go directly with the release of SQL 2012, but Microsoft is making some key enhancements to SQL Azure. Reporting Services for Azure will be available, along with backup to the Windows Azure data store, which is a huge enhancement. The maximum size of an Azure database is now up to 150 GB. Also, Azure Data Sync allows a better hybrid model of cloud and on-premises solutions.

12. Big Data Support -- I saved the biggest for last. At the PASS (Professional Association for SQL Server) conference last year, Microsoft announced a partnership with Hadoop provider Cloudera. One part of this involves MS releasing an ODBC driver for SQL Server that will run on a Linux platform. Additionally, Microsoft is building connectors for Hadoop, which is an extremely popular NoSQL platform. With this announcement, Microsoft has made a clear move into this very rapidly growing space.

Source:


http://mcpmag.com/articles/2012/03/14/top-12-features-of-sql-server-2012.aspx



Best Regards,

Ravindra


Tuesday, June 5, 2012

SQL Server: Differences Between CTEs, Temp Tables, and Table Variables


It depends on the circumstances.

Advantages of CTE:

1. You'll really have to performance test - there is no yes/no answer. As the post by Andy Living linked above explains, a CTE is just shorthand for a query or subquery.
2. If you are calling it twice or more in the same function, you might get better performance if you fill a table variable and then join to/select from that. However, since table variables take up space somewhere and don't have indexes/statistics (with the exception of any declared primary key on the table variable), there's no way of saying which will be faster.
3. Both have costs and savings, and which is best depends on the data they pull in and what they do with it. I've been in your situation, and after testing for speed under various conditions, some functions ended up using CTEs and others used table variables.

Advantages of temp tables:

1. The table "exists" - that is, it's materialized as a table, at least in memory, which contains the result set and can be reused.
2. In some cases, performance can be improved or blocking reduced when you have to perform some elaborate transformation on the data - for example, if you want to fetch a 'snapshot' set of rows out of a base table that is busy, and then do some complicated calculation on that set, there can be less contention if you get the rows out of the base table and unlock it as quickly as possible, then do the work independently. In some cases the overhead of a real temp table is small relative to the advantage in concurrency.

Advantages of derived tables:

1. A derived table is part of a larger, single query, and will be optimized in the context of the rest of the query. This can be an advantage if the query optimization helps performance (it usually does, with some exceptions). Example: if you populate a temp table and then consume the results in a second query, you are in effect tying the database engine to one execution method (run the first query in its entirety, save the whole result, run the second query), whereas with a derived table the optimizer might be able to find a faster execution method or access path.

2. A derived table only "exists" in terms of the query execution plan - it's purely a logical construct. There really is no table. (A sketch contrasting the three options follows below.)
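
To make the three options concrete, here is a minimal T-SQL sketch of the same aggregation written each way (table and column names are invented):

-- 1) CTE / derived table: purely logical, optimized as part of the single query.
WITH RecentOrders AS (
    SELECT CustomerID, TotalDue
    FROM dbo.Orders
    WHERE OrderDate >= '20120101'
)
SELECT CustomerID, SUM(TotalDue) AS Total
FROM RecentOrders
GROUP BY CustomerID;

-- 2) Table variable: no statistics (beyond a declared primary key); fine for small sets.
DECLARE @Recent TABLE (CustomerID int PRIMARY KEY, Total money);
INSERT INTO @Recent (CustomerID, Total)
SELECT CustomerID, SUM(TotalDue)
FROM dbo.Orders
WHERE OrderDate >= '20120101'
GROUP BY CustomerID;
SELECT * FROM @Recent;

-- 3) Temp table: materialized in tempdb, supports indexes and statistics,
--    and can be reused by several later statements.
SELECT CustomerID, SUM(TotalDue) AS Total
INTO #Recent
FROM dbo.Orders
WHERE OrderDate >= '20120101'
GROUP BY CustomerID;
CREATE INDEX IX_Recent ON #Recent (CustomerID);
SELECT * FROM #Recent;
DROP TABLE #Recent;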

Good Blog:

http://www.sqlservercentral.com/articles/Temporary+Tables/66720/


Sunday, May 6, 2012

SSAS-Analysis Services Query Performance Top 10 Best Practices

Proper cube design, efficient multidimensional expressions (MDX), and sufficient hardware resources are critical to optimal performance of MDX queries issued against a SQL Server 2005 Analysis Services instance. This article lists the ten most common best practices that the Microsoft SQL Server development team recommends with respect to optimizing Analysis Services query performance. For additional discussions about Analysis Services best practices related to query performance, see The Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005.


I. Optimize cube and measure group design
• Define cascading attribute relationships (for example Day > Month > Quarter > Year) and define user hierarchies of related attributes (called natural hierarchies) within each dimension as appropriate for your data. Attributes participating in natural hierarchies are materialized on disk in hierarchy stores and are automatically considered to be aggregation candidates. User hierarchies are not considered to be natural hierarchies unless the attributes comprising the levels are related through cascading attribute relationships. With SQL Server 2005 Service Pack 2 (SP2), a warning appears in Business Intelligence Development Studio with each user hierarchy that is not defined as a natural hierarchy.
• Remove redundant relationships between attributes to assist the query execution engine in generating the appropriate query plan. Attributes need to have either a direct or an indirect relationship to the key attribute, not both.
• Keep cube space as small as possible by only including measure groups that are needed.
• Place measures that are queried together in the same measure group. A query that retrieves measures from multiple measure groups requires multiple storage engine operations. Consider placing large sets of measures that are not queried together into separate measure groups to optimize cache usage, but do not explode the number of measure groups.
• Minimize the use of large parent-child hierarchies. In parent-child hierarchies, aggregations are created only for the key attribute and the top attribute (for example, the All attribute) unless it is disabled. As a result, queries returning cells at intermediate levels are calculated at query time and can be slow for large parent-child dimensions. If you are in a design scenario with a large parent-child hierarchy (more than 250,000 members), you may want to consider altering the source schema to reorganize part or all of the hierarchy into a user hierarchy with a fixed number of levels.
• Optimize many-to-many dimension performance, if used. When you query the data measure group by the many-to-many dimension, a run-time “join” is performed between the data measure group and the intermediate measure group using the granularity attributes of each dimension that the measure groups have in common. Where possible, reduce the size of the intermediate fact table underlying the intermediate measure group. To optimize the run-time join, review the aggregation design for the intermediate measure group to verify that aggregations include attributes from the many-to-many dimension.

To understand how to optimize dimensions to increase query performance, refer to the articles SQL Server 2005 Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005. For assistance in analyzing your design for compliance with best practices, see the February 2007 Community Technology Preview (CTP) release of the SQL Server 2005 Best Practices Analyzer (the final version should be released soon).

II. Define effective aggregations
• Define aggregations to reduce the number of records that the storage engine needs to scan from disk to satisfy a query. If SQL Server Profiler traces indicate that most user queries that are not resolved from cache are resolved by partition reads rather than aggregation reads, consider using the Aggregation Manager sample application to design custom aggregations. This sample is available on CodePlex at http://www.codeplex.com/MSFTASProdSamples, and a version of this sample that has been updated by the community is available on CodePlex at http://www.codeplex.com/bidshelper.
• Avoid designing an excessive number of aggregations. Excessive aggregations reduce processing performance and may reduce query performance. While the optimum number of aggregations varies, the SQL Server Best Practices team’s experience has been that the optimum number is in the tens, not hundreds or thousands in almost all cases.
• Enable the Analysis Services query log to capture user query patterns and use this query log when designing aggregations. For more information, see Configuring the Analysis Services Query Log.

To understand how to design aggregations to increase query performance, refer to the articles SQL Server 2005 Analysis Services Performance Guide and OLAP Design Best Practices for Analysis Services 2005.

III. Use partitions
• Define partitions to enable Analysis Services to query less data to resolve a query when it cannot be resolved from the data cache or from aggregations. Also define the partitions to increase parallelism when resolving queries.
• For optimum performance, partition data in a manner that matches common queries. A very common choice for partitions is to select an element of time such as day, month, quarter, year, or some combination of time elements. Avoid partitioning in a way that requires most queries to be resolved from many partitions.
• In most cases, partitions should contain fewer than 20 million records, and each measure group should contain fewer than 2,000 total partitions. Also, avoid defining partitions containing fewer than two million records. Too many partitions cause a slowdown in metadata operations, and too few partitions can result in missed opportunities for parallelism.
• Define a separate ROLAP partition for real-time data and place the real-time ROLAP partition in its own measure group.

To understand how to design partitions to increase query performance, refer to the SQL Server 2005 Analysis Services Performance Guide, the Microsoft SQL Server Customer Advisory Team blog, and OLAP Design Best Practices for Analysis Services 2005.

IV. Write efficient MDX
• Remove empty tuples from your result set to reduce the time spent by the query execution engine serializing the result set (a sketch follows this list).
• Avoid run-time checks in an MDX calculation that result in a slow execution path. If you use the CASE and IIF functions to perform condition checks which must be resolved many times during query resolution, you will have a slow execution path. Rewrite these queries using the SCOPE statement to quickly reduce the calculation space to which an MDX calculation refers. For more information, see Budget Variance - A study of MDX optimizations: evaluation modes and NON_EMPTY_BEHAVIOR, Comparing Levels in MDX and CONDITION vs. SCOPE in cell calculations, and Multiselect friendly MDX calculations.
• Use Non_Empty_Behavior where possible to enable the query execution engine to use bulk evaluation mode. However, if you use Non_Empty_Behavior incorrectly, you will return incorrect results. For more information, see Budget Variance - A study of MDX optimizations: evaluation modes and NON_EMPTY_BEHAVIOR and Averages, ratios, division by zero and NON_EMPTY_BEHAVIOR.
• Use EXISTS rather than filtering on member properties to avoid a slow execution path. Use the NonEmpty and Exists functions to enable the query execution engine to use bulk evaluation mode.
• Perform string manipulations within Analysis Services stored procedures using server-side ADOMD.NET rather than with string manipulation functions such as StrToMember and StrToSet.
• Rather than using the LookupCube function, use multiple measure groups in the same cube wherever possible.
• Rewrite MDX queries containing arbitrary shapes to reduce excessive subqueries where possible. An arbitrary shaped set is a set of members that cannot be resolved as a crossjoin of sets with a single hierarchality. For example, the set {(Gender.Male, Customer.USA), (Gender.Female, Customer.Canada)} is an arbitrary set. You can frequently use the Descendants function to resolve arbitrary shapes by using a smaller number of subqueries than queries that return the same result that are written using other functions.
• Rewrite MDX queries that result in excessive prefetching where possible. Prefetching is a term used to describe cases where the query execution engine requests more information from the storage engine than is required to resolve the query at hand for reasons of perceived efficiency. Generally, prefetching is the most efficient data retrieval choice. However, occasionally it is not. In some cases you may be able to eliminate excessive prefetching by rewriting queries with a subselect in the FROM clause rather than a set in the WHERE clause. When you cannot eliminate excessive prefetching, you may need to disable prefetching and warm the cache using the Create Cache statement. For more information, see How to Warm up the Analysis Services data cache using Create Cache statement.
• Filter a set before using it in a crossjoin to reduce the cube space before performing the crossjoin.
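
As a sketch of the first bullet in this list (cube and member names are Adventure Works-style assumptions), NON EMPTY on an axis and the NonEmpty function both strip empty tuples before the result set is serialized:

-- Keep only categories that actually have sales, instead of serializing empty rows.
SELECT NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,
       NonEmpty(
           [Product].[Category].[Category].Members,
           [Measures].[Internet Sales Amount]) ON ROWS
FROM [Adventure Works]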

V. Use the query engine cache efficiently
• Ensure that the Analysis Services computer has sufficient memory to store query results in memory for re-use in resolving subsequent queries. To monitor, use the MSAS 2005: Memory/Cleaner Memory Shrinkable DB and the MSAS 2005: Cache/Evictions/sec Performance Monitor counters.
• Define calculations in the MDX script. Calculations in the MDX script have a global scope that enables the cache related to these queries to be shared across sessions for the same set of security permissions. However, calculated members defined using Create Member and With Member within user queries do not have global scope, and the cache related to these queries cannot be shared across sessions (a short example follows this list).
• Warm the cache by executing a set of predefined queries using the tool of your choice. You can also use a Create Cache statement for this purpose. For more information on using the Create Cache statement, see How to Warm up the Analysis Services data cache using Create Cache statement. For information on how to use SQL Server 2005 Integration Services to warm the cache, see Build Your Own Analysis Services Cache-Warmer in Integration Services.
• Rewrite MDX queries containing arbitrary shapes to optimize caching. For example, in some cases you can rewrite queries that require non-cached disk access such that they can be resolved entirely from cache by using a subselect in the FROM clause rather than a WHERE clause. In other cases, a WHERE clause may be a better choice.
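
To illustrate the second bullet (names assumed), the same calculation defined once in the cube's MDX script is globally scoped and its cache can be shared, while the query-scoped form is not:

-- In the cube's MDX script: global scope; cache shared across sessions.
CREATE MEMBER CURRENTCUBE.[Measures].[Gross Margin] AS
    [Measures].[Sales Amount] - [Measures].[Total Cost],
    FORMAT_STRING = 'Currency';

-- The same calculation in an ad-hoc query: query scope; cache not shared.
WITH MEMBER [Measures].[Gross Margin] AS
    [Measures].[Sales Amount] - [Measures].[Total Cost]
SELECT { [Measures].[Gross Margin] } ON COLUMNS
FROM [Adventure Works]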

VI. Ensure flexible aggregations are available to answer queries
• Note that incrementally updating a dimension using ProcessUpdate on a dimension drops all flexible aggregations affected by updates and deletes and, by default, does not re-create them until the next full process.
• Ensure that aggregations are re-created by processing affected objects, configuring lazy processing, performing ProcessIndexes on affected partitions, or performing full processing on affected partitions.

To understand how to ensure flexible aggregations are not dropped, refer to the SQL Server 2005 Analysis Services Performance Guide.

VII. Tune memory usage
• Increase the size of the paging files on the Analysis Services server or add additional memory to prevent out-of-memory errors when the amount of virtual memory allocated exceeds the amount of physical memory on the Analysis Services server.
• Use Microsoft Windows Advanced Server® or Datacenter Server with SQL Server 2005 Enterprise Edition (or SQL Server 2005 Developer Edition) when you are using SQL Server 2005 (32-bit) to enable Analysis Services to address up to 3 GB of memory. To enable Analysis Services to address more than 2 GB of physical memory with either of these editions, use the /3GB switch in the boot.ini file. If you set the /3GB switch in the boot.ini file, the server should have at least 4 GB of memory to ensure that the Windows operating system also has sufficient memory for system services.
• Reduce the value for the Memory/LowMemoryLimit property below 75 percent when running multiple instances of Analysis Services or when running other applications on the same computer.
• Reduce the value for the Memory/TotalMemoryLimit property below 80 percent when running multiple instances of Analysis Services or when running other applications on the same computer.
• Keep a gap between the Memory/LowMemoryLimit property and the Memory/TotalMemoryLimit property – 20 percent is frequently used.
• When query thrashing is detected in a multi-user environment, contact Microsoft Support for assistance in modifying the MemoryHeapType.
• When running on non-uniform memory access (NUMA) architecture and VirtualAlloc takes a very long time to return or appears to stop responding, upgrade to SQL Server 2005 SP2 and contact Microsoft Support for assistance with appropriate settings for pre-allocating NUMA memory.

To understand when to consider changing default memory use, refer to the SQL Server 2005 Analysis Services Performance Guide and Microsoft SQL Server Customer Advisory Team blog.

VIII. Tune processor usage
• To increase parallelism during querying for servers with multiple processors, consider modifying the Threadpool\Query\MaxThreads and Threadpool\Process\MaxThreads options to be a number that depends on the number of server processors.
• A general recommendation is to set the Threadpool\Query\MaxThreads to a value of less than or equal to two times the number of processors on the server. For example, if you have an eight-processor server, the general guideline is to set this value to no more than 16. In practical terms, increasing the Threadpool\Query\MaxThreads option will not significantly increase the performance of a given query. Rather, the benefit of increasing this property is that you can increase the number of queries that can be serviced concurrently.
• A general recommendation is to set the Threadpool\Process\MaxThreads option to a value of less than or equal to 10 times the number of processors on the server. This property controls the number of threads used by the storage engine during querying operations as well as during processing operations. For example, if you have an eight-processor server, the general guideline is setting this value to no more than 80. Note that even though the default value is 64, if you have fewer than eight processors on a given server, you do not need to reduce the default value to throttle parallel operations.
• While modifying the Threadpool\Process\MaxThreads and Threadpool\Query\MaxThreads properties can increase parallelism during querying, you must also consider the additional impact of the CoordinatorExecutionMode option. For example, if you have a four-processor server and you accept the default CoordinatorExecutionMode setting of -4, a total of 16 jobs can be executed at one time across all server operations. So if 10 queries are executed in parallel and require a total of 20 jobs, only 16 jobs can launch at a given time (assuming that no processing operations are being performed at that time). When the job threshold has been reached, subsequent jobs wait in a queue until a new job can be created. Therefore, if the number of jobs is the bottleneck to the operation, increasing the thread counts may not necessarily improve overall performance.

IX. Scale up where possible
• Use a 64-bit architecture for all large systems.
• Add memory and processor resources and upgrade the disk I/O subsystem, to alleviate query performance bottlenecks on a single system.
• Avoid linking dimensions or measure groups across servers and avoid remote partitions whenever possible because these solutions do not perform optimally.

X. Scale out when you can no longer scale up
• If your performance bottleneck is processor utilization on a single system as a result of a multi-user query workload, you can increase query performance by using a cluster of Analysis Services servers to service query requests. Requests can be load balanced across two Analysis Services servers, or across a larger number of Analysis Services servers to support a large number of concurrent users (this is called a server farm). Load-balancing clusters generally scale linearly.
• When using a cluster of Analysis Services servers to increase query performance, perform processing on a single processing server and then synchronize the processing and the query servers using the XMLA Synchronize statement, copy the database directory using Robocopy or some other file copy utility, or use the high-speed copy facility of SAN storage solutions.
Source:
http://sqlcat.com/sqlcat/b/top10lists/archive/2007/09/13/analysis-services-query-performance-top-10-best-practices.aspx