MSBI (SSIS/SSRS/SSAS) Online Training

Thursday, October 29, 2009

SSIS - Full-load vs. Delta-Load

Nice discussion o nthis topix.Pls go thru the below URL.
http://social.msdn.microsoft.com/forums/en-US/sqlintegrationservices/thread/e17e8143-e19b-4cff-a1a0-49977cfaede9/
--Cheers, Rav

SSRS-Scale-Out Deployment Best Practices

Check out the the third of five technical note as part of the Building and Deploying Large Scale SQL Server Reporting Services Environments Technical Note Series: Reporting Services Scale-Out Deployment Best Practices
This technical note reviews the
SSRS Scale-Out Architecture
Report Catalog sizing
The benefits of File System snapshots for SSRS 2005
Why File System snapshots may not help for SSRS 2008
Using Cache Execution
Load Balancing your Network
Isolate your workloads
Report Data Performance Considerations

SSAS-Storage Modes in Analysis Services

Why did we only use the MOLAP storage mode in Project REAL? There is also ROLAP and HOLAP storage, but they weren’t used at all. Also, why doesn’t Project REAL illustrate pro-active caching?

The Analysis Services 2005 Performance Guide contains a more complete discussion of the storage modes in Analysis Services in Appendix B. Because the question comes up from time to time regarding Project REAL, this write-up gives a summary.

Every partition in AS has an associated storage mode. MOLAP partitions store aggregations and a copy of the source data (fact and dimension data) in a multidimensional structure on the Analysis server. This is the optimal storage mode for several reasons:
Compression. When relational data is processed into AS, the storage needed is typically 20% to 40% of the size of the un-indexed relational fact table. Less I/O means that the MOLAP data is faster to access.
Multidimensional structure. The format in which AS data is stored is optimized for the kind of access that OLAP users do – access to the data using arbitrary dimensions and arbitrary slices of the dimension.
Data local to AS. There is no need for AS to call out to another service to obtain data when MOLAP storage is used.
Indexes are not needed on relational fact tables. When AS processes fact data, it generally can do a full table scan, so indexes are not needed. When there are multiple partitions in a measure group, there will be one join to the dimension that is used for partitioning, but that’s all. Then at query time, no requests are sent to the relational database, so no indexes are needed for querying. Given that indexes can multiply the size of data in the relational system, the space savings can be significant.

HOLAP partitions store aggregations in a multidimensional structure on the Analysis server, but do not copy fact data from the original relational database. As a result, whenever Analysis Services needs to resolve a query against fact data stored in a HOLAP partition, Analysis Services must query the relational database rather than using a multidimensional structure stored on the Analysis server. Some administrators choose HOLAP because HOLAP appears to require less total storage space while yielding high performance queries. However, the disadvantages almost always outweigh the advantages:
When a query “misses” the aggregations AS will have to query the relational database for fact-level data. This is necessarily slower than getting data from the compressed and natively multidimensional MOLAP storage.
Because of the above, many administrators will create more aggregations to decrease the number of “misses.” At some point the additional aggregations can become larger than the fact data would have been. But even if it doesn’t, more aggregations take more time to process.
Processing of HOLAP partitions is not significantly faster than processing of MOLAP partitions. In both cases the entire fact table must be read into AS in order to create aggregations. The only difference is whether AS also saves the fact data. However if more aggregations were created because of the previous point, processing will be slower.

ROLAP partitions store aggregations in the same relational database that stores the fact data. ROLAP partitions take longer to process, and because all queries must be redirected to the relational database, ROLAP partitions are slower to query against as well.

Given all this, it sounds like only MOLAP should ever be used! For most implementations that is the case. The exception is when near-real-time access is needed to data that changes throughout the day. Even there, pro-active caching is the first technique of choice. However, if even pro-active caching does not allow frequent enough updates, then the use of one ROLAP partition in the measure group is a way to handle the incoming data.

The recommended way to handle near-real-time scenarios is to take advantage of the fact that the storage mode is set on a per-partition basis. A vast majority of data in an OLAP cube does not change throughout the day. That data should be stored in MOLAP partitions. One partition in the measure group is designated to receive updates as they come in. That way, only that partition needs to be updated. This partition can be updated using proactive caching or it can be defined as a ROLAP partition with zero aggregations. In the latter case, all queries to this partition will be sent to the relational source, and current data will always be as current as that table.

For more details about near-real-time data access, see the section “Near Real-Time Data Refreshes” in the Analysis Services 2005 Performance Guide.

An example of how such a partitioning scheme would be updated might look like this:
Fact tables contain weekly partitions in the RDBMS, and there are corresponding weekly partitions in AS.
Each night, data for the previous day is processed into the correct weekly partition.
As new data comes in during the day, it flows into a relational partition for current updates. AS has a near-real-time partition that maps to that relational partition. Either using pro-active caching or a ROLAP partition with zero aggs, users get access to new data throughout the day.
At night, the collected daily data is moved to the weekly partition and processed (either by full process or using incremental update). The near-real-time partition is now ready to collect the next day’s updates.

Returning to the original question: Why did we only use the MOLAP storage mode in Project REAL? The simple answer is that MOLAP storage performs the best. The slightly extended answer is that Project REAL is based on the scenario in the Barnes & Noble data warehouse. The business model centered on daily updates, not near-real-time access. Therefore, there was no need for proactive caching or ROLAP partitions.

SSIS-ETL World Record

Today at the launch of SQL Server 2008, you may have seen the references to world-record performance doing a load of data using SSIS. Microsoft and Unisys announced a record for loading data into a relational database using an Extract, Transform and Load (ETL) tool. Over 1 TB of TPC-H data was loaded in under 30 minutes. I wanted to provide some background material in the form of a Q&A on the record, since it’s hard to give many details in the context of a launch event. We are also planning a paper that talks about all this, so think of this article as a place-holder until the full paper comes along. I hope you find this background information useful.
-- See the below link for more info
http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx

Monday, October 12, 2009

SQL collations in SQL SERVER – 2005

SQL collations=>
The SQL collations option is used for compatibility with earlier versions of SQL Server. Select this option to match settings compatible with SQL Server 8.0, 7.0, or earlier. For more information, see Using SQL Collations in SQL Server 2005 Books Online.
Collation designator and sort order==>
Designates the collation to be used by this instance of SQL Server Express. An SQL collation is selected by default for English-language locales. The default collation for non-English locales is the Microsoft Windows locale setting for your computer — the Language for non-Unicode programs setting, or the closest equivalent from Regional and Language Options in Control Panel.
The check boxes under Collation designator and sort order let you specify the sort order to use with the Collation designator you have selected. Binary is the fastest sorting order and is case-sensitive. If Binary is selected, the Case-sensitive, Accent-sensitive, Kana-sensitive, and Width-sensitive options are not available. For more information, see Windows Collation Sorting Styles in SQL Server 2005 Books Online.
Note Change the default settings for the sort order only if your installation of SQL Server must match the collation settings used by another instance of SQL Server, or if it must match the Windows locale of another computer. Good Link:->
http://blog.sqlauthority.com/2007/08/30/sql-server-2005-find-database-collation-using-t-sql-and-ssms/#comment-56641