MSBI (SSIS/SSRS/SSAS) Online Training

Monday, December 28, 2009

MCTS - I passed the 70-445 exam!

Hi All,
I am happy to inform you that today I cleared the 70-445 exam with 89.9%!
;-)
Rav’s

Thursday, December 24, 2009

SSIS-Webservice && XML Task - Usage

Hi,
It's a good video for learning the usage of the Web Service task in SSIS.
http://technet.microsoft.com/en-us/library/cc952927.aspx
In this video, you learned how to call a Web Service by using the Web Service task.
You also learned how to:
Configure an HTTP connection manager.
Configure the Web Service task itself.
Download the WSDL file that describes the Web service.
Call a Web method and supply the expected input values.
Read the return value by using an XML task.
And, use the results from the Web service in the package.
Thanks, Rav's

MSBI 2008 Videos => Setup 2008, SSIS, SSAS, SSAS Data Mining, SSRS

'n Joy ;-)
http://technet.microsoft.com/en-us/library/bb418446(SQL.10).aspx

Wednesday, December 23, 2009

MSBI - Useful Videos

http://siddhumehta.blogspot.com/2009/07/few-more-ssis-ssas-and-ssrs-video.html
http://siddhumehta.blogspot.com/2009/12/microsoft-business-intelligence-ssis.html

SSRS-Table Vs Matrix

The report type defines the structure, or data region, of the data returned by your query. The Report Wizard lets you present this information as either a table or a matrix, though you have more options outside the wizard. The main difference between these two types of data regions is the number of columns. A table has a fixed number of columns; a matrix has a variable number determined by the query results.
More Details:
http://technet.microsoft.com/en-us/library/ms157334.aspx
http://www.bronios.com/index.php/2008/04/28/ssrs-performances-matrix-control-vs-table-control/

Friday, December 11, 2009

Tuesday, December 8, 2009

SSIS-Performance Tuning Techniques (SSIS Engine Overview, Blocking Execution Trees, Buffer Sizing, Parallelism)

Hi, performance tuning is a very important part of the data loading process when using ETL tools.
When you architect data integration solutions, your design decisions not only determine how successfully your solution meets functional requirements, but also how well your solution meets performance requirements. To make the right performance design decisions, you need to understand the performance architecture of your data integration tool and, just as importantly, the techniques that enable you to maximize the tool’s utilization of system resources such as memory and CPU.
Microsoft® SQL Server™ 2005 Integration Services (SSIS) provides full-featured data integration and workflow engines coupled with a rich development environment for building high-performance data integration solutions. SSIS provides a variety of optimization opportunities to help you maximize resource utilization while successfully meeting the needs of your specific data integration scenario.
http://msdn.microsoft.com/en-us/library/cc966529.aspx

http://shujaatsiddiqi.blogspot.com/2008/08/performance-improvement-of-sql-server.html

SSIS-Parallel Execution

Hi,
This is an important read on achieving parallel execution using SSIS.

http://shujaatsiddiqi.blogspot.com/2008/10/ssis-multithreading-parallel-execution.html

SQL SERVER-Rebuild Indexes AND Update Statistics

Hi,
Rebuilding indexes and updating statistics is very important for improving query performance.
SQL Server 2005 uses the ALTER INDEX syntax to reindex a database. It still supports DBCC DBREINDEX, but that statement will be removed in a future version. Table fragmentation can occur whenever data modification operations (INSERT, UPDATE, or DELETE statements) run against a table. DBCC DBREINDEX can be used to rebuild all the indexes on all the tables in a database, and it is more efficient than dropping and recreating the indexes.
Running the stored procedure sp_updatestats at the end of the index maintenance process ensures that the statistics of the database are updated.
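Here is a minimal sketch of both steps (the table name dbo.YourBigTable is just a placeholder, not taken from the linked scripts):

-- Rebuild all indexes on one table using the SQL Server 2005 ALTER INDEX syntax.
ALTER INDEX ALL ON dbo.YourBigTable REBUILD;
GO

-- Refresh statistics for every user table in the current database.
EXEC sp_updatestats;
GO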
We have two good scripts that implement this requirement.
Please have a look at these:
http://weblogs.sqlteam.com/tarad/archive/2006/08/14/11194.aspx

http://blog.sqlauthority.com/2007/01/31/sql-server-reindexing-database-tables-and-update-statistics-on-tables/

Thursday, November 5, 2009

SSIS - Difference between Control Flow and Data Flow

http://shahharsh.wordpress.com/2008/05/31/difference-between-control-flow-and-data-flow/

SSIS-Waiting for a file

http://dichotic.wordpress.com/2009/10/20/ssis-waiting-for-a-file/

SSRS - Adding Subreports

A subreport is a report item that displays another report inside the body of a main report. Conceptually, a subreport is similar to a frame in a Web page. It is used to embed a report within a report. Any report can be used as a subreport. The report that the subreport displays is stored on a report server, usually in the same folder as the parent report. You can design the parent report to pass parameters to the subreport. A subreport can be repeated within data regions, using a parameter to filter data in each instance of the subreport.
If you use subreports to display separate groups of data, consider using data regions (tables, matrices, lists, charts, or gauges) instead. Reports with data regions only may perform better than reports that include subreports.
Use data regions when you need to nest groups of data from the same data source within a single data region. Use subreports if you need to nest groups of data from different data sources within a single data region, reuse a subreport in multiple parent reports, or display a standalone report inside of another report. You can create a "briefing book" by placing multiple subreports inside the body of another report.
Note:
In Report Designer, if you preview a report that contains subreports, and then change the subreport, the preview may not be updated. To see the changes, click the Refresh button.
Using Parameters in Subreports
To pass parameters from the parent report to the subreport, define a report parameter in the report that you use as the subreport. When you place the subreport in the parent report, you can select the report parameter and a value to pass from the parent report to the report parameter in the subreport.
Note:
The parameter that you select from the subreport is a report parameter, not a query parameter. For more information about parameters, see Adding Parameters to Your Report.
You can place a subreport in the main body of the report, or in a data region. If you place a subreport in a data region, the subreport will repeat with each instance of the group or row in the data region. To pass a value from the group or row to the subreport, in the subreport value property, use a field expression for the field containing the value you want to pass to the subreport parameter.
For more information about working with subreports, see How to: Add a Subreport and Parameters (Reporting Services).

For More Details:-
Here is a good link about subreports:
http://www.simple-talk.com/content/print.aspx?article=278
Thanks, Rav’s

Thursday, October 29, 2009

SSIS - Full-load vs. Delta-Load

Nice discussion on this topic. Please go through the below URL.
http://social.msdn.microsoft.com/forums/en-US/sqlintegrationservices/thread/e17e8143-e19b-4cff-a1a0-49977cfaede9/
--Cheers, Rav

SSRS-Scale-Out Deployment Best Practices

Check out the third of five technical notes in the Building and Deploying Large Scale SQL Server Reporting Services Environments Technical Note Series: Reporting Services Scale-Out Deployment Best Practices.
This technical note reviews the following:
SSRS Scale-Out Architecture
Report Catalog sizing
The benefits of File System snapshots for SSRS 2005
Why File System snapshots may not help for SSRS 2008
Using Cache Execution
Load Balancing your Network
Isolate your workloads
Report Data Performance Considerations

SSAS-Storage Modes in Analysis Services

Why did we only use the MOLAP storage mode in Project REAL? There is also ROLAP and HOLAP storage, but they weren’t used at all. Also, why doesn’t Project REAL illustrate pro-active caching?

The Analysis Services 2005 Performance Guide contains a more complete discussion of the storage modes in Analysis Services in Appendix B. Because the question comes up from time to time regarding Project REAL, this write-up gives a summary.

Every partition in AS has an associated storage mode. MOLAP partitions store aggregations and a copy of the source data (fact and dimension data) in a multidimensional structure on the Analysis server. This is the optimal storage mode for several reasons:
Compression. When relational data is processed into AS, the storage needed is typically 20% to 40% of the size of the un-indexed relational fact table. Less I/O means that the MOLAP data is faster to access.
Multidimensional structure. The format in which AS data is stored is optimized for the kind of access that OLAP users do – access to the data using arbitrary dimensions and arbitrary slices of the dimension.
Data local to AS. There is no need for AS to call out to another service to obtain data when MOLAP storage is used.
Indexes are not needed on relational fact tables. When AS processes fact data, it generally can do a full table scan, so indexes are not needed. When there are multiple partitions in a measure group, there will be one join to the dimension that is used for partitioning, but that’s all. Then at query time, no requests are sent to the relational database, so no indexes are needed for querying. Given that indexes can multiply the size of data in the relational system, the space savings can be significant.

HOLAP partitions store aggregations in a multidimensional structure on the Analysis server, but do not copy fact data from the original relational database. As a result, whenever Analysis Services needs to resolve a query against fact data stored in a HOLAP partition, Analysis Services must query the relational database rather than using a multidimensional structure stored on the Analysis server. Some administrators choose HOLAP because HOLAP appears to require less total storage space while yielding high performance queries. However, the disadvantages almost always outweigh the advantages:
When a query “misses” the aggregations AS will have to query the relational database for fact-level data. This is necessarily slower than getting data from the compressed and natively multidimensional MOLAP storage.
Because of the above, many administrators will create more aggregations to decrease the number of “misses.” At some point the additional aggregations can become larger than the fact data would have been. But even if it doesn’t, more aggregations take more time to process.
Processing of HOLAP partitions is not significantly faster than processing of MOLAP partitions. In both cases the entire fact table must be read into AS in order to create aggregations. The only difference is whether AS also saves the fact data. However if more aggregations were created because of the previous point, processing will be slower.

ROLAP partitions store aggregations in the same relational database that stores the fact data. ROLAP partitions take longer to process, and because all queries must be redirected to the relational database, ROLAP partitions are slower to query against as well.

Given all this, it sounds like only MOLAP should ever be used! For most implementations that is the case. The exception is when near-real-time access is needed to data that changes throughout the day. Even there, pro-active caching is the first technique of choice. However, if even pro-active caching does not allow frequent enough updates, then the use of one ROLAP partition in the measure group is a way to handle the incoming data.

The recommended way to handle near-real-time scenarios is to take advantage of the fact that the storage mode is set on a per-partition basis. A vast majority of data in an OLAP cube does not change throughout the day. That data should be stored in MOLAP partitions. One partition in the measure group is designated to receive updates as they come in. That way, only that partition needs to be updated. This partition can be updated using proactive caching or it can be defined as a ROLAP partition with zero aggregations. In the latter case, all queries to this partition will be sent to the relational source, and current data will always be as current as that table.

For more details about near-real-time data access, see the section “Near Real-Time Data Refreshes” in the Analysis Services 2005 Performance Guide.

An example of how such a partitioning scheme would be updated might look like this:
Fact tables contain weekly partitions in the RDBMS, and there are corresponding weekly partitions in AS.
Each night, data for the previous day is processed into the correct weekly partition.
As new data comes in during the day, it flows into a relational partition for current updates. AS has a near-real-time partition that maps to that relational partition. Either using pro-active caching or a ROLAP partition with zero aggs, users get access to new data throughout the day.
At night, the collected daily data is moved to the weekly partition and processed (either by full process or using incremental update). The near-real-time partition is now ready to collect the next day’s updates.

Returning to the original question: Why did we only use the MOLAP storage mode in Project REAL? The simple answer is that MOLAP storage performs the best. The slightly extended answer is that Project REAL is based on the scenario in the Barnes & Noble data warehouse. The business model centered on daily updates, not near-real-time access. Therefore, there was no need for proactive caching or ROLAP partitions.

SSIS-ETL World Record

Today at the launch of SQL Server 2008, you may have seen the references to world-record performance doing a load of data using SSIS. Microsoft and Unisys announced a record for loading data into a relational database using an Extract, Transform and Load (ETL) tool. Over 1 TB of TPC-H data was loaded in under 30 minutes. I wanted to provide some background material in the form of a Q&A on the record, since it’s hard to give many details in the context of a launch event. We are also planning a paper that talks about all this, so think of this article as a place-holder until the full paper comes along. I hope you find this background information useful.
-- See the below link for more info
http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx

Monday, October 12, 2009

SQL collations in SQL SERVER – 2005

SQL collations=>
The SQL collations option is used for compatibility with earlier versions of SQL Server. Select this option to match settings compatible with SQL Server 8.0, 7.0, or earlier. For more information, see Using SQL Collations in SQL Server 2005 Books Online.
Collation designator and sort order==>
Designates the collation to be used by this instance of SQL Server Express. An SQL collation is selected by default for English-language locales. The default collation for non-English locales is the Microsoft Windows locale setting for your computer — the Language for non-Unicode programs setting, or the closest equivalent from Regional and Language Options in Control Panel.
The check boxes under Collation designator and sort order let you specify the sort order to use with the Collation designator you have selected. Binary is the fastest sorting order and is case-sensitive. If Binary is selected, the Case-sensitive, Accent-sensitive, Kana-sensitive, and Width-sensitive options are not available. For more information, see Windows Collation Sorting Styles in SQL Server 2005 Books Online.
Note: Change the default settings for the sort order only if your installation of SQL Server must match the collation settings used by another instance of SQL Server, or if it must match the Windows locale of another computer.
Good link:
http://blog.sqlauthority.com/2007/08/30/sql-server-2005-find-database-collation-using-t-sql-and-ssms/#comment-56641
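As a quick sketch, the server, database, and column collations can also be checked directly with T-SQL (the database and table names below are placeholders):

-- Instance-level (server) collation
SELECT SERVERPROPERTY('Collation') AS ServerCollation;

-- Collation of a specific database
SELECT DATABASEPROPERTYEX('AdventureWorks', 'Collation') AS DatabaseCollation;

-- Column-level collation overrides inside a table
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.YourTable');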

Wednesday, September 30, 2009

MSBI - Books to Read

http://sqlblogcasts.com/blogs/martinisti/archive/2009/03/19/books-to-read.aspx

MSBI - Some IMP URL For this Week

http://blogs.msdn.com/bi/
http://blogs.msdn.com/msbi/default.aspx

http://www.tech-faq.com/
(it is a good site for all basic concepts)

http://sqlblogcasts.com/blogs/martinisti/default.aspx
(70-446)

MSBI-Data Mining

What is data Mining???
Data mining is usually defined as searching, analyzing and sifting through large amounts of data to find relationships, patterns, or any significant statistical correlations. With the advent of computers, large databases and the internet, it is easier than ever to collect millions, billions and even trillions of pieces of data that can then be systematically analyzed to help look for relationships and to seek solutions to difficult problems. Besides governmental uses, many marketers use data mining to find strong consumer patterns and relationships. Large organizations and educational institutions also data mine to find significant correlations that can enhance our society.
Purpose of Data Mining?
Data mining uses a relatively large amount of computing power operating on a large set of data to determine regularities and connections between data points. Algorithms that employ techniques from statistics, machine learning and pattern recognition are used to search large databases automatically. Data mining is also known as Knowledge-Discovery in Databases (KDD).
You can see video about Data Mining from the below link.

http://blog.sqlauthority.com/2009/09/03/sql-server-what-is-data-mining-a-simple-introductory-note/

SSIS Videos - Some IMP Links

Hi,
I captured some videos from the below blog.
Enjoy;-)

http://blogs.msdn.com/mattm/
It's a really good one.
Please don't miss it.

SQL PASS 2009
I was just looking at the sessions for the SQL PASS 2009 Summit and was happy to see so many SSIS presentations on the list!
Andy Leonard
Applied SSIS Design Patterns
Steve Simon
SQL Server Integration Services and the modern financial institution
Davide Mauri
Instrumenting, Monitoring and Auditing of SSIS ETL Solutions
Brian Knight
Loading a Data Warehouse with SSIS
Dave Fackler
ETL from the Trenches: Using SSIS in the Real World
Joy Mundy
ETL: The Linchpin for the Complete Data Warehouse
Erik Veerman
Overcoming SSIS Deployment and Configuration Challenges
Data Profiling and Cleansing with Integration Services 2008

SSWUG.ORG Ultimate Virtual Conference – April 2009
Samples for the CDC & Merge demo
Avoiding Common Pitfalls in SSIS

Matt Masson - Incremental Data Warehouse Loads with Merge & CDC
John Welch - Doing More (ETL) With Less (Effort) by Automating SSIS
Brian Knight - Loading a Data Warehouse With SSIS
Anthony D’Angelo - Deep Dive: Extending SSIS with .NET Development

SSIS - Tutorial: SSIS Performance Videos

These are all useful links to SSIS videos.
This post is coming a little late (these have already been announced here and here), but in case you haven’t seen these before, I wanted to bring attention to four new performance-related videos created by the SQLCAT and SSIS teams. They mostly focus on larger scale/Enterprise level package deployments, but there are good tips for all levels of SSIS usage.
From the SQLCAT post:
Measuring and Understanding the Performance of Your SSIS Packages in the Enterprise (SQL Server Video)
Author: Denny Lee, Microsoft Corporation
This video demonstrates how to measure and understand the performance of packages, based on lessons learned from enterprise customers. In this video, you will learn the following guidelines for improving performance:
How the limits of the source system affect performance.
Why disk I/O is important.
Why you should establish a package performance baseline
Tuning Your SSIS Package Data Flow in the Enterprise (SQL Server Video)
Author: David Noor, Microsoft Corporation
This video demonstrates how to improve the performance of the data flow in an Integration Services package. In this video, you will learn how to tune the following phases of the data flow:
Extraction
Transformation
Loading
You can apply these performance tuning tips when you design, develop, and run the data flow.
Understanding SSIS Data Flow Buffers (SQL Server Video)
Author: Bob Bojanic, Microsoft Corporation
This video looks at the memory buffers that transfer data to the data flow of an Integration Services package. The video includes the following items:
Demonstration of a simple package that shows you how Integration Services divides the data along the data flow paths and how buffers carry data along those paths. This simple package performs the following operations:
Data extraction
Character mapping
Creation of new columns by using a synchronous transformation
Multicasting
Sorting by using a blocking (asynchronous) transformation.
Design guidelines for building and maintaining the data flow of a package.
Designing Your SSIS Packages for Parallelism (SQL Server Video)
Author: Matt Carroll, Microsoft Corporation
This video demonstrates how to increase the performance of your Integration Services packages by designing for parallelism.
------
These four videos join the growing list of SSIS tutorial videos in the SQL Server Video series. Others that might interest you:
Creating a Basic Package
Exporting SQL Server Data to Excel
How to: Automate SSIS Package Execution by Using the SQL Server Agent
How to: Call a Web Service by Using the Web Service Task
How to: Implement a Lookup Transformation in Full Cache Mode
How to: Use the Data Profiling Task
Troubleshooting: SSIS Package Execution Using SQL Server Agent


SQL Server – Use Of NOLOCK / ROWLOCK

Hi,
When we are retrieving huge amounts of data, or aggregating it (for example with SUM), querying WITH (NOLOCK) can give better performance. After some quick searching, I found a great page on Sql-Server-Performance.com. Apparently SQL Server puts a locking mechanism around all data access and manipulation to prevent things like dirty reads and the reading of uncommitted data. I was totally not aware of this - SQL Server just worked and that was good.
Apparently though, these locks (as with most locking) come with a performance hit. This is good, though, as it ensures the integrity of your data. However, there are times when you just don't care. Some tables, especially lookup tables, are not updated often (if ever), and locking around them brings unnecessary overhead.
ROWLOCK - Use row-level locks when reading or modifying data.
PAGLOCK - Use page-level locks when reading or modifying data.
TABLOCK - Use a table lock when reading or modifying data.
DBLOCK - Use a database lock when reading or modifying data.
UPDLOCK - Read data without blocking other readers, and update it later with the assurance that the data has not changed since the last read.
XLOCK - Use exclusive locks instead of shared locks while reading a table, and hold the locks until the end of the statement or transaction.
HOLDLOCK - Hold a lock until completion of the transaction, instead of releasing it as soon as the required table, row, or data page is no longer required.
NOLOCK - Does not lock any object and does not honor exclusive locks while reading. It applies only to SELECT statements; it does not apply to INSERT, UPDATE, and DELETE statements.

Using the NOLOCK and ROWLOCK hints can circumvent SQL Server's locking mechanism and speed up queries. Below, I demonstrate them first on simple statements, and then on a SQL join query in which I get blog entry information:
Examples:

SELECT OrderID
FROM Orders WITH (ROWLOCK)
WHERE OrderID BETWEEN 100 AND 2000

UPDATE Products WITH (ROWLOCK)
SET ProductCat = 'Machine'
WHERE ProductSubCat = 'Mac'

SELECT
    b.id,
    b.name,
    t.id   AS tag_id,
    t.name AS tag_name
FROM
    blog_entry b
INNER JOIN
    blog_entry_tag_jn btjn WITH (NOLOCK)
ON
    b.id = btjn.blog_entry_id
INNER JOIN
    tag t WITH (NOLOCK)
ON
    btjn.tag_id = t.id
Notice that the NOLOCK hint is used on the blog_entry_tag_jn and tag tables. The tag table pretty much never gets updated, and the blog_entry_tag_jn table (joining entries to tags) gets updated only when I add or update a blog entry. Due to the low frequency of updates, the requirement for locking on these tables is (next to) pointless. By using the NOLOCK hint I am asking SQL Server to ignore the locking mechanisms surrounding those tables and proceed directly to data retrieval. Theoretically, this should provide a small performance gain.
Thanks
Rav’s.

Tuesday, August 18, 2009

SQL SERVER – How to Find the List of Primary Keys and Foreign Keys in a Database

Good article from Pinal.

http://blog.sqlauthority.com/2009/07/17/sql-server-two-methods-to-retrieve-list-of-primary-keys-and-foreign-keys-of-database/
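As a rough sketch (this is my own query against the catalog views, not necessarily the same approach the article uses), the primary keys and foreign keys of the current database can be listed like this:

-- Primary keys
SELECT kc.name AS ConstraintName,
       OBJECT_NAME(kc.parent_object_id) AS TableName
FROM sys.key_constraints AS kc
WHERE kc.[type] = 'PK';

-- Foreign keys with the referencing and referenced tables
SELECT fk.name AS ForeignKeyName,
       OBJECT_NAME(fk.parent_object_id) AS ReferencingTable,
       OBJECT_NAME(fk.referenced_object_id) AS ReferencedTable
FROM sys.foreign_keys AS fk;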

SQL SERVER – MSBI – Design Process Decision Flow

Microsoft Business Intelligence offers a complete suite of programs that supports all facets of decision-making. Through tight integration with the Microsoft SQL Server 2005 platform, Microsoft provides comprehensive business intelligence (BI) capabilities that deliver the right information, at the right time, and in the right format. It has emerged in the Leaders Quadrant.
Our MS BI expertise includes MS BI Program Implementation Strategy, MS Reporting/OLAP Implementation Roadmap, SSIS Implementation Roadmap, SQL Server 2005 Upgrade Roadmap, SSIS Migration Roadmap and MS Scorecard.
Leveraging the entire MS BI technology stack, we have expertise in MS BI products such as MS SQL Server, MS SQL Server Analysis Services, MS SQL Server Reporting Services, MS SQL Server Integration Services, Microsoft Office Business Scorecard Manager, Microsoft PerformancePoint Server and MS SharePoint Portal.
MS BI Services include:
· BI Readiness Assessment, Tool Evaluation (ETL & OLAP) and Tool Standardization.
· Design and Implementation of ETL strategy using MS SQL Server Integration Services with extensive custom programmability on Microsoft SQL Server Business Intelligence Development Studio and SQL Server Management Studio.
· Design and development of reports using MS SQL Server Reporting Services (Enterprise, ad-hoc, web-based and embedded reports) and reporting on Microsoft SharePoint Portal.
· Design and development of analytical solutions and design of multi-dimensional cubes for OLAP analysis applying dimensional modelling approach using MS SQL Server Analysis Services
· Product upgrades for SQL Server with detailed platform change analysis, an enterprise security framework and a step-wise comprehensive approach.
· Migration and consolidation of data from legacy VSAM, ERP and web portal with SQL Server encompassing ETL activities with MS SQL Server Integration Services and progress reporting with MS SQL Server Reporting Services with tight integration of .NET and SQL Server stored procedures.
· Web based reporting using advance visualization, dashboard and analytics through Microsoft PerformancePoint Server.



http://blog.sqlauthority.com/2009/07/28/sql-server-2008-design-process-decision-flow/

SQL Server 2005 – Use Of Import/Export wizard

Using the Import/Export Wizard, we can easily transfer data from one table to another, whether on the same server or on a different server.

With the SQL Server 2008 option, we can transfer the schema as well as the data at the same time.

See the following for more details:

http://blog.sqlauthority.com/2009/07/29/sql-server-2008-copy-database-with-data-generate-t-sql-for-inserting-data-from-one-table-to-another-table/
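For the simple cases, the same copy can also be scripted by hand. A small sketch with placeholder table names:

-- Copy rows into an existing table (the column lists must be compatible)
INSERT INTO dbo.TargetTable (Col1, Col2)
SELECT Col1, Col2
FROM dbo.SourceTable;

-- Or create a brand-new table and copy the columns plus data in one step
SELECT *
INTO dbo.NewTable
FROM dbo.SourceTable;

Note that SELECT ... INTO copies the column definitions but not the indexes, constraints, or triggers.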

SQL Server 2005 – How Much Space Does My Database Use?

Hi, as a SQL Server developer it's very important to be able to identify things like how large your database is, how full the files are, which tables and indexes use the most space, which tables might be dormant, and so on.
Here is the link to an excellent script from Glenn Berry.

http://glennberrysqlperformance.spaces.live.com/blog/cns!45041418ECCAA960!1619.entry
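If you just need a quick look before running the full script, here is a minimal sketch using the built-in procedure and catalog view (the table name is a placeholder):

-- Totals for the current database
EXEC sp_spaceused;

-- Space used by a single table
EXEC sp_spaceused 'dbo.YourTable';

-- Size of each data and log file in MB (size is stored in 8 KB pages)
SELECT name, size / 128.0 AS SizeMB
FROM sys.database_files;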

SQL Server 2005 – Job History

Hi
SQL Server Agent jobs write their log information into system tables in msdb.
If we get stuck at any job or related step, we can easily identify what the problem is by querying the job history. It gives a high-level description of each run (success, failure, stopped, and so on).

By using the following query we can get complete job information from the system tables.

SELECT a.name
    ,COUNT(*) OVER (PARTITION BY a.name) AS CNT   -- history rows per job in the date range
    ,b.step_id
    ,b.step_name
    ,b.[message]
    ,b.run_date          -- int, YYYYMMDD
    ,b.run_status        -- 0 = failed, 1 = succeeded, 2 = retry, 3 = canceled
    ,b.run_time          -- int, HHMMSS
    ,b.run_duration      -- int, HHMMSS
FROM msdb.dbo.sysjobhistory AS b
JOIN msdb.dbo.sysjobs AS a
    ON a.job_id = b.job_id
WHERE b.run_date >= 20090101
  AND b.run_date <= 20090818



See the below link.
http://sqlservernation.com/blogs/tipweek/archive/2009/02/25/viewing-sql-server-jobs-and-history.aspx

SQL Server 2008 – Use of FILESTREAM

We run into problems when inserting data into the database, especially with the varbinary and image data types. Instead of using those data types, we have a better option: the FILESTREAM storage type (a short example follows the feature list below). As you are probably aware, varbinary and image are very difficult data types to handle and are a pain to manage. FILESTREAM storage overcomes these limitations and has emerged as a true winner in terms of storage.
With FILESTREAM, the SQL Server team not only added a feature to handle unstructured data, but also made sure that it smoothly integrates with many of the existing features of SQL Server.
· FILESTREAM feature is available with all versions of SQL Server 2008, including SQL Server Express.
· SQL Server Express database has a 4 GB limitation; however this limitation does not apply to the FILESTREAM data stored in a SQL Server Express database.
· FILESTREAM Columns can be replicated.
· FILESTREAM enabled databases can be used with LOG Shipping
· FILESTREAM columns can be used in Full Text Indexes
· FILESTREAM works with all recovery models
· FILESTREAM File Groups can be placed on compressed disk volumes
· The maximum size of the file that can be stored into the FILESTREAM data storage is limited by the size of the disk volume only.
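Here is a small sketch of a FILESTREAM table, assuming the feature is already enabled on the instance and the database already has a FILESTREAM filegroup (all names are placeholders):

-- A FILESTREAM table needs a UNIQUEIDENTIFIER ROWGUIDCOL column with a unique constraint.
CREATE TABLE dbo.DocumentStore
(
    DocumentID   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    DocumentName NVARCHAR(260) NOT NULL,
    DocumentData VARBINARY(MAX) FILESTREAM NULL
);

-- Rows are inserted like any other varbinary(max) column;
-- the actual bytes are stored on the NTFS volume, not in the data file.
INSERT INTO dbo.DocumentStore (DocumentName, DocumentData)
VALUES (N'readme.txt', CAST('sample content' AS VARBINARY(MAX)));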

I have added some more useful links here.

http://blog.sqlauthority.com/2009/07/13/sql-server-blob-pointer-to-image-image-in-database-filestream-storage/
http://www.simple-talk.com/sql/sql-training/an-introduction-to-sql-server-filestream/

Sunday, July 26, 2009

Sql Server Links

---------------SQL Server 2005---------------
http://www.accelebrate.com/sql_training/ssis_tutorial.htm
http://www.accelebrate.com/sql_training/ssas_tutorial.htm
http://www.accelebrate.com/sql_training/ssrs_tutorial.htm
---------------SQL Server 2008---------------
http://www.accelebrate.com/sql_training/ssis_2008_tutorial.htm
http://www.accelebrate.com/sql_training/ssas_2008_tutorial.htm
http://www.accelebrate.com/sql_training/ssrs_2008_tutorial.htm

SSIS - Adventure Approach - Best Practice

Integration Services Tutorial
Microsoft SQL Server 2005 Integration Services (SSIS) is a platform for building high performance data integration solutions, including the extraction, transformation, and load (ETL) packages for data warehousing. Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, Integration Services Service, for administering Integration Services; and application programming interfaces (APIs) for programming the Integration Services object model.
In this tutorial, you will learn how to use SSIS Designer to create a simple Microsoft SQL Server 2005 Integration Services package. The package that you create takes data from a flat file, reformats the data, and then inserts the reformatted data into a fact table. In following lessons, the package will be expanded to demonstrate looping, package configurations, logging and error flow.
What You Will Learn
The best way to become acquainted with the new tools, controls and features available in Microsoft SQL Server 2005 Integration Services is to use them. This tutorial walks you through SSIS Designer to create a simple ETL package that includes looping, configurations, error flow logic and logging.
Lessons:
Lesson 1: Creating a Simple ETL Package
In this lesson, you will create a simple ETL package that extracts data from a single flat file, transforms the data using lookup transformations and finally loads the result into a fact table destination.
Lesson 2: Adding Looping
In this lesson, you will expand the package you created in Lesson 1 to take advantage of new looping features to extract multiple flat files into a single data flow process.
Lesson 3: Adding Package Configurations
In this lesson, you will expand the package you created in Lesson 2 to take advantage of new package configuration options.
Lesson 4: Adding Logging
In this lesson, you will expand the package you created in Lesson 3 to take advantage of new logging features.
Lesson 5: Adding Error Flow Redirection
In this lesson, you will expand the package you created in lesson 4 to take advantage of new error output configurations.
Requirements
This tutorial is intended for users familiar with fundamental database operations, but who have limited exposure to the new features available in SQL Server 2005 Integration Services.
To use this tutorial, your system must have the following components installed:
· SQL Server 2005 with the AdventureWorksDW database. To enhance security, the sample databases are not installed by default. To install the sample databases, see Running Setup to Install AdventureWorks Sample Databases and Samples.
· This tutorial also requires sample data. The sample data is installed together with the samples. If you cannot find the sample data, return to the procedure above and complete installation as described.

In this lesson, you will create a simple ETL package that extracts data from a single flat file source, transforms the data using two lookup transformation components, and writes that data to the FactCurrencyRate fact table in AdventureWorksDW. As part of this lesson, you will learn how to create new packages, add and configure data source and destination connections, and work with new control flow and data flow components.
Understanding the Package Requirements
Before creating a package, you need a good understanding of the formatting used in both the source data and the destination. Once you understand both of these data formats, you are then able to define the transformations necessary to map the source data to the destination.
Looking at the Source
For this tutorial, the source data is a set of historical currency data contained in the flat file, SampleCurrencyData.txt. The source data has the following four columns: the average rate of the currency, a currency key, a date key, and the end-of-day rate.
Here is an example of the source data contained in the SampleCurrencyData.txt file:
1.00010001 ARS 9/3/2001 0:00 0.99960016
1.00010001 ARS 9/4/2001 0:00 1.001001001
1.00020004 ARS 9/5/2001 0:00 0.99990001
1.00020004 ARS 9/6/2001 0:00 1.00040016
1.00050025 ARS 9/7/2001 0:00 0.99990001
1.00050025 ARS 9/8/2001 0:00 1.001001001
1.00050025 ARS 9/9/2001 0:00 1
1.00010001 ARS 9/10/2001 0:00 1.00040016
1.00020004 ARS 9/11/2001 0:00 0.99990001
1.00020004 ARS 9/12/2001 0:00 1.001101211

When working with flat file source data, it is important to understand how the Flat File connection manager interprets the flat file data. If the flat file source is Unicode, the Flat File connection manager defines all columns as [DT_WSTR] with a default column width of 50. If the flat file source is ANSI-encoded, the columns are defined as [DT_STR] with a column width of 50. You will probably have to change these defaults to make the string column types more appropriate for your data. To do this, you will need to look at the data type of the destination where the data will be written to and then choose the correct type within the Flat File connection manager.
Looking at the Destination
The ultimate destination for the source data is the FactCurrencyRate fact table in AdventureWorksDW. The FactCurrencyRate fact table has four columns, and has relationships to two dimension tables, as shown in the following table.
Column Name      Data Type   Lookup Table   Lookup Column
AverageRate      float       None           None
CurrencyKey      int (FK)    DimCurrency    CurrencyKey (PK)
TimeKey          int (FK)    DimTime        TimeKey (PK)
EndOfDayRate     float       None           None
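If you want to double-check these destination data types in the live database before configuring the connection managers, a quick metadata query (my own sketch, not part of the tutorial) is:

USE AdventureWorksDW;
GO
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('FactCurrencyRate', 'DimCurrency', 'DimTime')
ORDER BY TABLE_NAME, ORDINAL_POSITION;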
Mapping Source Data to be Compatible with the Destination
Analysis of the source and destination data formats indicates that lookups will be necessary for the CurrencyKey and TimeKey values. The transformations that will perform these lookups will obtain the CurrencyKey and TimeKey values by using the alternate keys from DimCurrency and DimTime dimension tables.
Flat File Column   Table Name         Column Name            Data Type
0                  FactCurrencyRate   AverageRate            float
1                  DimCurrency        CurrencyAlternateKey   nchar(3)
2                  DimTime            FullDateAlternateKey   datetime
3                  FactCurrencyRate   EndOfDayRate           float

The first step in creating a package in Microsoft SQL Server 2005 Integration Services (SSIS) is to create an Integration Services project. This project includes the templates for the objects — data sources, data source views, and packages — that you use in a data transformation solution.
To create a new Integration Services project
1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2005, and click SQL Server Business Intelligence Development Studio.
2. On the File menu, point to New, and click Project to create a new Integration Services project.
3. In the New Project dialog box, select Integration Services Project in the Templates pane.
4. In the Name box, change the default name to SSIS Tutorial.
5. Click OK.
By default, an empty package, titled Package.dtsx, will be created and added to your project.
6. In the Solution Explorer toolbar, right-click Package.dtsx, click Rename, and rename the default package to Lesson 1.dtsx.
7. (Optional) When prompted to rename the package object, click Yes.
In this task, you add a Flat File connection manager to the package that you just created. A Flat File connection manager enables a package to extract data from a flat file. Using the Flat File connection manager, you can specify the file name and location, the locale and code page, and the file format, including column delimiters, to apply when the package extracts data from the flat file. In addition, you can manually specify the data type for the individual columns, or use the Suggest Column Types dialog box to automatically map the columns of extracted data to Integration Services data types.
You must create a new Flat File connection manager for each file format that you work with. Because this tutorial extracts data from multiple flat files that have exactly the same data format, you will need to add and configure only one Flat File connection manager for your package.
For this tutorial, you will configure the following properties in your Flat File connection manager:
· Column names Because the flat file does not have column names, the Flat File connection manager creates default column names. These default names are not useful for identifying what each column represents. To make these default names more useful, you need to change the default names to names that match the fact table into which the flat file data is to be loaded.
· Data mappings The data type mappings that you specify for the Flat File connection manager will be used by all flat file data source components that reference the connection manager. You can either manually map the data types by using the Flat File connection manager, or you can use the Suggest Column Types dialog box. In this tutorial, you will view the mappings suggested in the Suggest Column Types dialog box and then manually make the necessary mappings in the Flat File Connection Manager Editor dialog box.
To add a Flat File connection manager
1. Right-click anywhere in the Connection Managers area, and then click New Flat File Connection.
2. In the Flat File Connection Manager Editor dialog box, for Connection manager name, type Sample Flat File Source Data.
3. Click Browse.
4. In the Open dialog box, browse to the sample data folder and open the SampleCurrencyData.txt file. By default, the tutorial sample data is installed to the c:\Program Files\Microsoft SQL Server\90\Samples\Integration Services\Tutorial\Creating a Simple ETL Package\Sample Data folder.
To rename columns in the Flat File connection manager
1. In the Flat File Connection Manager Editor dialog box, click Advanced.
2. In the property pane, make the following changes:
· Change the Column 0 name property to AverageRate.
· Change the Column 1 name property to CurrencyID.
· Change the Column 2 name property to CurrencyDate.
· Change the Column 3 name property to EndOfDayRate.
Note:
By default, all four of the columns are initially set to a string data type [DT_STR] with an OutputColumnWidth of 50.
To remap column data types
1. In the Flat File Connection Manager Editor dialog box, click Suggest Types.
Integration Services automatically suggests data types based on the first 100 rows of data, but you can change the suggestion options to sample more or less data, specify the default data type for integer or Boolean data, or add spaces as padding to string columns.
For now, make no changes, and click Cancel to return to the Advanced pane of the Flat File connection Manager Editor dialog box and view the suggested column data types.
In this tutorial, Integration Services suggests the data types shown in the second column of the table below for the data from the SampleCurrencyData.txt file. However, the data types that are required for the columns in the destination, which will be defined in a later step, are shown in the last column of the following table.
Flat File Column   Suggested Type    Destination Column                 Destination Type
AverageRate        float [DT_R4]     FactCurrencyRate.AverageRate       float
CurrencyID         string [DT_STR]   DimCurrency.CurrencyAlternateKey   nchar(3)
CurrencyDate       date [DT_DATE]    DimTime.FullDateAlternateKey       datetime
EndOfDayRate       float [DT_R4]     FactCurrencyRate.EndOfDayRate      float
The data types suggested for the CurrencyID and CurrencyDate columns are not compatible with the data types of the fields in the destination table. Because the data type of DimCurrency.CurrencyAlternateKey is nchar (3), CurrencyID needs to be changed from string [DT_STR] to string [DT_WSTR]. Additionally, the field DimTime.FullDateAlternateKey is defined as a DateTime data type; therefore, CurrencyDate needs to be changed from date [DT_Date] to database timestamp [DT_DBTIMESTAMP].
2. In the property pane, change the data type of column CurrencyID from string [DT_STR] to Unicode string [DT_WSTR].
3. In the property pane, change the data type of column CurrencyDate from date [DT_DATE] to database timestamp [DT_DBTIMESTAMP].
4. Click OK.


After you have added a Flat File connection manager to connect to the data source, the next task is to add an OLE DB connection manager to connect to the destination. An OLE DB connection manager enables a package to extract data from or load data into any OLE DB–compliant data source. Using the OLE DB Connection manager, you can specify the server, the authentication method, and the default database for the connection.
In this lesson, you will create an OLE DB connection manager that uses Windows Authentication to connect to the local instance of AdventureWorksDW. The OLE DB connection manager that you create will also be referenced by other components that you will create later in this tutorial, such as the Lookup transformation and the OLE DB destination.
To add and configure an OLE DB Connection Manager
1. Right-click anywhere in the Connection Managers area, and then click New OLE DB Connection.
2. In the Configure OLE DB Connection Manager dialog box, click New.
3. For Server name, enter localhost.
When you specify localhost as the server name, the connection manager connects to the default instance of Microsoft SQL Server 2005 on the local computer. To use a remote instance of SQL Server 2005, replace localhost with the name of the server to which you want to connect.
4. In the Log on to the server group, verify that Use Windows Authentication is selected.
5. In the Connect to a database group, in the Select or enter a database name box, type or select AdventureWorksDW.
6. Click Test Connection to verify that the connection settings you have specified are valid.
7. Click OK.
8. Click OK.
9. In the Data Connections pane of the Configure OLE DB Connection Manager dialog box, verify that localhost.AdventureWorksDW is selected.
10. Click OK.
After you have created the connection managers for the source and destination data, the next task is to add a Data Flow task to your package. The Data Flow task encapsulates the data flow engine that moves data between sources and destinations, and provides the functionality for transforming, cleaning, and modifying data as it is moved. The Data Flow task is where most of the work of an extract, transform, and load (ETL) process occurs.
Note:
Microsoft SQL Server 2005 Integration Services separates data flow from control flow. This separation of data flow from control flow is one of the dramatic differences between Integration Services and Microsoft SQL Server 2000 Data Transformation Services.
To add a Data Flow task
1. Click the Control Flow tab.
2. In the Toolbox, expand Control Flow Items, and drag a Data Flow Task onto the design surface of the Control Flow tab.
3. On the Control Flow design surface, right-click the newly added Data Flow Task, click Rename, and change the name to Extract Sample Currency Data.
It is good practice to provide unique names to all components that you add to a design surface. For ease of use and maintainability, the names should describe the function that each component performs. Following these naming guidelines allows your Integration Services packages to be self-documenting. Another way to document your packages is by using annotations. For more information about annotations, see Using Annotations in Packages.
In this task, you will add and configure a Flat File source to your package. A Flat File source is a data flow component that uses metadata defined by a Flat File connection manager to specify the format and structure of the data to be extracted from the flat file by a transform process. The Flat File source can be configured to extract data from a single flat file by using the file format definition provided by the Flat File connection manager.
For this tutorial, you will configure the Flat File source to use the Sample Flat File Source Data connection manager that you previously created.
To add a Flat File Source component
1. Open the Data Flow designer, either by double-clicking the Extract Sample Currency Data data flow task or by clicking the Data Flow tab.
2. In the Toolbox, expand Data Flow Sources, and then drag a Flat File Source onto the design surface of the Data Flow tab.
3. On the Data Flow design surface, right-click the newly added Flat File Source, click Rename, and change the name to Extract Sample Currency Data.
4. Right-click the Flat File source to open the Flat File Source Editor dialog box.
5. In the Flat file connection manager box, type or select Sample Flat File Source Data.
6. Click Columns and verify that the names of the columns are correct.
7. Click OK.
After you have configured the Flat File source to extract data from the source file, the next task is to define the Lookup transformations needed to obtain the values for the CurrencyKey and TimeKey. A Lookup transformation performs a lookup by joining data in the specified input column to a column in a reference dataset. The reference dataset can be an existing table or view, a new table, or the result of an SQL statement. The Lookup transformation uses an OLE DB connection manager to connect to the database that contains the data that is the source of the reference dataset.
For this tutorial, you will add and configure the following two Lookup transformation components to the package:
· One transformation to perform a lookup of values from the CurrencyKey column of the DimCurrency dimension table based on matching CurrencyID column values from the flat file.
· One transformation to perform a lookup of values from the TimeKey column of the DimTime dimension table based on matching CurrencyDate column values from the flat file.
In both cases, the Lookup transformations will utilize the OLE DB connection manager that you previously created.
To add and configure the CurrencyKey Lookup transformation
1. In the Toolbox, expand Data Flow Transformations, and then drag Lookup onto the design surface of the Data Flow tab.
2. Click the Extract Sample Currency Data flat file source and drag the green arrow onto the newly added Lookup transformation to connect the two components.
3. On the Data Flow design surface, right-click the newly added Lookup transformation, click Rename, and change the name to Lookup Currency Key.
4. Double-click the Lookup Currency Key transformation.
5. In the Lookup Transformation Editor dialog box, in the OLE DB connection manager box, ensure that localhost.AdventureWorksDW is displayed.
6. In the Use a table or view box, type or select [dbo].[DimCurrency].
7. Click the Columns tab.
8. In the Available Input Columns panel, drag CurrencyID to the Available Lookup Columns panel and drop it on CurrencyAlternateKey.
9. Select CurrencyKey.
10. Click OK.
To add and configure the DateKey Lookup transformation
1. In the Toolbox, drag Lookup onto the Data Flow design surface.
2. Click the Lookup Currency Key transformation and drag the green arrow onto the newly added Lookup transformation to connect the two components.
3. On the Data Flow design surface, right-click the newly added Lookup transformation, click Rename, and change the name to Lookup Date Key.
4. Double-click the Lookup Date Key transformation.
5. In the Lookup Transformation Editor dialog box, in the OLE DB connection manager box, ensure that localhost.AdventureWorksDW is displayed.
6. In the Use a table or view box, type or select [dbo].[DimTime].
7. Click the Columns tab.
8. In the Available Input Columns panel, drag CurrencyDate to the Available Lookup Columns panel and drop it on FullDateAlternateKey.
9. Select TimeKey.
10. Click OK.
Your package now can extract data from the flat file source and transform that data into a format that is compatible with the destination. The next task is to actually load the transformed data into the destination. To load the data, you must add an OLE DB destination to the data flow. The OLE DB destination can use a database table, view, or an SQL command to load data into a variety of OLE DB-compliant databases.
In this procedure, you add and configure an OLE DB destination to use the OLE DB connection manager that you previously created.
To add and configure the Sample OLE DB destination
1. In the Toolbox, expand Data Flow Destinations, and drag OLE DB Destination onto the design surface of the Data Flow tab.
2. Click the Lookup Date Key transformation and drag the green arrow over to the newly added OLE DB Destination to connect the two components together.
3. On the Data Flow design surface, right-click the newly added OLE DB Destination component, click Rename, and change the name to Sample OLE DB Destination.
4. Double-click Sample OLE DB Destination.
5. In the OLE DB Destination Editor dialog box, ensure that localhost.AdventureWorksDW is selected in the OLE DB Connection manager box.
6. In the Name of the table or the view box, type or select [dbo].[FactCurrencyRate].
7. Click Mappings.
8. Verify that the input columns are mapped correctly to the destination columns.
9. Click OK.
In this lesson, you have done the following tasks:
· Created a new Microsoft SQL Server 2005 Integration Services (SSIS) project.
· Configured the connection managers that the package needs to connect to the source and destination data.
· Added a data flow that takes the data from a flat file source, performs the necessary Lookup transformations on the data, and configures the data for the destination.
Your package is now complete! It is time to test your package.
To run the Lesson 1 tutorial package
1. On the Debug menu, click Start Debugging.
The package will run, resulting in 1097 rows successfully added into the FactCurrencyRate fact table in AdventureWorksDW.
2. After the package has completed running, on the Debug menu, click Stop Debugging.
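As an extra sanity check (not part of the official tutorial steps), you can confirm the row count in the destination table from SQL Server Management Studio:

USE AdventureWorksDW;
GO
-- Total rows now in the fact table (the package added 1097 to whatever was already there).
SELECT COUNT(*) AS LoadedRows
FROM dbo.FactCurrencyRate;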

Data Mining Concepts

Module 8: Data Mining




Contents: Data Mining; Introduction; Demonstration 1: Browse Data Mining Models; Demonstration 2: Add DM Viewers to a Web Page

Data Mining
Introduction
This section provides an overview of the data mining features in SQL Server 2005. The following features are covered in the demonstrations:
· Naïve Bayes Model
· Decision Tree Model
· Association Model
· Data Mining Viewer
Demonstration 1: Browse Data Mining Models
This demonstration introduces you to mining models through the built-in browsers for developers and administrators. You will explore a Naïve Bayes mining model to review attribute characteristics, attribute discrimination, and dependencies. You will also explore a Decision Tree model, an Association model, and compare the predictive capability of multiple models.
Step 1: Browse the Naïve Bayes model of the Targeted Mailing mining structure of the Adventure Works DW database

Before you start, you need to make sure you have the Adventure Works DW Analysis Services database installed, if you did not already deploy it during an earlier module:

· In Business Intelligence Development Studio, open the Adventure Works solution at C:\Program Files\Microsoft SQL Server\90\Tools\Samples\1033\awasdb\AdventureWorksAS.slnbi.
· Deploy the project.

In SQL Server Management Studio (SSMS):
Connect to the local Analysis Server.
Open the Mining Structures folder of Adventure Works DW.
Open the Targeted Mailing mining structure.
Each mining structure is a set of mining models that goes against a single data source. You can use the same source as shown in the Targeted Mailing structure, which has four different mining models. By contrast, the Market Basket structure has only one. Think of the mining structure as the problem to solve, with each model as an option to consider for solving the problem.
Right-click TM_Naive_Bayes and click Browse.
Naïve Bayes is the simplest model, so it’s a good one to start with.
Step 2: Review the Attribute Characteristics of the model
The Targeted Mailing “problem” is to predict who will buy a bike. Using a training set of known bike buyers, each record is flagged with a value where 1 is assigned to people who did buy bikes and 0 is assigned to people who did not. Use the Attribute Characteristics to see how attributes compare with the population for each group – bike buyers and non-bike buyers.

§ Select the Attribute Characteristics tab, and choose 1 in the Value list
This shows the most frequent attribute states (in descending order) for bike buyers (Bike Buyer = 1). Note that Number Children at Home = 0 is at the top of the list. It is not necessarily the best predictor, because it is also high for non-bike buyers. This is an example where correlation does not imply causation.
§ Change the Value option to 0 (non-bike buyers).
Notice that Number Children at Home = 0 is still at the top of the list. The population of the sample just doesn’t have many kids.
Step 3: Review the Attribute Discrimination of the model
Attribute Discrimination is a one-dimensional sequence of the importance of different attributes. Naïve Bayes is a fast and simple model to use to compare attributes.

§ Select the Attribute Discrimination tab, with 1 for Value 1 and 0 for Value 2:
This shows which attributes are the best predictors (discriminators) for buying bikes. Not having a car is the highest positive predictor, followed closely by age in the mid-30s. Having 2 cars means you’re unlikely to buy a bike. Notice that Number of Children at Home doesn’t show up for a while.
Step 4: Review the Dependency Network
§ Select the Dependency Network tab.

§ In the Dependency Network, click Bike Buyer.

The nodes change colors to show the prediction relationship. This becomes more useful as models become more complex. In this example, the network is very simple. All attributes here are predicting the bike buyer value. Nodes that predict BikeBuyer are indicated by color, which you can identify by the legend.

§ Click Number of Cars Owned.

Now you can see what this attribute predicts. Because this model is simple, the relationships are easy to follow.

§ Gradually drag the Links bar down.
Weaker links drop off the model. The strongest predictors are Age and Number Cars Owned.

Task 5: Browse the Decision Tree model of the Targeted Mailing mining structure of the Adventure Works DW database

Right-click TM_Decision_Tree and click Browse.
Decision Tree examines relationships in a more complex way than Naïve Bayes. It looks more closely at the interplay between attributes.
§ Choose TM_Decision_Tree in the model list.
The decision tree shows all the possible values of the strongest variable—number of cars owned. But for people who own 3 cars, the second most important factor is Income, while for people who own 1 car, the second most important factor is Age. The decision tree allows you to see how different factors interrelate to create discrete groups within the population.
§ In the Background drop-down, choose 1.
This shades the background darker based on the percent of cases that meet the condition—in this case, bike buyers.
§ Hover the mouse over the three Yearly Income boxes for Number Cars Owned = 3.
There are not as many rich people (income > 122,000) in this group, but the ratio of bike buyers is very high.
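Beyond browsing, the tree can also answer ad-hoc questions through a DMX singleton prediction query. A hedged sketch – the input column names [Age] and [Number Cars Owned] are assumed to match the Targeted Mailing structure, so adjust them to the actual column list:

-- Predict whether a hypothetical 35-year-old with no car would buy a bike,
-- and how confident the model is in that prediction.
SELECT
    Predict([Bike Buyer])               AS Predicted_Bike_Buyer,
    PredictProbability([Bike Buyer], 1) AS Probability_Of_Buying
FROM [TM_Decision_Tree]
NATURAL PREDICTION JOIN
    (SELECT 35 AS [Age], 0 AS [Number Cars Owned]) AS t;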
Task 6: Browse the Association mining structure
Association is very different from the prediction models: instead of predicting a single column, it looks for which products are purchased together, whether in pairs, triples, or larger combinations.
Open the Market Basket structure, right-click Association and click Browse.
§ Start with the Itemsets tab.
This shows a list of “shopping carts” along with how many times each happened. Lots of people just bought a single Sport-100.
§ Change the Minimum Itemset Size to 3.
This model shows only “shopping carts” that had at least three items purchased together.
§ In the Filter Itemset box, type water.
This shows only shopping carts that included something to do with water. In this case, the sales are primarily water bottles.
§ Change to the Rules tab.
Probability shows how often a combination occurs, but not whether one item is a good predictor of another. For a while, everybody who bought a Betty Crocker book also bought a Harry Potter book, but that does not mean the Betty Crocker purchase was an important predictor of the Harry Potter purchase.
Remember that the goal is to find which product purchases predict others. For example, if every basket that contains a Road Bottle Cage and a Cycling Cap also contains a Water Bottle, then the rule predicting the Water Bottle from the first two items has a probability of 100%.
§ Sort in descending order of Importance.
This view shows the combinations that the model judges most useful for making predictions. Importance adjusts for how popular the predicted product is overall, so rules that simply point at items everyone buys anyway do not dominate the list.
§ Select the Dependency Network, drag the Links bar to the middle, and choose Water Bottle = Existing.
Notice that this dependency network is much more complex than the previous one you viewed. A hydration pack is likely to be accompanied by a water bottle, but just because you bought a water bottle, you won’t necessarily buy a Hydration Pack.
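The same Association model can generate recommendations directly with a DMX PredictAssociation query. A sketch, assuming the nested table in the Market Basket model is named [v Assoc Seq Line Items] with a [Model] column – verify the actual names in the mining structure before running it:

-- Return the 3 products most likely to appear in a basket that already
-- contains a Water Bottle.
SELECT FLATTENED
    PredictAssociation([Association].[v Assoc Seq Line Items], 3)
FROM [Association]
NATURAL PREDICTION JOIN
    (SELECT (SELECT 'Water Bottle' AS [Model]) AS [v Assoc Seq Line Items]) AS t;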
Task 7: Compare the mining models of the Targeted Mailing mining structure
When a mining structure contains more than one mining model, you can compare their predictive ability.
§ Close the mining viewer.
§ Right-click the Targeted Mailing structure, and choose Lift Chart.
§ As the Case table (also known as the Input Table), select vTargetMail.
This is the same view that was used to generate the models, so the columns should match. A common practice is to divide the data in half and use half to train the model and the other half to test it.
§ In the Predict Value column of the grid, choose 1.
This tests how well the models predict bike buyers, rather than how the models work in general.
§ Select the Lift Chart tab.
The red line is the theoretically perfect model; the yellow line is pure chance. The lines in between are the four models in the mining structure. The “best” model for this data appears to be Decision Tree, while the “worst” appears to be Clustering.
Note:
You may receive a permissions error at this point. If so, implement the following workaround:
Right-click the Roles folder underneath the AdventureWorksAS database and add a new role. Change the membership of the role by adding the Everyone group.
On the Data Source Access page give read/write access to adventure_works_dw_3.0
On the Mining Structure page give read access to Targeted Mailing
Underneath Targeted Mailing, give read access to each mining model in turn.
Please note that this is not a recommended security practice.
Demonstration 2: Add DM Viewers to a Web Page
In this demonstration, you will learn how to add a data mining viewer to a web page.
Task 1: Run the DM Viewer application
In Visual Studio 2005:
Open the DM Viewer solution as a Web Site (File/Open/Web Site) from the folder C:\BID\Finished\08 DM\DM Viewer.
Press F5 to run the page.
This is a simple application that you can include in your own application with very little work: just add the control and set the properties. It’s not beautiful, but it’s functional. You may find that the colored bars do not appear in this version of the control.
Task 2: Review the Default.aspx code
Open the Default.aspx page.
View the Design page.
Select the data mining viewer control on the page, and then review properties in the Properties window.
You can see that this application is pretty simple. DM Viewer uses a self-contained control; just set the database, mining model, and server, and add a few attributes.
Task 3: Add a second copy of the control to the Default.aspx page
§ Right-click on Toolbox window and select Choose Items.
§ Click the Browse button to add the DLL to your toolbox from C:\BID\Finished\08 DM\DM Viewer.
§ Drag the DMNaiveBayesViewer onto the Default.aspx page.
§ As the database, assign Adventure Works DW.
§ As the model, assign TM_Naive_Bayes.
§ Run the project to see the second copy of the control.

You can learn more about the controls if you first install the Analysis Services samples and then look at the Data Mining samples found in C:\Program Files\Microsoft SQL Server 2005 Samples\Analysis Services\DataMining\Data Mining Web Controls.

SSAS - Creation Of First Cube

SQL Server Analysis Services:

Open SQL Server Business Intelligence Development Studio.


This is the tool used to create ETL packages, cubes, and so on. We have to create a new project here; the available project types are listed in the New Project window.
Click Analysis Services Project and create a new project. You can point the project at a folder on the local system so it gets its own directory.



The project window appears. In the Solution Explorer we can see the project structure, which contains Data Sources, Data Source Views, and so on. From here we can create cubes, dimensions, and connections to data sources.
Creating a Data Sources:
Right click on Data Sources, create a New Data Source

Click Next and select New; here we define the database connection.

Here we specify where the data is located; give the server name.
Here I selected local machine server name: BLRDXP-SGOURISH
Specify the authentication and select the database name (select the sample database AdventureWorksDW).

Test the connection, then click OK.

Here we can see the data source name; then click Next.

You can define what credentials Analysis Services will use to connect to the data source.
Select Use the service account.

Click on Next
Here we give the data source a name (I used SSAS_DS).
Click on Finish button.
Create new Data Source View:
In the data source view, we add the tables that will be used for development. We need to pull the tables from the database and place them in the data source view. We can also connect to multiple data sources and pull tables from each; for that, we add the additional data sources to the project.

In solution explorer right click on the Data Source Views and select New Data Source View.


We will get Data Source View Wizard window
Click on Next




Select the data source you just created, then click Next.
You can see all the tables and views in the selected data source.
Move the required tables into the Included objects list, then click Next.


Here I selected five tables; click Next, then give the data source view a name.
Then click on Finish button.

After this you can see a diagram showing the relationships between the tables.



The arrows in the design window represent the relationships. Double-click an arrow to open the relationship window.




Right-click a table and click Explore Data to see the data present in the database for that table.

Here we can also create additional tables, logical relationships, and new named queries.

We can customize the DSV in two ways:
1. Create calculated columns.
2. Replace a table with a named query.

A named query is more flexible because it makes it easy to add joins and filters. Note that converting a table to a named query does not bring its calculated columns along.
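For example, a named query is just a SELECT statement stored in the DSV, so something along these lines (table and column names are illustrative, in the AdventureWorksDW style) can replace a plain table while adding a calculated column and a filter at the same time:

SELECT
    p.ProductKey,
    p.EnglishProductName,
    sc.EnglishProductSubcategoryName,
    p.ListPrice - p.StandardCost AS Margin          -- calculated column
FROM dbo.DimProduct AS p
JOIN dbo.DimProductSubcategory AS sc
    ON p.ProductSubcategoryKey = sc.ProductSubcategoryKey
WHERE p.FinishedGoodsFlag = 1;                      -- filter applied in the DSV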
Create Dimensions:
Right-click Dimensions and select New Dimension.




The new dimension wizard will appear


Select the build method.

When building a dimension (or a cube) there is an Auto build option that creates attributes and hierarchies automatically. Here I uncheck the Auto build check box, then click Next.





Here we choose the dimension type; select Standard dimension.

Standard dimension: for example, Product or Employee.
Time dimension: year, quarter, and month come from columns in a database table.
Server time dimension: generated on the server from dates, with no underlying table.

Select the key column and select the member description for the key.
Click on Next

The wizard detects the related tables, which appear here.

Select the required columns (attributes) in that dimension.
Click on Next
We need to specify the dimension type; select the Regular dimension.
Click Next.
Define any parent-child relationship within these tables.
Click Next.
Here we can give the Name of the Dimension and click on Finish button.

In solution explorer we can see the created dimension




After this we get a window showing the dimension's attributes and hierarchies.
The selected attributes appear in blue, one taken from each dimension table. We can rename the attributes in the Attributes list.





In the Hierarchies window we can specify the hierarchy.

Click the Save All icon.
No actual data is populated yet; the dimension definition (metadata) must first be deployed to the server and the dimension processed.
To set the target server, right-click the project name (SSAS_Project), select Properties, give the server name, and click OK.
Then right-click the dimension and click Process.

A confirmation dialog appears asking whether to build and deploy the project first; click Yes.

If you go and check in Management Studio, it will ask you to connect to a server; choose Analysis Services and click Connect. Under Databases you can see our project, SSAS_Project. If you right-click the dimension and select Browse, you will not get any data, because the data is not populated yet.
Go back to BIDS (Business Intelligence Development Studio).

Now we populate the data into the dimension.
Click Run.

In the Process Progress window you can see the status reach Process succeeded; then click Close.
Again close the window.

In the Output window you can see the result message.
Now you can test the data by right-clicking the Product dimension and selecting Browse.

You can see the data laid out according to the hierarchy you created.

Building the Cube:
In Solution Explorer, right-click the Cubes folder and click New Cube.

Here we can see the same build options as before.
Click Next, then select the Data Source View and click Next again.
The wizard detects all the tables and relationships; click Next.
Here we identify the fact and dimension tables.

We can also designate the time dimension table.

Move the available dimensions into the cube dimensions list.



Specify the time table columns such as Calendar Year, Calendar Quarter, and Month Name.
Click Next.




Here you can select measures that you want to include in the cube.
Click on next


Here we can see the time dimension with its hierarchies and attributes.

Click Next.
Give the cube a name and click Finish.
We can now see the Time dimension in the Dimensions folder as well.
The fact tables appear in yellow in the Data Source View window.

Now, in Management Studio, right-click the project database and click Refresh.
The Cubes folder does not yet contain the built cube.
Go back to BIDS; here we can deploy the cube.
To do that, right-click the project and click Process.
Click Yes.


The deployment completes successfully.
Go back to Management Studio and check whether the deployed cube is now available.
In BIDS, click Run; the Process Progress window appears and shows the status. Click Close.
In the Measures window we can see the selected measures.
We can also set their properties: right-click a measure and select Properties.

The cube designer also provides options for modifying the cube and adding new features such as measures, attribute hierarchies, and levels.

To browse the data, use the Browser tab. You will see a cross-tab-like structure in which you can explore the data.
You can check the same data in Management Studio as well: right-click the cube and select Browse.
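You can also query the deployed cube directly with MDX from an MDX query window in SSMS. A minimal sketch, assuming a measure named [Sales Amount], a time dimension named [Dim Time] with a [Calendar Year] attribute, and a cube named [SSAS_Cube] (substitute the names you actually used):

SELECT
    { [Measures].[Sales Amount] }       ON COLUMNS,
    [Dim Time].[Calendar Year].MEMBERS  ON ROWS
FROM [SSAS_Cube];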

Tuesday, June 23, 2009

MSBI – GOOD LINKS FOR THE MONTH

http://bp-msbi.blogspot.com/
http://prologika.com/CS/blogs/blog/default.aspx

SSAS - Currency Conversion in SSAS

Microsoft SQL Server Analysis Services uses a combination of features, guided by Multidimensional Expressions (MDX) scripts, to provide currency conversion support in cubes supporting multiple currencies.
Currency Conversion Terminology
The following terminology is used in Analysis Services to describe currency conversion functionality:
Pivot currency
The currency against which exchange rates are entered in the rate measure group.
Local currency
The currency used to store transactions on which measures to be converted are based.
The local currency can be identified by either:
· A currency identifier in the fact table stored with the transaction, as is commonly the case with banking applications where the transaction itself identifies the currency used for that transaction.
· A currency identifier associated with an attribute in a dimension table that is then associated with a transaction in the fact table, as is commonly the case in financial applications where a location or other identifier, such as a subsidiary, identifies the currency used for an associated transaction.
Reporting currency
The currency to which transactions are converted from the pivot currency.
Note:
For many-to-one currency conversions, the pivot currency and reporting currency are the same.
Currency dimension
A database dimension defined with the following settings:
· The Type property of the dimension is set to Currency.
· The Type property of one attribute for the dimension is set to CurrencyName.
Important:
The values of this attribute must be used in all columns that should contain a currency identifier.
Rate measure group
A measure group in a cube, defined with the following settings:
· A regular dimension relationship exists between a currency dimension and the rate measure group.
· A regular dimension relationship exists between a time dimension and the rate measure group.
· Optionally, the Type property is set to ExchangeRate. While the Business Intelligence Wizard uses the relationships with the currency and time dimensions to identify likely rate measure groups, setting the Type property to ExchangeRate allows client applications to more easily identify rate measure groups.
· One or more measures, representing the exchange rates contained by the rate measure group.
Reporting currency dimension
The dimension, defined by the Business Intelligence Wizard after a currency conversion is defined, that contains the reporting currencies for that currency conversion. The reporting currency dimension is based on a named query, defined in the data source view on which the currency dimension associated with the rate measure group is based, from the dimension main table of the currency dimension. The dimension is defined with the following settings:
· The Type property of the dimension is set to Currency.
· The Type property of the key attribute for the dimension is set to CurrencyName.
· The Type property of one attribute within the dimension is set to CurrencyDestination, and the column bound to the attribute contains the currency identifiers that represent the reporting currencies for the currency conversion.
Defining Currency Conversions
You can use the Business Intelligence Wizard to define currency conversion functionality for a cube, or you can manually define currency conversions using MDX scripts.
Prerequisites
Before you can define a currency conversion in a cube using the Business Intelligence Wizard, you must first define at least one currency dimension, at least one time dimension, and at least one rate measure group. From these objects, the Business Intelligence Wizard can retrieve the data and metadata used to construct the reporting currency dimension and MDX script needed to provide currency conversion functionality.
Decisions
You need to make the following decisions before the Business Intelligence Wizard can construct the reporting currency dimension and MDX script needed to provide currency conversion functionality:
Exchange rate direction
Converted members
Conversion type
Local currencies
Reporting currencies
Exchange Rate Directions
The rate measure group contains measures representing exchange rates between local currencies and the pivot currency (commonly referred to as the corporate currency). The combination of exchange rate direction and conversion type determines the operation performed on measures to be converted by the MDX script generated using the Business Intelligence Wizard. The following table describes the operations performed depending on the exchange rate direction and conversion type, based on the exchange rate direction options and conversion directions available in the Business Intelligence Wizard.
Exchange rate direction: n pivot currency to 1 sample currency
· Many-to-one: Multiply the measure to be converted by the exchange rate measure for the local currency in order to convert the measure into the pivot currency.
· One-to-many: Divide the measure to be converted by the exchange rate measure for the reporting currency in order to convert the measure into the reporting currency.
· Many-to-many: Multiply the measure to be converted by the exchange rate measure for the local currency in order to convert the measure into the pivot currency, then divide the converted measure by the exchange rate measure for the reporting currency in order to convert the measure into the reporting currency.
Exchange rate direction: n sample currency to 1 pivot currency
· Many-to-one: Divide the measure to be converted by the exchange rate measure for the local currency in order to convert the measure into the pivot currency.
· One-to-many: Multiply the measure to be converted by the exchange rate measure for the reporting currency in order to convert the measure into the reporting currency.
· Many-to-many: Divide the measure to be converted by the exchange rate measure for the local currency in order to convert the measure into the pivot currency, then multiply the converted measure by the exchange rate measure for the reporting currency in order to convert the measure into the reporting currency.
You choose the exchange rate direction on the Set currency conversion options page of the Business Intelligence Wizard. For more information about setting conversion direction, see Set Currency Conversion Options (Business Intelligence Wizard).
Converted Members
You can use the Business Intelligence Wizard to specify which measures from the rate measure group are used to convert values for:
Measures in other measure groups.
Members of an attribute hierarchy for an account attribute in a database dimension.
Account types, used by members of an attribute hierarchy for an account attribute in a database dimension.
The Business Intelligence Wizard uses this information within the MDX script generated by the wizard to determine the scope of the currency conversion calculation. For more information about specifying members for currency conversion, see Select Members (Business Intelligence Wizard).
Conversion Types
The Business Intelligence Wizard supports three different types of currency conversion:
One-to-many Transactions are stored in the fact table in the pivot currency, and then converted to one or more other reporting currencies. For example, the pivot currency can be set to United States dollars (USD), and the fact table stores transactions in USD. This conversion type converts these transactions from the pivot currency to the specified reporting currencies. The result is that transactions can be stored in the specified pivot currency and viewed either in the specified pivot currency or in any of the reporting currencies specified in the reporting currency dimension defined for the currency conversion.
Many-to-one Transactions are stored in the fact table in local currencies, and then converted into the pivot currency. The pivot currency serves as the only specified reporting currency in the reporting currency dimension. For example, the pivot currency can be set to United States dollars (USD), and the fact table stores transactions in euros (EUR), Australian dollars (AUD), and Mexican pesos (MXN). This conversion type converts these transactions from their specified local currencies to the pivot currency. The result is that transactions can be stored in the specified local currencies and viewed in the pivot currency, which is specified in the reporting currency dimension defined for the currency conversion.
Many-to-many Transactions are stored in the fact table in local currencies. The currency conversion functionality converts such transactions into the pivot currency, and then to one or more other reporting currencies. For example, the pivot currency can be set to United States dollars (USD), and the fact table stores transactions in euros (EUR), Australian dollars (AUD), and Mexican pesos (MXN). This conversion type converts these transactions from their specified local currencies to the pivot currency, and then the converted transactions are converted again from the pivot currency to the specified reporting currencies. The result is that transactions can be stored in the specified local currencies and viewed either in the specified pivot currency or in any of the reporting currencies that are specified in the reporting currency dimension defined for the currency conversion.
Specifying the conversion type allows the Business Intelligence Wizard to define the named query and dimension structure of the reporting currency dimension, as well as the structure of the MDX script defined for the currency conversion.
Local Currencies
If you choose a many-to-many or many-to-one conversion type for your currency conversion, you need to specify how to identify the local currencies from which the MDX script generated by the Business Intelligence Wizard performs the currency conversion calculations. The local currency for a transaction in a fact table can be identified in one of two ways:
The measure group contains a regular dimension relationship to the currency dimension. For example, in the Adventure Works DW sample Analysis Services database, the Internet Sales measure group has a regular dimension relationship to the Currency dimension. The fact table for that measure group contains a foreign key column that references the currency identifiers in the dimension table for that dimension. In this case, you can select the attribute from the currency dimension that is referenced by the measure group to identify the local currency for transactions in the fact table for that measure group. This situation most often occurs in banking applications, where the transaction itself determines the currency used within the transaction.
The measure group contains a referenced dimension relationship to the currency dimension, through another dimension that directly references the currency dimension. For example, in the Adventure Works DW sample Analysis Services database, the Financial Reporting measure group has a referenced dimension relationship to the Currency dimension through the Organization dimension. The fact table for that measure group contains a foreign key column that references members in the dimension table for the Organization dimension. The dimension table for the Organization dimension, in turn, contains a foreign key column that references the currency identifiers in the dimension table for the Currency dimension. This situation most often occurs in financial reporting applications, where the location or subsidiary for a transaction determines the currency for the transaction. In this case, you can select the attribute that references the currency dimension from the dimension for the business entity.
Reporting Currencies
If you choose a many-to-many or one-to-many conversion type for your currency conversion, you need to specify the reporting currencies for which the MDX script generated by the Business Intelligence Wizard performs the currency conversion calculations. You can either specify all the members of the currency dimension related to the rate measure group, or select individual members from the dimension.
The Business Intelligence Wizard creates a reporting currency dimension, based on a named query constructed from the dimension table for the currency dimension using the selected reporting currencies.
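To give a feel for the result, below is a rough hand-written sketch of a many-to-many conversion expressed as an MDX script, using the “n pivot currency to 1 sample currency” direction (multiply to reach the pivot, divide to reach the reporting currency). It is not the exact script the wizard generates, and the cube, measure, and dimension names (a US Dollar pivot and an [Average Rate] measure in the rate measure group) are assumptions:

SCOPE ( { Measures.[Sales Amount] } );                       // measures to convert
    // Local currency -> pivot currency (US Dollar).
    SCOPE ( Leaves([Date]),
            Except([Currency].[Currency].Members,
                   [Currency].[Currency].[US Dollar]) );
        THIS = THIS * Measures.[Average Rate];
    END SCOPE;
    // Pivot currency -> each selected reporting currency.
    SCOPE ( Leaves([Date]),
            Except([Reporting Currency].[Currency].Members,
                   [Reporting Currency].[Currency].[US Dollar]) );
        THIS = THIS / Measures.[Average Rate];
    END SCOPE;
END SCOPE;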

See the below link:
http://blogs.conchango.com/christianwade/archive/2006/08/24/Currency-Conversion-in-Analysis-Services-2005.aspx

SSIS - USE OF Surrogate Keys at Fact level

A surrogate key is an artificial or synthetic key used as a substitute for a natural key. It is just a unique identifier or number for each row that can be used as the primary key of the table (typically a sequence-generated key assigned to act as the table's primary key).
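In SSIS the key lookup during the fact load is usually done with a Lookup transformation against the dimension table; the set-based T-SQL equivalent looks roughly like this (all table and column names are illustrative):

-- Resolve each staged fact row's business (natural) key to the dimension's
-- surrogate key; unmatched rows fall back to a placeholder member (-1).
INSERT INTO dbo.FactSales (CustomerKey, OrderDateKey, SalesAmount)
SELECT
    ISNULL(dc.CustomerKey, -1) AS CustomerKey,
    s.OrderDateKey,
    s.SalesAmount
FROM staging.Sales AS s
LEFT JOIN dbo.DimCustomer AS dc
    ON dc.CustomerAlternateKey = s.CustomerBusinessKey;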

See the link below. It's a good one.

http://blogs.msdn.com/sqlcat/archive/2009/05/13/assigning-surrogate-keys-to-early-arriving-facts-using-integration-services.aspx

Thursday, May 14, 2009

New Blogs For This Month

SSIS -Blogs
http://ganeshprof.blogspot.com/
http://rameshsql.blogspot.com/
Complete-Sql Server
http://bisqlserver.blogspot.com/
BizTalk Web Resources-
http://blogs.msdn.com/gsnowman/archive/2006/01/15/513159.aspx
.NET Banana-
http://blogs.msdn.com/gsnowman/

CRYSTAL-REPORTS-INFO

Crystal Reports software enables you to easily design interactive reports and connect them to virtually any data source. Your users can benefit from on-report sorting and filtering – giving them the power to execute decisions instantly.
And with Crystal Reports Visual Advantage, you can gain even more functionality for compelling reports. This bundled product includes Crystal Reports and Xcelsius Engage, giving you the power to create highly formatted reports with what-if scenario models, interactive charts – and deliver them via the Web, e-mail, Microsoft Office, Adobe PDF, or embedded in enterprise applications. As a result, you can leverage your reports to make better operational and strategic decisions.
With Crystal Reports, you can:
· Leverage professional reporting - priced for everyone
· Empower end users to explore reports with on-report sorting and parameters
· Minimize IT and developer effort with interactive reports
· Develop powerful data mashups
· Save valuable report design time
· Embed professional-looking reports in Java and .NET applications
· Tailor your solution by adding report management and viewing tools

http://www.crystalreportsbook.com/CrystalReportsXI.asp

SSIS- DTS to SSIS Migration

When you upgrade an existing SQL Server instance, the DTS runtime remains in place. Your DTS packages are not affected and local server packages remain stored in the msdb.dbo.sysdtspackages table. Structured storage files, or packages saved to DTS files, are also unchanged.

If you use Meta Data Services, previously called the Repository in SQL Server 7.0, you may encounter some problems. Meta Data Services is not supported by SSIS, which is not surprising considering that SQL Server 2000 Service Pack 3 removed Meta Data Services as a storage location. This is not a significant change, however, since the existing DTS object model and the DTSRUN command-line tool continue to support Meta Data Services.

The most important thing to remember about the DTS-to-SSIS upgrade is that after it is complete, your packages will continue to run. Scheduled jobs will still use DTSRUN and call the same packages from the same locations as they did prior to the upgrade.

When making the upgrade, you will need to determine how you will manage these legacy DTS packages in the future. The DTS runtime will continue to be available, and will have been updated to enable DTS packages to connect to SQL Server 2005 data sources. But the DTS designer components are removed along with SQL Server 2000 Enterprise Manager.
See for more info:

http://bisqlserver.blogspot.com/2006/11/migrating-your-sql-2000-dts-packages.html
http://www.simple-talk.com/sql/sql-server-2005/dts-to-ssis-migration/