MSBI (SSIS/SSRS/SSAS) Online Training

Friday, December 31, 2010

!!~~~HAPPY NEW YEAR~~~!!

Hi All
______________________________________________
Wishing you & your family,
A Very Happy, Prosperous
!! New Year 2011 !!



Cheers,
Ravindra

Saturday, December 11, 2010

New Features in PerformancePoint Services (SharePoint Server 2010)



PerformancePoint Services in Microsoft SharePoint Server 2010 is a performance management service that you can use to monitor and analyze your business. By providing flexible, easy-to-use tools for building dashboards, scorecards, reports, and key performance indicators (KPIs), PerformancePoint Services can help everyone across an organization make informed business decisions that align with companywide objectives and strategy. Scorecards, dashboards, and KPIs help drive accountability. Integrated analytics help employees move quickly from monitoring information to analyzing it and, when appropriate, sharing it throughout the organization.

Prior to the addition of PerformancePoint Services to SharePoint Server 2010, Microsoft Office PerformancePoint Server 2007 functioned as a standalone server. Now PerformancePoint functionality is available as an integrated part of the SharePoint Server Enterprise license, as is the case with Excel Services in Microsoft SharePoint Server 2010. The popular features of earlier versions of PerformancePoint Services are preserved along with numerous enhancements and additional functionality.

What's new

PerformancePoint Services includes many new and updated features and functionality. For convenience, this document separates these updates into the following two categories:

Platform integration with SharePoint Products and Technologies

New PerformancePoint Services features

Platform integration with SharePoint Products and Technologies

PerformancePoint Services builds on the platform of SharePoint 2010 Products, providing customers with a more robust deployment, scalability, and performance model. The previous version was built directly on Microsoft Internet Information Services (IIS) and used a Microsoft SQL Server database. PerformancePoint Services is a SharePoint Server service application and uses SharePoint document libraries and lists to store content. These architectural changes enable PerformancePoint Services to take advantage of SharePoint Server 2010 enterprise features.

Physical architecture

For information about the physical architecture, see the diagram in Overview of PerformancePoint Services architecture, which shows the PerformancePoint Services architecture for a farm deployment that uses three servers.

PerformancePoint Services as a service application

In SharePoint Server 2010, services are no longer contained within a Shared Service Provider (SSP). Instead, the infrastructure for hosting services is integrated with Microsoft SharePoint Foundation and the configuration of service offerings is much more flexible. The service application framework is a common service model that provides the following benefits:

A consistent management experience and shared infrastructure for all services that plug into it. As one of those services, PerformancePoint Services can provide the ability to perform bulk security operations for service applications in a farm deployment from the SharePoint Central Administration Web site.


Backup and recovery benefits include restoring a site collection, entire site content, or list content back to a previous version or point in time.


Services are installed by default; there is no SSP setup.


Sites can be configured to use only the services that are needed rather than the entire list of services.


Deploying services across sites and farms is more flexible. Similar to previous versions, a single set of services can be shared by all Web sites in a farm.


SharePoint Server 2010 as the repository

PerformancePoint Services stores data sources and dashboards in document libraries and all other dashboard content in lists. In addition, PerformancePoint Services data security and management is enhanced by the following features:

Failover and up-time strategies


Backup and restore strategies


Disaster-recovery strategies


Multi-tenancy support for list content


Enterprise-level, single-security model


Authentication and authorization schemes


Trusted locations


Familiar interface for storing and consuming data


Changes to the security model

PerformancePoint Services uses SharePoint Server 2010 to manage user credentials and to secure access to dashboard content and its underlying data sources. The new and changed features of the PerformancePoint Services security model are described in the following list:

User authentication is handled by SharePoint Server 2010. Authentication of PerformancePoint Services users is validated by the SharePoint Server 2010 authentication provider.


Trusted locations are used to limit access to PerformancePoint Services content types to specific sites. 


PerformancePoint Services uses the SharePoint Server 2010 security model to set permissions on dashboard content. These permissions provide a level of security that is equivalent to that found in Microsoft Office PerformancePoint Server 2007, but permissions in SharePoint Server 2010 do not map directly to PerformancePoint Monitoring Server 2007 roles and permissions.


New PerformancePoint Services features

PerformancePoint Services now can utilize SharePoint Server scalability, collaboration, backup and recovery, and disaster recovery capabilities. Dashboards and dashboard items are stored and secured within SharePoint lists and libraries, providing you with a single security and repository framework. The following section provides a summary of feature enhancements.

New features and enhancements

The following is an overview of features and enhancements to PerformancePoint Services in Microsoft SharePoint Server 2010.

With PerformancePoint Services functioning as a service in SharePoint Server, dashboards and dashboard items are stored and secured within SharePoint lists and libraries, providing you with a single security and repository framework. The new architecture also takes advantage of SharePoint Server scalability, collaboration, backup and recovery, and disaster recovery capabilities. You can also include and link PerformancePoint Services Web Parts with other SharePoint Server Web Parts on the same page. The new architecture also streamlines the security model, simplifying access to report data.


The Decomposition Tree is a new visualization report type available in PerformancePoint Services. You can use it to quickly and visually break down higher-level data values from a multi-dimensional data set to understand the driving forces behind those values. The Decomposition Tree is available in scorecards and analytic reports and ultimately in dashboards.


You can access more detailed business information with improved scorecards. Scorecards have been enhanced to make it easy for you to drill down and quickly access more detailed information. PerformancePoint scorecards also offer more flexible layout options, dynamic hierarchies, and calculated KPI features. Using this enhanced functionality, you can now create custom metrics that use multiple data sources. You can also sort, filter, and view variances between actual and target values to help you identify concerns or risks.


Better Time Intelligence filtering capabilities let you create dynamic time filters that are always up to date. Other filter improvements help dashboard users quickly focus on the information that is most relevant.


Ability to include and link PerformancePoint Services Web Parts together with other SharePoint Server Web Parts on the same page.


Easier to author and publish dashboard items by using Dashboard Designer.


SQL Server Analysis Services 2008 support.


Increased support for accessibility compliance in individual reports and scorecards.


The KPI Details report is a new report type that displays contextually relevant information about KPIs, metrics, rows, columns, and cells within a scorecard. The KPI Details report works as a Web part that links to a scorecard or individual KPI to show relevant metadata to the end user in SharePoint Server. This Web part can be added to PerformancePoint dashboards or any SharePoint Server page.


Create analytic reports to better understand the underlying business forces behind the results. Analytic reports have been enhanced to support value filtering, new chart types, and server-based conditional formatting.


Retired features

PerformancePoint Services no longer supports Trend Charts, PivotTable reports, PivotChart reports, ODBC data sources, Analysis Services 2000, and 32-bit server architecture.

Conclusion

By becoming more tightly integrated with SharePoint Server 2010, PerformancePoint Services takes advantage of many enterprise-level SharePoint Server 2010 features. This change in architecture results in many changes and enhancements to PerformancePoint Services.

You can find more details at:

http://technet.microsoft.com/en-us/library/ee661741.aspx

PerformancePoint Server 2007 Top Blogs

The Microsoft PerformancePoint Team Blog—Find in-depth discussions on the monitoring and analytics aspects of Office PerformancePoint Server 2007.

Kevin White's PerformancePoint Server 2007 Blog—Find in-depth discussions on Office PerformancePoint Server 2007.

Previous Version Support—Get help with ProClarity and Business Scorecard Manager.

Help for PerformancePoint Server 2007—Browse PerformancePoint Server 2007 Help

TechNet Planning, Monitoring, and Analytic—This forum focuses on PerformancePoint Server and Management Reporter

The BI Blog—Various posts on everything BI

TechNet: ProClarity—This forum focuses on technical questions related to Microsoft ProClarity products.

SQL Server TechNet Forums— This forum focuses on technical questions related to Microsoft Analysis Services, Reporting Services, and Data Mining

Mosha's Blog—This blog has important information on MDX and other things related to Analysis Services

Norm's Blog—Bits of knowledge from the field and beyond

About PerformancePoint Server 2007


What Is PerformancePoint Server?
Microsoft Office PerformancePoint Server 2007 is an integrated performance management application that enables you to monitor and analyze your business.

By providing flexible, easy-to-use tools for building dashboards, Office PerformancePoint Server 2007 can help everyone across your organization make informed business decisions that align with companywide objectives.
PerformancePoint Server 2007 Top Benefits
The top benefits of Microsoft Office PerformancePoint Server 2007 can help you transform your business process and turn your data into a competitive advantage.

1. A complete and integrated performance management solution

Many performance management applications require connectors, interfaces, and add-ins to link their various applications and to the Microsoft Office environment. Office PerformancePoint Server 2007 is natively integrated across all aspects of the solution—from reporting and analysis, to dashboarding and forecasting.

PerformancePoint Server 2007 is designed to improve operational performance across all departments and all levels of your organization. You can use this single integrated application to monitor and analyze the success your company enjoys or the barriers it faces and to plan for the future of your business.

2. Performance management for all business users

While many solutions on the market focus on business analysts and BI experts, Office PerformancePoint Server 2007 is designed to reach everyone involved in your business at all levels of your organization. PerformancePoint Server 2007 provides a user experience in the familiar and easy-to-use Microsoft Office system, including Microsoft Office Excel, Office Outlook, and Office SharePoint Server.

And through the scalability and high performance of Microsoft SQL Server 2005 and the Microsoft Business Intelligence platform, your organization can afford to deliver the right information that helps the broadest number of people make better, faster, and more relevant decisions without continually burdening the IT department.

3. Monitoring and analyzing across the entire organization

Because of high cost, high complexity, and the need for significant IT support, most performance management environments limit their access to just a few executives or analysts. Because Office PerformancePoint Server 2007 resides within the Microsoft applications that you use today, it’s now economically feasible for all executives, managers, and front-line employees to have the same access to valuable insight from a high-quality performance management application.

Everyone can access the metrics, key performance indicators, dashboards, and reports to see how they contribute to the performance of the organization and to the overall corporate strategy—all leading to improved corporate performance.

4. Advanced analytic and visualization capabilities

Analysis is a core component of the performance management cycle. An organization is at a competitive advantage if every decision maker in it can quickly and easily perform analysis that helps them make better decisions and execute better against corporate goals and objectives.

Using the capabilities incorporated into Office PerformancePoint Server 2007, people can benefit from a single business data model shared across monitoring and analytics. PerformancePoint Server 2007 guides information workers so they can easily monitor what is happening and make better sense of what the data analysis is telling them. The advanced visualization capabilities can help you spot trends and opportunities easily, allowing for better planning for your business.

5. Accountability from the individual to the enterprise

Office PerformancePoint Server 2007 uses a model-driven approach. Users can create scorecards, dashboards, and analytics that align with their own responsibilities and goals and with the overall goals and objectives of the organization. Providing maximum flexibility, the system accommodates unique departmental business models that synchronize with those above and below them in the organization, as well as across departments and up to the corporate level. So everyone has an up-to-date, consistent view of the data and the most current insight on the entire organization’s performance.

6. Cross-enterprise view

Office PerformancePoint Server 2007 provides a complete perspective of business performance across enterprise functions such as finance, operations, sales, and human resources. PerformancePoint Server 2007 uses SQL Server 2005 to integrate information across disparate enterprise systems and business functions, providing all users with a more strategic and forward-looking view of the business.

7. Enhanced partnership between IT and business analysts

While IT provides the infrastructure and security for crucial business applications, Office PerformancePoint Server 2007 business modeling tools enable business analysts to use their insight to manage roles, workflows, business rules, and information without having to continually rely on IT input. This increases the pace at which business models can be developed, deployed, and modified to meet changing business conditions.
PerformancePoint Server 2007 also features a user-friendly design and application wizard that business people can use to build their own dashboards and scorecards and create interactive analytics charts and graphs without the need for IT support.

8. Built on the Microsoft Business Intelligence platform

Using the Microsoft Business Intelligence platform, Office PerformancePoint Server 2007 can help you build an enterprise-grade, scalable performance management application. PerformancePoint Server 2007 uses Microsoft Office as its primary interface, so business users can interact with familiar tools like Excel to consume information, collaborate with colleagues, and contribute to plans and the performance management cycle.

PerformancePoint Server 2007 also uses the proven data engine of SQL Server 2005 for data integration; an analytical data model that includes a calculation engine and data mining; and a highly flexible and scalable reporting platform to broadcast information to a wide audience. All of this is delivered via the Web in Microsoft Office SharePoint Server 2007, providing everyone with a central place to get their business intelligence information.

MSBI (SSIS, SSAS with MDX, SSRS) No. 1 Training Institute & Online Training

Hi Guys,
An excellent opportunity – you will definitely get a job within 2 months.
For more details, send me an e-mail at nv.ravindranathreddy@gmail.com
The right opportunity at the right place.
Regards,
Ravindra

Friday, November 19, 2010

Future MSBI-Power Pivot ‘N’ DAX




Here are Vidas Matelis's comments on PowerPivot and DAX
- from the session "SQLCAT: A Preview of PowerPivot Best Practices". Presenters: Denny Lee and Dave Wickert:
• #PowerPivot spreadsheets are stored in the SQL database when deployed to SharePoint
• #PowerPivot uses the AS engine – an in-memory, column-based store with VertiPaq. All calculations are local after import.
• Excel has its own local in-process SSAS engine. New #PowerPivot functions – Data Analysis eXpressions (DAX)
• #PowerPivot compression is about 10:1 (approx), and it depends…
• More info about #PowerPivot: VertiPaq compresses about 10:1, then SSAS adds compression of 1.5:1 or 2:1 on top.
• MOLAP stores multiple records at the grain of the fact table, so some records can be rolled up
• VertiPaq has a row for each source record, so it is different from MOLAP.
• There is an on-disk structure for #PowerPivot that looks similar to SSAS, under c:\local user\… Additional info: Denny Lee just blogged about this in more detail.
• The customdata sub-folder has a file with a backup of the SSAS database for #PowerPivot.
• Showing a demo of the Silverlight gallery of #PowerPivot reports in SharePoint
• #PowerPivot – when you open a workbook in Excel Services, there is no interaction with SSAS at first
• When you click a slicer, Excel Services / the #PowerPivot service talks to the AS engine. So the SSAS DB is loaded on demand.
• The first initialization can take time – it depends on traffic, etc. The same published #PowerPivot can serve many people.
• You need to optimize the SharePoint setup for #PowerPivot.
• Capacity planning – you need memory for any in-use database, plus a 10-20% buffer for auto detection(?).
• #PowerPivot keeps a cache of detached databases in the SSAS backup folder
• SSAS servers for #PowerPivot will have 64-128 GB of memory and more. It will work with 8 GB, but most likely you will want more.
• #PowerPivot capacity planning depends on: total/concurrent users, number of files, max file size, data refresh options
• In the SharePoint content DB, workbooks are stored as blobs. You could use Remote Blob Store with FILESTREAM. Overhead is 2-3%,
• but it handles higher-concurrency scenarios better (cont..)
• Different topology options – one SharePoint server, multiple servers with different services on them, etc…
• Currently SharePoint requires Kerberos security all the way
• In SharePoint 2010 there are new "claims tokens", so no need for Kerberos for #PowerPivot
• Kerberos might still be required, but there is a reduction in the cases where you need it for #PowerPivot and SharePoint 2010.
• #PowerPivot – before installing, YOU MUST READ THE MANUAL! That is important.
• Upgrading SharePoint 2007 to 2010 is very complex for #PowerPivot. You cannot upgrade CTP2->CTP3. Not clear for CTP3->RTM.
• SharePoint is optimized for download, so there are concerns about uploading #PowerPivot (large files)
• PowerPivot max size is 2 GB (a SharePoint limit). You can create #PowerPivot files larger than 2 GB, just not upload them to SharePoint. Additional info: the actual max publish size could be 1.8 GB. Although it is a limitation, it will affect very few users, as the majority will not have such large files, so there is no need to worry about this.
• In SharePoint the LargeChunkFileSize parameter is useless – don't touch it for #PowerPivot.
• To upload #PowerPivot to SharePoint you need a very good network connection! Maybe copy to the SharePoint server first – that's faster.
• Troubleshooting – error messages are very cryptic for #PowerPivot.
• Use ULS logs (Bing or Google for more info) from SharePoint to troubleshoot #PowerPivot.
• From the ULS logs, use the correlationID to track down log data for one event. Log files are very large; filter by time too. #powerpivot
• There is no separate #PowerPivot log. The error could be in #PowerPivot, Excel Services, etc., so ULS is one option.
• You can use SQL Profiler to troubleshoot the #PowerPivot SSAS instance! Good news!
• So to troubleshoot #PowerPivot you need to know SharePoint and SSAS and use the tools! Consultants will still be busy…
• Session almost done, time for lunch. See you later tonight for the #PowerPivot DAX session!
• Create a view with fewer records and create the #PowerPivot workbook, deploy to SharePoint, then update the view to include all records. Faster.
• #PowerPivot tip: Rename the *.xlsx file to *.zip; inside you will see a data file that you can rename to *.abf and restore to an SSAS server running in PowerPivot integrated mode
• This was the best session so far – Best Practices on #PowerPivot by @dennylee and Dave Wickert
- from the session "DAX in #PowerPivot for Excel 2010" by Howie Dickerman:
• Data Analysis Expressions = DAX.
• DAX lets users do multidimensional analysis without knowing that it is multidimensional analysis.
• Sample DAX: =[Qty]*[Price] – the syntax is just like Excel.
• DAX uses Excel functions, but has no notion of addressing individual cells or ranges – instead it addresses columns in the data
• DAX is not a replacement for MDX
• DAX provides functions that implement relational database concepts: filter tables, aggregate, follow relationships
• DAX one-to-many function sample: =SUMX(RELATEDTABLE(Sales), Sales[Amount])
• DAX has functions to assist with dynamic aggregations of measures:
• DAX dynamic sample: =VALUES(Time[Year]) & "." & VALUES(Product[ProductID])
• DAX sample: =IF(VALUES(Time[Year])=2008,"Baseline","Normal")
• DAX calc expressions appear in 2 places – calculated columns (fully materialized) and measures (evaluated for each cell dynamically)
• More than 80 Excel functions are in DAX
• CTP3 has a new FORMAT function that converts any number to a string #DAX
• DAX sample: [SalesAmt]/[SalesAmt](ALL(Product)) – using a measure as a function.
• DAX: =Sales[SalesAmt]/CALCULATE(Sales[SalesAmt], ALL(Product)) – the more explicit syntax
• CTP3 DAX has 35 Time Intelligence functions – they require a date column in the data, of column type Date
• This version of DAX does not have custom time periods or weeks. It works just with year, quarter, month, day
• DAX function samples: FIRSTDATE, LASTDATE, FIRSTNONBLANK, STARTOFMONTH, STARTOFQUARTER, ENDOFYEAR
• DAX: DATEADD, DATESBETWEEN, DATESINPERIOD – 26 functions that return a table of dates
• DAX: PREVIOUSDAY, NEXTMONTH, DATESMTD, DATESYTD, TOTALMTD, OPENINGBALANCEQUARTER, CLOSINGBALANCEYEAR, etc. <- more functions
• Year-over-year growth: =Sales[SalesAmt]-Sales[SalesAmt](DATEADD(Time[Date],-1,YEAR), ALL(Time))
• ALL(Time) – you need to add it now in CTP3, not in RTM (it will be under the covers). This is so the first year has data.
• DAX sample: QTD Sales =TOTALQTD(Sales[SalesAmt], Time[Date], ALL(Time))
• DAX year ago: =Sales[SalesAmt](PARALLELPERIOD(Time[Date],-12,MONTH), ALL(Time))
• I missed single quotes in some functions previously. If a table name contains spaces, you need to put single quotes around the name.
• DAX autocomplete in CTP3 adds single quotes, but they are optional.
More Details:
http://www.ssas-info.com/VidasMatelisBlog/195_my-tweets-from-sql-pass-summit-2009-ssas-powerpivot-dax-and-more
http://ms-olap.blogspot.com/2010/05/self-service-bi-powerpivot-and-future.html#comment-form
http://cwebbbi.wordpress.com/2010/11/11/pass-summit-day-2/
~Rav’s

Wednesday, November 3, 2010

My Latest Pics – Smoky Mountains, Tennessee











SSRS Report Deployment Procedure in SharePoint Server

Hi Guys,
Once we develop an SSRS report, we have to deploy it to the SharePoint portal so that power users and executives can access it.
Here is the detailed deployment procedure; please have a look.
Step 1: Deploy the SSRS reports to the report server
1. Open the SSRS Report Manager Production URL
https://servername.com/reports

2. Go to the Home Folder in Report Manager.

3. Go to Home/DataSources folder

4. Click on "New Data Source". Make sure the settings in the "Properties/General" tab match the required settings: Name, Connection Type, Connection String, and report server credentials.

5. At the very bottom of the page, click on “Apply”

6. Go to the Home/Administration folder and, for each of the reports:
o Click the "Upload File" button
o Navigate to the reports location folder
o Select each report RDL file from the table below
o Click "OK"

7. Click on “Properties” > “Data Sources” and then “Browse” button

8. Expand the “Data Sources” folder and then select the Data Source “XXXX”

9. Click “OK” and then “Apply”

All the required reports should now have been successfully migrated to the server (Test/Prod).

Step 2: Changes at the Report Viewer level
In the Report Viewer folder, edit the XXXMenu.xml file.
Under the Administration Reports section, add the new tag.





Step 3: Changes at the SharePoint portal
Copy the report link from the report server and add it to the SharePoint portal source editor HTML.
1. Go to the portal page where you want to show this report, select the drop-down button in that section, choose the 'Modify Shared WebPart' option, and select the Source Editor button in the Content Editor window.
2. Copy the report URL from the report server and place it into the portal Content Editor window. Place the code below inside it:

href="https://servername.com/ReportServer/Default.aspx?%2fReports%2fAdmin+Sales+ Report&rs:Command=Render" target="_blank">Sales Report


That's it. Now the user can access the report from the SharePoint portal.

We have different methods to deploy reports.
See the blogs below for some other deployment techniques.

http://blah.winsmarts.com/2007-12-SharePoint_and_SQL_Server_Reporting_Services_-_Authoring_a_simple_report.aspx

http://bisqlserver.rdacorp.com/2008/08/deploy-ssrs-reports-in-sharepoint.html


~Rav’s


Monday, November 1, 2010

SSAS-Data Security

Hi,
SSAS provides a way to secure Analysis Services database/cube data from unauthorized access. Analysis Services secures access through objects called "roles". After a role is created, a user's Windows login credentials can be used to enroll the user into a particular role, because Analysis Services identifies users by their Windows login credentials. You can protect your data with roles at two levels:
1) Dimension level
2) Cell level
If a user has been assigned more than one role, Analysis Services loops through all assigned roles after login, finds every permission level for that user, and takes the union of all the permission levels.
If two roles grant contradictory access for a user, access is allowed. Suppose role1 grants access to Australia data and role2 denies it; access to Australia data will be allowed.

http://learnmicrosoftbi.blogspot.com/2010/10/ssas-data-security-dimension-and-cell.html

**SSAS-Interview Questions**

Nice post! Thanks, Amit!
http://learnmicrosoftbi.blogspot.com/search/label/Interview%20Questions

SSAS-MDX functions

Hi Guys,
Please look into the below site for more MDX functions.
Regards,
Rav’s
http://www.ssas-info.com/analysis-services-articles/50-mdx/2518-mdx-tutorials-mdx-queries-mdx-samples

SQL SERVER – Merge Operations – Insert, Update, Delete in Single Execution

Hi, MERGE is a new feature that provides an efficient way to perform multiple DML operations. In earlier versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions; now, using the MERGE statement, we can include all of that logic in one statement that updates rows when the source and target match and inserts rows when they do not.
One of the most important advantages of the MERGE statement is that the data is read and processed only once. In earlier versions, three different statements had to be written to handle three different activities (INSERT, UPDATE, or DELETE); with MERGE, all of the update activity is done in one pass over the database table.
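Here is a minimal sketch of the pattern; the table and column names (TargetProducts, SourceProducts, ProductID, Price) are hypothetical, but the statement shape is the standard SQL Server 2008 MERGE syntax.
-- Hedged sketch: table and column names are hypothetical.
MERGE INTO dbo.TargetProducts AS T
USING dbo.SourceProducts AS S
    ON T.ProductID = S.ProductID
WHEN MATCHED AND T.Price <> S.Price THEN        -- data matched but changed: update
    UPDATE SET T.Price = S.Price
WHEN NOT MATCHED BY TARGET THEN                 -- data unmatched: insert
    INSERT (ProductID, Price) VALUES (S.ProductID, S.Price)
WHEN NOT MATCHED BY SOURCE THEN                 -- row disappeared from source: delete
    DELETE;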
See the blog below for more info. Nice article from Pinal Dave.
http://blog.sqlauthority.com/2010/06/08/sql-server-merge-operations-insert-update-delete-in-single-execution/


Wednesday, September 22, 2010

SSAS – Usage of OLAP PivotTable Extensions & PT Power


Hi Guys,
If you want to verify the background MDX query while you are browsing a cube from Excel, I have a very simple solution for you. By using these two add-ins (links below), we can analyze the data and the MDX functionality in an easy manner.
This is the best way to resolve issues with multidimensional data.
Thanks,
Ravindra
Useful Links:
http://olappivottableextend.codeplex.com/
http://www.sqlserverpower.com/UtilityDetail/PTPower.aspx

Monday, September 13, 2010

Capacity Planning for the DW/BI Environment


Hi All-
Capacity planning is a problem for a data warehouse because it sets contrasting functional requirements against each other. On one hand, data warehouse customers consume capacity as they query the data in the data warehouse through business intelligence (BI) reporting. Meanwhile, applications consume capacity as they load data into the warehouse through the extract, transform, and load (ETL) process. These two functions, BI reporting and ETL, grow in volume and frequency as a data warehouse grows. Additionally, database administrator (DBA) tasks, such as backups and table reorganizations, can cause additional bottlenecks because ETL, BI reporting, and DBA tasks contend for the same resources.
On top of that, we can plan the capacity of the DW/BI environment by answering analytical questions such as the following:
How many customers will be in the warehouse?
At what rate will the customers grow?
How many transactions will be in the warehouse?
At what rate will the transactions grow?
What other data will be in the warehouse?
At what rate will the other data grow?
What is the proper level of granularity for data in the warehouse?
Can the level of granularity be changed if needed?
What amount of history is needed in the warehouse?
Will the user decide to add more history than anticipated? And so forth.
See the below links for more info.
http://www.inmoncif.com/view/33
http://www.dmforum.org/portal/library/capacityplanningforthedwInmon.pdf
~Ravindra

Thursday, August 26, 2010

SQL Server: Useful Date Functions

/*
Many useful SQL Server Date functions.
*/
--DATE CONVERSION
SELECT CONVERT (date, CURRENT_TIMESTAMP) ,CONVERT (date, GETDATE()) ,CONVERT (date, GETUTCDATE());
--DATE ADD()
SELECT DATEADD(year,10, getdate());
SELECT DATEADD(month,10, getdate());
SELECT DATEADD(day,10, getdate());
--DATE DIFF()
select DATEDIFF (day,getdate() ,'2010-05-26')
select DATEDIFF (month,getdate() ,'2010-05-26')
--DATE NAME()
SELECT DATENAME(year, '12:10:30.123') ,DATENAME(month, '12:10:30.123') ,DATENAME(day, '12:10:30.123') ,
DATENAME(dayofyear, '12:10:30.123') ,DATENAME(weekday, '12:10:30.123') ,DATENAME(hour, '2007-06-01') ,
DATENAME(minute, '2007-06-01') ,DATENAME(second, '2007-06-01');
--DATE PART()
SELECT DATEPART(year, '12:10:30.123') ,DATEPART(month, '12:10:30.123') ,DATEPART(day, '12:10:30.123')
,DATEPART(dayofyear, '12:10:30.123') ,DATEPART(weekday, '12:10:30.123');
-- ****************************************************************************
select DATEPART(dw, getdate()) AS 'Today';
SELECT CURRENT_TIMESTAMP ,GETDATE() ,GETUTCDATE();
select convert(varchar(12),getdate(),112) AS 'YYYYMMDD'
UNION ALL
select convert(varchar(10),getdate(),120) AS 'YYYY-MM-DD'
UNION ALL
SELECT DATENAME (MM,GETDATE())
-- ****************************************************************************
----Today
SELECT GETDATE() 'Today'
----Yesterday
SELECT DATEADD(d,-1,GETDATE()) 'Yesterday'
----First Day of Current Week
SELECT DATEADD(wk,DATEDIFF(wk,0,GETDATE()),0) 'First Day of Current Week'
----Last Day of Current Week
SELECT DATEADD(wk,DATEDIFF(wk,0,GETDATE()),6) 'Last Day of Current Week'
----First Day of Last Week
SELECT DATEADD(wk,DATEDIFF(wk,7,GETDATE()),0) 'First Day of Last Week'
----Last Day of Last Week
SELECT DATEADD(wk,DATEDIFF(wk,7,GETDATE()),6) 'Last Day of Last Week'
----First Day of Current Month
SELECT DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0) 'First Day of Current Month'
----Last Day of Current Month
SELECT DATEADD(ms,- 3,DATEADD(mm,0,DATEADD(mm,DATEDIFF(mm,0,GETDATE())+1,0))) 'Last Day of Current Month'
----First Day of Last Month
SELECT DATEADD(mm,-1,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) 'First Day of Last Month'
----Last Day of Last Month
SELECT DATEADD(ms,-3,DATEADD(mm,0,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0))) 'Last Day of Last Month'
----First Day of Current Year
SELECT DATEADD(yy,DATEDIFF(yy,0,GETDATE()),0) 'First Day of Current Year'
----Last Day of Current Year
SELECT DATEADD(ms,-3,DATEADD(yy,0,DATEADD(yy,DATEDIFF(yy,0,GETDATE())+1,0))) 'Last Day of Current Year'
----First Day of Last Year
SELECT DATEADD(yy,-1,DATEADD(yy,DATEDIFF(yy,0,GETDATE()),0)) 'First Day of Last Year'
----Last Day of Last Year
SELECT DATEADD(ms,-3,DATEADD(yy,0,DATEADD(yy,DATEDIFF(yy,0,GETDATE()),0))) 'Last Day of Last Year'

Data Warehouse Questions & Answers

What is a Data Warehouse?
Answer 1: A data warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources. Another definition: "A data warehouse is a logical collection of information gathered from many different operational databases, used to create business intelligence that supports business analysis activities and decision-making tasks; primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP)." Generally, data warehousing is not meant for current "live" data, although 'virtual' or 'point-to-point' data warehouses can access operational data. A 'real' data warehouse is generally preferred to a virtual DW because stored data has been validated and is set up to provide reliable results to the common types of queries used in a business.
Answer 2: A data warehouse is a repository of integrated information, available for queries and analysis, with data and information extracted from heterogeneous sources as they are generated. Typical relational databases are designed for on-line transaction processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.

What is an ODS?
ODS means Operational Data Store: a collection of operational data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized into a data warehouse. The ODS may also be used to audit the data warehouse, to ensure that summarized and derived data is calculated properly. The ODS may further become the enterprise-shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.

What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.

What is a lookup table?
A lookup table is one that is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based on the primary key of the target, it updates the table by allowing in only new records, or updated records that meet the lookup condition. A sketch of this pattern follows.
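As a rough T-SQL illustration of the pattern (not tied to any specific ETL tool), assuming hypothetical StagingCustomer and DimCustomer tables:
-- Hedged sketch: StagingCustomer / DimCustomer are hypothetical names.
-- New records: business key not found by the lookup, so insert.
INSERT INTO dbo.DimCustomer (CustomerID, CustomerName)
SELECT s.CustomerID, s.CustomerName
FROM dbo.StagingCustomer AS s
LEFT JOIN dbo.DimCustomer AS d ON d.CustomerID = s.CustomerID
WHERE d.CustomerID IS NULL;

-- Updated records: key found but an attribute changed, so update.
UPDATE d
SET d.CustomerName = s.CustomerName
FROM dbo.DimCustomer AS d
JOIN dbo.StagingCustomer AS s ON s.CustomerID = d.CustomerID
WHERE d.CustomerName <> s.CustomerName;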

Why should you put your data warehouse on a different system than your OLTP system?
Answer 1: An OLTP system is basically "data oriented" (ER model) and not "subject oriented" (dimensional model). That is why we design a separate, subject-oriented OLAP system. Moreover, a complex query fired at an OLTP system causes heavy overhead on the OLTP server, which directly affects the day-to-day business.
Answer 2: The loading of a warehouse will likely consume a lot of machine resources. Additionally, users may create queries or reports that are very resource intensive because of the potentially large amount of data available. Such loads and resource needs will conflict with the needs of the OLTP systems and will negatively impact those production systems.

What are Aggregate tables?
An aggregate table contains a summary of existing warehouse data, grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this we can aggregate the table to the required level and use it. These tables reduce the load on the database server, improve query performance, and return results very fast. A sketch follows.
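For example, a minimal T-SQL sketch (FactSales, DimDate, and AggSalesByMonth are hypothetical names) that rolls a daily-grain fact table up to a month/product aggregate:
-- Hedged sketch: all table names are hypothetical.
SELECT d.CalendarYear,
       d.CalendarMonth,
       f.ProductKey,
       SUM(f.SalesAmount) AS SalesAmount
INTO dbo.AggSalesByMonth                 -- materialize the summary once
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS d ON d.DateKey = f.DateKey
GROUP BY d.CalendarYear, d.CalendarMonth, f.ProductKey;
-- Queries at month grain now read the small aggregate instead of millions of fact rows.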

What is Dimensional Modelling? Why is it important ?
Dimensional modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables: fact tables and dimension tables. A fact table contains the facts/measurements of the business, and the dimension tables contain the context of the measurements, i.e., the dimensions by which the facts are analyzed.

Why is Data Modeling Important?
Data modeling is probably the most labor-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time? A common response from practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints. The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notation and natural language, it can be reviewed and verified as correct by the end users. The data model is also detailed enough to be used by the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long term: without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the users' requirements.

What is data mining?
Data mining is a process of extracting hidden trends from a data warehouse. For example, an insurance data warehouse can be used to mine data for the highest-risk people to insure in a certain geographical area.

What is ETL?
ETL stands for extraction, transformation, and loading. ETL provides developers with an interface for designing source-to-target mappings, transformations, and job control parameters.
· Extraction – take data from an external source and move it to the warehouse pre-processor database.
· Transformation – the transform data task allows point-to-point generating, modifying, and transforming of data.
· Loading – the load data task adds records to a database table in the warehouse.

What does level of Granularity of a fact table signify?
Granularity: the first step in designing a fact table is to determine its granularity – the lowest level of information that will be stored in the fact table. This constitutes two steps: determine which dimensions will be included, and determine where along the hierarchy of each dimension the information will be kept. The determining factors usually go back to the requirements.

What is the Difference between OLTP and OLAP?
The main differences between OLTP and OLAP are:
1. User and system orientation. OLTP: customer-oriented, used for transaction and query processing by clerks, clients, and IT professionals. OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).
2. Data contents. OLTP: manages current data, very detail-oriented. OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, and stores information at different levels of granularity to support the decision-making process.
3. Database design. OLTP: adopts an entity-relationship (ER) model and an application-oriented database design. OLAP: adopts a star, snowflake, or fact constellation model and a subject-oriented database design.
4. View. OLTP: focuses on the current data within an enterprise or department. OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.

What is SCD1 , SCD2 , SCD3?
SCD stands for slowly changing dimensions.
SCD1: maintains only the updated values. Ex: when a customer's address changes, we update the existing record with the new address.
SCD2: maintains both historical and current information by using a) effective dates, b) versions, c) flags, or a combination of these. A sketch of the effective-date approach follows.
SCD3: maintains historical and current information by adding new columns to the target table.
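A minimal T-SQL sketch of the SCD2 effective-date approach, assuming hypothetical DimCustomer and StagingCustomer tables with an Address attribute being tracked:
-- Hedged SCD2 sketch: table and column names are hypothetical.
DECLARE @Today date = CAST(GETDATE() AS date);

-- 1) Expire the current row when the tracked attribute has changed.
UPDATE d
SET d.EndDate = @Today, d.IsCurrent = 0
FROM dbo.DimCustomer AS d
JOIN dbo.StagingCustomer AS s ON s.CustomerID = d.CustomerID
WHERE d.IsCurrent = 1 AND d.Address <> s.Address;

-- 2) Insert a new current row for changed (and brand-new) customers;
--    after step 1, changed customers no longer have a current row.
INSERT INTO dbo.DimCustomer (CustomerID, Address, StartDate, EndDate, IsCurrent)
SELECT s.CustomerID, s.Address, @Today, NULL, 1
FROM dbo.StagingCustomer AS s
LEFT JOIN dbo.DimCustomer AS d
       ON d.CustomerID = s.CustomerID AND d.IsCurrent = 1
WHERE d.CustomerID IS NULL;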

Why are OLTP database designs not generally a good idea for a Data Warehouse?
In OLTP, tables are normalized, so query response for end users is slow; also, an OLTP system does not contain years of data, so history cannot be analyzed.

What is BUS Schema?
A BUS schema is composed of a master suite of conformed dimensions and standardized definitions of facts.
What are the various Reporting tools in the Market?
1. MS Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, PowerPlay)
4. MicroStrategy
5. MS Reporting Services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. ProClarity

What is Normalization, First Normal Form, Second Normal Form , Third Normal Form?
1. Normalization is a process for assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables.
2. Normalization is the analysis of functional dependency between attributes / data items of user views. It reduces a complex user view to a set of small and stable subgroups of fields / relations.
1NF: Repeating groups must be eliminated, dependencies can be identified, all key attributes are defined; no repeating groups in the table.
2NF: The table is already in 1NF and includes no partial dependencies – no attribute is dependent on only a portion of the primary key. It is still possible to exhibit transitive dependency: attributes may be functionally dependent on non-key attributes.
3NF: The table is already in 2NF and contains no transitive dependencies.

What is Fact table?
A fact table contains the measurements, metrics, or facts of a business process. If your business process is "sales", then a measurement of this business process, such as "monthly sales number", is captured in the fact table. The fact table also contains the foreign keys to the dimension tables.

What are conformed dimensions?
Answer 1: Conformed dimensions mean exactly the same thing with every possible fact table to which they are joined. Ex: a Date dimension is connected to all facts, like sales facts, inventory facts, etc.
Answer 2: Conformed dimensions are dimensions which are common across cubes (cubes being the schemas containing fact and dimension tables). Consider Cube-1 containing F1, D1, D2, D3 and Cube-2 containing F2, D1, D2, D4 as the facts and dimensions; here D1 and D2 are the conformed dimensions.

What are the Different methods of loading Dimension tables?
Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints, and the bad data is not indexed.

What is conformed fact?
A conformed fact is a fact (measure) whose definition – name, units, and calculation – is identical across the multiple fact tables and data marts in which it appears, so that it can be compared and combined meaningfully.

What are Data Marts?
Data marts are designed to help managers make strategic decisions about their business. Data marts are subsets of the corporate-wide data that are of value to a specific group of users. There are two types of data marts:
1. Independent data marts – sourced from data captured from OLTP systems, external providers, or data generated locally within a particular department or geographic area.
2. Dependent data marts – sourced directly from enterprise data warehouses.

What is a level of Granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store sales data for each transaction; the level of granularity then determines what detail you are willing to keep for each transactional fact – product sales for each minute, or aggregated up to a coarser level. A small example follows.
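For instance, the same hypothetical transaction-grain FactSales table can be rolled up to a coarser daily grain; choosing the grain up front decides which questions the warehouse can answer directly:
-- Hedged sketch: FactSales is a hypothetical transaction-grain fact table.
-- Rolling transaction-grain rows up to day/product grain:
SELECT DateKey, ProductKey, SUM(SalesAmount) AS DailySales
FROM dbo.FactSales
GROUP BY DateKey, ProductKey;
-- Note: you can always aggregate a fine grain upward, but you can never
-- recover per-transaction detail from a fact table stored at daily grain.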

How are the Dimension tables designed?
Most dimension tables are designed using normalization principles up to 2NF; in some instances they are further normalized to 3NF. Find where the data for this dimension is located. Figure out how to extract this data. Determine how to maintain changes to this dimension (see more on this in the next section).

What are non-additive facts?
Non-additive: non-additive facts are facts that cannot be summed up along any of the dimensions present in the fact table – ratios and percentages are typical examples. A sketch follows.
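A quick T-SQL illustration with hypothetical columns: SalesAmount and Cost are additive, but a margin percentage is not – summing stored percentages gives nonsense, so the ratio must be recomputed from the additive parts at each level of aggregation:
-- Hedged sketch: table and column names are hypothetical.
-- Wrong: SUM(MarginPct). Right: rebuild the ratio from additive components.
SELECT ProductKey,
       (SUM(SalesAmount) - SUM(Cost)) * 100.0
           / NULLIF(SUM(SalesAmount), 0) AS MarginPct
FROM dbo.FactSales
GROUP BY ProductKey;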

What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes. To my knowledge, SQL Server does not support bitmap indexes; only Oracle does.

What is a Snowflake Schema?
In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table.
What is real time data-warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

What are slowly changing dimensions?
SCD stands for slowly changing dimensions. Slowly changing dimensions are of three types. SCD1: maintains only the updated values. Ex: when a customer's address changes, we update the existing record with the new address. SCD2: maintains historical and current information by using a) effective dates, b) versions, c) flags, or a combination of these. SCD3: maintains historical and current information by adding new columns to the target table.

What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Snapshot facts are semi-additive; we use semi-additive facts when we maintain aggregated facts, e.g. average daily balance. A fact table without numeric fact columns is called a factless fact table, e.g. promotion facts: we track the promotion events of a transaction (such as product samples) even though the table doesn't contain any measures. A sketch of the semi-additive case follows.
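A T-SQL sketch of the average-daily-balance case, assuming a hypothetical FactAccountBalance daily snapshot table: balances add up across accounts, but across time they must be averaged (or the closing value taken), never summed:
-- Hedged sketch: FactAccountBalance is a hypothetical daily snapshot table.
SELECT AccountKey,
       AVG(Balance) AS AvgDailyBalance      -- average across the time dimension
FROM dbo.FactAccountBalance
WHERE DateKey BETWEEN 20100901 AND 20100930 -- one month of daily snapshots
GROUP BY AccountKey;                        -- still additive across accounts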

Differences between star and snowflake schemas?
Star schema – all dimensions are linked directly to the fact table. Snowflake schema – dimensions may be interlinked, or may have one-to-many relationships with other dimension tables.

What is a Star Schema?
A star schema is a way of organizing tables so that results can be retrieved from the warehouse database easily and quickly. Usually a star schema consists of one or more dimension tables arranged around a fact table; the layout looks like a star, which is how it got its name.

What is a general purpose scheduling tool?
The basic purpose of a scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time, or based on some condition.

What is ER Diagram?
The entity-relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the entity-relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper, the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is: it maps well to the relational model – the constructs used in the ER model can easily be transformed into relational tables; and it is simple and easy to understand with a minimum of training, so the designer can use it to communicate the design to the end user. In addition, the model can serve as a design plan for the database developer to implement the data model in specific database management software.

Which columns go to the fact table and which columns go the dimension table?
The primary key columns of the dimension tables go to the fact table as foreign keys; the descriptive (textual) attributes belong in the dimension tables, and the numeric measures belong in the fact table. A DDL sketch follows.
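A minimal T-SQL DDL sketch of that key pattern (all names hypothetical): each dimension's surrogate primary key reappears in the fact table as a foreign key:
-- Hedged sketch: table and column names are hypothetical.
CREATE TABLE dbo.DimDate (
    DateKey       int PRIMARY KEY,     -- e.g. 20101101
    FullDate      date NOT NULL,
    CalendarYear  int NOT NULL,
    CalendarMonth int NOT NULL
);
CREATE TABLE dbo.DimProduct (
    ProductKey  int IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    ProductName varchar(100) NOT NULL            -- descriptive attribute
);
CREATE TABLE dbo.FactSales (
    DateKey     int NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  int NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    SalesAmount money NOT NULL                   -- numeric measure
);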

What are modeling tools available in the Market?
There are a number of data modeling tools (tool name – company name):
Erwin – Computer Associates
Embarcadero – Embarcadero Technologies
Rational Rose – IBM Corporation
PowerDesigner – Sybase Corporation
Oracle Designer – Oracle Corporation

Name some of modeling tools available in the Market?
These tools are used for data/dimension modeling:
1. Oracle Designer
2. ERwin (Entity Relationship for Windows)
3. Informatica (cubes/dimensions)
4. Embarcadero
5. PowerDesigner (Sybase)

How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
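A minimal T-SQL sketch of such a loader, assuming a hypothetical DimDate table:
-- Hedged sketch: loops one row per day over roughly 100 years into a hypothetical DimDate.
DECLARE @d date = '1990-01-01';
WHILE @d <= '2089-12-31'
BEGIN
    INSERT INTO dbo.DimDate (DateKey, FullDate, CalendarYear, CalendarMonth)
    VALUES (CONVERT(int, CONVERT(varchar(8), @d, 112)),  -- 19900101-style key
            @d, YEAR(@d), MONTH(@d));
    SET @d = DATEADD(day, 1, @d);
END;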
Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs on?
Transaction logs write sequentially and don't need to be read at all. The ideal is to have them on RAID 1/0 because it has much better write performance than RAID 5. RAID 1 is also good for TX logs and costs less than 1/0 to implement, with a tad less reliability and generally slightly worse performance. RAID 5 is generally best for data because of its cost and the fact that it provides great read capability.

What are the vaious ETL tools in the Market?
Various ETL tools used in the market are:
1. Informatica
2. DataStage
3. MS SQL DTS (Integration Services 2005)
4. Ab Initio
5. SQL*Loader
6. Sunopsis
7. Oracle Warehouse Builder
8. Data Junction

What is VLDB?
Answer 1: VLDB stands for Very Large DataBase. It is an environment or storage space, managed by a relational database management system (RDBMS), consisting of vast quantities of information.
Answer 2: VLDB doesn't refer to the size of the database or the vast amount of information stored. It refers to the window of opportunity to back up the database, i.e., the time interval available; if the DBA is unable to take a backup in the specified time, the database is considered a VLDB.

What are Data Marts ?
A data mart is a focused subset of a data warehouse that deals with a single area of data (such as a single department) and is organized for quick analysis.

What are the steps to build the datawarehouse ?
1. Gathering business requirements
2. Identifying sources
3. Identifying facts
4. Defining dimensions
5. Defining attributes
6. Redefining dimensions & attributes
7. Organizing the attribute hierarchy & defining relationships
8. Assigning unique identifiers
9. Additional conventions: cardinality / adding ratios

What is Difference between E-R Modeling and Dimentional Modeling.?
The basic difference is that E-R modeling has both a logical and a physical model, while the dimensional model has only a physical model. E-R modeling is used to normalize the OLTP database design; dimensional modeling is used to de-normalize the ROLAP/MOLAP design.

Why fact table is in normal form?
Basically the fact table consists of the index keys of the dimension/lookup tables and the measures. Whenever a table holds only such keys and measures, that itself implies that the table is in normal form.

What are the advantages data mining over traditional approaches?
Data mining is used for estimating the future. For example, in a company or business organization, using data mining we can predict the future of the business in terms of revenue, employees, customers, orders, etc. Traditional approaches use simple algorithms for estimating the future, and they do not give accurate results when compared to data mining.

What is a CUBE in datawarehousing concept?
Cubes are a logical representation of multidimensional data. The edges of the cube contain dimension members, and the body of the cube contains data values.

What are data validation strategies for data mart validation after the loading process?
Data validation is making sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet those validation requirements.

What is the datatype of the surrogate key?
The datatype of a surrogate key is an integer (or numeric/number). A sketch follows.
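In SQL Server the usual choice is an int generated with IDENTITY; a minimal sketch (table name hypothetical):
-- Hedged sketch: DimEmployee is a hypothetical table.
CREATE TABLE dbo.DimEmployee (
    EmployeeKey int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    EmployeeID  varchar(20) NOT NULL                     -- business (natural) key
);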

What is degenerate dimension table?
Degenerate dimensions: if a table contains values which are neither dimensions nor measures, they are called degenerate dimensions. Ex: invoice ID, employee number.

What are the methodologies of Data Warehousing.?
Every company has a methodology of its own, but to name a few, the SDLC and AIM methodologies are commonly used. Other methodologies include AMM, World Class Methodology, and many more.

What is a linked cube?
A linked cube is one in which a subset of the data can be analyzed in great detail. The linking ensures that the data in the cubes remains consistent.
What is the main difference between the Inmon and Kimball philosophies of data warehousing?
The two differ in how the data warehouse should be built. Kimball views data warehousing as a constituency of data marts: data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts, so a unified view of the enterprise can be obtained from dimensional modeling at the local, departmental level. Inmon believes in creating a data warehouse on a subject-by-subject-area basis; the development of the data warehouse can start with data from the online store, and other subject areas can be added as the need arises, e.g., point-of-sale (POS) data can be added later if management decides it is necessary. In short: Kimball builds the data marts first and combines them into a data warehouse; Inmon builds the data warehouse first and derives the data marts later.

What is a data warehousing hierarchy?

Hierarchies: Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure. Within a hierarchy, each level is logically connected to the levels above and below it; data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension there might be two hierarchies: one for product categories and one for product suppliers. Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity; this is one of the key benefits of a data warehouse. When designing hierarchies, you must consider the relationships in business structures, for example a divisional multilevel sales organization. Hierarchies impose a family structure on dimension values: for a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.

Levels: A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest (most general) level. The levels in a dimension are organized into one or more hierarchies.

Level relationships: Level relationships specify the top-to-bottom ordering of levels, from the most general (the root) to the most specific information. They define the parent-child relationships between the levels in a hierarchy. Hierarchies are also essential components in enabling more complex query rewrites. For example, the database can aggregate existing sales revenue at the quarterly level up to a yearly aggregation when the dimensional dependencies between quarter and year are known.
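As an illustrative sketch (hypothetical names), a time-dimension hierarchy is often carried as level columns on the dimension table, and queries roll facts up the hierarchy with GROUP BY:

    CREATE TABLE dim_date (
        date_sk    INT PRIMARY KEY,
        cal_date   DATE,
        month_no   INT,   -- lowest level in this hierarchy
        quarter_no INT,   -- parent of month
        year_no    INT    -- parent of quarter (the root)
    );

    -- Drill up from month-level facts to the year level
    SELECT d.year_no, SUM(f.sales_amt) AS yearly_sales
    FROM fact_sales f
    JOIN dim_date d ON d.date_sk = f.date_sk
    GROUP BY d.year_no;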

What is the main difference between a schema in an RDBMS and schemas in a data warehouse?
RDBMS schema:
* Used for OLTP systems
* Traditional, older style of schema
* Normalized
* Difficult to understand and navigate
* Extraction and complex problems cannot easily be solved
* Poorly modelled for analysis
DWH schema:
* Used for OLAP systems
* Newer generation of schema
* De-normalized
* Easy to understand and navigate
* Extraction and complex problems can be easily solved
* Very good model for analysis
What is a hybrid slowly changing dimension?
Hybrid SCDs are a combination of SCD Type 1 and SCD Type 2. It may happen that in a table some columns are important and we need to track changes to them, i.e., capture their historical data, whereas for other columns we don't care even if the data changes. For such tables we implement hybrid SCDs, wherein some columns are Type 1 and some are Type 2.

What are the different architectures of a data warehouse?
There are two main approaches: 1. top-down (Bill Inmon); 2. bottom-up (Ralph Kimball).

1. What is incremental loading? 2. What is batch processing? 3. What is a cross-reference table? 4. What is an aggregate fact table?
Incremental loading means loading only the ongoing changes from the OLTP system rather than reloading everything. Batch processing means processing the data in scheduled groups (batches) rather than one transaction at a time. A cross-reference table maps equivalent keys or codes between different systems. An aggregate fact table contains the measure values aggregated (grouped/summed) up to some level of a hierarchy.
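A minimal T-SQL-style sketch of an incremental load, assuming a hypothetical etl_control table that records when each job last ran and a modified_at timestamp on the source table:

    DECLARE @last_load_time DATETIME;
    SELECT @last_load_time = last_load_time
    FROM etl_control WHERE job_name = 'orders';

    -- Pull only the rows changed since the previous run
    INSERT INTO stg_orders (order_id, order_date, amount)
    SELECT o.order_id, o.order_date, o.amount
    FROM oltp_orders o
    WHERE o.modified_at > @last_load_time;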

What is a junk dimension? What is the difference between a junk dimension and a degenerate dimension?
Junk dimension: grouping random flags and text attributes from a dimension and moving them into a separate sub-dimension. Degenerate dimension: keeping control information on the fact table itself. Example: consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table; in this case the dimension table is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins while retrieving order information.
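A hedged sketch (hypothetical names) of a junk dimension: the miscellaneous flags are moved into one small table, and the fact table carries a single key to it:

    -- One row per observed combination of low-cardinality flags
    CREATE TABLE dim_order_junk (
        junk_sk       INT PRIMARY KEY,
        is_gift       CHAR(1),      -- 'Y' or 'N'
        payment_type  VARCHAR(10),  -- e.g. 'CASH', 'CARD'
        ship_priority VARCHAR(10)   -- e.g. 'STD', 'RUSH'
    );
    -- The fact table then stores one junk_sk instead of three separate flag columns.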

What are the possible data marts in retail sales?
Product information and sales information.

What is the definition of normalized and denormalized views, and what are the differences between them?
Normalization is the process of removing redundancies; denormalization is the process of deliberately allowing redundancies, typically to simplify queries and speed up reads.

Data Warehousing Basics Interview Questions and Answers

What is meant by metadata in the context of a data warehouse, and why is it important?
Metadata is data about data. A business analyst or data modeler usually captures information about the data in a data dictionary (a.k.a. metadata): the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.), and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also kept at the data mart level, for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis and decision support.

What are the differences between star and snowflake schemas?
Star schema: a single fact table with N dimension tables joined directly to it. Snowflake schema: any dimension with extended (further normalized) dimension tables is known as a snowflake schema.

What is the difference between snowflake and star schemas? In what situations is a snowflake schema better than a star schema, and when is the opposite true?
Star schema: contains the dimension tables mapped around one or more fact tables. It is a denormalized model, so there is no need for complicated joins and queries return results quickly. Snowflake schema: the normalized form of the star schema. It requires deeper joins because the tables are split into many pieces. Modifications can easily be made directly in the tables, but complicated joins are needed since there are more tables, so there is some delay in processing queries.
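To make the contrast concrete, a minimal sketch (hypothetical names) of the same product dimension in star form (category denormalized in) and snowflake form (category split out):

    -- Star: category is simply a column on the product dimension
    CREATE TABLE dim_product_star (
        product_sk    INT PRIMARY KEY,
        product_name  VARCHAR(100),
        category_name VARCHAR(50)
    );

    -- Snowflake: category normalized into its own table, costing one extra join
    CREATE TABLE dim_category (
        category_sk   INT PRIMARY KEY,
        category_name VARCHAR(50)
    );
    CREATE TABLE dim_product_snow (
        product_sk   INT PRIMARY KEY,
        product_name VARCHAR(100),
        category_sk  INT REFERENCES dim_category (category_sk)
    );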

What is VLDB?
The perception of what constitutes a VLDB continues to grow. A one terabyte database would normally be considered to be a VLDB.

What are the data types present in BO, and what happens if we implement a view in the designer and reports?
There are three different data types: dimension, measure, and detail. A view is nothing but an alias, and it can be used to resolve loops in the universe.

Can a dimension table contain numeric values?
Yes, but such columns are typically stored with a character datatype; the values themselves may be numeric or character, and they serve as descriptive attributes rather than measures.

What is the difference between a view and a materialized view?
View: stores the SQL statement in the database and lets you use it as a table; every time you access the view, the SQL statement executes. Materialized view: stores the results of the SQL in table form in the database; the SQL statement executes only once, and after that, every time you run the query the stored result set is used. The main advantage is fast query results.
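A short sketch of the difference (hypothetical names; the materialized-view syntax shown is Oracle's, since SQL Server expresses the same idea through indexed views):

    -- Ordinary view: the SELECT below runs every time the view is queried
    CREATE VIEW v_sales_by_region AS
    SELECT s.region, SUM(f.sales_amt) AS total_sales
    FROM fact_sales f JOIN dim_store s ON s.store_sk = f.store_sk
    GROUP BY s.region;

    -- Materialized view (Oracle syntax): the result set is stored and reused
    CREATE MATERIALIZED VIEW mv_sales_by_region AS
    SELECT s.region, SUM(f.sales_amt) AS total_sales
    FROM fact_sales f JOIN dim_store s ON s.store_sk = f.store_sk
    GROUP BY s.region;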
What is a surrogate key? Where do we use it? Explain with examples.
A surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys; it can be generated with an Informatica sequence generator, an Oracle sequence, or SQL Server identity values. It is useful because the natural primary key (e.g., Customer Number in the Customer table) can change, which makes updates more difficult. Some tables have columns such as AIRPORT_NAME or CITY_NAME that are stated as the primary keys (according to the business users), but not only can these change, indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you might display only the AIRPORT_NAME.

Another benefit you get from surrogate keys (SIDs) is tracking slowly changing dimensions (SCDs). A simple, classic example: on 1 January 2002, employee 'E1' belongs to business unit 'BU1' (that is what would be in your employee dimension), and this employee has turnover allocated to him on business unit 'BU1'. On 2 June, employee 'E1' is moved from business unit 'BU1' to business unit 'BU2'. All the new turnover has to belong to the new business unit 'BU2', but the old turnover should still belong to business unit 'BU1'. If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to business unit 'BU2', even what actually belongs to 'BU1'. If you use surrogate keys, you can create, on 2 June, a new record for employee 'E1' in your employee dimension with a new surrogate key. This way, in your fact table, the old data (before 2 June) carries the SID of employee 'E1' + 'BU1', and all new data (after 2 June) takes the SID of employee 'E1' + 'BU2'. You could consider a slowly changing dimension as an enlargement of your natural key: the natural key of the employee was employee code 'E1', but it effectively becomes employee code plus business unit, i.e., 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from a natural-key enlargement, though, is that you might not have every part of the new key within your fact table, so you might not be able to join on the enlarged key; that is why you need a separate identifier.
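The business-unit example above corresponds to an SCD Type 2 change. A minimal sketch (hypothetical table and column names) of what the ETL would do on 2 June:

    -- Close out the current row for employee E1 in BU1
    UPDATE dim_employee
    SET end_date = '2002-06-02', is_current = 'N'
    WHERE employee_code = 'E1' AND is_current = 'Y';

    -- Insert a new row with a new surrogate key (SID) pointing at BU2
    INSERT INTO dim_employee
        (employee_sk, employee_code, business_unit, start_date, end_date, is_current)
    VALUES
        (1002, 'E1', 'BU2', '2002-06-02', NULL, 'Y');

From 2 June onward, new fact rows reference surrogate key 1002, while older fact rows keep the earlier key, so turnover stays correctly split between BU1 and BU2.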

What is an ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the entity-relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper, the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is twofold: it maps well to the relational model, since the constructs used in the ER model can easily be transformed into relational tables; and it is simple and easy to understand with a minimum of training, so the designer can use it to communicate the design to the end user. In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.

What are aggregate tables and aggregate fact tables? Any examples of both?
An aggregate table contains summarized data; materialized views are one way of implementing aggregate tables. For example, suppose sales are stored only at the daily-transaction grain. If we want to create a report like sales by product per year, we aggregate the date values into tables such as week_agg, month_agg, quarter_agg, and year_agg, and (in Business Objects) the @Aggregate_Aware function is used to retrieve data from the appropriate aggregate table.
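For illustration, a T-SQL-style sketch (hypothetical names) that builds a monthly aggregate from a daily-grain fact table:

    -- SELECT ... INTO creates and fills the aggregate table in one step
    SELECT d.year_no, d.month_no, f.product_sk,
           SUM(f.sales_amt) AS sales_amt
    INTO   sales_month_agg
    FROM   fact_sales f
    JOIN   dim_date d ON d.date_sk = f.date_sk
    GROUP BY d.year_no, d.month_no, f.product_sk;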

What is active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently, and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of customer touches, encourages customer loyalty, and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.

Why do we override the execute method in Struts? Please give the details.
As part of the Struts framework we develop Action and ActionForm classes (here, an Action servlet means a class that extends the Action class, and an ActionForm means a class that extends the ActionForm class), along with other servlet classes. In the ActionForm class we can implement the validate() method, which returns an ActionErrors object; the validation code is written in this method. If validate() returns null or an ActionErrors object with size 0, the web container calls execute() on the Action class. If it returns a size greater than 0, execute() is not called; instead, the JSP, servlet, or HTML file named in the input attribute of the action mapping in struts-config.xml is executed.

What is the difference between Data Warehousing and Business Intelligence?
Data warehousing deals with all aspects of managing the development, implementation, and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term "business intelligence" is used to encompass OLAP, data visualization, data mining, and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business, including the back office: the business needs the back office in order to function, but the back office without a business to support makes no sense.
What is the difference between OLAP and a data warehouse?
A data warehouse is the place where the data is stored for analysis, whereas OLAP is the process of analyzing that data: managing aggregations and partitioning information into cubes for in-depth visualization.

What is a factless fact table? Where have you used it in your project?
A factless fact table is one in which only the dimension keys are available; there are no measures.

Why is denormalization promoted in universe designing?
In a relational data model, for normalization purposes, some lookup tables are kept separate rather than merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table, called a dimension table, for performance and for slicing data. Because the tables are merged into one large dimension table, complex intermediate joins are avoided and dimension tables are joined directly to fact tables. Although data redundancy occurs in the dimension table, its size is only about 15% of the fact table's. That is why denormalization is promoted in universe designing.

What is the difference between ODS and OLTP?
ODS: a collection of tables, created in the data warehouse environment, that maintains only current data. OLTP, by contrast, maintains the data for transactions; OLTP systems are designed for recording the daily operations and transactions of a business.

What is the difference between a data warehouse and BI?
Simply speaking, BI is the capability of analyzing the data of a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse and arrives at business decisions based on the results of the analysis.

Are OLAP databases called decision support systems? True or false?
True.

Explain in detail SCD type 1, type 2, and type 3.
Type 1: only the most recent value is kept (the old value is overwritten). Type 2 (full history): implemented via (i) version numbers, (ii) flags, or (iii) dates. Type 3: the current value and one previous value are kept.
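A hedged sketch (hypothetical names) of Types 1 and 3; a Type 2 example appears in the surrogate-key answer above:

    -- Type 1: overwrite in place, no history kept
    UPDATE dim_customer
    SET city = 'Pune'
    WHERE customer_code = 'C100';

    -- Type 3: keep the current value and one previous value in a second column
    UPDATE dim_customer
    SET prev_city = city, city = 'Pune'
    WHERE customer_code = 'C100';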

What is a snapshot?
You can disconnect a report from the catalog to which it is attached by saving the report with a snapshot of its data. However, you must reconnect to the catalog if you want to refresh the data.

What are non-additive facts, in detail?
A fact may be a measure, a metric, or a dollar value. Measures and metrics are often non-additive facts, while a dollar value is an additive fact: if we want to find the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with the total. For a non-additive fact, for example a measure such as height for 'citizens by geographical location', when we roll up 'city' data to the 'state' level we should not add the citizens' heights; rather, we might use the fact to derive a count or an average.
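A sketch of the distinction in SQL (hypothetical names): the additive fact sums cleanly up the geography hierarchy, while the non-additive one must be counted or averaged:

    -- Additive: summing dollar amounts from city to state is meaningful
    SELECT g.state, SUM(f.sales_amt) AS total_sales
    FROM fact_sales f JOIN dim_geo g ON g.geo_sk = f.geo_sk
    GROUP BY g.state;

    -- Non-additive: summing heights is meaningless; average or count instead
    SELECT g.state, AVG(c.height_cm) AS avg_height, COUNT(*) AS citizens
    FROM fact_citizen c JOIN dim_geo g ON g.geo_sk = c.geo_sk
    GROUP BY g.state;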


What is a correlated sub-query? How can these queries be useful?
The more seasoned developer will be able to accurately describe this type of query. A correlated sub-query is a special type of query containing a sub-query, where the sub-query requests values from the outer query, creating a situation similar to a loop.
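The classic illustration (hypothetical employees table) finds employees earning more than the average of their own department; note how the inner query references e.dept_id from the outer query:

    SELECT e.emp_name, e.salary
    FROM employees e
    WHERE e.salary > (SELECT AVG(e2.salary)   -- re-evaluated per outer row,
                      FROM employees e2       -- conceptually like a loop
                      WHERE e2.dept_id = e.dept_id);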