Not so fast, NoSQL — SQL still reigns

Glad to see MSSQL is still the most popular.

I think that for most projects.. SQL Server Express is the best choice.. MySQL just can’t compete with the indexing options available with free MSSQL.


When it comes to which database technologies are currently in use at the participating companies, the results featured established commercial products like Microsoft SQL Server (57 percent) and Oracle (38 percent), as well as open source options like MySQL database (40 percent) and PostgreSQL (13 percent).

http://m.infoworld.com/t/dbms/not-so-fast-nosql-sql-still-reigns-244507

Small Business Email Provider – need recommendations

So I have decided to change my domain.. My email address. I am trying to increase the value of my brand.

I think I have decided on TWO URLs. I will share them soon.

One half of my career.. Is focused on databases, mostly SQL Server, MySQL and Hadoop.

The other half of my career.. Has primarily been a hobby of mine.  I love WordPress. I am just fascinated by the productivity gains that are realized by standardizing on the world’s most popular platform.

I will be sharing these domain names.. And moving my blogs soon.  I am excited to peer into the future and plan for this.

The ONE major sticking point in my plan. EMAIL.

I just can’t stand paying for email service.. But I REALLY need to get some productivity solution that includes a calendar.  Any ideas of where to turn? I need a custom domain name. And calendar. And FREE.

MOST importantly. I need to support multiple accounts.  I want to segregate my sales leads from existing client communication. I want to be able to close one inbox.. And get 8 hours of work done. Without hearing from people about new opportunities.

Currently, I have something like a hundred thousand emails in Gmail.  This makes it hard to find things.

If you have any advice on where to look for custom domains.. Hosted email, multiple accounts.. And zero cost.. Currently the best option I can see.. Is GoDaddy Virtual Private Servers. They have not one.. But THREE webmail providers available automatically with each of my websites. I didn’t have to lift a finger.  I am just apprehensive about how this will work.. Once I start doing more broadcast emails..

I think that Broadcast Emails are a critical option for MOST companies.. But most people are too scared to do it. I love getting email reminders.. I love getting automatic email reminders. And scheduled emails from clients and systems.

I just can’t BELIEVE that Google and Outlook / Microsoft don’t have lower-priced email packages for businesses. I think it is called collusion.. When two giants copy each other’s actions… And stifle innovation.

What other options do I have? Yahoo?

ROFL.  As if.

Google Voice – Multiple Numbers – anyone use this ?

Hey does anyone I know use Google Voice in a business environment.. In order to screen calls? If so, I would love to talk (Seattle area code) 934 9333.

I think I want two Google voice numbers.. One for personal and one for business.. And be able to screen calls from new clients.. So that I don’t get interrupted during the workday.

It seems like such a nice feature. But from this article.. It sounds like there are MAJOR problems in separating traffic between the two numbers.

I guess I need to have a bunch of different Google Accounts.. Gmail accounts. It’s just such a shame that Google had to kill their free products.. In order to CON people into using their Office Software.   Seems completely worthy of antitrust investigations.

They got people hooked onto Gmail (with a custom domain) and then bundled that with Google Documents.

I have always loved MS Office.. And it’s a major shame that Google AND Microsoft killed their free offerings… So that they could charge people fifteen bucks a month for email service.

It blows my mind. That nobody can offer decent email with custom domains.. For a dollar per user per month.

I honestly. Sincerely. Am going to start relying on my GoDaddy Virtual Private Server to host my own email services.  Seems ridiculous to me. But it is the only path forward.

http://www.androidpolice.com/2011/06/10/google-voice-now-lets-you-have-multiple-numbers-on-one-account-for-a-fee/

VersionPress is VERY important. And NOT getting enough funding

If anyone wants to contribute to a VERY good open source project.. I really think that VersionPress would be a great candidate.  I love WordPress. It makes web development fun again.. Every day I find a plugin that seems like a FANTASTIC feature.

The one weak point in most WordPress projects has got to be version control.  There is nothing more frustrating than a poorly written plugin or theme.

I really think that WordPress NEEDS automatic version control.. Sure would love to have this feature.

I hope that they can come up with the funding for this project in time.

http://wptavern.com/versionpress-hopes-to-bring-version-control-to-the-masses?utm_source=rss&utm_medium=rss&utm_campaign=versionpress-hopes-to-bring-version-control-to-the-masses

Why doesn't Facebook allow me to save DRAFTS before I post them?

I write a lot. Sorry. It happens.   I am frequently more verbose than necessary.

I just for the life of me.. Can’t believe that FACEBOOK doesn’t have the decency to allow me to save a DRAFT..  When I am halfway done with a thought.. I *SHOULD* be able to just ‘SAVE AS DRAFT’.

Come on Facebook.. what year is this?

Is it REALLY too complex to offer drafts?

SAME THING FOR YOU MR LINKEDIN!

 

Microsoft Paying Bloggers To Write About Internet Explorer | Uncrunched

I think it is so sad that people give Microsoft such a hard time.

I honestly love Internet Explorer. I don’t see anyone building easy Chrome automation tools using ActiveX controls… The ability to drive IE via Visual Basic means I have always cared less about other browsers.

I just can’t fathom that a company the size of Microsoft has a short man’s complex…

I think that Internet Explorer is easily the best browser.

With Chrome and Firefox.. I am always having issues.. For example on one machine, I install a plugin.. That LOOKS credible.. But the next thing I know my browser’s homepage is set to Conduit.  Then these other browsers have the audacity to push that setting to all of my other profiles.. So one stupid install of a browser plugin for Chrome… And the next thing I know I have a dozen infected machines.

I only ever have this problem with Chrome and Firefox.

Long live Microsoft and long live Internet Explorer.

PS – bring back VB6!!

http://uncrunched.com/2014/06/17/microsoft-paying-bloggers-to-write-about-internet-explorer/

Strategies for Managing Spreadmarts

Business users are empowered by knowledge—and knowledge comes, in part, from having access to accurate and timely information. It is generally up to the information technology (IT) department to supply this information. But it doesn’t always work out that way.

Definition of a Spreadmart. TDWI used the following definition of a spreadmart in the survey it conducted as part of this report:

  • A spreadmart is a reporting or analysis system running on a desktop database (e.g., spreadsheet, Access database, or dashboard) that is created and maintained by an individual or group that performs all the tasks normally done by a data mart or data warehouse, such as extracting, transforming, and formatting data as well as defining metrics, submitting queries, and formatting and publishing reports to others. Also known as data shadow systems, human data warehouses, or IT shadow systems.

In organizations all over the world, business people bypass their IT groups to get data from spreadmarts. Spreadmarts are data shadow systems in which individuals collect and massage data on an ongoing basis to support their information requirements or those of their immediate workgroup. These shadow systems, which are usually built on spreadsheets, exist outside of approved, IT-managed corporate data repositories, such as data warehouses, data marts, or ERP systems, and contain data and logic that often conflict with corporate data. Once created, these systems spread throughout an organization like pernicious vines, strangling any chance for information consistency and reliability. You’ll find them in all industries, supporting all business functions. According to TDWI Research, more than 90% of all organizations have spreadmarts. (See Figure 1.)


Figure 1. Does your group have any spreadmarts?

Spreadmarts often lead to the phenomenon of dueling spreadsheets. Murray Trim, a management accountant with Foodstuffs South Island Limited, described one such situation: “We have had the classic situation of two people presenting ostensibly the same data at a board meeting with different figures, which they got from different spreadmarts.” Donna Welch, a BI consultant at financial holding company BB&T, talks about the issues of trust that arise from dueling spreadsheets: “We constantly hear our users talk about management’s distrust of their reports because multiple people came up with different answers.”

Who and Why. Spreadmarts are usually created by business analysts and power users who have been tasked to create custom reports, analyses, plans, benchmarks, budgets, or forecasts. Often, these analysts—especially those in the finance department and the financial services industry—have become proficient with Microsoft Excel or Microsoft Access and prefer to use those tools to create reports and analyses. As a result, most are reluctant to adopt a new corporate reporting “standard,” which they believe will limit their effectiveness. Change comes hard, especially when it means learning a new toolset and adapting to new definitions for key entities, calculations, or metrics. Executives perpetuate the problem because they don’t want to pay hundreds of thousands of dollars or more to build a robust data infrastructure and deploy enterprise reporting and analysis tools. Instead, spreadmarts proliferate.

Dangers of Spreadmarts

Inconsistent Views. The problem with spreadmarts is that their creators use different data sources, calculations, calendars, data conversions, naming conventions, and filters to generate reports and analyses based on their view of the business. The marketing department views customers and sales one way, while the finance department views them another way. The way the business operates in Germany is different from the way it operates in Brazil. Business units sell the same products with different names, packaging, pricing, and partner channels. When each group manages its own data and processes, it’s nearly impossible to deliver a consistent, enterprise view of customers, products, sales, profits, and so on. These parochial silos of data undermine cross-departmental and business unit synergies and economies of scale.

Excessive Time. In addition, business analysts spend two days a week—or almost half their time—creating spreadmarts, costing organizations $780,000 a year! Instead of analyzing data, these high-priced employees act like surrogate information systems professionals, gathering, massaging, and integrating data. Many executives have initiated BI projects simply to offload these time-consuming data management tasks from analysts.

Increased Risk. In addition, spreadmarts are precarious information systems. Because they are created by business users, not information management professionals, they often lack systems rigor. The problems are numerous:

  • Users often enter data into spreadmarts by hand, which leads to errors that often go undetected.
  • Few spreadmarts scale beyond a small workgroup.
  • Users may create poorly constructed queries, resulting in incorrect data.
  • Spreadmarts may generate system and data errors when they are linked to upstream systems or files that change without notice.
  • Users embed logic in complex macros and hidden worksheets that few people understand but nevertheless copy when creating new applications, potentially leading to unreliable data.
  • There is no audit trail that tracks who changed what data or when to ensure adequate control and compliance.

In short, spreadmarts expose organizations to significant risk. Business people may make decisions based on faulty data, establish plans using assumptions based on incorrect analyses, and increase the possibility of fraud and theft of key corporate data assets.

Not All Bad?

No Alternative. Despite these problems, there is often no acceptable alternative to spreadmarts. For example, the data that people need to do their jobs might not exist in a data warehouse or data mart, so individuals need to source, enter, and combine the data themselves to get the information. The organization’s BI tools may not support the types of complex analysis, forecasting, or modeling that business analysts need to perform, or they may not display data in the format that executives desire. Some organizations may not have an IT staff or a data management infrastructure, which leaves users to fend entirely for themselves with whatever tools are available.

As such, spreadmarts often fill a business requirement for information that IT cannot support in a timely, cost-effective manner. Spreadmarts give business people a short-term fix for information that they need to close a deal, develop a new plan, monitor a key process, manage a budget, fulfill a customer requirement, and so on. Ultimately, spreadmarts are a palpable instantiation of a business requirement. IT needs to embrace what the business is communicating in practice, if not in words, and take the appropriate action. Thus, spreadmarts should not be an entirely pejorative term.

Cheap, Quick, Easy. Moreover, since spreadmarts are based on readily available desktop tools, they are cheap and quick to build. Within a day or two, a savvy business analyst can prototype, if not complete, an application that is 100% tailored to the task at hand. Although the spreadmart may not be pretty or “permitted,” it does the job. And it may be better than the alternative—waiting weeks or months for IT to develop an application that often doesn’t quite meet the need and that costs more than executives or managers want to pay.

Nevertheless, there is a high price to pay for these benefits in the long term. Many executives have recognized the dangers of spreadmarts and made significant investments to fix this problem. However, not all have succeeded. In fact, most struggle to deliver a robust data delivery environment that weans users and groups off spreadmarts and delivers a single version of truth.

Remedies

Managed BI Environment. The problem with spreadmarts is not the technology used to create them. Spreadsheets and other desktop-oriented tools are an important part of any organization’s technology portfolio. The problem arises when individuals use these tools as data management systems to collect, transform, and house corporate data for decision making, planning and process integration, and monitoring. When this happens, spreadmarts proliferate, undermining data consistency and heightening risk.

The technical remedy for spreadmarts is to manage and store data and logic centrally in a uniform, consistent fashion and then let individuals access this data using their tools of choice. In other words, the presentation layer should be separated from the logic and data. When this is done, business users can still access and manipulate data for reporting and analysis purposes, but they do not create new data or logic for enterprise consumption. At TDWI, we call this a managed business intelligence environment. The goal is to transform spreadmarts into managed spreadsheets. This lets IT do what it does best—collect, integrate, and validate data and rules—and lets business analysts do what they do best—analyze data, identify trends, create plans, and recommend decisions.

BI vendors are starting to offer more robust integration between their platforms and Microsoft Office tools. Today, the best integration occurs between Excel and OLAP databases, where users get all the benefits of Excel without compromising data integrity or consistency, since data and logic are stored centrally. But more needs to be done.

Change Management. Applying the right mix of technology to address the spreadmart problem is the easy part. The hard part is changing habits, perceptions, behaviors, processes, and systems. People don’t change on their own, especially when they’ve been successful with a certain set of tools and processes for analyzing data and making decisions. Changing a spreadmart-dependent culture usually requires top executives to both communicate the importance of having unified, consistent, enterprise data, and to apply incentives and penalties to drive the right behaviors. Ultimately, change takes time, sometimes a generation or two, but the right organizational levers can speed up the process.

Aligning Business and IT. Another dynamic driving spreadmarts is the lack of communication and trust between business and IT. The business doesn’t adhere to the architectural standards and processes designed to support its long-term interests, while IT doesn’t move fast enough to meet business needs. To reverse this dynamic, both business and IT must recognize each other’s strengths and weaknesses and learn to work together for the common good. IT must learn to develop agile information systems that adapt quickly to changing business conditions and requirements. The business must recognize the importance of building sustainable, scalable solutions. IT must learn about the business and speak its language, while the business must not blame IT for failures when it continually underfunds, overrides, and hamstrings IT so that it cannot possibly serve business needs.

Recognizing that you have a spreadmart problem is the first step. Most of the people we surveyed know their organizations have spreadmarts, but they don’t know what to do about them.

The survey presented respondents with nine different approaches to addressing the spreadmart issue. (See Table 1.)


Table 1. What strategies have you employed to remedy the problems caused by spreadmarts, and how effective were they? (Respondents could select more than one response.)


Ironically, the most common approach that organizations use is simply to leave the spreadmarts alone. But as with everything else in life, ignoring a problem does not make it go away, and often makes it worse. When asked how effective this approach was, a majority (58%) said “not very effective.”

Replace with BI Tools. The next most popular approach is to “provide a more robust BI/DW solution,” employed by almost two-thirds of respondents (63%). This approach was considered “very effective” by 24% of respondents. BI software has progressed from best-in-class niche products to BI platforms that provide integrated reporting, analysis, visualization, and dashboarding capabilities within a single, integrated architecture. In addition, many BI vendors now offer planning, budgeting, and consolidation applications to supplement their BI offerings.

We recommend caution with these BI replacement approaches. First, don’t assume that business users will find the BI tools easy to use. Second, don’t assume that business users will see the benefit of these systems if their spreadmarts are answering their business questions today. Get business users (not just power users) involved in the selection and implementation of BI tools, provide ongoing training, and market the benefits. “If it ain’t broke, don’t fix it”—if the business users are not committed to using the BI tools, walk away from the project and look for other spreadmarts the business perceives as a problem.

Create a Standard Set of Reports. Almost as many companies (58%) assumed that creating a standard set of reports using their standard BI tools would eliminate the need for spreadmarts as those that implemented new BI tools (63%). Organizations assumed that these reports would become their systems of record for decision making. Only 18% found this approach very effective. The most likely reasons for the shortcoming were, first, that no set of reports will effectively cover every management decision, so there was a gap in what was provided. Second, since this approach burdened IT with a queue of reports to develop, the business faced two of the primary reasons spreadmarts were created initially: the IT group did not understand what the business needed, and the IT group was not responsive to business needs.

Excel Integration. The only approach respondents rated more effective than adopting BI tools was “providing BI tools that integrate with Excel/Office” (29%). For a spreadmart user, the next best thing to Excel is Excel that integrates with the corporate BI standard. This approach was used by slightly more than half of the respondents (53%). However, Office integration technology can also give users more fuel to proliferate spreadmarts if it enables them to save data locally and disseminate the results to others. Some BI vendors—and ironically, Microsoft is one of them—now provide a thin-client Excel solution where administrators can deny users the ability to download or manipulate data.

Some experts claim that power users use BI tools mainly as a personalized extract tool to dump data into Excel, where they perform their real work. According to our survey, that’s not the case. Only a small percentage (7%) of spreadmarts obtain data this way. More than half of spreadmarts (51%) use manual data entry or manual data import. It follows that a major way to drain the life out of spreadmarts is to begin collecting the data they use in a data warehouse and create standard reports that run against that data. Of course, if there are no operational systems capturing this data, then a spreadmart is the only alternative.

Sometimes strong-arm tactics are effective in addressing spreadmarts. Reassigning the creators of spreadmarts to other activities is certainly effective, if an executive has the clout to carry this out and offers a suitable BI/DW replacement system. For example, the director of operations at a major national bank reassigned 58 people who had been creating ad hoc performance reports and replaced their output with a set of standard reports built on a standard BI platform, saving $300 million a year and dramatically improving the bank’s quality and efficiency in industry benchmarks. This may be the dream of those who are hostile to spreadmarts, but the survey illustrates that this is a rare occurrence.

Gentler approaches are seldom very effective. New policies for the proper use of spreadsheets generally fall on deaf ears; they are very effective only 12% of the time. The problem isn’t that business people do not know how to use the spreadsheets, but that they think they have no alternative.

Multiple Solutions. Given the low percentage of respondents who can vouch for the effectiveness of any of the approaches listed in Table 1, it’s not surprising that managing the proliferation of spreadmarts is such a difficult task. It is more of a change management issue than a technological one. While it’s important to bring new technologies to bear, such as BI tools that integrate with Excel, it’s critical to figure out which levers to push and pull to change people’s habits and perceptions. No single approach is effective on its own; therefore, organizations must apply multiple approaches.

Conclusion

Spreadsheets are here to stay. Business users have them, are familiar with them, and will use them to do their jobs for years to come. Memo to IT: Deal with it! Our recommendation is to choose a solution that balances business and IT priorities and yields the greatest business value.

http://tdwi.org/Articles/2008/05/27/Strategies-for-Managing-Spreadmarts.aspx?Page=1

SSIS: Work Flow vs Stored Procedures

When importing a file into a SQL table, we create a Work Flow. But for transferring data from one SQL Server table to another SQL Server table, is it better to use Execute SQL Task (Stored Procedures) or Work Flow?

This is a classic debate in SSIS. A lot of times in data warehousing we need to transfer data from the staging tables to the fact and dimension tables. Should we use SQL Task or Work Flow?

There are 4 main considerations here:

  1. Data quality checking
  2. ETL framework
  3. Performance
  4. Development time

Data Quality Checking

There are a number of data quality checks that we need to perform on the incoming data and log accordingly, potentially rejecting the incoming data. For example: data type validations, number of columns, whether the data is within a certain allowable range or conforms to a certain list, etc. These DQ checks should be built only once and used many times, avoiding redundant work. For that purpose, it is easier to build the DQ checks in the form of stored procedures, running dynamic SQL on many staging tables one by one. One of the main principles in DQ is that any silly data in the incoming feed should not fail the data load. It should be gracefully recorded and the whole ETL package should carry on. It is an order of magnitude more difficult to build the DQ routines as script tasks, which are executed before the data flows into the warehouse. On the other hand, data profiles are easier to build using the Data Profiling task. What I’m saying is that the decision whether to use a data flow or a stored procedure/Execute SQL Task is affected by how the DQ routines were built.
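
As a rough illustration of the stored procedure approach, here is a minimal sketch of a reusable range check driven by dynamic SQL. The table, column and log names (dbo.DqLog, stg.Sales and so on) are assumptions made up for the example, not from the original post.

-- Minimal sketch only; the log table and all object names are assumptions.
CREATE TABLE dbo.DqLog
(
    DqLogId    int IDENTITY(1,1) PRIMARY KEY,
    SchemaName sysname NOT NULL,
    TableName  sysname NOT NULL,
    ColumnName sysname NOT NULL,
    FailedRows int NOT NULL,
    CheckedAt  datetime2 NOT NULL DEFAULT SYSDATETIME()
);
GO

CREATE PROCEDURE dbo.usp_CheckNumericRange
    @SchemaName sysname,
    @TableName  sysname,
    @ColumnName sysname,
    @MinValue   decimal(18,2),
    @MaxValue   decimal(18,2)
AS
BEGIN
    SET NOCOUNT ON;

    -- Build the check dynamically so one procedure can run against many staging tables.
    DECLARE @sql nvarchar(max) =
        N'INSERT INTO dbo.DqLog (SchemaName, TableName, ColumnName, FailedRows)
          SELECT @s, @t, @c, COUNT(*)
          FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) +
        N' WHERE ' + QUOTENAME(@ColumnName) + N' NOT BETWEEN @min AND @max;';

    -- Record the bad rows and carry on: silly data must not fail the whole load.
    EXEC sp_executesql @sql,
        N'@s sysname, @t sysname, @c sysname, @min decimal(18,2), @max decimal(18,2)',
        @s = @SchemaName, @t = @TableName, @c = @ColumnName,
        @min = @MinValue, @max = @MaxValue;
END;
GO

-- Example call against a hypothetical staging table:
EXEC dbo.usp_CheckNumericRange 'stg', 'Sales', 'Amount', 0, 1000000;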

ETL Framework

In every data warehousing or data integration project that uses SSIS as the ETL tool, the first step is to build an ETL framework. This framework handles error checking, alert notification, task failures, logging, execution history, file archiving and batch control. It is built as a “parent child” package system, supported by a series of ETL metadata tables, as per chapter 10 of my book, e.g. a data flow table, a package table and a status table. What I’m saying here is that the decision of whether to use a data flow or stored procedures/Execute SQL Tasks is affected by your ETL framework. I know that it should be the other way around: the ETL framework should be built to incorporate both the workflow and the stored procedures. Well, if that’s the case in your project, that is excellent; there’s no problem here. But practically speaking, I’ve seen several cases where we could not implement a data transfer routine as a workflow because the ETL framework dictates that it needs to be implemented as a stored procedure.
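
To make the idea concrete, here is a purely illustrative sketch of the kind of metadata tables such a framework might sit on. The real framework described in the book has more tables and columns; the names below are assumptions for the example.

-- Hypothetical minimal metadata tables for a parent-child ETL framework.
CREATE TABLE dbo.EtlPackage
(
    PackageId   int IDENTITY(1,1) PRIMARY KEY,
    PackageName nvarchar(260) NOT NULL,
    IsEnabled   bit NOT NULL DEFAULT 1
);

CREATE TABLE dbo.EtlExecution
(
    ExecutionId int IDENTITY(1,1) PRIMARY KEY,
    PackageId   int NOT NULL REFERENCES dbo.EtlPackage (PackageId),
    StartedAt   datetime2 NOT NULL DEFAULT SYSDATETIME(),
    FinishedAt  datetime2 NULL,
    Status      varchar(20) NOT NULL DEFAULT 'Running'  -- Running / Succeeded / Failed
);

The parent package would log a row into EtlExecution before calling each child package and update the status afterwards.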

The next 2 points are the guts of the reasons. They are the real reasons for choosing between the work flow approach and stored procedures, if it is a green field, meaning that you have complete freedom to choose, without any of the existing corporate rules/architecture affecting your decision.

Performance

Performance is about how fast the data load is. Given the same amount of data to load from the staging table into the main table, which is the fastest method: using an insert select, or using a data flow? Generally speaking, if the data is less than 10k rows, there’s no real difference in performance. It is how complicated your DQ logic is that slows it down, not whether it’s a workflow or a stored procedure. If you are lucky enough to be involved in a project that loads billions of rows every day, you should be using a work flow. Generally it is faster than a stored procedure. The main issue with using a stored procedure to do 1 billion upserts in a SQL Server database is the bottleneck on tempDB and the log files. Your DBA wouldn’t be happy if you blew up tempDB from a nice 2 GB to 200 GB. Ditto with the log files.
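
As a hedged sketch of the stored procedure side, one common way to keep a large staging-to-fact insert from blowing up the log is to load in batches. The table names and the batch size here are assumptions for illustration only.

-- Load from staging into the fact table in batches to limit log growth.
DECLARE @BatchSize int = 1000000;
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.FactSales (SalesKey, CustomerKey, Amount)
    SELECT TOP (@BatchSize) s.SalesKey, s.CustomerKey, s.Amount
    FROM stg.Sales AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.FactSales AS f WHERE f.SalesKey = s.SalesKey);

    SET @Rows = @@ROWCOUNT;   -- stop when no more new rows are picked up
END;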

Using a workflow, you can split a derived column transformation into several transformations, boosting throughput by up to a factor of two. See here for details from the SQLCat team. This principle is applicable to any synchronous task, including the data conversion transform, lookup, row count, copy column and multicast. See here for an explanation of sync vs async tasks. One thing that gives us the most performance gain is to use multiple workflows to read different partitions of the source table simultaneously. This will certainly create a bottleneck on the target, so it too needs to be partitioned, pretty much the same way as the source table. The other thing that increases performance is the use of cache on the lookup transformation. Using Full Cache, the entire lookup table is pulled into memory before the data flow is executed, so that the lookup operation is lightning fast. Using Partial Cache, the cache is built as the rows pass through. When a new row comes in, SSIS searches the cache (memory) for a match, and only if it doesn’t find one does it fetch the data from disk. See here for details. You don’t get all of this when you use stored procedures to transfer the data.
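
For the partitioned-read idea, each of the parallel workflows would use a source query pinned to one partition. A sketch, assuming a partition function named pfSaleDate on the staging table:

-- Workflow number 3 reads only partition 3 of the source table.
SELECT s.SalesKey, s.CustomerKey, s.Amount, s.SaleDate
FROM stg.Sales AS s
WHERE $PARTITION.pfSaleDate(s.SaleDate) = 3;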

Development Time

You may say that development time is a lesser consideration than performance when it comes to how much it influences the decision between a work flow and an SP. But in reality this factor is significant. I have seen several cases where the ETL developer is more comfortable coding in Transact SQL than using SSIS transformations. They are probably twice as fast building it in stored procedures than doing it in SSIS transformations, due to their past experience. Understandably, this is because the majority of so-called “SSIS developers” were “SQL developers” first. They may have been doing SSIS for 2 years, but they have been doing SQL stored procedures for 10 years. For example, many developers are more conversant doing date conversion in Transact SQL than in a Derived Column.
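
For example, the kind of one-liner a SQL developer reaches for (column and table names are illustrative; TRY_CONVERT needs SQL Server 2012 or later):

-- Convert a dd/mm/yyyy text column to a date; bad values become NULL instead of failing.
SELECT TRY_CONVERT(date, s.SourceDateText, 103) AS SaleDate
FROM stg.Sales AS s;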

Conclusion

If you are lucky enough to be able to choose freely, work flow gives more performance and flexibility. But as with everything else in the real world, there are other factors which tie your hands, e.g. the data quality checking, the ETL framework and the development time.

As always I’d be glad to receive your comments and discussion at vrainardi@gmail.com. Vincent 27/2/11.

http://dwbi1.wordpress.com/2011/02/27/ssis-work-flow-vs-stored-procedures/

SSAS Performance – Best Practices and Performance Optimization

Optimizations of SSAS Cubes

SSAS – Best Practices and Performance Optimization – Part 1 of 4

http://www.mssqltips.com/sqlservertip/2565/ssas–best-practices-and-performance-optimization–part-1-of-4/

Performance optimization techniques for source system design and network settings

1. To avoid more locks or lock escalations, you can specify the TABLOCK query hint while querying, or specify ALLOW_ROW_LOCKS = OFF and ALLOW_PAGE_LOCKS = OFF when creating tables or indexes, or pull data from a read-only database.
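
A quick illustration of both options (object names are made up for the example):

-- Read with a single table-level shared lock instead of many row locks:
SELECT SalesKey, CustomerKey, Amount
FROM dbo.FactSales WITH (TABLOCK);

-- Or build the index so row and page locks are never taken on it:
CREATE NONCLUSTERED INDEX IX_FactSales_CustomerKey
ON dbo.FactSales (CustomerKey)
WITH (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF);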

2. Sometimes you need to aggregate fact data at the source before pulling it. In that case you can improve performance by creating indexed (materialized) views and, instead of doing the aggregations every time, pulling the data from the indexed view.
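
A minimal sketch of such an indexed view, assuming a dbo.FactSales table with non-nullable SaleDate and Amount columns:

-- The view must be schema-bound and use COUNT_BIG(*) when it has a GROUP BY.
CREATE VIEW dbo.vSalesByDay
WITH SCHEMABINDING
AS
SELECT SaleDate,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.FactSales
GROUP BY SaleDate;
GO

-- The unique clustered index is what materializes the view on disk.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByDay ON dbo.vSalesByDay (SaleDate);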

3. Make sure you have resources available to SQL Server for serving your data pull requests; you can use RESOURCE GOVERNOR to control the amount of resources available to OLTP and OLAP operations. To learn more about the resource governor click here.
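
A hedged sketch of what that can look like; the pool, group, and login names here are invented for the example, and the classifier function must live in the master database.

-- Cap what the OLAP/SSAS workload can take from the server.
CREATE RESOURCE POOL OlapPool
WITH (MAX_CPU_PERCENT = 50, MAX_MEMORY_PERCENT = 50);

CREATE WORKLOAD GROUP OlapGroup
USING OlapPool;
GO

-- Route the (assumed) SSAS service account into the OLAP pool.
CREATE FUNCTION dbo.fnClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN (CASE WHEN SUSER_SNAME() = N'DOMAIN\ssas_service'
                 THEN N'OlapGroup' ELSE N'default' END);
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;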

4. Create appropriate indexes on source tables to improve the performance of the query which SSAS fires while processing the cube or while retrieving data from the source. If you have access to the source data, you can use this DMV to identify missing indexes or you can use the Index Tuning Advisor for identifying and creating missing indexes on the source.
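
If you do have access to the source database, a query along these lines surfaces the missing-index suggestions SQL Server has already collected (the ordering heuristic is just one common choice):

SELECT TOP (20)
       d.statement AS TableName,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;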

5. Consider creating partitions, especially on fact tables, which will improve performance severalfold. (If you have multiple partitions distributed across multiple filegroups on multiple drives, then SQL Server can access them in parallel, which will be faster.)
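
A sketch of monthly partitioning for a fact table; the boundary values, filegroup, and column names are illustrative only.

CREATE PARTITION FUNCTION pfSaleDate (date)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-02-01', '2014-03-01');

-- In practice, map partitions to separate filegroups on separate drives.
CREATE PARTITION SCHEME psSaleDate
AS PARTITION pfSaleDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactSales
(
    SalesKey    bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    SaleDate    date          NOT NULL,
    Amount      decimal(18,2) NOT NULL
) ON psSaleDate (SaleDate);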

6. As we all know, I/O (Input/Output) is the slowest part of the hardware resources. If I/O is a bottleneck on your source system, you should consider using Data Compression, which reduces I/O but increases CPU usage a bit (more CPU cycles are used for data compression and decompression). SQL Server 2008 and later versions support both row and page compression for both tables and indexes. Before you decide to enable compression on a table, you can use the sp_estimate_data_compression_savings system stored procedure to understand how much space savings you will get. To learn more about Data Compression click here.
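
For example (table name assumed), you could estimate the savings first and then enable page compression:

-- How much space would PAGE compression save on this table and its indexes?
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSales',
     @index_id         = NULL,    -- all indexes
     @partition_number = NULL,    -- all partitions
     @data_compression = 'PAGE';

-- If the estimate looks worthwhile, rebuild the table with compression enabled.
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);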

7. When we select data from a table, shared locks are placed at the row/key level. This row/key level locking escalates to the page level or table level depending on the number of rows that are selected. To minimize the effort SQL Server spends managing these locks, you can specify the NOLOCK or TABLOCK query hint in the query.

8. While connecting to the source data system, use the default ReadCommitted isolation mode in order to avoid extra overhead/copies at the source system.

9. You can specify the maximum number of connections that SSAS can create in parallel to pull data from source systems during cube processing. This really helps cube processing run in parallel by creating multiple connections to refresh several dimensions and facts at once. The default value is 10, and you should consider increasing it if you have a cube with lots of dimensions/facts and your source supports more parallel connections. This can greatly improve cube processing times.

10. If your source system (SQL Server) and SSAS are both on the same machine, you should consider using the Shared Memory net library for better performance. (The performance benefit comes from the fact that it bypasses the physical network stack. It uses the Windows Shared Memory feature to communicate between SQL Server and the client/SSAS. This Net-Library is enabled by default and is used when you specify either a period or (local) as your machine name, or localhost, or the machine name, or by prefixing the machine\instance name with lpc: when connecting to a SQL Server instance. To learn more about this click here.)

11. During cube processing data moves from your relational data warehouse to SSAS in TDS (Tabular Data Stream) packets. As data movement between the relational data warehouse and SSAS is normally high, we should configure a bigger packet size (therefore fewer packets) rather than a smaller size (a high number of packets) to minimize the overhead of breaking data down into multiple chunks/packets and reassembling it at the other end. (To change the packet size you can go to the connection manager, click on the All page on the left side and specify 32KB for the packet size property instead of its default value of 4KB. Please note, changing the network packet size property might be good for a data warehousing scenario but not for OLTP-type applications, and therefore it’s better to override the packet size property for your connection separately instead of changing it on SQL Server for all connections.)

Best practices and performance optimization techniques for cube design and development

Dimension Design

1. Include only those columns in a dimension which are required by the business.

Including unnecessary columns puts extra overhead on SSAS for managing and storing these columns and makes processing and querying take longer.

2. Define attribute relationships or cascading attribute relationships.

By default all attributes are related to the key attribute, so define attribute relationships wherever applicable. For example, days roll up into months, months roll up into quarters, quarters roll up into years, etc… This makes queries faster, since it has aggregated 4 quarters or 12 months of data to arrive at yearly figures instead of having to aggregate 365 days. Make sure you don’t create redundant attribute relationships, for example “days roll up into month” and “months roll up into quarter” and also “days roll up into quarter” because this would add extra overhead.

3. Specify the appropriate attribute relationship type

By default an attribute relationship is considered Flexible, but wherever applicable make it Rigid for better performance. If you make it rigid, SSAS doesn’t bother updating members of the dimension on subsequent processing and hence improves performance. Please make sure you change a relationship to rigid only in cases where it does not change, or else you may get exceptions during processing.

4. Turn Off the Attribute Hierarchy and Use Member Properties

Set AttributeHierarchyEnabled to False for all those attributes (like Address or List Price) for which you don’t need aggregations to be calculated and which you want to access as member properties. Setting the AttributeHierarchyEnabled property to False improves processing performance and also reduces the overall cube size, as those attributes will not be considered in aggregations or for index creation. This makes sense for all those attributes which have high cardinality or a one to one relationship with the key attribute and which are not used for slicing and dicing; for example Address, Phone Numbers, etc.

5. Appropriately set KeyColumns property

Ensure that the KeyColumns property is set to identify unique values; for example, a month value of 1 is insufficient if the dimension contains more than a single year, so in this case combine the Year and Month columns together to make the key unique.

6. Setting AttributeHierarchyOptimizedState property to Not Optimized

During processing of the primary key attribute, bitmap indexes are created for every related attribute. Building the bitmap indexes for the primary key can take time if it has one or more related attributes with high cardinality (for example Address, Phone Number or List Price). At query time, the bitmap indexes for these attributes are not useful in speeding up retrieval, since the storage engine still must sift through a large number of distinct values to reach the desired values. Unwanted bitmap indexes increase processing time and cube size, and they may have a negative impact on query response time. To avoid spending time building unnecessary bitmap indexes during processing, set the AttributeHierarchyOptimizedState property to Not Optimized.

7. Creating user defined hierarchies

You should consider creating user defined hierarchies whenever you have a chain of related attributes in a dimension, as that becomes a navigation path for end users. You should create at least one user defined hierarchy in a dimension which does not contain a parent-child hierarchy. Please make sure each lower level attribute contains more members than the attribute above it; if this is not the case then your levels might be in the wrong order.

8. AttributeHierarchyVisible property of an attribute

Although it does not impact performance, it’s recommended to set AttributeHierarchyVisible to FALSE for all those attributes which have been included in user defined hierarchies; this removes the ambiguous (duplicate) experience for end users.

9. Defining default member

By default the “All” member is considered the default member for an attribute, and hence it’s recommended to define a default member explicitly, especially in the case where the attribute cannot be aggregated.

Measure Group Design and Optimization

1. Partitioning the measure groups

Apply a partitioning strategy for all the measure groups (especially those which are quite large in size) and partition them by one or more dimensions as per usage. This will greatly improve the cube processing as well as query performance of the cube.

The processing and query performance improves because of the fact that multiple threads can work together on multiple partitions of a measure group in parallel for processing or for serving query response. You can even define a different aggregation strategy for each partition. For example, you might have a higher percentage aggregation for all those older partitions which are less likely to change whereas a lower percentage of aggregations for those recent partitions which are more likely to change.

(See also: SQL Server 2012 Analysis Services Partitioning Performance Demonstration, http://www.mssqltips.com/sqlservertip/3012/sql-server-2012-analysis-services-partitioning-performance-demonstration/?utm_source=dailynewsletter&utm_medium=email&utm_content=headline&utm_campaign=20130823)

2. Aggregation

Define aggregations prudently for the measure groups, as aggregations reduce the number of values that SSAS has to scan from disk to generate a response. While having more (all required) aggregations improves query performance, it makes cube processing slower; if you have too few aggregations, queries slow down but processing speeds up. Ideally you should start with a 20%-30% query performance improvement and can then use the Usage Based Optimization wizard to define more aggregations, as discussed below. If you have created partitions on measure groups, you might consider having a higher percentage of aggregation for the older partitions which are less likely to change, and a lower percentage of aggregations for the recent partitions which are more likely to change. You should not create aggregations that are larger than one-third of the size of the fact data.

You can define the fact table source record count in the EstimatedRows property of each measure group, and you can define attribute member counts in the EstimatedCount property of each attribute. This way you can ensure your metadata is up-to-date which will improve the effectiveness of your aggregation design and creation.

3. Usage Based Optimization Wizard – Aggregation redefined

Generally we create aggregations to gain 20%-30% performance in the beginning, and then later use the Usage Based Optimization wizard to create more aggregations for the queries being run against the cube. The idea is that you enable logging for queries being run against your cube and then use the collected information as an input to the Usage Based Optimization wizard to create aggregations for all or just the long running queries. To learn more about this click here.

4. AggregationUsage Property

AggregationUsage is a property of an attribute which is used by SSAS to determine if the attribute is an aggregation candidate or not. By default SSAS considers only key attributes and attributes in natural hierarchies for inclusion in aggregations. If you find any other attribute which might be used for slicing and dicing then you should consider setting AggregationUsage to Unrestricted for including it in the aggregation design. Avoid setting the AggregationUsage property to FULL for an attribute that has many members. You should not create an aggregation that contains several attributes from the same attribute relationship because the implied attribute’s value can be calculated from the first attribute of the attribute relationship chain.

5. IgnoreUnrelatedDimensions property usage

IgnoreUnrelatedDimensions is a property of the measure group with a default value of TRUE, in which case the measure group displays the current amount even for dimensions which are not related, which might lead to misinterpretation. You should consider setting it to FALSE, so that the measure group does not ignore unrelated dimensions and does not show the current amount against them.

6. Distinct count measures

It’s recommended to have each distinct count measure in a separate measure group to improve performance.

7. Referenced relationship of dimension and measure group

To improve performance, you should consider materializing the reference dimension if both dimensions and the measure group are in the same cube.

Cube Processing

When we talk about processing a cube, there are two parts to it: processing data, which rebuilds dimensions with the attribute store, hierarchy store and fact data store; and processing indexes, which creates bitmap indexes and the defined aggregations. You can execute a single command (ProcessFull) to perform these two operations together, or execute separate commands (ProcessData and ProcessIndexes) for each of these operations; this way you can identify how much time each operation takes.

You might choose to do a full process each time, or you might do a full process followed by subsequent incremental processes. No matter which approach you use, SSAS uses a job based architecture (it creates a controller job and many other jobs depending on the number of attributes, hierarchies, partitions, etc.) for processing dimensions and facts.

Cube Synchronization

Cube processing requires exclusive locks on the objects being committed, which means that an object will be unavailable to users during the commit. It also means that long running queries against SSAS prevent the exclusive locks from being taken, so processing may take longer to complete. To prevent processing and querying from interfering with each other you can use a different strategy. You can have one cube (also called the processing cube) which gets processed (refreshed with the latest set of data from the source) and then another cube (also called the querying cube) which gets synchronized with the first cube. The second cube is what users will be accessing. There are several ways to synchronize the second (querying) cube, and one of the options is the built-in cube synchronization feature.

Cube Synchronization (SSAS database synchronization) synchronizes the destination cube with the source cube, with the latest metadata and data. While the destination cube is being synchronized, users can still query it, because during synchronization SSAS maintains two copies: one gets updated while the other remains available for use. After synchronization SSAS automatically switches users to the new refreshed copy and drops the outdated one. To learn more about cube synchronization best practices click here.

Cache Warming

If you remember from the SSAS architecture, which I talked about in Part 1 of this tip series, the Query Processor Cache/Formula Engine Cache caches the calculation results whereas the Storage Engine Cache caches the aggregated/fact data being queried. This caching helps improve the performance of queries executed subsequently, or of other queries whose responses can be served from the caches. Now the question is: do we really need to wait for the first query to complete, or can we run the query on its own (pre-execute it) and make the cache ready? Yes, we can pre-execute one or more frequently used queries, or run the CREATE CACHE statement (this one generally runs faster as it does not include cell values) to load the cache, and this is what is called Cache Warming.

As a precautionary note, you should not consider that once a query result is cached it will remain there forever; it might be pushed out by other query results if you don’t have enough space for additional query result caching.

To clear the formula engine cache and the storage engine cache you can execute this XMLA command:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW</DatabaseID> <!-- assumed database ID; replace with your own -->
  </Object>
</ClearCache>

To initialize the calculation script you can execute this query which returns and caches nothing:
select {} on 0 from [Adventure Works]

Best practices and performance optimization techniques for Server Resources and Reporting Services.

1. Threading or parallel processing in SSAS

SSAS has been designed to perform its operations in parallel, and because of this it can create multiple threads to execute multiple requests in parallel. Since creating and destroying threads is an expensive affair, SSAS maintains two worker thread pools to which threads that are not currently being used are returned, so that SSAS can pick them up again to serve other requests. These two pools are called the Query Thread Pool and the Process Thread Pool.

If you remember from the SSAS architecture which I talked about in the first tip of this series, the XMLA listener listens for incoming requests and creates a query thread (or pulls one from the query pool if one is already available) which checks for data/calculations in the formula engine cache. If required, the XMLA listener creates a process thread (or pulls one from the process pool if one is already available) which is used to retrieve data from the storage engine cache/disk. The process thread also stores the data it retrieved from disk in the storage engine cache, whereas the query thread stores the calculations in the formula engine cache to resolve/serve future queries.

Other server-level settings and options worth reviewing (see the full article linked below):

  • ThreadPoolQueryMinThreads and ThreadPoolQueryMaxThreads
  • ThreadPoolProcessMaxThreads and ThreadPoolProcessMinThreads
  • OLAPProcessBufferMemoryLimit
  • OLAPProcessAggregationMemoryLimitMin and OLAPProcessAggregationMemoryLimitMax
  • DataDir and LogDir
  • LogFlightRecorderEnabled
  • Scale up or scale out whenever or wherever possible

http://mytechconnect.wordpress.com/2013/08/27/ssas-performance-best-practices-and-performance-optimization/

Seattle, the New Center of a Tech Boom – NYTimes.com

SEATTLE — From his ninth-floor office in downtown Seattle, Hewlett-Packard’s cloud computing outpost, Bill Hilf can see the Space Needle, the Pike Street Market and the future of computing.

“There’s two security companies setting up over there,” said Mr. Hilf, the vice president of cloud product management at HP, pointing east. Toward the waterfront is Blue Box, a cloud hosting company. South, in Pioneer Square, is a cloud engineering outpost of EMC, a data storage giant, as well as lots more start-ups.

“It’s like Detroit used to be for car companies,” he said. “The galactic players are here, and they are creating lots of little companies. The only thing driving anyone away from here is the weather.”

Rain or shine, Seattle has quickly become the center of the most intensive engineering in cloud computing: the design and management of global-scale data centers. Last year, Mr. Hilf was hired from Microsoft’s giant cloud business, Azure, to spearhead HP’s efforts to build its own cloud service (based in Seattle, not at HP’s home in Silicon Valley), and build technology for other cloud companies.

It was a 13-minute drive from Microsoft to his new office, above the Seattle convention center. Amazon Web Services, the biggest of the cloud companies, is a six-minute walk from HP, in downtown Seattle. North over Queen Anne Hill, in the Fremont district, Google has a thousand engineers working on its cloud system. In Kirkland, near Microsoft, there is another Google facility.

It’s too early to say if this concentration of big engineering talent is sustainable over the long haul and whether it will evolve into a flywheel of innovation like Silicon Valley. For now, however, it appears to be attracting a lot of money and talent, eager to grow in the cloud.

“I could be completely wrong, but it seems like the valley is focusing on consumer stuff, like search and social, while we’re building infrastructure, things that are a lot harder to learn,” said Jared Wray, chief technology officer of cloud at CenturyLink, a large provider of Internet connectivity to business. “There’s a generational change.”

Last November, Mr. Wray’s company, a provider of cloud computing services to small and midsize businesses called Tier 3, was bought by CenturyLink for an undisclosed sum. “We’ve grown the engineering team 20 percent, to 100 people, since then,” Mr. Wray said. “They want to have 400.”

Besides talent that knows how to build infrastructure, Seattle has a number of leading cloud software companies. Tableau Software, a leader in the computer visualization of large sets of data, is across the street from Google in Fremont. Concur, used for online expense forms, is in Bellevue, near Microsoft Azure. Other companies include Chef, which produces open source cloud automation software; Apptio, a cloud monitoring company, and Socrata, which stores and publishes over 100,000 data sets for 150 government organizations.

“It’s a little more collaborative than the valley, because we still have lots of hard problems to solve in cloud computing,” said Kevin Merritt, founder and chief executive of Socrata.

Others note that Washington State has tougher noncompete clauses in its labor laws than does California, which can give established firms some peace of mind about setting up shop in Seattle but frustrate venture capitalists by making it harder for people to walk out of one company and start a competitor.

For all the action, Seattle still has only one major venture capital firm, compared with scores of such businesses in Silicon Valley. That firm, Madrona, has participated in 15 cloud deals since March 2012, raising $68 million of the $304 million total venture capital for cloud companies in the Seattle area (the rest came from 20 other investors, mostly in the valley). Those companies, all still private, are valued at $2.3 billion.

“This is the next two years,” said Matt McIlwain, a partner at Madrona, standing before a whiteboard where he brainstormed more than 20 areas where new cloud businesses might be built. As for the “relative dearth” of venture money, he said, the increasing number of people getting rich at the area’s new tech companies would bring in a lot of angel investors in Seattle. “The big question in town is what Steve Ballmer will do with his money.”

Besides, perhaps, buy a basketball team in Los Angeles. (Mr. Ballmer, a former Microsoft chief executive, has agreed to pay $2 billion for the beleaguered Los Angeles Clippers.)

The roots of Seattle’s strength in cloud computing, longtime observers say, go back to the mid-1990s, when Microsoft, long the sole big power in Seattle tech, began doing extensive work in distributed computing, or making computers work together in problem solving. That concept is the root of what cloud computing is today: lots of computer servers working together for various tasks.

Several Microsoft veterans played key roles building out Amazon’s cloud. In particular, James Hamilton, a former database specialist, is credited with making cloud computing far less expensive.

Another factor is the growing presence of the University of Washington’s computer science department, now considered a leader in distributed computing. “There’s an argument that Seattle owns the cloud now,” said Ed Lazowska, who holds the Bill & Melinda Gates chair in Computer Science and Engineering at the university. “Universities are always part of the axis” in building out a regional tech center, he said.

The university, which awards about 250 computer science degrees a year, is now working on courses in machine learning, which is how computers, particularly in the cloud, study and adapt based on big streams of data.

“The cloud and big data are closely connected,” Mr. Lazowska said. “We’re incredibly lucky to be in Seattle.”

http://mobile.nytimes.com/blogs/bits/2014/06/11/seattle-the-new-center-of-a-tech-boom/?_php=true&_type=blogs&_r=0