Data Warehouse Challenges in Diverse Corporate Environments 
Many organizations have already experimented with and successfully deployed their first few data warehouse systems. As these organizations learn how to apply the data they have beyond single departments, the need for even greater access to information arises. Many organizations are also growing rapidly through acquisitions and finding that each acquired division already has an established data warehouse capability or direction, yet almost none of them are integrated or standardized in their information content or presentation.
In each of these scenarios, the solution is usually to establish a corporate or enterprise-wide data warehouse initiative that brings the disparate data warehouse systems and projects together into a more cohesive and effective whole.
We've learned valuable lessons from years of developing data warehousing capabilities in large, diverse corporate environments. However, every situation is unique and no single approach works in every environment. Here we consider some of the more successful approaches for addressing today's data warehouse challenges.
Many terms are used to describe data warehousing (DW). As data warehousing becomes commonly applied within an organization, it begins to move towards a broader term that we refer to as business intelligence (BI): applying data warehousing to improve the decision-making processes of the business. Another more recent and broader term that Headstrong uses is digital intelligence (DI): enabling a data warehouse with web and wireless technology to radically transform how information is gathered, stored and distributed throughout the business environment. For the purpose of this document, these terms are used somewhat interchangeably.
Whether you refer to it as DW, BI or DI, building data warehouse systems for multiple divisions, such as are found in large corporations, is very different than building a data warehouse for a single department. This document will highlight some of the important differences in building data warehouse systems in these various settings. Within a corporate environment, there are many potential data warehouse challenges beyond the sheer number of people and organizations involved. Some of these inherent challenges include:
- Lack of consistent standards and applications
- Large quantity and variety of potential data sources
- Proprietary systems and platforms
- Tremendous variety in the number of business relationships
- Government regulation and laws
- Various budgetary procedures or constraints
- Conflicting objectives or incentives
With so many organizations potentially involved, a corporate data warehouse project involves negotiating and managing many perspectives to successfully build a digital intelligence environment that meets the needs of the corporation and the individual divisions. In doing so, there are several internal and external factors to consider. This document is organized into eight sections, one for each of the eight factors that I believe are critical to the success of any data warehouse project.

Each factor is discussed separately. They are not presented in any particular order, since all are of critical importance. Each section lists the key points to consider for its critical success factor (CSF), followed by a description of why they are important.
As a final point of introduction, this document is not intended to be a Methodology describing detailed tasks and deliverables for building a data warehouse. Instead, it is a summary of the "softer" project management aspects of enterprise data warehouse development.
It should also be noted that the issues discussed in this document are independent of industry and should exist in any large corporate data warehouse project.
Section 1: Staffing & Organization
- Team composition will change as the DI environment expands and matures
- Staff retention programs may be necessary
- Consulting experience and "mindset" is vital for a corporate DI team
- New skills and specialties need to be incorporated into a DI team
- Local presence of IT developers is vital
- Full-time end-users must be on the DI team for the duration
- Finding the right balance of roles and responsibilities between the corporate and local/divisional IT teams is critical
DI teams will usually go through 3 distinct organizational stages. Stage 1 could be classified as a pilot project within a small division or department, using a 5- to 10-person team of existing IT staff that has little or no previous experience building DI systems. Success on that project leads to larger and more ambitious DI projects, requiring a corresponding expansion of the team during the second stage of organizational development. In Stage 2, more of the same types of staff are added through transfers from other IT areas or new hires. When the team develops into a corporate DI team with enterprise-wide responsibilities, it enters Stage 3. A DI team is also considered Stage 3 if it is responsible for developing a data warehouse involving multiple entities (divisions within a corporation, or multiple companies working together) in an integrated corporate environment. The evolution to Stage 3 responsibilities has profound staffing and organizational implications.
Staffing
Staffing requirements change as the team evolves through the 3 stages. During Stage 1, IT staff must know current systems, data sources and users very well, and must be quick learners who can work in an unstructured rapid application development environment. During Stages 2 and 3, these traits remain an important element of team composition; however, they need to be complemented by totally different types of individuals who excel at large-scale standardized systems development, data extraction, cleansing and integration, consulting, and training.
Even as the team evolves, three critical success factors from a staffing perspective must remain consistent throughout.
The first, not unique to DI projects, is that retention programs may be needed to keep highly trained staff from leaving the organization. When a DI initiative is started, it usually involves brand-new technology with which few people in the organization have experience. As a result, expensive training must be provided to develop the necessary technical skills. It is not uncommon to spend between US$3,000 and US$8,000 training each individual on the team. Unfortunately, it is an all-too-frequent occurrence that a person will leave the organization 6 to 8 months after having been trained, moving to a job that pays better because they can now claim experience with DI technologies and methods that are in demand. Often, they also take some of their co-workers with them to the new job. The pure financial costs of this are significant: it can easily cost US$5,000 to US$15,000 to find an adequate replacement for someone who has left the DI team, not including the additional cost of specialized training that the replacement may need. These costs and re-training delays will adversely affect the cost, efficiency and timeliness of the DI project.
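The ranges quoted above can be combined into a rough direct cost per departure. This is a minimal sketch only: the function name is illustrative, and the assumption that a replacement needs the same training as the person who left is ours, not a figure from the text.

```python
# Cost ranges quoted in the text, in US dollars: (low, high).
TRAINING_COST = (3_000, 8_000)       # training for each team member
REPLACEMENT_COST = (5_000, 15_000)   # finding an adequate replacement


def turnover_cost_range(retrain_replacement=True):
    """Return the (low, high) direct cost of one trained person leaving.

    The base cost is the training investment that walks out the door plus
    the cost of finding a replacement.

    Assumption (not from the text): when retrain_replacement is True, the
    replacement needs the same training the departing person received.
    """
    low = TRAINING_COST[0] + REPLACEMENT_COST[0]
    high = TRAINING_COST[1] + REPLACEMENT_COST[1]
    if retrain_replacement:
        low += TRAINING_COST[0]
        high += TRAINING_COST[1]
    return low, high


if __name__ == "__main__":
    print("Without retraining:", turnover_cost_range(retrain_replacement=False))
    print("With retraining:   ", turnover_cost_range(retrain_replacement=True))
```

Even under these simple assumptions, a single departure costs somewhere between US$8,000 and US$31,000 in direct expenses, before counting project delays or any co-workers who follow.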
The second success factor depends on the type of individuals that make up the team. Once the team enters Stage 3 and assumes corporate systems development responsibilities, its staff will need to have a "consulting mindset." Hiring individuals that have the appropriate technical skills is relatively easy when compared with trying to transform division IT staff into "corporate consultants." When a DI team is in Stages 1 and 2 of its development, the typical line IT person who has experience developing systems for local users is fine. Basically, they are developing a data warehouse from feeder systems that they know well, and they are able to work with local end-users and IT contacts that have been established over a long period of time.
As the DI team begins to assume enterprise-wide responsibilities in Stage 3 of its evolution, it has to develop the capabilities of an IT consulting firm, as opposed to continuing to operate as a line-IT shop. While the technical steps necessary to build a DI solution have not changed, the environment in which this is to be done has changed radically, and new challenges may arise. For example, the user could be 2,000 miles away, requiring weekly out-of-town trips, or the DI team members may be viewed as corporate "outsiders" by the local divisional IT shops. A new network of personal contacts (user and IT) may have to be developed quickly, or a new legacy systems environment may need to be learned.
The third critical success factor is that new specialized skills outside the traditional IT skill set need to be added to the DI team. This is true for all 3 stages of team development. There are three skills that are particularly important:
- Data cleansing skills
- Database architecture skills
- OLAP and query design skills
Data cleansing is the process of standardizing and improving the quality of the data received, before it is loaded into the data warehouse system. Describing the complex process of data cleansing is beyond the scope of this document; however, the importance of data quality is discussed in depth in Section 5.
One option is to outsource this function to a 3rd party. Outsourcing is generally expensive, slow, and difficult to manage, so many organizations will want to perform this function in-house. There are several advantages to performing the data cleansing function using internal staff:
- They know the source data better than anyone else
- They will learn about data quality issues first hand
- They will gain a better understanding of how the data is to be used
- The routines developed can be leveraged by non-DI projects
However, in trying to use internal IT staff to perform the data cleansing function, three difficult problems will surface over time:
- Cleaning data files does not fit into the career plan of most IT staff
- It is the end-users that have the actual knowledge of how the data needs to be cleaned, since they are the ones that will be using it to perform analysis
- How the data is cleansed has a direct impact on what the data can be used for
Attempts to train existing IT staff in data cleansing skills will usually meet with only limited success. It is a very messy, detail-oriented job that apparently few IT people enjoy doing. Likewise, attempts to get end-user ownership of this function will also meet with only limited success. The dilemma for end-users is that they know what they want to see regarding data cleanliness, but they prefer to do analysis and reporting rather than data cleansing. This activity therefore doesn't fit into their job description or career expectations either.
The obvious solution to this problem is to eliminate the need to cleanse the data in the first place; a topic that will be discussed in more detail in Section 5, Data Quality. Unfortunately, external data sources and numerous diverse systems feeding the data warehouse usually make it impossible to go with the obvious solution in the short-run. In such a situation, data cleansing has to be done by someone.
Initially, the outsourcing approach for data cleansing is the easiest, but in the long run it also prevents the organization from really understanding, improving, and using its data most effectively, since all of the knowledge about data quality remains with the 3rd party. A recommended short-term solution is to use a vendor to work with internal staff to clean the data, and also to initiate and manage a data quality improvement initiative. This will help to clean up some of the more difficult data quality problems with experienced staff from the vendor, while building the repeatable procedures to perform the cleansing functions in-house. With appropriate involvement of local staff, knowledge transfer will also train internal staff on how to perform data cleansing.
In the longer term, data cleansing and quality improvement initiatives should be made the responsibility of staff internal to the organization, with new people hired specifically to perform that function. These individuals should be in a divisional (not corporate) end-user department, and ideally part of a reporting or analysis group.
Finally, from a staffing perspective, there is the issue of end-user involvement. Appropriate end-user involvement is critical to the success of a DI project in all 3 stages of development, for these 5 reasons:
1. There is usually no existing business process for IT staff to replicate or automate.
2. With a data warehouse system, most end-users will have a completely different systems capability than they have ever had before, so it is difficult for them to assist with systems design unless they learn the capability of the technology first-hand while the system is being designed.
3. Some programming-savvy end-users may have developed their own mini data warehouse-like analysis and reporting systems that may be replaced by the new DI system. These individuals need to be integrated into the systems development team as systems designers and champions.
4. Only experienced end-users know how the data will be used. This knowledge has a very heavy influence on the design of the database and query tools. It is said in most DI methodologies that you should begin a DI project at the end. What that means is that the DI team must have a clear understanding of what type of analysis the DI system is to be used for. This understanding can only come from the involvement of knowledgeable users.
5. A hands-on end-user must assume the role of project champion, demonstrating to his/her peers and superiors in the organization how the new DI system can help them in their day-to-day responsibilities. Neither executive steering committees nor IT staff can do that as effectively as an end-user analyst who will actually use the system in their day-to-day activities.
As the DI team moves from a local/divisional set of responsibilities to a corporate organizational structure, it is critical not to lose sight of these facts and subtle realities. In a Stage 3 environment, it can be assumed that all of the end-users and local IT staff that are needed will already be committed to various projects and responsibilities in the local divisions. Since the corporate DI team is not directly part of the local environment, artificial but real barriers may result in the corporate DI team acquiring only part-time user and local IT involvement. This is usually neither sufficient nor successful.
If adequate end-user involvement cannot be acquired from local departments or divisions, then the corporate DI team has only two alternatives to ensure success:
- Hire a new end-user into the corporate DI team, and hope that that person will be able to establish the relationships necessary to become the user champion at a local level.
- Stop project activities until an appropriate end-user is added to the project team. An appropriate end-user is one who will actually be responsible and held accountable for using the DI system. They must have a stake in its success. An effective test for determining whether an end-user is appropriate is this: if they think they are doing their job just fine today without a new system, then they are definitely NOT an appropriate end-user.
The presence of an end-user Steering Committee is also not a substitute for the involvement of hands-on end-user analysts. An executive end-user Steering Committee is great for helping to set project priorities and remove organizational barriers, but typically the members on it will not actually be using the data warehouse in their day-to-day activities.
Organization
As soon as the corporation decides that an enterprise DI initiative (Stage 3) is warranted, there are organizational decisions that need to be made. The first is to determine what type of team will be created to deliver the enterprise DI initiative. There are three common options usually considered:
- SWAT Team
The SWAT team approach involves creating a single DI team that goes from site to site creating the data warehouse system(s). The primary advantage of this approach is that it facilitates great standardization and consistency in the systems developed, since there is only one team doing everything. It also has the advantage of lower development cost, since fewer people have to be trained on how to develop a data warehouse. The primary disadvantages are that it is very slow, and those sites that have to wait may go out and develop their own solutions rather than wait for the corporate team to get to them. There may also be some political difficulties if there is a perception of favoritism in the order that the corporate team goes from site to site.

- Concurrent Teams
The Concurrent teams approach involves establishing multiple independent data warehouse teams that develop data warehouse systems concurrently. The primary advantage of this approach is that it is faster. The big disadvantage, however, is that it requires very strong direction to ensure that the independent teams are developing consistent, standardized systems. In practice, this is nearly impossible to accomplish. This approach also has the potential to be much more costly, since multiple teams have to be trained, and the likelihood of redundant activity is very high.

- Hybrid Teams
The Hybrid approach is the best overall approach, balancing speed of implementation with business standardization, consistency and cost effectiveness. Basically, a core SWAT team is established to go from site to site to start the data warehouse project(s), provide overall project management, and ensure that data warehouse designs are consistent across sites, while the local on-site teams are responsible for actually building the system. The advantages of this approach are that it ensures data warehouse consistency across sites and maximizes synergies and cost effectiveness. With this approach, there is very little redundancy of effort, and each site can benefit from the cumulative knowledge gained by the core SWAT team, facilitating a rapid start to each data warehouse project. The disadvantage of this approach is that there is still a queue of sites, and some sites may have to wait their turn until the core SWAT team can get to them.

With the SWAT or Hybrid approaches, the allocation of roles and responsibilities between the corporate DI team and the local/divisional IT teams also becomes a critical new aspect of the organization. The logical division of responsibilities between the two IT groups becomes important if the system is to be developed efficiently and implemented successfully at that local site.
Data Model Development & Metadata
When developing DI systems across multiple divisions, or multiple entities within a single division, the corporate DI team is better positioned to assume responsibility for overall data model development. The corporate DI team will most likely be using centralized large-scale data modeling and metadata management tools which may be cost prohibitive for project teams at the local level to purchase. In many local/divisional IT shops, it is also common for data modeling skills to be in very short supply. In addition, the corporate DI team can serve as the collection point for data models and metadata for all local initiatives. As the number of local DI initiatives increases, the enterprise data model will serve as both the starting point and the primary design coordination tool between the efforts.
Query Front-End Development
Working with OLAP and other types of query tools is the most visible activity on a DI project. Usually, the local DI team is better positioned to work closely with their end-users to develop front-ends that effectively meet their needs. When a corporate DI team tries to do this without local DI team involvement, the results seldom meet expectations.
On the other hand, the corporate team is usually better positioned to maximize the capabilities of the technology. In many cases, the technologies also require a single administrative control point within the organization, often known as the "master user" or "administrator", who must define and maintain the structural characteristics of the tool (projects, user IDs, security, tables, access, etc.). The corporate DI team is also better positioned to identify synergies across multiple local DI projects and to coordinate front-end designs, especially if the ultimate goal is to develop a single enterprise front-end that is used consistently across the enterprise.
With these complementary strengths and weaknesses, the corporate and local DI teams could divide their responsibilities in this area as follows:
- Corporate should provide central administration required for any front-end query tools
- Corporate should provide technical expertise, design standards, and consulting assistance to the local DI teams in developing their front-ends
- Corporate should perform front-end design coordination across multiple local DI projects
- Corporate should review and approve all front-end designs
- Local IT should dedicate individuals to learn the front-end technology thoroughly
- Local IT should do the actual design and development of the front-end
- Local IT should gain user involvement in the design and development process
I have seen a lot of mix-and-match of responsibilities in this area, and it usually works out fine as long as business value is delivered, and standards are adhered to.
Technology Selection
Technology selection is probably the most common and difficult area of contention between the corporate and local DI teams. The corporate team will usually focus on vendors that can provide robust technologies that can handle large numbers of users, files, etc. Local teams will tend to be more interested in low-cost tools that can be learned and applied quickly and easily, often building on vendor relationships that they already have. Both views are valid; however, one fact points to the corporate team being the primary owner of technology selection. Most enterprise-capable data warehouse technologies can be applied quite effectively in a local setting, while most of the inexpensive small-scale data warehouse technologies often selected at the local level simply do not have the scalability necessary to handle enterprise functions.
In addition, there can be a tremendous amount of redundant effort expended if multiple local DI teams are evaluating different technologies to do the same job. Similarly, the amount of time and expense the enterprise as a whole will expend on acquiring, maintaining, and training on different but redundant front-end technologies increases almost exponentially with each new front-end technology brought into the organization. For these reasons, the evaluation and selection of DI technology should be the responsibility of a corporate DI team, with local DI team involvement in the process whenever possible.
Experience has shown, however, that this is usually a point of contention with local IT and business users, because they want to select the glitzy technology that is going to sit on their desktop. Compounding this issue are the dozens of vendors that will come in and demonstrate customized point solutions to every end-user department at the local level. Staff at the local level will view these custom solutions and ask themselves why they should use the standard corporate technologies when vendor XYZ has just what they need, all ready to implement. To deal with this challenge, corporate and local DI shops must have a clearly defined set of requirements that any front-end technology must meet. If the vendor-promoted point solution can meet those requirements, then it should be considered. This topic is discussed further in Section 7, Architecture.
Source Data Extraction & Cleansing
As mentioned earlier, this is a function that few want to do, so the local DI team will usually be happy to let the corporate DI team do it. The reverse is also true. The correct approach has to be driven by the questions "who knows the data best, and who knows how it will be used?" Usually, the answer to both is the local DI team. While the corporate team may have some specific data requirements defined at the corporate level, the real day-to-day power users of the data warehouse system will be at the local level. The local users will be the people who define exactly what data needs to be extracted, and the local DI team will be responsible for writing the programs that actually extract that data from the local source systems. In the process, the local DI team will also gain tremendous knowledge of the data that is available to the organization, and of its potential uses and limitations. With this knowledge, they will be well-positioned to effect any procedural changes needed to maximize the use of that data.
The role of the corporate DI team in this process is usually to provide the tools and training to efficiently extract and partially cleanse data in an automated and repeatable fashion with a minimum of manual involvement. Technology to do this function at an enterprise level is readily available, but is generally very expensive, and is usually not cost-justifiable for a single project at a local site. When viewed from an enterprise-wide perspective however, these tools are not only a good investment, but are required to manage the large number of data feeds going into the enterprise repository. Today, the nature of these tools usually requires a single point of administration using a "master" workstation to control the identification of data sources, timing of the extracts, and movement of the data files. This administration should be done at the corporate level. Additional workstations that are used to define the actual data mapping and transformation rules are usually installed at the local site, and are used by the local DI team to actually code the extract jobs.
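The split described above, with central administration of sources and timing and local definition of mapping and transformation rules, can be sketched as a small data structure. This is an illustrative sketch only; the class names, field names, and sample source system are hypothetical and do not correspond to any particular extraction product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MappingRule:
    """Defined by the local DI team: how one source field maps to the warehouse."""
    source_field: str
    target_field: str
    # Transformation applied to the raw value; trimming whitespace by default.
    transform: Callable[[str], str] = str.strip


@dataclass
class ExtractJob:
    """Registered centrally by the corporate DI team (the "master" side)."""
    source_system: str   # which data source is extracted
    schedule: str        # when the extract runs (free text in this sketch)
    rules: List[MappingRule] = field(default_factory=list)  # supplied locally

    def apply(self, record: Dict[str, str]) -> Dict[str, str]:
        """Apply the locally defined rules to a single source record."""
        return {
            rule.target_field: rule.transform(record[rule.source_field])
            for rule in self.rules
        }


if __name__ == "__main__":
    job = ExtractJob(
        source_system="site_1_general_ledger",  # hypothetical source name
        schedule="nightly",
        rules=[
            MappingRule("ACCT_NO", "account_number"),
            MappingRule("ACCT_NAME", "account_name",
                        transform=lambda s: s.strip().title()),
        ],
    )
    print(job.apply({"ACCT_NO": " 1001 ", "ACCT_NAME": "  CASH ON HAND "}))
```

Keeping the rules as data owned by the local team, while the job definition stays with the corporate team, mirrors the division of responsibility in the text: the people who know the data write the mappings, and a single point of administration controls what runs and when.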
Training
There are many different types of training that are needed in a large-scale enterprise DI effort. Training in technology, standards, and techniques is required for IT staff, while end-users need training in the data warehouse table structures, data contents and front-end query tool usage. For the most part, the needs for training at the corporate vs. local level differ only in scale. At an enterprise level, training has to be standardized and easily repeated across many local sites. On the other hand, it also has to be somewhat customized so that local end-users will be able to understand how the DI system can be applied to their specific jobs.
As with some of the other disciplines mentioned so far, the local DI teams often do not have dedicated professional trainers to develop formal training courses. This is where the corporate DI team could provide another value-added service to the local DI team. The corporate DI team is well-positioned to develop the basic training materials that can be leveraged by many local DI project teams. Working with the local DI team, the corporate team can customize the training as necessary, and deliver it with a high degree of professionalism.
Project Management
There is no substitute for experience. In general, the more DI projects that you manage, the better you will be at managing DI projects. The benefits of experienced project managers will be demonstrated in faster, lower-cost DI projects. This is where having a pool of experienced DI project managers can benefit both the local DI projects and the overall enterprise effort as well. This pool of experienced project managers should be a component of the corporate DI team, and shared across divisions as needed.
Unfortunately, most local IT shops don't have experienced DI project managers, and the enterprise cannot afford to create them over and over again as each local site starts its own DI project. If this happens, the enterprise will in effect be making the same mistakes over and over again as each new project manager at each local site goes through the same learning curve.
In true consulting fashion, a corporate DI team would be able to work with and mentor new local project managers in how to do it right the first time. They would have at their disposal the standard methodology, project plans, project management tools, and the other tools of efficient project management. They would also have the benefit of the other experienced project managers in the group to help resolve issues. This experience can be invaluable to the local DI project teams in ensuring that their projects avoid common mistakes and are completed as efficiently as possible. This is not to say, however, that the corporate DI team project managers should be solely responsible for managing DI projects at the local level. The ideal situation would be to team them with a local project manager who has a better knowledge of the local staff and of whom to see to get things done.
Section 2: Projects & Prioritization
- Separate the short-term value from the longer-term value
- Err on the side of short-term value
- Iterative horizontal integration of data is usually more effective than vertical integration done sequentially
In my experience, it is always better to link your initial success on any data warehouse project to money. Follow the money. Specifically, follow the short-term money. It sounds easy, but many a DI team has tried to include too much data and infrastructure, and too little ROI, in the first release of a data warehouse system, and failed. Spending hundreds of thousands of dollars to create a comprehensive infrastructure without delivering any new business value is usually a recipe for failure. Identifying immediate business needs with short-term return on investment for the first release of a data warehouse is critical.
How you actually go after the data in the organization also needs to be prioritized. Do you collect data using a vertical integration strategy or a horizontal strategy? A vertical strategy would involve collecting and integrating all of the data within a site before moving on to the next site. For example, with a vertical strategy you would collect all of the financial and operational data from site 1 before you moved on to site 2 to do the same thing there. A horizontal strategy would be to collect only certain types of data from every site as quickly as possible. Using the same example, you would collect only the financial data from site 1 before moving on to site 2. Getting all the operational data would be accomplished in later system releases.
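The two collection orders can be sketched as follows. This is a minimal illustration, not a planning tool: the function names are hypothetical, and treating each site (vertical) or each data type (horizontal) as one release is a simplifying assumption.

```python
def vertical_plan(sites, data_types):
    """Vertical strategy: finish every data type at one site before the next site.

    Simplifying assumption: each site is delivered as one release.
    """
    return [[(site, dtype) for dtype in data_types] for site in sites]


def horizontal_plan(sites, data_types):
    """Horizontal strategy: collect one data type from every site, then the next type.

    Simplifying assumption: each data type is delivered as one release.
    """
    return [[(site, dtype) for site in sites] for dtype in data_types]


if __name__ == "__main__":
    sites = ["site 1", "site 2"]
    data_types = ["financial", "operational"]
    for name, plan in (("vertical", vertical_plan), ("horizontal", horizontal_plan)):
        print(name)
        for number, release in enumerate(plan(sites, data_types), start=1):
            print(f"  release {number}: {release}")
```

Both plans cover the same site and data-type combinations. The difference is what the first release contains: everything from one site, or one kind of data from every site.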
Determining which is the right strategy is critical to the long-term success of the data warehouse initiative. A vertical strategy is obviously more comprehensive and complete, but it takes longer to deliver value to the organization; and the organization has to invest more capital in the project before any value is actually delivered. The horizontal approach delivers a smaller subset of capabilities to the organization, but it does it sooner and at a lower initial cost to the organization.
The right approach depends on the organization. In my experience with both approaches, however, it has always been better to deliver quick short-term benefits first. The long-term benefits will then take care of themselves, in the form of strong user support for additional DI projects to go after the rest. Getting users to actually use the data warehouse sooner rather than later is critical.
This philosophy points to a horizontal integration strategy. Among all the objectives that are identified for a data warehouse system to accomplish, single out those that are most likely to deliver concrete short-term benefits to the greatest number of users within the organization. Those are the objectives on which to stake the success of the data warehouse project.
Section 3: Expectations
- Beware the easy sell ... overselling is easy to do and hard to correct
- Underselling as the project progresses can end the DI effort prematurely
- Capitalize on "success stories" ... your customer is your best sales person
- Hedging is a form of subtle resistance to be dealt with
- An executive steering committee is not enough
Section 2 discussed the importance of establishing the right priorities for the data warehouse initiative. This section will follow up on that topic with a discussion of sponsorship and public relations. Neither is a one-time effort, and both are critical to the data warehouse effort. In fact, they are more important for data warehouse projects than for most other types of IT projects because data warehouse projects have a tendency to be oversold initially, and then undersold in later stages.
Over-selling
From an over-selling perspective, the potential of data warehouse projects to deliver new capabilities and value to an organization is so great that it is easy to set expectations far higher than can usually be delivered. Beware the easy sell! Dazzling project sponsors with a glitzy demo is a long way from actual commitment and follow-through when there are 100 other priorities in the organization. If end-users and management were told that the DI was going to be a "magic bullet" solution to their problems, unmanaged limiting factors can quickly disillusion them about what the system will actually do for them. These factors include the availability of source data, data quality, speed of system delivery, learning curves, and the ability of end-users to integrate the system into their daily processes. Some clear danger symptoms that a DI project has been over-simplified and oversold include the following:
- The DI system is viewed as a way to "fix" faulty operational systems or processes.
- Targeted end users are unable to specifically articulate what they will use the DI for.
- Targeted end users are unable to specifically articulate how they will integrate the DI into their daily job processes.
- No one is discussing or asking questions about the actual source data feeding the DI.
- More than 10% of the project effort is being spent on front-end query tool selection and design.
- There are no process redesign or data quality improvement initiatives linked to the DI.
Overselling the very first small-scale Stage 1 DI project in an organization is risky, but usually not fatal from a career perspective, because the capital expenditure is small and the timeframe usually short. However, overselling a large Stage 3 corporate DI initiative can be fatal, and will taint, if not prevent, all DI projects that attempt to follow.
Techniques for avoiding an over-selling situation include:
- Create and publish a data quality report for every data source feeding the DW. This technique is discussed in more detail in Section 5, Data Quality.
- Work with the end-users to define specific detailed processes and queries that they will use in their day-to-day responsibilities.
- Make targeted end-users responsible for presenting the front-end design of the DI system.
- Hold project sponsors accountable for helping you start a data quality improvement initiative, and include it as a sub-project to the DI in all status reports.
- Create and publish a bi-weekly or monthly status report.
- Don't let project team members play with the front-end query tools until they have completed design of the data feeds and database structures. I realize this goes against some aspects of the Rapid Application Development (RAD) approach, but it is far too easy to spend precious team resources designing pretty front-ends that will be meaningless because the data is not adequate. In other words, it is dangerous from an expectations management perspective to design front-ends and queries until you know what types of queries the data will actually support.
Under-selling
From an under-selling perspective, once the data warehouse initiative is underway, little time is usually spent by the DI team to promote it. In reality, a DI initiative has to be constantly re-sold at every opportunity. This is especially critical to avoid a disconnect between overly high expectations and actual delivered capabilities. Clear danger symptoms that a DI is being undersold include the following:
- Most end-users know that the project is active, but few can articulate what it is going to deliver, and when it will deliver it.
- Most end-users are unaware that people in the organization are already using the DI.
- No end-user (or group of end-users) can be identified as being responsible for the DI project.
Techniques for avoiding an underselling approach include:
- Communicate, Communicate, Communicate
- Create a DI Monthly newsletter
- Establish a presence on the company Web site or intranet
- Publish an article in the company newsletter
- Ask the sponsor(s) to send out a memo about the project from time to time
- Identify and work closely with a select group of users to help with the communication. Identify what they are contributing to the project, and publish that information.
- Look for end users that are applying the DI, and publish their success story.
- Give credit to the customers/users of the DI first, and they will ensure that the DI development team is recognized.
Hedging
While over-selling and under-selling are common problems, another is hedging. This frequently occurs in organizations where end-users or other IT teams are allowed or encouraged to develop their own PC-based systems in support of their job requirements. Symptoms of a hedging problem include:
- End-users or local IT staff say something like "Yes, the DI will be wonderful when it is done, but for now I need to do this..."
- End-users say something like "The DI will be great for other people in the organization, but my needs are different because..."
- End-users or local IT staff say something like "The DI technologies are fine for department xyz, but I met with a software vendor yesterday that already has the perfect decision support system for my needs..."
- End-users or local IT staff say something like "The DI is nice but I already do XYZ, so I won't need it..."
- Local IT staff feel that they are not involved or in control of the DI project, and are not providing much help, or are offering passive resistance, or are always working on other things.
Hedging is a serious problem for a corporate DI team (Stage 3) because it not only siphons the time and energy of critical staff away from the design of the data warehouse, but it also promotes a subtle undercurrent message that says the DI won't meet today's business needs. Given the right scenario, such as budget cuts, downsizing or new business issues to solve, that undercurrent will lead to senior management asking why the DI project is being done in the first place if so many people are working on what they consider to be better alternate solutions. It is a valid question, and by the time it has been asked it is usually too late for the DI team to try to answer it.
While there are fairly easy techniques to deal with over-selling and under-selling problems, dealing with the hedging problem is far more complex because it is often in the political arena. It may also point out valid problems with the DI team itself, or the approach being used to build the DI. Naturally, the DI team will be reluctant to admit such problems, which, in turn, builds more resentment among the competing efforts. In order to deal with hedging, I suggest asking the following questions to help diagnose the actual problem, and then developing a strategy to deal with the problem that is realistic for the culture that exists in the organization.
1. Are you dealing with a person who views the DI project as a potential threat? For example, I have worked on DI projects where the actual analysis performed with the DI illustrated that certain business users had not been doing their job properly.
2. Are you dealing with a "Not Invented Here" mindset? This often occurs if end-users or local IT staff feel that they have not had adequate input into the process. Sometimes it is also just a control issue whereby some people won't buy into anything unless they personally have made the decision.
3. Are you dealing with a lack of understanding about the "big picture"? Perhaps the broader needs of the enterprise require the local staff to do things that are of no immediate value to them.
4. Are you dealing with a general resistance to anything that is directed by Corporate?
5. Are you dealing with programmer wannabes? I have had physicians approach me with their PCs to demonstrate the "data warehouse" they built using FoxPro or Access, and ask me what they need to do to distribute it across the enterprise (6000 physicians!).
6. Are you dealing with someone who already has their own "system", usually a series of Excel spreadsheets, SAS jobs, or Microsoft Access databases, that will be replaced by the new DI?
7. Are you competing with "datamart" vendors who are selling form over substance?
8. Is the DI team involving the right people, or is it trying to do everything itself?
Sponsorship
Sponsorship is another area of critical importance to the DI project team. Sponsorship occurs at two different levels. Executive sponsorship is necessary to acquire capital for the project and get the initiative prioritized. Mid-level management sponsorship is what is actually needed to succeed on the project once the executive sponsor has approved it. Executive sponsorship is mostly a one-time effort, while mid-level sponsorship has to occur every day of the project.
Just getting the funding for the DI initiative from executive management is never enough. Putting it in flying terms, having only a single sponsor is a great way to take off, and a sure way to crash. The reason for this is that the executive sponsor is not the end-user; it is the mid-level managers and users who will determine whether or not the DI system is valuable.
Section 4: Financials
- Quantify the value of the DI often
- Look for value in areas outside the formal DI project
- Chargeback introduces complexities in DI project management
Having the money for a data warehouse project is a nice thing. It gives you the capability to hire people, buy technology, and carve out some time to build the system. However, having the money also brings with it some pretty demanding responsibilities and challenges.
The first challenge is keeping the money. A DI project is like any other initiative within the organization. It should always be remembered that financial approval without real sponsorship will only get the project so far.
A DI project must also have a good Return On Investment (ROI) if it is to be justified. "Approved" capital can evaporate rather quickly in most organizations if a better justified project comes along. Most DI project managers perform some high-level ROI in order to get the initial funding, and then forget about doing any further ROI analysis. This is a mistake.
While ROIs aren't everything, they are still something. Previous sections have covered the many things that can happen which will cause you to lose the money to build the DI, regardless of the ROI. However, a savvy DI project manager will have a well-defined ROI analysis performed as part of the DI project, because proving that you are actually delivering the ROI that you said you would is an effective way to keep the project capital. It is not as hard to do as most people think. The key is to identify the specific detailed things that end-users will use the DI for, and then quantify the cost/benefit of doing that with the DW versus without the DW. The benefit may be cost savings, additional revenue, reduced manual effort (i.e., headcount), improved customer service, increased product sales, and so on. The potential benefits of DW systems are huge for most organizations, if the right requirements are identified, and if the system is implemented correctly.
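The with-versus-without comparison can be kept as a simple running calculation that is updated as end-users commit to specific uses. The sketch below is one possible way to do that, not a prescribed method. The `UseCase` structure, the simple ROI formula (net benefit divided by investment), and all of the figures are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """One specific thing an end-user says they will use the DW for."""
    name: str
    cost_without_dw: float      # annual cost of doing the task today
    cost_with_dw: float         # annual cost of doing it with the warehouse
    added_revenue: float = 0.0  # any new annual revenue the use case enables

    @property
    def annual_benefit(self) -> float:
        # Benefit = cost avoided plus revenue gained.
        return (self.cost_without_dw - self.cost_with_dw) + self.added_revenue


def simple_roi(use_cases, investment):
    """Return (total annual benefit - investment) / investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    total_benefit = sum(u.annual_benefit for u in use_cases)
    return (total_benefit - investment) / investment


# Hypothetical figures only.
cases = [
    UseCase("monthly utilization report", cost_without_dw=120_000, cost_with_dw=30_000),
    UseCase("cross-sell analysis", cost_without_dw=0, cost_with_dw=20_000, added_revenue=150_000),
]
for case in cases:
    print(f"{case.name}: annual benefit {case.annual_benefit:,.0f}")
print(f"ROI on a 200,000 investment: {simple_roi(cases, 200_000):.0%}")
```

Because each use case is named and owned by an end-user, the same list can double as a source of test cases and as a record of what users said the system would do for them.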
The ROI activity is valuable from two perspectives. First, it forces the DI team to work with the end-users in defining specific uses for the DI. This information helps define and articulate detailed requirements, and provides test cases to test the system later on. From a secondary perspective, it helps get the users committed to the project. If they said that they would get XYZ benefit from the DI system, then they will be motivated to ensure that the system delivers that benefit.
The next challenge is following the money. Section 2 discussed the importance of prioritizing those DI requirements that can deliver financial value. From a financial perspective, the DI project manager must demonstrate value even during the development stage of the DI project. There is a responsibility to illustrate how and why the capital was spent. If a DI project is perceived by end-users or other competing project teams as spending money unnecessarily, resistance to the project will eventually build to dangerous levels. A good example of an "unnecessary" expense that I see too often is DI teams purchasing a very expensive enterprise license for an OLAP tool, or an expensive ETL tool, before they even have their first data warehouse system up and running successfully with firm business support behind it. Smart and timely purchasing decisions will help manage perceptions in this area.
Following the money should also differentiate between planned spending and unplanned spending. The most common area of unplanned spending in DI projects is in the data acquisition and cleansing activities. This will be covered in detail in Section 5; however, it is important from a financial management perspective to track the cost and benefits of data quality. For a DI project, it is a bad-news good-news situation. Poor data quality can cause the DI project to go over budget if excessive data cleansing activities are needed (the bad news). On the other hand, data quality improvement initiatives started because of the data warehouse project usually deliver value to many other areas within the organization (the good news). For these reasons, it is important to track all of the costs and benefits associated with the DI project, since some of them may not be obvious. The savvy DI project manager will look for every opportunity to link the cost of the DI project to the benefits being realized in other areas. The following questions may help to do this:
- Are the tools being purchased by the DI team also being used by other teams?
- What are the one-time and on-going costs that were avoided because of DI-related expenditures?
- What redundant or manual efforts have been eliminated as a result of the DI project?
- What brand-new capabilities have been introduced as a result of the DI project, and what is their value to the organization?
ROI and tracking costs and benefits to specific users and activities becomes especially important if the organization has a charge-back approach to funding development of the enterprise DI system. A charge-back approach is usually used when the actual costs of developing the DI system are absorbed by one or more business units or divisions within the company. This is often a special area of sensitivity in Stage 3 environments where the corporate DI team members are not actually part of the local/divisional organization being charged for their services. In this scenario, the corporate DI team must deliver undisputed value to the local/divisional organization footing the bill, since the local organization may not have any choice in the matter. In such an environment, tensions can build quickly if the corporate team is not perceived as valuable to the local organization.
Volumes have been written about charge-back and IT projects, and it is an organizational decision that the DI project team has little or no control over. However, charge-back does introduce some additional variables for the DI project manager to be aware of. When developing a DI system in a charge-back environment, the following things should be considered:
- Charge-back will make the customer/end-user more demanding in their expectations
- Charge-back will force the DI project team to be as efficient and cost sensitive as possible
- Charge-back will tend to speed-up the system development process
- Charge-back will introduce additional financial management complexities
- Charge-back will require a constant focus on linking costs to value delivered
- Charge-back can potentially shift decision-making to the lowest cost solution instead of the right solution
- Charge-back can cause conflict in situations where there wouldn't otherwise be any
- Charge-back can shift decision-making to the person paying the bill instead of the person that should be making the decision
Some of the generalities above may not apply to some organizations. It depends on the charge-back system itself. If charge-back occurs frequently during the year, there may be more pressure to demonstrate the value of the DI more frequently. On the other hand, if charge-back is done only annually, then it is imperative that the DI have demonstrated undisputed value to the customer before they get the bill, otherwise they will be asked to pay for something that has not yet delivered any value to them. That is a risky proposition. This fact is something that should be considered when planning the timing of a DI project in such an environment.
Charge-back gets even more complicated if the overall costs of developing the DI are allocated to multiple organizations using some sort of formula. Coming up with the allocation formula can be an exercise in futility, resulting in an allocation method that is guaranteed to leave one or more organizations angry with the outcome. Nevertheless, here are some approaches to allocating the costs of a DI system:
- By number of users with access to the DW
- By actual usage of the DI resources
- By amount of data stored for each organization
- By number of software licenses assigned to each organization
- By a percentage of the time needed to develop the system for each organization
- By an even distribution of costs to all organizations
- By the revenue generated by each organization
Each one of these approaches has pros and cons, and each one can be circumvented by any one of the organizations receiving the charge-back. Regardless of which one is selected, it is important to establish it at the outset of the project, and not at the end of the project, in order to avoid any unwelcome surprises to the customer organizations.
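Most of the approaches listed above reduce to the same arithmetic: split the total cost in proportion to some driver (users, usage, storage, licenses, development time, revenue), with an even split being the special case where every organization has the same driver value. The sketch below illustrates this under that assumption. The organization names and figures are hypothetical.

```python
def allocate(total_cost, driver_by_org):
    """Split total_cost across organizations in proportion to a driver value."""
    total_driver = sum(driver_by_org.values())
    if total_driver <= 0:
        raise ValueError("the allocation driver must sum to a positive value")
    return {org: total_cost * value / total_driver
            for org, value in driver_by_org.items()}


# Hypothetical drivers for three divisions.
users = {"Division A": 300, "Division B": 100, "Division C": 100}
storage_gb = {"Division A": 50, "Division B": 400, "Division C": 50}
even = {org: 1 for org in users}

total = 1_000_000
for label, driver in [("by users", users), ("by storage", storage_gb), ("even split", even)]:
    shares = allocate(total, driver)
    print(label, {org: round(amount) for org, amount in shares.items()})
```

Running the same total through different drivers shows how sharply each division's bill can change with the formula, which is one reason to agree on the formula at the outset.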
It is also important to avoid having a charge-back mechanism just for the DI. If the DI is the only system that is developed using a charge-back mechanism, then it will be inextricably linked to how people feel about the charge-back mechanism itself. If people are angry about the charge-back they are receiving, then they will be angry about the system causing it, resulting in a very difficult situation for the DI team to overcome. Obviously, charge-back has both good traits and bad. It is usually outside of the control of the actual DI team, but it must be reckoned with. Demonstrating value is the key.
Section 5: Data Quality
- Data quality will define the usefulness of the data warehouse
- Assess, report, follow-up frequently on data quality
- Communicate the limitations of the data
- Quantify the cost of poor data quality
Perceived data quality is THE determining factor in whether or not the completed data warehouse system will actually be used. Regardless of how technically advanced the DW system is, or how much functionality has been provided to the end-users, in the end it is data quality that defines the success or failure of the warehouse. Perception is reality, as the saying goes. Data quality includes many different perspectives:
- Timeliness of the data
- How frequently is the data warehouse updated?
- Reasonableness of the data
- Does the data make sense?
- Accuracy of the data
- Is the data an accurate representation of what was entered at the source?
- Completeness of the data
- Is the minimum data set populated consistently?
- Comprehensiveness of the data
- Is there enough data to meet the stated analytical needs?
- Granularity of the data
- Is the data sufficiently detailed?
It is critical to have the end users identify what they consider to be the minimally acceptable data quality standards. For example, if the user needs the data to be refreshed daily, but it is only available quarterly, then it is highly likely that the data warehouse will fail the user's "timeliness" perspective, and it will not be used. If the data is incomplete, unreasonable, insufficiently populated, or inaccurate, it will also cause the data warehouse to be perceived as a failure. In a perfect world, an agreement should be reached on the metrics defining data quality as part of the requirements for the project.
However, in the real world, the targeted end-users for the new data warehouse system often simply don't know about the quality of the data. Why should they? The users of the data warehouse probably aren't the same people that enter the data into the source systems feeding the data warehouse, and they probably have not had the opportunity to get at and analyze the data. That is why the data warehouse system is being built in the first place. So we have a dilemma: if the data warehouse is totally dependent on data quality, yet we don't know what the data quality is, how do we proceed?
Each of these perspectives on data quality also has to be managed as a production process. It is relatively easy to identify and talk about data quality as an issue, but it takes well-defined processes and more than a little tenacity to actually address data quality problems. Management of data quality must include the following activities in order to be effective:
- Initial assessment of each data source
- Documented user-defined edit checks
- Documented IT-defined edit checks
- Production of data quality reports every time new data is added
- Creation of a data quality improvement team if needed
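Documented edit checks and a per-load data quality report can be as simple as a named list of rules that is run against every new batch of data. The following is a rough sketch of that idea. The field names and the two rules are hypothetical examples of user-defined and IT-defined checks, not rules taken from any real system.

```python
def run_edit_checks(records, checks):
    """Return {check description: number of failing records} for one data load."""
    return {
        description: sum(1 for record in records if not rule(record))
        for description, rule in checks.items()
    }


# Hypothetical documented edit checks, keyed by their plain-language description.
CHECKS = {
    "age is populated and between 0 and 120":
        lambda r: r.get("age") is not None and 0 <= r["age"] <= 120,
    "discharge day is not before admit day":
        lambda r: r["admit_day"] <= r["discharge_day"],
}

# Hypothetical batch of newly loaded records.
new_load = [
    {"age": 70, "admit_day": 1, "discharge_day": 3},
    {"age": 140, "admit_day": 5, "discharge_day": 2},
    {"age": None, "admit_day": 2, "discharge_day": 2},
]

report = run_edit_checks(new_load, CHECKS)
for description, failures in report.items():
    print(f"{failures} of {len(new_load)} records failed: {description}")
```

Keeping the description next to the rule means the published report reads in the end-users' own terms rather than in field codes.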
The initial assessment of each data source is usually the "eye-opener" for end-users. A good assessment will identify the following information about the data source:
- Frequency that each data field is populated
- Ranges of values that are present in each numeric or coded data field
- Percentage that each value is populated in a given field
- Relational checks for data fields or records that are linked together
- Duplicate checking
- Derived checks
The list of possible things to check for in a data quality assessment depends on what the end-users want to do with the data. A basic understanding of what type of analysis the data is to be used for is what drives the process. Both internal and external data sources should be assessed. In the examples above, every one of them can be a "show stopper" depending on what the end-user analyst wants to do. There are vendor products that can help assess data quality, or programs can be written to do it.
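Where a program is written rather than a vendor product purchased, a first-pass assessment does not need to be elaborate. The sketch below covers three of the items listed above (how often each field is populated, the range of numeric values, and the percentage of each value) plus a duplicate key check. The function names, field names, and sample records are assumptions for illustration only.

```python
from collections import Counter


def profile(records, fields):
    """Summarize population frequency, numeric range and value mix per field."""
    total = len(records)
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        populated = [v for v in values if v not in (None, "")]
        numeric = [v for v in populated if isinstance(v, (int, float))]
        counts = Counter(populated)
        report[field] = {
            "populated_pct": 100.0 * len(populated) / total if total else 0.0,
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
            "value_pct": {v: 100.0 * c / len(populated) for v, c in counts.items()},
        }
    return report


def duplicate_keys(records, key_fields):
    """Return key values that appear on more than one record."""
    counts = Counter(tuple(record.get(k) for k in key_fields) for record in records)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical source records.
source = [
    {"id": 1, "age": 70, "sex": "F"},
    {"id": 2, "age": None, "sex": "M"},
    {"id": 1, "age": 30, "sex": "F"},
    {"id": 3, "age": 80, "sex": ""},
]

for field, stats in profile(source, ["age", "sex"]).items():
    print(field, stats)
print("duplicate ids:", duplicate_keys(source, ["id"]))
```

Relational and derived checks depend on the specific data model, so they are left out of this sketch.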
Sometimes, just reporting on the data quality issues is not enough. Decision-makers within the organization may not be able to see the implications of a specific data quality issue just by looking at a frequency report. In situations like this, specific examples that illustrate what types of analysis can't be accomplished because of data quality issues can be a very important component of the assessment. For example, it may be far more effective to say that "the study of pediatric ear infections cannot be supported by this data because 80% of the patients in the source data are over 65 years old", as opposed to saying "80% of the patients are over 65."
The same products and programs that are used to assess data can also be used to "cleanse" or even correct the data. The data warehouse project team must have clear directives from the end-user analysts on exactly how the data is to be manipulated during the cleansing process.
It should also be noted that data cleansing is not exclusively an IT function. Sure, it requires some programming or tool administration that IT will do, but what it really requires is business ownership of the process and hands-on business analyst involvement to define the cleansing rules and interpret the quality assessment reports. Using a previous example, the fact that 80% of the patients in the source data are over age 65 may not be noticed by IT staff; however an end-user analyst is very likely to notice it.
Another important aspect of data quality is determining whether or not there are sufficient unique keys to integrate multiple data sources in the way that the end-users need them integrated. What is the integration quality of the data? Is the integration well defined and auditable, or is it going to have to be done using "fuzzy logic"? An area where fuzzy logic is common is when trying to integrate multiple customer identifiers. Some common challenges in this area include: no SSN available; SSN is for the head of household instead of the actual customer; multiple spellings of name; different birth dates; etc. Most data integration problems can be solved with vendor tools or programming, but the end-user must understand the likely accuracy of that process early in the project. For example, a 90% accuracy rate on matching customer IDs may be acceptable to one analyst, but totally unacceptable to another analyst. The perception of data quality is defined by the hands-on end-users.
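To make the difference between an auditable match and a "fuzzy" match concrete, here is a small sketch that tries an exact SSN match first and falls back to name similarity plus birth date. It uses Python's standard `difflib.SequenceMatcher` for string similarity. The record layout, the 0.85 threshold, and the matching rules themselves are assumptions for illustration, and a real project would tune them with the end-user analysts.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_customer(rec_a, rec_b, name_threshold=0.85):
    """Classify a pair of customer records as 'exact', 'fuzzy' or 'no match'."""
    # Auditable rule: identical, non-empty SSN. Note the caveat from the text:
    # this is wrong if the SSN belongs to the head of household.
    if rec_a.get("ssn") and rec_a.get("ssn") == rec_b.get("ssn"):
        return "exact"
    # Fuzzy rule: same birth date and sufficiently similar name spelling.
    same_birth_date = (rec_a.get("birth_date") is not None
                       and rec_a.get("birth_date") == rec_b.get("birth_date"))
    if same_birth_date and name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold:
        return "fuzzy"
    return "no match"


# Hypothetical records from two source systems.
billing = {"name": "Jon Smith", "ssn": "", "birth_date": "1950-02-01"}
clinic = {"name": "John Smith", "ssn": "", "birth_date": "1950-02-01"}
other = {"name": "Mary Jones", "ssn": "", "birth_date": "1950-02-01"}

print(match_customer(billing, clinic))
print(match_customer(billing, other))
```

Reporting how many records fall into each category gives end-users an early, concrete view of the likely accuracy of the integration.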
Improving data quality is a process. It is an iterative process that builds the knowledge base of the data within the organization with each iteration. As with any process, there must be a business owner of the process. Sometimes called the Data Steward, Data Owner, or some other label, it is the role of this person, or group of people, to take responsibility for ensuring that data quality problems are addressed. One of the most effective data quality improvement efforts I have ever seen connected to a data warehouse project went through the following steps:
- Facilitated work session to identify issues: 40 key managers and staff, from business and IT areas, were asked to identify all known data quality issues, and their impact on productivity, customer satisfaction, and revenue.
- Creation of data quality issues report: over 100 issues were identified, and presented in a report in order of highest to lowest impact.
- Presentation to senior management: the 100+ issues were presented to senior management, who then identified the highest priority items to fix.
- Prioritization of fixes: separate project teams were formed to modify the source systems as needed
- Feedback on progress: the completion of fixes, and formation of new project teams, was an iterative process that occurred over several months, with oversight by the senior management steering committee.
The entire effort described above occurred over a few months' time, and it was accomplished as a project parallel to but outside of the actual data warehouse project. While the data warehouse project raised the issue, the "spark" that started it all was not that the data warehouse itself needed clean data. Instead, it was the fact that a specific well-documented amount of revenue was being lost each month in data sales because of poor data quality. Once again, follow the money.
For the DI project manager, it is also important to note that the amount of data cleansing that needs to be done will have a direct impact on how quickly the DI can be populated, both initially, and on an on-going basis. This needs to be determined as early as possible in the project, and built into the project plan. In extreme cases, if the data quality is so poor that it cannot meet any of the needs of the end-users, then the data warehouse project should be halted and replaced with a data quality improvement project. Some individuals may believe that it is still better to build the data warehouse and use the data warehouse itself to illustrate data quality problems, theoretically resulting in later data quality improvement projects. I believe that this "build it and they will come" approach is usually a mistake for the following reason: If you build it, and they come, and they determine that the data in it is useless, chances are they will never come back. Even worse, they may develop their own work-around solutions and use the poor data quality in the data warehouse system as their justification for doing it.
In some industries, there are some other unique challenges in the area of data quality. This manifests itself in many ways, such as:
- Each business/division is highly independent
- Sources of data vary widely
- There is little standardization in data collection
- There is little automation & integration of information
- Conflicting incentives sometimes prevent data collection
Regardless of the industry, from a data warehouse perspective, this once again points out the need to understand exactly what data is available first, and then to map that to the stated business requirements for the warehouse. If it is determined that the source data is not of sufficient "quality" to meet the stated objectives, then data quality or process (Section 6) initiatives must be started to ensure the data warehouse project's success.
Section 6: Process Design
- Know what incentives are driving current processes
- It takes an executive steering committee to drive process change
- Helping end-users develop new processes is part of DI implementation
Understanding what the incentives are for performing a certain process is usually the key to understanding why the process is being done the way it is. This affects data warehouse projects by defining how the source data is created and how people will integrate the data warehouse into their jobs.
Each data source that will feed into the data warehouse has an underlying process that defines how the data was created. While the data quality assessment techniques identified in the previous section will help identify what needs to be changed, it is process design, or redesign, that will actually bring about the needed changes. In fact, most data quality issues can usually be traced back to a process issue.
Unfortunately for many DI projects, the needed process changes sometimes contradict the current incentives that the group creating the data may have. For example, a customer support department may have incentives to enter information as quickly as possible into the system with a minimum set of data being keyed in. Meanwhile, in a downstream process, an analyst may have incentives to perform detailed analysis that requires a more complete set of data. In this example, either the process of entering data needs to be revised, or the needs of the analyst need to be deleted from the data warehouse requirements. This requires Steering Committee support and involvement.
Once the data warehouse is implemented, end-users will need to be trained on how to integrate its new capabilities into their day-to-day jobs. Most standard DI training concentrates on educating users on how to operate the DI system, and usually overlooks the critical aspect of training the end-users on how to actually apply it effectively in their jobs.
Process design/redesign should not be an IT responsibility exclusively. It should be driven by the affected business areas with IT involvement.
Section 7: Architecture
- DI projects are not just about technology
- Data Warehouses are not transaction systems
- Rigorous numerically scored evaluations can help with the tech selection process
- Understand what the users want, and why they want it
- Sometimes it is better to simply use what you have and know
The architecture developed and tools selected for a DI system are important, but the process by which they are selected is even more important. More than one DI team has made the classic mistake of evaluating and buying technology before they even understood the business requirements. DI is not about which OLAP tool is slicker than the next, or which hardware platform has the best performance. Sure, making smart choices in those areas is a contributor to the success of the project; but there are plenty of examples of failed DI projects that picked the “best” technology before understanding what the business really needed. However, I have never seen a DI project that picked reasonable technology and delivered true business value to the organization fail. Never lose sight of the fact that DI projects are about delivering business value to the organization. Period.
Once business requirements are clearly understood, the process of technology evaluation and architecture design can occur. Not all technology is created equal, although at the current stage of the DI industry, most of the technology available today has been proven to work. Experience has shown that there are usually several technologies that can meet a specific requirement, and the differences between them are often not that important. Experience has also shown that the best technology is not always the right technology to select, nor do you always need to go out and buy technology right away. For example, if this is the first data warehouse project for the organization, and you are only dealing with one or two source files and relatively simple data, then evaluating and buying a $200,000 extract/transform/load (ETL) tool is simply not needed. In this example, it may be far smarter to use SQL and write your own ETL programs.
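As a rough illustration of how small a hand-written, SQL-based ETL program can be for one simple source file, consider the sketch below. It is only a sketch under stated assumptions: the source layout, the table names (stg_orders, fact_orders) and the cleansing rules are all hypothetical, and SQLite stands in for whatever database the organization already owns.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file from a source system.
SOURCE_CSV = """order_id,customer,amount,order_date
1001, acme corp ,250.00,2001-03-14
1002,Globex,,2001-03-15
1003,ACME CORP,99.50,2001-03-15
"""


def run_etl(conn, source_text):
    """Extract CSV rows into a staging table, then transform and load them with SQL.

    Returns the number of rows in the target table after the load.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.execute("DELETE FROM stg_orders")  # staging is rebuilt on every run

    # Extract: copy the raw source rows into staging without changing them.
    rows = list(csv.DictReader(io.StringIO(source_text)))
    conn.executemany(
        "INSERT INTO stg_orders VALUES (:order_id, :customer, :amount, :order_date)",
        rows,
    )

    # Transform and load: standardize customer names and skip rows with no amount.
    conn.execute(
        """
        INSERT OR REPLACE INTO fact_orders (order_id, customer, amount, order_date)
        SELECT CAST(order_id AS INTEGER),
               UPPER(TRIM(customer)),
               CAST(amount AS REAL),
               order_date
        FROM stg_orders
        WHERE TRIM(amount) <> ''
        """
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(connection, SOURCE_CSV), "rows in fact_orders")
```

Keeping an untouched staging copy of the source data makes it easier to show end-users exactly which transformations were applied, which matters later when they are trained on the data.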
Defining what the “best” technology is goes beyond just technical excellence or performance. The questions that should be asked when selecting data warehouse technology are as follows:
- Does it have the required functionality?
- Is it scalable enough for the intended mission?
- Is the vendor financially stable and reasonable to work with?
- Does the vendor support what they sell?
- Do the end-users have a preference? If so, why?
- What are the purchase and on-going costs of the technology?
- How does the technology fit into the overall technical strategy of the organization?
- Is one of the technologies easier to learn/support?
- What is the vendor’s stake in your success?
- What is the vendor’s technical direction and vision?
The right technology decision is the one that answers all of the above questions best. It sounds simple, but it rarely is, because each person in the organization has their own biases and decision-making criteria. Clearly identifying the business and technical requirements, and assigning a weight to each requirement, will make scoring and selecting a technology an easier process.
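The weighted scoring described above can be sketched in a few lines. The criteria, weights (importance on a 1 to 5 scale), scores (1 to 10 scale) and tool names below are invented purely for illustration; a real evaluation would take them from the documented business and technical requirements.

```python
def weighted_score(weights, scores):
    """Return the weighted average of one candidate's scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank_candidates(weights, candidates):
    """Return (name, score) pairs sorted from best to worst weighted score."""
    scored = {name: weighted_score(weights, s) for name, s in candidates.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical requirement weights and evaluation scores.
WEIGHTS = {
    "functionality": 5,
    "scalability": 4,
    "vendor_stability": 3,
    "cost": 3,
    "ease_of_support": 2,
}
CANDIDATES = {
    "Tool A": {"functionality": 9, "scalability": 8, "vendor_stability": 5,
               "cost": 4, "ease_of_support": 6},
    "Tool B": {"functionality": 7, "scalability": 7, "vendor_stability": 8,
               "cost": 8, "ease_of_support": 8},
}

if __name__ == "__main__":
    for name, score in rank_candidates(WEIGHTS, CANDIDATES):
        print(f"{name}: {score:.2f}")
```

In this made-up example the technically stronger Tool A loses to Tool B once vendor stability, cost and supportability are weighted in, which is exactly the kind of result the weights are meant to expose for discussion.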
Even after a rigorous requirements-driven evaluation and numerical scoring of the results, the decision may still need to be subjective based on what end-users really want. A classic example of this is the Beta vs. VHS VCR standards. Beta was superior in every technical way, but the vast majority of customers purchased VHS. In the end, the world survived and we are all using VCRs. With data warehouse projects, you should care about technology, but not that much, because in the end, it is the customer that will have to use it to be successful. As long as the technology will do the job, then it is the “right” technology. In the data warehousing space, most technologies have matured to the point where they all work, albeit with differing strengths and weaknesses.
There are two other points worth mentioning that are specific to vendors who are selling customized data marts or technology. In some cases, a software vendor will approach the end-user directly, and sell them on a data mart that is perfectly tailored to their needs. At this point, the data warehouse team will be on the defensive, having to defend their basic approach and technology choices. A thorough understanding of the business requirements and the technology requirements will be necessary in order to determine whether the vendor’s offering should be incorporated into the solution. The other issue with vendor-supplied functional data marts is that the vendor usually omits all the ugly details about how the data gets into the slick data mart they are selling. As a result, the end-user wants to go out and buy it instead of working with you to build a data warehouse. The more the end-users know about the process of building a data warehouse, which as you know is mostly about data and processes, the easier it will be to make the right technology decisions.
The introduction of Web and wireless technologies is also having an impact on data warehousing architectures. The days when an end-user had to sit at an in-office workstation to access a data warehouse are quickly fading. Push technology, whereby business intelligence information is automatically “pushed” to end-users when something of importance has changed in the data warehouse, is becoming more commonplace. Likewise, so is using wireless devices such as PDAs or pagers to automatically receive information from a warehouse.
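The core of a push mechanism is a comparison between the previous and current state of the warehouse, filtered by what each end-user has asked to hear about. The sketch below assumes a very simple model that is not taken from any particular product: metrics are held as name/value pairs, and each hypothetical Subscription names a user, a metric and a percentage threshold. Delivery to a pager, PDA or e-mail address is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    user: str
    metric: str
    threshold_pct: float  # notify when the metric moves by at least this percentage


def build_alerts(previous, current, subscriptions):
    """Return (user, message) pairs for every subscribed metric that moved enough."""
    alerts = []
    for sub in subscriptions:
        old = previous.get(sub.metric)
        new = current.get(sub.metric)
        if old is None or new is None or old == 0:
            continue  # no baseline to compare against
        change_pct = (new - old) * 100 / abs(old)
        if abs(change_pct) >= sub.threshold_pct:
            message = f"{sub.metric} changed {change_pct:+.1f}% ({old} -> {new})"
            alerts.append((sub.user, message))
    return alerts


if __name__ == "__main__":
    # Hypothetical metric snapshots taken before and after a warehouse load.
    before = {"weekly_sales": 1000, "returns": 50}
    after = {"weekly_sales": 1200, "returns": 51}
    subs = [Subscription("pat", "weekly_sales", 10.0),
            Subscription("lee", "returns", 10.0)]
    for user, message in build_alerts(before, after, subs):
        print(f"push to {user}: {message}")
```

Running a check like this after each load, rather than continuously, keeps the push logic outside the query tool and tied to the moments when the data actually changes.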
The basic architectural approach used for the DI environment is another area of importance, and like technology, what you actually need may depend on the level of maturity and complexity of the organization. An example of this is illustrated below.
The illustration below is representative of a mature business intelligence environment, where multiple data sources feed a central data repository, which in turn feeds multiple data marts. This example is viewed by many as the architecturally correct way to do data warehousing. However, on a first-time data warehousing effort, it may be neither necessary nor practical to expend the time and effort needed to build the central data repository. In order to get the new business intelligence system up and running and in user hands as quickly as possible, it may make more sense to simply extract data from only one or two data sources and load a single data mart initially. As the success of the first data mart is realized, the data warehouse team may want to consider extracting data from additional data sources and building a central data repository before a second data mart is requested. This approach would obviously result in some rework, but the benefits of getting an initial successful system up and running sooner may make it a very smart trade-off.

Section 8: Implementation
- Training enables end-users
- Training in the data is as important as, or more important than, training in the technology
- End-user training is most effective when delivered by end-users
- Different levels of training programs will be needed, even for non-users
- Follow-up is critical to assess the effectiveness of training programs
Training is the means by which we enable end-users to apply the systems we develop. There is one important difference in training end-users on data warehouse systems versus operational transaction systems. With transaction systems, a finite number of processes have been pre-designed into the system with a set processing path. As long as the user enters such-and-such data into such-and-such fields, the transaction will happen the same way and deliver the same results every time. Training in a transaction environment concentrates on how to use a predefined set of on-line screens. Data warehouse systems are different in that there is a nearly unlimited number of processing paths that the system can take. It all depends on what the user is asking for in their query. In order to teach how to use such a system, training must focus on both the data and the query tool used to extract and analyze it.
Given the highly flexible nature of data warehouse systems, training will not be effective if it is delivered only at the conclusion of the project. In order to effectively understand the capabilities and limitations of data warehouse systems, training of key end-users actually needs to occur during the requirements definition and design stages of the project. In these stages, training should concentrate on the data elements that will be loaded into the data warehouse. Without a thorough knowledge of the data itself, end-users will apply the data warehouse system incorrectly, resulting in inaccurate analysis and/or decreased expectations. Key aspects of data training should include:
- Definition of each data element
- Source of each data element
- Transformations that are applied to each data element
- Known data quality limitations for each data source and data element
- Physical model describing the database tables, and their inter-relationships
As this design knowledge is acquired, it must be documented and included as part of the training program for new users.
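One lightweight way to capture this knowledge so that it can be reused in training material is a simple data dictionary. The sketch below is only one possible shape for it: the DataElement fields mirror the first four items in the list above (the physical model is not covered), and the example element and its contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    name: str
    definition: str
    source: str
    transformations: list = field(default_factory=list)
    quality_notes: list = field(default_factory=list)


def undocumented(elements):
    """Return the names of elements that still lack a definition or a source."""
    return [e.name for e in elements
            if not e.definition.strip() or not e.source.strip()]


def training_sheet(element):
    """Render one element as plain text for inclusion in a training handout."""
    lines = [
        element.name,
        f"  Definition: {element.definition}",
        f"  Source: {element.source}",
    ]
    lines += [f"  Transformation: {t}" for t in element.transformations]
    lines += [f"  Known limitation: {q}" for q in element.quality_notes]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical element, for illustration only.
    net_sales = DataElement(
        name="net_sales_amt",
        definition="Invoiced amount less returns and discounts",
        source="Order entry system, invoice line file",
        transformations=["Converted to US dollars at month-end rate"],
        quality_notes=["Discounts missing for orders keyed before go-live"],
    )
    print(training_sheet(net_sales))
```

A check such as undocumented() can be run before each training cycle to confirm that no element reaches end-users without a definition and a stated source.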
In addition, abbreviated and specialized training may need to include people who are not actually users of the data warehouse system. The people who create or enter the data that feeds the data warehouse could benefit from a brief presentation on how the data they create is used later on by other people in the organization.
Once training in the data contents of the data warehouse system has occurred, then training on how to use that data can begin. Training on how to use the data has two important components: mechanical training on how to use the query tool and business training on how to apply the new capabilities.
The relatively mechanical training on how to use the query tool is fairly easy to provide. Initially, one of the more effective approaches to this type of training is to use the vendor that is supplying the query product (MicroStrategy, Information Advantage, SAS, Brio, etc.). Another alternative is to use highly experienced internal staff, either I.T. or end-users, to develop a training program internally.
Once the mechanics of how to operate the query tool are understood, then training in how to apply the tool can begin. This is the single most important component of the training program. Simply having end-users read the manual or attend a vendor-supplied off-the-shelf training program is not effective. Training must be specifically tailored to the business unit in which it is to be applied if it is to be effective. Tailoring is best accomplished by identifying specific analytical questions that apply to that business unit and incorporating them into the training program as case studies. Likewise, training must be delivered by an appropriate end-user in order to maximize its effectiveness. Training in how to apply a data warehouse is not an I.T. function.
Training is also an iterative process. Training initial first-time users on the basics of the system should be followed by additional 1-on-1 support meetings, as well as an advanced “tips and techniques” training class later on.
In addition to having beginner and advanced training programs, the creation of a user group can also increase the effectiveness of how the data warehouse is applied by giving end-users an interactive forum to exchange ideas.
Training evaluations should also be used, and at two different times. The first assessment should be completed at the end of the training class to judge the quality of the training curriculum. One or two months later, a second survey should be sent out to measure the effectiveness of the training in helping end-users apply the data warehouse in their day-to-day responsibilities. If this second survey indicates that most people are not using the data warehouse system, more work needs to be done to determine why so that appropriate changes can be made to the training program, or the system design itself if necessary.
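The follow-up survey only helps if its results are summarized in a way that triggers action. The sketch below assumes a hypothetical response format (a flag for whether the person uses the warehouse, plus an optional reason when they do not) and treats a usage rate below one half as "most people are not using the system"; both the format and the cutoff are assumptions to be adjusted.

```python
def summarize_follow_up(responses, minimum_usage_rate=0.5):
    """Summarize follow-up survey responses and flag low adoption.

    Each response is a dict with a boolean "uses_warehouse" key and,
    for non-users, an optional "reason" key.
    """
    if not responses:
        raise ValueError("no survey responses to summarize")
    users = sum(1 for r in responses if r["uses_warehouse"])
    rate = users / len(responses)

    # Tally the reasons given by people who are not using the system.
    reasons = {}
    for r in responses:
        if not r["uses_warehouse"]:
            reason = r.get("reason", "not given")
            reasons[reason] = reasons.get(reason, 0) + 1

    return {
        "usage_rate": rate,
        "needs_investigation": rate < minimum_usage_rate,
        "reasons": reasons,
    }


if __name__ == "__main__":
    # Hypothetical survey results, for illustration only.
    survey = [
        {"uses_warehouse": True},
        {"uses_warehouse": False, "reason": "do not understand the data"},
        {"uses_warehouse": False, "reason": "do not understand the data"},
        {"uses_warehouse": False},
    ]
    print(summarize_follow_up(survey))
```

Grouping the reasons matters as much as the rate itself, because it indicates whether the remedy lies in the training program or in the system design.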
Conclusion
Every project has a beginning and an end, but data warehouse projects in particular tend to be very iterative in nature. The more you put into a data warehouse system, the more the organization learns, leading to more enhancements. In addition, the business landscape is continually evolving, and so is the need for business intelligence.
By managing the eight critical perspectives described in this document, the data warehouse project manager can maximize the effectiveness of each data warehouse project, and survive to manage the next project, and so on, and so on, and so on…








