Introduction
Microsoft IT (MSIT) wanted to optimize operations,
simplify planning, and improve performance and reliability by reducing the
number of versions of retail software installed and managed in the production
environment. MSIT developed a “Get Current” strategy to define hardware,
operating systems, and software standards and to measure and publish rates of
compliance.
To support the “Get Current” strategy, MSIT developed a technical
solution that aggregated server information from diverse sources and published the
information in easily consumable formats to help IT leadership, business
owners, and individual teams objectively measure success.
Why “Get Current”?
MSIT found that they were spending a great deal of time addressing outages in the environment that had known but unapplied fixes. They also found it increasingly difficult to maintain multiple sets of support groups for all of the versions of software still running in the environment. Support issues related to non-compliance often take twice as long to resolve, so the rising volume of support incidents was becoming unmanageable.
The “Get Current” program removes an entire segment of errors: those due to non-compliant hardware and software. This allows more outages to be clearly identified as known error types on current platforms. The goal was to reduce variability in the environment by ensuring that servers were running the latest versions of operating systems, server software, drivers, and security patches on compliant hardware. This would allow MSIT to achieve higher reliability, decrease downtime, and reduce support tickets and costs.
MSIT also wanted to improve the service they provide to product groups through First and Best initiatives. MSIT has a role in helping product groups release quality enterprise software and solutions by deploying and running beta versions in the Microsoft environment. By providing feedback about their experience to product groups during First and Best initiatives, MSIT often drives improvements or helps to identify and resolve issues that enterprise customers may face. MSIT cannot be successful in deploying beta software if the environment is not running the latest released versions of product offerings.
Solution
In designing and
implementing a “Get Current” strategy, MSIT first had to define their
compliance standards. MSIT adopted the same standards, or product
support lifecycle, as external customers who receive product support from Microsoft.
The MSIT compliance standard covers several areas, or layers, of compliance:
1. SQL Server® version
2. Operating system version
3. Operating system security updates
4. Network and storage drivers
5. Hardware models
States of compliance were then defined and categorized as follows:
· Current. The most recent versions of the operating system and SQL Server, and the newest hardware models available.
· Supported. Operating system and SQL Server versions still supported by MSIT, and hardware models that are not yet at end of life. These are considered compliant.
· < 18 Mo. Operating system and SQL Server versions that are currently considered compliant but will no longer be compliant within 18 months.
· EOW (End of Warranty). Hardware that is still compliant but no longer under warranty by the manufacturer.
· Extended. Operating system and SQL Server versions that are no longer compliant by MSIT standards but still receive security updates.
· EOL (End of Life). Operating system, SQL Server, and hardware versions that are no longer compliant by MSIT standards and do not receive security updates.
· Non-STD (Non-Standard). Hardware that is non-compliant and was never a standard purchase option.
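The software states above work like a small decision procedure over lifecycle dates. The Python sketch below shows one hypothetical way to encode it. Several parts are assumptions: the `ProductVersion` fields, the `classify` function, reading “Current” as “newest released version,” and approximating 18 months as calendar months. The hardware states (EOW, Non-STD) are left out because they depend on warranty and purchase-catalog data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Software states that count as compliant under the definitions above.
COMPLIANT_STATES = {"Current", "Supported", "< 18 Mo."}


@dataclass
class ProductVersion:
    name: str
    is_latest: bool                       # hypothetical: newest released version?
    msit_support_end: date                # hypothetical: last day MSIT treats it as compliant
    security_updates_end: Optional[date]  # hypothetical: last day security updates ship


def months_until(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def classify(version: ProductVersion, as_of: date) -> str:
    """Assign an OS or SQL Server version to a compliance state on a given date."""
    if version.is_latest:
        return "Current"
    if as_of <= version.msit_support_end:
        # Still compliant; flag versions whose compliance lapses within 18 months.
        if months_until(as_of, version.msit_support_end) < 18:
            return "< 18 Mo."
        return "Supported"
    if version.security_updates_end is not None and as_of <= version.security_updates_end:
        return "Extended"  # non-compliant, but still receiving security updates
    return "EOL"           # non-compliant and no longer receiving security updates
```

For example, on 1 January 2011 a hypothetical version whose MSIT support ends in March 2012 is 14 months from the cutoff, so it is classified as "< 18 Mo." and still counts as compliant.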
Obtain Approval
The next phase involved one of the most important aspects of the success of the “Get Current” strategy: securing approval from the highest levels of IT leadership and governance.
MSIT presented their strategy for building business intelligence and predictability into an ongoing program that measures the environment against defined current standards. They also obtained sign-off from IT executives for their proposed policy changes. Once this iterative process was complete and all of the policy changes were ratified, MSIT was able to move forward and begin implementation.
Build a BI Engine
Once the standards were defined and the necessary approvals were in place, MSIT brought in a small team of business intelligence (BI) specialists to build a scanning engine that automated information collection and reporting. All server owners could have taken MSIT’s standards, measured their own compliance manually, and reported it to IT leadership, but that would have required more management and resources. There would also have been the challenge of integrating and aggregating all of that information so it could be reported to IT executives in a meaningful format that would facilitate planning, measure progress, or drive action.
Several technologies and resources were available to the BI team for gathering source data:
· IT Service Management Tool. Used to inventory servers to develop the “Get Current” master list of servers. The IT service management tool also provided the configuration item owners, which in this case identified the owner of each server. That information was used to align servers with the organizational taxonomy.
· System Center Configuration Manager (SCCM). Used to deploy an agent on each server that collected detailed data, including information about the hardware and installed operating system, SQL Server version, drivers, service packs, patches, and security. The agent reported the collected data to a central SCCM server. One problem was that the tool could only be deployed if the servers were organized in the same group in Active Directory®. In many cases at Microsoft, a team’s servers span several organizational groups. Some are in the production environment, some are in research or test labs, and some are assigned to a secure environment because of information security requirements. Because the SCCM agent could not be used for all of the groups, the BI team needed to work with other teams and use other tools to cover the gap.
· Service Health Checker. A central team that had access to most of the servers ran a script that collected much of the same information as the SCCM agent.
· Information Security. Some servers could not be accessed by either the SCCM agent or the Service Health Checker because of security constraints, so the detailed data could not be extracted. For these servers, the Information Security team provided generic server information.
· SQL Server Operations. Like the Service Health Checker, the SQL Server operations team ran a script that provided detailed server information, in this case for servers running SQL Server.
· Stay Current. A database created to map server ownership to the standard organizational structure. All of the information was aggregated in this central repository, and data logic was used to map the information from the different sources. Business owners' plans to maintain or achieve compliance were also stored in this repository so that progress could be measured against planned and achieved server upgrades.
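Combining these sources means deciding, for each server on the master list, which source to trust for each field. Below is a minimal sketch. It assumes a fixed precedence order, with SCCM first and the generic Information Security data last; the description above does not spell this order out. The source names, the record shape, and the `consolidate` function are hypothetical. Servers that no source reports on are returned separately, which is the kind of list a data-gaps report could draw on.

```python
from typing import Dict, List, Tuple

# Hypothetical source names, in an assumed order of precedence: detailed agent
# data first, generic Information Security data last.
SOURCE_PRECEDENCE = ["SCCM", "ServiceHealthChecker", "SQLServerOps", "InformationSecurity"]

Record = Dict[str, str]  # field name -> value, e.g. {"os": "...", "sql": "..."}


def consolidate(
    master_list: List[str],
    sources: Dict[str, Dict[str, Record]],
) -> Tuple[Dict[str, Record], List[str]]:
    """Merge per-source records for every server on the master list.

    For each field, the value from the highest-precedence source wins.
    Servers that no source reports on are returned separately as data gaps.
    """
    merged: Dict[str, Record] = {}
    gaps: List[str] = []
    for server in master_list:
        record: Record = {}
        for source in SOURCE_PRECEDENCE:
            for field, value in sources.get(source, {}).get(server, {}).items():
                record.setdefault(field, value)  # keep the earlier (higher-precedence) value
        if record:
            merged[server] = record
        else:
            gaps.append(server)
    return merged, gaps
```

Merging field by field, instead of picking one whole record per server, lets a SQL Server Operations script fill in SQL Server details that SCCM did not report, without overriding SCCM's operating system data.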
One challenge the BI team needed to address was the creation of an organizational taxonomy. “Get Current” measures and manages different layers of standards compliance across different teams, and the sources of data differ depending on the standard. The BI team therefore combined the output of these technologies, grouped it, and aggregated it into a consistent organizational taxonomy. Creating this taxonomy standardized the components for reporting to teams, organizations, and IT leadership.
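Rolling compliance up through a taxonomy can be sketched as crediting each server to every ancestor node of its owner's organizational path. In this hypothetical Python version, owners stand in for the configuration item owners from the IT service management tool. The path shape, the "Unmapped" bucket, and the `roll_up` function are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OrgPath = Tuple[str, ...]  # e.g. ("Division", "Organization", "Team")


def roll_up(
    servers: Iterable[Tuple[str, bool]],
    owner_to_path: Dict[str, OrgPath],
) -> Dict[OrgPath, Tuple[int, int]]:
    """Count (compliant, total) servers at every level of the taxonomy.

    `servers` holds (owner, is_compliant) pairs; owners missing from the
    mapping are reported under a single ("Unmapped",) node.
    """
    counts: Dict[OrgPath, List[int]] = defaultdict(lambda: [0, 0])
    for owner, compliant in servers:
        path = owner_to_path.get(owner, ("Unmapped",))
        # Credit the server to each ancestor so every level can be reported.
        for depth in range(1, len(path) + 1):
            node = counts[path[:depth]]
            node[0] += int(compliant)
            node[1] += 1
    return {path: (c, t) for path, (c, t) in counts.items()}
```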
Measure, Report, and Manage Gaps to Compliance
Once MSIT began measuring what they had running in the
production environment, they compared that to the standards. That was the key
measurement for this program, providing a comparison of the “as is” and the
“desired” states of compliance. That measurement exposed the gaps.
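The comparison of the “as is” and “desired” states comes down to checking each reported layer of each server against the set of values the standard allows. In this sketch, the standards table, the layer names, and the version labels are placeholders, not MSIT's actual standards.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical standards: for each layer, the values that count as compliant.
STANDARDS: Dict[str, Set[str]] = {
    "sql_version": {"SQL-v3", "SQL-v2"},
    "os_version": {"OS-v3", "OS-v2"},
    "hardware_model": {"Model-X", "Model-Y"},
}


def find_gaps(inventory: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Compare the "as is" inventory to the "desired" standards.

    Returns (server, layer, installed value) for every out-of-standard layer.
    Layers a server does not report (for example, SQL Server on a server that
    does not run it) are skipped; missing data is a data-gap question instead.
    """
    gaps = []
    for server, layers in sorted(inventory.items()):
        for layer, allowed in STANDARDS.items():
            installed = layers.get(layer)
            if installed is not None and installed not in allowed:
                gaps.append((server, layer, installed))
    return gaps
```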
Each team that owned servers in the production
environment had to visit the portal, identify servers that were not compliant,
and then come up with a plan to upgrade or request exemptions for servers that
were not to be upgraded. For example, MSIT would not necessarily upgrade a server
that was due to be retired in the near future. IT leadership wanted to have
those plans known so they could identify actual targets for compliance in
fiscal planning. Because current standards evolve, there is no expectation that
the environment will ever be 100 percent compliant with the standards, but IT
leadership wanted to make sure the reasons for any non-compliance were known
and available.
Reporting
Stakeholders required a consolidated reporting view of the current states of compliance against the standards, as well as visibility into teams’ plans to bring out-of-compliance servers up to the standard. MSIT built a SharePoint® portal containing tracking details in various reports that a large and diverse audience can consume. IT leadership, business owners, service owners, and system administrators all rely on the reported information to start planning and to measure their progress over time.
Microsoft technologies provided flexibility in presenting the information in ways that meet the varying needs of the different roles that make up the portal audience. Some 80 percent of the information required for scorecards, business reviews, and monthly presentations to IT leadership is needed in a standard format. Those standard formats were built with SQL Server Reporting Services or Excel and made available through the SharePoint portal. The data is refreshed every weekend, and current reports are available by 12:00 AM every Monday.
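Because reports are published on a fixed weekly schedule, a consumer can check whether the data it sees is stale. This small sketch assumes naive local timestamps and a weekend refresh window that starts at 12:00 AM Saturday. Neither detail is specified above, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumptions: naive local timestamps, and the weekend refresh window starts Saturday 00:00.
WEEKEND_LENGTH = timedelta(days=2)


def latest_publication(now: datetime) -> datetime:
    """Most recent Monday 12:00 AM at or before `now` (Monday is weekday() == 0)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=now.weekday())


def is_stale(refreshed_at: datetime, now: datetime) -> bool:
    """True if the last refresh predates the weekend before the latest publication."""
    return refreshed_at < latest_publication(now) - WEEKEND_LENGTH
```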
Some of the reports available include:
· Compliance by Team
· Data Gaps by Team
· Compliance by Server
· Deferral and Exemption Notes
· Detailed Server Report
· Compliance by Application
· Compliance by Service
· Server Details by Application
If someone requires information that is not in a standard report on the portal, they can build ad hoc reports to meet their special needs. Users can either work with all of the data exposed on the portal or connect to the back end to create their own reports.
A repository of all the data, including the mappings and all of the interlinking of the data, can be exposed through a SQL Server view. Someone with knowledge of SQL Server can then connect to it and do their own analysis or build reports in Excel. Alternatively, users can connect Excel to a BI analysis report and pivot on the analysis information.
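The sketch below shows what exposing the repository through a view gives an analyst. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server; the `CREATE VIEW ... AS SELECT` pattern is similar in Transact-SQL. The tables, the view name, and the rows are hypothetical. The view hides the join and the compliance rule, so ad hoc queries stay short.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE server (name TEXT PRIMARY KEY, owner TEXT, os_state TEXT);
CREATE TABLE owner_org (owner TEXT PRIMARY KEY, team TEXT);
-- The view hides the join and the compliance rule behind one queryable object.
CREATE VIEW v_server_compliance AS
    SELECT s.name,
           COALESCE(o.team, 'Unmapped') AS team,
           s.os_state,
           CASE WHEN s.os_state IN ('Current', 'Supported', '< 18 Mo.')
                THEN 1 ELSE 0 END AS is_compliant
    FROM server AS s
    LEFT JOIN owner_org AS o ON o.owner = s.owner;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO server VALUES (?, ?, ?)",
        [("srv1", "alice", "Current"), ("srv2", "alice", "EOL"), ("srv3", "bob", "Supported")],
    )
    conn.executemany(
        "INSERT INTO owner_org VALUES (?, ?)",
        [("alice", "Database"), ("bob", "Web")],
    )
    return conn


def compliance_by_team(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    """An analyst's ad hoc query: (team, compliant servers, total servers)."""
    return conn.execute(
        "SELECT team, SUM(is_compliant), COUNT(*) "
        "FROM v_server_compliance GROUP BY team ORDER BY team"
    ).fetchall()
```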
Planning, Deferrals, and Exemptions
Business groups are held accountable to the standards, and their compliance must be recertified every year. Non-compliance requires either a deferral or an exemption. A deferral describes how the issue will be resolved in the current fiscal year. An exemption indicates how the issue will be resolved or planned for in the next fiscal year’s budget cycle. No deferrals or exemptions are granted for non-compliance with security standards or policies.
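The deferral and exemption rules can be expressed as a validation check. This sketch makes several assumptions:
· a July-to-June fiscal year named for the calendar year in which it ends (Microsoft's fiscal year convention);
· a single hypothetical `security_updates` layer standing in for the security standards;
· hypothetical `Request` and `validate` names.

Annual recertification is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical layer name for security compliance, which can never be deferred or exempted.
SECURITY_LAYERS = {"security_updates"}


def fiscal_year(d: date) -> int:
    """Fiscal year label, assuming a July-to-June year named for the year it ends."""
    return d.year + 1 if d.month >= 7 else d.year


@dataclass
class Request:
    kind: str          # "deferral" or "exemption"
    layer: str         # compliance layer the request covers (hypothetical names)
    target_date: date  # when the non-compliance is planned to be resolved


def validate(req: Request, today: date) -> Optional[str]:
    """Return a rejection reason, or None if the request satisfies the rules."""
    if req.kind not in ("deferral", "exemption"):
        return "unknown request type"
    if req.layer in SECURITY_LAYERS:
        return "security non-compliance cannot be deferred or exempted"
    current_fy = fiscal_year(today)
    target_fy = fiscal_year(req.target_date)
    if req.kind == "deferral" and target_fy != current_fy:
        return "a deferral must be resolved in the current fiscal year"
    if req.kind == "exemption" and target_fy != current_fy + 1:
        return "an exemption must be planned for in the next fiscal year's budget"
    return None
```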
Benefits
“Get Current” has minimized risks to applications and services and has increased server reliability by keeping servers compliant with MSIT-supported standards. There are fewer support incidents, and server downtime has been reduced by 50 percent. Reducing variability in the environment has optimized operations costs by minimizing the number of versions of retail software managed in MSIT data centers.
In fiscal year 2010, prior to “Get Current,” there was an
average of 1.1 tickets per server per month. As this program progressed, MSIT
saw a decrease in that average to about 0.25 for compliant systems.
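The change from 1.1 to about 0.25 tickets per server per month can be expressed in relative terms. This compares the pre-program average with the later average for compliant systems only, so it is not a like-for-like figure for the whole fleet.

```latex
\text{relative reduction} = \frac{1.1 - 0.25}{1.1} = \frac{0.85}{1.1} \approx 0.773 \;(\approx 77\%),
\qquad \frac{1.1}{0.25} = 4.4
```

That is about 77 percent fewer tickets per server per month, or roughly 4.4 times fewer.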
As Figure 1 illustrates, there was a direct correlation between the increase in the number of compliant servers and the reduction in incident-to-asset ratios.
Figure 1. Ticket trending as the number of compliant servers increases
The red and blue lines show the number of tickets per asset (per month) generated for compliant and non-compliant production servers within one of the business units. The green line shows the increasing percentage of production servers that were compliant with regard to SQL Server version, operating system, and hardware. As the rates of compliance increased, the number of tickets per asset decreased. With fewer incidents on compliant systems, more capacity was available to address known errors on end-of-life (non-compliant) systems.
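The red and blue lines are monthly incident-to-asset ratios, computed separately for compliant and non-compliant servers. Below is a minimal sketch of that calculation. It assumes each ticket is already tagged with a month and a server name, and that a compliance snapshot is known for each month. The data shapes and the `tickets_per_asset` function are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def group_of(is_compliant: bool) -> str:
    return "compliant" if is_compliant else "non-compliant"


def tickets_per_asset(
    tickets: Iterable[Tuple[str, str]],     # (month "YYYY-MM", server name)
    compliant: Dict[str, Dict[str, bool]],  # month -> server -> compliant that month?
) -> Dict[Tuple[str, str], float]:
    """Monthly tickets per asset for compliant and non-compliant servers."""
    ticket_counts: Counter = Counter()
    for month, server in tickets:
        state = compliant.get(month, {}).get(server)
        if state is None:
            continue  # ticket for a server outside that month's production list
        ticket_counts[(month, group_of(state))] += 1

    ratios: Dict[Tuple[str, str], float] = {}
    for month, servers in compliant.items():
        assets = Counter(group_of(c) for c in servers.values())
        for group, n in assets.items():
            ratios[(month, group)] = ticket_counts[(month, group)] / n
    return ratios
```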
Conclusion
Developing an overall initiative focused on getting and keeping the environment acceptably compliant has removed the need for every team to create a business justification when seeking approval for its server upgrade plans. The investment and business justification have been built directly into the program itself.
With the creation of the BI engine and the reporting portal, teams can now concentrate on creating plans to become more compliant with the standards and on executing those plans.
For More Information
For more information about Microsoft products or
services, call the Microsoft Sales Information Center at (800) 426-9400. In
Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the
50 United States and Canada, please contact your local Microsoft subsidiary. To
access information via the World Wide Web, go to:
This
document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Active Directory, and
SharePoint are either registered trademarks or trademarks of Microsoft
Corporation in the United States and/or other countries. The names of actual
companies and products mentioned herein may be the trademarks of their
respective owners.