Simple Talk. March 17th 2011 - Redgate · The default trace in SQL Server - the power of performance and security auditing

New Wine in Old Bottles

How many people, when their car shows signs of wear and tear, would consider upgrading the engine and keeping the shell? Even if you're cash-strapped, you'll soon work out the subtlety of the economics, the cost of sudden breakdowns, the precious time lost coping with the hassle, and the low 'book value'. You'll generally buy a new car.

SQL Strategies for 'Versioned' Data

If you keep your data according to its version number, but need to work only with a particular version, what is the best SQL for the job? Which one works best? Which one do you use and when?

Defining .NET Components with Namespaces

A .NET software component is a compiled set of classes that provide a programmable interface that is used by consumer applications for a service. As a component is no more than a logical grouping of classes, what then is the best way to define the boundaries of a component within the .NET framework? How should the classes interoperate? Patrick Smacchia, the lead developer of NDepend, discusses the issues and comes up with a solution.

SQL Injection: Defense in Depth

So much has been written about SQL Injection, yet such attacks continue to succeed, even against security consultants' websites. The problem is often that only part of the solution is described, whereas the best practice requires the use of defense in depth.

Hello, can you just send me all your data please?

All computer security is in vain if one of those permitted users 'gives' the data away over the phone or via email due to a fraudulent request. It's wholly out of your control. How do you allow or prepare for this? Do you ever test for a weakness in this area? Would you know if it happened?

Game-over! Gaining Physical access to a computer

Security requires defense in depth. The cleverest intrusion detection system, combined with the best antivirus, won't help you if a malicious person can gain physical access to your PC or server. A routine job, helping to remove a malware infection, brings it home to Wesley just how easy it is to get a command prompt with SYSTEM access on any PC, and inspires him to give a warning about the consequences.

Simple Database Backups with SQL Azure

SQL Azure can take away a great deal of the maintenance work from a hosted database-driven website. It isn't perfect, however, in the area of backups. Mike Mooney explains how he customised the solution with SQL Compare and SQL Data Compare to give him archived copies of the company's data.

Anatomy of a .NET Assembly - PE Headers

What exactly is inside a .NET assembly? How is the metadata and IL stored? How does Windows know how to load it? What are all those bytes actually doing? We'll start with PE Headers.

The Polyglot of Databases: How Knowledge of MySQL and Oracle Can Give SQL Server DBAs an Advantage

Although switching between different RDBMSs can be the cause of some culture shock for the Database Administrator, it can have its advantages. In fact, it can help you to broaden your perspective of relational databases, refine your problem-solving skills and give you a better appreciation of the relative strengths of different relational databases.

An Introduction to PowerShell Modules

For PowerShell to provide specialised scripting, especially for administering server technologies, it can have the range of Cmdlets available to it extended by means of Snapins. With version 2 there is an easier and better method of extending PowerShell: the Module. These can be distributed with the application to be administered, and a wide range of Cmdlets are now available to the PowerShell user. PowerShell has grown up.

EntityDataSource Control Basics

The Entity Framework can be easily used to create websites based on ASP.NET. The EntityDataSource control, which is one of a set of Web Server Datasource controls, can be used to bind an Entity Data Model (EDM) to data-bound controls on the page. These controls can be editable grids, forms, drop-down list controls and master-detail pages which can then be used to create, read, update, and delete data. Joydip tells you what you need to get started.

The default trace in SQL Server - the power of performance and security auditing

Since the introduction of SQL Server 2005, there is a simple lightweight trace that is left running by default on every SQL Server. This provides some very valuable information for the DBA about the running server, but it isn't well-documented. Feodor reveals many of the secrets of this facility and shows how to get reports from it.

Supporting Large Scale Team Development

With a large-scale development of a database application, the task of supporting a large number of development and test databases, keeping them up to date with different builds can soon become ridiculously complex and costly. Grant Fritchey demonstrates a novel solution that can reduce the storage requirements enormously, and allow individual developers to work on their own version, using a full set of data.


New Wine In Old Bottles
17 March 2011 by Tony Davis

How many people, when their car shows signs of wear and tear, would consider upgrading the engine and keeping the shell? Even if you're cash-strapped, you'll soon work out the subtlety of the economics, the cost of sudden breakdowns, the precious time lost coping with the hassle, and the low 'book value'. You'll generally buy a new car.

The same philosophy should apply to database systems. Mainstream support for SQL Server 2005 ends on April 12; many DBAs, if they haven't done so already, will be considering the migration to SQL Server 2008 R2. Hopefully, that upgrade plan will include a fresh install of the operating system on brand new hardware. SQL Server 2008 R2 and Windows Server 2008 R2 are designed to work together. The improved architecture, processing power, and hyper-threading capabilities of modern processors will dramatically improve the performance of many SQL Server workloads, and allow consolidation opportunities.

Of course, there will be many DBAs smiling ruefully at the suggestion of such indulgence. This is nothing like the real world, this halcyon place where hardware and software budgets are limitless, development and testing resources are plentiful, and third party vendors immediately certify their applications for the latest-and-greatest platform!

As with cars, or any other technology, the justification for a complete upgrade is complex. With servers, the extra cost at time of upgrade will generally pay you back in terms of the increased performance of your business applications, reduced maintenance costs, training costs and downtime. Also, if you plan and design carefully, it's possible to offset hardware costs with reduced SQL Server licence costs. In his forthcoming SQL Server Hardware book, Glenn Berry describes a recent case where he was able to replace 4 single-socket database servers with one two-socket server, saving about $90K in hardware costs and $350K in SQL Server license costs.

Of course, there are exceptions. If you do have a stable, reliable, secure SQL Server 6.5 system that still admirably meets the needs of a specific business requirement, and has no security vulnerabilities, then by all means leave it alone. Why upgrade just for the sake of it? However, as soon as a system shows signs of being unfit for purpose, or is moving out of mainstream support, the ruthless DBA will make the strongest possible case for a belts-and-braces upgrade.

We'd love to hear what you think. What does your typical upgrade path look like? What are the major obstacles?

Cheers,

Tony.


SQL Strategies for 'Versioned' Data
17 March 2011 by Grant Fritchey

If you keep your data according to its version number, but need to work only with a particular version, what is the best SQL for the job? Which one works best? Which one do you use and when?

A company I worked for had a well-defined need for versioned data. In a lot of databases and applications we didn't do updates or deletes - we did inserts. That means we had to have mechanisms for storing the data in such a way that we could pull out the latest version, a particular version, or data as of a moment in time. We solved this problem by using a version table that maintains the order in which data was edited across the entire database, by object. Some tables will basically have a new row for each new version of the data. Some may only have one or two new rows out of a series of new versions. Other tables would have 10s or even 100s of rows of data for a version. To get the data out of these tables, you have to have a query that looks something like this, written to return the latest version:

SELECT *
FROM dbo.Document d
     INNER JOIN dbo.Version v
     ON d.DocumentId = v.DocumentId
        AND v.VersionId = (SELECT TOP (1) v2.VersionId
                           FROM dbo.Version v2
                           WHERE v2.DocumentId = v.DocumentId
                           ORDER BY v2.DocumentId DESC, v2.VersionId DESC);

You can write this query using MAX or ROW_NUMBER and you can use CROSS APPLY instead of joining. All of these different approaches will return the data appropriately. Depending on the query and the data, each one results in differences in performance. These differences in performance really make the task of establishing a nice clean "if it looks like this, do that" pattern very difficult for developers to follow. I decided to set up a wide swath of tests for these methods in order to establish as many of the parameters around which one works best, given a reasonably well defined set of circumstances.

Database & Data

I designed a small database to show versions of data. The initial design had a clustered index on each of the primary keys and you'll note that many of the primary keys are compound so that their ordering reflects the ordering of the versions of the data.

Figure 1
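The article doesn't reproduce the CREATE TABLE scripts, but the queries that follow imply a schema along these lines. This is only a sketch for orientation; the column data types and constraint names are assumptions, and the real design is the one shown in Figure 1.

CREATE TABLE dbo.Document (
    DocumentId INT IDENTITY NOT NULL,
    DocumentName VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Document PRIMARY KEY CLUSTERED (DocumentId)
);

CREATE TABLE dbo.Version (
    DocumentId INT NOT NULL,
    VersionId INT NOT NULL,
    VersionDescription VARCHAR(500) NOT NULL,
    -- compound key: the ordering of the key reflects the ordering of the versions
    CONSTRAINT PK_Version PRIMARY KEY CLUSTERED (DocumentId, VersionId)
);

CREATE TABLE dbo.Publisher (
    PublisherId INT IDENTITY NOT NULL,
    PublisherName VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Publisher PRIMARY KEY CLUSTERED (PublisherId)
);

CREATE TABLE dbo.Publication (
    DocumentId INT NOT NULL,
    VersionId INT NOT NULL,
    PublisherId INT NOT NULL,
    PublicationDate DATETIME NOT NULL,
    PublicationNumber INT NOT NULL,
    CONSTRAINT PK_Publication PRIMARY KEY CLUSTERED (DocumentId, VersionId, PublisherId)
);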

I used Red Gate's SQL Data Generator to load the sample data. I tried to go somewhat heavy on the data so I created 100,000 Documents, each with 10 versions. There were 5,000 Publishers. All of this came together in 4,000,000 Publications. After the data loads, I defragmented all the indexes. All performance will be recorded in terms of reads and scans since duration is too dependent on the machine running the query. I will report execution times though, just to have another point of comparison.

Simple Tests to Start

The simplest test is to look at pulling the TOP (1), or MAX version, from the version table, not bothering with any kind of joins or sub-queries or other complications to the code. I'm going to run a series of queries, trying out different configurations and different situations. Each query run will include an actual execution plan, disk I/O and execution time. I'll also run DBCC FREEPROCCACHE and FREESYSTEMCACHE ('ALL') prior to each query to try to get an apples-to-apples comparison. I'll start with:

SELECT TOP (1) v.*
FROM dbo.Version v
WHERE v.DocumentId = 433
ORDER BY v.DocumentId DESC, v.VersionId DESC;

The first result generated a single scan with three reads in 5ms and this execution plan:

Figure 2
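Incidentally, the cache-clearing and measurement preamble mentioned above isn't listed in the article; a minimal sketch of what runs before each test might look like this. Clearing the caches is only appropriate on a test server, since it affects every workload on the instance.

-- Clear the plan and system caches so each test starts cold (test servers only)
DBCC FREEPROCCACHE;
DBCC FREESYSTEMCACHE ('ALL');
GO
-- Report scans, reads and elapsed time for the query that follows
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO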

Next I’ll run a simple version of the MAX query using a sub-select.

SELECT v.*
FROM dbo.Version v
WHERE v.documentId = 433
      AND v.VersionId = (SELECT MAX(v2.VersionId)
                         FROM dbo.[Version] v2
                         WHERE v2.DocumentId = v.DocumentId);

This query provides a 5ms execution with one scan and three reads and the following, identical, execution plan:

Figure 3

Finally, the ROW_NUMBER version of the query:

SELECT x.*
FROM (SELECT v.*,
             ROW_NUMBER() OVER (ORDER BY v.VersionId DESC) AS RowNum
      FROM dbo.[Version] v
      WHERE v.documentid = 433) AS x
WHERE x.RowNum = 1;

Which resulted in a 46ms query that had one scan and three reads, like the other two queries. It resulted in a slightly more interesting execution plan:

Figure 4

Clearly, from these examples, the choice of the fastest query is not really an issue. In fact, any of these processes will work well, although at 46ms, the ROW_NUMBER query was a bit slower. The execution plans for both TOP and MAX were identical, with a Clustered Index Seek followed by a Top operation. While the ROW_NUMBER execution plan was different, the cost was still buried within the Clustered Index Seek and the query itself didn't add anything in terms of scans or reads. So we're done, right?


Not so fast. Now let’s try this with joins.

1 Join

Now we'll perform the join operation from the Document table to the Version table. This will still only result in a single row result set. First, the TOP query:

SELECT d.[DocumentName],
       d.[DocumentId],
       v.[VersionDescription],
       v.[VersionId]
FROM dbo.[Document] d
     JOIN dbo.[Version] v
     ON d.[DocumentId] = v.[DocumentId]
        AND v.[VersionId] = (SELECT TOP (1) v2.VersionId
                             FROM dbo.[Version] v2
                             WHERE v2.DocumentId = v.DocumentId
                             ORDER BY v2.DocumentId, v2.VersionId DESC)
WHERE d.[DocumentId] = 9729;

The query ran in 37ms. This query had 2 scans against the Version table and 6 reads, and only 2 reads against the Document table. The execution plan is just a little more complex than the previous ones:

Figure 5

Now the MAX query:

SELECT d.[DocumentName],
       d.[DocumentId],
       v.[VersionDescription],
       v.[VersionId]
FROM dbo.[Document] d
     JOIN dbo.[Version] v
     ON d.[DocumentId] = v.[DocumentId]
        AND v.[VersionId] = (SELECT MAX(v2.VersionId)
                             FROM dbo.[Version] v2
                             WHERE v2.DocumentId = v.DocumentId)
WHERE d.[DocumentId] = 9729;

This query ran in 32ms. It had 1 scan against the Version table and a combined 5 reads against both tables. The execution plan is as simple as the query itself:

Figure 6

Explanations

In the last query, the optimizer chose to implement the MAX operation in the same way it did in the original simple example of MAX. But the TOP function forced the optimizer to join the data using a Nested Loop. This resulted in 2 scans of 6 reads each because the top query in the join returned all 10 rows for the Document ID provided. But what happens if we change the query just slightly? Instead of referencing the Version table for its DocumentId, I'll reference the Document table like this:

WHERE v2.DocumentId = d.DocumentId

Now when we run the query, we get one scan and six reads on the Version table, with this execution plan:

Figure 7

In this instance the TOP operator is still forcing a join on the system, but instead of looping through the Version records it's doing a single read due to referring to the Document table directly. Now what happens if I change the query again? This time I'll use the APPLY statement as part of the join:

SELECT d.[DocumentName],
       d.[DocumentId],
       v.[VersionDescription],
       v.[VersionId]
FROM dbo.[Document] d
     CROSS APPLY (SELECT TOP (1) v2.VersionId,
                         v2.VersionDescription
                  FROM dbo.[Version] v2
                  WHERE v2.DocumentId = d.DocumentId
                  ORDER BY v2.DocumentId, v2.VersionId DESC) v
WHERE d.[DocumentId] = 9729;

This time the query has a single scan on Version and a total of five reads on both tables, and this familiar execution plan:

Figure 8

So the APPLY method was able to take the single row from the Document table and find that TOP (1) match from the Version table without resorting to joins and multiple scans.

and ROW_NUMBER

What happened to the ROW_NUMBER function? Here’s how that query has been rewritten.

SELECT x.*
FROM (SELECT d.[DocumentName],
             d.[DocumentId],
             v.[VersionDescription],
             v.[VersionId],
             ROW_NUMBER() OVER (ORDER BY v.VersionId DESC) AS RowNum
      FROM dbo.[Document] d
           JOIN dbo.[Version] v
           ON d.[DocumentId] = v.[DocumentId]
      WHERE d.[DocumentId] = 9729) AS x
WHERE x.RowNum = 1;

This query resulted in the standard single scan with five reads and ran for 48ms, but had a radically different execution plan:

Figure 9

This query only accesses each table once, performing a clustered index seek operation. Also, like the others, the results of these seeks are joined through a Nested Loop operation. Next, instead of a TOP operator, the data gets segmented by the Segment operator based on an internal expression, a value derived within the query, probably the ORDER BY statement. This is passed to the Sequence Project operator, which is adding a value; in this case, the ROW_NUMBER or RowNum column itself. And finally, the TOP and FILTER operators reduce the number of rows returned to one. While this appears to be more work for the query engine, it's performing roughly on par with the other operations.

Full Data Set

Finally, let's join all the data together. What we want is a list of publications, each demonstrating the max version that is less than a given maximum version. This is determining all the versions at a particular point in time. Here's the new TOP, this time using APPLY right out of the gate because that proved earlier to result in a faster query:

SELECT d.[DocumentName],
       d.[DocumentId],
       v.[VersionDescription],
       pu.[VersionId],
       p.[PublisherName],
       pu.[PublicationDate],
       pu.[PublicationNumber]
FROM dbo.[Document] d
     CROSS APPLY (SELECT TOP (1) v2.VersionId,
                         v2.DocumentId,
                         v2.VersionDescription
                  FROM dbo.[Version] v2
                  WHERE v2.DocumentId = d.DocumentId
                  ORDER BY v2.DocumentId, v2.VersionId DESC) AS v
     JOIN dbo.[Publication] pu
     ON pu.[DocumentId] = d.[DocumentId]
        AND pu.[VersionId] = (SELECT TOP (1) pu2.versionid
                              FROM dbo.Publication pu2
                              WHERE pu2.DocumentId = d.DocumentId
                                    AND pu2.VersionId <= v.[VersionId]
                                    AND pu2.PublisherId = pu.PublisherId
                              ORDER BY pu2.DocumentId, pu2.VersionId DESC)
     JOIN dbo.[Publisher] p
     ON pu.[PublisherId] = p.[PublisherId]
WHERE d.[DocumentId] = 10432
      AND p.[PublisherId] = 4813;

The complete query ran in 53ms. Here are the scans and reads:

Table 'Publication'. Scan count 2, logical reads 6...
Table 'Version'. Scan count 1, logical reads 3...
Table 'Document'. Scan count 0, logical reads 3...
Table 'Publisher'. Scan count 0, logical reads 2...

And this execution plan:


Figure 10

It's a bit hard to read, but it's a series of five Clustered Index Seek operations, each taking 20% of the total cost of the batch and joined together through Nested Loop joins. This is as clean and simple a plan as you can hope for.

Here is the MAX version of the FROM clause:

FROM dbo.[Document] d
     JOIN dbo.[Version] v
     ON d.[DocumentId] = v.[DocumentId]
        AND v.[VersionId] = (SELECT MAX(v2.VersionId)
                             FROM dbo.[Version] v2
                             WHERE v2.DocumentId = v.DocumentId)
     JOIN dbo.[Publication] pu
     ON v.[DocumentId] = pu.[DocumentId]
        AND pu.[VersionId] = (SELECT MAX(pu2.VersionId)
                              FROM dbo.Publication pu2
                              WHERE pu2.DocumentId = d.DocumentId
                                    AND pu2.VersionId <= v.[VersionId]
                                    AND pu2.PublisherId = pu.PublisherId)
     JOIN dbo.[Publisher] p
     ON pu.[PublisherId] = p.[PublisherId]
WHERE d.[DocumentId] = 10432
      AND p.[PublisherId] = 4676;

This query ran in 46ms. Its scans and reads break down as follows:

Table 'Publication'. Scan count 2, logical reads 6...
Table 'Document'. Scan count 0, logical reads 3...
Table 'Version'. Scan count 1, logical reads 3...
Table 'Publisher'. Scan count 0, logical reads 2...

It resulted in a very similar execution plan:

Figure 11

The execution plan consists of nothing except Clustered Index Seek and Nested Loop operators, with a single TOP against the Version table. You would be hard pressed to come up with a better execution plan. The interesting thing is that the optimizer changed our MAX to a TOP as if we had re-supplied the TOP query. The only real difference is the order in which the tables are accessed, despite the fact that the queries submitted were identical. If the queries are run side-by-side, each takes exactly 50% of the cost of execution of the batch. There really isn't a measurable difference.

And then the query itself changes for the ROW_NUMBER version (thanks to Matt Miller for helping with this one):

SELECT d.[DocumentName],
       d.[DocumentId],
       v.[VersionDescription],
       pu.[VersionId],
       p.[PublisherName],
       pu.[PublicationDate],
       pu.[PublicationNumber]
FROM dbo.[Document] d
     INNER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY v2.documentID
                                           ORDER BY v2.versionID DESC) RN,
                        v2.VersionId,
                        v2.DocumentId,
                        v2.VersionDescription
                 FROM dbo.[Version] v2) AS v
     ON d.documentID = v.documentID
     LEFT OUTER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY pu2.documentID, publisherID
                                                ORDER BY pu2.versionID DESC) RN,
                             pu2.versionid,
                             pu2.documentID,
                             pu2.publicationdate,
                             pu2.publicationnumber,
                             pu2.publisherID
                      FROM dbo.Publication pu2) pu
     ON pu.[DocumentId] = d.[DocumentId]
     JOIN dbo.[Publisher] p
     ON pu.[PublisherId] = p.[PublisherId]
WHERE d.[DocumentId] = 10432
      AND p.[PublisherId] = 4676
      AND v.rn = 1
      AND pu.rn = 1;

This query ran in 44ms and had an interesting set of scans and reads:

Table 'Version'. Scan count 1, logical reads 4...
Table 'Publication'. Scan count 1, logical reads 3...
Table 'Document'. Scan count 0, logical reads 3...
Table 'Publisher'. Scan count 0, logical reads 2...

This query returned the exact same data with fewer scans and reads. In some ways it's a bit more cumbersome than the other queries, but based on the scans and reads alone this is an attractive query. Even the execution plan, although slightly more complex, shows the increase in performance this approach could deliver.

Figure 12

Instead of five Clustered Index Seeks, this has only four. There is some extra work involved in moving the data into the partitions in order to get the row number out of the function, but then the data is put together with Nested Loop joins; again, fewer than in the other plans.

Changing Results

What if we change the results, though? Let's take the same query written above and simply return more data from one part. In this case, we'll remove the PublisherId from the where clause. Now when we run the queries at the same time, the estimated cost for the TOP is only taking 49% while the estimated cost of the MAX is 50%. The difference? It's in the Stream Aggregate in the execution plan. Part of the execution plan for the MAX uses TOP, just like the TOP query, but part of it uses an actual Aggregate operator. When the data set is larger, this operation suddenly costs more. But, interestingly enough, the execution times for the data I'm retrieving and the number of scans and reads are the same. Adding in the Row_Number query to run side by side with the others was also interesting. In terms of execution plan cost, it was rated as the most costly plan. But look at these reads and scans:

TOP
Table 'Publisher'. Scan count 0, logical reads 20...
Table 'Publication'. Scan count 11, logical reads 48...
Table 'Version'. Scan count 1, logical reads 3...
Table 'Document'. Scan count 0, logical reads 3...

MAX
Table 'Publisher'. Scan count 0, logical reads 20...
Table 'Publication'. Scan count 11, logical reads 48...
Table 'Document'. Scan count 0, logical reads 3...
Table 'Version'. Scan count 1, logical reads 3...

ROW_NUMBER
Table 'Publisher'. Scan count 0, logical reads 25...
Table 'Version'. Scan count 10, logical reads 40...
Table 'Publication'. Scan count 1, logical reads 3...
Table 'Document'. Scan count 0, logical reads 3...

The difference in scans on the Publication table, despite the fact that identical data was returned, is pretty telling for long term scalability. But that only went from selecting one row to selecting 10. The execution plans didn't change and the differences were measured in very small amounts. Now instead of selecting by Document, I'll change the query so that it selects by Publisher. Now the queries will have to process more data and return 100 rows. Everything changes.

The TOP query ran for 274ms with the following I/O:

Table 'Publication'. Scan count 109, logical reads 7893...
Table 'Version'. Scan count 100, logical reads 441...
Table 'Document'. Scan count 0, logical reads 347...
Table 'Worktable'. Scan count 0, logical reads 0...
Table 'Publisher'. Scan count 0, logical reads 2...

The MAX query ran for 254ms. Here's the I/O:

Table 'Publication'. Scan count 101, logical reads 7641...
Table 'Version'. Scan count 100, logical reads 864...
Table 'Document'. Scan count 0, logical reads 315...
Table 'Publisher'. Scan count 0, logical reads 2...

The elapsed time on the ROW_NUMBER ran up to 13 seconds. This is all from the change to using the PublisherId. Since it's not part of the leading edge of the only index on the table - the PK - we're forced to do a scan:

Figure 13

This is severe enough that it justifies adding another index to the table. If we simply add an index on the Publisher ID, the scans are reduced, but not eliminated, because we're then forced into an ID lookup operation:

Table 'Publication'. Scan count 101, logical reads 663

Instead, we can try including the columns necessary for output, Publication Number and Publication Date; the other columns are included since they're part of the primary key. This then arrives at the following set of scans and reads:

Table 'Publication'. Scan count 101, logical reads 348...
Table 'Version'. Scan count 100, logical reads 345...
Table 'Document'. Scan count 0, logical reads 315...
Table 'Publisher'. Scan count 0, logical reads 2...

And this new execution plan:


Figure 14
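The article doesn't show the index definition itself. Under the assumptions above (a key on PublisherId, with the output columns included), a minimal sketch of it might be something like this; the index name is illustrative, not the author's script:

CREATE NONCLUSTERED INDEX IX_Publication_PublisherId
ON dbo.Publication (PublisherId)
INCLUDE (PublicationNumber, PublicationDate);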

This then presents other problems because the Document table isn't being filtered, resulting in more rows being processed. Rather than rewrite the queries entirely to support this new mechanism, we'll assume this is the plan we're going for and test the other approaches to the query against the new indexes. After clearing the procedure and system cache, the MAX query produced a different set of scans and reads:

Table 'Publication'. Scan count 101, logical reads 348...
Table 'Version'. Scan count 100, logical reads 864...
Table 'Document'. Scan count 0, logical reads 315...
Table 'Publisher'. Scan count 0, logical reads 2...

The scans against Document and the number of reads against Version were less, and the execution plan, a sub-set of which is shown here, was changed considerably:

Figure 15

Instead of a scan against the Document table, this execution plan was able to take advantage of the filtering provided through the Version and Publication tables prior to joining to the Document table. While this is interesting in overall performance terms, the question of which process, TOP or MAX, works better is not answered here. The MAX still results in a Stream Aggregate operation, which we already know is generally more costly than the Top operations.

The most dramatic change came in the ROW_NUMBER function. Execution time was 12 seconds. The test was re-run several times to validate that number and to ensure it wasn't because of some other process interfering. Limiting based on PublisherId resulted in a pretty large increase in scans and reads, as well as the creation of work tables:

Table 'Document'. Scan count 0, logical reads 315...
Table 'Publication'. Scan count 1, logical reads ...
Table 'Version'. Scan count 9, logical reads 105166...
Table 'Worktable'. Scan count 0, logical reads 0...
Table 'Worktable'. Scan count 0, logical reads 0...
Table 'Publisher'. Scan count 0, logical reads 2...

The full execution plan is here:

Figure 16


I’ve blown up a section of it for discussion here:

Figure 17

This shows that the Sort, which previously acted so quickly on smaller sets of data, is now consuming 56% of the estimated cost since the query can't filter down on this data in the same fashion as before. Further, the total cost of the query is estimated at 277.188, far exceeding the cost threshold for parallelism that I have set on my machine of 50. The number of reads against the Version table makes this almost unworkable. The interesting point, though, is that the reads and scans against the other tables, especially the Publication table, are very low, lower than the other methods. With some rewriting it might be possible to get the performance on this back on par with the other processes.

Conclusion

When it comes to MAX or TOP, a well structured query running against good indexes should work well with either solution. This is largely because, more often than not, this type of query is interpreted in the same way by the optimizer whether you supplied a TOP or a MAX operator. It frequently substitutes TOP for the MAX operator. When this substitution is not made, the MAX value requires aggregation rather than simple ordering of the data, and this aggregation can be more costly. TOP is probably the better solution in most instances when comparing MAX and TOP.

ROW_NUMBER clearly shows some strong advantages, reducing the number of operations, scans and reads. This works best on small sets of data. When the data sets are larger, the processing time goes up quite a ways. So, to answer the questions: if you know the data set is going to be small, use ROW_NUMBER, but if the data set is going to be large, or you're not sure how large it's going to be, use TOP. Having made this bold statement, please allow me to shade the answer with the following: test your code to be sure.


© Simple-Talk.com


Simple Database Backups With SQL Azure
17 March 2011 by Mike Mooney

SQL Azure can take away a great deal of the maintenance work from a hosted database-driven website. It isn't perfect, however, in the area of backups. Mike Mooney explains how he customised the solution with SQL Compare and SQL Data Compare to give him archived copies of the company's data.

Why the problem?

Last year we launched a new version of SportsCommander.com, which offered volleyball organizations the ability to promote their tournaments and accept registrations for a negligible fee. Having grown out of our previous hosting company, we tried hosting the platform on Windows Azure, and for the most part it's been great. Also, the price was right.

We are also hosting our data in SQL Azure, which for the most part has been fine. It has performed well enough for our needs, and it abstracts away a lot of the IT/DBA maintenance issues that we would really prefer not to worry about. Of course, nothing is perfect. We've had a few snags with Azure, all of which we were able to work around, but it was a headache.

One of the biggest issues for us was the ability to run regular backups of our data, for both disaster recovery and testing purposes. SQL Azure does a great job of abstracting away the maintenance details, but one of the things you lose is direct access to the SQL backup and restore functionality. This was almost a deal-breaker for us.

Microsoft's response to this issue is that they handle all of the backups and restores for you, so that if something went wrong with the data center, they would handle getting everything up and running again. Obviously this only solves part of the problem, because many companies want to have their own archive copies of their databases, and personally I think doing a backup before a code deployment should be an absolute requirement. Their answer has been "if you need your own backups, you need to build your own solution."

Microsoft is aware of this need, and it has been the top-voted issue on their Azure UserVoice site for a while.

In poking around the interwebs, I saw some general discussion of how to work around this, but very little concrete detail. After hacking around for a while, I came up with a solution that has worked serviceably well for us, so I figured I'd share it with y'all.

What's the solution?

In order to address these concerns, Microsoft introduced the ability to copy a database in SQL Azure. So, as a limited backup option, you can create a quick copy of your database before a deployment, and quickly restore it back if something fails. However, this does not allow for archiving or exporting the data from SQL Azure, so all of the data is still trapped in the Azure universe.

Apparently another option is SSIS. Since you can connect to Azure through a standard SQL connection, you could, theoretically, export the data this way. Now I am no SSIS ninja, so I was just never able to get this working with Azure, and I was spending far too much time on something that I shouldn't need to be spending much time on.

I’ve heard rumblings Microsoft’s Sync Framework could address the issue, but, uh, see the previous point. Who’s got time for that?

So of course, Red Gate to the rescue. Generally speaking, their SQL Compare and SQL Data Compare solve this type of problem beautifully, because they are excellent at copying SQL content from one server to another to keep them in sync. The latest versions both support SQL Azure seamlessly. Just pony up the cash and buy them, they are beyond worth it.

How do we do it?

OK, so how do we set this all up? Basically, we create a scheduled task that creates a copy of the database on SQL Azure, downloads the copy to a local SQL Server database, and then creates a zipped backup of that database.

First, you need a SQL Server database server. And go install the Azure-enabled versions of SQL Compare and SQL Data Compare.

Also, go get a copy of 7-Zip, if you have any interest in zipping the backups.

The scheduled task will execute a batch file. Here’s that batch file:

SET SqlAzureServerName=[censored]
SET SqlAzureUserName=[censored]
SET SqlAzurePassword=[censored]
SET LocalSqlServerName=[censored]
SET LocalSqlUserName=[censored]
SET LocalSqlPassword=[censored]


echo Creating backup on Azure server
sqlcmd -U %SqlAzureUserName%@%SqlAzureServerName% -P %SqlAzurePassword% -S %SqlAzureServerName% -d master -i C:\SQLBackups\DropAndRecreateAzureDatabase.sql
echo Backup on Azure server complete

echo Create local database SportsCommander_NightlyBackup
sqlcmd -U %LocalSqlUserName% -P %LocalSqlPassword% -S %LocalSqlServerName% -d master -i C:\SQLBackups\DropAndRecreateLocalDatabase.sql

echo Synchronizing schema
"C:\Program Files (x86)\Red Gate\SQL Compare 9\SQLCompare.exe" /s1:%SqlAzureServerName% /db1:SportsCommanderBackup /u1:%SqlAzureUserName% /p1:%SqlAzurePassword% /s2:%LocalSqlServerName% /db2:SportsCommander_NightlyBackup /u2:%LocalSqlUserName% /p2:%LocalSqlPassword% /sync

echo Synchronizing data
"C:\Program Files (x86)\Red Gate\SQL Data Compare 9\SQLDataCompare.exe" /s1:%SqlAzureServerName% /db1:SportsCommanderBackup /u1:%SqlAzureUserName% /p1:%SqlAzurePassword% /s2:%LocalSqlServerName% /db2:SportsCommander_NightlyBackup /u2:%LocalSqlUserName% /p2:%LocalSqlPassword% /sync

echo Backup Local Database
for /f "tokens=1-4 delims=/- " %%a in ('date /t') do set XDate=%%d_%%b_%%c
for /f "tokens=1-2 delims=: " %%a in ('time /t') do set XTime=%%a_%%b
SET BackupName=SportsCommander_Backup_%XDate%_%XTime%
sqlcmd -U %LocalSqlUserName% -P %LocalSqlPassword% -S %LocalSqlServerName% -d master -Q "BACKUP DATABASE SportsCommander_NightlyBackup TO DISK = 'C:\SQLBackups\%BackupName%.bak'"
"C:\Program Files\7-Zip\7z.exe" a "C:\SQLBackups\%BackupName%.zip" "C:\SQLBackups\%BackupName%.bak"
del /F /Q "C:\SQLBackups\%BackupName%.bak"

echo Anonymize Database For Test Usage
sqlcmd -U %LocalSqlUserName% -P %LocalSqlPassword% -S %LocalSqlServerName% -d SportsCommander_NightlyBackup -i "C:\SQLBackups\AnonymizeDatabase.sql"

The first thing this does is run a SQL script against the SQL Azure server (DropAndRecreateAzureDatabase.sql). This script will create a backup copy of the database on Azure, using their new copy-database functionality. Here's that script:

DROP DATABASE SportsCommanderBackup
GO
CREATE DATABASE SportsCommanderBackup AS COPY OF SportsCommander
GO
DECLARE @intSanityCheck INT
SET @intSanityCheck = 0
WHILE(@intSanityCheck < 100 AND (SELECT state_desc FROM sys.databases WHERE name = 'SportsCommanderBackup') = 'COPYING')
BEGIN
    -- wait for 10 seconds
    WAITFOR DELAY '00:00:10'
    SET @intSanityCheck = @intSanityCheck + 1
END
GO
DECLARE @vchState VARCHAR(200)
SET @vchState = (SELECT state_desc FROM sys.databases WHERE name = 'SportsCommanderBackup')
IF(@vchState != 'ONLINE')
BEGIN
    DECLARE @vchError VARCHAR(200)
    SET @vchError = 'Failed to copy database, state = ''' + @vchState + ''''
    RAISERROR (@vchError, 16, 1)
END
GO

A few notes here:

We are always overwriting the last copy of the backup. This is not an archive; that will be on the local server. Instead, this is always the latest copy. Besides, extra Azure databases are expensive.

For some reason SQL Azure won't let you run a DROP DATABASE command in a batch with other commands, even though SQL 2008 allows it. As a result, we can't wrap the DROP DATABASE in an "IF(EXISTS(" clause. So, we need to always just drop the database, which means you'll have to create an initial copy of the database for the drop to work the first time you run the script.

The CREATE DATABASE ... AS COPY OF will return almost immediately, and the database will be created, but it is not done copying. That is actually still running in the background, and it could take a minute or two to complete depending on the size of the database. Because of that, we sit in a loop and wait for the copy to finish before continuing. We put a sanity check in there to throw an exception just in case it runs forever.

Once that is complete, we create a local database and copy the Azure database down into that. There are several ways to do this, but we chose to keep a single most-recent version on the server, and then zipped backups as an archive. This gives a good balance of being able to look at and test against the most recent data, and having access to archived history if we really need it, while using up as little disk space as possible.


In order to create the local database, we run a very similar script (DropAndRecreateLocalDatabase.sql):

IF(EXISTS(SELECT * FROM sys.databases WHERE Name = 'SportsCommander_NightlyBackup'))
BEGIN
    DROP DATABASE SportsCommander_NightlyBackup
END
CREATE DATABASE SportsCommander_NightlyBackup

In this case, we actually can wrap the DROP DATABASE command in an "IF(EXISTS", which makes me feel all warm and fuzzy.

After that, it's a matter of calling the SQL Compare command line to copy the schema down to the new database, and then calling SQL Data Compare to copy the data down into the schema. At this point we have a complete copy of the database exported from SQL Azure.

As some general maintenance, we then call sqlcmd to back the database up to a time-stamped file on the drive, and then call 7-Zip to compress it. You might want to consider dumping this out to a DropBox folder, and boom-goes-the-dynamite, you've got a seriously backed-up database.

Lastly, we run an AnonymizeDatabase.sql script to clear out and reset all of the email addresses, so that we can use the database in a test environment without fear of accidentally sending bogus test emails out to our users, which I've done before and it never reflected well on us.
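The contents of AnonymizeDatabase.sql aren't shown in the article. A rough sketch of the idea, with an assumed table and column name rather than the real SportsCommander schema, would be something like:

-- Overwrite every address with a unique, obviously fake one
UPDATE dbo.Users
SET EmailAddress = 'user' + CAST(UserId AS VARCHAR(20)) + '@example.com';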

Run that batch file anytime you want to get a backup, or create a scheduled task in Windows to run it every night.
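For example, a nightly run could be registered with the built-in schtasks utility; the task name, batch file path and time here are placeholders:

schtasks /Create /TN "SportsCommanderNightlyBackup" /TR "C:\SQLBackups\NightlyBackup.bat" /SC DAILY /ST 02:00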

Anyhoo, that's about it. It's quick, it's dirty, but it worked for us in a pinch. Microsoft is just getting rolling on Azure and adding more stuff every month, so I'm sure they will provide a more elegant solution sooner or later, but this will get us by for now.

Have you had a similar experience? How are you handling SQL Azure backups?

This article was first published as a blog on Mike's site, The Mooney Project, when the Azure versions of SQL Compare and SQL Data Compare were still in Beta. Mike's site covers a range of topics and is always well worth reading.

© Simple-Talk.com


SQL Injection: Defense in Depth
17 March 2011 by Timothy Wiseman

So much has been written about SQL Injection, yet such attacks continue to succeed, even against security consultants' websites. The problem is often that only part of the solution is described, whereas the best practice requires the use of defense in depth.

In spite of the threat that is presented to data security by SQL Injection, many programmers and DBAs are either unaware of it, or do not know how to properly prevent it. This is partly because SQL injection, and methods to prevent it, are so rarely talked about in formal education: I went through two classes on database theory, several books on SQL Server, and an MCDBA before I first really learned about SQL Injection through "The curse and blessings of dynamic SQL" by Erland Sommarskog.

Because of this, SQL injection remains a common and effective attack. In a significant recent case, even a firm dedicated to security was at least partially compromised through a SQL injection attack (see 'Anonymous speaks: the inside story of the HBGary hack'), providing salutary lessons to the industry on what can go wrong.

There are already a plethora of articles about protecting against SQL injection on SQL Server. Yet, few of them emphasise that the best defense against such attacks is a defense in depth, with a whole range of precautions. Many of these articles focus almost entirely on parameterizing SQL as the defense against SQL Injection. While parameterizing is the first and best defense against SQL Injection, it should not be the only one. Thus, I decided to add one more to the list, examining various layers of defense and using python for the examples.

What is SQL Injection?

SQL Injection attacks are carried out by passing specially-formatted strings as input. In a successful attack, those special strings are passed along to a database to either execute arbitrary code or cause the server to return unanticipated results. For example, if we have a python program using pyodbc which concatenates user input into a SQL query like this:

import pyodbc

# connString is the ODBC connection string, defined elsewhere
userInput = "'; Drop Table Test; --"

conn = pyodbc.connect(connString)
curs = conn.cursor()

sql = ("Select City, State from dbo.ZipCodes where zipcode = '" + userInput + "'")

curs.execute(sql)
conn.commit()

Then a malicious user who carefully formats the zipcode entry could execute unintended SQL commands. For instance, if the user provided:

'; Drop Table Test; --

Then the profiler would show that the server would receive:

Select City, State from dbo.ZipCodes where zipcode = ''; Drop Table Test; --'

Assuming the program had the proper permissions, the server would obediently drop the test table.

Basic Techniques to Prevent SQL Injection

Parameterize all Queries

The first, best, line of defense against SQL Injection is to parameterize all SQL queries in code. If the previous example using pyodbc had been parameterized, it could look like:

userInput = "'; Drop Table Test; --"

conn = pyodbc.connect(connString)
curs = conn.cursor()

sql = "Select City, State from dbo.ZipCodes where zipcode = ?"

curs.execute(sql, (userInput,))
conn.commit()

This causes the profiler to receive quite a few messages, but the key part is:

exec sp_prepexec @p1 output, N'@P1 varchar(22)', N'Select City, State from dbo.ZipCodes where zipcode = @P1', '''; Drop Table Test; --'

Since it received the malicious code as a variable, the server would simply look for the value in the table and return a blank result set. The malicious string is never executed, so the test table is never dropped.

Similarly, most ORMs like SQLAlchemy will automatically parameterize all SQL statements under normal circumstances. Thus, they provide a good initial defense against SQL injection.
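The article doesn't include an ORM example, but a minimal sketch of the same lookup through SQLAlchemy might look like the following. The connection URL is a placeholder, and the point is simply that the bound parameter travels separately from the SQL text, just as in the pyodbc example above.

from sqlalchemy import create_engine, text

# the URL format differs from an ODBC connection string; this one is illustrative
engine = create_engine('mssql+pyodbc://user:password@MyZipCodeDsn')

with engine.connect() as conn:
    result = conn.execute(
        text("SELECT City, State FROM dbo.ZipCodes WHERE zipcode = :zip"),
        {"zip": userInput})
    rows = result.fetchall()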

Use Only Stored Procedures

The use of stored procedures by themselves does not provide direct protection against SQL injection, although it can properly be used as part of a more comprehensive defense. To see why stored procedures cannot by themselves protect against SQL injection, consider one that queries to retrieve the city and state for a zip code like:

create procedure dbo.GetCityState @zipcode varchar(15)
as
select city, State
from dbo.ZipCodes
where zipcode = @zipcode

Then if there is a python program that executes:

sql = "exec dbo.GetCityState '" + userInput + "'"

curs.execute(sql)
conn.commit()

Then an attacker might provide:

'; Drop Table Test; --

As the input, which will send...

exec dbo.GetCityState ''; Drop Table Test; --'

...to the server. Which, again assuming proper permissions, would drop the Test table just as it did without the stored procedure.

Of course, that type of attack can be prevented by parameterizing the input, just as with the standard select statement. However, if the stored procedure itself uses dynamic SQL that is made through concatenation, then the stored procedure may execute the malicious commands even if the calling program properly parameterizes. This can be prevented by parameterizing dynamic SQL in stored procedures through sp_executesql, as discussed in "The Curse and Blessings of Dynamic SQL".
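As a hedged sketch of that idea (the procedure below is illustrative, not taken from the article), dynamic SQL inside a procedure can pass the user's value through sp_executesql as a typed parameter rather than concatenating it:

create procedure dbo.GetCityStateDynamic @zipcode varchar(15)
as
declare @sql nvarchar(max)
set @sql = N'select city, State from dbo.ZipCodes where zipcode = @zip'
-- @zipcode is bound as a parameter, never spliced into the SQL string
exec sp_executesql @sql, N'@zip varchar(15)', @zip = @zipcode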

The greatest value of using stored procedures in preventing SQL injection is that the DBA can set permissions for the application account so that its only way to interact with the SQL Server is through stored procedures (see SQL Server Security Workbench Part 1). This would mean that most SQL injection attacks would fail due to lack of permissions even if the calling program did not parameterize. This of course still leaves open the possibility of SQL injection working through dynamic SQL inside the stored procedures, but the stored procedures can be given an "execute as" clause which limits their permissions to only those needed by the procedure. It is generally easier to verify that all stored procedures are written to guard against SQL injection than it is to check every place where the application interacts with the SQL Server.

Limiting Permissions

This naturally leads to a very effective method of preventing some attacks and limiting the damage from SQL injection attacks, namely using the account with the lowest permissions possible for the job. If the account being used does not have permission to drop a table, then it will not be dropped even if the command is slipped to SQL Server. Similarly, if the account has only read access, an attacker might be able to gain some information, which can certainly cause problems, but the attacker will not be able to modify or destroy the data, which is frequently worse. Even read permissions can be strictly limited in SQL Server, to limit which tables can be viewed. If the application only needs selected columns from a table, then read permission can be granted on a view of those columns rather than on the full table.
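A minimal sketch of such a least-privilege setup (the user, login and view names are assumptions for illustration) might look like:

-- The application account can execute procedures and read one view, and nothing else
CREATE USER WebAppUser FOR LOGIN WebAppLogin;
GRANT EXECUTE ON SCHEMA::dbo TO WebAppUser;
GRANT SELECT ON OBJECT::dbo.vZipCodeLookup TO WebAppUser;
-- No INSERT, UPDATE, DELETE or DDL permissions are granted at all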


Validating input

User input should always be treated with care and there are a number of reasons to validate all the user input before further processing. Validation code can also help to avoid wasting server resources by restricting requests that would not return useful results, and it can provide much more helpful messages to the user than a SQL error message or empty result set would likely provide. It can also help stop SQL Injection by rejecting, outright, any forms of input that could be used to perform a SQL injection.

Because of its many advantages, it is always important to validate user input, but it is particularly significant when the user input is being passed on to other routines for further processing, or in some of the rare cases where it is impossible to fully parameterize the input. For instance, if you are dealing with the rare situation where the users are required to provide a table name for a DDL statement, the table name cannot be passed in as a parameter and must be concatenated at some point. In that situation, validation of the input is a crucial defense against injection attacks. Similarly, if the input is passed in to a stored procedure, then it is possible that the stored procedure will use it to generate dynamic SQL via concatenation, even if the program properly parameterizes the procedure call. With the benefits that validation can bring, it is generally wise to validate all user input, even when fully parameterizing database calls and using an account with limited permissions.
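For the table-name case, one hedged sketch of validation in python (the list of legal names and the naming rule are assumptions specific to this example) is to check the value against a whitelist, or failing that a strict pattern, before it ever reaches a concatenated statement:

import re

ALLOWED_TABLES = {'Test', 'ZipCodes', 'Archive2011'}   # illustrative names

def safe_table_name(name):
    # prefer an explicit whitelist; fall back to a strict identifier pattern
    if name in ALLOWED_TABLES:
        return name
    if re.match(r'^[A-Za-z_][A-Za-z0-9_]{0,127}$', name):
        return name
    raise ValueError('Invalid table name: %r' % name)

userInput = 'Test'
sql = 'drop table dbo.' + safe_table_name(userInput)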

Concealing Error Messages

Injection attacks often depend on the attacker having at least some information about the database schema. He often gains this through trial and error, and the error messages will tell the attacker quite a lot about the schema. Both SQL Server and python generally provide clear, informative error messages that are incredibly helpful to programmers, but can also provide information to a malicious user. Pyodbc, in particular, will normally raise a pyodbc.ProgrammingError exception, which helpfully includes the SQL Server error message.

Encasing a python call to SQL Server in a try/except block will enable the program to provide a more user friendly error message, which does not contain useful information for attackers, to the end user. If used along with something like sys.exc_info and a logging package, the except blocks can log all errors for later analysis while displaying a user friendly message to the end user. A very basic example might look like:

import logging
import sys

logFile = 'test.log'
logging.basicConfig(filename=logFile, level=logging.DEBUG)

# establish connection, create faulty SQL, removed for brevity

try:
    curs.execute(sql)
    conn.commit()
except:
    print 'User Friendly Error'
    logging.debug(sys.exc_info())

Of course, to ensure no unfiltered messages get through it is possible to override the standard exception hook like:

import sys

def new_exceptionhandler(type, value, tb):
    # put in custom logging code here
    print 'There was a general error.'

sys.excepthook = new_exceptionhandler

print aVariable  # NameError since aVariable is not defined

This will not have any impact on exception handling code, but it will prevent standard error messages from reaching a user for unhandled exceptions.

Limiting Damage

As well as taking steps to prevent attacks like SQL injection, there are other general security steps that can be taken to limit the damage. Limiting the permissions of the accounts used has been mentioned, as it can stop many attacks outright, but it will also limit the amount of damage that can be done by a successful attack. But there are other methods that can help mitigate the damage done by an attack.

Use encryption/hash functions where appropriate

When data is properly encrypted, it can be made of little value to someone without an encryption key. Cell level encryption can assist with protecting against unauthorized access to sensitive data, and SQL Server has supported it since SQL Server 2005. Transparent Data Encryption (TDE), while useful for protecting the database against other forms of attack, is of very little value against SQL injection.


Passwords in particular should not be stored in clear text, and hashing is generally better than encrypting as it makes it harder to recover the original plaintext. Of course, even hashing may provide only limited security if it is not handled properly. For example, rainbow tables (http://en.wikipedia.org/wiki/Rainbow_table) exist for the MD5 algorithm. They make it relatively practical to determine the original plaintext from a single iteration of MD5 without salting. There are libraries that can make hashing and salting a password relatively easy in python, including hashlib.
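A minimal sketch of salting and hashing with hashlib, in the same python 2 style as the earlier examples, might look like the following; a deliberately slow, iterated scheme such as PBKDF2 or bcrypt would be stronger than a single SHA-256 pass, but the salting idea is the same.

import hashlib
import os

def hash_password(password, salt=None):
    # a fresh random salt per password defeats precomputed rainbow tables
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + password).hexdigest()
    return salt, digest

salt, digest = hash_password('correct horse battery staple')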

Segregate data

Segregating data into different systems, depending on the level of security it needs, can help limit the reach of an attack. It often even makes sense to ensure that truly sensitive data is stored in a way that is not accessible from an outside network. This helps to ensure that even if an attacker compromises a system, it will not immediately lead to the attacker compromising all systems. Of course, it is necessary to ensure that the same log on credentials are not shared between the segregated systems, otherwise if one set of log on credentials is compromised in some way it may lead directly to compromising other systems.

Auditing and Logging

Auditing and logging will never help prevent SQL injection or any other attack. However, they are likely to help detect attacks, and may help in recovering from them. There are a number of tools within SQL Server such as Change Data Capture and SQL Server Audit. Custom written triggers could also be used to monitor and log changes. There are also a number of external tools that can provide more options, such as SQL Compare and SQL Data Compare. The logging library and other similar libraries also make logging from the python application relatively easy.

Conclusion

SQL injection is one of the more common, and more effective, forms of attack on a system. By following principles of secure software design such as parameterizing input to the database, sanitizing and validating user input, and restricting the permissions given to all accounts to the minimum required, it is possible to make it extremely difficult for a SQL injection attack to succeed. Also, by following basic security practices like encrypting sensitive data, segregating data, and maintaining logs, it is possible to limit the amount of damage that even a successful attack can do.

More references

SQL Server Security Workbench Part 1 by Robyn Page and Phil Factor
Database Encryption in SQL Server 2008 Enterprise Edition by Sung Hsueh
SQL Injection! by Christoffer Hedgate
SQL Injection – Part 1 by Randy Dyess
Updated SQL Injection by Michael Coles
Do Stored Procedures Protect Against SQL Injection by Brian Swan
XKCD: Exploits of a Mom by Randall Munroe

© Simple-Talk.com

Page 21: Simple Talk. March 17th 2011 - Redgate · The default trace in SQL Server - the power of performance and security auditing Since the introduction of SQL Server 2005, there is a simple


Supporting Large Scale Team Development
16 March 2011
by Grant Fritchey

With a large-scale development of a database application, the task of supporting a large number of development and test databases, keeping them up to date with different builds, can soon become ridiculously complex and costly. Grant Fritchey demonstrates a novel solution that can reduce the storage requirements enormously, and allow individual developers to work on their own version, using a full set of data.

I recently helped a DBA team that had to maintain about 30 different virtual environments for a large-scale development effort. There were so many different environments because the application in question had a very large number of other systems that it was dependent on, and that were dependent on it. All these cross dependencies meant that the different development teams working on different applications were at varying levels of completion. With all that going on, we had to keep three versions of SQL Server (2000, 2005, and 2008) up-to-date in all 30 environments. There were multiple databases on each version, all at different stages of completion. For any given database, there might be between three and five identical installations, but they varied quite widely across the different environments.

Maintaining the servers, while a serious task that took lots of time, was not very hard. Thanks to the virtual environments, it was really easy to maintain patches and settings across all the environments. The truly difficult part was the two things we had to do to maintain a common set of databases, some of which were refreshed from production, some deployed by the various applications. First, we had to deploy database changes to all those environments. Second, we had to refresh the databases to the different environments, on demand. The term "refresh" in this case simply meant restoring a backup from production to one of the development environments. To say the least, all this could keep a person hopping.

The problem of deploying the databases was largely solved at development time. All databases were built out of source control. This means that techniques similar to those outlined in the Team Development book were followed pretty closely. That only left refreshing the databases on demand.

At the time, the only option we were aware of was to perform full restores of the databases. That meant we spent a lot of time developing automation tasks around those restore operations, in order to attempt to make the refreshes easier and more repeatable. We built routines in PowerShell to check for available disk space prior to a restore, in order to avoid filling up the non-virtual hard drive space that we had to maintain all over the place. We were even working on a mechanism for monitoring database size in production and available disk space in the environments. This would generate reports on a daily basis to show where we were hurting for space, before anyone even attempted a database restore. In short, these processes generated a lot of manual labor, all because we couldn't get away from physical storage despite our virtual environments.

All this was necessary because it's easier for developers to use production data for testing than it is to create and maintain large amounts of very specific test data. Yes, setting up valid business test data is possible (and I'd recommend you check out Red Gate SQL Data Generator for some good ways to automate it), but it is a lot of work, and if you don't do that work as you develop, you have to fall back on production data.

That's where Red Gate SQL Virtual Restore would have come in handy. SQL Virtual Restore can mount a backup file to a server as if it were actually restoring the database. A virtual restore would clearly be useful for all sorts of disaster recovery scenarios, restoring an individual table or row from a backup, reporting, offline consistency checks, and lots more. But, just because you've set up one server to access the backup file, that doesn't mean another server can't access it at the same time. That's right! You can actually have multiple databases on multiple servers at the same time when using SQL Virtual Restore. This isn't just reads. You can do writes too. Here's a quick run through on how easy it is to set up SQL Virtual Restore to have multiple servers accessing a single backup file.

I have one instance of SQL Server 2008 R2 running on my laptop. There is a second instance running within a virtual machine hosted on the laptop. SQL Virtual Restore is installed on both. From there, it's almost fun. First, I take a standard, native, full backup of the AdventureWorksLT database. I'm using the "lite" version of AdventureWorks just because the file management is a lot easier.

BACKUP DATABASE AdventureWorksLT TO DISK = 'c:\share\awlt.bak'

This location on my machine is shared with the network, so I can access it through the UNC path from other machines, including my virtual machine. With the backup in place, I can create a virtual restore of the database. You can launch SQL Virtual Restore from the Red Gate folder in the Start menu, or directly from the Tools menu in SQL Server Management Studio (SSMS) on the machines where you've installed it. The SQL Virtual Restore GUI opens as shown here in Figure 1:


Figure 1: Select SQL Server

You can use this to select the instance you're interested in accessing. I'm going with the defaults since it's my local machine. Clicking on "Next" will move to the "Select backup files" window as shown in Figure 2:

Figure 2: Select backup files

Using the "Add Files…" button at the bottom of the screen will open a standard file selection window. You can use it to navigate to the file location of your backups, whether a UNC or a defined path on the machine that's running the software. SQL Virtual Restore can work with SQL HyperBac or SQL Backup Pro files and, as I'm demonstrating here, native backup files. Once I've selected the backup I created earlier, it appears on the list of available backups, as in Figure 2. Clicking the "Next" button moves me to Step 3, as shown in Figure 3:


Figure 3: Specify destination

The options available here largely define themselves. The first section allows you either to create a new database, the default, or overwrite an existing database. There will be files created by this database, but they are extremely small, 128 KB, demonstrating the best thing about using SQL Virtual Restore: saving tons and tons of disk space.

If the backup being restored has outstanding changes, such as transactions that were committed during the backup process, these will also be written to the virtual files. This may affect the size of those files. Think about the scenario at the beginning of the article, with multiple different environments, each of which had multiple versions of SQL Server, all with databases requiring storage space. Upwards of a terabyte was used just for this development environment. Imagine what could have been done if, instead of full database restores - 5.3 MB each, let's say, the original size of the data file for our test with AdventureWorksLT - it could have just been 128 KB plus the size of the backup. When you consider you might have 3-5 databases, you then start to multiply the savings without having to maintain multiple copies of the backup. Yes, that's a huge saving.

Anyway, back to the task at hand. In the third section, you can have the SQL Virtual Restore process perform DBCC CHECKDB on the database it's creating. Finally, because you can use log backups and differentials with full backups to restore databases to a point in time, you have the option of not performing the recovery action at the end of the restore, which would bring the database online. Keeping the defaults and clicking "Next" will open Step 4 of the process, as shown in Figure 4:


Figure 4: View summary

This fourth step shows what you are about to do in a summary. You can click on the "View Scripts…" button to check out the script being used - more on that in a minute. The summary lists out all the pertinent information for what's about to occur. At the bottom of the screen, the HyperBac Control Service configuration shows that I don't have ".bak" files associated with HyperBac, but that the engine will happily register this backup for me. You can manually configure the HyperBac Service to recognize this type of backup file, but I find that simply allowing the process to configure it for me works well enough in most cases. Clicking "Restore" at this point will create a new database through the restore process - a virtually restored database that is actually still just a backup file.

The question you might be asking yourself is, "How the heck is that happening?" Well, let's take a quick peek at the script used to perform the restore. By the way, because it is done through scripts, it's entirely possible for you to automate this process, so if you need to maintain 30 different environments, you can use this approach. Here's the script:

-- Script generated by Red Gate SQL Virtual Restore v2.2.0.166
RESTORE DATABASE [AdventureWorksLT_Virtual1]
FROM DISK=N'C:\Users\grant\Documents\Share\awlt.bak'
WITH MOVE N'AdventureWorksLT_Data' TO N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.GFR1\MSSQL\Data\AdventureWorksLT_Data_AdventureWorksLT_Virtual1.vmdf',
MOVE N'AdventureWorksLT_Log' TO N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.GFR1\MSSQL\Data\AdventureWorksLT_Log_AdventureWorksLT_Virtual1.vldf',
NORECOVERY, STATS=1
GO
RESTORE DATABASE [AdventureWorksLT_Virtual1] WITH RECOVERY, RESTRICTED_USER
GO
ALTER DATABASE [AdventureWorksLT_Virtual1] SET MULTI_USER
GO

Look through it carefully because, except for a very minor difference, this is just a SQL Server restore operation. Spot the key point? Take a look at the file extensions: .vmdf, not .mdf. What's happening is that the HyperBac service is intercepting this and performing all the work behind the scenes, sort of spoofing SQL Server into thinking it's getting a new database restored, when nothing of the kind is occurring. What you get in the end is a full-fledged SQL Server database; it's just that HyperBac is working at a lower level to manage these particular data files in conjunction with SQL Server. For a more detailed overview of how this works, check out Brad's Sure Guide to SQL Virtual Restore.

Clicking on the "Restore" button will run the process and result in a final screen (Figure 5) that shows the completed process, the sizes of the data, and the space saved:

Figure 5: Performing virtual restore

As shown, only a small amount of space was saved, because we're only talking about a single instance that is being served by this backup file. But, when two or three or five instances are being served by this one backup, then the savings multiply.

Running the exact same process on the virtual machine results in a copy of the database being available there as well. Here's the magic moment. I'll run this simple SELECT statement on both machines:


SELECT * FROM SalesLT.SalesOrderHeader AS soh;

You can see the output of this query on my laptop instance and the virtual instance in Figure 6. The results are identical:

Figure 6: Select statements from two machines

In the foreground is the query and the results in my virtual machine, and behind it you can see the same query run from the other machine. Now, you can get excited, but let me provide some more excitement. I can modify data within this new "database" that I've created. But, and again, this is a magic moment, the updates on one machine are not affecting the other machine. In Figure 7, I've run an update to the data on my laptop. You can see the results of the update on the laptop, but the old data still in place on the virtual machine:

Figure 7: Updates that are different

While this does look like magic, TANSTAAFL (There Ain't No Such Thing As A Free Lunch) still applies. If you recall, a very small set of files is created for these virtual databases. When you begin modifying data within the virtual database, it has to be stored somewhere. Committed transactions are stored to the virtual log and then written to the virtual data file that was used to define your database. The backup is read-only the whole time, so you won't see changes there. As you modify data, these files will grow. So the more you modify your data, the less having a virtual restore will help you. In fact, the more it hurts, since the changes have to be reconciled on the fly with what's in the backup. Like I said, there is no free lunch here, but some very well defined processes with results you can anticipate. To date, Red Gate has tested this approach with 10 databases created from one backup, and more testing is being planned.

Just remember, these types of SQL Virtual Restore uses really shouldn't be considered for permanent databases. In volatile development environments, such as the one I described earlier, virtual databases can work. In a more traditional shared development environment, the disk space saved will be lost quickly as data and structures are modified. If databases are going to be maintained for more than straightforward integration testing, you're much better off looking to save space using SQL Storage Compress. Then you also get all the other benefits that SQL Storage Compress offers.

Back to the multiple environments and multiple database copies for a moment. You can see how, if SQL Virtual Restore had been available, huge amounts of space could have been saved. Instead of having to maintain terabytes for a development environment, gigabytes might have sufficed, more than paying the licensing cost for this software.

It's very easy to automate this process for development environments in general, because you can take advantage of standard T-SQL scripts to activate it. While there is likely to be some contention as multiple servers access data from a single source, that shouldn't be an issue, because this is a solution for a development environment. It could also be used to work with QA or Production Support environments, again giving people access to data without having to maintain large amounts of storage space.

© Simple-Talk.com

Page 27: Simple Talk. March 17th 2011 - Redgate · The default trace in SQL Server - the power of performance and security auditing Since the introduction of SQL Server 2005, there is a simple

Anatomy of a .NET Assembly - PE Headers
15 March 2011
by Simon Cooper

Today, I'll be starting a look at what exactly is inside a .NET assembly - how the metadata and IL is stored, how Windows knows how to load it, and what all those bytes are actually doing. First of all, we need to understand the PE file format.

PE files

.NET assemblies are built on top of the PE (Portable Executable) file format that is used for all Windows executables and dlls, which itself is built on top of the MSDOS executable file format. The reason for this is that when .NET 1 was released, it wasn't a built-in part of the operating system like it is nowadays. Prior to Windows XP, .NET executables had to load like any other executable, and had to execute native code to start the CLR to read & execute the rest of the file.

However, starting with Windows XP, the operating system loader knows natively how to deal with .NET assemblies, rendering most of this legacy code & structure unnecessary. It still is part of the spec, and so is part of every .NET assembly.

The result of this is that there are a lot of structure values in the assembly that simply aren't meaningful in a .NET assembly, as they refer to features that aren't needed. These are either set to zero or to certain pre-defined values, specified in the CLR spec. There are also several fields that specify the size of other data structures in the file, which I will generally be glossing over in this initial post.

Structure of a PE file

Most of a PE file is split up into separate sections; each section stores different types of data. For instance, the .text section stores all the executable code; .rsrc stores unmanaged resources, .debug contains debugging information, and so on. Each section has a section header associated with it; this specifies whether the section is executable, read-only or read/write, whether it can be cached...

When an exe or dll is loaded, each section can be mapped into a different location in memory as the OS loader sees fit. In order to reliably address a particular location within a file, most file offsets are specified using a Relative Virtual Address (RVA). This specifies the offset from the start of each section, rather than the offset within the executable file on disk, so the various sections can be moved around in memory without breaking anything. The mapping from RVA to file offset is done using the section headers, which specify the range of RVAs which are valid within that section.

For example, if the .rsrc section header specifies that the base RVA is 0x4000, and the section starts at file offset 0xa00, then an RVA of 0x401d (offset 0x1d within the .rsrc section) corresponds to a file offset of 0xa1d. Because each section has its own base RVA, each valid RVA has a one-to-one mapping with a particular file offset.
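The arithmetic is easy to sketch in code. The section table below is written by hand to match the numbers used in this article (the section sizes are assumptions made up for the example), so it illustrates the mapping rather than reading it from a real assembly.

# Each entry: (section name, base RVA, section size, file offset).
# Base RVAs and file offsets match the example assembly; sizes are assumed.
SECTIONS = [
    ('.text',  0x2000, 0x2000, 0x200),
    ('.rsrc',  0x4000, 0x2000, 0xa00),
    ('.reloc', 0x6000, 0x1000, 0x1000),
]

def rva_to_file_offset(rva):
    for name, base_rva, size, file_offset in SECTIONS:
        if base_rva <= rva < base_rva + size:
            # The offset within the section is the same on disk and in memory.
            return file_offset + (rva - base_rva)
    raise ValueError('RVA 0x%x does not fall inside any section' % rva)

print(hex(rva_to_file_offset(0x401d)))   # prints 0xa1d, as in the example above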

PE headers

As I said above, most of the header information isn't relevant to .NET assemblies. To help show what's going on, I've created a diagram identifying all the various parts of the first 512 bytes of a .NET executable assembly. I've highlighted the relevant bytes that I will refer to in this post:


Bear in mind that all numbers are stored in the assembly in little-endian format; the hex number 0x0123 will appear as 23 01 in the diagram.

The first 64 bytes of every file is the DOS header. This starts with the magic number 'MZ' (0x4D, 0x5A in hex), identifying this file as an executable file of some sort (an .exe or .dll). Most of the rest of this header is zeroed out. The important part of this header is at offset 0x3C - this contains the file offset of the PE signature (0x80). Between the DOS header & PE signature is the DOS stub - this is a stub program that simply prints out 'This program cannot be run in DOS mode.\r\n' to the console. I will be having a closer look at this stub later on.

The PE signature starts at offset 0x80, with the magic number 'PE\0\0' (0x50, 0x45, 0x00, 0x00), identifying this file as a PE executable, followed by the PE file header (also known as the COFF header). The relevant field in this header is in the last two bytes, and it specifies whether the file is an executable or a dll; bit 0x2000 is set for a dll.
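These few fields are simple enough to pull out of a file by hand. The following sketch, using Python's struct module, checks the 'MZ' and 'PE\0\0' magic numbers, follows the pointer at offset 0x3C, and tests the dll bit just described; the assembly file name is an assumption for the example.

import struct

with open('MyAssembly.exe', 'rb') as f:   # hypothetical assembly
    data = f.read()

# DOS header: starts with the magic number 'MZ'.
assert data[:2] == b'MZ'

# Offset 0x3C of the DOS header holds the file offset of the PE signature.
pe_offset = struct.unpack_from('<I', data, 0x3C)[0]
assert data[pe_offset:pe_offset + 4] == b'PE\x00\x00'

# The PE file (COFF) header follows the 4-byte signature; Characteristics is
# the last two bytes of that 20-byte header. Bit 0x2000 marks a dll.
characteristics = struct.unpack_from('<H', data, pe_offset + 4 + 18)[0]
print('dll' if characteristics & 0x2000 else 'exe')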

Next up are the PE standard fields, which start with a magic number of 0x010b for x86 and AnyCPU assemblies, and 0x20b for x64 assemblies. Most of the rest of the fields are to do with the CLR loader stub, which I will be covering in a later post.

After the PE standard fields come the NT-specific fields; again, most of these are not relevant for .NET assemblies. The one that is relevant is the highlighted Subsystem field, which specifies if this is a GUI or console app - 0x2 for a GUI app, 0x3 for a console app.

Data directories & section headers

After the PE and COFF headers come the data directories; each directory specifies the RVA (first 4 bytes) and size (next 4 bytes) of various important parts of the executable. The only relevant ones are the 2nd (Import table), 13th (Import Address table), and 15th (CLI header). The Import and Import Address tables are only used by the startup stub, so we will look at those later on. The 15th points to the CLI header, where the CLR-specific metadata begins.

After the data directories come the section headers, one for each section in the file. Each header starts with the section's ASCII name, null-padded to 8 bytes. Again, most of each header is irrelevant, but I've highlighted the base RVA and file offset in each header. In the diagram, you can see the following sections:

1. .text: base RVA 0x2000, file offset 0x200
2. .rsrc: base RVA 0x4000, file offset 0xa00
3. .reloc: base RVA 0x6000, file offset 0x1000

The .text section contains all the CLR metadata and code, and so is by far the largest in .NET assemblies. The .rsrc section contains the data you see in the Details page in the right-click file properties page, but is otherwise unused. The .reloc section contains address relocations, which we will look at when we study the CLR startup stub.

What about the CLR?

As you can see, most of the first 512 bytes of an assembly are largely irrelevant to the CLR, and only a few bytes specify needed things like the bitness (AnyCPU/x86 or x64), whether this is an exe or dll, and the type of app this is. There are some bytes that I haven't covered that affect the layout of the file (eg. the file alignment, which determines where in a file each section can start). These values are pretty much constant in most .NET assemblies, and don't affect the CLR data directly.

Conclusion

To summarize, the important data in the first 512 bytes of a file is:

1. DOS header. This contains a pointer to the PE signature.
2. DOS stub, which we'll be looking at in a later post.
3. PE signature
4. PE file header (aka COFF header). This specifies whether the file is an exe or a dll.
5. PE standard fields. This specifies whether the file is AnyCPU/32bit or 64bit.
6. PE NT-specific fields. This specifies what type of app this is, if it is an app.
7. Data directories. The 15th entry (at offset 0x168) contains the RVA and size of the CLI header inside the .text section.
8. Section headers. These are used to map between RVA and file offset. The important one is .text, which is where all the CLR data is stored.



Game-over! Gaining Physical access to a computer
15 March 2011
by Wesley David

Security requires defense in depth. The cleverest intrusion detection system, combined with the best antivirus, won't help you if a malicious person can gain physical access to your PC or server. A routine job, helping to remove a malware infection, brings it home to Wesley just how easy it is to get a command prompt with SYSTEM access on any PC, and inspires him to give a warning about the consequences.

Because we are so aware of the sophistication of some of the 'Data-raids' and identity-thefts by cyber-criminals and malign hackers over the internet, we can sometimes forget that anyone who is able to gain physical access to a Windows machine can then access just about anything on the machine that the administrator can get to, unless it is encrypted. Even the term 'physical access' is getting to be rather a blurred concept, since something like physical access can be gained from thousands of miles away with such wonders as KVMoIP (most servers have DRAC or ILO or something similar), and even the advent of the more advanced Intel vPro/AMT chipsets gives standard desktop and laptop PCs lights-out management.

There is a difference between knowing how insecure a password system is, and experiencing it. I was recently given a family member's laptop to scrape clean of a rather nasty malware infection. After making a complete image of the hard drive for safekeeping, I booted into Windows and then stared at the login screen when I realized that I didn't know their password. Could I call them? Were they busy? I hate asking for people's passwords anyway.

My first thought was to look for one of my boot disks which have a few tools that allege to be able to overwrite / reset the NTLM hashes. In my experience, results with those are a bit 'hit and miss'. I could possibly brute force the hashes with Cain and Abel. I doubt this person's password would make Bruce Schneier very proud. Meh... that just seemed too complicated. Then, in the darker recesses of my memory came the remembrance of an old stand-by method for getting into a password protected Windows machine. I tried it and it worked beautifully. It was also ridiculously simple. I'd like to share it with you, since it has been public, published knowledge for a long time, but before I do let me make note of a few important points.

First, you need physical access to the PC. This is not some super-secret "haxor" method to recover passwords across a network. You need to be able to touch the PC and also have some kind of boot disc that allows you to have access to the PC's file system. For the record, it's game over for any PC that you have physical access to, so don't feel too proud about successfully pulling this method off.

Second, this will not recover a password. This resets the password. You will not be able to find what the unknown password is; you will only be able to overwrite the existing password with the password of your choice. This is very important because resetting the password will render all EFS encrypted files and folders unrecoverable unless you reset the password back to what it was prior. You have been warned.

Third, this method is specifically to reset a local Windows account and not a domain account.

Fourth, this is not new, I did not discover this, this is not original research, and you can find plenty of other tutorials on the internet about exactly this method. I decided to add my own contribution to the subject for two reasons. First, to help other technologists out by adding more quality content to Google (these instructions can be found on some rather dicey websites). Second, to directly impact many of my colleagues who expressed that they had never heard of any such "accessibility tools exploit". This method is already well-known to all the villains.


Lastly, please use this information for good and not evil. Of course, if you perform a corporate or governmental caper with this knowledge, I am unlikely to feel obliged to respond to any court summonses that your defense team sends to me.

The Overview

Have you ever looked in the lower right corner of the Windows login screen and seen a little button that looks like this:

Pressing that icon allows you to launch various accessibility tools. The vulnerability that exists centers on Windows's behavior of launching the accessibility tools as the SYSTEM user. For example, if you launch the On Screen Keyboard, osk.exe is launched as the SYSTEM user and thus has total administrative authority. The magnifier is the same way:

Further compounding the problem of how Windows launches these accessibility tools is that it is purely based on the filename and not some form of the file's hash value. The login screen merely searches the System32 folder for a file called "magnify.exe" (or whatever other accessibility tool you choose) and launches it. The trick is to replace magnify.exe with a more useful tool for our password resetting purposes. A tool like... oh... cmd.exe.

Let's begin.

The "Hack"

Step 1:

Acquire offline access to the filesystem of the PC that needs a password reset. This can mean booting from a live CD of some kind, or physically extracting the hard drive of the PC in question and slaving it to another PC. You can use virtually any Windows installation disc to perform this operation on virtually any other version of Windows. You could also use a simple Linux live CD or an old-school bootable floppy.

Assuming that you now have offline access to the filesystem, we can progress forward.

Step 2:

Navigate to the %systemdrive%\WINDOWS\System32 folder and rename one of the accessibility tools to get it out of the way. I typically rename magnify.exe to something like magnify.exe.bak.

Next, make a copy of cmd.exe and name it magnify.exe. The command sequence would be something like this if you're using a Windows command prompt:

Ren c:\windows\system32\magnify.exe c:\windows\system32\magnify.exe.bak
Copy c:\windows\system32\cmd.exe c:\windows\system32\magnify.exe

Step 3:

Boot the PC up and wait for the login screen to appear.

Step 4:

At the login screen, click the accessibility icon and choose the accessibility tool that you renamed and replaced with cmd.exe. In my case, it's the magnifier.


A command prompt will pop up running as the SYSTEM user so you pretty much own the OS from stem to stern at this point.

Step 5:

Now you can change the password with NET USER [username] *

Step 6:

Log in with the newly chosen password!

That's all there is to it. If done swiftly with the right boot medium (a minimalist Linux shell, for example), it shouldn't take more than three minutes to change any password on a Windows PC. But remember, as Uncle Stan Lee said, "With great power, there must also come – great responsibility!" Use your knowledge for good.

The Evil Within

I did this for the best of reasons, and pretty well all IT people will be in the same boat. However, if someone who doesn't have your best interests at heart uses the same technique, what then?

Sure, you can now reset the password for an administrator account. Of course, the better option for nefarious purposes would be to create a new account so as not to arouse suspicion when someone inevitably notices that a known local account has had a password change. However, what if someone is bent on doing long term damage? What could be done from here to gain a deep foothold in that server and, by consequence, the network that it is on?

The most obvious option would be for someone to install a rootkit on the server. Antivirus currently installed? Oh please. Remember, there is no account more powerful than SYSTEM, so any files that a rootkit would need to touch will bow in obeisance to the wishes of your command prompt. (This is also ignoring the fact that if you have physical access, you could conceivably boot up from media intended to infect the offline file system, making the need to use a SYSTEM command prompt extraneous.)

Any rootkit worth its weight in stolen credit card numbers will sufficiently cover its tracks and obscure the true malignant processes running underneath everything. From there, any amount of destruction can be wrought.

A complication of installing a rootkit is that you would have to acquire a rootkit, and that acquisition might not be so easy for certain classes of would-be attackers. For those attackers that don't have deep roots in the malware underground, plenty of other tools abound! For example, just about every Windows admin knows about the Sysinternals Suite, written by Mark Russinovich. One of the tools that could be used would be PsExec. This handy scrap of code allows for the execution of a program on a remote PC. Knowing the credentials of an administrative user account on the attacked server, the remote attacker could easily launch a basic Windows command prompt running on the remote machine. As an aside, setting up an SSH server on a Windows machine is easy to do as well.

Another option would be to redirect all network traffic to a remote PC. Certainly there's quite a stream of valuable 1s and 0s traversing your network cards, be they LAN networks or SANs. Imagine a constantly flowing mirror of your data being collected and reassembled in real time on a remote machine. Or don't imagine it; you might not sleep tonight.

Perhaps an attacker doesn't have any outside tools to install. Not to worry, for Windows has quite a number of inbuilt opportunities for mischief. An attacker could simply enable WinRM and have PowerShell at his remote beck and call. If it's an older operating system that doesn't support WinRM, that's almost better! An attacker could install a Windows component using the add/remove programs control panel that has a known vulnerability. From there, he could quietly slip away and remotely use an exploit database (Metasploit, for example) to deliver a poisoned payload and insert any arbitrary code that he wishes. Installing a Windows Server 2000 or 2003 version of the FTP service could result in a full VNC server being installed due to a buffer overflow attack allowing for arbitrary code execution.

This is all assuming that the attacker wants continued access to the server. Perhaps they just want some simple data. How about some juicy SQL Server databases? Certainly an attacker who had physical access could merely keep the server booted into a live CD long enough to copy the databases off. But perhaps the databases are rather large, like many are, and the server being down that long would arouse enough suspicion to cause a server-room inspection. By executing this accessibility tools exploit, the server should only be down for a few minutes.

Now, with either a command prompt or some form of remote tool installed, it's trivial to start a virtually unnoticed copy procedure. With admin rights, one might be able to gain privileges to the SQL Server depending on the authentication method in place. Then a backup or replication procedure could be started so that data is copied while the server can still process transactions. If you can't get to SQL Server's own backup commands, then you could check for any third party backup tools and attempt to swashbuckle that system into handing you a copy of the data. Lastly, if you can't get a backup or replication going, then you'll have to take your chances with bringing the service down and copying the MDF and LDF files right out of the SQL Server directory.

Encryption of sensitive files is an essential defense that would mitigate some possible attack vectors. You can even encrypt all of your live database files with a product like, oh... say, SQL Storage Compress. But I digress...

What else can one say? Evil is as evil does, and evil loves some physical access to your servers. In the end, protect your servers physically, and that includes KVMoIP access, as if you were guarding crates of nitroglycerin balanced on a Jenga tower made of golf balls (don't ask me how that works).

© Simple-Talk.com



Defining .NET Components with Namespaces
14 March 2011
by Patrick Smacchia

A .NET software component is a compiled set of classes that provide a programmable interface that is used by consumer applications for a service. As a component is no more than a logical grouping of classes, what then is the best way to define the boundaries of a component within the .NET framework? How should the classes inter-operate? Patrick Smacchia, the lead developer of NDepend, discusses the issues and comes up with a solution.

Contents

Defining components inside .NET assemblies
Using namespaces to define components
Size of components
Structuring larger components
Structuring code with Mediator, Feature and Base components
Acyclic graph of dependencies between components
'Levelizing' existing code is often cheaper than expected
Evolutionary Design and Acyclic componentization
Guidelines

This article will attempt to explain the advantages of using namespaces to define the boundaries of components in a .NET assembly, to suggest the best ways of using namespaces as components, and to describe how to continuously enforce the relevant rules of namespace dependency.

My aim is to assist a .NET development shop to rationalize the development of a large code base, so as to let the developers become productive, and to reduce the cost of maintenance.

The advice I'll give in this article comes from experience gained in years of real-world consulting and development in various development corporations. It has proved to be effective several times, in different circumstances.

In my article Partitioning code base through .NET assemblies and Visual Studio Projects we focused on assemblies, and the reasons why it is more convenient to have fewer and larger assemblies. I suggested how to organize the VS solutions and VS projects. We mentioned that a component is a finer-grained logical concept than a physical assembly. This implies that a large assembly is not a monolithic piece of code, but is likely to contain several components.

Defining components inside .NET assemblies

The .NET platform has no intrinsic means of defining a component inside an assembly. So what is a component? There are whole books that are dedicated to explaining what a component is but, to keep things simple and practical, we'll list some definitions. A component can be:

An aggregate of classes and associated types: A class is mostly too lightweight to define a component. Therefore a component will almost always include several classes or interfaces and their associated types (enumerations, structures, exceptions, attributes, delegates…). All these types are grouped by a common, well-understood semantic concept that gives its name to the component.
A unit of development and test: The notion of a component is well-suited to the way that code is developed. When adding a new feature to the product, a developer generally dedicates time to develop a new set of cohesive classes and their associated tests. This represents a component, or more likely a group of related components. Because of this, the organization of components often mirrors the set of features; but a component can also host some infrastructure classes, such as domain classes or helpers and utility classes.
A unit of learning for developers: It is a daunting task to have to reverse-engineer a large code base. It helps to use components to partition the code into reasonably-sized chunks that can be learned more easily than if they were entangled within a monolithic style code base.
A unit of architecture: One of the most challenging tasks for any real-world code-base development is to master the entropy that necessarily arises in such a complex system. We can do this by imposing an architecture on a code base through a well-defined set of components. This is an adaptation of the 'Divide and Conquer' principle introduced in 1628 in the Discourse of Method of René Descartes, inventor of modern science. Relying on well-defined units to divide and conquer represents the way that scientists and engineers have worked over the ages to elucidate complex systems.
A unit of layering: In order to rationalize a complex system, it is not enough just to have well-defined components: they must be properly layered. In other words, the graph of dependencies between components must be a Directed Acyclic Graph (DAG). If the graph of dependencies between components contains a cycle, components involved in the cycle cannot be developed and tested independently. Because of this, the cycle of components represents a super-component, with higher entropy than the sum of the entropies of its contained components.


For this article, we'll use the concept of a component as being a well-defined aggregate of types, with a reasonable size, and with an acyclic graph of dependencies between components.

Using namespaces to define components

A component is well-defined if its boundaries are explicit. Only two .NET language constructs can be used to explicitly define component boundaries and contain a set of children types: parent class and namespace.

The advantage of a parent class over a namespace is that nested classes can be declared with a visibility level. This makes it possible to have class encapsulation in the component. On the other hand, the advantage of a namespace over a parent class is that the namespace is a language artifact and not a CLR artifact. Namespaces are absent from assembly metadata. The namespace name is just a prefix added to contained classes. This means that a namespace is more lightweight than a parent class. Also, when using a parent class, there is a risk of confusion between a parent class and an application class. On the other hand, there is no risk of confusing a namespace with a class.

We therefore prefer to use namespaces to define explicit component boundaries. Namespaces are also often used to organize public API presentation. This usage of namespaces fits nicely with the concept of components as we've shown: component organization often mirrors the set of features, but a component can also host some infrastructure classes.

Size of components

We measure the size of a component by counting the lines of code. A logical Line of Code (LoC) represents a Sequence Point. A sequence point is the code excerpt highlighted in dark red in the VS code editor window, when creating a breakpoint. Most .NET tools for developers, including VS and NDepend, measure Lines of Code through sequence points.

The size of a component must be reasonable, between 500 and 2,000 LoC. Indeed, we suggested already that a component is a unit of learning and a unit of architecture. A component must never be too large to be reviewed and understood. An architecture that is made of components that are too large becomes coarse, and leads to a monolithic code style and uncontrolled entropy.

The size of 500 to 2,000 LoC is just an inferred guideline based on real-world observations. 500 to 2,000 LoC usually represents from one to two dozen classes. In the case of an abstract component, made of interfaces and enumerations, the number of LoC can be zero. You'll agree that LoC is not the appropriate metric for abstract component size.

Structuring larger components

If a larger component is required, then the component classes should be divided within sub-namespaces. Because namespaces can be represented through a hierarchy, we can use this as a convenient way to partition super-components into smaller components.

For example, the Dependency Structure Matrix (DSM) of NDepend is a complex feature that weighs around 5,000 LoC. Because of this, the DSM implementation resides in a namespace that contains seven sub-namespaces. The sub-namespaces correspond to sub-features of the DSM, such as Header handling and Cells Computation. The screenshot below shows this organization through the DSM itself and through a graph of dependencies:

We can see that the graph is acyclic and that the parent namespace is using all child namespaces. This illustrates a recurring pattern where:

The parent component plays the role of a mediator. Most of the sub-namespaces don't know about other sub-namespaces. This is done to reduce any chance of coupling. The mediator is responsible for making sub-components communicate.
There is a base sub-component (the Base namespace) that is used by child sub-components and by the parent component. The base component mostly defines shared interfaces, enumerations and data classes that model the component domain. In our case it defines concepts such as matrix cell, matrix row/column header, dependency and dependency cycle in matrix, or matrix display settings.
It is not shown here, but the rest of the program only knows about the parent component. Hence using the namespace hierarchy is also a way to encapsulate some component implementation details. Here the parent namespace has two responsibilities: it defines the public surface visible from the rest of the program and it implements the role of a mediator between sub-components. It could have been even cleaner to separate these two responsibilities into:

the parent namespace, which would define the component's public surface.
a single sub-namespace named Impl, which contains the mediator code and the other sub-namespaces.


Structuring code with Mediator, Feature and Base components

A program code base is likely to be composed of large components, each one of which will probably contain sub-components that are only needed for their own private implementations. This idea is illustrated by the DSM below, which shows the architecture of the NDepend UI code (> 50K LoC). There are many large components in the list; the .* suffix on a namespace name signifies that it contains sub-components.

The two vertical lines on the KernelImpl.* and MainPanel.* namespaces (columns 1 and 3) indicate that these two components act as mediators between lower level components. Hence the other columns are pretty empty, meaning that components that represent features (Matrix, Graph, CQL …) are independent from each other.

The three rows 22, 23 and 24 are almost full. They represent base components that support all higher level components. Obviously this high-level architecture and the finer-grained architecture of the Matrix components are pretty similar. The same pattern is applied at two different scales. In both cases there are:

Feature Components: Independent components that contain the features' implementation.
Mediator Components: A few high-level components that act as mediators between feature components. The mediator contains the plumbing needed to make features communicate with each other.
Base Components: A few low-level components that implement the domain of the application. Base components' classes are shared amongst feature components.

The main benefits of classifying components as feature, mediator or base are that:

The clarity of the overall architecture is not blurred by the implementation details.
The developer can zoom in on the program structure to understand it.
The structure favors a low incidence of coupling between implementation details (this is the Low Coupling pattern).
The implementation of a particular feature is thoroughly nested in a well-identified root namespace (this is the High Cohesion pattern).

Acyclic graph of dependencies between components

By looking back at the DSM representing the NDepend.UI structure, we can see that the upper triangle above the matrix's diagonal is empty. Each component can be used by components above it, and can use components below it. Hence the structure is perfectly layered. It is important to remember that:

The graph of dependencies between components is acyclic if, and only if, its DSM representation has its upper triangle empty.

Since components are layered, each component can have a level index. Hence, we say that when an architecture has no dependency cycles, it is levelized. We have already tried to explain that a levelized architecture is essential because:

'If the graph of dependencies between components contains a cycle, components involved in the cycle cannot be developed and tested independently. Because of this, the cycle of components represents a super-component, with higher entropy than the sum of the entropies of its contained components.'
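As a small illustration of what a level index means, the sketch below (in Python rather than CQL, and with an invented dependency map, not NDepend's real structure) assigns a level to each component and refuses to continue if it runs into a cycle.

# Map each component to the components it uses; the names are illustrative.
DEPENDS_ON = {
    'Base':      [],
    'Matrix':    ['Base'],
    'Graph':     ['Base'],
    'MainPanel': ['Matrix', 'Graph', 'Base'],   # the mediator sits on top
}

def level_of(component, seen=()):
    if component in seen:
        raise ValueError('dependency cycle through %s' % component)
    deps = DEPENDS_ON[component]
    if not deps:
        return 0                                # base components are level 0
    return 1 + max(level_of(d, seen + (component,)) for d in deps)

for name in sorted(DEPENDS_ON, key=level_of):
    print(level_of(name), name)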

Every developer has the instinct to lean toward an acyclic structure. This natural tendency explains the popularity of Visual Studio (VS) solutions with dozens of small VS projects. VS detects, and prevents, dependency cycles between VS projects; therefore developers see in the VS project the ideal device with which to implement the idea of levelized components. Unfortunately, doing so is far from ideal, and we explain in the article Partitioning code base through .NET assemblies and Visual Studio Projects all the real-world problems of this approach.

The tool NDepend offers a simple way of checking for dependency cycles between namespaces of an assembly. The following rule, written with Code Query Language (CQL), is all that one needs to be advised of a broken architecture.

If an assembly is matched, just right-click the assembly in the list of matches and click "View internal dependency cycles on matrix":

The matrix will make the dependency cycle obvious with a red square. Here, for the requirements of this article, we contrived it so that the two namespaces GraphPanel and KernelInterface are mutually dependent. This provokes five components to be entangled in a dependency cycle, hence the red square on the DSM that encompasses these five namespaces:

'Levelizing' existing code is often cheaper than expected

Having a levelized structure between namespaces is enough to keep the architecture clean and maintainable. We've often noticed that the code structure naturally tends towards being levelized. This is because of the developers' instinct for the notion of layers, which we already mentioned.

In the DSM above, we see the namespace Base at the lowest level in the architecture. For most developers, it would seem unnatural and awkward to create a dependency from Base to any other component. Base's concepts are not supposed to use anything; they are there to be used by other components.

Often, the code structure is naturally close to levelized, but not thoroughly levelized. Tooling is needed to prevent the few wrong dependencies (like from Base to something else) that appear with time. Usually there are not so many of these dependencies to fix. Because of this, the rule rather than the exception is that 'levelizing' an existing code base is a cheap process that can be achieved in a few days of work.

Evolutionary Design and Acyclic componentization

If the code structure is kept levelized, low-level components never get a chance to bubble up in the architecture, not because someone decided so, but because the components above won't let them bubble up. As in traditional building architecture, the structure itself puts the pressure on low level components. As long as the acyclic components constraint is continuously respected, the code base remains highly learnable and maintainable.

In traditional building architecture, the force of gravity puts the pressure on low level artifacts. This makes them more stable: 'stable' in the sense that they are hard to move.
In software architecture, abiding by the acyclic component idea puts the pressure on low level components. This makes them more stable, in the sense that it is painful to refactor them. Empirically, abstractions are less often subject to refactoring than implementations. It is, for this reason, a good idea that low-level components contain mostly abstractions (interfaces and enumerations), to avoid painful refactoring.

The beauty of levelized architecture is that it discards the need for most design decisions. This lack of up-front design is known as evolutionary design. Let's quote Martin Fowler on evolutionary design:

With evolutionary design, you expect the design to evolve slowly over the course of the programming exercise. There's no design at the beginning. You begin by coding a small amount of functionality, adding more functionality, and letting the design shift and shape.

With levelized evolutionary design, good design is implicitly and continuously maintained. There are no questions about what to do to implement a new requirement. When planning new code to implement a requirement that is unpredicted, one just has to consider its fan-in and fan-out (who will use this new code, and who this new code will use). From this information, and from the need to preserve levelization, one can infer the level and the right location where this new code will fit well. Sometimes the need to introduce abstractions through a pattern like injection of code or inversion of dependency will arise, but only to preserve levelization, not because it seems cool to do so or the new fashionable pattern book advises it. And release after release, iteration after iteration, the design will evolve seamlessly toward something continuously flawless and unpredictable. As in traditional building architecture, the structure won't collapse.

Guidelines

Use the concept of namespace to define the boundaries of components.
A namespace typically contains from one to two dozen types, and has a reasonable size that fits in the 500 to 2,000 LoC range.
Take the time to levelize your code-base components; it is certainly a cheaper task than expected, so the Return On Investment will be high.
Continuously check that the components' dependency graph inside an assembly is acyclic.
If a component is too large (> 2,000 LoC), then use sub-namespaces to divide it into a smaller set of related components.
At any scale, classify components between high-level mediators, middle-level independent features, and low-level base/domains.
Having a 'levelized' set of components removes the need for most design decisions.

© Simple-Talk.com



The default trace in SQL Server - the power of performance and security auditing
14 March 2011
by Feodor Georgiev

Since the introduction of SQL Server 2005, there is a simple lightweight trace that is left running by default on every SQL Server. This provides some very valuable information for the DBA about the running server, but it isn't well-documented. Feodor reveals many of the secrets of this facility and shows how to get reports from it.

SQL Server provides us with a variety of tools for auditing. All of them have their advantages and pitfalls. The default trace, introduced in SQL Server 2005, has the great advantage of being switched on by default, and is usually already there to use. It provides comprehensive information about changes in the system.

Firstly, let’s start by answering some basic questions:

What is the default trace? The default trace is enabled by default in SQL Server and is a minimum-weight trace which consists, by default, of five trace files (.trc) located in the SQL Server installation directory. The files are rolled over as time passes.

How do we know that the default trace is running? We can run the following script in order to find out if the default trace is running:

SELECT * FROM sys.configurations WHERE configuration_id = 1568

If it is not enabled, how do we enable it? We can run this script in order to enable the default trace:

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'default trace enabled', 1;
GO
RECONFIGURE;
GO

What is logged in the Default Trace? If we open the Default trace file in Profiler and look at the trace definition, we will see that events in 6 categories are captured: Database, Errors and Warnings, Full-Text, Objects, Security Audit and Server. Also, all available columns are selected for every sub-event.

Figure 1: This is what the Default trace looks like

So, how can we benefit from each audited category? In the following sections I will explain briefly what each category means, as well as some of the sub-events, and will provide essential scripts for auditing the events in the Default Trace.

Database Events


Let's start with the first event category: Database. As we can see, the sub-events are pretty much self-explanatory – the growth and shrinkage of data and log files, together with changes in mirroring status. It is important to monitor file growths and shrinkages; it would be a vast topic to explain why, but in a nutshell, it is because of possible performance issues. Every time a file is grown or shrunk, SQL Server will halt and wait for the disk system to make the file available again. And halt, in this case, means halt: no transactions are processed until the action is completed.

These are the database events that are monitored:

Data file auto grow
Data file auto shrink
Database mirroring status change
Log file auto grow
Log file auto shrink

Here is a script which will list the data file growths and shrinkages:

SELECT  TE.name AS [EventName] ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.Duration ,
        t.StartTime ,
        t.EndTime
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
WHERE   te.name = 'Data File Auto Grow'
        OR te.name = 'Data File Auto Shrink'
ORDER BY t.StartTime ;

The output of the script will not tell you why the database grew, but will show you how long it took to grow. (Be careful about the value of the Duration column, since it might be in milliseconds or in microseconds, depending on the SQL Server version.)

Also, I would recommend extending this query to search for databases which took longer than, say, a second to grow (this is just a guideline).
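As a hedged sketch of that suggestion (it assumes the Duration column is reported in microseconds on this build, so one second is 1,000,000; adjust the threshold if your version reports milliseconds), the growth query can be filtered and sorted by duration:

SELECT  TE.name AS [EventName] ,
        T.DatabaseName ,
        t.Duration ,
        t.StartTime ,
        t.EndTime
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
WHERE   ( TE.name = 'Data File Auto Grow'
          OR TE.name = 'Data File Auto Shrink' )
        -- one second, assuming Duration is in microseconds here
        AND t.Duration > 1000000
ORDER BY t.Duration DESC ;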

Here is another query, which will return the log file growth and shrink events:

SELECT  TE.name AS [EventName] ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.Duration ,
        t.StartTime ,
        t.EndTime
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
WHERE   te.name = 'Log File Auto Grow'
        OR te.name = 'Log File Auto Shrink'
ORDER BY t.StartTime ;

Also keep in mind that the query will not tell you if your junior DBA has been shrinking the data and log files. In the default trace we can find only the AUTO growth and shrink events, and not the ones triggered by the ALTER DATABASE statement.

Errors and Warnings

Now let’s move on to the next section of the events: the Errors and Warnings. As we can see, there is an abundance of information here.


The Errorlog sub-event occurs when something is written to the SQL Server error log; Hash and Sort warnings generally happen when a sort or a hash match operation is spilled to disk (and since the disk subsystem is the slowest, our queries become much slower). Missing column statistics events will occur only when the 'Auto create statistics' option is set to off; in this case SQL Server indicates that it might have chosen a bad execution plan. The missing join predicate event occurs when two tables do not have a join predicate and both tables have more than one row. This can result in long-running queries or unexpected results.

These categories of errors and warnings are:

Errorlog
Hash warning
Missing Column Statistics
Missing Join Predicate
Sort Warning

Here is a script which will outline the errors:

SELECT  TE.name AS [EventName] ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime ,
        t.TextData ,
        t.Severity ,
        t.Error
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
WHERE   te.name = 'ErrorLog'

Note that this script has neither EndTime nor Duration columns, for obvious reasons.

Here is another script which will outline the sort and hash warnings:

SELECT  TE.name AS [EventName] ,
        v.subclass_name ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
        JOIN sys.trace_subclass_values v ON v.trace_event_id = TE.trace_event_id
                                            AND v.subclass_value = t.EventSubClass
WHERE   te.name = 'Hash Warning'
        OR te.name = 'Sort Warnings'

… and finally, one more script which outlines the missing statistics and join predicates:

SELECT  TE.name AS [EventName] ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
WHERE   te.name = 'Missing Column Statistics'
        OR te.name = 'Missing Join Predicate'

The Full Text Events

The Full-Text event category shows information about the Full-Text population events: if a population is aborted, then you should look in the event log for a more detailed message; the FT Crawl Started sub-event indicates that the population request has been picked up by the workers; FT Crawl Stopped indicates either a successful completion or a stop due to an error.

Full-Text events are...

FT Crawl Aborted
FT Crawl Started
FT Crawl Stopped

Here is a script which will return the Full text events:

SELECT  TE.name AS [EventName] ,
        DB_NAME(t.DatabaseID) AS DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime ,
        t.IsSystem
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
WHERE   te.name = 'FT:Crawl Started'
        OR te.name = 'FT:Crawl Aborted'
        OR te.name = 'FT:Crawl Stopped'

Notice that the records in the DatabaseName column are null, so we have to get the database name from the DB_NAME() function.

Object events

Here is where the real detective work starts: changes to objects. In this category we have altered, created and deleted objects, and this includes anything from index rebuilds and statistics updates to database deletion.

Object events include:

Object Altered
Object Created
Object Deleted

Here is a script which will give you the most recently manipulated objects in your databases.

SELECT  TE.name ,
        v.subclass_name ,
        DB_NAME(t.DatabaseId) AS DBName ,
        T.NTDomainName ,
        t.NTUserName ,
        t.HostName ,
        t.ApplicationName ,
        t.LoginName ,
        t.Duration ,
        t.StartTime ,
        t.ObjectName ,
        CASE t.ObjectType
          WHEN 8259 THEN 'Check Constraint'
          WHEN 8260 THEN 'Default (constraint or standalone)'
          WHEN 8262 THEN 'Foreign-key Constraint'
          WHEN 8272 THEN 'Stored Procedure'
          WHEN 8274 THEN 'Rule'
          WHEN 8275 THEN 'System Table'
          WHEN 8276 THEN 'Trigger on Server'
          WHEN 8277 THEN '(User-defined) Table'
          WHEN 8278 THEN 'View'
          WHEN 8280 THEN 'Extended Stored Procedure'
          WHEN 16724 THEN 'CLR Trigger'
          WHEN 16964 THEN 'Database'
          WHEN 16975 THEN 'Object'
          WHEN 17222 THEN 'FullText Catalog'
          WHEN 17232 THEN 'CLR Stored Procedure'
          WHEN 17235 THEN 'Schema'
          WHEN 17475 THEN 'Credential'
          WHEN 17491 THEN 'DDL Event'
          WHEN 17741 THEN 'Management Event'
          WHEN 17747 THEN 'Security Event'
          WHEN 17749 THEN 'User Event'
          WHEN 17985 THEN 'CLR Aggregate Function'
          WHEN 17993 THEN 'Inline Table-valued SQL Function'
          WHEN 18000 THEN 'Partition Function'
          WHEN 18002 THEN 'Replication Filter Procedure'
          WHEN 18004 THEN 'Table-valued SQL Function'
          WHEN 18259 THEN 'Server Role'
          WHEN 18263 THEN 'Microsoft Windows Group'
          WHEN 19265 THEN 'Asymmetric Key'
          WHEN 19277 THEN 'Master Key'
          WHEN 19280 THEN 'Primary Key'
          WHEN 19283 THEN 'ObfusKey'
          WHEN 19521 THEN 'Asymmetric Key Login'
          WHEN 19523 THEN 'Certificate Login'
          WHEN 19538 THEN 'Role'
          WHEN 19539 THEN 'SQL Login'
          WHEN 19543 THEN 'Windows Login'
          WHEN 20034 THEN 'Remote Service Binding'
          WHEN 20036 THEN 'Event Notification on Database'
          WHEN 20037 THEN 'Event Notification'
          WHEN 20038 THEN 'Scalar SQL Function'
          WHEN 20047 THEN 'Event Notification on Object'
          WHEN 20051 THEN 'Synonym'
          WHEN 20549 THEN 'End Point'
          WHEN 20801 THEN 'Adhoc Queries which may be cached'
          WHEN 20816 THEN 'Prepared Queries which may be cached'
          WHEN 20819 THEN 'Service Broker Service Queue'
          WHEN 20821 THEN 'Unique Constraint'
          WHEN 21057 THEN 'Application Role'
          WHEN 21059 THEN 'Certificate'
          WHEN 21075 THEN 'Server'
          WHEN 21076 THEN 'Transact-SQL Trigger'
          WHEN 21313 THEN 'Assembly'
          WHEN 21318 THEN 'CLR Scalar Function'
          WHEN 21321 THEN 'Inline scalar SQL Function'
          WHEN 21328 THEN 'Partition Scheme'
          WHEN 21333 THEN 'User'
          WHEN 21571 THEN 'Service Broker Service Contract'
          WHEN 21572 THEN 'Trigger on Database'
          WHEN 21574 THEN 'CLR Table-valued Function'
          WHEN 21577 THEN 'Internal Table (For example, XML Node Table, Queue Table.)'
          WHEN 21581 THEN 'Service Broker Message Type'
          WHEN 21586 THEN 'Service Broker Route'
          WHEN 21587 THEN 'Statistics'
          WHEN 21825 THEN 'User'
          WHEN 21827 THEN 'User'
          WHEN 21831 THEN 'User'
          WHEN 21843 THEN 'User'
          WHEN 21847 THEN 'User'
          WHEN 22099 THEN 'Service Broker Service'
          WHEN 22601 THEN 'Index'
          WHEN 22604 THEN 'Certificate Login'
          WHEN 22611 THEN 'XMLSchema'
          WHEN 22868 THEN 'Type'
          ELSE 'Hmmm???'
        END AS ObjectType
FROM    [fn_trace_gettable](CONVERT(VARCHAR(150),
            ( SELECT TOP 1 value
              FROM   [fn_trace_getinfo](NULL)
              WHERE  [property] = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
        JOIN sys.trace_subclass_values v ON v.trace_event_id = TE.trace_event_id
                                            AND v.subclass_value = t.EventSubClass
WHERE   TE.name IN ( 'Object:Created', 'Object:Deleted', 'Object:Altered' )
        -- filter statistics created by SQL Server
        AND t.ObjectType NOT IN ( 21587 )
        -- filter tempdb objects
        AND DatabaseID <> 2
        -- get only events in the past 24 hours
        AND StartTime > DATEADD(HH, -24, GETDATE())
ORDER BY t.StartTime DESC ;

Keep in mind that SQL Server by default has five trace files, 20 MB each, and there is no known supported method of changing this. If you have a busy system, the trace files may roll over far too fast (even within hours) and you may not be able to catch some of the changes. This article will not discuss in detail any workarounds for efficient SQL Server tracing; instead, I will do this in a separate article later on.
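As a quick sanity check of those limits on your own instance, the sys.traces catalog view exposes the rollover settings of the default trace; the following is a small sketch of such a check:

-- Inspect the default trace definition: current file path, maximum file size (MB),
-- number of rollover files, and how recently it has been written to.
SELECT  id ,
        path ,
        max_size ,
        max_files ,
        start_time ,
        last_event_time ,
        event_count
FROM    sys.traces
WHERE   is_default = 1 ;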

Security Audit Events

Another part of the default trace is the Security Audit. As you can see from the event list below, this is one of the richest parts of the default trace. In general, what this event group tells us is what significant security events are occurring in our system.

Security events include...

Audit Add DB user event
Audit Add login to server role event
Audit Add Member to DB role event
Audit Add Role event
Audit Add login event
Audit Backup/Restore event
Audit Change Database owner
Audit DBCC event
Audit Database Scope GDR event (Grant, Deny, Revoke)
Audit Login Change Property event
Audit Login Failed
Audit Login GDR event
Audit Schema Object GDR event
Audit Schema Object Take Ownership
Audit Server Starts and Stops

Let’s take it one step at a time and:

create a SQL Server login
assign read permissions to this user in one of our databases.

By running the following query we will be able to track what users have been created on our SQL Server instance:

SELECT  TE.name AS [EventName] ,
        v.subclass_name ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime ,
        t.RoleName ,
        t.TargetUserName ,
        t.TargetLoginName ,
        t.SessionLoginName
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
        JOIN sys.trace_subclass_values v ON v.trace_event_id = TE.trace_event_id
                                            AND v.subclass_value = t.EventSubClass
WHERE   te.name IN ( 'Audit Addlogin Event', 'Audit Add DB User Event',
                     'Audit Add Member to DB Role Event' )
        AND v.subclass_name IN ( 'add', 'Grant database access' )

Here is how the result of the query looks after we have created one login and given it read permission to one database:

As we can see, the first row announces the creation of the login in the master database, together with the creator (SessionLoginName column) and the created login (TargetLoginName column).

The next two rows are as follows: creating the database user and granting it database access, and last – adding the database user to a DB role.

Keep in mind that if you add the user to more than one role, or if you give the login access to more than one database, then you will see several rows in your default trace, one for each event.

Now let’s audit the dropped users and logins by running the following query:

SELECT  TE.name AS [EventName] ,
        v.subclass_name ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime ,
        t.RoleName ,
        t.TargetUserName ,
        t.TargetLoginName ,
        t.SessionLoginName
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
        JOIN sys.trace_subclass_values v ON v.trace_event_id = TE.trace_event_id
                                            AND v.subclass_value = t.EventSubClass
WHERE   te.name IN ( 'Audit Addlogin Event', 'Audit Add DB User Event',
                     'Audit Add Member to DB Role Event' )
        AND v.subclass_name IN ( 'Drop', 'Revoke database access' )

As you can see, the event name is the same for both creating and dropping logins, i.e. Audit Addlogin Event; however, the subclass column value is what defines the difference, i.e. in the case of the creation of a login the subclass would be 'Add' and in the case of deletion it would be 'Drop'.

In fact, if we drop the database user and the SQL login we created earlier, this query will return two rows – one for each event, together with the dropped user and login names and the login name of the user who deleted them.

The following query will give us all the failed logins contained in our default trace file:

SELECT  TE.name AS [EventName] ,
        v.subclass_name ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime ,
        t.SessionLoginName
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
        JOIN sys.trace_subclass_values v ON v.trace_event_id = TE.trace_event_id
                                            AND v.subclass_value = t.EventSubClass
WHERE   te.name IN ( 'Audit Login Failed' )

There are quite a few events in the Security Audit class and, for the sake of compactness of this article, I will turn your attention to only one more event, namely the 'Audit Server Starts and Stops' event.

The following query will give you only the server start event:

SELECT  TE.name AS [EventName] ,
        v.subclass_name ,
        T.DatabaseName ,
        t.DatabaseID ,
        t.NTDomainName ,
        t.ApplicationName ,
        t.LoginName ,
        t.SPID ,
        t.StartTime ,
        t.SessionLoginName
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
        JOIN sys.trace_subclass_values v ON v.trace_event_id = TE.trace_event_id
                                            AND v.subclass_value = t.EventSubClass
WHERE   te.name IN ( 'Audit Server Starts and Stops' )

Yes, you read it correctly: the above query will return only the Server Start event, and never the Server Stop event. Here is the explanation: as I mentioned earlier, SQL Server's default trace consists of five trace files in total, which are 20 MB each. These five trace files are rotated ('refurbished' or 'recycled', if you like) upon several conditions: when the instance starts or when the file size reaches 20 MB. Now, let's think about this for a second: the queries I have listed so far in this article return results only from the current trace file, i.e. the most recent one. Further, since the default trace file is rolled over every time the instance starts, this means that the event indicating the Server Stop will remain in the previous default trace file. Put simply, after the SQL Server service restarts, our current default trace file will have the Server Start event as its first row. If you really wish to know when your SQL Server instance was stopped, you will need to include at least the contents of the previous file, but in fact we can include the contents of the other four default trace files in our result set. We can do this by changing the way we call sys.fn_trace_gettable so that it appends all default trace files. This function accepts two parameters – the file location and name, and the number of files; if we pass as the first parameter the location and name of the oldest default trace file, then sys.fn_trace_gettable will append the newer ones, as long as we specify the appropriate value for the second parameter (the number of files). If we specify the newest file as a parameter to the function (as is the case in all scripts in this article) then the older files will not be appended. As the filename contains the index of the file, and the index increments as each new file is created, it is easy to calculate the name of the oldest file.

To find the exact file location of the default trace files, you just need to execute the following query:

SELECT  REVERSE(SUBSTRING(REVERSE(path), CHARINDEX('\', REVERSE(path)), 256)) AS DefaultTraceLocation
FROM    sys.traces
WHERE   is_default = 1
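Putting those two pieces together, here is a minimal sketch of the idea; it assumes the default trace files follow the usual log_<n>.trc naming in the folder returned above, and that the older files have not yet been deleted. It points sys.fn_trace_gettable at the oldest file so that the newer files are appended, letting you see events (such as the Server Stop) that happened before the last restart:

-- A sketch only: derive the oldest default trace file name from the current one
-- (assumes the standard log_<n>.trc naming), then read all files from there onwards.
DECLARE @CurrentFile NVARCHAR(260) ,
        @Folder      NVARCHAR(260) ,
        @Index       INT ,
        @OldestFile  NVARCHAR(260) ;

SELECT  @CurrentFile = path
FROM    sys.traces
WHERE   is_default = 1 ;

SELECT  @Folder = LEFT(@CurrentFile, LEN(@CurrentFile) - CHARINDEX('\', REVERSE(@CurrentFile)) + 1) ,
        @Index  = CAST(REPLACE(REPLACE(RIGHT(@CurrentFile, CHARINDEX('_', REVERSE(@CurrentFile))),
                                       '_', ''), '.trc', '') AS INT) ;

-- Go back up to four files; if an older file has already been removed,
-- sys.fn_trace_gettable will raise an error, so adjust the offset accordingly.
SET @OldestFile = @Folder + 'log_'
                + CAST(CASE WHEN @Index > 4 THEN @Index - 4 ELSE 1 END AS VARCHAR(10)) + '.trc' ;

SELECT  TE.name AS [EventName] ,
        t.StartTime
FROM    sys.fn_trace_gettable(@OldestFile, DEFAULT) t
        JOIN sys.trace_events TE ON t.EventClass = TE.trace_event_id
WHERE   TE.name = 'Audit Server Starts and Stops'
ORDER BY t.StartTime ;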

Server Memory Change Events

And now, let’s move on to the last event class in our default trace: the Server class. It contains only one event – Server Memory Change.

The following query will tell us when the memory use has changed:

SELECT  TE.name AS [EventName] ,
        v.subclass_name ,
        t.IsSystem
FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
            ( SELECT TOP 1 f.[value]
              FROM   sys.fn_trace_getinfo(NULL) f
              WHERE  f.property = 2 )), DEFAULT) T
        JOIN sys.trace_events TE ON T.EventClass = TE.trace_event_id
        JOIN sys.trace_subclass_values v ON v.trace_event_id = TE.trace_event_id
                                            AND v.subclass_value = t.EventSubClass
WHERE   te.name IN ( 'Server Memory Change' )

The event subclass indicates if the memory has increased or decreased.

Conclusion

The default trace is a very powerful way to examine the health and the security of your SQL Server instance. There are several pitfalls to keep in mind – mainly related to file rollovers and size limitations – but with some programming the workarounds are not impossible. It is important to remember that the queries presented in this article will return results from the single most recent default trace file. Depending on how busy the SQL Server instance is, the files may roll over far too fast for a DBA to catch all significant events; therefore, some automation is needed.
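One simple form that such automation could take (the table name below is illustrative, and this is only a sketch of the idea rather than a complete solution) is a SQL Agent job that periodically copies new default trace rows into a permanent archive table, so that events survive both file rollovers and instance restarts:

-- Sketch: create the archive table once (empty, with the same shape as the trace),
-- then append any rows newer than the latest one already archived.
IF OBJECT_ID('dbo.DefaultTraceArchive') IS NULL
BEGIN
    SELECT  TE.name AS EventName ,
            t.*
    INTO    dbo.DefaultTraceArchive
    FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
                ( SELECT TOP 1 f.[value]
                  FROM   sys.fn_trace_getinfo(NULL) f
                  WHERE  f.property = 2 )), DEFAULT) t
            JOIN sys.trace_events TE ON t.EventClass = TE.trace_event_id
    WHERE   1 = 0 ;
END

INSERT  INTO dbo.DefaultTraceArchive
        SELECT  TE.name ,
                t.*
        FROM    sys.fn_trace_gettable(CONVERT(VARCHAR(150),
                    ( SELECT TOP 1 f.[value]
                      FROM   sys.fn_trace_getinfo(NULL) f
                      WHERE  f.property = 2 )), DEFAULT) t
                JOIN sys.trace_events TE ON t.EventClass = TE.trace_event_id
        WHERE   t.StartTime > ISNULL(( SELECT MAX(StartTime)
                                       FROM   dbo.DefaultTraceArchive ), '19000101') ;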

© Simple-Talk.com



The Polyglot of Databases: How Knowledge of MySQL and Oracle Can Give SQL Server DBAs an Advantage
14 March 2011
by Hugo Shebbeare

Although switching between different RDBMSs can be the cause of some culture shock for the Database Administrator, it can have its advantages. In fact, it can help you to broaden your perspective of relational databases, refine your problem-solving skills and give you a better appreciation of the relative strengths of different relational databases.

Being a database administrator means you have taken up a career as a DBA, and not necessarily a vendor-specific DBA. If a challenge arises which requires your services as a DBA, it may not necessarily be your favourite flavour of RDBMS that is the best solution for your employer or client. Specialists sometimes maintain that you cannot be a Master/Mistress of more than one database management system. However, I'd like to argue that DBAs should diversify as a way to improve their marketable skills and competitiveness, and to broaden their knowledge and innovative thinking.

Surely, any Database Administrator who really understands databases and SQL standards could work quite well with any of the leading Relational Database Management Systems (RDBMSs). If you want to become what I describe as a polyglot of databases, equally confident in a range of RDBMSs, you must first have an excellent grasp of the fundamentals of RDBMSs, such as normalisation and indexing, and understand the importance of certification and practice.

In the same way that knowing several languages can help you to know your own language better, I believe that taking the deep dive into another RDBMS can also broaden your approach to resolving problems in your 'native' RDBMS. Being multilingual myself, I was briefly tempted to give this article the long-winded title "Applying the Principles of a Polyglot to Database Management: How Shrewd DBAs Benefit from MySQL and Oracle, Over and Above SQL Server".

Oracle 11g R2 and Developer 2.1 – The Latest Toys from the Oracle World

It can be quite an initial shock to move from one RDBMS to another. If you come from a predominantly Oracle background, then you will be interested in a fine series on Crossing the Great Divide from Oracle to SQL Server by Jonathan Lewis. My shock was in the other direction. Although I originally trained on Oracle 8 in 1998 at Oracle Montreal's offices, I've predominately worked with SQL Server thereafter. I've recently spent over a year working once again with Oracle 11gR2 and its solid architecture. The hardest part initially was re-learning the complete set of acronyms related to Oracle's architecture, such as SMON, PMON, P/SGA, DBWn, RECO, XDB, ARCn et al. It was worth the struggle, though, and I'm impressed with it. When tackling a different RDBMS, one soon notices features that are done particularly well: I especially appreciate the database-recovery approach that Oracle takes, especially the redo/undo log files and the use of the SCN (system change number, defined during a checkpoint) for bombproof recovery. With MSSQL, it is possible to lose your backup recovery chain (i.e. be unable to match LSN numbers) with mixed Tape/Disk backups. Helpfully, with Oracle's use of the SCN, this is not such an issue, since the control files won't even start up the database without the correct SCNs.

More examples of this solid architecture can be seen during the mount process: during this, there is a fail-safe check for the control file(s) to permit the instance to start, or to avoid Startup Open because one might want to benefit from the way that Startup Restrict can allow easy maintenance operations. After starting a database instance, the Oracle software will then associate the instance with a specific database. This is called 'mounting the database'. I have created the diagram below as an expansion of a rudimentary diagram from the official course materials, so as to help you understand why I admire the way in which an Oracle database is opened.


Figure 1. The Oracle Database Start-up Process

Oracle Developer 3.0 early adopter version, as seen below, is a cool management tool which also comes with a user-friendly ability to export result sets to XLS/CSV/TXT, although I personally prefer SQL Server Management Studio (having been made a SQL Server MVP, I can hardly avoid a natural bias). Many SQL Plus commands are supported within Developer 3.0, which also now lets you handle the DBMS job scheduler, as well as how many times a job is executed. Also not to be ignored in the Oracle world is the excellent web-based Enterprise Manager 11g from Oracle, also known as Grid Control, though it takes a while to work out how to find your way around. However, once you're familiar with it, you'll discover that this tool is rich in features. It is useful for monitoring health, availability and performance, as well as performing virtually all of your normal administrative tasks.

Figure 2. Screenshot of Oracle Developer 3.0, taken from a presentation demonstrating the Query Builder

Seasoned Oracle DBAs have mentioned the lack of spooling in Oracle Developer as a pain-point: Spool actually works fine in Oracle Developer 2.1/3.0, although I'll admit that it works better in SQL Plus. Settings like Timing, Echo, Feedback, Show User, and Show Errors make it much easier to discover, in both tools, what went wrong and where in your code, so take advantage of these features.

Oracle Developer and SQL Server Management Studio always have the code and results split up into separate windows: this does not always make it easy to associate the error with the section of code you are working on, unless you click on the error in SSMS' report pane (as a side note, see my auditing preference for SSMS). Setting Spool on and off while working on production provides comprehensive evidence for auditing operations on your production databases, or for results during testing and development.

Moving Forward with Oracle Certification, Albeit Slowly.

If you want to be proficient with Oracle, then it makes sense to get yourself some training and certification. You can make some progress with a bit of self-learning, but if you really want to get the most out of a new database 'language', then you're doing yourself a disservice if you don't do your best to tool up. If this is something you're exploring, then remember that you have to make time to prepare for exams, and while there is always too much work to do (e.g. the e-mail deluge), allocating a few hours a week to training makes all the difference. Even so, trying to get through the over-one-thousand pages of official Oracle course material has been very hard, but YouTube training videos from community colleges in the U.S. make great background noise while trudging through monotonous tasks! Hopefully I will not be shown the door by Microsoft as an MVP for preaching heterodoxy (bearing in mind that my main reason for diversifying is that my employer has asked for it, and the benefits are a welcome by-product).

I will not spend any more time describing what might entice one to use Oracle as a SQL Server DBA, because Jonathan Lewis has written substantially and thoughtfully on comparing Oracle with SQL Server, even exploring how the two platforms compare in their handling of heaps and indexes. For the purposes of this article, my goal with the above is to open the door to an understanding of Oracle's solid architecture.

MySQL 5.x and Workbench 5.2

Pre-requisite: To follow some of what I'll be covering next, you need to have a copy of MySQL Workbench 5.2. It's a great tool; it was released July 1st, 2010, and has - thus far - been updated every month with point releases. You can download the latest version of the Workbench tool for all common operating systems from the MySQL.com site.

As part of a group of DBAs asked to handle a large group of MySQL instances (once you include all the test, development and production environments too), the most useful software we found was this great administrative tool, MySQL Workbench.

Figure 3. The main dashboard of MySQL Workbench 5.2

We were pleasantly surprised to discover that the tool is pretty well fleshed-out, in terms of what your average DBA might expect. Not only does it have the typical SQL Editor query window and a connection manager that supports SSH connections to instances, but it also has a humble entity relationship diagram tool called Data Modelling built in, just like all the larger vendors do.

Figure 4. MySQL Workbench 5.2’s Data Modelling tool, as seen on the MySQL.com site.

MySQL’s InnoDB Engine Should Be Handled Carefully

Before you think that I have nothing but praise for MySQL, I do have some nightmarish anecdotes of InnoDB log issues which some colleagues have hit up against. In short, I recommend that you be very careful with the InnoDB log_file_size setting, since the entire database engine itself can be disabled silently, as David Pashley describes. What makes the problems which we encountered more painful is that they would have been easily avoided in another RDBMS, and that is not very impressive when you are regularly working between environments.

Rather than dive into the details, I will say that the safest thing that you can ultimately do when dealing with an InnoDB issue is to simply remove the offending data folders between restarts of the instance. This will allow the system to rebuild the InnoDB files as if it were a cousin of the TempDB in SQL Server (which flushes itself out, without intervention). Incidentally, log issues aside, InnoDB will be your preferred database engine for large databases, as opposed to the rudimentary MyISAM database engine.

Finally, before we take a look at how you might use MySQL, it's worth noting that detractors of MySQL can also (rightly) criticise this RDBMS for its preference for a very fast responding database over proper transaction handling or ACID compliance (especially with the MyISAM default engine type, which is the actual internal DB engine). Assuming that you have swallowed these bitter pills, let's take a look at how a DBA will use MySQL.

Typical MySQL Operations to Whet the Appetite

The following should give you a decent overview of the typical operations you would go through in a day of managing your MySQL instances, starting with user creation:

CREATE USER 'UserName'@'ServerAddressOrIPSource%' IDENTIFIED BY 'Password';
grant SELECT on DBname.* to 'UserName'@'ServerAddressOrIPSource%';
grant ALL PRIVILEGES on AnotherDBname.* to 'UserName'@'ServerDNShostAddressOrIPSource%';
flush privileges;

Nothing much different from standard SQL syntax, and the flush privileges command is there to make sure that the privileges are reloaded into the instance, otherwise they are not active. For most admins coming from another RDBMS, this extra step after the CREATE USER and GRANT statements is strange or tiresome, depending on your mood, and I am sure that this flushing command will be weeded out of future versions. Here are a few more typical actions:

Login session:

mysql -u root -p
enter password

See what is happening (global variables, privileges, connections, data) on the Instance:

show processlist;
show databases;
show variables;
show grants;
....

Stopping and Starting a MySQL Instance CMD Line:

(Do you have the rights to perform a stop and start? If not, contact your Unix admin.)

$ sudo -l
# check if processes are running before stopping/restarting
$ ps -fu mysql
$ sudo /etc/init.d/mysqld stop
$ sudo /etc/init.d/mysqld start

A backup (otherwise known as a Data Dump):

mysqldump -u root -p --databases DBName --opt | gzip > MySQLdumpFileName.sql.gz
# Loading a dump copy: simply log on to your other DB instance, and load the backup with
#   source MySQLdumpFileName.sql
# then all your data will begin to load. Or you can rename the database in the dump file, so
# you can add the line: Drop database DBName;

On Windows Servers this is done from the Administration of Services Management Console. I've mentioned the above details regarding how to operate MySQL via the command line because the Windows users will find the equivalent easier.


Auditing MySQL Operations with Tee Logs:

(the MySQL equivalent of auditing)

tee Logfile.log

...which is terminated by:

notee

Typical MY.CNF file for a Website with Respectable Traffic (recommended settings):

[client]
port = 3306                      # this is the port MySQL uses
#socket = /var/run/mysqld/mysqld.sock
socket = /var/lib/mysql/mysql.sock

[mysqld_safe]
#socket = /var/run/mysqld/mysqld.sock
socket = /var/lib/mysql/mysql.sock
nice = 0

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
#socket = /var/run/mysqld/mysqld.sock
socket = /var/lib/mysql/mysql.sock
port = 3306
basedir = /usr
datadir = /data/mysql            # where the data folders live
tmpdir = /data/mysqltmp
language = /usr/share/mysql/english
skip-external-locking
#skip-name-resolve               # this will disable DNS resolution, thus
# if you want your server to run by IP only, users have to be IP based too
#
# * Fine Tuning
#
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 128K
thread_cache_size = 8
max_connections = 4000
table_cache = 2048               # should be well above your Opened_tables variable value
#
# * Query Cache Configuration
#
tmp_table_size = 128M            # this is the maximum size for temp table space
query_cache_limit = 16M          # way too small normally, push this since MEM is cheap
query_cache_size = 256M          # loads to MEM for faster access, stores result set rows
# (adjust according to your respective runtime status information, i.e. if Qcache_lowmem_prunes is RED)
max_heap_table_size = 128M       # typically too small
#
# INNODB CONFIG (any changes require resetting files, a DB server restart will fail otherwise)
#
innodb_thread_concurrency = 16   #8 - eight is default, but too small
innodb_buffer_pool_size = 18G    #3500M #2048M - largest impact on space
# the innodb Buffer Pool Size should be at least 50% of the physical ram
# to be available for high volume DBs
innodb_additional_mem_pool_size = 20M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_log_buffer_size = 4M
innodb_log_file_size = 256M      #by default much smaller
log = /data/mysql/mysql.log
log-queries-not-using-indexes
long_query_time = 3              #seconds before logging bad queries
log-slow-queries = /data/log/mysql/slow-queries.log
expire_logs_days = 5
max_binlog_size = 100M
skip-bdb

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]
#---- prompt will display user host and database in mysql command line
prompt=(\u@\h) [\d]>
#no-auto-rehash                  # faster start of mysql but no tab completion

[isamchk]
key_buffer = 16M

Performance Monitoring: Free Tools

When it comes to monitoring the performance of your MySQL installation, there are a few free tools which you can consider. Chances are that you're using MySQL because you don't want to be locked into an expensive vendor upsell cycle, so free tools are probably the order of the day. To start with, there are the Perl MySQLTuner.pl scripts, which will produce a result set that looks something like this:

Figure 5. A normal result set from the MySQLTuner.pl scripts. A fuller version can be downloaded from the top of this article.

Alternatively, if you don't want to run Perl scripts, then you can also try the Maatkit open-source performance monitoring tools, with my personal preference being the Query Digest analysis of the Slow Queries log file (the mk-query-digest command), which is particularly useful once you have had a decent amount of traffic on your database, hopefully before reaching production. As a comparison of the tools, the recommendations from MySQLTuner are, as you can see above, very simple. On the other hand, Maatkit's Query Digest will provide a detailed report for every poorly performing query that shows up in your Slow Query file, starting with the worst by group. If you want to use the Maatkit tool, then take a look at the above sample MY.CNF file to see how to set the timing threshold for slow queries (the setting is in the last quarter of the file).

If you're performing query tuning with MySQL then, as with all DB systems, do not make a whole bunch of changes at once; perform one at a time, and compare your results against benchmarks and palpable real-time monitoring. For more on performance, please see this great webinar on Performance Tuning Best Practices for MySQL, with credit going to Google's Tech Talk series. If you want to know more about the Query Cache, which is something else you can tinker with via the MY.CNF file, see Ian Gilfillan's great article on the MySQL Query Cache for the best explanation.

If you've hit the ceiling with the MySQL Community Edition described above, and require an RDBMS at the level of Oracle and SQL Server, the logical step is to invest in MySQL Enterprise.


Conclusion

As with anything, all it takes is practise, practice and more pratique before you can feel truly free to wander between the heterogeneous relational database worlds. As well as maintaining the Gold Standard of tools for comparing databases or schemas in SQL, Red Gate will eventually be releasing MySQL Compare (and have already released Schema and Data comparison tools for Oracle). Even if you don't like these brilliant tools, you can always try BeyondCompare, a favourite in the Oracle/PeopleSoft development world.

Counter to the naysayers who warn against the perils of playing with multiple RDBMSs, I have encountered only benefits since taking up Oracle again alongside sporadic MySQL administration. One develops a deeper understanding of each with repetitive use, rather than a diffusion of the mastery of one. Indeed, thanks to the connectivity of the SQuirreL SQL Universal SQL Client V3.1, it is remarkably easy to move from one environment to another, and progressively learn more and develop your skills. And finally, you will be considered a greater asset to potential employers because you have experience in multiple domains.

Therefore, I invite you all to cross the bridge to the other side, where you can enjoy other database systems without compromising your knowledge of your favourite one, because, in a nutshell, obtaining more skills leads to more problem-solving, which helps you transcend purely logical and sequential thinking (cognitive skills), and hopefully leads to more innovation. This is not an argument that you become more intelligent by mastering more than one database system, but it will simply allow the acquisition of specific types of expertise that help you attend to critical tasks and ignore irrelevant information. When such critical tasks and problems remain unresolved, they create dysfunctional relationships in the workplace and ultimately become impediments to flexibility and to dealing with strategic change in an open-ended and creative way.

The more you develop skills across domains, the more apt you are to bring both cognitive and practical skills to bear on resolving complex problems (something DBAs see very often), and this leads to greater innovation. Knowledge dispels the inevitable knee-jerk human reaction of rejecting change; encouraging skills development in this way brings database administrators much closer to the point of providing innovative solutions.

© Simple-Talk.com



An Introduction to PowerShell Modules
11 March 2011
by Jonathan Medd

For PowerShell to provide specialised scripting, especially for administering server technologies, it can have the range of cmdlets available to it extended by means of snap-ins. With version 2 there is an easier and better method of extending PowerShell: the module. These can be distributed with the application to be administered, and a wide range of cmdlets are now available to the PowerShell user. PowerShell has grown up.

One of the great features of PowerShell is its extensibility. In PowerShell version 1 it was possible for other product teams within Microsoft or elsewhere to use a delivery mechanism known as snap-ins to extend PowerShell with additional PowerShell cmdlets. Although the snap-in mechanism is still supported within PowerShell version 2, the new feature known as a PowerShell module has made it much easier to extend PowerShell to make it more appropriate for specialised uses.

PowerShell modules bring extensibility to the systems administrator, DBA, and developer. Whether it's simply as a method to share functions and scripts you have created, or delving further into providing what would have been snap-in territory in version 1, you should be aware of the capabilities of modules.

PowerShell is an integral part of Windows 7 and Windows Server 2008 R2. Windows Server 2008 R2 ships with a number of built-in modules. Further modules become available after installing various additional components of the operating system.

In this article we will examine the fundamentals of modules and then, by way of an example, look at some practical ways you can make use ofthem in Windows Server 2008 R2.

PowerShell Modules

In PowerShell version 1, snap-ins were popular with systems administrators who used cmdlets provided by third parties, such as Quest's Active Directory cmdlets and VMware's PowerCLI cmdlets. However, you did not find many of those same administrators creating their own snap-ins, particularly if they were just starting out learning PowerShell, because it would typically involve writing some C# code.

PowerShell version 2 makes it easier to achieve the objective of sharing functions and scripts as part of a module. In addition, whilst a snap-in can only contain cmdlets and providers, a module can also contain other common PowerShell items such as functions, variables, aliases and PowerShell drives.

Creating Your First PowerShell Module

Before you create your first PowerShell module you will need to know what to create and where to store it. PowerShell will look in the paths specified in the $env:PSModulePath environment variable when searching for available modules on a system. By default this contains two paths: one in a system location, %windir%\System32\WindowsPowerShell\v1.0\Modules, and one in the currently logged-on user's location, %UserProfile%\Documents\WindowsPowerShell\Modules. In this article, for the sake of convenience, we will store our module in the %UserProfile% location.

Each module should be stored in a sub folder of either of these paths, typically named after the module - within that folder you will then store the files that make up the module. At the least, we need a *.psm1 file. In this file can be placed a number of functions or variables that make up the module. In addition, it is possible to place PowerShell scripts in *.ps1 files in the module's folder and reference them in the *.psm1 file. As a final touch, a module manifest file can be created which will give a more professional and rounded feel to your module, but we will come on to manifests later.

Let’s look at the process of creating an example module.

To start with, we will place a couple of functions in a *.psm1 file to make the module. There is nothing special in a *.psm1 file other than the file extension, so we can take a normal *.ps1 PowerShell script file containing our functions and rename it to make the *.psm1 file.

Firstly, let's create two functions that we can use for our module, which we will call CurrencyConversion. The first function will convert British Pounds into Euros. It does this by connecting to a web service at http://www.webservicex.net to obtain the current conversion rate between British Pounds and Euros, and then multiplies the inputted value of Pounds by the conversion rate to output the Euro value.

Function ConvertTo-GBPEuro {
    param ([int]$Pounds)
    $Currency = New-WebServiceProxy -Uri http://www.webservicex.net/CurrencyConvertor.asmx?WSDL
    $GBPEURConversionRate = $Currency.ConversionRate('GBP','EUR')
    $Euros = $Pounds * $GBPEURConversionRate
    Write-Host "$Pounds British Pounds convert to $Euros Euros"
}

The second function carries out the reverse operation; it converts Euros into their value in British Pounds.

Function ConvertTo-EuroGBP {
    param ([int]$Euros)
    $Currency = New-WebServiceProxy -Uri http://www.webservicex.net/CurrencyConvertor.asmx?WSDL
    $EURGBPConversionRate = $Currency.ConversionRate('EUR','GBP')
    $Pounds = $Euros * $EURGBPConversionRate
    Write-Host "$Euros Euros convert to $Pounds British Pounds"
}

Executing these functions with a value of 10 each time produces the following results:

Fig 1. Executing the Conversion Functions

We now save these functions into the CurrencyConversion.psm1 file and save it into the folder C:\Users\Jonathan\Documents\WindowsPowerShell\Modules\CurrencyConversion.

Fig 2. The CurrencyConversion Module File Location

So far, so simple - no real magic required. We can use the Get-Module cmdlet to reveal the modules that are available on our system (note: this is Windows 7; later in this article we will look in particular at modules which ship with Windows Server 2008 R2) and we are pleased to see that, with very little effort, we have created our first module, CurrencyConversion.

Get-Module -ListAvailable

Fig 3. List Available Modules

We can now make the functions in our CurrencyConversion module available to the current PowerShell session via the Import-Module cmdlet. Note: for illustrative purposes we will use the -PassThru parameter to display the output of this cmdlet to the console; by default this does not happen.

Import-Module CurrencyConversion -PassThru


Fig 4. Importing the CurrencyConversion Module

We can see that the two functions are available to us in this PowerShell session, by using the Get-Command cmdlet.

Get-Command -Module CurrencyConversion

Fig 5. Viewing the Functions in the CurrencyConversion Module

Remember, to make these functions available to somebody else, there is no need to bundle them up into an MSI file for installation, like a PSSnapin; all that needs to be done is for the CurrencyConversion folder and files to be copied to the right location. Currently this module only contains two functions; however, it would be quite straightforward to expand it to contain conversion functions for all kinds of currencies and add these to the CurrencyConversion.psm1 file, and no further work would be necessary since it is already a valid module.

PowerShell Module Manifests

Earlier in this article we mentioned that it was possible to smarten up your modules and give them a more professional look by using module manifests. For instance, you may wish to include some author and versioning information as part of the module, or you may wish to specify minimum versions of PowerShell and/or the .NET Framework which are needed for components of your module. So how do you go about creating a module manifest? Well, Microsoft have made creating a basic module manifest easy by giving us the New-ModuleManifest cmdlet. Whilst it is possible to create a module manifest manually (simply create a *.psd1 file containing your requirements and place it in the module folder), using the cmdlet makes it easy to create a basic one. Let's continue with the CurrencyConversion module and create a basic module manifest using New-ModuleManifest.

We have two options when using this cmdlet. We can either specify all of the parameters we wish to include in the manifest and supply them on the command line, or we can simply enter New-ModuleManifest and be prompted for the parameters that we might want to provide.

For this example we'll take the latter method. Entering New-ModuleManifest first prompts us for a path to the module manifest file; in this case we give the path to the CurrencyConversion module and the name of the manifest file: C:\Users\Jonathan\Documents\WindowsPowerShell\Modules\CurrencyConversion\CurrencyConversion.psd1.

(Note: in all of these examples I am using the built-in PowerShell ISE rather than the basic console; consequently the screenshots are dialogue boxes.)

Fig 6. Enter the Path to the Module Manifest

We will then be prompted for a number of other options, particular ones to consider are:

Author: if you don’t specify one PowerShell will use the value of the currently logged on user.

Fig 7. Enter the Author of the Module

ModuleToProcess: Specify the name of the *.psm1 file used to store the functions in the module.


Fig 8. Enter the name of the *.psm1 file

You can use Get-Help New-ModuleManifest to examine in more detail other options you may wish to include in your module manifest.

The resultant file created from New-ModuleManifest will look something like the below, which can now be modified further if necessary:

#
# Module manifest for module 'CurrencyConversion'
#
# Generated by: Jonathan Medd
#
# Generated on: 16/02/2011
#

@{

# Script module or binary module file associated with this manifest
ModuleToProcess = 'CurrencyConversion.psm1'

# Version number of this module.
ModuleVersion = '1.0'

# ID used to uniquely identify this module
GUID = 'c6f2e5e7-91ff-4924-b4bb-8db0624195c9'

# Author of this module
Author = 'Jonathan Medd'

# Company or vendor of this module
CompanyName = 'Unknown'

# Copyright statement for this module
Copyright = '(c) 2011 Jonathan Medd. All rights reserved.'

# Description of the functionality provided by this module
Description = 'Convert values between different currencies'

# Minimum version of the Windows PowerShell engine required by this module
PowerShellVersion = ''

# Name of the Windows PowerShell host required by this module
PowerShellHostName = ''

# Minimum version of the Windows PowerShell host required by this module
PowerShellHostVersion = ''

# Minimum version of the .NET Framework required by this module
DotNetFrameworkVersion = ''

# Minimum version of the common language runtime (CLR) required by this module
CLRVersion = ''

# Processor architecture (None, X86, Amd64, IA64) required by this module
ProcessorArchitecture = ''

# Modules that must be imported into the global environment prior to importing this module
RequiredModules = @()

# Assemblies that must be loaded prior to importing this module
RequiredAssemblies = @()

# Script files (.ps1) that are run in the caller's environment prior to importing this module
ScriptsToProcess = @()

# Type files (.ps1xml) to be loaded when importing this module
TypesToProcess = @()

# Format files (.ps1xml) to be loaded when importing this module
FormatsToProcess = @()

# Modules to import as nested modules of the module specified in ModuleToProcess
NestedModules = @()

# Functions to export from this module
FunctionsToExport = '*'

# Cmdlets to export from this module
CmdletsToExport = '*'

# Variables to export from this module
VariablesToExport = '*'

# Aliases to export from this module
AliasesToExport = '*'

# List of all modules packaged with this module
ModuleList = @()

# List of all files packaged with this module
FileList = 'CurrencyConversion.psm1'

# Private data to pass to the module specified in ModuleToProcess
PrivateData = ''

}

Now that we have created a basic module manifest, we can take that as a template for future modules and customise it for our needs.

In Fig 3. you saw that our CurrencyConversion module had a ModuleType of Script, whilst other built-in and third-party modules had a ModuleType of Manifest. Now that we have created a module manifest file, let's see if that has made a difference to our CurrencyConversion module.

Get-Module -ListAvailable

Fig 9. CurrencyConversion Module of ModuleType Manifest

Our CurrencyConversion module is now right up there with the Windows Server 2008 built-in modules! OK, maybe not, but hopefully you get the idea.

PowerShell Modules Built-In to Windows Server 2008 R2

Now that we have seen how to create our own modules, let's take a look at some of the modules which ship as part of the Windows Server 2008 R2 operating system. As previously seen, to find out what modules are available we use the Get-Module cmdlet.

Get-Module -ListAvailable


Fig 10. Available Modules on a Windows Server 2008 R2 System

You will notice that in Fig 10, the available modules are slightly different from those we saw earlier on a Windows 7 system. Out of the box we have the BestPractices and ServerManager modules, and in this particular case we have the ActiveDirectory and ADRMS modules. Windows Server 2008 R2 has evolved to become a more modular style of operating system, where not everything is installed by default. Instead, depending on the role of the server, the required components for that role are installed as required. So an Active Directory Domain Controller may not necessarily have the IIS components installed, and a File Server may not have the DNS components installed.

When an additional server role or feature is installed, the relevant PowerShell modules are installed. In Fig. 10 the Remote Server Administration Tools for Active Directory have been installed; consequently the ActiveDirectory and ADRMS modules are present.

Server Manager Module

In fact, there is actually a module included as part of Windows Server 2008 R2 to help you manage the modular components that are installed: the ServerManager module. By importing this module and using the Get-WindowsFeature cmdlet we can see which components have been installed. By default this will produce a long list of results, so it is possible to use wildcard filters to narrow the search down to what we are interested in - in this case Active Directory. Note in Fig. 11 that the elements marked with an X are those which have been installed.

Import-Module ServerManager
Get-WindowsFeature *AD*

Fig 11. Viewing Installed Components

To install a component we use the Add-WindowsFeature cmdlet from the ServerManager module. (Note that this is an operation that will require elevated privileges, so you will need to run PowerShell as an Administrator first.) In this case we will install the Print Server Role Service. Firstly we need to find out the name of the Print Server Role Service, and then we can add it.


Get-WindowsFeature *print*

Add-WindowsFeature Print-Server

Fig 12. Adding the Print Server Role Service

The ServerManager module is great for deployment scenarios. As an alternative to storing different virtual machine templates for different server roles, it would be possible to maintain fewer base OS templates and then have different build scripts ready to deploy the different server roles. These scripts could also contain customised settings for that server role, e.g. fine-tuned registry settings.

Troubleshooting Pack Module

Windows Server 2008 R2 was a big leap forward in terms of PowerShell cmdlet coverage, and there are now cmdlets available via modules for everything from Active Directory, through Group Policy and Failover Clusters, to IIS. For a full list, check out the table on TechNet linked below:

http://technet.microsoft.com/en-us/library/ee308287%28WS.10%29.aspx#BKMK_Appendix
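For example, on a server where the ActiveDirectory module is available (such as a domain controller, or any machine with the AD Remote Server Administration Tools installed), querying the directory becomes a couple of lines of PowerShell. A small sketch - the filter shown is purely illustrative:

# Find user accounts whose names start with 'svc' - example filter only
Import-Module ActiveDirectory
Get-ADUser -Filter 'Name -like "svc*"' | Select-Object Name, Enabled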

Another example is the built-in TroubleshootingPack PowerShell module. To start using it we first need to import it, and then find out what cmdlets it makes available.

Import-Module TroubleshootingPack
Get-Command -Module TroubleshootingPack

Fig 13. Cmdlets Available in the TroubleshootingPack Module

From Fig 13 we can see that there are two cmdlets available in this module. Troubleshooting Packs are found in the C:\Windows\Diagnostics folder. From a base install of Windows Server 2008 R2 we appear to have two available: Networking and the Program Compatibility Wizard.

Fig 14. Available Troubleshooting Packs
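If you prefer to check from the console rather than in Explorer, a simple directory listing of the path mentioned above shows what is available - a minimal sketch:

# List the troubleshooting packs shipped with this installation
Get-ChildItem C:\Windows\Diagnostics\System | Where-Object { $_.PSIsContainer }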

Let’s examine the Networking Troubleshooting Pack and see what we might be able to do with it:

Get-TroubleshootingPack C:\Windows\Diagnostics\System\Networking | Format-List *


Fig 15. Networking Troubleshooting Pack

It looks like we should be able to use this to help troubleshoot networking issues on this machine. We can execute the Networking Troubleshooting Pack with the other cmdlet from the module, Invoke-TroubleshootingPack.

Get-TroubleshootingPack C:\Windows\Diagnostics\System\Networking | Invoke-TroubleshootingPack

Fig 16. Invoking the Troubleshooting Pack

While running this Troubleshooting Pack I was first prompted for an InstanceID, which I left blank, and then for which test to carry out - I chose [1] Web Connectivity. I knew that this server did not have Internet connectivity, so I chose option [1] Troubleshoot my connection to the Internet.

It correctly determined that whilst the server appeared to be configured for Internet access, it was not able to contact external resources. This is possibly not the most practical of examples, but it gives you an idea of the enormous scope of these Troubleshooting Packs.
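If the cmdlets' parameters are as I remember them, packs can also be driven without the interactive prompts: Get-TroubleshootingPack supports an -AnswerFile parameter for recording your answers to a file, and Invoke-TroubleshootingPack accepts that file along with an -Unattended switch, which would let you schedule a diagnostic run. Treat the following as an unverified sketch along those lines:

# Record answers for the Networking pack into an XML file (parameter assumed; C:\Temp path is just an example)
Get-TroubleshootingPack C:\Windows\Diagnostics\System\Networking -AnswerFile C:\Temp\NetAnswers.xml

# Replay the pack later without prompts, using the recorded answers (parameters assumed)
Get-TroubleshootingPack C:\Windows\Diagnostics\System\Networking |
    Invoke-TroubleshootingPack -AnswerFile C:\Temp\NetAnswers.xml -Unattended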


Third-Party Modules

Many organisations and community members are taking advantage of the modules feature, and there are already many modules you can obtain via the Internet. A great starting point for finding these is the CodePlex site http://www.codeplex.com/. Head to the site and search for 'PowerShell Module'. At the time of writing there are over 50 modules, including the PowerShell Community Extensions Module, the BSonPosh Module and SQLPSX. I encourage you to look at the site and check out the many options there.

Summary

In this article we have looked at how the use of modules has made it much easier to package and distribute PowerShell scripts and functions for special purposes. No longer do the creators of scripts need to be developers who can compile code into PSSnapins, as with PowerShell 1. Modules can be developed that fit the needs of a particular application or server role.

This is a great encouragement to DBAs and Systems Administrators to create PowerShell cmdlets to automate a number of processes, and make them accessible via modules. It has also enabled Microsoft to provide modules to accompany added roles or features of the OS that ease the administrative workload. Getting access to these modules is simply a matter of installing the correct role or feature and then importing the module. As IT organisations strive for greater automation, it is well worth checking out the automation possibilities that these modules bring.

© Simple-Talk.com


Hello, can you just send me all your data please?
17 March 2011
by FatherJack (Jonathan Allen)

Our house phone rang on Saturday night and Mrs Fatherjack answered. I was in the other room but I heard her trying to explain to the caller that they were in some way mistaken. Eventually, as she got more irate with the caller, I went out and started to catch up with the events so far. The caller was trying to convince my wife that our computer was infected with a virus. She was confident that it wasn't.

Her patience expired after almost 10 minutes and the handset was passed to me. The caller tried the same with me, explaining that my computer had been sending him viruses for the past week and that he could show me where they were and then explain how to remove them. He wanted me to go to the Team Viewer website - I can only presume so that he could ask me to join a session and take control of my computer to do goodness only knows what. I didn't even turn my PC on, and felt confident that while I was speaking to him I was at least taking up his time and preventing him from speaking to someone who might be more susceptible to this sort of scam. I put the phone down at one point, but when I picked up the handset a few minutes later it was apparent he was still there. I even got put on to his 'supervisor' who, after a few obtuse comments from me about not owning 'a Google', came out with an extraordinarily long and diverse stream of abuse. I had been called most of those names before, but never all in one sentence.

Eventually, after I started to repeat their lines back to them and tried to get them to go to Team Viewer, etc., they ended the call. I rang our telephone provider and tried to log the call as a nuisance call in an attempt to get this organisation investigated and stopped. Those procedures are now in progress.

Thinking about this incident since Saturday, it has occurred to me that, as computer security gets better, the weakest and easiest target is now the computer user. Social Engineering may give the biggest bang for your scammer's buck.

As a good DBA you monitor your security settings, have controls in place to make sure your users only connect to data that they are entitled to, ensure data is encrypted, and so on and so forth. All of this is in vain, though, if one of those permitted users 'gives' the data away over the phone or via email due to an improper request. It's wholly out of your control. As someone in the computer industry, I easily saw through their disguise of someone trying to help and was wholly suspicious of their motives. I wonder if someone less aware of data security would have been more cooperative.

How do you allow or prepare for this? Do you ever test for a weakness in this area? Would you know if it happened?

Please note, I have no idea who the caller was or whether Team Viewer are aware that they are being referenced in such a way. I have used their software, for legitimate remote assistance purposes, and never found any issues with it when used as designed. I fear their software is being implicated in activities of which they are unaware.


EntityDataSource Control Basics
03 March 2011
by Joydip Kanjilal

The Entity Framework can be easily used to create websites based on ASP.NET. The EntityDataSource control, which is one of a set of Web Server Datasource controls, can be used to bind an Entity Data Model (EDM) to data-bound controls on the page. These controls can be editable grids, forms, drop-down list controls and master-detail pages, which can then be used to create, read, update, and delete data. Joydip tells you what you need to get started.

Data Source controls can be used to connect to and retrieve data from various types of data sources without the need to write code to perform such operations. The EntityDataSource control helps you to create data-driven applications quickly and easily, and to perform Create, Read, Update, and Delete (CRUD) operations on data based on an Entity Data Model. This article briefly introduces the ADO.NET Entity Framework, and provides an example of using the EntityDataSource control in Entity Framework 4 to retrieve and bind data to data controls.

Prerequisites

To work with ADO.NET Entity Framework, you must have Visual Studio 2010 installed. You should have some previous experience with C# and Visual Studio, and knowledge of Entity Framework 3.5 would be an added advantage.

Before we delve deep into using the EntityDataSource control, let’s take a quick tour of the basic concepts of Entity Framework.

What is Entity Framework all about?

The strategies for data access using Microsoft technologies have changed over the years. From Remote Data Objects (RDO) and Data Access Objects (DAO) to ADO.NET, there have been marked improvements to the way data is accessed. Then, Object Relational Mappers (ORMs) evolved with a view to increasing development productivity, database independence, and database portability.

ADO.NET Entity Framework is a mature ORM with provisions to connect to any database as long as you have a compatible ADO.NET data provider. And Entity Framework has lots of excellent features such as Entity Inheritance, Entity Composition, Change Tracking, Model First Development, and support for Plain Old CLR Objects (POCO) – features that you wouldn't generally see in a typical ORM. Entity Framework runs on top of the managed environment of the .NET Framework.

Why do you need Entity Framework?

When you use Entity Framework, you can concentrate more on writing application logic rather than on database-related work, so development time and effort are greatly reduced. You don't need a Data Access Layer anymore. And you can query your Entity Data Model using LINQ. Equipped with support for inheritance of types, complex members, relationships between entities, and Persistence Ignorance through POCO entities, Entity Framework is an extended ORM par excellence.

The Entity Data Model

Entity Framework abstracts the logical schema of the data and presents its conceptual schema in the application. It achieves this through the Entity Data Model - an extended Entity Relationship data model, which was invented in the 1970s by Dr. Peter Chen and is concerned with the entities and the relationships between them.

You can generate an Entity Data Model either by using the EDMGen.exe command line tool or by using the Entity Data Model Wizard in Visual Studio 2010.

The Entity Data Model comprises three layers:

The Conceptual Schema Definition Language layer. This conceptual layer is used to define the entities and their relationships; it is represented using CSDL files.
The Store Schema Definition Language layer. This storage layer represents the schema of the underlying data store from which the Entity Data Model is created; it is represented using SSDL files.
The Mapping Specification Language layer. This maps the conceptual layer to the storage layer; it is represented using MSL files.

Note that the files described above are created if you use the EDMGen.exe command line tool to create your Entity Data Model. However, if you use the Entity Data Model Wizard in Visual Studio 2010, an EDMX file containing all three layers is created instead.

Working with the EntityDataSource control

Data Source controls were introduced in ASP.NET 2.0. These are controls that can be bound to external data sources such as databases, XML files, and so on. The EntityDataSource control was first introduced in Visual Studio 2008 SP1. It can be used to bind data to data controls in ASP.NET applications that use Entity Framework, by connecting to an Entity Data Model.

In the following sections, we will explore how we can work with the EntityDataSource control to bind data to an ASP.NET GridView control. So, the first thing we need to get started is an Entity Data Model.

Creating an Entity Data Model

To create an Entity Data Model, you have two options: either use the built-in Entity Data Model Designer in Visual Studio 2010, or use the EDMGen.exe command line tool. We will use the former option as it is more user-friendly.

We will use the AdventureWorks database to create our Entity Data Model:

1. In the Visual Studio 2010 IDE, select the project in the Solution Explorer window.
2. Right-click and select Add, New Item.
3. Under Installed Templates, select Data, and then select ADO.NET Entity Data Model.

Figure 1: Creating a new ADO.NET Entity Data Model using the Entity Data Model Wizard

4. In Name, type AdventureWorksDataModel as the name for the model and click Add.
5. In the Choose Model Contents window, select Generate from Database, as we will be generating our Entity Data Model from the AdventureWorks database.


Figure 2: Specifying the source of the data for the model

6. Click Next.
7. In the Choose your Data Connection window, click New Connection and specify the connection properties to connect to the AdventureWorks database.
8. Test the connection, and if all is fine, click OK.
9. Click Next.
10. In the Choose your Database Objects window, select Tables and then select Store from the list of database objects.

Figure 3: Choosing the database objects

11. Type AdventureWorksDataModel as the namespace for the model, and click Finish.


The Entity Data Model for the AdventureWorks database is created. Here’s how it looks in Design View:

Figure 4: The Model Browser window and the Entity Data Model in Design View.

Configuring the EntityDataSource control

To use an EntityDataSource control, simply drag and drop the control from the toolbox in your Visual Studio 2010 IDE. You then need to configure the EntityDataSource control to specify the connection string and the container to be used:

1. Switch to the EntityDataSource control in Design View.
2. Select the Configure Data Source with Wizard option for the control.
3. Select AdventureWorksEntities as the Named Connection, to connect this control to the AdventureWorks Entity Data Model you just created.
4. Select AdventureWorksEntities as the DefaultContainerName.

Figure 5: Configuring the EntityDataSource Control

5. Click Next, and then specify the fields to be retrieved from the database table; in our example, this is the Store table of the AdventureWorks database.


Figure 6: Configuring the Data Selection for the EntityDataSource Control

6. When you have selected the fields, click Finish.

When the EntityDataSource control has been configured, the markup code of the control looks like this:

<asp:EntityDataSource ID="EntityDataSource1" runat="server"
    ConnectionString="name=AdventureWorksEntities"
    DefaultContainerName="AdventureWorksEntities"
    EnableFlattening="False"
    EntitySetName="Stores"
    EntityTypeFilter="Store"
    Select="it.[CustomerID], it.[Name], it.[ModifiedDate]">
</asp:EntityDataSource>

Listing 1: EntityDataSource control markup code

Configuring the GridView Data control

Now that the EntityDataSource control has been configured, the next step is to associate the DataSource property of the GridView Data control with the EntityDataSource control. You can add a GridView control by selecting it from the Toolbox in the Visual Studio 2010 IDE.

Figure 7: Associating the Data Source Property of the GridView with the EntityDataSource Control

In this example, we selected only the CustomerID, Name, and ModifiedDate fields when we configured the EntityDataSource control. You can verify this by clicking Edit Columns in the GridView control menu.


Figure 8: Verifying the selected columns

Ensure the Automatically Generate Fields check box for the GridView control is clear, and only the fields you need are selected.

When you have configured the GridView control, the markup code looks like this:

<asp:GridView ID="GridView1" runat="server" AutoGenerateColumns="False"
    DataSourceID="EntityDataSource1">
    <Columns>
        <asp:BoundField DataField="CustomerID" HeaderText="Customer ID"
            ReadOnly="True" SortExpression="CustomerID" />
        <asp:BoundField DataField="Name" HeaderText="Name"
            ReadOnly="True" SortExpression="Name" />
        <asp:BoundField DataField="ModifiedDate" HeaderText="Modified Date"
            ReadOnly="True" SortExpression="ModifiedDate" />
    </Columns>
</asp:GridView>

Executing the application

Now that you have configured the Entity Data Source and the GridView controls, press F5 to execute the application. The output looks like this:

Figure 9: The Application in Execution!

Conclusion

In ASP.NET 2.0, data binding has been simplified with the introduction of Data Source controls. Data Source controls are server controls that can be used to bind data retrieved from various data sources to the ASP.NET data controls. The EntityDataSource control is one such Data Source control; it helps you to bind data that is retrieved through the Entity Data Model.

This article has looked at the basic concepts of Entity Framework, and how we can work with the EntityDataSource control. We showed how to create an Entity Data Model from an existing database. Next, we demonstrated how to create an EntityDataSource control and configure it to retrieve data from the Entity Data Model. Lastly, we showed how we can use the EntityDataSource control to bind data to a GridView control in our ASP.NET application.

To learn more about Entity Framework, you can refer to my book entitled Entity Framework Tutorial (Packt Publishing). Here's the link to the book at Amazon.

References

Here are a few links for further study on this topic:

What's New and Cool in Entity Framework 4.0

What’s New in Entity Framework 4?

© Simple-Talk.com
