Top_bar_btn_squeeze
At last September's BarCamp Brighton I saw the CSS Specificity Snap cards and this gave me an idea. So, armed with a concept of showing the different ways of producing the same output using the SQL Server 2005 flavour of SQL I started generating SQL statements. Alex was kind enough to make them pretty, and last week I ran a Madgex ILP session to play.

There are 26 cards in my pack, each card has a SQL statement, and a letter. The letter is used for the crib sheet to allow me to easily spot the matches.



I prepared by producing two piles of 13 cards, with a match in both piles. I then put one pile in the middle of the table face down, and dealt out the rest of the cards to the 5 people who were playing. I placed the 3 spare cards on the table face up so that we could all see them and proceeded to turn them over one by one. When a card was turned over everyone looked at the SQL on the overturned card, and checked their cards and the spare cards to see if there was a match. If a match wasn't spotted then I explained what the SQL was doing, and at only one point did I have to say which letter the matching card would have. During this I had quite a few comments about functions people didn't recognise - especially COALESCE and NULLIF. I also explained some of the performance, and functional differences between the SQL statements when they were seen - so for instance I explained the differences between DELETE and TRUNCATE but I still I have these as a match because they can provide the same result and provided an interesting talking point.

The second game we played was a memory game where I placed all of the cards on the table face down and everyone turned over 2 cards at a time. If the SQL on the cards resulted in the same output, then it was a match and they took the cards away. If they didn't then the cards got turned back over and the next person had a go.

All in all this session took about 30 minutes, and resulted in quite a lot of noise and laughter (so much so that someone came and closed the door of the room we were in).

The following table contains the Letter Code, SQL statement, and the matching Letter code. They are ordered so that the matches are grouped together.
CodeSQLMatch
VSELECT CAST(GETDATE() AS VARCHAR(11))G
GSELECT CONVERT (VARCHAR(11), GETDATE())V
ZSELECT name FROM sysobjects WHERE xtype = 'P'Q
QSELECT name FROM sys.proceduresZ
DSELECT sysobjects.name, syscolumns.name FROM sysobjects INNER JOIN syscolumns ON sysobjects.id = syscolumns.id WHERE sysobjects.xtype = 'u'A
ASELECT sysobjects.name, syscolumns.name FROM sysobjects, syscolumns WHERE sysobjects.id = syscolumns.id AND sysobjects.xtype = 'u'D
USELECT getdate()O
OEXEC('SELECT getdate()')U
JDELETE FROM TestT
TTRUNCATE TABLE TestJ
YSELECT ISNULL(NULL,1)W
WSELECT COALESCE(NULL,1)Y
PSELECT [name], xtype FROM sysobjects ORDER BY xtypeM
MSELECT [name], xtype FROM sysobjects ORDER BY 2P
HSELECT NULLIF(1,1)R
RSELECT CASE 1 WHEN 1 THEN NULL ELSE 1 ENDH
ISELECT [name], xtype FROM sysobjects ORDER BY 1N
NSELECT [name], xtype FROM sysobjects ORDER BY [name]I
XSELECT CAST(GETDATE() AS VARCHAR(20))F
FSELECT CONVERT (VARCHAR(20), GETDATE())X
SSELECT name FROM syscolumns WHERE id = ( SELECT id FROM sysobjects WHERE xtype = 'u' AND name = 'Jobs' )K
KWITH objects (id) AS
( SELECT id FROM sysobjects WHERE xtype = 'u' AND name = 'Jobs' ) SELECT name FROM syscolumns INNER JOIN objects ON syscolumns.id = objects.id
S
LINSERT INTO Job (JobID, PrimaryJobTypeID) VALUES (1,1)B
BINSERT INTO Job (JobID, PrimaryJobTypeID) SELECT 1,1L
CDECLARE @sMessage AS VARCHAR(20) SET @sMessage = 'Hello'E
EDECLARE @sMessage AS VARCHAR(20) SELECT @sMessage = 'Hello'C
Yesterday Christian left me a comment containing an enhancement for the spu_generateinsert SQL. His suggestion was adding a new parameter @GenerateOneLinePerColumn to allow more "pretty" SQL to be produced - producing all the columns on separate lines which makes it easier for file comparisons to spot differences when comparing data values. I thought it made sense, so this morning I used an hour of my Madgex ILP time to make this change.
By calling EXEC spu_GenerateInsert @table = 'Detail',@GenerateOneLinePerColumn = 0 it produces
-- ** Start of Inserts
INSERT INTO [Detail] ([ID], [Age]) VALUES (1,36)
INSERT INTO [Detail] ([ID], [Age]) VALUES (2,40)
-- ** End of Inserts


but by calling EXEC spu_GenerateInsert @table = 'Detail',@GenerateOneLinePerColumn = 1 it produces

-- ** Start of Inserts
INSERT INTO [Detail]
(
[ID],
[Age]
)
VALUES
(
1,
36
)

INSERT INTO [Detail]
(
[ID],
[Age]
)
VALUES
(
2,
40
)
-- ** End of Inserts


Thanks to Christian for this suggestion. The updated script can be found here and related blog posts can be found here.
Note: this is now only working on SQL 2000 as during my testing I found that VARCHAR(8000) just wasn't long enough so I've made it VARCHAR(max). This is the only SQL 2005 specific piece of SQL in this procedure and I've put comments in the code to indicate -- change this to be (8000) for SQL Server 2000.
There are 2 methods of executing dynamic SQL :
  • Exec which executes a scalar-valued, user-defined function, a system procedure, a user-defined stored procedure, or an extended stored procedure
  • sp_executesql which executes a SQL statement or batch that can be reused many times, or that has been built dynamically
Example SQL Statements - both returning the same results:

  • Exec ('Select * from Items')
  • DECLARE @strQuery AS NVARCHAR(50)
    SET @strQuery = 'Select * from Items'
    EXEC sp_executesql @strQuery

As you can see sp_executesql demands a parameter to be sent rather than just a string. This leads to improved security and improved performance because the SQL statement itself remains constant and only the parameter values change which means that the SQL Server query optimizer is likely to reuse the execution plan it generates for the first execution.

In addition, when writing a parameterized query, the additional parameters can be specified separately which again adds to the security, see the following which is based on the example on MSDN:
DECLARE @SQLString NVARCHAR(500)
DECLARE @ParmDefinition NVARCHAR(500)
DECLARE @TitleDefinition NVARCHAR(50)

/* Build the SQL string once.*/
SET @SQLString =N'SELECT * FROM ItemsWHERE Title like ''%'' + @title + ''%'''
SET @ParmDefinition = N'@title nvarchar(50)'

* Execute the string with the first parameter value. */
SET @TitleDefinition = 'S'
EXECUTE sp_executesql @SQLString, @ParmDefinition, @title = @TitleDefinition

/* Execute the same string with the second parameter value. */
SET @TitleDefinition = 'P'
EXECUTE sp_executesql @SQLString, @ParmDefinition, @title = @TitleDefinition


So, if you have an option, use sp_executesql instead of exec
So, the first period of ILP time is over, and I've managed to use 50-60 hours of my allocation attending workshops, watching presentations, learning stuff and doing ideas project work.
Learning:
  • Started reading through the Opera web standards curriculum to ensure my knowledge was up to date
  • Attended various presentations including: Stress Management, Zen and the craft of software development, Introduction to unit testing with nUnit, CSS3: Third time's the charm, Comet (a presentation by Simon Willson who came in kindly to talk to us), Basic NLP and hypnosis, SQL Server Advanced, Vision and Goal setting, Theories of Management and Javascript inheritance

Ideas:
  • Looked into DBVerse, a database deployment tool, to see if this would help us. Review here
  • Looked into Tarantino, another database deployment tool. Review here
  • Worked with Bruce to start writing a database deployment tool to meet our specific requirements. This is still ongoing and will continue into the next few months
  • Attended Hackday 3 and worked on some keyword searching work with Chris
  • Looked into Stylecop, a code style analyser. Review here

Now I have to start planning out the next years allocation - I'm cashing in a couple of days to attend the ReMix UK conference, I'm going to continue attending Simon's management and leadership courses, as well as his NLP related ones. I'm also hosting 2 sessions in the next month - one which is SQL Snap, based on CSS Specificity snap and one which is an experimental SQL Coding dojo. I'm sure I'll report back on both of these afterwards.
Another fine batch of newly added blogs to go and take a look at:

Design Agency Blog
The Recommender
Upstream connections - SEO
A mobile phone blog
Pet Care and News from Grove Lodge Vets
Short Stories :: and other stuff
Just Seven Things
Miss Tickle

And a submission that doesn't make it in Last tango in Buenos Aires because the author is an ex-resident of Brighton, currently residing in Buenos Aires :-)
I read the article SQL SERVER - 2005 - Difference Between INTERSECT and INNER JOIN - INTERSECT vs. INNER JOIN today and was interested in the relative speed of Inner join vs Intersect for simple matcjhes and so thought I'd make another use of my timing code to see the difference.

I produced a couple of very simple tables:
Summary ([ID] INT, [Name] NVARCHAR(10))
Detail ([ID] INT, [Age] INT)

both with primary keys on ID and populated them both as follows:

INSERT INTO Summary
SELECT 1, 'Jane' UNION ALL
SELECT 2, 'Richard'

INSERT INTO Detail
SELECT 1, 36 UNION ALL
SELECT 2,40


I chose to do a very simple task - retrieve the ID for an entry which is in both Summary and Detail - note: this assumes that the ID is a foreign key constraint and the IDs are relating to the same item.

Using my rudimentary timing code, I think that intersect is more performant than the inner join - based on running the same query 100000 times

SELECT Summary.ID
FROM Summary
INNER JOIN Detail
ON Summary.ID = Detail.ID


and

SELECT ID FROM Summary
INTERSECT
SELECT ID FROM Detail


which has the following results:
11746
DescriptionTimeInMS
Join
Intersect10203

I ran them both together and took a look at the actual execution plans and it does indicate that the intersect takes a little less effort - which must be down to the Seek instead of Scan in the Join'd table detail


So, another option for me to remember to consider when determining the best way to get at some data
One of the Madgex value words is Passionate, and this got me to thinking what am I the most passionate about in my working life and it has to be learning. And by that I mean both learning myself and encouraging and enabling learning amongst team members and colleagues.

In Eat that frog - which I'm currently reading - there is a lovely quote
"Continuous learning is the minimum requirement for success in any field"

One of the things I try and do is read a lot, mostly in the form of blogs where I find useful titbits of information, or clues, or things to go and find out more about, but also books relating to more static skills. My google reader list now includes more leadership and time management focussed blogs as well due to my recent role change. I've also cashed in some of my ILP time to follow the Web Standards Curriculum to ensure that I have a full understanding of the implications of web standards.

This also leads to me keeping a keen eye on what is going on event-wise within the community - both locally, here in Brighton, and further afield (well near London main-line stations if I'm honest) - that could form an introduction to new techniques, technologies and methodologies. I've just started sending out a weekly email to Madgex containing all of my finds, so that others can benefit from shared learning experiences. My key resources for this are upcoming searches for Brighton & Hove, geek, .Net, Barcamp, sql, developer along with MSDN events, VBUG events and Skills Matter events.

I'm planning to go to Remix UK in September, which I'm hoping will be a great way to catch up with new technologies and new methodologies. I've been to a WebDD, a DDD and SQLBits community conferences before so it will be great to see how Remix differs.
Following on from my More on SQL Server 2005 Full Text Index Service post the other day, I thought I'd give an example of how it works

Setup


I created a table LanguageData which consisted of 2 fields liID and sValue

CREATE TABLE [dbo].[LanguageData]
(
  [ID] [int] IDENTITY(1,1) NOT NULL,
  [Value] [nvarchar](50) NOT NULL,
  CONSTRAINT [PK_LanguageData] PRIMARY KEY CLUSTERED
  (
  [ID] ASC
  ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


I entered some sample data as follows

INSERT INTO [LanguageData](Value)
SELECT 'the' UNION
SELECT 'przed' UNION
SELECT 'jakby'


where 'the' is featured in the English and Neutral language noise word files, 'przed' and 'jakby' are in the Polish language noise files. Note: You'll need to have installed the Polish full text index to make this work.

Next enable the full text indexing on the database

sp_fulltext_database 'enable'

and then create a full text catalog and an index for the table LanguageData

CREATE FULLTEXT CATALOG LanguageData AS DEFAULT
CREATE FULLTEXT INDEX ON LanguageData ([Value] LANGUAGE 1045 )
KEY INDEX [PK_LanguageData]


where 1045 indicates the language Polish - retrieved from
SELECT alias, lcid FROM Sys.syslanguages
WHERE alias = 'Polish'


Scenarios


Now, time to run some tests,

1) Check that all is initially correct, get everything

SELECT * FROM LanguageData

which returns 3 rows, as expected

2) Get everything which matches the noise word 'jakby'

SELECT * FROM LanguageData
WHERE CONTAINS(*,'jakby')


returns no rows as the word 'jakby' was stripped out at index time, and is also stripped out at query time, and a warning message "Informational: The full-text search condition contained noise word(s)."

3) Get everything which matches the noise word 'jakby' specifying Polish (1045) in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby', language 1045 )


returns no rows as the word 'jakby' was stripped out at index time, and is also stripped out at query time, and a warning message "Informational: The full-text search condition contained noise word(s)."

4) Get everything which matches the word 'jakby' specifying US English (1033) in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby', language 1033 )


returns no rows as the word 'jakby' was stripped out at index time. No warning message is displayed though as 'jakby' is not a noise word for US English

5) Get everything which matches the word 'the'

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the')


returns one row, as 'the' isn't a noise word in Polish and so wasn't stripped out at index time or at query time

6) Get everything which matches the word 'the' specifying Polish in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045 )


returns one row, as 'the' isn't a noise word in Polish and so wasn't stripped out at index time or at query time

7) Get everything which matches the word 'the' specifying US English in the CONTAINS clause

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1033 )


returns no rows as 'the' is a noise word in US English and therefore is excluded at query time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed

Now to make it more interesting, lets add some data which combines noise words with normal words

INSERT INTO [LanguageData] (Value)
VALUES
('jakby przed the test')

which includes 2 polish noise words, one english noise word and one remaining word

8) Get everything which matches the word 'jakby'

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jakby')


returns no rows as the word 'jakby' was stripped out at index time, and is also stripped out at query time, and a warning message "Informational: The full-text search condition contained noise word(s)." is displayed

9) Get everything which matches the word 'the'

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the')


returns 2 rows, both the individual 'the' entry and the new 'jakby przed the test' rows. No message is displayed.

10) Get everything which matches the word 'the' using an explicit query language of Polish

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045)


returns 2 rows, both the individual 'the' entry and the new 'jakby przed the test' rows. No message is displayed.

11) Get everything which matches the word 'the' using an explicit query language of English

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'the', language 1033 )


returns no rows as the word 'the' was stripped at query time according to the noise words for 1033. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed

And then to make it even more interesting, lets add a new word 'jane' to the LanguageData dataset, and to the noisewords file for the Neutral language (LCID 0) which (on my machine at least) is at C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseNEU.txt

To get the full text indexing service to pick up the changes to the noise files, you need to restart the service via the Control Panel -> Administrative Tools -> Service dialog

INSERT INTO LanguageData (Value)
VALUES ('jane')

12) Get everything which matches the word 'jane' using the implicit query language (Polish)

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane')


which returns 1 row, as 'jane' isn't a polish noise word and wasn't stripped out at either index or query time

13) Get everything which matches the word 'jane' using the explicit query language English

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane', language 1033 )


which returns 1 row, as 'jane' isn't a polish noise word and so wasn't stripped out an index time, neither is it an english noise word so isn't stripped out at query time either

14) Get everything which matches the word 'jane' using the explicit query language Neutral

SELECT * FROM LanguageData
WHERE CONTAINS(*, 'jane', language 0 )


which returns 0 rows as 'jane' is a neutral noise word and so is stripped out at index time. A warning message "Informational: The full-text search condition contained noise word(s)." is displayed

Summary


What this shows, is that when you choose a language to set your full text index up as, this impacts the words which will be stripped out of the index as anything defined as noise will be removed. This has an impact on the choice of language when different language content is being indexed as we need to be clear that what is one languages noise word, isn't another ones non-noise word.
  • When querying a full text index, it is possible to specify that the query you are running is for a particular language, but if you do and if the language is different to that you set the index up as, then you'll remove 2 sets of noise words from your search - both those that were set up when the index was defined, but also those based on the language specified in the query
  • The noise files are defined on an instance by instance basis and so any alterations to the noise file will affect all full text indexes on an instance.
  • To pick up changes to the noise files, the service needs to be restarted.
  • SQL Server 2008 seems to change this and so more research will be required - it relies on STOPLISTs instead.
In my previous post about How to work out which are valid full text languages on a SQL Server 2005 instance I referred to sys.syslanguages and sys.fulltext_languages in my queries, but didn't really say much more about them, so here goes

sys.syslanguages


In the definition on MSDN it states
"Contains one row for each language present in the instance of SQL Server 2005. Although U.S. English is not in syslanguages, it is always available to SQL Server."

And one thing on the choice of U.S. English vs UK English. The SQL Server Full Text Search: Language Features says
"In actual fact UK English does not refer to the Queen's English or the English used in the United Kingdom, but International English; the English that is used in all other English speaking countries other than US English."

As an English person, living in England and speaking English I find this a somewhat grating use of the phrase UK English. Bah!

sys.fulltext_languages


In the definition on MSDN it states
"This catalog view contains one row per language available for full-text indexing/querying operations. Each row provides an unambiguous representation of the available full-text linguistic resources that are registered with Microsoft SQL Server. The name or lcid can be specified in the full-text queries and full-text index DDL."

The list in this table, doesn't match those in sys.syslanguages. These are purely the full-text-indexable languages. As I mentioned in my previous post 6 languages can be added by following these instructions. The line
"The name or lcid can be specified in the full-text queries and full-text index DDL."
refers to the ability to issue the following SQL:
SELECT *
FROM LanguageData
WHERE CONTAINS(*, 'the', language 1045 )

which indicates that the locale used for querying should be 1045, which equates to Polish. I have some sample SQL to post in the next few days which demonstrates the difference between indexing and querying language choices.

In General


I've been doing quite a bit of work with trying to understand how the SQL Server 2005 full text index works, and how the language choice impacts it. My knowledge of full text indexing as a whole to this stage hasn't been great, so I've done quite a lot of background reading. Amongst the best resources I've found are: both by Hillary Cotter which provide a really simple, but yet pretty comprehensive introduction to the various features of indexing and querying using the Full Text Index service.
Despite SQL Server 2005 supporting 33 languages (found by issuing SELECT * FROM sys.syslanguages), not all of these are available for the full text index service. To find out which ones are run the query:
SELECT *
FROM sys.fulltext_languages


On my machine, this returns the following languages:
  • British English
  • Chinese (Hong Kong SAR, PRC)
  • Chinese (Macau SAR)
  • Chinese (Singapore)
  • Simplified Chinese
  • Traditional Chinese
  • Dutch
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Neutral
  • Spanish
  • Swedish
  • Thai

An additional 6 languages are supported and available for a separate install. These are :
  • Danish
  • Polish
  • Português (Brasil)
  • Portuguese
  • Russian
  • Turkish

To install these, follow the instructions here.
The following languages are not supported for full text searching at all within SQL Server 2005:
  • Arabic
  • Bulgarian
  • Croatian
  • Czech
  • Estonian
  • Finnish
  • Greek
  • Hungarian
  • Latvian
  • Lithuanian
  • Norwegian
  • Romanian
  • Slovak
  • Slovenian

SQL Server 2008 offers more full text language support bringing the total of available languages to 50. It would appear that Danish, Polish and Turkish remain installable additions.
I'm currently doing some work on looking into producing a localised version of a Madgex job board (not dis-similar to the work I did this time last year)and am mainly looking at the SQL Server and javascript areas whilst a colleague looks at the .NET side. Glenn gave me a tip off that when he'd been doing something similar, he'd had problems with Visual Studio 2005 not saving his javascript files as UTF.

So, within VS2005 I created a javascript file and put 2 lines into it. They were simply:
alert ('hello world');
alert('Zarys gramatyki por¢wnawczej jezyk¢w slowianskich');


I then linked this into a (very) basic HTML page

<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
  <title>i18n</title>
  <script type="text/javascript" src="js/i18n.js" language="javascript"></script>
</head>
<body>
</body>
</html>


so that on pageload 2 alert boxes are displayed, one saying 'Hello World' and the one saying 'Zarys gramatyki por¢wnawczej jezyk¢w slowianskich'.

Unfortunately what is displayed instead is:

which isn't exactly what I had in mind.

I opened the file in Notepad++ (my text editor of choice) to take a look at the file type and it is, as I'd expected, saved as ANSI, not UTF-8 or UTF-16


I used Notepad++'s menu item Format -> Convert to UTF-8 to convert this file from ANSI into UTF-8, and then re-ran my test and all works correctly as expected. Hurrah!

I then repeated this using VS2008 and found that this is one of the fixes over VS2005.

So, the alert now correctly displays:

and when opened in Notepad++ the file is now, correctly, UTF-8.
In my almost 15 years of development, I've found little more beneficial than a well run Post-Implementation Review meeting. I find them a great to way to learn, improve and ensure that the next project goes more smoothly than the one before.

What is a Post-Implementation/Post-Project Review?


It is a meeting held at the end of a project at which people who have contributed to the project as a whole get an opportunity to discuss the highs and lows of the project.

My preparation normally involves thinking back over the course of the project and thinking about:
  • what went well?
  • what didn't go so well?
  • what we could do better next time and what lessons we can learn
  • How well the project was analysed and specified
  • how well the project was managed
  • how well the testing phase went - bugs found in testing vs UAT vs post-live
  • how well the handover to support went
  • how well the original time estimates reflected reality
  • how well specified the infrastructure was - were the original estimates on page impressions etc valid

The most recent one I attended was run in the order of the project, so feedback was made first against the Sales process, then the Analysis process etc. This worked pretty well, but did mean that the last few stages of the project were rushed to ensure that the meeting finished on time. Alternatives include asking the "What went well?", "What didn't go well?" questions of every person in the room. This ensures that everyone gets their say but does involve preparation on behalf of every attendee (no bad thing).

Who should be there?


For me, the ideal meeting should include everyone who has been involved with the project, from start to finish - in some cases this could be a lot of people but every function should be represented - so definitely Sales, Analysts, Project Managers, Developers, Support and Systems. Every person should have an equal opportunity to speak.

When should it happen?


Usually, after the project has gone live and been handed over into a support phase. In some cases a project can last too long, and if the project is scheduled to take more than 6 months, its probably worth having 6 monthly review meetings to ensure that key learnings aren't forgotten, or that subsequent projects can learn and improve quickly. These shouldn't replace the Post Implementation Review but should supplement it.

What should happen afterwards?


The final part of the meeting should be a quick review of the "What went well?" items, and of the "Key learnings". Someone should be tasked with producing a document which should then be circulated outlining the key learnings from the project, and also the highs - the lows should be kept within the team and learnt from but not circulated - it shouldn't be a shaming exercise but should be a great motivator. Any individuals charged with process review, or implementing changes to current/ongoing projects should be informed of the key learnings to ensure these learnings are escalated and implemented as quickly as possible.
Bruce sent me a link the other day to an article T4 template for generating SQL view from C# enumeration which I found interesting from a modelling constants/enumerations in SQL viewpoint.

The example used was modelling an enumeration of ContactType which has valid items of Individual and Organisation.

The article used a view to model this, as per
CREATE VIEW enumContactType
AS
  SELECT
    0 AS Individual,
    1 AS Organization


and then using it within a SELECT as

SELECT *
FROM Contact
WHERE Type = (SELECT Organization FROM enumContactType)

(Note: in the original article Oleg used a schema called enum, but I'm just ignoring this at the moment and have thus changed the name from enum.ContactType to enumContactType)

An alternative


In my previous company, we used Scalar-Valued Functions to mimic constants, and I guess this could be extended to enumerations. I thought I'd re-create the above example and give it a try to see how it looks and compares.

So, to model the enumeration ContactType, I've created two functions as follows:
CREATE FUNCTION enumContactTypeIndividual()
RETURNS INT
AS
BEGIN
  RETURN 0
END
GO

CREATE FUNCTION enumContactTypeOrganisation()
RETURNS INT
AS
BEGIN
  RETURN 1
END
GO


And then to reproduce the SELECT query I wrote:
SELECT *
FROM Contact
WHERE Type = dbo.enumContactTypeOrganisation()


The resulting data matches that used in the VIEW model and provides an alternative. I'm sure that a template could be written to produce those functions as an output as per the end part of Oleg's article.

Performance and timings


I was interested in the relative performance of these two methods, so armed with my timing code from last week I checked them out. I amended the SELECT to bring back the COUNT(*) FROM Contact into a local integer variable, and ran it 1000000 times.

The results are as follows:
CodeDescription TimeInMS
EnumViewUsing the view13010
EnumUDFUsing the UDF21450

showing that the view method is more performant.

I then changed the function to make use of SCHEMABINDING. The new functions look like:
CREATE FUNCTION enumContactTypeIndividual()
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
  RETURN 0
END
GO

CREATE FUNCTION enumContactTypeOrganisation()
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
  RETURN 1
END
GO


And the timings change to be:
CodeDescription TimeInMS
EnumViewUsing the view13010
EnumUDFUsing the UDF20280

which do reduce the time taken for the UDF but still means that the view is faster.

For interests sake I then ran a comparison timing against the code using the literal as:
SELECT *
FROM Contact
WHERE Type = 1

which resulted in
CodeDescription TimeInMS
EnumLiteralUsing the literal12043

showing it is faster, but not by much, than the view.

Summary


So, what has this shown?
  • Using a view is quite efficient and effective for modelling enumerations
  • Using a UDF is an alternative, but is slower
  • Schema binding makes UDF usage quicker
  • The difference between using a VIEW and using the hard-coded literal isn't a lot in perfomance terms
As I alluded to the other day, I'm gradually weaning myself off my dependency on (the fairly ugly) sys.objects, sys.columns etc as a way to query the meta data about my database. Instead I'm using the SQL-92 compliant INFORMATION_SCHEMA views.

Information schema views provide an internal, system table-independent view of the SQL Server metadata. Information schema views enable applications to work correctly although significant changes have been made to the underlying system tables. The information schema views included in SQL Server 2005 comply with the SQL-92 standard definition for the INFORMATION_SCHEMA.


So now, when I'm writing database upgrade scripts and attempting to write defensive SQL (which is my usual position these days, regardless of whether I think the script will be run more than once - lets just say I've learnt from making such assumptions) I usually wrap
ALTER TABLE
statements within
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name = 'MyTable' AND Column_Name = 'MyNewColumn'),
CREATE TABLE
statements within
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE Table_Name = 'MyTable') etc

The main area that I have to revert to the sys views for is indexes, and finding out what columns are included in which index, which is the uglier, but no-less-effective
SELECT
  OBJECT_NAME (i.object_id) AS Tablename,
  i.name AS IndexName,
  c.name AS ColumnName,
  CASE ic.is_descending_key
  WHEN 1 THEN 'DESC'
  ELSE 'ASC'
  END as ColumnSort
FROM sys.indexes i
INNER JOIN sys.index_columns ic
ON i.object_id = ic.object_id
AND i.index_id = ic.index_id
INNER JOIN sys.columns c
ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
INNER JOIN sys.objects o
on c.object_id = o.object_id
WHERE o.type = 'U'
ORDER BY TableName, indexName, ic.key_ordinal


MSDN has an interesting article Querying the SQL Server System Catalog FAQ which has examples for finding out (using the various object catalog views) many different areas of meta data across a SQL Server 2005 database and is worth using as a starting point.
A while ago I blogged about how to get the date element of a datetime column in TSQL. In that post I said
I would probably have done it via a CONVERT/CAST operation, converting to a VARCHAR and then back to a DATETIME, but this is a much more efficient method.

but I didn't prove it at the time. I gave it some more thought and wanted to know what the differences were, so I wrote some timings code.

This script creates one table Timings with columns of Code, Description, ActionTime and IsComplete. It has a combined primary key of Code and IsComplete. Code must be unique - and can be a string of up to 10 characters long to uniquely identify the action being timed. IsComplete is used to differentiate between the start time and end time of the process being monitored.

The script also creates 3 stored procedures:
  • up_RecordStart which takes 2 parameters - the unique code and optional description. This is used to record the start of the activity being monitored.
  • up_RecordEnd which takes just 1 parameter - the code - should match the code used in up_RecordStart. This is used to record the end of the activity being monitored.
  • up_GetTimings which again takes just 1 parameter - the code to return the timings from. It then returns the Code, Description and the length of time the action took in ms.

I wrote some script to then use these objects to test the assertion I made that FLOOR and combinations of converting DATETIME to FLOAT etc would be more efficient than using either CAST or CONVERT to VARCHAR(12) and back again to a DATETIME.

------------------------------
-- Clean up before we start --
------------------------------
DELETE FROM Timings
WHERE Code IN ('FLR','CONVERT','CAST')
GO

---------------------
-- Try using Floor --
---------------------
EXEC up_RecordStart @Code='FLR', @Description='SELECT CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))'
GO

DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0

WHILE @i < 1000000 -- try the next statement for 1000000 times - this should be enough to see some differences
BEGIN

  SET @floorDate = CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))
  SET @i = @i + 1
END
PRINT @floorDate
GO

EXEC up_RecordEnd @Code='FLR'
GO

-----------------------
-- Try using convert --
-----------------------
EXEC up_RecordStart @Code='CONVERT', @Description='SELECT CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))'
GO

DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0
WHILE @i < 1000000
BEGIN

  SET @floorDate = CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))
  SET @i = @i + 1
END
PRINT @floorDate
GO

EXEC up_RecordEnd @Code='CONVERT'
GO

--------------------
-- Try using Cast --
--------------------
EXEC up_RecordStart @Code='CAST', @Description='SELECT CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)'
GO

DECLARE @i AS INTEGER
DECLARE @floorDate AS DATETIME
SET @i = 0
WHILE @i < 1000000
BEGIN

  SET @floorDate = CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)
  SET @i = @i + 1
END
PRINT @floorDate
GO

EXEC up_RecordEnd @Code='CAST'
GO

-------------------------
-- Now get the timings --
-------------------------
EXEC up_GetTimings 'FLR'
GO

EXEC up_GetTimings 'CONVERT'
GO

EXEC up_GetTimings 'CAST'
GO


This results in the following data being returned:
CodeDescription TimeInMS
FLRSELECT CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE())))1313
CONVERTSELECT CONVERT(DATETIME,CONVERT(VARCHAR(12),GETDATE()))3236
CASTSELECT CAST(CAST(GETDATE() AS VARCHAR(12)) AS DATETIME)3203

which shows that the method using FLOOR is more efficient, and that there isn't a lot to chose between CONVERT and CAST
Last night Madgex hosted an excellent VBUG Brighton session by Mike Taulty on LINQ.

Despite the sunny, warm evening we managed to pack 25 or so Microsoft technologies developers into our boardroom and listened intently whilst Mike talked and demo'd his way around LINQ, explaining some of the newer C#/VB9 language features as he went. Whilst not being the exact same slide deck, after a rummage around Mike's site I found a post about a similar sounding talk complete with presentation in PDF format.

I remain slightly dissapointed by the syntax for Linq to XML
var query = from c in data.DescendantsAndSelf("customer")
select (string)c.Attribute("id");

which as Mike said, involves a bit too much of hoping and praying (relying on no underlying changes, no strong typing etc).

However, I'm really encouraged by the idea of Linq to XSD which seems like a much better idea, tying the query to a schema rather than a document.

Fabrice has some sample code based on Linq to XML and Linq to XSD as follows, which goes to show the improvement using the XSD version

Here is a LINQ to XML query:
from item in purchaseOrder.Elements("Item")
select (double)item.Element("Price") * (int)item.Element("Quantity")


Here is the same query as above, but written using LINQ to XSD:
from item in purchaseOrder.Item
select item.Price * item.Quantity

which I think looks much more elegant and less clunky.
Following on from this morning's post about the Monty Hall problem, and proving it in PHP I figured I'd prove it in TSQL as well.

So here is my SQL version.

To maintain consistency with my PHP version, I've made it output similar text, so the results are along the lines of:
Monty Hall Problem
This is a simple TSQL query to prove the Monty Hall problem [http://en.wikipedia.org/wiki/Monty_hall_problem]

------------------------
The Results are in:
------------------------
Out of 10000 games, the contestant was right to swap 66.94% of the time and wrong 33.06% of the time


The TSQL version is a bit more elegant with regards to working out which door to open for the contestant, as it is a simple statement of
SELECT TOP 1 @Opened = DoorNumber
FROM @Doors
WHERE DoorNumber NOT IN (@Prize, @Picked)
ORDER BY NEWID()

making the most of set theory to enable the exclusion of the @Prize door and the @Picked door as opposed to the same thing in my PHP code
$remaining = array();
/* the gameshow host opens a door which has nothing behind it, so the gameshow host knows where the prize is
but can't choose to open the door the contestant has chosen, so remove both picked and prize from the options,
this leaves either one of two doors that can be opened, so pick one randomly */
for ($i=0; $i<3; $i ++)
{
  switch($doors[$i])
  {
    case $prize:
      break;
    case $picked:
      break;
    default:
      array_push($remaining,$doors[$i]);
  }
}
$opened = $remaining[array_rand($remaining)];

which is all a bit more procedural and, at least to my mind, less elegant - but then again I like the syntax of SQL which either makes me a freak or a masochist (according to at least one colleague)
This evening, after returning from an excellent VBUG: Brighton (of which more in a later post), Richard was talking about the Monty Hall problem after listening to a discussion about it on the BBC podcast In Our Time.

For those who don't know the Monty Hall problem it is this:
  • A gameshow set has 3 doors.
  • Behind one of the doors is a prize.
  • Behind the other 2 doors is nothing.
  • The contestant choses a door.
  • The gameshow hosts, knowing where the prize is, and which door the contestant has chosen, opens a door which he knows hasn't got the prize behind it and the contestant hasn't chosen.
  • The contestant is then offered the opportunity to trade their door for the one remaining door.
Should the contestant switch? The answer is yes 2/3rds of the time. See here for the reasons.

Richard and I both set about proving it in the tools we had available, I chose PHP, Richard chose Scala. And we both can successfully demonstrate that by always switching doors the contestant is more likely to win.

My PHP version is available for demo here and downloadable via a zip file (right click, save as) here.
A little over 2 years ago I posted a couple of methods of finding the instances of a string within stored procedures - at the time I'd left some debug 'Print @' code in at least one stored procedure and needed to find it before releasing the code.

Today, I wanted to do the same thing, and as over the past 2 years I've been weaning myself off accessing the sysobjects tables, instead making use of the INFORMATION_SCHEMA views, I decided it was time to add an update to that earlier post.

So, using INFORMATION_SCHEMA.ROUTINES it is coded as :
  SELECT SPECIFIC_NAME
  FROM INFORMATION_SCHEMA.ROUTINES
  WHERE ROUTINE_DEFINITION LIKE '%PRINT @%'
I got an email a while ago (quite a while ago in fact, sorry Anthony) which said

"I'm one of the Brighton bloggers (at IslingwordStreet.com) but also work with a small non-profit called the Democratic Society. We exist to promote participation and democracy, and are going to be holding a discussion event in Brighton in the next couple of months. The topic of the discussion is not decided yet, there's a survey at

http://www.surveymonkey.com/s.aspx?sm=gQVcBe50yfQUF2xvKFBVzA_3d_3d

which will help us decide what people would be interested in hearing about.

Would it be possible to make a mention of this survey on the Brighton Bloggers site, so we can get some views from the local blogosphere?"

So, here it is...
Pages:      1 2 3 ... 41 Next

Pulling together the thoughts and posts of a snowboarding developing photographer :-)

sponsor
time tracking harvest

Harvest - Simple time tracking, powerful reporting.

Suprss
(Subscribe to this page via RSS!)