Motoring along in the SSDT-based deployment series, in this post I’m going to look at one particular publish setting and how it behaves, which is my fourth bugbear with the SSDT deployment function:
4. Block incremental deployment if data loss might occur.
On the face of it, this is something that I immediately like; I’m saying that I don’t want my deployment to potentially screw up data in my database. All good. However, the implementation seems extremely inflexible, and the key word in the setting is “might”. The only check actually performed is whether there are any rows in the table, not whether data loss would genuinely occur. Even if you explicitly cater for the change by, say, having a pre-deployment script that updates all the values in a table from NULL to a new default value (in batches if the table is large), it still errors and stops your deployment. Here’s an example of what is run:-
```sql
/*
The column [MySchema].[TestTable2].[TestTable1ID] on table [MySchema].[TestTable2] must be added,
but the column has no default value and does not allow NULL values. If the table contains data,
the ALTER script will not work. To avoid this issue you must either: add a default value to the
column, mark it as allowing NULL values, or enable the generation of smart-defaults as a
deployment option.
*/
IF EXISTS (SELECT TOP 1 1 FROM [MySchema].[TestTable2])
    RAISERROR (N'Rows were detected. The schema update is terminating because data loss might occur.', 16, 127) WITH NOWAIT
```
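For illustration, the kind of pre-deployment backfill script described above might look like this (the table, column, and default value of 1 are purely illustrative):

```sql
-- Pre-deployment: backfill NULLs in batches so the column can safely become NOT NULL.
-- Batching keeps individual transactions (and log growth) small on large tables.
DECLARE @BatchSize int = 10000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) [MySchema].[TestTable2]
    SET    [TestTable1ID] = 1          -- illustrative default value
    WHERE  [TestTable1ID] IS NULL;

    IF @@ROWCOUNT < @BatchSize BREAK;  -- last partial batch done, nothing left to update
END
```

Even with every NULL handled this way before the schema change runs, the generated row-count check above still fires and kills the deployment.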
The only workaround is to turn the setting off. But then you open yourself up to release failures when you go ahead and make that change which alters a column to NOT NULL whilst it still contains NULL values. So we’ve ended up with a rather laborious Octopus Deploy process: it first runs the deployment with the setting on; if that fails, it raises a warning for user input, allowing us to check whether the case is valid and/or handled, turn the setting off, and let the deployment run again. Which is, frankly, ugly.
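For reference, the setting behind all this is the BlockOnPossibleDataLoss deployment option; “turning it off” means setting it to False either in the publish profile or via `/p:BlockOnPossibleDataLoss=False` on the SqlPackage.exe command line. A stripped-down publish profile fragment (only the relevant property shown):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Fragment of a .publish.xml profile; other properties omitted for brevity -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <BlockOnPossibleDataLoss>False</BlockOnPossibleDataLoss>
  </PropertyGroup>
</Project>
```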
There are a whole bunch of scenarios where this block might turn up, some of which are listed below:
- Changing a column from Unicode to ASCII (e.g. nvarchar to varchar), or any other datatype size reduction.
- Adding a NOT NULL column, or changing an existing column to NOT NULL.
- Adding a foreign key.
- Adding a check constraint.
- Adding a unique index.
- Renaming objects (there is a note on Books Online about this not happening if you use Premium or Ultimate, so if anyone uses these editions I’d be interested to know whether this still applies).
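To make the second scenario concrete, here is a minimal sketch of the sort of change that trips the block (names are made up for illustration):

```sql
-- The deployed model makes an existing nullable column NOT NULL.
-- Even if a pre-deployment script has already backfilled every NULL,
-- SSDT still wraps this in the "Rows were detected" guard shown earlier,
-- purely because the table contains rows.
ALTER TABLE [MySchema].[TestTable2]
    ALTER COLUMN [TestTable1ID] int NOT NULL;
```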
Plenty of these are likely to happen during the course of a database’s lifecycle, which effectively means that our automated deployments will end up needing manual intervention most of the time. Whilst I’d hope that this stuff is considered in the first place (use the right datatypes!), and that certain types of silliness like adding a column to the middle of a table would be avoided, there are always going to be unforeseen changes that end up needing to be deployed.
5. How SSDT deployments handle those data movements.
Following on from this, and as already seen in Part 1 where I lambasted the awful way that partition scheme changes happen, almost anything that causes data movement involves creating a complete copy of the table in question, copying all the data in, and renaming it. This gives me a bunch of headaches:
- I have to keep enough free space in the database and log to ensure I can do this copy.
- The transaction log growth for some of these changes is going to be immense, and have potentially disastrous consequences for the state of my secondary in my Availability Group.
- Foreign Key constraints are also going to be affected.
- The duration of my deployments just grew massively to allow for all this data movement, with each one totally blocking my database, as the isolation level is serializable.
- Schema bound views are going to have to be recreated; and considering all indexed views are schema bound, they’re going to have to be dropped and recreated, leading to even more data movement.
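The rebuild pattern that causes all of the above looks broadly like this (a simplified sketch; the real generated script also recreates indexes, constraints, and IDENTITY settings, and the tmp_ms_xx naming and column names here are illustrative):

```sql
-- Simplified shape of an SSDT table rebuild.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

-- New copy of the table with the desired shape.
CREATE TABLE [MySchema].[tmp_ms_xx_TestTable2] (
    [TestTable2ID] int NOT NULL,
    [TestTable1ID] int NOT NULL   -- the newly NOT NULL column
);

IF EXISTS (SELECT TOP 1 1 FROM [MySchema].[TestTable2])
BEGIN
    -- Every row is copied: this is where the free-space requirement,
    -- log growth, and Availability Group redo pressure come from.
    INSERT INTO [MySchema].[tmp_ms_xx_TestTable2] ([TestTable2ID], [TestTable1ID])
    SELECT [TestTable2ID], [TestTable1ID]
    FROM   [MySchema].[TestTable2];
END

DROP TABLE [MySchema].[TestTable2];
EXECUTE sp_rename N'[MySchema].[tmp_ms_xx_TestTable2]', N'TestTable2';

COMMIT TRANSACTION;
```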
I’ve yet to test with any serious depth of foreign key hierarchy in place, but I can think of a few examples in our production systems where there are four or five levels of foreign key nesting: all I can imagine is that each table will end up getting copied across to allow the new copies of the tables to be referenced, and so on. It doesn’t take much to see that before long we end up effectively copying out the entire database!
So, again, we’re in the process of building a parser to look for this kind of nonsense and stop the deployment if it appears in our scripts. We then handle the change with a separate script and allow the deployment to regenerate. Which sort of defeats the point of having a tool build our deployment scripts.