As a devoted Doctrine whisperer, I was excited when Marco Pivetta’s Doctrine ORM Good Practices and Tricks talk (from the 2016 PHP UK Conference) popped up in my YouTube suggestions.
I’ve been using Doctrine for five years now, and I have my own list of dos (do build custom repository methods that get your entity with fetch joins to its commonly-used associations) and don’ts (don’t ever set up single-table inheritance using part of the primary key as the discriminator column). It’s always interesting to hear a new spin on things though, especially when it comes from someone with inside knowledge.
Using Doctrine with Fixed Databases
Marco’s first suggestion is that Doctrine is not the best tool if you don’t have any control over your database’s schema. I’ve done some pretty gnarly rewriting of legacy apps to use Doctrine, and I only partly agree with this.
Compared to alternatives that heavily favour “convention over configuration” (such as Laravel’s Eloquent), Doctrine’s more imperative style of entity mapping means it’s easy to make model classes with interfaces that reflect your current understanding of the domain, while persisting to tables that are due for an overhaul. It’s also got fairly good support for dropping down to raw SQL when you need to, while still encapsulating all of your query logic inside repository classes.
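As a hedged sketch of what that encapsulation can look like – the repository class, table names and query here are my own invention, assuming a Doctrine ORM setup with DBAL 2.11+ (where `fetchAllAssociative()` is available):

```php
<?php
// Illustrative only: keeping raw SQL inside a repository method, so the
// rest of the app never sees it. Assumes Doctrine ORM + DBAL >= 2.11.
use Doctrine\ORM\EntityRepository;

class ProductRepository extends EntityRepository
{
    /** @return list<array<string, mixed>> */
    public function findSalesSummaryForYear(int $year): array
    {
        // Drop down to DBAL for SQL the ORM can't express comfortably.
        $sql = 'SELECT p.sku, SUM(s.amount) AS total
                  FROM product p
                  JOIN sale s ON s.product_id = p.id
                 WHERE YEAR(s.sold_at) = :year
                 GROUP BY p.sku';

        return $this->getEntityManager()
            ->getConnection()
            ->fetchAllAssociative($sql, ['year' => $year]);
    }
}
```

Callers just see `findSalesSummaryForYear(2016)`; whether it's DQL or raw SQL underneath is the repository's business.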
Thus far in my work, I’ve always had the ability to change the schema, although sometimes that’s been discouraged due to the amount of old broken code that relied on it. There are a couple of deal-breakers that would make it next to impossible to use Doctrine if you had no control over your database.
One codebase I rewrote was built before nullable columns were a thing in MySQL (or before the original developer knew that nullable columns were a thing?). The foreign key columns were all non-nullable, with a 0 value to indicate no mapping. Doctrine is not very forgiving of “broken” joins and other data integrity issues, and every time it encountered one of these 0 values it spat the dummy.
Design with Business in Mind, Not Data
New features usually mean new data and new entities. The trouble is, as you work on your feature, your understanding of the domain changes, and so does your schema.
Our solution to this at my last job was to rely on raw SQL files for migrations rather than Doctrine’s built-in support. Need to add or remove a column? No sweat – just update your CREATE TABLE statement, blast away the table and recreate it. This did become a little bit brain-hurty if you’d made multiple changes to the schema of an existing table and needed to back them out, but we managed.
Marco has a more elegant alternative: build and test your code without the database first. Use serialisation to a text file if you need to save something. Once you’ve got everything functioning well, it’s time to add your mappings to your entity classes. This is a neat idea and would work well in a test-first environment with well-factored code and unit tests that actually test individual “units”. This is typically not anywhere I’ve worked, but I’ll keep the suggestion in mind next time I’m doing some greenfield development.
Plus Size Models
Entities are not typed arrays. Marco is very firm about this point – your entity should not just be a set of private fields with public getters and setters for each one. Think about what your application actually needs, and let the entity be a real object with behaviour to support that.
I personally don’t auto-generate getters and setters for every field in my entity, preferring to wait until I actually need to access that field. I also tend to write extra getters that transform the data into what the application actually needs – a getFullName() that concatenates the first name and last name, or a getTotalAmount() that adds up the value of all of the related entities.
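In plain PHP (Doctrine mapping omitted), getters like these might look as follows – the Invoice class and its fields are my own illustration, not from the talk:

```php
<?php
// A plain-PHP sketch of derived getters that answer the question the
// application actually asks, instead of exposing raw fields.
class Invoice
{
    /**
     * @param list<float> $lineItemAmounts amounts of the related line items
     */
    public function __construct(
        private string $firstName,
        private string $lastName,
        private array $lineItemAmounts = [],
    ) {}

    // Derived getter: callers get the concept they want...
    public function getCustomerFullName(): string
    {
        return trim($this->firstName . ' ' . $this->lastName);
    }

    // ...rather than iterating over the association themselves.
    public function getTotalAmount(): float
    {
        return array_sum($this->lineItemAmounts);
    }
}
```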
Marco’s example went further, with a User entity that handled the hashing of the password, and the authentication of the password against the hash. I have done this and it’s fine for most situations, but it does limit your implementation options. What if you want to use a different salt in your test and production environments? Taken too far, it can also turn your entities into plus sized models, or god classes as the pun-averse would call them.
Modern PHP does offer a solution to plus sized models, in the form of traits. If your model is getting a wee bit heavy (or your project’s coding conventions ban logic in models), extracting it out into a trait could give you the best of both worlds.
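For instance, the password behaviour from Marco’s User example could be extracted like this – the trait and method names are my own sketch, using PHP’s built-in password_hash()/password_verify():

```php
<?php
// Illustrative sketch: password behaviour pulled out of the entity into a
// trait, so the User class stays slim while keeping the behaviour nearby.
trait HandlesPassword
{
    private string $passwordHash = '';

    public function setPassword(string $plainText): void
    {
        // PASSWORD_DEFAULT currently means bcrypt with a per-hash salt.
        $this->passwordHash = password_hash($plainText, PASSWORD_DEFAULT);
    }

    public function authenticate(string $plainText): bool
    {
        return password_verify($plainText, $this->passwordHash);
    }
}

class User
{
    use HandlesPassword;

    public function __construct(private string $email) {}
}
```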
Avoid Chain Wrecks and Arrow Anti-Patterns
These occur when some business logic concerning your entity A retrieves its associated entity B, and then gets B’s associated entity C … Either your code finishes up in a gnarly one-liner that your debugger cringes from in fear, or you finish up with several layers of indentation as you guard against null values and iterate over collections.
I have a confession to make. I dislike maintaining code with many layers of indentation … but the day I discovered chaining in COMP102 was one of the happiest days of my life. For the next year almost every function I wrote consisted of a single line of nastiness extending off the edge of the screen. I keep this tendency on a tight leash on production codebases, and Marco’s reminder that chain wrecks usually indicate a violation of the Law of Demeter is very cogent.
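A tiny illustration of the difference (classes invented for the example; plain PHP, no Doctrine):

```php
<?php
// A chain wreck vs. a Law-of-Demeter-friendly accessor: Order answers the
// question itself instead of handing out its associations to dig through.
class Address
{
    public function __construct(private string $city) {}
    public function getCity(): string { return $this->city; }
}

class Customer
{
    public function __construct(private ?Address $address) {}
    public function getAddress(): ?Address { return $this->address; }
}

class Order
{
    public function __construct(private Customer $customer) {}

    // Chain-wreck style forces callers into
    // $order->getCustomer()->getAddress()->getCity(), null-guarding each hop.
    public function getCustomer(): Customer { return $this->customer; }

    // Demeter-friendly: one call, null handling in one place.
    public function getShippingCity(): string
    {
        return $this->customer->getAddress()?->getCity() ?? 'unknown';
    }
}
```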
Say No to Validation (in Your Entities)
Here’s an opinion that I find a bit harder to reconcile with my way of doing things. Marco argues that there should never be any need to validate an entity – it will be in a valid state when you create it, and it shouldn’t allow anybody to make it invalid in the first place. He suggests that when you have data that could be invalid, it needs to go into a separate object (e.g. a UserForm class containing the user-inputted data) and be validated there before it goes near the entity.
His logic? Allowing an entity to get into an invalid state means there is the potential for it to get written in an invalid state. Once you have data that is in an invalid state, you’ve got a mess to either clean up or tiptoe cautiously around until the end of time.
I still don’t like building a whole class just to represent the input from my user. It doesn’t feel very DRY to me in simpler cases, when the fields on the form are pretty much the same fields as the model has. I can see the use for more sophisticated setups though, or when the data coming from different sources (e.g. web versus API) may need to be handled differently. Maybe it’s time I embraced the code bloat.
Marco also suggested rolling your own DTOs for the form data rather than relying on the one that your framework provides. Yours will probably be less painful for your specific purpose than your framework’s form component, and as a bonus you’re reducing your coupling, so you’ll be able to switch to a new framework with less hassle later.
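A minimal sketch of the separate-object approach – the class names and validation rules here are my own invention, not from the talk. The point is that only the DTO is ever allowed to hold invalid data; the entity constructor never sees it:

```php
<?php
// Illustrative DTO: user input lives here and may be invalid.
final class RegisterUserForm
{
    public function __construct(
        public readonly string $email,
        public readonly string $displayName,
    ) {}

    /** @return list<string> validation messages; empty means valid */
    public function validate(): array
    {
        $errors = [];
        if (filter_var($this->email, FILTER_VALIDATE_EMAIL) === false) {
            $errors[] = 'Invalid email address';
        }
        if (trim($this->displayName) === '') {
            $errors[] = 'Display name is required';
        }
        return $errors;
    }
}

// The entity only ever receives validated data.
final class User
{
    public function __construct(
        private string $email,
        private string $displayName,
    ) {}

    public function getEmail(): string { return $this->email; }
}

// Validate the DTO first; only then construct the entity.
function registerUser(RegisterUserForm $form): User
{
    $errors = $form->validate();
    if ($errors !== []) {
        throw new InvalidArgumentException(implode('; ', $errors));
    }
    return new User($form->email, $form->displayName);
}
```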
Use UUIDs as Primary Keys
Marco presents a couple of reasons not to rely on auto-generated IDs. Firstly, if you are inserting multiple related entities in one hit, future versions of Doctrine won’t be able to execute these concurrently, so you won’t get the best performance out of your application. Secondly, it makes your code dependent on the data layer and the ORM to look after something which is, in his opinion, your responsibility – putting the entity into a valid state.
As an alternative, he suggests using a UUID as the primary key, stored as a 128-bit integer. Generate a new one in your entity’s constructor, and you’re good to go.
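As a sketch of assigning identity in the constructor – this hand-rolls a version 4 UUID from random_bytes() to stay dependency-free; in a real project you’d more likely reach for a library such as ramsey/uuid, plus a Doctrine custom type to store it compactly (the talk suggests 128 bits, not a 36-character string):

```php
<?php
// Illustrative only: generate an RFC 4122 version 4 UUID from random bytes.
function uuid4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    // 32 hex chars, grouped 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

class Product
{
    private string $id;

    public function __construct(private string $name)
    {
        // The entity is born with its identity; no waiting for the database.
        $this->id = uuid4();
    }

    public function getId(): string { return $this->id; }
}
```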
Marco goes as far as to suggest that pre-assigned IDs should be used instead of composite primary keys as well. This idea hurts my frugal brain a bit if I try to apply it to mapping tables, but I’ve definitely seen other situations where composite keys created more problems than they solved.
Use Doctrine’s Second-Level Cache
This is a new feature (described as experimental in the Doctrine 2.6 docs) that provides a read-through cache for immutable objects. It sounds very useful, but to make the most of it, you’ll need to think about it when you design your entities, and make as many things immutable as you can.
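As a configuration sketch (entity and fields invented; the @Cache annotation and READ_ONLY usage are per the Doctrine 2.x docs, and the second-level cache must also be enabled on the EntityManager configuration):

```php
<?php
use Doctrine\ORM\Mapping as ORM;

/**
 * An immutable lookup entity marked for the second-level cache.
 *
 * @ORM\Entity
 * @ORM\Cache(usage="READ_ONLY")
 */
class Country
{
    /** @ORM\Id @ORM\Column(type="guid") */
    private string $id;

    /** @ORM\Column(type="string") */
    private string $name;
}
```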
Yes, this advice does contradict an earlier point about designing for the business logic, not the database … that’s why best practices are best practices, and not laws that must be rigidly adhered to at all costs.
Give Soft-Deletes the Swerve
At my last job we had a lot of tables with soft deletes. In many cases they were used to indicate that something should no longer be visible in the sales subsystem (for example, a product that was no longer for sale) but would still be used for other purposes (e.g. to report on sales by product supplier). We had andWhere('e.deletedAt IS NULL') scattered through our repositories like confetti. Of course, we also had the odd bug where we forgot these.
Then some wiseguy decided to whack in the SoftDeleteable Doctrine extension. Our problem with data popping up where it shouldn’t was replaced by a different class of problems. SoftDeleteable causes Doctrine to spit the dummy every time it tries to traverse a relationship to an entity that has been deleted, and our code base had many, many places where we did this. We built a utility method to disable soft-delete, run a query, and return the result, and that became the standard workaround throughout the codebase.
After this experience I will never, never introduce the SoftDeleteable extension into a mature codebase again. The agony of random exceptions popping up all over the place for months afterwards was just not worth the minimal benefit that it brought us.
Marco provided three alternatives to soft-delete:
- Write the thing you’re about to delete to an audit log somewhere, and then go ahead and nuke it.
- Just nuke it, and restore from backup if you find out you actually needed it. Let’s hope you have your backup strategy in shipshape order first.
- Add a flag to your entity that indicates what soft-deleting really represents in your business logic (e.g. an archivedAt flag). This seems to me like a case of “a rose by any other name would smell as sweet” … it’s still a soft-delete flag, it’s just got a different name…
Pass IDs Instead of Entities
Marco likes this idea because it keeps the different parts of a system separate, and forces each part to flush its transaction before handing over to the next. I hate it because I’ve seen it in the wild, and it got very confusing, very quickly. Suddenly you have a method that takes four IDs, they aren’t named clearly, and you’re not sure whether $priceId is the primary key for a ProductPrice or an OrderPrice.
I’m also not convinced that multiple flush() calls in one execution is a great idea. When possible, I like to leave everything transient and then flush() once at the end of all of the business logic. This means that everything happens in a single transaction, and there is no chance that a failed web request can leave the database in an inconsistent state.
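A minimal sketch of that preference – assuming $em is a configured Doctrine EntityManager, with illustrative entity names:

```php
// Everything stays in memory until the single flush(), so all of the
// writes share one transaction.
$order = new Order(/* ... */);
$invoice = new Invoice(/* ... */);

$em->persist($order);
$em->persist($invoice);

// ... the rest of the business logic, still transient ...

$em->flush(); // one transaction: either every change lands, or none do
```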