As a devoted Doctrine whisperer, I was excited when Marco Pivetta’s Doctrine ORM Good Practices and Tricks talk (from the 2016 PHP UK Conference) popped up in my YouTube suggestions.
I’ve been using Doctrine for five years now, and I have my own list of dos (do build custom repository methods that get your entity with fetch joins to its commonly-used associations) and don’ts (don’t ever set up single-table inheritance using part of the primary key as the discriminator column). It’s always interesting to hear a new spin on things though, especially when it comes from someone with inside knowledge.
Using Doctrine with Fixed Databases
Marco’s first suggestion is that Doctrine is not the best tool if you don’t have any control over your database’s schema. I’ve done some pretty gnarly rewriting of legacy apps to use Doctrine, and I only partly agree with this.
Compared to alternatives that heavily favour “convention over configuration” (such as Laravel’s Eloquent), Doctrine’s more imperative style of entity mapping means it’s easy to make model classes with interfaces that reflect your current understanding of the domain, while persisting to tables that are due for an overhaul. It’s also got fairly good support for dropping down to raw SQL when you need to, while still encapsulating all of your query logic inside repository classes.
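As a hedged sketch of what that encapsulation can look like – the repository class, table names and query here are my own invention, assuming a Doctrine ORM setup with DBAL 2.11+ (where `fetchAllAssociative()` is available):

```php
<?php
// Illustrative only: keeping raw SQL inside a repository method, so the
// rest of the app never sees it. Assumes Doctrine ORM + DBAL >= 2.11.
use Doctrine\ORM\EntityRepository;

class ProductRepository extends EntityRepository
{
    /** @return list<array<string, mixed>> */
    public function findSalesSummaryForYear(int $year): array
    {
        // Drop down to DBAL for SQL the ORM can't express comfortably.
        $sql = 'SELECT p.sku, SUM(s.amount) AS total
                  FROM product p
                  JOIN sale s ON s.product_id = p.id
                 WHERE YEAR(s.sold_at) = :year
                 GROUP BY p.sku';

        return $this->getEntityManager()
            ->getConnection()
            ->fetchAllAssociative($sql, ['year' => $year]);
    }
}
```

Callers just see `findSalesSummaryForYear(2016)`; whether it's DQL or raw SQL underneath is the repository's business.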
Thus far in my work, I’ve always had the ability to change the schema, although sometimes that’s been discouraged due to the amount of old broken code that relied on it. There are a couple of deal-breakers that would make it next to impossible to use Doctrine if you had no control over your database.
One codebase I rewrote was built before nullable columns were a thing in MySQL (or before the original developer knew that nullable columns were a thing?). The foreign key columns were all non-nullable, with a 0 value to indicate no mapping. Doctrine is not very forgiving of “broken” joins and other data integrity issues, and every time it encountered one of these 0 values it spat the dummy.
Design with Business in Mind, Not Data
New features usually mean new data and new entities. The trouble is, as you work on your feature, your understanding of the domain changes, and so does your schema.
Our solution to this at my last job was to rely on raw SQL files for migrations rather than Doctrine’s built-in support. Need to add or remove a column? No sweat – just update your CREATE TABLE statement, blast away the table and recreate it. This did become a little bit brain-hurty if you’d made multiple changes to the schema of an existing table and needed to back them out, but we managed.
Marco has a more elegant alternative: build and test your code without the database first. Use serialisation to a text file if you need to save something. Once you’ve got everything functioning well, it’s time to add your mappings to your entity classes. This is a neat idea and would work well in a test-first environment with well-factored code and unit tests that actually test individual “units”. This is typically not anywhere I’ve worked, but I’ll keep the suggestion in mind next time I’m doing some greenfield development.
Plus Size Models
Entities are not typed arrays. Marco is very firm about this point – your entity should not just be a set of private fields with public getters and setters for each one. Think about what your application actually needs, and let the entity be a real object with behaviour to support that.
I personally don’t auto-generate getters and setters for every field in my entity, preferring to wait until I actually need to access that field. I also tend to write extra getters that transform the data into what the application actually needs – a getFullName() that concatenates the first name and last name, or a getTotalAmount() that adds up the value of all of the related entities.
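In plain PHP (Doctrine mapping omitted), getters like these might look as follows – the Invoice class and its fields are my own illustration, not from the talk:

```php
<?php
// A plain-PHP sketch of derived getters that answer the question the
// application actually asks, instead of exposing raw fields.
class Invoice
{
    /**
     * @param list<float> $lineItemAmounts amounts of the related line items
     */
    public function __construct(
        private string $firstName,
        private string $lastName,
        private array $lineItemAmounts = [],
    ) {}

    // Derived getter: callers get the concept they want...
    public function getCustomerFullName(): string
    {
        return trim($this->firstName . ' ' . $this->lastName);
    }

    // ...rather than iterating over the association themselves.
    public function getTotalAmount(): float
    {
        return array_sum($this->lineItemAmounts);
    }
}
```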
Marco’s example went further, with a User entity that handled the hashing of the password, and the authentication of the password against the hash. I have done this and it’s fine for most situations, but it does limit your implementation options. What if you want to use a different salt in your test and production environments? Taken too far, it can also turn your entities into plus sized models, or god classes as the pun-averse would call them.
Modern PHP does offer a solution to plus sized models, in the form of traits. If your model is getting a wee bit heavy (or your project’s coding conventions ban logic in models), extracting it out into a trait could give you the best of both worlds.
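For instance, the password behaviour from Marco’s User example could be extracted like this – the trait and method names are my own sketch, using PHP’s built-in password_hash()/password_verify():

```php
<?php
// Illustrative sketch: password behaviour pulled out of the entity into a
// trait, so the User class stays slim while keeping the behaviour nearby.
trait HandlesPassword
{
    private string $passwordHash = '';

    public function setPassword(string $plainText): void
    {
        // PASSWORD_DEFAULT currently means bcrypt with a per-hash salt.
        $this->passwordHash = password_hash($plainText, PASSWORD_DEFAULT);
    }

    public function authenticate(string $plainText): bool
    {
        return password_verify($plainText, $this->passwordHash);
    }
}

class User
{
    use HandlesPassword;

    public function __construct(private string $email) {}
}
```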
Avoid Chain Wrecks and Arrow Anti-Patterns
These occur when some business logic concerning your entity A retrieves its associated entity B, and then gets B’s associated entity C … Either your code finishes up in a gnarly one-liner that your debugger cringes from in fear, or you finish up with several layers of indentation as you guard against null values and iterate over collections.
I have a confession to make. I dislike maintaining code with many layers of indentation … but the day I discovered chaining in COMP102 was one of the happiest days of my life. For the next year almost every function I wrote consisted of a single line of nastiness extending off the edge of the screen. I keep this tendency on a tight leash on production codebases, and Marco’s reminder that chain wrecks usually indicate a violation of the Law of Demeter is very cogent.
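A tiny illustration of the difference (classes invented for the example; plain PHP, no Doctrine):

```php
<?php
// A chain wreck vs. a Law-of-Demeter-friendly accessor: Order answers the
// question itself instead of handing out its associations to dig through.
class Address
{
    public function __construct(private string $city) {}
    public function getCity(): string { return $this->city; }
}

class Customer
{
    public function __construct(private ?Address $address) {}
    public function getAddress(): ?Address { return $this->address; }
}

class Order
{
    public function __construct(private Customer $customer) {}

    // Chain-wreck style forces callers into
    // $order->getCustomer()->getAddress()->getCity(), null-guarding each hop.
    public function getCustomer(): Customer { return $this->customer; }

    // Demeter-friendly: one call, null handling in one place.
    public function getShippingCity(): string
    {
        return $this->customer->getAddress()?->getCity() ?? 'unknown';
    }
}
```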
Say No to Validation (in Your Entities)
Here’s an opinion that I find a bit harder to reconcile with my way of doing things. Marco argues that there should never be any need to validate an entity – it will be in a valid state when you create it, and it shouldn’t allow anybody to make it invalid in the first place. He suggests that when you have data that could be invalid, it needs to go into a separate object (e.g. a UserForm class containing the user-inputted data) and be validated there before it goes near the entity.
His logic? Allowing an entity to get into an invalid state means there is the potential for it to get written in an invalid state. Once you have data that is in an invalid state, you’ve got a mess to either clean up or tiptoe cautiously around until the end of time.
I still don’t like building a whole class just to represent the input from my user. It doesn’t feel very DRY to me in simpler cases, when the fields on the form are pretty much the same fields as the model has. I can see the use for more sophisticated setups though, or when the data coming from different sources (e.g. web versus API) may need to be handled differently. Maybe it’s time I embraced the code bloat.
Marco also suggested rolling your own DTOs for the form data rather than relying on the one that your framework provides. Yours will probably be less painful for your specific purpose than your framework’s form component, and as a bonus you’re reducing your coupling, so you’ll be able to switch to a new framework with less hassle later.
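A minimal sketch of the separate-object approach – the class names and validation rules here are my own invention, not from the talk. The point is that only the DTO is ever allowed to hold invalid data; the entity constructor never sees it:

```php
<?php
// Illustrative DTO: user input lives here and may be invalid.
final class RegisterUserForm
{
    public function __construct(
        public readonly string $email,
        public readonly string $displayName,
    ) {}

    /** @return list<string> validation messages; empty means valid */
    public function validate(): array
    {
        $errors = [];
        if (filter_var($this->email, FILTER_VALIDATE_EMAIL) === false) {
            $errors[] = 'Invalid email address';
        }
        if (trim($this->displayName) === '') {
            $errors[] = 'Display name is required';
        }
        return $errors;
    }
}

// The entity only ever receives validated data.
final class User
{
    public function __construct(
        private string $email,
        private string $displayName,
    ) {}

    public function getEmail(): string { return $this->email; }
}

// Validate the DTO first; only then construct the entity.
function registerUser(RegisterUserForm $form): User
{
    $errors = $form->validate();
    if ($errors !== []) {
        throw new InvalidArgumentException(implode('; ', $errors));
    }
    return new User($form->email, $form->displayName);
}
```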
Use UUIDs as Primary Keys
Marco presents a couple of reasons not to rely on auto-generated IDs. Firstly, if you are inserting multiple related entities in one hit, future versions of Doctrine won’t be able to execute these concurrently, so you won’t get the best performance out of your application. Secondly, it makes your code dependent on the data layer and the ORM to look after something which is, in his opinion, your responsibility – putting the entity into a valid state.
As an alternative, he suggests using a UUID as the primary key, stored as a 128-bit integer. Generate a new one in your entity’s constructor, and you’re good to go.
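As a sketch of assigning identity in the constructor – this hand-rolls a version 4 UUID from random_bytes() to stay dependency-free; in a real project you’d more likely reach for a library such as ramsey/uuid, plus a Doctrine custom type to store it compactly (the talk suggests 128 bits, not a 36-character string):

```php
<?php
// Illustrative only: generate an RFC 4122 version 4 UUID from random bytes.
function uuid4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    // 32 hex chars, grouped 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

class Product
{
    private string $id;

    public function __construct(private string $name)
    {
        // The entity is born with its identity; no waiting for the database.
        $this->id = uuid4();
    }

    public function getId(): string { return $this->id; }
}
```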
Marco goes as far as to suggest that pre-assigned IDs should be used instead of composite primary keys as well. This idea hurts my frugal brain a bit if I try to apply it to mapping tables, but I’ve definitely seen other situations where composite keys created more problems than they solved.
Use Doctrine’s Second-Level Cache
This is a new feature (described as experimental in the Doctrine 2.6 docs) that provides a read-through cache for immutable objects. It sounds very useful, but to make the most of it, you’ll need to think about it when you design your entities, and make as many things immutable as you can.
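As a configuration sketch (entity and fields invented; the @Cache annotation and READ_ONLY usage are per the Doctrine 2.x docs, and the second-level cache must also be enabled on the EntityManager configuration):

```php
<?php
use Doctrine\ORM\Mapping as ORM;

/**
 * An immutable lookup entity marked for the second-level cache.
 *
 * @ORM\Entity
 * @ORM\Cache(usage="READ_ONLY")
 */
class Country
{
    /** @ORM\Id @ORM\Column(type="guid") */
    private string $id;

    /** @ORM\Column(type="string") */
    private string $name;
}
```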
Yes, this advice does contradict an earlier point about designing for the business logic, not the database … that’s why best practices are best practices, and not laws that must be rigidly adhered to at all costs.
Give Soft-Deletes the Swerve
At my last job we had a lot of tables with soft deletes. In many cases they were used to indicate that something should no longer be visible in the sales subsystem (for example, a product that was no longer for sale) but would still be used for other purposes (e.g. to report on sales by product supplier). We had andWhere('e.deletedAt IS NULL') scattered through our repositories like confetti. Of course, we also had the odd bug where we forgot these.
Then some wiseguy decided to whack in the SoftDeleteable Doctrine extension. Our problem with data popping up where it shouldn’t was replaced by a different class of problems. SoftDeleteable causes Doctrine to spit the dummy every time it tries to traverse a relationship to an entity that has been deleted, and our code base had many, many places where we did this. We built a utility method to disable soft-delete, run a query, and return the result, and that became the standard workaround throughout the codebase.
After this experience I will never, never introduce the SoftDeleteable extension into a mature codebase again. The agony of random exceptions popping up all over the place for months afterwards was just not worth the minimal benefit that it brought us.
Marco provided three alternatives to soft-delete:
- Write the thing you’re about to delete to an audit log somewhere, and then go ahead and nuke it.
- Just nuke it, and restore from backup if you find out you actually needed it. Let’s hope you have your backup strategy in shipshape order first.
- Add a flag to your entity that indicates what soft-deleting really represents in your business logic (e.g. an archivedAt flag). This seems to me like a case of “a rose by any other name would smell as sweet” … it’s still a soft-delete flag, it’s just got a different name…
Pass IDs Instead of Entities
Marco likes this idea because it keeps the different parts of a system separate, and forces each part to flush its transaction before handing over to the next. I hate it because I’ve seen it in the wild, and it got very confusing, very quickly. Suddenly you have a method that takes four IDs, they aren’t named clearly, and you’re not sure whether $priceId is the primary key for a ProductPrice or an OrderPrice.
I’m also not convinced that multiple flush() calls in one execution is a great idea. When possible, I like to leave everything transient and then flush() once at the end of all of the business logic. This means that everything happens in a single transaction, and there is no chance that a failed web request can leave the database in an inconsistent state.
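A minimal sketch of that preference – assuming $em is a configured Doctrine EntityManager, with illustrative entity names:

```php
// Everything stays in memory until the single flush(), so all of the
// writes share one transaction.
$order = new Order(/* ... */);
$invoice = new Invoice(/* ... */);

$em->persist($order);
$em->persist($invoice);

// ... the rest of the business logic, still transient ...

$em->flush(); // one transaction: either every change lands, or none do
```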