Hash Comparison Design
Katachi's hash comparison is inspired by OpenAPI (formerly Swagger) specs.
Specifically, it's inspired by all the ways that I've consistently made goofy mistakes when writing them.
Here's the story of how they led to the design of Katachi's hash comparison:
3 Different Versions Of Nullable
OpenAPI has handled null values a few different ways over the years.
- OpenAPI 2.0 (Swagger) didn't support null values at all, so people used x-nullable: true
- OpenAPI 3.0 made this official by supporting nullable: true
- OpenAPI 3.1 found a much simpler way by treating null as a type: type: ["string", "null"]
I like the 3.1 approach of treating null as just another possible type.
I decided to take it further with Ruby's tools for inspecting types.
We don't need the type description for fields -- Ruby can just tell us what type it is!
We can just literally use nil as a possible value!
All it takes is supporting a hash value being multiple types (e.g. nil or String).
That led to the creation of Katachi::AnyOf.
shape = { email: AnyOf[String, nil] }
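To make the idea concrete, here is a minimal sketch of how such a wrapper could work, assuming it simply tries each option with Ruby's === operator. SketchAnyOf and its match? method are hypothetical names for illustration; this is not Katachi's actual implementation.

```ruby
# Hypothetical stand-in for Katachi::AnyOf; illustration only.
class SketchAnyOf
  attr_reader :options

  # Allows the SketchAnyOf[String, nil] bracket syntax.
  def self.[](*options)
    new(options)
  end

  def initialize(options)
    @options = options
  end

  # Class#=== checks the value's type, while nil === value
  # is only true when the value is nil itself.
  def match?(value)
    options.any? { |option| option === value }
  end
end

email = SketchAnyOf[String, nil]
email.match?("jane@example.com") # => true
email.match?(nil)                # => true
email.match?(42)                 # => false
```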
OpenAPI Keys Are Optional By Default
In the following description, if a field is not explicitly REQUIRED or described with a MUST or SHALL, it can be considered OPTIONAL.
OpenAPI's decision to make all object keys optional by default has caught me repeatedly.
"What do you mean the API response is empty?!? I tested it against the spec!"
- Me, multiple times
I wanted to prevent people from falling into that trap, so Katachi has all the keys required by default. The comparison logic would be a simple set difference:
missing_keys = shape.keys - value.keys
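For illustration, here is that set difference on a small hash. The field names are made up for the example.

```ruby
shape = { email: String, first_name: String }
value = { email: "jane@example.com" }

# Array#- keeps the shape keys that never show up in the value.
missing_keys = shape.keys - value.keys
# => [:first_name]
```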
OpenAPI Extra Keys Are Allowed By Default
Additional properties are allowed by default in OpenAPI. To enforce maximum strictness use additionalProperties: false to block all arbitrary data.
On the flip side, OpenAPI's decision to allow extra keys in an object by default has also caught me multiple times.
"Why is the API response so big?!? It's nowhere near that bloated in the spec!"
- Me, multiple times
Again, my chosen solution is to disallow extra keys by default. The comparison logic would be a simple set difference:
extra_keys = value.keys - shape.keys
... Right? (cue foreboding music)
Sane Defaults, But Inflexible
With those decisions, the core design is starting to take shape:
- All keys are required
- No extra keys are allowed
- nil is just another possible value; no special syntax needed
That's a good set of defaults, but it's not flexible enough for most use cases.
- Keys can be optional sometimes.
- Extra keys can be allowed sometimes.
- Sometimes you only want to test a few keys.
I needed to add a way to make keys optional and a way to allow extra keys.
Allowing Optional Keys
I wanted users to not have to look up a special syntax or use a proprietary class for when they want a hash key to be optional.
Borrowing from OpenAPI 3.1's handling of null, I added a special value :$undefined to indicate that a key can be missing without the object being invalid.
It's really convenient for users, but it comes with a new issue. We can no longer blindly assume that every key in the shape is required.
missing_keys = shape.keys - value.keys
+ missing_keys -= optional_keys()
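One way optional_keys could be computed, as a sketch: treat a key as optional when its shape value is :$undefined itself or a set of alternatives that includes it. The helper's signature and the Alternatives struct (a stand-in for AnyOf so the example runs on its own) are assumptions for illustration.

```ruby
# Hypothetical stand-in for AnyOf so the sketch is self-contained.
Alternatives = Struct.new(:options)

# Hypothetical helper: shape keys whose value allows :$undefined.
def optional_keys(shape)
  shape.select do |_key, expected|
    expected == :$undefined ||
      (expected.is_a?(Alternatives) && expected.options.include?(:$undefined))
  end.keys
end

shape = { email: String, nickname: Alternatives.new([String, :$undefined]) }
value = { email: "jane@example.com" }

missing_keys = shape.keys - value.keys # => [:nickname]
missing_keys -= optional_keys(shape)   # => []
```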
Allowing Extra Keys
Again, I wanted to make this easy for users without having to look up a special syntax. I eventually stumbled upon the idea of letting users add Object => Object to match any key-value pair.
e.g. Checking just the email
compare(
value: User.last.attributes,
shape: {
"email" => request.params[:email],
Object => Object,
},
)
It looks a bit weird to have Object as a hash key, but it's perfectly valid Ruby.
extra_keys = value.keys - shape.keys
+ extra_keys -= matching_keys()
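A sketch of what matching_keys could look like, assuming a shape key counts as a general matcher when it is a class or module and that matching uses ===. Both the helper's signature and that rule are assumptions, not Katachi's documented behavior.

```ruby
# Hypothetical helper: value keys that are covered by some shape key,
# either literally or through a class-like matcher key.
def matching_keys(shape, value)
  matchers = shape.keys.grep(Module) # Object, Symbol, String, ...
  value.keys.select { |key| matchers.any? { |matcher| matcher === key } }
end

shape = { "email" => String, Object => Object }
value = { "email" => "jane@example.com", "id" => 7, "admin" => false }

extra_keys = value.keys - shape.keys      # => ["id", "admin"]
extra_keys -= matching_keys(shape, value) # => []
```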
Matching Priority
The problem with Object => Object is that it will match literally any key-value pair.
That makes it impossible for the hash comparison to not find a valid match.
So I had to put in a way for specific key matches (e.g. email) to take priority over more general matches. That led to a whole branch of code for checking for exact key matches between the shape and the value.
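The priority rule can be sketched as a two-step lookup: try the exact key first, and only fall back to general matchers when there is none. shape_value_for is a hypothetical helper, and treating class keys as the general matchers is an assumption.

```ruby
# Hypothetical helper: find the shape value that applies to a value key.
def shape_value_for(shape, key)
  return shape[key] if shape.key?(key) # an exact key match always wins

  # Otherwise use the first class-like shape key that accepts this key.
  _matcher, expected = shape.find { |skey, _| skey.is_a?(Module) && skey === key }
  expected
end

shape = { "email" => String, Object => Object }
shape_value_for(shape, "email") # => String
shape_value_for(shape, "id")    # => Object
```

Without the exact-match step, "email" could just as easily be paired with Object, and a wrong value would slip through.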
Non-Required Keys
Another problem with using Object => Object for extra keys is that it means that a key defined in the shape isn't necessarily required in the value.
If the comparison threw a :hash_mismatch when the user's hash didn't literally have a key-value pair Object => Object, that'd ruin that whole feature.
The lazy solution was to just ignore Object => Object, but what if users want to be a bit stricter about their extra keys?
- Symbol => String is a normal data structure to enforce.
- :$email => User is an excellent description for a lookup hash.
We need to figure out a way to distinguish shape keys that are required from ones that are more general matching rules.
missing_keys = shape.keys - value.keys
missing_keys -= optional_keys()
+ missing_keys -= matcher_keys()
To keep things consistent, the solution ended up being to use the same compare algorithm on the hash keys as we do on any other value.
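As a rough sketch of that idea, the snippet below uses one comparison for both keys and values and treats class or module keys as the general matching rules. sketch_match? and this version of matcher_keys are hypothetical, and Katachi's own shape symbols such as :$email are left out.

```ruby
# Hypothetical comparison shared by keys and values.
def sketch_match?(shape, value)
  shape == value || shape === value
end

# Assumption: class and module keys are the general matching rules.
def matcher_keys(shape)
  shape.keys.grep(Module)
end

shape = { id: Integer, Symbol => String }
value = { id: 1, name: "Jane" }

missing_keys = shape.keys - value.keys # => [Symbol]
missing_keys -= matcher_keys(shape)    # => []

# The same check works on keys and on values.
sketch_match?(Symbol, :name)  # => true
sketch_match?(String, "Jane") # => true
sketch_match?(:id, :name)     # => false
```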
Diagnostic Labels
All of these changes made the comparison logic much more complex than I had anticipated. What really brought it to a whole new level of complexity was the need to provide diagnostic labels for each comparison. Telling users "your hash isn't a match and we're not telling you why" is a frustrating user experience.
It needs to report:
- Which keys were missing
- Which keys were extra
- Which values didn't match
That's too much information to stuff into a flat return value - it needs to be a nested structure where each comparison reports all the factors that led to the match or mismatch.
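As a rough picture of what such a nested structure could look like, here is a sketch built on a Struct. The SketchResult name, its fields, and every label except :hash_mismatch are hypothetical; the real Katachi::Result may look quite different.

```ruby
# Hypothetical result node: a label, a pass/fail flag, and child results.
SketchResult = Struct.new(:code, :match, :children, keyword_init: true) do
  # Walk the tree and collect the path and label of every failing leaf.
  def failures(path = [])
    return (match ? [] : [[path, code]]) if children.nil? || children.empty?

    children.flat_map { |name, child| child.failures(path + [name]) }
  end
end

result = SketchResult.new(
  code: :hash_mismatch, match: false,
  children: {
    missing_keys: SketchResult.new(
      code: :hash_mismatch, match: false,
      children: {
        email: SketchResult.new(code: :exact_match, match: true),
        first_name: SketchResult.new(code: :missing_key, match: false),
      },
    ),
    extra_keys: SketchResult.new(code: :match, match: true, children: {}),
  },
)

result.failures
# => [[[:missing_keys, :first_name], :missing_key]]
```

Because every level keeps its own label, a caller can report the top-level verdict or drill down to the exact key that failed.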
The Final Design
That all combines into the general flow of hash comparison in Katachi:
Definitions:
VHash: Value Hash
SHash: Shape Hash
VKey: Value Key
SKey: Shape Key
VValue: Value Value
SValue: Shape Value
Katachi::Result: Did the VHash match the SHash?
  missing_keys: Are all keys in the shape present in the value?
    {each SKey comparisons}:
      - Determine if the SKey is required or optional.
        - Is the SKey a general matching rule?
          - Yes: Consider it optional.
          - No: It's a specific key. Does the corresponding SValue contain :$undefined?
            - Yes: SKey is optional.
            - No: SKey is required.
      - Check if the SKey is present in the VHash.
        - Identical: label as exact match.
        - Match Any: label as match.
        - Key not required: label as optional.
        - Else: label as missing key.
  extra_keys: Are there any VKeys that aren't in the SHash?
    {each VKey comparisons}:
      - Is the VKey exactly in the SHash?
        - Yes: label it as an exact match.
        - No: Does it match any SKey matchers?
          - Compare each SKey matcher to the VKey.
            - Yes: label that comparison as a general match.
            - No: label that comparison as a mismatch.
          - Did any of them match?
            - Yes: label it as a match.
            - No: label it as an extra key.
  values: Do the VValues match the corresponding SValues in the shape?
    {each VKey comparisons}:
      - Is the VKey exactly in the SHash?
        - Yes: Compare the corresponding VValue to the SValue.
          - Identical: label VValue as an exact match.
          - Match: label VValue as a match.
          - No Match: label VValue as a mismatch.
        - No: Does the VKey match any SKey matching rules?
          - Yes: Compare the corresponding VValue to the SValue.
            - Identical: label as exact match.
            - Match: label as match.
            - No Match: label as mismatch.
          - No: label as mismatch.
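The flow above can be sketched in plain Ruby. This is a simplified illustration under loudly stated assumptions, not Katachi's code: class or module keys are taken to be the general matching rules, a plain Array of alternatives stands in for AnyOf, and compare_hash, label_for, options_for, and every label name are hypothetical.

```ruby
# Sketch only. Assumptions: class/module keys are the general matching rules,
# and a plain Array of alternatives stands in for AnyOf.
def options_for(svalue)
  svalue.is_a?(Array) ? svalue : [svalue]
end

# Label one value against one shape: identical, general match, or mismatch.
def label_for(svalue, vvalue)
  options = options_for(svalue)
  return :exact_match if options.include?(vvalue)

  options.any? { |option| option === vvalue } ? :match : :mismatch
end

def compare_hash(value:, shape:)
  matchers = shape.keys.grep(Module)

  # missing_keys: one label per shape key
  missing = shape.to_h do |skey, svalue|
    label =
      if value.key?(skey) then :exact_match
      elsif matchers.include?(skey) && value.keys.any? { |vkey| skey === vkey } then :match
      elsif matchers.include?(skey) || options_for(svalue).include?(:$undefined) then :optional
      else :missing_key
      end
    [skey, label]
  end

  # extra_keys: one label per value key
  extra = value.keys.to_h do |vkey|
    label =
      if shape.key?(vkey) then :exact_match
      elsif matchers.any? { |skey| skey === vkey } then :match
      else :extra_key
      end
    [vkey, label]
  end

  # values: exact keys take priority over general matching rules
  values = value.to_h do |vkey, vvalue|
    if shape.key?(vkey)
      [vkey, label_for(shape[vkey], vvalue)]
    else
      rules = matchers.select { |skey| skey === vkey }
      labels = rules.map { |skey| label_for(shape[skey], vvalue) }
      [vkey, (labels - [:mismatch]).first || :mismatch]
    end
  end

  failed = missing.value?(:missing_key) || extra.value?(:extra_key) || values.value?(:mismatch)
  { match: !failed, missing_keys: missing, extra_keys: extra, values: values }
end

shape = { email: String, nickname: [String, :$undefined], Symbol => Object }
result = compare_hash(value: { email: "jane@example.com", age: 30 }, shape: shape)
result[:match]        # => true
result[:missing_keys] # => { email: :exact_match, nickname: :optional, Symbol => :match }
```

The sketch returns flat label hashes per phase to stay short; it requires Ruby 2.6 or newer for to_h with a block.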
Conclusion
Yeah...
It was rough to code...
But it makes for an awesome user experience :)
shape = {
:$uuid => {
email: :$email,
first_name: String,
last_name: String,
preferred_name: AnyOf[String, nil],
admin_only_information: AnyOf[{Symbol => String}, :$undefined],
Symbol => Object,
},
}
expect(value: api_response.body, shape:).to be_match