YAML is a Weird Beast

“YAML is a weird beast.”

Someone’s comment on a PR months ago triggered my curiosity.
I use YAML daily for configuration files: OpenAPI specs, Helm charts, Symfony configs, Spring Boot properties… yet I’d never questioned its inner workings. How complex could a configuration format really be?

Instead of just accepting it, I decided to dive deep and understand why YAML earned this reputation.
So I did what any curious developer would do: I implemented a YAML parser from scratch in PHP.

Is my parser 100% spec-compliant? No.
Was that the goal? Also no.

The goal was understanding. Learning what’s actually possible with YAML beyond basic key-value pairs. And what I discovered was fascinating.

Beyond Simple Configs: YAML’s Hidden Features

Tags: Extending YAML’s Type System

Did you know YAML supports tags?
Beyond predefined tags like !!str, !!int, !!map, and !!seq, you can define custom tags to represent specialized types.
Want to load environment variables? Include external files? Represent custom objects?
Tags make it possible.

database:
  host: !env DATABASE_HOST
  port: !env DATABASE_PORT
  credentials: !include credentials.yaml
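Tags like !env and !include are not built into YAML; they only do something if the application registers a handler for them. Here is a minimal sketch of that idea in Python, assuming the parser has already produced a tree in which tagged scalars are wrapped in a marker object. The names Tagged, TAG_HANDLERS, env_handler, and resolve are hypothetical and are not taken from any real library or from my parser.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tagged:
    """A scalar together with the custom tag written in front of it."""
    tag: str
    value: str


def env_handler(value: str) -> str:
    # Assumed "!env" semantics: look the name up in the process environment.
    return os.environ.get(value, "")


# Registry of application-defined tags (hypothetical; "!include" is left unregistered).
TAG_HANDLERS: Dict[str, Callable[[str], Any]] = {"!env": env_handler}


def resolve(node: Any) -> Any:
    """Walk a parsed tree and replace every Tagged scalar with its handler's result."""
    if isinstance(node, Tagged):
        handler = TAG_HANDLERS.get(node.tag)
        if handler is None:
            raise ValueError(f"no handler registered for tag {node.tag}")
        return handler(node.value)
    if isinstance(node, dict):
        return {key: resolve(value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item) for item in node]
    return node
```

An unknown tag raises an error here instead of silently passing the raw string through. That is a design choice for this sketch, not something the YAML spec requires.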

Anchors and Aliases: DRY Configuration

YAML lets you define reusable content with anchors (&) and reference it later with aliases (*).
This works for simple values:

First occurrence: &anchor Value
Second occurrence: *anchor

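And even complex structures with circular references. An alias can only point to an anchor that appears earlier in the document, so the cycle has to close inside the anchored node:

person: &person
  name: John
  spouse:
    name: Jane
    spouse: *person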
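To see why circular references matter to whoever consumes the data, here is a small Python sketch of the object graph a loader could build for two people who reference each other. An alias becomes a second reference to the same object, not a copy. The helper is_cyclic is a hypothetical name used only for illustration.

```python
import json

# Two mappings that point at each other, as a loader might build them.
person = {"name": "John"}
spouse = {"name": "Jane", "spouse": person}  # alias: the same object, not a copy
person["spouse"] = spouse                    # closes the cycle


def is_cyclic(node, ancestors=()):
    """Return True if following dict/list children ever leads back to an ancestor."""
    if not isinstance(node, (dict, list)):
        return False
    if any(node is seen for seen in ancestors):
        return True
    children = node.values() if isinstance(node, dict) else node
    return any(is_cyclic(child, ancestors + (node,)) for child in children)


# A naive recursive consumer cannot handle the cycle; json.dumps refuses it.
try:
    json.dumps(person)
except ValueError as error:
    print("cannot serialize:", error)
```

The check only tracks ancestors, so a node that is merely shared between two branches (an ordinary alias) is not reported as a cycle.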

The Merge Key: Inheritance in YAML

The merge key << isn’t part of the YAML 1.2 spec (it comes from the YAML 1.1 type repository), but it’s widely supported and very useful for creating configuration variants:

default: &default
  host: localhost
  port: 3306

development:
  <<: *default
  database: dev_db

production:
  <<: *default
  database: prod_db

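It even supports merging multiple aliases with overrides. When several maps are merged, earlier ones in the list take precedence, and keys written directly in the mapping override anything merged in: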

---
- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: 10 }

- # Merge one map
  <<: *CENTER
  r: 10
  label: center

- # Merge multiple maps
  <<: [ *CENTER, *BIG ]
  label: center/big

- # Merge multiple maps: x comes from *CENTER, which is listed first
  <<: [ *CENTER, *LEFT ]
  r: 4
  label: center/left

- # Override: the explicit x wins; r stays 10 because *BIG is listed before { r: 1 }
  <<: [ *BIG, *LEFT, { r: 1 } ]
  x: 2
  label: center/left/small
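The precedence rules are easier to check in code than by eye. Below is a minimal Python sketch of merge-key resolution on plain dictionaries, assuming the aliases have already been replaced by the mappings they point to. The function name apply_merge is hypothetical, and real parsers may differ in details such as key order or error handling.

```python
def apply_merge(mapping: dict) -> dict:
    """Resolve a "<<" entry: explicit keys win, then merged maps in list order."""
    # Keys written directly in the mapping always take precedence.
    result = {key: value for key, value in mapping.items() if key != "<<"}
    sources = mapping.get("<<", [])
    if isinstance(sources, dict):  # "<<: *one" is shorthand for "<<: [ *one ]"
        sources = [sources]
    for source in sources:
        # A merged map may itself contain "<<", so resolve it first.
        for key, value in apply_merge(source).items():
            result.setdefault(key, value)  # earlier sources are never overwritten
    return result


CENTER = {"x": 1, "y": 2}
LEFT = {"x": 0, "y": 2}
BIG = {"r": 10}

override = apply_merge({"<<": [BIG, LEFT, {"r": 1}], "x": 2, "label": "center/left/small"})
print(override["r"], override["x"], override["y"])  # prints: 10 2 2
```

Note that the last entry ends up with r: 10, not r: 1, because { r: 1 } comes after *BIG in the list.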

Multi-Document Files

A single YAML file can contain multiple documents separated by ---:

---
name: Document 1
value: 123
---
name: Document 2
value: 456

You can also use the document end marker ... to explicitly terminate a document. That explicit end is what allows the next document to declare its own directives:

%YAML 1.2
---
name: Document 1
value: 123
...
%YAML 1.1
%TAG !mytag! tag:example.com,2000:app/
---
name: Document 2
value: !mytag!customValue value
...
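Splitting a stream into documents can be done line by line, because a line that starts with --- or ... followed by whitespace or the end of the line is reserved for markers. The Python sketch below is a simplification under that assumption: it collects % directives and the raw body text for each document and does not parse the content. The names split_documents and _is_marker are hypothetical.

```python
from typing import List, Tuple


def _is_marker(line: str, marker: str) -> bool:
    # A marker sits at column 0 and is followed by whitespace or the end of the line.
    return line.startswith(marker) and (len(line) == 3 or line[3] in " \t")


def split_documents(stream: str) -> List[Tuple[List[str], str]]:
    """Split a YAML stream into (directives, body) pairs, one per document."""
    documents: List[Tuple[List[str], str]] = []
    directives: List[str] = []
    body: List[str] = []
    in_document = False

    for line in stream.splitlines():
        if _is_marker(line, "---"):
            if in_document:
                # "---" directly after content: the next document has no directives.
                documents.append((directives, "\n".join(body)))
                directives = []
            body = [line[4:]] if line[4:].strip() else []  # content after "--- "
            in_document = True
        elif _is_marker(line, "..."):
            if in_document:
                documents.append((directives, "\n".join(body)))
            directives, body, in_document = [], [], False
        elif in_document:
            body.append(line)
        elif line.startswith("%"):
            directives.append(line)  # only recognized between documents
        elif line.strip() and not line.lstrip().startswith("#"):
            # Content without a leading "---" starts a bare document.
            body, in_document = [line], True

    if in_document:
        documents.append((directives, "\n".join(body)))
    return documents
```

Applied to the two-document example, this yields one entry with the %YAML 1.2 directive and a second entry with both the %YAML 1.1 and %TAG directives.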

Explicit Key Notation: When Keys Get Complex

Most YAML users never encounter explicit key notation, but it’s essential when your key itself is a complex structure:

- sun: yellow
- ? earth: blue
  : moon: white

Or in its weirdest form: an explicit key with an empty value, followed by an empty key with a value:

{
  ? foo :,
  : bar,
}
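Complex keys are awkward to load into languages whose native maps only accept simple keys. The Python sketch below shows the problem and one possible workaround: a dict cannot be used as a dict key, so the key is converted into a hashable form first. The helper freeze is a hypothetical name, and this is only one of several ways a loader could represent such mappings.

```python
from typing import Any


def freeze(node: Any) -> Any:
    """Convert dicts and lists into hashable equivalents so they can serve as keys."""
    if isinstance(node, dict):
        return frozenset((freeze(key), freeze(value)) for key, value in node.items())
    if isinstance(node, list):
        return tuple(freeze(item) for item in node)
    return node


# "? earth: blue" / ": moon: white" is a mapping whose key is itself a mapping.
complex_key = {"earth": "blue"}
entry = {freeze(complex_key): {"moon": "white"}}

# "{ ? foo :, : bar }" has an empty value and an empty key; both are null nodes.
empty_nodes = {"foo": None, None: "bar"}

# Using the unfrozen mapping as a key fails, which is why the conversion is needed.
try:
    broken = {complex_key: {"moon": "white"}}
except TypeError as error:
    print("unhashable key:", error)
```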

Weird, right?

YAML 1.2: A JSON Superset

Here’s something that surprised me: YAML 1.2 is designed as a superset of JSON.
Every valid JSON document is meant to be valid YAML 1.2 as well.
You can paste JSON into a YAML 1.2 parser, and it will load without any modifications.

# This is valid YAML 1.2
{
  "name": "John Doe",
  "age": 30,
  "hobbies": ["reading", "coding"],
  "address": {
    "city": "Berlin",
    "country": "Germany"
  }
}

This compatibility makes migrating from JSON to YAML easy and lets you adopt YAML’s features gradually while keeping files JSON-compatible where needed.
This shows YAML is very flexible. But is it too flexible?

The Real Beast: Complexity and Ambiguity

And here’s the weirdest thing about YAML: its sheer complexity and countless edge cases.
The specification itself isn’t always clear and is sometimes contradictory.
While building my parser, I had to make judgment calls on how to handle ambiguous cases.

I tried to follow the specification as closely as possible, but practical implementation sometimes required deviating from it.
That’s when I truly understood the comment from the PR I mentioned earlier.
YAML’s flexibility comes at the cost of predictability.

Conclusion

YAML is indeed a weird beast.
It’s simultaneously simple enough for basic configs and complex enough to represent intricate data structures with custom types, inheritance, and cross-references.

Would I recommend using all these features in production? Probably not.
But understanding what’s possible helps you make informed decisions about when to use YAML and when to choose something simpler.

If you’re curious about the implementation details, check out my PHP YAML parser on GitHub.
It’s not perfect, but it taught me everything I wanted to know about YAML’s quirks and complexities.