If you’ve ever wondered why some blog posts show up beautifully in Google search results with rich snippets, star ratings, or publication dates, while others look like plain text—the answer is metadata.

Metadata is information about your content. It’s the title, description, tags, publication date, author info, and structured data that tells search engines, social media platforms, and readers what your content is about before they even click.

These last few years, I’ve become somewhat of a metadata megafan and have been (sheepishly) known to proselytize at length to friends and family about its impact on our digital interactions and psyches at large. Being in the middle of reviewing my site’s information architecture, I wanted to evaluate whether I’ve been using consistent yaml front matter within and across content types.

This blog posts talks about what metadata means, in this context of webpage publishing, and why it matters. Also wanted to share my own insights and takeaways from today’s review, as well as document for my future-self.

The Invisible Infrastructure

You can think of metadata as the connective tissue between your content and the world:

Search engines use it to understand and rank content
Social media uses it to generate preview cards when sharing links
Browsers use it for bookmarks and reading lists
RSS readers use it to organize and filter content
You use it to organize and find your own work later (this is my everlasting driving force)

Without good metadata, you’re essentially putting content into a void and hoping people stumble across it. With good metadata, you’re building structured pathways to your content.

How Metadata Becomes Schema.org

When you write a blog post or create a project page, the YAML front matter you add (title, excerpt, tags, etc.) gets automatically transformed into structured Schema.org data that search engines can understand.

Here’s what that transformation actually looks like with real examples from this site:

Click through the tabs above to see how your simple YAML front matter maps to Article, HowTo, and SoftwareSourceCode schema types. The interactive examples show actual schema from this site—this is what search engines read when they crawl your pages.

What Makes Good Metadata?

Based on extensive testing across this site, here are the metadata fields that matter most:

Required Fields (All Content)

title: "Descriptive Title"
excerpt: "150-300 character summary with key concepts"
tags: [concept-1, concept-2, topic-3]  # Use hyphens, not spaces!
date: YYYY-MM-DD (or YYYY-MM-DD HH:MM:SS ±ZZZZ)

Recommended Fields

subtitle: "Context-setting subtitle for SEO"
last_modified_at: YYYY-MM-DD
stack: [Technology, Framework, Tool]  # For technical content
header:
  overlay_image: /assets/images/hero.jpg
  teaser: /assets/images/card.jpg

Schema.org: Structured Data in Practice

Beyond basic metadata (title, description, tags), structured metadata using Schema.org vocabularies tells search engines the type of content you’re publishing. This site automatically generates three types:

Article Schema: For regular blog posts
HowTo Schema: For tutorials (auto-detected when categories or tags include “tutorial”)
SoftwareSourceCode Schema: For projects in the _projects collection

What does this actually do? Schema.org makes your content eligible for rich snippets in search results—those enhanced listings with images, dates, and structured information. It doesn’t guarantee better rankings, but it makes your content more understandable to search engines.

How to verify it’s working:

View the source of any page on this site
Search for "@type": "Article" or "@type": "HowTo"
You’ll see JSON-LD structured data embedded in the <head> section

The interactive visualization above shows exactly how this site transforms your YAML front matter into Schema.org markup. The mapping happens automatically via _includes/schema.html, which is included in every page’s head section.

Common Metadata Mistakes

After auditing 53 content files across this site, here are the most common issues I found:

Missing excerpts — Content showed up in search results with random text snippets
Mixing concepts and technologies — Tags like [python, tutorial, data-viz] confused tools with topics
Inconsistent date formats — Mix of YYYY-MM-DD, MM/DD/YYYY, and missing dates entirely
No visual assets — Missing header images meant poor social media previews

Keeping Metadata Consistent

In my experience, as one’s site grows, metadata quality tends to drift. To prevent this, I built a validation script that checks:

✓ Required fields are present
✓ Excerpt length is search-friendly (150-300 chars)
✓ Tags use hyphens (not spaces)
✓ Dates are properly formatted (YYYY-MM-DD or full timestamps)
✓ Technical content has a stack field
✓ Narrative content has header images
✓ Snippets use inbox/garden status values

Here’s how to run it:

# Check all content
python scripts/validate_metadata.py

# Check specific collection
python scripts/validate_metadata.py --collection posts

The script outputs a detailed report showing exactly what’s missing or improperly formatted, making it easy to fix issues before they affect your SEO.

The Meta-Meta Insight

Here’s the recursive part: This blog post has metadata that describes it.

If you “View Source” on this page, you’ll find JSON-LD structured data that tells search engines:

This is an Article (specifically about metadata)
It was published on 2025-11-15
It’s tagged with metadata, seo, schema-org, content-strategy, and documentation
The technologies involved are Schema.org, Python, Jekyll, and YAML

That metadata will help this post show up when someone searches for “how to implement schema.org” or “metadata best practices for blogs”—which is exactly the point.

Where to Start

If you’re convinced that metadata matters but don’t know where to begin:

Audit your current content — Find what’s missing (excerpts? dates? images?)
Define your standards — Decide on required vs. recommended fields
Implement systematically — Fix existing content, set expectations for new content
Validate regularly — Create a script or checklist to prevent drift
Add structured data — Implement Schema.org markup for rich snippets

The time investment is front-loaded, but the long-term benefits—better SEO, easier content management, professional presentation—compound over time.

Resources

Schema.org documentation — Official guide to structured data
Google Rich Results Test — Validate your Schema.org implementation
Open Graph Protocol — Metadata for social media sharing
Google Search Central - Structured Data
JSON-LD specification
My metadata validation script used on this site

*This is post #1 in a series I’d like to start about my own systematic efforts at SEO. NEXT TOPIC: Using Google’s Tag tracking for understanding web traffic. <Coming Jan 2026> *

Share on

Twitter Facebook LinkedIn

Why Metadata Matters