Query Planning & Topic Resolution

The best part of Last30Days is not that it talks to many sources.

It is that the engine tries to decide where to search before it searches.

That is the real difference between a broad but noisy system and one that can find the right subreddits, handles, repos, and communities for a topic.

Planning sits near the center of the architecture

In skills/last30days/scripts/lib/pipeline.py, run() resolves a query plan before retrieval.

There are two entry paths:

  • a host-supplied --plan, sanitized through planner._sanitize_plan()
  • an internally generated plan from planner.plan_query()

This means the system supports both:

  • host-model-first planning when the harness is capable
  • engine fallback planning when it is not

That split keeps the engine portable across harnesses: a capable host model can drive planning, and a weaker one still gets a usable plan.
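The two entry paths can be sketched roughly as below. This is a minimal illustration, not the repo's code: the plan shape, the source names, and the functions sanitize_plan, fallback_plan, and resolve_plan are all hypothetical stand-ins for whatever pipeline.run() and planner.py actually do. Falling back when a host plan sanitizes down to nothing is also an assumption.

```python
from typing import Any, Dict, Optional, Set

Plan = Dict[str, Any]  # hypothetical plan shape, assumed for illustration


def sanitize_plan(plan: Plan, available: Set[str]) -> Plan:
    """Keep only subqueries that have a search query and an available source."""
    cleaned = []
    for sub in plan.get("subqueries", []):
        sources = [s for s in sub.get("sources", []) if s in available]
        if sources and sub.get("search_query"):
            cleaned.append({**sub, "sources": sources})
    return {"intent": plan.get("intent", "factual"), "subqueries": cleaned}


def fallback_plan(topic: str, available: Set[str]) -> Plan:
    """Deterministic plan: one subquery over every available source."""
    return {
        "intent": "factual",
        "subqueries": [
            {
                "search_query": topic,
                "ranking_query": topic,
                "sources": sorted(available),
            }
        ],
    }


def resolve_plan(topic: str, host_plan: Optional[Plan], available: Set[str]) -> Plan:
    """Prefer a sanitized host plan; otherwise plan inside the engine."""
    if host_plan is not None:
        plan = sanitize_plan(host_plan, available)
        if plan["subqueries"]:  # assumption: an empty plan triggers the fallback
            return plan
    return fallback_plan(topic, available)
```

The point of the sketch is the ordering: the host plan is never trusted as-is, and the engine always has a deterministic path to some plan.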

The planner is LLM-first, but not LLM-only

skills/last30days/scripts/lib/planner.py describes itself as LLM-first query planning with deterministic guards, and the code matches that description.

The planner:

  • infers intent categories like factual, product, comparison, breaking_news, prediction
  • chooses source priorities by intent
  • generates subqueries with separate search_query and ranking_query
  • sanitizes and constrains source choices to available sources
  • falls back deterministically when needed

That separation between search_query and ranking_query is especially smart:

  • the search query can stay keyword-heavy and platform-friendly
  • the ranking query can stay natural-language and semantically richer

In effect, the system decouples retrieval phrasing from judgment phrasing.
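To make the decoupling concrete, here is a small sketch. Only the field names search_query and ranking_query come from the planner; the Subquery class, the token-overlap scorer, and the rank function are hypothetical, and the real reranker is almost certainly more sophisticated than word overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Subquery:
    search_query: str   # short, keyword-heavy: what gets sent to the platform
    ranking_query: str  # natural language: what results are judged against


def overlap_score(ranking_query: str, text: str) -> float:
    """Toy relevance score: share of ranking-query tokens present in the text."""
    query_tokens = set(ranking_query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rank(sub: Subquery, docs: List[str]) -> List[str]:
    """Order retrieved documents by the ranking query, not the search query."""
    return sorted(
        docs,
        key=lambda doc: overlap_score(sub.ranking_query, doc),
        reverse=True,
    )
```

Retrieval would use sub.search_query to fetch candidates, and only afterwards would rank() apply the richer phrasing, so neither query has to compromise for the other's job.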

The code and skill contract both show a learned fear of bad literal queries.

There is a preflight quality gate in lib.preflight.check_class_1_trap() for obviously poor prompt shapes. On top of that, planner.py contains explicit prompt rules about stripping phrases like:

  • use cases
  • workflows
  • examples
  • reviews
  • comparison

from raw search queries when those phrases would hurt retrieval.

That is not theoretical neatness. It is a response to observed failure modes.
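In planner.py these are prompt rules given to the model, not a string filter. Purely as an illustration of the intended effect, a deterministic version might look like the following; NOISE_PHRASES and strip_noise are hypothetical names, and keeping the original query when stripping would empty it is an assumption.

```python
import re

# Phrases named in the planner's prompt rules; the tuple itself is hypothetical.
NOISE_PHRASES = ("use cases", "workflows", "examples", "reviews", "comparison")

_NOISE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in NOISE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def strip_noise(raw_query: str) -> str:
    """Remove meta-phrases that describe the request rather than the topic."""
    cleaned = _NOISE_RE.sub(" ", raw_query)
    cleaned = " ".join(cleaned.split())  # collapse leftover whitespace
    # Assumption: never return an empty query; keep the original instead.
    return cleaned or raw_query.strip()
```

A query such as "OpenClaw workflows reviews" reduces to "OpenClaw", which is far more likely to match real posts than the literal phrase.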

Auto-resolution is the Step 0.55 equivalent in code

skills/last30days/scripts/lib/resolve.py::auto_resolve() is one of the repo’s most useful pieces.

It fans out several web searches concurrently to find:

  • likely subreddits
  • current-news context
  • likely X handle
  • likely GitHub profile and repos

Then it extracts and canonicalizes those hints with helpers like:

  • _extract_subreddits()
  • _extract_x_handle()
  • _extract_github_user()
  • _extract_github_repos()
  • canonicalize_github_repos()

So the engine can move from a vague topic like OpenClaw or Peter Steinberger toward targeted retrieval inputs.

That is a major quality multiplier.
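The fan-out-then-extract shape can be sketched as follows. This is not the body of auto_resolve(): the search callable is injected so the example runs offline, the query templates are invented, and the regexes and helper names (extract_subreddits, extract_github_repos, resolve_hints) are simplified, hypothetical versions of the helpers listed above.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

SearchFn = Callable[[str], str]  # hypothetical: query in, result text out


def extract_subreddits(text: str) -> List[str]:
    """Collect r/<name> mentions, de-duplicated case-insensitively, in order."""
    seen, names = set(), []
    for name in re.findall(r"\br/([A-Za-z0-9_]{2,21})\b", text):
        if name.casefold() not in seen:
            seen.add(name.casefold())
            names.append(name)
    return names


def extract_github_repos(text: str) -> List[str]:
    """Collect github.com/<owner>/<repo> mentions as lowercase owner/repo."""
    seen, repos = set(), []
    pattern = r"github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9_.-]+)"
    for owner, repo in re.findall(pattern, text):
        repo = repo.rstrip(".")
        if repo.endswith(".git"):
            repo = repo[:-4]
        canonical = f"{owner}/{repo}".lower()
        if canonical not in seen:
            seen.add(canonical)
            repos.append(canonical)
    return repos


def resolve_hints(topic: str, search: SearchFn) -> Dict[str, List[str]]:
    """Run the discovery searches concurrently, then extract targeting hints."""
    queries = {
        "subreddits": f"{topic} subreddit",  # invented query templates
        "github": f"{topic} github",
    }
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        futures = {key: pool.submit(search, q) for key, q in queries.items()}
        results = {key: fut.result() for key, fut in futures.items()}
    return {
        "subreddits": extract_subreddits(results["subreddits"]),
        "github_repos": extract_github_repos(results["github"]),
    }
```

Injecting the search function keeps the extraction logic testable without network access, which matters for a step whose output steers every later fetch.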

Category widening is a subtle but important idea

resolve._merge_category_peers() widens subreddit coverage through lib.categories.

That means the resolver is not just collecting literal matches. It can also widen into category-adjacent communities when the topic implies them.

This is exactly the sort of thing a human researcher does instinctively and naive search pipelines miss.
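A rough sketch of the widening step, under loud assumptions: the category label and peer table below are invented, and widen_with_peers is a hypothetical stand-in rather than the signature of resolve._merge_category_peers(). The ordering and the cap are design guesses meant to show why literal matches should win when space is limited.

```python
from typing import Dict, List

# Hypothetical category table; the real data lives in lib.categories.
CATEGORY_PEERS: Dict[str, List[str]] = {
    "ai_agents": ["LocalLLaMA", "AI_Agents"],
}


def widen_with_peers(
    found: List[str],
    categories: List[str],
    peers: Dict[str, List[str]] = CATEGORY_PEERS,
    limit: int = 8,
) -> List[str]:
    """Append category-adjacent subreddits after the literal matches."""
    merged = list(found)
    seen = {name.casefold() for name in found}
    for category in categories:
        for name in peers.get(category, []):
            if name.casefold() not in seen:
                seen.add(name.casefold())
                merged.append(name)
    # Literal matches come first, so truncation drops peers before hits.
    return merged[:limit]
```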

Competitor mode is also planning infrastructure

Comparison is not handled as a prompt trick.

The repo has a real competitor subsystem:

  • CLI logic in last30days.py
  • entity discovery in lib.competitors.discover_competitors()
  • per-entity targeting overrides via --competitors-plan

competitors.py uses deterministic multi-query web search and capitalized-entity extraction to discover peers. It is a practical choice: cheap, explainable, and good enough for seeding a multi-run comparison.
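Capitalized-entity extraction is simple enough to sketch in a few lines. The version below is an assumption-laden illustration, not discover_competitors(): the stopword set, the single-token regex, the min_mentions threshold, and the function name discover_peers are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

_ENTITY_RE = re.compile(r"\b[A-Z][A-Za-z0-9]+\b")

# Hypothetical stopwords: capitalized words that are not product names.
STOPWORDS = {"The", "Top", "Best", "Many", "Why", "How", "What", "Which"}


def discover_peers(
    topic: str,
    snippets: Iterable[str],
    top_n: int = 3,
    min_mentions: int = 2,
) -> List[str]:
    """Rank capitalized tokens by how many search snippets mention them."""
    counts: Counter = Counter()
    for snippet in snippets:
        found = set()
        for token in _ENTITY_RE.findall(snippet):
            if token in STOPWORDS or token.casefold() == topic.casefold():
                continue
            found.add(token)
        counts.update(found)  # count each entity once per snippet
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, n in ranked if n >= min_mentions][:top_n]
```

Requiring a candidate to appear in more than one snippet is one cheap way to filter out sentence-initial words, and it keeps the result explainable: every peer can be traced back to the snippets that mentioned it.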

Many retrieval systems focus almost entirely on source adapters.

Last30Days is stronger because it invests in the step before adapters: forming the right search program.

That matters more than it sounds. If you search the wrong communities, with the wrong handles and the wrong phrasing, every later stage becomes cleanup.

If you plan well, fusion and rerank get much better inputs.

  • pipeline.run() treats planning as a first-class stage, not a preamble
  • planner.py separates retrieval queries from ranking queries
  • resolve.auto_resolve() is one of the highest-leverage parts of the repo
  • Comparison and competitor discovery are real pipeline features, not presentation sugar
  • Last30Days earns much of its quality by deciding where to look before it starts fetching