The AI assistant revealed where the monitoring tool couldn’t scale.
Users weren’t scanning dashboards anymore.
They were asking direct questions.
The system had data. It lacked decision structure.
Dashboard
Multi-signal visual exploration
Deep data layers for expert users
Shift to Chat
Became the entry point for non-expert reps
What Broke
Results reordered unpredictably
Large queries hit compute limits
+ New users needed monitoring insights before they explored dashboards.
AI Chat for an Agriculture Monitoring Platform
A conversational interface integrated into a monitoring tool used by field sales teams.
Narrow decision windows
Weather & disease uncertainty
Field-level variability
Financial stakes and crop loss
One sales representative
manages 50+ farmers making decisions across 30,000+ acres.
The monitoring tool helps surface risk early.
Reducing crop loss by up to 30%.
Prioritization turned chat into a decision surface.
The system needed deterministic logic before AI responses could be trusted.
Fewer clarification questions
Users asked fewer follow-up questions when reviewing monitoring insights.
Adopted beyond the original feature
The prioritization logic was later adopted by another internal tool.
+ Reaching this level required deeper system decisions.
Three pressures shaped the problem space.
User needs, infrastructure limits, and AI trust requirements all pulled in different directions.
The real issue wasn’t missing insights. It was the structure between signals and responses.
+ The issue wasn’t missing data. It was missing structure.
User Clarity Pressure
Chat became the entry point for many new users.
Typical questions:
+ What should I focus on first today?
+ Where is precipitation behind?
Users needed direct answers. Dashboards weren’t the right surface for them.
Infrastructure Constraint Pressure
Large territory queries stressed the system.
+ Compute truncation
+ Aggregation mismatch
Engineering pushed a narrow scope for v1.
AI Trust Pressure
Early responses revealed reliability problems.
+ Inconsistent signal prioritization
+ Hallucinated insights
Leadership escalated the issue to align on safeguards.
The model was deciding things it shouldn’t.
Without ordering logic, responses became unstable. Users saw different priorities across similar queries.
The system needed a decision layer before an AI layer.
+ The real design problem became a ranking problem.
Monitoring Signals
GDU, Precipitation, Growth Stage, Disease
Decision Layer
Ranked operations, Prioritized fields
AI Chat Response
Prevalence, Count, Severity
Scenario A - Isolated Severity
1 severe field
Scenario B - Widespread Exposure
7 moderately elevated fields
Before
Scenario A ranked first
LLM often ranked based on the most dramatic signal.
After
Scenario B ranked first
Evaluation Order →
Prevalence
% of fields affected within an operation
Count
Total number of affected fields
Severity
Intensity of deviation within a field
Scenario A
Prevalence: Low (1/16 fields)
Count: Low (1 field)
Severity: High (severe spike)
Scenario B
Prevalence: High (7/16 fields)
Count: High (7 fields)
Severity: Medium (moderate deviation)
Why it matters for users -
Widespread affected fields indicate an emerging operation-level problem.
This increases the likelihood of coordinated crop stress and potential yield loss — the point where intervention becomes necessary, not just monitoring. With limited time, reps prioritize situations where action can meaningfully change outcomes at scale.
Deterministic ranking defined what matters first.
+ Exposure outweighs anomaly when prioritizing risk at scale.
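The evaluation order above can be sketched as a deterministic sort key. This is an illustrative sketch, not the production implementation; the class and field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Operation:
    name: str
    affected_fields: int   # fields with an elevated signal
    total_fields: int
    max_severity: float    # worst relative deviation within the operation (0.80 = +80%)

    @property
    def prevalence(self) -> float:
        # % of fields affected within the operation
        return self.affected_fields / self.total_fields

def rank_operations(ops: list[Operation]) -> list[Operation]:
    # Deterministic evaluation order: prevalence first, then count,
    # then severity as a tiebreaker. The model plays no role in ordering.
    return sorted(
        ops,
        key=lambda o: (o.prevalence, o.affected_fields, o.max_severity),
        reverse=True,
    )

# Scenario A: one severe field. Scenario B: seven moderately elevated fields.
a = Operation("Scenario A", affected_fields=1, total_fields=16, max_severity=0.80)
b = Operation("Scenario B", affected_fields=7, total_fields=16, max_severity=0.40)

print([o.name for o in rank_operations([a, b])])  # Scenario B ranks first
```

Because the key is a plain tuple comparison, the same inputs always produce the same order — the instability users saw with LLM-driven ranking cannot occur here.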
Signal Reliability
Ranking needed model outputs with consistent baselines
Data science
AI compute limits
Large-scale queries frequently failed with “too much to compute.”
System constraint
New user entry point
New users didn’t know what to look for, so they asked generic questions.
User discovery
Leadership direction
Hallucinations raised concerns about trust in AI outputs.
Strategic alignment
Reusable AI pattern
The approach needed to scale beyond a single tool.
Platform thinking
Shipping Reality
The system had to work within existing pipelines and timelines.
Delivery scope
AI wasn’t the challenge. Making it reliable was.
+ We didn’t design a response. We defined how decisions are made.
Decision Architecture for Monitoring AI
Query Layer
Shapes the request before computation
Query + Filters: field / operation / agency, growth stage
+ Narrows scope to relevant fields and ensures outputs match user context
Decision Layer
Turns signals into ranked priorities
Field-Level Data: Planting date, Weather
+ Defines the season window and ensures only valid signals are used
Deterministic Classification: Assigns Elevated / Stable / Insufficient states
+ Removes ambiguity and ensures consistent interpretation across queries
Operation Ranking: Prevalence → Count → Severity
+ Prioritizes widespread exposure over isolated anomalies for territory decisions
Response Layer
Turns decisions into explainable output
Structured Response: Formats overview, rankings, signals, top fields
+ Standardizes outputs into a consistent decision-ready format
LLM Explanation
User: Which operations need attention?
Before
" Smith Farms and Green Valley show elevated precipitation and GDU.
Field 22 — Weather (precipitation) +24%
Field 03 — Weather (GDU) +17%
Field 14 — Disease Risk
Field 07 — Yield trending low. "
+ Fields listed without prioritization; mixed and sometimes unreliable signals surfaced upfront.
+ Requires follow-ups to understand what actually matters.
After
" 4 operations elevated
Smith Farms — 25% (17 / 68 fields)
Green Valley — 22% (9 / 41 fields)
Precip elevated in 13 fields
GDU elevated in 11 fields
Top fields:
Field 14 — Precip +33%
Field 03 — GDU +17%. "
+ Ranked operations with structured signal breakdown.
+ Decision-ready output without additional follow-ups.
Decision logic separated from AI behavior
+ Each change required deeper architectural decisions.
Behavior changed. Decisions got faster.
Measured through interaction patterns and usage shifts post-launch.
Fewer follow-ups per query
Users reached decisions with fewer prompts
Most monitoring queries resolved in a single response
Increased monitoring usage in chat
Shift from product questions → decision queries
More users engaging with monitoring insights
+ The system didn’t change the data. It changed how users acted on it.
Where I Added Leverage
Deterministic decision engine
Pushed to move ranking out of the LLM and defined an evaluation order to make outputs stable and defendable
Challenged anomaly-first ranking based on how farming decisions actually scale.
AI reliability & guardrails
Surfaced hallucinated outputs to leadership and introduced guardrails to limit responses to verified signals only.
Compute-aware response design
Worked with engineering to define a “top fields + operations” strategy instead of full-dataset responses.
Negotiated model scope
Drove alignment across PM, Eng, Data, and Commercial on what “good output” means
Pushed back on data science models with inconsistent accuracy so v1 could ship weather-first.
Negotiated layer inclusion based on feasibility and cut non-critical signals to ensure a stable first release
New user adoption strategy
Designed responses to deliver value to new users without requiring tool knowledge
Shifted output from information → prioritized action (chat as primary decision layer)
Across-verticals alignment
Defined a reusable response structure across hierarchy levels, enabling cross-team adoption
A reusable decision layer for AI products
Each tool had different needs. The system adapted without redesign.
Designing for AI isn’t about generating answers.
It’s about structuring decisions.
+ The structure stayed constant. The logic adapted.
Query Layer
Decision Layer
Response Layer
Monitoring (Our tool)
Growth stage
View level
Weather check
Risk priority
Overview
Key fields
Signal drivers
Fungicide (closely matching tool)
Crop type
Timing window
Disease risk
Action timing
Next action
Risk reason
Products (different suite)
Product mix
Field group
Not needed
Plan output
Expected impact
+2 tools so far
Custom filters
Custom logic
Structured output
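The adaptation pattern in the table can be sketched as one pipeline with per-tool configuration. Tool names come from the table; the class, keys, and ranking lambdas are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DecisionPipeline:
    # Same three layers for every tool; only the plugged-in logic changes.
    query_filters: list[str]                  # Query Layer
    rank: Callable[[list[dict]], list[dict]]  # Decision Layer
    response_sections: list[str]              # Response Layer

    def run(self, items: list[dict]) -> dict:
        ranked = self.rank(items)
        # Each response section receives the same deterministic ranking.
        return {section: ranked for section in self.response_sections}

# Monitoring ranks by prevalence; Fungicide ranks by disease risk.
monitoring = DecisionPipeline(
    query_filters=["growth_stage", "view_level"],
    rank=lambda xs: sorted(xs, key=lambda x: x["prevalence"], reverse=True),
    response_sections=["overview", "key_fields", "signal_drivers"],
)
fungicide = DecisionPipeline(
    query_filters=["crop_type", "timing_window"],
    rank=lambda xs: sorted(xs, key=lambda x: x["disease_risk"], reverse=True),
    response_sections=["next_action", "risk_reason"],
)
```

The structure (filters → ranking → sections) never changes; a new tool supplies its own filters, ranking logic, and output sections without redesigning the pipeline.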


