Scaling decision intelligence: How agentic analytics transforms data deep-dives

In logistics, ETA accuracy isn’t just a metric. It’s the backbone of downstream operations. A missed ETA triggers delayed docks, idle crews, disrupted production, and disappointed customers. But predicting truckload arrival times is inherently complex: shipment operational behavior, carrier network factors, GPS telemetry, and human behavior all influence the outcome.

So, when a customer flags that their ETA accuracy has dropped, the question is immediate: why?

Today, answering that question means a senior analyst spends up to two weeks sifting through 400+ operational variables, building each investigation from scratch. They know what the patterns mean, but they’re spending most of their time finding patterns instead of acting on them.

We built a system to flip that ratio. Our agentic analytics workflow handles discovery and root-cause isolation, so analysts can focus on turning insight into action.

The context gap: Navigating the data moat

project44 tracks millions of FTL shipments, and a single shipment generates over 400 data points, from shipment attributes and GPS telemetry to idle times and prediction feedback. The root cause of an accuracy drop might sit at the intersection of GPS ping quality gaps, route complexity, appointment window adherence metrics and other dimensions, buried under hundreds of irrelevant variables.

Dashboards surface the “what”: a failing carrier, a problematic route. Isolating the “why” takes days to weeks per customer, with no reusable playbook across different operational profiles.

From isolated metrics to a unified shipment profile

Before the system could reason about ETA accuracy, it needed the same context an analyst would gather manually: shipment attributes, stop sequences, GPS telemetry, facility dwell times, and ETA prediction history. We unified these into ~400 variables per shipment, organized into coherent analytical dimensions: the foundation the agentic workflow operates on.
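As a rough illustration of what that unification looks like in practice, here is a minimal sketch that groups flat shipment variables into named dimensions. The dimension names and variables are invented for the example; the real profile spans roughly 400 variables.

```python
# Hypothetical dimension groupings; the production profile unifies ~400 variables.
DIMENSIONS = {
    "shipment": ["weight_kg", "stop_count"],
    "gps": ["ping_coverage_pct", "median_ping_gap_min"],
    "dwell": ["origin_dwell_min", "dest_dwell_min"],
    "prediction": ["last_eta_error_min", "eta_revision_count"],
}

def build_profile(raw: dict) -> dict:
    """Group flat shipment variables into named analytical dimensions,
    filling gaps with None so every profile has the same shape."""
    return {
        dim: {var: raw.get(var) for var in variables}
        for dim, variables in DIMENSIONS.items()
    }

profile = build_profile({"weight_kg": 18500, "ping_coverage_pct": 22.0})
```

Giving every shipment the same dimensional shape is what lets downstream analytical steps iterate over dimensions generically instead of hard-coding variable names.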

The path to autonomous analytics

With this unified data layer in place, our architecture evolved through three maturity stages.

Stage 1: LLM tool-use loop

Our first approach gave the AI direct access to analytical tools and let it explore freely. The result: no strategic direction, no reproducibility, no cost predictability, and no human oversight before execution. One thing held: every conclusion was grounded in metrics computed from the data, never fabricated. The analytical foundation was sound, even if the strategy wasn’t.

Stage 2: Plan-then-execute

Instead of acting immediately, the system now generates a structured, human-readable analysis plan first. An analyst reviews, edits, or approves the plan before a single calculation runs. This brought reproducibility, cost predictability, and trust. However, the plans were still static: the system had to guess every drill-down step up front, before seeing any actual results.
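Conceptually, the approval gate can be reduced to a few lines. The sketch below is illustrative only (the `PlanStep` fields and `review` helper are invented for this example): no step runs unless an analyst has explicitly approved it.

```python
from dataclasses import dataclass

@dataclass
class PlanStep:
    description: str   # human-readable intent, e.g. "segment accuracy by carrier"
    tool: str          # which analytical tool the step would invoke
    approved: bool = False

def review(plan: list[PlanStep], approve_indices: list[int]) -> list[PlanStep]:
    """Analyst approval gate: only explicitly approved steps may execute."""
    for i in approve_indices:
        plan[i].approved = True
    return [step for step in plan if step.approved]

plan = [
    PlanStep("Segment ETA accuracy by carrier", "segment"),
    PlanStep("Trend accuracy over the last 8 weeks", "trend"),
]
# The analyst approves only the first step; the second never runs.
runnable = review(plan, approve_indices=[0])
```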

Stage 3: Multi-level hypothesis-driven analysis

The breakthrough came from mirroring how a senior analyst works: you can’t plan a deep dive until you know what to dive into. The system now operates in two levels:

Level 1 – Full-scope exploration: The system scans the entire dataset with no filters and no assumptions. Every analytical step carries a paired hypothesis — one asserting that a factor affects ETA accuracy, one asserting it doesn’t. This forces the system to prove its findings rather than confirm its hunches. The output is a ranked set of evidence-backed findings showing where the real problems are likely.
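One simple way to realize a paired hypothesis is a permutation test: measure the accuracy gap between shipments with and without a factor, then ask how often random relabeling produces a gap at least as large. The sketch below is illustrative, not the production implementation; the function name and the specific test statistic are assumptions.

```python
import random
from statistics import mean

def paired_hypothesis_test(on_time, factor_flag, n_perm=2000, seed=0):
    """Test H1 ('the factor affects ETA accuracy') against H0 ('it does not')
    by comparing on-time rates between flagged and unflagged shipments and
    estimating significance with a permutation test."""
    with_f = [a for a, f in zip(on_time, factor_flag) if f]
    without = [a for a, f in zip(on_time, factor_flag) if not f]
    observed = mean(without) - mean(with_f)   # accuracy gap attributable to the factor
    rng = random.Random(seed)
    pooled = list(on_time)
    k = len(with_f)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # break any real association
        gap = mean(pooled[k:]) - mean(pooled[:k])
        if abs(gap) >= abs(observed):
            hits += 1
    p_value = hits / n_perm                   # small p => H0 is hard to defend
    return observed, p_value

# Synthetic example: flagged shipments (e.g. low GPS coverage) are never on time.
obs, p = paired_hypothesis_test([0] * 50 + [1] * 50, [1] * 50 + [0] * 50)
```

Running every candidate factor through both framings and ranking by effect size and significance yields the evidence-backed shortlist that Level 2 consumes.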

Level 2 – Evidence-based drilldowns: Only the top-ranked findings from Level 1 move forward. This is the only stage where data filtering is allowed, ensuring the system doesn’t narrow its view prematurely.

For example, Level 1 identifies that a particular carrier contributes a large share of the customer’s volume while also showing low ETA accuracy. Alongside, it identifies that shipments with very low GPS coverage (0–25%) also have lower accuracy.

Level 2 then deep-dives into that carrier while holding the GPS-coverage finding alongside, analyzing GPS coverage for that carrier specifically.
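A drilldown like that one might look as follows. This is a hedged sketch: the field names, coverage bands, and `drilldown` helper are invented for illustration, but it captures the key constraint that filtering happens only at Level 2, scoped to a finding carried over from Level 1.

```python
def drilldown(shipments, carrier, coverage_bins=((0, 25), (25, 75), (75, 100))):
    """Level 2 sketch: filter to one carrier (the only stage where filtering
    is allowed) and break its on-time rate down by GPS-coverage band."""
    segment = [s for s in shipments if s["carrier"] == carrier]
    result = {}
    for lo, hi in coverage_bins:
        band = [s for s in segment
                if lo <= s["gps_coverage"] < hi or (hi == 100 and s["gps_coverage"] == 100)]
        if band:  # skip empty bands rather than dividing by zero
            result[f"{lo}-{hi}%"] = sum(s["on_time"] for s in band) / len(band)
    return result

carrier_view = drilldown([
    {"carrier": "C1", "gps_coverage": 10, "on_time": 0},
    {"carrier": "C1", "gps_coverage": 20, "on_time": 0},
    {"carrier": "C1", "gps_coverage": 90, "on_time": 1},
    {"carrier": "C2", "gps_coverage": 10, "on_time": 1},  # excluded by the carrier filter
], "C1")
```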

At both levels, the analyst remains in the loop, reviewing plans, validating findings, and deciding which recommendations warrant action.

The human edge: Elevating decision intelligence

The goal of this autonomous framework isn’t to automate the analyst, but to automate the discovery phase. By handling the exhaustive data cleaning and relationship hunting that previously took weeks, we’ve moved the Human-in-the-Loop (HITL) gate to a higher, more strategic level.

This is the essence of decision intelligence. Our experts no longer spend their time asking “what happened”; instead, they use the AI-generated evidence to focus on mitigation, collaboration, and long-term strategy. By combining the brute-force processing power of an agentic workflow with the nuanced knowledge of a senior analyst, we are turning fragmented data into a clear, actionable roadmap for the global supply chain.

What we achieved

Insight depth: A typical run executes 60 analytical steps across Levels 1 and 2, covering segmentation, trend analysis, anomaly detection, and entity-level drilldowns. In validation testing, the system surfaced new driving factors that the original manual analysis had not identified, while matching all existing findings with zero misses.

Speed: Time-to-insight dropped to approximately one hour, an average 16x reduction over manual deep-dive analysis.

Cost: We observed a roughly 95% cost reduction with the agentic workflow compared to the manual process.

How we validated it

We compared the system’s output against a previous manual deep-dive performed on the same dataset and customer.

The workflow identified every driving factor from the original analysis (100% overlap), plus new ones the analyst had not found. Most analytical approaches matched; the workflow introduced one additional approach. The only gap was in reporting granularity: segment-level breakdowns were computed but not fully surfaced in the automated report, a templating issue, not an analytical one.

To test reproducibility, we ran the workflow three times on each of two datasets with identical inputs. Major findings and recommendations held steady across runs, with 80–85% overlap. Across different datasets, the system produced distinct, data-backed insights, confirming it reasons from the data rather than following a fixed script.
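One way to score that run-to-run stability is a Jaccard overlap on the sets of surfaced findings. This is a sketch of the kind of metric involved, not necessarily the exact one we used; the finding labels are invented.

```python
def finding_overlap(run_a, run_b):
    """Jaccard overlap between the finding sets of two runs:
    |A ∩ B| / |A ∪ B|, so 1.0 means identical findings."""
    a, b = set(run_a), set(run_b)
    return len(a & b) / len(a | b)

# Two hypothetical runs that agree on two of four distinct findings.
overlap = finding_overlap(
    {"carrier_mix", "gps_coverage", "dwell_spike"},
    {"carrier_mix", "gps_coverage", "appointment_drift"},
)
```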

Looking Ahead

Today, the system surfaces root causes. The next step is to act on them automatically.

From detection to correction. Route findings directly into downstream agentic workflows that trigger experiments to improve ETA models and pipelines, without manual intervention.

From human-directed to self-directed. Every analyst decision, customer correction, and prior run outcome feeds back as reinforcement signals. The system learns which analytical paths pay off, reducing HITL gates progressively as confidence grows.

From reactive to proactive. Auto-generate analyses for high-value customers and deliver persona-tailored insights before issues escalate, with a self-serve UI layer that lets any stakeholder trigger analysis on demand.