Welcome to Data Infrastructure Weekly Oracle has finally broken its silence on MySQL's future. After years of losing ground to PostgreSQL, they're now publicly committing to bring previously commercial features into the free Community Edition — with vector search topping the list. Vector support is the backbone of most modern AI applications, and MySQL's absence there has been costly as PostgreSQL has dominated the open source conversation. For data folks, this is one to watch. MySQL was the dominant database of the early web era, but it hasn't shown up much in the current AI wave. If Oracle delivers, MySQL could re-enter conversations it has been frozen out of for years. The community is cautiously optimistic but wants proof, not promises. What actually ships around April 2026 will tell us everything. Other key moves shaping the data space this week: - AI is in 96.5% of production databases, and only 28.1% of orgs have governance that can prove it
- AI is creating more jobs than it eliminates, but the data quality problem limiting ROI hasn't been solved
- Why the data governance model that depends on human execution was always going to fail
Let's break it all down. 👇 | Why it matters: Qdrant's $50 million Series B, closed alongside its 1.17 platform release, reflects a specific argument: the retrieval problem did not shrink when agents arrived. It scaled up. The context window problem: Agents operate on proprietary enterprise data they were never trained on, across millions of documents that change continuously. Context windows manage session state. They don't provide high-recall search across that data, maintain retrieval quality as it changes, or sustain the query volumes autonomous decision-making generates. Even memory frameworks positioned as alternatives rely on vector storage underneath. A missed result at agent scale isn't a latency problem — it's a decision quality problem that compounds across every retrieval pass in a single agent turn. Vectors are still needed: General-purpose databases degrade under agent query volumes: freshly ingested data sits in unoptimized segments before indexing catches up, and a single slow replica adds latency across every parallel tool call in an agent turn. Purpose-built retrieval handles both, but the starting point is still whatever vector support is already in your stack. Three conditions signal a move: retrieval quality is tied to business outcomes, query patterns involve expansion and multi-stage re-ranking, or data volumes cross tens of millions of documents.
|
How a Data Team Built a Customer Insights App in Retool Most data teams sit on a goldmine of unstructured GTM data — Gong calls, support tickets, product feedback — with no good way to surface it. Malcolm Angus, Analytics Engineer at Retool, solved this by building a self-serve Gong insights app that sales, CS, and marketing use daily. Join us March 25 at 10am PT to see how you can turn raw call data into a searchable app your GTM teams actually use, no frontend skills required. |
Why it matters: Liquibase's 2026 State of Database Change Governance Report found that 68.1% of organizations deploy database changes weekly or faster, while only 28.1% report standardized, consistently enforced governance. The risk surface: For data and infrastructure leads, the issue isn't that AI touches production data — it's whether the org can prove control at the database layer when change is frequent, environments are heterogeneous and AI introduces new pathways for access. Read More |
LLM Observability & Evaluation Research This survey explores how teams are selecting LLMOps tools, defining success metrics, and planning their next phase of investment in evaluation infrastructure. |
Why it matters: Snowflake's "ROI of Gen AI and Agents" report, a global survey of 2,050 business and technology leaders, finds organizations earn roughly $1.49 for every dollar invested in AI. Yet 96% still face significant challenges with data quality, skills gaps and integration with legacy systems.
The data dependency: Data analytics is one of the functions seeing both the strongest job growth and the largest workforce reductions — 37% of organizations report cuts in data analytics roles, on par with customer service. The split reflects what's actually happening: roles building AI infrastructure are growing, roles doing work AI can now automate are shrinking. Among early adopters, 92% report positive ROI, but the strongest returns come from mature deployments built on governed data foundations — which makes the data quality gap the more urgent problem. Read More |
Beyond the Pilot Podcast: Episode 8 LangChain told employees they cannot install OpenClaw on company laptops due to "massive security risk" — yet this unhinged approach is exactly what makes it work. Harrison Chase unpacks why OpenClaw succeeds where AutoGPT failed, and why context engineering, not just smarter models, separates demo agents from production-ready systems.. Listen to Episode 8 |
Why it matters: Incomplete and inconsistent metadata is among the most common reasons AI initiatives stall at the pilot stage. Alation's Curation Automation, now generally available, automates metadata governance at scale.
The architecture: The outcome-based governance system connects a common data manager for identifying and governing business-critical data elements, data quality for automated validation and monitoring, and curation automation for enriching metadata with business context at scale. Organizations declare what compliance looks like once; agents enforce it continuously. Every change is previewable before execution and fully auditable. Read More |
|
|
Thanks for reading. We'll be back in your inbox next week. – Sean Michael Kerner – Sean Michael Kerner (@seankerner) (For U.S.-based readers, don't miss VB on Google: Add us to your trusted feeds. Click here.) |
|
|
|