zwaps 10 hours ago [-]
This reads like AI writing toward a point the author wants to make, without any evidence behind it.
It is completely incoherent. Apparently we just need markdown and git, but also a knowledge graph and pgvector which accounts for most of the performance.
We don’t need semantic search, because we use… hybrid search (semantic search plus bm25)???
Really bad look for an AI consulting company, this.
zwaps 10 hours ago [-]
The way this argues against its own premise is really ChatGPT like.
This happens when you ask it to write about something that isn't actually true.
Which again is funny, because as an AI consulting company you should have the expertise to know what you're writing about.
But good news, this company also has a free ebook. I am sure it is fantastic.
DSemba 3 hours ago [-]
I once made a slide deck on vector vs keyword search, right when vector DBs were on the rise:
https://vec3.ai/
Seeing the confusion this article caused, maybe someone will find it useful.
jfreds 15 hours ago [-]
I'm confused: is it just markdown files in git? Or does the hybrid graph+semantic layer matter? If the latter, the title is just clickbait, right?
antonvs 6 hours ago [-]
[dead]
bob1029 8 hours ago [-]
A combination of SQL and git seems to work best. They tend to complement each other really well.
git.exe can't tell you things like how many references a specific type has. It can get close with grep and friends, but it's not very precise. Preprocessing the codebase into various SQL tables using compiler tools can provide these insights in a much more stable way.
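Not from the thread, but here's a minimal sketch of the kind of preprocessing described above, using Python's ast module and an in-memory SQLite table (the table name and schema are my own invention, not any particular tool's):

```python
# Sketch: preprocess source into SQLite so questions like "how many
# references does type Foo have" become a precise query, not a grep.
import ast
import sqlite3

def index_references(source: str, path: str, db: sqlite3.Connection) -> None:
    """Record every bare-name reference in `source` into a refs table."""
    db.execute("CREATE TABLE IF NOT EXISTS refs (name TEXT, path TEXT, line INT)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            db.execute("INSERT INTO refs VALUES (?, ?, ?)",
                       (node.id, path, node.lineno))
    db.commit()

db = sqlite3.connect(":memory:")
index_references("x = Foo()\ny = Foo()\nz = Bar()", "demo.py", db)
count = db.execute("SELECT COUNT(*) FROM refs WHERE name = 'Foo'").fetchone()[0]
print(count)  # 2
```

A real pipeline would use compiler tooling (Roslyn, clang, etc.) rather than a toy AST walk, but the shape is the same: normalize the code into tables, then query.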
skiing_crawling 15 hours ago [-]
How does git replace a vector db search exactly? They are orthogonal. Are you gonna burn a million tokens every time you wanna find some relevant files?
DSemba 3 hours ago [-]
I agree the authors don't argue it well, but AI agents with plain grep have proven efficient enough to be the default in Claude Code.
I personally turned off the indexing feature in Cursor and use it without it; I haven't noticed any accuracy drop, though my codebase isn't enterprise-sized.
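For illustration, the grep-style retrieval an agent falls back on can be sketched in a few lines of Python (the function and its signature are hypothetical, not Cursor's or Claude Code's actual implementation):

```python
# Sketch: naive recursive grep over a source tree. Literal substring
# match, no index to build or keep in sync.
from pathlib import Path

def grep(root: str, pattern: str, suffix: str = ".py") -> list[tuple[str, int, str]]:
    """Return (path, line number, line text) for every matching line."""
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if pattern in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

The trade-off is exactly the one debated in this thread: zero infrastructure and always fresh, at the cost of re-reading files (or re-spending tokens) on every query.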
Kim_Bruning 14 hours ago [-]
I think just one useful thing is mentioned: putting your md files in git and running a hybrid FTS/vector search across them actually does work better. Not a very surprising conclusion, and it doesn't need that much text to explain, does it?
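As a sketch of what "hybrid" usually means in practice: take the keyword (BM25) ranking and the vector ranking and fuse them, commonly via reciprocal rank fusion. The file names below are made up for the example:

```python
# Sketch: reciprocal rank fusion (RRF) combines two rankings without
# needing their raw scores to be comparable.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc by sum of 1/(k + rank) across rankings, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["notes/git.md", "notes/search.md", "notes/ops.md"]
vector_hits = ["notes/search.md", "notes/embeddings.md", "notes/git.md"]
fused = rrf([bm25_hits, vector_hits])
print(fused[0])  # notes/search.md
```

A document that ranks decently in both lists beats one that tops only a single list, which is the whole appeal of hybrid retrieval.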
TranquilMarmot 14 hours ago [-]
This should be common sense and immediately obvious to anybody who has spent more than a few hours with a coding agent.
GrinningFool 15 hours ago [-]
This isn't even well-slopped slop.
knighthack 12 hours ago [-]
I really don't get having a ton of MD files lying around, or the prospect of editing a thousand of them when the metadata frontmatter changes.
A single SQLite database implements columns/metadata handling, and comes baked-in with FTS and BM25 ranking too.
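For what it's worth, a minimal sketch of that (assumes your SQLite build includes the FTS5 extension, which Python's bundled sqlite3 usually does; the table and sample rows are made up):

```python
# Sketch: SQLite's built-in FTS5 gives BM25-ranked full-text search
# with no extra infrastructure.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
db.executemany("INSERT INTO notes VALUES (?, ?)", [
    ("git basics", "commit, branch, merge"),
    ("search notes", "bm25 ranking beats naive keyword matching"),
])
# bm25() returns a negative score; more negative means more relevant,
# so ascending ORDER BY puts the best match first.
row = db.execute(
    "SELECT title FROM notes WHERE notes MATCH 'bm25' ORDER BY bm25(notes)"
).fetchone()
print(row[0])  # search notes
```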