I built SemHub, a free semantic search tool for GitHub, and learned valuable lessons from the experience. The key insights: leverage pgvector, avoid premature optimization, recognize the limits of embedding models, and understand how filtering complicates vector search. GitHub's UI has significant flaws, and my aim was to address them by enabling semantic search across multiple repositories. SemHub's core features aimed to improve search granularity and relevance, but overly restrictive filtering hurt performance. Ultimately, shortening the embedding vectors and optimizing the search strategy made searches more responsive. The tech stack and deployment were difficult, and although the project taught me a great deal, I consider it a failure in execution.
What I Learned Building a Free Semantic Search Tool for GitHub and Why I Failed
