search

PostgreSQL BM25 Full-Text Search: Speed Up Performance With These Tips

PostgreSQL Full-Text Search (FTS) can outperform dedicated search engines when it is optimized correctly. In the benchmark described, two optimizations, pre-calculating and storing the tsvector and configuring GIN indexes properly, yielded a roughly 50x performance increase. Common pitfalls include computing the tsvector on the fly at query time and keeping the default GIN index setting (fastupdate=on), which hinders performance. For advanced ranking tasks, the VectorChord-BM25 extension may be required, as it offers better relevance scoring than the standard methods. With an optimal configuration, standard FTS turns out to be much faster than is often perceived.

https://blog.vectorchord.ai/postgresql-full-text-search-fast-when-done-right-debunking-the-slow-myth
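
The core idea behind the speedup, normalizing text once at write time and querying an inverted index instead of re-parsing every row per query, can be illustrated with a toy Python analogue. This is only a sketch under loud assumptions: the hypothetical `normalize` function (lowercase word tokens, no stemming or stop words) stands in for PostgreSQL's `to_tsvector`, and the dictionary-of-sets `InvertedIndex` stands in for a stored tsvector column plus a GIN index. Neither reflects how PostgreSQL actually implements them, and the fastupdate pending-list behavior is not modeled.

```python
import re
from collections import defaultdict


def normalize(text: str) -> set[str]:
    # Hypothetical stand-in for to_tsvector: lowercase alphanumeric tokens only.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_on_the_fly(docs: dict[int, str], query: str) -> list[int]:
    # The pitfall: every query re-tokenizes every document (a full scan).
    terms = normalize(query)
    if not terms:
        return []
    return sorted(doc_id for doc_id, text in docs.items() if terms <= normalize(text))


class InvertedIndex:
    # Toy analogue of a stored tsvector column plus a GIN index:
    # each document is tokenized once, when it is added.
    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in normalize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        # AND semantics: intersect the posting lists of all query terms.
        terms = normalize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


docs = {
    1: "PostgreSQL full-text search",
    2: "GIN index tuning",
    3: "Full-text search with a GIN index",
}
index = InvertedIndex()
for doc_id, text in docs.items():
    index.add(doc_id, text)

print(index.search("gin full text"))             # [3]
print(search_on_the_fly(docs, "gin full text"))  # [3]
```

Both functions return the same answer; the difference is that the index does the tokenization work once per document rather than once per document per query.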

What I Learned Building a Free Semantic Search Tool for GitHub and Why I Failed

I built SemHub, a free semantic search tool for GitHub, and learned valuable lessons from the experience. Key insights: leverage pgvector, avoid premature optimization, recognize the limits of embedding models, and understand how filtering complicates vector search. GitHub's UI has significant flaws, and my aim was to address them by enabling semantic search across multiple repositories. SemHub's core features were meant to improve search granularity and relevance, but overly restrictive filters hurt query performance. In the end, shortening the embedding vectors and optimizing the search strategy made search noticeably more responsive. The project ran into difficulties with the tech stack and deployment, and although I consider its execution a failure, it taught me a great deal.

https://tzx.notion.site/What-I-Learned-Building-a-Free-Semantic-Search-Tool-for-GitHub-and-Why-I-Failed-1a09b742c7918033b318f3a5d7dc9751
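
One of the filtering complexities mentioned above is easy to reproduce in miniature. pgvector's documentation notes that with approximate indexes, filtering is applied after the index is scanned, so a selective filter (for example, a single repository) can return fewer rows than requested. The brute-force Python sketch below mimics that effect; it is not pgvector. The `candidates` parameter is a hypothetical stand-in for an index's search-list size, and the items are illustrative toy data, not SemHub's.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def post_filter_search(items: list[dict], query: list[float], repo: str,
                       k: int, candidates: int) -> list[int]:
    # Approximate-index style: fetch only the `candidates` nearest rows,
    # then apply the filter. A selective filter can leave few or no rows.
    ranked = sorted(items, key=lambda it: cosine(it["vec"], query), reverse=True)
    return [it["id"] for it in ranked[:candidates] if it["repo"] == repo][:k]


def pre_filter_search(items: list[dict], query: list[float], repo: str, k: int) -> list[int]:
    # Filter first, then rank exactly: returns up to k hits, at the cost of
    # scanning every row that passes the filter.
    matching = [it for it in items if it["repo"] == repo]
    ranked = sorted(matching, key=lambda it: cosine(it["vec"], query), reverse=True)
    return [it["id"] for it in ranked[:k]]


# Illustrative toy data: the query's nearest neighbours all live in "big-repo".
items = [
    {"id": 1, "repo": "big-repo", "vec": [1.0, 0.1]},
    {"id": 2, "repo": "big-repo", "vec": [1.0, 0.2]},
    {"id": 3, "repo": "big-repo", "vec": [1.0, 0.3]},
    {"id": 4, "repo": "small-repo", "vec": [1.0, 0.9]},
    {"id": 5, "repo": "small-repo", "vec": [0.5, 1.0]},
]
query = [1.0, 0.0]

print(post_filter_search(items, query, "small-repo", k=2, candidates=3))  # []
print(pre_filter_search(items, query, "small-repo", k=2))                 # [4, 5]
```

Post-filtering is cheap but can starve a restrictive filter; pre-filtering is complete but scans every matching row, which is the trade-off that makes filtered vector search tricky.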

VectorChord-BM25: Revolutionize PostgreSQL Search With BM25 Ranking

VectorChord-BM25 enhances PostgreSQL full-text search with BM25 ranking and is reported to be about 3x faster than Elasticsearch while maintaining comparable accuracy. Key features include optimized indexing, improved tokenization, and relevance scoring, making it suitable for a range of search applications. The implementation integrates BM25 scoring natively in PostgreSQL, in contrast to alternatives such as ParadeDB, which may face compatibility issues. Future work aims to improve tokenization for better multilingual support. Overall, VectorChord-BM25 aims to set a new standard for efficient, relevance-based search in PostgreSQL.

https://blog.vectorchord.ai/vectorchord-bm25-revolutionize-postgresql-search-with-bm25-ranking-3x-faster-than-elasticsearch
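
For readers unfamiliar with BM25, here is a minimal pure-Python implementation of the standard Okapi BM25 formula. It is a sketch for intuition only, not VectorChord-BM25's code: it assumes whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, and the Lucene-style IDF log(1 + (N - n + 0.5) / (n + 0.5)). The extension's actual tokenizer and parameter choices may differ.

```python
import math
from collections import Counter


def bm25_scores(docs: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    # Returns one BM25 score per document for the given query.
    if not docs:
        return []
    # Simplification: lowercase + whitespace split, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        doc_len = len(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n = df[term]
            # Rare terms get a larger IDF; this variant is always positive.
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            freq = tf[term]
            # Term frequency saturates via k1; b controls length normalization.
            norm = freq + k1 * (1 - b + b * doc_len / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "postgres full text search",
    "bm25 ranking for postgres",
    "a very long document that mentions bm25 once among many other words",
]
print(bm25_scores(docs, "bm25 ranking"))
```

The short document that matches both query terms ranks first, the long document that mentions only `bm25` once ranks second, and the document with no query terms scores zero. Unlike PostgreSQL's built-in `ts_rank`, BM25 uses corpus-wide statistics (document frequency and average length), which is the main source of its better relevance.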

Searchcode

searchcode is a code search engine covering 75 billion lines of code in 40 million projects. It lets you search for functions, variables, and libraries in 346 languages and filter by source or language. Real-time stats show daily searches and code views. It was created by Ben Boyter in Sydney, Australia.

https://searchcode.com/

Algolia Community

Algolia Community offers various projects, including API clients, extensions, frameworks, InstantSearch libraries, and tools for building fast, relevant search experiences on platforms such as WordPress, Magento, and Shopify. InstantSearch provides UI libraries for JavaScript, React, Vue, Angular, iOS, and Android. API clients are available for multiple languages, including PHP, JavaScript, Python, and Java. Showcases include search over packages, public APIs, the GDPR text, and more. Tools like DocSearch and Search Grader improve search in documentation and applications.

https://community.algolia.com/
