Implementing High-speed Site-wide Search for Hexo with FlexSearch in Node.js
Site-wide indexing with a customized FlexSearch setup for full-text search is now online; V0.0.2 is in the testing phase.
Possible further expansion: fast, low-resource search over large texts, multiple texts, and text spread across many files at the file-system level.
Purpose
Many websites and bloggers struggle with site-wide search. Relying on an external search engine means waiting for it to index new articles, so content search is limited to older versions of the pages. Built-in search consumes too many resources and is slow. Static HTML sites cannot use database search…
An introduction to FlexSearch isn’t the focus here (the latest information is easy to find via Accelerated Browsing). Getting started with Node.js was the focus of my previous article: Node.js Development Environment and Application Examples.
This article primarily shares the practical application of Node.js + FlexSearch within this site (Hexo) to achieve full-text search. It is used for the “Related Content” section (embedded via iframe) when articles are synchronously released on Carl Notes Blog Garden. Blog Garden prohibits adding JavaScript code within articles, so a cross-domain Ajax request could not fetch the related content from the CDN site. This was a huge pitfall that cost me around 2-3 days.
FlexSearch is particularly helpful for content searches, while Hexo’s implementation of related content is also good, using plugin algorithms to compare the relationships, keywords, weights, and more among my current 300+ articles. For details, see the section on [Related Articles and How to Increase to 10 in Hello hexo].
However, Hexo (my current version) has a rudimentary implementation of site-wide search, and I wonder if the latest version has improved this “Site-wide Search”.
hexo: 6.3.0
The current version of Hexo downloads the search.xml file into memory through an Ajax request, then uses JavaScript to search and match locally (advantage: beautiful UI). However, my search.xml has reached 3 MB!
This approach is too heavyweight for the web. Even over the fastest CDN route, the download takes at least 350 ms, and with common CDNs it takes over 1 s. This is very inconvenient for users, and it is a deficiency of the approach itself.
How to Use
Utilizing the latest version of FlexSearch (as of mid-December 2023) is straightforward:
// Declaration of three application scenarios, from simple to complex: Index, Document, and Worker; start by understanding the simplest Index
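As a rough illustration of the simplest scenario, Index, here is a minimal sketch. It assumes FlexSearch 0.7.x installed with `npm install flexsearch`; the sample articles, the helper names `buildIndex` and `searchTitles`, and the `tokenize: "forward"` choice are my own assumptions for illustration, not the code running on this site.

```javascript
// Minimal sketch of the FlexSearch "Index" scenario (assumes flexsearch 0.7.x).
// The sample articles and helper names are hypothetical.
const articles = [
  { id: 0, title: "Node.js Development Environment", content: "Install Node.js and run a first script." },
  { id: 1, title: "Hello hexo", content: "Related articles and how to increase them to 10." },
];

// Build an index over title + content.
// IndexClass defaults to the real FlexSearch Index, loaded only when needed.
function buildIndex(docs, IndexClass) {
  const Index = IndexClass || require("flexsearch").Index;
  const index = new Index({ tokenize: "forward" }); // "forward" also matches word prefixes
  for (const doc of docs) {
    index.add(doc.id, `${doc.title} ${doc.content}`); // add(id, text)
  }
  return index;
}

// search() returns the ids of matching entries; map them back to titles.
function searchTitles(index, docs, query, limit = 10) {
  const byId = new Map(docs.map((doc) => [doc.id, doc]));
  return index
    .search(query, limit)
    .map((id) => byId.get(id))
    .filter(Boolean)
    .map((doc) => doc.title);
}
```

With the package installed, `searchTitles(buildIndex(articles), articles, "hexo")` would return the titles of the matching sample articles. The Document and Worker scenarios build on the same add/search idea.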
Related Content
Implementation/Process
For basic learning, it’s recommended to follow the process outlined here: https://expressjs.com/en/starter/installing.html (and experiment with the sample examples here: https://expressjs.com/en/starter/examples.html)
Express and Jade-related
# Main program file app.js, excerpt from the file contents:
FlexSearch-related
# Inside the appsearch.js file, contains the FlexSearch-related content:
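To show how the two pieces could fit together, here is a hedged sketch of an Express-style route handler for /appsearch. The function name `makeSearchHandler`, the `articlesById` map, and the template name "appsearch" are assumptions for illustration; the actual appsearch.js may be organized differently.

```javascript
// Sketch of an Express-style route handler for /appsearch (names are hypothetical).
// `index` is anything with a FlexSearch-like search(query, limit) method;
// `articlesById` is a Map from article id to { title, url }.
function makeSearchHandler(index, articlesById, limit = 10) {
  return function appSearch(req, res) {
    // Express exposes the parsed query string as req.query.
    const q = String((req.query && req.query.q) || "").trim();
    const ids = q ? index.search(q, limit) : [];
    const results = ids
      .map((id) => articlesById.get(id))
      .filter(Boolean); // skip ids with no known article
    // In Express, res.render fills the named template (Jade/Pug here) with these values.
    res.render("appsearch", { q, results });
  };
}

// Wiring, assuming an Express app object:
// app.get("/appsearch", makeSearchHandler(index, articlesById));
```

Keeping the handler a plain function of `index` and `articlesById` makes it easy to try out with a stand-in index before connecting the real one.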
Note 1
// create the index
This is the syntax of an old version of FlexSearch. After extensive Googling, I found that many Chinese-language examples use this old syntax, which no longer works; it produced countless errors, and even GPT-3.5 could not explain what went wrong. This shows how important versioning is.
Note 2
Jade (now named Pug) templates are also very useful. They use an indented syntax, which makes generating HTML easy. In practice, you can use a conversion tool to turn existing HTML into Jade (or Pug) templates, such as: https://tool.fiaox.com/template-html-pug/.
Adding CORS Support
res.setHeader("Access-Control-Allow-Origin","*");
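The header above can be wrapped in a small middleware so that every route gets it. This is only a sketch: the function name `allowCors` and the extra method and header lists are assumptions, and answering preflight OPTIONS requests is an addition that a simple GET-only endpoint may not need.

```javascript
// Sketch of a CORS middleware in the Express (req, res, next) style.
// The allowed methods and headers listed here are assumptions; adjust as needed.
function allowCors(req, res, next) {
  res.setHeader("Access-Control-Allow-Origin", "*"); // any origin may read the response
  res.setHeader("Access-Control-Allow-Methods", "GET, OPTIONS");
  res.setHeader("Access-Control-Allow-Headers", "Content-Type");
  if (req.method === "OPTIONS") {
    // Preflight request: answer with headers only and stop here.
    res.statusCode = 204;
    res.end();
    return;
  }
  next();
}

// Wiring, assuming an Express app object:
// app.use(allowCors);
```

Note that "*" opens the endpoint to every origin; listing specific origins is stricter if the search results should only be embedded on known sites.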
Production Deployment in Docker
Docker deployment is my preference: it is efficient, simple, and easy to manage.
List of deployment files:
- Copy all folders except node_modules
Enter the Node.js container and start bash (used to execute node programs)
Sample for Node.js Reference
See above: Related Content
URL: https://query.carlzeng.com:3/appsearch?q=node.js
For example, it can ultimately replace Hexo’s site-wide search functionality (turn off local_search in the configuration file) with a form like this:
<form action="https://query.carlzeng.com:3/appsearch?q=" method="get" target="_blank">
CDN Deployment
The address above isn’t elegant enough. Forwarding through a CDN to the address with the port makes the public address look a bit more elegant.
For instance: https://jp.carlzeng.com/appsearch?q=adsl
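Both addresses take the keyword in the q query parameter. When a link is built in script rather than through a form, the keyword needs URL encoding; a small sketch (the helper name `buildSearchUrl` is hypothetical) lets the standard URL class handle that:

```javascript
// Sketch: build the search URL for a keyword, letting URLSearchParams do the encoding.
// The default base is the CDN address quoted in this article.
function buildSearchUrl(keyword, base = "https://jp.carlzeng.com/appsearch") {
  const url = new URL(base);
  url.searchParams.set("q", keyword); // replaces any existing q value
  return url.toString();
}
```

For example, `buildSearchUrl("adsl")` yields the CDN address shown above, and keywords with spaces or non-ASCII characters are encoded automatically.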
Version Updates - Release Notes
V0.0.3
- Updated template, style.css updated
- Added dark mode
- Removed hidden or encrypted articles from search results
V0.0.2
- Updated template, added [Home] and link
- More precise content excerpt, displaying the specific content scope that first matches the keyword
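Two of the changes listed above, hiding protected articles and showing the excerpt around the first keyword match, can be sketched roughly as follows. The field names `hidden` and `password` and both helper names are assumptions for illustration; the real front-matter fields and excerpt length may differ.

```javascript
// Sketch of two behaviours from the release notes; field names are hypothetical.

// Drop articles flagged as hidden or password-protected (assumed front-matter fields).
function visibleOnly(articles) {
  return articles.filter((a) => !a.hidden && !a.password);
}

// Return the text around the first case-insensitive match of `keyword`,
// or the start of the text when there is no match.
function excerptAround(text, keyword, radius = 60) {
  const pos = text.toLowerCase().indexOf(keyword.toLowerCase());
  if (pos < 0) return text.slice(0, radius * 2);
  const start = Math.max(0, pos - radius);
  const end = Math.min(text.length, pos + keyword.length + radius);
  // Mark truncated ends with an ellipsis.
  return (start > 0 ? "…" : "") + text.slice(start, end) + (end < text.length ? "…" : "");
}
```

Filtering before indexing keeps protected content out of the index entirely, which is simpler than filtering each result list afterwards.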
Next Steps
Need to write a trigger that syncs the .db file to the server whenever hexo g or hexo d is run, to provide the latest search data source for FlexSearch; currently, this is done manually :-)
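One possible shape for such a trigger, as an untested sketch: Hexo emits "generateAfter" and "deployAfter" events, and a script in the site's scripts/ folder can listen for them. The helper names, the use of rsync, and every path below are hypothetical placeholders.

```javascript
// Sketch of a Hexo script that runs a sync after generation or deployment.
// Paths and the remote target are hypothetical placeholders.
const { spawnSync } = require("child_process");

// Build the rsync arguments for copying one file to a remote destination.
function rsyncArgs(localFile, remoteTarget) {
  return ["-az", localFile, remoteTarget];
}

// Register `syncFn` to run each time Hexo emits "generateAfter" or "deployAfter".
function registerSync(hexoLike, syncFn) {
  hexoLike.on("generateAfter", syncFn);
  hexoLike.on("deployAfter", syncFn);
}

// Run rsync and report whether it exited successfully.
function syncWithRsync(localFile, remoteTarget) {
  const result = spawnSync("rsync", rsyncArgs(localFile, remoteTarget), { stdio: "inherit" });
  return result.status === 0;
}

// Inside a Hexo script the global `hexo` object is available:
// registerSync(hexo, () => syncWithRsync("path/to/data.db", "user@server:/path/to/app/"));
```

If a deploy also triggers a generate, the sync may run twice; with rsync the second run transfers nothing new, so that should be harmless.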
Insights
I truly love Node.js: such a concise platform, and easy to customize for practical needs. Personally, I found it very easy to get started with. It is reportedly efficient as a server, low in resource consumption, and able to handle high concurrency. I will keep following the actual performance of this app…
Related Content Cross-site Interface
Backend Node.js FlexSearch implementation for the server