Implementing High-speed Site-wide Search for Hexo with FlexSearch in Node.js

Site-wide indexing with a customized FlexSearch setup for full-content search is now online; V0.0.2 is in the testing phase.

Possible further expansion: fast, low-resource search over large texts, multiple texts, and text spread across many files at the file-system level.

Purpose

Many websites and bloggers struggle with weak site-wide search. Relying on an external search engine means waiting for it to index new articles, so content search is limited to whatever older version it has crawled. Built-in search consumes too many resources and is slow. Static HTML sites cannot use database search…

Introducing FlexSearch itself isn’t the focus here (the latest information is easy to find; see Accelerated Browsing). Getting started with Node.js was covered in my previous article: Node.js Development Environment and Application Examples.

This article shares how Node.js + FlexSearch are used on this site (Hexo) to achieve full-text search. The same service feeds the “Related Content” section (embedded via iframe) when articles are published simultaneously to Carl Notes on Blog Garden. Blog Garden prohibits adding JavaScript code within articles, so a cross-domain Ajax request to fetch related content from the CDN site failed. This was a huge pitfall that cost me around 2-3 days.

FlexSearch is particularly helpful for content searches, while Hexo’s implementation of related content is also good, using plugin algorithms to compare the relationships, keywords, weights, and more among my current 300+ articles. For details, see the section on [Related Articles and How to Increase to 10 in Hello hexo].

However, Hexo (my current version) has a rudimentary implementation of site-wide search, and I wonder if the latest version has improved this “Site-wide Search”.

hexo: 6.3.0                                                                                                
hexo-cli: 4.3.1
# My Hexo is due for an update... I'm being lazy... The new version is not all that appealing...
node: 20.10.0

The current version of Hexo downloads a search.xml file (like this search.xml) into memory through an Ajax request, then uses JavaScript to search and match locally (advantage: beautiful UI). However, my search.xml has reached 3 MB!

This approach is too heavyweight for the web. Even over the fastest CDN route, the download takes at least 350 ms, and with common CDNs it takes over 1 s. That is a poor experience for users, and it is a deficiency at the solution level.

How to Use

Utilizing the latest version of FlexSearch (as of mid-December 2023) is straightforward:

// Declaration of three application scenarios, from simple to complex: Index, Document, and Worker; start by understanding the simplest Index
const {Index, Document, Worker} = require("flexsearch");

// Specifying parameters for the Index index, which can be quite complex; start with the simplest default
const index = new Index({
  tokenize: "reverse",
  depth: 2,
  minlength: 3
});

// Add source content, such as multiple files, databases, etc.
index.add(post.title, post.title + post.source + post.content)

// Searching with a limit of 11 results
var searchRes = index.search(req.query.q, 11);
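
One detail in the snippet above: `post.title + post.source + post.content` joins the fields with no separator, so the last word of one field can fuse with the first word of the next. Here is a minimal sketch of a helper that joins them with spaces instead. It assumes the same `title`, `source`, and `content` field names; the helper name `buildIndexText` and the sample post are made up for illustration.

```javascript
// Hypothetical helper: builds the text passed to index.add(id, text).
// Assumes each post may carry title, source and content strings.
function buildIndexText(post) {
  return [post.title, post.source, post.content]
    .filter((part) => typeof part === "string" && part.length > 0)
    .join(" ");
}

// Made-up post object, only for demonstration
const samplePost = {
  title: "Hello Hexo",
  source: "_posts/hello-hexo.md",
  content: "Full-text search with FlexSearch",
};
console.log(buildIndexText(samplePost));

// With the index above, the call would then look like:
// index.add(post.title, buildIndexText(post));
```

Missing or empty fields are skipped, so a post without `source` does not add stray spaces to the indexed text.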

Related Content

Implementation/Process

For basic learning, it’s recommended to follow the process outlined here: https://expressjs.com/en/starter/installing.html (and experiment with the sample examples here: https://expressjs.com/en/starter/examples.html)

// Main program file app.js, excerpt from the file contents:
var appSearchRouter = require('./routes/appsearch');
// Here, the newly created file appsearch.js is referenced

app.use('/appsearch', appSearchRouter); // You can change the path under which appsearch is registered

// The remaining file content follows the default examples at https://expressjs.com/en/starter/examples.html
// The default generated file listing is sufficient. The documentation is clear and user-friendly.
// Then, add the searchTemplate.jade file inside the views directory, with the following content:
doctype html
html
  head
    title= title
    link(rel='stylesheet', href='/stylesheets/style.css')
  body
    .popup.search-popup
      .search-result-container
        .search-stats Found #{searchResults.length} search results
        hr
        ul.search-result-list
          each result in searchResults
            li
              a(href=result.doc.url, class="search-result-title", data-pjax-state="")
                mark.search-keyword= result.doc.title
              a(data-pjax-state="" href=result.doc.url)
                p.search-result= result.doc.subcontent

// The appsearch.js file contains the FlexSearch-related code:

const fs = require("fs");
const path = require("path");
const {Index, Document, Worker} = require("flexsearch");

// Declare the document index
const index = new Document({
  tokenize: "reverse",
  minlength: 3,
  document: {
    id: "id",
    index: [
      {
        field: "content",
        context: {
          // depth: 2,
          // resolution: 3
        },
        tokenize: "reverse"
      }],
    store: ["title", "updated", "date", "source", "content"]
  }
});

// Load posts from Hexo's database file (db.json)
var pages = JSON.parse(fs.readFileSync(path.join(__dirname, '../../blog/db.json')).toString()).models.Post;

// Add posts to the index
pages.forEach(post => {
  // index.add(post.title, post.title + post.source + post.content)
  index.add({
    id: post.title,
    title: post.title,
    updated: post.updated,
    date: post.date,
    source: post.source,
    related_posts: post.related_posts,
    content: post._content
  });
});

// Inside the route handler: search with a limit of 11 results and return the stored fields
var arrSearchRes = index.search(req.query.q, {
  limit: 11,
  enrich: true
});
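
The template above iterates over `searchResults` and reads `result.doc`, while `arrSearchRes` comes straight from the `Document` search. As far as I understand the FlexSearch 0.7.x `Document` API, an enriched search returns one group per indexed field, each with its own `result` array of `{ id, doc }` entries. The sketch below flattens that structure for the template. The input shape is an assumption based on that reading, and the helper name `flattenResults` and the sample data are made up.

```javascript
// Hypothetical helper: flattens Document.search(..., { enrich: true }) output.
// Assumed input shape: [{ field, result: [{ id, doc }, ...] }, ...]
function flattenResults(fieldResults) {
  const seen = new Set();
  const flat = [];
  for (const group of fieldResults || []) {
    for (const hit of group.result || []) {
      if (seen.has(hit.id)) continue; // same post already matched in another field
      seen.add(hit.id);
      flat.push(hit);
    }
  }
  return flat;
}

// Made-up sample in the assumed shape
const sample = [
  { field: "content", result: [{ id: "Post A", doc: { title: "Post A" } }] },
  {
    field: "title",
    result: [
      { id: "Post A", doc: { title: "Post A" } },
      { id: "Post B", doc: { title: "Post B" } },
    ],
  },
];
console.log(flattenResults(sample).map((hit) => hit.doc.title));

// In the route handler this could feed the template, for example:
// res.render('searchTemplate', { title: 'Search', searchResults: flattenResults(arrSearchRes) });
```

De-duplicating by `id` matters only when more than one field is indexed; with the single `content` field used here it is a harmless safeguard.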

Note 1

// create the index
// var FlexSearch = require("flexsearch");
// var index = FlexSearch.create();

This is the syntax for an old version of FlexSearch. After Googling extensively, I found many examples (mostly Chinese-language content) using this old syntax, which no longer works; there were countless errors, and even GPT-3.5 couldn’t explain what went wrong. This shows how important versioning is.

Note 2

Jade templates are also very useful, using an indented syntax. It’s easy to generate HTML. During production, you can easily use a conversion tool to turn existing HTML into Jade (or Pug) templates, such as: https://tool.fiaox.com/template-html-pug/.

Adding CORS Support

res.setHeader("Access-Control-Allow-Origin", "*");
res.setHeader(
  "Access-Control-Allow-Methods",
  "PUT, GET, POST, DELETE, HEAD, PATCH"
);
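
The two `setHeader` calls need a `res` object, so they have to run inside a route handler or a middleware. A minimal sketch of the middleware variant follows, using the standard Express `(req, res, next)` signature. The function name `allowCors` is made up; the header values are the ones shown above.

```javascript
// Sketch of an Express-style middleware; allowCors is a hypothetical name.
function allowCors(req, res, next) {
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.setHeader(
    "Access-Control-Allow-Methods",
    "PUT, GET, POST, DELETE, HEAD, PATCH"
  );
  next(); // hand over to the next middleware or the route itself
}

// Registered ahead of the router so every /appsearch response carries the headers:
// app.use('/appsearch', allowCors, appSearchRouter);
```

Keeping the headers in one middleware avoids repeating them in each handler. Note that `*` allows any origin; that suits a public search endpoint, but a stricter origin may be preferable elsewhere.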

Production Deployment in Docker

Docker deployment is my preference: efficient, simple, and easily managed.

List of deployment files:

  • Copy all folders except node_modules
# Enter the Node.js container and start bash (used to execute node programs)
> docker exec -it nodejs bash

# Run the node program directly:
> node /home/app/blogsearch/bin/www
# Or, navigate to the directory, then run
> cd /home/app/blogsearch/
> node ./bin/www

# [First-time installation of node app] No need to re-run npm init, but express library needs to be added
> npm install express

> node ./bin/www
# Successful:
http://192.168.6.116:3001/appsearch?q=android

How to run it continuously in the background?

# Proxy out through NPM
https://query.carlzeng.com:3/appsearch?q=android
https://query.carlzeng.com:3/appsearch?q=iptv
# Add Matomo to the template
## Simply add the JS into the template

#[Version Update] How to shut down a specific node process in a Docker container, restart, and redeploy a new version
> ps -falx | head -1; ps -falx | grep 'npm\|node'
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 173 134 20 0 3324 1536 pipe_r S+ pts/2 0:00 \_ grep npm\|node
0 0 69 8 20 0 1145980 203676 do_epo Sl+ pts/1 7:32 \_ node /home/app/blogsearch/bin/www

# Found ppid as 8, kill the process; then restart
> kill -9 8
> node /home/app/blogsearch/bin/www

Sample for Node.js Reference

See above: Related Content

URL: https://query.carlzeng.com:3/appsearch?q=node.js

For example, ultimately replacing Hexo’s site-wide search functionality (turn off local_search in the configuration file), displayed as follows:

<form action="https://query.carlzeng.com:3/appsearch?q=" method="get" target="_blank">
<div class="popup search-popup">
<div class="search-header">
<span class="search-icon">
<i class="fa fa-search"></i>
</span>
<div class="search-input-container">
<input name="q" autocomplete="off" autocapitalize="off" maxlength="28" placeholder="Search..." spellcheck="false" type="search" class="search-input">
</div>
<span class="popup-btn-close" role="button">
<i class="fa fa-times-circle"></i>
</span>
</div>
</div>
</form>

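The form limits the input to 28 characters with `maxlength="28"`, but that check runs only in the browser, and the index was built with `minlength: 3`. A server-side sketch that mirrors both limits before calling `index.search` could look like this. The helper name `normalizeQuery` and the decision to return `null` for unusable queries are assumptions for illustration.

```javascript
// Hypothetical helper mirroring the form's maxlength="28" and the index's minlength of 3.
const MAX_QUERY_LENGTH = 28;
const MIN_QUERY_LENGTH = 3;

function normalizeQuery(raw) {
  // A repeated parameter such as ?q=a&q=b does not arrive as a plain string
  if (typeof raw !== "string") return null;
  const query = raw.trim().slice(0, MAX_QUERY_LENGTH);
  return query.length >= MIN_QUERY_LENGTH ? query : null;
}

console.log(normalizeQuery("  node.js  "));
console.log(normalizeQuery("a"));

// In the route handler:
// const q = normalizeQuery(req.query.q);
// if (q === null) { /* render an empty result list */ }
```
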
CDN Deployment

The above address isn’t elegant enough because of the port number. Forwarding through a CDN address to the address with the port makes the URL look a bit more elegant.

For instance: https://jp.carlzeng.com/appsearch?q=adsl

Version Updates - Release Notes

V0.0.3

  • Updated template, style.css updated
  • Added dark mode
  • Removed hidden or encrypted articles from search results
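
One way to keep such posts out of the index is to filter them before calling `index.add`. The sketch below is only an illustration: the field names `hidden` and `password` are assumptions about how hidden and encrypted posts are marked in front matter, and `isSearchable` is a made-up name.

```javascript
// Hypothetical filter: `hidden` and `password` are assumed front-matter fields,
// not confirmed names from this site's configuration.
function isSearchable(post) {
  if (post.hidden === true) return false;
  if (typeof post.password === "string" && post.password.length > 0) return false;
  return true;
}

// Made-up posts for demonstration
const posts = [
  { title: "Public post" },
  { title: "Hidden post", hidden: true },
  { title: "Encrypted post", password: "secret" },
];
console.log(posts.filter(isSearchable).map((p) => p.title));

// Applied to the loading step:
// pages.filter(isSearchable).forEach(post => { /* index.add(...) */ });
```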

V0.0.2

  • Updated template, added [Home] and link
  • More precise content excerpt, displaying the specific content scope that first matches the keyword
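
The excerpt around the first keyword match can be produced with plain string operations. This is a minimal sketch under assumptions, not the code running on the site: the helper name `makeExcerpt`, the `radius` parameter, and the fallback to the opening text are all made up for illustration.

```javascript
// Hypothetical helper: returns the text surrounding the first match of `keyword`.
// `radius` is the number of characters kept on each side of the match.
function makeExcerpt(content, keyword, radius = 40) {
  const text = String(content || "");
  const needle = String(keyword || "");
  const pos = needle ? text.toLowerCase().indexOf(needle.toLowerCase()) : -1;
  if (pos < 0) return text.slice(0, radius * 2); // no match: fall back to the opening
  const start = Math.max(0, pos - radius);
  const end = Math.min(text.length, pos + needle.length + radius);
  const prefix = start > 0 ? "..." : "";
  const suffix = end < text.length ? "..." : "";
  return prefix + text.slice(start, end) + suffix;
}

console.log(makeExcerpt("Hexo is a static blog framework powered by Node.js", "blog", 10));
```

The result could be stored as the `subcontent` value that the template displays. The match is case-insensitive and purely literal, so it may not line up exactly with what FlexSearch's tokenizer matched.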

Next Steps

Need to write a trigger that syncs Hexo’s database file (db.json) to the server whenever hexo g or hexo d is run, so that FlexSearch always has the latest search data source; currently, this is done manually :-)
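
As a starting point for that trigger, here is a sketch of a small Node.js script that copies Hexo's database file (the db.json loaded earlier) to the place the search app reads it from. It handles a local copy only; pushing to a remote server would need scp, rsync, or similar. The script name `sync-db.js`, the helper name `syncDb`, and all paths are placeholders.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: copies the database file and returns the copied size in bytes.
function syncDb(sourceFile, targetFile) {
  fs.mkdirSync(path.dirname(targetFile), { recursive: true }); // create target folder if missing
  fs.copyFileSync(sourceFile, targetFile);
  return fs.statSync(targetFile).size;
}

// Command-line use (paths are placeholders), chained after the Hexo command:
//   hexo g && node sync-db.js ./db.json /path/to/blogsearch-data/db.json
const [sourceArg, targetArg] = process.argv.slice(2);
if (sourceArg && targetArg) {
  console.log(`copied ${syncDb(sourceArg, targetArg)} bytes`);
}
```

Chaining it after `hexo g` or `hexo d` with `&&` means the copy runs only once the Hexo command has finished successfully. The running search app would still need a restart, or a reload step, to pick up the new data.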

Insights

I truly love Node.js: such a concise platform, and one that can be customized for practical needs. Personally, I found it very easy to get started with. It’s reportedly highly efficient, low in resource consumption, and supports high concurrency. I will keep following how this app actually performs…

Backend Node.js FlexSearch implementation for the server

Inspired By

Adding full text Search via FlexSearch to a Blog