Course
Natural Language Fulltext Search
MySQL Tutorial
This SQL tutorial is structured for beginners to guide them from the foundational concepts to advanced data manipulation and querying techniques in SQL. By the end of this tutorial, you will have developed a robust understanding of SQL, equipping you with the knowledge to efficiently manage and analyze data across various database systems. This guide sets the stage for your growth into a skilled data professional, ready to tackle complex data challenges and contribute to the field of data analysis and database management.
Natural Language Fulltext Search
Before we fully get into the concept of Natural Language Full-text Search, let us try to understand the context of it. Nowadays, the keywords used for searches might not always match the results that users expect. So search engines are designed to focus on increasing search relevance to reduce the accuracy gap between search queries and search results. Thus, results are displayed in order of most relevance to the search keyword.
Similarly, in relational databases like MySQL, full-text search is a technique used to retrieve result-sets that might not perfectly match the search keyword. There are three types of search modes used with full-text search −
- Natural Language Mode
- Query Expansion Mode
- Boolean Mode
Natural Language Full-text Search
The Natural Language Full-text search performs the usual Full-text search in the IN NATURAL LANGUAGE mode. When a Full-text search is performed in this mode, the search results are displayed in the order of their relevance to the keyword (against which this search is performed). This is the default mode for the Full-text search.
Since this is a Full-text search, the FULLTEXT indexes must be applied on text-based columns (like CHAR, VARCHAR, TEXT datatype columns). The FULLTEXT index is a special type of index that is used to search for the keywords in the text values instead of trying to compare the keyword with these column values.
Syntax
Following is the basic syntax to perform the Natural Language Full-text Search −
SELECT * FROM table_name WHERE MATCH(column_name(s)) AGAINST ('keyword_name' IN NATURAL LANGUAGE MODE);
Example
Let us understand how to perform Natural Language Full-text Search on a database table in the following example.
For that, we will first create a table named ARTICLES containing the title and description of an article. The FULLTEXT index is applied on text columns article_title and descriptions as shown below −
CREATE TABLE ARTICLES ( ID INT AUTO_INCREMENT NOT NULL PRIMARY KEY, ARTICLE_TITLE VARCHAR(100), DESCRIPTION TEXT, FULLTEXT (ARTICLE_TITLE, DESCRIPTION)) ENGINE = InnoDB;
Now, let us insert details about articles, like their titles and DESCRIPTION, into this table using the following queries −
INSERT INTO ARTICLES (ARTICLE_TITLE, DESCRIPTION) VALUES ('MySQL Tutorial', 'MySQL is a relational database system that uses SQL to structure data stored'),('Java Tutorial', 'Java is an object-oriented and platform-independent programming language'),('Hadoop Tutorial', 'Hadoop is framework that is used to process large sets of data'),('Big Data Tutorial', 'Big Data refers to data that has wider variety of data sets in larger numbers'),('JDBC Tutorial', 'JDBC is a Java based technology used for database connectivity');
The table is created is as follows −
Using the Natural Language Mode in Full-text search, search for records of articles relevant to data, with the keyword 'data set'.
SELECT * FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE);
Output
Following is the output −
As we see above, among all the articles present in the table, three search results are obtained which are relevant to the term 'data set' and are arranged in the order of their relevance. But note how keyword 'data set' is not a perfect match in the 'MySQL Tutorial' article record and its still retrieved because MySQL deals with data sets as well.
Stop Words in a Search
The Natural Language Full-text Search uses tf-idf algorithm, where 'tf' refers to term frequency and 'idf' is inverse document frequency. The search refers to the frequency of a word in a single document, and the number of documents the word is present in. However, there are some words that the search usually ignores, like words having less than certain characters. InnoDB ignores words with less than 3 characters while MyISAM ignores words less than 4 characters. Such words are known as Stopwords (the, a, an, are etc.).
Example
In the following example, we are performing a simple Natural Language Full-text Search on the ARTICLES Table created above. Let us see how stop words impact the Full-text search by performing it against two keywords: 'Big Tutorial' and 'is Tutorial'.
Searching 'Big Tutorial':
Following query performs the full-text search in Natural Language Mode against 'Big Tutorial' keyword −
SELECT ARTICLE_TITLE, DESCRIPTION FROM ARTICLES WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)AGAINST ('Big Tutorial' IN NATURAL LANGUAGE MODE);
Output:
The output is obtained as −
Searching 'is Tutorial':
Following query performs the full-text search in Natural Language Mode against 'is Tutorial' keyword −
SELECT ARTICLE_TITLE, DESCRIPTION FROM Articles WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)AGAINST ('is Tutorial' IN NATURAL LANGUAGE MODE);
Output:
The output is obtained as −
As we see in the example above, since the word 'Tutorial' is present in all the records of the table, all of them are retrieved in both cases. However, the order of relevance is determined by the second word of the keyword specified.
In the first case, as the word 'Big' is present in 'Big Data Tutorial', that record is retrieved first. In the second case, the order of records in the result-set are the same as that of original table since the word 'is' is a stop word, so it is ignored.
Natural-language-Fulltext-search Using a Client Program
We can also Perform Natural-language-fulltext-search operation on a MySQL database using the client program.
Syntax
PHPNodeJSJavaPython
To perform the Natural-language-Fulltext-search through a PHP program, we need to execute the following SELECT statement using the mysqli function query() as follows −
$sql = "SELECT * FROM Articles WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)";$mysqli->query($sql);
Example
Following are the programs −
PHPNodeJSJavaPython
$dbhost = 'localhost';$dbuser = 'root';$dbpass = 'password';$dbname = 'TUTORIALS';$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbname);if ($mysqli->connect_errno) { printf("Connect failed: %s", $mysqli->connect_error); exit();}// printf('Connected successfully.');
$s = "SELECT * FROM Articles WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)";if ($r = $mysqli->query($s)) { printf("Table Records: \n"); while ($row = $r->fetch_assoc()) { printf(" ID: %d, Title: %s, Descriptions: %s", $row["id"], $row["ARTICLE_TITLE"], $row["DESCRIPTION"]); printf("\n"); }} else { printf('Failed');}$mysqli->close();
Output
The output obtained is as shown below −
Table Records:ID: 4, Title: Big Data Tutorial, Descriptions: Big Data refers to data that has wider variety of data sets in larger numbersID: 1, Title: MySQL Tutorial, Descriptions: MySQL is a relational database system that uses SQL to structure data storedID: 3, Title: Hadoop Tutorial, Descriptions: Hadoop is framework that is used to process large sets of data