Teachnique
      CourseRoadmaps
      Login

      HomeIntroductionFeaturesVersionsVariablesInstallationAdministrationPHP SyntaxNode.js SyntaxJava SyntaxPython SyntaxConnectionWorkbench

      Create DatabaseDrop DatabaseSelect DatabaseShow DatabaseCopy DatabaseDatabase ExportDatabase ImportDatabase Info

      Create UsersDrop UsersShow UsersChange PasswordGrant PrivilegesShow PrivilegesRevoke PrivilegesLock User AccountUnlock User Account

      Create TablesShow TablesAlter TablesRename TablesClone TablesTruncate TablesTemporary TablesRepair TablesDescribe TablesAdd/Delete ColumnsShow ColumnsRename ColumnsTable LockingDrop TablesDerived Tables

      QueriesConstraintsInsert QuerySelect QueryUpdate QueryDelete QueryReplace QueryInsert IgnoreInsert on Duplicate Key UpdateInsert Into Select

      Create ViewsUpdate ViewsDrop ViewsRename Views

      IndexesCreate IndexDrop IndexShow IndexesUnique IndexClustered IndexNon-Clustered Index

      Where ClauseLimit ClauseDistinct ClauseOrder By ClauseGroup By ClauseHaving ClauseAND OperatorOR OperatorLike OperatorIN OperatorANY OperatorEXISTS OperatorNOT OperatorNOT EQUAL OperatorIS NULL OperatorIS NOT NULL OperatorBetween OperatorUNION OperatorUNION vs UNION ALLMINUS OperatorINTERSECT OperatorINTERVAL Operator

      Using JoinsInner JoinLeft JoinRight JoinCross JoinFull JoinSelf JoinDelete JoinUpdate JoinUnion vs Join

      Unique KeyPrimary KeyForeign KeyComposite KeyAlternate Key

      TriggersCreate TriggerShow TriggerDrop TriggerBefore Insert TriggerAfter Insert TriggerBefore Update TriggerAfter Update TriggerBefore Delete TriggerAfter Delete Trigger

      Data TypesVARCHARBOOLEANENUMDECIMALINTFLOATBITTINYINTBLOBSET

      Regular ExpressionsRLIKE OperatorNOT LIKE OperatorNOT REGEXP Operatorregexp_instr() Functionregexp_like() Functionregexp_replace() Functionregexp_substr() Function

      Fulltext SearchNatural Language Fulltext SearchBoolean Fulltext SearchQuery Expansion Fulltext Searchngram Fulltext Parser

      Date and Time FunctionsArithmetic OperatorsNumeric FunctionsString FunctionsAggregate Functions

      NULL ValuesTransactionsUsing SequencesHandling DuplicatesSQL InjectionSubQueryCommentsCheck ConstraintsStorage EnginesExport Table into CSV FileImport CSV File into DatabaseUUIDCommon Table ExpressionsOn Delete CascadeUpsertHorizontal PartitioningVertical PartitioningCursorStored FunctionsSignalResignalCharacter SetCollationWildcardsAliasROLLUPToday DateLiteralsStored ProcedureExplainJSONStandard DeviationFind Duplicate RecordsDelete Duplicate RecordsSelect Random RecordsShow ProcesslistChange Column TypeReset Auto-IncrementCoalesce() Function

      Useful FunctionsStatements ReferenceQuick GuideUseful ResourcesDiscussion

      Feedback

      Submit request if you have any questions.

      Course
      ngram Fulltext Parser

      MySQL Tutorial

      This SQL tutorial is structured for beginners to guide them from the foundational concepts to advanced data manipulation and querying techniques in SQL. By the end of this tutorial, you will have developed a robust understanding of SQL, equipping you with the knowledge to efficiently manage and analyze data across various database systems. This guide sets the stage for your growth into a skilled data professional, ready to tackle complex data challenges and contribute to the field of data analysis and database management.

      ngram Full-Text Parser

      Usually in Full-Text searching, the built-in MySQL Full-Text parser considers the white spaces between words as delimiters. This determines where the words actually begin and end, to make the search simpler. However, this is only simple for languages that use spaces to separate words.
      Several ideographic languages like Chinese, Japanese and Korean languages do not use word delimiters. To support full-text searches in languages like these, an ngram parser is used. This parser is supported by both InnoDB and MyISAM storage engines.

      The ngram Full-Text Parser

      An ngram is a continuous sequence of 'n' characters from a given sequence of text. The ngram parser divides a sequence of text into tokens as a contiguous sequence of n characters.
      For example, consider the text 'Tutorial' and observe how it is tokenized by the ngram parser −
      n=1: 'T', 'u', 't', 'o', 'r', 'i', 'a', 'l'
      n=2: 'Tu', 'ut', 'to' 'or', 'ri', 'ia' 'al'
      n=3: 'Tut', 'uto', 'tor', 'ori', 'ria', 'ial'
      n=4: 'Tuto', 'utor', 'tori', 'oria', 'rial'
      n=5: 'Tutor', 'utori', 'toria', 'orial'
      n=6: 'Tutori', 'utoria', 'torial'
      n=7: 'Tutoria', 'utorial'
      n=8: 'Tutorial'
      
      The ngram full-text parser is a built-in server plugin. As with other built-in server plug-ins, it is automatically loaded when the server is started.

      Configuring ngram Token Size

      To change the token size, from its default size 2, use the ngram_token_size configuration option. The range of ngram values is from 1 to 10. But to increase the speed of search queries, use smallers token sizes; as smaller token sizes allow faster searches with smaller full-text search indexes.
      Because ngram_token_size is a read-only variable, you can only set its value using two options:
      Setting the --ngram_token_size in startup string:
      mysqld --ngram_token_size=1
      
      Setting ngram_token_size in configuration file 'my.cnf':
      [mysqld]
      
      ngram_token_size=1
      

      Creating FULLTEXT Index Using ngram Parser

      A FULLTEXT index can be created on columns of a table using the FULLTEXT keyword. This is used with CREATE TABLE, ALTER TABLE or CREATE INDEX SQL statements; you just have to specify 'WITH PARSER ngram'. Following is the syntax −
      CREATE TABLE table_name (
      column_name1 datatype,
      column_name2 datatype,
      column_name3 datatype,
      ...
      FULLTEXT (column_name(s)) WITH PARSER NGRAM
      ) ENGINE=INNODB CHARACTER SET UTF8mb4;
      

      Example

      In this example, we are creating a FULLTEXT index using the CREATE TABLE statement as follows −
      CREATE TABLE blog (
      ID INT AUTO_INCREMENT NOT NULL,
      TITLE VARCHAR(255),
      DESCRIPTION TEXT,
      FULLTEXT ( TITLE, DESCRIPTION ) WITH PARSER NGRAM,
      PRIMARY KEY(id)
      ) ENGINE=INNODB CHARACTER SET UTF8MB4;
      
      SET NAMES UTF8MB4;
      
      Now, insert data (in any ideographic language) into this table created −
      INSERT INTO BLOG VALUES
      (NULL, '教程', '教程是对一个概念的冗长研究'),
      (NULL, '文章', '文章是关于一个概念的基于事实的小信息');
      
      To check how the text is tokenized, execute the following statements −
      SET GLOBAL innodb_ft_aux_table = "customers/blog";
      
      SELECT * FROM
      INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
      ORDER BY doc_id, position;
      

      ngram Parser Space Handling

      Any whitespace character is eliminated in the ngram parser when parsing. For instance, consider the following TEXT with token size 2 −
      • "ab cd" is parsed to "ab", "cd"
      • "a bc" is parsed to "bc"

      ngram Parser Stop word Handling

      Apart from the whitespace character, MySQL has a stop word list consisting of various that are considered to be stopwords. If the parser encounters any word in the text present in the stopword list, the word is excluded from the index.

      ngram Parser Phrase Search

      Normal Phrase searches are converted to ngram phrase searches. For example, The search phrase "abc" is converted to "ab bc", which returns documents containing "abc" and "ab bc"; and the search phrase "abc def" is converted to "ab bc de ef", which returns documents containing "abc def" and "ab bc de ef". A document that contains "abcdef" is not returned.

      ngram Parser Term Search

      For natural language mode search, the search term is converted to a union of ngram terms. For example, the string "abc" (assuming ngram_token_size=2) is converted to "ab bc". Given two documents, one containing "ab" and the other containing "abc", the search term "ab bc" matches both documents.
      For boolean mode search, the search term is converted to an ngram phrase search. For example, the string 'abc' (assuming ngram_token_size=2) is converted to '"ab bc"'. Given two documents, one containing 'ab' and the other containing 'abc', the search phrase '"ab bc"' only matches the document containing 'abc'.

      ngram Parser Wildcard Search

      Because an ngram FULLTEXT index contains only ngrams, and does not contain information about the beginning of terms, wildcard searches may return unexpected results. The following behaviors apply to wildcard searches using ngram FULLTEXT search indexes:
      • If the prefix term of a wildcard search is shorter than ngram token size, the query returns all indexed rows that contain ngram tokens starting with the prefix term. For example, assuming ngram_token_size=2, a search on "a*" returns all rows starting with "a".
      • If the prefix term of a wildcard search is longer than ngram token size, the prefix term is converted to an ngram phrase and the wildcard operator is ignored. For example, assuming ngram_token_size=2, an "abc*" wildcard search is converted to "ab bc".

      ngram Full-Text Parser Using a Client Program

      We can also perform ngram full-text parser operation using the client program.

      Syntax

      PHPNodeJSJavaPython
      To perform the ngram fulltext parser through a PHP programe, we need to execute the "Create" statement using the mysqli function query() as follows −
      $sql = "CREATE TABLE blog (ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), DESCRIPTION TEXT, FULLTEXT ( title, DESCRIPTION ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4";
      $mysqli->query($sql);
      

      Example

      Following are the programs −
      PHPNodeJSJavaPython
      $dbhost = "localhost";
      $dbuser = "root";
      $dbpass = "password";
      $dbname = "TUTORIALS";
      $mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbname);
      if ($mysqli->connect_errno) {
      printf("Connect failed: %s
      ", $mysqli->connect_error);
      exit();
      }
      // printf('Connected successfully.');
      /*CREATE Table*/
      $sql = "CREATE TABLE blog (ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), description TEXT, FULLTEXT ( title, description ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4";
      $result = $mysqli->query($sql);
      if ($result) {
      printf("Table created successfully...!\n");
      }
      //insert data
      $q = "INSERT INTO blog (id, title, description) VALUES (NULL, '教程', '教程是对一个概念的冗长研究'), (NULL, '文章', '文章是关于一个概念的基于事实的小信息')";
      if ($res = $mysqli->query($q)) {
      printf("Data inserted successfully...!\n");
      }
      //we will use the below statement to see how the ngram tokenizes the data:
      $setglobal = "SET GLOBAL innodb_ft_aux_table = 'TUTORIALS/blog'";
      if ($mysqli->query($setglobal)) {
      echo "global innodb_ft_aux_table set...!";
      }
      $s = "SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position ";
      if ($r = $mysqli->query($s)) {
      print_r($r);
      }
      //display data (ngram parser phrase search);
      $query = "SELECT * FROM blog WHERE MATCH (title, description) AGAINST ('教程')";
      if ($r = $mysqli->query($query)) {
      printf("Table Records: \n");
      while ($row = $r->fetch_assoc()) {
      printf(
      "ID: %d, Title: %s, Descriptions: %s",
      $row["id"],
      $row["title"],
      $row["description"]
      );
      printf("\n");
      }
      } else {
      printf("Failed");
      }
      $mysqli->close();
      

      Output

      The output obtained is as shown below −
      global innodb_ft_aux_table set...!mysqli_result Object
      (
      [current_field] => 0
      [field_count] => 6
      [lengths] =>
      [num_rows] => 62
      [type] => 0
      )
      Table Records:
      ID: 1, Title: 教程, Descriptions: 教程是对一个概念的冗长研究
      ID: 3, Title: 教程, Descriptions: 教程是对一个概念的冗长研究