Full-text search and indexing of large blocks of text in MySQL (mysql text search sql)

Keywords: mysql, text, search, sql

Date: Fri, 05 Jul 2002 12:50:35 +0600
From: Nickolay Kondrashov <niq@relinfo.ru>
Newsgroups: fido7.su.dbms.sql
Subject: Full-text search and indexing of large blocks of text in MySQL

> Could you advise which data type is best for storing text in a MySQL
> database if the text is longer than 300 characters ... I will need to
> search in this text (searching for text as a substring of a string ... )

Use the TEXT type; it holds up to 64K.
By the way, MySQL has full-text search (see the FULLTEXT index).

25.2 MySQL Full-text Search
Since Version 3.23.23, MySQL has supported full-text indexing and
searching. Full-text indexes in MySQL are indexes of type FULLTEXT.
FULLTEXT indexes can be created from VARCHAR and TEXT columns at CREATE
TABLE time or added later with ALTER TABLE or CREATE INDEX. For large
datasets, adding a FULLTEXT index with ALTER TABLE (or CREATE INDEX) is
much faster than inserting rows into an empty table that already has a
FULLTEXT index.

Full-text search is performed with the MATCH function.

mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b));
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t VALUES
    -> ('MySQL has now support', 'for full-text search'),
    -> ('Full-text indexes', 'are called collections'),
    -> ('Only MyISAM tables', 'support collections'),
    -> ('Function MATCH ... AGAINST()', 'is used to do a search'),
    -> ('Full-text search in MySQL', 'implements vector space model');
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL');
+---------------------------+-------------------------------+
| a                         | b                             |
+---------------------------+-------------------------------+
| MySQL has now support     | for full-text search          |
| Full-text search in MySQL | implements vector space model |
+---------------------------+-------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT *, MATCH (a,b) AGAINST ('collections support') AS x FROM t;
+------------------------------+-------------------------------+--------+
| a                            | b                             | x      |
+------------------------------+-------------------------------+--------+
| MySQL has now support        | for full-text search          | 0.3834 |
| Full-text indexes            | are called collections        | 0.3834 |
| Only MyISAM tables           | support collections           | 0.7668 |
| Function MATCH ... AGAINST() | is used to do a search        |      0 |
| Full-text search in MySQL    | implements vector space model |      0 |
+------------------------------+-------------------------------+--------+
5 rows in set (0.00 sec)

The function MATCH matches a natural language query AGAINST a text
collection (which is simply the set of columns covered by a FULLTEXT
index). For every row in a table it returns relevance: a similarity measure
between the text in that row (in the columns that are part of the
collection) and the query. When it is used in a WHERE clause (see the
example above), the rows returned are automatically sorted by decreasing
relevance. Relevance is a non-negative floating-point number; zero relevance
means no similarity. Relevance is computed based on the number of words in
the row, the number of unique words in that row, the total number of words
in the collection, and the number of documents (rows) that contain a
particular word.

MySQL uses a very simple parser to split text into words. A "word" is any
sequence of letters, digits, apostrophes (') and underscores (_). Any "word"
that is present in the stopword list or is just too short (3 characters or
fewer) is ignored.
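
The word-splitting rules above can be sketched in Python. This is a minimal approximation, not MySQL's parser: it assumes that "letters and digits" correspond to Python's \w class (which also covers the underscore), that matching is case-insensitive, and it uses a tiny hypothetical stopword list in place of the real one in myisam/ft_static.c. The names split_words and EXAMPLE_STOPWORDS are illustrative only.

```python
import re

# Assumption: "letters and digits" ~ Python's \w (Unicode letters, digits
# and underscore); the apostrophe is added explicitly.
WORD_RE = re.compile(r"[\w']+")

# Hypothetical stand-in for the real stopword list in myisam/ft_static.c.
EXAMPLE_STOPWORDS = frozenset({"about", "which", "would"})

MIN_WORD_LEN = 4  # words of 3 characters or fewer are ignored


def split_words(text, stopwords=EXAMPLE_STOPWORDS, min_len=MIN_WORD_LEN):
    """Return the indexable words of text, lowercased, in order of appearance."""
    words = []
    for raw in WORD_RE.findall(text):
        word = raw.lower()  # assumption: case-insensitive matching
        if len(word) < min_len or word in stopwords:
            continue  # too short or a stopword: ignored
        words.append(word)
    return words


print(split_words("Function MATCH ... AGAINST() is used to do a search"))
# prints ['function', 'match', 'against', 'used', 'search']
```

Note that "full-text" is split into "full" and "text", because the hyphen is not a word character under these rules.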

Every valid word in the collection and in the query is weighted according
to its significance in the query or collection. This way, a word that is
present in many documents will have a lower weight (and may even have a
zero weight), because it has lower semantic value in this particular
collection. Conversely, a rare word receives a higher weight. The weights
of the words are then combined to compute the relevance of the row.
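
To make the weighting idea concrete, the sketch below scores the five example rows from the INSERT above. It assumes a probabilistic inverse-document-frequency weight, log((N - n) / n), clipped at zero, where N is the number of rows and n the number of rows containing the word. This is a common textbook scheme chosen for illustration, not a claim about MySQL's exact formula, and relevance here is simply the sum of the weights of the query words a row contains, with no length normalization.

```python
import math
import re

WORD_RE = re.compile(r"[\w']+")


def words(text, min_len=4):
    """Simplified splitter: the set of lowercased words of min_len+ chars."""
    return {w.lower() for w in WORD_RE.findall(text) if len(w) >= min_len}


def word_weight(word, docs):
    """Probabilistic IDF, log((N - n) / n), clipped at zero.

    N is the number of documents, n the number that contain the word.
    Illustrative only; not claimed to be MySQL's exact formula.
    """
    n = sum(1 for d in docs if word in d)
    if n == 0:
        return 0.0
    ratio = (len(docs) - n) / n
    # ratio <= 1 means the word is in half or more of the rows: weight 0
    return math.log(ratio) if ratio > 1 else 0.0


def relevance(query, docs):
    """Per-document relevance: sum of weights of query words it contains."""
    q = words(query)
    return [sum(word_weight(w, docs) for w in q if w in d) for d in docs]


# The rows from the INSERT example, with columns a and b concatenated.
rows = [
    "MySQL has now support for full-text search",
    "Full-text indexes are called collections",
    "Only MyISAM tables support collections",
    "Function MATCH ... AGAINST() is used to do a search",
    "Full-text search in MySQL implements vector space model",
]
docs = [words(r) for r in rows]
scores = relevance("collections support", docs)
print([round(s, 4) for s in scores])
# prints [0.4055, 0.4055, 0.8109, 0, 0]
```

With these rows, the third row scores exactly twice as high as the first two, the same 2:1 ratio as 0.7668 to 0.3834 in the earlier output, although the absolute numbers differ. A word such as "search", found in 3 of the 5 rows, gets weight 0.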

Such a technique works best with large collections (in fact, it was
carefully tuned this way). For very small tables, word distribution does not
adequately reflect the words' semantic value, and this model may sometimes
produce bizarre results.

For example, a search for the word "search" will produce no results in the
example above. The word "search" is present in more than half of the rows
(three of the five), and as such is effectively treated as a stopword (that
is, with a semantic value of zero). This is really the desired behavior: a
natural language query should not return every other row from a 1 GB table.

A word that matches half of the rows in a table is less likely to locate
relevant documents. In fact, it will most likely find plenty of irrelevant
documents. We all know this happens far too often when we are trying to find
something on the Internet with a search engine. It is with this reasoning
that such words have been assigned a low semantic value in a particular
dataset.
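
The 50% rule can be checked on the same example table. The sketch assumes a simplified splitter (lowercase, at least 4 characters, no stopwords) and treats any word present in more than half of the rows as ignored in a natural-language query; the function names over_half_words and natural_match are hypothetical.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[\w']+")


def doc_words(text, min_len=4):
    """Simplified splitter: set of lowercased words of min_len+ characters."""
    return {w.lower() for w in WORD_RE.findall(text) if len(w) >= min_len}


def over_half_words(rows):
    """Words that occur in more than half of the rows."""
    df = Counter()
    for row in rows:
        df.update(doc_words(row))  # count each word once per row
    return sorted(w for w, n in df.items() if n > len(rows) / 2)


def natural_match(query, rows):
    """Rows containing at least one query word below the 50% threshold."""
    usable = doc_words(query) - set(over_half_words(rows))
    return [r for r in rows if usable & doc_words(r)]


# The rows from the INSERT example, with columns a and b concatenated.
rows = [
    "MySQL has now support for full-text search",
    "Full-text indexes are called collections",
    "Only MyISAM tables support collections",
    "Function MATCH ... AGAINST() is used to do a search",
    "Full-text search in MySQL implements vector space model",
]
print(over_half_words(rows))          # prints ['full', 'search', 'text']
print(natural_match("search", rows))  # prints []
```

A query for "MySQL" still returns the first and last rows, as in the SELECT output earlier, because "mysql" appears in only two of the five rows.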

25.2.1 Fine-tuning MySQL Full-text Search
Unfortunately, full-text search has no user-tunable parameters yet, although
adding some is very high on the TODO list. However, if you have a MySQL
source distribution (see section 4.7, Installing a MySQL Source
Distribution), you can somewhat alter the full-text search behavior.

Note that full-text search was carefully tuned for the best search
effectiveness. Modifying the default behavior will, in most cases, only make
the search results worse. Do not alter the MySQL sources unless you know
what you are doing!

a. The minimum length of words to be indexed is defined in
   myisam/ftdefs.h by the line:

   #define MIN_WORD_LEN 4

   Change it to the value you prefer, recompile MySQL, and rebuild your
   FULLTEXT indexes.
b. The stopword list is defined in myisam/ft_static.c. Modify it to your
   taste, recompile MySQL, and rebuild your FULLTEXT indexes.
c. The 50% threshold is caused by the particular weighting scheme chosen.
   To disable it, change the following line in myisam/ftdefs.h:

   #define GWS_IN_USE GWS_PROB

   to

   #define GWS_IN_USE GWS_FREQ

   and recompile MySQL. There is no need to rebuild the indexes in this case.

25.2.2 New Features of Full-text Search to Appear in MySQL 4.0
This section lists the full-text features that are already implemented in
the 4.0 tree. It expands on the "More functions for full-text search" entry
of section H.1, Things that should be in 4.0.

a. REPAIR TABLE, ALTER TABLE, and OPTIMIZE TABLE on tables with FULLTEXT
   indexes are now up to 100 times faster.
b. MATCH ... AGAINST now supports the following boolean operators:
   i. +word means that the word must be present in every row returned.
   ii. -word means that the word must not be present in any row returned.
   iii. < and > can be used to decrease and increase a word's weight in
        the query.
   iv. ~ can be used to assign a negative weight to a noise word.
   v. * is a truncation operator.
   Boolean search uses a simpler way of calculating relevance, one that
   does not have a 50% threshold.
c. Searches are now up to 2 times faster due to an optimized search
   algorithm.
d. The utility program ft_dump was added for low-level FULLTEXT index
   operations (querying/dumping/statistics).
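
The matching rules of the +, - and * operators can be sketched as below. In MySQL 4.0.1 and later, boolean searches are requested with MATCH ... AGAINST ('...' IN BOOLEAN MODE); this Python only models which rows match, not how they are ranked. Treating <, > and ~ as plain optional terms, and requiring at least one plain term when a query has no +terms, are simplifying assumptions of this sketch, and all function names are hypothetical.

```python
import re

WORD_RE = re.compile(r"[\w']+")


def row_words(text):
    """Lowercased words of a row (no length or stopword filtering here)."""
    return {w.lower() for w in WORD_RE.findall(text)}


def parse_boolean(query):
    """Split a query into (required, excluded, optional) terms.

    <, > and ~ change ranking, not matching, so they are just stripped
    and the term is treated as optional.
    """
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token.lstrip("<>~"))
    return required, excluded, optional


def term_present(term, words):
    """Exact word match, or prefix match for a trailing * (truncation)."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words)
    return term in words


def boolean_match(query, text):
    required, excluded, optional = parse_boolean(query)
    words = row_words(text)
    if any(term_present(t, words) for t in excluded):
        return False  # -word: must not appear
    if required:
        return all(term_present(t, words) for t in required)  # +word
    # Assumption: with no +terms, at least one plain term must appear.
    return any(term_present(t, words) for t in optional)


print(boolean_match("+mysql -vector", "MySQL has now support for full-text search"))
# prints True
```

Because no weights are computed, this sketch also has no 50% threshold, in line with the note on boolean search above.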

25.2.3 Full-text Search TODO
a. Make all operations with FULLTEXT indexes faster.
b. Support for parentheses () in boolean full-text search.
c. Support for "always-index words". They could be any strings the user
   wants to treat as words; examples are "C++", "AS/400", "TCP/IP", etc.
d. Support for full-text search in MERGE tables.
e. Support for multi-byte character sets.
f. Make the stopword list depend on the language of the data.
g. Stemming (dependent on the language of the data, of course).
h. A generic user-suppliable UDF (?) preparser.
i. Make the model more flexible (by adding some adjustable parameters to
   FULLTEXT in CREATE/ALTER TABLE).

25.3 MySQL Test Suite
Until recently, our main full-coverage test suite was based on proprietary
customer data and for that reason has not been publicly available. The only
publicly available part of our testing process consisted of the crash-me
test, a Perl DBI/DBD benchmark found in the sql-bench directory, and
miscellaneous tests located in the tests directory. The lack of a
standardized, publicly available test suite has made it difficult for our
users, as well as developers, to do regression tests on the MySQL code. To
address this problem, we have created a new test system that is included in
the source and binary distributions starting with Version 3.23.29.

The test system consists of a test language interpreter (mysqltest), a
shell script to run all tests (mysql-test-run), the actual test cases
written in a special test language, and their expected results. To run the
test suite on your system after a build, type mysql-test/mysql-test-run from
the source root. If you have installed a binary distribution, cd to the
install root (e.g. /usr/local/mysql) and run scripts/mysql-test-run. All
tests should succeed. If they do not, use mysqlbug to send a bug report to
bugs@lists.mysql.com. Make sure to include the output of mysql-test-run, as
well as the contents of all .reject files in the mysql-test/r directory.

If you have a copy of mysqld running on the machine where you want to run
the test suite, you do not have to stop it, as long as it is not using ports
9306 and 9307. If one of those ports is taken, edit mysql-test-run and
change the master and/or slave port to one that is available.

The current set of test cases is far from comprehensive, as we have not yet
converted all of our private tests to the new format. However, it should
already catch most obvious bugs in the SQL processing code, OS/library
issues, and is quite thorough in testing replication. Our eventual goal is
to have the tests cover 100% of the code. We welcome contributions to our
test suite. You may especially want to contribute tests that examine the
functionality critical to your system, as this will ensure that all future
MySQL releases will work well with your applications.

You can use the mysqltest language to write your own test cases.
Unfortunately, we have not yet written full documentation for it; we plan
to do this shortly. You can, however, look at our current test cases and use
them as examples. The following points should help you get started:

a. The tests are located in mysql-test/t/*.test.
b. You can run an individual test case with mysql-test/mysql-test-run
   test_name, where test_name is the test file name without the .test
   extension.
c. A test case consists of ;-terminated statements and is similar to the
   input of the mysql command-line client. By default, a statement is a
   query to be sent to the MySQL server, unless it is recognized as an
   internal command (e.g. sleep).
d. All queries that produce results, e.g. SELECT, SHOW, EXPLAIN, etc.,
   must be preceded by @/path/to/result/file. The file must contain the
   expected results. An easy way to generate the result file is to run
   mysqltest -r < t/test-case-name.test from the mysql-test directory and
   then, if needed, edit the generated result files to adjust them to the
   expected output. In that case, be very careful not to add or delete any
   invisible characters: only change the text and/or delete lines. If you
   have to insert a line, make sure the fields are separated with a hard
   tab and that there is a hard tab at the end. You may want to use od -c
   to make sure your text editor has not messed anything up during editing.
   We, of course, hope that you will never have to edit the output of
   mysqltest -r, as you only have to do so when you find a bug.
e. To be consistent with our setup, put your result files in the
   mysql-test/r directory and name them test_name.result. If the test
   produces more than one result, use test_name.a.result,
   test_name.b.result, etc.
f. Failed test results are put in a file with the same base name as the
   result file and the .reject extension. If your test case is failing,
   diff the two files. If you cannot see how they differ, examine both
   with od -c and also check their lengths.
g. You can prefix a query with ! if the test can continue after that
   query returns an error.
h. If you are writing a replication test case, put
   source include/master-slave.inc; on the first line of the test file. To
   switch between master and slave, use connection master; and
   connection slave;. If you need to do something on an alternate
   connection, use connection master1; for the master and
   connection slave1; for the slave.
i. If you need to do something in a loop, you can use something like
   this:

   let $1=1000;
   while ($1)
   {
     # do your queries here
     dec $1;
   }

j. To sleep between queries, use the sleep command. It supports fractions
   of a second, so you can write sleep 1.3;, for example, to sleep 1.3
   seconds.
k. To run the slave with additional options for your test case, put them
   in command-line format in mysql-test/t/test_name-slave.opt. For the
   master, put them in mysql-test/t/test_name-master.opt.
l. If you have a question about the test suite, or have a test case to
   contribute, e-mail internals@lists.mysql.com. As the list does not
   accept attachments, FTP all the relevant files to:
   ftp://support.mysql.com/pub/mysql/Incoming
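
When a .reject file looks identical to its .result file, the invisible differences that items d and f warn about (tabs, trailing spaces, carriage returns) can also be located with a short script instead of od -c. This is a hypothetical helper, not part of the MySQL test suite: it reads both files as bytes, so nothing is normalized, and reports each differing line with repr(), which shows such characters as escapes.

```python
from itertools import zip_longest
from pathlib import Path


def compare_results(result_path, reject_path):
    """Return (lineno, expected, actual) for each differing line.

    Files are read as bytes so tabs, trailing spaces and carriage
    returns are preserved; a missing line on either side is None.
    """
    expected = Path(result_path).read_bytes().splitlines(keepends=True)
    actual = Path(reject_path).read_bytes().splitlines(keepends=True)
    pairs = zip_longest(expected, actual)
    return [(n, e, a) for n, (e, a) in enumerate(pairs, start=1) if e != a]


def report(result_path, reject_path):
    """Print both file sizes and every differing line, shown with repr()."""
    for path in (result_path, reject_path):
        print(f"{path}: {Path(path).stat().st_size} bytes")
    diffs = compare_results(result_path, reject_path)
    for lineno, expected, actual in diffs:
        print(f"line {lineno}: expected {expected!r}, got {actual!r}")
    return len(diffs)
```

For example, report("mysql-test/r/test_name.result", "mysql-test/r/test_name.reject") would print a line such as line 2: expected b'1\t2\t\n', got b'1\t2 \n' when a hard tab was replaced by a space.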
Added by Ukraine Vova on 08.05.2012.