Commit a7ed7f39 authored by Indrek Jentson

Initial set of data

parent eaedaa0d
# File set for the Estonian Ispell dictionary
## Dictionary files
Overview of the project and its licenses: http://www.meso.ee/~jjpp/speller
Original files:
* http://www.meso.ee/~jjpp/speller/estonian.aff
* http://www.meso.ee/~jjpp/speller/estonian.dict
The original files are encoded in ISO-8859-13. PostgreSQL reads text search configuration files as UTF-8, so the files in this set were converted with the following commands:
* `iconv -f ISO-8859-13 -t UTF-8 -o et_ee.affix estonian.aff`
* `iconv -f ISO-8859-13 -t UTF-8 -o et_ee.dict estonian.dict`
## Stopwords
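Original file:
* https://raw.githubusercontent.com/kristel-/estonian-stopwords/master/estonian-stopwords-lemmas.txt

Because the dictionary below is created with `Stopwords = estonian`, PostgreSQL reads the stop words from `estonian.stop` in `$SHAREDIR/tsearch_data/`, so the list must be available under that name.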
## Using in PostgreSQL full-text search
1. Copy the files into `$SHAREDIR/tsearch_data/`, where `$SHAREDIR` is the output of `pg_config --sharedir`.
2. Duplicate the built-in English configuration:
```sql
CREATE TEXT SEARCH CONFIGURATION public.pg ( COPY = pg_catalog.english );
```
3. Create the Ispell dictionary in the database, for example using psql:
```sql
CREATE TEXT SEARCH DICTIONARY estonian_ispell (
    TEMPLATE = ispell,
    DictFile = et_ee,
    AffFile = et_ee,
    Stopwords = estonian
);
```
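Before wiring the dictionary into a configuration, it can be checked on its own with PostgreSQL's built-in `ts_lexize` function. It returns an array of lexemes for a word the dictionary recognises, an empty array for a recognised word that is a stop word, and NULL for a word the dictionary does not know. The sample words are taken from the test sentence in step 7; the actual results depend on the dictionary files, so no output is shown here.

```sql
-- Normalise single inflected words with the Ispell dictionary alone.
-- NULL means the word is unknown to estonian_ispell.
SELECT ts_lexize('estonian_ispell', 'poliitikuid');
SELECT ts_lexize('estonian_ispell', 'audiopõnevikus');
```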
4. Map the word token types of configuration `pg` to the new dictionary:
```sql
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH estonian_ispell;
```
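With this mapping, a word that `estonian_ispell` does not recognise is not passed to any other dictionary, so it is neither indexed nor searchable. If that is not wanted, one option (a variant that is not part of this file set) is to list the built-in `simple` dictionary as a fallback. Dictionaries in a mapping are consulted from left to right, and `simple` lower-cases and accepts any token that reaches it.

```sql
-- Optional variant: words unknown to estonian_ispell fall through to the
-- built-in simple dictionary instead of being dropped.
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH estonian_ispell, simple;
```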
5. Drop the mappings for token types that the built-in configuration handles but that we choose not to index or search:
```sql
ALTER TEXT SEARCH CONFIGURATION pg
    DROP MAPPING FOR email, url, url_path, sfloat, float;
```
6. Make the new configuration, which was created in the `public` schema, the default for the current session:
```sql
SET default_text_search_config = 'public.pg';
```
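`SET` only lasts for the current session. To make the configuration the default for every new session in one database, `ALTER DATABASE ... SET` can be used instead; `mydb` below is a hypothetical placeholder for the real database name.

```sql
-- Persist the default text search configuration for new sessions in one database.
-- "mydb" is a hypothetical name; replace it with the actual database.
ALTER DATABASE mydb SET default_text_search_config = 'public.pg';
```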
7. Test the setup:
```sql
SELECT * FROM ts_debug('public.pg', 'Tuliuues audiopõnevikus astuvad üles rahast pungil kilekott ja mitu põlvkonda armastatud Eesti poliitikuid.');
```
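Once `ts_debug` shows the expected lexemes, the configuration can be used like a built-in one. The sketch below assumes a hypothetical table `articles` with a `body` text column. It uses the two-argument forms of `to_tsvector` and `plainto_tsquery`, which name the configuration explicitly; the expression index needs that form, because the one-argument form depends on `default_text_search_config` and is not allowed in an index. Whether the inflected query word matches depends on the dictionary mapping both forms to the same lemma.

```sql
-- Hypothetical example table; not part of this file set.
CREATE TABLE articles (
    id   bigserial PRIMARY KEY,
    body text NOT NULL
);

-- GIN index over the lexemes produced by configuration public.pg.
CREATE INDEX articles_body_fts_idx
    ON articles USING GIN (to_tsvector('public.pg', body));

INSERT INTO articles (body) VALUES
    ('Tuliuues audiopõnevikus astuvad üles rahast pungil kilekott ja mitu põlvkonda armastatud Eesti poliitikuid.');

-- Query text is normalised with the same configuration as the documents.
SELECT id, ts_headline('public.pg', body, query)
FROM articles, plainto_tsquery('public.pg', 'poliitikud') AS query
WHERE to_tsvector('public.pg', body) @@ query;
```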
To remove the dictionary again, drop it with `CASCADE`; this also drops the text search configurations that use it, such as `pg`:
```sql
DROP TEXT SEARCH DICTIONARY estonian_ispell CASCADE;
```