Commit c66fd239 authored by Aleksandr Tkachenko

Updated ner documentation

parent ab27bbda
@@ -55,6 +55,74 @@ The `Text` instance provides a number of useful methods to get more information
Advanced NER
============
--------------
Tagging scheme
--------------
The default models are trained to recognize names of people, organizations and locations, tagged respectively as PER, ORG and LOC. Named entity tags are encoded using the widely accepted BIO annotation scheme, where each label is prefixed with B- or I-, or the entire label is given as O. B- marks the first token of an entity, I- marks a token inside (continuing) an entity, and O marks a token outside any entity.
Tokens with named entity labels::
pprint(list(zip(text.word_texts, text.labels)))
[('Eesti', 'B-LOC'),
('Vabariik', 'I-LOC'),
('on', 'O'),
('riik', 'O'),
('Põhja', 'B-ORG'),
('-', 'O'),
('Euroopas', 'B-LOC'),
('.', 'O'),
('Eesti', 'B-LOC'),
('piirneb', 'O'),
('põhjas', 'O'),
('üle', 'O'),
('Soome', 'B-LOC'),
('lahe', 'I-LOC'),
('Soome', 'B-LOC'),
('Vabariigiga', 'I-LOC'),
('.', 'O'),
('Riigikogu', 'B-ORG'),
('on', 'O'),
('Eesti', 'B-LOC'),
('Vabariigi', 'I-LOC'),
('parlament', 'O'),
('.', 'O'),
('Riigikogule', 'B-ORG'),
('kuulub', 'O'),
('Eestis', 'B-LOC'),
('seadusandlik', 'O'),
('võim', 'O'),
('.', 'O'),
('2005', 'O'),
('.', 'O'),
('aastal', 'O'),
('sai', 'O'),
('peaministriks', 'O'),
('Andrus', 'B-PER'),
('Ansip', 'I-PER'),
(',', 'O'),
('kes', 'O'),
('püsis', 'O'),
('sellel', 'O'),
('kohal', 'O'),
('2014', 'O'),
('.', 'O'),
('aastani', 'O'),
('.', 'O'),
('2006', 'O'),
('.', 'O'),
('aastal', 'O'),
('valiti', 'O'),
('presidendiks', 'O'),
('Toomas', 'B-PER'),
('Hendrik', 'I-PER'),
('Ilves', 'I-PER'),
('.', 'O')]
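To turn the per-token BIO labels shown above into whole entities, consecutive ``B-``/``I-`` tokens of the same type can be merged. The following is a minimal sketch in plain Python, not part of the Estnltk API: the helper name ``bio_to_spans`` is hypothetical, and it assumes the input is a list of ``(word, label)`` pairs like the one printed above. An ``I-`` label that does not continue an entity of the same type is leniently treated as the start of a new entity.

```python
def bio_to_spans(tagged_tokens):
    """Group (word, label) pairs in the BIO scheme into (entity_text, entity_type) tuples.

    Hypothetical helper for illustration; not an Estnltk function.
    """
    spans = []
    current_words, current_type = [], None
    for word, label in tagged_tokens:
        # 'B-LOC' -> ('B', '-', 'LOC'); 'O' -> ('O', '', '')
        prefix, _, entity_type = label.partition('-')
        if prefix == 'I' and current_type == entity_type:
            # Token continues the entity that is currently open.
            current_words.append(word)
            continue
        if current_words:
            # Any other label closes the open entity.
            spans.append((' '.join(current_words), current_type))
            current_words, current_type = [], None
        if prefix in ('B', 'I'):
            # Start a new entity (a stray I- is treated like B-).
            current_words, current_type = [word], entity_type
    if current_words:
        spans.append((' '.join(current_words), current_type))
    return spans


sample = [('Eesti', 'B-LOC'), ('Vabariik', 'I-LOC'), ('on', 'O'),
          ('riik', 'O'), ('Andrus', 'B-PER'), ('Ansip', 'I-PER')]
print(bio_to_spans(sample))
# [('Eesti Vabariik', 'LOC'), ('Andrus Ansip', 'PER')]
```

Joining words with a single space is a simplification; it ignores the original spacing around punctuation.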
The default models that ship with Estnltk are good enough for basic tasks.
For more demanding tasks, however, training a custom NER model is usually necessary to reach better accuracy.
......