korp / korpora
Commit a419e606, authored Oct 21, 2020 by Neeme Kahusk
register ud
parent 69b8f006
Changes: 1
vrt/register-ud.sh (new file, mode 0 → 100755)
#!/bin/bash
# Execute scripts needed to register corpora files for cwb
# check for existence of file
if [ "$#" -ne 1 ] || ! [ -f "$1" ]; then
  echo "Usage: $0 vrt file" >&2
  exit 1
fi
FILENAME=$1
FN=`echo $FILENAME |sed 's/[.]vrt//g'`
FNUPPER=`echo $FN |tr '[:lower:]' '[:upper:]'`
CWBVERSION="cwb-3.4.26"
# CWBVERSION="cwb-3.4.11"
# 0. Clean old data
rm -rf /corpora/data/$FN
rm -rf /corpora/registry/$FN
mkdir /corpora/data/$FN
# 1. Convert vrt to cwb format
#/usr/local/$CWBVERSION/bin/cwb-encode -s -p - -d /corpora/data/$FN -R /corpora/registry/$FN -c utf8 -f /corpora/$FILENAME -P word -P baseform -P analysis -P pos -P number -P cases -P voice -P tense -P nominal -P mood -P person -P negation -S sentence:0+n -S para:0+n -S text:0+author+date
/usr/local/$CWBVERSION/bin/cwb-encode -s -p - -d /corpora/data/$FN -R /corpora/registry/$FN -c utf8 -f /corpora/$FILENAME -P word -P baseform -P pos -P msd -P ref -P dephead -P deprel -S sentence:0+id -S text:0
# 2. Register corpus
/usr/local/$CWBVERSION/bin/cwb-makeall -r /corpora/registry -V $FNUPPER
# 3. Insert info about corpus
echo -e "Sentences: $(cat $FILENAME |grep '<sentence' |wc -l)\nUpdated: $(date -I)\n" > data/$FN/.info
# # TEST
# echo $FILENAME
# echo $FN
# echo $FNUPPER
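
The script derives three values from its single argument: the corpus id (FN), its upper-case form (FNUPPER), and the sentence count written to .info. The sketch below is not part of the commit; it shows one possible way to compute the same values with quoted variables so that file names containing spaces do not break. The function names corpus_id, corpus_id_upper and count_sentences are hypothetical. Note one deliberate difference: the `${filename%.vrt}` expansion strips only a trailing ".vrt", whereas the sed call in the script removes every occurrence of ".vrt" in the name.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the commit; they mirror the values
# register-ud.sh derives from its argument.

# Corpus id: the file name with a trailing ".vrt" removed.
# (The script's sed call removes every ".vrt" occurrence instead.)
corpus_id() {
  local filename=$1
  printf '%s\n' "${filename%.vrt}"
}

# Upper-case form of the corpus id, as passed to cwb-makeall in the script.
corpus_id_upper() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Number of lines containing "<sentence", equivalent to
# cat FILE | grep '<sentence' | wc -l in the script.
count_sentences() {
  grep -c -- '<sentence' "$1"
}
```

With an illustrative file name such as estonian_ud.vrt, `corpus_id` would yield estonian_ud and `corpus_id_upper` would turn that into ESTONIAN_UD. `count_sentences` counts matching lines rather than tags, just as the original pipeline does, so two opening tags on one line are counted once.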