Commit f767dfdf authored by Indrek Jentson

Set of scripts for template

parent e7f0a233
.project
.pydevproject
.settings
var/.last_change_description.txt
# <Put_here_name> experiment
This project contains a data transformation experiment of ...
Transformation includes
* <step 1>
* <step 2>
* <etc>
## Source
* Text corpus is ...
## Tools
* Perl
* Parallel
* Bash
* Python 2.7
* transform.sh - Bash script for running transformation.
* validate.sh - Bash script for running validation of output.
## Conf
* setup.sh - Bash script for installing tools.
* setenv.sh - Bash script for initiating environment variables for other scripts.
* version - File with a given version number in it.
## Var
* ...
* The first version of files in _var_ is tagged as "v1.0.0".
## Output
* ...
# Experiment template
This project contains an example of a data transformation experiment.
The main principles are as follows:
* The experiment includes a source dataset in directory _source_.
* The experiment includes the transformation and validation tools in directory _tools_.
......@@ -8,15 +47,22 @@ This project contains an example of a data transformation experiment. The main p
* Directories _source_, _tools_ and _conf_ must remain unchanged during the transformations.
* The experiment includes the transformation parameters in directory _var_.
* All files are under version control.
* After each commit made by the user, the experiment environment must run the transformation process
and save the output in directory _output_.
Description of the workflow:
* Active directory must be _tools_.
* A user can start a new experiment with data from any previous experiment
(script: *startchange.sh* [from_tag]; if 'from_tag' is given and is not 'HEAD', a new branch will be created).
* The user changes the transformation parameters (in directory _var_) by any means necessary.
* After a change in the transformation parameters, the experiment environment must run the transformation process
and save the output in directory _output_ (script: *transform.sh*).
* If the transformation process produces log files, they must be saved in directory _log_.
* When the results are produced, the environment must run the validation process
and save the output in directory _result_.
* All output, log and result files must be committed under the same branch where the previous commit was done
and tagged with the experiment number.
* The validation process must compare current files with files in the previous experiment and in the root experiment.
* A user can start a new experiment with data from any previous experiment.
* After the results are produced, the environment must run the validation process and save a report in directory _result_
(script: *validate.sh* [previous_tag (current_tag|LOCAL)]; LOCAL means that the current files are compared against the 'previous' set and are not committed yet).
* The validation process compares current files with files in the previous experiment. The comparison can also be done between any tagged stages in git.
* A user can review the report (file: result/diff.html) and either resume changing the transformation parameters or finish the change (see next step).
* All changes in var, output, log and result files must then be committed and tagged with a new version number
(script: *stopchange.sh* [new_tag]; if 'new_tag' is missing then value will be calculated from previous version number).
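The workflow above can be sketched as a dry run that only prints the commands of one experiment cycle. The function name `experiment_cycle` and the example values are hypothetical. The tag format accepted by *validate.sh* is also an assumption, since that script is not shown here.

```bash
#!/bin/bash
# Dry-run sketch: prints the commands of one experiment cycle, runs nothing.
# experiment_cycle and its arguments are hypothetical names for illustration.
experiment_cycle() {
    local from_version=$1 previous_tag=$2 new_version=$3
    echo "cd tools"
    echo "./startchange.sh $from_version"
    echo "# edit the transformation parameters in ../var"
    echo "./transform.sh"
    echo "./validate.sh $previous_tag LOCAL"
    echo "# review ../result/diff.html, then repeat or finish"
    echo "./stopchange.sh $new_version"
}

experiment_cycle 1.0.0 v1.0.0 1.0.1
```

Printing the commands instead of running them keeps the sketch free of side effects on the git repository.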
For a totally new experiment a user must:
......@@ -27,5 +73,5 @@ For a totally new experiment an user must:
* prepare the files with transformation parameters (usually transformation rules);
* start an experiment environment with the created project.
NB! In the first stage of development we assume that experiments are running under Linux (Debian 8).
NB! In the first stage of development we assume that experiments are running under Linux (Debian 8) and that the user has sudo rights.
#!/bin/sh
# Define here all needed environment variables and use it for other shell scripts
export EXP_HOME=~/Wrk/experiment-template
export SRC_PATH=$EXP_HOME/source
export OUT_PATH=$EXP_HOME/output
export VAR_PATH=$EXP_HOME/var
export LOG_PATH=$EXP_HOME/log
export TRANS_PATH=$EXP_HOME/tools/transform
export VALID_PATH=$EXP_HOME/tools/validate
export RES_PATH=$EXP_HOME/result
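Scripts that source setenv.sh rely on the exported directories being present. A minimal sketch of a check that could run right after sourcing it is shown below. The helper `check_dirs` is hypothetical and not part of this commit.

```bash
#!/bin/bash
# Hypothetical helper: report every argument that is not an existing directory.
# Returns 0 when all directories exist, 1 otherwise.
check_dirs() {
    local missing=0 dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "Missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration with a directory that is known to exist.
if check_dirs "$PWD"; then
    echo "all directories present"
fi
```

After `. ../conf/setenv.sh`, a call such as `check_dirs "$SRC_PATH" "$OUT_PATH" "$VAR_PATH" "$LOG_PATH" "$RES_PATH"` would flag a mistyped `EXP_HOME` before any transformation starts.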
#!/bin/sh
# Script for installing all needed software modules.
wget http://apertium.projectjj.com/apt/install-nightly.sh -O - | bash
apt-get install -y perl parallel python cg3
This directory is intended for specific scripts and programs for the experiment project.
This directory is intended for API scripts and programs for the experiment project:
startchange.sh
transform.sh
validate.sh
stopchange.sh
#!/bin/bash
git pull origin
if [ -z "$1" ]; then
CURRVERSION=$(<../conf/version)
else
CURRVERSION=$1
FILEVERSION=$(<../conf/version)
if [ "$CURRVERSION" != "$FILEVERSION" ]; then
TAG="v$CURRVERSION"
I=1
while true; do
BNAME="B$TAG.$I"
if [ -n "$(git branch --list "$BNAME")" ]; then
I=$((I + 1))
else
echo "Creating new branch $BNAME ..."
git checkout -b $BNAME $TAG
echo "Done!"
break
fi
done
fi
fi
echo "Change IS started, you can proceed with changes in var directory."
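The branch naming used by startchange.sh (the first unused name of the form `B<tag>.<n>`) can be illustrated without a git repository. In this sketch the existing branch names are passed in as arguments, which is an assumption made only so that the example is self-contained. The function name `next_branch_name` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the first name B<tag>.<n> (n = 1, 2, ...) that
# does not occur among the existing branch names given as extra arguments.
next_branch_name() {
    local tag=$1
    shift
    local i=1 name existing taken
    while true; do
        name="B$tag.$i"
        taken=0
        for existing in "$@"; do
            if [ "$existing" = "$name" ]; then
                taken=1
            fi
        done
        if [ "$taken" -eq 0 ]; then
            break
        fi
        i=$((i + 1))
    done
    echo "$name"
}

next_branch_name v1.0.0 Bv1.0.0.1 Bv1.0.0.2   # prints Bv1.0.0.3
```

In the real script the list of existing names comes from `git branch --list`.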
#!/bin/bash
. ../conf/setenv.sh
if [ -z "$1" ]; then
perl -pe 's/^((\d+\.)*)(\d+)(.*)$/$1.($3+1).$4/e' < ../conf/version > ../conf/version.new
while true; do
NEWVERSION=$(<../conf/version.new)
TAG="v$NEWVERSION"
if git rev-parse -q --verify "refs/tags/$TAG" >/dev/null; then
echo "$NEWVERSION" | perl -pe 's/^((\d+\.)*)(\d+)(.*)$/$1.($3+1).$4/e' > ../conf/version.new
else
break
fi
done
else
NEWVERSION=$1
echo "$NEWVERSION" > ../conf/version.new
fi
echo "Trying to set version number to $NEWVERSION ..."
TAG="v$NEWVERSION"
if git rev-parse -q --verify "refs/tags/$TAG" >/dev/null; then
echo "Error: git tag with value $TAG is already used, choose something else!"
rm ../conf/version.new
echo "Change IS NOT finished! Please provide unused version number."
else
echo "Success"
mv ../conf/version.new ../conf/version
git add ..
if [ -e $VAR_PATH/.last_change_description.txt ]; then
git commit -F $VAR_PATH/.last_change_description.txt
else
git commit -a -m "Changes in version $TAG: "
fi
git tag $TAG
echo "Change IS finished."
fi
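The version increment that stopchange.sh performs with a Perl one-liner (add one to the last numeric component, keep any suffix) can also be sketched in plain Bash. The function name `bump_version` is hypothetical. The regular expression mirrors the one in the script.

```bash
#!/bin/bash
# Hypothetical helper: increment the last dotted numeric component of a
# version string, e.g. 1.0.0 -> 1.0.1. Input that does not start with a
# number is printed unchanged, as the Perl substitution would leave it.
bump_version() {
    local version=$1
    local re='^(([0-9]+\.)*)([0-9]+)(.*)$'
    if [[ $version =~ $re ]]; then
        # 10# forces base 10 so that a component such as 08 is not read as octal.
        echo "${BASH_REMATCH[1]}$((10#${BASH_REMATCH[3]} + 1))${BASH_REMATCH[4]}"
    else
        echo "$version"
    fi
}

bump_version 1.0.9   # prints 1.0.10
```

The script repeats this step until the resulting tag `v<version>` is not yet present in git.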
#!/bin/sh
./transform/synpar.sh
This directory is intended for transformation scripts and programs for the experiment project.
# Sentence.pm
package Sentence;
use strict;
use Word;
# constructor
my $RecNo=0;
sub new {
my $class = shift;
my $self = {
NO => undef,
WORDS => [],
};
bless($self,$class);
return $self;
}
sub DESTROY {
$RecNo=0;
my $self= {};
}
sub empty {
$RecNo=0;
my $self=shift;
#print @{$self->{WORDS}};
#foreach $i (@{$self->{WORDS}}) {
#
#}
$self->{WORDS} = []; # clear the word list of the sentence
}
sub addform {
my ($self,$line)= @_;
my $word= Word->new(trim($line));
push @{$self->{WORDS}}, $word;
$word->addno($RecNo++);
}
sub addlemma {
my $self= shift;
my $line= shift;
#print ">>>",$line,"\n";
my $word = pop @{$self->{WORDS}};
$word->addlemma(trim($line));
push @{$self->{WORDS}}, $word;
}
sub addmorphsyn {
my $self= shift;
my $line= shift;
my $opt =shift;
my $c=shift;
my $s=shift;
#print "$line";
my $word = pop @{$self->{WORDS}};
$word->addmorphsyn(trim($line),$opt,$c,$s);
push @{$self->{WORDS}}, $word;
}
sub recalculateheads{
my $self = shift;
my $item;
my $i;
foreach $item (@{$self->{WORDS}}){
# if a token without an analysis was dropped during the conversion,
# the head numbers must be updated
# this could be optimized
foreach $i (@{$self->{WORDS}}){
if ($i->oldnumber()==$item->head()){
$item->chhead($i->number()+1);
last; # stop at the first match so that the new head is not remapped again
}
}
}
}
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
sub punctuationmarks_old{ # links self-linked tokens to the sentence root instead of to themselves
my $self = shift;
my $item;
my $head=1;
foreach $item (@{$self->{WORDS}}){
if ($item->head()==0){
$head= $item->number()+1;
last;
}
}
foreach $item (@{$self->{WORDS}}){
if ($item->head()==$item->number()+1){
$item->chhead($head);
}
}
}
sub punctuationmarks{ # link to the preceding token, except for Opr and Oquo
my $self = shift;
my $item;
foreach $item (@{$self->{WORDS}}){
if ($item->head()==$item->number()+1){ # if the link points to the token itself
$item->chhead($item->number()); # link to the preceding token
}
if ($item->feats(1,0) =~ /Opr/) {
$item->chhead($item->number()+2); # link to the following token
next;
}
if ($item->feats(1,0) =~ /Oqu/) {
$item->chhead($item->number()+2); # link to the following token
next;
}
if ($item->feats(1,0) =~ /Quo/ && $item->head()==0) {
$item->chhead($item->number()+2); # link to the following token
}
}
}
sub printCONLL{
my $self=shift;
my $p=shift;
my $c=shift;
my $item;
my $i=0;
foreach $item (@{$self->{WORDS}}){
#print $i++, "\t";
print $item->number()+1, "\t";
print $item->form(),"\t";
print $item->lemma(),"\t";
print $item->cpos(),"\t";
print $item->pos($p),"\t";
print $item->feats($p,$c),"\t";
print $item->head(),"\t";
print $item->deprel($p),"\t_\t_\n";
#print $item->morphsyn(),"\n";
}
print "\n";
}
sub dellast{
my $self = shift;
my $word = pop @{$self->{WORDS}};
$RecNo--;
}
1;
\ No newline at end of file
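The `printCONLL` method in Sentence.pm above writes one tab-separated row of ten columns per token, with the last two columns fixed to `_`. The row layout can be illustrated with a small Bash sketch. The function name `conll_row` and the sample token values are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print one row in the ten-column layout used by printCONLL.
# Arguments: id form lemma cpos pos feats head deprel
# The last two columns are always "_", as in printCONLL.
conll_row() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t_\t_\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}

conll_row 1 Tere tere I I _ 0 ROOT
```

A sentence is a run of such rows followed by one empty line, which matches the final `print "\n"` in `printCONLL`.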
#Word.pm
package Word;
use strict;
my $l="";
sub new {
my $class = shift;
if (@_) {$l=shift;}
my $self = {
NO => undef, # token position in the sentence, starting from 0
FORM => $l,
LEMMA => undef,
ENDING => "",
CPOS => "X",
POS => "X",
FEATS => undef,
OLDNO => undef, # token position according to the dependency information
HEAD => undef,
DEPREL => "xxx",
PHEAD => "_",
PDEPREL => "_",
OTHER => undef,
};
bless($self,$class);
return $self;
}
sub DESTROY {
my $self= {};
}
sub number {
my $self = shift;
return $self->{NO};
}
sub oldnumber {
my $self = shift;
if ($self->{OLDNO}) {return $self->{OLDNO};}
else {print STDERR $self->{FORM}, " has no dependency\n";}
}
sub form{
my $self= shift;
return $self-> {FORM};
}
sub lemma{
my $self= shift;
return $self->{LEMMA};
}
sub cpos{
my $self= shift;
return $self->{CPOS};
}
sub pos{
my $self= shift;
my $f = shift;
if ($f==1){
return $self->conllpos();
}
return $self->{POS};
}
sub deprel{
my $self= shift;
my $f =shift;
if ($f>0){
return $self->conlldeprel();
}
return $self->{DEPREL};
}
sub head{
my $self= shift;
return $self->{HEAD};
}
sub feats{
my $self= shift;
my $f =shift;
my $c =shift;
if ($f) {
if ($f==1){
return $self->conllfeatures($c);
}
}
return $self->{FEATS};
}
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
sub addno {
my $self= shift;
my $no= shift;
$self->{NO}=$no;
}
sub addlemma {
my $self = shift;
my $l =shift;
$self->{LEMMA}="" . $l;
}
sub addmorphsyn {
my $self = shift;
my $l =shift;
my $opt=shift;
my $c=shift;
my $s=shift;
my $feats="";
if ($l =~ /(L\S+)\s*/) {
$self->{ENDING}= $1;
}
if (($l =~ /L\S+ (\S) (.*)$/) || ($l =~ /^\s*(\S) (.*)$/)) {
$self->{CPOS} = $1;
$self->{POS} = $1;
$feats=$2;
}
if ($opt==2){ # write _ in place of an ambiguous analysis
if ($l =~ /@.*@.*/){ # _ in place of an ambiguous analysis
$self->{DEPREL}="_";
} else {
if ($l =~ /(@\S+)\s*/){
$self->{DEPREL}=$1;
} }
}
else {
if ($l =~ /(@\S+)\s*/){
$self->{DEPREL}=$1;
}
}
if ($l =~ /\#(\d+)->(\d+)\s*/){
$self->{OLDNO}= $1;
$self->{HEAD}= $2;
}elsif ($l =~ /\#(\d+)->(\?+)\s*/){
$self->{OLDNO}= $1;
$self->{HEAD}= 0;
}
else {
$self->{HEAD}=$self->{NO}+1;
}
$self->{OTHER}= "".$l;
if ($s==0){
$feats =~ s/@.*$//; # to keep the syntactic function among the features, comment out this line
}
$feats =~ s/#.*$//;
$feats =~ s/\s+/\|/g;
$feats =~ s/\|$//;
if ($feats =~ /^$/) {$feats="_";}
$self->{FEATS}=trim($feats);
}
sub morphsyn{
my $self = shift;
return $self->{OTHER};
}
sub chhead{
my $self= shift;
my $no = shift;
$self->{HEAD}=$no;
}
####################################################################################################
# Part-of-speech conversion
####################################################################################################
sub conllpos{
my $self = shift;
if ($self->{POS} =~ /A/) {return "A";}
if ($self->{POS} =~ /B/) {return "B";}
if ($self->{POS} =~ /D/) {return "D";}
if ($self->{POS} =~ /G/) {return "A";}
if ($self->{POS} =~ /S/ && $self->{FEATS}=~ /prop/) {return "H";}
if ($self->{POS} =~ /I/) {return "I";}
if ($self->{POS} =~ /J/) {
if ($self->{FEATS} =~ /crd/) {return "Jc";}
if ($self->{FEATS} =~ /sub/) {return "Js";}
return "J";
}
if ($self->{POS} =~ /K/) {
if ($self->{FEATS} =~ /pre/) {return "Ke";}
if ($self->{FEATS} =~ /post/) {return "Kt";}
return "K";
}
if ($self->{POS} =~ /N/) {
if ($self->{FEATS} =~ /card/) {return "N";}
if ($self->{FEATS} =~ /ord/) {return "A";}
return "N";
}
if ($self->{POS} =~ /P/) {
if ($self->{FEATS} =~ /pers/) {return "Ppers";}
return "P";
}
if ($self->{POS} =~ /S/ && $self->{FEATS} !~ /prop/) {return "S";}
if ($self->{POS} =~ /V/) {
if ($self->{FEATS} =~ /aux/) {return "Vaux";}
if ($self->{FEATS} =~ /mod/) {return "Vmod";}
if ($self->{FEATS} =~ /inf/) {return "Vinf";}
if ($self->{FEATS} =~ /sup/) {return "Vsup";}
return "V";
}
if ($self->{POS} =~ /X/) {return "X";}
if ($self->{POS} =~ /Y/) {return "Y";}
if ($self->{POS} =~ /Z/) {return "Z";}
return "M";
}
###################################################################################################
# Syntactic function conversion
###################################################################################################
sub conlldeprel{
my $self=shift;
if ($self->{HEAD} == 0) {
return "ROOT";
#return "_";
}
if ($self->{POS} =~ /Z/) {return "\@Punc";}
# extra dashes ("_") from here on
#if ($self->{DEPREL} =~ /\@J/) {return "_";}
#if ($self->{DEPREL} =~ /\@<KN/) {return "_";}
#if ($self->{DEPREL} =~ /\@<INFN/) {return "_";}
#if ($self->{DEPREL} =~ /\@<NN/) {return "_";}
return $self->{DEPREL};
# if ($self->{DEPREL} =~ /\@NN>/) {return "attr";}
# if ($self->{DEPREL} =~ /\@<NN/) {return "attr";}
# if ($self->{DEPREL} =~ /\@AN>/) {return "attr";}
# if ($self->{DEPREL} =~ /\@<AN/) {return "attr";}
# if ($self->{DEPREL} =~ /\@DN>/) {return "attr";}
# if ($self->{DEPREL} =~ /\@<DN/) {return "attr";}
# if ($self->{DEPREL} =~ /\@VN>/) {return "attr";}
# if ($self->{DEPREL} =~ /\@<VN/) {return "attr";}
# if ($self->{DEPREL} =~ /\@KN>/) {return "attr";}