Sunday, March 22, 2009

Apertium-stuff Digest, Vol 23, Issue 8

Send Apertium-stuff mailing list submissions to
apertium-stuff@lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
or, via email, send a message with subject or body 'help' to
apertium-stuff-request@lists.sourceforge.net

You can reach the person managing the list at
apertium-stuff-owner@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Apertium-stuff digest..."


Today's Topics:

1. New Apertium Tool Project (Aléssio Miranda)
2. Re: New Apertium Tool Project (Mikel L. Forcada)
3. Re: New Apertium Tool Project (Kevin Donnelly)
4. Re: New Apertium Tool Project (Jimmy O'Regan)
5. Re: New Apertium Tool Project (Jacob Nordfalk)


----------------------------------------------------------------------

Message: 1
Date: Sat, 21 Mar 2009 14:41:38 -0300
From: Aléssio Miranda <alessioufv@yahoo.com.br>
Subject: [Apertium-stuff] New Apertium Tool Project
To: apertium-stuff@lists.sourceforge.net
Message-ID:
<7e6dcb90903211041n64d54427yc0f92f27171a3322@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello all,

I'm Aléssio, and I have an ambitious goal: to develop a complete
environment for creating and managing language pairs in Apertium, with
support for collaborative work. It is a huge and complex project that I
want to do as my doctoral project at UFPR, Brazil.
To that end I'm trying to put together a project team of students in my
university department, and I would like the whole community's help in
developing the idea. Step by step, this project will come alive.

My Master's dissertation, which I will defend on 20 April, covers these
points:
- A study of the Apertium process for creating a language pair, from the
user's point of view;
- An overview of this larger goal, a platform for developing languages;
- A description of a prototype interface for managing monolingual and
bilingual dictionaries.

So, with this study, I want to fill the first gap.

The idea is a tool that, initially, manages the Apertium files through a
better interface.
The objective of this first version is to handle monolingual and bilingual
dictionaries completely.
With this tool, developers will not need to edit the XML files of the
monolingual and bilingual dictionaries directly.
They will open the XML files, edit them through the interface, and the XML
file will be generated automatically.

I'm a computer science student, and my knowledge of linguistics and of
creating complex Apertium language pairs is limited.
In this first phase I'm creating a non-functional prototype with Adobe
Flex that simulates the tool's interface. I would like suggestions on how
developers go about creating languages, and on the interface features they
want in order to make the process of creating a language easier.
The first idea is for monolingual and bilingual dictionaries, but people
with ideas about other processes are welcome.

For now, the prototype will be available at
www.alessiojr.com/wiklats.

Sorry for the bad English.

Regards,

--
**************************************************************************
Aléssio Miranda Junior - www.alessiojr.com
Master Student of Computer Science - Federal University of Paraná - Brazil
**************************************************************************
* MSN: msn@juninho.com.br E-mail: alessio@alessiojr.com.br
**************************************************************************

------------------------------

Message: 2
Date: Sun, 22 Mar 2009 07:48:51 +0100
From: "Mikel L. Forcada" <mlf@dlsi.ua.es>
Subject: Re: [Apertium-stuff] New Apertium Tool Project
To: apertium-stuff@lists.sourceforge.net
Cc: Aléssio Miranda <alessioufv@yahoo.com.br>
Message-ID: <200903220748.52902.mlf@dlsi.ua.es>
Content-Type: text/plain; charset="utf-8"

Dear Alessio,

Just some comments.

(1) Many have already tried to write tools to make Apertium dictionary
manipulation friendlier. They have encountered all sorts of problems, because
of the many different ways one can make entries in the defined format (it's
quite free). If you aren't familiar with _real_ Apertium dictionaries (as you
say below), you may miss some of these problems unless you work very hard in
the days to come.

(2) If you have a "project-team of students" there, that's great! They should
come forward and apply for the Apertium GSoC. Their proposals should clearly
state which part of the task they are going to do, and they should be
individually feasible, because not all will get the scholarship.

(3) A word of caution. I will vote against funding any proposal that
generates obscure code to be run by a non-free, proprietary player (Adobe
Flash Player), even if the tools used to generate it (Adobe Flex) are
(apparently) free. This is like saying that we will generate
Microsoft Word files using some free software because OpenOffice.org can open
most of them reasonably (for "Microsoft Word" read "Flash", and for
"OpenOffice.org" read "gnash" or "swfdec"). We're an
open-source project, and we should only resort to reverse-engineered solutions
when forced to. But when developing, I advocate open, standard, free ones.

I had already hinted at some of these points of view in the #apertium IRC when
we talked.

Good luck

Mikel Forcada



--
Mikel L. Forcada <mlf@dlsi.ua.es>
http://www.dlsi.ua.es/~mlf/


------------------------------

Message: 3
Date: Sun, 22 Mar 2009 12:39:04 +0000
From: Kevin Donnelly <kevin@dotmon.com>
Subject: Re: [Apertium-stuff] New Apertium Tool Project
To: apertium-stuff@lists.sourceforge.net
Message-ID: <200903221239.04350.kevin@dotmon.com>
Content-Type: text/plain; charset="utf-8"

On Saturday 21 March 2009 17:24, Aléssio Miranda wrote:
> My Master's dissertation, which I will defend on 20 April, covers these
> points:
> - A study of the Apertium process for creating a language pair, from the
> user's point of view;
> - An overview of this larger goal, a platform for developing languages;
> - A description of a prototype interface for managing monolingual and
> bilingual dictionaries.

I would like to read this, if a copy is available.

> In this first phase I'm creating a non-functional prototype with Adobe
> Flex that simulates the tool's interface. I would like suggestions on how
> developers go about creating languages, and on the interface features they
> want in order to make the process of creating a language easier.

I would be very interested in the idea of a GUI, because I think that is the
next step forward for Apertium.

However, I would support the two points Mikel has made:

(1) Because the XML format is tweakable, most dictionaries probably diverge in
the details. I personally dislike the fact that a "fixed" data structure can
be modified ad hoc like this (it is one of the reasons I dislike XML), but on
the other hand it is helpful in allowing Apertium to adapt to the wide range
of language structures that we meet in real life. So it might be a good idea
to work on a particular language pair "in the wild" rather than on an
idealised version of the dictionary structure.

(2) I think most people working on Apertium will be entirely against the idea
of Apertium-related material that can only be run using a proprietary app.
Adobe in the past has shown itself to be a company that does not
wholeheartedly support the FLOSS community. Apertium needs to be open from
end to end - Mikel's insistence on the need for open data was visionary, and
that needs to run through to any GUI. Is there any possibility you could
consider something like Appcelerator Titanium (http://titaniumapp.com), which
is multi-platform, multi-language, and available under the Apache license? A
separate SDK is available (http://appcelerator.org/appcelerator-sdk) if
Titanium does not do everything you want. This would be far preferable to
Adobe Flex.

> For now, the prototype will be available at
> www.alessiojr.com/wiklats.

Hmm - I just get a request to install Flash. Perhaps a static webpage might
be better.

> Sorry for the bad English.

Far better than my Portuguese :-)

--
Pob hwyl / Best wishes

Kevin Donnelly

www.cymraeg.org.uk - Welsh-English autotranslator
www.klebran.org.uk - Gwirydd gramadeg rhydd i'r Gymraeg
www.eurfa.org.uk - Geiriadur rhydd i'r Gymraeg

------------------------------

Message: 4
Date: Sun, 22 Mar 2009 13:45:35 +0000
From: "Jimmy O'Regan" <joregan@gmail.com>
Subject: Re: [Apertium-stuff] New Apertium Tool Project
To: apertium-stuff@lists.sourceforge.net
Message-ID:
<e94dc08d0903220645m4f1f7382wd7458576c4676bc9@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

2009/3/22 Mikel L. Forcada <mlf@dlsi.ua.es>:
> Dear Alessio,
>
> Just some comments.
>
> (1) Many have already tried to write tools to make Apertium dictionary
> manipulation friendlier. They have encountered all sorts of problems, because
> of the many different ways one can make entries in the defined format (it's
> quite free). If you aren't familiar with _real_ Apertium dictionaries (as you
> say below), you may miss some of these problems unless you work very hard in
> the days to come.
>

Yes; the dixtools package has some examples, for testing, that may be
useful here.

> (2) If you have a "project-team of students" there, that's great! They should
> come forward and apply for the Apertium GSoC. Their proposals should clearly
> state which part of the task they are going to do, and they should be
> individually feasible, because not all will get the scholarship.
>
> (3) A word of caution. I will vote against funding any proposal that
> generates obscure code to be run by a non-free, proprietary player (Adobe
> Flash Player), even if the tools used to generate it (Adobe Flex) are
> (apparently) free. This is like saying that we will generate
> Microsoft Word files using some free software because OpenOffice.org can open
> most of them reasonably (for "Microsoft Word" read "Flash", and for
> "OpenOffice.org" read "gnash" or "swfdec"). We're an
> open-source project, and we should only resort to reverse-engineered solutions
> when forced to. But when developing, I advocate open, standard, free ones.
>

And here's where I pick up my usual role of publicly disagreeing with
everyone else.

Correct me if I'm wrong, but didn't we have dixtools in Java before
Sun GPLd it? I don't see how the situation is different.

Aléssio mentioned on IRC that this project will include a server part,
where the majority of the work will take place; he also mentioned that
it will be in Java, so if he builds it around dixtools, that should
remove the difficulties in working with real dictionaries.

We can always add a plain HTML interface to the server component
later; having looked at the prototype, I have to say that I really
like the interface: it looks to have the potential to be a tool that
'mere mortals' can use, and I'm all for that -- most of those mere
mortals would have Flash installed. UI experts are a bit thin on the
ground here; the more the merrier!

I do have a concern with the flash part, and that's simply that we
don't have anyone (to my knowledge) who can serve as a mentor for
that.

------------------------------

Message: 5
Date: Sun, 22 Mar 2009 21:14:50 +0545
From: Jacob Nordfalk <jacob.nordfalk@gmail.com>
Subject: Re: [Apertium-stuff] New Apertium Tool Project
To: apertium-stuff@lists.sourceforge.net, Aléssio Miranda
<alessioufv@yahoo.com.br>
Message-ID:
<20cf28cd0903220829h4f45ca08q83f83b2cf4224e58@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Aléssio,

Let me first of all say that I, and most others, agree that tools to make
editing Apertium files easier are badly needed, and are very important for
faster adoption of Apertium worldwide.
For example, I've been with Apertium for 9 months, and I could have saved a
lot of time (at least a month or so) with easier editing capabilities.
Another guy, whom I spent a lot of time helping, left the project, partly
because it was too hard for him to master the XML.

My impression (I might be wrong; Francis, please correct me) is that the
tools made until now try to hide the complexity of the XML files behind
forms, and that these forms have in the end made the tools not that
useful (limited, more or less, to adding data through a web form).

I would suggest another approach: making the XML edits easier by
assisting users who edit the raw files, with "views" into the XML and
helper tools for specific tasks.


In my experience the main obstacles for non-expert users with the dictionary
(.dix) XML files are:

1) It's hard not to make errors in the XML.
A simple tool that immediately highlighted syntactic errors (for example
missing or surplus closing </pardef> tags, or anything else that broke the
XML) and helped/forced the user to deal with them immediately would speed
things up greatly.
"Semantic" checks (sorry if I use this word wrongly), such as whether a
given paradigm name exists, and auto-completion of pardef names would also
be a great help.
Knowledge of specifics like "the <par> tag is always empty, so
it's always <par n="something"/> and never <par n="something">...</par>"
could also speed things up. And a prompt for inserting new symbols (the
<sdef>s at the top) and alphabet elements would save time.
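
The checks in point 1 could be sketched roughly as follows. This is only an illustration using Python's standard xml.etree.ElementTree module; the function name check_dix is hypothetical and is not part of dixtools or any other Apertium tool.

```python
import xml.etree.ElementTree as ET


def check_dix(xml_text):
    """Return a list of human-readable problems found in a .dix document.

    Hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Syntactic problem, e.g. a missing or surplus closing tag.
        return [f"not well-formed: {err}"]

    problems = []
    # "Semantic" check: every <par n="..."/> must name a defined <pardef>.
    defined = {pardef.get("n") for pardef in root.iter("pardef")}
    for par in root.iter("par"):
        name = par.get("n")
        if name not in defined:
            problems.append(f"undefined paradigm: {name!r}")
        # <par> is expected to be an empty element.
        if len(par) or (par.text or "").strip():
            problems.append(f"<par n={name!r}> should be empty")
    return problems
```

An editor could run such a check on every save and show the returned list next to the text; the set of defined names is also what auto-completion of pardef names would draw on.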


2) It's hard to navigate in the files.
For mature language pairs the dix files are typically very large (the
Esperanto-English pair's dixes are approx. 30000 lines long). A tool that
enabled the user to jump from an entry with a <par n="something"/> to the
paradigm definition (the <pardef n="something">) and back, and that perhaps
could show the expanded version of the paradigm at the cursor (lt-expand)
or hide all entries not similar to the current one, would be nice.
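
The "jump to definition" idea in point 2 might look something like this. It is a rough sketch that scans the raw lines with regular expressions rather than a real XML parser, so it assumes the n attribute comes first and sits on the same line as the tag; index_pardefs and jump_target are hypothetical names.

```python
import re

# Assumption: n="..." directly follows the tag name on the same line.
PARDEF = re.compile(r'<pardef\s+n="([^"]*)"')
PAR_REF = re.compile(r'<par\s+n="([^"]*)"')


def index_pardefs(lines):
    """Map each paradigm name to the 1-based line where it is first defined."""
    index = {}
    for number, line in enumerate(lines, start=1):
        for match in PARDEF.finditer(line):
            index.setdefault(match.group(1), number)
    return index


def jump_target(lines, cursor_line, index):
    """Return the definition line for the first <par> on the cursor line, or None."""
    match = PAR_REF.search(lines[cursor_line - 1])
    if match is None:
        return None
    return index.get(match.group(1))
```

Building the index is one pass over the file, so it should stay cheap even for dictionaries of tens of thousands of lines; the reverse jump (from a pardef to the entries using it) would need a second index built the same way.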


3) It's hard to find the right template to insert for a specific task.
These templates vary from language to language, but "insert a noun", "insert
a verb", "insert a multi-word with inner inflection" etc. would be very
useful. Right now you have to search the files for something similar and copy
and paste.
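
An "insert a noun" template from point 3 could be as small as the sketch below. The entry shape follows the common monolingual-dictionary pattern of a stem plus a paradigm reference; the TEMPLATES table and make_entry are hypothetical, and real pairs would need their own templates, as noted above.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical template table; real language pairs would define their own.
TEMPLATES = {
    "noun": "<e lm={lm}><i>{stem}</i><par n={par}/></e>",
}


def make_entry(kind, lemma, stem, paradigm):
    """Fill in a dictionary-entry template, escaping XML special characters."""
    template = TEMPLATES[kind]
    return template.format(
        lm=quoteattr(lemma),    # quoteattr adds the surrounding quotes
        stem=escape(stem),
        par=quoteattr(paradigm),
    )
```

For example, make_entry("noun", "house", "house", "house__n") produces an entry that can be pasted straight into a section, and the escaping means a lemma containing "&" does not break the XML.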


4) A new user needs to be able to see the result of an edit immediately.
It's really simple in principle: just pipe an lt-expand of the currently
edited entry through the stages of Apertium and show it (like showing the
stages of the edited word in Apertium-view/Apertium-viewer). That's the
principle (but in reality it's generally impossible to assert anything
beyond that 'make' should be invoked).


So, I see a plugin for a mature XML editor (or a program which, among other
things, uses a mature XML editor) that gives some useful "views" of the XML
and assists the user with some common tasks (beginners could then just use
the simplest of them) as the best idea here.

Some time ago I almost started making such an editor in Java, but
I ended up doing Apertium-viewer instead. Now I'll probably never get to it,
but I'll gladly assist and advise you if you would like to start one in Java
(perhaps even using some of the Apertium-viewer code).

2009/3/22 Jimmy O'Regan <joregan@gmail.com>

> 2009/3/22 Mikel L. Forcada <mlf@dlsi.ua.es>:
> > Dear Alessio,
> >
> > Just some comments.
> >
> > (1) Many have already tried to write tools to make Apertium dictionary
> > manipulation friendlier. They have encountered all sorts of problems,
> > because of the many different ways one can make entries in the defined
> > format (it's quite free). If you aren't familiar with _real_ Apertium
> > dictionaries (as you say below), you may miss some of these problems
> > unless you work very hard in the days to come.
> >
>
> Yes; the dixtools package has some examples, for testing, that may be
> useful here.


Yes; it will be very hard to capture all the different ways of doing
.dix/.metadix files (a metadix file is an XML file which, after being
transformed somehow by XSL, becomes a valid dix file) in a "closed" tool, like
web forms or something similar.

> > (3) A word of caution. I will vote against funding any proposal that
> > generates obscure code to be run by a non-free, proprietary player (Adobe
> > Flash Player), even if the tools used to generate it (Adobe Flex) are
> > (apparently) free. This is like saying that we will generate
> > Microsoft Word files using some free software because OpenOffice.org can
> > open most of them reasonably (for "Microsoft Word" read "Flash", and for
> > "OpenOffice.org" read "gnash" or "swfdec"). We're an open-source project,
> > and we should only resort to reverse-engineered solutions when forced to.
> > But when developing, I advocate open, standard, free ones.
> >
>
> And here's where I pick up my usual role of publicly disagreeing with
> everyone else.


No, Jimmy, this time I actually agree with you, to an extent :-)

But the problem I see with Flash is that no-one else uses Flash, so the
project is at risk of being abandoned if/when Aléssio leaves Apertium (I
would suggest using Java, of course ;-).


> he also mentioned that
> it will be in Java, so if he builds it around dixtools, that should
> remove the difficulties in working with real dictionaries.


While dixtools is a stable Java platform for handling Apertium files, I would
like to underline that its XML handling is not particularly elegant, and I
would suggest improving/simplifying the XML handling before building anything
on top of dixtools (see the note on
http://wiki.apertium.org/wiki/Apertium-dixtools).


>
>
> We can always add a plain HTML interface to the server component
> later; having looked at the prototype, I have to say that I really
> like the interface: it looks to have the potential to be a tool that
> 'mere mortals' can use, and I'm all for that -- most of those mere
> mortals would have Flash installed. UI experts are a bit thin on the
> ground here; the more the merrier!


I also looked at the prototype (looks extremely cool, BTW!).
And I'm a UI expert (or at least I've done a dozen UIs, and mentored
more than 50 :-).

I have 3 comments on the approach:

A) As a server application it's alien to the current way of working with .dix
files.
Here's the "current way" of working as I see it:

while (time < bedtime+2hours) {
    edit();
    save();
    compile();
    if (compiles) {
        run();
        check_result();
        if (looks_ok) {
            run_on_big_corpus();
            compare_with_previous_corpus_run();
            svn commit;
        }
    }
}
svn commit;

I suggest you make a client application (an editor), or specify how you
imagine the working cycle when using your application. Francis, could you
perhaps point to the existing web forms for adding words?


B) It seems like a "closed" system (like a web form, however advanced),
which doesn't allow direct access to edit the XML.
Therefore it'll probably not be suitable for all language pairs.


C) Before making any design decisions, make sure your system scales to
dixes of more than 25,000 lines (entries).
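
One way to keep a tool usable on dictionaries of that size is to stream the file instead of loading the whole tree. The sketch below uses iterparse from Python's standard xml.etree.ElementTree to count entries per section; count_entries is a hypothetical name, and this is just one possible approach, not how any existing Apertium tool works.

```python
import io
import xml.etree.ElementTree as ET


def count_entries(source):
    """Count <e> elements per <section>, reading the document incrementally."""
    counts = {}
    section = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "section":
            section = elem.get("id")
        elif event == "end" and elem.tag == "e" and section is not None:
            counts[section] = counts.get(section, 0) + 1
            # Drop the entry's children and text once it has been counted.
            elem.clear()
        elif event == "end" and elem.tag == "section":
            section = None
            elem.clear()
    return counts


if __name__ == "__main__":
    sample = (b'<dictionary><section id="main" type="standard">'
              b'<e lm="a"><i>a</i></e></section></dictionary>')
    print(count_entries(io.BytesIO(sample)))
```

Entries inside pardefs are skipped because no section is open at that point. The same loop shape would serve for building indexes or running checks on a large file without keeping every entry's content in memory.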

Yours, and good luck!
Jacob

PS: Diving into and working with a language pair is almost mandatory if you
want to do a good job with this task.

--
Jacob Nordfalk
Venu al la plej granda kultura evento en esperantujo: Kultura
Esperanto-Festivalo - la 7a ĝis la 12a de julio 2009 - http://kef.saluton.dk

------------------------------



End of Apertium-stuff Digest, Vol 23, Issue 8
*********************************************
