EDICT/JMdict
Future Directions

Introduction

This page contains some thoughts I have been having about the future (if any) of the EDICT and JMdict project/files, and some possible courses it might take.

I welcome discussion, feedback, etc. It can be posted on the sci.lang.japan newsgroup or emailed to me here.

Wiktionary-oriented discussion is probably best carried out on the discussion page there.

Why Am I Raising This Topic?

Well, editing and coordinating these files has been largely a one-man band: me. Sure, there has been a lot of advice from others, and certainly masses of input, but I have set the standards, done the updates, released new versions, etc.

I think the files are useful enough and of a good enough quality that they deserve to live on, expand, be maintained, etc. They are the only freely available and parseable source of Japanese-English lexical material.

I would like to see a future where:

  1. the project is continuing and to a major extent self-sustaining, with the edit/update processes spread over a larger group of people;

  2. it is not at all dependent on my continued involvement. I won't be around forever, and in fact after 14 years of EDICT I am seriously thinking about the rest of my life (of which there is not an awful lot left);

  3. it is not dependent on support from Monash University. My honorary appointment there will not continue for ever, and internal changes at Monash may well result in withdrawal of server support.

My Vision, Hopes, Whatever.

What I would like to see is something like this:

  1. the underlying database from which the EDICT and JMdict files are generated migrates from its present form (a large text file on my PC) to an on-line database where it can be seen, edited, expanded, etc. by a community of users;

  2. the edits to the database undergo some form of moderation/oversight, either prior to their commitment (e.g. a moderation panel) or a more passive after-the-event fixing of mistakes (more the Wiki model);

  3. from the on-line database a regular and automatic extraction takes place to generate the distributed forms of the file (currently EDICT and JMdict, but there may be other formats in the future);

  4. the copyright and usage licence arrangement becomes more "open". I am considering moving to a Creative Commons licence. The one I have in mind is the Attribution licence, which is very similar to the current one.
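
As a rough illustration of the extraction step in item 3, here is a small Python sketch that turns one database record into an EDICT-style line and a JMdict-style XML entry. The record layout, the field names, and the sequence number are made up for this example and are not the format of the actual database; both output forms are simplified (for instance, the real JMdict file writes part-of-speech tags as XML entities, which is skipped here).

```python
import xml.etree.ElementTree as ET

# Hypothetical record as it might come out of an on-line database.
# Field names and the sequence number are assumptions for this sketch.
record = {
    "seq": 1,  # made-up sequence number
    "kanji": "辞書",
    "reading": "じしょ",
    "pos": "n",
    "glosses": ["dictionary", "lexicon"],
}


def to_edict_line(rec):
    """Build a simplified EDICT-style line: KANJI [KANA] /(pos) gloss/gloss/"""
    glosses = list(rec["glosses"])
    # The part-of-speech tag is attached to the first gloss.
    glosses[0] = "({}) {}".format(rec["pos"], glosses[0])
    return "{} [{}] /{}/".format(rec["kanji"], rec["reading"], "/".join(glosses))


def to_jmdict_entry(rec):
    """Build a simplified JMdict-style <entry> element."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "ent_seq").text = str(rec["seq"])
    k_ele = ET.SubElement(entry, "k_ele")
    ET.SubElement(k_ele, "keb").text = rec["kanji"]
    r_ele = ET.SubElement(entry, "r_ele")
    ET.SubElement(r_ele, "reb").text = rec["reading"]
    sense = ET.SubElement(entry, "sense")
    # Plain text here; the real file uses entity references for these tags.
    ET.SubElement(sense, "pos").text = rec["pos"]
    for gloss in rec["glosses"]:
        ET.SubElement(sense, "gloss").text = gloss
    return entry


edict_line = to_edict_line(record)
jmdict_xml = ET.tostring(to_jmdict_entry(record), encoding="unicode")

print(edict_line)
print(jmdict_xml)
```

A scheduled job could run functions like these over every record, so that both distributed files always come from the same source and a new output format only needs one more function.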

Options

Well, as I see it, there are two main ways this vision could be achieved:

  1. A Special System

    Outline. A server is developed specifically for updating the EDICT/JMdict database.

    Pros:

    Cons:

    Problems:

  2. Move Into An Established Environment, e.g. Wiktionary

    Outline. The entire dictionary database is uploaded into an established "wiki" environment, e.g. Wiktionary. Edits would happen in that environment, i.e. anyone could edit any entry.

    Pros:

    Cons:

    Problems: