A while ago I came across www.dict.org, the storefront of the DICT Development Group, a large number of enthusiasts who make it possible to run language resources (dictionaries, thesauri, etc.) at no cost. I’m hesitant to say it’s an open source movement because I haven’t seen them label it as such, but it’s close: many helping hands, passion for software and willingness to make technology available to others for free.
What Is DICT?
DICT is a protocol defined in RFC 2229 as follows:
The Dictionary Server Protocol (DICT) is a TCP transaction based query/response protocol that allows a client to access dictionary definitions from a set of natural language dictionary databases. […]
The DICT protocol is designed to provide access to multiple databases. Word definitions can be requested, the word index can be searched (using an easily extended set of algorithms), information about the server can be provided (e.g., which index search strategies are supported, or which databases are available), and information about a database can be provided (e.g., copyright, citation, or distribution information).
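To make the query/response flow concrete, here is a minimal sketch of a single DEFINE exchange, written in Python rather than .NET to keep it short. It is not the code of my client: the function names (`build_define`, `parse_definitions`, `define`) are made up for illustration, and the handling of status codes is limited to the few that RFC 2229 lists for DEFINE (150, 151, 250, 552).

```python
import socket


def build_define(word, database="*"):
    """Format a DEFINE command; "*" asks every database on the server."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'DEFINE {database} "{escaped}"\r\n'.encode("utf-8")


def parse_definitions(lines):
    """Collect the text block that follows each 151 status line."""
    definitions, current = [], None
    for line in lines:
        if current is None:
            if line.startswith("151 "):
                current = []          # a definition body starts here
            continue
        if line == ".":               # a lone dot ends the text block
            definitions.append("\n".join(current))
            current = None
        else:
            # Undo dot-stuffing: body lines starting with "." arrive doubled.
            current.append(line[1:] if line.startswith(".") else line)
    return definitions


def define(word, host="dict.org", port=2628, timeout=5.0):
    """Look up a word on a DICT server (assumes the server replies in UTF-8)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\r\n") as reader:
            reader.readline()         # skip the 220 greeting banner
            sock.sendall(build_define(word))
            lines, in_text = [], False
            for raw in reader:
                line = raw.rstrip("\r\n")
                lines.append(line)
                if in_text:
                    in_text = line != "."
                elif line.startswith("151 "):
                    in_text = True
                elif line[:1] in ("2", "4", "5"):
                    break             # final status, e.g. 250 ok or 552 no match
            sock.sendall(b"QUIT\r\n")
    return parse_definitions(lines)
```

Calling `define("grace")` would return a list with one string per definition, or an empty list if the server answers 552 (no match). The parsing is kept separate from the socket code so it can be exercised without a network connection.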
You can find many implementations of both servers and clients, in various languages, that adhere to RFC 2229. There are, however, neither dictionary servers nor clients developed for the .NET Framework, so I took the time to design a client myself.
You can freely download the source code of my dictionary client. By default, it points to dict.org (port 2628, as per RFC 2229) and interacts with their dictionary server. I put up an online dictionary client page, which is similar to theirs, in my Tools section.
Dictionary Trivia
Here’s an interesting fact for you. Take a look at the top 1000 words searched at dict.org. The search ratio of f*** versus grace is 25:1. I’m surprised yahoo tops the list. I’m shocked that fewer people know what yahoo is than those who want to become familiar with the terms f*** and pussy. With grace trailing the list we can rest assured all is well with humanity. :)
Resources
All resources at dict.org are geared toward English speakers for the most part. There are a couple of fabulous multi-lingual resources out there, though:
- mova.org: an impressive collection of online Slavic dictionaries. To tell you the truth, there are more than just Slavic ones there. For example, they have German, Swedish and Dutch dictionaries, and those are not Slavic languages.
- qamoose.arabeyes.org: this is an Arabic dictionary. I plead complete ignorance of Arabic, but since we’re speaking UTF-8 here, my client works with it too. :)
- livid.3322.org: this is a mind-boggling collection of dictionaries. Be forewarned, though, that the site is very slow. Also, I don’t think they serve Chinese correctly. If I decode the stream with a Simplified Chinese codepage it looks fine, but decoding it as UTF-8 turns it into garbage. Judging from the fact that I can read the Arabic dictionary (above), as well as the Russian, Polish, German, etc. dictionaries just fine, I assume they send their text in the wrong encoding.
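The encoding mismatch described for the last server can be worked around by trying UTF-8 first and falling back to a Simplified Chinese codec only when the bytes are not valid UTF-8. The sketch below is a hypothetical illustration in Python, not what my client does; the function name `decode_reply` and the choice of `gbk` as the fallback codec are assumptions.

```python
def decode_reply(raw, fallback="gbk"):
    """Decode server bytes as UTF-8, falling back to another codec on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the server used a legacy code page instead.
        # errors="replace" keeps the call from failing on stray bytes.
        return raw.decode(fallback, errors="replace")
```

This works because text in a legacy Chinese code page is very rarely valid UTF-8 by accident, so the strict first attempt fails fast. It is still a guess: a server using some other legacy encoding would decode without an error but produce the wrong characters.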
In the code download you’ll find the same web page I made available in my Tools section. I included the dictionary servers listed above, but commented them out. All you have to do to connect to a DICT server is set the DictionaryServer property. If the dictionary server is too slow, increase the value of the Timeout property, which defaults to 5000 milliseconds (5 seconds).
L10N Help Needed
I put all text into a resource file. It holds the error messages the client returns under various circumstances. I created my own exception subclass to wrap geeky errors from dictionary servers into something you won’t be ashamed to present to users.
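The idea of wrapping server errors can be sketched as follows, again in Python for brevity. Everything here is illustrative: the class name `DictionaryError`, the `MESSAGES` table (standing in for the resource file) and the wording of the messages are assumptions. Only the numeric status codes come from RFC 2229.

```python
# Stand-in for the localizable resource file: status code -> friendly text.
MESSAGES = {
    "420": "The dictionary server is temporarily unavailable.",
    "550": "The selected dictionary does not exist on this server.",
    "552": "No definitions were found for this word.",
    "default": "The dictionary server returned an unexpected error.",
}


class DictionaryError(Exception):
    """Carries a user-friendly message plus the raw server reply."""

    def __init__(self, code, server_text):
        self.code = code                # e.g. "552"
        self.server_text = server_text  # raw text, useful for logging
        super().__init__(MESSAGES.get(code, MESSAGES["default"]))


def raise_for_status(line):
    """Raise DictionaryError if a status line reports a 4xx or 5xx code."""
    code = line[:3]
    if code[:1] in ("4", "5"):
        raise DictionaryError(code, line[4:])
```

Keeping the raw server text on the exception means the friendly message can be translated freely while the original reply stays available for troubleshooting.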
If you are in the mood for some “community service” and know how to translate .NET resource files, grab anrControls.Dict.NET.resx, translate it and send it to me. Please remember to rename the file to reflect which culture the translation is for. Don’t forget to give me your name and/or URL so I can give you due credit.
By the way, if anyone comes across a Japanese DICT server, please let me know.
Conclusion
It has been an interesting project which had me make some challenging decisions along the way, and I’m going to share them in upcoming posts. It is a part of a bigger project I have in mind, and I’ll be publishing bits and pieces here.
Just In