ID:153928
 
For dictionary subscribers:

The pre-looked-up word list currently holds about 1,360 words and takes up about 14k on disk. I intend to grow it to somewhere around 5k words by the next release, and have it all ready by Sunday. Just thought I'd post this here in Design Philosophy, because the strategy I've thought up is kind of interesting.

As you may know, the defined() proc in the dictionary library will go to the web to see if a word is valid if it is not found in the local word list. Simply feeding in a bunch of words makes the list grow. So what's the most accessible source of new words...? The web of course!
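
For anyone who hasn't looked inside the library, the "local list first, web second" behavior can be sketched roughly like this. It is written in Python so it runs on its own, and it is not the library's actual DM code: `make_defined` and `remote_lookup` are made-up names, and the remote lookup is whatever function you hand it.

```python
from typing import Callable, Set


def make_defined(local_words: Set[str],
                 remote_lookup: Callable[[str], bool]) -> Callable[[str], bool]:
    """Build a defined()-style checker: local list first, remote lookup second.

    remote_lookup is a hypothetical stand-in for the web lookup; it takes a
    lowercase word and returns True if the word is valid.
    """
    def defined(word: str) -> bool:
        word = word.lower()
        if word in local_words:
            return True              # answered from the local list, no web hit
        if remote_lookup(word):
            local_words.add(word)    # verified words are remembered locally
            return True
        return False                 # rejected words are not stored

    return defined
```

Because every successful remote lookup lands in `local_words`, feeding in more words is all it takes to make the list grow.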

I'm going to teach the dictionary to crawl the web, parse pages into individual words, and look up each word it finds online. The newest lookup source I've found returns results very very quickly, so this should be relatively painless.
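
The parsing half of that plan might look something like the sketch below (Python again, purely illustrative). Fetching pages is left out entirely. `candidate_words` and the two-letter minimum are assumptions of the sketch, and skipping anything capitalized is a deliberately conservative guess at avoiding proper nouns.

```python
import re
from typing import List, Set

# Match whole tokens, including internal apostrophes and hyphens, so that
# "don't" is seen as one token rather than split into "don" and "t".
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def candidate_words(text: str, known: Set[str], min_len: int = 2) -> List[str]:
    """Return new all-lowercase words from text, in order of first appearance."""
    seen = set()
    found = []
    for token in TOKEN_RE.findall(text):
        if not (token.isalpha() and token.islower()):
            continue                 # apostrophes, hyphens, or capital letters
        if len(token) < min_len or token in known or token in seen:
            continue                 # too short, already listed, or a repeat
        seen.add(token)
        found.append(token)
    return found
```

Each word that comes back would then be handed to the lookup, which decides whether it actually goes in the list.
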
Question: Is Kwijibo capable of handling words with apostrophes correctly?
The dictionary library should be perfectly capable of finding those words, which means they can wend their way into the list.

Lummox JR
In response to Lummox JR (#1)
Lummox JR wrote:
Question: Is Kwijibo capable of handling words with apostrophes correctly?
The dictionary library should be perfectly capable of finding those words, which means they can wend their way into the list.

Lummox JR

I can't offhand think of any apostrophized word that should be considered legal. Can you give an example?
In response to Skysaw (#2)
Skysaw wrote:
Lummox JR wrote:
Question: Is Kwijibo capable of handling words with apostrophes correctly?
The dictionary library should be perfectly capable of finding those words, which means they can wend their way into the list.

I can't offhand think of any apostrophized word that should be considered legal. Can you give an example?

No no, I don't mean legal in Kwijibo; I mean legal for the purposes of the dictionary.

In theory if your dictionary library were to get merge routines, and you merged it with someone else's good-word dictionary, you could well find it contained apostrophes. In the position of one of your Kwijibo bots, they might see such a word having the right sequence of letters for what's already on the board, but have nothing except a wild to play for it. If they played the wild in place of the apostrophe where nothing else would fit, you'd then have an impossible word. (Whether they'd get away with using the word on a challenge I don't know; it depends on whether you included some of the same checks on input that you would in player input.)

Lummox JR
In response to Lummox JR (#3)
Lummox JR wrote:
No no, I don't mean legal in Kwijibo; I mean legal for the purposes of the dictionary.

The dictionary routines assume legality in the same way that a Scrabble dictionary does. That is, they discard proper nouns, hyphenated words, etc. All games that use the library will have the same behavior. If you want to use the dictionary library and not discard those words, you're going to have to rewrite it, or wait for an update on my side that may or may not ever come.

In theory if your dictionary library were to get merge routines, and you merged it with someone else's good-word dictionary, you could well find it contained apostrophes. In the position of one of your Kwijibo bots, they might see such a word having the right sequence of letters for what's already on the board, but have nothing except a wild to play for it. If they played the wild in place of the apostrophe where nothing else would fit, you'd then have an impossible word. (Whether they'd get away with using the word on a challenge I don't know; it depends on whether you included some of the same checks on input that you would in player input.)

If someone's dictionary file contained such words, it would be either due to a bug, or to their muddling about in the file. Only complete and verified words get added to the file.

When a bot plays a tile (unless he is bluffing), he has a word in mind that could be formed. Only this possible word is part of the dictionary. Wild cards cannot be used to substitute for apostrophes, because these words will always return as illegal from the dictionary's routines.
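
As a rough illustration of that rule (a Python sketch, not the bot's real code), a wild tile can only ever stand in for a letter, so a word containing anything else is rejected before tiles are even counted. `can_play` and the flat `tiles` string are simplifications made up for this example; board placement is ignored.

```python
from collections import Counter

WILD = "*"


def can_play(word: str, tiles: str) -> bool:
    """True if word can be spelled from tiles, using '*' tiles as any letter."""
    # Anything other than plain lowercase letters is illegal outright, so an
    # apostrophe can never be "covered" by a wild tile.
    if not (word.isascii() and word.isalpha() and word.islower()):
        return False
    have = Counter(tiles)
    wilds = have.pop(WILD, 0)
    need = Counter(word)
    # Count the letters the plain tiles cannot supply; wilds must cover them.
    missing = sum(max(0, n - have[ch]) for ch, n in need.items())
    return missing <= wilds
```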

As a side note, Bobosquish encountered a strange bug last night that suggested a bot may have inserted a wildcard in place of the null string between two letters, but this is unverified, and certainly not the intended behavior.
In response to Skysaw (#4)
Skysaw wrote:
Lummox JR wrote:
No no, I don't mean legal in Kwijibo; I mean legal for the purposes of the dictionary.

The dictionary routines assume legality in the same way that a Scrabble dictionary does. That is, they discard proper nouns, hyphenated words, etc. All games that use the library will have the same behavior. If you want to use the dictionary library and not discard those words, you're going to have to rewrite it, or wait for an update on my side that may or may not ever come.

Suggestion: Give your dictionary a proc that checks on all these rules of legality, based on bit flags that are user-settable.
var/const/DICT_PROPER = 1   // allow proper nouns
var/const/DICT_HYPHEN = 2   // allow hyphenated words
var/const/DICT_APOS   = 4   // allow apostrophes

var/dictionary_legal_word_conditions = 0   // OR together the DICT_* flags to allow; 0 allows none
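
A check driven by those flags could work along these lines. This is a Python sketch so it can be run standalone, not a DM proc. The flag names mirror the constants above, `legal_word` is a hypothetical name, and treating any capital letter as a proper noun is an assumption of the sketch.

```python
DICT_PROPER = 1   # allow proper nouns
DICT_HYPHEN = 2   # allow hyphenated words
DICT_APOS = 4     # allow apostrophes


def legal_word(word: str, conditions: int = 0) -> bool:
    """Check a word against the legality rules enabled in `conditions`."""
    body = word
    if "-" in body:
        if not conditions & DICT_HYPHEN:
            return False
        body = body.replace("-", "")
    if "'" in body:
        if not conditions & DICT_APOS:
            return False
        body = body.replace("'", "")
    # Whatever remains must be plain letters (this also rejects empty words).
    if not (body.isascii() and body.isalpha()):
        return False
    # Assumption: any capital letter marks the word as a proper noun.
    if body != body.lower() and not conditions & DICT_PROPER:
        return False
    return True
```

With `conditions` left at 0 this matches the strict Scrabble-style rules; a game that wanted contractions would pass `DICT_APOS`, and one that wanted everything would pass `DICT_PROPER | DICT_HYPHEN | DICT_APOS`.
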
In theory if your dictionary library were to get merge routines, and you merged it with someone else's good-word dictionary, you could well find it contained apostrophes. In the position of one of your Kwijibo bots, they might see such a word having the right sequence of letters for what's already on the board, but have nothing except a wild to play for it. If they played the wild in place of the apostrophe where nothing else would fit, you'd then have an impossible word. (Whether they'd get away with using the word on a challenge I don't know; it depends on whether you included some of the same checks on input that you would in player input.)

If someone's dictionary file contained such words, it would be either due to a bug, or to their muddling about in the file. Only complete and verified words get added to the file.

If you do update your library to allow other word types, it would be well to include such checks in Kwijibo. At present it certainly couldn't hurt.

When a bot plays a tile (unless he is bluffing), he has a word in mind that could be formed. Only this possible word is part of the dictionary. Wild cards cannot be used to substitute for apostrophes, because these words will always return as illegal from the dictionary's routines.

But this is assuming valid dictionary data according to the rules of Kwijibo. I can see this not being the case in the future, as there'd be lots of reasons to expand a common dictionary for those other word types.

As a side note, Bobosquish encountered a strange bug last night that suggested a bot may have inserted a wildcard in place of the null string between two letters, but this is unverified, and certainly not the intended behavior.

Curious. What was the word?

Lummox JR
On UNIX systems, there's a dictionary file (I think /usr/dict/words). You have a Mac with OSX, right?.. methinks there should be one somewhere there. It would be pretty easy just to feed the file to your dictionary, although it may be too big for your purposes (I'm not really sure).

If you don't have it, I have a pretty good sized dictionary file I could send you.

But the webcrawling idea is much cooler.

-AbyssDragon
In response to Lummox JR (#5)
The word in question was "jape." The board showed "j*ape." Very strange behavior.

Your ideas for including alternate lookup rules might actually come along eventually. The thing to keep in mind is that different sources may have to be consulted to get the results. For example, spellcheck.net is helpful because it will say a word is misspelled if it is a proper noun, and was submitted without capitalization. Very helpful for the current usage, since words are sent using the ckey() form.

If I implement this, I'll most likely just keep the words in separate saved files, so turning the options on and off will simply mean looking at different loaded lists. That will also help with the merge problem, since proper noun lists would only merge with other proper noun lists.
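
To make the separate-lists idea concrete, here is a small Python sketch; nothing in it exists in the library yet. The category names ("standard", "proper" and so on) and both function names are invented for the example.

```python
from typing import Dict, Iterable, Set

WordLists = Dict[str, Set[str]]


def merge_lists(mine: WordLists, theirs: WordLists) -> WordLists:
    """Merge two sets of word lists, category by category.

    Words only ever merge into the list of the same category, so a proper
    noun from someone else's file cannot leak into the standard list.
    """
    merged = {cat: set(words) for cat, words in mine.items()}
    for cat, words in theirs.items():
        merged.setdefault(cat, set()).update(words)
    return merged


def active_words(lists: WordLists, enabled: Iterable[str] = ()) -> Set[str]:
    """Return the standard list plus any optional categories switched on."""
    result = set(lists.get("standard", ()))
    for cat in enabled:
        result |= lists.get(cat, set())
    return result
```

Turning an option on or off is then just a matter of which categories get passed in as `enabled`; the stored lists themselves never change.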
In response to Skysaw (#7)
Skysaw wrote:
The word in question was "jape." The board showed "j*ape." Very strange behavior.

Are you sure it was "jape", you jackanapes? :)