[LIB] Add spellchecking to your applications - support for non-English languages soon - Windows Mobile Software Development

WMSpellChecker
I have been programming for Windows Mobile for a few months now, mainly using Basic4ppc, an excellent low-cost development tool for producing Windows Mobile applications (and desktop applications too). Basic4ppc is based on the .NET Framework.
One of the strengths of Basic4ppc is that it has great support, not only from the developer but also from other users who supply Basic4ppc with a lot of extra features through external libraries. I got interested in writing a library myself, and this is how the WMSpellChecker library was born. From the very beginning, however, my idea was that the library should be compatible not only with Basic4ppc but also with Windows Mobile applications developed in Visual Studio and SharpDevelop using VB.NET and C#.
I have seen commercial solutions for spellchecking, but since I wanted to learn how to write a library, I thought this would be a nice thing to give "for free" to fellow developers.
Well, let me get back to the WMSpellChecker-library:
Basically, a spell checker customarily consists of two parts:
1) A set of routines for scanning text and extracting words, and
2) An algorithm for comparing the extracted words against a known list of correctly spelled words (i.e., the dictionary).
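The two parts listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not the library's code: the names `extract_words` and `find_unknown_words` are hypothetical, and the tiny word set stands in for a real dictionary.

```python
import re

# Hypothetical helper names; the library's real routines are not shown in the post.
WORD_RE = re.compile(r"[A-Za-z']+")

def extract_words(text):
    """Part 1: scan the text and return (word, start_offset) pairs."""
    return [(m.group(0), m.start()) for m in WORD_RE.finditer(text)]

def find_unknown_words(text, dictionary):
    """Part 2: compare each extracted word against the known word list."""
    return [(w, pos) for w, pos in extract_words(text)
            if w.lower() not in dictionary]

# A toy dictionary standing in for the real word list.
dictionary = {"the", "sun", "is", "shining", "today"}
unknown = find_unknown_words("Today the sun is shinning", dictionary)
print(unknown)
```

Keeping the offset next to each word makes it possible to highlight or replace the word in the original text later.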
However, what is described above is only "half" a spell checker, since these days spell checkers also suggest replacements/corrections for misspelled words (among other things such as synonyms and grammar hints). These suggestions can be produced by a spellchecking engine using various techniques:
- phonetic algorithms such as "Soundex", among others
- word lists containing commonly misspelled words and commonly inverted letters
- the so-called "Near Miss Strategy", introduced by one of the first spell checkers on the market, namely Ispell for UNIX, whose roots date back to 1971
- "edit distance" algorithms, which measure the amount of difference between two sequences. A famous one is the "Levenshtein distance".
- and other techniques
The "techniques" mentioned above have all been implemented in the library.
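As an illustration of the edit-distance idea, here is a minimal Python sketch of the Levenshtein distance and a naive way to rank suggestions with it. It is not taken from the library; `suggest`, `max_distance` and `limit` are hypothetical names chosen for the example.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def suggest(word, dictionary, max_distance=2, limit=5):
    """Return the closest dictionary words, nearest first (ties alphabetical)."""
    scored = sorted((levenshtein(word, candidate), candidate)
                    for candidate in dictionary)
    return [candidate for distance, candidate in scored
            if distance <= max_distance][:limit]

print(suggest("wether", ["weather", "whether", "wetter", "winter"]))
```

Comparing against every dictionary word is slow for a large list, which is one reason a real engine would first narrow the candidates, for example with a phonetic key.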
I am aware that WM6 (at least) already offers spelling suggestions and a spell checker if Pocket Word (Office) has been installed, but I still liked the idea, so I decided to make a library. In any case, as far as I know, only the dictionary corresponding to the language of WM6 is installed, so you cannot spell check words in other languages.
The way it works.....
First of all, apart from referencing the library itself, you need to add two objects to your application, namely DICTIONARY and COMPUTEDETECTION.
Then you need to load the dictionary files using "LoadDict". Currently they consist of four separate files, although I may change this in a future release. The dictionary files must be located in the application directory (you can create sub-folders there). This first release only supports English, and the dictionaries distributed with the library must not be tampered with. The next release will bring support for other languages and will also include a separate program for handling dictionaries.
Once the dictionaries have been loaded, you can start spellchecking by calling "ComputeDetection", which passes your textbox control to the library. If there are words that are not present in the dictionary, a set of suggestions is returned to the calling application, and at the same time the word that was not found is shown in the textbox in capital letters. The suggestions produced by the library can be obtained using "ReturnSuggestions", which returns a string array.
Once you have shown the suggestions returned by the library, you can let the user of your application decide what to do, i.e.:
-"IgnoreWord" - ignoring the wrong word
-"AddWord" - entering a word of their own to replace the wrong word
-"ReplaceWord" - replacing the wrong word with a word from the suggestions
At this point, you tell the library to continue spellchecking by using "ContinueDetection". You should also check whether spellchecking has finished by using "IsSpellingFinished".
At any time, you can interrupt spellchecking by using "UnloadDict". This will be useful in a future release of the library, when you will be able to unload an English dictionary and replace it with, for instance, a French one without exiting your own application. Before unloading, however, you should check whether a dictionary has actually been loaded by using "IsDictionaryLoaded".
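The call sequence described above can be pictured with a toy Python stand-in. To be clear, this is NOT the WMSpellChecker API: the class `ToySpellSession`, its method signatures and its behaviour are assumptions made purely to show the load / detect / decide / continue loop, and it works on a plain string instead of a textbox control. Suggestions ("ReturnSuggestions") are left out to keep it short.

```python
import re

class ToySpellSession:
    """Hypothetical stand-in mimicking the call order described in the post.
    Not the real library; every signature here is an assumption."""

    def __init__(self):
        self.words = set()
        self.tokens = []
        self.index = 0

    def load_dict(self, words):             # cf. "LoadDict"
        self.words = {w.lower() for w in words}

    def is_dictionary_loaded(self):         # cf. "IsDictionaryLoaded"
        return bool(self.words)

    def compute_detection(self, text):      # cf. "ComputeDetection"
        # Split into alternating word and non-word tokens so the
        # text can be rebuilt exactly after replacements.
        self.tokens = re.findall(r"\w+|\W+", text)
        self.index = 0
        self._advance()

    def continue_detection(self):           # cf. "ContinueDetection"
        self.index += 1
        self._advance()

    def _advance(self):
        # Stop at the next alphabetic token missing from the dictionary.
        while self.index < len(self.tokens):
            token = self.tokens[self.index]
            if token.isalpha() and token.lower() not in self.words:
                return
            self.index += 1

    def is_spelling_finished(self):         # cf. "IsSpellingFinished"
        return self.index >= len(self.tokens)

    def current_word(self):
        return self.tokens[self.index]

    def replace_word(self, new_word):       # cf. "ReplaceWord"
        self.tokens[self.index] = new_word

    def text(self):
        return "".join(self.tokens)

session = ToySpellSession()
session.load_dict(["the", "sun", "is", "shining"])
session.compute_detection("The sun is shinning")
fixes = {"shinning": "shining"}  # stands in for the user's choice
while not session.is_spelling_finished():
    word = session.current_word()
    if word in fixes:
        session.replace_word(fixes[word])
    # Otherwise the word is simply skipped, like "IgnoreWord".
    session.continue_detection()
print(session.text())
```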
In the help-file, you can find more important information as to the methods/properties available. Please also check out the two sample-projects present in the attachment where the source-code has been fully commented. One is using a classic spellchecking-interface and the other one is using context-menus.
Other comments....
This first release has some limitations, such as support only for English and the need for a textbox-control. However, I will add other features in the future, for instance:
-support for other languages
-dictionary-tools (for creating dictionaries) - will be an external program
-possibility to add a user-dictionary
-possibility to limit amount of suggestions produced by the library (by using a "ranking-system")
-no further need for a textbox-control in your application. Your application will be able to pass only the word(s) you wish to spellcheck to the library, and the library will only return the suggestion(s). In this way, the spellchecker-library will not "interfere" with your application and you can use whichever control you prefer, although you as a developer will have to take care of extracting the words to be passed to the library for verification.
-spellchecking "on the fly"
-extended error-handling
A few notes regarding dictionaries....
The English dictionary supplied with the library is composed of nearly 70,000 words. Dictionaries to be used with the library must be sorted, with one word per line and LF = chr(10) as the line ending. In addition, the dictionary should be saved as UTF-8.
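The format rules above (sorted, LF line endings, UTF-8) are easy to check mechanically. The following Python sketch is only an assumption about how such a check could look; in particular it assumes plain code-point sort order, which may not be the ordering the library expects, and the function names are hypothetical.

```python
import bisect

def parse_dictionary(raw_bytes):
    """Decode a UTF-8, LF-separated word list and check that it is sorted."""
    if b"\r" in raw_bytes:
        raise ValueError("dictionary must use LF line endings only")
    words = raw_bytes.decode("utf-8").split("\n")
    if words and words[-1] == "":
        words.pop()  # tolerate a trailing newline
    if words != sorted(words):
        raise ValueError("dictionary must be sorted")
    return words

def contains(words, word):
    """Binary search, which a sorted list makes possible."""
    i = bisect.bisect_left(words, word)
    return i < len(words) and words[i] == word

words = parse_dictionary("apple\nbeach\nsun\nweather\n".encode("utf-8"))
print(contains(words, "beach"), contains(words, "wether"))
```

A sorted list allows a lookup in roughly log2(70,000), i.e. about 17, comparisons instead of scanning the whole file.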
From the dictionary, a KeyMap is created using either a Soundex or a DoubleMetaphone algorithm. At the moment, the KeyMap is supplied with the library and loaded as an external file, but future releases might create it on the fly (or at least offer an option to do so). With the next release, I will add a utility, to be run on the desktop, which will let you create your own dictionary and a corresponding KeyMap compatible with WMSpellChecker.
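To give an idea of what a phonetic KeyMap might look like, here is a small Python sketch that groups words by their classic American Soundex code. The actual file layout and algorithm details used by the library are not described in the post, so `build_keymap` and the dict-of-lists structure are assumptions for illustration only.

```python
from collections import defaultdict

# Classic Soundex consonant groups.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return the 4-character American Soundex code of an English word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    last = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            last = code
    return (result + "000")[:4]

def build_keymap(words):
    """Group dictionary words under their phonetic key."""
    keymap = defaultdict(list)
    for word in words:
        keymap[soundex(word)].append(word)
    return dict(keymap)

keymap = build_keymap(["weather", "whether", "winter", "sun"])
candidates = keymap.get(soundex("wether"), [])
print(candidates)
```

Looking up the key of a misspelled word returns a short list of sound-alike candidates, which can then be ranked by a more expensive measure.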
Unlike English and Scandinavian dictionaries, those for German and for Latin-derived languages such as Spanish, Italian and French will probably be rather large. This is because these languages use a lot of suffixes, for instance when conjugating verbs. To get around this, well-known spellcheckers such as ASpell, ISpell and HunSpell (used by OpenOffice) use dictionaries which mostly contain only the base form of each word or verb, together with a supplementary file called "affix" which contains a lot of grammar rules. This file plus the simplified dictionary avoids the problem of large dictionaries. However, I believe this system is probably rather memory- and performance-hungry and might not be the best solution for Windows Mobile and PPC. Maybe I will look into it in the future.
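The base-form-plus-rules idea can be shown with a deliberately tiny Python sketch. The suffix rules below are invented for the example and are not in the ISpell/HunSpell affix file format; the point is only that a few rules let a short base list cover many inflected forms.

```python
# Toy suffix rules: (suffix to strip, text to put back).
# Invented for illustration; real affix files are far richer.
SUFFIX_RULES = [("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"),
                ("s", ""), ("", "")]

def is_known(word, base_words):
    """Accept a word if undoing one suffix rule yields a listed base form."""
    for suffix, restore in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + restore
            if stem in base_words:
                return True
    return False

base_words = {"walk", "shine", "beach"}
for candidate in ["walking", "shining", "walked", "walks", "wether"]:
    print(candidate, is_known(candidate, base_words))
```

Note that this toy version over-generates (it would accept "beachs", for example). Real affix systems mark each base word with the rule classes that apply to it, which is part of the extra bookkeeping and processing cost mentioned above.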
Another negative side-effect of using too large a dictionary is that it may include more obscure words, which increases the risk that the spelling engine will "miss" real-word errors. The word wether illustrates this. It is, arguably, so obscure that any occurrence of wether in a passage is more likely to be a misspelling of weather or whether than a genuine occurrence of wether, so a spellchecker that did not have the word in its dictionary would do better than one that did.
Conclusion....
The library can be used with projects developed with Basic4ppc (PPC and Desktop) but should also work with projects created in Visual Studio and SharpDevelop (using VB.NET and C#). The library has been compiled targeting Framework Version 2.0.
Library-version: 1.0
Helpfile-version: 1.0
As mentioned before, this is my first serious library. Please check it out and let me know how well it integrates in your applications.
Please also give me feedback, suggestions for improvements, missing features, bug-reports etc.
The idea is to add spelling-support for other languages as well and here I might need some help from end-users. I will let you know.
UPDATE - 17/08/2009: In the next few days I will release an updated version with support for other languages as well (starting with French, German, Swedish and Spanish).
Enjoy!
Rgds,
Tilleke

Reserved for future use

Hopefully this evening, or tomorrow at the latest, I will upload a new release of the spellchecking-library:
1) It will let you pass a word to the library for verification, and the library will return the suggestions without "interfering" with your application. In this way, there is no further need for a textbox-control in your application and you can apply spellchecking to other controls as well (such as the Webbrowser-control).
2) If I find the time, I will also add other languages in this release; otherwise they will follow in the next release.
By the way, has anyone tested it yet? If yes, does it work? Any problems? Please let me know.
rgds,
tilleke

tilleke,
Thanks for this. Looks great, especially the foreign language capability. I would like to add Thai and Lao to the libraries.
Thanks.

Hmm... I'd love to be able to add support for Thai and Lao, but I foresee a few problems:
1) In order to do so, I would need word lists (dictionaries) in those languages which are free to use/distribute. If you have any, please let me know. I tried to google for some but I couldn't find any.
2) I couldn't locate an emulator supporting Thai or Lao, which I would need in order to work with Thai fonts. I guess there must be some kind of support for Unicode:
ก ข ฃ ค ฅ ฆ ง จ ฉ ช ซ ฌ ญ ฎ ฏ ฐ
ฑ ฒ ณ ด ต ถ ท ธ น บ ป ผ ฝ พ ฟ ภ
ม ย ร ฤ ฤๅ ล ฦ ฦๅ ว ศ ษ ส ห ฬ อ ฮ
In any case, if one found a dictionary, then maybe the font problem could be resolved in one way or another.
3) I don't know if the "techniques" mentioned by me in my first post, can be applied to the languages of Thai and Lao...

For Thai there are a couple of SIPs available: Thaiwince and Thai-G. I don't have the links handy, but a search on Google will find them.
I don't use either of them. What I did was copy tahoma and tahomabd from the WINDOWS\FONTS folder on the desktop, open them in Font Creator and add the Thai and Lao fonts. I then copied the new fonts to the phone's WINDOWS directory, overwriting the existing fonts. I use Resco Keyboard Pro to enter Thai and Lao text.
I can post the fonts and the Lao language skin if you want them. I will also find some word lists. BUT after thinking more about the methodology in your first post, I don't think it will work. Thai and Lao only have spaces at the end of phrases and sentences, not between words.
Thanks.

Interesting. In any case, I found a word list for Thai. If you send me your e-mail address by PM, I can send it to you and you can let me know if it is any good.
Out of curiosity: How do you write in Thai the following sentences?
"Today the sun is shining. I think I will go to the beach with my friends. Do you want to come with me?"
nagbenjy said:
Thai and Lao only have spaces at the end of phrases and sentences, not between words.
Thanks.

BTW, do you know if the Thai SIPs (Thaiwince and Thai-G) or keyboards such as the one you mentioned, Resco Keyboard Pro, insert the Unicode character 'ZERO WIDTH SPACE' (U+200B) between words? If they do, then one could simplify the spell-checking.
See this page for further information:
http://blogamundo.net/dev/2006/12/28/the-zero-width-space/
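If the input method did insert U+200B between words, extracting words would reduce to a simple split. The Python sketch below shows this under that assumption; whether any of the keyboards mentioned actually emits the character is exactly the open question here. The sample words are taken from the segmented Thai sentence later in this thread.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def split_words(text):
    """Split on ordinary whitespace first, then on zero width spaces."""
    words = []
    for chunk in text.split():
        # str.split() does not treat U+200B as whitespace, so split it explicitly.
        words.extend(part for part in chunk.split(ZWSP) if part)
    return words

# Invisible word separators inside a phrase, a normal space between phrases.
sample = ZWSP.join(["วัน", "นี้", "มี", "แสง", "แดด"]) + " " + ZWSP.join(["ผม", "คิด"])
print(split_words(sample))
```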
Originally Posted by nagbenjy
Thai and Lao only have spaces at the end of phrases and sentences, not between words.
Thanks.

I am still looking into the ZERO WIDTH SPACE and will reply later.
In reply to:
"Today the sun is shining. I think I will go to the beach with my friends. Do you want to come with me?"
Depends on where you live, hot climate or cold climate. For cold climate where the sun hardly shines:
"วันนี้มีแสงแดด ผมคิดว่าจะไปชายหาดกับเพื่อน คุณอยากไปด้วยไหม"
Rough transcription, no breaks separating words, no punctuation:
"wanneemiisaengdaed phomkidwajapaichaaihaadkapphuon khunyaakpaiduaymai"
breaks separating words:
"วัน นี้ มี แสง แดด ผม คิด ว่า จะ ไป ชาย หาด กับ เพื่อน. คุณ อยาก ไป ด้วย ไหม?"
wan nee mii saeng daed. phom kid wa ja pai chaaihaad kap phuon. khun yaak pai duay mai?
Hot climate:
"วันนี้แดดจ้า ผมคิดว่าจะไปชายหาดกับเพื่อน คุณอยากไปด้วยไหม"
Rough transcription, no breaks separating words, no punctuation
wanneedaedjaa phomkidwajapaichaaihaadkapphuon khunyaakpaiduaymai
breaks separating words:
วัน นี้ แดด จ้า. ผม คิด ว่า จะ ไป ชาย หาด กับ เพื่อน. คุณ อยาก ไป ด้วย ไหม?
wan nee daed jaa. phom kid wa ja pai chaaihaad kap phuon. khun yaak pai duay mai?
Thanks for the link, interesting
NAG

Update - 17/08/2009: In the next few days I will release an updated version with support for other languages as well (starting with French, German, Swedish and Spanish).
In this regard, I need some help verifying that the suggested replacements generated by the spellchecking-engine are accurate and reasonable. I need to verify Spanish, French and German, so if any of these languages is your mother tongue, or if you know one of them very well, please send me a PM and I will send you an application that can be run on a normal desktop PC (Windows).

Ahh this is your home ...
Funny that you're working on a spell checker. I had been looking for a replacement for phatspell for a long time myself and then gave up. You can find my thread here: http://forum.xda-developers.com/showthread.php?t=350563
....

Hal_rr:
This project (library) is intended more for fellow developers who wish to add spellchecking to their applications. For end-users, there is not much use for this library since it's not a standalone program.
For the time being, this project is on hold (although it has evolved a lot compared to the features described in my first post/introduction). However, if a developer is interested in an updated version, just let me know.
Who knows, I might one day write a small text editor with spellchecking support, just for the fun of it.

Related

T9 input method - *UPDATE*

Hello everyone.
Thanks to all those people offering their services as beta testers.
I have begun work on the T9 input method. I would like some help in designing a skin. This needs to be a .bmp file. I will send you the initial bmp I have used for testing, and you should use similar layout for the buttons.
I don't mind if the layout changes, as long as it is easy to use with fingers (not stylus).
The size of the bitmap, however, must not change. It should be 240 x 160.
I included the yellow area for two function buttons (i.e., change word and number mode). 1 will do a space, and 0 will do punctuation.
To the administrators - could you possibly stick the bitmap on the site somewhere, as I don't have a server to use for downloads?
Martin,
First of all, u can use my server for storage. I figure the files r small so it won't matter. Just e-mail me the files, or put them in a zip, and I'll send u the link. Second, I'd like to see if I can come up with a nice design.
Yeah, no probs.
I've also realised the keypad used for the phone is exactly the right size for me. So if you want to reskin the phone (maybe relabel some of the buttons if it would make using it easier), then that is fine.
Everybody,
Here is the file from Martin. http://www.stoelman.com/T9_sms.bmp
Ok people... we've come up with an initial design for the keypad, and I've done the initial work on the T9 input panel. I have a suitable skin, and will be working with this to create button positions.
This skin is not final, and may be changed as we work on the functionality, but when it has been finalised, I will be opening up for new skins to be created. It is anticipated that this will be configurable on the final release, so you will not need to be disappointed if yours isn't chosen!
I am now writing the business logic. I hope to get plenty of work done this weekend, and I'll keep you all informed of progress.
Thanks
Martin
Martin,
How's your project coming along? Is it already far enough along for beta testing?
Unfortunately not yet. I'm currently waiting for some dictionary stuff. As soon as I have that, it can continue.
Regards
Martin
UPDATE
Just to update you. I have been in touch with tegic over the licensing of T9. We are currently discussing this.
I have to warn you, though, that because I have to license T9 from Tegic, there will be royalties to pay. I don't yet know the amount, but I'm guessing about £15 per unit sold. In order to provide assurances to Tegic, the software will have to be licensed based on the IMEI of the target device.
I will keep you all informed of the progress of these negotiations, if they even get beyond this stage.
Big corporates
Nice idea, I have used the t9 on many phones for years.
Siemens c35 and ME45, Sony j3e, Nokia 6310i.
Would making something similar not simply work? Or is simplifying the keypad in this way totally owned by T9?
I look forward to a bit of beta testing...
I would like to find a way around it. But it's difficult to get hold of an 'ideal' word list that is compressed, contains 'appropriate' words in the right quantity, and is indexed. I can do the indexing if a good dictionary is found.
I could also look at alternatives - just awaiting some more communication from T9.
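For readers wondering what "indexing" a word list for keypad input could mean, here is a minimal Python sketch that maps each word to the digit sequence typed on a standard phone keypad. It is a generic illustration, not Tegic's T9 algorithm and not the code from this project; it assumes plain a-z words, and `build_index` is a hypothetical name.

```python
from collections import defaultdict

# Standard phone keypad letter groups.
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {letter: digit
                   for digit, letters in KEYS.items()
                   for letter in letters}

def key_sequence(word):
    """Digits pressed to type the word, one key press per letter (a-z only)."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())

def build_index(words):
    """Map each digit sequence to the words it could stand for."""
    index = defaultdict(list)
    for word in words:
        index[key_sequence(word)].append(word)
    return index

index = build_index(["good", "home", "gone", "hood", "sun"])
print(index["4663"])
```

Several words share the sequence 4663, which is why a "change word" button is needed and why the order of the word list (most frequent first) matters.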
I have done something similar for Palm with Graffiti (Graffiti can be hell, too ;-)) and would offer u any help u need.

Auto Text or Auto Correct Applications

I've been using my Dash on the weekends at certain times instead of my TyTN and I am desperately looking for an autotext or autocorrect application. Basically, I do a tremendous amount of email on my phone, and such an application lets me type three letters in an email like "mrq" and have them replaced with "meeting request", or type "btw" and have it replaced with "by the way". I use Kai's autocorrect.net on my TyTN and it lets me write emails so much faster.
The XT9 on the Dash is an annoyance and does not perform the same function.
Qoolapps makes qcaps and qcorrect, but they only work on the Motorola Q.
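The autotext behaviour being asked for boils down to a whole-word lookup in a user-defined table. A minimal Python sketch of that idea, using the two abbreviations from the post, could look like this; it says nothing about how any of the apps named here work internally, and `expand` is a hypothetical name.

```python
import re

# User-defined abbreviation table (entries taken from the post above).
EXPANSIONS = {"mrq": "meeting request", "btw": "by the way"}

def expand(text, expansions=EXPANSIONS):
    """Replace whole-word abbreviations and leave everything else untouched."""
    def swap(match):
        word = match.group(0)
        return expansions.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

print(expand("btw, I sent the mrq"))
```

Matching whole words only avoids rewriting letters that merely appear inside a longer word.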
Bumping this up to the top again. If anyone has any suggestions, please help. I really need this.
Bump here --- (AUTOTEXT solutions for WM5/6)
I have used Kai Autocorrect, but recently it seems to have issues with some of the WM6 ROMs. I can't pin it down, so input on that would be great.
But are there any other options? It amazes me that this is such a simple yet killer app for the BB. For example, type bb = Blackberry, wm = WindowsMobile, thxm = Thanks Much!
No mouse, stylus and all that jazz, just on-the-fly word conversion and completion with a self-defined library of words. Again, AutoCorrect does most of this but has not been updated in a long time.
Thanks for anything ---
I am confused, too: you used it on an Excalibur? I thought it runs only on PPC.
ckolibab said:
I have used Kai Autocorrect but recently it seems to have issues with some of the WM6 ROM's - can't pin it down so input on that would be great.
...
Thanks for anything ---
XT9 (DB edit) or Autocorrect alternative.....
Correct -
The issue is that the WM6 Smartphone and PPC versions differ in how they do autocorrect, which for many is a brutal pain. The PPC version has a hidden file in /windows called autocorrect.0409.txt, which can be edited to add/delete words from your dictionary, so you can control it.
The Smartphone version of WM6, since it uses (ugh) XT9, relies on a different DB which is apparently tough to edit, manipulate, or do anything with.
As much as I wish the Smartphone worked like the PPC, I think I am praying for something that won't happen, so I see only 2 options that would get us close to that:
1 - Some way to get control to add/edit/change this XT9 DB
2 - An add-in app (which I have asked a dev to look at, but who says it's rough) that runs resident and gives the Smartphone the same thing that exists on the PPC version of WM6.
The sad part is that, all in all, I could see the Smartphone needing this even MORE, since its keyboards are usually smaller and anything that minimizes keystrokes is key.
This is so sad --- I would be so grateful to anyone who can help. Even some way to port some of the WM6/PPC ROM components over would be VERY NICE!!!
bump
A call to all developers ... this is the only thing I miss from my BlackBerry ....
Someone must be able to amend either QCaps or MagicKBLite to work?

MS standard/official qwerty keyboard gone in WM6.5 - Solution

Hopefully this post clears things up for those still in grief.
I (like many others) missed the MS standard/official qwerty keyboard from my Diamond and HD on WM6.1, which seems to have disappeared in WM6.5 (at least on my device and those of a great number of other users, both TP2 and D2). I use it because it gives me access to all the special characters I need (for a Nordic language) and has really handy gestures without having to install other software.
After reading through thread after thread with no solution other than 3rd party keyboard I found this:
Modify this registry:
[HKEY_CLASSES_ROOT\CLSID\{42429667-ae04-11d0-a4f8-00aa00a749b9}\IsSIPInputMethod]
"Default"="1" (by default it is "0")
After this tweak I can choose "Keyboard" from the drop down (or drop up ) menu and voila.
At least this works for me, using the stock ROM:
HTCTouchPro2_WM6.5_RUU_Rhodium_S_HTC_Europe_1.86.401.0_Radio_Rhodium_4.49.25.17_Signed_Ship
All thanks go to Omar302 (see post below, found in the Diamond2 section). He also has an interesting way for having MS keyboard on when stylus is out but otherwise using the HTC Touch Input.
Link: forum.xda-developers.com [SLASH] showpost.php?p=4874303&postcount=9
Sorry for how the link is set up. As a new user I'm not allowed to post links (moderators, please feel free to remove this restriction).

[Q] Can Hebrew work well on Tilt 2?

Hi,
I know this is a topic that has already been talked about a lot, but after going through many pages in different threads and a lot of experimenting, I still haven't found a complete solution.
I am using a Tilt 2 with a Titanium Energy ROM and am looking for a solution for Hebrew input. I have tried 3 cabs so far: "HD2_HebrewEnabled_v1.2", "HebrewHD2", and "Uniscribe".
I have discovered that some apps work better with Hebrew than others. Word, OneNote, Phatnotes, etc. have a cursor problem with all three cabs (the cursor stays to the right and has a "mirror" effect after typing, and if you try to place it in a sentence to select text or edit, it will not actually insert or select there but at a different point). Notepad and PhatPad do not have this cursor problem.
The first 2 cabs have a problem with mixed text (Hebrew & English) in Word, etc. Uniscribe has a problem of always putting punctuation to the right, even in Notepad (the first two cabs are fine with this in all apps).
Has anyone found a cab that solves all?
Thanks
UPDATE!
After doing more searching, I found another cab in this thread "forum.xda-developers.com/showthread.php?t=554802" called simply "Hebrew.cab", and it is actually better than the other 3 I had tried. It allows mixed text and does fine with punctuation; the only remaining problem is the cursor in Word, etc.
I also wanted to detail some other things I came up with in my research. To get Hebrew on the hard keyboard I am using AEk mapper. A while back I found a layout provided by a generous member (OSM or rbroudo?) and I made up my own phonetic layout, because I found using stickers very impractical. I found that I could also set a different shortcut for the Home key without interfering with AEK.
Acrobat Reader 2.0 works well with Hebrew and has a rich feature set, you just have to reverse your input when entering a search term.
Ulrich Grieve has a Hebrew Reader for viewing Palm text files.
I found an epub reader called "Freda" which also works with html that works with Hebrew. I found (on html at least) that the settings that worked are "Force Line Direction = RTL", "Force Word Direction = LTR" (I also set encoding to utf8 but don't know if that is necessary).
Perhaps this will be useful for you (if it's been updated for 6.1/6.5):
http://www.penreader.com/pocket-pc/hebrew/Language_Extender_Hebrew.html
AlecTaylor said:
Perhaps this will be useful for you (if it's been updated for 6.1/6.5):
http://www.penreader.com/pocket-pc/hebrew/Language_Extender_Hebrew.html
Click to expand...
Click to collapse
Thanks for the response
I checked the website and the highest version listed is 6.0. In any case, my experience with that software in the past wasn't so great and I am looking for a free solution.
All in all, despite a few quirks, I am pretty happy with the Hebrew situation now. WP7 doesn't support Hebrew at all. Symbian doesn't have anywhere near all the options I mentioned in post #2, nor does Maemo on the N900.
Does anyone know how well Android is doing?
Hebrew for Touch pro 2
Hi
I'm a newbie. I would like to be able to read and write Hebrew on the Touch Pro 2
(without changing any menus).
What files do I need to install? (Can you post a link?)
(I tried installing the "Hebrew support only.cab" file from the_dude.
I get Hebrew words but not RTL.)
Which ROM are you using?
I found the attached .cab file to work very well with the "Energy" ROM (I use the Titanium with enhanced Mobile Shell), I don't know if it works as well with the stock ROM.
(see my earlier post #2 for more tips)
If it doesn't work well I could give you other links

Bilingual post spelling options

The HTC One X has the option to underline badly spelled words. It seems to be related to the system language, and therefore it does not seem to be possible to make it bilingual. I would like to have the system check for both Dutch and English, but it seems I have to choose based on my system language. On the other hand, text prediction (so not the post-spelling check!) is able to work with two languages at a time.
Therefore my question: is it possible to set the post-spelling options to both Dutch and English, or will I have to live with just one language (as I fear at the moment)?
Bilingual trace
I'm afraid I'm replying not with an answer, but with an additional question.
I have the impression that the bilingual feature is strictly limited to predictions/suggestions while normally using the keyboard. Because I noticed that when I'm using the Trace feature (very handy in my opinion), it reverts to only using the System Language... sad
Did anyone else notice this? Is there any way to make the second language work for Trace as well (and for the post-spelling check)?
Shouldn't be that hard to develop, so it seems like they just forgot to add this. Except if I'm missing something?
