Techie Help Needed with Wiki work

Lexicography, the art of writing dictionaries, has always been 99% perspiration and 1% inspiration. But the ground has shifted massively in recent decades. Now much of the sweat can be absorbed by computers.

I wanted to put this request out to techie-minded friends of our work: can anyone find a way for us to automate the hyperlinking of words?

Let me explain: in Wiktionary, and Wikipedia too, a term can be further explored by hopping to the page for that particular word. This is a crucial advantage of online dictionaries: if you don’t understand a word within a definition, just click on it and you will be taken to the entry for that word.

But the Kurdish Wiktionary often imports data from printed dictionaries – and these definitions do not contain hyperlinks.

Consider an example from the English Wiktionary:

———————————————–

step ladder (plural step ladders)

ladder with steps or treads instead of rungs that is hinged in the middle…
———————————————–

If you are curious as to what a rung is, just click on ‘rung’ and you’ll be able to see.

If you click the edit icon, you will see that the code for this entry reads as follows:

# A [[ladder]] with [[step]]s or [[tread]]s instead of [[rung]]s.

You will notice that it is not the word ‘rungs’ that is hyperlinked, but rather ‘rung’, because the headword to look up is ‘rung’, not ‘rungs’. So a dictionary editor has to take care over exactly what gets hyperlinked. And it’s a judgment call as to which words to hyperlink, because it is clearly not useful to hyperlink common words like ‘a’, ‘with’ or ‘of’.

But my basic question here is: how can we automate a lot of this work? Can we at least set up a shortcut so that all we have to do is highlight a word and press something like Alt-H in order to hyperlink it, i.e. insert double square brackets round it?
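
To make the request concrete, here is a minimal Python sketch of what such a shortcut would do to the source text. The function names `link_selection` and `link_to_headword` are made up for illustration, and the sketch assumes the selection arrives as character offsets into the wikitext. It also assumes that letters typed directly after the closing brackets are displayed as part of the link, as in the `[[rung]]s` example above; that behaviour can differ from one language edition to another.

```python
def link_selection(text: str, start: int, end: int) -> str:
    """Wrap the selected span text[start:end] in [[ ]] brackets."""
    return text[:start] + "[[" + text[start:end] + "]]" + text[end:]


def link_to_headword(word: str, headword: str) -> str:
    """Link an inflected word to its headword.

    If the word is just the headword plus trailing letters, keep those
    letters outside the brackets ([[rung]]s). Otherwise fall back to a
    piped link of the form [[headword|word]].
    """
    if word == headword:
        return f"[[{word}]]"
    suffix = word[len(headword):]
    if word.startswith(headword) and suffix.isalpha():
        return f"[[{headword}]]{suffix}"
    return f"[[{headword}|{word}]]"


print(link_selection("instead of rung", 11, 15))
print(link_to_headword("rungs", "rung"))
print(link_to_headword("mice", "mouse"))
```

A text-editor macro or a browser script bound to Alt-H would only need to call something like `link_selection` with the current selection.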

I know that you can choose between visual editing and source editing.  The former does automate some of the tasks, but we have found that once you know the code fairly well it’s quicker to do it using the source code.  But tailor-made shortcuts would be a godsend.

Please enter comments if you have any ideas.

6 thoughts on “Techie Help Needed with Wiki work”

  1. Hi Jerry. The highlighting and shortcut option should be quite straightforward to set up. Rather than editing directly in a web browser, you would need to copy and paste the whole Wiktionary page code into a standalone text editor that supports macros. When done editing, you can then paste the entry back into the Wiktionary web page and submit it. If you already have Microsoft Word, you could use that (with its macro recording option); alternatively, there are other free or cheap text editors that support macros. Anything more intelligent would get a lot more complex. I can’t see that it’s feasible to write software that could intelligently identify key words to hyperlink. It would be easier, but still quite tricky, to make software that automatically identifies the singular form of a plural word (e.g., knowing to retain the “e” in “faces” but not “dishes”). The best approach would seem to be a program that scans Wiktionary for each word in the definition and cross-links all words that already have an entry (but this approach would not filter out trivial words). Tim
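
The cross-linking idea in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set `headwords` stands in for a list of existing entries that would have to be obtained separately, the function names are hypothetical, the plural rules are naive English ones, and the input is assumed to contain no links yet. Checking each candidate against the existing entries is what settles the “faces” versus “dishes” question.

```python
import re


def candidate_headwords(word: str) -> list[str]:
    """Return possible headwords for an English word, tried in order."""
    candidates = [word]
    if word.endswith("ies") and len(word) > 4:
        candidates.append(word[:-3] + "y")  # ponies -> pony
    if word.endswith("es") and len(word) > 3:
        candidates.append(word[:-2])        # dishes -> dish
    if word.endswith("s") and len(word) > 2:
        candidates.append(word[:-1])        # faces -> face
    return candidates


def auto_link(definition: str, headwords: set[str]) -> str:
    """Link every word whose candidate headword has an existing entry."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        for candidate in candidate_headwords(word.lower()):
            if candidate in headwords:
                if word.startswith(candidate):
                    # Trailing letters stay outside the brackets.
                    return f"[[{candidate}]]{word[len(candidate):]}"
                # Spelling or case differs from the headword: piped link.
                return f"[[{candidate}|{word}]]"
        return word  # no matching entry, leave the word alone

    # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
    return re.sub(r"[^\W\d_]+", replace, definition)


entries = {"ladder", "step", "tread", "rung", "face", "dish"}
print(auto_link("A ladder with steps or treads instead of rungs.", entries))
```

As the comment says, nothing here filters out trivial words: if the entry list contained ‘with’ or ‘of’, they would be linked too.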

  2. I agree with Tim that editing the source code in an external text editor like Notepad++ is a quick way to get access to a lot more tools, such as macros.

    Another approach would be to write a browser extension that provides some helpful functions:
    – keyboard shortcut to hyperlink / unhyperlink the currently selected word
    – another keyboard shortcut to automatically hyperlink all words in the article

    Hyperlinking all words would be overkill, but maybe with the keyboard shortcut to unhyperlink the selected word, the job could get done more quickly.

    Another feature that could help with that task is setting a minimum length for words to auto-hyperlink; e.g. all words over 3 letters.

    Intelligently parsing a Kurdish word to find the relevant root or stem, so as to link to a valid Wiktionary entry, would be complex. You could, however, have the extension check for an existing Wiktionary entry matching the word exactly as it occurs, as Tim said; and also for entries whose headword matches the beginning of the occurring word, or that share a common prefix of a certain minimum length with it (or that match it up to a suffix of a certain maximum length).
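
A rough Python sketch of the minimum-length and prefix-matching heuristics described in this comment. The function names, the parameters `min_length`, `min_prefix` and `max_suffix`, and the `headwords` set are all assumptions made for illustration; matching is case-sensitive and the input is assumed to be unlinked text. It covers only the case where a headword matches the beginning of the word.

```python
import re
from typing import Optional


def longest_headword_prefix(word: str, headwords: set[str],
                            min_prefix: int = 3,
                            max_suffix: int = 3) -> Optional[str]:
    """Return the longest headword that begins the word, or None.

    At most max_suffix trailing letters may be dropped, and the
    remaining prefix must be at least min_prefix letters long.
    """
    shortest = max(min_prefix, len(word) - max_suffix)
    for end in range(len(word), shortest - 1, -1):
        if word[:end] in headwords:
            return word[:end]
    return None


def suggest_links(text: str, headwords: set[str],
                  min_length: int = 4) -> str:
    """Link words of at least min_length letters to a matching headword."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        if len(word) < min_length:
            return word  # too short, treated as trivial
        head = longest_headword_prefix(word, headwords)
        if head is None:
            return word
        return f"[[{head}]]{word[len(head):]}"

    return re.sub(r"[^\W\d_]+", replace, text)


entries = {"a", "of", "ladder", "step", "tread", "rung"}
print(suggest_links("A ladder with steps or treads instead of rungs.", entries))
```

These are only guesses for a human to review: a prefix match can easily pick the wrong entry, which is where an unhyperlink shortcut would come in.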

    What browser do you use? I couldn’t find any existing Chrome extensions to do this, but there are some related ones that could be used as a starting point if someone wanted to develop a special extension for this purpose. It seems like it would be useful far beyond Kurdish. Related extensions:

    https://github.com/opr/WiktionarySearch
    https://github.com/falcondai/chrome-ext-wikipedia

  3. The point of the above is that by automating educated guesses about which words to hyperlink and what entries to hyperlink them to, the amount of work left for the human would be reduced to a fraction of what it would have been.

    The advantage of a browser extension would be that as you continue to work on an article, you wouldn’t have to keep copying and pasting back and forth between the browser and the external text editor. The drawback is that an extension may be harder to develop than text editor macros, though that depends on various factors.

    Regarding trivial words, you might get a fair amount of leverage by just letting the user specify a list of words to ignore. The list could be added to over time. There could be a keyboard shortcut for a function that permanently adds the selected word to the list of trivial words, and unhyperlinks all occurrences of that word in the current article.
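
The ignore-list idea could look something like the following Python sketch. The class name `IgnoreList` and the helper `unlink_word` are hypothetical, and the list is kept in memory only; saving it between sessions, which “permanently” would require, is left out here.

```python
import re


def unlink_word(wikitext: str, word: str) -> str:
    """Remove [[word]] and [[word|label]] links, keeping the visible text."""
    pattern = re.compile(
        r"\[\[" + re.escape(word) + r"(?:\|([^\]|]*))?\]\]",
        re.IGNORECASE,
    )

    def replace(match: re.Match) -> str:
        label = match.group(1)
        if label is not None:
            return label              # [[with|With]] -> With
        return match.group(0)[2:-2]   # [[with]] -> with

    return pattern.sub(replace, wikitext)


class IgnoreList:
    """A growing list of trivial words that should never be linked."""

    def __init__(self, words=()):
        self.words = {w.lower() for w in words}

    def __contains__(self, word: str) -> bool:
        return word.lower() in self.words

    def add_and_unlink(self, word: str, wikitext: str) -> str:
        """Add a word to the list and unlink it throughout the article."""
        self.words.add(word.lower())
        return unlink_word(wikitext, word)


trivial = IgnoreList(["a", "of"])
article = "A [[ladder]] [[with]] [[step]]s or [[tread]]s."
print(trivial.add_and_unlink("with", article))
```

An auto-linking function would then skip any word for which `word in trivial` is true.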
