
You're correct about Swedish. The extra letters come at the end, in the order å, ä, ö. Interestingly enough, this order is different from how Norwegian and Danish order them.

Also, as opposed to German, they are not referred to as umlauts or accents. ä is a completely different letter compared to a, and they have absolutely no relation. An indication of this is that I can't even think of a word that describes the dots over the a, similar to how most English speakers wouldn't be able to name the dot over the i.

As such, a search or sorting algorithm that treats ä and a as equivalent would be completely broken for a Swedish user. However, an English speaker who only wants to look up a Swedish word in a dictionary would definitely want that equivalence. They may not even be able to type the Swedish letters. From this I conclude that it's the locale of the user that must dictate the way comparison works, not the language of the words that are being compared. This is thankfully how locales typically work.

In Swedish, these are extremely common letters, and there are plenty of word pairs that differ only in the letters such an algorithm would conflate, such as älg/alg (elk/algae) or kö/ko (queue/cow).
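
To make the point about the user's locale concrete, here is a rough Python sketch. It is not how a real collation library works: the alphabet string, the function names, and the "anything that isn't sv gets folding" rule are all assumptions made for illustration, and it assumes the input uses precomposed (NFC) characters.

```python
# Swedish alphabet order as described above: a-z, then å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# Folding table for a user who wants å/ä treated as a and ö as o.
FOLD = str.maketrans("åäö", "aao")


def swedish_key(word):
    """Sort key where å, ä, ö are separate letters that come after z."""
    # Characters outside the alphabet sort last (an arbitrary choice).
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]


def english_key(word):
    """Sort key where the Swedish letters are equivalent to their base letters."""
    return word.lower().translate(FOLD)


def sort_for_user(words, user_locale):
    """Pick the comparison from the user's locale, not from the words."""
    key = swedish_key if user_locale.startswith("sv") else english_key
    return sorted(words, key=key)


words = ["älg", "zebra", "alg", "kö", "ko", "åka"]
print(sort_for_user(words, "sv_SE"))  # ['alg', 'ko', 'kö', 'zebra', 'åka', 'älg']
print(sort_for_user(words, "en_US"))  # ['åka', 'älg', 'alg', 'kö', 'ko', 'zebra']
```

The same list of words ends up in two different orders, and only the second one considers älg and alg equal for comparison purposes.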



It's not exactly right to say that a/å/ä and o/ö have absolutely no relation in Swedish. There are plenty of cases in the language where they act more like umlauts, such as stor/större (large/larger) or få/färre (few/fewer). Despite this, they are still always treated as separate letters.

Also, the history of the letter shapes is the same as that of the German umlauts, with e being written above a or o before it became ä or ö, and o written above a before it became å.


In one sense you're right. There is a historical relation. However, I'd argue that that relation is so obscure these days that no native speaker would even think about it, unless it's pointed out of course.

But yes, the origin of the letters is indeed from their base, where ä came from an e written on top of the a. This is clearer in Norwegian, where the same letter is written æ.

These days, very few people (outside of linguists working with the history of the language) would argue that ä is a variant of a though.


To be fair, vowel gradation (aka ablaut) is a rather weak relationship – one wouldn’t usually say, e.g., that in English i, a, and u are related just because of the conjugation of Germanic strong verbs such as sing/sang/sung.


Ablaut is unrelated in this context though. These vowels arise from a much more recent sound change (conveniently called umlaut, because it gave rise to a bunch of umlauted vowels).


Fair point, it's about back vowels becoming front vowels of the same height.


Romanian is similar to Swedish with ă, â, î, ț, ș - these are all separate letters, not accents (though they are sorted after the Latin letter they resemble, not at the end of the alphabet - a, ă, â, b, c,..., s, ș, t, ț,... ), and similarly words often differ only in those letters - in/în (flax/in), par/păr (stick/hair).

However, for Romanian it is also very common in electronic documents to replace these letters (collectively called diacritics) with their base Latin forms, and it is not always easy to predict how a document will be spelled. So, it's often useful for text searches to actually conflate them. I'm not sure, but this may also have been common in things like word indexes for books, even before computers.
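
For what it's worth, conflating them for search is not much code. A minimal Python sketch, assuming Unicode decomposition is good enough for these letters (it covers ă, â, î and both the comma-below and the older cedilla forms of ș and ț); the function names and sample sentences are made up for the example:

```python
import unicodedata


def fold_diacritics(text):
    """Casefold and strip combining marks, so 'în' and 'in' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(documents, query):
    """Return the documents that contain the query, ignoring diacritics."""
    needle = fold_diacritics(query)
    return [doc for doc in documents if needle in fold_diacritics(doc)]


docs = ["Am un păr în grădină", "Am un par in gradina", "Un stâlp"]
print(search(docs, "gradina"))  # matches the first two, however they were typed
```

The obvious cost is the one already mentioned: par and păr become indistinguishable, so this only makes sense when you can't predict how the document was typed.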

I'm curious if the same is true for Swedish.


When faced with a situation where a Swedish word has to be written without the Swedish letters, such as when writing names in broken forms online (I have one of those letters in my name, and forms that tell me to "only use alphabetic letters" are annoyingly common), people either simply use the letter without the dots, or they use formal transliteration.

The formal way (which is what is done for the international part of a passport for example) is å to aa, ä to ae, ö to oe.
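
As a small Python sketch of that mapping (the handling of capital letters and the example name are my own assumptions, not taken from any passport standard):

```python
# Formal transliteration as described above: å -> aa, ä -> ae, ö -> oe.
TRANSLIT = str.maketrans({
    "å": "aa", "ä": "ae", "ö": "oe",
    # Capitalisation of the replacement is an assumption for this sketch.
    "Å": "Aa", "Ä": "Ae", "Ö": "Oe",
})


def transliterate(name):
    """Replace the Swedish letters with their two-letter transliterations."""
    return name.translate(TRANSLIT)


print(transliterate("Björk Åström"))  # Bjoerk Aastroem
```

Unlike just dropping the dots, this is at least reversible in most cases.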


Diacritics are a mess, and most of the time don't even resemble the "original" sound.


Well, they have the advantage that people will typically understand what word was meant even if using only Latin letters, with a bit of context. If they had been entirely different unrelated glyphs, it may be much harder to understand in situations in which you are limited to the Latin alphabet.

Of course, it's debatable whether diacritics were a better solution than using letter combinations. There are some canonical replacements already - sh for ș and tz for ț - and some could have been created for the extra vowels. This is especially puzzling since we already use letter combinations instead of diacritics for the Ç sound (c-e/i) and the soft G sound (g-e/i).



