How does a type inference work

Hindley Milner type inference
Home | News | TestPage | ForumPage | Participant | Categories | Index | Help | Settings | To change

Hindley-Milner type system

Hindley and Milner developed this type system, extensions of which form the basis of Standard ML, Haskell and several other functional programming languages. The special thing about the Hindley-Milner type system is that each piece of code has exactly one most general type and that there is an algorithm, the HindleyMilnerTypeInference, which finds this most general type. Types are:

  • primitive types (those built into the language or user-defined)
  • Type variables
  • Functions (notation: a -> b, a function with argument a and result b)
  • other type constructors (similar to Ada's generic containers, but more general)
That's all! You don't need functions with several arguments, you just take a function with one argument, the result of which is a function (a -> b -> c == a -> (b -> c)).
Type inference

The inference (is type matching a good translation?) Itself is also pretty straightforward. All variables in the program are initially given the type "a" (a is a type variable that has not yet been used), all functions already have one, as well as constants and data constructors. Now type rules are applied, which mean that certain types must be the same. In fact, there are only two rules: a function must be applied to its argument type and both have in common the result type and another rule for recursive calls. If two types are to be the same, they are set equal, so that the variables are linked to types. If the unification works, the variables are tied to the most general type in the end. If it doesn't work, there is no type at all, the program is wrong.

There are also type classes in Haskell. Usually a type variable fits every type. However, it can be restricted so that it only fits one type of the corresponding class. Types get into a class by declaring that the type belongs to the class and implementing the necessary methods.


In practical use: The Hindley-Milner type system has type constructors with parameters. For example, a list is a polymorphic type:

data list a = Nil | Cons a (List a)

read as "for any type a a list of type a values ​​is either empty (Nil) or the concatenation (cons like construct) of an element of type a with a list of type a". You can also read it like this: Nil is the end of the list, Cons says that another element comes and then it goes on. Functions can therefore also be polymorphic:

length :: List a -> Integer

"for each type a, length is a function that maps lists of type a to integers". This is also possible with several parameters. In Haskell, you can also restrict the polymorphism by means of type classes, for example so that only types are allowed that represent numbers and that can be added together. The important thing now is that every term in this system has a "most general type" that can be determined automatically. This is exactly what a Haskell compiler must be able to do. This is why type declarations are usually redundant, but more type declarations often lead to more precise error messages. By the way, Haskell doesn't know any type-cast, and I've never missed him either.

I still haven't understood the interplay of restriction and inference. What are the types of a sorted list (elements implement comparison for less than one), a hash table and a dictionary (elements implement a hash function)?
Types with restricted polymorphism are allowed, but actually nonsense. Sometimes you want to express an intention, and then you might declare a dictionary like this:

data (Ord key, Eq value) => FiniteMap key value = ...

It makes more sense for functions. Let's take a function for sorting a list:

sort :: Ord a => [a] -> [a]

read as "for every type a with an order (class Ord), sort is a function which transforms a list above type a into a list of type a." The class Ord contains the comparison operator (analogously):

class Ord a where compare :: a -> a -> Ordering

If you simply implement sort without a type declaration, you naturally use the compare function. Since the only exists in the Ord class, the restrictive "Ord a" is found from this by the type comparison. In principle, it would be practical if you could do without the type classes and simply allow overloading as in C ++. Unfortunately, this makes the type inference NP-complete (or something similarly bad, I don't have that in my head), so it's better to keep it in check. In my opinion, the documentation effect is definitely worth it.

By the way, there is also a sorting function with any criterion. It is fully polymorphic again. Note that this roughly corresponds to the qsort in C, but the latter has to fall back on (void *) and an ugly gap in the type system.

sortBy :: (a -> a -> Ordering) -> [a] -> [a] sort = sortBy compare

Another example is the FiniteMap? (aka dictionary). The data type is completely polymorphic. Since the FiniteMap? but is a search tree internally, it naturally only works if the keys can also be compared. This limits the type of each function working on it:

addToFM :: Ord key => FiniteMap key value -> key -> value -> FiniteMap key value deleteFromFM :: Ord key => FiniteMap key value -> key -> FiniteMap key value unionManyFM :: Ord key => [FiniteMap key value ] -> FiniteMap key value

Some also work without restriction. For example, you can run through the dictionary without comparing keys. Sure, you rattle off the finished tree. However, this freedom is not of much use, where should you ever get a FiniteMap? got no order with key type? After all, the empty and the single-element dictionary are available without this type restriction.

foldFM :: (key -> value -> a -> a) -> a -> FiniteMap key value -> a

That may sound a bit screwed up and in case of doubt it looks like a lot of paperwork, but it is very practical. First, you never intentionally write mistyped programs because they wouldn't work anyway; second, you are never obliged to actually write down the types. Usually they are also quite short.
Would something like that be conceivable in a language like Smalltalk or Java?

In my opinion, that doesn't fit small talk. Smalltalk thrives on sending messages "blindly" to objects, regardless of their type or class. You can even create lists of objects that are not at all related to each other and send the same message to each of them, some of the objects may not understand them at all, etc. None of this can be captured with static typing. Yes, maybe it works, but the result would not be more small talk. And Java ... Java took over the object model from Smalltalk and then imposed the poor type system from C (without ++). In Java one is always inclined to declare all interfaces with "Object" only and is then forced to cast when the method is called. To be honest, I think Java is beyond salvage anyway. If you think for yourself, you take (depending on your taste and problem) Smalltalk, C ++ or ML.

Type inference also works with dynamic typing. A type inferenceer is integrated in the RefactoringBrowser for SpeechSmalltalk. It's a bit hidden though. It creates a class comment in which all possible types are specified for each member variable. However, the inferenceer performs considerably less than would be possible with statically reliable typing and especially with functional programming. Type annotations in the style of what the RefactoringBrowser suggests are required in Smalltalk in order to simplify the Rountrip engineering using a case tool.
Heterogeneous containers

What is the type for a hash table (elements implement hash function)?

Do you mean a heterogeneous table of different things that have a hash function but otherwise have nothing in common?

If so: that is not possible. In the Hindley-Milner system there are only homogeneous containers. Usually that doesn't matter: at some point I'll get things out of the container again, and then I want to do something with them, and not cast them to anything! Whatever is then possible, I can do beforehand and put functions in the container. Then it is homogeneous again.

However, if you have a certain selection of types in mind, you can define a type that combines these types. It's like a type-safe union in the language Cee or variant data networks in the language Modula2 ?. One can say with

data TypMix = Text String | Fraction Double | Whole number Integer
use the three types String, Double and Integer as alternatives. However, this should not be misused to simulate dynamic typing!
Home | News | TestPage | ForumPage | Participant | Categories | Index | Help | Settings | To change
Change the text of this page (last changed: April 7, 2005 14:24 (diff))