On Universal Construction and (co-)Products

Presented on June 22, 2017
Presenter: Mika

Preview

As we have talked a lot about category theory around the lab, we’ll be trying to bring it closer to home in the weeks to come, with minimal talk about what category theory is and more talk about how to use it in PL. This Thursday, we’ll cover universal constructions through several examples, the most important of which are products and co-products, directly translatable into type theory. I’ll also take a swing at limits and co-limits, as generalizations of the concepts that can be built by universal construction. I don’t think I’m going to prepare a cool visual presentation, preferring to use the board instead.

Summary

Categories, functors and natural transformations

Here’s a quick summary of what category theory is about. A category C consists of a class of objects, which are abstract enough to just have a name and a representation (generally, a dot), a class of morphisms, which are arrows between pairs of objects of the category, and two rules:

  1. composition: any two composable morphisms f :: a -> b and g :: b -> c compose into g o f :: a -> c, and composition is associative;
  2. identity: every object a has an identity morphism id_a :: a -> a, which is a unit for composition.
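In the category of Haskell types and functions, both rules are already familiar: composition is (.) and the identity morphism is id. A minimal sketch, checking the laws pointwise on sample functions:

```haskell
-- In the category of Haskell types and functions, morphisms are
-- ordinary functions, composition is (.), and every object (type)
-- has the identity morphism id.

-- Associativity: h . (g . f) == (h . g) . f, checked pointwise.
assocHolds :: Int -> Bool
assocHolds x = (h . (g . f)) x == ((h . g) . f) x
  where f = (+ 1); g = (* 2); h = subtract 3

-- Identity: f . id == f and id . f == f, checked pointwise.
identityHolds :: Int -> Bool
identityHolds x = (f . id) x == f x && (id . f) x == f x
  where f = (+ 1)
```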

Categories can represent a wide variety of both mathematical objects and models of non-mathematical objects.

A method which is often used to find features in category theory is universal construction. It has three steps:

  1. define a pattern consisting of some objects and morphisms;
  2. find all instances of said pattern in the category;
  3. define a ranking, a (potentially partial) ordering of everything you found;

The top of that ranking now represents the feature that we wanted to define. Universal construction thus never talks of the concrete choice of category, objects or morphisms (hence the name). It can be shown that many important features, like initial and terminal elements, products and sums (which is what we’ll focus on), can be found using universal construction.

As programming languages people, we will mostly be interested in categories whose objects are types and whose morphisms are functions between types. We can imagine several important types in our category: for example, for every type there is a function into the unit type ():

unit :: a -> ()
unit _ = ()

This is a family of arrows of the form forall x in objects(C) . x -> (). We could also distinguish an uninhabited type Void. We can imagine a function (called absurd in Haskell):

absurd :: Void -> a

which is, again, a family of arrows forall x in objects(C) . Void -> x in our category. This function can be constructed but cannot be called, as nothing can supply an element of Void. These two types are special: Void is the initial object (a unique arrow goes from it to every object), and () is the terminal object (a unique arrow goes to it from every object). In a similar way, we can think of Bool and other types, all represented as single objects in C.
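Both special arrows can be sketched directly in Haskell; base ships the same Void and absurd in Data.Void, but they are redefined here to stay self-contained (the empty case expression requires the EmptyCase extension):

```haskell
{-# LANGUAGE EmptyCase #-}

-- A locally defined uninhabited type (base provides the same thing
-- in Data.Void): no constructors, so no values.
data Void

-- Terminal object: one arrow into () from every type.
unit :: a -> ()
unit _ = ()

-- Initial object: one arrow out of Void into every type. The empty
-- case is exhaustive precisely because Void has no constructors.
absurd :: Void -> a
absurd v = case v of {}
```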

To introduce type constructors, we have to go a bit further and look at functors, which are mappings between categories that send objects to objects and morphisms to morphisms between the images of their source and target objects. This means that two properties are fulfilled:

F id_a = id_F a
F (x o y) = (F x) o (F y)

In this way, a functor is a structure-preserving mapping. This doesn’t mean that the mapping preserves cardinality or scope, though, as seen with the following mapping between two categories, C and D, such that C is our type-category and D is a category containing a single object, 1:

forall x in objects(C) . F x = 1
forall f in morphisms(C) . F f = id(1)

This satisfies the functor laws, as:

F id(X) = id(1)
F (x o y) = id(1) = id(1) o id(1) = (F x) o (F y)

A category like D is interesting because it describes a well-known mathematical object, namely, a monoid! If you imagine what we just did in type-theory terminology, we went from a typed language to an untyped language in which all types are 1.
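The collapsing functor above has a direct Haskell counterpart: the Const functor, which maps every type to the same carrier and every function to the identity on it. Base ships this in Data.Functor.Const; it is redefined here to stay self-contained:

```haskell
-- Const c maps every type a to (a copy of) c, and every function to
-- the identity on that copy -- this is the collapsing functor above,
-- with c playing the role of the single object 1.
newtype Const c a = Const c deriving (Eq, Show)

instance Functor (Const c) where
  -- The function argument is ignored entirely: F f = id.
  fmap _ (Const x) = Const x
```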

An interesting kind of functor that we care about is the endofunctor: a functor that goes from and to the same category. For example, a C => C mapping would be an endofunctor. This is interesting from a programming-languages stance because this is basically how we get polymorphic behaviours. For example, consider a type constructor T a: this is not a type, it is a family of types obtained by lifting each a into the context of T. For example, the Maybe data structure:

data Maybe a = Nothing | Just a

It’s important to note that there’s no Maybe type by itself. Maybe is a mapping, from any type a in C to a specific type Maybe a. For a specific a, Maybe a is an actual type, in C! The Nothing constructor carries no information, making it play the same role as the unit value in that regard.

To prove that Maybe is a functor, we can see that:

fmap :: (a -> b) -> (Maybe a -> Maybe b)
fmap _ Nothing = Nothing
fmap f (Just x) = Just (f x)

fmap (g o f) = (fmap g) o (fmap f)

fmap (g o f) Nothing 
  = Nothing
  = fmap g Nothing
  = fmap g (fmap f Nothing)

fmap (g o f) (Just x)
  = Just ((g o f) x)
  = Just (g (f x))
  = fmap g (Just (f x))
  = fmap g (fmap f (Just x))
  = ((fmap g) o (fmap f)) (Just x)
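The equational proof above covers the composition law; the identity law, fmap id = id, follows just as directly. Both can also be spot-checked on sample values (evidence, not a proof — the equational reasoning is the proof):

```haskell
-- Spot-check of the two functor laws for Maybe on concrete values.
compositionLaw :: Maybe Int -> Bool
compositionLaw m = fmap ((* 2) . (+ 1)) m == (fmap (* 2) . fmap (+ 1)) m

identityLaw :: Maybe Int -> Bool
identityLaw m = fmap id m == id m
```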

Not all such structures are functors, though, as is easily shown with any data structure that uses its type parameter contravariantly. (This isn’t a problem, as contravariant functors deal with that case.)

One more important feature that we might want to discuss is natural transformations. A natural transformation between two functors F and G is a family of components n_a :: F a -> G a, with a naturality condition added in: for every morphism f :: X -> Y in C,

n_Y o F(f) = G(f) o n_X

This condition says that, starting from F(X), we can either apply F(f) and then n, or apply n first (bringing us into G(X)) and then G(f); either way we end up at G(Y). This is also expressed by saying that the diagram of these maps commutes. As an example, we can observe the well-known function safeHead:

safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x:xs) = Just x

This function is polymorphic in a: it works for any type from the category C and never inspects the contents of a. Natural transformations are always transformations on structure, not on the content inside. To prove its naturality, we can do:

(fmap f . safeHead) []
  = fmap f (safeHead [])
  = fmap f Nothing
  = Nothing
  = safeHead []
  = safeHead (fmap f [])
  = (safeHead . fmap f) []

(fmap f . safeHead) (x:xs)
  = fmap f (safeHead (x:xs))
  = fmap f (Just x)
  = Just (f x)
  = safeHead (f x : fmap f xs)
  = safeHead (fmap f (x:xs))
  = (safeHead . fmap f) (x:xs)

The good thing about having proven naturality is that, if the compiler is aware of it, it can choose the cheaper of the two execution paths with a simple static analysis.
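The naturality square can also be checked executably, comparing both paths around the square on sample inputs (a spot check, assuming the safeHead definition above):

```haskell
-- safeHead as defined above.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Both paths around the naturality square, compared on a sample list.
naturalitySquare :: (Int -> Int) -> [Int] -> Bool
naturalitySquare f xs = (fmap f . safeHead) xs == (safeHead . fmap f) xs
```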

Product and sum types

Now that we have all that, we can make the following observation: if we look at a type that can be considered a pair, (a, b), from a categorical perspective, we are looking at a type p with arrows to both a and b. This pattern can be understood as a structure (p, fst, snd), where:

fst :: p -> a
snd :: p -> b

We call the two functions projections. However, matching this pattern does not imply that a type is an exact pair (or more generally, a product); additionally, we have to show that within the category there is no other product-like structure that is more “pure”. What this means, formally, is that if there is a structure (q, fst', snd') such that:

fst' :: q -> a
snd' :: q -> b

and there exists an arrow h :: p -> q with fst = fst' o h and snd = snd' o h, we will say that the structure q is more product-like, or pure, because p’s projections can be written as compositions of h and the projections from q. Note that the mere existence of h is not enough: the compositions have to actually equal fst and snd. As such, h can be seen as bringing impurities into the structure p.

Obviously, this construction is universal: we rank all the instances of the pattern we find by how pure they are and pick the purest one to be the product. This means that the product’s projections don’t factor through any other candidate’s projections, and are well-established within our type system as the purest way to get from the product to either a or b.
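The universal property can be written down directly in Haskell, following the factorizer from Milewski’s blog: for any candidate c with projections p and q, there is a unique arrow into the true product (a, b) through which both projections factor:

```haskell
-- Given any candidate c with projections p and q, build the unique
-- arrow m :: c -> (a, b) through which both projections factor:
--   p == fst . m  and  q == snd . m
factorizer :: (c -> a) -> (c -> b) -> (c -> (a, b))
factorizer p q = \x -> (p x, q x)
```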

If we now choose to flip the arrows in our pattern, we get a type s that receives arrows from both a and b. This corresponds to an algebraic sum type: both a’s and b’s constructors finish in the type s. In a similar fashion, we can use universal construction to pick out the best instance of this pattern, and see that the purest injections mean that no other type plays any role in matching the component types of a sum type. It is worth noting (as Kyle did) that the only difference between product and sum types, besides the obvious flip of arrows, is that the potential impurities compose from different sides: in sum types, because of the arrow flip, the factorizations read i1 = h o i1', and similarly for the second injection.
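The dual factorizer can be sketched the same way (the name sumFactorizer is chosen here; it coincides with the standard library’s either): for any candidate c with injections i and j, there is a unique arrow out of Either a b through which both injections factor:

```haskell
-- Given any candidate c with injections i and j, build the unique
-- arrow m :: Either a b -> c through which both injections factor:
--   i == m . Left  and  j == m . Right
sumFactorizer :: (a -> c) -> (b -> c) -> (Either a b -> c)
sumFactorizer i _ (Left a)  = i a
sumFactorizer _ j (Right b) = j b
```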

Universally constructing universal construction

To bring the story around, it is worth noting that category theory gives us a way to define what it means to select objects (for our products and sums, for example) in a non-axiomatic way. To pick objects from a category is to provide a functor from a small indexing category to C. A discrete category is one where the only morphisms present are identity morphisms, and for our purposes we can use a discrete category with two distinguished objects, 1 and 2. Now we can define a functor that maps both of these objects into a single object x of C. Let’s call such a functor Δx (the constant functor).

Let us also define a functor that will go from the indexing category to two separate objects, which will be the projected-upon a and b. That means that, for example, 1 maps onto a and 2 maps onto b. Let’s call this functor F.

Now that we have these two functors, we can build a natural transformation between them: this natural transformation has two components, p1 :: x -> a and p2 :: x -> b. Its naturality is trivial here, because the indexing category is discrete and p1 and p2 are just morphisms. In a more general light, the indexing category doesn’t need to be discrete and can have complex structure; Δx would still map the whole category into the single object x, so the natural transformations between Δx and F have to follow additional rules, mapped out by the structure of the indexing category. As a natural transformation doesn’t care about content, only structure, we have picked out a pattern (an indexing category, Δx and F) and found all its instances, calling them cones, because the picture formed by the natural transformation between Δx and F looks very much like a cone. The best possible cone in a category is called a limit, and we talked only slightly about limits (and, dually, co-limits), as a final generalization over products (and, dually, sums).

I found it very profound to be able to do this construction, as it gave me an understanding of why people say that natural transformations are the most important concept of category theory. Categories laid the foundation for structure, functors gave us structure-preserving mappings between similar (but possibly larger or smaller) things, and natural transformations give us a way to use these mappings to establish different ways of doing things, depending on what is similar in the environment. I think this is a fact not used enough in computer science, one that might climb to the surface at some point in the future.

Attachments

I’ve been learning category theory from Bartosz Milewski and his YouTube channel.

Also, this piece of his blog.