Algebraic Data Types (Sum & Product types)

Learning Goal

In this chapter we’ll understand the downside of not being able to communicate semantics of a function through the built-in types that we have seen till now (in previous chapters). We’ll try to address that problem, first by using newtype and then via product-types.

After that we will see how to restrict the values that a function can accept by using sum-types, thus making our code type-safe, correct, and less error-prone.

We will also be covering pattern-matching along the way.

By the end of this chapter, and after solving the exercises, you should be able to appreciate when to use vanilla built-in types, when to use custom types (like newtypes, sum-types, and product-types) in your code, and the pros & cons of each approach.

Structure vs Semantics - Downside of “regular” type-signatures

Let’s try to understand why we need more “complicated” types than the regular built-in types that we’ve seen till now (i.e. String, Char, Bool, Int, etc). Consider our calculateEndTime function from the previous chapter:

calculateEndTime :: Int -> Int -> Int -> (Int, Int)
calculateEndTime hr mn durationInMins =
  let (addHour, finalMins) = divMod (mn + durationInMins) 60
  in (hr + addHour, finalMins)

If we look at only the function signature, and not the variable names or the actual function definition, here is what we’ll see:

calculateEndTime :: Int -> Int -> Int -> (Int, Int)

By looking at just this type signature can you be sure about what each Int argument represents in real life? Which one is for the hour? Which one is for the minute? Which one is for the duration? Is the duration in seconds or minutes?

This is what is meant by “semantics”. By the way, here are a couple of dictionary meanings of the word:

semantics

  1. the meaning, or an interpretation of the meaning, of a word, sign, sentence, etc.:
  2. the meaning of a word, phrase, or text.
  3. the branch of linguistics and logic concerned with meaning.

While the structure of the calculateEndTime function is absolutely clear - i.e. it takes three integer arguments and returns a 2-tuple of two integers - the semantics are not clear. What does each integer argument really mean? What does each represent in real life?

Let’s take another example. Say, you are working on a shopping-cart project, and one of your team mates has written a function to prepare the content/HTML for an order confirmation email. Here’s what the type-signature looks like:

prepareConfirmationEmail :: String -> String -> [String] -> [String] -> String

I’m pretty certain that you’ll be pulling your hair out if you had to work with just this! Again, the structure is clear, and the compiler is going to enforce it for you, but that doesn’t help you much, does it?

Since, it’s your team’s code (and not some closed-source library), you can peek into the definition of the function and see this:

prepareConfirmationEmail :: String -> String -> [String] -> [String] -> String
prepareConfirmationEmail orderId subject to cc = _todo

Aha! Just by looking at the variable names you can understand the semantics of the function call. Given the type-signature along with the variable names, now you know what you need to pass where, and the compiler will enforce that you are passing the correct types. (i.e. you are not passing a list of strings to the subject argument).

Note

By the way this is why naming is very, very important in computer engineering. It is a well-known saying - There are two hard problems in computer science: cache invalidation and naming things

So, that should solve the problem then, right? Well, not quite…

Problem with API docs

Firstly, the standard doc generation tool in Haskell - haddock [1] - does not generate API docs with variable names. This is one of the problems that you might already be facing. Compare the API docs of the filter function from Clojure vs Haskell:

; clojure

(filter pred coll)
-- haskell

filter :: (a -> Bool) -> [a] -> [a]

If you aren’t accustomed to reading type-signatures, you’ll definitely struggle with the Haskell API docs. On the other hand Clojure’s API docs, don’t contain any type signature, but because of the variable names - pred (for predicate) and coll for collection - make intuitive sense. The flip-side of this is, in Clojure’s case, without seeing usage examples, you cannot really be sure what type of values to pass to pred & coll. Does it work for all types of collections? i.e. lists, vectors, hashmaps, trees, or only certain kinds of collections?

So, the Haskell way of giving more importance to type-signatures is structurally superior, but semantically inferior.

But, why can’t Haskell generate API docs with variable names, like how Java does? The probable reason has to do with “function currying” and “point-free syntax”, but explaining those would be quite a detour for this chapter.

Variable names alone, do not prevent bugs adequately

Secondly, just because you know variable names does not protect you adequately from certain classes of bugs. My favourite example for this is the Mars Climate Orbiter failure:

“…communication with the spacecraft was lost as the spacecraft went into orbital insertion, due to ground-based computer software which produced output in non-SI units of pound (force)-seconds (lbf·s) instead of the SI units of newton-seconds (N·s) specified in the contract between NASA and Lockheed…”

If you’re writing a program where one Int argument is semantically different from another Int argument, say the x co-ordinate from the y co-ordinate – or in the Mars Climate Orbiter’s case, one Double (force-seconds) from another Double (newton-seconds) – just knowing the variable names will not protect you from inadvertently doing an incorrect computation. For example:

  • adding the x co-ordinate to the y co-ordinate
  • dividing a force in newtown-second with a force in pound-seconds
  • substituting someone’s name with their email address
  • adding something in INR with something in USD

Thus, we use Haskell’s advanced type-system to design custom-types that help us write better programs. Let us see how…

Positional product types

Note

In academic literature there are only “sum” types and “product” types. However, I have further split product types into two parts - positional and named/records - just to make them easier to refer to.

Let us go back to the calculateEndTime example and fix the semantics problem in the type-signature, by changing this…

calculateEndTime :: Int -> Int -> Int -> (Int, Int)
calculateEndTime = _todo

…to the following:

data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord)

calculateEndTime :: TimeOfDay -> Int -> TimeOfDay
calculateEndTime (MkTimeOfDay hr mn) durationInMins =
  let (addHour, finalMins) = divMod (mn + durationInMins) 60
  in MkTimeOfDay (hr + addHour) finalMins

-- GHCi> calculateEndTime (MkDuration 10 30) 45
-- MkDuration 11 15

Let’s break this down line-by-line:

data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord)
  • We’ve defined a new “positional product-type”

  • The type is called TimeOfDay (which is now at the same level as the built-in types like Int or Char). This means that we can actually use it in type-signatures.

  • The constructor of the TimeOfDay type is called MkTimeOfDay. This is unlike a lot of other OOP languages where the constructors must either have the same name as the type (or class) OR should have a special name. For example, here’s what a constructor looks like in Java:

    class Person {
      private String name;
      private String email;
    
      // Cosntructors in Java must have the same name as the class
      //
      Person(String name, String email) {
        this.name = name;
        this.email = email;
      }
    }
    

    And here is what a constructor in Ruby looks like:

    class Person
      # Constructors in Ruby must be named `initialize`
      #
      def initialize(name, email)
        @name = name
        @email = email
      end
    end
    

    When we talk about “sum” types we’ll see why is it possible in Haskell to have constructors with different names for the same type.

  • The MkTimeOfDay constructor “holds” two Int values.

  • Finally there is a deriving (Eq, Show, Ord) boilerplate, which we’ll understand when we cover type-classes.

    Important

    Whenever you are defining custom types, please put deriving (Eq, Show, Ord) at the tne end, otherwise GHCi won’t be able to print values of your custom type to the terminal. Why exactly is this required will become clearer when we cover type-classes.

The reason why I’m calling this type a positional product-type is because there is no explicit differentiation between the two Int values. One of them holds the hour and the other minute. But, by just looking at the type definition, we cannot be sure which holds which. We have to fallback on a strong “positional convention”, whenever you’re writing the time-of-day, the first one is hour and the second one is minutes.

Now, before we break down the new definition of calcluateEndTime let’s see what this type actually is, and how to define values of this type. Let’s play around with it in GHCi:

GHCi> data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord)

GHCi> let y = MkTimeOfDay 10 30
GHCi> y
MkTimeOfDay 10 30

GHCi> :t y
y :: TimeOfDay

GHCi> MkTimeOfDay 12 45
MkTimeOfDay 12 45

GHCi> y + 20
<interactive>:10:1: error:
    • No instance for (Num TimeOfDay) arising from a use of ‘+’
    • In the expression: y + 20
      In an equation for ‘it’: it = y + 20

Notice the last part where we tried to do y + 20. The compiler prevented us from evaluating that expression becuase it doesn’t know how to add a TimeOfDay to an Int, even though y internally holds two integers. Some type-safety has started showing up…

Also notice the output of y and :t y. The first printed the value, i.e. MkTimeOfDay 10 30 and the latter printed the type, i.e. TimeOfDay. This is completely analogous to, say, printing 123 (the value) and Int (the type).

Now, we have “wrapped” our two integers into a custom data-type called TimeOfDay that we can’t do numerical computations on. However, we need to do numerical computations on the hour & minute values that have been wrapped. How do we do this?

If this were Java, we would be reaching out for things like getters/setters (for private variables in a class), or would be directly manipulating the variables (for public variables):

Person p = Person("Saurabh", "saurabh@vacationlabs.com");

// either this....
System.out.println(p.name);

// ... or this...
System.out.println(p.getName());

However, Haskell doesn’t have this concept of getters/setters. In fact, it doesn’t even have the concept of public/private variables with respect to custom data types. Instead it has, what is called, pattern matching

Pattern matching

Pattern-matching allows you to extract “inner / wrapped” values from constructors of various types. Let’s see some pattern-matches in action:

Pattern-matching a tuple: This works because (,), (,,), (,,,), and so on, are actually constructors for 2-tuples, 3 tuples, 4 tuples, respectively. It’s just that tuples are built into the language and you can use them in the more intuitive infix notation as well (i.e. what we have been using till now).

-- using the regular "infix" notation for the (,) constructor
--
GHCi> let (x, y) = (10, 'a')
GHCi> x
10
GHCi> y
'a'

-- Using the prefix notation for the (,) constructor
--
GHCi> (,) 'a' 'b'
('a', 'b')
GHCi> let (x, y, z) = (,,) 10 20 'a'
GHCi> x
10
GHCi> y
20
GHCi> z
'a'

-- Even this works...
GHCi> let (,,) x y z = (,,) 10 20 'a'

-- And this works as well...
GHCi> let (,,) x y z = (10, 20, 'b')

Pattern-matching a positional product type: Usually, it is not possible to use the constructor for a positional product type in an infix notation. So, sticking to the prefix notation, pattern-matching for a positional product type works similarly:

GHCi> let (MkTimeOfDay hr mn) = MkTimeOfDay 10 30
GHCi> hr
10
GHCi> mn
30

GHCi> let t = MkTimeOfDay 12 45
GHCi> let (MkTimeOfDay hr mn) = t
GHCi> hr
12
GHCi> mn
45

Note

It is always a good idea to put parentheses around your pattern matches as shown in the example above. While it can work without parentheses some times, a lot of times the compiler gets confused if you use multiple-pattern matches in function arguments without parentheses. This leads to very complicated compiler errors. So, always put parentheses around pattern matches

Pattern-matches can be nested: What if we had a value of type (Char, Int, TimeOfDay)?

GHCi> let (x, y, MkTimeOfDay hr mn) = ('a', 'b', MkTimeOfDay 10 30)
GHCi> x
'a'
GHCi> y
'b'
GHCi> hr
10
GHCi> mn
30

It is not necessary to pattern-match: It is not always necessary to “destructure” a constructor into its “inner values” if you don’t need them. For example:

GHCi> let (x, y, t) = ('a', 'b', MkTimeOfDay 10 30)
GHCi> x
'a'
GHCi> y
'b'
GHCi> t
MkTimeOfDay 10 30

You can pattern-match the entire “wrapped” value as well as the inner values:

GHCi> let (x, y, t@(MkTimeOfDay hr mn)) = ('a', 'b', MkTimeOfDay 10 30)
GHCi> x
'a'
GHCi> y
'b'
GHCi> t
MkTimeOfDay 10 30
GHCi> hr
10
GHCi> mn
30

With this detour of pattern-matching out of the way, re-read the function definition of calculateEndtime would be making more sense now:

calculateEndTime :: TimeOfDay -> Int -> TimeOfDay
calculateEndTime (MkTimeOfDay hr mn) durationInMins =
  let (addHour, finalMins) = divMod (mn + durationInMins) 60
  in (MkTimeOfDay (hr + addHour) finalMins)

By the way, this is functionally equivalent to the following:

calculateEndTime :: TimeOfDay -> Int -> TimeOfDay
calculateEndTime t durationInMins =
  let (MkTimeOfDay hr mn) = t
      (addHour, finalMins) = divMod (mn + durationInMins) 60
  in (MkTimeOfDay (hr + addHour) finalMins)

(The difference is where we are doing the pattern match. In the former, we are pattern matching the function arguments directly. In the latter the function argument is t and is then pattern-matched in the let binding).

While this is a significantly better than our original function signature, there is still a slight problem with our function signature. Notice the Int argument there? Just by looking at the type-signature we still don’t know whether that is supposed to be in hours, minutes, or seconds. Let’s fix that using another custom type:

data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord)
data DurationInMins = MkDuration Int deriving (Eq, Show, Ord)

calculateEndTime :: TimeOfDay -> DurationInMins -> TimeOfDay
calculateEndTime (MkTimeOfDay hr mn) (MkDuration d) =
  let (addHour, finalMins) = divMod (mn + d) 60
  in (MkTimeOfDay (hr + addHour) finalMins)

-- GHCi> calculateEndTime (MkTimeOfDay 10 30) (MkDuration 45)
-- MkTimeOfDay 11 15

Now, in our code, if there are some places where we are representing event duration in seconds and in some places we are representing it in minutes, the compiler will prevent us from passing duration-in-seconds to this function, by mistake.

Detour: Immutability

Let’s go back to our calculateEndTime code. Did you notice that we didn’t really “update” the value of the hour & minute fields? Certainly not in the sense that you would probably be used to in languages like Java, Javascript, Ruby, or Python. For example, consider the same function written in typical Javascript:

function calculateEndTime(t, duration) {
  // btw, Javascript doesn't have a built-in divMod function...
  //
  var addHour = Math.floor((t.mins + duration)/60);
  var finalMins = (t.mins + duration) % 60;
  t.hour = t.hour + addHour;
  t.mins = finalMins;
  return t;
}

(Notice, that without reading the code (or documentation) you cannot really know what structure to pass for the t parameter).

Now, try the following with this function in Javascript:

var start = {hour: 10, mins: 30};
var end = calculateEndTime(start, 45);
console.log("end=", end);

// OUTPUT:
// end={hour: 11, mins: 15}

console.log("start=", start);
// start={hour: 11, mins: 15}
//
// ... WTF?!

Important

The Javascript function is “mutating” the arguments that you pass to it. “Mutation” of this kind is one of the biggest source of bugs.

Say, you have a startTime and duration, and you want to calculate the endTime to display all the three values (startTime, duration, endTime) to the user. However, with such bugs (originating due to mutation), you’ll end up with your startTime being sneakily changed behind your back! In small code-bases this kind of unintended mutation may be easy to manage, but it becomes increasingly hard as the code base grows - the place where a bug is being observed, and the place where the bug is being created could very well be completely different modules, written by different people!

Now, going back to calculateEndTime in Haskell, if you notice, we aren’t really mutating the hour & minute fields in the TimeOfDay that has been passed to the function. We are creating a new TimeOfDay by invoking the MkTimeOfDay constructor.

data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord)
data DurationInMins = MkDuration Int deriving (Eq, Show, Ord)

calculateEndTime :: TimeOfDay -> DurationInMins -> TimeOfDay
calculateEndTime (MkTimeOfDay hr mn) (MkDuration d) =
  let (addHour, finalMins) = divMod (mn + d) 60
  in (MkTimeOfDay (hr + addHour) finalMins)

Try replicating the same Javascript bug in Haskell. You will be unable to! Haskell’s immutability protects us from such bugs.

GHCi> let startsAt = MkTimeOfDay 10 30
GHCi> let endsAt = calculateEndTime startsAt (DurationInMins 45)
GHCi> endsAt
MkTimeOfDay 11 15
GHCi> startsAt
MkTimeOfDay 10 30

Important

And this is what makes Haskell an immutable language. Once a variable is defined you cannot change it’s value. However, using that variable, you can compute/create a new value, and assign it to a new variable (or return it as a result of a function). In fact, it’s a good idea to stop calling them variables, because of the connotation that you would have already built up. It’s better to start calling them bindings instead - once a name is bound to a value, it cannot be bound to a new value within the same scope.

The last part is very important. You can have the same variable name across different funtions, because they are different scopes. In fact, this is how you can keep doing let x = _something over & over again in GHCi. It is because GHCi creates a new scope for every statement that you issue. This is to make interactive evaluation simpler. However, try doing the followoing in a Haskell source file (the compiler won’t allow it):

someFunction :: Int -> Int
someFunction x =
  let var1 = _something
      var1 = _somethingElse
  in _todo

One point to note is that nested functions are technically different scopes. So you can have bindings with the same name in the inner & outer functions, but the compiler will emit a warning when you do that. This is because the binding in the inner function is ambiguous and may be the cause of inadvertent bugs. For example:

import Data.Char (toUpper)

toUppercase :: String -> String
toUppercase str = map (\str -> toUpper str) str

While the example given above will compile, you can probably spot the inherent ambiguity in what str refers to inside the lambda passed to map. The compiler will emit a warning, but will still go-ahead and implicitly use the innermost str. This means, in the toUppercase function-body str will refer to the value passed as the first argument, except within the lambda, where str would refer to the value passed to the lambda, instead. This implicit assumption is not always a good idea and can be the cause of subtle bugs.

+------------------------------------------------------+
| OUTER SCOPE           +-----------------------+      |
|                       | INNER SCOPE           |      |
| toUppercase str = map | (\str -> toUpper str) | str  |
|                       +-----------------------+      |
+------------------------------------------------------+

This ambiguity is easily resolved by one of two approaches:

  1. Do not use overlapping binding names. Again, this does not mean that you can not, or should not, use the same name in different functions/scopes. Just don’t use same names in overlapping, or nested, scopes.
  2. If you cannot come up with a completely new name to disambiguate (remember, naming things is a hard problem in computer science!), you can use a suffix of _ or ' to create a different name, eg. customer and customer_, or user and user'

Named product types OR records

Let’s go back to the original defintion of TimeOfDay and what we said about it:

“The reason why I’m calling this type a positional product-type is because there is no explicit differentiation between the two Int values. One of them holds the hour and the other minute. But, by just looking at the type definition, we cannot be sure which holds which. We have to fallback on a strong “positional convention”, whenever you’re writing the time-of-day, the first one is hour and the second one is minutes.”

With TimeOfDay we could get away with positional semantics because the convention of stating the hour before the minute is almost universal. However, that is not always the case. Take the example of dates. Some countries use dd/mm/yy (eg. India), while some countries use mm/dd/yy (eg. UK?), and some countries even use yy/mm/dd (eg. Japan). What happens if you use a positional product type to represent a date in your application?

data Date = MkDate Int Int Int deriving (Eq, Show, Ord)

If you’re an Indian, you would assume dd/mm/yy semantics, whereas your team-mate, who’s working out of UK, would assume mm/dd/yy. If you were on the Mars Orbiter team, you’d surely end-up causing a major confusion around launch dates!

Thus, the motivation for using named product types aka records for cases where positional semantics are not universal, or can confusion at a later date (or when the maintainer of the code changes).

Let’s see how we can use records in the date example above:

data Date = MkDate {
  dtDay :: Int
, dtMonth :: Int
, dtYear :: Int
} deriving (Eq, Show, Ord)
  • We created a new record-type, called Date
  • The constructor is called MkDate
  • The constructor wraps three named fields called dtDay, dtMonth, & dtYear all of the type Int
  • If you notice, all the fields are prefixed with dt and are not simply called day, month, and year. The reason will be explained in Gotcha: Field names

Let’s play around with this type in GHCi:

GHCi> data Date = MkDate {dtDay :: Int, dtMonth :: Int, dtYear :: Int} deriving (Eq, Show, Ord)
GHCi> let d = MkDate{dtDay = 1, dtMonth = 1, dtYear = 2017}
GHCi> d
MkDate {dtDay = 1, dtMonth = 1, dtYear = 2017}

-- Notice these "field-accesor" functions that the compiler gives us for free...
GHCi> :t dtDay
dtDay :: Date -> Int

GHCi> :t dtMonth
dtMonth :: Date -> Int

GHCi> :t dtYear
dtYear :: Date -> Int

-- Let's use these field-accesssors to access individial fields...
GHCi> dtDay d
1
GHCi> dtMonth d
1
GHCi> dtYear d
2017

The field-accessor functions that the compiler gives us for free (for each record field), is the closes that you can get to Java’s “getters” in Haskell.

Pattern matching

Since Date also has a constructor it should be possible to pattern-match it, right? Absolutely right!

GHCi> let dt = MkDate{dtDay = 1, dtMonth = 1, dtYear = 2017}
GHCi> let (MkDate {dtDay = d, dtMonth = m, dtYear = y}) = dt
GHCi> d
1
GHCi> m
1
GHCi> y
2017

However, unlike positional product-types where you are forced to pattern-match all values of a constructor, pattern-matching a record-constructor allows us to omit fields. Example:

GHCi> let (MkTimeOfDay hr) = MkTimeOfDay 10 30

<interactive>:94:6: error:
    • The constructor ‘MkTimeOfDay’ should have 2 arguments, but has been given 1
    • In the pattern: MkTimeOfDay hr
      In a pattern binding: (MkTimeOfDay hr) = MkTimeOfDay 10 30

-- We are not pattern matching dtDay & dtMonth
--
GHCi> let (MkDate {dtYear = y}) = MkDate{dtDay=31, dtMonth=12, dtYear=2017}
GHCi> y
2017

Record updates

In Detour: Immutability we saw how we can’t mutate values in Haskell. Using existing values to create new values (rather than mutating existing values), wasn’t a problem with positional products, because, realistically, positional product types are only used with a small number of fields/values. For example, it’s not a good idea to use a positional product for representing a Customer value:

data Customer = Customer Int String String String Int String String deriving (Eq, Show, Ord)

You’d much rather use named products for a type with more than five values (it’s a guideline, not a hard rule):

data Customer = Customer
  { customerId :: Int
  , customerFirstName :: String
  , customerLastName :: String
  , customerEmail :: String
  , customerBirthYear :: Int
  , customerCity :: String
  , customerCountry :: String
  } deriving (Eq, Show, Ord)

So, for small positional products you can do the following without much hassle:

-- Representing weigth in pounds & ounces. Take a look at
-- https://en.wikipedia.org/wiki/Imperial_units#Mass_and_weight to
-- understand what's going on.
--
data ImperialWeight = MkImperialWeight Int Int Int Int deriving (Eq, Show, Ord)

addWeights :: ImperialWeight -> ImperialWeight -> ImperialWeight
addWeights (MkImperialWeight p1 o1 d1 g1) (MkImperialWeight p2 o2 d2 g2) =
  let (addDrachm, finalGrain) = divMod (g1 + g2) 7000
      (addOunce, finalDrachm) = divMod (d1 + d2 + addDrachm) 256
      (addPound, finalOunce) = divMod (o1 + o2 + addOunce) 16
  in (MkImperialWeight (p1 + p2 + addPound) finalOunce finalDrachm finalGrain)

However, for large records, doing something similar is going to get irritating rather quickly:

import Data.List (dropWhile, dropWhileEnd)
import Data.Char (isSpace)

data Customer = MkCustomer
  { customerId :: Int
  , customerFirstName :: String
  , customerLastName :: String
  , customerEmail :: String
  , customerBirthYear :: Int
  , customerCity :: String
  , customerCountry :: String
  } deriving (Eq, Show, Ord)

removeSpaces :: String -> String
removeSpaces s = dropWhileEnd (\x -> isSpace x) (dropWhile (\x -> isSpace x) s)

cleanupName :: Customer -> Customer
cleanupName MkCustomer{customerId=cid, customerFirstName=fn, customerLastName=ln, customerEmail=e, customerBirthYear=by, customerCity=cy, customerCountry=co} =
  MkCustomer
    { customerId = cid
    , customerFirstName = (removeSpaces fn)
    , customerLastName = (removeSpaces ln)
    , customerEmail = e
    , customerBirthYear = by
    , customerCity = cy
    , customerCountry = co
    }

This is solved by the “record update syntax”. However, please note, no real “updation” or “mutation” happens even when you use the record update syntax. A new record is always created, just that certain fields will contain different values.

Let’s see this in action - rewriting the cleanupName function shown above:

cleanupName :: Customer -> Customer
cleanupName cust@MkCustomer{customerFirstName=fn, customerLastName=ln} =
  cust
    { customerFirstName = (removeSpaces fn)
    , customerLastName = (removeSpaces ln)
    }

Or, another way of writing the same thing with accessor functions, instead of pattern-matching:

cleanupName :: Customer -> Customer
cleanupName cust =
  cust
    { customerFirstName = (removeSpaces (customerFirstName cust))
    , customerLastName = (removeSpaces (customerLastName cust))
    }
  • Instead of using the MkCustomer constructor, we using an existing cust value to update
  • Intead of providing all the fields within the curly braces, we provided only those fields that we wanted to change.

Important

Please verify that the Customer record being passed to the cleanupName function is not getting mututated irrespective of which technique you use to write the code.

Note

This section makes it sound as if positional products don’t have any way to “update” them and that you always have to recreate them from scratch. That is not completely true. We will see a unified way of changing values in any data-structure using “lenses” in a later chapter.

Gotcha: Field names

If you’ve noticed, in all our code examples, records always had field-names prefixed with a common name. Example:

-- all field names prefixed with `dt`
--
data Date = MkDate {dtDay :: Int, dtMonth :: Int, dtYear :: Int} deriving (Eq, Show, Ord)

-- instead of
--
data Date = MkDate {day :: Int, month :: Int, year :: Int} deriving (Eq, Show, Ord)

This is because of a limitation in Haskell’s record-types. You cannot define records with the same field-name in the same module. For exampe, try doing the following in your editor and loading your file in GHCi: (note, doing this in a file, and loading the file in GHCi is important. Directly typing out this code in GHCi will not uncover this limitation)

module MyRecordTypes where

data Date = MkDate {day :: Int, month :: Int, year :: Int} deriving (Eq, Show, Ord)

data Copyright = MkCopyright {author :: String, year :: Int} deriving (Eq, Show, Ord)

You will get an error, and the compiler won’t allow it. One way to intuitively understand this limitation is to see what’s going on with respect to field-accessor functions. The Date type will automatically give you the following three field accessors:

-- Inspect the types of `day`, `month`, and `year`, via `:t` in GHCi
-- to confirm this
--
day :: Date -> Int
month :: Date -> Int
year :: Date -> Int

Similarly, the Copyright type will also attempt to give you the following two field accessors…

author :: Copyright -> String
year :: Copyright -> Int

… and, thus, cause a compile error. You can’t define functions having the same name but different type-signatures in Haskell. This is not to say that function overlaoding is not possible in Haskell. It is possible, but it is very different from how it’s done in Java or C++. We will cover it along with type-classes and will see how it is possible to use the same “name” for different record fields, via lenses.

Important

To sum it up, if you’re sticking with basic Haskell, and not going crazy with advanced type-level hackery, it is strongly recommended you prefix all fields in your record with a common name. And use camelCase. For example, if your field is firstName, call it userFirstName, and not userfirstName, or user_firstName.

Sum types

Let’s motivate the need for sum-types through an example. Say, we were building a typical user signup flow and needed a way to represent a user in our code. We could try using a positional-product…

data Customer = MkCustomer String String String String String String deriving (Eq, Show, Ord)

…but that wouldn’t take us very far, because we would keep getting confused about the semantics of each String value. Positional products would not be appropriate because the semantics of each field would not be clear. So, we go with a named product type, aka, record:

data User = MkUser
  { userEmail :: String
  , userFulllName :: String

  -- NOTE: DON'T ever do this in production code. You should be using an
  -- appropriate type exported by the `bcrypt` library for passwords.
  --
  , userPassword :: String

  , userPostalCode :: String
  , userStatus :: String
  , userVerificationCode :: String
  } deriving (Eq, Show, Ord)

Now, as most typical user signups go, a user cannot fill up a registration form and start using your service/app immediately. There is an verification step, where a code is send via email or SMS, which the user has to enter on a webpage. Once the verification step is complete, the user is deemed as active or verified. Further, due to misuse of your app/service/platform, you may want to deactivate a user account as well, thus preventing any further access.

All of this complexity is usually represented by a value stored in the userStatus field.

Now, let’s try writing the registerUser, verifyUser, and deactivateUser functions using this User record:

module UserRegistration where

import Data.List (break, null, filter, take)

data Email = MkEmail String deriving (Eq, Show, Ord)

data User = MkUser
  { userEmail :: Email
  , userFullName :: String

  -- NOTE: DON'T ever do this in production code. You should be using an
  -- appropriate type exported by the `bcrypt` library for passwords.
  --
  , userPassword :: String

  , userPostalCode :: String
  , userStatus :: String
  , userVerificationCode :: String
  } deriving (Eq, Show, Ord)

data NewUser = MkNewUser
  { nuserEmail :: Email
  , nuserFullName :: String
  , nuserPassword :: String
  , nuserPostalCode :: String
  } deriving (Eq, Show, Ord)

registerUser :: NewUser -> [User] -> [User]
registerUser newUser@MkNewUser{nuserEmail=(MkEmail email)} userDb =
  let user = MkUser
              { userEmail = (nuserEmail newUser)
              , userFullName = (nuserFullName newUser)
              , userPassword = (nuserPassword newUser)
              , userPostalCode = (nuserPostalCode newUser)
              , userStatus = "unverified"

              -- NOTE: Pretty lame way of generating a verification code, but
              -- let's stick with this, for now, in order to not complicate the
              -- example.
              --
              , userVerificationCode = (take 2 email) ++ (take 2 (nuserFullName newUser)) ++ (take 2 (nuserPostalCode newUser))
              }
  in (user:userDb)

verifyUser :: Email -> String -> [User] -> (Bool, String, [User])
verifyUser e code userDb =
  let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb
  in  if (Data.List.null existingUsers)
        then (False, "No such user", userDb)
        else  let existingUser = head existingUsers
              in  if (code==(userVerificationCode existingUser))
                    then  let verifiedUser = existingUser{userStatus="active"}
                              newUserDb = replaceUserInDb verifiedUser userDb
                          in  (True, "Verified", newUserDb)
                    else (False, "Incorrect verification code", userDb)

deactivateUser :: Email -> [User] -> (Bool, String, [User])
deactivateUser e userDb =
  let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb
  in  if (Data.List.null existingUsers)
        then (False, "No such user", userDb)
        else  let existingUser = head existingUsers
                  deactiveUser = existingUser{userStatus = "deactivated"}
              in  (True, "User deactivated", replaceUserInDb deactiveUser userDb)

replaceUserInDb :: User -> [User] -> [User]
replaceUserInDb u userDb =
  let (a, b) = Data.List.break (\x -> (userEmail x)==(userEmail u)) userDb
  in if (Data.List.null b) -- NOTE: Handling the edge-case where the user may not be in the DB
      then  (a ++ [u])
      else  (a ++ (u:(tail b)))

Note

Notice how the replaceUserInDb function is written. The concept of “immutability” cuts across the entire language. In this particular function, we aren’t mutating the [User] list (which represents the user DB). In fact, even if we wanted to, the language wouldn’t allow us. So, we accept a [User] list and return a new [User] list in all our functions. Please verify this for yourself in GHCi.

Let’s play around with this in GHCi:

GCHi> :l UserRegistration
GHCi> let db1 = []
GCHi> let db2 = registerUser MkNewUser{nuserEmail = (MkEmail "saurabhnanda@gmail.com"), nuserFullName = "Saurabh Nanda", nuserPassword = "unsafe", nuserPostalCode = "110001"} db1
GHCi> db2
[MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "unverified", userVerificationCode = "saSa11"}]

GHCi> let db3 = registerUser MkNewUser{nuserEmail = (MkEmail "john@doe.com"), nuserFullName = "John Doe", nuserPassword = "unsafeagain", nuserPostalCode = "220002"} db2
GHCi> db3
[MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "unverified", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "unverified", userVerificationCode = "saSa11"}]

-- NOTE the pattern-matching in this let-binding. `verifyUser` returns a 3-tuple.
--
GHCi> let (result, reason, db4) = verifyUser (MkEmail "saurabhnanda@gmail.com") "incorrect" db3
GHCi> result
False
GHCi> reason
"Incorrect verification code"

-- NOTE no change in the user DB
--
GHCi> db4
[MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "unverified", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "unverified", userVerificationCode = "saSa11"}]

-- Let's try this again, with the correct verification code, and notice that `db4` changes...
--
GHCi> let (result, reason, db4) = verifyUser (MkEmail "saurabhnanda@gmail.com") "saSa11" db3
GHCi> db4
[MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "unverified", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "active", userVerificationCode = "saSa11"}]

GHCi> let (result, reason, db5) = deactivateUser (MkEmail "incorrect@incorrect.com") db4
GHCi> result
False
GHCi> reason
"No such user"

GHCi> let (result, reason, db5) = deactivateUser (MkEmail "john@doe.com") db4
GHCi> db5
[MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "deactivated", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "active", userVerificationCode = "saSa11"}]

Sum-types save you from typos

Now, let’s write a function that returns the number of users in verified, inactive, and deactive statuses:

countUsers :: String -> [User] -> Int
countUsers status userDb = length (filter (\u -> (userStatus u) == status) userDb)

userStatusSummary :: [User] -> [(String, Int)]
userStatusSummary userDb =
  let statuses = ["verified", "inactive", "deactive"]
  in map (\status -> (status, countUsers status userDb)) statuses

Look fine, right? WRONG! Can you spot the error in the userStatusSummary function given above?

Reveal the error

The UserRegistration module expects the userStatus field to have the following values:

  • active for active/verified users
  • deactivated for inactive/deactivated users
  • unverified for new users

However, in our userStatusSummary function we have used similarly named, but different, values of - verified, inactive and deactive

This is the first use-case for sum-types. Whenever you have a piece of data/field/value that can only have a fixed set of values, always use a sum-type, instead of an Int or String to represent all possible values. In other languages this is possible by using enum types, but as we will see, Haskell’s sum-types are more powerful than standard enum types.

So, let’s create a sum-type for our userStatus -

data Status = Active
            | Inactive
            | Deactive
            deriving (Eq, Show, Ord)

data User = MkUser
  { userEmail :: Email
  , userFullName :: String

  -- NOTE: Bad idea alert!
  , userPassword :: String

  , userPostalCode :: String
  , userStatus :: Status
  , userVerificationCode :: String
  } deriving (Eq, Show, Ord)

Let’s unwrap this:

  1. We’ve created a sum type called Status. This is at the same level as Int, String, or Bool, which means it can be used in type signatures.
  2. The Status type can have only three values: Active, Inactive and Deactive. Please make sure you understand this to avoid confusion going forward. Active, Inactive and Deactive are not types, but values. This is similar to how Int is a type and 1, 5, 10, are values of the type.

Let’s try and appreciate the second point about types vs values a little more:

GHCi> data Status = Active | Inactive | Deactive deriving (Eq, Show, Ord)
GHCi> :t Active
Active :: Status

GHCi> :t Inactive
Inactive :: Status

GHCi> :t Deactive
Deactive :: Status

GHCi> let x = Active
GHCi> :t x
x :: Status


-- Try inspecting the type of Status and notice the error that you'll get.
-- This is because Status itself is a type. To find more information about
-- types you can use `:i` instead of `:t`
--
-- GHCi> :t Status
-- <interactive>:1:1: error: Data constructor not in scope: Status
--
-- GHCi> :i Status
-- data Status = Active | Inactive | Deactive
--          Defined at <interactive>:13:1
-- instance [safe] Eq Status -- Defined at <interactive>:13:54
-- instance [safe] Ord Status -- Defined at <interactive>:13:64
-- instance [safe] Show Status -- Defined at <interactive>:13:58

-- Now, notice the similarity with Int & Bool. Try `:t` and `:i`
-- with `Int` and `Bool` as well
--
GHCi> let y = 10
:t y

GHCi> :t True
True :: Bool

GHCi> :t False
False :: Bool

GHCi> let z = True
GHCi> :t z
z :: Bool

Reinforcing this types vs values, yet again, please see if you can understand why the following is incorrect:

data Status = Active | Inactive | Deactive deriving (Eq, Show, Ord)

someFunc :: Active -> String
someFunc = _todo
Reveal the error

Active is a value of type Status. The type signature of someFunc given above is on the lines of the following, which you intuitively understand to be nonsensical:

someFunc :: 1 -> String
someFunc = _todo

You would rather write the following:

someFunc :: Int -> String
someFunc = _todo

Similarly, you’d write the following, when it comes to the Status type:

someFunc :: Status -> String
someFunc = _todo

Now, let’s use the Status type in the UserRegistration module given above, and include the the userStatusSummary function, as well:

module UserRegistration where

import Data.List (break, null, filter, take)

data Email = MkEmail String deriving (Eq, Show, Ord)
data Status = Active | Inactive | Deactive deriving (Eq, Show, Ord)

data User = MkUser
  { userEmail :: Email
  , userFullName :: String

  -- Bad idea alert!
  --
  , userPassword :: String

  , userPostalCode :: String
  , userStatus :: Status
  , userVerificationCode :: String
  } deriving (Eq, Show, Ord)

data NewUser = MkNewUser
  { nuserEmail :: Email
  , nuserFullName :: String
  , nuserPassword :: String
  , nuserPostalCode :: String
  } deriving (Eq, Show, Ord)

registerUser :: NewUser -> [User] -> [User]
registerUser newUser@MkNewUser{nuserEmail=(MkEmail email)} userDb =
  let user = MkUser
              { userEmail = (nuserEmail newUser)
              , userFullName = (nuserFullName newUser)
              , userPassword = (nuserPassword newUser)
              , userPostalCode = (nuserPostalCode newUser)
              , userStatus = Inactive
              , userVerificationCode = (take 2 email) ++ (take 2 (nuserFullName newUser)) ++ (take 2 (nuserPostalCode newUser))
              }
  in (user:userDb)

verifyUser :: Email -> String -> [User] -> (Bool, String, [User])
verifyUser e code userDb =
  let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb
  in  if (Data.List.null existingUsers)
        then (False, "No such user", userDb)
        else  let existingUser = head existingUsers
              in  if (code==(userVerificationCode existingUser))
                    then  let verifiedUser = existingUser{userStatus=Active}
                              newUserDb = replaceUserInDb verifiedUser userDb
                          in  (True, "Verified", newUserDb)
                    else (False, "Incorrect verification code", userDb)

deactivateUser :: Email -> [User] -> (Bool, String, [User])
deactivateUser e userDb =
  let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb
  in  if (Data.List.null existingUsers)
        then (False, "No such user", userDb)
        else  let existingUser = head existingUsers
                  deactiveUser = existingUser{userStatus = Deactive}
              in  (True, "User deactivated", replaceUserInDb deactiveUser userDb)

replaceUserInDb :: User -> [User] -> [User]
replaceUserInDb u userDb =
  let (a, b) = Data.List.break (\x -> (userEmail x)==(userEmail u)) userDb
  in if (Data.List.null b) -- NOTE: Handling the edge-case where the user may not be in the DB
      then  (a ++ [u])
      else  (a ++ (u:(tail b)))

countUsers :: Status -> [User] -> Int
countUsers status userDb = length (filter (\u -> (userStatus u) == status) userDb)

userStatusSummary :: [User] -> [(Status, Int)]
userStatusSummary userDb =
  let statuses = [Active, Inactive, Deactive]
  in map (\status -> (status, countUsers status userDb)) statuses

Sum-types force you to consider all known cases

Firstly, take a moment to appreciate that using a sum-type forced us to list down (or enumerate) all known values (or valid values) of a given type. As long as the userStatus field was a String it could potentially have any value, even though all throughout the code we had implicitly assumed that only three values were possible - active, verified, and deactivated. By changing the type of userStatus to Status we made this implcit assumption absolutely explicit and, more imporantly, self-documenting. If you were inheriting this code, what kind of functions would you rather deal with:

countUsers :: Status -> [User] -> Int

-- or

countUsers :: String -> [User] -> Int

Now, let’s build-up a use-case for why we would want to be forced to consider all known-cases.

Say, this user-registration module were part of a larger app, where users could comment on posts (something like Reddit), and we needed to write a function that would go through all the comments in the DB for a specific PostId and display them on a webpage. Here’s a starting point for our hypothetical fetchCommentsForPost function, whose result will be given to the UI:

module Post where

-- NOTE: Importing the previous UserRegistration module
--
import UserRegistration
import Data.List as DL

data PostId = MkPostId Int deriving (Eq, Show, Ord)

data Comment = MkComment
  { commentPostId :: PostId
  , commentUserEmail :: Email
  , commentBody :: String
  } deriving (Eq, Show, Ord)

fetchCommentsForPost :: [Comment] -> PostId -> [Comment]
fetchCommentsForPost commentDb postId =
  filter (\c -> (commentPostId c) == postId) commentDb

Now, if we have deactivated any user we don’t want his/her comments to show-up on the post’s page. Here’s one possible way we could achieve this:

fetchCommentsForPost :: [Comment] -> [User] -> PostId -> [Comment]
fetchCommentsForPost commentDb userDb postId =
  filter (\c -> ((commentPostId c) == postId) && (userNotDeactivated userDb (commentUserEmail c))) commentDb

userNotDeactivated :: [User] -> Email -> Bool
userNotDeactivated userDb email =
  let deactivatedUsers = DL.filter (\u -> ((userEmail u) == email) && ((userStatus u) == Deactive)) userDb
  in (DL.null deactivatedUsers)

So far, so good.

Now, after 3 months a new user-status is added to the system, called ShadowBanned. In most message forums, comments by shadow-banned users are visible only to the shadow-banned user himself/herself. Banned/deactivated users immediately come to know because they are locked out of the app completely. On the other hand, shadow-banned users can keep commenting (or sharing, or posting) to their heart’s content without anyone else ever seeing their content. It’s a pretty devilish scheme, if you come to think of it!

Another team member goes and adds the following code to the UserRegistration module:

-- First adding another value to the Status type:
data Status = Active | Inactive | Deactive | ShadowBanned deriving (Eq, Show, Ord)

-- Then, adding another function to shadow-ban a user:
shadowbanUser :: Email -> [User] -> (Bool, String, [User])
shadowbanUser e userDb =
  let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb
  in  if (Data.List.null existingUsers)
        then (False, "No such user", userDb)
        else  let existingUser = head existingUsers
                  shadowBannedUser = existingUser{userStatus = ShadowBanned}
              in  (True, "User deactivated", replaceUserInDb shadowBannedUser userDb)

You probably know where this is going.

The other team-member won’t immediately know where else in the app userStatus has been been used to take important decisions. The actual implementation of shadow-banning would end-up becoming a whack-a-mole debugging session, with the developer having to manually test the entire app to discover places where shadow-banning isn’t working and patching the code one-by-one.

Let’s use sum-types to solve this problem…

Important

First ensure that all compiler warnings are enabled

What’re we’re about to do next, is technically called exhaustiveness checking and for it to work, we need to tell enable some compiler-level warnings. In your stack project open the package.yaml file and locate the following section and ensure the highlighted line is there:

library:
  source-dirs: src
  ghc-options:
  - -Wall

Now, go exit your stack ghci session and run the hpack command from your project’s root directory:

$ cd exercises
$ hpack
generated exercises.cabal

$ stack ghci
GHCi>

==> Now restart your IDE as well. <==

Let’s rewrite our fetchCommentsForPost function leveraging case-matching for sum-types:

fetchCommentsForPost :: [Comment] -> [User] -> PostId -> [Comment]
fetchCommentsForPost commentDb userDb postId =
  filter
    (\c -> ((commentPostId c) == postId)
             && (DL.all (\x -> shouldDisplayCommentsForUser x) userDb))))
    commentDb

shouldDisplayCommentsForUser :: User -> Bool
shouldDisplayCommentsForUser user =
  case (userStatus user) of
    -- NOTE: We are displaying comments for inactive users because they may
    -- have been forced to re-verify their accounts. And we wouldn't want their comments
    -- to suddenly disappear from the site just because they haven't yet *re-verified* their account.
    --
    Inactive -> True
    Active -> True
    Deactive -> False

Go back and add the ShadowBanned value to the Status type and see what the compiler says for the shouldDisplayCommentsForUser function.

  • Till now, it looks deceptively similar to an enum type from other languages, but here’s what it makes this different. Active, Inactive and Deactive are not only values, they are also zero-parameter data-constructors of the Status type.

  • Referring back to what was written in Positional product types, you will now begin to understand why it’s necessary in Haskell to allow a type and it’s constructor to have different names:

    "...This is unlike a lot of other OOP languages where the constructors must
    either have the same name as the type (or class) OR should have a special
    name...When we talk about 'sum' types we’ll see why is it possible in
    Haskell to have constructors with different names for the same type."
    

Todo

complete this section

Sum-types can help you to always consider edge-cases

Todo

complete this section

Sum types that you’ve alredy been using

Bool

Todo

complete this section

Lists

Todo

complete this section

Exercises

Rails fence cipher

Revisit the encodeMessage function in Rail Fence Cipher. It is a classic case of structure-vs-sematics problem described in this chapter. Rewrite the encodeMessage function using various techniques discussed in this chapter and describe the pros-and-cons of each solution.

You should try rewriting the function using 2-3 different approaches, and actually write down the pros & cons as comments.

Exam score processing

Revisit all your solution to the Exam-score processing problem. How can they benefit from using product and/or sum types discussed in this chapter? Reimplement all your solutions using the new techniques that you have learnt. Try rewriting your solutions using 2-3 different approaches, and actually write down the pros & cons as comments.

Date arithmetic

Write a function that adds number-of-days to a given date, while taking care of a few edge-cases:

  • Spill-over of days is added to months
  • Spill-over of months is added to years
  • How are you determining which month has how many days? What about February in leap years?

Solve this problem using the following techniques:

  1. Using plain built-in types:
addDays :: (Int, Int, Int) -> Int -> (Int, Int, Int)
  1. Using separate types for date/month/year:
data Day = MkDay Int deriving (Eq, Show, Ord)
data Month = MkMonth Int deriving (Eq, Show, Ord)
data Year = MkYear Int deriving (Eq, Show, Ord)

addDays :: (Day, Month, Year) -> Int -> (Day, Month, Year)
  1. Using a single positional-product type:
data Date = MkDate Int Int Int deriving (Eq, Show, Ord)

addDays :: Date -> Int -> Date
  1. Using a record type, and pattern matching on the MkDate constructor to extract the day, month, and year fields
data Date = MkDate
  { dtDay :: Int
  , dtMonth :: Int
  , dtYear :: Int
  } deriving (Eq, Show, Ord)

addDays :: Date -> Int -> Date
  1. Using the same record type as (4) above, but using accessor functions to access the day, month, and year fields.

Linked lists

Write a program to create an manipulate a linked list of integers. Implement the following operations on linked lists:

  • Inserting an integer at the front of the linked-list
  • Inserting an integer at the end of the linked-list
  • Inserting an integer at the n-th position of a linked-list
  • removing an integer at the n-th position of a linked-list
  • Obtaining the length of a linked list
  • Reversing a linked list
  • Returning the index of an element in the linked list
  • Breaking a linked list into two separate linked lists at the n-th position

What is the core data type that you will use for representing a linked list? If you have implemented linked lists in languages like C/C++/Java, then you probably know that the core data-type has to be recursive in nature.

Reveal the core data-type
data LinkedList = LLEmpty
                | LLNode Int LinkedList
                deriving (Eq, Show, Ord)

Stack

Write a program to create and manipulate a stack of integers. Implement the following operations for your stack:

  • Inserting an element at the top of the stack
  • Popping an element from the top of the stack

Can you re-use your linked-list code to create your stack?

Reverse polish notation calculator

Write a program that can calculate a mathematical expression expressed in reverse polish notation One possible core data-type for your calculator can be the following:

data Expr = ELit Float Expr
          | EMul Expr
          | EAdd Expr
          | ESubtract Expr
          | EDivide Expr
          | ENone
          deriving (Eq, Show, Ord)

expr1 :: Expr
expr1 = ELit 10.0 (ELit 20.0 (EAdd ENone))

-- NOTE: This is an invalid expression because it would lead to to ``10`` and
-- ``50`` remaining on the stack with no operand to process them further.
--
expr2 :: Expr
expr2 = ELit 10.0 (ELit 20.0 (ELit 30.0 (EAdd ENone)))

-- NOTE: This is an invalid expression because it would lead to to ``10 / 0``
--
expr2 :: Expr
expr2 = ELit 10.0 (ELit 20.0 (ELit 20.0 (EMinus (EDivide ENone))))


calculate :: Expr -> (Bool, String, Float)
calculate e = _todo

-- GHCi> calcualte expr1
-- (True, "Valid expression", 30.0)

-- GHCi> calcualte expr2
-- (True, "Invalid expression", 0.0)

-- GHCi> calcualte expr3
-- (True, "Divide by zero", 0.0)

You can also choose to keep your Expr data-type separate and a stack of Expr separate.

Run-length encoding

Run-length encoding (RLE) is a simple form of data compression, where runs (consecutive data elements) are replaced by just one data value and count.

For example we can represent the original 53 characters with only 13:

"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"  ->  "12WB12W3B24WB"

And from the compressed string, we can get the original string back, without loss of any data. eg:

"12WB12W3B24WB" -> "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"

Can you think of a sum-type that will help you in writing a correct codec easily?

  1. Write a program that can run-length encode and decode any string comprising of alphabets alone.
  2. Write a program that can run-length encode and decode any string comprising of alphabets and numbers. What makes this part of the problem tougher than the earlier part? What do you need to change in the problem statement to be able to solve encode/decoding of strings containing alphabets and numbers? Make your suggesting change and solve the problem.

Scratch section - PLEASE IGNORE

In this video we’ll cover:

  • The downside of tuples
  • Fixing the downside by using “product types”
  • Using product-types (and tuples) via pattern matching
  • Difference between types, data-constructors, and actual values.
  • Downside of “tagging” product types and fixing them with newtypes.
  • Downside of product-types and fixing them with records
  • Sum types and why does one needs them
  • Exhaustiveness checking
  • Creating our own type using a mix of sum, product, and records
  • Why are these called sum & product types
  • Named Puns

Exercises: - Run length encoding -

Core motivation:

We touched upon tuples previously and outlined three reasons why they’re used:

  • Returning multiple values from functions
  • Passing multiple values to high-order functions where a lambda just allows a single parameter
  • Ad-hoc way to associate data or elements

If you think about it, the first two are really specific use-cases of the last.

One of the problems with tuples like (Int, Int) is that when you’re looking at type-signatures you don’t really come to know what each of the Int’s means. The structure of the value is known, but the semantics are not known. Discuss about what semantics means.

An (Int, Int) could mean anything – hour / min, x / y coordinates, row / col coordinates,

PRODUCT TYPE - Canonical example of representing a point in 2-d space - (x, y) coordinates OR (row, col) coordinates

  • data Point = MkPoint Double Double deriving (Eq, Show)
  • equivalent to writing: data Point where MkPoint :: Double -> Double -> Point
  • Inspect the type of the data-constructors in GHCi. They are just functions – for most part.
  • Pattern matching against constructors
  • Representing an hour/min pair: data TimeOfDay = TimeOfDay Int Int
  • This can also be represented as (Int, Int) ==> TimeOfDay Int Int
  • Similar to tuples, but different because of semantics. They are structurally the same, but semantically different. They mean very, very different things.
  • For example, the Mars Climate Orbiter disaster due to confusion between pounds (imperial units) and kilograms (metric, SI units)

SUM TYPE - Designing an Enum – status of an Order, status of a Payment, presence/absence of a field (say social media profiles – basically a hand-written Maybe) - Advantages of using exhaustiveness checking - That is called a sum-type

Mix of Sum & Product: - Example of representing a person’s full-name:

data Title = Mr | Mrs | Dr | Miss | Other String data FullName = FullName Title String String

Need for records. - But there’s a problem here. What if you were building the data-structure for

  • Presence / absence of fields. Why Haskell doesn’t have NULL values.
[1]Haddock is to Haskell, what Javadocs is to Java.