Algebraic Data Types (Sum & Product types) ========================================== Learning Goal ------------- In this chapter we'll understand the downside of **not** being able to communicate *semantics* of a function through the built-in types that we have seen till now (in previous chapters). We'll try to address that problem, first by using ``newtype`` and then via product-types. After that we will see how to *restrict* the values that a function can accept by using sum-types, thus making our code type-safe, correct, and less error-prone. We will also be covering pattern-matching along the way. By the end of this chapter, and after solving the exercises, you should be able to appreciate when to use vanilla built-in types, when to use custom types (like newtypes, sum-types, and product-types) in your code, and the pros & cons of each approach. Structure vs Semantics - Downside of "regular" type-signatures -------------------------------------------------------------- Let's try to understand *why* we need more "complicated" types than the regular built-in types that we've seen till now (i.e. ``String``, ``Char``, ``Bool``, ``Int``, etc). Consider our ``calculateEndTime`` function from the previous chapter: .. code-block:: haskell calculateEndTime :: Int -> Int -> Int -> (Int, Int) calculateEndTime hr mn durationInMins = let (addHour, finalMins) = divMod (mn + durationInMins) 60 in (hr + addHour, finalMins) If we look at *only* the function signature, and not the variable names or the actual function definition, here is what we'll see: .. code-block:: haskell calculateEndTime :: Int -> Int -> Int -> (Int, Int) By looking at *just* this type signature can you be sure about what each ``Int`` argument represents in real life? Which one is for the hour? Which one is for the minute? Which one is for the duration? Is the duration in seconds or minutes? This is what is meant by "semantics". By the way, here are a couple of dictionary meanings of the word: .. code-block:: text semantics 1. the meaning, or an interpretation of the meaning, of a word, sign, sentence, etc.: 2. the meaning of a word, phrase, or text. 3. the branch of linguistics and logic concerned with meaning. While the *structure* of the ``calculateEndTime`` function is absolutely clear - i.e. it takes three integer arguments and returns a 2-tuple of two integers - the *semantics* are not clear. What does each integer argument really *mean*? What does each represent in real life? Let's take another example. Say, you are working on a shopping-cart project, and one of your team mates has written a function to prepare the content/HTML for an order confirmation email. Here's what the type-signature looks like: .. code-block:: haskell prepareConfirmationEmail :: String -> String -> [String] -> [String] -> String I'm pretty certain that you'll be pulling your hair out if you had to work with just this! Again, the structure is clear, and the compiler is going to enforce it for you, but that doesn't help you much, does it? Since, it's your team's code (and not some closed-source library), you can peek into the definition of the function and see this: .. code-block:: haskell prepareConfirmationEmail :: String -> String -> [String] -> [String] -> String prepareConfirmationEmail orderId subject to cc = _todo Aha! Just by looking at the variable names you can understand the *semantics* of the function call. Given the type-signature **along with** the variable names, now you know what you need to pass where, **and** the compiler will enforce that you are passing the correct types. (i.e. you are not passing a list of strings to the ``subject`` argument). .. note:: By the way **this** is why naming is very, very important in computer engineering. It is a well-known saying - **There are two hard problems in computer science: cache invalidation and naming things** So, that should solve the problem then, right? Well, not quite... Problem with API docs ^^^^^^^^^^^^^^^^^^^^^ Firstly, the standard doc generation tool in Haskell - haddock [1]_ - **does not** generate API docs with variable names. This is one of the problems that you might already be facing. Compare the API docs of the ``filter`` function from Clojure vs Haskell: .. code-block:: clojure ; clojure (filter pred coll) .. code-block:: haskell -- haskell filter :: (a -> Bool) -> [a] -> [a] If you aren't accustomed to reading type-signatures, you'll definitely struggle with the Haskell API docs. On the other hand Clojure's API docs, don't contain any type signature, but because of the variable names - ``pred`` (for predicate) and ``coll`` for collection - make intuitive sense. The flip-side of this is, in Clojure's case, without seeing usage examples, you cannot really be sure what type of values to pass to ``pred`` & ``coll``. Does it work for *all* types of collections? i.e. lists, vectors, hashmaps, trees, or only certain kinds of collections? So, the Haskell way of giving more importance to type-signatures is *structurally superior*, but *semantically inferior.* But, why can't Haskell generate API docs with variable names, like how Java does? The *probable* reason has to do with "function currying" and "point-free syntax", but explaining those would be quite a detour for this chapter. Variable names alone, do not prevent bugs adequately ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Secondly, just because you know variable names does not protect you adequately from certain classes of bugs. My favourite example for this is the `Mars Climate Orbiter failure `_: "...communication with the spacecraft was lost as the spacecraft went into orbital insertion, due to ground-based computer software which produced output in non-SI units of pound (force)-seconds (lbf·s) instead of the SI units of newton-seconds (N·s) specified in the contract between NASA and Lockheed..." If you're writing a program where one ``Int`` argument is *semantically* different from another ``Int`` argument, say the ``x`` co-ordinate from the ``y`` co-ordinate -- or in the Mars Climate Orbiter's case, one ``Double`` (force-seconds) from another ``Double`` (newton-seconds) -- just knowing the variable names will not protect you from inadvertently doing an incorrect computation. For example: - adding the ``x`` co-ordinate to the ``y`` co-ordinate - dividing a force in ``newtown-second`` with a force in ``pound-seconds`` - substituting someone's name with their email address - adding something in INR with something in USD **Thus, we use Haskell's advanced type-system to design custom-types that help us write better programs.** Let us see how... .. _positional_product_types: Positional product types ------------------------ .. note:: In academic literature there are only "sum" types and "product" types. However, I have further split product types into two parts - positional and named/records - just to make them easier to refer to. Let us go back to the ``calculateEndTime`` example and fix the *semantics* problem in the type-signature, by changing this... .. code-block:: haskell calculateEndTime :: Int -> Int -> Int -> (Int, Int) calculateEndTime = _todo ...to the following: .. code-block:: haskell data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord) calculateEndTime :: TimeOfDay -> Int -> TimeOfDay calculateEndTime (MkTimeOfDay hr mn) durationInMins = let (addHour, finalMins) = divMod (mn + durationInMins) 60 in MkTimeOfDay (hr + addHour) finalMins -- GHCi> calculateEndTime (MkDuration 10 30) 45 -- MkDuration 11 15 Let's break this down line-by-line: .. code-block:: haskell data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord) - We've defined a new "positional product-type" - The *type* is called ``TimeOfDay`` (which is now at the same *level* as the built-in types like ``Int`` or ``Char``). This means that we can actually use it in type-signatures. - The *constructor* of the ``TimeOfDay`` *type* is called ``MkTimeOfDay``. This is unlike a lot of other OOP languages where the constructors must either have the *same* name as the type (or class) OR should have a *special* name. For example, here's what a constructor looks like in Java: .. code-block:: java :emphasize-lines: 5,6,7 class Person { private String name; private String email; // Cosntructors in Java must have the same name as the class // Person(String name, String email) { this.name = name; this.email = email; } } And here is what a constructor in Ruby looks like: .. code-block:: ruby :emphasize-lines: 2,3,4 class Person # Constructors in Ruby must be named `initialize` # def initialize(name, email) @name = name @email = email end end When we talk about "sum" types we'll see why is it possible in Haskell to have constructors with different names for the same type. - The ``MkTimeOfDay`` constructor "holds" two ``Int`` values. - Finally there is a ``deriving (Eq, Show, Ord)`` boilerplate, which we'll understand when we cover type-classes. .. important:: Whenever you are defining custom types, please put ``deriving (Eq, Show, Ord)`` at the tne end, otherwise GHCi won't be able to print values of your custom type to the terminal. Why exactly is this required will become clearer when we cover type-classes. The reason why I'm calling this type a **positional** product-type is because there is no *explicit* differentiation between the two ``Int`` values. One of them holds the hour and the other minute. But, by just looking at the type definition, we cannot be sure which holds which. We have to fallback on a strong "positional convention", whenever you're writing the time-of-day, the first one is hour and the second one is minutes. Now, before we break down the new definition of ``calcluateEndTime`` let's see what this type actually is, and how to define values of this type. Let's play around with it in GHCi: .. code-block:: haskell GHCi> data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord) GHCi> let y = MkTimeOfDay 10 30 GHCi> y MkTimeOfDay 10 30 GHCi> :t y y :: TimeOfDay GHCi> MkTimeOfDay 12 45 MkTimeOfDay 12 45 GHCi> y + 20 :10:1: error: • No instance for (Num TimeOfDay) arising from a use of ‘+’ • In the expression: y + 20 In an equation for ‘it’: it = y + 20 Notice the last part where we tried to do ``y + 20``. The compiler prevented us from evaluating that expression becuase it doesn't know how to add a ``TimeOfDay`` to an ``Int``, even though ``y`` internally holds two integers. Some type-safety has started showing up... Also notice the output of ``y`` and ``:t y``. The first printed the *value*, i.e. ``MkTimeOfDay 10 30`` and the latter printed the *type*, i.e. ``TimeOfDay``. This is completely analogous to, say, printing ``123`` (the *value*) and ``Int`` (the *type*). Now, we have "wrapped" our two integers into a custom data-type called ``TimeOfDay`` that we can't do numerical computations on. However, we *need* to do numerical computations on the hour & minute values that have been wrapped. How do we do this? If this were Java, we would be reaching out for things like getters/setters (for private variables in a class), or would be directly manipulating the variables (for public variables): .. code-block:: java Person p = Person("Saurabh", "saurabh@vacationlabs.com"); // either this.... System.out.println(p.name); // ... or this... System.out.println(p.getName()); However, Haskell doesn't have this concept of getters/setters. In fact, it doesn't even have the concept of public/private variables with respect to custom data types. Instead it has, what is called, **pattern matching** Pattern matching ^^^^^^^^^^^^^^^^ Pattern-matching allows you to extract "inner / wrapped" values from constructors of various types. Let's see some pattern-matches in action: **Pattern-matching a tuple:** This works because ``(,)``, ``(,,)``, ``(,,,)``, and so on, are actually constructors for 2-tuples, 3 tuples, 4 tuples, respectively. It's just that tuples are built into the language and you can use them in the more intuitive infix notation as well (i.e. what we have been using till now). .. code-block:: haskell -- using the regular "infix" notation for the (,) constructor -- GHCi> let (x, y) = (10, 'a') GHCi> x 10 GHCi> y 'a' -- Using the prefix notation for the (,) constructor -- GHCi> (,) 'a' 'b' ('a', 'b') GHCi> let (x, y, z) = (,,) 10 20 'a' GHCi> x 10 GHCi> y 20 GHCi> z 'a' -- Even this works... GHCi> let (,,) x y z = (,,) 10 20 'a' -- And this works as well... GHCi> let (,,) x y z = (10, 20, 'b') **Pattern-matching a positional product type:** *Usually*, it is not possible to use the constructor for a positional product type in an infix notation. So, sticking to the prefix notation, pattern-matching for a positional product type works similarly: .. code-block:: haskell GHCi> let (MkTimeOfDay hr mn) = MkTimeOfDay 10 30 GHCi> hr 10 GHCi> mn 30 GHCi> let t = MkTimeOfDay 12 45 GHCi> let (MkTimeOfDay hr mn) = t GHCi> hr 12 GHCi> mn 45 .. note:: It is always a good idea to put parentheses around your pattern matches as shown in the example above. While it can work without parentheses some times, a lot of times the compiler gets confused if you use multiple-pattern matches in function arguments without parentheses. This leads to very complicated compiler errors. So, **always put parentheses around pattern matches** **Pattern-matches can be nested:** What if we had a value of type ``(Char, Int, TimeOfDay)``? .. code-block:: haskell GHCi> let (x, y, MkTimeOfDay hr mn) = ('a', 'b', MkTimeOfDay 10 30) GHCi> x 'a' GHCi> y 'b' GHCi> hr 10 GHCi> mn 30 **It is not necessary to pattern-match:** It is not always necessary to "destructure" a constructor into its "inner values" if you don't need them. For example: .. code-block:: haskell GHCi> let (x, y, t) = ('a', 'b', MkTimeOfDay 10 30) GHCi> x 'a' GHCi> y 'b' GHCi> t MkTimeOfDay 10 30 **You can pattern-match the entire "wrapped" value as well as the inner values:** .. code-block:: haskell :emphasize-lines: 1 GHCi> let (x, y, t@(MkTimeOfDay hr mn)) = ('a', 'b', MkTimeOfDay 10 30) GHCi> x 'a' GHCi> y 'b' GHCi> t MkTimeOfDay 10 30 GHCi> hr 10 GHCi> mn 30 With this detour of pattern-matching out of the way, re-read the function definition of ``calculateEndtime`` would be making more sense now: .. code-block:: haskell calculateEndTime :: TimeOfDay -> Int -> TimeOfDay calculateEndTime (MkTimeOfDay hr mn) durationInMins = let (addHour, finalMins) = divMod (mn + durationInMins) 60 in (MkTimeOfDay (hr + addHour) finalMins) By the way, this is functionally equivalent to the following: .. code-block:: haskell :emphasize-lines: 2,3 calculateEndTime :: TimeOfDay -> Int -> TimeOfDay calculateEndTime t durationInMins = let (MkTimeOfDay hr mn) = t (addHour, finalMins) = divMod (mn + durationInMins) 60 in (MkTimeOfDay (hr + addHour) finalMins) (The difference is *where* we are doing the pattern match. In the former, we are pattern matching the function arguments directly. In the latter the function argument is ``t`` and is then pattern-matched in the ``let`` binding). While this is a significantly better than our original function signature, there is still a slight problem with our function signature. Notice the ``Int`` argument there? Just by looking at the type-signature we still don't know whether that is supposed to be in hours, minutes, or seconds. Let's fix that using another custom type: .. code-block:: haskell data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord) data DurationInMins = MkDuration Int deriving (Eq, Show, Ord) calculateEndTime :: TimeOfDay -> DurationInMins -> TimeOfDay calculateEndTime (MkTimeOfDay hr mn) (MkDuration d) = let (addHour, finalMins) = divMod (mn + d) 60 in (MkTimeOfDay (hr + addHour) finalMins) -- GHCi> calculateEndTime (MkTimeOfDay 10 30) (MkDuration 45) -- MkTimeOfDay 11 15 Now, in our code, if there are some places where we are representing event duration in seconds and in some places we are representing it in minutes, the compiler will prevent us from passing duration-in-seconds to this function, by mistake. .. _immutability: Detour: Immutability ^^^^^^^^^^^^^^^^^^^^ Let's go back to our ``calculateEndTime`` code. Did you notice that we didn't really "update" the value of the hour & minute fields? Certainly not in the sense that you would probably be used to in languages like Java, Javascript, Ruby, or Python. For example, consider the same function written in *typical* Javascript: .. code-block:: javascript function calculateEndTime(t, duration) { // btw, Javascript doesn't have a built-in divMod function... // var addHour = Math.floor((t.mins + duration)/60); var finalMins = (t.mins + duration) % 60; t.hour = t.hour + addHour; t.mins = finalMins; return t; } (Notice, that without reading the code (or documentation) you cannot really know *what* structure to pass for the ``t`` parameter). Now, try the following with this function in Javascript: .. code-block:: javascript :emphasize-lines: 8,9 var start = {hour: 10, mins: 30}; var end = calculateEndTime(start, 45); console.log("end=", end); // OUTPUT: // end={hour: 11, mins: 15} console.log("start=", start); // start={hour: 11, mins: 15} // // ... WTF?! .. important:: **The Javascript function is "mutating" the arguments that you pass to it.** "Mutation" of this kind is one of the biggest source of bugs. Say, you have a ``startTime`` and ``duration``, and you want to calculate the ``endTime`` to display all the three values (startTime, duration, endTime) to the user. However, with such bugs (originating due to mutation), you'll end up with your ``startTime`` being sneakily changed behind your back! In small code-bases this kind of unintended mutation may be easy to manage, but it becomes increasingly hard as the code base grows - the place where a bug is being observed, and the place where the bug is being created could very well be completely different modules, written by different people! Now, going back to ``calculateEndTime`` in Haskell, if you notice, we aren't really mutating the hour & minute fields in the ``TimeOfDay`` that has been passed to the function. We are creating a **new** ``TimeOfDay`` by invoking the ``MkTimeOfDay`` constructor. .. code-block:: haskell :emphasize-lines: 7 data TimeOfDay = MkTimeOfDay Int Int deriving (Eq, Show, Ord) data DurationInMins = MkDuration Int deriving (Eq, Show, Ord) calculateEndTime :: TimeOfDay -> DurationInMins -> TimeOfDay calculateEndTime (MkTimeOfDay hr mn) (MkDuration d) = let (addHour, finalMins) = divMod (mn + d) 60 in (MkTimeOfDay (hr + addHour) finalMins) Try replicating the same Javascript bug in Haskell. You will be unable to! **Haskell's immutability protects us from such bugs.** .. code-block:: haskell GHCi> let startsAt = MkTimeOfDay 10 30 GHCi> let endsAt = calculateEndTime startsAt (DurationInMins 45) GHCi> endsAt MkTimeOfDay 11 15 GHCi> startsAt MkTimeOfDay 10 30 .. important:: And this is what makes Haskell an **immutable language**. Once a variable is defined you **cannot** change it's value. However, using that variable, you can compute/create a new value, and assign it to a new variable (or return it as a result of a function). In fact, it's a good idea to stop calling them variables, because of the connotation that you would have already built up. It's better to start calling them **bindings** instead - once a name is bound to a value, it cannot be bound to a new value **within the same scope.** The last part is very important. You can have the same variable name across different funtions, because they are different scopes. In fact, this is how you can keep doing ``let x = _something`` over & over again in GHCi. It is because GHCi creates a new scope for every statement that you issue. This is to make interactive evaluation simpler. However, try doing the followoing in a Haskell source file (the compiler won't allow it): .. code-block:: haskell someFunction :: Int -> Int someFunction x = let var1 = _something var1 = _somethingElse in _todo One point to note is that nested functions are technically different scopes. So you *can* have bindings with the same name in the inner & outer functions, but the compiler will emit a warning when you do that. This is because the binding in the inner function is ambiguous and *may* be the cause of inadvertent bugs. For example: .. code-block:: haskell :emphasize-lines: 4 import Data.Char (toUpper) toUppercase :: String -> String toUppercase str = map (\str -> toUpper str) str While the example given above will compile, you can probably spot the inherent ambiguity in what ``str`` refers to inside the lambda passed to ``map``. The compiler will emit a warning, but will still go-ahead and *implicitly* use the innermost ``str``. This means, in the ``toUppercase`` function-body ``str`` will refer to the value passed as the first argument, **except** within the lambda, where ``str`` would refer to the value passed to the lambda, instead. This implicit assumption is not always a good idea and *can* be the cause of subtle bugs. .. code-block:: text +------------------------------------------------------+ | OUTER SCOPE +-----------------------+ | | | INNER SCOPE | | | toUppercase str = map | (\str -> toUpper str) | str | | +-----------------------+ | +------------------------------------------------------+ This ambiguity is easily resolved by one of two approaches: 1. Do not use *overlapping* binding names. Again, this does not mean that you can not, or should not, use the same name in *different* functions/scopes. Just don't use same names in overlapping, or nested, scopes. 2. If you cannot come up with a completely new name to disambiguate (remember, naming things is a hard problem in computer science!), you can use a suffix of ``_`` or ``'`` to create a different name, eg. ``customer`` and ``customer_``, or ``user`` and ``user'`` Named product types OR records ------------------------------ Let's go back to the original defintion of ``TimeOfDay`` and what we said about it: "The reason why I'm calling this type a **positional** product-type is because there is no *explicit* differentiation between the two ``Int`` values. One of them holds the hour and the other minute. But, by just looking at the type definition, we cannot be sure which holds which. We have to fallback on a strong "positional convention", whenever you're writing the time-of-day, the first one is hour and the second one is minutes." With ``TimeOfDay`` we could get away with *positional semantics* because the convention of stating the hour before the minute is almost universal. However, that is not always the case. Take the example of dates. Some countries use ``dd/mm/yy`` (eg. India), while some countries use ``mm/dd/yy`` (eg. UK?), and some countries even use ``yy/mm/dd`` (eg. Japan). What happens if you use a positional product type to represent a date in your application? .. code-block:: haskell data Date = MkDate Int Int Int deriving (Eq, Show, Ord) If you're an Indian, you would *assume* ``dd/mm/yy`` *semantics*, whereas your team-mate, who's working out of UK, would *assume* ``mm/dd/yy``. If you were on the Mars Orbiter team, you'd surely end-up causing a major confusion around launch dates! Thus, the motivation for using named product types aka records for cases where *positional semantics* are not universal, or can confusion at a later date (or when the maintainer of the code changes). Let's see how we can use records in the date example above: .. code-block:: haskell data Date = MkDate { dtDay :: Int , dtMonth :: Int , dtYear :: Int } deriving (Eq, Show, Ord) - We created a new record-type, called ``Date`` - The constructor is called ``MkDate`` - The constructor wraps three **named fields** called ``dtDay``, ``dtMonth``, & ``dtYear`` all of the type ``Int`` - If you notice, all the fields are prefixed with ``dt`` and are not simply called ``day``, ``month``, and ``year``. The reason will be explained in :ref:`field_names` Let's play around with this type in GHCi: .. code-block:: haskell GHCi> data Date = MkDate {dtDay :: Int, dtMonth :: Int, dtYear :: Int} deriving (Eq, Show, Ord) GHCi> let d = MkDate{dtDay = 1, dtMonth = 1, dtYear = 2017} GHCi> d MkDate {dtDay = 1, dtMonth = 1, dtYear = 2017} -- Notice these "field-accesor" functions that the compiler gives us for free... GHCi> :t dtDay dtDay :: Date -> Int GHCi> :t dtMonth dtMonth :: Date -> Int GHCi> :t dtYear dtYear :: Date -> Int -- Let's use these field-accesssors to access individial fields... GHCi> dtDay d 1 GHCi> dtMonth d 1 GHCi> dtYear d 2017 The field-accessor functions that the compiler gives us for free (for each record field), is the closes that you can get to Java's "getters" in Haskell. Pattern matching ^^^^^^^^^^^^^^^^ Since ``Date`` also has a constructor it should be possible to pattern-match it, right? Absolutely right! .. code-block:: haskell GHCi> let dt = MkDate{dtDay = 1, dtMonth = 1, dtYear = 2017} GHCi> let (MkDate {dtDay = d, dtMonth = m, dtYear = y}) = dt GHCi> d 1 GHCi> m 1 GHCi> y 2017 However, unlike positional product-types where you are forced to pattern-match *all values* of a constructor, pattern-matching a record-constructor allows us to omit fields. Example: .. code-block:: haskell :emphasize-lines: 1,4,8-10 GHCi> let (MkTimeOfDay hr) = MkTimeOfDay 10 30 :94:6: error: • The constructor ‘MkTimeOfDay’ should have 2 arguments, but has been given 1 • In the pattern: MkTimeOfDay hr In a pattern binding: (MkTimeOfDay hr) = MkTimeOfDay 10 30 -- We are not pattern matching dtDay & dtMonth -- GHCi> let (MkDate {dtYear = y}) = MkDate{dtDay=31, dtMonth=12, dtYear=2017} GHCi> y 2017 Record updates ^^^^^^^^^^^^^^ In :ref:`immutability` we saw how we can't *mutate* values in Haskell. Using existing values to create new values (rather than mutating existing values), wasn't a problem with positional products, because, realistically, positional product types are only used with a small number of fields/values. For example, it's not a good idea to use a positional product for representing a ``Customer`` value: .. code-block:: haskell data Customer = Customer Int String String String Int String String deriving (Eq, Show, Ord) You'd much rather use named products for a type with more than five values (it's a guideline, not a hard rule): .. code-block:: haskell data Customer = Customer { customerId :: Int , customerFirstName :: String , customerLastName :: String , customerEmail :: String , customerBirthYear :: Int , customerCity :: String , customerCountry :: String } deriving (Eq, Show, Ord) So, for small positional products you can do the following without much hassle: .. code-block:: haskell -- Representing weigth in pounds & ounces. Take a look at -- https://en.wikipedia.org/wiki/Imperial_units#Mass_and_weight to -- understand what's going on. -- data ImperialWeight = MkImperialWeight Int Int Int Int deriving (Eq, Show, Ord) addWeights :: ImperialWeight -> ImperialWeight -> ImperialWeight addWeights (MkImperialWeight p1 o1 d1 g1) (MkImperialWeight p2 o2 d2 g2) = let (addDrachm, finalGrain) = divMod (g1 + g2) 7000 (addOunce, finalDrachm) = divMod (d1 + d2 + addDrachm) 256 (addPound, finalOunce) = divMod (o1 + o2 + addOunce) 16 in (MkImperialWeight (p1 + p2 + addPound) finalOunce finalDrachm finalGrain) However, for large records, doing something similar is going to get irritating rather quickly: .. code-block:: haskell import Data.List (dropWhile, dropWhileEnd) import Data.Char (isSpace) data Customer = MkCustomer { customerId :: Int , customerFirstName :: String , customerLastName :: String , customerEmail :: String , customerBirthYear :: Int , customerCity :: String , customerCountry :: String } deriving (Eq, Show, Ord) removeSpaces :: String -> String removeSpaces s = dropWhileEnd (\x -> isSpace x) (dropWhile (\x -> isSpace x) s) cleanupName :: Customer -> Customer cleanupName MkCustomer{customerId=cid, customerFirstName=fn, customerLastName=ln, customerEmail=e, customerBirthYear=by, customerCity=cy, customerCountry=co} = MkCustomer { customerId = cid , customerFirstName = (removeSpaces fn) , customerLastName = (removeSpaces ln) , customerEmail = e , customerBirthYear = by , customerCity = cy , customerCountry = co } This is solved by the "record update syntax". **However, please note, no real "updation" or "mutation" happens even when you use the record update syntax.** A new record is **always created**, just that certain fields will contain different values. Let's see this in action - rewriting the ``cleanupName`` function shown above: .. code-block:: haskell cleanupName :: Customer -> Customer cleanupName cust@MkCustomer{customerFirstName=fn, customerLastName=ln} = cust { customerFirstName = (removeSpaces fn) , customerLastName = (removeSpaces ln) } Or, another way of writing the same thing with accessor functions, instead of pattern-matching: .. code-block:: haskell cleanupName :: Customer -> Customer cleanupName cust = cust { customerFirstName = (removeSpaces (customerFirstName cust)) , customerLastName = (removeSpaces (customerLastName cust)) } - Instead of using the ``MkCustomer`` constructor, we using an existing ``cust`` value to update - Intead of providing all the fields within the curly braces, we provided only those fields that we wanted to change. .. important:: Please verify that the ``Customer`` record being passed to the ``cleanupName`` function is not getting mututated irrespective of which technique you use to write the code. .. note:: This section makes it sound as if positional products don't have any way to "update" them and that you always have to recreate them from scratch. That is not completely true. We will see a *unified* way of changing values in any data-structure using "lenses" in a later chapter. .. _field_names: Gotcha: Field names ^^^^^^^^^^^^^^^^^^^ If you've noticed, in all our code examples, records always had field-names prefixed with a common name. Example: .. code-block:: haskell -- all field names prefixed with `dt` -- data Date = MkDate {dtDay :: Int, dtMonth :: Int, dtYear :: Int} deriving (Eq, Show, Ord) -- instead of -- data Date = MkDate {day :: Int, month :: Int, year :: Int} deriving (Eq, Show, Ord) This is because of a limitation in Haskell's record-types. You cannot define records with the **same field-name in the same module.** For exampe, try doing the following in your editor and **loading your file in GHCi:** (note, doing this in a file, and loading the file in GHCi is important. Directly typing out this code in GHCi will not uncover this limitation) .. code-block:: haskell module MyRecordTypes where data Date = MkDate {day :: Int, month :: Int, year :: Int} deriving (Eq, Show, Ord) data Copyright = MkCopyright {author :: String, year :: Int} deriving (Eq, Show, Ord) You will get an error, and the compiler won't allow it. One way to intuitively understand this limitation is to see what's going on with respect to field-accessor functions. The ``Date`` type will automatically give you the following three field accessors: .. code-block:: haskell :emphasize-lines: 6 -- Inspect the types of `day`, `month`, and `year`, via `:t` in GHCi -- to confirm this -- day :: Date -> Int month :: Date -> Int year :: Date -> Int Similarly, the ``Copyright`` type will also *attempt* to give you the following two field accessors... .. code-block:: haskell :emphasize-lines: 2 author :: Copyright -> String year :: Copyright -> Int ... and, thus, cause a compile error. You can't define functions having the same name but different type-signatures in Haskell. This is not to say that function overlaoding is not possible in Haskell. It is possible, but it is *very different* from how it's done in Java or C++. We will cover it along with type-classes and will see how it is possible to use the same "name" for different record fields, via lenses. .. important:: To sum it up, if you're sticking with basic Haskell, and not going crazy with advanced type-level hackery, it is **strongly recommended** you prefix all fields in your record with a common name. And use camelCase. For example, if your field is ``firstName``, call it ``userFirstName``, and not ``userfirstName``, or ``user_firstName``. Sum types --------- Let's motivate the *need* for sum-types through an example. Say, we were building a *typical* user signup flow and needed a way to represent a user in our code. We could try using a positional-product... .. code-block:: haskell data Customer = MkCustomer String String String String String String deriving (Eq, Show, Ord) ...but that wouldn't take us very far, because we would keep getting confused about the *semantics* of each ``String`` value. Positional products would not be appropriate because the *semantics* of each field would not be clear. So, we go with a named product type, aka, record: .. code-block:: haskell :emphasize-lines: 11 data User = MkUser { userEmail :: String , userFulllName :: String -- NOTE: DON'T ever do this in production code. You should be using an -- appropriate type exported by the `bcrypt` library for passwords. -- , userPassword :: String , userPostalCode :: String , userStatus :: String , userVerificationCode :: String } deriving (Eq, Show, Ord) Now, as most *typical* user signups go, a user cannot fill up a registration form and start using your service/app immediately. There is an verification step, where a code is send via email or SMS, which the user has to enter on a webpage. Once the verification step is complete, the user is deemed as *active* or *verified*. Further, due to misuse of your app/service/platform, you may want to deactivate a user account as well, thus preventing any further access. All of this complexity is usually represented by a value stored in the ``userStatus`` field. Now, let's try writing the ``registerUser``, ``verifyUser``, and ``deactivateUser`` functions using this ``User`` record: .. _user_registration: .. code-block:: haskell module UserRegistration where import Data.List (break, null, filter, take) data Email = MkEmail String deriving (Eq, Show, Ord) data User = MkUser { userEmail :: Email , userFullName :: String -- NOTE: DON'T ever do this in production code. You should be using an -- appropriate type exported by the `bcrypt` library for passwords. -- , userPassword :: String , userPostalCode :: String , userStatus :: String , userVerificationCode :: String } deriving (Eq, Show, Ord) data NewUser = MkNewUser { nuserEmail :: Email , nuserFullName :: String , nuserPassword :: String , nuserPostalCode :: String } deriving (Eq, Show, Ord) registerUser :: NewUser -> [User] -> [User] registerUser newUser@MkNewUser{nuserEmail=(MkEmail email)} userDb = let user = MkUser { userEmail = (nuserEmail newUser) , userFullName = (nuserFullName newUser) , userPassword = (nuserPassword newUser) , userPostalCode = (nuserPostalCode newUser) , userStatus = "unverified" -- NOTE: Pretty lame way of generating a verification code, but -- let's stick with this, for now, in order to not complicate the -- example. -- , userVerificationCode = (take 2 email) ++ (take 2 (nuserFullName newUser)) ++ (take 2 (nuserPostalCode newUser)) } in (user:userDb) verifyUser :: Email -> String -> [User] -> (Bool, String, [User]) verifyUser e code userDb = let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb in if (Data.List.null existingUsers) then (False, "No such user", userDb) else let existingUser = head existingUsers in if (code==(userVerificationCode existingUser)) then let verifiedUser = existingUser{userStatus="active"} newUserDb = replaceUserInDb verifiedUser userDb in (True, "Verified", newUserDb) else (False, "Incorrect verification code", userDb) deactivateUser :: Email -> [User] -> (Bool, String, [User]) deactivateUser e userDb = let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb in if (Data.List.null existingUsers) then (False, "No such user", userDb) else let existingUser = head existingUsers deactiveUser = existingUser{userStatus = "deactivated"} in (True, "User deactivated", replaceUserInDb deactiveUser userDb) replaceUserInDb :: User -> [User] -> [User] replaceUserInDb u userDb = let (a, b) = Data.List.break (\x -> (userEmail x)==(userEmail u)) userDb in if (Data.List.null b) -- NOTE: Handling the edge-case where the user may not be in the DB then (a ++ [u]) else (a ++ (u:(tail b))) .. note:: Notice how the ``replaceUserInDb`` function is written. The concept of "immutability" cuts across the *entire* language. In this particular function, we aren't mutating the ``[User]`` list (which represents the user DB). In fact, even if we wanted to, the language wouldn't allow us. So, we accept a ``[User]`` list and return a *new* ``[User]`` list in all our functions. **Please verify this for yourself in GHCi.** Let's play around with this in GHCi: .. code-block:: haskell GCHi> :l UserRegistration GHCi> let db1 = [] GCHi> let db2 = registerUser MkNewUser{nuserEmail = (MkEmail "saurabhnanda@gmail.com"), nuserFullName = "Saurabh Nanda", nuserPassword = "unsafe", nuserPostalCode = "110001"} db1 GHCi> db2 [MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "unverified", userVerificationCode = "saSa11"}] GHCi> let db3 = registerUser MkNewUser{nuserEmail = (MkEmail "john@doe.com"), nuserFullName = "John Doe", nuserPassword = "unsafeagain", nuserPostalCode = "220002"} db2 GHCi> db3 [MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "unverified", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "unverified", userVerificationCode = "saSa11"}] -- NOTE the pattern-matching in this let-binding. `verifyUser` returns a 3-tuple. -- GHCi> let (result, reason, db4) = verifyUser (MkEmail "saurabhnanda@gmail.com") "incorrect" db3 GHCi> result False GHCi> reason "Incorrect verification code" -- NOTE no change in the user DB -- GHCi> db4 [MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "unverified", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "unverified", userVerificationCode = "saSa11"}] -- Let's try this again, with the correct verification code, and notice that `db4` changes... -- GHCi> let (result, reason, db4) = verifyUser (MkEmail "saurabhnanda@gmail.com") "saSa11" db3 GHCi> db4 [MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "unverified", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "active", userVerificationCode = "saSa11"}] GHCi> let (result, reason, db5) = deactivateUser (MkEmail "incorrect@incorrect.com") db4 GHCi> result False GHCi> reason "No such user" GHCi> let (result, reason, db5) = deactivateUser (MkEmail "john@doe.com") db4 GHCi> db5 [MkUser {userEmail = MkEmail "john@doe.com", userFullName = "John Doe", userPassword = "unsafeagain", userPostalCode = "220002", userStatus = "deactivated", userVerificationCode = "joJo22"},MkUser {userEmail = MkEmail "saurabhnanda@gmail.com", userFullName = "Saurabh Nanda", userPassword = "unsafe", userPostalCode = "110001", userStatus = "active", userVerificationCode = "saSa11"}] Sum-types save you from typos ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Now, let's write a function that returns the number of users in ``verified``, ``inactive``, and ``deactive`` statuses: .. code-block:: haskell countUsers :: String -> [User] -> Int countUsers status userDb = length (filter (\u -> (userStatus u) == status) userDb) userStatusSummary :: [User] -> [(String, Int)] userStatusSummary userDb = let statuses = ["verified", "inactive", "deactive"] in map (\status -> (status, countUsers status userDb)) statuses Look fine, right? **WRONG!** Can you spot the error in the ``userStatusSummary`` function given above? .. container:: toggle .. container:: header **Reveal the error** .. container:: The ``UserRegistration`` module expects the ``userStatus`` field to have the following values: * ``active`` for active/verified users * ``deactivated`` for inactive/deactivated users * ``unverified`` for new users However, in our ``userStatusSummary`` function we have used similarly named, but different, values of - ``verified``, ``inactive`` and ``deactive`` This is the **first use-case** for sum-types. Whenever you have a piece of data/field/value that can only have a fixed set of values, **always** use a sum-type, instead of an ``Int`` or ``String`` to represent all possible values. In other languages this is possible by using ``enum`` types, but as we will see, Haskell's sum-types are more powerful than standard ``enum`` types. So, let's create a sum-type for our ``userStatus`` - .. code-block:: haskell :emphasize-lines: 1-4, 14 data Status = Active | Inactive | Deactive deriving (Eq, Show, Ord) data User = MkUser { userEmail :: Email , userFullName :: String -- NOTE: Bad idea alert! , userPassword :: String , userPostalCode :: String , userStatus :: Status , userVerificationCode :: String } deriving (Eq, Show, Ord) Let's unwrap this: 1. We've created a *sum type* called ``Status``. This is at the same *level* as ``Int``, ``String``, or ``Bool``, which means it can be used in type signatures. 2. The ``Status`` *type* can have only three *values*: ``Active``, ``Inactive`` and ``Deactive``. **Please make sure you understand this to avoid confusion going forward.** ``Active``, ``Inactive`` and ``Deactive`` are **not types, but values.** This is similar to how ``Int`` is a type and ``1``, ``5``, ``10``, are values of the type. Let's try and appreciate the second point about types vs values a little more: .. code-block:: haskell GHCi> data Status = Active | Inactive | Deactive deriving (Eq, Show, Ord) GHCi> :t Active Active :: Status GHCi> :t Inactive Inactive :: Status GHCi> :t Deactive Deactive :: Status GHCi> let x = Active GHCi> :t x x :: Status -- Try inspecting the type of Status and notice the error that you'll get. -- This is because Status itself is a type. To find more information about -- types you can use `:i` instead of `:t` -- -- GHCi> :t Status -- :1:1: error: Data constructor not in scope: Status -- -- GHCi> :i Status -- data Status = Active | Inactive | Deactive -- Defined at :13:1 -- instance [safe] Eq Status -- Defined at :13:54 -- instance [safe] Ord Status -- Defined at :13:64 -- instance [safe] Show Status -- Defined at :13:58 -- Now, notice the similarity with Int & Bool. Try `:t` and `:i` -- with `Int` and `Bool` as well -- GHCi> let y = 10 :t y GHCi> :t True True :: Bool GHCi> :t False False :: Bool GHCi> let z = True GHCi> :t z z :: Bool Reinforcing this *types* vs *values*, yet again, please see if you can understand why the following is incorrect: .. code-block:: haskell data Status = Active | Inactive | Deactive deriving (Eq, Show, Ord) someFunc :: Active -> String someFunc = _todo .. container:: toggle .. container:: header **Reveal the error** .. container:: ``Active`` is a *value* of *type* ``Status``. The type signature of ``someFunc`` given above is on the lines of the following, which you intuitively understand to be nonsensical: .. code-block:: haskell someFunc :: 1 -> String someFunc = _todo You would rather write the following: .. code-block:: haskell someFunc :: Int -> String someFunc = _todo Similarly, you'd write the following, when it comes to the ``Status`` type: .. code-block:: haskell someFunc :: Status -> String someFunc = _todo Now, let's use the ``Status`` type in the ``UserRegistration`` module given above, and include the the ``userStatusSummary`` function, as well: .. code-block:: haskell :emphasize-lines: 6, 17, 35, 47, 58, 68-74 module UserRegistration where import Data.List (break, null, filter, take) data Email = MkEmail String deriving (Eq, Show, Ord) data Status = Active | Inactive | Deactive deriving (Eq, Show, Ord) data User = MkUser { userEmail :: Email , userFullName :: String -- Bad idea alert! -- , userPassword :: String , userPostalCode :: String , userStatus :: Status , userVerificationCode :: String } deriving (Eq, Show, Ord) data NewUser = MkNewUser { nuserEmail :: Email , nuserFullName :: String , nuserPassword :: String , nuserPostalCode :: String } deriving (Eq, Show, Ord) registerUser :: NewUser -> [User] -> [User] registerUser newUser@MkNewUser{nuserEmail=(MkEmail email)} userDb = let user = MkUser { userEmail = (nuserEmail newUser) , userFullName = (nuserFullName newUser) , userPassword = (nuserPassword newUser) , userPostalCode = (nuserPostalCode newUser) , userStatus = Inactive , userVerificationCode = (take 2 email) ++ (take 2 (nuserFullName newUser)) ++ (take 2 (nuserPostalCode newUser)) } in (user:userDb) verifyUser :: Email -> String -> [User] -> (Bool, String, [User]) verifyUser e code userDb = let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb in if (Data.List.null existingUsers) then (False, "No such user", userDb) else let existingUser = head existingUsers in if (code==(userVerificationCode existingUser)) then let verifiedUser = existingUser{userStatus=Active} newUserDb = replaceUserInDb verifiedUser userDb in (True, "Verified", newUserDb) else (False, "Incorrect verification code", userDb) deactivateUser :: Email -> [User] -> (Bool, String, [User]) deactivateUser e userDb = let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb in if (Data.List.null existingUsers) then (False, "No such user", userDb) else let existingUser = head existingUsers deactiveUser = existingUser{userStatus = Deactive} in (True, "User deactivated", replaceUserInDb deactiveUser userDb) replaceUserInDb :: User -> [User] -> [User] replaceUserInDb u userDb = let (a, b) = Data.List.break (\x -> (userEmail x)==(userEmail u)) userDb in if (Data.List.null b) -- NOTE: Handling the edge-case where the user may not be in the DB then (a ++ [u]) else (a ++ (u:(tail b))) countUsers :: Status -> [User] -> Int countUsers status userDb = length (filter (\u -> (userStatus u) == status) userDb) userStatusSummary :: [User] -> [(Status, Int)] userStatusSummary userDb = let statuses = [Active, Inactive, Deactive] in map (\status -> (status, countUsers status userDb)) statuses Sum-types force you to consider all known cases ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Firstly, take a moment to appreciate that using a sum-type forced us to list down (or enumerate) all *known values* (or valid values) of a given type. As long as the ``userStatus`` field was a ``String`` it could potentially have *any* value, even though all throughout the code we had *implicitly* assumed that only three values were possible - ``active``, ``verified``, and ``deactivated``. By changing the type of ``userStatus`` to ``Status`` we made this *implcit* assumption absolutely *explicit* and, more imporantly, *self-documenting*. If you were inheriting this code, what kind of functions would you rather deal with: .. code-block:: haskell countUsers :: Status -> [User] -> Int -- or countUsers :: String -> [User] -> Int Now, let's build-up a use-case for why we would *want* to be forced to consider all known-cases. Say, this user-registration module were part of a larger app, where users could comment on posts (something like Reddit), and we needed to write a function that would go through all the comments in the DB for a specific ``PostId`` and display them on a webpage. Here's a starting point for our hypothetical ``fetchCommentsForPost`` function, whose result will be given to the UI: .. code-block:: haskell module Post where -- NOTE: Importing the previous UserRegistration module -- import UserRegistration import Data.List as DL data PostId = MkPostId Int deriving (Eq, Show, Ord) data Comment = MkComment { commentPostId :: PostId , commentUserEmail :: Email , commentBody :: String } deriving (Eq, Show, Ord) fetchCommentsForPost :: [Comment] -> PostId -> [Comment] fetchCommentsForPost commentDb postId = filter (\c -> (commentPostId c) == postId) commentDb Now, if we have deactivated any user we don't want his/her comments to show-up on the post's page. Here's *one possible way* we could achieve this: .. code-block:: haskell fetchCommentsForPost :: [Comment] -> [User] -> PostId -> [Comment] fetchCommentsForPost commentDb userDb postId = filter (\c -> ((commentPostId c) == postId) && (userNotDeactivated userDb (commentUserEmail c))) commentDb userNotDeactivated :: [User] -> Email -> Bool userNotDeactivated userDb email = let deactivatedUsers = DL.filter (\u -> ((userEmail u) == email) && ((userStatus u) == Deactive)) userDb in (DL.null deactivatedUsers) So far, so good. Now, after 3 months a new user-status is added to the system, called ``ShadowBanned``. In most message forums, comments by shadow-banned users are visible *only to the shadow-banned* user himself/herself. Banned/deactivated users immediately come to know because they are locked out of the app completely. On the other hand, shadow-banned users can keep commenting (or sharing, or posting) to their heart's content without *anyone else* ever seeing their content. It's a pretty devilish scheme, if you come to think of it! Another team member goes and adds the following code to the ``UserRegistration`` module: .. code-block:: haskell -- First adding another value to the Status type: data Status = Active | Inactive | Deactive | ShadowBanned deriving (Eq, Show, Ord) -- Then, adding another function to shadow-ban a user: shadowbanUser :: Email -> [User] -> (Bool, String, [User]) shadowbanUser e userDb = let existingUsers = Data.List.filter (\u -> (userEmail u) == e) userDb in if (Data.List.null existingUsers) then (False, "No such user", userDb) else let existingUser = head existingUsers shadowBannedUser = existingUser{userStatus = ShadowBanned} in (True, "User deactivated", replaceUserInDb shadowBannedUser userDb) You probably know where this is going. The other team-member won't immediately know where else in the app ``userStatus`` has been been used to take important decisions. The actual implementation of shadow-banning would end-up becoming a whack-a-mole debugging session, with the developer having to manually test the entire app to discover places where shadow-banning isn't working and patching the code one-by-one. Let's use sum-types to solve this problem... .. _enable_compiler_warnings: .. important:: **First ensure that all compiler warnings are enabled** What're we're about to do next, is technically called **exhaustiveness checking** and for it to work, we need to tell enable some compiler-level warnings. In your stack project open the ``package.yaml`` file and locate the following section and ensure the highlighted line is there: .. code-block:: yaml :emphasize-lines: 4 library: source-dirs: src ghc-options: - -Wall Now, go exit your ``stack ghci`` session and run the ``hpack`` command from your project's root directory: .. code-block:: shell $ cd exercises $ hpack generated exercises.cabal $ stack ghci GHCi> ==> Now **restart your IDE as well**. <== Let's rewrite our ``fetchCommentsForPost`` function leveraging case-matching for sum-types: .. code-block:: haskell fetchCommentsForPost :: [Comment] -> [User] -> PostId -> [Comment] fetchCommentsForPost commentDb userDb postId = filter (\c -> ((commentPostId c) == postId) && (DL.all (\x -> shouldDisplayCommentsForUser x) userDb)))) commentDb shouldDisplayCommentsForUser :: User -> Bool shouldDisplayCommentsForUser user = case (userStatus user) of -- NOTE: We are displaying comments for inactive users because they may -- have been forced to re-verify their accounts. And we wouldn't want their comments -- to suddenly disappear from the site just because they haven't yet *re-verified* their account. -- Inactive -> True Active -> True Deactive -> False Go back and add the ``ShadowBanned`` value to the ``Status`` type and see what the compiler says for the ``shouldDisplayCommentsForUser`` function. - Till now, it looks deceptively similar to an ``enum`` type from other languages, but here's what it makes this different. ``Active``, ``Inactive`` and ``Deactive`` are not only *values*, they are also *zero-parameter data-constructors* of the ``Status`` type. - Referring back to what was written in :ref:`positional_product_types`, you will now begin to understand why it's necessary in Haskell to *allow* a type and it's constructor to have different names: .. code-block:: text "...This is unlike a lot of other OOP languages where the constructors must either have the same name as the type (or class) OR should have a special name...When we talk about 'sum' types we’ll see why is it possible in Haskell to have constructors with different names for the same type." .. todo:: complete this section Sum-types can help you to always consider edge-cases ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. todo:: complete this section Sum types that you've alredy been using --------------------------------------- Bool ^^^^ .. todo:: complete this section Lists ^^^^^ .. todo:: complete this section Exercises --------- Rails fence cipher ^^^^^^^^^^^^^^^^^^ Revisit the ``encodeMessage`` function in :ref:`rails_fence_cipher`. It is a classic case of structure-vs-sematics problem described in this chapter. Rewrite the ``encodeMessage`` function using various techniques discussed in this chapter and describe the pros-and-cons of each solution. You should try rewriting the function using 2-3 different approaches, and actually **write down** the pros & cons as comments. Pagination ^^^^^^^^^^ Revisit the ``displayPagination`` function in :ref:`pagination`. There we had used **type synonyms** to clarify the semantics of the function signature. However, with type-synonyms, the compiler doesn't prevent us from passing ``TotalItems`` to something that is expecting ``ItemsPerPage``. Modify your solution to the problem by using techniques discussed in this chapter. Try rewriting the function using 2-3 different approaches, and actually **write down** the pros & cons as comments. Exam score processing ^^^^^^^^^^^^^^^^^^^^^ Revisit all your solution to the :ref:`exam_score_processing` problem. How can they benefit from using product and/or sum types discussed in this chapter? Reimplement *all* your solutions using the new techniques that you have learnt. Try rewriting your solutions using 2-3 different approaches, and actually **write down** the pros & cons as comments. Date arithmetic ^^^^^^^^^^^^^^^ Write a function that adds number-of-days to a given date, while taking care of a few edge-cases: - Spill-over of days is added to months - Spill-over of months is added to years - How are you determining which month has how many days? What about February in leap years? Solve this problem using the following techniques: 1. Using plain built-in types: .. code-block:: haskell addDays :: (Int, Int, Int) -> Int -> (Int, Int, Int) 2. Using separate types for date/month/year: .. code-block:: haskell data Day = MkDay Int deriving (Eq, Show, Ord) data Month = MkMonth Int deriving (Eq, Show, Ord) data Year = MkYear Int deriving (Eq, Show, Ord) addDays :: (Day, Month, Year) -> Int -> (Day, Month, Year) 3. Using a single positional-product type: .. code-block:: haskell data Date = MkDate Int Int Int deriving (Eq, Show, Ord) addDays :: Date -> Int -> Date 4. Using a record type, and **pattern matching** on the ``MkDate`` constructor to extract the day, month, and year fields .. code-block:: haskell data Date = MkDate { dtDay :: Int , dtMonth :: Int , dtYear :: Int } deriving (Eq, Show, Ord) addDays :: Date -> Int -> Date 5. Using the same record type as (4) above, but using accessor functions to access the day, month, and year fields. Linked lists ------------ Write a program to create an manipulate a linked list of integers. Implement the following operations on linked lists: - Inserting an integer at the front of the linked-list - Inserting an integer at the end of the linked-list - Inserting an integer at the n-th position of a linked-list - removing an integer at the n-th position of a linked-list - Obtaining the length of a linked list - Reversing a linked list - Returning the index of an element in the linked list - Breaking a linked list into two separate linked lists at the n-th position What is the core data type that you will use for representing a linked list? If you have implemented linked lists in languages like C/C++/Java, then you probably know that the core data-type has to be recursive in nature. .. container:: toggle .. container:: header **Reveal the core data-type** .. container:: .. code-block:: haskell data LinkedList = LLEmpty | LLNode Int LinkedList deriving (Eq, Show, Ord) Stack ----- Write a program to create and manipulate a stack of integers. Implement the following operations for your stack: - Inserting an element at the top of the stack - Popping an element from the top of the stack Can you re-use your linked-list code to create your stack? Reverse polish notation calculator ---------------------------------- Write a program that can calculate a mathematical expression expressed in `reverse polish notation `_ One possible core data-type for your calculator can be the following: .. code-block:: haskell data Expr = ELit Float Expr | EMul Expr | EAdd Expr | ESubtract Expr | EDivide Expr | ENone deriving (Eq, Show, Ord) expr1 :: Expr expr1 = ELit 10.0 (ELit 20.0 (EAdd ENone)) -- NOTE: This is an invalid expression because it would lead to to ``10`` and -- ``50`` remaining on the stack with no operand to process them further. -- expr2 :: Expr expr2 = ELit 10.0 (ELit 20.0 (ELit 30.0 (EAdd ENone))) -- NOTE: This is an invalid expression because it would lead to to ``10 / 0`` -- expr2 :: Expr expr2 = ELit 10.0 (ELit 20.0 (ELit 20.0 (EMinus (EDivide ENone)))) calculate :: Expr -> (Bool, String, Float) calculate e = _todo -- GHCi> calcualte expr1 -- (True, "Valid expression", 30.0) -- GHCi> calcualte expr2 -- (True, "Invalid expression", 0.0) -- GHCi> calcualte expr3 -- (True, "Divide by zero", 0.0) You can also choose to keep your ``Expr`` data-type separate and a stack of ``Expr`` separate. Run-length encoding ------------------- Run-length encoding (RLE) is a simple form of data compression, where runs (consecutive data elements) are replaced by just one data value and count. For example we can represent the original 53 characters with only 13: .. code-block:: text "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB" -> "12WB12W3B24WB" And from the compressed string, we can get the original string back, without loss of any data. eg: .. code-block:: text "12WB12W3B24WB" -> "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB" Can you think of a sum-type that will help you in writing a correct codec easily? 1. Write a program that can run-length encode and decode any string comprising of alphabets alone. 2. Write a program that can run-length encode and decode any string comprising of alphabets **and** numbers. What makes this part of the problem tougher than the earlier part? What do you need to change in the problem statement to be able to solve encode/decoding of strings containing alphabets **and** numbers? Make your suggesting change and solve the problem. Scratch section - PLEASE IGNORE ------------------------------- In this video we'll cover: - - The downside of tuples - Fixing the downside by using "product types" - Using product-types (and tuples) via pattern matching - Difference between types, data-constructors, and actual values. - Downside of "tagging" product types and fixing them with newtypes. - Downside of product-types and fixing them with records - Sum types and why does one needs them - Exhaustiveness checking - Creating our own type using a mix of sum, product, and records - Why are these called sum & product types - Named Puns Exercises: - Run length encoding - Core motivation: We touched upon tuples previously and outlined three reasons why they're used: - Returning multiple values from functions - Passing multiple values to high-order functions where a lambda just allows a single parameter - Ad-hoc way to associate data or elements If you think about it, the first two are really specific use-cases of the last. One of the problems with tuples like (Int, Int) is that when you're looking at type-signatures you don't really come to know what each of the Int's means. The structure of the value is known, but the semantics are not known. Discuss about what semantics means. An (Int, Int) could mean anything -- hour / min, x / y coordinates, row / col coordinates, PRODUCT TYPE - Canonical example of representing a point in 2-d space - (x, y) coordinates OR (row, col) coordinates - data Point = MkPoint Double Double deriving (Eq, Show) - equivalent to writing: data Point where MkPoint :: Double -> Double -> Point - Inspect the type of the data-constructors in GHCi. They are just functions -- for most part. - Pattern matching against **constructors** - Representing an hour/min pair: data TimeOfDay = TimeOfDay Int Int - This can also be represented as (Int, Int) ==> TimeOfDay Int Int - Similar to tuples, but different because of semantics. They are structurally the same, but semantically different. They mean very, very different things. - For example, the Mars Climate Orbiter disaster due to confusion between pounds (imperial units) and kilograms (metric, SI units) SUM TYPE - Designing an Enum -- status of an Order, status of a Payment, presence/absence of a field (say social media profiles -- basically a hand-written Maybe) - Advantages of using exhaustiveness checking - That is called a sum-type Mix of Sum & Product: - Example of representing a person's full-name: data Title = Mr | Mrs | Dr | Miss | Other String data FullName = FullName Title String String Need for records. - But there's a problem here. What if you were building the data-structure for - Presence / absence of fields. Why Haskell doesn't have NULL values. - .. [1] Haddock is to Haskell, what Javadocs is to Java.