Type Systems

Data abstraction
- programming is about manipulating data, and building abstractions.
- there are control flow abstrctions(loops, recursion, exception handling, subroutines,...) this is how we break code down into smaller pieces.
- there also are data abstractions. tools to represent the data we want to manipulate.
- below it all, its our cpu manipulating bits and bytes
- we map real world objects to data types in our programs.
- how well we do this depends on the tools of the language...

Type system: mechanism for defining types and asscociating them with operations that can be performed on values of that type.
e.g you can add subtract and multiply integers...

What's a type? Classification of types.
- given that types are key to representing data and figuringo ut what we can do with it, we should look at what exactly a type is.
- how would you explain to someone whos not a computer scientist what a tpye is? e.g. integer.
→ zeh says: its kind of like a set of values.
- Denotational view of types: a type is a set of values.
- Constructive: a type is built in or composite. A composite type is defined by how it is constructed from other types.
- Abstraction-based: a type is defined by an interface, the set of operations it supports.
→ “duck typing” in python or ruby or any dynamically typed language. “looks like a duck, quacks like a duck, its a duck”. compiler doesnt check that yure not makign any mistakes, but as long as the method exists for the object youre operating on, the interpreter wont make a fuss and will just let you do it.

Constructive classification of types:
- built-in types: intgers, booleans, characters, “real” numbers (floating point)...
→ depends on the language what's built in or not. built in to Java, but its just an alias for int in C
- Enumeration and range types:
→ c: enum t {A, B};
→ Pascal: 0..100. stores a byte that can only hold values 0-100. compiler generates extra checking code to make sure things are actually in that range.
- composite types: records, arrays, files, lists, sets, pointers...
→ how are pointers composite? composite means something thats built from a simpler type. in order to build a pointer to int, you need an int type to begin with. you need an underlying type to point to thats kind of a building block for your pointer. you can't have a pointer alone, it needs to be a pointer to void, float, int, etc etc. the type itself is built from a simpler type, even tho the value is just an integer/address.

Basic questions a type system must answer:
- Type equivalence: do two values have the same types?
→ If i give you two integers, they obviously have the same type. But what if i do: typedef int banner_number; this is defining banner_number as an alias for int, they are still the same.
→ There are other scenarios where it's less obvious.
- Type compatibility: can a value of a certain type be used in a certain context?
→ if you have a float x, and an int y, should you be able to compute x * y? In java or C, the result will be a floating point number. but java and C's coercion allows you to use integers in this context, but under the hood they get converted to a float since the * operation expects a float. Haskell doesn't allow this, it will assume multiplying two different types is an error.
- Type inference: How is the type of an expression computed from the types of its parts?
→ the compiler needs to be able to figure out whether an assignment to a variable is permissible or not. aka can you assign 5.6 to an int.

Type equivalence:
Are the following tpyes the same?

struct City {
    string name;
    double area;
}

struct Element{
    string name;
    double atomic_weight;
}

Name equivalence: two types are the same if they have the same name

Structural equivalence: Two types are the same if they have the same structure, that is, if their byte for byte representations in memory are identical.

If we take the first option, these structs are different. but if we take the second definition, they would be the same type.

What do we do in general when we give things types? We restrict what we are allowed to do with that data.

Designing a programming language is a balancing act between giving us the freedom to do whatever we want, but also restricting it enough that things have structure.

Structural equivalence is much more permissive than name equivalence. If we use a City in place of an Element, this is likely an error, but technically will go wrong because the data type of the inner implementation are the same. Any operation that would be performed on a City could be applied to an Element and not cause any massive errors.

Are the following types the same?

struct A{
    string name;
    int *ptr;
}

struct B{
    string name;
    char *ptr;
}

\

Dereferencing these pointers would cause the program to break. Especially because int and char take up different amounts of space. This would be especially catastrophic trying to write an int to a char pointer, because it would overwrite stuff outside of the bounds of the char.

What about these?

struct A{
    string name;
    B *ptr;
}

struct B{
    string name;
    A *ptr;
}

What you end up doing is you have A and B, nodes in a finite state automata. you have a String node. If you take the first field of A or B, they both lead you to the string node. A has a pointer o B in its second component, and B has a pointer to A. Add a node *A and *B. A points to *b with a weight of 2. B points to *A with a weight of 2. *B points to B with a weight of 1. *A points to A with a weight of 1. An acceptance state is a node that contains a built in type.
Does there exist an accessor sequence that leads me to an acceptance state for one tpye and NOT the other, if so theyre not equivalent. Or if they're two different acceptacne states theyre not equivalent. But if for any accessor sequence that leads to an acceptance state, it works for both types, This is like minimizing a DFA. (Equivalence classes)

Strength of a type system:
- What is a strong type system? Strongly typed if every type hasa certain set of permissible operations, and youre not allowed to apply any unsupported action to an item of that type.
- Strongly typed: prohibits the application of an operation defined for one type to any object of another type
- Weakly typed: all or some operations can be applied to arbitrary types. Weakest typed language: assembly. you can basically just put whatever you want.
- Haskell, ML Rust: very strong typed. The disadvantage of this is you have to sometimes be more verbose in conversion, this can be tedious. But usually this is worth the effort because the conversions are not as frequent as you think, and stringent type checks protect you from your own errors.
- C, C++, Java: fairly strong typing. Unchecked type casts though.
- Ruby, Python: “duck typing”, if nothing goes wrong, its fine. But there are still type checks for certain things, like trying to add strings etc.
- Assembly: untyped, anything goes.

Do variabled have types? Static Vs Dynamic typing.
- Statically typed: if variables have types. Compile time checks
- Dynamically typed: if variables dont have types. Runtime checks.
- Why do we even have these approaches?
- The advantage of static typing is that by giving types to vairbles, compiler can check at compile time whether waht we're storing actually has that type. We can guarantee that our vars are the right type because we were never allowed to put anything in the wrong place. Allows for much greater safety, type errors can be found before program even runs.
- Dynamic typig doesnt have any of these checks becuse they are impossible, because by definition we can store anything in a variable. Would it be possible to do type checks in a dynamically typed language? Zeh claims only at runtime can these issues be found.

def fib(a);
    -- implementation --

def foo(n):
    if fib(n) > 1317 * fib(n-1)
        return "hello"
    else:
        return 0

x = foo(53)
x +=1

The type of foo depends on the arugment that you call foo with. At compile time you dont know which branch will be executed and therefore which value will be assigned to x. It's not checkable what will be stored in x. You can't check types at compile-time.
If you look at any modern statically typed language, you don't actually have to spell out the types. You can get the convenience of dynamic typing with the safety of static. Unit testing isnt the same as proving. You will never catch every single bug with unit testing alone.

Inference of variable types:
- the compiler has to check whether the value actually is the same as the type of the vairbale.

Composite Types:
- Product types: represent structures that are composed of more than one component
→ representation in memory requires storing the individual fields.
⇒ in memory you need to store each of the indivual fields.
→ Capture the cartesian product of two or more types
⇒ (Int, Bool, String) = Int x Bool x String
⇒ class {int x; string y;} = int x string
- Sum types: disjoint union of underlying types.
→ represent individual values that can have one of many types
→ Discjoint union or direct sum of two or more types:
⇒
- Algebraic data types: refer to types built from sums and products
- Recursive types: any type T twhose values contain as one of their fields other values of type T
→ data List a = Empty | Cons a (List a)
→ one importnat thing we discussed, is that biulding this type of thing requires either:
⇒ explicit pointers
⇒ reference model of variables
→ if you dont have either, you cant define this type.
⇒ Becuase: assume we havea value model. no pointers. everything is stored by value. This type would automatically be infinite in size, because the size of A would have to be the size of A bigger than the A in side it and inside it and inside it....
- Collection types: allow us to store an arbitrary number of values in a fixed type
→ arrays, lists, dicts, sets...
→ type always has to be fixed. Why?
⇒ vectors with different types are allowed in python, but you cant do it in statically typed languages like C or Java or Haskell.
⇒ why not in static typed?
• Iterating through the loop makes things hard, because you cant predict what each value may be and that is a problem if the language is statically typed.
- Pointers: the type to rule them all
→ Exist only in languages with a value model of variables
→ use to build data structures
→ used to build recursive types.
- Rust.. you cant really make a double linked list because of the lifetime system. You cant have a pointer that points to nothing. Heres how the rust porgrammer gets around that: You have a thing that you call a list node that stores a T.
-

struct ListNode<T> {
    val :: T;
    usize pred, next;
}
struct List<T> {
    elems : Vec<List<T>>;
}

- This is actually an array, and each node stores the index of its next and previous. its not really “pointing”. There are no explicit pointers.... except Vector.. It's like an arrayList. It has a pointer to a heap where the items are stored, and going over the size might trigger a reallocation of heap space.

Index