Generic Data Types
We use generics to create definitions for items like function signatures or structs, which we can then use with many different concrete data types. Let's first look at how to define functions, structs, enums, and methods using generics. Then, we will discuss how generics affect code performance.
In Function Definitions
When defining a function that uses generics, we place the generics in the signature of the function where we would usually specify the data types of the parameters and return value. Doing so makes our code more flexible and provides more functionality to callers of our function while preventing code duplication.
Continuing with our largest function, here are two functions that both find
the largest value in a slice. We will then combine these into a single function
that uses generics.
fn largestInt(list: &[Int]): &Int {
var largest = &list[0]
for item in list {
if item > largest {
largest = item
}
}
largest
}
fn largestChar(list: &[Char]): &Char {
var largest = &list[0]
for item in list {
if item > largest {
largest = item
}
}
largest
}
fn main() {
let numberList = vec![34, 50, 25, 100, 65]
let result = largestInt(&numberList)
println!("The largest number is \(result)")
let charList = vec!['y', 'm', 'a', 'q']
let result = largestChar(&charList)
println!("The largest char is \(result)")
}
The largestInt function is the one we extracted previously that finds the
largest Int in a slice. The largestChar function finds the largest Char
in a slice. The function bodies have the same code, so let's eliminate the
duplication by introducing a generic type parameter in a single function.
To parameterize the types in a new single function, we need to name the type
parameter, just as we do for the value parameters to a function. You can use
any identifier as a type parameter name. But we will use T because, by
convention, type parameter names are short, often just one letter, and
the type-naming convention is UpperCamelCase. Short for type, T is the
default choice of most programmers.
When we use a parameter in the body of the function, we have to declare the
parameter name in the signature so that the compiler knows what that name
means. Similarly, when we use a type parameter name in a function signature, we
have to declare the type parameter name before we use it. To define the generic
largest function, we place type name declarations inside angle brackets,
<>, between the name of the function and the parameter list, like this:
fn largest<T>(list: &[T]): &T {
// ...
}
We read this definition as "The function largest is generic over some type
T." This function has one parameter named list, which is a slice of values
of type T. The largest function will return a reference to a value of the
same type T.
Here is the combined largest function definition using the generic data type
in its signature. The listing also shows how we can call the function with
either a slice of Int values or Char values. Note that this code won't
compile yet:
fn largest<T>(list: &[T]): &T {
var largest = &list[0]
for item in list {
if item > largest {
largest = item
}
}
largest
}
fn main() {
let numberList = vec![34, 50, 25, 100, 65]
let result = largest(&numberList)
println!("The largest number is \(result)")
let charList = vec!['y', 'm', 'a', 'q']
let result = largest(&charList)
println!("The largest char is \(result)")
}
If we compile this code right now, we will get an error:
error[E0369]: binary operation `>` cannot be applied to type `&T`
--> src/main.rs:5:17
|
5 | if item > largest {
| ---- ^ ------- &T
| |
| &T
|
help: consider restricting type parameter `T`
|
1 | fn largest<T: std::cmp::PartialOrd>(list: &[T]): &T {
| ++++++++++++++++++++++
The help text mentions std.cmp.PartialOrd, which is a trait, and we are
going to talk about traits in the next section. For now, know that this error
states that the body of largest won't work for all possible types that T
could be. Because we want to compare values of type T in the body, we can
only use types whose values can be ordered. To enable comparisons, the standard
library has the std.cmp.PartialOrd trait that you can implement on types.
By restricting T to only those types that implement PartialOrd, the code
will compile because the standard library implements PartialOrd on both Int
and Char.
In Struct Definitions
We can also define structs to use a generic type parameter in one or more
fields using the <> syntax. Here is a Point<T> struct to hold x and y
coordinate values of any type:
struct Point<T> {
x: T,
y: T,
}
fn main() {
let integer = Point { x: 5, y: 10 }
let float = Point { x: 1.0, y: 4.0 }
}
The syntax for using generics in struct definitions is similar to that used in function definitions. First, we declare the name of the type parameter inside angle brackets just after the name of the struct. Then, we use the generic type in the struct definition where we would otherwise specify concrete data types.
Note that because we have used only one generic type to define Point<T>, this
definition says that the Point<T> struct is generic over some type T, and
the fields x and y are both that same type, whatever that type may be. If
we create an instance of a Point<T> that has values of different types, our
code won't compile:
struct Point<T> {
x: T,
y: T,
}
fn main() {
let wontWork = Point { x: 5, y: 4.0 }
}
In this example, when we assign the integer value 5 to x, we let the
compiler know that the generic type T will be an integer for this instance of
Point<T>. Then, when we specify 4.0 for y, which we have defined to have
the same type as x, we will get a type mismatch error like this:
error[E0308]: mismatched types
--> src/main.rs:7:38
|
7 | let wontWork = Point { x: 5, y: 4.0 }
| ^^^ expected integer, found floating-point number
To define a Point struct where x and y are both generics but could have
different types, we can use multiple generic type parameters. For example, we
change the definition of Point to be generic over types T and U where x
is of type T and y is of type U:
struct Point<T, U> {
x: T,
y: U,
}
fn main() {
let bothInteger = Point { x: 5, y: 10 }
let bothFloat = Point { x: 1.0, y: 4.0 }
let integerAndFloat = Point { x: 5, y: 4.0 }
}
Now all the instances of Point shown are allowed! You can use as many generic
type parameters in a definition as you want, but using more than a few makes
your code hard to read. If you find you need lots of generic types in your
code, it could indicate that your code needs restructuring into smaller pieces.
In Enum Definitions
As we did with structs, we can define enums to hold generic data types in their
variants. Let's take another look at the Option<T> enum that the standard
library provides, which we use through Oxide's T? syntax:
#![allow(unused)] fn main() { // This is what Option<T> looks like in Rust enum Option<T> { Some(T), None, } }
This definition should now make more sense to you. As you can see, the
Option<T> enum is generic over type T and has two variants: Some, which
holds one value of type T, and a None variant that doesn't hold any value.
By using the Option<T> enum (or T? in Oxide), we can express the abstract
concept of an optional value, and because Option<T> is generic, we can use
this abstraction no matter what the type of the optional value is.
In Oxide, we typically write this using the nullable type syntax:
fn findFirst<T>(items: &[T], predicate: (&T) -> Bool): T? {
for item in items {
if predicate(item) {
return Some(item.clone())
}
}
null
}
Enums can use multiple generic types as well. The definition of the Result
enum that we used in Chapter 9 is one example:
enum Result<T, E> {
Ok(T),
Err(E),
}
The Result enum is generic over two types, T and E, and has two variants:
Ok, which holds a value of type T, and Err, which holds a value of type
E. This definition makes it convenient to use the Result enum anywhere we
have an operation that might succeed (return a value of some type T) or fail
(return an error of some type E). In fact, this is what we used to open a
file in Chapter 9, where T was filled in with the type std.fs.File when
the file was opened successfully and E was filled in with the type
std.io.Error when there were problems opening the file.
When you recognize situations in your code with multiple struct or enum definitions that differ only in the types of the values they hold, you can avoid duplication by using generic types instead.
In Method Definitions
We can implement methods on structs and enums and use generic types in their
definitions too. Here is the Point<T> struct with a method named x
implemented on it:
struct Point<T> {
x: T,
y: T,
}
extension Point<T> {
fn x(): &T {
&self.x
}
}
fn main() {
let p = Point { x: 5, y: 10 }
println!("p.x = \(p.x())")
}
Here, we have defined a method named x on Point<T> that returns a reference
to the data in the field x.
Note that we have to declare T just after extension so that we can use T
to specify that we are implementing methods on the type Point<T>. By declaring
T as a generic type after extension, Oxide can identify that the type in
the angle brackets in Point is a generic type rather than a concrete type. We
could have chosen a different name for this generic parameter than the generic
parameter declared in the struct definition, but using the same name is
conventional.
Rust Equivalent
The Oxide code above translates to this Rust code:
#![allow(unused)] fn main() { struct Point<T> { x: T, y: T, } impl<T> Point<T> { fn x(&self) -> &T { &self.x } } }
We can also specify constraints on generic types when defining methods on the
type. We could, for example, implement methods only on Point<Float> instances
rather than on Point<T> instances with any generic type. Here we use the
concrete type Float, meaning we don't declare any types after extension:
extension Point<Float> {
fn distanceFromOrigin(): Float {
(self.x.powi(2) + self.y.powi(2)).sqrt()
}
}
This code means the type Point<Float> will have a distanceFromOrigin
method; other instances of Point<T> where T is not of type Float will not
have this method defined. The method measures how far our point is from the
point at coordinates (0.0, 0.0) and uses mathematical operations that are
available only for floating-point types.
Generic type parameters in a struct definition aren't always the same as those
you use in that same struct's method signatures. Here is an example that uses
the generic types X1 and Y1 for the Point struct and X2 and Y2 for
the mixup method signature to make the example clearer:
struct Point<X1, Y1> {
x: X1,
y: Y1,
}
extension Point<X1, Y1> {
fn mixup<X2, Y2>(other: Point<X2, Y2>): Point<X1, Y2> {
Point {
x: self.x,
y: other.y,
}
}
}
fn main() {
let p1 = Point { x: 5, y: 10.4 }
let p2 = Point { x: "Hello", y: 'c' }
let p3 = p1.mixup(p2)
println!("p3.x = \(p3.x), p3.y = \(p3.y)")
}
In main, we have defined a Point that has an Int for x (with value 5)
and a Float for y (with value 10.4). The p2 variable is a Point
struct that has a string slice for x (with value "Hello") and a Char for
y (with value 'c'). Calling mixup on p1 with the argument p2 gives us
p3, which will have an Int for x because x came from p1. The p3
variable will have a Char for y because y came from p2. The println!
macro call will print p3.x = 5, p3.y = c.
The purpose of this example is to demonstrate a situation in which some generic
parameters are declared with extension and some are declared with the method
definition. Here, the generic parameters X1 and Y1 are declared after
extension because they go with the struct definition. The generic parameters
X2 and Y2 are declared after fn mixup because they are only relevant to
the method.
Performance of Code Using Generics
You might be wondering whether there is a runtime cost when using generic type parameters. The good news is that using generic types won't make your program run any slower than it would with concrete types.
Oxide and Rust accomplish this by performing monomorphization of the code using generics at compile time. Monomorphization is the process of turning generic code into specific code by filling in the concrete types that are used when compiled. In this process, the compiler does the opposite of the steps we used to create the generic function: The compiler looks at all the places where generic code is called and generates code for the concrete types the generic code is called with.
Let's look at how this works by using the standard library's generic
Option<T> enum:
let integer: Int? = Some(5)
let float: Float? = Some(5.0)
When Oxide compiles this code, it performs monomorphization. During that
process, the compiler reads the values that have been used in Option<T>
instances and identifies two kinds of Option<T>: One is Int and the other
is Float. As such, it expands the generic definition of Option<T> into two
definitions specialized to Int and Float, thereby replacing the generic
definition with the specific ones.
The monomorphized version of the code looks similar to the following (the compiler uses different names than what we're using here for illustration):
enum OptionInt { Some(Int), None, } enum OptionFloat { Some(Float), None, } fn main() { let integer = OptionInt.Some(5) let float = OptionFloat.Some(5.0) }
The generic Option<T> is replaced with the specific definitions created by
the compiler. Because Oxide compiles generic code into code that specifies the
type in each instance, we pay no runtime cost for using generics. When the code
runs, it performs just as it would if we had duplicated each definition by
hand. The process of monomorphization makes generics extremely efficient at
runtime.