At this point, we have written code to draw some interesting features in our inflammation data, loop over all our data files to quickly draw these plots for each of them, and have Julia make decisions based on what it sees in our data. But, our code is getting pretty long and complicated. What if we had thousands of datasets, and didn’t want to generate a figure for every single one? Commenting out the figure-drawing code is a nuisance. Also, what if we want to use that code again, on a different dataset or at a different point in our program? Cutting and pasting it is going to make our code get very long and very repetitive, very quickly. We’d like a way to package our code so that it is easier to reuse, and Julia provides this by letting us define things called ‘functions’ — a shorthand way of re-executing longer pieces of code.

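The general form of a function definition, with as many comma-separated arguments as you need, is:

function function_name(argument1, argument2)
    # statements that do the work go here
end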

Let’s start by creating a function called fahr_to_celsius that converts temperature from Fahrenheit to Celsius:

function fahr_to_celsius(temperature)
   (temperature-32)*(5/9) 
end

Let’s now test our function by converting a temperature of 32 degrees Fahrenheit to Celsius:

println("Freezing point of water: ", fahr_to_celsius(32), " C")

Freezing point of water: 0.0 C

As we saw with loops and conditionals, Julia also offers a more compact syntax: a short function can be declared on a single line:

fahr_to_celsius2(temperature) = (temperature-32)*(5/9)

If you convert 32 Fahrenheit to Celsius using the fahr_to_celsius2 function, you will get the same answer:

println("Freezing point of water: ", fahr_to_celsius2(32), " C")

Freezing point of water: 0.0 C

You can also create a so-called “anonymous” function, that is, a function without a name, using either of these syntaxes:

temperature -> (temperature-32)*(5/9)

function (temperature)
    (temperature-32)*(5/9) 
end


This creates a function that takes one argument, temperature, and returns the value of (temperature-32)*(5/9) for that argument. Notice that the result is a generic function, but with a compiler-generated name based on consecutive numbering. The primary use for anonymous functions is passing them to functions that take other functions as arguments. A classic example is the map() function, which applies a function to each value of an array and returns a new array containing the resulting values. We will talk about the map() function later in this section.
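As a small sketch (the variable names to_celsius and celsius_values are illustrative choices, not part of the lesson), an anonymous function can be bound to a variable and then called like a named function, or passed straight to a higher-order function such as map:

```julia
# Bind an anonymous function to a variable; it can then be called like any function.
to_celsius = temperature -> (temperature - 32) * (5 / 9)

println(to_celsius(212))   # boiling point of water, in Celsius

# Pass an anonymous function directly to map, without ever naming it.
celsius_values = map(temperature -> (temperature - 32) * (5 / 9), [32, 212])
println(celsius_values)
```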

Let’s also define a function that converts a temperature from Celsius to Kelvin:

function celsius_to_kelvin(temperature_c)
    temperature_c + 273.15
end

celsius_to_kelvin(0)

273.15

Chain functions

Now, if you would like a function that converts Fahrenheit to Kelvin, you can build it from the two functions you have already created, fahr_to_celsius and celsius_to_kelvin:

function fahr_to_kelvin(temperature_f)
    temperature_c = fahr_to_celsius(temperature_f)
    temperature_k = celsius_to_kelvin(temperature_c)
end

Now, if you test the function with a temperature of 212 Fahrenheit, you will get a temperature of 373.15 Kelvin. Note that the output value is the value of temperature_k, because the assignment to temperature_k is the last expression evaluated in the function. You can also return more than one output from a function, as we will see in the next subsection.

fahr_to_kelvin(212)

373.15

Note that there is an alternative way to do the previous conversion: you can chain (nest) the two function calls to get the same result without defining another function:

celsius_to_kelvin(fahr_to_celsius(212))

373.15

In this case, first we convert the temperature of 212 Fahrenheit to Celsius, and then we pass this new value (in Celsius) as an input in the celsius_to_kelvin function, with the final result being in Kelvin.
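Julia also has two operators that express the same chaining more compactly: the composition operator ∘ (typed as \circ followed by Tab in the REPL) and the pipe operator |>. A minimal sketch, which redefines the two conversion functions so that it runs on its own (fahr_to_kelvin_composed is just an illustrative name):

```julia
fahr_to_celsius(temperature) = (temperature - 32) * (5 / 9)
celsius_to_kelvin(temperature_c) = temperature_c + 273.15

# ∘ composes right to left: fahr_to_celsius runs first, then celsius_to_kelvin.
fahr_to_kelvin_composed = celsius_to_kelvin ∘ fahr_to_celsius
k1 = fahr_to_kelvin_composed(212)

# |> pipes a value through the functions from left to right.
k2 = 212 |> fahr_to_celsius |> celsius_to_kelvin

println(k1, " ", k2)
```

Both forms avoid defining a separate named function, as the nested call does; which one reads best is largely a matter of taste.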

Return values

By default, a function returns the value of the last expression it evaluates. For example, fahr_to_kelvin returns the value of temperature_k, because the assignment to temperature_k is its last expression. You can also state the return value explicitly with the return keyword, which is convenient when a function returns more than one value. For example:

function fahr_to_kelvin(temperature_f)
    temperature_c = fahr_to_celsius(temperature_f)
    temperature_k = celsius_to_kelvin(temperature_c)
    return temperature_c, temperature_k
end

a, b = fahr_to_kelvin(10)

(-12.222222222222223, 260.92777777777775)

In Julia, another way to return multiple values without using the return command is:

function new_function(a,b)
    a+b, a-b
end

x, y = new_function(1,3)

(4, -2)
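Under the hood, a comma-separated list of return values is packed into a single Tuple, which x, y = ... then unpacks (destructures). A short sketch that keeps the tuple so you can inspect it:

```julia
function new_function(a, b)
    a + b, a - b              # the two values are packed into one tuple
end

result = new_function(1, 3)
println(result isa Tuple)     # the function returned a single Tuple value
println(result[1], " ", result[2])   # tuple elements can be indexed

x, y = result                 # destructuring assigns each element to a variable
println(x, " ", y)
```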

Default values

Functions often have more arguments than you need to specify in a typical call, because some of those arguments have default values. To give an argument a default value, assign it with the = symbol in the function definition:

function myprint(a,b=1,c=10)
    println("a:",a," b:",b," c:",c)
end

myprint(1)

a:1 b:1 c:10

In this example, the arguments b and c have default values of 1 and 10, respectively. So when we call the function without specifying b and c, the function still runs and b and c take their default values. If we pass a second argument, it replaces the default value of b:

myprint(1,5)

a:1 b:5 c:10

Be careful! Positional arguments are matched by their order. In the call myprint(1,5), 1 is assigned to the argument a, 5 replaces the default value of argument b, and since we haven’t given a value for c, it keeps its default value, i.e. 10. Julia does not let you pass positional arguments by name, so with the definition above a call such as myprint(b=5,a=1) raises an error. If you want to pass arguments by name, in any order, declare them as keyword arguments, which come after a semicolon in the argument list:

function myprint_kw(; a, b=1, c=10)
    println("a:",a," b:",b," c:",c)
end

myprint_kw(b=5,a=1)

a:1 b:5 c:10
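Optional positional arguments are implemented by Julia as several methods of the same function, one for each number of arguments you may pass, which ties in with the next section on multiple dispatch. A default value may also be computed from an earlier argument. A minimal sketch (scaled_sum is a hypothetical function used only for illustration):

```julia
function myprint(a, b=1, c=10)
    println("a:", a, " b:", b, " c:", c)
end

# One definition with two optional arguments creates three methods:
# myprint(a), myprint(a, b) and myprint(a, b, c).
println(length(methods(myprint)))

# Hypothetical example: the default for factor depends on the first argument.
scaled_sum(x, factor=2x) = x + factor

println(scaled_sum(3))      # factor defaults to 2*3
println(scaled_sum(3, 1))   # factor given explicitly
```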

Multiple dispatch

Multiple dispatch makes software generic and fast! Let’s start by exploring an example. We can declare functions in Julia without giving Julia any information about the types of the input arguments that function will receive:

f(x) = (2*x)^3

and then Julia will determine on its own which input argument types make sense and which do not:

f(10)

8000

f([1, 2, 3])

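This second call raises a MethodError: 2*[1, 2, 3] works, but the power operator ^ is not defined for a vector and an integer, so Julia reports that no method matches these argument types.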
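Because f has no type restrictions, the same definition also works for other numeric types, as long as * and ^ are defined for them. A quick sketch:

```julia
f(x) = (2*x)^3

println(f(10))      # Int input
println(f(0.5))     # Float64 input: (2*0.5)^3
println(f(1//2))    # Rational input: exact arithmetic, no rounding
```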

However, we also have the option to tell Julia explicitly what types our input arguments are allowed to have. For example, let’s write a function called my_func that only takes strings as inputs.

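my_func(a::String, b::String) = println("My inputs a and b are both strings!")

You can see here that in order to restrict the types of a and b to strings, we just follow each input argument name by a double colon (::) and the keyword String, which indicates the accepted type.

my_func("hello", "hi!")

If we now call my_func(3, 4), Julia raises a MethodError, because the only method we have defined accepts strings. To make my_func work on integers, we can define a second method with integer arguments:

my_func(a::Int, b::Int) = println("My inputs a and b are both integers!")

my_func(3, 4)

Now my_func works on integers! But, my_func also still works when a and b are strings!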

my_func("hello", "hi!")

Defining the integer version did not overwrite or replace the string version. Instead, we just added an additional method to the generic function called my_func. A generic function is the abstract concept associated with a particular operation. For example, the generic function + represents the concept of addition. A method is a specific implementation of a generic function for particular argument types. For example, + has methods that accept floating point numbers, integers, matrices, etc. We can use the methods function to list the methods defined for my_func.

methods(my_func)

So, we can now call my_func on integers or strings. When you call my_func on a particular set of arguments, Julia looks at the types of all the arguments and dispatches (calls) the matching method. This is the idea behind multiple dispatch.

Multiple dispatch makes our code generic and fast. Our code can be generic and flexible because we can write code in terms of abstract operations such as addition and multiplication, rather than in terms of specific implementations. At the same time, our code runs quickly because Julia is able to call efficient methods for the relevant types.
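The "multiple" in multiple dispatch means that Julia uses the types of all the arguments, not only the first one, to choose a method. A minimal sketch with a hypothetical function describe_pair that has one method for each order of an integer and a string:

```julia
# Method selection depends on the types of every argument.
describe_pair(a::Int, b::String) = "integer first, string second"
describe_pair(a::String, b::Int) = "string first, integer second"

println(describe_pair(1, "one"))
println(describe_pair("one", 1))
println(length(methods(describe_pair)))   # two methods, one generic function
```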

To see which method is being dispatched when you call a generic function, you can use the @which macro:

@which my_func(3, 4)

And we can continue to add other methods to our generic function my_func. Let’s add one that takes the abstract type Number, which includes subtypes such as Int, Float64, and other objects you would think of as numbers:

my_func(a::Number, b::Number) = println("My inputs a and b are both numbers!")

We can also add a fallback, duck-typed method for my_func that takes inputs of any type:

my_func(a, b) = println("I accept inputs of any type!")
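With the String, Int, Number and fallback methods all defined, Julia picks the most specific method that matches the argument types. The sketch below redefines the four methods to return strings instead of printing them, so the chosen method is easy to check:

```julia
my_func(a::String, b::String) = "both strings"
my_func(a::Int, b::Int) = "both integers"
my_func(a::Number, b::Number) = "both numbers"
my_func(a, b) = "any types"

println(my_func(3, 4))      # Int is more specific than Number, so the Int method wins
println(my_func(3.0, 4))    # Float64 and Int: the Number method is the most specific match
println(my_func("hi", 4))   # no typed method matches, so the fallback runs
```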

Mutating vs non-mutating functions

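By convention, the names of functions that modify (mutate) their inputs end with an exclamation mark (!). The language does not enforce this: any function can mutate its inputs, but the ! suffix makes it clear to the reader that it does so. Compare the non-mutating sort with its mutating counterpart sort!: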

v = [3,5,2]
sort(v)

3-element Array{Int64,1}:
 2
 3
 5

sort returns a new, sorted array and leaves v unchanged:

v

3-element Array{Int64,1}:
 3
 5
 2

If we instead call sort!, the array v itself is sorted in place, so it keeps the sorted order afterwards:

sort!(v)

3-element Array{Int64,1}:
 2
 3
 5

v

3-element Array{Int64,1}:
 2
 3
 5
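You can follow the same convention in your own code. A minimal sketch with two hypothetical functions: double returns a new array, while double! overwrites the array it is given:

```julia
# Non-mutating: builds and returns a new array; the input is left untouched.
double(numbers) = 2 .* numbers

# Mutating: overwrites each element of the input array in place.
function double!(numbers)
    numbers .*= 2     # in-place broadcast assignment
    return numbers
end

v = [3, 5, 2]
w = double(v)
println(v, " ", w)    # v is unchanged, w is a new array

double!(v)
println(v)            # v itself has now been modified
```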

Higher order functions

map function

The map function is a “higher-order” function in Julia that takes a function as one of its input arguments. map then applies that function to every element of the data structure you pass it. For example, executing

map(f, [1, 2, 3])

will give you an output array where the function f has been applied to all elements of [1,2,3], i.e. [f(1),f(2),f(3)].

Here is an example. Suppose we would like to cube every number in a list. Instead of writing a named function, we can pass map an anonymous function that computes the cube:

map(x->x^3, [1,4,7])

3-element Array{Int64,1}:
   1
  64
 343
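map can also take several collections, in which case the function receives one element from each, and Julia's do-block syntax lets you write a longer anonymous function as map's first argument. A short sketch (the variable names are illustrative):

```julia
# Several collections: + is applied to matching elements of the two arrays.
sums = map(+, [1, 2, 3], [10, 20, 30])
println(sums)

# do-block syntax: the block becomes an anonymous function passed as map's first argument.
labels = map([1, 4, 7]) do x
    cube = x^3
    string(x, " cubed is ", cube)
end
println(labels)
```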

broadcast function

The broadcast function is another “higher-order” function like map. broadcast is a generalisation of map, so it can do everything map can do and more. The syntax for calling broadcast is the same as for calling map:

broadcast(function_name, [item1, item2, item3])

Syntactic sugar for calling broadcast is to place a dot (.) between the name of the function you want to broadcast and its input arguments. For example, the following two calls are equivalent:

broadcast(function_name, [item1, item2, item3])

function_name.([item1, item2, item3])

broadcast(x->x^3, [1,4,7])

3-element Array{Int64,1}:
   1
  64
 343

f(x) = x^3
f.([1,4,7])

3-element Array{Int64,1}:
   1
  64
 343
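What broadcast adds over map is that it expands arguments of different shapes to a common size: a scalar is reused for every element, and a column vector combined with a row vector yields a matrix. A short sketch (the variable names are illustrative):

```julia
# A scalar argument is reused for every element of the array.
shifted = broadcast(+, [1, 2, 3], 10)

# A 3-element column vector and a 1×3 row matrix expand to a 3×3 matrix,
# where element (i, j) is column[i] + row[j].
column = [1, 2, 3]
row = [10 20 30]
table = column .+ row

println(shifted)
println(table)
```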

As another example, let’s create a 3×3 array using an array comprehension, Julia’s compact syntax for nested loops:

A = [i + 3*j for j in 0:2, i in 1:3]

3×3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9

f(A)

3×3 Array{Int64,2}:
  468   576   684
 1062  1305  1548
 1656  2034  2412

Note that f(A) computes the matrix power A^3 = A*A*A (using matrix multiplication), not the cube of each element. To cube each element, we apply f with the dot syntax, which broadcasts it over A. This syntax allows us to write relatively complex compound element-wise expressions in a way that looks close to mathematical notation:

f.(A)

3×3 Array{Int64,2}:
   1    8   27
  64  125  216
 343  512  729

A .+ 2 .* f.(A) ./ A

3×3 Array{Float64,2}:
   3.0   10.0   21.0
  36.0   55.0   78.0
 105.0  136.0  171.0
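Writing a dot in front of every operator gets tedious. The @. macro adds the dots for you, turning every function call and operator in the expression into a broadcast; Julia then fuses the dotted operations into a single element-wise loop. A sketch that checks the two spellings agree:

```julia
f(x) = x^3
A = [i + 3*j for j in 0:2, i in 1:3]

explicit = A .+ 2 .* f.(A) ./ A     # every operation dotted by hand
with_macro = @. A + 2 * f(A) / A    # @. inserts the dots automatically

println(explicit == with_macro)
```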

Function documentation

The function documentation syntax is very simple: any string appearing at the top-level right before an object (function, macro, type or instance) will be interpreted as documentation (these are called docstrings). Here is an example:

"This is a sample of a function documentation"
function my_func(a,b)
   println("a is:",a," while b is:",b) 
end

?my_func


This is a sample of a function documentation

Documentation is interpreted as Markdown, so you can use indentation and code fences to delimit code examples from text. Technically, any object can be associated with any other as metadata. Markdown happens to be the default, but one can construct other string macros and pass them to the @doc macro as well.

"""
This is a more complex function documentation:

    my_func(a,b)

- This function **prints** the arguments a and b. 
- It doesn't use any default values.
"""
function my_func(a,b)
   println("a is:",a," while b is:",b) 
end

?my_func

You can find more tips on writing documentation in the Julia manual: https://docs.julialang.org/en/stable/manual/documentation/#Documentation-1

Readable functions

In programming, it is really important to use names for variables and functions that are descriptive and meaningful. Otherwise, if you check your scripts after a few months, or if you share your scripts with your colleagues, it will be hard to read them and you or your colleagues will need to spend hours understanding what you are doing in the scripts. Here is an example of a non-readable function. Please spend one minute reading this function and try to understand what the function does:

function s(p)
    a=0
    
    for v in p
        a+=v
    end
    
    m=a/length(p)
    d=0
    
    for v in p
        d+=(v-m)*(v-m)
    end
    
    return sqrt(d/(length(p)-1))
end

For those who couldn’t understand what the previous function does, please have a look at the same function, now written with descriptive and meaningful names:

function std_dev(sample)
    sample_sum=0
    
    for value in sample
        sample_sum+=value
    end
    
    sample_mean=sample_sum/length(sample)
    sum_squared_devs=0
    
    for value in sample
        sum_squared_devs+=(value-sample_mean)*(value-sample_mean)
    end
    
    return sqrt(sum_squared_devs/(length(sample)-1))
end
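As a quick sanity check on std_dev, and as an example of an even shorter readable version, Julia's Statistics standard library provides mean and std; std also divides by length(sample) - 1 by default. A sketch, assuming Julia 1.x (std_dev_compact and the sample values are illustrative, not lesson data):

```julia
using Statistics   # provides mean and std

function std_dev(sample)
    sample_sum = 0
    for value in sample
        sample_sum += value
    end
    sample_mean = sample_sum / length(sample)
    sum_squared_devs = 0
    for value in sample
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
    end
    return sqrt(sum_squared_devs / (length(sample) - 1))
end

# The same calculation written with library functions and broadcasting.
std_dev_compact(sample) = sqrt(sum((sample .- mean(sample)) .^ 2) / (length(sample) - 1))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
println(std_dev(sample), " ", std_dev_compact(sample), " ", std(sample))
```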

Inflammation datasets and functions

In the previous modules, we learnt how to run the analysis for multiple datasets and how to use conditionals to detect problems in our datasets. Now let’s go one step further and add some functions to our script. We are going to create a function called analyze, which takes the filename as an input argument and produces a plot with three subfigures showing the average, maximum and minimum inflammation per day. Then we will create another function called detect_problems, which also takes the filename as an input argument and reports whether there are any problems in the inflammation dataset.

using DelimitedFiles   # readdlm
using Statistics       # mean
using Plots

function analyze(filename)
    
    println("Processing dataset: ",filename)
    
    sleep(0.5)   # short pause between datasets
    
    data = readdlm(filename, ',')
    days = 1:size(data, 2)   # one column per day
    
    p1=plot(days, vec(mean(data, dims=1)), ylabel="Average", label="Mean", color="blue", xlims=(-2,45), ylims=(0,14))
    p2=plot(days, vec(maximum(data, dims=1)), ylabel="Maximum", label="Max", c="green", alpha=0.5, fill=(0,"gray"))
    p3=plot(days, vec(minimum(data, dims=1)), seriestype=:scatter, ylabel="Minimum", label="Min", marker=(:white,2,:o,stroke(1,:black)))

    p=plot(p1,p2,p3,layout=(1,3), legend=false, xlabel="Day", lw=2,size=(1000,300), grid=true)
    display(p)
    
end

analyze (generic function with 1 method)

function detect_problems(filename)
    
    data = readdlm(filename, ',')
    daily_max = vec(maximum(data, dims=1))   # maximum inflammation per day
    daily_min = vec(minimum(data, dims=1))   # minimum inflammation per day

    if daily_max[1] == 0 && daily_max[21] == 20
        println("Suspicious looking maximum!")
    elseif sum(daily_min) == 0
        println("Minima add up to zero!")
    else
        println("The dataset is OK")
    end
    
end

detect_problems (generic function with 1 method)

Let’s check if the functions are working correctly using the first and third inflammation datasets:

analyze("./data/inflammation-01.csv")

detect_problems("./data/inflammation-03.csv")

Minima add up to zero!

So both functions seem to work correctly. Now we will combine everything we have learnt so far (loops, conditionals and functions) to analyse the first three inflammation datasets:

using Plots
using Glob

filenames = sort(glob("infl*", "./data/"))

for filename in filenames[1:3]
    detect_problems(filename)
    analyze(filename)
end
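One design choice worth considering: because detect_problems reads the file itself, it can only be tried out on files on disk. The sketch below (check_data is a hypothetical name, and the code assumes Julia 1.x) moves the checks into a function that takes the data matrix directly and returns the message, so it can be exercised on small synthetic arrays; detect_problems could then simply print check_data(readdlm(filename, ',')).

```julia
# Hypothetical helper: the same checks as detect_problems, applied to a matrix
# (rows = patients, columns = days) instead of a filename.
function check_data(data)
    daily_max = vec(maximum(data, dims=1))
    daily_min = vec(minimum(data, dims=1))
    if daily_max[1] == 0 && daily_max[21] == 20
        return "Suspicious looking maximum!"
    elseif sum(daily_min) == 0
        return "Minima add up to zero!"
    else
        return "The dataset is OK"
    end
end

# Synthetic 2×40 arrays for illustration only (not real inflammation data).
zero_data = zeros(Int, 2, 40)          # every daily minimum is 0
ok_data = ones(Int, 2, 40)             # minima add up to 40
suspicious_data = zeros(Int, 2, 40)
suspicious_data[:, 21] .= 20           # day 1 maximum is 0, day 21 maximum is 20

println(check_data(zero_data))
println(check_data(ok_data))
println(check_data(suspicious_data))
```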
