Tag Archives: Algorithm

Named after the Persian scholar Muhammad ibn Musa al-Khwarizmi, algorithms have become a cornerstone of today’s society. Practically our whole economy depends on them and on their correctness. And in a sense, even your everyday cooking recipe is an algorithm.

And even if you don’t know it, you have probably already learnt some mathematical algorithms. But it’s not only important to know them, but also to be able to implement them and, if necessary, change them to fit the needs of your current problem.

In this section I will explain some of the algorithms I know and show you how to implement and use them.
As in many other areas of my homepage: if you want a specific algorithm explained, don’t hesitate, just ask!

Revisiting Dithering

It’s been a while since my last post and an even longer while since the last one about image processing. So I’ll just pick up where I left off. The last post in this category was about image dithering. A common use of dithering is the conversion of a greyscale picture to a picture consisting only of black and white pixels, which saves space and is also useful for some print media. On a further note, dithered images have a nice visual style that I would call something like dollar bill style.

Conventional dithering methods, however, also introduce artifacts into pictures, and of course the pictures also lose sharpness. So the question is whether one could improve on those conventional dithering methods in order to reduce artifacts and increase sharpness.
If you’re mathematically minded, you might also be worried that conventional dithering is not rotationally invariant, which I assume is due to the asymmetric dithering kernel and the way one iterates over the picture. Rotationally invariant basically means that rotating the input image simply rotates the output of the method without otherwise changing it, which is of course a desirable trait.

So what about taking some inspiration from physics? After all, physical laws of motion are rotationally invariant. Lattice Boltzmann methods are a way of computing dynamics in fluids, and they have been successfully employed for dithering by Hagenburg et al. The paper on this research also served as my main source for the method and its implementation, even though my specific implementation might differ in some details.
I’ll call the method Lattice Boltzmann Dithering during the rest of the article, as I didn’t find any other handy name for it.

The Idea Behind Lattice Boltzmann Dithering

The idea behind the method is to model the picture as a fluid with particles in it. In this case the grey values are the particles. A pixel that is black (with a grey value of 0) has, so to speak, no particles in it. Those particles dissipate in the fluid over time, which is modelled in discrete steps. During each time step they dissipate according to the following laws:

• If a pixel has a value v greater than 1, where 1 is the maximal grey value, the excess (v - 1) is distributed amongst the neighbors of the pixel and the value of the original pixel becomes 1.
• If a pixel’s value v is lower than a minimal threshold, which the user can set, v is distributed over its neighbors and the pixel itself becomes 0.
• Otherwise a fraction of the pixel’s value is distributed to each neighbor that has a larger value while being smaller than 1. This fraction is subtracted from the pixel itself.

I hope you see how, with these laws, pixels with a high value (many particles) attract more values. After some time steps you should be left with more or less only black and white pixels this way.

And the whole process is stopped, when the difference from one time step to the next one becomes sufficiently small. And that’s about it… Not really that complicated, if you ask me! 🙂
Of course the paper goes more into detail about the theoretical background and proves some things about the method. If you’re interested, you should absolutely read it.

Implementation of Lattice Boltzmann Dithering

And now let’s come to the part you all waited for… The implementation. First I’ll implement a method for the dissipation of the values from each pixel, which will be executed in each time step to get the next distribution of values.

One thing to keep in mind when working with images is that they have borders. You don’t want to dissipate values across the borders or, worse, access a part of the image that isn’t there. So you have to treat the pixels at the borders differently. In this method, I always check first whether a neighboring pixel exists.
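The border check described above can also be written once for all eight neighbors. The following is only a minimal sketch: the helper `validNeighbors` and the `neighborhood` table are hypothetical names that are not part of the Raspository, and the weights are simply the ones used in the function below (1/9 for direct neighbors, 1/36 for diagonal ones).

```r
# Offsets of the eight neighbors together with the weights used in this post
# (assumption: 1/9 for direct neighbors, 1/36 for diagonal neighbors)
neighborhood <- data.frame(
    di = c(-1, 0, 1, 0, -1, -1, 1, 1),
    dj = c(0, -1, 0, 1, -1, 1, -1, 1),
    weight = c(rep(1/9, 4), rep(1/36, 4))
)

# Hypothetical helper: returns only those neighbors of pixel (i, j)
# that lie inside an image with nRows rows and nCols columns
validNeighbors <- function(i, j, nRows, nCols){
    ni <- i + neighborhood$di
    nj <- j + neighborhood$dj
    inside <- ni >= 1 & ni <= nRows & nj >= 1 & nj <= nCols
    data.frame(i = ni[inside], j = nj[inside],
               weight = neighborhood$weight[inside])
}

cornerNeighbors <- validNeighbors(1, 1, 3, 3)  # corner pixel of a 3 x 3 image
innerNeighbors <- validNeighbors(2, 2, 3, 3)   # centre pixel of a 3 x 3 image
```

A corner pixel keeps three neighbors and an inner pixel all eight, so a loop over the rows of such a table could replace the eight separate `if` blocks.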

dissipatePixel <- function(img, minimalTreshold = 0.01){

imgAtNewStep <- matrix(c(0), ncol = ncol(img), nrow = nrow(img))

for(i in seq(nrow(img))){
for(j in seq(ncol(img))){
if(img[i,j] > 1 || img[i,j] < minimalTreshold){

toDissipate <- img[i,j]

if(img[i,j] > 1){
toDissipate <- toDissipate - 1
}

dissipated <- 0

if( i > 1){
imgAtNewStep[i - 1,j] <- imgAtNewStep[i - 1,j] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if( j > 1){
imgAtNewStep[i,j - 1] <- imgAtNewStep[i,j - 1] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if(i < nrow(img)){
imgAtNewStep[i + 1,j] <- imgAtNewStep[i + 1,j] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if(j < ncol(img)){
imgAtNewStep[i,j + 1] <- imgAtNewStep[i,j + 1] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if( i > 1 && j > 1){
imgAtNewStep[i - 1,j - 1] <- imgAtNewStep[i - 1,j - 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

if( i > 1 && j < ncol(img)){
imgAtNewStep[i - 1,j + 1] <- imgAtNewStep[i - 1,j + 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

if( i < nrow(img) && j > 1){
imgAtNewStep[i + 1,j - 1] <- imgAtNewStep[i + 1,j - 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

if( i < nrow(img) && j < ncol(img)){
imgAtNewStep[i + 1,j + 1] <- imgAtNewStep[i + 1,j + 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

## add the non dissipated amount to the same pixel in next time-step
imgAtNewStep[i,j] <- imgAtNewStep[i,j] + (img[i,j] - dissipated)
}else{

dissipated <- 0
currentPixel <- img[i,j]

if( i > 1 && img[i - 1,j] > img[i,j] && img[i - 1,j] < 1){
imgAtNewStep[i - 1,j] <- imgAtNewStep[i - 1,j] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if( j > 1 && img[i,j - 1] > img[i,j] && img[i,j - 1] < 1){
imgAtNewStep[i,j - 1] <- imgAtNewStep[i,j - 1] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if(i < nrow(img) && img[i + 1,j] > img[i,j] && img[i + 1,j] < 1){
imgAtNewStep[i + 1,j] <- imgAtNewStep[i + 1,j] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if(j < ncol(img) && img[i,j + 1] > img[i,j] && img[i,j + 1] < 1){
imgAtNewStep[i,j + 1] <- imgAtNewStep[i,j + 1] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if( i > 1 && j > 1 && img[i - 1,j - 1] > img[i,j] && img[i - 1,j - 1] < 1){
imgAtNewStep[i - 1,j - 1] <- imgAtNewStep[i - 1,j - 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

if( i > 1 && j < ncol(img) && img[i - 1,j + 1] > img[i,j] && img[i - 1,j + 1] < 1){
imgAtNewStep[i - 1,j + 1] <- imgAtNewStep[i - 1,j + 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

if( i < nrow(img) && j > 1 && img[i + 1,j - 1] > img[i,j] && img[i + 1,j - 1] < 1){
imgAtNewStep[i + 1,j - 1] <- imgAtNewStep[i + 1,j - 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

if( i < nrow(img) && j < ncol(img) && img[i + 1,j + 1] > img[i,j] && img[i + 1,j + 1] < 1){
imgAtNewStep[i + 1,j + 1] <- imgAtNewStep[i + 1,j + 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

## add the non dissipated amount to the same pixel in next time-step
imgAtNewStep[i,j] <- imgAtNewStep[i,j] + (img[i,j] - dissipated)
}
}
}

return(imgAtNewStep)

}


Done that! Now the easy part…
But as the implementation in R with loops is incredibly inefficient I’ll just run 50 time steps this time. I will however implement this method at some later point in C++, where loops aren’t inefficient. This will also serve as a good example on how to integrate C++ code in R.

lbDithering <- function(img, epsilon, minimalTreshold){

i <- 0
while(TRUE){

imgAtNewStep <- dissipatePixel(img = img, minimalTreshold = minimalTreshold)

#if(norm(imgAtNewStep - img, type = "2") < epsilon){
if(i >= 50){
return(imgAtNewStep)
}else{
img <- imgAtNewStep
}
i <- i +1
}
}
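The function above stops after a fixed 50 steps, while the text describes stopping once the change between two time steps is small enough. A minimal sketch of such a convergence-based loop could look as follows. It is not the author's code: the helper `iterateUntilConverged`, the toy step function and the cap `maxSteps` are assumptions made for illustration, and the Frobenius norm (`type = "F"`) is used here instead of the `type = "2"` norm from the commented-out line.

```r
# Hypothetical helper: applies `step` repeatedly until the change between two
# consecutive time steps drops below `epsilon` or `maxSteps` is reached
iterateUntilConverged <- function(img, step, epsilon = 1e-3, maxSteps = 1000){
    for(i in seq_len(maxSteps)){
        imgAtNewStep <- step(img)
        # Frobenius norm of the difference between the two time steps
        if(norm(imgAtNewStep - img, type = "F") < epsilon){
            return(list(img = imgAtNewStep, steps = i, converged = TRUE))
        }
        img <- imgAtNewStep
    }
    list(img = img, steps = maxSteps, converged = FALSE)
}

# Toy step function for demonstration only: halves the distance to 1
toyStep <- function(m) m + (1 - m) / 2
res <- iterateUntilConverged(matrix(0, 2, 2), toyStep, epsilon = 1e-6)
```

With the dithering code from this post, `step` would be something like `function(m) dissipatePixel(m, minimalTreshold = 0.05)`. The upper bound on the number of steps guards against running forever if the criterion is never met.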


Usage

Now let’s reap the fruits of our coding and test the method on a picture I’ve selected. 🙂

birb <- imageBWFromJpeg("birb.jpg")
birbDithered <- lbDithering(birb@current, epsilon = 20, minimalTreshold = 0.05)
writeJPEG(round(birbDithered), "birbDithered.jpg")


Let me show you the resulting Jpeg:

Isn’t that beautiful? As comparison the original:

And a conventionally dithered version:

birbDithered2 <- errorDiffusiondDithering(birb, method = "mea")
writeJPEG(birbDithered2@current, "birbDithered2.jpg")


You can see more structure in the one created with Lattice Boltzmann Dithering, can’t you? You can also better understand how light and shadows are distributed.
So overall a pretty nice algorithm I would say! Although I like the dollar bill look of the conventional one as well.

So that’s it for now! Until soon…

Yours, David

Availability Of The Code

You can access a maintained version of the code for the color spaces in my GitHub repository Raspository under R/imageConversion.R.

And well… You might’ve noticed that I used some of my methods/classes a bit differently than the last few times. Having gained some distance from coding this stuff, I noticed that I have to change some things about it. I don’t know yet whether I’ll make a blog post about it or not. Do you wanna read one?


For Allocation of Seats in the EU Parliament

On Monday I had a talk over Discord with Boris Biba, who runs a blog himself. We had wanted to cooperate for some time. The focus of his blog is philosophy and politics. And as I told him that I’m interested in crunching numbers, the upcoming EU elections are the perfect opportunity for a cooperation.
First we talked about doing something regarding the Wahl-O-Mat. Now in hindsight it was probably good that we decided on something else, as the Wahl-O-Mat was taken offline just today.
Then Boris brought up that he wanted to do a post about the seat allocation method for German votes in the EU election, which is called the Sainte-Laguë/Schepers method. And I thought to myself that this is wonderful, as voting is basically a paradigm for statistics. So I would be able to implement a small algorithm.

So be also sure to check out the post, which you can find here, from Boris, if you’re able to read German!

What I’ll be doing in this post is explaining the seat allocation method called Sainte-Laguë/Schepers and then giving you a demonstrative example of it. And as an Easter egg I’ll throw in some election posters for the imaginary parties I’ll use in the example. I created those posters with Adobe Spark.

As a main source for my post, I took the corresponding article from the German Wahl-Lexikon.

Description of the Method

So there are basically three variants of this method, which all deliver the same result.
Two of them work by ranking the voting result. The other one by simple division, which is the one used for the German part of the EU election. It is either called iterative or divisor method.
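For comparison, one of the ranking variants (the highest-quotient variant) can be sketched as follows. This is only an illustration under assumptions: the function name `sainteLagueRanking` and the three parties are made up, and ties between equal quotients (which are rare with real vote counts) are not treated specially.

```r
# Sketch of the highest-quotient variant: every party's votes are divided by
# 0.5, 1.5, 2.5, ... and the largest quotients each win one seat
sainteLagueRanking <- function(votes, seats){
    divisors <- seq(0.5, by = 1, length.out = seats)
    # quotient matrix: one row per party, one column per divisor
    quotients <- outer(votes, divisors, "/")
    # party index belonging to each quotient (matrix is stored column-wise)
    party <- rep(seq_along(votes), times = seats)
    # positions of the `seats` largest quotients
    winners <- order(as.vector(quotients), decreasing = TRUE)[seq_len(seats)]
    seatsPerParty <- tabulate(party[winners], nbins = length(votes))
    names(seatsPerParty) <- names(votes)
    seatsPerParty
}

# made-up example: three parties, 10 seats
rankingResult <- sainteLagueRanking(c(A = 5400, B = 3400, C = 1200), seats = 10)
```

For these made-up numbers party A gets 6 seats, B gets 3 and C gets 1.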

The simple idea behind this divisor method is to find a divisor for the voting result, which delivers you the right amount of total seats, if you divide the voting results by it and then round them by standard rounding.

To find the right divisor, first the total amount of votes is divided by the number of seats to be assigned.

$$divisor = \frac{\#votesTotal}{\#seats}$$

Then for each party the number of votes is divided by this divisor.

$$seatsParty_{i} = \frac{\#votesParty_{i}}{divisor}$$

And if the sum of the seats of all parties matches up with the amount to be assigned, we’re already done!
If not, we have to either increment or decrement the divisor, depending on whether we have too many or too few seats.

Just think about that… If you increase the divisor, the amount of seats shrinks. And vice versa if you decrease the divisor, the amount of seats increases.

And so the divisor is adjusted and the final seats per party are obtained.
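To make the adjustment step concrete, here is a small worked example with made-up numbers (three hypothetical parties A, B and C and 10 seats). The adjusted divisor of 980 is just one value that happens to work for these numbers.

```r
# made-up votes for three hypothetical parties
exampleVotes <- c(A = 5400, B = 3400, C = 1200)
exampleSeats <- 10

# initial divisor: total votes divided by the number of seats
initialDivisor <- sum(exampleVotes) / exampleSeats    # 1000

# quotients 5.4, 3.4 and 1.2 are rounded to 5, 3 and 1: one seat is missing
seatsInitial <- round(exampleVotes / initialDivisor)

# a smaller divisor yields more seats: 5.51, 3.47 and 1.22 round to 6, 3 and 1
seatsAdjusted <- round(exampleVotes / 980)
```

The initial divisor assigns only 9 seats, so the divisor has to shrink until party A crosses the rounding boundary at 5.5 and receives the tenth seat.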

Implementation of the Sainte-Laguë/Schepers method

And of course it wouldn’t be me if I didn’t also implement the method.
Here we go…

seatAllocation <- function(votes, seats){
    ## calculate the initial divisor:
    ## total number of votes divided by the number of seats
    divisor <- sum(votes) / seats

    ## get the initial seats per party by standard rounding
    seatsPerParty <- round(votes / divisor)

    ## if they already satisfy the seats to be assigned, return the seat allocation
    if(sum(seatsPerParty) == seats){
        return(list(divisor = divisor, seatsPerParty = seatsPerParty))
    }

    ## otherwise increment or decrement the divisor until
    ## the result fits and then return it
    if(sum(seatsPerParty) < seats){
        while(sum(seatsPerParty) < seats){
            divisor <- divisor - 1
            seatsPerParty <- round(votes / divisor)
        }
        return(list(divisor = divisor, seatsPerParty = seatsPerParty))
    }else{
        while(sum(seatsPerParty) > seats){
            divisor <- divisor + 1
            seatsPerParty <- round(votes / divisor)
        }
        return(list(divisor = divisor, seatsPerParty = seatsPerParty))
    }

}


The function is basically the same as what I described under the last point in plain text. As always, if you have some questions or remarks regarding my implementation feel free to write me a comment!

Example with the Sainte-Laguë/Schepers method

Now to test the method, let’s just come up with some arbitrary voting result for our imaginary parties introduced earlier. And of course plot them as a pie chart!

votes <- c(AP = 11345, CVP = 563342, EP = 618713, OSP = 305952, PDP = 95001)


Subsequently, let’s test what result the method delivers and if the percentages match up approximately.

result <- seatAllocation(votes, 310)


OK, first let’s visualize the result. But let’s not use a pie chart again. Because to be honest they can be misleading. This time we will use a waffle chart, which displays the actual seats.
Of course we also need to do some preprocessing. We want the parties ordered by their size and we want their percentage of seats in the legend.

seatsPerParty <- result$seatsPerParty
seatsPerParty <- sort(seatsPerParty, decreasing = TRUE)
names(seatsPerParty) <- paste0(names(seatsPerParty), " (",
                               format(seatsPerParty/sum(seatsPerParty) * 100, digits = 2), "%)")
waffle::waffle(seatsPerParty)


Well, there’s some difference in the percentage, but that’s to be expected as you can’t distribute fractions of seats between the parties.

Outlook

Of course there are many other methods for allocating seats in an election. Some are equivalent to this one and others are not. And if you’re interested in them, I would encourage you to write me. If you like, we can look at a bunch of them and then compare them. And we could also take a look at things like overhang seats or different kinds of voting. I think it’s a nice topic for making plots.

By the way, if you also wanna read this post in German, check out the following link!

Image Dithering

I showed you a lot of elementary stuff regarding image processing lately, so now it’s time to do something nice. What I will show you is called image dithering. But first let me give you a motivation for it.

Let’s say you have a grayscale picture and you want to paint it with really only black and white. To motivate dithering, I will demonstrate what happens if you just “round” the pixel values to black and white. For this purpose I will use a lot of functions I implemented in the previous posts.

At first we have to load the picture.

library(Raspository)
imgColor <- imageRGBFromJpeg("Wagner.jpg")
plot(imgColor)


The next step is to convert it to grayscale. Remember the function I implemented in this post?

img <- imageBWFromRGB(imgColor, c(0.3, 0.59, 0.11))
plot(img)


Our eyes don’t treat all the colors equally, that’s why I chose these proportions for converting the colors to grayscale. I took the values from this post on tutorialspoint.

Now let’s just quick and dirty convert this to completely black and white.

imgTmp <- img
imgTmp@current <- round(imgTmp@current)
plot(imgTmp)


Wow! We certainly lost basically all the information in the picture.

Implementing Image Dithering

Before I show you how to implement a version of dithering, I will briefly explain the idea behind it. Let me ask you one question… Is there really such a thing as gray? Or how exactly would you define gray?
Quite simply as a mixture of black and white. Now think about colors and the screen you’re probably sitting in front of. You have basically only three colors (RGB!). And each of the pixels on your screen consists of three sub-pixels, one of each of those three colors.
You don’t perceive them as individual dots, because they’re too close together for your eyes to distinguish them. Now the question is whether we could do something similar in this black and white case. And of course: Yes, we can! It is called image dithering, which is by the way also applicable to colored images.

The idea now is that you iterate over all of your pixels and still apply some kind of round function to each of them. But then you propagate the difference between the original pixel and the rounded pixel to its neighbors that are still to be processed.
But of course there are also different methods of doing so. I’ll show you two of them today.

Floyd Steinberg dithering

Let’s begin with the Floyd Steinberg Algorithm. I suggest you read the corresponding Wikipedia article, as it is very straightforward.

And my implementation is also pretty straightforward.

floydSteinbergDithering <- function(img, transformPaletteFunction = round){
    pixel <- img@current

    n <- dim(pixel)[1]
    m <- dim(pixel)[2]

    for(y in seq(m)){
        for(x in seq(n)){

            oldPixel <- pixel[x,y]
            newPixel <- transformPaletteFunction(oldPixel)

            error <- oldPixel - newPixel

            pixel[x,y] <- newPixel

            if(x < n){
                pixel[x + 1, y] <- pixel[x + 1, y] + error * 7/16
            }

            if(x > 1 && y < m){
                pixel[x - 1, y + 1] <- pixel[x - 1, y + 1] + error * 3/16
            }

            if(y < m){
                pixel[x, y + 1] <- pixel[x, y + 1] + error * 5/16
            }

            if(x < n && y < m){
                pixel[x + 1, y + 1] <- pixel[x + 1, y + 1] + error * 1/16
            }

        }
    }

    ditheredImage <- new(class(img)[[1]], original = img@original,
                         current = pixel, operations = img@operations)

    return(cropPixels(ditheredImage))
}


For the future some kind of kernel function would be nice, to be able to apply different kernels to pictures. But now let’s test it.

imgFS <- floydSteinbergDithering(img)
plot(imgFS)


That’s awesome! It almost looks like we had different gray values in our picture. And there are just some minor artifacts introduced by it, meaning some apparent structures that aren’t actually present in the original.

Now let’s try another method, which has a larger kernel.

Minimized Average Error Dithering

This dithering method was introduced by Jarvis et al. from the famous Bell Labs in 1976. So you see that this whole field is pretty old. And some of you might remember a time when it was actually difficult to transmit data from one location to another. I still remember being a six year old child waiting minutes for the NASA homepage to load one picture of a stellar nebula. Today image compression is of course still important for things like dynamic homepages, especially if they are mobile friendly.

OK, now let’s come to the actual method. It is called minimized average error. And again the Wikipedia article on it is pretty good.

This time the neighborhood of your pixel is increased by a range of one.
Let me show you the implementation…

minimizedAverageErrorDithering <- function(img, transformPaletteFunction = round){
    pixel <- img@current

    n <- dim(pixel)[1]
    m <- dim(pixel)[2]

    for(y in seq(m)){
        for(x in seq(n)){

            oldPixel <- pixel[x,y]
            newPixel <- transformPaletteFunction(oldPixel)

            error <- oldPixel - newPixel

            pixel[x,y] <- newPixel

            if(x < n){
                pixel[x + 1, y    ] <- pixel[x + 1, y    ] + error * 7/48
            }
            if(x < n - 1){
                pixel[x + 2, y    ] <- pixel[x + 2, y    ] + error * 5/48
            }

            if(x > 2 && y < m){
                pixel[x - 2, y + 1] <- pixel[x - 2, y + 1] + error * 3/48
            }
            if(x > 1 && y < m){
                pixel[x - 1, y + 1] <- pixel[x - 1, y + 1] + error * 5/48
            }
            if(y < m){
                pixel[x    , y + 1] <- pixel[x    , y + 1] + error * 7/48
            }
            if(x < n && y < m){
                pixel[x + 1, y + 1] <- pixel[x + 1, y + 1] + error * 5/48
            }
            if(x < n - 1 && y < m){
                pixel[x + 2, y + 1] <- pixel[x + 2, y + 1] + error * 3/48
            }

            if(x > 2 && y < m - 1){
                pixel[x - 2, y + 2] <- pixel[x - 2, y + 2] + error * 1/48
            }
            if(x > 1 && y < m - 1){
                pixel[x - 1, y + 2] <- pixel[x - 1, y + 2] + error * 3/48
            }
            if(y < m - 1){
                pixel[x    , y + 2] <- pixel[x    , y + 2] + error * 5/48
            }
            if(x < n && y < m - 1){
                pixel[x + 1, y + 2] <- pixel[x + 1, y + 2] + error * 3/48
            }
            if(x < n - 1 && y < m - 1){
                pixel[x + 2, y + 2] <- pixel[x + 2, y + 2] + error * 1/48
            }

        }
    }

    ditheredImage <- new(class(img)[[1]], original = img@original,
                         current = pixel, operations = img@operations)

    return(cropPixels(ditheredImage))
}


You wanna see its effect, don’t you? Here you go…

imgMea <- minimizedAverageErrorDithering(img)
plot(imgMea)


Do you see the difference? I think we got rid of the artifacts! Isn’t that amazing? I really love how demonstrative image processing is.
But that’s it for today… See you soon!

Availability Of The Code

You can access a maintained version of the code for the color spaces in my GitHub repository Raspository under R/imageConversion.R.
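
The two dithering functions in this post differ only in their kernel, so the kernel function wished for earlier could look roughly like the sketch below. It is an illustration under assumptions, not part of the Raspository: `errorDiffusion` and `floydSteinbergKernel` are hypothetical names, the function works on a plain numeric matrix instead of the image class, and it does not crop the result the way `cropPixels` does.

```r
# Generic error diffusion on a plain numeric matrix.
# `kernel` is a data.frame with the offsets dx, dy and the weight of each
# neighbor that receives a share of the rounding error.
errorDiffusion <- function(pixel, kernel, transformPaletteFunction = round){
    n <- nrow(pixel)
    m <- ncol(pixel)
    for(y in seq_len(m)){
        for(x in seq_len(n)){
            oldPixel <- pixel[x, y]
            newPixel <- transformPaletteFunction(oldPixel)
            error <- oldPixel - newPixel
            pixel[x, y] <- newPixel
            for(k in seq_len(nrow(kernel))){
                xk <- x + kernel$dx[k]
                yk <- y + kernel$dy[k]
                # only propagate the error to neighbors inside the image
                if(xk >= 1 && xk <= n && yk >= 1 && yk <= m){
                    pixel[xk, yk] <- pixel[xk, yk] + error * kernel$weight[k]
                }
            }
        }
    }
    pixel
}

# the Floyd Steinberg weights from above, written as a kernel table
floydSteinbergKernel <- data.frame(
    dx = c(1, -1, 0, 1),
    dy = c(0, 1, 1, 1),
    weight = c(7, 3, 5, 1) / 16
)

# small artificial gradient as test input
gradient <- matrix(seq(0, 1, length.out = 64), nrow = 8)
dithered <- errorDiffusion(gradient, floydSteinbergKernel)
```

The minimized average error method would then only need a second table with its twelve offsets and weights in 48ths.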

Bookmark-Coloring Algorithm And Writing Documentations In R

I wrote this function months ago while writing the report for my seminar, and I have wanted to make a post about this Bookmark-Coloring Algorithm ever since, but I never found the right opportunity. As I wrote some documentation for the Raspository yesterday evening, I thought it would be nice to make a post about this specific topic. Therefore I thought that using the Bookmark-Coloring Algorithm as an example would be nice.

The Bookmark-Coloring Algorithm

This specific algorithm’s use is to generate a personal pagerank. In more detail, given a starting website, this algorithm calculates the chance to end up at other websites. But this idea is applicable to other fields as well. In my post about network-based integration through heat diffusion I showed you a similar method applied to a network of multi-omic data. On the same data-set you could use the Bookmark-Coloring Algorithm.
The basic idea behind the Bookmark-Coloring Algorithm is that you have some color that diffuses through a network, which is in my opinion equivalent to heat diffusing in a network. Correct me if I’m wrong.

I implemented the algorithm following the paper about it by Pavel Berkhin. More precisely, I implemented Algorithm 2 from the paper.
So let me show my implementation:

require(igraph)
require(dequer)

BCA <- function(graph, v, retentionCoefficient = 0.4, tol = 0.001){
    # initialise vector of transition chances
    p <- c()
    p[V(graph)] <- 0

    q <- queue()
    pushback(q, v)

    # initialise vector that indicates how much color is in one node
    colorInVertex <- c()
    colorInVertex[v] <- 1

    # execute as long as queue q has elements
    while(length(q) > 0){
        i <- pop(q)
        w <- colorInVertex[i]
        # use up the color in node
        colorInVertex[i] <- NA

        p[i] <- p[i] + retentionCoefficient * w

        # if all color is used up continue to next element in queue
        if(w < tol){
            next
        }

        # execute for all neighbors
        for(j in neighbors(graph, i, mode = "out")){
            if(!is.na(colorInVertex[j])){
                # add color to neighbor
                colorInVertex[j] <- colorInVertex[j] +
                    ((1 - retentionCoefficient) * w/degree(graph, i, mode = "out"))
            }else{
                # initialise color in neighbor
                pushback(q, j)
                colorInVertex[j] <- (1 - retentionCoefficient) * w/degree(graph, i, mode = "out")
            }
        }
    }

    return(p)
}


I wrote some comments that hopefully help you to understand what’s going on. That’s also the first part of documentation. It’s the step you’ll probably do while writing your code, and in my opinion it’s always useful.

So are we done with the documentation? No. Not if we want to put this function into a package.

roxygen2 documentation

roxygen2 is a nice package allowing you to write in-line documentation in R. I won’t go too much into detail about it here, as there are a lot of online sources for it, but I will show you a short example of how to do it!

Now let me show how to write documentation for the BCA function. I will go over all the specified tags.

#' Bookmark Coloring Algorithm
#'
#' @aliases BookmarkColoringAlgorithm
#'
#' @description This function calculates a teleportation vector from a given
#' starting node to other nodes in a given network.
#'
#' @export BCA
#' @import dequer
#' @import igraph
#'
#' @param graph an object of type \code{\link[igraph]{igraph}}.
#' @param v a starting vertex from the above graph. Can be either its identifier
#' or a igraph.vs object.
#' @param retentionCoefficient the restart probability for each node.
#' @param tol a tolerance threshold, indicating what the smallest value of color
#' is, that should propagate further
#'
#' @return a preference/teleportation vector
#'
#' @references \insertRef{Berkhin2006}{Raspository}
#'
#' @examples
#' library(igraph)
#' g <- make_ring(5)
#' preferenceVector <- BCA(g, 1)
BCA <- function(graph, v, retentionCoefficient = 0.4, tol = 0.001){


OK, let’s go over some of the tags…
In the first line you of course have the title of the function. In addition to the description, you can also add a details tag. While the description should give a short overview of the method, you could include theoretical background in the details.
Then the needed packages are imported. roxygen2 will convert them into lines in your NAMESPACE file.

With params and return you should briefly describe the types of the parameters and of the return value and what they should contain.

For the following use of the references tag, you also need to import the Rdpack package and include a “REFERENCES.bib” in the inst folder with the corresponding BibTeX entries. In my opinion you should always use this when implementing some method from some source… Be it a book or a paper. Rdpack imports those references automatically from your BibTeX file into the documentation files.

Last, but not least, I included a runnable example. This is important to give your user a starting point on how to use your function, but furthermore it is a test for the function. Each time your package is checked with R CMD check, the example code is run. So you will be notified if there are any errors.
But we will go more into automated testing another time, because there is of course more you can do. But you should always write example code if your function is visible to the user.

After writing this documentation in your file, you have to compile it by writing:

roxygen2::roxygenise()


I hope that this short example will help you with writing and documenting your own functions! 🙂

Availability Of The Code

You can access a maintained version of the code in my GitHub repository Raspository under R/BCA.R.

Alternating Least Squares

How To Implement Alternating Least Squares In R And How Not To Do It

So it’s time for my first actual content. And as predicted in my Hello World post, it will be something implemented in the R programming language. More precisely, it’s a machine learning algorithm called Alternating Least Squares. But first, before we indulge ourselves in code, let me tell you why this algorithm is of interest to me and what it does.

Introduction

I’ve been working now for a few months as part of my research assistant job on the Bioconductor package BEclear. I won’t go into detail about the package, you only need to know that it uses something called a Latent Factor Model to impute missing data in data-sets.

Let’s say you have a matrix D containing missing values. The rows in the matrix stand for features and the columns for samples. Then you could assume that each entry $$D_{ij}$$ of the matrix is modeled by both feature and sample specific effects in the following way:

$$D_{ij} = L_{i}^{T} \times R_{j}$$

Where $$L_i$$ is the feature specific matrix and $$R_j$$ the sample specific matrix. For the imputation of missing values you now try to estimate those two matrices, the latent factors, from the existing values. Methods based on this assumption are already applied in a variety of fields, for example to batch effects in DNA methylation in the case of the BEclear package, or in recommender systems for e.g. Netflix, as described in a paper that helped me a lot in understanding this topic.

To estimate the latent factors there are different methods.
One of them, implemented in the BEclear package, is gradient descent. Another method is Alternating Least Squares (ALS), which I wanted to implement on my own.

The lecture notes of Hastie et al. served as my source for implementing this method. I highly recommend reading both the paper from Koren et al. and those lecture notes if you want to know more about the theoretical background and also the applications of those methods. You just need to know some linear algebra, then they should be easy enough to understand in my opinion.

But as a short summary… ALS tries to estimate the feature and sample matrix by alternately fixing one of them and then calculating the other one by solving the resulting system of equations, and in general you do this until convergence. In gradient descent, on the other hand, both matrices are estimated at the same time.

How Not To Implement Alternating Least Squares

As a start I will show you my first faulty attempt at implementing Alternating Least Squares. Maybe you will learn something by me sharing it with you. And you should try to guess what my mistake was.

As my implementation reuses a function from BEclear, you have to install this package first. For this purpose I guess it’s easiest if you just install it from GitHub via the following lines of code:

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("David-J-R/BEclear")


And now let’s come to the first implementation of the Alternating Least Squares algorithm. See if you can find its problem. I tried to comment all those steps so that the code should be comprehensible. But if not, please let me know! 🙂

imputeALSfaulty <- function(data, lambda = 0.75, r = 10, iters = 20){

    # We require the BEclear package, because we're using its loss function
    require(BEclear)

    D <- data
    D[is.na(D)] <- 0

    # We initialise L and R with random values
    L <- matrix(rnorm(nrow(data) * r), nrow(data), r) / sqrt(r)
    R <- matrix(rnorm(r * ncol(data)), r, ncol(data)) / sqrt(r)

    currLoss <- BEclear:::loss(L,R, 1, data)$loss

    print(currLoss)

    for(i in 1:iters){
        # L and R are determined by solving the following system of equations
        # We repeat this step iters-times
        L <- t(solve(R %*% t(R) + diag(lambda,r), R %*% t(D)))
        R <- solve(t(L) %*% L + diag(lambda,r), t(L) %*% D)

        currLoss <- BEclear:::loss(L,R, 1, data)$loss

        print(currLoss)
    }

    # L and R are multiplied to get the estimated values
    D <- L %*% R

    # Missing values are replaced with estimated value
    for (i in seq_len(nrow(data)))
        for (j in seq_len(ncol(data)))
        {
            if (is.na(data[i, j])) {
                data[i, j] <- D[i, j]
            }
        }

    return(data)
}


Now let me show you the problem with this implementation, if you haven’t recognized it on your own by now.
First we will load the example data and functions from the BEclear package to generate ourselves a sample data-set with missing values:

library(BEclear)
data("BEclearData")
batchEffect <- calcBatchEffects(
    data = ex.data, samples = ex.samples,
    adjusted = TRUE, method = "fdr")

mdifs <- batchEffect$med
pvals <- batchEffect$pval summary <-calcSummary(mdifs, pvals) cleared.data <- clearBEgenes(ex.data, ex.samples, summary)  And then we run the faulty version of ALS on it. The printed output of the function is the loss of the current solution during each iteration. result <- imputeALSfaulty(cleared.data, iters = 10)  ## [1] 2586.68 ## [1] 101.8086 ## [1] 95.60281 ## [1] 95.29458 ## [1] 95.21404 ## [1] 95.20139 ## [1] 95.20632 ## [1] 95.21329 ## [1] 95.21869 ## [1] 95.22233 ## [1] 95.2247  If we now take a look at the imputed values, you can see what’s wrong: boxplot(result[is.na(cleared.data)])  They’re all pretty close to zero. That’s because we set the missing values to zero. This way the solve method “tries” to generate R and L the way that the missing values are also very close to zero. Of course we don’t want that… This way we could just set the missing values right away to zero. How To Implement Alternating Least Squares Finally let me show you an implementation that actually does, what it should do. And again if something is unclear, don’t hesitate to ask me! imputeALScorrect<- function(data, lambda = 0.75, r = 10, iters = 80){ # We require the BEclear package, because we're using its loss function require(BEclear) # We initialise L and R with random values L <- matrix(rnorm(nrow(data) * r), nrow(data), r) / sqrt(r) R <- matrix(rnorm(r * ncol(data)), r, ncol(data)) / sqrt(r) currLoss <- BEclear:::loss(L,R, 1, data)$loss
print(currLoss)

for(iter in 1:iters){

# Now we iterate over the feature dimension of L
for(i in 1:dim(L)[[1]]){
# We determine the revealed entries for the feature
# and subset the data and R so as to retain only the revealed entries
revealedEntries <- !is.na(data[i,])
y <- as.matrix(data[i, revealedEntries])
x <- R[,revealedEntries]
# We solve the linear equation for the feature
L[i,] <- as.vector(solve(x %*% t(x) + diag(lambda, r), x %*% y))
}

# We iterate over the sample dimension of R
for(j in 1:dim(R)[[2]]){
# We determine the revealed entries for the sample
# and subset the data and L so as to retain only the revealed entries
revealedEntries <- !is.na(data[,j])
y <- as.matrix(data[revealedEntries, j])
x <- L[revealedEntries,]
# We solve the linear equation for the sample
R[,j] <- as.vector(solve(t(x) %*% x + diag(lambda, r), t(x) %*% y))
}
currLoss <- BEclear:::loss(L,R, 1, data)$loss

print(currLoss)
}

# L and R are multiplied to get the estimated values
D <- L %*% R

# Missing values are replaced with estimated value

for (i in seq_len(nrow(data)))
for (j in seq_len(ncol(data)))
{
if (is.na(data[i, j])) {
data[i, j] <- D[i, j]
}
}

return(data)
}

A further advantage of this implementation is that it is relatively easy to write a parallelised version of it. Maybe I will show you that in one of my next posts. After I overheard a conversation at the university that R is apparently bad for this, I feel almost challenged to do so.

Now let’s take a look at the imputed values. For this we simply reuse the sample data-set from before.

result <- imputeALScorrect(cleared.data, iters = 10)

## [1] 2571.072
## [1] 109.301
## [1] 99.38027
## [1] 97.17519
## [1] 95.42625
## [1] 94.00547
## [1] 92.83838
## [1] 91.87368
## [1] 91.07338
## [1] 90.40794
## [1] 89.85372

boxplot(result[is.na(cleared.data)])

Now that looks more like real data… Doesn’t it? But to be sure let’s compare it to the values predicted by the BEclear package. For the comparison we calculate the Root Mean Squared Error:

library(Metrics)
result.BEclear <- imputeMissingData(cleared.data)

## INFO [2019-02-08 12:17:10] Starting the imputation of missing values.
## INFO [2019-02-08 12:17:10] This might take a while.
## INFO [2019-02-08 12:17:10] BEclear imputation is started:
## INFO [2019-02-08 12:17:10] block size: 60 x 60
## INFO [2019-02-08 12:17:10] Impute missing data for block 1 of 4
## INFO [2019-02-08 12:17:10] Impute missing data for block 2 of 4
## INFO [2019-02-08 12:17:11] Impute missing data for block 3 of 4
## INFO [2019-02-08 12:17:11] Impute missing data for block 4 of 4

rmse(result.BEclear[is.na(cleared.data)], result[is.na(cleared.data)])

## [1] 0.03196931

Well, the difference isn’t that big. But of course a more elaborate evaluation would be needed to assess the accuracy of the method.
However, for something I coded just for fun I’m satisfied with this first look.

Addendum: Biases

Just for fun let’s also add biases to our model, as described by Koren et al., to further improve our algorithm.

The idea behind the bias is to capture the variability of the data that arises from the features or samples alone, while the two matrices L and R capture the variability that arises from the interaction of features and samples together. In other words, by introducing the biases we unburden L and R a bit.
We use a method where the bias for each entry in the data-set is the sum of the overall average over all values and the average deviations of the corresponding row and column from this overall average. To save valuable computation time, we subtract this bias from a copy of each value and use the transformed matrix for the further calculations. Of course we have to add the bias again afterwards.

And here we go with the improved implementation:

imputeALSBias<- function(data, lambda = 0.75, r = 5, iters = 10, use.biases=TRUE){

# We require the BEclear package, because we're using its loss function
require(BEclear)

# copy the data
D <- data

# We initialise L and R with random values
L <- matrix(rnorm(nrow(data) * r), nrow(data), r) / sqrt(r)
R <- matrix(rnorm(r * ncol(data)), r, ncol(data)) / sqrt(r)

currLoss <- BEclear:::loss(L,R, 1, D)$loss
print(currLoss)

if(use.biases){
# we calculate the biases
biasData<-mean(data, na.rm = TRUE)
biasRows<-rowMeans(data - biasData, na.rm= TRUE)
biasCols<-colMeans(data - biasData, na.rm= TRUE)

# subtract the biases from the data
D <- D - biasData - biasRows
D <- t(t(D) - biasCols)
}

for(iter in 1:iters){

# Now we iterate over the feature dimension of L
for(i in 1:dim(L)[[1]]){
# We determine the revealed entries for the feature
# and subset the data and R so as to retain only the revealed entries
revealedEntries <- !is.na(D[i,])
y <- as.matrix(D[i, revealedEntries])
x <- R[,revealedEntries]
# We solve the linear equation for the feature
L[i,] <- as.vector(solve(x %*% t(x) + diag(lambda, r), x %*% y))
}

# We iterate over the sample dimension of R
for(j in 1:dim(R)[[2]]){
# We determine the revealed entries for the sample
# and subset the data and L so as to retain only the revealed entries
revealedEntries <- !is.na(D[,j])
y <- as.matrix(D[revealedEntries, j])
x <- L[revealedEntries,]
# We solve the linear equation for the sample
R[,j] <- as.vector(solve(t(x) %*% x + diag(lambda, r), t(x) %*% y))
}
currLoss <- BEclear:::loss(L,R, 1, D)$loss

print(currLoss)
}

# L and R are multiplied to get the estimated values
D <- L %*% R

if(use.biases){
# we add the biases again
D <- t(t(D) + biasCols)
D <- D + biasData + biasRows
}

# Missing values are replaced with estimated value

for (i in seq_len(nrow(data)))
for (j in seq_len(ncol(data)))
{
if (is.na(data[i, j])) {
data[i, j] <- D[i, j]
}
}

return(data)
}
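
If you want to convince yourself that subtracting and re-adding the biases really is a round trip, here is a minimal sketch on a toy matrix. The matrix and the helper names removeBiases and restoreBiases are made up for this illustration; the arithmetic follows the function above, only written with sweep() so that the row and column direction is explicit.

```r
# Toy matrix with two missing values (made-up numbers, for illustration only)
toy <- matrix(c(1, 2, NA,
                4, NA, 6,
                7, 8, 9), nrow = 3, byrow = TRUE)

# Hypothetical helper: computes the biases and subtracts them from the data
removeBiases <- function(data) {
  biasData <- mean(data, na.rm = TRUE)
  biasRows <- rowMeans(data - biasData, na.rm = TRUE)
  biasCols <- colMeans(data - biasData, na.rm = TRUE)
  # MARGIN = 1 subtracts per row, MARGIN = 2 per column
  centred <- sweep(data - biasData, 1, biasRows, "-")
  centred <- sweep(centred, 2, biasCols, "-")
  list(centred = centred, biasData = biasData,
       biasRows = biasRows, biasCols = biasCols)
}

# Hypothetical helper: adds the biases back in reverse order
restoreBiases <- function(centred, b) {
  restored <- sweep(centred, 2, b$biasCols, "+")
  sweep(restored, 1, b$biasRows, "+") + b$biasData
}

b <- removeBiases(toy)
roundTrip <- restoreBiases(b$centred, b)

# The observed entries should be recovered up to floating point error,
# and the missing entries stay missing
print(all.equal(roundTrip, toy))
```

In the real function the matrix that gets the biases added back is of course the product of L and R, not the centred data itself, so there the missing entries end up filled.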


Testing this implementation, if you wish, is now your turn! 🙂
Maybe at some later point I will compare the performance and correctness of various settings of these functions. But for now that’s enough. Of course there are more sophisticated bias models we could think of. For example, one could use biases that are themselves determined by the Alternating Least Squares method during each iteration.
So we won’t run out of things to do any time soon.
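
One way such a comparison could be set up is a hold-out test: hide some known entries, impute them and measure the error on exactly those entries. The following is only a sketch under a few assumptions: the data is synthetic, and alsImpute is a stripped-down, hypothetical variant of the ALS function above that does without the loss function from BEclear, so that the example runs on its own.

```r
# Stripped-down ALS imputation in base R (hypothetical helper, no loss printing)
alsImpute <- function(data, lambda = 0.75, r = 2, iters = 20) {
  L <- matrix(rnorm(nrow(data) * r), nrow(data), r) / sqrt(r)
  R <- matrix(rnorm(r * ncol(data)), r, ncol(data)) / sqrt(r)
  for (iter in seq_len(iters)) {
    # Update every row of L using only the revealed entries of that row
    for (i in seq_len(nrow(data))) {
      revealed <- !is.na(data[i, ])
      x <- R[, revealed, drop = FALSE]
      L[i, ] <- solve(x %*% t(x) + diag(lambda, r), x %*% data[i, revealed])
    }
    # Update every column of R using only the revealed entries of that column
    for (j in seq_len(ncol(data))) {
      revealed <- !is.na(data[, j])
      x <- L[revealed, , drop = FALSE]
      R[, j] <- solve(t(x) %*% x + diag(lambda, r), t(x) %*% data[revealed, j])
    }
  }
  D <- L %*% R
  data[is.na(data)] <- D[is.na(data)]
  data
}

set.seed(42)
# Synthetic rank-2 data (made up for illustration): 30 features x 20 samples
truth <- matrix(rnorm(30 * 2), 30, 2) %*% matrix(rnorm(2 * 20), 2, 20)

# Hide 60 of the 600 entries and remember where they were
hidden <- sample(length(truth), size = 60)
masked <- truth
masked[hidden] <- NA

imputed <- alsImpute(masked, lambda = 0.1, r = 2, iters = 30)

# RMSE on the hidden entries only
rmseAls <- sqrt(mean((imputed[hidden] - truth[hidden])^2))
# Baseline: replace every hidden entry by the mean of the observed entries
rmseMean <- sqrt(mean((mean(masked, na.rm = TRUE) - truth[hidden])^2))
print(c(als = rmseAls, mean = rmseMean))
```

Repeating this for different values of lambda, r and iters would give a first impression of how sensitive the method is to its settings. On real data the true rank is unknown, so the numbers from such a synthetic test should be taken with a grain of salt.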

Conclusion

Yeah, what’s the conclusion? I think it’s quite simple… Don’t be lazy while coding!

OK, OK… I will say a bit more. I think what you can learn from the faulty example is that you should always think about what your code is actually doing and take a look at the results to see if something is fishy. Other than that, I hope that you learned something and that I could show you that some methods used in Machine Learning aren’t that complicated.
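
One way to make such a look at the results routine is a small check that puts the imputed values next to the observed ones. This is only a sketch: the helper name compareImputed and the toy matrices are made up for illustration, and the "imputation" here is deliberately a suspicious one with values near zero.

```r
# Hypothetical helper: quantiles of the observed vs. the imputed values
compareImputed <- function(original, imputed) {
  missing <- is.na(original)
  probs <- c(0, 0.25, 0.5, 0.75, 1)
  rbind(observed = quantile(original[!missing], probs),
        imputed  = quantile(imputed[missing], probs))
}

# Toy data (made-up numbers): entries between 0.2 and 0.8, six of them missing
set.seed(1)
original <- matrix(runif(40, 0.2, 0.8), nrow = 8)
original[sample(length(original), 6)] <- NA

# A suspicious "imputation" that puts values near zero into the gaps
suspicious <- original
suspicious[is.na(original)] <- rnorm(6, mean = 0, sd = 0.01)

check <- compareImputed(original, suspicious)
print(check)
```

If the two rows of such a table live in completely different ranges, as they do here by construction, that is a good reason to look at the code again.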

For now my implementation of ALS is still slower than the Gradient Descent implemented in the BEclear package. But I also spent a lot of time optimizing the latter. And maybe I will show you in a future blog post how to optimize it. As this is my first blog post I would highly welcome feedback from you! 🙂

So have a nice day and until next time!

Availability Of The Code

You can find a maintained version of the correct implementation in my GitHub repository Raspository under R/imputeALS.R.