# Tag Archives: Papers

I like to read papers, even in my free time. It took me a while to be really able to understand them, but now most of the time I do. Even if the articles are very math heavy.
I actually started reading papers on a regular basis even before my highschool degree, Abitur in German. Back then I had a subscription of the Nature Journal and recieved it as a hard copy. Pretty old school… I know, but it actually has some advantages to have articles in a physical printed format. As much as I love Mendeley, physical notes, posits, etc just rule.

Here under this section I will summarise scientific articles for you or talk more general about them. When it comes to computer science articles I might also show you here, how to implement the algorithms from them. Something, so I think, is important to understand papers from this area of research. And sometimes it is a quite tricky part as those algorithms are often formulated in beautiful prose instead of pseudo code ore something similar.

If you need help with understanding one specific paper just write me and I will see, what I can do!

# Revisiting Dithering

It’s been a while since my last post and an even longer while since the last one about image processing. However I’ll just catch up, where I left off. The last post in this category was about image dithering. A common usage of dithering is the conversion of a grey scale picture to a picture consisting just out of black and white pixels, saving space and also useful for some print media. On a further note dithered images have a nice visual style, I would call something like dollar bill style.

Conventional Dithering methods however also introduce artifacts to pictures and of course they also lose sharpness. So there’s the question, if one could approve on those conventional dithering methods in order to reduce artifacts and increase sharpness.
If you’re mathematically minded, you might also be worried, that conventional dithering is not rotationally invariant, which I assume is due to the fact that the dithering kernel is not symmetric and the way one iterates over the picture. Rotationally invariant basically means, that the output of a method doesn’t change, when you rotate your input image, which is of course a desirable trait.

So what about taking some inspiration from physics? After all physical laws of motion are rotationally invariant. Lattice Boltzmann methods are a way of computing dynamics in fluids. And they have been successfully employed for dithering by Hagenburg et al . The paper regarding this research also served as my main source for the method and its implementation, even though my specific implementation might differ in some details.
I’ll call the method Lattice Boltzmann Dithering during the rest of the article, as I didn’t find any other handy name for it.

## The Idea Behind Lattice Boltzmann Dithering

The idea behind the method is to model the picture like a fluid with particles in it. In this case the grey values are the particles. A pixel that is black (with a grey value of 0) has so to say no particles in it. Those particles dissipate in the fluid over time. The time is modelled as discrete steps. They dissipate according to the following laws during each time step:

• If a pixel has a value v greater than 1, where 1 is the maximal grey value, (v – 1) values are distributed amongst the neighbors of the pixel and the value of original pixel becomes 1.
• If a pixel’s value v is lower than a minimal treshold, the user can set, v is distributed over its neighbors and the pixel itself becomes 0.
• Otherwise a fraction of the pixels value is distributed to each neighbor, that has a larger value, while being smaller than 1. This fraction is subtracted from the pixel itself.

I hope you see, how with this laws pixels with a high value (many particles) attract more values. After some time steps you should more or less only be left with black and white pixels this way.

And the whole process is stopped, when the difference from one time step to the next one becomes sufficiently small. And that’s about it… Not really that complicated, if you ask me! 🙂
Of course the paper goes more into detail about the theoretical background and proves some things about the method. If you’re interested, you should absolutely read it.

## Implementation of Lattice Boltzmann Dithering

And now let’s come to the part you all waited for… The implementation. First I’ll implement a method for the dissipation of the values from each pixel, which will e executed in each time step to get the next distribution of values.

One thing to keep in mind, when working with images is, that they have borders. And as you don’t want to dissipate the values across borders or worse access a part of the image, that isn’t there. So you have to treat the pixels at the borders differently. In the case of this method, I always check first, if a neighboring pixel is there or not.

dissipatePixel <- function(img, minimalTreshold = 0.01){

imgAtNewStep <- matrix(c(0), ncol = ncol(img), nrow = nrow(img))

for(i in seq(nrow(img))){
for(j in seq(ncol(img))){
if(img[i,j] > 1 || img[i,j] < minimalTreshold){

toDissipate <- img[i,j]

if(img[i,j] > 1){
toDissipate <- toDissipate - 1
}

dissipated <- 0

if( i > 1){
imgAtNewStep[i - 1,j] <- imgAtNewStep[i - 1,j] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if( j > 1){
imgAtNewStep[i,j - 1] <- imgAtNewStep[i,j - 1] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if(i < nrow(img)){
imgAtNewStep[i + 1,j] <- imgAtNewStep[i + 1,j] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if(j < ncol(img)){
imgAtNewStep[i,j + 1] <- imgAtNewStep[i,j + 1] + 1.0 / 9.0 * toDissipate
dissipated <- dissipated + 1.0 / 9.0 * toDissipate
}

if( i > 1 && j > 1){
imgAtNewStep[i - 1,j - 1] <- imgAtNewStep[i - 1,j - 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

if( i > 1 && j < ncol(img)){
imgAtNewStep[i - 1,j + 1] <- imgAtNewStep[i - 1,j + 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

if( i < nrow(img) && j > 1){
imgAtNewStep[i + 1,j - 1] <- imgAtNewStep[i + 1,j - 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

if( i < nrow(img) && j > ncol(img)){
imgAtNewStep[i + 1,j + 1] <- imgAtNewStep[i + 1,j + 1] + 1.0 / 36.0 * toDissipate
dissipated <- dissipated + 1.0 / 36.0 * toDissipate
}

## add the non dissipated amount to the same pixel in next time-step
imgAtNewStep[i,j] <- imgAtNewStep[i,j] + (img[i,j] - dissipated)
}else{

dissipated <- 0
currentPixel <- img[i,j]

if( i > 1 && img[i - 1,j] > img[i,j] && img[i - 1,j] < 1){
imgAtNewStep[i - 1,j] <- imgAtNewStep[i - 1,j] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if( j > 1 && img[i,j - 1] > img[i,j] && img[i,j - 1] < 1){
imgAtNewStep[i,j - 1] <- imgAtNewStep[i,j - 1] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if(i < nrow(img) && img[i + 1,j] > img[i,j] && img[i + 1,j] < 1){
imgAtNewStep[i + 1,j] <- imgAtNewStep[i + 1,j] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if(j < ncol(img) && img[i,j + 1] > img[i,j] && img[i,j + 1] < 1){
imgAtNewStep[i,j + 1] <- imgAtNewStep[i,j + 1] + 1.0 / 9.0 * currentPixel
dissipated <- dissipated + 1.0 / 9.0 * currentPixel
}

if( i > 1 && j > 1 && img[i - 1,j - 1] > img[i,j] && img[i - 1,j - 1] < 1){
imgAtNewStep[i - 1,j - 1] <- imgAtNewStep[i - 1,j - 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

if( i > 1 && j < ncol(img) && img[i - 1,j + 1] > img[i,j] && img[i - 1,j + 1] < 1){
imgAtNewStep[i - 1,j + 1] <- imgAtNewStep[i - 1,j + 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

if( i < nrow(img) && j > 1 && img[i + 1,j - 1] > img[i,j] && img[i + 1,j - 1] < 1){
imgAtNewStep[i + 1,j - 1] <- imgAtNewStep[i + 1,j - 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

if( i < nrow(img) && j > ncol(img) && img[i + 1,j + 1] > img[i,j] && img[i + 1,j + 1] < 1){
imgAtNewStep[i + 1,j + 1] <- imgAtNewStep[i + 1,j + 1] + 1.0 / 36.0 * currentPixel
dissipated <- dissipated + 1.0 / 36.0 * currentPixel
}

## add the non dissipated amount to the same pixel in next time-step
imgAtNewStep[i,j] <- imgAtNewStep[i,j] + (img[i,j] - dissipated)
}
}
}

return(imgAtNewStep)

}

Done that! Now the easy part…
But as the implementation in R with loops is incredibly inefficient I’ll just run 50 time steps this time. I will however implement this method at some later point in C++, where loops aren’t inefficient. This will also serve as a good example on how to integrate C++ code in R.

lbDithering <- function(img, epsilon, minimalTreshold){

i <- 0
while(TRUE){

imgAtNewStep <- dissipatePixel(img = img, minimalTreshold = minimalTreshold)

#if(norm(imgAtNewStep - img, type = "2") < epsilon){
if(i >= 50){
return(imgAtNewStep)
}else{
img <- imgAtNewStep
}
i <- i +1
}
}

## Usage

Now let’s reap the fruits of our coding and test the method on a picture I’ve selected. 🙂

birb <- imageBWFromJpeg("birb.jpg")
birbDithered <- lbDithering(birb@current, epsilon = 20, minimalTreshold = 0.05)
writeJPEG(round(birbDithered), "birbDithered.jpg")

Let me show you the resulting Jpeg:

Isn’t that beautiful? As comparison the original:

And a conventionally dithered version:

birbDithered2 <- errorDiffusiondDithering(birb, method = "mea")
writeJPEG(birbDithered2@current, "birbDithered2.jpg")

You can see more structure in the one created with Lattice Boltzmann Dithering, don’t you? And also you can better understand how light and shadows are distributed.
So overall a pretty nice algorithm I would say! Although I like the dollar bill look of the conventional one as well.

So that’s it for now! Until soon…

Yours, David

## Availability Of The Code

You can access a maintained version of the code for the color spaces in my GitHub repository Raspository under R/imageConversion.R.

And well… You might’ve noticed, that I used some of my methods/classes a bit different than the last few times. Having some distance from coding this stuff I noticed I have to change some things about it. Although I don’t know yet, if I’ll make a blog post about it or not. Do you wanna read one?

# And Writing Documentations In R

I wrote this function months ago, while writing the report for my seminar and I wanted to my make a post about this Bookmark-Coloring Algorithm ever since, but I never found the right chance. As I yesterday in the evening wrote some documentation for the Raspository, I thought it would be nice to make a post about this specific topic. Therefor I thought that using the Bookmark-Coloring Algorithm as an example would be nice.

## The Bookmark-Coloring Algorithm

This specific algorithm’s use is to generate a personal pagerank. In more detail this algorithm calculates given a starting website the chance to end up at other websites. But this idea is applicable to other fields as well. In my post about network-based integration through heat diffusion I showed you a similar method applied to a network of multi-omic data. On the same data-set you could use the Bookmark-Coloring Algorithm.
The basic idea behind the Bookmark-Coloring Algorithm is that you have some color that diffuses through a network, which is in my opinion equivalent to heat diffusing in a network. Correct me, if I’m wrong.

I implemented the algorithm following the paper about it by Pavel Berkhin. More precisely I implemented the Algorithm 2 from the paper. So let me show my implementation:

require(igraph)
require(dequer)
BCA<- function(graph, v, retentionCoefficient = 0.4, tol = 0.001){
# initialise vector of transition chances
p<-c()
p[V(graph)] <- 0

q <- queue()
pushback(q, v)

# initialise vector that indicates how much color is in one node
colorInVertex <- c()
colorInVertex[v] <- 1

# execute as long queque q has elements
while(length(q) > 0){
i <- pop(q)
w <- colorInVertex[i]
# use up the color in node
colorInVertex[i] <- NA

p[i] <- p[i] + retentionCoefficient * w

# if all color is used up continuew to next element in queque
if(w < tol){

next
}

# execute for all neighbors
for(j in neighbors(graph, i, mode = "out")){
if(!is.na(colorInVertex[j])){
# add color to neighbor
colorInVertex[j] <- colorInVertex[j] +
((1 - retentionCoefficient) * w/degree(graph, i, mode = "out"))
}else{
# initialise color in neighbor
pushback(q, j)
colorInVertex[j] <- (1 - retentionCoefficient) * w/degree(graph, i, mode = "out")
}
}

}

return(p)
}

I wrote some comments, that hopefully help you to understand, what’s going on. That’s also the first part of documentation. It’s the step you’ll probably do, while writing your code and in my opinion it’s always useful.

So are we done with the documentation? No. Not, if we want to this function into a package.

## roxygen2 documentation

roxygen2 is a nice package allowing you to write in-line documentation in R. I won’t go to much into detail about it here as there are lot of online sources for it1, but I will show you a short example, how to do it!

Now let me show, how to write a documentation for the BCA function, I will go over all the specified tags.

#' Bookmark Coloring Algorithm
#'
#' @aliases BookmarkColoringAlgorithm
#'
#' @description This function calculates a teleportation vector from a given
#' starting node to other nodes in a given network.
#'
#' @export BCA
#' @import dequer
#' @import igraph
#'
#' @param graph an object of type \code{\link[igraph]{igraph}}.
#' @param v a starting vertex from the above graph. Can be either its identifier
#' or a igraph.vs object.
#' @param retentionCoefficient the restart probability for each node.
#' @param tol a tolerance treshold, indicating what the smalltest value of color
#' is, that should propagate further
#'
#' @return a preference/teleportation vector
#'
#' @references \insertRef{Berkhin2006}{Raspository}
#'
#' @examples
#' library(igraph)
#' g <- make_ring(5)
#' preferenceVector <- BCA(g, 1)
BCA <- function(graph, v, retentionCoefficient = 0.4, tol = 0.001){

OK, let’s go over some of the tags…
In the first line of course you have the title of the function. Additional to the description, you can also add a details tag, where description should give a short overview over the method, you could include theoretical background in the details.
Then needed packages are imported. roxygen2 will convert them into lines in your NAMESPACE file.

With params and return you should shortly describe their types and what they should contain.

For the following use of the references tag, you also need to import the Rdpack package and include a “REFERENCES.bib” in the inst folder with the regarding BibTeX entries. In my opinion you should always use this, when implementing some method from some source… Be it a book or a paper. Rdpack imports those references automatically from your BibTeX file into the documentation files.

Last, but not least I included a runnable example. This is important to give your user a starting point on how to use your function, but furthermore it is a test for the function. Each time your package is built, example code is run. So you will be notified, if there are any errors.
But we will go more into automated testing another time. Because there is of course more you can do. But you should always write example code, if your function is visible to the user.

After writing this documentation in your file, you have to compile it by writing:

roxygen2::roxygenise()

I hope that this short example will help you writing and documenting your own functions! 🙂

## Availability Of The Code

You can access a maintained version of the code in my GitHub repository Raspository under R/BCA.R.

# Does It Make You Slim?

Is a question asked me someone, after the person saw the I began blog with scientific topics. This someone apparently got a lot of ads for activated charcoal latte, which is also called Black Latte here in Germany.
A few months ago, when I got the question, I went on PubMed and searched for papers regarding this topic…
Tl;dr I didn’t find any! 😀
However this doesn’t mean that there aren’t any. And in no case it means that it doesn’t.

But! It seems, that there is some evidence from around the 80s and 90s, that activated charcoal can reduce LDL-cholesterol (the bad one) in patients with high blood cholesterol. Neuvonen et al even postulated a mechanism behind this effect.

Sounds nice doesn’t it? Maybe not as nice as getting slim, but still nice. Well… Unless you consider some of the adversary effects of it. According to webMD it can e.g. cause constipation and reduce the absorption of some nutrients. So you probably shouldn’t take it, including activated charcoal latte, as a regular supplement, unless your doctor tells you to.

Huh… Short post today? It seems so. 🙂 If you like this format (and also if not), please tell me! And if you have a question, just ask me. I’ll do my best in answering it.

# Or the Devil is in the Preprocessing

It’s been a while, eh? So I guess it’s time for another contribution to the Raspository. I didn’t really find the time to write something sensible last month, as I had a lot of things to do. More specifically I had to finish the essay I talked about in my last post. Then I had to learn for my exams and do some stuff regarding my old apartment, but now I have time again.

The topic of my essay will also be the topic of this post… More or less. The method I had to write about and for which I’m preparing a presentation right now is called netICS. It is a graph based method for the integration of multi-omics data.
I won’t go into detail about the core method itself in this post, but this will be a topic for a coming post… I just need to think about how to present the code I already coded to you! Instead of that I will tell you about the non-trivial step of preprocessing for the integration of multi-omics data. This is probably the part you’ll be spending most of your time on, if you’re planing to do so.

## Necessary simplifications

If you’re integrating data from various sources, what you need to do is to implement some sensible simplification. This does not only hold for multi-omics data1, but basically to every case where you combine different kind of sources.
This is kind of necessary as you don’t have unlimited resources. To elaborate more on this… With resources I mean everything from samples to computing power and verified2 interactions in your network.

I’m talking about networks, because networks are one way of making sense of those data-sets. In this manner you can treat your features as nodes in the network which can interact in certain ways with each other.
Of course nodes of different kinds interact in different ways with each other. That’s something you either have to think about unless this is the part where you simplify your model. And it is a reasonable simplification to treat all nodes in a network the same. That is more or less what was done with netICS.
In more detail, in the network all different products of a gene3 are represented by the same node.
This is reasonable as we may know a lot about interactions between certain kinds of gene products 4, but we don’t know enough about all kinds of possible interactions. But you always have to keep in mind your simplifications.

## Combination of different data types

Then there is the question on how to combine different data types. Because in the case of the above mentioned simplification you have to combine the data from the different gene products into one.

One example used in netICS is the combination of p-values according to the following formula, which is called Fisher’s method:

$$X = -2 \cdot (\sum^{k}_{i = 1}log(p_{i}))$$

You can then apply a Chi-Square test with 2k degrees of freedom on the variable X to get a new combined p-value. Of course this combination method comes with its own limitations.
One of them being that you can only apply it, when the different p-values you’re combing are independent. Independence is often assumed, but there are also methods for dependent cases. It however gets more complicated then.

Depending on what kind of omic data you want to combine it might also be sensible to combine them into a indicator variable. For example this indicator variable could take on the value of 1 if either one or both of two features from different data-sets are present in a sample and take on 0 otherwise.

## Integration of Multi-Omics Data is hard work

There is however no recipe that you can always just apply… Each data-set is different. And so is a combination of different data-sets. That’s why Data Science needs industrious people, that don’t shy away from boggling their mind.

But this work will also hold its own rewards. Because if you do so you will be able to use plethora of sweet5 algorithms on your now preprocessed data-set. netICS e.g. uses an approach from page ranking. The thing our friend Google does. And I will talk more about it very soon.
For now I hope I could give you a little impression into the struggle of data integration. Of course there are more things to keep in mind like unclean data. But we will save the cleaning of data-sets for another time. 😛

So see you soon!

# By understanding your sources

At the moment I have to write an essay about a scientific method called netICS for a seminar. The date for my presentation is in April. Normally I would at this moment be head over heels in making notes and writing summaries. But this time I decided to tackle the problem differently. Instead of trying to understand and analyse the complete text of the paper I tried to implement the method described in it on my own. And I’m mostly done doing so… And in doing so I now understand the method and could explain it to someone else. Something I wouldn’t probably be able to that degree, if I just analysed the paper itself.

## The alternative to implementation

Of course it is not always possible to implement a method on your own. Either because it is to complicated1 or because you’re not yet that proficient in programming. But there’s still one thing you can do… Pen and Paper. No, I’m not talking about Dungeons and Dragons2, but about doing some example calculations on paper. And if you do so you will probably already have content for your slides.
You know the phrase “Show, don’t tell”? This maxim does not only hold for novels, but basically for every medium of storytelling. And in fact if you’re doing a presentation right you have some kind of story line which you follow.

## The purpose of a presentation

And here we come to the question, what the purpose of a presentation should be, which is in my opinion not that different from the purpose of small talk, about which a lately did a blog post. To be more specific a presentation should awaken the interest in a topic for your audience. But of course you shouldn’t mislead your listeners. In other words don’t just tell a story, that sounds nice, but omits your actual topic. So your presentation should be interesting to listen to, but you shouldn’t beat around the bush. And in my opinion you do this best by showing you audience easy comprehensible examples… You know those thingies some professors call trivial. But in comparison to those professors your goal isn’t to filter out people by the lecture, but that give hopefully everybody something to take with.

## Conclusion

What I’m trying to say is basically: Even for doing something like presentation you have to do some practical work. At least when it comes to fields like Bioinformatics or Computer Science it’s never enough to do just literary work. And by doing so you can provide your audience with comprehensible examples. The alternatives are to batter your audience with theory or to just tell a story, that is hollow.
And both are things you don’t wanna do… Hopefully!

# How To Implement Alternating Least Squares In R And How Not To Do It

So it’s time for my first actual content. And like predicted in my Hello World post, it will be something implemented in the R programming language. More precisely it’s a Machine learning algorithm called Alternating Least Squares. But first, before we indulge ourselves in code, let me tell you why this algorithm is of interest for me and what it does.

## Introduction

I’ve been working now for a few months as part of my research assistant job on the Bioconductor package BEclear . I won’t go into detail about the package, you only need to know that it uses something called a Latent Factor Model to impute1 missing data in data-sets.

Let’s say you have a matrix D containing missing values. The rows in the matrix stand for features and the columns for samples. Then you could assume that the matrix $$D_ {ij}$$ is modeled by both features and sample specific effects in the following way:

$$D_ {ij} = L_ {i}^ {T} \times R_ {j}$$

Where $$L_i$$is the feature specific matrix and $$R_j$$ the sample specific matrix. For the imputation of missing values you now try to estimate those two matrices, the latent factors, from the existing values. Methods based on this assumption are already applied in variety of fields. Like with batch effects in DNA Methylation in the case of the BEclear package or in recommender systems for e.g. Netflix like described in a paper, which helped me a lot in understanding this topic.

To estimate the latent factors there are different methods. One of them, implemented in the BEclear package is a gradient descent. Another method for it is Alternating Least Squares (ALS), which I wanted to implement on my own.2

The lecture notes of Hastie et al served as my source for implementing this method. I highly recommend you reading both the paper from Koren et al and those lecture notes, if you want to know more about the theoretical background and also the applications of those methods. You just need to know some Linear Algebra, then they should be easy enough to understand in my opinion.

But as a short summary… ALS tries to estimate the feature and sample matrix by alternating fixing one of them and then calculating the other one by solving the the system of equations and you do this in general until convergence. In Gradient Descent on the other hand both matrices are estimated at the same time.

## How Not To Implement Alternating Least Squares

As a start I will show you my first faulty try in implementing the Alternating Least Squares. Maybe you will learn something by me sharing it with you. And you should try to guess what my mistake was.

As my implementation reuses a function from BEclear34, you have to install this package first. For this purpose I guess it’s easiest if you just install it from GitHub via the following lines of code:

if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("David-J-R/BEclear")

And now let’s come to the first implementation of the Alternating Least Squares algorithm. See if you can find its problem. I tried to comment all those steps so that the code should be comprehensible. But if not, please let me know! 🙂

imputeALSfaulty<- function(data, lambda = 0.75, r = 10, iters = 20){

# We require the BEclear package, because we're using its loss function
require(BEclear)

D <- data
D[is.na(D)] <- 0

# We initialise L and R with random values
L <- matrix(rnorm(nrow(data) * r), nrow(data), r) / sqrt(r)
R <- matrix(rnorm(r * ncol(data)), r, ncol(data)) / sqrt(r)

currLoss <- BEclear:::loss(L,R, 1, data)$loss print(currLoss) for(i in 1:iters){ # L and R are determined by solving the following system of the equations # We repeat this step iters-times L <- t(solve(R %*% t(R) + diag(lambda,r), R %*% t(D))) R <- solve(t(L) %*% L + diag(lambda,r), t(L) %*% D) currLoss <- BEclear:::loss(L,R, 1, data)$loss

print(currLoss)
}

# L and R are multiplied to get the estimated values
D <- L %*% R

# Missing values are replaced with estimated value
for (i in seq_len(nrow(data)))
for (j in seq_len(ncol(data)))
{
if (is.na(data[i, j])) {
data[i, j] <- D[i, j]
}
}

return(data)
}

Now let me show you the problem with this implementation, if you haven’t recognized it on your own by now.
First we will load the example data and functions from the BEclear package to generate ourselves a sample data-set with missing values:

library(BEclear)
data("BEclearData")
batchEffect <- calcBatchEffects(
data = ex.data, samples = ex.samples,
adjusted = TRUE, method = "fdr")

mdifs <- batchEffect$med pvals <- batchEffect$pval

summary <-calcSummary(mdifs, pvals)
cleared.data <- clearBEgenes(ex.data, ex.samples, summary)

And then we run the faulty version of ALS on it. The printed output of the function is the loss of the current solution during each iteration.

result <- imputeALSfaulty(cleared.data, iters = 10)
## [1] 2586.68
## [1] 101.8086
## [1] 95.60281
## [1] 95.29458
## [1] 95.21404
## [1] 95.20139
## [1] 95.20632
## [1] 95.21329
## [1] 95.21869
## [1] 95.22233
## [1] 95.2247

If we now take a look at the imputed values, you can see what’s wrong:

boxplot(result[is.na(cleared.data)])

They’re all pretty close to zero. That’s because we set the missing values to zero. This way the solve method “tries” to generate R and L the way that the missing values are also very close to zero.
Of course we don’t want that… This way we could just set the missing values right away to zero.

## How To Implement Alternating Least Squares

Finally let me show you an implementation that actually does, what it should do. And again if something is unclear, don’t hesitate to ask me!

imputeALScorrect<- function(data, lambda = 0.75, r = 10, iters = 80){

# We require the BEclear package, because we're using its loss function
require(BEclear)

# We initialise L and R with random values
L <- matrix(rnorm(nrow(data) * r), nrow(data), r) / sqrt(r)
R <- matrix(rnorm(r * ncol(data)), r, ncol(data)) / sqrt(r)

currLoss <- BEclear:::loss(L,R, 1, data)$loss print(currLoss) for(iter in 1:iters){ # Now we iterate over the feature dimmension of L for(i in 1:dim(L)[[1]]){ # We determine the revealed entries for the feature # And subset the data and R so to only retain the revealed entries revealedEntries <- !is.na(data[i,]) y <- as.matrix(data[i, revealedEntries]) x <- R[,revealedEntries] # We solve the linear equation for the feature L[i,] <- as.vector(solve(x %*% t(x) + diag(lambda, r), x %*% y)) } # We iterate over the sample dimmension of R for(j in 1:dim(R)[[2]]){ # We determine the revealed entries for the sample # And subset the data and L so to only retain the revealed entries revealedEntries <- !is.na(data[,j]) y <- as.matrix(data[revealedEntries, j]) x <- L[revealedEntries,] # We solve the linear equation for the sample R[,j] <- as.vector(solve(t(x) %*% x + diag(lambda, r), t(x) %*% y)) } currLoss <- BEclear:::loss(L,R, 1, data)$loss

print(currLoss)
}

# L and R are multiplied to get the estimated values
D <- L %*% R

# Missing values are replaced with estimated value

for (i in seq_len(nrow(data)))
for (j in seq_len(ncol(data)))
{
if (is.na(data[i, j])) {
data[i, j] <- D[i, j]
}
}

return(data)
}

A further advantage of this implementation is, that it is relatively easy to write a parallelised version of it. Maybe I will show you that in one of my next posts. After I overheard a conversation at the university that R is apparently bad for this, I feel almost challenged to do so.

Now let’s take a look at the imputed values. We just take the sample data-set from before for this cause.

result <- imputeALScorrect(cleared.data, iters = 10)
## [1] 2571.072
## [1] 109.301
## [1] 99.38027
## [1] 97.17519
## [1] 95.42625
## [1] 94.00547
## [1] 92.83838
## [1] 91.87368
## [1] 91.07338
## [1] 90.40794
## [1] 89.85372
boxplot(result[is.na(cleared.data)])

Now that looks more like real data… Doesn’t it? But to be sure let’s compare it to the predicted values by the BEclear package. For the comparison we calculated the Root Mean Squared Error:

library(Metrics)
result.BEclear <- imputeMissingData(cleared.data)
## INFO [2019-02-08 12:17:10] Starting the imputation of missing values.
## INFO [2019-02-08 12:17:10] This might take a while.
## INFO [2019-02-08 12:17:10] BEclear imputation is started:
## INFO [2019-02-08 12:17:10] block size: 60  x  60
## INFO [2019-02-08 12:17:10] Impute missing data for block 1 of 4
## INFO [2019-02-08 12:17:10] Impute missing data for block 2 of 4
## INFO [2019-02-08 12:17:11] Impute missing data for block 3 of 4
## INFO [2019-02-08 12:17:11] Impute missing data for block 4 of 4
rmse(result.BEclear[is.na(cleared.data)], result[is.na(cleared.data)])
## [1] 0.03196931

Well the difference isn’t that big. But of course for assessing the accuracy of the method an elaborate evaluation would be needed. However for something I coded just for fun I’m satisfied with this first look.

Just for fun let’s also add biases to our model, like described by Koren et al, to further improve our algorithm.

The idea behind the bias is to capture the variability of the data that arises from the features or samples alone, while the two matrices L and R capture the variability that arises from the interaction of features and samples together.
In other words by introducing the biases we unburden L and R a bit.
We use a method, where the biases for each entry in the data-set are the sum of the overall average over all values and the average difference from this average of the corresponding column and row. And to save valuable computation time we just subtract this bias for each value from a copy of each value and use this transformed matrix for further calculations. Of course we have to add the bias later again.

And here we go with the improved implementation:

imputeALSBias<- function(data, lambda = 0.75, r = 5, iters = 10, use.biases=TRUE){

# We require the BEclear package, because we're using its loss function
require(BEclear)

# copy the data
D <- data

# We initialise L and R with random values
L <- matrix(rnorm(nrow(data) * r), nrow(data), r) / sqrt(r)
R <- matrix(rnorm(r * ncol(data)), r, ncol(data)) / sqrt(r)

currLoss <- BEclear:::loss(L,R, 1, D)$loss print(currLoss) if(use.biases){ # we calculate the biases biasData<-mean(data, na.rm = TRUE) biasRows<-rowMeans(data - biasData, na.rm= TRUE) biasCols<-colMeans(data - biasData, na.rm= TRUE) # subtract the biases from the data D <- D - biasData - biasRows D <- t(t(D) - biasCols) } for(iter in 1:iters){ # Now we iterate over the feature dimmension of L for(i in 1:dim(L)[[1]]){ # We determine the revealed entries for the feature # And subset the data and R so to only retain the revealed entries revealedEntries <- !is.na(D[i,]) y <- as.matrix(D[i, revealedEntries]) x <- R[,revealedEntries] # We solve the linear equation for the feature L[i,] <- as.vector(solve(x %*% t(x) + diag(lambda, r), x %*% y)) } # We iterate over the sample dimmension of R for(j in 1:dim(R)[[2]]){ # We determine the revealed entries for the sample # And subset the data and L so to only retain the revealed entries revealedEntries <- !is.na(D[,j]) y <- as.matrix(D[revealedEntries, j]) x <- L[revealedEntries,] # We solve the linear equation for the sample R[,j] <- as.vector(solve(t(x) %*% x + diag(lambda, r), t(x) %*% y)) } currLoss <- BEclear:::loss(L,R, 1, D)$loss

print(currLoss)
}

# L and R are multiplied to get the estimated values
D <- L %*% R

if(use.biases){
# we add the biases again
D <- t(t(D) + biasCols)
D <- D + biasData + biasRows
}

# Missing values are replaced with estimated value

for (i in seq_len(nrow(data)))
for (j in seq_len(ncol(data)))
{
if (is.na(data[i, j])) {
data[i, j] <- D[i, j]
}
}

return(data)
}

Testing this implementation, if you wish, is now your turn! 🙂
Maybe at some later point I will compare the performance and correctness of various different settings of this functions5. But for now that’s enough. Of course there are more sophisticated bias models we could think of. But one could even think of bias models like biases that are also determined by the Alternating Least Squares method during each iteration.
So we won’t run out of things to do any time soon.

## Conclusion

Yea, what’s the conclusion? I think it’s quite simple… Don’t be lazy, while coding!

OK, OK… I will say a bit more. I think what you can learn from the faulty example is that you should always think what your code is actually doing and take a look at the results to see, if something is fishy. Other than that I hope that you learned something and I could show you that some methods used in Machine Learning aren’t that complicated.

For now my implementation of ALS is still worse, when it comes to run time, in comparison to the Gradient Descent implemented in the BEclear package. But I also spend a lot of time optimizing the second. And maybe I will show you in a future blog how to optimize it. As this is my first blog post I would highly welcome feedback from you! 🙂

So have a nice day and until next time!

## Availability Of The Code

You can access a maintained version of the code of the correct version in my GitHub repository Raspository under R/imputeALS.R.