CSE 455 Computer Vision

CSE455: Computer Vision - Spring 2018
I saw this course on pjreddie's GitHub page and found it interesting.👍
It is an undergraduate course offered by the School of Computer Science and Engineering at the University of Washington. I did the assignments out of personal interest.😋

Solution of Assignments📁

My solutions include the code needed to finish the homework, plus the extra parts needed to get the extra credit.

Style Things📗

  1. ‘for’ loop initial declarations are only allowed in C99 mode
    Although I could pass the -std=c99 flag to tell the compiler to use C99, I think it's cooler to declare the loop variables outside the loop.
    int i, j, k;
    for (i = 0; i < im.c; ++i){
      for (j = 0; j < im.h; ++j){
        for (k = 0; k < im.w; ++k){
          /*body*/
        }
      }
    }
    
  2. If statement
    My obsession: if(expression) for single-line bodies, and if (expression){ for multi-line bodies. I also always use if(1) or if(0) to enable or disable a code snippet.
    if(!sum) return;
    
    if (a == LOGISTIC){
        d.data[i][j] *= x * (1 - x);
    } else if (a == RELU){
        d.data[i][j] *= x > 0 ? 1 : 0;
    } else if (a == LRELU){
        d.data[i][j] *= x > 0 ? 1 : 0.1;
    }
    
    if(0){
        /*disabled body*/
    } else
    {
        /*enabled body*/
    }
    
    That way I can search for if(0) to locate the snippet and flip the switch quickly.
    Actually, I did not stick to this convention in this repository.😂
  3. Always use ++i when I have a choice

Note📝

  1. Makefile
    TODO Should write a gist for it.

  2. Compile with OpenCV (using MinGW)

  3. struct with pointer inside it
    Suppose we define a struct with at least one pointer in it:

    typedef struct matrix{
        int rows, cols;
        double **data;
        int shallow;
    } matrix;
    

    We should write a function to allocate and initialize its memory, for safety and convenience:

    matrix make_matrix(int rows, int cols)
    {
        matrix m;
        m.rows = rows;
        m.cols = cols;
        m.shallow = 0;
        m.data = calloc(m.rows, sizeof(double *));
        int i;
        for(i = 0; i < m.rows; ++i) m.data[i] = calloc(m.cols, sizeof(double));
        return m;
    }
    

    And also a function to free the memory:

    void free_matrix(matrix m)
    {
        if (m.data) {
            int i;
            if (!m.shallow) for(i = 0; i < m.rows; ++i) free(m.data[i]);
            free(m.data);
        }
    }
    

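    Remember to call it manually once the matrix is no longer needed. Otherwise the memory leaks, and once allocation starts to fail the program can end in a ⚠️segmentation fault.
    Also write a function for deep copy (if necessary).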

    matrix copy_matrix(matrix m)
    {
        int i,j;
        matrix c = make_matrix(m.rows, m.cols);
        for(i = 0; i < m.rows; ++i){
            for(j = 0; j < m.cols; ++j){
                c.data[i][j] = m.data[i][j];
            }
        }
        return c;
    }
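
    The three helpers above are meant to be used together. Here is a minimal sketch of that lifecycle, assuming the same struct and functions as in this note (repeated so the block compiles on its own). copy_is_independent is a hypothetical name introduced only for this illustration; it checks that writing to a deep copy does not touch the original, and that both matrices are freed by hand.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct matrix {
    int rows, cols;
    double **data;
    int shallow;
} matrix;

/* Same helpers as in the note, repeated so the sketch compiles alone. */
matrix make_matrix(int rows, int cols)
{
    matrix m;
    int i;
    m.rows = rows;
    m.cols = cols;
    m.shallow = 0;
    m.data = calloc(m.rows, sizeof(double *));
    for (i = 0; i < m.rows; ++i) m.data[i] = calloc(m.cols, sizeof(double));
    return m;
}

void free_matrix(matrix m)
{
    int i;
    if (!m.data) return;
    if (!m.shallow) for (i = 0; i < m.rows; ++i) free(m.data[i]);
    free(m.data);
}

matrix copy_matrix(matrix m)
{
    int i, j;
    matrix c = make_matrix(m.rows, m.cols);
    for (i = 0; i < m.rows; ++i) {
        for (j = 0; j < m.cols; ++j) {
            c.data[i][j] = m.data[i][j];
        }
    }
    return c;
}

/* Hypothetical check (not part of the repository):
   returns 1 if writing to a deep copy leaves the original untouched. */
int copy_is_independent(void)
{
    int ok;
    matrix a = make_matrix(2, 3);
    matrix b;
    a.data[1][2] = 5.0;
    b = copy_matrix(a);
    /* A deep copy owns separate storage for the row table and for each row. */
    assert(b.data != a.data && b.data[1] != a.data[1]);
    b.data[1][2] = 7.0;
    ok = (a.data[1][2] == 5.0) && (b.data[1][2] == 7.0);
    /* Every make_matrix/copy_matrix result needs its own free_matrix call. */
    free_matrix(b);
    free_matrix(a);
    return ok;
}
```

    Copying the struct by plain assignment (matrix b = a;) would only copy the pointer, so both variables would share the same rows. That is the case the deep copy is there to avoid.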
    
  4. Never use a struct with a pointer inside it as an intermediate variable in an expression
    In ./vision-hw4/src/classifier.c I used to write things like this:

    // THIS IS TOTALLY WRONG!
    matrix backward_layer(layer *l, matrix delta)
    {
        // back propagation through the activation
        gradient_matrix(l->out, l->activation, delta);
        
        // calculate dL/dw and save it in l->dw
        free_matrix(l->dw);
        matrix dw = matrix_mult_matrix(transpose_matrix(l->in), delta);
        l->dw = dw;
        
        // calculate dL/dx and return it.
        matrix dx = matrix_mult_matrix(delta, transpose_matrix(l->w));
    
        return dx;
    }
    

    This is totally wrong because the intermediate structs returned by transpose_matrix(l->in) and transpose_matrix(l->w) are never freed. This convenient, Python-like style leaks memory on every call, so the intermediate garbage piles up until the memory runs out and the program finally ends in a ⚠️segmentation fault.
    The right way to do this is:

    matrix backward_layer(layer *l, matrix delta)
    {
        // back propagation through the activation
        gradient_matrix(l->out, l->activation, delta);
        
        // calculate dL/dw and save it in l->dw
        free_matrix(l->dw);
        matrix inT = transpose_matrix(l->in);
        matrix dw = matrix_mult_matrix(inT, delta);
        free_matrix(inT);
        l->dw = dw;
        
        // calculate dL/dx and return it.
        matrix wT = transpose_matrix(l->w);
        matrix dx = matrix_mult_matrix(delta, wT);
        free_matrix(wT);
    
        return dx;
    }
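
    One way to keep the call sites short without leaking is to hide the temporary inside a helper. The sketch below is only an illustration under stated assumptions: transpose_matrix and matrix_mult_matrix are simple stand-ins for the repository's functions, while transpose_mult_matrix, the live_matrices counter and the two leaked_by_* functions are hypothetical names made up for this example. The counter tracks matrices that were allocated but not freed, so the leak from the nested call becomes visible.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct matrix {
    int rows, cols;
    double **data;
    int shallow;
} matrix;

/* Illustrative counter: matrices allocated but not yet freed. */
static int live_matrices = 0;

matrix make_matrix(int rows, int cols)
{
    matrix m;
    int i;
    m.rows = rows;
    m.cols = cols;
    m.shallow = 0;
    m.data = calloc(m.rows, sizeof(double *));
    for (i = 0; i < m.rows; ++i) m.data[i] = calloc(m.cols, sizeof(double));
    ++live_matrices;
    return m;
}

void free_matrix(matrix m)
{
    int i;
    if (!m.data) return;
    if (!m.shallow) for (i = 0; i < m.rows; ++i) free(m.data[i]);
    free(m.data);
    --live_matrices;
}

/* Stand-in for the repository's transpose_matrix. */
matrix transpose_matrix(matrix m)
{
    int i, j;
    matrix t = make_matrix(m.cols, m.rows);
    for (i = 0; i < m.rows; ++i)
        for (j = 0; j < m.cols; ++j)
            t.data[j][i] = m.data[i][j];
    return t;
}

/* Stand-in for the repository's matrix_mult_matrix. */
matrix matrix_mult_matrix(matrix a, matrix b)
{
    int i, j, k;
    matrix p;
    assert(a.cols == b.rows);
    p = make_matrix(a.rows, b.cols);
    for (i = 0; i < a.rows; ++i)
        for (j = 0; j < b.cols; ++j)
            for (k = 0; k < a.cols; ++k)
                p.data[i][j] += a.data[i][k] * b.data[k][j];
    return p;
}

/* Hypothetical helper: computes transpose(a) * b and frees the temporary. */
matrix transpose_mult_matrix(matrix a, matrix b)
{
    matrix aT = transpose_matrix(a);
    matrix p = matrix_mult_matrix(aT, b);
    free_matrix(aT);
    return p;
}

/* Number of matrices left behind by the nested-call style. */
int leaked_by_nested_call(void)
{
    int before = live_matrices;
    matrix a = make_matrix(2, 3);
    matrix b = make_matrix(2, 4);
    matrix p = matrix_mult_matrix(transpose_matrix(a), b); /* temporary is lost */
    free_matrix(p);
    free_matrix(b);
    free_matrix(a);
    return live_matrices - before;
}

/* Number of matrices left behind when the helper owns the temporary. */
int leaked_by_helper(void)
{
    int before = live_matrices;
    matrix a = make_matrix(2, 3);
    matrix b = make_matrix(2, 4);
    matrix p = transpose_mult_matrix(a, b);
    free_matrix(p);
    free_matrix(b);
    free_matrix(a);
    return live_matrices - before;
}
```

    With a helper like this, the dL/dw line in backward_layer could be written as a single call on l->in and delta. The explicit inT/wT version above does the same thing and keeps the ownership more visible, so which one to use is a matter of taste.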
    
  5. String handling can cause fatal mistakes
    After finishing my code in ./vision-hw4, I trained the model on my Windows laptop and it worked well. But when I did the same thing on Linux, the training went wrong and gave me 0% training and test accuracy.
    After debugging, I found that I had accidentally changed the line endings of the file mnist.labels from LF to CRLF, which is the default on Windows.
    This converts every \n (the line break on Linux) to \r\n (the line break on Windows).
    So num0\n becomes num0\r\n in mnist.labels, and the same goes for the rest.
    Now look at the char *fgetl(FILE *fp) function in ./src/data.c. This function reads the labels from the text file for the training and test phases.

    char *fgetl(FILE *fp)
    {
        if(feof(fp)) return 0;
        size_t size = 512;
        char *line = malloc(size*sizeof(char));
        if(!fgets(line, size, fp)){
            free(line);
            return 0;
        }
    
        size_t curr = strlen(line);
    
        while((line[curr-1] != '\n') && !feof(fp)){
            if(curr == size-1){
                size *= 2;
                line = realloc(line, size*sizeof(char));
                if(!line) {
                    fprintf(stderr, "malloc failed %ld\n", size);
                    exit(0);
                }
            }
            size_t readsize = size-curr;
            if(readsize > INT_MAX) readsize = INT_MAX-1;
            fgets(&line[curr], readsize, fp);
            curr = strlen(line);
        }
        if(line[curr-1] == '\n') line[curr-1] = '\0';
    
        return line;
    }
    

    Most importantly, this function looks only for \n as the end-of-line marker, so the \r stays in the string. The label num0 becomes num0\r, and the same goes for the other labels.
    Because num0\r matches none of the expected labels, every training sample is treated as a negative, and so is every test sample. Surprisingly but reasonably, I got 0% for both training and test accuracy.
    Remember:

    1. Use LF as the default line ending
    2. Make string functions compatible with both Linux and Windows line endings
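
    For the second point, a minimal sketch of a line-ending-tolerant cleanup step is shown here. strip_line_ending and labels_match are hypothetical names introduced for this illustration and are not part of the course code. The idea is to remove any trailing \n and \r, so a label read from a CRLF file compares equal to the same label read from an LF file.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes every trailing '\n' and '\r' in place,
   so "num0\n", "num0\r\n" and "num0\r" all become "num0". */
void strip_line_ending(char *line)
{
    size_t len;
    assert(line != NULL);
    len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
}

/* Hypothetical check: returns 1 when two raw lines carry the same label
   once their line endings are stripped, 0 otherwise.
   Labels longer than 63 characters are truncated in this sketch. */
int labels_match(const char *first_line, const char *second_line)
{
    char a[64], b[64];
    strncpy(a, first_line, sizeof(a) - 1);
    a[sizeof(a) - 1] = '\0';
    strncpy(b, second_line, sizeof(b) - 1);
    b[sizeof(b) - 1] = '\0';
    strip_line_ending(a);
    strip_line_ending(b);
    return strcmp(a, b) == 0;
}
```

    Calling something like strip_line_ending on the string returned by fgetl would drop the leftover \r, since fgetl itself only removes the final \n.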
  6. More Extra Credit of vision-hw2(spherical coordinates)

Resources📚

  1. Textbook: Computer Vision: Algorithms and Applications, Rick Szeliski, 2010.
  2. My solution: ivanpp/CSE455_Spring_2018