During the semester break, I chose to explore the Alps instead of enjoying the 18 hours of daylight in Scandinavia. It was exam season at TUM when I visited Munich. I sat in front of the Parabola Slide in the CIT building and joined the welcome meeting via Zoom. Following their advice, I immediately started my house hunt in Munich. After more than 100 applications, I finally got a cozy place in Schwabing.
Apart from getting a place to stay, this "30-day (NOT) free trial" gave me an idea of what my life would be like for the following year. I also got to know the most delicious dishes in the Mensa well in advance.
I also travelled to Switzerland🇨🇭, where "The Notorious B.A.H.N" (not a German rapper) drove me to my bff. It is a beautiful but expensive heaven.
I went back to Sweden in August with the feeling that something had gone wrong. The Swedish visa extension I had applied for in April 2023 went totally silent. I tried to contact my case officer and had some really bad experiences. I decided to start my "Plan B". I contacted TUM and the German embassy, and they gave me all the information I needed to start the German visa application.
I got a Termin in October in Stockholm. Although a lot of documents were needed, the whole process only took one month.
This was the most difficult, most stressful part of my first semester of exchange study. From October, the start of TUM's winter semester, till the day I got my passport back from the German embassy, the stress piled up. It was my partner's and friends' support that paved my way to Munich.
Finally, I arrived in Munich again at the end of November. Like every other settler here, I had to anmeldung my address, get a bank account, find a family doctor... Everything made me happy because I was finally here.
I lost my position in the Praktikum because I could not attend in person, but for other courses I was not lagging behind too much (thanks to TUM-Live). Oktoberfest passed me by, but we prepared for and celebrated Christmas together.
As a bigger university (of ~50k students), TUM has more types of courses, with finer granularity. Most lecture-based courses are worth 6 or 8 credits, seminars 4 credits, and Praktikums 10 credits. Lecture-based courses are examined by written exams; seminars are evaluated by a presentation and/or report. Some Praktikums are lab courses, evaluated by lab reports and presentations; others are project courses, evaluated by a final project.
Exams at my home university last 4 hours, and getting 80% of the points gives you the highest grade (5.0). At TUM, you only have 90-120 minutes to finish a lot of questions; this is called an "Überhangklausur" (overhang exam). But don't panic: to get the highest grade (1.0), you don't need full points or a fixed percentage of points. You ONLY have to beat your peers and land in the top x% of hundreds of students (the grading distribution or bar is defined by the examiner).
One important thing to know: once you pass an exam, you can never attend any of its retakes. For someone who cares about grades, getting a 4.0 (the lowest passing grade) is the biggest nightmare. Many students fail an exam on purpose when they find it's not worth passing this time😱.
As a fan of FCB for more than 15 years, of course I am happy to move to Munich. Now I live only about 1300 kilometers away from my favorite team - FC Barcelona❤️💙!
I boulder 2-3 times a week at Boulderwelt, but Fysiken Klätterlabbet Centrum will always be my favorite bouldering gym.
The adventure is a continuous and differentiable function; I'm still exploring and optimizing it, and I know which direction to go.
Instant NGP speeds up the training of the original NeRF by 1000x, while still using a neural network to implicitly store the scene. What is the magic in it?
The trainable multiresolution hash encoding permits the use of a smaller neural network without sacrificing quality, and remains general. Several techniques are used to make the encoding work better on modern GPUs.
The trainable features are arranged into $L=16$ levels of hash tables, each mapped to one resolution of a virtual 3D voxel grid. Given a 3D location $(x,y,z)$, on each level of the voxel grid we interpolate a feature vector from the feature vectors of its 8 integer corners (4 corners in 2D, as shown in the picture). All feature vectors ($F$-dimensional) of these integer corners are stored in a static data structure, i.e. a hash table of size $T$. So for each location of interest, at each of the $L$ levels, we look up the hash table 8 times and interpolate to get a feature vector of size $F$. Then we concatenate the feature vectors of all levels with an auxiliary feature vector (which can be anything!) of size $E$. Finally, we get a feature vector of size $(L\times F + E)$.
Note that this process can be done efficiently in parallel. For all pixels we try to render at a time, we load one level of the hash table into the GPU cache, do the hash, look up the feature vectors of all these pixels, then interpolate. Then we move on to the next level and do the same thing. Finally, all the interpolated features are concatenated, along with the auxiliary input, and become the input of the neural network.
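The per-level lookup described above is easy to sketch in code. Below is a toy pure-Python version for the 2D case (4 corners per level, bilinear interpolation); the level count, table size, growth factor, and hash primes here are illustrative assumptions, not the paper's exact hyperparameters:

```python
import math
import random

# A toy sketch of the multiresolution hash encoding in 2D.
L, T, F = 4, 2**10, 2                  # levels, table size, features per entry
PRIMES = (1, 2654435761)               # per-dimension primes of the spatial hash

def h(ix, iy):
    """Hash an integer grid corner into an index of the level's table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % T

def encode_level(x, y, table, res):
    """Bilinearly interpolate the F-dim features at (x, y) on one grid level."""
    gx, gy = x * res, y * res
    x0, y0 = math.floor(gx), math.floor(gy)
    wx, wy = gx - x0, gy - y0
    out = [0.0] * F
    for dx, dy, w in ((0, 0, (1 - wx) * (1 - wy)), (1, 0, wx * (1 - wy)),
                      (0, 1, (1 - wx) * wy),       (1, 1, wx * wy)):
        feat = table[h(x0 + dx, y0 + dy)]   # one hash lookup per corner
        for f in range(F):
            out[f] += w * feat[f]
    return out

random.seed(0)
tables = [[[random.uniform(-1e-4, 1e-4) for _ in range(F)] for _ in range(T)]
          for _ in range(L)]
res = [16 * 2**l for l in range(L)]        # resolution grows per level
enc = sum((encode_level(0.3, 0.7, tables[l], res[l]) for l in range(L)), [])
print(len(enc))  # L*F = 8 interpolated features, before appending the aux input
```

Concatenating the per-level outputs (plus the auxiliary input of size $E$) yields the final encoding that feeds the small MLP.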
The efficiency of a static hash table is better than dynamic structures like trees, and it is more general. When the resolution of a certain level is larger than the hash table size, there will be hash collisions, but they are automatically resolved by the multiresolution structure and the interpolation. (The chance that 2 different locations get the same final feature vector input is near zero.)
The proposed hash encoding is already highly efficient, and several techniques are tailored to improve it even more.
The hash table entries are stored in half precision, and mixed-precision training is used. That enables faster training and faster inference.
As mentioned before, the hash tables are evaluated level by level, so at any given time only some levels of the hash tables reside in caches, where they are reused over and over again.
More importantly, the use of the multiresolution hash encoding makes it possible to use a smaller neural network without sacrificing quality.
Instant NGP uses a highly optimized fully-fused MLP, which is 5-10x faster than a TensorFlow implementation (e.g. the one in the original NeRF).
By using a relatively small neural network and making good use of the GPU, instant NGP gets its neural-network part close to voxel-lookup speed.
Voxel-based methods store the scene in 3D voxels, like storing image data in 2D pixels. To know the attributes of a given position (3D coordinates), a simple look-up is enough. Methods like Plenoxels use voxels in place of a neural network to significantly speed up the pipeline. But storing a high-resolution scene needs excessive memory, and that amount of storage makes a simple look-up not so simple anymore. When training, huge amounts of voxel data need to be transferred into memory and cache repeatedly; these memory operations bound the speed (though it is still relatively fast).
Theoretically, voxel-based methods can be faster when we have more memory and cache. Neural-network-based methods trade memory footprint for compute; they can be faster if we make the computation more effective.
For a standard neural network with a fixed batch size, the compute cost is $O(M)$ and the memory cost is $O(M^2)$, where $M$ is the number of neurons per layer. For bigger neural networks it is wise to focus on optimizing computation, but for smaller ones, memory is the most important thing.
They made their neural network so small that the whole network fits into the on-chip memory of the GPU. When evaluating the network (imagine ray marching and querying thousands of values at the same time), each thread block can run the whole network independently, using the weights and biases stored in on-chip memory.
The authors are from NVIDIA; they know their hardware well and they know CUDA well, so they implemented instant NGP in CUDA and integrated it with the fully-fused MLPs of the tiny-cuda-nn framework. With a carefully tailored neural network and good use of the NVIDIA GPU, a 5-10x speedup is achieved compared with the TensorFlow version.
Overall, instant NGP takes 10-100x fewer steps than naïve dense stepping, which means 10-100x fewer queries of the neural network.
Typically, larger scenes have more empty regions, and coarser details are less noticeable. An exponential step size is used so that the computation scales gracefully with scene size.
A multi-scale occupancy grid is maintained to indicate where the space is empty. For empty space we don't have to run the neural network, hence computation is saved. (A little extra memory, but much less computation.)
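The empty-space skipping can be sketched as follows. This is a toy pure-Python setup with a hand-made occupancy pattern in a unit cube (the real implementation maintains a multi-scale bitfield on the GPU); only samples that land in occupied cells would trigger a network query:

```python
# Toy sketch: skip empty space with a binary occupancy grid during ray marching.
G = 64                                    # occupancy grid resolution (assumed)
occ = [[[ (24 <= i < 40) and (24 <= j < 40) and (24 <= k < 40)
          for k in range(G)] for j in range(G)] for i in range(G)]

def march(origin, direction, n_steps=256):
    """Return the sample points along the ray that need a network query."""
    queries = []
    for s in range(n_steps):
        t = s / (n_steps - 1)
        p = [o + t * d for o, d in zip(origin, direction)]
        cell = [min(G - 1, max(0, int(c * G))) for c in p]   # grid cell of p
        if occ[cell[0]][cell[1]][cell[2]]:   # occupied -> query the network
            queries.append(p)
    return queries

q = march((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(f"{len(q)} of 256 samples need a network query")
```

With the occupancy box covering only the center of the cube, most of the 256 dense samples are skipped for free.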
Here is a recursive function generating Fibonacci numbers.
int fib (int z) {
    int r;
    if (z == 0)
        r = 0;
    else if (z == 1)
        r = 1;
    else
        r = fib(z-1) + fib(z-2);
    return r;
}
The first two items of the Fibonacci sequence are 0 and 1, so it is very natural to write the code like this. But it is under-optimized, in a "Fibonacci way".
Now let's do the WCET (worst-case execution time) analysis, assuming each declaration, comparison, assignment, and return takes 1 time unit, each ALU operation (add/sub) takes 4, and each function call takes 2 (these unit costs appear in the annotated snippets below).
Let $f(z)$ be the WCET of fib(z), so $f(0)$ is the WCET of executing fib(0). For different values of z, the code being executed is different, with a different length and a different WCET.
Now divide the code snippet into 3 paths and analyze them separately:
// path a (z=0): 4
int fib (int z) {
    int r;          // declaration (1)
    if (z == 0)     // compare (1)
        r = 0;      // assignment (1)
    return r;       // return (1)
}
For path a, the WCET of fib(0) is 4, that is $f(0)=4$.
// path b (z=1): 5
int fib (int z) {
    int r;           // declaration (1)
    if (z == 0)      // compare (1)
        ;
    else if (z == 1) // compare (1)
        r = 1;       // assignment (1)
    return r;        // return (1)
}
For path b, $f(1)=5$, because one more comparison is executed in the if-else chain.
// path c (z>=2): 21+f(z-1)+f(z-2)
int fib (int z) {
    int r;           // declaration (1)
    if (z == 0)      // compare (1)
        ;
    else if (z == 1) // compare (1)
        ;
    else
        r = fib(z-1) + fib(z-2); // 3*4 + 2*2 + 1 + f(z-1) + f(z-2)
    return r;        // return (1)
}
The line r = fib(z-1) + fib(z-2); takes 3 ALU operations (add/sub), 2 function calls, and 1 assignment, and it contains the cost of the 2 recursive fib calls. So $f(z)=21+f(z-1)+f(z-2)$ for path c. You may also have noticed that, as in path b, one more comparison is executed than in path a.
For an if-else chain, the structure/placement matters, especially when it is called recursively:
if (cond1)
    func1(); // executes after cond1
else if (cond2)
    func1(); // executes after cond1, cond2
else if (cond3)
    func1(); // executes after cond1, cond2, cond3
else
    func1(); // executes after cond1, cond2, cond3
Let me use some real numbers. If we want to compute the WCET of fib(5), writing $a=f(0)=4$, $b=f(1)=5$, and $c=21$ for the per-call overhead of path c, we'll have:
$$
\begin{align*}
f(2)&=f(1)+f(0)+c=a+b+c \\
f(3)&=f(2)+f(1)+c=a+2b+2c \\
f(4)&=f(3)+f(2)+c=2a+3b+4c \\
f(5)&=f(4)+f(3)+c=3a+5b+7c
\end{align*}
$$
The WCET of fib(5) is composed of some path a, more path b, and much more path c. Try to look at it vertically, and you may understand why I said this is under-optimized in a Fibonacci way.
The weight of path a grows in a Fibonacci way; the weight of path b grows in the same fashion, but one step ahead (one step in Fibonacci...). And path c? It also grows in a Fibonacci fashion, but each time 1 is added to the weight (for the execution of path c itself).
For this natural version of the code, the WCET of fib(5) is $3\times 4+5\times 5+7\times 21=184$.
int fib (int z) {
    int r;
    if (z > 1)
        r = fib(z-1) + fib(z-2);
    else if (z == 1)
        r = 1;
    else
        r = 0;
    return r;
}
If we swap the positions of path c and path a, it saves one comparison for path c and adds one to path a, which makes the WCET of the new fib function $3\times 5 + 5\times 5 + 7\times 20=180$. The difference is 4 time units of comparison.
For a computation of fib(10), the difference will be 54 time units.
For a computation of fib(20), the difference will be 6764 time units...
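The numbers above are easy to check mechanically. Here is a small Python sketch of my own (with the unit costs read off the annotated snippets: 4 or 5 for the base-case paths, and 21 or 20 of per-call overhead for the recursive path) that evaluates both variants of the recurrence:

```python
from functools import lru_cache

def wcet(n, base0, base1, rec_overhead):
    """WCET of fib(n): base cases cost base0/base1; the recursive path adds
    rec_overhead on top of the WCETs of the two recursive calls."""
    @lru_cache(maxsize=None)
    def f(z):
        if z == 0:
            return base0
        if z == 1:
            return base1
        return rec_overhead + f(z - 1) + f(z - 2)
    return f(n)

# Original ordering: path a = 4, path b = 5, path c overhead = 21
print(wcet(5, 4, 5, 21))                        # 184
# Reordered (z > 1 tested first): path a = 5, path b = 5, path c overhead = 20
print(wcet(5, 5, 5, 20))                        # 180
print(wcet(10, 4, 5, 21) - wcet(10, 5, 5, 20))  # 54
print(wcet(20, 4, 5, 21) - wcet(20, 5, 5, 20))  # 6764
```

The saving grows in a Fibonacci fashion too, since the weight of path c itself does.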
GEMM stands for general matrix multiply; it is a "level 3" routine of BLAS (Basic Linear Algebra Subprograms), a specification for common linear algebra operations. GEMM is also widely used in areas like computer vision and machine learning.
The formula of GEMM is:
$C=\alpha AB+\beta C$
where $A$, $B$, and $C$ are matrices, and $\alpha$ and $\beta$ are scalar constants.
Here is a very neat GEMM implementation written in C, from the well-known neural network framework darknet.
void gemm_cpu(int TA, int TB, int M, int N, int K, float ALPHA,
        float *A, int lda,
        float *B, int ldb,
        float BETA,
        float *C, int ldc)
{
    int i, j;
    for(i = 0; i < M; ++i){
        for(j = 0; j < N; ++j){
            C[i*ldc + j] *= BETA;
        }
    }
    if(!TA && !TB)
        gemm_nn(M, N, K, ALPHA, A, lda, B, ldb, C, ldc);
    else if(TA && !TB)
        gemm_tn(M, N, K, ALPHA, A, lda, B, ldb, C, ldc);
    else if(!TA && TB)
        gemm_nt(M, N, K, ALPHA, A, lda, B, ldb, C, ldc);
    else
        gemm_tt(M, N, K, ALPHA, A, lda, B, ldb, C, ldc);
}
By default, matrices $A$ and $B$ are not transposed (TA=0 && TB=0), which means:
*A is a 1-d array which stores an $(M, K)$ matrix
*B is a 1-d array which stores a $(K, N)$ matrix
*C is a 1-d array which stores an $(M, N)$ matrix, and it will be used to store the final result
$C=\beta C$ is computed first for better efficiency, and then the $C=\alpha AB+C$ part is done.
Matrices $A$, $B$, and $C$ are all stored in row-major order, which means elements of the same row are stored consecutively in memory. (This doesn't mean all elements of the matrix are stored consecutively in memory.)
Elements of the matrix used in the gemm function are not necessarily stored consecutively in memory? A little counter-intuitive, right? To explain this, I need to introduce the leading dimension (arguments lda, ldb, and ldc).
Actually, the elements of a matrix are stored consecutively in memory, but when multiplying matrices, sometimes we want to use part of an existing matrix as the input/output, not all of it.
Suppose we have a $(6, 8)$ matrix $Q$ in our memory (row-major order), and we want to do the matrix multiply on part of it, a $(3, 4)$ matrix $q$.
Apparently, the elements of matrix $q$ are not stored consecutively in memory. Instead of copying the data first and then doing the gemm, we can do the gemm directly if we use the right parameters *A, M, K, and most importantly, lda. In this example:
TA=0 means matrix $Q$, and of course matrix $q$, is row-major, i.e. not transposed.
lda=8 means the leading dimension (the number of columns in this case) of the matrix stored in memory is $8$, which is the dimension of matrix $Q$.
K=4 means the dimension (the number of columns in this case) of the matrix used for the gemm is $4$, which is the dimension of matrix $q$.
M=3 means the number of rows is $3$ for matrix $q$.
A=Q+10 means the first element of matrix $q$ is the 11th element of matrix $Q$; the starting address and offset are given together.
These are all we need for one input/output of the gemm function. And if you're familiar with numpy, here's an example in Python:
import numpy as np
# 1-d array Q, to get the idea how it is stored in memory
Q = np.arange(6 * 8)
print(Q)
# 2-d array QQ, how we understand the matrix, with 2-d shape information
QQ = Q.reshape(6, 8)
print(QQ)
# to help you understand the C explanation above
lda = QQ.shape[1] # 8
K = 4
M = 3
offset = 10
# these are all we need to get the q, or to use it directly in gemm function
q = QQ[offset//lda: offset//lda+M, offset%lda: offset%lda+K]
# q = QQ[1:4, 2:6]
print(q)
After the easy part $C=\beta C$ is done, $\alpha AB$ is computed. The storage order of matrices $A$ and $B$ must be considered and taken care of.
Matrices can be stored in row-major or column-major order. Row-major order is used for C-style arrays; that means, by default, elements of the same row are stored consecutively. But in some cases (an example below shows a situation that benefits from it) we need to store matrices in column-major order. Storing a matrix in column-major order under a row-major convention is equivalent to storing the transpose of the original matrix in memory.
Now you may see why we need the int TA and int TB parameters in our gemm function. In our simple example, TA=0 means matrix $A$ is stored in row-major order, and TA!=0 means matrix $A$ is stored in column-major order; you could also say that the transpose of $A$, which is $A^T$, is stored in memory.
// if (TA == 0 && TB == 0)
void gemm_nn(int M, int N, int K, float ALPHA,
        float *A, int lda,
        float *B, int ldb,
        float *C, int ldc)
{
    int i, j, k;
    #pragma omp parallel for
    for(i = 0; i < M; ++i){
        for(k = 0; k < K; ++k){
            register float A_PART = ALPHA*A[i*lda+k];
            for(j = 0; j < N; ++j){
                C[i*ldc+j] += A_PART*B[k*ldb+j];
            }
        }
    }
}
When TA==0 && TB==0, the snippet above is used to compute $C=C+\alpha AB$.
// if (TA == 0 && TB != 0)
void gemm_nt(int M, int N, int K, float ALPHA,
        float *A, int lda,
        float *B, int ldb,
        float *C, int ldc)
{
    int i, j, k;
    #pragma omp parallel for
    for(i = 0; i < M; ++i){
        for(j = 0; j < N; ++j){
            register float sum = 0;
            for(k = 0; k < K; ++k){
                sum += ALPHA*A[i*lda+k]*B[j*ldb + k];
            }
            C[i*ldc+j] += sum;
        }
    }
}
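To make the flat row-major indexing with lda/ldb/ldc concrete, here is a small pure-Python port of gemm_nn (my own illustrative sketch, not darknet code), multiplying a $(2, 3)$ by a $(3, 2)$ matrix, both stored as flat row-major arrays:

```python
def gemm_nn(M, N, K, ALPHA, A, lda, B, ldb, C, ldc):
    """C += ALPHA * A @ B, with all matrices as flat row-major lists."""
    for i in range(M):
        for k in range(K):
            a_part = ALPHA * A[i * lda + k]
            for j in range(N):
                C[i * ldc + j] += a_part * B[k * ldb + j]

A = [1, 2, 3,
     4, 5, 6]          # (2, 3) matrix, lda = 3
B = [1, 0,
     0, 1,
     1, 1]             # (3, 2) matrix, ldb = 2
C = [0, 0,
     0, 0]             # (2, 2) result, ldc = 2
gemm_nn(2, 2, 3, 1.0, A, 3, B, 2, C, 2)
print(C)  # [4.0, 5.0, 10.0, 11.0]
```

To operate on a submatrix, you would pass the full matrix's column count as lda and slice the flat list at the right offset, just like the pointer arithmetic in the C version.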
cvgear 0.1.0 was released on 20 May, 2020.
CVGear means Computer Vision Gear. It is under the MIT License and contains computer vision gears for good use.
TorchNestedLoader
TorchNestedLoader allows you to save/load between different modules that share the same logical structure.
Suppose we have a SimpleNet:
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(
            in_channels=3,
            out_channels=32,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False
        )
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(
            in_channels=32,
            out_channels=32,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False
        )
        self.bn2 = nn.BatchNorm2d(32)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        return x
simplenet = SimpleNet()
The structure of SimpleNet is:
SimpleNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
And we have a more "wrapped" version WrappedSimpleNet:
import torch.nn as nn

class Conv2d(nn.Conv2d):
    def __init__(self, *args, **kwargs):
        norm = kwargs.pop("norm", None)
        super().__init__(*args, **kwargs)
        self.norm = norm

    def forward(self, x):
        x = super().forward(x)
        if self.norm is not None:
            x = self.norm(x)
        return x

class WrappedSimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = Conv2d(
            in_channels=3,
            out_channels=32,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False,
            norm=nn.BatchNorm2d(32)
        )
        self.conv1 = Conv2d(
            in_channels=32,
            out_channels=32,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False,
            norm=nn.BatchNorm2d(32)
        )

    def forward(self, x):
        x = self.stem(x)
        x = self.conv1(x)
        return x
wrappedsimplenet = WrappedSimpleNet()
The structure of WrappedSimpleNet is:
WrappedSimpleNet(
  (stem): Conv2d(
    3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv1): Conv2d(
    32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
The logical structure of SimpleNet and WrappedSimpleNet is exactly the same, but they differ in submodule names and tree structure. So you cannot easily save/load the state_dict between these two modules using the .state_dict() and .load_state_dict() methods.
But with TorchNestedLoader, you can save/load a nested_dict between these two modules easily:
from cvgear.framework.torch import TorchNestedLoader
simplenetloader = TorchNestedLoader(simplenet)
wrappedsimplenetloader = TorchNestedLoader(wrappedsimplenet)
# save as nested_dict
nested_dict = simplenetloader.nested_dict()
# load nested_dict
wrappedsimplenetloader.load_nested_dict(nested_dict)
Imagine you have just implemented a state-of-the-art model in torch and want to test it. Training the model from scratch would be time-consuming. Have you ever downloaded a pre-trained model from the Internet and then found it painful to load into your model manually?
Use TorchNestedLoader as your gear!
DarknetParser
Everyone loves darknet. It is very fast and in the public domain.
The configuration file of a darknet network is often long and tedious (due to its sequential structure) and hard to read through. With DarknetParser, the network configuration file can be parsed easily, and you get a clear sense of the network structure from the information it displays.
Without darknet installed:
from cvgear.framework.darknet import DarknetParser, build_darknet_parser
# create a DarknetParser instance, then load network configuration
darknet53 = DarknetParser("darknet53")
darknet53.load_darknet_cfg("path/to/darknet53.cfg")
# or build a DarknetParser from network configuration file directly
darknet53 = build_darknet_parser("path/to/darknet53.cfg")
print(darknet53)
Crystal clear!
DarknetNestedLoader
Save/load torch modules...
Parse darknet networks...
What about saving/loading darknet networks?
Even more: save/load between a darknet network and a torch.nn.Module! DarknetNestedLoader is made for saving/loading a darknet network (DarknetParser) as binary weights (a .weights file) or as a nested_dict.
from cvgear.framework.darknet import DarknetNestedLoader, build_darknet_nested_loader
# create a DarknetNestedLoader instance with DarknetParser, then load from binary weights file
darknet53loader = DarknetNestedLoader(darknet53)
darknet53loader.load_darknet_weights("path/to/darknet53.weights")
# or build a DarknetNestedLoader from network configuration file and binary weights file
darknet53loader = build_darknet_nested_loader("path/to/darknet53.cfg", "path/to/darknet53.weights")
# save weights to nested_dict
nested_dict = darknet53loader.nested_dict()
# load nested_dict to a torch.nn.Module with TorchNestedLoader
# ...
- DarknetParser describes a darknet network (as torch.nn.Module describes a torch module)
- DarknetNestedLoader can save/load a darknet network as a nested_dict or a binary file
- TorchNestedLoader can save/load a torch module as a nested_dict
- With DarknetNestedLoader and TorchNestedLoader, you can convert between darknet weights and torch weights easily.
That is little cvgear 0.1.0.
More gears are coming up...
Happy inauguration!🎉🎉🎉
New research starts with understanding, reproducing and verifying previous results in the literature. Detectron2 made the process easy for computer vision tasks.
This post covers the #installation, #demo and #training of detectron2 on Windows.
update:
2020/07/08
Learning detectron2 starts with installation.
REM "Create a conda environment named 'detectron2' with the latest version of Python 3.7.x"
conda create --name detectron2 python=3.7
REM "Activate the conda environment for 'detectron2'"
conda activate detectron2
Note: all required python packages will be installed in this environment (including detectron2 itself). Make sure to activate the environment with conda activate detectron2 before you do anything with detectron2. Deactivate the environment with conda deactivate to go back to your previous working environment.
The latest version of detectron2 requires pycocotools >= 2.0.1. Install it with pip install pycocotools>=2.0.1 on Linux.
But on Windows, you should first download pycocotools-2.0.1.tar.gz from PyPI.
Unzip it, then edit pycocotools-2.0.1\setup.py:
replace extra_compile_args=['-Wno-cpp', '-Wno-unused-function', '-std=c99'] with extra_compile_args={'gcc': ['/Qstd=c99']},
Back in the command prompt, install pycocotools into the site-packages of the current environment (detectron2):
cd pycocotools-2.0.1
python setup.py build_ext install
If it works, you should see the message Finished processing dependencies for pycocotools==2.0.1, and then you can delete the pycocotools directory if you like:
cd ..
RMDIR /S pycocotools-2.0.1
Check your CUDA version first:
nvcc --version
It should be ≥ 9.2 (that is 9.2, 10.0, or 10.1). Go to https://pytorch.org/get-started/locally/, select your CUDA version, and copy the command (e.g. for CUDA 10.1 it should be):
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
The official version doesn't support Windows currently. To build and use it successfully on Windows, you should edit some files: File 1, File 2, File 3, File 4, File 5, File 6.
The repository ivanpp/detectron2 contains the latest version of official detectron2 with the Windows patches mentioned above. So the easy way is to clone and build it:
git clone https://github.com/ivanpp/detectron2.git
cd detectron2
pip install -e .
Or use the official version:
git clone https://github.com/facebookresearch/detectron2.git
Then edit the files mentioned above and build it:
cd detectron2
pip install -e .
Note: it may take a while to build all the .cu and .cpp files, be patient!
Check the installation:
python -m detectron2.utils.collect_env
The result should look like:
Make sure the NVCC version of detectron2 matches the NVCC version of PyTorch. If not, you may have chosen the wrong version in Step 2.
Choose a model from the model zoo, set the input config file, and specify the corresponding MODEL.WEIGHTS for it.
python demo/demo.py ^
--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml ^
--input datasets/coco/unlabeled2017/000000000361.jpg ^
--output output.jpg ^
--opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/137260431/model_final_a54504.pkl
Note:
- Model weights with the detectron2:// prefix are resolved by Detectron2Handler; see detectron2/detectron2/checkpoint/catalog.py for details.
- The downloaded model is cached at %USERPROFILE%/.torch/fvcore_cache if the $FVCORE_CACHE environment variable is not set (on Linux, the default cache directory is ~/.torch/fvcore_cache); see fvcore/fvcore/common/file_io.py for details.
- You can also download the model file manually, then pass --opts MODEL.WEIGHTS PATH/TO/model_final_a54504.pkl.
All the config files are made for 8-GPU training. To reproduce the results on 1 GPU, there are changes to be made. For example, to reproduce the result of configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml, you can edit the corresponding .yaml file (mask_rcnn_R_50_FPN_1x.yaml or Base-RCNN-FPN.yaml) or overwrite the training parameters on the command line.
Inconvenient but once-and-for-all way:
Edit configs\Base-RCNN-FPN.yaml:
SOLVER:
  IMS_PER_BATCH: 2
  BASE_LR: 0.0025
  STEPS: (480000, 640000)
  MAX_ITER: 720000
Train the model:
python tools/train_net.py ^
--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml ^
OUTPUT_DIR output/mask_rcnn_r50_fpn_1x
Convenient way:
Simply overwrite it through command line, no need to edit any file:
python tools/train_net.py ^
--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml ^
SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 ^
SOLVER.MAX_ITER 720000 SOLVER.STEPS (480000,640000) ^
OUTPUT_DIR output/mask_rcnn_r50_fpn_1x
All the checkpoints and the final model will be stored in the OUTPUT_DIR we defined, output/mask_rcnn_r50_fpn_1x, along with the tensorboard event file, the log file, and more. A comprehensive model config file is generated automatically (output/mask_rcnn_r50_fpn_1x/config.yaml).
Training may shut down sometimes, manually or accidentally. To resume training, simply run:
python tools/train_net.py ^
--config-file output/mask_rcnn_r50_fpn_1x/config.yaml ^
--resume
The training will be resumed from the last checkpoint automatically; there is no need to specify the checkpoint unless you need to for some reason.
Use tensorboard to visualize the training progress during or after training:
tensorboard --logdir output
Detectron2 evaluates the final model after training finishes. To evaluate the performance of any checkpoint:
python tools/train_net.py ^
--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml ^
--eval-only MODEL.WEIGHTS /path/to/checkpoint_file
Conda cheat sheet
Conda documentation
Conda installs packages from the default channel if no channel is specified. Use conda config --show channels to see the current channel list.
Subject to the GFW, downloading can be very slow in mainland China. The once-and-for-all solution is breaking the wall or leaving the mainland, both hard to achieve.
You can circumvent the GFW through a proxy or use a domestic channel as an alternative. I prefer the former.
Suppose you have a local socks5 proxy listening on port 1080; simply modify the .condarc:
proxy_servers:
  http: socks5://127.0.0.1:1080
  https: socks5://127.0.0.1:1080
Two ways to edit the channel list:
Create .condarc in %UserProfile%/.conda, follow the YAML syntax, and override the channel list configuration like:
channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - defaults
Channels are organized from highest to lowest priority.
REM "Add to the top of the channel list"
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
OR
REM "Add to the bottom of the channel list"
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --append channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
Use the -c or --channel flag to add an additional channel to search when installing:
conda install numpy --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
And use the --override-channels flag to skip the channel list in .condarc:
conda install numpy --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
REM "Search packages and display detailed information"
conda search PKGNAME --info
REM "Specify version"
conda install PKGNAME==3.14
REM "Specify channel"
conda install --channel conda-forge
conda install -c conda-forge
REM "Specify environment"
conda install PKGNAME --name ENVNAME
conda install PKGNAME -n ENVNAME
REM "Install from local directory"
conda install PATH/TO/PKGNAME.tar.bz2 --offline
conda update PKGNAME --name ENVNAME
conda remove PKGNAME --name ENVNAME
Conda allows you to create environments containing different packages and even different python version that will not interact with other environments.
Conda will install package to base
environment as default if no environment is specified when installing.
Here are some convenient commands to manage conda environments:
REM "Current environment is highlighted with an asterisk(*)"
conda info --envs
Or just go to %UserProfile%/.conda/environments.txt
REM "Create environment with specific python version"
conda create --name ENVNAME python=VERSION
REM "Create environment to specific path"
conda create --prefix PATH/TO/ENVNAME
REM "Create environment from .yaml file"
conda env create --file PATH/TO/environment.yaml
REM "Create environment from .txt file"
conda create --name ENVNAME --file PATH/TO/spec-file.txt
REM "Create environment from existing environment"
conda create --name NEWNAME --clone OLDNAME
conda activate ENVNAME
conda deactivate
conda list --name ENVNAME
conda install --name ENVNAME pip
conda activate ENVNAME
pip <pip_subcommand>
REM "example: pip install PKGNAME -f LINK/LOCAL_PATH"
REM "Export to .yaml file"
conda activate ENVNAME
conda env export > PATH/TO/environment.yaml
REM "Export spec list as txt file"
conda list --name ENVNAME --explicit > PATH/TO/spec-file.txt
conda remove --name ENVNAME --all
Happy Chinese New Year🎉! It's the seventh day of the first lunar month and the end of the holiday. So I decided to do something meaningful and get rid of some bad shit.
The ringleader of China's increasingly closed Internet, WeChat, has kidnapped my family and friends and forces me to use its so-called social software (spyware, actually). And I'm not asking for network neutrality, because we're far, far away from it. That spyware does something worse: it scans my disk for private data, runs content review to decide what I can see, and filters what I say without telling me👿.
Since that imp enslaves me and I have no escape, I decided to let my Telegram enslave that imp, and then talk to my Telegram as an equal, freely, without getting my hands dirty.
Libraries to use:
It's important to know that EWS is still in alpha, so it's unstable and changes rapidly; that's why the versions installed below are pinned.
Talk to @BotFather to create your bot: send /newbot, then set its name (WeChat Slave) and its username (panda_wechat_bot).
Send /setprivacy and set the status to Disable.
Send /setjoingroups and set the status to Enable.
Optional:
Set bot's profile photo: /setuserpic
Set bot's description: /setdescription
Set bot's about text: /setabouttext
Set commands helper: /setcommands
Commands helper:
help - Show commands list.
link - Link a remote chat to a group.
chat - Generate a chat head.
info - Display information of the current Telegram chat.
update_info - Update the group name and profile picture.
unlink_all - Unlink all remote chats from a group.
extra - Access additional features from Slave Channels.
Ask @BotFather for your bot's token with /token and record it, e.g. 123456789:EXAMPLEOF5BOTTOEKN5TOACCESS5HTTPAPI.
Ask @get_id_bot for your Chat ID and record it, e.g. 716124421.
sudo apt update
sudo apt install -y python3 python3-pip python3-pil python3-setuptools python3-numpy python3-yaml python3-requests
sudo apt install -y ffmpeg libmagic-dev libwebp-dev screen
pip3 install imageio==2.4.0
pip3 install ehforwarderbot==2.0.0b13
pip3 install efb-telegram-master==2.0.0b18
pip3 install efb-wechat-slave==2.0.0a16
mkdir -p ~/.ehforwarderbot/profiles/default
vim ~/.ehforwarderbot/profiles/default/config.yaml
Set the master and slave:
master_channel: "blueset.telegram"
slave_channels:
- "blueset.wechat"
mkdir -p ~/.ehforwarderbot/profiles/default/blueset.telegram
vim ~/.ehforwarderbot/profiles/default/blueset.telegram/config.yaml
Set token to the bot token recorded before, to access the bot.
And set admins to the Chat ID recorded before, so that only you can access it.
token: "123456789:EXAMPLEof5BOTtoken5toaccess5HTTPAPI"
admins:
- 716124421
screen ehforwarderbot
Post cover image from Self-Censorship in China Continues, Extends to Mobile Apps
Note: This post is based on AlexeyAB/darknet version, the procedure of pjreddie/darknet version may differ slightly (have not tried, maybe identical).
8 steps to build your own deep learning lego module in Darknet:
Define LAYER_TYPE
Add LAYER_TYPE for your custom layer in layer.h
typedef enum {
    // ...
    CUSTOM
} LAYER_TYPE;
Define layer string
Add layer string for your custom layer in parser.c
LAYER_TYPE string_to_layer_type(char * type)
{
// ...
if (strcmp(type, "[custom]")==0) return CUSTOM;
}
Then Darknet will be able to recognize your custom layer in the cfg file:
[net]
#...
[custom]
#...
Implement your custom layer in custom_layer.c and custom_layer.h. It should contain at least these 4 functions:
layer make_custom_layer(int batch, int w, int h, .....);
void forward_custom_layer(const layer l, network_state state);
void backward_custom_layer(const layer l, network_state state);
void resize_custom_layer(layer *l, int w, int h);
(optional) If you want to train it with GPU, implement these:
#ifdef GPU
void forward_custom_layer_gpu(const layer l, network_state state);
void backward_custom_layer_gpu(const layer l, network_state state);
#endif
In parser.c, include the header of your custom layer (to use make_custom_layer()):
#include "custom_layer.h"
Implement the parse function:
layer parse_custom(list *options, size_params params)
{
int param1 = option_find_int(options, "param1", 1);
//...
layer l = make_custom_layer(params.batch, params.w, params.h, param1, ...);
l.param2 = option_find_float(options, "param2", .1);
//...
return l;
}
Add your parse function in parse_network_cfg_custom()
:
network parse_network_cfg_custom(char *filename, int batch)
{
//...
while(n){
//...
LAYER_TYPE lt = string_to_layer_type(s->type);
if(lt == CONVOLUTIONAL){
l = parse_convolutional(options, params);
}else if(lt == CUSTOM){
l = parse_custom(options, params);
}
}
//...
return net;
}
In network.c, include the header of your custom layer (to use resize_custom_layer()):
#include "custom_layer.h"
Modify int resize_network(network *net, int w, int h)
function:
int resize_network(network *net, int w, int h)
{
//...
for (i = 0; i < net->n; ++i){
layer l = net->layers[i];
if(l.type == CONVOLUTIONAL){
resize_convolutional_layer(&l, w, h);
}else if(l.type == CUSTOM){
resize_custom_layer(&l, w, h);
}
}
//...
}
[optional] If your custom layer is used to produce results (like YOLO, REGION or DETECTION):
Implement custom_num_detections() and get_custom_detections() in custom_layer.c, then modify 2 functions in network.c (to count the detections and fetch the detections):
int num_detections(network *net, float thresh)
{
int i;
int s = 0;
for (i = 0; i < net->n; ++i) {
layer l = net->layers[i];
if (l.type == CUSTOM) {
s += custom_num_detections(l, thresh);
}
//...
}
return s;
}
void fill_network_boxes(network *net, int w, int h, float thresh, float hier, int *map, int relative, detection *dets, int letter)
{
int prev_classes = -1;
int j;
for (j = 0; j < net->n; ++j) {
layer l = net->layers[j];
if (l.type == CUSTOM){
int count = get_custom_detections(...);
//...
}
//...
}
}
Add custom_layer.c and custom_layer.h to your Visual Studio solution build/darknet.sln, or add custom_layer.o to your Makefile.
Rebuild your project
Post cover image from Lego Store | Copenhagen
This blog post records some thoughts from my customization of the Rime input method 😄
Before doing this, one thing must be clear: what is customization?
Customization means tailoring things to your own needs.
Rime has a huge number of customizable components, but a component being customizable is no reason to customize it. Fundamentally, it should be driven by my own habits: customizing @ivanpp's own Rime.
This post was written in Traditional Chinese characters, because in the process of learning and using this software I came to feel the cultural significance of Traditional Chinese. Given my environment, though, it's not convenient for me to use Traditional characters frequently, so this small gesture is my way of showing respect!
For the color scheme I went with the default ps4 preset; I'm also very happy with the default font and size, so rather than customizing for customization's sake, I simply picked this scheme.
This is likewise configured in the weasel.custom.yaml file, which is why I write about it here; honestly, I don't know why the customization of this feature is placed there.
Very simple: to match my needs, I use the default English input in bash, cmd, Atom, MSVS and similar software.
Rime supports a great many input schemas; I kept only the three I need:
Just like calling up the console with ~ in Counter-Strike 1.6, I use Control_L + ~ to bring up Rime's switcher menu. I removed the default F4 binding because I find it hard to remember and it often causes hotkey conflicts.
Option 1 is pinned to the schema currently in use, and option 2 is pinned to the mode-switching menu. In fact, choosing either 1 or 2 takes you to the mode-switching menu. It makes sense: when you are already in the schema you would select, rather than performing a meaningless no-op, option 1 becomes a concrete mode-switching button. In actual use this feels very comfortable.
I rearranged the mode-switching menu itself as follows. Although full-width punctuation already looks deprecated, I never go through the menu to switch between Chinese/Western text or Chinese/Western punctuation (I just press Shift_L), so those entries are actually used less and deserve a lower position. As a result, I get a convenient Traditional/Simplified toggle, reachable by either 1 2 or 2 1, with no hotkey conflicts to worry about, while 1 1 and 2 2 become true no-ops. In fact, when you've called up the menu and realize you don't know what you wanted, you need a no-op or a cancel. From my own use, when my mind is active I quickly hit 1 1 as a no-op and keep typing, while when I'm thinking or sluggish I tend to press Esc for the same effect.
I use the following Chinese/English switching setup to handle different situations:
Control_L is set to commit_code. When I'm in Chinese mode, about to type a passage of English, and only notice after the first word, pressing Enter and then Shift_L to switch modes is tedious; Control_L immediately commits the English I've already typed to the screen and switches to English mode, saving quite some effort.
Also, when there is no pending input, Control_L can serve as a Chinese/English toggle too, but my little finger is already glued to Shift_L, so I doubt I'll use it for that much.
Shift_L is set to inline_ascii. With no pending input it toggles Chinese/English, which I use a lot. Here it has another use: amid a lot of Chinese I need to insert a phrase 'greater than 1' word long, and once again I forgot to switch to English mode, or deliberately didn't. Then all I have to do is press Shift_L after typing the first letter, finish the whole phrase, or sentence, or email address, and press Enter. More effective!
I often type email addresses and file names, so _ and @ must not commit the text immediately; it has to be possible to finish the whole address or file name.
Of course, if I remember to switch to Western mode beforehand, that works fine too. Same for emails:
Convenient~
For '朙月拼音·简化字' (Luna Pinyin · Simplified), the schema I use most, I made some customizations and extensions to make my own use more convenient.
My most-used schema is '朙月拼音·简化字' (below, 'the Simplified schema'), and what I do most is coding. Based on these two facts, I banned Chinese punctuation from the Simplified schema, thoroughly! Program errors caused by Chinese punctuation drive me crazy, and a full set of Western punctuation also works fine in daily chat. At least it feels comfortable to me.😃
I added the macOS-style paging keys [ ] while keeping all the defaults, including the Emacs-style ones. In practice I still use - + most often.
Pinyin is the schema I'm most fluent in, and I rarely type rare characters, so I don't need stroke input for reverse lookup. So I defined ~ as the custom-phrase prefix:
~f for common emoji, 😋
~m for math symbols, ±
~ar for arrows, ↑
and of course plenty more! I even buried a few easter eggs for myself 😂
The main thing was extending the English vocabulary, to satisfy my own (frequent 😄) mixed Chinese-English input.
It's stuffed with plenty of 'personal extras':
Shift + Control + 1/2/3/4/5 maps to the five menu options, where 1 means: next input schema.
For me, though, pressing Control_L + ~ and then the specific number is actually quicker (palm plus two fingers).
Use Control + Delete or Shift + Delete to delete a wrong word from the dictionary.
Files to back up: default.custom.yaml, weasel.custom.yaml, luna_pinyin_simp.custom.yaml, plus the punctuation definition file ivanpp_punc.yaml, the dictionary definition file ivanpp_dict.extended.dict.yaml, and all dictionary files it uses.
You can also back up a dictionary snapshot periodically: luna_pinyin.userdb.txt, located in the user folder.
RIME also provides a GUI for backing up and merging dictionary snapshots and for exporting/importing the text code table.
So... in a few months, I'll export the text code table and see which words I use most! 😋
The convolutional layer before a yolo layer should have filters=n*(4+1+classes), where n is the number of prior anchors used in the following yolo layer, namely sizeof(mask), and classes is the number of classes.
[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear
The shape of the input tensor is $(b, n*(4+1+classes), h, w)$. More specifically, it's the concatenation of $n$ individual $(4+1+classes, h, w)$ tensors per image. It's actually a 1-D array, but imagine it as a $(b, n, 4+1+classes, h, w)$ tensor: the innermost dimensions are $w$ and $h$ (with $w$ varying fastest), and the next dimension is $(4+1+classes)$. So for all b images and all n anchors, we have a $(4+1+classes)$ prediction vector at each location, and the stride between that prediction's elements is l.h*l.w.
static int entry_index(layer l, int batch, int location, int entry)
{
int n = location / (l.w*l.h);
int loc = location % (l.w*l.h);
return batch*l.outputs + n*l.w*l.h*(4+l.classes+1) + entry*l.w*l.h + loc;
}
location should have a value between 0 and l.n*l.h*l.w-1; it gives both the prior-anchor index n and the spatial location loc. entry should be between 0 and 4+1+classes-1; it gives the index along the third dimension (the prediction vector).
In the yolo layer, the net predicts offsets from the bounding-box prior width and height. Three options, mask, num and anchors, configure the prior anchor boxes.
In cfg file:
[net]
width=416
height=416
[yolo]
mask = 0,1,2
num=9
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
In c source file:
// https://github.com/pjreddie/darknet/blob/master/src/parser.c#L306-L342
int total = option_find_int(options, "num", 1);
int num = total;
char *a = option_find_str(options, "mask", 0); // char *a = "0,1,2";
int *mask = parse_yolo_mask(a, &num); // int *mask = {0, 1, 2};
num in the cfg file (total in the source) is the total number of prior anchors available to the entire network. mask in the cfg file gives the indices of the prior anchors used in the current yolo layer, so we can define lots of anchors and use only a few of them per yolo layer. anchors in the cfg file gives all num available anchors as $(p_w, p_h)$ pairs. The anchor sizes $(p_w, p_h)$ are actual pixel values on the network's input image, in this case $(416, 416)$. So $(10, 13)$ is a prior anchor 10 pixels wide and 13 pixels high on the resized $(416, 416)$ input image.
# Example
#yolo_layer0
[yolo]
mask = 0,1
num=3
anchors = 10,13, 16,30, 33,23
#yolo_layer1
[yolo]
mask = 1,2
num = 3
anchors = 10,13, 16,30, 33,23
#yolo_layer2
[yolo]
num=2
In the example above, yolo_layer0 uses anchors $(10, 13), (16, 30)$, yolo_layer1 uses anchors $(16, 30), (33, 23)$, and yolo_layer2 does not use prior anchors; in fact, yolo_layer2 falls back to 2 $(0.5, 0.5)$ anchors by default.
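The cfg parsing above can be sketched in Python (function names here are mine, not darknet's):

```python
# Sketch: parse the "mask" and "anchors" strings from a [yolo] section.
def layer_anchors(mask_str, anchors_str):
    mask = [int(v) for v in mask_str.split(",")]
    vals = [float(v) for v in anchors_str.split(",")]
    pairs = list(zip(vals[0::2], vals[1::2]))  # (p_w, p_h) pairs
    return [pairs[i] for i in mask]            # only the anchors used in this layer

# yolo_layer0 from the example above: mask = 0,1
print(layer_anchors("0,1", "10,13, 16,30, 33,23"))  # [(10.0, 13.0), (16.0, 30.0)]
```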
The yolo layer predicts l.n*l.h*l.w bounding boxes per image (l.n is the length of *mask, namely the number of prior anchors used in the current yolo layer). Each predicted bounding box has one objectness score, which gives its $Pr(Object)$. What we want is an objectness of 1 for all positive samples and 0 for all negative ones.
[yolo]
ignore_thresh = .5
truth_thresh = .9
Two kinds of predictions are considered positive:
Among all num prior anchors centered in the same cell as the GT Bbox (ground-truth bounding box), the anchor whose shape is most similar to the GT Bbox becomes the only anchor responsible for it. In other words, at most one best prior anchor is allocated to each GT Bbox in the current yolo layer. And if this best anchor is not used in the current yolo layer (its index is not in the layer's *mask), no anchor is allocated for that GT Bbox.
Among all l.n*l.h*l.w predictions, if the highest IoU between a prediction and all ground-truth bounding boxes is greater than truth_thresh, that prediction becomes responsible for the GT Bbox giving the highest IoU.
Additionally, the yolo layer sets truth_thresh = 1 by default. Since IoU is always less than or equal to 1, the second situation never happens. So the yolo layer penalizes at most 1 (of l.n*l.h*l.w) prediction per GT Bbox, penalizing its objectness for not being 1.
There is also an ignore_thresh for the negative (background) definition: if the highest IoU between a prediction and all GT Bboxes is less than or equal to ignore_thresh, that prediction is assigned as negative, and its objectness score is penalized for not being 0.
*output is the input *state.input of the last convolutional layer, namely the prediction tensor, and *delta is the gradient of the yolo layer. index gives the index of the first class probability $Pr(Class_0|Object)$ for a certain batch b, a certain anchor n and a certain position w, h. Remember we have b images, n anchors per position and w*h locations. class is the ground-truth class and classes gives the number of classes. stride is always l.w*l.h, and *avg_cat is for statistics, to compute the average class probability.
void delta_yolo_class(float *output, float *delta, int index, int class, int classes, int stride, float *avg_cat)
{
int n;
if (delta[index]){ // if some anchor is responsible for more than one GT
delta[index + stride*class] = 1 - output[index + stride*class];
if(avg_cat) *avg_cat += output[index + stride*class];
return;
}
for(n = 0; n < classes; ++n){ // common situation
// penalize Pr(Classi|Object) for all classes
delta[index + stride*n] = ((n == class)?1 : 0) - output[index + stride*n];
if(n == class && avg_cat) *avg_cat += output[index + stride*n];
}
}
Given the index of $Pr(Class_0|Object)$, delta_yolo_class penalizes $Pr(Class_i|Object)$ for every $Class_i$: it wants $Pr(Class_{i=gt}|Object)$ to be 1 and the others to be 0. And if some lucky anchor is responsible for more than one ground-truth box, those GT boxes may or may not contain the same class; it simply overwrites the gradient for the other ground truth's class probability and leaves the rest alone. For example, with 20 classes, if some lucky anchor is responsible for 2 different classes (say a dog and a cat) in some naughty image, it penalizes $Pr(Class_{i=dog}|Object)$ and $Pr(Class_{i=cat}|Object)$ for not being 1 and all the others for not being 0.
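The same logic as a short Python sketch (stride collapsed to 1 and names are mine; the C version works on the strided flat array):

```python
# Sketch of delta_yolo_class for one anchor's class probabilities.
def delta_class(output, delta, gt_class):
    if delta[0]:  # gradients already written: anchor responsible for another GT
        delta[gt_class] = 1 - output[gt_class]  # overwrite only the new GT class
        return delta
    for i in range(len(output)):  # common case: penalize every class probability
        delta[i] = (1 if i == gt_class else 0) - output[i]
    return delta

probs = [0.2, 0.5, 0.1]                       # toy Pr(Class_i|Object), 3 classes
delta = delta_class(probs, [0.0] * 3, gt_class=1)
print(delta)                                  # [-0.2, 0.5, -0.1]
delta = delta_class(probs, delta, gt_class=2) # same anchor, a second GT class
print(delta)                                  # [-0.2, 0.5, 0.9]
```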
int obj_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 4); // index of objectiveness
avg_anyobj += l.output[obj_index]; // sum the objectiveness for all pred box
l.delta[obj_index] = 0 - l.output[obj_index]; // common situation, low iou
if (best_iou > l.ignore_thresh) { // best_iou > ignore_thresh -> ignored, don't penalize the objectness
l.delta[obj_index] = 0;
}
if (best_iou > l.truth_thresh) { // never gonna happen when l.truth_thresh = 1
l.delta[obj_index] = 1 - l.output[obj_index];
int class_id = state.truth[best_t*(4 + 1) + b*l.truths + 4]; // get the class_id of the GT box
if (l.map) class_id = l.map[class_id];
int class_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 4 + 1);
delta_yolo_class(l.output, l.delta, class_index, class_id, l.classes, l.w*l.h, 0);
box truth = float_to_box_stride(state.truth + best_t*(4 + 1) + b*l.truths, 1);
delta_yolo_box(truth, l.output, l.biases, l.mask[n], box_index, i, j, l.w, l.h, state.net.w, state.net.h, l.delta, (2-truth.w*truth.h), l.w*l.h);
}
The network predicts 4 coordinates for each bounding box, $t_x, t_y, t_w, t_h$.
$\sigma(t_x)$ and $\sigma(t_y)$ are the box-center position relative to the cell. $t_w$ and $t_h$ predict how much greater or smaller the bounding box is than the prior anchor. For example, if $t_w > 0$, then $\mathrm{e}^{t_w} > 1$, and we will have $b_w > p_w$.
$$
\begin{align}
b_x & = \sigma(t_x) + c_x \\
b_y & = \sigma(t_y) + c_y \\
b_w & = p_w\mathrm{e}^{t_w} \\
b_h & = p_h\mathrm{e}^{t_h} \\
\end{align}
$$
$b_x$ and $b_y$ are the pixel distances from the top-left corner of the current feature map $(l.w, l.h)$. And since $p_w$ and $p_h$ are actual pixel values, $b_w$ and $b_h$ are actual pixel values on the resized network input image. To get a normalized prediction, $b_x$ and $b_y$ should be divided by the current feature-map size lw and lh; similarly, $b_w$ and $b_h$ should be divided by the resized network input size w and h.
box get_yolo_box(float *x, float *biases, int n, int index, int i, int j, int lw, int lh, int w, int h, int stride)
{
box b;
b.x = (i + x[index + 0*stride]) / lw;
b.y = (j + x[index + 1*stride]) / lh;
b.w = exp(x[index + 2*stride]) * biases[2*n] / w;
b.h = exp(x[index + 3*stride]) * biases[2*n+1] / h;
return b;
}
get_yolo_box converts the prediction $\sigma(t_x), \sigma(t_y), t_w, t_h$ to a normalized box struct instance. Inversely, to compute the gradients of the bounding-box prediction, we should convert the already-normalized ground-truth label box truth back to $\sigma(\hat{t}_x), \sigma(\hat{t}_y), \hat{t}_w, \hat{t}_h$.
float delta_yolo_box(box truth, float *x, float *biases, int n, int index, int i, int j, int lw, int lh, int w, int h, float *delta, float scale, int stride)
{
box pred = get_yolo_box(x, biases, n, index, i, j, lw, lh, w, h, stride);
float iou = box_iou(pred, truth);
float tx = (truth.x*lw - i);
float ty = (truth.y*lh - j);
float tw = log(truth.w*w / biases[2*n]);
float th = log(truth.h*h / biases[2*n + 1]);
delta[index + 0*stride] = scale * (tx - x[index + 0*stride]);
delta[index + 1*stride] = scale * (ty - x[index + 1*stride]);
delta[index + 2*stride] = scale * (tw - x[index + 2*stride]);
delta[index + 3*stride] = scale * (th - x[index + 3*stride]);
return iou;
}
So using delta_yolo_box, we can convert a normalized bounding-box label to $\sigma(\hat{t}_x), \sigma(\hat{t}_y), \hat{t}_w, \hat{t}_h$ and then subtract $\sigma(t_x), \sigma(t_y), t_w, t_h$ to get the gradients. But if we only did the subtraction, large bounding boxes would exploit their size to dominate the gradient. To compensate, we multiply the gradients by scale to magnify the gradients of relatively small GT bounding boxes. Setting scale to 2-truth.w*truth.h does this: gradients of small GT bounding boxes are magnified by a factor close to 2, while gradients of big GT bounding boxes are magnified by a factor close to 1.
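Numerically, the size-compensation factor behaves like this (Python sketch; the inputs are the normalized GT width and height):

```python
# Sketch: box-size compensation factor used in delta_yolo_box.
def box_scale(truth_w, truth_h):  # truth_w, truth_h normalized to [0, 1]
    return 2 - truth_w * truth_h

print(box_scale(0.1, 0.1))  # small box -> ~1.99, gradient magnified almost 2x
print(box_scale(0.9, 0.9))  # big box   -> ~1.19, close to 1
```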
doc: torch.Tensor — PyTorch master documentation
ref: How to create a tensor on GPU as default - PyTorch Forums
torch.Tensor is an alias for the default tensor type (torch.FloatTensor), as the documentation says.
import torch
import torch.nn as nn
import torch.nn.functional as F
Torch defines 8 CPU tensor types and 8 GPU tensor types.
The default tensor type is torch.FloatTensor, which is a CPU tensor and has a dtype of torch.float32.
print(torch.get_default_dtype()) # To get the default Tensor dtype(torch.float32)
And this (torch.FloatTensor) makes tensors be created on the CPU if no device is specified.
To make tensors be created on the GPU by default:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
After this, all tensors will be created on the selected GPU device, and still has a dtype of torch.float32
by default.
a = torch.tensor([1.])
print(a.dtype)
print(a.device)
One more example:
torch.set_default_tensor_type('torch.cuda.DoubleTensor')
This makes tensors be created on the GPU by default, with a dtype of torch.float64.
Use torch.device to get a torch.device object.
Get the CPU device
cpu = torch.device('cpu') # Current CPU device
cpu1 = torch.device('cpu:0')
They're exactly the same, since there is no multiple-CPU mode.
Get the GPU device
# Current GPU device
cuda = torch.device('cuda')
cuda = torch.device('cuda', None)
# GPU 0
cuda0 = torch.device('cuda:0')
cuda0 = torch.device('cuda', 0)
# GPU 1
cuda1 = torch.device('cuda:1')
cuda1 = torch.device('cuda', 1)
The current CPU device will always be 'cpu:0', but the current GPU device depends on the currently selected device.
So, if the currently selected device is GPU 0, cuda refers to GPU 0. But when we change the selected device to GPU 1 (if you have one...😂), cuda becomes GPU 1.
Create Tensors on device
Get the index of currently selected device:
print(torch.cuda.current_device())
Let's suppose it's 0
, now we can
# Create a tensor on CPU, given a torch.device object or a string
a = torch.tensor([1.], device=cpu)
a = torch.tensor([1.], device='cpu')
# Create a tensor on currently selected GPU, which is GPU 0 now
b = torch.tensor([1.], device=cuda)
b = torch.tensor([1.], device='cuda')
# Create a tensor on specific GPU
c = torch.tensor([1.], device=cuda1)
c = torch.tensor([1.], device='cuda:1')
With one GPU, we only care whether a tensor is on the CPU or the GPU. No need to worry about the currently selected device, since there is only 1 GPU that can be selected :joy:.
torch.Tensor.cuda() returns a copy of this torch.Tensor object in CUDA memory on the specified device, copying to the currently selected device if no device argument is given.
cuda = torch.device('cuda')
cuda0 = torch.device('cuda:0')
tensor = torch.randn(2, 2)
# To currently selected GPU device or specific device(both 'cuda:0' in this situation)
tensor = tensor.cuda()
tensor = tensor.cuda(cuda0)
Inversely, use torch.Tensor.cpu() to get a copy in CPU memory.
tensor = torch.randn(2, 2)
# CPU -> GPU
tensor = tensor.cuda()
# GPU -> CPU
tensor = tensor.cpu()
torch.Tensor.to()
performs Tensor dtype and/or device conversion. It returns a copy of the desired Tensor.
cuda0 = torch.device('cuda:0')
cpu = torch.device('cpu')
tensor = torch.randn(2, 2)
# to float64
tensor = tensor.to(torch.float64)
# to float 32, using torch.Tensor.type()
tensor = tensor.type(torch.float32)
# to GPU
tensor = tensor.to(cuda0)
# to CPU
tensor = tensor.to(cpu)
So torch.Tensor.to(device, dtype) can be considered a combination of torch.Tensor.cuda(device), torch.Tensor.cpu() and torch.Tensor.type(dtype).
Once a data tensor is allocated (to CPU or GPU), we can operate on it irrespective of the selected device, and the results are always placed on the same device as the tensor.
Furthermore, if we do operations between 2 or more tensors, they must all be on the same device: the operation takes place there, and the result is placed there too.
torch.nn.Parameter is a kind of Tensor that is to be considered a module parameter, and Parameters are Tensor subclasses. Correspondingly, the torch.nn module provides the torch.nn.Module.cuda() and torch.nn.Module.cpu() methods for easily transferring tensors (parameters) between CPU and GPU, and also the torch.nn.Module.to() method to do the transfer/cast.
class Model(nn.Module):
def __init__(self):
super(Model, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5)
self.conv2 = nn.Conv2d(20, 20, 5)
def forward(self, x):
x = F.relu(self.conv1(x))
return F.relu(self.conv2(x))
model = Model()
# list contains parameters of model.conv1(weight and bias)
param_list_conv1 = list(model.conv1.parameters())
print(param_list_conv1[0].device)
# CPU -> GPU(.cuda() method)
model.cuda()
print(param_list_conv1[0].device)
# GPU -> CPU(.cpu() method)
model.cpu()
print(param_list_conv1[0].device)
# CPU -> GPU(.to() method)
cuda0 = torch.device('cuda:0')
model.to(cuda0)
print(param_list_conv1[0].device)
After allocating data and model to the GPU, we can use the GPU to accelerate our training process.
With multiple GPUs, you should care about the currently selected device. Use the context manager torch.cuda.device() to manually control which GPU a tensor is created on, which also makes the code clearer.
cuda = torch.device('cuda')
# Create tensor a,b,c on device cuda:0
with torch.cuda.device(0):
a = torch.tensor([1., 2.], device=cuda)
b = torch.tensor([1., 2.]).cuda()
c = torch.tensor([1., 2.]).to(cuda)
# Create tensor d,e,f on device cuda:1
with torch.cuda.device(1):
d = torch.tensor([1., 2.], device=cuda)
e = torch.tensor([1., 2.]).cuda()
f = torch.tensor([1., 2.]).to(cuda)
doc: J. CUDA Environment Variables :: CUDA Toolkit Documentation
ref: CUDA Pro Tip: Control GPU Visibility with CUDA_VISIBLE_DEVICES | NVIDIA Developer Blog
ref: 2Pac – Can't C Me Lyrics | Genius Lyrics
Let's suppose (or dream) that you have 4 GPUs and want to use three of them to train your model while keeping the remaining one free to play with. Set CUDA_VISIBLE_DEVICES to restrict the devices that your CUDA application (the model-training process) sees.
There are many ways to achieve that; here are 2 of them:
Set the environment variable in your python script (not recommended)
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3'
This method is not recommended because it's not flexible. Use it only when this is your usual setup.
Set the environment variable when you run the python script (recommended)
CUDA_VISIBLE_DEVICES=1,2,3 python train.py
Use this if you just want to play around.
And after that,
The blind stares of a million pairs of eyes
Lookin' hard but won't realize
That they will never see the 'GPU0'!
ref: Optional: Data Parallelism — PyTorch Tutorials
doc: torch.nn — PyTorch master documentation
Torch will only use one GPU by default. Simply use torch.nn.DataParallel to run your model in parallel over multiple GPUs along the batch dimension.
model = nn.DataParallel(model)
ref: When to set pin_memory to true? - vision - PyTorch Forums
ref: How to Optimize Data Transfers in CUDA C/C++ | NVIDIA Developer Blog
torch.utils.data.DataLoader accepts a parameter pin_memory; if True, the tensors will be copied into CUDA pinned memory.
#https://github.com/pytorch/examples/blob/master/imagenet/main.py#L211-L223
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
num_workers=args.workers, pin_memory=True, sampler=train_sampler)
By default, GPU operations are asynchronous, which allows more computation to be executed in parallel. But copying data between CPU and GPU, or between GPUs, is synchronous by default, e.g. torch.Tensor.to(), torch.Tensor.cuda() and torch.nn.Module.to(). These functions accept a non_blocking argument (previously named async). When non_blocking is set, the copy is performed asynchronously with respect to the host if possible, e.g. when moving CPU tensors with pinned memory to CUDA devices.
#https://github.com/pytorch/examples/blob/master/imagenet/main.py#L270-L272
input = input.cuda(args.gpu, non_blocking=True)
target = target.cuda(args.gpu, non_blocking=True)
These methods provide a larger bandwidth between the host (CPU) and the device (GPU) and improve data-transfer performance.
# At the beginning of the script
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# When loading data
image, label = image.to(device), label.to(device)
# Create the model
model = Model().to(device)
# https://pytorch.org/docs/stable/notes/cuda.html#device-agnostic-code
import argparse
import torch
parser = argparse.ArgumentParser(description='PyTorch Example')
parser.add_argument('--disable-cuda', action='store_true',
help='Disable CUDA')
args = parser.parse_args()
args.device = None
if not args.disable_cuda and torch.cuda.is_available():
args.device = torch.device('cuda')
else:
args.device = torch.device('cpu')
# When loading the data
for i, x in enumerate(train_loader):
x = x.to(args.device)
# When creating the model
model = Model().to(args.device)
Actually, it's a brief conclusion. So in practice, we should:
Post cover image from Quick Guide for setting up PyTorch with Window in 2 mins
CSE455: Computer Vision - Spring 2018
I saw this course on pjreddie's GitHub page and found it interesting.👍
It is an undergraduate course offered by the School of Computer Science and Engineering at the University of Washington. I did the assignments out of personal interest.😋
My solutions to the assignments include the code to finish the homework plus the extra things to get the credits.
Use the -std=c99 flag to tell the compiler to use C99. Still, I think it's cooler to do the declarations outside the loop:
int i, j, k;
for (i = 0; i < im.c; ++i){
for (j = 0; j < im.h; ++j){
for (k = 0; k < im.w; ++k){
/*body*/
}
}
}
Use if(expression) for single-line things and if (expression){ for multiple lines. And always use if(1) or if(0) to enable/disable a code snippet.
if(!sum) return;
if (a == LOGISTIC){
d.data[i][j] *= x * (1 - x);
} else if (a == RELU){
d.data[i][j] *= x > 0 ? 1 : 0;
} else if (a == LRELU){
d.data[i][j] *= x > 0 ? 1 : 0.1;
}
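The same element-wise gradients as a Python sketch (names are mine; the homework's C version operates on the matrix struct):

```python
# Sketch: d/dx of the activations above, applied to the upstream gradient d.
def grad(activation, x, d):
    if activation == "logistic":  # note: x is the logistic *output* here
        return d * x * (1 - x)
    if activation == "relu":
        return d * (1 if x > 0 else 0)
    if activation == "lrelu":
        return d * (1 if x > 0 else 0.1)
    raise ValueError(activation)

print(grad("logistic", 0.5, 1.0))  # 0.25
print(grad("lrelu", -2.0, 1.0))    # 0.1
```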
if(0){
/*disabled body*/
} else
{
/*enabled body*/
}
So I can search for if(0) to locate the snippet and flip the switch quickly.
Prefer ++i when I have a choice.
Makefile: TODO, should write a gist for it.
Compile with OpenCV (using MinGW).
Structs with pointers inside
When we define a struct with at least one pointer in it:
typedef struct matrix{
int rows, cols;
double **data;
int shallow;
} matrix;
We should write a function to allocate and initialize memory for it, for safety and convenience:
matrix make_matrix(int rows, int cols)
{
matrix m;
m.rows = rows;
m.cols = cols;
m.shallow = 0;
m.data = calloc(m.rows, sizeof(double *));
int i;
for(i = 0; i < m.rows; ++i) m.data[i] = calloc(m.cols, sizeof(double));
return m;
}
And also a function to free the memory:
void free_matrix(matrix m)
{
if (m.data) {
int i;
if (!m.shallow) for(i = 0; i < m.rows; ++i) free(m.data[i]);
free(m.data);
}
}
Remember to call it manually to free the memory and avoid a ⚠️segmentation fault.
And also a function for deep copy (if necessary):
matrix copy_matrix(matrix m)
{
int i,j;
matrix c = make_matrix(m.rows, m.cols);
for(i = 0; i < m.rows; ++i){
for(j = 0; j < m.cols; ++j){
c.data[i][j] = m.data[i][j];
}
}
return c;
}
Never use a struct with a pointer inside it as an intermediate variable in an expression
In ./vision-hw4/src/classifier.c, I used to write things like this:
// THIS IS TOTALLY WRONG!
matrix backward_layer(layer *l, matrix delta)
{
// back propagation through the activation
gradient_matrix(l->out, l->activation, delta);
// calculate dL/dw and save it in l->dw
free_matrix(l->dw);
matrix dw = matrix_mult_matrix(transpose_matrix(l->in), delta);
l->dw = dw;
// calculate dL/dx and return it.
matrix dx = matrix_mult_matrix(delta, transpose_matrix(l->w));
return dx;
}
It is totally wrong because the intermediate struct variables transpose_matrix(l->in) and transpose_matrix(l->w) will never ever be freed. This stupid Python-like convenient writing will fill your memory up with intermediate garbage until it runs out, and finally throw a ⚠️segmentation fault.
The right way to do this is:
matrix backward_layer(layer *l, matrix delta)
{
// back propagation through the activation
gradient_matrix(l->out, l->activation, delta);
// calculate dL/dw and save it in l->dw
free_matrix(l->dw);
matrix inT = transpose_matrix(l->in);
matrix dw = matrix_mult_matrix(inT, delta);
free_matrix(inT);
l->dw = dw;
// calculate dL/dx and return it.
matrix wT = transpose_matrix(l->w);
matrix dx = matrix_mult_matrix(delta, wT);
free_matrix(wT);
return dx;
}
String things could cause fatal mistakes
After finishing my code in ./vision-hw4, I trained the model on my Windows laptop and it worked well. But when I tried to do the same thing on Linux, the training procedure just crashed, giving me 0% training and test accuracy.
After debugging, I found that I had accidentally changed the line ending of the file mnist.labels from LF to CRLF, which is the default on Windows.
This converts all \n (line break on Linux) to \r\n (line break on Windows). So num0\n becomes num0\r\n in mnist.labels, and so does the rest.
Now see the char *fgetl(FILE *fp) function in ./src/data.c. This function parses labels from the text file and stores them for the training and test phases.
char *fgetl(FILE *fp)
{
if(feof(fp)) return 0;
size_t size = 512;
char *line = malloc(size*sizeof(char));
if(!fgets(line, size, fp)){
free(line);
return 0;
}
size_t curr = strlen(line);
while((line[curr-1] != '\n') && !feof(fp)){
if(curr == size-1){
size *= 2;
line = realloc(line, size*sizeof(char));
if(!line) {
fprintf(stderr, "malloc failed %ld\n", size);
exit(0);
}
}
size_t readsize = size-curr;
if(readsize > INT_MAX) readsize = INT_MAX-1;
fgets(&line[curr], readsize, fp);
curr = strlen(line);
}
if(line[curr-1] == '\n') line[curr-1] = '\0';
return line;
}
And most importantly, this function looks for \n as the marker of a line ending. So the label num0 becomes num0\r, and so do all the other labels.
At the training phase, all the training samples will be considered negative, and so will the test samples. Surprisingly but reasonably, I got 0% for both training and test accuracy.
Remember: use LF as the default option.
More Extra Credit of vision-hw2 (spherical coordinates)
All GPU implementations have been ignored
Written by ivanpp for fun, contact me: ding@ivanpp.me
make_convolutional_layer
convolutional_layer make_convolutional_layer(int batch, int h, int w, int c, int n, int groups, int size, int stride, int padding, ACTIVATION activation, int batch_normalize, int binary, int xnor, int adam)
{
int i;
// create a convolutional_layer(layer) type variable l, initialize all struct members to 0.
convolutional_layer l = {0};
l.type = CONVOLUTIONAL;
// Get the params
l.groups = groups; // optional: weight sharing across 'groups' channels
l.h = h; // input height
l.w = w; // input width
l.c = c; // input channels
l.n = n; // num of filters
l.binary = binary; // optional: ?
l.xnor = xnor; // optional: ?
l.batch = batch; // num of image per batch
l.stride = stride; // stride of the conv operation
l.size = size; // kernel size of filters
l.pad = padding; // padding of the conv operation
l.batch_normalize = batch_normalize; // optional: bn after conv
// Allocate memory (for conv weight and conv weight_update)
l.weights = calloc(c/groups*n*size*size, sizeof(float)); // stored as (n*(c/groups)*size*size)
l.weight_updates = calloc(c/groups*n*size*size, sizeof(float));
l.biases = calloc(n, sizeof(float));
l.bias_updates = calloc(n, sizeof(float));
l.nweights = c/groups*n*size*size; // num of params for l.weights
l.nbiases = n; // num of params for l.biases
// Initialize weights to random_uniform
float scale = sqrt(2./(size*size*c/l.groups));
for(i = 0; i < l.nweights; ++i) l.weights[i] = scale*rand_normal();
// Allocate memory (for forward and backward)
int out_w = convolutional_out_width(l); // compute output width
int out_h = convolutional_out_height(l); // compute output height
l.out_h = out_h;
l.out_w = out_w;
l.out_c = n; // output channel should be num of filter, n
l.outputs = l.out_h * l.out_w * l.out_c;
l.inputs = l.w * l.h * l.c;
l.output = calloc(l.batch*l.outputs, sizeof(float)); // for conv output(forward pass)
l.delta = calloc(l.batch*l.outputs, sizeof(float)); // for prev layer's gradient(backward pass)
// Assign forward, backward and update function
l.forward = forward_convolutional_layer;
l.backward = backward_convolutional_layer;
l.update = update_convolutional_layer;
if(binary){
l.binary_weights = calloc(l.nweights, sizeof(float));
l.cweights = calloc(l.nweights, sizeof(char));
l.scales = calloc(n, sizeof(float));
}
if(xnor){
l.binary_weights = calloc(l.nweights, sizeof(float));
l.binary_input = calloc(l.inputs*l.batch, sizeof(float));
}
if(batch_normalize){
l.scales = calloc(n, sizeof(float));
l.scale_updates = calloc(n, sizeof(float));
for(i = 0; i < n; ++i){
l.scales[i] = 1;
}
l.mean = calloc(n, sizeof(float));
l.variance = calloc(n, sizeof(float));
l.mean_delta = calloc(n, sizeof(float));
l.variance_delta = calloc(n, sizeof(float));
l.rolling_mean = calloc(n, sizeof(float));
l.rolling_variance = calloc(n, sizeof(float));
l.x = calloc(l.batch*l.outputs, sizeof(float));
l.x_norm = calloc(l.batch*l.outputs, sizeof(float));
}
if(adam){
l.m = calloc(l.nweights, sizeof(float));
l.v = calloc(l.nweights, sizeof(float));
l.bias_m = calloc(n, sizeof(float));
l.scale_m = calloc(n, sizeof(float));
l.bias_v = calloc(n, sizeof(float));
l.scale_v = calloc(n, sizeof(float));
}
l.workspace_size = get_workspace_size(l);
l.activation = activation; // which activation to use
fprintf(stderr, "conv %5d %2d x%2d /%2d %4d x%4d x%4d -> %4d x%4d x%4d %5.3f BFLOPs\n", n, size, size, stride, w, h, c, l.out_w, l.out_h, l.out_c, (2.0 * l.n * l.size*l.size*l.c/l.groups * l.out_h*l.out_w)/1000000000.);
return l;
}
Describe Sth please
Optional params:
Optional params | Forward | Backward | Update | Usage | Defined in |
---|---|---|---|---|---|
l.groups | Y | Y | N | | [convolutional] |
l.binary | TODO | TODO | TODO | | [convolutional] |
l.xnor | TODO | TODO | TODO | | [convolutional] |
l.batch_normalize | Y | Y | N | Regularization | [convolutional] |
adam | | | | Optimization Algorithm | [net] |
Batch normalization and Adam will not be covered in this blog.
Image(or image like) input *net.input (given by the net) has the size of $[batch\times c\times h\times w]$, consider it as a $(batch, c, h, w)$ matrix.
Conv filter *l.weights has the size of $[n\times \frac{c}{groups}\times\ size\times size]$, consider it as a $(n, \frac{c}{groups}, size, size)$ matrix.
Conv bias *l.biases has the size of $[n]$, namely 1 float bias for 1 filter.
Conv output *l.output has the size of $[batch\times n\times out_h\times out_w]$, consider it as a $(batch, n, out_h, out_w)$ matrix.
Conv workspace *l.workspace should have the size of $[\frac{c}{groups}\times out_h\times out_w\times size\times size]$. But the actual size of *net.workspace (the workspace shared by all conv/deconv/local layers in a net) is chosen to suit the layer that needs the most workspace memory. All conv/deconv/local layers share the net's workspace, so most of them use only part of it.
Use default value for all optional params:
net.adam = 0;
l.groups = 1;
l.batch_normalize = 0;
l.binary = 0;
l.xnor = 0;
We can get the minimal implementation:
// minimal implementation of conv forward
void forward_convolutional_layer_min(convolutional_layer l, network net)
{
int i;
fill_cpu(l.outputs*l.batch, 0, l.output, 1);
int m = l.n;
int k = l.size*l.size*l.c;
int n = l.out_w*l.out_h;
for(i = 0; i < l.batch; ++i){
float *a = l.weights;
float *b = net.workspace;
float *c = l.output + i*n*m;
im2col_cpu(net.input + i*l.c*l.h*l.w, l.c, l.h, l.w, l.size, l.stride, l.pad, b);
gemm(0,0,m,n,k,1,a,k,b,n,1,c,n);
}
add_bias(l.output, l.biases, l.batch, l.n, l.out_h*l.out_w);
activate_array(l.output, l.outputs*l.batch, l.activation);
}
Now *net.input remains the same, has the size of $[batch\times c\times h\times\ w]$. For one single image in current batch, it has the size of $[c\times h\times w]$. And conv filter matrix has the shape of $(n,c,size,size)$.
Instead of using for loops to do conv operations at each input location using all the filters, we use im2col and then just do matrix multiplication.
// src/im2col.c
void im2col_cpu(float* data_im,
int channels, int height, int width,
int ksize, int stride, int pad, float* data_col)
{
int c,h,w;
int height_col = (height + 2*pad - ksize) / stride + 1; // height after reconstruct
int width_col = (width + 2*pad - ksize) / stride + 1; // width after reconstruct
int channels_col = channels * ksize * ksize; // filter(channels, ksize, ksize)
for (c = 0; c < channels_col; ++c) { // flatten the filter
int w_offset = c % ksize; // from which column
int h_offset = (c / ksize) % ksize; // from which row
int c_im = c / ksize / ksize; // from which channel
for (h = 0; h < height_col; ++h) { // iterate the reconstructed img(out_h*out_w)
for (w = 0; w < width_col; ++w) {
// mapping reconstructed img to padded img(which row)
int im_row = h_offset + h * stride;
// mapping reconstructed img to padded img(which col)
int im_col = w_offset + w * stride;
// index of the data_col(reconstructed img)
int col_index = (c * height_col + h) * width_col + w;
data_col[col_index] = im2col_get_pixel(data_im, height, width, channels,
im_row, im_col, c_im, pad); // mapping pixel by pixel
}
}
}
}
im2col_cpu() accepts 2 pointers as input. The input pointer (a.k.a. image data pointer) points to the start address of the image input, net.input + i*l.c*l.h*l.w. The output pointer (a.k.a. col data pointer) points to the start address of the workspace, b = net.workspace.
im2col_cpu() reconstructs image data $(c,h,w)$ into col data $(c\times size\times size, out_h\times out_w)$.
gemm() stands for General Matrix Multiplication. So the weight matrix $(n, c\times size\times size)$ multiplies the col data matrix $(c\times size\times size, out_h\times out_w)$, and we finally get the output matrix $(n, out_h, out_w)$ for one single image. Note that the pointer *l.output already points to the right place: float *c = l.output + i*n*m.
And for batch images, we will get the $(batch, n, out_h, out_w)$ output for *l.output.
Use add_bias() to add bias to *l.output and use activate_array() to pass through some chosen activation function. Conv forward done!
Just a review:
Each image has been divided into l.groups groups, or more specifically, grouped by channels. So each group of an image has the shape $(\frac{c}{groups},h,w)$. The size of the filters is $[n\times \frac{c}{groups}\times size\times size]$; in other words, there are $n$ filters of shape $(\frac{c}{groups}, size, size)$.
We don't use all the $n\times (\frac{c}{groups}, size, size)$ kernels to do the conv operation with all $groups\times (\frac{c}{groups},h,w)$ partial-channel images; we group the filters as well as the image (actually the image channels) first. The conv kernels have also been divided into l.groups groups, so each filter group has $\frac{n}{groups}$ filters of shape $(\frac{c}{groups},size,size)$.
The image (channel) groups and filter groups are in one-to-one correspondence. $Group_j$ filters are only responsible for $Group_j$ of the image, like a sort of conv pair.
int i, j;
int m = l.n/l.groups; // num of filters
int k = l.size*l.size*l.c/l.groups; // len of filter
int n = l.out_w*l.out_h; // len of output per output channel
for(i = 0; i < l.batch; ++i){
for(j = 0; j < l.groups; ++j){
float *a = l.weights + j*l.nweights/l.groups;
float *b = net.workspace;
float *c = l.output + (i*l.groups + j)*n*m;
// use im2col_cpu() to reconstruct input for each (input, weight) pair
im2col_cpu(net.input + (i*l.groups + j)*l.c/l.groups*l.h*l.w,
l.c/l.groups, l.h, l.w, l.size, l.stride, l.pad, b);
// conv operation(actually matrix multiplication) for one pair
gemm(0,0,m,n,k,1,a,k,b,n,1,c,n);
}
}
*a has the start address of $Group_j$ filters
*b has the start address of the workspace
*c has the start address of the output for $Group_j$ of $image_i$
net.input + (i*l.groups + j)*l.c/l.groups*l.h*l.w gives the address for $Group_j$ of $image_i$
Using im2col_cpu(), each $(\frac{c}{groups},h,w)$ partial-channel image will get the 'partial-channel col data' of shape $(\frac{c}{groups}\times size\times size, out_h\times out_w)$. Along with its $(\frac{n}{groups},\frac{c}{groups}\times size\times size)$ filter pair, do the matrix multiplication, and the outcome will be $(\frac{n}{groups},out_h,out_w)$.
Concatenating $groups \times (\frac{n}{groups},out_h,out_w)$ output, we will get $(n,out_h,out_w)$ output for one image as usual. The output shape remains the same, regardless of using this group thing.
Just a review:
E.G. No fuckin examples because it is stupid.
*l.delta has the same size as *l.output, as it will store the gradients w.r.t. the output of the current conv layer.
What forward_convolutional_layer() should compute for each input $x$ is $y=x\ast W$, and what it actually does is to compute $y=W\times x_{col}$.
So for backward_convolutional_layer(), it computes:
$\frac{\partial L}{\partial W}=\frac{\partial L}{\partial y}\cdot \frac{\partial y}{\partial W}=\frac{\partial L}{\partial y} \times {x_{col}}^T$
$\frac{\partial L}{\partial x_{col}}=\frac{\partial L}{\partial y}\cdot \frac{\partial y}{\partial x_{col}}=W^T\times \frac{\partial L}{\partial y}$
void backward_convolutional_layer(convolutional_layer l, network net)
{
int i, j;
int m = l.n;
int n = l.size*l.size*l.c;
int k = l.out_w*l.out_h;
// gradients pass through activation function
gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta);
if(l.batch_normalize){
backward_batchnorm_layer(l, net);
} else {
backward_bias(l.bias_updates, l.delta, l.batch, l.n, k);
}
for(i = 0; i < l.batch; ++i){
for(j = 0; j < l.groups; ++j){
float *a = l.delta + (i*l.groups + j)*m*k;
float *b = net.workspace;
float *c = l.weight_updates + j*l.nweights/l.groups;
float *im = net.input+(i*l.groups + j)*l.c/l.groups*l.h*l.w;
im2col_cpu(im, l.c/l.groups, l.h, l.w,
l.size, l.stride, l.pad, b);
// compute gradients w.r.t. weights
gemm(0,1,m,n,k,1,a,k,b,k,1,c,n);
if(net.delta){ // if gradient descent continues(not the first layer)
a = l.weights + j*l.nweights/l.groups;
b = l.delta + (i*l.groups + j)*m*k;
c = net.workspace;
// compute gradients w.r.t. the reconstructed inputs(x_col)
gemm(1,0,n,k,m,1,a,n,b,k,0,c,k);
// reconstruct the im_col using col2im_cpu, restore the structure,
// and get the gradients w.r.t. the inputs
col2im_cpu(net.workspace, l.c/l.groups, l.h, l.w, l.size, l.stride,
l.pad, net.delta + (i*l.groups + j)*l.c/l.groups*l.h*l.w);
}
}
}
}
$X$ has the shape of $(batch,c,h,w)$, and $X_{col}$ has the shape of $(batch, c\times size\times size, out_h, out_w)$.
For $Group_j$ in $Image_i$, $x_{col}^{ij}$ should have the shape of $(\frac{c}{groups}\times size\times size,out_h\times out_w)$.
$W$ has the shape of $(n,\frac{c}{groups},size,size)$. And what is responsible for $x_{col}^{ij}$, $w^{ij}$ has the shape of $(\frac{n}{groups},\frac{c}{groups}\times size\times size)$.
$\frac{\partial L}{\partial Y}$ has the shape of $(batch,n,out_h,out_w)$. What we need at a time is $\frac{\partial L}{\partial y^{ij}}$, has the shape of $(\frac{n}{groups},out_h\times out_w)$.
Given $\frac{\partial L}{\partial y^{ij}}$ and $x_{col}^{ij}$, call gemm(TA=0, TB=1, ...), we will get $\frac{\partial L}{\partial w^{ij}}=\frac{\partial L}{\partial y^{ij}}\times {x_{col}^{ij}}^T$, which has the shape of $(\frac{n}{groups}, \frac{c}{groups}\times size\times size)$. And will finally get the $\frac{\partial L}{\partial W}$, which will be stored in the memory block started from *l.weight_updates, of the size $(n,\frac{c}{groups},size,size)$.
Also, given $w^{ij}$ and $\frac{\partial L}{\partial y^{ij}}$, call gemm(TA=1, TB=0), and we will get $\frac{\partial L}{\partial x_{col}^{ij}}={w^{ij}}^T\times \frac{\partial L}{\partial y^{ij}}$, which has the shape of $(\frac{c}{groups}\times size\times size, out_h\times out_w)$. And we will finally get $\frac{\partial L}{\partial X_{col}}$ of the shape $(batch, c\times size\times size, out_h, out_w)$, stored starting from *net.workspace; $X_{col}$ will be overwritten.
Using col2im_cpu(), $\frac{\partial L}{\partial X_{col}}$ will be reconstructed to $\frac{\partial L}{\partial X}$, stored in the memory block that starts from *net.delta, of the size $(batch,c,h,w)$.
Just a review (again...):
void update_convolutional_layer(convolutional_layer l, update_args a)
{
float learning_rate = a.learning_rate*l.learning_rate_scale;
float momentum = a.momentum;
float decay = a.decay;
int batch = a.batch;
axpy_cpu(l.n, learning_rate/batch, l.bias_updates, 1, l.biases, 1);
scal_cpu(l.n, momentum, l.bias_updates, 1);
if(l.scales){
axpy_cpu(l.n, learning_rate/batch, l.scale_updates, 1, l.scales, 1);
scal_cpu(l.n, momentum, l.scale_updates, 1);
}
axpy_cpu(l.nweights, -decay*batch, l.weights, 1, l.weight_updates, 1);
axpy_cpu(l.nweights, learning_rate/batch, l.weight_updates, 1, l.weights, 1);
scal_cpu(l.nweights, momentum, l.weight_updates, 1);
}
*l.weight_updates has the same size as *l.weights, and *l.bias_updates has the same size as *l.biases.