<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tidbits on untidy bytes]]></title><description><![CDATA[Tidbits on untidy bytes]]></description><link>https://byteauthor.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 05:55:01 GMT</lastBuildDate><atom:link href="https://byteauthor.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Making the monolith testable]]></title><description><![CDATA[To write reproducible and isolated tests, we need two things :

Isolate global mutable state to the same scope as the test. Either the state can be scoped to the test or the test needs to isolate that global state.
Control the inputs and outputs of a...]]></description><link>https://byteauthor.com/making-the-monolith-testable</link><guid isPermaLink="true">https://byteauthor.com/making-the-monolith-testable</guid><category><![CDATA[Testing]]></category><category><![CDATA[TDD (Test-driven development)]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[System Architecture]]></category><dc:creator><![CDATA[Kevin Coulombe]]></dc:creator><pubDate>Sat, 09 Apr 2022 16:48:42 GMT</pubDate><content:encoded><![CDATA[<p>To write reproducible and isolated tests, we need two things :</p>
<ul>
<li>Isolate global mutable state to the same scope as the test: either the state is scoped to the test, or the test must isolate that global state itself.</li>
<li>Control the inputs and outputs of a piece of code. That means faking the network, database, disk, clock, etc.  </li>
</ul>
<p>We can combine an <a target="_blank" href="https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749">Hexagonal Architecture</a> with <a target="_blank" href="https://livebook.manning.com/book/dependency-injection-principles-practices-patterns/about-this-book/">Dependency Injection</a> to produce testable services for the next steps. Bringing a legacy monolith to such a testable architecture has been my main job for a few years now. It is a large endeavor, but it can be done testable piece by testable piece to incrementally reap the benefits.</p>
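<p>As a minimal sketch of that combination (the names are illustrative, not from a real codebase), the Component depends only on a small port interface and receives the implementation through its constructor, so a test can inject a fake instead of the real clock:</p>

```java
// Hypothetical Component with a Clock port injected through the constructor.
// Production wiring passes the real system clock; tests pass a fixed one.
public class GreetingService {
    // Port: the Component's only view of the outside world's time.
    public interface Clock {
        int currentHour(); // 0-23
    }

    private final Clock clock;

    public GreetingService(Clock clock) {
        this.clock = clock;
    }

    public String greet() {
        return clock.currentHour() < 12 ? "Good morning" : "Good afternoon";
    }
}
```

<p>A test can then construct <code>new GreetingService(() -> 9)</code> and assert on the result, with no global state and no real clock involved.</p>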
<p>Alternatively (or temporarily), the entire monolith can be treated as a black box service, but that requires preparing files and databases prior to the test and possibly launching the <a target="_blank" href="https://en.wikipedia.org/wiki/System_under_test">SuT</a> in isolated processes. Tests will be slow, coverage analysis will be less meaningful because of inter-dependencies, and environment variations will cause flakiness.</p>
<h2 id="heading-components">Components</h2>
<p>I will refer to each individual service in the hexagonal architecture as a Component to make it clear this is not necessarily part of a microservice ecosystem. I'll also classify the Adapters that connect Components as either Contracts or Protocols.</p>
<p>A Contract may be an actual REST/gRPC API, an object-oriented interface, a dynamic library with exported functions, etc. It is an immediate dependency from one Component to another, with the usual versioning requirements of a synchronous dependency.</p>
<p>A Protocol may be a file format, a database schema, the JSON schema of a message queue, etc. It is an asynchronous dependency with more complex backward and sometimes forward compatibility requirements. As such, the protocol is usually versioned in itself without being bound to the version of the consumer. These are the hardest things to change in a system, and designing them deserves the appropriate amount of effort.</p>
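<p>To make the Protocol idea concrete, here is a hedged sketch (the field names and the default value are invented for illustration) of a queue message that carries its own schema version, so a newer consumer stays backward compatible with older producers:</p>

```java
// Hypothetical versioned message for a queue-based Protocol. The schema
// version travels with the payload, independent of any consumer's release.
public class OrderMessage {
    public final int schemaVersion;
    public final String customerId;
    public final String currency; // added in v2; v1 producers never send it

    public OrderMessage(int schemaVersion, String customerId, String currency) {
        this.schemaVersion = schemaVersion;
        this.customerId = customerId;
        this.currency = currency;
    }

    // Backward compatibility: a consumer built for v2 still accepts v1
    // messages by substituting the documented default.
    public static String currencyOf(OrderMessage message) {
        if (message.schemaVersion >= 2) {
            return message.currency;
        }
        return "USD"; // assumed documented default for v1 producers
    }
}
```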
<h2 id="heading-test-coverage">Test Coverage</h2>
<p>Testers, product managers and developers see test coverage quite differently. The opposing views come from measuring code coverage vs. requirements coverage.</p>
<p>Code coverage is pretty easy to monitor, but that measure is like a badly trained AI : 100% confident yet no less wrong. So smart people dug into that problem and made up new, more stringent criteria like covering all <a target="_blank" href="https://en.wikipedia.org/wiki/Modified_condition/decision_coverage">branch permutations</a> or <a target="_blank" href="https://stryker-mutator.io/">mutation testing</a>. But that leads to an astronomical number of tests, and it doesn't really help in choosing which of those tests are the most valuable.</p>
<p>On the other hand, requirements coverage sounds great. Make a test for everything you want to keep working. The value of a test is the value of the feature multiplied by the risk of breaking it in the future. That works quite well, except for two things :</p>
<ul>
<li>The list of requirements for a brownfield project of moderate size is not easy to gather.</li>
<li>Without clear contracts and protocols between components, the whole of the system needs to be tested as one single big black box. It requires significant engineering investments to build the test infrastructure around a monolith.</li>
</ul>
<p>I like to use a combination of these methods : Use code coverage gaps to discover untested requirements. And in large monolithic systems, I've actually also found dead code this way a few times...</p>
<h2 id="heading-how-to-test">How to test</h2>
<p>Coverage analysis gives us a way to list and prioritize what to test, but it doesn't help with the "how".</p>
<p>A good starting point is the <a target="_blank" href="https://twitter.com/kentcdodds/status/960723172591992832">testing trophy</a>. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642688511921/ruTVZojSR.png" alt="image.png" /></p>
<ul>
<li><p>E2E tests are used to find Component availability and Contract/Protocol versioning issues. They launch the entire product and might use UI Automation to do a shallow test of a Component. If one of these tests fails, either there is an issue with the testing infrastructure or the whole Component is down. In the case of microservices, this would run in production as much as possible to monitor service availability.</p>
</li>
<li><p>Component tests are the acceptance criteria for the feature. They launch only the Component with faked dependencies, so they should run fairly quickly and represent the largest investment in test automation. When a component test fails, its name and failure details should indicate what part of the feature is broken for end users.</p>
</li>
<li><p>Unit tests serve as development accelerators. Test-Driven development using Component tests should work well most of the time, but implementing specific algorithms sometimes works better with tests of a smaller scope.</p>
</li>
<li><p>Static tests are an interesting addition to the typical vision of tests. This refers to static analysis, which is a bit outside the scope of this post but enormously important nevertheless.</p>
</li>
</ul>
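<p>To illustrate what a Component test can exercise (everything here is a simplified, hypothetical sketch), the Component is driven through its inbound Contract while its outbound dependency is replaced by an in-memory fake:</p>

```java
import java.util.Map;

// Hypothetical Component whose only outbound dependency, a repository,
// can be faked with an in-memory map in Component tests.
public class AccountComponent {
    public interface AccountStore {          // outbound port
        Integer balanceOf(String accountId);
    }

    private final AccountStore store;

    public AccountComponent(AccountStore store) {
        this.store = store;
    }

    // Inbound contract: the behavior end users observe.
    public String describe(String accountId) {
        Integer balance = store.balanceOf(accountId);
        return balance == null ? "unknown account" : "balance: " + balance;
    }

    // Fake adapter backed by a map, standing in for the real database.
    public static AccountStore fakeStore(Map<String, Integer> rows) {
        return rows::get;
    }
}
```

<p>A failing assertion on <code>describe</code> then names the user-visible behavior that broke, rather than an implementation detail.</p>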
<p>A side-quest worth mentioning is non-functional requirement testing, specifically performance. These could be Component tests like any other, but they need to run on specific hardware and the pass condition is difficult to choose. An easier alternative is Monitoring, especially in a DevOps world.</p>
<p>For software that is operated by the customer on their own hardware, some industries will allow analytics collection on their system, but not ours. We therefore maintain a staging system and a benchmark suite as representative of typical user scenarios as possible.</p>
]]></content:encoded></item><item><title><![CDATA[Go slow and break things]]></title><description><![CDATA[It takes one person to change a lightbulb. Maybe a second person to hold the ladder. Any more people assigned to the job will slow it down.
Go slow
Assigning too many people to a project is a pernicious way of slowing it down. The number of people on...]]></description><link>https://byteauthor.com/go-slow-and-break-things</link><guid isPermaLink="true">https://byteauthor.com/go-slow-and-break-things</guid><category><![CDATA[team]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[process]]></category><category><![CDATA[project management]]></category><category><![CDATA[management]]></category><dc:creator><![CDATA[Kevin Coulombe]]></dc:creator><pubDate>Sat, 22 Jan 2022 19:36:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1642880086482/oMVRv3aTY.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It takes one person to change a lightbulb. Maybe a second person to hold the ladder. Any more people assigned to the job will slow it down.</p>
<h2 id="heading-go-slow">Go slow</h2>
<p>Assigning too many people to a project is a pernicious way of slowing it down. The number of people on a project increases communication overhead, and Conway's Law tells us this directly affects complexity.</p>
<p>There are many products being developed and maintained by huge teams, but this requires huge investments in <strong>processes</strong> and <strong>architecture</strong>.</p>
<p>Processes like tests and reviews optimize the flow of communication. Anyone who would be misaligned with the vision that is encoded into processes will be stopped and realigned before they can introduce significant chaos.</p>
<p>Software architecture tries to maintain the relation between complexity and time as linear as possible. It attempts to isolate complexity behind boundaries and abstractions to benefit from information hiding. </p>
<h2 id="heading-break-things">Break things</h2>
<p>The original "Go fast and break things" motto was meant to stop that one customer still running Windows XP from tying you down to .NET 4.0 (I know of a case like that, still ongoing in 2022!). It was never meant to let the software degrade into a <a target="_blank" href="http://www.laputan.org/mud/">big ball of mud</a>.</p>
<p>But I'm subverting it for the catchy title. I mean "break things" as in the bugs that come from the increase in complexity. Even the most experienced developers with great processes and architecture will produce more bugs as complexity increases.</p>
<p>Ask developers on a project of 30 or more people where the bugs come from, and most will tell you that their services are well designed. It's the upstream team that is misusing their API, and the downstream team whose constraints on their API are preventing value delivery.</p>
<p>This is perfectly normal, especially when developing interdependent green field components in parallel. The contracts on those APIs need to be negotiated and no amount of overthinking can bypass that experimentation phase for any significantly complex problem. That negotiation builds a common vocabulary and understanding that lets people develop the systems-level thinking required to solve real world problems. I hope to write about how I believe linguistic determinism applies to software design someday...</p>
<p>For the functional aspects of the APIs, the negotiation is quite straightforward. It's the non-functional aspects that slip under the radar and into production issues. Performance/Scalability, error handling, what fields are sensitive data and should not be written in trace logs, etc.</p>
<h2 id="heading-team-sizing">Team sizing</h2>
<p>Resource allocation is affected by hiring and retention constraints from below as well as strategic initiatives and ROI vs. opportunity cost considerations from above. It is an incredibly complex problem, and every situation is unique.</p>
<p>Yet there are patterns to discern. I've seen many teams of 1 to 3 people start or take over a project, scale up gradually as they gained new customers, and maintain their velocity with great success. I've also seen a few instances of a new team of 10 or more people being formed to work on a project and standing still for over a year, hardly delivering any value.</p>
]]></content:encoded></item><item><title><![CDATA[Who's talking to my cloud?]]></title><description><![CDATA[It's the second time we get into this situation. It always begins the same. Someone says : "We'll do authentication after proving the design".
Then a few months into the project, it's time to implement authentication. It takes a few weeks to sync all...]]></description><link>https://byteauthor.com/whos-talking-to-my-cloud</link><guid isPermaLink="true">https://byteauthor.com/whos-talking-to-my-cloud</guid><category><![CDATA[Devops]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Azure]]></category><category><![CDATA[monitoring]]></category><dc:creator><![CDATA[Kevin Coulombe]]></dc:creator><pubDate>Sat, 08 Jan 2022 15:18:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1641655139495/N4TJTxc79.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It's the second time we get into this situation. It always begins the same. Someone says : "We'll do authentication after proving the design".</p>
<p>Then a few months into the project, it's time to implement authentication. It takes a few weeks to sync all services and clients.</p>
<h2 id="heading-are-we-all-good-then">Are we all good then?</h2>
<p>Many months later, the monitoring still looks like this...</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1641651402157/I6-IpcjKf.png" alt="image.png" /></p>
<p>You'd think 400-class status codes are client-side request problems, so why are we monitoring them? The answer is that this is still pre-release. Those would be pretty valuable for finding client-side errors and reporting them to the correct team if the monitoring wasn't completely flooded...</p>
<p>Those failed requests are coming from 5 or 6 clients running on tester VMs or IoT devices, and no one on the cloud services team can locate them... An intern who left an application running on some shared infrastructure and went back to school? A tester who forgot about that one-off test he did on one of his physical machines?</p>
<p>The request IP is the corporate public address or that of a data center and there is no identifiable information in the payload because privacy protection good practices justly mandate it. And without authentication, there is no way to trace it back to anyone...</p>
<p>Every now and then, one of those is found and shut down, so the number of failed requests dips a little, as we see in this graph. It's the little victories...</p>
<p>In this case, it's much worse, since the old Proof of Concept client-side application had a bug where the backoff mechanism didn't work : it just retried those errors in a tight loop...</p>
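<p>For reference, even a very small capped exponential backoff would have avoided the tight loop. This is an illustrative sketch, not the code from that client:</p>

```java
// Illustrative capped exponential backoff: the delay doubles on each
// failed attempt instead of retrying immediately in a tight loop.
public class Backoff {
    public static long delayMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        if (attempt >= 62) {
            return capMillis; // shifting further would overflow a long
        }
        long delay = baseMillis << attempt; // baseMillis * 2^attempt
        return Math.min(delay, capMillis);
    }
}
```

<p>With a 100 ms base and a 30 s cap, retries quickly settle at one attempt per 30 seconds instead of flooding the monitoring.</p>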
<h2 id="heading-lessons-learned">Lessons learned</h2>
<p>When building an API, before even the first client connects...</p>
<ul>
<li>Make sure you can identify your clients even if authentication isn't a requirement.</li>
<li>Add the client application version in the request header. It will make it so much easier to retire deprecated functionalities.</li>
</ul>
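<p>A minimal sketch of those two lessons (the header names are my own convention, not a standard): tag every outgoing request with a client identifier and a client version before it is sent.</p>

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: every request carries a client id and a client version header
// so the server side can trace misbehaving callers. Header names are
// illustrative; use whatever convention your API agrees on.
public class ClientHeaders {
    public static final String CLIENT_ID_HEADER = "X-Client-Id";
    public static final String CLIENT_VERSION_HEADER = "X-Client-Version";

    public static HttpURLConnection open(String url, String clientId, String version) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) new URL(url).openConnection();
            // Set before connect(): identifies which deployment sent the
            // request and which client build produced it.
            connection.setRequestProperty(CLIENT_ID_HEADER, clientId);
            connection.setRequestProperty(CLIENT_VERSION_HEADER, version);
            return connection;
        } catch (IOException e) {
            throw new IllegalStateException("could not open " + url, e);
        }
    }
}
```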
<p>In any case, next time someone says "We'll do authentication after proving the design", I'll have a war story to link them to!</p>
]]></content:encoded></item><item><title><![CDATA[Changing development machine on  Windows]]></title><description><![CDATA[Anything I work on is either natively in the cloud like design documents and emails or it has a common distributed backup strategy with a few hundreds of replicas. I mean git repositories... 
Everything else is transient on this machine, from unpubli...]]></description><link>https://byteauthor.com/changing-development-machine-on-windows</link><guid isPermaLink="true">https://byteauthor.com/changing-development-machine-on-windows</guid><category><![CDATA[Developer Tools]]></category><category><![CDATA[devtools]]></category><category><![CDATA[automation]]></category><category><![CDATA[Windows]]></category><category><![CDATA[visual studio]]></category><dc:creator><![CDATA[Kevin Coulombe]]></dc:creator><pubDate>Sat, 18 Dec 2021 05:47:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1639806861477/WlDI4M2jR.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Anything I work on is either natively in the cloud like design documents and emails or it has a common distributed backup strategy with a few hundreds of replicas. I mean git repositories... </p>
<p>Everything else is transient on this machine, from unpublished commits to investigation files for an on-site issue.</p>
<p>So what makes changing machine complicated is the <strong>machine configuration</strong> and <strong>installed applications</strong>. Those need to be backed up somehow and I'll humbly share how I do it. Your mileage may vary.</p>
<h1 id="heading-changing-machines-the-painful-way">Changing machines the painful way</h1>
<p>I am always surprised to see programmers who will manually install everything they need whenever they get a new machine. I've seen many people postpone machine upgrades for months because of the overhead of transferring to the new machine.</p>
<p>And then when they do switch machines, <strong>there is a problem</strong>. It usually sounds like "one component is missing in my Visual Studio installation and an F# library won't build on my machine". </p>
<p>With sufficiently large software, these issues amount to a list that ends up in a knowledge base, and <a target="_blank" href="https://en.wikipedia.org/wiki/Tribal_knowledge">tribal knowledge</a> gets built around the solutions. Countless hours are spent investigating these issues multiple times within each silo in the organization. I've even seen a case where the generally accepted solution was to install the company's software, then uninstall it so the dependencies would be present for the local build to succeed!</p>
<p>Simpler microservices and especially <a target="_blank" href="https://docs.docker.com/desktop/dev-environments/">containerized development environments</a> will change the game by adding the dependencies of the software under development as part of a source controlled container definition. Yet there will remain a certain number of applications which need to be installed, even if it's just git and some way to run the container. I think solving the problem of actually configuring a development machine in a repeatable manner will remain worthwhile for the foreseeable future.</p>
<h2 id="heading-yearly-is-a-pain-monthly-is-a-breeze">Yearly is a pain, monthly is a breeze</h2>
<p>The problem actually is quite similar to that very common backup strategy where we focus entirely on the backup procedure instead of the <a target="_blank" href="https://www.joelonsoftware.com/2009/12/14/lets-stop-talking-about-backups/">restore procedure</a>. We're sure we documented everything, but the problems emerge when we try to consume that documentation.</p>
<p>If we all reimaged our development machines every few months, any sane developer would have automated their machine setup. Next thing you'd know, teams would have put the common bits of that automation together in a shared starter package.</p>
<p>And as with automated tests, infrastructure as code and other routine automation, a big part of the value of automating the task doesn't come from the time saved not doing it manually, but from sharing the knowledge in the most unambiguous way possible : Code.</p>
<h1 id="heading-the-teams-setup-script">The team's setup script</h1>
<p>Part of our machine setup is maintained by the team in a shared, version-controlled location.</p>
<p>We use Chocolatey to install a few common packages, then install Visual Studio. The script also clones the mono-repo and builds the monolith application, but I'll leave that out. If you're using a more modern containerized multi-repo architecture, the bits of script below are only dependent on the .NET technology stack and git workflow.</p>
<h2 id="heading-chocolatey-and-packages">Chocolatey and packages</h2>
<p>I'll leave the actual installation of Chocolatey to the official documentation since they have changed it over the last year and it will keep up to date longer : https://docs.chocolatey.org/en-us/choco/setup</p>
<p>And what follows is a subset of our agreed upon choice of tools.</p>
<pre><code class="lang-Powershell">choco install git.install --params "/NoAutoCrlf /SChannel /NoShellIntegration /GitOnlyOnPath"
choco install linqpad -y
choco install sysinternals -y
choco install everything -y
choco install sql-server-express -y
choco install sql-server-management-studio -y
</code></pre>
<h2 id="heading-visual-studio">Visual Studio</h2>
<p>This next script installs Visual Studio 2022. </p>
<pre><code class="lang-Powershell"># Visual Studio 2022
(New-Object System.Net.WebClient).DownloadFile("https://aka.ms/vs/17/release/vs_enterprise.exe", (Get-Item .).FullName + "/vs_enterprise.exe")
Start-Process "vs_enterprise.exe" -ArgumentList "--update --passive --wait" -Wait # Updates the Installer
Start-Process "vs_enterprise.exe" -ArgumentList "--passive --wait" -Wait # Installs Visual Studio 2022 if not present
Start-Process "vs_enterprise.exe" -ArgumentList "update --passive --wait" -Wait # Updates Visual Studio 2022
Remove-Item "vs_enterprise.exe"
</code></pre>
<p>And then apply the common <a target="_blank" href="https://devblogs.microsoft.com/setup/configure-visual-studio-across-your-organization-with-vsconfig/">.vsconfig file</a>. In our case, the file is stored in a git repository which is cloned by the script.</p>
<p>If Visual Studio was not initially installed, it usually requires a reboot to proceed at this point, but the script is idempotent and can be executed again after the reboot to continue with the setup.</p>
<pre><code class="lang-Powershell">$VSInstallPath = &amp; "${Env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -version 17.0 -property installationPath
Start-Process "${Env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vs_installer.exe" -ArgumentList "modify --config ""PathToVsConfig\.vsconfig"" --installPath ""$VSInstallPath""" -Wait
</code></pre>
<p>Visual Studio 2022 may have turned 64-bit and moved to "C:/Program Files", but Microsoft <a target="_blank" href="https://github.com/Microsoft/vswhere/wiki/Installing">guaranteed to maintain the path</a> for vswhere.exe and vs_installer.exe back in Visual Studio 2017, so we can safely hardcode the "Program Files (x86)" path for the foreseeable future.</p>
<h1 id="heading-personal-customization">Personal customization</h1>
<p>Now this is where the $PROFILE file comes in for those parts of the setup that are custom to me.</p>
<p>I maintain an idempotent "setup-machine" Powershell function with the following.</p>
<h2 id="heading-rider">Rider</h2>
<p>First off, I am a big fan of JetBrains' <a target="_blank" href="https://www.jetbrains.com/rider/">Rider</a>. I use it for all of my main C# development and only fall back to Visual Studio for C++ and memory dump investigations.</p>
<p>Rider has a Chocolatey package that is regularly updated, but it doesn't add Rider to the PATH. Its installation path changes with every minor version, so maintaining any sort of alias is painful, and their direct download link changes every minor version as well.</p>
<p>I currently install Rider using Chocolatey and maintain an alias to it manually. I would be grateful for any better solution!</p>
<pre><code class="lang-Powershell">choco install jetbrains-rider -y
</code></pre>
<h2 id="heading-git-and-pull-requests">Git and pull requests</h2>
<p>I use the following script to create Pull Requests for Azure DevOps directly from the command line.</p>
<pre><code class="lang-Powershell">function gitlogin
{
    echo SECRET-PRIVATE-ACCESS-TOKEN | az devops login --organization https://AzureDevopsServerUrl.com/DefaultCollection
}

choco install azure-cli -y
refreshenv

az extension add --name azure-devops
</code></pre>
<p>Replace AzureDevOpsServerUrl with the URL of your DevOps Server and SECRET-PRIVATE-ACCESS-TOKEN with your own <a target="_blank" href="https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate">PAT</a>.</p>
<p>Using a PAT is only necessary when using an on-premise <a target="_blank" href="https://azure.microsoft.com/en-us/services/devops/server/">Azure DevOps Server</a>. It will also show messages about not being supported, but if you're running a relatively recent version of Azure DevOps Server, it should work fine for the simple use cases.</p>
<p>This line sets up an alias for <code>az repos pr</code> as <code>git pr</code> ( <a target="_blank" href="https://docs.microsoft.com/en-us/cli/azure/repos/pr">documentation</a> ). </p>
<pre><code class="lang-Powershell">az devops configure --use-git-alias true
</code></pre>
<p>The following also makes git push a little easier to use by automatically creating the missing remote branch when pushing a new local branch.</p>
<pre><code class="lang-Powershell">git config --global push.default current
</code></pre>
<p>Combining these lets the normal git flow look like this : </p>
<pre><code class="lang-Powershell">gitlogin
git checkout -b user/kcoulombe/MyNewBranch
git add .
git commit -m "A very informative commit message"
git push
git pr create --open
</code></pre>
<p>The last command will open a browser tab with the Pull Request, ready for adjusting the description.</p>
<h2 id="heading-more-apps">More apps</h2>
<p>This is my personal preference of tools.</p>
<pre><code class="lang-Powershell">Install-Module -Name posh-git
choco install powershell-core -y
choco install microsoft-windows-terminal -y
choco install paint.net -y
choco install powertoys -y
</code></pre>
<p>In order, these are...</p>
<ol>
<li><a target="_blank" href="https://github.com/dahlbyk/posh-git">posh-git</a>  : Adds current branch information within the Powershell command prompt.</li>
<li><a target="_blank" href="https://github.com/PowerShell/PowerShell">Powershell Core</a>  : The next version of Powershell</li>
<li><a target="_blank" href="https://github.com/microsoft/terminal">Windows Terminal</a>  : Making command line on Windows easier</li>
<li><a target="_blank" href="https://www.getpaint.net/">Paint.Net</a> : Mostly just for editing screenshots</li>
<li><a target="_blank" href="https://docs.microsoft.com/en-us/windows/powertoys/">Windows PowerToys</a>  : Power User tools for Windows</li>
</ol>
<p>I have excluded a few other more niche applications like <a target="_blank" href="https://mh-nexus.de/en/hxd/">HxD</a> ,  <a target="_blank" href="https://github.com/microsoft/perfview">PerfView</a> and quite a few internal tools.</p>
]]></content:encoded></item><item><title><![CDATA[Sudoku solver]]></title><description><![CDATA[This is an old post from August 9th, 2010
I recently had to build an optimized Sudoku puzzle solver for a homework which turned into a contest between students. I was part of a team of three and we competed again six similar teams… And we won! Here, ...]]></description><link>https://byteauthor.com/sudoku-solver</link><guid isPermaLink="true">https://byteauthor.com/sudoku-solver</guid><category><![CDATA[Java]]></category><category><![CDATA[Games]]></category><category><![CDATA[algorithms]]></category><dc:creator><![CDATA[Kevin Coulombe]]></dc:creator><pubDate>Sat, 18 Dec 2021 02:30:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1639752798415/dDu_JO7ge.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is an old post from August 9th, 2010</p>
<p>I recently had to build an optimized Sudoku puzzle solver for a homework assignment which turned into a contest between students. I was part of a team of three and we competed against six similar teams… And we won! Here, I will discuss the algorithm we used.</p>
<h1 id="heading-the-problem-context">The problem context</h1>
<p>First, the Sudoku grid is provided as a simple text file like this example.</p>
<pre><code><span class="hljs-number">030605000</span>
<span class="hljs-number">600090002</span>
<span class="hljs-number">070100006</span>
<span class="hljs-number">090000000</span>
<span class="hljs-number">810050069</span>
<span class="hljs-number">000000080</span>
<span class="hljs-number">400003020</span>
<span class="hljs-number">900020005</span>
<span class="hljs-number">000908030</span>
</code></pre><p>The position of each digit is the same as on the Sudoku grid, and a zero represents an unknown value (which the algorithm must figure out). The file always ends with an extra line break after the last line and contains no unnecessary white-space. This is important; otherwise the program returns an error message. This example maps to the following Sudoku grid.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1639494057518/hDBH9raaq.png" alt="sudoku1.png" /></p>
<p>The time spent to read and parse the problem file is not counted towards the algorithm’s total resolution time.</p>
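<p>Parsing that format is straightforward; here is a minimal sketch (not our contest code) that turns the nine-line text into a 9×9 matrix and rejects malformed input:</p>

```java
// Minimal parser for the puzzle format above: nine lines of nine digits,
// with '0' marking an unknown square.
public class GridParser {
    public static int[][] parse(String text) {
        String[] lines = text.trim().split("\\r?\\n");
        if (lines.length != 9) {
            throw new IllegalArgumentException("expected 9 rows, got " + lines.length);
        }
        int[][] board = new int[9][9];
        for (int y = 0; y < 9; y++) {
            if (lines[y].length() != 9) {
                throw new IllegalArgumentException("row " + y + " is not 9 digits");
            }
            for (int x = 0; x < 9; x++) {
                char c = lines[y].charAt(x);
                if (c < '0' || c > '9') {
                    throw new IllegalArgumentException("unexpected character: " + c);
                }
                board[y][x] = c - '0';
            }
        }
        return board;
    }
}
```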
<p>The algorithm must be implemented in Java and was to be run on a Quad core 3.2 GHz with 4 GB RAM, but we did not take advantage of multithreading because of the thread creation overhead. For example, on my machine, resolving the example Sudoku takes anywhere between 1 and 3 milliseconds while creating a thread and switching to it takes at least 0.5 millisecond. Here is the code used to test the overhead.</p>
<pre><code class="lang-java"><span class="hljs-keyword">final</span> <span class="hljs-keyword">long</span> time = System.nanoTime();

Thread thread = <span class="hljs-keyword">new</span> Thread() {
    <span class="hljs-meta">@Override</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">run</span><span class="hljs-params">()</span> </span>{
        System.out.println(<span class="hljs-string">"end: "</span> + (System.nanoTime() - time));
    }
};

thread.start(); <span class="hljs-comment">// start(), not run(): run() would execute on the calling thread</span>

<span class="hljs-keyword">try</span> {
    thread.join();
}
<span class="hljs-keyword">catch</span> (InterruptedException ex) {}
</code></pre>
<p>Another aspect of the problem context is very important, as it allowed us to ignore an edge case in which our algorithm does not perform well. The assumption is that the Sudoku grid is always set to have only one solution. This makes a common <a target="_blank" href="https://en.wikipedia.org/wiki/Backtracking">backtracking</a>-only algorithm take a lot more time than our solution. Such an algorithm would be better suited to calculating one possible solution for an empty or almost empty board. On the other hand, our algorithm, which makes a lot of rules-based decisions, is unable to efficiently apply rules in such an open-ended scenario and thus spends considerably more time on an empty board than a naive backtracking algorithm does.</p>
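<p>For contrast, here is what a naive backtracking solver looks like (a generic sketch, not the algorithm we submitted): it fills the first empty square with each legal digit in turn and recurses, undoing the move on failure.</p>

```java
// A naive backtracking solver, for contrast with the rules-based
// approach: try each legal digit in the first empty square and recurse.
public class Backtracker {
    public static boolean solve(int[][] board) {
        for (int y = 0; y < 9; y++) {
            for (int x = 0; x < 9; x++) {
                if (board[y][x] != 0) {
                    continue;
                }
                for (int value = 1; value <= 9; value++) {
                    if (isLegal(board, x, y, value)) {
                        board[y][x] = value;
                        if (solve(board)) {
                            return true;
                        }
                        board[y][x] = 0; // undo and try the next digit
                    }
                }
                return false; // no digit fits this square: backtrack
            }
        }
        return true; // no empty square left: solved
    }

    private static boolean isLegal(int[][] board, int x, int y, int value) {
        for (int i = 0; i < 9; i++) {
            if (board[y][i] == value || board[i][x] == value) {
                return false;
            }
        }
        int sx = (x / 3) * 3, sy = (y / 3) * 3; // top-left of the 3x3 section
        for (int dy = 0; dy < 3; dy++) {
            for (int dx = 0; dx < 3; dx++) {
                if (board[sy + dy][sx + dx] == value) {
                    return false;
                }
            }
        }
        return true;
    }
}
```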
<p>Explaining the <a target="_blank" href="https://www.sudoku.name/rules/en">basic rules of Sudoku</a> is quite outside the scope of this post, but here I will define a few terms for the sake of clarity.</p>
<p>“Sections” are the 3×3 sub-grids (often separated by darker lines on Sudoku boards)
“Numbers”, “digits” or “values” are the numerals from 1 through 9 that are placed on the board. On our board, zero means an unknown value.
“Square” or “position” represents the location of a digit determined by its x and y coordinates.</p>
<h1 id="heading-representing-the-board">Representing the board</h1>
<p>We chose to represent the board as a 9×9 integer matrix. As we discovered later, this is not the most efficient representation. Java treats a two-dimensional array as an array of arrays, which means accessing a cell in a matrix costs twice the time of accessing an element in a flat array, once to reach the inner array and once to reach the element. Unlike C++ and C#, Java does not support <a target="_blank" href="https://en.wikipedia.org/wiki/Array_data_structure#Multidimensional_arrays">real multidimensional arrays</a>. In light of this, we should have used a single-dimension array.</p>
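<p>In case it helps, here is what the single-dimension layout amounts to (a sketch, assuming the usual row-major convention):</p>

```java
// Sketch of the flat-array alternative: one dereference for board[y * 9 + x]
// instead of two for board[y][x].
public class FlatBoard {
    public static int index(int x, int y) {
        return y * 9 + x; // row-major layout for a 9x9 grid
    }

    public static int get(int[] board, int x, int y) {
        return board[index(x, y)];
    }

    public static void set(int[] board, int x, int y, int value) {
        board[index(x, y)] = value;
    }
}
```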
<p>In any case, we used a 9×9 integer matrix to represent the board as laid out in the file, so zero means the digit at a position is unknown. Accompanying the board is the “allowedValues” matrix: a 9×9 matrix of integer bit fields holding all the possible values for each position. For each of these bit fields, we can determine whether a given “value” is legal with this calculation:</p>
<pre><code class="lang-Java"><span class="hljs-keyword">boolean</span> isValueLegal = ((allowedValues[x][y] &amp; (<span class="hljs-number">1</span> &lt;&lt; (value - <span class="hljs-number">1</span>))) != <span class="hljs-number">0</span>);
</code></pre>
<p>In practice, we determined that it was more efficient to cache the possible masks in a static array called “allowedBitFields”. I would have thought accessing the value in the array would take longer than these mathematical operations, but profiling proved me wrong (likely because the JIT compiler optimizes behind the scenes, since the cache array is final).</p>
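<p>The cache could look like the following sketch. The original post does not show the contents of “allowedBitFields”, so the indexing scheme (index = value − 1) and the class wrapper are assumptions for illustration.</p>

```java
// Hypothetical reconstruction of the "allowedBitFields" cache; the original
// post does not show its contents, so the (value - 1) indexing is an assumption.
final class BitFieldCache {
    private static final int[] allowedBitFields = new int[9];

    static {
        // Precompute the single-bit mask for each value from 1 to 9.
        for (int value = 1; value <= 9; value++) {
            allowedBitFields[value - 1] = 1 << (value - 1);
        }
    }

    // Same check as the shift expression above, reading the mask from the cache.
    static boolean isValueLegal(final int allowedValues, final int value) {
        return (allowedValues & allowedBitFields[value - 1]) != 0;
    }
}
```

<p>For example, with <code>allowedValues = 0b000010110</code> (values 2, 3 and 5 allowed), <code>isValueLegal(0b000010110, 3)</code> is true and <code>isValueLegal(0b000010110, 4)</code> is false.</p>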
<h1 id="heading-code-structure">Code structure</h1>
<p>For the sake of efficiency, we created a single SudokuSolver class with only private static final methods and fields, so there is no state for the program to manage (except in “main”, obviously). We also made heavy use of final method arguments and variables whenever possible.</p>
<p>Some loops even declare a final variable to hold the current iterated value when it was used often enough for the performance gain to be measurable. Consider this method, which applies the correct value to every square that has only one possible value left according to the “allowedValues” matrix.</p>
<pre><code class="lang-java"><span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">int</span> <span class="hljs-title">moveNothingElseAllowed</span><span class="hljs-params">(<span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span>[][] board,
        <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span>[][] allowedValues)</span> </span>{

    <span class="hljs-keyword">int</span> moveCount = <span class="hljs-number">0</span>;

    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> x = <span class="hljs-number">0</span>; x &lt; <span class="hljs-number">9</span>; x++) {
        <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span>[] allowedValuesRow = allowedValues[x];

        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> y = <span class="hljs-number">0</span>; y &lt; <span class="hljs-number">9</span>; y++) {
            <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span> currentAllowedValues = allowedValuesRow[y];

            <span class="hljs-keyword">if</span> (countSetBits(currentAllowedValues) == <span class="hljs-number">1</span>) {
                setValue(board, allowedValues, getLastSetBitIndex(
                        currentAllowedValues), x, y);

                moveCount++;
            }
        }
    }

    <span class="hljs-keyword">return</span> moveCount;
}
</code></pre>
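<p>The two helpers used above are not shown in this excerpt. Minimal versions built on the JDK's bit-manipulation intrinsics could look like this; the author's originals may differ, so treat these as assumptions.</p>

```java
// Hypothetical implementations of the helpers used by moveNothingElseAllowed;
// the original post does not show them, so these JDK-based versions are assumptions.
final class BitHelpers {
    // Number of set bits in a bit field, i.e. how many candidates remain.
    static int countSetBits(final int bitField) {
        return Integer.bitCount(bitField);
    }

    // 1-based index of the highest set bit; with exactly one bit set, this
    // recovers the value that the bit stands for.
    static int getLastSetBitIndex(final int bitField) {
        return 32 - Integer.numberOfLeadingZeros(bitField);
    }
}
```

<p>With the bit-field convention used above, a field of <code>1 &lt;&lt; 4</code> has one set bit and maps back to the value 5.</p>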
<h1 id="heading-the-algorithm">The algorithm</h1>
<p>The whole magic starts with the “solveBoard” method. It has two types of tools at its disposal to solve a Sudoku puzzle: rules and hypotheses (or brute force). A hypothesis is the equivalent of the common backtracking solution: we set a square to one of its allowed values and then try to solve the modified board. If we cannot, that was not a valid move, so we try another value instead. If the program tries all the possible values for a square and none succeeds, it declares the puzzle impossible to crack.</p>
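<p>For reference, the plain “hypothesis only” backtracking baseline can be sketched as follows. This is an illustrative sketch, not the author's code: it rescans the board for legality instead of consulting the “allowedValues” bit fields.</p>

```java
// Illustrative "hypothesis only" backtracking baseline (not the author's code):
// pick the first empty square, try each legal value, recurse, undo on failure.
final class BacktrackingSketch {
    static boolean solve(final int[][] board) {
        for (int x = 0; x < 9; x++) {
            for (int y = 0; y < 9; y++) {
                if (board[x][y] != 0) {
                    continue;
                }
                for (int value = 1; value <= 9; value++) {
                    if (isLegal(board, x, y, value)) {
                        board[x][y] = value;   // set a hypothesis
                        if (solve(board)) {
                            return true;       // it led to a solution
                        }
                        board[x][y] = 0;       // invalid move: undo it
                    }
                }
                return false; // no value fits this square: backtrack
            }
        }
        return true; // no empty square left: the board is solved
    }

    static boolean isLegal(final int[][] board, final int x, final int y,
            final int value) {
        for (int i = 0; i < 9; i++) {
            if (board[x][i] == value || board[i][y] == value) {
                return false; // already present in the same row or column
            }
        }
        final int sectionX = (x / 3) * 3;
        final int sectionY = (y / 3) * 3;
        for (int i = sectionX; i < sectionX + 3; i++) {
            for (int j = sectionY; j < sectionY + 3; j++) {
                if (board[i][j] == value) {
                    return false; // already present in the same section
                }
            }
        }
        return true;
    }
}
```

<p>Note that on an empty board, this naive approach finishes quickly, which is exactly the scenario where our rule-based solver falls behind.</p>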
<p>Using hypotheses only is very inefficient. We already know from the way people solve Sudoku puzzles that many moves can be logically deduced from the position. Thus, we implemented four rules that were shown to reduce the time taken to solve most puzzles. These are as follows:</p>
<ol>
<li>No other value is allowed according to the allowed values matrix.</li>
<li>A certain value is allowed in no other square in the same section.</li>
<li>A certain value is allowed in no other square in the same row or column.</li>
<li>A certain value is allowed only on one column or row inside a section, thus we can eliminate this value from that row or column in the other sections.</li>
</ol>
<p>Rules 2 and 3 are very similar and might benefit from being merged into a single rule. However, they are not the same as rule 1, which only checks the “allowedValues” matrix to see if a position allows a single value and sets it; that check is very fast. Rules 2 and 3 instead go over all positions on the board and, for every legal value at each position, check whether any other position in the same row, column or section allows that value. If no other position in the row (for example) allows a certain value, then we can be certain that the position we are testing holds that value.</p>
<p>This diagram illustrates rule 2. The small numbers in each square represent the legal values for each position.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1639753226215/a7CYQKPBT.jpeg" alt="Example of rule #2" /></p>
<p>In the top left square, there were multiple possible values, but since “2” was allowed in no other square of the section, we can safely say that “2” belongs to the top left square. The same logic applies to rows and columns and is implemented as rule 3.</p>
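<p>The row variant of this check (rule 3) could be sketched as follows, using the same bit-field convention as the “allowedValues” matrix. The method name and signature are illustrative assumptions, not the author's actual API.</p>

```java
// Hypothetical sketch of the rule 3 row check: (x, y) must take "value"
// if no other square in row x allows it. Names are illustrative.
final class RuleThreeSketch {
    static boolean onlySquareInRowAllowing(final int[][] allowedValues,
            final int x, final int y, final int value) {
        final int mask = 1 << (value - 1);
        if ((allowedValues[x][y] & mask) == 0) {
            return false; // the value must at least be legal here
        }
        for (int otherY = 0; otherY < 9; otherY++) {
            if (otherY != y && (allowedValues[x][otherY] & mask) != 0) {
                return false; // another square in the row also allows it
            }
        }
        return true; // nowhere else in the row: the value belongs at (x, y)
    }
}
```

<p>The column and section variants only differ in which neighboring positions they scan.</p>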
<p>Rule 4 is the most complex of our rules and provides the smallest benefit. Still, in most Sudoku puzzles it improves performance, even if by a smaller margin than the other rules. This diagram shows an application of the rule.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1639793650640/DqbylulPS.jpeg" alt="Example for rule #4" /></p>
<p>As we can see, in the top left section, the value “1” is allowed only in the second column. This means “1” is forced to be in that column, even if we do not know what row it is in. Using this property, we can remove “1” from the possible values in the same column in other sections. This rule does not let us set values directly. It simply supports the first three rules.</p>
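<p>The elimination step of rule 4 for a column could be sketched like this. The method name, signature and class wrapper are assumptions; only the bit-field convention comes from the post.</p>

```java
// Hypothetical sketch of rule 4 for columns: once "value" is known to be
// confined to column y inside the section whose rows start at sectionX,
// clear its bit from that column in every other section. Names are assumptions.
final class RuleFourSketch {
    static void eliminateFromColumnOutsideSection(final int[][] allowedValues,
            final int sectionX, final int y, final int value) {
        final int clearMask = ~(1 << (value - 1));
        for (int x = 0; x < 9; x++) {
            if (x < sectionX || x >= sectionX + 3) { // skip the section itself
                allowedValues[x][y] &= clearMask;
            }
        }
    }
}
```

<p>Because it only prunes the “allowedValues” matrix, this rule pays off indirectly by letting the first three rules fire more often.</p>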
<p>The program runs these four rules in a loop. We observed that whenever an iteration places fewer than four values, it is more efficient to attempt a hypothesis than to continue applying the rules. After making a hypothesis, the program calls the “solveBoard” method recursively and runs the rules on the “test” board.</p>
<h1 id="heading-the-legal-stuff-and-acknowledgements">The legal stuff and acknowledgements</h1>
<p>As with everything provided on this site, you are free to do whatever you want with it. You don’t even have to mention me (although I’ll appreciate it). However, I’ll ask for one thing in return… if it doesn’t work, don’t sue me! Really, I provide no implied warranty or guarantee of fitness for any purpose whatsoever. If you have any issue using it, post a comment and I’ll help you out because I’m a nice guy (!), but unless you’re willing to pay me for it, I won’t work for you… probably (I’m a really nice guy).</p>
<p>Also, thanks to my teammates on this project: Mathieu Lacasse-Potvin and Michael Badeau.</p>
<p>The program was part of an assignment for the class LOG320 at  <a target="_blank" href="https://www.etsmtl.ca/">École de Technologie Supérieure</a>  in Montréal during summer 2010.</p>
<h1 id="heading-the-code">The code</h1>
<p>The original version of the code is now available on GitHub: <a target="_blank" href="https://github.com/stonkie/SudokuSolverV1">SudokuSolverV1</a></p>
<p>An updated version of this algorithm is upcoming...</p>
]]></content:encoded></item><item><title><![CDATA[Blogging circa 2010]]></title><description><![CDATA[Twelve years ago, I figured I'd build a technical blog and write about projects and ideas. I already did some development consulting and I wanted to leave an anchor. Here I am. I'm on the web. I exist. Hello world!
The main blogging platform even bac...]]></description><link>https://byteauthor.com/blogging-circa-2010</link><guid isPermaLink="true">https://byteauthor.com/blogging-circa-2010</guid><category><![CDATA[Blogging]]></category><category><![CDATA[WordPress]]></category><dc:creator><![CDATA[Kevin Coulombe]]></dc:creator><pubDate>Tue, 14 Dec 2021 14:54:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1639492552956/Uv-ieG88N.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Twelve years ago, I figured I'd build a technical blog and write about projects and ideas. I already did some development consulting and I wanted to leave an anchor. Here I am. I'm on the web. I exist. Hello world!</p>
<p>The main blogging platform even back then was WordPress, and I never had much interest in front-end development, so the barest minimum was good enough for me.</p>
<h1 id="heading-the-not-so-good-old-days">The not so good old days</h1>
<p>Back in 2010, bare minimum still meant stitching DNS with a WebHost with a WordPress installation with a Plugin selection... SSL/TLS certificates meant spending as much as the hosting cost and having to manually renew certificates using an obscure command line every two years, so... I didn't need no encryption! </p>
<p>Also, Virtual Private Servers (basically Virtual Machines) were expensive, so the cheapest solution was actually Shared Hosting: hundreds of user accounts on a Linux box, separated only by permissions and eventually cgroups to guard against bad neighbor problems.</p>
<p>So I wrote a few blog posts, some of which kept getting traffic even 10 years later, but...</p>
<h1 id="heading-what-killed-it">What killed it</h1>
<p>Then I went from consultant to full-time dev with a teaching side gig and a now two-year-old daughter. Life happened. Blogging happened no longer. And all along, there was this very annoying WordPress installation and its cocktail of plugins that I had to go and update regularly.</p>
<p>So when my web host WebFaction got bought by GoDaddy and told me they couldn't migrate my account, I started looking around. But then, that WordPress backup format kind of locks me in to WordPress, doesn't it? </p>
<h1 id="heading-what-now">What now</h1>
<p>These days, there are plenty of SaaS offerings for blogging platforms. But if I'm going to convert those old blog posts to a new format, I'd rather it be the last time. So I want a standard format.</p>
<p>My perfect blogging platform would be a SaaS offering that dynamically loads LaTeX files from a GitHub repository. However, I figured building my own blogging engine just to recover those old blog posts was turning into a pretty long <a target="_blank" href="https://en.wiktionary.org/wiki/yak_shaving">yak shaving</a> chain...</p>
<p>So I settled on <a target="_blank" href="https://hashnode.com/">HashNode</a>. It has great reviews, uses Markdown, and free is a price you can't beat.</p>
<p>I'll be reposting a bunch of my old blog posts here in the coming weeks.</p>
]]></content:encoded></item></channel></rss>