Tuesday, December 22, 2015

Nerd Food: Dogen: The Package Management Saga

We've just gone past Dogen's Sprint 75, so I guess it's time for one of those "reminiscing posts" - something along the lines of what we did for Sprint 50. This one is a bit more practical though; if you are only interested in the practical side, keep scrolling until you see "Conan".

So, package management. Like any other part-time C++ developer whose professional mainstay is C# and Java, I have keenly felt the need for a package manager when in C++-land. The problem is less visible when you are working with mature libraries and dealing with just Linux, due to the huge size of the package repositories and the great tooling built around them. However, things get messier when you start to go cross-platform, and messier still when you are coding on the bleeding edge of C++: either the package you need is not available in the distro's repos or even PPAs; or, when it is, it's rarely at the version you require.

Alas, for all our sins, that's exactly where we were when Dogen got started.

A Spoonful of Dogen History

Dogen sprang to life just a tad after C++0x became C++11, so we experienced first-hand the highs of a quasi-new language followed by the lows of feeling the brunt of the bleeding-edge pain. For starters, nothing we ever wanted was available out of the box on any of the platforms we were interested in. Even Debian testing was a bit behind - probably stalled due to some compiler transition or other, but I can't quite recall the details. In those days, Real Programmers were Real Programmers and mice were mice: we had to build and install the C++ compilers ourselves and, even then, C++11 support was new, a bit flaky and limited. We then had to use those compilers to compile all of the dependencies in C++11 mode.

The PFH Days

Doing this manually once or twice was fun; after that, it quickly stopped being so. And so we solved the problem by creating the PFH - the Private Filesystem Hierarchy - a gloriously over-ambitious name for a set of wrapper scripts that helped with the process of downloading tarballs, unpacking, building and finally installing them into well-defined locations. It worked well enough within the confines of its remit, but we were often outside those confines, having to apply out-of-tree patches, add new dependencies and so on. We also didn't use Travis in those days - I'm not even sure it existed, but if it did, the rigmarole of the bleeding-edge experience would certainly have put a stop to any ideas of using it. So we used a local install of CDash with a number of build agents on OSX, Windows (MinGW) and Linux (32-bit and 64-bit). Things worked beautifully when nothing changed and the setup was stable; but every time a new version of a library - or, god forbid, of a compiler - was released, one had that sense of dread: do I really need to upgrade?
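To give a flavour of the workflow, the PFH scripts boiled down to something like the following - a purely hypothetical sketch, with made-up package names and URLs; the real scripts handled many more details:

$ export PFH_PREFIX=${HOME}/pfh
$ wget http://downloads.example.org/libfoo-1.0.tar.gz
$ tar xzf libfoo-1.0.tar.gz
$ cd libfoo-1.0
$ ./configure --prefix=${PFH_PREFIX} CXX=${PFH_PREFIX}/bin/g++ CXXFLAGS="-std=c++11"
$ make && make install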

Since one of the main objectives of Dogen was to learn about C++11, one has to say the pain was worth it. But all of the moving parts described above were far from ideal, and certainly not the sort of thing you want to waste your precious time on when it is already very scarce. Nor were they scalable.

The Good Days and the Bad Days

Things improved slightly for a year or two when distros started to ship C++11-compliant compilers and recent Boost versions. It was all so good we were able to move over to Travis and ditch almost all of our private infrastructure. For a while things looked really good. However, due to Travis' Ubuntu LTS policy, we were stuck with a rapidly ageing Boost version. At first PPAs were a good solution for this, but soon these became stale too. We also needed the latest CMake, as there is a lot of development on that front, but we certainly could not afford (time-wise) to revert to the bad old days of the PFH. At the same time, it made no sense to freeze dependencies in time and provide a worse development experience. So the only route left was to break Travis and hope that some solution would appear. Some alternatives were tried, such as Drone.io, but none was successful.

There was nothing else for it; what was needed was a package manager to manage the development dependencies.

Nuget Hopes Dashed

Having used Nuget in anger for both C# and C++ projects, and given Microsoft's recent change of heart with regard to open source, I was secretly hoping that Nuget would get some traction in the wider C++ world. To recap, Nuget worked well enough in Mono; in addition, C++ support for Windows was added early on. It was somewhat limited and a bit quirky at the start, but it kept on getting better, to the point of usability. Trouble was, its focus was just Visual Studio.

Alas, nothing much ever came of my Nuget hopes. However, there have been a couple of recent announcements from Microsoft that make me think they will eventually look into this space:

Surely the logical consequence is to be able to manage packages in a consistent way across platforms? We can but hope.

Biicode Comes to the Rescue?

Nuget did not pan out, but what did happen was even more unlikely: some crazy-cool Spaniards decided to create a stand-alone package manager. Being from the same peninsula, I felt compelled to use their wares, and was joyful as they went from strength to strength - including the success of their open source campaign. And I loved the fact that it integrated really well with CMake, and that CLion provided Biicode integration very early on.

However, my biggest problem with Biicode was that it was just too complicated. I don't mean to say the creators of the product didn't have very good reasons for their technical choices - lord knows creating a product is hard enough, so I have nothing but praise for anyone who tries. However, for me personally, I never had the time to understand why Biicode needed its own version of CMake, nor did I want to modify my CMake files too much in order to fit properly with Biicode, and so on. Basically, I needed a solution that worked well and required minimal changes on my end. Having been brought up with Maven and Nuget, I just could not understand why there wasn't a simple "packages.xml" file that specified the dependencies, plus some non-intrusive CMake support to expose those into the CMake files. As you can see from some of my posts, it just seemed that you had to "get" Biicode in order to make use of it, and that was not an option for me.

Another thing that annoyed me was the difficulty of knowing what the "real" version of a library was. I wrote, at the time:

One slightly confusing thing about the process of adding dependencies is that there may be more than one page for a given dependency and it is not clear which one is the "best" one. For RapidJson there are three options, presumably from three different Biicode users:

  • fenix: authored 2015-Apr-28, v1.0.1.
  • hithwen: authored 2014-Jul-30.
  • denis: authored 2014-Oct-09.

The "fenix" option appeared to be the most up-to-date so I went with that one. However, this illustrates a deeper issue: how do you know you can trust a package? In the ideal setup, the project owners would add Biicode support and that would then be the one true version. However, like any other project, Biicode faces the initial adoption conundrum: people are not going to be willing to spend time adding support for Biicode if there aren't a lot of users of Biicode out there already, but without a large library of dependencies there is nothing to draw users in. In this light, one can understand that it makes sense for Biicode to allow anyone to add new packages as a way to bootstrap their user base; but sooner or later they will face the same issues as all distributions face.

A few features would be helpful in the meantime:

  • popularity/number of downloads
  • user ratings

These metrics would help in deciding which package to depend on.

For all these reasons, I never found the time to get Biicode set up, and these stories lingered in Dogen's backlog. And the build continued to be red.

Sadly, Biicode the company didn't make it either. I feel very sad for the guys behind it, because their hearts were in the right place.

Which brings us right up to date.

Enter Conan

When I was a kid, we were all big fans of Conan. No, not the barbarian - the Japanese anime Future Boy Conan. For me the name Conan will always bring back great memories of this show, which we watched in the original Japanese with Portuguese subtitles. So I was secretly pleased when I found conan.io, a new package management system for C++. The guy behind it seems to be one of the original Biicode developers, so a lot of the lessons from Biicode were learned.

To cut a long story short, the great news is that I managed to add Conan support to Dogen in roughly 3 hours, with very minimal knowledge of Conan. This to me was a litmus test of sorts, because I have very little interest in package management - creating my own product has proven to be challenging enough, so the last thing I need is to divert my energy further. The other interesting thing is that roughly half of that time was taken up by trying to get Travis to behave, so it's not quite fair to impute it to Conan.

Setting Up Dogen for Conan

So, what changes did I make to get it all working? It was a very simple three-step process. First, I installed Conan using a Debian package from their site.
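For reference, the install is just a download and a dpkg install; the package URL below is the same one used later in the Travis setup, and may well have moved on to a newer version by the time you read this:

$ wget https://s3-eu-west-1.amazonaws.com/conanio-production/downloads/conan-ubuntu-64_0_5_0.deb -O conan.deb
$ sudo dpkg -i conan.deb
$ rm conan.deb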

I then created a conanfile.txt in my top-level directory:

[requires]
Boost/1.60.0@lasote/stable

[generators]
cmake

Finally I modified my top-level CMakeLists.txt:

# conan support
if(EXISTS "${CMAKE_BINARY_DIR}/conanbuildinfo.cmake")
    message(STATUS "Setting up Conan support.")
    include("${CMAKE_BINARY_DIR}/conanbuildinfo.cmake")
    CONAN_BASIC_SETUP()
else()
    message(STATUS "Conan build file not found, skipping include")
endif()

This means that it is entirely possible to build Dogen without Conan, but if it is present, it will be used. With these two changes, all that was left to do was to build:

$ cd dogen/build/output
$ mkdir gcc-5-conan
$ cd gcc-5-conan
$ conan install ../../..
$ cmake ../../..
$ make -j5 run_all_specs

Et voilà, I had a brand spanking new build of Dogen using Conan. Well, actually, not quite. I've omitted a couple of problems that are a bit of a distraction from the Conan success story. Let's look at them now.

Problems and Their Solutions

The first problem was that the Boost 1.59 package does not appear to provide an overridden FindBoost, which meant I was not able to link. I moved to Boost 1.60 - which I wanted to do anyway - and it worked out of the box.

The second problem was that Conan seems to get confused by Ninja, my build system of choice. For whatever reason, when I use the Ninja generator, it fails like so:

$ cmake ../../../ -G Ninja
$ ninja -j5
ninja: error: '~/.conan/data/Boost/1.60.0/lasote/stable/package/ebdc9c0c0164b54c29125127c75297f6607946c5/lib/libboost_system.so', needed by 'stage/bin/dogen_utility_spec', missing and no known rule to make it

This is very strange, because Boost.System is clearly available in the Conan download folder. Using Make solved this problem. I am going to open a ticket on the Conan GitHub project to investigate this.

The third problem is more Boost-related than anything else. Boost Graph has not been as well maintained as it should be, really. Thus users now find themselves carrying patches, all because no one seems to be able to apply them upstream. Dogen is in this situation, as we've hit the issue described here: Compile error with boost.graph 1.56.0 and g++ 4.6.4. Sadly this is still present in Boost 1.60; the patch exists in Trac but remains unapplied (#10382). This is a tad worrying, as we make a lot of use of Boost Graph and intend to increase that usage in the future.
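For reference, the patch can be applied directly to the Boost package in the local Conan cache - a sketch, assuming a single package hash in the cache and GIT_REPO pointing at the Dogen checkout; the same steps appear in the Travis setup below:

$ hash=`ls ~/.conan/data/Boost/1.60.0/lasote/stable/package/`
$ cd ~/.conan/data/Boost/1.60.0/lasote/stable/package/${hash}/include/
$ patch -p0 < ${GIT_REPO}/patches/boost_1_59_graph.patch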

At any rate, as you can see, none of the problems were showstoppers, nor can they all be attributed to Conan.

Getting Travis to Behave

Once I got Dogen building locally, I then went on a mission to convince Travis to use it. It was painful, but mainly because of the lag between making a commit and hitting an error. The core of the changes to my YML file was as follows:

install:
<snip>
  # conan
  - wget https://s3-eu-west-1.amazonaws.com/conanio-production/downloads/conan-ubuntu-64_0_5_0.deb -O conan.deb
  - sudo dpkg -i conan.deb
  - rm conan.deb
<snip>
script:
  - export GIT_REPO="`pwd`"
  - cd ${GIT_REPO}/build
  - mkdir output
  - cd output
  - conan install ${GIT_REPO}
  - hash=`ls ~/.conan/data/Boost/1.60.0/lasote/stable/package/`
  - cd ~/.conan/data/Boost/1.60.0/lasote/stable/package/${hash}/include/
  - sudo patch -p0 < ${GIT_REPO}/patches/boost_1_59_graph.patch
  - cmake ${GIT_REPO} -DWITH_MINIMAL_PACKAGING=on
  - make -j2 run_all_specs
<snip>

I probably should have a bash script by now, given the size of the YML, but hey - if it works. The changes above deal with the installation of the package, applying the Boost patch and using Make instead of Ninja. Quite trivial in the end, even though it required a lot of iterations to get there.
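If I ever do extract that script, a minimal sketch might look something like the following - the file name is hypothetical and the commands are simply lifted from the YML above:

#!/bin/bash
# setup_conan.sh - hypothetical helper wrapping the Travis install steps.
set -e

export GIT_REPO="`pwd`"

# install Conan from the Debian package
wget https://s3-eu-west-1.amazonaws.com/conanio-production/downloads/conan-ubuntu-64_0_5_0.deb -O conan.deb
sudo dpkg -i conan.deb
rm conan.deb

# fetch the dependencies into the build directory
mkdir -p ${GIT_REPO}/build/output
cd ${GIT_REPO}/build/output
conan install ${GIT_REPO}

# apply the Boost Graph patch to the Conan Boost package
hash=`ls ~/.conan/data/Boost/1.60.0/lasote/stable/package/`
cd ~/.conan/data/Boost/1.60.0/lasote/stable/package/${hash}/include/
sudo patch -p0 < ${GIT_REPO}/patches/boost_1_59_graph.patch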

Conclusions

Having a red build is a very distressing event for a developer, so you can imagine how painful it has been to have red builds for several months. So it was with unmitigated pleasure that I got to see build #628 in a shiny emerald green. As far as that goes, it has been a complete success.

In a broader sense though, what can we say about Conan? There are many positives to take home, even at this early stage of Dogen usage:

  • it is a lot less intrusive than Biicode and easier to set up. Biicode was very well documented, but it was easy to stray from the beaten track, and that then required reading a lot of different wiki pages. It seems easier to stay on the beaten track with Conan.
  • as with Biicode, it seems to provide solutions for Debug/Release configurations and for multiple platforms and compilers. We shall be testing it on Windows soon and reporting back.
  • hopefully, since it has been open source from the beginning, it will form a community of developers around the source with the know-how required to maintain it. It would also be great to see a business form around it, since someone will have to pay the cloud bill.

In terms of negatives:

  • I still believe the most scalable approach would have been to extend Nuget for the C++ Linux use case, since Microsoft is willing to take patches and since they foot the bill for the public repo. However, I can understand why one would prefer to have total control over the solution rather than depend on the whims of some middle manager in order to commit.
  • it seems publishing packages requires getting down into Python. I haven't tried it yet, but I'm hoping it will be made as easy as importing packages with a simple text file. The more complexity the tool adds around these flows, the less likely they are to be used.
  • there are still no "official builds" from projects. As explained above, this is a chicken-and-egg problem, because people are only willing to dedicate time to it once there are enough users complaining. Having said that, since Conan is easy to set up, one hopes to see some adoption in the near future.
  • even when using a GitHub profile, one still has to define a Conan-specific password. This was not required with Biicode. A minor pain, but still, if they want to increase traction, this is probably an unnecessary stumbling block. It was sufficient to make me think twice about setting up a login, for one.

In truth, these are all very minor negative points, but they are still worth making. All in all, I am quite pleased with Conan thus far.

Created: 2015-12-22 Tue 14:00


Monday, December 21, 2015

Nerd Food: Interesting…

Time to flush all those tabs again. Some interesting stuff I bumped into recently-ish.

Finance, Economics, Politics

Startups et al.

General Coding

Databases

  • What's new in PostgreSQL 9.5: The RCs are starting, and 9.5 looks to continue the trend of amazing Postgres releases. My only remaining wish is for native (and full) support for bitemporality, really, though to be fair Temporal Tables are probably enough for my needs.

C++

  • Optimizing software in C++: One to bookmark now but to digest later. A whole load of stuff on optimisation.
  • Support for Android CMake projects in Visual Studio: So, as if the latest patches to Clang hadn't been enough, MS now decides to add support for CMake in Visual Studio. A bit embryonic, and a bit too Android-focused, but surely it should be extensible to more regular C++ use. What's going on at MS? This is all far too cool to be true.
  • Quickly Loading Things From Disk: Interesting analysis of the state of serialisation in C++. I'll probably require a few passes to fully digest it.
  • Beyond ad-hoc automation: leveraging structured platforms: I've been consuming this presentation slowly but steadily. It deals with a lot of the questions we all have about the new world of containers and microservices, and it seems vital to learn from experience before you find yourself in a much bigger mess than the monolith could ever get you into. Bridget Kromhout talks intelligently about the subject.

Layman Science

Other

Created: 2015-12-21 Mon 23:31


Friday, December 11, 2015

Nerd Food: Pull Request Driven Development

Having been in this game for the best part of twenty years, I must confess that it's not often I find something that revolutionises my coding ways. I do tend to try a lot of things, but most of them end up revealing themselves as fads or are incompatible with my flow. For instance, I never managed to get BDD to work for me, try as I might. I will keep trying because it sounds really useful, but it hasn't clicked just yet.

Having said all of that, these moments of enlightenment do occasionally happen, and when they do, nothing beats that life-changing feeling. "Pull Request Driven Development" (or PRDD) is my latest find. I'll start by confessing that "PRDD" as a name was totally made up for this post, and hopefully you can see it's rather tongue-in-cheek. However, the benefits of this approach are very real. In fact, I've been using PRDD for a while now, but I just never really noticed its presence creeping in. Today, as I introduced a new developer to the process, I finally had the eureka moment and saw just how brilliant it has been thus far. It also made me realise that some people are not aware of this great tool in the developer's arsenal.

But first things first. In order to explain what I mean by PRDD, I need to provide a bit of context. Everyone is migrating to git these days, even those of us locked behind corporate walls; in our particular case, the migration path implied exposure to Git Stash. For those not in the know, picture it as an expensive and somewhat less featureful version of GitHub, but with most of the core functionality there. Of course, I'm sure GitHub is not that cheap for enterprises either, but hey, at least it's the tool everyone uses. Anyway - grumbling or not - we moved to Stash and all development started to revolve around Pull Requests (PRs), raised for each new feature.

Not long after PRs were introduced, a particularly interesting habit started to appear: developers began opening the PRs earlier and earlier during the feature cycle rather than waiting until the very end. Taking this approach to the limit, the idea is that when you start to work on a new feature, you raise the ticket and the PR before you write any code at all. In practice - due to Stash's anachronisms - you need to push at least one commit, but the general notion is valid. This was never mandated anywhere, and there was no particular coordination. I guess one possible explanation for this behaviour is that one wants to get rid of the paperwork as quickly as possible to get to the coding. At any rate, the causes may be obscure but the emerging behaviour was not.
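To make that concrete, the start of a feature under this approach looks something like the following - a minimal sketch, with a made-up branch name, and with the empty commit standing in for whatever first commit you push to get the PR going:

$ git checkout -b feature/my-new-feature
$ git commit --allow-empty -m "Start work on my new feature"
$ git push -u origin feature/my-new-feature

With the branch pushed, you raise the PR in Stash straight away and let it accumulate commits and comments as the work progresses.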

When you combine early PRs with the commit-early-and-commit-often approach - which you should be using anyway - the PR starts to become a living document; people see your development work as it progresses and they start commenting on it, and possibly even sending you patches as you go along. In a way, this is an enabler for a very efficient kind of peer programming - particularly if you have a tightly knit team - because it gives you maximum parallelism, but in a very subtle, unobtrusive way. The main author of the PR is coding as she would normally be, but whenever there is a lull in development - those moments where you'd be browsing the web for five minutes or so - you can quickly check for any comments on your PR and react to those. Similarly, other developers can carry on doing their own work and browse the PRs in their downtime; this allows them to provide feedback whenever it is convenient to them, and to choose the format of the feedback - lengthy or quick, as time permits.

Quick feedback is often invaluable in large code bases, because everyone tends to know their own little corner of the code and only very few old hands know how it all hangs together. Thus, seemingly trivial one-liners such as "have you considered using API xyz instead of rolling your own" or "don't forget to do abc when you do that" can save you many hours of pain and enable knowledge to be transferred organically - something that no number of wiki pages could hope to achieve in a million years, because it's very difficult to find these pearls in a sea of uncurated content. And because you committed early and often, each commit is very small and very easy to parse in a small interval of time, so people are much more willing to review - as opposed to that several-KB (or even MB!) patch that you will have to allocate a day or two for. Further: if you take your commit messages seriously - as, again, you should - you will find that the number of reviewers grows rapidly, simply because developers are nosy and opinionated.

Note that this review process involves no vague meetings and no lengthy and unfocused email chains; it is very high-quality because it is (or can be) focused on specific lines of code; it causes no unwanted disruptions because you review where and when you choose to review; reviewers can provide examples and even fix things themselves if they so choose; it is totally inclusive because anyone who wants to participate can, but no one is forced to; and it equalises local and remote developers because they all have access to the same data (modulo some IRL conversations that always take place) - an important feature in this world of near-shoring, off-shoring and home-working. Most importantly, instead of finding out about some fundamental errors of approach at the end of an intense period of coding, you now have timely feedback. This saves an enormous amount of time - an advantage that anyone who has been through lengthy code reviews and then spent a week or two reacting to that feedback can appreciate.

I am now a believer in PRDD. So much so that whenever I go back to work on legacy projects in svn, I find myself cringing all the way to the end of the feature. It just feels so nineties.

Update: As I finished penning this post and started reflecting on it, it suddenly dawned on me that a lot of things we now take for granted are only possible because of git. And I don't mean DVCSs in general; I specifically mean git. For example, PRDD is made possible to a large extent because committing in git is a reversible process and history can be fluid if required. This means that people are not afraid of committing, which in turn enables a lot of the goodness I described above. Many DVCSs didn't like this way of viewing history - and, to be fair, I know of very few people who liked the idea until they started using it. Once you figure out what it is good for (and not so good for), it suddenly becomes an amazing tool. Git is full of little decisions like this that at first sight look either straight insane or just not particularly useful, but then turn out to change entire development flows.

Created: 2015-12-11 Fri 13:12


Wednesday, December 09, 2015

Nerd Food: Interesting…

Time to flush all those tabs again. Some interesting stuff I bumped into recently-ish.

Finance, Economics, Politics

Startups et al.

General Coding

Databases

C++

  • New ELF Linker from the LLVM Project: LLVM keeps on delivering! Now a new ELF linker. To be totally honest, I haven't even started using Gold in anger - I get the feeling the LLVM linker is going to be transitioned in much more quickly than Gold was.
  • Clang with Microsoft CodeGen in VS 2015 Update 1: OMG, OMG, how cool is this - MSFT decided to create a backend for Clang that is totally compatible with MSVC AND open source it! This is just insane. This means, for example, that you can now develop C++ on Windows without ever having to use MSVC and Visual Studio. It also means you can cross-compile from Linux into Windows with 100% certainty that things will work. It means that projects like Wine and ReactOS can start thinking about a migration path to Clang (not quite as simple as it may sound, but it surely makes sense). CLion with Clang on Windows will rock. The possibilities are just endless. I never quite understood what C2 was all about until I read this announcement - suddenly it all makes sense. This is fantastic news.

Layman Science

Other

  • NoiseRV Live: Still discovering this Portuguese musician, but love his work. Great concert. He could do a little bit less talking between songs, but still - artist's prerogative and all that.
  • Warm Focus: Winging It: Interesting set of "intelligent dance music" as we used to call it back in the day.
  • Mosaic - The “First” Web Browser: Super-cool podcasts about internet history. It would be great to have something like this for UNIX!
  • Jackson C. Frank (1965): Tragic musician from the 60s. Great tunes.
  • Reason in common sense: Always wanted to read Santayana properly. Started, but I guess it will be a very long exercise. Interesting, if somewhat strange book.
  • Ceu - jazz baltica Live (2010): New find, Brazilian musician Ceu.

Created: 2015-12-09 Wed 12:49
