Why You Shouldn’t Hate VirtualEnv and PIP

So a friend passed me a link about Why [Someone] Hates Virtualenv and PIP.

Well, I've written my fair share of angry posts too, but there is a lot in this one that bothers me. Read it and then come back.

Back? Ok, let’s see…

Illusion of isolation

I think the argument is somewhat weak. What the author mentions in this section is basically "Virtualenv provides isolation for Python things", which is exactly what the box says: "Virtual Python Environment builder". I kinda understand that some people may mistake this for full isolation, but that is like complaining that people may use Word and think it can do math because it has tables.

But stop for a second and think: "Who would think Word can do math just because it says 'Table' in the menus?" Well, there you have it. Seriously, if someone thinks virtualenv can provide full isolation when the package clearly says "Python Environment"… well, they shouldn't be coding anyway, right?

Full isolation

Here the author has a point: yes, if you want full isolation, you'll need another solution. He provides two, Vagrant and LXC (which stands for Linux Containers). Thing is, creating a Vagrant environment is not an easy "5 seconds" process. Heck, even starting one is not a "5 seconds" process.

Vagrant, for those unaware, creates a virtual machine, boots it, starts an SSH session into it and provides a somewhat easy way to map a local directory to a directory inside your virtual machine. Vagrant provides full isolation by creating a full operating system inside your operating system, based on a single file (the Vagrantfile). But, again, it's far from being a "5 second" process, for creating or for starting.
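Just to make the comparison concrete, this is roughly the Vagrant workflow (the box name and URL are the usual examples from the Vagrant docs of the time):

vagrant box add precise32 http://files.vagrantup.com/precise32.box
vagrant init precise32   # writes the Vagrantfile
vagrant up               # boots a whole virtual machine: minutes, not seconds
vagrant ssh              # your shell now lives inside the VM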

LXC (which, again, and keep this in mind, stands for Linux Containers) provides something like Vagrant, but using the Linux kernel's own containment features to create such machines. Unfortunately, after installing it, I tried to use it and found it requires some "templates", which it can't download from anywhere (Vagrant can: it has its list of available boxes, so you just pass the URL and it will download it and create the machine, although it's kinda hard to have two different OSes as base systems). So, let's say, it's Vagrant with a "10 second" create/start. The problem with LXC is that it is tied to Linux and, thus, it would require everyone to use Linux. While Linux is a nice operating system and all (and I use it as my primary OS these days), Python is not tied to a single operating system and we need a solution that works everywhere. Virtualenv works on Linux; virtualenv works on OS X; virtualenv works even on Windows. LXC works on Linux; LXC doesn't work on OS X; LXC doesn't work on Windows.

(The fact that LXC is even suggested makes the solution even more silly when the blog title says "platform-agnostic python developer". How can you suggest a platform-specific solution if you are a platform-agnostic developer?)

If you need full isolation, the only real solution is Vagrant. Which is slow, even if it provides full operating system isolation, which is way more than virtualenv provides and, most of the time, way more than you need.

I'll steal the point here and add something: virtualenv is a nice way to have two different apps running under the same server. You can wrap both under different WSGI servers (uWSGI or Chaussette), give each a different port and make NGinx serve each under a different URI. How would you do that with Vagrant or LXC? Install a separate NGinx inside each and use a third one outside your virtual machines as a load balancer? Make the outside NGinx access each via different ports, losing all the benefits Linux provides when dealing with sockets on the same machine? Either solution is stupid and moronic, especially if your apps are small or have a low access count and virtualenv provides the perfect isolation for such situations.
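A minimal sketch of the virtualenv version, assuming two hypothetical apps called "blog" and "shop" (paths, ports and WSGI module names are all made up for the example):

virtualenv /srv/venvs/blog && /srv/venvs/blog/bin/pip install uwsgi blog
virtualenv /srv/venvs/shop && /srv/venvs/shop/bin/pip install uwsgi shop
# each app listens on its own local port, isolated from the other...
/srv/venvs/blog/bin/uwsgi --http 127.0.0.1:8001 --module blog.wsgi &
/srv/venvs/shop/bin/uwsgi --http 127.0.0.1:8002 --module shop.wsgi &
# ...and a single NGinx maps /blog to port 8001 and /shop to port 8002,
# with two proxy_pass locations in one server block.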

Virtualenv for deployment

Here I'll admit my ignorance and say that the only kind of Python deployment I ever did was deployment of web apps. And really, what we did was simply create a virtualenv and install the packages. New version? No problem, just install the new package in the virtualenv. Done.
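In shell terms, the whole flow was something like this (the paths and package name are made up):

virtualenv /srv/venvs/myapp
/srv/venvs/myapp/bin/pip install myapp
# new version released? install it into the very same virtualenv:
/srv/venvs/myapp/bin/pip install --upgrade myapp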

(Actually, I had one experience with desktop deployment even before virtualenv existed, or at least before it was as widely known as it is today, but I guess that doesn't count.)

So… no, virtualenv is not for deployments. You can use it for deployments, but that's not its primary function.

Also, if you need external dependencies (like the mysql-devel packages, to be able to compile python-mysql), neither Vagrant nor LXC will help you there: you would still need to install those inside the virtual machine. Even worse, you may forget that you are using one of those, create your databases inside the virtual machine and, if something goes wrong with your installation, the whole data will be gone. And it's really easy to forget such configuration things.

Virtualenv is full of messy hacks

The whole "hacks" thing here is that you get a full copy of the Python standard library inside your virtualenv. Well, this is needed because there are slight changes even in the standard library between Python versions, and virtualenv can create an environment for any Python version installed. Thus, the packages must follow.
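You can see why with virtualenv's --python option, which builds the environment around whatever interpreter you point it at (assuming, for the example, that both versions are installed):

virtualenv --python=python2.6 env26   # carries the 2.6 standard library
virtualenv --python=python2.7 env27   # carries the 2.7 standard library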

The binary inside the virtualenv also gets changed to reflect a lot of stuff. I'll admit that some things are silly (not stupid), because things will break if you move your virtualenv directory. But hey, that's your fault for messing with the environment (or would you say that Vagrant can gracefully recover if you change the virtual machine image filename?).

If you need to run a virtualenv'd Python app in a cron job, you'll need to go through the virtualenv initialization, yes. But you would also have to check if your Vagrant machine is running (unless you put the cron job inside the Vagrant machine, but then you need to make sure the configuration file reflects the creation of the cron job, or it will be lost forever if you need to recreate the environment). The same goes for LXC. If you forget to start the virtualenv, or to start the Vagrant machine, or to start the LXC container, all three would fail. The fact that you need to start your virtualenv before calling the script doesn't make it any worse than the other options.
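Besides, for cron you don't even need to "activate" anything: the python binary inside a virtualenv already knows its own environment, so you can call it directly (the paths below are hypothetical):

# crontab entry: run the script with the virtualenv's own interpreter
*/10 * * * * /srv/venvs/myapp/bin/python /srv/myapp/cleanup.py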

On top of that, if you need to keep going into virtualenvs to run your scripts, you'd do what any sysadmin worth their salt would do: create a script to start it. That's what virtualenvwrapper does; heck, even I wrote something like that already.

bin/activate

Nope, bin/activate is not exciting. Neither is the Vagrantfile. But both do a lot of things in the background (setting PATHs, defining environment variables) which you don't want to worry about. The fact that activate changes your prompt is not "exciting", but it is a nice informative measure to tell you "hey, you are in a virtualenv now". Do you want to make bin/activate "exciting"? Install powerline, then.
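For the record, this is roughly what bin/activate does behind the scenes (a simplified sketch, not the actual script; the path is hypothetical):

# put the virtualenv's binaries in front of everything else
export VIRTUAL_ENV="/srv/venvs/myapp"
export PATH="$VIRTUAL_ENV/bin:$PATH"
# the "informative measure": mark the prompt
export PS1="(myapp) $PS1"
# plus a deactivate() function that undoes all of the above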

Since we are talking about those "this thing starts a virtual environment/engine" scripts, does the Vagrantfile change anything to tell you you are in a virtual machine? Nope. Unless your virtual machine uses a different prompt, you'll never know you are in a virtual machine in the first place!

(You will see differences in the prompts, yes, but that's because the people who upload the images for Vagrant actually change the original images' prompts to reflect that; after all, all you're doing is SSHing into a virtual machine. Or do you think Vagrant puts a wrapper around SSH to change the prompt?)

And, since we are talking about scripts that suck, let's talk about the Vagrantfile, which is one of the most stupid ideas I have ever seen (sorry, I need to go into rant mode now). A Vagrantfile is, basically, a Ruby script, with access to everything Ruby can provide. If you can't see the damage that can be done with that, or the pure laziness of its developers, who didn't even care about writing a proper configuration file format, seriously man, give up coding, for the sake of everyone else.

--no-site-packages

See the answer above about "messy hacks": there is a reason things get cluttered inside the virtualenv, and that's due to the versioning of packages inside the virtualenv.

I don’t even think it’s worth discussing this.

PIP and virtualenv buddies

I don't know how to respond to this. At first sight, it seems the author has a personal vendetta against Ian Bicking, which makes the point about both going hand-in-hand moot. Actually, the same could be said about Werkzeug + Flask + Jinja: "Oh, look, they fit so perfectly together, I bet it's because Armin Ronacher wants to promote his personal philosophy and workflows". Yes, if I said something like that, a giant "WAT" would appear on top of your head. Thing is, Werkzeug + Flask + Jinja work so well together because the author knows each inside and out, and that makes it easier to make one fit the other. The same goes for PIP and virtualenv.

Also, easy_install is not a solution. Easy_install does not have an uninstall command. Easy_install requires a special option just to record which files have been added/modified. PIP has none of those problems. And if you think "oh damn, this package isn't needed anymore, better leave it there" or "well, this package isn't needed anymore, better destroy my virtualenv and create it again", you're doing package management wrong.
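Compare the two (the package name is just an example; the recording flag is, if memory serves, the same --record that setup.py install uses):

pip uninstall requests                       # clean removal, no bookkeeping
python setup.py install --record files.txt   # the old way: record every file
                                             # yourself, or never know what
                                             # was copied where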

PIP builds from source

Anyone that ever had to deal with eggs knows they sucked. Yes, they did. The whole concept of eggs is so broken that it's being replaced (the new format is called "wheel"), but really, after so many installations, fuck binary installs of Python stuff.

The fact that PIP builds its installs from source is a good thing: it promotes clean storage of stuff, a proper setup.py for your project, a proper MANIFEST.in for your project, a proper project structure, a proper separation of each component and, seriously, no freaking hacks to read non-Python files inside your egg (try it, it's terrible 'cause you need one behavior for development, when you have no eggs, and another when your project is packaged in one).

requirements.txt

PIP accepts a file as a list of requirements, yes, but you don’t need to name it “requirements.txt”; you can name it whatever you want. All you need to put in this file are the names of the packages your package/project requires. Just that. PIP does no magic over it.

The real magic happens when you read it inside your setup.py to provide the list of requirements to PIP/easy_install. And that’s it.
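For example (the file names are made up), PIP is happy with whatever name you pass via -r:

pip install -r requirements.txt      # the conventional name...
pip install -r production-deps.txt   # ...but any name works just the same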

URIs as dependencies

Ok, semi-point. But it's not like "everyone is doing it, OMG!". Actually, I can't remember any package I used professionally (or even in my personal projects) whose author used a URI instead of the package name. Even in our projects, we always created a company-wide PyPI with the company packages for deployment, doubling as a cache for the official PyPI.

Can the fact that PIP accepts URIs be considered a problem? It can be abused, yes, but, as I said before, the Vagrantfile can be abused in terrible ways, so maybe we should ban Vagrant too, right?

Actually, no. The Vagrantfile, as stupid as it is, provides a lot of access to things that may be required when you're creating your virtual machine, and so do URIs as requirements, in that silly, stupid corner case.

But, again, no serious project uses URI in their requirements.

PIP freeze

Semi-point again. I see a lot of people who go "I need this, and this, and this… ok, everything here, let me create my requirements.txt by using pip freeze", which is utterly wrong. But that doesn't make "freeze" a bad option: it's a pretty nice way to see what is installed in your environment. Or is "ls" a bad tool? Are stdin/stdout redirects bad tools?
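For instance, in a virtualenv with a Flask app (the output, obviously, depends on what you installed):

$ pip freeze
Flask==0.10.1
Jinja2==2.7
Werkzeug==0.9.4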

Conclusion

Dunno, some points are completely off the mark and the rest are only semi-ok. I guess it was just a rant for the sake of ranting, nothing else.

It doesn't mean virtualenv and PIP don't have their problems. But the fact that both are now being folded into the Python standard library may lead to a cleaner implementation and a tighter integration with the Python interpreter.

Auto-virtualenv, now with more magic

Following yesterday's post about the auto-virtualenv trick, today I managed to fix the issue of "auto-virtualenv loses the virtualenv if you go into a subdirectory of the directory with the .venv".

The only change is in the _venv_cd function. All the other aliases remain the same.

function _upwards_search {
    # walk up from the current directory looking for a .venv file;
    # print the virtualenv name it holds (or nothing, if none is found)
    venv=""
    curdir=`pwd`

    while [[ `pwd` != '/' ]]; do
        if [ -f ./.venv ]; then
            venv=`cat ./.venv`
            break
        fi
        cd ..
    done

    cd "$curdir"
    echo "$venv"
}

function _venv_cd {
    # leaving a virtualenv'd tree? deactivate before moving
    if [ ! -f "$PWD/$1" -a "$VIRTUAL_ENV." != "." ]; then
        deactivate
    fi
    \cd $1
    # reactivate if this directory, or any parent, has a .venv
    venv=$(_upwards_search)
    if [ -n "$venv" ]; then
        venv "$venv"
    fi
}
alias cd=_venv_cd

Next step: remove all this stuff from my .bashrc, move it to a single file which can be easily sourced inside your own .bashrc and drop it in a repository somewhere.

NOTE: Apparently, there is something wrong with the test for an empty venv. Hold your horses for a sec.

NOTE 2: Ok, problem solved. Also, the repository is now live at https://bitbucket.org/juliobiason/auto-virtualenv.

My magical auto-virtualenv trick (without VirtualenvWrapper)

One thing that pissed me off a few days ago was working on a Python project with several modules and having to switch between virtualenvs every time[1]. So I quickly hacked a solution.

But before going further, let me say that the solution is heavily based on VirtualenvWrapper, to the point that I'm using the same environment variables. I just didn't want to install a whole package for a single feature.

And, without further ado…

The whole thing started with two aliases added to my .bashrc, one to create a virtualenv and another to "activate" a virtualenv. Today, they look like this:

# all virtualenvs live under $WORKON_HOME, as in VirtualenvWrapper
export WORKON_HOME=$HOME/Venv
# activate a virtualenv by name
function venv { source "$WORKON_HOME/$1/bin/activate"; }
# create a virtualenv by name, activate it and leave a .venv marker behind
function mkenv { virtualenv "$WORKON_HOME/$1"; venv "$1"; echo "$1" > ./.venv; }

Nothing fancy here: I'm using WORKON_HOME exactly as VirtualenvWrapper uses it, to point to the directory where all the virtualenvs sit. Then, to avoid typing the full path to activate them, I can simply use venv <virtualenv-name> to activate any virtualenv and, finally, to create virtualenvs in the right directory, I have mkenv <virtualenv-name>. Simple as that.
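In practice (the project name is invented for the example):

mkenv myproject   # creates $HOME/Venv/myproject, activates it, drops a .venv
venv myproject    # later, reactivates it by name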

One thing you may notice is that I’m saving the virtualenv name in a hidden file inside the current directory, called .venv. This is what makes the magic happen.

Then, I have this script + alias:

function _venv_cd {
    if [ ! -f "$PWD/$1" -a "$VIRTUAL_ENV." != "." ]; then
        deactivate
    fi
    \cd $1
    if [ -f ./.venv ]; then
        venv `cat ./.venv`
    fi
}
alias cd=_venv_cd

This basically replaces cd with my function, which checks if the target directory has a .venv and, if it does, activates that virtualenv (so I don't need to call venv anymore in normal situations); if there is no .venv but a virtualenv is active, it deactivates it.
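The effect looks like this (directory names invented):

$ cd ~/projects/site    # this directory has a .venv file...
(site)$ cd /tmp         # ...and this one doesn't
$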

The only problem I still have is that going deeper inside the project/module won't check if there is a .venv in any of the parent directories and, thus, will disable the virtualenv.

[1] It was just a matter of "keeping each with their own". The whole project revolves around creating modules for a web framework, and each module must be capable of working standalone, without the others.