
My work for Debian in May

No double posting this time 😉

I didn’t have much spare time to spend on Debian this month, but I was able to work on the following packages:

  • golang-github-hpcloud-tail/1.0.0+git20160415.b294095-3: added a versioned dependency and rebuilt against golang-fsnotify/1.3.0-3 to fix the FTBFS on ppc64el.

  • updates: packer/0.10.1-1, pybtex/0.20.1-1, afl/2.12b-1, afl/2.13b-1, pyutilib/5.3.5-1.

  • new packages: golang-github-azure-go-ntlmssp/0.0~git20160412.e0b63eb-1 (needed by Packer 0.10.1), and python-latexcodec/1.0.3-1 (needed by Pybtex 0.20).

  • prospector/0.11.7-7 fixed for reproducible builds: there were variations in the sorting order of dependencies in prospector.egg-info/requires.txt. I’ve prepared a patch to make the package reproducible again (the problem began with 0.11.7-5) until the proposed toolchain patch for setuptools (#804249) gets accepted.

  • python-latexcodec/1.0.3-3 also fixed for reproducible builds (#824454).

This series of blog postings also includes short introductions to new packages in the archive. This month there are:

Pyinfra

Pyinfra is a new project which is still in development. It has already been covered in an interesting German article, and is now available as a package maintained within the Python Applications Team. It’s currently a one-man production by Nick Barrett, and has been eagerly developed over the past weeks (we’re currently at 0.1~dev24).

Pyinfra is a remote server configuration/provisioning/service deployment tool which belongs in the same software category as Puppet or Ansible. It’s for provisioning one or an array of remote servers with software packages and for configuring them. Like Ansible, Pyinfra runs agentless, meaning that nothing special (like a daemon) has to run on the targeted servers to use it. It’s written for provisioning POSIX compatible Linux systems and offers alternatives when it comes to distribution-specific features like package managers (e.g. it supports apt as well as yum). The documentation can be found in /usr/share/doc/pyinfra/html/.

Here’s a little crash course on how to use Pyinfra. The pyinfra CLI tool is used on the command line as follows; deploy scripts, single operations or facts (see below) can be applied to a single server or a multitude of remote servers:

$ pyinfra -i <inventory script/single host> <deploy script>
$ pyinfra -i <inventory script/single host> --run <operation>
$ pyinfra -i <inventory script/single host> --facts <fact>

Remote servers which are operated on must provide a working shell and must be reachable by SSH. For connecting, the --port, --user, --password, --key/--key-password and --sudo flags are available, --sudo to gain superuser rights. Root access or sudo rights of course have to be set up beforehand. By the way, localhost can be operated on the same way.
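For instance, a run of a deploy script on a single host over a non-standard port, connecting as an unprivileged user and escalating with sudo, could look like this (the host, port and user name are made up for illustration):

$ pyinfra -i 192.0.2.10 deploy.py --port 2222 --user admin --key ~/.ssh/sshkey --key-password 123456 --sudo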

Single operations are organized in modules like "apt", "files", "init", "server" etc. With the --run option they can be used individually on servers as follows; e.g. server.user adds a new user on a single targeted system (-v adds verbosity to the pyinfra run):

$ pyinfra -i 192.0.2.10 --run server.user sam --user root --key ~/.ssh/sshkey --key-password 123456 -v

Multiple servers can be grouped in inventories, which hold the targeted hosts and the data associated with them. For example, an inventory file farm1.py would contain lists like this:

COMPUTE_SERVERS = ['192.0.2.10', '192.0.2.11']
DATABASE_SERVERS = ['192.0.2.20', '192.0.2.21']

Group designators must be all caps. The file names of inventory scripts form a higher level of grouping, so COMPUTE_SERVERS and DATABASE_SERVERS can be referenced at the same time by the group designator farm1. Plus, all servers are automatically added to the group all. Inventory scripts should be stored in the subfolder inventory/ in the project directory. Inventory files can then be used instead of specific IP addresses; the single operation below gets performed on all machines given in farm1.py:

$ pyinfra -i inventory/farm1.py --run server.user sam --user root --key ~/.ssh/sshkey --key-password=123456 -v

Deployment scripts can be used together with group data files in the subfolder group_data/ in the project directory. For example, group_data/farm1.py applies to all servers given in inventory/farm1.py (by the way, all.py applies to all servers), and contains the arbitrary attribute user_name (attributes must be lowercase) next to authentication data for the whole inventory group:

user_name = 'sam'
ssh_user = 'root'
ssh_key = '~/.ssh/sshkey'
ssh_key_password = '123456'

The arbitrary attribute can be picked up by a deployment script through host.data, so user_name can be used again, e.g. for server.user(), like this:

from pyinfra import host
from pyinfra.modules import server

server.user(host.data.user_name)

This deploy, the ensemble of inventory file, group data file and deployment script (usually placed at the top level of the project folder), can then be run this way:

$ pyinfra -i inventory/farm1.py deploy.py

You’ve guessed it: since deployment scripts are Python scripts, they are fully programmable (note that Pyinfra is built for and runs on Python 3 on Debian), and that’s the main selling point of this piece of software.
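As a small illustration of that programmability, here’s a minimal sketch of a deploy script that loops over a plain Python list to create several users with the server.user operation shown above (the user names are made up):

from pyinfra import host
from pyinfra.modules import server

# any Python construct works in a deploy script;
# here a loop generates one operation per account
for name in ['sam', 'frodo', 'pippin']:
    server.user(name)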

Quite handy for that are Pyinfra facts, functions which check different things on remote systems and return information as Python data. For example, deb_packages returns a dictionary of the installed packages on a remote apt based server:

$ pyinfra -i 192.0.2.10 --fact deb_packages --user root --key ~/.ssh/sshkey --key-password=123456
{
    "192.0.2.10": {
        "libdebconfclient0": "0.192",
        "python-debian": "0.1.27",
        "libavahi-client3": "0.6.31-5",
        "dbus": "1.8.20-0+deb8u1",
        "libustr-1.0-1": "1.0.4-3+b2",
        "sed": "4.2.2-4+b1",

Using facts, Pyinfra reveals its full potential. For example, a deployment script could go like this; the linux_distribution fact returns a dict containing the name of the installed distribution:

from pyinfra import host
from pyinfra.modules import apt

if host.fact.linux_distribution['name'] == 'Debian':
    apt.packages(packages='gummi', present=True, update=True)
elif host.fact.linux_distribution['name'] == 'CentOS':
    pass

I’ll spare you more sophisticated examples to keep this introduction simple. Beyond fancy deployment scripts, Pyinfra features its own API through which it can be programmed from the outside, and much more. But maybe that’s enough to introduce Pyinfra; those are the usage basics.

Pyinfra is a brand new project and it remains to be seen whether the developer can keep on developing the tool like he does these days. For a private project it seems insane to attempt to become a contender for the established "big" free configuration management tools and frameworks. But whether or not Puppet has become too complex in the meantime, I really don’t think that’s the point here. Pyinfra follows its own approach in being programmable the way it is. And it definitely doesn’t hurt to have it in the toolbox already; it isn’t trying to replace anything.

Brainstorm

After a first package spent some time in experimental, the Brainstorm library from the Swiss AI research institute IDSIA is now available as python3-brainstorm in unstable. Brainstorm is a lean, easy-to-use library for setting up deep learning networks (multi-layered artificial neural networks) for machine learning applications like image and speech recognition or natural language processing. Setting up a working training network for a classifier of handwritten digits like the MNIST dataset (the usual "hello world") takes just a couple of lines, as one of the shipped examples demonstrates. The package is maintained within the Debian Python Modules Team.

The Debian package ships a couple of examples in /usr/share/python3-brainstorm/examples (the data/ and examples/ folders of the upstream tarball have been combined here). Among them are:

  • scripts for creating proper HDF5 training data from the MNIST database of handwritten digits and for training a simple neural network on it (create_mnist.py, mnist_pi.py; see the sketch after this list),

  • examples for setting up data and training a convolutional neural network (CNN) on the CIFAR-10 dataset of pictures (create_cifar10.py, cifar10_cnn.py),

  • as well as example scripts for setting up training data and creating an LSTM (long short-term memory) recurrent neural network (RNN) on test data used in the Hutter Prize competition (create_hutter.py, hutter_lstm.py),

  • and another example script for creating training data from the CIFAR-100 dataset (create_cifar100.py).
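To give a flavour of the API, here is a condensed sketch along the lines of the shipped mnist_pi.py. It’s written from memory and heavily simplified, so treat names and parameters as approximate; the full, working version is in the examples folder:

import h5py
import brainstorm as bs
from brainstorm.data_iterators import Minibatches

# load the HDF5 training data produced by create_mnist.py
ds = h5py.File('MNIST.hdf5', 'r')['normalized_split']
getter_tr = Minibatches(100, default=ds['training']['default'][:],
                        targets=ds['training']['targets'][:])
getter_va = Minibatches(100, default=ds['validation']['default'][:],
                        targets=ds['validation']['targets'][:])

# wire up a small fully connected classification network
inp, out = bs.tools.get_in_out_layers('classification', (28, 28, 1), 10)
network = bs.Network.from_layer(
    inp >>
    bs.layers.FullyConnected(100, name='Hid1', activation='rel') >>
    out
)
network.initialize(bs.initializers.Gaussian(0.01))

# train with momentum SGD for a handful of epochs
trainer = bs.Trainer(bs.training.MomentumStepper(learning_rate=0.1,
                                                 momentum=0.9))
trainer.add_hook(bs.hooks.ProgressBar())
trainer.add_hook(bs.hooks.StopAfterEpoch(10))
trainer.train(network, getter_tr, valid_getter=getter_va)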

The current documentation in /usr/share/doc/python3-brainstorm/html/ isn’t complete yet (several chapters are under construction), but there’s a walkthrough on the CIFAR-10 example. The MNIST example has been extended by GitHub user pinae, and was explained in the German c’t magazine recently.

What are the perspectives for further development? As Zhou Mo confirmed, there are a couple of deep learning frameworks around with a rather poor outlook, since they have been abandoned after being completed as PhD projects. There’s really no point in striving to have them all in Debian; the ITP of Minerva, for example, has been given up partly for this reason (there haven’t been any commits since 08/2015, and cuDNN isn’t available and most likely won’t be). Brainstorm (0.5 was released in 05/2015) also was a PhD project, at IDSIA. It’s stated on GitHub that the project is "under active development", but the rather sparse project page on the other hand expresses the "hope the community will help us to further improve Brainstorm". A sentence like that often implies that the developers aren’t actively working on the project anymore. But there are recent commits, and it looks like upstream is active, can be reached when there are problems, and that the project is alive. So I don’t think we’re riding a dead horse here.

The downside for Brainstorm in Debian is that the libraries needed for GPU accelerated processing apparently can’t be fully provided. Pycuda is available, but scikit-cuda (an additional library which provides wrappers for CUDA features like CUBLAS, CUFFT and CUSOLVER) is not and won’t be, because the CULA Dense Toolkit (for which scikit-cuda also contains wrappers) is not freely available as source. Because of that, a dependency on pycuda has been left out entirely, not even as Suggests (it’s non-free). Without GPU acceleration, Brainstorm computes the matrices on OpenBLAS using a Cython wrapper in the NumpyHandler, and the PyCudaHandler can’t be used. OpenBLAS makes pretty good use of the available hardware (it distributes over all available CPU cores), but it’s not yet possible to run Brainstorm at full throttle using available floating point devices to reduce training times, which becomes crucial when projects get bigger.
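For completeness, here’s a small, hypothetical snippet of what handler selection looks like in code, following the style of the upstream examples (on Debian only the CPU handler is usable):

import brainstorm as bs
from brainstorm.handlers import NumpyHandler

# a trivial network, just to show handler selection
inp, out = bs.tools.get_in_out_layers('classification', (28, 28, 1), 10)
network = bs.Network.from_layer(inp >> out)

# select the OpenBLAS-backed CPU handler explicitly (it is the default);
# on other systems, PyCudaHandler would be set the same way
network.set_handler(NumpyHandler())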

Brainstorm is one of a number of deep learning frameworks that are already available or are becoming available in Debian. Currently there are:

  • Caffe for image recognition resp. classification is just around the corner (#823140).

  • Theano is currently in experimental, and will be ready for Stretch together with libgpuarray (OpenCL based GPU accelerated processing) and Keras (an abstraction layer). It can already run on NVIDIA graphics cards via CUDA (limited to amd64 and ppc64el, though).

  • Lasagne, a somewhat higher-level abstraction layer for Theano, is RFP (#818641).

  • Google’s TensorFlow, the free successor of DistBelief, is currently ITP (#804612). It’s waiting for Google’s build system Bazel to become available.

  • Torch is also ITP (#794634). It’s blocked on a wishlist bug against dh-lua getting closed.

  • Amazon’s own machine learning workhorse DSSTNE ("destiny") has now also been put under a free license and will become available for Debian (contrib) in the foreseeable future (#824692). It’s not yet suitable for image recognition applications, though (it lacks CNN support).

  • MXNet is RFP (#808235).

I’ve checked out Microsoft’s CNTK, but although it has also been set free recently, I have my doubts whether it could be included. Apparently there are dependencies on non-free software and most likely other issues. So much for a little update on the state of deep learning in Debian; please excuse it if my radar has missed something.
