IPython is a great tool for interactive exploration of code and data. IPython.parallel is the part of IPython that enables interactive exploration of parallel code, and it aims to make distributing your work on local clusters or AWS simple and straightforward. The tutorial will cover the basics of getting IPython.parallel up and running in various environments, and how to do interactive and asynchronous parallel computing with IPython. Some of IPython's cooler interactive features will be demonstrated, such as automatically parallelizing code with magics in the IPython Notebook and interactive debugging of remote execution, all with the help of real-world examples.
HDF5 is for Lovers
Slides can be found here: slideshare.net/PyData/hdf5-isforlovers
HDF5 is a hierarchical, binary database format that has become a de facto standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays), it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations. Moreover, HDF5 bindings exist for almost every language, including two Python libraries (PyTables and h5py).
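As a rough illustration of three of the features listed above (chunking, compression, and extensible data), here is a minimal sketch using h5py. It is not taken from the tutorial, which lists PyTables as its requirement; the dataset name `samples`, the helper names, and the chunk size are all assumptions made for the example.

```python
import h5py
import numpy as np


def write_extensible(path, batches, chunk_rows=1024):
    """Append 1-D float batches to a chunked, gzip-compressed, resizable dataset."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "samples",              # hypothetical dataset name
            shape=(0,),             # start empty
            maxshape=(None,),       # None makes this axis extensible
            chunks=(chunk_rows,),   # resizable and compressed data must be chunked
            compression="gzip",
            dtype="f8",
        )
        for batch in batches:
            batch = np.asarray(batch, dtype="f8")
            start = dset.shape[0]
            # Grow the dataset, then write the new rows into the added region.
            dset.resize((start + batch.shape[0],))
            dset[start:] = batch


def read_slice(path, start, stop):
    """Read only the requested rows; HDF5 loads just the chunks that overlap them."""
    with h5py.File(path, "r") as f:
        return f["samples"][start:stop]
```

Calling `write_extensible("demo.h5", [np.arange(5), np.arange(5, 8)])` and then `read_slice("demo.h5", 3, 7)` reads back a slice that spans both appended batches. A sensible chunk size depends on the access pattern, so treat the default here as a placeholder.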
This tutorial will discuss tools, strategies, and hacks for squeezing every ounce of performance out of HDF5 in new or existing projects. It will also go over fundamental limitations in the specification and provide creative and subtle strategies for getting around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application, making the code and data both faster and smaller. With such powerful features at the developer's disposal, what is not to love?!
This tutorial is targeted at a more advanced audience with prior knowledge of Python and NumPy. Knowledge of C or C++ and basic HDF5 is recommended but not required.
This tutorial will require Python 2.7, IPython 0.12+, NumPy 1.5+, and PyTables 2.3+. ViTables and Matplotlib are also recommended. These may all be found in Linux package managers. They are also available through EPD or easy_install. ViTables may need to be installed independently.
Python in an Evolving Enterprise System
Slides can be found here: slideshare.net/PyData/py-data-svslideshare
Our data pipeline is growing like crazy: it processes more than 30 terabytes of data every day, a volume that has more than tripled in the last year alone. In 2011, we moved our data pipeline to a Hadoop stack in order to enable horizontal scalability for future growth. Our optimization tools used for data exploration, aggregations, and general data hackery are critical for updating budgets and optimization data. However, these tools are built in Python, and integrating them with our Hadoop data pipeline has been an enormous challenge. Our continued explosive growth demands increased efficiency, whether that's in simplifying our infrastructure or building more shared services. Over the past few months, we evaluated multiple solutions for integrating Python with Hadoop, including using Hadoop Streaming, Pig with Jython UDFs, writing MapReduce in Jython, and of course, why not just do it in Java? In our talk, we'll explore the different Python-Hadoop integration options, share our evaluation process and best practices, and invite an interactive dialogue of lessons learned.
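For readers unfamiliar with the first option mentioned above, Hadoop Streaming runs any executable as a mapper or reducer, passing records on standard input and reading tab-separated key/value lines from standard output. The sketch below shows that contract with a generic word count; it is an assumed illustration, not the speakers' pipeline, and the function names are hypothetical.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive lines that share a key.

    Hadoop Streaming sorts mapper output by key before the reducer sees it,
    so lines with equal keys arrive next to each other.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def simulate(lines):
    """Local stand-in for the cluster: map, sort by key, then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In a real job each stage would be a small script that feeds `sys.stdin` to one of these functions and prints the results; `simulate` only mimics the shuffle-and-sort step so the logic can be checked on a laptop.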
Creating Interactive Applications in Matplotlib
Matplotlib is the leading scientific visualization tool for Python. Though its ability to generate publication-quality plots is well known, some of its more advanced features are less often used. In this tutorial, we will explore the ability to create custom mouse- and key-bindings within matplotlib plot windows, giving participants the background and tools needed to create simple cross-platform GUI applications within matplotlib. After going through the basics, we will walk through some more intricate scripts, including a simple Minesweeper game and a 3D interactive Rubik's cube, both implemented entirely in Matplotlib.
Presentation notebook can be viewed at: github.com/jakevdp/matplotlib_pydata2013
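The custom mouse- and key-bindings described in this abstract are built on Matplotlib's event system, where callbacks are registered on a figure canvas with `mpl_connect`. The following is a minimal sketch of that idea, not code from the presentation; the `ClickRecorder` class, the `demo` helper, and the choice of the `c` key are assumptions made for illustration.

```python
import matplotlib.pyplot as plt


class ClickRecorder:
    """Record the data coordinates of mouse clicks; pressing 'c' clears them."""

    def __init__(self, ax):
        self.ax = ax
        self.points = []
        canvas = ax.figure.canvas
        # mpl_connect returns a connection id that can be passed to mpl_disconnect.
        self.cid_click = canvas.mpl_connect("button_press_event", self.on_click)
        self.cid_key = canvas.mpl_connect("key_press_event", self.on_key)

    def on_click(self, event):
        # Ignore clicks that land outside this axes.
        if event.inaxes is not self.ax:
            return
        self.points.append((event.xdata, event.ydata))
        self.ax.plot(event.xdata, event.ydata, "ro")
        self.ax.figure.canvas.draw_idle()

    def on_key(self, event):
        if event.key == "c":
            self.points.clear()


def demo():
    """Open a window, attach the recorder, and return it after the window closes."""
    fig, ax = plt.subplots()
    # Keep a reference: the canvas holds only weak references to bound methods.
    recorder = ClickRecorder(ax)
    plt.show()
    return recorder
```

Running `demo()` with an interactive backend opens a plot window in which each click inside the axes adds a red marker. The same pattern of small handler methods on one object scales up to the game-style applications the tutorial describes.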