At Bitfusion, our job is to know how well various compute-intensive workloads scale on different infrastructures and to help people maximize performance. Since we launched our Deep Learning and CUDA AMIs in the AWS Marketplace, we've heard many of our customers ask for bigger GPU instances, but the largest GPU-equipped Amazon EC2 instance, the g2.8xlarge, currently maxes out at just 4 GPUs.
With a combination of our Bitfusion Boost remoting technology and the use of CloudFormation templates (step-by-step Caffe CFN tutorial), we now allow you to easily spin up virtual instances with a lot more GPU power. For example, you can combine a large-memory machine with an unlimited number of GPU nodes into a single, more powerful virtual instance:
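For a sense of what launching such a cluster from a template involves, here is a minimal sketch using boto3. Note that the template URL, stack name, and parameter names (`GpuNodeCount`, `KeyName`) are hypothetical placeholders for illustration, not Bitfusion's actual CloudFormation interface; the tutorial linked above has the real steps.

```python
# Hypothetical sketch: assemble a CloudFormation create_stack request
# for a multi-node GPU cluster. The template URL and parameter names
# below are illustrative, not Bitfusion's actual interface.

def build_stack_request(stack_name, gpu_nodes, key_pair):
    """Assemble create_stack arguments for a GPU cluster template."""
    return {
        "StackName": stack_name,
        # Hypothetical template location:
        "TemplateURL": "https://example.com/gpu-cluster.template",
        "Parameters": [
            {"ParameterKey": "GpuNodeCount", "ParameterValue": str(gpu_nodes)},
            {"ParameterKey": "KeyName", "ParameterValue": key_pair},
        ],
    }

request = build_stack_request("caffe-cluster", gpu_nodes=4, key_pair="my-key")
print(request["StackName"])

# To actually launch the stack (requires AWS credentials configured):
# import boto3
# boto3.client("cloudformation").create_stack(**request)
```

The actual launch call is left commented out since it requires AWS credentials and a subscription to the AMIs described below.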
Currently we support OpenCL- and CUDA-based applications, but support for other APIs, such as OpenGL, is just around the corner. Here is how it works:
The Boost runtime intercepts API calls, splits up the compute and data, and forwards the requests to a fast runtime scheduler that dispatches computation to both local and remote GPUs. As a result, you can combine the compute resources of various nodes into a single giant node. The GPU application doesn't see any of this complexity; it just sees itself running on a single giant machine, as shown in the figure below.
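The intercept-and-dispatch idea above can be sketched in a few lines. This is a deliberately simplified illustration of the general remoting pattern, with invented class and worker names; it is not Boost's actual internals, and a real remoting layer would serialize arguments and forward them over the network rather than call a local object.

```python
# Simplified illustration of API-call remoting: a scheduler intercepts
# each "GPU call" and dispatches it round-robin across local and remote
# workers. All names here are invented for this example.
import itertools

class GpuWorker:
    def __init__(self, name):
        self.name = name
        self.calls = []          # record of work this GPU received

    def execute(self, call):
        self.calls.append(call)  # stand-in for real GPU execution
        return f"{call} on {self.name}"

class RemotingScheduler:
    """Dispatches intercepted calls across a pool of GPU workers."""
    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)

    def intercept(self, api_call):
        # A real remoting layer would marshal the call's arguments and
        # send them over the network; here we just pick the next worker.
        return next(self._cycle).execute(api_call)

workers = [GpuWorker("local:gpu0"), GpuWorker("remote1:gpu0"),
           GpuWorker("remote2:gpu0")]
sched = RemotingScheduler(workers)
results = [sched.intercept(f"kernel_{i}") for i in range(6)]
print(results[0])  # kernel_0 on local:gpu0
```

From the application's point of view, every call goes through one entry point (`intercept`), which is what lets the pooled GPUs appear as a single large machine.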
Ready to take one of these big machines for a spin? You can try out the new G2 instance with 8 GPUs:
Feeling adventurous? Try out the massive R3 instance with 16 GPUs, 32 CPUs, and 244 GB of memory:
Want to build a custom GPU instance type by mixing and matching various systems? We have you covered:
To launch any of the systems above, you first need to subscribe to two or more of the following AMI products. Subscribing indicates that you have accepted the terms and pricing of these products, but does not cost you anything until you launch the systems.
You can start any of these systems with pre-configured AMIs that feature Caffe or Torch, or you can start with a clean client AMI and install whatever GPU application you want. Over the next few weeks, we'll share some case studies, the performance results you can expect across several GPU apps, and several new product announcements.
Have an idea for a feature or have a question? Drop us a line here, or subscribe to our mailing list to stay up to date: