As a certified instructor at the NVIDIA Deep Learning Institute, I teach the following courses:

Model Parallelism: Building and Deploying Large Neural Networks

Large language models (LLMs) and deep neural networks (DNNs), whether applied to natural language processing (e.g., GPT-3), computer vision (e.g., large Vision Transformers), or speech AI (e.g., wav2vec 2.0), have certain properties that set them apart from their smaller counterparts. As LLMs and DNNs become larger and are trained on progressively larger datasets, they can adapt to new tasks with just a handful of training examples, accelerating the route toward general artificial intelligence. Training models that contain tens to hundreds of billions of parameters on vast datasets isn’t trivial and requires a unique combination of AI, high-performance computing (HPC), and systems knowledge. The goal of this course is to demonstrate how to train the largest of neural networks and deploy them to production.
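To give a feel for the core idea, here is a minimal sketch of naive layer-wise model parallelism in plain PyTorch, splitting a toy network across two GPUs. It assumes a machine with at least two CUDA devices; the layer sizes and device indices are illustrative, and the course itself uses far more capable tooling than this.

```python
# Minimal sketch of layer-wise model parallelism with plain PyTorch.
# Assumes at least two CUDA devices; sizes are illustrative only.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self, hidden=4096):
        super().__init__()
        # The first half of the network lives on GPU 0 ...
        self.stage0 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()).to("cuda:0")
        # ... and the second half on GPU 1, so neither device holds all parameters.
        self.stage1 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # Activations cross the device boundary between stages.
        return self.stage1(x.to("cuda:1"))

model = TwoStageModel()
out = model(torch.randn(8, 4096))
print(out.shape, out.device)
```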

Building Transformer-Based Natural Language Processing Applications

Learn how to apply and fine-tune a Transformer-based deep learning model to natural language processing (NLP) tasks. In this course, you'll construct a Transformer neural network in PyTorch, build a named-entity recognition (NER) application with BERT, and deploy the NER application with ONNX and TensorRT to a Triton inference server. Upon completion, you’ll be proficient in task-agnostic applications of Transformer-based models.
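As a tiny taste of what a BERT-based NER component looks like in code, the sketch below runs a publicly available BERT NER checkpoint through the Hugging Face Transformers pipeline. This is a stand-in for illustration only, not the course's own labs or tooling, and the checkpoint name is an assumption.

```python
# Minimal sketch of BERT-based named-entity recognition using Hugging Face
# Transformers as a stand-in; the checkpoint name is an assumed public model.
from transformers import pipeline

ner = pipeline("token-classification",
               model="dslim/bert-base-NER",        # assumed public NER checkpoint
               aggregation_strategy="simple")      # merge word pieces into entities

for entity in ner("NVIDIA was founded by Jensen Huang in Santa Clara."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```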

Rapid Application Development Using Large Language Models

Recent advancements in both the techniques and accessibility of large language models (LLMs) have opened up unprecedented opportunities to help businesses streamline their operations, decrease expenses, and increase productivity at scale. Additionally, enterprises can use LLM-powered apps to provide innovative and improved services to clients or strengthen customer relationships. For example, enterprises could provide customer support via AI companions or use sentiment analysis apps to extract valuable customer insights. In this course, you will gain a strong understanding and practical knowledge of LLM application development by exploring the open-source ecosystem, including pretrained LLMs, enabling you to get started quickly in developing LLM-based applications.
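To make the "pretrained model as a building block" idea concrete, here is a minimal sketch of a sentiment analysis step like the one mentioned above, built on an openly available pretrained checkpoint via Hugging Face Transformers. The model name and the example reviews are illustrative assumptions, not material from the course.

```python
# Minimal sketch of an LLM-ecosystem building block: sentiment analysis on
# customer feedback using an assumed public pretrained checkpoint.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

reviews = [
    "The support team resolved my issue within minutes.",
    "I have been waiting two weeks for a refund.",
]
for review, result in zip(reviews, sentiment(reviews)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {review}")
```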

Building RAG Agents with LLMs

The evolution and adoption of large language models (LLMs) have been nothing short of revolutionary, with retrieval-based systems at the forefront of this technological leap. These models are not just tools for automation; they are partners in enhancing productivity, capable of holding informed conversations by interacting with a vast array of tools and documents. This course is designed for those eager to explore the potential of these systems, focusing on practical deployment and the efficient implementation required to manage the considerable demands of both users and deep learning models. As we delve into the intricacies of LLMs, participants will gain insights into advanced orchestration techniques that include internal reasoning, dialog management, and effective tooling strategies.
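To illustrate the retrieval step that underlies these systems, the sketch below implements the basic retrieval-augmented generation pattern without any particular framework: embed a small document set, retrieve the most relevant passages for a question, and assemble a grounded prompt. The embedding model, documents, and question are assumptions for illustration; the final call to an LLM endpoint is left as a comment.

```python
# Minimal, framework-free sketch of the RAG pattern: retrieve relevant
# documents, then ground the model's answer in them. All names are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Triton Inference Server serves models over HTTP and gRPC.",
    "TensorRT optimizes trained networks for low-latency inference.",
    "Helm charts describe Kubernetes deployments declaratively.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")        # assumed embedding model
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                               # cosine similarity (normalized)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "How do I serve a model over gRPC?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# The prompt would then be sent to an LLM endpoint of your choice.
print(prompt)
```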

Large-Scale Production Deployment of RAG Pipelines

Retrieval-augmented generation (RAG) pipelines are rapidly changing how modern enterprises operate. Countless online tutorials demonstrate proof-of-concept-level naïve RAG applications that cannot handle high traffic volumes or large document collections. This training lab bridges the gap between such prototypes and production, presenting an opinionated set of best practices for production-level deployment. From infrastructure sizing, through an end-to-end Helm-based deployment of NVIDIA NIM microservices, to customizing individual pipeline components, we'll provide a high-level overview of the steps your organization will need to take to transform early proofs of concept into enterprise-grade deployments.
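Once a pipeline is deployed, a quick client-side smoke test helps verify it end to end. The sketch below queries an LLM NIM microservice, which exposes an OpenAI-compatible API, from Python; the service URL and model name are assumptions specific to your cluster, not values from the lab.

```python
# Minimal sketch of a client-side smoke test against a deployed LLM NIM,
# which exposes an OpenAI-compatible API. URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://nim-llm.example.internal:8000/v1",  # assumed service URL
                api_key="not-used-for-local-deployments")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",           # assumed model name
    messages=[{"role": "user", "content": "Summarize our deployment checklist."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```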

Efficient Large Language Model (LLM) Customization

In this course, you'll go beyond prompt engineering LLMs and learn a variety of techniques to efficiently customize pretrained LLMs for your specific use cases—without engaging in the computationally intensive and expensive process of pretraining your own model or fine-tuning a model's internal weights. Using NVIDIA NeMo service, you’ll learn various parameter-efficient fine-tuning methods to customize LLM behavior for your organization.
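As a minimal sketch of one such parameter-efficient method, the example below attaches LoRA adapters to a small pretrained model using the Hugging Face peft library; it is a stand-in for the NeMo-based workflow taught in the course, and the model name and LoRA hyperparameters are illustrative assumptions.

```python
# Minimal sketch of parameter-efficient fine-tuning via LoRA, using Hugging Face
# peft as a stand-in for the course's NeMo workflow. Values are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # small stand-in model

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["c_attn"])          # attention projections in GPT-2
model = get_peft_model(base, lora)

# Only the small adapter matrices are trainable; the base weights stay frozen.
model.print_trainable_parameters()
```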

Fundamentals of Accelerated Computing with CUDA Python

This course explores how to use Numba—the just-in-time, type-specializing Python function compiler—to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to use Numba to compile CUDA kernels from NumPy universal functions (ufuncs), create and launch custom CUDA kernels, and apply key GPU memory management techniques. Upon completion, you’ll be able to use Numba to compile and launch CUDA kernels to accelerate your Python applications on NVIDIA GPUs.
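The sketch below shows the two techniques named above in miniature: a CUDA ufunc compiled from a scalar Python function, and a hand-written CUDA kernel with explicit memory management. It assumes an NVIDIA GPU and a CUDA-enabled Numba install; the array sizes and launch configuration are illustrative.

```python
# Minimal sketch: a GPU ufunc and a custom CUDA kernel with Numba.
# Requires an NVIDIA GPU and a CUDA-enabled Numba installation.
import numpy as np
from numba import vectorize, cuda

# 1) Compile a NumPy-style ufunc that runs elementwise on the GPU.
@vectorize(["float32(float32, float32)"], target="cuda")
def gpu_add(a, b):
    return a + b

# 2) Write and launch a custom CUDA kernel with an explicit thread index.
@cuda.jit
def scale(out, x, factor):
    i = cuda.grid(1)
    if i < x.size:
        out[i] = x[i] * factor

x = np.arange(1_000_000, dtype=np.float32)
y = np.ones_like(x)
print(gpu_add(x, y)[:4])                       # ufunc call looks like NumPy

d_x = cuda.to_device(x)                        # explicit device memory management
d_out = cuda.device_array_like(d_x)
threads = 256
blocks = (x.size + threads - 1) // threads
scale[blocks, threads](d_out, d_x, 2.0)        # launch: grid of blocks x threads
print(d_out.copy_to_host()[:4])
```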

Fundamentals of Accelerated Data Science

Whether you work at a software company that needs to improve customer retention, a financial services company that needs to mitigate risk, or a retail company interested in predicting customer purchasing behavior, your organization is tasked with preparing, managing, and gleaning insights from large volumes of data without wasting critical resources. Traditional CPU-driven data science workflows can be cumbersome, but with the power of GPUs, your teams can make sense of data quickly to drive business decisions.
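As one concrete example of GPU-accelerated data science, the sketch below uses RAPIDS cuDF, whose pandas-like API keeps dataframe operations on the GPU; the column names and data are invented for illustration, and a CUDA-capable GPU with cuDF installed is assumed.

```python
# Minimal sketch of GPU dataframes with RAPIDS cuDF; data is invented for
# illustration and a CUDA-capable GPU with cuDF installed is assumed.
import cudf

purchases = cudf.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "amount": [20.0, 35.5, 12.0, 99.9, 5.0],
})

# Group, aggregate, and sort entirely on the GPU with a pandas-like API.
top_spenders = (purchases.groupby("customer_id")["amount"]
                .sum()
                .sort_values(ascending=False))
print(top_spenders.to_pandas())
```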

Fundamentals of Deep Learning

Businesses worldwide are using artificial intelligence to solve their greatest challenges. Healthcare professionals use AI to enable more accurate, faster diagnoses in patients. Retail businesses use it to offer personalized customer shopping experiences. Automakers use it to make personal vehicles, shared mobility, and delivery services safer and more efficient. Deep learning is a powerful AI approach that uses multi-layered artificial neural networks to deliver state-of-the-art accuracy in tasks such as object detection, speech recognition, and language translation. Using deep learning, computers can learn and recognize patterns from data that are considered too complex or subtle for expert-written software.
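For readers new to the field, here is a minimal sketch of what "multi-layered artificial neural network" means in code: a small PyTorch classifier built from stacked layers. The layer sizes and the digit-classification framing are illustrative assumptions, not course material.

```python
# Minimal sketch of a multi-layered neural network in PyTorch; sizes are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(                 # each Linear + ReLU pair is one learned layer
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 10),                 # 10 output scores, e.g. digit classes
)

logits = model(torch.randn(32, 784))   # a batch of 32 flattened 28x28 images
print(logits.shape)                    # torch.Size([32, 10])
```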