Research Engineer, Large Language Model (LLM) Acceleration

Tenstorrent

Cambridge, MA, USA
Posted on Jul 22, 2023
Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.
We're looking for ML Researchers/Engineers to help us make LLMs go fast on Tenstorrent's leading-edge AI platform. If you're enthusiastic about large-scale deep learning models, are a strong software/ML engineer or researcher, and enjoy working on challenging problems, this is your opportunity to be at the bleeding edge of AI processing.

Responsibilities

  • Develop novel LLMs and primitives that take advantage of Tenstorrent's breakthrough accelerator architecture to deliver orders-of-magnitude improvements in performance and efficiency.
  • Benchmark, analyze, and characterize model performance on a novel hardware accelerator.
  • Fine-tune or retrain emerging LLMs after architectural modifications to improve performance.
  • Implement ideas from the latest research papers and validate new optimizations on novel hardware; suggest improvements to existing approaches and validate new ideas through experiments.
  • Implement ML models on Tenstorrent's PyBUDA compiler framework, analyze the results, and suggest novel architectural changes to make them more efficient.

Experience & Qualifications

  • Deep understanding of Transformer-based model architectures (e.g., BERT, GPT, LLaMA, FALCON); able to analyse performance bottlenecks in training or inference.
  • Expertise (demonstrated through prior projects or research) in large-scale foundation model (CNN/Transformer/generative) training and optimisation, including quantisation, pruning, low-rank approximation (e.g., LoRA, SVD), sparsity-aware training, early-exit models, conditional computation, sparse attention, NAS, and fine-tuning (e.g., RLHF, PEFT).
  • Excellent working knowledge of Python and PyTorch or TensorFlow (C/C++ optional, if interested in the compiler and software stack).
  • Familiarity with and passion for any of the following is a plus: ML compilers, high-performance and massively parallel systems, distributed model training and inference, and GPU or AI accelerator architecture.
Locations:
Cambridge (UK), Boston (US), Toronto (CAN)
Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that have been sanctioned by the U.S. government.
As this position will have direct and/or indirect access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations, please note that citizenship/permanent residency information and/or documentation will be required and considered as Tenstorrent moves through the employment process.