GPT-4 architecture: what we can deduce from research literature

This text is my personal opinion, developed by researching publicly available sources such as research publications and rumors. I have never worked at any of the companies whose current or future products this text speculates about. Intended audience: people with engineering experience or some basic ML knowledge who are interested in the language modeling techniques that the “GPT-4” authors at OpenAI may have chosen to implement. We need such speculation because the authors have elected to keep the technical details private, citing safety concerns and the competitive landscape....

March 14, 2023 · 8 min · 1655 words · Kirill Gadjello

Introducing ZIPSLICER📁✂️

ZIPSLICER is available on GitHub and on PyPI. Intended audience: individuals who find themselves working with torch checkpoints whose size is on the order of available CPU RAM. Introduction. Successful software design and satisfactory performance have underpinned PyTorch’s rise over the past 5 years to become the premier deep learning (hereafter DL) framework. From this simple observation, we can note that the majority of DL models start the post-training part of their lifecycle as a monolithic PyTorch checkpoint produced by a simple torch....

March 3, 2023 · 8 min · 1667 words · Kirill Gadjello

Wildcard Introduction

One day I will distill enough context to write a concise intro. For now, let’s just say I like generally useful entities.

February 4, 2023 · 1 min · 22 words · Kirill Gadjello