🎬 Intro to gen AI and LLMs
A primer on the key innovations behind the creative AI revolution.
The Rise of Creative Computers
Remember when your computer could only do boring number-crunching tasks or basic word processing? Those days are long gone. Our laptops are now mastering abilities that used to be uniquely human - like writing novels, composing music and painting stunning works of art.
This is all thanks to groundbreaking advances in artificial intelligence over the past decade. Specifically, huge leaps in "generative AI" and "large language models" are enabling computers to generate all kinds of creative content with just a few prompts.
So how is your laptop suddenly able to write you a poem? What's changed to make this possible? Keep reading for a plain English primer on the key innovations behind the creative AI revolution.
Your Digital Assistant, Explained
Let's start with the AI chatbots that have gone viral lately, like ChatGPT. Under the hood, these chatbots are powered by something called a large language model. Despite the fancy name, the basic concept is simple pattern matching.
Here's an analogy: think of how your phone's keyboard suggests the next word as you type based on the previous words. If you type "I love", it will suggest words like "you" or "it" next. That's because it has matched the pattern of those words frequently coming after "I love" in sentences.
Large language models work on the same principle, but at a much larger scale. They examine billions of sentences from books, websites and more to learn the patterns of how words fit together. Seeing all those examples helps them predict words that are likely to come next.
So if you give a language model the words "I love", it will suggest probable next words like ChatGPT does. Give it more context like "I love taking my dog to the..." and it can make an even better prediction for the next word, maybe "park".
The more words you provide, the more accurate it becomes. With a whole sentence or paragraph as context, the models can generate surprisingly human-like text by predicting the most likely next words over and over.
It's all statistics under the hood - no rules of grammar involved! The models simply recognise patterns in massive datasets.
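To make that concrete, here's a toy Python sketch of the counting idea. The tiny corpus and the phrase "i love" are made up for illustration; real models learn from billions of sentences and use neural networks rather than raw counts, but the spirit is the same.

```python
from collections import Counter

# A toy corpus standing in for the billions of sentences a real model sees.
corpus = [
    "i love you more every day",
    "i love it when that happens",
    "i love taking my dog to the park",
    "i love you and i love it here",
]

def next_word_counts(context, texts):
    """Count which words follow the given context phrase across the corpus."""
    counts = Counter()
    context_words = context.split()
    for text in texts:
        words = text.split()
        for i in range(len(words) - len(context_words)):
            if words[i:i + len(context_words)] == context_words:
                counts[words[i + len(context_words)]] += 1
    return counts

print(next_word_counts("i love", corpus).most_common(3))
# [('you', 2), ('it', 2), ('taking', 1)] - pure counting, no grammar rules
```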
Building Giant Brains
But hold on - your phone's keyboard has seen only a tiny fraction of the data that large language models are trained on. The biggest models have ingested hundreds of billions of words from books, Wikipedia and the public internet.
Processing that kind of scale requires enormous computing power. The largest language models cost millions of dollars to train using hundreds or even thousands of high-end graphics cards running complex neural network architectures.
One key innovation was the "Transformer" architecture, introduced by Google researchers in 2017. Transformers can look at tens of thousands of words of context when making predictions, where previous models maxed out at a few hundred words.
This was a game changer. With the Transformer, language models could take advantage of the massive datasets needed to get really good at generating human-like text.
Today's leading large language models have over 100 billion parameters. Parameters are essentially all the little settings that define how the model works. That's 100 billion dials, all tuned from data rather than set by hand!
No human could manually set so many dials. Instead, the models continuously tune themselves through a training process called "backpropagation". The coolest part is that no humans directly program what content the models produce - they learn solely by identifying patterns and statistics in the data.
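Here's a highly simplified illustration of what "tuning dials from data" means. It adjusts a single made-up parameter with gradient descent, the same basic idea backpropagation applies to billions of parameters at once - a conceptual sketch, not actual LLM training code.

```python
# Toy illustration of automatic "dial tuning": one parameter, adjusted by
# following the gradient of the error - the same idea backpropagation
# applies to billions of parameters at once.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs x and targets y (here y = 2x)
w = 0.0             # the single "dial", starting from scratch
learning_rate = 0.05

for step in range(200):
    grad = 0.0
    for x, y in data:
        prediction = w * x
        error = prediction - y
        grad += 2 * error * x                 # derivative of squared error w.r.t. w
    w -= learning_rate * grad / len(data)     # nudge the dial downhill

print(round(w, 3))   # close to 2.0 - learned from data, never set by hand
```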
Creativity Through Clever Tricks
But hang on - if language models just predict the most statistically likely words, how can they produce creative original content?
The answer lies in some clever tricks that add random noise to the model's predictions. A technique called sampling lets the model generate text that varies each time rather than repeating the same exact output.
Tuning the "temperature" controls the amount of randomness. A high temperature makes the model more adventurous, picking less obvious words. This randomness enables the creation of novels, poems and jokes that don't directly copy anything the model has seen before.
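Here's a rough Python sketch of how temperature changes the picks. The candidate words and their scores are invented for illustration; a real model produces scores over tens of thousands of possible next words.

```python
import math
import random

# Hypothetical raw scores the model might assign to candidate next words.
scores = {"park": 3.0, "vet": 2.0, "beach": 1.5, "moon": 0.2}

def sample_word(scores, temperature):
    """Turn scores into probabilities (softmax) and randomly sample one word."""
    scaled = {w: s / temperature for w, s in scores.items()}
    total = sum(math.exp(s) for s in scaled.values())
    probs = {w: math.exp(s) / total for w, s in scaled.items()}
    return random.choices(list(probs), weights=probs.values())[0]

print([sample_word(scores, temperature=0.2) for _ in range(5)])  # almost always "park"
print([sample_word(scores, temperature=1.5) for _ in range(5)])  # more adventurous picks
```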
Of course, the model can't always be trusted to behave itself. Biases in training data could lead to toxic outputs. Large providers tend to use human feedback to fine-tune the models to converse politely - but it's an ongoing challenge.
The Inner Workings of Text Generation
Now that we've covered the basics, let's really dive into the nitty-gritty details of how LLMs generate text that seems human-written.
It all starts with representing each word as a vector - essentially just a list of numbers. Using vectors allows the model to understand relationships between words mathematically.
For example, "king" and "queen" have similar vectors, while "king" and "banana" are far apart. The numbers encode information about each word's meaning based on patterns in massive text corpora.
The model is trained to predict the next vector (word) in a sequence by analysing previous vectors using a transformer architecture. Transformers have attention mechanisms that focus on the most relevant context when making predictions. This lets them handle long sequences running to tens of thousands of words.
Here's a simplified example (a rough code sketch of the attention step follows the list):
The model is given the input vectors (words) for "The quick brown fox". Its job is to predict the next most likely vector (word). To do this, it:
1. Uses self-attention to focus on the most important context - probably "brown fox"
2. Adds information to each vector to clarify meanings - like labelling nouns, verbs etc.
3. Uses feed-forward layers to match learned patterns - perhaps "brown fox" frequently precedes "jumped"
4. Outputs "jumped" as the predicted next vector
This happens for each and every word, billions of times over massive datasets. The weights (parameters) are continually adjusted through backpropagation to improve the predictions.
When trained, the model can generate new text by predicting the most probable next vectors iteratively. But always picking the top result would just replicate training data.
Instead, models use temperature and sampling settings to sometimes pick less likely vectors. This randomness enables novel outputs.
Higher temperature settings make the model more creative, but risk unrealistic or ungrammatical results. Proper tuning balances creativity versus coherence.
The training data largely determines the knowledge and writing style the model adopts. More conversational datasets like Reddit produce casual, friendly tones. Academic texts yield more formal language.
Ethical risks stem from biases in datasets reflecting unfair societal standards. But curating the data and fine-tuning the model on inclusive texts can help mitigate problems.
While not fully understood, the interplay of self-attention, massive data and tuning tricks gives rise to the text generation abilities we interact with today. The outputs continue improving as data and compute scale up.
How Images Are Generated
Text generation is just the tip of the iceberg. Some of the most stunning outputs from AI systems are brand new photographs, paintings, album covers and other images created entirely by algorithms.
But how can a bunch of numbers generate a beautiful scenic portrait or a photorealistic picture of someone's imaginary friend? Let's unpack how computer vision and neural networks make it possible.
Patterns in Pixels
It all starts with recognising that images are just numbers to a computer. A digital image is made up of pixels - tiny dots of colour. Each pixel has a numeric code for its exact shade and brightness.
String all those pixel codes together, and you have a full image represented numerically. So just like with words in language models, neural networks can be trained to identify visual patterns in pixels across millions of images.
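For instance, a couple of lines of Python (using the Pillow and NumPy libraries, with "cat.jpg" as a placeholder filename) show that an image really is just a grid of numbers:

```python
from PIL import Image   # Pillow, a common image-loading library
import numpy as np

# "cat.jpg" is a placeholder - any image file on disk works.
img = np.array(Image.open("cat.jpg"))

print(img.shape)   # e.g. (480, 640, 3): height, width, and 3 colour channels
print(img[0, 0])   # the top-left pixel, e.g. [142 108  73] for red, green, blue
```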
For example, one layer of a network might recognise simple edges and shapes. The next layer might spot patterns like circles and triangles. Further layers can identify more complex shapes like eyes, ears and paws.
After several layers, the network can tell if the full set of pixels makes up a cat picture or not. This is called image classification.
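As a rough sketch of that layered idea, here's a tiny image classifier in PyTorch. The layer sizes are illustrative and the network is untrained, so it can't actually recognise cats yet - it would need to learn from labelled photos first:

```python
import torch
from torch import nn

# An illustrative stack of layers: early layers pick up edges and textures,
# later layers combine them into shapes, and a final layer scores "cat or not".
classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # edges and simple textures
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # corners, curves, small shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),                   # two scores: "cat" vs "not cat"
)

fake_photo = torch.randn(1, 3, 64, 64)   # stand-in for a 64x64 colour image
print(classifier(fake_photo))            # untrained, so the scores are meaningless for now
```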
Now flip that process around. If a neural network has learned to recognise all the features that make up a cat picture, it can also generate pixel codes from scratch that include those features. The result looks like a new cat rather than random noise!
Noisy Origins
Generating brand new images involves training models called Generative Adversarial Networks (GANs). The "adversarial" part refers to two networks in competition: a generator that creates images and a discriminator that tries to tell the fakes from real photographs. Each pushes the other to improve, until the generated images become hard to tell apart from the real thing.
They work by starting with pure random noise - just random pixels. Then they gradually refine that noise until it forms a coherent image. It's a bit like an artist roughly sketching a landscape then repeatedly adding details to the sketch.
One key trick that helps steer GANs is conditioning the noise on text tags or descriptions. As the model turns noisy pixels into a cat, you also provide the text "an adorable cartoon cat" to guide the image in that direction. The text acts as a clue to help select the right features out of the noise.
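Here's a heavily simplified sketch of that adversarial game in PyTorch, using small vectors as stand-ins for photos. All the sizes and data are placeholders; real GANs use much larger convolutional networks and millions of training images:

```python
import torch
from torch import nn

# Tiny stand-ins: the generator turns random noise into a 64-number "image",
# the discriminator guesses whether an image is real or generated.
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

real_images = torch.rand(32, 64) * 2 - 1   # placeholder for a batch of real photos

# One round of the adversarial game (real training repeats this many times).
noise = torch.randn(32, 16)
fakes = generator(noise)

# 1. Teach the discriminator: real images should score 1, fakes should score 0.
d_loss = loss_fn(discriminator(real_images), torch.ones(32, 1)) + \
         loss_fn(discriminator(fakes.detach()), torch.zeros(32, 1))
d_opt.zero_grad()
d_loss.backward()
d_opt.step()

# 2. Teach the generator: produce fakes the discriminator scores as real (1).
g_loss = loss_fn(discriminator(fakes), torch.ones(32, 1))
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```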
Diffusion models are an exciting new spin on this technique. During training, noise is added to an existing image a bit at a time, and the model learns to remove just that latest added noise at each step. To generate something new, the process runs in reverse: start from pure noise and strip a little away at each step until a clean image emerges.
This approach allows for more photorealistic outputs. And conditioning on text or tags enables generating diverse, customisable images - like "a vacation photo of my family at the Eiffel Tower."
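A rough sketch of the core diffusion training step, again with small vectors standing in for images. Real systems use far larger networks, vary the noise level per step, and condition on text embeddings, all of which is omitted here:

```python
import torch
from torch import nn

# A stand-in denoiser; real systems use far larger networks plus text conditioning.
denoiser = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

clean_images = torch.rand(32, 64)   # placeholder batch of real images

# One training step: add noise, then ask the model to predict that exact noise.
noise_level = 0.3
noise = torch.randn_like(clean_images)
noisy_images = clean_images + noise_level * noise

predicted_noise = denoiser(noisy_images)
loss = nn.functional.mse_loss(predicted_noise, noise)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```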
The interplay between computer vision pattern recognition and steering text embeddings is what unlocks this new creative tool. Just describe anything you can imagine, and you can likely generate it!
Beyond Text and Images
So far we've covered how models can generate text and images. But their capabilities don't stop there. These flexible algorithms are unlocking new creativity across many mediums.
For example, large language models can translate text between dozens of languages, helping writers reach global audiences. For many common language pairs, their accuracy now rivals that of professional human translators.
You can also summarise long articles or books in just a few sentences. This speeds up research and learning. Simply ask the model to "provide a 2 paragraph summary of this 10 page report" and it will distil the key points.
Writing computer code used to be solely a human endeavour. But now AI systems can generate functions and applications from natural language prompts. For instance, you could say "write a Python program that scrapes data from Wikipedia and stores it in a CSV file." The model will generate complete code for you.
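As an illustration of sending such a prompt programmatically, here's one possible call using the OpenAI Python client. The model name is a placeholder, it assumes you have the openai package installed and an API key configured, and other providers offer similar APIs:

```python
from openai import OpenAI

client = OpenAI()  # reads your API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder - use whichever model you have access to
    messages=[
        {"role": "user",
         "content": "Write a Python program that scrapes data from Wikipedia "
                    "and stores it in a CSV file."},
    ],
)

print(response.choices[0].message.content)   # the generated code comes back as plain text
```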
For audio generation, models can create instrumental music from just a text description like "a smooth hip hop beat with piano and strings". They can also add background music to podcasts and videos to set the vibe.
The latest frontier is video generation. Models are learning to create short videos from text prompts and image guidance. We're still in the early stages, but creativity seems to know no bounds!
Bias and Ethics Concerns
Of course, handing algorithms these unchecked creative powers raises many concerns. Large language models risk amplifying harmful societal biases and misinformation present in their training data.
Most providers are working actively to mitigate risks and build useful guardrails. But caution is warranted, as generative models don't inherently have human ethics or judgment.
Transparency about training practices, evaluating risks systematically and allowing user control will be crucial going forward. There is much active debate among researchers and companies on policies and best practices.
For all their capabilities, these systems are still just narrow artificial intelligence optimised for generating content. They don't have true understanding or reasoning abilities akin to human intelligence - yet.
Democratising Creativity
Despite the risks, the creative potential unlocked by recent advances in generative AI is enormous. What used to require specialised skills like writing, graphic design or translation can now be done by anyone.
These technologies are also becoming widely accessible. Large companies offer APIs and services to build with. Enthusiasts even train models using free resources and share them openly online.
We're only beginning to explore activities that will be reimagined by language models and neural networks. From productivity tools to artistic outlets, the possibilities span every domain.
At the end of the day, these models don't have imaginations - we humans provide that. AI is simply helping execute our creative visions and augment our abilities.
So while healthy skepticism is warranted, an open mind can discern many beneficial and enriching opportunities. The humans behind the algorithms still have everything to play for.