Base64 Encoding Explained: When and Why to Use It

· 9 min read

If you have spent any time working with web APIs, email systems, or data embedding in HTML and CSS, you have almost certainly encountered Base64 encoding. It appears in data URIs, JSON Web Tokens, email attachments, and countless API payloads. Yet many developers use Base64 without fully understanding how it works, when it is appropriate, and — critically — what it is not. This article provides a thorough, practical explanation of Base64 encoding from first principles.

What Is Base64?

Base64 is a binary-to-text encoding scheme that represents binary data using a set of 64 printable ASCII characters. It was designed to solve a specific problem: transmitting binary data through systems that were built to handle only text. Email protocols (SMTP), many older APIs, and text-based formats like XML and JSON cannot safely transport raw binary data because certain byte values would be interpreted as control characters, delimiters, or would simply be corrupted during transmission.

Base64 encodes arbitrary binary data into a safe alphabet of letters (A-Z, a-z), digits (0-9), and two additional characters (typically + and /), with = used for padding. The result is a string that can travel safely through any text-based channel without corruption.

The standard Base64 alphabet, defined in RFC 4648, consists of these 64 characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

Each character represents a value from 0 to 63 — exactly 6 bits of information. This 6-bit representation is the fundamental mechanism behind the encoding.

How the Encoding Works

The encoding process converts groups of three bytes (24 bits) into four Base64 characters (4 × 6 = 24 bits). Here is the step-by-step process:

  1. Take three bytes of input. Each byte is 8 bits, so three bytes give you 24 bits of data.
  2. Split into four 6-bit groups. Divide those 24 bits into four groups of 6 bits each.
  3. Map each 6-bit group to a Base64 character. Use the value (0–63) as an index into the Base64 alphabet to get the corresponding character.
  4. Output the four characters. These four ASCII characters represent the original three bytes.

Let us walk through a concrete example. Consider encoding the text "Hi" (two bytes: 0x48, 0x69).

Step Data
ASCII bytes H = 72 (01001000), i = 105 (01101001)
Binary stream 01001000 01101001 (16 bits)
Pad to multiple of 6 01001000 01101001 00 (18 bits, padded with two zero bits)
Split into 6-bit groups 010010 000110 100100
Decimal values 18, 6, 36
Base64 characters S, G, k
Final output (with padding) SGk=

Padding with =

Because Base64 processes input in three-byte blocks but input data is not always a multiple of three bytes, padding is needed. When the input has one remaining byte (after processing complete three-byte blocks), the output gets two Base64 characters followed by ==. When two bytes remain, the output gets three Base64 characters followed by =. When the input length is an exact multiple of three, no padding is needed.

The = padding character tells the decoder exactly how many bytes of real data the final group contains. Some implementations allow omitting padding (the decoder can infer the original length from the output length), but the padded form is standard and maximally compatible.

The 33% Size Overhead

Because three bytes of input become four bytes of output, Base64 encoding always increases data size by approximately 33 percent. A 1 MB binary file becomes roughly 1.33 MB when Base64-encoded. This overhead is the fundamental trade-off: you gain text-safe transportability at the cost of increased size. For small payloads (icons, configuration fragments, tokens), this overhead is negligible. For large files, it can be significant and should factor into your architecture decisions.

When to Use Base64

Data URIs

Data URIs allow you to embed small files directly in HTML, CSS, or JavaScript using the data: URL scheme. For example, embedding a small PNG icon in CSS:

.icon {
  background-image: url(data:image/png;base64,iVBORw0KGgo...);
}

This eliminates an HTTP request for the image, which can improve performance for small assets. The trade-off is that the Base64 string is larger than the original file and cannot be cached independently. Data URIs are most effective for assets under 4-8 KB — above that size, a separate file with proper caching is usually more efficient.

Email Attachments (MIME)

Email was designed for 7-bit ASCII text. To send binary attachments (images, PDFs, archives), MIME (Multipurpose Internet Mail Extensions) encodes them in Base64. When you attach a file to an email, your mail client Base64-encodes it, wraps it in MIME headers specifying the content type, and includes it in the message body. The recipient's client reverses the process. This is one of the original and most widespread uses of Base64, and it remains essential to how email works today.

API Payloads

REST and GraphQL APIs typically exchange data in JSON, which is a text format and cannot natively represent binary data. When an API needs to accept or return binary content — an uploaded image, a PDF document, a cryptographic signature — Base64 encoding is the standard approach. The binary data is encoded to a Base64 string and placed in a JSON field:

{
  "filename": "document.pdf",
  "content": "JVBERi0xLjQKMSAwIG9iago..."
}

Many APIs, including those from AWS, Google Cloud, and GitHub, use this pattern extensively. If you are designing an API that needs to handle binary data, consider whether Base64 in JSON is appropriate for your payload sizes, or whether multipart form data or direct binary streaming would be more efficient.

JSON Web Tokens (JWT)

JWTs are a cornerstone of modern web authentication. A JWT consists of three parts — header, payload, and signature — each Base64URL-encoded and separated by dots. The header and payload are JSON objects containing claims like user identity, token expiration, and permissions. The signature is a cryptographic hash that ensures the token has not been tampered with.

eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U

Notice the use of Base64URL encoding (discussed below) rather than standard Base64 — this is critical because JWTs frequently appear in URLs, headers, and cookies where + and / characters would cause problems.

Want to encode or decode Base64 data quickly? Our Base64 Encoder & Decoder handles both standard and URL-safe Base64, runs entirely in your browser, and supports large inputs.

Base64 Is NOT Encryption

This is perhaps the most important point in this entire article, and it cannot be stated strongly enough: Base64 is not encryption. It provides zero security.

Encoding and encryption are fundamentally different operations. Encoding transforms data into a different representation for compatibility or transport purposes — it is fully reversible by anyone without any secret key. Encryption transforms data to prevent unauthorized access — it requires a secret key to reverse.

Base64 encoding is as transparent as writing a message in a different alphabet. Anyone can decode a Base64 string instantly using freely available tools (including the browser's built-in atob() function). If you see a Base64 string in a URL, cookie, or API response, you can decode it in seconds to read the original content.

Never use Base64 to "hide" sensitive information like passwords, API keys, tokens, or personal data. If you need to protect data, use proper encryption algorithms (AES-256, ChaCha20) with secure key management. If you need to protect data in transit, use TLS/HTTPS. Base64's role is encoding for compatibility, not security.

A common misconception arises from JWTs: because the payload is Base64-encoded, some developers assume it is hidden. It is not. Anyone who intercepts a JWT can decode the payload and read its contents. The signature protects integrity (detecting tampering) but not confidentiality. Sensitive claims in JWTs should use JWE (JSON Web Encryption) or should not be included in the token at all.

URL-Safe Base64

Standard Base64 uses + and / as the 62nd and 63rd characters in its alphabet. These characters have special meanings in URLs (+ represents a space in query strings, / is a path separator), which causes problems when Base64 strings appear in URLs, query parameters, or filenames.

URL-safe Base64 (also called Base64URL, defined in RFC 4648 Section 5) solves this by replacing + with - and / with _. Padding with = is often omitted in URL-safe variants because = is also a special character in query strings. The resulting strings are safe to use directly in URLs without additional percent-encoding.

Character Standard Base64 URL-safe Base64
Index 62 + -
Index 63 / _
Padding = Often omitted

When working with JWTs, OAuth tokens, or any Base64 data that appears in URLs, always use the URL-safe variant. Most languages provide dedicated functions for this: Python's base64.urlsafe_b64encode(), Java's Base64.getUrlEncoder(), and JavaScript libraries like base64url.

Performance Considerations

While Base64 encoding and decoding are computationally inexpensive operations, the 33% size increase has real performance implications that you should consider in your architecture:

  • Bandwidth. Base64-encoded data consumes a third more bandwidth than the raw binary. For APIs that transfer large volumes of binary data, this overhead adds up in network costs and transfer times. Consider using multipart uploads or binary protocols (gRPC, WebSocket binary frames) for large payloads.
  • Memory. Both the encoding and decoding processes create intermediate string representations in memory. For very large files (tens of megabytes or more), this can cause memory pressure. Stream-based encoding — processing the data in chunks rather than loading it entirely into memory — is available in most languages and should be used for large inputs.
  • Parsing overhead in JSON. When Base64 strings are embedded in JSON, the JSON parser must process the entire string as text. For large embedded payloads, this adds parsing time on both the sender and receiver sides. If your API frequently transfers large binary objects, consider an alternative transport mechanism.
  • Caching. Data URIs embedded in CSS or HTML cannot be cached independently by the browser. If the same Base64-encoded asset appears on multiple pages, it is downloaded with each page rather than being fetched once and cached. External files with proper cache headers are more efficient for assets reused across pages.

Base64 in Different Languages

Every major programming language provides built-in or standard library support for Base64:

// JavaScript (browser)
btoa('Hello')          // encode: "SGVsbG8="
atob('SGVsbG8=')       // decode: "Hello"

// JavaScript (Node.js)
Buffer.from('Hello').toString('base64')    // "SGVsbG8="
Buffer.from('SGVsbG8=', 'base64').toString() // "Hello"

# Python
import base64
base64.b64encode(b'Hello')   # b'SGVsbG8='
base64.b64decode(b'SGVsbG8=') # b'Hello'

// Java
Base64.getEncoder().encodeToString("Hello".getBytes()) // "SGVsbG8="
new String(Base64.getDecoder().decode("SGVsbG8="))     // "Hello"

Note that JavaScript's btoa() and atob() functions only handle Latin-1 characters. For Unicode strings, you must first encode to UTF-8 bytes. Modern browsers also support the TextEncoder and TextDecoder APIs for this purpose.

When NOT to Use Base64

Understanding when to avoid Base64 is as important as knowing when to use it:

  • For security. As discussed, Base64 provides no confidentiality. Use encryption instead.
  • For large file transfers. The 33% overhead makes Base64 inefficient for large binaries. Use multipart form data, binary streaming, or dedicated file upload endpoints.
  • For data compression. Base64 makes data larger, not smaller. If you need to reduce data size, use compression algorithms (gzip, Brotli, zstd) before encoding, or use them instead of Base64 when the transport supports binary.
  • For storing data in databases. If your database supports binary columns (BLOB, BYTEA), store binary data natively rather than Base64-encoding it into a text column. The text representation wastes storage space and adds encoding/decoding overhead to every read and write.

Conclusion

Base64 is a simple, elegant solution to a fundamental problem in computing: safely transporting binary data through text-based systems. It is not glamorous, but it is everywhere — in every email you send, every JWT you parse, every data URI in your stylesheets. Understanding how it works, when it is appropriate, and what its limitations are makes you a more effective developer.

The key takeaways: Base64 is an encoding, not encryption. It increases data size by 33%. Use it when you need text-safe binary representation. Use URL-safe variants when data appears in URLs. And for large payloads, consider whether a binary transport mechanism would be more efficient.

Ready to experiment? Try our Base64 Encoder & Decoder to encode and decode data instantly in your browser — no server processing, no data collection, just fast and private Base64 conversion.