Data Encoding Fundamentals: Base64, Base64URL, and URL Encoding

Last updated: February 2025 · 12 min read

What you will learn

  • Why data encoding exists and what problems it solves
  • How Base64 encoding works, including its alphabet, padding, and 33% size overhead
  • What Base64URL is and why it matters for JWTs and URL-safe contexts
  • How URL percent-encoding follows RFC 3986 rules
  • When to use each encoding type in ad tech workflows
  • How to diagnose and fix common encoding problems

Why Encoding Exists

Data encoding is the process of transforming information from one representation into another so that it can be safely transmitted, stored, or processed by systems that have specific character or format constraints. Encoding is not encryption: it does not protect data from being read. Instead, it ensures that data survives the journey through channels that would otherwise corrupt or misinterpret certain byte values.

Consider a simple example: you need to include binary image data inside a JSON payload. JSON is a text-based format and cannot represent arbitrary bytes directly. If you attempted to paste raw binary into a JSON string, the parser would encounter control characters, null bytes, or invalid Unicode sequences and reject the input. Base64 encoding solves this by converting every three bytes of binary data into four printable ASCII characters, producing a string that JSON (or XML, or an email body, or an HTML attribute) can safely carry.
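As a minimal sketch of that round trip, the following Python snippet uses only the standard library. The field name "image" and the sample bytes are illustrative assumptions, not part of any particular API.

```python
import base64
import json

# Arbitrary binary data, including a null byte and bytes that are not valid UTF-8.
raw = bytes([0x00, 0xFF, 0x89, 0x50, 0x4E, 0x47])

# Base64 turns the bytes into printable ASCII that a JSON string can hold.
# The key "image" is a hypothetical field name chosen for this example.
payload = json.dumps({"image": base64.b64encode(raw).decode("ascii")})

# The receiver parses the JSON and reverses the Base64 step to recover the bytes.
restored = base64.b64decode(json.loads(payload)["image"])
print(restored == raw)  # True
```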

Similarly, URLs have a restricted character set. Characters like spaces, ampersands, question marks, and equals signs have special meaning in a URL's structure. If your data contains these characters, they must be percent-encoded so that the URL parser can distinguish between structural delimiters and literal data values. Without this encoding, a tracking URL that contains a redirect destination as a query parameter would break the moment the destination URL itself contained a question mark or ampersand.

Base64 Encoding in Depth

Base64 is a binary-to-text encoding scheme defined in RFC 4648. It uses an alphabet of 64 characters: the uppercase letters A through Z, the lowercase letters a through z, the digits 0 through 9, and the two symbols + and /. A 65th character, =, is used as a padding suffix.

The encoding process works by reading the input three bytes at a time (24 bits total) and splitting those 24 bits into four groups of 6 bits each. Each 6-bit group maps to one character from the Base64 alphabet. Because three input bytes produce four output characters, Base64 always increases the data size by approximately 33 percent. This overhead is the fundamental tradeoff: you gain universal text compatibility at the cost of larger payloads.
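The bit-splitting step can be sketched by hand for a single three-byte block. This is an illustration of the mechanism only; the helper name encode_block is made up for this example, and real code should call a library such as Python's base64 module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_block(three_bytes: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles full three-byte blocks only")
    # Pack the three bytes into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Read four 6-bit groups, most significant first, and map each to a character.
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))

print(encode_block(b"Man"))  # TWFu
# The hand-rolled result matches the standard library for a full block.
print(encode_block(b"Man") == base64.b64encode(b"Man").decode("ascii"))  # True
```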

Padding Explained

When the input length is not a multiple of three, padding is required. If the final block contains only one byte (8 bits), it produces two Base64 characters followed by two = padding characters. If the final block contains two bytes (16 bits), it produces three Base64 characters followed by one =. The padding ensures that every Base64 string has a length that is a multiple of four, which simplifies decoding.

For example, encoding the string "Hi" (two bytes: 0x48 0x69) produces the Base64 output "SGk=". The trailing = indicates that the final group held only two bytes, so one padding character was added to complete the four-character block. While some implementations allow omitting padding for brevity, doing so can break decoders that strictly expect it. Understanding padding behavior is essential for diagnosing truncation and corruption issues.
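The three padding cases can be checked quickly with Python's standard base64 module. The helper name below is an assumption made for this sketch.

```python
import base64

def encode_and_count_padding(data: bytes):
    """Return the Base64 text and how many '=' characters it ends with."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded, len(encoded) - len(encoded.rstrip("="))

for sample in (b"H", b"Hi", b"Hi!"):
    print(len(sample), encode_and_count_padding(sample))
# 1 ('SA==', 2)
# 2 ('SGk=', 1)
# 3 ('SGkh', 0)
```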

Base64URL: The URL-Safe Variant

Standard Base64 uses + and / in its alphabet, but both of these characters have special meaning in URLs. The + character is interpreted as a space in query strings (the application/x-www-form-urlencoded format), and / is a path separator. If you place a standard Base64 string into a URL without additional encoding, these characters will be misinterpreted and the data will be corrupted.

Base64URL, also defined in RFC 4648, solves this by replacing + with - (hyphen) and / with _ (underscore). Both replacements are unreserved characters under RFC 3986, so they carry no special meaning in any part of a URL. Base64URL also often omits the = padding characters entirely, because = separates keys from values in query strings and would otherwise have to be percent-encoded.
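To see the two alphabets side by side, the sketch below encodes a byte pair chosen specifically so that the standard output contains both + and /. It uses only Python's standard library; stripping the padding with rstrip is one common convention, not a requirement of the module.

```python
import base64

# Two bytes chosen so the 6-bit groups hit indexes 62 and 63 of the alphabet.
data = bytes([0xFB, 0xFF])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
unpadded = url_safe.rstrip("=")  # padding is often dropped in URL contexts

print(standard)  # +/8=
print(url_safe)  # -_8=
print(unpadded)  # -_8
```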

The most prominent use of Base64URL is in JSON Web Tokens (JWTs). A JWT consists of three Base64URL-encoded segments separated by dots: the header, the payload, and the signature. Because JWTs frequently appear in URL query parameters, HTTP headers, and cookies, using standard Base64 would break token parsing in these contexts. If you are working with JWTs and your decoded output looks like garbage, verify that you are using a Base64URL decoder rather than a standard Base64 decoder. Confusing the two is one of the most common encoding mistakes in ad tech.

URL Percent-Encoding (RFC 3986)

URL encoding, formally called percent-encoding, is the mechanism defined by RFC 3986 for including arbitrary data in a URI. It works by replacing each unsafe byte with a percent sign followed by two hexadecimal digits representing the byte's value. For example, a space character (byte value 0x20) becomes %20, an ampersand (0x26) becomes %26, and a forward slash (0x2F) becomes %2F.

RFC 3986 defines a set of "unreserved characters" that do not require encoding: the uppercase and lowercase ASCII letters, the digits 0 through 9, and the four characters - . _ ~. Everything else, including reserved characters like ? & = # / : @, must be percent-encoded when it appears as data rather than as a structural delimiter. The distinction between "this character is part of the URL structure" and "this character is part of the data" is precisely what percent-encoding preserves.

In ad tech, percent-encoding appears constantly in tracking URLs, redirect chains, and macro-substituted parameters. A typical click-tracking URL might contain an encoded destination URL as a query parameter: https://tracker.example.com/click?dest=https%3A%2F%2Fwww.advertiser.com%2Flanding%3Fcampaign%3Dspring. Without encoding, the destination URL's own query parameters would merge with the tracker's parameters, making the URL unparseable.
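The click-tracking URL above can be reproduced with Python's urllib.parse. This is a sketch under the assumption that the tracker reads the destination from a query parameter named dest, as in the example URL. Passing safe="" matters, because quote() leaves / unencoded by default.

```python
from urllib.parse import parse_qs, quote, urlsplit

destination = "https://www.advertiser.com/landing?campaign=spring"

# Encode the destination exactly once, at the point where it is inserted.
tracking_url = "https://tracker.example.com/click?dest=" + quote(destination, safe="")
print(tracking_url)
# https://tracker.example.com/click?dest=https%3A%2F%2Fwww.advertiser.com%2Flanding%3Fcampaign%3Dspring

# The tracker sees one parameter, and decoding it yields the original destination.
recovered = parse_qs(urlsplit(tracking_url).query)["dest"][0]
print(recovered == destination)  # True
```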

When to Use Each Encoding Type

| Encoding | Use When | Avoid When |
|---|---|---|
| Base64 | Embedding binary data in JSON, XML, email bodies, or data URIs | Placing encoded data directly in URLs or filenames |
| Base64URL | JWTs, URL query parameters, cookies, anywhere a URL-safe string is needed | Contexts expecting standard Base64 with padding |
| URL / Percent | Encoding values in URL query strings, form data, redirect parameters | Encoding binary data; use Base64 first, then URL-encode if needed |

Using the Encoder / Decoder Tool

The Base64 & URL Encoder tool provides a single interface for encoding and decoding across all three types. Here is a step-by-step workflow for using it effectively:

  1. Paste your input into the left panel. This can be raw text, a Base64 string, a JWT, or a URL-encoded string, whatever you need to work with.
  2. Select the encoding mode. Choose Base64, Base64URL, or URL encoding depending on what transformation you need. The tool automatically detects whether your input looks encoded and suggests the appropriate decode direction.
  3. Click Encode or Decode. The output appears in the right panel instantly. For Base64URL mode, padding is handled automatically: the tool adds or strips padding as needed.
  4. Copy the result. Use the output in your logs, configuration files, tracking URLs, or debugging sessions. The tool preserves the exact byte sequence, so what you decode is exactly what was originally encoded.

Common Encoding Issues and How to Fix Them

Double Encoding

Double encoding is the most frequent encoding bug in ad tech. It occurs when a value is encoded twice. For example, a URL is percent-encoded once, then the entire string (including the percent signs) is encoded again. The result is that %20 (a single space) becomes %2520 (the literal characters %20). When the receiving system decodes once, it gets %20 instead of a space, which breaks parameter parsing. Double encoding often happens when multiple systems in a chain each apply their own encoding without checking whether the input was already encoded. The fix is to ensure that encoding is applied exactly once, at the point where the value is inserted into the URL.
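The %20 to %2520 transformation is easy to reproduce with Python's urllib.parse. The sample value is an arbitrary illustration.

```python
from urllib.parse import quote, unquote

value = "spring sale"

once = quote(value, safe="")   # correct: encoded a single time
twice = quote(once, safe="")   # bug: the percent sign itself gets encoded

print(once)            # spring%20sale
print(twice)           # spring%2520sale
print(unquote(twice))  # spring%20sale  (one decode is not enough)
```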

Wrong Base64 Variant

Decoding a Base64URL string with a standard Base64 decoder (or vice versa) either fails outright or produces corrupted output, depending on how strict the decoder is. With a lenient decoder the symptoms are subtle: the decoded data might look mostly correct but contain a few garbled characters wherever a - or _ appeared in the encoded string. If you see mostly-correct-but-slightly-wrong decoded data, check which variant was used to encode it and switch your decoder accordingly.
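A small sketch of the mismatch, using Python's standard library: with validate=True the standard decoder rejects the URL-safe characters instead of guessing, which makes the problem visible. The helper name try_standard_decode is an assumption for this example, and decoders in other languages may behave differently (for instance by silently skipping unknown characters).

```python
import base64
import binascii

original = bytes([0xFB, 0xFF, 0xFE])
token = base64.urlsafe_b64encode(original).decode("ascii")
print(token)  # -__-

def try_standard_decode(text: str):
    """Strict standard-Base64 decode; returns None when the input is rejected."""
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None

print(try_standard_decode(token))                 # None: '-' and '_' are not in the standard alphabet
print(base64.urlsafe_b64decode(token) == original)  # True: the matching variant round-trips
```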

Padding Problems

Some systems strip Base64 padding (the trailing = characters) to save bytes, while others require it. If a decoder fails with an "invalid length" or "incorrect padding" error, try adding back the missing = characters. The required number of padding characters is (4 - (length % 4)) % 4. Many modern libraries handle missing padding gracefully, but older or stricter implementations do not.
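The padding formula translates directly into a small helper. This is a sketch with a hypothetical function name. It also rejects lengths that leave a remainder of 1, since no valid Base64 string has that length and padding cannot repair it.

```python
import base64

def restore_padding(text: str) -> str:
    """Append the '=' characters that were stripped from a Base64 string."""
    if len(text) % 4 == 1:
        # A single leftover character cannot encode a full byte: the input is truncated.
        raise ValueError("invalid Base64 length; the string is likely truncated")
    missing = (4 - (len(text) % 4)) % 4
    return text + "=" * missing

print(restore_padding("SGk"))                    # SGk=
print(base64.b64decode(restore_padding("SGk")))  # b'Hi'
```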

Character Set Confusion

Base64 operates on bytes, not characters. When encoding text, the character encoding (UTF-8, Latin-1, UTF-16) determines which bytes are produced, and thus which Base64 output is generated. If the sender encodes using UTF-8 but the receiver decodes expecting Latin-1, the text will be garbled for any characters outside the ASCII range. Always agree on the character encoding before applying Base64, and document it in your API contracts.
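The effect is easy to demonstrate with one non-ASCII character. The sketch below encodes the same text under two character encodings and then decodes the UTF-8 bytes with the wrong assumption. The sample word is arbitrary.

```python
import base64

text = "caf\u00e9"  # "cafe" with an acute accent on the final e

as_utf8 = base64.b64encode(text.encode("utf-8")).decode("ascii")
as_latin1 = base64.b64encode(text.encode("latin-1")).decode("ascii")
print(as_utf8 != as_latin1)  # True: different bytes, so different Base64 output

# Sender used UTF-8, receiver assumes Latin-1: the accented letter becomes two characters.
garbled = base64.b64decode(as_utf8).decode("latin-1")
print(len(text), len(garbled))  # 4 5

# Decoding with the agreed character encoding recovers the text.
print(base64.b64decode(as_utf8).decode("utf-8") == text)  # True
```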

Encoding in Ad Tech Workflows

Encoding plays a central role in several ad tech workflows. Tracking URLs are the most common example: when an ad server constructs a click-tracking URL, it must percent-encode the destination URL so that the tracker can parse its own query parameters separately from the destination's parameters. If the destination URL is not encoded, the combined URL becomes ambiguous and the redirect will fail or go to the wrong page.

JWT tokens are another critical use case. Many ad tech platforms use JWTs for authentication and bid request signing. These tokens use Base64URL encoding for their header and payload segments. When debugging a bid request that fails authentication, decoding the JWT to inspect its claims (issuer, audience, expiration time) is the first diagnostic step.
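For that first diagnostic step, a JWT's header and claims can be inspected with a few lines of standard-library Python. This sketch only decodes; it does not verify the signature, so it must not be used to decide whether a token is trustworthy. The function names, the sample claim values, and the placeholder signature are all made up for illustration.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    """Decode a Base64URL segment, restoring any stripped padding first."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded Base64URL, the form JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def inspect_jwt(token: str) -> dict:
    """Return the decoded header and claims WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return {
        "header": json.loads(b64url_decode(header_b64)),
        "claims": json.loads(b64url_decode(payload_b64)),
    }

# Build a sample token with hypothetical values so the example is self-contained.
sample_header = {"alg": "HS256", "typ": "JWT"}
sample_claims = {"iss": "dsp.example.com", "aud": "exchange.example.com", "exp": 1767225600}
sample_token = ".".join([
    b64url_encode(json.dumps(sample_header).encode("utf-8")),
    b64url_encode(json.dumps(sample_claims).encode("utf-8")),
    b64url_encode(b"placeholder-signature"),  # not a real signature
])

decoded = inspect_jwt(sample_token)
print(decoded["claims"]["iss"], decoded["claims"]["exp"])
```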

API payloads in programmatic advertising frequently carry Base64-encoded binary data. For example, some integrations embed serialized objects as Base64-encoded strings within the JSON bid request. Note that in OpenRTB 2.x the native ad request itself is carried as a JSON-encoded string rather than as Base64, so confirm the encoding of each field against the relevant specification. Decoding these embedded values is necessary to inspect and validate the underlying data structures.

Step-by-Step Diagnostic Approach

When you encounter an encoding issue in the field, follow this systematic approach to identify and resolve it:

  1. Identify the encoding type. Look at the string. Does it contain +, /, and =? It is likely standard Base64. Does it contain - and _ with no padding? It is likely Base64URL. Does it contain % followed by hex digits? It is percent-encoded.
  2. Decode one layer at a time. Apply the appropriate decoding operation once and examine the result. If the result still looks encoded (for example, you see %20 in the decoded output), decode again. Count the layers, because the count reveals whether double or triple encoding occurred.
  3. Check the character encoding. If the decoded text contains garbled characters, try decoding the bytes as UTF-8, Latin-1, or Windows-1252 to see which interpretation produces readable text.
  4. Trace the encoding chain. Identify every system that touches the data between origin and destination. Determine which system applies encoding and which applies decoding. Look for mismatches: a system that encodes but never decodes, or two systems that both encode, will produce double encoding.
  5. Fix at the source. Once you identify where the extra or incorrect encoding occurs, fix it at that point in the chain. Do not add compensating decodes downstream, because this masks the root cause and creates fragile dependencies.
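Steps 1 and 2 can be roughed out in code. The sketch below is a heuristic, not a reliable detector: short alphanumeric strings are ambiguous, and literal data that happens to contain a percent sign followed by hex digits would be over-decoded. The function names and the layer limit are assumptions made for this example.

```python
import re
from urllib.parse import unquote

PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def guess_encoding(text: str) -> str:
    """Best-effort guess at the encoding type, following step 1 above."""
    if PERCENT_ESCAPE.search(text):
        return "percent"
    if len(text) % 4 == 0 and re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", text):
        return "base64"
    if re.fullmatch(r"[A-Za-z0-9_-]+", text):
        return "base64url"
    return "unknown"

def peel_percent_layers(text: str, limit: int = 5):
    """Decode one percent-encoding layer at a time and count the layers (step 2)."""
    layers = 0
    while layers < limit and PERCENT_ESCAPE.search(text):
        text = unquote(text)
        layers += 1
    return layers, text

print(guess_encoding("SGk="))                  # base64
print(guess_encoding("-__-"))                  # base64url
print(peel_percent_layers("spring%2520sale"))  # (2, 'spring sale')
```

A layer count of 2 or more for a value that should have been encoded once points to double encoding somewhere in the chain, which is the cue to move on to steps 4 and 5.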
