Character Sets

Percent-encode a string

percent-encoding-badge cat-encoding-badge

Encode an input string with percent-encoding using the utf8_percent_encode function from the percent-encoding crate. Then decode using the percent_decode function.

use percent_encoding::{utf8_percent_encode, percent_decode, AsciiSet, CONTROLS};
use std::str::Utf8Error;

/// https://url.spec.whatwg.org/#fragment-percent-encode-set
const FRAGMENT: &AsciiSet = &CONTROLS.add(b' ').add(b'"').add(b'<').add(b'>').add(b'`');

fn main() -> Result<(), Utf8Error> {
    let input = "confident, productive systems programming";

    let iter = utf8_percent_encode(input, FRAGMENT);
    let encoded: String = iter.collect();
    assert_eq!(encoded, "confident,%20productive%20systems%20programming");
    println!("{encoded}");

    let iter = percent_decode(encoded.as_bytes());
    let decoded = iter.decode_utf8()?;
    assert_eq!(decoded, "confident, productive systems programming");
    println!("{decoded}");

    Ok(())
}

The encode set defines which bytes (in addition to non-ASCII and controls) need to be percent-encoded. The choice of this set depends on context. For example, url encodes ? in a URL path but not in a query string.

The return value of encoding is an iterator of &str slices which collect into a String.

Encode a string as application/x-www-form-urlencoded

url-badge cat-encoding-badge

Encodes a string into application/x-www-form-urlencoded syntax using the form_urlencoded::byte_serialize and subsequently decodes it with form_urlencoded::parse. Both functions return iterators that collect into a String.

use url::form_urlencoded::{byte_serialize, parse};

fn main() {
    let urlencoded: String = byte_serialize("What is ❤?".as_bytes()).collect();
    assert_eq!(urlencoded, "What+is+%E2%9D%A4%3F");
    println!("urlencoded:'{}'", urlencoded);

    let decoded: String = parse(urlencoded.as_bytes())
        .map(|(key, val)| [key, val].concat())
        .collect();
    assert_eq!(decoded, "What is ❤?");
    println!("decoded:'{}'", decoded);
}

Encode and decode hex

data-encoding-badge cat-encoding-badge

The data_encoding crate provides a HEXUPPER::encode method which takes a &[u8] and returns a String containing the hexadecimal representation of the data.

Similarly, a HEXUPPER::decode method is provided which takes a &[u8] and returns a Vec<u8> if the input data is successfully decoded.

The example below coverts &[u8] data to hexadecimal equivalent. Compares this value to the expected value.

use data_encoding::{HEXUPPER, DecodeError};

fn main() -> Result<(), DecodeError> {
    let original = b"The quick brown fox jumps over the lazy dog.";
    let expected = "54686520717569636B2062726F776E20666F78206A756D7073206F76\
        657220746865206C617A7920646F672E";

    let encoded = HEXUPPER.encode(original);
    assert_eq!(encoded, expected);

    let decoded = HEXUPPER.decode(&encoded.into_bytes())?;
    assert_eq!(&decoded[..], &original[..]);
    println!("{}", std::str::from_utf8(original).unwrap());
    println!("{expected}");

    Ok(())
}

Note that since the original is just raw bytes, the conversion to UTF8 may fail, so the Result must be checked before printing. Since this variable was initialized from a program constant, it is fine to use just use .unwrap(), but in general any error from such a conversion should be explicitly handled.

Encode and decode base64

base64-badge cat-encoding-badge

Encodes byte slice into base64 String using encode and decodes it with decode.

use std::error::Error;
use std::str;
use base64::{encode, decode};

fn main() -> Result<(), Box<dyn Error>> {
    let hello = b"hello rustaceans";
    let encoded = encode(hello);
    let decoded = decode(&encoded)?;

    println!("origin: {}", str::from_utf8(hello)?);
    println!("base64 encoded: {}", encoded);
    println!("back to origin: {}", str::from_utf8(&decoded)?);

    Ok(())
}