CSV processing

Add these crates to your own project:

cargo add csv anyhow
cargo add serde --features derive

Read CSV records

Reads standard CSV records into csv::StringRecord, a weakly typed data representation that expects valid UTF-8 rows. Alternatively, csv::ByteRecord makes no assumptions about UTF-8.

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let csv = "\
year,make,model,description
1948,Porsche,356,Luxury sports car
1967,Ford,Mustang fastback 1967,American car
";

    let mut reader = csv::Reader::from_reader(csv.as_bytes());
    for record in reader.records() {
        let record = record?;
        // println!("{:?}", record);
        println!(
            "In {}, {} built the {} model. It is a {}.",
            &record[0],
            &record[1],
            &record[2],
            &record[3]
        );
    }

    Ok(())
}

Serde deserializes data into strongly typed structures. See the csv::Reader::deserialize method.

use std::error::Error;
use serde::Deserialize;

#[derive(Deserialize)]
struct Record {
    year: u16,
    make: String,
    model: String,
    description: String,
}

fn main() -> Result<(), Box<dyn Error>> {
    let csv = "\
year,make,model,description
1948,Porsche,356,Luxury sports car
1967,Ford,Mustang fastback 1967,American car
";

    let mut reader = csv::Reader::from_reader(csv.as_bytes());

    for record in reader.deserialize() {
        let record: Record = record?;
        println!(
            "In {}, {} built the {} model. It is a {}.",
            record.year,
            record.make,
            record.model,
            record.description
        );
    }

    Ok(())
}

Read CSV records with a different delimiter

Reads CSV records with a tab delimiter.

use std::error::Error;
use serde::Deserialize;
use csv::ReaderBuilder;

#[derive(Debug, Deserialize)]
struct Record {
    name: String,
    place: String,
    #[serde(deserialize_with = "csv::invalid_option")]
    id: Option<u64>,
}

fn main() -> Result<(), Box<dyn Error>> {

    let data = "\
name\tplace\tid
Mark\tMelbourne\t46
Ashley\tZurich\t92
Brian\tVenice\t
";

    let mut reader = ReaderBuilder::new().delimiter(b'\t').from_reader(data.as_bytes());
    for result in reader.deserialize::<Record>() {
        match result {
            Ok(rec) => {
                print!("name: {:10}  place: {:10}", rec.name, rec.place);
                if let Some(id) = rec.id {
                    println!("  id: {id:4}");
                } else {
                    println!("  id: none");
                }
            },
            Err(e) => {
                println!("Error: {:?}", e);
            },
        }
    }

    Ok(())
}

Filter CSV records matching a predicate

Returns only the rows from data with a field that matches query. In this case, cities in the state of California (CA).

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let query = "CA";
    let data = "\
City,State,Population,Latitude,Longitude
Kenai,AK,7610,60.5544444,-151.2583333
Oakman,AL,,33.7133333,-87.3886111
Sandfort,AL,,32.3380556,-85.2233333
West Hollywood,CA,37031,34.0900000,-118.3608333
";

    let mut rdr = csv::ReaderBuilder::new().from_reader(data.as_bytes());
    let mut wtr = csv::Writer::from_writer(std::io::stdout());

    wtr.write_record(rdr.headers()?)?;

    for result in rdr.records() {
        let record = result?;
        if record.iter().any(|field| field == query) {
            wtr.write_record(&record)?;
        }
    }

    wtr.flush()?;
    Ok(())
}

Disclaimer: this example has been adapted from the csv crate tutorial.

Handle invalid CSV data with Serde

CSV files often contain invalid data. In this example, not all the values used for id in the data can be parsed correctly as unsigned integers. This would normally generate a parse error.

The csv crate provides a custom deserializer, csv::invalid_option, which automatically converts invalid data to None values. To see the parse error, try commenting out the #[serde(deserialize_with = "csv::invalid_option")] attribute above the id: Option<u64> line in the definition of the struct Record.

use std::error::Error;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Record {
    name: String,
    place: String,
    #[serde(deserialize_with = "csv::invalid_option")]
    id: Option<u64>,
}

fn main() -> Result<(), Box<dyn Error>> {
    let data = "\
name,place,id
doug,phoenix,
mark,sydney,46.5
ashley,zurich,92
akshat,delhi,37
alisha,colombo,xyz
";

    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    for result in rdr.deserialize() {
        let record: Record = result?;
        // println!("{:?}", record);
        print!("{:10}  {:10}  ", record.name, record.place);
        if let Some(id) = record.id {
            println!("{id:2}");
        } else {
            println!("no id");
        }
    }

    Ok(())
}

Serialize records to CSV

This example shows how to serialize a Rust tuple. csv::Writer supports automatic serialization from Rust types into CSV records. write_record writes a simple record containing string data only. Data with more complex values such as numbers, floats, and options uses serialize. Because the CSV writer buffers output internally, always flush explicitly when done.

use std::error::Error;
use std::io;

fn main() -> Result<(), Box<dyn Error>> {
    let mut wtr = csv::Writer::from_writer(io::stdout());

    wtr.write_record(&["Name", "Place", "ID"])?;

    wtr.serialize(("Mark", "Sydney", 87))?;
    wtr.serialize(("Ashley", "Dublin", 32))?;
    wtr.serialize(("Akshat", "Delhi", 11))?;

    wtr.flush()?;
    Ok(())
}

Serialize records to CSV using Serde

The following example shows how to serialize custom structs as CSV records using the serde crate. Here the CSV is written directly to stdout because the writer is created from io::stdout(), but the destination could be anything that implements the std::io::Write trait. As with any file-like object, it is good practice to call flush() to ensure all buffered data has reached its final destination (stable storage in the case of a file, for example).

use std::error::Error;
use serde::Serialize;
use std::io;

#[derive(Serialize)]
struct Record<'a> {
    name: &'a str,
    place: &'a str,
    id: u64,
}

fn main() -> Result<(), Box<dyn Error>> {
    let mut wtr = csv::Writer::from_writer(io::stdout());

    let rec1 = Record { name: "Mark", place: "Melbourne", id: 56};
    let rec2 = Record { name: "Ashley", place: "Sydney", id: 64};
    let rec3 = Record { name: "Akshat", place: "Delhi", id: 98};

    wtr.serialize(rec1)?;
    wtr.serialize(rec2)?;
    wtr.serialize(rec3)?;

    wtr.flush()?;

    Ok(())
}

Transform CSV column

Transforms a CSV file containing a color name and a hex color into one with a color name and an RGB color. Uses the csv crate to read and write the CSV data, and serde to deserialize and serialize the rows to and from bytes.

See csv::Reader::deserialize, serde::Deserialize, and std::str::FromStr.

use csv::{Reader, Writer};
use serde::{de, Deserialize, Deserializer};
use std::str::FromStr;
use anyhow::anyhow;

#[derive(Debug)]
struct HexColor {
    red: u8,
    green: u8,
    blue: u8,
}

#[derive(Debug, Deserialize)]
struct Row {
    color_name: String,
    color: HexColor,
}

impl FromStr for HexColor {
    type Err = anyhow::Error;

    fn from_str(hex_color: &str) -> std::result::Result<Self, Self::Err> {
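        // Strip the leading '#' only; trim_matches would also strip a trailing one.
        let trimmed = hex_color.trim_start_matches('#');
        // Require six ASCII characters so the byte slices below cannot split a character.
        if trimmed.len() != 6 || !trimmed.is_ascii() {
            Err(anyhow!("Expected six hex digits in hex string"))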
        } else {
            Ok(HexColor {
                red: u8::from_str_radix(&trimmed[..2], 16)?,
                green: u8::from_str_radix(&trimmed[2..4], 16)?,
                blue: u8::from_str_radix(&trimmed[4..6], 16)?,
            })
        }
    }
}

impl<'a> Deserialize<'a> for HexColor {
    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
    where
        D: Deserializer<'a>,
    {
        let s = String::deserialize(deserializer)?;
        FromStr::from_str(&s).map_err(de::Error::custom)
    }
}

fn main() -> anyhow::Result<()> {
    let data = "\
color_name,color
red,#ff0000
green,#00ff00
blue,#0000FF
periwinkle,#ccccff
magenta,#ff00ff
";
    let mut out = Writer::from_writer(vec![]);
    let mut reader = Reader::from_reader(data.as_bytes());
    for result in reader.deserialize::<Row>() {
        let res = result?;
        out.serialize((
            res.color_name,
            res.color.red,
            res.color.green,
            res.color.blue,
        ))?;
    }
    let written = String::from_utf8(out.into_inner()?)?;
    assert_eq!(Some("magenta,255,0,255"), written.lines().last());
    println!("{}", written);
    Ok(())
}