CSV processing
Add these crates to your own project. The examples use serde's derive macros, so enable the derive feature:
cargo add csv anyhow
cargo add serde --features derive
Read CSV records
Reads standard CSV records into csv::StringRecord, a weakly typed data representation which expects valid UTF-8 rows. Alternatively, csv::ByteRecord makes no assumptions about UTF-8.
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let csv = "\
year,make,model,description
1948,Porsche,356,Luxury sports car
1967,Ford,Mustang fastback 1967,American car
";

    let mut reader = csv::Reader::from_reader(csv.as_bytes());
    for record in reader.records() {
        let record = record?;
        // println!("{:?}", record);
        println!(
            "In {}, {} built the {} model. It is a {}.",
            &record[0], &record[1], &record[2], &record[3]
        );
    }

    Ok(())
}
Serde deserializes data into strongly typed structures. See the csv::Reader::deserialize method.
use std::error::Error;
use serde::Deserialize;

#[derive(Deserialize)]
struct Record {
    year: u16,
    make: String,
    model: String,
    description: String,
}

fn main() -> Result<(), Box<dyn Error>> {
    let csv = "\
year,make,model,description
1948,Porsche,356,Luxury sports car
1967,Ford,Mustang fastback 1967,American car
";

    let mut reader = csv::Reader::from_reader(csv.as_bytes());

    for record in reader.deserialize() {
        let record: Record = record?;
        println!(
            "In {}, {} built the {} model. It is a {}.",
            record.year, record.make, record.model, record.description
        );
    }

    Ok(())
}
Read CSV records with different delimiter
Reads CSV records with a tab delimiter.
use std::error::Error;
use serde::Deserialize;
use csv::ReaderBuilder;

#[derive(Debug, Deserialize)]
struct Record {
    name: String,
    place: String,
    #[serde(deserialize_with = "csv::invalid_option")]
    id: Option<u64>,
}

fn main() -> Result<(), Box<dyn Error>> {
    let data = "\
name\tplace\tid
Mark\tMelbourne\t46
Ashley\tZurich\t92
Brian\tVenice\t
";

    let mut reader = ReaderBuilder::new()
        .delimiter(b'\t')
        .from_reader(data.as_bytes());
    for result in reader.deserialize::<Record>() {
        match result {
            Ok(rec) => {
                print!("name: {:10} place: {:10}", rec.name, rec.place);
                if let Some(id) = rec.id {
                    println!(" id: {id:4}");
                } else {
                    println!(" id: none");
                }
            },
            Err(e) => {
                println!("Error: {:?}", e);
            },
        }
    }

    Ok(())
}
Filter CSV records matching a predicate
Returns only the rows from data with a field that matches query. In this case, the query selects cities in the state of California (CA).
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let query = "CA";
    let data = "\
City,State,Population,Latitude,Longitude
Kenai,AK,7610,60.5544444,-151.2583333
Oakman,AL,,33.7133333,-87.3886111
Sandfort,AL,,32.3380556,-85.2233333
West Hollywood,CA,37031,34.0900000,-118.3608333
";

    let mut rdr = csv::ReaderBuilder::new().from_reader(data.as_bytes());
    let mut wtr = csv::Writer::from_writer(std::io::stdout());

    wtr.write_record(rdr.headers()?)?;

    for result in rdr.records() {
        let record = result?;
        if record.iter().any(|field| field == query) {
            wtr.write_record(&record)?;
        }
    }

    wtr.flush()?;
    Ok(())
}
Disclaimer: this example has been adapted from the csv crate tutorial.
Handle invalid CSV data with Serde
CSV files often contain invalid data. In this example, not all the values used for id in the data can be parsed as unsigned integers, which would normally generate a parse error. The csv crate provides a custom deserializer, csv::invalid_option, which automatically converts invalid data to None values. Try commenting out the #[serde(deserialize_with = "csv::invalid_option")] attribute above the id: Option<u64> field of struct Record to see the parse error.
use std::error::Error;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Record {
    name: String,
    place: String,
    #[serde(deserialize_with = "csv::invalid_option")]
    id: Option<u64>,
}

fn main() -> Result<(), Box<dyn Error>> {
    let data = "\
name,place,id
doug,phoenix,
mark,sydney,46.5
ashley,zurich,92
akshat,delhi,37
alisha,colombo,xyz
";

    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    for result in rdr.deserialize() {
        let record: Record = result?;
        // println!("{:?}", record);
        print!("{:10} {:10} ", record.name, record.place);
        if let Some(id) = record.id {
            println!("{id:2}");
        } else {
            println!("no id");
        }
    }

    Ok(())
}
Serialize records to CSV
This example shows how to serialize a Rust tuple. csv::Writer supports automatic serialization from Rust types into CSV records. write_record writes a simple record containing string data only. Data with more complex values such as numbers, floats, and options is written with serialize. Since the CSV writer uses an internal buffer, always explicitly flush when done.
use std::error::Error;
use std::io;

fn main() -> Result<(), Box<dyn Error>> {
    let mut wtr = csv::Writer::from_writer(io::stdout());

    wtr.write_record(&["Name", "Place", "ID"])?;

    wtr.serialize(("Mark", "Sydney", 87))?;
    wtr.serialize(("Ashley", "Dublin", 32))?;
    wtr.serialize(("Akshat", "Delhi", 11))?;

    wtr.flush()?;
    Ok(())
}
Serialize records to CSV using Serde
The following example shows how to serialize custom structs as CSV records using the serde crate. Here the CSV is written directly to stdout because the writer is created from io::stdout(), but this could instead be anything that implements the std::io::Write trait. As with any file-like object, it is good practice to call flush() to ensure all buffered data has made it to the final destination (stable storage in the case of a file, for example).
use std::error::Error;
use serde::Serialize;
use std::io;

#[derive(Serialize)]
struct Record<'a> {
    name: &'a str,
    place: &'a str,
    id: u64,
}

fn main() -> Result<(), Box<dyn Error>> {
    let mut wtr = csv::Writer::from_writer(io::stdout());

    let rec1 = Record { name: "Mark", place: "Melbourne", id: 56 };
    let rec2 = Record { name: "Ashley", place: "Sydney", id: 64 };
    let rec3 = Record { name: "Akshat", place: "Delhi", id: 98 };

    wtr.serialize(rec1)?;
    wtr.serialize(rec2)?;
    wtr.serialize(rec3)?;

    wtr.flush()?;

    Ok(())
}
Transform CSV column
Transforms a CSV file containing a color name and a hex color into one with a color name and an RGB color. Uses the csv crate to read and write the CSV data, and serde to deserialize rows from bytes and serialize them back. See csv::Reader::deserialize, serde::Deserialize, and std::str::FromStr.
use csv::{Reader, Writer};
use serde::{de, Deserialize, Deserializer};
use std::str::FromStr;
use anyhow::anyhow;

#[derive(Debug)]
struct HexColor {
    red: u8,
    green: u8,
    blue: u8,
}

#[derive(Debug, Deserialize)]
struct Row {
    color_name: String,
    color: HexColor,
}

impl FromStr for HexColor {
    type Err = anyhow::Error;

    fn from_str(hex_color: &str) -> std::result::Result<Self, Self::Err> {
        let trimmed = hex_color.trim_matches('#');
        if trimmed.len() != 6 {
            Err(anyhow!("Invalid length of hex string"))
        } else {
            Ok(HexColor {
                red: u8::from_str_radix(&trimmed[..2], 16)?,
                green: u8::from_str_radix(&trimmed[2..4], 16)?,
                blue: u8::from_str_radix(&trimmed[4..6], 16)?,
            })
        }
    }
}

impl<'a> Deserialize<'a> for HexColor {
    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
    where
        D: Deserializer<'a>,
    {
        let s = String::deserialize(deserializer)?;
        FromStr::from_str(&s).map_err(de::Error::custom)
    }
}

fn main() -> anyhow::Result<()> {
    let data = "\
color_name,color
red,#ff0000
green,#00ff00
blue,#0000FF
periwinkle,#ccccff
magenta,#ff00ff
"
    .to_owned();
    let mut out = Writer::from_writer(vec![]);
    let mut reader = Reader::from_reader(data.as_bytes());
    for result in reader.deserialize::<Row>() {
        let res = result?;
        out.serialize((
            res.color_name,
            res.color.red,
            res.color.green,
            res.color.blue,
        ))?;
    }
    let written = String::from_utf8(out.into_inner()?)?;
    assert_eq!(Some("magenta,255,0,255"), written.lines().last());
    println!("{}", written);
    Ok(())
}
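The FromStr implementation above slices the input by byte index after checking only its length, so a six-byte string containing a multi-byte character could panic, and trim_matches('#') strips any number of '#' characters from both ends. A stricter parsing step might look like the sketch below. It is a standalone variant rather than the recipe's own code: it uses only the standard library and, as an assumption, a plain String as the error type instead of anyhow::Error.

```rust
use std::ops::Range;
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct HexColor {
    red: u8,
    green: u8,
    blue: u8,
}

impl FromStr for HexColor {
    // Assumption for this sketch: a String error keeps the example std-only.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept at most one leading '#'.
        let hex = s.strip_prefix('#').unwrap_or(s);
        // Requiring ASCII hex digits guarantees that the byte-index slices
        // below fall on character boundaries and rules out signs such as '+'.
        if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("expected 6 hex digits, got {s:?}"));
        }
        // Parses one two-digit channel from the validated string.
        let channel = |range: Range<usize>| {
            u8::from_str_radix(&hex[range], 16).map_err(|e| format!("{s:?}: {e}"))
        };
        Ok(HexColor {
            red: channel(0..2)?,
            green: channel(2..4)?,
            blue: channel(4..6)?,
        })
    }
}

fn main() {
    for input in ["#ccccff", "0000FF", "#ff00", "a€bc"] {
        match input.parse::<HexColor>() {
            Ok(color) => println!("{input}: {color:?}"),
            Err(e) => println!("{input}: error: {e}"),
        }
    }
}
```

Because the error type implements Display, this version would still work with de::Error::custom in a Deserialize implementation like the one above.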