
How to parse a csv file in java

Robson edited this page Dec 22, 2018 · 6 revisions

TL;DR

Use a library, such as my Java Csv Parser.

How not to do it

It's tempting to think you can just write a quick loop that reads the file line by line and splits each line on the comma.

BufferedReader + split

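import java.io.BufferedReader;
import java.nio.file.Files;

// Naive approach: one row per physical line, fields split on every comma
try (BufferedReader reader = Files.newBufferedReader(path)) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] row = line.split(",");
        doSomeStuff(row);
    }
}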

This only works if you can guarantee that no value contains a line break, that no field is enclosed in quotes, and you don't mind it being slow.
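To see why those conditions matter, here is a small self-contained demo. The class name NaiveSplitDemo and the sample data are made up for illustration; it runs the same readLine + split approach over an in-memory CSV string with two logical records, one with a comma inside quotes and one with a line break inside quotes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class NaiveSplitDemo {

    // Naive parsing: one row per physical line, fields split on every comma
    public static List<String[]> naiveParse(String csv) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(","));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Two logical records: a quoted comma, then a quoted line break
        String csv = "\"Smith, John\",42\n"
                   + "\"Line one\nLine two\",7\n";
        List<String[]> rows = naiveParse(csv);
        // We wanted 2 rows of 2 fields each
        System.out.println(rows.size());        // 3, the quoted line break starts a new "row"
        System.out.println(rows.get(0).length); // 3, "Smith, John" was split in two
    }
}
```

Both records come back mangled, and nothing fails loudly, which is what makes this approach dangerous on real-world files.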

Using a library

There is a plethora of libraries available that let you read the file easily without having to worry about quoted fields that span multiple lines. Most of them will also deliver better performance than your handwritten code.

Most of them also provide object-mapping capabilities. We will take a quick look at three of them.
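Object mapping means turning each String[] row into a typed object. Done by hand, it looks roughly like the sketch below; MyObject, its id/name properties and the header layout are hypothetical, and the libraries take care of this header matching and type conversion for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManualMapping {

    // Hypothetical target bean
    public static class MyObject {
        private int id;
        private String name;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Maps rows to objects by looking up each column's position in the header row.
    // Assumes both "id" and "name" columns are present.
    public static List<MyObject> map(String[] header, List<String[]> rows) {
        List<String> columns = Arrays.asList(header);
        int idIdx = columns.indexOf("id");
        int nameIdx = columns.indexOf("name");
        List<MyObject> result = new ArrayList<>();
        for (String[] row : rows) {
            MyObject o = new MyObject();
            o.setId(Integer.parseInt(row[idIdx])); // type conversion is also on you
            o.setName(row[nameIdx]);
            result.add(o);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] header = {"name", "id"};
        List<String[]> rows = Arrays.asList(
            new String[] {"Alice", "1"},
            new String[] {"Bob", "2"});
        for (MyObject o : map(header, rows)) {
            System.out.println(o.getId() + " " + o.getName()); // "1 Alice", then "2 Bob"
        }
    }
}
```

Even this toy version has to deal with column order, missing columns and parsing numbers, which is exactly the boilerplate the mapping features of these libraries remove.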

Opencsv

Although it is not really maintained anymore, it has good support for different file formats and provides both a reader and a writer. There are now a few forks on Google Code and GitHub.

jackson-dataformat-csv

Jackson is the best JSON parser I know of, and dataformat-csv is an extension that maps CSV to objects. It's very fast and highly configurable. However, because it's designed to map to objects, it actually takes more declarations just to get a String[].

SimpleFlatMapper

SimpleFlatMapper is my project and is aimed at mapping flat records to objects. As part of that, I also wrote a simple CSV parser that is essentially a state machine iterating over the characters. It works pretty well and gives the best performance overall.
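The state-machine idea can be sketched in a few dozen lines. This is not SimpleFlatMapper's code, only a minimal illustration of the technique, assuming a comma separator and RFC 4180-style quoting (fields wrapped in double quotes, with "" as an escaped quote). The class name CsvStateMachine is hypothetical. It reads character by character and switches between an unquoted and a quoted state, so commas and line breaks inside quotes stay part of the field.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvStateMachine {

    // Parses the whole input into rows; assumes ',' as separator and '"' as quote
    public static List<List<String>> parse(String input) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false; // the only state: inside or outside a quoted section

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < input.length() && input.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and line breaks are kept verbatim
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                row.add(field.toString()); // end of field
                field.setLength(0);
            } else if (c == '\n') {
                row.add(field.toString()); // end of field and end of row
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else if (c != '\r') {        // drop the \r of \r\n line endings
                field.append(c);
            }
        }
        // Flush the last row if the input does not end with a line break
        if (field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "\"Smith, John\",42\n\"Line one\nLine two\",7\n\"say \"\"hi\"\"\",\n";
        for (List<String> row : parse(csv)) {
            System.out.println(row.size() + " fields: " + row);
        }
    }
}
```

A production parser would stream from a Reader with a buffer rather than take the whole input as a String, and would need a policy for malformed input such as an unterminated quote; this sketch keeps only the core state transitions.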

It's a very complete CSV library that performs very well on very large files, but less well on files with a more typical number of rows. I will skip the sample for that one for now, as it probably won't be relevant for most uses.

PS: I'm still waiting for an update of the benchmark with SFM 1.0.0, which fixes a lot of performance issues. I also need to update my fork with those results.

Samples

Reading Strings

Opencsv

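import com.opencsv.CSVReader; // package name in opencsv 3.x and later
import java.io.FileReader;

// readNext() returns null once the end of the file is reached
try (CSVReader reader = new CSVReader(new FileReader(myfile))) {
    String[] nextLine;
    while ((nextLine = reader.readNext()) != null) {
        doSomeStuff(nextLine);
    }
}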

jackson-dataformat-csv

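import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import java.io.FileReader;
import java.io.Reader;

CsvMapper csvMapper = new CsvMapper();
// array wrapping is required to bind each row to a String[]
csvMapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
ObjectReader oReader = csvMapper.readerFor(String[].class);

try (Reader reader = new FileReader(myfile)) {
    MappingIterator<String[]> mi = oReader.readValues(reader);
    while (mi.hasNext()) {
        doSomeStuff(mi.next());
    }
}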

SimpleFlatMapper

// SimpleFlatMapper's CsvParser, not Jackson's class of the same name
try (Reader reader = new FileReader(myfile)) {
    CsvParser.stream(reader).forEach(this::doSomeStuff);
}

Reading Objects

Opencsv

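import com.opencsv.CSVReader; // package names in opencsv 3.x and later
import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.HeaderColumnNameMappingStrategy;
import java.io.FileReader;
import java.io.Reader;
import java.util.List;

CsvToBean<MyObject> csvToBean = new CsvToBean<>();
// maps columns to bean properties by header name; unlike
// HeaderColumnNameTranslateMappingStrategy, no explicit column mapping is required
HeaderColumnNameMappingStrategy<MyObject> strategy =
    new HeaderColumnNameMappingStrategy<>();
strategy.setType(MyObject.class);

// try-with-resources closes the reader, so no finally block is needed
try (Reader reader = new FileReader(myfile)) {
    List<MyObject> list = csvToBean.parse(strategy, new CSVReader(reader));
    for (MyObject o : list) {
        doSomeStuffWithObject(o);
    }
}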

Jackson

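import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.FileReader;
import java.io.Reader;

CsvMapper csvMapper = new CsvMapper();
// use the first line as the header; columns are bound to properties by name
CsvSchema schema = CsvSchema.emptySchema().withHeader();

ObjectReader oReader = csvMapper.readerFor(MyObject.class).with(schema);

try (Reader reader = new FileReader(myfile)) {
    MappingIterator<MyObject> mi = oReader.readValues(reader);
    while (mi.hasNext()) {
        doSomeStuffWithObject(mi.next());
    }
}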

SimpleFlatMapper 1.2.0

try (Reader reader = new FileReader(myfile)) {
    CsvParser
        .mapTo(MyObject.class)
        .stream(reader)
        .forEach(this::doSomeStuffWithObject);
}

Conclusions

  • Don't waste time hand-coding your parsing.
  • The CSV libraries are very easy to use.
  • A fluent API plus streams makes things a lot easier to use.
  • SimpleFlatMapper is faster and easier to use... says the guy who wrote it.