Package org.apache.flink.formats.csv
Class CsvReaderFormat<T>
- java.lang.Object
  - org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
    - org.apache.flink.formats.csv.CsvReaderFormat<T>
- Type Parameters:
T - The type of the returned elements.
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>, org.apache.flink.connector.file.src.reader.StreamFormat<T>
@PublicEvolving public class CsvReaderFormat<T> extends org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
A StreamFormat for reading CSV files.

The following example shows how to create a CsvReaderFormat where the schema for CSV parsing is automatically derived from the fields of a POJO class.

    CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
    FileSource<SomePojo> source =
            FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();

Note: you might need to add the @JsonPropertyOrder({field1, field2, ...}) annotation from the Jackson library to your class definition, with the field order exactly matching the order of the CSV file columns.

If you need more fine-grained control over the CSV schema or the parsing options, use the lower-level forSchema static factory method, which is based on the Jackson library utilities:

    // Declared as SerializableFunction so that it matches the forSchema parameter type.
    SerializableFunction<CsvMapper, CsvSchema> schemaGenerator =
            mapper -> mapper.schemaFor(SomePojo.class).withColumnSeparator('|');
    CsvReaderFormat<SomePojo> csvFormat =
            CsvReaderFormat.forSchema(() -> new CsvMapper(), schemaGenerator, TypeInformation.of(SomePojo.class));
    FileSource<SomePojo> source =
            FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();
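The requirement that the field order match the column order can be illustrated with a minimal, library-free sketch of positional CSV mapping. It is not how CsvReaderFormat or Jackson is implemented: the class PositionalCsvSketch, the method parseLine, and the sample field names are hypothetical, and quoting and escaping are deliberately ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-alone model; not part of Flink or Jackson.
final class PositionalCsvSketch {

    // Maps the i-th column of the line to the i-th declared field name.
    // Assumes no quoting or escaping; real CSV parsers handle both.
    static Map<String, String> parseLine(String line, char separator, List<String> fieldOrder) {
        String[] columns = line.split(Pattern.quote(String.valueOf(separator)), -1);
        if (columns.length != fieldOrder.size()) {
            throw new IllegalArgumentException(
                    "Expected " + fieldOrder.size() + " columns but found " + columns.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            record.put(fieldOrder.get(i), columns[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        String line = "Berlin|3700000";
        // Declared order matches the file: each value lands in the intended field.
        System.out.println(parseLine(line, '|', List.of("city", "population")));
        // Declared order differs from the file: the values are silently swapped.
        System.out.println(parseLine(line, '|', List.of("population", "city")));
    }
}
```

In the second call nothing fails, yet "Berlin" ends up under population. A mismatched field order can therefore go unnoticed until the data is inspected, which is presumably why the note asks for an exact match.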
- See Also:
- Serialized Form
Method Summary
Modifier and Type | Method | Description
org.apache.flink.connector.file.src.reader.StreamFormat.Reader<T> | createReader(org.apache.flink.configuration.Configuration config, org.apache.flink.core.fs.FSDataInputStream stream) |
static <T> CsvReaderFormat<T> | forPojo(Class<T> pojoType) | Builds a new CsvReaderFormat for reading CSV files mapped to the provided POJO class definition.
static <T> CsvReaderFormat<T> | forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation) | Builds a new CsvReaderFormat using a CsvSchema.
static <T> CsvReaderFormat<T> | forSchema(org.apache.flink.util.function.SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, org.apache.flink.util.function.SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation) | Builds a new CsvReaderFormat using a CsvSchema generator and CsvMapper factory.
org.apache.flink.api.common.typeinfo.TypeInformation<T> | getProducedType() |
CsvReaderFormat<T> | withIgnoreParseErrors() | Returns a new CsvReaderFormat configured to ignore all parsing errors.
Method Detail
-
forSchema
public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
Builds a new CsvReaderFormat using a CsvSchema.
- Type Parameters:
T - The type of the returned elements.
- Parameters:
schema - The Jackson CSV schema configured for parsing specific CSV files.
typeInformation - The Flink type descriptor of the returned elements.
-
forSchema
public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.util.function.SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, org.apache.flink.util.function.SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
Builds a new CsvReaderFormat using a CsvSchema generator and CsvMapper factory.
- Type Parameters:
T - The type of the returned elements.
- Parameters:
mapperFactory - The factory creating the CsvMapper.
schemaGenerator - A generator that creates and configures the Jackson CSV schema for parsing specific CSV files, from a mapper created by the mapper factory.
typeInformation - The Flink type descriptor of the returned elements.
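This overload takes a serializable factory and a serializable generator instead of ready-made mapper and schema objects. A plausible reason, stated here as an assumption rather than something this page confirms, is that the format is itself Serializable and therefore stores small serializable recipes that rebuild the mapper and schema wherever the format is deserialized. The JDK-only sketch below shows the idea. SerSupplier and SerFunction are hypothetical stand-ins for SerializableSupplier and SerializableFunction, FakeMapper stands in for CsvMapper, and a plain String stands in for CsvSchema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-ins for Flink's SerializableSupplier and SerializableFunction.
interface SerSupplier<T> extends Supplier<T>, Serializable {}

interface SerFunction<I, O> extends Function<I, O>, Serializable {}

// Stand-in for a mapper object that is deliberately NOT Serializable.
final class FakeMapper {
    String schemaFor(String typeName) {
        return "schema(" + typeName + ")";
    }
}

final class SerializableFactorySketch {

    // Serializes a value to bytes and reads it back, as shipping it to another process would.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    // Ships the two recipes, then builds the mapper and the schema on the receiving side.
    static String buildSchemaAfterShipping() {
        SerSupplier<FakeMapper> mapperFactory = () -> new FakeMapper();
        SerFunction<FakeMapper, String> schemaGenerator =
                mapper -> mapper.schemaFor("SomePojo") + " sep='|'";

        SerSupplier<FakeMapper> shippedFactory = roundTrip(mapperFactory);
        SerFunction<FakeMapper, String> shippedGenerator = roundTrip(schemaGenerator);
        return shippedGenerator.apply(shippedFactory.get());
    }

    public static void main(String[] args) {
        System.out.println(buildSchemaAfterShipping());
    }
}
```

Passing a FakeMapper instance to roundTrip directly would fail with a NotSerializableException, whereas the two lambdas survive the round trip because their target interfaces extend Serializable.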
-
forPojo
public static <T> CsvReaderFormat<T> forPojo(Class<T> pojoType)
Builds a new CsvReaderFormat for reading CSV files mapped to the provided POJO class definition. The produced reader uses default mapper and schema settings; use forSchema if you need customizations.
- Type Parameters:
T - The type of the returned elements.
- Parameters:
pojoType - The type class of the POJO.
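As a rough illustration of deriving column names from a POJO class definition, the JDK-only sketch below reads an explicit column order from an annotation and falls back to reflection when none is present. Jackson's actual introspection is more involved. The @ColumnOrder annotation (playing the role of @JsonPropertyOrder), CityPojo, and PojoSchemaSketch are all hypothetical names introduced for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical annotation playing the role that @JsonPropertyOrder plays for Jackson.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ColumnOrder {
    String[] value();
}

// Sample POJO; the annotation pins the CSV column order explicitly.
@ColumnOrder({"city", "population"})
final class CityPojo {
    public String city;
    public long population;
}

final class PojoSchemaSketch {

    // Returns the column names for a class: the explicit order if annotated,
    // otherwise whatever order reflection happens to report.
    static List<String> columnsFor(Class<?> pojoType) {
        ColumnOrder order = pojoType.getAnnotation(ColumnOrder.class);
        if (order != null) {
            return Arrays.asList(order.value());
        }
        List<String> names = new ArrayList<>();
        for (Field field : pojoType.getDeclaredFields()) {
            // Class.getDeclaredFields() does not guarantee any particular order.
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columnsFor(CityPojo.class));
    }
}
```

The fallback branch is the fragile one: without an explicit order, the derived column sequence depends on what reflection reports, so pinning the order on the class is the safer choice when the file layout is fixed.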
-
withIgnoreParseErrors
public CsvReaderFormat<T> withIgnoreParseErrors()
Returns a new CsvReaderFormat configured to ignore all parsing errors. All other options are carried over unchanged from the instance on which the method is called.
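The copy-and-modify behavior described above can be modeled with a small JDK-only sketch. It assumes that "ignoring" a parse error means skipping the malformed record and continuing with the next one, which this page does not spell out. LenientFormatSketch and its methods are hypothetical and do not reflect Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone model of an immutable format with a lenient mode; not Flink code.
final class LenientFormatSketch {
    private final char separator;
    private final boolean ignoreParseErrors;

    private LenientFormatSketch(char separator, boolean ignoreParseErrors) {
        this.separator = separator;
        this.ignoreParseErrors = ignoreParseErrors;
    }

    // Creates a strict format that fails on the first malformed line.
    static LenientFormatSketch withSeparator(char separator) {
        return new LenientFormatSketch(separator, false);
    }

    // Returns a copy with the flag set; the separator is carried over and this instance is unchanged.
    LenientFormatSketch withIgnoreParseErrors() {
        return new LenientFormatSketch(separator, true);
    }

    // Parses the second column of every line as an int.
    List<Integer> readSecondColumn(List<String> lines) {
        List<Integer> values = new ArrayList<>();
        for (String line : lines) {
            try {
                String[] columns = line.split(Pattern.quote(String.valueOf(separator)), -1);
                values.add(Integer.parseInt(columns[1].trim()));
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                if (!ignoreParseErrors) {
                    throw new IllegalArgumentException("Cannot parse line: " + line, e);
                }
                // Lenient mode (assumed semantics): drop the malformed record and continue.
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a|1", "b|oops", "c|3");
        LenientFormatSketch strict = LenientFormatSketch.withSeparator('|');
        LenientFormatSketch lenient = strict.withIgnoreParseErrors();
        System.out.println(lenient.readSecondColumn(lines)); // prints [1, 3]
    }
}
```

With the real API the equivalent call chain would be CsvReaderFormat.forPojo(SomePojo.class).withIgnoreParseErrors(). Because the method returns a new instance, the result has to be assigned or passed on; calling it and discarding the return value leaves the original format strict.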
-
createReader
public org.apache.flink.connector.file.src.reader.StreamFormat.Reader<T> createReader(org.apache.flink.configuration.Configuration config, org.apache.flink.core.fs.FSDataInputStream stream) throws IOException
- Specified by:
createReader in class org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
- Throws:
IOException
-
getProducedType
public org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
- Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>
- Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.StreamFormat<T>
- Specified by:
getProducedType in class org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>