Load an ODPS table into an org.apache.spark.sql.DataFrame.
val sqlContext = ...
val odpsOps = ...
val odpstableDF = odpsOps.loadOdpsTable(sqlContext, "odps-project", "odps-table", Array(0, 2, 3), 2)
A Spark SQL context.
The name of the ODPS project.
The name of the table the job is reading.
The zero-based indexes of the columns to load, e.g. Array(0, 2, 3).
The number of RDD partitions, which determines the read concurrency on the ODPS table.
A DataFrame containing the selected records of the ODPS table.
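The column-index argument simply picks columns by zero-based position. As a hypothetical illustration (a plain list stands in for one table row; the row values are made up), Array(0, 2, 3) keeps the first, third, and fourth columns:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnSelection {
    // Select columns by zero-based index, mirroring the Array(0, 2, 3)
    // argument of loadOdpsTable. The List<String> stands in for one table
    // row; the values here are hypothetical.
    static List<String> selectColumns(List<String> row, int[] indexes) {
        return Arrays.stream(indexes)
                     .mapToObj(row::get)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> row = List.of("1001", "alice", "beijing", "30");
        System.out.println(selectColumns(row, new int[]{0, 2, 3}));
        // prints [1001, beijing, 30]
    }
}
```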
Load a partition of an ODPS table into an org.apache.spark.sql.DataFrame.
val sqlContext = ...
val odpsOps = ...
val odpstableDF = odpsOps.loadOdpsTable(sqlContext, "odps-project", "odps-table", "odps-partition", Array(0, 2, 3), 2)
A Spark SQL context.
The name of the ODPS project.
The name of the table the job is reading.
The name of the partition, used when the job reads a Partitioned Table, e.g. pt='xxx',ds='yyy'.
The zero-based indexes of the columns to load, e.g. Array(0, 2, 3).
The number of RDD partitions, which determines the read concurrency on the ODPS table.
A DataFrame containing the selected records of the ODPS table partition.
Read an ODPS table into an org.apache.spark.rdd.RDD.
val odpsOps = ...
val odpsTable = odpsOps.readTable("odps-project", "odps-table", readFunc, 2)

def readFunc(record: Record, schema: TableSchema): Array[Long] = {
  val ret = new Array[Long](schema.getColumns.size())
  for (i <- 0 until schema.getColumns.size()) {
    ret(i) = record.getString(i).toLong
  }
  ret
}
The name of the ODPS project.
The name of the table the job is reading.
A function that converts an ODPS record into an element of the resulting org.apache.spark.rdd.RDD. The function is applied to every com.aliyun.odps.data.Record of the table.
The number of RDD partitions, which determines the read concurrency on the ODPS table.
An RDD containing all records of the ODPS table.
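The readFunc above parses every column of each record as a Long. The per-record step can be sketched in plain Java; the String[] here is a stand-in for com.aliyun.odps.data.Record, which is only available with the ODPS SDK:

```java
import java.util.Arrays;

public class ReadFuncSketch {
    // Mirror of readFunc: parse every string column of a record as a long.
    // The String[] stands in for com.aliyun.odps.data.Record; a real readFunc
    // would call record.getString(i) for each of the schema's columns.
    static long[] recordToLongs(String[] fields) {
        long[] ret = new long[fields.length];
        for (int i = 0; i < fields.length; i++) {
            ret[i] = Long.parseLong(fields[i]);
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(recordToLongs(new String[]{"1", "2", "3"})));
        // prints [1, 2, 3]
    }
}
```

Note that Long.parseLong throws NumberFormatException on non-numeric columns, so this transformation assumes every selected column holds a numeric string.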
Read a partition of an ODPS table into an org.apache.spark.rdd.RDD.
val odpsOps = ...
val odpsTable = odpsOps.readTable("odps-project", "odps-table", "odps-partition", readFunc, 2)

def readFunc(record: Record, schema: TableSchema): Array[Long] = {
  val ret = new Array[Long](schema.getColumns.size())
  for (i <- 0 until schema.getColumns.size()) {
    ret(i) = record.getString(i).toLong
  }
  ret
}
The name of the ODPS project.
The name of the table the job is reading.
The name of the partition, used when the job reads a Partitioned Table, e.g. pt='xxx',ds='yyy'.
A function that converts an ODPS record into an element of the resulting org.apache.spark.rdd.RDD. The function is applied to every com.aliyun.odps.data.Record of the partition.
The number of RDD partitions, which determines the read concurrency on the ODPS table.
An RDD containing all records of the ODPS table partition.
Read an ODPS table into an org.apache.spark.api.java.JavaRDD.
OdpsOps odpsOps = ...

static class RecordToLongs implements Function2<Record, TableSchema, List<Long>> {
  @Override
  public List<Long> call(Record record, TableSchema schema) throws Exception {
    List<Long> ret = new ArrayList<Long>();
    for (int i = 0; i < schema.getColumns().size(); i++) {
      ret.add(Long.valueOf(record.getString(i)));
    }
    return ret;
  }
}

JavaRDD<List<Long>> readData = odpsOps.readTableWithJava("odps-project", "odps-table", new RecordToLongs(), 2);
The name of the ODPS project.
The name of the table the job is reading.
A function that converts an ODPS record into an element of the resulting org.apache.spark.api.java.JavaRDD. The function is applied to every com.aliyun.odps.data.Record of the table.
The number of RDD partitions, which determines the read concurrency on the ODPS table.
A JavaRDD containing all records of the ODPS table.
Read a partition of an ODPS table into an org.apache.spark.api.java.JavaRDD.
OdpsOps odpsOps = ...

static class RecordToLongs implements Function2<Record, TableSchema, List<Long>> {
  @Override
  public List<Long> call(Record record, TableSchema schema) throws Exception {
    List<Long> ret = new ArrayList<Long>();
    for (int i = 0; i < schema.getColumns().size(); i++) {
      ret.add(Long.valueOf(record.getString(i)));
    }
    return ret;
  }
}

JavaRDD<List<Long>> readData = odpsOps.readTableWithJava("odps-project", "odps-table", "odps-partition", new RecordToLongs(), 2);
The name of the ODPS project.
The name of the table the job is reading.
The name of the partition, used when the job reads a Partitioned Table, e.g. pt='xxx',ds='yyy'.
A function that converts an ODPS record into an element of the resulting org.apache.spark.api.java.JavaRDD. The function is applied to every com.aliyun.odps.data.Record of the partition.
The number of RDD partitions, which determines the read concurrency on the ODPS table.
A JavaRDD containing all records of the ODPS table partition.
Save an RDD to an ODPS table.
val odpsOps = ...
val data: RDD[Array[Long]] = ...
odpsOps.saveToTable("odps-project", "odps-table", data, writeFunc)

def writeFunc(kv: Array[Long], record: Record, schema: TableSchema) {
  for (i <- 0 until schema.getColumns.size()) {
    record.setString(i, kv(i).toString)
  }
}
The name of the ODPS project.
The name of the table the job is writing.
An org.apache.spark.rdd.RDD that will be written into the ODPS table.
A function that writes an element of the RDD into an ODPS record. The function is applied to every element of the RDD.
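writeFunc performs the inverse of readFunc: each RDD element is rendered into the string columns of an output record. A plain-Java sketch of that per-element step, with a String[] standing in for com.aliyun.odps.data.Record (a real writeFunc would call record.setString(i, ...) instead):

```java
public class WriteFuncSketch {
    // Mirror of writeFunc: render each long field of an RDD element as the
    // string value of the corresponding record column. The String[] stands
    // in for com.aliyun.odps.data.Record.
    static String[] longsToRecord(long[] kv) {
        String[] record = new String[kv.length];
        for (int i = 0; i < kv.length; i++) {
            record[i] = Long.toString(kv[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", longsToRecord(new long[]{7, 8, 9})));
        // prints 7,8,9
    }
}
```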
Save an RDD to a partition of an ODPS table.
val odpsOps = ...
val data: RDD[Array[Long]] = ...
odpsOps.saveToTable("odps-project", "odps-table", "odps-partition", data, writeFunc, false, false)

def writeFunc(kv: Array[Long], record: Record, schema: TableSchema) {
  for (i <- 0 until schema.getColumns.size()) {
    record.setString(i, kv(i).toString)
  }
}
The name of the ODPS project.
The name of the table the job is writing.
The name of the partition, used when the job writes to a Partitioned Table, e.g. pt='xxx',ds='yyy'.
An org.apache.spark.rdd.RDD that will be written into the ODPS table.
A function that writes an element of the RDD into an ODPS record. The function is applied to every element of the RDD.
Whether to create the table partition if it does not exist.
Whether to overwrite the partition if it exists. NOTE: only overwriting a partition is supported, not a whole table.
Save an RDD to a partition of an ODPS table.
val odpsOps = ...
val data: RDD[Array[Long]] = ...
odpsOps.saveToTable("odps-project", "odps-table", "odps-partition", data, writeFunc, false)

def writeFunc(kv: Array[Long], record: Record, schema: TableSchema) {
  for (i <- 0 until schema.getColumns.size()) {
    record.setString(i, kv(i).toString)
  }
}
The name of the ODPS project.
The name of the table the job is writing.
The name of the partition, used when the job writes to a Partitioned Table, e.g. pt='xxx',ds='yyy'.
An org.apache.spark.rdd.RDD that will be written into the ODPS table.
A function that writes an element of the RDD into an ODPS record. The function is applied to every element of the RDD.
Whether to create the table partition if it does not exist.
Save an RDD to a partition of an ODPS table.
val odpsOps = ...
val data: RDD[Array[Long]] = ...
odpsOps.saveToTable("odps-project", "odps-table", "odps-partition", data, writeFunc)

def writeFunc(kv: Array[Long], record: Record, schema: TableSchema) {
  for (i <- 0 until schema.getColumns.size()) {
    record.setString(i, kv(i).toString)
  }
}
The name of the ODPS project.
The name of the table the job is writing.
The name of the partition, used when the job writes to a Partitioned Table, e.g. pt='xxx',ds='yyy'.
An org.apache.spark.rdd.RDD that will be written into the ODPS table.
A function that writes an element of the RDD into an ODPS record. The function is applied to every element of the RDD.
Save a JavaRDD to an ODPS table.
OdpsOps odpsOps = ...
JavaRDD<List<Long>> data = ...

static class SaveRecord implements Function3<List<Long>, Record, TableSchema, BoxedUnit> {
  @Override
  public BoxedUnit call(List<Long> data, Record record, TableSchema schema) throws Exception {
    for (int i = 0; i < schema.getColumns().size(); i++) {
      record.setString(i, data.get(i).toString());
    }
    return null;
  }
}

odpsOps.saveToTableWithJava("odps-project", "odps-table", data, new SaveRecord());
The name of the ODPS project.
The name of the table the job is writing.
An org.apache.spark.api.java.JavaRDD that will be written into the ODPS table.
A function that writes an element of the JavaRDD into an ODPS record. The function is applied to every element of the JavaRDD.
Save a JavaRDD to a partition of an ODPS table.
OdpsOps odpsOps = ...
JavaRDD<List<Long>> data = ...

static class SaveRecord implements Function3<List<Long>, Record, TableSchema, BoxedUnit> {
  @Override
  public BoxedUnit call(List<Long> data, Record record, TableSchema schema) throws Exception {
    for (int i = 0; i < schema.getColumns().size(); i++) {
      record.setString(i, data.get(i).toString());
    }
    return null;
  }
}

odpsOps.saveToTableWithJava("odps-project", "odps-table", "odps-partition", data, new SaveRecord(), false, false);
The name of the ODPS project.
The name of the table the job is writing.
The name of the partition, used when the job writes to a Partitioned Table, e.g. pt='xxx',ds='yyy'.
An org.apache.spark.api.java.JavaRDD that will be written into the ODPS table.
A function that writes an element of the JavaRDD into an ODPS record. The function is applied to every element of the JavaRDD.
Whether to create the table partition if it does not exist.
Whether to overwrite the partition if it exists. NOTE: only overwriting a partition is supported, not a whole table.
Save a JavaRDD to a partition of an ODPS table.
OdpsOps odpsOps = ...
JavaRDD<List<Long>> data = ...

static class SaveRecord implements Function3<List<Long>, Record, TableSchema, BoxedUnit> {
  @Override
  public BoxedUnit call(List<Long> data, Record record, TableSchema schema) throws Exception {
    for (int i = 0; i < schema.getColumns().size(); i++) {
      record.setString(i, data.get(i).toString());
    }
    return null;
  }
}

odpsOps.saveToTableWithJava("odps-project", "odps-table", "odps-partition", data, new SaveRecord(), false);
The name of the ODPS project.
The name of the table the job is writing.
The name of the partition, used when the job writes to a Partitioned Table, e.g. pt='xxx',ds='yyy'.
An org.apache.spark.api.java.JavaRDD that will be written into the ODPS table.
A function that writes an element of the JavaRDD into an ODPS record. The function is applied to every element of the JavaRDD.
Whether to create the table partition if it does not exist.
Save a JavaRDD to a partition of an ODPS table.
OdpsOps odpsOps = ...
JavaRDD<List<Long>> data = ...

static class SaveRecord implements Function3<List<Long>, Record, TableSchema, BoxedUnit> {
  @Override
  public BoxedUnit call(List<Long> data, Record record, TableSchema schema) throws Exception {
    for (int i = 0; i < schema.getColumns().size(); i++) {
      record.setString(i, data.get(i).toString());
    }
    return null;
  }
}

odpsOps.saveToTableWithJava("odps-project", "odps-table", "odps-partition", data, new SaveRecord());
The name of the ODPS project.
The name of the table the job is writing.
The name of the partition, used when the job writes to a Partitioned Table, e.g. pt='xxx',ds='yyy'.
An org.apache.spark.api.java.JavaRDD that will be written into the ODPS table.
A function that writes an element of the JavaRDD into an ODPS record. The function is applied to every element of the JavaRDD.