Run a Hadoop job without using JobConf

I can't find a single example of submitting a Hadoop job that doesn't use the deprecated JobConf class. JobClient, which hasn't been deprecated, still only supports methods that take a JobConf parameter.

Can someone please point me to an example of Java code that submits a Hadoop map/reduce job using only the Configuration class (not JobConf), and that uses the mapreduce.lib.input package instead of mapred.input?

2010-01-22 Greg Cottman

Hope this helps.

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapReduceExample extends Configured implements Tool {

    // Passes every record through unchanged and increments a custom counter per record.
    static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.getCounter("mygroup", "jeff").increment(1);
            context.write(key, value);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Build the job from the Configuration that ToolRunner injected, so that
        // generic options (-D, -conf, ...) are honored. On Hadoop 2.x and later
        // Job.getInstance(getConf()) is the non-deprecated equivalent.
        Job job = new Job(getConf());
        job.setJarByClass(MapReduceExample.class);
        job.setMapperClass(MyMapper.class);

        // The mapper (and the default identity reducer) emit LongWritable/Text.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Report failure to the caller instead of always returning 0.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Local test setup: delete previous output and use fixed input/output directories.
        FileUtils.deleteDirectory(new File("data/output"));
        args = new String[] { "data/input", "data/output" };
        System.exit(ToolRunner.run(new MapReduceExample(), args));
    }
}

2010-01-22 09:12:29 zjffdu

All three 'Job' constructors are now deprecated. The correct way is: 'Job job = Job.getInstance(getConf());'

In which version? I'm using v1.0.4 but I couldn't find this constructor.

I think this tutorial illustrates how to drop the deprecated JobConf class when using Hadoop 0.20.1.

2010-01-23 20:20:31

This is a good example with downloadable code: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html It has also been more than two years, and there is still no official documentation for the new API. Sad.

2012-04-20 17:15:45

In the old API there were three ways to submit a job. One of them was to submit the job, get back a reference to a RunningJob, and obtain the job ID from that RunningJob.

submitJob(JobConf) : only submits the job, then poll the returned handle to the RunningJob to query status and make scheduling decisions.

How can I use the new API to get a reference to a RunningJob, and from it the job ID, given that none of the new API methods return a RunningJob?

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html

Thanks
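With the new org.apache.hadoop.mapreduce API, the Job object itself serves as the handle that RunningJob used to be. A minimal sketch, assuming Hadoop 2.x or later (for Job.getInstance): Job.submit() returns without waiting for the job to finish, after which getJobID(), isComplete(), mapProgress(), reduceProgress() and isSuccessful() can be polled on the same object. The class name SubmitAndPoll, the helper submitAndPoll and the 5-second poll interval are illustrative choices, not part of Hadoop.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitAndPoll {

    // Hypothetical helper: submits an already-configured job without blocking,
    // then polls it, similar to JobClient.submitJob(JobConf) + RunningJob in the old API.
    static boolean submitAndPoll(Job job, long pollMillis)
            throws IOException, InterruptedException, ClassNotFoundException {
        job.submit();                      // returns as soon as the job is handed to the cluster
        JobID id = job.getJobID();         // set once the job has been submitted
        System.out.println("Submitted job " + id);

        while (!job.isComplete()) {
            System.out.printf("map %.0f%%, reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(pollMillis);
        }
        return job.isSuccessful();
    }

    public static void main(String[] args) throws Exception {
        // Identity job: default Mapper/Reducer and TextInputFormat/TextOutputFormat.
        Job job = Job.getInstance(new Configuration(), "submit-and-poll");
        job.setJarByClass(SubmitAndPoll.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(submitAndPoll(job, 5000) ? 0 : 1);
    }
}
```

If you only need to block until the job finishes, waitForCompletion(true) does this submit-then-poll sequence for you; calling submit() yourself is what gives you the job ID and status while the job is still running.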

2013-04-08 04:35:38 Yatin

Try using Configuration and Job. Here is an example:

(Swap in your own Mapper, Combiner, and Reducer classes and other settings.)

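import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Splits each input line into words and emits (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Word Count");

        // set jar
        job.setJarByClass(WordCount.class);

        // set Mapper, Combiner, Reducer
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        /* Optional, set a user-defined Partitioner:
         * job.setPartitionerClass(MyPartitioner.class);
         */

        // set map output and final output key/value classes
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // set input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // By default, Hadoop uses TextInputFormat and TextOutputFormat.
        // Any user-defined input or output format must extend the InputFormat/OutputFormat abstract class.
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}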

2015-03-31 09:33:08 coderz