Scan con filtro usando HBase cáscara

46

probar esto. Es algo feo, pero funciona para mí.

import org.apache.hadoop.hbase.filter.CompareFilter 
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter 
import org.apache.hadoop.hbase.filter.SubstringComparator 
import org.apache.hadoop.hbase.util.Bytes 
scan 't1', { COLUMNS => 'family:qualifier', FILTER => 
    SingleColumnValueFilter.new 
     (Bytes.toBytes('family'), 
     Bytes.toBytes('qualifier'), 
     CompareFilter::CompareOp.valueOf('EQUAL'), 
     SubstringComparator.new('somevalue')) 
}

La cáscara HBase incluirá todo lo que tiene en ~/.irbrc, lo que puede poner algo como esto en allí (no soy experto Ruby, las mejoras son bienvenidos):

# imports like above 
def scan_substr(table,family,qualifier,substr,*cols) 
    scan table, { COLUMNS => cols, FILTER => 
     SingleColumnValueFilter.new 
      (Bytes.toBytes(family), Bytes.toBytes(qualifier), 
      CompareFilter::CompareOp.valueOf('EQUAL'), 
      SubstringComparator.new(substr)) } 
end

y entonces se puede decir simplemente en la cáscara:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'

Fuente

2011-09-16 16:07:29 havanki4j

+0

Esto es de hecho super feo. Sin embargo, gracias, no pude encontrar ningún ejemplo de esto en el libro de HBase docs/book/oreilly. – mumrah

8

se utiliza la opción FILTRO de scan, como se muestra en la ayuda en el uso:

hbase(main):002:0> scan 

ERROR: wrong number of arguments (0 for 1) 

Here is some help for this command: 
Scan a table; pass table name and optionally a dictionary of scanner 
specifications. Scanner specifications may include one or more of: 
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, 
or COLUMNS. If no columns are specified, all columns will be scanned. 
To scan all members of a column family, leave the qualifier empty as in 
'col_family:'. 

Some examples: 

    hbase> scan '.META.' 
    hbase> scan '.META.', {COLUMNS => 'info:regioninfo'} 
    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} 
    hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)} 
    hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]} 

For experts, there is an additional option -- CACHE_BLOCKS -- which 
switches block caching for the scanner on (true) or off (false). By 
default it is enabled. Examples: 

    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Fuente

2011-08-31 21:03:58 Tony

28

scan 'test', {COLUMNS => ['F'],FILTER => \ 
"(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \ 
(SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

Más información se puede encontrar here. Tenga en cuenta que existen varios ejemplos en el archivo adjunto Filter Language.docx.

Fuente

2012-06-28 02:13:25 dape

+0

Creo que este lenguaje de análisis de filtro solo funciona en versiones posteriores de Hbase; en 0.90.6 (cdh 3u6) no pude obtener ninguna variación de esto para que funcione. – Mikeb

+0

Creo que es muy útil mirar el javadoc; aquí está el javadoc para 0.94: http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html – mooreds

6

Scan scan = new Scan(); 
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL); 

//in case you have multiple SingleColumnValueFilters, 
you would want the row to pass MUST_PASS_ALL conditions 
or MUST_PASS_ONE condition. 

SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter( 
        Bytes.toBytes("SOME COLUMN FAMILY"), 
        Bytes.toBytes("SOME COLUMN NAME"), 
        CompareOp.EQUAL, 
        Bytes.toBytes("SOME VALUE")); 

filter_by_name.setFilterIfMissing(true); 
//if you don't want the rows that have the column missing. 
Remember that adding the column filter doesn't mean that the 
rows that don't have the column will not be put into the 
result set. They will be, if you don't include this statement. 

list.addFilter(filter_by_name); 


scan.setFilter(list);

Fuente

2014-02-18 07:03:53 KannarKK

+0

Este código está en Java, la pregunta es sobre el shell de HBase. – Tony

3

Uno del filtro es Valuefilter que se puede utilizar para filtrar todos los valores de columna.

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

binario es uno de los comparadores utilizados dentro del filtro. Puede usar diferentes comparadores dentro del filtro según lo que desee hacer.

Puede consultar la siguiente url: http: // www.hadooptpoint.com/filters-in-hbase-shell/. Proporciona buenos ejemplos sobre cómo utilizar diferentes filtros en el shell HBase.

Fuente

2016-02-12 21:17:21

+0

Las respuestas de enlace único no son buenas. Publica un código y explícalo para ayudar. – KittMedia

Scan con filtro usando HBase cáscara

Respuesta

Cuestiones relacionadas