Understanding the TPCx-BB (Big Data Benchmark) Source Code in One Article

TPCx-BB is a big data benchmark. It models 30 application scenarios of a retailer and executes 30 queries to measure the performance, both hardware and software, of a Hadoop-based big data system. Some of the scenarios also use machine learning algorithms (clustering, linear regression, etc.). To better understand the performance of the system under test, you need a thorough understanding of the whole TPCx-BB test flow. This article analyzes the source code of the TPCx-BB test tool in detail; hopefully it helps you understand TPCx-BB.

Code Structure

The main directory ($BENCH_MARK_HOME) contains the following subdirectories:

  • bin
  • conf
  • data-generator
  • engines
  • tools


bin contains several modules, the scripts used during execution: bigBench, cleanLogs, logEnvInformation, runBenchmark, zipLogs, and so on.

conf contains two configuration files: bigBench.properties and userSettings.conf.

bigBench.properties mainly sets the workload (the benchmark phases to execute) and power_test_0 (the SQL queries to execute in the POWER_TEST phase).

The default workload is:

workload=CLEAN_ALL,ENGINE_VALIDATION_DATA_GENERATION,ENGINE_VALIDATION_LOAD_TEST,ENGINE_VALIDATION_POWER_TEST,ENGINE_VALIDATION_RESULT_VALIDATION,CLEAN_DATA,DATA_GENERATION,BENCHMARK_START,LOAD_TEST,POWER_TEST,THROUGHPUT_TEST_1,BENCHMARK_STOP,VALIDATE_POWER_TEST,VALIDATE_THROUGHPUT_TEST_1

The default power_test_0 is 1-30.

userSettings.conf holds basic settings, including the JAVA environment, default settings for the benchmark (database, engine, map_tasks, scale_factor, ...), the HADOOP environment, HDFS config and paths, and Hadoop data generation options (DFS_REPLICATION, HADOOP_JVM_ENV, ...).

data-generator contains the scripts and configuration files related to data generation, described in detail below.

engines contains the 4 engines supported by TPCx-BB: biginsights, hive, impala and spark_sql. The default engine is hive. In fact, only the hive directory is non-empty; the other three are empty, presumably because they are not finished yet.

tools contains two jar files: HadoopClusterExec.jar and RunBigBench.jar. RunBigBench.jar is a crucial file for running the TPCx-BB test; most of the program logic lives in this jar.

Data Generation

The programs and configuration for data generation live in the data-generator directory, which contains pdgf.jar and three subdirectories: config, dicts and extlib.

pdgf.jar is the Java data generation program, and it is a large codebase. config contains two configuration files: bigbench-generation.xml and bigbench-schema.xml.

bigbench-generation.xml mainly configures the raw data to be generated (not the database tables): which tables are generated, each table's name and output directory, the file suffix, the delimiter, the character encoding, and so on.

<schema name="default">
    <tables>
        <!-- not refreshed tables -->
        <!-- tables not used in benchmark, but some tables have references to them. not refreshed. Kept for legacy reasons -->
        <table name="income_band"></table>
        <table name="reason"></table>
        <table name="ship_mode"></table>
        <table name="web_site"></table>
        <!-- /tables not used in benchmark -->

        <!-- Static tables (fixed small size, generated only on node 1, skipped on others, not generated during refresh) -->
        <table name="date_dim" static="true"></table>
        <table name="time_dim" static="true"></table>
        <table name="customer_demographics" static="true"></table>
        <table name="household_demographics" static="true"></table>
        <!-- /static tables -->

        <!-- "normal" tables. split over all nodes. not generated during refresh -->
        <table name="store"></table>
        <table name="warehouse"></table>
        <table name="promotion"></table>
        <table name="web_page"></table>
        <!-- /"normal" tables.-->
        <!-- /not refreshed tables -->

        <!--
            refreshed tables. Generated on all nodes.
            Refresh tables generate extra data during refresh (e.g. add new data to the existing tables)
            In "normal"-Phase  generate table rows: [0, REFRESH_PERCENTAGE*Table.Size];
            In "refresh"-Phase generate table rows: [REFRESH_PERCENTAGE*Table.Size+1, Table.Size].
            Has effect only if ${REFRESH_SYSTEM_ENABLED}==1.
        -->
        <table name="customer">
            <scheduler name="DefaultScheduler">
                <partitioner name="pdgf.core.dataGenerator.scheduler.TemplatePartitioner">
                    <prePartition><![CDATA[
                        if(${REFRESH_SYSTEM_ENABLED}>0){
                            int tableID = table.getTableID();
                            int timeID = 0;
                            long lastTableRow=table.getSize()-1;
                            long rowStart;
                            long rowStop;
                            boolean exclude=false;
                            long refreshRows=table.getSize()*(1.0-${REFRESH_PERCENTAGE});
                            if(${REFRESH_PHASE}>0){
                                //Refresh part
                                rowStart = lastTableRow - refreshRows +1;
                                rowStop  = lastTableRow;
                                if(refreshRows<=0){
                                    exclude=true;
                                }
                            }else{
                                //"normal" part
                                rowStart = 0;
                                rowStop = lastTableRow - refreshRows;
                            }
                            return new pdgf.core.dataGenerator.scheduler.Partition(tableID, timeID,rowStart,rowStop,exclude);
                        }else{
                            //DEFAULT
                            return getParentPartitioner().getDefaultPrePartition(project, table);
                        }
                    ]]></prePartition>
                </partitioner>
            </scheduler>
        </table>
<output name="SplitFileOutputWrapper">
    <!-- DEFAULT output for all Tables, if no table specific output is specified-->
    <output name="CSVRowOutput">
        <fileTemplate><![CDATA[outputDir + table.getName() +(nodeCount!=1?"_"+pdgf.util.StaticHelper.zeroPaddedNumber(nodeNumber,nodeCount):"")+ fileEnding]]></fileTemplate>
        <outputDir>output/</outputDir>
        <fileEnding>.dat</fileEnding>
        <delimiter>|</delimiter>
        <charset>UTF-8</charset>
        <sortByRowID>true</sortByRowID>
    </output>

    <output name="StatisticsOutput" active="1">
        <size>${item_size}</size><!-- a counter per item .. initialize later-->
        <fileTemplate><![CDATA[outputDir + table.getName()+"_audit" +(nodeCount!=1?"_"+pdgf.util.StaticHelper.zeroPaddedNumber(nodeNumber,nodeCount):"")+ fileEnding]]></fileTemplate>
        <outputDir>output/</outputDir>
        <fileEnding>.csv</fileEnding>
        <delimiter>,</delimiter>
        <header><!--"" + pdgf.util.Constants.DEFAULT_LINESEPARATOR--></header>
        <footer></footer>

bigbench-schema.xml defines many parameters. Some concern table scale, such as the size (number of rows) of each table; most concern table columns, such as start and end timestamps, the gender ratio, the marriage ratio, upper and lower bounds of metrics, and so on. It also defines exactly how each column is generated, along with its constraints. Examples follow below.

The size of the generated data is determined by SCALE_FACTOR (-f). With -f 1, the total generated data is about 1 GB; with -f 100, about 100 GB. So how does SCALE_FACTOR (-f) control the size of the generated data so precisely?

The reason is that SCALE_FACTOR (-f) determines the number of rows in each table. As shown below, the customer table has 100000.0d * ${SF_sqrt} rows: with -f 1 the customer table has 100000*sqrt(1) = 100,000 rows; with -f 100 it has 100000*sqrt(100) = 1,000,000 rows.

<property name="${customer_size}" type="long">100000.0d * ${SF_sqrt}</property>
<property name="${DIMENSION_TABLES_START_DAY}" type="datetime">2000-01-03 00:00:00</property>
<property name="${DIMENSION_TABLES_END_DAY}" type="datetime">2004-01-05 00:00:00</property>
<property name="${gender_likelihood}" type="double">0.5</property>
<property name="${married_likelihood}" type="double">0.3</property>
<property name="${WP_LINK_MIN}" type="double">2</property>
<property name="${WP_LINK_MAX}" type="double">25</property>
<field name="d_date" size="13" type="CHAR" primary="false">
    <gen_DateTime>
        <disableRng>true</disableRng>
        <useFixedStepSize>true</useFixedStepSize>
        <startDate>${date_dim_begin_date}</startDate>
        <endDate>${date_dim_end_date}</endDate>
        <outputFormat>yyyy-MM-dd</outputFormat>
    </gen_DateTime>
</field>
<field name="t_time_id" size="16" type="CHAR" primary="false">
    <gen_ConvertNumberToString>
        <gen_Id/>
        <size>16.0</size>
        <characters>ABCDEFGHIJKLMNOPQRSTUVWXYZ</characters>
    </gen_ConvertNumberToString>
</field>
<field name="cd_dep_employed_count" size="10" type="INTEGER" primary="false">
    <gen_Null probability="${NULL_CHANCE}">
        <gen_WeightedListItem filename="dicts/bigbench/ds-genProbabilities.txt" list="dependent_count" valueColumn="0" weightColumn="0" />
    </gen_Null>
</field>
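
To make the row-count arithmetic concrete, here is a minimal Java sketch (the class and method names are ours, not part of TPCx-BB; the 100000.0d * sqrt(SF) formula is the one from bigbench-schema.xml above):

// Hypothetical helper, not part of TPCx-BB: computes the customer row count for a scale factor.
public class TableSizeDemo {
    // ${customer_size} = 100000.0d * ${SF_sqrt}, per bigbench-schema.xml
    static long customerRows(double scaleFactor) {
        return (long) (100000.0d * Math.sqrt(scaleFactor));
    }

    public static void main(String[] args) {
        System.out.println(customerRows(1));   // 100000   (-f 1)
        System.out.println(customerRows(100)); // 1000000  (-f 100)
    }
}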

dicts contains dictionary files such as city.dict, country.dict, male.dict, female.dict, state.dict and mail_provider.dict; the column values of each generated row are presumably drawn from these dictionaries.

extlib contains the external jar dependencies: lucene-core-4.9.0.jar, commons-net-3.3.jar, xml-apis.jar and log4j-1.2.15.jar.

Summary

pdgf.jar takes the configuration in bigbench-generation.xml and bigbench-schema.xml (table names, column names, row counts, and the generation rule for each column), pulls the value of each row and column from the corresponding .dict files in the dicts directory, and generates the raw data.

A sample row from the customer table:

0 AAAAAAAAAAAAAAAA 1824793 3203 2555 28776 14690 Ms. Marisa Harrington N 17 4 1988 UNITED ARAB EMIRATES RRCyuY3XfE3a Marisa.Harrington@lawyer.com   gdMmGdU9

When the TPCx-BB test is run with -f 1 (SCALE_FACTOR = 1), the total size of the generated raw data is about 1 GB (977 MB + 8.6 MB):

[root@node-20-100 ~]# hdfs dfs -du -h /user/root/benchmarks/bigbench/data
12.7 M   38.0 M   /user/root/benchmarks/bigbench/data/customer
5.1 M    15.4 M   /user/root/benchmarks/bigbench/data/customer_address
74.2 M   222.5 M  /user/root/benchmarks/bigbench/data/customer_demographics
14.7 M   44.0 M   /user/root/benchmarks/bigbench/data/date_dim
151.5 K  454.4 K  /user/root/benchmarks/bigbench/data/household_demographics
327      981      /user/root/benchmarks/bigbench/data/income_band
405.3 M  1.2 G    /user/root/benchmarks/bigbench/data/inventory
6.5 M    19.5 M   /user/root/benchmarks/bigbench/data/item
4.0 M    12.0 M   /user/root/benchmarks/bigbench/data/item_marketprices
53.7 M   161.2 M  /user/root/benchmarks/bigbench/data/product_reviews
45.3 K   135.9 K  /user/root/benchmarks/bigbench/data/promotion
3.0 K    9.1 K    /user/root/benchmarks/bigbench/data/reason
1.2 K    3.6 K    /user/root/benchmarks/bigbench/data/ship_mode
3.3 K    9.9 K    /user/root/benchmarks/bigbench/data/store
4.1 M    12.4 M   /user/root/benchmarks/bigbench/data/store_returns
88.5 M   265.4 M  /user/root/benchmarks/bigbench/data/store_sales
4.9 M    14.6 M   /user/root/benchmarks/bigbench/data/time_dim
584      1.7 K    /user/root/benchmarks/bigbench/data/warehouse
170.4 M  511.3 M  /user/root/benchmarks/bigbench/data/web_clickstreams
7.9 K    23.6 K   /user/root/benchmarks/bigbench/data/web_page
5.1 M    15.4 M   /user/root/benchmarks/bigbench/data/web_returns
127.6 M  382.8 M  /user/root/benchmarks/bigbench/data/web_sales
8.6 K    25.9 K   /user/root/benchmarks/bigbench/data/web_site

Execution Flow

To run the TPCx-BB test, change into the TPCx-BB source directory, enter the bin directory, and execute:

./bigBench runBenchmark -f 1 -m 8 -s 2 -j 5

Here -f, -m, -s and -j are options the user can set according to cluster capacity and their own needs. If not given, the defaults from userSettings.conf in the conf directory are used:

export BIG_BENCH_DEFAULT_DATABASE="bigbench"
export BIG_BENCH_DEFAULT_ENGINE="hive"
export BIG_BENCH_DEFAULT_MAP_TASKS="80"
export BIG_BENCH_DEFAULT_SCALE_FACTOR="1000"
export BIG_BENCH_DEFAULT_NUMBER_OF_PARALLEL_STREAMS="2"
export BIG_BENCH_DEFAULT_BENCHMARK_PHASE="run_query"

By default MAP_TASKS is 80 (-m 80), SCALE_FACTOR is 1000 (-f 1000), and NUMBER_OF_PARALLEL_STREAMS is 2 (-s 2).

All available options and their meanings:

General options:
-d  database to use (default: $BIG_BENCH_DEFAULT_DATABASE -> bigbench)
-e  engine to use (default: $BIG_BENCH_DEFAULT_ENGINE -> hive)
-f  scale factor of the data set (default: $BIG_BENCH_DEFAULT_SCALE_FACTOR -> 1000)
-h  show this help
-m  number of map tasks for data generation (default: $BIG_BENCH_DEFAULT_MAP_TASKS)
-s  number of parallel streams (default: $BIG_BENCH_DEFAULT_NUMBER_OF_PARALLEL_STREAMS -> 2)

Driver specific options:
-a  run in pretend mode
-b  print the bash scripts invoked during execution to stdout
-i  specify the phases to run (see $BIG_BENCH_CONF_DIR/bigBench.properties for details)
-j  specify the queries to run (default: all 30 queries, 1-30)
-U  unlock expert mode

If -U is given, i.e. expert mode is unlocked, then:

echo "EXPERT MODE ACTIVE"
echo "WARNING - INTERNAL USE ONLY:"
echo "Only set manually if you know what you are doing!"
echo "Ignoring them is probably the best solution"
echo "Running individual modules:"
echo "Usage: `basename $0` module [options]"

-D  specify the part of the query to debug (most queries have only a single part)
-p  benchmark phase to run (default: $BIG_BENCH_DEFAULT_BENCHMARK_PHASE -> run_query)
-q  specify which query to run
-t  stream number to use when running the query
-v  sql script for metastore population (default: ${USER_POPULATE_FILE:-"$BIG_BENCH_POPULATION_DIR/hiveCreateLoad.sql"})
-w  sql script for metastore refresh (default: ${USER_REFRESH_FILE:-"$BIG_BENCH_REFRESH_DIR/hiveRefreshCreateLoad.sql"})
-y  file with additional user-defined query parameters (global: $BIG_BENCH_ENGINE_CONF_DIR/queryParameters.sql)
-z  file with additional user-defined engine settings (global: $BIG_BENCH_ENGINE_CONF_DIR/engineSettings.sql)

List of available modules:
    $BIG_BENCH_ENGINE_BIN_DIR

Back to the command that launched the TPCx-BB test:

./bigBench runBenchmark -f 1 -m 8 -s 2 -j 5

bigBench

bigBench is the main script; runBenchmark is a module.

bigBench sets many environment variables, because the Java program inside RunBigBench.jar later reads them.

The first part of bigBench does groundwork: setting environment variables, parsing user options, granting file permissions, setting paths, and so on. The final step calls runBenchmark's runModule() method:

  1. Set the basic paths

     export BIG_BENCH_VERSION="1.0"
     export BIG_BENCH_BIN_DIR="$BIG_BENCH_HOME/bin"
     export BIG_BENCH_CONF_DIR="$BIG_BENCH_HOME/conf"
     export BIG_BENCH_DATA_GENERATOR_DIR="$BIG_BENCH_HOME/data-generator"
     export BIG_BENCH_TOOLS_DIR="$BIG_BENCH_HOME/tools"
     export BIG_BENCH_LOGS_DIR="$BIG_BENCH_HOME/logs"
  2. Set the paths of core-site.xml and hdfs-site.xml

    Data generation uses the Hadoop cluster; the data is generated on HDFS.

     export BIG_BENCH_DATAGEN_CORE_SITE="$BIG_BENCH_HADOOP_CONF/core-site.xml"
     export BIG_BENCH_DATAGEN_HDFS_SITE="$BIG_BENCH_HADOOP_CONF/hdfs-site.xml"
  3. Grant permissions to all executable files in the package (.sh/.jar/.py)

     find "$BIG_BENCH_HOME" -name '*.sh' -exec chmod 755 {} +
     find "$BIG_BENCH_HOME" -name '*.jar' -exec chmod 755 {} +
     find "$BIG_BENCH_HOME" -name '*.py' -exec chmod 755 {} +
  4. Set the path of userSettings.conf and source it

     USER_SETTINGS="$BIG_BENCH_CONF_DIR/userSettings.conf"
     if [ ! -f "$USER_SETTINGS" ]
     then
       echo "User settings file $USER_SETTINGS not found"
       exit 1
     else
       source "$USER_SETTINGS"
     fi
  5. Parse the input arguments and options, and apply settings accordingly

    The first argument must be the module name.

    If there are no arguments, or the first argument starts with "-", the user did not specify a module to run.

     if [[ $# -eq 0 || "`echo "$1" | cut -c1`" = "-" ]]
     then
       export MODULE_NAME=""
       SHOW_HELP="1"
     else
       export MODULE_NAME="$1"
       shift
     fi
     export LIST_OF_USER_OPTIONS="$@"

    Parse the user-supplied options

    and set environment variables based on them:

     while getopts ":d:D:e:f:hm:p:q:s:t:Uv:w:y:z:abi:j:" OPT; do
       case "$OPT" in
         # script options
         d)
           USER_DATABASE="$OPTARG"
         ;;
         D)
           DEBUG_QUERY_PART="$OPTARG"
         ;;
         e)
           USER_ENGINE="$OPTARG"
         ;;
         f)
           USER_SCALE_FACTOR="$OPTARG"
         ;;
         h)
           SHOW_HELP="1"
         ;;
         m)
           USER_MAP_TASKS="$OPTARG"
         ;;
         p)
           USER_BENCHMARK_PHASE="$OPTARG"
         ;;
         q)
           QUERY_NUMBER="$OPTARG"
         ;;
         s)
           USER_NUMBER_OF_PARALLEL_STREAMS="$OPTARG"
         ;;
         t)
           USER_STREAM_NUMBER="$OPTARG"
         ;;
         U)
           USER_EXPERT_MODE="1"
         ;;
         v)
           USER_POPULATE_FILE="$OPTARG"
         ;;
         w)
           USER_REFRESH_FILE="$OPTARG"
         ;;
         y)
           USER_QUERY_PARAMS_FILE="$OPTARG"
         ;;
         z)
           USER_ENGINE_SETTINGS_FILE="$OPTARG"
         ;;
         # driver options
         a)
           export USER_PRETEND_MODE="1"
         ;;
         b)
           export USER_PRINT_STD_OUT="1"
         ;;
         i)
           export USER_DRIVER_WORKLOAD="$OPTARG"
         ;;
         j)
           export USER_DRIVER_QUERIES_TO_RUN="$OPTARG"
         ;;
         \?)
           echo "Invalid option: -$OPTARG" >&2
           exit 1
         ;;
         :)
           echo "Option -$OPTARG requires an argument." >&2
           exit 1
         ;;
       esac
     done

    Set the global variables: if the user supplied a value for an option, use it; otherwise use the default.

     export BIG_BENCH_EXPERT_MODE="${USER_EXPERT_MODE:-"0"}"
     export SHOW_HELP="${SHOW_HELP:-"0"}"
     export BIG_BENCH_DATABASE="${USER_DATABASE:-"$BIG_BENCH_DEFAULT_DATABASE"}"
     export BIG_BENCH_ENGINE="${USER_ENGINE:-"$BIG_BENCH_DEFAULT_ENGINE"}"
     export BIG_BENCH_MAP_TASKS="${USER_MAP_TASKS:-"$BIG_BENCH_DEFAULT_MAP_TASKS"}"
     export BIG_BENCH_SCALE_FACTOR="${USER_SCALE_FACTOR:-"$BIG_BENCH_DEFAULT_SCALE_FACTOR"}"
     export BIG_BENCH_NUMBER_OF_PARALLEL_STREAMS="${USER_NUMBER_OF_PARALLEL_STREAMS:-"$BIG_BENCH_DEFAULT_NUMBER_OF_PARALLEL_STREAMS"}"
     export BIG_BENCH_BENCHMARK_PHASE="${USER_BENCHMARK_PHASE:-"$BIG_BENCH_DEFAULT_BENCHMARK_PHASE"}"
     export BIG_BENCH_STREAM_NUMBER="${USER_STREAM_NUMBER:-"0"}"
     export BIG_BENCH_ENGINE_DIR="$BIG_BENCH_HOME/engines/$BIG_BENCH_ENGINE"
     export BIG_BENCH_ENGINE_CONF_DIR="$BIG_BENCH_ENGINE_DIR/conf"
  6. Check that the -m, -f, -s and -t (stream number) options are numeric

     if [ -n "`echo "$BIG_BENCH_MAP_TASKS" | sed -e 's/[0-9]*//g'`" ]
     then
       echo "$BIG_BENCH_MAP_TASKS is not a number"
     fi
     if [ -n "`echo "$BIG_BENCH_SCALE_FACTOR" | sed -e 's/[0-9]*//g'`" ]
     then
       echo "$BIG_BENCH_SCALE_FACTOR is not a number"
     fi
     if [ -n "`echo "$BIG_BENCH_NUMBER_OF_PARALLEL_STREAMS" | sed -e 's/[0-9]*//g'`" ]
     then
       echo "$BIG_BENCH_NUMBER_OF_PARALLEL_STREAMS is not a number"
     fi
     if [ -n "`echo "$BIG_BENCH_STREAM_NUMBER" | sed -e 's/[0-9]*//g'`" ]
     then
       echo "$BIG_BENCH_STREAM_NUMBER is not a number"
     fi
  7. Check that the engine exists

     if [ ! -d "$BIG_BENCH_ENGINE_DIR" ]
     then
       echo "Engine directory $BIG_BENCH_ENGINE_DIR not found. Aborting script..."
       exit 1
     fi
     if [ ! -d "$BIG_BENCH_ENGINE_CONF_DIR" ]
     then
       echo "Engine configuration directory $BIG_BENCH_ENGINE_CONF_DIR not found. Aborting script..."
       exit 1
     fi
  8. Set the path of engineSettings.conf and source it

     ENGINE_SETTINGS="$BIG_BENCH_ENGINE_CONF_DIR/engineSettings.conf"
     if [ ! -f "$ENGINE_SETTINGS" ]
     then
       echo "Engine settings file $ENGINE_SETTINGS not found"
       exit 1
     else
       source "$ENGINE_SETTINGS"
     fi
  9. Check that the module exists

    When a module name is given, the script first looks for the module under $BIG_BENCH_ENGINE_BIN_DIR/. If it exists there, it runs source "$MODULE". If not, it looks for the module under $BIG_BENCH_BIN_DIR/ and sources it if found; otherwise it prints Module $MODULE not found, aborting script.

    export MODULE="$BIG_BENCH_ENGINE_BIN_DIR/$MODULE_NAME"
    if [ -f "$MODULE" ]
    then
      source "$MODULE"
    else
      export MODULE="$BIG_BENCH_BIN_DIR/$MODULE_NAME"
      if [ -f "$MODULE" ]
      then
        source "$MODULE"
      else
        echo "Module $MODULE not found, aborting script."
        exit 1
      fi
    fi
  10. Check that the module defines the runModule(), helpModule() and runEngineCmd() functions

    MODULE_RUN_METHOD="runModule"
    if ! declare -F "$MODULE_RUN_METHOD" > /dev/null 2>&1
    then
      echo "$MODULE_RUN_METHOD was not implemented, aborting script"
      exit 1
    fi
  11. Run the module

    If the module is runBenchmark, this executes
    runCmdWithErrorCheck "$MODULE_RUN_METHOD"
    which amounts to
    runCmdWithErrorCheck runModule()

As shown above, the bigBench script mostly does groundwork such as setting environment variables, granting permissions, and checking and parsing input arguments; in the end it calls runBenchmark's runModule() method to continue execution.

runBenchmark

Now let's look at the runBenchmark script.

runBenchmark defines two functions: helpModule() and runModule().

helpModule() just prints the help.

runModule() is the function actually invoked when the runBenchmark module runs. It does four things:

  1. Clean the logs from previous runs
  2. Invoke RunBigBench.jar to run the benchmark
  3. logEnvInformation
  4. Zip up the log directory

The source is as follows:

runModule () {
  #check input parameters
  if [ "$BIG_BENCH_NUMBER_OF_PARALLEL_STREAMS" -le 0 ]
  then
    echo "The number of parallel streams -s must be greater than 0"
    return 1
  fi

  "${BIG_BENCH_BIN_DIR}/bigBench" cleanLogs -U $LIST_OF_USER_OPTIONS
  "$BIG_BENCH_JAVA" -jar "${BIG_BENCH_TOOLS_DIR}/RunBigBench.jar"
  "${BIG_BENCH_BIN_DIR}/bigBench" logEnvInformation -U $LIST_OF_USER_OPTIONS
  "${BIG_BENCH_BIN_DIR}/bigBench" zipLogs -U $LIST_OF_USER_OPTIONS

  return $?
}

So running the runBenchmark module in turn invokes three other modules, cleanLogs, logEnvInformation and zipLogs, plus RunBigBench.jar. RunBigBench.jar, written in Java, is the core of the TPCx-BB test execution. Next we analyze the RunBigBench.jar source.

runModule()

The runModule() function executes a given module. As we know, running a module means changing into the bin directory under the main directory and executing:

./bigBench module_name arguments

Inside runModule(), cmdLine builds that command:

ArrayList cmdLine = new ArrayList();
cmdLine.add("bash");
cmdLine.add(this.runScript);
cmdLine.add(benchmarkPhase.getRunModule());
cmdLine.addAll(arguments);

where this.runScript is:

this.runScript = (String)env.get("BIG_BENCH_BIN_DIR") + "/bigBench";

benchmarkPhase.getRunModule() returns the module to execute.

arguments holds the user-supplied options.

So cmdLine ends up as:

bash $BIG_BENCH_BIN_DIR/bigBench module_name arguments

How does the system actually execute this bash command? By calling the runCmd() method:

boolean successful = this.runCmd(this.homeDir, benchmarkPhase.isPrintStdOut(), (String[])cmdLine.toArray(new String[0]));

Next, the runCmd() method.

runCmd()

runCmd() uses ProcessBuilder to create an operating system process that executes the bash command above.

ProcessBuilder can also set the working directory and the environment.

ProcessBuilder pb = new ProcessBuilder(command);
pb.directory(new File(workingDirectory));
Process p = null;
...
p = pb.start();
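
For reference, a self-contained sketch of the same pattern (the command, working directory and environment values below are placeholders of ours, not taken from RunBigBench.jar):

import java.io.File;
import java.io.IOException;

public class RunCmdDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Build an OS process the way runCmd() does: bash <script> <module> <options...>
        ProcessBuilder pb = new ProcessBuilder("bash", "-c", "echo running in $PWD with $BIG_BENCH_HOME");
        pb.directory(new File("/tmp"));                          // working directory (placeholder)
        pb.environment().put("BIG_BENCH_HOME", "/opt/bigbench"); // extra environment (placeholder)
        pb.inheritIO();                                          // print the child's stdout/stderr
        Process p = pb.start();
        int exitCode = p.waitFor();                              // runCmd() similarly reports success/failure
        System.out.println("exit code: " + exitCode);
    }
}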

getQueryList()

getQueryList() obtains the list of queries to execute. It reads from $BIG_BENCH_LOGS_DIR/bigBench.properties, whose content matches $BIG_BENCH_HOME/conf/bigBench.properties.

bigBench.propertiespower_test_0=1-30规定了powter_test_0阶段需要执行的查询及其顺序。

Queries can be written as ranges like 5-12 or single numbers like 21, separated by commas.

power_test_0=28-25,2-5,10,22,30

means that the queries executed in the power_test_0 phase, and their order, are: 28,27,26,25,2,3,4,5,10,22,30 (a descending range such as 28-25 expands in descending order; see the sketch below).

To execute all 30 queries in order:

power_test_0=1-30
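
Before reading the decompiled source, the expansion logic can be illustrated with a stripped-down sketch (our own simplification of getQueryList(), keeping only the comma/range parsing; note how a descending range like 28-25 stays in descending order):

import java.util.ArrayList;
import java.util.List;

public class QueryListDemo {
    // Expand a spec like "28-25,2-5,10,22,30" into the query order.
    static List<Integer> expand(String spec) {
        List<Integer> queries = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] range = part.trim().split("-");
            if (range.length == 1) {
                queries.add(Integer.parseInt(range[0].trim()));
            } else if (range.length == 2) {
                int start = Integer.parseInt(range[0]);
                int end = Integer.parseInt(range[1]);
                int step = start <= end ? 1 : -1;   // descending ranges stay descending
                for (int i = start; i != end + step; i += step) {
                    queries.add(i);
                }
            } else {
                throw new IllegalArgumentException("Query numbers must be in the form X or X-Y, comma separated.");
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        // Prints: [28, 27, 26, 25, 2, 3, 4, 5, 10, 22, 30]
        System.out.println(expand("28-25,2-5,10,22,30"));
    }
}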

The source for obtaining the query list is as follows:

private List<Integer> getQueryList(BigBench.BenchmarkPhase benchmarkPhase, int streamNumber) {
    String SHUFFLED_NAME_PATTERN = "shuffledQueryList";
    BigBench.BenchmarkPhase queryOrderBasicPhase = BigBench.BenchmarkPhase.POWER_TEST;
    String propertyKey = benchmarkPhase.getQueryListProperty(streamNumber);
    boolean queryOrderCached = benchmarkPhase.isQueryOrderCached();
    if(queryOrderCached && this.queryListCache.containsKey(propertyKey)) {
        return new ArrayList((Collection)this.queryListCache.get(propertyKey));
    } else {
        Object queryList;
        String basicPhaseNamePattern;
        if(!this.properties.containsKey(propertyKey)) {
            if(benchmarkPhase.isQueryOrderRandom()) {
                if(!this.queryListCache.containsKey("shuffledQueryList")) {
                    basicPhaseNamePattern = queryOrderBasicPhase.getQueryListProperty(0);
                    if(!this.properties.containsKey(basicPhaseNamePattern)) {
                        throw new IllegalArgumentException("Property " + basicPhaseNamePattern + " is not defined, but is the basis for shuffling the query list.");
                    }

                    this.queryListCache.put("shuffledQueryList", this.getQueryList(queryOrderBasicPhase, 0));
                }

                queryList = (List)this.queryListCache.get("shuffledQueryList");
                this.shuffleList((List)queryList, this.rnd);
            } else {
                queryList = this.getQueryList(queryOrderBasicPhase, 0);
            }
        } else {
            queryList = new ArrayList();
            String[] var11;
            int var10 = (var11 = this.properties.getProperty(propertyKey).split(",")).length;

            label65:
            for(int var9 = 0; var9 < var10; ++var9) {
                basicPhaseNamePattern = var11[var9];
                String[] queryRange = basicPhaseNamePattern.trim().split("-");
                switch(queryRange.length) {
                case 1:
                    ((List)queryList).add(Integer.valueOf(Integer.parseInt(queryRange[0].trim())));
                    break;
                case 2:
                    int startQuery = Integer.parseInt(queryRange[0]);
                    int endQuery = Integer.parseInt(queryRange[1]);
                    int i;
                    if(startQuery > endQuery) {
                        i = startQuery;
                        while(true) {
                            if(i < endQuery) {
                                continue label65;
                            }
                            ((List)queryList).add(Integer.valueOf(i));
                            --i;
                        }
                    } else {
                        i = startQuery;
                        while(true) {
                            if(i > endQuery) {
                                continue label65;
                            }
                            ((List)queryList).add(Integer.valueOf(i));
                            ++i;
                        }
                    }
                default:
                    throw new IllegalArgumentException("Query numbers must be in the form X or X-Y, comma separated.");
                }
            }
        }

        if(queryOrderCached) {
            this.queryListCache.put(propertyKey, new ArrayList((Collection)queryList));
        }

        return new ArrayList((Collection)queryList);
    }
}

parseEnvironment()

parseEnvironment() reads and parses the system environment variables:

Map env = System.getenv();
this.version = (String)env.get("BIG_BENCH_VERSION");
this.homeDir = (String)env.get("BIG_BENCH_HOME");
this.confDir = (String)env.get("BIG_BENCH_CONF_DIR");
this.runScript = (String)env.get("BIG_BENCH_BIN_DIR") + "/bigBench";
this.datagenDir = (String)env.get("BIG_BENCH_DATA_GENERATOR_DIR");
this.logDir = (String)env.get("BIG_BENCH_LOGS_DIR");
this.dataGenLogFile = (String)env.get("BIG_BENCH_DATAGEN_STAGE_LOG");
this.loadLogFile = (String)env.get("BIG_BENCH_LOADING_STAGE_LOG");
this.engine = (String)env.get("BIG_BENCH_ENGINE");
this.database = (String)env.get("BIG_BENCH_DATABASE");
this.mapTasks = (String)env.get("BIG_BENCH_MAP_TASKS");
this.numberOfParallelStreams = Integer.parseInt((String)env.get("BIG_BENCH_NUMBER_OF_PARALLEL_STREAMS"));
this.scaleFactor = Long.parseLong((String)env.get("BIG_BENCH_SCALE_FACTOR"));
this.stopAfterFailure = ((String)env.get("BIG_BENCH_STOP_AFTER_FAILURE")).equals("1");

It automatically appends -U (unlock expert mode) to the user-specified arguments:

this.userArguments.add("-U");

If the user specified PRETEND_MODE, PRINT_STD_OUT, WORKLOAD or QUERIES_TO_RUN, the user's values take precedence; otherwise the defaults are used.

if(env.containsKey("USER_PRETEND_MODE")) {
    this.properties.setProperty("pretend_mode", (String)env.get("USER_PRETEND_MODE"));
}

if(env.containsKey("USER_PRINT_STD_OUT")) {
    this.properties.setProperty("show_command_stdout", (String)env.get("USER_PRINT_STD_OUT"));
}

if(env.containsKey("USER_DRIVER_WORKLOAD")) {
    this.properties.setProperty("workload", (String)env.get("USER_DRIVER_WORKLOAD"));
}

if(env.containsKey("USER_DRIVER_QUERIES_TO_RUN")) {
    this.properties.setProperty(BigBench.BenchmarkPhase.POWER_TEST.getQueryListProperty(0), (String)env.get("USER_DRIVER_QUERIES_TO_RUN"));
}

It then reads the workload and assigns it to benchmarkPhases. If the workload does not contain BENCHMARK_START and BENCHMARK_STOP, they are automatically added at the beginning and end of benchmarkPhases respectively.

this.benchmarkPhases = new ArrayList();
Iterator var7 = Arrays.asList(this.properties.getProperty("workload").split(",")).iterator();

while(var7.hasNext()) {
    String benchmarkPhase = (String)var7.next();
    this.benchmarkPhases.add(BigBench.BenchmarkPhase.valueOf(benchmarkPhase.trim()));
}

if(!this.benchmarkPhases.contains(BigBench.BenchmarkPhase.BENCHMARK_START)) {
    this.benchmarkPhases.add(0, BigBench.BenchmarkPhase.BENCHMARK_START);
}

if(!this.benchmarkPhases.contains(BigBench.BenchmarkPhase.BENCHMARK_STOP)) {
    this.benchmarkPhases.add(BigBench.BenchmarkPhase.BENCHMARK_STOP);
}

run()

run() is the core method of RunBigBench.jar. All execution is driven by run(), which calls runQueries(), runModule(), generateData(), and so on. These in turn call runCmd() to create operating system processes, execute bash commands, and invoke the bash scripts.

run() iterates over every benchmarkPhase in the workload with a while loop. Different benchmark phases dispatch to different methods among runQueries(), runModule(), generateData(), etc.

try {
    long e = 0L;
    this.log.finer("Benchmark phases: " + this.benchmarkPhases);
    Iterator startCheckpoint = this.benchmarkPhases.iterator();

    long throughputStart;
    while(startCheckpoint.hasNext()) {
        BigBench.BenchmarkPhase children = (BigBench.BenchmarkPhase)startCheckpoint.next();
        if(children.isPhaseDone()) {
            this.log.info("The phase " + children.name() + " was already performed earlier. Skipping this phase");
        } else {
            try {
                switch($SWITCH_TABLE$io$bigdatabenchmark$v1$driver$BigBench$BenchmarkPhase()[children.ordinal()]) {
                case 1:
                case 20:
                    throw new IllegalArgumentException("The value " + children.name() + " is only used internally.");
                case 2:
                    this.log.info(children.getConsoleMessage());
                    e = System.currentTimeMillis();
                    break;
                case 3:
                    if(!BigBench.BenchmarkPhase.BENCHMARK_START.isPhaseDone()) {
                        throw new IllegalArgumentException("Error: Cannot stop the benchmark before starting it");
                    }

                    throughputStart = System.currentTimeMillis();
                    this.log.info(String.format("%-55s finished. Time: %25s", new Object[]{children.getConsoleMessage(), BigBench.Helper.formatTime(throughputStart - e)}));
                    this.logTreeRoot.setCheckpoint(new BigBench.Checkpoint(BigBench.BenchmarkPhase.BENCHMARK, -1L, -1L, e, throughputStart, this.logTreeRoot.isSuccessful()));
                    break;
                case 4:
                case 15:
                case 18:
                case 22:
                case 27:
                case 28:
                case 29:
                    this.runModule(children, this.userArguments);
                    break;
                case 5:
                case 10:
                case 11:
                    this.runQueries(children, 1, validationArguments);
                    break;
                case 6:
                case 9:
                    this.runModule(children, validationArguments);
                    break;
                case 7:
                    this.generateData(children, false, validationArguments);
                    break;
                case 8:
                    this.generateData(children, true, validationArguments);
                    break;
                case 12:
                case 19:
                case 24:
                    this.runQueries(children, 1, this.userArguments);
                    break;
                case 13:
                case 14:
                case 21:
                case 23:
                case 25:
                case 26:
                    this.runQueries(children, this.numberOfParallelStreams, this.userArguments);
                    break;
                case 16:
                    this.generateData(children, false, this.userArguments);
                    break;
                case 17:
                    this.generateData(children, true, this.userArguments);
                }

                children.setPhaseDone(true);
            } catch (IOException var21) {
                this.log.info("==============\nBenchmark run terminated\nReason: An error occured while running a command in phase " + children + "\n==============");
                var21.printStackTrace();
                if(this.stopAfterFailure || children.mustSucceed()) {
                    break;
                }
            }
        }
    }
}

The case 1-29 labels here are not queries 1-29; they are the 1-29 benchmarkPhase values of the enum, shown below:

private static enum BenchmarkPhase {
    BENCHMARK((String)null, "benchmark", false, false, false, false, "BigBench benchmark"),
    BENCHMARK_START((String)null, "benchmark_start", false, false, false, false, "BigBench benchmark: Start"),
    BENCHMARK_STOP((String)null, "benchmark_stop", false, false, false, false, "BigBench benchmark: Stop"),
    CLEAN_ALL("cleanAll", "clean_all", false, false, false, false, "BigBench clean all"),
    ENGINE_VALIDATION_CLEAN_POWER_TEST("cleanQuery", "engine_validation_power_test", false, false, false, false, "BigBench engine validation: Clean power test queries"),
    ENGINE_VALIDATION_CLEAN_LOAD_TEST("cleanMetastore", "engine_validation_metastore", false, false, false, false, "BigBench engine validation: Clean metastore"),
    ENGINE_VALIDATION_CLEAN_DATA("cleanData", "engine_validation_data", false, false, false, false, "BigBench engine validation: Clean data"),
    ENGINE_VALIDATION_DATA_GENERATION("dataGen", "engine_validation_data", false, false, false, true, "BigBench engine validation: Data generation"),
    ENGINE_VALIDATION_LOAD_TEST("populateMetastore", "engine_validation_metastore", false, false, false, true, "BigBench engine validation: Populate metastore"),
    ENGINE_VALIDATION_POWER_TEST("runQuery", "engine_validation_power_test", false, false, false, false, "BigBench engine validation: Power test"),
    ENGINE_VALIDATION_RESULT_VALIDATION("validateQuery", "engine_validation_power_test", false, false, true, false, "BigBench engine validation: Check all query results"),
    CLEAN_POWER_TEST("cleanQuery", "power_test", false, false, false, false, "BigBench clean: Clean power test queries"),
    CLEAN_THROUGHPUT_TEST_1("cleanQuery", "throughput_test_1", false, false, false, false, "BigBench clean: Clean first throughput test queries"),
    CLEAN_THROUGHPUT_TEST_2("cleanQuery", "throughput_test_2", false, false, false, false, "BigBench clean: Clean second throughput test queries"),
    CLEAN_LOAD_TEST("cleanMetastore", "metastore", false, false, false, false, "BigBench clean: Load test"),
    CLEAN_DATA("cleanData", "data", false, false, false, false, "BigBench clean: Data"),
    DATA_GENERATION("dataGen", "data", false, false, false, true, "BigBench preparation: Data generation"),
    LOAD_TEST("populateMetastore", "metastore", false, false, false, true, "BigBench phase 1: Load test"),
    POWER_TEST("runQuery", "power_test", false, true, false, false, "BigBench phase 2: Power test"),
    THROUGHPUT_TEST((String)null, "throughput_test", false, false, false, false, "BigBench phase 3: Throughput test"),
    THROUGHPUT_TEST_1("runQuery", "throughput_test_1", true, true, false, false, "BigBench phase 3: First throughput test run"),
    THROUGHPUT_TEST_REFRESH("refreshMetastore", "throughput_test_refresh", false, false, false, false, "BigBench phase 3: Throughput test data refresh"),
    THROUGHPUT_TEST_2("runQuery", "throughput_test_2", true, true, false, false, "BigBench phase 3: Second throughput test run"),
    VALIDATE_POWER_TEST("validateQuery", "power_test", false, false, true, false, "BigBench validation: Power test results"),
    VALIDATE_THROUGHPUT_TEST_1("validateQuery", "throughput_test_1", false, false, true, false, "BigBench validation: First throughput test results"),
    VALIDATE_THROUGHPUT_TEST_2("validateQuery", "throughput_test_2", false, false, true, false, "BigBench validation: Second throughput test results"),
    SHOW_TIMES("showTimes", "show_times", false, false, true, false, "BigBench: show query times"),
    SHOW_ERRORS("showErrors", "show_errors", false, false, true, false, "BigBench: show query errors"),
    SHOW_VALIDATION("showValidation", "show_validation", false, false, true, false, "BigBench: show query validation results");

    private String runModule;
    private String namePattern;
    private boolean queryOrderRandom;
    private boolean queryOrderCached;
    private boolean printStdOut;
    private boolean mustSucceed;
    private String consoleMessage;
    private boolean phaseDone;

    private BenchmarkPhase(String runModule, String namePattern, boolean queryOrderRandom, boolean queryOrderCached, boolean printStdOut, boolean mustSucceed, String consoleMessage) {
        this.runModule = runModule;
        this.namePattern = namePattern;
        this.queryOrderRandom = queryOrderRandom;
        this.queryOrderCached = queryOrderCached;
        this.printStdOut = printStdOut;
        this.mustSucceed = mustSucceed;
        this.consoleMessage = consoleMessage;
        this.phaseDone = false;
    }

3 corresponds to BENCHMARK_STOP, 4 to CLEAN_ALL, 29 to SHOW_VALIDATION, and so on.

From this we can see:

The CLEAN_ALL, CLEAN_LOAD_TEST, LOAD_TEST, THROUGHPUT_TEST_REFRESH, SHOW_TIMES, SHOW_ERRORS and SHOW_VALIDATION benchmark phases call

this.runModule(children, this.userArguments);

i.e. the method is runModule and the argument is this.userArguments.

ENGINE_VALIDATION_CLEAN_POWER_TEST, ENGINE_VALIDATION_POWER_TEST and ENGINE_VALIDATION_RESULT_VALIDATION call

this.runQueries(children, 1, validationArguments);

i.e. the method is runQueries with arguments 1 (the stream number) and validationArguments.

ENGINE_VALIDATION_CLEAN_LOAD_TEST and ENGINE_VALIDATION_LOAD_TEST call

this.runModule(children, validationArguments);

ENGINE_VALIDATION_CLEAN_DATA calls

this.generateData(children, false, validationArguments);

ENGINE_VALIDATION_DATA_GENERATION calls

this.generateData(children, true, validationArguments);

CLEAN_POWER_TEST, POWER_TEST and VALIDATE_POWER_TEST call

this.runQueries(children, 1, this.userArguments);

CLEAN_THROUGHPUT_TEST_1, CLEAN_THROUGHPUT_TEST_2, THROUGHPUT_TEST_1, THROUGHPUT_TEST_2, VALIDATE_THROUGHPUT_TEST_1 and VALIDATE_THROUGHPUT_TEST_2 call

this.runQueries(children, this.numberOfParallelStreams, this.userArguments);

CLEAN_DATA calls

this.generateData(children, false, this.userArguments);

DATA_GENERATION calls

this.generateData(children, true, this.userArguments);

Summing up the method calls above:

  • ENGINE_VALIDATION-related benchmark phases use validationArguments; all the others use userArguments
  • POWER_TEST-related phases call runQueries(), since POWER_TEST means executing the SQL queries
  • CLEAN_DATA and DATA_GENERATION-related phases call generateData()
  • LOAD_TEST and SHOW-related phases call runModule()

Mapping between benchmarkPhase and module

The exact mapping between each benchmarkPhase and its module (the script executed) is:

CLEAN_ALL -> "cleanAll"
ENGINE_VALIDATION_CLEAN_POWER_TEST -> "cleanQuery"
ENGINE_VALIDATION_CLEAN_LOAD_TEST -> "cleanMetastore"
ENGINE_VALIDATION_CLEAN_DATA -> "cleanData"
ENGINE_VALIDATION_DATA_GENERATION -> "dataGen"
ENGINE_VALIDATION_LOAD_TEST -> "populateMetastore"
ENGINE_VALIDATION_POWER_TEST -> "runQuery"
ENGINE_VALIDATION_RESULT_VALIDATION -> "validateQuery"
CLEAN_POWER_TEST -> "cleanQuery"
CLEAN_THROUGHPUT_TEST_1 -> "cleanQuery"
CLEAN_THROUGHPUT_TEST_2 -> "cleanQuery"
CLEAN_LOAD_TEST -> "cleanMetastore"
CLEAN_DATA -> "cleanData"
DATA_GENERATION -> "dataGen"
LOAD_TEST -> "populateMetastore"
POWER_TEST -> "runQuery"
THROUGHPUT_TEST -> (String)null
THROUGHPUT_TEST_1 -> "runQuery"
THROUGHPUT_TEST_REFRESH -> "refreshMetastore"
THROUGHPUT_TEST_2 -> "runQuery"
VALIDATE_POWER_TEST -> "validateQuery"
VALIDATE_THROUGHPUT_TEST_1 -> "validateQuery"
VALIDATE_THROUGHPUT_TEST_2 -> "validateQuery"
SHOW_TIMES -> "showTimes"
SHOW_ERRORS -> "showErrors"
SHOW_VALIDATION -> "showValidation"

When a benchmarkPhase is executed, the corresponding module listed above is invoked (these scripts live in the $BENCH_MARK_HOME/engines/hive/bin directory):

cmdLine.add(benchmarkPhase.getRunModule());

Program call flow

[Figure: bigBench.png — program call flow of the TPCx-BB test]

The following sections describe what each module does.

module

cleanAll

1. DROP DATABASE
2. Delete the source data on HDFS
echo "dropping database (with all tables)"
runCmdWithErrorCheck runEngineCmd -e "DROP DATABASE IF EXISTS $BIG_BENCH_DATABASE CASCADE;"
echo "cleaning ${BIG_BENCH_HDFS_ABSOLUTE_HOME}"
hadoop fs -rm -r -f -skipTrash "${BIG_BENCH_HDFS_ABSOLUTE_HOME}"

cleanQuery

1. Drop the temporary tables generated by the given query
2. Drop the result table generated by the given query
runCmdWithErrorCheck runEngineCmd -e "DROP TABLE IF EXISTS $TEMP_TABLE1; DROP TABLE IF EXISTS $TEMP_TABLE2; DROP TABLE IF EXISTS $RESULT_TABLE;"
return $?

cleanMetastore

1. Invoke `dropTables.sql` to DROP the 23 tables in turn
echo "cleaning metastore tables"
runCmdWithErrorCheck runEngineCmd -f "$BIG_BENCH_CLEAN_METASTORE_FILE"

export BIG_BENCH_CLEAN_METASTORE_FILE="$BIG_BENCH_CLEAN_DIR/dropTables.sql"

dropTables.sql drops the 23 tables one by one; its source is:

DROP TABLE IF EXISTS ${hiveconf:customerTableName};
DROP TABLE IF EXISTS ${hiveconf:customerAddressTableName};
DROP TABLE IF EXISTS ${hiveconf:customerDemographicsTableName};
DROP TABLE IF EXISTS ${hiveconf:dateTableName};
DROP TABLE IF EXISTS ${hiveconf:householdDemographicsTableName};
DROP TABLE IF EXISTS ${hiveconf:incomeTableName};
DROP TABLE IF EXISTS ${hiveconf:itemTableName};
DROP TABLE IF EXISTS ${hiveconf:promotionTableName};
DROP TABLE IF EXISTS ${hiveconf:reasonTableName};
DROP TABLE IF EXISTS ${hiveconf:shipModeTableName};
DROP TABLE IF EXISTS ${hiveconf:storeTableName};
DROP TABLE IF EXISTS ${hiveconf:timeTableName};
DROP TABLE IF EXISTS ${hiveconf:warehouseTableName};
DROP TABLE IF EXISTS ${hiveconf:webSiteTableName};
DROP TABLE IF EXISTS ${hiveconf:webPageTableName};
DROP TABLE IF EXISTS ${hiveconf:inventoryTableName};
DROP TABLE IF EXISTS ${hiveconf:storeSalesTableName};
DROP TABLE IF EXISTS ${hiveconf:storeReturnsTableName};
DROP TABLE IF EXISTS ${hiveconf:webSalesTableName};
DROP TABLE IF EXISTS ${hiveconf:webReturnsTableName};
DROP TABLE IF EXISTS ${hiveconf:marketPricesTableName};
DROP TABLE IF EXISTS ${hiveconf:clickstreamsTableName};
DROP TABLE IF EXISTS ${hiveconf:reviewsTableName};

cleanData

1. Delete the /user/root/benchmarks/bigbench/data directory on HDFS
2. Delete the /user/root/benchmarks/bigbench/data_refresh directory on HDFS
echo "cleaning ${BIG_BENCH_HDFS_ABSOLUTE_INIT_DATA_DIR}"
hadoop fs -rm -r -f -skipTrash "${BIG_BENCH_HDFS_ABSOLUTE_INIT_DATA_DIR}"
echo "cleaning ${BIG_BENCH_HDFS_ABSOLUTE_REFRESH_DATA_DIR}"
hadoop fs -rm -r -f -skipTrash "${BIG_BENCH_HDFS_ABSOLUTE_REFRESH_DATA_DIR}"

dataGen

1. Create the /user/root/benchmarks/bigbench/data directory and grant permissions
2. Create the /user/root/benchmarks/bigbench/data_refresh directory and grant permissions
3. Invoke HadoopClusterExec.jar and pdgf.jar to generate the base data into /user/root/benchmarks/bigbench/data
4. Invoke HadoopClusterExec.jar and pdgf.jar to generate the refresh data into /user/root/benchmarks/bigbench/data_refresh

Create the /user/root/benchmarks/bigbench/data directory and grant permissions:

runCmdWithErrorCheck hadoop fs -mkdir -p "${BIG_BENCH_HDFS_ABSOLUTE_INIT_DATA_DIR}"
runCmdWithErrorCheck hadoop fs -chmod 777 "${BIG_BENCH_HDFS_ABSOLUTE_INIT_DATA_DIR}"

Create the /user/root/benchmarks/bigbench/data_refresh directory and grant permissions:

runCmdWithErrorCheck hadoop fs -mkdir -p "${BIG_BENCH_HDFS_ABSOLUTE_REFRESH_DATA_DIR}"
runCmdWithErrorCheck hadoop fs -chmod 777 "${BIG_BENCH_HDFS_ABSOLUTE_REFRESH_DATA_DIR}"

Invoke HadoopClusterExec.jar and pdgf.jar to generate the base data:

runCmdWithErrorCheck hadoop jar "${BIG_BENCH_TOOLS_DIR}/HadoopClusterExec.jar" -archives "${PDGF_ARCHIVE_PATH}" ${BIG_BENCH_DATAGEN_HADOOP_EXEC_DEBUG} -taskFailOnNonZeroReturnValue -execCWD "${PDGF_DISTRIBUTED_NODE_DIR}" ${HadoopClusterExecOptions} -exec ${BIG_BENCH_DATAGEN_HADOOP_JVM_ENV} -cp "${HADOOP_CP}:pdgf.jar" ${PDGF_CLUSTER_CONF} pdgf.Controller -nc HadoopClusterExec.tasks -nn HadoopClusterExec.taskNumber -ns -c -sp REFRESH_PHASE 0 -o "'${BIG_BENCH_HDFS_ABSOLUTE_INIT_DATA_DIR}/'+table.getName()+'/'" ${BIG_BENCH_DATAGEN_HADOOP_OPTIONS} -s ${BIG_BENCH_DATAGEN_TABLES} ${PDGF_OPTIONS} "$@" 2>&1 | tee -a "$BIG_BENCH_DATAGEN_STAGE_LOG" 2>&1

Invoke HadoopClusterExec.jar and pdgf.jar to generate the refresh data:

runCmdWithErrorCheck hadoop jar "${BIG_BENCH_TOOLS_DIR}/HadoopClusterExec.jar" -archives "${PDGF_ARCHIVE_PATH}" ${BIG_BENCH_DATAGEN_HADOOP_EXEC_DEBUG} -taskFailOnNonZeroReturnValue -execCWD "${PDGF_DISTRIBUTED_NODE_DIR}" ${HadoopClusterExecOptions} -exec ${BIG_BENCH_DATAGEN_HADOOP_JVM_ENV} -cp "${HADOOP_CP}:pdgf.jar" ${PDGF_CLUSTER_CONF} pdgf.Controller -nc HadoopClusterExec.tasks -nn HadoopClusterExec.taskNumber -ns -c -sp REFRESH_PHASE 1 -o "'${BIG_BENCH_HDFS_ABSOLUTE_REFRESH_DATA_DIR}/'+table.getName()+'/'" ${BIG_BENCH_DATAGEN_HADOOP_OPTIONS} -s ${BIG_BENCH_DATAGEN_TABLES} ${PDGF_OPTIONS} "$@" 2>&1 | tee -a "$BIG_BENCH_DATAGEN_STAGE_LOG" 2>&1

populateMetastore

This phase is where the database tables are actually created. The tables are built by hiveCreateLoad.sql in the $BENCH_MARK_HOME/engines/hive/population/ directory:

  1. Read the raw .dat data from /user/root/benchmarks/bigbench/data and create TEXTFILE-format external temporary tables
  2. Create the final ORC-format tables via select * from the temporary tables
  3. Drop the external temporary tables

Read the raw .dat data from /user/root/benchmarks/bigbench/data and create a TEXTFILE-format external temporary table:

DROP TABLE IF EXISTS ${hiveconf:customerTableName}${hiveconf:temporaryTableSuffix};
CREATE EXTERNAL TABLE ${hiveconf:customerTableName}${hiveconf:temporaryTableSuffix}
  ( c_customer_sk             bigint              --not null
  , c_customer_id             string              --not null
  , c_current_cdemo_sk        bigint
  , c_current_hdemo_sk        bigint
  , c_current_addr_sk         bigint
  , c_first_shipto_date_sk    bigint
  , c_first_sales_date_sk     bigint
  , c_salutation              string
  , c_first_name              string
  , c_last_name               string
  , c_preferred_cust_flag     string
  , c_birth_day               int
  , c_birth_month             int
  , c_birth_year              int
  , c_birth_country           string
  , c_login                   string
  , c_email_address           string
  , c_last_review_date        string
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '${hiveconf:fieldDelimiter}'
  STORED AS TEXTFILE LOCATION '${hiveconf:hdfsDataPath}/${hiveconf:customerTableName}';

Create the final ORC-format table via select * from the temporary table:

DROP TABLE IF EXISTS ${hiveconf:customerTableName};
CREATE TABLE ${hiveconf:customerTableName}
STORED AS ${hiveconf:tableFormat}
AS
SELECT * FROM ${hiveconf:customerTableName}${hiveconf:temporaryTableSuffix};

Drop the external temporary table:

DROP TABLE ${hiveconf:customerTableName}${hiveconf:temporaryTableSuffix};

That summarizes the steps the code goes through when executing the following TPCx-BB test command:

./bigBench runBenchmark -f 1 -m 8 -s 2 -j 5

runQuery

1. runQuery calls the `query_run_main_method()` function in each query's run.sh
2. `query_run_main_method()` calls `runEngineCmd` to execute the query script (qxx.sql)

runQuery calls the query_run_main_method() function in each query's run.sh:

QUERY_MAIN_METHOD="query_run_main_method"
-----------------------------------------
"$QUERY_MAIN_METHOD" 2>&1 | tee -a "$LOG_FILE_NAME" 2>&1

query_run_main_method() calls runEngineCmd to execute the query script (qxx.sql):

query_run_main_method () {
    QUERY_SCRIPT="$QUERY_DIR/$QUERY_NAME.sql"
    if [ ! -r "$QUERY_SCRIPT" ]
    then
        echo "SQL file $QUERY_SCRIPT can not be read."
        exit 1
    fi

    runCmdWithErrorCheck runEngineCmd -f "$QUERY_SCRIPT"
    return $?
}

Normally query_run_main_method() just executes the corresponding query script. But queries such as q05, q20, ... use machine learning algorithms, so after the query script runs, the resulting table is fed as input to a jar that runs the machine learning algorithm (e.g. clustering, logistic regression) and produces the final result.

runEngineCmd () {
  if addInitScriptsToParams
  then
    "$BINARY" "${BINARY_PARAMS[@]}" "${INIT_PARAMS[@]}" "$@"
  else
    return 1
  fi
}
--------------------------
BINARY="/usr/bin/hive"
BINARY_PARAMS+=(--hiveconf BENCHMARK_PHASE=$BIG_BENCH_BENCHMARK_PHASE --hiveconf STREAM_NUMBER=$BIG_BENCH_STREAM_NUMBER --hiveconf QUERY_NAME=$QUERY_NAME --hiveconf QUERY_DIR=$QUERY_DIR --hiveconf RESULT_TABLE=$RESULT_TABLE --hiveconf RESULT_DIR=$RESULT_DIR --hiveconf TEMP_TABLE=$TEMP_TABLE --hiveconf TEMP_DIR=$TEMP_DIR --hiveconf TABLE_PREFIX=$TABLE_PREFIX)
INIT_PARAMS=(-i "$BIG_BENCH_QUERY_PARAMS_FILE" -i "$BIG_BENCH_ENGINE_SETTINGS_FILE")
INIT_PARAMS+=(-i "$LOCAL_QUERY_ENGINE_SETTINGS_FILE")

if [ -n "$USER_QUERY_PARAMS_FILE" ]
then
  if [ -r "$USER_QUERY_PARAMS_FILE" ]
  then
    echo "User defined query parameter file found. Adding $USER_QUERY_PARAMS_FILE to hive init."
    INIT_PARAMS+=(-i "$USER_QUERY_PARAMS_FILE")
  else
    echo "User query parameter file $USER_QUERY_PARAMS_FILE can not be read."
    return 1
  fi
fi

if [ -n "$USER_ENGINE_SETTINGS_FILE" ]
then
  if [ -r "$USER_ENGINE_SETTINGS_FILE" ]
  then
    echo "User defined engine settings file found. Adding $USER_ENGINE_SETTINGS_FILE to hive init."
    INIT_PARAMS+=(-i "$USER_ENGINE_SETTINGS_FILE")
  else
    echo "User hive settings file $USER_ENGINE_SETTINGS_FILE can not be read."
    return 1
  fi
fi
return 0

validateQuery

1. Calls the `query_run_validate_method()` function in each query's run.sh
2. `query_run_validate_method()` compares `$BENCH_MARK_HOME/engines/hive/queries/qxx/results/qxx-result` with `/user/root/benchmarks/bigbench/queryResults/qxx_hive_${BIG_BENCH_BENCHMARK_PHASE}_${BIG_BENCH_STREAM_NUMBER}_result` on HDFS; if they match, validation passes, otherwise it fails.
if diff -q "$VALIDATION_RESULTS_FILENAME" <(hadoop fs -cat "$RESULT_DIR/*")
then
    echo "Validation of $VALIDATION_RESULTS_FILENAME passed: Query returned correct results"
else
    echo "Validation of $VALIDATION_RESULTS_FILENAME failed: Query returned incorrect results"
    VALIDATION_PASSED="0"
fi

When SF is 1 (-f 1), the comparison above is used. When SF is greater than 1, validation passes as long as the result table on HDFS contains at least one row:

if [ `hadoop fs -cat "$RESULT_DIR/*" | head -n 10 | wc -l` -ge 1 ]
then
    echo "Validation passed: Query returned results"
else
    echo "Validation failed: Query did not return results"
    return 1
fi

refreshMetastore

1. Invokes `hiveRefreshCreateLoad.sql` in the `$BENCH_MARK_HOME/engines/hive/refresh/` directory
2. `hiveRefreshCreateLoad.sql` loads each table's data under `/user/root/benchmarks/bigbench/data_refresh/` on HDFS into external temporary tables
3. The external temporary tables then insert each table's data into the corresponding tables in the Hive database

hiveRefreshCreateLoad.sql loads each table's data under /user/root/benchmarks/bigbench/data_refresh/ on HDFS into an external temporary table:

DROP TABLE IF EXISTS ${hiveconf:customerTableName}${hiveconf:temporaryTableSuffix};
CREATE EXTERNAL TABLE ${hiveconf:customerTableName}${hiveconf:temporaryTableSuffix}
  ( c_customer_sk             bigint              --not null
  , c_customer_id             string              --not null
  , c_current_cdemo_sk        bigint
  , c_current_hdemo_sk        bigint
  , c_current_addr_sk         bigint
  , c_first_shipto_date_sk    bigint
  , c_first_sales_date_sk     bigint
  , c_salutation              string
  , c_first_name              string
  , c_last_name               string
  , c_preferred_cust_flag     string
  , c_birth_day               int
  , c_birth_month             int
  , c_birth_year              int
  , c_birth_country           string
  , c_login                   string
  , c_email_address           string
  , c_last_review_date        string
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '${hiveconf:fieldDelimiter}'
  STORED AS TEXTFILE LOCATION '${hiveconf:hdfsDataPath}/${hiveconf:customerTableName}';
-----------------
set hdfsDataPath=${env:BIG_BENCH_HDFS_ABSOLUTE_REFRESH_DATA_DIR};

The external temporary table's data is then inserted into the corresponding table in the Hive database:

INSERT INTO TABLE ${hiveconf:customerTableName}
SELECT * FROM ${hiveconf:customerTableName}${hiveconf:temporaryTableSuffix};

Appendix

Before the test, PDGF (the Parallel Data Generation Framework) generates the data. With Scale Factor 1 (1 GB), the following 23 tables are generated. Below are the create-table statement and a sample row for each table.

The 23 tables

hive> show tables;
OK
customer
customer_address
customer_demographics
date_dim
household_demographics
income_band
inventory
item
item_marketprices
product_reviews
promotion
reason
ship_mode
store
store_returns
store_sales
time_dim
warehouse
web_clickstreams
web_page
web_returns
web_sales
web_site
Time taken: 0.017 seconds, Fetched: 23 row(s)

customer (99000 rows), 4.03 MB

Create-table statement:

hive> show create table customer;
OK
CREATE TABLE `customer`(
  `c_customer_sk` bigint,
  `c_customer_id` string,
  `c_current_cdemo_sk` bigint,
  `c_current_hdemo_sk` bigint,
  `c_current_addr_sk` bigint,
  `c_first_shipto_date_sk` bigint,
  `c_first_sales_date_sk` bigint,
  `c_salutation` string,
  `c_first_name` string,
  `c_last_name` string,
  `c_preferred_cust_flag` string,
  `c_birth_day` int,
  `c_birth_month` int,
  `c_birth_year` int,
  `c_birth_country` string,
  `c_login` string,
  `c_email_address` string,
  `c_last_review_date` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/customer'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2',
  'numRows'='99000',
  'rawDataSize'='88084062',
  'totalSize'='4221267',
  'transient_lastDdlTime'='1473167154')
Time taken: 0.116 seconds, Fetched: 34 row(s)

A sample row:

0 AAAAAAAAAAAAAAAA 1824793 3203 2555 28776 14690 Ms. Marisa Harrington N 17 4 1988 UNITED ARAB EMIRATES RRCyuY3XfE3a Marisa.Harrington@lawyer.com     gdMmGdU9

customer_address (49500 rows), 0.92 MB

Create-table statement:

hive> show create table customer_address;
OK
CREATE TABLE `customer_address`(
  `ca_address_sk` bigint,
  `ca_address_id` string,
  `ca_street_number` string,
  `ca_street_name` string,
  `ca_street_type` string,
  `ca_suite_number` string,
  `ca_city` string,
  `ca_county` string,
  `ca_state` string,
  `ca_zip` string,
  `ca_country` string,
  `ca_gmt_offset` decimal(5,2),
  `ca_location_type` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/customer_address'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2',
  'numRows'='49500',
  'rawDataSize'='55836000',
  'totalSize'='966548',
  'transient_lastDdlTime'='1473167160')
Time taken: 0.043 seconds, Fetched: 29 row(s)

A sample row:

6187 AAAAAAAAAAAAAJDZ 536 6th Lake Drive UKL8bE5C Lowell Brule County SD 18464 United States -6 apartment

customer_demographics (1920800 rows), 6.84 MB

Create-table statement:

hive> show create table customer_demographics;
OK
CREATE TABLE `customer_demographics`(
  `cd_demo_sk` bigint,
  `cd_gender` string,
  `cd_marital_status` string,
  `cd_education_status` string,
  `cd_purchase_estimate` int,
  `cd_credit_rating` string,
  `cd_dep_count` int,
  `cd_dep_employed_count` int,
  `cd_dep_college_count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/customer_demographics'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='1920800',
  'rawDataSize'='718371044',
  'totalSize'='7167503',
  'transient_lastDdlTime'='1473167169')

A sample row:

0 F     U Primary 6000 Good 0 5 0

date_dim (109573 rows), 1.67 MB

Create-table statement:

hive> show create table date_dim;
OK
CREATE TABLE `date_dim`(
  `d_date_sk` bigint,
  `d_date_id` string,
  `d_date` string,
  `d_month_seq` int,
  `d_week_seq` int,
  `d_quarter_seq` int,
  `d_year` int,
  `d_dow` int,
  `d_moy` int,
  `d_dom` int,
  `d_qoy` int,
  `d_fy_year` int,
  `d_fy_quarter_seq` int,
  `d_fy_week_seq` int,
  `d_day_name` string,
  `d_quarter_name` string,
  `d_holiday` string,
  `d_weekend` string,
  `d_following_holiday` string,
  `d_first_dom` int,
  `d_last_dom` int,
  `d_same_day_ly` int,
  `d_same_day_lq` int,
  `d_current_day` string,
  `d_current_week` string,
  `d_current_month` string,
  `d_current_quarter` string,
  `d_current_year` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/date_dim'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='109573',
  'rawDataSize'='123050479',
  'totalSize'='1748707',
  'transient_lastDdlTime'='1473167172')

A sample row:

0 AAAAAAAAAAAAAAAA 1900-01-01 0 0 0 1900 1 1 1 1 1900 0 0 Monday 1900Q1 Y     N     N 2448812 2458802 2472542 2420941 N     N     NN     N

household_demographics (7200 rows), 14.31 KB

Create-table statement:

hive> show create table household_demographics;
OK
CREATE TABLE `household_demographics`(
  `hd_demo_sk` bigint,
  `hd_income_band_sk` bigint,
  `hd_buy_potential` string,
  `hd_dep_count` int,
  `hd_vehicle_count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/household_demographics'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='7200',
  'rawDataSize'='835168',
  'totalSize'='14655',
  'transient_lastDdlTime'='1473167173')

A sample row:

0 3 1001-5000 0 0

income_band (20 rows), 754 B

Create-table statement:

hive> show create table income_band;
OK
CREATE TABLE `income_band`(
  `ib_income_band_sk` bigint,
  `ib_lower_bound` int,
  `ib_upper_bound` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/income_band'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2',
  'numRows'='20',
  'rawDataSize'='320',
  'totalSize'='754',
  'transient_lastDdlTime'='1473167179')
Time taken: 0.032 seconds, Fetched: 19 row(s)

A sample row:

0 1 10000

inventory (23255100 rows), 34.55 MB

Create-table statement:

hive> show create table inventory;
OK
CREATE TABLE `inventory`(
  `inv_date_sk` bigint,
  `inv_item_sk` bigint,
  `inv_warehouse_sk` bigint,
  `inv_quantity_on_hand` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/inventory'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='3',
  'numRows'='23255100',
  'rawDataSize'='651118804',
  'totalSize'='36234106',
  'transient_lastDdlTime'='1473167235')
Time taken: 0.031 seconds, Fetched: 20 row(s)

A sample row:

36890 0 0 503

item (17820 rows), 2.36 MB

Create-table statement:

hive> show create table item;
OK
CREATE TABLE `item`(
  `i_item_sk` bigint,
  `i_item_id` string,
  `i_rec_start_date` string,
  `i_rec_end_date` string,
  `i_item_desc` string,
  `i_current_price` decimal(7,2),
  `i_wholesale_cost` decimal(7,2),
  `i_brand_id` int,
  `i_brand` string,
  `i_class_id` int,
  `i_class` string,
  `i_category_id` int,
  `i_category` string,
  `i_manufact_id` int,
  `i_manufact` string,
  `i_size` string,
  `i_formulation` string,
  `i_color` string,
  `i_units` string,
  `i_container` string,
  `i_manager_id` int,
  `i_product_name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://mycluster/usr/hive/warehouse/bigbench.db/item'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='17820',
  'rawDataSize'='31238428',
  'totalSize'='2472973',
  'transient_lastDdlTime'='1473167181')

某一行:

0 AAAAAAAAAAAAAAAA 2000-01-14 quickly even dinos beneath the frays must have to boost boldly careful bold escapades: stealthily even forges over the dependencies integrate always past the quiet sly decoys-- notornis sol 72.29 64.96 3898712 71TS7NSbvH1YbdiQMG6ttBHKAljiNoIRB 1 Fan Shop 9 Sports & Outdoors 995 2VOxvrIWwlJQTSk6 small 99Ee1r6uFbZSSClAX3 dodger Oz Unknown 18 8m9n5Q7T33DNWidoA6nWlg6ydmpA1SKOoOJLXiLVb

item_marketprices (89100 rows, 0.63 MB)

CREATE TABLE statement:

    hive> show create table item_marketprices;
    OK
    CREATE TABLE `item_marketprices`(
      `imp_sk` bigint,
      `imp_item_sk` bigint,
      `imp_competitor` string,
      `imp_competitor_price` decimal(7,2),
      `imp_start_date` bigint,
      `imp_end_date` bigint)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/item_marketprices'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='89100',
      'rawDataSize'='21736912',
      'totalSize'='657024',
      'transient_lastDdlTime'='1473167275')

Sample row:

5 4737 AAAAAAAAAAAAAAIN 66.4 36890 36958

product_reviews (89991 rows, 24.36 MB)

CREATE TABLE statement:

    hive> show create table product_reviews;
    OK
    CREATE TABLE `product_reviews`(
      `pr_review_sk` bigint,
      `pr_review_date` string,
      `pr_review_time` string,
      `pr_review_rating` int,
      `pr_item_sk` bigint,
      `pr_user_sk` bigint,
      `pr_order_sk` bigint,
      `pr_review_content` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/product_reviews'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='89991',
      'rawDataSize'='79597043',
      'totalSize'='25546821',
      'transient_lastDdlTime'='1473167305')

Sample row (the review text is synthetically generated):

4 2004-02-25 21:43:07 1 3683 88783 41241 Had to offset the praise handed down by so many years is probably the result of some sin committed (obviously so grievous don't "get" it. Why all the original artwork by Stephen Gammel.

promotion (300 rows, 15.83 KB)

CREATE TABLE statement:

    hive> show create table promotion;
    OK
    CREATE TABLE `promotion`(
      `p_promo_sk` bigint,
      `p_promo_id` string,
      `p_start_date_sk` bigint,
      `p_end_date_sk` bigint,
      `p_item_sk` bigint,
      `p_cost` decimal(15,2),
      `p_response_target` int,
      `p_promo_name` string,
      `p_channel_dmail` string,
      `p_channel_email` string,
      `p_channel_catalog` string,
      `p_channel_tv` string,
      `p_channel_radio` string,
      `p_channel_press` string,
      `p_channel_event` string,
      `p_channel_demo` string,
      `p_channel_details` string,
      `p_purpose` string,
      `p_discount_active` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/promotion'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='300',
      'rawDataSize'='404926',
      'totalSize'='16204',
      'transient_lastDdlTime'='1473167186')

Sample row:

7 AAAAAAAAAAAAAAAH 94455 108529 13511 427.76 1 bar     YN     Y     N     N     N     N     N     blithe grouches past the blithe quick epitaphs print rut Unknown N

reason (35 rows, 3.17 KB)

CREATE TABLE statement:

    hive> show create table reason;
    OK
    CREATE TABLE `reason`(
      `r_reason_sk` bigint,
      `r_reason_id` string,
      `r_reason_desc` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/reason'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='35',
      'rawDataSize'='9027',
      'totalSize'='3240',
      'transient_lastDdlTime'='1473167190')

Sample row:

5 uy     busily sly excuses hang: slow braids to the daring somas was toward the epitaphs-- gifts betw

ship_mode (20 rows, 2.93 KB)

CREATE TABLE statement:

    hive> show create table ship_mode;
    OK
    CREATE TABLE `ship_mode`(
      `sm_ship_mode_sk` bigint,
      `sm_ship_mode_id` string,
      `sm_type` string,
      `sm_code` string,
      `sm_carrier` string,
      `sm_contract` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/ship_mode'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='20',
      'rawDataSize'='9576',
      'totalSize'='3001',
      'transient_lastDdlTime'='1473167196')

Sample row:

12 wGyozLRZ3uL     iCTZsMlNzsQ     cBRc FlWM9v tm0ehuQ2

store (12 rows, 8.03 KB)

CREATE TABLE statement:

    hive> show create table store;
    OK
    CREATE TABLE `store`(
      `s_store_sk` bigint,
      `s_store_id` string,
      `s_rec_start_date` string,
      `s_rec_end_date` string,
      `s_closed_date_sk` bigint,
      `s_store_name` string,
      `s_number_employees` int,
      `s_floor_space` int,
      `s_hours` string,
      `s_manager` string,
      `s_market_id` int,
      `s_geography_class` string,
      `s_market_desc` string,
      `s_market_manager` string,
      `s_division_id` int,
      `s_division_name` string,
      `s_company_id` int,
      `s_company_name` string,
      `s_street_number` string,
      `s_street_name` string,
      `s_street_type` string,
      `s_suite_number` string,
      `s_city` string,
      `s_county` string,
      `s_state` string,
      `s_zip` string,
      `s_country` string,
      `s_gmt_offset` decimal(5,2),
      `s_tax_precentage` decimal(5,2))
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/store'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='12',
      'rawDataSize'='25962',
      'totalSize'='8214',
      'transient_lastDdlTime'='1473167201')

Sample row:

10 AAAAAAAAAAAAAAAK 2000-11-01 6235 Avalon 254 6468537 8AM-12AM Michael Barlow 6 Unknown final sly gifts by the even final dependencies x-ray under the fluffy Barry Shaw 1 Unknown 1 Unknown 993 Mill Pkwy aQV Cold Springs Cumberland County TN 95692 United States -5 0.03

store_returns (37902 rows, 1.19 MB)

CREATE TABLE statement:

    hive> show create table store_returns;
    OK
    CREATE TABLE `store_returns`(
      `sr_returned_date_sk` bigint,
      `sr_return_time_sk` bigint,
      `sr_item_sk` bigint,
      `sr_customer_sk` bigint,
      `sr_cdemo_sk` bigint,
      `sr_hdemo_sk` bigint,
      `sr_addr_sk` bigint,
      `sr_store_sk` bigint,
      `sr_reason_sk` bigint,
      `sr_ticket_number` bigint,
      `sr_return_quantity` int,
      `sr_return_amt` decimal(7,2),
      `sr_return_tax` decimal(7,2),
      `sr_return_amt_inc_tax` decimal(7,2),
      `sr_fee` decimal(7,2),
      `sr_return_ship_cost` decimal(7,2),
      `sr_refunded_cash` decimal(7,2),
      `sr_reversed_charge` decimal(7,2),
      `sr_store_credit` decimal(7,2),
      `sr_net_loss` decimal(7,2))
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/store_returns'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='37902',
      'rawDataSize'='41388272',
      'totalSize'='1250563',
      'transient_lastDdlTime'='1473167251')

Sample row:

37375 38182 11520 7640 242073 6754 25731 2 22 20293 33 1990.89 119.45 2110.34 34.14 2433.41 477.81 559.84 953.24 2672.31

store_sales (667579 rows, 23.49 MB)

CREATE TABLE statement:

    hive> show create table store_sales;
    OK
    CREATE TABLE `store_sales`(
      `ss_sold_date_sk` bigint,
      `ss_sold_time_sk` bigint,
      `ss_item_sk` bigint,
      `ss_customer_sk` bigint,
      `ss_cdemo_sk` bigint,
      `ss_hdemo_sk` bigint,
      `ss_addr_sk` bigint,
      `ss_store_sk` bigint,
      `ss_promo_sk` bigint,
      `ss_ticket_number` bigint,
      `ss_quantity` int,
      `ss_wholesale_cost` decimal(7,2),
      `ss_list_price` decimal(7,2),
      `ss_sales_price` decimal(7,2),
      `ss_ext_discount_amt` decimal(7,2),
      `ss_ext_sales_price` decimal(7,2),
      `ss_ext_wholesale_cost` decimal(7,2),
      `ss_ext_list_price` decimal(7,2),
      `ss_ext_tax` decimal(7,2),
      `ss_coupon_amt` decimal(7,2),
      `ss_net_paid` decimal(7,2),
      `ss_net_paid_inc_tax` decimal(7,2),
      `ss_net_profit` decimal(7,2))
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/store_sales'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='667579',
      'rawDataSize'='953293700',
      'totalSize'='24629162',
      'transient_lastDdlTime'='1473167245')

Sample row:

37115 20244 16481 98676 1211207 5239 37107 9 38 11138 93 56.47 82.45 36.28 4293.81 3374.04 5251.71 7667.85 101.22 0 3374.04 3475.26 -1877.67
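
The integer key columns in the sample row above are surrogate keys into the dimension tables listed earlier; for example, ss_sold_date_sk joins to date_dim.d_date_sk. A minimal sketch of the star-schema join the benchmark queries build on, using only column and table names from the DDL above:

    -- Monthly net revenue: fact table store_sales joined to the
    -- date_dim dimension through its surrogate key.
    SELECT d.d_year,
           d.d_moy,
           SUM(ss.ss_net_paid) AS net_paid
    FROM   bigbench.store_sales ss
    JOIN   bigbench.date_dim    d ON ss.ss_sold_date_sk = d.d_date_sk
    GROUP  BY d.d_year, d.d_moy;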

time_dim (86400 rows, 219.49 KB)

CREATE TABLE statement:

    hive> show create table time_dim;
    OK
    CREATE TABLE `time_dim`(
      `t_time_sk` bigint,
      `t_time_id` string,
      `t_time` int,
      `t_hour` int,
      `t_minute` int,
      `t_second` int,
      `t_am_pm` string,
      `t_shift` string,
      `t_sub_shift` string,
      `t_meal_time` string)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/time_dim'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='1',
      'numRows'='86400',
      'rawDataSize'='41040000',
      'totalSize'='224757',
      'transient_lastDdlTime'='1473167202')

Sample row:

2 AAAAAAAAAAAAAAAC 2 0 0 2 AM     third     night

warehouse (5 rows, 1.94 KB)

CREATE TABLE statement:

    hive> show create table warehouse;
    OK
    CREATE TABLE `warehouse`(
      `w_warehouse_sk` bigint,
      `w_warehouse_id` string,
      `w_warehouse_name` string,
      `w_warehouse_sq_ft` int,
      `w_street_number` string,
      `w_street_name` string,
      `w_street_type` string,
      `w_suite_number` string,
      `w_city` string,
      `w_county` string,
      `w_state` string,
      `w_zip` string,
      `w_country` string,
      `w_gmt_offset` decimal(5,2))
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/warehouse'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='5',
      'rawDataSize'='5695',
      'totalSize'='1980',
      'transient_lastDdlTime'='1473167204')

Sample row:

1 AAAAAAAAAAAAAAAB     frets would sleep 845707 181 9th 13thCourt kQ Sherwood Forest Teton County WY 87891 United States -7

web_clickstreams (6770550 rows, 34.11 MB)

CREATE TABLE statement:

    hive> show create table web_clickstreams;
    OK
    CREATE TABLE `web_clickstreams`(
      `wcs_click_date_sk` bigint,
      `wcs_click_time_sk` bigint,
      `wcs_sales_sk` bigint,
      `wcs_item_sk` bigint,
      `wcs_web_page_sk` bigint,
      `wcs_user_sk` bigint)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/web_clickstreams'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='6770550',
      'rawDataSize'='253337416',
      'totalSize'='35768476',
      'transient_lastDdlTime'='1473167298')

Sample row:

36890 27089 NULL 15465 47 NULL
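
Note the NULLs in wcs_sales_sk and wcs_user_sk: a click does not have to belong to a completed sale or a logged-in user. A quick way to measure the share of anonymous clicks, as a sketch (the column semantics are inferred from the schema above):

    -- Fraction of clickstream rows with no associated user.
    SELECT COUNT(*)                                                 AS clicks,
           SUM(CASE WHEN wcs_user_sk IS NULL THEN 1 ELSE 0 END)     AS anonymous,
           AVG(CASE WHEN wcs_user_sk IS NULL THEN 1.0 ELSE 0.0 END) AS anon_ratio
    FROM   bigbench.web_clickstreams;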

web_page (60 rows, 6.81 KB)

CREATE TABLE statement:

    hive> show create table web_page;
    OK
    CREATE TABLE `web_page`(
      `wp_web_page_sk` bigint,
      `wp_web_page_id` string,
      `wp_rec_start_date` string,
      `wp_rec_end_date` string,
      `wp_creation_date_sk` bigint,
      `wp_access_date_sk` bigint,
      `wp_autogen_flag` string,
      `wp_customer_sk` bigint,
      `wp_url` string,
      `wp_type` string,
      `wp_char_count` int,
      `wp_link_count` int,
      `wp_image_count` int,
      `wp_max_ad_count` int)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/web_page'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='60',
      'rawDataSize'='38763',
      'totalSize'='6969',
      'transient_lastDdlTime'='1473167215')

Sample row:

24 AAAAAAAAAAAAAAAY 2000-12-19 38819 69008 0 44446 http://www.C8jdri37RmtbDNeFnXjmYbyBPzeO4WWK9pVYP6xtBJbaQ5yIj4s.com     feedback    4340     23     5     2

web_returns (38487 rows, 1.40 MB)

CREATE TABLE statement:

    hive> show create table web_returns;
    OK
    CREATE TABLE `web_returns`(
      `wr_returned_date_sk` bigint,
      `wr_returned_time_sk` bigint,
      `wr_item_sk` bigint,
      `wr_refunded_customer_sk` bigint,
      `wr_refunded_cdemo_sk` bigint,
      `wr_refunded_hdemo_sk` bigint,
      `wr_refunded_addr_sk` bigint,
      `wr_returning_customer_sk` bigint,
      `wr_returning_cdemo_sk` bigint,
      `wr_returning_hdemo_sk` bigint,
      `wr_returning_addr_sk` bigint,
      `wr_web_page_sk` bigint,
      `wr_reason_sk` bigint,
      `wr_order_number` bigint,
      `wr_return_quantity` int,
      `wr_return_amt` decimal(7,2),
      `wr_return_tax` decimal(7,2),
      `wr_return_amt_inc_tax` decimal(7,2),
      `wr_fee` decimal(7,2),
      `wr_return_ship_cost` decimal(7,2),
      `wr_refunded_cash` decimal(7,2),
      `wr_reversed_charge` decimal(7,2),
      `wr_account_credit` decimal(7,2),
      `wr_net_loss` decimal(7,2))
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/web_returns'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='38487',
      'rawDataSize'='43257220',
      'totalSize'='1471571',
      'transient_lastDdlTime'='1473167269')

Sample row:

38952 68126 9590 52948 893223 2020 6942 52948 893223 2020 6942 32 13 63444 33 626.01 25.04 651.05 18.74 958.4 244.14 286.4 95.47 1008.48

web_sales (668052 rows, 32.99 MB)

CREATE TABLE statement:

    hive> show create table web_sales;
    OK
    CREATE TABLE `web_sales`(
      `ws_sold_date_sk` bigint,
      `ws_sold_time_sk` bigint,
      `ws_ship_date_sk` bigint,
      `ws_item_sk` bigint,
      `ws_bill_customer_sk` bigint,
      `ws_bill_cdemo_sk` bigint,
      `ws_bill_hdemo_sk` bigint,
      `ws_bill_addr_sk` bigint,
      `ws_ship_customer_sk` bigint,
      `ws_ship_cdemo_sk` bigint,
      `ws_ship_hdemo_sk` bigint,
      `ws_ship_addr_sk` bigint,
      `ws_web_page_sk` bigint,
      `ws_web_site_sk` bigint,
      `ws_ship_mode_sk` bigint,
      `ws_warehouse_sk` bigint,
      `ws_promo_sk` bigint,
      `ws_order_number` bigint,
      `ws_quantity` int,
      `ws_wholesale_cost` decimal(7,2),
      `ws_list_price` decimal(7,2),
      `ws_sales_price` decimal(7,2),
      `ws_ext_discount_amt` decimal(7,2),
      `ws_ext_sales_price` decimal(7,2),
      `ws_ext_wholesale_cost` decimal(7,2),
      `ws_ext_list_price` decimal(7,2),
      `ws_ext_tax` decimal(7,2),
      `ws_coupon_amt` decimal(7,2),
      `ws_ext_ship_cost` decimal(7,2),
      `ws_net_paid` decimal(7,2),
      `ws_net_paid_inc_tax` decimal(7,2),
      `ws_net_paid_inc_ship` decimal(7,2),
      `ws_net_paid_inc_ship_tax` decimal(7,2),
      `ws_net_profit` decimal(7,2))
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/web_sales'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='668052',
      'rawDataSize'='1221174888',
      'totalSize'='34585166',
      'transient_lastDdlTime'='1473167263')

Sample row:

36890 26789 36993 794 85457 380790 2436 42649 29934 1583844 5251 525 2 14 1 3 120 0 93 32.62 81.22 59.29 2039.49 5513.97 3033.66 7553.46 330.84 0 1661.76 5513.97 5844.81 7175.73 7506.57 2480.31

web_site (30 rows, 10.31 KB)

CREATE TABLE statement:

    hive> show create table web_site;
    OK
    CREATE TABLE `web_site`(
      `web_site_sk` bigint,
      `web_site_id` string,
      `web_rec_start_date` string,
      `web_rec_end_date` string,
      `web_name` string,
      `web_open_date_sk` bigint,
      `web_close_date_sk` bigint,
      `web_class` string,
      `web_manager` string,
      `web_mkt_id` int,
      `web_mkt_class` string,
      `web_mkt_desc` string,
      `web_market_manager` string,
      `web_company_id` int,
      `web_company_name` string,
      `web_street_number` string,
      `web_street_name` string,
      `web_street_type` string,
      `web_suite_number` string,
      `web_city` string,
      `web_county` string,
      `web_state` string,
      `web_zip` string,
      `web_country` string,
      `web_gmt_offset` decimal(5,2),
      `web_tax_percentage` decimal(5,2))
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://mycluster/usr/hive/warehouse/bigbench.db/web_site'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='true',
      'numFiles'='2',
      'numRows'='30',
      'rawDataSize'='62832',
      'totalSize'='10554',
      'transient_lastDdlTime'='1473167210')

Sample row:

2 AAAAAAAAAAAAAAAC 2002-07-17 site_0 16450 91500 Unknown Gregory George 1 sheaves despite the quietly sly asymp     thin enticing frets except the sometimes final courts might promise blithe dino Frank Hernandez 1 ese 17 5th Ave EbDxJVL Georgetown Guadalupe County TX 75435 United States -6 0.01
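
With every table catalogued, the per-table row counts quoted throughout this section are straightforward to reproduce; a sketch (a COUNT(*) scans the data itself, so it works even when the stored statistics are stale):

    USE bigbench;
    SHOW TABLES;

    -- Stats-independent row count for any single table, e.g. web_site.
    SELECT COUNT(*) FROM web_site;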
