Built a working Hadoop-Spark-Hive-Superset cluster on Docker
Let’s get this thing started
Now you can download my repo hadoop-spark-hive-superset, and from the directory where you placed it, a single command gets the multi-container environment running. Breaking it all down again just takes one more command from that same directory.
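The exact commands are in the repo's README; assuming a standard Docker Compose setup (and with <your-account> as a placeholder for the actual GitHub account), the up-and-down cycle would look roughly like this:

```bash
# Clone the repo and start all containers in the background
git clone https://github.com/<your-account>/hadoop-spark-hive-superset.git
cd hadoop-spark-hive-superset
docker-compose up -d

# Tear the environment down again when you are done
docker-compose down
```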
All the containers will then be stopped and removed. But: the images and volumes stay! So don't be surprised that the CSV file you uploaded to HDFS will still be there.
Quick starts
Quick start HDFS
Find the Container ID of the namenode.
Copy supermart_grocery_sales.csv to the namenode.
Open a bash shell on the namenode, using that same Container ID.
Create an HDFS directory /data/sales/supermart_grocery_sales.
Copy supermart_grocery_sales.csv to HDFS:
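A minimal sketch of these steps; the container ID (abc123) and the local path of the CSV are assumptions, so adjust them to your setup:

```bash
# Find the Container ID of the namenode
docker ps | grep namenode

# Copy the CSV from your laptop into the namenode container (ID abc123 is a placeholder)
docker cp supermart_grocery_sales.csv abc123:/tmp/

# Open a bash shell on the namenode
docker exec -it abc123 bash

# Inside the container: create the HDFS directory and put the file there
hdfs dfs -mkdir -p /data/sales/supermart_grocery_sales
hdfs dfs -put /tmp/supermart_grocery_sales.csv /data/sales/supermart_grocery_sales/
```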
Quick start Spark
Go to http://<dockerhadoop_IP_address>:8080 or http://localhost:8080/ on your Docker host (laptop). Here you find the spark:// master address. Go to the command line of the Spark master and start spark-shell:
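Assuming the Spark master container is named spark-master and the master address shown in the web UI reads spark://spark-master:7077, launching the shell would look roughly like this (the location of the spark-shell script can differ per image):

```bash
# Open a shell on the Spark master container
docker exec -it spark-master bash

# Start spark-shell against the master address copied from the web UI
/spark/bin/spark-shell --master spark://spark-master:7077
```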
Load supermart_grocery_sales.csv from HDFS.
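Inside spark-shell this is a one-liner; the namenode hostname and port (namenode:9000) are assumptions taken from a typical docker-hadoop setup, so match them to your own configuration:

```scala
// Read the CSV from HDFS into a DataFrame; adjust host/port to your HDFS configuration
val sales = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs://namenode:9000/data/sales/supermart_grocery_sales/supermart_grocery_sales.csv")

sales.show(5)
sales.printSchema()
```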
Quick start Hive
Find the Container ID of the Hive server. Go to the command line of the Hive server and start hiveserver2:
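Roughly, and assuming the Hive container shows up as hive-server in docker ps:

```bash
# Find the Container ID of the Hive server
docker ps | grep hive-server

# Open a shell in the container and start HiveServer2 in the foreground
docker exec -it hive-server bash
hiveserver2
```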
Maybe a little check that something is listening on port 10000 now:
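One way to do that, assuming netstat is available inside the container (otherwise a beeline connection works as a check too):

```bash
# Check whether HiveServer2 is listening on its default Thrift port
netstat -lnt | grep 10000

# Or connect directly with beeline
beeline -u jdbc:hive2://localhost:10000
```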