Posts

Showing posts from 2017

Launch spark shell on multiple executors

Example: spark-shell --master yarn --deploy-mode client --executor-cores 4 --num-executors 6 --executor-memory 12g

The same tuning parameters we use with spark-submit apply to spark-shell as well, except that deploy-mode cannot be 'cluster': the shell is interactive, so the driver has to run on the client. Also make sure spark.dynamicAllocation.enabled is set to true. With these settings, you can see that YARN executors are allocated on demand and removed when they are no longer required.
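As a minimal sketch (assuming YARN's external shuffle service is available, which dynamic allocation requires; the executor size and the min/max counts below are just placeholders), the relevant settings can be passed directly on the command line:

spark-shell --master yarn --deploy-mode client \
  --executor-cores 4 --executor-memory 12g \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=6 \
  --conf spark.shuffle.service.enabled=true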

Spark Scala - Perform data aggregation on last or next n seconds time window

Often while performing statistical aggregations, we run into scenarios where, for each row, we need to aggregate over the next n seconds from that row's timestamp. The Spark Window API provides a nice rangeBetween function which makes this straightforward. For example:

// Sample data with timestamp
val customers = sc.parallelize(List(("Alice", "2016-05-01 00:00:00", 10,4), ("Alice", "2016-05-01 00:00:01", 20,2), ("Alice", "2016-05-01 00:00:02", 30,4), ("Alice", "2016-05-01 00:00:02", 40,6), ("Alice", "2016-05-01 00:00:03", 50,1), ("Alice", "2016-05-01 00:00:03", 60,4), ("Alice", "2016-05-01 00:00:04", 70,2), ("Alice", "2016-05-01 00:00:05", 80,4), ("Bob", "2016-05-01 00:00:03", 25,6), ("Bob", "2016-05-01 00:00:04", 29,7), ("Bob", "2016-05-...
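To make the rangeBetween idea concrete, here is a minimal sketch (assuming the customers RDD above, illustrative column names, and a 3-second look-ahead window; run in spark-shell so the implicits are in scope):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Convert to a DataFrame and derive an epoch-seconds column to order the range frame on
val df = customers.toDF("name", "ts", "amount", "qty")
  .withColumn("epoch", unix_timestamp(col("ts"), "yyyy-MM-dd HH:mm:ss"))

// Frame covering the current row's second through the next 3 seconds, per customer
val nextThreeSeconds = Window.partitionBy("name").orderBy("epoch").rangeBetween(0, 3)

val result = df.withColumn("sum_next_3s", sum(col("amount")).over(nextThreeSeconds))
result.show(false)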

Accessing application listening at custom port on Oracle VirtualBox Hosted VM

For virtual machines like the HortonWorks Sandbox (HDP) VM hosted on Oracle VirtualBox, we often deploy custom applications inside the VM. It then comes as a surprise that the application is not accessible from the host machine, whereas standard Hadoop ports like 8080, 8888 and 4040 open normally. The trick here is that we need to enable Port Forwarding in VirtualBox for the new ports we added: VM -> Devices -> Network -> Adapter -> Advanced -> click the Port Forwarding button -> add the Host port and Guest port numbers (these can be the same). Here is the screenshot showing where to update this: That's it. Now the application should be accessible from the host computer using the Host port. Happy Coding!
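If you prefer the command line over the GUI, the same rule can be added with VBoxManage. A minimal sketch, assuming NAT on adapter 1, a VM named "Hortonworks Sandbox" and a hypothetical application port 8090:

# Add the forwarding rule to a running VM
VBoxManage controlvm "Hortonworks Sandbox" natpf1 "myapp,tcp,,8090,,8090"

# Or to a powered-off VM
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "myapp,tcp,,8090,,8090"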

HBase : Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.

When accessing a remote HBase database with the HBase client from a Java application, there is a possibility of getting the following error: "Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master." This happens because the configuration on the HBase server may be non-default, and those values are not necessarily known to the client developer. In my case, for example, 'zookeeper.znode.parent' had been changed to /hbase/secure instead of the default /hbase. There can be more such changes, which makes it cumbersome for the HBase client to pass every configuration explicitly. One easy way around this is: get the hbase-site.xml from the HBase cluster and add it to your Java application's classpath. That's it. This way we don't have to set the ZooKeeper quorum either. Sample Scala client code: val conf = org.apache.hadoop.hbase.HBaseConfiguration.create() // Instead of the following settings, pass hbase-s...
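To make this concrete, here is a minimal sketch of a Scala client that relies purely on the hbase-site.xml sitting on the classpath (the table name, row key, column family and qualifier below are hypothetical placeholders):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

// create() loads hbase-site.xml from the classpath, so zookeeper.znode.parent,
// the ZooKeeper quorum, etc. come straight from the cluster's own configuration
val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(TableName.valueOf("my_table"))

val result = table.get(new Get(Bytes.toBytes("row1")))
println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))))

table.close()
connection.close()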