Wednesday, April 13, 2016

JDBC Operations in Spark + Scala

Here is sample code that creates a sample DataFrame from an RDD and then inserts it into a MySQL database. This approach should work for other relational databases too, given the matching JDBC driver and URL.

import org.apache.spark.SparkConf
import java.util.Properties
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SaveMode
import java.util.Date
import java.text.SimpleDateFormat


// Row type for the person table
case class Person(age: Int, name: String, occu: String)

object Utilities {
  val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  val url = "jdbc:mysql://sna-blr-02.beds.boeing.com:3306/ahmdb"
  // JDBC connection properties passed to the DataFrame writer
  val prop = new Properties()
  prop.put("user", "hadoop")
  prop.put("password", "")
  prop.put("driver", "com.mysql.jdbc.Driver")
  val tablename = "person"

  def savePerson(person: Person, sc: SparkContext, sqlContext: SQLContext): Unit = {

    import sqlContext.implicits._

    // Get the current time, formatted to match the MySQL DATETIME column
    val currentTime = sdf.format(new Date())

    // Create a single-row DataFrame whose column names match the table columns
    val df = sc.parallelize(Array((person.age, person.name, person.occu, currentTime)))
      .toDF("age", "name", "occu", "updatedtime")

    // JDBC insert in append mode: new rows are added, existing rows are kept
    df.write.mode(SaveMode.Append).jdbc(url, tablename, prop)
  }
}

// Usage of the above method:
Utilities.savePerson(Person(33, "Foo", "bar"), sparkContext, sqlContext)

-- MySQL table structure
CREATE TABLE person( 
  age int(3) DEFAULT NULL,
  name text NOT NULL,
  occu text,
  updatedtime datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
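For comparison, here is a rough plain-JDBC counterpart of the append above, written in Java. It inserts one Person row into the same person table through a parameterized INSERT. This is a sketch under assumptions:
- the MySQL Connector/J driver is on the classpath;
- the URL and credentials are the ones you already use for Spark;
- the class and method names are hypothetical.

Spark's JDBC writer does its own batching per partition and may generate slightly different SQL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Collections;
import java.util.Properties;

// Hypothetical plain-JDBC counterpart of Utilities.savePerson (no Spark involved).
public class PersonJdbcSketch {

    // Builds "INSERT INTO table (c1, c2, ...) VALUES (?, ?, ...)" with one
    // placeholder per column, so values are bound rather than concatenated.
    static String insertSql(String table, String... columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.length, "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Appends one row to the person table; existing rows are untouched.
    // props is expected to hold "user" and "password", as in the Scala example.
    static int savePerson(String url, Properties props, int age, String name, String occu)
            throws SQLException {
        String sql = insertSql("person", "age", "name", "occu", "updatedtime");
        try (Connection conn = DriverManager.getConnection(url, props);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, age);
            ps.setString(2, name);
            ps.setString(3, occu);
            ps.setTimestamp(4, new Timestamp(System.currentTimeMillis()));
            return ps.executeUpdate(); // number of rows inserted
        }
    }
}
```

For example, PersonJdbcSketch.savePerson(url, props, 33, "Foo", "bar") appends one row. Append mode in Spark is roughly this insert repeated for each row of each partition, which is why rows already in the table are never touched.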

Happy Sparking...

Thursday, March 3, 2016

Installing Additional Hadoop Services using Parcels in Cloudera Manager

To install additional services from a parcel in Cloudera Manager, follow these steps:

  1. Download the parcel file. Make sure the component version is supported by your Cloudera version and your operating system.
  2. Download the corresponding manifest.json file, and create a .sha file containing the parcel's hash (the hash is listed for that parcel in manifest.json):
     echo "*********************" > componentName.parcel.sha
  3. scp these three files to the Hadoop master node, to a location such as /opt/cloudera/parcel-local/newdir
  4. Run an HTTP server in this directory to expose it as a local parcel repository:
     python -m SimpleHTTPServer 8900
     (on Python 3: python3 -m http.server 8900)
  5. In Cloudera Manager, go to the Parcels page and click Check for New Parcels. You might first have to edit the parcel settings and add http://yourserver:8900 as one of the parcel repository URLs.
  6. Your parcel should now be listed there.
  7. Download the parcel in CM.
  8. Distribute it by clicking the Distribute button.
  9. Activate it by clicking the Activate button.
  10. Check Parcel Usage.
  11. Finally, you may need to run 'Add Service' on the cluster to actually install your component from the activated parcel.
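To fill in the .sha file (or to double-check it), you can compute the parcel's hash yourself and compare it with the hash in manifest.json. A minimal Java sketch follows. It assumes the parcel hashes are 40-character SHA-1 hex digests, which is what I understand Cloudera uses; if yours have a different length, adjust the algorithm. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: prints the SHA-1 of a parcel so it can be compared
// with the hash listed for that parcel in manifest.json.
public class ParcelHash {

    // Hex-encoded SHA-1 of a stream, read in chunks so a multi-GB parcel
    // never has to fit in memory.
    static String sha1Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ParcelHash <componentName>.parcel");
            System.exit(1);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(sha1Hex(in));
        }
    }
}
```

Running java ParcelHash componentName.parcel > componentName.parcel.sha writes the hash computed from the parcel itself. If the download is intact, it should match the value in manifest.json.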

Monday, February 22, 2016

Cloudera Hadoop Setup - HDFS Canary Health Check issue

After setting up a Hadoop cluster using Cloudera Manager, one of the common issues some of us face is Canary Health Check failures. This most often happens because of connectivity problems between the master and slave nodes. In my case, HDFS was throwing a Canary error saying it was unable to write/read /tmp/.cloudera_health_monitoring_canary_timestamp. I finally had to open the corresponding port on my DataNodes to resolve this error.

Open the port in the firewall:

$ iptables-save | grep 8042
output: (blank)
$ firewall-cmd --zone=public --add-port=8042/tcp --permanent
output: success
$ firewall-cmd --reload
output: success
$ iptables-save | grep 8042
output: -A IN_public_allow -p tcp -m tcp --dport 8042 -m conntrack --ctstate NEW -j ACCEPT
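Before and after changing the firewall rules, it helps to confirm from the master node that the DataNode port is actually reachable. Below is a minimal Java sketch that attempts a TCP connection with a timeout. Assumptions to note:
- the class name, the 2-second timeout and the defaults in main are mine;
- a successful connect only shows that the port is open, not which service is listening on it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity check: can this machine open a TCP connection to host:port?
public class PortCheck {

    // Returns true if a TCP connection succeeds within timeoutMs.
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, filtered (timed out), or host not resolvable
        }
    }

    public static void main(String[] args) {
        // Usage: java PortCheck <host> <port>
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8042;
        boolean open = isPortOpen(host, port, 2000);
        System.out.println(host + ":" + port + (open ? " is reachable" : " is NOT reachable"));
    }
}
```

For example, run java PortCheck datanode1 8042 from the master, where datanode1 stands in for one of your DataNode hostnames. If it reports NOT reachable before the firewall-cmd change and reachable after it, the firewall was the culprit.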

Sunday, January 24, 2016

Running Apache Kafka in Windows

Windows Kafka setup steps:
1) Download Kafka and unzip it. Run the commands below from the unzipped kafka_2.10-0.8.2.1 directory.
2) Start ZooKeeper:
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
3) Start the Kafka server:
bin\windows\kafka-server-start.bat config\server.properties
4) List topics, to make sure Kafka is up and running:
bin\windows\kafka-topics.bat --list --zookeeper localhost:2181
5) Create a new topic, for example:
bin\windows\kafka-topics.bat --create --topic sensor1 --replication-factor 1 --partitions 5 --zookeeper localhost:2181
6) Produce some sample Kafka messages (each line you type is sent as one message):
bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic sensor1
7) Start a consumer to check whether the messages produced above were successfully published to the Kafka broker:
bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic sensor1 --from-beginning
8) Stop the Kafka server:
bin\windows\kafka-server-stop.bat
9) Stop ZooKeeper:
bin\windows\zookeeper-server-stop.bat
Happy Kafka'ing..

Friday, December 19, 2014

String Literal as Synchronization Lock

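String literals are interned: every occurrence of the same literal refers to one and the same String object. If you create another string from the same literal, the second reference points to the first object itself. So pay attention when using Strings as synchronization locks: two locks that look different may actually be one.
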
package com.sudheer.springbrain;

public class StringLiteralThread implements Runnable {

 String lock;

 StringLiteralThread(String lock) {
  this.lock = lock;
 }

 @Override
 public void run() {
  System.out.println(Thread.currentThread() + " : Trying lock");
  synchronized (lock) {
   System.out.println(Thread.currentThread() + " : Acquired lock");
   // Hold the lock forever; if (true) keeps the println below from being
   // rejected by the compiler as unreachable.
   if (true) {
    while (true) {
     // Some infinite loop
    }
   }
  }
  System.out.println(Thread.currentThread() + " : Done");
 }

 public static void main(String[] args) {
  // Both variables refer to the same interned "abc" object: one shared lock.
  String lock = "abc";
  String lock2 = "abc";

  //String lock = new String("abc");
  //String lock2 = new String("abc");

  Thread t1 = new Thread(new StringLiteralThread(lock));
  t1.start();

  Thread t2 = new Thread(new StringLiteralThread(lock2));
  t2.start();
 }
}

Output:
Thread[Thread-0,5,main] : Trying lock
Thread[Thread-0,5,main] : Acquired lock
Thread[Thread-1,5,main] : Trying lock
Here lock and lock2 are two different variables, but both refer to the same interned "abc" literal. As a result, the second thread never enters the synchronized block that the first thread is holding. new String("abc"), on the other hand, creates a distinct object every time, so in the version below each thread locks its own object. (Both versions spin forever, so stop the JVM manually.)

package com.sudheer.datastructures;

public class StringLiteralThread implements Runnable {

 String lock;

 StringLiteralThread(String lock) {
  this.lock = lock;
 }

 @Override
 public void run() {
  System.out.println(Thread.currentThread() + " : Trying lock");
  synchronized (lock) {
   System.out.println(Thread.currentThread() + " : Acquired lock");
   // Hold the lock forever; if (true) keeps the println below from being
   // rejected by the compiler as unreachable.
   if (true) {
    while (true) {
     // Some infinite loop
    }
   }
  }
  System.out.println(Thread.currentThread() + " : Done");
 }

 public static void main(String[] args) {
  //String lock = "abc";
  //String lock2 = "abc";

  // Two distinct String objects with equal contents: each thread gets its own lock.
  String lock = new String("abc");
  String lock2 = new String("abc");

  Thread t1 = new Thread(new StringLiteralThread(lock));
  t1.start();

  Thread t2 = new Thread(new StringLiteralThread(lock2));
  t2.start();
 }
}


Output: 
Thread[Thread-1,5,main] : Trying lock
Thread[Thread-0,5,main] : Trying lock
Thread[Thread-0,5,main] : Acquired lock
Thread[Thread-1,5,main] : Acquired lock
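The whole effect comes down to object identity. Here is a small sketch that compares references with == to show three things:
- equal literals share one object;
- new String(...) does not;
- intern() maps an equal string back to the pooled literal.

The class and method names are hypothetical.

```java
// Hypothetical demo: which equal-looking strings are actually the same object
// (and would therefore be the same monitor in a synchronized block)?
public class StringIdentityDemo {

    // Same literal twice: both references point to one interned object.
    static boolean literalsShareObject() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // new String(...) always allocates a fresh object.
    static boolean newStringsShareObject() {
        String a = new String("abc");
        String b = new String("abc");
        return a == b;
    }

    // intern() returns the pooled instance, i.e. the literal's object.
    static boolean internReturnsLiteral() {
        return new String("abc").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println("literal == literal:           " + literalsShareObject());   // true
        System.out.println("new String == new String:     " + newStringsShareObject()); // false
        System.out.println("new String.intern() == literal: " + internReturnsLiteral()); // true
    }
}
```

A common way to avoid the pitfall altogether is to synchronize on a dedicated field such as private final Object lock = new Object();. No other code can end up sharing that object by accident.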

Thursday, June 17, 2010

Implementing common application headers using Maven Overlay

Many web applications these days are not really a single deployed instance; they are a set of applications (.ear files) deployed on different application server instances. Typically the developers divide them by tab or functionality. For example, the landing (overview) page is one deployment, whereas billing and reports could be separate deployments, and so on.

But one thing common to all these individual applications is the application header (I'm not talking about HTTP headers), which shows the page navigation. This header can have a multi-level navigation structure. When a user clicks on, say, the Billing tab, which is deployed on a different app server instance, the request goes to that new server, a new session is created, and the page is rendered with its own header. The look and feel of the Billing page header is kept the same as the Overview page header, so that the transition goes unnoticed by the user.

For this purpose, developers end up implementing the same header code in every project (each application instance), which raises the following concerns:
- Redundant code. All the JSPs, the associated CSS and JavaScript, any third-party JavaScript frameworks, and any Java code for backend logic have to be replicated in all projects.
- Changes to the header in one project have to be manually merged into the header code of every other project, which is cumbersome and easy to forget.

Maven overlays come to the rescue here.

The idea here is that all the header-related code (Java, JSP, HTML, CSS, JavaScript and others) is kept in a single web project, preferably a separate one. This project is then pulled into the main projects at build time. The advantages of this approach are:
- The main projects (Overview, Billing, etc.) don't need any header code of their own.
- All the header code lives in a single project, so neither the code nor later changes to it have to be replicated.
- It is compiled once and copied into the other projects, which means quicker builds.

The same concept can be applied to footers. For that matter, it can be used for any functionality of the app that stays the same across multiple applications.

Enough theory; now let us get our hands dirty.

Here is the CommonHeaderProject structure.



This project contains all the header and footer JSPs. Here I used Navigation.xml to define the navigation structure and XSL to render the HTML header.











Now let us have a look at the Maven pom.xml for this CommonHeaderProject.


<modelVersion>4.0.0</modelVersion>
<groupId>com.sudheer.mybiz</groupId>
<artifactId>mybiz-header</artifactId>
<packaging>war</packaging>
<version>3.0.0-SNAPSHOT</version>
<name>mybiz-header</name>
<url>http://maven.apache.org</url>


Note that this project is packaged as a WAR.

Here is the pom.xml excerpt from one of the main projects (OverviewWeb).


<build>
<finalName>OverviewWeb</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-war-plugin</artifactId>
<version>2.1-beta-1</version>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
</manifest>
</archive>
<attachClasses>true</attachClasses>
<overlays>
<overlay>
<groupId>com.sudheer.mybiz</groupId>
<artifactId>mybiz-header</artifactId>
<includes>
<include>jsp/includes/*</include>
</includes>
</overlay>
</overlays>
<filteringDeploymentDescriptors>true</filteringDeploymentDescriptors>
</configuration>
</plugin>
</plugins>
</build>



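The overlay element above tells the Maven build to take files from the mybiz-header WAR with the given groupId. For this to work, that WAR must also be declared as a dependency of OverviewWeb with <type>war</type>. Maven overlays also let you choose which files to include and exclude.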

When OverviewWeb is built with Maven, Maven takes the included files from the mybiz-header WAR and copies them into the OverviewWeb target, keeping the same directory structure for the copied files.



Here is the OverviewWeb target after the Maven build. You can see that the files from CommonHeaderProject have been copied into it.






Similarly, for other projects to use the same header, you just have to add the overlay to their pom. That's it, as simple as that.
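To make the copy step concrete, here is a toy Java sketch that imitates it outside Maven. It pulls the entries matching an include pattern out of a WAR and copies them into a target directory under the same relative paths. It skips any file the target already has, which loosely mirrors the war plugin's default of letting the project's own files win. The following are my assumptions, not maven-war-plugin code:
- the class name;
- the demo file names;
- the glob-based matching.

The plugin's real include/exclude and ordering rules are described in its documentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Toy imitation of an overlay: copy selected WAR entries into a target directory.
public class OverlaySketch {

    // Copies entries matching includeGlob into targetDir, keeping relative paths.
    // Files already present in targetDir (the project's own) are not overwritten.
    static List<String> applyOverlay(Path war, String includeGlob, Path targetDir) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + includeGlob);
        Path root = targetDir.toAbsolutePath().normalize();
        List<String> copied = new ArrayList<>();
        try (ZipFile zip = new ZipFile(war.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !matcher.matches(Paths.get(entry.getName()))) {
                    continue; // not selected by the include pattern
                }
                Path dest = root.resolve(entry.getName()).normalize();
                if (!dest.startsWith(root) || Files.exists(dest)) {
                    continue; // outside the target, or the project already has this file
                }
                Files.createDirectories(dest.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, dest);
                }
                copied.add(entry.getName());
            }
        }
        return copied;
    }

    // Builds a tiny fake header WAR plus a target that already has its own footer.jsp
    // (both left in a temp directory for inspection), then applies the overlay.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("overlay-demo");
            Path war = dir.resolve("mybiz-header.war");
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(war))) {
                for (String name : new String[] {"jsp/includes/header.jsp", "jsp/includes/footer.jsp",
                        "jsp/includes/nav/menu.jsp", "WEB-INF/web.xml"}) {
                    zos.putNextEntry(new ZipEntry(name));
                    zos.write(("from header war: " + name).getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
            }
            Path ownFooter = dir.resolve("OverviewWeb/jsp/includes/footer.jsp");
            Files.createDirectories(ownFooter.getParent());
            Files.write(ownFooter, "project footer".getBytes(StandardCharsets.UTF_8));
            return applyOverlay(war, "jsp/includes/*", dir.resolve("OverviewWeb"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [jsp/includes/header.jsp]
    }
}
```

In the demo, footer.jsp is kept from the project, and nav/menu.jsp is skipped because a single * in jsp/includes/* does not cross into subdirectories. web.xml is not selected at all.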

Monday, June 29, 2009

Considering using Java Variable Arguments

One day I came across this situation: we have a Java project (LoggerCore) which is a common project used by many other projects. I had to modify a method signature in LoggerCore, but was afraid that I would end up cascading changes into all the projects just to satisfy the Java compiler.
I started thinking there should be some way to modify LoggerCore and my own project without touching the other projects that also use LoggerCore. Suddenly, while driving home, it flashed: Java variable arguments.

Let us have a look at the code snippets:


public SpringBrainLogger logIt(String referenceNumber) {
...
}

This is the original method signature in LoggerCore. If I have to pass extra information, I need to add a new argument and update every call site. How tedious, and error prone!


public SpringBrainLogger logIt(String referenceNumber, LogInfo... info) {
...
}

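Instead, I considered doing this: I added LogInfo... info, which is Java's variable-argument (varargs) syntax.
Benefits:
- I don't have to modify the remaining n projects, because a varargs parameter also accepts zero arguments. Yes, a call like logIt(referenceNumber) in the other projects still compiles happily. This is source compatibility, not binary compatibility: the compiled method signature changes. Those projects must therefore be rebuilt against the new LoggerCore jar, or their already-compiled calls to logIt(String) will fail at runtime with NoSuchMethodError. Keeping the old logIt(String) as an overload avoids that.
- I just pass the extra argument from my project.
- Any number of LogInfo objects can be passed.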

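Be careful when reading the variable argument inside the method. info is simply a LogInfo[] that is empty when the caller passes nothing, so info[0] throws ArrayIndexOutOfBoundsException in that case. A caller can also pass an explicit null. Check before using it:

if (info != null && info.length > 0 && info[0] != null) { ... }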
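To try the idea in isolation, here is a small self-contained sketch. LogInfo and SpringBrainLogger are hypothetical stand-ins for the LoggerCore types, since the post doesn't show their bodies, and the log line format is my assumption. The point is only that the old one-argument call and the new calls all compile against the same varargs method, and that an empty or null array is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the LoggerCore LogInfo type.
class LogInfo {
    final String detail;

    LogInfo(String detail) {
        this.detail = detail;
    }
}

// Hypothetical stand-in for SpringBrainLogger; it records lines instead of logging.
class SpringBrainLogger {
    final List<String> lines = new ArrayList<>();

    // Varargs parameter: callers may pass zero or more LogInfo objects.
    public SpringBrainLogger logIt(String referenceNumber, LogInfo... info) {
        StringBuilder line = new StringBuilder(referenceNumber);
        // info is an empty array (not null) when nothing is passed,
        // but a caller can still pass an explicit null array or null elements.
        if (info != null) {
            for (LogInfo i : info) {
                if (i != null) {
                    line.append(" | ").append(i.detail);
                }
            }
        }
        lines.add(line.toString());
        return this;
    }
}

public class VarargsDemo {
    public static void main(String[] args) {
        SpringBrainLogger logger = new SpringBrainLogger();
        logger.logIt("REF-1");                                      // old call site, unchanged
        logger.logIt("REF-2", new LogInfo("billing"));              // new call site
        logger.logIt("REF-3", new LogInfo("a"), new LogInfo("b"));  // any number of LogInfo
        logger.lines.forEach(System.out::println);
    }
}
```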


Thank you Var Args!