SPARK(ling)

Problems

How was my first contact with Apache Spark? To be honest, it was not a piece of cake.
There were several issues. First, after downloading Spark I had to manually download Scala and set up the Scala environment variables to get it running.
Second, I had an issue with JVM memory. After running sbt package and setting _JAVA_OPTIONS to some -Xmx value it was solved and Spark started. But later the JVM memory error returned, so I simply switched the JVM from jdk7u79 to JDK 8 and... no more tears.

Sample use-case

count the words

I like learning by example, so here is a simple use case: count the words on a web page. To start, I just copied the page's text into a text file. What next? First create your first RDD by loading the file:

[code lang="scala"]
// placeholder path to the text file copied from the page
val dlines = sc.textFile("C:/path_to_file/site.txt")
[/code]

Check that it contains any data and print the first 10 lines:

[code lang="scala"]
dlines.count()
dlines.take(10).foreach(println)
[/code]

Now split each line into words on spaces:

[code lang="scala"]
val word_arrays = dlines.map(ln=>ln.split(" "))
[/code]

This gives a rather strange RDD of arrays, one array per line. Let's flatten it into an RDD of all the words:

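[code lang="scala"]
// flatMap on the RDD flattens the arrays without collecting everything to the driver
val dwords = word_arrays.flatMap(arr => arr)
// both steps in one go: dlines.flatMap(ln => ln.split(" "))
[/code]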

With an RDD of words we can have some fun. To count the words, pair each word with the number 1, then add up those ones per word with the (not so) magic reduceByKey:

[code lang="scala"]
val words_nr = dwords.map( s=>(s,1))
val counts = words_nr.reduceByKey((a, b) => a + b)
counts.take(10).foreach(println)
[/code]
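If reduceByKey still looks like magic, here is a small plain-Python sketch of what the flatten, map and reduceByKey steps compute together. It runs on a local list with a hypothetical word_count helper and is not Spark code; it only mirrors the semantics.

[code lang="python"]
from collections import defaultdict


def word_count(lines):
    # flatMap: split every line on a single space and flatten into one list of words
    words = [w for line in lines for w in line.split(" ")]
    # map: pair each word with the number 1
    pairs = [(w, 1) for w in words]
    # reduceByKey((a, b) => a + b): combine all values sharing a key by adding them
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] = counts[word] + one
    return dict(counts)


print(word_count(["the cat saw the dog", "the dog ran"]))
# {'the': 3, 'cat': 1, 'saw': 1, 'dog': 2, 'ran': 1}
[/code]

On a cluster Spark does this summing within each partition first and then merges the partial results, which is why the function passed to reduceByKey should be associative and commutative, as a + b is.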

We have all the counts, but they are in no particular order. Let's sort them by count in descending order and see which words are most popular on the site:

[code lang="scala"]
scala> val sorted = counts.sortBy(k=>k._2, false)
scala> sorted.take(20).foreach(println)
//resulting
(the,339)
(a,194)
(to,186)
(of,168)
(,144)
(in,118)
(and,99)
(is,97)
(on,88)
(Spark,77)
(for,65)
(be,62)
(an,59)
(RDD,52)
(that,52)
(can,49)
(each,45)
(or,45)
(are,44)
(as,43)
[/code]

Voila!
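The (,144) entry in the result is most likely the empty string: splitting on a single space yields an empty token between consecutive spaces and for every empty line. A quick plain-Python check on a hypothetical line shows the effect; in the Scala code the analogous fix would be something like ln.split("\\s+") followed by filtering out empty strings.

[code lang="python"]
line = "Spark  is   fast"  # hypothetical line with extra spaces

print(line.split(" "))
# ['Spark', '', 'is', '', '', 'fast']
print("".split(" "))
# ['']  (an empty line still yields one empty "word")

# split() with no argument splits on runs of whitespace and drops empty strings
print(line.split())
# ['Spark', 'is', 'fast']
[/code]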

Further nice things

Of course, this is not everything: you can also talk to databases or work with Parquet files, the columnar format popular in the Hadoop ecosystem. There is a rich set of add-on libraries, MLlib for machine learning being one worth mentioning.

JEE has not died

Why Spring?

Could you quickly and without hesitation answer this question:

Why am I using Spring

Why did Java people once label JEE as ugly? I will tell you. It was a time when JEE was big and heavyweight, and no one wanted to use it. Then Spring came with all its beauty: annotated beans and a lot of added value (easy transactions, easy data access). For a while, nothing could compare. But time moves on. Why do I say it is now easy to get a small Java EE (micro)service running in a few minutes? Let me explain.

Prerequisites

Application server

Download an application server. I recommend WildFly. After a lot of mundane work with WebSphere 7 at my job, I felt relief when I could simply start a quick WildFly server and deploy an application in seconds. That was super cool.

Java

Use Java 8. Get JDK 8. No questions.

Forging tool

Download JBoss Forge. Of course there is nothing wrong with setting up the project yourself, but come on! You have business code to write!

Forge, code it, deploy

forging

Use JBoss Forge to create the application skeleton. Just type forge and a nice console appears. In it, type

[code lang="text"]
project-new
[/code]

and Forge will take you by the hand and lead you through project creation. Choose war packaging. Voila, a Maven project is created. Run mvn clean install and here we go. Now let's do some coding.

coding

Create a configuration class like this:

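[code lang="java"]
import javax.ws.rs.ApplicationPath;
import javax.ws.rs.core.Application;

@ApplicationPath("/app") // base path for all JAX-RS resources
public class ApplicationConfig extends Application {
}
[/code]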

and an endpoint similar to this:

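[code lang="java"]
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Served at /app/fast: the application path plus the resource path
@Path("/fast")
public class FastEndpoint {

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String getHelloFromFastJavaEEPath() {
        return "fast!";
    }
}
[/code]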

That’s all folks!

deploying

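Start WildFly, go to the fresh-looking management console at http://localhost:9990 and go on! Deploy your war.
Tired of deploying manually? Nothing simpler: use a Maven plugin for that, such as the WildFly Maven plugin (mvn wildfly:deploy).
Now check it and have your first fast Java EE hands-on experience: open the application URL. With the configuration above, the endpoint lives at /app/fast under the war's context root (by default the war name), e.g. http://localhost:8080/{context-root}/app/fast, and you should see the response fast!.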

Cloudy?

Nothing easier. After a little research I found nice options at OpenShift. They give you 3 gears for free (with idling), or the same number of gears on the Bronze plan once you provide your credit card details. One gear = one server. Sounds like enough for getting started.

The end

I am using Spring. And I will not stop doing it. But it's valuable to know that the world of JEE is not as scary as it used to be.

Why should a Javer know some Python?

I am a Javer: at work I use Java syntax and the whole Java micro- and macro-world. But at some point I started to learn Python. The reason? Two people around me use it (at work and unofficially), and to cooperate with them I had to learn it.
While working on a small Python project I noticed that there are some things I love in Python compared to Java, and some I miss.

Why I miss Java when writing Python

Static typing

As a dev I learned to work with static typing. I was static in Delphi, I am static in Java. I prefer to be static: the compiler catches type errors before the code ever runs, and I feel safer that way.

Virtual methods

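Writing code that follows OOP rules and design patterns may not feel as natural in Python. Method calls are dispatched dynamically, so overriding works much like a virtual method, but there are no compiler-checked interfaces: nothing forces a subclass to implement a method unless you opt into the abc module, and even then the mistake only shows up at runtime. So a Java-style Strategy pattern, with an interface and checked implementations, takes more discipline.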
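For comparison, here is a minimal sketch of a Java-style Strategy in Python, assuming you opt into the standard abc module. The class names are hypothetical. The call through the base type dispatches to the subclass like a Java virtual method; what you lose is the compile-time check, since a missing implementation is reported only when you try to instantiate the class.

[code lang="python"]
from abc import ABC, abstractmethod


class DiscountStrategy(ABC):
    """Plays the role of a Java interface."""

    @abstractmethod
    def apply(self, price: float) -> float:
        ...


class NoDiscount(DiscountStrategy):
    def apply(self, price: float) -> float:
        return price


class HalfPrice(DiscountStrategy):
    def apply(self, price: float) -> float:
        return price / 2


class Checkout:
    def __init__(self, strategy: DiscountStrategy):
        self.strategy = strategy

    def total(self, price: float) -> float:
        # dynamic dispatch: the concrete subclass's apply() is called
        return self.strategy.apply(price)


print(Checkout(HalfPrice()).total(10.0))  # 5.0


class Broken(DiscountStrategy):
    pass  # forgot to implement apply()


try:
    Broken()
except TypeError as err:
    # unlike javac, Python complains only at instantiation time
    print("caught:", err)
[/code]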

What I love in Python

Lack of static typing

Yes, the same reason I prefer Java. But think about it: you just write. Your variable is exactly what it holds, and you can quickly turn your ideas into code. You type my_list = [] and that is a list! You type a = {'a': 1} and you have a map (in Pythonish: a 'dict'). Life is easier, but don't forget about the consequences!
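A small sketch of those consequences, using a hypothetical total_length function: nothing checks the argument's type up front, so a wrong type either silently "works" or fails only when the line actually runs.

[code lang="python"]
def total_length(words):
    # meant for a list of strings, but any iterable is accepted without complaint
    return sum(len(w) for w in words)


print(total_length(["spark", "java"]))  # 9
print(total_length("java"))  # 4: a str is iterable too, so this silently "works"

try:
    total_length(42)
except TypeError as err:
    # an int is not iterable; Java would have rejected this at compile time
    print("only caught at runtime:", err)
[/code]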

Ease with daily tasks

For tasks like JSON, file I/O, networking and similar, Python is just less verbose. Examples? Here we go.

JSON made simple

No more thinking about how to turn your data into a file. You can easily serialize dicts and lists to a JSON string with json.dumps, or write them straight to a file with json.dump:

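[code lang="python"]
import json

# output: target file path; list_of_objects: list of dicts (both defined elsewhere)
with open(output, 'w', encoding='utf-8') as f:
    json.dump(list_of_objects, f, ensure_ascii=False, indent=4)
# no f.close() needed: leaving the with block closes the file
[/code]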
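To show both directions, here is a sketch that builds a JSON string with json.dumps, writes the same data to a temporary file with json.dump and reads it back with json.load. The records and the file name are made up for the example.

[code lang="python"]
import json
import os
import tempfile

records = [{"name": "caf\u00e9", "count": 2}, {"name": "Spark", "count": 77}]

# dumps() returns a str; ensure_ascii=False keeps non-ASCII letters instead of escaping them
text = json.dumps(records, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.json")
    # dump() writes straight into an open file object
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=4)
    # load() parses the file back into plain lists and dicts
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == records)    # True
print("caf\u00e9" in text)  # True
[/code]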

File is something you open

Nothing is easier than working with a file. What do you need? Just open it:

[code lang="python"]
# 'a' appends to the end of the file, creating it if it does not exist
with open(path, 'a', encoding='utf-8') as f:
    f.write("string")
[/code]

When opening, just pass the mode ('a' append, 'r' read, 'w' write) and act!
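A short sketch of the three modes on a throwaway file (the file name is hypothetical): write, append, then read the result back.

[code lang="python"]
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")

    with open(path, "w", encoding="utf-8") as f:  # 'w': create or truncate
        f.write("first line\n")

    with open(path, "a", encoding="utf-8") as f:  # 'a': append at the end
        f.write("second line\n")

    with open(path, "r", encoding="utf-8") as f:  # 'r': read (the default mode)
        lines = f.read().splitlines()

print(lines)  # ['first line', 'second line']
[/code]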

Network

No more wrestling with checked IOExceptions: a network request is a single call, like this:

[code lang="python"]
import requests
from lxml import html  # cssselect() also needs the cssselect package installed

page = requests.get(self.url)
page_tree = html.fromstring(page.content)
element_nodes = page_tree.cssselect(self.element_parent)
[/code]

Above you just GET the URL, take the response body (page.content holds the raw bytes), parse it into an easily traversable element tree and select nodes with a CSS selector. Poetry.

No superheroes

I am not trying to say that Python rules. What I want to remember is that knowing only one language limits you; a wider perspective is much more attractive. There are no super-languages: just take what is good and use it in your project. Before that, take some time to learn at least the foundations of the available languages.

why TDD is easy with a Node app

motivation

I am not a Node dev. I am not. But I like doing things with easy tools. I am lazy, in the same way all of you are lazy.

TDD?

Drawing on my colleagues' experience, I have come to the conclusion that TDD is nice. No more checking by hand whether the app runs, no more manual smoke tests. I want to run the tests and be sure that a passing result means: the app is running OK.

runner : Node

For ages I have been using the following "language-frameworks": Delphi and Java. When using a language, you are tied to the possibilities of its platform. Every language needs an interpreter or runtime: for Java it is java, for Python it is python, for JavaScript on Node it is... node.
I decided to use it because Node makes writing a web server very easy. First of all, download it and try running any Node hello world.

dependency management : npm

Every platform needs a dependency management tool. I can't forget how bad it was in Delphi, how many days you had to spend getting an app copied to a pendrive from a colleague's laptop to run. npm just installs everything you need. You will need npm install to install all dependencies listed in package.json, npm test to run the tests and npm start to start your Node app.

webserver : express-js

How do you make a web server easily? In your Node app, just write this:

[code lang="javascript"]
var app = require('express')();
app.get('/', function (req, res) {
  res.send('Hello World!');
});
app.listen(3000); // port 3000 is an arbitrary choice
[/code]

Now you have a running web server that answers GET requests at the root path with a plain text response.

test

runner – npm

First of all, to run tests you need a test runner. In the npm config file package.json you simply define your test runner in the scripts section (for example "test": "mocha"). I have chosen Mocha and I will not argue that it is the best choice; any runner will do.

test description – mocha way

Mocha allows for nice test descriptions. With the describe/it pair you can structure your tests however you like. One step to success is making your tests descriptive; at the same time they can become your documentation.

assertions with chai

With Chai's expect you can write your assertions in a concise, descriptive way that is also readable, isn't it?

[code lang="javascript"]
expect(res.status).to.equal(200);
[/code]

calls by superagent

This is an easy way to make web requests. You get convenient access to the textual response (res.text) or the parsed body (res.body, when JSON is returned). Just shoot and listen 🙂

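[code lang="javascript"]
// inside a Mocha it('...', function (done) { ... }) block; SERVER_URL is defined elsewhere
superagent
  .get(SERVER_URL + '/api/last/2')
  .end(function (err, res) {
    // superagent 1.x+ passes (err, res); older versions passed only res
    if (err) { return done(err); }
    expect(res.status).to.equal(200);
    expect(res.body).to.have.length(2);
    done();
  });
[/code]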

The shortest story about being dev

Are you a developer? Or maybe you think you are? I think so too. I will not be extremely original: I like my job. I like it because I believe I was made to love creating.

Language? It's only a tool. I used Delphi because the old guys told me it was the only proper language to use. I switched to Java because it pays more and has a larger community. I use Python because it makes things easy. I use JavaScript because Node rocks and the NoSQL family loves it (CouchDB, Elasticsearch and others).