So, from the beginning. It all started with the site jvm-bloggers.com. I realized that writing a blog is actually something you can achieve. But reading inspires too! The only problem is that a lot is happening on the internet, and it is hard to keep up. That is why I came up with a small collection of opinions and reviews meant to help me, and you, decide whether a given piece of material is worth reading. Let's focus on what is genuinely interesting. Have a look at 160znakow.pl.
API for ES – why
So what is the goal? I want a small app that will let greedy HR workers explore the projects I have taken part in. I believe one's experience is the most important feature of a future (IT) employee; school was a long time ago, courses are OK, but who knows who helped to complete them?
Continue reading: Flask can provide REST for RESTful Elasticsearch
I want to make life easier for job-changers and human-researchers. I want to build an easily searchable view of projects. I want it on Elasticsearch, because I want to prove that you can easily do something small with it; it is not only for very BIG data projects.
I used to be a regular business developer: Delphi, then Java. Recently, working somewhere where the project was not very interesting, I came to the conclusion that I would like to bring some real business value.
Then Elasticsearch came to my mind…
On the Django page you can read:
Django makes it easier to build better Web apps more quickly and with less code.
I decided to validate that statement. And yes, after working with Django for a few days it turned out to be true. Below you can see how I implemented the fundamental MVC tasks.
Continue reading: Django: Web MVC and DB DAO in minutes
Another chapter of the Udacity course is data visualization. What I learned there is important: acquiring, munging, and analyzing data is worth little if the resulting information is not communicated to others, and the easiest way to do that is data visualization. I learned about Napoleon's march on Russia and why you have to think twice before posting a hue-colored diagram.
Lesson no. 3 is about data analysis. If you were able to collect and prepare your data, it's time to draw conclusions. How do you use datasets? How do you predict the future? This is what I hoped to learn next.
The scout project, like many one-person, short, after-hours projects that start with an idea and plenty of energy, is not in the shape I would like it to be. Time for honesty about the slip-ups and a bold statement of what went well!
Are you going to play with data? First you have to wrangle it: prepare it and make it useful. The practice session Wrangling Subway Data in the Udacity course was a nice way to apply my knowledge in practice. The subway data is a good example of a dataset: a CSV file with several columns.
Continue reading: Data wrangling in practice
Data science in progress. Right now I am reading and applying the rules from lesson no. 2: data preparation, also called data munging. This is not something you want to do as a data scientist; it is simply indispensable for preparing your data for later processing. As Nick says:
More than 50 percent of the time is just combing through the data and guessing whether it is OK – Nick
What can you learn in the lessons from the Data Wrangling section?
First I was reminded about common data formats. CSV, XML, and JSON are the three most popular in the data world. Thankfully, pandas offers a good way of consuming and producing all of these formats.
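A quick sketch of that round trip with toy data (the city/riders columns are made up for illustration): pandas reads CSV into a DataFrame, writes it out as JSON, and reads it back unchanged.

```python
# Sketch: pandas consuming and producing common formats (toy data).
import io
import pandas as pd

csv_text = "city,riders\nNYC,5432\nBoston,1234\n"

df = pd.read_csv(io.StringIO(csv_text))       # consume CSV
json_text = df.to_json(orient="records")      # produce JSON
df2 = pd.read_json(io.StringIO(json_text))    # consume the JSON back

print(df2.equals(df))  # the round trip preserves the data
```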
SQL found here
It was brand new to me that after forming a DataFrame I can play with it as if it were a SQL table, using the pandasql library. The Udacity course shows the potential of this SQL extension on the Aadhaar dataset, containing our dear registered friends from India. Using pandasql I can query the dataset freely with SQL-92 syntax, including filtering (WHERE) and grouping (GROUP BY).
To process data you have to get it first. Sometimes it is available on some HTTP endpoint. In Python you can easily call APIs using requests. It is as easy as

import json
import requests

body = requests.get(url)
response = json.loads(body.text)

so you can fetch data from any available endpoint and parse the JSON response easily. I used this approach when querying the OpenFM API.
I miss you, value!
Missing values are another challenge you will meet when dealing with data preparation. In the pythonic way, such values usually show up as NaN (or None) in a DataFrame. What then? We can impute, i.e. guess what to put there. In the baseball dataset I was encouraged to use the mean value as the imputation. But think twice! Imputation can lead to misleading conclusions. Here it was done using numpy's mean function.
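Mean imputation can be sketched in a few lines; the batting averages below are toy values, not the course's baseball dataset. `np.nanmean` computes the mean over the observed values only, and `fillna` writes it into the gaps.

```python
# Sketch: mean imputation of missing values (toy batting averages).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "player": ["a", "b", "c", "d"],
    "avg": [0.250, np.nan, 0.350, np.nan],
})

mean_avg = np.nanmean(df["avg"])         # mean of the observed values only
df["avg"] = df["avg"].fillna(mean_avg)   # impute the missing entries
print(df)
```

Note how this illustrates the warning above: the two imputed players now look exactly average, which may quietly distort any later analysis.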