Index my bio

I want to make life easier for job changers and recruiters.  I want an easily searchable view of my projects.  And I want it on Elasticsearch, because I want to prove that you can easily build something small with it, so it’s not only for very BIG data projects.


The code that follows is available on GitHub.

Run on your Linux machine

It’s a piece of cake.  You only need Docker and its younger brother, docker-compose.  Then one little script is enough for me to bring the whole thing up.

Write in YAML, index to ES

Thinking about the domain for a moment: what do I need to describe my bio?  I will simplify it to a project object.  This could be an example bio of someone.

How do I index it? First, turn it into JSON.

Python for data wrangling and ES I/O

Data wrangling, or conversion, is simple here.  Let’s turn the YAML notation into JSON.  With the yaml library it’s easy.
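Here is a minimal sketch of that conversion. The sample bio and its field names (`name`, `role`, `year`, `technologies`) are made up for illustration, and I assume PyYAML is what provides the `yaml` module.

```python
import json

import yaml  # assumption: the PyYAML package

# Hypothetical bio; the projects and field names are illustrative only.
BIO_YAML = """
projects:
  - name: Example search service
    role: Backend developer
    year: 2016
    technologies: [Python, Elasticsearch]
  - name: Example reporting tool
    role: Data engineer
    year: 2015
    technologies: [Python, PostgreSQL]
"""


def yaml_to_json(text):
    """Parse YAML text and return the same data as a JSON string."""
    data = yaml.safe_load(text)  # safe_load never builds arbitrary objects
    return json.dumps(data, indent=2)


print(yaml_to_json(BIO_YAML))
```

Each entry under `projects` becomes one JSON object, which is the shape Elasticsearch expects for a document.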

Then let’s prove we can connect to Elasticsearch.  After importing the official library, we can see the server respond.
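A connection check could look roughly like the sketch below. The URL `http://localhost:9200` is my assumption about where the docker-compose setup exposes Elasticsearch, and `connect` and `describe_connection` are hypothetical helper names, not part of the client library.

```python
def connect(url="http://localhost:9200"):
    """Return a client for the node at `url` (assumed local default)."""
    # Imported here so the helpers below load even before the official
    # client package is installed.
    from elasticsearch import Elasticsearch

    return Elasticsearch(url)


def describe_connection(es):
    """Return a one-line status for the given client."""
    if not es.ping():  # ping() returns False when the node is unreachable
        return "Elasticsearch is not reachable"
    version = es.info()["version"]["number"]
    return "Connected to Elasticsearch " + version
```

Calling `print(describe_connection(connect()))` with the container running should print the version of the node.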

Good, connection is done!

Index time!

Before we start indexing docs, we have to arrange a place for them.  Any schema?  No, Elasticsearch can work it out for us.  We just have to create the index, which is easy with a small Python script.  After that...
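Creating the index could be sketched like this. The index name `projects` and the helper name `ensure_index` are assumptions of mine; no mapping is passed, so Elasticsearch infers the field types from the first documents it sees.

```python
def ensure_index(es, name="projects"):
    """Create the index if it is missing; return True when it was created."""
    if es.indices.exists(index=name):
        return False
    # No mapping given: Elasticsearch derives one dynamically.
    es.indices.create(index=name)
    return True
```

Checking for the index first keeps the script safe to run more than once.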

...we can query ES in the Kibana tools.  So let’s fill it now!  After a few refactors we are ready to insert, or PUT, the docs into their place.  Indexing from within a script is done by a one-liner.

All docs are there.  Any proof?  Query it.
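One way to do the indexing and get that proof from Python is sketched below. I use the bulk helper from the official client here, which may not be the exact one-liner from my script; the index name `projects`, the position-based ids, and the function names are assumptions for illustration.

```python
def to_actions(projects, index="projects"):
    """Yield one bulk action per project, using its position as the id."""
    for doc_id, project in enumerate(projects, start=1):
        yield {"_index": index, "_id": doc_id, "_source": project}


def index_projects(es, projects, index="projects"):
    """Bulk-index the projects and return how many docs the index holds."""
    # Imported here so to_actions() stays usable without the client package.
    from elasticsearch import helpers

    helpers.bulk(es, to_actions(projects, index))  # the actual one-liner
    es.indices.refresh(index=index)  # make the new docs searchable right away
    return es.count(index=index)["count"]
```

If the returned count matches the number of projects in the YAML file, all docs made it in.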

You can even write a test and run it against this index.
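Such a test might be built on a check like the one sketched here. It assumes every doc has a `name` field and lives in an index called `projects`; both helper names are hypothetical.

```python
def indexed_names(es, index="projects", size=100):
    """Return the `name` field of up to `size` docs in the index."""
    result = es.search(index=index, q="*", size=size)  # q="*" matches all docs
    return {hit["_source"]["name"] for hit in result["hits"]["hits"]}


def assert_projects_indexed(es, expected_names, index="projects"):
    """Fail with a readable message if any expected project is missing."""
    missing = set(expected_names) - indexed_names(es, index)
    assert not missing, "not indexed: " + ", ".join(sorted(missing))
```

Inside a pytest test function you would create a real client and call `assert_projects_indexed` with the project names taken from the YAML file.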

That’s all.  In less than one hour we were able to run Elasticsearch on our machine and index docs from YAML directly into an index.

Here is a screenshot from my Kibana