Be elastic. Search.

Why am I elasticsearch’ing

What does it mean to do IT? What is information? Information is not data. It is data that has been crunched and digested so that we can learn something from it. Do you have text PDFs or Word documents that make you sick every time you try to search them? No more! Behold the tool.

the tool

Elasticsearch is my hobby, and not only as a part of the ELK stack. It can easily be adapted to work with any data. That data is in your files, and underneath it there is information.

config

In this repo you will find a document searcher. You can run it locally and make your documents searchable. How does it work?

Step #1 Run engine – Elastic

The following applies to Elasticsearch versions below 5. From version 5 on there is a built-in ingest mechanism, which I have not tested yet.

Elasticsearch accepts all kinds of documents. There is a great library written in Java for extracting content from attachments: Tika. Based on it, people made a fancy plugin, mapper-attachments, that can be found here. Follow the instructions matching your Elasticsearch version and install it using good old bin\plugin install. After a successful installation, listing the plugins should show at least one:

[code lang="bash"]
>> bin\plugin.bat list
Installed plugins in D:\p\elasticsearch-2.4.0\plugins:
- mapper-attachments
[/code]
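Once the plugin is installed, an index needs a field of the type it registers, `attachment`, before files can be sent to it. Below is a minimal sketch of such an index body. The index name `docs`, the type name `doc` and the field names `file` and `filename` are my assumptions for illustration and may differ from what the repo uses.

```python
import json

# Hypothetical names, chosen for illustration only (not taken from the repo).
INDEX_NAME = "docs"
DOC_TYPE = "doc"


def attachment_mapping(field="file"):
    """Return an index body whose `field` is handled by mapper-attachments."""
    return {
        "mappings": {
            DOC_TYPE: {
                "properties": {
                    # "attachment" is the field type added by the plugin;
                    # it expects the file content as a base64 string.
                    field: {"type": "attachment"},
                    # Keep the file name as a single, non-analyzed term
                    # (Elasticsearch 2.x syntax).
                    "filename": {"type": "string", "index": "not_analyzed"},
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be inspected or pasted into Sense.
    print(json.dumps(attachment_mapping(), indent=2))
```

With the Python client, the same body could be passed as `es.indices.create(index=INDEX_NAME, body=attachment_mapping())`.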

Step #2 Index docs – Python

You could pass all document contents manually using Sense or Postman, but that would be a little troublesome. That’s why automation is good, so I wrote a Python script that does it. Python 3 is needed. You just run the Python app and, by default, it indexes all contents of the folder files_to_index into your local ES instance. The Python library for Elasticsearch is very cool: you just type the following and it opens a connection, happy and ready for any query.

[code lang="python"]
from elasticsearch import Elasticsearch

# Address of the local Elasticsearch instance
HOST = 'http://localhost:9200'
es = Elasticsearch([HOST])
[/code]
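To give an idea of what the indexing step could look like, here is a rough sketch, not the actual script from the repo. It assumes the mapper-attachments plugin, which expects file content as a base64 string. The index name `docs`, the type `doc` and the field names `file` and `filename` are hypothetical.

```python
import base64
import os


def build_documents(folder="files_to_index"):
    """Yield one document body per regular file found directly in `folder`."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders
        with open(path, "rb") as handle:
            # The attachment field takes the raw bytes encoded as base64.
            encoded = base64.b64encode(handle.read()).decode("ascii")
        yield {"filename": name, "file": encoded}


def index_folder(es, folder="files_to_index", index="docs", doc_type="doc"):
    """Send every document to Elasticsearch and return how many were sent.

    `es` is an Elasticsearch client; index and doc_type are assumed names.
    """
    count = 0
    for body in build_documents(folder):
        es.index(index=index, doc_type=doc_type, body=body)
        count += 1
    return count
```

Calling `index_folder(es)` with the client created above would push every file from files_to_index, one request per file. For large folders, the bulk helpers of the Python client would probably be a better fit.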

Step #3 Browse your docs – Browser

Let us show the indexed docs! I made a little JS browser. You can also check the score of each document. Just open web/view.html and type anything into the input on this ultra-simple HTML page.
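The same kind of search can be tried from Python, which makes the scoring easy to inspect. This is only a sketch of a query the browser page might send, not its actual code. It assumes the extracted text lives in a sub-field called `file.content` (the way mapper-attachments exposes it for a field named `file`) and that each document has a `filename` field. Both names are hypothetical.

```python
def build_query(text, field="file.content", size=10):
    """Build a full-text match query against the extracted file content."""
    return {
        "size": size,
        "query": {"match": {field: text}},
        # Return only the file name, not the whole base64 payload.
        "_source": ["filename"],
    }


def summarize_hits(response):
    """Turn a search response into a list of (score, filename) pairs."""
    hits = response.get("hits", {}).get("hits", [])
    return [(hit["_score"], hit["_source"].get("filename")) for hit in hits]
```

With a running instance, `summarize_hits(es.search(index="docs", body=build_query("hammer")))` would list the matching files. Elasticsearch returns hits ordered by score, highest first, unless a sort is specified.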

right tool for the right task

If you only have a hammer, you tend to see every problem as a nail.

Elasticsearch is not a database but a search engine. It doesn’t come without cost: indexes holding the words from all documents can take up some space. But whenever you run into a problem with search, please consider a search tool.