Getting Started with Elasticsearch

A tutorial to get to know Elasticsearch and how to use it in your next project

Welcome to my article on using Elasticsearch! In this post, I will build a simple HTTP server that can perform basic CRUD operations, giving you a starting point for creating your own Elasticsearch-powered projects. Let's get started!

What is Elasticsearch

Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including text, numerical, geospatial, structured, and unstructured. It is built on top of the Apache Lucene search engine library and provides a powerful set of features for full-text search, highlighting, and advanced analytics.

Elasticsearch is designed to be scalable, fast, and flexible, and it can be used to index and search large volumes of data quickly and efficiently. Additionally, Elasticsearch provides a RESTful API that makes it easy to integrate with other systems and applications.

Installing Elasticsearch

One way to run Elasticsearch is by using Docker. This is a simple and easy method, and the Docker container can be easily removed when you no longer need Elasticsearch.

If you do not have Docker installed, you can follow the installation instructions provided in the Docker documentation.

Once you have installed Docker, you can use the following commands to start an Elasticsearch server:

  1. Pull the latest Elasticsearch Docker image:
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.5.2
  2. Create a new Docker network for Elasticsearch:
docker network create elastic
  3. Run Elasticsearch:
docker run -e ES_JAVA_OPTS="-Xms1g -Xmx1g" -e "discovery.type=single-node" -e "xpack.security.enabled=false" --net elastic -p 9200:9200 docker.elastic.co/elasticsearch/elasticsearch:8.5.2

The docker run command starts a Docker container, a lightweight and portable runtime environment for running applications or services. The command above starts an Elasticsearch server inside such a container.

We set the ES_JAVA_OPTS environment variable to specify Java options for Elasticsearch; -Xms1g -Xmx1g set the minimum and maximum heap memory for Elasticsearch to 1 GB. The discovery.type=single-node environment variable runs Elasticsearch in a single-node configuration, which is suitable for testing and development purposes. The xpack.security.enabled=false environment variable disables Elasticsearch's security features; since this setup is for testing only, no authentication is required to access it.

You can then access Elasticsearch on port 9200 on your local machine.
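If you prefer Docker Compose, the same single-node setup can be expressed as a compose file. This is a minimal sketch mirroring the docker run flags above (1 GB heap, single node, security disabled); the service name is my own choice, not an official configuration:

```yaml
# docker-compose.yml (hypothetical; mirrors the docker run flags above)
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.2
    environment:
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
```

Run it with docker compose up and you get the same server on port 9200.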

Elasticsearch API Operations

As mentioned earlier, Elasticsearch provides a RESTful API that makes it easy to integrate with other systems and applications. Let's now take a look at those built-in Elasticsearch APIs.

Creating an Index

Before creating an index on Elasticsearch, you should design your data model, just as you would with a SQL database. An index in Elasticsearch is roughly equivalent to a table in a SQL database, and its mapping plays the role of the schema: it defines the structure of your data and how it will be indexed and searched.

By designing your data model carefully, you can ensure that your Elasticsearch index is optimized for your specific use case and will provide fast and accurate search results. Once you have designed your data model, you can create the corresponding index using the PUT method of the Elasticsearch API.

Let's say we need to index a book with title and author, then we will need to create the index for those attributes:

curl --request PUT 'http://localhost:9200/book' \
--header 'Content-Type: application/json' \
--data-raw '{
    "mappings": {
        "properties": {
            "id": {
                "type": "integer"
            },
            "title": {
                "type": "text"
            },
            "author": {
                "type": "text"
            }
        }
    }
}'

If the index was created successfully, the cURL response will look like this:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "book"
}

To see the list of created indices on Elasticsearch, you can access http://localhost:9200/_cat/indices through your browser.

Inserting Data

To insert data after creating the book index, we can use the POST {index}/_doc/{id} endpoint with the desired request payload. In our case, the request and response are shown below.

curl --request POST 'http://localhost:9200/book/_doc/1' \
--header 'Content-Type: application/json' \
--data-raw '{
    "id": 1,
    "title": "Learn Elasticsearch",
    "author": "Ardian Bahtiarsyah"
}'
{
  "_index": "book",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Updating Data

To update a book's data, we can use the endpoint POST {index}/_update/{id} and specify the desired changes in the payload. For example, updating book.title for the document with id=1 looks like this:

curl --request POST 'http://localhost:9200/book/_update/1' \
--header 'Content-Type: application/json' \
--data-raw '{
    "doc": {
        "title": "Learn Elasticsearch for Beginner"
    }
}'

A successful update returns a response payload like this:

{
  "_index": "book",
  "_id": "1",
  "_version": 4,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 4,
  "_primary_term": 1
}

Deleting Data

Deleting can be done through the API as well, using the following endpoint:

curl --request DELETE 'http://localhost:9200/book/_doc/1'

If the delete operation succeeds, the response payload looks like this:

{
  "_index": "book",
  "_id": "1",
  "_version": 5,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 5,
  "_primary_term": 1
}

Searching Data

Elasticsearch is a highly versatile and powerful tool for search and data analysis. Its advanced search capabilities make it an excellent choice for organizations that need to process large volumes of data quickly and accurately.

With Elasticsearch, users can perform complex searches using a variety of parameters and filters, enabling them to quickly and easily find the information they need.

The request body format can be found in the Elasticsearch documentation, along with the built-in query DSL that you can use to empower your projects.

To better understand how this works as an API call, let's start with a simple query.

curl --request GET 'http://localhost:9200/book/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": {
        "match": {
            "author": "ardian"
        }
    }
}'

The response payload returns the documents that match the full-text query inside the hits attribute.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.10536051,
    "hits": [
      {
        "_index": "book",
        "_id": "1",
        "_score": 0.10536051,
        "_source": {
          "id": 1,
          "title": "Learn Elasticsearch",
          "author": "Ardian Bahtiarsyah"
        }
      },
      {
        "_index": "book",
        "_id": "2",
        "_score": 0.10536051,
        "_source": {
          "id": 2,
          "title": "Learn Go",
          "author": "Ardian Bahtiarsyah"
        }
      }
    ]
  }
}

By now, you should have a good understanding of how Elasticsearch APIs work. While there are many libraries and modules available for your programming language of choice, they all ultimately rely on the same APIs that I have demonstrated above. So even if you choose to use a library or module for your Elasticsearch integration, it's important to understand the underlying APIs and how they work. This will give you the knowledge and flexibility to customize and troubleshoot your Elasticsearch integration as needed.

Coding Time! Elasticsearch from an Application

To help you better understand how to implement Elasticsearch in your own applications, let's walk through a simple example project. We will use Elasticsearch as the data storage and give our application the ability to search that data using Elasticsearch's powerful search capabilities. This will provide a hands-on demonstration of how to integrate Elasticsearch into your own projects.

I've decided to use Go for this project because I'm a firm believer in learning by doing. So don't be surprised if the code looks a little messy or doesn't meet your standards. The most important thing is that the information is conveyed. ;)

The figure above shows the architecture diagram that I will be following. Don't worry if the application structure or directory names don't make sense; I'm still learning about clean architecture in Go! So bear with me while I figure it out.

Go Insert Index

Before we can start working with Elasticsearch, we need to define the data structure that we will be using. In this case, we will create a book struct in our domain directory. This will allow us to represent the data we want to store and manipulate in Elasticsearch.

// domain/book.go

package domain

type Book struct {
    Id     int    `json:"id"`
    Title  string `json:"title,omitempty"`
    Author string `json:"author,omitempty"`
}

Then we need to register those attributes to be indexed in Elasticsearch. Since this mechanism interacts with Elasticsearch, which is an external service, we will separate it into the repository directory.

// repository/elasticsearch_repository.go

package repository

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "strconv"
    "strings"
    "es-go-client/domain"
)

type ESClient struct {
    baseURL string
}

func NewClient(host string) *ESClient {
    return &ESClient{host}
}

func (c *ESClient) InsertIndex(e *domain.Book) error {
    body, _ := json.Marshal(e)

    id := strconv.Itoa(e.Id)
    // Elastic search endpoint to insert the index.
    req, err := http.NewRequest("POST", c.baseURL+"/book/_doc/"+id, bytes.NewBuffer(body))
    if err != nil {
        return fmt.Errorf("failed to init insert index request: %v", err)
    }

    httpClient := http.Client{}
    req.Header.Add("Content-type", "application/json")
    response, err := httpClient.Do(req)
    if err != nil {
        return fmt.Errorf("failed to insert index: %v", err)
    }
    defer response.Body.Close()

    responseBody, err := ioutil.ReadAll(response.Body)
    if err != nil {
        return fmt.Errorf("failed read insert index response: %v", err)
    }

    log.Println("debug insert index response: ", string(responseBody))

    return nil
}

The elasticsearch_repository.go file will be accessed from incoming HTTP requests, so we need to handle these requests in the delivery directory.

//delivery/book_delivery.go

package delivery

import (
    "encoding/json"
    "es-go-client/domain"
    "es-go-client/repository"
    "net/http"
    "strconv"
)

type Server struct {
    ESClient *repository.ESClient
}

// Insert index function calls Elasticsearch repository
func (s *Server) InsertIndexHandler(w http.ResponseWriter, r *http.Request) {
    var book *domain.Book
    if err := json.NewDecoder(r.Body).Decode(&book); err != nil {
        writeResponseInternalError(w, err)
        return
    }
    err := s.ESClient.InsertIndex(book)
    if err != nil {
        writeResponseInternalError(w, err)
        return
    }
    writeResponseOK(w, book)
}

// ... other functions go here


// These helper functions handle writing the HTTP responses
func writeResponseOK(w http.ResponseWriter, response interface{}) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    writeResponse(w, response)
}

func writeResponseInternalError(w http.ResponseWriter, err error) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusInternalServerError)
    writeResponse(w, map[string]interface{}{
        "error": err.Error(),
    })
}

func writeResponse(w http.ResponseWriter, response interface{}) {
    json.NewEncoder(w).Encode(response)
}

To make our application functional, we need to register the book_delivery.go file in the main.go file in the root application directory. This will allow us to access and use the functions and endpoints we've defined in book_delivery.go within the main application.

Let's go ahead and do that now so we can start using our application.

// main.go

package main

import (
    "log"
    "net/http"
    "es-go-client/delivery"
    ESClient "es-go-client/repository"
)

func main() {
    // Elasticsearch runs on port 9200
    esClient := ESClient.NewClient("http://localhost:9200")

    server := delivery.Server{ESClient: esClient}
    // Our APIs
    http.HandleFunc("/insert", server.InsertIndexHandler)

    // Go server runs on port 8080
    log.Println("listening server on port 8080")
    http.ListenAndServe(":8080", nil)
}

Finally, we can see the Go application in action by running the server and sending an HTTP POST request to /insert, which communicates with Elasticsearch to add the document to the index.

go run main.go

Go Update Index

To add the /update endpoint to our application, we can simply follow the pattern used for the index insertion functions. This involves registering the endpoint with its corresponding function handler, which allows us to use all of the available Elasticsearch functionality within our application.

// repository/elasticsearch_repository.go

...

func (c *ESClient) UpdateIndex(e *domain.Book) error {
    body, _ := json.Marshal(map[string]*domain.Book{
        "doc": e,
    })

    id := strconv.Itoa(e.Id)
    // Elastic search endpoint to update the index.
    req, err := http.NewRequest("POST", c.baseURL+"/book/_update/"+id, bytes.NewBuffer(body))
    if err != nil {
        return fmt.Errorf("failed to init update index request: %v", err)
    }

    httpClient := http.Client{}
    req.Header.Add("Content-type", "application/json")
    response, err := httpClient.Do(req)
    if err != nil {
        return fmt.Errorf("failed to update index: %v", err)
    }
    defer response.Body.Close()

    responseBody, err := ioutil.ReadAll(response.Body)
    if err != nil {
        return fmt.Errorf("failed read update index response: %v", err)
    }

    log.Println("debug update index response: ", string(responseBody))

    return nil
}

// delivery/book_delivery.go

...

func (s *Server) UpdateIndexHandler(w http.ResponseWriter, r *http.Request) {
    var book *domain.Book
    if err := json.NewDecoder(r.Body).Decode(&book); err != nil {
        writeResponseInternalError(w, err)
        return
    }
    if err := s.ESClient.UpdateIndex(book); err != nil {
        writeResponseInternalError(w, err)
        return
    }
    writeResponseOK(w, book)
}

// main.go

...
http.HandleFunc("/update", server.UpdateIndexHandler)
...

Let's use Postman to verify that everything is working properly. This involves testing the /update endpoint to ensure that it is functioning as expected and that we can access and manipulate the Elasticsearch index as needed.

Go Delete Index

// repository/elasticsearch_repository.go

...

func (c *ESClient) DeleteIndex(id int) error {

    req, err := http.NewRequest("DELETE", c.baseURL+"/book/_doc/"+strconv.Itoa(id), nil)
    if err != nil {
        return fmt.Errorf("failed to make a delete index request: %v", err)
    }

    httpClient := http.Client{}
    req.Header.Add("Content-type", "application/json")
    response, err := httpClient.Do(req)
    if err != nil {
        return fmt.Errorf("failed to delete index: %v", err)
    }
    defer response.Body.Close()

    responseBody, err := ioutil.ReadAll(response.Body)
    if err != nil {
        return fmt.Errorf("failed read delete index response: %v", err)
    }

    log.Println("debug delete index response: ", string(responseBody))

    return nil
}

// delivery/book_delivery.go

...

func (s *Server) DeleteIndexHandler(w http.ResponseWriter, r *http.Request) {
    id, err := strconv.Atoi(r.FormValue("id"))
    if err != nil {
        writeResponseInternalError(w, err)
        return
    }
    if err := s.ESClient.DeleteIndex(id); err != nil {
        writeResponseInternalError(w, err)
        return
    }
    writeResponseOK(w, domain.Book{Id: id})
}

// main.go

...
http.HandleFunc("/delete", server.DeleteIndexHandler)
...

Let's double-check our /delete endpoint using Postman. This will help us confirm that the endpoint is working properly and that we can successfully delete records from the Elasticsearch database.

Go Search Index

Well, we're talking about Elasticsearch, so search will be the main functionality for our use case.

The code follows the same pattern as the others, but this time we need to handle the response from Elasticsearch properly. Let's take a look at the Elasticsearch search result payload before we continue.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.10536051,
    "hits": [
      {
        "_index": "book",
        "_id": "1",
        "_score": 0.10536051,
        "_source": {
          "id": 1,
          "title": "Learn Elasticsearch",
          "author": "Ardian Bahtiarsyah"
        }
      },
      {
        "_index": "book",
        "_id": "2",
        "_score": 0.10536051,
        "_source": {
          "id": 2,
          "title": "Learn Go",
          "author": "Ardian Bahtiarsyah"
        }
      }
    ]
  }
}

You may have noticed that the data is part of the Hits attribute. In order to access and use this data, we will need to create a new struct to hold it. This will allow us to easily access and manipulate the data as needed.

// domain/book.go

...

type SearchHits struct {
    Hits struct {
        Hits []*struct {
            Source *Book `json:"_source"`
        } `json:"hits"`
    } `json:"hits"`
}

Following the pattern, we will add the Search function to the elasticsearch_repository.go file. This will allow us to access the function and use it to search the Elasticsearch database as needed.

// repository/elasticsearch_repository.go

...

func (c *ESClient) Search(keyword string) ([]*domain.Book, error) {
    // We search title or author; "match" runs a full-text query,
    // consistent with the curl example earlier.
    query := fmt.Sprintf(`
    {
        "query": {
            "bool": {
                "should": [
                    {"match": { "title": "%s" }},
                    {"match": { "author": "%s" }}
                ]
            }
        }
    }
    `, keyword, keyword)

    // Elasticsearch endpoint for searching data
    req, err := http.NewRequest("GET", c.baseURL+"/book/_search", strings.NewReader(query))
    if err != nil {
        return nil, fmt.Errorf("failed to init search request: %v", err)
    }

    httpClient := http.Client{}
    req.Header.Add("Content-type", "application/json")
    response, err := httpClient.Do(req)
    if err != nil {
        return nil, fmt.Errorf("failed to search: %v", err)
    }
    defer response.Body.Close()

    responseBody, err := ioutil.ReadAll(response.Body)
    if err != nil {
        return nil, fmt.Errorf("failed read search response: %v", err)
    }

    var searchHits domain.SearchHits
    if err := json.Unmarshal(responseBody, &searchHits); err != nil {
        return nil, fmt.Errorf("failed to unmarshal search response: %v", err)
    }

    books := []*domain.Book{}
    for _, hit := range searchHits.Hits.Hits {
        books = append(books, hit.Source)
    }

    return books, nil
}

In the book_delivery.go file, we will need to handle the search HTTP request. This will involve defining the appropriate function and registering it as an endpoint to process the request and return the desired results.

// delivery/book_delivery.go

...

func (s *Server) SearchHandler(w http.ResponseWriter, r *http.Request) {
    keyword := r.FormValue("keyword")
    books, err := s.ESClient.Search(keyword)
    if err != nil {
        writeResponseInternalError(w, err)
        return
    }
    writeResponseOK(w, books)
}

Don't forget to register the endpoint with the corresponding function handler. This will ensure that the endpoint is properly associated with the correct function and can be accessed as expected.

// main.go

...
http.HandleFunc("/search", server.SearchHandler)
...

To ensure that Elasticsearch is working properly, try searching for a keyword. This searches the data by book title or author, roughly similar to the SQL query SELECT * FROM book WHERE title LIKE '%keyword%' OR author LIKE '%keyword%'. If your search returns the expected results, then Elasticsearch and our Go code are functioning properly.

If the search returns no data, it means no matching results were found. You may want to try adjusting your search query.

Go Source Code

Great, we are all done! You can use it to experiment with implementing Elasticsearch on your own project. If you want to take a closer look at the code, you can find it on my GitHub repository.

Give it a try and see what creative ideas you can come up with!

Closing

As you reach the end of this article, I appreciate that you took the time to learn about Elasticsearch and its potential for your projects.

Keep an eye out for future posts where I'll delve into even more interesting topics. In the meantime, I hope this information has been helpful and I look forward to your continued engagement. Thanks again!