Elasticsearch reindex: increase the number of shards

The Reindex API is a mechanism for recreating an index with new settings applied.

We have a candidate-application index which has only 1 shard; we need 6.

First of all, to create the new index we need the mappings of the existing one:

# GET candidate-application/_mapping
{
  "candidate-application" : {
    "mappings" : {
      "properties" : {
        "employerId" : {
          "type" : "keyword"
        },
        "id" : {
          "type" : "keyword"
        },
        ...
      }
    }
  }
}

Next, create a new index with the same mappings, but with the desired number of shards and replicas:

PUT /candidate-application-2
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "employerId" : {
        "type" : "keyword"
      },
      "id" : {
        "type" : "keyword"
      },
      ...
    }
  }
}

Note: while reindexing, we set the number of replicas to 0; in theory this makes it a little faster.
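
Copying the mappings by hand is error-prone for an index with many fields. As a minimal sketch, assuming the mapping response has the shape shown above, the create-index body could be derived from it in Python. The function name build_create_body is hypothetical, and nothing here talks to a cluster; it only assembles the JSON body for the PUT request.

```python
import json


def build_create_body(mapping_response, source_index, shards, replicas=0):
    """Build a create-index body that reuses the mappings of source_index."""
    mappings = mapping_response[source_index]["mappings"]
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": mappings,
    }


# Sample response, shortened to the two fields shown above.
sample = {
    "candidate-application": {
        "mappings": {
            "properties": {
                "employerId": {"type": "keyword"},
                "id": {"type": "keyword"},
            }
        }
    }
}

body = build_create_body(sample, "candidate-application", shards=6)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of PUT /candidate-application-2.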

Next, start the reindex:

POST _reindex?slices=5&wait_for_completion=false
{
  "source": {
    "index": "candidate-application",
    "size": 10000
  },
  "dest": {
    "index": "candidate-application-2",
    "version_type": "external"
  }
}

Notes:

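  • version_type external asks Elasticsearch to keep document versions from the source: documents missing in the destination are created, and documents with an older version there are updated
  • wait_for_completion=false asks Elasticsearch to run the operation asynchronously and return a task ID right away
  • slices is the number of parallel sub-tasks (the docs say it should match the number of shards; in our case the source has only one shard, but increasing this value still sped up the reindex)
  • size is the number of docs copied per batch, 1000 by default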
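
To show how the parameters from these notes fit together, here is a small Python sketch that assembles the request path and body. The helper name build_reindex_request and its default values are assumptions made for illustration; it does not send anything.

```python
from urllib.parse import urlencode


def build_reindex_request(source, dest, slices=5, batch_size=10000):
    """Return (path, body) for an asynchronous, sliced reindex request."""
    # Query-string parameters: parallelism and async execution.
    query = urlencode({"slices": slices, "wait_for_completion": "false"})
    body = {
        # "size" under "source" is the number of docs per batch.
        "source": {"index": source, "size": batch_size},
        # "external" keeps the document versions from the source index.
        "dest": {"index": dest, "version_type": "external"},
    }
    return "/_reindex?" + query, body


path, body = build_reindex_request("candidate-application", "candidate-application-2")
print(path)
print(body)
```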

In response we will receive something like:

{
  "task" : "Jwk8EOKOSLKxzqHSa2VWuQ:5105786962"
}

To check the current status:

GET _tasks?detailed=true&actions=*reindex
{
  "nodes" : {
    "Jwk8EOKOSLKxzqHSa2VWuQ" : {
      "name" : "es01",
      "transport_address" : "62.149.5.105:9300",
      "host" : "62.149.5.105",
      "ip" : "62.149.5.105:9300",
      "roles" : [
        "ingest",
        "master",
        "data",
        "ml"
      ],
      "attributes" : {
        "ml.machine_memory" : "16818429952",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20"
      },
      "tasks" : {
        "Jwk8EOKOSLKxzqHSa2VWuQ:5106832820" : {
          "node" : "Jwk8EOKOSLKxzqHSa2VWuQ",
          "id" : 5106832820,
          "type" : "transport",
          "action" : "indices:data/write/reindex",
          "status" : {
            "total" : 1013102,
            "updated" : 0,
            "created" : 60000,
            "deleted" : 0,
            "batches" : 60,
            "version_conflicts" : 0,
            "noops" : 0,
            "retries" : {
              "bulk" : 0,
              "search" : 0
            },
            "throttled_millis" : 0,
            "requests_per_second" : -1.0,
            "throttled_until_millis" : 0
          },
          "description" : "reindex from [candidate-application] to [candidate-application-mac-1][_doc]",
          "start_time_in_millis" : 1600430372126,
          "running_time_in_nanos" : 12301230038,
          "cancellable" : true,
          "headers" : { }
        }
      }
    }
  }
}

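Here total is the overall number of docs, created is the number of docs copied so far, and running_time_in_nanos is the time elapsed since the task started. Note that in this sample output and in the requests below the destination index is named candidate-application-mac-1; substitute the name of your own new index (candidate-application-2 above).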
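
These three fields are enough to estimate progress and remaining time. The Python sketch below walks every node in a _tasks response and extrapolates linearly. The function name summarize_reindex_tasks is hypothetical, and counting created plus updated as "docs written so far" is an assumption.

```python
def summarize_reindex_tasks(tasks_response):
    """Return one dict per reindex task found on any node."""
    rows = []
    for node in tasks_response["nodes"].values():
        for task_id, task in node["tasks"].items():
            status = task["status"]
            total = status["total"]
            # Assumption: "created" plus "updated" counts the docs written so far.
            done = status["created"] + status["updated"]
            elapsed = task["running_time_in_nanos"] / 1e9  # seconds
            percent = 100.0 * done / total if total else None
            # Linear extrapolation; only meaningful once some docs are copied.
            eta = elapsed * (total - done) / done if done else None
            rows.append({"task": task_id, "percent": percent,
                         "elapsed_s": elapsed, "eta_s": eta})
    return rows


# Trimmed copy of the response shown above.
sample = {
    "nodes": {
        "Jwk8EOKOSLKxzqHSa2VWuQ": {
            "tasks": {
                "Jwk8EOKOSLKxzqHSa2VWuQ:5106832820": {
                    "status": {"total": 1013102, "updated": 0, "created": 60000},
                    "running_time_in_nanos": 12301230038,
                }
            }
        }
    }
}

for row in summarize_reindex_tasks(sample):
    print(row)
```

A task whose total is 0 gets None for both percent and eta_s, so it can be skipped when printing.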

After completion, compare the document counts:

GET /_cat/indices/candidate-application*?v&h=index,docs.count&s=index

GET candidate-application/_count

GET candidate-application-mac-1/_count

Notes:

  • the first request may show outdated information until the last request has been run
  • the last request may take a while the first time it is run (a refresh is happening under the hood)

Switch aliases

POST /_aliases
{
  "actions": [
    {
      "remove": {
        "index": "candidate-application",
        "alias": "candidate-application-alias"
      }
    },
    {
      "add": {
        "index": "candidate-application-mac-1",
        "alias": "candidate-application-alias"
      }
    }
  ]
}
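
Both actions go into a single _aliases request, so Elasticsearch applies them atomically and the alias never points at nothing in between. As a minimal sketch, the same body can be generated in Python for any pair of indices; the helper name build_alias_swap is hypothetical.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build an _aliases body that moves alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = build_alias_swap(
    "candidate-application-alias",
    "candidate-application",
    "candidate-application-mac-1",
)
print(json.dumps(swap, indent=2))
```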

To see the current state of the aliases:

GET _cat/aliases/candidate-application*?v&h=alias,index

Timing

time curl -s -X POST -u 'elastic:*******' -H 'Content-Type: application/json' 'https://es01.rabota.ua:9200/_reindex?slices=5' -d '{
  "source": {
    "index": "candidate-application",
    "size": 10000
  },
  "dest": {
    "index": "candidate-application-mac-1",
    "version_type": "external"
  }
}'

An index with one million docs was reindexed in 14 seconds.

A test on a big index with 16M docs (60 GB) took approximately 30 minutes, which is fine, because copying that amount of data is not fast in itself.

While waiting, I scribbled a small script to see what is going on:

$username = 'elastic'
$password = '*******'

$base64 = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes("$($username):$($password)"))
$headers = @{ Authorization = "Basic $base64" }

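# Fetch all running reindex tasks; the response groups them by node
$res = Invoke-RestMethod "https://es01.rabota.ua:9200/_tasks?detailed=true&actions=*reindex" -Headers $headers | Select-Object -ExpandProperty nodes
# Node IDs are dynamic property names; only the first node is inspected here
$key = $res | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name -First 1
# Task IDs are likewise property names under that node's "tasks" object
$taskKeys = $res.$key.tasks | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name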

foreach($taskKey in $taskKeys) {
    $task = $res.$key.tasks.$taskKey

    if ($task.status.total -eq 0) {
        Write-Host "$taskKey - empty"
        continue
    }

    $percent = [int]($task.status.created / $task.status.total * 100)
    $timer = [TimeSpan]::FromMilliseconds($task.running_time_in_nanos/1000000).ToString()

    Write-Host "$taskKey - $($percent)% in $timer"
}

Output will be something like:

Jwk8EOKOSLKxzqHSa2VWuQ:5150731438 - empty
Jwk8EOKOSLKxzqHSa2VWuQ:5150731439 - 64% in 00:20:45.4927814
Jwk8EOKOSLKxzqHSa2VWuQ:5150731441 - 66% in 00:20:45.4925209
Jwk8EOKOSLKxzqHSa2VWuQ:5150731443 - 68% in 00:20:45.4923133
Jwk8EOKOSLKxzqHSa2VWuQ:5150731446 - 67% in 00:20:45.4921202
Jwk8EOKOSLKxzqHSa2VWuQ:5150731449 - 66% in 00:20:45.4919510

Note: the first, empty record is expected; it is the parent task for the sliced sub-tasks.

Here are the results for the big index, measured with this aggregation query:

{
  "size": 0,
  "aggs": {
    "status": {
      "terms": {
        "field": "statusId",
        "size": 10
      }
    }
  }
}

On the old index with a single shard it took 550 ms; on the new index, 20 ms.