Migrating Index from Solr to Elasticsearch on Linux
This section gives an overview of the steps required to migrate the index from Solr to Elasticsearch in PoolParty running on a Linux system. The index service is set up in the PoolParty Semantic Middleware Configurator.
This procedure involves the following steps:
Pre-migration disk space assessment
Exporting Solr data
Executing a script to re-configure the system to use Elasticsearch
Importing the Solr data into Elasticsearch
You are running PoolParty 2024 Release 2 (9.4.x).
Currently Solr is configured and enabled as your indexing service in PoolParty.
Before triggering the migration to Elasticsearch, make sure that the available free disk space is at least three times the current disk usage of Solr.
Execute the following command to determine the current disk space consumed by Solr indices:
du -hs /opt/poolparty/data/solr
# result
1.3G    /opt/poolparty/data/solr
Run the following command to view free disk space:
mkdir /opt/poolparty/backup_solr && df -h /opt/poolparty/backup_solr
# result
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg--data-opt   70G   58G   13G  83% /opt
Calculate required free space:
Multiply the total Solr disk usage by 3 to determine the minimum required free space for Elasticsearch migration. If the available free disk space exceeds this value, then you can proceed with the migration.
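The calculation can also be scripted. The following is a minimal sketch, not part of the PoolParty tooling: the function names are illustrative, and the default paths are simply the ones used in the commands above. With the example figures (1.3G used by Solr, 13G available), the required space is 3.9G, so the migration could proceed.

```shell
#!/bin/bash
# Sketch only: the function names are illustrative and not part of PoolParty.

# Print the minimum free space (in KB) required for a given Solr usage (in KB).
required_kb() {
  echo $(( $1 * 3 ))
}

# Succeed if the available space (KB) exceeds three times the Solr usage (KB).
has_enough_space() {
  local solr_kb=$1 avail_kb=$2
  [ "$avail_kb" -gt "$(required_kb "$solr_kb")" ]
}

# Measure the real values with du and df and report the result.
# Default paths are taken from the commands above; adjust them if needed.
check_migration_space() {
  local solr_dir=${1:-/opt/poolparty/data/solr}
  local target_dir=${2:-/opt/poolparty/backup_solr}
  local solr_kb avail_kb
  solr_kb=$(du -sk "$solr_dir" | awk '{print $1}')
  # df -P guarantees one line per file system; column 4 is the available space.
  avail_kb=$(df -Pk "$target_dir" | awk 'NR==2 {print $4}')
  if has_enough_space "$solr_kb" "$avail_kb"; then
    echo "OK: ${avail_kb} KB free, $(required_kb "$solr_kb") KB required"
  else
    echo "NOT ENOUGH: ${avail_kb} KB free, $(required_kb "$solr_kb") KB required"
    return 1
  fi
}
```

Calling check_migration_space without arguments checks the default paths; both directories must already exist.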
Download the indexexporter.jar
Download the indexexporter.jar onto the server you want to migrate. It is an executable JAR file that requires Java 17 or higher to run.
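To confirm the Java requirement before running the exporter, a check along these lines may help. This is a sketch under assumptions: the helper names are made up for illustration, and the parsing assumes the usual banner format printed by java -version (a quoted version string such as "17.0.9" or the legacy "1.8.0_392").

```shell
#!/bin/bash
# Sketch only: helper names are illustrative and not part of PoolParty.

# Extract the major Java version from a `java -version` banner.
java_major() {
  local banner=$1 ver
  # Take the quoted version string, for example 17.0.9 or 1.8.0_392.
  ver=$(printf '%s\n' "$banner" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case $ver in
    1.*) ver=${ver#1.} ;;   # legacy scheme: 1.8.0_392 means Java 8
  esac
  # Keep everything before the first dot, underscore, or hyphen.
  echo "${ver%%[._-]*}"
}

# Succeed if the banner reports Java 17 or higher.
java_is_supported() {
  local major
  major=$(java_major "$1")
  [ -n "$major" ] && [ "$major" -ge 17 ]
}
```

Because java -version writes its banner to standard error, a typical call would be java_is_supported "$(java -version 2>&1)".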
Stop PoolParty and only start Solr (this prevents any write operations to the Solr index during export).
Make sure that you have these on hand:
a backup directory path with sufficient free disk space (approximately two times the size of the Solr data directory; no relative paths)
the credentials (user and password) for the Solr instance
Enter and run this command, then wait
java -jar "indexexporter-1.2.0.jar" export http://localhost:8983/solr <fullPathToBackupDir> <solrUser> <solrPass>
$ java -jar "indexexporter-1.2.0.jar" export http://localhost:8983/solr <fullPathToBackupDir> <solrUser> <solrPass>
Exported 0 documents from core classification
Exported 33852 documents from core conceptdata
Exported 187662 documents from core conceptmatch
Exported 75554 documents from core cooccurrence
Exported 1087 documents from core corpusconcept
Exported 20256 documents from core corpusterm
Exported 0 documents from core disambiguation
Exported 16 documents from core excludedterm
Exported 0 documents from core languagemodel-en
Exported 7615 documents from core searchdata
Exported 0 documents from core tfidf
Finished exporting indices.
The export operation has two possible return codes:
0 - Success
2 - Error (refer to the logs for details since a case by case assessment is required in such an eventuality)
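When the export is run from a script, the return code and the log can be evaluated automatically. The sketch below assumes only what is shown above: the two documented return codes and the "Exported N documents from core X" log lines. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: function names are illustrative and not part of the exporter.

# Translate the exporter's return code into a short message.
describe_export_status() {
  case $1 in
    0) echo "Success" ;;
    2) echo "Error: check the logs" ;;
    *) echo "Unexpected return code: $1" ;;
  esac
}

# Sum the per-core counts from "Exported N documents from core X" lines on stdin.
total_exported() {
  awk '/^Exported [0-9]+ documents from core / {sum += $2} END {print sum + 0}'
}

# Example use, with the same placeholders as in the command above:
#   java -jar "indexexporter-1.2.0.jar" export http://localhost:8983/solr \
#     <fullPathToBackupDir> <solrUser> <solrPass> > export.log
#   describe_export_status $?
#   total_exported < export.log
```

The total gives a simple figure to compare against the document counts reported later during the import.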
Prepare the environment
Stop Solr
./bin/solr stop
Copy the transition script below (migrate_solr_to_elasticsearch) and paste it into the bin/ directory of your PoolParty installation.
Execute this script
Navigate to the bin directory and run the script by executing ./migrate_solr_to_elasticsearch
This script performs several tasks:
it starts the Elasticsearch service
it checks and confirms that Elasticsearch is listening on port 9200
it resets the password of the Elasticsearch user 'elastic'
it modifies the poolparty.conf file to deactivate Solr and activate Elasticsearch
it creates a new installer.properties file with configuration settings for Elasticsearch
Script output
The script will provide a new password for the Elasticsearch user 'elastic'.
Note
Remember to note down the password for the Elasticsearch user 'elastic' since you will need it during the subsequent data migration.
Finalize transition to Elasticsearch
Initial startup: Start PoolParty. At first, the status of the index may be shown in RED in the Semantic Middleware Configurator, indicating that PoolParty has to be restarted.
Restart PoolParty: Restart PoolParty to complete the integration of Elasticsearch. The status of the index is then shown in GREEN.
#!/bin/bash

DIR_SCRIPT=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
POOLPARTY_CONF="${DIR_SCRIPT}"/../config/poolparty.conf
INSTALLER_PROPERTIES="${DIR_SCRIPT}"/../config/installer.properties
DIR_ES="${DIR_SCRIPT}"/../index/elasticsearch

# Start ES
if [[ $EUID -eq 0 ]]; then
  run_as=$(ls -l "$DIR_SCRIPT/elasticsearch" | awk '{print $3}')
  su - $run_as -c "$DIR_SCRIPT/elasticsearch start"
else
  $DIR_SCRIPT/elasticsearch start
fi

TIMEOUT=60
INTERVAL=5
ELAPSED=0

# Check if Elasticsearch is listening on port 9200 using lsof
while [ $ELAPSED -lt $TIMEOUT ]; do
  if lsof -i :9200 | grep -q LISTEN; then
    echo "Elasticsearch is up and running on port 9200."
    break
  else
    echo "Elasticsearch is not up yet. Retrying... (Elapsed: $ELAPSED seconds)"
  fi
  # Increment the elapsed time and wait for the next interval
  ELAPSED=$((ELAPSED + INTERVAL))
  sleep $INTERVAL
done

if [ $ELAPSED -ge $TIMEOUT ]; then
  echo "Elasticsearch did not start within the timeout period."
  exit 1
fi

# Reset the password for the elastic user and capture the output
PASSWORD_OUTPUT=$(echo "y" | "${DIR_ES}"/bin/elasticsearch-reset-password -u elastic 2>/dev/null)

# Extract the new password from the output using grep with Perl-compatible regular expressions
PASSWORD_ES=$(echo "$PASSWORD_OUTPUT" | grep -oP 'New value: \K.*')

cp "${POOLPARTY_CONF}" "${POOLPARTY_CONF}".bak
sed -i 's/^builtin=true/builtin=false/' "${POOLPARTY_CONF}"
sed -i 's/^ES_START=false/ES_START=true/' "${POOLPARTY_CONF}"

# create installer.properties
{
  echo "index.type=ELASTICSEARCH"
  echo "index.host=http://localhost:9200"
  echo "index.username=elastic"
  echo "index.password=${PASSWORD_ES}"
} > "${INSTALLER_PROPERTIES}"

echo "New ES password: ${PASSWORD_ES}"
$ bash migrate_solr_to_elasticsearch
Started Elasticsearch with the PID 1907
Elasticsearch is up and running on port 9200.
New ES password: iMIIm_AA38XYUR2QVdmw
Troubleshooting
If Elasticsearch fails to start, check for any potential port conflicts or permission issues.
Make sure that the poolparty.conf and installer.properties files contain the correct configuration changes.
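The configuration check can be automated. The sketch below looks only for the values that the transition script itself writes (builtin=false and ES_START=true in poolparty.conf, and the index.* entries in installer.properties). The function name is hypothetical, and the file locations must be passed in explicitly.

```shell
#!/bin/bash
# Sketch only: check_migration_config is an illustrative name, not a PoolParty tool.

# Verify that both files contain the settings written by the transition script.
# Usage: check_migration_config <path to poolparty.conf> <path to installer.properties>
check_migration_config() {
  local conf=$1 props=$2 rc=0
  grep -q '^builtin=false' "$conf" \
    || { echo "poolparty.conf: builtin=false not found"; rc=1; }
  grep -q '^ES_START=true' "$conf" \
    || { echo "poolparty.conf: ES_START=true not found"; rc=1; }
  grep -q '^index\.type=ELASTICSEARCH$' "$props" \
    || { echo "installer.properties: index.type=ELASTICSEARCH not found"; rc=1; }
  grep -q '^index\.username=elastic$' "$props" \
    || { echo "installer.properties: index.username=elastic not found"; rc=1; }
  # An empty password means the password reset output could not be parsed.
  grep -q '^index\.password=.' "$props" \
    || { echo "installer.properties: index.password is empty or missing"; rc=1; }
  return $rc
}
```

The function prints one line per missing setting and returns a non-zero status if anything is missing. The empty-password check is worth having because the script writes installer.properties even when it fails to extract the new password.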
Shut down PoolParty (if running) and start Elasticsearch only (this prevents unexpected writes to the index while the import is ongoing)
Make sure you have these on hand
The same backup directory you used earlier for export
The new password for the Elasticsearch user "elastic" created in the previous step
Fill in and run this command, then wait
java -jar "indexexporter-1.2.0.jar" import http://localhost:9200 "elastic"
----------------------------------------------------------------
Start migration of index: classification
----------------------------------------------------------------
No documents to import for index 'classification'
----------------------------------------------------------------
Start migration of index: conceptdata
----------------------------------------------------------------
Number of documents in index export for 'conceptdata' is 78723
Flushing data of index conceptdata to disk.
Some data had to be removed from 25 documents in the conceptdata index. Rebuilding your extraction model might solve any issues.
----------------------------------------------------------------
Start migration of index: conceptmatch
----------------------------------------------------------------
Number of documents in index export for 'conceptmatch' is 429297
Flushing data of index conceptmatch to disk.
----------------------------------------------------------------
Start migration of index: cooccurrence
----------------------------------------------------------------
Number of documents in index export for 'cooccurrence' is 4849019
Flushing data of index cooccurrence to disk.
----------------------------------------------------------------
Start migration of index: corpusconcept
----------------------------------------------------------------
Number of documents in index export for 'corpusconcept' is 11997
Flushing data of index corpusconcept to disk.
----------------------------------------------------------------
Start migration of index: corpusterm
----------------------------------------------------------------
Number of documents in index export for 'corpusterm' is 1863574
Flushing data of index corpusterm to disk.
----------------------------------------------------------------
Start migration of index: disambiguation
----------------------------------------------------------------
Number of documents in index export for 'disambiguation' is 521566
Flushing data of index disambiguation to disk.
----------------------------------------------------------------
Start migration of index: excludedterm
----------------------------------------------------------------
Number of documents in index export for 'excludedterm' is 1821
Flushing data of index excludedterm to disk.
----------------------------------------------------------------
Start migration of index: languagemodel-en
----------------------------------------------------------------
No documents to import for index 'languagemodel-en'
----------------------------------------------------------------
Start migration of index: languagemodel-es
----------------------------------------------------------------
No documents to import for index 'languagemodel-es'
----------------------------------------------------------------
Start migration of index: languagemodel-fr
----------------------------------------------------------------
No documents to import for index 'languagemodel-fr'
----------------------------------------------------------------
Start migration of index: languagemodel-pt
----------------------------------------------------------------
No documents to import for index 'languagemodel-pt'
----------------------------------------------------------------
Start migration of index: searchdata
----------------------------------------------------------------
No documents to import for index 'searchdata'
----------------------------------------------------------------
Start migration of index: tfidf
----------------------------------------------------------------
Number of documents in index export for 'tfidf' is 329
Flushing data of index tfidf to disk.
----------------------------------------------------------------
Migration of solr data to ElasticSearch done successfully
----------------------------------------------------------------
Index Name       Status    Migrated documents #
corpusterm       Success   1863574
conceptdata      Warning   78723
corpusconcept    Success   11997
disambiguation   Success   521566
cooccurrence     Success   4849019
tfidf            Success   329
conceptmatch     Success   429297
excludedterm     Success   1821
Finished importing indices.
Import can have the following return codes:
0 - Success
1 - Warning (some field values had to be removed, in our example it was the 'conceptdata' index)
2 - Error (refer to the logs for details since a case by case assessment is required in such an eventuality)
2 - Error
Typically errors are connection or credential errors; an error can also be returned when complete documents can't be indexed in Elasticsearch.
Refer to the logs for details and attempt solving the issue or contact support for additional assistance.
1 - Warning
This means some fields in that index had to be removed during the import process, but the index is generally operational.
The import log will contain the number of documents (= number of concepts) that were affected. Such a log entry may look like this: Some data had to be removed from 25 documents in the conceptdata index. Rebuilding your extraction model might solve any issues.
This problem occurs in less than 0.03% of the concepts. You can contact support for additional assistance at any time.
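The import return codes and the warning entries can be evaluated in a script. The sketch below relies only on the three return codes and the warning line format shown above. The function names are hypothetical, and the log is assumed to have been captured to a file by redirecting the import command's output.

```shell
#!/bin/bash
# Sketch only: function names are illustrative and not part of the importer.

# Translate the importer's return code into a short message.
describe_import_status() {
  case $1 in
    0) echo "Success" ;;
    1) echo "Warning: some field values were removed" ;;
    2) echo "Error: check the logs" ;;
    *) echo "Unexpected return code: $1" ;;
  esac
}

# Read an import log on stdin and print "<index> <count>" for every
# "Some data had to be removed from N documents in the X index." entry.
affected_documents() {
  sed -n 's/.*Some data had to be removed from \([0-9][0-9]*\) documents in the \([^ ]*\) index\..*/\2 \1/p'
}
```

For the example log above, affected_documents would print "conceptdata 25", which identifies the index whose extraction model may need rebuilding.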
Now we will stop the Solr service and start PoolParty. It is recommended to check the migrated data to make sure that everything worked smoothly.
You should also check in the Semantic Middleware Configurator whether the active index is indeed Elasticsearch (indicated by a green dot next to the index).
Now you have Elasticsearch set up as your active index and you have successfully migrated the previously used Solr index to Elasticsearch.