OpenDJ Indexes Explained

Bill Nelson Jan 1, 2014 9:56:45 AM

Suppose that you have an OpenDJ directory server with 300,000 entries.  Suppose further that the directory consumes 1.2 GB of disk space spread across 114 database (*.jdb) files.  Suppose that you didn’t plan correctly and you are now running out of space on your hard drive.  What should you do?  Run to your local System Administrator and beg him to increase the size of your partition?  Before promising to buy him lunch for the next year or offering your firstborn child to mow his lawn, look to see if you actually need that much space in the first place.

In general, the size of your database is based on three things:

  • The number of entries in your database
  • The size of an average entry
  • Your indexing strategy

The first two items are relatively straightforward, as you probably have a good idea of your data profile, but an improper indexing strategy can take you by surprise and may actually cause more harm than good.  Indexes are used to increase search performance based on application search filters.  A lack of necessary indexes can impact performance, increase aggravation, and lead to calls in the middle of the night.  But maintaining indexes that are never used can needlessly consume disk space and slow down write operations.

Default Indexes

OpenDJ comes with the following default indexes:

Attribute         Presence  Equality  Substring  Order  Type
aci               x                                     operational
cn                          x         x                 standard
ds-sync-conflict            x                           operational
ds-sync-hist                                     x      operational
entryUUID                   x                           operational
givenName                   x         x                 standard
mail                        x         x                 standard
member                      x                           standard
objectclass                 x                           standard
sn                          x         x                 standard
telephonenumber             x         x                 standard
uid                         x                           standard
uniquemember                x                           standard

Indexes on operational attributes are necessary to make OpenDJ run efficiently.  You should never modify these unless instructed to do so by ForgeRock support.  Indexes on standard attributes, however, exist to improve search performance for external applications and should reflect the types of searches your own applications perform.  The default attributes (and index types) are based on ForgeRock’s observations of what most of its customers use, but you may not be like most of their customers.  While maintaining some index types is relatively benign, others (like SUBSTRING) can have a far more dramatic effect on disk space and write performance.

Using Indexes to Increase Search Performance

From a high level perspective, indexes are used to identify likely candidates that might be found as a result of an application’s search filter.  Assume, for instance, that you have a simple phone book application that allows you to search for phone numbers based on first name and last name.  A filter to locate all entries that have a first name (givenname) of “Bill” would be:

(givenname=Bill)

But not every entry in your OpenDJ server has a givenname attribute with a value of “Bill”, so looking at every entry to see if it matches may take a lot of time.  How can you avoid looking at every entry?  The answer is simple: you create an index for the givenname attribute to narrow down your search.  Simply add an EQUALITY index for the givenname attribute and OpenDJ will associate each distinct givenname value with the entries that contain it.  The following is a conceptual representation of how OpenDJ will make this association:


givenname=Bill:      1,3,9,22
givenname=Ralph:     2,11
givenname=Wally:     4,5,6,7,8,10
givenname=Wild Bill: 12
givenname=Billy:     13,15,21
givenname=Silly:     14
….

This demonstrates that the givenname value for entries 1, 3, 9, and 22 is “Bill”.  When OpenDJ receives a search for all entries that have a first name of “Bill”, it immediately knows that the matches are entries 1, 3, 9, and 22.  It doesn’t even look at the other entries.  In a database that contains hundreds of thousands of entries, this can drastically improve search performance.
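
The association above can be pictured as an inverted map from attribute value to entry IDs.  The following Python sketch is only an illustration of that idea, not OpenDJ's actual storage format; the entries dictionary and the build_equality_index function are hypothetical names, and the sample data is a subset of the listing above.

```python
from collections import defaultdict

# Hypothetical sample data: entry ID -> givenname value (subset of the listing above).
entries = {1: "Bill", 2: "Ralph", 3: "Bill", 4: "Wally", 9: "Bill",
           12: "Wild Bill", 13: "Billy", 14: "Silly", 22: "Bill"}


def build_equality_index(entries):
    """Map each normalized attribute value to the sorted list of entry IDs holding it."""
    index = defaultdict(list)
    for entry_id in sorted(entries):
        # Assumption: values are matched case-insensitively, so normalize first.
        index[entries[entry_id].lower()].append(entry_id)
    return dict(index)


index = build_equality_index(entries)

# An indexed lookup reads a single key instead of scanning every entry.
print(index.get("bill", []))
```

The cost of this shortcut is that the map has to be stored on disk and kept up to date whenever an entry's givenname changes.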

This is all well and good, but how can indexes actually hurt us?

How Unnecessary Indexes Can Hurt You

Imagine that you met a coworker at a party last night and you didn’t quite get his name.  You seem to remember his name was Bill, but you heard people call him Bill, Billy, Wild Bill, and even Bill-O-Rama. You want to look him up, but you can’t because you really aren’t sure about his first name.  Hopefully your same phone book application allows you to search for all entries that contain the string, “Bill”.  If so, an EQUALITY index would not work as you really don’t know the specifics of what you are looking for.   In this case you would create a SUBSTRING index for the givenname attribute.  In so doing, OpenDJ will associate substrings with entries as follows:


givenname=*Bill:  1,3,9,12,13,15,21,22
givenname=*illy:  13,14,15,21
givenname=*Wild:  12
givenname=*ild :  12
givenname=*ld B:  12
givenname=*d Bi:  12
givenname=*Ralp:  2,11
givenname=*alph:  2,11
givenname=*ally:  4,5,6,7,8,10
….

Note:  In this conceptual listing, OpenDJ creates an index key for each four-character substring of a value (the actual key length is governed by the index's substring length setting, which defaults to 6).  Substrings anchored at the beginning (^) and end ($) of the value are included as well.  The shorter the value, the fewer keys are created.  Imagine how many keys would be generated if the attribute contained a value of ‘supercalifragilisticexpialidocious’!
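
To get a feel for how quickly substring keys multiply, here is a small Python sketch.  It is a simplification under stated assumptions: the substring_keys function is hypothetical, the key length of 4 is chosen only to match the listing above, and including the shorter trailing pieces of the value is my assumption about how the end of a value is covered.

```python
def substring_keys(value, length=4):
    """Return the set of keys a substring index might hold for one value.

    Every window of up to `length` characters becomes a key, including the
    shorter pieces at the end of the value (an assumption of this sketch).
    """
    value = value.lower()
    return {value[start:start + length] for start in range(len(value))}


for name in ("Bill", "Wild Bill", "supercalifragilisticexpialidocious"):
    # Longer values generate more keys, and every key must be stored and maintained.
    print(name, len(substring_keys(name)))
```

Multiply the per-value key count by the number of distinct values in your directory and it becomes clear why a SUBSTRING index can dominate the size of the database.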

There are times when maintaining an index key is more costly than simply treating it as unindexed (i.e. evaluating every candidate entry in the directory server).  To account for this, OpenDJ provides the ds-cfg-index-entry-limit configuration parameter, which sets an upper limit on the number of entries that a single index key may reference.  There is a global (default) value of 4000 for this parameter, but it may also be configured on a per indexed attribute basis.  A value of 4000 means that once an index key (givenname=*Bill, for example) matches more than 4000 entries, OpenDJ stops maintaining the entry list for that key and treats searches on it as unindexed.  A minor problem is that you may be maintaining many index keys, each referencing up to 4000 entries, for attributes that are never included in a search filter.  A bigger problem, however, is that each time a write operation touches an indexed attribute, the affected index keys for that attribute must be updated as well.  If your OpenDJ server is subject to extensive write operations, then you may be constantly writing and rewriting your database files, which may impact write performance and ultimately overall server performance.  (See “Unlocking the Mystery behind the OpenDJ User Database” for more information on how, when, and why the database files are rewritten on change operations.)

Determining Whether an Index is Necessary or Not

A good rule is to maintain indexes only for attributes that are included in your application search filters.  The types of indexes selected should reflect the kinds of searches being performed by your applications.  To determine this, you can review your LDAP-enabled applications and attempt to ascertain the types of filters they produce, but this may not be so obvious.

A more realistic approach is to come up with a “best guess” and then monitor your server to see if your guess was accurate or not.  You can then add, delete, or modify attribute indexes based on your findings.

When to Add Indexes

You should monitor your access logs for searches that take a long time and consider adding indexes for searches whose times you find unacceptable.  The time taken is shown in the etime (or elapsed time) value, which is displayed in milliseconds (by default).  What counts as acceptable depends on your own SLAs, but etimes greater than 5 milliseconds may be considered unacceptable.  If you see etimes on the order of seconds (as shown below), then you definitely need to investigate further.

[31/Dec/2013:18:07:21 +0000] SEARCH RES conn=2231288 op=6 msgID=502 result=0 nentries=1 unindexed etime=5836

This access log entry indicates that the search took 5.8 seconds to complete.  One reason why it took so long was that it was an unindexed search (as noted by the “unindexed” tag in the entry).  To determine the filter associated with this search, you need to search backwards in the access log and find the corresponding SEARCH REQ for this connection (conn=2231288) and this operation (op=6).

[31/Dec/2013:18:07:15 +0000] SEARCH REQ conn=2231288 op=6 msgID=502 base="ou=people,dc=example,dc=com" scope=wholeSubtree filter="(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))" attrs="*"

This access log entry indicates that the filter used to perform the search is

(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))

OpenDJ contains a default EQUALITY index for objectclass, so assuming that you have not modified the default indexes, the unindexed attribute causing the problem is exampleGUID.  Now that you have identified the culprit, should you run right out and create an EQUALITY index for this attribute?  Not necessarily.  It really depends on how often you see searches of this type appear in the access logs and what their impact might be.  You don’t want to maintain exampleGUID indexes if your application only searches on this attribute once in a blue moon.  If, however, you see this type of search on a consistent basis, you might want to consider adding an index.
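
Pairing each slow SEARCH RES line with its SEARCH REQ by hand gets tedious quickly.  The following Python sketch automates the lookup for log lines shaped like the two excerpts above; the slow_searches function and the 5 millisecond default threshold are illustrative assumptions, and the regular expressions have only been fitted to those two sample lines, so other log formats may need adjustments.

```python
import re

# Sample lines copied from the access log excerpts above.
LOG = '''\
[31/Dec/2013:18:07:15 +0000] SEARCH REQ conn=2231288 op=6 msgID=502 base="ou=people,dc=example,dc=com" scope=wholeSubtree filter="(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))" attrs="*"
[31/Dec/2013:18:07:21 +0000] SEARCH RES conn=2231288 op=6 msgID=502 result=0 nentries=1 unindexed etime=5836
'''

REQ = re.compile(r'SEARCH REQ conn=(\d+) op=(\d+) .*filter="([^"]*)"')
RES = re.compile(r'SEARCH RES conn=(\d+) op=(\d+) .*etime=(\d+)')


def slow_searches(lines, max_etime_ms=5):
    """Yield (filter, etime, unindexed) for searches slower than max_etime_ms."""
    filters = {}  # (conn, op) -> filter remembered from the SEARCH REQ line
    for line in lines:
        req = REQ.search(line)
        if req:
            filters[(req.group(1), req.group(2))] = req.group(3)
            continue
        res = RES.search(line)
        if res and int(res.group(3)) > max_etime_ms:
            key = (res.group(1), res.group(2))
            unindexed = " unindexed " in line
            yield filters.get(key, "<request not seen>"), int(res.group(3)), unindexed


results = list(slow_searches(LOG.splitlines()))
for search_filter, etime, unindexed in results:
    print(etime, "unindexed" if unindexed else "indexed", search_filter)
```

Counting how often each filter shows up in the results tells you whether the search is a once-in-a-blue-moon event or a steady pattern that deserves an index.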

When to Remove Unnecessary Indexes

It is relatively straightforward to determine when to add indexes, but how do you know when you are maintaining unnecessary indexes?  Unfortunately, OpenDJ does not include utilities to tell you this, but it is possible to identify unused indexes by, once again, reviewing the search filters in the access logs.  One approach would be to perform the following steps:

  1. Determine attribute names included in the search filter.
  2. Determine type of search being performed (EQUALITY, SUBSTRING, PRESENCE, etc.)
  3. Determine the frequency of the searches.
  4. Compare the searches to the already configured indexes.
  5. Remove unnecessary indexes (if desired).

It is pretty easy to write a script to perform these steps and fortunately one has already been written by Chris Ridd to perform steps 1 through 4.  His topfilters script can be found here.  Once armed with the information from his script you would simply compare it to what you already have configured for OpenDJ.
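
To make steps 1 through 4 concrete, here is a minimal Python sketch.  It is not Chris Ridd's topfilters script, only an illustration of the idea; the filters and configured values are hypothetical, and the regular expression handles simple filter components only (extensible match filters, for instance, are ignored).

```python
import re
from collections import Counter

# Matches one simple filter component such as (givenname=Bill) or (sn>=M).
COMPONENT = re.compile(r"\(([A-Za-z][\w;.-]*)([<>~]?=)([^()]*)\)")


def classify(op, value):
    """Name the index type that would serve this filter component."""
    if op in (">=", "<="):
        return "ordering"
    if op == "~=":
        return "approximate"
    if value == "*":
        return "presence"
    if "*" in value:
        return "substring"
    return "equality"


def count_index_usage(filters):
    """Steps 1-3: count how often each (attribute, index type) pair is searched."""
    usage = Counter()
    for search_filter in filters:
        for attr, op, value in COMPONENT.findall(search_filter):
            usage[(attr.lower(), classify(op, value))] += 1
    return usage


# Hypothetical filters, as they might be collected from SEARCH REQ lines.
filters = [
    "(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))",
    "(givenname=*Bill*)",
    "(givenname=*Bill*)",
]
# Hypothetical configured indexes, e.g. gathered from cn=config.
configured = {("givenname", "equality"), ("givenname", "substring"),
              ("objectclass", "equality")}

usage = count_index_usage(filters)
searched = set(usage)

# Step 4: compare what is searched with what is configured.
print("searched but not indexed:", sorted(searched - configured))
print("indexed but not searched:", sorted(configured - searched))
```

With these sample values, exampleGUID equality shows up as searched but not indexed, while the givenname equality index shows up as configured but never used.  Step 5 remains a human decision.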

How to Determine Current Indexes and Index Types

Current indexes are reflected beneath the cn=config suffix of your OpenDJ server.  You can either query this suffix as the rootDN user or you can simply view the contents of the config.ldif file to see what indexes have been configured.


dn: ds-cfg-attribute=givenName,cn=Index,ds-cfg-backend-id=userRoot,cn=Backends,cn=config
objectClass: top
objectClass: ds-cfg-local-db-index
ds-cfg-index-type: equality
ds-cfg-index-type: substring
ds-cfg-attribute: givenName
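
If you take the config.ldif route, a few lines of Python can summarize the configured indexes.  This is a rough sketch that assumes index entries look like the one above; the configured_indexes function is a hypothetical helper, it only reads the file, and it does not handle folded LDIF lines or base64-encoded values.

```python
# Sample record copied from the config.ldif excerpt above.
SAMPLE = """\
dn: ds-cfg-attribute=givenName,cn=Index,ds-cfg-backend-id=userRoot,cn=Backends,cn=config
objectClass: top
objectClass: ds-cfg-local-db-index
ds-cfg-index-type: equality
ds-cfg-index-type: substring
ds-cfg-attribute: givenName
"""


def configured_indexes(ldif_text):
    """Return {attribute: set of index types} for every local DB index entry."""
    indexes = {}
    # LDIF records are separated by blank lines.
    for record in ldif_text.split("\n\n"):
        attrs = {}
        for line in record.splitlines():
            name, sep, value = line.partition(": ")
            if sep:
                attrs.setdefault(name.lower(), []).append(value.strip())
        if "ds-cfg-local-db-index" in attrs.get("objectclass", []):
            for attribute in attrs.get("ds-cfg-attribute", []):
                indexes[attribute.lower()] = set(attrs.get("ds-cfg-index-type", []))
    return indexes


print(configured_indexes(SAMPLE))
```

The resulting dictionary is in a convenient shape for comparing against the filters your applications actually send.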

Another method is to use the dbtest command to obtain a more detailed analysis on each index.  The dbtest command can be found in the bin directory of your OpenDJ installation.  An example execution of this command might be:

/opt/opendj/bin/dbtest list-index-status -b "dc=example,dc=com" -n userRoot

Execution of this command will return each index, its type, the database it is associated with, whether the index is valid or not, and the number of records associated with the index.  It will also detail the undefined index keys that are not maintained due to the ds-cfg-index-entry-limit being reached for that attribute.

You can take the data returned from Chris’ script, compare it with the data found for those indexes you are currently maintaining and make an intelligent decision as to whether you want to modify your indexes in any way.

Should you delete any indexes that you believe are not being used?  Again, not necessarily.  Your access logs only reflect a point in time and may not provide a comprehensive listing of application search filters.  You should always think carefully before removing existing indexes, but if you find that you have made a mistake, you can always monitor the access log for searches that are taking an unacceptably long time, or wait for that 3:00 am phone call to let you know.

Configuring Indexes

If you do decide it is necessary to update your indexes, then the best approach is to do so using the OpenDJ Control Panel or the dsconfig command line tool.  You should never update the config.ldif file directly.

The following provides an overview of how to add a new index for the exampleGUID attribute.  The index type is set to EQUALITY.

/opt/opendj/bin/dsconfig create-local-db-index --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --index-name exampleGUID --set index-type:equality \
  --trustAll

The following provides an overview of how to remove an existing EQUALITY index type from an existing mail index.

/opt/opendj/bin/dsconfig set-local-db-index-prop --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --index-name mail --remove index-type:equality \
  --trustAll

If you would rather remove the entire mail index, use the following command instead.

/opt/opendj/bin/dsconfig delete-local-db-index --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --index-name mail --trustAll

Rebuilding Indexes

OpenDJ automatically updates indexes on LDAP operations that update the database.  Adding or deleting an index or an index type is a configuration change, however, and does not affect index values already in the database.  If you delete an index type, existing index values will remain in the database until you rebuild the index.  The same is true if you add a new index or index type: index values will not be created for existing database entries until you rebuild the index.

As such, any configuration changes to indexes should be followed by a rebuilding of the appropriate index.  The following provides an overview of how to rebuild the mail index once its configuration has changed.

/opt/opendj/bin/rebuild-index -p 4444 -D "cn=Directory Manager" -w password -b "dc=example,dc=com" \
  --index mail --start 0 --trustAll

Note:  It is not necessary to stop the OpenDJ instance before performing this task.  In my experience, however, if you are able to stop the server, you might want to consider doing so.  In that case, you do not need to specify a start time, bind credentials, or the trust option, because the command operates directly on the database rather than connecting to a running server.

Debugging Index Problems

There are times when you may see performance problems that indicate that you are performing an unindexed search, but when you look at the indexes, you find that the appropriate index has been configured.

Note:  This problem typically occurs when you do not rebuild the index after you have configured it.  Essentially, there was already data in the database when the index was configured.  In such cases, OpenDJ will not attempt to update the index until an initial rebuild-index has been performed.

One method of debugging this problem is to use the debugsearchindex capability in OpenDJ.

If you perform your search and request that the debugsearchindex attribute be returned as follows:

/opt/opendj/bin/ldapsearch -D "cn=Directory Manager" -w cangetin -b "ou=people,dc=example,dc=com" -s sub \
  "(&(&( exampleGUID=88291000818)(objectclass=inetorgperson)))" debugsearchindex

OpenDJ will emulate the search, but will not actually perform it against the database.  Instead, it will tell you how the search would be performed and whether each part of the filter is indexed, as follows:

dn: cn=debugsearch
debugsearchindex: filter=(&(&( exampleGUID=88291000818)[NOT-INDEXED](object
Class=inetorgperson)[INDEX:objectClass.equality][LIMIT-EXCEEDED])[NOT-INDEXED])
[NOT-INDEXED] scope=wholeSubtree[LIMIT-EXCEEDED:30] final=[NOT-INDEXED]

If you see something like this but your configuration tells you that the indexes have been configured, then it is time to send your LDAP administrator to training.

Summary

As with most middleware products, knowing when and how to configure indexes can be as much of an art as it is a science.  You should follow best practices where possible, but as with other products, you should monitor your server to see if those practices apply to you and react where appropriate.
