EDRM Search Guide Glossary

Boolean Search – A search technique that uses Boolean logic to connect individual keywords or phrases within a single query, using operators such as AND, OR, and NOT, as well as proximity operators such as within N (w/5) and not within N (not w/5).
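
The three basic operators map naturally onto set operations over matching document ids. The following is a minimal sketch under that assumption; the toy collection `docs` and the helper `matching` are hypothetical names used only for illustration, not part of any particular search product.

```python
# Hypothetical toy collection: document id -> text.
docs = {
    1: "merger agreement draft",
    2: "merger press release",
    3: "lunch agreement",
}

def matching(term):
    """Return the ids of documents whose text contains the term as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

# AND -> set intersection, OR -> set union, NOT -> set difference.
merger_and_agreement = matching("merger") & matching("agreement")
merger_or_agreement = matching("merger") | matching("agreement")
merger_not_agreement = matching("merger") - matching("agreement")
```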

Concept Search – A search technique that finds words which are similar in concept to a query word. A concept search will return documents that relate to the same concept as the query word, regardless of whether the query word itself appears in the returned documents. Concept searches can be implemented as a simple thesaurus match, or by using sophisticated statistical analysis methods. The effectiveness of concept search in an e-discovery project depends greatly on the type of algorithm used and its implementation.

ESI / Electronically Stored Information – Information that is stored electronically on any of numerous types of media, regardless of the original format in which it was created.

Fuzzy Search – A search technique that identifies ESI based on terms close to another term, with closeness defined as a typographical difference and/or change. For example, snitch, switch, and swanky can all match swatch, depending on how many incorrect letters are allowed within the search threshold.
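
One common way to measure this kind of typographical closeness is the Levenshtein edit distance (the number of single-letter insertions, deletions, or substitutions needed to turn one term into another). The sketch below assumes that measure; the function names `edit_distance` and `fuzzy_match` are illustrative, and real search tools may define closeness differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(term, candidates, max_edits):
    """Return the candidates within max_edits edits of term."""
    return [w for w in candidates if edit_distance(term, w) <= max_edits]
```

With this measure, switch is one edit away from swatch, snitch is two, and swanky is three, so `fuzzy_match("swatch", ["snitch", "switch", "swanky"], 1)` keeps only switch, while a threshold of 3 keeps all three.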

Inverted Index – An index that maps a keyword to the list of documents that contain the keyword.
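
As a rough illustration, an inverted index can be held as a mapping from each keyword to the set of document ids that contain it. This sketch assumes lowercased, whitespace-separated words; `build_inverted_index` and the sample `docs` are hypothetical names chosen for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical sample collection: document id -> text.
docs = {
    1: "Contract signed today",
    2: "Draft contract attached",
    3: "Lunch today",
}
index = build_inverted_index(docs)
```

Looking up `index["contract"]` then yields the ids of the two documents that mention the word, without rescanning the text.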

Keyword Index – A technique that examines the ESI and builds a searchable electronic index. This index typically maps from a keyword to all the documents that contain the keyword.

Keyword Search – A very common search technique that uses query words (“keywords”) and looks for them in ESI, using an index.

Phrase Search – A search consisting of multiple keywords separated by spaces to form a single phrase. For a document to match this search, the entire phrase as entered must be contained within the document.

Privilege Log – A list of the documents that a Producing Party did not produce on account of Privilege, such as Attorney-Client Privilege.

Privileged Documents – A set of documents that a Producing Party is not required to provide, since they fall into Privilege such as Attorney-Client Privilege. The existence of such documents should be recorded in the Privilege Log.

Producing Party – A party that owns the complete collection of ESI, and is responsible for producing a portion of the ESI that is deemed to be relevant for a legal case or legal enquiry.

Proximity Search – A Proximity Search searches for multiple keywords. The matching documents must contain all the keywords, with the keywords occurring within a specified number of words from each other.
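
A two-keyword proximity test can be sketched by comparing word positions. This example assumes that distance is the difference between the positions of the two words in a whitespace-tokenized document; actual tools may count the gap differently, and the function name `within` is hypothetical.

```python
def within(text, first, second, max_distance):
    """True if some occurrence of `first` lies within max_distance
    word positions of some occurrence of `second`."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

sentence = "the merger agreement was signed on friday"
near = within(sentence, "merger", "signed", 5)  # positions 1 and 4
far = within(sentence, "merger", "friday", 3)   # positions 1 and 6
```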

RDBMS (Relational Database Management System) – This is a technical term for the class of software programs that manage data using a relational schema, such as Microsoft SQL Server or Oracle.

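Regular Expressions – A pattern that describes what the search should return, written using characters that have special meaning. For example, many search tools treat car* as the keyword car followed by the wildcard character *, so the resulting documents should contain words that begin with the characters “car”, such as car, cartoon, or cartography. In formal regular-expression syntax, * means zero or more repetitions of the preceding element, so the same match is written with a pattern such as \bcar\w* instead.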
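
In formal regular-expression syntax, * is a repetition operator rather than a standalone wildcard, so the car* example corresponds to a pattern such as `\bcar\w*`. The sketch below uses Python's standard `re` module to show this; the sample sentence is made up for illustration.

```python
import re

# \b anchors the match at a word boundary; \w* matches any further word characters.
pattern = re.compile(r"\bcar\w*")

text = "The car and the cartoon about cartography scared Oscar"
matches = pattern.findall(text)
```

Here "scared" and "Oscar" contain the letters car but do not begin with them, so they are not matched.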

Relevancy Rank – A measurement of the relevancy of a document, so that the Search Hits within a set of Search Results can be ordered. Relevancy measurements often involve counting the number of occurrences of a keyword within a document, as well as the number of documents in which the keyword is found.
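
One widely used weighting that combines these two counts is TF-IDF (term frequency times inverse document frequency). The following is a minimal sketch assuming that scheme; it is not necessarily what any given search engine uses, and `tf_idf` and the sample documents are hypothetical.

```python
import math

def tf_idf(term, doc_words, all_docs):
    """Score a document for a term: occurrences in the document, weighted
    down when the term appears in many documents of the collection."""
    tf = doc_words.count(term)                          # occurrences in this document
    df = sum(1 for words in all_docs if term in words)  # documents containing the term
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(all_docs) / df)

# Hypothetical collection, already split into words.
docs = [
    ["merger", "merger", "plan"],
    ["merger", "lunch"],
    ["lunch", "menu"],
]

# Order document positions from most to least relevant for "merger".
ranked = sorted(range(len(docs)),
                key=lambda i: tf_idf("merger", docs[i], docs),
                reverse=True)
```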

Requesting Party – A party that does not own the ESI and is requesting that the Producing Party, which owns the ESI, provide some subset of the ESI based on a Search Request.

Responsive Documents – A subset of ESI that potentially matches the desired set of documents for the case.

Search Engine – A search component that implements the actual process of interpreting a search request and identifying subsets of documents. For example, a database management system such as Microsoft SQL Server contains a component that manages searches of the data stored in its databases.

Search Hit – A document in ESI that is considered to match the requested Search Query.

Search Query – A well-formulated Search request that an automated search engine can interpret in order to produce matching results.

Search Results – A collection of Search Hits that match the intended documents of a Search Request.

Stemming – A search option that returns matches for all variations of the root word of the initial query word. For example, if the query word is sing, a search that uses stemming would match singing, sang, sung, song, and songs as well as sing.
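
A very rough sketch of the idea is to strip common suffixes and fall back on a lookup table for irregular forms. This is a simplification assumed for illustration, not a production stemming algorithm; `IRREGULAR`, `SUFFIXES`, and `stem` are hypothetical names, and relating a word such as song to sing would need a further lookup entry.

```python
# Hypothetical lookup for irregular forms that suffix stripping cannot handle.
IRREGULAR = {"sang": "sing", "sung": "sing"}
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Reduce a word to a crude root form."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Only strip when at least three letters remain, so "sing" is left intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_match(query, candidates):
    """Return the candidates that share a root with the query word."""
    root = stem(query)
    return [w for w in candidates if stem(w) == root]
```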

Synonym Search – A synonym search returns documents that contain terms similar in meaning to the query words, usually using a thesaurus to determine which terms would match the query words.

Tokenization – An operation that examines a document or block of text and breaks the text into words. Typically, a space is used to separate words, but special characters such as a hyphen, period, or quotation mark can also be used.
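
A simple tokenizer along these lines can be sketched with Python's standard `re` module. The choice of separators here (whitespace, hyphen, period, and straight double quote) is an assumption for the example; real indexing tools make their own choices, which affects what a search can match.

```python
import re

def tokenize(text):
    """Split text into words on whitespace, hyphens, periods, and double quotes."""
    # re.split can yield empty strings at the edges, so filter them out.
    return [token for token in re.split(r'[\s.\-"]+', text) if token]

tokens = tokenize('The e-mail said "call me."')
```

Note that treating the hyphen as a separator splits e-mail into two tokens, e and mail.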

Truncation – A Search Specification that indicates that matching documents must contain words that begin with the letters entered, but that the matching words can end with any combination of letters.

Wildcards – Symbols such as * or ? included within a Keyword to indicate that the location where the symbols are used may match a single letter or multiple letters.
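
Python's standard `fnmatch` module implements one common wildcard convention, in which * matches any run of characters (including none) and ? matches exactly one character. The sketch below assumes that convention; individual search tools may define the symbols differently, and `wildcard_filter` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def wildcard_filter(pattern, words):
    """Return the words that match a wildcard pattern, case-sensitively."""
    return [w for w in words if fnmatchcase(w, pattern)]

words = ["car", "cartoon", "cartography", "scar", "cat", "coat"]
truncated = wildcard_filter("car*", words)  # truncation: words beginning with "car"
single = wildcard_filter("c?t", words)      # exactly one letter between c and t
```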