vx company
menu
sluiten
terug naar overzicht

20/11/19

Insight insight

Data Integration & Analytics
rob van den berg, vx company

Rob van den Berg

+31 6 22 98 69 27


20/11/19

Where did I read about that?

Suppose you own more than a few documents you more or less know the contents of, like a set of copies from the Oracle Documentation. At times, you’re looking for more reading about a certain topic but aren’t sure which of these documents to search for.

This blog post is about using Oracle® Text to quickly find in which (pdf) documents a search term can be found.

To start with, we need a table to store our documents. Let’s call it the GUIDES table, with a column named CONTENT to store pdf documents. We create it as follows:

create table guides 
( id number(15,0)
    generated always as identity
    constraint guides_id_not_null not null
, name varchar2(60)
    constraint guides_name_not_null not null
, content blob
) ;
create index guides_pk on guides (id);
create index guides_uk1 on guides (name);
alter table guides
  add constraint guides_pk
  primary key (id) not deferrable using index guides_pk;
alter table guides
  add constraint guides_uk1
  unique (name) not deferrable using index guides_uk1;

As a side note, notice how the not null constraints on both the id and name column have been created such that these constraints are named while remaining to be classified as column constraints. This refers to the null vs not null discussion, more about that here.

We now insert our documents. The SQL*Developer environment for example aids in adding BLOB content, like a pdf, to our table. More on adding BLOB content can be found here. I have loaded my top favourite guides:

An Oracle® Text index is needed to be able to query the pdf column. We anticipate on using the snippet method by creating a preference with the forward_index and save_copy attribute set.

call ctx_ddl.create_preference('guides_storage', 'BASIC_STORAGE')
/
call ctx_ddl.set_attribute('guides_storage','forward_index','TRUE')
/
call ctx_ddl.set_attribute('guides_storage','save_copy','PLAINTEXT')
/
create index guides_domidx on guides(content)
  indextype is ctxsys.context 
  parameters ('storage guides_storage');

And finally we can quickly query our guides:

define searchterm = 'snippet'
SQL> select  name
   2  from    guides g
   3  where   contains
   4          ( content
   5          , '&searchterm.',1
   6          ) > 0
   7  /
 NAME                                                        
 Database SecureFiles and Large Objects Developer's Guide
 XML DB Developer's Guide
 Text Application Developer's Guide
 Text Reference
 Database PL/SQL Packages and Types Reference
 Elapsed: 00:00:00.014

Now, to be able to gauge which of these documents we were actually looking for, we need to add a snippet, like below:

select  name
,       ctx_doc.snippet
        ( 'GUIDES_DOMIDX'
        , g.id
        , '&searchterm.'
        ) snippet
from    guides g
where   contains
        ( content
        , '&searchterm.',1
        ) > 0
/

Displaying the results of the query using SQL Developer user defined reports both helps in re-executing the query with different search values and displaying the snippet results:

 
Example of SQL*Developer user defined report

Delen

Meer weten over dit onderwerp?

joyce van vliet, vx company
Neem contact op met Joyce van Vliet
gang van het kantoor, vx company