Suppose you own more than a few documents you more or less know the contents of, like a set of copies from the Oracle Documentation. At times, you’re looking for more reading about a certain topic but aren’t sure which of these documents to search for.
This blog post is about using Oracle® Text to quickly find in which (pdf) documents a search term can be found.
To start with, we need a table to store our documents. Let’s call it the GUIDES table, with a column named CONTENT to store pdf documents. We create it as follows:
create table guides ( id number(15,0) generated always as identity constraint guides_id_not_null not null , name varchar2(60) constraint guides_name_not_null not null , content blob ) ; create index guides_pk on guides (id); create index guides_uk1 on guides (name); alter table guides add constraint guides_pk primary key (id) not deferrable using index guides_pk; alter table guides add constraint guides_uk1 unique (name) not deferrable using index guides_uk1;
As a side note, notice how the not null constraints on both the id and name column have been created such that these constraints are named while remaining to be classified as column constraints. This refers to the null vs not null discussion, more about that here.
We now insert our documents. The SQL*Developer environment for example aids in adding BLOB content, like a pdf, to our table. More on adding BLOB content can be found here. I have loaded my top favourite guides:
An Oracle® Text index is needed to be able to query the pdf column. We anticipate on using the snippet method by creating a preference with the forward_index and save_copy attribute set.
call ctx_ddl.create_preference('guides_storage', 'BASIC_STORAGE') / call ctx_ddl.set_attribute('guides_storage','forward_index','TRUE') / call ctx_ddl.set_attribute('guides_storage','save_copy','PLAINTEXT') / create index guides_domidx on guides(content) indextype is ctxsys.context parameters ('storage guides_storage');
And finally we can quickly query our guides:
define searchterm = 'snippet' SQL> select name 2 from guides g 3 where contains 4 ( content 5 , '&searchterm.',1 6 ) > 0 7 / NAME Database SecureFiles and Large Objects Developer's Guide XML DB Developer's Guide Text Application Developer's Guide Text Reference Database PL/SQL Packages and Types Reference Elapsed: 00:00:00.014
Now, to be able to gauge which of these documents we were actually looking for, we need to add a snippet, like below:
select name , ctx_doc.snippet ( 'GUIDES_DOMIDX' , g.id , '&searchterm.' ) snippet from guides g where contains ( content , '&searchterm.',1 ) > 0 /
Displaying the results of the query using SQL Developer user defined reports both helps in re-executing the query with different search values and displaying the snippet results: