The Kansas Hydrocarbon Association Navigator (KHAN) is the data mining component of GEMINI. Data mining involves automated or semi-automated statistical modeling for the sake of revealing meaningful patterns in large volumes of data. Statistical modeling problems typically fall within one of three classes: continuous variable prediction (regression), supervised classification (discriminant analysis), and unsupervised classification (clustering). Numerous techniques are available for attacking each of these problem classes.

The above screen is the main screen for the KHAN Module. This screen was designed to assist the user in using KHAN. The Module has two functions: (1) Build and Maintain Training Datasets and (2) Use Existing Training Datasets to Predict against Projects created in GEMINI.
The KHAN Create/Maintain Training Dataset Screen walks the user through the process of building a Training Dataset. This Module requires that the user knows what they wish to represent in their Training Dataset, e.g., Lithology with real core data, Potential of Hydrocarbon Pay, etc. This Module will help the user collect the necessary components to create the Training Dataset.
There are a number of steps to creating the Dataset:
Step 1. Create Dataset Panel - Add a Title and Description to the Training Dataset File. This is the first step in building the Training Dataset. This panel will allow the user to create a title for the Training Dataset and to add a description of the Dataset. The user then selects the "Save Dataset Information" Button to create a new record in the Kansas Geological Survey Database.

Step 2. Add Oil and Gas Wells - The Add Oil and Gas Wells to Training Dataset Panel allows the user to select Kansas Wells that will be used in the Training Dataset. The top table, Oil & Gas Wells with Core Data & LAS Files in KGS Database, lists all the wells that contain measured core data. There are only about 15 wells in Kansas with measured data in the Kansas Geological Survey Database, and they lie in only about 4 counties. This Table is provided to help the user find the wells with measured core data easily. If the user wishes to use another set of wells for their Training Dataset, they can select the Select Method to Add Well to Dataset button and the Select a method to add wells to project Screen will display.


This set of screens allows the user to select Oil and Gas Wells that are in the Kansas Geological Survey Database. There are three possible types of screens available.
This help will focus on Oil and Gas Wells with Core data. We will also focus on wells with Lithology data present. The following wells will be used in the example.
Step 3. Add Electrofacies - Select the Electrofacies that will be used in training.

This screen allows the user to create their Electrofacies List. The Electrofacies List can be anything the user wishes to use, e.g., Lithology using Core Data or Potential of Hydrocarbon Pay. The first example uses wells that have core that a Geologist was able to examine to identify the lithology. The second example is based on knowledge of the well and its ability to produce. In the second example the user will have to determine the depth range to which they wish to assign their Electrofacies, while in the first the depth range is determined automatically from the core data present in the KGS Database.
Example
Click on the User Available Core Data option.

Click on Lithofacies - The core data that can be used with KHAN at present will be displayed in the Core Data Electrofacies Table. Highlight the Lithofacies Row, notice with this example set that all the wells selected for this example will have Lithofacies data. Now select the Select Core Data Button to retrieve the possible Lithofacies from the KGS Core Database Table.
The Core Electrofacies Data Table will contain all the possible Lithofacies that the 7 selected wells contain. It would be instructive to get a scratch piece of paper and jot down the possible lithofacies and decide on the color selection, i.e,

Click the Select Color button to change the color of the Core Electrofacies, then click << Add eFacies to List. Repeat until all Electrofacies have been color-coded.
Step 4. Select LAS Curves - Select the common LAS Log Curves that will be used in Training Digital LAS Files to predict the Electrofacies.

This panel assists the user in determining the common log curves that are available for the selected wells. The KHAN Model is created from the selection of the LAS Curves. The user can check each curve to make sure that it is available for all wells. When selecting curves, the user should choose curves that are generally run by all logging firms, e.g., Induction, Porosity, Bulk Density, Gamma Ray, etc. This helps ensure that the selected curves are not only common to the wells in the Training Dataset but are also likely to be available for the wells in the project well list. Also note: DO NOT mix Resistivity with other log curves on the same track. The program will blacken the track.
The user may select up to a maximum of 6 curves from the 46 standard curves listed below:
| Mnemonic | Name | Standard Units | Unit Description |
| GR | Gamma Ray | API | API units |
| CGR | Gamma Ray Minus Uranium | API | API units |
| BS | Bit size | IN | inches |
| CAL* | Caliper | IN | inches |
| SP | Spontaneous Potential | MV | millivolts |
| RXRT | Rxo/Rt ratio | RATIO | ratio |
| COND | Conductivity | MMHO/M | millimhos per meter |
| CILD | Deep Induction Conductivity | MMHO/M | millimhos per meter |
| CILM | Medium Induction Conductivity | MMHO/M | millimhos per meter |
| RES | Resistivity | OHM-M | ohm-meters |
| LN | Long Normal Resistivity | OHM-M | ohm-meters |
| LL | Deep Laterolog Resistivity | OHM-M | ohm-meters |
| MINV | Micro Inverse Resistivity | OHM-M | ohm-meters |
| ILD | Deep Induction Resistivity | OHM-M | ohm-meters |
| RDEP | Deep Resistivity | OHM-M | ohm-meters |
| RMED | Medium Resistivity | OHM-M | ohm-meters |
| MSFL | Micro Spherically Focused Resistivity | OHM-M | ohm-meters |
| MNOR | Micro Normal Resistivity | OHM-M | ohm-meters |
| MLL | Micro Laterolog Resistivity | OHM-M | ohm-meters |
| SFLU | Spherically Focused Resistivity | OHM-M | ohm-meters |
| SN | Shallow Normal Resistivity | OHM-M | ohm-meters |
| RSHAL | Shallow Resistivity | OHM-M | ohm-meters |
| LL8 | Shallow Laterolog Resistivity | OHM-M | ohm-meters |
| ILM | Medium Induction Resistivity | OHM-M | ohm-meters |
| AHT* | Array Induction Resistivity | OHM-M | ohm-meters |
| PE | Photoelectric factor | BARNS/E | barns per electron |
| NEUT | Neutron counts | COUNTS | counts |
| EATT | Electromagnetic Attenuation Rate | DB/M | decibels per meter |
| RHOB | Bulk Density | GM/CC | grams per cc |
| DRHO | Bulk Density Correction | GM/CC | grams per cc |
| TENSION | Tension | LB | pounds |
| DPHI | Density porosity | PU | porosity units |
| NPHI | Neutron porosity | PU | porosity units |
| SPHI | Sonic porosity | PU | porosity units |
| DT | Acoustic transit time | USEC/FT | microseconds/foot |
| POTA | Potassium Concentration | % | percent or fraction |
| TEMP | Temperature | DEGF | degrees Fahrenheit |
| V1M3 | Calcite Volume Fraction | FRAC | proportion |
| V3M3 | Dolomite Volume Fraction | FRAC | proportion |
| V2M3 | Quartz Volume Fraction | FRAC | proportion |
| BHV | Borehole volume | FT3 | cubic feet |
| RWA | Apparent Water Resistivity | OHM-M | ohm-meters |
| THOR | Thorium Concentration | PPM | parts per million |
| URAN | Uranium Concentration | PPM | parts per million |
| DEPTH | Depth | FT | feet |
| TIME | Logging time | SEC | seconds |
Example - For the lithology example that we are working with, we will select the 5 curves listed below:
| Mnemonic | Track | Name | Standard Units | Unit Description |
| GR | 1 | Gamma Ray | API | API units |
| DPHI | 1 | Density porosity | PU | porosity units |
| NPHI | 1 | Neutron porosity | PU | porosity units |
| ILD | 2 | Deep Induction Resistivity | OHM-M | ohm-meters |
| ILM | 2 | Medium Induction Resistivity | OHM-M | ohm-meters |
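
The two rules stated above (at most 6 curves, and no Resistivity curve sharing a track with other curve types) can be checked with a short script before the tracks are assigned in the panel. The following is only a sketch: `check_selection` is a hypothetical helper, not part of KHAN, and it assumes that a Resistivity curve is any curve whose standard units are OHM-M.

```python
# Hypothetical helper for illustration; not part of KHAN or GEMINI.
MAX_CURVES = 6  # limit stated in the help text

# (mnemonic, track, standard units) for the example selection above
SELECTION = [
    ("GR", 1, "API"),
    ("DPHI", 1, "PU"),
    ("NPHI", 1, "PU"),
    ("ILD", 2, "OHM-M"),
    ("ILM", 2, "OHM-M"),
]


def check_selection(selection):
    """Return a list of problems found in a curve/track selection."""
    problems = []
    if len(selection) > MAX_CURVES:
        problems.append(
            f"{len(selection)} curves selected; maximum is {MAX_CURVES}"
        )
    # Collect the set of units plotted on each track.
    units_by_track = {}
    for _mnemonic, track, units in selection:
        units_by_track.setdefault(track, set()).add(units)
    # Assumption: OHM-M units identify a Resistivity curve.
    for track, units in sorted(units_by_track.items()):
        if "OHM-M" in units and len(units) > 1:
            problems.append(
                f"track {track} mixes resistivity with other curves"
            )
    return problems


print(check_selection(SELECTION))
```

The example selection passes both checks because the Gamma Ray and Porosity curves share Track 1 while the two Induction Resistivity curves have Track 2 to themselves.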

Step 5. Train Wells
- Train each of the selected Oil and Gas Wells in the Training Dataset File.

This screen allows the user to match their Standard LAS Curve List with each Training Well's LAS Curves. The user can also select the depth regions that will represent the selected Electrofacies with an interactive Well Profile Plot. If the user uses core data from the KGS Database, the Program will automatically retrieve the Selected Core Data from the KGS Database, display it on the plot in Track 6, and save the data as part of the Dataset. The user may edit the data by removing it from the dataset if they wish.
The user first highlights the well they wish to train and then selects the
Select Well to Map LAS Curves and eFacies Depth Regions button.
The Well Profile Curve Selection Screen will display.
The user only needs to identify which LAS File contains the curves they are going to map. In our example we have chosen Resistivity, Porosity and Gamma Ray Curves to build the training dataset file. To assist the user, the following table holds the LAS Curves and the depth range of the lithology core data for each well.
Example
| API-Number | Well Name | Start Depth | End Depth | GR | ILD | ILM | DPHI | NPHI |
| 15-067-20338 | Alexander D2 | 2879.5 | 3120.1 | GR | ILD | ILM | DPHI | NPHI |
| 15-067-20652 | Craft Gas Unit 3HI | 2649.1 | 2787.3 | IDGR | IDID | IDIM | DNDP | NCNP |
| 15-067-21415 | Stuart 3-34R | 2830 | 2964.6 | GR | MNOR | MINV | DPHI | NPHI |
| 15-093-20134 | May Beaty E2 | 2666 | 2877.6 | GR | IDID | IDIM | DPHI | NPHI |
| 15-093-21250 | Shankle 2-9 | 2796.9 | 3119.3 | DLGR | IDID | IDIM | DLDP | NCNP |
| 15-187-20661 | Luke Gas Unit 4 | 2606 | 3012.1 | SGR | ILD | ILM | DPHI | NPHI |
| 15-189-22225 | Newby 2-28R | 2821 | 3061.1 | GR | ILD | ILM | DPHI | NPHI |
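
The table above is in effect a lookup from each well's own curve mnemonics to the standard mnemonics chosen in Step 4. As a rough illustration of what the curve matching in this step accomplishes, the sketch below stores the table as a dictionary and renames one well's curves. The function `to_standard` and the sample values are made up for illustration; KHAN performs this matching through its screens, not through code like this.

```python
# Well-specific mnemonics copied from the table above, keyed by API number.
# Each inner dict maps a standard mnemonic to the mnemonic used in that well.
CURVE_MAP = {
    "15-067-20338": {"GR": "GR", "ILD": "ILD", "ILM": "ILM", "DPHI": "DPHI", "NPHI": "NPHI"},
    "15-067-20652": {"GR": "IDGR", "ILD": "IDID", "ILM": "IDIM", "DPHI": "DNDP", "NPHI": "NCNP"},
    "15-067-21415": {"GR": "GR", "ILD": "MNOR", "ILM": "MINV", "DPHI": "DPHI", "NPHI": "NPHI"},
    "15-093-20134": {"GR": "GR", "ILD": "IDID", "ILM": "IDIM", "DPHI": "DPHI", "NPHI": "NPHI"},
    "15-093-21250": {"GR": "DLGR", "ILD": "IDID", "ILM": "IDIM", "DPHI": "DLDP", "NPHI": "NCNP"},
    "15-187-20661": {"GR": "SGR", "ILD": "ILD", "ILM": "ILM", "DPHI": "DPHI", "NPHI": "NPHI"},
    "15-189-22225": {"GR": "GR", "ILD": "ILD", "ILM": "ILM", "DPHI": "DPHI", "NPHI": "NPHI"},
}


def to_standard(api_number, well_curves):
    """Rename a well's curves (dict: mnemonic -> samples) to standard mnemonics.

    Hypothetical helper; curves not listed in CURVE_MAP are dropped.
    """
    mapping = CURVE_MAP[api_number]
    missing = [well for well in mapping.values() if well not in well_curves]
    if missing:
        raise KeyError(f"well {api_number} is missing curves: {missing}")
    return {std: well_curves[well] for std, well in mapping.items()}


# Made-up sample values for the Craft Gas Unit 3HI well, one sample per curve.
craft = {
    "IDGR": [45.0], "IDID": [12.0], "IDIM": [10.5],
    "DNDP": [0.08], "NCNP": [0.11], "CALI": [8.1],
}
print(list(to_standard("15-067-20652", craft)))
```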
The user first identifies the LAS File and highlights the file. Starting
with our first well, the "Alexander D2" Well, there are two Digital
LAS Files. The second Digital LAS File contains all the curves that
match the standard LAS Curves selected for building the training database.
Highlight the "Resistivity/ Density/Neutron" Digital LAS File and
select the "Select Log for Curve" Button to display all the LAS
Curves that were used in the selected LAS File.
Now search the LAS File Curves Table for the Gamma Ray Curve (GR) and highlight that curve and then select the "Match LAS Curve with KHAN Curves" Button.
The Lower Panel in this Screen holds the selected Well LAS Curve to be matched with the standardized KHAN LAS Curve. We have selected the Gamma Ray (GR) Curve first.

The Upper Panel in this Screen holds all the Standardized KHAN LAS Curves for this Training Dataset. Find the Gamma Ray (GR) Curve and select the "Select Standardize LAS Curve" Button. Notice that the Mnemonic GR appears in the Standardize Mnemonic Text Field. Now select the "Match Curves" Button to match the Well's Gamma Ray Curve with the KHAN Standardized Gamma Ray Curve.
Notice that the Gamma Ray Mnemonic for the Well appears in Track 1 automatically. You preselected the Track in which each curve appears in the Select LAS Curves Panel of the KHAN Training Screen. Continue for the Resistivity and Porosity Curves. Now change the depth range to 2850 to 3150 feet to make sure that all the core data is included in the dataset. Also modify the Plotting Scale to 50 Feet / Inch.

For this example we will not select the Formation Tops or add core data to the Well Profile Plot. The user may select the Help on the Well Profile Curve Selection Screen to see how to add the Formation Tops and Core Data to the Well Profile Plot. Now select the "Set Plot Limits" Button to display the Well Profile Set Curve Limits Screen.

For this example we will turn off Tracks 3, 4 and 5 since we are not displaying any data in those tracks. Tracks 1 and 2 hold our matched LAS Curves and Track 6 will display the Lithology data. Be sure to select the Track 1 and Track 2 Tabs to make sure that the Resistivity curves have the same starting and ending values and the Porosity curves have the same starting and ending values. The Porosity curves may be recorded as values from 0.0 to 1.0, but a curve may exceed 1.0, which will force the program to rescale to -10 to 30 instead of the real range of -0.1 to 0.3. Now select the "Plot" Button to display the Well Profile Plot.
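
The Porosity rescaling described above can be pictured with a small sketch. This is not KHAN's actual logic, which is not documented here. It only assumes that the plot limits are chosen from the largest absolute value in the curve, so that a single sample above 1.0 is enough to switch a fractional curve to the percent-style limits. The function name `porosity_limits` is hypothetical.

```python
def porosity_limits(values):
    """Pick plot limits for a Porosity curve.

    Assumption for illustration: a curve whose values all lie within
    [-1.0, 1.0] is treated as a fraction; anything larger is treated
    as porosity units (percent).
    """
    if max(abs(v) for v in values) > 1.0:
        return (-10.0, 30.0)
    return (-0.1, 0.3)


# A fractional curve stays on the fractional scale...
print(porosity_limits([0.05, 0.12, 0.21]))
# ...but one spike above 1.0 pushes it onto the percent-style scale.
print(porosity_limits([0.05, 0.12, 1.20]))
```

This is why it is worth checking the starting and ending values on the Track tabs before plotting.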

Return to the KHAN Training Screen and select the next well and match the
curves for that well and so on until all the wells in your list have been
trained.
Step 6. Build Model
- Create a Model that will be used to Predict the Electrofacies for Oil and
Gas Wells in your Project File.

To create the Model that will be used in the Predict Phase of KHAN the user selects the Create Model Button. The Create Model Action retrieves all the data from the KGS Database and builds a Dataset, which is composed of the depth regions selected for each well, the efacies corresponding to each depth range and the LAS Curve values for each depth within the depth range for every well in the Training Dataset. Once the dataset is built the following screen will be displayed.
NOTE: This will take some time depending on the number of wells, the number
of depth ranges and the number and size of the selected LAS File.
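The dataset described above has one row per logged depth that falls inside a selected depth region, carrying the well, the depth, the eFacies for that region, and the LAS Curve values at that depth. A minimal sketch of that assembly follows. The function `build_rows`, the facies names, and the sample values are all made up for illustration; the actual Create Model Action reads its data from the KGS Database.

```python
def build_rows(well_id, depths, curves, regions):
    """Assemble training rows for one well.

    depths:  list of logged depths
    curves:  dict of standard mnemonic -> list of values aligned with depths
    regions: list of (top, base, efacies) depth regions chosen for the well
    """
    rows = []
    for i, depth in enumerate(depths):
        for top, base, efacies in regions:
            if top <= depth <= base:
                row = {"well": well_id, "depth": depth, "efacies": efacies}
                row.update({m: vals[i] for m, vals in curves.items()})
                rows.append(row)
                break  # a depth belongs to at most one region
    return rows


# Made-up samples and regions; depths outside every region are skipped.
depths = [2879.5, 2880.0, 2880.5, 2881.0, 2881.5]
curves = {"GR": [40.0, 55.0, 90.0, 95.0, 60.0]}
regions = [(2879.5, 2880.0, "Sandstone"), (2880.5, 2881.0, "Shale")]

rows = build_rows("15-067-20338", depths, curves, regions)
for row in rows:
    print(row)
```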
The user is then presented with the variable selection dialog box, which should be familiar to all you Kipling fans out there. In this case, I have selected the standardized logs as predictor variables and eFacies as the response variable. I am using the standardized logs because they produced more pleasing results for this particular example. Standardizing the logs on a well-by-well basis, as I have done, means that the model is based on the patterns of relative variation within each well, rather than on the absolute magnitude of log values. Standardization is just one of a number of transformations that the user might be inclined to apply to the input logs. Others might include taking logarithms of resistivities or "normalizing" logs in the sense of shifting the log values to compensate for tool variations between wells. Thus, a more complete implementation of KHAN would provide a fairly open-ended log transformation toolkit or calculator.
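To make the well-by-well standardization concrete, here is a minimal sketch, not the code used in KHAN. It assumes standardization means subtracting each well's mean and dividing by that well's standard deviation (the population form is used here; the text does not say which form KHAN uses). The function name and the sample values are made up.

```python
import math
from statistics import mean, pstdev


def standardize_by_well(samples):
    """Standardize a log within each well.

    samples: dict of well ID -> list of log values.
    Returns a dict of well ID -> list of z-scores for that well.
    """
    out = {}
    for well, values in samples.items():
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        out[well] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return out


# Two made-up wells with the same pattern but a 50 API offset between tools.
gamma_ray = {"A": [20.0, 40.0, 60.0], "B": [70.0, 90.0, 110.0]}
z = standardize_by_well(gamma_ray)
print(z["A"])
print(z["B"])

# Taking logarithms of resistivities, another transformation mentioned above.
log_ild = [math.log10(r) for r in [1.0, 10.0, 100.0]]
print(log_ild)
```

After standardization the two wells produce the same values, which is the sense in which the model is based on relative variation within each well rather than on absolute magnitude.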
After selecting the variables, the user is then presented with the infamous
model grid specification dialog box, which I have simplified considerably
relative to the original dialog box in Kipling. I have also switched from
"Averaged Shifted Histogram" terminology ("grid spacing"
and "number of layers") back to CMAC terminology ("quantization"
and "generalization"):
The user selects the OK Button and the Model will be created and saved in
the KGS Database. This Model may be used by anyone with the permissions selected
when the Training Dataset was created.
Step 7. Predict
- Validate the Model against the wells used in training dataset.
The user now can see how the Model fits the selected list of wells. There are two parts to creating a Predict Output.
This process takes some time although not as much time as creating the model
for the training dataset. The user should see the next screen after a time.

The user is then asked to match up variables in the prediction dataset with the predictor variables used in the model. I have improved this dialog box somewhat over the original one in Kipling by having the code make default selections of the matching variables (based on the variable names and units) and allowing for an arbitrary number of variables on each side of the match:
The user matches things up by selecting a line from the Numeric Variables
in Dataset list and a line from the list of model variables and then clicking
the Match>> button. The Matching Variable shown in
the right-hand list will be updated accordingly. After matching up the variables
and clicking OK, the user will be presented with the dialog box for selecting
which variables should be copied from the prediction dataset to the results
dataset.
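The default matching mentioned above is described only as being based on variable names and units. One plausible reading is sketched below: a model variable is matched to a dataset variable when both the name and the units agree, ignoring case. The exact rule in the actual code is not documented here, so `default_matches` and its behaviour are assumptions for illustration.

```python
def default_matches(model_vars, dataset_vars):
    """Suggest a dataset variable for each model variable.

    Both arguments are lists of (name, units) pairs.
    Returns a dict of model variable name -> dataset variable name, or
    None where no default can be suggested and the user must match by hand.
    """
    matches = {}
    for m_name, m_units in model_vars:
        match = None
        for d_name, d_units in dataset_vars:
            same_name = d_name.upper() == m_name.upper()
            same_units = d_units.upper() == m_units.upper()
            if same_name and same_units:
                match = d_name
                break
        matches[m_name] = match
    return matches


model_vars = [("GR", "API"), ("ILD", "OHM-M"), ("NPHI", "PU")]
dataset_vars = [("gr", "API"), ("ILD", "OHM-M"), ("NPHI", "FRAC")]
print(default_matches(model_vars, dataset_vars))
```

In this made-up case NPHI is left unmatched because the units differ, which is the kind of case the Match>> button is there to resolve.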
WellID and Depth variables

Another benefit of the neural network model is that it does not involve prior
probabilities. (The CMAC model computes data density functions for each category
and turns them into probabilities using Bayes' formula, which involves prior
probabilities, whereas the neural network model computes probabilities directly.)
Finally, the user is asked to provide a description for the prediction results
dataset, which is what gets put in the list of Available Prediction Results
in the main application window:
For categorical prediction, the prediction results contain the predicted
probabilities of membership in each category. I have written some code for
displaying these results, which may or may not be employed in GEMINI.
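
Since the results hold a probability for each category at each depth, a single predicted category can be read off by taking the most probable one. The sketch below shows that step on made-up results. The row layout and the function `most_likely` are assumptions for illustration, not the format of the actual prediction results dataset.

```python
def most_likely(prob_rows):
    """Return (well, depth, category) using the most probable category.

    prob_rows: list of dicts with 'well', 'depth' and 'probs', where
    'probs' maps each category to its predicted probability.
    """
    return [
        (row["well"], row["depth"], max(row["probs"], key=row["probs"].get))
        for row in prob_rows
    ]


# Made-up prediction results for two depths in one well.
results = [
    {"well": "15-067-20338", "depth": 2880.0,
     "probs": {"Sandstone": 0.7, "Shale": 0.2, "Limestone": 0.1}},
    {"well": "15-067-20338", "depth": 2880.5,
     "probs": {"Sandstone": 0.1, "Shale": 0.6, "Limestone": 0.3}},
]
for item in most_likely(results):
    print(item)
```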

To launch this option in KHANApp, select a set of output results (containing probabilities, depth, and a well ID variable) from the list of Available Prediction Results and click Plot Probabilities. The next dialog box asks the user to identify the Well ID and Depth variables in the prediction results dataset:
Select the OK Button.
Enter a Title for the output file. Note: Do not use spaces in the title string. The Title is used as the filename of the saved output file, and the report section will not work if the title contains any spaces.
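A title can be checked for spaces before it is entered. The sketch below is a hypothetical convenience, not a KHAN feature: `valid_title` reports whether a title is free of whitespace, and `suggest_title` replaces runs of whitespace with underscores.

```python
def valid_title(title):
    """True if the title is non-empty and contains no whitespace."""
    return bool(title) and not any(ch.isspace() for ch in title)


def suggest_title(title):
    """Replace each run of whitespace with a single underscore."""
    return "_".join(title.split())


print(valid_title("Lithology Example"))
print(suggest_title("Lithology Example"))
```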
Example Output
