Field Sampling and Lab Testing
Silt samples were collected randomly over the Mackenzie Mountain areas, as shown in Figure 4. Metal and non-metal elements of interest were measured with several chemical techniques, including ICP, ICPES, ICPMS, AAS and INAA. Not every technique was used for every variable in every sample; the tests conducted were determined by the detection limits and precision of each technique. For example, As is measured by INAA, AAS and ICP. If all three tests were conducted at the same location for the same variable, the ICP result for As is recorded. The final results, after selecting among the data obtained from the different laboratory techniques, are shown in Table 1. The selection method is described in detail in the next section.
Data Selection
The raw data often contain more than one measurement for the same element. In addition, the variables are reported in different units because several laboratory techniques were used to obtain the results. Three measurement units occur in the data set: parts per million (PPM), parts per billion (PPB) and percentage (PCT). To avoid confusion, variables reported in different units must be converted to a common unit. A hierarchical approach is used to keep the most important measurement and unit for each element. The steps for data selection are listed below:
- For samples with a single measurement of an element, that measurement is kept;
- If the unit of a measurement differs from the unit of the most important variable of the sample, it is converted to that unit;
- If the ratio of measured values to total values for a variable is below 10%, the variable is regarded as invalid;
- All values less than or equal to zero are regarded as missing;
- Samples with more than 75% missing data are discarded.
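The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: PPM is assumed as the common unit, the technique priority beyond "ICP first" is hypothetical, the zero-value rule is applied before the 10% rule so that non-positive values do not count as measured, and all function and parameter names (`to_ppm`, `pick_measurement`, `select_data`) are invented for this example.

```python
# Conversion factors to PPM. PPM as the common unit is an assumption.
TO_PPM = {"PPM": 1.0, "PPB": 1e-3, "PCT": 1e4}


def to_ppm(value, unit):
    """Convert a concentration to PPM; a missing value (None) stays None."""
    if value is None:
        return None
    return value * TO_PPM[unit]


def pick_measurement(measurements, priority=("ICP", "AAS", "INAA")):
    """Pick one value per element from {technique: (value, unit)}.

    A single measurement is always kept. Otherwise the first technique
    found in `priority` wins. Only "ICP first" comes from the text; the
    rest of the ordering is hypothetical.
    """
    if len(measurements) == 1:
        value, unit = next(iter(measurements.values()))
        return to_ppm(value, unit)
    for technique in priority:
        if technique in measurements:
            value, unit = measurements[technique]
            return to_ppm(value, unit)
    return None


def select_data(samples, min_measured=0.10, max_missing=0.75):
    """Apply the missing-value rules to a list of {element: value} dicts."""
    # Values below or equal to zero are regarded as missing.
    cleaned = [
        {k: (v if v is not None and v > 0 else None) for k, v in s.items()}
        for s in samples
    ]
    elements = sorted({k for s in cleaned for k in s})
    n = len(cleaned)
    # Variables measured in fewer than 10% of the samples are invalid.
    kept = [
        e for e in elements
        if sum(s.get(e) is not None for s in cleaned) / n >= min_measured
    ]
    if not kept:
        return [], []
    # Samples with more than 75% missing data are discarded.
    rows = []
    for s in cleaned:
        row = {e: s.get(e) for e in kept}
        missing = sum(v is None for v in row.values()) / len(kept)
        if missing <= max_missing:
            rows.append(row)
    return kept, rows
```

For instance, `pick_measurement({"INAA": (2.0, "PPM"), "ICP": (3.0, "PPM")})` returns the ICP value, matching the As example in the previous section.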
The selected geochemical data were analyzed with basic and multivariate statistical tools in Rstudio (version 4.04) and Anaconda Jupyter Notebook (Python version 3.6.3). Data and analysis results were visualized in Rstudio. ArcGIS mapping tools were used to plot the cluster results on the regional geological map.
Descriptive statistics are used to give a basic summary of the data. Based on these results, the original data are log-transformed and normal-transformed before further multivariate analysis. To examine the relationships between variables, Pearson correlations are calculated and displayed as heatmaps with dendrograms. Discriminant analysis is then performed on the log-transformed data set to identify differences among element groups. Two clustering techniques, k-means and hierarchical clustering, are applied to the whole data set with Euclidean and Mahalanobis distance matrices. The cluster results are compared, and the best set is plotted in ArcGIS to highlight indications for geological exploration. The group labels produced by the cluster analysis are then used as the response variable in a Classification and Regression Tree (CART) analysis to determine which predictor variables drive the clustering.
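As a rough illustration of three of these steps (log transformation, Pearson correlation and k-means clustering), the following standard-library Python sketch may help. It is not the code used in the study, which relied on Rstudio and Jupyter tooling. The base-10 logarithm, the caller-supplied starting centroids and all function names are assumptions made for this example, and only the Euclidean distance is shown; the Mahalanobis variant is omitted.

```python
import math


def log_transform(values):
    """Base-10 log of strictly positive values (the base is an assumption)."""
    return [math.log10(v) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm starting from caller-supplied centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append([sum(c) / len(members) for c in zip(*members)])
            else:
                updated.append(centroids[k])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids
```

In practice each sample would be a vector of log-transformed element concentrations, and the resulting labels would serve as the response variable for the CART step.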