Using Data Mining for Citation Analysis

Philip B. White


This paper presents a new model for citation analysis that applies new methodological approaches to citation studies. The methods are demonstrated through an analysis of cited references from publications by the Geological Sciences faculty at the University of Colorado Boulder. The author used simple Python scripting, the Web of Science API, and OpenRefine to identify the most frequently cited journals and compare them to library holdings data, revealing materials absent from the local collection. Of the more than 20,000 citations analyzed, 80 percent cited approximately 10 percent of all titles (412 journals). A notable finding was the faculty's heavy reliance on works between zero and two years old. The streamlined model presented here removes the constraints of time and effort encountered by academic librarians interested in conducting citation analyses.
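
The core of the workflow described above (rank cited journals by frequency, find the small set of titles that accounts for most citations, then check those titles against local holdings) can be illustrated with a minimal Python sketch. It is not the author's script: the function names, the 80 percent threshold parameter, the title-matching rule, and the toy data are all assumptions made for illustration, and the step that retrieves cited references from the Web of Science API is omitted entirely.

```python
from collections import Counter


def core_titles(cited_journals, coverage=0.80):
    """Return the most-cited titles, in rank order, needed to reach `coverage`
    of all citations. `coverage` is a hypothetical parameter for this sketch."""
    counts = Counter(cited_journals)
    total = sum(counts.values())
    core, running = [], 0
    for title, n in counts.most_common():
        if running >= coverage * total:
            break
        core.append(title)
        running += n
    return core


def missing_from_holdings(titles, holdings):
    """List titles with no match in the holdings list.
    Matching here is a simple case-insensitive comparison (an assumption);
    real holdings data would likely need ISSN matching or title cleanup."""
    held = {h.strip().lower() for h in holdings}
    return [t for t in titles if t.strip().lower() not in held]


# Toy data standing in for cited-reference journal names (hypothetical).
citations = ["Journal A"] * 5 + ["Journal B"] * 3 + ["Journal C", "Journal D"]
holdings = ["journal a", "Journal C"]

core = core_titles(citations)
gaps = missing_from_holdings(core, holdings)
print(core)  # titles covering 80 percent of citations
print(gaps)  # core titles absent from the local collection
```

In this toy case two of the four titles account for 80 percent of the citations, and one of those two is not held locally. With real data, inconsistent journal name variants would need to be reconciled before counting, which is the kind of cleanup the abstract attributes to OpenRefine.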

Copyright Philip B. White

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
