How to Cite a Data Set in APA Style
Whether you’re a "numbers person" or not, if you’re a psychology student or an early-career psychologist, you may find yourself doing some data mining. Psychologists are increasingly encouraged to provide their data online for other researchers to use and analyze. And big-data psychologist is one of the hot new jobs in the industry.
Because big data is a big deal, you’ll want to know how to cite a data set.
Reference Example
U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Office of Applied Studies. (2013). Treatment episode data set -- discharges (TEDS-D) -- concatenated, 2006 to 2009 [Data set]. doi:10.3886/ICPSR30122.v2
Because this data set has a DOI, the reference includes that DOI. For data sets without a DOI, the URL should be included in the reference, like this: "Retrieved from http://www.icpsr.umich.edu/SDA/SAMHDA/30122-0001/CODEBOOK/conc.htm"
Also note that the name of the data set is italicized and that a description of the material is included in brackets after the title, but before the ending period, for maximum clarity.
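If you assemble references programmatically, the pattern described above can be sketched in a few lines of Python. This is only an illustration of the element order shown in the example, not an official APA tool: the function name `format_dataset_reference` and its parameters are hypothetical, and because a plain string cannot carry italics, the title still needs to be italicized when the reference is typeset.

```python
def format_dataset_reference(author, year, title,
                             description="Data set", doi=None, url=None):
    """Build a plain-text data set reference in the pattern shown above.

    Hypothetical helper for illustration only. The title is returned
    without italics; apply them when rendering the reference.
    """
    if doi:
        # A DOI, when the data set has one, is used as the locator.
        locator = f"doi:{doi}"
    elif url:
        # Without a DOI, fall back to a "Retrieved from" URL.
        locator = f"Retrieved from {url}"
    else:
        raise ValueError("A DOI or a URL is needed to locate the data set.")

    # The bracketed description sits after the title, before the period.
    return f"{author}. ({year}). {title} [{description}]. {locator}"


reference = format_dataset_reference(
    author=("U.S. Department of Health and Human Services, Substance Abuse "
            "and Mental Health Services Administration, Office of Applied "
            "Studies"),
    year=2013,
    title=("Treatment episode data set -- discharges (TEDS-D) -- "
           "concatenated, 2006 to 2009"),
    doi="10.3886/ICPSR30122.v2",
)
print(reference)
```

Passing `url=` instead of `doi=` produces the "Retrieved from" form, and a different `description` (for example, "Data file and code book") changes the bracketed text.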
In-Text Citation Example
The in-text citation for this reference would be "U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Office of Applied Studies (2013)" or "(U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Office of Applied Studies, 2013)." Of course, if you cite this source a number of times, you’ll probably want to abbreviate the author name.
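The two in-text forms differ only in where the parentheses fall, which a short Python sketch can make concrete. The function name `in_text_citation` and its `parenthetical` flag are hypothetical, chosen just for this illustration; the sketch does not attempt to abbreviate the author name.

```python
def in_text_citation(author, year, parenthetical=False):
    """Return a narrative or parenthetical in-text citation.

    Hypothetical helper for illustration only.
    """
    if parenthetical:
        # Author and year both sit inside the parentheses.
        return f"({author}, {year})"
    # Narrative form: the author is part of the sentence, the year is not.
    return f"{author} ({year})"


print(in_text_citation("Pew Hispanic Center", 2004))
print(in_text_citation("Pew Hispanic Center", 2004, parenthetical=True))
```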
Related Documents
Data sets sometimes have many other documents associated with them (e.g., reports, papers, and analyses about the data; tests and measures used to procure the data; user manuals; code books). When you are citing one of these related items, whether instead of or in addition to the data, be sure to describe the format in brackets after the title. In this reference from the APA Style Guide to Electronic References, Sixth Edition, for example, "Data file and code book" is used to describe the format:
Pew Hispanic Center. (2004). Changing channels and crisscrossing cultures: A survey of Latinos on the news media [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/
The in-text citation would be "Pew Hispanic Center (2004)" or "(Pew Hispanic Center, 2004)."
Note: I modified this post on 12/23/2013 to include the DOI of the data set in the first reference example.