The Data Management Challenge - University of Miami · PDF file · 2015-04-13
The Data Management Challenge: wrangling data in the research environment
Timothy Norris, University of Miami Libraries
The Data Management Nightmare
“Very annoying panda” video
Karen Hanson, Alisa Surkis, and Karen Yacobucci, NYU Health Sciences Libraries
https://www.youtube.com/watch?v=N2zK3sAtr-4
Why Manage Data?
From this . . .
(cartoon: “Missing: have you seen this Microsoft Word document?”)
http://www.thesundaytimes.co.uk/sto/multimedia/archive/00051/Panic-cartoon-580x35_51233a.jpg
. . . to this . . .
http://abcnews.go.com/blogs/business/2013/04/rutgers-student-offers-1000-for-data-on-stolen-laptop/
. . . to this . . .
http://retractionwatch.com/2015/01/20/prominent-geneticist-nets-retraction-two-corrections-lot-questions/#more-25256
The Data Deluge
Data Sharing Requirements
• NIH: October 2003
Data Management Requirements
• NSF: Jan 2011
• NEH: June 2011
The 2013 OSTP Memo: Open Data
• Federally funded research results should be made accessible to the public
• Both peer-reviewed publications and data
Why Manage Data
• Compliance
• Grant writing
• University policy
• Research ethics

• Productivity
• Publishing
• Knowledge creation
• Career advancement

(images: carrot + stick)
http://www.pd4pic.com/images800_/carrot-orange-vegetable.png
http://www.pd4pic.com/images/stick-outline-tree-branch-wood.png
What is DATA Management?
• Numbers
• Words
• Citations / references
• Notebooks / marginalia
• Specimens
• Field Samples
• Images
• Videos / sound recording
• Relationships
• Models
• Code
¿Data?
“Examples of Research Data and Materials include laboratory notebooks, notes of any type, photographs, films, digital images, original biological and environmental samples, protocols, numbers, graphs, charts, numerical raw experimental results, instrumental outputs from which Research Data can be derived and other deliverables under sponsored agreements.”
Johns Hopkins University (2008)
http://jhuresearch.jhu.edu/Data_Management_Policy.pdf
• File System Organization
• File Naming Conventions
• Privacy/Security Considerations
• File Format Choice
• Documentation and metadata
• Roles and responsibilities in research environment
• Storage and backup strategies
• Acquiring and cleaning data
• Sharing and collaboration strategies
• Ownership of data
• Access strategies / Access restrictions
• Data publication / Data citation
¿Data? ¿Management?
and Research DATA Management?
Before: Data Management Planning / Grant Process
During: Compliance and Productivity
After: Publication and/or Repository Deposit
Before: Data Management Planning / Grant Process
• Privacy/Security Considerations
• Storage and backup strategies
• File System Organization
• File Naming Conventions
• File Format Choice
• Documentation and metadata
• Roles and responsibilities in research environment
• Sharing and collaboration strategies
• Ownership of data
• Access strategies / Access restrictions
During: Compliance and Productivity
• Follow file naming, organization and format conventions
• Documentation and metadata
• Acquiring and cleaning data
• Regularly back up all data
• Be mindful when sharing / version control
• Access / privacy policy enforcement
After: Publication and/or Repository Deposit
• Publish
• Deposit in a repository
What data will you collect / create?
• Will you use sensors?
 – Includes survey instruments and hired research assistants
 – Will you collect data, buy data from a provider, or receive data as a contracted service?
• Will you build models?
 – Will you write code?
 – How will you parametrize the model?
 – What software (or other tools) will you use?
• Will you draw from previously published materials?
 – Textual (or other) analysis of cultural material
 – Where does the data come from?
 – Is it in the public domain or from the government?
 – Are there copyright concerns?
Sensors and Data Levels
| Active vs. Static | Data Stage | Example or Focus | Typical File Formats |
|---|---|---|---|
| ACTIVE | Raw Data | Temperature readings over time | Paper? Device-specific? .xlsx, … |
| ACTIVE | Processed Data | “Cleaned,” normalized temperature data compiled in spreadsheet | .xlsx, .sas, … |
| ACTIVE | Analyzed Data | Temperature data with averages computed, graphs charted | .xlsx, .sas, … |
| STATIC | Finalized, Published Data | Do the data support hypothesis? | .csv |

Source: http://classguides.lib.uconn.edu/content.php?pid=355458&sid=3391384
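The data stages listed for this slide can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not part of the original slides: the readings, the cleaning rule (drop missing values and values outside a plausible range), and the output file name are all hypothetical.

```python
import csv
import statistics
import tempfile
from pathlib import Path

# Hypothetical raw data: (timestamp, temperature in degrees C); None marks a missed reading.
RAW_READINGS = [
    ("2015-02-14T08:00", 21.5),
    ("2015-02-14T09:00", None),
    ("2015-02-14T10:00", 22.1),
    ("2015-02-14T11:00", 999.0),  # sensor glitch
    ("2015-02-14T12:00", 23.0),
]


def clean(readings, low=-50.0, high=60.0):
    """Processed data: drop missing values and readings outside [low, high]."""
    return [(t, v) for t, v in readings if v is not None and low <= v <= high]


def analyze(processed):
    """Analyzed data: compute the mean of the cleaned temperatures."""
    return statistics.mean(v for _, v in processed)


def publish(processed, path):
    """Static data: write the cleaned readings to a plain .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["timestamp", "temperature_c"])
        writer.writerows(processed)


if __name__ == "__main__":
    processed = clean(RAW_READINGS)
    print(f"mean temperature: {analyze(processed):.1f}")
    # Written to a scratch directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as scratch:
        publish(processed, Path(scratch) / "20150214_UM_temperature.csv")
```

Keeping the raw readings untouched and deriving every later stage from them means the published .csv can always be regenerated.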
Security and Privacy
Not all data has security or privacy “needs”, BUT . .
• Use automatic updates (mostly for virus issues)
• Use anti‐virus software
• Never connect to untrusted wireless connections
• Computer disposal?
• Know HTTPS, SSH/SCP, sFTP
• Understand certificate errors
• Do not send confidential email
• Public computers
 – Always log out
 – Never leave data/files
http://www.cnet.com/news/a-word-of-warning-about-free-public-wi-fi/
Data derived from human subjects research
• Must meet federal compliance requirements for security and privacy
• As examples: HIPAA, FERPA, FISMA
• University of Miami Human Subjects Research Office: http://uresearch.miami.edu/regulatory-compliance-services/hsro
Personally Identifiable Information (PII)
• Name, SSN, Date of Birth, Driver's License, Address, IP Address, Phone Number, anything that can uniquely identify a person
• NIH Guide for Identifying Sensitive Information: http://datacenter.cit.nih.gov/interface/interface241/PIIguide.html
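As a rough first pass before sharing a text or spreadsheet export, a few of the identifiers listed above can be flagged with pattern matching. The sketch below is only an illustration under stated assumptions: the three regular expressions are hypothetical, cover only US-style formatting, and will miss names, addresses, and anything formatted differently. It is not a compliance check.

```python
import re

# Illustrative patterns only; real PII takes many more forms than these.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
}


def find_pii(text):
    """Return (kind, matched_text) pairs for every pattern hit in text."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        hits.extend((kind, match.group()) for match in pattern.finditer(text))
    return hits


if __name__ == "__main__":
    # Made-up sample line; the values are placeholders, not real identifiers.
    sample = "Contact 305-555-0100, SSN 123-45-6789, logged from 192.168.1.20"
    for kind, value in find_pii(sample):
        print(kind, value)
```

Any hit is a prompt to look at the file by hand; an empty result does not mean the file is free of PII.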
• Password lock all devices (in case of theft or loss)
 – Set screen to lock
 – This is not foolproof: anyone with physical access and the right tools can get past a login password. See http://pogostick.net/~pnh/ntpasswd/ for example.
• Empty trash securely
 – Mac: Finder -> Preferences: Advanced tab, “Empty Trash Securely”
 – PC: Eraser (open source, GPL): http://eraser.heidi.ie/
• Encryption
 – UM Policy: http://www.miami.edu/it/index.php/policies/
 – Mac: System Preferences: Security and Privacy: FileVault
 – PC: Control Panel: BitLocker Drive Encryption
University of Miami Policies
http://www.miami.edu/it/index.php/policies/
Storage and Backup
DO
• RAID storage
• External hard drives (exFAT)
• Cloud storage and file-syncing
• Duplicate computers or hard drives
DON’T
• USB thumb drives
• Email files to yourself
• Save files without knowing their location in the computer’s file structure
The exFAT format is essential if you ever want to share a drive between a Mac and a PC.
• Mac: Applications: Utilities: Disk Utility
• PC: right click in Explorer -> Format
Have all your work in at least three places at all times: working version + two backups
Drives fail, computers break, viruses happen, computers get stolen, USB thumb drives ALWAYS fail, you will make a mistake and delete your work by accident, ex-partners seek revenge, and the list goes on . . .
File syncing tools that exist on your machine already:
• Mac: rsync
• PC: xcopy
You will have to understand the command line first:
• Mac: Applications/Utilities/Terminal
• PC: Start Button -> search “cmd”
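Try these commands:
• “dir” (PC) or “ls” (Mac)
• “pwd” (Mac) or “cd” with no arguments (PC) to show the current directory
• “cd ..”
• “pwd” (or “cd” on PC) again
• “cd ~” (Mac) or “cd %HOMEPATH%” (PC)
Help:
• Mac: % man <command> (for example, % man ls)
• PC: > <command> /? (for example, > dir /?)
See also: Data Carpentry / Software Carpentry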
• PC: xcopy
-> xcopy <source> <destination> [<options>]
-> xcopy c:\Users\tnorris\Documents\MapData\*.* G:\MapData /D /S /Y
This copies all NEWER files from the MapData directory on the local machine to the backup MapData folder on an external hard drive.
-> xcopy /?
This shows all of the options for the xcopy command. /D tells xcopy to copy only files that are newer than the copy in the destination, /S goes through all sub-directories (skipping empty ones), and /Y tells xcopy to overwrite existing files without asking the user to confirm (be careful!!).
• Mac: rsync
% rsync [<options>] <source> <destination>
% rsync -arv /Users/tnorris/Documents/MapData/* /Volumes/MyDrive/MapDataFB
This copies all new or changed files from the MapData directory on the local machine to the backup MapDataFB folder on an external hard drive named “MyDrive”.
% man rsync
This shows all of the options for the rsync command. -a tells rsync to preserve archival information (date stamps, owners, permissions), -r tells rsync to go through all sub-directories (-a already implies this), and -v tells rsync to tell you what it is doing (which files it copied).
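For readers who prefer one script that runs on both Mac and PC, the "copy only what is newer" idea behind the xcopy and rsync commands can be sketched in Python. This is a simplified illustration, not a replacement for either tool: the function name backup_newer is hypothetical, it compares modification times only, and it never deletes anything from the backup.

```python
import shutil
import tempfile
from pathlib import Path


def backup_newer(source, destination):
    """Copy files that are missing from destination or newer in source.

    Returns the list of relative paths that were copied.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # directories are created on demand below
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if dst_file.exists() and dst_file.stat().st_mtime >= src_file.stat().st_mtime:
            continue  # backup copy is already up to date
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies the modification time
        copied.append(relative)
    return copied


if __name__ == "__main__":
    # Demo in a scratch directory; the folder names mirror the slide examples.
    with tempfile.TemporaryDirectory() as scratch:
        source = Path(scratch) / "MapData"
        backup = Path(scratch) / "MapDataFB"
        (source / "layers").mkdir(parents=True)
        (source / "layers" / "roads.txt").write_text("road data", encoding="utf-8")
        print(backup_newer(source, backup))  # first run copies the file
        print(backup_newer(source, backup))  # second run finds nothing to do
```

Because copy2 preserves the modification time, a second run immediately after the first has nothing to copy, which is the behaviour the /D option and rsync's default comparison give you.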
The Cloud
http://cloudtweaks.com/2012/02/the-lighter-side-of-the-cloud-future-cloud/
http://cloudtweaks.com/2012/04/the-lighter-side-of-the-cloud-the-fog/
(cartoons used with permission)
The term “cloud computing” (or just “cloud”, in the context of computing) is a marketing buzzword with no coherent meaning. It is used for a range of different activities whose only common characteristic is that they use the Internet for something beyond transmitting files. Thus, the term spreads confusion. If you base your thinking on it, your thinking will be confused.
Richard Stallman - https://www.gnu.org/philosophy/words-to-avoid.html
Storage and Backup: file-syncing software in the cloud
• Google Drive? Dropbox? SkyDrive? iCloud?
 – Dropbox is good for temporary sharing
 – Google Drive is good for collaborative work: synchronous file editing with multiple users
• What about security?? privacy??
• BOX - https://www.box.com/
 – University of Miami has an affiliation
 – All content is encrypted
 – All platforms are supported including smart phones
 – 25 GB limit
 – Documentation: http://www.miami.edu/it/index.php/about_it/aas/ps/documentation/box/
NOTE: make sure to do “Box Login” first. It will not work if you just download.
Data Management Plan
• Compliance vs Practice
• Check all funder requirements (funder website or DMPTool)
• Consider your data
 – How will the value of your data change with time?
 – Who will want to use your data? Why?
 – Who owns the data for your research?
 – Are there privacy concerns with your data?
 – Are there existing repositories and standards for your data?
https://dmponline.dcc.ac.uk/
Data Organization
• File Structures
• Naming Conventions
Cartoon from The Far Side by Gary Larson
Far Side cartoon with labels
Think about Time and Space
• Directory Structure
• File Naming Conventions
Save Time and Space
File Naming Best Practices
DO
• useCamelCasing.docx
• use_underscores.txt
• 2015_put_The_Date_First.csv
• 20150214_useTwoDigitDateNumbers.xls
• startASeriesWithLeadingZeros_001.doc
• 20150214_UM_date-place.shp
• useFileExtensions.jpg
DON’T
• Leave spaces in the file name.xls
• Use the default save name from MS Word that is simply the long first sentence in your file.doc
• January 5 2015 Samples with the month first.xls
Show file extensions:
• Mac: Finder -> Preferences: show all filename extensions (check)
• PC: Explorer -> Organize -> Folder and search options: View tab: Hide extensions for known file types (uncheck)
Dilbert cartoon on file naming conventions: http://assets.amuniversal.com/42ec27b03718012ea5cb00163e41dd5b
BE CONSISTENT
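One way to stay consistent is to let a small helper build the names. The sketch below follows the conventions on this slide (date first as YYYYMMDD, camelCase, leading zeros for a series, an explicit extension). The function names make_file_name and naming_problems are hypothetical, and the checks cover only two of the listed problems.

```python
import re
from datetime import date


def make_file_name(day, description, extension, sequence=None, width=3):
    """Build a name such as 20150214_mapData_001.csv."""
    words = re.findall(r"[A-Za-z0-9]+", description)
    if not words:
        raise ValueError("description needs at least one letter or digit")
    # camelCase: first word lower case, later words capitalized, no spaces.
    camel = words[0].lower() + "".join(word.capitalize() for word in words[1:])
    parts = [day.strftime("%Y%m%d"), camel]
    if sequence is not None:
        parts.append(str(sequence).zfill(width))  # leading zeros keep a series sorted
    return "_".join(parts) + "." + extension.lstrip(".").lower()


def naming_problems(name):
    """Return a list of problems found in an existing file name."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if not re.search(r"\.[A-Za-z0-9]+$", name):
        problems.append("no file extension")
    return problems


if __name__ == "__main__":
    print(make_file_name(date(2015, 2, 14), "map data", "csv", sequence=1))
    print(naming_problems("January 5 2015 Samples with the month first.xls"))
```

Putting the date first in YYYYMMDD form means an alphabetical listing is also a chronological one.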
Acquiring Data and Cleanup
• In GIS (geographic information systems) THE hardest step
• Check known data repositories first
• For Google, use correct keywords to find the data you seek
 – Include file format extension
 – Use discipline-specific jargon
• Save in a known (and planned for) location on the computer
 – Not the desktop
• Name your downloaded files with conventions (time/place)
• Save original and then work on ‘versioned’ file
• Take notes on sources as you download them (URL, date, and organization/author)
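The last three points (name with conventions, keep the original, take notes on the source) can be combined into one step. The following is a sketch under assumptions: the folder names "original" and "working", the "_v01" suffix, and the JSON note format are illustrative choices, not something the slides prescribe.

```python
import json
import shutil
import tempfile
from datetime import date
from pathlib import Path


def register_download(downloaded, project_dir, url, organization, today=None):
    """File a download: dated original, versioned working copy, source notes."""
    today = today or date.today()
    downloaded = Path(downloaded)
    originals = Path(project_dir) / "original"
    working = Path(project_dir) / "working"
    originals.mkdir(parents=True, exist_ok=True)
    working.mkdir(parents=True, exist_ok=True)

    # Keep an untouched copy of the original, named with the download date first.
    stamped = f"{today:%Y%m%d}_{downloaded.name}"
    original_copy = originals / stamped
    shutil.copy2(downloaded, original_copy)

    # All edits happen on a separate, versioned copy.
    working_copy = working / f"{original_copy.stem}_v01{original_copy.suffix}"
    shutil.copy2(original_copy, working_copy)

    # Record where the data came from while it is still fresh.
    notes = {
        "file": stamped,
        "url": url,
        "organization": organization,
        "downloaded": today.isoformat(),
    }
    notes_path = originals / f"{stamped}.source.json"
    notes_path.write_text(json.dumps(notes, indent=2), encoding="utf-8")
    return original_copy, working_copy, notes_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        download = Path(scratch) / "roads.csv"
        download.write_text("id,name\n1,Main St\n", encoding="utf-8")
        # The URL and organization below are placeholders.
        for path in register_download(
            download, Path(scratch) / "project", "http://example.org/roads.csv", "Example Org"
        ):
            print(path.name)
```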
Good Text Editors
What to look for:
• No proprietary formats
• Line numbers in editor
• Find/replace tools
• Markup highlight
Free / open source:
• Gedit (PC, Linux): https://wiki.gnome.org/Apps/Gedit
• TextWrangler (Mac): http://www.barebones.com/products/textwrangler/
• Notepad++ (Windows): http://notepad-plus-plus.org/
Commercial:
• Oxygen (all platforms) [XML]: http://www.oxygenxml.com/
• Sublime Text 2 (all platforms): http://www.sublimetext.com/
• BBEdit (Mac): http://www.barebones.com/products/bbedit/
Acquiring Data and Cleanup: tools for cleaning data from all sorts of sources
• OpenRefine (formerly Google Refine)
 – Examples of tools
   – Identify typos or very similar entries (US and U.S.)
   – Identify duplicate entries
   – Data exploration and basic visualization tools
   – and so on
 – Good tutorial here: http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial
• Data Wrangler
 – Examples of tools
   – Transformations
   – Removing blanks
   – Aggregation
 – http://vis.stanford.edu/wrangler/
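To give a feel for what "very similar entries" means, the sketch below groups spellings that differ only in case, punctuation, or spacing. It is loosely modelled on the key-based clustering idea that tools like OpenRefine offer, but it is an independent simplification, not OpenRefine's implementation. The function names and sample values are hypothetical.

```python
from collections import Counter, defaultdict


def fingerprint(value):
    """Key that ignores case, punctuation, word order, and extra whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(sorted(set(kept.split())))


def cluster_variants(values):
    """Group distinct spellings that share a fingerprint (for example US and U.S.)."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value.strip())
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


def exact_duplicates(values):
    """Count entries that appear more than once after trimming whitespace."""
    counts = Counter(value.strip() for value in values)
    return {value: count for value, count in counts.items() if count > 1}


if __name__ == "__main__":
    countries = ["US", "U.S.", "Canada", "us ", "Canada"]
    print(cluster_variants(countries))
    print(exact_duplicates(countries))
```

The clusters are only candidates: a person still decides which spelling is the right one before any values are merged.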
Cleaning Data (continued)
• Tabula.technology
 – http://tabula.technology/
• Import.io
 – https://import.io/
Licensing/purchasing Data
• Purchasing guidelines
 – http://ukdataservice.ac.uk/media/455386/cd248-data-purchase-guidelines.pdf
File FormatsArchive and compressed[edit]Main article: List of archive formats.cab — A cabinet (.cab) file is a library of compressed files stored as a single file. Cabinet files are used to organize installation files that are copied to the user's system.[1].?Q? — files compressed by the SQ program7z —7‐Zip compressed fileAAC —Advanced Audio Codingace —ACE compressed fileALZ — ALZip compressed fileAPK — Applications installable on AndroidAT3 — Sony's UMD Data compression.bke — BackupEarth.com Data compressionARCARJ — ARJ compressed fileBA — Scifer Archive (.ba), Scifer External Archive Typebig — Special file compression format used by Electronic Arts for compressing the data for many of EA's gamesBIK (.bik) —Bink Video file. A video compression system developed by RAD Game ToolsBIN — compressed Archive. can be read and used by cd‐roms and java, Extractable by 7‐zip and WINRARBKF (.bkf) — Microsoft backup created by NTBACKUP.EXEbzip2 — (.bz2)bld — Skyscraper Simulator Buildingc4 — JEDMICS image files, a DOD systemcab —Microsoft Cabinetcals — JEDMICS image files, a DOD systemCLIPFLAIR (.clipflair, .clipflair.zip) ‐ ClipFlair Studio [1] component saved state file (contains component options in XML, extra/attached files and nested components' state in child .clipflair.zip files – activities are also components and can be nested at any depth)cpt/sea —Compact Pro (Macintosh)DAA — Closed‐format, Windows‐only compressed disk imagedeb — Debian install packageDMG —an Apple compressed/encrypted formatDDZ —a file which can only be used by the "daydreamer engine" created by "fever‐dreamer", a program similar to RAGS, it's mainly used to make somewhat short games.DPE — Package of AVE documents made with Aquafadas digital publishing tools..egg —Alzip Egg Edition compressed fileEGT (.egt) — EGT Universal Document also used to create compressed cabinet files replaces .ecabECAB (.ECAB, .ezip) — EGT Compressed Folder used in advanced systems to compress entire system folders, 
replaced by EGT Universal DocumentESS (.ess) — EGT SmartSense File, detects files compressed using the EGT compression system.GHO (.gho, .ghs) —Norton Ghostgzip (.gz) —Compressed fileIPG (.ipg) — Format in which Apple Inc. packages their iPod games. can be extracted through Winrarjar — ZIP file with manifest for use with Java applications.LBR (.Lawrence) — Lawrence Compiler Type fileLBR —Library fileLQR — LBR Library file compressed by the SQ program.LHA (.lzh) — Lempel, Ziv, Huffmanlzip (.lz) —Compressed filelzolzmaLZX (algorithm)MBRWizard archive (.mbw)MPQ Archives (.mpq) — Used by Blizzard gamesMacBinary (.bin)NTH (.nth) — Nokia Theme Used by Nokia Series 40 CellphonesPAK — Enhanced type of .ARC archiveParchive — (.par, .par2)Portable Application File — (.paf)PYK (.pyk) Compressed FileQuake 3 archive (.pk3) — (See note on Doom³)Doom³ archive (.pk4) — (Opens similarly to a zip archive.)RAR Rar Archive (.rar) — for multiple file archive (rar to .r01‐.r99 to s01 and so on)RAG / RAGS game file, a game playable in the RAGS game‐engine, a free program which both allows people to create games, and play games, games created have the file format "RAG game file"RPM — Red Hat package/installer for Fedora, RHEL, and similar systems.SEN Scifer Archive (.sen) — Scifer Internal Archive Typesit/sitx — StuffIt (Macintosh)SKB — Google Sketchup Backup Filetar —group of files, packaged as one file.tar.gz, .tgz — gzipped tar fileTB (.tb) — Tabbery Virtual Desktop Tab fileTIB (.tib) — Acronis True Image backupuha — Ultra High Archive Compression.uue — user unifile elementVIV — Archive format used to compress data for several video games, including Need For Speed: High Stakes.VOL — unknown archiveVSA —Altiris Virtual Software ArchiveWAX — Wavexpress — A ZIP alternative optimized for packages containing video, allowing multiple packaged files to be all‐or‐none delivered with near‐instantaneous unpacking via NTFS file system manipulation..tar.xz — an xz compressed tar file. 
(Based on LZMA2.)Z —Unix compress filezoo —based on LZWzip — popular compression formatPhysical recordable media archiving[edit]ISO —The generic file format for most optical media, including CD‐ROM, DVD‐ROM, Blu‐ray Disc, HD DVD and UMD.NRG —The proprietary optical media archive format used by Nero applications.IMG — For archiving MS‐DOS formatted floppy disks.ADF —Amiga Disk Format, for archiving Amiga floppy disksADZ —The GZip‐compressed version of ADF.DMS — Disk Masher System, a disk‐archiving system native to the Amiga.DSK — For archiving floppy disks from a number of other platforms, including the ZX Spectrum and Amstrad CPC.D64 — An archive of a Commodore 64 floppy disk.SDI — System Deployment Image, used for archiving and providing "virtual disk" functionality.MDS — DAEMON tools native disc image file format used for making images from optical CD‐ROM, DVD‐ROM, HD DVD or Blu‐ray Disc. It comes together with MDF file and can be mounted with DAEMON Tools.MDX — New DAEMON Tools file format that allows to get one MDX disc image file instead of two (MDF and MDS).DMG —Macintosh disk image files(MPEG‐1 is found in a .DAT file on a video CD.)
CDI — DiscJuggler image file
CUE — CDRWrite CUE image file
CIF — Easy CD Creator .cif format
C2D — Roxio / WinOnCD .c2d format
DAA — PowerISO .daa format
CCD, SUB, IMG — CloneCD image file
B6T — BlindWrite 5/6 image file
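The tar and .tar.gz entries in the archive list above describe a group of files packaged as one file and then gzip-compressed. A minimal sketch of that idea using Python's standard `tarfile` module follows; the helper names `make_tar_gz` and `list_tar_gz` and the sample file names are hypothetical, chosen only for illustration.

```python
import io
import tarfile


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """Pack a {name: content} mapping into an in-memory gzipped tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)  # tar headers must carry the member size
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_tar_gz(data: bytes) -> list[str]:
    """Return the member names stored in a gzipped tar archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # Hypothetical research files bundled into one archive.
    archive_bytes = make_tar_gz({
        "readme.txt": b"field notes",
        "data/results.csv": b"x,y\n1,2\n",
    })
    print(list_tar_gz(archive_bytes))
```

Working in memory keeps the sketch self-contained; writing to disk would only require passing a file path to `tarfile.open` instead of a buffer.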
Computer-aided
Computer-aided is a prefix for several categories of tools (e.g., design, manufacture, engineering) that assist professionals in their respective fields (e.g., machining, architecture, schematics).
Computer-aided design (CAD)
Computer-aided design (CAD) software assists engineers, architects and other design professionals in project design.
3dmlw — (3D Markup Language for Web) files3dxml — Dassault Systemes graphic representationACP — VA Software VA — Virtual Architecture CAD fileAMF — Additive Manufacturing File FormatAEC — DataCAD drawing format [2]AR — Ashlar‐Vellum Argon — 3D ModelingART — ArtCAM modelASC — BRL‐CAD Geometry File (old ASCII format)ASM — Solidedge Assembly, Pro/ENGINEER AssemblyBIN, BIM —Data Design System DDS‐CADBREP — Open CASCADE 3D model (shape)CCC —CopyCAD CurvesCCM — CopyCAD ModelCCS — CopyCAD SessionCAD —CadStdCATDrawing — CATIA V5 Drawing documentCATPart — CATIA V5 Part documentCATProduct — CATIA V5 Assembly documentCATProcess — CATIA V5 Manufacturing documentcgr — CATIA V5 graphic representation fileCO — Ashlar‐Vellum Cobalt —parametric drafting and 3D modelingDRW — Caddie Early version of Caddie drawing —Prior to Caddie changing to DWGDFT — Solidedge DraftDGN — MicroStation design fileDGK —Delcam GeometryDMT — DelcamMachining TrianglesDXF — ASCII Drawing Interchange file format —AutoCADDWB — VariCAD drawing fileDWF — AutoDesk's Web Design Format; AutoCAD & Revit can publish to this format; similar in concept to PDF files; AutoDesk Design Review is the readerDWG —AutoCAD and Open Design Alliance applications, Autodesk Inventor Drawing fileEASM — SolidWorks eDrawings assembly fileEDRW — eDrawings drawing fileEMB — WilcomES Designer Embroidery CAD fileEPRT — eDrawings part fileESW — AGTEK formatEXCELLON —Excellon fileEXP — Drawing Express file formatFM — FeatureCAM Part FileFMZ — FormZ Project fileG — BRL‐CAD Geometry FileGBR — Gerber fileGLM — KernelCAD modelGRB — T‐FLEX CAD FileGTC — GRAITEC Advance file formatIAM —Autodesk Inventor Assembly fileICD — IronCAD 2D CAD fileIDW —Autodesk Inventor Drawing fileIFC —buildingSMART for sharing AEC and FM dataIGES — Initial Graphics Exchange SpecificationIntergraph's Intergraph Standard File FormatsIPN —Autodesk Inventor Presentation fileIPT —Autodesk Inventor Part fileJT (visualization format) Jupiter TesselationMCD — Monu‐CAD 
(Monument/Headstone Drawing file)model — CATIA V4 part documentOCD —Orienteering Computer Aided Design (OCAD) filePAR — Solidedge PartPIPE —PIPE‐FLO Professional Piping system design filePLN — ArchiCad projectPRT — NX (recently known as Unigraphics), Pro/ENGINEER Part, CADKEY PartPSM — Solidedge SheetPSMODEL — PowerSHAPE ModelPWI — PowerINSPECT FilePYT — Pythagoras FileSKP — SketchUp ModelRLF —ArtCAM ReliefRVM ‐ AVEVA PDMS 3D Review modelRVT — AutoDesk Revit project filesRFA — AutoDesk Revit family filesS12 — Spirit file, by SofttechSCAD —OpenSCAD 3D part modelSCDOC — SpaceClaim 3D Part/AssemblySLDASM — SolidWorks Assembly drawingSLDDRW — SolidWorks 2D drawingSLDPRT — SolidWorks 3D part modelSoftimage's dotXSISTEP — Standard for the Exchange of Product model dataSTL — Stereo Lithographic data format used by various CAD systems and stereo lithographic printing machines.TCT — TurboCAD drawing templateTCW — TurboCAD for Windows 2D and 3D drawingUNV — I‐DEAS I‐DEAS (Integrated Design and Engineering Analysis Software)VC6 — Ashlar‐Vellum Graphite — 2D and 3D draftingVLM — Ashlar‐Vellum Vellum, Vellum 2D, Vellum Draft, Vellum 3D, DrawingBoardVS — Ashlar‐Vellum Vellum SolidsWRL — Similar to STL, but includes color. Used by various CAD systems and 3D printing rapid prototyping machines. Also used for VRML models on the web.XE — Ashlar‐Vellum Xenon — for Associative 3D Modeling
Electronic design automation (EDA)
Electronic design automation (EDA), or electronic computer-aided design (ECAD), is specific to the field of electrical engineering.
BRD — Board file for EAGLE Layout Editor, a commercial PCB design tool
BSDL — Description language for testing through JTAG
CDL — Transistor-level netlist format for IC design
CPF — Power-domain specification in SoC implementation (see also UPF)
DEF — Gate-level layout
DSPF — Detailed Standard Parasitic Format, analog-level parasitics of interconnections in IC design
EDIF — Vendor-neutral gate-level netlist format
FSDB — Analog waveform format (see also Waveform viewer)
GDSII — Format for PCB and layout of integrated circuits
HEX — ASCII-coded binary format for memory dumps
LEF — Library Exchange Format, physical abstract of cells for IC design
LIB — Library modeling (function, timing) format
MS12 — NI Multisim file
OASIS — Open Artwork System Interchange Standard
OpenAccess — Design database format with APIs
SDC — Synopsys Design Constraints, format for synthesis constraints
SDF — Standard for gate-level timings
SPEF — Standard format for parasitics of interconnections in IC design
SPI, CIR — SPICE netlist, device-level netlist and commands for simulation
SREC, S19 — S-record, ASCII-coded format for memory dumps
STIL — Standard Test Interface Language, IEEE 1450-1999 standard for test patterns for ICs
SV — SystemVerilog source file
UPF — Standard for power-domain specification in SoC implementation
V — Verilog source file
VCD — Standard format for digital simulation waveform
VHD, VHDL — VHDL source file
WGL — Waveform Generation Language, format for test patterns for ICs
Test technology
Files output from automatic test equipment, or post-processed from such output.
Standard Test Data FormatDatabase[edit]4DB — 4D database Structure file4DD —4D database Data file4DIndy — 4D database Structure Index file4DIndx — 4D database Data Index file4DR — 4D database Data resource file (in old 4D versions)ACCDB — Microsoft Database (Microsoft Office Access 2007 and later)ACCDE — Compiled Microsoft Database (Microsoft Office Access 2007 and later)ADT — Sybase Advantage Database Server (ADS)APR — Lotus Approach data entry & reportsBOX — Lotus Notes Post Office mail routing databaseCHML — Krasbit Technologies Encrypted database file for 1 click integration between contact management software and the chameleon(tm) line of imaging workflow solutionsDAF —Digital Anchor data fileDAT —DOS BasicDAT — Intersystems Caché database fileDB — ParadoxDB — SQLiteDBF — db/dbase II,III,IV and V, Clipper, Harbour/xHarbour, Fox/FoxPro, OracleEGT — EGT Universal Document, used to compress sql databases to smaller files, may contain original EGT database style.ESS — EGT SmartSense is a database of files and its compression style. 
Specific to EGT SmartSenseEAP — Enterprise Architect ProjectFDB — Firebird DatabasesFDB — Navision database fileFP, FP3, FP5, and FP7 — FileMaker ProFRM — MySQL table definitionGDB —Borland InterBase DatabasesGTABLE — Google Drive Fusion TableKEXI —Kexi database file (SQLite‐based)KEXIC — shortcut to a database connection for a Kexi databases on a serverKEXIS — shortcut to a Kexi databaseLDB — Temporary database file, only existing when database is openMDA —Add‐in file for Microsoft AccessMDB — Microsoft Access databaseADP —Microsoft Access project (used for accessing databases on a server)MDE — Compiled Microsoft Database (Access)MDF — Microsoft SQL Server DatabaseMYD — MySQL MyISAM table dataMYI — MySQL MyISAM table indexNCF — Lotus Notes configuration fileNSF — Lotus Notes databaseNTF — Lotus Notes database design templateNV2 —QW Page NewViews object oriented accounting databaseODB — LibreOffice Base or OpenOffice Base databaseORA —Oracle tablespace files sometimes get this extension (also used for configuration files)PDB — Palm OS DatabasePDI —Portable Database ImagePDX — Corel Paradox database managementPRC — Palm OS resource databaseSQL — bundled SQL queriesREC — GNU recutils databaseREL — Sage Retrieve 4GL data fileRIN — Sage Retrieve 4GL index fileSDB — StarOffice's StarBaseSDF — SQL Compact Database fileUDL — Universal Data LinkwaData — Wakanda (software) database Data filewaIndx — Wakanda (software) database Index filewaModel —Wakanda (software) database Model filewaJournal —Wakanda (software) database Journal fileWDB — Microsoft Works DatabaseWMDB — Windows Media Database file — The CurrentDatabase_360.wmdb file can contain file name, file properties, music, video, photo and playlist information.
Desktop publishing[edit]AVE / ZAVE — AquafadasCHP / pub / STY / CAP / CIF / VGR / FRM —Ventura Publisher — Xerox (DOS / GEM)DTP — Greenstreet Publisher, GST PressWorksGDRAW — Google Drive DrawingINDD —Adobe InDesignPSD ‐ Adobe PhotoshopMCF — FotoInsight DesignerPMD — Adobe PageMakerPPP — Serif PagePlusPUB — Microsoft PublisherQXD —Quark XPressFM —Adobe FrameMakerSLA / SCD — Scribus
Document
These files store formatted text and plain text.
602 — Text602 document
ABW — AbiWord Document
ACL — MS Word AutoCorrect List
AFP — Advanced Function Presentation — IBM
AMI — Lotus Ami Pro
Amigaguide
ANS — American National Standards Institute (ANSI) text
ASC — ASCII text
AWW — Ability Write
CCF — Color Chat 1.0
CSV — ASCII text as comma-separated values, used in spreadsheets and database management systems
CWK — ClarisWorks / AppleWorks document
DBK — DocBook XML sub-format
DOC — Microsoft Word document
DOCM — Microsoft Word macro-enabled document
DOCX — Office Open XML document
DOT — Microsoft Word document template
DOTX — Office Open XML text document template
EGT — EGT Universal Document
EPUB — EPUB open standard for e-books
EZW — Reagency Systems easyOFFER document[3]
FDX — Final Draft
FTM — Fielded Text Meta
FTX — Fielded Text (Declared)
GDOC — Google Drive Document
HTML — HyperText Markup Language (.html, .htm)
HWP — Haansoft (Hancom) Hangul Word Processor document
HWPML — Haansoft (Hancom) Hangul Word Processor Markup Language document
LOG — Text log file
LWP — Lotus Word Pro
MBP — metadata for Mobipocket documents
MD — Markdown text document
MCW — Microsoft Word for Macintosh (versions 4.0–5.1)
Mobi — Mobipocket documents
NB — Mathematica Notebook
NBP — Mathematica Player Notebook
ODM — OpenDocument master document
ODT — OpenDocument text document
OTT — OpenDocument text document template
OMM — OmmWriter text document
PAGES — Apple Pages document
PAP — Papyrus word processor document
PDAX — Portable Document Archive (PDA) document index file
PDF — Portable Document Format
Radix-64
RTF — Rich Text document
QUOX — Question Object File Format for Quobject Designer or Quobject Explorer
RPT — Crystal Reports
SDW — StarWriter text document, used in earlier versions of StarOffice
SE — Shuttle Document
STW — OpenOffice.org XML (obsolete) text document template
Sxw — OpenOffice.org XML (obsolete) text document
TeX — TeX
INFO — Texinfo
Troff
TXT — ASCII or Unicode plain text file
UOF — Uniform Office Format
UOML — Unique Object Markup Language
VIA — Revoware VIA Document Project File
WPD — WordPerfect document
WPS — Microsoft Works document
WPT — Microsoft Works document template
WRD — WordIt! document
WRF — ThinkFree Write
WRI — Microsoft Write document
XHTML (xhtml, XHT..) — eXtensible Hyper-Text Markup Language
XML — eXtensible Markup Language
XPS — Open XML Paper Specification
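The CSV entry in the document list describes plain text stored as comma-separated values. As a rough illustration of why a CSV library is preferable to splitting on commas by hand, the sketch below round-trips a small table through Python's standard `csv` module. The helper names `rows_to_csv` and `csv_to_rows` and the sample rows are assumptions made up for this example.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (lists of strings) to CSV text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    # Hypothetical table: the second field of the data row contains a comma,
    # so the writer has to quote it to keep the columns intact.
    table = [["sample", "note"], ["A1", "contains, a comma"]]
    text = rows_to_csv(table)
    print(text)
    print(csv_to_rows(text) == table)
```

The quoting of the field that contains a comma is handled by the writer and undone by the reader, which a naive `split(",")` would get wrong.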
Financial records
MYO — MYOB Limited (Windows) File
MYOB — MYOB Limited (Mac) File
TAX — TurboTax File
YNAB — You Need a Budget (YNAB) File
Financial data transfer formats
Interactive Financial Exchange (IFX) — XML-based specification for various forms of financial transactions
Open Financial Exchange (.ofx) — open standard supported by CheckFree and Microsoft and partially by Intuit; SGML and later XML based
QFX — proprietary pay-only file format used only by Intuit
Quicken Interchange Format (.qif) — open standard formerly supported by Intuit
Font file[edit]ABF — Adobe Binary Screen FontAFM — Adobe Font MetricsBDF — Bitmap Distribution FormatBMF — ByteMap Font FormatFNT — Bitmapped Font —Graphical Environment ManagerFON —Bitmapped Font — Microsoft WindowsMGF — MicroGrafx FontOTF — OpenType FontPCF — Portable Compiled FontPostScript Font — Type 1, Type 2PFA — Printer Font ASCIIPFB — Printer Font Binary —AdobePFM — Printer Font Metrics —AdobeAFM — Adobe Font MetricsFOND — Font Description resource —Mac OSSFD — FontForge spline font database FontSNF — Server Normal FormatTDF — TheDraw FontTFM — TeX font metricTTF (.ttf, .ttc) — TrueType FontWOFF — Web Open Font Format
Geographic information system[edit]ASC — ASCII point of interest (POI) text fileAPR — ESRI ArcView 3.3 and earlier project fileDEM — USGS DEM file formatE00 — ARC/INFO interchange file formatGeoTIFF — Geographically located raster dataGML — Geography Markup Language file[4]GPX — XML‐based interchange formatMXD — ESRI ArcGIS project file, 8.0 and higherNTF — National Transfer Format fileOV2 —TomTom POI overlay fileSHP — ESRI shapefileTAB — MapInfo Table file formatWorld TIFF — Geographically located raster data: text file giving corner coordinate, raster cells per unit, and rotationDTED —Digital Terrain Elevation DataKML —Keyhole Markup Language, XML‐based
Graphical information organizers[edit]3DT — 3D Topicscape The database in which the meta‐data of a 3D Topicscape is held. A 3D Topicscape is a form of 3D concept map (like a 3D mind‐map) used to organize ideas, information and computer files.ATY — 3D Topicscape file, produced when an association type is exported by 3D Topicscape. Used to permit round‐trip (export Topicscape, change files and folders as desired, re‐import them to 3D Topicscape).CAG — Linear Reference System.FES — 3D Topicscape file, produced when a fileless occurrence in 3D Topicscape is exported to Windows. Used to permit round‐trip (export Topicscape, change files and folders as desired, re‐import them to 3D Topicscape).MGMF — MindGenius Mind Mapping Software file format.MM — FreeMind mind map file (XML).MMP —Mind Manager mind map file.TPC — 3D Topicscape file, produced when an inter‐Topicscape topic link file is exported to Windows. Used to permit round‐trip (export Topicscape, change files and folders as desired, re‐import them to 3D Topicscape).
Graphics
Color palettes
ACT — Adobe Color Table. Contains a raw color palette and consists of 256 24-bit RGB color values.
PAL — Microsoft palette file
Color management
ICC/ICM — Color profile conforming to the specification of the ICC.
Raster graphics[edit]Raster (or Bitmap) files store images as a group of pixels.ASE — Adobe SwatchART — America Online proprietary formatBMP — Microsoft Windows Bitmap formatted imageBLP —Blizzard Entertainment proprietary texture formatCD5 — Chasys Draw IES imageCIT — Intergraph is a monochrome bitmap formatCPT — Corel PHOTO‐PAINT imageCR2 — Canon camera raw format. Photos will have this format on some Canon cameras if the quality "RAW" is selected in camera settings.CUT — Dr. Halo image fileDDS —DirectX texture fileDIB —Device‐Independent Bitmap graphicDjVu — DjVu for scanned documentsEGT — EGT Universal Document, used in EGT SmartSense to compress PNG files to yet a smaller fileExif — Exchangeable image file format (Exif) is a specification for the image file format used by digital camerasGIF —CompuServe's Graphics Interchange FormatGPL — GIMP Palette, using a textual representation of color names and RGB valuesGRF — Zebra Technologies proprietary formatICNS — file format use for icons in Mac OS X. Contains bitmap images at multiple resolutions and bitdepths with alpha channel.ICO —a file format used for icons in Microsoft Windows. Contains small bitmap images at multiple resolutions and sizes.IFF (.iff, .ilbm, .lbm) — ILBMJNG — a single‐frame MNG using JPEG compression and possibly an alpha channel.JPEG, JFIF (.jpg or .jpeg) — Joint Photographic Experts Group — a lossy image format widely used to display photographic images.JP2 — JPEG2000JPS — JPEG StereoLBM —Deluxe Paint image fileMAX — ScanSoft PaperPort documentMIFF — ImageMagick's native file formatMNG —Multiple Network Graphics, the animated version of PNGMSP — a file format used by old versions of Microsoft Paint. Replaced with BMP in Microsoft Windows 3.0NITF — A U.S. 
Government standard commonly used in Intelligence systemsOTA bitmap (Over The Air bitmap) — a specification designed by Nokia for black and white images for mobile phonesPBM — Portable bitmapPC1 — Low resolution, compressed Degas picture filePC2 — Medium resolution, compressed Degas picture filePC3 — High resolution, compressed Degas picture filePCF — Pixel Coordination FormatPCX — a lossless format used by ZSoft's PC Paint, popular at one time on DOS systems.PDN —Paint.NET image filePGM — Portable graymapPI1 — Low resolution, uncompressed Degas picture filePI2 —Medium resolution, uncompressed Degas picture file. Also Portrait Innovations encrypted image format.PI3 —High resolution, uncompressed Degas picture filePICT, PCT — Apple Macintosh PICT imagePNG —Portable Network Graphic (lossless, recommended for display and edition of graphic images)PNM — Portable anymap graphic bitmap imagePNS — PNG StereoPPM — Portable Pixmap (Pixel Map) imagePSB — Adobe Photoshop Big image file (for large files)PSD, PDD —Adobe Photoshop DrawingPSP — Paint Shop Pro imagePX — Pixel image editor image filePXM — Pixelmator image filePXR — Pixar Image Computer image fileQFX — QuickLink Fax imageRAW — General term for minimally processed image data (acquired by a digital camera)RLE —a run‐length encoded imageSCT — Scitex Continuous Tone image fileSGI, RGB, INT, BW — Silicon Graphics ImageTGA (.tga, .targa, .icb, .vda, .vst, .pix) — Truevision TGA (Targa) imageTIFF (.tif or .tiff) —Tagged Image File Format (usually lossless, but many variants exist, including lossy ones)TIFF/EP (.tif or .tiff) — ISO 12234‐2; tends to be used as a basis for other formats rather than in its own right.VTF — Valve Texture FormatXBM — X Window System BitmapXCF — GIMP image (from Gimp's origin at the eXperimental Computing Facility of the University of California)XPM — X Window System PixmapVector graphics[edit]Vector graphics use geometric primitives such as points, lines, curves, and polygons to represent images.
3DV —3‐D wireframe graphics by Oscar GarciaAMF — Additive Manufacturing File FormatAWG —Ability DrawAI —Adobe Illustrator DocumentCGM — Computer Graphics Metafile an ISO StandardCDR — CorelDRAW DocumentCMX — CorelDRAW vector imageDXF — ASCII Drawing Interchange file Format, used in AutoCAD and other CAD‐programsE2D — 2‐dimensional vector graphics used by the editor which is included in JFireEGT — EGT Universal Document, EGT Vector Draw images are used to draw vector to a websiteEPS — Encapsulated PostscriptFS — FlexiPro fileGBR — Gerber fileODG — OpenDocument DrawingMOVIE.BYURenderManSVG — Scalable Vector Graphics, employs XMLScene description languages (3D vector image formats)STL — Stereo Lithographic data format (see STL (file format)) used by various CAD systems and stereo lithographic printing machines. See above.VRML Uses .wrl extension — Virtual Reality Modeling Language, for the creation of 3D viewable web images.X3DSXD — OpenOffice.org XML (obsolete) DrawingV2D — voucher design used by the voucher management included in JFireVND — Vision numeric Drawing file used in TypeEdit, Gravostyle.WMF —Windows Meta FileEMF — Enhanced (Windows) MetaFile, an extension to WMFART — Xara — Drawing (superseded by XAR)XAR — Xara — Drawing3‐D graphics[edit]3D graphics are 3D models that allow building models in real‐time or non real‐time 3D rendering.
3DMF — QuickDraw 3D Metafile (.3dmf)3DM — OpenNURBS Initiative 3D Model (used by Rhinoceros 3D) (.3dm)3DS — Legacy 3D Studio Model (.3ds)ABC — Alembic (Computer Graphics)AC — AC3D Model (.ac)AMF — Additive Manufacturing File FormatAN8 —Anim8or Model (.an8)AOI — Art of Illusion Model (.aoi)B3D — Blitz3D Model (.b3d)BLEND —Blender (.blend)BLOCK —Blender encrypted blend files (.block)C4D — Cinema 4D (.c4d)Cal3D — Cal3D (.cal3d)CCP4 — X‐ray crystallography voxels (electron density)CFL —Compressed File Library (.cfl)COB —Caligari Object (.cob)CORE3D — Coreona 3D Coreona 3D Virtual File(.core3d)CTM — OpenCTM (.ctm)DAE —COLLADA (.dae)DFF — RenderWare binary stream, commonly used by Grand Theft Auto III‐era games as well as other RenderWare titlesDPM — deepMesh (.dpm)DTS — Torque Game Engine (.dts)EGG —Panda3D EngineFACT —Electric Image (.fac)FBX — Autodesk FBX (.fbx)G — BRL‐CAD geometry (.g)GLM — Ghoul Mesh (.glm)JAS — Cheetah 3D file (.jas)LWO — Lightwave Object (.lwo)LWS — Lightwave Scene (.lws)LXO — LuxologyModo (software) file (.lxo)MA — Autodesk Maya ASCII File (.ma)MAX — Autodesk 3D Studio Max file (.max)MB —Autodesk Maya Binary File (.mb)MD2 — Quake 2 model format (.md2)MD3 — Quake 3 model format (.md3)MDX — Blizzard Entertainment's own model format (.mdx)MESH — New York University(.m)MESH — Meshwork Model (.mesh)MM3D — Misfit Model 3d (.mm3d)MPO — Multi‐Picture Object —This JPEG standard is used for 3d images, as with the Nintendo 3DSMRC — voxels in cryo‐electron microscopyNIF —Gamebryo NetImmerse File (.nif)OBJ — Wavefront .obj file (.obj)OFF — OFF Object file format (.off)OGEX —Open Game Engine Exchange (OpenGEX) format (.ogex)PLY —Polygon File Format / Stanford Triangle Format (.ply)PRC — Adobe PRC (embedded in PDF files)POV —POV‐Ray document (.pov)RWX — RenderWare Object (.rwx)SIA —Nevercenter Silo Object (.sia)SIB —Nevercenter Silo Object (.sib)SKP — Google Sketchup file (.skp)SLDASM — SolidWorks Assembly Document (.sldasm)SLDPRT — SolidWorks Part Document 
(.sldprt)
SMD — Valve Studiomdl Data format (.smd)
U3D — Universal 3D file format (.u3d)
VIM — Revizto visual information model format (.vimproj)
VRML97 — VRML Virtual reality modeling language (.wrl)
VUE — Vue scene file (.vue)
VWX — Vectorworks (.vwx)
WINGS — Wings3D (.wings)
W3D — Westwood 3D Model (.w3d)
X — DirectX 3D Model (.x)
X3D — Extensible 3D (.x3d)
Z3D — Zmodeler (.z3d)
Links and shortcuts
Alias (Mac OS)
JNLP — Java Network Launching Protocol, an XML file used by Java Web Start for starting Java applets over the Internet
LNK — binary-format file shortcut in Microsoft Windows 95 and later
URL — INI file pointing to a URL bookmark/Internet shortcut in Microsoft Windows
.desktop — Desktop entry on Linux desktop environments
Mathematical
Harwell-Boeing file format — a format designed to store sparse matrices
MML — MathML, Mathematical Markup Language
ODF — OpenDocument Math Formula
SXM — OpenOffice.org XML (obsolete) Math Formula
Object code, executable files, shared and dynamically linked libraries[edit].8BF files — plugins for some photo editing programs including Adobe Photoshop, Paint Shop Pro, GIMP and Helicon Filter..a — Objective C native static librarya.out — (no suffix for executable image, .o for object files, .so for shared object files) classic UNIX object format, now often superseded by ELFAPK — Android PackageAPP — A folder found on Mac OS X systems containing program code and resources, appearing as a single file.BAC — an executable image for the RSTS/E system, created using the BASIC‐PLUS COMPILE command[5]BPL —a Win32 PE file created with Borland Delphi or C++Builder containing a package.Bundle — a Macintosh plugin created with Xcode or make which holds executable code, data files, and folders for that code..Class —used in JavaCOFF (no suffix for executable image, .o for object files) — UNIX Common Object File Format, now often superseded by ELFCOM files — commands used in DOSDCU —Delphi compiled unitDOL — the file format used by the Gamecube and Wii, short for Dolphin the codename of the Gamecube..EAR — archives of Java enterprise applicationsELF — (no suffix for executable image, .o for object files, .so for shared object files) used in many modern Unix and Unix‐like systems, including Solaris, other System V Release 4 derivatives, Linux, and BSD)expander (see bundle)DOS executable (.exe — used in DOS).IPA — apple IOS application executable file. 
Another form of zip file..JAR —archives of Java class files.XPI — PKZIP archive that can be run by Mozilla web browsers to install software)Mach‐O — (no suffix for executable image, .o for object files, .dylib and .bundle for shared object files) Mach based systems, notably native format of Mac OS X)NetWare Loadable Module (.NLM) — the native 32‐bit binaries compiled for Novell's NetWare Operating System (versions 3 and newer)New Executable (.EXE — used in multitasking ("European") MS‐DOS 4.0, 16‐bit Microsoft Windows, and OS/2).o — un‐linked object files directly from the compiler.Portable Executable (.EXE, .DLL —used in Microsoft Windows and some other systems)Preferred Executable Format — (Mac OS versions 9 and earlier; compatible with Mac OS X via the Classic emulator).s1es — Executable used for S1ES learning system..so — shared library, typically ELFValue Added Process (.VAP) — the native 16‐bit binaries compiled for Novell's NetWare Operating System (version 2, NetWare 286, Advanced NetWare, etc.).WAR —archives of Java Web applicationsXBE — Xbox executable.XAP — Windows Phone packageXCOFF — (no suffix for executable image, .o for object files, .a for shared object files) extended COFF, used in AIXXEX — Xbox 360 executable
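Several entries in the list above share an extension across formats (.EXE, for example) or are ZIP containers under another name (JAR, APK, IPA, XPI), so the extension alone does not identify a file. One common complement is to look at the first bytes of the file. The sketch below checks a few well-known signatures; the `SIGNATURES` table and the function `sniff_format` are hypothetical names for this illustration, and the table is deliberately incomplete.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x7fELF", "ELF object or executable"),
    (b"MZ", "DOS/Windows executable (MZ header, includes PE)"),
    (b"PK\x03\x04", "ZIP container (also used by JAR, APK, IPA, XPI)"),
    (b"\x1f\x8b", "gzip-compressed data"),
]


def sniff_format(header: bytes) -> str:
    """Guess a format from the first bytes of a file; 'unknown' if no match."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file on disk to apply sniff_format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))


if __name__ == "__main__":
    print(sniff_format(b"\x7fELF\x02\x01\x01\x00"))
    print(sniff_format(b"PK\x03\x04\x14\x00\x00\x00"))
    print(sniff_format(b"plain text"))
```

A signature match only narrows the possibilities: a ZIP signature, for instance, does not say whether the container is a JAR, an APK, or an ordinary archive.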
Object extensions
.VBX — Visual Basic extensions
.OCX — Object Control extensions
.TLB — Windows Type Library
Page description language
DVI
EGT — Universal Document can be used to store CSS type styles (*.egt)
PLD
PCL
PDF — Portable Document Format
PostScript (.ps, .ps.gz)
SNP — Microsoft Access Report Snapshot
XPS
XSL-FO (Formatting Objects)
Configurations, Metadata
CSS — Cascading Style Sheets
XSLT, XSL — XML Style Sheet (.xslt, .xsl)
TPL — Web template (.tpl)
Personal information manager
MSG — Microsoft Outlook task manager
ORG — Lotus Organizer PIM package
PST — Microsoft Outlook email communication
SC2 — Microsoft Schedule+ calendar
Presentation[edit]GSLIDES — Google Drive PresentationKEY, KEYNOTE — Apple Keynote PresentationNB — Mathematica SlideshowNBP — Mathematica Player slideshowODP —OpenDocument PresentationOTP — OpenDocument Presentation templatePEZ — Prezi Desktop PresentationPOT — Microsoft PowerPoint templatePPS — Microsoft PowerPoint ShowPPT — Microsoft PowerPoint PresentationPPTX — Office Open XML PresentationPRZ — Lotus Freelance GraphicsSDD — StarOffice's StarImpressSHF — ThinkFree ShowSHOW —Haansoft(Hancom) Presentation software documentSHW — Corel Presentations slide show creationSLP — Logix‐4D Manager Show Control ProjectSSPSS — SongShow Plus Slide ShowSTI —OpenOffice.org XML (obsolete) Presentation templateSXI —OpenOffice.org XML (obsolete) PresentationTHMX — Microsoft PowerPoint theme templateWATCH — Dataton Watchout Presentation
Project management software
MPP — Microsoft Project
Reference management software
Formats of files used for bibliographic information (citation) management.
bib — BibTeX
enl — EndNote
ris — Research Information Systems RIS (file format)
Scientific data (data exchange)
FITS (Flexible Image Transport System) — standard data format for astronomy (.fits)
Silo — a storage format for visualization developed at Lawrence Livermore National Laboratory
SPC — spectroscopic data
EAS3 — binary file format for structured data
OST (Open Spatio-Temporal) — extensible, mainly images with related data, or just pure data; meant as an open alternative for microscope images
CCP4 — X-ray crystallography voxels (electron density)
MRC — voxels in cryo-electron microscopy
HITRAN — spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit)
Simple Data Format (SDF) — a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays.
Multi-domain
NetCDF — Network common data format
HDR, [HDF], h4 or h5 — Hierarchical Data Format
SDXF — Structured Data Exchange Format
CDF — Common Data Format
CGNS — CFD General Notation System
FMF — Full-Metadata Format
Meteorology
GRIB — Grid In Binary, WMO format for weather model data
BUFR — WMO format for weather observation data
PP — UK Met Office format for weather model data
NASA-Ames — Simple text format for observation data. First used in aircraft studies of the atmosphere.
Chemistry
CML — Chemical Markup Language (CML) (.cml)
Chemical table file (CTab) (.mol, .sd, .sdf)
Joint Committee on Atomic and Molecular Physical Data (JCAMP) (.dx, .jdx)
Simplified molecular input line entry specification (SMILES) (.smi)
Mathematics
graph6, sparse6 — ASCII encoding of adjacency matrices (.g6, .s6)
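To make the graph6 entry concrete: for small graphs, the format writes the vertex count as one printable character and then packs the upper triangle of the adjacency matrix, column by column, into 6-bit groups, each shifted by 63 into printable ASCII. The sketch below covers only simple undirected graphs with at most 62 vertices (larger graphs use a longer size prefix, which is omitted here). The function name `to_graph6` is a hypothetical choice for this example.

```python
def to_graph6(n, edges):
    """Encode a simple undirected graph on vertices 0..n-1 as a graph6 string.

    Only the short form (n <= 62) is handled in this sketch.
    """
    if not 0 <= n <= 62:
        raise ValueError("this sketch only supports 0 <= n <= 62")
    adjacent = {frozenset(edge) for edge in edges}

    # Upper triangle in column order: (0,1), (0,2), (1,2), (0,3), ...
    bits = []
    for j in range(1, n):
        for i in range(j):
            bits.append(1 if frozenset((i, j)) in adjacent else 0)

    # Pad with zeros to a multiple of 6 bits.
    while len(bits) % 6:
        bits.append(0)

    chars = [chr(n + 63)]  # size prefix
    for start in range(0, len(bits), 6):
        value = 0
        for bit in bits[start:start + 6]:
            value = (value << 1) | bit
        chars.append(chr(value + 63))  # shift into printable ASCII
    return "".join(chars)


if __name__ == "__main__":
    triangle = [(0, 1), (0, 2), (1, 2)]
    print(to_graph6(3, triangle))
```

For the triangle, the three adjacency bits are 111, padded to 111000 (decimal 56), so the output is the size character `B` followed by `w`.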
Biology
Molecular biology and bioinformatics:
AB1 — In DNA sequencing, chromatogram files used by instruments from Applied Biosystems
ACE — A sequence assembly format
BAM — Binary compressed SAM format
BCF — Binary compressed VCF format
BED — The browser extensible display format is used for describing genes and other features of DNA sequences
CAF — Common Assembly Format for sequence assembly
EMBL — The flatfile format used by the EMBL to represent database records for nucleotide and peptide sequences from EMBL databases
FASTA — The FASTA file format, for sequence data. Sometimes also given as FNA or FAA (Fasta Nucleic Acid or Fasta Amino Acid).
FASTQ — The FASTQ file format, for sequence data with quality. Sometimes also given as QUAL.
GCPROJ — The Genome Compiler project. Advanced file format for genetic data to be designed, shared and visualized.
GenBank — The flatfile format used by the NCBI to represent database records for nucleotide and peptide sequences from the GenBank and RefSeq databases
GFF — The General feature format is used for describing genes and other features of DNA, RNA and protein sequences
GTF — The Gene transfer format is used to hold information about gene structure.
NEXUS — The Nexus file encodes mixed information about genetic sequence data in a block-structured format.
NWK — The Newick tree format represents graph-theoretical trees with edge lengths using parentheses and commas, and is useful for holding phylogenetic trees.
PDB — structures of biomolecules deposited in Protein Data Bank. Also used for exchanging protein/nucleic acid structures.
PHD — Phred output, from the basecalling software Phred
SAM — Sequence Alignment/Map format, in which the results of the 1000 Genomes Project will be released.
SCF — Staden chromatogram files used to store data from DNA sequencing
SBML — The Systems Biology Markup Language is used to store biochemical network computational models
SFF — Standard Flowgram Format
Stockholm — The Stockholm format for representing multiple sequence alignments
Swiss-Prot — The flatfile format used to represent database records for protein sequences from the Swiss-Prot database
VCF — Variant Call Format, a standard created by the 1000 Genomes Project that lists and annotates the entire collection of human variants (with the exception of approximately 1.6 million variants).
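As an illustration of the FASTA entry above, a FASTA file is plain text in which each record starts with a header line beginning with `>` and is followed by one or more lines of sequence. The sketch below is a minimal reader under that assumption; the function name `parse_fasta` and the sample records are hypothetical, and real-world files may need more careful handling (duplicate headers, comments, very large inputs).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header].append(line)
    # Sequences may be wrapped over several lines; join them back together.
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    # Made-up example records, not real sequence data.
    sample = ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n"
    for name, sequence in parse_fasta(sample).items():
        print(name, len(sequence), sequence)
```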
Biomedical imaging
• Digital Imaging and Communications in Medicine (DICOM) (.dcm)
• Neuroimaging Informatics Technology Initiative (NIfTI)
  • .nii: single-file (combined data and meta-data) style
  • .nii.gz: gzip-compressed, used transparently by some software, notably the FMRIB Software Library (FSL)
  • .gii: single-file (combined data and meta-data) style; NIfTI offspring for brain surface data
  • .img, .hdr: dual-file (separate data and meta-data, respectively) style
• AFNI data, meta-data (.BRIK, .HEAD)
• Massachusetts General Hospital imaging format, used by the FreeSurfer brain analysis package
  • .MGH: uncompressed
  • .MGZ: zip-compressed
• Analyze data, meta-data (.img, .hdr)
• Signed Differential Mapping (SDM) brain maps and/or distributions (.sdm)
• Medical Imaging NetCDF (MINC) format, previously based on NetCDF; since version 2.0, based on HDF5 (.mnc)
Biomedical signals (time series)
• ACQ: AcqKnowledge File Format for Windows/PC from Biopac Systems Inc., Goleta, CA, USA
• BCI2000: the BCI2000 project, Albany, NY, USA
• BDF: BioSemi data format from BioSemi B.V., Amsterdam, Netherlands
• BKR: the EEG data format developed at the University of Technology Graz, Austria
• CFWB: Chart Data File Format from ADInstruments Pty Ltd, Bella Vista NSW, Australia
• DICOM Waveform: an extension of DICOM for storing waveform data
• ecgML: a markup language for electrocardiogram data acquisition and analysis
• EDF/EDF+: European Data Format
• FEF: File Exchange Format for Vital signs, CEN TS 14271
• GDF v1.x: the General Data Format for biomedical signals, version 1.x
• GDF v2.x: the General Data Format for biomedical signals, version 2.x
• HL7aECG: Health Level 7 v3 annotated ECG
• MFER: Medical waveform Format Encoding Rules
• OpenXDF: Open Exchange Data Format from Neurotronics, Inc., Gainesville, FL, USA
• SCP-ECG: Standard Communication Protocol for Computer assisted electrocardiography, EN1064:2007
• SIGIF: a digital SIGnal Interchange Format with application in neurophysiology
• WFDB: format of Physiobank
• XDF: eXtensible Data Format
Other Biomedical Formats
• Health Level 7 (HL7): a framework for exchange, integration, sharing, and retrieval of electronic health information
• xDT: a family of data exchange formats for medical records
Biometric Formats
• CBF: Common Biometric Format, based upon CBEFF 2.0 (Common Biometric Exchange Formats Framework)
• EBF: Extended Biometric Format, based on CBF but with S/MIME encryption support and semantic extensions
• CBFX: XML Common Biometric Format, based upon XCBF 1.1 (OASIS XML Common Biometric Format)
• EBFX: XML Extended Biometric Format, based on CBFX but with W3C XML Encryption support and semantic extensions
Script[edit]AHK —AutoHotkey script fileAPPLESCRIPT‐ applescript — See SCPT.AS — Adobe Flash ActionScript FileAU3 —AutoIt version 3BAT — Batch fileBAS — QBasic & QuickBASICCMD — Batch fileCoffee —CoffeeScriptEGG —ChickenEGT — EGT Asterisk Application Source File, EGT Universal DocumentERB — Embedded Ruby, Ruby on Rails Script FileHTA —HTML ApplicationIBI — Icarus scriptICI — ICIIJS — J script.ipynb ‐ IPython NotebookITCL — ItclJS — JavaScript and JScriptJSFL —Adobe JavaScript languageLUA — LuaM —Mathematica package fileMRC — mIRC ScriptNCF — NetWare Command File (scripting for Novell's NetWare OS)NUT — SquirrelPHP — PHPPHP? —PHP (? = version number)PL —PerlPM —Perl modulePS1 — Windows PowerShell shell scriptPS1XML — Windows PowerShell format and type definitionsPSC1 — Windows PowerShell console filePSD1 —Windows PowerShell data filePSM1 — Windows PowerShell module filePY — PythonPYC — Python byte code filesPYO — PythonR —R scriptsRB — RubyRDP — RDP connectionSCPT — ApplescriptSCPTD — See SCPT.SDL — State Description LanguageSH — Shell scriptTCL —TclVBS — Visual Basic ScriptXPL —XProc script/pipelineebuild — Gentoo linux's portage package.
Security
Authentication and general encryption formats are listed here.
• OpenPGP Message Format: used by Pretty Good Privacy, GNU Privacy Guard, and other OpenPGP software; can contain keys, signed data, or encrypted data; can be binary or text ("ASCII armored")
Certificates and keys
• GXK: Galaxkey, an encryption platform for authorized, private and confidential email communication
• OpenSSH private key (.ssh): Secure Shell private key; format generated by ssh-keygen or converted from PPK with PuTTYgen[6][7][8]
• OpenSSH public key (.pub): Secure Shell public key; format generated by ssh-keygen or PuTTYgen[6][7][8]
• PuTTY private key (.ppk): Secure Shell private key, in the file format generated by PuTTYgen instead of the format used by OpenSSH[6][7][8]
X.509
• Distinguished Encoding Rules (.cer, .crt, .der): stores certificates
• PKCS#7 SignedData (.p7b, .p7c): commonly appears without main data, just certificates or certificate revocation lists (CRLs)
• PKCS#12 (.p12, .pfx): can store public certificates and private keys
• PEM: Privacy-enhanced Electronic Mail; full format not widely used, but often used to store Distinguished Encoding Rules in Base64 format
• PFX: Microsoft predecessor of PKCS#12
Encrypted files
This section shows file formats for encrypted general data, rather than a specific program's data.
• AXX: encrypted file, created with Axcrypt
• EEA: an encrypted CAB, ostensibly for protecting email attachments
• TC: virtual encrypted disk container, created by TrueCrypt
Password files
Password files (sometimes called keychain files) contain lists of other passwords, usually encrypted.
• BPW: encrypted password file created by Bitser password manager
• KeePass 1 database (.kdb)
• KeePass 2 database (.kdbx)
Signal data (non-audio)
• ACQ: AcqKnowledge File Format for Windows/PC from Biopac
• BKR: the EEG data format developed at the University of Technology Graz
• BDF: BioSemi data format, similar to EDF but 24-bit
• CFG: configuration file for Comtrade data
• CFWB: Chart Data File Format from ADInstruments
• DAT: raw data file for Comtrade data
• EDF: European data format
• FEF: File Exchange Format for Vital signs
• GDF: General data formats for biomedical signals
• GMS: Gesture And Motion Signal format
• IROCK: intelliRock Sensor Data File Format
• MFER: Medical waveform Format Encoding Rules
• SCP-ECG: Standard Communication Protocol for Computer assisted electrocardiography
• SEG Y: reflection seismology data format
• SIGIF: SIGnal Interchange Format
Sound and music[edit]Lossless audio[edit]Uncompressed8SVX —Commodore‐Amiga 8‐bit sound (usually in an IFF container)16SVX —Commodore‐Amiga 16‐bit sound (usually in an IFF container)AIFF (.aif, .aifc, .aiff) —Audio Interchange File FormatAUBWF — Broadcast Wave Format (BWF), an extension of WAVECDDARAW — raw samples without any header or syncWAV —Microsoft WaveCompressedFLAC — (free lossless codec of the Ogg project)LA — Lossless Audio (.la)PAC — LPAC (.pac)M4A — Apple Lossless (M4A)APE — Monkey's Audio (APE)OptimFROG (.ofr, .ofs, .off)RKA — RKAU (.rka)SHN — Shorten (SHN)TAK — Tom's Lossless Audio Kompressor (TAK)[9]TTA — free lossless audio codec (True Audio)WV — WavPack (.wv)WMA — Windows Media Audio 9 Lossless (WMA)BRSTM — Binary Revolution Stream (.brstm)[10]DTS/DTSHD/DTSMA — DTS (sound system)AST — Audio Stream (.ast)[11]
Lossy audio[edit]AMR — for GSM and UMTS based mobile phonesMP2 — MPEG Layer 2MP3 — MPEG Layer 3SPX — Speex (Ogg project, specialized for voice, low bitrates)GSM — GSM Full Rate, originally developed for use in mobile phonesWMA — Windows Media Audio (.WMA)AAC (.m4a, .mp4, .m4p, .aac) —Advanced Audio Coding (usually in an MPEG‐4 container)MPC — MusepackVQF —Yamaha TwinVQRealAudio (RA, RM)OTS — Audio File (similar to MP3, with more data stored in the file and slightly better compression; designed for use with OtsLabs' OtsAV)SWA — Macromedia Shockwave Audio (Same compression as MP3 with additional header information specific to Macromedia DirectorVOX —Dialogic ADPCM Low Sample Rate Digitized Voice (VOX)VOC —Creative Labs Soundblaster Creative Voice 8‐bit & 16‐bit (VOC)DWD —DiamondWare Digitized (DWD)SMP — Turtlebeach SampleVision (SMP)
Other music[edit]AUP —Audacity project fileCUST —DeliPlayer custom sound file formatMID — standard MIDI file; most often just notes and controls but occasionally also sample dumpsMUS — Finale Notation file, see also Finale (software)SIB — Sibelius Notation file, see also Sibelius (computer program)SID — Sound Interface Device — Commodore 64 instructions to play SID music and sound effectsLY — LilyPond Notation file, see also GNU LilyPondGYM — Sega Genesis YM2612 logVGM — stands for "Video Game Music", log for several different chipsPSFNSF — NES Sound Format, bytecode program to play NES musicMOD — Soundtracker and Protracker sample and melody modulesPTB — Power Tab Editor tabS3M — Scream Tracker 3 module, with a few more effects and a dedicated volume columnXM — Fast Tracker module, adding instrument envelopesIT — Impulse Tracker module, adding compressed samples, note‐release actions, and more effects including a resonant filterMT2 — MadTracker 2 module. It could be resumed as being XM and IT combined with more features like track effects and automation.MNG —BGM for the Creatures game series, starting from Creatures 2PSF — Portable Sound Format, PlayStation variant (originally PlayStation Sound Format).minipsf, psflib — Multipart PSF2sf, dsf, gsf, psf2, qsf, ssf, usf — PSF for other platformsRMJ —RealJukebox Media used for RealPlayer.SPC — Super Nintendo Entertainment System sound file format.NIFF — Notation Interchange File FormatMusicXML (.mxl, .xml)TXM — Track ax mediaYM —Atari ST/Amstrad CPC YM2149 sound chip formatJAM — Jam music formatASF — Advanced Systems FormatMP1 — for use with UltraPlayerMSCZ — Musescore compressed fileMSCZ, —Musescore uncompressed file
Playlists[edit]ASX — Advanced Stream Redirector (.asx)M3UPLSRAM — Real Audio Metafile For RealAudio files only.TXT/No extension — Mplayer playlistXPL —HDi playlistXSPF — the XML Shareable Playlist FormatZPL —Xbox Music (Formerly Zune) Playlist format from Microsoft
Audio editing, music production[edit]ALS — Ableton Live setAUP —Audacity project fileBAND — GarageBand project fileCEL —Adobe Audition loop file (Cool Edit Loop)CPR — Steinberg Cubase project fileCWP — Cakewalk Sonar project fileDRM — Steinberg Cubase drum fileMMR —MAGIX Music Maker project fileNPR — Steinberg Nuendo project fileOMFI —Open Media Framework Interchange OMFI succeeds OMF (Open Media Framework)SES — Adobe Audition multitrack session fileSFL — Sound Forge sound fileSNG —MIDI sequence file (MidiSoft, Korg, etc.) or n‐Track Studio project fileSTF — StudioFactory project file. It contains all necessary patches, samples, tracks and settings to play the file.SND —Akai MPC sound fileSYN — SynFactory project file. It contains all necessary patches, samples, tracks and settings to play the file.FLP — Image Line Fruity Loops Project fileVCLS — VocaListener project fileVSQ —Vocaloid 2 Editor sequence excluding wave‐fileVSQX —Vocaloid 3 Editor sequence excluding wave‐file
Source code for computer programs[edit](see also: Script)
ADA, ADB, 2.ADA —Ada (body) sourceADS, 1.ADA — Ada (specification) sourceASM, S —Assembly language sourceBAS — BASIC, FreeBASIC, Visual Basic, BASIC‐PLUS source,[5] PICAXE basicBB — Blitz3DBMX — BlitzMaxC —C sourceCLJ —Clojure source codeCLS —Visual Basic classCOB, CBL —COBOL sourceCPP, CC, CXX, C, CBP — C++ sourceCS — C# sourceCSPROJ —C# project (Visual Studio .NET)D — D sourceDBA —DarkBASIC sourceDBPro123 — DarkBASIC Professional projectE —Eiffel sourceEFS — EGT Forever Source FileEGT — EGT Asterisk Source File, could be J, C#, VB.net, EF 2.0 (EGT Forever)EL —Emacs Lisp sourceFOR, FTN, F, F77, F90 — Fortran sourceFRM — Visual Basic formFRX — Visual Basic form stash file (binary form file)FTH — Forth sourceGED —Game Maker Extension Editable file as of version 7.0GM6 — Game Maker Editable file as of version 6.xGMD —Game Maker Editable file up to version 5.xGMK — Game Maker Editable file as of version 7.0GML — Game Maker Language script fileGO —Go sourceH — C/C++ header fileHPP, HXX — C++ header fileHS — Haskell sourceI — SWIG interface fileINC —Turbo Pascal included sourceJAVA — Java sourceL — lex sourceLGT — Logtalk sourceLISP — Common Lisp sourceM —Objective‐C sourceM —MATLABM —MathematicaM4 —m4 sourceML — Standard ML / OCaml sourceN — Nemerle sourceNB — Nuclear Basic sourceP —Parser sourcePAS, PP, P — Pascal source (DPR for projects)PHP, PHP3, PHP4, PHP5, PHPS, Phtml — PHP sourcePIV —Pivot stickfigure animatorPL, PM — PerlPRG — db, clipper, Microsoft FoxPro, harbour and XbasePRO — IDLPOL — Apcera Policy Language docletPY — Python sourceR —R sourceRED — Red sourceREDS —Red/System sourceRB — Ruby sourceRESX — Resource file for .NET applicationsRC, RC2 —Resource script files to generate resources for .NET applicationsRKT, RKTL —Racket sourceSCALA — Scala sourceSCI, SCE — ScilabSCM — Scheme sourceSD7 — Seed7 sourceSKB, SKC — Sage Retrieve 4GL Common Area (Main and Amended backup)SKD — Sage Retrieve 4GL DatabaseSKF, SKG — Sage Retrieve 4GL File Layouts (Main and 
Amended backup)SKI — Sage Retrieve 4GL InstructionsSKK — Sage Retrieve 4GL Report GeneratorSKM — Sage Retrieve 4GL MenuSKO — Sage Retrieve 4GL ProgramSKP, SKQ — Sage Retrieve 4GL Print Layouts (Main and Amended backup)SKS, SKT — Sage Retrieve 4GL Screen Layouts (Main and Amended backup)SKZ — Sage Retrieve 4GL Security FileSLN — Visual Studio solutionSPIN — Spin source (for Parallax Propeller microcontrollers)STK — Stickfigure file for Pivot stickfigure animatorSWG — SWIG source codeTCL —TCL source codeVAP —Visual Studio Analyzer projectVB — Visual Basic.NET sourceVBG —Visual Studio compatible project groupVBP, VIP —Visual Basic projectVBPROJ — Visual Basic .NET projectVCPROJ — Visual C++ projectVDPROJ — Visual Studio deployment projectXPL —XProc script/pipelineXQ — XQuery fileXSL —XSLT stylesheetY — yacc source
Spreadsheet
• 123: Lotus 1-2-3
• AB2: Abykus worksheet
• AB3: Abykus workbook
• AWS: Ability Spreadsheet
• CLF: ThinkFree Calc
• CELL: Haansoft (Hancom) SpreadSheet software document
• CSV: Comma-Separated Values
• GSHEET: Google Drive Spreadsheet
• numbers: an Apple Numbers Spreadsheet file
• gnumeric: Gnumeric spreadsheet, a gzipped XML file
• ODS: OpenDocument spreadsheet
• OTS: OpenDocument spreadsheet template
• QPW: Quattro Pro spreadsheet
• SDC: StarOffice StarCalc Spreadsheet
• SLK: SYLK (SYmbolic LinK)
• STC: OpenOffice.org XML (obsolete) Spreadsheet template
• SXC: OpenOffice.org XML (obsolete) Spreadsheet
• TAB: tab-delimited columns; also TSV (Tab-Separated Values)
• TXT: text file
• VC: Visicalc
• WK1: Lotus 1-2-3 up to version 2.01
• WK3: Lotus 1-2-3 version 3.0
• WK4: Lotus 1-2-3 version 4.0
• WKS: Lotus 1-2-3
• WKS: Microsoft Works
• WQ1: Quattro Pro DOS version
• XLK: Microsoft Excel worksheet backup
• XLS: Microsoft Excel worksheet (97–2003)
• XLSB: Microsoft Excel binary workbook
• XLSM: Microsoft Excel Macro-enabled workbook
• XLSX: Office Open XML worksheet
• XLR: Microsoft Works version 6.0
• XLT: Microsoft Excel worksheet template
• XLTM: Microsoft Excel Macro-enabled worksheet template
• XLW: Microsoft Excel worksheet workspace (version 4.0)
Tabulated data
• TSV: tab-separated values
• CSV: comma-separated values
• db: databank format; accessible by many econometric applications
• dif: accessible by many spreadsheet applications
Video[edit]Main article: video file formatAAF —mostly intended to hold edit decisions and rendering information, but can also contain compressed media essence3GP — the most common video format for cell phonesGIF —Animated GIF (simple animation; until recently often avoided because of patent problems)ASF — container (enables any form of compression to be used; MPEG‐4 is common; video in ASF‐containers is also called Windows Media Video (WMV))AVCHD —Advanced Video Codec High DefinitionAVI — container (a shell, which enables any form of compression to be used)CAM — aMSN webcam log fileDAT — video standard data file (automatically created when we attempted to burn as video file on the CD)DSHFLV — Flash video (encoded to run in a flash animation)M1V MPEG‐1 — VideoM2V MPEG‐2 — VideoFLA — Macromedia Flash (for producing)FLR — (text file which contains scripts extracted from SWF by a free ActionScript decompiler named FLARE)SOL — Adobe Flash shared object ("Flash cookie")M4V — (file format for videos for iPods and PlayStation Portables developed by Apple)Matroska (*.mkv) —Matroska is a container format, which enables any video format such as MPEG‐4 ASP or AVC to be used along with other content such as subtitles and detailed meta informationWRAP — MediaForge (*.wrap)MNG —mainly simple animation containing PNG and JPEG objects, often somewhat more complex than animated GIFQuickTime (.mov) — container which enables any form of compression to be used; Sorenson codec is the most common; QTCH is the filetype for cached video and audio streamsMPEG (.mpeg, .mpg, .mpe)MPEG‐4 Part 14, shortened "MP4" — multimedia container (most often used for Sony's PlayStation Portable and Apple's iPod)MXF — Material Exchange Format (standardized wrapper format for audio/visual material developed by SMPTE)ROQ —used by Quake 3NSV —Nullsoft Streaming Video (media container designed for streaming video content over the Internet)Ogg — container, multimediaRM —RealMediaSVI — Samsung video format for 
portable players
• SMI: SAMI Caption file (HTML-like subtitle for movie files)
• SWF: Macromedia Flash (for viewing)
• WMV: Windows Media Video (see ASF)
• YUV: raw video format; resolution (horizontal x vertical) and sample structure (4:2:2 or 4:2:0) need to be known explicitly
Video editing, production
• FCP: Final Cut Pro project file
• MSWMM: Windows Movie Maker project file
• PPJ & PRPROJ: Adobe Premiere Pro video editing file
• IMOVIEPROJ: iMovie project file
• VEG & VEG-BAK: Sony Vegas project file
• SUF: Sony camera configuration file (setup.suf) produced by XDCAM-EX camcorders
• WLMP: Windows Live Movie Maker project file
Video game data[edit]List of common file formats of data for video games on systems that support filesystems, most commonly PC games.
TrackMania United/Nations Forever Engine — File formats used by games based on the TrackMania engine.XeXCHALLENGE.GBX — (Edited) Challenge files.CONSTRUCTIONCAMPAIGN.GBX —Construction campaignes files.CONTROLEFFECTMASTER.GBX/CONTROLSTYLE.GBX —Menu parts.FIDCACHE.GBX — Saved game.GBX — Other TrackMania items.REPLAY.GBX — Replays of races.
DOOM Engine — File formats used by games based on the DOOM engine.DEH —DeHackEd files to mutate the game executable (not officially part of the DOOM engine)DSG — Saved gameLMP —A lump is an entry in a DOOM wad.LMP — Saved demo recordingMUS — Music file (usually contained within a WAD file)WAD —Data storage (contains music, maps, and textures)Quake Engine — File formats used by games based on the Quake engine.BSP — (For Binary space partitioning) compiled map formatMAP — Raw map format used by editors like GtkRadiant or QuArKMDL/MD2/MD3/MD5 —Model for an item used in the gamePAK/PK2 —Data storagePK3/PK4 — used by the Quake II, Quake III Arena and Quake 4 game engines, respectively, to store game data, textures etc. They are actually .zip files..dat — not specific file type, often generic extension for "data" files for a variety of applicationssometimes used for general data contained within the .PK3/PK4 files.fontdat — a .dat file used for formatting game fonts.roq— Video format.sav — Savegame formatUnreal Engine — File formats used by games based on the Unreal engine.U — Unreal script formatUAX —Animations format for Unreal Engine 2UMX — Map format for Unreal TournamentUMX — Music format for Unreal Engine 1UNR —Map format for UnrealUPK — Package format for cooked content in Unreal Engine 3USX — Sound format for Unreal Engine 1 and Unreal Engine 2UT2 — Map format for Unreal Tournament 2003 and Unreal Tournament 2004UT3 — Map format for Unreal Tournament 3UTX — Texture format for Unreal Engine 1 and Unreal Engine 2UXX — Cache format. These are files that client downloaded from server (which can be converted to regular formats)Duke Nukem 3D Engine — File formats used by games based on the Duke Nukem 3D engine.DMO — Save gameGRP — Data storageMAP — Map (usually constructed with BUILD.EXE)Diablo Engine — File formats used by Diablo by Blizzard Entertainment.SV — Save GameITM — Item FileReal Virtuality Engine — File formats used by Bohemia Interactive. 
Operation:Flashpoint, ARMA 2, VBS2SQF — Format used for general editingSQM — Format used for mission filesPBO — Binarized file used for compiled modelsLIP — Format that is created from WAV files to create in‐game accurate lip‐synch for character animations.Other FormatsB —used for Grand Theft Auto saved game filesBOL —used for levels on Poing!PCDBPF —The Sims 2, DBPF, PackageHE0, HE2, HE4 HE games FileGCF — format used by the Steam content management system for file archivesIMG — format used by Renderware‐based Grand Theft Auto games for data storageMAP — format used by Halo: Combat Evolved for archive compression, Doom³, and various other gamesNBT — format used by Minecraft for storing program variables along with their (Java) type identifiersOEC — format used by OE‐Cake for scene data storageP3D — format for panda3d by DisneyPOD — format used by Terminal RealityREP — used by Blizzard Entertainment for scenario replays in StarCraft.Simcity 4, DBPF (.dat, .SC4Lot, .SC4Model) —All game plugins use this format, commonly with different file extensionsSMZIP —ZIP‐based package for Stepmania songs, themes and announcer packs.VVVVVV — format used by VVVVVV[12]
Video game storage media
List of the most common filename extensions used when a game's ROM image or storage medium is copied from an original ROM device to an external memory such as a hard disk for backup purposes or for making the game playable with an emulator. In the case of cartridge-based software, if the platform-specific extension is not used, then the filename extensions ".rom" or ".bin" are usually used to clarify that the file contains a copy of the content of a ROM. ROM, disk or tape images usually do not consist of a single file or ROM, but rather an entire file or ROM structure contained within a single file on the backup medium.[13]
JAG,J64 — Atari Jaguar (.jag, .j64)BIN —Wii (.bin)GCM — GameCube (.gcm)NDS —Nintendo DS (.nds)GB — Game Boy (.gb) (this applies to the original Game Boy and the Game Boy Color)GBC — Game Boy Color (.gbc)GBA —Game Boy Advance (.gba)GBA —Game Boy Advance (.gba)SAV —Game Boy Advance Saved Data Files (.sav)SGM — Visual Boy Advance Save States (.sgm)N64, V64, Z64, U64, USA, JAP, PAL, EUR, BIN — Nintendo 64 (.n64, .v64, .z64, .u64, .usa, .jap, .pal, .eur, .bin)PJ —Project 64 Save States (.pj)NES — Nintendo Entertainment System (.nes)FDS — Famicom Disk System (.fds)JST — Jnes Save States (.jst)FC? — FCEUX Save States (.fc#, where # is any character, usually a number)GG — Sega Game Gear (.gg)SMS — Sega Master System (.sms)SMD,BIN — Mega Drive/Sega Genesis (.smd or .bin)SMC,078,SFC — Super NES (.smc, .078, or .sfc) (.078 is for split ROMs, which are rare)FIG — Super Famicom (Japanese releases are rarely .fig, above extensions are more common)SRM — Super NES Saved Data Files (.srm)ZST — ZSNES Save States (.zst, .zs1‐.zs9, .z10‐.z99)FRZ — Snes9X Save States (.frz, .000‐.008)PCE — TurboGrafx‐16/PC Engine (.pce)NPC — Neo Geo Pocket (.npc)TZX — ZX Spectrum (.tzx) (for exact copies of ZX Spectrum games)TAP (for tape images without copy protection)Z80,SNA — (for snapshots of the emulator RAM)DSK — (for disk images)TAP — Commodore 64 (.tap) (for tape images including copy protection)T64 — (for tape images without copy protection, considerably smaller than .tap files)D64 — (for disk images)CRT — (for cartridge images)ADF —Amiga (.adf) (for 880K diskette images)ADZ —GZip‐compressed version of the above.DMS — Disk Masher System, previously used as a disk‐archiving system native to the Amiga, also supported by emulators.
Virtual machines
Microsoft Virtual PC, Virtual Server
• VFD: Virtual Floppy Disk (.vfd)
• VHD: Virtual Hard Disk (.vhd)
• VUD: Virtual Undo Disk (.vud)
• VMC: Virtual Machine Configuration (.vmc)
• VSV: Virtual Machine Saved State (.vsv)
EMC VMware ESX, GSX, Workstation, Player
• LOG: Virtual Machine Logfile (.log)
• VMDK, DSK: Virtual Machine Disk (.vmdk, .dsk)
• NVRAM: Virtual Machine BIOS (.nvram)
• VMEM: Virtual Machine paging file (.vmem)
• VMSD: Virtual Machine snapshot metadata (.vmsd)
• VMSN: Virtual Machine snapshot (.vmsn)
• VMSS, STD: Virtual Machine suspended state (.vmss, .std)
• VMTM: Virtual Machine team data (.vmtm)
• VMX, CFG: Virtual Machine configuration (.vmx, .cfg)
• VMXF: Virtual Machine team configuration (.vmxf)
VirtualBox
• VDI: VirtualBox Virtual Disk Image (.vdi)
Parallels Workstation
• HDD: Virtual Machine hard disk (.hdd)
• PVS: Virtual Machine preferences/configuration (.pvs)
• SAV: Virtual Machine saved state (.sav)
QEMU
• COW: copy-on-write
• QCOW: QEMU copy-on-write
• QCOW2: QEMU copy-on-write, version 2
• QED: QEMU enhanced disk format
Webpage[edit]Staticdtd, Document Type Definition (standard), MUST be public and freeHTML — (.html, .htm) — HyperText Markup LanguageXHTML — (.xhtml, .xht) — eXtensible HyperText Markup LanguageMHTML — (.mht, .mhtml) —Archived HTML, store all data on one web page (text, images, etc.) in one big fileMAF — (.maff) —web archive based on ZIPDynamically generatedASP — (.asp) —Microsoft Active Server PageASPX — (.aspx) —Microsoft Active Server Page. NETADP —AOLserver Dynamic PageBML — (.bml) — Better Markup Language (templating)CFM — (.cfm) —ColdFusionCGI — (.cgi)iHTML — (.ihtml) — Inline HTMLJSP — (.jsp) JavaServer PagesLasso — (.las, .lasso, .lassoapp) — A file created or served with the Lasso Programming LanguagePL —Perl (.pl)PHP — (.php, .php?, .phtml) — ? is version number (previously abbreviated Personal Home Page, later changed to PHP: Hypertext Preprocessor)RNA — (.rna) —Real Native Application FileR — (.r) —Real Native Application File (short alternative)RNX — (.rnx) — Real Native Application File (using experimental version 6 of RNA/Karbon Language)SSI — (.shtml) —HTML with Server Side Includes (Apache)SSI — (.stm) — HTML with Server Side Includes (Apache)
Markup languages other web standards‐based file formats[edit]Atom — (.atom, .xml) —Another syndication file formatEML — (.eml) — File format used by several desktop email clientsJSON‐LD — (.jsonld) — A JSON‐based Serialization for Linked DataMetalink — (.metalink, .met) — A file format for listing metadata about downloads, such as mirrors, checksums, and other information.RSS — (.rss, .xml) — Syndication file formatMarkdown — (.markdown, .md) — A light‐weight, plain‐text, easy to read and write markup language.Shuttle — (.se) — lightweight markup language
Other[edit]AXD — cookie extensions found in temporary internet folderSKP — SketchUp fileBDF — Binary Data Format — raw data from recovered blocks of unallocated space on a hard driveCBP — CD Box Labeler Pro, CentraBuilder, Code::Blocks Project File, Conlab Project[14]CEX — SolidWorks Enterprise PDM Vault FileCREDX —CredX Dat FileDUPX —DuupeCheck database management tool project fileGA3 —Graphical Analysis 3GEDCOM (.ged) — (GEnealogical Data COMmunication) file format for exchanging genealogical data between different genealogy softwareHLP — Windows help fileIGC — flight tracks downloaded from GPS devices in the FAI's prescribed formatINF — similar file format to INI file; used to install device drivers under Windows, inter alia.JAM — JAM Message Base Format for BBSesKMC — tests made with KatzReview's MegaCrammerLSM — LSMaker script file (program using layered .jpg to create special effects; specifically designed to render lightsabers from the Star Wars universe) (.lsm)OER — AU OER TOOL, Open Educational Resource editorPIF —Used for running MS‐DOS programs under WindowsPOR — So called "portable" SPSS files, readable by PSPPPXZ — Compressed file to exchange media elements with PSALMORISE — File containing RISE generated information model evolutionTOPC —TopicCrunchSEO Project file holding keywords, domain and search engine settings (ASCII);TOS — Character file from The Only SheetXLF —Extensible LADAR FormatXMC — Assisted contact lists file format, based on xml and used in kindergartens and schoolsZED — My Heritage Family TreeZone file —a text file containing a DNS zone
Cursors
• ANI: Animated Cursor
• CUR: Cursor file
• Smes: Hawk's Dock configuration file
Generalized files
General data formats
These file formats are fairly well defined by long-term use or a general standard, but the content of each file is often highly specific to particular software or has been extended by further standards for specific uses.
Text-based
• CSV: comma-separated values
• HTML: hypertext markup language
• CSS: cascading style sheets
• INI: a configuration text file whose format is substantially similar between applications
• JSON: an openly used data file format used by many languages, not just JavaScript
• LTSV: labeled tab-separated values
• TSV: tab-separated values
• XML: an open data file format
• YAML: an open data file format
Generic file extensions
These are filename extensions and broad types reused frequently with differing formats or no specific format by different programs.
Binary files
• Bak file (.bak, .bk): various backup formats: some just copies of data files, some in application-specific data backup formats, some formats for general file backup programs
• BIN: binary data, often memory dumps of executable code or data to be re-used by the same software that originated it
• DAT: data file, usually binary data proprietary to the program that created it
• DSK: file representations of various disk storage images
• RAW: raw (unprocessed) data
Text files
• configuration file (.cnf, .conf, .cfg): substantially software-specific
• logfiles (.log): usually text, but sometimes binary
• plain text (.asc or .txt): human-readable plain text, usually no more specific
Partial files
Differences and patches
• diff: text file differences created by the program diff and applied as updates by patch
Incomplete transfers
• !UT: partially completed downloads in uTorrent
• PART (.part): partially completed downloads in Mozilla Firefox
• CRDOWNLOAD (.crdownload): partially completed downloads in Google Chrome
Temporary files
• Temporary file (.temp, .tmp, various others): sometimes in a specific format, but often just raw data in the middle of processing
• Pseudo-pipeline file: used to simulate a software pipe
http://en.wikipedia.org/wiki/List_of_file_formats ‐ accessed Jan 30 2015
![Page 49: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/49.jpg)
Data Formats
Suggested Preservation File formats: www.digitalpreservation.gov/formats/
• Text: doc, docx, rtf, odt, pages
• Tabular: xls, xlsx, numbers, dbf
• Stat: sas, jmp, rdata
• Images: jpg, tiff, svg, png, gif, bmp
• Geographic: shp, geotiff, kml, kmz, gdb
• Video: mp4, mov, avi, ogg
• Music: mp3, wav, m4a, aiff
• Plain text: txt, csv, json, html, xml
General formats: proprietary • mixed • open
• Text: doc, docx, rtf, odt, pages
• Tabular: xls, xlsx, numbers, dbf
• Stat: sas, jmp, rdata
• Images: jpg, tiff, svg, png, gif, bmp
• Geographic: shp, geotiff, kml, kmz, gdb
• Video: mp4, mov, avi, ogg
• Music: mp3, wav, m4a, aiff
• Plain text: txt, csv, json, html, xml
![Page 51: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/51.jpg)
Data Formats
Compression
• lossy
• depends
• lossless
![Page 52: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/52.jpg)
Metadata
• Describe the data so that a well-informed researcher in your field can use it without talking to you
• Make a README file (this can be XML if you are good)
  • Title
  • Brief description or abstract
  • Creator
  • Contact information
  • Date created
  • Instruments used (sensor, survey, laboratory, etc.)
  • Process steps or data levels
  • File formats and software needed to open the data files
  • Assessment of error
• For all tabular data make a Data Dictionary
  • describe the contents of all column headers (units, calculations, abstractions, and so on)
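The README checklist and the data dictionary above could be started from a script. The following is a minimal sketch using only the Python standard library. The function names, the `TODO` placeholder, the dictionary columns (`description`, `units`, `calculation`), and the sample data are assumptions made for illustration, not part of any standard.

```python
import csv
import io

# README fields, taken from the checklist on this slide.
README_FIELDS = [
    "Title",
    "Brief description or abstract",
    "Creator",
    "Contact information",
    "Date created",
    "Instruments used",
    "Process steps or data levels",
    "File formats and software needed",
    "Assessment of error",
]


def readme_template(values=None):
    """Return README text with one 'Field: value' line per checklist item."""
    values = values or {}
    lines = [f"{field}: {values.get(field, 'TODO')}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


def data_dictionary_skeleton(csv_text):
    """Return a CSV data dictionary with one empty row per column header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical dictionary columns; adapt them to your field.
    writer.writerow(["column", "description", "units", "calculation"])
    for name in header:
        writer.writerow([name, "", "", ""])
    return out.getvalue()


# Hypothetical sample data, used only to show the output shape.
sample = "site_id,temp_c,depth_m\nA1,24.5,3.0\n"
print(readme_template({"Title": "Example reef temperature survey"}))
print(data_dictionary_skeleton(sample))
```

The skeleton only lists the column headers; the descriptions, units, and calculations still have to be written by someone who knows the data.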
![Page 53: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/53.jpg)
Metadata
![Page 54: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/54.jpg)
![Page 55: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/55.jpg)
Data
![Page 56: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/56.jpg)
![Page 57: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/57.jpg)
(some) Metadata Examples
• Dublin Core (a common library standard)
  – http://dublincore.org/
• National Science Digital Library Metadata Guidelines
  – http://nsdl.org/contribute/metadata-guide
• Darwin Core (biological specimens)
  – http://rs.tdwg.org/dwc/
  – Examples: http://code.google.com/p/darwincore/wiki/Examples
• GeoInformation Standards (map data)
  – http://www.fgdc.gov/metadata/geospatial-metadata-standards
  – Top 10 Errors: https://www.fgdc.gov/metadata/documents/top10metadataerrors.pdf
• Social Sciences (mostly for survey data)
  – http://www.ddialliance.org
  – Tools/Resources: http://www.ddialliance.org/resources/tools
• GeoScience Markup Language (GeoSciML)
  – https://marinemetadata.org/references/geosciml
• NetCDF
  – http://www.unidata.ucar.edu/software/netcdf/
• Ecology (ecological metadata language)
  – http://knb.ecoinformatics.org/software/eml/
  – Example & Explanation of EML: http://knb.ecoinformatics.org/eml_metadata_guide.html
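As an illustration of what a record in one of these standards can look like, here is a minimal sketch that writes a few Dublin Core elements as XML with Python's standard library. The namespace URI is the Dublin Core element set namespace. The wrapper element name `metadata`, the function name, and all field values are assumptions made for this example; a repository may require a different wrapper or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat XML record with one dc:<name> child per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
record = dublin_core_record([
    ("title", "Example reef temperature survey"),
    ("creator", "Example, Researcher"),
    ("date", "2015-01-30"),
    ("format", "text/csv"),
])
print(record)
```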
![Page 58: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/58.jpg)
Version Control and Sharing Strategies
• For programmers only??
  • Github: http://github.com
  • Bitbucket: http://bitbucket.org
• Project Management (for fee services)
  • Basecamp: http://basecamp.com
  • Teamwork: http://teamwork.com
• Again, if you start a sequence of versions, add leading zeros
  • Version0001.docx
  • 20150123_Version_0013.docx
• Looking at differences in file versions (text only):
  • PC: Windiff: http://www.grigsoft.com/download-windiff.htm
  • Mac: Textwrangler: http://www.barebones.com/products/textwrangler/
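The two points above, leading zeros and text differences, can be sketched in a few lines of Python. The helper `version_name`, its defaults, and the sample text are hypothetical. `difflib` is part of the standard library and is used here as a stand-in for a graphical diff tool, not as a replacement for version control.

```python
import difflib


def version_name(number, stem="Version", ext="docx", width=4):
    """Return a file name with a zero-padded version number."""
    return f"{stem}{number:0{width}d}.{ext}"


# Without leading zeros, alphabetical order puts version 13 before version 2.
unpadded = sorted(f"Version{n}.docx" for n in (1, 2, 13))
# With leading zeros, alphabetical order matches version order.
padded = sorted(version_name(n) for n in (1, 2, 13))
print(unpadded)
print(padded)

# Line-by-line differences between two text versions (hypothetical content).
old = "alpha\nbeta\ngamma\n".splitlines(keepends=True)
new = "alpha\nBETA\ngamma\n".splitlines(keepends=True)
diff = list(difflib.unified_diff(
    old, new, fromfile="Version0001.txt", tofile="Version0002.txt"
))
print("".join(diff))
```

In the diff output, lines starting with `-` exist only in the older version and lines starting with `+` only in the newer one.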
![Page 59: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/59.jpg)
Data Repositories
Ainsley Seago, PLoS Biology
http://www.kavlifoundation.org/sites/default/files/image/resources/2014_SL_Neuro_Cartoon.jpg
“Sharing data from one laboratory to another—or even within a laboratory—
takes time and effort, but there are also psychological, cultural and
technological barriers to doing so.”
http://www.kavlifoundation.org/science-spotlights/breaking-down-data-barriers-neuroscience
![Page 60: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/60.jpg)
Repository Directories
• Directory of Research Data Repositories
  • http://databib.org/
• The Directory of Open Access Repositories
  • http://www.opendoar.org/
• Registry of Research Data Repositories
  • http://www.re3data.org/
• The Open Access Directory – Disciplinary Repositories
  • http://oad.simmons.edu/oadwiki/Disciplinary_repositories
![Page 61: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/61.jpg)
Disciplinary Repositories
• U.S. National Library of Medicine
  • http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
• Inter-University Consortium for Political and Social Research
  • http://www.icpsr.umich.edu/
• Data Observation Network for Earth
  • https://www.dataone.org/
• Chemistry - ChemSpider
  • http://www.chemspider.com/About.aspx
• Biology and Life Sciences - Dryad
  • http://www.datadryad.org/repo/
• Physics – HEP data - high-energy physics reaction database
  • http://hepdata.cedar.ac.uk/
These are just some examples; see the previous slide for more
![Page 62: The Data Management Challenge - University of Miami · PDF file · 2015-04-13The Data Management Challenge: ... • useCamelCasing.docx • use_underscores.txt ... • Use the default](https://reader034.fdocuments.us/reader034/viewer/2022051010/5abb85237f8b9a24028cbcc5/html5/thumbnails/62.jpg)
Data Visualization
• Datawrapper (open source)
  • https://datawrapper.de/
• Tableau (commercial)
  • http://www.tableau.com/
• R (open source)
  • http://www.r-project.org/
  • http://ggplot2.org/
• GoogleVis (open source)
  • https://code.google.com/p/gvis/
• D3 (open source)
  • http://d3js.org/