filetype.js
Detect the file type of a Buffer/Uint8Array/ArrayBuffer
The file type is detected by checking the magic number of the buffer.
This package is for detecting binary-based file formats, not text-based formats like .txt, .csv, .svg, etc.
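To make the idea of a magic number concrete, here is a minimal sketch that checks for the fixed 8-byte signature every PNG file starts with. The helper name looksLikePng is hypothetical and this is not the package's own implementation, only an illustration of the technique.

```javascript
// PNG files start with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper (not part of this package's API).
// Accepts a Buffer or Uint8Array and compares its leading bytes to the signature.
function looksLikePng(bytes) {
	if (bytes.length < PNG_SIGNATURE.length) {
		return false;
	}
	return PNG_SIGNATURE.every((value, index) => bytes[index] === value);
}

const sample = Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00]);
console.log(looksLikePng(sample));
//=> true
console.log(looksLikePng(Buffer.from('plain text')));
//=> false
```

Text-based formats have no such fixed byte sequence, which is why they fall outside the scope of this kind of check.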
Installation
$ npm install @jedithepro/filetype.js
Usage
Node.js
Determine file type from a file:
const FileType = require('@jedithepro/filetype.js');
(async () => {
	console.log(await FileType.fromFile('Unicorn.png'));
	//=> {ext: 'png', mime: 'image/png'}
})();
Determine file type from a Buffer, which may be a portion of the beginning of a file:
const FileType = require('@jedithepro/filetype.js');
const readChunk = require('read-chunk');
(async () => {
	const buffer = readChunk.sync('Unicorn.png', 0, 4100);
	console.log(await FileType.fromBuffer(buffer));
	//=> {ext: 'png', mime: 'image/png'}
})();
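If you would rather not add a dependency just to read the leading bytes, a rough equivalent can be sketched with Node's built-in fs module. The helper name readLeadingBytes is hypothetical, and the sketch assumes a regular local file.

```javascript
const fs = require('fs');

// Hypothetical helper: read at most `length` bytes from the start of a file.
function readLeadingBytes(filePath, length) {
	const descriptor = fs.openSync(filePath, 'r');
	try {
		const buffer = Buffer.alloc(length);
		const bytesRead = fs.readSync(descriptor, buffer, 0, length, 0);
		// The file may be shorter than `length`, so trim the unused tail.
		return buffer.subarray(0, bytesRead);
	} finally {
		fs.closeSync(descriptor);
	}
}
```

The returned Buffer could then be passed to FileType.fromBuffer() in the same way as the chunk in the example above.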
Determine file type from a stream:
const fs = require('fs');
const FileType = require('@jedithepro/filetype.js');
(async () => {
	const stream = fs.createReadStream('Unicorn.mp4');
	console.log(await FileType.fromStream(stream));
	//=> {ext: 'mp4', mime: 'video/mp4'}
})();
The stream method can also be used to read from a remote location:
const got = require('got');
const FileType = require('@jedithepro/filetype.js');
const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';
(async () => {
	const stream = got.stream(url);
	console.log(await FileType.fromStream(stream));
	//=> {ext: 'jpg', mime: 'image/jpeg'}
})();
Another stream example:
const fs = require('fs');
const crypto = require('crypto');
const FileType = require('@jedithepro/filetype.js');
(async () => {
	// `alg`, `key`, and `iv` are placeholders; use the values the file was encrypted with.
	const read = fs.createReadStream('encrypted.enc');
	const decipher = crypto.createDecipheriv(alg, key, iv);
	// pipe() returns the destination stream, so the decrypted data is what gets inspected.
	const fileTypeStream = await FileType.stream(read.pipe(decipher));
	console.log(fileTypeStream.fileType);
	//=> {ext: 'mov', mime: 'video/quicktime'}
	const write = fs.createWriteStream(`decrypted.${fileTypeStream.fileType.ext}`);
	fileTypeStream.pipe(write);
})();
API
FileType.fromBuffer(buffer)
Detect the file type of a Buffer, Uint8Array, or ArrayBuffer.
The file type is detected by checking the magic number of the buffer.
If file access is available, it is recommended to use FileType.fromFile() instead.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
buffer
Type: Buffer | Uint8Array | ArrayBuffer
A buffer representing file data. It works best if the buffer contains the entire file, but it may work with a smaller portion as well.
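Because the promise resolves to undefined when nothing matches, callers usually need a guard before reading ext or mime. The following sketch shows one way to do that; describeDetection is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: turn a detection result into a label, with a fallback
// for the `undefined` that is returned when nothing matches.
function describeDetection(result) {
	if (result === undefined) {
		return 'unknown (no magic number matched)';
	}
	return `${result.ext} (${result.mime})`;
}

console.log(describeDetection({ext: 'png', mime: 'image/png'}));
//=> png (image/png)
console.log(describeDetection(undefined));
//=> unknown (no magic number matched)
```

The same guard applies to the results of the other detection methods, since they resolve to the same shape.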
FileType.fromFile(filePath)
Detect the file type of a file path.
The file type is detected by checking the magic number of the file.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
filePath
Type: string
The file path to parse.
FileType.fromStream(stream)
Detect the file type of a Node.js readable stream.
The file type is detected by checking the magic number of the streamed data.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
stream
Type: stream.Readable
A readable stream representing file data.
FileType.fromTokenizer(tokenizer)
Detect the file type from an ITokenizer source.
This method is used internally, but can also be used with a custom "tokenizer" reader.
A tokenizer exposes the internal read functions, so that alternative transport mechanisms for accessing files can be implemented and used.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
An example is @tokenizer/http, which requests data using HTTP range requests. Unlike a conventional stream, a tokenizer can skip over data (seek, fast-forward). For example, it may need to read only the first 6 bytes and the last 128 bytes, which is an advantage when reading the entire file would take longer.
const {makeTokenizer} = require('@tokenizer/http');
const FileType = require('@jedithepro/filetype.js');
const audioTrackUrl = 'https://test-audio.netlify.com/Various%20Artists%20-%202009%20-%20netBloc%20Vol%2024_%20tiuqottigeloot%20%5BMP3-V2%5D/01%20-%20Diablo%20Swing%20Orchestra%20-%20Heroines.mp3';
(async () => {
	const httpTokenizer = await makeTokenizer(audioTrackUrl);
	const fileType = await FileType.fromTokenizer(httpTokenizer);
	console.log(fileType);
	//=> {ext: 'mp3', mime: 'audio/mpeg'}
})();
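The benefit of seeking can be illustrated with a local file as a stand-in for ranged requests. The sketch below reads only the first and last bytes of a file and never touches the middle. The helper name readHeadAndTail is hypothetical, and this is not how @tokenizer/http is implemented; it only demonstrates the access pattern described above.

```javascript
const fs = require('fs');

// Hypothetical helper: read the first `headLength` and last `tailLength`
// bytes of a local file without reading what lies between them.
function readHeadAndTail(filePath, headLength, tailLength) {
	const descriptor = fs.openSync(filePath, 'r');
	try {
		const {size} = fs.fstatSync(descriptor);
		const head = Buffer.alloc(Math.min(headLength, size));
		const tail = Buffer.alloc(Math.min(tailLength, size));
		fs.readSync(descriptor, head, 0, head.length, 0);
		// Seek directly to the tail by passing an explicit file position.
		fs.readSync(descriptor, tail, 0, tail.length, size - tail.length);
		return {head, tail};
	} finally {
		fs.closeSync(descriptor);
	}
}
```

Calling readHeadAndTail(path, 6, 128) touches 134 bytes at most, regardless of how large the file is.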
Or use @tokenizer/s3 to determine the file type of a file stored on Amazon S3:
const FileType = require('@jedithepro/filetype.js');
const S3 = require('aws-sdk/clients/s3');
const {makeTokenizer} = require('@tokenizer/s3');
(async () => {
	// Initialize the S3 client
	const s3 = new S3();
	// Initialize the S3 tokenizer.
	const s3Tokenizer = await makeTokenizer(s3, {
		Bucket: 'affectlab',
		Key: '1min_35sec.mp4'
	});
	// Figure out what kind of file it is.
	const fileType = await FileType.fromTokenizer(s3Tokenizer);
	console.log(fileType);
})();
Note that only the minimum amount of data required to determine the file type is read (okay, just a bit extra to prevent too many fragmented reads).
FileType.extensions
Returns a set of supported file extensions.
FileType.mimeTypes
Returns a set of supported MIME types.
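Assuming these properties are standard JavaScript Set objects, a membership check is a single has() call. The sketch below uses a small stand-in set so that it runs without the package installed; isSupportedExtension is a hypothetical helper.

```javascript
// Stand-in for FileType.extensions so the sketch runs without the package;
// the real set is much larger.
const extensions = new Set(['jpg', 'png', 'mp4']);

// Hypothetical helper: strip an optional leading dot and look the extension up.
// Case is left untouched because the lookup is case-sensitive.
function isSupportedExtension(supported, extension) {
	const normalized = extension.trim().replace(/^\./, '');
	return supported.has(normalized);
}

console.log(isSupportedExtension(extensions, '.png'));
//=> true
console.log(isSupportedExtension(extensions, 'svg'));
//=> false
```

With the package installed, FileType.extensions would take the place of the stand-in set.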
Supported file types
- jpg
- png
- apng - Animated Portable Network Graphics
- gif
- webp
- flif
- cr2 - Canon Raw image file (v2)
- cr3 - Canon Raw image file (v3)
- orf - Olympus Raw image file
- arw - Sony Alpha Raw image file
- dng - Adobe Digital Negative image file
- nef - Nikon Electronic Format image file
- rw2 - Panasonic RAW image file
- raf - Fujifilm RAW image file
- tif
- bmp
- icns
- jxr
- psd
- indd
- zip
- tar
- rar
- gz
- bz2
- 7z
- dmg
- mp4
- mid
- mkv
- webm
- mov
- avi
- mpg
- mp1 - MPEG-1 Audio Layer I
- mp2
- mp3
- ogg
- ogv
- ogm
- oga
- spx
- ogx
- opus
- flac
- wav
- qcp
- amr
- pdf
- epub
- mobi - Mobipocket
- exe
- swf
- rtf
- woff
- woff2
- eot
- ttf
- otf
- ico
- flv
- ps
- xz
- sqlite
- nes
- crx
- xpi
- cab
- deb
- ar
- rpm
- Z
- lz
- cfb
- mxf
- mts
- wasm
- blend
- bpg
- docx
- pptx
- xlsx
- jp2 - JPEG 2000
- jpm - JPEG 2000
- jpx - JPEG 2000
- mj2 - Motion JPEG 2000
- aif
- odt - OpenDocument for word processing
- ods - OpenDocument for spreadsheets
- odp - OpenDocument for presentations
- xml
- heic
- cur
- ktx
- ape - Monkey's Audio
- wv - WavPack
- asf - Advanced Systems Format
- dcm - DICOM Image File
- mpc - Musepack (SV7 & SV8)
- ics - iCalendar
- glb - GL Transmission Format
- pcap - Libpcap File Format
- dsf - Sony DSD Stream File (DSF)
- lnk - Microsoft Windows file shortcut
- alias - macOS Alias file
- voc - Creative Voice File
- ac3 - ATSC A/52 Audio File
- 3gp - Multimedia container format defined by the Third Generation Partnership Project (3GPP) for 3G UMTS multimedia services
- 3g2 - Multimedia container format defined by the 3GPP2 for 3G CDMA2000 multimedia services
- m4v - MPEG-4 Visual bitstreams
- m4p - MPEG-4 files with audio streams encrypted by FairPlay Digital Rights Management as were sold through the iTunes Store
- m4a - Audio-only MPEG-4 files
- m4b - Audiobook and podcast MPEG-4 files, which also contain metadata including chapter markers, images, and hyperlinks
- f4v - ISO base media file format used by Adobe Flash Player
- f4p - ISO base media file format protected by Adobe Access DRM used by Adobe Flash Player
- f4a - Audio-only ISO base media file format used by Adobe Flash Player
- f4b - Audiobook and podcast ISO base media file format used by Adobe Flash Player
- mie - Dedicated meta information format which supports storage of binary as well as textual meta information
- shp - Geospatial vector data format
- arrow - Columnar format for tables of data
- aac - Advanced Audio Coding
- it - Audio module format: Impulse Tracker
- s3m - Audio module format: ScreamTracker 3
- xm - Audio module format: FastTracker 2
- ai - Adobe Illustrator Artwork
- skp - SketchUp
- avif - AV1 Image File Format
- eps - Encapsulated PostScript
- lzh - LZH archive
- pgp - Pretty Good Privacy
- asar - Archive format primarily used to enclose Electron applications
- stl - Standard Tesselated Geometry File Format (ASCII only)
Pull requests are welcome for additional commonly used file types.
The following file types will not be accepted:
- MS-CFB: Microsoft Compound File Binary File Format based formats, too old and difficult to parse:
  - .doc - Microsoft Word 97-2003 Document
  - .xls - Microsoft Excel 97-2003 Document
  - .ppt - Microsoft PowerPoint 97-2003 Document
  - .msi - Microsoft Windows Installer
- .csv - Reason.
- .svg - Detecting it requires a full-blown parser. Check out is-svg for something that mostly works.