antivirus - How do AV engines search files for known signatures so efficiently?


Data in the form of search strings continues to grow as new virus variants are released, which prompts my question: how do AV engines search files for known signatures so efficiently? If I download a new file, my AV scanner rapidly identifies the file as being a threat or not, based on its signatures, but how can it do this so quickly? I'm sure by this point there are hundreds of thousands of signatures.

Update: As tripleee pointed out, the Aho-Corasick algorithm seems very relevant to virus scanners. Here is some stuff to read:

http://www.dais.unive.it/~calpar/aa07-08/aho-corasick.pdf

http://www.researchgate.net/publication/4276168_generalized_aho-corasick_algorithm_for_signature_based_anti-virus_applications/file/d912f50bd440de76b0.pdf

http://jason.spashett.com/av/index.htm

Aho-Corasick-like algorithm for use in anti-malware code
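To make the Aho-Corasick idea concrete, here is a minimal sketch of multi-pattern matching in one pass over the data. The function names `build_automaton` and `scan` are made up for illustration, and this is not how any particular AV engine implements it; real scanners use far more compact automata and add wildcards, offsets, and heuristics on top.

```python
from collections import deque

def build_automaton(patterns):
    """Build a trie with failure links over byte patterns (illustrative only)."""
    goto = [{}]   # goto[node][byte] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix that is a trie prefix
    out = [[]]    # out[node] -> indices of patterns that end at this node
    for idx, pat in enumerate(patterns):
        node = 0
        for b in pat:
            nxt = goto[node].get(b)
            if nxt is None:
                goto.append({})
                fail.append(0)
                out.append([])
                nxt = len(goto) - 1
                goto[node][b] = nxt
            node = nxt
        out[node].append(idx)
    # Breadth-first pass: depth-1 nodes keep fail = 0, deeper nodes inherit.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for b, child in goto[node].items():
            f = fail[node]
            while f and b not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(b, 0)
            # Patterns ending at the fallback node also end here.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out

def scan(data, patterns, automaton):
    """Return (start_offset, pattern_index) for every match, reading data once."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for pos, b in enumerate(data):
        while node and b not in goto[node]:
            node = fail[node]
        node = goto[node].get(b, 0)
        for idx in out[node]:
            hits.append((pos - len(patterns[idx]) + 1, idx))
    return hits

patterns = [b"he", b"she", b"his", b"hers"]
automaton = build_automaton(patterns)
hits = scan(b"ushers", patterns, automaton)
```

The point of the structure is that scanning cost grows with the length of the file plus the number of matches, not with the number of signatures, which is why adding more patterns does not slow the scan proportionally.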

Below is my old answer. It is still relevant for detecting malware such as worms that simply make copies of themselves:

I'll write down some of my thoughts on how AVs might work. I don't know for sure. If anyone thinks the information is incorrect, please notify me.

There are many ways in which AVs detect possible threats. One way is signature-based detection.

A signature is a unique fingerprint of a file (which is just a sequence of bytes). In terms of computer science, it can be called a hash. A single hash could take 4/8/16 bytes. Assuming a size of 4 bytes (for example, CRC32), about 67 million signatures could be stored in 256MB.
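The arithmetic behind that figure can be checked with a short sketch. The constant names are made up for illustration, and the calculation only counts the raw 4-byte values; whatever structure holds them would add overhead on top.

```python
import zlib

SIGNATURE_BYTES = 4                    # size of one CRC32 value
BUDGET_BYTES = 256 * 1024 * 1024       # 256MB of memory

# How many raw 4-byte signatures fit in the budget.
capacity = BUDGET_BYTES // SIGNATURE_BYTES   # 67_108_864, about 67 million

# zlib.crc32 returns an unsigned 32-bit integer, so it fits in 4 bytes.
signature = zlib.crc32(b"example file contents")
packed = signature.to_bytes(SIGNATURE_BYTES, "big")
```

Note that CRC32 is only the example size used here; with just 32 bits, unrelated files can collide, so a longer hash would reduce false positives at the cost of more memory.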

All these hashes can be stored in a signature database. If this database is implemented as a balanced tree structure, insertion, deletion, and search operations can be done in O(log n) time, which is pretty fast even for large values of n (n is the number of entries). Or else, if a lot of memory is available, a hashtable can be used, which gives O(1) insertion, deletion, and search on average. This can be faster as n grows bigger, provided a good hashing technique is used.
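A rough sketch of the two lookup styles, using only the standard library. Python has no built-in balanced tree, so a sorted list with binary search stands in for the O(log n) case here (lookups behave the same, though inserts into a list are slower than in a real tree), and a `set` plays the role of the hashtable. The helper name `in_sorted` and the sample values are made up.

```python
import bisect

# Hypothetical signature values, kept sorted for binary search.
sorted_signatures = [3, 17, 42, 99, 1024]

def in_sorted(sorted_sigs, sig):
    """O(log n) membership test via binary search on a sorted list."""
    i = bisect.bisect_left(sorted_sigs, sig)
    return i < len(sorted_sigs) and sorted_sigs[i] == sig

# Hash-based alternative: average O(1) membership, more memory per entry.
signature_set = set(sorted_signatures)

found_by_search = in_sorted(sorted_signatures, 42)
found_by_hash = 42 in signature_set
missing_by_search = in_sorted(sorted_signatures, 50)
missing_by_hash = 50 in signature_set
```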

So an antivirus calculates the hash of the file or just its critical sections (where malicious injections are possible), and searches its signature database for it. As explained above, the search is very fast, which enables scanning huge amounts of files in a short amount of time. If it is found, the file is categorized as malicious.
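The hash-then-lookup flow described above might look like the following sketch. It hashes the whole file with CRC32 (the example hash from earlier) and checks a set; `crc32_of_file`, `is_known_threat`, and `signature_db` are hypothetical names, and hashing only "critical sections" is left out because that depends on the file format.

```python
import zlib

def crc32_of_file(path, chunk_size=64 * 1024):
    """Compute the CRC32 of a file in chunks so large files are not loaded at once."""
    crc = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            # Passing the running value continues the checksum across chunks.
            crc = zlib.crc32(chunk, crc)
    return crc

def is_known_threat(path, signature_db):
    """Flag the file if its hash appears in the signature database (a set here)."""
    return crc32_of_file(path) in signature_db

# Hypothetical database containing the hash of one known-bad payload.
signature_db = {zlib.crc32(b"malicious payload")}
```

Reading the file dominates the cost; the lookup itself is a single set membership test per file.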

Similarly, the database can be updated quickly, since insertion and deletion are fast too.

You could read these pages for more insight.

Which is faster, hash lookup or binary search?

https://security.stackexchange.com/questions/379/what-are-rainbow-tables-and-how-are-they-used

