spam/CHANGELOG
December 16, 2004
- spam.module
o better fix for bug #14388: reworded display, mark ints as ints
December 15, 2004
- spam.module
o fix bug #14388: display accurate statistics
o fixed inaccurate counter when operating in TEFT mode
December 13, 2004
- spam.module
o fix bug #14263: update comment statistics when publish/unpublish comment
December 9, 2004
- spam.module
o added interface to add/edit/delete URL filters (spammer domains)
o added help text for URL filters
o add ability to limit the following:
- total URLs per comment
- repeat URLs per comment
- total URLs per non-comment content
- repeat URLs per non-comment content
(if limit is crossed, content is marked as spam)
December 8, 2004
- spam.module
o added a URL filter which quickly learns spammer URLs and provides an
option to automatically block new comments and content that contain
these spammer URLs.
o fixed a cut&paste bug in the open relay filter which prevented it from
ever working
o updated tokenizer to find urls in and out of href tags
NOTE: This change modifies the tokenizer. For best results, you should
rebuild all your spam tokens. This is done by pointing to the following
path on your site: /admin/spam/rebuild/all
WARNING: This will only work as intended if you have been saving your spam,
unpublishing it as recommended rather than deleting it. If you have been
deleting your spam, doing a "rebuild all" will cause your Bayesian filter
to forget everything it has learned to date. (You will not lose any custom
filters you may have configured.)
November 5, 2004
- spam.module
o allow filtering of comments and contents posted from known open email
relays, inspired by http://weblog.sinteur.com/index.php?p=7967
o keep additional statistics for custom filter matches
November 4, 2004
- spam.module
o statistics collection enabled by default
o don't display false statistics if statistics collection is disabled
- README.txt
o mention custom filter functionality
- INSTALL.txt
o created installation/configuration guide
October 28, 2004
- spam.module
o perform regex validation on new custom filters, and prevent duplicates
October 25, 2004
- spam.module
o when displaying custom filters, wrap in htmlspecialchars so they
display properly
o fix pager cut&paste error to properly display >25 custom filters
October 24, 2004
- spam.module
o rest of feature #11991: introducing custom filters
(can define custom regex filters to blacklist/whitelist words/phrases)
- spam.mysql
o adds spam_custom table
(to upgrade, run CREATE TABLE spam_custom... section of spam.mysql)
October 23, 2004
- spam.module
o fix bug #11900: typo preventing users from administrating spam
o part of feature #11991: set multi comment spam/not spam
(requires 'comment.module.patch' be applied to 4.5.0 comment.module)
(also implements multiple-comment-delete, though should be cleaned up
to offer an 'are you sure' dialog)
October 17, 2004
- spam.module
o feature request #11662: display spam poster's IP address
To upgrade, execute the following on your database:
ALTER TABLE spam_nodes ADD hostname varchar(128) NOT NULL default '';
- spam.mysql
o add 'hostname' column to spam_nodes table (not needed for spam_comments
because hostname is already stored in core comments table)
October 16, 2004
- spam.module
o fixed bug #11462: make module work with MySQL 3.x
o fix mail sending logic when comments and nodes are updated
o added administrative spam overview page for viewing statistics
o fix "edit node" link on admin/node/spam admin page
October 10, 2004
- spam.module
o wrap two database queries in missing {}'s for db prefixing support
October 9, 2004
- spam.module:
o first official release for Drupal 4.5
o first official release for Drupal 4.4
o recheck for spam in comments/nodes when updated
o stick redundant code into functions spam_comment_actions/spam_node_actions
o hide bayesian filter options unless 'advanced configuration' is enabled
o added /admin/spam/rebuild/probabilities to force recalculation of the
spam probability of all learned tokens. No link to this option, the
url must be manually entered.
o re-order code, grouping hooks and internal logic
o general cleanup, added some more comments
o fix bug #11429: use htmlspecialchars() instead of htmlentities()
- spam.mysql
o added missing TYPE='s and ;'s
October 6, 2004
- spam.module
o publish/unpublish comments/nodes with function instead of direct db_query
o make comment and node spam overview pages visible again
October 5, 2004
- spam.module
o added /admin/spam/rebuild/all to help with upgrades when tokenizer logic
changes. No link to this option, the url must be manually entered.
o no longer saves hostname when detecting spam comment (redundant)
o fixed typo that prevented unpublished spam comments from being editable
o cleaned up comment filtering admin page, can update multiple comments at
a time
o added node filtering, can enable/disable per node type
o updated help to reflect recent changes
- spam.mysql
o remove redundant 'hostname' column from spam_comments table, info already
in comment table
October 3, 2004
- spam.module
o enhanced tokenizer logic to better handle html links
(Adds some redundancy that may prove problematic or may prove beneficial.
Specifically, each url is looked at whole as well as in pieces)
o added online help
o added phpdoc format comments to internal api functions
o fixed call to spam_unsave_tokens when admin marks comment not spam
o general cleanup
September 29, 2004
- spam.module:
o performance: only recalculate probability of tokens that have changed
o added ability to notify the admin when a spam comment is detected
o update 'last' field in spam_comments when changed
o store IP in spam_comments table when user leaves comment (to be
used for future blacklisting functionality)
o default to rebuilding probability table (if never built before)
o provide two probability calculation methods (development testing)
- spam.mysql:
o made 'token' field of spam_tokens table the PRIMARY KEY
o add 'hostname' field to spam_comments, for later use in blacklisting
September 28, 2004
- spam.module:
o reworked spam statistics logic, added additional counters/timestamps
o fixed comment regarding asort()
o in _spam_rating() switched from confusing while() loop to for() loop
o removed unused gid (group id)
o greatly simplified logic/optimized by combining all token tables
o added ability to auto-unpublish spam comments
o general cleanup
- spam.mysql
o new spam_statistics table - to upgrade drop the old, add the new
o removed unused gid column from all tokens_* tables
o combined all three tokens_* tables into one spam_tokens table
September 26, 2004
- spam.module: initial release (early beta)
spam/LICENSE.txt
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year>  <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.
spam/README.txt
Overview:
--------
The spam module is a powerful collection of tools designed to help website
administrators automatically deal with spam. Spam is any content that
is posted to a website that is unrelated to the subject at hand, usually in
the form of advertising and links back to the spammer's own website. This
module can automatically detect spam, instantly unpublish it, and send
notification to the site administrator.
The spam module provides four main mechanisms for automatically detecting
spam: a trainable Bayesian filter, manually entered custom filters, counting
the number of URLs, and detection of content posted from open email relays.
The Bayesian filter does statistical analysis on spam content, learning from
spam and non-spam that it sees to determine the likelihood that new content
is or is not spam. The filter starts out knowing nothing, and has to be
trained every time it makes a mistake. This is done by marking spam content
on your site as spam when you see it. Each word of the spam content will be
remembered and assigned a probability. The more often a word shows up in
spam content, the higher the probability that future content with the same
word is also spam. As most comment spam contains links back to the spammer's
websites (e.g., to sell Prozac), the Bayesian filter provides a special option
to quickly learn and block content that contains links to known spammer
websites.
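The Bayesian scoring described above can be illustrated with a minimal Python sketch, assuming a Paul Graham-style naive-Bayes combination. This is not the module's PHP code. The tokenizer, the 0.4 probability for never-seen tokens, the 0.01/0.99 clamping, combining only the 15 most decisive tokens, and the probability-to-rating mapping are all assumptions. Only the per-token spam/notspam counters mirror the spam_tokens table in spam.mysql, and the 1 to 99 rating scale comes from INSTALL.txt.

```python
import re
from collections import defaultdict
from math import prod

# Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
# (The real module's tokenizer also looks at URLs whole and in pieces.)
TOKEN_RE = re.compile(r"[a-z0-9']+")


class BayesFilter:
    """Toy trainable filter keeping per-token [spam, notspam] counts,
    analogous to the spam/notspam columns of the spam_tokens table."""

    def __init__(self, unknown=0.4, floor=0.01, ceiling=0.99, max_tokens=15):
        self.counts = defaultdict(lambda: [0, 0])  # token -> [spam, notspam]
        self.unknown = unknown        # assumed probability for never-seen tokens
        self.floor = floor            # clamp so one token can't force 0 or 1
        self.ceiling = ceiling
        self.max_tokens = max_tokens  # only the most decisive tokens are combined

    def tokens(self, text):
        # A set: each distinct token counts once per piece of content.
        return set(TOKEN_RE.findall(text.lower()))

    def train(self, text, is_spam):
        """Call when an administrator marks content as spam or not spam."""
        for tok in self.tokens(text):
            self.counts[tok][0 if is_spam else 1] += 1

    def token_probability(self, tok):
        spam, notspam = self.counts.get(tok, (0, 0))
        if spam + notspam == 0:
            return self.unknown
        return min(self.ceiling, max(self.floor, spam / (spam + notspam)))

    def probability(self, text):
        # Keep the tokens whose probability is furthest from a neutral 0.5.
        probs = sorted((self.token_probability(t) for t in self.tokens(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)[: self.max_tokens]
        if not probs:
            return 0.5
        spamminess = prod(probs)
        hamminess = prod(1 - p for p in probs)
        return spamminess / (spamminess + hamminess)

    def rating(self, text):
        """Map the probability onto a 1..99 scale (1 = most probably not spam)."""
        return min(99, max(1, round(self.probability(text) * 100)))


f = BayesFilter()
f.train("cheap prozac online, buy now", is_spam=True)
f.train("great article about the linux kernel", is_spam=False)
print(f.rating("buy cheap prozac"))      # 99
print(f.rating("linux kernel article"))  # 1
```

Clamping each token's probability keeps a single previously-seen word from driving the combined score to exactly 0 or 1, which is why the filter has to keep being trained as it makes mistakes.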
The custom filtering functionality can blacklist, whitelist or greylist
based on the matching of words, phrases and regular expressions. For example,
a custom filter can be defined to always mark content as spam if it contains
the word 'Viagra'. Or, a custom filter can be defined to increase the
probability that content is spam if it matches the case-insensitive regular
expression /free/i.
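One way such filters could be evaluated is sketched below in Python. The effect names, the fixed 0.2 adjustment for greylist filters, and the /pattern/flags parsing (only the i flag is handled) are assumptions for illustration; spam.mysql only shows that each spam_custom row has a filter string, a regex flag and an integer effect. Compiling the pattern when the filter is created mirrors the CHANGELOG note that new regex filters are validated.

```python
import re
from dataclasses import dataclass, field

# Hypothetical effect names; spam_custom stores 'effect' as an integer whose
# exact meaning is not documented here.
BLACKLIST, WHITELIST, MORE_LIKELY, LESS_LIKELY = "blacklist", "whitelist", "more", "less"


def compile_filter(text, is_regex):
    """Plain filters match the literal word or phrase (case-sensitive).
    Regex filters use PHP-style /pattern/flags syntax, e.g. /free/i;
    only the 'i' flag is honoured in this sketch."""
    if not is_regex:
        return re.compile(re.escape(text))
    m = re.fullmatch(r"/(.*)/([a-z]*)", text, re.DOTALL)
    if m is None:
        raise ValueError(f"expected /pattern/flags, got {text!r}")
    pattern, flags = m.groups()
    # re.compile raises re.error for a malformed pattern, so a bad filter
    # can be rejected as soon as it is entered.
    return re.compile(pattern, re.IGNORECASE if "i" in flags else 0)


@dataclass
class CustomFilter:
    text: str
    is_regex: bool
    effect: str
    compiled: re.Pattern = field(init=False, repr=False)

    def __post_init__(self):
        self.compiled = compile_filter(self.text, self.is_regex)


def apply_custom_filters(content, filters, probability, nudge=0.2):
    """Adjust a spam probability (0..1) using custom filters, in list order.
    The first matching blacklist/whitelist filter decides outright;
    greylist filters shift the probability by an assumed fixed amount."""
    for f in filters:
        if not f.compiled.search(content):
            continue
        if f.effect == BLACKLIST:
            return 0.99
        if f.effect == WHITELIST:
            return 0.01
        if f.effect == MORE_LIKELY:
            probability = min(0.99, probability + nudge)
        elif f.effect == LESS_LIKELY:
            probability = max(0.01, probability - nudge)
    return probability


filters = [
    CustomFilter("Viagra", is_regex=False, effect=BLACKLIST),
    CustomFilter("/free/i", is_regex=True, effect=MORE_LIKELY),
]
print(apply_custom_filters("Buy Viagra here", filters, 0.5))  # 0.99
```

Because blacklist and whitelist matches return immediately, the order of the filter list matters when both kinds could match the same content.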
The spam module can also limit the total number of URLs allowed in comments
and other content, as well as the number of times the same URL can be repeated
in the same content. These limits can be different for comments and for other
types of content. For example, if the module is set to allow the same exact
URL to appear in a comment at most twice, then a comment in which
"http://kerneltrap.org/" appears three or more times will be considered spam.
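The two limits can be sketched in Python as follows. The URL regular expression and the stripping of trailing sentence punctuation are assumptions, not the module's tokenizer; the repeat check compares exact URL strings, as in the kerneltrap.org example above.

```python
import re
from collections import Counter

# Finds http(s) URLs both inside href="..." attributes and in plain text.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def find_urls(content):
    # Drop trailing punctuation that usually ends a sentence, not the URL.
    return [u.rstrip(".,;:!?)") for u in URL_RE.findall(content)]


def exceeds_url_limits(content, max_total, max_repeat):
    """True if content has more than max_total URLs, or any single exact
    URL appears more than max_repeat times."""
    urls = find_urls(content)
    if len(urls) > max_total:
        return True
    return any(n > max_repeat for n in Counter(urls).values())


comment = ('Read <a href="http://kerneltrap.org/">KernelTrap</a>, '
           'http://kerneltrap.org/ and http://kerneltrap.org/.')
print(exceeds_url_limits(comment, max_total=10, max_repeat=2))  # True
print(exceeds_url_limits(comment, max_total=10, max_repeat=3))  # False
```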
The fourth tool for detecting spam is to look up the poster's IP address in
the Distributed Server Boycott List (http://dsbl.org/). If the address is
listed, it is known to belong to an untrusted email server, such as an open
relay, and the content is marked as spam. The theory is that most
comment-spammers are also email spammers.
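DNS-based blocklists such as the DSBL are conventionally queried by reversing the IPv4 octets, appending the list's DNS zone, and resolving the result: if the name resolves, the address is listed. The Python sketch below shows that convention. The zone name "list.dsbl.org" is an assumption and may no longer be served, so substitute a blocklist zone you trust; this is not the module's own lookup code.

```python
import ipaddress
import socket

# Assumed blocklist zone; replace with any DNSBL zone you trust.
DNSBL_ZONE = "list.dsbl.org"


def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """Reverse the IPv4 octets and append the zone:
    1.2.3.4 -> 4.3.2.1.<zone>"""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip, zone=DNSBL_ZONE):
    """Listed if the query name resolves (usually to a 127.0.0.x address)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


print(dnsbl_query_name("192.0.2.1"))  # 1.2.0.192.list.dsbl.org
# is_listed("192.0.2.1") performs a live DNS lookup.
```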
As a Drupal administrator, you can enable any or all of the above tools,
as best suits your needs.
Features:
--------
- written in PHP specifically for Drupal
- learns to detect spam in any language using Bayesian logic
- quickly learns spammer URLs, blocking new comments and other content
containing known spammer URLs
- offers spam filtering for comments and nodes when inserted and updated
- can be configured to only filter specific node types (e.g., only forum
postings)
- allows manual configuration of custom spam filters to blacklist/whitelist
certain words and phrases, supporting the use of regular expressions
- can filter comments and nodes submitted from an IP that is a known
open email relay
- can be configured to automatically unpublish spam
- can be configured to notify the site administrator when spam is detected
- two permissions: 'access spam ratings' and 'administer spam ratings'
- maintains comprehensive statistics to measure spam filter effectiveness
Comments:
--------
Early discussion from when this module was first being designed can be found
here:
http://drupal.org/node/11129
Requires:
--------
- Drupal 4.5
Credits:
-------
- Written by Jeremy Andrews
spam/spam.mysql
CREATE TABLE spam_tokens (
token varchar(255) NOT NULL default '',
spam int(10) unsigned NOT NULL default '0',
notspam int(10) unsigned NOT NULL default '0',
probability int(10) unsigned NOT NULL default '0',
last int(11) unsigned NOT NULL default '0',
PRIMARY KEY token (token),
KEY spam (spam),
KEY notspam (notspam),
KEY probability (probability),
KEY last (last)
) TYPE=MyISAM;
CREATE TABLE spam_statistics (
name varchar(255) NOT NULL default '',
value int(10) unsigned NOT NULL default '0',
last int(11) unsigned NOT NULL default '0',
PRIMARY KEY name (name)
) TYPE=MyISAM;
CREATE TABLE spam_comments (
cid int(10) unsigned NOT NULL default '0',
rating int(2) unsigned NOT NULL default '0',
spam tinyint(1) unsigned NOT NULL default '0',
last int(11) unsigned NOT NULL default '0',
PRIMARY KEY cid (cid),
KEY rating (rating),
KEY spam (spam),
KEY last (last)
) TYPE=MyISAM;
CREATE TABLE spam_custom (
scid int(10) unsigned NOT NULL auto_increment,
filter varchar(255) NOT NULL default '',
regex tinyint(1) unsigned NOT NULL default '0',
effect int(2) unsigned NOT NULL default '0',
matches int(11) unsigned NOT NULL default '0',
last int(11) unsigned NOT NULL default '0',
PRIMARY KEY scid (scid),
KEY filter (filter),
KEY matches (matches),
KEY last (last)
) TYPE=MyISAM;
CREATE TABLE spam_nodes (
nid int(10) unsigned NOT NULL default '0',
rating int(2) unsigned NOT NULL default '0',
spam tinyint(1) unsigned NOT NULL default '0',
hostname varchar(128) NOT NULL default '',
last int(11) unsigned NOT NULL default '0',
PRIMARY KEY nid (nid),
KEY rating (rating),
KEY spam (spam),
KEY last (last)
) TYPE=MyISAM;
spam/INSTALL.txt
------------
Requirements:
------------
- Drupal 4.5
------------
Installation:
------------
1) The first thing to do is to update your database, adding the tables
used by the spam module. This can easily be done from the command line
by copying the included 'spam.mysql' file to your webserver, then
running a command like:
$ mysql -u<username> -p<password> <database> < spam.mysql
For example, if your username is 'drupal', your password is 'secret', and
your database is called 'drupal', you'd type the following command:
$ mysql -udrupal -psecret drupal < spam.mysql
2) Move 'spam.module' into your modules/ directory, and be sure your web
server has read permissions to this file. (It should match the ownership
and permissions of the other files in this directory.)
3) Now you need to log in to your site and enable the spam.module.
(Goto :: administer -> modules :: then check 'spam')
-------------
Configuration:
-------------
4) Configure the spam module.
(Goto :: administer -> settings -> spam)
- Decide which types of content you wish to filter. By default, only the
filtering of comments is enabled. It is also possible to enable the
filtering of any enabled node type, such as forum content, page content,
and story content. To enable the spam filter for a given content type,
place a checkmark next to the desired selection(s).
- The spam module allows you to configure the module to automatically
unpublish spam when it is detected, and to generate an email to the
site administrator when spam is detected.
- Enable 'Filter open relays' to utilize the Distributed Server Boycott
List (http://dsbl.org/). This will cause comments and content posted
from IP addresses listed in the DSBL to be marked as spam.
- It is possible to limit the number of URLs that can appear in comments
and other content. If a single new comment or other content contains
more URLs than the specified limit, it will be marked as spam.
- By default, 'Filter spammer URLs' is enabled. This option tells the
spam module to give URLs special treatment. That is, once the Bayesian
logic determines that a URL links to a spammer website, any future
comments or other content containing the URL will be automatically marked
as spam.
- You can ignore the advanced configuration settings for now. (See step
8 below if you're interested in the advanced configuration options.)
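The open relay check and the URL limits described above can be illustrated with a short PHP sketch. It is not the code in spam.module: dnsbl_query_name(), dnsbl_is_listed(), count_urls() and exceeds_url_limits() are hypothetical names, a URL is assumed to be anything starting with http:// or https:// up to the next space, quote or angle bracket, and a limit of 0 stands for "disabled" as on the settings page. A DNS blocklist lookup reverses the octets of the poster's IPv4 address and appends the blocklist zone (spam.module queries list.dsbl.org); an address counts as listed when that name resolves to an A record.

```php
<?php
// Hypothetical helpers for illustration only; not part of spam.module.

// Build the DNS name used to ask a blocklist about an IPv4 address:
// the octets are reversed and the blocklist zone is appended.
function dnsbl_query_name(string $ip, string $zone): ?string {
  if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
    return null; // only dotted-quad IPv4 addresses are handled here
  }
  return implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
}

// A listed address resolves to an A record in the zone; checkdnsrr()
// defaults to MX records, so the record type is passed explicitly.
function dnsbl_is_listed(string $ip, string $zone): bool {
  $name = dnsbl_query_name($ip, $zone);
  return $name !== null && checkdnsrr($name, 'A');
}

// Count all URLs in a piece of text, plus the largest number of times
// any single URL (compared as the whole string) is repeated.
function count_urls(string $text): array {
  $total = preg_match_all('!https?://[^\s"\'<>]+!i', $text, $m);
  $repeat = $total ? max(array_count_values($m[0])) : 0;
  return [$total, $repeat];
}

// A limit of 0 means "disabled", mirroring the settings page.
function exceeds_url_limits(string $text, int $maxTotal, int $maxRepeat): bool {
  [$total, $repeat] = count_urls($text);
  return ($maxTotal > 0 && $total > $maxTotal)
      || ($maxRepeat > 0 && $repeat > $maxRepeat);
}
```

With 'Total URLs per comment' set to 2, for example, exceeds_url_limits($text, 2, 0) would flag a comment containing three links. Because whole URL strings are compared, http://kerneltrap.org/journals/hacker and http://kerneltrap.org/journals count as two different URLs.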
5) Set up spam module permissions.
(Goto :: administer -> users -> configure -> permissions)
- The two permissions defined by the spam module are intended for site
administrators to help them train their spam filter.
- Give 'access spam rating' permission to users that need to see the
rating that the spam filter has assigned to each piece of new content.
(The rating will be a value from 1 to 99, with 1 being most probably
not spam, and 99 being most probably spam.)
- Give 'administer spam rating' permission to users that should be allowed
to mark content as spam or not spam when the filter makes a wrong
decision.
6) Defining URL filters.
(Goto :: administer -> spam -> URL filters)
- If enabled, URL filters are automatically learned by the Bayesian
filter. Domains listed here are considered "spammer domains", and
any new comment or other content containing references (i.e. links)
to these domains will be marked as spam.
- Domain names that were erroneously learned by the Bayesian filter
as spammer domains can be manually deleted here.
- Known spammer domains can also be manually entered.
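The sketch below shows one way a listed domain could be checked against new content, assuming PHP's parse_url() is used to pull the host out of each link; links_to_spammer_domain() is a hypothetical helper, not a function in spam.module. Comparing hosts (the listed domain itself or any of its subdomains) rather than searching the raw text keeps a listed "spam.com" from also matching "notspam.com".

```php
<?php
// Hypothetical helper: does $text link to any listed domain
// (the domain itself or one of its subdomains)?
function links_to_spammer_domain(string $text, array $domains): bool {
  // Assumed URL shape: http:// or https:// up to whitespace, quote or bracket.
  preg_match_all('!https?://[^\s"\'<>]+!i', $text, $m);
  foreach ($m[0] as $url) {
    // parse_url() returns null (or false) when no host can be found.
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    foreach ($domains as $domain) {
      $domain = strtolower($domain);
      // Exact host match, or a subdomain such as www.spam.com.
      if ($host === $domain
          || substr($host, -strlen($domain) - 1) === '.' . $domain) {
        return true;
      }
    }
  }
  return false;
}
```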
7) Defining custom filters.
(Goto :: administer -> spam -> custom filters)
- In the 'custom filter' text area, enter a string. You can enter a
word, a phrase, or a regular expression. For example, if a large number
of spam contents on your site contain the word 'Viagra', you can use
it as your custom filter.
- If the string you entered was formatted as a regular expression, you
need to check the "Regular expression" box to let the spam module know
it should treat your filter as a regular expression. (If your regular
expression is formatted incorrectly, you will get an error message when
you try to save it.)
- Finally, you need to tell the filter what it should do if the filter
matches. Choices are 'always spam', 'usually spam', 'usually not spam',
and 'never spam'. The first option, 'always spam', allows you to
blacklist matching words, phrases and regular expressions. The last
option, 'never spam', allows you to whitelist matching words, phrases
and regular expressions. The middle two options allow you to greylist
matching words, phrases or regular expressions. When only one greylist
filter matches, the final choice of making new content spam or not spam
is left up to the Bayesian filter.
- After defining custom filters, it is a good idea to regularly visit this
page and review how effective your custom filters are. Simple statistics
are provided to show you how often your filters matched new content, and
when each of them last matched.
- Refer to the 'contributed/custom_filters' directory that came with this
module for example regular expression custom filters.
- Whitelisting words, phrases and regular expressions can be a bad idea.
If a spammer discovers items from your whitelist, they will be able to
consistently get spam through your filter.
- Spam is constantly evolving, so you will probably find that you need
to update your spam filter regularly.
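A minimal sketch of the difference between a plain custom filter and one marked as a regular expression, assuming PHP's PCRE functions and assuming plain filters match case-insensitively anywhere in the text (the module's exact matching rules are not spelled out here). is_valid_regex() and custom_filter_matches() are hypothetical names; the first shows one way to reject a malformed expression before it is saved, since preg_match() returns false when a pattern cannot be compiled.

```php
<?php
// Hypothetical helpers illustrating plain-text vs regular-expression filters.

// Returns true if $pattern compiles as a PCRE regular expression.
function is_valid_regex(string $pattern): bool {
  // preg_match() returns false (and raises a warning, silenced here)
  // when the pattern cannot be compiled.
  return @preg_match($pattern, '') !== false;
}

// Plain filters match case-insensitively anywhere in the text (an
// assumption); regex filters are applied exactly as entered and are
// expected to have passed is_valid_regex() when they were saved.
function custom_filter_matches(string $text, string $filter, bool $isRegex): bool {
  if ($isRegex) {
    return preg_match($filter, $text) === 1;
  }
  return stripos($text, $filter) !== false;
}
```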
8) Advanced configuration of the spam module (optional).
(Goto :: administer -> settings -> spam)
You should only modify the advanced settings for this module if you
feel confident you know what you are doing. If you have not studied
the science behind Bayesian filters and spam filtering, it is not
advised that you tune the advanced settings. Proceed at your own risk.
- Check the 'Advanced configuration' option, and click 'Save configuration'.
- A few tips if too much spam is getting through your Bayesian filter:
o Each and every time the spam filter makes a mistake (marking spam as
not spam, or marking not spam as spam) you must correct it for it
to properly learn.
o Be sure you've trained the filter on several hundred spam comments
(on most websites spam is thankfully uncommon, and this will probably
take a long time).
o Try lowering the 'threshold' value. Be aware that the lower you make
this value, the more "false positives" you are likely to get (content
that the filter wrongly marks as spam).
o Try increasing the 'assign unknown token probability'. This means
words the filter has not seen before will be considered most likely
spam. This is likely to cause more 'false positives' at first,
but with careful training should help the Bayesian filter to
catch much more spam.
o Try adjusting the 'examine how many words' setting to better match
the usual length of your spam and not spam content. It is generally
preferable to keep this value low, otherwise spam authors can easily
add noise words to fool your filter.
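The threshold, the unknown token probability and the number of words examined could fit together as in the sketch below. It assumes the combining rule from Paul Graham's "A Plan for Spam", which the module's help text names as a reference, and it is an illustration rather than spam.module's own rating code; token_probability() follows the "simple" method described on the settings page, and both function names are hypothetical. Content would be treated as spam when the returned rating is at or above the threshold (80 by default).

```php
<?php
// Hypothetical sketch of Graham-style combining; not spam.module's code.

// "Simple" per-token probability: spam sightings / all sightings.
// Tokens never seen before get the configured unknown probability.
function token_probability(int $spamCount, int $totalCount, float $unknown): float {
  if ($totalCount === 0) {
    return $unknown;
  }
  // Clamp away from 0 and 1 so one token cannot force the result to 0 or 1.
  return min(0.99, max(0.01, $spamCount / $totalCount));
}

// Rate content from 1 to 99 using the $interesting tokens whose
// probabilities are furthest from the neutral value 0.5.
function spam_rating_sketch(array $probs, int $interesting): int {
  usort($probs, function ($a, $b) {
    return abs($b - 0.5) <=> abs($a - 0.5);
  });
  $probs = array_slice($probs, 0, $interesting);
  $spam = 1.0;
  $ham = 1.0;
  foreach ($probs as $p) {
    $spam *= $p;
    $ham *= 1 - $p;
  }
  $combined = $spam / ($spam + $ham);
  return max(1, min(99, (int) round($combined * 100)));
}
```

In this sketch, a low 'examine how many words' value means a handful of strongly spam-like words (near 0.99) still outrank many mildly innocent words that a spammer might add as padding.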
spam/optional/README.txt
Optional enhancements to the spam module.
To learn more about these optional patches, open them in a text editor
and review the comments at the beginning.
spam/optional/comment.module.patch
#
# Optionally apply this patch against the core 4.5.0 comment.module to add
# the ability to mark multiple comments as spam or not spam. (Also adds
# bulk publishing, unpublishing, and deletion)
#
--- comment.module.orig 2004-10-23 23:40:42.610396006 -0400
+++ comment.module 2004-10-23 23:39:15.228585442 -0400
@@ -1018,6 +1018,21 @@
print theme('page', form($output));
}
+function comment_delete_multi($cid) {
+ $comment = db_fetch_object(db_query('SELECT c.*, u.name AS registered_name, u.uid FROM {comments} c INNER JOIN {users} u ON u.uid = c.uid WHERE c.cid = %d', $cid));
+
+ drupal_set_message(t('The comment and all its replies have been deleted.'));
+
+ // Delete comment and its replies.
+ _comment_delete_thread($comment);
+
+ _comment_update_node_statistics($comment->nid);
+
+ // Clear the cache so an anonymous user
+ // can see his comment being added.
+ cache_clear_all();
+}
+
function comment_save($id, $edit) {
db_query("UPDATE {comments} SET subject = '%s', comment = '%s', status = %d, format = '%s', name = '%s', mail = '%s', homepage = '%s' WHERE cid = %d", $edit['subject'], $edit['comment'], $edit['status'], $edit['format'], $edit['name'], $edit['mail'], $edit['homepage'], $id);
watchdog('special', t('Comment: modified %subject.', array('%subject' => ''. $edit['subject'] .'')));
@@ -1029,7 +1044,28 @@
*/
function comment_admin_overview($type = 'new') {
+ $operations = array(
+ array(t('Mark the selected comments as spam'), 'spam_admin_mark_comment_spam'),
+ array(t('Mark the selected comments as not spam'), 'spam_admin_mark_comment_notspam'),
+ array(t('Unpublish the selected comments'), 'spam_admin_unpublish_comment'),
+ array(t('Publish the selected comments'), 'spam_admin_publish_comment'),
+ array(t('Delete the selected comments (no confirmation)'), 'comment_delete_multi'),
+ );
+
+ $op = $_POST['op'];
+
+ if ($op == t('Update comments') && isset($_POST['edit']['operation']) && isset($_POST['edit']['status'])) {
+ $function = $operations[$_POST['edit']['operation']][1];
+ foreach ($_POST['edit']['status'] as $cid => $value) {
+ if ($value) {
+ $function($cid);
+ }
+ }
+ drupal_set_message(t('the update has been performed.'));
+ }
+
$header = array(
+ array('data' => t('Select')),
array('data' => t('Subject'), 'field' => 'subject'),
array('data' => t('Author'), 'field' => 'u.name'),
array('data' => t('Status'), 'field' => 'status'),
@@ -1042,9 +1078,25 @@
$sql .= tablesort_sql($header);
$result = pager_query($sql, 50);
+ // Make sure the update controls are disabled if we don't have any rows
+ // to select from.
+ $disabled = !db_num_rows($result);
+
+ $options = array();
+ foreach ($operations as $key => $value) {
+ $options[] = $value[0];
+ }
+
+ $form = form_select(NULL, 'operation', 0, $options, NULL, ($disabled ? 'disabled="disabled"' : ''));
+ $form .= form_submit(t('Update comments'), 'op', ($disabled ? array('disabled' => 'disabled') : array()));
+
+ $output .= '<h3>'. t('Update options') .'</h3>';
+ $output .= "<div class=\"container-inline\">$form</div>";
+
while ($comment = db_fetch_object($result)) {
$comment->name = $comment->uid ? $comment->registered_name : $comment->name;
$rows[] = array(
+ form_checkbox(NULL, "status][$comment->cid", 1, 0),
l($comment->subject, "node/$comment->nid", array('title' => htmlspecialchars(truncate_utf8($comment->comment, 128))), NULL, "comment-$comment->cid") ." ". (node_is_new($comment->nid, $comment->timestamp) ? theme('mark') : ''),
format_name($comment),
($comment->status == 0 ? t('Published') : t('Not published')),
@@ -1058,7 +1110,8 @@
$rows[] = array(array('data' => $pager, 'colspan' => 6));
}
- print theme('page', theme('table', $header, $rows));
+ $output .= theme('table', $header, $rows);
+ print theme('page', form($output));
}
/**
spam/spam.module
The spam module provides an automatic mechanism for detecting and dealing with unwanted spam. It is able to filter comments and nodes. Spam is content posted to your site that is unrelated to the current topic, usually either advertising a product or including links to an external website in the hopes of increasing that website\'s visibility in search engines.
Configure the spam module at '. l(t('administer » settings » spam'), 'admin/settings/spam') .'. View all comment spam at '. l(t('administer » comment » spam'), 'admin/comment/spam') .'. View all node spam at '. l(t('administer » content » spam'), 'admin/node/spam') .'.
');
break;
case 'admin/spam/custom':
$output = t('Define words, phrases and regular expressions to be tested against new content on your site. If a custom filter matches, it can increase or decrease the probability that the given content is spam. For example, if a comment about "viagra" is completely out of place on your site, you could create a custom filter so that any comment containing the word "viagra" is always marked as spam, no matter how the Bayesian filter rates it.
');
break;
case 'admin/spam/urls':
$output = t('The Bayesian filter automatically learns spammer domain names. Any new comment or other content containing one of the domain names listed below will be automatically marked as spam. For example, if "spam.com" is listed below, a new comment containing the text "http://spam.com/great/deals" will be marked as spam. In addition to automatically learning spammer domains, you can also manually add known spammer domains below.
Spammer domains can alternatively be blocked by adding an appropriate "custom filter" at %custom; however, the big advantage of "URL filters" is that they are learned automatically using Bayesian logic.', array('%custom' => l(t('administer » spam » custom filters'), 'admin/spam/custom')));
break;
case 'admin/help#spam':
$output = t("
Overview
The spam module provides an automatic mechanism for detecting and dealing with unwanted spam. It is able to filter comments and nodes. Spam is content posted to your site that is unrelated to the current topic, usually either advertising a product or including links to an external website in the hopes of increasing that website's visibility in search engines.
Spam is detected with Bayesian logic. Essentially, all new content is broken into individual words. These words are then looked up in a database table to check if we've seen them before. If most of the words we have seen before appeared more often in spam content than in non-spam content, then there is a high probability that this new content is also spam. If new content has a high enough probability of being spam, the module can take automatic action, such as unpublishing the content and notifying the website administrator.
To begin, the module has no knowledge of what is spam and what is not spam. Thus, it will assume that all new content is not spam. When you get your first spam posting, you will need to click 'mark as spam' to teach the Bayesian filter that it made a mistake. The content will then be broken into individual words, and the words will be stored in the database to be used for detecting future spam. You will have to continue teaching the Bayesian filter with a couple hundred examples before it will begin to consistently detect spam automatically.
Configuration
The spam module can be configured at ". l(t('administer » settings » spam'), 'admin/settings/spam') .". By default, only comments will be filtered. However, you can enable content filtering as well. Each node type supported by your website will be listed on the spam configuration page, allowing you, for example, to filter forum posts but not stories.
The next configuration section provides automatic actions for the module to take when it detects spam. For example, if you wish for spam to be automatically unpublished when detected, click 'Automatically unpublish spam'. If you wish to send a notification email to the site administrator, click 'Notify admin when spam detected'.
The rest of the options are related to the Bayesian logic, and are best left in their default configurations, unless you feel very confident that you know what you're doing.
Permissions
The spam module provides two permissions that can be configured at ". l(t('administer » user » configure » permissions'), 'admin/user/configure/permission') .". The 'access spam rating' permission allows a user to see whether or not content is rated as spam. The 'administer spam rating' permission allows a user to teach the Bayesian filter, providing links that say 'mark as spam' and 'mark as not spam'.
Spam
The spam module adds links to any site content that is being filtered. The links are only visible to users with the proper permissions (explained above). Additionally, a complete listing of all spam comments on the website can be viewed at ". l(t('administer » comments » spam'), 'admin/comment/spam') .". A complete listing of all spam posts can be viewed at ". l(t('administer » content » spam'), 'admin/node/spam') .".
Background
This module was inspired by my experiences with the Spamassassin mail filter. For actual implementation, I referred to Paul Graham's excellent papers, A Plan For Spam and Better Bayesian Filtering. Further enhancements and ideas were also inspired by Bill Yerazunis' CRM114, especially his paper 'The Spam-Filtering Accuracy Plateau at 99.9% Accuracy'.
");
}
return $output;
}
/* This module defines two permissions:
* access spam rating - allows admin to see content's current spam status
* administer spam rating - allows admin to edit content's current spam status
*/
function spam_perm() {
return array('access spam rating', 'administer spam rating');
}
function spam_cron() {
if (variable_get('spam_calculate_probabilities', 1)) {
spam_calculate_probabilities();
}
}
/* logic to detect spam comments */
function spam_comment($action, $comment) {
global $base_url;
$comment = array2object($comment);
switch ($action) {
case 'insert':
$weight = spam_filter_open_relay();
$weight += spam_custom_filter($comment->subject .' '. $comment->comment);
$weight += spam_filter_urls($comment->subject .' '. $comment->comment);
$tokens = spam_tokenize($comment->subject, 'subject*');
$tokens = array_merge($tokens, spam_tokenize($comment->comment));
$weight += spam_limit_urls('comment', spam_count_urls());
if (($rating = _spam_rating($tokens, $weight)) >= variable_get('spam_threshold', 80)) {
spam_comment_actions($comment, $tokens, 1, $action);
}
else { // not spam
spam_comment_actions($comment, $tokens, 0, $action);
}
db_query('INSERT INTO {spam_comments} (cid,rating,spam,last) VALUES(%d, %d, %d, %d)', $comment->cid, $rating, $rating >= variable_get('spam_threshold', 80) ? 1 : 0, time());
break;
case 'update':
$weight = spam_filter_open_relay();
$weight += spam_custom_filter($comment->subject .' '. $comment->comment);
$weight += spam_filter_urls($comment->subject .' '. $comment->comment);
$tokens = spam_tokenize($comment->subject, 'subject*');
$tokens = array_merge($tokens, spam_tokenize($comment->comment));
$weight += spam_limit_urls('comment', spam_count_urls());
$old = db_fetch_object(db_query('SELECT * FROM {spam_comments} WHERE cid = %d', $comment->cid));
$rating = _spam_rating($tokens, $weight);
$spam = ($rating >= variable_get('spam_threshold', 80)) ? 1 : 0;
if ($old->rating) {
if ($old->spam != $spam) {
// update of comment changed whether or not it is probably spam
spam_comment_actions($comment, $tokens, $spam, $action);
}
db_query('UPDATE {spam_comments} SET rating = %d, last = %d WHERE cid = %d', $rating, time(), $comment->cid);
}
else {
db_query('INSERT INTO {spam_comments} (cid,rating,spam,last) VALUES(%d, %d, %d, %d)', $comment->cid, $rating, $spam, time());
}
break;
case 'delete':
// this hook doesn't actually exist (in Drupal 4.4), leaving for future
db_query('DELETE FROM {spam_comments} WHERE cid = %d', $comment->cid);
break;
}
}
/* logic to detect spam nodes */
function spam_nodeapi(&$node, $op, $arg = 0) {
switch ($op) {
case 'insert':
if (variable_get("spam_filter_$node->type", 0)) {
$weight = spam_filter_open_relay();
$weight += spam_custom_filter($node->title .' '. $node->body);
$weight += spam_filter_urls($node->title .' '. $node->body);
$tokens = spam_tokenize($node->title, 'title*');
$tokens = array_merge($tokens, spam_tokenize($node->body));
$weight += spam_limit_urls('content', spam_count_urls());
if (($rating = _spam_rating($tokens, $weight)) >=
variable_get('spam_threshold', 80)) {
spam_node_actions($node, $tokens, 1, $op);
}
else {
spam_node_actions($node, $tokens, 0, $op);
}
db_query('INSERT INTO {spam_nodes} (nid,rating,spam,hostname,last) VALUES(%d, %d, %d, "%s", %d)', $node->nid, $rating, $rating >= variable_get('spam_threshold', 80) ? 1 : 0, $_SERVER['REMOTE_ADDR'], time());
}
break;
case 'update':
// check if spam status of node has changed
if (variable_get("spam_filter_$node->type", 0)) {
$weight = spam_filter_open_relay();
$weight += spam_custom_filter($node->title .' '. $node->body);
$weight += spam_filter_urls($node->title .' '. $node->body);
$tokens = spam_tokenize($node->title, 'title*');
$tokens = array_merge($tokens, spam_tokenize($node->body));
$weight += spam_limit_urls('content', spam_count_urls());
$old = db_fetch_object(db_query('SELECT * FROM {spam_nodes} WHERE nid = %d', $node->nid));
$rating = _spam_rating($tokens, $weight);
$spam = ($rating >= variable_get('spam_threshold', 80)) ? 1 : 0;
if ($old->rating) {
if ($old->spam != $spam) {
// update of node changed whether or not it is probably spam
spam_node_actions($node, $tokens, $spam, $op);
}
// update rating and timestamp
db_query('UPDATE {spam_nodes} SET rating = %d, last = %d WHERE nid = %d', $rating, time(), $node->nid);
}
else {
db_query('INSERT INTO {spam_nodes} (nid,rating,spam,hostname,last) VALUES(%d, %d, %d, "%s", %d)', $node->nid, $rating, $spam, $_SERVER['REMOTE_ADDR'], time());
}
}
break;
case 'delete':
db_query('DELETE FROM {spam_nodes} WHERE nid = %d', $node->nid);
break;
}
}
function spam_filter_open_relay() {
// inspired by http://weblog.sinteur.com/index.php?p=7967
$weight = 0;
if (variable_get('spam_filter_open_relay', 0)) {
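list($a, $b, $c, $d) = explode('.', $_SERVER['REMOTE_ADDR']);
// blocklist entries are published as A records; checkdnsrr() checks MX by default
if (checkdnsrr("$d.$c.$b.$a.list.dsbl.org", 'A')) {
// this comment was submitted from an open relay, most probably spam
_count(array("+blocked_open_relay"));
$interesting = variable_get('spam_interesting_tokens', 15);
$weight = $interesting * 200;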
}
}
return $weight;
}
function spam_filter_urls($text) {
$weight = 0;
if (variable_get('spam_filter_urls', 1)) {
$interesting = variable_get('spam_interesting_tokens', 15);
$result = db_query("SELECT token FROM {spam_tokens} WHERE probability > %d AND token LIKE 'URL*%%'", variable_get('spam_threshold', 80));
while ($url = db_fetch_object($result)) {
$url = preg_replace('/^URL\*/', '', $url->token);
// quote the learned URL so characters such as '.' and '?' match literally
$match = preg_match_all('!'. preg_quote($url, '!') .'!', $text, $matches);
if ($match) {
_count(array("+matched_spam_url"));
$weight = $weight + ($match * $interesting * 200);
}
}
}
return $weight;
}
function spam_settings() {
// general configuration
$group = form_checkbox(t('Filter comments'), 'spam_filter_comments', 1, variable_get('spam_filter_comments', 1), t('If checked, the spam module will check all new comments that are posted to this site and attempt to determine whether or not they are spam. If only trusted users are able to post comments, there is no need to enable this option.'));
$node_types = node_list();
foreach ($node_types as $type) {
$group .= form_checkbox(t('Filter %type content', array('%type' => $type)), "spam_filter_$type", 1, variable_get("spam_filter_$type", 0), t('If checked, the spam module will check all new %type content that is posted to this site and attempt to determine whether or not it is spam. If only trusted users are able to post %type content, there is no need to enable this option.', array('%type' => $type)));
}
$group .= form_checkbox(t('Filter open relays'), 'spam_filter_open_relay', 1, variable_get('spam_filter_open_relay', 0), t('If checked, the spam module will mark as spam all new or updated comments or content that are posted from an IP address that is a known open email relay. Whether an IP address is an open relay, or is otherwise commonly used to send spam, is determined with the Distributed Server Boycott List. Note that when custom filters are in use, a matching "never spam" rule or several matching "usually not spam" rules can still let postings from open relays through.'));
$group .= form_checkbox(t('Filter spammer URLs'), 'spam_filter_urls', 1, variable_get('spam_filter_urls', 1), t('If checked, the spam module will pay special attention to any URLs that are embedded within comments and other content. When URLs that were found within known spam are found in new comments and other new content, the new content is automatically considered to be spam. When this option is enabled, a single spam URL found within an otherwise spam-free comment or other content will cause the filter to mark the new content as spam.'));
$output = form_group(t('Filter'), $group);
// limits
// array keys are the stored values, so 15 and 20 need explicit keys
$url_limits = array(0 => t('disabled'), 1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 5, 6 => 6, 7 => 7, 8 => 8, 9 => 9, 10 => 10, 15 => 15, 20 => 20);
$group = form_select(t('Total URLs per comment'), 'spam_comment_total_urls', variable_get('spam_comment_total_urls', 0), $url_limits, t('Specify the maximum number of URLs that are allowed in a single comment before the comment is considered to be spam. For example, if you select 5 from the drop-down menu and a comment then contains 6 links, the comment will be marked as spam. This option only takes effect if you enable comment filtering above.'));
$group .= form_select(t('Repeated URLs per comment'), 'spam_comment_repeat_urls', variable_get('spam_comment_repeat_urls', 0), $url_limits, t('Specify the maximum number of times that the same URL is allowed to appear in a single comment before the comment is considered to be spam. For example, if you select 5 from the drop-down menu and a comment then contains 6 links to the same exact location, the comment will be marked as spam. (The entire URL is used, so "http://kerneltrap.org/journals/hacker" is considered different from "http://kerneltrap.org/journals".) This option only takes effect if you enable comment filtering above.'));
$group .= form_select(t('Total URLs per post'), 'spam_content_total_urls', variable_get('spam_content_total_urls', 0), $url_limits, t('Specify the maximum number of URLs that are allowed in a single post before the post is considered to be spam. For example, if you select 5 from the drop-down menu and a post then contains 6 links, the post will be marked as spam. Posts are content other than comments, such as story content and page content. This option only takes effect if you enable content filtering above.'));
$group .= form_select(t('Repeated URLs per post'), 'spam_content_repeat_urls', variable_get('spam_content_repeat_urls', 0), $url_limits, t('Specify the maximum number of times that the same URL is allowed to appear in a single post before the post is considered to be spam. For example, if you select 5 from the drop-down menu and a post then contains 6 links to the same exact location, the post will be marked as spam. (The entire URL is used, so "http://kerneltrap.org/journals/hacker" is considered different from "http://kerneltrap.org/journals".) This option only takes effect if you enable content filtering above.'));
$output .= form_group(t('Limits'), $group);
// actions
$group = form_checkbox(t('Automatically unpublish spam'), 'spam_unpublish', 1, variable_get('spam_unpublish', 0), t('Automatically unpublish spam content. This will prevent it from being displayed unless an administrator manually publishes it or marks the content as being not spam.'));
$group .= form_checkbox(t('Notify admin when spam detected'), 'spam_notify_admin', 1, variable_get('spam_notify_admin', 0), t('Send an email to the site administrator when the Bayesian filter detects spam content.'));
$output .= form_group(t('Actions'), $group);
// advanced configuration options
$group = form_checkbox(t('Advanced configuration'), 'spam_advanced_configuration', 1, variable_get('spam_advanced_configuration', 0), t('Enable the advanced configuration options, allowing you to fine-tune the Bayesian filter.'));
if (variable_get('spam_advanced_configuration', 0)) {
// developer tools
$inner_group = form_checkbox(t('Display spam rating'), 'spam_display_rating', 1, variable_get('spam_display_rating', 0), t('If enabled, the probability that a given piece of content is spam will be displayed next to the content. Useful when configuring your Bayesian filter.'));
$inner_group .= form_checkbox(t('Collect statistics'), 'spam_statistics', 1, variable_get('spam_statistics', 1), t('Measure the effectiveness of the Bayesian filter. There is some performance overhead caused by enabling this option.'));
$inner_output = form_group(t('Tools'), $inner_group);
// Bayesian filter configuration
$inner_group = form_select(t('Threshold'), 'spam_threshold', variable_get('spam_threshold', 80), array(10 => 10, 20 => 20, 30 => 30, 40 => 40, 50 => 50, 60 => 60, 70 => 70, 80 => 80, 90 => 90), t('After content is passed through the Bayesian filter, we get back a value from 1 to 99. 1 means that there is a 1% chance this content is spam; 99 means that there is a 99% chance this content is spam. The threshold is the minimum rating at which content is labeled as spam. It is best to keep this value high to minimize false positives.'));
$inner_group .= form_select(t('Assign unknown token probability'), 'spam_unknown_probability', variable_get('spam_unknown_probability', 40), array(10 => 10, 20 => 20, 30 => 30, 40 => 40, 50 => 50, 60 => 60, 70 => 70, 80 => 80, 90 => 90), t('To calculate whether or not given content is spam, we look at each word and how often it has been seen in spam content versus non-spam content. If we have never seen a word before, we arbitrarily assign this value to it. The lower the value, the more you trust that new content will not be spam.'));
$inner_group .= form_select(t('Examine how many words'), 'spam_interesting_tokens', variable_get('spam_interesting_tokens', 15), array(5 => 5, 10 => 10, 15 => 15, 25 => 25, 35 => 35, 50 => 50, 75 => 75, 100 => 100), t('When determining whether content is spam, we pass only a finite number of the "most interesting" words into the Bayesian filter. The more spam-like or non-spam-like a word is, the "more interesting" it is. That is, a word that has a 50% chance of being spam is "least interesting", whereas words that have a 1% or 99% chance of being spam are "most interesting".'));
$inner_group .= form_select(t('Training method'), 'spam_train_method', variable_get('spam_train_method', 0), array('TOE (Train On Error)', 'TEFT (Train Everything)'), t('TOE, or Train On Error, means that the Bayesian filter will only learn new words when it is told that it mislabeled content. TEFT, or Train Everything, means that the Bayesian filter will automatically learn from every piece of content it sees. TOE is recommended as it tends to be more accurate, avoiding artificial prejudices often introduced by TEFT.'));
$inner_group .= form_select(t('Probability calculation method'), 'spam_waited_probability', variable_get('spam_waited_probability', 1), array(0 => 'simple', 1 => 'weighted'), t('Choosing the "simple" method will calculate the probability that each token is spam by dividing the number of times the token has been seen in spam content by the number of times the token has been seen in all content. Choosing the "weighted" method will use a different algorithm that is quicker to consider a word as more likely to be spam or non-spam.'));
$inner_output .= form_group(t('Bayesian logic (advanced)'), $inner_group);
$group .= $inner_output;
}
$output .= form_group(t('Advanced configuration'), $group);
return $output;
}
function spam_link($type, $node = 0, $main = 0) {
$links = array();
if ($type == 'comment') {
if ($spam = theme('spam_link', $node, 'comment')) {
$links[] = $spam;
}
}
if ($type == 'node') {
if ($spam = theme('spam_link', $node, 'node')) {
$links[] = $spam;
}
}
return ($links);
}
function spam_menu($may_cache) {
$items = array();
if ($may_cache) {
$items[] = array('path' => 'admin/comment/spam', 'title' => t('spam'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_LOCAL_TASK);
$items[] = array('path' => 'admin/node/spam', 'title' => t('spam'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_LOCAL_TASK);
$items[] = array('path' => 'admin/spam', 'title' => t('spam'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin');
$items[] = array('path' => 'admin/spam/statistics',
'title' => t('statistics'), 'weight' => -1,
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_LOCAL_TASK);
$items[] = array('path' => 'admin/spam/custom',
'title' => t('custom filters'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_LOCAL_TASK);
$items[] = array('path' => 'admin/spam/urls',
'title' => t('URL filters'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_LOCAL_TASK);
$items[] = array('path' => 'admin/spam/comment', 'title' => t('comment'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_CALLBACK);
$items[] = array('path' => 'admin/spam/node', 'title' => t('node'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_CALLBACK);
$items[] = array('path' => 'admin/spam/rebuild', 'title' => t('rebuild'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_admin', 'type' => MENU_CALLBACK);
$items[] = array('path' => 'spam', 'title' => t('spam'),
'access' => user_access('administer spam rating'),
'callback' => 'spam_page', 'type' => MENU_CALLBACK);
}
return $items;
}
function spam_admin_comment_overview() {
drupal_set_title(t('Spam comments'));
$operations = array(
array(t('Mark the selected comments as not spam'), 'spam_admin_mark_comment_notspam'),
array(t('Unpublish the selected comments'), 'spam_admin_unpublish_comment'),
array(t('Publish the selected comments'), 'spam_admin_publish_comment'),
);
$op = $_POST['op'];
if ($op == t('Update comments') && isset($_POST['edit']['operation']) && isset($_POST['edit']['status'])) {
$function = $operations[$_POST['edit']['operation']][1];
foreach ($_POST['edit']['status'] as $cid => $value) {
if ($value) {
$function($cid);
}
}
drupal_set_message(t('the update has been performed.'));
}
$header = array(
array('data' => t('select')),
array('data' => t('subject'), 'field' => 'subject'),
array('data' => t('author'), 'field' => 'u.name'),
array('data' => t('hostname'), 'field' => 'c.hostname'),
array('data' => t('status'), 'field' => 'status'),
array('data' => t('time'), 'field' => 'c.timestamp', 'sort' => 'desc'),
array('data' => t('operations'), 'colspan' => 2)
);
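// join {users} so the 'author' column can be sorted on u.name
$sql = "SELECT c.*, u.name, s.cid, s.spam FROM {spam_comments} s INNER JOIN {comments} c ON s.cid = c.cid INNER JOIN {users} u ON c.uid = u.uid WHERE s.spam = 1";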
$sql .= tablesort_sql($header);
$result = pager_query($sql, 50);
// Make sure the update controls are disabled if we don't have any rows
// to select from.
$disabled = !db_num_rows($result);
$options = array();
foreach ($operations as $key => $value) {
$options[] = $value[0];
}
$form = form_select(NULL, 'operation', 0, $options, NULL, ($disabled ? 'disabled="disabled"' : ''));
$form .= form_submit(t('Update comments'), 'op', ($disabled ? array('disabled' => 'disabled') : array()));
$output .= '<h3>'. t('Update options') .'</h3>';
$output .= "<div class=\"container-inline\">$form</div>";
while ($comment = db_fetch_object($result)) {
// comments use status 0 for "published"; status and time are separate cells
$rows[] = array(form_checkbox(NULL, "status][$comment->cid", 1, 0), l($comment->subject, "node/$comment->nid", array('title' => htmlspecialchars(substr($comment->comment, 0, 128))), NULL, "comment-$comment->cid") .' '. (node_is_new($comment->nid, $comment->timestamp) ? theme('mark') : ''), format_name($comment), $comment->hostname, ($comment->status == 0 ? t('published') : t('not published')), format_date($comment->timestamp, 'small'), l(t('edit'), "admin/comment/edit/$comment->cid"), l(t('delete'), "admin/comment/delete/$comment->cid"));
}
if ($pager = theme('pager', NULL, 50, 0, tablesort_pager())) {
$rows[] = array(array('data' => $pager, 'colspan' => 8));
}
$output .= theme('table', $header, $rows);
return form($output);
}
function spam_admin_node_overview() {
drupal_set_title(t('Spam nodes'));
$operations = array(
array(t('Mark the selected posts as not spam'), 'spam_admin_mark_node_notspam'),
array(t('Unpublish the selected posts'), 'spam_admin_unpublish_node'),
array(t('Publish the selected posts'), 'spam_admin_publish_node'),
);
$op = $_POST['op'];
if ($op == t('Update nodes') && isset($_POST['edit']['operation']) && isset($_POST['edit']['status'])) {
$function = $operations[$_POST['edit']['operation']][1];
foreach ($_POST['edit']['status'] as $nid => $value) {
if ($value) {
$function($nid);
}
}
drupal_set_message(t('the update has been performed.'));
}
$header = array(
array('data' => t('select')),
array('data' => t('title'), 'field' => 'title'),
array('data' => t('type'), 'field' => 'type'),
array('data' => t('author'), 'field' => 'u.name'),
array('data' => t('hostname'), 'field' => 'hostname'),
array('data' => t('status'), 'field' => 'status'),
array('data' => t('time'), 'field' => 'n.changed', 'sort' => 'desc'),
array('data' => t('operations'), 'colspan' => 2)
);
$sql = 'SELECT n.*, u.name, u.uid, s.nid, s.spam, s.hostname FROM {spam_nodes} s, {node} n, {users} u WHERE n.uid = u.uid AND s.nid = n.nid AND s.spam = 1';
$sql .= tablesort_sql($header);
$result = pager_query($sql, 50);
// Make sure the update controls are disabled if we don't have any rows
// to select from.
$disabled = !db_num_rows($result);
$options = array();
foreach ($operations as $key => $value) {
$options[] = $value[0];
}
$form = form_select(NULL, 'operation', 0, $options, NULL, ($disabled ? 'disabled="disabled"' : ''));
$form .= form_submit(t('Update nodes'), 'op', ($disabled ? array('disabled' => 'disabled') : array()));
$output .= '<h3>'. t('Update options') .'</h3>';
$output .= "<div class=\"container-inline\">$form</div>";
// Overview table:
while ($node = db_fetch_object($result)) {
$rows[] = array(form_checkbox(NULL, "status][$node->nid", 1, 0), l($node->title, "node/$node->nid") .' '. (node_is_new($node->nid, $node->changed) ? theme_mark() : ''), node_invoke($node, 'node_name'), format_name($node), $node->hostname, ($node->status ? t('published') : t('not published')), format_date($node->changed, 'small'), l(t('edit node'), "node/$node->nid/edit"), l(t('delete node'), "admin/node/delete/$node->nid"));
}
if ($pager = theme('pager', NULL, 50, 0, tablesort_pager())) {
$rows[] = array(array('data' => $pager, 'colspan' => 9));
}
$output .= theme('table', $header, $rows);
return form($output);
}
function spam_admin_statistics() {
$stats = spam_get_statistics();
$output = "Overview: ";
if (variable_get('spam_statistics', 1)) {
if ($stats['spam']->value > 0) {
$output .= t('This site has had a combined total of %spam spam comment and spam node postings. The last spam posting to this site was at %time on %date. %true_positive of the %auto_spam (%true_positive_percent%) automatically detected spam postings were correctly marked as spam. %true_negative of the %auto_notspam (%true_negative_percent%) automatically detected non-spam postings were correctly marked as non-spam. This is an overall filter accuracy of %accuracy%%rebuilt.', array('%spam' => (int)$stats['spam']->value, '%auto_spam' => (int)$stats['auto_marked_spam']->value, '%was' => format_plural((int)$stats['auto_marked_spam']->value, 'was', 'were'), '%time' => format_date((int)$stats['spam']->last, 'custom', 'g:i a'), '%date' => format_date((int)$stats['spam']->last, 'custom', 'l, F j, Y'), '%true_positive' => (int)$stats['auto_marked_spam']->value - (int)$stats['false_positive']->value, '%true_positive_percent' => (int)$stats['auto_marked_spam']->value ? 100 - round((int)$stats['false_positive']->value / (int)$stats['auto_marked_spam']->value * 100, 2) : 0, '%auto_notspam' => (int)$stats['auto_marked_notspam']->value, '%true_negative' => (int)$stats['auto_marked_notspam']->value - (int)$stats['false_negative']->value, '%true_negative_percent' => (int)$stats['auto_marked_notspam']->value ? 100 - round((int)$stats['false_negative']->value / (int)$stats['auto_marked_notspam']->value * 100, 2) : 0, '%accuracy' => (int)$stats['auto_marked_spam']->value + (int)$stats['auto_marked_notspam']->value ? round(100 - ((int)$stats['false_positive']->value + (int)$stats['false_negative']->value) / ((int)$stats['auto_marked_spam']->value + (int)$stats['auto_marked_notspam']->value) * 100, 2) : 100, '%rebuilt' => (int)$stats['rebuilt_tokens_all']->value ? t(' since the last time the filter was rebuilt') : ''));
$output .= ' ['. l(t('rebuild filter'), 'admin/spam/rebuild/all') .']';
}
$output .= " Spam comments: ";
if ($stats['spam_comment']->value > 0) {
$output .= t('This site has had a total of %spam_comment spam comment postings. The last spam comment was posted at %time on %date.', array('%spam_comment' => (int)$stats['spam_comment']->value, '%time' => format_date($stats['spam_comment']->last, 'custom', 'g:i a'), '%date' => format_date($stats['spam_comment']->last, 'custom', 'l, F j, Y')));
$output .= ' ['. l(t('view comment spam'), 'admin/comment/spam') .']';
}
else {
$output .= t('This site has not had any spam comment postings.');
}
$output .= " Spam content: ";
$total = 0;
$node_types = node_list();
foreach ($node_types as $type) {
$spam["$type"] = $stats["spam_$type"]->value;
$total = $total + $stats["spam_$type"]->value;
}
if ($total) {
$types = array();
foreach ($spam as $type => $num) {
if ($num) {
$types[] = t('%num spam %type %posting', array('%num' => (int)$num, '%type' => $type, '%posting' => format_plural($num, 'posting', 'postings')));
$last[] = t(' The last spam %type posting was at %time on %date.', array('%type' => $type, '%time' => format_date($stats["spam_$type"]->last, 'custom', 'g:i a'), '%date' => format_date($stats["spam_$type"]->last, 'custom', 'l, F j, Y')));
}
}
$n = count($types);
if ($n == 1) {
$spam_types = $types[0];
}
else {
for ($i = 0; $i < ($n - 1); $i++) {
$spam_types .= $types[$i] .', ';
}
$spam_types .= 'and '. $types[$n-1];
}
$output .= t('This site has had a combined total of %spam_content spam node %posting in the form of %spam_types.', array('%spam_content' => (int)$total, '%spam_types' => $spam_types, '%posting' => format_plural($total, 'posting', 'postings')));
foreach ($last as $l) {
$output .= $l;
}
$output .= ' ['. l(t('view content spam'), 'admin/node/spam') .']';
}
else {
$output .= t('This site has not had any content spam.');
}
}
else {
$output .= t('This module is currently configured to not collect statistics.');
}
$output .= " ";
return $output;
}
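/**
* Illustrative sketch only: _spam_example_accuracy() is a hypothetical helper
* that is not part of the module's API and is never called by it. It restates
* the percentages reported by spam_admin_statistics() using plain integers in
* place of {spam_statistics} rows, which makes the formulas easier to check.
*
* For example, 40 postings automatically marked spam (2 of them false
* positives) and 160 automatically marked not spam (4 false negatives) give a
* true positive rate of 95%, a true negative rate of 97.5%, and an overall
* accuracy of 100 - (2 + 4) / 200 * 100 = 97%.
*/
function _spam_example_accuracy($auto_spam, $auto_notspam, $false_positive, $false_negative) {
$decided = $auto_spam + $auto_notspam;
return array(
'true_positive_percent' => $auto_spam ? 100 - round($false_positive / $auto_spam * 100, 2) : 0,
'true_negative_percent' => $auto_notspam ? 100 - round($false_negative / $auto_notspam * 100, 2) : 0,
// with no automatic decisions yet, the statistics page reports 100%
'accuracy' => $decided ? round(100 - ($false_positive + $false_negative) / $decided * 100, 2) : 100,
);
}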
function spam_get_statistics() {
$spam_statistics = array();
$result = db_query('SELECT name, value, last FROM {spam_statistics}');
while ($statistic = db_fetch_object($result)) {
$spam_statistics["$statistic->name"] = $statistic;
}
return ($spam_statistics);
}
function spam_admin_urls($edit = array()) {
if (variable_get('spam_filter_urls', 1) == 0) {
$group = t('The URL filtering functionality provided by this module is currently disabled. You can configure URL filters below, but they will not function until you check "Filter spammer URLs" at %url.', array('%url' => l(t('administer » settings » spam'), 'admin/settings/spam')));
$output = form_group(t('Notice'), $group);
}
if (!empty($edit)) {
$edit['url'] = preg_replace('/^URL\*/', '', $edit['token']);
}
$header = array(
array('data' => t('domain'), 'field' => 'token', 'sort' => 'asc'),
array('data' => t('spam matches'), 'field' => 'spam'),
array('data' => t('not spam matches'), 'field' => 'notspam'),
array('data' => t('spam probability'), 'field' => 'probability'),
array('data' => t('last match'), 'field' => 'last'),
array('data' => t('operations'), 'colspan' => 2)
);
// match the full "URL*" prefix so ordinary words such as "URLs" are not listed
$sql = 'SELECT * FROM {spam_tokens} WHERE probability > '. variable_get('spam_threshold', 80) .' AND token LIKE "URL*%%"';
$sql .= tablesort_sql($header);
$result = pager_query($sql, 25);
while ($url = db_fetch_object($result)) {
$rows[] = array(
$url->url = htmlspecialchars(preg_replace('/^URL\*/', '', $url->token)),
$url->spam,
$url->notspam,
$url->probability .'%',
format_date($url->last, 'small'),
l(t('edit'), htmlspecialchars("admin/spam/urls/$url->url/edit")),
l(t('delete'), htmlspecialchars("admin/spam/urls/$url->url/delete"))
);
}
if ($pager = theme('pager', NULL, 25, 0, tablesort_pager())) {
$rows[] = array(array('data' => $pager, 'colspan' => 7));
}
$group = theme('table', $header, $rows);
$output .= form_group(t('URL filters'), $group);
$group = form_textfield(t('Domain'), 'url', $edit['url'], 45, 255, t('Enter a domain name. Any new site content containing this domain will be marked as spam. For example, if you enter "spam.com", a comment containing the URL "http://spam.com/stuff/for/sale" will automatically be marked as spam.'));
if (empty($edit['url'])) {
$group .= form_submit(t('Add URL filter'));
$output .= form_group(t('Add new URL filter'), $group);
}
else {
$group .= form_submit(t('Edit URL filter'));
$group .= form_hidden('token', $edit['token']);
$output .= form_group(t('Edit URL filter'), $group);
}
return form($output, 'post', url('admin/spam/urls'));
}
function _spam_load_urls($url, $type = 'object') {
$result = db_query('SELECT * FROM {spam_tokens} WHERE token = "URL*%s"', $url);
if ($type == 'object') {
$url = db_fetch_object($result);
}
else {
$url = db_fetch_array($result);
}
return $url;
}
function spam_admin_urls_edit($edit = array(), $action = 'add') {
if (!empty($edit['url'])) {
if ($action == 'edit') {
db_query('UPDATE {spam_tokens} SET token = "URL*%s" WHERE token = "%s"', $edit['url'], $edit['token']);
drupal_set_message(t('URL filter "%filter" updated.', array('%filter' => htmlspecialchars($edit['url']))));
}
else { // add
$duplicate = db_fetch_object(db_query('SELECT token FROM {spam_tokens} WHERE token = "URL*%s"', $edit['url']));
// there's no reason to allow duplicate filters
if ($duplicate->token) {
drupal_set_message(t('URL filter "%filter" already exists.', array('%filter' => htmlspecialchars($edit['url']))), 'error');
}
else {
db_query('INSERT INTO {spam_tokens} (token, probability) VALUES("URL*%s", %d)', $edit['url'], 99);
drupal_set_message(t('URL filter "%filter" added.', array('%filter' => htmlspecialchars($edit['url']))));
}
}
}
}
function spam_admin_urls_delete_confirm($edit = array()) {
$edit['url'] = preg_replace('/^URL\*/', '', $edit['token']);
$group = t('Are you sure you want to delete the "%filter" URL filter?', array('%filter' => htmlspecialchars($edit['url']))) .' ';
$group .= form_hidden('token', $edit['token']);
$group .= form_submit(t('Delete URL filter'));
$output = form_group(t('Confirm URL filter deletion'), $group);
return form($output, 'post', url('admin/spam/urls'));
}
function spam_admin_urls_delete($edit) {
db_query('DELETE FROM {spam_tokens} WHERE token = "%s"', $edit['token']);
drupal_set_message(t('URL filter deleted.'));
}
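/**
* Illustrative sketch only: the hypothetical helpers below are not part of
* the module. URL filters are stored in {spam_tokens} as ordinary tokens named
* "URL*<domain>", the same form spam_tokenize() produces for each domain it
* finds, which is why the URL filter pages strip the "URL*" prefix for
* display. Checking the full "URL*" prefix, rather than just "URL", keeps
* ordinary words such as "URLs" from being treated as domain filters.
*/
function _spam_example_url_token($domain) {
return 'URL*'. $domain;
}
function _spam_example_url_domain($token) {
// NULL for tokens that are not URL filters
return strncmp($token, 'URL*', 4) == 0 ? substr($token, 4) : NULL;
}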
function spam_admin_custom($edit = array()) {
$header = array(
array('data' => t('filter'), 'field' => 'filter', 'sort' => 'asc'),
array('data' => t('type'), 'field' => 'regex'),
array('data' => t('effect'), 'field' => 'effect'),
array('data' => t('matches'), 'field' => 'matches'),
array('data' => t('last match'), 'field' => 'last'),
array('data' => t('operations'), 'colspan' => 2)
);
$sql = 'SELECT * FROM {spam_custom}';
$sql .= tablesort_sql($header);
$result = pager_query($sql, 25);
$effects = array(t('always spam'), t('usually spam'), t('usually not spam'), t('never spam'));
while ($custom = db_fetch_object($result)) {
$rows[] = array(
htmlspecialchars($custom->filter),
$custom->regex ? t('regex') : t('plain text'),
$effects["$custom->effect"],
$custom->matches,
format_date($custom->last, 'small'),
l(t('edit'), "admin/spam/custom/$custom->scid/edit"),
l(t('delete'), "admin/spam/custom/$custom->scid/delete")
);
}
if ($pager = theme('pager', NULL, 25, 0, tablesort_pager())) {
$rows[] = array(array('data' => $pager, 'colspan' => 7));
}
$group = theme('table', $header, $rows);
$output = form_group(t('Custom filters'), $group);
$group = form_textfield(t('Custom filter'), 'filter', $edit['filter'], 45, 255, t('Enter a custom filter string. You can enter a word, a phrase, or a complete regular expression. All new content that is being scanned for spam will also be tested against your custom filters.'));
$group .= form_checkbox(t('Regular expression'), 'regex', 1, $edit['regex'], t('Check this box if the above custom filter should be treated as a regular expression. This module uses Perl-compatible regular expressions. As a simple example, to do a case-insensitive match on the word "viagra", you would enter (without the quotes) "/viagra/i".'));
$group .= form_select(t('Match effect'), 'effect', $edit['effect'], array(t('always spam'), t('usually spam'), t('usually not spam'), t('never spam')), t('Define the effect when your custom filter matches on new content. If your filter defines "always spam", this increases the chances the new content will be marked spam by 200%. If your filter defines "usually spam", this increases the chances the new content will be marked spam by 50%. If your filter defines "usually not spam", this decreases the chances the new content will be marked spam by 50%. And if your filter defines "never spam", this decreases the chances the new content will be marked spam by 200%. Note that it is possible to match both an "always spam" and a "never spam" filter with the same content, and that then the filters will cancel each other out. Additionally, four "usually not spam" matches will cancel out one "always spam" match.'));
if (empty($edit['filter'])) {
$group .= form_submit(t('Add filter'));
$output .= form_group(t('Add new custom filter'), $group);
}
else {
$group .= form_hidden('scid', $edit['scid']);
$group .= form_submit(t('Edit filter'));
$output .= form_group(t('Edit custom filter'), $group);
}
return form($output, 'post', url('admin/spam/custom'));
}
function spam_admin_custom_add($edit = array()) {
if (!empty($edit['filter'])) {
// validate if regex
if ($edit['regex'] && preg_match($edit['filter'], 'test') === FALSE) {
/* failed regex validation is a critical error and things break, so we
** just echo an error and exit. (If we don't exit, additional errors
** appear about modifying headers making it confusing)
*/
echo t('Your regular expression "%regex" does not validate. Please press the back button on your browser to fix the error you see above.', array('%regex' => htmlspecialchars($edit['filter'])));
exit (1);
}
if ($edit['scid']) {
db_query('UPDATE {spam_custom} SET filter = "%s", regex = %d, effect = %d WHERE scid = %d', $edit['filter'], $edit['regex'], $edit['effect'], $edit['scid']);
drupal_set_message(t('Custom filter "%filter" updated.', array('%filter' => htmlspecialchars($edit['filter']))));
}
else {
// there's no reason to allow duplicate filters
$duplicate = db_fetch_object(db_query('SELECT scid FROM {spam_custom} WHERE filter = "%s"', $edit['filter']));
if ($duplicate->scid) {
drupal_set_message(t('Custom filter "%filter" already exists.', array('%filter' => htmlspecialchars($edit['filter']))), 'error');
}
else {
db_query('INSERT INTO {spam_custom} (filter, regex, effect) VALUES("%s", %d, %d)', $edit['filter'], $edit['regex'], $edit['effect']);
drupal_set_message(t('Custom filter "%filter" added.', array('%filter' => htmlspecialchars($edit['filter']))));
}
}
}
}
function _spam_load_custom($scid, $type = 'object') {
$result = db_query('SELECT * FROM {spam_custom} WHERE scid = %d', $scid);
if ($type == 'object') {
$custom = db_fetch_object($result);
}
else {
$custom = db_fetch_array($result);
}
return $custom;
}
function spam_admin_custom_delete_confirm($edit = array()) {
$group = t('Are you sure you want to delete the "%filter" filter?', array('%filter' => htmlspecialchars($edit['filter']))) .' ';
$group .= form_hidden('scid', $edit['scid']);
$group .= form_submit(t('Delete filter'));
$output = form_group(t('Confirm filter deletion'), $group);
return form($output, 'post', url('admin/spam/custom'));
}
function spam_admin_custom_delete($scid) {
db_query('DELETE FROM {spam_custom} WHERE scid = %d', $scid);
drupal_set_message(t('Filter deleted.'));
}
function spam_admin() {
$op = $_POST['op'];
$edit = $_POST['edit'];
if (empty($op)) {
$op = arg(1);
}
switch ($op) {
case 'spam':
switch (arg(2)) {
case 'rebuild':
$output = spam_admin_rebuild_check(arg(3));
break;
case 'custom':
switch (arg(4)) {
case 'edit':
$edit = _spam_load_custom(arg(3), 'array');
$output = spam_admin_custom($edit);
break;
case 'delete':
$edit = _spam_load_custom(arg(3), 'array');
$output = spam_admin_custom_delete_confirm($edit);
break;
default:
$output = spam_admin_custom();
break;
}
break;
case 'urls':
switch (arg(4)) {
case 'edit':
$edit = _spam_load_urls(arg(3), 'array');
$output = spam_admin_urls($edit);
break;
case 'delete':
$edit = _spam_load_urls(arg(3), 'array');
$output = spam_admin_urls_delete_confirm($edit);
break;
default:
$output = spam_admin_urls();
break;
}
break;
default:
$output = spam_admin_statistics();
break;
}
break;
// button labels are compared through t() so translated sites still match
case t('Update nodes'):
case 'node':
if (arg(2) == 'spam') {
$output = spam_admin_node_overview();
}
break;
case t('Update comments'):
case 'comment':
if (arg(2) == 'spam') {
$output = spam_admin_comment_overview();
}
break;
case t('Cancel'):
// drupal_goto() exits, so this case never falls through
drupal_goto('admin/spam');
case t('Rebuild filter'):
$output = spam_admin_rebuild($edit['action']);
break;
case t('Add filter'):
spam_admin_custom_add($edit);
drupal_goto('admin/spam/custom');
break;
case t('Edit filter'):
spam_admin_custom_add($edit);
drupal_goto('admin/spam/custom');
break;
case t('Delete filter'):
spam_admin_custom_delete($edit['scid']);
drupal_goto('admin/spam/custom');
break;
case t('Add URL filter'):
spam_admin_urls_edit($edit, 'add');
drupal_goto('admin/spam/urls');
break;
case t('Edit URL filter'):
spam_admin_urls_edit($edit, 'edit');
drupal_goto('admin/spam/urls');
break;
case t('Delete URL filter'):
spam_admin_urls_delete($edit);
drupal_goto('admin/spam/urls');
break;
}
print theme('page', $output);
}
function theme_spam_link($content, $type = 'comment') {
$output = NULL;
if ($type == 'comment') {
if (variable_get('spam_filter_comments', 1)) {
$c = db_fetch_object(db_query('SELECT rating FROM {spam_comments} WHERE cid = %d', $content->cid));
$spam = l(t('mark as spam'), "spam/comment/$content->cid/spam");
$notspam = l(t('mark as not spam'), "spam/comment/$content->cid/notspam");
}
else
return $output;
}
else {
// node
if (variable_get("spam_filter_$content->type", 0)) {
$c = db_fetch_object(db_query('SELECT rating FROM {spam_nodes} WHERE nid = %d', $content->nid));
$spam = l(t('mark as spam'), "spam/node/$content->nid/spam");
$notspam = l(t('mark as not spam'), "spam/node/$content->nid/notspam");
}
else
return $output;
}
$access = user_access('access spam rating');
$admin = user_access('administer spam rating');
$display = variable_get('spam_display_rating', 0);
if (!$c->rating && $admin) {
$output = "$spam - $notspam";
}
else if ($c->rating < variable_get('spam_threshold', 80)) {
if ($access) {
$output = t('not spam') . ($display ? " ($c->rating)" : '');
}
if ($admin) {
if ($output)
$output .= ' - ';
$output .= $spam;
}
}
else {
if ($access) {
$output = t('spam') . ($display ? " ($c->rating)" : '');
}
if ($admin) {
if ($output)
$output .= ' - ';
$output .= $notspam;
}
}
return $output;
}
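/**
* Illustrative sketch only: _spam_example_link_state() is a hypothetical
* helper, not used by this module. It isolates the decision theme_spam_link()
* makes from a stored rating: a missing or zero rating shown to an
* administrator means "unrated" (both links are offered); otherwise a rating
* below the threshold reads as "not spam" and anything at or above it reads
* as "spam". The default threshold of 80 mirrors
* variable_get('spam_threshold', 80).
*/
function _spam_example_link_state($rating, $is_admin, $threshold = 80) {
if (!$rating && $is_admin) {
return 'unrated';
}
return $rating < $threshold ? 'notspam' : 'spam';
}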
function spam_page() {
$content = arg(1);
$op = arg(3);
switch ($content) {
case 'comment':
if ($op) {
$comment = db_fetch_object(db_query('SELECT cid,nid,subject,comment,uid FROM {comments} WHERE cid = %d', arg(2)));
$old = db_fetch_object(db_query('SELECT rating,spam FROM {spam_comments} WHERE cid = %d', $comment->cid));
$spam = ($op == 'spam');
$tokens = spam_tokenize($comment->subject, 'subject*');
$tokens = array_merge($tokens, spam_tokenize($comment->comment));
if (!$old) {
// first time we've looked at this comment
spam_save_tokens($tokens, $op);
db_query('INSERT INTO {spam_comments} (cid,rating,spam,last) VALUES(%d, %d, %d, %d)', $comment->cid, $spam ? 99 : 1, $spam, time());
if ($spam)
_count(array('+spam', '+spam_comment'));
else
_count(array('+notspam', '+notspam_comment'));
}
else {
// updating the comment's spam status
spam_unsave_tokens($tokens, $op);
spam_save_tokens($tokens, $op);
db_query('UPDATE {spam_comments} SET rating = %d, spam = %d, last = %d WHERE cid = %d', $spam ? 99 : 1, $spam, time(), $comment->cid);
if ($spam) {
_count(array('+spam', '+spam_comment', '-notspam', '-notspam_comment', '+false_negative'));
}
else {
_count(array('-spam', '-spam_comment', '+notspam', '+notspam_comment', '+false_positive'));
}
}
if ($spam) {
if (variable_get('spam_unpublish', 0)) {
spam_admin_unpublish_comment($comment->cid);
}
drupal_set_message(t('Comment marked as spam.'));
drupal_goto("node/$comment->nid#comment-$comment->cid");
}
else {
if (variable_get('spam_unpublish', 0)) {
spam_admin_publish_comment($comment->cid);
}
drupal_set_message(t('Comment marked as not spam.'));
drupal_goto("node/$comment->nid#comment-$comment->cid");
}
}
break; // comment content type
case 'node':
if ($op) {
$node = db_fetch_object(db_query('SELECT nid,title,body,uid,type FROM {node} WHERE nid = %d', arg(2)));
$old = db_fetch_object(db_query('SELECT rating,spam FROM {spam_nodes} WHERE nid = %d', $node->nid));
$spam = ($op == 'spam');
$tokens = spam_tokenize($node->title, 'title*');
$tokens = array_merge($tokens, spam_tokenize($node->body));
if (!$old) {
// first time we've looked at this node
spam_save_tokens($tokens, $op);
db_query('INSERT INTO {spam_nodes} (nid,rating,spam,last) VALUES(%d, %d, %d, %d)', $node->nid, $spam ? 99 : 1, $spam, time());
if ($spam)
_count(array('+spam', "+spam_$node->type"));
else
_count(array('+notspam', "+notspam_$node->type"));
}
else {
// updating the node's spam status
spam_unsave_tokens($tokens, $op);
spam_save_tokens($tokens, $op);
db_query('UPDATE {spam_nodes} SET rating = %d, spam = %d, last = %d WHERE nid = %d', $spam ? 99 : 1, $spam, time(), $node->nid);
if ($spam) {
_count(array('+spam', "+spam_$node->type", '-notspam', "-notspam_$node->type", '+false_negative'));
}
else {
_count(array('-spam', "-spam_$node->type", '+notspam', "+notspam_$node->type", '+false_positive'));
}
}
if ($spam) {
if (variable_get('spam_unpublish', 0)) {
spam_admin_unpublish_node($node->nid);
}
drupal_set_message(t('%type marked as spam.', array('%type' => ucfirst($node->type))));
drupal_goto("node/$node->nid");
}
else {
if (variable_get('spam_unpublish', 0)) {
spam_admin_publish_node($node->nid);
}
drupal_set_message(t('Node marked as not spam.'));
drupal_goto("node/$node->nid");
}
}
break; // node content type
}
}
/*********** spam module internal logic ***********/
/**
* Break text into words. Special handling currently exists for urls.
*
* @param $string A string of text to tokenize
* @param $tag An optional tag to prepend to each token
* @return An array of all words obtained from string
*/
function spam_tokenize($string, $tag = NULL) {
/* TODO: Add additional intelligence, such as:
- don't strip comma/period if between two digits
(preserving prices, IPs, etc.)
- break "$20-25" into two words, but leave "big-time" as one
*/
$words = array();
$protocols = "(http://|https://|ftp://|mailto:)";
$tokens = " \t\n.,<>'\"";
// strip some unwanted html/url noise
$sanitized = preg_replace("'(www\.)|()|(href=)|(target=)|(src=)'", '', $string);
$tok = strtok($sanitized, $tokens);
while ($tok !== FALSE) { // a "0" token must not end the loop
$words[] = htmlspecialchars("$tag$tok");
$tok = strtok($tokens);
}
// get additional words from href tags
preg_match_all('/(.*?)<\/a>/i', $string, $urls);
foreach ($urls[1] as $url) {
$words[] = $url;
spam_count_urls($url);
$url = preg_replace("'$protocols'", '', $url);
preg_match("/^()?([^\/\"\']+)/i", $url, $domain);
$words[] = htmlspecialchars("URL*$domain[2]");
$tokens = "/.";
$tok = strtok($url, $tokens);
while ($tok !== FALSE) { // a "0" token must not end the loop
$words[] = htmlspecialchars("$tag$tok");
$tok = strtok($tokens);
}
}
// get urls that are not in href tags
$matches = preg_match_all("!(|[ \n\r\t\(]*)($protocols([a-zA-Z0-9@:%_~#?&=.,/;-]*[a-zA-Z0-9@:%_~#&=/;-]))([.,?]?)(?=( |[ \n\r\t\)]*))!i", $string, $urls);
foreach ($urls[2] as $url) {
spam_count_urls($url);
$url = preg_replace("'$protocols'", '', $url);
preg_match("/^()?([^\/\"\']+)/i", $url, $domain);
$words[] = htmlspecialchars("URL*$domain[2]");
$tokens = "/.";
$tok = strtok($url, $tokens);
while ($tok !== FALSE) { // a "0" token must not end the loop
$words[] = htmlspecialchars("$tag$tok");
$tok = strtok($tokens);
}
}
return $words;
}
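/**
* Illustrative sketch only: _spam_example_url_words() is a hypothetical,
* simplified restatement of how spam_tokenize() turns one URL into tokens.
* It strips a leading protocol, emits a "URL*<domain>" token for the domain
* (the form URL filters match against), then splits the rest on "/" and ".".
* Unlike spam_tokenize() it does not HTML-escape tokens or update the URL
* counters kept by spam_count_urls().
*/
function _spam_example_url_words($url, $tag = '') {
$words = array();
$url = preg_replace('!^(http://|https://|ftp://|mailto:)!i', '', $url);
if (preg_match('!^([^/"\']+)!', $url, $domain)) {
$words[] = 'URL*'. $domain[1];
}
// compare with FALSE so that a "0" path segment does not stop the loop
for ($tok = strtok($url, '/.'); $tok !== FALSE; $tok = strtok('/.')) {
$words[] = $tag . $tok;
}
return $words;
}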
function spam_count_urls($url = NULL) {
static $urls = array();
if ($url != NULL) {
$urls["$url"]++;
$urls['total']++;
}
return $urls;
}
function spam_count_repeat_urls($urls = array()) {
static $max = 0;
if (!$max) {
foreach ($urls as $url => $value) {
if ($url != 'total' && $value > $max)
$max = $value;
}
}
return ($max);
}
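/**
* Illustrative sketch only: _spam_example_max_repeat_urls() is a hypothetical
* helper, not called by this module. It computes the same maximum as
* spam_count_repeat_urls() from an array shaped like the one returned by
* spam_count_urls() (URL => count, plus a 'total' entry), but without the
* static cache, so it gives a fresh answer on every call. The static $max in
* spam_count_repeat_urls() keeps its first non-zero result for the rest of
* the request.
*/
function _spam_example_max_repeat_urls($urls) {
$max = 0;
foreach ($urls as $url => $count) {
// cast the key: a numeric-looking key would have become an integer
if ((string)$url !== 'total' && $count > $max) {
$max = $count;
}
}
return $max;
}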
function spam_limit_urls($type = 'comment', $urls = array()) {
$weight = 0;
$interesting = variable_get('spam_interesting_tokens', 15);
if ($type == 'comment') {
if ($limit = variable_get('spam_comment_total_urls', 0)) {
if ($urls['total'] > $limit) {
_count(array('+comment_too_many_total_urls'));
$weight += 200;
}
}
if ($limit = variable_get('spam_comment_repeat_urls', 0)) {
if (spam_count_repeat_urls($urls) > $limit) {
_count(array('+comment_too_many_repeat_urls'));
$weight += 200;
}
}
}
else {
if ($limit = variable_get('spam_content_total_urls', 0)) {
if ($urls['total'] > $limit) {
_count(array('+content_too_many_total_urls'));
$weight += 200;
}
}
if ($limit = variable_get('spam_content_repeat_urls', 0)) {
if (spam_count_repeat_urls($urls) > $limit) {
_count(array('+content_too_many_repeat_urls'));
$weight += 200;
}
}
}
return ($weight * $interesting);
}
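/**
* Illustrative sketch only: _spam_example_url_limit_weight() is a hypothetical
* helper that restates spam_limit_urls() with explicit arguments instead of
* variable_get() lookups. Each limit that is set (non-zero) and exceeded adds
* 200, and the sum is multiplied by the number of interesting tokens. Because
* _spam_rating() divides the combined total by at most that many tokens, one
* exceeded limit raises the rating by at least 200 points before it is
* capped at 99.
*/
function _spam_example_url_limit_weight($total_urls, $max_repeat, $total_limit, $repeat_limit, $interesting = 15) {
$weight = 0;
if ($total_limit && $total_urls > $total_limit) {
$weight += 200;
}
if ($repeat_limit && $max_repeat > $repeat_limit) {
$weight += 200;
}
return $weight * $interesting;
}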
function spam_custom_filter($text) {
$interesting = variable_get('spam_interesting_tokens', 15);
$weight = 0;
$result = db_query('SELECT scid, filter, regex, effect FROM {spam_custom}');
while ($filter = db_fetch_object($result)) {
if ($filter->regex) {
$match = preg_match_all($filter->filter, $text, $matches);
}
else {
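// plain text: escape regex metacharacters (and the "/" delimiter) so the
// filter matches literally; preg_match_all() counts every match
$match = preg_match_all('/'. preg_quote($filter->filter, '/') .'/', $text, $matches);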
}
if ($match) {
_count(array("+matched_custom_filter"));
db_query('UPDATE {spam_custom} SET matches = matches + %d, last = %d WHERE scid = %d', $match, time(), $filter->scid);
switch ($filter->effect) {
case 0:
// always spam
_count(array("+matched_custom_filter_always_spam"));
$weight = $weight + ($match * $interesting * 200);
break;
case 1:
// usually spam
_count(array("+matched_custom_filter_usually_spam"));
$weight = $weight + ($match * $interesting * 50);
break;
case 2:
// usually not spam
_count(array("+matched_custom_filter_usually_not_spam"));
$weight = $weight - ($match * $interesting * 50);
break;
case 3:
// never spam
_count(array("+matched_custom_filter_never_spam"));
$weight = $weight - ($match * $interesting * 200);
break;
}
}
}
return $weight;
}
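/**
* Illustrative sketch only: the two hypothetical helpers below are not called
* by this module. _spam_example_custom_weight() restates the weights applied
* in spam_custom_filter(), taking an array of effect code => number of matches
* (0 = always spam, 1 = usually spam, 2 = usually not spam, 3 = never spam).
* It shows, for instance, that four "usually not spam" matches cancel one
* "always spam" match. _spam_example_plain_pattern() shows how a plain text
* filter can be turned into a pattern that matches it literally.
*/
function _spam_example_custom_weight($matches, $interesting = 15) {
$per_match = array(0 => 200, 1 => 50, 2 => -50, 3 => -200);
$weight = 0;
foreach ($matches as $effect => $count) {
$weight += $count * $interesting * $per_match[$effect];
}
return $weight;
}
function _spam_example_plain_pattern($text) {
// escape regex metacharacters and the "/" delimiter
return '/'. preg_quote($text, '/') .'/';
}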
/**
* Test an array of words against known tokens and determine the probability
* that the content is spam.
*
* @param $tokens An array of tokens
* @param $weight Extra weight (positive or negative) from custom and URL
* filters, added to the summed token probabilities before averaging
* @return A number from 1 to 99 which is the probability that $tokens are spam
*/
function _spam_rating($tokens = array(), $weight = 0) {
// build token array: key is "token,probability", value is the drift
$t = array();
foreach ($tokens as $token) {
$p = db_fetch_object(db_query('SELECT probability FROM {spam_tokens} WHERE token = "%s"', $token));
$probability = $p ? $p->probability : 0;
if (!$probability) {
$probability = variable_get('spam_unknown_probability', 40);
}
// get drift from median of 50
$t["$token,$probability"] = abs($probability - 50);
}
// sort so largest drift is first (arsort() sorts values in descending order)
arsort($t);
$keys = array_keys($t);
$max = variable_get('spam_interesting_tokens', 15);
$total = 0;
// grab n tokens with largest drift
for ($i = 0; $i < $max; $i++) {
if ($pair = array_shift($keys)) {
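// the probability follows the last comma (the token itself may contain commas)
$probability = substr($pair, strrpos($pair, ',') + 1);
// add up combined probabilities
$total = $total + $probability;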
}
else {
// no more tokens
break;
}
}
// avoid dividing by zero when there are no tokens; empty content scores as not spam
$rating = $i ? ($total + $weight) / $i : 1;
if ($rating > 99)
$rating = 99;
else if ($rating < 1)
$rating = 1;
return $rating;
}
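/**
* Illustrative sketch only: _spam_example_rating() is a hypothetical,
* database-free restatement of _spam_rating(). It takes token => probability
* pairs directly, keeps the $interesting tokens whose probability lies
* furthest from the neutral 50, averages their probabilities together with
* $weight, and clamps the result to the range 1..99. Returning 1 for an empty
* token list is an assumption made here to avoid dividing by zero.
*/
function _spam_example_rating($probabilities, $weight = 0, $interesting = 15) {
$drift = array();
foreach ($probabilities as $token => $probability) {
$drift[$token] = abs($probability - 50);
}
// largest drift first
arsort($drift);
$selected = array_slice(array_keys($drift), 0, $interesting);
if (!count($selected)) {
return 1;
}
$total = 0;
foreach ($selected as $token) {
$total += $probabilities[$token];
}
return max(1, min(99, ($total + $weight) / count($selected)));
}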
/**
* Increment and decrement counters, used to measure the performance of the
* Bayesian filter. When passed in, each counter name should start with a
* '+' or a '-', indicating whether the counter should be incremented or
* decremented.
*
* @param $counters An array of counter names
*/
function _count($counters) {
if (variable_get('spam_statistics', 1)) {
foreach ($counters as $counter) {
if (($c = trim($counter, '+')) != $counter) {
// increment counter
db_query('UPDATE {spam_statistics} SET value = value + 1, last = %d WHERE name = "%s"', time(), $c);
if (!db_affected_rows()) {
db_query('INSERT INTO {spam_statistics} (name, value, last) VALUES("%s", 1, %d)', $c, time());
}
}
else {
// decrement counter
$c = trim($counter, '-');
db_query('UPDATE {spam_statistics} SET value = value - 1, last = %d WHERE name = "%s"', time(), $c);
}
}
}
}
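/**
* Illustrative sketch only: _spam_example_counter_deltas() is a hypothetical
* helper, not called by this module. It applies the same '+'/'-' prefix rule
* as _count() but returns the net change per counter instead of writing to
* {spam_statistics}. For example, re-marking a comment as spam in spam_page()
* passes '+spam', '+spam_comment', '-notspam', '-notspam_comment' and
* '+false_negative', i.e. one posting moves from "not spam" to "spam".
*/
function _spam_example_counter_deltas($counters) {
$deltas = array();
foreach ($counters as $counter) {
if (($name = trim($counter, '+')) != $counter) {
$delta = 1;
}
else {
// as in _count(), anything not prefixed with '+' is a decrement
$name = trim($counter, '-');
$delta = -1;
}
$deltas[$name] = (isset($deltas[$name]) ? $deltas[$name] : 0) + $delta;
}
return $deltas;
}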
/**
* Saves an array of tokens to the database, marking as 'spam' or 'notspam'.
* If the token already exists in the database, then the appropriate counter
* is incremented, tracking how many times it has been seen.
*
* @param $tokens An array of tokens to be saved in the database
* @param $type String, either 'spam' or 'notspam'
*/
function spam_save_tokens($tokens, $type) {
foreach ($tokens as $token) {
if ($type == 'spam') {
db_query('UPDATE {spam_tokens} SET spam = spam + 1, last = %d WHERE token = "%s"', time(), $token);
if (!db_affected_rows()) {
@db_query('INSERT INTO {spam_tokens} (token, spam, probability, last) VALUES("%s", 1, 99, %d)', $token, time());
}
}
else {
db_query('UPDATE {spam_tokens} SET notspam = notspam + 1, last = %d WHERE token = "%s"', time(), $token);
if (!db_affected_rows()) {
@db_query('INSERT INTO {spam_tokens} (token, notspam, probability, last) VALUES("%s", 1, 1, %d)', $token, time());
}
}
}
_count(array("+learned_$type"));
variable_set('spam_calculate_probabilities', 1);
}
/**
* Removes an array of 'spam' or 'notspam' tokens from the database. This
* function is used to unlearn specific content that was mis-marked as spam or
* notspam, and is called just before spam_save_tokens() relearns the content
* with its corrected classification.
*
* @param $tokens An array of tokens to be removed from the database
* @param $type String, either 'spam' or 'notspam': the content's new
* classification. Passing 'spam' decrements each token's 'notspam' counter
* (the content was previously learned as not spam), and vice versa.
*/
function spam_unsave_tokens($tokens, $type) {
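  foreach ($tokens as $token) {
    // the "> 0" conditions keep a counter from dropping below zero
    if ($type == 'spam') {
      // content is now spam: forget that it was counted as notspam
      db_query('UPDATE {spam_tokens} SET notspam = notspam - 1, last = %d WHERE token = "%s" AND notspam > 0', time(), $token);
    }
    else {
      // content is now notspam: forget that it was counted as spam
      db_query('UPDATE {spam_tokens} SET spam = spam - 1, last = %d WHERE token = "%s" AND spam > 0', time(), $token);
    }
  }
  if ($type == 'spam') {
    _count(array('+unlearned_notspam'));
  }
  else {
    _count(array('+unlearned_spam'));
  }
  variable_set('spam_calculate_probabilities', 1);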
}
/**
* Simplistic algorithm for calculating the probability that each token is
* spam. The lowest probability of 1 means there is a 1% chance that the
* given token is spam. The highest probability of 99 means there is a 99%
* chance that the given token is spam.
*
* By default we use a 'weighted' algorithm for calculating probabilities.
* The weighted algorithm is quicker to determine that a given token is or
* is not spam. An alternative non-weighted algorithm can be selected through
* the module's configuration page.
*
* TODO: Measure the effectiveness of both of these algorithms. Pick the best,
* or come up with something better. Ultimately this should not be a
* configurable option.
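 *
 * Worked example (hypothetical token counts): for a token seen 2 times in
 * spam and 3 times in non-spam content, total = 5, so
 *   weighted:     50 + (2 / 5 * 100) - (3 / 5 * 100) = 50 + 40 - 60 = 30
 *   non-weighted: 2 / 5 * 100 = 40
 * With 3 spam and 1 non-spam sightings the weighted result is
 * 50 + 75 - 25 = 100, which is clamped to 99 (non-weighted: 75). In general
 * the weighted probability is 50 + 100 * (spam - notspam) / total, so it
 * moves away from the 50% median exactly twice as fast as the plain spam
 * ratio does.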
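 *
 * @param $since Only tokens updated at or after this timestamp are
 *   recalculated; when 0, the time of the previous calculation (stored in
 *   the 'spam_calculated_probabilities' variable) is used
 */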
function spam_calculate_probabilities($since = 0) {
  // TODO: look into memory consumption with large token tables, could get big
  // 'probability' is selected so that unchanged tokens can be skipped below
  $result = db_query('SELECT token, spam, notspam, probability FROM {spam_tokens} WHERE last >= %d', $since ? $since : variable_get('spam_calculated_probabilities', 0));
  while ($token = db_fetch_object($result)) {
    $total = $token->spam + $token->notspam;
    if ($total) {
      if (variable_get('spam_weighted_probability', 1)) {
        $spam_probability = $token->spam / $total * 100;
        $notspam_probability = $token->notspam / $total * 100;
        // start at median, then add/subtract spam/notspam weights
        $probability = 50 + $spam_probability - $notspam_probability;
      }
      else {
        $probability = $token->spam / $total * 100;
      }
      // clamp to the 1..99 range
      if ($probability > 99) {
        $probability = 99;
      }
      else if ($probability < 1) {
        $probability = 1;
      }
    }
    else {
      $probability = variable_get('spam_unknown_probability', 40);
    }
    // probabilities are stored as integers, so compare integer values
    $probability = (int) $probability;
    if ($token->probability != $probability) {
      db_query("UPDATE {spam_tokens} SET probability = %d WHERE token = '%s'", $probability, $token->token);
    }
  }
  variable_set('spam_calculated_probabilities', time());
  variable_set('spam_calculate_probabilities', 0);
  _count(array('+calculated_probabilites'));
}
/**
 * Wrapper to user_mail() that sends a notification to the site administrator
 * (user 1). The %adminname, %title, %body, %sitename and %editurl
 * placeholders in $message are replaced, and the result is wrapped at 72
 * columns.
 */
function _spam_mail($subject, $message, $content_title, $content_body, $editurl) {
  $admin = user_load(array('uid' => 1));
  $to = $admin->mail;
  $from = variable_get('site_mail', ini_get('sendmail_from'));
  $headers = "From: $from\nReply-to: $from\nX-Mailer: Drupal\nReturn-path: <$from>\nErrors-to: $from\n";
  $variables = array('%adminname' => $admin->name, '%title' => $content_title, '%body' => $content_body, '%sitename' => variable_get('site_name', 'drupal'), '%editurl' => $editurl);
  user_mail($to, $subject, wordwrap(strtr($message, $variables), 72), $headers);
}
/**
 * Returns the actual text of all email generated by this module.
 */
function _spam_mail_text($message) {
  switch ($message) {
    case 'notify_admin_comment':
      return t("Hello %adminname.\n\n A spam comment has been automatically detected on your '%sitename' website. The text of the spam comment is as follows:\n\n\nSubject: %title\n%body\n\n\n You can find the comment here:\n %editurl");
    default:
      // also used for 'notify_admin_content'
      return t("Hello %adminname.\n\n Content recently posted to your '%sitename' website has been automatically marked as spam. The text of the spam content is as follows:\n\n\nTitle: %title\n%body\n\n\n You can find the content here:\n %editurl");
  }
}
/**
 * Applies the configured actions to a comment that has just been classified.
 *
 * @param $comment The comment object
 * @param $tokens The tokens extracted from the comment
 * @param $spam Non-zero if the comment was classified as spam
 * @param $type Either 'insert' or 'update'
 */
function spam_comment_actions($comment, $tokens, $spam, $type) {
  // needed to build the link in the notification email
  global $base_url;
  if ($spam) {
    if (variable_get('spam_unpublish', 0)) {
      spam_admin_unpublish_comment($comment->cid);
    }
    if (variable_get('spam_train_method', 0)) {
      // auto-learn, TEFT = train everything
      spam_save_tokens($tokens, 'spam');
    }
    if ($type == 'insert') {
      if (variable_get('spam_notify_admin', 0)) {
        _spam_mail(t('[%sitename] Detected spam comment', array('%sitename' => variable_get('site_name', 'drupal'))), _spam_mail_text('notify_admin_comment'), $comment->subject, $comment->comment, "$base_url" . url("/admin/comment/spam"));
      }
      _count(array('+spam', '+auto_marked_spam', '+spam_comment'));
    }
    else { // update
      if (variable_get('spam_notify_admin', 0)) {
        _spam_mail(t('[%sitename] Detected spam comment on update', array('%sitename' => variable_get('site_name', 'drupal'))), _spam_mail_text('notify_admin_comment'), $comment->subject, $comment->comment, "$base_url" . url("/admin/comment/spam"));
      }
      _count(array('+spam', '-notspam', '+auto_marked_spam', '+spam_comment', '+spam_on_comment_update'));
    }
  }
  else { // not spam
    if (variable_get('spam_unpublish', 0)) {
      spam_admin_publish_comment($comment->cid);
    }
    if (variable_get('spam_train_method', 0)) {
      // auto-learn, TEFT = train everything
      spam_save_tokens($tokens, 'notspam');
    }
    if ($type == 'insert') {
      _count(array('+notspam', '+auto_marked_notspam', '+notspam_comment'));
    }
    else { // update
      // the '+' prefix is required, otherwise _count() decrements the counter
      _count(array('+notspam', '-spam', '+auto_marked_notspam', '+notspam_comment', '+notspam_on_comment_update'));
    }
  }
}
/**
 * Applies the configured actions to a node that has just been classified.
 *
 * @param $node The node object
 * @param $tokens The tokens extracted from the node
 * @param $spam Non-zero if the node was classified as spam
 * @param $type Either 'insert' or 'update'
 */
function spam_node_actions($node, $tokens, $spam, $type) {
  global $base_url;
  if ($spam) {
    if (variable_get('spam_unpublish', 0)) {
      spam_admin_unpublish_node($node->nid);
    }
    if (variable_get('spam_train_method', 0)) {
      // auto-learn, TEFT = train everything
      spam_save_tokens($tokens, 'spam');
    }
    if ($type == 'insert') {
      if (variable_get('spam_notify_admin', 0)) {
        _spam_mail(t('[%sitename] Detected spam content', array('%sitename' => variable_get('site_name', 'drupal'))), _spam_mail_text('notify_admin_content'), $node->title, $node->body, "$base_url" . url("/admin/node/spam"));
      }
      _count(array('+spam', '+auto_marked_spam', "+spam_$node->type"));
    }
    else { // update
      if (variable_get('spam_notify_admin', 0)) {
        _spam_mail(t('[%sitename] Detected spam content on update', array('%sitename' => variable_get('site_name', 'drupal'))), _spam_mail_text('notify_admin_content'), $node->title, $node->body, "$base_url" . url("/admin/node/spam"));
      }
      _count(array('+spam', '-notspam', '+auto_marked_spam', "+spam_$node->type", "+spam_on_$node->type".'update'));
    }
  }
  else { // not spam
    if (variable_get('spam_unpublish', 0)) {
      spam_admin_publish_node($node->nid);
    }
    if (variable_get('spam_train_method', 0)) {
      // auto-learn, TEFT = train everything
      spam_save_tokens($tokens, 'notspam');
    }
    if ($type == 'insert') {
      _count(array('+notspam', '+auto_marked_notspam', "+notspam_$node->type"));
    }
    else { // update
      _count(array('+notspam', '-spam', '+auto_marked_notspam', "+notspam_$node->type", "+notspam_on_$node->type".'update'));
    }
  }
}
/**
 * Marks a comment as spam at the administrator's request, moving its tokens
 * from the 'notspam' counters to the 'spam' counters.
 */
function spam_admin_mark_comment_spam($cid) {
  $comment = db_fetch_object(db_query('SELECT cid,nid,subject,comment,uid FROM {comments} WHERE cid = %d', $cid));
  $tokens = spam_tokenize($comment->subject, 'subject*');
  $tokens = array_merge($tokens, spam_tokenize($comment->comment));
  spam_unsave_tokens($tokens, 'spam');
  spam_save_tokens($tokens, 'spam');
  db_query('UPDATE {spam_comments} SET rating = 99, spam = 1, last = %d WHERE cid = %d', time(), $comment->cid);
  if (variable_get('spam_unpublish', 0)) {
    spam_admin_unpublish_comment($comment->cid);
  }
  _count(array('+spam', '+spam_comment', '-notspam', '+false_negative'));
  drupal_set_message(t('Comment marked as spam.'));
}
/**
 * Marks a comment as not spam at the administrator's request, moving its
 * tokens from the 'spam' counters to the 'notspam' counters.
 */
function spam_admin_mark_comment_notspam($cid) {
  $comment = db_fetch_object(db_query('SELECT cid,nid,subject,comment,uid FROM {comments} WHERE cid = %d', $cid));
  $tokens = spam_tokenize($comment->subject, 'subject*');
  $tokens = array_merge($tokens, spam_tokenize($comment->comment));
  spam_unsave_tokens($tokens, 'notspam');
  spam_save_tokens($tokens, 'notspam');
  db_query('UPDATE {spam_comments} SET rating = 1, spam = 0, last = %d WHERE cid = %d', time(), $comment->cid);
  if (variable_get('spam_unpublish', 0)) {
    spam_admin_publish_comment($comment->cid);
  }
  _count(array('-spam', '-spam_comment', '+notspam', '+false_positive'));
  drupal_set_message(t('Comment marked as not spam.'));
}
function spam_admin_publish_comment($cid) {
  db_query('UPDATE {comments} SET status = 0 WHERE cid = %d', $cid);
  // keep the parent node's comment statistics accurate (bug #14263)
  $comment = db_fetch_object(db_query('SELECT nid FROM {comments} WHERE cid = %d', $cid));
  _comment_update_node_statistics($comment->nid);
}
function spam_admin_unpublish_comment($cid) {
  db_query('UPDATE {comments} SET status = 2 WHERE cid = %d', $cid);
  // keep the parent node's comment statistics accurate (bug #14263)
  $comment = db_fetch_object(db_query('SELECT nid FROM {comments} WHERE cid = %d', $cid));
  _comment_update_node_statistics($comment->nid);
}
/**
 * Marks a node as not spam at the administrator's request, then returns to
 * the spam content overview.
 */
function spam_admin_mark_node_notspam($nid) {
  $node = db_fetch_object(db_query('SELECT nid,title,body,uid,type FROM {node} WHERE nid = %d', $nid));
  $tokens = spam_tokenize($node->title, 'title*');
  $tokens = array_merge($tokens, spam_tokenize($node->body));
  spam_unsave_tokens($tokens, 'notspam');
  spam_save_tokens($tokens, 'notspam');
  db_query('UPDATE {spam_nodes} SET rating = 1, spam = 0, last = %d WHERE nid = %d', time(), $nid);
  if (variable_get('spam_unpublish', 0)) {
    spam_admin_publish_node($nid);
  }
  _count(array('-spam', "-spam_$node->type", '+notspam', '+false_positive'));
  // pass the node type as a placeholder so the message remains translatable
  drupal_set_message(t('%type marked as not spam.', array('%type' => $node->type)));
  drupal_goto('admin/node/spam');
}
function spam_admin_publish_node($nid) {
  db_query('UPDATE {node} SET status = 1 WHERE nid = %d', $nid);
}
function spam_admin_unpublish_node($nid) {
  db_query('UPDATE {node} SET status = 0 WHERE nid = %d', $nid);
}
function spam_admin_rebuild_check($action) {
  switch ($action) {
    case 'probabilities':
      // no check here, it won't hurt anything, just takes CPU time
      spam_admin_rebuild($action);
      break;
    case 'all':
      drupal_set_title(t('Rebuild Bayesian filter'));
      $group = t('Are you sure that you want to completely rebuild the Bayesian filter? This will reset all your counters and flush all data that the module has learned. It will then relearn the known spam content. Knowledge from any spam content that you may have previously deleted will be completely lost. You should only perform this action if you are upgrading the spam.module to one using different tokenizer logic. If you don\'t know what this means, click "Cancel".') . ' ';
      $group .= form_submit(t('Cancel'));
      $group .= form_submit(t('Rebuild filter'));
      // the field name is a form key, so it must not be translated
      $group .= form_hidden('action', $action);
      $output = form_group(t('Confirm action'), $group);
      return form($output, 'post', url('admin/spam/rebuild'));
    default:
      drupal_goto('admin/spam');
      break;
  }
}
/**
 * Purges all spam tokens and statistics, then completely rebuilds them from
 * the spam comments and nodes found in their respective tables. This is
 * useful for three reasons: 1) after upgrading to a new version of the
 * tokenizer, the known tokens must be resynchronized, 2) once the spam filter
 * seems well trained, it lets you cleanly measure its performance, and 3)
 * when testing or developing an improved tokenizer or spam probability logic.
 * The 'probabilities' action only recalculates every token's probability.
 *
 * @param $action Either 'probabilities' or 'all'
 */
function spam_admin_rebuild($action) {
  switch ($action) {
    case 'probabilities':
      // recalculate all token probabilities
      spam_calculate_probabilities(1);
      drupal_goto('');
      break;
    case 'all':
      db_query('DELETE FROM {spam_tokens}');
      db_query('DELETE FROM {spam_statistics} WHERE name != "rebuilt_tokens_all"');
      // relearn every comment marked as spam
      $result = db_query('SELECT cid FROM {spam_comments} WHERE spam = 1');
      while ($c = db_fetch_object($result)) {
        $comment = db_fetch_object(db_query('SELECT cid,subject,comment FROM {comments} WHERE cid = %d', $c->cid));
        $tokens = spam_tokenize($comment->subject, 'subject*');
        $tokens = array_merge($tokens, spam_tokenize($comment->comment));
        spam_save_tokens($tokens, 'spam');
        // known spam gets the maximum rating, as in spam_admin_mark_comment_spam()
        db_query('UPDATE {spam_comments} SET rating = 99, last = %d WHERE cid = %d', time(), $comment->cid);
        if (variable_get('spam_unpublish', 0)) {
          spam_admin_unpublish_comment($comment->cid);
        }
        _count(array('+spam', '+spam_comment'));
      }
      // relearn every node marked as spam
      $result = db_query('SELECT nid FROM {spam_nodes} WHERE spam = 1');
      while ($n = db_fetch_object($result)) {
        $node = db_fetch_object(db_query('SELECT nid,title,body,type FROM {node} WHERE nid = %d', $n->nid));
        $tokens = spam_tokenize($node->title, 'title*');
        $tokens = array_merge($tokens, spam_tokenize($node->body));
        spam_save_tokens($tokens, 'spam');
        db_query('UPDATE {spam_nodes} SET rating = 99, last = %d WHERE nid = %d', time(), $n->nid);
        if (variable_get('spam_unpublish', 0)) {
          spam_admin_unpublish_node($n->nid);
        }
        _count(array('+spam', "+spam_$node->type"));
      }
      _count(array('+rebuilt_tokens_all'));
      drupal_goto('admin/spam');
      break;
  }
}
?>