Cleanword
A simple, fast, and extensible JavaScript package to detect and censor abusive words in multiple Indian and international languages. Useful for chat moderation, content filtering, and building safe online communities.
Features
- Detects and censors abusive words in Hindi, English, Bengali, Urdu, and more
- Customizable censorship character (grawlix)
- Fine-grained control with alwaysAllow and alwaysBlock word lists
- Easy to use and integrate in Node.js projects
Installation
npm install cleanword
Usage
Example
const { cleanText } = require('cleanword');

const options = {
  language: ['english', 'hindi'],
  grawlixChar: '@',
  alwaysAllow: ['kutto'],        // never censored
  alwaysBlock: ['test', 'what'], // always censored
};

const cleaned = cleanText('This is a test sentence with kutto and what.', options);
console.log(cleaned); // This is a @@@@ sentence with kutto and @@@@.
TypeScript Example
import { cleanText } from 'cleanword';

// Every option is optional, and language accepts a single string or an array (see API below).
interface CleanTextOptions {
  language?: string | string[];
  grawlixChar?: string;
  alwaysAllow?: string[];
  alwaysBlock?: string[];
}

const options: CleanTextOptions = {
  language: ['english', 'hindi'],
  grawlixChar: '@',
  alwaysAllow: ['kutto'],
  alwaysBlock: ['test', 'what'],
};

const cleaned: string = cleanText('This is a test sentence with kutto and what.', options);
console.log(cleaned); // This is a @@@@ sentence with kutto and @@@@.
API
cleanText(text, options)
- text (string): The input string to clean.
- options (object, optional):
  - language (string | string[]): Language(s) to check (default: 'hindi').
  - grawlixChar (string): Character to use for censorship (default: '*').
  - alwaysAllow (string[]): Words that should never be censored, even if abusive.
  - alwaysBlock (string[]): Words that should always be censored, even if not abusive.
  - customAbuseSet (Set<string>): Custom set of abusive words (for advanced use/testing).
Returns: The cleaned string with abusive words replaced by the grawlix character.
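To make the option semantics above concrete, here is a minimal sketch of word-level censoring. It is not cleanword's actual implementation: the function name censorSketch, the abuseSet parameter, the tokenizing regular expression, and the rule that alwaysAllow takes precedence over alwaysBlock are all assumptions made for illustration, and the real package may match words differently.

```javascript
// Hypothetical sketch only; not taken from the cleanword source.
function censorSketch(text, options = {}) {
  const {
    abuseSet = new Set(), // assumed: lowercase abusive words for the chosen languages
    grawlixChar = '*',
    alwaysAllow = [],
    alwaysBlock = [],
  } = options;

  const allow = new Set(alwaysAllow.map((w) => w.toLowerCase()));
  const block = new Set(alwaysBlock.map((w) => w.toLowerCase()));

  // Match runs of Unicode letters and combining marks so that
  // words written in Indic scripts are treated as whole tokens.
  return text.replace(/[\p{L}\p{M}]+/gu, (word) => {
    const key = word.toLowerCase();
    if (allow.has(key)) return word; // assumed precedence: allow wins
    if (block.has(key) || abuseSet.has(key)) {
      // One grawlix character per code point of the original word.
      return grawlixChar.repeat([...word].length);
    }
    return word;
  });
}

const result = censorSketch('This is a test sentence with kutto and what.', {
  abuseSet: new Set(['kutto']),
  grawlixChar: '@',
  alwaysAllow: ['kutto'],
  alwaysBlock: ['test', 'what'],
});
console.log(result);
```

Because only letter runs are replaced, punctuation such as the trailing period passes through unchanged.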
Config Options
Option | Type | Description |
---|---|---|
language | string/string[] | Languages to check (e.g. 'hindi', 'english', 'bengali', 'urdu') |
grawlixChar | string | Character to use for censorship (default: '*') |
alwaysAllow | string[] | Words to never censor |
alwaysBlock | string[] | Words to always censor |
customAbuseSet | Set<string> | Custom abusive word set (advanced/testing) |
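If you build options in several places, it can help to fill in the documented defaults once. The helper below is a hypothetical sketch: withDefaults is not part of cleanword, and the empty-array defaults for alwaysAllow and alwaysBlock are an assumption (this README documents defaults only for language and grawlixChar).

```javascript
// Hypothetical helper; not exported by cleanword.
function withDefaults(options = {}) {
  return {
    language: 'hindi', // documented default
    grawlixChar: '*',  // documented default
    alwaysAllow: [],   // assumed default
    alwaysBlock: [],   // assumed default
    ...options,        // caller-supplied values override the defaults
  };
}

const merged = withDefaults({ grawlixChar: '#', alwaysBlock: ['spam'] });
console.log(merged);
```

The merged object can then be passed as the second argument to cleanText.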
Supported Languages
- Hindi
- English
- Assamese
- Bengali
- Bhojpuri
- Marathi
- Chhattisgarhi
- Gujarati
- Haryanvi
- Kannada
- Kashmiri
- Konkani
- Ladakhi
- Malayalam
- Manipuri
- Marwari
- Nepali
- Odia
- Punjabi
- Rajasthani
- Tamil
- Telugu
- Urdu
You can specify one or more languages using the language option. Example:
cleanText('some text', { language: ['hindi', 'english'] });
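Since language accepts either a string or an array, you may want to check names before calling cleanText. The sketch below is an assumption-laden illustration: checkLanguages and SUPPORTED are hypothetical names, and the lowercase identifiers are inferred from the list above (only 'hindi', 'english', 'bengali', and 'urdu' appear as option values in this README). How the package itself treats an unknown name is not documented here.

```javascript
// Hypothetical list; lowercase spellings are assumed from the names above.
const SUPPORTED = [
  'hindi', 'english', 'assamese', 'bengali', 'bhojpuri', 'marathi',
  'chhattisgarhi', 'gujarati', 'haryanvi', 'kannada', 'kashmiri', 'konkani',
  'ladakhi', 'malayalam', 'manipuri', 'marwari', 'nepali', 'odia',
  'punjabi', 'rajasthani', 'tamil', 'telugu', 'urdu',
];

// Hypothetical helper; not part of cleanword.
function checkLanguages(language) {
  // Accept a single string or an array, as the language option does.
  const requested = Array.isArray(language) ? language : [language];
  const unknown = requested.filter(
    (name) => !SUPPORTED.includes(String(name).toLowerCase())
  );
  return { requested, unknown };
}

const check = checkLanguages(['hindi', 'klingon']);
console.log(check.unknown);
```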
Contributing
- Fork this repository and clone your fork.
- Install dependencies: npm install
- Add or improve abusive word lists in src/abuse_words.js.
- Add or update tests in Test/cleanText.test.js.
- Run tests: npm test
- Submit a pull request with a clear description of your changes.
Guidelines:
- Please be respectful and avoid adding non-abusive or irrelevant words.
- Keep word lists accurate and up-to-date for each language.
- Add tests for any new features or language support.
Author
Developed with ❤️ by Nabarup
If you find this package useful, ⭐ star the repo and share it!
License
MIT © 2025 Nabarup.
Use freely. Contribute with respect.
Feedback & Contact
For feature requests, feedback, or bug reports, open an issue or email me at nabaruproy.dev@gmail.com.