Cleanword

A simple, fast, and extensible JavaScript package to detect and censor abusive words in multiple Indian and international languages. Useful for chat moderation, content filtering, and building safe online communities.

Features

Detects and censors abusive words in 23 languages, including Hindi, English, Bengali, and Urdu
Customizable censorship character (grawlix)
Fine-grained control with alwaysAllow and alwaysBlock word lists
Easy to use and to integrate into Node.js projects

Installation

```
npm install cleanword
```

Usage

Example

```javascript
const { cleanText } = require('cleanword');

const options = {
  language: ['english', 'hindi'], // word lists to check
  grawlixChar: '@',               // character used to mask censored words
  alwaysAllow: ['kutto'],         // never censored
  alwaysBlock: ['test', 'what'],  // always censored
};
const cleaned = cleanText('This is a test sentence with kutto and what.', options);
console.log(cleaned); // This is a @@@@ sentence with kutto and @@@@.
```

TypeScript Example

```typescript
import { cleanText } from 'cleanword';

// Declared locally to mirror the documented options; every field is optional.
// If the package ships its own type definitions, prefer those instead.
interface CleanTextOptions {
  language?: string | string[];
  grawlixChar?: string;
  alwaysAllow?: string[];
  alwaysBlock?: string[];
  customAbuseSet?: Set<string>;
}

const options: CleanTextOptions = {
  language: ['english', 'hindi'],
  grawlixChar: '@',
  alwaysAllow: ['kutto'],
  alwaysBlock: ['test', 'what'],
};

const cleaned: string = cleanText('This is a test sentence with kutto and what.', options);
console.log(cleaned); // This is a @@@@ sentence with kutto and @@@@.
```

API

`cleanText(text, options)`

text (string): The input string to clean.
options (object, optional):
- language (string | string[]): Language(s) to check (default: 'hindi').
- grawlixChar (string): Character used for censorship (default: '*').
- alwaysAllow (string[]): Words that are never censored, even if they appear in a selected language's word list.
- alwaysBlock (string[]): Words that are always censored, even if they appear in no word list.
- customAbuseSet (Set<string>): Custom set of abusive words (for advanced use or testing).

Returns: The cleaned string with abusive words replaced by the grawlix character.
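To make the option semantics concrete, here is a minimal, self-contained sketch of the contract described above. It is not cleanword's implementation: `cleanTextSketch` is a hypothetical name, it has no built-in word lists (only `customAbuseSet` is consulted), and the tokenization (runs of Unicode letters and marks), case-insensitive matching, one grawlix per code point, and `alwaysAllow` taking precedence over `alwaysBlock` are all assumptions.

```javascript
// Hypothetical sketch of the documented cleanText contract, NOT cleanword's code.
// Assumptions: case-insensitive matching, alwaysAllow wins over alwaysBlock,
// and no built-in word lists (customAbuseSet is the only abusive-word source).
function cleanTextSketch(text, options = {}) {
  const {
    grawlixChar = '*',
    alwaysAllow = [],
    alwaysBlock = [],
    customAbuseSet = new Set(),
  } = options;

  const allow = new Set(alwaysAllow.map((w) => w.toLowerCase()));
  const block = new Set(alwaysBlock.map((w) => w.toLowerCase()));

  // Treat runs of Unicode letters and combining marks as words, so scripts
  // such as Devanagari or Bengali are tokenized along with Latin text.
  return text.replace(/[\p{L}\p{M}]+/gu, (word) => {
    const key = word.toLowerCase();
    if (allow.has(key)) return word; // assumption: allow takes precedence
    if (block.has(key) || customAbuseSet.has(key)) {
      // Replace with one grawlix character per Unicode code point.
      return grawlixChar.repeat([...word].length);
    }
    return word; // punctuation and spacing outside words are left untouched
  });
}
```

With `grawlixChar: '@'`, `alwaysAllow: ['kutto']`, `alwaysBlock: ['test', 'what']` and `customAbuseSet: new Set(['kutto'])`, the sketch turns `'This is a test sentence with kutto and what.'` into `'This is a @@@@ sentence with kutto and @@@@.'`, the same output as the Usage example.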

Config Options

| Option | Type | Description |
| --- | --- | --- |
| `language` | `string` or `string[]` | Languages to check, e.g. `'hindi'`, `'english'`, `'bengali'`, `'urdu'` (default: `'hindi'`) |
| `grawlixChar` | `string` | Character to use for censorship (default: `'*'`) |
| `alwaysAllow` | `string[]` | Words to never censor |
| `alwaysBlock` | `string[]` | Words to always censor |
| `customAbuseSet` | `Set<string>` | Custom abusive word set (advanced/testing) |

Supported Languages

Hindi
English
Assamese
Bengali
Bhojpuri
Marathi
Chhattisgarhi
Gujarati
Haryanvi
Kannada
Kashmiri
Konkani
Ladakhi
Malayalam
Manipuri
Marwari
Nepali
Odia
Punjabi
Rajasthani
Tamil
Telugu
Urdu

You can specify one or more languages using the `language` option. Example:

```javascript
cleanText('some text', { language: ['hindi', 'english'] });
```
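One plausible way a multi-language selection could work is to merge each selected language's list into a single lookup `Set`. This sketch is hypothetical: the `wordLists` object, its placeholder entries, and the `resolveLanguages` and `buildAbuseSet` helpers are illustrations, not cleanword's API or the real contents of `src/abuse_words.js`. It falls back to `'hindi'` as the API section documents; throwing on an unknown language is a choice of this sketch, and the package may handle that case differently.

```javascript
// Hypothetical per-language lists with placeholder entries; the real lists
// live in src/abuse_words.js and their structure may differ.
const wordLists = {
  hindi: ['kutto'],
  english: ['badword'],
};

// Accept a single language name or an array, defaulting to 'hindi'.
function resolveLanguages(language = 'hindi') {
  return Array.isArray(language) ? language : [language];
}

// Merge every selected language's list into one Set for constant-time lookups.
function buildAbuseSet(language) {
  const merged = new Set();
  for (const lang of resolveLanguages(language)) {
    const list = wordLists[lang.toLowerCase()];
    if (!list) throw new Error(`Unsupported language: ${lang}`); // sketch's choice
    for (const word of list) merged.add(word.toLowerCase());
  }
  return merged;
}
```

Because the merged lookup is a `Set`, checking each word costs the same whether one language or all of them are selected.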

Contributing

Fork this repository and clone your fork.
Install dependencies:
```
npm install
```
Add or improve abusive word lists in src/abuse_words.js.
Add or update tests in Test/cleanText.test.js.
Run tests:
```
npm test
```
Submit a pull request with a clear description of your changes.

Guidelines:

Please be respectful and avoid adding non-abusive or irrelevant words.
Keep word lists accurate and up-to-date for each language.
Add tests for any new features or language support.

Author

Developed with ❤️ by Nabarup

If you find this package useful, ⭐ star the repo and share it!

License


Feedback & Contact

For feature requests, feedback, or bug reports, open an issue on the repository or email me at nabaruproy.dev@gmail.com.
