Commit

Merge pull request #42 from kiwilan/develop
2.0.20
ewilan-riviere committed Aug 28, 2023
2 parents 875c43e + 384cf23 commit cef5002
Showing 15 changed files with 277 additions and 46 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/codecov.yml
@@ -15,9 +15,10 @@ jobs:
steps:
- name: Install for Linux
run: |
+ sudo apt update
sudo apt -y install p7zip-full ghostscript imagemagick
- sudo apt-get install -y unrar
- sudo apt-get install -y libunrar-dev
+ sudo apt install -y unrar
+ sudo apt install -y libunrar-dev
sudo sed -i '/disable ghostscript format types/,+6d' /etc/ImageMagick-6/policy.xml
shell: bash

5 changes: 3 additions & 2 deletions .github/workflows/run-tests.yml
@@ -17,9 +17,10 @@ jobs:
steps:
- name: Install
run: |
+ sudo apt update
sudo apt -y install p7zip-full ghostscript imagemagick
- sudo apt-get install -y unrar
- sudo apt-get install -y libunrar-dev
+ sudo apt install -y unrar
+ sudo apt install -y libunrar-dev
sudo sed -i '/disable ghostscript format types/,+6d' /etc/ImageMagick-6/policy.xml
shell: bash

13 changes: 11 additions & 2 deletions README.md
@@ -98,8 +98,6 @@ composer require kiwilan/php-ebook

## Usage

### Main

With eBook files (`.epub`, `.cbz`, `.cba`, `.cbr`, `.cb7`, `.cbt`, `.pdf`) or audiobook files (`mp3`, `m4a`, `m4b`, `flac`, `ogg`).

```php
@@ -114,6 +112,7 @@ $ebook->getTitle(); // string
$ebook->getAuthors(); // BookAuthor[] (`name`: string, `role`: string)
$ebook->getAuthorMain(); // ?BookAuthor => First BookAuthor (`name`: string, `role`: string)
$ebook->getDescription(); // ?string
$ebook->getDescriptionHtml(); // ?string
$ebook->getCopyright(); // ?string
$ebook->getPublisher(); // ?string
$ebook->getIdentifiers(); // BookIdentifier[] (`value`: string, `scheme`: string)
@@ -142,13 +141,22 @@ $ebook->getExtras(); // array<string, mixed> => additional data for book
$ebook->getExtra(string $key); // mixed => safely extract data from `extras` array
```

To check whether an eBook file is valid, call the static `isValid()` method before `read()`.

```php
use Kiwilan\Ebook\Ebook;

$isValid = Ebook::isValid('path/to/ebook.epub');
```

To get additional data, you can use these methods:

```php
$ebook->getMetadata(); // ?EbookMetadata => metadata with parsers
$ebook->getMetaTitle(); // ?MetaTitle, with slug and sort properties for `title` and `series`
$ebook->getFormat(); // ?EbookFormatEnum => `epub`, `pdf`, `cba`
$ebook->getCover(); // ?EbookCover => cover of book
$ebook->getArchive(); // ?BaseArchive => archive of book from `kiwilan/php-archive`
```

And to test if some data exists:
@@ -263,6 +271,7 @@ Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed re

- [`spatie`](https://github.com/spatie) for `spatie/package-skeleton-php`
- [`kiwilan`](https://github.com/kiwilan) for `kiwilan/php-archive`, `kiwilan/php-audio`, `kiwilan/php-xml-reader`
- [All Contributors](../../contributors)

## License

2 changes: 1 addition & 1 deletion composer.json
@@ -1,7 +1,7 @@
{
"name": "kiwilan/php-ebook",
"description": "PHP package to read metadata and extract covers from eBooks (.epub, .cbz, .cbr, .cb7, .cbt, .pdf) and audiobooks (.mp3, .m4a, .m4b, .flac, .ogg).",
- "version": "2.0.12",
+ "version": "2.0.20",
"keywords": [
"php",
"ebook",
116 changes: 91 additions & 25 deletions src/Ebook.php
@@ -30,6 +30,8 @@ class Ebook

protected ?string $description = null;

protected ?string $descriptionHtml = null;

protected ?string $publisher = null;

/** @var BookIdentifier[] */
@@ -66,6 +68,7 @@ class Ebook
protected function __construct(
protected string $path,
protected string $filename,
protected string $basename,
protected string $extension,
protected ?BaseArchive $archive = null,
protected ?Audio $audio = null,
@@ -83,7 +86,46 @@ protected function __construct(
public static function read(string $path): ?self
{
$start = microtime(true);
- $filename = pathinfo($path, PATHINFO_BASENAME);
+ $self = self::parseFile($path);

$format = match ($self->format) {
EbookFormatEnum::EPUB => $self->epub(),
EbookFormatEnum::MOBI => $self->mobi(),
EbookFormatEnum::CBA => $self->cba(),
EbookFormatEnum::PDF => $self->pdf(),
EbookFormatEnum::AUDIOBOOK => $self->audiobook(),
default => null,
};

if ($format === null) {
return null;
}

$self->metaTitle = MetaTitle::make($self);

$time = microtime(true) - $start;
$self->execTime = (float) number_format((float) $time, 5, '.', '');

return $self;
}

/**
* Check if an ebook file is valid.
*/
public static function isValid(string $path): bool
{
$self = self::parseFile($path);

return ! $self->isBadFile;
}

/**
* Parse an ebook file.
*/
private static function parseFile(string $path): Ebook
{
$basename = pathinfo($path, PATHINFO_BASENAME);
$filename = pathinfo($path, PATHINFO_FILENAME);
$extension = pathinfo($path, PATHINFO_EXTENSION);

$cbaExtensions = ['cbz', 'cbr', 'cb7', 'cbt'];
@@ -100,7 +142,7 @@ public static function read(string $path): ?self
throw new \Exception("Unknown archive type: {$extension}");
}

- $self = new self($path, $filename, $extension);
+ $self = new self($path, $filename, $basename, $extension);

$self->format = match ($extension) {
'epub' => $self->format = EbookFormatEnum::EPUB,
@@ -141,24 +183,6 @@ public static function read(string $path): ?self
$self->audio = Audio::get($path);
}

- $format = match ($self->format) {
- EbookFormatEnum::EPUB => $self->epub(),
- EbookFormatEnum::MOBI => $self->mobi(),
- EbookFormatEnum::CBA => $self->cba(),
- EbookFormatEnum::PDF => $self->pdf(),
- EbookFormatEnum::AUDIOBOOK => $self->audiobook(),
- default => null,
- };
-
- if ($format === null) {
- return null;
- }
-
- $self->metaTitle = MetaTitle::make($self);
-
- $time = microtime(true) - $start;
- $self->execTime = (float) number_format((float) $time, 5, '.', '');

return $self;
}

@@ -216,6 +240,7 @@ private function convertEbook(): self
$this->authorMain = $ebook->getAuthorMain();
$this->authors = $ebook->getAuthors();
$this->description = $ebook->getDescription();
$this->descriptionHtml = $ebook->getDescriptionHtml();
$this->publisher = $ebook->getPublisher();
$this->identifiers = $ebook->getIdentifiers();
$this->publishDate = $ebook->getPublishDate();
@@ -294,7 +319,12 @@ public function getAuthors(): array
}

/**
- * Description of the book.
+ * Description of the book, without HTML.
+ *
+ * If original description has HTML, all HTML will be removed and text will be trimmed.
+ * You can use `getDescriptionHtml()` to get the original description sanitized.
+ *
+ * @param int|null $limit Limit the length of the description.
*/
public function getDescription(int $limit = null): ?string
{
@@ -305,6 +335,16 @@ public function getDescription(int $limit = null): ?string
return $this->description;
}

/**
* Description of the book with HTML sanitized.
*
* If original description doesn't have HTML, it will be the same as `getDescription()`.
*/
public function getDescriptionHtml(): ?string
{
return $this->descriptionHtml;
}

/**
* Publisher of the book.
*/
@@ -388,31 +428,48 @@ public function getPath(): string
}

/**
- * Filename of the ebook.
+ * Filename of the ebook, e.g. `The Clan of the Cave Bear`.
*/
public function getFilename(): string
{
return $this->filename;
}

/**
- * Extension of the ebook.
+ * Basename of the ebook, e.g. `The Clan of the Cave Bear.epub`.
+ */
+ public function getBasename(): string
+ {
+ return $this->basename;
+ }
+
+ /**
+ * Extension of the ebook, e.g. `epub`.
*/
public function getExtension(): string
{
return $this->extension;
}

/**
- * Archive reader.
+ * Archive reader, from `kiwilan/php-archive`.
+ *
+ * @docs https://github.com/kiwilan/php-archive
*/
public function getArchive(): ?BaseArchive
{
// if (! $this->archive) {
// error_log("{$this->path} can't be read as archive.");
// throw new \Exception("{$this->path} can't be read as archive.");
// }

return $this->archive;
}

/**
- * Audio reader.
+ * Audio reader, from `kiwilan/php-audio`.
+ *
+ * @docs https://github.com/kiwilan/php-audio
*/
public function getAudio(): ?Audio
{
@@ -601,6 +658,13 @@ public function setDescription(?string $description): self
return $this;
}

public function setDescriptionHtml(?string $descriptionHtml): self
{
$this->descriptionHtml = $descriptionHtml;

return $this;
}

public function setPublisher(?string $publisher): self
{
$this->publisher = $publisher;
@@ -705,6 +769,7 @@ public function toArray(): array
'authorMain' => $this->authorMain?->getName(),
'authors' => array_map(fn (BookAuthor $author) => $author->getName(), $this->authors),
'description' => $this->description,
'descriptionHtml' => $this->descriptionHtml,
'publisher' => $this->publisher,
'identifiers' => array_map(fn (BookIdentifier $identifier) => $identifier->toArray(), $this->identifiers),
'date' => $this->publishDate?->format('Y-m-d H:i:s'),
@@ -716,6 +781,7 @@ public function toArray(): array
'pagesCount' => $this->pagesCount,
'path' => $this->path,
'filename' => $this->filename,
'basename' => $this->basename,
'extension' => $this->extension,
'format' => $this->format,
'metadata' => $this->metadata?->toArray(),
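
The `basename`/`filename` split that `parseFile()` introduces in this file relies on PHP's built-in `pathinfo()`. Before this commit `$filename` was filled from `PATHINFO_BASENAME`, so `getFilename()` now returns the name without its extension. The following is a minimal standalone sketch of the three flags; the path is a made-up example and is not taken from the package.

```php
<?php

// Hypothetical path, used only for illustration.
$path = '/books/The Clan of the Cave Bear.epub';

// Name of the file including its extension.
$basename = pathinfo($path, PATHINFO_BASENAME);

// Name of the file without the last extension.
$filename = pathinfo($path, PATHINFO_FILENAME);

// Last extension, without the leading dot.
$extension = pathinfo($path, PATHINFO_EXTENSION);

echo $basename, PHP_EOL;  // The Clan of the Cave Bear.epub
echo $filename, PHP_EOL;  // The Clan of the Cave Bear
echo $extension, PHP_EOL; // epub
```

Callers that previously relied on `getFilename()` including the extension would presumably switch to `getBasename()`.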
53 changes: 53 additions & 0 deletions src/Formats/EbookModule.php
@@ -22,6 +22,59 @@ abstract public function toCounts(): Ebook;

abstract public function toArray(): array;

/**
* Convert HTML to string, remove all tags.
*/
protected function htmlToString(?string $html): ?string
{
if (! $html) {
return null;
}

$html = strip_tags($html);
$html = $this->formatText($html);

return $html;
}

/**
* Sanitize HTML, remove all tags except div, p, br, b, i, u, strong, em.
*/
protected function sanitizeHtml(?string $html): ?string
{
if (! $html) {
return null;
}

$html = strip_tags($html, [
'div',
'p',
'br',
'b',
'i',
'u',
'strong',
'em',
]);
$html = $this->formatText($html);

return $html;
}

/**
* Clean string, remove tabs, new lines, carriage returns, and multiple spaces.
*/
private function formatText(string $text): string
{
$text = str_replace("\n", '', $text);
$text = str_replace("\r", '', $text);
$text = str_replace("\t", '', $text);
$text = trim($text);
$text = preg_replace('/\s+/', ' ', $text);

return $text;
}

public function toJson(): string
{
return json_encode($this->toArray(), JSON_PRETTY_PRINT);
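
The two helpers added above differ only in the tag allowlist passed to `strip_tags()`. The sketch below reproduces that idea as free functions so it can be run outside the package. The function names (`normalizeWhitespace`, `htmlToPlainText`, `keepBasicHtml`) and the sample description are hypothetical. It also departs from the commit in one labeled way: line breaks are collapsed into a single space rather than removed, on the assumption that words on adjacent lines should stay separated.

```php
<?php

/**
 * Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
 * Assumption: unlike the commit's formatText(), line breaks become a space.
 */
function normalizeWhitespace(string $text): string
{
    // preg_replace() returns null on failure, so fall back to the input.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

/** Remove every tag and normalize whitespace. */
function htmlToPlainText(?string $html): ?string
{
    if ($html === null || $html === '') {
        return null;
    }

    return normalizeWhitespace(strip_tags($html));
}

/** Keep only a small set of formatting tags (array form needs PHP 7.4+). */
function keepBasicHtml(?string $html): ?string
{
    if ($html === null || $html === '') {
        return null;
    }

    $allowed = ['div', 'p', 'br', 'b', 'i', 'u', 'strong', 'em'];

    return normalizeWhitespace(strip_tags($html, $allowed));
}

// Hypothetical description, for illustration only.
$description = "<p>A <strong>classic</strong>\nnovel by <a href=\"#\">someone</a>.</p>";

echo htmlToPlainText($description), PHP_EOL; // A classic novel by someone.
echo keepBasicHtml($description), PHP_EOL;   // <p>A <strong>classic</strong> novel by someone.</p>
```

Note that `strip_tags()` keeps the attributes of allowed tags, so an allowlist like this limits which elements survive but is not a complete HTML sanitizer on its own.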
17 changes: 8 additions & 9 deletions src/Formats/Epub/EpubMetadata.php
@@ -75,9 +75,8 @@ public function toEbook(): Ebook

$authors = array_values($this->opf->getDcCreators());
$this->ebook->setAuthors($authors);
- if ($this->opf->getDcDescription()) {
- $this->ebook->setDescription(strip_tags($this->opf->getDcDescription()));
- }
+ $this->ebook->setDescription($this->htmlToString($this->opf->getDcDescription()));
+ $this->ebook->setDescriptionHtml($this->sanitizeHtml($this->opf->getDcDescription()));
$this->ebook->setCopyright(! empty($this->opf->getDcRights()) ? implode(', ', $this->opf->getDcRights()) : null);
$this->ebook->setPublisher($this->opf->getDcPublisher());
$this->ebook->setIdentifiers($this->opf->getDcIdentifiers());
@@ -97,14 +96,14 @@ public function toEbook(): Ebook
$rating = null;
if (! empty($this->opf->getMeta())) {
foreach ($this->opf->getMeta() as $meta) {
- if ($meta->name() === 'calibre:series') {
- $this->ebook->setSeries($meta->content());
+ if ($meta->getName() === 'calibre:series') {
+ $this->ebook->setSeries($meta->getContent());
}
- if ($meta->name() === 'calibre:series_index') {
- $this->ebook->setVolume((int) $meta->content());
+ if ($meta->getName() === 'calibre:series_index') {
+ $this->ebook->setVolume((int) $meta->getContent());
}
- if ($meta->name() === 'calibre:rating') {
- $rating = (float) $meta->content();
+ if ($meta->getName() === 'calibre:rating') {
+ $rating = (float) $meta->getContent();
}
}
}
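
The loop above maps Calibre `<meta>` entries onto series, volume, and rating, casting the string content as it goes. The sketch below shows the same mapping in isolation. `MetaTag` and `extractCalibreMeta()` are hypothetical stand-ins written for this example (they are not the package's classes), and the sample values are invented. Only the `getName()`/`getContent()` accessor names and the three `calibre:*` keys come from the diff.

```php
<?php

// Hypothetical stand-in for the meta objects exposed by the OPF parser.
final class MetaTag
{
    public function __construct(
        private string $name,
        private string $content,
    ) {
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function getContent(): string
    {
        return $this->content;
    }
}

/**
 * Collect the Calibre-specific values from a list of meta entries.
 *
 * @param MetaTag[] $metas
 * @return array{series: ?string, volume: ?int, rating: ?float}
 */
function extractCalibreMeta(array $metas): array
{
    $result = ['series' => null, 'volume' => null, 'rating' => null];

    foreach ($metas as $meta) {
        switch ($meta->getName()) {
            case 'calibre:series':
                $result['series'] = $meta->getContent();
                break;
            case 'calibre:series_index':
                // The (int) cast truncates, so "1.0" becomes 1.
                $result['volume'] = (int) $meta->getContent();
                break;
            case 'calibre:rating':
                $result['rating'] = (float) $meta->getContent();
                break;
        }
    }

    return $result;
}

// Invented sample values, for illustration only.
$result = extractCalibreMeta([
    new MetaTag('calibre:series', 'Example Series'),
    new MetaTag('calibre:series_index', '1.0'),
    new MetaTag('calibre:rating', '8'),
]);

var_dump($result);
```

Because the volume is cast with `(int)`, a fractional series index such as `2.5` would be truncated to `2`.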