Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify that parse-field returns an array. #57

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

PuercoPop
Copy link
Collaborator

Oi, If the headers are empty read-content signals a Type-Error (Value nil is not of type Array). If an empty header is to be allowed read-content shouldn't be able to handle it, which is what this patch does. If its not to be allowed the proper condition should be signaled.

Otherwise read-content fails to parse files with empty headers as it tries to aref nil
@kingcons
Copy link
Collaborator

I need to think more about the intent behind this change: 0e70d86. The docs probably also need an update. If a header is supplied, it should have a value (e.g. "tags: foo, bar"). But not supplying a header is a reasonable option, the header columns basically get converted to initargs and we have reasonable defaults most places. I'll double check the default-initargs on the different content types and figure out what to do here. I lean towards signaling a condition if empty headers (e.g. "tags:") are supplied.

@kingcons
Copy link
Collaborator

kingcons commented Sep 5, 2014

The docs have been updated and we now ignore invalid lines. (i.e. Those that do not match: (cl-ppcre:scan-to-strings "[a-zA-Z]+:\\s+(.*)" str)). This is on the basic-deploy branch and will be merged to master shortly for the next release. In a future release, this may be replaced with conditions and appropriate restarts.

@PuercoPop
Copy link
Collaborator Author

On a Meta note: Where should code-less commentary or code that is not PR-ready, but could use some ping-pong, be placed? I feel that issues are of bug reports and Email keeps potentially relevant history from possible contributors? I hope you are okay with me hijacking this thread for my 2c on the Quo Vadis Coleslaw topic.

I've read through the commits on the basic-deploy branch they look en-tuned to the direction of making coleslaw more extensible. I've tried extending coleslaw to fit a 'planet as repo' style where you just contribute by merging a pull request. The biggest issue I ran into is that most of the abstractions aren't 'first class', for example the theme is just a symbol taken as is from the config.

A minor annoyance is the abuse of global variables. It is convenient when a top level function like main grabs a default from a global *config* but *config* is sprayed in various not top level functions. Also the document db *site* should be a slot of the blog class, which could be renamed to a more general name like site.

Long story short I've started to code a solution from scratch to see if my ideas of what a static-site generator needs and what can be useful for coleslaw. What I think the core abstractions for a static-site generator are:

  • Content Loader: Content can be discovered using different strategy, currently Coleslaw uses a file extending as content-type,. Another approach would be to use the directory as content-type. Or even something like parsing rss feeds of a blog roll.
  • Documents: Documents are the at the heart of a site. That is why I think putting the document db at the core why some kind of query language (maybe just use optima's class constructor pattern?). That way Indices would be just a document that queries the document db with a specific selector. A more general name for indeices could be collections.
  • Themes: Documents can be displayed in many different forms. HTML and RSS feeds are two examples. Themes mediate this process.
  • URL Router: Basically just maps URLs to document. It could enable stuff stuff like url-of or to live preview the site without actually deploying it.

A Path for Coleslaw

IMHO the configuration of the blog is being conflated with the blog itself. However how best do so is still something that should be experimented on. One example of what I'm that talking is the theme slot of the blog object is just the symbol from the config. One short coming of this is that a symbol cannot convey information as to what templates (post/index/rss) it supports. A better idea would be to use the symbol to initialize a theme object which could have different back-ends (something you point at in a comment about debugging a template error).

A generic way to see themes is a way to turn a document into a concrete representation (HTML being the most common). So I propose something like:

(defgeneric render (theme template-name document)
  (:documentation "Evaluate the template using the document as the environment."))

and supporting different back-end (djula comes to mind) would just require to specialize on a different theme object.

Being able to extend the process of loading content to either follow different strategies or to allow dynamically generated content (say a twitter search or the rss feed of a blogs to follow) to be loaded into the document db. In my case I wanted to use directories to specify content types instead of file-extensions, so posts are in the posts/ dir etc. Some code to illustrate this:

(defgeneric load-content (loader site)
  (:documentation "Load documents to the site according the strategy of the
  loader. Arguments the loader takes should be passed as slots."))

(defclass directory-as-content-type-loader ()
  ((root-dir :initarg :root-dir :reader root-dir)
   (mappings :initarg :mappings :initform nil :reader mappings
             :documentation "An alist mapping a directory name to a
             content-type. If nil assume the directory is the content-type.")))


(defmethod load-content ((loader directory-as-content-type-loader) (site site))
  "List all directories and in every directory construct a document of the
  appropriate content-type and add it the site."
  (loop
     :for directory :in (subdirectories (root-dir loader))
     :do
     (loop
        :with last-dir-as-symbol := (symbolicate
                                        (string-upcase
                                         (car
                                          (last
                                           (pathname-directory directory)))))
        :for file :in (directory-files directory)
        :do 
        (add-document (parse-document file 
                                      (find-class 
                                       last-dir-as-symbol))
                          site))))

Indices are, and correct me If I am understanding them wrong, documents that are made by querying the document db and collecting documents which match a criteria. So they are a collection. How

Something I am not sure about is if the mapping between URLs and documents should be reified. Maybe this is outside Coleslaw's scope. It would make stuff like a building a web service to preview the site on a low hanging fruit though.

About some scripts for common tasks, I've been using a cl-launch script to deploy coleslaw during testing:

#!/usr/bin/cl -Q -sp coleslaw --entry entry
;;;;
;; This script assumes it is executed from the repository's top level directory
;; to determine correctly the blog-dir variable.
;;;;

(defun entry (argv)
  (declare (ignore argv))

  (let 
      ((old-rev (inferior-shell:run/s
                 "git log --oneline -1 | awk -e '{print $1}'"))
       (blog-dir (uiop/os:getcwd)))
    (main blog-dir old-rev)))

Other assorted Suggestions

Short

  • refactor site to be a slot in the blog class
  • rename blog class to a more general name site

Not so short

  • a defcontent macro, where creating an index for the content type is just a flag away.

Complex / Handwavery

libgit2 allows for a custom back-end to git's object db, maybe coleslaw's document-db could implement a cl-store back-end using cl-git. Keeping documenting versions that way to allow look into using cl-libgit. Just thinking out loud, I don't think this is a good idea.

Hosting Coleslaw documentation in a site made using coleslaw

Sorry for the long post, and if I come of as overly critical I apologize, I think coleslaw is great and am grateful for the hard work you and others have put into it.

@kingcons
Copy link
Collaborator

kingcons commented Sep 8, 2014

@PuercoPop There's a lot to respond to here so I'm likely to miss a few things.

First of all, you raise a great point that there isn't currently a public venue for discussing the long-term future of Coleslaw. We could go in the direction Rust does, discuss on IRC and post to the wiki, but a mailing list or google group might be even better. It would also help with "having trouble getting started"-type github issues.

I like most of the ideas you proposed and would be happy to merge or collaborate on pull requests for them. I should probably add notes to README or HACKING making it a little more clear how I'd like to run the project/work with contributors, I suppose.

I'm particularly interested in hearing more about your ideas for how collections/indices would work, perhaps using Optima's "class constructor pattern" which I'm unfamiliar with. defcontent seems related to this as well and might resolve some of the longstanding "Better Content Types" stuff in the HACKING docs.

One thing I'm not quite clear on is the global variable annoyances you mention. I agree that ignoring global state where possible is important. I think having separate variables for the config and content is reasonable but I suppose having one site variable containing everything wouldn't be harmful.

I think there is a sound mapping between URLs and Documents in coleslaw. add-document ensures that each document on the site has a unique URL. On the basic-deploy branch, the :routing config argument is used by the compute-url function to create a URL. This is done for each content type in an initialize-instance :after method. But maybe this isn't what you mean.

In general, I like the ideas you propose about first-class abstractions. I usually don't make things extensible until I, or someone else, needs the extensibility. That's one big reason themes and content loading aren't generic functions today. But I agree that coleslaw might be better if they were extensible and would be interested to see such a design fleshed out.

Long story short, I don't know how far you've gotten working on your solution but I'm interested in your ideas and happy to try and make broader changes to coleslaw.

@kingcons
Copy link
Collaborator

kingcons commented Sep 9, 2014

It also occurs to me that the directory loading scheme you mentioned could be implemented as a replacement of the current discover default method. I.e.

(defmethod discover (doc-type)
  (let* ((content-type (class-name doc-type))
         (dirname (format nil "~(~A~)" content-type)))
    (do-files (file (merge-pathnames dirname (repo *config*)))
      (let ((obj (construct content-type (read-content file))))
        (add-document obj)))))

Each content type currently has discover called on it during load-content. Defined as above, that would result in looking for a directory matching the class name and iterating over the files in it, adding each one to the site as an object of the respective class. It would be just as easy to pluck the dirname from an alist by doing something like:

(dirname (assoc content-type (dirnames *config*)))

Your solution may be better, of course. Just saying that accomodating the behavior you describe isn't currently impossible or even, in my opinion, very invasive.

@PuercoPop
Copy link
Collaborator Author

Oi Brit, sorry for the delayed response. I've been busy these days. I'll try to catch you on IRC later today.

First I want to clarify I think coleslaw is extensible as it currently stands, in fact I started to modify it before starting from a clean slate. It is just that modifying a a foreign code base is harder. Plus I kept worrying about breaking other people's blog.

To be fair, the global variables has been more of a nuisance to me on famiclom than on coleslaw, where I want to create a nes instance inside a let. I'll open an issue there later today or tomorrow. In coleslaw it would present a problem if for whatever reason I want to have two sites in parallel in the REPL. Some functions, like render take a value from *config* regardless of any parameter I pass to render.

You are right that discover provides a way to extend how content is discovered. However it conflates that content-type with where to look for it. Content-types are a generic way to represent content. It is then rendered into a specific presentation through a theme tempalte. Lets say I want to present the latest stock prices and I fetch them from yahoo-stocks, google-stocks or any other API. Currently I would have to either to either use different content-types for stocks fetched from different providers or provide a discover method that checks against the providers I want on a per site basis. With a content loader different stock providers could be just a different initialization argument of the same content-loader. Also I could look for different content types in the same place or combine different search strategies. I concede that this example is pretty very far from a common use case for coleslaw.

I agree that the routing mapping has been improved in the basic-deploy branch. I'll take a stab at a on the fly preview server using wookie and see if I run into any trouble and report back.

Regarding the collections/indices, that is where I am currently stuck. My ideal , which may be overkill is to have a document-db query DSL and collections would have query associated with them and slot where to save the matching docs. Maybe even a mechanism to be marked as dirty when more documents are added or add-document to would try to match against all collections. The gist of using optima would be something like this:

;; Note: My document base class uses a SHA1 of all the slots as a guid and use that as the key in the hash table.

(ql:quickload '(ironclad optima optima.ppcre))
(use-package :ironclad)
(use-package :optima)
(use-package :optima.ppcre)

(defclass document ()
  ((id :reader id :documentation "A SHA-1 of every slot except the id slot preprended with document."))
  (:documentation "Document base class."))

(defmethod initialize-instance :after ((obj document) &key)
  (setf (slot-value obj 'id)
        (let*
            ((sha1 (make-digest :sha1))
             (slots-as-strings
              (format nil "document~{~A~}"
                      (mapcar (lambda (slot)
                                (slot-value obj
                                            (slot-definition-name slot)))
                              (remove-if
                               (lambda (obj)
                                 (eq (slot-definition-name obj)
                                     'id))
                               (class-slots (class-of obj)))))))
          (update-digest sha1
                         (string-to-octets slots-as-strings))
          (sha1-buffer sha1))))

(defclass foo-doc (document)
  ((title :initarg :title :reader title)
   (author :initarg :author :reader author)))

(defmethod print-object ((obj foo-doc) stream)
  (print-unreadable-object (obj stream :type  t)
    (format stream "title: ~A author: ~A" (title obj) (author obj))))

(defun add-document (document db)
  (setf (gethash (id document) db) document))

(defvar *doc-db* 
  (let ((document-db (make-hash-table :test #'equal)))
    (add-document (make-instance 'foo-doc
                                 :title "I Love Lisp" :author "LispLover")
                  document-db)
    (add-document (make-instance 'foo-doc
                                 :title "Lisp Love Song" :author "LispLover")
                  document-db)
    (add-document (make-instance 'foo-doc
                                 :title "I hate Lisp" :author "LispHater")
                  document-db)
    document-db))

Now say I want to retrieve all instances of the document class, all foo-docs from "LispHater" and all foo-docs whose title contains the word Love. I would collect the documents for the document-db with

(loop
  :with matches = nil
  :for doc :being :the hash-values :in *doc-db* 
  :do 
  (match doc
    ((document) (setf matches (adjoin doc matches))))
  :finally (return matches))


(loop
  :with matches = nil
  :for doc :being :the hash-values :in *doc-db* 
  :do 
  (match doc
    ((foo-doc :author "LispHater")  (setf matches (adjoin doc matches))))
  :finally (return matches))

(loop
  :with matches = nil
  :for doc :being :the hash-values :in *doc-db* 
  :do 
  (match doc
    ((foo-doc :title (ppcre ".*Love.*"))  (setf matches (adjoin doc matches))))
  :finally (return matches))

On second thought I'm unsure if optima is the proper tool here as pattern matching is more about plucking values from a concrete structure. Well anyhow as you can see it is far from a polished idea.

On a side regarding the separating of configuration with the site object itself one of my favorite twitter accounts, NotDoctorOK, just tweeted about it

Finally my comment about the lack of public venue was more general than just Coleslaw, but adding a roadmap or wanted features would certainly help potential contributors. I'll guess I'll try using issues as forums for a while and see how it goes.

P.D. You might want to know that he stefil fork has recently been renamed to fiasco and is going to be in QL soon.

@kingcons
Copy link
Collaborator

Leaving this open for design discussion but the initial error with read-content has been improved in 0.9.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants